Large Language Models (LLMs) like GPT-4, Claude, and Gemini are incredibly powerful, but what if you want your model to sound like your brand, understand your documents, or perform your specialised tasks?
Page Contents
That’s where fine-tuning comes in. Fine-tuning lets you train an existing LLM on your own data, giving it new domain knowledge or a unique conversational style, without starting from scratch. And the best part? You can do it on a budget, thanks to open-source models like LLaMA 3, Mistral, Falcon, and Gemma, combined with lightweight techniques like LoRA and QLoRA.
In this step-by-step guide, you’ll learn how to fine-tune an open-source LLM on your own dataset — affordably and effectively.
Why Fine-Tune Instead of Starting Fresh?
Training an LLM from scratch can cost millions in GPU hours and data collection. Fine-tuning, on the other hand, is like giving an already-trained model a “masterclass” in your specific domain.
For example:
- A healthcare startup can fine-tune an LLM on medical research papers.
- A SaaS company can fine-tune it on customer support tickets.
- A marketing team can fine-tune it on brand tone and writing style.
This results in faster, cheaper, and highly targeted performance.

Step 1. Choose Your Base Model
You’ll want an open-source model that fits your hardware and target use case. Here are a few great starting points:
Model | Size | Ideal Use Case |
---|---|---|
LLaMA 3 (Meta) | 8B–70B | General-purpose reasoning |
Mistral 7B | 7B | Fast, efficient, and multilingual |
Gemma (Google) | 2B–7B | Lightweight and easy to deploy |
Falcon 7B | 7B | Open-weight, strong for instruction tuning |
For this walkthrough, we’ll use Mistral 7B, as it’s small, powerful, and freely available on Hugging Face.
Step 2. Prepare Your Custom Dataset
Fine-tuning data is typically in JSONL (JSON Lines) format, where each line contains an instruction
, input
, and output
.
Example:
{"instruction": "Summarize the following article.", "input": "The Indian space program achieved...", "output": "ISRO successfully completed..."}
{"instruction": "Answer customer query about billing", "input": "How can I update my payment method?", "output": "You can update it under Account > Billing Settings."}
👉 Keep your data clean, concise, and aligned with the task you’re optimising for (e.g., summarisation, Q&A, dialogue).

Step 3. Environment Setup
Install the required dependencies:
pip install transformers datasets peft bitsandbytes accelerate
transformers
: Core model interfacedatasets
: For loading your datasetpeft
: For efficient fine-tuning using LoRAbitsandbytes
: Enables quantised training (saves GPU memory)accelerate
: Handles distributed/optimised training
Step 4. Load the Model and Tokeniser
You can load the base model from Hugging Face easily:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
load_in_8bit=True,
device_map="auto"
)
This loads the model efficiently in 8-bit precision, significantly reducing GPU usage.
Step 5. Apply LoRA for Efficient Fine-Tuning
Instead of modifying all parameters, LoRA (Low-Rank Adaptation) adds a few small trainable layers on top of the base model, drastically reducing cost and time.
from peft import LoraConfig, get_peft_model
lora_config = LoraConfig(
r=16,
lora_alpha=32,
target_modules=["q_proj", "v_proj"],
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
Step 6. Load Your Dataset
Load your prepared JSONL dataset:
from datasets import load_dataset
dataset = load_dataset("json", data_files="custom_data.jsonl")
train_data = dataset["train"].shuffle(seed=42)
Step 7. Fine-Tune the Model
Now, it’s time to train:
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(
output_dir="./mistral-finetuned",
per_device_train_batch_size=4,
gradient_accumulation_steps=4,
learning_rate=2e-4,
num_train_epochs=3,
fp16=True,
logging_steps=50,
save_total_limit=2
)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=train_data
)
trainer.train()
This setup utilises mixed precision (FP16) and LoRA to accelerate training and reduce costs, even on a single GPU like the RTX 3090.
Step 8. Test Your Fine-Tuned Model
After training, you can test how well it performs:
prompt = "Explain the refund process in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Compare outputs before and after fine-tuning. You’ll notice how your model becomes more domain-specific and natural for your use case.
Step 9. Optimise for Cost
Fine-tuning doesn’t need expensive infrastructure.
Here are a few low-cost strategies:
- Use LoRA/QLoRA instead of full fine-tuning.
- Run on Google Colab Pro, Lambda Cloud, or RunPod.
- Use 4-bit quantisation (
bnb_4bit=True
) for smaller GPUs. - Fine-tune fewer epochs with high-quality data.
Even with a $20–$50 GPU rental, you can achieve excellent specialised results.
Step 10. Save and Deploy Your Model
Once trained, you can push your model to Hugging Face or run it locally:
model.push_to_hub("your-username/mistral-custom-finetuned")
tokenizer.push_to_hub("your-username/mistral-custom-finetuned")
Or, integrate it directly with LangChain or a FastAPI endpoint to build your custom AI app.
Final Thoughts
You’ve just fine-tuned an open-source LLM on your own data, without massive costs or a data centre! With LoRA and tools like Hugging Face Transformers, customising AI for your business or project has never been easier. Whether you’re building a medical assistant, a legal chatbot, or a brand voice generator, fine-tuning puts true personalisation within reach.
Recommended Resources
- Hugging Face Transformers Docs
- PEFT (Parameter-Efficient Fine-Tuning)
- Mistral 7B Model Card
- QLoRA Paper (2023)

Parvesh Sandila is a passionate web and Mobile app developer from Jalandhar, Punjab, who has over six years of experience. Holding a Master’s degree in Computer Applications (2017), he has also mentored over 100 students in coding. In 2019, Parvesh founded Owlbuddy.com, a platform that provides free, high-quality programming tutorials in languages like Java, Python, Kotlin, PHP, and Android. His mission is to make tech education accessible to all aspiring developers.