Fine-Tuning in 2026: The Definitive Guide
Everyone says "just use RAG." But sometimes, fine-tuning is the right call.
When to Fine-Tune
✅ Fine-tune when:
- You need consistent style/tone
- Domain-specific terminology is critical
- Latency matters (smaller model, no retrieval)
- You have 1000+ high-quality examples
❌ Don't fine-tune when:
- Your data changes frequently
- You need citations/sources
- You have fewer than 500 examples
- Prompt engineering solves it (try the quick baseline check below first)
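Before committing to a training run, it's worth confirming that a system prompt on the stock instruct model doesn't already solve the problem. A minimal sketch using the `transformers` chat pipeline; the model name, prompt wording, and decoding settings here are placeholder assumptions, not recommendations:

```python
from transformers import pipeline

# Stock instruct model, no fine-tuning. Model name is just an example.
generator = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    # Encode the target style/terminology directly in the system prompt.
    {"role": "system", "content": "You are a courteous support agent. "
                                  "Apologize first, then give a concrete next step."},
    {"role": "user", "content": "My order is late"},
]
result = generator(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```

If the output already matches your style and terminology requirements, stop here and skip the GPU bill.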
The Modern Fine-Tuning Stack
1. Data Preparation
from datasets import Dataset

# Your training data
examples = [
    {"input": "Customer: My order is late",
     "output": "I apologize for the delay. Let me check your order status..."},
    # ... 1000+ examples
]

dataset = Dataset.from_list(examples)

# SFTTrainer (step 3) reads a single "text" column, so merge each
# input/output pair into one formatted string (the exact template is up to you).
dataset = dataset.map(
    lambda ex: {"text": f"### Input:\n{ex['input']}\n\n### Response:\n{ex['output']}"}
)

dataset.push_to_hub("your-org/customer-service-data")
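One habit worth building into this step: hold out a test split before anything touches the training loop (the Common Mistakes section below comes back to this). A minimal sketch using the built-in splitter from `datasets`; the 90/10 ratio and seed are arbitrary assumptions:

```python
# Hold out a test split now, before training ever sees the data.
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_dataset, eval_dataset = splits["train"], splits["test"]
```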
2. Choose Your Base Model
| Use Case | Recommended Base |
|----------|------------------|
| General assistant | Llama 3.1 70B |
| Code generation | CodeLlama 34B |
| Fast inference | Mistral 7B |
| Multilingual | Qwen 2.5 |
3. Fine-Tuning with Unsloth
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,  # QLoRA: 4-bit base weights + LoRA adapters
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
    lora_dropout=0,
)
# Training
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # the column built in step 1
    max_seq_length=2048,
)
trainer.train()

model.save_pretrained_merged("my-fine-tuned-model", tokenizer)
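Before shipping the merged checkpoint anywhere, a quick generation smoke test catches most formatting bugs. A minimal sketch using Unsloth's inference mode; the prompt follows the template assumed in step 1:

```python
# Quick smoke test on the fine-tuned model before deploying.
FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

prompt = "### Input:\nCustomer: My order is late\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the reply doesn't sound like your training examples, revisit the formatting template before blaming the hyperparameters.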
4. Deployment Options
- Hugging Face Inference Endpoints: Easiest
- Together AI: Best price/performance
- Modal: Great for burst traffic
- Self-hosted: Maximum control (see the serving sketch below)
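For the self-hosted route, one common choice (an assumption on my part, not something the list above prescribes) is vLLM, which can load the merged checkpoint from step 3 directly:

```python
# A minimal self-hosted sketch using vLLM (one option among several).
from vllm import LLM, SamplingParams

llm = LLM(model="my-fine-tuned-model")  # the merged checkpoint from step 3
params = SamplingParams(max_tokens=128, temperature=0.7)

outputs = llm.generate(
    ["### Input:\nCustomer: My order is late\n\n### Response:\n"], params
)
print(outputs[0].outputs[0].text)
```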
Cost Breakdown
Fine-tuning Llama 3.1 8B on 10K examples:
- Cloud GPUs: ~$20 (4 hours on A100)
- Together AI: ~$5 (managed)
- Local RTX 4090: ~$2 (electricity)
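The cloud number is plain arithmetic; a sketch of the estimate, assuming roughly $5/hour for an on-demand A100 (rates vary a lot by provider):

```python
# Back-of-envelope training cost. Both numbers are assumptions; check your provider.
gpu_hourly_rate = 5.00  # USD/hr, a typical on-demand A100 price
training_hours = 4      # from the estimate above
print(f"~${gpu_hourly_rate * training_hours:.0f}")  # ~$20
```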
Common Mistakes
- Too little data: quality over quantity, but you still need quantity
- Overfitting: always hold out a test set
- Wrong base model: start with the best instruct-tuned version you can serve
- Ignoring evals: set up automated benchmarks (see the sketch after this list)
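As promised above, here is the smallest possible automated check, reusing the trainer from step 3 and the held-out split from step 1. Loss is only a proxy; a real benchmark would add task-level metrics such as exact match or rubric scoring:

```python
# Cheapest automated signal: loss on the held-out split.
metrics = trainer.evaluate(eval_dataset=eval_dataset)
print(metrics["eval_loss"])
```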
Fine-tuning isn't magic. But when applied correctly, it's incredibly powerful.