Fine-Tuning Fundamentals: Customizing AI Models
By Marcin Piekarski · builtweb.com.au · Last Updated: 11 February 2026
TL;DR
Fine-tuning trains a pre-trained model on your own data to improve performance on your specific task. Consider it when prompting and RAG aren't sufficient, but budget for training data, compute costs, and ongoing maintenance.
What is fine-tuning?
Definition:
Additional training on a pre-trained model using your own dataset.
Goal:
- Adapt to your domain (medical, legal, etc.)
- Learn your style or format
- Improve specific task performance
Not:
- Teaching completely new knowledge (use RAG)
- Fixing all model limitations
When to fine-tune
Good candidates:
- Specific style/format needed
- Domain-specific language
- Consistent task structure
- Have labeled data (100s-1000s examples)
Examples:
- Generate emails in your company's tone
- Classify support tickets into custom categories
- Extract entities specific to your industry
When NOT to fine-tune
Use RAG instead if:
- Need to add knowledge
- Knowledge changes frequently
- Don't have training data
Use better prompting if:
- Task is general
- Few-shot examples work well
- Data collection is hard
The fine-tuning process
1. Prepare data:
- Collect 100-10,000 examples
- Format as input-output pairs
- Clean and deduplicate
2. Choose base model:
- GPT-3.5, GPT-4 (OpenAI)
- Llama, Mistral (open source)
3. Train:
- Upload data to platform or run locally
- Set hyperparameters (learning rate, epochs)
- Monitor training metrics
4. Evaluate:
- Test on held-out data
- Compare to base model
5. Deploy:
- Use fine-tuned model via API or hosting
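Steps 1-5 above can be sketched for the OpenAI path. This is a minimal illustration: the filenames, the default system prompt, and the dedup rule are assumptions, and the upload/train calls are left commented out because they require the `openai` package and an API key.

```python
import json

def to_chat_example(user_text, assistant_text,
                    system_prompt="You are a customer support agent."):
    """Format one labeled input-output pair as a chat-style training example."""
    return {"messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]}

def write_training_file(pairs, path="train.jsonl"):
    """Deduplicate (case-insensitive on the input side) and write JSONL:
    one JSON object per line. Returns the number of examples written."""
    seen, count = set(), 0
    with open(path, "w") as f:
        for user_text, assistant_text in pairs:
            key = (user_text.strip().lower(), assistant_text.strip())
            if key in seen:
                continue  # drop exact duplicates
            seen.add(key)
            f.write(json.dumps(to_chat_example(user_text, assistant_text)) + "\n")
            count += 1
    return count

# Upload and start the job (needs `pip install openai` and OPENAI_API_KEY):
# from openai import OpenAI
# client = OpenAI()
# file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
# job = client.fine_tuning.jobs.create(training_file=file.id, model="gpt-3.5-turbo")
```

From there, monitor the job's training metrics, evaluate on held-out data, and call the resulting model name via the normal chat API.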
Data requirements
Quantity:
- Minimum: 50-100 examples
- Recommended: 500-1000+
- More is better (diminishing returns)
Quality:
- Accurate labels
- Representative of production
- Diverse examples
Format (example for OpenAI; training data is JSONL, one JSON object per line):
{"messages": [{"role": "system", "content": "You are a customer support agent."}, {"role": "user", "content": "My order is late"}, {"role": "assistant", "content": "I apologize. Let me check your order status..."}]}
...
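Before uploading, it's worth sanity-checking each training record. The sketch below is a common-sense validator, not the platform's exact rules: it checks that a line parses, that an optional system message comes first, and that the record ends with the assistant reply the model should learn to produce.

```python
import json

def validate_jsonl_line(line):
    """Check one JSONL training line: parses, has messages, roles in a sane order."""
    record = json.loads(line)  # raises ValueError on malformed JSON
    messages = record["messages"]
    roles = [m["role"] for m in messages]
    # Optional single system message first, then user/assistant turns,
    # ending with the assistant target.
    if roles and roles[0] == "system":
        roles = roles[1:]
    assert roles, "no user/assistant turns"
    assert roles[-1] == "assistant", "last message must be the assistant target"
    for m in messages:
        assert isinstance(m["content"], str) and m["content"], "empty content"
    return True
```

Run it over every line of the file and fix or drop failures before training; one malformed record can fail the whole job.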
Fine-tuning platforms
OpenAI:
- GPT-3.5, GPT-4 fine-tuning
- Easy API
- Paid per training + usage
Hugging Face:
- Open source models
- Training scripts provided
- Self-host or use Endpoints
Google Vertex AI:
- Fine-tune Google's foundation models (PaLM, now succeeded by Gemini)
- Managed service
Self-hosted (advanced):
- Full control
- Requires ML expertise
Costs
OpenAI fine-tuning:
- Pay per training token (roughly $8 per 1M training tokens for GPT-3.5)
- Plus per-token usage costs when calling the fine-tuned model
Self-hosted:
- GPU costs ($500-5000/month)
- Engineering time
- Cheaper at scale
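A back-of-the-envelope estimate for the OpenAI path follows directly from the per-token price. The $8/1M figure is illustrative and changes over time, so check current pricing before budgeting; the example dataset size is made up.

```python
def training_cost_usd(total_tokens, epochs, price_per_million=8.0):
    """Rough fine-tuning cost: tokens billed = dataset tokens x epochs.
    price_per_million is an assumption; verify against current pricing."""
    return total_tokens * epochs / 1_000_000 * price_per_million

# e.g. 1,000 examples x 500 tokens each, trained for 3 epochs:
# training_cost_usd(1_000 * 500, 3) -> 12.0 (USD)
```

Self-hosted costs don't scale this way: you pay for GPU time whether or not it's fully utilized, which is why self-hosting only becomes cheaper at sustained volume.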
Common pitfalls
Overfitting:
- Model memorizes training data
- Fails on new examples
- Solution: More diverse data, early stopping
Insufficient data:
- Model doesn't learn patterns
- Solution: Collect more or use few-shot prompting
Wrong base model:
- Too small (can't learn)
- Too large (expensive, slow)
Ignoring alternatives:
- Sometimes better prompts produce the same results
- Try RAG first
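The early-stopping remedy mentioned above can be sketched as a simple validation-loss monitor. This is a minimal illustration; real training loops hook this logic into checkpoint saving and restoring.

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch (0-indexed) whose checkpoint to keep: the best
    validation loss seen, stopping once `patience` consecutive epochs
    fail to improve on it."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch  # stop and roll back to the best checkpoint
    return best_epoch  # training ended before patience ran out
```

Rising validation loss while training loss keeps falling is the classic overfitting signature; stopping at the best validation checkpoint is the cheapest fix.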
Evaluation
Compare:
- Fine-tuned vs base model
- Fine-tuned vs few-shot prompting
- Fine-tuned vs RAG
Metrics:
- Accuracy, F1, BLEU (task-dependent)
- Human evaluation
- A/B test in production
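For classification-style tasks, the base-vs-fine-tuned comparison can be as simple as scoring both models' predictions against the same held-out labels. A minimal sketch (the prediction lists are hypothetical; for generation tasks you'd swap in BLEU or human evaluation):

```python
def accuracy(preds, gold):
    """Fraction of predictions that exactly match the gold label."""
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

def macro_f1(preds, gold):
    """Unweighted mean of per-class F1 scores."""
    labels = set(gold) | set(preds)
    f1s = []
    for label in labels:
        tp = sum(p == label and g == label for p, g in zip(preds, gold))
        fp = sum(p == label and g != label for p, g in zip(preds, gold))
        fn = sum(p != label and g == label for p, g in zip(preds, gold))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

Macro-F1 matters when classes are imbalanced: a model that ignores a rare category can still post high accuracy, but its macro-F1 drops.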
Maintaining fine-tuned models
- Retrain periodically with new data
- Monitor for drift
- Update when base model improves
Decision framework
Need to add knowledge? → RAG
Specific style/format? → Fine-tuning
Complex reasoning? → Better prompting
All of the above? → Combine techniques
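The framework above is mechanical enough to state in code. A toy sketch (the function and flag names are made up for illustration):

```python
def choose_technique(needs_new_knowledge, needs_style, needs_reasoning):
    """Map the decision framework to recommended techniques; the
    results combine when more than one flag is set."""
    picks = []
    if needs_new_knowledge:
        picks.append("RAG")
    if needs_style:
        picks.append("fine-tuning")
    if needs_reasoning:
        picks.append("better prompting")
    return picks or ["better prompting"]  # default: improve the prompt first
```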
What's next
- Fine-Tuning vs RAG (deeper comparison)
- Training Data Preparation
- Model Selection
Frequently Asked Questions
How many examples do I need to fine-tune a model?
A minimum of 50-100 examples to see any effect, but 500-1000+ is recommended for reliable results. Quality matters more than quantity. Well-curated, representative examples outperform large noisy datasets. Start small, evaluate, and add more data if needed.
Is fine-tuning expensive?
It varies by approach. Fine-tuning GPT-3.5 through OpenAI costs around $8 per 1 million training tokens, which is affordable for most teams. Self-hosted fine-tuning on open-source models requires GPU infrastructure costing $500-5000 per month but becomes cheaper at scale.
Can fine-tuning make a model worse?
Yes. Overfitting on a small dataset can make the model memorize training examples instead of learning patterns. Using low-quality or biased data can degrade general capabilities. Always compare fine-tuned performance against the base model on held-out test data before deploying.
Should I fine-tune or just write better prompts?
Try better prompts first. If few-shot examples in your prompt produce good results, you may not need fine-tuning at all. Fine-tune when you need consistent style, specialized domain knowledge, or reduced prompt complexity that prompting alone cannot achieve.
About the Authors
Marcin Piekarski · Frontend Lead & AI Educator
Marcin is a Frontend Lead with 20+ years in tech. Currently building headless ecommerce at Harvey Norman (Next.js, Node.js, GraphQL). He created Field Guide to AI to help others understand AI tools practically—without the jargon.
Credentials & Experience:
- 20+ years web development experience
- Frontend Lead at Harvey Norman (10 years)
- Worked with: Gumtree, CommBank, Woolworths, Optus, M&C Saatchi
- Runs AI workshops for teams
- Founder of builtweb.com.au
- Daily AI tools user: ChatGPT, Claude, Gemini, AI coding assistants
- Specializes in React ecosystem: React, Next.js, Node.js
Prism AI · AI Research & Writing Assistant
Prism AI is the AI ghostwriter behind Field Guide to AI—a collaborative ensemble of frontier models (Claude, ChatGPT, Gemini, and others) that assist with research, drafting, and content synthesis. Like light through a prism, human expertise is refracted through multiple AI perspectives to create clear, comprehensive guides.
Transparency Note: All AI-assisted content is thoroughly reviewed, fact-checked, and refined by Marcin Piekarski before publication.
Key Terms Used in This Guide
Model
The trained AI system that contains all the patterns and knowledge learned from data. It's the end product of training—the 'brain' that takes inputs and produces predictions, decisions, or generated content.
Fine-Tuning
Taking a pre-trained AI model and training it further on your specific data to make it better at your particular task or adopt a specific style.
AI (Artificial Intelligence)
Making machines perform tasks that typically require human intelligence—like understanding language, recognizing patterns, or making decisions.
Training
The process of feeding large amounts of data to an AI system so it learns patterns, relationships, and rules, enabling it to make predictions or generate output.
Training Data
The collection of examples an AI system learns from. The quality, quantity, and diversity of training data directly determines what the AI can and cannot do.
Related Guides
- Retrieval Strategies for RAG Systems (Intermediate · 7 min read): RAG systems retrieve relevant context before generating responses. Learn retrieval strategies, ranking, and optimization techniques.
- Vector Database Fundamentals (Intermediate · 7 min read): Vector databases store and search embeddings efficiently. Learn how they work, when to use them, and popular options.
- Training Custom Embedding Models (Advanced · 7 min read): Fine-tune or train embedding models for your domain. Improve retrieval quality with domain-specific embeddings.