Welcome to session 27! This week, we go through how LLMs can be fine-tuned.
The session is conducted by Sandhya Nallakkiniyan. Sandhya is a Product Manager for the AI/ML platform at PayPal.
If you've missed the session or if you'd like to go through it again, here's the session video - https://youtu.be/smjGkbtd2K4
Here are the resources discussed and shared during the session.
Resources discussed in the video:
- Deck: Link
- OpenAI Platform: Link
- Gemini Fine-tuning: Link
- Open-source fine-tuning service: Link
Here are the notes from the meeting:
Meeting Purpose
To provide an overview of fine-tuning large language models (LLMs), including techniques, applications, and considerations.
Key Takeaways
- Fine-tuning allows customizing pre-trained LLMs for specific tasks, improving performance while reducing costs compared to full model training
- Key approaches include supervised fine-tuning, reinforcement learning, and instruction tuning, each with different tradeoffs in terms of data requirements, compute costs, and skill needed
- Fine-tuning should be considered after exploring prompt engineering and retrieval-augmented generation (RAG), as it requires more resources but can achieve higher accuracy for specialized tasks
- Both closed-source (e.g. Azure, Google) and open-source frameworks are available, with tradeoffs in flexibility, cost, and control
Topics

Introduction to Fine-Tuning
- Fine-tuning customizes pre-trained LLMs for specific tasks when context window or performance is insufficient
- Allows achieving high performance on specialized tasks at lower cost than full model training
- Complementary to techniques like distillation for creating smaller, task-specific models
Applications of Fine-Tuning
- Creating digital twins/assistants that mimic specific individuals or roles
- Improving chatbots and customer service agents with company-specific knowledge
- Enhancing fraud detection by continuously adapting to new patterns
- Customizing model outputs for brand voice, tone, or domain-specific language
When to Fine-Tune
- Start with prompt engineering and RAG before considering fine-tuning
- Fine-tune when accuracy goals can't be met with simpler techniques
- Consider compute costs and data availability when deciding to fine-tune
- Hybrid approaches combining RAG and fine-tuning are common for production systems
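To make the hybrid pattern above concrete, here is a rough sketch in Python: retrieval supplies fresh, company-specific facts at query time, while the fine-tuned model supplies the desired behavior and tone. The `retrieve` function and the fine-tuned model ID are hypothetical placeholders for whatever retrieval stack and model you actually use.

```python
# Rough sketch of a hybrid RAG + fine-tuning setup (OpenAI Python SDK v1.x).
# The fine-tuned model contributes learned behavior/tone; retrieval
# contributes up-to-date facts the model was never trained on.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def retrieve(query: str) -> list[str]:
    """Hypothetical stand-in for a retrieval layer (vector DB, keyword search, etc.)."""
    return ["<relevant document chunk 1>", "<relevant document chunk 2>"]


def answer(query: str) -> str:
    context = "\n\n".join(retrieve(query))
    response = client.chat.completions.create(
        model="ft:gpt-4o-mini-2024-07-18:my-org::abc123",  # placeholder fine-tuned model ID
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content
```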
Fine-Tuning Platforms and Frameworks
- Closed-source options: Azure OpenAI, Google Vertex AI
  - Easier to use but less flexible; pay-per-token pricing (see the sketch after this list)
- Open-source frameworks: Hugging Face, TRL, Axolotl
  - More control and flexibility, but you manage your own compute
- Free options for experimentation: Google Colab, Kaggle (limited GPU access)
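For the closed-source path, launching a hosted fine-tuning job is mostly an API call. Below is a minimal sketch using the OpenAI Python SDK (v1.x); the file name and base model are placeholders, and supported models and per-token training prices change, so check the platform docs before running.

```python
# Minimal sketch: hosted supervised fine-tuning via the OpenAI Python SDK (v1.x).
from openai import OpenAI

client = OpenAI()

# 1. Upload training data: a JSONL file with one chat-formatted example per line
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Launch the fine-tuning job against a supported base model (placeholder name)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)

# 3. Poll for completion; once done, the job exposes the new model's name
status = client.fine_tuning.jobs.retrieve(job.id)
print(status.status, status.fine_tuned_model)
```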
Fine-Tuning Techniques
- Supervised fine-tuning: The most straightforward approach; trains on labeled input-output examples
- Instruction tuning: Teaches the model specific behaviors by training on instruction-response pairs (conceptually close to prompt engineering)
- Reinforcement learning: Trains the model through reward signals in simulated environments
- Parameter-efficient techniques (LoRA, QLoRA): Freeze the base model's weights and train small adapter matrices, reducing compute costs and mitigating catastrophic forgetting (sketched below)
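To illustrate the parameter-efficient route on the open-source side, here is a minimal sketch combining Hugging Face's trl and peft libraries. The base model name and dataset are placeholders, and trl's API shifts between versions, so treat this as a sketch rather than a drop-in script.

```python
# Minimal sketch: supervised fine-tuning with LoRA adapters (Hugging Face trl + peft).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer

# Placeholder dataset; recent trl versions accept either a plain "text" column
# or chat-style "messages" records and apply the model's chat template.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

# LoRA keeps the base model's weights frozen and trains only small low-rank
# adapter matrices, cutting memory/compute and limiting catastrophic forgetting.
peft_config = LoraConfig(
    r=16,             # rank of the adapter matrices
    lora_alpha=32,    # scaling factor applied to the adapter output
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.2-1B",  # placeholder base model
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
trainer.save_model("my-lora-adapter")  # saves only the small adapter weights
```

QLoRA follows the same pattern but loads the frozen base model in 4-bit precision, which lets larger models fit on a single GPU.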
Practical Considerations
- Data quality is crucial: 50 high-quality examples can outperform 1,000 low-quality ones (see the sample record after this list)
- Continuous fine-tuning may be needed as data/requirements evolve
- Evaluation and benchmarking are critical, especially for safety-critical applications
- Cost varies widely based on approach, data volume, and compute requirements
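As a concrete reference for the data-quality point above, here is what one record in a chat-style training set typically looks like, written out in Python. The schema follows the common OpenAI-style chat format (other platforms differ), and the content is purely illustrative.

```python
# One hypothetical high-quality training example in chat JSONL format.
import json

example = {
    "messages": [
        {"role": "system", "content": "You are a support agent for Acme Corp."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Go to Settings > Security, select 'Reset password', and follow the emailed link."},
    ]
}

# JSONL means exactly one JSON object per line in the training file.
with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```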
Next Steps
- Explore free fine-tuning options on platforms like Google Colab for initial experimentation
- Consider hybrid approaches combining RAG and fine-tuning for production systems
- Evaluate compute costs and accuracy improvements to justify fine-tuning investments
- Stay updated on emerging techniques like on-device fine-tuning and efficient approaches