Welcome to session 27! This week, we go through how LLMs can be fine-tuned.
The session is conducted by Sandhya Nallakkiniyan. Sandhya is a Product Manager for the AI/ML platform at PayPal.
If you've missed the session or if you'd like to go through it again, here's the session video - https://youtu.be/smjGkbtd2K4
Here are the resources discussed and shared during the session.
Resources discussed in the video:
- Deck: Link
- OpenAI Platform: Link
- Gemini Fine-tuning: Link
- Open-source fine-tuning service: Link
Here are the notes from the meeting:
Meeting Purpose
To provide an overview of fine-tuning large language models (LLMs), including techniques, applications, and considerations.
Key Takeaways
- Fine-tuning allows customizing pre-trained LLMs for specific tasks, improving performance while reducing costs compared to full model training
- Key approaches include supervised fine-tuning, reinforcement learning, and instruction tuning, each with different tradeoffs in terms of data requirements, compute costs, and skill needed
- Fine-tuning should be considered after exploring prompt engineering and retrieval-augmented generation (RAG), as it requires more resources but can achieve higher accuracy for specialized tasks
- Both closed-source (e.g. Azure, Google) and open-source frameworks are available, with tradeoffs in flexibility, cost, and control
Topics

Introduction to Fine-Tuning
- Fine-tuning customizes pre-trained LLMs for specific tasks when context window or performance is insufficient
- Allows achieving high performance on specialized tasks at lower cost than full model training
- Complementary to techniques like distillation for creating smaller, task-specific models
Applications of Fine-Tuning
- Creating digital twins/assistants that mimic specific individuals or roles
- Improving chatbots and customer service agents with company-specific knowledge
- Enhancing fraud detection by continuously adapting to new patterns
- Customizing model outputs for brand voice, tone, or domain-specific language
When to Fine-Tune
- Start with prompt engineering and RAG before considering fine-tuning
- Fine-tune when accuracy goals can't be met with simpler techniques
- Consider compute costs and data availability when deciding to fine-tune
- Hybrid approaches combining RAG and fine-tuning are common for production systems
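To make the hybrid pattern above concrete, here is a rough sketch in Python: retrieval supplies fresh, company-specific facts at query time, while the fine-tuned model supplies the desired behavior and tone. The `retrieve` function and the fine-tuned model ID are hypothetical placeholders for whatever retrieval stack and model you actually use.

```python
# Rough sketch of a hybrid RAG + fine-tuning setup (OpenAI Python SDK v1.x).
# The fine-tuned model contributes learned behavior/tone; retrieval
# contributes up-to-date facts the model was never trained on.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def retrieve(query: str) -> list[str]:
    """Hypothetical stand-in for a retrieval layer (vector DB, keyword search, etc.)."""
    return ["<relevant document chunk 1>", "<relevant document chunk 2>"]


def answer(query: str) -> str:
    context = "\n\n".join(retrieve(query))
    response = client.chat.completions.create(
        model="ft:gpt-4o-mini-2024-07-18:my-org::abc123",  # placeholder fine-tuned model ID
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content
```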
Fine-Tuning Platforms and Frameworks
- Closed-source options: Azure OpenAI, Google Vertex AI
  - Easier to use but less flexible; pay-per-token pricing (see the sketch after this list)
- Open-source frameworks: Hugging Face, TRL, Axolotl
  - More control and flexibility, but you manage your own compute
- Free options for experimentation: Google Colab, Kaggle (limited GPU access)
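For the closed-source path, launching a hosted fine-tuning job is mostly an API call. Below is a minimal sketch using the OpenAI Python SDK (v1.x); the file name and base model are placeholders, and supported models and per-token training prices change, so check the platform docs before running.

```python
# Minimal sketch: hosted supervised fine-tuning via the OpenAI Python SDK (v1.x).
from openai import OpenAI

client = OpenAI()

# 1. Upload training data: a JSONL file with one chat-formatted example per line
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Launch the fine-tuning job against a supported base model (placeholder name)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)

# 3. Poll for completion; once done, the job exposes the new model's name
status = client.fine_tuning.jobs.retrieve(job.id)
print(status.status, status.fine_tuned_model)
```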
Fine-Tuning Techniques
- Supervised fine-tuning: The most straightforward approach; trains on labeled input-output examples
- Instruction tuning: Teaches the model specific behaviors by training on instruction-response pairs (conceptually close to prompt engineering)
- Reinforcement learning: Trains the model through reward signals in simulated environments
- Parameter-efficient techniques (LoRA, QLoRA): Freeze the base model's weights and train small adapter matrices, reducing compute costs and mitigating catastrophic forgetting (sketched below)
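To illustrate the parameter-efficient route on the open-source side, here is a minimal sketch combining Hugging Face's trl and peft libraries. The base model name and dataset are placeholders, and trl's API shifts between versions, so treat this as a sketch rather than a drop-in script.

```python
# Minimal sketch: supervised fine-tuning with LoRA adapters (Hugging Face trl + peft).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer

# Placeholder dataset; recent trl versions accept either a plain "text" column
# or chat-style "messages" records and apply the model's chat template.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

# LoRA keeps the base model's weights frozen and trains only small low-rank
# adapter matrices, cutting memory/compute and limiting catastrophic forgetting.
peft_config = LoraConfig(
    r=16,             # rank of the adapter matrices
    lora_alpha=32,    # scaling factor applied to the adapter output
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.2-1B",  # placeholder base model
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
trainer.save_model("my-lora-adapter")  # saves only the small adapter weights
```

QLoRA follows the same pattern but loads the frozen base model in 4-bit precision, which lets larger models fit on a single GPU.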
Practical Considerations
- Data quality is crucial: 50 high-quality examples can outperform 1,000 low-quality ones (see the sample record after this list)
- Continuous fine-tuning may be needed as data/requirements evolve
- Evaluation and benchmarking are critical, especially for safety-critical applications
- Cost varies widely based on approach, data volume, and compute requirements
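As a concrete reference for the data-quality point above, here is what one record in a chat-style training set typically looks like, written out in Python. The schema follows the common OpenAI-style chat format (other platforms differ), and the content is purely illustrative.

```python
# One hypothetical high-quality training example in chat JSONL format.
import json

example = {
    "messages": [
        {"role": "system", "content": "You are a support agent for Acme Corp."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Go to Settings > Security, select 'Reset password', and follow the emailed link."},
    ]
}

# JSONL means exactly one JSON object per line in the training file.
with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```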
Next Steps
- Explore free fine-tuning options on platforms like Google Colab for initial experimentation
- Consider hybrid approaches combining RAG and fine-tuning for production systems
- Evaluate compute costs and accuracy improvements to justify fine-tuning investments
- Stay updated on emerging techniques like on-device fine-tuning and efficient approaches