Session 13: RAG - Technical Foundation

A foundational technical session on setting up RAG

Hey, folks, happy Saturday! In this session, Pavan Mantha from Orbcomm takes us through the technical foundations of Retrieval-Augmented Generation (RAG).

Key Takeaways

  • RAG is crucial for grounding LLMs with domain-specific knowledge not present in their training data
  • Effective RAG implementation requires addressing challenges at each stage: document processing, embedding, retrieval, and generation
  • Advanced techniques like hybrid embeddings, fine-tuned embedding models, and diverse chunking strategies can significantly improve RAG performance
  • Evaluation, observability, and continuous improvement are essential for production RAG systems

Introduction to RAG

  • RAG allows incorporating domain-specific knowledge into LLM responses
  • Basic RAG pipeline: document chunking → embedding → vector storage → retrieval → augmentation → LLM generation (a minimal sketch follows this list)
  • Key components: embedding models (dense/sparse vectors), vector databases, chunking strategies
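
To make the pipeline concrete, here is a minimal end-to-end sketch in Python. It assumes the sentence-transformers package for dense embeddings and uses a plain in-memory NumPy array in place of a real vector database; `call_llm` is a hypothetical placeholder for whichever LLM client you use.

```python
# Minimal RAG pipeline sketch (assumptions: sentence-transformers and numpy
# are installed; call_llm is a hypothetical stand-in for your LLM client).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small dense embedding model

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size character chunking with overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

documents = ["...your domain documents go here..."]
chunks = [c for doc in documents for c in chunk(doc)]
index = model.encode(chunks, normalize_embeddings=True)  # in-memory "vector store"

def retrieve(query: str, top_k: int = 3) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)
    scores = (index @ q.T).ravel()  # cosine similarity (vectors are normalized)
    return [chunks[i] for i in np.argsort(-scores)[:top_k]]

def answer(query: str) -> str:
    context = "\n\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)  # hypothetical: swap in your OpenAI/Ollama/etc. call
```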

RAG Implementation Challenges

  • Document parsing: Handling unstructured data (images, tables, infographics)
  • Chunking: Determining optimal chunk size and overlap
  • Embedding: Selecting appropriate models and dimensions, handling multilingual content
  • Vector databases: Indexing and tuning for retrieval performance
  • Query processing: Addressing ambiguity (e.g., via query rewriting; sketched after this list) and selecting effective search methodologies
  • Augmentation: Crafting clear instructions for LLMs
  • Generation: Ensuring high-quality, relevant responses
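
One common way to tackle query ambiguity is to have the LLM rewrite the user's question into a self-contained search query before retrieval. A small sketch under the same assumptions as the pipeline above (`call_llm` remains a hypothetical LLM client):

```python
# Query-rewriting sketch: ask the LLM to make an ambiguous question
# self-contained before it hits the retriever. call_llm is hypothetical.
REWRITE_PROMPT = """Rewrite the user question so it is self-contained and \
unambiguous for a document search engine. Keep domain terms unchanged.

Question: {question}
Rewritten:"""

def rewrite_query(question: str) -> str:
    return call_llm(REWRITE_PROMPT.format(question=question)).strip()

# e.g. "how do I tune it?" -> "How do I tune the retriever's top-K parameter?"
```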

Advanced RAG Architectures

  • Hybrid embedding methods: Combining dense and sparse vectors for improved retrieval (see the fusion sketch after this list)
  • Fine-tuned embedding models: Domain-specific training for better vector representations
  • LLM ensembling: Using multiple LLMs and a "judge" LLM for more accurate responses
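
A minimal sketch of hybrid retrieval, assuming the rank-bm25 and sentence-transformers packages: sparse (BM25) and dense scores are each min-max normalized, then fused with a weight alpha. The corpus and weighting below are illustrative, not a recommendation.

```python
# Hybrid retrieval sketch: fuse BM25 (sparse) and dense scores with weight alpha.
# Assumptions: rank-bm25, sentence-transformers, and numpy are installed.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

chunks = ["first chunk ...", "second chunk ...", "third chunk ..."]
bm25 = BM25Okapi([c.lower().split() for c in chunks])
model = SentenceTransformer("all-MiniLM-L6-v2")
dense_index = model.encode(chunks, normalize_embeddings=True)

def minmax(x: np.ndarray) -> np.ndarray:
    """Rescale scores to [0, 1] so the weighted sum is meaningful."""
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

def hybrid_search(query: str, alpha: float = 0.5, top_k: int = 3) -> list[str]:
    sparse = np.asarray(bm25.get_scores(query.lower().split()))
    dense = dense_index @ model.encode([query], normalize_embeddings=True).ravel()
    fused = alpha * minmax(dense) + (1 - alpha) * minmax(sparse)  # weighted fusion
    return [chunks[i] for i in np.argsort(-fused)[:top_k]]
```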

Chunking Strategies

  • Token chunking: Basic splitting based on token count
  • Sentence chunking: Splitting based on sentence boundaries and semantics
  • Recursive chunking: Iterative splitting based on rules or document structure (a hand-rolled sketch follows this list)
  • Semantic chunking: Grouping semantically related content across document sections
  • Late chunking: Embedding the full document first, then pooling token embeddings into per-chunk vectors so each chunk retains document-wide context
  • Neural chunking: Using fine-tuned models for intelligent splitting
  • Lumber chunking (LumberChunker): Agent-based approach for adaptive chunking decisions
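
For illustration, here is a hand-rolled recursive chunker: it splits on progressively finer separators (paragraph, line, sentence, word) until every piece fits the size budget. Production splitters (for example, LangChain's RecursiveCharacterTextSplitter) additionally merge adjacent small pieces back up to the budget, which this sketch omits for brevity.

```python
# Hand-rolled recursive chunker: split on progressively finer separators
# (paragraph -> line -> sentence -> word) until every piece fits max_len.
SEPARATORS = ["\n\n", "\n", ". ", " "]

def recursive_chunk(text: str, max_len: int = 500, depth: int = 0) -> list[str]:
    if len(text) <= max_len:
        return [text]
    if depth == len(SEPARATORS):  # no separators left: hard character split
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    chunks: list[str] = []
    for piece in text.split(SEPARATORS[depth]):
        if piece:  # skip empties produced by consecutive separators
            chunks.extend(recursive_chunk(piece, max_len, depth + 1))
    return chunks
```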

RAG vs. Fine-tuning

  • Fine-tuning LLMs: Suitable for stable domain knowledge, less frequent updates
  • RAG: Better for dynamic data, frequent updates, and maintaining up-to-date information

Structured Data in RAG

  • Approaches: Using agents (e.g., the Agno or QAI frameworks) or SQLAlchemy (see the sketch after this list)
  • Process: Provide knowledge base of sample queries/answers, expose schema structure to LLM
  • LLM infers table relationships, generates SQL queries, and executes them in a REPL loop for validation
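
A sketch of the SQLAlchemy route, with a hypothetical `call_llm` and a placeholder SQLite database: the schema is reflected into plain text for the prompt, and the generated SQL is executed to validate it and fetch results.

```python
# Text-to-SQL sketch using SQLAlchemy. The database URL is a placeholder and
# call_llm is a hypothetical LLM client; only the SQLAlchemy calls are real.
from sqlalchemy import create_engine, inspect, text

engine = create_engine("sqlite:///example.db")  # placeholder database

def describe_schema() -> str:
    """Reflect table/column structure into plain text for the prompt."""
    insp = inspect(engine)
    lines = []
    for table in insp.get_table_names():
        cols = ", ".join(f"{c['name']} {c['type']}" for c in insp.get_columns(table))
        lines.append(f"{table}({cols})")
    return "\n".join(lines)

def answer_with_sql(question: str):
    prompt = (
        "Given this schema:\n" + describe_schema()
        + f"\n\nWrite one SQL query that answers: {question}\nSQL:"
    )
    sql = call_llm(prompt)  # hypothetical LLM call
    with engine.connect() as conn:  # execute to validate the query and get rows
        return conn.execute(text(sql)).fetchall()
```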

Observability and Evaluation

  • Observability tools: MLflow, Langfuse, Arize Phoenix, OpenLIT
  • Evaluation frameworks: Ragas, DeepEval
  • Key metrics: Context relevance, answer relevancy, faithfulness, retrieval accuracy (a Ragas sketch follows this list)
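
A small evaluation sketch using Ragas. The column names and `evaluate` interface shown here match recent Ragas releases but have changed across versions, and Ragas needs an LLM judge configured (an OpenAI API key by default), so treat this as a starting point rather than a drop-in recipe.

```python
# Ragas evaluation sketch. Column names and the evaluate() interface match
# recent Ragas releases but have changed across versions; Ragas also needs an
# LLM judge configured (an OpenAI API key by default).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

eval_data = Dataset.from_dict({
    "question": ["What does the leave policy say about carry-over?"],
    "answer": ["Unused leave of up to 10 days carries over to the next year."],
    "contexts": [["Employees may carry over up to 10 unused leave days."]],
    "ground_truth": ["Up to 10 unused days carry over each year."],
})

result = evaluate(eval_data, metrics=[faithfulness, answer_relevancy, context_precision])
print(result)  # per-metric scores between 0 and 1
```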

Continuous Improvement

  • Collect user feedback on response quality
  • Analyze traces of queries, retrievals, and contexts for unsatisfactory responses
  • Tune hyperparameters: top-K, embedding model, chunking strategy, instruction sets, LLM settings (a simple sweep sketch follows this list)
  • Regularly update and refine the RAG pipeline based on performance metrics and user feedback
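
One simple way to structure that tuning is a grid sweep over the retrieval knobs. In this sketch, `build_pipeline` and `evaluate_pipeline` are hypothetical hooks into your own stack and evaluation set.

```python
# Grid sweep over RAG hyperparameters. build_pipeline and evaluate_pipeline
# are hypothetical hooks into your own stack and evaluation set.
import itertools

top_ks = [3, 5, 10]
chunk_sizes = [256, 512, 1024]

best_score, best_config = float("-inf"), None
for top_k, chunk_size in itertools.product(top_ks, chunk_sizes):
    pipeline = build_pipeline(top_k=top_k, chunk_size=chunk_size)  # hypothetical
    score = evaluate_pipeline(pipeline)  # e.g. mean faithfulness on a test set
    if score > best_score:
        best_score, best_config = score, (top_k, chunk_size)

print(f"Best (top_k, chunk_size): {best_config} with score {best_score:.3f}")
```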

Next Steps

  • Share additional resources: Medium articles, GitHub repositories, research papers
  • Schedule follow-up session focused on practical code implementation of RAG concepts
  • Explore advanced RAG architectures: context-enriched retrieval, fusion RAG, HyDE (Hypothetical Document Embeddings)
  • Dive deeper into observability, deployment, and evaluation strategies for production RAG systems

Here's the entire recording of the session.