Module 7 — GenAI & LLMs

RAG engineering — chunking, indexing & reranking

Chunking strategies, data ingestion pipelines, vector databases, hybrid search, reranking, and production retrieval quality.

~85 min read + exercises

Before we begin

Lesson 7 compared fine-tuning vs RAG. This lesson goes deeper on retrieval — where most production RAG quality is won or lost.

Bad retrieval cannot be prompt-engineered away. Fix the index before blaming the LLM.

Figure: Production RAG stack

Query (user) → Embed (vector) → Retrieve (FAISS) → LLM (grounded) → Cite (sources)
Ingest → chunk → embed → index → retrieve → rerank → generate.

What you will learn

  • Design chunking strategies for PDFs, wikis, and code.
  • Build data ingestion pipelines that stay fresh.
  • Use vector databases and hybrid search.
  • Apply reranking to improve top-k quality.


Chunking strategies

Strategy | When to use
Fixed token size (512–1024) | General docs; simple baseline
Structure-aware (headings, paragraphs) | Wikis, MDX, HTML; respects natural boundaries
Overlap (50–100 tokens) | Prevents answers from being split across chunk edges
Parent–child | Small chunks for search, large parent for LLM context
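
The fixed-size and overlap strategies can be sketched as a sliding window over tokens. This is a minimal illustration, not a production splitter: whitespace splitting stands in for a real tokenizer, and the function name chunk_tokens and its parameters are assumptions made for this example.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64):
    """Split a token list into fixed-size windows that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Whitespace splitting is a stand-in for a real tokenizer (an assumption).
words = "one two three four five six seven eight nine ten".split()
chunks = chunk_tokens(words, chunk_size=4, overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
```

Each window starts on the last token of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.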

Metadata per chunk: source, title, url, page, updated_at — required for citations and debugging.
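
One way to combine the parent–child strategy with per-chunk metadata is sketched below. It assumes paragraphs (split on blank lines) act as parents and fixed word windows act as children; the Chunk class, the ID scheme, and the sample metadata values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    chunk_id: str
    text: str
    parent_id: Optional[str]  # None for parent chunks
    metadata: dict = field(default_factory=dict)


def parent_child_chunks(doc_text, metadata, child_words=40):
    """Paragraphs become parents; each parent is cut into small child chunks."""
    parents, children = {}, []
    paragraphs = [p.strip() for p in doc_text.split("\n\n") if p.strip()]
    for p_idx, para in enumerate(paragraphs):
        parent_id = f"{metadata['source']}#p{p_idx}"
        parents[parent_id] = Chunk(parent_id, para, None, dict(metadata))
        words = para.split()
        for start in range(0, len(words), child_words):
            child_id = f"{parent_id}.c{start // child_words}"
            child_text = " ".join(words[start:start + child_words])
            children.append(Chunk(child_id, child_text, parent_id, dict(metadata)))
    return parents, children


def context_for(child, parents):
    """Search matches a child; the LLM receives the child's full parent."""
    return parents[child.parent_id].text


# Hypothetical document and metadata, for illustration only.
doc = "Refunds are accepted within 30 days of purchase.\n\nShipping takes 5 business days."
meta = {"source": "policy.md", "title": "Store policy",
        "url": "https://example.com/policy", "page": 1, "updated_at": "2025-06-25"}
parents, children = parent_child_chunks(doc, meta, child_words=4)
```

Only the children would be embedded and indexed; the parent lookup happens after retrieval, and the copied metadata travels with every chunk so citations can point back to the source.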


Data ingestion

Production ingestion is a pipeline, not a one-time script:

  1. Watch sources (S3 folder, Notion export, git repo).
  2. Parse: PDF text extraction, HTML cleanup, and optionally AST parsing for code.
  3. Chunk and embed, batching requests to control cost.
  4. Upsert into the index, and delete stale IDs when a document is removed.
  5. Version: tag each index build (for example v2025-06-25) so you can roll back.
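
The upsert and stale-delete steps above can be sketched with content hashing, so unchanged documents are not re-embedded. This is a simplified illustration: the index is a plain dict rather than a vector database, and sync_index and toy_embed are hypothetical names.

```python
import hashlib


def content_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync_index(index, sources, embed):
    """Bring `index` in line with `sources`.

    index:   dict doc_id -> {"hash": str, "vector": list, "text": str}
    sources: dict doc_id -> current document text
    embed:   callable text -> vector (stand-in for a real embedding model)
    """
    stats = {"upserted": 0, "skipped": 0, "deleted": 0}
    for doc_id, text in sources.items():
        digest = content_hash(text)
        if doc_id in index and index[doc_id]["hash"] == digest:
            stats["skipped"] += 1  # unchanged: no re-embedding cost
            continue
        index[doc_id] = {"hash": digest, "vector": embed(text), "text": text}
        stats["upserted"] += 1
    for doc_id in list(index):
        if doc_id not in sources:  # document was removed upstream
            del index[doc_id]
            stats["deleted"] += 1
    return stats


def toy_embed(text):
    # Hypothetical placeholder; a real pipeline would call an embedding model.
    return [float(len(text)), float(text.count(" "))]


index = {}
first = sync_index(index, {"a.md": "alpha", "b.md": "beta"}, toy_embed)
second = sync_index(index, {"a.md": "alpha v2", "c.md": "gamma"}, toy_embed)
print(first, second, sorted(index))
```

On the second run, a.md is re-embedded because its hash changed, c.md is added, and b.md is deleted because it no longer exists in the source.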

Failure modes: scanned PDFs with no OCR, tables rendered as garbage, duplicate pages bloating retrieval.
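
Two of these failure modes, empty scanned pages and duplicate pages, can be caught with cheap checks at ingest time. The sketch below is one possible approach; the min_chars threshold and the filter_pages name are assumptions, and garbled tables are not handled here.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_pages(pages, min_chars=20):
    """Return (kept, flagged); `pages` is a list of (page_number, text) tuples."""
    seen, kept, flagged = set(), [], []
    for number, text in pages:
        clean = normalize(text)
        if len(clean) < min_chars:
            flagged.append((number, "too little text, possibly needs OCR"))
            continue
        digest = hashlib.sha256(clean.encode("utf-8")).hexdigest()
        if digest in seen:
            flagged.append((number, "duplicate page"))
            continue
        seen.add(digest)
        kept.append((number, text))
    return kept, flagged


pages = [
    (1, "Warranty terms apply for two years from purchase."),
    (2, ""),  # scanned page with no text layer
    (3, "Warranty  terms apply for two years from PURCHASE."),  # copy of page 1
]
kept, flagged = filter_pages(pages)
```

Flagged pages are reported rather than silently dropped, so someone can decide whether to run OCR or fix the source.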


Indexing & vector databases

Option | Fit
FAISS / local | Prototypes, single-server apps
Pinecone, Weaviate, Qdrant | Managed scale, metadata filters
pgvector | Already on Postgres; good for small teams
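
Whatever the backend, the core operation is nearest-neighbour search over vectors, often restricted by a metadata filter. The brute-force sketch below shows the idea in plain Python; it does not use any of the products listed above, and the record layout and search signature are assumptions for illustration.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query_vec, top_k=3, where=None):
    """Brute-force nearest neighbours with an optional metadata filter.

    records: list of dicts with "id", "vector", "metadata"
    where:   dict of metadata key/value pairs that must all match
    """
    where = where or {}
    candidates = [
        r for r in records
        if all(r["metadata"].get(k) == v for k, v in where.items())
    ]
    scored = [(cosine(r["vector"], query_vec), r["id"]) for r in candidates]
    scored.sort(reverse=True)
    return scored[:top_k]


records = [
    {"id": "faq-1", "vector": [1.0, 0.0], "metadata": {"source": "faq"}},
    {"id": "faq-2", "vector": [0.6, 0.8], "metadata": {"source": "faq"}},
    {"id": "blog-1", "vector": [0.9, 0.1], "metadata": {"source": "blog"}},
]
hits = search(records, [1.0, 0.0], top_k=2, where={"source": "faq"})
```

Real vector databases replace the linear scan with an approximate index, but the inputs and outputs have the same shape: a query vector and optional filter in, scored IDs out.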

Hybrid search combines dense retrieval (embeddings, which match on meaning) with sparse retrieval (BM25, which matches keywords, like classic web search). It is critical for SKU codes, legal citations, and exact product names, where embeddings alone can miss exact strings.
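
The lesson does not prescribe how to merge the dense and sparse result lists. One widely used option is reciprocal rank fusion (RRF), sketched below; it needs only the rank positions, not comparable scores. The document IDs are made up, and k=60 is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists; k dampens the influence of top ranks."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_ranking = ["doc-a", "doc-b", "doc-c"]   # from embedding similarity
sparse_ranking = ["doc-c", "doc-a", "doc-d"]  # from BM25 keyword match
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Documents that appear in both lists (doc-a and doc-c here) rise above documents found by only one retriever.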


Reranking

First-stage retrieval returns the top 20 chunks by embedding similarity. A cross-encoder reranker then scores each (query, chunk) pair jointly, which is more accurate than comparing independently computed embeddings, and only the top 3 to 5 go to the LLM.

Stage | Speed | Quality
Bi-encoder retrieve | Fast | Good recall
Cross-encoder rerank | Slower | Better precision

A common configuration: retrieve top_k=20, rerank, then pass top_n=5 to the prompt.
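
The two-stage flow can be sketched with pluggable scoring functions. The scorers below are toy stand-ins, not real models: word_overlap plays the role of the fast bi-encoder and phrase_score plays the role of the slower cross-encoder. All names here are hypothetical.

```python
def retrieve_then_rerank(query, chunks, first_stage, reranker, top_k=20, top_n=5):
    """Cheap scorer picks top_k candidates; expensive scorer orders the top_n."""
    candidates = sorted(chunks, key=lambda c: first_stage(query, c), reverse=True)[:top_k]
    reranked = sorted(candidates, key=lambda c: reranker(query, c), reverse=True)
    return reranked[:top_n]


def word_overlap(query, chunk):
    # Hypothetical stand-in for bi-encoder similarity: count of shared words.
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def phrase_score(query, chunk):
    # Hypothetical stand-in for a cross-encoder: rewards the exact phrase.
    exact = 2.0 if query.lower() in chunk.lower() else 0.0
    return exact + 0.1 * word_overlap(query, chunk)


chunks = [
    "policy on refund handling for staff",
    "our refund policy allows returns within 30 days",
    "shipping times and carrier list",
]
best = retrieve_then_rerank("refund policy", chunks, word_overlap, phrase_score,
                            top_k=2, top_n=1)
print(best)
```

The first stage cannot separate the two refund chunks because both share the same words with the query; the second stage looks at the query and chunk together and prefers the one containing the exact phrase.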


Evaluation hooks

Before launch, log for each query:

  • Retrieved chunk IDs
  • Rerank scores
  • Whether answer cites correct source

The Module 8 Evals lesson formalizes this; your Module 7 RAG project should include a small held-out Q&A set.
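
A minimal version of these hooks might look like the sketch below: one JSON log line per query, plus a hit rate over a held-out set. The field names, the sample questions, and the canned retriever are hypothetical; a real project would plug in its own retriever and data.

```python
import json


def log_record(query, retrieved_ids, rerank_scores, cited_id, expected_id):
    """One JSON line per query, covering the three items listed above."""
    return json.dumps({
        "query": query,
        "retrieved_ids": retrieved_ids,
        "rerank_scores": rerank_scores,
        "cites_correct_source": cited_id == expected_id,
    })


def hit_rate_at_k(eval_set, retrieve, k=5):
    """Fraction of questions whose expected chunk appears in the top k results."""
    if not eval_set:
        return 0.0
    hits = sum(
        1 for item in eval_set
        if item["expected_id"] in retrieve(item["question"])[:k]
    )
    return hits / len(eval_set)


# Hypothetical held-out set and canned retriever, for illustration only.
eval_set = [
    {"question": "refund window?", "expected_id": "policy#2"},
    {"question": "shipping time?", "expected_id": "shipping#1"},
]
canned = {
    "refund window?": ["policy#2", "faq#9"],
    "shipping time?": ["faq#3", "faq#4"],
}
score = hit_rate_at_k(eval_set, lambda q: canned[q], k=2)
line = log_record("refund window?", ["policy#2", "faq#9"], [0.91, 0.40],
                  "policy#2", "policy#2")
```

Tracking this number across index versions shows whether a chunking or reranking change actually improved retrieval.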


AI safety in RAG

  • Ground answers: instruct the model to answer only from the provided chunks.
  • Refuse when the retrieval score is below a threshold.
  • Show citations: users can verify the answer, which reduces blind trust.
  • PII scan on ingest: do not index secrets.
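
The guardrails above could be wired together roughly as follows. This is a sketch under stated assumptions: the 0.5 score threshold, the prompt wording, and the two regex patterns are illustrative only, and real PII or secret detection needs a dedicated tool.

```python
import re

REFUSAL = "I could not find this in the indexed sources."

# Illustrative patterns only; not a complete PII or secret detector.
SECRET_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-like number
    re.compile(r"(?i)\b(api[_-]?key|password)\s*[:=]\s*\S+"),
]


def safe_to_index(text):
    """Ingest-time check: reject chunks matching any secret pattern."""
    return not any(p.search(text) for p in SECRET_PATTERNS)


def build_prompt(question, hits, min_score=0.5):
    """hits: list of (score, source, text). Returns None if nothing clears the bar."""
    usable = [h for h in hits if h[0] >= min_score]
    if not usable:
        return None  # caller should reply with REFUSAL instead of calling the LLM
    context = "\n".join(
        f"[{i}] ({src}) {text}" for i, (_, src, text) in enumerate(usable, 1)
    )
    return (
        "Answer only from the numbered sources below and cite them like [1]. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


prompt = build_prompt(
    "What is the refund window?",
    [(0.82, "policy.md", "Refunds are accepted within 30 days."),
     (0.31, "blog.md", "Our founders love hiking.")],
)
answer = prompt if prompt is not None else REFUSAL
```

Low-scoring chunks are dropped before the prompt is built, and the numbered source labels give the model something concrete to cite.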

What's next

Lesson 9 — Hallucinations, trust & AI safety