RAG engineering — chunking, indexing & reranking
Before we begin
Lesson 7 compared fine-tuning vs RAG. This lesson goes deeper on retrieval — where most production RAG quality is won or lost.
Bad retrieval cannot be prompt-engineered away. Fix the index before blaming the LLM.
Figure: Production RAG stack
What you will learn
- Design chunking strategies for PDFs, wikis, and code.
- Build data ingestion pipelines that stay fresh.
- Use vector databases and hybrid search.
- Apply reranking to improve top-k quality.
Chunking strategies
| Strategy | When |
|---|---|
| Fixed token size (512–1024) | General docs; simple baseline |
| Structure-aware (headings, paragraphs) | Wikis, MDX, HTML — respect boundaries |
| Overlap (50–100 tokens) | Prevent answers split across chunk edges |
| Parent–child | Small chunks for search, large parent for LLM context |
Metadata per chunk: source, title, url, page, updated_at — required for citations and debugging.
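As a rough illustration of the fixed-size and overlap rows above, the sketch below splits a document into overlapping windows and attaches metadata to each chunk. It assumes a whitespace split as a stand-in for a real tokenizer, and the names `Chunk`, `chunk_tokens`, and `chunk_document` are hypothetical, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    metadata: dict


def chunk_tokens(tokens, size=512, overlap=64):
    """Split a token list into windows of `size` tokens.

    Each window shares `overlap` tokens with the previous one.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return windows


def chunk_document(text, metadata, size=512, overlap=64):
    # Assumption: whitespace splitting approximates tokenization.
    tokens = text.split()
    return [
        Chunk(text=" ".join(window), metadata={**metadata, "chunk_index": i})
        for i, window in enumerate(chunk_tokens(tokens, size, overlap))
    ]
```

Calling `chunk_document(page_text, {"source": "handbook.pdf", "page": 3})` would yield chunks that each carry the source and page, which is what makes citations and debugging possible later.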
Data ingestion
Production ingestion is a pipeline, not a one-time script:
- Watch sources (S3 folder, Notion export, git repo).
- Parse — PDF text extraction, HTML cleanup, code AST optional.
- Chunk + embed — batch for cost.
- Upsert index — delete stale IDs when doc removed.
- Version — tag index v2025-06-25 for rollback.
Failure modes: scanned PDFs with no OCR, tables rendered as garbage, duplicate pages bloating retrieval.
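The upsert and stale-ID steps can be sketched with a plain dictionary standing in for the vector index. This is a minimal sketch under assumptions: `sync_index`, `chunk_id`, and the `chunker` and `embed` callables are hypothetical, and a real vector database would expose its own upsert and delete calls. Chunk IDs are derived from a content hash, so unchanged chunks are not re-embedded and verbatim duplicates within one document collapse to a single entry.

```python
import hashlib


def chunk_id(doc_id, text):
    # Content-addressed ID: same document + same text gives the same ID.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return f"{doc_id}:{digest}"


def sync_index(index, sources, chunker, embed):
    """Bring `index` (chunk_id -> record) in line with `sources` (doc_id -> text).

    Returns (upserted_ids, deleted_ids).
    """
    wanted = {}
    for doc_id, text in sources.items():
        for piece in chunker(text):
            wanted[chunk_id(doc_id, piece)] = (doc_id, piece)

    # Embed only chunks that are new or whose text changed, in one batch.
    new_ids = [cid for cid in wanted if cid not in index]
    vectors = embed([wanted[cid][1] for cid in new_ids]) if new_ids else []
    for cid, vector in zip(new_ids, vectors):
        doc_id, piece = wanted[cid]
        index[cid] = {"doc_id": doc_id, "text": piece, "vector": vector}

    # Anything in the index that no source produces any more is stale.
    stale = [cid for cid in index if cid not in wanted]
    for cid in stale:
        del index[cid]
    return new_ids, stale
```

Running `sync_index` on a schedule, or whenever a watched source changes, keeps removed documents from lingering in retrieval results.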
Indexing & vector databases
| Option | Fit |
|---|---|
| FAISS / local | Prototypes, single-server apps |
| Pinecone, Weaviate, Qdrant | Managed scale, metadata filters |
| pgvector | Already on Postgres; good for small teams |
Hybrid search: combine dense (embedding — meaning-based) + sparse (BM25 — keyword matching, like classic web search) — critical for SKU codes, legal citations, exact product names.
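One common way to combine the dense and sparse result lists is reciprocal rank fusion (RRF), which needs only ranks and so avoids comparing embedding similarities with BM25 scores directly. The lesson does not prescribe a fusion method, so treat this as one option; the function name and the default `k=60` are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each list contributes 1 / (k + rank) per chunk, with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused if top_n is None else fused[:top_n]


dense_hits = ["d1", "d2", "d3"]   # hypothetical embedding-search results
sparse_hits = ["d3", "d4", "d1"]  # hypothetical BM25 results
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Chunks that appear in both lists (`d1` and `d3` here) rise above chunks found by only one retriever, while an exact-match hit that only BM25 finds still stays in the candidate set.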
Reranking
First-stage retrieval returns the top 20 chunks by embedding similarity. A cross-encoder reranker then scores each (query, chunk) pair jointly, which is more accurate but slower, and only the best few (typically 3 to 5) are kept for the LLM.
| Stage | Speed | Quality |
|---|---|---|
| Bi-encoder retrieve | Fast | Good recall |
| Cross-encoder rerank | Slower | Better precision |
A common configuration: retrieve with top_k=20, rerank, then pass top_n=5 to the prompt.
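The two-stage flow can be sketched as below. The vectors are toy values, and `overlap_scorer` is a deliberately crude stand-in for a cross-encoder model; in practice that function would be replaced by a call to a real reranker. All function names here are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, top_k=20):
    """First stage. `index` is a list of (chunk_id, vector, text) tuples."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return ranked[:top_k]


def rerank(query, candidates, score_pair, top_n=5):
    """Second stage. Scores each (query, chunk text) pair and keeps the best."""
    scored = [(score_pair(query, text), cid, text) for cid, _, text in candidates]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:top_n]


def overlap_scorer(query, text):
    # Assumption: word overlap stands in for a cross-encoder's relevance score.
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


index = [
    ("a", [1.0, 0.0], "reset password steps"),
    ("b", [0.9, 0.1], "how to reset your password"),
    ("c", [0.0, 1.0], "billing and invoices"),
]
candidates = retrieve([1.0, 0.0], index, top_k=2)
best = rerank("how to reset password", candidates, overlap_scorer, top_n=1)
```

Here the first stage ranks chunk `a` highest, but the second stage promotes chunk `b` because it matches the query more fully, which is the precision gain the table describes.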
Evaluation hooks
Before launch, log for each query:
- Retrieved chunk IDs
- Rerank scores
- Whether answer cites correct source
The Module 8 Evals lesson formalizes this; your Module 7 RAG project should include a small held-out Q&A set.
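A minimal logging harness for a held-out Q&A set might look like the sketch below. It assumes each Q&A item records the chunk ID that should support the answer, and that a hypothetical `run_query` callable returns the retrieved IDs, rerank scores, and the ID the answer cited; none of these names come from the lesson.

```python
def evaluate(qa_set, run_query):
    """Run every held-out question and log the three signals per query.

    qa_set: list of {"question": str, "expected_chunk_id": str}
    run_query: callable returning {"retrieved_ids", "rerank_scores", "cited_id"}
    """
    records = []
    for item in qa_set:
        result = run_query(item["question"])
        records.append({
            "query": item["question"],
            "retrieved_ids": result["retrieved_ids"],
            "rerank_scores": result["rerank_scores"],
            "retrieval_hit": item["expected_chunk_id"] in result["retrieved_ids"],
            "cites_correct_source": result["cited_id"] == item["expected_chunk_id"],
        })
    n = len(records) or 1  # avoid dividing by zero on an empty set
    summary = {
        "retrieval_hit_rate": sum(r["retrieval_hit"] for r in records) / n,
        "citation_accuracy": sum(r["cites_correct_source"] for r in records) / n,
    }
    return records, summary
```

Separating `retrieval_hit` from `cites_correct_source` helps tell a retrieval failure (the right chunk was never fetched) from a generation failure (it was fetched but not used).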
AI safety in RAG
- Ground answers — instruct model to answer only from provided chunks.
- Refuse when retrieval score is below threshold.
- Show citations — user verifies; reduces blind trust.
- PII scan on ingest — do not index secrets.
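The safeguards above can be sketched in a few lines. This is an illustration under assumptions: the regular expressions cover only a handful of obvious shapes and are no substitute for a dedicated PII or secret scanner, the `min_score=0.5` threshold is arbitrary and would need tuning on your own data, and the names `contains_pii` and `build_prompt` are hypothetical.

```python
import re

# Assumption: a few illustrative patterns, far from exhaustive.
SECRET_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # email addresses
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # US SSN-shaped numbers
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),      # AWS access key ID shape
]


def contains_pii(text):
    """Ingest-time check: skip or redact chunks for which this returns True."""
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)


def build_prompt(question, chunks, min_score=0.5):
    """Build a grounded prompt with numbered, citable sources.

    chunks: list of (score, source, text). Returns None when no chunk clears
    the threshold, so the caller can refuse instead of calling the LLM.
    """
    kept = [chunk for chunk in chunks if chunk[0] >= min_score]
    if not kept:
        return None
    context = "\n".join(
        f"[{i}] ({source}) {text}"
        for i, (_, source, text) in enumerate(kept, start=1)
    )
    return (
        "Answer using only the numbered sources below. "
        "Cite sources like [1]. If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

When `build_prompt` returns `None`, the application can reply with a fixed refusal message rather than letting the model answer from its own memory.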