Fine-tuning vs RAG

Before we begin

Your app needs company-specific knowledge. Two main paths:

Fine-tuning — change model weights on new data.
RAG (Retrieval-Augmented Generation) — fetch relevant docs at query time and paste into the prompt.

Figure

Two paths to custom knowledge

Fine-tuning bakes data in; RAG retrieves fresh context per question.

Figure

RAG pipeline

Embed query → search vector index → LLM answers with excerpts.

What you will learn

Compare fine-tuning and RAG trade-offs.
Outline chunking, embedding, and vector search.
Know when citations require RAG.

Before this lesson

Fine-tuning

Process: continue training (full or LoRA) on curated examples.

Good for:

Style / tone / format consistently
Specialized vocabulary in fixed domain
Teaching new behaviors (tool formats, JSON schemas)

Costs:

GPU time, MLOps, retrain when data changes
Risk of catastrophic forgetting if done poorly
Hard to cite which document supported an answer

RAG

Process:

Chunk documents (500–1000 tokens with overlap).
Embed chunks → store in vector DB (FAISS local, Pinecone hosted, etc.).
At query: embed question → retrieve top-k similar chunks.
Prompt LLM with chunks + user question + citation rules.

Good for:

PDFs, wikis, blogs that update often
Citations and audit trails
Smaller teams without fine-tune infra

Costs:

Retrieval quality matters — bad chunks → bad answers
Larger prompts → more tokens / latency

Difference (exam style)

	Fine-tuning	RAG
Updates weights	Yes	No (uses base model)
Fresh docs tomorrow	Retrain	Re-index
Citations	Hard	Natural
Teaches new skill	Strong	Weaker

Many products use both: RAG for facts + light fine-tune for tone.

Chunking tips

Split on headings / paragraphs, not mid-sentence.
Overlap 50–100 tokens so context isn’t lost at boundaries.
Store metadata: source, title, url, page.

What is embedding in LLM context?

For RAG: a sentence embedding model maps query and chunks to vectors; cosine similarity finds nearest neighbors — same intuition as Module 1 dot products, higher dimension.

What's next

Lesson 8 — RAG engineering