Project: RAG chatbot with citations
Before we begin
Build a career-relevant GenAI app: users ask questions; the system retrieves from your blog MDX and optional PDFs, then answers with cited sources.
[Figure: What you are building]
How this connects to Module 7
| Lesson | Where you use it |
|---|---|
| Tokenization | Chunk size in tokens/words affects retrieval quality |
| Embeddings | Same model at index time and query time |
| Vector search | FAISS finds nearest chunks by cosine similarity |
| Prompting | Grounding prompt forces cite-only answers |
| Hallucination | "If unknown, say you don't know" + eval table |
Folder layout:
rag-chatbot/
  ingest.py            # load MDX/PDF → chunks
  build_index.py       # embed + FAISS
  query.py             # retrieve + prompt builder
  data/
    chunks.json
    index.faiss
app/
  api/rag-chat/route.ts
  rag-lab/page.tsx     # chat UI
eval/
  questions.json       # 10 hand-written Q&A checks
What you will build
- Ingest content/blog/*.mdx (+ optional PDFs).
- Chunk, embed, index with FAISS (local) or Pinecone (hosted).
- Next.js chat page — message list + input.
- API route — retrieve top-k chunks → call LLM with grounding prompt.
- Citations — show title/URL for each excerpt used.
- Mini eval — 10 hand-written questions with expected doc references.
Estimated time: 5–8 hours.
Before you start
- Finish Module 7 quiz.
- API key for an embedding + chat provider or Ollama locally.
- pip install faiss-cpu openai pypdf tiktoken (adjust packages to your provider)
Create folder rag-chatbot/ in your workspace.
Step 1 — Ingest blog MDX
Goal: Turn each post into a document record with stable id, title, url, and full text.
# ingest.py
from pathlib import Path
import re
import json

def strip_mdx(raw: str) -> str:
    """Reduce an MDX post to plain text for embedding."""
    body = re.sub(r"^---.*?---\s*", "", raw, flags=re.S)      # leading YAML frontmatter
    body = re.sub(r"!\[[^\]]*\]\([^)]+\)", " ", body)         # images: drop entirely
    body = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", body)      # links: keep the label, drop the URL
    body = re.sub(r"^#+\s*", "", body, flags=re.M)            # heading markers at line start only
    return re.sub(r"\s+", " ", body).strip()

def load_mdx_docs(root="../content/blog"):
    docs = []
    for path in sorted(Path(root).glob("*.mdx")):  # sorted for a deterministic order
        raw = path.read_text(encoding="utf-8")
        body = strip_mdx(raw)
        # Title: first H1 in the file, else the file name
        title_match = re.search(r"^#\s+(.+)$", raw, re.M)
        title = title_match.group(1).strip() if title_match else path.stem
        docs.append({
            "id": path.stem,
            "title": title,
            "url": f"/blog/{path.stem}",
            "text": body,
        })
    return docs

# Optional PDF:
# from pypdf import PdfReader
# for page in PdfReader(path).pages: text += page.extract_text() or ""
Why strip links/images? URLs and markdown syntax add noise; retrieval should match semantic content. Link labels are kept because they are part of the sentence.
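load_mdx_docs takes the title from the first `# ` heading and falls back to the file name. If your posts keep the title in YAML frontmatter instead (an assumption about your blog, not something this lesson specifies), a helper along these lines could be tried first. The name `frontmatter_title` and the single-line `title:` key are illustrative.

```python
import re
from typing import Optional


def frontmatter_title(raw: str) -> Optional[str]:
    """Return the `title:` value from a leading frontmatter block, if present.

    Hypothetical helper: assumes the block is delimited by `---` lines at the
    very top of the file and that the title fits on one line.
    """
    block = re.match(r"^---\s*\n(.*?)\n---", raw, flags=re.S)
    if not block:
        return None
    line = re.search(r"^title:\s*(.+)$", block.group(1), flags=re.M)
    if not line:
        return None
    # Drop surrounding quotes, if the YAML value was quoted
    return line.group(1).strip().strip("\"'")


sample = '---\ntitle: "On-device AI vs cloud AI"\nslug: demo\n---\n\nBody text.'
print(frontmatter_title(sample))  # On-device AI vs cloud AI
```

In load_mdx_docs this would slot in ahead of the heading search: use the frontmatter title when it exists, otherwise keep the current behavior.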
Step 2 — Chunk with overlap
Goal: Split long posts into ~800-word windows with overlap so sentences at boundaries aren't lost.
def chunk_doc(doc, size=800, overlap=120):
    words = doc["text"].split()
    step = max(1, size - overlap)  # each window starts `step` words after the previous one
    chunks = []
    i = 0
    while i < len(words):
        piece = " ".join(words[i : i + size])
        chunks.append({
            "source_id": doc["id"],
            "title": doc["title"],
            "url": doc["url"],
            "chunk_index": len(chunks),
            "text": piece,
        })
        if i + size >= len(words):
            break  # this window reached the end; another would only repeat the overlap
        i += step
    return chunks

def build_all_chunks(docs):
    out = []
    for d in docs:
        out.extend(chunk_doc(d))
    return out
| Parameter | Tradeoff |
|---|---|
| size=800 | Larger → more context per hit; smaller → more precise retrieval |
| overlap=120 | Prevents cutting facts across chunk borders |
Save: json.dump(chunks, open("data/chunks.json","w"), indent=2)
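To see what size and overlap do in practice, it can help to print the word offsets of each window. The sketch below is only the window arithmetic, not the chunker itself; `window_spans` is an illustrative name. It stops as soon as a window reaches the end of the document, so the last window is never just a repeat of the previous overlap.

```python
def window_spans(n_words: int, size: int = 800, overlap: int = 120) -> list[tuple[int, int]]:
    """Return (start, end) word offsets for overlapping windows over n_words."""
    step = max(1, size - overlap)  # 800 - 120 = 680 with the defaults
    spans = []
    start = 0
    while start < n_words:
        end = min(start + size, n_words)
        spans.append((start, end))
        if end == n_words:
            break  # reached the end of the document
        start += step
    return spans


for start, end in window_spans(2000):
    print(start, end)
# 0 800
# 680 1480
# 1360 2000
```

A 2,000-word post therefore yields three chunks, and each neighboring pair shares 120 words.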
Step 3 — Embed and index (FAISS)
Goal: Convert each chunk to a vector; build an index for fast similarity search.
# build_index.py
import faiss
import numpy as np
import json
from openai import OpenAI  # or Ollama / sentence-transformers

client = OpenAI()
EMBED_MODEL = "text-embedding-3-small"

def embed_texts(texts: list[str], batch_size: int = 64) -> np.ndarray:
    vecs = []
    # Send chunks in batches: embedding APIs limit how much one request may carry.
    for start in range(0, len(texts), batch_size):
        batch = texts[start : start + batch_size]
        resp = client.embeddings.create(model=EMBED_MODEL, input=batch)
        vecs.extend(d.embedding for d in resp.data)
    return np.array(vecs, dtype="float32")

with open("data/chunks.json", encoding="utf-8") as f:
    chunks = json.load(f)
vectors = embed_texts([c["text"] for c in chunks])
# Cosine similarity = dot product after L2 normalize
faiss.normalize_L2(vectors)
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(vectors)
faiss.write_index(index, "data/index.faiss")
Critical: Use the same embedding model for indexing and queries. Changing models requires rebuilding the index.
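One way to make the same-model rule hard to break is to record the model name next to the index and check it before searching. This is a sketch under assumptions: the file name `index_meta.json` and both helper names are hypothetical, and the numbers in the demo are placeholders.

```python
import json
import tempfile
from pathlib import Path

EMBED_MODEL = "text-embedding-3-small"


def write_index_meta(data_dir: Path, model: str, dim: int, n_chunks: int) -> None:
    """Store what the index was built with (hypothetical sidecar file)."""
    meta = {"embed_model": model, "dim": dim, "n_chunks": n_chunks}
    (data_dir / "index_meta.json").write_text(json.dumps(meta, indent=2), encoding="utf-8")


def check_index_meta(data_dir: Path, model: str) -> dict:
    """Raise if the query-time model differs from the build-time model."""
    meta = json.loads((data_dir / "index_meta.json").read_text(encoding="utf-8"))
    if meta["embed_model"] != model:
        raise RuntimeError(
            f"Index was built with {meta['embed_model']!r} but queries use {model!r}; "
            "rebuild the index."
        )
    return meta


# Demo in a temporary directory so nothing in data/ is touched
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    write_index_meta(data_dir, EMBED_MODEL, dim=1536, n_chunks=42)
    print(check_index_meta(data_dir, EMBED_MODEL)["embed_model"])
    try:
        check_index_meta(data_dir, "some-other-model")
    except RuntimeError as err:
        print("mismatch detected:", err)
```

In the project, the write would go at the end of build_index.py and the check at the top of query.py.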
Step 4 — Retrieve at query time
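# query.py
import faiss
import numpy as np
import json
from pathlib import Path
from openai import OpenAI

# Must match build_index.py: same provider and the same embedding model.
client = OpenAI()
EMBED_MODEL = "text-embedding-3-small"
DATA = Path(__file__).parent / "data"  # resolves from any working directory

index = faiss.read_index(str(DATA / "index.faiss"))
chunks = json.loads((DATA / "chunks.json").read_text(encoding="utf-8"))

def embed_texts(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    return np.array([d.embedding for d in resp.data], dtype="float32")

def retrieve(query: str, k=4):
    q = embed_texts([query])
    faiss.normalize_L2(q)
    scores, ids = index.search(q, k)
    hits = []
    for rank, idx in enumerate(ids[0]):
        if idx < 0:  # FAISS pads with -1 when fewer than k results exist
            continue
        hit = {**chunks[idx], "score": float(scores[0][rank])}
        hits.append(hit)
    return hits
Log score during eval: if all scores are low (< 0.3), retrieval failed; don't let the LLM guess.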
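The low-score rule can be enforced in code rather than left to the prompt. Here is a minimal sketch: `gate_hits` and `answer_or_fallback` are illustrative names, `generate` stands in for whatever function calls the LLM, and 0.3 is the threshold mentioned above (a sensible cutoff may differ by embedding model, so treat it as a starting point).

```python
FALLBACK = "I don't have that in the indexed docs."


def gate_hits(hits: list[dict], min_score: float = 0.3) -> list[dict]:
    """Keep only hits whose similarity score clears min_score."""
    return [h for h in hits if h["score"] >= min_score]


def answer_or_fallback(hits: list[dict], generate, min_score: float = 0.3) -> dict:
    """Skip the LLM call entirely when retrieval found nothing convincing."""
    strong = gate_hits(hits, min_score)
    if not strong:
        return {"answer": FALLBACK, "citations": []}
    return generate(strong)


def fake_generate(hits: list[dict]) -> dict:
    # Stand-in for the real LLM call, used only for this demo
    return {"answer": f"answered from {len(hits)} excerpt(s)", "citations": hits}


weak = [{"title": "A", "url": "/blog/a", "score": 0.12}]
mixed = weak + [{"title": "B", "url": "/blog/b", "score": 0.61}]

print(answer_or_fallback(weak, fake_generate)["answer"])
print(answer_or_fallback(mixed, fake_generate)["answer"])
```

With the sample data, the first call returns the fallback sentence and the second answers from the single hit that cleared the threshold.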
Step 5 — Grounded prompt
def build_prompt(question: str, hits: list[dict]) -> str:
    context = ""
    for i, h in enumerate(hits, 1):
        # [:1200] caps each excerpt at 1,200 characters, so only the start of an
        # 800-word chunk reaches the model; raise the cap if answers miss details.
        context += f"[{i}] Title: {h['title']}\nURL: {h['url']}\n{h['text'][:1200]}\n\n"
    return f"""You are a helpful assistant. Use ONLY the excerpts below.
Cite sources inline like [1] or [2].
If the answer is not in the excerpts, say "I don't have that in the indexed docs."
{context}
Question: {question}
Answer:"""

def answer(question: str):
    hits = retrieve(question, k=4)
    prompt = build_prompt(question, hits)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
    )
    text = resp.choices[0].message.content
    citations = [{"title": h["title"], "url": h["url"], "excerpt": h["text"][:200]} for h in hits]
    return {"answer": text, "citations": citations}
Temperature 0–0.3: lower randomness for factual Q&A.
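answer() returns every retrieved hit as a citation, whether or not the model referred to it. If you would rather show only the sources that were actually cited, one option is to read the [n] markers back out of the answer text. This sketch assumes the model follows the inline [1], [2] format from the prompt; `cited_hits` is an illustrative name.

```python
import re


def cited_hits(answer_text: str, hits: list[dict]) -> list[dict]:
    """Return the hits whose [n] marker appears in the answer, in order of first mention."""
    seen: list[int] = []
    for match in re.finditer(r"\[(\d+)\]", answer_text):
        n = int(match.group(1))
        # Ignore markers that do not correspond to a retrieved excerpt
        if 1 <= n <= len(hits) and n not in seen:
            seen.append(n)
    return [{"n": n, "title": hits[n - 1]["title"], "url": hits[n - 1]["url"]} for n in seen]


hits = [
    {"title": "Post A", "url": "/blog/post-a", "text": "..."},
    {"title": "Post B", "url": "/blog/post-b", "text": "..."},
    {"title": "Post C", "url": "/blog/post-c", "text": "..."},
]
text = "Local models avoid a network call [2]. Hosted models scale further [1][2]. See also [7]."

for c in cited_hits(text, hits):
    print(c["n"], c["url"])
# 2 /blog/post-b
# 1 /blog/post-a
```

Markers outside the range of retrieved excerpts, such as [7] here, are dropped; during eval they are worth logging, since they suggest the model invented a source.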
Step 6 — Next.js API route
// app/api/rag-chat/route.ts
import { NextResponse } from "next/server";
import { spawn } from "child_process";

export async function POST(req: Request) {
  const { message } = await req.json();
  if (typeof message !== "string" || !message.trim()) {
    return NextResponse.json({ error: "message required" }, { status: 400 });
  }
  // Option A: call Python script that prints JSON to stdout
  try {
    const result = await runPythonQuery(message);
    return NextResponse.json(result);
  } catch {
    return NextResponse.json({ error: "query failed" }, { status: 500 });
  }
}

function runPythonQuery(message: string): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const proc = spawn("python", ["rag-chatbot/query.py", message]);
    let out = "";
    proc.stdout.on("data", (d) => (out += d));
    proc.on("error", reject); // e.g. python is not on PATH
    proc.on("close", (code) => {
      if (code !== 0) return reject(new Error("query failed"));
      try {
        resolve(JSON.parse(out));
      } catch (err) {
        reject(err); // stdout was not valid JSON
      }
    });
  });
}
Option B: Port embed + FAISS to TypeScript with @xenova/transformers for small offline demos (slower, but no Python subprocess).
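For Option A, the route runs `python rag-chatbot/query.py <message>` and parses stdout as JSON, so query.py needs a command-line entry point that prints exactly one JSON object. A minimal sketch follows; the `answer` function here is a placeholder standing in for the Step 5 function so the example runs on its own, and `main` is an illustrative name.

```python
import json
import sys


def answer(question: str) -> dict:
    """Placeholder for the Step 5 answer(); swap in the real function."""
    return {"answer": f"(stub) you asked: {question}", "citations": []}


def main(argv: list[str]) -> int:
    """Print one JSON object to stdout and return a process exit code."""
    if len(argv) < 2 or not argv[1].strip():
        print(json.dumps({"error": "message required"}))
        return 1
    # Keep stdout for the JSON result only; send any logging to stderr,
    # otherwise JSON.parse on the Node side will fail.
    print(json.dumps(answer(argv[1])))
    return 0


# In query.py, finish the file with:
#     if __name__ == "__main__":
#         raise SystemExit(main(sys.argv))
# The direct call below just demonstrates the behavior.
exit_code = main(["query.py", "How does on-device AI differ from cloud?"])
print("exit code:", exit_code, file=sys.stderr)
```

A non-zero exit code makes the route reject with "query failed", which matches the `code !== 0` check in runPythonQuery.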
Step 7 — Chat UI
Client page (app/rag-lab/page.tsx):
- Scrollable messages — user bubbles right, assistant left.
- Assistant message renders citation chips linking to /blog/....
- Loading state: "Retrieving…" then "Generating…".
- Optional: show retrieved chunk titles in a sidebar for debugging.
// After fetch("/api/rag-chat")
// setMessages((prev) => [...prev, { role: "assistant", text: data.answer, citations: data.citations }])
Make it meaningful: ask questions only answerable from your blog, then verify that citation links open the correct post.
Step 8 — Evaluation table
Create eval/questions.json:
[
  {
    "question": "How does on-device AI differ from cloud?",
    "expected_slug": "on-device-ai-vs-cloud-ai"
  }
]
| Question | Expected source post | Correct cite? |
|---|---|---|
| How does on-device AI differ from cloud? | on-device-ai-vs-cloud-ai… | |
| … add 9 more … | | |
Run a script that calls answer(q) and checks if expected_slug appears in citation URLs. Target ≥8/10 before calling the project done.
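The eval script could look roughly like this. It is a sketch: `grade` and `slug_of` are illustrative names, and `fake_answer` replaces the real answer() so the example runs without an index or API key. It compares the last path segment of each citation URL with expected_slug, which is slightly stricter than a substring check and avoids one slug matching inside another.

```python
def slug_of(url: str) -> str:
    """Last path segment of a citation URL, e.g. /blog/my-post -> my-post."""
    return url.rstrip("/").split("/")[-1]


def grade(questions: list[dict], answer_fn) -> list[dict]:
    """Call answer_fn for each question and check the expected post was cited."""
    rows = []
    for q in questions:
        result = answer_fn(q["question"])
        cited = [slug_of(c["url"]) for c in result["citations"]]
        rows.append({
            "question": q["question"],
            "expected": q["expected_slug"],
            "cited": cited,
            "ok": q["expected_slug"] in cited,
        })
    return rows


def fake_answer(question: str) -> dict:
    # Stand-in for answer(); always cites the same post
    return {"answer": "...", "citations": [{"title": "t", "url": "/blog/on-device-ai-vs-cloud-ai"}]}


questions = [
    {"question": "How does on-device AI differ from cloud?", "expected_slug": "on-device-ai-vs-cloud-ai"},
    {"question": "A question about another post", "expected_slug": "another-post"},
]

rows = grade(questions, fake_answer)
for r in rows:
    print("PASS" if r["ok"] else "FAIL", "|", r["question"], "|", r["expected"])
print(f"{sum(r['ok'] for r in rows)}/{len(rows)} correct")
```

In the project, `questions` would come from json.load on eval/questions.json and `answer_fn` would be the real answer from query.py.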
Stretch goals
- Swap FAISS → Pinecone for hosted index.
- Add hybrid search (BM25 + vectors).
- Stream tokens with Server-Sent Events.
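For the hybrid search goal, one common way to combine a BM25 ranking with a vector ranking is reciprocal rank fusion (RRF), which needs only the two ordered lists and not their raw scores. This lesson does not prescribe a fusion method, so treat the sketch below as one option; the chunk ids are made up and k=60 is a conventional default rather than a tuned value.

```python
def rrf_merge(rankings: list[list[str]], k: int = 60, top_n: int = 4) -> list[str]:
    """Fuse several ranked lists of chunk ids with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per id, with rank starting at 1.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order
    return sorted(scores, key=lambda cid: (-scores[cid], cid))[:top_n]


vector_ranking = ["a#0", "b#1", "c#0"]  # hypothetical ids from the FAISS search
bm25_ranking = ["b#1", "d#2", "a#0"]    # hypothetical ids from a keyword search

print(rrf_merge([vector_ranking, bm25_ranking]))
# ['b#1', 'a#0', 'd#2', 'c#0']
```

Chunks that both retrievers rank highly rise to the top, while a chunk found by only one retriever still gets a chance to appear.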
Troubleshooting
| Symptom | Fix |
|---|---|
| Answers ignore docs | Prompt too weak; lower temperature; show chunks in UI |
| Wrong post cited | Smaller chunks, higher k, or hybrid BM25 |
| Empty index | Fix path to content/blog; re-run ingest.py |
| Slow first query | Batch embed at build time; cache query embeddings |
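Caching query embeddings, as suggested in the last row, can be sketched with functools.lru_cache. The embedding call is replaced by a counting stand-in here so the effect is visible without an API key; `embed_query`, `normalize_query`, and the cache size are illustrative choices.

```python
from functools import lru_cache

calls = {"n": 0}


def embed_query_uncached(text: str) -> tuple[float, ...]:
    """Stand-in for the real embedding request; counts how often it runs."""
    calls["n"] += 1
    return (float(len(text)), 0.0)


@lru_cache(maxsize=1024)
def embed_query(text: str) -> tuple[float, ...]:
    # lru_cache needs hashable arguments and is safest with immutable results,
    # so return a tuple and convert to a NumPy array at the call site.
    return embed_query_uncached(text)


def normalize_query(text: str) -> str:
    """Collapse case and whitespace so trivially different queries share a cache entry."""
    return " ".join(text.lower().split())


embed_query(normalize_query("What is RAG?"))
embed_query(normalize_query("  what is   RAG? "))
print(calls["n"])  # 1
```

An in-process cache only helps while the Python process stays alive. With the spawn-per-request route in Step 6 every request starts a fresh process, so the cache would have to live in a long-running service or on disk.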
Deliverables
- Indexed corpus from blog (+ PDF optional)
- Working chat UI + API
- Answers include numbered citations
- Eval table with ≥8/10 correct grounding
What's next
Module 7 complete. Continue to Module 8 — Agentic AI when ready.
Return to the AI course curriculum anytime to track progress.