← Back to curriculum

Module 7 — GenAI & LLMs

LLM training lifecycle

End-to-end path from data processing and pre-training through post-training (SFT, RLHF) to inference and deployment.

~80 min read + exercises

LLM training lifecycle

Before we begin

A production LLM is not trained once and shipped. It moves through a lifecycle: raw data → pre-training → post-training → inference → monitoring.

Pre-training teaches language. Post-training teaches behavior. Inference is what users touch.


What you will learn

  • Map the end-to-end lifecycle from data to deployment.
  • Explain pre-training, SFT, and RLHF at a high level.
  • Know what happens at inference vs training time.
  • See where RAG and fine-tuning fit later in Module 7.

Before this lesson


Stage 1 — Data processing

Before any gradient step:

StepPurpose
CollectWeb, books, code, licensed corpora
FilterRemove spam, PII, toxic or low-quality text
DeduplicateSame paragraph repeated millions of times biases the model
NormalizeConsistent encoding, strip broken HTML
ShardSplit into training chunks for distributed jobs

Bad data at this stage cannot be fixed by a bigger model.


Stage 2 — Pre-training

Goal: predict the next token on massive unlabeled text.

  • Train a decoder-only transformer (GPT-style) or other architecture.
  • Runs for weeks on thousands of GPUs.
  • Output: base model — strong at language, weak at following instructions.

What it learns: grammar, facts (noisy), coding patterns, reasoning traces seen in data.

What it lacks: reliable obedience to “answer in JSON” or “refuse harmful requests” — that comes next.


Stage 3 — Post-training

Turns a base model into a helpful assistant.

Supervised fine-tuning (SFT)

  • Curated (prompt, ideal response) pairs written by humans or teachers.
  • Teaches formats: chat roles, tool JSON, concise answers.

Preference tuning / RLHF

  • Compare two answers; train model to prefer the better one (human or AI judge).
  • Reduces toxic or unhelpful outputs; aligns tone with product goals.

Variants you will hear: DPO, ORPO — same family, different math; awareness is enough for engineering interviews.


Stage 4 — Inference

What ships to apps:

  1. User sends messages via API.
  2. Model runs forward pass only (no weight updates).
  3. Tokens stream out until stop condition.

Not retrained on each user message unless you add fine-tuning or RAG (later lessons).

Optional inference optimizations: quantization (Lesson 5), KV-cache, speculative decoding.


Stage 5 — Operations (after launch)

ActivityWhy
MonitorLatency, errors, cost per request
EvalRegression tests when prompts or models change
Refresh dataRAG indexes, fine-tune sets
Version modelsPin gpt-4.1 vs gpt-4.1-mini per route

Module 10 (production) goes deep here; Module 8 covers evals for agents.


Lifecycle diagram (exam style)

plaintext
Raw data → clean/shard → PRE-TRAIN (next-token)
    → SFT (examples) → preference tune (RLHF/DPO)
    → deploy INFERENCE → monitor + eval + optional RAG/fine-tune

What's next

Lesson 3 — Prompt engineering