← Back to curriculum

Module 7 — GenAI & LLMs

LLM basics — GPT-style models

Pretraining, next-token prediction, inference vs training, API chat roles, and what models can and cannot do alone.

~70 min read + exercises

LLM basics — GPT-style models

Before we begin

A Large Language Model (LLM) in the ChatGPT sense is usually a decoder-only transformer trained on huge text to predict the next token.

It does not “look up answers” in a database by default — it continues text in a plausible way.

Figure

Autoregressive generation

GPT-style loop: prompt → next token → append → repeatPromptLLM+tokenAnswer
Each new token is conditioned on everything before it.

What you will learn

  • Describe pretraining vs inference.
  • Explain the token-by-token generation loop.
  • Map chat roles (system / user / assistant) to prompts.

Before this lesson


Pretraining

Goal: minimize next-token loss on internet-scale text (books, code, forums — curated and filtered).

Result: weights that encode grammar, facts ( imperfectly ), reasoning patterns, and style.

Not included: your private company PDFs unless you add RAG or fine-tuning later.


Inference (what users see)

  1. You send a prompt (chat messages).
  2. Model outputs logits for the next token.
  3. Pick a token (greedy or sampled — Lesson 3).
  4. Append token to context; repeat until stop token or limit.

Context window caps total tokens (prompt + completion).


Chat API shape

Typical messages:

RolePurpose
systemGlobal instructions
userHuman question
assistantModel reply (also in history for multi-turn)

The API serializes roles into one token sequence the model was fine-tuned to follow (instruction tuning / RLHFreinforcement learning from human feedback; awareness level only).


Capabilities vs limits

Strong at: drafting, summarizing, coding patterns, explaining concepts, format-following with good prompts.

Weak at: guaranteed facts, math without verification, private data not in context, real-time events unless tools/RAG added.


Embeddings in the LLM stack

Two related uses (quiz often conflates them):

  1. Input token embeddings — inside the LLM (Module 6).
  2. Retrieval embeddings — separate model maps sentences to vectors for search (Module 7 RAG).

Same word, two roles in production apps.


What's next

Lesson 2 — LLM training lifecycle