
Module 7 — GenAI & LLMs

Temperature, top-k & top-p

Controlling randomness at decode time, greedy vs sampled generation, and picking settings for chat vs creative tasks.

~60 min read + exercises


Before we begin

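Once the model has produced logits for the next position, a decoding step must turn them into a single token. Sampling controls set how much randomness enters that choice, trading variety against determinism.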

Figure: Temperature effect. Temperature shapes the next-token probability spread (low T: peaked; high T: flat).

Low temperature gives a confident peak; high temperature spreads probability across more tokens.

What you will learn

  • Define temperature, top-k, and top-p.
  • Pick settings for factual vs creative tasks.


Greedy decoding

Always pick the highest-probability token.

  • Pros: stable, good for exact extraction / JSON.
  • Cons: repetitive, boring prose.

This is the limit of temperature sampling as T → 0, and many APIs treat temperature 0 as greedy decoding.
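A minimal sketch of greedy decoding for a single step, assuming the logits arrive as a plain Python list indexed by token id. The function name `greedy_pick` and the example logits are made up for illustration.

```python
def greedy_pick(logits):
    """Return the index of the largest logit (ties go to the lowest index)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical logits over a 4-token vocabulary.
logits = [1.2, 3.5, 0.3, 2.9]
print(greedy_pick(logits))  # prints 1
```

Because nothing here is random, the same logits always give the same token, which is why greedy decoding suits extraction and structured output.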


Temperature

Divide the logits by T before the softmax: softmax(logits / T). T < 1 sharpens the distribution; T > 1 flattens it.

T          Behavior
0.0–0.3    Focused, factual, coding
0.7–0.9    Balanced chat
1.0+       Creative, risky for facts

What is temperature in an LLM? It is a knob on the randomness of token choice: not “creativity magic,” just probability shaping.
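The scaling step can be sketched in a few lines of standard-library Python. The function name `softmax_with_temperature` and the example logits are assumptions for illustration, not part of any particular API.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use greedy decoding for T = 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical values
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running this, the probability of the top token falls as T rises, while the ranking of the tokens never changes: temperature reshapes the distribution but does not reorder it.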


Top-k

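Sample only from the k tokens with the highest logits (e.g. k=40); every other token gets zero probability, and the remaining probabilities are renormalized.

This cuts off the long tail of unlikely junk tokens while keeping variety among the top candidates.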
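A rough sketch of top-k sampling using only the standard library. The function name `top_k_sample`, the `rng` parameter, and the example logits are illustrative assumptions.

```python
import math
import random


def top_k_sample(logits, k, rng=random):
    """Sample a token index from the k highest logits."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the vocabulary size")
    # Indices of the k largest logits, best first.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = logits[keep[0]]
    # Unnormalized softmax weights over the survivors only.
    weights = [math.exp(logits[i] - peak) for i in keep]
    # random.choices treats the weights as relative, so no manual renormalization.
    return rng.choices(keep, weights=weights, k=1)[0]


logits = [5.0, 4.0, -1.0, -2.0]  # hypothetical values
rng = random.Random(0)  # seeded so runs are repeatable
print(sorted({top_k_sample(logits, k=2, rng=rng) for _ in range(200)}))
```

With k=2, tokens 2 and 3 can never be drawn, no matter how many samples are taken. Setting k=1 reduces this to greedy decoding.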


Top-p (nucleus)

Sample from the smallest set of tokens whose cumulative probability is ≥ p (e.g. p=0.9).

The set adapts to the situation: it is narrow when the model is confident and wider when the model is uncertain.

Often combined with temperature in APIs.
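The adaptive behavior is easiest to see by building the nucleus directly. The sketch below assumes the probabilities have already been computed; the names `nucleus` and `top_p_sample` and the two example distributions are hypothetical.

```python
import random


def nucleus(probs, p):
    """Return indices of the smallest top-ranked set whose total mass is >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return kept


def top_p_sample(probs, p, rng=random):
    """Sample a token index from the nucleus, weighted by original probability."""
    kept = nucleus(probs, p)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


confident = [0.92, 0.05, 0.02, 0.01]  # hypothetical: model is sure
uncertain = [0.30, 0.25, 0.25, 0.20]  # hypothetical: model is unsure
print(len(nucleus(confident, 0.9)))  # prints 1
print(len(nucleus(uncertain, 0.9)))  # prints 4
```

The same p=0.9 keeps one token in the confident case and all four in the uncertain case, whereas a fixed k would keep the same number in both.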


Practical defaults

Task                        Starting point
RAG Q&A with citations      T=0–0.3, top_p=0.9
Marketing copy              T=0.8–1.0
Code generation             T=0.2, low top_p

Always evaluate on your own data; these defaults are not universal.
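To compare settings side by side, the starting points can be written as presets and fed to one sampler. Everything named here is an assumption for illustration: the `PRESETS` keys, the `sample` function, and the example logits. Where the table gives a range or only "low", the number below is one arbitrary pick (in particular `top_p=0.5` for code and `top_p=1.0`, meaning no truncation, for marketing copy). Applying temperature first and top-p second is one common order, not the only one.

```python
import math
import random

# Illustrative presets; the specific numbers are assumptions, see the note above.
PRESETS = {
    "rag_qa": {"temperature": 0.2, "top_p": 0.9},
    "marketing_copy": {"temperature": 0.9, "top_p": 1.0},
    "code_generation": {"temperature": 0.2, "top_p": 0.5},
}


def sample(logits, temperature, top_p, rng=random):
    """Temperature-scale the logits, keep the top-p nucleus, then sample."""
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        # Treat T = 0 as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


logits = [2.0, 1.5, 0.5, -1.0]  # hypothetical values
rng = random.Random(0)  # seeded so runs are repeatable
for task, settings in PRESETS.items():
    draws = [sample(logits, rng=rng, **settings) for _ in range(1000)]
    print(task, "distinct tokens drawn:", len(set(draws)))
```

Counting distinct tokens is only a crude diversity signal; a real evaluation would score full generations against task-specific criteria on your own data.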


What's next

Lesson 5 — Fine-tuning & quantization