Temperature, top-k, and top-p
Before we begin
Once the model has produced logits, it must choose the next token. Sampling controls trade randomness against determinism in that choice.
Figure: Temperature effect
What you will learn
- Define temperature, top-k, and top-p.
- Pick settings for factual vs creative tasks.
Greedy decoding
Always pick the highest-probability token.
- Pros: stable, good for exact extraction / JSON.
- Cons: repetitive, boring prose.
Greedy decoding is the limit of sampling as temperature → 0, and many APIs expose it as temperature=0.
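As a minimal sketch of greedy decoding, assuming the logits arrive as a plain Python list (the helper name `greedy_pick` and the toy values are hypothetical, not taken from any library):

```python
def greedy_pick(logits):
    """Return the index of the largest logit (ties go to the lowest index)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical logits for a 4-token vocabulary.
toy_logits = [1.2, 3.4, 0.5, 3.1]
greedy_choice = greedy_pick(toy_logits)  # index 1, the largest logit
```

Calling it twice on the same logits always returns the same index, which is why greedy output is stable.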
Temperature
Divide the logits by T before the softmax: softmax(logits / T). T below 1 sharpens the distribution; T above 1 flattens it.
| T | Behavior |
|---|---|
| 0.0–0.3 | Focused, factual, coding |
| 0.7–0.9 | Balanced chat |
| 1.0+ | Creative, risky for facts |
What is temperature in an LLM? It is a knob on the randomness of token choice: not “creativity magic,” just probability shaping.
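The scaling can be illustrated with a small sketch. It assumes plain Python lists for the logits; the function name `softmax_with_temperature` and the toy logits are hypothetical, and real inference stacks do this on tensors.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        # Assumption: T = 0 is handled separately as greedy decoding.
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a 3-token vocabulary.
toy_logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(toy_logits, 0.3)    # mass concentrates on token 0
neutral = softmax_with_temperature(toy_logits, 1.0)  # the unscaled softmax
flat = softmax_with_temperature(toy_logits, 2.0)     # mass spreads toward the tail
```

The ranking of tokens never changes with T; only how much probability the top token takes from the others.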
Top-k
Sample only from the k tokens with the highest logits (e.g. k=40).
This cuts rare junk tokens while keeping variety.
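A minimal sketch of the filter, assuming it is applied to probabilities rather than raw logits (the ranking is the same either way, since softmax preserves order). The name `top_k_filter` and the toy distribution are hypothetical.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero out the rest, and renormalize."""
    if k < 1:
        raise ValueError("k must be at least 1")
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept_mass = sum(probs[i] for i in keep)
    filtered = [0.0] * len(probs)
    for i in keep:
        filtered[i] = probs[i] / kept_mass
    return filtered


# Hypothetical distribution over a 4-token vocabulary.
toy_probs = [0.5, 0.3, 0.15, 0.05]
top2 = top_k_filter(toy_probs, 2)  # only tokens 0 and 1 remain, rescaled to sum to 1
```

Note that k is fixed: the filter keeps the same number of candidates whether the model is confident or not.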
Top-p (nucleus)
Sample from the smallest set of most-probable tokens whose cumulative probability is ≥ p (e.g. p=0.9).
The set adapts to the situation: narrow when the model is confident, wider when it is uncertain.
APIs often combine top-p with temperature.
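The adaptive behavior can be seen in a short sketch. It assumes probabilities in a plain list; `top_p_filter` and both toy distributions are hypothetical, and production implementations differ in details such as tie handling.

```python
def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    filtered = [0.0] * len(probs)
    for i in keep:
        filtered[i] = probs[i] / cumulative  # renormalize the kept tokens
    return filtered


# Hypothetical distributions: one confident, one uncertain.
confident = top_p_filter([0.92, 0.05, 0.02, 0.01], 0.9)  # a single token survives
uncertain = top_p_filter([0.4, 0.3, 0.25, 0.05], 0.9)    # three tokens survive
```

With the same p, the candidate set shrinks to one token in the confident case and grows to three in the uncertain one, which a fixed k cannot do.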
Practical defaults
| Task | Starting point |
|---|---|
| RAG Q&A with citations | T=0–0.3, top_p=0.9 |
| Marketing copy | T=0.8–1.0 |
| Code generation | T=0.2, low top_p |
Always evaluate on your own data; these defaults are not universal.
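To experiment with settings like those in the table, the three controls can be chained in one function. This is a sketch under stated assumptions, not how any particular API does it: the order used here (temperature, then top-k, then top-p) is one common choice, temperature 0 is treated as greedy decoding, and `sample_next_token` and the toy logits are hypothetical.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick a token index using temperature, then top-k, then top-p."""
    if temperature <= 0:
        # Assumption: T <= 0 means greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")

    # Temperature-scaled softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Rank tokens, then apply top-k.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]

    # Apply top-p to whatever top-k left, using renormalized mass.
    if top_p is not None:
        mass = sum(probs[i] for i in order)
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i] / mass
            if cumulative >= top_p:
                break
        order = kept

    # random.choices accepts unnormalized weights.
    return rng.choices(order, weights=[probs[i] for i in order], k=1)[0]


# Hypothetical logits for a 4-token vocabulary.
toy_logits = [2.0, 1.0, 0.5, -1.0]
factual = sample_next_token(toy_logits, temperature=0.0)
creative = sample_next_token(
    toy_logits, temperature=1.0, top_k=3, top_p=0.9, rng=random.Random(0)
)
```

Passing a seeded `random.Random` makes runs repeatable, which helps when comparing settings on your own evaluation set.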