Temperature, top-k, and top-p

Before we begin

After the model produces logits, it must choose the next token. Sampling controls set the trade-off between randomness and determinism.

Figure: Temperature effect. Low temperature produces a sharp, confident peak; high temperature spreads probability across more tokens.

Greedy decoding always picks the highest-probability token.

In many APIs, this is equivalent to setting temperature to 0 (the limit T → 0).
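A minimal sketch of greedy decoding in plain Python, assuming the logits for one step are already available as a list of floats. `greedy_pick` is a hypothetical helper name, not a library API. Because softmax preserves order, the argmax of the raw logits is the same token as the argmax of the probabilities.

```python
from typing import Sequence


def greedy_pick(logits: Sequence[float]) -> int:
    """Return the index of the highest logit (greedy decoding)."""
    # Softmax is monotonic, so no normalization is needed to find the top token.
    # On ties, max() returns the first (lowest) index.
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical logits for a 4-token vocabulary.
print(greedy_pick([1.2, 3.5, 0.3, 3.1]))  # → 1
```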

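Temperature scales the logits before softmax by dividing each logit by T: p_i = exp(z_i / T) / Σ_j exp(z_j / T), where z_i is the logit of token i. T < 1 sharpens the distribution; T > 1 flattens it.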
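The temperature-scaled softmax can be sketched in plain Python as follows. `softmax_with_temperature` and the example logits are hypothetical; real inference stacks apply the same arithmetic to tensors. The sketch rejects T ≤ 0 instead of dividing by zero.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing every logit by T."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use greedy decoding for T -> 0")
    scaled = [z / temperature for z in logits]
    # Subtracting the max before exp() avoids overflow and leaves softmax unchanged.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical logits
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running the loop shows the top token's probability falling as T rises from 0.5 to 2.0, which is the sharpening and flattening described above.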

What is temperature in an LLM? It is a knob on the randomness of token choice. It is not “creativity magic”; it only reshapes the probability distribution.

Top-k sampling draws only from the k tokens with the highest logits (e.g. k = 40), renormalizing their probabilities.

This cuts off rare junk tokens while keeping some variety.
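A minimal sketch of top-k sampling, assuming plain Python lists of logits. `top_k_sample` is a hypothetical name, not a library API. It keeps the k largest logits, applies softmax over only those (which renormalizes their probabilities), and draws one index with the standard library's `random.choices`.

```python
import math
import random
from typing import Optional, Sequence


def top_k_sample(logits: Sequence[float], k: int,
                 rng: Optional[random.Random] = None) -> int:
    """Sample a token index from only the k highest logits."""
    rng = rng if rng is not None else random.Random()
    k = max(1, min(k, len(logits)))  # clamp k to a valid range
    # Indices of the k largest logits, most likely first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to the kept tokens (stable form: subtract the max).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # choices() accepts unnormalized weights, so this is the renormalized draw.
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so the draws are reproducible
logits = [5.0, 4.0, -3.0, -4.0]  # hypothetical logits
draws = [top_k_sample(logits, k=2, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))  # only indices 0 and 1 can ever appear
```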

Top-p (nucleus) sampling draws from the smallest set of tokens whose cumulative probability is at least p (e.g. p = 0.9).

The set adapts to the situation: it is narrow when the model is confident and wider when it is uncertain.
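A minimal sketch of the nucleus set and of sampling from it, assuming the probabilities have already been computed (for example with softmax). `nucleus_set` and `top_p_sample` are hypothetical names, and the two distributions are made-up examples chosen to show the adaptive behavior: the same p keeps one token in the confident case and four in the uncertain one.

```python
import random
from typing import List, Optional, Sequence


def nucleus_set(probs: Sequence[float], p: float) -> List[int]:
    """Return the smallest set of indices whose cumulative probability is >= p."""
    # Visit tokens from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept


def top_p_sample(probs: Sequence[float], p: float,
                 rng: Optional[random.Random] = None) -> int:
    """Sample one index from the nucleus; choices() renormalizes the weights."""
    rng = rng if rng is not None else random.Random()
    kept = nucleus_set(probs, p)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


confident = [0.92, 0.05, 0.02, 0.01]        # hypothetical: one dominant token
uncertain = [0.35, 0.25, 0.20, 0.12, 0.08]  # hypothetical: mass spread out
print(nucleus_set(confident, 0.9))  # → [0]
print(nucleus_set(uncertain, 0.9))  # → [0, 1, 2, 3]
```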

Top-p is often combined with temperature in APIs.
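One way to chain the three controls is sketched below. The ordering (temperature, then top-k, then top-p) and the choice to measure top-p mass on the renormalized top-k set are assumptions made for this example; individual APIs and libraries may order or combine these steps differently. `sample_next_token` is a hypothetical name, and T ≤ 0 is mapped to greedy decoding to mirror the temperature → 0 convention mentioned above.

```python
import math
import random
from typing import Optional, Sequence


def sample_next_token(
    logits: Sequence[float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Temperature -> top-k -> top-p -> sample (the ordering is an assumption)."""
    rng = rng if rng is not None else random.Random()
    if temperature <= 0:
        # Assumption: treat T <= 0 as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    # 1. Temperature: scale logits, then a numerically stable softmax.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Candidate indices sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # 2. Top-k: keep only the k most probable tokens.
    if top_k is not None:
        order = order[: max(1, top_k)]
    # 3. Top-p: keep the smallest prefix whose mass reaches top_p,
    #    measured on the renormalized top-k distribution (assumption).
    if top_p is not None:
        mass = sum(probs[i] for i in order)
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i] / mass
            if cumulative >= top_p:
                break
        order = kept
    # 4. Sample from the survivors; choices() renormalizes the weights.
    return rng.choices(order, weights=[probs[i] for i in order], k=1)[0]


rng = random.Random(0)
logits = [3.0, 2.5, 0.0, -1.0, -2.0]  # hypothetical logits
print(sample_next_token(logits, temperature=0.7, top_k=3, top_p=0.9, rng=rng))
```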

Always evaluate on your own data; defaults are not universal.