Module 3 — Neural networks basics

Loss functions for neural networks

Cross-entropy with softmax, why MSE is weak for classification, multi-class MNIST loss, and reading training curves.

~65 min read + exercises

Before we begin

The forward pass and backprop need a scalar loss: one number that says how wrong the model was on the batch.

For MNIST (10 digit classes), the standard pair is:

Softmax outputs + cross-entropy loss


What you will learn

  • Explain cross-entropy in plain language.
  • Know why MSE is a weak default for classification.
  • Read a simple training curve (loss down, accuracy up).


Cross-entropy (one example)

True digit: 3 (one-hot: index 3 = 1, others 0).
Model probabilities after softmax: p₀…p₉.

Loss = −log(p₃)

  • If the model is confident and correct (p₃ ≈ 1) → loss ≈ 0.
  • If the model assigns low probability to the true class → large loss.

Averaging the per-example losses over the batch gives the one number used for backprop.
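The per-example rule and the batch average can be sketched in a few lines of plain Python. This is a minimal illustration, not a framework implementation: the function names and the two probability vectors are made up for this example, and both images are assumed to have true digit 3.

```python
import math

def cross_entropy(probs, true_class):
    """Loss for one example: negative log of the probability given to the true class."""
    return -math.log(probs[true_class])

def batch_loss(batch_probs, batch_labels):
    """Mean of the per-example losses: the single scalar handed to backprop."""
    losses = [cross_entropy(p, y) for p, y in zip(batch_probs, batch_labels)]
    return sum(losses) / len(losses)

# Hypothetical softmax outputs for two images whose true digit is 3.
confident_right = [0.01] * 10
confident_right[3] = 0.91      # 9 * 0.01 + 0.91 = 1.0
unsure = [0.1] * 10            # uniform guess over the 10 digits

print(cross_entropy(confident_right, 3))   # small: -log(0.91)
print(cross_entropy(unsure, 3))            # larger: -log(0.1) = log(10)
print(batch_loss([confident_right, unsure], [3, 3]))
```

The uniform guess costs log(10) ≈ 2.30, which is a useful reference point: a 10-class model whose loss sits near that value is doing no better than guessing.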


Softmax + cross-entropy together

Softmax converts 10 logits to probabilities summing to 1:

$$p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$$

Cross-entropy pushes probability mass onto the correct class. Frameworks often fuse the two steps: PyTorch's CrossEntropyLoss, for example, takes raw logits and applies the softmax internally (in log form) for numerical stability.
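To see why the fused form is more stable, here is a rough sketch in plain Python. It assumes the usual log-sum-exp trick; the function names and the logits are hypothetical, and real frameworks may implement this differently.

```python
import math

def softmax(logits):
    """Convert logits to probabilities; subtracting the max avoids overflow in exp."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(logits, true_class):
    """-log(softmax(logits)[true_class]) via log-sum-exp, without forming probabilities."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_class]

# Hypothetical logits for one image whose true digit is 3.
logits = [1.0, 0.5, -1.0, 4.0, 0.0, -2.0, 0.3, 1.2, -0.5, 0.1]
probs = softmax(logits)

loss_two_step = -math.log(probs[3])               # softmax, then log
loss_fused = cross_entropy_from_logits(logits, 3)  # straight from logits
print(loss_two_step, loss_fused)                   # agree up to rounding

# With extreme logits the fused form still returns a finite loss, while the
# two-step form would hit log(0) because the true-class probability underflows.
print(cross_entropy_from_logits([1000.0, 0.0], 1))
```

On ordinary logits both routes give the same loss. They only diverge when a probability underflows to zero, which is the case the fused version guards against.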


Why not MSE on one-hot labels?

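Mean squared error on one-hot labels can work, but it often trains more slowly and is less aligned with probabilistic classification. Cross-entropy penalizes confident wrong answers more sharply: −log(p) grows without bound as the true-class probability p approaches 0, while the squared error against a one-hot target stays bounded.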
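A small numerical comparison makes the difference concrete. This sketch assumes a deliberately extreme output: the true digit is 3, and the model puts almost all of its probability on one wrong digit (8, chosen arbitrarily). All names and values are illustrative.

```python
import math

def mse_one_hot(probs, true_class):
    """Mean squared error between predicted probabilities and the one-hot target."""
    return sum((p - (1.0 if i == true_class else 0.0)) ** 2
               for i, p in enumerate(probs)) / len(probs)

def cross_entropy(probs, true_class):
    """Negative log of the probability assigned to the true class."""
    return -math.log(probs[true_class])

def confident_wrong(p_true, true_class=3, wrong_class=8, n_classes=10):
    """Hypothetical output: p_true on the true digit, the rest on one wrong digit."""
    probs = [0.0] * n_classes
    probs[true_class] = p_true
    probs[wrong_class] = 1.0 - p_true
    return probs

# As the model grows more confidently wrong, MSE levels off below 0.2
# while cross-entropy keeps increasing.
for p_true in (0.1, 0.01, 0.001):
    probs = confident_wrong(p_true)
    print(p_true, round(mse_one_hot(probs, 3), 4), round(cross_entropy(probs, 3), 4))
```

In this setup the MSE can never exceed 0.2, so a badly wrong prediction and a catastrophically wrong one look nearly identical to it. Cross-entropy keeps separating them.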

| Task                      | Common loss   |
|---------------------------|---------------|
| MNIST digits              | Cross-entropy |
| House price               | MSE / MAE     |
| Module 1 patch brightness | MSE           |

Reading training curves

Healthy training often shows:

  • Training loss trending down (not necessarily to zero).
  • Validation accuracy rising, then flattening.
  • If training accuracy keeps rising while validation accuracy falls → overfitting (Module 2).
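The checks in the list above can be applied to logged per-epoch metrics. The sketch below assumes accuracies are stored as plain lists, one value per epoch. The helper names and the numbers are made up for illustration and are not measured results.

```python
def best_epoch(val_acc):
    """Index of the epoch with the highest validation accuracy (first one on ties)."""
    return max(range(len(val_acc)), key=lambda i: val_acc[i])

def overfitting_epochs(train_acc, val_acc):
    """Epochs where training accuracy rose while validation accuracy fell
    relative to the previous epoch."""
    return [i for i in range(1, len(val_acc))
            if train_acc[i] > train_acc[i - 1] and val_acc[i] < val_acc[i - 1]]

# Made-up curves for illustration only.
train_acc = [0.90, 0.95, 0.97, 0.985, 0.993, 0.998]
val_acc = [0.89, 0.94, 0.96, 0.965, 0.962, 0.958]

print(best_epoch(val_acc))                      # 3
print(overfitting_epochs(train_acc, val_acc))   # [4, 5]
```

A single-epoch dip in validation accuracy can be noise, so in practice it is safer to look for a sustained gap over several epochs before calling it overfitting.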

Checkpoint

When is accuracy a misleading metric during training?

Answer sketch

Accuracy can hide poor performance on rare classes, and it ignores confidence: loss captures how confident the wrong predictions are. For MNIST, track validation accuracy and a confusion matrix alongside the loss.
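The rare-class point can be illustrated with a tiny confusion matrix. This is a sketch under made-up assumptions: a hypothetical validation set of nine 1s and a single 7, and a model that predicts 1 for every image.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def confusion_matrix(y_true, y_pred, n_classes=10):
    """matrix[t][p] counts examples of true class t that were predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def recall(matrix, cls):
    """Fraction of class-cls examples predicted correctly; None if the class is absent."""
    total = sum(matrix[cls])
    return matrix[cls][cls] / total if total else None

# Hypothetical labels: nine images of digit 1, one image of digit 7.
y_true = [1] * 9 + [7]
y_pred = [1] * 10          # the model always answers 1

cm = confusion_matrix(y_true, y_pred)
print(accuracy(y_true, y_pred))   # 0.9
print(recall(cm, 1))              # 1.0
print(recall(cm, 7))              # 0.0
```

Overall accuracy looks fine at 0.9, yet the row for digit 7 shows that the model never gets that class right.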


What's next

Module 3 quiz