Module 4 — Deep learning architectures

RNNs — sequences and hidden state

Unrolling over time, backprop through time intuition, and why vanilla RNNs forget long context.

~70 min read + exercises

Before we begin

Reviews, stock prices, and audio frames are sequences — order matters. Recurrent Neural Networks (RNNs) process one time step at a time and carry a hidden state forward.

“Great” then “not” means something different than “not” then “great”. RNNs model that order.

Figure: Unrolled RNN

[Diagram: the same RNN cell repeated once per input (word 1 to word 4), with hidden states h0 to h3 carrying context from one step to the next. The weights are shared at every step.]

What you will learn

  • Define hidden state and unrolling.
  • Explain backprop through time at a high level.
  • State why vanilla RNNs struggle on long sequences.


One step of vanilla RNN

At time t:

hₜ = activation(W_h hₜ₋₁ + W_x xₜ + b)

  • xₜ — input at step t (e.g. word embedding)
  • hₜ — summary of the sequence so far
  • Same W_h, W_x at every step — shared across time

For sentiment, h at the last word can feed a classifier (positive / negative).
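
The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not a trained model: it assumes tanh as the activation, random weights, random vectors standing in for word embeddings, and a hypothetical logistic classifier head (w_out) on the last hidden state. The function name rnn_forward and all sizes are made up for the example.

```python
import numpy as np

def rnn_forward(xs, W_h, W_x, b, h0=None):
    """Run a vanilla RNN over a sequence and return the hidden state at every step."""
    h = np.zeros(W_h.shape[0]) if h0 is None else h0
    states = []
    for x_t in xs:  # one time step at a time
        # h_t = tanh(W_h h_{t-1} + W_x x_t + b); the same weights are reused each step
        h = np.tanh(W_h @ h + W_x @ x_t + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 4, 3, 5  # illustrative sizes (assumption)

W_h = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
W_x = rng.normal(scale=0.5, size=(hidden_size, input_size))
b = np.zeros(hidden_size)
xs = rng.normal(size=(seq_len, input_size))  # stand-in for word embeddings

states = rnn_forward(xs, W_h, W_x, b)
h_last = states[-1]  # summary of the whole sequence

# Hypothetical classifier head: logistic regression on the last hidden state
w_out = rng.normal(size=hidden_size)
p_positive = 1.0 / (1.0 + np.exp(-(w_out @ h_last)))

# Feeding the same inputs in reverse order gives a different final state
states_rev = rnn_forward(xs[::-1], W_h, W_x, b)

print("hidden states shape:", states.shape)
print("order matters:", not np.allclose(states[-1], states_rev[-1]))
```

Because the loop reuses W_h, W_x, and b at every step, the number of parameters does not grow with sequence length. Only the number of loop iterations does.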


Backprop through time (BPTT)

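Training unrolls the RNN over all time steps, computes the loss (e.g. at the final step), then backpropagates through every time step in reverse order.

Gradients flow back through one multiplication by W_h (and by the activation's derivative) per time step. If those factors are consistently smaller than 1 in magnitude, the gradient shrinks exponentially with the number of steps (vanishing); if they are consistently larger than 1, it grows exponentially (exploding).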
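
The effect of repeated multiplication can be seen numerically. The sketch below is a simplification, not full BPTT: it leaves out the activation's derivative and only pushes a unit-length gradient back through W_h transposed, step by step. It also assumes W_h is a scaled orthogonal matrix, so the scale factor alone decides whether the gradient shrinks or grows. The helper name backprop_norms is made up for the example.

```python
import numpy as np

def backprop_norms(W_h, steps, rng):
    """Track the norm of a gradient pushed back through `steps` multiplications by W_h^T."""
    g = rng.normal(size=W_h.shape[0])
    g /= np.linalg.norm(g)  # unit-norm gradient at the final time step
    norms = [np.linalg.norm(g)]
    for _ in range(steps):
        g = W_h.T @ g  # one step back in time (activation derivative omitted)
        norms.append(np.linalg.norm(g))
    return norms

rng = np.random.default_rng(0)

# Random orthogonal matrix from a QR decomposition: it preserves vector norms,
# so multiplying it by a scalar sets the growth factor per step exactly.
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))

small = backprop_norms(0.9 * Q, steps=50, rng=rng)
large = backprop_norms(1.1 * Q, steps=50, rng=rng)

print(f"scale 0.9, gradient norm after 50 steps: {small[-1]:.5f}")
print(f"scale 1.1, gradient norm after 50 steps: {large[-1]:.2f}")
```

With these settings the norms after 50 steps are 0.9^50 (about 0.005) and 1.1^50 (about 117). A tanh activation has a derivative of at most 1, so including it would shrink the gradient further, never enlarge it.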


Long sequence problem

Plain RNNs forget context from many steps ago:

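  • “The movie was not, despite what the reviews promised, good”: the negation sits several steps before “good”, so its effect is easy to lose.
  • In long documents, early sentences have little influence on the final hidden state.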

That motivated LSTM and GRU (next lesson).
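
The forgetting described above can be probed directly: change one input and measure how much the final hidden state moves. The sketch below assumes tanh as the activation, random inputs, and recurrent weights deliberately scaled to be contractive (0.5 times an orthogonal matrix) so the effect is pronounced. A trained RNN will not necessarily behave this strongly. The names final_state and effect_of_changing are made up for the example.

```python
import numpy as np

def final_state(xs, W_h, W_x, b):
    """Return the last hidden state of a vanilla tanh RNN run over xs."""
    h = np.zeros(W_h.shape[0])
    for x_t in xs:
        h = np.tanh(W_h @ h + W_x @ x_t + b)
    return h

rng = np.random.default_rng(1)
hidden_size, input_size, seq_len = 8, 4, 30  # illustrative sizes (assumption)

Q, _ = np.linalg.qr(rng.normal(size=(hidden_size, hidden_size)))
W_h = 0.5 * Q  # assumed contractive recurrent weights
W_x = rng.normal(scale=0.5, size=(hidden_size, input_size))
b = np.zeros(hidden_size)
xs = rng.normal(size=(seq_len, input_size))

base = final_state(xs, W_h, W_x, b)

def effect_of_changing(step):
    """Nudge the input at one time step and measure the change in the final state."""
    changed = xs.copy()
    changed[step] += 1.0
    return np.linalg.norm(final_state(changed, W_h, W_x, b) - base)

early = effect_of_changing(0)           # edit the first input
late = effect_of_changing(seq_len - 1)  # edit the last input

print(f"change in final state from editing the first input: {early:.2e}")
print(f"change in final state from editing the last input:  {late:.2e}")
```

Under these assumptions, the edit at the first step is damped by a factor of at most 0.5 at each later step, so after 29 more steps it barely registers in the final state, while the edit at the last step changes it directly.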


RNN vs CNN (when to use which)

Data                 Typical architecture
Images               CNN
Text / time series   RNN, LSTM, GRU (or transformers later)
Fixed tabular rows   MLP / gradient boosting

Checkpoint

Why do RNNs struggle with long sequences?

Answer sketch

Gradients propagated back over many time steps tend to vanish or explode, so a vanilla RNN cannot reliably learn to keep information from distant past steps in its hidden state.


What's next

Lesson 3 — LSTM & GRU