Module 4 — Deep learning architectures

LSTM & GRU — long-term memory

Gating intuition, cell state, forget/input/output gates, and when GRU is enough.

~75 min read + exercises

Before we begin

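LSTM (Long Short-Term Memory) extends a plain recurrent network with a memory cell and gates that decide what to keep, what to add, and what to output. GRU (Gated Recurrent Unit) is a lighter variant with fewer gates.

What problem does LSTM solve? Remembering useful context across many time steps. In a plain RNN the gradient tends to shrink each time it passes back through a step, so the signal from early tokens fades (the vanishing gradient problem). The LSTM's gated cell state gives that signal a more direct path.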

Figure: LSTM gates. Panels: Forget, Input, Cell, Output. The gates control what to forget, store, and output.

What you will learn

  • Name the three LSTM gates and their roles.
  • Compare LSTM vs GRU at a practical level.
  • Know when to pick LSTM/GRU for sentiment and sequences.


LSTM intuition

Besides the hidden state h, an LSTM keeps a cell state C: a conveyor belt of memory that runs along the sequence and changes only through gated updates.

Gate     Role
Forget   Drop irrelevant old cell content
Input    Add new candidate information
Output   What to expose as hidden state h

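Each gate is a sigmoid output between 0 and 1 that multiplies a flow elementwise: values near 0 block it, values near 1 pass it through. That makes gates differentiable “switches.”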
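
The gate roles above can be sketched as a single time step in NumPy. This is a minimal illustration under stated assumptions, not the course's implementation: the function name lstm_step, the stacked weight matrix W, the gate ordering inside it, and the toy sizes are all hypothetical choices made for readability.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    x: (input_size,)  h_prev, c_prev: (hidden_size,)
    W: (4 * hidden_size, input_size + hidden_size)  b: (4 * hidden_size,)
    The four row blocks of W are assumed to be ordered: forget, input,
    candidate, output.
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0:n])          # forget gate: how much old cell content to keep
    i = sigmoid(z[n:2 * n])      # input gate: how much of the candidate to write
    g = np.tanh(z[2 * n:3 * n])  # candidate values to add to the cell
    o = sigmoid(z[3 * n:4 * n])  # output gate: how much of the cell to expose
    c = f * c_prev + i * g       # updated cell state (the "conveyor belt")
    h = o * np.tanh(c)           # updated hidden state
    return h, c

# Toy run over a random 5-step sequence (sizes are arbitrary).
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size + hidden_size))
b = np.zeros(4 * hidden_size)
h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x, h, c, W, b)
```

Note that the cell state is updated only by an elementwise multiply and an add. When the forget gate stays near 1, old content passes along largely unchanged, which is the intuition behind the conveyor-belt picture.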


GRU

GRU (Gated Recurrent Unit) merges the cell state and hidden state into a single hidden state, controlled by two gates:

  • Reset gate — how much past to ignore when computing candidate
  • Update gate — blend old hidden vs new candidate
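
The two gates above can be sketched as one GRU time step in NumPy. As before, this is an illustrative sketch: the function name gru_step, the separate weight matrices Wz, Wr, Wh, and the toy sizes are assumptions. Libraries and papers also differ on whether the update gate weights the old state or the new candidate; the code picks one convention and says so in a comment.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One GRU time step.

    x: (input_size,)  h_prev: (hidden_size,)
    Each W*: (hidden_size, input_size + hidden_size)  each b*: (hidden_size,)
    """
    xh = np.concatenate([x, h_prev])
    z = sigmoid(Wz @ xh + bz)  # update gate: blend old hidden vs new candidate
    r = sigmoid(Wr @ xh + br)  # reset gate: how much past to use for the candidate
    h_tilde = np.tanh(Wh @ np.concatenate([x, r * h_prev]) + bh)  # candidate
    # Convention assumed here: z near 1 favors the new candidate.
    return (1.0 - z) * h_prev + z * h_tilde

# Toy run over a random 5-step sequence (sizes are arbitrary).
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
shape = (hidden_size, input_size + hidden_size)
Wz, Wr, Wh = (rng.normal(scale=0.1, size=shape) for _ in range(3))
bz, br, bh = (np.zeros(hidden_size) for _ in range(3))
xs = rng.normal(size=(5, input_size))
h = np.zeros(hidden_size)
for x in xs:
    h = gru_step(x, h, Wz, Wr, Wh, bz, br, bh)
```

There is no separate cell state here: the single vector h carries both the memory and the output.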

GRU often reaches accuracy similar to LSTM with fewer parameters, which makes it a good default to try first on medium-sized text tasks.
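
To make the parameter difference concrete, here is a rough count for a single layer. It assumes the textbook formulation, where each gate or candidate has one weight matrix over the concatenated input and hidden state plus one bias vector. The helper name gated_rnn_params and the sizes 128 and 256 are hypothetical, and real libraries can differ slightly (PyTorch, for example, stores two bias vectors per block).

```python
def gated_rnn_params(input_size, hidden_size, num_blocks):
    """Parameter count for one recurrent layer.

    num_blocks is the number of gate/candidate transforms:
    4 for LSTM (forget, input, candidate, output),
    3 for GRU (reset, update, candidate).
    """
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return num_blocks * per_block

# Hypothetical sizes: 128-dimensional inputs, 256 hidden units.
lstm_params = gated_rnn_params(128, 256, num_blocks=4)
gru_params = gated_rnn_params(128, 256, num_blocks=3)
print(lstm_params, gru_params)  # 394240 295680
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer with the same input and hidden sizes.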


LSTM for sentiment

Review: “Not perfect, but honestly pretty good overall.”

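An LSTM reads the words in order, so it can link “not perfect” with the later “pretty good” and weigh the contrast signaled by “but.” A bag-of-words model sees only word counts, so it loses both order and contrast.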
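
A small sketch of why order matters: the review above and a reordered version (review_b, invented here for illustration) produce identical bag-of-words counts even though the emphasis differs. The helper bag_of_words is a hypothetical, deliberately simple tokenizer, not the one used in the course project.

```python
from collections import Counter

def tokenize(text):
    # Lowercase and replace punctuation with spaces, then split on whitespace.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return cleaned.split()

def bag_of_words(text):
    # Word counts only: all ordering information is discarded.
    return Counter(tokenize(text))

review_a = "Not perfect, but honestly pretty good overall."
review_b = "Pretty good overall, but honestly not perfect."  # invented reordering

same_bag = bag_of_words(review_a) == bag_of_words(review_b)
same_order = tokenize(review_a) == tokenize(review_b)
print(same_bag, same_order)  # True False
```

A sequence model receives the token list, where the two reviews differ. A bag-of-words model receives only the counts, where they are indistinguishable.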

Your Module 4 project uses LSTM on product reviews.


LSTM/GRU vs Transformers (preview)

Transformers (Module 6) attend to all tokens at once and today often outperform LSTMs on long text. LSTM and GRU remain valuable for understanding sequential modeling and for small edge deployments.


Checkpoint

What does the forget gate do?

Answer sketch

It decides how much of the old cell state to erase before writing new information.


What's next

Lesson 4 — Word embeddings