Module 3 quiz and review

Before we begin

Test your understanding of activations, forward and backward passes, and loss functions before starting the MNIST project. Aim for at least 19 out of 25.

Multiple choice quiz


Pick one answer per question. Feedback appears immediately, so take your time before clicking.


Question 1 of 25
What does a perceptron compute before applying an activation?
Question 2 of 25
Why can a single perceptron not learn XOR?
Question 3 of 25
Why do we need activation functions in hidden layers?
Question 4 of 25
What is a key advantage of ReLU over sigmoid in hidden layers?
Question 5 of 25
Which output activation is most commonly used to produce a probability for binary classification?
Question 6 of 25
Forward propagation means:
Question 7 of 25
What does backpropagation actually compute?
Question 8 of 25
What is the vanishing gradient problem?
Question 9 of 25
For multi-class digit classification (0–9), which loss is standard?
Question 10 of 25
Loss is high but not decreasing. What is a sensible first check?
Question 11 of 25
A network has 784 inputs, 128 hidden units, and 10 outputs. Approximately how many weights connect the input layer to the hidden layer?
Question 12 of 25
Saturated sigmoid units (outputs near 0 or 1) in hidden layers contribute to vanishing gradients because:
Question 13 of 25
After forward pass and loss, weights update using:
Question 14 of 25
Adding a hidden layer with multiple neurons lets a network solve XOR because:
Question 15 of 25
Why use softmax + cross-entropy together on the output layer for MNIST?
Question 16 of 25
A perceptron without activation on the output is limited because:
Question 17 of 25
Tanh activation outputs are in range:
Question 18 of 25
In a 3-layer network (input → hidden → output), forward propagation computes:
Question 19 of 25
Backpropagation is efficient because it:
Question 20 of 25
Why is ReLU often preferred over sigmoid in hidden layers?
Question 21 of 25
MSE loss is commonly used for:
Question 22 of 25
If a network has no activation between linear layers, stacking many layers is equivalent to:
Question 23 of 25
A learning rate that is too high during neural network training often causes:
Question 24 of 25
The perceptron computes weighted sum + bias, then applies:
Question 25 of 25
For binary classification (spam vs ham), a common output activation + loss pair is:

After the quiz

19/25 or higher? Start the MNIST project.

Checklist:

I can explain why hidden layers need activations.
I can describe the roles of the forward and backward passes.
I know what the vanishing gradient problem is.
I can name the standard MNIST loss setup.
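
If any checklist item feels shaky, a small worked example may help. The sketch below is not part of the course code. It assumes a tiny 4-3-3 network in plain Python (MNIST would use 784 inputs and 10 outputs), a ReLU hidden layer, and softmax with cross-entropy on the output. The helper names (`affine`, `init_params`, `forward`, `backward`, `sgd_step`) and the example input are made up for illustration.

```python
import math
import random


def affine(W, b, x):
    """Weighted sum plus bias for each row of W: W @ x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def softmax(z):
    """Turn logits into probabilities; subtracting the max avoids overflow."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(n_in, n_hidden, n_out, seed=0):
    """Small random weights and zero biases (hypothetical helper)."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return [rand(n_hidden, n_in), [0.0] * n_hidden, rand(n_out, n_hidden), [0.0] * n_out]


def forward(params, x):
    """Forward pass: input -> hidden (ReLU) -> output probabilities (softmax)."""
    W1, b1, W2, b2 = params
    z1 = affine(W1, b1, x)
    h = [max(0.0, v) for v in z1]  # ReLU supplies the non-linearity
    p = softmax(affine(W2, b2, h))
    return z1, h, p


def cross_entropy(params, x, label):
    """Loss for one example: negative log-probability of the true class."""
    return -math.log(forward(params, x)[2][label])


def backward(params, x, label):
    """Backward pass: chain rule from the loss back to every parameter."""
    W1, b1, W2, b2 = params
    z1, h, p = forward(params, x)
    # With softmax + cross-entropy, the gradient at the logits is p - one_hot.
    dz2 = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
    dW2 = [[d * hj for hj in h] for d in dz2]
    # Push the gradient through W2, then through ReLU (slope 1 if z > 0, else 0).
    dh = [sum(W2[k][j] * dz2[k] for k in range(len(dz2))) for j in range(len(h))]
    dz1 = [d if z > 0 else 0.0 for d, z in zip(dh, z1)]
    dW1 = [[d * xi for xi in x] for d in dz1]
    return [dW1, dz1, dW2, dz2]  # bias gradients equal dz1 and dz2


def sgd_step(params, grads, lr):
    """Gradient descent update: parameter -= lr * gradient (returns new lists)."""
    W1, b1, W2, b2 = params
    dW1, db1, dW2, db2 = grads

    def step_matrix(W, dW):
        return [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W, dW)]

    def step_vector(b, db):
        return [v - lr * g for v, g in zip(b, db)]

    return [step_matrix(W1, dW1), step_vector(b1, db1),
            step_matrix(W2, dW2), step_vector(b2, db2)]


params = init_params(n_in=4, n_hidden=3, n_out=3)
x, label = [0.5, -1.0, 2.0, 0.1], 2  # made-up example input and class
before = cross_entropy(params, x, label)
params_after = sgd_step(params, backward(params, x, label), lr=0.05)
after = cross_entropy(params_after, x, label)
print(f"loss before: {before:.4f}, after one step: {after:.4f}")
```

Try replacing the ReLU line with `h = z1` and setting `dz1 = dh` in `backward`. The network then has no hidden activation, and its output logits become a single linear function of the input, no matter how many such layers you add.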

What's next

Project: MNIST digit classifier + draw UI