Question 1 of 25
What does a perceptron compute before applying an activation?
Question 2 of 25
Why can a single perceptron not learn XOR?
Question 3 of 25
Why do we need activation functions in hidden layers?
Question 4 of 25
What is a key advantage of ReLU over sigmoid in hidden layers?
Question 5 of 25
Which activation is most commonly used to output a probability in binary classification?
Question 6 of 25
Forward propagation means:
Question 7 of 25
What does backpropagation actually compute?
Question 8 of 25
What is the vanishing gradient problem?
Question 9 of 25
For multi-class digit classification (0–9), which loss is standard?
Question 10 of 25
Loss is high but not decreasing. What is a sensible first check?
Question 11 of 25
A network has 784 inputs, 128 hidden units, and 10 outputs. Approximately how many weights connect the input layer to the hidden layer?
Question 12 of 25
Saturated sigmoid units (outputs near 0 or 1) in hidden layers contribute to vanishing gradients because:
Question 13 of 25
After the forward pass and loss computation, the weights are updated using:
Question 14 of 25
Adding a hidden layer with multiple neurons lets a network solve XOR because:
Question 15 of 25
Why use softmax + cross-entropy together on the output layer for MNIST?
Question 16 of 25
A perceptron without an activation on its output is limited because:
Question 17 of 25
The tanh activation outputs values in the range:
Question 18 of 25
In a 3-layer network (input → hidden → output), forward propagation computes:
Question 19 of 25
Backpropagation is efficient because it:
Question 20 of 25
Why is ReLU often preferred over sigmoid in hidden layers?
Question 21 of 25
MSE loss is commonly used for:
Question 22 of 25
If a network has no activation between linear layers, stacking many layers is equivalent to:
Question 23 of 25
A learning rate that is too high during neural network training often causes:
Question 24 of 25
The perceptron computes weighted sum + bias, then applies:
Question 25 of 25
For binary classification (spam vs ham), a common output activation + loss pair is: