Backpropagation is the heart of how neural networks learn. It’s an algorithm that fine-tunes the network’s parameters by calculating how much each parameter contributed to the overall error. Think of it like a chef tasting a dish, realizing it’s too salty, and figuring out exactly which ingredient to adjust.
At its core, training a neural network involves a cycle of four key steps (a minimal code sketch of this cycle follows the list):
- Forward Pass: The network takes an input and passes it through its layers to produce an output, or a prediction.
- Calculate Loss: The prediction is compared to the actual target value using a loss function, which quantifies how “wrong” the prediction was.
- Backward Pass (Backpropagation): The algorithm calculates the gradient of the loss function with respect to each weight and bias in the network. Each gradient measures how a small change in that parameter will affect the loss.
- Update Weights: The gradients are used by an optimization algorithm (like Gradient Descent) to update the parameters in the direction that will reduce the loss.
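To make the cycle concrete before we build everything by hand, here is a minimal sketch of those four steps using PyTorch (one of the frameworks mentioned in the conclusion). The architecture, loss, and hyperparameters are arbitrary choices for illustration, not a prescription:

import torch
import torch.nn as nn

# A tiny 2-2-1 network, mirroring the XOR example we build from scratch below
model = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid(), nn.Linear(2, 1), nn.Sigmoid())
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

for epoch in range(10000):
    prediction = model(X)          # 1. Forward pass
    loss = loss_fn(prediction, y)  # 2. Calculate loss
    optimizer.zero_grad()          # clear gradients from the previous iteration
    loss.backward()                # 3. Backward pass (backpropagation)
    optimizer.step()               # 4. Update weights

The framework handles steps 3 and 4 for us; the rest of this post shows what those two calls are actually doing.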
The Magic Ingredient: The Chain Rule
Backpropagation relies on a concept from calculus called the Chain Rule. It allows us to calculate the derivative of a composite function. In a neural network, each layer is a function of the previous layer, so the entire network is a deeply nested composite function.
Backpropagation starts at the output layer, calculates the gradient of the loss with respect to the final layer’s weights, and then works its way backward, layer by layer, calculating the gradients for all parameters. This “propagation” of the error backward is what gives the algorithm its name.
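To see how this plays out in the two-layer network we build below (the symbols here are my own shorthand, not identifiers from the code), write the hidden activation as a_1 = σ(W_1 x + b_1), the prediction as ŷ = σ(W_2 a_1 + b_2), and the loss as L(ŷ, y). The chain rule then factors each gradient into per-layer pieces:

\[
\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial W_2},
\qquad
\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial a_1} \cdot \frac{\partial a_1}{\partial W_1}
\]

The shared factor ∂L/∂ŷ is computed once at the output layer and reused for the layer before it; this reuse of already-computed pieces is what makes backpropagation efficient.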
Backpropagation in Action: A Python Example
Let’s implement a simple neural network from scratch using NumPy to see backpropagation at work. We’ll train it to solve the classic XOR problem.
import numpy as np
# Sigmoid activation function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Note: expects the sigmoid *output* a = sigmoid(z), since sigmoid'(z) = a * (1 - a)
    return x * (1 - x)
# Input dataset for XOR
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
# Output dataset
y = np.array([[0],[1],[1],[0]])
# Seed random numbers for consistency
np.random.seed(1)
# Initialize weights and biases with random values
input_layer_neurons = X.shape[1]
hidden_layer_neurons = 2
output_neurons = 1
hidden_weights = np.random.uniform(size=(input_layer_neurons, hidden_layer_neurons))
hidden_bias = np.random.uniform(size=(1, hidden_layer_neurons))
output_weights = np.random.uniform(size=(hidden_layer_neurons, output_neurons))
output_bias = np.random.uniform(size=(1, output_neurons))
learning_rate = 0.1
epochs = 10000
for i in range(epochs):
    # --- Forward Pass ---
    # Activate hidden layer
    hidden_layer_input = np.dot(X, hidden_weights) + hidden_bias
    hidden_layer_activation = sigmoid(hidden_layer_input)

    # Get predictions from output layer
    output_layer_input = np.dot(hidden_layer_activation, output_weights) + output_bias
    predicted_output = sigmoid(output_layer_input)

    # --- Backward Pass (Backpropagation) ---
    # Calculate the error at the output
    error = y - predicted_output

    # Gradient at the output layer: error scaled by the slope of the sigmoid
    d_predicted_output = error * sigmoid_derivative(predicted_output)

    # Propagate the error back through the output weights (chain rule),
    # then scale by the slope of the hidden layer's sigmoid
    error_hidden_layer = d_predicted_output.dot(output_weights.T)
    d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_activation)

    # --- Update weights and biases ---
    # Update output layer
    output_weights += hidden_layer_activation.T.dot(d_predicted_output) * learning_rate
    output_bias += np.sum(d_predicted_output, axis=0, keepdims=True) * learning_rate

    # Update hidden layer
    hidden_weights += X.T.dot(d_hidden_layer) * learning_rate
    hidden_bias += np.sum(d_hidden_layer, axis=0, keepdims=True) * learning_rate
print("Final predicted_output:")
print(predicted_output)

Code Breakdown
- Initialization: We define our network structure, the input and output data for XOR, and initialize the weights and biases with random values.
- Training Loop: We loop for a set number of epochs.
- Forward Pass: We calculate the network's output (predicted_output) for the given input X.
- Backward Pass: This is the backpropagation step.
  - We first calculate the error between our prediction and the true y values.
  - Then we compute the gradient (d_predicted_output) for the output layer.
  - Next, we propagate this error back to the hidden layer to calculate its gradient (d_hidden_layer).
- Update Parameters: We use the calculated gradients and the learning_rate to adjust the weights and biases of both layers.
After thousands of iterations, the network’s predictions will be very close to the actual target values for the XOR problem.
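As a quick sanity check (assuming the training run above converged), you can round the final sigmoid outputs and compare them against the targets; the rounded values should reproduce the XOR truth table:

# Round the final predictions to 0/1 and compare with the XOR targets
print(np.round(predicted_output))               # ideally [[0.], [1.], [1.], [0.]]
print(np.mean(np.round(predicted_output) == y)) # fraction correct, ideally 1.0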
Conclusion
While modern deep learning frameworks like TensorFlow and PyTorch automate this process, understanding what happens under the hood is crucial for any machine learning practitioner. Backpropagation, though mathematically intensive, is a clever and efficient algorithm that makes deep learning possible. By repeatedly adjusting its parameters based on the propagated error, a neural network can learn to solve incredibly complex tasks.



