Demystifying Backpropagation: The Core of Neural Network Training

Backpropagation is the heart of how neural networks learn. It’s an algorithm that fine-tunes the network’s parameters by calculating how much each parameter contributed to the overall error. Think of it like a chef tasting a dish, realizing it’s too salty, and figuring out exactly which ingredient to adjust.

At its core, training a neural network involves a cycle of four key steps, illustrated with a tiny single-weight example after the list:

  1. Forward Pass: The network takes an input and passes it through its layers to produce an output, or a prediction.
  2. Calculate Loss: The prediction is compared to the actual target value using a loss function, which quantifies how “wrong” the prediction was.
  3. Backward Pass (Backpropagation): The algorithm calculates the gradient of the loss function with respect to each weight and bias in the network. This gradient is a measure of how a small change in a parameter will affect the loss.
  4. Update Weights: The gradients are used by an optimization algorithm (like Gradient Descent) to update the parameters in the direction that will reduce the loss.
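
To make this cycle concrete before tackling a full network, here is a minimal, self-contained sketch: a single-weight linear model fit with plain gradient descent. The data and hyperparameters are made up purely for illustration.

import numpy as np

# Toy data: y = 3x, so a single weight w should converge toward 3
X = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * X

w = 0.0
learning_rate = 0.01

for _ in range(200):
    prediction = w * X                        # 1. forward pass
    loss = np.mean((prediction - y) ** 2)     # 2. calculate loss (MSE)
    grad = np.mean(2 * (prediction - y) * X)  # 3. backward pass: dLoss/dw
    w -= learning_rate * grad                 # 4. update the weight

print(w)  # should be close to 3.0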

The Magic Ingredient: The Chain Rule

Backpropagation relies on a concept from calculus called the Chain Rule, which lets us compute the derivative of a composite function by multiplying together the derivatives of its parts. In a neural network, each layer is a function of the previous layer’s output, so the entire network is one deeply nested composite function.

Backpropagation starts at the output layer, calculates the gradient of the loss with respect to the final layer’s weights, and then works its way backward, layer by layer, calculating the gradients for all parameters. This “propagation” of the error backward is what gives the algorithm its name.
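
To see the chain rule in miniature, consider a single “neuron” whose loss is L = (sigmoid(w * x) - y)^2. The sketch below (with illustrative values, separate from the XOR example that follows) multiplies the derivatives of the three nested pieces and checks the result against a finite-difference estimate.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w, x, y):
    # A composite function: linear step, then sigmoid, then squared error
    return (sigmoid(w * x) - y) ** 2

def loss_gradient(w, x, y):
    # Chain rule: dL/dw = dL/dpred * dpred/dz * dz/dw
    pred = sigmoid(w * x)
    dL_dpred = 2 * (pred - y)      # derivative of the squared error
    dpred_dz = pred * (1 - pred)   # derivative of the sigmoid
    dz_dw = x                      # derivative of the linear step
    return dL_dpred * dpred_dz * dz_dw

w, x, y = 0.5, 2.0, 1.0
eps = 1e-6
numerical = (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2 * eps)
print(loss_gradient(w, x, y), numerical)  # the two values should match closely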

Backpropagation in Action: A Python Example

Let’s implement a simple neural network from scratch using NumPy to see backpropagation at work. We’ll train it to solve the classic XOR problem.

import numpy as np

# Sigmoid activation function and its derivative.
# Note: sigmoid_derivative expects the sigmoid *output* (an already-activated
# value), which is why it computes x * (1 - x) instead of re-applying sigmoid.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)

# Input dataset for XOR
X = np.array([[0,0],
              [0,1],
              [1,0],
              [1,1]])

# Output dataset
y = np.array([[0],[1],[1],[0]])

# Seed random numbers for consistency
np.random.seed(1)

# Define the network architecture
input_layer_neurons = X.shape[1]   # 2 input features
hidden_layer_neurons = 2
output_neurons = 1

# Initialize weights and biases with small random values (uniform in [0, 1))
hidden_weights = np.random.uniform(size=(input_layer_neurons, hidden_layer_neurons))
hidden_bias = np.random.uniform(size=(1, hidden_layer_neurons))
output_weights = np.random.uniform(size=(hidden_layer_neurons, output_neurons))
output_bias = np.random.uniform(size=(1, output_neurons))

learning_rate = 0.1
epochs = 10000

for i in range(epochs):
    # --- Forward Pass ---
    # Activate hidden layer
    hidden_layer_input = np.dot(X, hidden_weights) + hidden_bias
    hidden_layer_activation = sigmoid(hidden_layer_input)

    # Get predictions from output layer
    output_layer_input = np.dot(hidden_layer_activation, output_weights) + output_bias
    predicted_output = sigmoid(output_layer_input)

    # --- Backward Pass (Backpropagation) ---
    # Error between targets and predictions. For a squared-error loss this is
    # (up to a constant factor) the negative gradient of the loss with respect
    # to the predictions, which is why the parameter updates below use +=.
    error = y - predicted_output

    # Gradient at the output layer: error scaled by the sigmoid derivative
    d_predicted_output = error * sigmoid_derivative(predicted_output)

    # Propagate the error back through the output weights to the hidden layer
    error_hidden_layer = d_predicted_output.dot(output_weights.T)
    d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_activation)

    # --- Update weights and biases ---
    # Update output layer
    output_weights += hidden_layer_activation.T.dot(d_predicted_output) * learning_rate
    output_bias += np.sum(d_predicted_output, axis=0, keepdims=True) * learning_rate

    # Update hidden layer
    hidden_weights += X.T.dot(d_hidden_layer) * learning_rate
    hidden_bias += np.sum(d_hidden_layer, axis=0, keepdims=True) * learning_rate

print("Final predicted_output:")
print(predicted_output)
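
Note that the script never computes the loss explicitly; the error term alone drives the updates. If you want to monitor training (step 2 of the cycle), a line like the following can be added inside the loop or run once afterwards. It assumes the script above has just executed.

mse = np.mean((y - predicted_output) ** 2)
print("Mean squared error:", mse)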

Code Breakdown

  1. Initialization: We define our network structure, input and output data for XOR, and initialize the weights and biases with random values.
  2. Training Loop: We loop for a set number of epochs.
  3. Forward Pass: We calculate the network’s output (predicted_output) for the given input X.
  4. Backward Pass: This is the backpropagation step.
    • We first calculate the error between our prediction and the true y values.
    • Then we compute the gradient (d_predicted_output) for the output layer.
    • Next, we propagate this error back to the hidden layer to calculate its gradient (d_hidden_layer).
  5. Update Parameters: We use the calculated gradients and the learning_rate to adjust the weights and biases of both layers.

After thousands of iterations, the network’s predictions should be close to the target values for the XOR problem: near 0 for the inputs [0,0] and [1,1], and near 1 for [0,1] and [1,0].
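
A quick way to confirm this (again assuming the training script has just run) is to round the predictions and compare them with the targets:

predicted_labels = predicted_output.round()
print(predicted_labels.flatten())           # expected: [0. 1. 1. 0.]
print(np.array_equal(predicted_labels, y))  # True once training has converged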

Conclusion

While modern deep learning frameworks like TensorFlow and PyTorch automate this process, understanding what happens under the hood is crucial for any machine learning practitioner. Backpropagation, though mathematically intensive, is a clever and efficient algorithm that makes deep learning possible. By repeatedly adjusting its parameters based on the propagated error, a neural network can learn to solve incredibly complex tasks.
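
For comparison, here is a rough sketch of the same XOR task in PyTorch, where loss.backward() runs backpropagation automatically via autograd. The architecture mirrors the NumPy example; the hyperparameters are illustrative and may need tuning for the network to converge.

import torch
import torch.nn as nn

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# Two-layer network (2 -> 2 -> 1) with sigmoid activations, as in the NumPy version
model = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid(), nn.Linear(2, 1), nn.Sigmoid())
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
loss_fn = nn.MSELoss()

for _ in range(10000):
    prediction = model(X)          # forward pass
    loss = loss_fn(prediction, y)  # calculate loss
    optimizer.zero_grad()
    loss.backward()                # backward pass: autograd computes all gradients
    optimizer.step()               # update weights

print(model(X))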
