The Bias-Variance Tradeoff: A Balancing Act in Machine Learning

In supervised machine learning, our goal is to build a model that can make accurate predictions on new, unseen data. The prediction error for any machine learning algorithm can be broken down into three parts: Bias Error, Variance Error, and Irreducible Error.

  • Irreducible Error: This is the noise that is inherent in the problem itself. It cannot be reduced by any model.
  • Bias Error: The error introduced by the simplifying assumptions a model makes to render the target function easier to learn.
  • Variance Error: The amount by which the estimate of the target function would change if a different training dataset were used.

Understanding the relationship between bias and variance is crucial for diagnosing model performance and avoiding overfitting and underfitting.
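
For squared-error loss, this decomposition can be stated precisely. Writing f for the true function and f̂ for our trained model, the expected prediction error at a point x (averaged over possible training sets) is:

E[(y − f̂(x))²] = (E[f̂(x)] − f(x))² + E[(f̂(x) − E[f̂(x)])²] + σ²
               = Bias² + Variance + Irreducible Error

The σ² term is the variance of the noise and puts a floor under the error of any model; the first two terms are the ones we can trade against each other.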

Understanding Bias

Bias is the difference between the average prediction of our model and the correct value we are trying to predict. A model with high bias pays little attention to the training data and oversimplifies the underlying relationship, leading to high error on both the training and test data. This is underfitting.

  • High Bias: A very simple model, like a linear regression model trying to fit a complex, non-linear relationship.
  • Low Bias: A model that makes fewer assumptions about the target function, like a deep neural network.

Understanding Variance

Variance refers to a model’s sensitivity to small fluctuations in the training data. A model with high variance pays too much attention to the training data and does not generalize well to new data. It learns the noise in the training set, leading to strong performance on the training set but poor performance on the test set. This is overfitting. Both quantities can be estimated numerically, as the sketch after this list shows.

  • High Variance: A very complex model, like a very deep decision tree.
  • Low Variance: A simple model whose predictions don’t change much if the training data is modified slightly.
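
To make both definitions concrete, here is a minimal sketch that estimates them empirically: we repeatedly draw fresh training sets from the same noisy cosine process used later in this post, refit the same simple model each time, and measure how far the average prediction sits from the true function (bias) and how much individual predictions scatter around that average (variance). The constants here are illustrative, not canonical.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def true_fun(X):
    return np.cos(1.5 * np.pi * X)

n_repeats, n_samples = 200, 30
x_eval = np.linspace(0, 1, 50)  # fixed points where predictions are compared
preds = np.empty((n_repeats, len(x_eval)))

# Refit the same simple model on many independently drawn training sets
for r in range(n_repeats):
    X = rng.random(n_samples)
    y = true_fun(X) + rng.normal(scale=0.1, size=n_samples)
    model = LinearRegression().fit(X[:, None], y)
    preds[r] = model.predict(x_eval[:, None])

# Bias: how far the average prediction sits from the true function
bias_sq = np.mean((preds.mean(axis=0) - true_fun(x_eval)) ** 2)
# Variance: how much predictions scatter around their own average
variance = np.mean(preds.var(axis=0))

print(f"squared bias ~ {bias_sq:.4f}, variance ~ {variance:.4f}")

For a straight-line model on a cosine, the squared bias dominates; swap in a high-degree polynomial pipeline and the balance flips toward variance.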

The Tradeoff

Here is the crux of the issue:

  • Increasing a model’s complexity tends to decrease its bias.
  • Increasing a model’s complexity tends to increase its variance.

This is the bias-variance tradeoff, and it’s a balancing act: we can’t drive both bias and variance to zero at once, because reducing one tends to increase the other. An ideal model is complex enough to capture the underlying structure of the data (low bias), but not so complex that it starts modeling the noise (low variance).

[Diagram: the bias-variance tradeoff — as model complexity increases, bias falls while variance rises, and total error is minimized somewhere in between]

The goal is to find a model in that “sweet spot” in the middle, where the total error is minimized.
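
One practical way to look for that sweet spot is to sweep model complexity and let cross-validation trace out the total-error curve. Here is a minimal sketch using the same synthetic cosine setup as the example in the next section; the range of degrees is arbitrary.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = np.sort(rng.random(30))
y = np.cos(1.5 * np.pi * X) + rng.normal(scale=0.1, size=30)

# Cross-validated MSE per degree: expect it to fall while bias shrinks,
# then rise again once variance takes over
for degree in range(1, 13):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X[:, None], y,
                           scoring="neg_mean_squared_error", cv=5).mean()
    print(f"degree {degree:2d}: CV MSE = {mse:.4f}")

The printed MSE typically falls as the degree grows, bottoms out at a moderate degree, and then climbs again as variance takes over.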

Visualizing the Tradeoff

Let’s use Python to visualize how model complexity affects bias and variance. We’ll fit polynomial regression models of different degrees to a synthetic dataset. A low-degree polynomial is a simple model (high bias, low variance), while a high-degree polynomial is a complex model (low bias, high variance).

import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# The underlying function the models will try to recover
def true_fun(X):
    return np.cos(1.5 * np.pi * X)

np.random.seed(0)

n_samples = 30
degrees = [1, 4, 15]

# Noisy samples drawn from the true function
X = np.sort(np.random.rand(n_samples))
y = true_fun(X) + np.random.randn(n_samples) * 0.1

plt.figure(figsize=(14, 5))
for i in range(len(degrees)):
    ax = plt.subplot(1, len(degrees), i + 1)
    plt.setp(ax, xticks=(), yticks=())

    # Model complexity is controlled by the polynomial degree
    polynomial_features = PolynomialFeatures(degree=degrees[i], include_bias=False)
    linear_regression = LinearRegression()
    pipeline = Pipeline([("polynomial_features", polynomial_features),
                         ("linear_regression", linear_regression)])
    pipeline.fit(X[:, np.newaxis], y)

    # Evaluate the models using cross-validation
    scores = cross_val_score(pipeline, X[:, np.newaxis], y,
                             scoring="neg_mean_squared_error", cv=10)

    X_test = np.linspace(0, 1, 100)
    plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model")
    plt.plot(X_test, true_fun(X_test), label="True function")
    plt.scatter(X, y, edgecolor='b', s=20, label="Samples")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.xlim((0, 1))
    plt.ylim((-2, 2))
    plt.legend(loc="best")
    plt.title("Degree {}\nMSE = {:.2e} (+/- {:.2e})".format(
        degrees[i], -scores.mean(), scores.std()))

plt.show()

What the Code Visualizes

  1. Degree 1 (High Bias): The straight line is too simple to capture the underlying cosine function. It underfits the data. It has high bias and low variance.
  2. Degree 4 (Good Balance): This model fits the data well and is close to the true function. It has found a good balance between bias and variance.
  3. Degree 15 (High Variance): This model contorts itself to pass through nearly every data point, fitting the noise in the training data rather than the signal. It has very low bias but extremely high variance. It overfits the data and would not generalize well.

The cross-validated Mean Squared Error (MSE) reported in each subplot’s title confirms the picture: the degree-4 model achieves the lowest error, making it the best fit of the three.

Conclusion

The bias-variance tradeoff is a core concept that helps us understand the behavior of machine learning models. Techniques like cross-validation help us estimate the generalization error of a model, while regularization techniques (like Ridge and Lasso) are explicitly designed to manage this tradeoff by penalizing model complexity, thus reducing variance at the cost of a slight increase in bias.
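
As a closing sketch of that last point, here is one illustration of regularization at work: the same over-complex degree-15 polynomial from the example above, but fit with Ridge instead of plain least squares. The alpha values are arbitrary; the point is only the direction of the effect.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = np.sort(rng.random(30))
y = np.cos(1.5 * np.pi * X) + rng.normal(scale=0.1, size=30)

# The same over-complex degree-15 model, with increasing Ridge penalties;
# larger alpha shrinks the coefficients, trading a little bias for less variance
for alpha in [1e-6, 1e-3, 1e-1, 10.0]:
    model = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=alpha))
    mse = -cross_val_score(model, X[:, None], y,
                           scoring="neg_mean_squared_error", cv=5).mean()
    print(f"alpha = {alpha:g}: CV MSE = {mse:.4f}")

A well-chosen penalty often brings the cross-validated error of the over-complex model close to that of the well-balanced one, by suppressing the variance that an unpenalized degree-15 fit would otherwise introduce.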
