In supervised machine learning, our goal is to build a model that makes accurate predictions on new, unseen data. The prediction error for any machine learning algorithm can be broken down into three parts: Bias Error, Variance Error, and Irreducible Error. Each is defined below, and the three combine in the formal decomposition that follows the list.
- Irreducible Error: This is the noise that is inherent in the problem itself. It cannot be reduced by any model.
- Bias Error: These are the simplifying assumptions made by a model to make the target function easier to learn.
- Variance Error: This is the amount by which the estimate of the target function would change if different training data were used.
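For squared-error loss, this decomposition can be written out exactly. Writing $f$ for the true function, $\hat{f}$ for the learned model (with expectations taken over random training sets), and $\sigma^2$ for the noise variance:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}} + \underbrace{\sigma^2}_{\text{Irreducible}}
$$

Only the last term is beyond our control; the first two are properties of the learning procedure and can be traded against each other.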
Understanding the relationship between bias and variance is crucial for diagnosing model performance and avoiding overfitting and underfitting.
Understanding Bias
Bias is the difference between the average prediction of our model and the correct value we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the underlying relationship, which leads to high error on both the training and test data. This is underfitting; a short demonstration follows the examples below.
- High Bias: A very simple model, like a linear regression model trying to fit a complex, non-linear relationship.
- Low Bias: A model that makes fewer assumptions about the target function, like a deep neural network.
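To make the underfitting signature concrete, here is a minimal sketch on a synthetic cosine dataset (the data setup is my own choice, mirroring the example later in this post). With high bias, training and test error are both high and close together:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.rand(200, 1)
y = np.cos(1.5 * np.pi * X).ravel() + rng.randn(200) * 0.1
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A straight line cannot capture the cosine shape, so both errors are
# high and similar: the signature of high bias (underfitting).
model = LinearRegression().fit(X_train, y_train)
print("train MSE:", mean_squared_error(y_train, model.predict(X_train)))
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))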
Understanding Variance
Variance refers to the model’s sensitivity to small fluctuations in the training data. A model with high variance pays too much attention to the training data and does not generalize well to new data. It learns the noise in the training set, leading to high performance on the training set but poor performance on the test set. This is overfitting; see the sketch after this list.
- High Variance: A very complex model, like a very deep decision tree.
- Low Variance: A simple model whose predictions don’t change much if the training data is modified slightly.
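Continuing the sketch from the bias section above, an unconstrained decision tree shows the opposite signature: near-zero training error but much worse test error.

from sklearn.tree import DecisionTreeRegressor

# A fully grown tree memorizes the training set (near-zero training
# error) but its predictions chase the noise: high variance
# (overfitting), visible as a large train/test gap.
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
print("train MSE:", mean_squared_error(y_train, tree.predict(X_train)))
print("test MSE:", mean_squared_error(y_test, tree.predict(X_test)))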
The Tradeoff
Here is the crux of the issue:
- Increasing a model’s complexity will decrease its bias.
- Increasing a model’s complexity will increase its variance.
This is the bias-variance tradeoff. It’s a balancing act: with a fixed dataset, pushing bias down by adding complexity tends to push variance up, and vice versa, so we cannot drive both to zero at once. An ideal model is complex enough to capture the underlying structure of the data (low bias), but not so complex that it starts modeling the noise (low variance).

The goal is to find a model in that “sweet spot” in the middle, where the total error is minimized.
Visualizing the Tradeoff
Let’s use Python to visualize how model complexity affects bias and variance. We’ll fit polynomial regression models of different degrees to a synthetic dataset. A low-degree polynomial is a simple model (high bias, low variance), while a high-degree polynomial is a complex model (low bias, high variance).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def true_fun(X):
    return np.cos(1.5 * np.pi * X)

np.random.seed(0)
n_samples = 30
degrees = [1, 4, 15]

X = np.sort(np.random.rand(n_samples))
y = true_fun(X) + np.random.randn(n_samples) * 0.1

plt.figure(figsize=(14, 5))
for i in range(len(degrees)):
    ax = plt.subplot(1, len(degrees), i + 1)
    plt.setp(ax, xticks=(), yticks=())

    polynomial_features = PolynomialFeatures(degree=degrees[i], include_bias=False)
    linear_regression = LinearRegression()
    pipeline = Pipeline([("polynomial_features", polynomial_features),
                         ("linear_regression", linear_regression)])
    pipeline.fit(X[:, np.newaxis], y)

    # Estimate generalization error with 10-fold cross-validation
    scores = cross_val_score(pipeline, X[:, np.newaxis], y,
                             scoring="neg_mean_squared_error", cv=10)

    # Plot the fitted model against the true function and the samples
    X_test = np.linspace(0, 1, 100)
    plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model")
    plt.plot(X_test, true_fun(X_test), label="True function")
    plt.scatter(X, y, edgecolor='b', s=20, label="Samples")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.xlim((0, 1))
    plt.ylim((-2, 2))
    plt.legend(loc="best")
    plt.title("Degree {}\nMSE = {:.2e}(+/- {:.2e})".format(
        degrees[i], -scores.mean(), scores.std()))
plt.show()
What the Code Visualizes
- Degree 1 (High Bias): The straight line is too simple to capture the underlying cosine function. It underfits the data. It has high bias and low variance.
- Degree 4 (Good Balance): This model fits the data well and is close to the true function. It has found a good balance between bias and variance.
- Degree 15 (High Variance): This model bends to chase individual training points, fitting the noise in the sample rather than the underlying function. It has very low bias but extremely high variance. It overfits the data and would not generalize well.
The Mean Squared Error (MSE) reported for each model shows that the model with degree 4 has the lowest error, confirming it’s the best fit among the three.
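To locate the sweet spot directly rather than eyeballing three panels, we can sweep every degree and keep the one with the lowest cross-validated MSE. A minimal sketch, reusing X, y, and the imports from the code above:

# Sweep polynomial degree and track cross-validated error to locate
# the sweet spot between underfitting and overfitting.
cv_mse = []
for degree in range(1, 16):
    model = Pipeline([
        ("poly", PolynomialFeatures(degree=degree, include_bias=False)),
        ("lr", LinearRegression()),
    ])
    scores = cross_val_score(model, X[:, np.newaxis], y,
                             scoring="neg_mean_squared_error", cv=10)
    cv_mse.append(-scores.mean())

best_degree = int(np.argmin(cv_mse)) + 1
print("degree with lowest CV MSE:", best_degree)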
Conclusion
The bias-variance tradeoff is a core concept that helps us understand the behavior of machine learning models. Techniques like cross-validation help us estimate the generalization error of a model, while regularization techniques (like Ridge and Lasso) are explicitly designed to manage this tradeoff by penalizing model complexity, thus reducing variance at the cost of a slight increase in bias.
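As a closing sketch of that last point, a ridge penalty can be bolted onto the high-degree model from the visualization above. This reuses X, y, and the imports from earlier; the penalty strength alpha = 1e-3 is an arbitrary illustrative choice, not a tuned value:

from sklearn.linear_model import Ridge

# Degree-15 features as before, but with an L2 penalty shrinking the
# coefficients: a little extra bias in exchange for much less variance.
# In practice alpha would itself be tuned by cross-validation.
ridge_pipeline = Pipeline([
    ("polynomial_features", PolynomialFeatures(degree=15, include_bias=False)),
    ("ridge", Ridge(alpha=1e-3)),
])
scores = cross_val_score(ridge_pipeline, X[:, np.newaxis], y,
                         scoring="neg_mean_squared_error", cv=10)
print("degree-15 ridge CV MSE: {:.2e}".format(-scores.mean()))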



