Finding the Sweet Spot: An Introduction to Hyperparameter Tuning

When we train a machine learning model, there are two types of parameters: those that are learned from the data (like the weights in a linear regression), and those that are set by the data scientist before training begins. These external, user-set parameters are called hyperparameters.

Examples of hyperparameters include:

  • The regularization strength (alpha or lambda) in Ridge or Lasso regression.
  • The number of trees (n_estimators) in a Random Forest.
  • The number of clusters (k) in K-Means.
  • The number of layers and neurons in a neural network.

The performance of a model can be critically dependent on the choice of these hyperparameters. The process of finding the optimal combination of hyperparameters is called hyperparameter tuning.
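
To make the distinction concrete, here is a minimal sketch using Ridge regression: alpha is a hyperparameter we choose up front, while coef_ and intercept_ are parameters the model learns from the data during fit.

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Toy regression data
X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=42)

# alpha is a hyperparameter: we pick it before training
model = Ridge(alpha=1.0)

# coef_ and intercept_ are parameters: the model learns them from the data
model.fit(X, y)
print(model.coef_, model.intercept_)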

Common Tuning Strategies

How do we find the best settings? We can’t know them ahead of time. The solution is to try many different combinations and see what works best, using a validation set to evaluate performance.
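
Each candidate combination is scored on data the model was not trained on; in scikit-learn, cross_val_score is a simple way to do this, and conceptually it is what the search tools below repeat for every candidate. A minimal sketch of scoring a single candidate setting:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, n_redundant=0, random_state=42)

# Score one candidate setting with 3-fold cross-validation
candidate = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
scores = cross_val_score(candidate, X, y, cv=3)
print(scores.mean())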

Grid Search is the most traditional method. You define a “grid” of specific hyperparameter values you want to try. The algorithm then exhaustively trains and evaluates a model for every possible combination of these values.

  • Pros: It’s guaranteed to find the best combination within the grid you specify.
  • Cons: It can be incredibly slow and computationally expensive, because the number of combinations multiplies with every hyperparameter (and every value) you add, a combinatorial explosion often likened to the “curse of dimensionality.” The sketch below shows how quickly this grows.
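
To see how quickly a grid grows, here is a quick sketch with an illustrative (made-up) grid, simply counting the combinations with itertools:

from itertools import product

# An illustrative grid: 4 x 3 x 3 x 5 values
param_grid = {
    'n_estimators': [50, 100, 200, 400],
    'max_depth': [10, 20, None],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2', 0.3, 0.5, 0.8],
}

combinations = list(product(*param_grid.values()))
print(len(combinations))  # 180 models, before multiplying by the number of CV folds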

Random Search offers a more efficient alternative. Instead of trying every single combination, it randomly samples a fixed number of combinations from the hyperparameter space.

  • Pros: It’s much faster than Grid Search. Research (notably Bergstra and Bengio, 2012) has shown that Random Search is often more effective because some hyperparameters matter far more than others, and random sampling tries many more distinct values of those important parameters, making it more likely to find a good setting for them.
  • Cons: It’s not guaranteed to find the absolute best combination.
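
RandomizedSearchCV generates its candidates with scikit-learn’s ParameterSampler; here is a minimal sketch of that sampling step on its own, using an illustrative search space:

from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# An illustrative search space: distributions instead of fixed lists
param_dist = {
    'n_estimators': randint(50, 250),
    'max_depth': randint(5, 30),
}

# Draw 5 random combinations; each one would then be trained and evaluated
for params in ParameterSampler(param_dist, n_iter=5, random_state=42):
    print(params)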

More advanced techniques like Bayesian Optimization exist, which use the results from previous iterations to intelligently choose the next set of hyperparameters to try.
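
Scikit-learn does not ship a Bayesian optimizer itself, but third-party libraries such as Optuna implement this idea (its default sampler builds a probabilistic model of past trials to pick the next candidate). As a rough sketch, assuming Optuna is installed (pip install optuna), tuning a Random Forest might look like this:

import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, n_redundant=0, random_state=42)

def objective(trial):
    # Optuna suggests the next values to try based on the results of earlier trials
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 250),
        'max_depth': trial.suggest_int('max_depth', 5, 30),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 5),
    }
    model = RandomForestClassifier(random_state=42, **params)
    return cross_val_score(model, X, y, cv=3).mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=18)
print(study.best_params)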

Hyperparameter Tuning with Scikit-Learn

scikit-learn provides excellent tools for both Grid Search (GridSearchCV) and Random Search (RandomizedSearchCV). Let’s see how to use them to tune a RandomForestClassifier.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import randint

# --- 1. Generate Data ---
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, n_redundant=0,
                           random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# --- 2. Grid Search Example ---
print("--- Starting Grid Search ---")
# Define the grid of parameters to search
param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [10, 20, None],
    'min_samples_leaf': [1, 2, 4]
}

# Total combinations: 2 * 3 * 3 = 18

# Instantiate the grid search model
# cv=3 means 3-fold cross-validation
grid_search = GridSearchCV(estimator=RandomForestClassifier(random_state=42),
                           param_grid=param_grid,
                           cv=3, n_jobs=-1, verbose=1)

grid_search.fit(X_train, y_train)
print(f"Best parameters found by Grid Search: {grid_search.best_params_}")


# --- 3. Random Search Example ---
print("\\n--- Starting Random Search ---")
# Define the distribution of parameters to sample from
param_dist = {
    'n_estimators': randint(50, 250),
    'max_depth': randint(5, 30),
    'min_samples_leaf': randint(1, 5)
}

# n_iter=18 means we will try 18 random combinations
random_search = RandomizedSearchCV(estimator=RandomForestClassifier(random_state=42),
                                   param_distributions=param_dist,
                                   n_iter=18, cv=3, n_jobs=-1, verbose=1, random_state=42)

random_search.fit(X_train, y_train)
print(f"Best parameters found by Random Search: {random_search.best_params_}")

# You can access the best model directly
best_model = random_search.best_estimator_
print(f"\\nBest model accuracy on test set: {best_model.score(X_test, y_test):.4f}")

What the Code Does

  1. Grid Search: We define a param_grid with specific values for n_estimators, max_depth, and min_samples_leaf. GridSearchCV will train a model for all 18 possible combinations, using 3-fold cross-validation for each one to ensure robustness (54 model fits in total).
  2. Random Search: We define a param_dist using probability distributions from scipy.stats. For example, randint(50, 250) will randomly sample integers from 50 up to (but not including) 250. We set n_iter=18 to make it comparable to the grid search in terms of the number of models trained.
  3. Best Model: Both search objects have a best_params_ attribute that shows the optimal combination found, and a best_estimator_ attribute that gives you the model trained with those parameters, which you can then use for prediction.
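
Both search objects also expose a cv_results_ dictionary containing the score of every combination tried. If you want to look beyond the single winner, a convenient way (assuming pandas is available) is to load it into a DataFrame, reusing the grid_search object fitted above:

import pandas as pd

# Every combination, its mean cross-validated score, and its rank
results = pd.DataFrame(grid_search.cv_results_)
print(results[['params', 'mean_test_score', 'rank_test_score']]
      .sort_values('rank_test_score')
      .head())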

Conclusion

Hyperparameter tuning is a critical step for maximizing the performance of your machine learning models. While Grid Search is a solid, exhaustive approach, Random Search often provides a better balance between computation time and performance. By systematically exploring different hyperparameter settings, you can move from a good baseline model to a highly optimized one that is tailored to the specific characteristics of your dataset.
