Support Vector Machines (SVMs) are a powerful class of supervised learning models. The core idea behind SVMs is to find the optimal hyperplane that best separates the data points of different classes in a high-dimensional space.
The Maximum Margin Classifier
Imagine you have data points for two different classes on a 2D plane. You could draw many possible lines to separate them. Which one is the best?
An SVM answers this by finding the line (or, in higher dimensions, the hyperplane) that has the largest possible margin between the two classes. The margin is the distance between the hyperplane and the nearest data points from either class. The data points that lie right on the edge of this margin are called the support vectors—they are the critical elements of the dataset that “support” the hyperplane.
By maximizing the margin, the SVM creates a decision boundary that is as robust as possible, which often leads to better generalization performance on unseen data. This is why SVMs are often called maximum margin classifiers.
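You can see this directly once a model is trained. Here is a minimal sketch (it uses scikit-learn, which we will return to below, on a handful of made-up 2D points) that fits a linear SVM and prints the points it selected as support vectors:

import numpy as np
from sklearn import svm

# Tiny, made-up dataset: two well-separated clusters of points.
X = np.array([[1, 2], [2, 3], [3, 3],    # class 0
              [6, 5], [7, 8], [8, 6]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)  # the points that pin down the margin
print(clf.support_)          # their indices in the training set

Only these points determine where the hyperplane sits; moving any of the other points around (without crossing the margin) would leave the decision boundary unchanged.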
The Kernel Trick: Handling Non-Linear Data
What if the data isn’t linearly separable? You can’t draw a straight line to separate the classes. This is where the most powerful feature of SVMs comes into play: the kernel trick.
The kernel trick allows SVMs to perform non-linear classification. It works by projecting the data into a higher-dimensional space where it is linearly separable.
Imagine you have data points on a 1D line that cannot be separated by a single threshold. You could project them into 2D, for example by mapping each value x to the pair (x, x²). In this new, higher-dimensional space, you may well be able to draw a straight line that separates them.
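To make this concrete, here is a tiny hand-built example of that x → (x, x²) projection (the values are invented purely for illustration): on the original line the positive class sits on both sides of the negative class, so no single threshold works, but after the projection one horizontal cut separates the classes.

import numpy as np

# Toy 1D data: class 1 surrounds class 0,
# so no threshold on x alone can separate them.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([   1,    1,    0,   0,   0,   1,   1])

# Project into 2D with the mapping x -> (x, x^2).
X_2d = np.column_stack([x, x ** 2])

# In the new space, the horizontal line x^2 = 2.5 splits the classes.
predictions = (X_2d[:, 1] > 2.5).astype(int)
print(predictions)               # [1 1 0 0 0 1 1]
print((predictions == y).all())  # True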
Kernels are functions that calculate the dot product between pairs of data points in this higher-dimensional space without ever actually transforming the data. This is incredibly efficient. Common kernels include:
- Linear Kernel: For linearly separable data.
- Polynomial Kernel: For data with polynomial relationships.
- Radial Basis Function (RBF) Kernel: A very popular and flexible kernel that can handle complex, non-linear relationships.
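To see what a kernel actually computes, the short sketch below (the two points and the gamma value are chosen arbitrarily) evaluates the RBF kernel K(a, b) = exp(−γ‖a − b‖²) by hand and checks the result against scikit-learn's rbf_kernel helper:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two example points (values picked arbitrarily for illustration).
a = np.array([[1.0, 2.0]])
b = np.array([[2.0, 0.5]])
gamma = 0.7

# RBF kernel by hand: K(a, b) = exp(-gamma * ||a - b||^2)
manual = np.exp(-gamma * np.sum((a - b) ** 2))

# The same quantity via scikit-learn.
from_sklearn = rbf_kernel(a, b, gamma=gamma)[0, 0]

print(manual, from_sklearn)  # the two values should match

The single number returned is the dot product of a and b in the kernel's implicit, very high-dimensional feature space, computed without ever constructing that space.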
SVMs in Scikit-Learn
Let’s use scikit-learn to visualize the decision boundaries of SVMs with different kernels.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
# --- 1. Load the Iris dataset ---
# We'll use the first two features for visualization
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
# --- 2. Create SVM models with different kernels ---
C = 1.0 # SVM regularization parameter
models = (svm.SVC(kernel='linear', C=C),
          svm.SVC(kernel='rbf', gamma=0.7, C=C),
          svm.SVC(kernel='poly', degree=3, C=C))
models = [clf.fit(X, y) for clf in models]

# --- 3. Plot the decision boundaries ---
def plot_decision_boundary(clf, title, ax):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                         np.arange(y_min, y_max, 0.02))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    ax.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
    ax.set_title(title)

titles = ('SVC with linear kernel',
          'SVC with RBF kernel',
          'SVC with polynomial (degree 3) kernel')

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for clf, title, ax in zip(models, titles, axes.flatten()):
    plot_decision_boundary(clf, title, ax)
plt.show()

What the Code Does
- Data: We load the classic Iris dataset, which has three classes of flowers, but we only use the first two features so we can visualize the results in 2D.
- Models: We create three different SVC (Support Vector Classifier) models: one with a linear kernel, one with an rbf kernel, and one with a poly kernel. C is the regularization parameter: it controls the tradeoff between classifying the training points correctly and keeping the margin wide, so a smaller C creates a wider margin but may misclassify more training points. gamma is a parameter of the RBF kernel that defines how much influence a single training example has. degree is the degree of the polynomial kernel. In practice these values are usually tuned rather than set by hand; see the sketch after this list.
- Plotting: We create a function to plot the decision boundary for each trained classifier. The different colored regions show how each model would classify a new data point in that area. You can see how the linear kernel produces a straight line, while the RBF and polynomial kernels produce complex, non-linear boundaries.
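Because the best C and gamma depend heavily on the dataset, they are usually chosen by cross-validation rather than fixed by hand. Here is a brief sketch (the grid values below are arbitrary examples, not recommendations) using scikit-learn's GridSearchCV on the same two Iris features:

from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

# Cross-validated search over a small, illustrative grid of values.
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.1, 0.7, 1.0]}
search = GridSearchCV(svm.SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # the best (C, gamma) combination found
print(search.best_score_)   # its mean cross-validated accuracy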
Conclusion
Support Vector Machines are a robust and versatile class of models that are effective in high-dimensional spaces and are memory efficient because they only use a subset of training points (the support vectors) in the decision function. While they have been somewhat surpassed by tree-based ensembles like XGBoost for tabular data and by neural networks for perceptual data (images, audio), they are still a powerful tool to have in your machine learning arsenal, especially for small to medium-sized datasets with many features.



