A machine learning model that isn’t deployed in production is just a research project. The real value of machine learning is realized when it’s making predictions on live data and influencing business decisions. But getting a model from a Jupyter Notebook to a stable, production environment is a major challenge. This is where MLOps comes in.
MLOps is to machine learning what DevOps is to software engineering. It’s a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. It combines machine learning, data engineering, and DevOps.
The Machine Learning Lifecycle
MLOps covers the entire lifecycle of a machine learning model:
- Data Ingestion & Versioning: Sourcing, cleaning, and tracking datasets.
- Model Training & Tuning: Experimenting with different models and hyperparameters.
- Model Versioning: Storing and tracking trained models as artifacts (see the versioning sketch after this list).
- Model Deployment: Serving the model so that other applications can use it to make predictions.
- Model Monitoring: Watching the model’s performance in production.
- Retraining: Automatically retraining the model on new data when its performance degrades.
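To make the versioning step concrete, here is a minimal sketch of saving a trained model as a timestamped artifact with a small metadata record alongside it. The file names, version string, and metadata fields are illustrative choices rather than a prescribed format; dedicated model registries handle this far more robustly.

import json
import time
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a toy model (a stand-in for a real training pipeline)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
model = LogisticRegression(solver='liblinear').fit(X, y)

# Save the artifact under a timestamped version
version = time.strftime("%Y%m%d-%H%M%S")
artifact_path = f"model-{version}.pkl"
joblib.dump(model, artifact_path)

# Record metadata describing how the artifact was produced
metadata = {
    "version": version,
    "artifact": artifact_path,
    "n_features": X.shape[1],
    "training_accuracy": model.score(X, y),
}
with open(f"model-{version}.json", "w") as f:
    json.dump(metadata, f, indent=2)

print(f"Saved {artifact_path}")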
Deployment Strategies
Once you have a trained model, how do you make it available to users?
- Online/Real-time Inference: The model is deployed as an API endpoint. An application can send a request with a single data point and get a prediction back in real-time. This is common for interactive applications.
- Batch Inference: The model runs on a schedule (e.g., once a day) and processes a large batch of data at once, storing the predictions in a database. This is common for non-time-critical tasks like generating daily reports; a minimal sketch of such a job follows below.
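As a rough illustration of the batch pattern, the sketch below loads a saved model, scores a file of new records in one pass, and writes the predictions back out. The file names and the assumption that every column in the input file is a model feature are hypothetical; a real scheduled job would more likely read from and write to a database or data warehouse.

import joblib
import pandas as pd

# Load the trained model artifact produced by the training pipeline
model = joblib.load("model.pkl")

# Read today's batch of records (illustrative file name)
batch = pd.read_csv("daily_records.csv")

# Score the entire batch in one call and attach the predictions
batch["prediction"] = model.predict(batch.values)

# Persist the results for downstream consumers (reports, dashboards, etc.)
batch.to_csv("daily_predictions.csv", index=False)
print(f"Scored {len(batch)} records")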
Monitoring: The Forgotten Essential
A model’s performance is not static. The real world changes, and a model that was accurate last month might not be accurate today. This is called model drift or concept drift.
Effective MLOps involves continuous monitoring to detect these issues:
- Data Drift: Is the statistical distribution of the live data different from the training data? For example, a loan approval model trained on data from one country will likely see a shifted input distribution once it starts receiving applications from another country. (A minimal drift check is sketched below.)
- Performance Degradation: Is the model’s accuracy (or other key metric) dropping over time?
- Infrastructure Health: Is the API endpoint up and running? Is it responding quickly?
When drift is detected, it’s a signal that the model needs to be retrained on more recent data.
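One simple way to check for data drift is to compare the distribution of each feature in recent live data against the training data, for example with a two-sample Kolmogorov-Smirnov test. The sketch below takes that approach; the function name, the 0.05 p-value threshold, and the synthetic data are illustrative, and production systems typically lean on dedicated monitoring tools rather than hand-rolled checks.

import numpy as np
from scipy.stats import ks_2samp

def detect_drift(train_data, live_data, threshold=0.05):
    """Return indices of features whose live distribution differs from training.

    Runs a two-sample Kolmogorov-Smirnov test per feature; a p-value below
    the threshold suggests the two distributions have diverged.
    """
    drifted = []
    for i in range(train_data.shape[1]):
        _, p_value = ks_2samp(train_data[:, i], live_data[:, i])
        if p_value < threshold:
            drifted.append(i)
    return drifted

# Toy illustration: feature 0 of the "live" data has shifted
rng = np.random.default_rng(42)
train = rng.normal(0, 1, size=(1000, 3))
live = rng.normal(0, 1, size=(500, 3))
live[:, 0] += 2.0  # simulate drift in one feature

print("Drifted feature indices:", detect_drift(train, live))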
A Conceptual MLOps Workflow with a REST API
Let’s look at a conceptual example of how you might serve a scikit-learn model using Flask, a simple Python web framework.
import joblib
from flask import Flask, request, jsonify
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
# --- 1. Train and Save a Model (Offline Step) ---
# In a real project, this would be a separate script.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
model = LogisticRegression(solver='liblinear')
model.fit(X, y)
# Save the trained model to a file
joblib.dump(model, 'model.pkl')
print("Model trained and saved to model.pkl")
# --- 2. Build a Flask API to Serve the Model ---
app = Flask(__name__)
# Load the model when the app starts
try:
    production_model = joblib.load('model.pkl')
    print("Model loaded successfully.")
except FileNotFoundError:
    production_model = None
    print("Model file not found. API will not be able to make predictions.")

@app.route('/predict', methods=['POST'])
def predict():
    if production_model is None:
        return jsonify({'error': 'Model not loaded'}), 500

    # Get the JSON data from the request (silent=True returns None instead
    # of raising an error if the body is missing or not valid JSON)
    data = request.get_json(silent=True)
    if data is None:
        return jsonify({'error': 'Request body must be JSON'}), 400

    # We expect a list of lists for the features (one inner list per record)
    features = data.get('features')
    if features is None or not isinstance(features, list):
        return jsonify({'error': 'Missing or invalid "features" key'}), 400

    # Make prediction
    try:
        prediction = production_model.predict(features)
        return jsonify({'prediction': prediction.tolist()})
    except Exception as e:
        return jsonify({'error': str(e)}), 400
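# --- Illustrative addition (not part of the original example) ---
# A minimal health-check endpoint of the kind the "Infrastructure Health"
# monitoring described earlier could poll: it confirms the API is up and
# reports whether the model artifact was loaded.
@app.route('/health', methods=['GET'])
def health():
    if production_model is None:
        return jsonify({'status': 'model_missing'}), 503
    return jsonify({'status': 'ok'}), 200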
# To run this app:
# 1. Install Flask: pip install Flask
# 2. Save the code as a Python file (e.g., app.py)
# 3. Run from your terminal: flask run
# 4. You can then send POST requests to http://127.0.0.1:5000/predict
# Example request using curl:
# curl -X POST -H "Content-Type: application/json" -d '{"features": [[0.1, -0.5, ...]]}' http://127.0.0.1:5000/predict
What the Code Does
- Model Training: We train a simple LogisticRegression model and save it to a file called model.pkl using joblib. This is our model artifact.
- Flask API: We create a simple web server with a single endpoint: /predict.
- Loading: When the server starts, it loads the saved model into memory.
- Prediction Endpoint: The /predict endpoint listens for POST requests. It expects JSON data containing a key called "features". It uses the loaded model to make a prediction on these features and returns the prediction as a JSON response.
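To see the endpoint in action from Python, you could send a request like the one below once the server is running locally. The single record of 20 feature values matches the toy model trained above, and the requests library (pip install requests) is assumed to be available.

import requests

# One record with 20 feature values, matching the toy model above
payload = {"features": [[0.1] * 20]}

response = requests.post(
    "http://127.0.0.1:5000/predict",
    json=payload,
    timeout=5,
)
print(response.status_code, response.json())
# e.g. 200 {'prediction': [0]}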
This is a very basic example, but it illustrates the core idea of model deployment. A full MLOps pipeline would automate this entire process using tools for CI/CD (like Jenkins or GitHub Actions), containerization and orchestration (like Docker and Kubernetes), and dedicated monitoring services.
Conclusion
MLOps is the bridge between building a model and creating real-world value. It turns machine learning from a research-oriented discipline into a mature engineering practice. By focusing on automation, reproducibility, and monitoring, MLOps allows organizations to build, deploy, and maintain machine learning systems that are robust, scalable, and trustworthy.