
MLOps and Deployment

Operations, deployment, and maintenance of ML models in production

⏱️ Estimated reading time: 18 minutes

Introduction to MLOps

MLOps - Machine Learning Operations



MLOps is a set of practices that combines Machine Learning, DevOps, and Data Engineering to deploy and maintain ML systems in production reliably and efficiently.

Why MLOps?



Challenges without MLOps:


- Model drift: Performance degrades over time
- Reproducibility: Difficult to recreate results
- Scalability: Problems moving from experimentation to production
- Monitoring: Lack of performance visibility
- Versioning: Difficult to track models and data

Benefits of MLOps:


- Faster and more reliable deployments
- Better team collaboration
- Reproducibility and auditability
- Automatic scalability
- Continuous monitoring
- Early problem detection

MLOps Principles



1. Automation


- CI/CD for ML models
- Automated training pipelines
- Automated testing

2. Versioning


- Code: Git for ML scripts
- Data: Dataset versioning
- Models: Model registry
- Experiments: Hyperparameter tracking
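As a concrete illustration of experiment tracking and model versioning, the sketch below uses MLflow, one commonly used tool (not mandated by anything above); the experiment name, data-version tag, and toy dataset are illustrative placeholders.

```python
# Minimal sketch of experiment tracking and model versioning with MLflow
# (one common tool choice); names and the toy dataset are placeholders.
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1_000, random_state=42)

mlflow.set_experiment("churn-model")  # groups related runs together

with mlflow.start_run(run_name="rf-baseline"):
    params = {"n_estimators": 200, "max_depth": 8}
    model = RandomForestClassifier(**params, random_state=42).fit(X, y)

    mlflow.log_params(params)                               # hyperparameter tracking
    mlflow.log_metric("train_accuracy", model.score(X, y))  # metric tracking
    mlflow.log_param("data_version", "v3")                  # pointer to the dataset version used
    mlflow.sklearn.log_model(model, "model")                 # versioned model artifact
    # With a registry-backed tracking server, pass registered_model_name=...
    # to log_model to also create a model registry entry.
```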

3. Monitoring


- Model performance metrics
- Data drift
- Model drift
- Latency and throughput

4. Governance


- Model auditing
- Regulatory compliance
- Explainability
- Risk management

MLOps Lifecycle



1. Development: Experimentation and training
2. Integration: Model CI/CD
3. Deployment: Production release
4. Monitoring: Continuous observability
5. Retraining: Model updates
6. Governance: Compliance and audit

🎯 Key Points

  • βœ“ Automating pipelines reduces errors and speeds retraining
  • βœ“ Version code, data and models for reproducibility and audit
  • βœ“ Proactive monitoring of drift and business metrics is essential
  • βœ“ Define retraining strategy (scheduled vs drift-triggered)
  • βœ“ Include governance and rollback processes in deployment pipelines

Model Deployment with SageMaker

Model Deployment with Amazon SageMaker



Deployment Options



1. Real-Time Inference (Endpoints)



Features:
- Synchronous inference with low latency
- Persistent endpoint (always-on)
- Auto-scaling based on traffic
- Automatic load balancing

When to use:
- Applications requiring immediate responses
- Individual or small batch predictions
- Low-latency requirements (e.g., under 100 ms)

Configuration:
- Select instance type
- Configure auto-scaling
- Multiple variants for A/B testing
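A minimal sketch of deploying a real-time endpoint with the SageMaker Python SDK follows; the container image, model artifact, role ARN, endpoint name, and payload format are placeholders that depend on your model.

```python
# Minimal sketch: deploy a persistent real-time endpoint with the SageMaker
# Python SDK. Image URI, model artifact, role ARN, and endpoint name are
# placeholders; the payload format depends on the serving container.
import sagemaker
from sagemaker.model import Model

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

model = Model(
    image_uri="<inference-container-image-uri>",     # placeholder
    model_data="s3://my-bucket/model/model.tar.gz",  # placeholder artifact
    role=role,
    sagemaker_session=sagemaker.Session(),
)

# Always-on endpoint for low-latency synchronous inference
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="my-realtime-endpoint",  # placeholder
)

result = predictor.predict(b"1.0,2.0,3.0")  # synchronous call; format is container-specific
```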

2. Serverless Inference



Features:
- No server management
- Automatically scales to 0
- Pay per use (per invocation)
- Cold-start latency after idle periods

When to use:
- Intermittent or unpredictable traffic
- Cost optimization
- Development and testing
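A minimal serverless sketch, reusing the `model` object from the real-time example above; memory size and concurrency values are illustrative.

```python
# Minimal sketch of a serverless endpoint, assuming `model` is the
# sagemaker.model.Model object from the real-time example above.
from sagemaker.serverless import ServerlessInferenceConfig

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,  # 1024-6144 MB, in 1 GB increments
    max_concurrency=5,       # concurrent invocations before throttling
)

# No instance type or count: capacity scales to zero when idle,
# so expect cold-start latency after quiet periods.
predictor = model.deploy(serverless_inference_config=serverless_config)
```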

3. Batch Transform



Features:
- Asynchronous batch processing
- Processes complete S3 datasets
- No persistent infrastructure
- Automatic parallelization

When to use:
- Periodic predictions (daily, weekly)
- Large data volumes
- Real-time not required
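A minimal batch transform sketch; the model name and S3 paths are placeholders.

```python
# Minimal sketch of a batch transform job over an S3 dataset; the model name
# and S3 paths are placeholders.
from sagemaker.transformer import Transformer

transformer = Transformer(
    model_name="my-registered-model",            # placeholder: an existing SageMaker model
    instance_count=2,                            # work is parallelized across instances
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-output/",  # placeholder
)

# Processes the whole input prefix asynchronously; instances are released afterwards
transformer.transform(
    data="s3://my-bucket/batch-input/",          # placeholder
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()
```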

4. Asynchronous Inference



Features:
- Asynchronous inference with queue
- Handle large payloads (up to 1GB)
- Long processing time (up to 15 min)
- Queue-based auto-scaling

When to use:
- Large file processing
- Models with variable latency
- Unpredictable workloads
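A minimal asynchronous-endpoint sketch, again assuming the `model` object from the earlier example; S3 paths are placeholders.

```python
# Minimal sketch of an asynchronous endpoint, assuming `model` from the
# real-time example; S3 paths are placeholders.
from sagemaker.async_inference import AsyncInferenceConfig

async_config = AsyncInferenceConfig(
    output_path="s3://my-bucket/async-results/",  # where responses are written
    max_concurrent_invocations_per_instance=4,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    async_inference_config=async_config,
)

# Requests are queued; large payloads are passed by S3 reference
response = predictor.predict_async(
    input_path="s3://my-bucket/async-input/payload.json"  # placeholder
)
```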

SageMaker Edge Manager



Edge device deployment:
- Model optimization for edge
- Device fleet management
- Edge model monitoring
- OTA (Over-The-Air) updates

Multi-Model Endpoints



Features:
- Multiple models on single endpoint
- Dynamic model loading
- Reduces infrastructure costs
- Ideal for many similar models
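A minimal sketch of calling one specific model hosted on a multi-model endpoint via boto3; the endpoint name and artifact key are placeholders.

```python
# Minimal sketch of invoking one specific model on a multi-model endpoint
# with boto3; endpoint name and artifact key are placeholders.
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-multi-model-endpoint",  # placeholder
    TargetModel="customer-42/model.tar.gz",  # artifact loaded on demand from S3
    ContentType="text/csv",
    Body=b"1.0,2.0,3.0",
)
print(response["Body"].read())
```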

Deployment Strategies



Blue/Green Deployment


- Two environments: current (blue) and new (green)
- Instant switch between versions
- Fast rollback if issues

Canary Deployment


- Gradual deployment to percentage of traffic
- Monitor new version with real traffic
- Progressive increase if all goes well

A/B Testing


- Multiple model variants
- Traffic distribution between variants
- Production performance comparison
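A minimal sketch of splitting traffic between two production variants with boto3, which is how both A/B tests and canary-style rollouts are typically expressed; model names, weights, and endpoint names are placeholders.

```python
# Minimal sketch of an A/B split between two model variants using boto3;
# model names, weights, and endpoint names are placeholders.
import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="ab-test-config",
    ProductionVariants=[
        {
            "VariantName": "model-a",
            "ModelName": "my-model-a",     # placeholder: current champion
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.9,   # ~90% of traffic
        },
        {
            "VariantName": "model-b",
            "ModelName": "my-model-b",     # placeholder: challenger
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.1,   # ~10% of traffic (canary-style)
        },
    ],
)
sm.create_endpoint(EndpointName="ab-test-endpoint", EndpointConfigName="ab-test-config")
# Weights can later be shifted gradually with update_endpoint_weights_and_capacities.
```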

🎯 Key Points

  • βœ“ Choose deployment type by latency and cost needs (real-time, serverless, batch)
  • βœ“ Serverless inference reduces cost for intermittent traffic but may introduce cold starts
  • βœ“ Multi-model endpoints help when many similar models and reduce infra costs
  • βœ“ Use canary or blue/green testing for safe rollouts
  • βœ“ Tune instance types and autoscaling to balance cost/performance

Monitoring and Maintenance

Model Monitoring and Maintenance



Amazon SageMaker Model Monitor



A service that detects drift and helps maintain model quality in production.

Monitoring Types



1. Data Quality Monitoring

Detects changes in input data quality:
- Missing values
- Distribution changes
- Schema violations
- Out-of-range values
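A minimal sketch of setting up a data-quality baseline and an hourly monitoring schedule with Model Monitor; the role, dataset paths, and endpoint name are placeholders, and data capture is assumed to be enabled on the endpoint.

```python
# Minimal sketch of a data-quality baseline plus an hourly monitoring schedule
# with SageMaker Model Monitor; role, S3 paths, and endpoint name are placeholders.
from sagemaker.model_monitor import CronExpressionGenerator, DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Baseline: statistics and constraints derived from the training data
monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/train/train.csv",  # placeholder
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/monitor/baseline/",
)

# Hourly job comparing captured endpoint traffic against the baseline
monitor.create_monitoring_schedule(
    monitor_schedule_name="data-quality-schedule",
    endpoint_input="my-realtime-endpoint",              # placeholder (data capture enabled)
    output_s3_uri="s3://my-bucket/monitor/reports/",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```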

2. Model Quality Monitoring

Monitors model performance:
- Accuracy
- Custom metrics
- Ground truth comparison
- Degradation detection

3. Bias Drift Monitoring

Detects bias changes:
- Monitors fairness metrics
- Detects emerging biases
- Bias change alerts

4. Feature Attribution Drift

Monitors feature importance:
- SHAP value changes
- Feature drift detection
- Explainability analysis

Key Monitoring Concepts



Data Drift


Changes in input data distribution

Causes:
- Changes in user behavior
- Seasonal changes
- Changes in data sources
- Pipeline issues

Detection:
- Statistical tests (KS test, Chi-squared)
- Baseline comparison
- Distribution distance
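As a simple illustration of the statistical-test approach, the sketch below runs a two-sample Kolmogorov-Smirnov test on one numeric feature; the data and the significance threshold are illustrative.

```python
# Minimal sketch of a drift check on one numeric feature using a two-sample
# Kolmogorov-Smirnov test; synthetic data and threshold are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, scale=1.0, size=10_000)   # training-time distribution
production = rng.normal(loc=0.3, scale=1.0, size=2_000)  # shifted live traffic

statistic, p_value = stats.ks_2samp(baseline, production)

ALPHA = 0.01  # illustrative significance threshold
if p_value < ALPHA:
    print(f"Data drift detected (KS={statistic:.3f}, p={p_value:.4f})")
else:
    print("No significant drift detected")
```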

Model Drift (Concept Drift)


Changes in the relationship between the features and the target variable

Types:
- Sudden drift: Abrupt change
- Gradual drift: Progressive change
- Incremental drift: Continuous small changes
- Recurring drift: Cyclical patterns

Performance Monitoring



System Metrics:
- Prediction latency
- Throughput (predictions/sec)
- Error rate
- Resource utilization

Model Metrics:
- Accuracy, Precision, Recall
- AUC-ROC
- MSE, RMSE (for regression)
- Custom business metrics
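As a rough illustration of pulling system metrics, the sketch below reads endpoint latency from CloudWatch; the endpoint and variant names are placeholders (ModelLatency is a standard SageMaker endpoint metric, reported in microseconds).

```python
# Minimal sketch of reading endpoint latency from CloudWatch; endpoint and
# variant names are placeholders. ModelLatency values are in microseconds.
from datetime import datetime, timedelta, timezone

import boto3

cw = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

stats = cw.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-realtime-endpoint"},  # placeholder
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,                        # 5-minute buckets
    Statistics=["Average", "Maximum"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["Maximum"])
```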

Retraining Strategies



1. Scheduled Retraining


- Fixed frequency (daily, weekly, monthly)
- Automatic via pipelines
- Useful when drift is predictable

2. Drift-Triggered Retraining


- Continuous metric monitoring
- Configurable alert threshold
- Automatic retraining when drift detected
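A minimal sketch of the drift-triggered pattern: check the latest Model Monitor run and, if it reported violations, start a retraining pipeline. The schedule and pipeline names are placeholders tied to the other sketches in this section.

```python
# Minimal sketch of drift-triggered retraining: if the latest monitoring run
# reported violations, start a retraining pipeline. Names are placeholders.
import boto3

sm = boto3.client("sagemaker")

summaries = sm.list_monitoring_executions(
    MonitoringScheduleName="data-quality-schedule",  # placeholder
    SortBy="ScheduledTime",
    SortOrder="Descending",
    MaxResults=1,
)["MonitoringExecutionSummaries"]

if summaries and summaries[0]["MonitoringExecutionStatus"] == "CompletedWithViolations":
    # Violations against the baseline were found: trigger the retraining workflow
    sm.start_pipeline_execution(PipelineName="retraining-pipeline")  # placeholder
```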

3. Online/Incremental Retraining


- Continuous update with new data
- No training from scratch
- For streaming data

SageMaker Pipelines



ML workflow orchestration:
- Define preprocessing steps
- Automatic training
- Model evaluation
- Conditional approval
- Automatic deployment

Benefits:
- Reproducibility
- End-to-end automation
- Pipeline versioning
- CI/CD integration
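A minimal two-step pipeline sketch (preprocess, then train); the role, image URI, script name, and S3 paths are placeholders, and a real pipeline would also wire step inputs/outputs, evaluation, and conditional model registration.

```python
# Minimal sketch of a two-step SageMaker Pipeline (preprocess -> train).
# Role, image URI, script name, and S3 paths are placeholders.
from sagemaker.estimator import Estimator
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

processor = SKLearnProcessor(
    framework_version="1.2-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
)
preprocess = ProcessingStep(name="Preprocess", processor=processor, code="preprocess.py")

estimator = Estimator(
    image_uri="<training-image-uri>",      # placeholder
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/models/",  # placeholder
)
train = TrainingStep(name="Train", estimator=estimator, depends_on=[preprocess])

pipeline = Pipeline(name="retraining-pipeline", steps=[preprocess, train])
pipeline.upsert(role_arn=role)  # create or update the versioned pipeline definition
pipeline.start()                # run the workflow end to end
```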

Best Practices



1. Data Baseline: Establish reference distribution
2. Alert Thresholds: Define acceptable drift levels
3. Proactive Monitoring: Don't wait for prediction failures
4. Complete Logging: Record inputs, outputs, and metrics
5. Rollback Plan: Have quick reversion strategy
6. Documentation: Maintain change and decision log
7. Production Testing: Validate before full deployment
8. Observability: Configure dashboards and alerts

🎯 Key Points

  • βœ“ Monitor system metrics (latency, throughput) and model metrics (accuracy, drift)
  • βœ“ Log inputs and outputs for reproducibility and debugging
  • βœ“ Set alerts and thresholds for retraining or rollback
  • βœ“ Use A/B tests and canary before full rollout
  • βœ“ Document pipelines and maintain incident playbooks