
MLOps and Deployment

Operations, deployment, and maintenance of ML models in production

⏱️ Estimated reading time: 18 minutes

Introduction to MLOps

MLOps - Machine Learning Operations



MLOps is a set of practices that combines Machine Learning, DevOps, and Data Engineering to deploy and maintain ML systems in production reliably and efficiently.

Why MLOps?



Challenges without MLOps:


- Model drift: Performance degrades over time
- Reproducibility: Difficult to recreate results
- Scalability: Problems moving from experimentation to production
- Monitoring: Lack of performance visibility
- Versioning: Difficult to track models and data

Benefits of MLOps:


- Faster and more reliable deployments
- Better team collaboration
- Reproducibility and auditability
- Automatic scalability
- Continuous monitoring
- Early problem detection

MLOps Principles



1. Automation


- CI/CD for ML models
- Automated training pipelines
- Automated testing

2. Versioning


- Code: Git for ML scripts
- Data: Dataset versioning
- Models: Model registry
- Experiments: Hyperparameter tracking
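As a concrete illustration of experiment tracking and model versioning, the sketch below uses MLflow, one commonly used tool (not mandated by anything above); the experiment name, data-version tag, and toy dataset are illustrative placeholders.

```python
# Minimal sketch of experiment tracking and model versioning with MLflow
# (one common tool choice); names and the toy dataset are placeholders.
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1_000, random_state=42)

mlflow.set_experiment("churn-model")  # groups related runs together

with mlflow.start_run(run_name="rf-baseline"):
    params = {"n_estimators": 200, "max_depth": 8}
    model = RandomForestClassifier(**params, random_state=42).fit(X, y)

    mlflow.log_params(params)                               # hyperparameter tracking
    mlflow.log_metric("train_accuracy", model.score(X, y))  # metric tracking
    mlflow.log_param("data_version", "v3")                  # pointer to the dataset version used
    mlflow.sklearn.log_model(model, "model")                 # versioned model artifact
    # With a registry-backed tracking server, pass registered_model_name=...
    # to log_model to also create a model registry entry.
```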

3. Monitoring


- Model performance metrics
- Data drift
- Model drift
- Latency and throughput

4. Governance


- Model auditing
- Regulatory compliance
- Explainability
- Risk management

MLOps Lifecycle



1. Development: Experimentation and training
2. Integration: Model CI/CD
3. Deployment: Production release
4. Monitoring: Continuous observability
5. Retraining: Model updates
6. Governance: Compliance and audit

🎯 Key Points

  • βœ“ Automating pipelines reduces errors and speeds retraining
  • βœ“ Version code, data and models for reproducibility and audit
  • βœ“ Proactive monitoring of drift and business metrics is essential
  • βœ“ Define retraining strategy (scheduled vs drift-triggered)
  • βœ“ Include governance and rollback processes in deployment pipelines

Model Deployment with SageMaker

Model Deployment with Amazon SageMaker



Deployment Options



1. Real-Time Inference (Endpoints)



Features:
- Synchronous inference with low latency
- Persistent endpoint (always-on)
- Auto-scaling based on traffic
- Automatic load balancing

When to use:
- Applications requiring immediate responses
- Individual or small batch predictions
- Low-latency requirements (e.g., under 100 ms)

Configuration:
- Select instance type
- Configure auto-scaling
- Multiple variants for A/B testing
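A minimal sketch of deploying a real-time endpoint with the SageMaker Python SDK follows; the container image, model artifact, role ARN, endpoint name, and payload format are placeholders that depend on your model.

```python
# Minimal sketch: deploy a persistent real-time endpoint with the SageMaker
# Python SDK. Image URI, model artifact, role ARN, and endpoint name are
# placeholders; the payload format depends on the serving container.
import sagemaker
from sagemaker.model import Model

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

model = Model(
    image_uri="<inference-container-image-uri>",     # placeholder
    model_data="s3://my-bucket/model/model.tar.gz",  # placeholder artifact
    role=role,
    sagemaker_session=sagemaker.Session(),
)

# Always-on endpoint for low-latency synchronous inference
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="my-realtime-endpoint",  # placeholder
)

result = predictor.predict(b"1.0,2.0,3.0")  # synchronous call; format is container-specific
```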

2. Serverless Inference



Features:
- No server management
- Automatically scales to 0
- Pay per use (per invocation)
- Cold-start latency after idle periods

When to use:
- Intermittent or unpredictable traffic
- Cost optimization
- Development and testing
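A minimal serverless sketch, reusing the `model` object from the real-time example above; memory size and concurrency values are illustrative.

```python
# Minimal sketch of a serverless endpoint, assuming `model` is the
# sagemaker.model.Model object from the real-time example above.
from sagemaker.serverless import ServerlessInferenceConfig

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,  # 1024-6144 MB, in 1 GB increments
    max_concurrency=5,       # concurrent invocations before throttling
)

# No instance type or count: capacity scales to zero when idle,
# so expect cold-start latency after quiet periods.
predictor = model.deploy(serverless_inference_config=serverless_config)
```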

3. Batch Transform



Features:
- Asynchronous batch processing
- Processes complete S3 datasets
- No persistent infrastructure
- Automatic parallelization

When to use:
- Periodic predictions (daily, weekly)
- Large data volumes
- Real-time not required
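A minimal batch transform sketch; the model name and S3 paths are placeholders.

```python
# Minimal sketch of a batch transform job over an S3 dataset; the model name
# and S3 paths are placeholders.
from sagemaker.transformer import Transformer

transformer = Transformer(
    model_name="my-registered-model",            # placeholder: an existing SageMaker model
    instance_count=2,                            # work is parallelized across instances
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-output/",  # placeholder
)

# Processes the whole input prefix asynchronously; instances are released afterwards
transformer.transform(
    data="s3://my-bucket/batch-input/",          # placeholder
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()
```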

4. Asynchronous Inference



Features:
- Asynchronous inference with queue
- Handle large payloads (up to 1GB)
- Long processing time (up to 15 min)
- Queue-based auto-scaling

When to use:
- Large file processing
- Models with variable latency
- Unpredictable workloads
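A minimal asynchronous-endpoint sketch, again assuming the `model` object from the earlier example; S3 paths are placeholders.

```python
# Minimal sketch of an asynchronous endpoint, assuming `model` from the
# real-time example; S3 paths are placeholders.
from sagemaker.async_inference import AsyncInferenceConfig

async_config = AsyncInferenceConfig(
    output_path="s3://my-bucket/async-results/",  # where responses are written
    max_concurrent_invocations_per_instance=4,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    async_inference_config=async_config,
)

# Requests are queued; large payloads are passed by S3 reference
response = predictor.predict_async(
    input_path="s3://my-bucket/async-input/payload.json"  # placeholder
)
```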

SageMaker Edge Manager



Edge device deployment:
- Model optimization for edge
- Device fleet management
- Edge model monitoring
- OTA (Over-The-Air) updates

Multi-Model Endpoints



Features:
- Multiple models on single endpoint
- Dynamic model loading
- Reduces infrastructure costs
- Ideal for many similar models
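A minimal sketch of calling one specific model hosted on a multi-model endpoint via boto3; the endpoint name and artifact key are placeholders.

```python
# Minimal sketch of invoking one specific model on a multi-model endpoint
# with boto3; endpoint name and artifact key are placeholders.
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-multi-model-endpoint",  # placeholder
    TargetModel="customer-42/model.tar.gz",  # artifact loaded on demand from S3
    ContentType="text/csv",
    Body=b"1.0,2.0,3.0",
)
print(response["Body"].read())
```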

Deployment Strategies



Blue/Green Deployment


- Two environments: current (blue) and new (green)
- Instant switch between versions
- Fast rollback if issues

Canary Deployment


- Gradual deployment to percentage of traffic
- Monitor new version with real traffic
- Progressive increase if all goes well

A/B Testing


- Multiple model variants
- Traffic distribution between variants
- Production performance comparison
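A minimal sketch of splitting traffic between two production variants with boto3, which is how both A/B tests and canary-style rollouts are typically expressed; model names, weights, and endpoint names are placeholders.

```python
# Minimal sketch of an A/B split between two model variants using boto3;
# model names, weights, and endpoint names are placeholders.
import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="ab-test-config",
    ProductionVariants=[
        {
            "VariantName": "model-a",
            "ModelName": "my-model-a",     # placeholder: current champion
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.9,   # ~90% of traffic
        },
        {
            "VariantName": "model-b",
            "ModelName": "my-model-b",     # placeholder: challenger
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.1,   # ~10% of traffic (canary-style)
        },
    ],
)
sm.create_endpoint(EndpointName="ab-test-endpoint", EndpointConfigName="ab-test-config")
# Weights can later be shifted gradually with update_endpoint_weights_and_capacities.
```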

🎯 Key Points

  • βœ“ Choose deployment type by latency and cost needs (real-time, serverless, batch)
  • βœ“ Serverless inference reduces cost for intermittent traffic but may introduce cold starts
  • βœ“ Multi-model endpoints help when many similar models and reduce infra costs
  • βœ“ Use canary or blue/green testing for safe rollouts
  • βœ“ Tune instance types and autoscaling to balance cost/performance

Monitoring and Maintenance

Model Monitoring and Maintenance



Amazon SageMaker Model Monitor



A service that detects drift and helps maintain model quality in production.

Monitoring Types



1. Data Quality Monitoring

Detects changes in input data quality:
- Missing values
- Distribution changes
- Schema violations
- Out-of-range values
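A minimal sketch of setting up a data-quality baseline and an hourly monitoring schedule with Model Monitor; the role, dataset paths, and endpoint name are placeholders, and data capture is assumed to be enabled on the endpoint.

```python
# Minimal sketch of a data-quality baseline plus an hourly monitoring schedule
# with SageMaker Model Monitor; role, S3 paths, and endpoint name are placeholders.
from sagemaker.model_monitor import CronExpressionGenerator, DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Baseline: statistics and constraints derived from the training data
monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/train/train.csv",  # placeholder
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/monitor/baseline/",
)

# Hourly job comparing captured endpoint traffic against the baseline
monitor.create_monitoring_schedule(
    monitor_schedule_name="data-quality-schedule",
    endpoint_input="my-realtime-endpoint",              # placeholder (data capture enabled)
    output_s3_uri="s3://my-bucket/monitor/reports/",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```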

2. Model Quality Monitoring

Monitors model performance:
- Accuracy
- Custom metrics
- Ground truth comparison
- Degradation detection

3. Bias Drift Monitoring

Detects bias changes:
- Monitors fairness metrics
- Detects emerging biases
- Bias change alerts

4. Feature Attribution Drift

Monitors feature importance:
- SHAP value changes
- Feature drift detection
- Explainability analysis

Key Monitoring Concepts



Data Drift


Changes in input data distribution

Causes:
- Changes in user behavior
- Seasonal changes
- Changes in data sources
- Pipeline issues

Detection:
- Statistical tests (KS test, Chi-squared)
- Baseline comparison
- Distribution distance
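As a simple illustration of the statistical-test approach, the sketch below runs a two-sample Kolmogorov-Smirnov test on one numeric feature; the data and the significance threshold are illustrative.

```python
# Minimal sketch of a drift check on one numeric feature using a two-sample
# Kolmogorov-Smirnov test; synthetic data and threshold are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, scale=1.0, size=10_000)   # training-time distribution
production = rng.normal(loc=0.3, scale=1.0, size=2_000)  # shifted live traffic

statistic, p_value = stats.ks_2samp(baseline, production)

ALPHA = 0.01  # illustrative significance threshold
if p_value < ALPHA:
    print(f"Data drift detected (KS={statistic:.3f}, p={p_value:.4f})")
else:
    print("No significant drift detected")
```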

Model Drift (Concept Drift)


Changes in the relationship between the features and the target variable

Types:
- Sudden drift: Abrupt change
- Gradual drift: Progressive change
- Incremental drift: Continuous small changes
- Recurring drift: Cyclical patterns

Performance Monitoring



System Metrics:
- Prediction latency
- Throughput (predictions/sec)
- Error rate
- Resource utilization

Model Metrics:
- Accuracy, Precision, Recall
- AUC-ROC
- MSE, RMSE (for regression)
- Custom business metrics
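As a rough illustration of pulling system metrics, the sketch below reads endpoint latency from CloudWatch; the endpoint and variant names are placeholders (ModelLatency is a standard SageMaker endpoint metric, reported in microseconds).

```python
# Minimal sketch of reading endpoint latency from CloudWatch; endpoint and
# variant names are placeholders. ModelLatency values are in microseconds.
from datetime import datetime, timedelta, timezone

import boto3

cw = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

stats = cw.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-realtime-endpoint"},  # placeholder
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,                        # 5-minute buckets
    Statistics=["Average", "Maximum"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["Maximum"])
```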

Retraining Strategies



1. Scheduled Retraining


- Fixed frequency (daily, weekly, monthly)
- Automatic via pipelines
- Useful when drift is predictable

2. Drift-Triggered Retraining


- Continuous metric monitoring
- Configurable alert threshold
- Automatic retraining when drift detected
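A minimal sketch of the drift-triggered pattern: check the latest Model Monitor run and, if it reported violations, start a retraining pipeline. The schedule and pipeline names are placeholders tied to the other sketches in this section.

```python
# Minimal sketch of drift-triggered retraining: if the latest monitoring run
# reported violations, start a retraining pipeline. Names are placeholders.
import boto3

sm = boto3.client("sagemaker")

summaries = sm.list_monitoring_executions(
    MonitoringScheduleName="data-quality-schedule",  # placeholder
    SortBy="ScheduledTime",
    SortOrder="Descending",
    MaxResults=1,
)["MonitoringExecutionSummaries"]

if summaries and summaries[0]["MonitoringExecutionStatus"] == "CompletedWithViolations":
    # Violations against the baseline were found: trigger the retraining workflow
    sm.start_pipeline_execution(PipelineName="retraining-pipeline")  # placeholder
```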

3. Online/Incremental Retraining


- Continuous update with new data
- No training from scratch
- For streaming data

SageMaker Pipelines



ML workflow orchestration:
- Define preprocessing steps
- Automatic training
- Model evaluation
- Conditional approval
- Automatic deployment

Benefits:
- Reproducibility
- End-to-end automation
- Pipeline versioning
- CI/CD integration
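A minimal two-step pipeline sketch (preprocess, then train); the role, image URI, script name, and S3 paths are placeholders, and a real pipeline would also wire step inputs/outputs, evaluation, and conditional model registration.

```python
# Minimal sketch of a two-step SageMaker Pipeline (preprocess -> train).
# Role, image URI, script name, and S3 paths are placeholders.
from sagemaker.estimator import Estimator
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

processor = SKLearnProcessor(
    framework_version="1.2-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
)
preprocess = ProcessingStep(name="Preprocess", processor=processor, code="preprocess.py")

estimator = Estimator(
    image_uri="<training-image-uri>",      # placeholder
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/models/",  # placeholder
)
train = TrainingStep(name="Train", estimator=estimator, depends_on=[preprocess])

pipeline = Pipeline(name="retraining-pipeline", steps=[preprocess, train])
pipeline.upsert(role_arn=role)  # create or update the versioned pipeline definition
pipeline.start()                # run the workflow end to end
```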

Best Practices



1. Data Baseline: Establish reference distribution
2. Alert Thresholds: Define acceptable drift levels
3. Proactive Monitoring: Don't wait for prediction failures
4. Complete Logging: Record inputs, outputs, and metrics
5. Rollback Plan: Have quick reversion strategy
6. Documentation: Maintain change and decision log
7. Production Testing: Validate before full deployment
8. Observability: Configure dashboards and alerts

🎯 Key Points

  • βœ“ Monitor system metrics (latency, throughput) and model metrics (accuracy, drift)
  • βœ“ Log inputs and outputs for reproducibility and debugging
  • βœ“ Set alerts and thresholds for retraining or rollback
  • βœ“ Use A/B tests and canary before full rollout
  • βœ“ Document pipelines and maintain incident playbooks