Machine learning operations (MLOps) is a set of practices that:
- Automate and simplify machine learning (ML) workflows and deployments
- Bring DevOps discipline to building, shipping, and running ML models
- Improve reliability, reproducibility, and productivity across the ML lifecycle
A typical data science lifecycle runs through these stages:
- Understand the problem and use case - Define business objectives and identify the specific problem to solve with data science.
- Exploratory Data Analysis (EDA) - Understand data patterns, distributions, relationships, and anomalies.
- Data pre-processing - Clean and prepare data: handle null values, treat outliers (IQR, box plots, Q-Q plots), and standardize features.
- Feature engineering - Create new meaningful features from existing data to improve model performance.
- Feature selection - Identify and select the most relevant features that contribute to the predictive power of the model.
- Model training and hyperparameter tuning - Train machine learning models and optimize their parameters for best performance.
- Model evaluation - Assess model performance using appropriate metrics to ensure it meets business requirements.
- App building/UI - Develop user interface and application to make the model accessible to end users.
- Deploy - Deploy the model to production environment where it can serve real-world predictions.
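The core modeling stages above (pre-processing, feature selection, training with hyperparameter tuning, evaluation) can be sketched end-to-end with scikit-learn. The dataset and hyperparameter grid below are illustrative placeholders, not prescriptions:

```python
# Minimal sketch of the lifecycle's modeling stages using scikit-learn.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for real data: a synthetic binary-classification dataset
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Pre-processing -> feature selection -> model, chained in one pipeline
pipe = Pipeline([
    ("scale", StandardScaler()),               # standardization
    ("select", SelectKBest(f_classif, k=10)),  # feature selection
    ("model", LogisticRegression(max_iter=1000)),
])

# Hyperparameter tuning via cross-validated grid search
search = GridSearchCV(pipe, {"model__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Model evaluation on held-out data
test_accuracy = search.score(X_test, y_test)
print(f"best C={search.best_params_['model__C']}, accuracy={test_accuracy:.3f}")
```

Keeping every step inside one `Pipeline` object is what later makes the whole workflow versionable and reproducible.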
Without MLOps, ML projects commonly run into these problems:
- Low coding standards - Missing OOP concepts, modular coding, logging, and exception handling
- No data management - No systematic data ingestion or artifact management
- No versioning - Code, data, and model versions are not tracked
- No reproducible pipelines - Lack of reproducible data pipelines and experiment tracking
- No CI/CD - Missing continuous integration and continuous deployment practices
- No production scalability & monitoring - Missing tools like Kubernetes, Prometheus, and Grafana
- Cross-team friction - Communication and coordination gaps between data science, engineering, and operations teams
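To make the data-versioning problem concrete, here is a minimal sketch of the core idea behind tools like DVC: identify each dataset snapshot by a content hash and record it next to the code. File names and the manifest format are illustrative, not DVC's actual format:

```python
# Minimal sketch of content-hash data versioning (the idea behind DVC).
import hashlib
import json
from pathlib import Path

def data_version(path: Path) -> str:
    """Return a short content hash that uniquely identifies this file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:12]

# Hypothetical dataset file; in practice this is your real training data
data_file = Path("train.csv")
data_file.write_text("id,label\n1,0\n2,1\n")

# Record the data version alongside code commits so experiments are reproducible
manifest = {"path": str(data_file), "version": data_version(data_file)}
Path("data.manifest.json").write_text(json.dumps(manifest, indent=2))
print(manifest["version"])
```

If the data changes, the hash changes, so a model can always be traced back to the exact dataset it was trained on.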
The ML lifecycle differs from the traditional software lifecycle in several ways:

| Aspect | Software Development Lifecycle (SDLC) | Data Science Lifecycle (ML) |
|---|---|---|
| Primary Goal | Build reliable, maintainable software products | Build accurate, generalizable ML models |
| Output | Software application/system | Trained ML model with predictions |
| Testing | Unit testing, integration testing, QA testing | Data validation, model validation, cross-validation |
| Versioning | Code versioning (Git) | Code, data, and model versioning required |
| Requirements & Outputs | Fixed requirements, deterministic outputs | Evolving requirements, probabilistic outputs |
| Monitoring | Application performance, errors, uptime | Model performance, data drift, prediction accuracy |
| Reproducibility | Easier to reproduce with same code | Harder due to randomness and data variability |
| CI/CD | Well-established practices | Emerging best practices in MLOps |
| Key Challenge | Feature completeness and bug-free code | Model accuracy and handling data/concept drift |
| Maintenance | Bug fixes, feature updates | Model retraining, data pipeline updates |
| Stakeholders | Developers, QA, DevOps | Data scientists, ML engineers, DevOps engineers |
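The "data drift" monitoring difference in the table can be sketched with a two-sample Kolmogorov-Smirnov test comparing a production feature sample against its training-time distribution. The distributions and alert threshold below are assumptions for illustration:

```python
# Minimal sketch of data-drift detection for one numeric feature.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)  # training-time distribution
live_feature = rng.normal(loc=0.8, scale=1.0, size=1000)   # shifted production sample

# KS test: low p-value means the two samples likely come from different distributions
stat, p_value = ks_2samp(train_feature, live_feature)
drift_detected = p_value < 0.01  # alert threshold (assumption)
print(f"KS statistic={stat:.3f}, drift_detected={drift_detected}")
```

In production, checks like this run on a schedule (e.g. via Prometheus exporters or a monitoring job) and trigger retraining when drift is detected.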
MLOps addresses these problems with the following practices and tools:
- Code Standards - OOP concepts, modular coding, a logging module for better debugging, and management of artifacts, components, and pipelines
- Code Versioning - Git & GitHub (Bitbucket, GitLab)
- Data/Model Versioning - Maintaining data pipelines and experimentation using DVC, MLflow (Neptune, Seldon, Kubeflow, ZenML)
- CI/CD Tools - GitHub Actions, CircleCI, TravisCI
- Containerization - Docker and Docker Hub for reproducible, portable runtime environments
- Scalability & Monitoring - Kubernetes, Prometheus, Grafana
- Cloud Services - AWS Services (IAM User, ECR, S3, EC2, etc.) or all-in-one platforms (AWS SageMaker, Google Vertex AI, Azure ML)
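The code-standards item above (modular components, logging, exception handling) can be sketched as a small pipeline component. The class and method names are illustrative, not from any specific framework:

```python
# Minimal sketch of a modular pipeline component with logging and exception handling.
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("ingestion")

class DataIngestion:
    """One modular pipeline component: reads raw records and returns clean rows."""

    def run(self, records: list[dict]) -> list[dict]:
        logger.info("Starting ingestion of %d records", len(records))
        clean = []
        for record in records:
            try:
                clean.append({"id": int(record["id"]), "value": float(record["value"])})
            except (KeyError, ValueError) as exc:
                # Log and skip bad rows instead of crashing the whole pipeline
                logger.warning("Skipping bad record %r: %s", record, exc)
        logger.info("Ingestion finished: %d/%d records kept", len(clean), len(records))
        return clean

rows = DataIngestion().run([{"id": "1", "value": "3.5"}, {"id": "x", "value": "oops"}])
print(rows)
```

Each lifecycle stage (ingestion, transformation, training, evaluation) becomes its own component like this, which is what makes pipelines testable, loggable, and reusable across projects.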

