Mrcoderv/MLops_learning

Day 1: MLOps Learning

Introduction to MLOps

Machine learning operations (MLOps) is a set of practices that:

  • Automate and simplify machine learning (ML) workflows and deployments
  • Bring DevOps discipline to building, shipping, and running ML models
  • Improve reliability, reproducibility, and productivity across the ML lifecycle

Life Cycle of a Data Science Project

  1. Understand the problem and use case - Define the business objectives and identify the specific problem that data science needs to solve.
  2. Exploratory data analysis (EDA) - Study the nature of the data: its patterns, distributions, relationships, and anomalies.
  3. Data pre-processing - Clean and prepare the data: detect outliers with the IQR rule and box plots, check distributions with Q-Q plots, standardize features, and handle null values.
  4. Feature engineering - Create new, meaningful features from existing data to improve model performance.
  5. Feature selection - Identify and keep the features that contribute most to the model's predictive power.
  6. Model training and hyperparameter tuning - Train machine learning models and tune their hyperparameters for the best performance.
  7. Model evaluation - Assess model performance with appropriate metrics to confirm it meets the business requirements.
  8. App building/UI - Develop an application and user interface that make the model accessible to end users.
  9. Deploy - Deploy the model to a production environment where it can serve real-world predictions.
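
As an illustration of step 3, here is a minimal sketch of null handling, IQR-based outlier removal, and standardization using only the Python standard library. The function names, the median fill strategy, the 1.5 multiplier, and the sample values are assumptions made for this example; real projects would more likely use pandas or scikit-learn.

```python
import statistics


def fill_nulls(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical raw feature column with nulls and one extreme value.
raw = [12.0, 15.0, None, 14.0, 13.0, 200.0, 16.0, None, 11.0]
filled = fill_nulls(raw)
cleaned = remove_outliers(filled)
scaled = standardize(cleaned)
```

Dropping outliers is only one option; depending on the use case, capping them at the fences or keeping them may be the better choice.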

Issues with DS Practice Without MLOps

  1. Low Coding Standards - Code is written without OOP concepts, modular structure, logging, or exception handling
  2. No Data Management - No systematic data ingestion or artifact management
  3. No Versioning - Code, data, and models are not versioned
  4. No Pipelines or Experiment Tracking - Data pipelines are not reproducible and experiments are not tracked
  5. No CI/CD - Continuous integration and continuous deployment practices are missing
  6. No Scalability or Monitoring in Production - Tools such as Kubernetes (scaling) and Prometheus and Grafana (monitoring) are not used
  7. Cross-Team Friction - Communication and coordination issues between teams
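
To make the first issue concrete, the sketch below shows what one small, modular pipeline component with logging and exception handling might look like. The class name `DataIngestion`, the exception `DataIngestionError`, and the field names are hypothetical and chosen only for illustration.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
logger = logging.getLogger("ingestion")


class DataIngestionError(Exception):
    """Raised when the ingestion step cannot produce usable records."""


class DataIngestion:
    """One pipeline component with one job: validate incoming rows."""

    def __init__(self, required_fields):
        self.required_fields = list(required_fields)

    def run(self, rows):
        if not rows:
            raise DataIngestionError("no rows received")
        valid = []
        for index, row in enumerate(rows):
            missing = [f for f in self.required_fields if row.get(f) is None]
            if missing:
                # Log and skip bad rows instead of failing silently.
                logger.warning("row %d skipped, missing: %s", index, missing)
                continue
            valid.append(row)
        if not valid:
            raise DataIngestionError("every row was missing required fields")
        logger.info("ingested %d of %d rows", len(valid), len(rows))
        return valid


def main(rows):
    """Run the component and turn a known failure into a logged event."""
    try:
        return DataIngestion(["age", "income"]).run(rows)
    except DataIngestionError:
        logger.exception("ingestion failed")
        return []
```

Calling `main([{"age": 30, "income": 50000}, {"age": None, "income": 1}])` keeps the first row and logs a warning for the second.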

Comparison: Data Science Lifecycle vs Software Development Lifecycle

| Aspect | Software Development Lifecycle (SDLC) | Data Science Lifecycle (ML) |
|---|---|---|
| Primary Goal | Build reliable, maintainable software products | Build accurate, generalizable ML models |
| Output | Software application/system | Trained ML model with predictions |
| Testing | Unit testing, integration testing, QA testing | Data validation, model validation, cross-validation |
| Versioning | Code versioning (Git) | Code, data, and model versioning required |
| Deployment | Fixed requirements, deterministic outputs | Dynamic requirements, probabilistic outputs |
| Monitoring | Application performance, errors, uptime | Model performance, data drift, prediction accuracy |
| Reproducibility | Easier to reproduce with same code | Harder due to randomness and data variability |
| CI/CD | Well-established practices | Emerging best practices in MLOps |
| Key Challenge | Feature completeness and bug-free code | Model accuracy and handling data/concept drift |
| Maintenance | Bug fixes, feature updates | Model retraining, data pipeline updates |
| Stakeholders | Developers, QA, DevOps | Data scientists, ML engineers, DevOps engineers |
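
The data drift mentioned in the monitoring row can be measured in several ways. One common choice is the Population Stability Index (PSI), sketched below in plain Python. The equal-width binning, the `eps` floor for empty bins, and the synthetic samples are assumptions made for this example, and the frequently quoted 0.25 threshold is a rule of thumb rather than a standard.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a new sample."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the reference range; fall back to 1.0 if flat.
    width = (hi - lo) / bins or 1.0

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp values outside the reference range into the edge bins.
            idx = int((x - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm is always defined.
        return [max(c / len(sample), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic data: a training feature and two production samples.
train = [i / 100 for i in range(1000)]
same = list(train)
shifted = [x + 5 for x in train]

stable_score = psi(train, same)
drift_score = psi(train, shifted)
```

Here `stable_score` is zero because the two samples are identical, while `drift_score` is large because half of the shifted sample falls outside the training range.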

Standards Beginners Should Follow

  1. Code Standards - OOP concepts, modular code, the logging module for easier debugging, and a clear structure for artifacts, components, and pipelines
  2. Code Versioning - Git with GitHub (alternatives: Bitbucket, GitLab)
  3. Data/Model Versioning - Maintain data pipelines and track experiments with DVC and MLflow (related tools: Neptune, Seldon, Kubeflow, ZenML)
  4. CI/CD Tools - GitHub Actions, CircleCI, Travis CI
  5. Containerization - Docker and Docker Hub, so the code runs in the same environment everywhere
  6. Scalability & Monitoring - Kubernetes, Prometheus, Grafana
  7. Cloud Services - AWS services (IAM users, ECR, S3, EC2, etc.) or all-in-one platforms (Amazon SageMaker, Google Vertex AI, Azure ML)
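
The idea behind data and model versioning in item 3 can be sketched with content hashing: a new version is recorded only when the bytes of an artifact change. This is a conceptual illustration using the standard library, not how DVC or MLflow are implemented; the `record_version` helper, the in-memory `registry`, and the file name are hypothetical.

```python
import hashlib
import json


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest that identifies this exact content."""
    return hashlib.sha256(data).hexdigest()


def record_version(registry, name, data):
    """Add a version entry for an artifact unless its content is unchanged."""
    digest = content_hash(data)
    versions = registry.setdefault(name, [])
    if versions and versions[-1]["sha256"] == digest:
        # Same bytes as the latest version: nothing new to record.
        return versions[-1]
    entry = {"version": len(versions) + 1, "sha256": digest}
    versions.append(entry)
    return entry


# Hypothetical dataset contents at three points in time.
registry = {}
v1 = record_version(registry, "train.csv", b"age,income\n30,50000\n")
v1_again = record_version(registry, "train.csv", b"age,income\n30,50000\n")
v2 = record_version(registry, "train.csv", b"age,income\n30,50000\n31,52000\n")

print(json.dumps(registry, indent=2))
```

Storing the hash next to the Git commit that produced a model is one way to tie code, data, and model versions together.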
