An enterprise CI/CD simulation for medical claim ML pipelines with real MLflow tracking.
This project was created as a hands-on educational lab to teach the complete lifecycle of ML engineering in an enterprise environment. Many data scientists are familiar with model training in notebooks, but struggle to understand how ML models actually get deployed in production.
In enterprise IT departments (like healthcare/medical claims), deploying ML models involves:
- CI/CD Pipelines: Automated testing, validation, and deployment gates
- Experiment Tracking: Logging every training run for reproducibility and auditability
- Champion vs Challenger: Safe model promotion with automatic comparison
- Rollback Capability: Quick recovery when new models underperform
- Drift Monitoring: Detecting when production data diverges from training data
Learning these concepts by reading documentation is insufficient—you need to see and interact with a real pipeline.
| Concept | How This Lab Demonstrates It |
|---|---|
| CI vs CD | CI = quick validation on every commit; CD = full training before production |
| MLflow | Real experiment tracking with params, metrics, artifacts, and model registry |
| Quality Gates | Data validation, champion comparison, and manual approval steps |
| Reproducibility | Every run is tagged with commit_sha, seed, and dataset_window |
| Model Promotion | Challenger model must beat champion before deployment |
| Shadow Testing | Run new models on live traffic without affecting predictions |
| Rollback | One-click revert to previous production model |
We use synthetic medical claims (non-PHI) because:
- No compliance risk: No HIPAA/PHI concerns for learning environments
- Reproducible: Deterministic seeding ensures identical results
- Realistic patterns: Approval probability based on feature combinations mimics real-world patterns
- Safe to share: Can be used in demos, training, and open-source repositories
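To make the determinism point concrete, here is a minimal sketch of a seeded synthetic claim generator. Field names and the approval rule are illustrative assumptions, not the lab's actual generator code:

```python
import random

# Hypothetical approximation of a seeded synthetic claims generator.
# Categories and approval logic are illustrative, not the lab's real code.
CPT_BUCKETS = ["Evaluation", "Surgery", "Radiology"]
PROVIDER_TYPES = ["Hospital", "Physician Office", "Clinic"]

def generate_claims(n: int, seed: int = 42) -> list:
    rng = random.Random(seed)  # deterministic: same seed -> same claims
    claims = []
    for i in range(n):
        billed = round(rng.uniform(100, 5000), 2)
        claim = {
            "claim_id": f"CLM-{i:06d}",
            "cpt_bucket": rng.choice(CPT_BUCKETS),
            "provider_type": rng.choice(PROVIDER_TYPES),
            "billed_amount": billed,
            "allowed_amount": round(billed * rng.uniform(0.5, 0.95), 2),
        }
        # Approval probability driven by feature combinations, mimicking
        # real-world patterns (e.g. large bills are denied more often).
        p_approve = 0.85 if claim["billed_amount"] < 1000 else 0.6
        claim["outcome"] = "Approved" if rng.random() < p_approve else "Denied"
        claims.append(claim)
    return claims

# Identical seeds yield identical datasets across runs.
assert generate_claims(5, seed=7) == generate_claims(5, seed=7)
```

Because the generator owns its own `random.Random(seed)` instance, global random state elsewhere in the process cannot break reproducibility.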
This simulation teaches concepts that apply to many production tools. Here's where you'd do each step in real enterprise environments:
| Stage | This Simulation | Real-World Tools |
|---|---|---|
| CI Pipeline | FastAPI pipeline engine | GitHub Actions, GitLab CI, Jenkins, Azure DevOps Pipelines, CircleCI |
| CD Pipeline | FastAPI pipeline engine | Same as CI, or Argo Workflows, Kubeflow Pipelines, AWS Step Functions |
| Experiment Tracking | MLflow (real!) | MLflow, Weights & Biases, Neptune.ai, Comet ML, SageMaker Experiments |
| Model Registry | MLflow Model Registry | MLflow, SageMaker Model Registry, Vertex AI Model Registry, Azure ML |
| Model Deployment | Simulated local pointer | SageMaker Endpoints, Vertex AI, Azure ML, Kubernetes + KServe, Databricks Model Serving |
| Feature Store | Synthetic generator | Feast, Databricks Feature Store, SageMaker Feature Store, Tecton |
| Drift Monitoring | PSI calculation | Evidently AI, WhyLabs, Arize AI, Fiddler, SageMaker Model Monitor |
| Approval Gates | UI button | GitHub PR reviews, Slack/Teams approvals, ServiceNow, PagerDuty |
| Artifact Storage | MinIO (S3-compatible) | AWS S3, GCS, Azure Blob, MinIO |
AWS Stack:
GitHub Actions → SageMaker Pipelines → MLflow/SageMaker → SageMaker Endpoints → CloudWatch
GCP Stack:
Cloud Build → Vertex AI Pipelines → Vertex AI Experiments → Vertex AI Endpoints → Cloud Monitoring
Azure Stack:
Azure DevOps → Azure ML Pipelines → Azure ML → Azure ML Endpoints → Azure Monitor
Open Source Stack:
GitHub Actions → Kubeflow Pipelines → MLflow → KServe/Seldon → Prometheus/Grafana
Databricks Stack:
Databricks Repos → Databricks Workflows → MLflow → Databricks Model Serving → Lakehouse Monitoring
Note: This simulation uses MLflow for real (not simulated), so you're already learning one of the most widely-adopted experiment tracking tools in the industry!
| Component | Status | What It Does Here | Real-World Equivalent |
|---|---|---|---|
| MLflow Tracking | ✅ REAL | Logs experiments, params, metrics, artifacts | Same (MLflow, W&B, Neptune) |
| MLflow Model Registry | ✅ REAL | Registers and versions trained models | Same (MLflow, SageMaker, Vertex AI) |
| PostgreSQL | ✅ REAL | Stores MLflow metadata | Same (PostgreSQL, MySQL, cloud DBs) |
| MinIO Artifacts | ✅ REAL | S3-compatible storage for model files & plots | AWS S3, GCS, Azure Blob |
| scikit-learn Training | ✅ REAL | Trains actual Random Forest models | Same (scikit-learn, XGBoost, PyTorch) |
| SHAP Analysis | ✅ REAL | Generates real feature importance explanations | Same (SHAP, LIME) |
| Docker Compose | ✅ REAL | Orchestrates all services locally | Kubernetes, ECS, Cloud Run |
| WebSocket Logs | ✅ REAL | Streams real-time logs to browser | Same (WebSockets, Server-Sent Events) |
| Git Commits | 🔶 SIMULATED | "Fake Commit" button generates SHA | Real git commits trigger CI via webhooks |
| CI/CD Orchestration | 🔶 SIMULATED | FastAPI runs steps sequentially | GitHub Actions, Jenkins, GitLab CI |
| Deployment | 🔶 SIMULATED | Updates a database pointer | SageMaker Endpoints, Kubernetes, API Gateway |
| Production API | 🔶 SIMULATED | No real inference endpoint | REST/gRPC model serving (KServe, TF Serving) |
| Shadow Scoring | 🔶 SIMULATED | Logs metrics but doesn't serve real traffic | A/B testing frameworks, shadow deployments |
| Approval Workflow | 🔶 SIMULATED | UI button click | GitHub PR reviews, Slack approvals, ServiceNow |
| Claims Data | 🔶 SIMULATED | Synthetic generator (no PHI) | Real claims from data warehouse |
Even though some parts are simulated, you're building real skills:
| Skill | How This Lab Teaches It |
|---|---|
| MLflow API | You interact with a real MLflow server—same API used in production |
| Experiment Design | Real params, metrics, and artifact logging |
| Model Comparison | Real champion vs challenger evaluation logic |
| Reproducibility | Real seed-based deterministic training |
| Docker/DevOps | Real containerized services with networking |
| Pipeline Thinking | Real understanding of CI → CD → Deploy flow |
| Monitoring Concepts | Real PSI drift calculation, real metric tracking |
Bottom Line: The "simulation" is primarily in the orchestration trigger (fake commits instead of real git) and deployment target (database pointer instead of cloud endpoint). Everything else—training, tracking, evaluation, artifacts—is production-grade.
- Interactive Pipeline DAG: Visual representation of CI/CD stages with real-time status updates
- Step Inspector: View actual code, configuration, logs, and outputs for each step
- Real MLflow Integration: Experiment tracking, model registry, and artifact storage
- Synthetic Claims Stream: Live data feed with drift monitoring
- Champion vs Challenger: Model promotion logic with evaluation gates
- Failure Mode Toggles: Simulate various failure scenarios for testing
- Shadow/A-B Testing: Monitor model performance on live traffic
```
┌─────────────────────────────────────────────────────────────────┐
│ Frontend (Next.js) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Pipeline │ │ Step │ │ MLflow │ │
│ │ Graph │ │ Inspector │ │ Explorer │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Claims Stream & Drift Monitor │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│ REST/WebSocket
▼
┌─────────────────────────────────────────────────────────────────┐
│ Backend (FastAPI) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Pipeline │ │ ML │ │ MLflow │ │
│ │ Engine │ │ Scripts │ │ Client │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ PostgreSQL │ │ MLflow │ │ MinIO │
│ (Backend DB) │ │ Server │ │ (Artifacts) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
```
- Docker Desktop (with Docker Compose)
- At least 4GB RAM available for Docker
- Ports available: 3001, 5000, 8000, 9000, 9001, 5432
1. Setup environment (optional - `.env` is already included):

   ```bash
   cp .env.example .env
   ```

2. Start all services:

   ```bash
   docker compose up --build
   ```
3. Access the applications:
- Frontend UI: http://localhost:3001 (main application)
- MLflow UI: http://localhost:5000 (experiment tracking)
- Backend API: http://localhost:8000/docs (API documentation)
- MinIO Console: http://localhost:9001 (artifact storage - optional)
4. Run a pipeline:
- Click "Fake Commit" to simulate a new commit
- Watch the CI pipeline execute
- Click "Continue to CD" after CI completes
- Approve the manual gate to proceed to deployment
| Service | Port | Description |
|---|---|---|
| Frontend | 3001 | Next.js web application |
| Backend | 8000 | FastAPI REST/WebSocket server |
| MLflow | 5000 | MLflow tracking server |
| PostgreSQL | 5432 | Database for MLflow backend |
| MinIO | 9000/9001 | S3-compatible artifact storage |
| Service | Username | Password |
|---|---|---|
| MinIO Console | minioadmin | minioadmin123 |
| PostgreSQL | mlflow | mlflow123 |
Note: MinIO is used for artifact storage behind the scenes. You don't need to log into it for normal usage - it's only needed if you want to browse stored artifacts directly.
The UI is divided into 4 main areas:
```
┌─────────────────┬─────────────────────┬─────────────────┐
│ Pipeline DAG │ Step Inspector │ MLflow Explorer │
│ (left) │ (center) │ (right) │
├─────────────────┴─────────────────────┴─────────────────┤
│ Claims Stream & Drift Monitor │
│ (bottom) │
└─────────────────────────────────────────────────────────┘
```
- Click the "+ Fake Commit" button in the top-right header
- This simulates a new git commit and automatically starts the CI pipeline
- You'll see a new `run_id` appear in the header
- In the Pipeline DAG (left panel), watch the nodes change color:
- Gray = Idle
- Blue (pulsing) = Running
- Green = Success
- Red = Failed
- Click on any node to inspect it in the center panel
When you click a pipeline node, the Step Inspector (center) shows 4 tabs:
| Tab | Description |
|---|---|
| Code | The actual Python script that runs for this step |
| Config | YAML/JSON configuration used by the step |
| Logs | Real-time streaming logs while the step runs |
| Outputs | Metrics, artifacts, and results after completion |
- After CI completes (all CI nodes turn green), the status shows "ci_complete"
- Click "Continue to CD →" button in the header
- The CD pipeline (Full Train → Evaluate → Approval) starts running
- When the pipeline reaches the Manual Approval step, it pauses
- The status shows "awaiting_approval"
- Review the metrics in the Step Inspector (Outputs tab):
- Challenger vs Champion model comparison
- Improvement percentage
- Click "✓ Approve" to proceed to deployment, or "✗ Reject" to stop
After approval:
- The model deploys to Staging
- Shadow monitoring runs to detect drift
- If no issues, the model promotes to Production
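Shadow monitoring can be pictured as scoring each request twice while serving only one answer. A simplified sketch of that idea (function names here are hypothetical, not the lab's actual code):

```python
def serve_with_shadow(request, champion_model, shadow_model, shadow_log: list):
    """Serve the champion's prediction; record the shadow's on the side."""
    champion_pred = champion_model(request)
    # The shadow model sees identical traffic but never affects the response.
    shadow_log.append({"request": request, "shadow_pred": shadow_model(request)})
    return champion_pred

# Toy models standing in for champion and challenger (illustrative only).
champion = lambda claim: "Approved" if claim["billed"] < 1000 else "Denied"
shadow = lambda claim: "Approved" if claim["billed"] < 1500 else "Denied"

log = []
result = serve_with_shadow({"billed": 1200}, champion, shadow, log)
assert result == "Denied"                    # caller only sees the champion
assert log[0]["shadow_pred"] == "Approved"   # disagreement captured for review
```

Comparing the shadow log against actual outcomes later tells you how the challenger would have behaved on production traffic, without ever exposing users to it.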
In the MLflow Explorer (right panel):
- Filter runs by stage: `CI`, `CD`, or `All`
- Click on a run to see:
- Parameters (model settings)
- Metrics (accuracy, F1, AUC-ROC)
- Artifacts (plots, model files)
- Click "MLflow UI ↗" in the header for the full MLflow interface
In the Claims Stream (bottom panel):
- Click "Start Stream" to begin generating synthetic claims
- Watch claims flow in real-time (shows approved/denied)
- The Drift Monitor shows:
- PSI (Population Stability Index)
- Current vs reference statistics
- Drift detection alerts
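The Population Stability Index behind the drift monitor compares the binned distribution of a feature in current traffic against a training-time reference. A minimal, self-contained sketch of the calculation (the lab's actual implementation lives in the backend):

```python
import math

def psi(expected: list, actual: list, bins: int = 10) -> float:
    """Population Stability Index between a reference and a current sample."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            idx = sum(v > e for e in edges)  # which bin v falls into
            counts[idx] += 1
        # small floor avoids log(0) for empty bins
        return [max(c / len(values), 1e-6) for c in counts]

    e_frac, a_frac = bin_fractions(expected), bin_fractions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_frac, a_frac))

reference = [float(x % 100) for x in range(1000)]        # training window
identical = psi(reference, reference)                     # ~0: no drift
shifted = psi(reference, [v + 50.0 for v in reference])   # large: drift
assert identical < 0.1 < shifted
```

A common rule of thumb is PSI < 0.1 for stable, 0.1-0.2 for moderate shift, and > 0.2 for significant drift worth investigating.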
- Click "Failure Modes" button in the header
- Toggle on a failure scenario:
- Schema Validation: Data validation will fail
- Metric Regression: Model performs worse than champion
- MLflow Connection: MLflow logging fails
- Training Error: CI tests fail
- Start a new pipeline to see how failures are handled
- Click "Rollback" button in the header
- This reverts production to the previous model version
- Check the logs to confirm rollback success
- Commit/PR: Triggered on new commit
- CI Tests: Unit tests, integration tests, linting
- Data Validation: Schema and quality checks
- Quick Train: Fast model training on sample data
- MLflow Log: Log CI metrics to MLflow
- Full Train: Complete model training
- Evaluate vs Champion: Compare with production model
- Manual Approval: Human gate for deployment
- Deploy Staging: Push to staging environment
- Shadow Monitor: Run shadow scoring, detect drift
- Promote Production: Make model the new champion
- Rollback: Revert to previous version (available anytime)
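The "Evaluate vs Champion" gate reduces to a comparison of held-out metrics. A simplified sketch of that promotion logic (the metric name and threshold are illustrative assumptions):

```python
def should_promote(champion: dict, challenger: dict,
                   metric: str = "auc_roc", min_improvement: float = 0.005) -> bool:
    """Challenger must beat the champion by a margin on the primary metric.

    Requiring a margin (rather than any improvement at all) guards against
    promoting on evaluation noise. With no champion yet, the first model passes.
    """
    if not champion:
        return True
    return challenger[metric] - champion[metric] >= min_improvement

champion = {"auc_roc": 0.872, "f1": 0.81}
better = {"auc_roc": 0.891, "f1": 0.83}
noise = {"auc_roc": 0.873, "f1": 0.81}

assert should_promote(champion, better)      # clear win -> promote
assert not should_promote(champion, noise)   # within noise -> keep champion
assert should_promote({}, noise)             # no champion yet -> deploy
```

Production systems often extend this with multiple metrics (e.g. the challenger must improve AUC without degrading precision), but the gate shape is the same.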
Toggle failure scenarios from the UI to test error handling:
| Mode | Effect |
|---|---|
| Schema Validation | Data validation fails on missing column |
| Metric Regression | Model performs worse than champion |
| MLflow Connection | MLflow server connection fails |
| Training Error | CI tests fail |
- `POST /pipeline/start` - Start new pipeline run
- `POST /pipeline/commit` - Generate fake commit and start CI
- `GET /pipeline/{run_id}/status` - Get pipeline status
- `POST /pipeline/{run_id}/approve` - Approve manual gate
- `POST /pipeline/{run_id}/reject` - Reject manual gate
- `POST /pipeline/rollback` - Rollback to previous model
- `GET /steps` - List all step definitions
- `GET /steps/{name}/code` - Get step source code
- `GET /steps/{name}/config` - Get step configuration
- `GET /mlflow/runs` - List MLflow runs
- `GET /mlflow/runs/{run_id}` - Get run details
- `GET /mlflow/champion` - Get current champion model
- `GET /claims/generate` - Generate synthetic claims
- `POST /claims/stream/start` - Start claims stream
- `POST /claims/stream/stop` - Stop claims stream
- `GET /claims/drift` - Calculate drift metrics
- `GET /failures` - List failure modes
- `POST /failures/{mode}/toggle` - Toggle failure mode
- `ws://localhost:8000/ws/logs/{run_id}` - Pipeline log stream
- `ws://localhost:8000/ws/claims` - Claims data stream
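Against a running stack, the pipeline can be driven from a script instead of the UI. A standard-library sketch: the endpoint paths come from the list above, but response fields such as `run_id` and `status` are assumptions about the payload shape:

```python
import json
import urllib.request

BASE = "http://localhost:8000"

def api(method: str, path: str) -> dict:
    """Tiny helper around the backend's REST endpoints."""
    req = urllib.request.Request(BASE + path, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def is_awaiting_approval(status: str) -> bool:
    # The UI shows "awaiting_approval" when the manual gate pauses the run.
    return status == "awaiting_approval"

# Typical flow (requires the stack from `docker compose up` to be running):
#   run = api("POST", "/pipeline/commit")                 # fake commit starts CI
#   info = api("GET", f"/pipeline/{run['run_id']}/status")
#   if is_awaiting_approval(info["status"]):
#       api("POST", f"/pipeline/{run['run_id']}/approve")
assert is_awaiting_approval("awaiting_approval")
assert not is_awaiting_approval("ci_complete")
```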
Frontend:

```bash
cd frontend
npm install
npm run dev
```

Backend:

```bash
cd backend
pip install -r requirements.txt
uvicorn app.main:app --reload
```

Run tests:

```bash
# Backend tests
cd backend
pytest

# Frontend tests
cd frontend
npm test
```

The system generates synthetic medical claims with:
- CPT Buckets: 10 categories (Evaluation, Surgery, Radiology, etc.)
- Provider Types: Hospital, Physician Office, Clinic, Urgent Care, Telehealth
- Diagnosis Groups: 20 categories (Cardiovascular, Respiratory, etc.)
- Amounts: Billed and allowed amounts with realistic distributions
- Outcome: Settlement prediction (Approved/Denied)
All data is synthetic and contains no PHI.
- Type: Random Forest Classifier (scikit-learn)
- Features: CPT bucket, provider type, billed amount, allowed amount, diagnosis group, patient age
- Target: Settlement outcome (binary classification)
- Metrics: Accuracy, F1 Score, AUC-ROC, Precision, Recall
- Interpretability: SHAP summary plots, feature importance
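A minimal, self-contained sketch of the kind of training the pipeline performs. The feature matrix below is a synthetic stand-in (random numeric encodings with a planted signal), not the lab's actual training script:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in features, roughly mirroring the claim schema:
# [cpt_bucket_id, provider_type_id, billed, allowed, dx_group_id, age]
rng = np.random.default_rng(42)  # seeded for reproducibility
n = 2000
X = np.column_stack([
    rng.integers(0, 10, n),
    rng.integers(0, 5, n),
    rng.uniform(100, 5000, n),
    rng.uniform(50, 4500, n),
    rng.integers(0, 20, n),
    rng.integers(18, 90, n),
])
# Target depends on the features, so the forest has signal to learn:
# approved when the allowed/billed ratio is high and the patient is under 65.
y = ((X[:, 3] / X[:, 2] > 0.5) & (X[:, 5] < 65)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
assert auc > 0.8  # the planted signal is easy for the forest to recover
```

Fixing `random_state` on both the data split and the forest is what lets the lab's `seed` tag guarantee bit-for-bit reproducible training runs.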
All runs include tracking tags:
- `commit_sha`: Simulated git commit
- `stage`: CI/CD/Deploy
- `dataset_window`: Data range used
- `seed`: Random seed for reproducibility
- `model_version`: Version string
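Attaching these tags is a one-liner per run. The helper below is a hedged sketch (pure Python, so it runs without a tracking server); the `mlflow.set_tags` call shown in the comment is the real MLflow Tracking API:

```python
def build_run_tags(commit_sha: str, stage: str, dataset_window: str,
                   seed: int, model_version: str) -> dict:
    """Assemble the reproducibility tags this lab attaches to every run."""
    return {
        "commit_sha": commit_sha,
        "stage": stage,                      # "CI", "CD", or "Deploy"
        "dataset_window": dataset_window,
        "seed": str(seed),                   # MLflow tag values are strings
        "model_version": model_version,
    }

# Illustrative values, not real run data.
tags = build_run_tags("a1b2c3d", "CI", "2024-01-01..2024-06-30", 42, "v1.4.0")
# Inside an active run this would be logged with:
#   import mlflow
#   with mlflow.start_run():
#       mlflow.set_tags(tags)
assert tags["seed"] == "42" and tags["stage"] == "CI"
```

With these tags in place, any past result can be reproduced by checking out `commit_sha`, regenerating data for `dataset_window`, and retraining with `seed`.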
```bash
# Check service logs
docker compose logs -f [service_name]

# Restart specific service
docker compose restart [service_name]
```

```bash
# Check what's using a port
netstat -tulpn | grep [port]

# Or on Windows
netstat -ano | findstr [port]
```

```bash
# Reset everything
docker compose down -v
docker compose up --build
```

- Check MLflow server is running: http://localhost:5000
- Check backend can connect: http://localhost:8000/health
- Verify MinIO bucket exists: http://localhost:9001
MIT License - See LICENSE file for details.
The main dashboard showing the pipeline DAG, step inspector, and MLflow explorer.
Real-time experiment tracking with metrics, parameters, and artifact storage.
Live synthetic claims feed with drift monitoring.
