GitHub - sagm07/bloop: ML failure analysis agent

# Bloop 🔍

### The Ruthless ML Auditor Agent

Bloop is a git-native AI agent built on the gitagent standard.

You give it a broken ML model. It tells you exactly where it fails, why it fails, and what to do about it, in that order.

No sugarcoating. No generic advice. Just answers.

---

## The Problem Bloop Solves

Every ML engineer has been here:

- Your model is stuck at 87% accuracy
- You don't know if it's your data, your features, or your model
- You waste days trying random fixes

Bloop runs a structured 3-step audit and gives you a ranked action plan in seconds.

---

## How It Works

Bloop uses 3 skills in sequence:

### 1. Segment Analysis

Finds exactly where the model fails: which classes, slices, or cohorts have the worst performance. Surfaces critical failures (F1 < 0.5).
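To make the idea of a per-segment score concrete, here is a minimal sketch of computing F1 for each segment of a binary classifier and flagging segments below the 0.5 threshold mentioned above. The function name `f1BySegment`, the row shape, and the example data are assumptions for illustration, not Bloop's actual implementation.

```javascript
// Hypothetical helper (not part of Bloop): per-segment F1 for a binary classifier.
// Each row is { segment, label, prediction } with label/prediction in {0, 1}.
function f1BySegment(rows, criticalThreshold = 0.5) {
  const counts = new Map();
  for (const { segment, label, prediction } of rows) {
    const c = counts.get(segment) ?? { tp: 0, fp: 0, fn: 0 };
    if (prediction === 1 && label === 1) c.tp += 1;
    else if (prediction === 1 && label === 0) c.fp += 1;
    else if (prediction === 0 && label === 1) c.fn += 1;
    counts.set(segment, c);
  }

  const results = [];
  for (const [segment, { tp, fp, fn }] of counts) {
    const denom = 2 * tp + fp + fn;
    // F1 = 2TP / (2TP + FP + FN); treated as 0 when the denominator is 0.
    const f1 = denom === 0 ? 0 : (2 * tp) / denom;
    results.push({ segment, f1, critical: f1 < criticalThreshold });
  }
  // Worst-performing segments first.
  return results.sort((a, b) => a.f1 - b.f1);
}

// Made-up example data: two segments with four predictions each.
const segmentRows = [
  { segment: "low_light", label: 1, prediction: 1 },
  { segment: "low_light", label: 1, prediction: 0 },
  { segment: "low_light", label: 1, prediction: 0 },
  { segment: "low_light", label: 0, prediction: 1 },
  { segment: "normal", label: 1, prediction: 1 },
  { segment: "normal", label: 1, prediction: 1 },
  { segment: "normal", label: 0, prediction: 0 },
  { segment: "normal", label: 1, prediction: 0 },
];

const segmentReport = f1BySegment(segmentRows);
```

Sorting worst-first means the segments that need attention appear at the top of the report.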

### 2. Root Cause Analysis

Diagnoses why it fails: class imbalance, label noise, feature leakage, or distribution shift. Always backed by evidence.
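As an illustration of what "backed by evidence" could look like for one of these causes, the sketch below measures class imbalance directly from the labels. The helper `classBalance` and its `minShare` cutoff of 0.3 are hypothetical choices for this example. How Bloop gathers its evidence is not specified here.

```javascript
// Hypothetical helper (not part of Bloop): report class shares and flag imbalance.
// minShare is an assumed cutoff: the minority class must hold at least this
// fraction of the samples for the dataset to count as balanced.
function classBalance(labels, minShare = 0.3) {
  const counts = {};
  for (const y of labels) counts[y] = (counts[y] ?? 0) + 1;

  const total = labels.length;
  const shares = Object.fromEntries(
    Object.entries(counts).map(([cls, n]) => [cls, n / total])
  );
  const values = Object.values(shares);
  const minority = values.length === 0 ? 1 : Math.min(...values);

  return { shares, imbalanced: minority < minShare };
}

// Made-up labels with an 80% negative / 20% positive split.
const exampleLabels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1];
const balanceReport = classBalance(exampleLabels);
```

The returned `shares` object doubles as the evidence: a finding of class imbalance can quote the measured split instead of asserting it.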

### 3. Fix Generator

Produces a ranked action plan: each fix is linked to a root cause, with an expected accuracy gain and an effort level.
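One way to rank fixes that carry both an expected gain and an effort level is to score each by gain per unit of effort. The sketch below does this. The scoring rule, the `rankFixes` name, and the field names are assumptions for illustration. How Bloop itself orders fixes is not specified here, so its order may differ.

```javascript
// Hypothetical helper (not part of Bloop): rank fixes by expected gain per unit effort.
// gainMin/gainMax are expected accuracy gains in percentage points; effort is 1-10.
function rankFixes(fixes) {
  return fixes
    .map((fix) => ({
      ...fix,
      // Assumed scoring rule: midpoint of the gain range divided by effort.
      score: (fix.gainMin + fix.gainMax) / 2 / fix.effort,
    }))
    .sort((a, b) => b.score - a.score);
}

// Illustrative candidate fixes, each linked to a root cause.
const candidateFixes = [
  { name: "Label smoothing", rootCause: "Label noise", gainMin: 1, gainMax: 2, effort: 5 },
  { name: "Collect balanced data", rootCause: "Class imbalance", gainMin: 2, gainMax: 4, effort: 8 },
  { name: "Oversample with SMOTE", rootCause: "Class imbalance", gainMin: 2, gainMax: 3, effort: 6 },
];

const rankedFixes = rankFixes(candidateFixes);
```

Dividing by effort favors cheap wins. A team that cares only about the final accuracy could sort by the gain midpoint alone.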

---

## Demo

Set your Groq API key (PowerShell):

    $env:GROQ_API_KEY="your-key"

Run Bloop:

    node run.mjs
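If the key is missing, a script may fail only when it first calls the API. A minimal sketch of a preflight check follows, assuming a hypothetical helper `requireGroqKey` that is not part of Bloop. It takes the environment object as a parameter so it can be exercised without touching the real environment.

```javascript
// Hypothetical helper (not part of Bloop): fail fast when the API key is absent.
function requireGroqKey(env) {
  const key = env.GROQ_API_KEY;
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error("GROQ_API_KEY is not set; set it before running Bloop.");
  }
  return key.trim();
}

// Typical use at the top of an entry script:
//   const apiKey = requireGroqKey(process.env);
```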

### Sample Output


    Segment Analysis
    - Class 1 (positive): F1 = 0.60 - critical failure
    - Low lighting images: F1 = 0.45
    - Retinal hemorrhages: F1 = 0.48

    Root Causes
    1. Class imbalance - 80% negative, 20% positive
    2. Label noise - low confidence scores in Class 1
    3. Distribution shift - train vs validation gap

    Fix Plan
    1. Oversample with SMOTE → +2-3% accuracy, effort 6/10
    2. Label smoothing → +1-2% accuracy, effort 5/10
    3. Collect balanced data → +2-4% accuracy, effort 8/10

---

## Agent Structure


    bloop/
    ├── agent.yaml              # Agent manifest
    ├── SOUL.md                 # Ruthless ML auditor personality
    ├── RULES.md                # Hard constraints: never hallucinate metrics
    ├── run.mjs                 # Entry point
    └── skills/
        ├── segment-analysis/   # Where does it fail?
        ├── root-cause/         # Why does it fail?
        └── fix-generator/      # How do we fix it?

---

## Built With

- gitagent standard: git-native agent definition
- gitclaw: agent runtime
- Groq: llama-3.3-70b-versatile
- Node.js

---

## Real World Impact

Bloop was built to solve a real problem: an XGBoost model for diabetic retinopathy detection had plateaued at 87% accuracy. Bloop diagnosed the issue in seconds: class imbalance in positive cases, label noise in low-lighting images, and distribution shift.

Everyone trains models. Almost no one does deep failure analysis.

Bloop does.
