Agent-based system for AI-driven microbial cultivation and growth media design
Part of the CultureBotAI initiative led by Dr. Marcin Joachimiak at Lawrence Berkeley National Laboratory.
Last updated: 2026-05-18
- ✅ DBTL round 2 complete (May 2026): full analysis pipeline executed (3-way Pareto, MC stability, abiotic correction, t2-paired biology, kinetic fits, precipitation risk).
- 📍 t2 (6 h) adopted as canonical analysis endpoint (2026-05-15): it is the only timepoint with paired abiotic controls. Round-2 scripts default to t2; pass --endpoint-timepoint t3 for legacy behaviour.
- 📑 Round 3 (v16) plan published: 8-factor design adding Nd³⁺ and citrate; LanM-fluorescence primary Nd assay + cell-pellet ICP-MS confirmatory subset.
- 🛂 ResearchAuditor framework added: file / data / provenance auditors orchestrated by src/microgrowagents/agents/analysis/research_auditor.py.
- 🛠️ Two new Claude Code slash commands: /plot (natural-language plots via scripts/microgrow-plot.py) and /validate-linkml (schema + example validator).
- Overview
- Documentation Quick Links
- DBTL Campaign Status
- Installation
- Quick Start
- Agents & Skills
- Experimental Analysis Pipeline
- Cofactor & Chemistry Reference
- Data Integrity, Provenance & Audit
- Core Capabilities
- Advanced Usage
- Repository Structure
- Development
- Tools, APIs & Datasets
- Historical Appendix
- Contributing / License / Citation / Contact
MicroGrowAgents bridges the microbial cultivation gap through AI-powered multi-agent systems that integrate knowledge graphs, machine learning, and experimental automation. The platform combines specialized agents (LiteratureAgent, AnalogyReasoningAgent, GenomeFunctionAgent, MediaFormulationAgent, …) operating on KG-Microbe (864,000+ validated species) to design optimized growth media for previously uncultured microorganisms, and now drives a multi-round Design-Build-Test-Learn (DBTL) campaign for Methylorubrum extorquens AM1 ΔmxaF lanthanide-dependent growth.
- Project state: docs/STATUS.md
- Round-3 plans: outputs/round2_recommendations/v16_design_recommendation.md, outputs/round3_recommendations/nd_assay_alternatives_report.md
- Agents / skills: docs/AGENTS_SKILLS_TOOLS.md
- Audit: docs/RESEARCH_AUDITOR.md, docs/AUDIT_REPORT_BBOP_SKILLS.md
- Cofactors / chemistry: docs/COFACTOR_REFERENCE.md, docs/LANTHANIDE_BIOAVAILABILITY_COMPLETE.md
- Optimization pipeline: docs/OPTIMIZATION_GUIDE.md
- Architecture diagrams: docs/architecture/README.md
- Dev guidance: CLAUDE.md
- 🧪 DBTL campaign infrastructure — multi-round analysis pipeline for M. extorquens AM1 lanthanide-dependent growth, from raw plate-reader CSVs through MaxPro+OptBlock design generation to Bayesian-optimisation seeds.
- 🔬 Advanced chemistry — osmolarity / water activity, redox potential, C:N:P ratios, Gibbs free energy via eQuilibrator, NdPO₄ precipitation, citrate/malate chelation.
- 🤖 Multi-agent media formulation — literature mining (245+ papers), analogy reasoning (208K+ chemical embeddings), genome-guided design (57 Bakta-annotated genomes), toxicity flagging.
- 🗺️ Response surface modelling — Gaussian Process fits, Pareto frontiers, expected-improvement Bayesian optimisation, Sobol sensitivity.
- 🛂 ResearchAuditor — file / data / provenance / report auditors for scientific reproducibility across the analysis pipeline.
- 📚 Sheet Query System — entity lookup, cross-reference, publication search, evidence-rich reports.
Two rounds of DBTL executed on the v10 MaxPro+OptBlock design (69 designed
conditions + ctrl_media baseline, 4 replicates) for M. extorquens AM1
ΔmxaF under lanthanide stress. Round 3 (v16) is in planning. Full
narrative in docs/STATUS.md §DBTL Campaign Status.
| Round | Date | Growth assay | Nd assay | Status |
|---|---|---|---|---|
| 1 | Feb–Mar 2026 | OD600 @ 600 nm, 3 timepoints | Arsenazo III @ 660 nm | analysed |
| 2 | May 2026 | Biolog PM08, 740/590 nm, 144 timepoints | Arsenazo III @ 660 nm (15 µM Nd dose) | analysed |
| 3 | planned | TBD | LanM-fluorescence (proposed) | planning |
First execution of the v10 design. Two parallel measurement modalities: OD600 plate-reader and arsenazo III Nd-depletion assay. Key findings preserved in the Historical Appendix; summary: peak biomass (MPOB_040, max OD600 0.95) and most stable growth (MPOB_053, mixed C1+C2 metabolism) identified.
Repeat of v10 with minor recipe adjustments. Switched growth modality to Biolog PM08 (740 nm biomass + 590 nm redox in parallel), bumped Nd³⁺ dose to 15 µM, added 48 row-B Nd calibration standards across 4 plates.
Canonical analysis endpoint: t2 (6 h) since 2026-05-15 — t2 is the only
timepoint with a paired abiotic control, making it the only timepoint at
which chemistry-vs-biology attribution is empirically possible. All
round-2 scripts default to t2; pass --endpoint-timepoint t3 for legacy.
At t2 the 3-way Pareto frontier is 3 conditions:
| Condition | OD600 t2 | Abs590 t2 | Nd remaining t2 (µM) | MC freq |
|---|---|---|---|---|
| MPOB_058 | 0.241 | 0.300 | 2.58 | 0.99 (stable) |
| MPOB_008 | 0.231 | 0.293 | 2.07 | borderline |
| MPOB_019 | 0.208 | 0.251 | 1.70 | borderline |
The five conditions that were t3-only Pareto winners (MPOB_022/_066/_020/_035/_024) fell off the frontier at t2; their late depletion happened between t2 and t3, outside the paired-control window and so cannot be attributed to biology without further measurement (cell-pellet ICP-MS recommended in round 3).
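The frontier above can be reproduced in a few lines. This is a minimal sketch, assuming OD600 and Abs590 are maximised and Nd remaining is minimised; the objective directions, the function names, and the DOMINATED_EXAMPLE row are illustrative assumptions, and scripts/three_way_pareto_round2.py may work differently. The three MPOB rows are copied from the table.

```python
from typing import Dict, List, Tuple

# (OD600 t2, Abs590 t2, Nd remaining t2 in µM); values copied from the table above.
CONDITIONS: Dict[str, Tuple[float, float, float]] = {
    "MPOB_058": (0.241, 0.300, 2.58),
    "MPOB_008": (0.231, 0.293, 2.07),
    "MPOB_019": (0.208, 0.251, 1.70),
    # Hypothetical row, added only to show what a dominated point looks like.
    "DOMINATED_EXAMPLE": (0.200, 0.250, 2.60),
}


def dominates(a: Tuple[float, float, float], b: Tuple[float, float, float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one.

    Assumed directions: maximise OD600 and Abs590, minimise Nd remaining.
    """
    # Flip the sign of Nd remaining so that every objective is "higher is better".
    a_max = (a[0], a[1], -a[2])
    b_max = (b[0], b[1], -b[2])
    at_least_as_good = all(x >= y for x, y in zip(a_max, b_max))
    strictly_better = any(x > y for x, y in zip(a_max, b_max))
    return at_least_as_good and strictly_better


def pareto_front(points: Dict[str, Tuple[float, float, float]]) -> List[str]:
    """Return the names of all non-dominated conditions, sorted by name."""
    front = []
    for name, values in points.items():
        others = (v for other, v in points.items() if other != name)
        if not any(dominates(v, values) for v in others):
            front.append(name)
    return sorted(front)


FRONT = pareto_front(CONDITIONS)
print(FRONT)
```

Each of the three real conditions wins on at least one axis (MPOB_058 on growth and redox, MPOB_019 on Nd remaining), so none dominates another; only the hypothetical row is removed.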
Round-2 analysis outputs live in 8 subdirs under outputs/round2_*:
- outputs/round2_3way_pareto/: joint OD600 × Abs590 × Nd Pareto.
- outputs/round2_mc_pareto/: Monte-Carlo Pareto stability under replicate σ; emits a 2-panel composite figure plus standalone histogram and scatter PDFs/PNGs.
- outputs/round2_double_winners/: cross-cluster join (only MPOB_008 is a majority growth ∩ Nd-uptake double winner).
- outputs/round2_abiotic_correction/: empirical t1→t2 abiotic drift per condition.
- outputs/round2_t2_paired_biology/: (biotic − abiotic) at t2.
- outputs/round2_precipitation_risk/: Q/Ksp NdPO₄ model (refuted by the abiotic data).
- outputs/round2_kinetic_fits/: per-condition kinetic fits.
- outputs/round1_vs_round2/: reproducibility report (Spearman ρ ≈ 0: measurement-modality drift, not biology drift).
Abs590 caveat: r(OD600, Abs590) = 0.982 across all 70 conditions — the redox channel adds no independent information beyond biomass in round 2 and is a candidate to drop in v16 unless the chemistry changes (e.g., a tetrazolium dye that decouples respiration from growth).
The v16 design upgrades from 6 factors to 8 factors — adding Nd³⁺ (0–30 µM, 5-point grid) and citrate (0–300 µM) as first-class variables so the next round can disentangle MxaF-MDH vs XoxF-MDH and chemistry-vs-biology attribution. The full proposal:
- v16_design_recommendation.md: factor ranges, t2-canonical 13-well anchor allocation (MPOB_058 × 4, MPOB_008 × 3, MPOB_019 × 2, plus 4 t3-only references × 1).
- v16_bo_seeds.md: 10 Gaussian-Process + Expected-Improvement seed candidates (top predicted OD600 = 0.268 vs round-2 best 0.241).
- nd_assay_alternatives_report.md + nd_assay_alternatives_1pager.md: recommends lanmodulin (LanM) fluorescence as primary HT readout (picomolar Nd affinity, 10⁸× Ca²⁺ selectivity, no per-plate calibration), cell-pellet ICP-MS on the 3 t2 Pareto winners + 5 BO seeds (≈ 32 samples) as the confirmatory subset, and an optional 1-plate arsenazo III bridge for cross-round comparability.
New collaborator? See the full Getting Started Guide for detailed setup including data downloads, database build, and troubleshooting.
# Clone the repository
git clone --recurse-submodules https://github.com/CultureBotAI/MicroGrowAgents.git
cd MicroGrowAgents
# Install dependencies using uv
uv sync --group dev
# Download framework data (KG-Microbe, MediaDive, embeddings)
just download
# Download BER-CMM-AM1 project data (optional, for AM1 work)
just download-project
# Build database from downloaded sources
just build-db
# Verify installation
just test

# Growth (Biolog 740 nm), Nd uptake (arsenazo III), and redox (Biolog 590 nm)
just analyze-experimental-round2 data/experimental/plate_designs_v10_maxprooptblock_long__round2_results
just analyze-experimental-round2-nd data/experimental/plate_designs_v10_maxprooptblock_long__round2_results_asezuran
just analyze-experimental-round2-redox data/experimental/plate_designs_v10_maxprooptblock_long__round2_results
# Joint OD600 × Nd_uM Pareto at the canonical t2 endpoint
uv run python scripts/three_way_pareto_round2.py
uv run python scripts/mc_pareto_round2.py # MC stability + 2-panel figure
# Round-3 deliverables (already committed under outputs/round{2,3}_recommendations/)

# Get MP medium concentrations
uv run python run.py gen-media-conc "MP medium"
# Custom ingredients with PubChem enrichment
uv run python run.py gen-media-conc "glucose,NaCl,KH2PO4" --mode ingredients --enrich pubchem

# Basic pH + salinity sweep
uv run python run.py sensitivity "MP medium"
# With all advanced properties
uv run python run.py sensitivity "MP medium" \
  --calculate-osmotic --calculate-redox --calculate-nutrients --plot

MicroGrowAgents provides 29 specialized agent classes, 52 user-facing Python skills, and 22 Claude Code slash commands for microbial cultivation and media design. Complete reference in docs/AGENTS_SKILLS_TOOLS.md.
Knowledge & Reasoning:
- KGReasoningAgent: query KG-Microbe (1.5M nodes, 5.1M edges)
- LiteratureAgent: literature mining and evidence extraction
- AnalogyReasoningAgent: chemical similarity search (208K+ embeddings)
- SheetQueryAgent: query extended information sheets
Genome Analysis:
- GenomeFunctionAgent: genome-guided media design (57 genomes, 667K features)
- LanthanideGenesAgent: lanthanide-dependent gene analysis
- TransporterAgent: nutrient transporter annotation and analysis
Media Design & Optimization:
- MediaFormulationAgent: multi-source media recommendation
- GenMediaConcAgent: ML-based concentration prediction
- CofactorMediaAgent: cofactor requirement analysis
- AlternateIngredientAgent: alternative ingredient suggestions
- MediaRoleAgent: ingredient metabolic role classification
- MaxProOptBlockAgent: MaxPro optimal blocking design generation
- ReconcileAgent: experimental vs prediction reconciliation
- EnsembleOptimizationAgent: response surface modelling and BO
- DesignRecommendationAgent: interpret results to recommend next design
- ExperimentalInterpretationAgent: evidence-based biological interpretations with inline citations
Metabolic Modeling:
- MetabolicSourceAgent: metabolic source identification
- GapMindAgent: GapMind pathway gap analysis
- GEMsemblerAgent: genome-scale metabolic model reconstruction
- GrowthCodonAgent: codon usage bias-based growth prediction
- MediaMatchAgent: MediaDive database integration
Chemistry & Properties:
- ChemistryAgent: osmotic, redox, nutrient-ratio calculations
- SensitivityAnalysisAgent: parameter sweep and sensitivity analysis
Audit & Provenance: (new)
- ResearchAuditor: orchestrates file / data / provenance / report auditors against the analysis pipeline (src/microgrowagents/agents/analysis/research_auditor.py).
- SchemaReviewAgent: LinkML schema review helper.
Data Management:
- SQLAgent: database queries
- IngredientCooccurrenceAgent, IngredientEffectsEnrichmentAgent, CSVAllDOIsEnrichmentAgent
- PDFEvidenceExtractor, EvidenceExtractionOrchestrator: multi-source evidence orchestration
64 skill modules organised under src/microgrowagents/skills/ —
Analysis (19+), Prediction & Design (12), Query & Search (5),
Chemistry & Validation (5), Workflows (6), Utilities (3), Meta (2,
new — includes validate_linkml). The user-facing count is 52 after the
recent additions; see
docs/AGENTS_SKILLS_TOOLS.md §Skills for the
complete categorical listing.
Slash commands under .claude/skills/ callable from Claude Code:
| Command | Purpose |
|---|---|
| /plot (new) | Publication-quality plots from data files via natural language. Backed by scripts/microgrow-plot.py; 11 plot types, 4 journal style presets (nature/science/minimal/dark), PNG/PDF/SVG output. |
| /validate-linkml (new) | Validate a LinkML schema + example pair. Backed by src/microgrowagents/skills/meta/validate_linkml.py and scripts/validate_linkml_cli.py. |
| /recommend-media | Multi-agent media formulation recommendation. |
| /design-maxpro-optblock | MaxPro+OptBlock experimental design generation. |
| /lhs-design-generation | Latin-hypercube design generation. |
| /predict-concentration | Predict ingredient concentration ranges. |
| /predict-growth-cub, /predict-growth-hybrid | Codon-usage-bias and hybrid growth predictors. |
| /analyze-gaps, /analyze-limitations, /analyze-lanthanide-genes, /analyze-ingredient-cooccurrence | Analysis utilities. |
| /check-carbon-sources, /compare-gap-fba | Carbon-source and gap-vs-FBA comparators. |
| /fba-gene-knockout-lanthanophore | FBA-based gene-knockout analysis for lanthanophore biosynthesis. |
| /search-ingredients-hierarchical, /search-mediadive | Search utilities. |
| /ingredient-report | Per-ingredient evidence report. |
| /validate-media, /review-schema, /file-naming-conventions | Validation + standards helpers. |
Comprehensive dual-pipeline for analysing experimental growth data with both absolute (raw OD600) and relative (vs baseline) analysis modes, plus response surface modelling and Bayesian optimisation.
Features:
- 📊 Dual-mode analysis (absolute + relative).
- 🔬 Hierarchical clustering (276 replicates, 6 clusters).
- 🗺️ Gaussian Process response surfaces with multi-objective Pareto.
- 🤖 Ensemble optimisation (GP + polynomial + Random Forest).
- 🎯 Bayesian optimisation with Expected Improvement acquisition.
- 📈 ANOVA, main effects, Sobol sensitivity indices.
- ✅ Schema-driven validation of all outputs.
- 🔍 Evidence-based interpretation with inline citations.
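For readers unfamiliar with the Expected Improvement acquisition named in the feature list, here is a minimal standard-library sketch of the closed-form EI for a maximisation problem under a Gaussian posterior. The function name and the xi exploration parameter are illustrative and not taken from the repository, whose optimiser may use a different variant.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """Closed-form EI for maximisation, given posterior mean mu and std sigma.

    best is the best objective value observed so far; xi >= 0 is an optional
    exploration margin (an assumption, not a repository setting).
    """
    improvement = mu - best - xi
    if sigma <= 0.0:
        # No predictive uncertainty: EI reduces to the plain positive improvement.
        return max(improvement, 0.0)
    z = improvement / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    return improvement * cdf + sigma * pdf


# A candidate predicted at the current best still has positive EI if it is uncertain.
print(expected_improvement(mu=0.241, sigma=0.02, best=0.241))
```

The second term, sigma times the density, is what rewards uncertain candidates even when their predicted mean does not beat the incumbent.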
The round-2 data ships as Biolog raw CSVs + per-condition rollups + per-well
Nd predictions (a different layout than round-1's flat plate{1,2,3}.tsv).
The adapter at scripts/build_round2_replicate_statistics.py converts both
into the round-1 schema so all downstream recipes run unchanged:
# Growth (Biolog 740 nm) — builds outputs/..._round2_results_experimental_analysis_absolute/
just analyze-experimental-round2 data/experimental/plate_designs_v10_maxprooptblock_long__round2_results
# Nd uptake (arsenazo III)
just analyze-experimental-round2-nd data/experimental/plate_designs_v10_maxprooptblock_long__round2_results_asezuran
# Redox channel (Biolog 590 nm) — note the r=0.982 with OD600 (see DBTL §Round 2)
just analyze-experimental-round2-redox data/experimental/plate_designs_v10_maxprooptblock_long__round2_results

The following per-analysis scripts accept --endpoint-timepoint {t1,t2,t3}; default is t2 since 2026-05-15:
- scripts/three_way_pareto_round2.py
- scripts/mc_pareto_round2.py
- scripts/compare_round1_vs_round2.py
- scripts/analyze_round2_precipitation_risk.py
- scripts/plot_pairwise_response_surfaces.py
The joint Pareto/BO driver (scripts/analyze_response_surfaces.py)
auto-detects: tries t2 first, falls back to t3, then to max_*
columns — no flag needed.
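The fallback order described above (t2, then t3, then max_* columns) amounts to a first-match lookup. The sketch below is an illustration only: the column naming scheme (OD600_t2, max_OD600) and the function name are hypothetical, and the actual detection logic in scripts/analyze_response_surfaces.py may differ.

```python
from typing import Iterable, Optional


def pick_endpoint_column(columns: Iterable[str], measurement: str) -> Optional[str]:
    """Return the first available endpoint column in the order t2, t3, max_*.

    Column names such as "OD600_t2" and "max_OD600" are assumed for illustration.
    """
    available = set(columns)
    candidates = (f"{measurement}_t2", f"{measurement}_t3", f"max_{measurement}")
    for candidate in candidates:
        if candidate in available:
            return candidate
    return None  # caller decides how to handle a table with no usable endpoint


print(pick_endpoint_column(["OD600_t1", "OD600_t3", "max_OD600"], "OD600"))
```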
The original (round-1) flat-file pipeline:
# Run BOTH absolute and relative analyses (recommended)
just analyze-experimental data/experimental/plate_designs_v10_maxprooptblock_long__results
# Either mode alone
just analyze-experimental-absolute data/experimental/plate_designs_v10_maxprooptblock_long__results
just analyze-experimental-relative data/experimental/plate_designs_v10_maxprooptblock_long__results
# Cluster
just cluster-experimental outputs/plate_designs_v10_maxprooptblock_long__results_experimental_analysis_absolute/v10_maxprooptblock_long__results_replicate_statistics_absolute.tsv outputs/plate_designs_v10_maxprooptblock_long__results_experimental_analysis_clustering_absolute absolute
# Validate
just validate-experimental plate_designs_v10_maxprooptblock_long__results

Modes:
- Absolute (raw OD600): "Which conditions grew best overall?"
- Relative (fold-change vs control): "Which variations improved over baseline media?"
Output directories (per source data ID, e.g.,
v10_maxprooptblock_long__results):
- outputs/{source_data_id}_experimental_analysis_{mode}/
- outputs/{source_data_id}_experimental_analysis_clustering_{mode}/
Every output file is labelled with the source data ID for full traceability
(the auto-generated prefix removes the plate_designs_ portion of the
directory name and adds a trailing underscore).
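The prefix rule above can be expressed as a small helper. This is a sketch with a hypothetical function name, assuming plate_designs_ always appears at the start of the directory name; the repository's own implementation is not shown here.

```python
def output_prefix(source_dir_name: str) -> str:
    """Drop a leading "plate_designs_" and append a trailing underscore."""
    marker = "plate_designs_"
    if source_dir_name.startswith(marker):
        stem = source_dir_name[len(marker):]
    else:
        stem = source_dir_name  # assumption: names without the marker pass through
    return stem + "_"


PREFIX = output_prefix("plate_designs_v10_maxprooptblock_long__results")
print(PREFIX + "replicate_statistics_absolute.tsv")
# v10_maxprooptblock_long__results_replicate_statistics_absolute.tsv
```

The printed name matches the replicate-statistics file passed to just cluster-experimental in the commands above.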
Optional response surface modelling using Gaussian Processes for ingredient-measurement relationships and multi-objective optimisation:
# Runs automatically with analyze-experimental (enabled by default)
just analyze-experimental data/experimental/plate_designs_v13_latinhypercube_long__results
# Faster analysis without surfaces
python scripts/run_dual_analysis.py data/experimental/plate_designs_v10_maxprooptblock_long__results --disable-response-surfaces
# Standalone
python scripts/analyze_response_surfaces.py \
outputs/plate_designs_v13_latinhypercube_long__results_experimental_analysis_absolute/ \
  --mode absolute --measurements OD600 Nd_uM

Capabilities: 3D surface plots, Pareto frontiers, predictions over design space, contour maps.
Measurement interpretation:
- OD600 — biomass; absolute = raw, relative = fold-change vs control.
- Nd_uM — Nd remaining in supernatant. In round-2, the absolute value at t2 is what the canonical pipeline reports; in the round-1 relative framing, negative = more consumption than control baseline. Initial Nd dose: 5.5 µM (round-1), 15 µM (round-2).
Outputs (per mode):
- response_surfaces/surface_predictions_{measurement}_{mode}.csv
- response_surfaces/surface_3d_{measurement}_{mode}.pdf/png
- response_surfaces/pareto_frontier_{mode}.csv/pdf/png
- response_surfaces/optimization_report_{mode}.txt
uv run python -m microgrowagents.skills.simple.optimize_growth_conditions \
--data outputs/experimental_analysis \
--source-data-id plate_designs_v10_maxprooptblock_long__results \
--output-dir outputs/optimization \
--strategy hybrid \
  --n-suggestions 69

Trains ensemble models (GP + Polynomial + Random Forest), analyses ingredient effects + interactions, and uses Bayesian optimisation to suggest next experiments. Strategies: bayesian, local, uncertainty, or hybrid (70% local + 15% uncertainty + 15% space-filling).
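To make the hybrid split concrete, the sketch below divides a suggestion budget across the three components using largest-remainder rounding. The rounding rule and the names are assumptions for illustration; the skill may allocate fractional slots differently.

```python
from typing import Dict

# Percent shares of the hybrid strategy as stated above.
HYBRID_SHARES: Dict[str, int] = {"local": 70, "uncertainty": 15, "space_filling": 15}


def split_suggestions(n_suggestions: int, shares: Dict[str, int] = HYBRID_SHARES) -> Dict[str, int]:
    """Split n_suggestions by percent share using largest-remainder rounding."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    # Integer arithmetic keeps the remainders exact (no floating-point drift).
    quotas = {name: divmod(n_suggestions * pct, 100) for name, pct in shares.items()}
    counts = {name: whole for name, (whole, _) in quotas.items()}
    leftover = n_suggestions - sum(counts.values())
    # Give leftover slots to the largest remainders; ties keep the dict order.
    by_remainder = sorted(quotas, key=lambda name: quotas[name][1], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


print(split_suggestions(69))
```

With the 69 suggestions requested in the command above, the exact shares are 48.3, 10.35 and 10.35, so one slot has to be assigned by a tie-break; here it goes to the uncertainty component.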
Generate publication-ready biological interpretations with inline citations
and bibliography via ExperimentalInterpretationAgent:
from microgrowagents.agents.analysis import ExperimentalInterpretationAgent
agent = ExperimentalInterpretationAgent(source_version="v10")
result = agent.run()

Produces four artifacts:
- INTERPRETATION_REPORT.md: clean biological interpretation (executive summary, factor-by-factor analysis, metabolic insights, testable hypotheses, recommendations for next design iteration).
- INTERPRETATION_EVIDENCE.md: evidence companion file with data evidence E1–E# (cited file + section + snippet) and literature evidence L1–L# (DOIs).
- INTERPRETATION_REPORT_evidence.md: citation-based version with inline [E1], [L2] markers and a complete bibliography.
- interpretation_metadata.json: execution metadata.
See docs/EXPERIMENTAL_INTERPRETATION_AGENT.md for the complete documentation.
The CofactorMediaAgent and the generate_cofactor_reference script integrate six sources (the ChEBI, KEGG, BRENDA, ExplorEnz, and KG-Microbe databases, plus literature) to produce two reference TSVs:
just generate-cofactor-reference
# emits:
# data/references/cofactors_complete.tsv (60 cofactors with CHEBI IDs, EC associations, usage tracking)
# data/references/cofactors_metals.tsv (19-cofactor metal/REE subset including 5 lanthanides)

Primary data sources:
- ChEBI — chemical identifiers (DOI: 10.1093/nar/gkv1031)
- KEGG — biosynthesis pathways (DOI: 10.1093/nar/gkac963)
- BRENDA — EC ↔ cofactor relationships (DOI: 10.1093/nar/gky1048)
- ExplorEnz — Enzyme Commission nomenclature (DOI: 10.1093/nar/gkn582)
- KG-Microbe — enzyme-substrate relationships, pathway context.
Reference files (in-repo):
- src/microgrowagents/data/cofactor_hierarchy.yaml: 44 cofactors across 5 categories (curated)
- src/microgrowagents/data/ec_to_cofactor_map.yaml: 68 EC pattern mappings
- data/references/cofactors_complete.tsv: 60 cofactors (generated)
- data/references/cofactors_metals.tsv: 19 metals / REEs (generated)
- data/processed/ingredient_cofactor_mapping.csv: 13 MP medium cofactor providers
Docs:
- docs/COFACTOR_REFERENCE.md: data dictionary
- docs/COFACTOR_REFERENCE_V3_USAGE.md: enrichment workflow
- docs/cofactor_data_sources.md: methodology and citations
Module: src/microgrowagents/chemistry/
Osmotic Properties (osmotic_properties.py):
- calculate_osmolarity(ingredients, temperature=25.0)
- calculate_water_activity(ingredients, temperature=25.0, method="raoult")
- estimate_van_hoff_factor(formula, charge, name)
- Methods: Raoult's law (dilute), Robinson-Stokes (concentrated), Bromley (high ionic strength).
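As a rough illustration of the quantity involved, ideal-solution osmolarity is the sum of each solute's concentration times its van 't Hoff factor. The helper below is a standalone sketch with a hypothetical name and input shape; it is not the signature or the method of calculate_osmolarity, which also takes temperature into account.

```python
from typing import Iterable, Tuple


def ideal_osmolarity(solutes: Iterable[Tuple[float, float]]) -> float:
    """Sum i * c over solutes given as (concentration in mM, van 't Hoff factor i).

    Returns mOsm/L under the ideal-solution assumption (no activity corrections).
    """
    return sum(conc_mm * vant_hoff for conc_mm, vant_hoff in solutes)


# 150 mM NaCl (i = 2 if fully dissociated) plus 10 mM glucose (i = 1).
print(ideal_osmolarity([(150.0, 2.0), (10.0, 1.0)]))  # 310.0
```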
Redox Properties (redox_properties.py):
- calculate_redox_potential(ingredients, ph, temperature=25.0): Eh and pE via Nernst.
- calculate_electron_balance(ingredients).
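The relationship between Eh and pE follows directly from the Nernst factor. The sketch below uses only textbook constants; the function names are hypothetical and independent of the redox_properties.py API.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def nernst_slope(temperature_c: float = 25.0) -> float:
    """Nernst factor ln(10) * R * T / F in volts per log unit (about 0.0592 V at 25 °C)."""
    return math.log(10) * R * (temperature_c + 273.15) / F


def pe_from_eh(eh_volts: float, temperature_c: float = 25.0) -> float:
    """Convert a redox potential Eh (V) to electron activity pE."""
    return eh_volts / nernst_slope(temperature_c)


def half_cell_potential(e0_volts: float, n_electrons: int, log10_q: float,
                        temperature_c: float = 25.0) -> float:
    """Nernst equation: E = E0 - (slope / n) * log10(Q)."""
    return e0_volts - nernst_slope(temperature_c) / n_electrons * log10_q


print(round(pe_from_eh(0.40), 2))
```

Because the slope scales with absolute temperature, the same Eh maps to a slightly lower pE at 30 °C than at 25 °C.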
Nutrient Ratios (nutrient_ratios.py):
- calculate_cnp_ratios(ingredients): C:N:P, limiting-nutrient classification.
- calculate_trace_metal_ratios(ingredients): Fe:P, Mn:P, Zn:P with deficiency / excess flags.
- Redfield ratio comparison (marine: 106:16:1, terrestrial: ~60:7:1).
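One simple way to read a C:N:P ratio against the marine Redfield reference is to normalise to P and then ask which element is scarcest relative to the reference. The sketch below assumes that rule; the helper names and input format are hypothetical, and the classification used by calculate_cnp_ratios may differ.

```python
from typing import Dict

# Marine Redfield ratio quoted above (C:N:P = 106:16:1).
REDFIELD: Dict[str, float] = {"C": 106.0, "N": 16.0, "P": 1.0}


def cnp_ratio(moles: Dict[str, float]) -> Dict[str, float]:
    """Normalise elemental C, N, P amounts so that P = 1."""
    if moles["P"] <= 0:
        raise ValueError("P must be positive to normalise the ratio")
    return {element: moles[element] / moles["P"] for element in ("C", "N", "P")}


def limiting_nutrient(moles: Dict[str, float], reference: Dict[str, float] = REDFIELD) -> str:
    """Return the element with the smallest supply relative to the reference ratio."""
    relative_supply = {element: moles[element] / reference[element] for element in reference}
    return min(relative_supply, key=relative_supply.get)


medium = {"C": 106.0, "N": 8.0, "P": 1.0}  # hypothetical medium with half the Redfield N
print(cnp_ratio(medium), limiting_nutrient(medium))
```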
Thermodynamic Properties (thermodynamic_properties.py):
- calculate_gibbs_free_energy(reactants, products, ph=7.0): ΔG via eQuilibrator + Component Contribution.
- calculate_formation_energy(compound): ΔGf°.
Specialised modules and docs for Nd³⁺ bioavailability — Ksp-based NdPO₄ precipitation, citrate / malate chelation, bioavailable-fraction calculation:
- src/microgrowagents/chemistry/precipitation.py: NdPO₄ Ksp + activity-coefficient model.
- src/microgrowagents/chemistry/chelation.py: citrate / malate Nd chelation.
- src/microgrowagents/chemistry/bioavailability.py: bioavailable-fraction calculation.
- docs/LANTHANIDE_BIOAVAILABILITY_COMPLETE.md: consolidated chemistry reference.
- docs/LANTHANIDE_PRECIPITATION_IMPLEMENTATION_STATUS.md: Ksp values, activity-coefficient model, validated condition ranges, known limitations.
These modules and docs are cited in the round-3 Nd-assay recommendation (nd_assay_alternatives_report.md) as the chemistry rationale for the LanM + cell-pellet ICP-MS protocol.
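The Q/Ksp comparison behind the precipitation model can be illustrated with a saturation index. This is a simplified sketch: it treats concentrations as activities, takes free PO₄³⁻ as an input rather than deriving it from pH and total phosphate, and uses a placeholder Ksp that is not a literature value. The real values and the activity-coefficient model are documented in docs/LANTHANIDE_PRECIPITATION_IMPLEMENTATION_STATUS.md.

```python
import math


def saturation_index(nd_molar: float, phosphate_molar: float, ksp: float) -> float:
    """log10(Q / Ksp) for NdPO4(s) <-> Nd3+ + PO4^3-, with Q = [Nd3+][PO4^3-].

    Concentrations stand in for activities (activity coefficients set to 1), and
    phosphate_molar must be the free PO4^3- concentration, not total phosphate.
    """
    if min(nd_molar, phosphate_molar, ksp) <= 0:
        raise ValueError("concentrations and Ksp must be positive")
    return math.log10(nd_molar * phosphate_molar / ksp)


# PLACEHOLDER_KSP is an illustrative number only, not a literature value.
PLACEHOLDER_KSP = 1e-20
si = saturation_index(nd_molar=1e-5, phosphate_molar=1e-9, ksp=PLACEHOLDER_KSP)
print("supersaturated" if si > 0 else "undersaturated or at equilibrium")
```

A positive index means Q exceeds Ksp and precipitation is thermodynamically favoured; it says nothing about kinetics, which is one reason such a model can disagree with measured abiotic controls.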
All input data files are protected with SHA256 checksums for cryptographic reproducibility (bbop-skills Criterion 4):
just verify-data-integrity # check `data/checksums.txt` against current data files

Checksums are stored at data/checksums.txt (global) and outputs/*/input_data_checksums.json (per-analysis). Every analysis records checksums of its input files. See docs/ARTIFACT_CLEANUP_POLICY.md for the generation procedure.
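Conceptually, the integrity check recomputes each file's SHA256 digest and compares it with the recorded one. The sketch below assumes a sha256sum-style manifest (one "digest, whitespace, relative path" entry per line); the actual format of data/checksums.txt and the logic behind just verify-data-integrity are not shown in this README, so treat the parser and function names as hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA256 so large inputs are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text: str) -> Dict[str, str]:
    """Parse lines of the assumed form "<hex digest>  <relative path>"."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            digest, rel_path = line.split(maxsplit=1)
            entries[rel_path.strip()] = digest
    return entries


def find_mismatches(manifest: Dict[str, str], root: Path) -> List[str]:
    """Return paths that are missing or whose current digest differs from the manifest."""
    return [
        rel_path
        for rel_path, expected in manifest.items()
        if not (root / rel_path).is_file() or sha256_of(root / rel_path) != expected
    ]
```

An empty list from find_mismatches(parse_manifest(text), root) would mean every listed file is present and unchanged.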
End-to-end auditing of the analysis pipeline, scored against the bbop-skills criteria for local-first agentic systems. Components:
- Agent: src/microgrowagents/agents/analysis/research_auditor.py (orchestrator).
- Provenance auditor: src/microgrowagents/provenance/auditor.py, which replays a session's action log against actual file/directory state.
- Utility auditors: src/microgrowagents/utils/, containing data_auditor (input checksums), file_auditor (output presence + size + checksums), audit_report_generator (markdown rendering), audit_structures (shared dataclasses).
- Schema: src/microgrowagents/schema/audit_outputs_schema.yaml, with LinkML definitions for report objects.
- Runner: scripts/run_research_audit.py (production), scripts/demo_research_audit.py (example).
- Docs: docs/RESEARCH_AUDITOR.md, docs/RESEARCH_AUDITOR_IMPLEMENTATION.md.
uv run python scripts/run_research_audit.py \
--session-id <session-uuid> \
  --output outputs/research_audit_<date>/

Three-tier retention (docs/ARTIFACT_CLEANUP_POLICY.md):
- Archival (keep): published designs (v10, v13, …), validated analysis with interpretations, response surface models.
- Temporary (30 days): per-run analysis outputs, clustering, intermediate optimisation runs.
- Ephemeral (7 days): test outputs, debug artifacts, scratch visualisations.
just archive-outputs # move to archive/
just clean-old-outputs # >30 days
just clean-ephemeral # >7 days

Steady-state ~185 MB (with cleanup) vs ~4 GB/year unmanaged (96% reduction).
Overall: 78% (7/9 PASS) — full breakdown in docs/AUDIT_REPORT_BBOP_SKILLS.md:
✅ PASS (7): provenance tracking, model tracking, reasoning/code separation, validation (LinkML schemas + validators), error-correction (DOI validation + corrections), RAG (KG-Microbe + literature + genomes), artifact cleanup.
Action checklist with implementation steps + target dates:
docs/AUDIT_ACTIONS_CHECKLIST.md.
DOI validation: 90.5% (143/158 DOIs) — 92 PDFs, 44 abstracts, 15 missing.
uv run python scripts/doi_validation/validate_failed_dois.py
uv run python scripts/doi_corrections/apply_doi_corrections.py
uv run python scripts/pdf_downloads/download_all_pdfs_automated.py

History: notes/DOI_CORRECTIONS_FINAL_UPDATED.md.
Predicts LOW, DEFAULT, and HIGH concentration ranges for media ingredients:
uv run python run.py gen-media-conc "MP medium"
uv run python run.py gen-media-conc "PIPES,NaCl,glucose" --mode ingredients
uv run python run.py gen-media-conc "MP medium" --enrich pubchem

Output: predicted concentration ranges (mM), molecular weights, chemical formulas, confidence scores.
uv run python run.py sensitivity "MP medium"
uv run python run.py sensitivity "MP medium" --calculate-osmotic --calculate-nutrients
uv run python run.py sensitivity "MP medium" --plot --plot-output analysis.png

Calculates pH, salinity (TDS + NaCl-equivalent), ionic strength; optionally osmotic / redox / nutrient ratios.
uv run python run.py compare-media "MP medium" "LB medium"

Reports common vs unique ingredients and concentration differences.
from microgrowagents.skills.workflows import RecommendMediaWorkflow
workflow = RecommendMediaWorkflow()
result = workflow.run(
query="Recommend medium for methanotrophic bacteria",
organism="Methylococcus capsulatus",
temperature=42.0, pH=6.8,
carbon_source="methane", oxygen="aerobic",
goals="defined,selective",
output_format="markdown",
)

The workflow integrates evidence from multiple sources (KG-Microbe + literature + MP database) and returns a complete formulation with ingredient list, concentrations, roles, alternatives, and confidence scores. Goal presets: minimal, defined, complex, cost_effective, high_yield, selective. Full skill docs in .claude/skills/recommend-media.md.
Organism-specific media design using 57 Bakta-annotated genomes (667,502
features). EC-number queries with wildcard support (1.1.*.*), auxotrophy
detection, cofactor analysis, transporter analysis. Automatically integrated
into MediaFormulationAgent, GenMediaConcAgent, and KGReasoningAgent.
See docs/GENOME_FUNCTION.md for examples.
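The wildcard form 1.1.*.* can be understood as level-by-level matching on the four EC fields. The helper below is a hypothetical sketch of that idea, not the GenomeFunctionAgent query API, which is documented in docs/GENOME_FUNCTION.md.

```python
def ec_matches(ec_number: str, pattern: str) -> bool:
    """Match an EC number against a pattern where "*" stands for any single level."""
    ec_levels = ec_number.split(".")
    pattern_levels = pattern.split(".")
    if len(ec_levels) != len(pattern_levels):
        return False
    # Compare level by level so that "1.1" never matches "1.10" by string prefix.
    return all(p == "*" or p == e for e, p in zip(ec_levels, pattern_levels))


annotated = ["1.1.2.7", "1.10.2.2", "2.7.1.1"]  # example EC numbers for illustration
print([ec for ec in annotated if ec_matches(ec, "1.1.*.*")])
```

Splitting on dots rather than using a plain string prefix avoids false hits such as 1.10.x.x when the query is 1.1.*.*.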
uv run python run.py sensitivity "MP medium" \
--calculate-osmotic --calculate-redox --calculate-nutrients \
--ph 7.0 --temperature 30 \
  --format json --output complete_analysis.json

uv run python run.py gen-media-conc "MP medium" --format json > predictions.json
uv run python run.py sensitivity --input-file predictions.json --calculate-osmotic

from microgrowagents.agents.sensitivity_analysis_agent import SensitivityAnalysisAgent
agent = SensitivityAnalysisAgent(db_path="data/microgrowdb.db")
result = agent.run(
query="MP medium",
mode="medium",
calculate_osmotic=True, calculate_redox=True, calculate_nutrients=True,
temperature=37.0,
)
print(f"pH: {result['baseline']['ph']}")
print(f"Limiting nutrient: {result['baseline']['nutrient_ratios']['limiting_nutrient']}")

MicroGrowAgents/
├── CLAUDE.md # Project instructions for Claude Code
├── README.md # this file
├── justfile + project.justfile # task recipes
├── pyproject.toml + uv.lock # dependencies (uv)
│
├── src/microgrowagents/
│ ├── agents/ # 29 specialized agent classes
│ │ ├── analysis/ # research_auditor, schema_review, interpretation, design_recommendation
│ │ └── …
│ ├── skills/ # 52 user-facing skills (64 .py modules)
│ │ ├── analysis/{experimental,statistical,visualization}/
│ │ ├── core/{chemistry,genome,knowledge,modeling}/
│ │ ├── design/{doe,media,validation}/
│ │ ├── meta/ # validate_linkml (new)
│ │ └── workflows/, utilities/, formatters/, executors/, simple/, development/
│ ├── chemistry/ # osmotic, redox, nutrient_ratios, thermodynamic, precipitation, chelation, bioavailability
│ ├── provenance/ # auditor (ResearchAuditor backend)
│ ├── utils/ # audit_{report_generator,structures}, data_auditor, file_auditor, checksums, …
│ └── schema/ # LinkML schemas including audit_outputs_schema.yaml
│
├── scripts/ # Integration / analysis scripts
│ ├── build_round2_replicate_statistics.py # DBTL2 adapter
│ ├── mc_pareto_round2.py, three_way_pareto_round2.py, …
│ ├── microgrow-plot.py # /plot skill backend
│ ├── validate_linkml_cli.py # /validate-linkml skill CLI
│ ├── run_research_audit.py, demo_research_audit.py
│ ├── generate_cofactor_reference.py + enhance_cofactor_references_v3.py + validate_enhance_cofactors_metals.py
│ ├── generate_ko_to_{ec,go_map_{bakta,uniprot}}.py
│ ├── generate_architecture_diagrams_{simplified,abstract,vivid}.py
│ ├── generate_explanatory_heatmap.py + regenerate_*heatmap.py + generate_provenance_heatmap.py
│ ├── compute_toxicity_report_bioavailability.py, extract_presentation_data.py
│ └── doi_validation/, doi_corrections/, pdf_downloads/, enrichment/, schema/
│
├── tests/ # 1900+ pytest tests across modules
├── data/
│ ├── raw/ # source data with checksums
│ ├── experimental/
│ │ ├── plate_designs_v10_maxprooptblock_long__results/ # round 1 OD600
│ │ ├── plate_designs_v10_maxprooptblock_long__results_asezuran/ # round 1 arsenazo III
│ │ ├── plate_designs_v10_maxprooptblock_long__round2_results/ # round 2 Biolog
│ │ └── plate_designs_v10_maxprooptblock_long__round2_results_asezuran/ # round 2 arsenazo III
│ ├── references/ # cofactors_{complete,metals}.tsv
│ ├── corrections/, results/, sheets_cmm/, pdfs/, designs/
│ └── checksums.txt
│
├── outputs/
│ ├── round1_vs_round2/, round2_3way_pareto/, round2_mc_pareto/, round2_double_winners/,
│ ├── round2_abiotic_correction/, round2_t2_paired_biology/, round2_kinetic_fits/,
│ ├── round2_precipitation_risk/ # round-2 analyses (the dirs above are reproducible from scripts/)
│ ├── round2_recommendations/ # v16_design_recommendation.md, v16_bo_seeds.md
│ ├── round3_recommendations/ # nd_assay_alternatives_report.md (+ _1pager.md)
│ ├── cofactor_analysis/, lanthanide_genes/, optimization/, media/
│ └── plate_designs_v10_*_experimental_analysis_{absolute,relative,clustering_*}/
│
├── docs/ # MkDocs documentation
│ ├── STATUS.md, AGENTS_SKILLS_TOOLS.md, RESEARCH_AUDITOR.md, …
│ ├── COFACTOR_REFERENCE.md, LANTHANIDE_BIOAVAILABILITY_COMPLETE.md, …
│ ├── architecture/ # simplified / abstract / vivid diagram variants
│ └── figures/ # explanatory heatmaps, provenance heatmap
│
├── notes/ # research notes, DOI corrections, session summaries
└── .claude/
├── provenance/ # session manifests + action logs
└── skills/ # 22 slash commands (/plot, /validate-linkml, …)
# All tests + type checking + formatting
just test
# Targeted
uv run pytest tests/test_chemistry/test_osmotic_properties.py -v
uv run pytest --cov=microgrowagents --cov-report=html
# Type / format
just mypy
just format
# Documentation
just _serve # local mkdocs
mkdocs build

The full test suite is ~2000 tests across the agents, skills, chemistry, KG, validators, and scripts trees; coverage report via --cov-report=html.
https://CultureBotAI.github.io/MicroGrowAgents
External tools, APIs, and datasets integrated with MicroGrowAgents.
Chemical:
- PubChem — chemical structures, properties, identifiers.
- ChEBI — Chemical Entities of Biological Interest (DOI: 10.1093/nar/gkv1031).
- eQuilibrator — biochemical thermodynamics, ΔG calculations.
Biological:
- KEGG — pathway definitions (DOI: 10.1093/nar/gkac963).
- BRENDA — enzyme info, EC ↔ cofactor (DOI: 10.1093/nar/gky1048).
- ExplorEnz — EC nomenclature (DOI: 10.1093/nar/gkn582).
- UniProt — protein sequences and annotations.
- NCBI — genome sequences, taxonomy, literature.
Planned:
- NIST WebBook — inorganic thermodynamic data.
- KG-Microbe: 1.5M nodes, 5.1M edges; 864,363 species (GTDB + LPSN + NCBI).
- Genome annotations: 57 Bakta-annotated genomes, 667,502 features (incl. M. extorquens AM1, M. capsulatus).
- Chemical embeddings: 208K+ Morgan fingerprints / descriptors for analogy-based reasoning.
- MP Medium database: 158 ingredients × 68 columns, 158 unique DOIs, 90.5% citation coverage.
- Literature corpus: 245+ papers with extracted excerpts.
- Metabolic modelling: GapMind, GEMsembler, COBRApy.
- Genome annotation: Bakta, NCBI BLAST.
- Experimental design: MaxPro+OptBlock (custom), Latin Hypercube Sampling.
- Growth prediction: GrowthCodon (codon usage bias), MediaDive.
- Scientific: numpy, pandas, scipy, scikit-learn.
- Chemistry: rdkit, equilibrator-api.
- Visualization: matplotlib, seaborn, plotly.
- Database / KG: duckdb, sqlalchemy, linkml.
- Optimization: scikit-optimize, pymoo, SALib, statsmodels.
- PDF / PPTX: pypdf, python-pptx.
- Dev: pytest, mypy, ruff, uv.
- data/raw/mp_medium_ingredient_properties.csv: ingredient data with DOIs.
- docs/STATUS.md: citation coverage metrics.
- notes/DOI_CORRECTIONS_FINAL_UPDATED.md: DOI validation history.
- docs/cofactor_data_sources.md: cofactor source methodology.
Top performer: MPOB_040
- Max OD600: 0.95 (highest overall).
- Strategy: pure C1 methylotrophy (67.9 mM methanol, low succinate).
- Challenge: 98% crash at 48 h due to methanol depletion.
- Crash analysis: outputs/optimization/MPOB_040_CRASH_ANALYSIS.md.
Most stable: MPOB_053
- Max OD600: 0.66 (sustained growth across all timepoints).
- Strategy: mixed C1+C2 metabolism (19.9 mM methanol, 58.7 mM succinate).
- Key finding: 40–60 mM succinate provides metabolic backup when methanol depletes, preventing culture crash while maintaining high peak growth.
These v10-era results motivated the round-2 design (Biolog dual-channel, arsenazo III at higher 15 µM Nd dose). Round-2 supersedes round-1 for foreground decision-making — see DBTL Campaign Status §Round 2.
v13 varied Neodymium 0–5 µM to test MxaF vs XoxF-MDH pathways:
- High OD600 at low Nd → lanthanide-independent (MxaF-MDH).
- High OD600 at high Nd → lanthanide-dependent (XoxF-MDH).
- Response surface modelling identifies Pareto-optimal conditions.
- Multi-objective optimisation balances growth AND Nd utilisation.
v16 extends the Nd³⁺ axis to 0–30 µM (5-point grid) and adds citrate (0–300 µM) so the round-3 experiment can directly probe chemistry-vs-biology attribution.
Spearman ρ ≈ 0 on OD600 and ≈ −0.17 on Nd across 69 matched conditions;
26 / 69 growth and 42 / 69 Nd conditions disagree at |z| > 2σ. The two
rounds switched instruments (600 nm → Biolog 740 nm) and Nd calibration
(raw abs660 → Miller §5.10 inverse fit), so this is measurement-modality
drift, not biology drift. v16 anchors on round-2 winners alone. Full
analysis: outputs/round1_vs_round2/REPRODUCIBILITY_REPORT.md.
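For reference, the Spearman ρ reported above is the Pearson correlation of the two rounds' per-condition ranks. The standard-library sketch below shows the computation with average ranks for ties; the function names and the toy inputs are illustrative and do not reproduce the reported values or the code in scripts/compare_round1_vs_round2.py.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Toy per-condition means from two hypothetical rounds (not project data).
print(spearman_rho([0.10, 0.20, 0.30, 0.40], [0.25, 0.05, 0.30, 0.15]))
```

A ρ near zero, as reported for OD600, means the ranking of conditions in one round carries essentially no information about their ranking in the other.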
Contributions are welcome:
- Fork the repository.
- Create a feature branch.
- Write tests for new functionality.
- Ensure all tests pass (just test).
- Submit a pull request.
BSD 3-Clause License. See LICENSE for details.
Copyright (c) 2026 Marcin P. Joachimiak, Lawrence Berkeley National Laboratory
This project uses the template monarch-project-copier.
If you use MicroGrowAgents in your research, please cite this repository.
Principal Investigator: Dr. Marcin P. Joachimiak
- Institution: Lawrence Berkeley National Laboratory
- Project: CultureBotAI Initiative
- GitHub: CultureBotAI
For questions or issues:
- Open an issue on GitHub Issues
- See CLAUDE.md for development guidance