CultureBotAI/MicroGrowAgents


MicroGrowAgents (MGA)

Agent-based system for AI-driven microbial cultivation and growth media design

Part of the CultureBotAI initiative led by Dr. Marcin Joachimiak at Lawrence Berkeley National Laboratory.

Last updated: 2026-05-18

Recent updates

  • DBTL round 2 complete (May 2026): full analysis pipeline executed (3-way Pareto, MC stability, abiotic correction, t2-paired biology, kinetic fits, precipitation risk).
  • 📍 t2 (6 h) adopted as canonical analysis endpoint (2026-05-15) — only timepoint with paired abiotic controls. Round-2 scripts default to t2; pass --endpoint-timepoint t3 for legacy behaviour.
  • 📑 Round 3 (v16) plan published — 8-factor design adding Nd³⁺ and citrate; LanM-fluorescence primary Nd assay + cell-pellet ICP-MS confirmatory subset.
  • 🛂 ResearchAuditor framework added — file / data / provenance auditors orchestrated by src/microgrowagents/agents/analysis/research_auditor.py.
  • 🛠️ Two new Claude Code slash commands: /plot (natural-language plots via scripts/microgrow-plot.py) and /validate-linkml (schema + example validator).

Overview

MicroGrowAgents bridges the microbial cultivation gap through AI-powered multi-agent systems that integrate knowledge graphs, machine learning, and experimental automation. The platform combines specialized agents (LiteratureAgent, AnalogyReasoningAgent, GenomeFunctionAgent, MediaFormulationAgent, …) operating on KG-Microbe (864,000+ validated species) to design optimized growth media for previously uncultured microorganisms. It now also drives a multi-round Design-Build-Test-Learn (DBTL) campaign on lanthanide-dependent growth of Methylorubrum extorquens AM1 ΔmxaF.

Key Features

  • 🧪 DBTL campaign infrastructure — multi-round analysis pipeline for M. extorquens AM1 lanthanide-dependent growth, from raw plate-reader CSVs through MaxPro+OptBlock design generation to Bayesian-optimisation seeds.
  • 🔬 Advanced chemistry — osmolarity / water activity, redox potential, C:N:P ratios, Gibbs free energy via eQuilibrator, NdPO₄ precipitation, citrate/malate chelation.
  • 🤖 Multi-agent media formulation — literature mining (245+ papers), analogy reasoning (208K+ chemical embeddings), genome-guided design (57 Bakta-annotated genomes), toxicity flagging.
  • 🗺️ Response surface modelling — Gaussian Process fits, Pareto frontiers, expected-improvement Bayesian optimisation, Sobol sensitivity.
  • 🛂 ResearchAuditor — file / data / provenance / report auditors for scientific reproducibility across the analysis pipeline.
  • 📚 Sheet Query System — entity lookup, cross-reference, publication search, evidence-rich reports.

DBTL Campaign Status

Two rounds of DBTL have been executed on the v10 MaxPro+OptBlock design (69 designed conditions + ctrl_media baseline, 4 replicates) for M. extorquens AM1 ΔmxaF under lanthanide stress. Round 3 (v16) is in planning. The full narrative is in docs/STATUS.md §DBTL Campaign Status.

| Round | Date | Growth assay | Nd assay | Status |
|---|---|---|---|---|
| 1 | Feb–Mar 2026 | OD600 @ 600 nm, 3 timepoints | Arsenazo III @ 660 nm | analysed |
| 2 | May 2026 | Biolog PM08, 740/590 nm, 144 timepoints | Arsenazo III @ 660 nm (15 µM Nd dose) | analysed |
| 3 | planned | TBD | LanM-fluorescence (proposed) | planning |

Round 1 (Feb–Mar 2026)

First execution of the v10 design, with two parallel measurement modalities: an OD600 plate-reader assay and an arsenazo III Nd-depletion assay. Key findings are preserved in the Historical Appendix. In summary, the round identified the peak-biomass condition (MPOB_040, max OD600 0.95) and the most stable grower (MPOB_053, mixed C1+C2 metabolism).

Round 2 (May 2026)

Repeat of v10 with minor recipe adjustments. Switched growth modality to Biolog PM08 (740 nm biomass + 590 nm redox in parallel), bumped Nd³⁺ dose to 15 µM, added 48 row-B Nd calibration standards across 4 plates.

Canonical analysis endpoint: t2 (6 h), since 2026-05-15. t2 is the only timepoint with a paired abiotic control, so it is the only timepoint at which chemistry-vs-biology attribution is empirically possible. All round-2 scripts default to t2; pass --endpoint-timepoint t3 for the legacy behaviour.

At t2 the 3-way Pareto frontier consists of three conditions:

| Condition | OD600 t2 | Abs590 t2 | Nd remaining t2 (µM) | MC freq |
|---|---|---|---|---|
| MPOB_058 | 0.241 | 0.300 | 2.58 | 0.99 (stable) |
| MPOB_008 | 0.231 | 0.293 | 2.07 | borderline |
| MPOB_019 | 0.208 | 0.251 | 1.70 | borderline |

The five conditions that were t3-only Pareto winners (MPOB_022/_066/_020/_035/_024) fell off the frontier at t2; their late depletion happened between t2 and t3, outside the paired-control window and so cannot be attributed to biology without further measurement (cell-pellet ICP-MS recommended in round 3).
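Frontier membership can be illustrated with a minimal Pareto-dominance sketch. It assumes OD600 and Abs590 are maximised and Nd remaining is minimised; that is an assumption about the objective directions, not a description of what scripts/three_way_pareto_round2.py actually does. The three MPOB rows are copied from the table above, and the HYPO_dominated row is invented for illustration only.

```python
from typing import Dict, List, Tuple

# (OD600 t2, Abs590 t2, Nd remaining t2 in µM)
Point = Tuple[float, float, float]

conditions: Dict[str, Point] = {
    "MPOB_058": (0.241, 0.300, 2.58),
    "MPOB_008": (0.231, 0.293, 2.07),
    "MPOB_019": (0.208, 0.251, 1.70),
    "HYPO_dominated": (0.200, 0.240, 2.60),  # hypothetical, not real data
}


def to_maximise(p: Point) -> Point:
    """Flip the sign of Nd remaining so every objective is 'higher is better'."""
    od600, abs590, nd_remaining = p
    return (od600, abs590, -nd_remaining)


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse on every objective and better on at least one."""
    fa, fb = to_maximise(a), to_maximise(b)
    no_worse = all(x >= y for x, y in zip(fa, fb))
    strictly_better = any(x > y for x, y in zip(fa, fb))
    return no_worse and strictly_better


def pareto_frontier(points: Dict[str, Point]) -> List[str]:
    """Names of all points that no other point dominates (input order kept)."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]


frontier = pareto_frontier(conditions)
print(frontier)
```

Under these assumptions none of the three tabulated conditions dominates another: MPOB_058 has the highest OD600 but also the most Nd remaining, while MPOB_019 has the least Nd remaining but the lowest OD600.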

Round-2 analysis outputs live in 8 subdirectories of outputs/ (seven round2_* directories plus round1_vs_round2/):

  • outputs/round2_3way_pareto/ — joint OD600 × Abs590 × Nd Pareto.
  • outputs/round2_mc_pareto/ — Monte-Carlo Pareto stability under replicate σ; emits a 2-panel composite figure plus standalone histogram and scatter PDFs/PNGs.
  • outputs/round2_double_winners/ — cross-cluster join (only MPOB_008 is a majority growth ∩ Nd-uptake double winner).
  • outputs/round2_abiotic_correction/ — empirical t1→t2 abiotic drift per condition.
  • outputs/round2_t2_paired_biology/ — (biotic − abiotic) at t2.
  • outputs/round2_precipitation_risk/ — Q/Ksp NdPO₄ model (refuted by the abiotic data).
  • outputs/round2_kinetic_fits/ — per-condition kinetic fits.
  • outputs/round1_vs_round2/ — reproducibility report (Spearman ρ ≈ 0: measurement-modality drift, not biology drift).
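The idea behind Monte-Carlo Pareto stability can be sketched as follows: redraw each condition's objectives from a Gaussian with the replicate σ, recompute the frontier, and report how often each condition lands on it. This is a two-objective simplification (OD600 up, Nd remaining down) and not the code in scripts/mc_pareto_round2.py. The means are taken from the t2 table; the σ values and the HYPO_dominated row are hypothetical, so the printed frequencies are not the pipeline's MC freq values.

```python
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (OD600, Nd remaining in µM)


def dominates(a: Point, b: Point) -> bool:
    """Higher OD600 and lower Nd remaining are both treated as better."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def frontier(points: Dict[str, Point]) -> List[str]:
    return [
        n for n, p in points.items()
        if not any(dominates(q, p) for m, q in points.items() if m != n)
    ]


def mc_frontier_frequency(
    means: Dict[str, Point],
    sigmas: Dict[str, Point],
    n_draws: int = 1000,
    seed: int = 0,
) -> Dict[str, float]:
    """Fraction of noisy redraws in which each condition is on the frontier."""
    rng = random.Random(seed)
    counts = {name: 0 for name in means}
    for _ in range(n_draws):
        draw = {
            name: (
                rng.gauss(means[name][0], sigmas[name][0]),
                rng.gauss(means[name][1], sigmas[name][1]),
            )
            for name in means
        }
        for name in frontier(draw):
            counts[name] += 1
    return {name: c / n_draws for name, c in counts.items()}


means = {
    "MPOB_058": (0.241, 2.58),
    "MPOB_008": (0.231, 2.07),
    "MPOB_019": (0.208, 1.70),
    "HYPO_dominated": (0.200, 2.60),  # hypothetical, not real data
}
sigmas = {name: (0.01, 0.20) for name in means}  # hypothetical replicate σ

freq = mc_frontier_frequency(means, sigmas, n_draws=500, seed=42)
for name, f in freq.items():
    print(f"{name}: {f:.2f}")
```

A condition whose frequency stays near 1 is robust to replicate noise; one that drifts toward 0.5 is the kind of case the table labels borderline.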

Abs590 caveat: r(OD600, Abs590) = 0.982 across all 70 conditions — the redox channel adds no independent information beyond biomass in round 2 and is a candidate to drop in v16 unless the chemistry changes (e.g., a tetrazolium dye that decouples respiration from growth).

Round 3 (v16, planned)

The v16 design upgrades from 6 factors to 8 factors — adding Nd³⁺ (0–30 µM, 5-point grid) and citrate (0–300 µM) as first-class variables so the next round can disentangle MxaF-MDH vs XoxF-MDH and chemistry-vs-biology attribution. The full proposal:

  • v16_design_recommendation.md — factor ranges, t2-canonical 13-well anchor allocation (MPOB_058 × 4, MPOB_008 × 3, MPOB_019 × 2, plus 4 t3-only references × 1).
  • v16_bo_seeds.md — 10 Gaussian-Process + Expected-Improvement seed candidates (top predicted OD600 = 0.268 vs round-2 best 0.241).
  • nd_assay_alternatives_report.md + nd_assay_alternatives_1pager.md — recommends lanmodulin (LanM) fluorescence as primary HT readout (picomolar Nd affinity, 10⁸× Ca²⁺ selectivity, no per-plate calibration), cell-pellet ICP-MS on the 3 t2 Pareto winners + 5 BO seeds (≈ 32 samples) as the confirmatory subset, and an optional 1-plate arsenazo III bridge for cross-round comparability.
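For readers unfamiliar with the Expected-Improvement acquisition used to rank BO seeds, here is a minimal closed-form sketch for a maximisation problem with a Gaussian posterior. The predicted mean (0.268) and incumbent (0.241) come from the text above; the posterior standard deviation and the xi exploration margin are hypothetical, and the function is not the repository's implementation.

```python
import math


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """EI = E[max(f - best - xi, 0)] for f ~ N(mu, sigma^2), maximisation."""
    improvement = mu - best - xi
    if sigma <= 0.0:
        # No uncertainty: the improvement is either realised or it is not.
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * normal_cdf(z) + sigma * normal_pdf(z)


predicted_od600 = 0.268   # top GP prediction quoted above
round2_best = 0.241       # best observed OD600 at t2 in round 2
posterior_sd = 0.02       # hypothetical GP posterior standard deviation

print(expected_improvement(predicted_od600, posterior_sd, round2_best))
```

EI grows with both the predicted gain over the incumbent and the posterior uncertainty, which is why a seed with a modest predicted OD600 but high uncertainty can still rank well.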

Installation

New collaborator? See the full Getting Started Guide for detailed setup including data downloads, database build, and troubleshooting.

Prerequisites

  • Python 3.10 or higher
  • uv package manager
  • just command runner

Quick Install

# Clone the repository
git clone --recurse-submodules https://github.com/CultureBotAI/MicroGrowAgents.git
cd MicroGrowAgents

# Install dependencies using uv
uv sync --group dev

# Download framework data (KG-Microbe, MediaDive, embeddings)
just download

# Download BER-CMM-AM1 project data (optional, for AM1 work)
just download-project

# Build database from downloaded sources
just build-db

# Verify installation
just test

Quick Start

DBTL: round-2 analysis (current campaign)

# Growth (Biolog 740 nm), Nd uptake (arsenazo III), and redox (Biolog 590 nm)
just analyze-experimental-round2     data/experimental/plate_designs_v10_maxprooptblock_long__round2_results
just analyze-experimental-round2-nd  data/experimental/plate_designs_v10_maxprooptblock_long__round2_results_asezuran
just analyze-experimental-round2-redox data/experimental/plate_designs_v10_maxprooptblock_long__round2_results

# Joint OD600 × Abs590 × Nd Pareto at the canonical t2 endpoint
uv run python scripts/three_way_pareto_round2.py
uv run python scripts/mc_pareto_round2.py           # MC stability + 2-panel figure

# Round-3 deliverables (already committed under outputs/round{2,3}_recommendations/)

Media concentration prediction

# Get MP medium concentrations
uv run python run.py gen-media-conc "MP medium"

# Custom ingredients with PubChem enrichment
uv run python run.py gen-media-conc "glucose,NaCl,KH2PO4" --mode ingredients --enrich pubchem

Sensitivity analysis

# Basic pH + salinity sweep
uv run python run.py sensitivity "MP medium"

# With all advanced properties
uv run python run.py sensitivity "MP medium" \
    --calculate-osmotic --calculate-redox --calculate-nutrients --plot

Agents & Skills

MicroGrowAgents provides 29 specialized agent classes, 52 user-facing Python skills, and 22 Claude Code slash commands for microbial cultivation and media design. Complete reference in docs/AGENTS_SKILLS_TOOLS.md.

Specialized Agents (29)

Knowledge & Reasoning:

  • KGReasoningAgent — query KG-Microbe (1.5M nodes, 5.1M edges)
  • LiteratureAgent — literature mining and evidence extraction
  • AnalogyReasoningAgent — chemical similarity search (208K+ embeddings)
  • SheetQueryAgent — query extended information sheets

Genome Analysis:

  • GenomeFunctionAgent — genome-guided media design (57 genomes, 667K features)
  • LanthanideGenesAgent — lanthanide-dependent gene analysis
  • TransporterAgent — nutrient transporter annotation and analysis

Media Design & Optimization:

  • MediaFormulationAgent — multi-source media recommendation
  • GenMediaConcAgent — ML-based concentration prediction
  • CofactorMediaAgent — cofactor requirement analysis
  • AlternateIngredientAgent — alternative ingredient suggestions
  • MediaRoleAgent — ingredient metabolic role classification
  • MaxProOptBlockAgent — MaxPro optimal blocking design generation
  • ReconcileAgent — experimental vs prediction reconciliation
  • EnsembleOptimizationAgent — response surface modelling and BO
  • DesignRecommendationAgent — interpret results to recommend next design
  • ExperimentalInterpretationAgent — evidence-based biological interpretations with inline citations

Metabolic Modeling:

  • MetabolicSourceAgent — metabolic source identification
  • GapMindAgent — GapMind pathway gap analysis
  • GEMsemblerAgent — genome-scale metabolic model reconstruction
  • GrowthCodonAgent — codon usage bias-based growth prediction
  • MediaMatchAgent — MediaDive database integration

Chemistry & Properties:

  • ChemistryAgent — osmotic, redox, nutrient-ratio calculations
  • SensitivityAnalysisAgent — parameter sweep and sensitivity analysis

Audit & Provenance: (new)

  • ResearchAuditor — orchestrates file / data / provenance / report auditors against the analysis pipeline (src/microgrowagents/agents/analysis/research_auditor.py).
  • SchemaReviewAgent — LinkML schema review helper.

Data Management:

  • SQLAgent — database queries
  • IngredientCooccurrenceAgent, IngredientEffectsEnrichmentAgent, CSVAllDOIsEnrichmentAgent
  • PDFEvidenceExtractor, EvidenceExtractionOrchestrator — multi-source evidence orchestration

Python Skills (52)

The 64 skill modules under src/microgrowagents/skills/ are organised into seven categories: Analysis (19+), Prediction & Design (12), Query & Search (5), Chemistry & Validation (5), Workflows (6), Utilities (3), and Meta (2, new; includes validate_linkml). The user-facing count is 52 after the recent additions; see docs/AGENTS_SKILLS_TOOLS.md §Skills for the complete categorical listing.

Claude Code Slash Commands (22)

Slash commands under .claude/skills/ callable from Claude Code:

| Command | Purpose |
|---|---|
| /plot (new) | Publication-quality plots from data files via natural language. Backed by scripts/microgrow-plot.py; 11 plot types, 4 journal style presets (nature/science/minimal/dark), PNG/PDF/SVG output. |
| /validate-linkml (new) | Validate a LinkML schema + example pair. Backed by src/microgrowagents/skills/meta/validate_linkml.py and scripts/validate_linkml_cli.py. |
| /recommend-media | Multi-agent media formulation recommendation. |
| /design-maxpro-optblock | MaxPro+OptBlock experimental design generation. |
| /lhs-design-generation | Latin-hypercube design generation. |
| /predict-concentration | Predict ingredient concentration ranges. |
| /predict-growth-cub, /predict-growth-hybrid | Codon-usage-bias and hybrid growth predictors. |
| /analyze-gaps, /analyze-limitations, /analyze-lanthanide-genes, /analyze-ingredient-cooccurrence | Analysis utilities. |
| /check-carbon-sources, /compare-gap-fba | Carbon-source and gap-vs-FBA comparators. |
| /fba-gene-knockout-lanthanophore | FBA-based gene-knockout analysis for lanthanophore biosynthesis. |
| /search-ingredients-hierarchical, /search-mediadive | Search utilities. |
| /ingredient-report | Per-ingredient evidence report. |
| /validate-media, /review-schema, /file-naming-conventions | Validation + standards helpers. |

Experimental Analysis Pipeline

A dual-mode pipeline for analysing experimental growth data in both absolute (raw OD600) and relative (vs baseline) modes, with response surface modelling and Bayesian optimisation on top.

Features:

  • 📊 Dual-mode analysis (absolute + relative).
  • 🔬 Hierarchical clustering (276 replicates, 6 clusters).
  • 🗺️ Gaussian Process response surfaces with multi-objective Pareto.
  • 🤖 Ensemble optimisation (GP + polynomial + Random Forest).
  • 🎯 Bayesian optimisation with Expected Improvement acquisition.
  • 📈 ANOVA, main effects, Sobol sensitivity indices.
  • Schema-driven validation of all outputs.
  • 🔍 Evidence-based interpretation with inline citations.

Round-2 recipes

The round-2 data ships as Biolog raw CSVs + per-condition rollups + per-well Nd predictions (a different layout from round-1's flat plate{1,2,3}.tsv). The adapter at scripts/build_round2_replicate_statistics.py converts these into the round-1 schema so all downstream recipes run unchanged:

# Growth (Biolog 740 nm) — builds outputs/..._round2_results_experimental_analysis_absolute/
just analyze-experimental-round2 data/experimental/plate_designs_v10_maxprooptblock_long__round2_results

# Nd uptake (arsenazo III)
just analyze-experimental-round2-nd  data/experimental/plate_designs_v10_maxprooptblock_long__round2_results_asezuran

# Redox channel (Biolog 590 nm) — note the r=0.982 with OD600 (see DBTL §Round 2)
just analyze-experimental-round2-redox data/experimental/plate_designs_v10_maxprooptblock_long__round2_results

The following per-analysis scripts accept --endpoint-timepoint {t1,t2,t3}; default is t2 since 2026-05-15:

  • scripts/three_way_pareto_round2.py
  • scripts/mc_pareto_round2.py
  • scripts/compare_round1_vs_round2.py
  • scripts/analyze_round2_precipitation_risk.py
  • scripts/plot_pairwise_response_surfaces.py

The joint Pareto/BO driver (scripts/analyze_response_surfaces.py) auto-detects: tries t2 first, falls back to t3, then to max_* columns — no flag needed.
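The fallback order described above (t2, then t3, then max_*) amounts to a first-match lookup over candidate column names. The sketch below shows that pattern only; the column naming scheme (`OD600_t2`, `max_OD600`) and the function name are hypothetical and may not match what scripts/analyze_response_surfaces.py uses.

```python
from typing import Iterable, Optional


def pick_endpoint_column(columns: Iterable[str], measurement: str) -> Optional[str]:
    """Return the first available endpoint column, preferring t2, then t3, then max_*."""
    available = set(columns)
    candidates = (
        f"{measurement}_t2",   # canonical endpoint
        f"{measurement}_t3",   # legacy endpoint
        f"max_{measurement}",  # last resort: per-condition maximum
    )
    for candidate in candidates:
        if candidate in available:
            return candidate
    return None


# Hypothetical column sets for a round-2 table and an older round-1 table.
print(pick_endpoint_column(["condition", "OD600_t2", "OD600_t3"], "OD600"))
print(pick_endpoint_column(["condition", "max_OD600"], "OD600"))
```

Returning `None` instead of raising leaves it to the caller to decide whether a missing endpoint is an error or a reason to skip the measurement.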

Round-1 dual-mode analysis

The original (round-1) flat-file pipeline:

# Run BOTH absolute and relative analyses (recommended)
just analyze-experimental data/experimental/plate_designs_v10_maxprooptblock_long__results

# Either mode alone
just analyze-experimental-absolute data/experimental/plate_designs_v10_maxprooptblock_long__results
just analyze-experimental-relative data/experimental/plate_designs_v10_maxprooptblock_long__results

# Cluster
just cluster-experimental outputs/plate_designs_v10_maxprooptblock_long__results_experimental_analysis_absolute/v10_maxprooptblock_long__results_replicate_statistics_absolute.tsv outputs/plate_designs_v10_maxprooptblock_long__results_experimental_analysis_clustering_absolute absolute

# Validate
just validate-experimental plate_designs_v10_maxprooptblock_long__results

Modes:

  • Absolute (raw OD600): "Which conditions grew best overall?"
  • Relative (fold-change vs control): "Which variations improved over baseline media?"
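The difference between the two modes can be shown with a short sketch that converts absolute OD600 values into fold-change against the ctrl_media baseline. The OD600 numbers and the helper name are hypothetical, and the real pipeline may handle replicates, zero baselines, or other measurements differently.

```python
from typing import Dict


def fold_change_vs_control(
    od600: Dict[str, float], control_key: str = "ctrl_media"
) -> Dict[str, float]:
    """Relative mode: each condition's OD600 divided by the control's OD600."""
    control = od600[control_key]
    if control <= 0.0:
        raise ValueError("control OD600 must be positive to compute a fold-change")
    return {
        name: value / control
        for name, value in od600.items()
        if name != control_key
    }


# Hypothetical absolute-mode values (raw OD600), not real measurements.
absolute = {"ctrl_media": 0.40, "cond_A": 0.60, "cond_B": 0.30}
relative = fold_change_vs_control(absolute)

for name, fc in relative.items():
    verdict = "improved over baseline" if fc > 1.0 else "no improvement"
    print(f"{name}: {fc:.2f}x ({verdict})")
```

A condition can rank highly in absolute mode yet sit near 1.0 in relative mode when the control itself grows well, which is why the two questions above are kept separate.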

Output directories (per source data ID, e.g., v10_maxprooptblock_long__results):

  • outputs/{source_data_id}_experimental_analysis_{mode}/
  • outputs/{source_data_id}_experimental_analysis_clustering_{mode}/

Every output file is labelled with the source data ID for full traceability (the auto-generated prefix removes the plate_designs_ portion of the directory name and adds a trailing underscore).
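The prefix rule can be written down in a few lines. This sketch is inferred from the rule stated above and from the file name shown in the cluster-experimental recipe; the function name is hypothetical and the pipeline's own helper may differ in edge cases.

```python
DIR_PREFIX = "plate_designs_"


def output_file_prefix(source_dir_name: str) -> str:
    """Drop the leading 'plate_designs_' (if present) and append an underscore."""
    if source_dir_name.startswith(DIR_PREFIX):
        stem = source_dir_name[len(DIR_PREFIX):]
    else:
        stem = source_dir_name
    return stem + "_"


prefix = output_file_prefix("plate_designs_v10_maxprooptblock_long__results")
print(prefix + "replicate_statistics_absolute.tsv")
```

The printed name matches the replicate-statistics file passed to `just cluster-experimental` above.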

Response Surface Modeling

Optional response surface modelling using Gaussian Processes for ingredient-measurement relationships and multi-objective optimisation:

# Runs automatically with analyze-experimental (enabled by default)
just analyze-experimental data/experimental/plate_designs_v13_latinhypercube_long__results

# Faster analysis without surfaces
python scripts/run_dual_analysis.py data/experimental/plate_designs_v10_maxprooptblock_long__results --disable-response-surfaces

# Standalone
python scripts/analyze_response_surfaces.py \
    outputs/plate_designs_v13_latinhypercube_long__results_experimental_analysis_absolute/ \
    --mode absolute --measurements OD600 Nd_uM

Capabilities: 3D surface plots, Pareto frontiers, predictions over design space, contour maps.

Measurement interpretation:

  • OD600 — biomass; absolute = raw, relative = fold-change vs control.
  • Nd_uM — Nd remaining in supernatant. In round-2, the absolute value at t2 is what the canonical pipeline reports; in the round-1 relative framing, negative = more consumption than control baseline. Initial Nd dose: 5.5 µM (round-1), 15 µM (round-2).

Outputs (per mode):

  • response_surfaces/surface_predictions_{measurement}_{mode}.csv
  • response_surfaces/surface_3d_{measurement}_{mode}.pdf/png
  • response_surfaces/pareto_frontier_{mode}.csv/pdf/png
  • response_surfaces/optimization_report_{mode}.txt

Optimization Workflow

uv run python -m microgrowagents.skills.simple.optimize_growth_conditions \
    --data outputs/experimental_analysis \
    --source-data-id plate_designs_v10_maxprooptblock_long__results \
    --output-dir outputs/optimization \
    --strategy hybrid \
    --n-suggestions 69

Trains ensemble models (GP + Polynomial + Random Forest), analyses ingredient effects + interactions, and uses Bayesian optimisation to suggest next experiments. Strategies: bayesian, local, uncertainty, or hybrid (70% local + 15% uncertainty + 15% space-filling).

Evidence-Based Interpretation

Generate publication-ready biological interpretations with inline citations and bibliography via ExperimentalInterpretationAgent:

from microgrowagents.agents.analysis import ExperimentalInterpretationAgent

agent = ExperimentalInterpretationAgent(source_version="v10")
result = agent.run()

Produces four artifacts:

  1. INTERPRETATION_REPORT.md — clean biological interpretation (executive summary, factor-by-factor analysis, metabolic insights, testable hypotheses, recommendations for next design iteration).
  2. INTERPRETATION_EVIDENCE.md — evidence companion file with data evidence E1–E# (cited file + section + snippet) and literature evidence L1–L# (DOIs).
  3. INTERPRETATION_REPORT_evidence.md — citation-based version with inline [E1], [L2] markers and a complete bibliography.
  4. interpretation_metadata.json — execution metadata.

See docs/EXPERIMENTAL_INTERPRETATION_AGENT.md for the complete documentation.

Cofactor & Chemistry Reference

Cofactor reference (60 cofactors)

The CofactorMediaAgent and the generate_cofactor_reference script integrate 6 sources (the ChEBI, KEGG, BRENDA, ExplorEnz, and KG-Microbe databases, plus literature) to produce two reference TSVs:

just generate-cofactor-reference
# emits:
#   data/references/cofactors_complete.tsv  (60 cofactors with CHEBI IDs, EC associations, usage tracking)
#   data/references/cofactors_metals.tsv    (19-cofactor metal/REE subset including 5 lanthanides)

Primary data sources:

Reference files (in-repo):

  • src/microgrowagents/data/cofactor_hierarchy.yaml — 44 cofactors across 5 categories (curated)
  • src/microgrowagents/data/ec_to_cofactor_map.yaml — 68 EC pattern mappings
  • data/references/cofactors_complete.tsv — 60 cofactors (generated)
  • data/references/cofactors_metals.tsv — 19 metals / REEs (generated)
  • data/processed/ingredient_cofactor_mapping.csv — 13 MP medium cofactor providers

Docs:

Chemistry modules

Module: src/microgrowagents/chemistry/

Osmotic Properties (osmotic_properties.py):

  • calculate_osmolarity(ingredients, temperature=25.0)
  • calculate_water_activity(ingredients, temperature=25.0, method="raoult")
  • estimate_van_hoff_factor(formula, charge, name)
  • Methods: Raoult's law (dilute), Robinson-Stokes (concentrated), Bromley (high ionic strength).
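As background for the osmolarity functions, the ideal-solution estimate is the sum of each solute's concentration times its van 't Hoff factor. The sketch below shows only that textbook relation; it is not the module's calculate_osmolarity, it ignores non-ideality (osmotic coefficients), and the example recipe is hypothetical.

```python
from typing import Iterable, Tuple


def ideal_osmolarity_mosm(solutes: Iterable[Tuple[float, float]]) -> float:
    """Sum of concentration (mM) x van 't Hoff factor, in mOsm/L (ideal solution)."""
    return sum(conc_mm * vant_hoff for conc_mm, vant_hoff in solutes)


# Hypothetical recipe: (concentration in mM, van 't Hoff factor assuming full dissociation)
recipe = [
    (100.0, 2.0),  # NaCl -> Na+ + Cl-
    (20.0, 1.0),   # glucose, non-electrolyte
    (5.0, 3.0),    # MgCl2 -> Mg2+ + 2 Cl-
]

print(ideal_osmolarity_mosm(recipe))
```

For concentrated or high-ionic-strength media the ideal estimate overstates osmolarity, which is where the Robinson-Stokes and Bromley methods listed above come in.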

Redox Properties (redox_properties.py):

  • calculate_redox_potential(ingredients, ph, temperature=25.0) — Eh and pE via Nernst.
  • calculate_electron_balance(ingredients).
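For orientation, the Nernst relation and the Eh-to-pE conversion that these functions are described as using can be sketched with the standard constants. This is a generic textbook sketch with hypothetical function names, not the code in redox_properties.py.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def nernst_slope_volts(temperature_c: float = 25.0) -> float:
    """ln(10) * R * T / F, the potential change per decade of activity for n = 1."""
    kelvin = temperature_c + 273.15
    return math.log(10.0) * R * kelvin / F


def nernst_potential(
    e0_volts: float, n_electrons: int, reaction_quotient: float, temperature_c: float = 25.0
) -> float:
    """E = E0 - (R T / n F) * ln(Q) for a half-reaction written as a reduction."""
    kelvin = temperature_c + 273.15
    return e0_volts - (R * kelvin / (n_electrons * F)) * math.log(reaction_quotient)


def eh_to_pe(eh_volts: float, temperature_c: float = 25.0) -> float:
    """pE = Eh / (ln(10) R T / F)."""
    return eh_volts / nernst_slope_volts(temperature_c)


print(round(nernst_slope_volts(25.0), 5))  # close to 0.05916 V at 25 °C
print(eh_to_pe(0.30, temperature_c=30.0))  # hypothetical Eh of +300 mV at 30 °C
```

The temperature dependence matters here because the examples elsewhere in this README run the sensitivity analysis at 30 °C and 37 °C rather than the 25 °C default.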

Nutrient Ratios (nutrient_ratios.py):

  • calculate_cnp_ratios(ingredients) — C:N:P, limiting-nutrient classification.
  • calculate_trace_metal_ratios(ingredients) — Fe:P, Mn:P, Zn:P with deficiency / excess flags.
  • Redfield ratio comparison (marine: 106:16:1, terrestrial: ~60:7:1).

Thermodynamic Properties (thermodynamic_properties.py):

  • calculate_gibbs_free_energy(reactants, products, ph=7.0) — ΔG via eQuilibrator + Component Contribution.
  • calculate_formation_energy(compound) — ΔGf°.

Lanthanide chemistry

Specialised modules and docs cover Nd³⁺ bioavailability: Ksp-based NdPO₄ precipitation, citrate / malate chelation, and bioavailable-fraction calculation.

They are cited by the round-3 Nd-assay recommendation (nd_assay_alternatives_report.md) as the chemistry rationale for the LanM + cell-pellet ICP-MS protocol.

Data Integrity, Provenance & Audit

Input Data Checksums

All input data files are protected with SHA256 checksums for cryptographic reproducibility (bbop-skills Criterion 4):

just verify-data-integrity   # check `data/checksums.txt` against current data files

Stored at data/checksums.txt (global) and outputs/*/input_data_checksums.json (per-analysis). Every analysis records checksums of its input files. See docs/ARTIFACT_CLEANUP_POLICY.md for the generation procedure.
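What a checksum verification pass does can be sketched with the standard library. The manifest format assumed here is the common `sha256sum` layout (`<hex digest>  <relative path>` per line); the actual format of data/checksums.txt is not shown in this README, so treat that, and both function names, as assumptions rather than a description of `just verify-data-integrity`.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path, root: Path) -> List[str]:
    """Return the relative paths that are missing or whose digest does not match."""
    failures = []
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.strip().lstrip("*")  # sha256sum marks binary mode with '*'
        target = Path(root) / rel_path
        if not target.is_file() or sha256_of_file(target) != expected.lower():
            failures.append(rel_path)
    return failures


# Self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "plate1.tsv").write_bytes(b"well\tOD600\nA1\t0.241\n")
    manifest = root / "checksums.txt"
    manifest.write_text(f"{sha256_of_file(root / 'plate1.tsv')}  plate1.tsv\n")
    print(verify_manifest(manifest, root))   # nothing to report
    (root / "plate1.tsv").write_bytes(b"tampered")
    print(verify_manifest(manifest, root))   # the modified file is reported
```

An empty result means every listed file is present and unchanged; any entry in the list is a file to re-download or investigate before rerunning an analysis.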

ResearchAuditor

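End-to-end auditing of the analysis pipeline, scored against the bbop-skills criteria for local-first agentic systems. Components: file, data, provenance, and report auditors, orchestrated by src/microgrowagents/agents/analysis/research_auditor.py. Run an audit with: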

uv run python scripts/run_research_audit.py \
    --session-id <session-uuid> \
    --output outputs/research_audit_<date>/

Artifact Cleanup Policy

Three-tier retention (docs/ARTIFACT_CLEANUP_POLICY.md):

  • Archival (keep): published designs (v10, v13, …), validated analysis with interpretations, response surface models.
  • Temporary (30 days): per-run analysis outputs, clustering, intermediate optimisation runs.
  • Ephemeral (7 days): test outputs, debug artifacts, scratch visualisations.

just archive-outputs      # move to archive/
just clean-old-outputs    # >30 days
just clean-ephemeral      # >7 days

Steady-state ~185 MB (with cleanup) vs ~4 GB/year unmanaged (96% reduction).

Audit Compliance

Overall: 78% (7/9 PASS) — full breakdown in docs/AUDIT_REPORT_BBOP_SKILLS.md:

  • PASS (7): provenance tracking, model tracking, reasoning/code separation, validation (LinkML schemas + validators), error-correction (DOI validation + corrections), RAG (KG-Microbe + literature + genomes), artifact cleanup.
  • ⚠️ PARTIAL (1): documentation/automation.
  • ❌ FAIL (1): MCP integration (under consideration).

Action checklist with implementation steps + target dates: docs/AUDIT_ACTIONS_CHECKLIST.md.

Citation Coverage

DOI validation: 90.5% (143/158 DOIs) — 92 PDFs, 44 abstracts, 15 missing.

uv run python scripts/doi_validation/validate_failed_dois.py
uv run python scripts/doi_corrections/apply_doi_corrections.py
uv run python scripts/pdf_downloads/download_all_pdfs_automated.py

History: notes/DOI_CORRECTIONS_FINAL_UPDATED.md.

Core Capabilities

Media Concentration Generation (gen-media-conc)

Predicts LOW, DEFAULT, and HIGH concentration ranges for media ingredients:

uv run python run.py gen-media-conc "MP medium"
uv run python run.py gen-media-conc "PIPES,NaCl,glucose" --mode ingredients
uv run python run.py gen-media-conc "MP medium" --enrich pubchem

Output: predicted concentration ranges (mM), molecular weights, chemical formulas, confidence scores.

Sensitivity Analysis (sensitivity)

uv run python run.py sensitivity "MP medium"
uv run python run.py sensitivity "MP medium" --calculate-osmotic --calculate-nutrients
uv run python run.py sensitivity "MP medium" --plot --plot-output analysis.png

Calculates pH, salinity (TDS + NaCl-equivalent), ionic strength; optionally osmotic / redox / nutrient ratios.

Media Comparison (compare-media)

uv run python run.py compare-media "MP medium" "LB medium"

Common vs unique ingredients, concentration differences.

Media Formulation Recommendation (recommend-media)

from microgrowagents.skills.workflows import RecommendMediaWorkflow

workflow = RecommendMediaWorkflow()
result = workflow.run(
    query="Recommend medium for methanotrophic bacteria",
    organism="Methylococcus capsulatus",
    temperature=42.0, pH=6.8,
    carbon_source="methane", oxygen="aerobic",
    goals="defined,selective",
    output_format="markdown",
)

The workflow integrates multi-source evidence (KG-Microbe + literature + MP database) and returns a complete formulation with ingredient list, concentrations, roles, alternatives, and confidence scores. Goal presets: minimal, defined, complex, cost_effective, high_yield, selective. Full skill docs are in .claude/skills/recommend-media.md.

Genome Function Interpretation

Organism-specific media design using 57 Bakta-annotated genomes (667,502 features). EC-number queries with wildcard support (1.1.*.*), auxotrophy detection, cofactor analysis, transporter analysis. Automatically integrated into MediaFormulationAgent, GenMediaConcAgent, and KGReasoningAgent. See docs/GENOME_FUNCTION.md for examples.
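Wildcard EC queries such as 1.1.*.* can be understood as a level-by-level comparison of the four EC fields. The matcher below is a minimal sketch of that idea with a hypothetical function name; it is not GenomeFunctionAgent's query code, which runs against the annotated-genome database.

```python
from typing import Iterable, List


def ec_matches(ec_number: str, pattern: str) -> bool:
    """True if every field of a four-level EC number equals the pattern field or the pattern has '*'."""
    ec_fields = ec_number.split(".")
    pattern_fields = pattern.split(".")
    if len(ec_fields) != 4 or len(pattern_fields) != 4:
        return False
    return all(p == "*" or p == e for e, p in zip(ec_fields, pattern_fields))


def filter_ec(ec_numbers: Iterable[str], pattern: str) -> List[str]:
    return [ec for ec in ec_numbers if ec_matches(ec, pattern)]


# Example EC numbers used purely as strings for illustration.
annotated = ["1.1.2.7", "1.1.2.10", "1.2.1.46", "2.7.1.1"]
print(filter_ec(annotated, "1.1.*.*"))
```

Comparing field by field avoids the pitfall of glob-style matching, where a single `*` could swallow a dot and match across EC levels.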

Advanced Usage

Combining Multiple Property Calculations

uv run python run.py sensitivity "MP medium" \
    --calculate-osmotic --calculate-redox --calculate-nutrients \
    --ph 7.0 --temperature 30 \
    --format json --output complete_analysis.json

Pipeline Mode

uv run python run.py gen-media-conc "MP medium" --format json > predictions.json
uv run python run.py sensitivity --input-file predictions.json --calculate-osmotic

Python API

from microgrowagents.agents.sensitivity_analysis_agent import SensitivityAnalysisAgent

agent = SensitivityAnalysisAgent(db_path="data/microgrowdb.db")
result = agent.run(
    query="MP medium",
    mode="medium",
    calculate_osmotic=True, calculate_redox=True, calculate_nutrients=True,
    temperature=37.0,
)
print(f"pH: {result['baseline']['ph']}")
print(f"Limiting nutrient: {result['baseline']['nutrient_ratios']['limiting_nutrient']}")

Repository Structure

MicroGrowAgents/
├── CLAUDE.md                        # Project instructions for Claude Code
├── README.md                        # this file
├── justfile + project.justfile      # task recipes
├── pyproject.toml + uv.lock         # dependencies (uv)
│
├── src/microgrowagents/
│   ├── agents/                      # 29 specialized agent classes
│   │   ├── analysis/                # research_auditor, schema_review, interpretation, design_recommendation
│   │   └── …
│   ├── skills/                      # 52 user-facing skills (64 .py modules)
│   │   ├── analysis/{experimental,statistical,visualization}/
│   │   ├── core/{chemistry,genome,knowledge,modeling}/
│   │   ├── design/{doe,media,validation}/
│   │   ├── meta/                    # validate_linkml (new)
│   │   └── workflows/, utilities/, formatters/, executors/, simple/, development/
│   ├── chemistry/                   # osmotic, redox, nutrient_ratios, thermodynamic, precipitation, chelation, bioavailability
│   ├── provenance/                  # auditor (ResearchAuditor backend)
│   ├── utils/                       # audit_{report_generator,structures}, data_auditor, file_auditor, checksums, …
│   └── schema/                      # LinkML schemas including audit_outputs_schema.yaml
│
├── scripts/                         # Integration / analysis scripts
│   ├── build_round2_replicate_statistics.py  # DBTL2 adapter
│   ├── mc_pareto_round2.py, three_way_pareto_round2.py, …
│   ├── microgrow-plot.py            # /plot skill backend
│   ├── validate_linkml_cli.py       # /validate-linkml skill CLI
│   ├── run_research_audit.py, demo_research_audit.py
│   ├── generate_cofactor_reference.py + enhance_cofactor_references_v3.py + validate_enhance_cofactors_metals.py
│   ├── generate_ko_to_{ec,go_map_{bakta,uniprot}}.py
│   ├── generate_architecture_diagrams_{simplified,abstract,vivid}.py
│   ├── generate_explanatory_heatmap.py + regenerate_*heatmap.py + generate_provenance_heatmap.py
│   ├── compute_toxicity_report_bioavailability.py, extract_presentation_data.py
│   └── doi_validation/, doi_corrections/, pdf_downloads/, enrichment/, schema/
│
├── tests/                           # 1900+ pytest tests across modules
├── data/
│   ├── raw/                         # source data with checksums
│   ├── experimental/
│   │   ├── plate_designs_v10_maxprooptblock_long__results/                       # round 1 OD600
│   │   ├── plate_designs_v10_maxprooptblock_long__results_asezuran/              # round 1 arsenazo III
│   │   ├── plate_designs_v10_maxprooptblock_long__round2_results/                # round 2 Biolog
│   │   └── plate_designs_v10_maxprooptblock_long__round2_results_asezuran/       # round 2 arsenazo III
│   ├── references/                  # cofactors_{complete,metals}.tsv
│   ├── corrections/, results/, sheets_cmm/, pdfs/, designs/
│   └── checksums.txt
│
├── outputs/
│   ├── round1_vs_round2/, round2_3way_pareto/, round2_mc_pareto/, round2_double_winners/,
│   ├── round2_abiotic_correction/, round2_t2_paired_biology/, round2_kinetic_fits/,
│   ├── round2_precipitation_risk/   # round-2 analyses (the dirs above are reproducible from scripts/)
│   ├── round2_recommendations/      # v16_design_recommendation.md, v16_bo_seeds.md
│   ├── round3_recommendations/      # nd_assay_alternatives_report.md (+ _1pager.md)
│   ├── cofactor_analysis/, lanthanide_genes/, optimization/, media/
│   └── plate_designs_v10_*_experimental_analysis_{absolute,relative,clustering_*}/
│
├── docs/                            # MkDocs documentation
│   ├── STATUS.md, AGENTS_SKILLS_TOOLS.md, RESEARCH_AUDITOR.md, …
│   ├── COFACTOR_REFERENCE.md, LANTHANIDE_BIOAVAILABILITY_COMPLETE.md, …
│   ├── architecture/                # simplified / abstract / vivid diagram variants
│   └── figures/                     # explanatory heatmaps, provenance heatmap
│
├── notes/                           # research notes, DOI corrections, session summaries
└── .claude/
    ├── provenance/                  # session manifests + action logs
    └── skills/                      # 22 slash commands (/plot, /validate-linkml, …)

Development

# All tests + type checking + formatting
just test

# Targeted
uv run pytest tests/test_chemistry/test_osmotic_properties.py -v
uv run pytest --cov=microgrowagents --cov-report=html

# Type / format
just mypy
just format

# Documentation
just _serve          # local mkdocs
mkdocs build

The full test suite is ~2000 tests across the agents, skills, chemistry, KG, validators, and scripts trees; coverage report via --cov-report=html.

Documentation Website

https://CultureBotAI.github.io/MicroGrowAgents

Tools, APIs & Datasets

External tools, APIs, and datasets integrated with MicroGrowAgents.

External APIs

Chemical:

Biological:

Planned:

Knowledge Graphs & Datasets

  • KG-Microbe: 1.5M nodes, 5.1M edges; 864,363 species (GTDB + LPSN + NCBI).
  • Genome annotations: 57 Bakta-annotated genomes, 667,502 features (incl. M. extorquens AM1, M. capsulatus).
  • Chemical embeddings: 208K+ Morgan fingerprints / descriptors for analogy-based reasoning.
  • MP Medium database: 158 ingredients × 68 columns, 158 unique DOIs, 90.5% citation coverage.
  • Literature corpus: 245+ papers with extracted excerpts.

External Software

  • Metabolic modelling: GapMind, GEMsembler, COBRApy.
  • Genome annotation: Bakta, NCBI BLAST.
  • Experimental design: MaxPro+OptBlock (custom), Latin Hypercube Sampling.
  • Growth prediction: GrowthCodon (codon usage bias), MediaDive.

Python Libraries

  • Scientific: numpy, pandas, scipy, scikit-learn.
  • Chemistry: rdkit, equilibrator-api.
  • Visualization: matplotlib, seaborn, plotly.
  • Database / KG: duckdb, sqlalchemy, linkml.
  • Optimization: scikit-optimize, pymoo, SALib, statsmodels.
  • PDF / PPTX: pypdf, python-pptx.
  • Dev: pytest, mypy, ruff, uv.

Data Provenance

  • data/raw/mp_medium_ingredient_properties.csv — ingredient data with DOIs.
  • docs/STATUS.md — citation coverage metrics.
  • notes/DOI_CORRECTIONS_FINAL_UPDATED.md — DOI validation history.
  • docs/cofactor_data_sources.md — cofactor source methodology.

Historical Appendix

Round 1 key findings (Feb–Mar 2026, v10 design, OD600)

Top performer: MPOB_040

  • Max OD600: 0.95 (highest overall).
  • Strategy: pure C1 methylotrophy (67.9 mM methanol, low succinate).
  • Challenge: 98% crash at 48 h due to methanol depletion.
  • Crash analysis: outputs/optimization/MPOB_040_CRASH_ANALYSIS.md.

Most stable: MPOB_053

  • Max OD600: 0.66 (sustained growth across all timepoints).
  • Strategy: mixed C1+C2 metabolism (19.9 mM methanol, 58.7 mM succinate).
  • Key finding: 40–60 mM succinate provides metabolic backup when methanol depletes, preventing culture crash while maintaining high peak growth.

These v10-era results motivated the round-2 design (Biolog dual-channel, arsenazo III at higher 15 µM Nd dose). Round-2 supersedes round-1 for foreground decision-making — see DBTL Campaign Status §Round 2.

v13 lanthanide-dependent growth design

v13 varied neodymium from 0 to 5 µM to test the MxaF-MDH vs XoxF-MDH pathways:

  • High OD600 at low Nd → lanthanide-independent (MxaF-MDH).
  • High OD600 at high Nd → lanthanide-dependent (XoxF-MDH).
  • Response surface modelling identifies Pareto-optimal conditions.
  • Multi-objective optimisation balances growth AND Nd utilisation.

v16 extends the Nd³⁺ axis to 0–30 µM (5-point grid) and adds citrate (0–300 µM) so the round-3 experiment can directly probe chemistry-vs-biology attribution.

Reproducibility: round-1 vs round-2

Spearman ρ ≈ 0 on OD600 and ≈ −0.17 on Nd across 69 matched conditions; 26 / 69 growth and 42 / 69 Nd conditions disagree at |z| > 2. The two rounds switched instruments (600 nm → Biolog 740 nm) and Nd calibration (raw abs660 → Miller §5.10 inverse fit), so this is measurement-modality drift, not biology drift. v16 anchors on round-2 winners alone. Full analysis: outputs/round1_vs_round2/REPRODUCIBILITY_REPORT.md.
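Spearman's ρ, the statistic quoted here, is the Pearson correlation of the two rank vectors, so it asks whether the rounds order the conditions the same way regardless of instrument scale. The sketch below is a dependency-free illustration with average ranks for ties; the per-condition values are hypothetical and this is not the code in scripts/compare_round1_vs_round2.py.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("rank variance is zero; rho is undefined")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-condition endpoint values from two rounds (not real data).
round1 = [0.95, 0.66, 0.40, 0.52, 0.31]
round2 = [0.21, 0.24, 0.19, 0.23, 0.22]
print(spearman_rho(round1, round2))
```

Because only ranks enter the calculation, a change of wavelength or calibration curve that preserved the ordering of conditions would still give ρ near 1; a value near 0 means the ordering itself did not carry over.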

Contributing

Contributions are welcome:

  1. Fork the repository.
  2. Create a feature branch.
  3. Write tests for new functionality.
  4. Ensure all tests pass (just test).
  5. Submit a pull request.

License

BSD 3-Clause License. See LICENSE for details.

Copyright (c) 2026 Marcin P. Joachimiak, Lawrence Berkeley National Laboratory

Credits

This project uses the template monarch-project-copier.

Citation

If you use MicroGrowAgents in your research, please cite this repository.

Contact

Principal Investigator: Dr. Marcin P. Joachimiak

  • Institution: Lawrence Berkeley National Laboratory
  • Project: CultureBotAI Initiative
  • GitHub: CultureBotAI

For questions or issues:
