Enhance cross-repository environmental linking with CultureMech and MediaIngredientMech #30

@realmarcin

Summary

Enhance CommunityMech's environmental linking capabilities to support bidirectional references with CultureMech media and MediaIngredientMech ingredients based on shared ENVO terms.

Background

Triggered by: Issue #24 (SPRUCE Peatland Community addition)

CommunityMech currently has robust environmental metadata:

environment_term:
  preferred_term: peatland
  term:
    id: ENVO:00000044
    label: peatland

However, there's no mechanism to:

  1. Link communities to relevant CultureMech media for the same environment
  2. Link communities to relevant MediaIngredientMech ingredients
  3. Auto-suggest media/ingredients when curating new communities

Related Schema Enhancements

In Progress:

These will enable environment-based cross-repository queries.

Proposed CommunityMech Enhancements

Option 1: Enhance growth_media Field (Recommended)

Currently, the growth_media field exists but may not support linking to CultureMech IDs:

# Current (if implemented)
growth_media:
  - name: Some Medium
    description: "Medium description"

# Proposed Enhancement
growth_media:
  - preferred_term: Acidic Peatland Medium
    culturemech_id: CultureMech:010001  # Direct link
    environment_match: ENVO:00000044    # Matched via environment
    notes: "Used for cultivating methanogenic archaea from SPRUCE"
    
  - preferred_term: Generic Anaerobic Medium  
    culturemech_id: CultureMech:005432
    environment_match: null  # Not environment-specific
    notes: "General purpose medium"
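A minimal Python sketch of one enhanced growth_media entry follows. It assumes the field names from the YAML above, the six-digit CultureMech:NNNNNN pattern used in Option 2, and eight-digit ENVO IDs. The class name GrowthMediumLink is hypothetical and not part of the current schema.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed ID formats: six-digit CultureMech IDs (as in Option 2) and
# eight-digit ENVO IDs such as ENVO:00000044.
CULTUREMECH_ID = re.compile(r"CultureMech:\d{6}")
ENVO_ID = re.compile(r"ENVO:\d{8}")


@dataclass
class GrowthMediumLink:
    """Hypothetical Python model of one enhanced growth_media entry."""

    preferred_term: str
    culturemech_id: Optional[str] = None
    environment_match: Optional[str] = None  # None means not environment-specific
    notes: str = ""

    def __post_init__(self) -> None:
        # Reject malformed cross-repository identifiers at construction time.
        if self.culturemech_id is not None and not CULTUREMECH_ID.fullmatch(self.culturemech_id):
            raise ValueError(f"bad CultureMech ID: {self.culturemech_id!r}")
        if self.environment_match is not None and not ENVO_ID.fullmatch(self.environment_match):
            raise ValueError(f"bad ENVO ID: {self.environment_match!r}")


peat = GrowthMediumLink(
    "Acidic Peatland Medium",
    culturemech_id="CultureMech:010001",
    environment_match="ENVO:00000044",
    notes="Used for cultivating methanogenic archaea from SPRUCE",
)
generic = GrowthMediumLink("Generic Anaerobic Medium", culturemech_id="CultureMech:005432")
```

Validating in __post_init__ means a malformed ID fails when the record is built rather than later, during a cross-repository query.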

Option 2: Add related_media Field

If growth_media is reserved for media actually used with the community, add a separate field for related media:

related_media:
  description: CultureMech media relevant to this community's environment
  range: RelatedMedia
  multivalued: true
  inlined_as_list: true

RelatedMedia:
  attributes:
    preferred_term:
      description: Media name
      required: true
    culturemech_id:
      description: CultureMech identifier
      range: string
      pattern: "^CultureMech:\\d{6}$"
    relationship_type:
      description: How media relates to community
      range: MediaRelationshipEnum
      # VALUES: CULTIVATION_MEDIUM, ISOLATION_MEDIUM, 
      #         ENVIRONMENT_ANALOG, REFERENCED_IN_STUDY
    evidence:
      description: Evidence for this relationship
      range: EvidenceItem
      multivalued: true
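The MediaRelationshipEnum values appear only in a schema comment above. The following Python sketch mirrors those four values and checks a single related_media entry, represented as a plain dict, against the required preferred_term, the culturemech_id pattern, and the enum. The function validate_related_media is hypothetical; in practice, LinkML validation of the generated schema would cover the same rules.

```python
import re
from enum import Enum


class MediaRelationshipEnum(str, Enum):
    """Mirrors the four values listed in the schema comment."""

    CULTIVATION_MEDIUM = "CULTIVATION_MEDIUM"
    ISOLATION_MEDIUM = "ISOLATION_MEDIUM"
    ENVIRONMENT_ANALOG = "ENVIRONMENT_ANALOG"
    REFERENCED_IN_STUDY = "REFERENCED_IN_STUDY"


# Same rule as the slot pattern: "CultureMech:" followed by exactly six digits.
CULTUREMECH_ID = re.compile(r"CultureMech:\d{6}")


def validate_related_media(entry: dict) -> list[str]:
    """Return the problems found in one related_media entry (empty list if valid)."""
    problems = []
    if not entry.get("preferred_term"):
        problems.append("preferred_term is required")
    cm_id = entry.get("culturemech_id")
    if cm_id is not None and not CULTUREMECH_ID.fullmatch(cm_id):
        problems.append(f"culturemech_id {cm_id!r} does not match CultureMech:NNNNNN")
    rel = entry.get("relationship_type")
    if rel is not None and rel not in MediaRelationshipEnum.__members__:
        problems.append(f"unknown relationship_type {rel!r}")
    return problems
```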

Option 3: Add related_ingredients Field

Similarly for ingredients:

related_ingredients:
  description: MediaIngredientMech ingredients relevant to this community
  range: RelatedIngredient
  multivalued: true
  inlined_as_list: true

RelatedIngredient:
  attributes:
    preferred_term:
      description: Ingredient name
      required: true
    mediaingredientmech_id:
      description: MediaIngredientMech identifier  
      range: string
      pattern: "^MediaIngredientMech:\\d{6}$"
    relevance:
      description: Why ingredient is relevant
      range: string
    evidence:
      description: Evidence for relevance
      range: EvidenceItem
      multivalued: true
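The mediaingredientmech_id pattern implies zero-padded, six-digit local IDs, as in MediaIngredientMech:000523 in the SPRUCE example further down. The following sketch shows hypothetical helpers that build and parse such CURIEs, so curation tooling never emits an unpadded ID.

```python
import re

# "MediaIngredientMech:" followed by exactly six digits, per the slot pattern.
MIM_ID = re.compile(r"MediaIngredientMech:(\d{6})")


def to_mim_curie(local_id: int) -> str:
    """Format an integer local ID as a zero-padded MediaIngredientMech CURIE."""
    if not 0 <= local_id <= 999_999:
        raise ValueError("local ID must fit in six digits")
    return f"MediaIngredientMech:{local_id:06d}"


def parse_mim_curie(curie: str) -> int:
    """Inverse of to_mim_curie; raises ValueError on malformed input."""
    match = MIM_ID.fullmatch(curie)
    if match is None:
        raise ValueError(f"not a MediaIngredientMech ID: {curie!r}")
    return int(match.group(1))
```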

Use Cases

Use Case 1: Adding a New Peatland Community

Current Workflow:

  1. User provides PMIDs, environmental info
  2. Create community record with environment_term: ENVO:00000044
  3. ❌ No way to find relevant media
  4. ❌ No way to find relevant ingredients

Enhanced Workflow:

  1. User provides PMIDs, environmental info
  2. Create community record with environment_term: ENVO:00000044
  3. Auto-query CultureMech: "Find media with source_environment: ENVO:00000044"
  4. Auto-query MediaIngredientMech: "Find ingredients with environmental_context: ENVO:00000044"
  5. Auto-suggest: "15 peatland media found, 8 peatland ingredients found"
  6. User selects relevant media/ingredients to link
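The following Python sketch covers steps 3 to 5 of the enhanced workflow. It uses in-memory stand-ins for CultureMech media and MediaIngredientMech ingredients. The Medium and Ingredient classes and suggest_links are hypothetical. It also assumes that a medium has a single source_environment and that an ingredient can list several environmental contexts.

```python
from dataclasses import dataclass


@dataclass
class Medium:
    """Hypothetical stand-in for a CultureMech medium record."""

    culturemech_id: str
    name: str
    source_environment: str


@dataclass
class Ingredient:
    """Hypothetical stand-in for a MediaIngredientMech ingredient record."""

    mediaingredientmech_id: str
    preferred_term: str
    environmental_context: list[str]


def suggest_links(
    envo_id: str, media: list[Medium], ingredients: list[Ingredient]
) -> tuple[list[Medium], list[Ingredient], str]:
    """Filter both catalogs by the community's ENVO term and summarize the hits."""
    media_hits = [m for m in media if m.source_environment == envo_id]
    ingredient_hits = [i for i in ingredients if envo_id in i.environmental_context]
    summary = f"{len(media_hits)} media found, {len(ingredient_hits)} ingredients found"
    return media_hits, ingredient_hits, summary
```

This sketch uses exact string equality on the ENVO CURIE. Also matching narrower ENVO terms would require an ontology lookup, which is not shown here.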

Use Case 2: Environmental Coverage Dashboard

Environment: Peatland (ENVO:00000044)
┌─────────────────────┬───────┬────────────────────────┐
│ Repository          │ Count │ Status                 │
├─────────────────────┼───────┼────────────────────────┤
│ Communities         │ 3     │ ✅ SPRUCE, ...         │
│ Media (CultureMech) │ 15    │ ✅ Good coverage       │
│ Ingredients (MIM)   │ 8     │ ✅ Specialized items   │
└─────────────────────┴───────┴────────────────────────┘

Environment: Deep-sea hydrothermal vent (ENVO:01000030)
┌─────────────────────┬───────┬────────────────────────┐
│ Repository          │ Count │ Status                 │
├─────────────────────┼───────┼────────────────────────┤
│ Communities         │ 5     │ ✅ Well studied        │
│ Media (CultureMech) │ 2     │ ⚠️ Need more media     │
│ Ingredients (MIM)   │ 1     │ ⚠️ Need more items     │
└─────────────────────┴───────┴────────────────────────┘
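Each dashboard row is a per-repository count for one ENVO term plus a status flag. The following sketch assumes records are dicts with hypothetical "environment" or "environments" keys. Its threshold of three items is an assumption chosen to reproduce the statuses in the tables above, not a project rule.

```python
# Assumed threshold: the tables above flag 1-2 items as needing more.
MIN_ITEMS = 3


def coverage_rows(envo_id, communities, media, ingredients):
    """Count records per repository for one ENVO term and flag thin coverage."""
    counts = {
        "Communities": sum(1 for c in communities if c["environment"] == envo_id),
        "Media (CultureMech)": sum(1 for m in media if m["environment"] == envo_id),
        "Ingredients (MIM)": sum(1 for i in ingredients if envo_id in i["environments"]),
    }
    return [
        (repo, n, "OK" if n >= MIN_ITEMS else "Need more")
        for repo, n in counts.items()
    ]
```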

Use Case 3: Cross-Repository SPARQL Query

# Find the complete environmental profile for one environment.
# PREFIX declarations for communitymech:, culturemech: and mediaingredientmech:
# are omitted. UNION keeps the three resource types in separate rows instead of
# joining them into a community x medium x ingredient cross product.
SELECT ?type ?resource ?name
WHERE {
  VALUES ?env { "ENVO:00000044" }

  {
    # Communities in peatland
    ?resource a communitymech:MicrobialCommunity ;
              communitymech:environment_term/communitymech:id ?env ;
              communitymech:name ?name .
    BIND("community" AS ?type)
  }
  UNION
  {
    # Media for peatland organisms
    ?resource a culturemech:CultureMedia ;
              culturemech:source_environment/culturemech:id ?env ;
              culturemech:name ?name .
    BIND("medium" AS ?type)
  }
  UNION
  {
    # Ingredients relevant to peatland
    ?resource a mediaingredientmech:MappedIngredient ;
              mediaingredientmech:environmental_context/mediaingredientmech:environment_term ?env ;
              mediaingredientmech:preferred_term ?name .
    BIND("ingredient" AS ?type)
  }
}
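To run the same profile for other environments, the ENVO ID can be validated before it is interpolated into the query text. The following sketch uses a hypothetical, abbreviated template that contains only the community pattern; the media and ingredient patterns would be parameterized the same way. As in the query above, the PREFIX declarations are omitted.

```python
import re

ENVO_ID = re.compile(r"ENVO:\d{8}")

# Hypothetical abbreviated template; doubled braces are literal SPARQL braces.
TEMPLATE = """SELECT ?community ?community_name
WHERE {{
  ?community a communitymech:MicrobialCommunity ;
             communitymech:environment_term/communitymech:id "{envo}" ;
             communitymech:name ?community_name .
}}"""


def build_profile_query(envo_id: str) -> str:
    """Validate the ENVO CURIE before interpolating it into SPARQL text."""
    if not ENVO_ID.fullmatch(envo_id):
        # Rejecting anything else keeps quotes or braces out of the literal.
        raise ValueError(f"not an ENVO ID: {envo_id!r}")
    return TEMPLATE.format(envo=envo_id)
```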

Use Case 4: add_community Skill Enhancement

Update the orchestration skill to automatically suggest cross-repo links:

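# In add_community skill workflow.
# culturemech_api, mediaingredient_api, calculate_relevance_score and
# format_ingredient_suggestions are placeholders for clients/helpers that do
# not exist yet, so they are passed in explicitly instead of used as globals.
from typing import Any, Callable, Iterable


def match_culturemech_media(
    community: Any,
    community_environment: str,
    culturemech_api: Any,
    calculate_relevance_score: Callable[[Any, Any], float],
) -> list[dict]:
    """Find CultureMech media whose source_environment matches the community's ENVO term."""
    # Query CultureMech for media annotated with the same environment
    media_results = culturemech_api.search(source_environment=community_environment)

    # Return one suggestion per matching medium
    return [
        {
            "culturemech_id": media.id,
            "name": media.name,
            "confidence": calculate_relevance_score(media, community),
            "evidence": f"Environment match: {community_environment}",
        }
        for media in media_results
    ]


# Similarly for ingredients
def match_mediaingredient_items(
    community_environment: str,
    mediaingredient_api: Any,
    format_ingredient_suggestions: Callable[[Iterable[Any]], list[dict]],
) -> list[dict]:
    """Find MediaIngredientMech ingredients whose environmental_context matches."""
    ingredient_results = mediaingredient_api.search(
        environmental_context=community_environment
    )
    return format_ingredient_suggestions(ingredient_results)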
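The skill code calls calculate_relevance_score without defining it. The following illustrative rule could back such a score: an exact ENVO match contributes 0.7, and overlap between the medium's target taxa and the community's taxa (Jaccard index) contributes up to 0.3. The weights, the taxon-set inputs, and the name relevance_score are all assumptions.

```python
def relevance_score(
    media_environment: str,
    media_taxa: set[str],
    community_environment: str,
    community_taxa: set[str],
    env_weight: float = 0.7,
) -> float:
    """Hypothetical relevance score in [0, 1] for a medium/community pair."""
    env_score = 1.0 if media_environment == community_environment else 0.0
    union = media_taxa | community_taxa
    # Jaccard index of the two taxon sets; 0 when both sets are empty.
    taxa_score = len(media_taxa & community_taxa) / len(union) if union else 0.0
    return env_weight * env_score + (1 - env_weight) * taxa_score
```

Because it weights the environment match most heavily, any medium from the same ENVO term ranks above media from other environments, and taxon overlap only reorders media within that group.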

Implementation Plan

Phase 1: Schema Review (Weeks 1-2)

  • Review current growth_media field implementation
  • Decide: enhance growth_media vs. add related_media/related_ingredients
  • Define validation rules for cross-repo IDs
  • Coordinate with CultureMech and MediaIngredientMech schema changes

Phase 2: Schema Enhancement (Weeks 3-4)

  • Update communitymech.yaml schema
  • Add CultureMech ID and MediaIngredientMech ID linking
  • Regenerate Python dataclasses
  • Update validation pipelines

Phase 3: Tooling Enhancement (Weeks 5-6)

  • Update add_community skill to query CultureMech by environment
  • Update add_community skill to query MediaIngredientMech by environment
  • Create auto-suggestion interface for media/ingredient linking
  • Add cross-repo validation (check IDs exist)
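The cross-repo validation step reduces to set membership once each repository's IDs are available locally. The following sketch assumes the community record is a dict shaped like the enhanced SPRUCE example and that the ID sets have already been loaded; how they are loaded is out of scope. The function name find_dangling_links is hypothetical.

```python
def find_dangling_links(record: dict, culturemech_ids: set[str], mim_ids: set[str]) -> list[str]:
    """List cross-repo IDs in a community record that the target repos do not contain."""
    dangling = []
    for entry in record.get("related_media", []):
        cm_id = entry.get("culturemech_id")
        if cm_id is not None and cm_id not in culturemech_ids:
            dangling.append(cm_id)
    for entry in record.get("related_ingredients", []):
        mim_id = entry.get("mediaingredientmech_id")
        if mim_id is not None and mim_id not in mim_ids:
            dangling.append(mim_id)
    return dangling
```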

Phase 4: Documentation (Week 7)

  • Document cross-repo linking patterns
  • Create tutorial for adding environment-linked communities
  • Add SPARQL query examples

Phase 5: Backfill (Ongoing)

  • For existing communities with environment terms, suggest media/ingredient links
  • Prioritize well-characterized environments (peatland, marine, gut, soil)
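The following sketch builds a backfill queue: communities that have an environment term but no media or ingredient links, ordered by the priority list above. Keying the priority on preferred_term strings is a simplifying assumption; real tooling would more likely map ENVO IDs.

```python
# Assumed priority order, taken from the bullet above; lower index = earlier.
PRIORITY = ["peatland", "marine", "gut", "soil"]


def backfill_queue(communities: list[dict]) -> list[str]:
    """Return IDs of unlinked communities with an environment term, highest priority first."""

    def rank(community: dict) -> int:
        term = community["environment_term"]["preferred_term"]
        return PRIORITY.index(term) if term in PRIORITY else len(PRIORITY)

    candidates = [
        c for c in communities
        if c.get("environment_term")
        and not c.get("related_media")
        and not c.get("related_ingredients")
    ]
    # sorted() is stable, so ties keep their original order.
    return [c["id"] for c in sorted(candidates, key=rank)]
```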

Example: SPRUCE Community Enhancement

Current State (Issue #24)

id: CommunityMech:000024
name: SPRUCE Peatland Warming Microbial Community
environment_term:
  preferred_term: peatland
  term:
    id: ENVO:00000044
    label: peatland

Enhanced State (After Implementation)

id: CommunityMech:000024
name: SPRUCE Peatland Warming Microbial Community

environment_term:
  preferred_term: peatland
  term:
    id: ENVO:00000044
    label: peatland

# Auto-discovered from CultureMech
related_media:
  - preferred_term: Acidic Peatland Medium
    culturemech_id: CultureMech:010001
    relationship_type: ENVIRONMENT_ANALOG
    evidence:
      - reference: PMID:38515239
        supports: SUPPORT
        evidence_source: IN_VIVO
        snippet: "Peat microbial communities characterized in situ"

  - preferred_term: Methanogen Enrichment Medium
    culturemech_id: CultureMech:010045
    relationship_type: CULTIVATION_MEDIUM
    evidence:
      - reference: PMID:34836550
        supports: SUPPORT
        evidence_source: IN_VIVO
        snippet: "Methanogenic archaea detected in anoxic peat"

# Auto-discovered from MediaIngredientMech  
related_ingredients:
  - preferred_term: Humic acid
    mediaingredientmech_id: MediaIngredientMech:000523
    relevance: "Major peat organic matter component, provides carbon source"
    
  - preferred_term: Sphagnum moss extract
    mediaingredientmech_id: MediaIngredientMech:001234
    relevance: "Extracted from dominant peatland plant species"
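The "Auto-discovered" comments imply that accepted suggestions are written back into the record without duplicating existing links. The following sketch treats the record as a dict, for example parsed from YAML, and deduplicates on culturemech_id. The function name merge_related_media is hypothetical.

```python
import copy


def merge_related_media(record: dict, suggestions: list[dict]) -> dict:
    """Return a copy of record with accepted suggestions appended to related_media,
    skipping any culturemech_id that is already linked."""
    merged = copy.deepcopy(record)  # leave the caller's record untouched
    existing = merged.setdefault("related_media", [])
    seen = {entry.get("culturemech_id") for entry in existing}
    for suggestion in suggestions:
        cm_id = suggestion.get("culturemech_id")
        if cm_id in seen:
            continue
        existing.append(suggestion)
        seen.add(cm_id)
    return merged
```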

Benefits

  1. Automated discovery: Find media/ingredients when adding communities
  2. Coverage tracking: Identify environments needing more resources
  3. Knowledge graph: Rich cross-repository linking
  4. User experience: Guided curation with suggestions
  5. FAIR data: Improved findability and interoperability

Success Metrics

  • Linking Coverage: % of communities with media/ingredient links
  • Query Success: Cross-repo queries return complete results
  • Curation Time: Time to add new community (should decrease)
  • Coverage Gaps: Identified environments needing resources
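The following sketch computes the Linking Coverage metric. It assumes community records are dicts and counts a community as linked if it has at least one related_media or related_ingredients entry. Whether growth_media links should also count depends on the Phase 1 decision.

```python
def linking_coverage(communities: list[dict]) -> float:
    """Percentage of communities with at least one media or ingredient link."""
    if not communities:
        return 0.0
    linked = sum(
        1 for c in communities if c.get("related_media") or c.get("related_ingredients")
    )
    return 100.0 * linked / len(communities)
```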

Related Issues

References