Add metpo-proposal skill #75
Merged
5 commits
d7d4c45 Add METPO ROBOT-template proposal lifting CommunityMech enums (realmarcin)
554c52e Add metpo-proposal skill (realmarcin)
3a89bff Extend metpo-proposal skill with update-path guidance (realmarcin)
76176bc Address Copilot review on PR #75: replace kg-microbe relative links (realmarcin)
5ce70f9 Merge remote-tracking branch 'origin/main' into claude/metpo-proposal… (realmarcin)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,391 @@ | ||
| --- | ||
| name: metpo-proposal | ||
| description: Generate a ROBOT-template METPO proposal that lifts CommunityMech LinkML enums, classes, and slots into METPO classes and predicates with a curated hierarchy | ||
| category: workflow | ||
| requires_database: false | ||
| requires_internet: false | ||
| version: 1.0.0 | ||
| tags: [metpo, ontology, robot, linkml, proposal, schema-lift, kg-microbe] | ||
| --- | ||
|
|
||
| # METPO Proposal Skill | ||
|
|
||
| ## Overview | ||
|
|
||
| Turn a slice of the CommunityMech LinkML schema (one or more `*Enum` | ||
| permissible-value sets, plus the slots that connect them) into a METPO | ||
| ROBOT-template proposal that can be merged with the existing kg-microbe | ||
| METPO proposal pipeline. | ||
|
|
||
| Three artifacts are produced under `proposals/<cohort-name>/`: | ||
|
|
||
| | File | Format | | ||
| |---|---| | ||
| | `metpo_proposal_classes_robot.tsv` | 11-column ROBOT template (mirrors kg-microbe convention) | | ||
| | `metpo_proposal_properties_robot.tsv` | 12-column ROBOT template | | ||
| | `proposal.md` | Reviewer narrative: scope, hierarchy decisions, predicate rationale, verification, upstream path | | ||
|
|
||
| **Run from the `CommunityMech/CommunityMech/` directory.** Reference example: | ||
| [PR #74](https://github.com/CultureBotAI/CommunityMech/pull/74) and the files | ||
| under `proposals/metpo_communitymech_v1/`. | ||
|
|
||
| --- | ||
|
|
||
| ## When to use this skill | ||
|
|
||
| - A new CommunityMech enum (or set of enums) is added to | ||
| `src/communitymech/schema/communitymech.yaml` and should be exposed to | ||
| KG-Microbe consumers as METPO classes. | ||
| - An existing METPO proposal cohort needs an additional batch of community | ||
| scope (e.g., lifting `AtmosphereEnum` after a previous round shipped the | ||
| 9 community-shaped enums). | ||
| - A reviewer asks for "the METPO version" of a CommunityMech slot | ||
| (e.g., `EcologicalInteraction.source_taxon` → `has source taxon` object | ||
| property). | ||
|
|
||
| Do NOT use this skill for lifting PATO/GO/CHEBI cross-references (those already | ||
| exist upstream) or for lifting tolerance ranges (use the upstream | ||
| [kg-microbe metpo-proposal skill](https://github.com/Knowledge-Graph-Hub/kg-microbe/blob/main/.claude/skills/metpo-proposal/SKILL.md), | ||
| which handles the paired positive/negative predicate convention). | ||
|
|
||
| --- | ||
|
|
||
| ## Required reading | ||
|
|
||
| Before generating a proposal, read: | ||
|
|
||
| 1. **[kg-microbe/.claude/skills/metpo-proposal/SKILL.md](https://github.com/Knowledge-Graph-Hub/kg-microbe/blob/main/.claude/skills/metpo-proposal/SKILL.md)** | ||
| (local clone path: `~/Documents/VIMSS/ontology/KG-Hub/KG-Microbe/kg-microbe/.claude/skills/metpo-proposal/SKILL.md`) — | ||
| the upstream metpo-proposal skill. Defines: | ||
| - Aristotelian definition style | ||
| - `definition_source` citation forms (PMID, DOI, BacDive, `TODO:add_citation`) | ||
| - Numeric ID-range conventions | ||
| - Parent-class selection (audit for siblings before falling back to | ||
| `METPO:1000000`) | ||
| - The 12-point pre-submission checklist | ||
| 2. **[kg-microbe/mappings/metpo_proposal_classes_robot.tsv](https://github.com/Knowledge-Graph-Hub/kg-microbe/blob/main/mappings/metpo_proposal_classes_robot.tsv)** — | ||
| the canonical 11-column class template. Copy the two-row header verbatim. | ||
| 3. **[kg-microbe/mappings/metpo_proposal_properties_robot.tsv](https://github.com/Knowledge-Graph-Hub/kg-microbe/blob/main/mappings/metpo_proposal_properties_robot.tsv)** — | ||
| the canonical 12-column property template. | ||
| 4. **`proposals/metpo_communitymech_v1/`** in this repo — the reference | ||
| example. Read all three files end-to-end before writing a new cohort. | ||
|
|
||
| --- | ||
|
|
||
| ## ROBOT template column conventions | ||
|
|
||
| These are the only column structures that will parse cleanly. Tab-separated, | ||
| two header rows. | ||
|
|
||
| **Classes** (11 columns): | ||
|
|
||
| ``` | ||
| proposed_id<TAB>label<TAB>definition<TAB>definition_source<TAB>parent<TAB>synonyms<TAB>xrefs<TAB>subset<TAB>priority<TAB>observations<TAB>traits_addressed | ||
| ID<TAB>LABEL<TAB>A IAO:0000115<TAB>>A IAO:0000119<TAB>SC %<TAB>A oboInOwl:hasExactSynonym SPLIT=|<TAB>A oboInOwl:hasDbXref SPLIT=|<TAB>A oboInOwl:inSubset<TAB><TAB><TAB> | ||
| ``` | ||
|
|
||
| **Properties** (12 columns): | ||
|
|
||
| ``` | ||
| proposed_id<TAB>label<TAB>definition<TAB>definition_source<TAB>type<TAB>domain<TAB>range<TAB>xrefs<TAB>subset<TAB>priority<TAB>traits_addressed<TAB>observations | ||
| ID<TAB>LABEL<TAB>A IAO:0000115<TAB>>A IAO:0000119<TAB>TYPE<TAB>DOMAIN<TAB>RANGE<TAB>A oboInOwl:hasDbXref SPLIT=|<TAB>A oboInOwl:inSubset<TAB><TAB><TAB> | ||
| ``` | ||
|
|
||
| The **second row (ROBOT header) must have trailing tabs to reach the full | ||
| column count**, even when the trailing columns are blank. Validate with: | ||
|
|
||
| ```bash | ||
| awk -F'\t' 'NF != 11 {print NR": "NF" cols"}' proposals/<cohort>/metpo_proposal_classes_robot.tsv | ||
| awk -F'\t' 'NF != 12 {print NR": "NF" cols"}' proposals/<cohort>/metpo_proposal_properties_robot.tsv | ||
| ``` | ||
|
|
||
| Both commands should print nothing. | ||
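When the header rows are generated rather than typed by hand, the trailing-tab requirement can be met by padding each row to the template width. The following is a minimal sketch under that assumption: `pad_row` is a hypothetical helper (not part of the kg-microbe tooling), and the column lists are copied from the classes template above.

```python
# Column names (row 1) and ROBOT directives (row 2) for the classes template,
# copied from the 11-column convention above.
CLASS_COLUMNS = [
    "proposed_id", "label", "definition", "definition_source", "parent",
    "synonyms", "xrefs", "subset", "priority", "observations", "traits_addressed",
]
CLASS_DIRECTIVES = [
    "ID", "LABEL", "A IAO:0000115", ">A IAO:0000119", "SC %",
    "A oboInOwl:hasExactSynonym SPLIT=|", "A oboInOwl:hasDbXref SPLIT=|",
    "A oboInOwl:inSubset",
]


def pad_row(fields, width):
    """Hypothetical helper: join fields with tabs, padding with empty cells to `width`."""
    if len(fields) > width:
        raise ValueError(f"{len(fields)} fields exceed the {width}-column template")
    return "\t".join(list(fields) + [""] * (width - len(fields)))


header_row = pad_row(CLASS_COLUMNS, 11)
robot_row = pad_row(CLASS_DIRECTIVES, 11)
```

The same helper would apply to the properties template with a width of 12.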
|
|
||
| --- | ||
|
|
||
| ## ID-space conventions | ||
|
|
||
| | Range | Use | | ||
| |---|---| | ||
| | `METPO:1000000` | METPO root (use as `SC %` parent only when no closer parent exists) | | ||
| | `METPO:1000525` | "microbe" — use as `DOMAIN` for predicates whose subject is a microbial taxon | | ||
| | `METPO:1007NNN` | **Placeholder** range for new class proposals from KG-Microbe / CommunityMech (reserved per the SKILL.md placeholder policy). Real METPO IDs are minted upstream after sign-off. | | ||
| | `METPO:2007NNN` | **Placeholder** range for new predicate proposals from this pipeline. | | ||
|
|
||
| Within `METPO:1007NNN` and `METPO:2007NNN`, allocate contiguous numeric blocks | ||
| per enum so the file scans easily. The v1 proposal uses: | ||
|
|
||
| - `1007100`–`1007102` — top-level domain classes | ||
| - `1007103`–`1007110` — `FunctionalRoleEnum` | ||
| - `1007120`–`1007132` — `InteractionTypeEnum` | ||
| - `1007140`–`1007142` — `InteractionScopeEnum` | ||
| - `1007150`–`1007155` — `EvidenceItemSupportEnum` | ||
| - `1007160`–`1007165` — `EvidenceSourceEnum` | ||
| - `1007170`–`1007174` — `AbundanceEnum` | ||
| - `1007180`–`1007184` — `EcologicalStateEnum` | ||
| - `1007190`–`1007193` — `CommunityOriginEnum` | ||
| - `1007200`–`1007220` — `CommunityCategoryEnum` | ||
|
|
||
| Future cohorts should pick a fresh block starting at `1007300` or higher to | ||
| avoid collisions; document the new block in `proposal.md`. | ||
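A candidate block can be checked against the blocks already in use before any rows are written. The sketch below is illustrative only: `overlapping_blocks` is a hypothetical helper, and `V1_BLOCKS` simply restates the v1 ranges listed above as inclusive integer pairs.

```python
# Inclusive ID blocks already used by the v1 cohort (numeric part of METPO:1007NNN),
# copied from the list above.
V1_BLOCKS = [
    (1007100, 1007102), (1007103, 1007110), (1007120, 1007132),
    (1007140, 1007142), (1007150, 1007155), (1007160, 1007165),
    (1007170, 1007174), (1007180, 1007184), (1007190, 1007193),
    (1007200, 1007220),
]


def overlapping_blocks(candidate, used_blocks):
    """Return the used blocks that share at least one ID with `candidate` (inclusive ranges)."""
    start, end = candidate
    if start > end:
        raise ValueError("block start must not exceed block end")
    return [(lo, hi) for lo, hi in used_blocks if start <= hi and lo <= end]
```

An empty result means the candidate block is free; any returned pairs name the blocks it would collide with.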
|
|
||
| --- | ||
|
|
||
| ## Subset tag | ||
|
|
||
| Every row must carry the same `oboInOwl:inSubset` value in column 8 (classes) | ||
| or column 9 (properties). Format: `metpo_<repo-id>_<YYYY>_<MM>`. | ||
|
|
||
| Examples: | ||
| - `metpo_communitymech_2026_05` — initial CommunityMech cohort (v1) | ||
| - `metpo_kgmicrobe_2026_04` — existing kg-microbe cohort | ||
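The tag format and the "same value on every row" rule can both be checked mechanically. This is a sketch under stated assumptions: the regular expression is inferred from the two examples above (the characters allowed in `<repo-id>` are a guess), and both function names are hypothetical.

```python
import re

# Assumed pattern for metpo_<repo-id>_<YYYY>_<MM>; the allowed characters in
# <repo-id> are a guess based on the two examples above.
SUBSET_TAG_RE = re.compile(r"^metpo_[a-z0-9]+_\d{4}_(0[1-9]|1[0-2])$")


def make_subset_tag(repo_id, year, month):
    """Build a subset tag and check it against the assumed pattern."""
    tag = f"metpo_{repo_id}_{year:04d}_{month:02d}"
    if not SUBSET_TAG_RE.match(tag):
        raise ValueError(f"malformed subset tag: {tag}")
    return tag


def rows_with_wrong_subset(rows, subset_column, expected_tag):
    """Return 1-based indexes of data rows whose subset cell differs from `expected_tag`.

    `rows` are lists of cells for data rows only (both header rows excluded);
    `subset_column` is 1-based: 8 for classes, 9 for properties.
    """
    return [i for i, row in enumerate(rows, start=1)
            if row[subset_column - 1] != expected_tag]
```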
|
|
||
| --- | ||
|
|
||
| ## Workflow | ||
|
|
||
| ### 1. Pick the scope | ||
|
|
||
| For each LinkML enum or class you intend to lift, write a one-line statement | ||
| of why it belongs in METPO. If the answer is "because KG-Microbe consumers | ||
| need to filter by it", proceed; if it's "because it's in the schema", | ||
| reconsider — METPO is a curated ontology, not a schema dump. | ||
|
|
||
| Skip enums that are pure machine identifiers (e.g., `DatasetTypeEnum`, | ||
| `CultureCollectionEnum`); those belong in NMDC / re3data, not METPO. | ||
|
|
||
| ### 2. Design the hierarchy | ||
|
|
||
| For each enum: | ||
|
|
||
| - **Pick a parent class** (a `METPO:1007NNN` enum-parent; create one if | ||
| none exists). Definition source: the schema enum's own `description:` field. | ||
| Cite as `CommunityMech:communitymech.yaml#<EnumName>`. | ||
| - **Decide on intermediate parents**: | ||
| - Use them when the enum description or comments group values by valence | ||
| (e.g., InteractionTypeEnum's `+/+, -/-` annotations → positive/negative | ||
| parents). | ||
| - Use them when the schema has explicit `is_a` hints in comments | ||
| (e.g., `STRAIN_COMPETITION` ⊂ `COMPETITION`). | ||
| - Use them when there is an obvious ecological / functional grouping (e.g., | ||
| `CommunityCategoryEnum` clusters into metal-processing, host-associated, | ||
| etc.). | ||
| - Otherwise leave the enum flat. | ||
| - **Lift each permissible value as a leaf class**. The leaf's | ||
| `definition_source` is `CommunityMech:communitymech.yaml#<EnumName>.<VALUE>`. | ||
| The leaf's `parent` (`SC %`) is the appropriate intermediate parent or the | ||
| enum-parent. | ||
|
|
||
| For each top-level CommunityMech class (e.g., `MicrobialCommunity`, | ||
| `EcologicalInteraction`, `EvidenceItem`) that is referenced as a predicate | ||
| domain or range, declare a **top-level domain class** under `METPO:1000000`. | ||
| These get IDs like `1007100`–`1007102` in the v1 cohort. | ||
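To make the enum-parent and leaf conventions concrete, the sketch below builds one row of each kind. Everything specific in it is illustrative: `enum_source` and `class_row` are hypothetical helpers, and the IDs, labels, definitions, and the root parent are placeholders rather than rows copied from the v1 cohort.

```python
def enum_source(enum_name, value=None):
    """definition_source for an enum-parent (no value) or a leaf (with value)."""
    source = f"CommunityMech:communitymech.yaml#{enum_name}"
    return f"{source}.{value}" if value else source


def class_row(proposed_id, label, definition, source, parent, subset):
    """Hypothetical helper: one 11-column classes row; unused cells stay empty."""
    cells = [proposed_id, label, definition, source, parent, "", "", subset, "", "", ""]
    return "\t".join(cells)


# Illustrative values only: the IDs, labels, and definitions below are placeholders,
# not rows copied from the v1 cohort.
SUBSET = "metpo_communitymech_2026_05"
parent_row = class_row("METPO:1007103", "functional role", "<enum description from schema>",
                       enum_source("FunctionalRoleEnum"), "METPO:1000000", SUBSET)
leaf_row = class_row("METPO:1007104", "primary degrader", "<value description from schema>",
                     enum_source("FunctionalRoleEnum", "PRIMARY_DEGRADER"),
                     "METPO:1007103", SUBSET)
```

The leaf's `parent` cell points at the enum-parent's `proposed_id`; with intermediate parents, it would point at the intermediate row instead.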
|
|
||
| ### 3. Design the predicates | ||
|
|
||
| For each LinkML slot that: | ||
| - has a non-primitive range (i.e., references another class or enum), AND | ||
| - is exercised by at least one community YAML in `kb/communities/`, | ||
|
|
||
| write one object-property row. Domain and range MUST each resolve to one of: | ||
| - another row in the same proposal (e.g., `METPO:1007101` for community | ||
| interaction), or | ||
| - an existing METPO IRI (`METPO:1000525` for microbe), or | ||
| - an external IRI (`NCBITaxon:1`, `CHEBI:24431`, `GO:0008150`). | ||
|
|
||
| Skip slots that just hold metadata (e.g., `community_id`, `created_at`). | ||
|
|
||
| Use the `does not X` paired-negative convention only when the predicate | ||
| relates a microbe to a chemical/physical capability (per the kg-microbe | ||
| SKILL.md). Domain-modeling slots (community ↔ taxon, interaction ↔ scope) | ||
| have no meaningful negative form; do not pair them. | ||
|
|
||
| ### 4. Write the TSVs | ||
|
|
||
| Write `metpo_proposal_classes_robot.tsv` and | ||
| `metpo_proposal_properties_robot.tsv` directly with the `Write` tool. Build | ||
| each row as a literal tab-separated string. After writing, fix the ROBOT | ||
| header row's trailing tabs: | ||
|
|
||
| ```bash | ||
| python3 -c " | ||
| path = 'proposals/<cohort>/metpo_proposal_classes_robot.tsv' | ||
| ncols = 11  # use 12 for metpo_proposal_properties_robot.tsv | ||
| lines = open(path).readlines() | ||
| header = lines[1].rstrip('\n') | ||
| # Pad row 2 with trailing tabs up to ncols; a full-width header is left unchanged | ||
| lines[1] = header + '\t' * max(0, ncols - 1 - header.count('\t')) + '\n' | ||
| with open(path, 'w') as f: | ||
|     f.writelines(lines) | ||
| " | ||
| ``` | ||
|
|
||
| ### 5. Verify | ||
|
|
||
| ```bash | ||
| # Column-count sanity (must print nothing) | ||
| awk -F'\t' 'NF != 11 {print NR": "NF" cols"}' proposals/<cohort>/metpo_proposal_classes_robot.tsv | ||
| awk -F'\t' 'NF != 12 {print NR": "NF" cols"}' proposals/<cohort>/metpo_proposal_properties_robot.tsv | ||
|
|
||
| # Enum coverage — every CommunityMech permissible value should appear as a leaf row | ||
| # Use the per-enum-value definition_source markers (e.g., "FunctionalRoleEnum.PRIMARY_DEGRADER") | ||
| python3 <<'EOF' | ||
| import re | ||
| schema = open('src/communitymech/schema/communitymech.yaml').read() | ||
| target_enums = ['FunctionalRoleEnum', 'InteractionTypeEnum', 'EvidenceItemSupportEnum'] | ||
| # ... see proposals/metpo_communitymech_v1/ for the canonical coverage check | ||
| EOF | ||
|
|
||
| # Parent integrity — every SC % parent must resolve in-file or to a known METPO IRI | ||
| # See proposals/metpo_communitymech_v1/ for the canonical integrity check | ||
| ``` | ||
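The coverage and parent-integrity checks referenced in the comments above can be approximated as follows. This is a sketch, not the canonical check from `proposals/metpo_communitymech_v1/`: both function names are hypothetical, rows are assumed to be already split into cells, and coverage is assumed to mean an exact match on the `definition_source` cell.

```python
def missing_enum_values(enum_values, class_rows):
    """Return "<EnumName>.<VALUE>" markers that no row cites in definition_source (column 4)."""
    cited = {row[3] for row in class_rows}
    prefix = "CommunityMech:communitymech.yaml#"
    return [f"{enum}.{value}"
            for enum, values in enum_values.items()
            for value in values
            if f"{prefix}{enum}.{value}" not in cited]


def unresolved_parents(class_rows, known_iris):
    """Return (proposed_id, parent) pairs whose parent is neither in-file nor in `known_iris`."""
    in_file = {row[0] for row in class_rows}
    return [(row[0], row[4]) for row in class_rows
            if row[4] not in in_file and row[4] not in known_iris]
```

Both functions should return an empty list for a clean cohort. `enum_values` maps each target enum name to its permissible values, and `known_iris` would hold existing METPO identifiers such as `METPO:1000000`.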
|
|
||
| If you have ROBOT installed locally, also run: | ||
|
|
||
| ```bash | ||
| robot template --template proposals/<cohort>/metpo_proposal_classes_robot.tsv \ | ||
| --output /tmp/classes.owl | ||
| robot template --template proposals/<cohort>/metpo_proposal_properties_robot.tsv \ | ||
| --output /tmp/properties.owl | ||
| robot merge --input metpo-edit.owl --input /tmp/classes.owl --input /tmp/properties.owl \ | ||
| --output /tmp/merged.owl | ||
| robot reason --reasoner ELK --input /tmp/merged.owl --output /tmp/reasoned.owl | ||
| ``` | ||
|
|
||
| ROBOT must parse both TSVs and ELK must reason without unsatisfiable classes. | ||
|
|
||
| ### 6. Write the narrative | ||
|
|
||
| `proposal.md` is the reviewer's entry point. Required sections: | ||
|
|
||
| | Section | Content | | ||
| |---|---| | ||
| | **Context** | Why this cohort exists; what gap it fills in METPO | | ||
| | **Scope** | Table of enums lifted × parent class × leaf count | | ||
| | **Hierarchy decisions** | Per intermediate-parent: why you grouped these enum values together (cite schema comments where possible) | | ||
| | **Predicate proposals** | Table of property rows × domain × range × source slot | | ||
| | **ID space and subset** | The blocks you allocated and the subset tag | | ||
| | **Files** | Three-row table listing each artifact and row count | | ||
| | **Verification** | ROBOT commands plus the column-count and coverage checks | | ||
| | **Upstream path** | What happens after CommunityMech sign-off (typically: copy TSVs to `kg-microbe/mappings/`, run SKILL.md checklist, mint real METPO IDs) | | ||
| | **Change log** | Version + date + headline change | | ||
|
|
||
| ### 7. Open a PR | ||
|
|
||
| Standard CommunityMech workflow: | ||
|
|
||
| ```bash | ||
| git checkout -b claude/metpo-<cohort>-proposal | ||
| git add proposals/<cohort>/ | ||
| git commit -m "Add METPO ROBOT-template proposal: <cohort>" | ||
| git push -u origin claude/metpo-<cohort>-proposal | ||
| gh pr create --title "METPO ROBOT-template proposal: <cohort>" | ||
| gh api repos/CultureBotAI/CommunityMech/pulls/<n>/requested_reviewers -X POST -f "reviewers[]=Copilot" | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Updating an existing proposal | ||
|
|
||
| The workflow above produces a **new** cohort. When work on an existing cohort | ||
| isn't finished, pick one of three update paths instead of starting fresh. | ||
|
|
||
| ### Decision rule | ||
|
|
||
| | Situation | Path | Why | | ||
| |---|---|---| | ||
| | Reviewer (human or Copilot) asks for changes on an open PR before merge | **Edit in place** | The IDs and subset tag are still proposal-stage; no downstream consumer has pinned them yet. | | ||
| | Cohort is merged on main and you want to add more enum lifts (e.g., `AtmosphereEnum` after the original 9 enums) | **Extend in place** | Append-only is non-breaking; reviewers can diff the new rows against the same cohort. | | ||
| | Cohort is merged and the underlying schema changed in a way that invalidates existing rows (e.g., an enum value was renamed or removed, or a definition was sharpened in a way that breaks substring lifts) | **New cohort version** | A breaking semantic change needs a fresh subset tag and ID block so consumers can pin to the old or new version. | | ||
| | Real METPO IDs have been minted upstream and a row has been re-IDed | **New cohort version** | Mutating an already-minted ID is hostile to downstream consumers. | | ||
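The table can be read as a short decision function. The sketch below is one possible encoding, assuming the three boolean inputs capture the situations listed; `choose_update_path` and its parameter names are hypothetical.

```python
def choose_update_path(merged, breaking_schema_change=False, minted_id_changed=False):
    """Map the situations in the decision table to path "A", "B", or "C"."""
    if minted_id_changed:
        # A minted ID was re-IDed: always a new cohort version.
        return "C"
    if not merged:
        # Open PR, IDs still proposal-stage: edit in place.
        return "A"
    if breaking_schema_change:
        # Merged cohort invalidated by a schema change: new cohort version.
        return "C"
    # Merged cohort, append-only additions: extend in place.
    return "B"
```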
|
|
||
| ### Path A — Edit in place (open PR) | ||
|
|
||
| Modify rows directly in | ||
| `proposals/<cohort>/metpo_proposal_*.tsv`. Keep the original | ||
| `proposed_id`, `subset`, and column structure. Update | ||
| `definition`, `definition_source`, `parent`, `synonyms`, or | ||
| `label` in place. Re-run the verification steps (column-count, | ||
| enum-coverage, parent-integrity). Commit on the same branch and | ||
| let the Copilot review thread close. | ||
|
|
||
| If the edit is to the narrative only (no TSV change), edit | ||
| `proposal.md` and add a one-line note under the Change log | ||
| section: `- v1, 2026-05 (revised <date>): <one-line summary>`. | ||
|
|
||
| ### Path B — Extend in place (merged cohort) | ||
|
|
||
| Add new rows to the **same** `metpo_proposal_*.tsv` files in the | ||
| **same** cohort directory. Append-only — never reorder or | ||
| renumber existing rows. Conventions: | ||
|
|
||
| - New rows use a **contiguous fresh block** within the same | ||
| `METPO:1007NNN` / `METPO:2007NNN` range. Pick a block at least | ||
| 10 above the highest existing ID to leave room for v1 patches. | ||
| Document the new block in `proposal.md` under the ID space | ||
| section. | ||
| - Same `subset` tag as the rest of the cohort. The tag identifies | ||
| the cohort, not the round of additions. | ||
| - New `parent` references may point at existing rows in the cohort | ||
| (this is the main reason to extend rather than re-version). | ||
| - Update `proposal.md`: | ||
| - Add the new enum(s) to the Scope table. | ||
| - Add the new ID block to the ID space section. | ||
| - Append a Change log entry: `- v1.N, <date>: extended with | ||
| <enum names> (+<row count> classes / +<row count> | ||
| properties)`. | ||
|
|
||
| Open a new PR titled `Extend METPO proposal <cohort> with | ||
| <enum-names>`. Re-request Copilot review. | ||
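The "at least 10 above the highest existing ID" rule can be computed from the IDs already in the cohort. A minimal sketch, assuming IDs are written as `METPO:<number>` CURIEs and that class and predicate ranges are handled in separate calls; `next_block_start` is a hypothetical helper.

```python
def next_block_start(existing_ids, gap=10):
    """Smallest start for a new block that sits at least `gap` above the highest existing ID.

    `existing_ids` are CURIEs such as "METPO:1007220"; all must share one placeholder
    range (class IDs and predicate IDs are checked separately).
    """
    numbers = [int(curie.split(":", 1)[1]) for curie in existing_ids]
    if not numbers:
        raise ValueError("no existing IDs to extend from")
    return f"METPO:{max(numbers) + gap}"
```

The result is the lowest start that satisfies the rule; a higher, rounder start is equally valid as long as it is documented in `proposal.md`.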
|
|
||
| ### Path C — New cohort version | ||
|
|
||
| Create a fresh directory | ||
| `proposals/<base-name>_v<N>/` (e.g., `metpo_communitymech_v2/`) | ||
| with: | ||
|
|
||
| - Fresh `subset` tag: bump the year-month suffix | ||
| (`metpo_communitymech_2026_05` → `metpo_communitymech_2026_07`). | ||
| - Fresh **ID block** in the `1007NNN` / `2007NNN` placeholder | ||
| range, **not overlapping with v1**. Document the new block in | ||
| v2's `proposal.md`. | ||
| - v2 `proposal.md` must include a `## Relationship to v1` section | ||
| that names every v1 ID retired or redefined, with the reason. | ||
| - v1 stays on disk read-only. Add a top-of-file note to v1's | ||
| `proposal.md`: `> Superseded by | ||
| proposals/<base-name>_v<N>/proposal.md (<date>). v1 is retained | ||
| for traceability.` | ||
|
|
||
| Use this path sparingly — the kg-microbe pipeline assumes one | ||
| active cohort per source repository at a time, and re-versioning | ||
| forces downstream consumers to switch their queries. | ||
|
|
||
| ### What not to do | ||
|
|
||
| - Do **not** edit IDs after they've been merged on main (use | ||
| Path B or C instead). | ||
| - Do **not** mix Path B (extend) and Path C (re-version) in the | ||
| same PR — reviewers can't tell what's append-only vs. breaking. | ||
| - Do **not** delete rows in Path B. If a row is obsolete, mark it | ||
| by setting `priority` to `LOW` and adding an observation note; | ||
| let Path C retire it cleanly. | ||
|
|
||
| --- | ||
|
|
||
| ## Common pitfalls | ||
|
|
||
| | Symptom | Cause | Fix | | ||
| |---|---|---| | ||
| | `awk` reports row 2 has 8/9 columns | Trailing tabs missing on ROBOT header | Append `\t\t\t` to row 2 (see step 4) | | ||
| | ROBOT error "subject of axiom is not a class" | Property row referencing a class IRI in the `RANGE` column when ROBOT expects a class declaration | Declare the range class as its own row in the classes TSV first | | ||
| | ELK reports unsatisfiable class | Intermediate parent created with conflicting `SC %` axioms | Inspect the parent chain — usually a copy-paste error in the `parent` column | | ||
| | Copilot flags "schema lifted incorrectly" | The leaf's definition doesn't match the schema enum's description verbatim | Copy the schema description verbatim into the `definition` column; reword in the proposal narrative if you want a different phrasing | | ||
| | Reviewer asks for an existing METPO ID | The lifted concept already exists in METPO under a different label | Use the existing IRI; record the alias in `mappings/metpo_existing_aliases.tsv` upstream when copying to kg-microbe | | ||
|
|
||
| --- | ||
|
|
||
| ## Canonical example | ||
|
|
||
| The v1 cohort `proposals/metpo_communitymech_v1/` lifts 9 CommunityMech enums | ||
| (74 leaf + parent classes) and 14 predicates. Inspect that directory before | ||
| writing a new cohort — every convention in this skill was instantiated there. | ||
|
|
||
| Coverage check from v1: **52/52 enum permissible values** mapped to leaf class | ||
| rows; **74/74 `SC %` parent references** resolved. PR #74 in the CommunityMech | ||
| repo has the full reviewer trace. | ||