Merged
12 changes: 8 additions & 4 deletions .github/workflows/validate.yml
@@ -20,10 +20,12 @@ jobs:
 name: Detect changed tool directories
 run: |
 # Note: github.base_ref is only available on pull_request events
-# Find all tool directories that changed in this PR
+# Find all tool directories that changed in this PR, keeping only
+# those that still exist (deleted tool dirs must not be linted/tested).
 CHANGED=$(git diff --name-only origin/${{ github.base_ref }}...HEAD -- 'tools/' \
 | grep -oP 'tools/[^/]+/[^/]+/[^/]+/' \
 | sort -u \
+| while read -r dir; do [ -d "$dir" ] && echo "$dir"; done \
 | jq -R -s -c 'split("\n") | map(select(length > 0))')

 if [ "$CHANGED" = "[]" ] || [ -z "$CHANGED" ]; then
@@ -60,9 +62,11 @@ jobs:
 run: |
 DIRS='${{ needs.detect-changes.outputs.matrix }}'
 echo "$DIRS" | jq -r '.[]' | while read -r dir; do
-echo "::group::ruff $dir"
-/tmp/validate_venv/bin/python -m ruff check "$dir"
-echo "::endgroup::"
+if [ -d "$dir" ]; then
+echo "::group::ruff $dir"
+/tmp/validate_venv/bin/python -m ruff check "$dir"
+echo "::endgroup::"
+fi
 done

 - name: Test changed tools
13 changes: 5 additions & 8 deletions README.md
@@ -1,6 +1,6 @@
 # Agentomics

-A growing collection of **123 standalone CLI tools** built with [pyopenms](https://pyopenms.readthedocs.io/) for proteomics and metabolomics workflows. Every tool in this repository fills a gap not covered by existing OpenMS TOPP tools — small, focused utilities that researchers need daily but typically write as throwaway scripts.
+A growing collection of **118 standalone CLI tools** built with [pyopenms](https://pyopenms.readthedocs.io/) for proteomics and metabolomics workflows. Every tool in this repository fills a gap not covered by existing OpenMS TOPP tools — small, focused utilities that researchers need daily but typically write as throwaway scripts.

 ## Why This Exists

@@ -155,12 +155,11 @@ Both `ruff` and `pytest` must pass with zero errors.
 | [`fasta_in_silico_digest_stats`](tools/proteomics/fasta_utils/fasta_in_silico_digest_stats/) | Digest a FASTA and report peptide-level statistics |
 | [`fasta_taxonomy_splitter`](tools/proteomics/fasta_utils/fasta_taxonomy_splitter/) | Split multi-organism FASTA by taxonomy from headers |

-#### File Conversion (8 tools)
+#### File Conversion (7 tools)

 | Tool | Description |
 |------|-------------|
-| [`mzml_to_mgf_converter`](tools/proteomics/file_conversion/mzml_to_mgf_converter/) | Convert MS2 spectra from mzML to MGF format |
-| [`mgf_to_mzml_converter`](tools/proteomics/file_conversion/mgf_to_mzml_converter/) | Convert MGF files to mzML format |
+| [`mgf_mzml_converter`](tools/proteomics/file_conversion/mgf_mzml_converter/) | Bidirectional MGF ↔ mzML converter with spectrum filtering (merged from `mgf_to_mzml_converter` + `mzml_to_mgf_converter`) |
 | [`consensus_map_to_matrix`](tools/proteomics/file_conversion/consensus_map_to_matrix/) | Convert consensusXML to flat quantification matrix |
 | [`idxml_to_tsv_exporter`](tools/proteomics/file_conversion/idxml_to_tsv_exporter/) | Export idXML identification results to flat TSV |
 | [`ms_data_to_csv_exporter`](tools/proteomics/file_conversion/ms_data_to_csv_exporter/) | Export mzML/featureXML data to CSV with column selection |
@@ -281,15 +280,13 @@ Both `ruff` and `pytest` must pass with zero errors.
 | [`mass_defect_filter`](tools/metabolomics/feature_processing/mass_defect_filter/) | Filter features by mass defect and Kendrick mass defect |
 | [`metabolite_feature_detection`](tools/metabolomics/feature_processing/metabolite_feature_detection/) | Metabolite feature detection from LC-MS data |

-#### Spectral Analysis (6 tools)
+#### Spectral Analysis (4 tools)

 | Tool | Description |
 |------|-------------|
 | [`spectral_entropy_scorer`](tools/metabolomics/spectral_analysis/spectral_entropy_scorer/) | Compute spectral entropy similarity (Li & Fiehn 2021) |
 | [`neutral_loss_scanner`](tools/metabolomics/spectral_analysis/neutral_loss_scanner/) | Scan MS2 spectra for characteristic neutral losses |
-| [`isotope_pattern_scorer`](tools/metabolomics/spectral_analysis/isotope_pattern_scorer/) | Score observed vs. theoretical isotope patterns |
-| [`isotope_pattern_matcher`](tools/metabolomics/spectral_analysis/isotope_pattern_matcher/) | Generate theoretical isotope distributions and cosine similarity scoring |
-| [`isotope_pattern_fit_scorer`](tools/metabolomics/spectral_analysis/isotope_pattern_fit_scorer/) | Score isotope pattern fit, detect Cl/Br from M+2 enhancement |
+| [`isotope_pattern_analyzer`](tools/metabolomics/spectral_analysis/isotope_pattern_analyzer/) | Generate theoretical isotope distributions, cosine similarity scoring, Da/ppm tolerance, Cl/Br halogen detection (merged from `isotope_pattern_matcher` + `isotope_pattern_scorer` + `isotope_pattern_fit_scorer`) |
 | [`massql_query_tool`](tools/metabolomics/spectral_analysis/massql_query_tool/) | Query mzML data using MassQL-like syntax |

 #### Compound Annotation (4 tools)
@@ -0,0 +1,83 @@
# Isotope Pattern Analyzer

Generate theoretical isotope distributions for molecular formulas, score observed
isotope patterns using cosine similarity, and detect halogenation (Cl/Br).

This tool consolidates `isotope_pattern_matcher`, `isotope_pattern_scorer`, and
`isotope_pattern_fit_scorer` into a single, improved utility.

## Features

- Theoretical isotope pattern generation via pyopenms `CoarseIsotopePatternGenerator`
- Cosine similarity scoring between observed and theoretical patterns
- **Da or ppm m/z tolerance** — choose your preferred unit
- Halogen (Cl/Br) detection from M+2 peak enhancement
- JSON output with per-peak detail
- Terminal bar-chart preview of the theoretical distribution
- Optional numpy acceleration for cosine computation
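
The scoring features listed above can be illustrated with a minimal pure-Python sketch. It is not the tool's actual implementation: the function names (`mz_window`, `match_intensities`, `cosine_similarity`), the nearest-peak matching rule, and the zero-fill for unmatched theoretical peaks are all assumptions made for illustration. Only the Da/ppm choice and the use of cosine similarity come from this README.

```python
import math


def mz_window(theo_mz, tolerance, unit):
    """Convert a Da or ppm tolerance into an absolute m/z half-width."""
    if unit == "ppm":
        return theo_mz * tolerance * 1e-6
    if unit == "da":
        return tolerance
    raise ValueError(f"unknown tolerance unit: {unit!r}")


def match_intensities(observed, theoretical, tolerance=0.05, unit="da"):
    """Pair each theoretical peak with the nearest observed peak in tolerance.

    Both inputs are lists of (mz, intensity) tuples. A theoretical peak with
    no observed partner contributes an observed intensity of 0.0
    (an assumption of this sketch, not documented tool behaviour).
    """
    obs_vec, theo_vec = [], []
    for theo_mz, theo_int in theoretical:
        window = mz_window(theo_mz, tolerance, unit)
        candidates = [
            (abs(mz - theo_mz), intensity)
            for mz, intensity in observed
            if abs(mz - theo_mz) <= window
        ]
        # min() on (distance, intensity) tuples picks the closest peak.
        obs_vec.append(min(candidates)[1] if candidates else 0.0)
        theo_vec.append(theo_int)
    return obs_vec, theo_vec


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

A ppm window scales with m/z while a Da window is constant, so the same observed peak can match under one unit and miss under the other.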

## Installation

```bash
pip install pyopenms
```

## CLI Usage

```bash
# Generate and display the isotope pattern for glucose
python isotope_pattern_analyzer.py --formula C6H12O6

# Score observed peaks against the formula (colon-separated format)
python isotope_pattern_analyzer.py --formula C6H12O6 \
--observed "180.063:100,181.067:6.5,182.070:0.5" \
--output result.json

# Use legacy comma-separated format (one --peaks flag per peak)
python isotope_pattern_analyzer.py --formula C6H12O6 \
--peaks 180.063,100.0 --peaks 181.067,6.5 \
--output result.json

# Use ppm tolerance
python isotope_pattern_analyzer.py --formula C6H12O6 \
--observed "180.063:100,181.067:6.5" \
--tolerance 10 --tolerance-unit ppm

# Detect halogenation (chlorinated compound example)
python isotope_pattern_analyzer.py --formula C6H5Cl \
--observed "112.007:100,113.011:5.5,114.004:33.0" \
--output halogen_result.json
```

## Output JSON Structure

```json
{
  "formula": "C6H12O6",
  "cosine_similarity": 0.9987,
  "n_peaks_compared": 3,
  "tolerance": 0.05,
  "tolerance_unit": "da",
  "peaks": [
    {"peak_index": 0, "obs_mz": 180.063, "theo_mz": 180.0634, "obs_intensity": 100.0, "theo_intensity": 100.0},
    ...
  ],
  "theoretical_pattern": [...],
  "halogen_detection": {
    "m2_ratio_observed": 0.5,
    "m2_ratio_theoretical": 0.42,
    "m2_excess": 0.08,
    "halogen_flag": false,
    "possible_halogen": "none"
  }
}
```
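
As a rough sketch of how a downstream script might consume this output, the snippet below condenses a parsed result into one line. The `summarize` helper is hypothetical and not part of the tool. The inline sample repeats only the scalar fields documented above and leaves out the elided `peaks` and `theoretical_pattern` arrays.

```python
import json


def summarize(result):
    """Build a one-line summary from a parsed result dictionary."""
    halogen = result.get("halogen_detection", {})
    flag = "yes" if halogen.get("halogen_flag") else "no"
    return (
        f"{result['formula']}: cosine={result['cosine_similarity']:.4f} "
        f"over {result['n_peaks_compared']} peaks, halogen={flag} "
        f"({halogen.get('possible_halogen', 'n/a')})"
    )


# Inline sample using the documented field names; a real script would
# read the file written via --output instead.
SAMPLE = json.loads("""
{
  "formula": "C6H12O6",
  "cosine_similarity": 0.9987,
  "n_peaks_compared": 3,
  "halogen_detection": {"halogen_flag": false, "possible_halogen": "none"}
}
""")

print(summarize(SAMPLE))
# prints: C6H12O6: cosine=0.9987 over 3 peaks, halogen=no (none)
```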

## Halogen Detection Thresholds

| M+2 excess above theoretical | Interpretation |
|------------------------------|---------------------------------|
| < 10 % | No halogenation detected |
| 10–20 % | Cl (weak signal) |
| 20–70 % | Cl |
| > 70 % | Br |
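
The table can be read as a simple threshold ladder, sketched below under stated assumptions. The function name `classify_m2_excess` is hypothetical, and the input is assumed to be the M+2 excess already expressed in percent. The table does not say which row the shared boundaries belong to, so this sketch assigns exactly 10 % to "Cl (weak signal)", exactly 20 % to "Cl", and exactly 70 % to "Cl" (because the Br row reads "> 70 %"). The tool itself may resolve these edges differently.

```python
def classify_m2_excess(excess_percent):
    """Map an M+2 excess (in percent) to the interpretation in the table.

    Boundary handling is an assumption of this sketch:
    lower bounds are inclusive, and 70 % still counts as Cl.
    """
    if excess_percent < 10:
        return "No halogenation detected"
    if excess_percent < 20:
        return "Cl (weak signal)"
    if excess_percent <= 70:
        return "Cl"
    return "Br"
```

For example, `classify_m2_excess(33.0)` falls in the 20–70 % band and returns `"Cl"`.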