Conversation
Open BAM/SAM/CRAM without a FASTA: bases that don't need a reference decode as before, and `pileup()` reports `Base::Unknown` for the reference column. CRAM follows the writer's `RR` flag and per-slice `embedded_reference` to decide whether external reference bytes are needed; if a slice needs them and no FASTA was supplied, fetch returns `CramError::MissingReference` rather than silently producing N's. `Readers::open` now takes `impl Into<Option<&Path>>` for the FASTA argument, so `Readers::open(bam, &fa)`, `Readers::open(bam, Some(&fa))`, and `Readers::open(bam, None)` all work. Closes #46. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LLM critique7 findings: 0 showstoppers, 3 gaps, 0 inconsistencies, 1 underspecified, 3 suggestions. [Gap] [1] Multi-ref CRAM containers with The missing-reference gate in Multi-ref containers without embedded references are uncommon but not impossible — htslib can produce them for multi-contig regions where individual contigs are too small to fill a container. Scenario: Open a CRAM file written with Sources: [Gap] [2] FASTA fetch happens before In For most production files this is a minor performance issue rather than a correctness one — Scenario: A CRAM written with Sources: [Gap] [3] Working directory has uncommitted WIP that breaks compilation
Scenario: A reviewer checks out the branch and runs Sources: unstaged diff in [Underspecified] [4] Multi-ref absence-of-reference decision is deferred to per-record decode where it can't be checked The gate at slice.rs:231-239 makes a slice-level decision (error or proceed) based on You might want to decide now whether to (a) add a per-record gate in Sources: [Suggestion] [5] The The parameter name describes an external state (was FASTA provided at open?) rather than what the function needs to decide (can we reconstruct bases?). Since the function also knows This is a naming suggestion only — the logic is correct (aside from gap [1]). Sources: [Suggestion] [6]
Sources: [Suggestion] [7] The The What looks good
|
Quick exploration for #46.
This PR implements:
pileup()reportsBase::Unknownfor the reference column.RRflag and per-sliceembedded_referenceto decide whether external reference bytes are needed; if a slice needs them and no FASTA was supplied, fetch returnsCramError::MissingReferencerather than silently producing N's.Not sure I like this as it might be surprising to the user, but it is kinda what htslib does. Alternative would be to have type-state for
Readers<T>withT: NoReference | WithReferenceand only supportpileupson theWithReferencevariant. CRAM throwing errors when reference is missing is very unfortunate tho.