Warning
Seqsum was rewritten in Rust in 0.3.0. The original Python version of seqsum, together with its usage instructions, is archived in the python branch and remains available on PyPI.
Robust checksums for nucleotide sequences. Accepts one or more fast[a|q][.gz|.zst] files or standard input. Generates an aggregate checksum for each input file by default, similar to md5sum/sha256sum. Warnings are shown for duplicate sequences and within-collection checksum collisions at the selected bit depth. Sequences are uppercased before hashing with RapidHash and may be normalised (with -n) to use only ACGTN-. Read IDs and FASTQ base quality scores do not inform the checksum. Output is tab-delimited text to stdout.
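The preprocessing described above can be sketched in a few lines of Rust. This is an illustration under stated assumptions, not seqsum's actual code: the function names `prepare` and `checksum` are hypothetical, `-n` is assumed to replace every character outside ACGTN- with N, and the standard library's `DefaultHasher` stands in for RapidHash so that the example needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Uppercase a sequence. If `normalise` is set, replace any byte outside
/// ACGTN- with N (ASSUMED behaviour of -n; not confirmed by the README).
fn prepare(seq: &[u8], normalise: bool) -> Vec<u8> {
    seq.iter()
        .map(|b| b.to_ascii_uppercase())
        .map(|b| match b {
            b'A' | b'C' | b'G' | b'T' | b'N' | b'-' => b,
            _ if normalise => b'N',
            _ => b,
        })
        .collect()
}

/// 64-bit hash of the prepared sequence bytes only: no read ID, no qualities.
/// DefaultHasher is a STAND-IN here; seqsum itself uses RapidHash, so the
/// values produced by this sketch will not match seqsum's output.
fn checksum(seq: &[u8], normalise: bool) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(&prepare(seq, normalise));
    hasher.finish()
}

fn main() {
    // Case differences do not change the checksum.
    assert_eq!(checksum(b"acgtn", false), checksum(b"ACGTN", false));
    // With normalisation, ambiguity codes such as R and Y collapse to N.
    assert_eq!(prepare(b"acgRY-", true), b"ACGNN-".to_vec());
    println!("{:016x}", checksum(b"ACGT", false));
}
```

Because only the sequence bytes reach the hasher, two records with different IDs or quality strings but identical bases receive the same checksum, which is the property the description relies on.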
By default, seqsum outputs one aggregate checksum per file. Use --individual (-i) for per-record checksums, or --all (-a) for both individual and aggregate checksums. These flags are mutually exclusive.
cargo install seqsum
git clone https://github.com/bede/seqsum.git
cd seqsum
cargo test
# Default: aggregate checksum per file
$ seqsum tests/data/MN908947.fasta
33ba13564e0a63e3 tests/data/MN908947.fasta
# Multiple files
$ seqsum tests/data/MN908947.fasta tests/data/MN908947-BA_2_86_1.fasta
33ba13564e0a63e3 tests/data/MN908947.fasta
d3a94eb82357ece5 tests/data/MN908947-BA_2_86_1.fasta
# Stdin
$ cat tests/data/MN908947.fasta | seqsum
33ba13564e0a63e3 -
# Individual per-record checksums
$ seqsum -i tests/data/MN908947-BA_2_86_1.fasta
33ba13564e0a63e3 MN908947.3
9fef3b61d54d8902 BA.2.86.1
# All: individual checksums + aggregate
$ seqsum -a tests/data/MN908947-BA_2_86_1.fasta
33ba13564e0a63e3 MN908947.3 tests/data/MN908947-BA_2_86_1.fasta
9fef3b61d54d8902 BA.2.86.1 tests/data/MN908947-BA_2_86_1.fasta
d3a94eb82357ece5 sum tests/data/MN908947-BA_2_86_1.fasta
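Since the output is tab-delimited, the --all rows shown above are straightforward to consume from another program. The following is a minimal sketch, assuming three tab-separated columns (checksum, record ID or `sum`, file name) and a hexadecimal checksum that fits in 64 bits; the `Row` type and `parse_row` function are hypothetical names introduced for illustration.

```rust
/// One row of `seqsum --all` output, ASSUMING three tab-separated columns.
#[derive(Debug, PartialEq)]
struct Row {
    checksum: u64, // parsed from hexadecimal; assumes at most 64 bits
    label: String, // record ID, or "sum" for the aggregate row
    file: String,
}

/// Parse a single output line; returns None if a column is missing
/// or the checksum is not valid hexadecimal.
fn parse_row(line: &str) -> Option<Row> {
    let mut cols = line.split('\t');
    let checksum = u64::from_str_radix(cols.next()?, 16).ok()?;
    let label = cols.next()?.to_string();
    let file = cols.next()?.to_string();
    Some(Row { checksum, label, file })
}

fn main() {
    let line = "d3a94eb82357ece5\tsum\ttests/data/MN908947-BA_2_86_1.fasta";
    let row = parse_row(line).expect("three tab-separated columns");
    assert_eq!(row.label, "sum");
    println!("{:016x} {}", row.checksum, row.file);
}
```

The default and --individual outputs have two columns rather than three, so a parser for those modes would need a correspondingly shorter row type.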
Built-in help
$ seqsum -h