casanovoutils
casanovoutils is an open-source collection of Python and command-line utilities for evaluating, visualizing, and manipulating peptide-spectrum match (PSM) data. It is designed to complement Casanovo, the state-of-the-art de novo peptide sequencing tool, and works directly with the mzTab and MGF file formats.
Key capabilities
MGF processing pipeline: shuffle, downsample by peptide sequence, and purge near-duplicate peaks, either as individual steps or chained together via the casanovoutils mgf pipeline command.
mzML sampling: stream-sample a proportion of spectra from mzML files in a single pass using per-buffer random sampling, writing output as MGF via casanovoutils mzmlutils.
PSM data loading: parse MGF and mzTab files into Polars DataFrames, join predicted and ground-truth annotations, and export to Parquet, CSV, or TSV via casanovoutils denovo.
Residue mass tables: export and customize the amino acid mass table used for evaluation via casanovoutils dump-residues dump.
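To make the idea of single-pass, per-buffer random sampling more concrete, here is a minimal sketch in plain Python. It is not the casanovoutils implementation: the function name `sample_per_buffer`, the `buffer_size` and `seed` parameters, and the choice to keep a rounded fraction of each buffer are all assumptions made for illustration. The sketch reads items from any iterable (standing in for spectra streamed from an mzML file), fills a fixed-size buffer, keeps a random subset of each full buffer, and never holds more than one buffer in memory.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def sample_per_buffer(
    items: Iterable[T],
    proportion: float,
    buffer_size: int = 1000,
    seed: Optional[int] = None,
) -> Iterator[T]:
    """Yield roughly `proportion` of `items` in one pass (hypothetical helper).

    Items are collected into buffers of `buffer_size`; from each buffer a
    random subset of round(proportion * len(buffer)) items is kept, in their
    original order. Only one buffer is held in memory at a time.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")

    rng = random.Random(seed)

    def drain(buf: List[T]) -> List[T]:
        # Pick how many items to keep from this buffer, then choose their
        # positions without replacement and restore the original order.
        k = round(proportion * len(buf))
        keep = sorted(rng.sample(range(len(buf)), k))
        return [buf[i] for i in keep]

    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            yield from drain(buffer)
            buffer = []

    # Handle the final, possibly partial, buffer.
    if buffer:
        yield from drain(buffer)


if __name__ == "__main__":
    # Integers stand in for spectra; a real reader would yield spectrum objects.
    sampled = list(sample_per_buffer(range(10_000), 0.1, buffer_size=1000, seed=0))
    print(len(sampled))
```

A design consequence of this approach is that the sample is spread evenly across the file, since every buffer contributes its share, at the cost of not being a uniform random sample over the whole file. Whether casanovoutils makes the same trade-off is not stated in the text above.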