casanovoutils

Contents

casanovoutils#

casanovoutils is an open-source collection of python and command-line utilities for evaluating, visualizing, and manipulating peptide-spectrum match (PSM) data. It is designed to complement Casanovo, the state-of-the-art de novo peptide sequencing tool, and works directly with mzTab and MGF file formats.

Key capabilities#

  • MGF processing pipeline — shuffle, downsample by peptide sequence, and purge near-duplicate peaks, either as individual steps or chained together via the casanovoutils mgf pipeline command.

  • mzML sampling — stream-sample a proportion of spectra from mzML files in a single pass using per-buffer random sampling, writing output as MGF via casanovoutils mzmlutils.

  • PSM data loading — parse MGF and mzTab files into Polars DataFrames, join predicted and ground-truth annotations, and export to Parquet, CSV, or TSV via casanovoutils denovo.

  • Residue mass tables — export and customize the amino acid mass table used for evaluation via casanovoutils dump-residues dump.

Reference