Getting Started
Requirements
casanovoutils requires Python 3.13 or later.
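One way to confirm that the active interpreter meets this requirement before installing is sketched below. The helper name `meets_requirement` is illustrative and is not part of casanovoutils.

```python
import sys

# Minimum version stated in the requirements above.
MIN_PYTHON = (3, 13)


def meets_requirement(version_info=None):
    """Return True if the (major, minor, ...) tuple is at least MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_requirement() else "too old for casanovoutils"
    print(f"Python {current}: {status}")
```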
Installation
From PyPI
pip install casanovoutils
From source
Clone the repository and install with pip:
git clone https://github.com/Noble-Lab/casanovoutils.git
cd casanovoutils
pip install .
If you use uv:
git clone https://github.com/Noble-Lab/casanovoutils.git
cd casanovoutils
uv sync
Verifying the installation
After installation, the casanovoutils command should be available in your
shell:
casanovoutils --help
casanovoutils mgfutils --help
casanovoutils mzmlutils --help
casanovoutils denovoutils --help
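The same check can be scripted, for example in a setup test. The sketch below only looks for an executable on PATH using the standard library; `command_available` is a hypothetical helper name, not something casanovoutils provides.

```python
import shutil


def command_available(name="casanovoutils"):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    found = command_available()
    print("casanovoutils found on PATH" if found else "casanovoutils not found on PATH")
```

If the command is not found after a `uv sync` install, the project's virtual environment may simply not be active in the current shell.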
Quick start examples
Shuffle and downsample an MGF file
Shuffle spectra and retain at most 2 per peptide sequence:
casanovoutils mgfutils pipeline input.mgf --outfile out.mgf --downsample_k 2
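To make "at most 2 per peptide sequence" concrete, here is a minimal sketch of the idea on in-memory data. It is not the tool's implementation: the function name, the `(sequence, spectrum_id)` representation, and the `seed` parameter are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def shuffle_and_downsample(spectra, k, seed=None):
    """Shuffle (sequence, spectrum_id) pairs, then keep at most k per sequence."""
    rng = random.Random(seed)
    shuffled = list(spectra)
    rng.shuffle(shuffled)

    kept = []
    seen = defaultdict(int)  # sequence -> number of spectra kept so far
    for sequence, spectrum_id in shuffled:
        if seen[sequence] < k:
            seen[sequence] += 1
            kept.append((sequence, spectrum_id))
    return kept


# Three spectra share one sequence, so one of them is dropped when k=2.
spectra = [("PEPTIDE", 1), ("PEPTIDE", 2), ("PEPTIDE", 3), ("ACDK", 4)]
sample = shuffle_and_downsample(spectra, k=2, seed=0)
```

Shuffling before capping means the retained spectra for each sequence are a random subset rather than simply the first ones in the file.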
Downsample only (no shuffle)
casanovoutils mgfutils downsample input.mgf --outfile sampled.mgf --k 2
Remove near-duplicate peaks
Remove peaks separated by less than 0.001 Da:
casanovoutils mgfutils purge-redundant input.mgf --outfile purged.mgf
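The effect of such a filter on a single peak list can be sketched as follows. This is an illustration under stated assumptions, not the tool's algorithm: it assumes peaks are `(m/z, intensity)` pairs and that, of two peaks closer than the threshold, the one with the lower m/z is kept. The actual command may choose differently.

```python
def purge_close_peaks(peaks, min_gap=0.001):
    """Drop any peak whose m/z lies within min_gap of the last kept peak."""
    kept = []
    for mz, intensity in sorted(peaks):
        if kept and mz - kept[-1][0] < min_gap:
            continue  # assumption: the lower-m/z peak of a close pair is kept
        kept.append((mz, intensity))
    return kept


# The second peak is 0.0004 Da from the first, so it is removed.
peaks = [(100.0000, 5.0), (100.0004, 7.0), (100.0500, 3.0), (250.1000, 9.0)]
cleaned = purge_close_peaks(peaks)
```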
Sample spectra from an mzML file
Sample 10% of spectra and write to MGF:
casanovoutils mzmlutils input.mzML 0.1 sampled.mgf
Load PSM data into a DataFrame
casanovoutils denovoutils get_groundtruth input.mgf results.mztab \
--out_path groundtruth.parquet
Compute precision-coverage metrics
casanovoutils preccov get_pc_df \
--mgf_df psms.parquet --mztab_df matches.parquet \
--out_path pc.parquet
casanovoutils preccov graph_prec_cov pc.parquet --out_path pc_curve.png
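For readers unfamiliar with the metric, the sketch below computes a precision-coverage curve in the way it is commonly defined for de novo sequencing: predictions are ranked by confidence score, and at each rank, coverage is the fraction of spectra included so far and precision is the fraction of those that are correct. The function name and inputs are hypothetical, and the definitions used by `get_pc_df` may differ in detail.

```python
def precision_coverage(scores, correct):
    """Return (coverage, precision) points, ranking predictions by score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total = len(scores)
    points = []
    n_correct = 0
    for rank, idx in enumerate(order, start=1):
        n_correct += bool(correct[idx])
        # coverage: share of all spectra included; precision: share of those correct
        points.append((rank / total, n_correct / rank))
    return points


curve = precision_coverage([0.9, 0.8, 0.4, 0.2], [True, True, False, True])
```

Plotting precision against coverage for these points gives the kind of curve that `graph_prec_cov` writes to an image file.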
Summarise an MGF file
casanovoutils summarize_mgf summarize input.mgf --output_root my_report
Create train/val/test splits
casanovoutils datasets input.mgf --output_root splits/run1
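As a conceptual illustration only, the sketch below splits a dataset so that every peptide sequence lands in exactly one of the three sets, which is a common way to avoid leakage between training and evaluation data. Whether the `datasets` command groups by peptide, and which proportions it uses, is not stated here; the function name and the 80/10/10 fractions are assumptions.

```python
import random


def split_by_peptide(sequences, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign each unique sequence to train, val or test (hypothetical fractions)."""
    unique = sorted(set(sequences))
    random.Random(seed).shuffle(unique)
    n_train = int(len(unique) * fractions[0])
    n_val = int(len(unique) * fractions[1])
    return {
        "train": set(unique[:n_train]),
        "val": set(unique[n_train:n_train + n_val]),
        "test": set(unique[n_train + n_val:]),  # remainder goes to test
    }


# Ten distinct sequences, each observed twice.
splits = split_by_peptide([f"PEP{i}" for i in range(10)] * 2)
```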
Export the residue mass table
casanovoutils residues residues.yaml
Edit residues.yaml to add custom modifications or non-standard residues,
then pass it back to other tools via --residues_path.