casanovoutils.summarize_mgf

MGF file visualization tools.

fragment-coverage:

Calculate the proportion of total fragment ion intensity explained by b- and y-ions (including neutral losses of NH3 and H2O) for each spectrum in an annotated MGF file.

The peptide annotation is read from the SEQ= field (ProForma notation).

Outputs:
  1. A TSV with columns: scan, filename, sequence, charge, n_peaks, n_matched, proportion_matched

  2. A histogram (PNG) of the proportion distribution.
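The proportion_matched column can be pictured with a minimal sketch like the one below. It is not the module's implementation: the helper name, the (m/z, intensity) pair layout, and the pre-computed list of theoretical fragment m/z values are all assumptions made for illustration, and it only handles an absolute tolerance in Da.

```python
def proportion_matched(peaks, fragment_mzs, tolerance=0.05):
    """Return (n_matched, proportion) for one spectrum.

    peaks: list of (mz, intensity) pairs.
    fragment_mzs: theoretical fragment m/z values (hypothetical input;
        the real tool derives b-/y-ions from the SEQ= annotation).
    tolerance: absolute matching tolerance in Da.
    """
    total = sum(intensity for _, intensity in peaks)
    if total == 0:
        # No signal at all: report zero coverage instead of dividing by zero.
        return 0, 0.0
    matched = [
        intensity
        for mz, intensity in peaks
        if any(abs(mz - frag) <= tolerance for frag in fragment_mzs)
    ]
    return len(matched), sum(matched) / total


# Three peaks, two of which lie within 0.05 Da of a theoretical fragment.
peaks = [(100.0, 10.0), (200.0, 30.0), (300.0, 60.0)]
n_matched, proportion = proportion_matched(peaks, [100.02, 300.01])
```

Here the matched peaks carry 70 of the 100 intensity units, so the proportion is 0.7. Whether n_matched in the real TSV counts peaks or ions is not stated above; the sketch counts peaks.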

charge-distribution:

Compute the global charge state distribution from an MGF file.

Outputs:
  1. A TSV with columns: charge, count

  2. A bar chart (PNG) of the distribution.

peak-counts:

Count the number of peaks per spectrum in an MGF file.

Outputs:
  1. A TSV with columns: n_peaks, count

  2. A histogram (PNG) of the distribution.

peptide-lengths:

Measure peptide lengths for annotated spectra in an MGF file. Requires the SEQ= field in ProForma notation.

Outputs:
  1. A TSV with columns: length, count

  2. A histogram (PNG) of the distribution.

summarize-mgf:

Produce a self-contained HTML summary of an MGF file with basic statistics and embedded histograms for charge state distribution, peaks per spectrum, peptide lengths, and fragment ion coverage.

Requires: pyteomics, spectrum_utils, numpy, matplotlib

Attributes

Functions

count_charge_states(→ tuple[dict[int, int], int])

Count precursor charge states across spectra.

charge_distribution(→ None)

Charge state distribution for an MGF file.

count_peaks(→ list[int])

Return the number of peaks for each spectrum.

peak_counts(→ None)

Number of peaks per spectrum in an MGF file.

measure_peptide_lengths(→ tuple[list[int], int])

Return (lengths, n_skipped) for annotated spectra.

peptide_lengths(→ None)

Peptide length distribution for annotated spectra in an MGF file.

fragment_coverage(mgf_file[, tolerance, ...])

Fragment ion intensity coverage for annotated MGF spectra.

summarize_mgf(→ None)

Produce a self-contained HTML summary of an MGF file.

main(→ None)

Module Contents

casanovoutils.summarize_mgf.count_charge_states(spectra: Iterable) → tuple[dict[int, int], int]

Count precursor charge states across spectra.

Parameters:

spectra (Iterable) – Iterable of pyteomics spectrum dicts.

Returns:

  • counts (dict[int, int]) – Mapping of charge state to count.

  • n_skipped (int) – Number of spectra skipped (missing, empty, or multiple charge states).
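The counting and skipping rules described above can be sketched as follows. This is an illustrative re-implementation, not the module's code: the function name is hypothetical, and it assumes the usual pyteomics layout where the charge lives under spectrum["params"]["charge"] as a list-like value (plain lists stand in for it here).

```python
from collections import Counter


def count_charge_states_sketch(spectra):
    """Tally single, unambiguous precursor charges; count the rest as skipped."""
    counts = Counter()
    n_skipped = 0
    for spectrum in spectra:
        charges = spectrum.get("params", {}).get("charge")
        # Skip spectra whose charge is missing, empty, or ambiguous.
        if not charges or len(charges) != 1:
            n_skipped += 1
            continue
        counts[int(charges[0])] += 1
    return dict(counts), n_skipped


spectra = [
    {"params": {"charge": [2]}},
    {"params": {"charge": [2]}},
    {"params": {"charge": [3]}},
    {"params": {"charge": [2, 3]}},  # multiple charge states: skipped
    {"params": {}},                  # missing charge: skipped
]
counts, n_skipped = count_charge_states_sketch(spectra)
```

With this input, counts maps charge 2 to two spectra and charge 3 to one, and n_skipped is 2.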

casanovoutils.summarize_mgf.charge_distribution(mgf_file: os.PathLike, output_tsv: os.PathLike = 'charge_distribution.tsv', output_plot: os.PathLike = 'charge_distribution.png') → None

Charge state distribution for an MGF file.

Parameters:
  • mgf_file (PathLike) – Input MGF file.

  • output_tsv (PathLike) – Output TSV path (default: charge_distribution.tsv).

  • output_plot (PathLike) – Output bar chart path (default: charge_distribution.png).

casanovoutils.summarize_mgf.count_peaks(spectra: Iterable) → list[int]

Return the number of peaks for each spectrum.

Parameters:

spectra (Iterable) – Iterable of pyteomics spectrum dicts.

Returns:

Peak count for each spectrum, in input order.

Return type:

list[int]
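As a rough illustration of per-spectrum peak counting, the sketch below reads the length of each spectrum's "m/z array" entry, which is the key pyteomics uses for MGF peak lists. The function name is hypothetical and plain lists stand in for the NumPy arrays pyteomics would normally return.

```python
from collections import Counter


def count_peaks_sketch(spectra):
    """Return the number of peaks in each spectrum, preserving input order."""
    return [len(spectrum["m/z array"]) for spectrum in spectra]


spectra = [
    {"m/z array": [100.1, 200.2, 300.3], "intensity array": [1.0, 2.0, 3.0]},
    {"m/z array": [], "intensity array": []},
    {"m/z array": [150.0, 250.0, 350.0], "intensity array": [5.0, 5.0, 5.0]},
]
per_spectrum = count_peaks_sketch(spectra)

# Collapse to (n_peaks, count) rows, the shape of the peak-counts TSV.
histogram = sorted(Counter(per_spectrum).items())
```

The second step shows one way the per-spectrum list could be reduced to the n_peaks and count columns described for the peak-counts command.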

casanovoutils.summarize_mgf.peak_counts(mgf_file: os.PathLike, output_tsv: os.PathLike = 'peak_counts.tsv', output_plot: os.PathLike = 'peak_counts.png') → None

Number of peaks per spectrum in an MGF file.

Parameters:
  • mgf_file (PathLike) – Input MGF file.

  • output_tsv (PathLike) – Output TSV path (default: peak_counts.tsv).

  • output_plot (PathLike) – Output histogram path (default: peak_counts.png).

casanovoutils.summarize_mgf.measure_peptide_lengths(spectra: Iterable) → tuple[list[int], int]

Return (lengths, n_skipped) for annotated spectra.

Spectra without SEQ= or with invalid ProForma sequences are counted as skipped.
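The skip-on-missing-or-invalid behaviour can be sketched as below. This is a deliberately simplified stand-in, not the module's parser: it strips bracketed modifications with a regular expression instead of using a full ProForma parser, so nested brackets and other advanced ProForma features are not handled. The function names are hypothetical, and the lowercase "seq" key assumes pyteomics has lowercased the SEQ= header.

```python
import re


def peptide_length_sketch(proforma):
    """Count residues in a simple ProForma string such as 'PEM[Oxidation]K/2'."""
    # Remove bracketed modifications, e.g. "[Oxidation]" or "[+15.995]".
    stripped = re.sub(r"\[[^\]]*\]", "", proforma)
    # Drop an optional trailing charge such as "/2".
    stripped = stripped.split("/")[0]
    # Remove the separators used for terminal modifications.
    stripped = stripped.replace("-", "")
    if not stripped or not stripped.isalpha() or not stripped.isupper():
        raise ValueError(f"not a simple ProForma sequence: {proforma!r}")
    return len(stripped)


def measure_lengths_sketch(spectra):
    """Return (lengths, n_skipped), skipping missing or unparsable sequences."""
    lengths, n_skipped = [], 0
    for spectrum in spectra:
        seq = spectrum.get("params", {}).get("seq")
        if not seq:
            n_skipped += 1
            continue
        try:
            lengths.append(peptide_length_sketch(seq))
        except ValueError:
            n_skipped += 1
    return lengths, n_skipped


spectra = [
    {"params": {"seq": "PEPTIDE"}},
    {"params": {"seq": "[Acetyl]-M[Oxidation]K/2"}},
    {"params": {}},                         # no SEQ=: skipped
    {"params": {"seq": "not a peptide!"}},  # invalid: skipped
]
lengths, n_skipped = measure_lengths_sketch(spectra)
```

Modifications do not contribute to the length in this sketch; only residue letters are counted.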

casanovoutils.summarize_mgf.peptide_lengths(mgf_file: os.PathLike, output_tsv: os.PathLike = 'peptide_lengths.tsv', output_plot: os.PathLike = 'peptide_lengths.png') → None

Peptide length distribution for annotated spectra in an MGF file.

Parameters:
  • mgf_file (PathLike) – Input MGF file.

  • output_tsv (PathLike) – Output TSV path (default: peptide_lengths.tsv).

  • output_plot (PathLike) – Output histogram path (default: peptide_lengths.png).

casanovoutils.summarize_mgf.fragment_coverage(mgf_file, tolerance=0.05, tolerance_unit='Da', output_tsv='fragment_coverage.tsv', output_full_tsv='fragment_coverage.full.tsv', output_plot='fragment_coverage.png', workers=1, max_charge='1less', neutral_losses=True)

Fragment ion intensity coverage for annotated MGF spectra.

Parameters:
  • mgf_file (str) – Annotated MGF file (with SEQ= in ProForma notation).

  • tolerance (float) – Mass tolerance (default: 0.05).

  • tolerance_unit (str) – Tolerance unit: ‘ppm’ or ‘Da’ (default: Da).

  • output_tsv (str) – Output TSV path (default: fragment_coverage.tsv).

  • output_full_tsv (str) – Output per-spectrum TSV path (default: fragment_coverage.full.tsv).

  • output_plot (str) – Output histogram path (default: fragment_coverage.png).

  • workers (int) – Number of parallel worker processes (default: 1).

  • max_charge (str) – Maximum charge state for fragment ions: ‘max’ (precursor charge) or ‘1less’ (precursor charge minus one, default).

  • neutral_losses (bool) – Include neutral losses in annotation (default: True).
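The two tolerance units differ in how the matching window scales with m/z: a Da tolerance is a fixed width, while a ppm tolerance grows proportionally with the fragment m/z. The helper below is a hypothetical illustration of that arithmetic only; how the module applies the tolerance internally is not documented here.

```python
def tolerance_window(mz, tolerance, tolerance_unit):
    """Return the (low, high) m/z bounds around a theoretical fragment."""
    if tolerance_unit == "Da":
        delta = tolerance
    elif tolerance_unit == "ppm":
        # Parts per million of the fragment's own m/z.
        delta = mz * tolerance / 1e6
    else:
        raise ValueError(f"unknown tolerance unit: {tolerance_unit!r}")
    return mz - delta, mz + delta


# At m/z 500, 0.05 Da is ten times wider than 10 ppm (0.005 Da).
low_da, high_da = tolerance_window(500.0, 0.05, "Da")
low_ppm, high_ppm = tolerance_window(500.0, 10, "ppm")
```

This is why a numeric tolerance should always be read together with its unit: 10 is a tight window in ppm but an extremely loose one in Da.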

casanovoutils.summarize_mgf.summarize_mgf(mgf_file: os.PathLike, output_root: os.PathLike = 'mgf_summary', tolerance: float = 0.05, tolerance_unit: str = 'Da', workers: int = 1, max_charge: str = '1less', neutral_losses: bool = True) → None

Produce a self-contained HTML summary of an MGF file.

Parameters:
  • mgf_file (PathLike) – Input MGF file.

  • output_root (PathLike) – Output directory name; the HTML file inside will share this basename (default: mgf_summary).

  • tolerance (float) – Fragment mass tolerance for coverage calculation (default: 0.05).

  • tolerance_unit (str) – Tolerance unit: ‘ppm’ or ‘Da’ (default: Da).

  • workers (int) – Number of parallel worker processes for coverage annotation (default: 1).

  • max_charge (str) – Maximum charge state for fragment ions: ‘max’ (precursor charge) or ‘1less’ (precursor charge minus one, default).

  • neutral_losses (bool) – Include neutral losses in annotation (default: True).
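A call using the keyword arguments from the signature above might look like the following sketch. The wrapper name build_summary and the chosen argument values are illustrative assumptions; the import is done lazily inside the wrapper so the snippet can be loaded even where casanovoutils is not installed.

```python
from pathlib import Path


def build_summary(mgf_path, output_root="mgf_summary"):
    """Hypothetical wrapper: summarize one annotated MGF with ppm tolerance."""
    # Imported here so defining the wrapper does not require casanovoutils.
    from casanovoutils.summarize_mgf import summarize_mgf

    summarize_mgf(
        Path(mgf_path),
        output_root=Path(output_root),
        tolerance=20.0,          # interpreted in the unit given below
        tolerance_unit="ppm",
        workers=4,               # parallel processes for coverage annotation
        max_charge="max",        # allow fragments up to the precursor charge
        neutral_losses=False,
    )
```

Calling build_summary with the path to an annotated MGF file should then leave the HTML report under the directory named by output_root, as described for that parameter.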

casanovoutils.summarize_mgf.COMMANDS

casanovoutils.summarize_mgf.main() → None