casanovoutils.summarize_mgf
MGF file visualization tools.
- fragment-coverage:
Calculate the proportion of total fragment ion intensity explained by b- and y-ions (including neutral losses of NH3 and H2O) for each spectrum in an annotated MGF file.
The peptide annotation is read from the SEQ= field (ProForma notation).
- Outputs:
A TSV with columns: scan, filename, sequence, charge, n_peaks, n_matched, proportion_matched
A histogram (PNG) of the proportion distribution.
- charge-distribution:
Compute the global charge state distribution from an MGF file.
- Outputs:
A TSV with columns: charge, count
A bar chart (PNG) of the distribution.
- peak-counts:
Count the number of peaks per spectrum in an MGF file.
- Outputs:
A TSV with columns: n_peaks, count
A histogram (PNG) of the distribution.
- peptide-lengths:
Measure peptide lengths for annotated spectra in an MGF file. Requires SEQ= field in ProForma notation.
- Outputs:
A TSV with columns: length, count
A histogram (PNG) of the distribution.
- summarize-mgf:
Produce a self-contained HTML summary of an MGF file with basic statistics and embedded histograms for charge state distribution, peaks per spectrum, peptide lengths, and fragment ion coverage.
Requires: pyteomics, spectrum_utils, numpy, matplotlib
Attributes
Functions
| count_charge_states | Count precursor charge states across spectra. |
| charge_distribution | Charge state distribution for an MGF file. |
| count_peaks | Return the number of peaks for each spectrum. |
| peak_counts | Number of peaks per spectrum in an MGF file. |
| measure_peptide_lengths | Return (lengths, n_skipped) for annotated spectra. |
| peptide_lengths | Peptide length distribution for annotated spectra in an MGF file. |
| fragment_coverage | Fragment ion intensity coverage for annotated MGF spectra. |
| summarize_mgf | Produce a self-contained HTML summary of an MGF file. |
Module Contents
- casanovoutils.summarize_mgf.count_charge_states(spectra: Iterable) -> tuple[dict[int, int], int]
Count precursor charge states across spectra.
- Parameters:
spectra (Iterable) – Iterable of pyteomics spectrum dicts.
- Returns:
counts (dict[int, int]) – Mapping of charge state to count.
n_skipped (int) – Number of spectra skipped because the charge state is missing, empty, or ambiguous (multiple charge states listed).
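The counting and skipping behavior described above can be sketched in plain Python. This is a minimal illustration, not the module's implementation: the function name `count_charge_states_sketch` is hypothetical, and it assumes each spectrum dict stores its charges as a list under `params["charge"]`, as pyteomics does for the MGF `CHARGE=` field.

```python
from collections import Counter
from typing import Iterable


def count_charge_states_sketch(spectra: Iterable[dict]) -> tuple[dict[int, int], int]:
    """Tally precursor charges; skip spectra with zero or several charges."""
    counts: Counter = Counter()
    n_skipped = 0
    for spectrum in spectra:
        # Assumption: header fields live under "params" and the charge
        # is a list-like of ints, so one spectrum may carry several charges.
        charges = spectrum.get("params", {}).get("charge") or []
        if len(charges) != 1:
            n_skipped += 1  # missing, empty, or ambiguous charge
            continue
        counts[int(charges[0])] += 1
    return dict(counts), n_skipped


spectra = [
    {"params": {"charge": [2]}},
    {"params": {"charge": [2]}},
    {"params": {"charge": [3]}},
    {"params": {"charge": [2, 3]}},  # multiple charge states: skipped
    {"params": {}},                  # no charge recorded: skipped
]
counts, n_skipped = count_charge_states_sketch(spectra)
print(counts, n_skipped)
```

In this toy input, three spectra contribute to the tally and two are counted as skipped.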
- casanovoutils.summarize_mgf.charge_distribution(mgf_file: os.PathLike, output_tsv: os.PathLike = 'charge_distribution.tsv', output_plot: os.PathLike = 'charge_distribution.png') -> None
Charge state distribution for an MGF file.
- Parameters:
mgf_file (PathLike) – Input MGF file.
output_tsv (PathLike) – Output TSV path (default: charge_distribution.tsv).
output_plot (PathLike) – Output bar chart path (default: charge_distribution.png).
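For orientation, a two-column TSV with the documented `charge` and `count` headers could be produced as in the sketch below. The helper name `write_count_tsv` and the sorted row order are assumptions made for illustration; the module's actual writer may differ.

```python
import csv
import io


def write_count_tsv(counts: dict, handle, key_name: str = "charge") -> None:
    """Write a mapping of value -> count as a two-column TSV."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([key_name, "count"])
    # Assumption: rows are sorted by key so the table reads in charge order.
    for key in sorted(counts):
        writer.writerow([key, counts[key]])


# An in-memory buffer stands in for the output file.
buffer = io.StringIO()
write_count_tsv({3: 1, 2: 2}, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```

The same helper would cover the other two-column outputs by changing `key_name` (for example to `n_peaks` or `length`).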
- casanovoutils.summarize_mgf.count_peaks(spectra: Iterable) -> list[int]
Return the number of peaks for each spectrum.
- Parameters:
spectra (Iterable) – Iterable of pyteomics spectrum dicts.
- Returns:
Peak count for each spectrum, in input order.
- Return type:
list[int]
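A minimal sketch of the per-spectrum count, assuming each spectrum dict exposes its peaks under the `"m/z array"` key as pyteomics does. The name `count_peaks_sketch` is hypothetical, and plain lists stand in for the NumPy arrays a real reader would return.

```python
from collections import Counter
from typing import Iterable


def count_peaks_sketch(spectra: Iterable[dict]) -> list:
    """Return the number of peaks for each spectrum, in input order."""
    return [len(spectrum["m/z array"]) for spectrum in spectra]


spectra = [
    {"m/z array": [101.1, 202.2, 303.3], "intensity array": [5.0, 1.0, 2.0]},
    {"m/z array": [150.0], "intensity array": [9.0]},
    {"m/z array": [], "intensity array": []},
    {"m/z array": [110.0, 220.0, 330.0], "intensity array": [1.0, 1.0, 1.0]},
]
per_spectrum = count_peaks_sketch(spectra)

# Tallying the per-spectrum counts gives (n_peaks, count) pairs,
# the shape of the two-column table a histogram would be drawn from.
histogram = sorted(Counter(per_spectrum).items())
print(per_spectrum, histogram)
```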
- casanovoutils.summarize_mgf.peak_counts(mgf_file: os.PathLike, output_tsv: os.PathLike = 'peak_counts.tsv', output_plot: os.PathLike = 'peak_counts.png') -> None
Number of peaks per spectrum in an MGF file.
- Parameters:
mgf_file (PathLike) – Input MGF file.
output_tsv (PathLike) – Output TSV path (default: peak_counts.tsv).
output_plot (PathLike) – Output histogram path (default: peak_counts.png).
- casanovoutils.summarize_mgf.measure_peptide_lengths(spectra: Iterable) -> tuple[list[int], int]
Return (lengths, n_skipped) for annotated spectra.
Spectra without SEQ= or with invalid ProForma sequences are counted as skipped.
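The length-and-skip logic can be illustrated with a deliberately simplified parser. This sketch is an assumption-heavy stand-in: it handles only non-nested bracketed modifications and terminal dashes, whereas a full ProForma parser (such as the one in pyteomics) covers far more of the notation. The names `peptide_length` and `measure_peptide_lengths_sketch` are hypothetical, and the lowercase `"seq"` key assumes pyteomics-style lowercased header names.

```python
import re
from typing import Iterable, Optional

_MODIFICATION = re.compile(r"\[[^\[\]]*\]")  # e.g. [Oxidation], [+15.995]
_RESIDUES = re.compile(r"[A-Z]+")


def peptide_length(sequence: str) -> Optional[int]:
    """Count residues in a simple ProForma string; None if it looks invalid."""
    # Drop bracketed modifications, then the dashes that attach terminal mods.
    stripped = _MODIFICATION.sub("", sequence).replace("-", "")
    if not _RESIDUES.fullmatch(stripped):
        return None  # leftover brackets or other unexpected characters
    return len(stripped)


def measure_peptide_lengths_sketch(spectra: Iterable[dict]) -> tuple:
    """Return (lengths, n_skipped) for spectra carrying a usable sequence."""
    lengths, n_skipped = [], 0
    for spectrum in spectra:
        sequence = spectrum.get("params", {}).get("seq")
        length = peptide_length(sequence) if sequence else None
        if length is None:
            n_skipped += 1
        else:
            lengths.append(length)
    return lengths, n_skipped


spectra = [
    {"params": {"seq": "PEPTIDEK"}},
    {"params": {"seq": "EM[+15.995]EVEES[Phospho]PEK"}},
    {"params": {"seq": "[Acetyl]-PEPTIDE"}},
    {"params": {"seq": "PEP[TIDE"}},  # unbalanced bracket: skipped
    {"params": {}},                   # no SEQ= field: skipped
]
lengths, n_skipped = measure_peptide_lengths_sketch(spectra)
print(lengths, n_skipped)
```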
- casanovoutils.summarize_mgf.peptide_lengths(mgf_file: os.PathLike, output_tsv: os.PathLike = 'peptide_lengths.tsv', output_plot: os.PathLike = 'peptide_lengths.png') -> None
Peptide length distribution for annotated spectra in an MGF file.
- Parameters:
mgf_file (PathLike) – Input MGF file.
output_tsv (PathLike) – Output TSV path (default: peptide_lengths.tsv).
output_plot (PathLike) – Output histogram path (default: peptide_lengths.png).
- casanovoutils.summarize_mgf.fragment_coverage(mgf_file, tolerance=0.05, tolerance_unit='Da', output_tsv='fragment_coverage.tsv', output_full_tsv='fragment_coverage.full.tsv', output_plot='fragment_coverage.png', workers=1, max_charge='1less', neutral_losses=True)
Fragment ion intensity coverage for annotated MGF spectra.
- Parameters:
mgf_file (str) – Annotated MGF file (with SEQ= in ProForma notation).
tolerance (float) – Mass tolerance (default: 0.05).
tolerance_unit (str) – Tolerance unit: ‘ppm’ or ‘Da’ (default: Da).
output_tsv (str) – Output TSV path (default: fragment_coverage.tsv).
output_full_tsv (str) – Output per-spectrum TSV path (default: fragment_coverage.full.tsv).
output_plot (str) – Output histogram path (default: fragment_coverage.png).
workers (int) – Number of parallel worker processes (default: 1).
max_charge (str) – Maximum charge state for fragment ions: ‘max’ (precursor charge) or ‘1less’ (precursor charge minus one, default).
neutral_losses (bool) – Include neutral losses in annotation (default: True).
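The coverage metric itself, the share of total intensity carried by peaks that match a theoretical fragment within the tolerance, can be sketched as follows. This is an illustration under stated assumptions rather than the module's code: the theoretical fragment m/z values are supplied by hand instead of being generated from the sequence, all function names are hypothetical, and the floor of 1 in the '1less' case is an assumption about how singly charged precursors are handled.

```python
from typing import Sequence


def within_tolerance(observed: float, theoretical: float,
                     tolerance: float, unit: str) -> bool:
    """Compare two m/z values using an absolute (Da) or relative (ppm) window."""
    delta = abs(observed - theoretical)
    if unit == "Da":
        return delta <= tolerance
    if unit == "ppm":
        return delta / theoretical * 1e6 <= tolerance
    raise ValueError(f"unknown tolerance unit: {unit!r}")


def proportion_matched(mz: Sequence[float], intensity: Sequence[float],
                       fragment_mz: Sequence[float],
                       tolerance: float = 0.05, unit: str = "Da") -> tuple:
    """Return (n_matched, matched intensity / total intensity)."""
    total = sum(intensity)
    if total == 0:
        return 0, 0.0
    n_matched, matched = 0, 0.0
    for peak_mz, peak_intensity in zip(mz, intensity):
        if any(within_tolerance(peak_mz, f, tolerance, unit) for f in fragment_mz):
            n_matched += 1
            matched += peak_intensity
    return n_matched, matched / total


def max_fragment_charge(precursor_charge: int, mode: str = "1less") -> int:
    """'max' uses the precursor charge; '1less' uses one less (assumed floor of 1)."""
    if mode == "max":
        return precursor_charge
    if mode == "1less":
        return max(1, precursor_charge - 1)
    raise ValueError(f"unknown max_charge mode: {mode!r}")


mz = [175.119, 300.000, 401.23]
intensity = [50.0, 25.0, 25.0]
fragments = [175.12, 401.20]  # hand-picked theoretical fragment m/z values

da_result = proportion_matched(mz, intensity, fragments, 0.05, "Da")
ppm_result = proportion_matched(mz, intensity, fragments, 10, "ppm")
print(da_result, ppm_result)
```

With the 0.05 Da window both fragment peaks match; a 10 ppm window is much narrower at these masses, so only the first peak is counted.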
- casanovoutils.summarize_mgf.summarize_mgf(mgf_file: os.PathLike, output_root: os.PathLike = 'mgf_summary', tolerance: float = 0.05, tolerance_unit: str = 'Da', workers: int = 1, max_charge: str = '1less', neutral_losses: bool = True) -> None
Produce a self-contained HTML summary of an MGF file.
- Parameters:
mgf_file (PathLike) – Input MGF file.
output_root (PathLike) – Output directory name; the HTML file inside will share this basename (default: mgf_summary).
tolerance (float) – Fragment mass tolerance for coverage calculation (default: 0.05).
tolerance_unit (str) – Tolerance unit: ‘ppm’ or ‘Da’ (default: Da).
workers (int) – Number of parallel worker processes for coverage annotation (default: 1).
max_charge (str) – Maximum charge state for fragment ions: ‘max’ (precursor charge) or ‘1less’ (precursor charge minus one, default).
neutral_losses (bool) – Include neutral losses in annotation (default: True).
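One common way to make an HTML report self-contained is to inline each PNG as a base64 data URI, so the page needs no external image files. The sketch below shows that technique only; whether the module embeds its histograms this way is an assumption, and `embed_png` and `build_summary_html` are hypothetical names.

```python
import base64
import html

# Placeholder bytes (the 8-byte PNG signature), not a complete image.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def embed_png(png_bytes: bytes, alt: str) -> str:
    """Return an <img> tag whose source is the PNG inlined as a data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return '<img alt="{}" src="data:image/png;base64,{}">'.format(
        html.escape(alt), encoded
    )


def build_summary_html(title: str, stats: dict, figures: dict) -> str:
    """Assemble a single-file HTML page with a stats table and inline figures."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        "<title>{}</title></head><body>".format(html.escape(title)),
        "<h1>{}</h1>".format(html.escape(title)),
        "<table>",
    ]
    for name, value in stats.items():
        parts.append(
            "<tr><th>{}</th><td>{}</td></tr>".format(
                html.escape(name), html.escape(str(value))
            )
        )
    parts.append("</table>")
    for caption, png_bytes in figures.items():
        parts.append(embed_png(png_bytes, caption))
    parts.append("</body></html>")
    return "\n".join(parts)


page = build_summary_html(
    "example.mgf",
    {"spectra": 3, "annotated": 2},
    {"Charge state distribution": PNG_SIGNATURE},
)
print(page)
```

In practice the bytes would come from a figure saved to an in-memory buffer rather than from a placeholder constant.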
- casanovoutils.summarize_mgf.COMMANDS
- casanovoutils.summarize_mgf.main() -> None