casanovoutils.summarize_mgf
===========================

.. py:module:: casanovoutils.summarize_mgf

.. autoapi-nested-parse::

   MGF file visualization tools.

   fragment-coverage:
       Calculate the proportion of total fragment ion intensity explained by
       b- and y-ions (including neutral losses of NH3 and H2O) for each spectrum
       in an annotated MGF file.

       The peptide annotation is read from the SEQ= field (ProForma notation).

       Outputs:
         1. A TSV with columns: scan, filename, sequence, charge, n_peaks,
            n_matched, proportion_matched
         2. A histogram (PNG) of the proportion distribution.

   charge-distribution:
       Compute the global charge state distribution from an MGF file.

       Outputs:
         1. A TSV with columns: charge, count
         2. A bar chart (PNG) of the distribution.

   peak-counts:
       Count the number of peaks per spectrum in an MGF file.

       Outputs:
         1. A TSV with columns: n_peaks, count
         2. A histogram (PNG) of the distribution.

   peptide-lengths:
       Measure peptide lengths for annotated spectra in an MGF file.
       Requires SEQ= field in ProForma notation.

       Outputs:
         1. A TSV with columns: length, count
         2. A histogram (PNG) of the distribution.

   summarize-mgf:
       Produce a self-contained HTML summary of an MGF file with basic statistics
       and embedded histograms for charge state distribution, peaks per spectrum,
       peptide lengths, and fragment ion coverage.

   Requires: pyteomics, spectrum_utils, numpy, matplotlib



Attributes
----------

.. autoapisummary::

   casanovoutils.summarize_mgf.COMMANDS


Functions
---------

.. autoapisummary::

   casanovoutils.summarize_mgf.count_charge_states
   casanovoutils.summarize_mgf.charge_distribution
   casanovoutils.summarize_mgf.count_peaks
   casanovoutils.summarize_mgf.peak_counts
   casanovoutils.summarize_mgf.measure_peptide_lengths
   casanovoutils.summarize_mgf.peptide_lengths
   casanovoutils.summarize_mgf.fragment_coverage
   casanovoutils.summarize_mgf.summarize_mgf
   casanovoutils.summarize_mgf.main


Module Contents
---------------

.. py:function:: count_charge_states(spectra: Iterable) -> tuple[dict[int, int], int]

   Count precursor charge states across spectra.

   :param spectra: Iterable of pyteomics spectrum dicts.
   :type spectra: Iterable

   :returns: * **counts** (*dict[int, int]*) -- Mapping of charge state to count.
             * **n_skipped** (*int*) -- Number of spectra skipped (missing, empty, or multiple charge states).
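
The counting rule described for ``count_charge_states`` can be sketched in a few lines. This is a hypothetical re-implementation for illustration, not the module's code: the name ``count_charge_states_sketch`` is invented here, and it assumes each spectrum dict stores its precursor charges as a list under ``params["charge"]``.

```python
from collections import Counter
from typing import Iterable


def count_charge_states_sketch(spectra: Iterable[dict]) -> tuple[dict[int, int], int]:
    """Tally precursor charges; skip spectra without exactly one charge."""
    counts: Counter = Counter()
    n_skipped = 0
    for spectrum in spectra:
        # Assumed layout: spectrum["params"]["charge"] is a list of ints.
        charges = spectrum.get("params", {}).get("charge") or []
        if len(charges) != 1:
            n_skipped += 1  # missing, empty, or multiple charge states
            continue
        counts[int(charges[0])] += 1
    return dict(counts), n_skipped


spectra = [
    {"params": {"charge": [2]}},
    {"params": {"charge": [2]}},
    {"params": {"charge": [3]}},
    {"params": {"charge": [2, 3]}},  # multiple charge states: skipped
    {"params": {}},                  # missing charge: skipped
]
counts, n_skipped = count_charge_states_sketch(spectra)
```

With this toy input, two spectra are skipped and the remaining three are tallied by charge.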


.. py:function:: charge_distribution(mgf_file: os.PathLike, output_tsv: os.PathLike = 'charge_distribution.tsv', output_plot: os.PathLike = 'charge_distribution.png') -> None

   Compute the charge state distribution for an MGF file and write it as a
   TSV and a bar chart.

   :param mgf_file: Input MGF file.
   :type mgf_file: PathLike
   :param output_tsv: Output TSV path (default: charge_distribution.tsv).
   :type output_tsv: PathLike
   :param output_plot: Output bar chart path (default: charge_distribution.png).
   :type output_plot: PathLike


.. py:function:: count_peaks(spectra: Iterable) -> list[int]

   Return the number of peaks for each spectrum.

   :param spectra: Iterable of pyteomics spectrum dicts.
   :type spectra: Iterable

   :returns: Peak count for each spectrum, in input order.
   :rtype: list[int]


.. py:function:: peak_counts(mgf_file: os.PathLike, output_tsv: os.PathLike = 'peak_counts.tsv', output_plot: os.PathLike = 'peak_counts.png') -> None

   Count the number of peaks per spectrum in an MGF file and write the
   distribution as a TSV and a histogram.

   :param mgf_file: Input MGF file.
   :type mgf_file: PathLike
   :param output_tsv: Output TSV path (default: peak_counts.tsv).
   :type output_tsv: PathLike
   :param output_plot: Output histogram path (default: peak_counts.png).
   :type output_plot: PathLike
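
As a rough illustration of how per-spectrum peak counts become the ``n_peaks``/``count`` table, the sketch below tallies counts and renders them as TSV text. The helper names are hypothetical, and the ``"m/z array"`` key is an assumption about the spectrum dict layout; the module's actual output formatting may differ.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def count_peaks_sketch(spectra: Iterable[dict]) -> list[int]:
    # Assumed layout: each spectrum dict has an "m/z array" entry.
    return [len(spectrum["m/z array"]) for spectrum in spectra]


def peak_count_tsv(n_peaks: list[int]) -> str:
    """Render a (n_peaks, count) table as tab-separated text."""
    tally = Counter(n_peaks)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["n_peaks", "count"])
    for value in sorted(tally):
        writer.writerow([value, tally[value]])
    return buffer.getvalue()


spectra = [
    {"m/z array": [100.1, 200.2, 300.3]},
    {"m/z array": [150.0]},
    {"m/z array": [110.0, 220.0, 330.0]},
]
n_peaks = count_peaks_sketch(spectra)
tsv_text = peak_count_tsv(n_peaks)
```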


.. py:function:: measure_peptide_lengths(spectra: Iterable) -> tuple[list[int], int]

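   Measure peptide lengths for annotated spectra.

   Spectra without a SEQ= field or with an invalid ProForma sequence are
   counted as skipped.

   :param spectra: Iterable of pyteomics spectrum dicts.
   :type spectra: Iterable

   :returns: * **lengths** (*list[int]*) -- Peptide length for each annotated spectrum.
             * **n_skipped** (*int*) -- Number of spectra skipped (missing SEQ= or invalid ProForma sequence).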
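
The skip-and-count behavior of ``measure_peptide_lengths`` can be illustrated with a deliberately simplified parser. Everything here is an assumption made for the example: the function names are hypothetical, the annotation is assumed to live under ``params["seq"]``, and the regex handles only bracketed modifications, terminal dashes, and a trailing ``/charge`` rather than the full ProForma grammar.

```python
import re
from typing import Iterable, Optional

_BRACKETED = re.compile(r"\[[^\[\]]*\]")  # e.g. [Acetyl] or [+79.966]
_RESIDUES = re.compile(r"[A-Z]+")


def simple_peptide_length(sequence: str) -> Optional[int]:
    """Length of a ProForma-like string, or None if it does not parse."""
    # Drop modifications, then any "/charge" suffix and terminal dashes.
    stripped = _BRACKETED.sub("", sequence).split("/")[0].strip("-")
    if _RESIDUES.fullmatch(stripped) is None:
        return None
    return len(stripped)


def measure_peptide_lengths_sketch(spectra: Iterable[dict]) -> tuple[list[int], int]:
    lengths: list[int] = []
    n_skipped = 0
    for spectrum in spectra:
        # Assumed layout: the SEQ= annotation is stored as params["seq"].
        sequence = spectrum.get("params", {}).get("seq")
        length = simple_peptide_length(sequence) if sequence else None
        if length is None:
            n_skipped += 1
        else:
            lengths.append(length)
    return lengths, n_skipped


spectra = [
    {"params": {"seq": "PEPTIDE"}},
    {"params": {"seq": "[Acetyl]-PEPT[+79.966]IDEK/2"}},
    {"params": {"seq": "PEPT[IDE"}},  # unbalanced bracket: skipped
    {"params": {}},                   # no SEQ= field: skipped
]
lengths, n_skipped = measure_peptide_lengths_sketch(spectra)
```

A real implementation would more likely delegate parsing to a ProForma-aware library so that the full notation is covered.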


.. py:function:: peptide_lengths(mgf_file: os.PathLike, output_tsv: os.PathLike = 'peptide_lengths.tsv', output_plot: os.PathLike = 'peptide_lengths.png') -> None

   Measure the peptide length distribution for annotated spectra in an MGF
   file and write it as a TSV and a histogram.

   :param mgf_file: Input MGF file.
   :type mgf_file: PathLike
   :param output_tsv: Output TSV path (default: peptide_lengths.tsv).
   :type output_tsv: PathLike
   :param output_plot: Output histogram path (default: peptide_lengths.png).
   :type output_plot: PathLike


.. py:function:: fragment_coverage(mgf_file, tolerance=0.05, tolerance_unit='Da', output_tsv='fragment_coverage.tsv', output_full_tsv='fragment_coverage.full.tsv', output_plot='fragment_coverage.png', workers=1, max_charge='1less', neutral_losses=True)

   Compute, for each spectrum in an annotated MGF file, the proportion of
   total fragment ion intensity explained by b- and y-ions.

   :param mgf_file: Annotated MGF file (with SEQ= in ProForma notation).
   :type mgf_file: str
   :param tolerance: Mass tolerance (default: 0.05).
   :type tolerance: float
   :param tolerance_unit: Tolerance unit: 'ppm' or 'Da' (default: Da).
   :type tolerance_unit: str
   :param output_tsv: Output TSV path (default: fragment_coverage.tsv).
   :type output_tsv: str
   :param output_full_tsv: Output per-spectrum TSV path (default: fragment_coverage.full.tsv).
   :type output_full_tsv: str
   :param output_plot: Output histogram path (default: fragment_coverage.png).
   :type output_plot: str
   :param workers: Number of parallel worker processes (default: 1).
   :type workers: int
   :param max_charge: Maximum charge state for fragment ions: 'max' (precursor charge)
                      or '1less' (precursor charge minus one, default).
   :type max_charge: str
   :param neutral_losses: Include neutral losses in annotation (default: True).
   :type neutral_losses: bool
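
To make the ``tolerance``, ``tolerance_unit``, and ``max_charge`` parameters of ``fragment_coverage`` concrete, here is a minimal sketch of the matching arithmetic. It is not the module's implementation: the helper names are hypothetical, the theoretical fragment m/z values are supplied by hand instead of being derived from a peptide, and flooring the ``'1less'`` charge at 1 is an assumption.

```python
from typing import Sequence


def within_tolerance(observed: float, theoretical: float,
                     tolerance: float, unit: str) -> bool:
    delta = abs(observed - theoretical)
    if unit == "Da":
        return delta <= tolerance
    if unit == "ppm":
        # Parts per million, relative to the theoretical m/z.
        return delta / theoretical * 1e6 <= tolerance
    raise ValueError(f"unknown tolerance unit: {unit!r}")


def proportion_matched(mz: Sequence[float], intensity: Sequence[float],
                       fragment_mz: Sequence[float],
                       tolerance: float = 0.05, unit: str = "Da") -> float:
    """Fraction of total intensity carried by peaks near any fragment m/z."""
    total = sum(intensity)
    if total == 0:
        return 0.0
    matched = sum(
        i for m, i in zip(mz, intensity)
        if any(within_tolerance(m, f, tolerance, unit) for f in fragment_mz)
    )
    return matched / total


def max_fragment_charge(precursor_charge: int, max_charge: str = "1less") -> int:
    if max_charge == "max":
        return precursor_charge
    if max_charge == "1less":
        # Assumption: never go below singly charged fragments.
        return max(1, precursor_charge - 1)
    raise ValueError(f"unknown max_charge: {max_charge!r}")


mz = [100.0, 200.0, 300.0]
intensity = [10.0, 30.0, 60.0]
fragments = [100.02, 300.2]  # hand-picked theoretical m/z values

coverage_da = proportion_matched(mz, intensity, fragments, 0.05, "Da")
coverage_ppm = proportion_matched(mz, intensity, fragments, 10, "ppm")
```

In this toy spectrum only the peak at 100.0 falls within 0.05 Da of a fragment, while a 10 ppm window at these m/z values is too narrow to match either peak, which shows why the unit matters as much as the number.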


.. py:function:: summarize_mgf(mgf_file: os.PathLike, output_root: os.PathLike = 'mgf_summary', tolerance: float = 0.05, tolerance_unit: str = 'Da', workers: int = 1, max_charge: str = '1less', neutral_losses: bool = True) -> None

   Produce a self-contained HTML summary of an MGF file.

   :param mgf_file: Input MGF file.
   :type mgf_file: PathLike
   :param output_root: Output directory name; the HTML file inside will share this basename
                       (default: mgf_summary).
   :type output_root: PathLike
   :param tolerance: Fragment mass tolerance for coverage calculation (default: 0.05).
   :type tolerance: float
   :param tolerance_unit: Tolerance unit: 'ppm' or 'Da' (default: Da).
   :type tolerance_unit: str
   :param workers: Number of parallel worker processes for coverage annotation (default: 1).
   :type workers: int
   :param max_charge: Maximum charge state for fragment ions: 'max' (precursor charge)
                      or '1less' (precursor charge minus one, default).
   :type max_charge: str
   :param neutral_losses: Include neutral losses in annotation (default: True).
   :type neutral_losses: bool
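
One common way to make an HTML report self-contained is to inline each plot as a base64 ``data:`` URI, so no image files need to travel with the page. The sketch below shows that general technique only; whether ``summarize_mgf`` embeds its histograms this way is an assumption, and ``embed_png`` is a hypothetical helper.

```python
import base64
import html


def embed_png(png_bytes: bytes, alt_text: str) -> str:
    """Return an <img> tag carrying the PNG inline as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return (
        f'<img alt="{html.escape(alt_text)}" '
        f'src="data:image/png;base64,{encoded}">'
    )


# Placeholder bytes (the PNG file signature) standing in for a real plot.
png_bytes = b"\x89PNG\r\n\x1a\n"
tag = embed_png(png_bytes, "Charge & peaks")
```

The resulting tag can be placed directly into the report body, and the page then renders without any external image files.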


.. py:data:: COMMANDS

.. py:function:: main() -> None

