OpenMS
Subset-neighbor peptide search: find peptides from a wider pool (typically a FASTA digest) that are spectral neighbors of a smaller "relevant" peptide set, useful when only part of a complex sample is of interest.
#include <OpenMS/ANALYSIS/ID/NeighborSeq.h>
Classes

| Type | Name | Description |
|---|---|---|
| struct | NeighborStats | Statistics of how many neighbors were found per reference peptide. |
Public Member Functions

| Type | Member | Description |
|---|---|---|
| | NeighborSeq (std::vector< AASequence > &&digested_relevant_peptides) | Construct from a vector of "relevant" digested peptides. |
| MSSpectrum | generateSpectrum (const AASequence &peptide_sequence) | Generates a theoretical spectrum for a given peptide sequence with b/y ions at charge 1. |
| bool | isNeighborPeptide (const AASequence &neighbor_candidate, const double mass_tolerance_pc, const bool mass_tolerance_pc_ppm, const double min_shared_ion_fraction, const double mz_bin_size) | Whether neighbor_candidate is a spectral neighbor of any of the relevant peptides. |
| NeighborStats | getNeighborStats () const | Returns the statistics of how many neighbors were found per reference peptide; call it after isNeighborPeptide() has been run on all candidates. |
Static Public Member Functions

| Type | Member | Description |
|---|---|---|
| static bool | isNeighborSpectrum (const MSSpectrum &spec1, const MSSpectrum &spec2, const double min_shared_ion_fraction, const double mz_bin_size) | Whether two spectra share enough peaks (in mz_bin_size m/z bins) to be considered neighbors. |
| static int | computeSharedIonCount (const MSSpectrum &spec1, const MSSpectrum &spec2, const double &mz_bin_size) | Compute the number of shared ions between two spectra. |
Protected Member Functions

| Type | Member | Description |
|---|---|---|
| std::map< double, std::vector< int > > | createMassLookup_ () | Creates a map of masses to positions from the internal relevant peptides. |
| auto | findCandidatePositions_ (const double mono_weight, double mass_tolerance, const bool mass_tolerance_pc_ppm) | Finds candidate positions based on a given mono-isotopic weight and mass tolerance. |
Private Attributes

| Type | Name | Description |
|---|---|---|
| const std::vector< AASequence > & | digested_relevant_peptides_ | digested relevant peptides |
| std::map< double, std::vector< int > > | mass_position_map_ | map of masses to positions in digested_relevant_peptides_ |
| TheoreticalSpectrumGenerator | spec_gen_ | spectrum generator for b/y ions with charge 1 |
| const Residue * | x_residue_ | residue for the unknown amino acid 'X' |
| std::vector< int > | neighbor_stats_ | number of neighbors found per reference peptide by isNeighborPeptide() |
Two peptides are considered neighbors when their precursor masses are within tolerance and their theoretical b/y fragment spectra share enough peaks. The class is constructed once with the relevant peptides, then queried with isNeighborPeptide for each candidate. After the queries are done, getNeighborStats summarises how many relevant peptides had zero, one, or multiple neighbors.
Background: Cormen et al., J. Proteome Research 2021, 10.1021/acs.jproteome.1c00483 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8489664/).
NeighborSeq (std::vector< AASequence > &&digested_relevant_peptides)
Construct from a vector of "relevant" digested peptides.
Builds an internal mass index (relevant peptides containing the unknown-amino-acid residue 'X' are skipped and a count is logged via OPENMS_LOG_WARN) and configures the internal theoretical-spectrum generator for charge-1 b/y ions including the b1 prefix ion.
Note: the instance stores only a const-reference to the passed-in vector (in its member digested_relevant_peptides_); the peptides are not copied. The vector passed to the constructor must therefore outlive every call on this instance; passing a temporary leaves a dangling reference.
| [in] | digested_relevant_peptides | Digested peptides to use as the "relevant" reference set. |
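The lifetime rule above follows from binding a const-reference member to an rvalue-reference constructor parameter. The sketch below reproduces only that pattern, using a hypothetical RefHolder class with std::string standing in for AASequence; it is not OpenMS code.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for NeighborSeq's ownership model: the constructor
// takes an rvalue reference but only binds a const reference member to it.
// No copy and no move happens; the caller's vector must stay alive.
class RefHolder
{
public:
  explicit RefHolder(std::vector<std::string>&& peptides) : peptides_(peptides) {}

  std::size_t size() const { return peptides_.size(); }
  const std::string& at(std::size_t i) const { return peptides_.at(i); }

private:
  const std::vector<std::string>& peptides_; // reference, not a copy
};

// Safe usage: the named vector outlives the holder. std::move only casts to an
// rvalue here; since the constructor merely binds a reference, nothing is moved out.
inline std::size_t safeUsage()
{
  std::vector<std::string> relevant{"PEPTIDE", "PEPTIDES"};
  RefHolder holder(std::move(relevant));
  assert(&holder.at(0) == &relevant.front()); // same object, nothing copied
  return holder.size();
}

// Unsafe (do not do this): the temporary dies at the end of the full-expression,
// leaving peptides_ dangling:
//   RefHolder bad(std::vector<std::string>{"PEPTIDE"});
//   bad.size(); // undefined behavior
```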
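int computeSharedIonCount (const MSSpectrum &spec1, const MSSpectrum &spec2, const double &mz_bin_size) [static]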
Compute the number of shared ions between two spectra.
All peaks are considered. Use generateSpectrum() to generate theoretical spectra with b/y ions.
| [in] | spec1 | The first theoretical spectrum. |
| [in] | spec2 | The second theoretical spectrum. |
| [in] | mz_bin_size | Bin size for the m/z values, which determines if two peaks are considered to be the same. |
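A minimal sketch of binned shared-ion counting, assuming each spectrum is reduced to a sorted list of m/z values and that two peaks match when floor(mz / mz_bin_size) is equal. The binning rule and the names binIndex and sharedIonCount are assumptions for illustration; this page does not state how OpenMS assigns bins.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical simplification: a spectrum is a list of peak m/z values sorted
// ascending (OpenMS uses MSSpectrum). Assumed bin rule: floor(mz / bin_size).
inline long binIndex(double mz, double mz_bin_size)
{
  return static_cast<long>(std::floor(mz / mz_bin_size));
}

// Two-pointer walk over both sorted peak lists; each peak is matched at most once.
inline int sharedIonCount(const std::vector<double>& spec1,
                          const std::vector<double>& spec2,
                          double mz_bin_size)
{
  assert(mz_bin_size > 0.0);
  int shared = 0;
  std::size_t i = 0, j = 0;
  while (i < spec1.size() && j < spec2.size())
  {
    const long b1 = binIndex(spec1[i], mz_bin_size);
    const long b2 = binIndex(spec2[j], mz_bin_size);
    if (b1 == b2) { ++shared; ++i; ++j; } // same bin: count as one shared ion
    else if (b1 < b2) { ++i; }            // advance the spectrum that is behind
    else { ++j; }
  }
  return shared;
}
```

With fixed bins, two nearly identical m/z values that straddle a bin edge are counted as different ions; that is an inherent property of this kind of binning, not of the specific code.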
std::map< double, std::vector< int > > createMassLookup_ () [protected]
Creates a map of masses to positions from the internal relevant peptides.
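auto findCandidatePositions_ (const double mono_weight, double mass_tolerance, const bool mass_tolerance_pc_ppm) [protected]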
Finds candidate positions based on a given mono-isotopic weight and mass tolerance.
| [in] | mono_weight | The mono-isotopic weight to find candidates for. |
| [in] | mass_tolerance | The allowed tolerance for matching the mass. |
| [in] | mass_tolerance_pc_ppm | Whether the mass tolerance is in ppm. |
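Taken together, createMassLookup_ and findCandidatePositions_ amount to a range query on a sorted map. The sketch below assumes masses are plain doubles and positions are indices into the relevant-peptide list; MassIndex, buildMassIndex and findCandidates are hypothetical names, and the ppm conversion uses the usual definition (tolerance in Da = ppm * mass / 10^6).

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical mass index in the shape of mass_position_map_:
// mono-isotopic mass -> indices of relevant peptides with exactly that mass.
using MassIndex = std::map<double, std::vector<int>>;

inline MassIndex buildMassIndex(const std::vector<double>& peptide_masses)
{
  MassIndex index;
  for (int i = 0; i < static_cast<int>(peptide_masses.size()); ++i)
  {
    index[peptide_masses[i]].push_back(i); // peptides with equal mass share a key
  }
  return index;
}

// Positions of all indexed peptides whose mass lies in
// [mono_weight - tol, mono_weight + tol]. A ppm tolerance is converted to Da
// relative to mono_weight (assumed reference mass).
inline std::vector<int> findCandidates(const MassIndex& index, double mono_weight,
                                       double mass_tolerance, bool tolerance_is_ppm)
{
  assert(mass_tolerance >= 0.0);
  if (tolerance_is_ppm) mass_tolerance = mass_tolerance * mono_weight / 1e6;
  std::vector<int> positions;
  auto first = index.lower_bound(mono_weight - mass_tolerance); // first key >= low
  auto last = index.upper_bound(mono_weight + mass_tolerance);  // first key > high
  for (auto it = first; it != last; ++it)
  {
    positions.insert(positions.end(), it->second.begin(), it->second.end());
  }
  return positions;
}
```

Because std::map keeps its keys sorted, lower_bound and upper_bound locate the tolerance window in logarithmic time, after which only the hits themselves are visited.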
MSSpectrum generateSpectrum (const AASequence &peptide_sequence)
Generates a theoretical spectrum for a given peptide sequence with b/y ions at charge 1.
Includes all b and y ions with charge 1, including the first prefix ion (b1), but no internal ions.
| [in] | peptide_sequence | The peptide sequence for which to generate the spectrum. |
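For illustration, a sketch of the singly charged b/y ladder for an unmodified peptide, built from standard mono-isotopic residue masses plus the proton and water masses. byIonSpectrum is hypothetical; it assumes the ladder runs b1..b(n-1) and y1..y(n-1) (no full-length ions), which may differ in detail from how TheoreticalSpectrumGenerator is configured.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: sorted m/z values of singly charged b and y ions for an
// unmodified peptide in one-letter code. Only the 20 canonical residues are
// supported (map::at throws std::out_of_range otherwise).
inline std::vector<double> byIonSpectrum(const std::string& peptide)
{
  static const std::map<char, double> residue_mass = {
    {'G', 57.02146},  {'A', 71.03711},  {'S', 87.03203},  {'P', 97.05276},
    {'V', 99.06841},  {'T', 101.04768}, {'C', 103.00919}, {'L', 113.08406},
    {'I', 113.08406}, {'N', 114.04293}, {'D', 115.02694}, {'Q', 128.05858},
    {'K', 128.09496}, {'E', 129.04259}, {'M', 131.04049}, {'H', 137.05891},
    {'F', 147.06841}, {'R', 156.10111}, {'Y', 163.06333}, {'W', 186.07931}};
  const double proton = 1.007276; // added once for charge 1
  const double water = 18.010565; // y ions carry the C-terminal water
  assert(peptide.size() >= 2);

  std::vector<double> peaks;
  double prefix = 0.0;
  double suffix = 0.0;
  const std::size_t n = peptide.size();
  for (std::size_t i = 0; i + 1 < n; ++i)
  {
    prefix += residue_mass.at(peptide[i]);         // first i+1 residues
    suffix += residue_mass.at(peptide[n - 1 - i]); // last i+1 residues
    peaks.push_back(prefix + proton);              // b(i+1), starting with b1
    peaks.push_back(suffix + water + proton);      // y(i+1)
  }
  std::sort(peaks.begin(), peaks.end());
  return peaks;
}
```

For a 7-residue peptide such as PEPTIDE this yields 12 peaks, the lowest being b1 (P + proton, about 98.060).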
NeighborStats getNeighborStats () const
Returns the statistics of how many neighbors were found per reference peptide. Call this after isNeighborPeptide() has been called for all candidates.
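This page does not list the fields of NeighborStats. Assuming the internal counters hold one neighbor count per relevant peptide, the zero/one/multiple summary mentioned in the class description could be computed roughly as below; NeighborSummary and summarize are hypothetical names, not the OpenMS struct.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary: relevant peptides bucketed by neighbor count.
struct NeighborSummary
{
  int no_neighbors = 0;
  int one_neighbor = 0;
  int multiple_neighbors = 0;
};

// neighbor_counts[i] = number of neighbors found for relevant peptide i.
inline NeighborSummary summarize(const std::vector<int>& neighbor_counts)
{
  NeighborSummary s;
  for (int count : neighbor_counts)
  {
    assert(count >= 0);
    if (count == 0) ++s.no_neighbors;
    else if (count == 1) ++s.one_neighbor;
    else ++s.multiple_neighbors;
  }
  return s;
}
```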
bool isNeighborPeptide (const AASequence &neighbor_candidate, const double mass_tolerance_pc, const bool mass_tolerance_pc_ppm, const double min_shared_ion_fraction, const double mz_bin_size)
Whether neighbor_candidate is a spectral neighbor of any of the relevant peptides.
Looks up the relevant peptides whose precursor mass is within mass_tolerance_pc of neighbor_candidate's mono-isotopic mass, generates the b/y spectrum of neighbor_candidate and of each mass-matched relevant peptide, and compares them with isNeighborSpectrum. The result is true if any relevant peptide qualifies, but the loop does not stop at the first match: it continues so that the internal per-relevant-peptide neighbor counters are updated for every match. Call getNeighborStats once all candidates have been queried.
| [in] | neighbor_candidate | Candidate peptide (typically from a digested FASTA). |
| [in] | mass_tolerance_pc | Maximum precursor mass difference between neighbor_candidate and a relevant peptide, expressed in Da or ppm per mass_tolerance_pc_ppm. |
| [in] | mass_tolerance_pc_ppm | true to interpret mass_tolerance_pc as ppm (converted internally via Math::ppmToMass); false to interpret it as Da. |
| [in] | min_shared_ion_fraction | Threshold passed straight to isNeighborSpectrum. |
| [in] | mz_bin_size | m/z bin size passed straight to isNeighborSpectrum (typical values: 0.05 Th for high-res, 1.0005079 Th for low-res). |
Returns true when at least one relevant peptide is a neighbor.
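The control flow described for isNeighborPeptide() (precursor-mass filter, spectrum comparison, no early exit so every matching relevant peptide is counted) can be sketched as follows. This is a simplified model under stated assumptions: spectra are sorted m/z lists, the tolerance is in Da only, the mass filter is a linear scan instead of mass_position_map_, peaks match by floor(mz / bin), and Reference, sharedPeaks and NeighborSearchSketch are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical reduced relevant peptide: precursor mass + sorted fragment m/z.
struct Reference
{
  double mass;
  std::vector<double> peaks;
};

// Assumed binning: peaks match when floor(mz / bin) is equal.
inline int sharedPeaks(const std::vector<double>& a, const std::vector<double>& b, double bin)
{
  int shared = 0;
  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size())
  {
    const double ba = std::floor(a[i] / bin);
    const double bb = std::floor(b[j] / bin);
    if (ba == bb) { ++shared; ++i; ++j; }
    else if (ba < bb) { ++i; }
    else { ++j; }
  }
  return shared;
}

class NeighborSearchSketch
{
public:
  explicit NeighborSearchSketch(std::vector<Reference> refs)
    : refs_(std::move(refs)), counts_(refs_.size(), 0) {}

  // True if any reference passes both filters. The loop never breaks early,
  // so counts_ is incremented for every reference the candidate matches.
  bool isNeighbor(double cand_mass, const std::vector<double>& cand_peaks,
                  double tol_da, double min_fraction, double bin)
  {
    assert(bin > 0.0);
    bool found = false;
    for (std::size_t r = 0; r < refs_.size(); ++r)
    {
      if (std::abs(refs_[r].mass - cand_mass) > tol_da) continue; // precursor filter
      const std::size_t total = refs_[r].peaks.size() + cand_peaks.size();
      if (total == 0) continue;
      const double fraction = 2.0 * sharedPeaks(refs_[r].peaks, cand_peaks, bin) / total;
      if (fraction > min_fraction) // strict, as documented for isNeighborSpectrum()
      {
        ++counts_[r];
        found = true;
      }
    }
    return found;
  }

  const std::vector<int>& counts() const { return counts_; }

private:
  std::vector<Reference> refs_; // declared before counts_, which is sized from it
  std::vector<int> counts_;     // neighbors found per reference
};
```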
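bool isNeighborSpectrum (const MSSpectrum &spec1, const MSSpectrum &spec2, const double min_shared_ion_fraction, const double mz_bin_size) [static]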
Whether two spectra share enough peaks (in mz_bin_size m/z bins) to be considered neighbors.
Computes the Dice-style fraction 2 * shared_peaks / (spec1.size() + spec2.size()) and returns true when it is strictly greater than min_shared_ion_fraction. All peaks of both spectra are considered.
| [in] | spec1 | First theoretical spectrum. |
| [in] | spec2 | Second theoretical spectrum. |
| [in] | min_shared_ion_fraction | Minimum required Dice-style shared-peak fraction (in [0, 1]). |
| [in] | mz_bin_size | Bin size for m/z comparison (typical values: 0.05 Th for high-resolution data, 1.0005079 Th for low-resolution data). |
Returns true when the shared fraction is strictly greater than min_shared_ion_fraction.
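A worked check of the Dice-style formula and its strict comparison, written as a small sketch with hypothetical helper names: two 12-peak spectra sharing 6 peaks give 2 * 6 / (12 + 12) = 0.5, which does not exceed a threshold of 0.5, while sharing 7 peaks gives 14 / 24 ≈ 0.583, which does.

```cpp
#include <cassert>
#include <cstddef>

// Dice-style shared-peak fraction: 2 * shared / (size1 + size2).
// Precondition assumed here: at least one spectrum is non-empty.
inline double sharedFraction(int shared_peaks, std::size_t size1, std::size_t size2)
{
  assert(size1 + size2 > 0);
  return 2.0 * shared_peaks / static_cast<double>(size1 + size2);
}

// Strict comparison, matching the documented "strictly greater than".
inline bool passesThreshold(int shared_peaks, std::size_t size1, std::size_t size2,
                            double min_shared_ion_fraction)
{
  return sharedFraction(shared_peaks, size1, size2) > min_shared_ion_fraction;
}
```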
const std::vector< AASequence > & digested_relevant_peptides_ [private]
digested relevant peptides
std::map< double, std::vector< int > > mass_position_map_ [private]
map of masses to positions in digested_relevant_peptides_
std::vector< int > neighbor_stats_ [private]
number of neighbors found per reference peptide by isNeighborPeptide()
TheoreticalSpectrumGenerator spec_gen_ [private]
spectrum generator configured for b/y ions with charge 1
const Residue * x_residue_ [private]
residue for the unknown amino acid 'X'