OpenMS
NeighborSeq Class Reference

Subset-neighbor peptide search: find peptides from a wider pool (typically a FASTA digest) that are spectral neighbors of a smaller "relevant" peptide set, useful when only part of a complex sample is of interest.

#include <OpenMS/ANALYSIS/ID/NeighborSeq.h>


Classes

struct  NeighborStats
 Statistics of how many neighbors were found per reference peptide.
 

Public Member Functions

 NeighborSeq (std::vector< AASequence > &&digested_relevant_peptides)
 Construct from a vector of "relevant" digested peptides.
 
MSSpectrum generateSpectrum (const AASequence &peptide_sequence)
 Generates a theoretical spectrum for a given peptide sequence with b/y ions at charge 1.
 
bool isNeighborPeptide (const AASequence &neighbor_candidate, const double mass_tolerance_pc, const bool mass_tolerance_pc_ppm, const double min_shared_ion_fraction, const double mz_bin_size)
 Whether neighbor_candidate is a spectral neighbor of any of the relevant peptides.
 
NeighborStats getNeighborStats () const
 Returns the statistics of how many neighbors were found per reference peptide. Call it after isNeighborPeptide() has been called for all candidates.
 

Static Public Member Functions

static bool isNeighborSpectrum (const MSSpectrum &spec1, const MSSpectrum &spec2, const double min_shared_ion_fraction, const double mz_bin_size)
 Whether two spectra share enough peaks (in mz_bin_size m/z bins) to be considered neighbors.
 
static int computeSharedIonCount (const MSSpectrum &spec1, const MSSpectrum &spec2, const double &mz_bin_size)
 Compute the number of shared ions between two spectra.
 

Protected Member Functions

std::map< double, std::vector< int > > createMassLookup_ ()
 Creates a map of masses to positions from the internal relevant peptides.
 
auto findCandidatePositions_ (const double mono_weight, double mass_tolerance, const bool mass_tolerance_pc_ppm)
 Finds candidate positions based on a given mono-isotopic weight and mass tolerance.
 

Private Attributes

const std::vector< AASequence > & digested_relevant_peptides_
 digested relevant peptides
 
std::map< double, std::vector< int > > mass_position_map_
 map of masses to positions in digested_relevant_peptides_
 
TheoreticalSpectrumGenerator spec_gen_
 for b/y ions with charge 1
 
const Residue * x_residue_
 residue for unknown amino acid
 
std::vector< int > neighbor_stats_
 Number of neighbors found per reference peptide, accumulated over calls to isNeighborPeptide().
 

Detailed Description

Subset-neighbor peptide search: find peptides from a wider pool (typically a FASTA digest) that are spectral neighbors of a smaller "relevant" peptide set, useful when only part of a complex sample is of interest.

Two peptides are considered neighbors when their precursor masses are within tolerance and their theoretical b/y fragment spectra share enough peaks. The class is constructed once with the relevant peptides, then queried with isNeighborPeptide for each candidate. After the queries are done, getNeighborStats summarises how many relevant peptides had zero, one, or multiple neighbors.

Background: Cormen et al., J. Proteome Research 2021, 10.1021/acs.jproteome.1c00483 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8489664/).

Constructor & Destructor Documentation

◆ NeighborSeq()

NeighborSeq ( std::vector< AASequence > &&  digested_relevant_peptides)

Construct from a vector of "relevant" digested peptides.

Builds an internal mass index (relevant peptides containing the unknown-amino-acid residue 'X' are skipped and a count is logged via OPENMS_LOG_WARN) and configures the internal theoretical-spectrum generator for charge-1 b/y ions including the b1 prefix ion.

Note
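The constructor takes an rvalue reference but neither moves from it nor copies it: the member digested_relevant_peptides_ is a const reference bound to the argument. The vector passed to the constructor must therefore outlive every call on this instance; passing a temporary (for example, a braced initializer list) leaves a dangling reference.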
Parameters
[in] digested_relevant_peptides: Digested peptides to use as the "relevant" reference set.
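The lifetime rule in the Note is ordinary C++ reference semantics. The standalone sketch below uses a hypothetical RefHolder class (not part of OpenMS) that, like NeighborSeq as described above, accepts a std::vector&& and binds a const-reference member to it without moving. It shows that std::move is only a cast here, so the caller's vector keeps its contents and must stay alive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the pattern described in the Note: the constructor
// accepts an rvalue reference but only binds a const reference to it.
class RefHolder
{
public:
  explicit RefHolder(std::vector<std::string>&& items) : items_(items) {}

  std::size_t size() const { return items_.size(); }

private:
  const std::vector<std::string>& items_; // not owned: no copy, no move
};

// Safe usage: keep a named vector alive for the whole lifetime of the holder.
//   std::vector<std::string> relevant = {"PEPTIDE"};
//   RefHolder holder(std::move(relevant)); // `relevant` still holds its data
//
// Unsafe usage: the temporary dies at the end of the statement,
// so holder_bad.items_ dangles afterwards.
//   RefHolder holder_bad(std::vector<std::string>{"PEPTIDE"});
```

Because nothing is moved, the caller's vector remains fully usable after the std::move cast; the cast only selects the rvalue-reference constructor.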

Member Function Documentation

◆ computeSharedIonCount()

static int computeSharedIonCount (const MSSpectrum &spec1, const MSSpectrum &spec2, const double &mz_bin_size)
static

Compute the number of shared ions between two spectra.

All peaks are considered. Use generateSpectrum() to generate theoretical spectra with b/y ions.

Parameters
[in] spec1: The first theoretical spectrum.
[in] spec2: The second theoretical spectrum.
[in] mz_bin_size: Bin size for the m/z values, which determines whether two peaks are considered to be the same.
Returns
The number of shared ions.
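This page does not spell out the exact binning rule. The sketch below shows one plausible reading, assuming each peak is mapped to the integer bin floor(mz / mz_bin_size) and a shared ion is a bin occupied in both spectra, counted once. The function names mzToBin and sharedIonCountSketch are hypothetical and operate on plain m/z vectors rather than MSSpectrum; the actual OpenMS implementation may differ.

```cpp
#include <cassert>
#include <cmath>
#include <set>
#include <vector>

// Hypothetical helper: map an m/z value to a fixed-width bin index.
// Bins are half-open intervals [k * bin_size, (k + 1) * bin_size).
long long mzToBin(double mz, double mz_bin_size)
{
  return static_cast<long long>(std::floor(mz / mz_bin_size));
}

// Count bins occupied in both peak lists. Each shared bin is counted once,
// even if several peaks of one spectrum fall into it.
int sharedIonCountSketch(const std::vector<double>& mzs1,
                         const std::vector<double>& mzs2,
                         double mz_bin_size)
{
  std::set<long long> bins1;
  for (double mz : mzs1) bins1.insert(mzToBin(mz, mz_bin_size));

  std::set<long long> counted; // bins already credited as shared
  int shared = 0;
  for (double mz : mzs2)
  {
    const long long bin = mzToBin(mz, mz_bin_size);
    if (bins1.count(bin) > 0 && counted.insert(bin).second) ++shared;
  }
  return shared;
}
```

A known property of fixed binning: two peaks closer than mz_bin_size can still fall into different bins if they straddle a bin boundary (for example 100.049 and 100.051 with a 0.05 bin).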

◆ createMassLookup_()

std::map< double, std::vector< int > > createMassLookup_ ( )
protected

Creates a map of masses to positions from the internal relevant peptides.

Returns
A map where the key is the mass and the value is a vector of positions.
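The mass index can be pictured as follows. This is a minimal sketch, assuming precursor masses are already computed and passed in alongside the sequences; buildMassLookup is a hypothetical name. It also assumes the 'X' filtering mentioned in the constructor description happens while the map is built. Identical masses share a single key, and the key holds every matching position.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: index relevant peptides by precursor mass.
// Peptides containing the unknown residue 'X' are left out and counted in `skipped`.
std::map<double, std::vector<int>> buildMassLookup(const std::vector<std::string>& peptides,
                                                   const std::vector<double>& masses,
                                                   int& skipped)
{
  std::map<double, std::vector<int>> lookup;
  skipped = 0;
  for (std::size_t i = 0; i < peptides.size(); ++i)
  {
    if (peptides[i].find('X') != std::string::npos)
    {
      ++skipped; // the real class reports such a count via OPENMS_LOG_WARN
      continue;
    }
    lookup[masses[i]].push_back(static_cast<int>(i)); // position in the input vector
  }
  return lookup;
}
```

An ordered std::map is used so that a tolerance window can later be answered with lower_bound/upper_bound instead of a full scan.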

◆ findCandidatePositions_()

auto findCandidatePositions_ (const double mono_weight, double mass_tolerance, const bool mass_tolerance_pc_ppm)
protected

Finds candidate positions based on a given mono-isotopic weight and mass tolerance.

Parameters
[in] mono_weight: The mono-isotopic weight to find candidates for.
[in] mass_tolerance: The allowed tolerance for matching the mass.
[in] mass_tolerance_pc_ppm: Whether the mass tolerance is in ppm.
Returns
A pair of begin/end iterators into mass_position_map_ for the candidate positions.
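A begin/end pair over a mass-keyed std::map can be obtained with lower_bound and upper_bound. The sketch below is a hypothetical stand-in (findCandidateRange is not an OpenMS function). It assumes a ppm tolerance is converted to Da relative to mono_weight as tolerance * mono_weight / 1e6, which is the usual ppm definition; the real class delegates this conversion to Math::ppmToMass.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <utility>
#include <vector>

using MassLookup = std::map<double, std::vector<int>>;

// Hypothetical sketch: [begin, end) over all entries whose key lies within
// +/- tolerance of mono_weight (inclusive at both ends).
std::pair<MassLookup::const_iterator, MassLookup::const_iterator>
findCandidateRange(const MassLookup& lookup, double mono_weight,
                   double tolerance, bool tolerance_is_ppm)
{
  if (tolerance_is_ppm) tolerance = tolerance * mono_weight / 1e6; // ppm -> Da
  auto begin = lookup.lower_bound(mono_weight - tolerance); // first key >= low end
  auto end = lookup.upper_bound(mono_weight + tolerance);   // first key > high end
  return {begin, end};
}
```

For example, 10 ppm at 1000 Da corresponds to a window of 0.01 Da on either side.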

◆ generateSpectrum()

MSSpectrum generateSpectrum (const AASequence &peptide_sequence)

Generates a theoretical spectrum for a given peptide sequence with b/y ions at charge 1.

Includes all b and y ions with charge 1 (even the prefix ions, e.g. b1), but no internal ions.

Parameters
[in] peptide_sequence: The peptide sequence for which to generate the spectrum.
Returns
The generated theoretical spectrum.
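To make "all b and y ions with charge 1" concrete, the sketch below computes singly charged b1..b(n-1) and y1..y(n-1) m/z values from standard monoisotopic residue masses. byIonsCharge1 is a hypothetical function, not the TheoreticalSpectrumGenerator API. It assumes unmodified residues from a partial residue table, and it produces no intensities, annotations, or neutral losses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Monoisotopic residue masses (Da) for a subset of the standard amino acids.
const std::map<char, double> kResidueMass = {
  {'G', 57.02146}, {'A', 71.03711}, {'S', 87.03203}, {'P', 97.05276},
  {'V', 99.06841}, {'T', 101.04768}, {'L', 113.08406}, {'I', 113.08406},
  {'N', 114.04293}, {'D', 115.02694}, {'K', 128.09496}, {'E', 129.04259},
  {'F', 147.06841}, {'R', 156.10111}};

constexpr double kProton = 1.007276; // proton mass (Da)
constexpr double kWater = 18.010565; // H2O, monoisotopic (Da)

// Hypothetical sketch: sorted m/z values of b_i and y_i (i = 1..n-1) at charge 1.
std::vector<double> byIonsCharge1(const std::string& peptide)
{
  std::vector<double> residues;
  for (char aa : peptide) residues.push_back(kResidueMass.at(aa)); // throws on unknown residues

  const std::size_t n = residues.size();
  std::vector<double> mzs;
  double prefix = 0.0; // sum of the first i residues
  double suffix = 0.0; // sum of the last i residues
  for (std::size_t i = 1; i < n; ++i)
  {
    prefix += residues[i - 1];
    suffix += residues[n - i];
    mzs.push_back(prefix + kProton);          // b_i, includes b1
    mzs.push_back(suffix + kWater + kProton); // y_i
  }
  std::sort(mzs.begin(), mzs.end());
  return mzs;
}
```

For "GA" this gives b1 at about 58.0287 and y1 at about 90.0550.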

◆ getNeighborStats()

NeighborStats getNeighborStats ( ) const

Returns the statistics of how many neighbors were found per reference peptide. Call it after isNeighborPeptide() has been called for all candidates.
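The class description says the statistics summarise how many relevant peptides had zero, one, or multiple neighbors. The sketch below buckets per-peptide counters that way. NeighborSummary and summarizeNeighborCounts are hypothetical, and the field names of the real NeighborStats struct may differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary type (not the real NeighborStats layout).
struct NeighborSummary
{
  int no_neighbors = 0;
  int one_neighbor = 0;
  int multiple_neighbors = 0;
};

// Bucket per-relevant-peptide neighbor counts into zero / one / multiple.
NeighborSummary summarizeNeighborCounts(const std::vector<int>& counts)
{
  NeighborSummary s;
  for (int c : counts)
  {
    if (c == 0) ++s.no_neighbors;
    else if (c == 1) ++s.one_neighbor;
    else ++s.multiple_neighbors;
  }
  return s;
}
```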

◆ isNeighborPeptide()

bool isNeighborPeptide (const AASequence &neighbor_candidate, const double mass_tolerance_pc, const bool mass_tolerance_pc_ppm, const double min_shared_ion_fraction, const double mz_bin_size)

Whether neighbor_candidate is a spectral neighbor of any of the relevant peptides.

Looks up the relevant peptides whose precursor mass is within mass_tolerance_pc of neighbor_candidate's mono-isotopic mass, generates b/y spectra for neighbor_candidate and for each of these relevant peptides, and compares each pair with isNeighborSpectrum. The result is true if at least one relevant peptide qualifies. The loop does not stop at the first match: it visits every relevant peptide in the mass window so that the internal per-relevant-peptide neighbor counters are updated for every match. Call getNeighborStats once all candidates have been queried.

Parameters
[in] neighbor_candidate: Candidate peptide (typically from a digested FASTA).
[in] mass_tolerance_pc: Maximum precursor mass difference between neighbor_candidate and a relevant peptide, in Da or in ppm depending on mass_tolerance_pc_ppm.
[in] mass_tolerance_pc_ppm: true to interpret mass_tolerance_pc as ppm (converted internally via Math::ppmToMass); false to interpret it as Da.
[in] min_shared_ion_fraction: Threshold passed straight to isNeighborSpectrum.
[in] mz_bin_size: m/z bin size passed straight to isNeighborSpectrum (typical values: 0.05 Th for high-resolution data, 1.0005079 Th for low-resolution data).
Returns
true when at least one relevant peptide is a neighbor.
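The control flow described above (precursor filter, spectrum comparison, no early exit, counter update) can be sketched without OpenMS types. isNeighborPeptideSketch is hypothetical. It scans all relevant masses linearly with a tolerance in Da, whereas the real class uses its mass index, and it stands in for "generate both spectra and call isNeighborSpectrum" with a caller-supplied predicate.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch: relevant_masses[i] is the precursor mass of relevant
// peptide i and counts[i] its running neighbor counter.
bool isNeighborPeptideSketch(double candidate_mass,
                             const std::vector<double>& relevant_masses,
                             double tolerance_da,
                             const std::function<bool(std::size_t)>& is_spectral_neighbor,
                             std::vector<int>& counts)
{
  bool found = false;
  for (std::size_t i = 0; i < relevant_masses.size(); ++i)
  {
    if (std::abs(relevant_masses[i] - candidate_mass) > tolerance_da) continue; // precursor filter
    if (is_spectral_neighbor(i))
    {
      ++counts[i]; // keep iterating: every match updates its counter
      found = true;
    }
  }
  return found;
}
```

Not returning at the first hit costs some extra spectrum comparisons. In return, the per-peptide counters reflect every neighbor, which is what the later statistics need.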

◆ isNeighborSpectrum()

static bool isNeighborSpectrum (const MSSpectrum &spec1, const MSSpectrum &spec2, const double min_shared_ion_fraction, const double mz_bin_size)
static

Whether two spectra share enough peaks (in mz_bin_size m/z bins) to be considered neighbors.

Computes the Dice-style fraction 2 * shared_peaks / (spec1.size() + spec2.size()) and returns true when it is strictly greater than min_shared_ion_fraction. All peaks of both spectra are considered.

Parameters
[in] spec1: First theoretical spectrum.
[in] spec2: Second theoretical spectrum.
[in] min_shared_ion_fraction: Minimum required Dice-style shared-peak fraction (in [0, 1]).
[in] mz_bin_size: Bin size for m/z comparison (typical values: 0.05 Th for high-resolution data, 1.0005079 Th for low-resolution data).
Returns
true when the shared fraction is strictly greater than min_shared_ion_fraction.
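The decision rule above reduces to a little arithmetic. For example, 6 shared peaks between spectra of 10 and 12 peaks give 2 * 6 / 22, which is about 0.545. This passes a threshold of 0.5, while 5 shared peaks out of 10 + 10 give exactly 0.5 and fail because the comparison is strict. The sketch below takes the shared-peak count as input. The function names are hypothetical, and its empty-spectrum guard is an assumption of the sketch, not documented behavior.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Dice-style shared-peak fraction: 2 * shared / (n1 + n2).
double sharedIonFraction(int shared_peaks, std::size_t n1, std::size_t n2)
{
  if (n1 + n2 == 0) return 0.0; // sketch-only guard: two empty spectra share nothing
  return 2.0 * shared_peaks / static_cast<double>(n1 + n2);
}

// Neighbor decision with the strict ">" comparison documented above.
bool passesSharedIonThreshold(int shared_peaks, std::size_t n1, std::size_t n2,
                              double min_shared_ion_fraction)
{
  return sharedIonFraction(shared_peaks, n1, n2) > min_shared_ion_fraction;
}
```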

Member Data Documentation

◆ digested_relevant_peptides_

const std::vector<AASequence>& digested_relevant_peptides_
private

digested relevant peptides

◆ mass_position_map_

std::map<double, std::vector<int> > mass_position_map_
private

map of masses to positions in digested_relevant_peptides_

◆ neighbor_stats_

std::vector<int> neighbor_stats_
private

Number of neighbors found per reference peptide, accumulated over calls to isNeighborPeptide().

◆ spec_gen_

TheoreticalSpectrumGenerator spec_gen_
private

for b/y ions with charge 1

◆ x_residue_

const Residue* x_residue_
private

residue for unknown amino acid