OpenMS
NeighborSeq Class Reference

The Neighbor Peptide functionality is designed to find peptides (neighbors) in a given set of sequences (FASTA file) that are similar to a target peptide (aka relevant peptide) based on mass and spectral characteristics. This provides more power when searching complex samples, when only a subset of the peptides/proteins is of interest. More...

#include <OpenMS/ANALYSIS/ID/NeighborSeq.h>

Collaboration diagram for NeighborSeq:
[legend]

Classes

struct  NeighborStats
 Statistics of how many neighbors were found per reference peptide. More...
 

Public Member Functions

 NeighborSeq (std::vector< AASequence > &&digested_relevant_peptides)
 
MSSpectrum generateSpectrum (const AASequence &peptide_sequence)
 Generates a theoretical spectrum for a given peptide sequence with b/y ions at charge 1. More...
 
bool isNeighborPeptide (const AASequence &neighbor_candidate, const double mass_tolerance_pc, const bool mass_tolerance_pc_ppm, const double min_shared_ion_fraction, const double mz_bin_size)
 Is this peptide a neighbor to one of the relevant peptides? More...
 
NeighborStats getNeighborStats () const
 after calling isNeighborPeptide() multiple times, this function returns the statistics of how many neighbors were found per reference peptide More...
 

Static Public Member Functions

static bool isNeighborSpectrum (const MSSpectrum &spec1, const MSSpectrum &spec2, const double min_shared_ion_fraction, const double mz_bin_size)
 Compares two spectra to determine if they share a sufficient number of ions. More...
 
static int computeSharedIonCount (const MSSpectrum &spec1, const MSSpectrum &spec2, const double &mz_bin_size)
 Compute the number of shared ions between two spectra. More...
 

Protected Member Functions

std::map< double, std::vector< int > > createMassLookup_ ()
 Creates a map of masses to positions from the internal relevant peptides. More...
 
auto findCandidatePositions_ (const double mono_weight, double mass_tolerance, const bool mass_tolerance_pc_ppm)
 Finds candidate positions based on a given mono-isotopic weight and mass tolerance. More...
 

Private Attributes

const std::vector< AASequence > & digested_relevant_peptides_
 digested relevant peptides More...
 
std::map< double, std::vector< int > > mass_position_map_
 map of masses to positions in digested_relevant_peptides_ More...
 
TheoreticalSpectrumGenerator spec_gen_
 for b/y ions with charge 1 More...
 
const Residuex_residue_
 residue for unknown amino acid More...
 
std::vector< int > neighbor_stats_
 how many neighbors per reference peptide searched using isNeighborPeptide()? More...
 

Detailed Description

The Neighbor Peptide functionality is designed to find peptides (neighbors) in a given set of sequences (FASTA file) that are similar to a target peptide (aka relevant peptide) based on mass and spectral characteristics. This provides more power when searching complex samples, when only a subset of the peptides/proteins is of interest.

The paper on subset neighbor search is www.ncbi.nlm.nih.gov/pmc/articles/PMC8489664/ DOI: 10.1021/acs.jproteome.1c00483

Constructor & Destructor Documentation

◆ NeighborSeq()

NeighborSeq ( std::vector< AASequence > &&  digested_relevant_peptides)

Constructor

Parameters
digested_relevant_peptidesA vector of digested relevant peptides

Member Function Documentation

◆ computeSharedIonCount()

static int computeSharedIonCount ( const MSSpectrum spec1,
const MSSpectrum spec2,
const double &  mz_bin_size 
)
static

Compute the number of shared ions between two spectra.

All peaks are considered. Use generateSpectrum() to generate theoretical spectra with b/y ions.

Parameters
spec1The first theoretical spectrum.
spec2The second theoretical spectrum.
mz_bin_sizeBin size for the m/z values, which determines if two peaks are considered to be the same.
Returns
The number of shared ions

◆ createMassLookup_()

std::map<double, std::vector<int> > createMassLookup_ ( )
protected

Creates a map of masses to positions from the internal relevant peptides.

Returns
A map where the key is the mass and the value is a vector of positions.

◆ findCandidatePositions_()

auto findCandidatePositions_ ( const double  mono_weight,
double  mass_tolerance,
const bool  mass_tolerance_pc_ppm 
)
protected

Finds candidate positions based on a given mono-isotopic weight and mass tolerance.

Parameters
mono_weightThe mono-isotopic weight to find candidates for.
mass_toleranceThe allowed tolerance for matching the mass.
mass_tolerance_pc_ppmWhether the mass tolerance is in ppm.
Returns
A pair of begin/end iterators into mass_position_map_ for the candidate positions

◆ generateSpectrum()

MSSpectrum generateSpectrum ( const AASequence peptide_sequence)

Generates a theoretical spectrum for a given peptide sequence with b/y ions at charge 1.

Includes all b and y ions with charge 1 (even the prefix ions, e.g. b1), but no internal ions.

Parameters
peptide_sequenceThe peptide sequence for which to generate the spectrum.
Returns
The generated theoretical spectrum.

◆ getNeighborStats()

NeighborStats getNeighborStats ( ) const

after calling isNeighborPeptide() multiple times, this function returns the statistics of how many neighbors were found per reference peptide

◆ isNeighborPeptide()

bool isNeighborPeptide ( const AASequence neighbor_candidate,
const double  mass_tolerance_pc,
const bool  mass_tolerance_pc_ppm,
const double  min_shared_ion_fraction,
const double  mz_bin_size 
)

Is this peptide a neighbor to one of the relevant peptides?

Also updates the internal statistics, which can be retrieved using getNeighborStats().

Parameters
neighbor_candidateThe peptide sequence (from a neighbor protein) to compare against the internal relevant peptides (see constructor).
mass_tolerance_pcMaximal precursor mass difference (in Da or ppm; see 'mass_tolerance_pc_ppm') between neighbor and relevant peptide.
mass_tolerance_pc_ppmIs 'mass_tolerance_pc' in Da or ppm?
min_shared_ion_fractionThe ion tolerance for neighbor peptides.
mz_bin_sizeBin size for spectra m/z comparison (the original study suggests 0.05 Th for high-res and 1.0005079 Th for low-res spectra).
Returns
true if neighbor_candidate is neighbor to one or more relevant peptides, false otherwise.

◆ isNeighborSpectrum()

static bool isNeighborSpectrum ( const MSSpectrum spec1,
const MSSpectrum spec2,
const double  min_shared_ion_fraction,
const double  mz_bin_size 
)
static

Compares two spectra to determine if they share a sufficient number of ions.

All peaks are considered. Use generateSpectrum() to generate theoretical spectra with b/y ions.

Parameters
spec1The first theoretical spectrum.
spec2The second theoretical spectrum.
min_shared_ion_fractionThe minimal required proportion of shared ions in [0, 1]
mz_bin_sizeBin size for the m/z values, which determines if two peaks are considered to be the same (typically, 0.05 for high resolution and 1.0005079 for low resolution).
Returns
True if the spectra share a sufficient number of ions, false otherwise.

Member Data Documentation

◆ digested_relevant_peptides_

const std::vector<AASequence>& digested_relevant_peptides_
private

digested relevant peptides

◆ mass_position_map_

std::map<double, std::vector<int> > mass_position_map_
private

map of masses to positions in digested_relevant_peptides_

◆ neighbor_stats_

std::vector<int> neighbor_stats_
private

how many neighbors per reference peptide searched using isNeighborPeptide()?

◆ spec_gen_

TheoreticalSpectrumGenerator spec_gen_
private

for b/y ions with charge 1

◆ x_residue_

const Residue* x_residue_
private

residue for unknown amino acid