MapAlignmentAlgorithmIdentification Class Reference

A map alignment algorithm based on peptide identifications from MS2 spectra.

#include <OpenMS/ANALYSIS/MAPMATCHING/MapAlignmentAlgorithmIdentification.h>

Public Member Functions

 MapAlignmentAlgorithmIdentification ()
 Default constructor.
 ~MapAlignmentAlgorithmIdentification () override
 Destructor.
template<typename DataType >
void setReference (DataType &data)
template<typename DataType >
void align (std::vector< DataType > &data, std::vector< TransformationDescription > &transformations, Int reference_index=-1)
 Align feature maps, consensus maps, peak maps, or peptide identifications.

Detailed Description

A map alignment algorithm based on peptide identifications from MS2 spectra.

PeptideIdentification instances are grouped by sequence of the respective best-scoring PeptideHit and retention time data is collected (PeptideIdentification::getRT()). ID groups with the same sequence in different maps represent points of correspondence between the maps and form the basis of the alignment. Only the best PSM per spectrum is considered as the correct identification.

Each map is aligned to a reference retention time scale. This time scale can either come from a reference file (reference parameter) or be computed as a consensus of the input maps (median retention times over all maps of the ID groups). The maps are then aligned to this scale as follows:
The median retention time of each ID group in a map is mapped to the reference retention time of this group. Cubic spline smoothing is used to convert this mapping to a smooth function. Retention times in the map are transformed to the consensus scale by applying this function.
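The first steps of this procedure can be sketched with standard-library types only. This is a minimal illustration, not the class's implementation: the aliases stand in for the SeqToList and SeqToValue typedefs, the helper names (median, mediansPerMap, consensusReference, anchorPoints) are hypothetical, and the spline smoothing step is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for the SeqToList / SeqToValue typedefs (assumption: std types only).
using SeqToList = std::map<std::string, std::vector<double>>;
using SeqToValue = std::map<std::string, double>;

// Median of a non-empty list; taken by value so the caller's data is untouched.
double median(std::vector<double> values)
{
  std::sort(values.begin(), values.end());
  const std::size_t n = values.size();
  return (n % 2 == 1) ? values[n / 2] : 0.5 * (values[n / 2 - 1] + values[n / 2]);
}

// Median RT of each ID group (peptide sequence) within one map.
SeqToValue mediansPerMap(const SeqToList& rt_data)
{
  SeqToValue result;
  for (const auto& entry : rt_data)
  {
    if (!entry.second.empty()) result[entry.first] = median(entry.second);
  }
  return result;
}

// Consensus reference scale: median over all maps of the per-map medians,
// keeping only sequences that occur in at least min_run_occur maps.
SeqToValue consensusReference(const std::vector<SeqToValue>& per_map, std::size_t min_run_occur)
{
  SeqToList collected;
  for (const auto& map_medians : per_map)
  {
    for (const auto& entry : map_medians) collected[entry.first].push_back(entry.second);
  }
  SeqToValue reference;
  for (const auto& entry : collected)
  {
    if (entry.second.size() >= min_run_occur) reference[entry.first] = median(entry.second);
  }
  return reference;
}

// Points of correspondence (RT in map, RT on reference scale) for one map.
std::vector<std::pair<double, double>> anchorPoints(const SeqToValue& map_medians, const SeqToValue& reference)
{
  std::vector<std::pair<double, double>> points;
  for (const auto& entry : map_medians)
  {
    const auto it = reference.find(entry.first);
    if (it != reference.end()) points.emplace_back(entry.second, it->second);
  }
  return points;
}
```

The resulting point pairs are what a smoothing model would be fitted to; the actual class stores the outcome in TransformationDescription objects.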

Parameters of this class are:

score_type (string; default: empty)
    Name of the score type to use for ranking and filtering (.oms input only). If left empty, a score type is picked automatically.

score_cutoff (string; default: false; allowed: true, false)
    Use only IDs above a score cut-off (parameter 'min_score') for alignment?

min_score (float; default: 0.05)
    If 'score_cutoff' is 'true': Minimum score for an ID to be considered.
    Unless you have very few runs or identifications, increase this value to focus on more informative peptides.

min_run_occur (int; default: 2; min: 2)
    Minimum number of runs (incl. reference, if any) in which a peptide must occur to be used for the alignment.
    Unless you have very few runs or identifications, increase this value to focus on more informative peptides.

max_rt_shift (float; default: 0.5; min: 0.0)
    Maximum realistic RT difference for a peptide (median per run vs. reference). Peptides with higher shifts (outliers) are not used to compute the alignment.
    If 0, no limit (disable filter); if > 1, the final value in seconds; if <= 1, taken as a fraction of the range of the reference RT scale.

use_unassigned_peptides (string; default: true; allowed: true, false)
    Should unassigned peptide identifications be used when computing an alignment of feature or consensus maps? If 'false', only peptide IDs assigned to features will be used.

use_feature_rt (string; default: false; allowed: true, false)
    When aligning feature or consensus maps, don't use the retention time of a peptide identification directly; instead, use the retention time of the centroid of the feature (apex of the elution profile) that the peptide was matched to. If different identifications are matched to one feature, only the peptide closest to the centroid in RT is used.
    Precludes 'use_unassigned_peptides'.

use_adducts (string; default: true; allowed: true, false)
    If IDs contain adducts, treat differently adducted variants of the same molecule as different.

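The three cases of 'max_rt_shift' can be illustrated with a small helper. This is a sketch under assumptions: the function names resolveMaxRtShift and withinShift are hypothetical, the SeqToValue alias is a std-only stand-in, and a disabled filter is represented here by an infinite limit.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <map>
#include <string>

// Stand-in for the SeqToValue typedef (assumption: std types only).
using SeqToValue = std::map<std::string, double>;

// Turn the 'max_rt_shift' parameter into an absolute limit in seconds.
//   0      -> no limit (returned as +infinity)
//   > 1    -> already a value in seconds
//   <= 1   -> fraction of the range of the reference RT scale
double resolveMaxRtShift(double max_rt_shift, const SeqToValue& reference)
{
  if (max_rt_shift == 0.0) return std::numeric_limits<double>::infinity();
  if (max_rt_shift > 1.0) return max_rt_shift;
  if (reference.empty()) return 0.0; // no reference scale, so the range is zero
  double lowest = reference.begin()->second;
  double highest = lowest;
  for (const auto& entry : reference)
  {
    lowest = std::min(lowest, entry.second);
    highest = std::max(highest, entry.second);
  }
  return max_rt_shift * (highest - lowest);
}

// A peptide is kept if its per-run median RT lies within the limit of the reference RT.
bool withinShift(double map_rt, double reference_rt, double limit)
{
  return std::fabs(map_rt - reference_rt) <= limit;
}
```

For a reference scale spanning 100 s to 1100 s, the default value 0.5 resolves to a limit of 500 s, whereas a value of 30 is used directly as 30 s.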

Member Typedef Documentation

◆ SeqToList

typedef std::map<String, DoubleList> SeqToList

Type to store retention times given for individual peptide sequences.

◆ SeqToValue

typedef std::map<String, double> SeqToValue

Type to store one representative retention time per peptide sequence.

Constructor & Destructor Documentation

◆ MapAlignmentAlgorithmIdentification() [1/2]

Default constructor.

◆ ~MapAlignmentAlgorithmIdentification()


◆ MapAlignmentAlgorithmIdentification() [2/2]

Copy constructor intentionally not implemented (declared private).

Member Function Documentation

◆ align()

void align ( std::vector< DataType > &  data,
std::vector< TransformationDescription > &  transformations,
Int  reference_index = -1 
)

Align feature maps, consensus maps, peak maps, or peptide identifications.

Parameters
data  Vector of input data (FeatureMap, ConsensusMap, PeakMap or vector<PeptideIdentification>) that should be aligned.
transformations  Vector of RT transformations that will be computed.
reference_index  Index in data of the reference to align to, if any.

Exceptions
Exception::MissingInformation  Not enough suitable RT data to perform alignment.

◆ checkParameters_()

void checkParameters_ ( const Size  runs)

Check that parameter values are valid.

Currently only 'min_run_occur' is checked.

Parameters
runs  Number of runs (input files) to be aligned.

◆ computeMedians_()

void computeMedians_ ( SeqToList rt_data,
SeqToValue medians,
bool  sorted = false 
)

Compute the median retention time for each peptide sequence.

Parameters
rt_data  Lists of RT values for different peptide sequences (input, will be sorted).
medians  Median RT values for the peptide sequences (output).
sorted  Are RT lists already sorted?

Exceptions
Exception::IllegalArgument  If the input list is empty.
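The documented behaviour (sort unless already sorted, one median per sequence, error on an empty list) could look roughly like the following. This is a hedged sketch, not the OpenMS source: computeMediansSketch is a hypothetical name, std::invalid_argument stands in for Exception::IllegalArgument, and averaging the two middle values for even-sized lists is an assumed convention.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-ins for the SeqToList / SeqToValue typedefs (assumption: std types only).
using SeqToList = std::map<std::string, std::vector<double>>;
using SeqToValue = std::map<std::string, double>;

// Compute one median RT per peptide sequence.
// rt_data is sorted in place unless the caller states it is already sorted.
void computeMediansSketch(SeqToList& rt_data, SeqToValue& medians, bool sorted = false)
{
  medians.clear();
  for (auto& entry : rt_data)
  {
    std::vector<double>& rts = entry.second;
    if (rts.empty())
    {
      // stand-in for Exception::IllegalArgument
      throw std::invalid_argument("empty RT list for sequence " + entry.first);
    }
    if (!sorted) std::sort(rts.begin(), rts.end());
    const std::size_t n = rts.size();
    // assumed convention: mean of the two middle values for even-sized lists
    medians[entry.first] = (n % 2 == 1) ? rts[n / 2] : 0.5 * (rts[n / 2 - 1] + rts[n / 2]);
  }
}
```

Passing sorted = true only skips the sort; the caller is then responsible for the lists really being in ascending order.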

◆ computeTransformations_()

void computeTransformations_ ( std::vector< SeqToList > &  rt_data,
std::vector< TransformationDescription > &  transforms,
bool  sorted = false 
)

Compute retention time transformations from RT data grouped by peptide sequence.

Parameters
rt_data  Lists of RT values for different peptide sequences, per dataset (input, will be sorted).
transforms  Resulting transformations, per dataset (output).
sorted  Are RT lists already sorted?

◆ getReference_()

void getReference_ ( )

Get reference retention times.

If a reference file is supplied via the reference parameter, extract retention time information and store it in reference_.

◆ getRetentionTimes_() [1/4]

bool getRetentionTimes_ ( IdentificationData id_data,
SeqToList rt_data 
)

Collect retention time data from spectrum matches.

Parameters
id_data  Input identification data.
rt_data  Lists of RT values for different spectrum matches (output).

Returns
Are the RTs already sorted? (Here: false)

◆ getRetentionTimes_() [2/4]

bool getRetentionTimes_ ( MapType features,
SeqToList rt_data 
)

Collect retention time data from peptide IDs contained in feature maps or consensus maps.

The following parameters influence the processing:
  • use_unassigned_peptides: if set, unassigned peptide IDs are used in addition to IDs annotated to features.
  • use_feature_rt: if set, feature retention times are used instead of peptide retention times. Mutually exclusive with use_unassigned_peptides.
  • score_cutoff and min_score: if the cut-off is enabled, only peptide IDs that reach the minimum score are used. Whether a higher score is better is determined from the first peptide ID encountered, so make sure all peptide IDs share the same score orientation. This filter is not yet applied in combination with use_feature_rt.

Parameters
features  Input features for RT data.
rt_data  Lists of RT values for different peptide sequences (output).

Returns
Are the RTs already sorted? (Here: true)

References MSExperiment::begin(), and MSExperiment::end().

◆ getRetentionTimes_() [3/4]

bool getRetentionTimes_ ( PeakMap experiment,
SeqToList rt_data 
)

Collect retention time data from peptide IDs annotated to spectra.

Parameters
experiment  Input map for RT data.
rt_data  Lists of RT values for different peptide sequences (output).

Returns
Are the RTs already sorted? (Here: false)

◆ getRetentionTimes_() [4/4]

bool getRetentionTimes_ ( std::vector< PeptideIdentification > &  peptides,
SeqToList rt_data 
)

Collect retention time data from peptide IDs.

Parameters
peptides  Input peptide IDs (lists of peptide hits will be sorted).
rt_data  Lists of RT values for different peptide sequences (output).

Returns
Are the RTs already sorted? (Here: false)
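As an illustration of collecting RT data from peptide IDs, the sketch below groups retention times by the sequence of the best-scoring hit. It is only an approximation of the idea: HitSketch and IdSketch are hypothetical stand-ins for PeptideHit and PeptideIdentification, collectRetentionTimes is a hypothetical name, and the score orientation and cut-off are passed explicitly here instead of being read from class members.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Stand-in for the SeqToList typedef (assumption: std types only).
using SeqToList = std::map<std::string, std::vector<double>>;

// Hypothetical, simplified stand-ins for PeptideHit and PeptideIdentification.
struct HitSketch { std::string sequence; double score; };
struct IdSketch { double rt; std::vector<HitSketch> hits; };

// Append the RT of every ID to the list of its best hit's sequence.
// Returns whether the collected RT lists are sorted (here: false).
bool collectRetentionTimes(std::vector<IdSketch>& ids, SeqToList& rt_data,
                           bool higher_score_better, bool score_cutoff, double min_score)
{
  for (IdSketch& id : ids)
  {
    if (id.hits.empty()) continue;
    // sort hits so that the best-scoring one comes first (modifies the input)
    std::sort(id.hits.begin(), id.hits.end(),
              [higher_score_better](const HitSketch& a, const HitSketch& b) {
                return higher_score_better ? a.score > b.score : a.score < b.score;
              });
    const HitSketch& best = id.hits.front();
    if (score_cutoff)
    {
      const bool passes = higher_score_better ? best.score >= min_score
                                              : best.score <= min_score;
      if (!passes) continue; // ID does not reach the minimum score
    }
    rt_data[best.sequence].push_back(id.rt);
  }
  return false; // RTs are appended in input order, not sorted
}
```

Only the top hit of each ID contributes, which mirrors the statement that the best PSM per spectrum is taken as the correct identification.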

◆ handleIdDataScoreType_()

IdentificationData::ScoreTypeRef handleIdDataScoreType_ ( const IdentificationData id_data)

Helper function to find/define the score type for processing IdentificationData.

Returns
Reference to the score type denoted by algorithm parameter "score_type".

◆ operator=()

Assignment operator intentionally not implemented (declared private).

◆ setReference()

void setReference ( DataType &  data)

Member Data Documentation

◆ better_

bool(* better_) (double, double) = [](double, double) {return true;}

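Function pointer used to decide whether a score is better than a comparison value. The default accepts any score.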

◆ min_run_occur_

Size min_run_occur_

Minimum number of runs a peptide must occur in.

◆ min_score_

double min_score_

Minimum score to reach for a peptide to be considered.

◆ reference_

SeqToValue reference_

Reference retention times (per peptide sequence)

◆ reference_index_

Int reference_index_

Index of input file to use as reference (if any)

◆ score_cutoff_

bool score_cutoff_ {}

Actually apply the score cut-off defined by min_score_? Needed because it is hard for a user to specify a score that filters out nothing.

◆ score_type_

String score_type_

Score type to use for filtering.

◆ use_adducts_

bool use_adducts_ {}

Consider differently adducted IDs as different?

◆ use_feature_rt_

bool use_feature_rt_ {}

Use feature RT instead of RT from best peptide ID in the feature?