OpenMS  2.8.0
StablePairFinder Class Reference

This class implements a pair finding algorithm for consensus features.

#include <OpenMS/ANALYSIS/MAPMATCHING/StablePairFinder.h>

Public Types

typedef BaseGroupFinder Base
 Base class.
Public Member Functions

 StablePairFinder ()
 Constructor.
 ~StablePairFinder () override
 Destructor.
void run (const std::vector< ConsensusMap > &input_maps, ConsensusMap &result_map) override
 Run the algorithm.
static BaseGroupFinder* create ()
 Returns an instance of this class.
static const String getProductName ()
 Returns the name of this module.
Internal helper classes and enums

enum  { RT = Peak2D::RT , MZ = Peak2D::MZ }
double second_nearest_gap_
 The distance to the second-nearest neighbors must be larger by this factor than the distance to the matched element itself.
bool use_IDs_
 Only match if peptide IDs are compatible?
void updateMembers_ () override
 This method is used to update extra member variables at the end of the setParameters() method.
bool compatibleIDs_ (const ConsensusFeature &feat1, const ConsensusFeature &feat2) const
 Checks if the peptide IDs of two features are compatible.
const AASequence& getBestHitSequence_ (const PeptideIdentification &peptideIdentification) const
 Returns the sequence of the highest-scoring peptide hit in the given peptide identification.

Detailed Description

This class implements a pair finding algorithm for consensus features.

It offers a method to determine pairs across two consensus maps. The corresponding consensus features must be aligned, but may have small position deviations.

The distance measure is implemented in class FeatureDistance - see there for details.

Additional criteria for pairing

Depending on parameter use_identifications, peptide identifications annotated to the features may have to be compatible (i.e. no annotation or the same annotation) for a pairing to occur.

Stability criterion: The distance to the nearest neighbor must be smaller than the distance to the second-nearest neighbor by a certain factor, see parameter second_nearest_gap. There is a non-trivial relation between this parameter and the maximum allowed difference (in RT or m/z) of the distance measure: If second_nearest_gap is greater than one, lowering max_difference may in fact lead to more - rather than fewer - pairings, because it increases the distance difference between the nearest and the second-nearest neighbor, so that the constraint imposed by second_nearest_gap may be fulfilled more often.
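
The stability criterion can be sketched as a small predicate. This is a minimal illustration, not the OpenMS implementation: the function name isStablePair and its argument names are hypothetical, and the inequalities are the ones stated in the quality section of this page ($ g \cdot d_{i,j} \leq d_{2,i} $ and $ g \cdot d_{i,j} \leq d_{2,j} $). It assumes that a missing second-nearest neighbor is represented by an infinite distance.

```cpp
#include <limits>

// Hypothetical helper, not part of the OpenMS API.
// d_ij : distance between the candidate partners i and j
// d2_i : distance from i to its second-nearest neighbor
// d2_j : distance from j to its second-nearest neighbor
// gap  : value of the parameter second_nearest_gap (>= 1)
// Assumption: pass infinity for d2_i / d2_j if no second neighbor is in range.
bool isStablePair(double d_ij, double d2_i, double d2_j, double gap)
{
  // The second-nearest neighbor must be at least 'gap' times farther away
  // than the matched partner, seen from both sides of the pair.
  return gap * d_ij <= d2_i && gap * d_ij <= d2_j;
}

// Convenience constant for "no second neighbor" in this sketch.
const double kNoNeighbor = std::numeric_limits<double>::infinity();
```

With gap = 2.0, a candidate pair at distance 0.1 passes if both second-nearest neighbors are at least 0.2 away, and fails as soon as one of them is closer than that.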

Quality calculation

The quality of a pairing is computed from the distance between the paired elements (nearest neighbors) and the distances to the second-nearest neighbors of both elements, according to the formula:

\[ q_{i,j} = \big( 1 - d_{i,j} \big) \cdot \big( 1 - \frac{g \cdot d_{i,j}}{d_{2,i}} \big) \cdot \big( 1 - \frac{g \cdot d_{i,j}}{d_{2,j}} \big) \]

$ q_{i,j} $ is the quality of the pairing of elements i and j, $ d_{i,j} $ is the distance between the two, $ d_{2,i} $ and $d_{2,j} $ are the distances to the second-nearest neighbors of i and j, respectively, and g is the factor defined by parameter second_nearest_gap.

Note that by the definition of the distance measure, $ 0 \leq d_{i,j} \leq 1 $ if i and j are to form a pair. The criteria for pairing further require that $ g \cdot d_{i,j} \leq d_{2,i} $ and $ g \cdot d_{i,j} \leq d_{2,j} $. This ensures that the resulting quality is always between one (best) and zero (worst).

For the final quality q of the consensus feature produced by merging two paired elements (i and j), the existing quality values of the two elements are taken into account. The final quality is a weighted average of the existing qualities ( $ q_i $ and $ q_j $) and the quality of the pairing ( $ q_{i,j} $, see above):

\[ q = \frac{q_{i,j} + (s_i - 1) \cdot q_i + (s_j - 1) \cdot q_j}{s_i + s_j - 1} \]

The weighting factors $ s_i $ and $ s_j $ are the sizes (i.e. numbers of subelements) of the two consensus features i and j. That way, it is possible to link several feature maps to a growing consensus map in a stepwise fashion (as done by FeatureGroupingAlgorithmUnlabeled), and in the end obtain quality values that incorporate the qualities of all pairings that occurred during the generation of a consensus feature. Note that "missing" elements (if a consensus feature does not contain sub-features from all input maps) are not punished in this definition of quality.

Parameters of this class are:

second_nearest_gap (float, default: 2.0, min: 1.0): Only link features whose distance to the second-nearest neighbors (for both sides) is larger by a factor of 'second_nearest_gap' than the distance between the matched pair itself.
use_identifications (string, default: false, allowed: true, false): Never link features that are annotated with different peptides (features without IDs always match; only the best hit per peptide identification is considered).
ignore_charge (string, default: false, allowed: true, false): false [default]: pairing requires equal charge state (or at least one unknown charge '0'); true: pairing irrespective of charge state.
ignore_adduct (string, default: true, allowed: true, false): false: pairing requires equal adducts (or at least one without adduct annotation); true [default]: pairing irrespective of adducts.
distance_RT:max_difference (float, default: 100.0, min: 0.0): Never pair features with a larger RT distance (in seconds).
distance_RT:exponent (float, default: 1.0, min: 0.0): Normalized RT differences ([0-1], relative to 'max_difference') are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow).
distance_RT:weight (float, default: 1.0, min: 0.0): Final RT distances are weighted by this factor.
distance_MZ:max_difference (float, default: 0.3, min: 0.0): Never pair features with a larger m/z distance (unit defined by 'unit').
distance_MZ:unit (string, default: Da, allowed: Da, ppm): Unit of the 'max_difference' parameter.
distance_MZ:exponent (float, default: 2.0, min: 0.0): Normalized m/z differences ([0-1], relative to 'max_difference') are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow).
distance_MZ:weight (float, default: 1.0, min: 0.0): Final m/z distances are weighted by this factor.
distance_intensity:exponent (float, default: 1.0, min: 0.0): Differences in relative intensity ([0-1]) are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow).
distance_intensity:weight (float, default: 0.0, min: 0.0): Final intensity distances are weighted by this factor.
distance_intensity:log_transform (string, default: disabled, allowed: enabled, disabled): Log-transform intensities? If disabled, d = |int_f2 - int_f1| / int_max. If enabled, d = |log(int_f2 + 1) - log(int_f1 + 1)| / log(int_max + 1).
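
The two intensity formulas given for distance_intensity:log_transform can be sketched as follows. This is an illustration only: the function name intensityDistance is hypothetical, int_max is taken as given (this page does not say how it is determined), and the subsequent exponent and weight steps are left out because their combination is defined in FeatureDistance.

```cpp
#include <cmath>

// Hypothetical helper, not part of the OpenMS API.
// int_f1, int_f2 : intensities of the two features (assumed >= 0)
// int_max        : normalization intensity from the formula (assumed > 0)
// log_transform  : true corresponds to the setting 'enabled'
double intensityDistance(double int_f1, double int_f2, double int_max,
                         bool log_transform)
{
  if (log_transform)
  {
    // d = |log(int_f2 + 1) - log(int_f1 + 1)| / log(int_max + 1)
    return std::abs(std::log(int_f2 + 1.0) - std::log(int_f1 + 1.0))
           / std::log(int_max + 1.0);
  }
  // d = |int_f2 - int_f1| / int_max
  return std::abs(int_f2 - int_f1) / int_max;
}
```

For intensities 100 and 50 with int_max = 200, the untransformed distance is 0.25. The log-transformed variant compresses large intensity ratios, which may be preferable when intensities span several orders of magnitude.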


Member Typedef Documentation

◆ Base

Base class.

Member Enumeration Documentation

◆ anonymous enum

anonymous enum

Constructor & Destructor Documentation

◆ StablePairFinder()

StablePairFinder ( )


◆ ~StablePairFinder()

~StablePairFinder ( )


Member Function Documentation

◆ compatibleIDs_()

bool compatibleIDs_ ( const ConsensusFeature & feat1,
const ConsensusFeature & feat2
) const

Checks if the peptide IDs of two features are compatible.

A feature without identification is always compatible. Otherwise, two features are compatible if the best peptide hits of their identifications have the same sequences.
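
The compatibility rule can be sketched with standard types. This is a simplified stand-in, not the actual method: the real function operates on ConsensusFeature objects and compares AASequence values, whereas this sketch assumes each feature has already been reduced to an optional best-hit sequence string. The name compatibleIds is hypothetical, and the code requires C++17 for std::optional.

```cpp
#include <optional>
#include <string>

// Hypothetical helper, not part of the OpenMS API.
// Each argument holds the best-hit peptide sequence of a feature,
// or std::nullopt if the feature carries no identification.
bool compatibleIds(const std::optional<std::string>& best_hit_1,
                   const std::optional<std::string>& best_hit_2)
{
  // A feature without identification is compatible with anything.
  if (!best_hit_1 || !best_hit_2)
  {
    return true;
  }
  // Otherwise the best-hit sequences must be identical.
  return *best_hit_1 == *best_hit_2;
}
```

Under this rule, an unidentified feature may pair with any partner, while two identified features pair only if their best hits agree.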

◆ create()

static BaseGroupFinder* create ( )

Returns an instance of this class.

◆ getBestHitSequence_()

const AASequence& getBestHitSequence_ ( const PeptideIdentification & peptideIdentification ) const

Returns the sequence of the highest-scoring peptide hit in the given peptide identification.

Parameters
peptideIdentification: The peptide identification to scan.

◆ getProductName()

static const String getProductName ( )

Returns the name of this module.

◆ run()

void run ( const std::vector< ConsensusMap > & input_maps,
ConsensusMap & result_map
) override

Run the algorithm.

Exactly two input maps must be provided.
Exception::IllegalArgument is thrown if the input data is not valid.

Implements BaseGroupFinder.

◆ updateMembers_()

void updateMembers_ ( )

This method is used to update extra member variables at the end of the setParameters() method.

Also call it at the end of the derived classes' copy constructor and assignment operator.

The default implementation is empty.

Reimplemented from DefaultParamHandler.

Member Data Documentation

◆ second_nearest_gap_

double second_nearest_gap_

The distance to the second-nearest neighbors must be larger by this factor than the distance to the matched element itself.

◆ use_IDs_

bool use_IDs_

Only match if peptide IDs are compatible?