MapAlignerPoseClustering

Corrects retention time distortions between maps, using a pose clustering approach.

Potential predecessor tools: FeatureFinderCentroided (or another feature finding algorithm)
→ MapAlignerPoseClustering →
Potential successor tools: FeatureLinkerUnlabeled or FeatureLinkerUnlabeledQT

This tool provides an algorithm to align the retention time scales of multiple input files, correcting shifts and distortions between them. Retention time adjustment may be necessary to correct for chromatography differences, e.g. before data from multiple LC-MS runs can be combined (feature grouping) or when one run should be annotated with peptide identifications obtained in a different run.

All map alignment tools (MapAligner...) collect retention time data from the input files and - by fitting a model to this data - compute transformations that map all runs to a common retention time scale. They can apply the transformations right away and return output files with aligned time scales (parameter out), and/or return descriptions of the transformations in trafoXML format (parameter trafo_out). Transformations stored as trafoXML can be applied to arbitrary files with the MapRTTransformer tool.

The map alignment tools differ in how they obtain retention time data for the modeling of transformations, and consequently what types of data they can be applied to. The alignment algorithm implemented here is the pose clustering algorithm as described in doi:10.1093/bioinformatics/btm209. It is used to find an affine transformation, which is further refined by a feature grouping step. This algorithm can be applied to features (featureXML) and peaks (mzML), but it has mostly been developed and tested on features. For more details and algorithm-specific parameters (set in the INI file) see "Detailed Description" in the algorithm documentation.
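The voting idea behind pose clustering can be illustrated with a minimal sketch. It is not the OpenMS implementation: the function name `estimate_affine` is hypothetical, the keyword arguments only mirror the 'superimposer' parameters documented for this tool, and the sketch votes on (scaling, shift) directly with unit weights, whereas the real algorithm hashes shifts at the ends of the retention time interval. Each map is assumed to be a list of (rt, mz) tuples.

```python
from collections import Counter
from itertools import combinations


def estimate_affine(ref, other, mz_pair_max_distance=0.5,
                    rt_pair_distance_fraction=0.1,
                    scaling_bucket_size=0.005, shift_bucket_size=3.0,
                    max_scaling=2.0, max_shift=1000.0):
    """Vote for (scaling, shift) such that rt_ref ~ scaling * rt_other + shift.

    Simplified illustration: every pair of points in `ref` is matched against
    every pair in `other` with compatible m/z values, and each match casts one
    vote for the affine transformation it implies.
    """
    def rt_span(points):
        rts = [rt for rt, _ in points]
        return max(rts) - min(rts)

    min_gap_ref = rt_pair_distance_fraction * rt_span(ref)
    min_gap_other = rt_pair_distance_fraction * rt_span(other)

    votes = Counter()
    # Sorting by RT guarantees rt_2 >= rt_1 within each pair.
    for (rt_a1, mz_a1), (rt_a2, mz_a2) in combinations(sorted(ref), 2):
        if rt_a2 - rt_a1 < min_gap_ref:
            continue
        for (rt_b1, mz_b1), (rt_b2, mz_b2) in combinations(sorted(other), 2):
            if rt_b2 == rt_b1 or rt_b2 - rt_b1 < min_gap_other:
                continue
            if (abs(mz_a1 - mz_b1) > mz_pair_max_distance
                    or abs(mz_a2 - mz_b2) > mz_pair_max_distance):
                continue
            scaling = (rt_a2 - rt_a1) / (rt_b2 - rt_b1)
            shift = rt_a1 - scaling * rt_b1
            if not (1.0 / max_scaling <= scaling <= max_scaling):
                continue
            if abs(shift) > max_shift:
                continue
            bucket = (round(scaling / scaling_bucket_size),
                      round(shift / shift_bucket_size))
            votes[bucket] += 1

    if not votes:
        raise ValueError("no compatible point pairs found")
    (scaling_idx, shift_idx), _ = votes.most_common(1)[0]
    return scaling_idx * scaling_bucket_size, shift_idx * shift_bucket_size
```

The returned values are bucket centers, so their resolution is limited by the two bucket sizes; this is one reason a refinement step on matched features is useful afterwards.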

See also
MapAlignerPoseClustering MapRTTransformer

This algorithm uses an affine transformation model.
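As an illustration of what an affine model means here, the following sketch fits rt_ref = scaling * rt + shift to matched retention time pairs by ordinary least squares and applies the result. The function names `fit_affine` and `apply_affine` are hypothetical, and the use of least squares is an assumption made for the example, not a statement about how the tool fits its model.

```python
def fit_affine(pairs):
    """Least-squares fit of rt_ref = scaling * rt + shift.

    `pairs` is a list of (rt, rt_ref) tuples, e.g. from matched features.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("all input retention times are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    scaling = sxy / sxx
    shift = mean_y - scaling * mean_x
    return scaling, shift


def apply_affine(rts, scaling, shift):
    """Map retention times onto the reference time scale."""
    return [scaling * rt + shift for rt in rts]
```

With only two parameters, the model can correct a constant offset and a linear stretch of the gradient, but not non-linear distortions.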

To speed up the alignment, consider reducing 'max_num_peaks_considered'. If your alignment is not good enough, consider increasing this number (the alignment will take longer, though).
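The trade-off can be pictured with a small sketch of such a cap. The helper `select_most_intense` is hypothetical, and selecting by intensity is an assumption: the parameter documentation states it for 'num_used_points' but not explicitly for 'max_num_peaks_considered'. Features are assumed to be (rt, mz, intensity) tuples.

```python
def select_most_intense(features, max_num_peaks_considered=1000):
    """Keep at most `max_num_peaks_considered` features; -1 keeps all."""
    if max_num_peaks_considered == -1:
        return list(features)
    if len(features) <= max_num_peaks_considered:
        return list(features)
    # Assumption: the strongest signals are the ones worth keeping.
    ranked = sorted(features, key=lambda f: f[2], reverse=True)
    return ranked[:max_num_peaks_considered]
```

Fewer points mean fewer candidate pairs to compare, which shortens the run time, while too few points may leave the alignment without enough evidence.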

The command line parameters of this tool are:

MapAlignerPoseClustering -- Corrects retention time distortions between maps using a pose clustering approach.
Full documentation: http://www.openms.de/doxygen/release/3.2.0/html/TOPP_MapAlignerPoseClustering.html
Version: 3.2.0 Nov 26 2024, 13:16:38, Revision: 962e60f
To cite OpenMS:
 + Pfeuffer, J., Bielow, C., Wein, S. et al. OpenMS 3 enables reproducible analysis of large-scale mass
   spectrometry data. Nat Methods (2024). doi:10.1038/s41592-024-02197-7.

Usage:
  MapAlignerPoseClustering <options>

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed
description or use the --helphelp option

Options (mandatory options marked with '*'):
  -in <files>*               Input files to align (all must have the same file type) (valid formats:
                             'featureXML', 'mzML')
  -out <files>               Output files (same file type as 'in'). This option or 'trafo_out' has to be
                             provided; they can be used together. (valid formats: 'featureXML', 'mzML')
  -trafo_out <files>         Transformation output files. This option or 'out' has to be provided; they can 
                             be used together. (valid formats: 'trafoXML')

Options to define a reference file (use either 'file' or 'index', not both):
  -reference:file <file>     File to use as reference (same file format as input files required) (valid
                             formats: 'featureXML', 'mzML')
  -reference:index <number>  Use one of the input files as reference ('1' for the first file, etc.).
                             If '0', no explicit reference is set - the algorithm will select a reference. 
                             (default: '0') (min: '0')

                             
Common TOPP options:
  -ini <file>                Use the given TOPP INI file
  -threads <n>               Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>          Writes the default configuration file
  --help                     Shows options
  --helphelp                 Shows all options (including advanced)

The following configuration subsections are valid:
 - algorithm   Algorithm parameters section

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
For more information, please consult the online documentation for this tool:
  - http://www.openms.de/doxygen/release/3.2.0/html/TOPP_MapAlignerPoseClustering.html
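The constraints stated in the help text (either 'out' or 'trafo_out' must be given; 'reference:file' and 'reference:index' are mutually exclusive) can be checked before the tool is launched. The following sketch assembles an argument list using only the options listed above; the helper name `build_command` is hypothetical, and the file names in the usage note are placeholders.

```python
def build_command(inputs, outputs=None, trafo_outputs=None,
                  reference_file=None, reference_index=0, threads=1):
    """Assemble an argument list for MapAlignerPoseClustering.

    Only validates the rules spelled out in the tool's help text; the tool
    itself may enforce further checks.
    """
    if not inputs:
        raise ValueError("'-in' is mandatory")
    if not outputs and not trafo_outputs:
        raise ValueError("provide '-out', '-trafo_out', or both")
    if reference_index < 0:
        raise ValueError("'-reference:index' must be >= 0")
    if reference_file is not None and reference_index != 0:
        raise ValueError("use either 'reference:file' or 'reference:index'")

    cmd = ["MapAlignerPoseClustering", "-in", *inputs]
    if outputs:
        cmd += ["-out", *outputs]
    if trafo_outputs:
        cmd += ["-trafo_out", *trafo_outputs]
    if reference_file is not None:
        cmd += ["-reference:file", reference_file]
    elif reference_index != 0:
        cmd += ["-reference:index", str(reference_index)]
    if threads != 1:
        cmd += ["-threads", str(threads)]
    return cmd
```

For example, `build_command(["a.featureXML", "b.featureXML"], trafo_outputs=["a.trafoXML", "b.trafoXML"], reference_index=1)` yields a list that could be passed to `subprocess.run(cmd, check=True)`, assuming the tool is installed and on the PATH.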

INI file documentation of this tool:

Legend:
required parameter
advanced parameter
+ MapAlignerPoseClustering: Corrects retention time distortions between maps using a pose clustering approach.
  version (default: '3.2.0'): Version of the tool that generated this parameters file.
++ 1: Instance '1' section for 'MapAlignerPoseClustering'
  in (default: '[]'): Input files to align (all must have the same file type) (input file; valid formats: '*.featureXML', '*.mzML')
  out (default: '[]'): Output files (same file type as 'in'). This option or 'trafo_out' has to be provided; they can be used together. (output file; valid formats: '*.featureXML', '*.mzML')
  trafo_out (default: '[]'): Transformation output files. This option or 'out' has to be provided; they can be used together. (output file; valid formats: '*.trafoXML')
  log: Name of log file (created only when specified)
  debug (default: '0'): Sets the debug level
  threads (default: '1'): Sets the number of threads allowed to be used by the TOPP tool
  no_progress (default: 'false'): Disables progress logging to command line (valid: 'true', 'false')
  force (default: 'false'): Overrides tool-specific checks (valid: 'true', 'false')
  test (default: 'false'): Enables the test mode (needed for internal use only) (valid: 'true', 'false')
+++ reference: Options to define a reference file (use either 'file' or 'index', not both)
  file: File to use as reference (same file format as input files required) (input file; valid formats: '*.featureXML', '*.mzML')
  index (default: '0'): Use one of the input files as reference ('1' for the first file, etc.). If '0', no explicit reference is set - the algorithm will select a reference. (min: '0')
+++ algorithm: Algorithm parameters section
  max_num_peaks_considered (default: '1000'): The maximal number of peaks/features to be considered per map. To use all, set to '-1'. (min: '-1')
++++ superimposer
  mz_pair_max_distance (default: '0.5'): Maximum m/z deviation of corresponding elements in different maps. This condition applies to the pairs considered in hashing. (min: '0.0')
  rt_pair_distance_fraction (default: '0.1'): Within each of the two maps, the pairs considered for pose clustering must be separated by at least this fraction of the total elution time interval (i.e., max - min). (min: '0.0', max: '1.0')
  num_used_points (default: '2000'): Maximum number of elements considered in each map (selected by intensity). Use this to reduce the running time and to disregard weak signals during alignment. For using all points, set this to -1. (min: '-1')
  scaling_bucket_size (default: '5.0e-03'): The scaling of the retention time interval is hashed into buckets of this size during pose clustering. A good choice for this would be a bit smaller than the error you would expect from repeated runs. (min: '0.0')
  shift_bucket_size (default: '3.0'): The shift at the lower (respectively, higher) end of the retention time interval is hashed into buckets of this size during pose clustering. A good choice for this would be about the time between consecutive MS scans. (min: '0.0')
  max_shift (default: '1000.0'): Maximal shift which is considered during histogramming (in seconds). This applies for both directions. (min: '0.0')
  max_scaling (default: '2.0'): Maximal scaling which is considered during histogramming. The minimal scaling is the reciprocal of this. (min: '1.0')
  dump_buckets: [DEBUG] If non-empty, base filename where hash table buckets will be dumped to. A serial number for each invocation will be appended automatically.
  dump_pairs: [DEBUG] If non-empty, base filename where the individual hashed pairs will be dumped to (large!). A serial number for each invocation will be appended automatically.
++++ pairfinder
  second_nearest_gap (default: '2.0'): Only link features whose distance to the second nearest neighbors (for both sides) is larger by 'second_nearest_gap' than the distance between the matched pair itself. (min: '1.0')
  use_identifications (default: 'false'): Never link features that are annotated with different peptides (features without IDs always match; only the best hit per peptide identification is considered). (valid: 'true', 'false')
  ignore_charge (default: 'false'): false [default]: pairing requires equal charge state (or at least one unknown charge '0'); true: pairing irrespective of charge state (valid: 'true', 'false')
  ignore_adduct (default: 'true'): false: pairing requires equal adducts (or at least one without adduct annotation); true [default]: pairing irrespective of adducts (valid: 'true', 'false')
+++++ distance_RT: Distance component based on RT differences
  max_difference (default: '100.0'): Never pair features with a larger RT distance (in seconds). (min: '0.0')
  exponent (default: '1.0'): Normalized RT differences ([0-1], relative to 'max_difference') are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow) (min: '0.0')
  weight (default: '1.0'): Final RT distances are weighted by this factor (min: '0.0')
+++++ distance_MZ: Distance component based on m/z differences
  max_difference (default: '0.3'): Never pair features with larger m/z distance (unit defined by 'unit') (min: '0.0')
  unit (default: 'Da'): Unit of the 'max_difference' parameter (valid: 'Da', 'ppm')
  exponent (default: '2.0'): Normalized ([0-1], relative to 'max_difference') m/z differences are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow) (min: '0.0')
  weight (default: '1.0'): Final m/z distances are weighted by this factor (min: '0.0')
+++++ distance_intensity: Distance component based on differences in relative intensity (usually relative to highest peak in the whole data set)
  exponent (default: '1.0'): Differences in relative intensity ([0-1]) are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow) (min: '0.0')
  weight (default: '0.0'): Final intensity distances are weighted by this factor (min: '0.0')
  log_transform (default: 'disabled'): Log-transform intensities? If disabled, d = |int_f2 - int_f1| / int_max. If enabled, d = |log(int_f2 + 1) - log(int_f1 + 1)| / log(int_max + 1) (valid: 'enabled', 'disabled')
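How the three distance components interact can be sketched as follows. The per-component formulas and default values follow the parameter descriptions above; the function name `feature_distance`, the restriction to 'Da' as the m/z unit, and the way the components are combined (a weighted sum divided by the total weight) are assumptions made for this example. Features are assumed to be (rt, mz, intensity) tuples.

```python
import math


def feature_distance(f1, f2, int_max,
                     rt_max_difference=100.0, rt_exponent=1.0, rt_weight=1.0,
                     mz_max_difference=0.3, mz_exponent=2.0, mz_weight=1.0,
                     int_exponent=1.0, int_weight=0.0, log_transform=False):
    """Combined distance of two features; math.inf means 'never pair'."""
    d_rt = abs(f2[0] - f1[0])
    d_mz = abs(f2[1] - f1[1])  # assumption: 'unit' is Da
    if d_rt > rt_max_difference or d_mz > mz_max_difference:
        return math.inf

    if log_transform:
        d_int = (abs(math.log(f2[2] + 1) - math.log(f1[2] + 1))
                 / math.log(int_max + 1))
    else:
        d_int = abs(f2[2] - f1[2]) / int_max

    total_weight = rt_weight + mz_weight + int_weight
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")

    # Assumption: components are combined as a normalized weighted sum.
    weighted = (rt_weight * (d_rt / rt_max_difference) ** rt_exponent
                + mz_weight * (d_mz / mz_max_difference) ** mz_exponent
                + int_weight * d_int ** int_exponent)
    return weighted / total_weight
```

With the default intensity weight of 0.0, intensity has no influence on the result, so only RT and m/z differences decide which features are paired.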