Groups corresponding features across label-free experiments.

This tool produces results similar to those of FeatureLinkerUnlabeledQT (FLQT), since it optimizes a similar objective. However, this algorithm is more efficient than FLQT because it uses a kd-tree for fast 2D region queries in m/z-RT space and a sorted binary search tree to choose the best cluster among the remaining ones in O(1). Insertion and search have O(log n) runtime in both the binary search tree and the kd-tree. The overall complexity of the algorithm is O(n log n) time and O(n) space.
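The 2D region query mentioned above can be illustrated with a small kd-tree over (RT, m/z) points. This is a minimal sketch for illustration only, not the OpenMS implementation; the names `Node`, `build` and `range_query` are hypothetical, and the tree is built by repeated sorting rather than by incremental insertion.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (rt, mz); an assumed layout for this sketch


class Node:
    def __init__(self, point: Point, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.point = point
        self.axis = axis      # 0 splits on RT, 1 splits on m/z
        self.left = left
        self.right = right


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a balanced kd-tree by splitting at the median, alternating axes."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))


def range_query(node: Optional[Node], lo: Point, hi: Point,
                out: List[Point]) -> None:
    """Collect all points p with lo[i] <= p[i] <= hi[i] on both axes."""
    if node is None:
        return
    p = node.point
    if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]:
        out.append(p)
    a = node.axis
    # The left subtree holds coordinates <= p[a], the right subtree >= p[a],
    # so a subtree is skipped when the query box cannot reach into it.
    if lo[a] <= p[a]:
        range_query(node.left, lo, hi, out)
    if p[a] <= hi[a]:
        range_query(node.right, lo, hi, out)


features = [(100.0, 500.0), (102.0, 500.002), (300.0, 500.001), (101.0, 800.0)]
tree = build(features)
hits: List[Point] = []
range_query(tree, (95.0, 499.99), (105.0, 500.01), hits)
```

Here `hits` contains the two features that fall inside the RT window 95 to 105 s and the m/z window 499.99 to 500.01; the feature at RT 300 s and the one at m/z 800 are excluded.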

In practice, the runtime of FeatureLinkerUnlabeledQT is often not significantly worse than that of FeatureLinkerUnlabeledKD if the datasets are relatively small and/or the value of the -nr_partitions parameter is chosen large enough. If, however, the datasets are very large, and especially if they are so dense that partitioning based on the specified m/z tolerance is no longer possible, this algorithm becomes orders of magnitude faster than FLQT.

Notably, this algorithm can be used to align featureXML files containing unassembled mass traces (as produced by MassTraceExtractor), which is often impossible for reasonably large datasets using other aligners, as these datasets tend to be too dense and hence cannot be partitioned.

Prior to feature linking, this tool performs an (optional) retention time transformation on the features using LOWESS regression in order to minimize retention time differences between corresponding features across different maps. These transformed RTs are used only internally. In the results, original RTs will be reported.

The linking behavior can be influenced by separately specifying how to use the available charge and adduct information. Three modes are available for each: restrict linking to features with the same charge/adduct (or lack thereof, i.e. features with charge zero or no adduct annotation); additionally allow charged/adduct-annotated features to be linked with features that have no charge/adduct information; or allow all features to be linked irrespective of charge state/adduct information.

Note that the more relaxed the allowed grouping criteria, the larger the internally used connected components become, and the more memory they require. More stringent m/z or retention time tolerances might then be required.

The command line parameters of this tool are:

FeatureLinkerUnlabeledKD -- Groups corresponding features from multiple maps.
Full documentation: http://www.openms.de/doxygen/release/3.0.0/html/TOPP_FeatureLinkerUnlabeledKD.html
Version: 3.0.0 Jul 14 2023, 11:57:33, Revision: be787e9
To cite OpenMS:
+ Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for
mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
FeatureLinkerUnlabeledKD <options>

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed description or use the --helphelp option

Options (mandatory options marked with '*'):
-in <files>* Input files separated by blanks (valid formats: 'featureXML', 'consensusXML')
-out <file>* Output file (valid formats: 'consensusXML')
-design <file> Input file containing the experimental design (valid formats: 'tsv')

-keep_subelements For consensusXML input only: If set, the sub-features of the inputs are transferred to
the output.

Common TOPP options:
-ini <file> Use the given TOPP INI file
-threads <n> Sets the number of threads allowed to be used by the TOPP tool (default: '1')
-write_ini <file> Writes the default configuration file
--help Shows options
--helphelp Shows all options (including advanced)

The following configuration subsections are valid:
- algorithm Algorithm parameters section

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
For more information, please consult the online documentation for this tool:
- http://www.openms.de/doxygen/release/3.0.0/html/TOPP_FeatureLinkerUnlabeledKD.html
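As a rough illustration of how the options above combine into one call, the following sketch assembles an argument list in Python. The helper `build_command` and the file names are hypothetical. Passing algorithm parameters as `-algorithm:<section>:<name>` follows the usual TOPP convention for subsection parameters and is an assumption here; check the `--helphelp` output for the exact spelling.

```python
from typing import Dict, List, Optional


def build_command(inputs: List[str], output: str,
                  algorithm_params: Optional[Dict[str, object]] = None,
                  threads: int = 1) -> List[str]:
    """Assemble an argument list for FeatureLinkerUnlabeledKD (hypothetical helper)."""
    cmd = ["FeatureLinkerUnlabeledKD", "-in", *inputs, "-out", output,
           "-threads", str(threads)]
    for key, value in (algorithm_params or {}).items():
        # Assumed syntax for parameters in the 'algorithm' subsection.
        cmd += [f"-algorithm:{key}", str(value)]
    return cmd


cmd = build_command(["run1.featureXML", "run2.featureXML"], "linked.consensusXML",
                    {"link:rt_tol": 30.0, "link:mz_tol": 10.0}, threads=4)
```

On a machine where the tool is installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`.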

INI file documentation of this tool:

+FeatureLinkerUnlabeledKD: Groups corresponding features from multiple maps.

version (default: '3.0.0'): Version of the tool that generated this parameters file.

++1: Instance '1' section for 'FeatureLinkerUnlabeledKD'

in (default: []): Input files separated by blanks. Tags: input file. Valid formats: *.featureXML, *.consensusXML

out: Output file. Tags: output file. Valid formats: *.consensusXML

design: Input file containing the experimental design. Tags: input file. Valid formats: *.tsv

keep_subelements (default: 'false'): For consensusXML input only: If set, the sub-features of the inputs are transferred to the output. Valid values: true, false

log: Name of log file (created only when specified)

debug (default: '0'): Sets the debug level

threads (default: '1'): Sets the number of threads allowed to be used by the TOPP tool

no_progress (default: 'false'): Disables progress logging to command line. Valid values: true, false

force (default: 'false'): Overrides tool-specific checks. Valid values: true, false

test (default: 'false'): Enables the test mode (needed for internal use only). Valid values: true, false

+++algorithm: Algorithm parameters section

mz_unit (default: 'ppm'): Unit of m/z tolerance. Valid values: ppm, Da

nr_partitions (default: '100'): Number of partitions in m/z space. Valid range: 1:∞
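The mz_unit setting determines how the mz_tol values in the sections below are interpreted. A tolerance in ppm is relative to the m/z value it is applied to, so the same setting yields a wider absolute window at higher m/z. The helper below is a minimal sketch of that conversion; `mz_tolerance_da` is a hypothetical name, and how the tool applies the resulting window (for example as a full or half width) is not covered here.

```python
def mz_tolerance_da(mz: float, tol: float, unit: str = "ppm") -> float:
    """Return the absolute tolerance in Da at a given m/z (hypothetical helper)."""
    if unit == "ppm":
        return mz * tol * 1e-6   # parts per million of the m/z value
    if unit == "Da":
        return tol               # already absolute
    raise ValueError(f"unknown m/z unit: {unit!r}")


tol_at_500 = mz_tolerance_da(500.0, 10.0)    # 10 ppm at m/z 500
tol_at_1500 = mz_tolerance_da(1500.0, 10.0)  # 10 ppm at m/z 1500
```

With these inputs, 10 ppm corresponds to about 0.005 Da at m/z 500 and about 0.015 Da at m/z 1500.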

++++warp

enabled (default: 'true'): Whether or not to internally warp feature RTs using LOWESS transformation before linking (reported RTs in results will always be the original RTs). Valid values: true, false

rt_tol (default: '100.0'): Width of RT tolerance window (sec). Valid range: 0.0:∞

mz_tol (default: '5.0'): m/z tolerance (in ppm or Da). Valid range: 0.0:∞

max_pairwise_log_fc (default: '0.5'): Maximum absolute log10 fold change between two compatible signals during compatibility graph construction. Two signals from different maps will not be connected by an edge in the compatibility graph if the absolute log fold change exceeds this limit (they might still end up in the same connected component, however). Note: this does not limit fold changes in the linking stage, only during RT alignment, where we try to find high-quality alignment anchor points. Setting this to a value < 0 disables the FC check.

min_rel_cc_size (default: '0.5'): Only connected components containing compatible features from at least max(2, (min_rel_cc_size * number_of_input_maps)) input maps are considered for computing the warping function. Valid range: 0.0:1.0

max_nr_conflicts (default: '0'): Allow up to this many conflicts (features from the same map) per connected component to be used for alignment (-1 means allow any number of conflicts). Valid range: -1:∞
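The two anchor-selection criteria described for max_pairwise_log_fc and min_rel_cc_size can be written down as small predicates. This is a sketch based only on the parameter descriptions above, not on the tool's source; the function names are hypothetical, intensities are assumed to be strictly positive, and no rounding is applied to the minimum map count.

```python
import math


def fold_change_ok(int_a: float, int_b: float, max_log_fc: float = 0.5) -> bool:
    """True if two signals may be joined by an edge in the compatibility graph."""
    if max_log_fc < 0:
        return True  # a negative limit disables the fold-change check
    return abs(math.log10(int_a / int_b)) <= max_log_fc


def min_maps_required(n_maps: int, min_rel_cc_size: float = 0.5) -> float:
    """Minimum number of input maps a connected component must cover."""
    return max(2, min_rel_cc_size * n_maps)


ok_3x = fold_change_ok(1000.0, 3000.0)   # |log10(1/3)| is about 0.48
ok_4x = fold_change_ok(1000.0, 4000.0)   # |log10(1/4)| is about 0.60
needed = min_maps_required(10)
```

With the default limit of 0.5, a threefold intensity difference passes and a fourfold difference does not; with ten input maps, a component must cover at least five of them.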

++++link

rt_tol (default: '30.0'): Width of RT tolerance window (sec). Valid range: 0.0:∞

mz_tol (default: '10.0'): m/z tolerance (in ppm or Da). Valid range: 0.0:∞

charge_merging (default: 'With_charge_zero'): Whether to disallow charge mismatches (Identical), allow linking charge zero (i.e., unknown charge state) with every charge state (With_charge_zero), or disregard charges (Any). Valid values: Identical, With_charge_zero, Any

adduct_merging (default: 'Any'): Whether to only allow the same adduct for linking (Identical), also allow linking adduct-annotated features with adduct-free ones (With_unknown_adducts), or disregard adducts (Any). Valid values: Identical, With_unknown_adducts, Any
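The three charge_merging modes amount to a simple compatibility rule between two features. The predicate below is a sketch derived from the parameter description, with a hypothetical function name; the adduct_merging modes follow the same pattern, with a missing adduct annotation taking the role of charge zero.

```python
def charges_compatible(c1: int, c2: int, mode: str = "With_charge_zero") -> bool:
    """Decide whether two features may be linked given their charge states."""
    if mode == "Any":
        return True                              # charges are ignored
    if mode == "Identical":
        return c1 == c2                          # includes 0 with 0
    if mode == "With_charge_zero":
        return c1 == c2 or c1 == 0 or c2 == 0    # 0 means unknown charge
    raise ValueError(f"unknown charge_merging mode: {mode!r}")


examples = {
    "Identical": charges_compatible(2, 0, "Identical"),
    "With_charge_zero": charges_compatible(2, 0, "With_charge_zero"),
    "Any": charges_compatible(2, 3, "Any"),
}
```

A charge 2 feature and a feature of unknown charge are rejected under Identical but accepted under With_charge_zero; charges 2 and 3 are only linked under Any.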

++++distance_RT: Distance component based on RT differences

exponent (default: '1.0'): Normalized RT differences ([0-1], relative to 'max_difference') are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow). Valid range: 0.0:∞

weight (default: '1.0'): Final RT distances are weighted by this factor. Valid range: 0.0:∞

++++distance_MZ: Distance component based on m/z differences

exponent (default: '2.0'): Normalized ([0-1], relative to 'max_difference') m/z differences are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow). Valid range: 0.0:∞

weight (default: '1.0'): Final m/z distances are weighted by this factor. Valid range: 0.0:∞

++++distance_intensity: Distance component based on differences in relative intensity (usually relative to highest peak in the whole data set)

exponent (default: '1.0'): Differences in relative intensity ([0-1]) are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow). Valid range: 0.0:∞

weight (default: '1.0'): Final intensity distances are weighted by this factor. Valid range: 0.0:∞

log_transform (default: 'enabled'): Log-transform intensities? If disabled, d = |int_f2 - int_f1| / int_max. If enabled, d = |log(int_f2 + 1) - log(int_f1 + 1)| / log(int_max + 1). Valid values: enabled, disabled
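The intensity distance formulas given for log_transform can be evaluated directly. The sketch below implements both variants and then applies the exponent and weight in that order, which is an assumption based on the parameter descriptions ("raised to this power", then "weighted by this factor"); the function name is hypothetical. The base of the logarithm does not matter, because it cancels in the ratio.

```python
import math


def intensity_distance(int_f1: float, int_f2: float, int_max: float,
                       log_transform: bool = True,
                       exponent: float = 1.0, weight: float = 1.0) -> float:
    """Intensity distance component between two features (illustrative only)."""
    if log_transform:
        d = abs(math.log(int_f2 + 1) - math.log(int_f1 + 1)) / math.log(int_max + 1)
    else:
        d = abs(int_f2 - int_f1) / int_max
    # Assumed order: normalize to [0, 1], raise to the exponent, then weight.
    return weight * d ** exponent


d_log = intensity_distance(99.0, 9.0, 99.0)                        # log variant
d_lin = intensity_distance(50.0, 100.0, 100.0, log_transform=False)
```

For intensities 99 and 9 with int_max = 99, the log variant gives |log 10 - log 100| / log 100 = 0.5; for intensities 50 and 100 with int_max = 100, the linear variant also gives 0.5.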

++++LOWESS: LOWESS parameters for internal RT transformations (only relevant if 'warp:enabled' is set to 'true')

span (default: '0.666666666666667'): Fraction of datapoints (f) to use for each local regression (determines the amount of smoothing). Choosing this parameter in the range 0.2 to 0.8 usually results in a good fit. Valid range: 0.0:1.0

num_iterations (default: '3'): Number of robustifying iterations for lowess fitting. Valid range: 0:∞

delta (default: '-1.0'): Nonnegative parameter which may be used to save computations (recommended value is 0.01 of the range of the input, e.g. for data ranging from 1000 seconds to 2000 seconds, it could be set to 10). Setting a negative value will automatically do this.

interpolation_type (default: 'cspline'): Method to use for interpolation between datapoints computed by lowess. 'linear': Linear interpolation. 'cspline': Use the cubic spline for interpolation. 'akima': Use an akima spline for interpolation. Valid values: linear, cspline, akima

extrapolation_type (default: 'four-point-linear'): Method to use for extrapolation outside the data range. 'two-point-linear': Uses a line through the first and last point to extrapolate. 'four-point-linear': Uses a line through the first and second point to extrapolate in front and a line through the last and second-to-last point in the end. 'global-linear': Uses a linear regression to fit a line through all data points and uses it for extrapolation. Valid values: two-point-linear, four-point-linear, global-linear
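The automatic choice of delta described above (0.01 of the input range when a negative value is given) can be sketched as follows. `resolve_delta` is a hypothetical helper written from the parameter description; it is not taken from the tool's source.

```python
from typing import Sequence


def resolve_delta(rts: Sequence[float], delta: float = -1.0) -> float:
    """Return the delta to use for LOWESS fitting (illustrative only)."""
    if delta >= 0:
        return delta                         # an explicit value is used as given
    return 0.01 * (max(rts) - min(rts))      # negative: 1% of the RT range


auto_delta = resolve_delta([1000.0, 1250.0, 2000.0])
```

For retention times spanning 1000 to 2000 seconds this yields a delta of about 10, matching the example in the parameter description.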