Group corresponding features across label-free experiments.
This tool produces results similar to those of FeatureLinkerUnlabeledQT (FLQT), since it optimizes a similar objective. However, it is more efficient than FLQT: it uses a kd-tree for fast 2D region queries in m/z-RT space and a sorted binary search tree to choose the best cluster among the remaining ones in O(1). Insertion and search have O(log n) runtime in both the binary search tree and the kd-tree. The overall complexity of the algorithm is O(n log n) time and O(n) space.
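To illustrate what a 2D region query over a kd-tree looks like, here is a minimal sketch in Python. It is not the tool's implementation: the names `Node`, `build` and `region_query` are hypothetical, points are plain (RT, m/z) tuples, and the tree is built by re-sorting at every level, which is simpler but slower than an optimized construction.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (rt, mz); illustrative layout, not the tool's data model


class Node:
    """One kd-tree node: a point, its split axis, and two subtrees."""

    def __init__(self, point: Point, axis: int,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.point = point
        self.axis = axis
        self.left = left
        self.right = right


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a kd-tree by splitting at the median, alternating RT and m/z."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))


def region_query(node: Optional[Node], lo: Point, hi: Point,
                 out: List[Point]) -> None:
    """Append every point inside the box [lo, hi] (bounds inclusive) to out."""
    if node is None:
        return
    p = node.point
    if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]:
        out.append(p)
    a = node.axis
    # The left subtree holds coordinates <= p[a], the right one >= p[a],
    # so a subtree is visited only if the box can overlap it.
    if lo[a] <= p[a]:
        region_query(node.left, lo, hi, out)
    if p[a] <= hi[a]:
        region_query(node.right, lo, hi, out)


features = [(100.0, 500.25), (102.0, 500.26), (130.0, 500.25), (101.0, 622.30)]
tree = build(features)
hits: List[Point] = []
region_query(tree, (95.0, 500.20), (105.0, 500.30), hits)
print(sorted(hits))
```

Skipping subtrees that cannot overlap the query box is what keeps region queries cheap on dense data, where a scan over all features would dominate the runtime.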
In practice, the runtime of FeatureLinkerUnlabeledQT is often not significantly worse than that of FeatureLinkerUnlabeledKD if the datasets are relatively small and/or the value of the -nr_partitions parameter is chosen large enough. If, however, the datasets are very large, and especially if they are so dense that partitioning based on the specified m/z tolerance is no longer possible, then this algorithm becomes orders of magnitude faster than FLQT.
Notably, this algorithm can be used to align featureXML files containing unassembled mass traces (as produced by MassTraceExtractor), which is often impossible for reasonably large datasets using other aligners, as these datasets tend to be too dense and hence cannot be partitioned.
Prior to feature linking, this tool performs an (optional) retention time transformation on the features using LOWESS regression in order to minimize retention time differences between corresponding features across different maps. These transformed RTs are used only internally. In the results, original RTs will be reported.
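The idea of warping RTs internally while reporting the original values can be sketched as follows. This is a simplified illustration, not the tool's code: `lowess_fit` is a hypothetical helper that performs a single pass of locally weighted linear regression with tricube weights and omits the robustifying iterations, the `delta` shortcut, and the interpolation between fitted points. The anchor pairs (observed RT, reference RT) are made-up values.

```python
from typing import List, Sequence


def lowess_fit(xs: Sequence[float], ys: Sequence[float],
               span: float = 2.0 / 3.0) -> List[float]:
    """Fit y ~ x locally at every x (simplified LOWESS, no robustness step)."""
    n = len(xs)
    k = min(n, max(2, round(span * n)))  # number of neighbours per local fit
    fitted = []
    for x0 in xs:
        # Indices of the k points closest to x0 along x.
        nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
        h = max(abs(xs[i] - x0) for i in nearest) or 1.0
        # Tricube weights: 1 at x0, falling to 0 at the farthest neighbour.
        w = [(1.0 - (abs(xs[i] - x0) / h) ** 3) ** 3 for i in nearest]
        sw = sum(w)
        mx = sum(wi * xs[i] for wi, i in zip(w, nearest)) / sw
        my = sum(wi * ys[i] for wi, i in zip(w, nearest)) / sw
        sxx = sum(wi * (xs[i] - mx) ** 2 for wi, i in zip(w, nearest))
        sxy = sum(wi * (xs[i] - mx) * (ys[i] - my)
                  for wi, i in zip(w, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical anchor points: RTs observed in one map and the matching
# reference RTs (here a purely linear drift of 2 % plus 3 s).
observed_rt = [100.0 * i for i in range(1, 11)]
reference_rt = [1.02 * rt + 3.0 for rt in observed_rt]

warped_rt = lowess_fit(observed_rt, reference_rt)

# Each feature keeps both values: the warped RT is used for matching,
# the original RT is what ends up in the output.
features = [{"rt": o, "rt_internal": w} for o, w in zip(observed_rt, warped_rt)]
reported = [f["rt"] for f in features]
```

Keeping the two values side by side means the linking step can compare `rt_internal` across maps without the output ever containing a transformed time.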
+FeatureLinkerUnlabeledKD: Groups corresponding features from multiple maps.
version: 2.3.0
Version of the tool that generated this parameters file.
++1: Instance '1' section for 'FeatureLinkerUnlabeledKD'
in: []
Input files separated by blanks (input file; formats: *.featureXML, *.consensusXML)
out
Output file (output file; format: *.consensusXML)
design
Input file containing the experimental design (input file; format: *.tsv)
keep_subelements: false
For consensusXML input only: if set, the sub-features of the inputs are transferred to the output. Allowed values: true, false
log
Name of log file (created only when specified)
debug: 0
Sets the debug level
threads: 1
Sets the number of threads allowed to be used by the TOPP tool
no_progress: false
Disables progress logging to the command line. Allowed values: true, false
force: false
Overwrites tool-specific checks. Allowed values: true, false
test: false
Enables the test mode (needed for internal use only). Allowed values: true, false
+++algorithm: Algorithm parameters section
mz_unit: ppm
Unit of m/z tolerance. Allowed values: ppm, Da
nr_partitions: 100
Number of partitions in m/z space. Range: 1:∞
++++warp
enabled: true
Whether or not to internally warp feature RTs using LOWESS transformation before linking (reported RTs in results will always be the original RTs). Allowed values: true, false
rt_tol: 100
Width of RT tolerance window (sec). Range: 0:∞
mz_tol: 5
m/z tolerance (in ppm or Da). Range: 0:∞
max_pairwise_log_fc: 0.5
Maximum absolute log10 fold change between two compatible signals during compatibility graph construction. Two signals from different maps will not be connected by an edge in the compatibility graph if the absolute log fold change exceeds this limit (they might still end up in the same connected component, however). Note: this does not limit fold changes in the linking stage, only during RT alignment, where we try to find high-quality alignment anchor points. Setting this to a value < 0 disables the FC check.
min_rel_cc_size: 0.5
Only connected components containing compatible features from at least max(2, (min_rel_cc_size * number_of_input_maps)) input maps are considered for computing the warping function. Range: 0:1
max_nr_conflicts: 0
Allow up to this many conflicts (features from the same map) per connected component to be used for alignment (-1 means allow any number of conflicts). Range: -1:∞
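The three anchor-selection rules in the warp section can be sketched as small predicates. This is an illustration under assumptions, not the tool's code: the function names are hypothetical, intensities are assumed to be strictly positive, and a "conflict" is counted here as every feature beyond the first from the same map, which the description above does not spell out.

```python
import math
from typing import Sequence


def fold_change_compatible(int_a: float, int_b: float,
                           max_pairwise_log_fc: float = 0.5) -> bool:
    """True if |log10(int_a / int_b)| stays within the limit.

    A negative limit disables the check. Intensities must be > 0.
    """
    if max_pairwise_log_fc < 0:
        return True
    return abs(math.log10(int_a / int_b)) <= max_pairwise_log_fc


def usable_component(map_indices: Sequence[int], n_maps: int,
                     min_rel_cc_size: float = 0.5,
                     max_nr_conflicts: int = 0) -> bool:
    """Decide whether a connected component may serve as alignment anchor.

    map_indices lists, for each feature in the component, the input map
    it came from.
    """
    distinct_maps = len(set(map_indices))
    required = max(2, min_rel_cc_size * n_maps)
    if distinct_maps < required:
        return False
    # Assumption: conflicts = features that share a map with another one.
    conflicts = len(map_indices) - distinct_maps
    return max_nr_conflicts == -1 or conflicts <= max_nr_conflicts


print(fold_change_compatible(1000.0, 3000.0))  # factor 3, log10 is about 0.48
print(usable_component([0, 1, 2], n_maps=4))
```

With the default limit of 0.5, signals may differ by a factor of roughly 3.16 (10 to the power 0.5) and still be connected in the compatibility graph.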
++++link
rt_tol: 30
Width of RT tolerance window (sec). Range: 0:∞
mz_tol: 10
m/z tolerance (in ppm or Da). Range: 0:∞
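How the link tolerances interact with mz_unit can be sketched as a simple window test. The function name `within_link_tolerance` is hypothetical, and two details are assumptions rather than documented behaviour: the tolerances are treated as maximum absolute differences, and a ppm tolerance is taken relative to the m/z of the first feature.

```python
def within_link_tolerance(rt_a: float, mz_a: float,
                          rt_b: float, mz_b: float,
                          rt_tol: float = 30.0, mz_tol: float = 10.0,
                          mz_unit: str = "ppm") -> bool:
    """True if two features lie within the RT and m/z tolerances."""
    if mz_unit == "ppm":
        # Convert parts per million into an absolute window in Da.
        mz_window = mz_tol * 1e-6 * mz_a
    elif mz_unit == "Da":
        mz_window = mz_tol
    else:
        raise ValueError("mz_unit must be 'ppm' or 'Da'")
    return abs(rt_a - rt_b) <= rt_tol and abs(mz_a - mz_b) <= mz_window


# 10 ppm at m/z 500 corresponds to a window of 0.005 Da.
print(within_link_tolerance(100.0, 500.0, 120.0, 500.004))
```

A ppm tolerance therefore widens with m/z, whereas a Da tolerance is constant across the mass range.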
++++distance_RT: Distance component based on RT differences
exponent: 1
Normalized RT differences ([0-1], relative to 'max_difference') are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow). Range: 0:∞
weight: 1
Final RT distances are weighted by this factor. Range: 0:∞
++++distance_MZ: Distance component based on m/z differences
exponent: 2
Normalized m/z differences ([0-1], relative to 'max_difference') are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow). Range: 0:∞
weight: 1
Final m/z distances are weighted by this factor. Range: 0:∞
++++distance_intensity: Distance component based on differences in relative intensity (usually relative to highest peak in the whole data set)
exponent: 1
Differences in relative intensity ([0-1]) are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow). Range: 0:∞
weight: 1
Final intensity distances are weighted by this factor. Range: 0:∞
log_transform: enabled
Log-transform intensities? If disabled, d = |int_f2 - int_f1| / int_max. If enabled, d = |log(int_f2 + 1) - log(int_f1 + 1)| / log(int_max + 1). Allowed values: enabled, disabled
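The distance components described above can be sketched as follows. The two intensity formulas are taken from the log_transform description; how the three weighted components are combined into one score is not stated here, so the weighted average in `combined_distance` is an assumption, and all function names are hypothetical.

```python
import math


def intensity_distance(int_f1: float, int_f2: float, int_max: float,
                       log_transform: bool = True) -> float:
    """Relative intensity difference in [0, 1], as given for log_transform."""
    if log_transform:
        # The ratio of logarithms does not depend on the logarithm base.
        return (abs(math.log(int_f2 + 1) - math.log(int_f1 + 1))
                / math.log(int_max + 1))
    return abs(int_f2 - int_f1) / int_max


def component(normalized_diff: float, exponent: float, weight: float) -> float:
    """Raise a normalized difference to 'exponent' and scale it by 'weight'."""
    return weight * normalized_diff ** exponent


def combined_distance(d_rt: float, d_mz: float, d_int: float,
                      exp=(1.0, 2.0, 1.0), weights=(1.0, 1.0, 1.0)) -> float:
    """Assumed combination: weighted components divided by the total weight."""
    parts = [component(d, e, w)
             for d, e, w in zip((d_rt, d_mz, d_int), exp, weights)]
    return sum(parts) / sum(weights)


print(intensity_distance(100.0, 300.0, 1000.0, log_transform=False))
print(combined_distance(0.5, 0.5, 0.0))
```

With the default exponent of 2 for m/z, small m/z deviations contribute much less than RT deviations of the same normalized size, since the inputs lie in [0, 1].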
++++LOWESS: LOWESS parameters for internal RT transformations (only relevant if 'warp:enabled' is set to 'true')
span: 0.666666666666667
Fraction of datapoints (f) to use for each local regression (determines the amount of smoothing). Choosing this parameter in the range 0.2 to 0.8 usually results in a good fit. Range: 0:1
num_iterations: 3
Number of robustifying iterations for LOWESS fitting. Range: 0:∞
delta: -1
Nonnegative parameter which may be used to save computations (recommended value is 0.01 of the range of the input, e.g. for data ranging from 1000 seconds to 2000 seconds, it could be set to 10). Setting a negative value will automatically do this.
interpolation_type: cspline
Method to use for interpolation between datapoints computed by LOWESS. 'linear': Linear interpolation. 'cspline': Use the cubic spline for interpolation. 'akima': Use an Akima spline for interpolation. Allowed values: linear, cspline, akima
extrapolation_type: four-point-linear
Method to use for extrapolation outside the data range. 'two-point-linear': Uses a line through the first and last point to extrapolate. 'four-point-linear': Uses a line through the first and second point to extrapolate in front and a line through the last and second-to-last point in the end. 'global-linear': Uses a linear regression to fit a line through all data points and uses it for extrapolation. Allowed values: two-point-linear, four-point-linear, global-linear
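As a usage illustration, the parameters above can be turned into a command line. The sketch assumes the usual TOPP convention of addressing nested parameters with colon-separated section paths (for example -algorithm:link:rt_tol); the helper `build_command` and the file names are hypothetical, and the snippet only assembles and prints the command without running it.

```python
import shlex
from typing import Dict, List, Optional, Sequence


def build_command(inputs: Sequence[str], output: str,
                  overrides: Optional[Dict[str, object]] = None) -> List[str]:
    """Assemble an argument list for FeatureLinkerUnlabeledKD.

    Keys in 'overrides' are paths below the 'algorithm' section,
    e.g. 'link:rt_tol' or 'warp:enabled'.
    """
    cmd = ["FeatureLinkerUnlabeledKD", "-in", *inputs, "-out", output]
    for key, value in (overrides or {}).items():
        cmd += [f"-algorithm:{key}", str(value)]
    return cmd


cmd = build_command(
    ["run1.featureXML", "run2.featureXML"],  # hypothetical input files
    "linked.consensusXML",                   # hypothetical output file
    {"link:rt_tol": 30, "link:mz_tol": 10, "mz_unit": "ppm"},
)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run` would execute the tool if it is installed and on the PATH; parameters that are not overridden keep the defaults listed above.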