OpenSwathConfidenceScoring

Computes confidence scores for OpenSwath results.

Potential predecessor tools: OpenSwathAnalyzer
Potential successor tools: OpenSwathFeatureXMLToTSV

This is an implementation of the SRM scoring algorithm described in:

Malmstroem, L.; Malmstroem, J.; Selevsek, N.; Rosenberger, G. & Aebersold, R.:
Automated workflow for large-scale selected reaction monitoring experiments.
J. Proteome Res., 2012, 11, 1644-1653

It has been adapted for the scoring of OpenSwath results.

The algorithm compares SRM/MRM features (peak groups) to assays and scores how well they agree. Every feature is compared not only to the "true" assay that was used to acquire the corresponding ion chromatograms, but also to a number (parameter decoys) of unrelated but real assays selected at random from the assay library (parameter lib). These decoy comparisons establish a background distribution of scores against which the significance of the "true" score can be evaluated. The final confidence value of a feature is the local false discovery rate (FDR), calculated as the fraction of decoy assays that score higher than the "true" assay against the feature. In the output feature map, every feature is annotated with its local FDR in the meta value "local_FDR" (a "userParam" element in the featureXML), and its overall quality is set to "1 - local_FDR".
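The local FDR calculation described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name `local_fdr` and the example scores are made up, and counting only strictly higher decoy scores (ties do not count) is an assumption.

```python
def local_fdr(true_score, decoy_scores):
    """Fraction of decoy scores that are higher than the true score.

    Assumption: ties are not counted as "higher".
    """
    if not decoy_scores:
        raise ValueError("at least one decoy score is required")
    n_higher = sum(1 for score in decoy_scores if score > true_score)
    return n_higher / len(decoy_scores)


# Hypothetical scores of one feature against its true assay and four decoys.
true_score = 0.90
decoy_scores = [0.10, 0.95, 0.40, 0.20]

fdr = local_fdr(true_score, decoy_scores)  # one of four decoys scores higher
quality = 1.0 - fdr                        # value stored as overall quality
print(fdr, quality)
```

With one of the four decoys scoring higher than the true assay, the local FDR is 0.25 and the overall quality is 0.75.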

The agreement of a feature and an assay is assessed based on the difference in retention time (RT) and on the deviation of relative transition intensities. The score S is computed using a binomial generalized linear model (GLM) of the form:

$S = \frac{1}{1 + \exp(-(a + b \cdot \Delta_{RT}^2 + c \cdot d_{int}))}$

The meanings of the model terms are as follows:

$\Delta_{RT}$ : Observed retention times are first mapped to the scale of the assays (parameter trafo), then all RTs are scaled to the range 0 to 100 (based on the lowest/highest RT in the assay library). $\Delta_{RT}$ is the absolute difference of the scaled RTs; note that this is squared in the scoring model.
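A rough sketch of the RT scaling step, assuming the observed RT has already been mapped to the assay scale (the trafo step is not shown). The function names and the example RT values are hypothetical.

```python
def scale_rt(rt, lib_rt_min, lib_rt_max):
    """Scale an RT to the range 0..100 using the library's lowest/highest RT."""
    if lib_rt_max <= lib_rt_min:
        raise ValueError("library RT range must be non-empty")
    return 100.0 * (rt - lib_rt_min) / (lib_rt_max - lib_rt_min)


def delta_rt(feature_rt, assay_rt, lib_rt_min, lib_rt_max):
    """Absolute difference of the scaled RTs (squared later in the model)."""
    scaled_feature = scale_rt(feature_rt, lib_rt_min, lib_rt_max)
    scaled_assay = scale_rt(assay_rt, lib_rt_min, lib_rt_max)
    return abs(scaled_feature - scaled_assay)


# Hypothetical values: library RTs span 0..2000, feature at 1050, assay at 1000.
d_rt = delta_rt(1050.0, 1000.0, lib_rt_min=0.0, lib_rt_max=2000.0)
print(d_rt)
```

In this example the scaled RTs are 52.5 and 50.0, so the difference is 2.5 on the 0 to 100 scale.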

$d_{int}$ : To compute the intensity distance, the n (advanced parameter transitions) most intense transitions of the feature are selected. When comparing against the "true" assay, the same transitions are considered in the assay; when comparing against a decoy assay, the same number of its most intense transitions is used instead. Transition intensities are scaled to a total of 1 per feature/assay and are ordered by the product (Q3) m/z value. Then the Manhattan distance of the intensity vectors is calculated (Malmstroem et al. used the RMSD instead, which has been replaced here to be independent of the number of transitions).
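The intensity distance can be illustrated with the sketch below. It assumes transitions are given as a mapping from product (Q3) m/z to intensity; all names, the `same_transitions` flag, and the example values are hypothetical, and edge cases (for example, a decoy assay with fewer transitions than the feature) are not handled the way the tool may handle them.

```python
def top_transitions(transitions, n):
    """Return the m/z keys of the n most intense transitions (n == 0: all)."""
    ranked = sorted(transitions, key=transitions.get, reverse=True)
    return ranked if n == 0 else ranked[:n]


def normalized_vector(transitions, mz_keys):
    """Intensities for mz_keys, ordered by m/z and scaled to a total of 1."""
    values = [transitions[mz] for mz in sorted(mz_keys)]
    total = sum(values)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [value / total for value in values]


def intensity_distance(feature, assay, n=6, same_transitions=True):
    """Manhattan distance between the scaled intensity vectors."""
    feature_keys = top_transitions(feature, n)
    if same_transitions:
        # "True" assay: use the same transitions as selected for the feature.
        assay_keys = feature_keys
    else:
        # Decoy assay: use the same number of its own most intense transitions.
        assay_keys = top_transitions(assay, len(feature_keys))
    f = normalized_vector(feature, feature_keys)
    a = normalized_vector(assay, assay_keys)
    return sum(abs(x - y) for x, y in zip(f, a))


# Hypothetical intensities keyed by product m/z.
feature = {500.3: 60.0, 600.4: 30.0, 700.5: 10.0}
assay = {500.3: 50.0, 600.4: 30.0, 700.5: 20.0}
d_int = intensity_distance(feature, assay)
print(d_int)
```

Here the scaled vectors are (0.6, 0.3, 0.1) and (0.5, 0.3, 0.2), giving a distance of about 0.2. Because both vectors sum to 1, the distance always lies between 0 and 2.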

$a, b, c$ : Model coefficients, stored in the advanced parameters GLM:intercept, GLM:delta_rt, and GLM:dist_int. The default values were estimated based on the training dataset used in the Malmstroem et al. study, reprocessed with the OpenSwath pipeline.
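As a worked illustration of the scoring formula, the following sketch evaluates S with the default coefficients listed in this tool's INI documentation. The function and constant names are made up for this example; it is not the tool's implementation.

```python
import math

# Default GLM coefficients (GLM:intercept, GLM:delta_rt, GLM:dist_int).
INTERCEPT = 3.87333466
COEF_DELTA_RT = -0.02898629
COEF_DIST_INT = -7.75880768


def glm_score(delta_rt, dist_int,
              a=INTERCEPT, b=COEF_DELTA_RT, c=COEF_DIST_INT):
    """S = 1 / (1 + exp(-(a + b * delta_rt^2 + c * dist_int)))."""
    linear = a + b * delta_rt ** 2 + c * dist_int
    return 1.0 / (1.0 + math.exp(-linear))


# A perfect match (no RT difference, identical relative intensities).
best = glm_score(0.0, 0.0)
# A hypothetical feature with a scaled RT difference of 2.5 and d_int of 0.2.
typical = glm_score(2.5, 0.2)
print(best, typical)
```

Both coefficients are negative, so the score drops as the RT difference or the intensity distance grows. A perfect match scores about 0.98 with the defaults, which is the upper bound of S for non-negative inputs.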

In addition to the local FDRs, the scores of features against their "true" assays are recorded in the output, in the meta value "GLM_score" of the respective feature.

The command line parameters of this tool are:

OpenSwathConfidenceScoring -- Compute confidence scores for OpenSwath results
Version: 2.3.0 Jan  9 2018, 17:46:23, Revision: 38ae115

Usage:
  OpenSwathConfidenceScoring <options>

Options (mandatory options marked with '*'):
  -in <file>*            Input file (OpenSwath results) (valid formats: 'featureXML')
  -lib <file>*           Assay library (valid formats: 'traML')
  -out <file>*           Output file (results with confidence scores) (valid formats: 'featureXML')
  -trafo <file>          Retention time transformation (valid formats: 'trafoXML')
  -decoys <number>       Number of decoy assays to select from the library for every true assay (0 for "all")
                         (default: '1000' min: '0')
  -transitions <number>  Number of transitions per feature to consider (highest intensities first; 0 for "all")
                         (default: '6' min: '0')
                         
Common TOPP options:
  -ini <file>            Use the given TOPP INI file
  -threads <n>           Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>      Writes the default configuration file
  --help                 Shows options
  --helphelp             Shows all options (including advanced)
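For scripted use, the options listed above can be assembled into a command line as in the sketch below. The helper `build_command` and all file names are hypothetical; only the option names and defaults are taken from the help text. Running the command requires the tool to be installed and on the PATH, so the call is left as a comment.

```python
import shlex


def build_command(in_file, lib_file, out_file,
                  trafo_file=None, decoys=1000, transitions=6):
    """Assemble an OpenSwathConfidenceScoring call as an argument list."""
    cmd = ["OpenSwathConfidenceScoring",
           "-in", in_file,     # OpenSwath results (featureXML)
           "-lib", lib_file,   # assay library (traML)
           "-out", out_file]   # results with confidence scores (featureXML)
    if trafo_file is not None:
        cmd += ["-trafo", trafo_file]  # RT transformation (trafoXML)
    cmd += ["-decoys", str(decoys), "-transitions", str(transitions)]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_command("features.featureXML", "assays.traML",
                    "scored.featureXML", trafo_file="rt.trafoXML")
print(shlex.join(cmd))

# To execute (assumes the tool is installed):
# import subprocess
# subprocess.run(cmd, check=True)
```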

INI file documentation of this tool:

+ OpenSwathConfidenceScoring: Compute confidence scores for OpenSwath results

version: Version of the tool that generated this parameters file. (value: 2.3.0)

++ 1: Instance '1' section for 'OpenSwathConfidenceScoring'

in: Input file (OpenSwath results) (input file, *.featureXML)

lib: Assay library (input file, *.traML)

out: Output file (results with confidence scores) (output file, *.featureXML)

trafo: Retention time transformation (input file, *.trafoXML)

decoys: Number of decoy assays to select from the library for every true assay (0 for "all") (default: 1000, range: 0:∞)

transitions: Number of transitions per feature to consider (highest intensities first; 0 for "all") (default: 6, range: 0:∞)

log: Name of log file (created only when specified)

debug: Sets the debug level (default: 0)

threads: Sets the number of threads allowed to be used by the TOPP tool (default: 1)

no_progress: Disables progress logging to command line (default: false, valid: true,false)

force: Overwrite tool specific checks. (default: false, valid: true,false)

test: Enables the test mode (needed for internal use only) (default: false, valid: true,false)

+++ GLM: Parameters of the binomial GLM

intercept: Intercept term (default: 3.87333466)

delta_rt: Coefficient of retention time difference (default: -0.02898629)

dist_int: Coefficient of intensity distance (default: -7.75880768)