FileFilter

Extracts portions of the data from an mzML, featureXML or consensusXML file.

pot. predecessor tools $ \longrightarrow $ FileFilter $ \longrightarrow $ pot. successor tools
pot. predecessor tools: any tool yielding output in mzML, featureXML or consensusXML format
pot. successor tools: any tool that profits from reduced input

With this tool it is possible to extract m/z, retention time and intensity ranges from an input file and to write all data that lies within the given ranges to an output file.
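The range options ('rt', 'mz' and 'int') take the form [min]:[max], and the default ':' leaves a dimension unrestricted. The Python sketch below illustrates that format on a few made-up peaks. The helpers `parse_range` and `in_range` are hypothetical names, and treating both bounds as inclusive is an assumption that this page does not confirm.

```python
import math


def parse_range(text):
    """Parse a '[min]:[max]' string; an omitted bound means 'unbounded'."""
    low, sep, high = text.partition(":")
    if not sep:
        raise ValueError(f"expected '[min]:[max]', got {text!r}")
    lo = float(low) if low.strip() else -math.inf
    hi = float(high) if high.strip() else math.inf
    if lo > hi:
        raise ValueError(f"min exceeds max in {text!r}")
    return lo, hi


def in_range(value, bounds):
    """Assumption: both bounds are inclusive."""
    lo, hi = bounds
    return lo <= value <= hi


# Hypothetical peaks as (rt, mz, intensity) tuples.
peaks = [(100.0, 350.2, 1e4), (250.0, 512.7, 5e3), (900.0, 480.1, 2e5)]
rt_bounds = parse_range("200:")      # lower RT bound only
mz_bounds = parse_range("400:600")   # both m/z bounds given
int_bounds = parse_range(":")        # the default: no restriction

kept = [
    p for p in peaks
    if in_range(p[0], rt_bounds)
    and in_range(p[1], mz_bounds)
    and in_range(p[2], int_bounds)
]
print(kept)
```

Only data lying inside all given ranges survives, which mirrors the "within the given ranges" wording above.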

Depending on the input file type, additional specific operations are possible; see the 'peak_options', 'spectra', 'feature', 'consensus', 'f_and_c' and 'id' option groups below.

The priority of the id-flags is (decreasing order): remove_annotated_features / remove_unannotated_features -> remove_clashes -> keep_best_score_id -> sequences_whitelist / accessions_whitelist
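One way to read this priority chain is as a fixed evaluation order that does not depend on the order in which the flags are given. The sketch below models only that ordering; `ordered_id_filters` is a hypothetical helper, and nothing here reproduces the tool's internal code.

```python
# Priority groups as listed above, highest priority first.
# Flags sharing a group have no documented order among themselves;
# the order inside each tuple is just the order of the text above.
ID_FLAG_PRIORITY = [
    ("remove_annotated_features", "remove_unannotated_features"),
    ("remove_clashes",),
    ("keep_best_score_id",),
    ("sequences_whitelist", "accessions_whitelist"),
]


def ordered_id_filters(enabled):
    """Return the enabled id-flags sorted by the documented priority."""
    known = {flag for group in ID_FLAG_PRIORITY for flag in group}
    unknown = set(enabled) - known
    if unknown:
        raise ValueError(f"not a prioritized id-flag: {sorted(unknown)}")
    return [flag for group in ID_FLAG_PRIORITY for flag in group
            if flag in enabled]


print(ordered_id_filters({"accessions_whitelist", "keep_best_score_id",
                          "remove_unannotated_features"}))
```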

MS2 and higher spectra can be filtered according to precursor m/z (see 'peak_options:pc_mz_range'). This flag can be combined with 'rt' range to filter precursors by RT and m/z. If you want to extract an MS1 region with untouched MS2 spectra included, you will need to split the dataset by MS level, then use the 'mz' option for MS1 data and 'peak_options:pc_mz_range' for MS2 data. Afterwards merge the two files again. RT can be filtered at any step.
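The split, filter and merge recipe above can be sketched as three command lines. The FileFilter flags are the ones documented on this page. Several points are assumptions: that FileMerger (with '-in' and '-out') is the tool used for the merge step, that selecting the level and applying the range in a single call is equivalent to splitting first, and the intermediate file names. The helper `split_filter_merge` is hypothetical.

```python
import shlex


def split_filter_merge(src, mz_range, merged):
    """Return the three command lines for the split/filter/merge recipe."""
    # MS1 only: 'mz' removes peaks outside the range.
    ms1 = ["FileFilter", "-in", src, "-out", "ms1_part.mzML",
           "-peak_options:level", "1", "-mz", mz_range]
    # MS2 and higher: select whole spectra by precursor m/z, peaks untouched.
    msn = ["FileFilter", "-in", src, "-out", "msn_part.mzML",
           "-peak_options:level", "2", "3",
           "-peak_options:pc_mz_range", mz_range]
    # Assumption: FileMerger is used to join the two parts again.
    merge = ["FileMerger", "-in", "ms1_part.mzML", "msn_part.mzML",
             "-out", merged]
    return [ms1, msn, merge]


for cmd in split_filter_merge("run.mzML", "400:500", "region.mzML"):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where the TOPP tools are installed. An '-rt' range could be added to any of the FileFilter calls, since RT can be filtered at any step.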

Note
For filtering peptide/protein identification data, see the IDFilter tool.
Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.
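As an illustration of the conversion mentioned in the note, the sketch below builds an IDFileConverter call followed by a FileFilter call that uses the converted file with 'id:blacklist'. It assumes that IDFileConverter takes '-in' and '-out' and infers the formats from the file extensions; all file names are hypothetical.

```python
import shlex

# Assumption: IDFileConverter infers mzid/idXML from the file extensions.
convert = ["IDFileConverter", "-in", "ids.mzid", "-out", "ids.idXML"]

# 'id:blacklist' expects an idXML file (see the option list below).
blacklist = ["FileFilter", "-in", "run.mzML", "-out", "run_filtered.mzML",
             "-id:blacklist", "ids.idXML"]

for cmd in (convert, blacklist):
    print(shlex.join(cmd))
```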

The command line parameters of this tool are:

FileFilter -- Extracts or manipulates portions of data from peak, feature or consensus-feature files.
Version: 2.3.0 Jan  9 2018, 17:46:23, Revision: 38ae115

Usage:
  FileFilter <options>

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed
description or use the --helphelp option.

Options (mandatory options marked with '*'):
  -in <file>*                                            Input file (valid formats: 'mzML', 'featureXML', 
                                                         'consensusXML')
  -in_type <type>                                        Input file type -- default: determined from file 
                                                         extension or content (valid: 'mzML', 'featureXML',
                                                         'consensusXML')
  -out <file>*                                           Output file (valid formats: 'mzML', 'featureXML', 
                                                         'consensusXML')
  -out_type <type>                                       Output file type -- default: determined from file 
                                                         extension or content (valid: 'mzML', 'featureXML',
                                                         'consensusXML')
  -rt [min]:[max]                                        Retention time range to extract (default: ':')
  -mz [min]:[max]                                        M/z range to extract (applies to ALL ms levels!) 
                                                         (default: ':')
  -int [min]:[max]                                       Intensity range to extract (default: ':')
  -sort                                                  Sorts the output according to RT and m/z.

Peak data options:
  -peak_options:sn <s/n ratio>                           Write peaks with S/N > 'sn' values only (default: 
                                                         '0')
  -peak_options:rm_pc_charge i j ...                     Remove MS(2) spectra with these precursor charges. 
                                                         All spectra without precursor are kept!
  -peak_options:pc_mz_range [min]:[max]                  MSn (n>=2) precursor filtering according to their 
                                                         m/z value. Do not use this flag in conjunction with
                                                         'mz', unless you want to actually remove peaks in
                                                         spectra (see 'mz'). RT filtering is covered by 'rt'
                                                         and compatible with this flag. (default: ':')
  -peak_options:pc_mz_list mz_1 mz_2 ...                 List of m/z values. If a precursor window covers
                                                         ANY of these values, the corresponding MS/MS
                                                         spectrum will be kept.
  -peak_options:level i j ...                            MS levels to extract (default: '[1 2 3]')
  -peak_options:sort_peaks                               Sorts the peaks according to m/z
  -peak_options:no_chromatograms                         No conversion to space-saving real chromatograms, 
                                                         e.g. from SRM scans
  -peak_options:remove_chromatograms                     Removes chromatograms stored in a file
  -peak_options:mz_precision 32 or 64                    Store base64 encoded m/z data using 32 or 64 bit 
                                                         precision (default: '64' valid: '32', '64')
  -peak_options:int_precision 32 or 64                   Store base64 encoded intensity data using 32 or 64 
                                                         bit precision (default: '32' valid: '32', '64')
  -peak_options:indexed_file true or false               Whether to add an index to the file when writing 
                                                         (default: 'false' valid: 'true', 'false')
  -peak_options:zlib_compression true or false           Whether to store data with zlib compression
                                                         (lossless compression) (default: 'false' valid:
                                                         'true', 'false')

Numpress compression for peak data:
  -peak_options:numpress:masstime <compression_scheme>   Apply MS Numpress compression algorithms in m/z or 
                                                         rt dimension (recommended: linear) (default: 'none'
                                                         valid: 'none', 'linear', 'pic', 'slof')
  -peak_options:numpress:masstime_error <error>          Maximal allowable error in m/z or rt dimension
                                                         (default 10 ppm at 100 m/z; set to 0.5 for pic or
                                                         negative to disable check and speed up conversion)
                                                         (default: '0.0001')
  -peak_options:numpress:intensity <compression_scheme>  Apply MS Numpress compression algorithms in
                                                         intensity dimension (recommended: slof or pic)
                                                         (default: 'none' valid: 'none', 'linear', 'pic',
                                                         'slof')
  -peak_options:numpress:intensity_error <error>         Maximal allowable error in intensity dimension (set
                                                         to 0.5 for pic or negative to disable check and
                                                         speed up conversion) (default: '0.0001')

Remove spectra or select spectra (removing all others) with certain properties:
  -spectra:remove_zoom                                   Remove zoom (enhanced resolution) scans
  -spectra:remove_mode <mode>                            Remove scans by scan mode (valid: 'Unknown',
                                                         'MassSpectrum', 'MS1Spectrum', 'MSnSpectrum',
                                                         'SelectedIonMonitoring',
                                                         'SelectedReactionMonitoring',
                                                         'ConsecutiveReactionMonitoring',
                                                         'ConstantNeutralGain', 'ConstantNeutralLoss',
                                                         'Precursor', 'EnhancedMultiplyCharged',
                                                         'TimeDelayedFragmentation',
                                                         'ElectromagneticRadiation', 'Emission',
                                                         'Absorption')
  -spectra:remove_activation <activation>                Remove MSn scans where any of its precursors
                                                         features a certain activation method (valid:
                                                         'Collision-induced dissociation', 'Post-source
                                                         decay', 'Plasma desorption', 'Surface-induced
                                                         dissociation', 'Blackbody infrared radiative
                                                         dissociation', 'Electron capture dissociation',
                                                         'Infrared multiphoton dissociation', 'Sustained
                                                         off-resonance irradiation', 'High-energy
                                                         collision-induced dissociation', 'Low-energy
                                                         ...
                                                         iation')
  -spectra:remove_collision_energy [min]:[max]           Remove MSn scans with a collision energy in the
                                                         given interval (default: ':')
  -spectra:remove_isolation_window_width [min]:[max]     Remove MSn scans whose isolation window width is in
                                                         the given interval (default: ':')
  -spectra:select_zoom                                   Select zoom (enhanced resolution) scans
  -spectra:select_mode <mode>                            Selects scans by scan mode
                                                         (valid: 'Unknown', 'MassSpectrum', 'MS1Spectrum',
                                                         'MSnSpectrum', 'SelectedIonMonitoring',
                                                         'SelectedReactionMonitoring',
                                                         'ConsecutiveReactionMonitoring',
                                                         'ConstantNeutralGain', 'ConstantNeutralLoss',
                                                         'Precursor', 'EnhancedMultiplyCharged',
                                                         'TimeDelayedFragmentation',
                                                         'ElectromagneticRadiation', 'Emission',
                                                         'Absorption')
  -spectra:select_activation <activation>                Retain MSn scans where any of its precursors
                                                         features a certain activation method (valid:
                                                         'Collision-induced dissociation', 'Post-source
                                                         decay', 'Plasma desorption', 'Surface-induced
                                                         dissociation', 'Blackbody infrared radiative
                                                         dissociation', 'Electron capture dissociation',
                                                         'Infrared multiphoton dissociation', 'Sustained
                                                         off-resonance irradiation', 'High-energy
                                                         collision-induced dissociation', 'Low-energy
                                                         ...
                                                         iation')
  -spectra:select_collision_energy [min]:[max]           Select MSn scans with a collision energy in the
                                                         given interval (default: ':')
  -spectra:select_isolation_window_width [min]:[max]     Select MSn scans whose isolation window width is in
                                                         the given interval (default: ':')
  -spectra:select_polarity <polarity>                    Retain MSn scans with a certain scan polarity
                                                         (valid: 'unknown', 'positive', 'negative')

                                                         

Feature data options:
  -feature:q [min]:[max]                                 Overall quality range to extract [0:1] (default: 
                                                         ':')

                                                         

Consensus feature data options:
  -consensus:map i j ...                                 Maps to be extracted from a consensus
  -consensus:map_and                                     Consensus features are kept only if they contain 
                                                         exactly one feature from each map (as given above
                                                         in 'map')

Black or white listing of MS2 spectra by consensus features:
  -consensus:blackorwhitelist:blacklist                  True: remove matched MS2. False: retain matched MS2
                                                         spectra. Other levels are kept (default: 'true'
                                                         valid: 'false', 'true')
  -consensus:blackorwhitelist:file <file>                Input file containing consensus features whose
                                                         corresponding MS2 spectra should be removed from
                                                         the mzML file!
                                                         Matching tolerances are taken from
                                                         'consensus:blackorwhitelist:rt' and
                                                         'consensus:blackorwhitelist:mz' options.
                                                         If consensus:blackorwhitelist:maps is specified,
                                                         only these will be used.
                                                         (valid formats: 'consensusXML')
  -consensus:blackorwhitelist:maps i j ...               Maps used for black/white list filtering
  -consensus:blackorwhitelist:rt tolerance               Retention tolerance [s] for precursor to consensus 
                                                         feature position (default: '60' min: '0')
  -consensus:blackorwhitelist:mz tolerance               M/z tolerance [Th] for precursor to consensus
                                                         feature position (default: '0.01' min: '0')
  -consensus:blackorwhitelist:use_ppm_tolerance          If ppm tolerance should be used. Otherwise Da are 
                                                         used. (default: 'false' valid: 'false', 'true')

                                                         

Feature & Consensus data options:
  -f_and_c:charge [min]:[max]                            Charge range to extract (default: ':')
  -f_and_c:size [min]:[max]                              Size range to extract (default: ':')
  -f_and_c:remove_meta <name> 'lt|eq|gt' <value>         Expects a 3-tuple (=3 entries in the list), i.e.
                                                         <name> 'lt|eq|gt' <value>; the first is the name of
                                                         meta value, followed by the comparison operator
                                                         (equal, less or greater) and the value to compare
                                                         to. All comparisons are done after converting the
                                                         given value to the corresponding data value type of
                                                         the meta value (for lists, this simply compares
                                                         length, not content!)!

                                                         

ID options. The priority of the id-flags is: remove_annotated_features / remove_unannotated_features ->
remove_clashes -> keep_best_score_id -> sequences_whitelist / accessions_whitelist:
  -id:keep_best_score_id                                 In case of multiple peptide identifications, keep 
                                                         only the id with best score
  -id:sequences_whitelist <sequence>                     Keep only features with white listed sequences, e.g.
                                                         LYSNLVER or the modification (Oxidation)
  -id:accessions_whitelist <accessions>                  Keep only features with white listed accessions, 
                                                         e.g. sp|P02662|CASA1_BOVIN
  -id:remove_annotated_features                          Remove features with annotations
  -id:remove_unannotated_features                        Remove features without annotations
  -id:remove_unassigned_ids                              Remove unassigned peptide identifications
  -id:blacklist <file>                                   Input file containing MS2 identifications whose
                                                         corresponding MS2 spectra should be removed from
                                                         the mzML file!
                                                         Matching tolerances are taken from 'id:rt' and
                                                         'id:mz' options.
                                                         This tool will require all IDs to be matched to an
                                                         MS2 spectrum, and quit with error otherwise. Use
                                                         'id:blacklist_imperfect' to allow for mismatches.
                                                         (valid formats: 'idXML')
  -id:rt tolerance                                       Retention tolerance [s] for precursor to id position
                                                         (default: '0.1' min: '0')
  -id:mz tolerance                                       M/z tolerance [Th] for precursor to id position
                                                         (default: '0.001' min: '0')
  -id:blacklist_imperfect                                Allow for mismatching precursor positions (see
                                                         'id:blacklist')

                                                         
                                                         
Common TOPP options:
  -ini <file>                                            Use the given TOPP INI file
  -threads <n>                                           Sets the number of threads allowed to be used by 
                                                         the TOPP tool (default: '1')
  -write_ini <file>                                      Writes the default configuration file
  --help                                                 Shows options
  --helphelp                                             Shows all options (including advanced)

The following configuration subsections are valid:
 - algorithm   S/N algorithm section

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
Have a look at the OpenMS documentation for more information.
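The 'consensus:blackorwhitelist' options above describe a tolerance window around each consensus feature position, given either in Th/Da or, with 'use_ppm_tolerance', in ppm. The following sketch shows one plausible reading of that matching rule. The function names are hypothetical, the defaults mirror the documented ones (60 s, 0.01 Th, no ppm), and both the inclusive comparison and the choice of the feature m/z as the ppm reference are assumptions.

```python
def precursor_matches(prec_rt, prec_mz, feat_rt, feat_mz,
                      rt_tol=60.0, mz_tol=0.01, use_ppm=False):
    """Return True if a precursor lies within tolerance of a feature.

    Assumptions: bounds are inclusive and a ppm tolerance is taken
    relative to the feature m/z.
    """
    if abs(prec_rt - feat_rt) > rt_tol:
        return False
    allowed = feat_mz * mz_tol * 1e-6 if use_ppm else mz_tol
    return abs(prec_mz - feat_mz) <= allowed


def filter_ms2(precursors, features, blacklist=True, **tol):
    """Blacklist mode drops matched precursors; whitelist mode keeps them."""
    def matched(p):
        return any(precursor_matches(p[0], p[1], f[0], f[1], **tol)
                   for f in features)
    return [p for p in precursors if matched(p) != blacklist]


# Hypothetical (rt, mz) positions.
precursors = [(1000.0, 500.005), (1000.0, 700.0)]
features = [(1030.0, 500.0)]

print(filter_ms2(precursors, features, blacklist=True))
print(filter_ms2(precursors, features, blacklist=False))
print(precursor_matches(1000.0, 500.005, 1030.0, 500.0,
                        mz_tol=5.0, use_ppm=True))
```

At m/z 500 a tolerance of 5 ppm corresponds to 0.0025 Th, so the last call is stricter than the 0.01 Th default even though the numeric value passed is larger.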

INI file documentation of this tool:

Legend:
required parameter
advanced parameter
+ FileFilter: Extracts or manipulates portions of data from peak, feature or consensus-feature files.
version: Version of the tool that generated this parameters file. (default: '2.3.0')
++ 1: Instance '1' section for 'FileFilter'
in: Input file (input file, valid formats: 'mzML', 'featureXML', 'consensusXML')
in_type: Input file type -- default: determined from file extension or content (valid: 'mzML', 'featureXML', 'consensusXML')
out: Output file (output file, valid formats: 'mzML', 'featureXML', 'consensusXML')
out_type: Output file type -- default: determined from file extension or content (valid: 'mzML', 'featureXML', 'consensusXML')
rt: Retention time range to extract (default: ':')
mz: m/z range to extract (applies to ALL ms levels!) (default: ':')
int: Intensity range to extract (default: ':')
sort: Sorts the output according to RT and m/z. (default: 'false' valid: 'true', 'false')
log: Name of log file (created only when specified)
debug: Sets the debug level (default: '0')
threads: Sets the number of threads allowed to be used by the TOPP tool (default: '1')
no_progress: Disables progress logging to command line (default: 'false' valid: 'true', 'false')
force: Overwrite tool specific checks. (default: 'false' valid: 'true', 'false')
test: Enables the test mode (needed for internal use only) (default: 'false' valid: 'true', 'false')
+++ peak_options: Peak data options
sn: Write peaks with S/N > 'sn' values only (default: '0')
rm_pc_charge: Remove MS(2) spectra with these precursor charges. All spectra without precursor are kept! (default: '[]')
pc_mz_range: MSn (n>=2) precursor filtering according to their m/z value. Do not use this flag in conjunction with 'mz', unless you want to actually remove peaks in spectra (see 'mz'). RT filtering is covered by 'rt' and compatible with this flag. (default: ':')
pc_mz_list: List of m/z values. If a precursor window covers ANY of these values, the corresponding MS/MS spectrum will be kept. (default: '[]')
level: MS levels to extract (default: '[1, 2, 3]')
sort_peaks: Sorts the peaks according to m/z (default: 'false' valid: 'true', 'false')
no_chromatograms: No conversion to space-saving real chromatograms, e.g. from SRM scans (default: 'false' valid: 'true', 'false')
remove_chromatograms: Removes chromatograms stored in a file (default: 'false' valid: 'true', 'false')
mz_precision: Store base64 encoded m/z data using 32 or 64 bit precision (default: '64' valid: '32', '64')
int_precision: Store base64 encoded intensity data using 32 or 64 bit precision (default: '32' valid: '32', '64')
indexed_file: Whether to add an index to the file when writing (default: 'false' valid: 'true', 'false')
zlib_compression: Whether to store data with zlib compression (lossless compression) (default: 'false' valid: 'true', 'false')
++++ numpress: Numpress compression for peak data
masstime: Apply MS Numpress compression algorithms in m/z or rt dimension (recommended: linear) (default: 'none' valid: 'none', 'linear', 'pic', 'slof')
masstime_error: Maximal allowable error in m/z or rt dimension (default 10 ppm at 100 m/z; set to 0.5 for pic or negative to disable check and speed up conversion) (default: '0.0001')
intensity: Apply MS Numpress compression algorithms in intensity dimension (recommended: slof or pic) (default: 'none' valid: 'none', 'linear', 'pic', 'slof')
intensity_error: Maximal allowable error in intensity dimension (set to 0.5 for pic or negative to disable check and speed up conversion) (default: '0.0001')
+++ spectra: Remove spectra or select spectra (removing all others) with certain properties
remove_zoom: Remove zoom (enhanced resolution) scans (default: 'false' valid: 'true', 'false')
remove_mode: Remove scans by scan mode (valid: 'Unknown', 'MassSpectrum', 'MS1Spectrum', 'MSnSpectrum', 'SelectedIonMonitoring', 'SelectedReactionMonitoring', 'ConsecutiveReactionMonitoring', 'ConstantNeutralGain', 'ConstantNeutralLoss', 'Precursor', 'EnhancedMultiplyCharged', 'TimeDelayedFragmentation', 'ElectromagneticRadiation', 'Emission', 'Absorption')
remove_activation: Remove MSn scans where any of its precursors features a certain activation method (valid: 'Collision-induced dissociation', 'Post-source decay', 'Plasma desorption', 'Surface-induced dissociation', 'Blackbody infrared radiative dissociation', 'Electron capture dissociation', 'Infrared multiphoton dissociation', 'Sustained off-resonance irradiation', 'High-energy collision-induced dissociation', 'Low-energy collision-induced dissociation', 'Photodissociation', 'Electron transfer dissociation', 'Pulsed q dissociation')
remove_collision_energy: Remove MSn scans with a collision energy in the given interval (default: ':')
remove_isolation_window_width: Remove MSn scans whose isolation window width is in the given interval (default: ':')
select_zoom: Select zoom (enhanced resolution) scans (default: 'false' valid: 'true', 'false')
select_mode: Selects scans by scan mode (valid: 'Unknown', 'MassSpectrum', 'MS1Spectrum', 'MSnSpectrum', 'SelectedIonMonitoring', 'SelectedReactionMonitoring', 'ConsecutiveReactionMonitoring', 'ConstantNeutralGain', 'ConstantNeutralLoss', 'Precursor', 'EnhancedMultiplyCharged', 'TimeDelayedFragmentation', 'ElectromagneticRadiation', 'Emission', 'Absorption')
select_activation: Retain MSn scans where any of its precursors features a certain activation method (valid: 'Collision-induced dissociation', 'Post-source decay', 'Plasma desorption', 'Surface-induced dissociation', 'Blackbody infrared radiative dissociation', 'Electron capture dissociation', 'Infrared multiphoton dissociation', 'Sustained off-resonance irradiation', 'High-energy collision-induced dissociation', 'Low-energy collision-induced dissociation', 'Photodissociation', 'Electron transfer dissociation', 'Pulsed q dissociation')
select_collision_energy: Select MSn scans with a collision energy in the given interval (default: ':')
select_isolation_window_width: Select MSn scans whose isolation window width is in the given interval (default: ':')
select_polarity: Retain MSn scans with a certain scan polarity (valid: 'unknown', 'positive', 'negative')
+++ feature: Feature data options
q: Overall quality range to extract [0:1] (default: ':')
+++ consensus: Consensus feature data options
map: Maps to be extracted from a consensus (default: '[]')
map_and: Consensus features are kept only if they contain exactly one feature from each map (as given above in 'map') (default: 'false' valid: 'true', 'false')
++++ blackorwhitelist: Black or white listing of MS2 spectra by consensus features
blacklist: True: remove matched MS2. False: retain matched MS2 spectra. Other levels are kept (default: 'true' valid: 'false', 'true')
file: Input file containing consensus features whose corresponding MS2 spectra should be removed from the mzML file! Matching tolerances are taken from 'consensus:blackorwhitelist:rt' and 'consensus:blackorwhitelist:mz' options. If consensus:blackorwhitelist:maps is specified, only these will be used. (input file, valid formats: 'consensusXML')
maps: Maps used for black/white list filtering (default: '[]')
rt: Retention tolerance [s] for precursor to consensus feature position (default: '60' min: '0')
mz: m/z tolerance [Th] for precursor to consensus feature position (default: '0.01' min: '0')
use_ppm_tolerance: If ppm tolerance should be used. Otherwise Da are used. (default: 'false' valid: 'false', 'true')
+++ f_and_c: Feature & Consensus data options
charge: Charge range to extract (default: ':')
size: Size range to extract (default: ':')
remove_meta: Expects a 3-tuple (=3 entries in the list), i.e. <name> 'lt|eq|gt' <value>; the first is the name of meta value, followed by the comparison operator (equal, less or greater) and the value to compare to. All comparisons are done after converting the given value to the corresponding data value type of the meta value (for lists, this simply compares length, not content!)! (default: '[]')
+++ id: ID options. The priority of the id-flags is: remove_annotated_features / remove_unannotated_features -> remove_clashes -> keep_best_score_id -> sequences_whitelist / accessions_whitelist
remove_clashes: Remove features with id clashes (different sequences mapped to one feature) (default: 'false' valid: 'true', 'false')
keep_best_score_id: In case of multiple peptide identifications, keep only the id with best score (default: 'false' valid: 'true', 'false')
sequences_whitelist: Keep only features with white listed sequences, e.g. LYSNLVER or the modification (Oxidation) (default: '[]')
accessions_whitelist: Keep only features with white listed accessions, e.g. sp|P02662|CASA1_BOVIN (default: '[]')
remove_annotated_features: Remove features with annotations (default: 'false' valid: 'true', 'false')
remove_unannotated_features: Remove features without annotations (default: 'false' valid: 'true', 'false')
remove_unassigned_ids: Remove unassigned peptide identifications (default: 'false' valid: 'true', 'false')
blacklist: Input file containing MS2 identifications whose corresponding MS2 spectra should be removed from the mzML file! Matching tolerances are taken from 'id:rt' and 'id:mz' options. This tool will require all IDs to be matched to an MS2 spectrum, and quit with error otherwise. Use 'id:blacklist_imperfect' to allow for mismatches. (input file, valid formats: 'idXML')
rt: Retention tolerance [s] for precursor to id position (default: '0.1' min: '0')
mz: m/z tolerance [Th] for precursor to id position (default: '0.001' min: '0')
blacklist_imperfect: Allow for mismatching precursor positions (see 'id:blacklist') (default: 'false' valid: 'true', 'false')
+++ algorithm: S/N algorithm section
++++ SignalToNoise
max_intensity: maximal intensity considered for histogram construction. By default, it will be calculated automatically (see auto_mode). Only provide this parameter if you know what you are doing (and change 'auto_mode' to '-1')! All intensities EQUAL/ABOVE 'max_intensity' will be added to the LAST histogram bin. If you choose 'max_intensity' too small, the noise estimate might be too small as well. If chosen too big, the bins become quite large (which you could counter by increasing 'bin_count', which increases runtime). In general, the Median-S/N estimator is more robust to a manual max_intensity than the MeanIterative-S/N. (default: '-1' min: '-1')
auto_max_stdev_factor: parameter for 'max_intensity' estimation (if 'auto_mode' == 0): mean + 'auto_max_stdev_factor' * stdev (default: '3' min: '0' max: '999')
auto_max_percentile: parameter for 'max_intensity' estimation (if 'auto_mode' == 1): auto_max_percentile th percentile (default: '95' min: '0' max: '100')
auto_mode: method to use to determine maximal intensity: -1 --> use 'max_intensity'; 0 --> 'auto_max_stdev_factor' method (default); 1 --> 'auto_max_percentile' method (default: '0' min: '-1' max: '1')
win_len: window length in Thomson (default: '200' min: '1')
bin_count: number of bins for intensity values (default: '30' min: '3')
min_required_elements: minimum number of elements required in a window (otherwise it is considered sparse) (default: '10' min: '1')
noise_for_empty_window: noise value used for sparse windows (default: '1e+20')
write_log_messages: Write out log messages in case of sparse windows or median in rightmost histogram bin (default: 'true' valid: 'true', 'false')

For the parameters of the S/N algorithm section see the class documentation there:
peak_options:sn
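The 'auto_mode' parameter of the S/N section selects how 'max_intensity' is obtained: taken as given (-1), as mean + 'auto_max_stdev_factor' * stdev (0), or as the 'auto_max_percentile'-th percentile (1). The sketch below illustrates those three branches with the documented defaults. The function name is hypothetical, and the use of the population standard deviation and of a nearest-rank percentile are assumptions; the estimator classes may define both differently.

```python
import math
import statistics


def estimate_max_intensity(intensities, auto_mode=0, max_intensity=-1.0,
                           auto_max_stdev_factor=3.0, auto_max_percentile=95):
    """Illustrate the three documented 'auto_mode' settings."""
    if auto_mode == -1:
        # Use the manually supplied value unchanged.
        return max_intensity
    if auto_mode == 0:
        # Assumption: population standard deviation.
        mean = statistics.fmean(intensities)
        return mean + auto_max_stdev_factor * statistics.pstdev(intensities)
    if auto_mode == 1:
        # Assumption: nearest-rank percentile definition.
        ordered = sorted(intensities)
        rank = max(1, math.ceil(auto_max_percentile / 100 * len(ordered)))
        return ordered[rank - 1]
    raise ValueError("auto_mode must be -1, 0 or 1")


# Hypothetical intensities with mean 5 and population stdev 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(estimate_max_intensity(data, auto_mode=0))   # mean + 3 * stdev
print(estimate_max_intensity(data, auto_mode=1))   # 95th percentile
print(estimate_max_intensity(data, auto_mode=-1, max_intensity=8.0))
```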

Todo:
add tests for selecting modes (port remove modes) (Andreas)

OpenMS / TOPP release 2.3.0 Documentation generated on Tue Jan 9 2018 18:22:06 using doxygen 1.8.13