OpenMS  2.6.0
DatabaseSuitability

Calculates the suitability of a database which was used for a peptide identification search. Also reports the quality of LC-MS spectra.

The metric this tool uses to determine the suitability of a database is based on a de novo model. Therefore it is crucial that your workflow is set up the right way. Above you can see an example.
Most importantly, the peptide identification search needs to be done with a combination of the database in question and a de novo "database".
To generate the de novo "database":

During re-ranking, every case is checked in which a peptide hit found only in the de novo "database" scores above a peptide hit found in the actual database. In each such case the cross-correlation scores of the two peptide hits are compared. If they are similar enough, the database hit is re-ranked above the de novo hit. The reranking_cutoff_percentile parameter controls how many of the cases with similar scores are re-ranked.
For this to work, PeptideIndexer must have been run beforehand. It is equally crucial that no FDR filtering was performed: this tool does that itself and will crash if it finds a q-value. You can still control the FDR that you want to establish using the FDR parameter.
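
The re-ranking decision described above can be illustrated with a small Python sketch. It is a simplified model, not the tool's implementation: the function names, the way the percentile is turned into an index, and the use of raw (unnormalized) cross-correlation differences are all assumptions made for illustration.

```python
from typing import List


def decoy_cutoff(decoy_diffs: List[float], percentile: float) -> float:
    """Return the score difference at the given percentile (0..1) of the
    sorted decoy score differences. Hypothetical helper; the index rule
    used here is an assumption."""
    if not decoy_diffs:
        raise ValueError("need at least one decoy score difference")
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must lie in [0, 1]")
    ordered = sorted(decoy_diffs)
    # Clamp so that percentile == 1.0 still yields a valid index.
    index = min(int(percentile * len(ordered)), len(ordered) - 1)
    return ordered[index]


def should_rerank(novo_xcorr: float, db_xcorr: float, cutoff: float) -> bool:
    """True if the de novo hit scores above the database hit by no more
    than the cutoff, i.e. the two scores count as similar enough."""
    diff = novo_xcorr - db_xcorr
    return 0.0 <= diff <= cutoff


# Made-up xcorr differences between the first two decoy hits of ten PSMs.
diffs = [0.05, 0.40, 0.10, 0.80, 0.20, 0.60, 0.30, 1.00, 0.15, 0.50]
cutoff = decoy_cutoff(diffs, 0.2)

print(should_rerank(2.50, 2.40, cutoff))  # small gap between the two hits
print(should_rerank(2.50, 2.00, cutoff))  # large gap between the two hits
```

In this model a lower percentile selects a smaller cutoff, so fewer database hits are moved above a de novo hit.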

Note
Comet is currently the only supported search engine for the identification search, because the Comet cross-correlation score is needed for re-ranking.
You can still use other search engines and disable re-ranking via the no_rerank flag of this tool. This will probably result in an underestimated suitability, though.

The results are written directly to the console. Optionally, you can provide a tsv output file to which the most important results are exported.
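
For scripted workflows, a call to the tool could be assembled as in the following Python sketch. The file names are hypothetical, and the `-algorithm:` prefix for the FDR and no_rerank parameters is an assumption based on the usual TOPP convention for subsection parameters; check the output of --helphelp before relying on it.

```python
from typing import List, Optional


def build_command(in_id: str, in_spec: str, in_novo: str,
                  out: Optional[str] = None,
                  fdr: Optional[float] = None,
                  no_rerank: bool = False) -> List[str]:
    """Assemble an argument list for DatabaseSuitability (hypothetical helper)."""
    cmd = ["DatabaseSuitability",
           "-in_id", in_id,      # idXML from the combined database search
           "-in_spec", in_spec,  # mzML used for the identification
           "-in_novo", in_novo]  # idXML with the unfiltered de novo peptides
    if out is not None:
        cmd += ["-out", out]     # optional tsv export
    if fdr is not None:
        # Assumed syntax for a parameter in the 'algorithm' subsection.
        cmd += ["-algorithm:FDR", str(fdr)]
    if no_rerank:
        # Assumed syntax for the flag in the 'algorithm' subsection.
        cmd.append("-algorithm:no_rerank")
    return cmd


cmd = build_command("combined_search.idXML", "run.mzML", "de_novo.idXML",
                    out="suitability.tsv", fdr=0.05)
print(" ".join(cmd))
# To actually run the tool: subprocess.run(cmd, check=True)
```

Building the call as a list rather than a single string avoids shell quoting problems when file names contain spaces.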

This tool uses the metrics and algorithms first presented in:
Richard S. Johnson, Brian C. Searle, Brook L. Nunn, Jason M. Gilmore, Molly Phillips, Chris T. Amemiya, Michelle Heck, Michael J. MacCoss. Assessing protein sequence database suitability using de novo sequencing. Molecular & Cellular Proteomics. January 1, 2020; 19, 1: 198-208. doi:10.1074/mcp.TIR119.001752.

The command line parameters of this tool are:

DatabaseSuitability -- Computes a suitability score for a database which was used for a peptide identification
search. Also reports the quality of LC-MS spectra.
Full documentation: 
Version: 2.6.0 Sep 30 2020, 12:54:34, Revision: c26f752
To cite OpenMS:
  Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.
To cite DatabaseSuitability:
  Richard S. Johnson, Brian C. Searle, Brook L. Nunn, Jason M. Gilmore, Molly Phillips, Chris T. Amemiya, Michelle Heck, Michael J. MacCoss. Assessing protein sequence database suitability using de novo sequencing. Molecular & Cellular Proteomics. January 1, 2020; 19, 1: 198-208. doi:10.1074/mcp.TIR119.001752.

Usage:
  DatabaseSuitability <options>

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed
description or use the --helphelp option.

Options (mandatory options marked with '*'):
  -in_id <file>*     Input idXML file from peptide search with combined database with added de novo peptide. 
                     PeptideIndexer is needed, FDR is forbidden. (valid formats: 'idXML')
  -in_spec <file>*   Input MzML file used for the peptide identification (valid formats: 'mzML')
  -in_novo <file>*   Input idXML file containing de novo peptides (unfiltered) (valid formats: 'idXML')
  -out <file>        Optional tsv output containing database suitability information as well as spectral
                     quality. (valid formats: 'tsv')
                     
Common TOPP options:
  -ini <file>        Use the given TOPP INI file
  -threads <n>       Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>  Writes the default configuration file
  --help             Shows options
  --helphelp         Shows all options (including advanced)

The following configuration subsections are valid:
 - algorithm   Parameter section for the suitability calculation algorithm

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
For more information, please consult the online documentation for this tool:
  - 

INI file documentation of this tool:

+ DatabaseSuitability   Computes a suitability score for a database which was used for a peptide identification search. Also reports the quality of LC-MS spectra.
  version       Version of the tool that generated this parameters file. (default: '2.6.0')
  ++ 1          Instance '1' section for 'DatabaseSuitability'
     in_id        Input idXML file from peptide search with combined database with added de novo peptide. PeptideIndexer is needed, FDR is forbidden. (input file, valid formats: '*.idXML')
     in_spec      Input MzML file used for the peptide identification (input file, valid formats: '*.mzML')
     in_novo      Input idXML file containing de novo peptides (unfiltered) (input file, valid formats: '*.idXML')
     out          Optional tsv output containing database suitability information as well as spectral quality. (output file, valid formats: '*.tsv')
     log          Name of log file (created only when specified)
     debug        Sets the debug level (default: '0')
     threads      Sets the number of threads allowed to be used by the TOPP tool (default: '1')
     no_progress  Disables progress logging to command line (default: 'false', valid: 'true', 'false')
     force        Overrides tool-specific checks (default: 'false', valid: 'true', 'false')
     test         Enables the test mode (needed for internal use only) (default: 'false', valid: 'true', 'false')
     +++ algorithm   Parameter section for the suitability calculation algorithm
         no_rerank                    Use this flag if you want to disable re-ranking. Cases, where a de novo peptide scores just higher than the database peptide, are overlooked and counted as a de novo hit. This might underestimate the database quality. (default: 'false', valid: 'true', 'false')
         reranking_cutoff_percentile  Swap a top-scoring deNovo hit with a lower scoring DB hit if their xcorr score difference is in the given percentile of all score differences between the first two decoy hits of a PSM. The lower the value the lower the decoy cut-off will be. Therefore it will be harder for a lower scoring DB hit to be re-ranked to the top. (default: '0.01', valid range: 0.0:1.0)
         FDR                          Filter peptide hits based on this q-value. (e.g., 0.05 = 5 % FDR) (default: '0.01', valid range: 0.0:1.0)