OpenMS
DecoyHelper Class Reference

Helper class for calculations on decoy proteins.

#include <OpenMS/DATASTRUCTURES/FASTAContainer.h>

Classes

struct  DecoyStatistics
 Struct for intermediate results needed for calculations on decoy proteins.
 
struct  Result
 

Static Public Member Functions

template<typename T >
static Result findDecoyString (FASTAContainer< T > &proteins)
 Heuristic to determine the decoy string given a set of protein names.
 
template<typename T >
static DecoyStatistics countDecoys (FASTAContainer< T > &proteins)
 Function to count the occurrences of decoy strings in a given set of protein names.
 

Static Public Attributes

static const std::vector< std::string > affixes = { "decoy", "dec", "reverse", "rev", "reversed", "__id_decoy", "xxx", "shuffled", "shuffle", "pseudo", "random" }
 
static const std::string regexstr_prefix = std::string("^(") + ListUtils::concatenate<std::string>(affixes, "_*|") + "_*)"
 
static const std::string regexstr_suffix = std::string("(_") + ListUtils::concatenate<std::string>(affixes, "*|_") + ")$"
 

Private Types

using DecoyStringToAffixCount = std::unordered_map< std::string, std::pair< Size, Size > >
 
using CaseInsensitiveToCaseSensitiveDecoy = std::unordered_map< std::string, std::string >
 

Detailed Description

Helper class for calculations on decoy proteins.

Member Typedef Documentation

◆ CaseInsensitiveToCaseSensitiveDecoy

using CaseInsensitiveToCaseSensitiveDecoy = std::unordered_map<std::string, std::string>
private

◆ DecoyStringToAffixCount

using DecoyStringToAffixCount = std::unordered_map<std::string, std::pair<Size, Size> >
private

Member Function Documentation

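◆ countDecoys()

static DecoyStatistics countDecoys ( FASTAContainer< T > &  proteins)

Function to count the occurrences of decoy strings in a given set of protein names.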

◆ findDecoyString()

static Result findDecoyString ( FASTAContainer< T > &  proteins)
inlinestatic

Heuristic to determine the decoy string given a set of protein names.

For the decoy strings tested, see DecoyHelper::affixes. Each candidate is tested both as a prefix and as a suffix; if one of them is found in at least 40% of all proteins, it is returned as the winner (see DecoyHelper::Result).

References DecoyHelper::DecoyStatistics::all_prefix_occur, DecoyHelper::DecoyStatistics::all_proteins_count, DecoyHelper::DecoyStatistics::all_suffix_occur, DecoyHelper::countDecoys(), DecoyHelper::DecoyStatistics::decoy_case_sensitive, DecoyHelper::DecoyStatistics::decoy_count, OPENMS_LOG_DEBUG, OPENMS_LOG_ERROR, and OPENMS_LOG_WARN.
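The heuristic described for findDecoyString() can be illustrated with a minimal, self-contained sketch. It is not the OpenMS implementation: it works on a plain list of protein names instead of a FASTAContainer, uses simple string comparison instead of the regular expressions, and does no logging. The names SketchResult, guessDecoyString and min_fraction are hypothetical, and the lower-casing of names and the tie-break in favour of the longer affix are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type for this sketch; not DecoyHelper::Result.
struct SketchResult
{
  bool success = false;   // true if an affix reached the threshold
  std::string name;       // winning affix, empty if none
  bool is_prefix = true;  // true: prefix, false: suffix
};

// Lower-case a copy of the string (assumption: matching ignores case).
std::string toLower(std::string s)
{
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

bool startsWith(const std::string& s, const std::string& a)
{
  return s.size() >= a.size() && s.compare(0, a.size(), a) == 0;
}

bool endsWith(const std::string& s, const std::string& a)
{
  return s.size() >= a.size() && s.compare(s.size() - a.size(), a.size(), a) == 0;
}

// Count each affix as prefix and as suffix, then accept the most frequent
// one if it occurs in at least min_fraction of all names.
SketchResult guessDecoyString(const std::vector<std::string>& names,
                              const std::vector<std::string>& affixes,
                              double min_fraction = 0.4)
{
  // counts[i] = {prefix hits, suffix hits} for affixes[i], initially {0, 0}
  std::vector<std::pair<std::size_t, std::size_t>> counts(affixes.size());
  for (const std::string& raw : names)
  {
    const std::string name = toLower(raw);
    for (std::size_t i = 0; i < affixes.size(); ++i)
    {
      if (startsWith(name, affixes[i])) ++counts[i].first;
      if (endsWith(name, affixes[i])) ++counts[i].second;
    }
  }

  SketchResult best;
  std::size_t best_count = 0;
  // Prefer the higher count; on equal non-zero counts prefer the longer affix,
  // so that "decoy" wins over "dec" when both match the same names.
  auto consider = [&](std::size_t count, const std::string& affix, bool is_prefix)
  {
    if (count > best_count ||
        (count > 0 && count == best_count && affix.size() > best.name.size()))
    {
      best_count = count;
      best.name = affix;
      best.is_prefix = is_prefix;
    }
  };
  for (std::size_t i = 0; i < affixes.size(); ++i)
  {
    consider(counts[i].first, affixes[i], true);
    consider(counts[i].second, affixes[i], false);
  }

  best.success = !names.empty() &&
                 static_cast<double>(best_count) >= min_fraction * static_cast<double>(names.size());
  if (!best.success) best.name.clear();
  return best;
}

// Small usage check: two of four names carry the "DECOY_" prefix (50% >= 40%).
void demoGuess()
{
  const std::vector<std::string> affixes = {"decoy", "dec", "reverse", "rev"};
  const SketchResult r = guessDecoyString({"DECOY_P1", "DECOY_P2", "P1", "P2"}, affixes);
  assert(r.success && r.name == "decoy" && r.is_prefix);
}
```

In this sketch a database in which fewer than 40% of the names carry a known affix yields success == false, which corresponds to the case where no winner is returned.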

Member Data Documentation

◆ affixes

const std::vector<std::string> affixes = { "decoy", "dec", "reverse", "rev", "reversed", "__id_decoy", "xxx", "shuffled", "shuffle", "pseudo", "random" }
inlinestatic

◆ regexstr_prefix

const std::string regexstr_prefix = std::string("^(") + ListUtils::concatenate<std::string>(affixes, "_*|") + "_*)"
inlinestatic

◆ regexstr_suffix

const std::string regexstr_suffix = std::string("(_") + ListUtils::concatenate<std::string>(affixes, "*|_") + ")$"
inlinestatic
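To make the two regular expression strings easier to read, the following sketch expands them for a shortened affix list. The join() helper is a hypothetical stand-in for ListUtils::concatenate and assumes that it places the glue string between consecutive elements without a trailing glue. Case-insensitive matching via std::regex::icase is likewise an assumption of this example, not something stated on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for ListUtils::concatenate (assumed behaviour:
// glue is inserted between consecutive elements only).
std::string join(const std::vector<std::string>& items, const std::string& glue)
{
  std::string out;
  for (std::size_t i = 0; i < items.size(); ++i)
  {
    if (i != 0) out += glue;
    out += items[i];
  }
  return out;
}

// Same construction as regexstr_prefix, for an arbitrary affix list.
std::string makePrefixRegex(const std::vector<std::string>& affixes)
{
  return std::string("^(") + join(affixes, "_*|") + "_*)";
}

// Same construction as regexstr_suffix, for an arbitrary affix list.
std::string makeSuffixRegex(const std::vector<std::string>& affixes)
{
  return std::string("(_") + join(affixes, "*|_") + ")$";
}

// True if the name starts with one of the affixes (case-insensitive here).
bool hasDecoyPrefix(const std::string& name, const std::vector<std::string>& affixes)
{
  const std::regex re(makePrefixRegex(affixes), std::regex::icase);
  return std::regex_search(name, re);
}

// Shows the expanded patterns for the two affixes "decoy" and "rev".
void checkExpansion()
{
  const std::vector<std::string> affixes = {"decoy", "rev"};
  assert(makePrefixRegex(affixes) == "^(decoy_*|rev_*)");
  assert(makeSuffixRegex(affixes) == "(_decoy*|_rev)$");
  assert(hasDecoyPrefix("DECOY_sp|P12345", affixes));
}
```

Under this assumption about the join, the prefix pattern allows any number of underscores after an affix, while in the suffix pattern each affix requires one leading underscore and the `*` binds to the last character of every affix except the final one.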