PeptideProteinResolution Class Reference

Resolves shared peptides based on protein scores.

#include <OpenMS/ANALYSIS/ID/PeptideProteinResolution.h>

Public Member Functions

 PeptideProteinResolution (bool statistics=false)
 
void buildGraph (const ProteinIdentification &protein, const std::vector< PeptideIdentification > &peptides)
 
void resolveGraph (ProteinIdentification &protein, std::vector< PeptideIdentification > &peptides)
 
ConnectedComponent findConnectedComponent (Size &root_prot_grp)
 
void resolveConnectedComponent (ConnectedComponent &conn_comp, ProteinIdentification &protein, std::vector< PeptideIdentification > &peptides)
 

Private Types

typedef std::map< Size, std::set< Size > > IndexMap_
 

Private Attributes

IndexMap_ indist_prot_grp_to_pep_
 mapping indist. protein group indices -> peptide identification indices
 
IndexMap_ pep_to_indist_prot_grp_
 mapping peptide identification indices -> indist. protein group indices
 
std::map< String, Size > prot_acc_to_indist_prot_grp_
 
bool statistics_
 log debug information?
 

Detailed Description

Resolves shared peptides based on protein scores.

Resolves connected components of the bipartite protein-peptide graph based on protein probabilities/scores and adds them as additional protein_groups to the processed protein identification run. Within each component, shared peptides are greedily assigned to the proteins of the current best indistinguishable protein group, and this is repeated until every peptide is uniquely assigned. This effectively allows more peptides to be used in ProteinQuantifier, at the cost of potentially additional noise in the peptide quantities. In accordance with most state-of-the-art protein inference tools, only the best hit (PSM) for a peptide ID is considered. Probability ties are currently resolved by taking the first occurring protein of the component.

Todo:
Implement probability tie resolution.

Improvement:
The class could provide an iterator for ConnectedComponents in the future. One could extend the graph to include all PeptideHits (not only the best); it would then become a tripartite graph with larger connected components. It could also be extended to work with MS1 features. Resolution and adding groups to the output should be separated.

Member Typedef Documentation

◆ IndexMap_

typedef std::map<Size, std::set<Size> > IndexMap_
private

Constructor & Destructor Documentation

◆ PeptideProteinResolution()

PeptideProteinResolution ( bool  statistics = false)

Constructor

Parameters
statistics  Specifies whether the class stores/outputs statistics

Member Function Documentation

◆ buildGraph()

void buildGraph ( const ProteinIdentification &  protein,
const std::vector< PeptideIdentification > &  peptides 
)

Initializes and stores the graph (= the index maps listed under Private Attributes).

Parameters
protein  ProteinIdentification object storing IDs and groups
peptides  vector of PeptideIdentifications with links to the proteins
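
The idea of storing the graph as maps can be sketched with plain standard containers. This is a minimal illustration, not the OpenMS implementation: `GraphMaps`, `buildMaps`, and the string-vector inputs are hypothetical stand-ins for the ProteinIdentification groups and the accessions of each peptide's best hit.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using Size = std::size_t;
using IndexMap = std::map<Size, std::set<Size>>;

// Hypothetical container for both directions of the bipartite graph.
struct GraphMaps
{
  IndexMap group_to_pep;  // group index -> peptide indices
  IndexMap pep_to_group;  // peptide index -> group indices
};

// groups[g] lists the accessions of indistinguishable group g (assumed input).
// peptide_accessions[p] lists the accessions peptide p maps to (assumed input).
GraphMaps buildMaps(const std::vector<std::vector<std::string>>& groups,
                    const std::vector<std::vector<std::string>>& peptide_accessions)
{
  // Middle layer: accession -> index of the group containing it.
  std::map<std::string, Size> acc_to_group;
  for (Size g = 0; g < groups.size(); ++g)
  {
    for (const std::string& acc : groups[g])
    {
      acc_to_group[acc] = g;
    }
  }

  GraphMaps maps;
  for (Size p = 0; p < peptide_accessions.size(); ++p)
  {
    for (const std::string& acc : peptide_accessions[p])
    {
      auto it = acc_to_group.find(acc);
      if (it == acc_to_group.end()) continue;  // accession belongs to no group
      maps.group_to_pep[it->second].insert(p);
      maps.pep_to_group[p].insert(it->second);
    }
  }
  return maps;
}
```

In this sketch, a peptide whose accessions match no group does not appear in either map.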

◆ findConnectedComponent()

ConnectedComponent findConnectedComponent ( Size &  root_prot_grp)

Performs a breadth-first search (BFS) on the two maps (= the two parts of the graph: indist. protein groups and peptides), switching from one map to the other in each step.

Parameters
root_prot_grp  Starts the BFS at this protein group index
Returns
Returns a connected component as a set of group indices and peptide indices.
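
A BFS that alternates between the two maps might look like the sketch below. It is written against plain standard containers under stated assumptions: `Component` and `findComponent` are hypothetical names, and the maps are passed in explicitly instead of being class members.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <utility>

using Size = std::size_t;
using IndexMap = std::map<Size, std::set<Size>>;

// Hypothetical result type: the indices on both sides of one component.
struct Component
{
  std::set<Size> groups;
  std::set<Size> peptides;
};

Component findComponent(Size root_group,
                        const IndexMap& group_to_pep,
                        const IndexMap& pep_to_group)
{
  Component comp;
  comp.groups.insert(root_group);

  // Each queue entry records which side of the graph the index belongs to.
  std::queue<std::pair<bool, Size>> todo;  // (is_group, index)
  todo.push(std::make_pair(true, root_group));

  while (!todo.empty())
  {
    const bool is_group = todo.front().first;
    const Size idx = todo.front().second;
    todo.pop();

    // A group's neighbours are peptides and vice versa, so the map to
    // look up and the set to fill switch on every step.
    const IndexMap& edges = is_group ? group_to_pep : pep_to_group;
    std::set<Size>& seen = is_group ? comp.peptides : comp.groups;

    auto it = edges.find(idx);
    if (it == edges.end()) continue;
    for (Size next : it->second)
    {
      if (seen.insert(next).second)  // true only for newly visited nodes
      {
        todo.push(std::make_pair(!is_group, next));
      }
    }
  }
  return comp;
}
```

Because the visited sets double as the result, the sketch needs no separate bookkeeping structure.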

◆ resolveConnectedComponent()

void resolveConnectedComponent ( ConnectedComponent &  conn_comp,
ProteinIdentification &  protein,
std::vector< PeptideIdentification > &  peptides 
)

Resolves a connected component based on Fido probabilities and adds the result as additional protein_groups to the output idXML. Shared peptides in this component are greedily assigned to the proteins of the current best indistinguishable protein group, so that they can then be used in ProteinQuantifier. This is achieved by removing all other evidence from the input PeptideIDs and iterating until each peptide is uniquely assigned. In accordance with Fido, only the best hit (PSM) for an ID is considered. Probability ties are currently resolved by taking the first occurrence.

Parameters
conn_comp  The component to be resolved
protein  ProteinIdentification object storing IDs and groups
peptides  vector of PeptideIdentifications with links to the proteins
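
The greedy assignment described above can be sketched as follows. This is an illustration under assumptions, not the OpenMS code: `resolveGreedy` is a hypothetical helper, higher scores are assumed to be better, and "first occurrence" is taken to mean the lowest group index.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

using Size = std::size_t;
using IndexMap = std::map<Size, std::set<Size>>;

// Returns peptide index -> the single group the peptide ends up assigned to.
// group_score is indexed by group index (assumption: higher is better).
std::map<Size, Size> resolveGreedy(const std::set<Size>& groups,
                                   const std::vector<double>& group_score,
                                   const IndexMap& group_to_pep)
{
  std::map<Size, Size> assignment;
  std::set<Size> remaining = groups;

  while (!remaining.empty())
  {
    // Pick the best remaining group. The strict '>' keeps the first
    // (lowest-index) group when scores tie.
    Size best = *remaining.begin();
    for (Size g : remaining)
    {
      if (group_score.at(g) > group_score.at(best)) best = g;
    }
    remaining.erase(best);

    auto it = group_to_pep.find(best);
    if (it == group_to_pep.end()) continue;
    for (Size p : it->second)
    {
      // emplace does not overwrite, so a peptide already claimed by a
      // better group stays with that group.
      assignment.emplace(p, best);
    }
  }
  return assignment;
}
```

The sketch only computes the assignment. Removing the other evidence from the peptide identifications and appending the resolved groups to the output are left out.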

◆ resolveGraph()

void resolveGraph ( ProteinIdentification &  protein,
std::vector< PeptideIdentification > &  peptides 
)

Applies resolveConnectedComponent to every component of the graph and writes statistics if enabled (see the statistics constructor parameter). Both parameters are modified by this method.

Parameters
protein  ProteinIdentification object storing IDs and groups
peptides  vector of PeptideIdentifications with links to the proteins
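
Visiting every component exactly once can be sketched as a loop over all groups that skips those already reached. The names `componentsOf` and the `statistics` flag of this free function are hypothetical, and the per-component resolution step is omitted so that the enumeration stays in focus.

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <queue>
#include <set>
#include <vector>

using Size = std::size_t;
using IndexMap = std::map<Size, std::set<Size>>;

// Returns the set of group indices of each connected component.
std::vector<std::set<Size>> componentsOf(const IndexMap& group_to_pep,
                                         const IndexMap& pep_to_group,
                                         bool statistics = false)
{
  std::vector<std::set<Size>> components;
  std::set<Size> visited;

  for (const auto& entry : group_to_pep)
  {
    if (visited.count(entry.first)) continue;  // already part of a component

    // Collect all groups reachable from this root via shared peptides.
    std::set<Size> comp;
    std::queue<Size> todo;
    comp.insert(entry.first);
    todo.push(entry.first);
    while (!todo.empty())
    {
      const Size g = todo.front();
      todo.pop();
      auto peps = group_to_pep.find(g);
      if (peps == group_to_pep.end()) continue;
      for (Size p : peps->second)
      {
        auto grps = pep_to_group.find(p);
        if (grps == pep_to_group.end()) continue;
        for (Size next : grps->second)
        {
          if (comp.insert(next).second) todo.push(next);
        }
      }
    }

    visited.insert(comp.begin(), comp.end());
    components.push_back(comp);  // a real implementation would resolve it here
  }

  if (statistics)
  {
    std::clog << "connected components: " << components.size() << '\n';
  }
  return components;
}
```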

Member Data Documentation

◆ indist_prot_grp_to_pep_

IndexMap_ indist_prot_grp_to_pep_
private

mapping indist. protein group indices -> peptide identification indices

◆ pep_to_indist_prot_grp_

IndexMap_ pep_to_indist_prot_grp_
private

mapping peptide identification indices -> indist. protein group indices

◆ prot_acc_to_indist_prot_grp_

std::map<String, Size> prot_acc_to_indist_prot_grp_
private

represents the middle layer of an implicit tripartite graph: maps single protein accessions to the index of their (indist.) protein group

◆ statistics_

bool statistics_
private

log debug information?


OpenMS / TOPP release 2.3.0 Documentation generated on Tue Jan 9 2018 18:22:11 using doxygen 1.8.13