OpenMS
2.4.0
Resolves shared peptides based on protein scores.
#include <OpenMS/ANALYSIS/ID/PeptideProteinResolution.h>
Public Member Functions
PeptideProteinResolution (bool statistics=false)
void | buildGraph (const ProteinIdentification &protein, const std::vector< PeptideIdentification > &peptides)
void | resolveGraph (ProteinIdentification &protein, std::vector< PeptideIdentification > &peptides)
ConnectedComponent | findConnectedComponent (Size &root_prot_grp)
void | resolveConnectedComponent (ConnectedComponent &conn_comp, ProteinIdentification &protein, std::vector< PeptideIdentification > &peptides)
Private Types
typedef std::map< Size, std::set< Size > > | IndexMap_
Private Attributes
IndexMap_ | indist_prot_grp_to_pep_
mapping indist. protein group indices -> peptide identification indices
IndexMap_ | pep_to_indist_prot_grp_
mapping indist. protein group indices <- peptide identification indices
std::map< String, Size > | prot_acc_to_indist_prot_grp_
bool | statistics_
log debug information?
Resolves shared peptides based on protein scores.
Resolves connected components of the bipartite protein-peptide graph based on protein probabilities/scores and adds them as additional protein_groups to the protein identification run being processed. Shared peptides in each component are greedily assigned to the proteins of the current best indistinguishable protein group, until every peptide is uniquely assigned. This allows more peptides to be used in ProteinQuantifier, at the cost of potentially more noise in the peptide quantities. In accordance with most state-of-the-art protein inference tools, only the best hit (PSM) for a peptide ID is considered. Probability ties are currently resolved by taking the first occurring protein of the component.
Todo: Implement probability tie resolution.
PeptideProteinResolution (bool statistics=false)
Constructor.
statistics | Specifies whether the class stores and outputs statistics
void buildGraph (const ProteinIdentification &protein, const std::vector< PeptideIdentification > &peptides)
Initializes and stores the graph (as maps).
protein | ProteinIdentification object storing IDs and groups
peptides | vector of PeptideIdentifications with links to the proteins
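To illustrate what "storing the graph as maps" can look like, here is a minimal, self-contained sketch that builds both directions of a bipartite graph from a list of edges. It does not use OpenMS: `IndexMap`, `BipartiteMaps`, and `buildMaps` are hypothetical names chosen for this example, and the edge list stands in for whatever the real method extracts from the identification objects.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the class's private IndexMap_ typedef.
using IndexMap = std::map<std::size_t, std::set<std::size_t>>;

// Hypothetical container holding both directions of the bipartite graph.
struct BipartiteMaps
{
  IndexMap group_to_peps; // protein group index -> peptide ID indices
  IndexMap pep_to_groups; // peptide ID index -> protein group indices
};

// Builds both maps from (group index, peptide index) edges.
// Every edge is stored twice so that either side can be looked up directly.
BipartiteMaps buildMaps(const std::vector<std::pair<std::size_t, std::size_t>>& edges)
{
  BipartiteMaps maps;
  for (const auto& edge : edges)
  {
    maps.group_to_peps[edge.first].insert(edge.second);
    maps.pep_to_groups[edge.second].insert(edge.first);
  }
  return maps;
}
```

Storing each edge in both maps doubles the memory for edges but makes the alternating traversal between groups and peptides a pair of plain map lookups.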
ConnectedComponent findConnectedComponent (Size &root_prot_grp)
Performs a breadth-first search (BFS) over the two maps (the two parts of the graph: indist. protein groups and peptides), switching from one map to the other in each step.
root_prot_grp | Protein group index at which the BFS starts
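The alternating BFS described above can be sketched as follows. This is an illustration under stated assumptions, not the OpenMS implementation: `Component` and `findComponent` are hypothetical names, the maps are passed in explicitly instead of being class members, and the layout of the real ConnectedComponent type may differ.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <utility>

using IndexMap = std::map<std::size_t, std::set<std::size_t>>;

// Hypothetical result type; the real ConnectedComponent may differ.
struct Component
{
  std::set<std::size_t> groups;   // protein group indices in the component
  std::set<std::size_t> peptides; // peptide ID indices in the component
};

Component findComponent(std::size_t root_group,
                        const IndexMap& group_to_peps,
                        const IndexMap& pep_to_groups)
{
  Component comp;
  // Queue entries are (is_group, index): the flag says which map to consult.
  std::queue<std::pair<bool, std::size_t>> todo;
  comp.groups.insert(root_group);
  todo.push({true, root_group});

  while (!todo.empty())
  {
    const std::pair<bool, std::size_t> cur = todo.front();
    todo.pop();
    // From a group we step to peptides, from a peptide we step to groups.
    const IndexMap& edges = cur.first ? group_to_peps : pep_to_groups;
    std::set<std::size_t>& seen = cur.first ? comp.peptides : comp.groups;

    const auto it = edges.find(cur.second);
    if (it == edges.end()) continue; // node without neighbours
    for (std::size_t next : it->second)
    {
      // insert().second is true only for nodes not visited before.
      if (seen.insert(next).second) todo.push({!cur.first, next});
    }
  }
  return comp;
}
```

The two result sets double as the visited sets, so no separate bookkeeping is needed to avoid revisiting nodes.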
void resolveConnectedComponent (ConnectedComponent &conn_comp, ProteinIdentification &protein, std::vector< PeptideIdentification > &peptides)
Resolves a connected component based on Fido probabilities and adds the result as additional protein_groups to the output idXML. Shared peptides in the component are greedily assigned to the proteins of the current best indistinguishable protein group, so that they can then be used in ProteinQuantifier. This is achieved by removing all other evidence from the input PeptideIDs and iterating until each peptide is uniquely assigned. In accordance with Fido, only the best hit (PSM) for an ID is considered. Probability ties are currently resolved by taking the first occurrence.
conn_comp | The component to be resolved
protein | ProteinIdentification object storing IDs and groups
peptides | vector of PeptideIdentifications with links to the proteins
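The greedy assignment within one component can be sketched roughly as below. This is a simplified illustration, not the OpenMS code: `resolveGreedy` is a hypothetical function, group probabilities are assumed to be given as a plain vector indexed by group, and instead of removing evidence from peptide identifications it simply returns a peptide-to-group assignment.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

using IndexMap = std::map<std::size_t, std::set<std::size_t>>;

// Returns peptide index -> index of the single group it is assigned to.
// group_prob[g] is the (assumed) probability of group g.
std::map<std::size_t, std::size_t> resolveGreedy(const IndexMap& group_to_peps,
                                                 const std::vector<double>& group_prob)
{
  std::map<std::size_t, std::size_t> assignment;
  std::set<std::size_t> remaining;
  for (const auto& entry : group_to_peps) remaining.insert(entry.first);

  while (!remaining.empty())
  {
    // Pick the best remaining group; the strict '>' keeps the first
    // occurring group when probabilities are tied.
    std::size_t best = *remaining.begin();
    for (std::size_t g : remaining)
    {
      if (group_prob.at(g) > group_prob.at(best)) best = g;
    }
    // Claim every peptide of this group that is not yet assigned;
    // emplace leaves earlier (better) assignments untouched.
    for (std::size_t pep : group_to_peps.at(best))
    {
      assignment.emplace(pep, best);
    }
    remaining.erase(best);
  }
  return assignment;
}
```

For example, with groups 0, 1, 2 having probabilities 0.6, 0.9, 0.9, a peptide shared between groups 1 and 2 goes to group 1, because ties fall to the first occurring group.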
void resolveGraph (ProteinIdentification &protein, std::vector< PeptideIdentification > &peptides)
Applies resolveConnectedComponent to every component of the graph and writes statistics if enabled. Both parameters are modified by this method.
protein | ProteinIdentification object storing IDs and groups
peptides | vector of PeptideIdentifications with links to the proteins
IndexMap_ indist_prot_grp_to_pep_ (private)
mapping indist. protein group indices -> peptide identification indices
IndexMap_ pep_to_indist_prot_grp_ (private)
mapping indist. protein group indices <- peptide identification indices
std::map< String, Size > prot_acc_to_indist_prot_grp_ (private)
represents the middle layer of an implicit tripartite graph: consists of single protein accessions and their mapping to the (indist.) group's indices
bool statistics_ (private)
log debug information?