OpenMS 2.8.0
Creates and maintains a Boost graph based on the OpenMS ID data structures.
#include <OpenMS/ANALYSIS/ID/IDBoostGraph.h>
Classes | |
class | dfs_ccsplit_visitor |
A boost dfs visitor that copies connected components into a vector of graphs. | |
class | GetPosteriorVisitor |
Visits nodes in the boost graph (either pointers to an ID object or some lightweight surrogates) and, depending on their type, gets the score (usually the posterior). | |
class | GetScoreTgTVisitor |
Visits nodes in the boost graph (either pointers to an ID object or some lightweight surrogates) and, depending on their type, gets the score (usually the posterior) plus whether the node is a decoy or a target. If not known or not defined, returns (-1.0, false). | |
class | LabelVisitor |
Visits nodes in the boost graph (pointers to an ID object) and, depending on their type, creates a label, e.g. for printing to dot format. | |
class | PrintAddressVisitor |
Visits nodes in the boost graph (pointers to an ID object) and, depending on their type, prints the address. For debugging purposes only. | |
struct | ProteinGroup |
placeholder for peptides with the same parent proteins or protein groups | |
class | SetPosteriorVisitor |
Visits nodes in the boost graph (either pointers to an ID object or some lightweight surrogates) and, depending on their type, sets the posterior. Don't forget to set higherScoreBetter and score names in the parent ID objects. | |
Public Types | |
typedef boost::variant< ProteinHit *, ProteinGroup, PeptideCluster, Peptide, RunIndex, Charge, PeptideHit * > | IDPointer |
Variant over the node types of the graph; its Peptide alternative is a (currently unmodified) peptide sequence. | |
typedef boost::variant< const ProteinHit *, const ProteinGroup *, const PeptideCluster *, const Peptide, const RunIndex, const Charge, const PeptideHit * > | IDPointerConst |
typedef boost::adjacency_list< boost::setS, boost::vecS, boost::undirectedS, IDPointer > | Graph |
typedef std::vector< Graph > | Graphs |
typedef boost::adjacency_list< boost::setS, boost::vecS, boost::undirectedS, IDPointer > | GraphConst |
typedef boost::graph_traits< Graph >::vertex_descriptor | vertex_t |
typedef boost::graph_traits< Graph >::edge_descriptor | edge_t |
typedef std::set< IDBoostGraph::vertex_t > | ProteinNodeSet |
typedef std::set< IDBoostGraph::vertex_t > | PeptideNodeSet |
Public Member Functions | |
IDBoostGraph (ProteinIdentification &proteins, std::vector< PeptideIdentification > &idedSpectra, Size use_top_psms, bool use_run_info, bool best_psms_annotated, const std::optional< const ExperimentalDesign > &ed=std::optional< const ExperimentalDesign >()) | |
Constructors. | |
IDBoostGraph (ProteinIdentification &proteins, ConsensusMap &cmap, Size use_top_psms, bool use_run_info, bool use_unassigned_ids, bool best_psms_annotated, const std::optional< const ExperimentalDesign > &ed=std::optional< const ExperimentalDesign >()) | |
void | applyFunctorOnCCs (const std::function< unsigned long(Graph &, unsigned int)> &functor) |
Applies a functor to each connected component (the functor must be convertible to the given std::function type, e.g. a lambda). | |
void | applyFunctorOnCCsST (const std::function< void(Graph &)> &functor) |
Applies a functor to each connected component, single-threaded (the functor must be convertible to the given std::function type, e.g. a lambda). | |
void | clusterIndistProteinsAndPeptides () |
void | clusterIndistProteinsAndPeptidesAndExtendGraph () |
void | annotateIndistProteins (bool addSingletons=true) |
void | calculateAndAnnotateIndistProteins (bool addSingletons=true) |
void | computeConnectedComponents () |
Splits the initialized graph into connected components and clears it. | |
void | resolveGraphPeptideCentric (bool removeAssociationsInData=true) |
Size | getNrConnectedComponents () |
Zero means the graph was not split yet. | |
const Graph & | getComponent (Size cc) |
Returns a specific connected component of the graph as a graph itself. | |
const ProteinIdentification & | getProteinIDs () |
Returns the underlying protein identifications for viewing. | |
void | getUpstreamNodesNonRecursive (std::queue< vertex_t > &q, const Graph &graph, int lvl, bool stop_at_first, std::vector< vertex_t > &result) |
Searches for all upstream nodes from a (set of) start nodes that are lower than or equal to a given level. The ordering is the same as in the IDPointer variant typedef. | |
void | getDownstreamNodesNonRecursive (std::queue< vertex_t > &q, const Graph &graph, int lvl, bool stop_at_first, std::vector< vertex_t > &result) |
Searches for all downstream nodes from a (set of) start nodes that are higher than or equal to a given level. The ordering is the same as in the IDPointer variant typedef. | |
void | getProteinScores_ (ScoreToTgtDecLabelPairs &scores_and_tgt) |
void | getProteinGroupScoresAndTgtFraction (ScoreToTgtDecLabelPairs &scores_and_tgt_fraction) |
void | getProteinGroupScoresAndHitchhikingTgtFraction (ScoreToTgtDecLabelPairs &scores_and_tgt_fraction) |
Static Public Member Functions | |
static void | printGraph (std::ostream &out, const Graph &fg) |
Prints a graph (a component or, if not split, the full graph) in graphviz (i.e. dot) format. | |
Private Member Functions | |
vertex_t | addVertexWithLookup_ (const IDPointer &ptr, std::unordered_map< IDPointer, vertex_t, boost::hash< IDPointer >> &vertex_map) |
void | annotateIndistProteins_ (const Graph &fg, bool addSingletons) |
Internal function to annotate the underlying ID structures based on the given Graph. | |
void | calculateAndAnnotateIndistProteins_ (const Graph &fg, bool addSingletons) |
void | buildGraph_ (ProteinIdentification &proteins, std::vector< PeptideIdentification > &idedSpectra, Size use_top_psms, bool best_psms_annotated=false) |
void | buildGraph_ (ProteinIdentification &proteins, ConsensusMap &cmap, Size use_top_psms, bool use_unassigned_ids, bool best_psms_annotated=false) |
void | addPeptideIDWithAssociatedProteins_ (PeptideIdentification &spectrum, std::unordered_map< IDPointer, vertex_t, boost::hash< IDPointer >> &vertex_map, const std::unordered_map< std::string, ProteinHit * > &accession_map, Size use_top_psms, bool best_psms_annotated) |
Used during building. | |
void | addPeptideAndAssociatedProteinsWithRunInfo_ (PeptideIdentification &spectrum, std::unordered_map< unsigned, unsigned > &indexToPrefractionationGroup, std::unordered_map< IDPointer, vertex_t, boost::hash< IDPointer >> &vertex_map, std::unordered_map< std::string, ProteinHit * > &accession_map, Size use_top_psms) |
void | buildGraphWithRunInfo_ (ProteinIdentification &proteins, ConsensusMap &cmap, Size use_top_psms, bool use_unassigned_ids, const ExperimentalDesign &ed) |
void | buildGraphWithRunInfo_ (ProteinIdentification &proteins, std::vector< PeptideIdentification > &idedSpectra, Size use_top_psms, const ExperimentalDesign &ed) |
void | resolveGraphPeptideCentric_ (Graph &fg, bool removeAssociationsInData) |
See the equivalent public method. | |
template<class NodeType > | |
void | getDownstreamNodes (const vertex_t &start, const Graph &graph, std::vector< NodeType > &result) |
template<class NodeType > | |
void | getUpstreamNodes (const vertex_t &start, const Graph graph, std::vector< NodeType > &result) |
Private Attributes | |
ProteinIdentification & | protIDs_ |
Graph | g |
The initial boost Graph (will be cleared when split into CCs). | |
Graphs | ccs_ |
The Graph split into connected components. | |
std::unordered_map< vertex_t, Size > | pepHitVtx_to_run_ |
Size | nrPrefractionationGroups_ = 0 |
Creates and maintains a Boost graph based on the OpenMS ID data structures.
Used for finding connected components and applying functions to them. Currently assumes that all PeptideIdentifications are from the ProteinIdentification run that is given; please make sure this is the case. VERY IMPORTANT NOTE: If you add visitors here, make sure they do not touch members of the underlying ID objects that are responsible for the graph structure, e.g. the (protein/peptide)_hits vectors or the lists in ProteinGroups. Setting information like scores or meta values is fine.
struct OpenMS::Internal::IDBoostGraph::ProteinGroup |
placeholder for peptides with the same parent proteins or protein groups
indistinguishable protein groups (size, nr targets, score)
Class Members | ||
---|---|---|
double | score | |
int | size | |
int | tgts |
typedef boost::adjacency_list<boost::setS, boost::vecS, boost::undirectedS, IDPointer> GraphConst |
typedef boost::variant<ProteinHit*, ProteinGroup, PeptideCluster, Peptide, RunIndex, Charge, PeptideHit*> IDPointer |
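Variant over the node types of the graph. Among its alternatives, Peptide is a (currently unmodified) peptide sequence, RunIndex is the run in which a PSM was observed, and Charge is the charge state in which a PSM was observed.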
typedef boost::variant<const ProteinHit*, const ProteinGroup*, const PeptideCluster*, const Peptide, const RunIndex, const Charge, const PeptideHit*> IDPointerConst |
typedef std::set<IDBoostGraph::vertex_t> PeptideNodeSet |
typedef std::set<IDBoostGraph::vertex_t> ProteinNodeSet |
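The visitor classes listed above dispatch on the type stored in the node variant, and the position of a type within the variant doubles as its level in the graph. The following is a minimal sketch of that idea, using `std::variant` in place of `boost::variant` and simplified stand-in types. `ProteinStub`, `PsmStub`, `GroupStub`, `PeptideStub`, `ScoreVisitor`, `scoreOf` and `levelOf` are hypothetical names for illustration and are not part of OpenMS.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Simplified stand-ins for the ID objects; these are not the OpenMS classes.
struct ProteinStub { double score; };
struct PsmStub { double score; };
struct GroupStub { int size; int tgts; double score; };
struct PeptideStub { std::string sequence; };

// The alternative order mirrors the idea behind IDPointer: proteins first, PSMs last.
using Node = std::variant<ProteinStub*, GroupStub, PeptideStub, PsmStub*>;

// Returns the score for node types that carry one and -1.0 otherwise.
struct ScoreVisitor
{
  double operator()(const ProteinStub* p) const { return p->score; }
  double operator()(const PsmStub* p) const { return p->score; }
  double operator()(const GroupStub& g) const { return g.score; }
  double operator()(const PeptideStub&) const { return -1.0; }
};

// Dispatches on the active alternative of the node.
inline double scoreOf(const Node& n) { return std::visit(ScoreVisitor{}, n); }

// The position of the active alternative serves as the level of the node.
inline int levelOf(const Node& n) { return static_cast<int>(n.index()); }
```

Because the visitor only reads scores, it respects the rule that visitors must not modify members that define the graph structure.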
IDBoostGraph | ( | ProteinIdentification & | proteins, |
std::vector< PeptideIdentification > & | idedSpectra, | ||
Size | use_top_psms, | ||
bool | use_run_info, | ||
bool | best_psms_annotated, | ||
const std::optional< const ExperimentalDesign > & | ed = std::optional< const ExperimentalDesign >() |
||
) |
Constructors.
IDBoostGraph | ( | ProteinIdentification & | proteins, |
ConsensusMap & | cmap, | ||
Size | use_top_psms, | ||
bool | use_run_info, | ||
bool | use_unassigned_ids, | ||
bool | best_psms_annotated, | ||
const std::optional< const ExperimentalDesign > & | ed = std::optional< const ExperimentalDesign >() |
||
) |
|
private |
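void addPeptideIDWithAssociatedProteins_ (PeptideIdentification &spectrum, std::unordered_map< IDPointer, vertex_t, boost::hash< IDPointer >> &vertex_map, const std::unordered_map< std::string, ProteinHit * > &accession_map, Size use_top_psms, bool best_psms_annotated) | private |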
Used during building.
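vertex_t addVertexWithLookup_ (const IDPointer &ptr, std::unordered_map< IDPointer, vertex_t, boost::hash< IDPointer >> &vertex_map) | private |
Helper function that adds a vertex if it is not present yet and otherwise returns the existing one. Needs a temporary filled vertex_map that is modifiable.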
void annotateIndistProteins | ( | bool | addSingletons = true | ) |
Annotate indistinguishable proteins by adding the groups to the underlying ProteinIdentification::ProteinGroups object. This has no effect on the graph itself.
addSingletons | whether to also annotate groups with just one protein entry |
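As a rough illustration of what grouping indistinguishable proteins means, the sketch below assumes that two proteins are indistinguishable when they are supported by exactly the same set of peptides. The input layout and the function name `groupIndistinguishable` are hypothetical; the real class derives this information from the graph rather than from a map.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Groups protein accessions that are supported by exactly the same peptides.
// The input maps each accession to its set of peptide sequences (hypothetical layout).
inline std::vector<std::vector<std::string>> groupIndistinguishable(
    const std::map<std::string, std::set<std::string>>& peptides_of_protein,
    bool add_singletons = true)
{
  // Proteins with an identical peptide set end up under the same key.
  std::map<std::set<std::string>, std::vector<std::string>> by_peptide_set;
  for (const auto& entry : peptides_of_protein)
  {
    by_peptide_set[entry.second].push_back(entry.first);
  }

  std::vector<std::vector<std::string>> groups;
  for (const auto& entry : by_peptide_set)
  {
    // Groups with a single member are only reported when requested.
    if (entry.second.size() > 1 || add_singletons)
    {
      groups.push_back(entry.second);
    }
  }
  return groups;
}
```

With `add_singletons` set to false, only groups with at least two members are returned, which mirrors the role of the `addSingletons` parameter.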
void annotateIndistProteins_ (const Graph &fg, bool addSingletons) | private |
internal function to annotate the underlying ID structures based on the given Graph
void applyFunctorOnCCs | ( | const std::function< unsigned long(Graph &, unsigned int)> & | functor | ) |
Applies a functor to each connected component (the functor must be convertible to the given std::function type, e.g. a lambda).
void applyFunctorOnCCsST | ( | const std::function< void(Graph &)> & | functor | ) |
Applies a functor to each connected component, single-threaded (the functor must be convertible to the given std::function type, e.g. a lambda).
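The single-threaded variant can be pictured as a plain loop over the stored components. The sketch below uses a hypothetical `ComponentStub` in place of the boost graph type and a free function instead of the member function; it only illustrates how a capturing lambda binds to a `std::function` parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for a connected component; the real class stores boost graphs.
struct ComponentStub { std::vector<int> vertices; };

// Calls the functor once per component, in order, on a single thread.
inline void applyOnComponents(std::vector<ComponentStub>& components,
                              const std::function<void(ComponentStub&)>& functor)
{
  for (ComponentStub& cc : components)
  {
    functor(cc);
  }
}

// Example use: count the vertices of all components with a capturing lambda.
inline std::size_t countVertices(std::vector<ComponentStub>& components)
{
  std::size_t total = 0;
  applyOnComponents(components, [&total](ComponentStub& cc) { total += cc.vertices.size(); });
  return total;
}
```

Capturing by reference is safe here because the calls run sequentially; a multithreaded variant would need synchronized or per-component results instead.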
|
private |
void buildGraph_ (ProteinIdentification &proteins, std::vector< PeptideIdentification > &idedSpectra, Size use_top_psms, bool best_psms_annotated=false) | private |
Initializes and stores the graph. IMPORTANT: Once the graph is built, editing members like (protein/peptide)_hits_ will invalidate it!
proteins | ProteinIdentification object storing IDs and groups |
idedSpectra | vector of PeptideIdentifications with links to the proteins and PSMs in their PeptideHits |
use_top_psms | Number of top PSMs used per spectrum (<= 0 means all) |
best_psms_annotated | Whether the PSMs are annotated with the "best_per_peptide" meta value. If not, all PSMs are taken into account. |
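How the two PSM selection parameters could interact is sketched below. This is an illustration under stated assumptions, not the OpenMS implementation: `RankedPsm` and `selectPsms` are hypothetical, the hits are assumed to be sorted best-first, and the annotation filter is assumed to apply after the rank cut-off.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a PSM; not the OpenMS PeptideHit class.
struct RankedPsm
{
  double score;
  bool best_per_peptide; // stands in for the "best_per_peptide" meta value
};

// Assumes ranked_hits is sorted best-first. A use_top_psms of 0 keeps all ranks.
inline std::vector<RankedPsm> selectPsms(const std::vector<RankedPsm>& ranked_hits,
                                         std::size_t use_top_psms,
                                         bool best_psms_annotated)
{
  const std::size_t limit = (use_top_psms == 0)
                              ? ranked_hits.size()
                              : std::min(use_top_psms, ranked_hits.size());
  std::vector<RankedPsm> selected;
  for (std::size_t i = 0; i < limit; ++i)
  {
    // Assumption: the annotation filter is applied after the rank cut-off.
    if (best_psms_annotated && !ranked_hits[i].best_per_peptide) continue;
    selected.push_back(ranked_hits[i]);
  }
  return selected;
}
```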
|
private |
Initialize and store the graph. Also stores run information to later group peptides more efficiently. IMPORTANT: Once the graph is built, editing members like (protein/peptide)_hits_ will invalidate it!
use_top_psms | Nr of top PSMs used per spectrum (<= 0 means all) |
|
private |
void calculateAndAnnotateIndistProteins | ( | bool | addSingletons = true | ) |
Annotate indistinguishable proteins by adding the groups to the underlying ProteinIdentification::ProteinGroups object. This has no effect on the graph itself.
addSingletons | whether to also annotate groups with just one protein entry |
|
private |
void clusterIndistProteinsAndPeptides | ( | ) |
Adds intermediate nodes to the graph that represent indistinguishable protein groups and peptides with the same parents. This saves computation time and avoids oscillations later on.
void clusterIndistProteinsAndPeptidesAndExtendGraph | ( | ) |
(under development) As above but adds charge, replicate and sequence layer of nodes (untested)
void computeConnectedComponents | ( | ) |
Splits the initialized graph into connected components and clears it.
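Splitting a graph into connected components can be sketched with an iterative depth-first search. The example below works on plain adjacency lists rather than on the boost `Graph` type, and `Adjacency` and `splitIntoComponents` are hypothetical names; the class itself uses a boost DFS visitor (`dfs_ccsplit_visitor`) for this purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adjacency lists of an undirected graph: adj[v] lists the neighbours of v.
using Adjacency = std::vector<std::vector<std::size_t>>;

// Returns one vertex list per connected component, using an iterative DFS.
inline std::vector<std::vector<std::size_t>> splitIntoComponents(const Adjacency& adj)
{
  std::vector<bool> seen(adj.size(), false);
  std::vector<std::vector<std::size_t>> components;
  for (std::size_t start = 0; start < adj.size(); ++start)
  {
    if (seen[start]) continue; // already assigned to an earlier component
    std::vector<std::size_t> component;
    std::vector<std::size_t> stack;
    stack.push_back(start);
    seen[start] = true;
    while (!stack.empty())
    {
      const std::size_t v = stack.back();
      stack.pop_back();
      component.push_back(v);
      for (std::size_t w : adj[v])
      {
        if (!seen[w])
        {
          seen[w] = true;
          stack.push_back(w);
        }
      }
    }
    components.push_back(component);
  }
  return components;
}
```

Since components share no edges, functions can afterwards be applied to each of them independently.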
const Graph & getComponent | ( | Size | cc | ) |
Returns a specific connected component of the graph as a graph itself.
cc | the index of the component |
|
inlineprivate |
void getDownstreamNodesNonRecursive | ( | std::queue< vertex_t > & | q, |
const Graph & | graph, | ||
int | lvl, | ||
bool | stop_at_first, | ||
std::vector< vertex_t > & | result | ||
) |
Searches for all downstream nodes from a (set of) start nodes that are higher than or equal to a given level. The ordering is the same as in the IDPointer variant typedef.
q | a queue of start nodes |
graph | the graph to look in (q has to be part of it) |
lvl | the level to start reporting from |
stop_at_first | whether to stop at the first node >= lvl or to also report its upstream "predecessors" |
result | vector of reported nodes |
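The queue-based, non-recursive search can be sketched as follows. The example assumes a simplified graph in which every vertex carries an integer level (standing in for the position of its type in the IDPointer variant) and in which "downstream" means moving to a neighbour with a higher level. `LayeredGraph` and `downstreamNodes` are hypothetical names, and the actual member function may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Simplified layered graph; not the boost Graph used by the class.
struct LayeredGraph
{
  std::vector<std::vector<std::size_t>> adj; // undirected adjacency lists
  std::vector<int> level;                    // level of each vertex
};

// Collects vertices with level >= lvl that are reachable by increasing level.
// A vertex reachable via several paths may be reported more than once.
inline void downstreamNodes(std::queue<std::size_t>& q, const LayeredGraph& g, int lvl,
                            bool stop_at_first, std::vector<std::size_t>& result)
{
  while (!q.empty())
  {
    const std::size_t curr = q.front();
    q.pop();
    for (std::size_t next : g.adj[curr])
    {
      // Only follow edges that lead to a higher level ("downstream").
      if (g.level[next] <= g.level[curr]) continue;
      const bool reportable = g.level[next] >= lvl;
      if (reportable) result.push_back(next);
      // Keep walking unless a reportable node was found and the search stops there.
      if (!reportable || !stop_at_first) q.push(next);
    }
  }
}
```

An upstream search would follow the same pattern with the level comparisons reversed.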
Size getNrConnectedComponents | ( | ) |
Zero means the graph was not split yet.
void getProteinGroupScoresAndHitchhikingTgtFraction | ( | ScoreToTgtDecLabelPairs & | scores_and_tgt_fraction | ) |
void getProteinGroupScoresAndTgtFraction | ( | ScoreToTgtDecLabelPairs & | scores_and_tgt_fraction | ) |
Gets the scores and target-decoy fraction from groups, and the score plus binary values for singleton proteins. This function is usually used to create input for FDR calculations.
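A minimal sketch of how such (score, target fraction) pairs could be derived from the documented ProteinGroup members follows. The alias `ScoreFractionPairs` is a hypothetical stand-in for `ScoreToTgtDecLabelPairs`, whose exact definition is not shown on this page, and `collectScoresAndTargetFraction` is an illustrative free function rather than the member function.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Mirrors the documented ProteinGroup members (size, tgts, score).
struct ProteinGroup
{
  int size = 0;
  int tgts = 0;
  double score = 0.0;
};

// Hypothetical stand-in for ScoreToTgtDecLabelPairs: (score, target fraction).
using ScoreFractionPairs = std::vector<std::pair<double, double>>;

inline void collectScoresAndTargetFraction(const std::vector<ProteinGroup>& groups,
                                           ScoreFractionPairs& out)
{
  for (const ProteinGroup& grp : groups)
  {
    if (grp.size <= 0) continue; // nothing to report for an empty group
    // A singleton yields 0.0 or 1.0; larger groups yield the share of targets.
    out.emplace_back(grp.score, static_cast<double>(grp.tgts) / grp.size);
  }
}
```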
const ProteinIdentification& getProteinIDs | ( | ) |
Returns the underlying protein identifications for viewing.
void getProteinScores_ | ( | ScoreToTgtDecLabelPairs & | scores_and_tgt | ) |
Gets the scores from the proteins included in the graph. The difference to querying the underlying ProteinIdentification structure is that not all proteins might be included in the graph, because only the best PSM per peptide may have been used.
|
inlineprivate |
void getUpstreamNodesNonRecursive | ( | std::queue< vertex_t > & | q, |
const Graph & | graph, | ||
int | lvl, | ||
bool | stop_at_first, | ||
std::vector< vertex_t > & | result | ||
) |
Searches for all upstream nodes from a (set of) start nodes that are lower than or equal to a given level. The ordering is the same as in the IDPointer variant typedef.
q | a queue of start nodes |
graph | the graph to look in (q has to be part of it) |
lvl | the level to start reporting from |
stop_at_first | whether to stop at the first node <= lvl or to also report its upstream "predecessors" |
result | vector of reported nodes |
static void printGraph (std::ostream &out, const Graph &fg) | static |
Prints a graph (component or if not split, the full graph) in graphviz (i.e. dot) format.
out | an ostream to print to |
fg | the graph to print |
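For readers unfamiliar with the dot format, the sketch below writes a small undirected graph in Graphviz syntax. It takes plain label and edge vectors instead of a boost `Graph`, and `printDot` is a hypothetical helper; the actual function presumably obtains its labels through `LabelVisitor`. Labels are written verbatim, so quotes inside a label would need escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Writes an undirected graph in Graphviz dot syntax.
// Vertex i is labelled with labels[i]; labels are not escaped.
inline void printDot(std::ostream& out, const std::vector<std::string>& labels,
                     const std::vector<std::pair<std::size_t, std::size_t>>& edges)
{
  out << "graph G {\n";
  for (std::size_t v = 0; v < labels.size(); ++v)
  {
    out << "  " << v << " [label=\"" << labels[v] << "\"];\n";
  }
  for (const auto& e : edges)
  {
    out << "  " << e.first << " -- " << e.second << ";\n";
  }
  out << "}\n";
}
```

The resulting text can be rendered with the Graphviz `dot` command-line tool.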
void resolveGraphPeptideCentric | ( | bool | removeAssociationsInData = true | ) |
removeAssociationsInData | Also removes the corresponding PeptideEvidences in the underlying ID data structure. Only deactivate if you know what you are doing. |
void resolveGraphPeptideCentric_ (Graph &fg, bool removeAssociationsInData) | private |
see equivalent public method
Graphs ccs_ | private |
the Graph split into connected components
Graph g | private |
the initial boost Graph (will be cleared when split into CCs)
Size nrPrefractionationGroups_ = 0 | private |
Stores the number of distinct values in pepHitVtx_to_run_. A prefractionation group (previously called run) is a unique combination of all non-fractionation-related entries in the experimental design, i.e. one (sub-)experiment before fractionation.
std::unordered_map< vertex_t, Size > pepHitVtx_to_run_ | private |
If a graph is built with run information, this stores the run that each peptide hit vertex belongs to. Important for extending the graph.
|
private |