OpenMS
2.8.0
A representation of a QT cluster used for feature grouping.
#include <OpenMS/DATASTRUCTURES/QTCluster.h>
Classes | |
class | BulkData |
Class to store the bulk internal data (neighbors, annotations, etc.). | |
struct | Element |
struct | Neighbor |
Public Types | |
typedef std::multimap< double, const GridFeature * > | NeighborList |
typedef std::unordered_map< Size, NeighborList > | NeighborMapMulti |
typedef std::unordered_map< Size, Neighbor > | NeighborMap |
typedef std::vector< Element > | Elements |
Public Member Functions | |
QTCluster (BulkData *const data, bool use_IDs) | |
Detailed constructor of the cluster head. | |
QTCluster ()=delete | |
Default constructor is not accessible. Objects of this class should only exist with a valid BulkData* given. Otherwise, most of the member functions have undefined behavior or produce segfaults. | |
QTCluster (const QTCluster &rhs)=default | |
Cheap copy constructor, because most of the data lies outside of this class (BulkData*). Be very careful with this copy constructor: the copy points to the same BulkData object as the given QTCluster, and the latter should not be used anymore. This operation is only allowed because the boost::heap interface needs it. | |
QTCluster & | operator= (const QTCluster &rhs)=default |
Cheap copy assignment; see the copy constructor for details. | |
QTCluster (QTCluster &&rhs)=default | |
Cheap move constructor, because most of the data lies outside of this class (BulkData*). | |
QTCluster & | operator= (QTCluster &&rhs)=default |
Cheap move assignment, because most of the data lies outside of this class (BulkData*). | |
~QTCluster ()=default | |
const GridFeature * | getCenterPoint () const |
Returns the cluster center. | |
Size | getId () const |
Returns the cluster's ID. | |
double | getCenterRT () const |
Returns the RT value of the cluster. | |
double | getCenterMZ () const |
Returns the m/z value of the cluster center. | |
Int | getXCoord () const |
Returns the x coordinate in the grid. | |
Int | getYCoord () const |
Returns the y coordinate in the grid. | |
Size | size () const |
Returns the size of the cluster (number of elements, incl. center). | |
bool | operator< (const QTCluster &cluster) const |
Compare by quality. | |
void | add (const GridFeature *const element, double distance) |
Adds a new element/neighbor to the cluster. | |
Elements | getElements () const |
Gets the clustered elements, meaning neighbors + cluster center. | |
bool | update (const Elements &removed) |
Updates the cluster after the indicated data points are removed. | |
double | getQuality () |
Returns the cluster quality and recomputes if necessary. | |
double | getCurrentQuality () const |
Returns the cluster quality without recomputing. | |
const std::set< AASequence > & | getAnnotations () |
Returns the set of peptide sequences annotated to the cluster center. | |
void | setInvalid () |
Sets current cluster as invalid (also frees some memory). | |
bool | isInvalid () const |
Whether current cluster is invalid. | |
void | initializeCluster () |
Has to be called before adding elements (calling QTCluster::add). | |
void | finalizeCluster () |
Has to be called after adding elements (after calling QTCluster::add one or multiple times). | |
Elements | getAllNeighbors () const |
Gets all current neighbors. | |
Private Member Functions | |
void | computeQuality_ () |
Computes the quality of the cluster. | |
double | optimizeAnnotations_ () |
Finds the optimal annotation (peptide sequences) for the cluster. | |
void | makeSeqTable_ (std::map< AASequence, std::map< Size, double >> &seq_table) const |
Computes the sequence table, mapping peptides -> best distance per input map. | |
void | recomputeNeighbors_ () |
Reports elements that are compatible with the optimal annotation. | |
Private Attributes | |
double | quality_ |
Quality of the cluster. | |
BulkData * | data_ |
Pointer to data members. | |
bool | valid_ |
Whether current cluster is valid. | |
bool | changed_ |
Has the cluster changed (if yes, quality needs to be recomputed)? | |
bool | use_IDs_ |
Keep track of peptide IDs and use them for matching? | |
bool | collect_annotations_ |
Whether initial collection of all neighbors is needed. | |
bool | finalized_ |
Whether current cluster is accepting new elements or not (if true, no more new elements allowed). | |
A representation of a QT cluster used for feature grouping.
Ultimately, a cluster represents a group of corresponding features (or consensus features) from different input maps (feature maps or consensus maps).
Clusters are defined by their center points (one feature each). A cluster also stores a number of potential cluster elements (other features) from different input maps, together with their distances to the cluster center. Every feature that satisfies certain constraints with respect to the cluster center is a potential cluster element. However, since a feature group can only contain one feature from each input map, only the "best" (i.e. closest to the cluster center) such feature is considered a true cluster element. To save memory, only the "best" element for each map is stored inside a cluster.
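The "one best element per input map" rule described above can be sketched as follows. This is a minimal illustration, not the OpenMS implementation: FeatureStub, NeighborStub, BestPerMap and offerCandidate are hypothetical names, and NeighborStub only mirrors the shape of QTCluster::Neighbor (a distance plus a feature pointer).

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for OpenMS::GridFeature; only its identity matters here.
struct FeatureStub { int id; };

// Mirrors the shape of QTCluster::Neighbor: distance to the center plus the feature.
struct NeighborStub
{
  double distance;
  const FeatureStub* feature;
};

// map index -> closest candidate seen so far from that input map
using BestPerMap = std::unordered_map<std::size_t, NeighborStub>;

// Offer a candidate for a given input map. It is stored only if no element from
// that map exists yet or if it is closer to the center than the stored one.
// Returns true if the candidate became the stored element.
bool offerCandidate(BestPerMap& best, std::size_t map_index,
                    const FeatureStub* feature, double distance)
{
  auto it = best.find(map_index);
  if (it == best.end() || distance < it->second.distance)
  {
    best[map_index] = NeighborStub{distance, feature};
    return true;
  }
  return false;
}
```

Calling offerCandidate repeatedly for the same map index leaves exactly one entry for that map, namely the candidate with the smallest distance.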
The QT clustering algorithm has the characteristic of initially producing all possible, overlapping clusters. Iteratively, the best cluster is then extracted and the clustering is recomputed for the remaining points.
In our implementation, multiple rounds of clustering are not necessary. Instead, the clustering is updated in each iteration. This is the reason for temporarily storing all potential cluster elements: When a certain cluster is finalized, its elements have to be removed from the remaining clusters, and affected clusters change their composition. (Note that clusters can also be invalidated by this, if the cluster center is being removed.)
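The update step described above can be illustrated with a small sketch. It assumes a simplified cluster that keeps all candidates per input map ordered by distance (the same shape as NeighborMapMulti); ClusterStub, FeatureStub and removeClaimed are hypothetical names and do not reproduce the actual QTCluster::update logic.

```cpp
#include <cstddef>
#include <map>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for OpenMS::GridFeature.
struct FeatureStub { int id; };

struct ClusterStub
{
  const FeatureStub* center = nullptr;
  // map index -> all candidate features from that map, ordered by distance to the center
  std::unordered_map<std::size_t, std::multimap<double, const FeatureStub*>> candidates;
  bool valid = true;
};

// Remove features that were claimed by an extracted cluster.
// Returns true if this cluster's composition changed.
bool removeClaimed(ClusterStub& c, const std::vector<const FeatureStub*>& removed)
{
  bool changed = false;
  for (const FeatureStub* f : removed)
  {
    if (f == c.center)
    {
      // Losing the center invalidates the whole cluster.
      c.valid = false;
      c.candidates.clear();
      return true;
    }
    for (auto map_it = c.candidates.begin(); map_it != c.candidates.end();)
    {
      auto& list = map_it->second;
      for (auto it = list.begin(); it != list.end();)
      {
        if (it->second == f)
        {
          it = list.erase(it);
          changed = true;
        }
        else
        {
          ++it;
        }
      }
      // Drop input maps that no longer contribute any candidate.
      if (list.empty()) map_it = c.candidates.erase(map_it);
      else ++map_it;
    }
  }
  return changed;
}
```

After a removal, the first entry of each remaining per-map list is the new best element for that map, which is why all candidates are kept until the cluster is settled.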
The quality of a cluster is the normalized average distance to the cluster center for present and missing cluster elements. The distance value for missing elements (if the cluster contains no feature from a certain input map) is the user-defined threshold that marks the maximum allowed radius of a cluster.
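As a rough sketch of the quality definition above: the function below averages the distances of the present elements, charges the maximum distance for every missing input map, and normalizes by the maximum distance. Two points are assumptions that the text does not state: that the center's own map is excluded from the count (num_other_maps), and that the final score is expressed as "1 minus the normalized average" so that larger values mean better clusters. Function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// distances:      distance of the best element from each input map that contributes one
// num_other_maps: number of input maps besides the center's map (assumption)
// max_distance:   user-defined maximum cluster radius, must be > 0
double normalizedAverageDistance(const std::vector<double>& distances,
                                 std::size_t num_other_maps, double max_distance)
{
  assert(max_distance > 0.0);
  assert(num_other_maps > 0 && distances.size() <= num_other_maps);
  double sum = 0.0;
  for (double d : distances) sum += d;
  // Every missing input map counts with the maximum allowed distance.
  sum += static_cast<double>(num_other_maps - distances.size()) * max_distance;
  return (sum / static_cast<double>(num_other_maps)) / max_distance;
}

// Assumption: turn the normalized distance into a "higher is better" score in [0, 1].
double qualityScore(const std::vector<double>& distances,
                    std::size_t num_other_maps, double max_distance)
{
  return 1.0 - normalizedAverageDistance(distances, num_other_maps, max_distance);
}
```

For example, with distances 0.25 and 0.5, four other maps and a maximum distance of 1.0, the normalized average is (0.25 + 0.5 + 2 * 1.0) / 4 = 0.6875.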
When adding elements to the cluster, the client must call initializeCluster first and finalizeCluster after adding the last element. After finalizeCluster, no more elements may be added through the add function unless initializeCluster is called again.
If use_IDs_ is set, clusters are extended only with elements that have at least one matching ID. The quality is then computed as the best quality over all possible IDs, and that ID is used as the only (representative) ID of the cluster. The left-out alternative IDs might be added back later, based on the original features.
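The calling protocol for adding elements can be pictured with a small mock. ClusterLifecycleStub is a hypothetical class, not QTCluster itself; in particular, throwing std::logic_error on misuse is an assumption made here for illustration, since the documentation does not say how the real class reacts.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical mock of the initialize / add / finalize protocol.
class ClusterLifecycleStub
{
public:
  // Opens the cluster for new elements.
  void initializeCluster() { accepting_ = true; }

  // Adds an element; only legal between initializeCluster() and finalizeCluster().
  void add(double distance)
  {
    if (!accepting_)
    {
      // Assumption: misuse is reported by an exception in this sketch.
      throw std::logic_error("add() called on a cluster that is not accepting elements");
    }
    distances_.push_back(distance);
  }

  // Closes the cluster; further add() calls need a new initializeCluster().
  void finalizeCluster() { accepting_ = false; }

  std::size_t size() const { return distances_.size(); }

private:
  bool accepting_ = false;
  std::vector<double> distances_;
};
```

A typical sequence is initializeCluster(), one or more add() calls, then finalizeCluster(); calling initializeCluster() again reopens the cluster.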
struct OpenMS::QTCluster::Element |
Class Members | ||
---|---|---|
const GridFeature * | feature | |
Size | map_index |
struct OpenMS::QTCluster::Neighbor |
Class Members | ||
---|---|---|
double | distance | |
const GridFeature * | feature |
typedef std::multimap<double, const GridFeature*> NeighborList |
typedef std::unordered_map<Size, Neighbor> NeighborMap |
typedef std::unordered_map<Size, NeighborList> NeighborMapMulti |
QTCluster | ( | BulkData *const | data, |
bool | use_IDs | ||
) |
Detailed constructor of the cluster head.
data | Pointer to internal data |
use_IDs | Use peptide annotations? |
QTCluster | ( | ) |
delete
Default constructor is not accessible. Objects of this class should only exist with a valid BulkData* given. Otherwise, most of the member functions have undefined behavior or produce segfaults.
QTCluster | ( | const QTCluster & | rhs | ) |
default
Cheap copy constructor, because most of the data lies outside of this class (BulkData*). Be very careful with this copy constructor: the copy points to the same BulkData object as the given QTCluster, and the latter should not be used anymore. This operation is only allowed because the boost::heap interface needs it.
QTCluster | ( | QTCluster && | rhs | ) |
default
Cheap move constructor, because most of the data lies outside of this class (BulkData*).
~QTCluster | ( | ) |
default
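The warning about the copy constructor follows from the fact that only a pointer is copied. The sketch below shows the effect with hypothetical stand-ins (BulkStub for BulkData, HandleStub for QTCluster); it is not the OpenMS code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for QTCluster::BulkData.
struct BulkStub
{
  std::vector<double> distances;
};

// Hypothetical stand-in for QTCluster: a light handle around a non-owning pointer.
class HandleStub
{
public:
  explicit HandleStub(BulkStub* data) : data_(data) {}
  HandleStub(const HandleStub&) = default;            // copies the pointer only
  HandleStub& operator=(const HandleStub&) = default; // copies the pointer only

  void add(double d) { data_->distances.push_back(d); }
  std::size_t size() const { return data_->distances.size(); }

private:
  BulkStub* data_; // non-owning; shared by all copies of the handle
};
```

Because both handles alias the same BulkStub, a change made through the copy is visible through the original, which is why the original should no longer be used after copying.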
void add | ( | const GridFeature *const | element, |
double | distance | ||
) |
Adds a new element/neighbor to the cluster.
element | The element to be added |
distance | Distance of the element to the center point |
void computeQuality_ | ( | ) |
private
Computes the quality of the cluster.
void finalizeCluster | ( | ) |
Has to be called after adding elements (after calling QTCluster::add one or multiple times)
Elements getAllNeighbors | ( | ) | const |
Get all current neighbors.
const std::set<AASequence>& getAnnotations | ( | ) |
Return the set of peptide sequences annotated to the cluster center.
double getCenterMZ | ( | ) | const |
Returns the m/z value of the cluster center.
const GridFeature* getCenterPoint | ( | ) | const |
Returns the cluster center.
double getCenterRT | ( | ) | const |
Returns the RT value of the cluster.
double getCurrentQuality | ( | ) | const |
Returns the cluster quality without recomputing.
Elements getElements | ( | ) | const |
Gets the clustered elements, meaning neighbors + cluster center.
Size getId | ( | ) | const |
Returns the cluster's ID.
double getQuality | ( | ) |
Returns the cluster quality and recomputes if necessary.
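The difference between getQuality (recomputes if necessary) and getCurrentQuality (never recomputes) matches a common dirty-flag pattern, sketched below. LazyQualityStub, setDistanceSum and the placeholder formula in compute_ are hypothetical and only serve to make the caching behavior visible.

```cpp
// Hypothetical sketch of lazy recomputation guarded by a "changed" flag.
class LazyQualityStub
{
public:
  // Any modification marks the cached quality as stale.
  void setDistanceSum(double sum)
  {
    sum_ = sum;
    changed_ = true;
  }

  // Recomputes only when something changed since the last computation.
  double getQuality()
  {
    if (changed_)
    {
      quality_ = compute_();
      changed_ = false;
      ++recomputations_;
    }
    return quality_;
  }

  // Returns the cached value as-is, even if it is stale.
  double getCurrentQuality() const { return quality_; }

  int recomputations() const { return recomputations_; }

private:
  // Placeholder formula (assumption), not the OpenMS quality computation.
  double compute_() const { return 1.0 - sum_; }

  double sum_ = 0.0;
  double quality_ = 0.0;
  bool changed_ = false;
  int recomputations_ = 0;
};
```

The const accessor is safe to call from comparison operators, while the non-const one guarantees an up-to-date value.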
Int getXCoord | ( | ) | const |
Returns the x coordinate in the grid.
Int getYCoord | ( | ) | const |
Returns the y coordinate in the grid.
void initializeCluster | ( | ) |
Has to be called before adding elements (calling QTCluster::add)
bool isInvalid | ( | ) | const |
inline
Whether current cluster is invalid.
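void makeSeqTable_ | ( | std::map< AASequence, std::map< Size, double >> & | seq_table | ) | const |
private
Computes the sequence table, mapping peptides -> best distance per input map.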
bool operator< | ( | const QTCluster & | cluster | ) | const |
Compare by quality.
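QTCluster& operator= | ( | const QTCluster & | rhs | ) |
default
Cheap copy assignment; see the copy constructor for details.
QTCluster& operator= | ( | QTCluster && | rhs | ) |
default
Cheap move assignment, because most of the data lies outside of this class (BulkData*).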
double optimizeAnnotations_ | ( | ) |
private
Finds the optimal annotation (peptide sequences) for the cluster.
The optimal annotation is the one that results in the best quality. It is stored in annotations_.
This function is only needed when peptide ids are used and the current center point does not have any peptide id associated with it. In this case, it is not clear which peptide id the current cluster should use. The function thus iterates through all possible peptide ids and selects the one producing the best cluster.
This function needs access to all possible neighbors for this cluster and thus can only be run when tmp_neighbors_ is filled (which is during the filling of a cluster). The function thus cannot be called after finalizing the cluster.
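The selection described above (iterate over all candidate peptide IDs and keep the one that yields the best cluster) can be sketched on top of a sequence table of the kind makeSeqTable_ produces. This is an illustration under assumptions: std::string stands in for AASequence, a missing input map is charged the maximum distance, and "best" means the smallest total distance. SeqTableStub and pickBestAnnotation are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <string>

// peptide sequence -> (map index -> best distance for that input map)
// std::string is a stand-in for AASequence in this sketch.
using SeqTableStub = std::map<std::string, std::map<std::size_t, double>>;

// Returns the sequence with the smallest total distance, where every input map
// without a matching element is charged max_distance. Returns an empty string
// for an empty table. Ties go to the lexicographically first sequence.
std::string pickBestAnnotation(const SeqTableStub& table,
                               std::size_t num_other_maps, double max_distance)
{
  std::string best_seq;
  double best_total = 0.0;
  bool found = false;
  for (const auto& entry : table)
  {
    const auto& per_map = entry.second;
    double total = 0.0;
    for (const auto& map_and_distance : per_map) total += map_and_distance.second;
    const std::size_t missing =
        num_other_maps > per_map.size() ? num_other_maps - per_map.size() : 0;
    total += static_cast<double>(missing) * max_distance;
    if (!found || total < best_total)
    {
      best_seq = entry.first;
      best_total = total;
      found = true;
    }
  }
  return best_seq;
}
```

Comparing totals is enough here because the normalization (number of maps and maximum distance) is the same for every candidate sequence.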
void recomputeNeighbors_ | ( | ) |
private
Reports elements that are compatible with the optimal annotation.
void setInvalid | ( | ) |
Sets current cluster as invalid (also frees some memory)
Size size | ( | ) | const |
Returns the size of the cluster (number of elements, incl. center)
bool update | ( | const Elements & | removed | ) |
Updates the cluster after the indicated data points are removed.
removed | The datapoints to be removed from the cluster |
bool changed_ |
private
Has the cluster changed (if yes, quality needs to be recomputed)?
bool collect_annotations_ |
private
Whether initial collection of all neighbors is needed.
This variable stores whether we need to collect all annotations first before we can decide upon the best set of cluster points. This is usually only necessary if the center point does not have an annotation but we want to use ids.
BulkData* data_ |
private
Pointer to data members.
bool finalized_ |
private
Whether current cluster is accepting new elements or not (if true, no more new elements allowed)
double quality_ |
private
Quality of the cluster.
bool use_IDs_ |
private
Keep track of peptide IDs and use them for matching?
bool valid_ |
private
Whether current cluster is valid.