A representation of a QT cluster used for feature grouping.
#include <OpenMS/DATASTRUCTURES/QTCluster.h>
Public Member Functions | |
QTCluster (GridFeature *center_point, Size num_maps, double max_distance, bool use_IDs, Int x_coord, Int y_coord) | |
Detailed constructor. | |
virtual | ~QTCluster () |
Destructor. | |
GridFeature * | getCenterPoint () |
Returns the cluster center. | |
double | getCenterRT () const |
Returns the RT value of the cluster center. | |
double | getCenterMZ () const |
Returns the m/z value of the cluster center. | |
Int | getXCoord () const |
Returns the x coordinate in the grid. | |
Int | getYCoord () const |
Returns the y coordinate in the grid. | |
Size | size () const |
Returns the size of the cluster (number of elements, incl. center) | |
bool | operator< (QTCluster &cluster) |
Compare by quality. | |
void | add (GridFeature *element, double distance) |
Adds a new element/neighbor to the cluster. | |
void | getElements (OpenMSBoost::unordered_map< Size, GridFeature *> &elements) |
Gets the clustered elements. | |
bool | update (const OpenMSBoost::unordered_map< Size, GridFeature *> &removed) |
Updates the cluster after the indicated data points are removed. | |
double | getQuality () |
Returns the cluster quality. | |
const std::set< AASequence > & | getAnnotations () |
Returns the set of peptide sequences annotated to the cluster center. | |
void | setInvalid () |
Sets current cluster as invalid (also frees some memory) | |
bool | isInvalid () const |
Whether current cluster is invalid. | |
void | initializeCluster () |
Has to be called before adding elements (calling QTCluster::add) | |
void | finalizeCluster () |
Has to be called after adding elements (after calling QTCluster::add one or multiple times) | |
OpenMSBoost::unordered_map< Size, std::vector< GridFeature * > > | getAllNeighbors () |
Get all current neighbors. | |
Private Types | |
typedef std::multimap< double, GridFeature * > | NeighborListType |
typedef OpenMSBoost::unordered_map< Size, NeighborListType > | NeighborMapMulti |
typedef std::pair< double, GridFeature * > | NeighborPairType |
typedef OpenMSBoost::unordered_map< Size, NeighborPairType > | NeighborMap |
Private Member Functions | |
QTCluster () | |
Base constructor (not accessible) | |
void | computeQuality_ () |
Computes the quality of the cluster. | |
double | optimizeAnnotations_ () |
Finds the optimal annotation (peptide sequences) for the cluster. | |
Private Attributes | |
GridFeature * | center_point_ |
Pointer to the cluster center. | |
NeighborMap | neighbors_ |
Map that keeps track of the best current feature for each map. | |
NeighborMapMulti * | tmp_neighbors_ |
Temporary map tracking *all* neighbors. | |
double | max_distance_ |
Maximum distance of a point that can still belong to the cluster. | |
Size | num_maps_ |
Number of input maps. | |
double | quality_ |
Quality of the cluster. | |
bool | changed_ |
Has the cluster changed (if yes, quality needs to be recomputed)? | |
bool | use_IDs_ |
Keep track of peptide IDs and use them for matching? | |
bool | valid_ |
Whether current cluster is valid. | |
bool | collect_annotations_ |
Whether initial collection of all neighbors is needed. | |
bool | finalized_ |
Whether the cluster has been finalized (if true, no new elements are accepted) | |
Int | x_coord_ |
x coordinate in the grid cell | |
Int | y_coord_ |
y coordinate in the grid cell | |
std::set< AASequence > | annotations_ |
Set of annotations of the cluster. | |
A representation of a QT cluster used for feature grouping.
Ultimately, a cluster represents a group of corresponding features (or consensus features) from different input maps (feature maps or consensus maps).
Clusters are defined by their center points (one feature each). A cluster also stores a number of potential cluster elements (other features) from different input maps, together with their distances to the cluster center. Every feature that satisfies certain constraints with respect to the cluster center is a potential cluster element. However, since a feature group can only contain one feature from each input map, only the "best" (i.e. closest to the cluster center) such feature is considered a true cluster element. To save memory, only the "best" element for each map is stored inside a cluster.
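The "best element per map" rule described above can be illustrated with a minimal, self-contained sketch. It does not use the OpenMS headers: `ToyFeature`, `BestPerMap` and `keepIfCloser` are hypothetical names introduced only for this illustration, and the real class may organise the bookkeeping differently.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a grid feature; NOT the OpenMS GridFeature class.
struct ToyFeature
{
  std::size_t map_index; // index of the input map the feature comes from
  double rt;             // retention time
  double mz;             // mass-to-charge ratio
};

// Best neighbor per input map: map index -> (distance to center, feature).
using BestPerMap =
    std::unordered_map<std::size_t, std::pair<double, const ToyFeature*>>;

// Stores `candidate` only if it is closer to the cluster center than the
// feature currently stored for the same input map.
// Returns true if the candidate was stored.
bool keepIfCloser(BestPerMap& best, const ToyFeature* candidate, double distance)
{
  assert(candidate != nullptr);
  auto it = best.find(candidate->map_index);
  if (it == best.end() || distance < it->second.first)
  {
    best[candidate->map_index] = std::make_pair(distance, candidate);
    return true;
  }
  return false;
}
```

With this layout, the stored element count never exceeds the number of input maps, no matter how many candidates are offered.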
The QT clustering algorithm has the characteristic of initially producing all possible, overlapping clusters. Iteratively, the best cluster is then extracted and the clustering is recomputed for the remaining points.
In our implementation, multiple rounds of clustering are not necessary. Instead, the clustering is updated in each iteration. This is the reason for storing all potential cluster elements: When a certain cluster is finalized, its elements have to be removed from the remaining clusters, and affected clusters change their composition. (Note that clusters can also be invalidated by this, if the cluster center is being removed.)
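The extract-and-update loop described above might be sketched as follows. This is a toy model under stated assumptions, not the OpenMS implementation: features are plain integer ids, `CandidateCluster` and `extractGroups` are hypothetical names, and the toy quality is simply the number of members (more is better), which differs from the distance-based quality of the real class.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Toy cluster: a center feature id plus the ids of its current members.
struct CandidateCluster
{
  CandidateCluster(int c, std::set<int> m)
    : center(c), members(std::move(m)), valid(true) {}

  int center;
  std::set<int> members;
  bool valid;

  // Toy quality (assumption): more members means a better cluster.
  std::size_t quality() const { return members.size(); }
};

// Repeatedly extracts the best valid cluster and updates the remaining ones
// in place instead of re-clustering from scratch.
std::vector<std::vector<int>> extractGroups(std::vector<CandidateCluster> clusters)
{
  std::vector<std::vector<int>> groups;
  while (true)
  {
    // Pick the best cluster that is still valid (first one wins on ties).
    CandidateCluster* best = nullptr;
    for (CandidateCluster& c : clusters)
    {
      if (c.valid && (best == nullptr || c.quality() > best->quality()))
      {
        best = &c;
      }
    }
    if (best == nullptr) break; // nothing left to extract

    // The extracted group consists of the center and its members.
    std::set<int> removed = best->members;
    removed.insert(best->center);
    groups.push_back(std::vector<int>(removed.begin(), removed.end()));
    best->valid = false;

    // Update all remaining clusters.
    for (CandidateCluster& c : clusters)
    {
      if (!c.valid) continue;
      if (removed.count(c.center) > 0)
      {
        c.valid = false; // center was consumed: cluster is invalidated
        continue;
      }
      for (int id : removed) c.members.erase(id); // drop consumed members
    }
  }
  return groups;
}
```

The sketch only shows the two effects named in the text: clusters whose center was consumed become invalid, and the others lose the consumed members.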
The quality of a cluster is the normalized average distance to the cluster center for present and missing cluster elements. The distance value for missing elements (if the cluster contains no feature from a certain input map) is the user-defined threshold that marks the maximum allowed radius of a cluster.
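One plausible reading of this definition is sketched below. The function name `normalizedAverageDistance` is hypothetical, and two details are assumptions rather than facts taken from this page: the average runs over the `num_maps - 1` maps other than the center's own map, and "normalized" means division by the maximum distance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average distance to the center over all other input maps, where each map
// without a cluster element contributes `max_distance`, divided by
// `max_distance` so the result lies in [0, 1].
double normalizedAverageDistance(const std::vector<double>& present_distances,
                                 std::size_t num_maps,
                                 double max_distance)
{
  assert(num_maps >= 2);
  assert(max_distance > 0.0);
  const std::size_t num_other = num_maps - 1; // assumption: center's map excluded
  assert(present_distances.size() <= num_other);

  double sum = 0.0;
  for (double d : present_distances) sum += d;

  // Missing maps are penalized with the maximum allowed distance.
  const std::size_t num_missing = num_other - present_distances.size();
  sum += static_cast<double>(num_missing) * max_distance;

  return (sum / static_cast<double>(num_other)) / max_distance;
}
```

In this form, 0 corresponds to a cluster whose elements all coincide with the center and 1 to a cluster with no elements at all. A score in which larger values mean better clusters could be obtained as one minus this value, but that convention is likewise an assumption here.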
Before adding elements to the cluster, the client must call initializeCluster; after adding the last element, it must call finalizeCluster. Once finalizeCluster has been called, no further elements may be added through add unless initializeCluster is called again first.
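The calling protocol can be illustrated with a toy class that enforces the same order of calls. `LifecycleCluster` is a hypothetical stand-in, not the OpenMS class: features are plain integer ids, and both the exception on misuse and the release of the temporary neighbor list in `finalizeCluster` are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <unordered_map>
#include <utility>

// Toy model of the initialize / add / finalize protocol.
class LifecycleCluster
{
public:
  // Opens the cluster for new elements and allocates the temporary list.
  void initializeCluster()
  {
    tmp_neighbors_.reset(new AllNeighbors());
    finalized_ = false;
  }

  // Records a candidate; only legal between initialize and finalize.
  void add(std::size_t map_index, int feature_id, double distance)
  {
    assert(distance >= 0.0);
    if (finalized_)
    {
      throw std::logic_error("add() requires a prior initializeCluster()");
    }
    (*tmp_neighbors_)[map_index].insert(std::make_pair(distance, feature_id));
  }

  // Keeps the closest candidate per map, frees the temporary list and
  // closes the cluster for further additions.
  void finalizeCluster()
  {
    if (finalized_)
    {
      throw std::logic_error("finalizeCluster() requires a prior initializeCluster()");
    }
    for (const auto& entry : *tmp_neighbors_)
    {
      // A multimap is ordered by key, so begin() is the closest candidate.
      const std::pair<const double, int>& closest = *entry.second.begin();
      auto it = best_.find(entry.first);
      if (it == best_.end() || closest.first < it->second.first)
      {
        best_[entry.first] = std::make_pair(closest.first, closest.second);
      }
    }
    tmp_neighbors_.reset();
    finalized_ = true;
  }

  // Number of elements including the center.
  std::size_t size() const { return best_.size() + 1; }

  // Id of the feature kept for the given map (throws if none is stored).
  int bestFor(std::size_t map_index) const { return best_.at(map_index).second; }

private:
  typedef std::multimap<double, int> NeighborList;
  typedef std::unordered_map<std::size_t, NeighborList> AllNeighbors;

  std::unordered_map<std::size_t, std::pair<double, int>> best_;
  std::unique_ptr<AllNeighbors> tmp_neighbors_;
  bool finalized_ = true; // closed until initializeCluster() is called
};
```

A typical sequence is `initializeCluster()`, one or more `add(...)` calls, then `finalizeCluster()`; calling `add` outside that window is rejected.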
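typedef std::multimap<double, GridFeature*> NeighborListType | private |
typedef OpenMSBoost::unordered_map<Size, NeighborPairType> NeighborMap | private |
typedef OpenMSBoost::unordered_map<Size, NeighborListType> NeighborMapMulti | private |
typedef std::pair<double, GridFeature*> NeighborPairType | private |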
QTCluster | ( | ) | private |
Base constructor (not accessible)
QTCluster | ( | GridFeature * | center_point, |
Size | num_maps, | ||
double | max_distance, | ||
bool | use_IDs, | ||
Int | x_coord, | ||
Int | y_coord | ||
) |
Detailed constructor.
center_point | Pointer to the center point |
num_maps | Number of input maps |
max_distance | Maximum allowed distance of two points |
use_IDs | Use peptide annotations? |
~QTCluster | ( | ) | virtual |
Destructor.
void add | ( | GridFeature * | element, |
double | distance | ||
) |
Adds a new element/neighbor to the cluster.
element | The element to be added |
distance | Distance of the element to the center point |
void computeQuality_ | ( | ) | private |
Computes the quality of the cluster.
void finalizeCluster | ( | ) |
Has to be called after adding elements (after calling QTCluster::add one or multiple times)
OpenMSBoost::unordered_map<Size, std::vector<GridFeature*> > getAllNeighbors | ( | ) |
Get all current neighbors.
const std::set<AASequence>& getAnnotations | ( | ) |
Returns the set of peptide sequences annotated to the cluster center.
double getCenterMZ | ( | ) | const |
Returns the m/z value of the cluster center.
GridFeature* getCenterPoint | ( | ) |
Returns the cluster center.
double getCenterRT | ( | ) | const |
Returns the RT value of the cluster center.
void getElements | ( | OpenMSBoost::unordered_map< Size, GridFeature *> & | elements | ) |
Gets the clustered elements.
double getQuality | ( | ) |
Returns the cluster quality.
Int getXCoord | ( | ) | const |
Returns the x coordinate in the grid.
Int getYCoord | ( | ) | const |
Returns the y coordinate in the grid.
void initializeCluster | ( | ) |
Has to be called before adding elements (calling QTCluster::add)
bool isInvalid | ( | ) | const | inline |
Whether current cluster is invalid.
bool operator< | ( | QTCluster & | cluster | ) |
Compare by quality.
double optimizeAnnotations_ | ( | ) | private |
Finds the optimal annotation (peptide sequences) for the cluster.
The optimal annotation is the one that results in the best quality. It is stored in annotations_.
This function is only needed when peptide ids are used and the current center point does not have any peptide id associated with it. In this case, it is not clear which peptide id the current cluster should use. The function thus iterates through all possible peptide ids and selects the one producing the best cluster.
This function needs access to all possible neighbors for this cluster and thus can only be run when tmp_neighbors_ is filled (which is during the filling of a cluster). The function thus cannot be called after finalizing the cluster.
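The search over candidate peptide ids might look roughly like the sketch below. It is an illustration only: `Neighbor`, `NeighborsPerMap` and `pickBestAnnotation` are hypothetical names, peptide sequences are plain strings, and two rules are assumptions rather than documented behaviour: a neighbor without an id is treated as compatible with every candidate, and the "best" cluster is the one with the smallest summed distance, with each map lacking a compatible neighbor counted at `max_distance`.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <set>
#include <string>
#include <vector>

// One potential cluster element; an empty peptide means "no id".
struct Neighbor
{
  double distance;
  std::string peptide;
};

// All potential neighbors, grouped by input map index.
typedef std::map<std::size_t, std::vector<Neighbor>> NeighborsPerMap;

// Tries every peptide id seen among the neighbors and returns the one that
// yields the smallest total distance. Returns "" if no neighbor has an id.
std::string pickBestAnnotation(const NeighborsPerMap& neighbors,
                               std::size_t num_maps,
                               double max_distance)
{
  assert(num_maps >= 2);
  assert(max_distance > 0.0);
  assert(neighbors.size() <= num_maps - 1);

  // Collect all candidate ids.
  std::set<std::string> candidates;
  for (const auto& entry : neighbors)
  {
    for (const Neighbor& n : entry.second)
    {
      if (!n.peptide.empty()) candidates.insert(n.peptide);
    }
  }

  std::string best_peptide;
  double best_total = std::numeric_limits<double>::max();
  for (const std::string& peptide : candidates)
  {
    double total = 0.0;
    std::size_t maps_hit = 0;
    for (const auto& entry : neighbors)
    {
      bool found = false;
      double closest = 0.0;
      for (const Neighbor& n : entry.second)
      {
        const bool compatible = n.peptide.empty() || n.peptide == peptide;
        if (compatible && (!found || n.distance < closest))
        {
          closest = n.distance;
          found = true;
        }
      }
      if (found)
      {
        total += closest;
        ++maps_hit;
      }
    }
    // Maps without a compatible neighbor are penalized.
    total += static_cast<double>(num_maps - 1 - maps_hit) * max_distance;
    if (total < best_total)
    {
      best_total = total;
      best_peptide = peptide;
    }
  }
  return best_peptide;
}
```

Because the search needs every potential neighbor, not only the best one per map, it can only run while the complete neighbor list is still available.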
void setInvalid | ( | ) |
Sets current cluster as invalid (also frees some memory)
Size size | ( | ) | const |
Returns the size of the cluster (number of elements, incl. center)
bool update | ( | const OpenMSBoost::unordered_map< Size, GridFeature *> & | removed | ) |
Updates the cluster after the indicated data points are removed.
removed | The datapoints to be removed from the cluster |
std::set<AASequence> annotations_ | private |
Set of annotations of the cluster.
The set of peptide sequences that is compatible with the cluster center and results in the best cluster quality.
GridFeature* center_point_ | private |
Pointer to the cluster center.
bool changed_ | private |
Has the cluster changed (if yes, quality needs to be recomputed)?
bool collect_annotations_ | private |
Whether initial collection of all neighbors is needed.
This variable stores whether we need to collect all annotations first before we can decide upon the best set of cluster points. This is usually only necessary if the center point does not have an annotation but we want to use ids.
bool finalized_ | private |
Whether the cluster has been finalized (if true, no new elements are accepted)
double max_distance_ | private |
Maximum distance of a point that can still belong to the cluster.
NeighborMap neighbors_ | private |
Map that keeps track of the best current feature for each map.
Size num_maps_ | private |
Number of input maps.
double quality_ | private |
Quality of the cluster.
NeighborMapMulti* tmp_neighbors_ | private |
Temporary map tracking *all* neighbors.
For each input run, a multimap which contains pointers to all neighboring elements and the respective distance.
bool use_IDs_ | private |
Keep track of peptide IDs and use them for matching?
bool valid_ | private |
Whether current cluster is valid.
Int x_coord_ | private |
x coordinate in the grid cell
Int y_coord_ | private |
y coordinate in the grid cell
OpenMS / TOPP release 2.3.0 | Documentation generated on Tue Jan 9 2018 18:22:12 using doxygen 1.8.13 |