This tutorial gives you an overview of how to use peak intensity prediction (PIP). In general, PIP allows you to predict the peak intensity of a peptide, relative to other peptides of the same abundance, from its sequence alone. This value can also be used to correct peak intensities for peptide-specific instrument sensitivity in a label-free quantitation application.

This method is still at an early stage: a proof of concept has been conducted and published in [1]. Peak intensities can be predicted with significant correlations, but application tests are yet to come.

Background

The sensitivity of a mass spectrometer depends on the analysed peptides, among other factors. This peptide-specific sensitivity causes peak heights of peptides with the same abundance to be generally different. PIP incorporates a model that maps peptide sequences to peptide-specific sensitivities.

Machine learning details

The incorporated model has been adapted with a Local Linear Map [2], a machine learning algorithm that uses both supervised and unsupervised learning in its training and that is fast and easy to implement. Better results can be achieved with other learning architectures [3]; however, these are not yet implemented in this prototype stage.
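To make the idea of a Local Linear Map more concrete, the following is a minimal sketch of its prediction step for a scalar output: the node whose input prototype lies closest to the feature vector is selected, and that node's local linear model is evaluated around its prototype. The struct LlmNode, its members and the function llmPredict are hypothetical names chosen for illustration. This is not the code used inside PIP, and neither the training procedure nor the way a peptide sequence is turned into a feature vector is shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One node of a Local Linear Map (hypothetical layout, for illustration only).
struct LlmNode
{
  std::vector<double> w_in;   // prototype in input (feature) space
  double w_out;               // output value attached to the prototype
  std::vector<double> slope;  // local linear map (a single row, as the output is scalar)
};

// Predict a scalar output for feature vector x:
// 1. find the node whose input prototype is closest to x (squared Euclidean distance),
// 2. evaluate that node's linear model: w_out + slope * (x - w_in).
double llmPredict(const std::vector<LlmNode>& nodes, const std::vector<double>& x)
{
  assert(!nodes.empty());
  std::size_t best = 0;
  double best_dist = std::numeric_limits<double>::max();
  for (std::size_t k = 0; k < nodes.size(); ++k)
  {
    assert(nodes[k].w_in.size() == x.size());
    assert(nodes[k].slope.size() == x.size());
    double dist = 0.0;
    for (std::size_t d = 0; d < x.size(); ++d)
    {
      const double diff = x[d] - nodes[k].w_in[d];
      dist += diff * diff;
    }
    if (dist < best_dist)
    {
      best_dist = dist;
      best = k;
    }
  }

  const LlmNode& winner = nodes[best];
  double y = winner.w_out;
  for (std::size_t d = 0; d < x.size(); ++d)
  {
    y += winner.slope[d] * (x[d] - winner.w_in[d]);
  }
  return y;
}
```

In such a model, the unsupervised part of training would place the input prototypes, while the supervised part would fit the output values and local slopes.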

About the training data

The model used by the PIP module has been trained with data from a Bruker Ultraflex MALDI-TOF instrument. Details about these data can be found in [3]. A squared Pearson correlation of 0.43 was achieved in ten-fold cross-validation, and of 0.34 across datasets from the same instrument (but with different settings and operators). There is no experience yet with the performance across instruments, so we would be pleased if you shared your experience of applying the model incorporated in PIP to other datasets.

At this point, it is not possible to train a model with your own data, but this is a planned feature. It is not yet known how similar peptide-specific sensitivities are across different MALDI instruments.

How to use PIP

PIP lets you predict intensities using peptide sequences as input. The output values have been normalized to a mean of 0 and variance 1.

To test PIP with data from your instrument, MALDI spectra that contain only peptides of one protein can be used:

1. Normalize your peak intensities by the sum of the peptide peaks only, to make them comparable to other spectra.
2. Logarithmize the resulting values.
3. Center your peak intensities and normalize them by variance (of course, multiple spectra should be used to find the mean and variance). These values are referred to as tI in the following.
4. Predict the peptides' peak intensities (referred to as pI in the following).
5. Calculate the correlation between tI and pI. If you calculate exp(log(tI) - pI), it should give 1 as a result in this test.
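The preprocessing and the correlation check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not part of PIP: the function names logSumNormalize, standardize and pearson are hypothetical, the natural logarithm is assumed for the logarithmization, and the population variance (division by n) is assumed for the normalization. The predicted values pI would come from PeakIntensityPredictor and are simply passed in as numbers here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalize the peptide peaks of one spectrum by their sum and take the
// natural logarithm of each normalized value.
std::vector<double> logSumNormalize(const std::vector<double>& peaks)
{
  double sum = 0.0;
  for (double p : peaks) sum += p;
  assert(sum > 0.0);
  std::vector<double> out;
  out.reserve(peaks.size());
  for (double p : peaks)
  {
    assert(p > 0.0);  // the logarithm is only defined for positive intensities
    out.push_back(std::log(p / sum));
  }
  return out;
}

// Center the values to mean 0 and scale them to variance 1. In practice the
// input should pool the log-normalized values of multiple spectra.
std::vector<double> standardize(const std::vector<double>& values)
{
  assert(values.size() > 1);
  double mean = 0.0;
  for (double v : values) mean += v;
  mean /= static_cast<double>(values.size());
  double var = 0.0;
  for (double v : values) var += (v - mean) * (v - mean);
  var /= static_cast<double>(values.size());
  assert(var > 0.0);
  const double sd = std::sqrt(var);
  std::vector<double> out;
  out.reserve(values.size());
  for (double v : values) out.push_back((v - mean) / sd);
  return out;
}

// Pearson correlation between measured (tI) and predicted (pI) values.
double pearson(const std::vector<double>& a, const std::vector<double>& b)
{
  assert(a.size() == b.size() && a.size() > 1);
  const double n = static_cast<double>(a.size());
  double mean_a = 0.0, mean_b = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    mean_a += a[i];
    mean_b += b[i];
  }
  mean_a /= n;
  mean_b /= n;
  double s_ab = 0.0, s_aa = 0.0, s_bb = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    s_ab += (a[i] - mean_a) * (b[i] - mean_b);
    s_aa += (a[i] - mean_a) * (a[i] - mean_a);
    s_bb += (b[i] - mean_b) * (b[i] - mean_b);
  }
  assert(s_aa > 0.0 && s_bb > 0.0);
  return s_ab / std::sqrt(s_aa * s_bb);
}
```

Squaring the value returned by pearson gives a squared Pearson correlation, the measure quoted for the training data.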

To calculate relative peptide abundances (relative to those of the other peptides in the mixture) from the intensities of a peptide mixture using values predicted by PIP, perform steps 2 to 4 above. Then calculate the peptide level x = exp(log(tI) - pI). Note: quantification with an actual protein mixture has never been tested with this model.

Example code

There is a usage example for the PeakIntensityPredictor class in doc/code_examples/Tutorial_PeakIntensityPredictor.cpp.

Sequences of peptides to be predicted should be stored in a vector of AASequence instances:

  //Create a vector holding the peptide sequences whose intensities are to be predicted
  vector<AASequence> peptides;
  peptides.push_back(AASequence::fromString("IVGLMPHPEHAVEK"));
  peptides.push_back(AASequence::fromString("LADNISNAMQGISEATEPR"));
  peptides.push_back(AASequence::fromString("ELDHSDTIEVIVNPEDIDYDAASEQAR"));
  peptides.push_back(AASequence::fromString("AVDTVR"));
  peptides.push_back(AASequence::fromString("AAWQVK"));
  peptides.push_back(AASequence::fromString("FLGTQGR"));
  peptides.push_back(AASequence::fromString("NYPSDWSDVDTK"));
  peptides.push_back(AASequence::fromString("GSPSFGPESISTETWSAEPYGR"));
  peptides.push_back(AASequence::fromString("TELGFDPEAHFAIDDEVIAHTR"));

Then create an instance of the model, and predict the peak intensities of the peptides:

  //Create a new predictor model
  PeakIntensityPredictor model;
  //Perform the prediction for the vector of AASequences with the Local Linear Map (LLM) model
  vector<double> predicted = model.predict(peptides);

You can output AASequence instances like normal strings:

  //for each element in peptides print sequence as well as corresponding predicted peak intensity value.
  for (Size i = 0; i < peptides.size(); i++)
  {
    cout << "Intensity of " << peptides[i] << " is " << predicted[i] << endl;
  }

References

[1] Wiebke Timm: Peak Intensity Prediction in Mass Spectra using Machine Learning Methods, PhD Thesis (2008)
[2] Helge Ritter: Learning with Self-Organizing Map, In T. Kohonen et al., eds.: Artificial Neural Networks, Elsevier Science Publishers (1991), 379-384
[3] W. Timm, A. Scherbart, S. Böcker, O. Kohlbacher, T.W. Nattkemper: Peak Intensity Prediction in MALDI-TOF Mass Spectrometry: A Machine Learning Study to support Quantitative Proteomics, BMC Bioinformatics (2008)