When you want to get to know and love your data

Posts tagged “mass spectral similarity

Tutorials- Statistical and Multivariate Analysis for Metabolomics

2014 winter LC-MS stats courseI recently had the pleasure in participating in the 2014 WCMC Statistics for Metabolomics Short Course. The course was hosted by the NIH West Coast Metabolomics Center and focused on statistical and multivariate strategies for metabolomic data analysis. A variety of topics were covered using 8 hands on tutorials which focused on:

  • data quality overview
  • statistical and power analysis
  • clustering
  • principal components analysis (PCA)
  • partial least squares (O-/PLS/-DA)
  • metabolite enrichment analysis
  • biochemical and structural similarity network construction
  • network mapping

I am happy to have taught the course using all open source software, including: R, and Cytoscape. The data analysis and visualization were done using Shiny-based apps:  DeviumWeb and MetaMapR. Check out some of the slides below or download all the class material and try it out for yourself.

Creative Commons License
2014 WCMC LC-MS Data Processing and Statistics for Metabolomics by Dmitry Grapov is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Special thanks to the developers of Shiny and Radiant by Vincent Nijs.


Featured Network in Chemical and Engineering News (C&EN)

I am happy to announce the release of MetaMapR (v1.2.0).

New features include: 

  • An independent module for biological database identifier translations using the Chemical Translation System (CTS)
  • a retention time filter for mass spectral connections
  • increase in calculation speed

An application of MetaMapR was recently featured in an article in the Nov. 4th 2013 issue of Chemical & Engineering News (C&EN) , 91(44). This tool was used to generate a network of > 1200 metabolites based on enzymatic transformations and structural similarities.

C and E figure

The full article can be found be found here as well as the original image.

Biochemical, Chemical and Mass Spectral Similarity Network

Here is an example of a network leveraging three dominant aspects of metabolomic experiments (biochemical, chemical and mass spectral knowledge) to connect measured variables. This is a network for a blinded data set (sample ids are not known), which I’ve made for a member of my lab presenting their work at the Metabolomics Society Conference in Glasgow, Scotland.


With out knowing the experimental design we can still analyze our data for analytical effects. For example below is a principal components analysis of ~400 samples and 600 variables, where I’ve annotated the sample scores to show data aquisition date (color) and experimental samples or laboratory quality controls (shape).  One thing to look for are trends or scores grouping in the PCA scores which are correlated to analytical conditions like, batch, date, technician, etc.

PCA scores

Finally we can take a look at the PCA variable loadings which highlights a major bottleneck in metabolomics experiments, the large amount of structurally unknown molecular features.

PCA loadings

Even using feature rich electron impact mass spectra (GC/TOF) only 40% of the unknown variables could be connected to known species based on a cosine correlation >0.75.  To give you an idea the cosine correlation or dot product between the mass spectra of two structurally very similar molecules xylose and xylitol is ~ 0.8.