When you want to get to know and love your data

Biochemical, Chemical and Mass Spectral Similarity Network

Here is an example of a network that uses three dominant aspects of metabolomic experiments (biochemical, chemical and mass spectral knowledge) to connect measured variables. This is a network for a blinded data set (sample IDs are not known), which I made for a member of my lab presenting their work at the Metabolomics Society Conference in Glasgow, Scotland.


Without knowing the experimental design we can still analyze our data for analytical effects. For example, below is a principal components analysis of ~400 samples and 600 variables, where I’ve annotated the sample scores to show data acquisition date (color) and whether each sample is an experimental sample or a laboratory quality control (shape). One thing to look for is trends or groupings in the PCA scores that are correlated with analytical conditions such as batch, date, technician, etc.

PCA scores
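One rough way to put a number on what the eye picks out in a scores plot is to ask how much of the variance in a component's scores is explained by an analytical factor. The sketch below is only an illustration and not the analysis behind the figure: the function name `variance_explained_by_group`, the six PC1 scores and the two acquisition dates are all made up, and it assumes the scores for one component have already been computed elsewhere.

```python
from collections import defaultdict


def variance_explained_by_group(scores, groups):
    """Return eta squared: between-group sum of squares / total sum of squares.

    scores: PCA scores of the samples on one component.
    groups: one analytical label per sample (date, batch, technician, ...).
    """
    if len(scores) != len(groups):
        raise ValueError("scores and groups must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")

    grand_mean = sum(scores) / len(scores)
    total_ss = sum((s - grand_mean) ** 2 for s in scores)
    if total_ss == 0:
        return 0.0  # no variance at all, so nothing to explain

    # Collect the scores belonging to each label.
    by_group = defaultdict(list)
    for score, label in zip(scores, groups):
        by_group[label].append(score)

    # Between-group sum of squares, weighted by group size.
    between_ss = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in by_group.values()
    )
    return between_ss / total_ss


# Hypothetical PC1 scores for six samples acquired on two dates.
pc1_scores = [-2.1, -1.8, -2.4, 1.9, 2.2, 2.2]
acquisition_date = ["day1", "day1", "day1", "day2", "day2", "day2"]

eta_sq = variance_explained_by_group(pc1_scores, acquisition_date)
print(f"fraction of PC1 variance explained by date: {eta_sq:.2f}")
```

A value near 1 would suggest the component mostly separates samples by acquisition date, while a value near 0 would suggest no obvious date effect on that component. Running the same check on the quality control samples alone is one way to separate analytical drift from biology.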

Finally, we can look at the PCA variable loadings, which highlight a major bottleneck in metabolomics experiments: the large number of structurally unknown molecular features.

PCA loadings

Even using feature-rich electron impact mass spectra (GC/TOF), only 40% of the unknown variables could be connected to known species based on a cosine correlation > 0.75. To give you an idea, the cosine correlation (normalized dot product) between the mass spectra of two structurally very similar molecules, xylose and xylitol, is ~0.8.
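For readers unfamiliar with the cosine correlation, here is a minimal sketch of how it could be computed between two spectra. It assumes each spectrum is stored as a mapping from nominal m/z to intensity; the function name `cosine_similarity` and both toy spectra are made up for illustration (they are not the real xylose or xylitol spectra), and any intensity or mass weighting used in practice is left out.

```python
import math


def cosine_similarity(spectrum_a, spectrum_b):
    """Cosine of the angle between two spectra given as {m/z: intensity}."""
    # Only ions present in both spectra contribute to the dot product.
    shared_mz = set(spectrum_a) & set(spectrum_b)
    dot = sum(spectrum_a[mz] * spectrum_b[mz] for mz in shared_mz)

    norm_a = math.sqrt(sum(v * v for v in spectrum_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spectrum_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum is treated as similar to nothing

    return dot / (norm_a * norm_b)


# Made-up spectra for an unknown feature and a library compound.
unknown = {73: 100.0, 103: 40.0, 217: 60.0}
known = {73: 100.0, 103: 10.0, 147: 50.0}

similarity = cosine_similarity(unknown, known)
is_connected = similarity > 0.75  # threshold quoted in the text
print(f"cosine similarity: {similarity:.3f}, connected: {is_connected}")
```

Identical spectra score 1 and spectra with no ions in common score 0, so a cutoff of 0.75 only connects an unknown to a known compound when most of the intense ions are shared.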


