Here is an example of a network leveraging three dominant aspects of metabolomic experiments (biochemical, chemical and mass spectral knowledge) to connect measured variables. This is a network for a blinded data set (sample ids are not known), which I’ve made for a member of my lab presenting their work at the Metabolomics Society Conference in Glasgow, Scotland.
With out knowing the experimental design we can still analyze our data for analytical effects. For example below is a principal components analysis of ~400 samples and 600 variables, where I’ve annotated the sample scores to show data aquisition date (color) and experimental samples or laboratory quality controls (shape). One thing to look for are trends or scores grouping in the PCA scores which are correlated to analytical conditions like, batch, date, technician, etc.
Finally we can take a look at the PCA variable loadings which highlights a major bottleneck in metabolomics experiments, the large amount of structurally unknown molecular features.
Even using feature rich electron impact mass spectra (GC/TOF) only 40% of the unknown variables could be connected to known species based on a cosine correlation >0.75. To give you an idea the cosine correlation or dot product between the mass spectra of two structurally very similar molecules xylose and xylitol is ~ 0.8.