When you want to get to know and love your data

Archive for October, 2013

Biochemical Network Mapping: A Story of Two Tissues

Networks are powerful visualizations that can express both the relationships among many variables and how those variables contribute to answering or supporting some other question or hypothesis. This is accomplished in two parts: 1) network generation and 2) network mapping.

Here are some examples of these two ideas applied to biological data sets. The connections for these networks are calculated using MetaMapR, a tool built in R and Shiny. Given molecular identifiers (e.g. chemical name, KEGG, PubChem CID), mass spectra (encoded as m/z:intensity strings), and/or empirical information (see metabolomics demo data for formatting), this prototype tool generates edge list and node attribute tables describing networks of metabolites connected based on 1) biochemical relationships (KEGG main reactions), 2) structural similarities (Tanimoto scores between molecular fingerprints), 3) mass spectral similarities (electron ionization spectra from a GC/TOF) and 4) linear dependencies (partial correlations).
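To make the structural similarity step more concrete, here is a minimal Python sketch of how an edge list could be built from Tanimoto scores. It is not MetaMapR's code (the tool itself is written in R). The fingerprints are toy sets of "on" bit positions, the metabolite names are placeholders, and the 0.5 cutoff is an arbitrary choice for illustration.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(a & b) / len(union)


def similarity_edges(fingerprints, threshold=0.7):
    """Return (source, target, score) for every pair scoring >= threshold."""
    edges = []
    for (n1, f1), (n2, f2) in combinations(sorted(fingerprints.items()), 2):
        score = tanimoto(f1, f2)
        if score >= threshold:
            edges.append((n1, n2, round(score, 3)))
    return edges


# Toy fingerprints (hypothetical bit positions, not real PubChem fingerprints)
fingerprints = {
    "metabolite_A": {1, 2, 3, 4},
    "metabolite_B": {1, 2, 3, 5},
    "metabolite_C": {7, 8, 9},
}

edges = similarity_edges(fingerprints, threshold=0.5)
for source, target, score in edges:
    print(source, target, score)
```

Only the A-B pair shares enough bits (3 of 5) to pass the cutoff, so the edge list contains a single row. Raising the threshold gives a sparser network, lowering it a denser one.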

[Figure: tissue network]

1) The above network is constructed from known biochemical relationships (direct precursor-to-product transformations, violet) and structural similarities, calculated from the commonality or difference in functional groups, substructures, etc.

2) The properties of the network nodes (vertices) encode biochemical (shape), statistical (color; gray = p-value > 0.05) and multivariate rank (size; O-PLS-DA loadings) information.
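The mapping step can be sketched in a few lines of Python. This is only an illustration of the idea, under some assumptions: the record fields (`p_value`, `change`, `loading`) and the size range are made up for the example, a positive `change` is taken to mean "increased in abnormal tissue", and node shape is left out entirely.

```python
def map_node_attributes(records, alpha=0.05, min_size=10.0, max_size=50.0):
    """Translate per-metabolite statistics into node color and size.

    color: gray if p_value > alpha, else red (increase) or green (decrease)
    size:  scaled linearly by |loading| relative to the largest |loading|
    """
    if not records:
        return []
    # Fall back to 1.0 so an all-zero set of loadings does not divide by zero
    largest = max(abs(r["loading"]) for r in records) or 1.0
    styled = []
    for r in records:
        if r["p_value"] > alpha:
            color = "gray"
        elif r["change"] > 0:
            color = "red"
        else:
            color = "green"
        size = min_size + (max_size - min_size) * abs(r["loading"]) / largest
        styled.append({"name": r["name"], "color": color, "size": size})
    return styled


# Hypothetical statistics for three metabolites
records = [
    {"name": "m1", "p_value": 0.01, "change": 1.5, "loading": 0.8},
    {"name": "m2", "p_value": 0.20, "change": -0.4, "loading": -0.2},
    {"name": "m3", "p_value": 0.03, "change": -2.0, "loading": -0.4},
]

styled = map_node_attributes(records)
for node in styled:
    print(node)
```

The resulting table can be joined to an edge list by node name in whatever network tool is used for drawing.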

The question is a two-class discrimination in a matched case/control study comparing two tissues (abnormal and normal) sampled from the same specimen. The statistical analysis used mixed effects modeling to adjust for intra-specimen variability unrelated to the main question: how the metabolomic features of the abnormal tissue compare to those of normal tissue. A non-gray color designates whether the metabolite was increased (red) or decreased (green) in the abnormal tissue compared to normal. Node size shows the importance of this change, as ranked in a multivariate discrimination model calculated using orthogonal signal correction partial least squares discriminant analysis (O-PLS-DA).

Here are some more examples of the different combinations of edge and node attributes.

[Figure: mass spectral network]

The network above keeps most of the node attributes described earlier, but instead of biochemical and chemical similarity, the metabolites are now connected based on measures of empirical correlation (blue, negative; orange, positive) and mass spectral similarity (cosine similarity between electron ionization metabolite mass spectra). This network is helpful for investigating the potential roles of unknown metabolites (designated by numbers) by linking them, through either correlation or mass spectral similarity, to known metabolites. An unknown metabolite that makes mass spectral connections to multiple closely structurally related metabolites may share those molecules' physical features. Another interesting feature of the network is that the relative increase (red) or decrease (green) of the variables is shown in the context of their linear relationships (in this case partial correlations). Only with partial correlations does it make sense to see molecules that are positively correlated with each other yet changing in opposite directions with regard to the experimental system.
Analysis of empirical relationships in the data, in this case mass spectral signal intensities, can be used to organize many changes into related groups.
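For the mass spectral edges, the cosine similarity between two spectra can be sketched as follows. This is an illustrative Python version, not the tool's implementation. It assumes the m/z:intensity pairs are separated by whitespace and that m/z values are rounded to unit mass, both of which are guesses about the encoding, and the spectra themselves are invented.

```python
import math


def parse_spectrum(encoded):
    """Parse 'mz:intensity mz:intensity ...' into {unit_mz: intensity}."""
    spectrum = {}
    for pair in encoded.split():
        mz, intensity = pair.split(":")
        unit_mz = int(round(float(mz)))  # assumption: bin to unit mass
        spectrum[unit_mz] = spectrum.get(unit_mz, 0.0) + float(intensity)
    return spectrum


def cosine_similarity(spec_a, spec_b):
    """Cosine of the angle between two spectra treated as sparse vectors."""
    dot = sum(v * spec_b.get(mz, 0.0) for mz, v in spec_a.items())
    norm_a = sum(v * v for v in spec_a.values())
    norm_b = sum(v * v for v in spec_b.values())
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum is similar to nothing
    return dot / math.sqrt(norm_a * norm_b)


# Invented spectra: the unknown matches the known ions at half the intensity
known = parse_spectrum("73:100 147:50")
unknown = parse_spectrum("73:50 147:25")
unrelated = parse_spectrum("205:80 319:40")

print(cosine_similarity(known, unknown))
print(cosine_similarity(known, unrelated))
```

Because cosine similarity ignores overall scale, the unknown spectrum matches the known one despite its lower intensities, while spectra with no shared ions score zero.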

[Figure: known partial correlation network]

In the above network, changes in metabolites fall into three major groups: 1) a cluster of related increases (red, A), which is negatively correlated with 2) a cluster of decreases (green, B), and, bridging the two, 3) a cluster of metabolites that are negatively correlated with each other (center blue line).
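The edges in this network are partial correlations, so here is a rough Python sketch of what that means for three variables. It uses the textbook first-order formula, which conditions on a single third variable. That is a simplification chosen for illustration and not necessarily how the partial correlations in these networks were computed. The correlation values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_from_data(x, y, z):
    """First-order partial correlation computed directly from raw values."""
    return partial_correlation(pearson(x, y), pearson(x, z), pearson(y, z))


# Invented pairwise correlations: x and y are weakly negatively correlated,
# but both are strongly tied (in opposite directions) to a third variable z
r_partial = partial_correlation(r_xy=-0.2, r_xz=-0.8, r_yz=0.7)
print(round(r_partial, 3))
```

In this toy case the plain correlation between x and y is negative, but once z is accounted for the partial correlation is positive, which shows how the two measures can disagree in sign.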