When you want to get to know and love your data

Archive for February, 2013

Visualizing Sample Scores Trajectory for Repeated Measures

PLS-DA sample scores for a discrimination model identifying multivariate changes in metabolomic measurements before (pre) and after (post) an experimental manipulation.

Figure: PLS-DA repeated measures trajectory

Based on the scores plot for samples, given changes in >300 biological parameters, there appear to be two patterns of sample movement through this principal predictive plane. Few samples move in the direction capturing the most variance in the data matrix (x-axis, 31%); the majority show an interaction between x and y (the second dimension, explaining only 8%). Also, the most pre-looking (before) samples in the first dimension (142 and 77; note farthest right) are the least changed after the experimental treatment.

Visualization of Multivariate Biological Models (PLS-DA and O-PLS-DA)

It's not uncommon to be faced with multiple questions at the same time. For instance, imagine the following experimental design. You have one MAIN question: what is different between groups A and B? But within groups A and B are subgroups 1 and 2. This complicates things because the answer to the MAIN question (what is different between A and B) may now be slightly different for the two subgroups, given the four combinations A|1, A|2, B|1 and B|2.



In statistics we can account for these types of experimental designs by choosing different tests. For instance, in the case outlined above we could use a two-way analysis of variance (2-way ANOVA) to identify differences between A|B which are independent of differences between 1|2 (as well as any interaction between A|B and 1|2). In the case of multivariate modeling we can achieve a similar effect by using covariate adjustments. For example, we can fit a simple linear model for the differences between 1|2 and use its residuals as the 1|2-effect-adjusted data when testing for differences between A|B. Here is a visual example of this approach using:

1) PCA to evaluate the data variance between A and B (GREEN and RED) and 1 and 2 (SMALL or LARGE)

2) 1|2-adjusted PLS-DA model for A|B

3) 1|2-adjusted O-PLS-DA model for A|B

Based on the PCA we see that the differences between A|B are also affected by 1|2. This is evident in the distribution of scores based on LARGE|SMALL among A: A|1 (GREEN|SMALL) is more different (further right) from all B than A|2 (GREEN|LARGE). The same can be said for B. In particular, the greatest difference among all groups is between those with the greatest separation on the X-axis (1st principal component), which are RED|LARGE and GREEN|SMALL.

To identify the greatest difference between RED|GREEN which is independent of differences due to SMALL|LARGE, we can use SMALL|LARGE-adjusted data to create a PLS-DA model that discriminates between RED|GREEN.
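The covariate adjustment can be sketched for a single variable as follows. This is a minimal illustration, not the code used for the figures: the function name `adjust_for_covariate` and the example values are hypothetical. It relies on the fact that, for a categorical covariate, the least-squares fit of a simple linear model is the mean of each level, so the residual is the value minus its level mean. For a full data matrix, the same adjustment would be applied to each variable (column) in turn.

```python
from statistics import mean


def adjust_for_covariate(values, covariate):
    """Residuals of the simple linear model values ~ covariate (categorical).

    The fitted value for each sample is the mean of its covariate level,
    so the residual is the sample value minus that level mean.
    """
    level_means = {
        level: mean(v for v, c in zip(values, covariate) if c == level)
        for level in set(covariate)
    }
    return [v - level_means[c] for v, c in zip(values, covariate)]


# Hypothetical measurements of one metabolite (made-up numbers).
values = [10.0, 12.0, 20.0, 22.0, 11.0, 13.0, 21.0, 23.0]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]                # GREEN|RED
subgroup = ["1", "1", "2", "2", "1", "1", "2", "2"]            # SMALL|LARGE

adjusted = adjust_for_covariate(values, subgroup)

# After adjustment the 1|2 level means are zero, while the A|B shift remains.
mean_a = mean(v for v, g in zip(adjusted, group) if g == "A")
mean_b = mean(v for v, g in zip(adjusted, group) if g == "B")
print(adjusted, mean_b - mean_a)
```

The adjusted values, rather than the raw ones, would then be passed to the discriminant model for A|B.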

This projection of the differences between A|B is the same for SMALL|LARGE groups. Ideally we want the two groups' scores to be maximally separated on the X-axis (1st LV). We see that this is not the case above; instead, explaining how the variables contribute to differences between GREEN|RED requires explaining the score variance on both the X and Y axes, that is, in two dimensions.

Next we try the O-PLS-DA algorithm, which aims to rotate the projection of the data to maximize the separation between GREEN|RED on the X-axis and capture unrelated or orthogonal variance on the Y-axis.
The O-PLS-DA model loadings for the 1st LV provide information regarding differences in variable magnitudes between the two groups (GREEN|RED).
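To make the rotation concrete, here is a rough sketch of a single-response O-PLS step with one orthogonal component, written in plain Python so that it is self-contained. It is an illustration under assumptions, not the implementation behind the plots: the function names and the example matrix are hypothetical, the data are only mean-centered (no scaling), and the class labels are coded as 0|1. It also assumes the predictive loading is not exactly parallel to the predictive weight, otherwise there is no orthogonal component to remove.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def center_columns(X):
    """Subtract each column mean from a row-major matrix."""
    means = [sum(col) / len(col) for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


def scores(X, w):
    """X w: one score per sample (row)."""
    return [dot(row, w) for row in X]


def loadings(X, t):
    """X' t / (t' t): one loading per variable (column)."""
    tt = dot(t, t)
    return [dot(col, t) / tt for col in zip(*X)]


def unit(v):
    """Scale a vector to unit length."""
    norm = dot(v, v) ** 0.5
    return [a / norm for a in v]


def opls_one_orthogonal(X, y):
    """One predictive and one orthogonal component for a single response y."""
    X = center_columns(X)
    y_mean = sum(y) / len(y)
    y = [v - y_mean for v in y]

    w = unit([dot(col, y) for col in zip(*X)])  # predictive weights (X'y, unit length)
    t = scores(X, w)
    p = loadings(X, t)

    # Part of the loading not explained by w: the y-orthogonal direction.
    wp = dot(w, p)
    w_orth = unit([pj - wp * wj for pj, wj in zip(p, w)])
    t_orth = scores(X, w_orth)
    p_orth = loadings(X, t_orth)

    # Remove the orthogonal component, then recompute the predictive scores.
    X_filtered = [[x - ti * pj for x, pj in zip(row, p_orth)]
                  for row, ti in zip(X, t_orth)]
    return {"t_pred": scores(X_filtered, w), "t_orth": t_orth, "weights": w}


# Hypothetical data: 6 samples x 3 variables, classes coded 0 (GREEN) and 1 (RED).
X = [[1.0, 2.0, 0.5], [1.2, 1.0, 0.7], [0.8, 3.0, 0.4],
     [2.0, 2.2, 1.5], [2.3, 0.9, 1.4], [1.9, 3.1, 1.6]]
y = [0, 0, 0, 1, 1, 1]
model = opls_one_orthogonal(X, y)
```

Plotting `t_pred` on the X-axis against `t_orth` on the Y-axis gives the kind of scores plot described above: the orthogonal scores are uncorrelated with the class labels, so the class separation is confined to the first axis.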

We can use network mapping to visualize these weights within a domain-specific context. In the case of metabolomics data, this is best achieved using biochemical/chemical similarity networks.

We can create these networks by assigning edges between vertices (representing metabolites) based on biochemical relationships (KEGG RPAIRs) or chemical similarities (Tanimoto coefficient >0.7). We can then map the O-PLS-DA model loadings to this network’s visual properties (vertex: size, color, border, and inset graphic).
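The chemical similarity edges can be sketched as below. This is a simplified illustration: the fingerprints are represented as sets of "on" bit positions, and the bit values shown are made up for the example rather than computed from real structures. In practice the fingerprints would come from a cheminformatics toolkit or database. Only the Tanimoto coefficient and the >0.7 cutoff are taken from the text above.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient: shared bits divided by total distinct bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_edges(fingerprints, threshold=0.7):
    """Return (vertex, vertex, score) for every pair scoring above threshold."""
    edges = []
    for n1, n2 in combinations(sorted(fingerprints), 2):
        score = tanimoto(fingerprints[n1], fingerprints[n2])
        if score > threshold:
            edges.append((n1, n2, score))
    return edges


# Hypothetical fingerprints (bit positions are invented for illustration).
fingerprints = {
    "glucose": {1, 2, 3, 4, 5, 6, 7, 8},
    "fructose": {1, 2, 3, 4, 5, 6, 7, 9},
    "alanine": {1, 10, 11, 12},
}

edges = similarity_edges(fingerprints)
print(edges)
```

With these made-up fingerprints only the glucose and fructose vertices are connected, since they share 7 of 9 distinct bits.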


For example, we can map vertex size to the metabolite’s importance in the explained discrimination between groups (loading on O-PLS-DA LV 1) and vertex color to the direction of change (blue, decrease; red, increase). Metabolites displaying significant differences between RED and GREEN groups (two-way ANOVA, p < 0.05 adjusting for 1|2) are shown at maximum size with a black border, and contain a box-plot visualization.
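One way to express this mapping in code is sketched below. The function name, the size range, and the dictionary keys are hypothetical choices for illustration, and the sketch assumes that a positive loading corresponds to an increase and a negative loading to a decrease. Only the rules themselves (size by loading, blue|red by direction, maximum size, black border and box plot when p < 0.05) come from the description above.

```python
def vertex_style(loading, p_value, max_abs_loading,
                 min_size=10.0, max_size=50.0, alpha=0.05):
    """Map an O-PLS-DA LV 1 loading and an ANOVA p-value to vertex properties."""
    significant = p_value < alpha
    if significant:
        size = max_size  # significant metabolites are drawn at maximum size
    else:
        # Scale linearly with the absolute loading (importance).
        size = min_size + (max_size - min_size) * abs(loading) / max_abs_loading
    return {
        "size": size,
        "color": "red" if loading > 0 else "blue",  # assumed sign convention
        "border": "black" if significant else "none",
        "inset_boxplot": significant,
    }


# Hypothetical loadings and p-values for two metabolites.
not_significant = vertex_style(loading=0.5, p_value=0.20, max_abs_loading=1.0)
significant = vertex_style(loading=-0.1, p_value=0.01, max_abs_loading=1.0)
print(not_significant)
print(significant)
```

The resulting dictionaries could then be written out as vertex attributes for whichever network visualization tool is used.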

Here is a network mapping the O-PLS-DA model loadings into a biological context and displaying graphs of the means of important parameters among groups stratified by A|B and 1|2 (left to right: A|1, A|2, B|1, B|2).


Here is another network with the same edge and vertex properties as above, except that the inset graphs show differences between groups A|B adjusted for the effect of 1|2.


Data analysis approaches to modeling changes in primary metabolism