Visualization of Multivariate Biological Models (PLS-DA and O-PLS-DA)
It's not uncommon to face multiple questions at the same time. For instance, imagine the following experimental design. You have one MAIN question: what is different between groups A and B? But within groups A and B are subgroups 1 and 2. This complicates things because the answer to the MAIN question (what is different between A and B) may now be slightly different depending on the subgroup (A|1, A|2 and B|1, B|2).
In statistics we can account for these types of experimental designs by choosing different tests. For instance, in the case outlined above we could use a two-way analysis of variance (2-way ANOVA) to identify differences between A|B that are independent of differences between 1|2 (and of the interaction between A|B and 1|2). In the case of multivariate modeling we can achieve a similar effect by using covariate adjustments. For example, we can fit a simple linear model for the differences between 1|2 and use its residuals as the 1|2-effect-adjusted data when testing for differences between A|B. Here is a visual example of this approach using:
1) PCA to evaluate the data variance between A and B (GREEN and RED) and 1 and 2 (SMALL or LARGE)
2) 1|2-adjusted PLS-DA model for A|B
3) 1|2-adjusted O-PLS-DA model for A|B
Based on the PCA we see that the differences between A|B are also affected by 1|2. This is evident in the distribution of scores based on LARGE|SMALL among A: A|1 (GREEN|SMALL) is more different (further right) from all B than A|2 (GREEN|LARGE). The same can be said for B. In particular, the greatest differences among all groups are between those with the greatest separation on the X-axis (1st principal component), which are RED|LARGE and GREEN|SMALL.
To identify the greatest difference between RED|GREEN that is independent of differences due to SMALL|LARGE, we can use SMALL|LARGE-adjusted data to create a PLS-DA model to discriminate between RED|GREEN.
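The covariate adjustment described above can be sketched in a few lines. This is a minimal illustration and not the code used to produce the figures: the function name adjust_for_covariate and the toy data are made up for the example. It relies on the fact that, for a single categorical covariate, the residuals of the linear model (value ~ covariate) are the values minus the mean of their covariate level. In practice the same adjustment would be applied to every variable before fitting PLS-DA.

```python
from statistics import mean

def adjust_for_covariate(values, covariate):
    """Return residuals of `values` after removing each covariate level's mean.

    For one categorical covariate this equals the residuals of the
    linear model values ~ covariate (hypothetical helper, for illustration).
    """
    level_means = {
        level: mean([v for v, c in zip(values, covariate) if c == level])
        for level in set(covariate)
    }
    return [v - level_means[c] for v, c in zip(values, covariate)]

# Toy data (invented): one metabolite measured in 8 samples.
group = ["A", "A", "A", "A", "B", "B", "B", "B"]          # GREEN | RED
subgroup = ["1", "1", "2", "2", "1", "1", "2", "2"]        # SMALL | LARGE
metabolite = [5.0, 6.0, 9.0, 10.0, 7.0, 8.0, 11.0, 12.0]

# Remove the 1|2 effect; the adjusted values keep the A|B difference.
adjusted = adjust_for_covariate(metabolite, subgroup)

# After adjustment each subgroup has mean zero, so 1|2 no longer shifts the data.
subgroup_means = {
    level: mean([a for a, s in zip(adjusted, subgroup) if s == level])
    for level in ("1", "2")
}
```

The adjusted values, rather than the raw ones, would then be passed to the PLS-DA model for A|B.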
This projection of the differences between A|B is the same for the SMALL|LARGE groups. Ideally we want the two groups' scores to be maximally separated on the X-axis, or 1st latent variable (LV). We see that this is not the case above; instead, explaining how the variables contribute to differences between GREEN|RED requires explaining the scores variance on both the X and Y axes, that is, in two dimensions.
Next we try the O-PLS-DA algorithm, which aims to rotate the projection of the data to maximize the separation between GREEN|RED on the X-axis and capture unrelated or orthogonal variance on the Y-axis.
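To make the idea of separating predictive from orthogonal variance concrete, here is a rough sketch of a single-response O-PLS step with one orthogonal component, written in plain Python. It follows the general shape of the published O-PLS filter, but it is an illustration under simplifying assumptions (two classes coded 0/1, one orthogonal component, invented toy data), not the implementation behind the figures. All function names are hypothetical.

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center_columns(X):
    means = [sum(col) / len(col) for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def xt_vec(X, v):
    """X' v: one value per column (variable)."""
    return [dot(col, v) for col in zip(*X)]

def x_vec(X, w):
    """X w: one score per row (sample)."""
    return [dot(row, w) for row in X]

def unit(v):
    norm = sqrt(dot(v, v))
    return [a / norm for a in v]

def opls_one_orthogonal(X, y):
    """One predictive and one orthogonal component (illustrative sketch)."""
    X = center_columns(X)
    y_mean = sum(y) / len(y)
    y = [a - y_mean for a in y]
    w = unit(xt_vec(X, y))                      # predictive weights
    t = x_vec(X, w)                             # predictive scores (unfiltered)
    p = [a / dot(t, t) for a in xt_vec(X, t)]   # loadings
    wp = dot(w, p)
    # Part of the loadings not explained by w: the orthogonal weights.
    w_o = unit([pi - wp * wi for pi, wi in zip(p, w)])
    t_o = x_vec(X, w_o)                         # orthogonal scores (Y-axis)
    p_o = [a / dot(t_o, t_o) for a in xt_vec(X, t_o)]
    # Remove the orthogonal variation from X, then recompute predictive scores.
    X_f = [[x - ti * pj for x, pj in zip(row, p_o)] for row, ti in zip(X, t_o)]
    t_pred = x_vec(X_f, w)                      # predictive scores (X-axis)
    return t_pred, t_o, w, w_o

# Toy data (invented): 6 samples x 3 variables, GREEN = 0, RED = 1.
X = [
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 1.5],
    [1.5, 3.0, 1.0],
    [3.0, 2.5, 0.0],
    [4.0, 1.5, 2.0],
    [3.5, 3.5, 1.0],
]
y = [0, 0, 0, 1, 1, 1]

t_pred, t_o, w, w_o = opls_one_orthogonal(X, y)
```

Plotting t_pred against t_o gives the kind of scores plot discussed here: the orthogonal scores are uncorrelated with the class labels by construction, so the class separation is concentrated on the predictive axis.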
The O-PLS-DA model loadings for the 1st LV provide information regarding differences in variable magnitudes between the two groups (GREEN|RED).
We can use network mapping to visualize these weights within a domain-specific context. In the case of metabolomics data this is best achieved using biochemical/chemical similarity networks.
We can create these networks by assigning edges between vertices (representing metabolites) based on biochemical relationships (KEGG RPAIRs) or chemical similarities (Tanimoto coefficient > 0.7). We can then map the O-PLS-DA model loadings to this network's visual properties (vertex: size, color, border, and inset graphic).
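The chemical similarity edges can be sketched as follows. This is a minimal example that assumes each metabolite is already described by a set of structural fingerprint bits. The metabolite names, the bit sets, and the function names are invented for illustration; only the 0.7 Tanimoto cutoff comes from the text above.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto coefficient of two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def similarity_edges(fingerprints, threshold=0.7):
    """Return (vertex, vertex, score) for every pair scoring above the threshold."""
    edges = []
    for (name1, bits1), (name2, bits2) in combinations(fingerprints.items(), 2):
        score = tanimoto(bits1, bits2)
        if score > threshold:
            edges.append((name1, name2, score))
    return edges

# Hypothetical fingerprints: each integer stands for one structural feature.
fingerprints = {
    "metabolite_1": {1, 2, 3, 4, 5},
    "metabolite_2": {1, 2, 3, 4, 6},
    "metabolite_3": {1, 2, 3, 4, 5, 7},
}

edges = similarity_edges(fingerprints)
```

Here only metabolite_1 and metabolite_3 are connected (5 shared bits out of 6 in total); the other two pairs fall below the cutoff.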
For example, we can map vertex size to the metabolite's importance in the explained discrimination between groups (loading on O-PLS-DA LV 1) and map color to the direction of change (blue, decrease; red, increase). Metabolites displaying significant differences between RED and GREEN groups (two-way ANOVA, p < 0.05 adjusting for 1|2) are shown at maximum size, with a black border, and contain a box-plot visualization.
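As a rough sketch, this mapping from model output to vertex properties could look like the following. The function name, the size range, the property names, and the example loadings are all assumptions made for illustration. Whether a positive loading means an increase depends on how the groups were coded in the model, so the sign convention here is also an assumption.

```python
def vertex_style(loading, significant, max_abs_loading,
                 min_size=10.0, max_size=60.0):
    """Map an LV 1 loading and a significance flag to visual properties."""
    if significant:
        size = max_size  # significant metabolites are drawn at maximum size
    else:
        # Scale size linearly with the magnitude of the loading.
        size = min_size + (max_size - min_size) * abs(loading) / max_abs_loading
    return {
        "size": size,
        # Assumed convention: positive loading = increase (red), otherwise blue.
        "color": "red" if loading > 0 else "blue",
        "border": "black" if significant else "none",
        "inset": "boxplot" if significant else None,
    }

# Hypothetical model output: (loading on LV 1, significant at p < 0.05).
results = {
    "metabolite_1": (0.8, True),
    "metabolite_2": (-0.4, False),
    "metabolite_3": (0.2, False),
}

max_abs = max(abs(loading) for loading, _ in results.values())
styles = {
    name: vertex_style(loading, significant, max_abs)
    for name, (loading, significant) in results.items()
}
```

The resulting dictionary can be exported as a vertex attribute table and imported into a network viewer such as Cytoscape.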
Here is a network mapping the O-PLS-DA model loadings into a biological context and displaying inset graphs of the means of important parameters among groups stratified by A|B and 1|2 (left to right: A|1, A|2, B|1, B|2).
Here is another network with the same edge and vertex properties as above, except the inset graphs show differences between groups A|B adjusted for the effect of 1|2.
This entry was posted on February 9, 2013 by dgrapov. It was filed under Uncategorized and was tagged with ANOVA, chemical similarity network, covariate adjustment, Cytoscape, ExCytR, imDEV, metabolomics, O-PLS, O-PLS-DA, PCA, PLS-DA.