When you want to get to know and love your data

Archive for November, 2012

Excel + Cytoscape + R = ExCytR

My new project is coming along nicely and should be released in early 2013. It builds on the structures developed in imDEV to link Excel, Cytoscape and R using RExcel, RCytoscape, and CytoscapeRPC. This trio can be used to rapidly generate beautiful and informative network representations of data.

Here is an example of an undirected Gaussian graphical Markov metabolic network calculated from time-course metabolomic measurements generated by gas chromatography time-of-flight mass spectrometry (GC/TOF).

Nodes represent metabolomic variables, and their visual characteristics encode chemometric data along with the results of statistical analyses and multivariate modeling. ggplot2 is used to generate graphs of the time-course data showing the means and standard errors of metabolite concentrations in two study populations. The connections between nodes (the edges) are calculated from q-order partial correlations using the R package qpgraph.
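To give a feel for what a partial-correlation edge measures, here is a minimal sketch of the simplest case, q = 1: the correlation between two metabolites after removing the linear effect of one third variable. This is not qpgraph's estimator, which works with larger conditioning sets; the function names `pearson` and `first_order_partial` and the example data are hypothetical and only illustrate the idea.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_order_partial(x, y, z):
    """Correlation of x and y after removing the linear effect of z (q = 1)."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical data: x and y both track z, plus different small perturbations.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [a + e for a, e in zip(z, [1, -1, 1, -1, 1, -1, 1, -1])]
y = [a + e for a, e in zip(z, [1, 1, -1, -1, 1, 1, -1, -1])]
print(round(pearson(x, y), 2), round(first_order_partial(x, y, z), 2))  # prints 0.79 -0.11
```

When two metabolites are correlated mainly because both follow a third variable, the partial correlation shrinks toward zero, which is why such indirect links tend to drop out of a Gaussian graphical model.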

[Figure: Gaussian graphical Markov network node graph]

Type 2 Diabetes Associated Changes in the Plasma Non-Esterified Fatty Acids, Oxylipins and Endocannabinoids

Today marks the publication of the first article making heavy use of imDEV for all aspects of data analysis and visualization!

Grapov D, Adams SH, Pedersen TL, Garvey WT, Newman JW (2012) Type 2 Diabetes Associated Changes in the Plasma Non-Esterified Fatty Acids, Oxylipins and Endocannabinoids. PLoS ONE 7(11): e48852. doi:10.1371/journal.pone.0048852

Here are some figures from the manuscript.

Metabolites are represented by circular “nodes” linked by “edges” with arrows designating the direction of the biosynthetic gradient (i.e. substrate to product). Some metabolites are linked by more than one enzymatic step. Node sizes represent magnitudes of differences in plasma metabolite geometric means (ΔGM). Arrow widths represent magnitudes of changes in product over substrate ratios (ΔP:S). Colors of node borders and arrows represent the significance and direction of changes relative to non-diabetics as per the figure legend. Differences are significant at p<0.05 by Mann-Whitney U test adjusted for FDR (q = 0.1).
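The caption reports Mann-Whitney U tests with a false discovery rate (FDR) adjustment at q = 0.1, but does not say which FDR procedure was applied. A common choice is the Benjamini-Hochberg step-up method; the plain-Python sketch below assumes that method and is not the analysis actually run for the paper. The raw p-values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk down from the largest p-value so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from four Mann-Whitney U tests.
raw = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(raw)
significant = [qi <= 0.1 for qi in q]
print([round(v, 3) for v in q], significant)  # prints [0.04, 0.053, 0.053, 0.2] [True, True, True, False]
```

The running minimum enforces the step-up rule: once a larger p-value passes, every smaller one passes too.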

Horizontal scatter plots of the log-transformed concentrations for each model variable are shown. The horizontal position of each metabolite's scatter plot is scaled to its loading in the discriminant model. A given species' importance in the classification increases with increasing displacement from the origin (broken line). The direction of the displacement designates whether the species was decreased (left) or increased (right) in the diabetic relative to the non-diabetic patients. The overall discrimination performance of the model is presented as a scatter plot of subject model scores (inset).

Significant (p<0.05) non-parametric Spearman’s correlations for non-diabetic (top left triangle) and type 2 diabetic (bottom right triangle) subjects are indicated by orange (positive) and blue (negative) intersections.
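Spearman's correlation is Pearson's correlation computed on ranks, with tied values sharing their average rank. A minimal plain-Python sketch follows; the function names are hypothetical, and it does not compute the p-values needed for the p<0.05 filter, which a statistics library would normally supply.

```python
import math

def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # convert 0-based positions to a 1-based rank
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# A monotone but non-linear relationship still gives rho = 1.
print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # prints 1.0
```

Computing rho for every pair of variables separately within each group, then keeping only the significant pairs, would yield the two triangles of a figure like the one described.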

Spearman’s correlations were used to generate multidimensionally scaled parameter connectivity networks for variable intercorrelations. Networks were oriented with fasting glucose at the origin and SFA in the lower right quadrant. Colored ellipses represent the 95% probability locations of metabolite classes (Hotelling's T², p<0.05). Nodes indicate clinical parameters (diamonds),

*Keep an eye out for news about my new network graphing package, which greatly advances imGraph by providing GUIs interfacing Excel, R and Cytoscape!