In addition to their more common uses, networks can serve as powerful tools for multivariate data visualization and exploration. Networks not only provide mathematical representations of data but are also one of the few visualization methods capable of easily displaying relationships among many variables at once. The process of network mapping uses the network manifold to display a variety of other information, e.g., statistical, machine learning, or functional analysis results (see more mapped network examples).

The combination of Plotly and Shiny is awesome for creating your very own network mapping tools. Networkly is an R package for creating 2-D and 3-D interactive networks that are rendered with Plotly and can be easily integrated into Shiny apps or markdown documents. All you need to get started is an edge list and node attributes, which can then be used to generate interactive 2-D and 3-D networks with customizable edge (color, width, hover, etc.) and node (color, size, hover, label, etc.) properties.
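To make the "edge list plus node attributes" idea concrete, here is a minimal Python sketch of the mapping step: a node attribute is rescaled onto node size and an edge weight is mapped onto edge color and width. This is not Networkly's API (Networkly is an R package); every name below (`Edge`, `scale`, `map_network`) is hypothetical, and the style dictionaries are simply the kind of property table a plotting layer would consume.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    source: str
    target: str
    weight: float  # e.g. a correlation coefficient (illustrative)


def scale(value, lo, hi, out_min=5.0, out_max=30.0):
    """Linearly rescale value from [lo, hi] to [out_min, out_max]."""
    if hi == lo:
        return (out_min + out_max) / 2
    return out_min + (value - lo) * (out_max - out_min) / (hi - lo)


def map_network(edges, node_values):
    """Return (node_styles, edge_styles) for a hypothetical plotting layer."""
    lo, hi = min(node_values.values()), max(node_values.values())
    node_styles = {
        node: {"size": scale(v, lo, hi), "hover": f"{node}: {v:.2f}"}
        for node, v in node_values.items()
    }
    edge_styles = [
        {
            "source": e.source,
            "target": e.target,
            # positive weights -> red, non-positive -> blue
            "color": "red" if e.weight > 0 else "blue",
            # larger |weight| -> thicker edge
            "width": 1 + 4 * abs(e.weight),
        }
        for e in edges
    ]
    return node_styles, edge_styles


edges = [Edge("A", "B", 0.8), Edge("B", "C", -0.5)]
values = {"A": 1.0, "B": 3.0, "C": 2.0}
nodes, links = map_network(edges, values)
```

Keeping the visual mapping separate from rendering means the same style table could feed a 2-D or a 3-D plot.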
2-Dimensional Network (interactive version)

I recently had the pleasure of giving a presentation on one of my favorite topics, network mapping, and its application to metabolomic and genomic data integration. You can check out the full presentation below.

Recently I had the pleasure of speaking about one of my favorite topics, network mapping. This continues a general theme I've discussed previously: merging statistical and multivariate data analysis results with a network.

Over the past year I’ve been working on two major tools, DeviumWeb and MetaMapR, which aid network mapping of biological (metabolomic) data.

DeviumWeb is a Shiny-based GUI written in R that is useful for:

data manipulation, transformation and visualization

statistical analysis (hypothesis testing, FDR, power analysis, correlations, etc.)

orthogonal partial least squares multivariate modeling (O-/PLS/-DA)
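FDR in the list above refers to false discovery rate control. As an illustration only, here is a minimal Python sketch of the Benjamini-Hochberg adjustment, one common FDR procedure; whether DeviumWeb uses this exact method is an assumption, and the function name is hypothetical. It returns adjusted p-values (q-values) in the original input order, following the same step-up logic as R's `p.adjust(method = "BH")`.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = p_values[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(p)
significant = [qi < 0.1 for qi in q]
```

With these toy p-values the adjusted values are roughly 0.04, 0.053, 0.053 and 0.2, so the first three would pass an FDR cutoff of 0.1.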

MetaMapR is also a Shiny-based GUI written in R, useful for calculating and visualizing various networks, including:

biochemical

structural similarity

mass spectral similarity

correlation
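A correlation network, the last type in the list, can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not MetaMapR's implementation: it uses Pearson correlation and a hypothetical fixed cutoff of |r| ≥ 0.7, whereas a real analysis might use Spearman correlation and p-value or FDR thresholds. The metabolite values are made-up toy numbers.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_network(data, threshold=0.7):
    """Edge list (u, v, r) for variable pairs with |r| >= threshold.

    data maps variable name -> measurements across the same samples.
    """
    edges = []
    for u, v in combinations(sorted(data), 2):
        r = pearson(data[u], data[v])
        if abs(r) >= threshold:
            edges.append((u, v, r))
    return edges


# toy values, five samples per variable
data = {
    "glucose": [5.1, 5.8, 6.4, 7.0, 7.9],
    "lactate": [1.0, 1.2, 1.5, 1.6, 1.9],
    "citrate": [3.0, 2.7, 2.5, 2.1, 1.8],
    "alanine": [2.0, 2.9, 1.9, 3.1, 2.2],
}
edges = correlation_network(data, threshold=0.7)
```

Here glucose, lactate and citrate are strongly (anti-)correlated and form three edges, while the noisy alanine values fall below the cutoff and leave that node unconnected.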

Both of these projects are under development, and my ultimate goal is to design a one-stop-shop ecosystem for network mapping.

In addition to network mapping, the video above and presentation below also discuss normalization schemes for longitudinal data and genomic, proteomic and metabolomic functional analysis at both the pathway and global levels.

I recently attended the American Society for Mass Spectrometry (ASMS) conference and had a great time. I met some great people and came away with a few new ideas for future projects, specifically trying self-organizing maps (SOM) and the R package mcclust as clustering alternatives to hierarchical and k-means methods.

I had the pleasure of speaking at the conference in the Informatics-Metabolomics section, and was also a co-author on a project detailing a multi-metabolomics strategy (primary metabolites, lipids, and oxylipins) for the study of type 1 diabetes in an animal model. Keep an eye out for my full talk in an upcoming post.

High-dimensional biological data shares many qualities with other forms of data. Typically it is wide (samples << variables), complicated by experimental design, and made up of complex relationships driven by both biological and analytical sources of variance. Luckily, the powerful combination of R, Cytoscape (< v3) and the R package RCytoscape can be used to generate high-dimensional and highly informative representations of complex biological (and really any type of) data. Check out the following examples of network mapping in action, or view a more in-depth presentation of the techniques used below.

Partial correlation network highlighting changes in tumor compared to control tissue from the same patient.

Biochemical and structural similarity network of changes in tumor compared to control tissue from the same patient.

Network mapping is a high-dimensional data visualization technique that can be applied to virtually any type of data. I recently gave a tutorial on the basics of network mapping in which each participant generated a mapped network for their name.

Download the full tutorial at TeachingDemos and follow along at your own pace.

I am happy to announce the release of MetaMapR (v1.2.0). New features include:

an independent module for biological database identifier translation using the Chemical Translation System (CTS)

a retention time filter for mass spectral connections

increased calculation speed
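The retention time filter can be pictured as an extra condition on mass spectral similarity edges. The sketch below is an assumption-laden Python illustration, not MetaMapR's code: spectra are hypothetical `{m/z: intensity}` dictionaries compared by cosine similarity, and an edge is kept only when the two features also elute within a hypothetical `max_rt_diff` window. Whether MetaMapR's filter keeps close-eluting pairs (as here) or excludes them is not stated above, so treat the direction of the filter as an assumption too.

```python
from itertools import combinations
from math import sqrt


def cosine(spec_a, spec_b):
    """Cosine similarity of two spectra given as {m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = sqrt(sum(v * v for v in spec_a.values()))
    norm_b = sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def spectral_edges(features, min_similarity=0.8, max_rt_diff=0.5):
    """Edges between features whose spectra are similar AND whose
    retention times differ by at most max_rt_diff (same units as rt)."""
    edges = []
    for a, b in combinations(sorted(features), 2):
        rt_a, spec_a = features[a]
        rt_b, spec_b = features[b]
        if abs(rt_a - rt_b) > max_rt_diff:
            continue  # hypothetical retention time filter
        sim = cosine(spec_a, spec_b)
        if sim >= min_similarity:
            edges.append((a, b, sim))
    return edges


# toy features: name -> (retention time, spectrum)
features = {
    "f1": (5.0, {73: 100.0, 147: 40.0, 205: 10.0}),
    "f2": (5.2, {73: 90.0, 147: 45.0, 205: 12.0}),
    "f3": (9.8, {73: 95.0, 147: 42.0, 205: 11.0}),
}
edges = spectral_edges(features)
```

Features f1 and f3 have nearly identical toy spectra, but their retention times differ by 4.8, so only the f1-f2 edge survives the filter.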

An application of MetaMapR was recently featured in an article in the Nov. 4, 2013 issue of Chemical & Engineering News (C&EN), 91(44). The tool was used to generate a network of more than 1,200 metabolites based on enzymatic transformations and structural similarities.

The full article, as well as the original image, can be found here.

Metabolites are represented by circular “nodes” linked by “edges” with arrows designating the direction of the biosynthetic gradient (i.e. substrate to product). Some metabolites are linked by more than one enzymatic step. Node sizes represent magnitudes of differences in plasma metabolite geometric means (ΔGM). Arrow widths represent magnitudes of changes in product over substrate ratios (ΔP:S). Colors of node borders and arrows represent the significance and direction of changes relative to non-diabetics as per the figure legend. Differences are significant at p<0.05 by Mann-Whitney U test adjusted for FDR (q = 0.1).

Figure 1. The type 2 diabetes-associated lipidomic changes projected in context of their biological relationships in obese African-American women.
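The legend doesn't give exact formulas for ΔGM or ΔP:S. Assuming both are simple differences (diabetic minus non-diabetic) of geometric means, with ΔP:S computed from per-subject product/substrate ratios, a minimal Python sketch looks like this; the function names and definitions are hypothetical illustrations, not the published calculation. Node size and arrow width would then be scaled from the absolute values of these quantities.

```python
from math import exp, log


def geometric_mean(values):
    """Geometric mean of positive values: exp(mean(log x))."""
    return exp(sum(log(v) for v in values) / len(values))


def delta_gm(case, control):
    """Assumed definition: difference in geometric means, case - control."""
    return geometric_mean(case) - geometric_mean(control)


def delta_ps(prod_case, sub_case, prod_control, sub_control):
    """Assumed definition: change in the geometric mean of per-subject
    product:substrate ratios, case - control."""
    ratios_case = [p / s for p, s in zip(prod_case, sub_case)]
    ratios_control = [p / s for p, s in zip(prod_control, sub_control)]
    return geometric_mean(ratios_case) - geometric_mean(ratios_control)


# toy concentrations for two subjects per group
gm_shift = delta_gm([2.0, 8.0], [1.0, 4.0])
ps_shift = delta_ps([4.0, 9.0], [2.0, 3.0], [2.0, 2.0], [2.0, 2.0])
```

Working on the log scale is what makes the geometric mean robust to the right-skewed concentration distributions typical of plasma metabolites.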

Horizontal scatter plots of the log-transformed concentrations for each model variable are shown. The horizontal arrangement of metabolite scatter plots is scaled to their loading in the discriminant model. A given species' importance in the classification increases with its displacement from the origin (broken line). The direction of the displacement, left or right, designates whether the species was decreased (left) or increased (right) in the diabetic relative to the non-diabetic patients. The overall model discrimination performance is presented as a scatter plot of subject model scores (inset).

Scatterplot matrix giving an overview of correlations and regressions, displaying box plots for the Iris data species, variable histograms, correlation statistics, strip charts and best-fit lines.

Spearman’s correlations were used to generate multi-dimensionally scaled parameter connectivity networks for variable intercorrelations. Networks were oriented with fasting glucose at the origin and SFA in the lower right quadrant. Colored ellipses represent the 95% probability locations of metabolite classes (Hotelling's T², p<0.05). Nodes indicate clinical parameters (diamonds), <20-carbon fatty acid metabolites (circles) and ≥20-carbon fatty acid metabolites (triangles), with discriminant model variables and glucose enlarged. Significant correlations between species are designated by orange (positive) or blue (negative) connecting lines (p<0.05, non-diabetic; p<0.01, diabetic participants).
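Spearman's correlation is the Pearson correlation of the ranked data. The Python sketch below computes it with average ranks for ties and maps each coefficient to the orange (positive) or blue (negative) edge colors used in the figure. As an assumption for illustration, a hypothetical `min_abs_rho` cutoff stands in for the p-value thresholds described in the caption, which would require a t-distribution not implemented here; all function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks.
    Assumes neither variable is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def edge_style(rho, min_abs_rho=0.6):
    """Orange for positive, blue for negative; None if below the cutoff."""
    if abs(rho) < min_abs_rho:
        return None
    return "orange" if rho > 0 else "blue"


x = [1.2, 2.4, 2.4, 3.9, 5.0]
y = [0.5, 0.9, 1.1, 1.0, 2.2]
rho = spearman(x, y)
color = edge_style(rho)
```

Because it works on ranks, Spearman's rho captures monotonic but non-linear relationships and is less sensitive to outlying concentrations than Pearson's r.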