Metabolomic network analysis can be used to interpret experimental results in a variety of contexts, including biochemical relationships, structural and spectral similarity, and empirical correlation. Machine learning is useful for modeling relationships in the context of pattern recognition, clustering, classification and regression-based predictive modeling. The combination of metabolomic networks and machine learning based predictive models offers a unique way to visualize empirical relationships while testing key experimental hypotheses. The following presentation focuses on the data analysis, visualization, machine learning and network mapping approaches used to create richly mapped metabolomic networks. Learn more at www.createdatasol.com
The following presentation also shows a sneak peek of a new data analysis and visualization software, DAVe: Data Analysis and Visualization engine. Check out some early features. DAVe is built in R and aims to provide a seamless environment for advanced data analysis and machine learning tasks, as well as biological functional and network analysis.
As an aside, building the main site (in progress) was a fun opportunity to experiment with Jekyll, Ruby and embedding slick interactive canvas elements into websites. You can check out all the code here: https://github.com/dgrapov/CDS_jekyll_site.
R users: networkly: network visualization in R using Plotly
In addition to their more common uses, networks can serve as powerful multivariate data visualization and exploration tools. Networks not only provide mathematical representations of data but are also one of the few data visualization methods capable of easily displaying relationships among many variables at once. The process of network mapping involves using the network manifold to display a variety of other information, e.g. statistical, machine learning or functional analysis results (see more mapped network examples).
The combination of Plotly and Shiny is awesome for creating your very own network mapping tools. networkly is an R package that can be used to create 2-D and 3-D interactive networks, which are rendered with plotly and can be easily integrated into shiny apps or markdown documents. All you need to get started is an edge list and node attributes, which can then be used to generate interactive 2-D and 3-D networks with customizable edge (color, width, hover, etc.) and node (color, size, hover, label, etc.) properties.
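As a rough illustration of the two inputs mentioned above, here is a minimal base R sketch of an edge list and a node attribute table. The column names (source, target, weight, id, degree, size, width), the metabolite names and the weights are all made up for this example and are not necessarily what networkly expects, so check the package documentation for the required format.

```r
# Hypothetical edge list: one row per connection between two nodes.
# Names and weights are invented for illustration only.
edges <- data.frame(
  source = c("glucose", "pyruvate", "pyruvate"),
  target = c("pyruvate", "lactate", "alanine"),
  weight = c(0.9, 0.4, 0.7),
  stringsAsFactors = FALSE
)

# Hypothetical node attributes: one row per unique node in the edge list.
nodes <- data.frame(
  id = unique(c(edges$source, edges$target)),
  stringsAsFactors = FALSE
)

# Degree = number of edges touching each node.
nodes$degree <- vapply(
  nodes$id,
  function(n) sum(edges$source == n | edges$target == n),
  integer(1),
  USE.NAMES = FALSE
)

# "Mapping" step: translate data values into visual properties.
nodes$size  <- 10 + 5 * nodes$degree   # larger marker for better connected nodes
edges$width <- 1 + 4 * edges$weight    # thicker line for stronger relationships

# Sanity check: every edge endpoint must have a row in the node table.
stopifnot(all(c(edges$source, edges$target) %in% nodes$id))

print(nodes)
print(edges)
```

Any other result (for example a p-value or a variable importance score) could be mapped the same way by adding a column to the node or edge table and translating it into color, size or hover text.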
2-Dimensional Network (interactive version)
3-Dimensional Network (interactive version)
At a recent Saint Louis R users meeting I had the pleasure of giving a basic introduction to the awesome dplyr R package. For me, data analysis almost always involves splitting the data based on a grouping variable and then applying some function to the subsets, which is termed split-apply-combine. Having recently incorporated dplyr into my data wrangling workflows, I’ve found this package’s syntax and performance a joy to work with. My feelings about dplyr are as follows.
Data wrangling without dplyr.
Data wrangling with dplyr.
This tutorial features an introduction to common dplyr verbs and an overview of implementing split-apply-combine in dplyr.
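For readers who have not seen the verbs in action, here is a minimal sketch of split-apply-combine with dplyr. It is not taken from the tutorial itself; it uses the built-in mtcars data set, and the filter threshold and chosen columns are arbitrary.

```r
library(dplyr)

result <- mtcars %>%
  filter(hp > 100) %>%                 # keep rows matching a condition
  select(cyl, mpg, wt) %>%             # keep only the columns of interest
  mutate(wt_kg = wt * 453.592) %>%     # wt is recorded in units of 1000 lbs
  group_by(cyl) %>%                    # split: one group per cylinder count
  summarise(                           # apply: one summary row per group
    n          = n(),
    median_mpg = median(mpg),
    mean_wt_kg = mean(wt_kg)
  ) %>%                                # combine: results come back as one table
  arrange(desc(median_mpg))            # order groups by median mpg

print(result)
```

The same pattern applies to any grouping variable: group_by() defines the split, summarise() applies the function to each subset and stitches the results back together.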
Some of my conclusions: not only does dplyr make data wrangling code clearer and far faster to write, the package's calculation speed is also very high (based on an unsophisticated comparison to base R).
The plot above shows the time in seconds (y-axis) taken by 10 replications of calculating group medians for a varying number of groups (x-axis), rows (y-facet) and columns (x-facet), with (green line) and without (red line) dplyr.
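A comparison of this kind can be sketched roughly as follows. This is not the benchmark behind the plot: the data sizes, the single value column and the use of aggregate() as the base R approach are assumptions made for illustration, and timings will vary by machine.

```r
library(dplyr)

set.seed(1)
n_rows   <- 1e5   # assumed size, chosen only for illustration
n_groups <- 100   # assumed number of groups

dat <- data.frame(
  group = sample(seq_len(n_groups), n_rows, replace = TRUE),
  value = rnorm(n_rows)
)

# Group-wise median without dplyr (base R aggregate).
base_median <- function(d) {
  aggregate(value ~ group, data = d, FUN = median)
}

# Group-wise median with dplyr.
dplyr_median <- function(d) {
  d %>% group_by(group) %>% summarise(value = median(value))
}

# Elapsed seconds for `reps` repeated calls of f on the test data.
time_reps <- function(f, reps = 10) {
  system.time(for (i in seq_len(reps)) f(dat))[["elapsed"]]
}

timings <- c(base = time_reps(base_median), dplyr = time_reps(dplyr_median))
print(timings)

# Both approaches should return the same medians, ordered by group.
base_res  <- base_median(dat)
dplyr_res <- as.data.frame(dplyr_median(dat))
stopifnot(isTRUE(all.equal(base_res$value, dplyr_res$value)))
```

Varying n_rows and n_groups, and adding more value columns, would reproduce the kind of grid described in the plot.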
Recently I had the pleasure of teaching data analysis at the 2014 UC Davis Proteomics Workshop. This included a hands-on lab for making gene ontology enrichment networks. You can check out my lecture and tutorial below or download all the material.
2014 UC Davis Proteomics Workshop by Dmitry Grapov is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.