When you want to get to know and love your data

Posts tagged “chemical similarity network

Metabolomics and Beyond: Challenges and Strategies for Next-gen Omic Analyses

Recently I had the pleasure of giving lecture for the Metabolomics Society on Challenges and Strategies for Next-gen Omic Analyses. You can check out all of my slides and video of the lecture below.

Mapping to the MetabolOMIC Manifold

I recently had the pleasure of giving a presentation on one of my favorite topics, network mapping, and its application to metabolomic and genomic data integration. You can check out the full presentation below.

Diabetes associated metabolomic perturbations in NOD mice

Recently I was lucky enough to publish some of my research findings in the Journal Metabolomics. You can check out the full paper, 10.1007/s11306-014-0706-2,  or take a look at the abstract and figures below.
Non-obese diabetic (NOD) mice are a widely-used model of type 1 diabetes (T1D). However, not all animals develop overt diabetes. This study examined the circulating metabolomic profiles of NOD mice progressing or not progressing to T1D. Total beta-cell mass was quantified in the intact pancreas using transgenic NOD mice expressing green fluorescent protein under the control of mouse insulin I promoter. While both progressor and non-progressor animals displayed lymphocyte infiltration and endoplasmic reticulum stress in the pancreas tissue, overt T1D did not develop until animals lost ~70 % of the total beta-cell mass. Gas chromatography time of flight mass spectrometry was used to measure >470 circulating metabolites in male and female progressor and non-progressor animals (n = 76) across a wide range of ages (neonates to >40-week). Statistical and multivariate analyses were used to identify age and sex independent metabolic markers which best differentiated progressor and non-progressor animals’ metabolic profiles. Key T1D-associated perturbations were related with: (1) increased plasma glucose and reduced 1,5-anhydroglucitol markers of glycemic control; (2) increased allantoin, gluconic acid and nitric acid-derived saccharic acid markers of oxidative stress; (3) reduced lysine, an insulin secretagogue; (4) increased branched-chain amino acids, isoleucine and valine; (5) reduced unsaturated fatty acids including arachidonic acid; and (6) perturbations in urea cycle intermediates suggesting increased arginine-dependent NO synthesis. Together these findings highlight the strength of the unique approach of comparing progressor and non-progressor NOD mice to identify metabolic perturbations involved in T1D progression.


Fig. 1 Immune cell infiltration and beta-cell destruction in prediabetic NOD mice. A Visualization of spatial islet distribution in the context of the vascular network in the intact pancreas. A prediabetic NOD mouse at 27-week. B The body region of the NOD mouse shown in A. Note that substantial beta-cell destruction is observed in the NOD pancreas (i.e. a loss of GFP-expressing beta-cells). C Intraislet capillary network in the body region of a wild-type mouse at 21-week. D Immunohistochemical staining. Insulin (green), glucagon (red), somatostatin (white) and nuclei (blue). E Hypertrophic islet with massive infiltration of T-lymphocytes. (a) Hematoxylin-Eosin (HE) staining of the islet showing peripheral- and intra-islet infiltrating lymphocytes and remaining endocrine islet cells. (b) A serial section stained for CD4-positive lymphocytes by ABC-staining (brown). c A serial section stained for CD8-positive lymphocytes. F Ultrastructural analysis of hypertrophic islets in non-diabetic and diabetic littermates. (a) Non-diabetic male NOD mouse (41-week old, 4-h fasting BG: 136 mg/dL) showing a hyperactive beta-cell with lymphocyte infiltration and vesicles without dense core granules. (b) Beta-cells in diabetic female NOD mouse (40-week old, 4-h fasting BG: 559 mg/dL) appears to be intact despite the presence of ongoing insulitis. G Progressive degradation of endoplasmic reticulum (ER). (a) Well-developed ER (ER) in a beta-cell undergoing insulitis. (b) ER degradation. Ribosomes are detached (shed) from the ER membrane and are aggregated (ER). Nuclear damage is seen with the formation of foam-like structures (N). Immature granules with less dense cores (G) as well as cytoplasmic liquefaction (CL) are observed. (c) ER membrane breakdown. ER membrane breakdown resulted in aggregation of shed ribosomes (ER). An adjacent PP-cell (PP) appears to be intact (identified by characteristic moderately dense cores of pancreatic polypeptide-containing secretory granules). (d) Beta-cell degradation. ER swelling (ER), ribosome shedding, amorphous cytoplasmic material (R) and cytoplasmic, liquefaction (L) are observed in the same beta-cell


Fig. 2 Progression of autoimmune diabetes in NOD mice. A (a) Virtual slice capture of a whole mouse pancreas from mouse insulin promoter I (MIP)-GFP mice on NOD background. (b) Measured beta-cell/islet distribution. (c) Corresponding 3D scatter plot of islet parameters depicts distribution of islets with various sizes and shapes. Each dot represents a single islet. B (a) Representative data showing islet growth in wild-type mice at 20- and 28-week of age. (b) Examples of beta-cell loss at 20-week (non-diabetic) and 28-week (diabetic) in NOD mice. C Heterogeneous beta-cell loss in NOD mice. Frequency is plotted against islet size. D Three distinct groups in the development of T1D in NOD mice. 3D scatter plot showing the relationship among blood glucose levels (BG), total beta-cell area and age. Three groups of mice are color-coded as diabetic mice (red), young mice with normoglycemia (<25 week; green) and old mice with normoglycemia (25–40 week; blue)


Fig. 3 Biochemical network displaying metabolic differences between diabetic and non-diabetic NOD mice. Metabolites are connected based on biochemical relationships (blue, KEGG RPAIRS) or structural similarity (violet, Tanimoto coefficient ≥0.7). Metabolite size and color represent the importance (O-PLS-DA model loadings, LV 1) and relative change (gray p adj > 0.05; green increase; red decrease) in diabetic compared non-diabetic NOD mice. Shapes display metabolites’ molecular classes or biochemical sub-domains and top descriptors of T1D-associated metabolic perturbations (Table 1) are highlighted with thick black borders


Fig. 4 Partial correlation network displaying associations between all type 1 diabetes-dependent metabolomic perturbations. All significantly altered metabolites (p adj ≤ 0.05, Supplemental Table S3) are connected based on partial correlations (p adj ≤ 0.05). Edge width displays the absolute magnitude and color the direction (orange positive; blue negative) of the partial-coefficient of correlation. Metabolite size and color represent the importance (O-PLS-DA model loadings, LV 1) and relative change (gray p adj > 0.05; green increase; red decrease) in diabetic compared non-diabetic NOD mice. Shapes display metabolites’ molecular classes or biochemical sub-domains (see Fig. 3 legend), and top descriptors of T1D-associated metabolic perturbations (Table 1) are highlighted with thick black borders

In conclusion, we identified marked differences in the rates of progression of NOD mice to T1D. Metabolomic analysis was used to identify age and sex independent metabolic markers, which may explain this heterogeneity. Future studies combining metabolic end points (as they correlate with beta-cell mass) and genetic risk profiles will ultimately lead to a more complete understanding of disease onset and progression.

Multivariate Data Analysis and Visualization Through Network Mapping

Recently I had the pleasure of speaking about one of my favorite topics, Network Mapping. This is a continuation of a general theme I’ve previously discussed and involves the merger of statistical and multivariate data analysis results with a network.

Over the past year I’ve been working on two major tools, DeviumWeb and MetaMapR, which aid the process of biological data (metabolomic) network mapping.


DeviumWeb– is a shiny based GUI written in R which is useful for:

  • data manipulation, transformation and visualization
  • statistical analysis (hypothesis testing, FDR, power analysis, correlations, etc)
  • clustering (heiarchical, TODO: k-means, SOM, distribution)
  • principal components analysis (PCA)
  • orthogonal partial least squares multivariate modeling (O-/PLS/-DA)


MetaMapR– is also a shiny based GUI written in R which is useful for calculation and visualization of various networks including:

  • biochemical
  • structural similarity
  • mass spectral similarity
  • correlation

Both of theses projects are under development, and my ultimate goal is to design a one-stop-shop ecosystem for network mapping.

In addition to network mapping,the video above and presentation below also discuss normalization schemes for longitudinal data and genomic, proteomic and metabolomic functional analysis both on a pathway and global level.

As always happy network mapping!

Creative Commons License

ASMS 2014

I’ve recently participated in the American Society of Mass Spectrommetry (ASMS) conference and had a great time. I met some great people and have a few new ideas for future projects. Specifically giving a go at using self-organizing maps (SOM) and  the R package mcclust  for clustering alternatives to hierarchical and k-means methods.

I had the pleasure of speaking at the conference in the Informatics-Metabolomics section, and was also a co-author on a project detailing a multi-metabolomics strategy (primary metabolites, lipids, and oxylipins) for the study of type 1 diabetes in an animal model. Keep an eye out for my full talk in an upcoming post.

ASMS 2014 j fahrman

High Dimensional Biological Data Analysis and Visualization

High dimensional biological data shares many qualities with other forms of data. Typically it is wide (samples << variables), complicated by experiential design and made up of complex relationships driven by both biological and analytical sources of variance. Luckily the powerful combination of R, Cytoscape (< v3) and the R package RCytoscape can be used to generate high dimensional and highly informative representations of complex biological (and really any type of) data. Check out the following examples of network mapping in action or view a more indepth presentation of the techniques used below.

Partial correlation network highlighting changes in tumor compared to control tissue from the same patient.

Tissue network cancer

Biochemical and structural similarity network of changes in tumor compared to control tissue from the same patient.

Cancer tissue network

Hierarchical clusters (color) mapped to a biochemical and structural similarity network displaying difference before and after drug administration.

cough syrup network

Partial correlation network displaying changes in metabolite relationships in response to drug treatment.

Treatment response network

Partial correlation network displaying changes in disease and response to drug treatment.

Treatment effects network

Check out the full presentation below.

Creative Commons License

Tutorials- Statistical and Multivariate Analysis for Metabolomics

2014 winter LC-MS stats courseI recently had the pleasure in participating in the 2014 WCMC Statistics for Metabolomics Short Course. The course was hosted by the NIH West Coast Metabolomics Center and focused on statistical and multivariate strategies for metabolomic data analysis. A variety of topics were covered using 8 hands on tutorials which focused on:

  • data quality overview
  • statistical and power analysis
  • clustering
  • principal components analysis (PCA)
  • partial least squares (O-/PLS/-DA)
  • metabolite enrichment analysis
  • biochemical and structural similarity network construction
  • network mapping

I am happy to have taught the course using all open source software, including: R, and Cytoscape. The data analysis and visualization were done using Shiny-based apps:  DeviumWeb and MetaMapR. Check out some of the slides below or download all the class material and try it out for yourself.

Creative Commons License
2014 WCMC LC-MS Data Processing and Statistics for Metabolomics by Dmitry Grapov is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Special thanks to the developers of Shiny and Radiant by Vincent Nijs.


R news and tutorials contributed by (750) R bloggers