When you want to get to know and love your data

Posts tagged “chemical translations”

Featured Network in Chemical and Engineering News (C&EN)

I am happy to announce the release of MetaMapR (v1.2.0).

New features include: 

  • An independent module for biological database identifier translations using the Chemical Translation Service (CTS)
  • A retention time filter for mass spectral connections
  • Increased calculation speed

An application of MetaMapR was recently featured in an article in the November 4, 2013 issue of Chemical & Engineering News (C&EN), 91(44). MetaMapR was used to generate a network of more than 1,200 metabolites based on enzymatic transformations and structural similarities.

[Figure: C&EN network image]

The full article can be found here, as well as the original image.


American Society for Mass Spectrometry 2013

I am getting ready to present at the upcoming American Society for Mass Spectrometry (ASMS) conference in Minneapolis, Minnesota (don’tcha know).

If you are around, check out my talk in the Oral: ThOB am – Informatics: Metabolomics session on Thursday (06/14) at 8:30 am in room L100. Here is a teaser:

[Figure: WCMC network]

Above is a network representation of biochemical relationships (red edges, KEGG RPAIRS) and structural similarities (gray edges, Tanimoto coefficient > 0.7) among more than 1,100 biological molecules (see here for some of their descriptions). Keep an eye out for the R code used to generate this network and the slides from my talk.
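The gray structural-similarity edges can be sketched as follows. This is a minimal illustration, not the R code behind the figure: it assumes each molecule has already been reduced to a binary fingerprint stored as a set of "on" bit positions, and the function names and toy fingerprints are hypothetical.

```python
from itertools import combinations

def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0

def similarity_edges(fingerprints: dict[str, set[int]], cutoff: float = 0.7):
    """List (name1, name2, score) for every pair whose Tanimoto coefficient exceeds cutoff."""
    edges = []
    for (name1, fp1), (name2, fp2) in combinations(fingerprints.items(), 2):
        score = tanimoto(fp1, fp2)
        if score > cutoff:
            edges.append((name1, name2, round(score, 3)))
    return edges

# Hypothetical toy fingerprints, not real metabolite data.
toy = {"A": {1, 2, 3, 4, 5}, "B": {1, 2, 3, 4, 5, 6}, "C": {7, 8, 9}}
print(similarity_edges(toy))  # [('A', 'B', 0.833)]
```

All pairs are compared, so the cost grows quadratically with the number of molecules; at roughly 1,100 molecules that is about 600,000 comparisons, which is still quick.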

Here is my talk abstract.

Multivariate and network tools for analysis and visualization of metabolomic data
Dmitry Grapov (1, 2); Oliver Fiehn (1, 2)
(1) West Coast Metabolomics Center, Davis, CA; (2) University of California Davis, Davis, California
NOVEL ASPECT: A software tool for calculation and mapping of statistical and multivariate results from metabolomic experiments into biologically relevant contexts.
INTRODUCTION: While a variety of tools can produce network representations of metabolomic data, none are fully integrated with the statistical and multivariate methods needed to analyze, visualize and summarize high-dimensional data. We have developed an open-source toolset for the analysis of high-dimensional biological data that combines the computational capabilities of the R statistical programming environment with the network mapping and visualization features of Cytoscape. A graphical user interface seamlessly integrates the calculation and interpretation of statistical and multivariate results in the context of network graphs constructed from biological relationships, chemical similarities or empirical variable dependencies.
METHODS: An R-based GUI uses RCytoscape and CytoscapeRPC to connect R and Cytoscape. Data import, manipulation and export are handled through interfaces to MS Excel and Google Docs. R packages provide a variety of analysis methods, including parametric and non-parametric multiple hypothesis testing, false discovery rate correction, exploratory principal and independent components analyses, hierarchical and model-based clustering, and multivariate predictive modeling such as partial least squares and support vector machines. Relationships between biological parameters can be represented as networks whose edges come from user-defined edge lists, from PubChem chemical identifiers (used to construct biochemical and chemical similarity networks based on KEGG reactant pairs and Tanimoto distances), or from partial correlations (Gaussian Markov networks).
ABSTRACT: Comparisons of plasma primary metabolite excursion patterns during an oral glucose tolerance test (OGTT) were used to model changes in metabolism associated with a diet and exercise intervention. Plasma aliquots taken at 30-minute intervals (0-120 minutes) were analyzed by GC/TOF and used to compare metabolite levels (n = 323) in a cohort of overweight women before and after a 14-week diet and exercise regimen. Mixed effects models, partial least squares (PLS) and partial least squares discriminant analysis (PLS-DA) were used to study OGTT- and intervention-associated changes in metabolite baselines, area under the curve for OGTT-associated excursions, and metabolite time course patterns. Metabolic changes due to the oral glucose load were visualized by mapping statistical test p-values and variable coefficient weights from an intervention-adjusted PLS model of time during the OGTT into a network connected based on KEGG reactant pairs and Tanimoto similarity > 0.7. Vertices, representing metabolites, were sized and colored based on the absolute PLS coefficient magnitude and sign, respectively. Metabolites showing significant perturbations during the OGTT (false discovery rate adjusted p-value < 0.05, q = 0.05) were highlighted with node-inset graphs displaying means and confidence intervals over the time course for before- and after-intervention comparisons. This network was useful for identifying OGTT-associated interactions between the major biochemical domains (lipids, amino acids, organic acids, and carbohydrates). In a follow-up analysis, a Gaussian Markov partial correlation network was used to investigate intervention-associated changes in metabolite-metabolite and metabolite-clinical parameter (insulin, hormones) dependency relationships.
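Two of the calculations named in the abstract, area under the curve (AUC) for an OGTT excursion and false discovery rate correction, can be sketched in a few lines. This is a minimal illustration, not the analysis code behind the abstract. It assumes the trapezoidal rule for AUC (optionally subtracting the t = 0 value to get a net incremental AUC) and the Benjamini-Hochberg procedure for FDR, since the abstract names neither method. The function names and data are hypothetical toy values.

```python
def trapezoid_auc(times: list[float], values: list[float], baseline_adjust: bool = False) -> float:
    """Area under a time-course curve by the trapezoidal rule.

    If baseline_adjust is True, the first (t = 0) value is subtracted from every
    point, giving a net incremental AUC (area below baseline counts as negative).
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need matching time and value lists with at least two points")
    base = values[0] if baseline_adjust else 0.0
    y = [v - base for v in values]
    return sum((t1 - t0) * (y0 + y1) / 2
               for t0, t1, y0, y1 in zip(times, times[1:], y, y[1:]))

def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical toy data: one metabolite sampled every 30 min from 0 to 120 min.
times = [0, 30, 60, 90, 120]
levels = [5.0, 8.0, 7.0, 6.0, 5.0]
print(trapezoid_auc(times, levels))                        # 780.0
print(trapezoid_auc(times, levels, baseline_adjust=True))  # 180.0

# Hypothetical raw p-values for four metabolites; keep those with adjusted p < 0.05.
qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print([i for i, q in enumerate(qvals) if q < 0.05])        # [0]
```

In R, the same Benjamini-Hochberg adjustment is available as `p.adjust(p, method = "BH")`.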

Translating between identifiers: R interface to the Chemical Translation Service (CTS)


To use domain knowledge to enhance inference, you first need to match your query to a database that contains that knowledge.

The Chemical Translation Service (CTS) can be used to translate between molecular identifiers for many (~400K) naturally occurring biological small molecules, or metabolites.

CTSgetR is an easy-to-use R interface to CTS that enables translation between the following repositories of biological domain knowledge:

  • “Chemical Name”
  • “InChIKey”
  • “InChI Code”
  • “PubChem CID”
  • “Pubchem SID”
  • “ChemDB”
  • “ZINC”
  • “Southern Research Institute”
  • “Specs”
  • “MolPort”
  • “ASINEX”
  • “ChemBank”
  • “MLSMR”
  • “Emory University Molecular Libraries Screening Center”
  • “ChemSpider”
  • “DiscoveryGate”
  • “Ambinter”
  • “Vitas-M Laboratory”
  • “ChemBlock”

Check out an example translation from the universal molecular identifier, InChIKey, to the well-referenced PubChem Chemical Identifier (CID).
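CTSgetR itself is an R package. The Python below is a language-neutral sketch of what an InChIKey to PubChem CID request to CTS might involve. It assumes a CTS REST endpoint of the form `https://cts.fiehnlab.ucdavis.edu/rest/convert/{from}/{to}/{query}`, and a JSON reply that is a list of records, each with a `results` list. Check both assumptions against the current CTS documentation. The function names are hypothetical and are not part of CTSgetR.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed CTS REST base URL; verify against the current CTS documentation.
CTS_CONVERT = "https://cts.fiehnlab.ucdavis.edu/rest/convert"

def cts_convert_url(from_id: str, to_id: str, query: str) -> str:
    """Build a convert request URL; identifier names such as 'PubChem CID' contain spaces."""
    parts = (quote(p, safe="") for p in (from_id, to_id, query))
    return "/".join([CTS_CONVERT, *parts])

def parse_cts_results(payload: str) -> list[str]:
    """Collect translated identifiers from the assumed reply shape:
    a JSON list of records, each holding a 'results' list."""
    return [hit for record in json.loads(payload) for hit in record.get("results", [])]

def cts_convert(from_id: str, to_id: str, query: str, timeout: float = 30) -> list[str]:
    """Send one translation request (needs network access) and return the hits."""
    with urlopen(cts_convert_url(from_id, to_id, query), timeout=timeout) as resp:
        return parse_cts_results(resp.read().decode("utf-8"))

# Usage (network required): the InChIKey below is caffeine.
# cts_convert("InChIKey", "PubChem CID", "RYYVLZVUVIJVGH-UHFFFAOYSA-N")
```

Separating URL construction and reply parsing from the network call makes it possible to check both pieces offline before querying the live service.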