## High Dimensional Biological Data Analysis and Visualization

High dimensional biological data shares many qualities with other forms of data. Typically it is wide (samples << variables), complicated by experiential design and made up of complex relationships driven by both biological and analytical sources of variance. Luckily the powerful combination of R, Cytoscape (< v3) and the R package RCytoscape can be used to generate high dimensional and highly informative representations of complex biological (and really any type of) data. Check out the following examples of network mapping in action or view a more indepth presentation of the techniques used below.

Partial correlation network highlighting changes in tumor compared to control tissue from the same patient.

Biochemical and structural similarity network of changes in tumor compared to control tissue from the same patient.

Hierarchical clusters (color) mapped to a biochemical and structural similarity network displaying difference before and after drug administration.

Partial correlation network displaying changes in metabolite relationships in response to drug treatment.

Partial correlation network displaying changes in disease and response to drug treatment.

Check out the full presentation below.

February 22, 2014 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, clustering, correlation network, Cytoscape, data analysis, data visualization, Devium, metabolomics, multivariate, network, network mapping, O-PLS-DA, r-bloggers, tutorial | Leave a comment

## Network Mapping Video

Here are a video and slides for a presentation of mine about my favorite topic :

June 14, 2013 | Categories: Uncategorized | Tags: biochemical network, biochmical network, chemical similarity network, clustering, Cytoscape, data analysis, data visualization, metabolomics, multivariate, network, network mapping, networks, O-PLS, O-PLS-DA, PCA, PLS, PLS-DA | 1 Comment

## Comparison of Serum vs Urine metabolites +

Primary metabolites in human serum or urine.

Oh oh, there seem to be some outliers: serum samples looking like urine and vice versa. Fix these and evaluate using PCA and hierarchical clustering on rank correlations.

Now things look more believable. Next let us test the effects of data pre-treatment on PLS-DA model scores for a 3 group comparison in serum. Ideally group scores would be maximally resolved in the dimension of the first latent variable (x) and inter-group variance would be orthogonal or in the y-axis.

Compared to raw data (TOP) where ~ 3 top variables (glucose, urea and mannitol) dominate the variance structure, the autoscaled model, due to variable-wise mean subtraction and division by the standard deviation, displays a more balanced contribution to scores variance by variables. The larger separation between WHITE and RED class scores along the x-axis suggest improved classifier performance over raw data model and overview of samples with scores outside their respective group’s Hotelling’s T ellipse (95%) might point to a sample outlier to further investigate or potentially exclude from the current test.

December 16, 2012 | Categories: Uncategorized | Tags: autoscaling, clustering, imDEV, metabolomics, normalizations, outliers, PCA, PLS-DA | Leave a comment