## Comparison of Serum vs Urine metabolites

Primary metabolites in human serum or urine.

Oh oh, there seem to be some outliers: serum samples looking like urine and vice versa. Fix these and evaluate using PCA and hierarchical clustering on rank correlations.
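To make the rank-correlation idea concrete, here is a minimal sketch in plain Python. It is not the imDEV workflow used for the figures. It computes Spearman correlations between samples and flags any sample that correlates better, on average, with the other biofluid than with its own labelled group, which is a crude stand-in for reading the dendrogram by eye. The intensities, the metabolite order, and the helper name `flag_suspect_labels` are all made up for illustration.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def flag_suspect_labels(samples, labels):
    """Indices of samples that correlate better with the other group (hypothetical helper)."""
    suspects = []
    for i, (xi, li) in enumerate(zip(samples, labels)):
        same = [spearman(xi, xj)
                for j, (xj, lj) in enumerate(zip(samples, labels))
                if j != i and lj == li]
        other = [spearman(xi, xj)
                 for xj, lj in zip(samples, labels) if lj != li]
        if same and other and mean(other) > mean(same):
            suspects.append(i)
    return suspects


# Made-up intensities; columns: glucose, urea, creatinine, hippurate.
samples = [
    [90.0, 30.0, 1.0, 0.5],     # serum-like profile
    [85.0, 35.0, 1.2, 0.4],     # serum-like profile
    [95.0, 28.0, 0.9, 0.6],     # serum-like profile
    [2.0, 300.0, 120.0, 40.0],  # urine-like profile, labelled serum on purpose
    [1.0, 280.0, 100.0, 50.0],  # urine-like profile
    [3.0, 320.0, 110.0, 45.0],  # urine-like profile
]
labels = ["serum", "serum", "serum", "serum", "urine", "urine"]

print(flag_suspect_labels(samples, labels))
```

On real data you would more likely feed `1 - correlation` as a distance into a proper hierarchical clustering routine and inspect the tree, but the flagged indices give a quick first list of samples to double-check.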

Now things look more believable. Next, let us test the effects of data pre-treatment on PLS-DA model scores for a three-group comparison in serum. Ideally, group scores would be maximally resolved along the first latent variable (x-axis), and within-group variance would be orthogonal to it, along the y-axis.

Compared to the raw data model (TOP), where roughly three top variables (glucose, urea and mannitol) dominate the variance structure, the autoscaled model shows a more balanced contribution of variables to the scores variance, because each variable is mean-centered and divided by its standard deviation. The larger separation between WHITE and RED class scores along the x-axis suggests improved classifier performance over the raw data model. Samples whose scores fall outside their group's 95% Hotelling's T² ellipse may be outliers worth investigating further or potentially excluding from the current test.
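For anyone who wants to see what autoscaling does to the variance structure, here is a small sketch in plain Python. The numbers are invented, not the study data, and the helper names `autoscale` and `variance_share` are assumptions for illustration. It mean-centers each variable, divides by its sample standard deviation, and compares each variable's share of the total variance before and after.

```python
from statistics import mean, stdev, variance


def autoscale(rows):
    """Mean-center each column and divide it by its sample standard deviation."""
    columns = list(zip(*rows))
    centers = [mean(c) for c in columns]
    # Sample SD (n - 1 denominator); a constant column would give 0 and
    # raise ZeroDivisionError below, so drop such variables beforehand.
    scales = [stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, centers, scales)]
            for row in rows]


def variance_share(rows):
    """Fraction of the total variance contributed by each column."""
    var = [variance(c) for c in zip(*rows)]
    total = sum(var)
    return [v / total for v in var]


# Made-up intensities; columns: glucose, urea, a low-abundance metabolite.
raw = [
    [9000.0, 3000.0, 1.0],
    [11000.0, 2500.0, 3.0],
    [10000.0, 3500.0, 2.0],
    [12000.0, 2000.0, 4.0],
]
scaled = autoscale(raw)

print("raw shares:   ", variance_share(raw))
print("scaled shares:", variance_share(scaled))
```

In the raw table the high-abundance variable carries most of the variance, while after autoscaling every variable has unit variance and therefore an equal share. That is the "more balanced contribution" seen in the scores plot, and it also means noisy low-abundance variables get the same weight as everything else.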

This entry was posted on December 16, 2012 by dgrapov. It was filed under Uncategorized and was tagged with autoscaling, clustering, imDEV, metabolomics, normalizations, outliers, PCA, PLS-DA.
