When you want to get to know and love your data

Archive for July, 2013

Orthogonal Signal Correction Partial Least Squares (O-PLS) in R

o-pls-da

I often need to analyze and model very wide data (variables >>>samples), and because of this I gravitate to robust yet relatively simple methods. In my opinion partial least squares (PLS) is a particular useful algorithm. Simply put, PLS is an extension of principal components analysis (PCA), a non-supervised  method to maximizing  variance explained in X, which instead maximizes the covariance between X and Y(s). Orthogonal signal correction partial least squares (O-PLS) is a variant of PLS which uses orthogonal signal correction to maximize the explained covariance between X and Y on the first latent variable, and components >1 capture variance in X which is orthogonal (or unrelated) to Y.

Because R does not have a simple interface for O-PLS, I am in the process of writing a package, which depends on the existing package pls.

Today I wanted to make a small example of conducting O-PLS in R, and  at the same time take a moment to try out the R package knitr and RStudio for markdown generation.

You can take a look at the O-PLS/O-PLS-DA tutorials.

I was extremely impressed with ease of using knitr and generating markdown from code using RStudio. A big thank you to Yihui Xie and the RStudio developers (Joe Cheng). This is an amazing capability which I will make much more use of in the future!

Advertisements

Interactive Heatmaps (and Dendrograms) – A Shiny App

Heatmaps are a great way to visualize data matrices. Heatmap color and organization can be used to  encode information about the data and metadata to help learn about the data at hand. An example of this could be looking at the raw data  or hierarchically clustering samples and variables based on their similarity or differences. There are a variety packages and functions in R for creating heatmaps,  including  heatmap.2. I find pheatmap particularly useful for the relative ease in annotating the top of the heat map using an arbitrary number of items (the legend needs to be controlled for best effect, not implemented).  

Heatmaps are also fun to use to interact with data!

Here is an example of a Heatmap and Dendrogram Visualizer built using the Shiny framework (and link to the code).

To run locally use the following code.

install.packages("shiny")
library(shiny)
runGitHub("Devium", username = "dgrapov",ref = "master", subdir = "Shiny/Heatmap", port = 8100)

It was interesting to debug this app using the variety of data sets available in the R  datasets package (limiting  options to data.frames).

clustering

My goals were to make an interface to:

  • transform data and visualize using Z-scales, spearman, pearson and biweight correlations
  • rotate the data (transpose dimensions) to view  row or column space separately

data

  • visualize data/relationships presented as heatmaps or dendrograms

dendrogram

  • use hierarchical clustering to organize  data

ww

  • add a top panel of annotation to display variables independent of the internal heatmap scales

wwww

  • use slider to visually select number(s) of sample or variable clusters (dendrogram cut height)

www

There are a few other options like changing heatmap color scales, adding borders or names that you can experiment with. I’ve preloaded many famous data sets found in the R data sets package a few of my favorites are iris and mtcars. There are other datsets some of which were useful for incorporating into the build to facilitate debugging and testing. The aspect of  dimension switching was probably the most difficult to keep straight (never mind legends, these may be hardest of all). What are left are informative (I hope) errors, usually coming from stats and data dimension mismatches. Try taking a look at the data structure on the “Data” tab or switching UI options for: Data, Dimension or Transformation until issues resolve. A final note before mentioning a few points about working with Shiny, missing data is set to zero and factors are omitted when making the internal heatmap but allowed in top row annotations.

Building with R and Shiny

This was my third try at building web/R/applications using Shiny.

Here are some other examples:

Basic plotting with ggplot2

Principal Components Analysis ( I suggest loading a simple .csv with headers)

It has definitely gotten easier building UIs and deploying them to the web using the excellent Rstudio and Shiny tools. Unfortunately this leaves me more time to be confused by “server side” issues.

My over all thoughts (so far) :

  • I have a lot to learn and the possibilities are immense
  • when things work as expected it is a stupendous joy! (thank you to Shiny, R and everyone who helped!)
  • when tracking down unexpected behavior I found it helpful to print app state at different levels to the browser using some simple mechanism like for instance:
#partial example

server.R
####
#create reactive objects to to "listen" for changes in states or R objects of interests
ui.opts$data<-reactive({
 tmp.data<-get(input$data)
 ui.opts$data<-tmp.data # either or
 tmp.data # may not be necessary
 })

#prepare to print objects/info to browser
output$any.name <- renderPrint({
tmp<-list()
tmp$data<-ui.opts$data()
tmp$match.dim<-match.dim
tmp$class.factor<-class.factor
tmp$dimnames<-dimnames(tmp.data)
str(tmp)
})

ui.r
####
#show/print info
mainPanel(
 verbatimTextOutput("any.name")

)

Over all two thumbs up.