When you want to get to know and love your data

Archive for July, 2012

Visualizing the Iris Data

I’ve been working on additional scatter plot matrix plotting capabilities for the imCorrelations module.

Here is a little preview of a modified version of the gpairs function from the YaleToolkit R package, used here to visualize the Iris data set. This scatterplot matrix allows for many interesting combinations of plots, which can be colored according to one or more categorical variables.

The upper and lower matrix triangles can be modified with a variety of inputs:

  • scatterplots: points, best-fit line, loess, Q-Q plot for linear model residuals, best-fit line confidence interval, correlation statistics
  • conditional plots: boxplot, stripplot, barcode

    Scatterplot matrix for overview of correlations and regressions, displaying box plots for Iris data species, variable histograms, correlation statistics, stripcharts and best fit lines.

The layout can easily be modified to give a rapid overview of dependencies among variables.

Displaying Iris data, confidence intervals for best fit lines, residual quantile-quantile plots and variable barcode plots.
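
For readers who want to try something similar before the modified function is available, here is a rough sketch using only base R's pairs() function. It is not the modified gpairs code described above, just an approximation under my own assumptions: the helper names panel_cor and panel_fit and the color choices are made up for illustration, and lowess() stands in for a loess smoother.

```r
# Base-R sketch only; this is NOT the modified gpairs function from the post.
data(iris)

# One color per species (the palette is an arbitrary choice).
species_cols <- c("tomato", "seagreen", "steelblue")[as.integer(iris$Species)]

# Upper-triangle panel (hypothetical helper): print the Pearson correlation.
panel_cor <- function(x, y, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))          # restore the panel's coordinate system
  par(usr = c(0, 1, 0, 1))         # unit square so the text can be centered
  text(0.5, 0.5, sprintf("r = %.2f", cor(x, y)))
}

# Lower-triangle panel (hypothetical helper): points, least-squares line,
# and a lowess smoother.
panel_fit <- function(x, y, col = par("col"), ...) {
  points(x, y, col = col, ...)
  abline(lm(y ~ x), lty = 2)
  lines(lowess(x, y), col = "black")
}

# Scatterplot matrix of the four numeric Iris variables, colored by species.
pairs(iris[, 1:4], col = species_cols, pch = 19,
      upper.panel = panel_cor, lower.panel = panel_fit)
```

Swapping the two panel arguments, or writing another panel function, changes what each triangle of the matrix displays. Conditional plots such as box plots by species are not covered by this sketch.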

I’ve always been fascinated by the use of networks to represent relationships among data objects. Recently I’ve been experimenting with using networks to represent ideas, concepts and knowledge models. FreeMind and Cmap Tools are two freely available and easy-to-use tools for doing this.

FreeMind only supports hierarchical layouts, but it is very easy to use and makes beautiful representations of complex ideas.

Cmap Tools is an incredibly expressive tool for making concept maps and knowledge models. I just started using it, but already the possibilities for content types and linked files are very impressive.


This is a blog to document imDEV development and interesting things related to data analysis and visualization.