Correlation plot matrices using the ellipse library


My new favorite package is ellipse. It includes functions for drawing ellipses from various objects, among them plotcorr(), which draws a correlation matrix in which each correlation is represented by an ellipse approximating the shape of a bivariate normal distribution with the same correlation. While the function itself works well, I wanted a bit more redundancy in my plots, so I modified the code. I kept (most of) the main features of the original and added a few: the ability to plot ellipses and correlation values on the same plot, control over what is placed along the diagonal, and control over the rounding of the plotted numbers. Here is an example with some color manipulations. The colors represent the strength and direction of the correlation, from -1 through 0 to 1, on a University of Rochester approved scale running from red through white to blue.
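
To make the shape idea concrete, here is a minimal base R sketch of an outline for a given correlation r. It assumes the parametrisation x = cos(a + d/2), y = cos(a - d/2) with d = acos(r), which satisfies x^2 - 2rxy + y^2 = 1 - r^2, a density contour of a unit-variance bivariate normal with correlation r. The helper name corr_outline is made up for this illustration; it is not part of the ellipse package, whose own implementation may differ.

```r
# Hypothetical helper (not from the ellipse package): outline points of a
# unit-variance bivariate normal contour with correlation r.
corr_outline <- function(r, npoints = 100) {
  stopifnot(is.numeric(r), length(r) == 1, abs(r) <= 1)
  d <- acos(r)                                # angle offset that encodes the correlation
  a <- seq(0, 2 * pi, length.out = npoints)   # parameter running once around the outline
  cbind(x = cos(a + d / 2), y = cos(a - d / 2))
}

# r = 0 gives a circle; as |r| grows the outline narrows toward a diagonal line
out <- corr_outline(0.8)
plot(out, type = "l", asp = 1, xlab = "", ylab = "")
```

Negative correlations tilt the outline the other way, which is what makes the direction readable at a glance in the matrix plot.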

First the function code:

my.plotcorr <- function (corr, outline = FALSE, col = "grey",
                         upper.panel = c("ellipse", "number", "none"),
                         lower.panel = c("ellipse", "number", "none"),
                         diag = c("none", "ellipse", "number"),
                         digits = 2, bty = "n", axes = FALSE, xlab = "", ylab = "",
                         asp = 1, cex.lab = par("cex.lab"), cex = 0.75 * par("cex"),
                         mar = 0.1 + c(2, 2, 4, 2), ...)
{
# this is a modified version of the plotcorr function from the ellipse package
# this can print numbers and ellipses on the same plot; upper.panel and lower.panel control what is displayed in each half
# diag now specifies what to put in the diagonal (numbers, ellipses, nothing)
# digits specifies the number of digits after the decimal point to round to
# unlike the original, this function will always print x_i by x_i correlation rather than being able to drop it
# modified by Esteban Buz
  if (!require('ellipse', quietly = TRUE, character.only = TRUE)) {
    stop("Need the ellipse library")
  }
  savepar <- par(pty = "s", mar = mar)
  on.exit(par(savepar))
  if (is.null(corr))
    return(invisible())
  if ((!is.matrix(corr)) || (round(min(corr, na.rm = TRUE), 6) < -1) || (round(max(corr, na.rm = TRUE), 6) > 1))
    stop("Need a correlation matrix")
  plot.new()
  par(new = TRUE)
  rowdim <- dim(corr)[1]
  coldim <- dim(corr)[2]
  rowlabs <- dimnames(corr)[[1]]
  collabs <- dimnames(corr)[[2]]
  if (is.null(rowlabs))
    rowlabs <- 1:rowdim
  if (is.null(collabs))
    collabs <- 1:coldim
  rowlabs <- as.character(rowlabs)
  collabs <- as.character(collabs)
  col <- rep(col, length = length(corr))
  dim(col) <- dim(corr)
  upper.panel <- match.arg(upper.panel)
  lower.panel <- match.arg(lower.panel)
  diag <- match.arg(diag)
  cols <- 1:coldim
  rows <- 1:rowdim
  maxdim <- max(length(rows), length(cols))
  plt <- par("plt")
  xlabwidth <- max(strwidth(rowlabs[rows], units = "figure", cex = cex.lab))/(plt[2] - plt[1])
  xlabwidth <- xlabwidth * maxdim/(1 - xlabwidth)
  ylabwidth <- max(strwidth(collabs[cols], units = "figure", cex = cex.lab))/(plt[4] - plt[3])
  ylabwidth <- ylabwidth * maxdim/(1 - ylabwidth)
  plot(c(-xlabwidth - 0.5, maxdim + 0.5), c(0.5, maxdim + 1 + ylabwidth), type = "n", bty = bty, axes = axes, xlab = "", ylab = "", asp = asp, cex.lab = cex.lab, ...)
  text(rep(0, length(rows)), length(rows):1, labels = rowlabs[rows], adj = 1, cex = cex.lab)
  text(cols, rep(length(rows) + 1, length(cols)), labels = collabs[cols], srt = 90, adj = 0, cex = cex.lab)
  mtext(xlab, 1, 0)
  mtext(ylab, 2, 0)
  mat <- diag(c(1, 1))
  plotcorrInternal <- function() {
    if (i == j){ #diag behavior
      if (diag == 'none'){
        return()
      } else if (diag == 'number'){
        text(j + 0.3, length(rows) + 1 - i, round(corr[i, j], digits=digits), adj = 1, cex = cex)
      } else if (diag == 'ellipse') {
        mat[1, 2] <- corr[i, j]
        mat[2, 1] <- mat[1, 2]
        ell <- ellipse::ellipse(mat, t = 0.43) # namespace-qualified so another package's ellipse() cannot mask it
        ell[, 1] <- ell[, 1] + j
        ell[, 2] <- ell[, 2] + length(rows) + 1 - i
        polygon(ell, col = col[i, j])
        if (outline)
          lines(ell)
      }
    } else if (i >= j){ #lower half of plot
      if (lower.panel == 'ellipse') { #check if ellipses should go here
        mat[1, 2] <- corr[i, j]
        mat[2, 1] <- mat[1, 2]
        ell <- ellipse::ellipse(mat, t = 0.43) # namespace-qualified so another package's ellipse() cannot mask it
        ell[, 1] <- ell[, 1] + j
        ell[, 2] <- ell[, 2] + length(rows) + 1 - i
        polygon(ell, col = col[i, j])
        if (outline)
          lines(ell)
      } else if (lower.panel == 'number') { #check if numbers should go here
        text(j + 0.3, length(rows) + 1 - i, round(corr[i, j], digits=digits), adj = 1, cex = cex)
      } else {
        return()
      }
    } else { #upper half of plot
      if (upper.panel == 'ellipse') { #check if ellipses should go here
        mat[1, 2] <- corr[i, j]
        mat[2, 1] <- mat[1, 2]
        ell <- ellipse::ellipse(mat, t = 0.43) # namespace-qualified so another package's ellipse() cannot mask it
        ell[, 1] <- ell[, 1] + j
        ell[, 2] <- ell[, 2] + length(rows) + 1 - i
        polygon(ell, col = col[i, j])
        if (outline)
          lines(ell)
      } else if (upper.panel == 'number') { #check if numbers should go here
        text(j + 0.3, length(rows) + 1 - i, round(corr[i, j], digits=digits), adj = 1, cex = cex)
      } else {
        return()
      }
    }
  }
  for (i in 1:dim(corr)[1]) {
    for (j in 1:dim(corr)[2]) {
      plotcorrInternal()
    }
  }
  invisible()
}
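
The upper.panel, lower.panel and diag arguments lean on match.arg(): when an argument is left at its default vector of choices, the first choice is used, and unambiguous abbreviations are accepted. A small sketch with a made-up function, panel_choice (an illustration only, not part of my.plotcorr), shows the behavior:

```r
# Hypothetical function illustrating how match.arg() resolves a choice argument
panel_choice <- function(panel = c("ellipse", "number", "none")) {
  match.arg(panel)
}

panel_choice()          # argument left at its default: the first choice is used
panel_choice("number")  # exact match
panel_choice("num")     # unambiguous abbreviation of "number"
# panel_choice("n") would stop with an error: "n" matches both "number" and "none"
```

This is why a bare my.plotcorr(corr) draws ellipses in both halves and leaves the diagonal empty.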

And now a short walkthrough:

#usage of my.plotcorr
#much like the my.plotcorr function, this is modified from the plotcorr documentation
#this function requires the ellipse package; once it is installed you don't need to load it yourself, since the function loads it
#install.packages(c('ellipse'))
#library(ellipse)
source('my.plotcorr.R')
# Get some data
data(mtcars)
# Get the correlation matrix
corr.mtcars <- cor(mtcars)
# Change the column and row names for clarity
colnames(corr.mtcars) = c('Miles/gallon', 'Number of cylinders', 'Displacement', 'Horsepower', 'Rear axle ratio', 'Weight', '1/4 mile time', 'V/S', 'Transmission type', 'Number of gears', 'Number of carburetors')
rownames(corr.mtcars) = colnames(corr.mtcars)

# Standard plot, all ellipses are grey, nothing is put in the diagonal
my.plotcorr(corr.mtcars)

# Here we play around with the colors; colors are taken from a vector and recycled if there are too few
# Thus to map correlations to colors we need to make a vector of suitable colors
# To start, pick the end (and mid) points of a scale, here red to white to blue for negative to no to positive correlation
colsc=c(rgb(241, 54, 23, maxColorValue=255), 'white', rgb(0, 61, 104, maxColorValue=255))

# Build a ramp function to interpolate along the scale. I've opted for Lab interpolation rather than the default rgb; see ?colorRampPalette for the differences
colramp = colorRampPalette(colsc, space='Lab')

# I'll show two types of color styles using this color ramp
# the first
# Use as many colors along the scale as there are variables (11 for mtcars)
colors = colramp(length(corr.mtcars[1,]))

# then plot an example with only ellipses, without a diagonal and with a main title
# the col argument turns each correlation into an index into colors: 5 * r + 6 sends [-1, 1] to [1, 11], giving one color per cell of the matrix
# in case you are confused, R allows vector indexing with non-integers by truncating them, e.g. colors[1.8] == colors[1]
my.plotcorr(corr.mtcars, col=colors[5*corr.mtcars + 6], main='Predictor correlations')

# the second form
# we could, alternatively, make a scale with 100 points
colors = colramp(100)
# then pick colors along this 100 point scale given the correlation value * 100 rounded down to the nearest integer
# to do that we need to move the correlation range from [-1, 1] to [0, 100]
# now plot again with ellipses along the diagonal
my.plotcorr(corr.mtcars, col=colors[((corr.mtcars + 1)/2) * 100], diag='ellipse', main='Predictor correlations')

# or, add numbers to the bottom of the chart
my.plotcorr(corr.mtcars, col=colors[((corr.mtcars + 1)/2) * 100], diag='ellipse', lower.panel="number", main='Predictor correlations')

# or, switch the numbers and ellipses and reduce the margins
my.plotcorr(corr.mtcars, col=colors[((corr.mtcars + 1)/2) * 100], diag='ellipse', upper.panel="number", mar=c(0,2,0,0), main='Predictor correlations')

# or, drop the diagonal and numbers
my.plotcorr(corr.mtcars, col=colors[((corr.mtcars + 1)/2) * 100], upper.panel="none", mar=c(0,2,0,0), main='Predictor correlations')
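
One caveat with the index arithmetic above: ((corr + 1)/2) * 100 evaluates to 0 for a correlation of exactly -1, and R silently drops zero indices, so the color vector comes out shorter than the matrix and is then recycled out of alignment. That does not happen with mtcars, but a sketch of a more defensive mapping follows. The helper name corr_to_index is hypothetical, and rounding to the nearest color (rather than truncating) is a choice made for this sketch, not what the calls above do.

```r
# Hypothetical helper: map correlations in [-1, 1] to indices 1..n_colors
corr_to_index <- function(corr, n_colors) {
  idx <- round((corr + 1) / 2 * (n_colors - 1)) + 1
  idx[] <- pmin(pmax(idx, 1), n_colors)   # guard against floating point overshoot
  idx
}

# Same red-white-blue ramp as in the walkthrough
colsc   <- c(rgb(241, 54, 23, maxColorValue = 255), "white",
             rgb(0, 61, 104, maxColorValue = 255))
colramp <- colorRampPalette(colsc, space = "Lab")
colors  <- colramp(100)

corr.mtcars <- cor(mtcars)
cols <- colors[corr_to_index(corr.mtcars, length(colors))]
length(cols) == length(corr.mtcars)   # one color per cell, none dropped
```

The resulting vector can be passed as col = cols in any of the my.plotcorr() calls above.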

16 thoughts on “Correlation plot matrices using the ellipse library”

    tiflo said:
    March 26, 2012 at 7:29 pm

    Nice!

    Mayte said:
    April 11, 2013 at 6:25 am

    awesome! perfect visualization :-)

    Roger said:
    May 22, 2013 at 8:26 am

    Very nice plot,
    but when I tried to use my own data set, R showed
    > data(XXX)
    Warning message:
    In data(XXX) : data set ‘XXX’ not found
    I am pretty sure I have imported my data set.
    Would you mind helping me sort it out?

      tiflo said:
      May 22, 2013 at 2:22 pm

      Hi Roger,

      the only call to data is “data(mtcars)”. mtcars is a data set that comes with R. I just tried that command in R 3.0 and it works. So, I am not sure what causes the problem. Or did you type “data(XXX)”? That would not work, even if you have created a data set called XXX. The command “data()” loads a data set that comes with a library in R. If you have already loaded your own data set, just remove that command from the code.

      hth,
      Florian

        Roger said:
        May 23, 2013 at 4:25 am

        Sorry, I probably confused you.
        load(XXX): XXX stands for the name of my own data set. I know mtcars is built into R, so I changed the name.
        I mean I’d like to import my own data and make a beautiful plot like yours,
        but it showed:
        > data(XXX)
        Warning message:
        In data(XXX) : data set ‘XXX’ not found

        so I was just wondering, did I miss anything?
        Cheers,

          exbuz responded:
          May 23, 2013 at 10:10 am

          Hi Roger,

          As Florian mentioned, the data() function is just to load an example data set that comes with an R library—not your own data. I am not sure what ‘XXX’ stands for in your question. If you have a saved dataset (in an .RData or similar R native file) you’ll need to use the load() function on that file. If your data is in a raw text format you should load it using the read.table() function. As an example, if you have your data in a tab delimited file named ‘MyData.tab’ you can replace the data() line with something like this:

          my.data = read.table(file="path/to/MyData.tab", sep="\t") #be sure to check for row and column name issues
          #and continue in a similar way through the rest of the code.
          corr.my.data = cor(my.data)
          #be sure to change any variable names and other specifics for your data in the rest of the code

          -Esteban

    Jirka Spilka said:
    July 22, 2013 at 2:05 am

    Thanks for the nice code!

    lemma said:
    November 7, 2013 at 4:49 am

    The numerical values are not displayed when I run the code. Any help?

    Thanks
    lemma

      exbuz responded:
      November 7, 2013 at 11:19 am

      Hi Lemma,

      I’m not sure what could be the issue. The numbers should be displayed if you specify that you want numbers in the lower left or upper right half of the plot. For example, using the mtcars data above you can put numbers in the upper right half like this:
      my.plotcorr(corr.mtcars, upper.panel="number")

    beckmw said:
    December 3, 2013 at 12:13 pm

    This is a very nice modification of the plotcorr function, thanks! Might I suggest two things: 1) Perhaps include an option to plot histograms for single variables along the diagonal, and 2) options for significance stars on correlation values. Just suggestions…

      tiflo said:
      December 3, 2013 at 5:51 pm

      Thanks for the suggestions. Do you have any specific code in mind that you’d be willing to share?

    Soder said:
    January 29, 2014 at 6:45 am

    Hello Everyone.

    Sorry if this is the wrong place for this question, but I’m very new to R.
    How can I make this function run in my program? Do I have to install it somehow, or just paste the function code and everything else into my script file and run it?

    Do I have to install this function somewhere in the ellipse library?

    Thanks for your help (maybe a link would be helpful :-/ )

      tiflo said:
      January 29, 2014 at 4:48 pm

      You just paste the function into your script window, read it in (or ‘source’ it) and then you should be able to call it. You can also set up R so that it sources a specific script file every time it starts.

      HTH

        Soder said:
        January 30, 2014 at 4:21 am

        Well, that is what I’ve tried so far.
        I took the whole code (the full function code, not just the one line where the function is defined), pasted it into my script window and read it in.

        But then, when it comes to the “application”, when I type in source('my.plotcorr.R'), it says: cannot open connection. The ellipse library is installed of course.

        Thanks for the help :)

          exbuz responded:
          January 30, 2014 at 8:30 am

          I’m not quite sure what you’re doing, but I’d suggest that you copy and paste the function code from the post into a file called 'my.plotcorr.R' and save it somewhere. Then, in a separate script file where you want to generate plots, source that file (i.e. with source('my.plotcorr.R')). Make sure you also give the source function the right path to that file, otherwise R will give you the error “cannot open connection”. Alternatively, set R’s working directory to the directory that contains that file. I keep all my helper scripts in the same place on my computer so it’s easy to source whatever I need from any other script on my computer.

    YB said:
    March 31, 2014 at 8:42 pm

    Hi, I’ve been using the great modified function for a while and everything was working until today.
    I’m getting the following error: “Error in ellipse(mat, t = 0.43) : center must be a vector of length 2”
    When I use the original ellipse package, it’s fine, but the modified version is giving me the error. Any tips?

    Thanks, YB
