### Correlation plot matrices using the ellipse library

My new favorite library is the ellipse library. It includes functions for creating ellipses from various objects. One of them, plotcorr(), draws a correlation matrix in which each correlation is represented by an ellipse approximating the shape of a bivariate normal distribution with the same correlation. While the function itself works well, I wanted a bit more redundancy in my plots, so I modified the code. I kept (most of) the main features of the original function and added a few of my own:

- the ability to plot ellipses and correlation values on the same plot;
- control over what is placed along the diagonal;
- control over how the plotted numbers are rounded.

Here is an example with some color manipulations. The colors represent the strength and direction of the correlation, from -1 through 0 to 1, using University of Rochester approved red to white to blue.
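Each ellipse is a contour of a bivariate normal density. For unit variances and correlation r, the points on the contour satisfy x² − 2rxy + y² = t²(1 − r²). This means every point sits at the same Mahalanobis distance t from the centre. The base R sketch below traces such a contour with the cos(a ± d/2) parameterization, where d = acos(r). As far as I know this matches how ellipse() treats a 2×2 correlation matrix, but `corr.ellipse()` is a hypothetical helper for illustration and is not part of the package. The default t = 0.43 matches the value used in the function code.

```r
# corr.ellipse() is a hypothetical helper, NOT part of the ellipse package.
# It traces the contour x^2 - 2*r*x*y + y^2 = t^2 * (1 - r^2) of a bivariate
# normal with unit variances and correlation r, centred at the origin.
corr.ellipse <- function(r, t = 0.43, npoints = 100) {
  a <- seq(0, 2 * pi, length.out = npoints)  # angle parameter
  d <- acos(r)                               # r = 1 gives d = 0, a flat line
  cbind(x = t * cos(a + d / 2), y = t * cos(a - d / 2))
}

r <- 0.7
ell <- corr.ellipse(r)

# Check that every traced point lies on the contour
lhs <- ell[, "x"]^2 - 2 * r * ell[, "x"] * ell[, "y"] + ell[, "y"]^2
print(all.equal(lhs, rep(0.43^2 * (1 - r^2), nrow(ell))))
```

To place the ellipse in cell (i, j) the way the function code does, shift the x column by j and the y column by n + 1 − i, then pass the matrix to polygon(). Alternatively, use plot(ell, type = "l", asp = 1) to look at it on its own.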

First, the function code:

```r
# my.plotcorr: a modified version of the plotcorr function from the ellipse package
# - numbers and ellipses can appear on the same plot; upper.panel and lower.panel
#   change what is displayed in each half ("ellipse", "number" or "none")
# - diag now specifies what to put in the diagonal (numbers, ellipses, nothing)
# - digits specifies the number of digits after the decimal point to round to
# - unlike the original, this function always prints the x_i by x_i correlation
#   rather than being able to drop it
# - ellipse::ellipse() is called explicitly so another package's ellipse()
#   (e.g. the one in car) cannot mask it
# modified by Esteban Buz
my.plotcorr <- function(corr, outline = FALSE, col = "grey",
                        upper.panel = c("ellipse", "number", "none"),
                        lower.panel = c("ellipse", "number", "none"),
                        diag = c("none", "ellipse", "number"), digits = 2,
                        bty = "n", axes = FALSE, xlab = "", ylab = "", asp = 1,
                        cex.lab = par("cex.lab"), cex = 0.75 * par("cex"),
                        mar = 0.1 + c(2, 2, 4, 2), ...) {
  if (!requireNamespace("ellipse", quietly = TRUE)) {
    stop("Need the ellipse library")
  }
  savepar <- par(pty = "s", mar = mar)
  on.exit(par(savepar))
  if (is.null(corr)) return(invisible())
  if ((!is.matrix(corr)) || (round(min(corr, na.rm = TRUE), 6) < -1) ||
      (round(max(corr, na.rm = TRUE), 6) > 1))
    stop("Need a correlation matrix")
  plot.new()
  par(new = TRUE)
  rowdim <- dim(corr)[1]
  coldim <- dim(corr)[2]
  rowlabs <- dimnames(corr)[[1]]
  collabs <- dimnames(corr)[[2]]
  if (is.null(rowlabs)) rowlabs <- 1:rowdim
  if (is.null(collabs)) collabs <- 1:coldim
  rowlabs <- as.character(rowlabs)
  collabs <- as.character(collabs)
  # recycle the colors into a matrix holding one color per cell
  col <- rep(col, length.out = length(corr))
  dim(col) <- dim(corr)
  upper.panel <- match.arg(upper.panel)
  lower.panel <- match.arg(lower.panel)
  diag <- match.arg(diag)
  cols <- 1:coldim
  rows <- 1:rowdim
  maxdim <- max(length(rows), length(cols))
  # leave room for the row and column labels
  plt <- par("plt")
  xlabwidth <- max(strwidth(rowlabs[rows], units = "figure", cex = cex.lab))/(plt[2] - plt[1])
  xlabwidth <- xlabwidth * maxdim/(1 - xlabwidth)
  ylabwidth <- max(strwidth(collabs[cols], units = "figure", cex = cex.lab))/(plt[4] - plt[3])
  ylabwidth <- ylabwidth * maxdim/(1 - ylabwidth)
  plot(c(-xlabwidth - 0.5, maxdim + 0.5), c(0.5, maxdim + 1 + ylabwidth),
       type = "n", bty = bty, axes = axes, xlab = "", ylab = "", asp = asp,
       cex.lab = cex.lab, ...)
  text(rep(0, length(rows)), length(rows):1, labels = rowlabs[rows], adj = 1, cex = cex.lab)
  text(cols, rep(length(rows) + 1, length(cols)), labels = collabs[cols], srt = 90, adj = 0, cex = cex.lab)
  mtext(xlab, 1, 0)
  mtext(ylab, 2, 0)
  mat <- base::diag(2)  # 2x2 identity; base:: because 'diag' now holds a string
  # draw a single cell; type is "ellipse", "number" or "none"
  plotcorrInternal <- function(i, j, type) {
    if (type == "ellipse") {
      mat[1, 2] <- corr[i, j]
      mat[2, 1] <- mat[1, 2]
      ell <- ellipse::ellipse(mat, t = 0.43)
      ell[, 1] <- ell[, 1] + j
      ell[, 2] <- ell[, 2] + length(rows) + 1 - i
      polygon(ell, col = col[i, j])
      if (outline) lines(ell)
    } else if (type == "number") {
      text(j + 0.3, length(rows) + 1 - i, round(corr[i, j], digits = digits),
           adj = 1, cex = cex)
    }  # "none": draw nothing
  }
  for (i in rows) {
    for (j in cols) {
      # diagonal, lower half of plot (i > j) or upper half of plot (i < j)
      type <- if (i == j) diag else if (i > j) lower.panel else upper.panel
      plotcorrInternal(i, j, type)
    }
  }
  invisible()
}
```
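The function decides what goes into each cell by comparing the row index i with the column index j:

- i == j is the diagonal;
- i > j is the lower-left half;
- i < j is the upper-right half.

`panel.layout()` below is a hypothetical helper that reproduces this dispatch as a character matrix. It does no drawing and is not part of the function above. It is a quick way to check which argument combination gives the layout you want before plotting.

```r
# panel.layout() is a hypothetical helper: it mirrors the i == j / i > j / i < j
# dispatch in my.plotcorr and returns the panel type chosen for each cell.
panel.layout <- function(n,
                         upper.panel = c("ellipse", "number", "none"),
                         lower.panel = c("ellipse", "number", "none"),
                         diag = c("none", "ellipse", "number")) {
  upper.panel <- match.arg(upper.panel)
  lower.panel <- match.arg(lower.panel)
  diag <- match.arg(diag)
  m <- matrix(upper.panel, n, n)       # start with the upper-right choice everywhere
  m[row(m) > col(m)] <- lower.panel    # cells below the diagonal: lower-left half
  m[row(m) == col(m)] <- diag          # the diagonal itself
  m
}

panel.layout(3, upper.panel = "number", diag = "ellipse")
```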

And now a short walkthrough:

```r
# usage of my.plotcorr
# much like the my.plotcorr function, this is modified from the plotcorr documentation
# the function requires the ellipse library, but once it is installed you don't need to load it yourself; the function loads it
#install.packages(c('ellipse'))
#library(ellipse)
source('my.plotcorr.R')

# Get some data
data(mtcars)
# Get the correlation matrix
corr.mtcars <- cor(mtcars)
# Change the column and row names for clarity
colnames(corr.mtcars) = c('Miles/gallon', 'Number of cylinders', 'Displacement', 'Horsepower', 'Rear axle ratio', 'Weight', '1/4 mile time', 'V/S', 'Transmission type', 'Number of gears', 'Number of carburetors')
rownames(corr.mtcars) = colnames(corr.mtcars)

# Standard plot: all ellipses are grey, nothing is put in the diagonal
my.plotcorr(corr.mtcars)

# Here we play around with the colors; colors are taken from a vector and recycled,
# so to map correlations to colors we need to make a vector of suitable colors
# To start, pick the end (and mid) points of a scale, here red to white to blue for negative to no to positive correlation
colsc = c(rgb(241, 54, 23, maxColorValue=255), 'white', rgb(0, 61, 104, maxColorValue=255))
# Build a ramp function to interpolate along the scale; I've opted for Lab interpolation
# rather than the default rgb (see the colorRamp documentation for the differences)
colramp = colorRampPalette(colsc, space='Lab')
# I'll show two color styles using this color ramp

# The first style:
# use as many colors along the scale as there are variables (11 for mtcars)
colors = colramp(length(corr.mtcars[1,]))
# then plot an example with only ellipses, without a diagonal and with a main title
# 5*corr + 6 turns correlations in [-1, 1] into indices in [1, 11], one per color
# in case you are confused, R allows vector indexing with non-integers by truncating toward zero, i.e. colors[1.8] == colors[1]
my.plotcorr(corr.mtcars, col=colors[5*corr.mtcars + 6], main='Predictor correlations')

# The second style:
# alternatively, make a scale with 100 points
colors = colramp(100)
# then pick colors along this 100-point scale by moving the correlation range from [-1, 1] to [1, 100]
# (an index below 1 would select no color at all, so the scale has to start at 1)
# now plot again with ellipses along the diagonal
my.plotcorr(corr.mtcars, col=colors[((corr.mtcars + 1)/2) * 99 + 1], diag='ellipse', main='Predictor correlations')
# or, add numbers to the bottom of the chart
my.plotcorr(corr.mtcars, col=colors[((corr.mtcars + 1)/2) * 99 + 1], diag='ellipse', lower.panel="number", main='Predictor correlations')
# or, switch the numbers and ellipses and reduce the margins
my.plotcorr(corr.mtcars, col=colors[((corr.mtcars + 1)/2) * 99 + 1], diag='ellipse', upper.panel="number", mar=c(0,2,0,0), main='Predictor correlations')
# or, drop the diagonal and numbers
my.plotcorr(corr.mtcars, col=colors[((corr.mtcars + 1)/2) * 99 + 1], upper.panel="none", mar=c(0,2,0,0), main='Predictor correlations')
```
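Both color styles rely on R truncating fractional indices toward zero, and on none of those indices falling below 1. The sketch below checks the mappings on a handful of made-up correlations, which are illustrative values and not taken from mtcars.

- Scaling a 100-color ramp by 100 sends a correlation of exactly -1, or anything below roughly -0.98, to index 0.
- R silently drops index 0, so the color vector comes back shorter than the correlation matrix.
- The recycled colors then no longer line up with their cells.
- Scaling by 99 and adding 1 keeps every index between 1 and 100.

With mtcars this never shows up, since its strongest negative correlation is about -0.87.

```r
colsc <- c(rgb(241, 54, 23, maxColorValue = 255), "white",
           rgb(0, 61, 104, maxColorValue = 255))
colramp <- colorRampPalette(colsc, space = "Lab")
colors <- colramp(100)

r <- c(-1, -0.995, 0, 0.5, 1)  # illustrative correlations, not from mtcars

# Mapping [-1, 1] onto [0, 100]: -1 and -0.995 give indices 0 and 0.25,
# both truncate to 0, and R drops them without an error
idx.100 <- ((r + 1) / 2) * 100
length(colors[idx.100])   # 3, not 5

# Mapping [-1, 1] onto [1, 100] keeps one color per correlation
idx.safe <- ((r + 1) / 2) * 99 + 1
length(colors[idx.safe])  # 5

# The first style (11 colors, 5*r + 6) already starts at index 1
length(colramp(11)[5 * r + 6])  # 5
```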

March 26, 2012 at 7:29 pm

Nice!

April 11, 2013 at 6:25 am

awesome! perfect visualization :-)

May 22, 2013 at 8:26 am

Very nice plot,

but when I tried to use my own data set, R showed:

> data(XXX)

Warning message:

In data(XXX) : data set ‘XXX’ not found

I am pretty sure I have imported my dataset.

Would you mind helping me to sort it out?

May 22, 2013 at 2:22 pm

Hi Roger,

the only call to data is “data(mtcars)”. mtcars is a data set that comes with R. I just tried that command in R 3.0 and it works, so I am not sure what causes the problem. Or did you type “data(XXX)”? That would not work, even if you have created a data set called XXX. The command “data()” loads a data set that comes with a library in R. If you have already loaded your own data set, just remove that command from the code.

hth,

Florian

May 23, 2013 at 4:25 am

Sorry, I probably confused you.

load(XXX): XXX means my own data set’s name. I know mtcars is built into R, so I changed the name.

I mean, I’d like to import my own data and make a beautiful plot like yours,

but it showed:

> data(XXX)

Warning message:

In data(XXX) : data set ‘XXX’ not found

so I was just wondering: did I miss anything?

Cheers,

May 23, 2013 at 10:10 am

Hi Roger,

As Florian mentioned, the data() function only loads an example data set that comes with an R library, not your own data. I am not sure what ‘XXX’ stands for in your question. If you have a saved dataset (in an .RData or similar R native file), you’ll need to use the load() function on that file. If your data is in a raw text format, you should load it using the read.table() function. For example, if your data is in a tab-delimited file named ‘MyData.tab’, you can replace the data() line with something like this:

```r
my.data = read.table(file="path/to/MyData.tab", sep="\t") #be sure to check for row and column name issues
#and continue in a similar way through the rest of the code.
corr.my.data = cor(my.data)
#be sure to change any variable names and other specifics for your data in the rest of the code
```
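Here is a self-contained sketch of that workflow. A real file isn't available here, so the code first writes a few mtcars columns to a temporary tab-delimited file as a stand-in for ‘MyData.tab’, then reads it back with read.table().

- header = TRUE tells read.table() that the first line holds column names.
- write.table() writes one fewer header field than data fields, so the first column is picked up as row names automatically.
- The last lines show the .RData route with save() and load().

```r
# Stand-in for "path/to/MyData.tab": a few mtcars columns written to a temp file
path <- tempfile(fileext = ".tab")
write.table(mtcars[, c("mpg", "hp", "wt")], file = path, sep = "\t", quote = FALSE)

# Read it back: header = TRUE for the column names; the row names (car models)
# come from the first column because the header line is one field short
my.data <- read.table(file = path, header = TRUE, sep = "\t")
str(my.data)

# cor() needs numeric columns only, so drop ID or text columns first
corr.my.data <- cor(my.data)
round(corr.my.data, 2)

# The .RData route: save() writes R objects, load() restores them by name
rdata.path <- tempfile(fileext = ".RData")
save(my.data, file = rdata.path)
rm(my.data)
load(rdata.path)  # my.data exists again
```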

-Esteban

July 22, 2013 at 2:05 am

Thanks for the nice code!

November 7, 2013 at 4:49 am

The numerical values are not displayed when I run the code. Any help?

Thanks

lemma

November 7, 2013 at 11:19 am

Hi Lemma,

I’m not sure what could be the issue. The numbers should be displayed if you specify that you want numbers in the lower left or upper right half of the plot. For example, using the mtcars data above you can put numbers in the upper right half like this:

`my.plotcorr(corr.mtcars, upper.panel="number")`
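A related detail, in case the numbers do appear but look uneven: each number cell is labelled with round(corr[i, j], digits = digits), and text() draws that value as an ordinary number. As a result, trailing zeros disappear, so 0.50 shows up as 0.5. The snippet shows the difference. It also shows formatC(..., format = "f"), which is one way to get fixed-width labels. Swapping it in for round() is my suggestion, not part of the posted function.

```r
# What the number cells draw: round() returns a number, so trailing zeros vanish
as.character(round(0.5, 2))          # "0.5"
as.character(round(-0.8676594, 2))   # "-0.87"

# Possible tweak (not in the posted function): a fixed number of decimals
formatC(0.5, format = "f", digits = 2)         # "0.50"
formatC(-0.8676594, format = "f", digits = 2)  # "-0.87"
```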

December 3, 2013 at 12:13 pm

This is a very nice modification of the plotcorr function, thanks! Might I suggest two things: 1) Perhaps include an option to plot histograms for single variables along the diagonal, and 2) options for significance stars on correlation values. Just suggestions…

December 3, 2013 at 5:51 pm

Thanks for the suggestions. Do you have any specific code in mind that you’d be willing to share?

January 29, 2014 at 6:45 am

Hello Everyone.

Sorry for posting this question here, but I’m very new to R.

How can I make this function run in my program? Do I have to install it somehow, or just insert the function code and everything into my script file and press the button?

Do I have to install this function somewhere in the ellipse library?

Thanks for your help (maybe a link would be helpful :-/ )

January 29, 2014 at 4:48 pm

You just paste the function into your script window, read it in (or ‘source’ it) and then you should be able to call it. You can also set up R so that it sources a specific script file every time it starts.
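For the start-up route, one common approach is a user-level .Rprofile file in your home directory, which R runs when it starts. The fragment below is a sketch. The path ~/R/helpers/my.plotcorr.R is a made-up location, so point it at wherever you actually saved the file.

```r
# ~/.Rprofile  (run by R at start-up)
# The helper path below is hypothetical: change it to where you saved the file.
local({
  helper <- path.expand("~/R/helpers/my.plotcorr.R")
  if (file.exists(helper)) {
    source(helper)  # source() defines my.plotcorr in the global workspace
  } else {
    message("my.plotcorr.R not found at ", helper)
  }
})
```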

HTH

January 30, 2014 at 4:21 am

Well, that is what I’ve tried so far.

I took the whole code (starting from the function code, not just the one line where the function is defined), pasted it into my script window and read it in.

But then, when it comes to the “application” and I type in source(‘my.plotcorr.R’), it says: Cannot open Connection. The ellipse library is installed, of course.

Ty for help :)

January 30, 2014 at 8:30 am

I’m not quite sure what you’re doing, but I’d suggest that you copy and paste the function code from the post into a file called ‘my.plotcorr.R’ and save it somewhere. Then, in a separate script file where you want to generate plots, source that file (i.e. with source(‘my.plotcorr.R’)). Make sure you give the source function the right path to that file; otherwise R will give you the error “cannot open connection”. Alternatively, set R’s working directory to the folder that contains that file. I keep all my helper scripts in the same place on my computer, so it’s easy to source whatever I need from any other script.
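Both options can be sketched in a few lines. So that the snippet runs anywhere, it writes a one-line stub named my.plotcorr.R to R's temporary directory. The stub is a placeholder; your real file would hold the full function from the post. The snippet then sources the file by full path, and again after changing the working directory. The last line shows that a path which doesn't resolve produces the “cannot open the connection” error.

```r
# Stand-in for your saved helper: a one-line stub in a temp folder
# (the real file would contain the full my.plotcorr function)
helper.dir <- tempdir()
helper.path <- file.path(helper.dir, "my.plotcorr.R")
writeLines("my.plotcorr <- function(corr, ...) invisible(NULL)", helper.path)

# Option 1: give source() the full path to the file
source(helper.path)

# Option 2: make that folder the working directory, then the bare name works
old.wd <- setwd(helper.dir)   # setwd() returns the previous working directory
source("my.plotcorr.R")
setwd(old.wd)                 # put the working directory back

# A path that doesn't resolve fails with "cannot open the connection"
msg <- tryCatch(suppressWarnings(source("no/such/folder/my.plotcorr.R")),
                error = conditionMessage)
print(msg)
```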

March 31, 2014 at 8:42 pm

Hi, I’ve been using the great modified function for a while and everything was working until today.

I’m getting the following error: “Error in ellipse(mat, t = 0.43) : center must be a vector of length 2”

When I use the original ellipse package, it’s fine, but the modified version is giving me the error. Any tips?

Thanks, YB

April 24, 2014 at 3:57 pm

Hi JB, looks like we got the same error. It’s caused by another package with a different ellipse function (probably car, as it was in my case).
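That kind of clash is name masking. When two attached packages export a function with the same name, R calls whichever one comes first on the search path. Calling the function with its package prefix, as in ellipse::ellipse(mat, t = 0.43), sidesteps the clash.

The snippet below reproduces the effect with base R only. It uses a deliberately masked mean() as a stand-in for car's ellipse(), and find() to list every place a name is defined. With both car and ellipse attached, find("ellipse") should likewise list both packages.

```r
# Simulate a clash: define our own 'mean' so it masks base::mean, much as
# attaching car can put a different ellipse() ahead of ellipse::ellipse()
mean <- function(x, ...) stop("this is not the mean you want")

find("mean")  # ".GlobalEnv" "package:base"
res <- tryCatch(mean(c(1, 2, 3)), error = function(e) "masked")
res.qualified <- base::mean(c(1, 2, 3))  # the :: form reaches the intended package

rm(mean)  # remove the masking definition
```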
