### Correlation plot matrices using the ellipse library


My new favorite library is the ellipse library. It includes functions for creating ellipses from various objects. One of them, plotcorr(), draws a correlation matrix in which each correlation is represented by an ellipse approximating the shape of a bivariate normal distribution with the same correlation.

While the function itself works well, I wanted a bit more redundancy in my plots and modified the code. I kept (most of) the main features provided by the function and added a few: the ability to plot ellipses and correlation values on the same plot, control over what is placed along the diagonal, and control over the rounding of the plotted numbers. Here is an example with some color manipulations. The colors represent the strength and direction of the correlation, from -1 through 0 to 1, on a University of Rochester approved red-to-white-to-blue scale.

First the function code:

```
my.plotcorr <- function (corr, outline = FALSE, col = "grey",
                         upper.panel = c("ellipse", "number", "none"),
                         lower.panel = c("ellipse", "number", "none"),
                         diag = c("none", "ellipse", "number"),
                         digits = 2, bty = "n", axes = FALSE,
                         xlab = "", ylab = "", asp = 1,
                         cex.lab = par("cex.lab"), cex = 0.75 * par("cex"),
                         mar = 0.1 + c(2, 2, 4, 2), ...)
{
  # this is a modified version of the plotcorr function from the ellipse package
  # this prints numbers and ellipses on the same plot but upper.panel and lower.panel changes what is displayed
  # diag now specifies what to put in the diagonal (numbers, ellipses, nothing)
  # digits specifies the number of digits after the . to round to
  # unlike the original, this function will always print x_i by x_i correlation rather than being able to drop it
  # modified by Esteban Buz
  if (!require('ellipse', quietly = TRUE, character.only = TRUE)) {
    stop("Need the ellipse library")
  }
  savepar <- par(pty = "s", mar = mar)
  on.exit(par(savepar))
  if (is.null(corr))
    return(invisible())
  if ((!is.matrix(corr)) || (round(min(corr, na.rm = TRUE), 6) < -1) || (round(max(corr, na.rm = TRUE), 6) > 1))
    stop("Need a correlation matrix")
  plot.new()
  par(new = TRUE)
  rowdim <- dim(corr)[1]
  coldim <- dim(corr)[2]
  rowlabs <- dimnames(corr)[[1]]
  collabs <- dimnames(corr)[[2]]
  if (is.null(rowlabs))
    rowlabs <- 1:rowdim
  if (is.null(collabs))
    collabs <- 1:coldim
  rowlabs <- as.character(rowlabs)
  collabs <- as.character(collabs)
  # recycle the colors so that there is one color per cell of the matrix
  col <- rep(col, length = length(corr))
  dim(col) <- dim(corr)
  upper.panel <- match.arg(upper.panel)
  lower.panel <- match.arg(lower.panel)
  diag <- match.arg(diag)
  cols <- 1:coldim
  rows <- 1:rowdim
  maxdim <- max(length(rows), length(cols))
  plt <- par("plt")
  xlabwidth <- max(strwidth(rowlabs[rows], units = "figure", cex = cex.lab))/(plt[2] - plt[1])
  xlabwidth <- xlabwidth * maxdim/(1 - xlabwidth)
  ylabwidth <- max(strwidth(collabs[cols], units = "figure", cex = cex.lab))/(plt[4] - plt[3])
  ylabwidth <- ylabwidth * maxdim/(1 - ylabwidth)
  plot(c(-xlabwidth - 0.5, maxdim + 0.5), c(0.5, maxdim + 1 + ylabwidth), type = "n", bty = bty, axes = axes, xlab = "", ylab = "", asp = asp, cex.lab = cex.lab, ...)
  text(rep(0, length(rows)), length(rows):1, labels = rowlabs[rows], adj = 1, cex = cex.lab)
  text(cols, rep(length(rows) + 1, length(cols)), labels = collabs[cols], srt = 90, adj = 0, cex = cex.lab)
  mtext(xlab, 1, 0)
  mtext(ylab, 2, 0)
  mat <- diag(c(1, 1))
  # draws cell (i, j); i and j come from the loops at the end of the function
  # ellipse::ellipse is called explicitly so that a same-named function from
  # another package (e.g. car) cannot mask it
  plotcorrInternal <- function() {
    if (i == j){ #diag behavior
      if (diag == 'none'){
        return()
      } else if (diag == 'number'){
        text(j + 0.3, length(rows) + 1 - i, round(corr[i, j], digits=digits), adj = 1, cex = cex)
      } else if (diag == 'ellipse') {
        mat[1, 2] <- corr[i, j]
        mat[2, 1] <- mat[1, 2]
        ell <- ellipse::ellipse(mat, t = 0.43)
        ell[, 1] <- ell[, 1] + j
        ell[, 2] <- ell[, 2] + length(rows) + 1 - i
        polygon(ell, col = col[i, j])
        if (outline)
          lines(ell)
      }
    } else if (i >= j){ #lower half of plot
      if (lower.panel == 'ellipse') { #check if ellipses should go here
        mat[1, 2] <- corr[i, j]
        mat[2, 1] <- mat[1, 2]
        ell <- ellipse::ellipse(mat, t = 0.43)
        ell[, 1] <- ell[, 1] + j
        ell[, 2] <- ell[, 2] + length(rows) + 1 - i
        polygon(ell, col = col[i, j])
        if (outline)
          lines(ell)
      } else if (lower.panel == 'number') { #check if numbers should go here
        text(j + 0.3, length(rows) + 1 - i, round(corr[i, j], digits=digits), adj = 1, cex = cex)
      } else {
        return()
      }
    } else { #upper half of plot
      if (upper.panel == 'ellipse') { #check if ellipses should go here
        mat[1, 2] <- corr[i, j]
        mat[2, 1] <- mat[1, 2]
        ell <- ellipse::ellipse(mat, t = 0.43)
        ell[, 1] <- ell[, 1] + j
        ell[, 2] <- ell[, 2] + length(rows) + 1 - i
        polygon(ell, col = col[i, j])
        if (outline)
          lines(ell)
      } else if (upper.panel == 'number') { #check if numbers should go here
        text(j + 0.3, length(rows) + 1 - i, round(corr[i, j], digits=digits), adj = 1, cex = cex)
      } else {
        return()
      }
    }
  }
  for (i in 1:dim(corr)[1]) {
    for (j in 1:dim(corr)[2]) {
      plotcorrInternal()
    }
  }
  invisible()
}

```

And now a short walkthrough:

```
#usage of my.plotcorr
#much like the my.plotcorr function, this is modified from the plotcorr documentation
#this function requires the ellipse library, though, once installed you don't need to load it - it is loaded in the function
#install.packages(c('ellipse'))
#library(ellipse)
source('my.plotcorr.R')
# Get some data
data(mtcars)
# Get the correlation matrix
corr.mtcars <- cor(mtcars)
# Change the column and row names for clarity
colnames(corr.mtcars) = c('Miles/gallon', 'Number of cylinders', 'Displacement', 'Horsepower', 'Rear axle ratio', 'Weight', '1/4 mile time', 'V/S', 'Transmission type', 'Number of gears', 'Number of carburetors')
rownames(corr.mtcars) = colnames(corr.mtcars)

# Standard plot, all ellipses are grey, nothing is put in the diagonal
my.plotcorr(corr.mtcars)

# Here we play around with the colors, colors are selected from a list with colors recycled
# Thus to map correlations to colors we need to make a list of suitable colors
# To start, pick the end (and mid) points of a scale, here a red to white to blue for neg to none to pos correlation
colsc=c(rgb(241, 54, 23, maxColorValue=255), 'white', rgb(0, 61, 104, maxColorValue=255))

# Build a ramp function to interpolate along the scale, I've opted for the Lab interpolation rather than the default rgb, check the documentation about the differences
colramp = colorRampPalette(colsc, space='Lab')

# I'll show two types of color styles using this color ramp
# the first
# Use the same number of colors along the scale for the number of variables
colors = colramp(length(corr.mtcars[1,]))

# then plot an example with only ellipses, without a diagonal and with a main title
# the color selection stuff here scales and shifts the correlations such that they can index individual colors and create a sufficiently large list
# in case you are confused, R allows vector indexing with non-integers by truncating them, i.e. colors[1.8] == colors[1]
my.plotcorr(corr.mtcars, col=colors[5*corr.mtcars + 6], main='Predictor correlations')

# the second form
# we could, alternatively, make a scale with 100 points
colors = colramp(100)
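# then pick colors along this 100 point scale given the correlation value * 100 rounded down to the nearest integer
# to do that we need to move the correlation range from [-1, 1] to [0, 100]
# (careful: correlations below -0.98 would truncate to index 0, which R silently drops from the index vector, misaligning the colors; mtcars has none that extreme)
# now plot again with ellipses along the diagonal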
my.plotcorr(corr.mtcars, col=colors[((corr.mtcars + 1)/2) * 100], diag='ellipse', main='Predictor correlations')

# or, add numbers to the bottom of the chart
my.plotcorr(corr.mtcars, col=colors[((corr.mtcars + 1)/2) * 100], diag='ellipse', lower.panel="number", main='Predictor correlations')

# or, switch the numbers and ellipses and reduce the margins
my.plotcorr(corr.mtcars, col=colors[((corr.mtcars + 1)/2) * 100], diag='ellipse', upper.panel="number", mar=c(0,2,0,0), main='Predictor correlations')

# or, drop the diagonal and numbers
my.plotcorr(corr.mtcars, col=colors[((corr.mtcars + 1)/2) * 100], upper.panel="none", mar=c(0,2,0,0), main='Predictor correlations')
```
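The index arithmetic in the walkthrough can be made a little more defensive. The following is only a sketch: `corr_to_index` is a hypothetical helper, not part of the function above or of the ellipse package. It rounds to the nearest color and clamps the result to valid indices, so correlations near -1 never produce index 0.

```r
# Hypothetical helper (an assumption, not from the original post):
# map correlations in [-1, 1] onto valid indices 1..n of a color vector.
corr_to_index <- function(corr, n) {
  idx <- floor((corr + 1) / 2 * (n - 1) + 0.5) + 1  # nearest color, 1-based
  pmin(pmax(idx, 1), n)                             # clamp to 1..n
}

# Same red-white-blue ramp as in the walkthrough, with an odd number of
# steps so that a correlation of 0 lands exactly on the white midpoint.
colsc <- c(rgb(241, 54, 23, maxColorValue = 255), "white",
           rgb(0, 61, 104, maxColorValue = 255))
colramp <- colorRampPalette(colsc, space = "Lab")
colors <- colramp(101)

corr.mtcars <- cor(mtcars)
idx <- corr_to_index(corr.mtcars, length(colors))
col.matrix <- colors[idx]  # one color per cell of the correlation matrix

# With my.plotcorr sourced, this could then be used as:
# my.plotcorr(corr.mtcars, col = col.matrix, diag = "ellipse")
```

Using 101 rather than 100 steps is a design choice for this sketch only; any odd number keeps the midpoint color aligned with zero correlation.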

## 22 thoughts on “Correlation plot matrices using the ellipse library”

tiflo said:
March 26, 2012 at 7:29 pm

Nice!


Mayte said:
April 11, 2013 at 6:25 am

awesome! perfect visualization :-)


Roger said:
May 22, 2013 at 8:26 am

Very nice plot,
but I tried to put my own data set, the R showed
> data(XXX)
Warning message:
In data(XXX) : data set ‘XXX’ not found
I am pretty sure I have imported my dataset in.
Would you mind helping me to sort it out?


tiflo said:
May 22, 2013 at 2:22 pm

Hi Roger,

the only call to data is “data(mtcars)”. mtcars is a data set that comes with R. I just tried that command in R 3.0 and it works. So, I am not sure what causes the problem. Or did you type “data(XXX)”? That would not work, even if you have created a data set called XXX. The command “data()” loads a data set that comes with a library in R. If you have already loaded your own data set, just remove that command from the code.

hth,
Florian


Roger said:
May 23, 2013 at 4:25 am

Sorry, I probably confused you.
In load(XXX), XXX stands for my own data set’s name. I know mtcars is built into R, so I changed the name.
I mean I’d like to import my own data and make a beautiful plot like yours,
but it showed:
> data(XXX)
Warning message:
In data(XXX) : data set ‘XXX’ not found

so I was just wondering whether I missed anything?
Cheers,


exbuz responded:
May 23, 2013 at 10:10 am

Hi Roger,

As Florian mentioned, the data() function is just to load an example data set that comes with an R library—not your own data. I am not sure what ‘XXX’ stands for in your question. If you have a saved dataset (in an .RData or similar R native file) you’ll need to use the load() function on that file. If your data is in a raw text format you should load it using the read.table() function. As an example, if you have your data in a tab delimited file named ‘MyData.tab’ you can replace the data() line with something like this:
```
my.data = read.table(file="path/to/MyData.tab", sep="\t")
#be sure to check for row and column name issues
#and continue in a similar way through the rest of the code.
corr.my.data = cor(my.data)
#be sure to change any variable names and other specifics for your data in the rest of the code
```

-Esteban


Jirka Spilka said:
July 22, 2013 at 2:05 am

Thanks for the nice code!


lemma said:
November 7, 2013 at 4:49 am

The numerical values are not displayed when I run the code. Any help?

Thanks
lemma


exbuz responded:
November 7, 2013 at 11:19 am

Hi Lemma,

I’m not sure what could be the issue. The numbers should be displayed if you specify that you want numbers in the lower left or upper right half of the plot. For example, using the mtcars data above you can put numbers in the upper right half like this:
`my.plotcorr(corr.mtcars, upper.panel="number")`


beckmw said:
December 3, 2013 at 12:13 pm

This is a very nice modification of the plotcorr function, thanks! Might I suggest two things: 1) Perhaps include an option to plot histograms for single variables along the diagonal, and 2) options for significance stars on correlation values. Just suggestions…


tiflo said:
December 3, 2013 at 5:51 pm

Thanks for the suggestions. Do you have any specific code in mind that you’d be willing to share?


Soder said:
January 29, 2014 at 6:45 am

Hello Everyone.

Sorry, if I post this question here, but I’m very new to R.
How can I make this function run in my program? Do I have to kind of install it, or just insert the function code and everything into my scriptfile and press the button?

Do I have to install this function somewhere in the library of ellipses?

Thanks for your help (maybe a link would be helpful :-/ )


tiflo said:
January 29, 2014 at 4:48 pm

You just paste the function into your script window, read it in (or ‘source’ it) and then you should be able to call it. You can also set up R so that it sources a specific script file every time it starts.

HTH


Soder said:
January 30, 2014 at 4:21 am

Well, that is what I’ve tried so far.
I took the whole code (from function code, not just the one line where the function is defined), pasted it into my script window and read it in.

But then, when it comes to the “application”, when I type in source('my.plotcorr.R'), it says: Cannot open Connection. The ellipse library is installed, of course.

Ty for help :)


exbuz responded:
January 30, 2014 at 8:30 am

I’m not quite sure what you’re doing, but I’d suggest that you copy and paste the function code from the post into a file called ‘my.plotcorr.R’ and save it somewhere. Then, in a separate script file where you want to generate plots, source that file (i.e. with source('my.plotcorr.R')). Make sure you also give the source function the right path to that file; otherwise R will give you the error “cannot open connection”. Alternatively, set R’s working directory to the one containing that file. I keep all my helper scripts in the same place on my computer so it’s easy to source whatever I need from any other script on my computer.
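As a rough sketch of that workflow: the snippet below writes a placeholder file into a temporary directory so that it runs anywhere. The directory, the `helper.path` name and the placeholder function body are all assumptions for illustration; in practice the file would contain the real function code from the post, saved wherever you keep your helper scripts.

```r
# Stand-in for the saved helper file (hypothetical location and content).
helper.dir <- tempdir()
helper.path <- file.path(helper.dir, "my.plotcorr.R")
writeLines("my.plotcorr <- function(corr, ...) invisible(corr)  # placeholder body",
           helper.path)

# Checking the path first gives a clearer message than "cannot open connection".
if (!file.exists(helper.path)) {
  stop("Cannot find helper script at: ", helper.path)
}
source(helper.path)

# The function is now available in the current session.
is.function(my.plotcorr)
```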


YB said:
March 31, 2014 at 8:42 pm

Hi, I’ve been using the great modified function for a while and everything was working until today.
I’m getting the following error: “Error in ellipse(mat, t = 0.43) : center must be a vector of length 2”
When I use the original ellipse package, it’s fine, but the modified version is giving me the error. Any tips?

Thanks,YB


Mark said:
April 24, 2014 at 3:57 pm

Hi YB, looks like we got the same error. It’s because of another package with a different ellipse function (probably car, as it is in my case).
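If masking is the cause, one way to sidestep it is to call the function through its namespace, which ignores whatever else is attached. A minimal sketch, assuming the ellipse package is installed (the 0.8 correlation is just an illustrative value):

```r
# 2 x 2 correlation matrix with an illustrative correlation of 0.8
mat <- diag(c(1, 1))
mat[1, 2] <- mat[2, 1] <- 0.8

if (requireNamespace("ellipse", quietly = TRUE)) {
  # ellipse::ellipse always refers to the ellipse package's function,
  # even if another attached package defines its own ellipse()
  ell <- ellipse::ellipse(mat, t = 0.43)
  print(dim(ell))  # a two-column matrix of x/y points along the outline
} else {
  message("The ellipse package is not installed")
}
```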


Jo said:
June 30, 2014 at 8:47 am

I’m getting the following error message too: “Error in ellipse(mat, t = 0.43) : center must be a vector of length 2”

I tried to solve the masking problem by adding ellipse::ellipse() before the function but I still receive the error. Any ideas? Thank you!


exbuz responded:
June 30, 2014 at 9:37 am

Could you give me a minimal (non-) working example to see if I can reproduce the issue?


TLab said:
August 1, 2014 at 2:51 pm

Hello, first off, thanks for the original post!

I’m trying to apply this to a dataset that has some missing values in a few of the variables. In the correlation matrix the r value shows up as NA for any correlation involving a variable that has missing values.

I don’t see a natural place to insert something like “na.rm = TRUE” to have it carry out the correlation analysis even if values are missing.

Thanks in advance!


tiflo said:
August 1, 2014 at 3:22 pm

It seems this function needs a correlation matrix as input, so you’d have to put in the na.rm in the call that creates the correlation matrix (what did you use?).


exbuz responded:
August 2, 2014 at 11:06 pm

Thanks for the comment, Florian. The plotting function does not calculate correlations for you, so how you calculate them is left to the user, but cor() doesn’t have an na.rm argument.
TLab, you’ll need to add the use='complete.obs' option to your cor() call. This will use only complete cases to calculate your correlation matrix. In the case of the walkthrough above it’d be on line 10, like this:
corr.mtcars <- cor(mtcars, use='complete.obs')

Alternatively, removing incomplete rows before you pass your data to cor() will also work, e.g. with na.omit() or by indexing rows with complete.cases(). Just be aware that these solutions remove every row with any missing data. If you want to keep as much data as possible for each pairwise correlation, use='pairwise.complete.obs' computes each correlation from the rows that are complete for that pair of variables.
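To make the options concrete, here is a small sketch using mtcars with a few values blanked out. The missing values and the `my.data` name are made up purely for illustration.

```r
# Illustrative data: mtcars with a few artificially missing values
my.data <- mtcars
my.data[c(1, 5), "mpg"] <- NA
my.data[3, "hp"] <- NA

# Default behavior: any pair involving a variable with NAs comes back as NA
corr.default <- cor(my.data)

# Listwise deletion: only rows with no missing values are used
corr.complete <- cor(my.data, use = "complete.obs")

# Equivalent: drop incomplete rows yourself before calling cor()
corr.omit <- cor(na.omit(my.data))

# Pairwise deletion: each correlation uses all rows complete for that pair
corr.pairwise <- cor(my.data, use = "pairwise.complete.obs")
```

One caveat with the pairwise option: because each entry may be computed from a different subset of rows, the resulting matrix is not guaranteed to be positive semi-definite. That does not matter for plotting, but it can matter if the matrix is reused for modelling.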

HTH,
Esteban
