I love deadlines. I especially love the whooshing sound they make as they fly by. — Douglas Adams

Redefining r-squared

Few statistics are quoted more often by empirical researchers than r-squared. While I applaud the value of an intuitive interpretation in principle, it is pretty clear that the usual interpretation (the proportion of variation explained) is wrong. Apart from honesty, the main reason I care about this is that it gets me into trouble with (the more discerning) students.

Not for the first time, a student recently came back to me with a query. I had given him some data, and the task was to draw some kind of causality diagram using correlations, partial correlations and common sense (for the causality). The students had just had the class on r-squared, so the idea was to put these values on the arrows.

The student was interested in checking the interpretation of r-squared. So he broke the y-variable down into groups of equal x-values (x was discrete). He looked at the standard deviation of Y for each group (using PivotTables). He compared these with the overall standard deviation and found that the within-group standard deviation was, on average, about 40% of the overall. So 60% is explained by X. Yet the correlation was about 0.9, and we say 81% is explained.
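The student's check can be reproduced in a few lines. What follows is only a sketch on made-up data (the x-values, the residuals and the helper `pearson_r` are all assumptions for illustration, not the student's data): y sits on a straight line in x, with the same symmetric residuals in every x-group, chosen so that the correlation comes out close to 0.9.

```python
import math
import statistics

# Hypothetical data: discrete x, and y = x plus the same symmetric
# residuals in every x-group (so the group means lie exactly on the line).
xs = [0, 1, 2, 3, 4]
residuals = [-0.84, 0.0, 0.84]
data = [(x, x + e) for x in xs for e in residuals]

x_vals = [x for x, _ in data]
y_vals = [y for _, y in data]


def pearson_r(a, b):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


r = pearson_r(x_vals, y_vals)

# Overall SD of y, and the SD of y within each group of equal x.
overall_sd = statistics.pstdev(y_vals)
group_sds = [statistics.pstdev([y for x, y in data if x == g]) for g in xs]
mean_within_sd = statistics.fmean(group_sds)

# Fraction of the overall SD that remains within groups.
sd_ratio = mean_within_sd / overall_sd

print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"within-group SD / overall SD = {sd_ratio:.3f}")
print(f"sqrt(1 - r^2)                = {math.sqrt(1 - r * r):.3f}")
```

On data built this way the SD ratio equals sqrt(1 - r^2), so an r of about 0.9 leaves a bit over 40% of the standard deviation within groups, even though r-squared is about 0.81. Real data will only match this approximately.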

I had to tell him that the common interpretation of r-squared is wrong but ubiquitous and that I had hoped he wouldn’t notice!

The problem, of course, is that we can explain 81% of the variance. But variance does not measure variability (or anything sensible?). Standard deviation does. This being the case, it seems that we should redefine variation explained as

$1 - \sqrt{1 - r^2}$

which is always way smaller. Not that I am game to try! Maybe if we collectively came up with a better name we could get away with it. One possibility would be to incorporate this adjustment into the adjusted r-squared. In other words, substitute the adjusted r-squared into the above formula and call this variation explained. The downside of this is that the incorrectness of the interpretation of ordinary r-squared would then stand out like the proverbials.
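To get a feel for how much smaller, here is a rough sketch that tabulates r-squared against the standard-deviation version, taking the latter to be 1 - sqrt(1 - r^2). The function name `sd_explained` is made up for this illustration.

```python
import math


def sd_explained(r):
    """Proportion of the standard deviation explained: 1 - sqrt(1 - r^2)."""
    return 1.0 - math.sqrt(1.0 - r * r)


# Compare the usual r^2 with the SD-based figure for a few correlations.
print(" r     r^2    1 - sqrt(1 - r^2)")
for r in (0.3, 0.5, 0.7, 0.9, 0.99):
    print(f"{r:4.2f}  {r * r:5.3f}  {sd_explained(r):5.3f}")
```

For any correlation strictly between 0 and 1 the second figure is below r-squared, and the gap is widest for middling correlations.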



3 Responses to “Redefining r-squared”

  1. Chris Lloyd Says:

    Gary Glonek wrote:

    I came to the same conclusion many years ago, but have lost enthusiasm for R^2 for a number of reasons.

    1. In my experience, most students in elementary courses don’t really want to know. I think they are happier to accept “the proportion of variability explained” because it is easy to say without thinking about what it actually means. On the other hand, understanding it in terms of the reduction in predictive variance is, for them, much more difficult.

    2. I don’t accept that the common definition of R^2 is incorrect. It is just indirect and invites superficial understanding. In the light of this plus the momentum attached to the concept, I don’t think there is much chance of redefining the quantity.

    3. I think the more important issue, and one that is frequently overlooked, is that unlike most other things to do with regression, the sampling assumptions for x are important for R^2. In particular if the observed distribution of the x’s doesn’t represent the population that you care about then the naive R^2 derived from the data is irrelevant.

  2. Bob Jordan Says:

    I agree totally with the comments so far on this site.

    We have been playing here since 2005 with the concept of a variable we call Ya (the reversed Russian R character - a bit hard to type in these places).

    We use the definition:

    Ya=sqrt(1-R^2)

    and when you write it correctly you will understand why we chose Ya. Note the relationship to the proposed re-defined variation explained discussed by Chris.

    This is hardly rocket science and is not totally new but may have a place instead of R^2

    But in an important change we used Radj rather than R to get around the degrees of freedom difficulties. So:

    Ya=sqrt(1-Radj^2)

    It is also related ‘exactly’ to the simple ratio of the SE of a model fit to the SD of the data with the appropriate DOF, i.e.

    Ya = SE/SD

    When you start here you can see that this has exactly the same info as R (or at least Radj) so R is equally as good yet this form seems to tackle the nub of the problem - how good a fit is my model/curve/whatever?
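The identity above is easy to check numerically. This is a minimal sketch on made-up data, assuming that Radj^2 means the usual adjusted R-squared, that SE is the residual standard deviation with n - p - 1 degrees of freedom, and that SD is the sample standard deviation of y with n - 1 degrees of freedom; all variable names are illustrative.

```python
import math

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9, 7.4, 7.7]

n = len(x)
p = 1  # number of predictors in the model

# Ordinary least squares fit of a straight line.
mx = sum(x) / n
my = sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = my - slope * mx

# Residual and total sums of squares.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - my) ** 2 for yi in y)

se = math.sqrt(sse / (n - p - 1))  # residual SD of the fit
sd = math.sqrt(sst / (n - 1))      # SD of the data

r2 = 1 - sse / sst
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)

ya_from_radj = math.sqrt(1 - r2_adj)  # Ya = sqrt(1 - Radj^2)
ya_from_ratio = se / sd               # Ya = SE/SD

print(f"sqrt(1 - Radj^2) = {ya_from_radj:.6f}")
print(f"SE / SD          = {ya_from_ratio:.6f}")
```

The two printed values agree up to floating-point rounding, because the adjusted R-squared is by construction one minus the ratio of the two degrees-of-freedom-corrected variances.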

    The advantages of Ya?

    1. It has the reverse logic to R in that smaller is better.

    2. It is a ratio of SDs (the common form to express a variability) and not variances (which have IMVHO no place in engineering).

    3. It is linear - halving the SE halves the Ya, but this change would make an asymptotic change to R that is hard to describe.

    4. It describes the unexplained rather than the explained.

    But one needs to be careful. Variances add linearly and standard deviations add vectorially. This to me is a nice concept that builds in the independence factor. We say that if two SDs are independent, then we add the squares and take the square root (i.e. add their vectors). If there is some dependence between them we need to take this into account: think of adding vectors at right angles if independent (orthogonal), at zero degrees if totally correlated, and at some intermediate angle if partially correlated. This certainly makes sense to an electrical engineer, who adds voltages in exactly this way.
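The vector picture corresponds to the standard formula for the SD of a sum, with the correlation playing the role of the cosine of the angle between the two vectors. A small sketch follows; the function name `combine_sd` and the numbers are made up for illustration.

```python
import math


def combine_sd(sd1, sd2, rho=0.0):
    """SD of the sum of two quantities with SDs sd1, sd2 and correlation rho.

    Equivalent to adding two vectors of length sd1 and sd2 whose
    angle theta satisfies cos(theta) = rho (law of cosines).
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    variance = sd1 ** 2 + sd2 ** 2 + 2.0 * rho * sd1 * sd2
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding


print(combine_sd(3.0, 4.0, 0.0))  # independent: right angle
print(combine_sd(3.0, 4.0, 1.0))  # totally correlated: zero degrees
print(combine_sd(3.0, 4.0, 0.5))  # partially correlated: in between
```

With SDs of 3 and 4, the independent case gives the familiar 3-4-5 right triangle and the totally correlated case gives the plain sum, 7.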

    And then the argument about the fractions described by each component. This is harder. Yes sure they all add up if they are variances but that is not the point. Just because it is easy does not mean it is right.

    We need to work out how to describe components of variation in a sensible way that does not involve variances. Variances are not variability!

    As I stated we cannot think variances - they do not make sense as a measure of variability as they are in the wrong units compared to the thing we are assessing and have very clumsy units. Would you say we measured the constant G as 6.67*10^-11 Nm^2kg^-2 to a variance of 1*10^-26 N^2m^4kg^-4?

    No, we would say G=6.67(±0.01)*10^-11 Nm^2kg^-2.

    That is an SD in there - no need for a variance at all!

    We already use a modified version of Ya in NIR spectroscopy.

    We talk of the SDR, where SDR = SD/SE for an NIR model, SD is the data SD, and SE is the model fit or residual standard deviation (taking into account the DOF).

    This SDR is almost as bad as R^2 in that it is non-linear (in fact hyperbolic) in form. But note that Ya = 1/SDR: same info, but in a linear and understandable form.

    But I have probably come into this discussion too late and perhaps uninvited.

    Bob J.

  3. Chris Lloyd Says:

    Late but certainly very welcome, Bob. Thanks for your engineer’s perspective.
