Birthdays are good for your health. Statistics show conclusively that those who celebrate the most birthdays live the longest.

Australia’s best known statistic

Probably the best known sporting statistic in Australia is Don Bradman’s career test average – 99.94. Oh, the pain of that final duck. Even a poor score would have put his average over 100. Is there some way that a creative statistician might argue that the average is somehow biased downwards and that the “real” average is greater than 100?


Statistician Charles Davis argues that Bradman’s performance is the most dominant of any player of any major sport. He analyses the statistics for several prominent sportsmen by comparing how many standard deviations each sits above the mean for his sport. The top performers in his selected sports are given below (never mind the silly probability based on the normal distribution). See HERE for more details.

Player          Sport (statistic)               z     p
Bradman         Cricket (batting average)       4.4   1 in 184,000
Pelé            Soccer (goals per game)         3.7   1 in 9,300
Ty Cobb         Baseball (batting average)      3.6   1 in 6,300
Jack Nicklaus   Golf (major titles)             3.5   1 in 4,300
Michael Jordan  Basketball (points per game)    3.4   1 in 3,000

A few years ago I had what I thought was a great idea (I get one every few years on average). Think about how a batting average is calculated. If you are not out, then your score is added to the numerator but nothing is added to the denominator. You get exactly the same average if you replace each not-out score with that score plus the average, and treat it as an out score. So the conventional average assumes that at the point of your innings ending not-out, your mean score thereafter is the same as when you first walk out to the crease.
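The identity in the paragraph above is easy to check numerically. Here is a minimal sketch in Python; the innings list is made up purely for illustration (it is not Bradman’s record), and the names `innings` and `batting_average` are hypothetical.

```python
import math

# Hypothetical innings for illustration only: (runs, dismissed?)
innings = [(12, True), (85, False), (0, True), (143, True), (30, False), (57, True)]

def batting_average(innings):
    """Conventional cricket average: total runs divided by number of dismissals."""
    runs = sum(r for r, _ in innings)
    outs = sum(1 for _, out in innings if out)
    return runs / outs

avg = batting_average(innings)

# Replace each not-out score with (score + average) and treat it as an out.
completed = [(r, True) if out else (r + avg, True) for r, out in innings]
avg_completed = batting_average(completed)

print(avg, avg_completed)
assert math.isclose(avg, avg_completed)
```

With R total runs, n dismissals and k not-outs, the completed average is (R + k·R/n)/(n + k), which simplifies to R/n, so the agreement holds for any record, not just this toy one.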

My idea was that the risk of getting out early is quite high, so I would expect that a batsman’s mean score after making say 50 runs would be considerably higher than their mean score when they walked to the crease. So Bradman’s average of 99.94 under-estimates what we would think of as the mean – namely the mean of the probability distribution that generated his score, taking into account that the data are observed with censoring.
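To make the intuition concrete, here is a rough sketch that computes the expected number of further runs from a given score under an assumed dismissal hazard. Both hazard functions below are invented for illustration, not fitted to any real batsman: `constant` gives the memoryless case that the conventional average implicitly assumes, and `early_risk` is a hypothetical hazard that starts high and settles down.

```python
import math

def survival(hazard, max_score=5000):
    """surv[s] = P(final score >= s), where hazard(j) is the chance of
    being dismissed on score j given the batsman has reached j."""
    surv = [1.0]
    for score in range(max_score):
        surv.append(surv[-1] * (1.0 - hazard(score)))
    return surv

def mean_further_runs(hazard, reached=0, max_score=5000):
    """E[final score - reached | final score >= reached], truncated at max_score."""
    surv = survival(hazard, max_score)
    return sum(surv[reached + 1:]) / surv[reached]

def constant(score):
    # Memoryless case: same dismissal risk on every score (assumed value).
    return 0.02

def early_risk(score):
    # Hypothetical hazard: risky early on, decaying towards 0.01.
    return 0.01 + 0.05 * math.exp(-score / 10)

# Constant hazard: the same expectation at the crease and after 50 runs.
print(mean_further_runs(constant, 0), mean_further_runs(constant, 50))

# Decreasing hazard: more runs expected after reaching 50 than at the crease.
print(mean_further_runs(early_risk, 0), mean_further_runs(early_risk, 50))
```

Under the constant hazard the two figures coincide, which is the lack-of-memory assumption. Under the decreasing hazard the figure after 50 is larger, which is the situation in which the conventional average would under-estimate the mean.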

And if I could show that Bradman’s test average was higher than 100, I would be a national hero and forever make the humble profession of statistician something to proclaim loudly and with pride. Not surprisingly, it turns out that I was not the first to think of this; see this JRSS A paper from 1993.

We know how to estimate the underlying probability distribution with censored data without making the lack of memory assumption: the Kaplan-Meier estimator. You can probably guess the punch line. His average actually went down, a result which is likely to get me assassinated if published. The reason is that, like most human beings, his concentration was limited and his chance of being dismissed was increasing by the time he reached his average score.
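For readers who have not met it, here is a bare-bones sketch of a Kaplan-Meier mean for batting scores, with not-outs treated as right-censored. The innings are a made-up toy record, not Bradman’s, and `km_mean` is a hypothetical helper written for this illustration rather than a library function.

```python
import math

def km_mean(innings):
    """Mean score under the Kaplan-Meier estimate.

    innings: list of (runs, dismissed). A not-out is right-censored and is
    counted as still at risk on its own score. If the largest score is a
    not-out, the estimated survival never reaches zero and the result is
    only the mean restricted to that largest score.
    """
    times = sorted({runs for runs, _ in innings})
    area, surv, prev = 0.0, 1.0, 0
    for t in times:
        area += surv * (t - prev)  # area under the survival step function
        at_risk = sum(1 for runs, _ in innings if runs >= t)
        outs = sum(1 for runs, out in innings if runs == t and out)
        surv *= 1.0 - outs / at_risk
        prev = t
    return area

# Hypothetical innings for illustration only: (runs, dismissed?)
toy = [(0, True), (10, False), (20, True), (30, True)]

conventional = sum(r for r, _ in toy) / sum(1 for _, out in toy if out)
print(conventional, km_mean(toy))
```

When every innings ends in a dismissal the two figures agree. With not-outs, the Kaplan-Meier version shares each not-out’s weight among the larger scores actually observed, instead of adding the overall average to it, so the two can differ in either direction depending on the record.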

Enter Professor Bruce Chapman from ANU. Bruce has taken a quite different approach in the eternal quest to raise the Don’s average to three figures. I am mortified to admit that an econometrician has succeeded where a statistician has failed.

Noting that Bradman’s career was interrupted for 6 years by the war, one might ask “what would his average have been if he had played test cricket during this period?” WW2 is just a different kind of censoring. Bradman played from 1928 to 1948 and his average increased slightly but systematically by about half a run per year. Estimating that there would have been four test matches played during the war and filling in his scores with the extrapolated averages for the years 1939-1945 gives a figure... 100.74. Bruce’s paper is HERE. Alternatively, if you look at the graphic of Bradman’s career (note that the horizontal scale is not linear in time), the average was clearly higher than 100 in the period just before 1939 and just after 1945. Further details are HERE.

I hesitate to say it, but if one applies the Kaplan-Meier estimator to the augmented data one gets a mean less than 100 again. Oh, dear.



4 Responses to “Australia’s best known statistic”

  1. Censored likelihood models in a negative binomial family do the trick. I did this with Allan Border’s record (largest sample size!) in my Local Regression and Likelihood book, improving his average by about 1.5 runs. Fitting Bradman’s record (using a constant model, rather than local) gives an average of 105.55.

    The MLE of the shape parameter for Bradman (in the notation of my book, and S-Plus) is 0.58.

  2. More power to you! But, your model is parametric. If the results are so different to the non-parametric KM estimate of the mean then doesn’t that mean there is something wrong with the model?

    There IS a slight problem with the non-parametric KM estimate - Bradman’s highest not-out score is larger than his highest out score. Which means you have to concoct an out-score for that innings. But I don’t think this can account for the difference.

  3. Sorry, only just read your reply.

    No, the problem is with estimating means based on KM, not the parametric model. KM can’t model the mean residual life properly for large observations.

    Even if the model isn’t exact, the basic property of a decreasing hazard rate (which holds generally for batting scores) is that MRL is increasing. Then, the ‘cricket average’ underestimates the mean.

