Probably the best known sporting statistic in Australia is Don Bradman’s career test average – 99.94. Oh, the pain of that final duck. Even a poor score would have put his average over 100. Is there some way that a creative statistician might argue that the average is somehow biased downwards and that the “real” average is greater than 100?
Statistician Charles Davis argues that Bradman’s performance is the most dominant of any player of any major sport. He analyses the statistics for several prominent sportsmen by comparing how many standard deviations each sits above the mean for their sport. The top performers in his selected sports are given below (never mind the silly probability based on the normal distribution). See HERE for more details.
Bradman, Cricket (batting average): z = 4.4, p = 1 in 184,000
Pelé, Soccer (goals per game): z = 3.7, p = 1 in 9,300
Ty Cobb, Baseball (batting average): z = 3.6, p = 1 in 6,300
Jack Nicklaus, Golf (major titles): z = 3.5, p = 1 in 4,300
Michael Jordan, Basketball (points per game): z = 3.4, p = 1 in 3,000
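For the curious, the "1 in N" figures look like one-sided tail probabilities of a standard normal distribution evaluated at each z. That reading is my assumption, not something stated in the list. A minimal sketch to check it, where the function names `upper_tail` and `one_in` are mine:

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z, using the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def one_in(z):
    """Express the upper-tail probability as '1 in N'."""
    return 1 / upper_tail(z)

# z-scores as quoted in the list above
players = [
    ("Bradman", 4.4),
    ("Pelé", 3.7),
    ("Ty Cobb", 3.6),
    ("Jack Nicklaus", 3.5),
    ("Michael Jordan", 3.4),
]

for name, z in players:
    print(f"{name}: z = {z}, about 1 in {one_in(z):,.0f}")
```

If the assumption holds, each printed value should land close to the rounded figure quoted for that player.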
A few years ago I had what I thought was a great idea (I get one every few years on average). Think about how a batting average is calculated. If you are not out, then your score is added to the numerator but nothing is added to the denominator. You get exactly the same average if you replace each not-out score with that score plus the average, and treat it as an out score. So the conventional average assumes that at the point of your innings ending not-out, your mean score thereafter is the same as when you first walk out to the crease.
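The equivalence is easy to check numerically. Here is a small sketch using made-up scores (not Bradman's), where each innings is a hypothetical `(runs, out)` pair and exact fractions are used so the two averages can be compared without rounding error:

```python
from fractions import Fraction

def batting_average(innings):
    """Conventional average: total runs divided by the number of dismissals."""
    runs = sum(r for r, _ in innings)
    dismissals = sum(1 for _, out in innings if out)
    return Fraction(runs, dismissals)

def imputed_average(innings):
    """Replace each not-out score with score + average, then treat every
    innings as completed and take the plain mean."""
    avg = batting_average(innings)
    completed = [r if out else r + avg for r, out in innings]
    return sum(completed) / len(completed)

# Hypothetical scores; out=False marks a not-out innings
innings = [(12, True), (50, False), (0, True), (103, True), (35, False)]

print(batting_average(innings))   # 200 runs over 3 dismissals
print(imputed_average(innings))   # same value by construction
```

The algebra behind it: with R total runs, d dismissals and k not-outs, the imputed mean is (R + kR/d) / (d + k), which simplifies to R/d.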
My idea was that the risk of getting out early is quite high, so I would expect that a batsman’s mean score after making, say, 50 runs would be considerably higher than their mean score when they walked to the crease. So Bradman’s average of 99.94 under-estimates what we would think of as the mean – namely the mean of the probability distribution that generated his score, taking into account that the data are observed with censoring.
And if I could show that Bradman’s test average was higher than 100, I would be a national hero and forever make the humble profession of statistician something to proclaim loudly and with pride. Not surprisingly, it turns out that I was not the first to think of this; see this JRSS-A paper from 1993.
We know how to estimate the underlying probability distribution with censored data without making the lack of memory assumption: the Kaplan-Meier estimator. You can probably guess the punch line. His average actually went down, a result which is likely to get me assassinated if published. The reason is that, like most human beings, his concentration was limited and his chance of being dismissed was increasing by the time he reached his average score.
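To make the estimator concrete, here is a rough sketch of a Kaplan-Meier mean for batting scores, treating a not-out innings as right-censored. The scores are made up, and the conventions are my assumptions rather than anything from the analysis described above: a not-out batsman counts as still "at risk" at his final score, and the mean is the area under the survival curve up to the largest observed score (a restricted mean).

```python
from fractions import Fraction

def km_mean(innings):
    """Area under the Kaplan-Meier survival curve for (runs, out) pairs.

    out=False marks a not-out (censored) innings. The area is taken up to
    the largest observed score, i.e. a restricted mean.
    """
    dismissal_scores = sorted({r for r, out in innings if out})
    survival = Fraction(1)   # estimated P(final score > current score)
    area = Fraction(0)
    prev = 0
    for t in dismissal_scores:
        area += survival * (t - prev)
        at_risk = sum(1 for r, _ in innings if r >= t)
        dismissed = sum(1 for r, out in innings if out and r == t)
        survival *= 1 - Fraction(dismissed, at_risk)
        prev = t
    # If the largest score was a not-out, the curve has not reached zero,
    # so add the remaining rectangle up to that score.
    largest = max(r for r, _ in innings)
    area += survival * (largest - prev)
    return area

# Hypothetical scores, not Bradman's
innings = [(12, True), (50, False), (0, True), (103, True), (35, False)]
print(float(km_mean(innings)))
```

A useful sanity check is that with no not-outs at all the function reduces to the ordinary mean of the scores. Whether the Kaplan-Meier mean comes out above or below the conventional average depends on how the dismissal risk changes as the innings progresses.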
Enter Professor Bruce Chapman from ANU. Bruce has taken a quite different approach in the eternal quest to raise the Don’s average to three figures. I am mortified to admit that an econometrician has succeeded where a statistician has failed.
Noting that Bradman’s career was interrupted for 6 years by the war, one might ask “what would his average have been if he had played test cricket during this period?” WW2 is just a different kind of censoring. Bradman played from 1928 to 1948 and his average increased slightly but systematically by about half a run per year. Estimating that there would have been four test matches played during the war and filling in his scores with the extrapolated averages for the years 1939-1945 gives a figure… 100.74. Bruce’s paper is HERE. Alternatively, if you look at the graphic of Bradman’s career (note that the horizontal scale is not linear in time), the average was clearly higher than 100 in the period just before 1939 and just after 1945. Further details are HERE.
I hesitate to say it, but if one applies the Kaplan-Meier estimator to the augmented data one gets a mean less than 100 again. Oh, dear.