A problem well stated is a problem half solved —  Charles Kettering (1876-1958)

Melbourne Half Marathon

Some of my colleagues recently ran in the Melbourne Half Marathon. The folks who administer the event are good enough to provide an Excel spreadsheet listing the finishing time of each competitor along with their registered age and gender. You might find it useful in the classroom. There are some interesting patterns in the data but not for the reason you might first think.

Actually there were several races in one. More than half of the starters continued on from 10km to the full half marathon and some ran only 5km. I tabulated the times by age and gender for the 10km race. Below is a plot (with standard error bars). What I find most interesting about this graph is how flat the curves are until the mid-forties. There wouldn’t be too many sports where 40 year olds are competitive with 20 year olds, especially a sport such as long distance running where injury is a significant factor, I would have thought. There is hope for us after all!
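A tabulation of this kind can be sketched in a few lines of Python. The records below are made up purely for illustration (they are not taken from the race spreadsheet), and the names `records`, `mean_and_se` and `summarise` are hypothetical. The standard error is computed as the sample standard deviation divided by the square root of the group size.

```python
import math
from collections import defaultdict

# Hypothetical records: (age group, gender, 10km finishing time in minutes).
# These values are invented for illustration only.
records = [
    ("20-24", "M", 52.0), ("20-24", "M", 56.0), ("20-24", "M", 60.0),
    ("20-24", "F", 61.0), ("20-24", "F", 65.0), ("20-24", "F", 69.0),
    ("40-44", "M", 53.0), ("40-44", "M", 57.0), ("40-44", "M", 61.0),
]

def mean_and_se(values):
    """Return (mean, standard error of the mean); needs at least 2 values."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)

def summarise(records):
    """Group finishing times by (age group, gender) and summarise each group."""
    groups = defaultdict(list)
    for age_group, gender, minutes in records:
        groups[(age_group, gender)].append(minutes)
    return {key: mean_and_se(times) for key, times in sorted(groups.items())}

summary = summarise(records)
for (age_group, gender), (mean, se) in summary.items():
    print(f"{age_group} {gender}: mean {mean:.1f} min, SE {se:.2f} min")
```

The error bars on a plot would then run from the mean minus one standard error to the mean plus one standard error for each group.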

However, if you think about it for more than 10 seconds, these data probably do not imply that 40 year olds are as good as 20 year olds. There would be a truly huge selection effect here. Yours truly did not run, partly because I had better things to do but also because I have a bad knee. We would surely expect that many of those whose performance is reduced by the ravages of age would not even bother to compete, especially when they are still in an age group - say up to about 40 - where they kid themselves that they are still near their prime. I’ll go in the next one when I have done a bit of training. Don’t want the guys at work finding out I couldn’t break 60 minutes…
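A small simulation shows how strong this kind of self-selection can be. It is only a sketch under invented assumptions: 10km times in each age group are normally distributed, the older group is truly 10 minutes slower on average, and nobody enters unless they expect to finish in under 60 minutes. None of these numbers come from the race data, and the function name `simulate` is hypothetical.

```python
import random
import statistics

def simulate(true_mean, sd=10.0, cutoff=60.0, n=20000, seed=0):
    """Draw n potential runners and keep only those faster than the cutoff.

    Returns (mean time of everyone, mean time of entrants, participation rate).
    All parameter values are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    times = [rng.gauss(true_mean, sd) for _ in range(n)]
    entrants = [t for t in times if t < cutoff]
    return statistics.mean(times), statistics.mean(entrants), len(entrants) / n

young = simulate(true_mean=55.0)  # assumed true mean for a younger group
old = simulate(true_mean=65.0)    # assumed true mean for an older group

true_gap = old[0] - young[0]      # gap in the whole population
observed_gap = old[1] - young[1]  # gap among those who actually enter

print(f"true gap: {true_gap:.1f} min, observed gap: {observed_gap:.1f} min")
print(f"participation: young {young[2]:.0%}, old {old[2]:.0%}")
```

Under these assumptions the gap among entrants is much smaller than the gap in the population, while the participation rate of the older group falls sharply. That is the pattern of flat curves plus falling participation described here, produced by selection alone.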

Each mean is the mean of a truncated distribution, so it means pretty much nothing unless we know something about the truncation. A first pass at this selection effect is to check the participation rate amongst various age groups. So I downloaded file 320102 from the ABS, which gives the 2008 Victorian population by age and gender, and I calculated the number of runners per 10,000 people within each age grouping. I did the same for the 22km race as well. The results are pretty telling (though not broken down by gender).
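The participation calculation is just a ratio scaled to 10,000 people. As a minimal sketch, with made-up runner and population counts standing in for the race spreadsheet and the ABS file (the function name `rate_per_10000` is hypothetical):

```python
def rate_per_10000(runners, population):
    """Runners per 10,000 people in each age group.

    Both arguments are dicts keyed by age group; every group in `runners`
    must also appear in `population`.
    """
    return {group: 10000 * runners[group] / population[group]
            for group in runners}

# Invented counts for illustration only, not the real race or ABS figures.
runners = {"25-29": 50, "45-49": 20}
population = {"25-29": 100000, "45-49": 80000}

rates = rate_per_10000(runners, population)
for group, rate in rates.items():
    print(f"{group}: {rate:.1f} runners per 10,000 people")
```

Dividing by the population of each age group matters because the groups are not the same size; raw runner counts alone would mix up popularity with demographics.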

Participation drops off drastically from a peak at 25-29. The obese and the ill do not arrive at the starting line. The participation in the 22km run is higher. These are all more serious athletes - by definition. If you examine the vertical scale in the plot below you will note that the average speed for the 22km is about half a minute per kilometer faster than for the 10km. Partly, this will be due to the traffic jam at the beginning of the race, which washes out more in a longer race. But they are also better runners to start with - they are not all of a sudden getting a second wind at the 10km mark.

So in the final analysis this data set is probably not much good for estimating anything but I think it will make a good case for describing sampling bias in class. When sampling is not controlled and randomised, you need to think how the data points get into the study. When human beings select themselves, the raw data tell you as much about the psychology of selection as about performance. Sport is always a great vehicle for discussing statistics and sampling design is, let’s face it, a topic that easily sends people to sleep.

We do expect to see times increasing with age eventually, even with truncation. And that is what we see, though it looks like male 22km runners just won’t enter if their speed is going to be worse than 6 min/km. I think it would be interesting to track down the reasons for the degrading performance at higher ages. Finishing time data is obviously useless for this. It is a medical issue. For myself, I feel that I am pretty fit in heart and lung but on the occasions when I run with my son I am disadvantaged by a very inefficient running style (that is what they all say!), mainly a result of general stiffness and joint pain. I use a lot more energy per (short) stride than he does.*

The main pattern in the data that I cannot explain is the large difference between males and females - around 1 minute per kilometer consistent across all ages and both distances. I would have thought the differences between genders would be much less since this is a test of fitness rather than strength. My speculative explanation is that this is again a selection effect - that females have a lower pride factor and are willing to participate even at a low level of performance. I find this much easier to believe than that 50 year old males are a minute faster than 50 year old females.

 

* I am reminded of some of the lap swimmers I have seen who barely expend an erg, and even enhance their speed with hand paddles and flippers. If you were trying to get the maximum exercise per minute, wouldn’t you attach weights to your feet rather than flippers?



4 Responses to “Melbourne Half Marathon”

  1. Peter Lekkas Says:

    Hi Chris,
    There are physiological differences between the genders that account for the difference in times across endurance running, swimming and cycling events (e.g. differences in VO2 max)… as an aside there is an interesting graph of the evolution of marathon running times by gender at the following website http://www.marathonguide.com/history/records/index.cfm - whilst the gap has closed it is still substantive.
    Cheers - Peter

  2. Peter, it is just the magnitude of the difference I find surprising. It would imply a 46 minute difference in full marathon times when, at the elite level, the difference is around 10-15 minutes. It is, of course, not clear what the extremes of the two gender distributions tell us about the bulk of the population.

  3. Chris, as you say, at the elite level there isn’t much difference between the times. However, these athletes are very much in the tails of the distributions of running times, and it may well be that there is little difference in these tails. I suspect it is a very different story when looking at the middle of the distribution, where I would expect there to be a far greater difference in times between men and women. Whether it is of the magnitude you suggest, 45 mins, I am not sure. Further analysis of other half marathons might yield more.
    Alex

  4. I haven’t gone back to look at my stats, but when I (a woman) used to run seriously, I used to use the difference in the winning times between men and women to scale my own time (and prove to my workmates that I really beat them after correcting for gender).

    My recollection is that the percentage scaling factor for the winner was fairly uniformly 10-15%, regardless of age. That’s reasonably consistent with your 1min per km - perhaps elite level has some odd effects at the tail of the distribution?

    Anecdotally, given there are usually more men than women in a fun run as long as 22K (less true at shorter distances), I would suggest that the women are usually more serious about it than the men - so the real physiological gap may be bigger than the 1min per K you have above.
