Ranking Schools
On January 26, 2010 the Grattan Institute released a report on measuring school performance. The main recommendation of the report is to replace measurement of average school performance with so-called value-added indices. The idea is very simple: measure student progress as the primary outcome and, by employing an appropriate statistical model, extract that component of the improvement which can be attributed to the school.
The report was particularly well timed. On January 28 the federal government launched the My School website, which publishes average student performance by school, reported within groups of “similar schools”. Moreover, results for the 2010 National Assessment Program – Literacy and Numeracy (Naplan) are to be published in May 2010 and will, for the first time in Australia, mean that value-added measures can be calculated.
There is much to like about the report and it is difficult to argue against its main conclusion – that measuring student outcomes at one time point and averaging over each school does not provide a valid measure of school performance. Obviously this depends on what one means by a school’s performance. If one means the ability of a school to attract and retain smart students while deterring less smart students then the average student outcome is probably the only meaningful measure. But if by school performance you have in mind the effect of the school on the student’s learning outcome, then school averages will be hopelessly biased. The reason is that students are not randomly allocated to schools. Rather, gifted students tend to concentrate in some schools while disadvantaged students concentrate in others.
MySchool tries to correct for this by measuring the level of disadvantage of the school. It uses a single measure of disadvantage (ICSEA) largely based on the postcode of the student’s address, averaged over the school. Not only is this very crude, but to use these data correctly they must be linked directly to the student’s score, not averaged over the school. And the most pertinent and easily measured index of the student’s previous background is surely their result on the previous Naplan test.
There are some interesting features of the Australian educational landscape that are pertinent. Based on an international standard test (PISA), the variation in scientific literacy outcomes in Australia is larger than the OECD average (by 11%). But this higher inequality is not explained by the popular notion of disadvantaged schools failing to compete with elite schools. In fact, 81% of the variation in outcome is within schools. This suggests that measuring individual student outcomes over time will be useful in targeting problem students as well as assessing intervention programs.
Naplan tests are administered at years 3, 5, 7 and 9 every second year. With publication of the 2010 results in May, there will no longer be a practical excuse to not consider more meaningful measures, assuming that Naplan can match the students in their 2008 and 2010 exercises. To not use the student level identifiers (the main value of which is to track student level changes) is statistically indefensible. No trained statistician would publish the school level averages when student identifiers were available.
The report is (probably deliberately) vague about exactly how the value-added measures are to be calculated. The measure will depend on a statistical model which will require some expert statistical modeling. They do mention, however, that academic research has shown that the final measures are quite insensitive to the exact details of the model employed.
An appropriate statistical model would try to deconstruct each student’s outcomes into different components. For instance, one might say that the 2010 year 9 result depends primarily on (1) that student’s 2008 year 7 result, (2) the student’s ability to learn, (3) the student’s non-school circumstances and (4) the school environment. It is the last of these components that we think of as measuring school performance. The role of the statistical model is to measure this adjusting for the first three components which are out of the school’s control. The non-school circumstances might include things like parents’ education, occupation, marital and employment status, family size, parity, gender, ethnicity, migration status and language preference. In the Australian context, MySchool already measures socio-economic disadvantage of the student based on their postcode.
There are some misleading statements in the report but nothing that undermines their main contention. For instance, at various points it is claimed that naïve school averages have “consistently been shown to produce biased estimates of school performance compared to value added modeling.” As pointed out above, it is really a matter of how one defines school performance and the matter is not resolved by a mathematical or empirical study. Nevertheless, most reasonable people would agree with the author’s view that average school performance is not a good measure.
At another point, the report notes that overseas experience shows that the volatility of value-added measures is greater for smaller schools. They then argue that small school results should not hold implications for those schools. This is surely incorrect since each school’s value-added measure will come with a measure of statistical significance which takes into account the fact that measures for smaller schools are less reliable. Even for the smallest school, if results were sufficiently bad then some action would be indicated.
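The point about small schools can be made concrete with a rough sketch. It assumes the value-added estimate for a school behaves like a mean of independent student-level residuals, so its standard error shrinks with the square root of the number of students. The residual standard deviation of 50 score points and the function names are hypothetical, chosen only for illustration.

```python
import math


def standard_error(student_sd: float, n_students: int) -> float:
    """Standard error of a school's mean value-added estimate (assumes independent students)."""
    return student_sd / math.sqrt(n_students)


def z_score(school_effect: float, student_sd: float, n_students: int) -> float:
    """Number of standard errors by which the school's estimate differs from zero."""
    return school_effect / standard_error(student_sd, n_students)


# Hypothetical numbers: student-level residual SD of 50 points,
# one mildly poor result (-10) and one very poor result (-40).
for n in (20, 80, 320):
    se = standard_error(50.0, n)
    print(
        f"n={n:4d}  SE={se:6.2f}  "
        f"z(-10)={z_score(-10.0, 50.0, n):6.2f}  "
        f"z(-40)={z_score(-40.0, 50.0, n):6.2f}"
    )
```

Under these assumptions a school of 20 students with an estimate of -10 is well within noise, while an estimate of -40 is more than three standard errors from zero, which is the sense in which a sufficiently bad result would still indicate action.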
It would have been nice to see a suggestion for the kind of statistical model that one might employ. I was thinking of a fixed effects model like
$$y_{it} = \beta x_{it} + \alpha y_{i,t-1} + \sum_j q_j \, \mathbf{1}[\text{student } i \text{ in school } j] + \varepsilon_{it}$$
where $y$ is the outcome, $i$ is the student, $t$ is the time, $j$ is the school and $x$ are the individual-level covariates of non-school environment. The aim is to estimate the $q_j$.
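As a minimal sketch of how such a fixed effects model might be fitted, the following simulates data and estimates the school effects by ordinary least squares with one indicator column per school. Everything here is an assumption made for illustration: the number of schools, the coefficient values, the centred prior scores, the single covariate, and the deliberately perverse intake pattern in which the schools with the strongest intake add the least.

```python
import numpy as np

rng = np.random.default_rng(0)

n_schools, n_per_school = 5, 200
true_q = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])      # hypothetical school effects q_j
intake = np.array([100.0, 50.0, 0.0, -50.0, -100.0])  # hypothetical mean prior score by school
alpha, beta = 0.8, 3.0                                # hypothetical coefficients

school = np.repeat(np.arange(n_schools), n_per_school)
x = rng.normal(size=school.size)                      # one non-school covariate
y_prev = intake[school] + rng.normal(0.0, 50.0, size=school.size)  # centred prior score
eps = rng.normal(0.0, 5.0, size=school.size)
y = beta * x + alpha * y_prev + true_q[school] + eps

# Naive measure: the raw school average of the current score.
raw_means = np.array([y[school == j].mean() for j in range(n_schools)])

# Design matrix: covariate, prior score, and one indicator column per school
# (no separate intercept, so each indicator coefficient is that school's q_j).
dummies = (school[:, None] == np.arange(n_schools)).astype(float)
X = np.column_stack([x, y_prev, dummies])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_hat, alpha_hat, q_hat = coef[0], coef[1], coef[2:]

print("raw school means:", np.round(raw_means, 1))
print("estimated q_j:   ", np.round(q_hat, 1))
```

In this simulated setting the raw averages rank the schools in the opposite order to the estimated school effects, because the averages mostly reflect intake. A real analysis would need more covariates, standard errors for each estimate, and some treatment of students who change schools.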
This looks like yet another public issue where statistics is front and centre of the debate. I hope that the profession can be involved. Perhaps it is time for a letter from the SSAI President to the Minister offering our advice and expertise?
February 1st, 2010 at 3:35 pm
A major concern I have is the extent to which the “Napalm” tests can be (and have been) gamed.
An example is St Francis’ School, Paddington, NSW, which “was rated the best-performing Catholic primary school in NSW” by the Sydney Morning Herald:
http://www.smh.com.au/national/education/tests-just-childs-play-for-topperforming-school-20100129-n48k.html
According to one mother:
“There was enormous pressure at the school on both teachers and students from day one of term one to achieve good results,” she said. “The ‘preparation’ for the tests was intense, with extreme pressure to practise through regular class time and heavy-duty homework.
“Many children felt overwhelmed and stressed by the level of work and the performance expectations.”
Also see:
http://www.myschool.edu.au/Main.aspx?PageId=0&SDRSchoolID=NSWC00284_8186&DEEWRID=1647&CalendarYear=2009
If measurement of “value adding” is to be the goal, a school such as St Francis could demonstrate this with half the effort - just intensively train the 5th graders.
Another issue I noticed was among schools with so-called “opportunity classes” (OC) in years 5 and 6. At the end of 4th grade students interested in such classes undergo tests and about 1% enter these classes (many relocate to do so) offered by about 75 of around 1500 primary schools in NSW. These schools’ Naplan results often improved noticeably (versus their peers) between year 3 (no OC classes) and year 5 (new students in OC classes).
It’s a quandary for me as a parent. I want information about the quality of local primary schools - my daughter will go to school next year.
February 1st, 2010 at 4:08 pm
Thanks for the very thorough precis of the MySchool website information from NAPLAN and the issues it raises.
You say “MySchool already measures socio-economic disadvantage of the student based on their postcode.” This isn’t necessarily the case as socio-economic differences occur within postcodes, and it depends on the extent to which ICSEA can remove these within-postcode differences. High socio-economic students within a low socio-economic postcode will tend to go to schools further up the socio-economic ladder (often private schools). This induces a bias as it artificially lowers the socio-economic average for the better schools.
A further bias occurs for selective schools. Such schools will attract the better students for their postcode, and can use the postcode averages to lower their socio-economic grouping for like schools.
SSA should go public on this issue as the present setup is a disgrace. Naplan is a very valuable diagnostic instrument for individual students and individual schools, but when it goes public as league tables it may lead to widening the social divide and not giving credit to excellent schools and teachers working with students at the lower end of the socio-economic spectrum.
February 1st, 2010 at 4:44 pm
Getting the data will be a logistical nightmare. How many kids will have changed school within 2 years? And that’s going to be biased - the kids who don’t change are the ones more likely to score well i.e. more settled home lives.
How do you measure a kid’s ability to learn?
How do you measure non-school circumstances?
How do you measure what school has done what when a child moves?
It all sounds so incredibly subjective that you’d have to go Bayes to account for all the errors flying around. The year 7 test doesn’t really measure their year 7 performance because it depends on which test is chosen (out of all possible tests) and on how the kid feels on the day.
By the time you account for all the errors it would be interesting to see if there is any discernible discrimination between schools at all. Ranking statistics are notorious for having s.d. near or greater than the range of the ranks.
The large variation within schools is not surprising. There is nothing in this for the kids, so why should they put any effort into it? It just sounds like hard work for nothing (I’ve heard tales from kids sitting exams for an international study on literacy/numeracy/science who say they do nothing during the exam because what’s the point?).
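The earlier remark about the variability of ranks can be illustrated with a small simulation. This is only a sketch under assumed numbers: 20 hypothetical schools with true effects spread evenly between -5 and 5, and estimates that carry independent normal noise with a common standard error. The function name `rank_spread` is made up for this example.

```python
import numpy as np


def rank_spread(true_effects, standard_error, n_reps=2000, seed=0):
    """Standard deviation of each school's rank across simulated replications."""
    rng = np.random.default_rng(seed)
    true_effects = np.asarray(true_effects, dtype=float)
    noise = rng.normal(0.0, standard_error, size=(n_reps, true_effects.size))
    estimates = true_effects + noise
    # argsort of argsort converts each row of estimates into ranks 0..k-1; add 1 for 1..k.
    ranks = estimates.argsort(axis=1).argsort(axis=1) + 1
    return ranks.std(axis=0)


effects = np.linspace(-5.0, 5.0, 20)  # hypothetical true school effects

print("rank s.d. with precise estimates:", rank_spread(effects, 0.01).max())
print("mean rank s.d. with noisy estimates:", round(rank_spread(effects, 5.0).mean(), 2))
```

When the standard error is tiny the ranks never move, but when it is comparable to the spread of the true effects each school's rank wanders over a large part of the table from one replication to the next.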
February 1st, 2010 at 5:02 pm
Just to correct a small point: the ICSEA scores were based on Census Collection District data, not postcode.
February 1st, 2010 at 5:02 pm
Reply to Warren:
From the ICSEA technical paper linked by Richard Hockey:
http://www.myschool.edu.au/Resources/pdf/My%20School%20ICSEA%20TECHNICAL%20PAPER%2020091020.pdf
It appears the ICSEA scores for each school are based on geocoding students’ addresses to Census Collection Districts (CCDs), which are somewhat smaller than postal areas.
But your concern remains: “ICSEA makes use of the same fundamental approach that the Commonwealth has long used to allocate funds to non-government schools, namely to use CCD information on each student as a means of generating an index that best captures contextual characteristics of the school.”
That “same fundamental approach” has been controversial because of its apparently generous allocations to expensive, already-wealthy private schools.
In fairness, the paper acknowledges the shortcomings of this approach.
February 1st, 2010 at 6:57 pm
I did make a point about gaming the system that seems to have been lost.
Basically, whether the outcome is naive school averages or some measure of value adding, schools can and will manipulate their score by intensive preparation.
See the 2nd example here:
http://www.smh.com.au/national/education/tests-just-childs-play-for-topperforming-school-20100129-n48k.html
February 2nd, 2010 at 6:42 am
Picking up just three points:
1. NZ is currently in convulsions (those interested in education, anyway) about the current government’s steam-rolling of “National Standards”, where every child in Years 1-8 is going to be measured against rather rapidly-set national standards in literacy and numeracy, their parents are going to receive “plain language reports” on their progress, and the school is going to have to report how well their students are progressing (aggregated level report). Most of the hoo-hah is about the govt’s rather hopeful claim that NO league tables will be [able to be] produced. The teachers’ unions are incredulous, and some schools are planning to refuse to comply. The poor outcomes of such programmes in the US, UK and Australia have been much quoted in the local media.
2. The models usually used to measure “value added” progress are random-effect models (aka hierarchical linear models or multilevel models), where sometimes class, almost always school, and depending on the country some regional or local authority data are used as the levels for the random effects (i.e. the models are always 2-level, but can have more levels, if appropriate). Models allowing cross-classification can (while making people’s heads hurt) take account of students moving between schools (or classes).
The extent of bias introduced by ignoring this structure depends on the degree of difference (typically in terms of socio-economic advantage) between schools. In NZ, for instance, much more of the difference is within schools than between them, but there are still differences in the 5-20% range between schools (i.e. the proportion of variability between schools out of the total unaccounted-for variability in the model is between 5 and 20%).
The late Ken Rowe did quite a lot of work, while at ACER, on value-added models, and the actual or imagined advantage of, for instance, private schools. The ACER website should still trawl up examples for anyone wanting to follow this up.
Of course, talking about using “measures of progress” is considerably easier than doing it. The two main approaches (difference scores, and essentially analysis of covariance) both have marked drawbacks. There is a considerable literature on this, with both camps having entrenched supporters.
3. We use “decile” to measure [dis]advantage. It’s based on mesh-block SES data from the census, aggregated for the likely catchment of the school, and, as the name suggests, all schools are then divided into 10 almost-equally-sized groups. More targeted funding is available to lower-decile schools. In our research, in the absence of better individual-level data (like maternal qualifications, family income, etc) we have found it to be a relatively useful place-holder variable. The original blog expressed concern about using a measure like this at the school level. I’ve seen student achievement data entered twice into models like these: once aggregated at the school level (is this a high- or low-achieving school?), and again at the individual level. [My head is still trying to fit around the implications of this.]
February 2nd, 2010 at 9:32 am
Bruce, interesting comments on “gaming”. Another type of gaming that the Grattan report considers (when performance is measured by value-add) is schools trying to reduce the performance of a student on their first test in the school. Apart from this, I think that the value-add measure would somewhat reduce the incentive to focus on the exam.
Warren, thanks for the comments about bias. I did have a paragraph on this in an earlier draft but accidentally deleted it. Selective schools will be hugely advantaged by this – you are right. It would be fascinating to see how well these schools come out when assessed by a value-added measure.
Megan, I wonder if you have any faith in statistical analysis at all. You don’t need to measure a student’s ability to learn (aka IQ). It is a conceptual model. Non-school circumstances are measured by variables such as those that I listed at the end of paragraph 8. The current policy of publishing school averages incorrectly standardised by the ICSEA may be less of a “logistical nightmare”, but I would rather the comparisons be done more validly than less.
February 2nd, 2010 at 7:43 pm
You say “The role of the statistical model is to measure this adjusting for the first three components”, which are “(1) that student’s 2008 year 7 result, (2) the student’s ability to learn, (3) the student’s non-school circumstances”.
If you don’t have a covariate that measures (2) (i.e. you assume you capture it in some sort of latent way), then how do you disentangle “(4) the school environment” from it?
If you are measuring IQ at year 3 as your covariate for (2), isn’t that just horribly confounded with literacy and numeracy skills already acquired?
It’s not that I don’t know there are cross-classification models, but how do you attribute which school gets the benefit of the kid’s performance if the kid moves 1 day before the test, or 2 years minus 1 day before the test? How do you even get the data? A kid moves across the state, across the world, changes surname (from Mum’s to Dad’s). It all sounds like a recipe for a very biased sample.
(Slightly changing the topic)
Unless the ranks are reported with a reasonable stab at a correct variance, I think this does an incredible disservice to schools and to parents. People read the ranks, not only as if they are fact, but that the distance between ranks is a meaningful difference - practically and statistically.
Since my statistics background is in medical statistics, I don’t really have much faith in cross-sectional or cohort studies. Double blind, randomised clinical trials or nothing, I say! ;->
February 8th, 2010 at 1:07 pm
I am concerned about the bias or inconsistency in the ICSEA, which put my son’s comprehensive rural high school in the same statistical grouping as Sydney Boys High and Brisbane State High, both selective city schools. Needless to say, they were both significantly better in their ratings than my son’s school. Sounds like an apples versus oranges comparison. The city/rural differences also seemed to be quite marked for statistically similar schools.