
Is Secondary School Remodelling A Red Herring?

Stephen Gorard

(Published in Teaching Times)

With the election in sight, policy-makers and their advisors are once again promoting education policies for school improvement that have at their heart a slight remodelling of the nature of secondary schools in England. In the last two decades, we have added to grammar, secondary modern, comprehensive, VA, VC and ex-Direct Grant schools a wave of new school types, including Foundation and Specialist schools, City Technology Colleges, Academies, and a wider range of faith-based schools. An increasing number of school-age pupils are being taught in non-school settings such as FE Colleges. Now we have further proposals for trust, parent-led, and Swedish-model schools, for example. Each new type is claimed, by its advocates, to be superior to the majority of those that have gone before, and this superiority is usually expressed in terms of formal measures of pupil attainment. Are these claims true? Are schools in England really on a continual upward trajectory of improvement as these new policies for school remodelling are imagined by politicians and then implemented? What does the evidence tell us?

It is reasonably clear that attending a school in England makes a difference when compared to not attending a school. Teachers make a difference. For most pupils, each school year of attendance makes a noticeable difference to attainment as measured by tests or qualifications. [See, for example, Luyten, H. (2006) An empirical assessment of the absolute effect of schooling: regression-discontinuity applied to TIMSS-95, Oxford Review of Education, 32, 3, 397-429.] It is important to be clear about this, because commentators are in danger of straying from the fact that going to school makes a difference to the unfounded claim that going to a specific school or type of school makes a difference (in comparison to another school or type of school).

As you may imagine, it is much harder to test the claim that a pupil would have done better in another school. Each pupil has only one chance at schooling, and so we cannot test differences between schools directly. For ethical reasons, we are unable to allocate large numbers of pupils randomly to schools, which is the next best way of testing differential impact (like a clinical trial). All we can do is try to match pupils across schools and estimate their progress. If, on average, pupils make more progress in one kind of school than very similar pupils in another kind of school, then we might conclude that these schools are differentially effective. On the other hand, we might also conclude that the difference is evidence that our pupils were not really well matched at the outset. Is the difference between actual and expected attainment at school evidence of a differential school effect, or merely evidence that our predictions of attainment are not very good?
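To make the matching idea concrete, here is a minimal sketch (not the author's method, and using invented data) of pairing pupils across two school types on prior attainment and then comparing their average progress:

```python
# Illustrative sketch: match each pupil in one school type to the closest
# unused pupil in the other (by prior score), then compare mean progress.
# All data and the tolerance value are invented for illustration.

def match_and_compare(school_a, school_b, tolerance=1.0):
    """Mean difference in progress (final - prior) between matched pairs."""
    diffs = []
    used = set()
    for prior_a, final_a in school_a:
        best = None
        for i, (prior_b, final_b) in enumerate(school_b):
            if i in used:
                continue
            gap = abs(prior_a - prior_b)
            if gap <= tolerance and (best is None or gap < best[0]):
                best = (gap, i, prior_b, final_b)
        if best:
            _, i, prior_b, final_b = best
            used.add(i)
            diffs.append((final_a - prior_a) - (final_b - prior_b))
    return sum(diffs) / len(diffs) if diffs else None

# Two invented cohorts of (prior score, final score) per pupil
type_a = [(50, 62), (55, 66), (60, 70)]
type_b = [(50, 60), (56, 65), (61, 72)]
print(match_and_compare(type_a, type_b))  # → 1.0
```

The apparent one-point advantage for type A is exactly the kind of difference the paragraph above warns about: it could reflect school effectiveness, or simply imperfect matching.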

I think we must favour the latter option. Let me explain why. Contextualised value-added (CVA) is currently the most sophisticated method in use for estimating the average difference between expected and actual attainment. The actual attainment scores are reasonably simple to comprehend in principle. But computing them involves taking qualifications as varied as GCSE, GNVQ Intermediate, NVQ, National Certificate in Business, BTEC, Key Skills, Basic Skills, and Asset Language Units, and converting them to a common currency of points scores equivalent to the ‘best 8’ GCSEs. And GCSE grades, as one example, are not perfect, with annual reports of remarking and incorrect grading. They relate to different examination boards, syllabi, subjects and modes of assessment. So there will be a substantial error component in our measure of actual attainment.
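The ‘common currency’ step can be sketched as follows. The tariff values below are invented for illustration and do not reproduce the real DCSF/QCA points tariff:

```python
# Sketch of converting mixed qualifications to a 'best 8' points score.
# The POINTS tariff here is hypothetical, for illustration only.
POINTS = {
    ("GCSE", "A*"): 58, ("GCSE", "A"): 52, ("GCSE", "B"): 46,
    ("GCSE", "C"): 40, ("GCSE", "D"): 34, ("GCSE", "E"): 28,
}

def best_8_points(results):
    """Sum the eight highest point-scoring results for one pupil."""
    scores = sorted((POINTS.get(r, 0) for r in results), reverse=True)
    return sum(scores[:8])

# An invented pupil with nine GCSE results: three As, three Bs, three Cs
pupil = [("GCSE", "A"), ("GCSE", "B"), ("GCSE", "C")] * 3
print(best_8_points(pupil))  # → 374 (the lowest C is discarded)
```

Every entry in such a tariff is a modelling decision, and every qualification not covered by it has to be mapped somehow, which is one route by which error enters the ‘actual’ score.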

The predicted attainment for each pupil is far more complex. Some pupils will not have a matching record on the database for their prior attainment (at Key Stage 2, for example). Each year around 10% of KS4 records have no KS2 match. And so these pupils either have to be eliminated from the calculation, or have their data simply made up. A further 10% of all pupils cannot be matched across their attainment (NPD file) and context (PLASC file). Of the remainder, around 15% have missing data on whether they are eligible for free school meals or not. Again, a compromise has to be made. FSM could be left out of CVA, or just those pupils with no FSM record could be left out, or again we could make this data up. The DCSF currently choose the last of these options in all cases, assigning the modal value to any missing data (so any blank becomes not eligible for FSM, white ethnicity, English as first language, or not special needs, and so on). An analysis I ran in 2008 found that less than 20% of pupil records had complete data relevant to the CVA calculation!
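Modal imputation of this kind is mechanically simple, which is part of the worry. A minimal sketch (field names and data invented for illustration):

```python
# Sketch of modal imputation as described above: any missing value is
# replaced by the most common observed value for that field.
from collections import Counter

def impute_modal(records, field):
    """Fill missing (None) values of `field` with the modal observed value."""
    observed = [r[field] for r in records if r[field] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    for r in records:
        if r[field] is None:
            r[field] = mode
    return records

# Invented pupil records: most pupils are not FSM-eligible, one is unknown
pupils = [{"fsm": False}, {"fsm": False}, {"fsm": True}, {"fsm": None}]
impute_modal(pupils, "fsm")
print(pupils[-1]["fsm"])  # → False: the blank becomes 'not eligible'
```

Because the modal value is, by definition, the majority category, this procedure systematically records unknown pupils as not disadvantaged, not minority-ethnic, and not special needs.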

The final step in the CVA calculation involves finding the difference between the actual points score for each pupil and the best prediction based on their prior attainment and the context variables (like FSM). The total of these differences is the published CVA score for each school. But imagine what happens when the subtraction is done. Even if we were to assume that the actual and predicted scores were each 90% accurate (a very optimistic assumption), the result of the subtraction would be meaningless because the two scores will be so close. Imagine a pupil with an actual points score of 100 for attainment at KS4, but with a predicted points score of only 90. On the surface this pupil has done well, with a residual score of +10 that will bring the school average CVA up. Given the 10% error, however, this pupil could have a real score of between 90 and 110 (100 plus or minus 10%) and a true predicted score of 81 to 99 (90 plus or minus 10%). This means the genuine residual could be as high as 29 (110-81) or as low as -9 (90-99). We genuinely have no idea whether this pupil has done better or worse than predicted, because our original error of 10% has propagated to nearly 400% in the answer (a range of 38 points against a nominal residual of 10). There is no way that such a result should be used for any practical purpose, even given the very favourable assumption of only 10% initial error. [See, for example, Gorard, S. (2010) Serious doubts about school effectiveness, British Educational Research Journal, 36, 2.]
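The interval arithmetic in the paragraph above can be checked directly:

```python
# Error propagation through the residual (actual - predicted), where both
# scores carry the same relative error, as in the worked example above.

def residual_bounds(actual, predicted, rel_error=0.10):
    """Widest and narrowest possible residuals given the error margins."""
    lo = actual * (1 - rel_error) - predicted * (1 + rel_error)
    hi = actual * (1 + rel_error) - predicted * (1 - rel_error)
    return lo, hi

lo, hi = residual_bounds(100, 90)
print(round(lo, 6), round(hi, 6))      # → -9.0 29.0
print(round((hi - lo) / (100 - 90), 2))  # → 3.8: a 38-point range on a residual of 10
```

A nominal residual of +10 is therefore compatible with anything from a strongly negative to a strongly positive true residual, which is the sense in which the subtraction amplifies a modest 10% measurement error into a figure that is nearly all noise.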

So, we do not, and currently cannot, have any evidence that one school or type of school performs better with equivalent pupils than any other. This is not the same as saying that all schools perform equally well, but it does mean that we have no reason to assume otherwise. It is clear that the overwhelming majority of the variation in pupil attainment between schools is explicable by prior pupil characteristics (one reason why raw-score league tables are no indication of school performance). This is, after all, what our compulsory, free, universal, SAT-assessed, OFSTED-inspected, QTS-assured, NC-driven system was set up for – so that it made little difference where one went to school. Perhaps we ought to celebrate this fact more, and seek to establish the superiority of any one type of school rather less. Grammar schools have high levels of pupil attainment at age 16 because they select children at 11 who are likely to do well at 16. Secondary modern schools have somewhat lower levels of attainment precisely because grammar schools exist. Adding the results of both types of schools yields the same kind of aggregate results that would be achieved by a non-selective system.

No one gains from selection, from dangerous faith-based segregation, nor from any of the untested remodelled schools of the last 20 years. I predict that no one will gain from the new whizzo ideas being unveiled for a Spring election. But does anyone lose by this diversification of school types? Yes – diversity of provision is linked to increasing stratification of school intakes, opportunities and life chances, and so to a decrease in equity. Why that is will have to be the subject of another article [or see Gorard, S. and Smith, E. (2010) Equity in Education: an international comparison of pupil perspectives, London: Palgrave]. What is clear is that any move away from the ‘bog-standard’ comprehensive will yield no gain for attainment, and so runs the risks associated with intake stratification for no reason.

Stephen Gorard

Professor of Education Research, University of Birmingham

Prof Gorard is a former teacher in both the state and independent sectors, and worked with the Welsh government when it decided to abolish school league tables. His research is published this month in his book Equity in Education.

