Running a randomised controlled trial (RCT) for an educational innovation involves considerable planning at the outset, with care and attention throughout. The reward is a clear result that is easy to analyse and simple to interpret.
Planning involves a number of decisions including what the innovation is, how it will be implemented (scale, timing, duration, resources, year groups etc.), what it will be compared to (standard practice perhaps), how the cases for comparison will be selected, what the primary outcome measure will be, and whether the success/failure criterion will be about relative progress or eventual differences between the groups.
An example might be the use of teaching assistants (TAs) in primary schools to hold reading-aloud sessions with individual Year 5 pupils outside classrooms during normal curriculum time. The question could be whether this produces improved average reading ability after one year, in comparison to the pre-existing standard deployment and use of TAs.
Answering this would require a large number of pupils. The large scale is important so that any ‘effect’ from the innovation can be seen over and above the noise and clutter that will inevitably appear in the data. There will be errors in measurement of reading ability, fluke results, TA and pupil absences, pupils dropping out, and probably wide variation in the impact for different individuals.
These pupils would be tested for their reading ability via a commercial standardised reading test, and the scores recorded for later use. The individual pupils will then need to be randomly allocated to receive the innovation or not. This randomisation must be rigorous and independent of pupil characteristics (such as whether the pupil is in need of help). This creates an unbiased distribution of pupils between the innovation and comparison groups, and is a necessary basis for a fair test of the innovation. Only the schools and pupils fully signed up for participation in the trial, and with a prior test score, should be included in the randomisation. They must be willing to accept allocation to either group from the outset. These form the ‘population’ for the trial. The testing is done before randomisation to protect against any possible bias in testing caused by knowledge of which group a pupil is in.
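As a minimal sketch of what rigorous, characteristic-blind allocation might look like in practice (the pupil IDs, group sizes and seed here are purely illustrative assumptions):

```python
import random


def allocate(pupil_ids, seed=None):
    """Randomly split pupils into an innovation and a comparison group.

    The split depends only on the random shuffle, never on pupil
    characteristics, so the two groups are unbiased samples of the
    trial population.
    """
    rng = random.Random(seed)
    ids = list(pupil_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Illustrative trial population of 200 pupils, identified by number
innovation, comparison = allocate(range(1, 201), seed=42)
```

Recording the seed (or the shuffled order) means the allocation can be audited later, which helps demonstrate that it really was independent of pupil characteristics.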
Some background data relevant to each pupil is also useful (binary classifications such as attracting pupil premium or not). These can be used later to see if the innovation was effective for particular kinds of pupils, such as those facing potential disadvantage. They can also help to check that the two groups appear well-balanced at the outset.
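A simple balance check might compare the proportion of pupils with a given binary characteristic in each group. The pupil-premium flags below are hypothetical, invented purely for illustration:

```python
def group_proportion(flags):
    """Proportion of pupils with a binary characteristic (e.g. pupil premium)."""
    return sum(flags) / len(flags)


# Hypothetical pupil-premium flags (1 = yes, 0 = no) for each group
innovation_pp = [1, 0, 0, 1, 0, 1, 0, 0]
comparison_pp = [0, 1, 0, 0, 1, 0, 1, 0]

# A large absolute difference here would prompt a check of the randomisation
diff = group_proportion(innovation_pp) - group_proportion(comparison_pp)
```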
Care and attention
Now any necessary training for staff and changes to procedures take place, and the innovation starts, involving only those pupils allocated randomly to receive it. Once the specified length of the intervention is complete (one term perhaps), all of the pupils in both groups are re-tested. The re-test must be of the same kind as the initial test, but not usually exactly the same. And then analysis of the results can start.
The initial planning will have protected the trial against systematic bias in the selection of pupil participants, their allocation to groups and their pre-test scores. However, there are a number of well-known threats to trials that must still be guarded against once the trial is underway. These threats include bias caused by pupil demoralisation or dropout once their group allocation is known. One possible solution is to use a waiting-list approach, whereby all pupils get the intervention but half have this delayed by one term (the groups merely become phases in the delivery of the innovation). It is hard to avoid the possibility of bias in the re-testing process since everyone now knows their group allocation. Automating the procedure (on-line), testing in whole classes (involving both phases at once), and use of external moderators are just some of the ways to reduce this threat. Other threats include diffusion, where the innovation is inadvertently cascaded beyond the allocated group (pupils passing resources to each other, teachers adjusting practice before the trial is complete etc.). For this reason, it is usual to observe and monitor the process of the innovation (for ‘fidelity to treatment’). All threats and peculiarities must be recorded.
Above all, those involved in running the trial must genuinely care more about finding out whether the innovation works or not (curiosity) than about showing that it does work (prejudice). It is this attitude of mind more than any technique that determines the success of the trial.
Analysis and interpretation
When the first innovation phase is complete, and all pupils have been re-tested, analysis can proceed. This is deliberately simple, because the planning and care should already have eliminated many of the alternative explanations for any result. In an aggregated trial, although individual schools may wish to look at their own results, the test scores will be passed over for an aggregated analysis of all relevant schools (to create the large numbers required). As far as possible the results should be included for all pupils in the original population, even where these had been absent or moved to a different school. Such things happen in real life for any innovation, and the analysis is then said to be by ‘intention to treat’. An overall ‘effect’ size is calculated as follows:
- The progress of each pupil is defined as the re-test score minus the initial (pre-test) score.
- The average progress of pupils in each group (phase) is defined as the sum of their progress scores divided by the number of pupils in the group.
- The difference between the average progress for each group is defined as the average for the innovation group minus the average for the comparison group.
- This difference is standardised by dividing it by the standard deviation of the progress scores in the comparison group.
- The standardised difference is the effect size.
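The steps above can be sketched in a few lines of code (the scores below are invented for illustration, not real trial data):

```python
from statistics import mean, stdev


def effect_size(innov_pre, innov_post, comp_pre, comp_post):
    """Standardised difference in average progress between the two groups."""
    # Progress = re-test score minus initial test score, per pupil
    innov_progress = [post - pre for pre, post in zip(innov_pre, innov_post)]
    comp_progress = [post - pre for pre, post in zip(comp_pre, comp_post)]
    # Difference in average progress, standardised by the standard
    # deviation of the progress scores in the comparison group
    diff = mean(innov_progress) - mean(comp_progress)
    return diff / stdev(comp_progress)


# Illustrative scores only: four pupils per group
es = effect_size(
    innov_pre=[100, 100, 100, 100],
    innov_post=[104, 106, 108, 110],
    comp_pre=[100, 100, 100, 100],
    comp_post=[102, 104, 106, 108],
)
```

A positive `es` means the innovation group made more progress on average; a value near zero means no detectable effect.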
A positive result means the innovation group did better. A result close to zero means no effect. Successful educational innovations often have small effect sizes of around 0.2. If the trial was conducted properly, then an effect size of 0.2 is good evidence that the innovation was the primary cause of the difference between the groups. The planning and care make interpretation of the result easy.
The analysis can also be repeated with pupil premium pupils, or for low attainers only. However, it is important that there is no ‘dredging’ for success. The primary measure (for success or failure) must be specified at the outset.
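Repeating the calculation for a pre-specified subgroup might look like the following sketch (the records and pupil-premium flags are hypothetical, and the progress scores invented for illustration):

```python
from statistics import mean, stdev

# Hypothetical records: (group, pupil_premium, progress score)
records = [
    ("innovation", True, 6), ("innovation", False, 4),
    ("innovation", True, 8), ("innovation", False, 2),
    ("comparison", True, 3), ("comparison", False, 5),
    ("comparison", True, 1), ("comparison", False, 3),
]


def subgroup_effect(records, flag):
    """Effect size restricted to pupils with the given binary characteristic."""
    innov = [p for g, f, p in records if g == "innovation" and f == flag]
    comp = [p for g, f, p in records if g == "comparison" and f == flag]
    return (mean(innov) - mean(comp)) / stdev(comp)
```

Such subgroup results are exploratory only: the verdict on the trial rests on the primary outcome measure specified before the data were seen.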
- Decide on the intervention, timing, and testing
- Select the cohorts to take part in the trial, and secure agreement where needed
- Ensure that IT works for an on-line test, and upload pupil data from SIMS
- Test pupils, and send the test results to EEF-appointed external evaluator
- Randomise pupils to groups/phases fairly (perhaps using playing cards, or names from a hat)
- Conduct the intervention, guarding against threats like preferential treatment of innovation group
- Permit the EEF-appointed external evaluator to observe innovation/testing in operation, if requested
- Re-test pupils, and send the new test results to EEF-appointed external evaluator
- Send comments and observations on the process of innovation to EEF-appointed external evaluator
- Calculate own school effect size, if desired
- Await aggregated results
Recommended further reading
Gorard, S. (2013) Research Design: Creating robust approaches for the social sciences, London: Sage – ISBN 978-1446249024