A claim that something acts as a cause of something else is a very strong one. This is partly because it is an intrinsically difficult thing to claim. We cannot observe a causal model directly, and it is even possible to argue that we can explain the world without recourse to causation at all – if everything were random, for example (Gorard 2013). We have proposed a four-criteria model for establishing the feasibility of a causal model, building on the prior work of Hume (1962), Mill (1882) and Bradford-Hill (1966).
1. For X (a possible cause) and Y (a possible effect) to be in a causal relationship they must be repeatedly associated. This association must be strong, clearly observable, replicable, and it must be specific to X and Y.
2. For X (a possible cause) and Y (a possible effect) to be in a causal relationship, they must occur in sequence. X must always precede Y (where both appear), and the appearance of Y must be safely predictable from the appearance of X.
3. For X (a possible cause) and Y (a possible effect) to be in a causal relationship, it must have been demonstrated repeatedly that an intervention to change the strength or appearance of X then also strongly and clearly changes the strength or appearance of Y.
4. For X (a possible cause) and Y (a possible effect) to be in a causal relationship, there must be a coherent mechanism to explain the causal link. This mechanism must be the simplest available without which the evidence cannot be explained. Put another way, if the proposed mechanism were not true then there must be no simpler or equally simple way of explaining the evidence for it (Gorard 2002b).
Clearly such a model is not intended to deny that one-off events can be caused, or that causation can be mutual. If each criterion is seen as necessary (though not individually sufficient) for a causal model, then any evidence relevant to at least one of these criteria can contribute to the search for causal mechanisms, through the falsification principle. A cross-sectional study that finds no association between X and Y reduces the likelihood that there is a causal mechanism from X to Y, and so on. No one study is likely to be able to address all four criteria at once. This is not an especially severe threshold for evidence of causation.
However, one of the most noticeable themes from conducting a series of research syntheses (see Chapter Three) is how frequently research reports used strong causal terms to describe their findings, without any apparent justification. Abbott (1998, p.149) complained that ‘an unthinking causalism today pervades our journals’, meaning that correlation, pattern or even opinion was too often described in causal terms. Some studies are really studies of association, mis-described as causal through the use of increasingly complex statistical analyses with passive or cross-sectional datasets, and through using terms like ‘effects’ to mean ‘relations’ or ‘associations’. It was quite common for studies before 2000 to interpret coefficients in regression models as effects and the explanatory variables as causes (e.g. Bachman and O’Malley 1977, Green et al. 1984). Studies using in-depth or small-scale datasets are even worse in this respect (see Brown et al. 2004, for example).
If anything, this unthinking causalism has worsened since 2000. Robinson et al. (2007) reviewed education journal contents from 1994 to 2004, and reported a decline from 45% to 33% in studies using interventions, but a growth in the use of causal statements in non-intervention studies from 34% to 43%. The particular culprit here was statistical modelling, including hierarchical linear modelling (HLM, or multilevel modelling), structural equation modelling and path analysis, which researchers routinely misunderstood as some kind of test of causation (Frank 2000, Shadish et al. 2002). No regression coefficient can be interpreted as causal until the list of all possible confounds has been exhausted (Pratt and Schlaifer 1988, Sobel 1998). But there is a ‘potentially inexhaustible list of potentially confounding variables’ (Pan and Frank 2003, p.23), and, perhaps because of the traditions of the different relevant disciplines such as psychology and sociology and their sub-divisions and themes, a lot of research studies variables in isolation rather than in concert (Newton 2010). This means that much of the associational and longitudinal work is open to doubt about omitted variable bias. A key problem for modelling is that the post hoc fitting of results can yield very misleading associations. A simple model with one outcome variable (attainment at school, perhaps) and a number of predictor variables can yield an R-squared of 1 even if the actual values of all variables are merely random numbers (Gorard 2008a). Regression techniques can uncover ‘patterns’ that simply do not exist. But the problem is more general than this. It pervades nearly all research.
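The R-squared pitfall just described is easy to reproduce. Below is a minimal sketch (illustrative only, not code from any of the studies cited) that fits an ordinary least-squares model with as many parameters as cases, using nothing but random numbers, and obtains a ‘perfect’ R-squared of 1:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20                                   # number of cases
y = rng.random(n)                        # outcome: pure random numbers
# Design matrix: an intercept plus (n - 1) random predictors,
# i.e. as many parameters as cases
X = np.column_stack([np.ones(n), rng.random((n, n - 1))])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
y_hat = X @ beta
ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 6))               # 1.0
```

With 20 cases and 20 parameters the design matrix is square and (for random values) almost surely invertible, so the model fits the random outcome exactly; the apparent ‘pattern’ is pure overfitting, with no real relationship present at all.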
One of the chief sources for the evidence on which this book is based is a series of rigorous reviews and syntheses of the pre-existing research – both published and unpublished. Each followed a similar approach, differing chiefly in their topic of interest. The reviews concerned:
- the causal link between individual attitudes and educational outcomes such as attainment and participation;
- the causal link between individual behaviour and educational outcomes;
- the causal link between parental attitudes/behaviour and their children’s educational outcomes;
- the causal link between teacher qualities or behaviour and their students’ educational outcomes;
- interventions to improve parental engagement in their children’s education;
- interventions to improve literacy and numeracy for disadvantaged students;
- interventions to improve post-compulsory participation for ethnic minority students; and
- how to widen participation in higher education for disadvantaged students.
The hunt for evidence involved electronic searches of the main educational, sociological and psychological databases. These included ASSIA, Australian Education Index, British Education Index, EPPI-Centre database of education research, ERIC, International Bibliography of the Social Sciences, PsycINFO, Social Policy & Practice, Social Science Citation Index, and Sociological Abstracts. Following a substantial scoping review to test the sensitivity of the search terms, a standard and very inclusive statement of search terms was used for each database (adjusted to suit the idiosyncrasies of each). This statement of search terms was tested, adjusted and retested iteratively to ensure that as little relevant material as possible was missed. A key purpose of the search was also to gather grey (unpublished) literature wherever possible, so as to reduce the possibility of publication bias. To these results were added those from hand searches of new journal issues, and suggestions from experts in the field.
An example of the full search ‘syntax’ of the terms and the logical operators (and, or, not), used when looking for evidence on the causal link between attitudes and educational outcomes, was:
(attainment OR test score* OR school outcome OR qualification OR exam* OR proficiency OR achiev* OR “British Ability Scales” OR “Key Stage” OR NEET OR “sixth form” OR college OR post-16 OR “post-compulsory” OR “postcompulsory”) AND (attitud* OR expectation* OR aspiration* OR behaviour* OR intention* OR motivation OR self-efficacy OR locus of control OR “family background” OR “home background” OR SES OR “socio-economic status” OR “socioeconomic status” OR poverty OR disadvantage OR “low income” OR deprivation) AND (child* OR school) AND (caus* OR effect* OR determinant* OR “regression discontinuity” OR “instrumental variables” OR experiment* OR longitudinal OR randomi?ed control* OR controlled trial* OR cohort stud* OR meta-analysis OR “systematic review”)
Inevitably, each search initially yielded tens or even hundreds of thousands of separate research reports. These were first screened for relevance and duplication by title, and the remaining reports were then double-screened by abstract. To be included in the subsequent synthesis, a report had to be comprehensible, relevant to the topic, and describe its methods and evidence in reasonable detail. In general, the reviews included only material published in English, from 1997 until early 2012. Some studies prior to this period were also included where they were well-cited pieces, or were directly relevant pioneering work validated by the What Works Clearinghouse. The quality of any research, as evidenced by its full report, was used to judge how much weight to place on its evidence. The reviewers then synthesised all reports, regardless of quality, according to the four causal criteria (above). If each criterion is seen as necessary (though not individually sufficient) for a causal model, then any study including evidence relevant to at least one of these four criteria can contribute to the search for causal mechanisms. This inclusive approach, supported by others such as Lykins (2012), is based on our earlier reviews, which found that the major problem with poor quality research lies in its unwarranted conclusions rather than necessarily in the evidence, or kind of evidence, it presents.
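The two-stage screening described above can be sketched in outline as follows. This is a hypothetical illustration only: the record fields, function names and toy criteria are invented here to show the title-then-abstract filtering logic, not taken from the reviews themselves.

```python
from dataclasses import dataclass

@dataclass
class Report:
    # Hypothetical record of one retrieved research report
    title: str
    abstract: str

def screen(reports, title_relevant, abstract_relevant):
    """Stage 1: remove duplicates and irrelevant titles.
    Stage 2: screen the remainder by abstract."""
    seen, shortlist = set(), []
    for r in reports:
        key = r.title.strip().lower()     # crude duplicate detection by title
        if key in seen:
            continue
        seen.add(key)
        if title_relevant(r.title):
            shortlist.append(r)
    return [r for r in shortlist if abstract_relevant(r.abstract)]

# Illustrative use with toy inclusion criteria
reports = [
    Report("Attitudes and attainment", "Longitudinal study of pupil attitudes."),
    Report("Attitudes and attainment", "Duplicate record."),
    Report("Crop yields in 1850", "Agricultural history."),
]
kept = screen(reports,
              title_relevant=lambda t: "attainment" in t.lower(),
              abstract_relevant=lambda a: "study" in a.lower())
print(len(kept))  # 1
```

In practice the abstract stage was double-screened (two reviewers per record), and the inclusion judgements were substantive rather than keyword matches; the sketch only conveys the funnel from tens of thousands of records down to the synthesised set.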
However, it was noticeable that it was possible to devise a plausible explanatory mechanism for the effect of any mental concept, even where there is no empirical evidence of effect, or even where there is good evidence of no effect. This suggests that the theorised mechanism is the least important part of any causal model, and so it is given less emphasis in the following chapters. If it is clear that altering an attitude works to improve attainment with no damaging unintended consequences and at reasonable cost, for example, then it matters less if the mechanism is not understood. On the other hand, even the most convincing explanation possible is of little consequence if the attitude has no discernible or beneficial effect on educational outcomes. Evidence for all three of the other elements – association, sequence and intervention – must be present in order to be confident that any relationship is causal. In general, there was no consistent reporting of effect sizes in the studies, so no meta-analysis was possible (Gorard 2013). Hence, it is also not yet possible to conduct a cost-benefit analysis of interventions in any area.
For each review there will be studies that have been missed. This would only matter if their inclusion would have substantially altered the conclusions based on the hundreds of thousands of studies that were used. A more concerning issue is that there may be studies, or commercial evaluations of learning artefacts, missed because they have no publicly available or on-line reports. These are perhaps more likely to be negative or neutral evaluations than positive ones. Given that there are also well-known problems such as the so-called ‘Hawthorne’ effect, and the higher effect sizes found in research conducted with training, expertise, resources and enthusiasm than in the roll-out of the same interventions, readers should assume that each review paints a somewhat more optimistic picture than full disclosure would reveal.
For further details of each review, see Gorard et al. (2012a), Gorard et al. (2012b), Gorard and See (2012), and See et al. (2012).