Dr Patrick White, University of Leicester
The importance of the study
Banerjee et al. (2007) (‘Remedying education: evidence from two randomized experiments in India’, Quarterly Journal of Economics, 122(3), pp. 1235-1264) is an important study because it addresses an issue of concern to all governments – identifying the potential for literacy and numeracy ‘catch-up’ among young students living in disadvantage or otherwise at risk of not reaching expected levels of attainment. The scale, design and analytic plan of the study reported in Banerjee et al. (2007) are all commendable. Around 15,000 young students in grades 3 and 4 were involved, in two areas, with interventions affecting different sub-sets of students. All students receiving the interventions were reported as falling behind.
It is crucial that the stated ‘effect’ sizes in Banerjee et al. (2007) are accurate because it is these that will form the basis for calculations of cost-effectiveness (or dollar cost per long-term unit of improvement). The study deals with both literacy and maths, and uses a technology approach and a supplementary teacher approach. It focuses on the kinds of young students most at risk, in urban India, where, as the authors report, the system is struggling to adapt to the widening of participation. And it reports two interventions that were successful in terms of gain scores, and revisits these scores after one year in order to assess ‘intervention decay’. This is a potentially important and widely relevant study and one that is certainly worthy of replication. What is proposed below is an ‘internal’ replication, considering the internal validity of the research results, and of the interpretation of those results, for the population examined in the original article.
Accessing and retrieving the data files and other material
The relevant data sets used in Banerjee et al. (2007) have been successfully located and retrieved from the MIT Dataverse website. These data sets are available in various formats and separated according to intervention, city and year. Additional files necessary for exact replications of particular data sets are also included, as are syntax files for use in the Stata software package. These files have been successfully opened and examined. Initial explorations suggest that they are suitable for replication of the original analyses, and for some extensions of these that are discussed below.
In addition to the article published in the Quarterly Journal of Economics (Banerjee et al. 2007), two additional publications relating to the research have also been located (Banerjee et al. 2004, 2005). Each of these publications contains additional information about the analyses conducted by the authors that is not available in Banerjee et al. (2007), presumably for reasons of space. Where appropriate and allowed by the available data, analyses conducted and reported in these earlier reports will be replicated in the reanalyses described below.
Some communication with the authors of the study may be required during the course of the re-analysis in order to clarify certain points regarding the nature of particular variables (where this is unavailable in the documentation) and the exact procedures undertaken during some of the analyses (where this cannot be deduced from the available syntax). Because, at this stage of the project, some uncertainty still exists around particular details, the plan of analysis outlined below should be taken as indicative of the analyses that will finally be undertaken. It is anticipated that a majority of analyses will be carried out as planned, however, and that additional lines of inquiry are likely to be identified in the course of the re-analysis.
The first phase of this new project will be a direct or ‘pure’ replication, involving re-running the analyses presented in Banerjee et al. (2007) and elsewhere using the same data and techniques as far as possible. These additional sources include mid-term tests and cost-benefit evaluations. These will also be replicated to assess whether an identical re-analysis of the data, using the same techniques, results in the same outcomes as reported by the original authors. Any discrepancy will be reported and investigated further. This phase also involves checking the syntax used in any analysis for face validity.
Measurement and estimation analysis
The remainder of the work, as time and resources allow, will be in the form of measurement and estimation analysis replication. This includes redefining the variables of interest, using alternative estimation techniques, and implementing other redefinition strategies. This will not involve new datasets or additional variables. It does not include a theory of change analysis. The analyses that follow will differ from those reported by Banerjee et al. (2007). These analyses can usefully be divided into three types:
- The first group of re-analyses will be very similar to those used by the study authors but will differ in relatively minor respects, such as using different data as outcome variables (e.g. post-test only scores, competency levels, non-normalised data or school-level scores) or using different formulas to calculate effect sizes.
- A second group of analyses will re-analyse the data using both the original techniques and those variations described above but excluding non-responding institutions and/or those allocated for treatment from the analyses.
- The third group of analyses will use different analytic techniques to those used by the study authors (e.g. sensitivity analysis and regression analyses including contextual data).
A summary of the planned analyses is outlined below.
The next phase will involve recalculating separate effect sizes for each intervention, in each city, and for each year. For the Balsakhi intervention, separate effect sizes will be calculated for literacy, numeracy, and literacy and numeracy combined. The computer-assisted learning (CAL) intervention only targeted mathematics attainment, and so effect sizes will be calculated only for differences relating to mathematics scores. For the separate literacy and numeracy scores in the Balsakhi intervention, and for the mathematics scores in the CAL intervention, effect sizes will be calculated and reported using standardised (normalised) scores and, unlike in the original report, non-standardised (non-normalised) data. Several different methods of calculating an effect size will be used for each analysis. Effect sizes will be calculated for both one-year and two-year periods, and again post-intervention to measure any ‘intervention decay’.
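To make the alternative effect-size formulas concrete, the sketch below implements three common standardised-mean-difference measures. It is written in Python purely for illustration (the original analyses were run in Stata), and the score lists are hypothetical placeholders, not the study’s variables.

```python
import math


def cohens_d(treat, control):
    """Standardised mean difference using the pooled standard deviation."""
    n1, n2 = len(treat), len(control)
    m1, m2 = sum(treat) / n1, sum(control) / n2
    v1 = sum((x - m1) ** 2 for x in treat) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in control) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd


def hedges_g(treat, control):
    """Cohen's d with the usual small-sample correction factor."""
    n = len(treat) + len(control)
    return cohens_d(treat, control) * (1 - 3 / (4 * n - 9))


def glass_delta(treat, control):
    """Standardise by the control-group SD only."""
    n2 = len(control)
    m1, m2 = sum(treat) / len(treat), sum(control) / n2
    v2 = sum((x - m2) ** 2 for x in control) / (n2 - 1)
    return (m1 - m2) / math.sqrt(v2)
```

Reporting all three side by side shows how sensitive a stated ‘effect’ is to the choice of standardiser, which matters when the treatment changes score variance as well as the mean.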
In Banerjee et al. (2007) the dependent variable for the above analyses is the change between pre-test and post-test scores. The re-analysis will also use this measure but will repeat all the above analyses using post-test scores only. As Gorard (2013) has shown, although traditional designs use the change between pre- and post-test scores, using post-test scores only reduces the chance of propagated measurement error forming a large proportion of the dependent variable. In the absence of such error propagation, and given reasonably balanced groups at the outset, both the pre- and post-test and the post-test-only approaches should produce equivalent results.
Given that the only probabilistic uncertainty is at the school-grade level (there was no randomisation of students individually), the most appropriate statistical analysis for significance will be at the school-grade level. This is as recommended by the Campbell Collaboration (e.g. Cochrane 2012). We will also calculate robust standard errors – adjusted for clustering at the school-grade level – for the differences between treatment and control groups (using both change between pre- and post-test scores, and post-test scores only, as dependent variables). Significance levels will also be calculated for the differences between means. Unlike in Banerjee et al. (2007), exact p-values will be reported in the re-analysis.
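One simple way of respecting school-grade-level randomisation is to aggregate student scores to cluster means and base the standard error on between-cluster variation only. The sketch below (a Python illustration with hypothetical data, not the Stata procedure actually used) shows the idea.

```python
import math
from collections import defaultdict


def cluster_means(scores, clusters):
    """Average student scores within each cluster (here, a school grade)."""
    groups = defaultdict(list)
    for score, cluster in zip(scores, clusters):
        groups[cluster].append(score)
    return [sum(v) / len(v) for v in groups.values()]


def diff_with_cluster_se(treat_scores, treat_clusters, ctrl_scores, ctrl_clusters):
    """Treatment-control difference in means, with a standard error
    computed from between-cluster variation (a cluster-mean analysis)."""
    t_means = cluster_means(treat_scores, treat_clusters)
    c_means = cluster_means(ctrl_scores, ctrl_clusters)

    def mean_and_var(xs):
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
        return m, v

    mt, vt = mean_and_var(t_means)
    mc, vc = mean_and_var(c_means)
    se = math.sqrt(vt / len(t_means) + vc / len(c_means))
    return mt - mc, se
```

Because the effective sample size becomes the number of school grades rather than the number of students, this analysis is more conservative than one that ignores clustering.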
The clearest indication of the direct impact of both interventions is likely to come from a consideration of the main large study in Vadodara, and including in the analyses only those schools initially agreeing to take part. This will form the basis for an alternative analysis. If the addition or exclusion of the later 24 schools in the Balsakhi Program, for example, leads to different substantive outcomes then this could be seen as lessening the security of the published results.
Variation in and the distribution of ‘competence levels’ will also be investigated at the bivariate level. Because of the categorical nature of these data, different types of analysis (and correspondingly different measures of ‘effect’) will need to be used. If sufficient variation and/or change in these levels is identified through bivariate analysis, the possibility of applying multivariate analyses to these outcomes will also be explored.
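For categorical outcomes of this kind, one obvious bivariate technique is a Pearson chi-square comparison of the competence-level distributions in the treatment and control groups. A minimal Python sketch (the contingency table is hypothetical; the original work used Stata):

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for a contingency table
    (rows = groups, columns = competence levels)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    grand = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat
```

A statistic near zero indicates the two groups are distributed across the competence levels in much the same way; the corresponding ‘effect’ measure would then be something like Cramér’s V rather than a standardised mean difference.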
All the above analyses will be repeated using data where initially non-responding schools and those where treatment did not materialise are excluded. While the results of any inferential tests and related outputs may be affected by this, any strong effect is unlikely to be dramatically reduced by the exclusion of these cases.
Where possible, similar bivariate analyses will be conducted with school-level data, for all dependent variables identified above. Aggregate measures of the performance of treatment and control groups will be used, eliminating any effect that clustering may have on standard errors. Although detail on individual variation is reduced in such an analysis, this approach is part of a wider attempt to test the strength of the observed effects of the treatment through the application of different analytic techniques using different levels of data.
The authors of the study use OLS regression to model the effect of receiving the intervention on the change between pre- and post-test scores. For the Balsakhi intervention, change in mathematics, language, and mathematics and language scores combined are used as three separate dependent variables. Three different models are specified:
- A model with an intercept, a dummy variable indicating whether a child received the Balsakhi intervention or not, and a control variable for the child’s pre-test score
- A similar model but without the control for pre-test score
- A difference-in-differences specification
The results of 36 separate regression analyses are presented in Banerjee et al. (2007), each corresponding to a different combination of city, year and dependent variable. The original authors also repeated these analyses without controlling for pre-test score and using a difference-in-differences (DD) specification. In the proposed replication, these analyses will be replicated exactly and the results presented in full (which was not possible in the original article). In the re-analyses, these analyses (apart from DD) will also be repeated using only post-test scores as the dependent variable in each model. If possible, multivariate analysis will also be conducted using competence levels (or change in competence levels) as an outcome variable. A variant of logistic regression analysis is likely to be most suitable for these analyses.
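To make the model specifications concrete, the sketch below implements OLS from first principles via the normal equations. It is a Python illustration only (the original models were estimated in Stata), suitable for the two or three regressors involved here; the comments indicate how the with- and without-pretest variants would be set up, and all variable names are placeholders.

```python
def ols(y, X):
    """Ordinary least squares via the normal equations X'X b = X'y,
    solved with Gaussian elimination (fine for a handful of regressors)."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Forward elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(col + 1, k):
            f = xtx[r][col] / xtx[col][col]
            xtx[r] = [a - f * b for a, b in zip(xtx[r], xtx[col])]
            xty[r] -= f * xty[col]
    # Back substitution
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (xty[i] - sum(xtx[i][j] * beta[j] for j in range(i + 1, k))) / xtx[i][i]
    return beta

# Gain-score model with pretest control:  y = gain,      X rows = [1, treated, pretest]
# Model without the control:              y = gain,      X rows = [1, treated]
# Post-test-only variant (re-analysis):   y = post-test, X rows = [1, treated]
```

With an intercept and a single treatment dummy, the coefficient on the dummy is simply the treatment-control difference in means, which provides a useful cross-check against the bivariate analyses.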
Banerjee et al. (2007) used a slightly different technique for the analysis of the Balsakhi intervention in Mumbai in Year 2 (and also for the combined analysis of the Vadodara and Mumbai Year 2 data). Because not all schools in the treatment group actually received a balsakhi, a two-stage least squares (2SLS) model was used in these analyses, with a dummy variable for intention to treat used as an instrument. These analyses will be repeated in the re-analysis.
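With a single binary instrument and a binary treatment, the 2SLS estimate reduces to the Wald ratio: the intention-to-treat effect divided by the first-stage difference in take-up. The sketch below (Python for illustration, with placeholder data, rather than the Stata estimation actually performed) shows that reduction.

```python
def wald_iv(y, d, z):
    """Wald (2SLS) estimate of the treatment effect, where z is the
    binary intention-to-treat instrument and d records actual receipt
    of treatment (e.g. whether a school really got a balsakhi)."""
    def mean_where(xs, flags, val):
        selected = [x for x, f in zip(xs, flags) if f == val]
        return sum(selected) / len(selected)

    itt = mean_where(y, z, 1) - mean_where(y, z, 0)      # reduced form (ITT effect)
    take_up = mean_where(d, z, 1) - mean_where(d, z, 0)  # first stage (compliance rate)
    return itt / take_up
```

Because take-up is below one, the 2SLS estimate scales the ITT effect up; a very weak first stage would make the ratio unstable, which is worth checking in the replication.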
As with the bivariate analyses, the multivariate re-analyses will report both robust standard errors and exact p-values for the coefficients associated with treatment.
All the above analyses will be repeated using data where initially non-responding schools and those where treatment did not materialise are excluded. This approach will inevitably compromise the random nature of the sampling and allocation to treatment, and so have a detrimental effect on the inferential statistical estimates. However, if the effect of the intervention is strong, this should not affect the coefficients associated with receiving treatment in such a way as to substantially alter any conclusions drawn from the findings.
It would also be useful, if the appropriate data still exists, to run the analyses for test items relevant to grades 1 to 4 separately. We know, from the existing research report, that the interventions appear more effective for the students with lower pre-test scores. What we hope to discover is whether this is related to the material of each grade or whether it is a ‘flat’ effect on materials covered in all grades 1 to 4. In all of these analyses we will use variants of regression modelling, appropriately adjusted, to predict student and school outcome scores using all available data.
Further extensions to the analysis
Three further analytic approaches will be used with the data provided, given appropriate conditions and as time and resources allow. These are outlined briefly below.
This analysis would investigate the impact of any drop-out from the intervention by estimating how different drop-outs would have had to be from others in their allocated groups for the difference between groups to be zero. This is a ‘what-if?’ analysis, based on a series of iterated calculations, starting from an unrealistic assumption that all dropout cases would have had scores in the opposite direction to the reported conclusion, and working out from there how many would in fact have needed such scores for the reported difference to disappear. This is not an exact calculation, since it depends not only on the direction of each difference but also its scale. However, it does give a clear feel for how much faith to put in the results, given any reported level of dropout. This is an issue that standard statistical analyses and even imputation procedures completely ignore.
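The iterated ‘what-if?’ calculation might be sketched as follows. This is an illustrative Python sketch under the simplifying assumption that every dropout is assigned the same pessimistic counterfactual score; the function name and data are hypothetical.

```python
def dropouts_needed(treat, control, counterfactual):
    """How many dropout cases, each assigned the pessimistic
    counterfactual score, must be added back to the treatment group
    before the treatment-control difference in means disappears?"""
    c_mean = sum(control) / len(control)
    if counterfactual >= c_mean:
        raise ValueError("a counterfactual score this high can never erase the difference")
    scores = list(treat)
    added = 0
    # Add pessimistic counterfactual cases one at a time until the
    # treatment mean is dragged down to the control mean.
    while sum(scores) / len(scores) > c_mean:
        scores.append(counterfactual)
        added += 1
    return added
```

If the number returned exceeds the actual reported dropout, the finding is robust to attrition under even this worst-case assumption; if it is smaller, the result rests partly on untestable assumptions about the missing cases.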
The regression model used by the authors is admirably simple. However, it would be interesting to rerun it with the addition of all available and relevant contextual variables for individual students. This could investigate further which subgroups are likely to benefit most, and whether there is a value-added or halo effect for peers who do not receive the intervention. The available contextual data are limited to:
- age or DOB (age for Balsakhi, DOB for CAL)
- pre-test score
- whether the student was in the lowest 20 (presumably per cent; the documentation does not specify) of pre-test scores
- ‘competency level’ (literacy/numeracy, levels 1 to 3).
At the school level (c. 75 schools) we could also add aggregated individual scores (proportion of students of either sex, for example) and whether the grade size is below 50 students.
It appears that stratified random allocation to groups was used throughout. It would be useful to assess the impact of this allocation strategy on the range of pre-test scores (at least), both by following the same steps as in the original study and by simulating different approaches, such as more complex matching of cases into pairs for comparison. The Banerjee et al. paper does not seem to provide a table of descriptive statistics showing balance in the other pre-treatment covariates (besides pre-test scores). It therefore seems important to look at whether balance was achieved for a larger set of pre-treatment covariates.
The overall results will show to what extent developing countries might wish to use their resources for either intervention, and at what grade and learning level each intervention is most cost-effective. The final report will transform the results into clear recommendations for policy and practice.
References
Banerjee, A., Jacob, S. and Kremer, M. (2004) Promoting School Participation in Rural Rajasthan: Results from Some Prospective Trials, MIT Department of Economics Working Paper
Banerjee, A., Cole, S., Duflo, E. and Linden, L. (2005) Remedying Education: Evidence from Two Randomized Experiments in India, NBER Working Paper No. 11904
Banerjee, A., Cole, S., Duflo, E. and Linden, L. (2007) Remedying education: evidence from two randomized experiments in India, Quarterly Journal of Economics, 122(3), pp. 1235-1264
Cochrane (2012) http://www.cochrane-net.org/openlearning/html/modA2-4.htm
Gorard, S. (2013) Research Design: Robust approaches for the social sciences, London: Sage