### Marbles! Confusion over significance testing

**Stephen Gorard**

Consider this relatively simple problem of probability. A bag contains 100 balls of identical size, of which 30 are red and 70 are blue. If someone picks one ball at random from the bag, what is the probability it will be red? This is a good example of a mathematical question that might appear in a simple test paper, and that has some applications in real life, such as in gaming. It is similar to asking about tosses of unbiased coins or rolls of unbiased dice. We have perfect information about the size of the population of balls (there are 100), and the distribution of the characteristics of interest (30:70). Given these clear initial conditions it is easy to see that the chance of drawing a red ball from the bag is 30/100 (30%). It is almost as easy to see that the chance of drawing two red balls one after another (putting each back after picking it) is 30/100 times 30/100 (9%), or that the chance of drawing two red balls at the same time is 30/100 times 29/99 (nearer 8.8%).

When social scientists conduct a significance test, this is the situation they assume they are involved in. They assume an initial condition about the prevalence of the characteristics of interest in the population and then calculate, in much the same way as for coloured balls, the probability of observing the data that they do observe. The calculation is relatively simple and can easily be handled by a computer. The analyst then knows, if their assumption is true, how probable their observed data is.

Now consider a rather different problem of probability. A bag contains 100 balls of identical size, of two different colours (red and blue). We do not know how many of each colour there are. If someone picks a red ball at random from the bag, what does this tell us about the distribution of colours in the bag (beyond the fact that it must have originally contained at least one red ball)? It seems to tell us very little. There could be 30 red balls out of 100, or 70, or 99. The drawing of one red ball does not really help us to decide between these feasible alternatives. We certainly cannot use the existence of the red ball to calculate probable distributions in the population, because we do not have perfect information (unlike the first example).

Yet this situation is much more life-like in being a scientific problem rather than a mathematical one. In social science we rarely have perfect information about a population, and if we did have it we would generally not bother sampling (because we would already know how many balls are of each colour or how many people are in each income bracket). The more common situation is where we have information about a sample (the colour of one or two balls), and wish to use it to estimate something about the population (all of the balls in the bag). This is clearly impossible.
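The contrast between the two bags can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper `p_red` and the variable names are hypothetical, not taken from the source. It reproduces the figures for the first bag, then shows that for the second bag a single red ball is a perfectly possible outcome under each of the compositions mentioned (30, 70 or 99 red), so the draw alone cannot settle which one is true.

```python
from fractions import Fraction


def p_red(n_red, total=100):
    """Chance of drawing one red ball from a bag with a KNOWN composition."""
    return Fraction(n_red, total)


# First bag: composition is known (30 red, 70 blue).
p_one_red = p_red(30)
p_two_with_replacement = p_one_red ** 2                     # ball put back each time
p_two_without_replacement = p_red(30) * Fraction(29, 99)    # both drawn together

print(f"one red:                  {float(p_one_red):.1%}")
print(f"two red, with replacement: {float(p_two_with_replacement):.1%}")
print(f"two red, same time:        {float(p_two_without_replacement):.1%}")

# Second bag: composition is unknown. For each composition named in the text,
# compute how probable a single red draw would be IF that composition held.
candidate_reds = [30, 70, 99]
likelihoods = {n: p_red(n) for n in candidate_reds}

for n, p in likelihoods.items():
    print(f"if {n} of 100 are red, P(draw red) = {float(p):.0%}")
```

Each line of the second loop is a statement about the data given an assumed bag. None of them is a statement about the bag given the data.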

When social scientists conduct a significance test, this is the situation they are actually involved in. But estimating the population from a sample like this is so hard that they use an artificial device to help. They try to convert the second scenario into the first by assuming a hypothesis about the population at the outset. For example, they might assume beforehand that the bag with 100 balls of two colours has 50 red and 50 blue balls. Then if they draw a red ball at random they believe they can use this knowledge to help estimate the number of red balls in the bag. But can they? As in the first example, the calculation of the probability of drawing one red ball from a bag of 50 red and 50 blue is quite easy (it is 50/100 or 50%). This can be handled by a computer. The analyst then knows, if their assumption is true, how probable their observed data is. But how do they convert this probability into what they actually want, which is the probability that there are indeed 50 red and 50 blue balls? This is much trickier, and requires a more complex calculation involving more information than is usually available.
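One way to see what the "more complex calculation" involves is Bayes' rule, which turns the probability of the data given a hypothesis into the probability of the hypothesis given the data, but only once a prior belief about the bag is supplied. The sketch below is an assumption-laden illustration, not the author's method: both priors are invented for the example, and the function names are hypothetical.

```python
from fractions import Fraction

TOTAL = 100


def likelihood_red(n_red):
    """P(draw one red ball | the bag holds n_red red balls)."""
    return Fraction(n_red, TOTAL)


def posterior_after_red(prior):
    """Bayes' rule: P(n_red | one red ball drawn) for each n_red in the prior."""
    unnormalised = {n: likelihood_red(n) * p for n, p in prior.items()}
    evidence = sum(unnormalised.values())  # overall chance of drawing red
    return {n: weight / evidence for n, weight in unnormalised.items()}


# ASSUMPTION 1 (invented): before drawing, every composition from 1 to 99 red
# balls is treated as equally likely.
candidates = range(1, TOTAL)
uniform_prior = {n: Fraction(1, len(candidates)) for n in candidates}

# ASSUMPTION 2 (invented): before drawing, the bag is believed to hold either
# 50 or 70 red balls, each equally likely.
two_point_prior = {50: Fraction(1, 2), 70: Fraction(1, 2)}

p_data_given_h = likelihood_red(50)                         # the "easy" number
p_h_given_data_uniform = posterior_after_red(uniform_prior)[50]
p_h_given_data_two_point = posterior_after_red(two_point_prior)[50]

print("P(red | 50:50 bag)              =", p_data_given_h)
print("P(50:50 bag | red), uniform prior =", p_h_given_data_uniform)
print("P(50:50 bag | red), 50-or-70 prior =", p_h_given_data_two_point)
```

The same red ball yields different answers under the two priors, and neither answer equals the 50% that the significance-test style calculation produces. The prior is exactly the extra information that is usually not available.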

Gorard, S. (2013) *Research Design: Creating robust approaches for the social sciences*, London: Sage