It's the Effect Size, Stupid: What effect size is and why it is important


Citation: Coe, Robert. 2002. It’s the effect size, stupid: what effect size is and why it is important. Paper presented at the Annual Conference of the British Educational Research Association, University of Exeter, England, 12-14 September 2002.

Author: Robert Coe, Professor of Education, Durham University (Assessment, Evaluation, Evidence-based school improvement)

Abstract Source:

Effect size is a simple way of quantifying the difference between two groups that has many advantages over the use of tests of statistical significance alone. Effect size emphasises the size of the difference rather than confounding this with sample size. However, primary reports rarely mention effect sizes and few textbooks, research methods courses or computer packages address the concept. This paper provides an explication of what an effect size is, how it is calculated and how it can be interpreted. The relationship between effect size and statistical significance is discussed and the use of confidence intervals for the latter outlined. Some advantages and dangers of using effect sizes in meta-analysis are discussed and other problems with the use of effect sizes are raised. A number of alternative measures of effect size are described. Finally, advice on the use of effect sizes is summarised.

During 1992, Bill Clinton and George Bush Snr. were fighting for the presidency of the United States. Clinton was barely holding on to his place in the opinion polls. Bush was pushing ahead, drawing on his stature as an experienced world leader. James Carville, one of Clinton’s top advisers, decided that their push for the presidency needed focusing. Drawing on the research he had conducted, he came up with a simple focus for their campaign. At every opportunity, Carville wrote four words - ‘It’s the economy, stupid’ - on a whiteboard for Bill Clinton to see every time he went out to speak.

‘Effect size’ is simply a way of quantifying the size of the difference between two groups. It is easy to calculate, readily understood and can be applied to any measured outcome in Education or Social Science. It is particularly valuable for quantifying the effectiveness of a particular intervention, relative to some comparison. It allows us to move beyond the simplistic, ‘Does it work or not?’ to the far more sophisticated, ‘How well does it work in a range of contexts?’ Moreover, by placing the emphasis on the most important aspect of an intervention - the size of the effect - rather than its statistical significance (which conflates effect size and sample size), it promotes a more scientific approach to the accumulation of knowledge. For these reasons, effect size is an important tool in reporting and interpreting effectiveness.
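
A minimal sketch of the calculation described above, assuming the common definition of effect size as the standardised mean difference (the difference in group means divided by a pooled standard deviation). The function names and the sample scores are hypothetical and chosen only for illustration. The second helper illustrates why statistical significance conflates effect size with sample size: for two equal-sized groups, the t statistic equals the effect size multiplied by sqrt(n/2), so it grows with n even when the effect size stays fixed.

```python
from math import sqrt
from statistics import mean, stdev


def effect_size(treatment, control):
    """Standardised mean difference using a pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    var_t, var_c = stdev(treatment) ** 2, stdev(control) ** 2
    pooled_sd = sqrt(((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


def t_statistic_equal_groups(d, n_per_group):
    """t statistic implied by effect size d with n_per_group in each group."""
    return d * sqrt(n_per_group / 2)


# Hypothetical test scores, not taken from the paper.
treatment_scores = [14, 16, 18, 20, 22]
control_scores = [12, 14, 16, 18, 20]

d = effect_size(treatment_scores, control_scores)
print(f"effect size: {d:.2f}")

# Same effect size, different sample sizes: only the test statistic changes.
for n in (5, 50, 500):
    print(n, round(t_statistic_equal_groups(d, n), 2))
```

With the same effect size, a larger sample pushes the test statistic past any significance threshold, which is why a significance test alone says little about how large the difference is.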

The routine use of effect sizes, however, has generally been limited to meta-analysis - for combining and comparing estimates from different studies - and is all too rare in original reports of educational research (Keselman et al., 1998). This is despite the fact that measures of effect size have been available for at least 60 years (Huberty, 2002), and the American Psychological Association has been officially encouraging authors to report effect sizes since 1994 - but with limited success (Wilkinson et al., 1999). Formulae for the calculation of effect sizes do not appear in most statistics text books (other than those devoted to meta-analysis), are not featured in many statistics computer packages and are seldom taught in standard research methods courses. For these reasons, even the researcher who is convinced by the wisdom of using measures of effect size, and is not afraid to confront the orthodoxy of conventional practice, may find that it is quite hard to know exactly how to do so.

The following guide is written for non-statisticians, though inevitably some equations and technical language have been used. It describes what effect size is, what it means, how it can be used and some potential problems associated with using it.


Source: My summary of the Conclusion + Skim + My ideas.
Summary:
1. Motivates and describes the use of a metric for quantifying experimental results.
2. Effect size is normally a highly meaningful measure of the differences found in an experiment, so it should be included in all results, in preference to tests of statistical significance alone, which conflate the size of the effect with the sample size.
3. Raw ‘unstandardised’ mean differences, together with a confidence interval, might be better than the effect size metric when a sample has restricted range or does not come from a Normal distribution, or if the measurement from which it was derived has unknown reliability.
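
The third point can be sketched as follows: a raw mean difference reported with a confidence interval. This is only an illustration under stated assumptions. It uses a large-sample Normal approximation for the interval (for small samples a t-based interval would be more appropriate), and the function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(treatment, control, level=0.95):
    """Raw mean difference with a Normal-approximation confidence interval."""
    diff = mean(treatment) - mean(control)
    # Standard error of the difference between two independent means.
    se = sqrt(variance(treatment) / len(treatment)
              + variance(control) / len(control))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return diff, diff - z * se, diff + z * se


# Hypothetical test scores, not taken from the paper.
treatment_scores = [14, 16, 18, 20, 22]
control_scores = [12, 14, 16, 18, 20]

diff, lower, upper = mean_difference_ci(treatment_scores, control_scores)
print(f"difference: {diff:.2f}, 95% CI: [{lower:.2f}, {upper:.2f}]")
```

The result stays in the original units of the measurement, so it does not depend on an estimate of the standard deviation being a sensible yardstick.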



Great introduction for building intuition. See: 1. Figure 3: Normal and non-Normal distributions with effect-size = 1; 2. Table II: Examples of average effect sizes from research (example: Students’ test performance in reading 0.30).
(I wish we had more categories than Rigor/Coolness/Novelty. This comment perhaps isn’t exactly about ‘rigor’.)

Re: Rigor



(Very minor) To make the point more concrete, one could include an extreme example. Suppose an experiment’s outcome is ‘age’ and the control leg has a very wide spread, e.g. sigma=40. If the difference in mean age between the test leg and the control leg was 2, how much improvement was there? Clearly, very little: the difference is only 2/40 = 0.05 standard deviations. When I was doing yield analysis, I naturally hit on the effect size metric without giving it a name. It was clear that differences between my samples could be dwarfed by the noise.
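
The arithmetic behind this example, as a small sketch. It assumes that sigma=40 is the standard deviation used as the yardstick and that both legs are roughly Normal. Under those assumptions, the Normal CDF of the effect size gives the share of the control leg that falls below the average member of the test leg. The variable names are illustrative only.

```python
from statistics import NormalDist

mean_difference = 2.0   # difference in mean age between the two legs
sigma = 40.0            # spread of the control leg, from the example above

effect = mean_difference / sigma

# Share of the control leg below the test leg's mean (Normal assumption).
share_below = NormalDist().cdf(effect)

print(f"effect size: {effect:.2f}")
print(f"share of control below test mean: {share_below:.1%}")
```

A share this close to 50% means the two distributions overlap almost entirely, which matches the intuition that the difference is dwarfed by the noise.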

Re: Rigor