I’ve written several posts about the 2006 study of Iraqi deaths by Burnham et al., but not about its predecessor, published in 2004 by Roberts et al.  (This blog started in August 2005.)  Both studies analysed data collected in personal interviews at Iraqi households selected by cluster sampling, and both purported to find that tens of thousands of Iraqis had died because of the US invasion.  Both were published in The Lancet.

The 2004 study is posted here and the 2006 study here.  (Both files are PDFs.)

David Kane of Harvard University’s Institute for Quantitative Social Science has written an intriguing critique of the 2004 study.  The random selection of clusters in that survey turned up a neighbourhood in Falluja, which had recently been subjected to intensive, widespread bombing.  The estimated death rate in the Falluja cluster was far above that of any other cluster sampled, more than seven times the second-highest observation, and the authors decided to exclude that observation from their final analysis.  Dr Kane questions the decision to brand Falluja an outlier and remove its data.
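To make “far above the other clusters” concrete, one might compare the largest cluster value with the runner-up and compute a robust outlier score for every cluster. The sketch below is a minimal illustration under stated assumptions: the cluster values are made up (they are not the survey’s data), and the modified z-score with a 3.5 cutoff is a common rule of thumb, not a method that Roberts et al. or Kane report using. The function names are hypothetical.

```python
from statistics import median

# Hypothetical per-cluster values (NOT the survey's data); the last one is extreme.
CLUSTER_VALUES = [2, -1, 3, 1, 4, 0, 2, 3, 1, 5, 40]


def top_two_ratio(values):
    """Ratio of the largest value to the second-largest value."""
    first, second = sorted(values, reverse=True)[:2]
    return first / second


def modified_z_scores(values):
    """Robust z-scores built from the median and the median absolute deviation (MAD).

    Assumes the MAD is non-zero; otherwise the division below fails.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # 0.6745 rescales the MAD so the score is roughly comparable to an
    # ordinary z-score when the data are normally distributed.
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, cutoff=3.5):
    """Return the values whose modified z-score exceeds the cutoff in magnitude."""
    return [v for v, z in zip(values, modified_z_scores(values)) if abs(z) > cutoff]


print(top_two_ratio(CLUSTER_VALUES))
print(flag_outliers(CLUSTER_VALUES))
```

With these toy numbers, only the extreme cluster is flagged. Note what such a flag does and does not say: it marks a value as numerically extreme, but it cannot tell whether the value reflects a recording error or a place that was genuinely hit much harder, which is exactly the question at issue here.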

From page 5 of the 2004 study:

[T]he Falluja cluster is an obvious outlier and might not belong with the others.

And this, from the findings summary on page 1:

The risk of death was estimated to be 2·5-fold (95% CI 1·6–4·2) higher after the invasion when compared with the pre-invasion period. Two-thirds of all violent deaths were reported in one cluster in the city of Falluja. If we exclude the Falluja data, the risk of death is 1·5-fold (1·1–2·3) higher after the invasion. We estimate that 98000 more deaths than expected (8000–194000) happened after the invasion outside of Falluja and far more if the outlier Falluja cluster is included.

First of all, it is eminently arguable that, as Kane says, “Falluja is a legitimate data point”.  Outliers are usually discarded when they result from measurement error, such as data entry mistakes or surveyor error.  That explanation does not fit the Falluja data, because the data come from the study authors’ own survey.  Presumably, they went back to their raw survey records to verify the calculated mortality rates.

Moreover, given that the hypothesised cause of higher deaths was an armed invasion, one would expect the increase in mortality to vary widely across the country, with some regions experiencing far greater increases in death rates than others.  A cluster like Falluja is just what one would anticipate.  By that line of reasoning, Falluja is indeed a legitimate data point.

In fact, the Falluja observation bolsters the authors’ argument that the US invasion led to a disturbingly large increase in deaths among Iraqi civilians.  (See the political perspective evident in the final paragraphs of the paper.)  Why, then, would the authors toss it out?

Dr Kane’s analysis suggests an answer.  He shows that, had Falluja been included, the confidence interval would have been much wider than the one reported, extending further out at both ends.  Indeed, the lower bound would have been negative, meaning that the number of excess deaths would not have been statistically significantly different from zero.

[I]ncluding this cluster — i.e., using all the available data — generates a result with such a wide confidence interval that the reported increase in Iraqi mortality becomes statistically insignificant.

Dr Kane estimates that, with the Falluja data included, the confidence interval would run from −130,000 to 659,000, with a mean of 264,000.  The published confidence interval, which excludes Falluja, was 8,000 to 194,000, with a central estimate of 98,000.
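The mechanism Kane describes, in which adding one extreme cluster raises the central estimate yet pushes the lower bound of the interval below zero, can be reproduced with a toy calculation. The sketch below assumes made-up per-cluster excess-death figures (not the survey’s data) and uses a plain normal-approximation interval for a mean, mean ± z·s/√n. The published analyses used methods that account for the cluster design, so this illustrates only the direction of the effect, not the studies’ actual computation. The variable and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical excess deaths per cluster (NOT the survey's data).
# Negative values mean fewer deaths than before; 40 plays the role of an extreme cluster.
WITHOUT_OUTLIER = [2, -1, 3, 1, 4, 0, 2, 3, 1, 5]
WITH_OUTLIER = WITHOUT_OUTLIER + [40]


def normal_ci(values, level=0.95):
    """Return (mean, lower, upper) using the normal approximation mean +/- z * s / sqrt(n)."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values))  # sample standard deviation / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return m, m - z * se, m + z * se


for label, data in [("without outlier", WITHOUT_OUTLIER), ("with outlier", WITH_OUTLIER)]:
    m, lo, hi = normal_ci(data)
    print(f"{label}: mean {m:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

In this toy example the extreme cluster more than doubles the mean, but it inflates the standard error even more, so the interval that excluded zero without it straddles zero once it is added, the same pattern as the two sets of figures above.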

In short, had Falluja been included in the analysis, the study would not have been able to reach any definite conclusions at all.

The Lancet authors cannot reject the null hypothesis that mortality in Iraq is unchanged.

In that case, the study would almost certainly not have been submitted for publication, let alone published.

h/t: Michelle Malkin

Previous related posts: