Earlier this week, British medical journal The Lancet published a study estimating that, since the US-led invasion in March 2003, almost 655,000 Iraqis have died who would not have died had the invasion not occurred. That estimate is far above previous estimates of post-invasion Iraqi deaths, which generally range between 40,000 and 120,000. Immediately, the study received widespread attention and generated a great deal of controversy in the media, in the halls of government, and around the blogosphere.
The article is entitled “Mortality after the 2003 invasion of Iraq: a cross-sectional cluster sample survey” by Gilbert Burnham, Riyadh Lafta, Shannon Doocy, and Les Roberts. Drs Burnham, Doocy, and Roberts are affiliated with the Johns Hopkins Bloomberg School of Public Health, Baltimore, and Dr Lafta with the Mustansiriya University, Baghdad. The full text is available here in html, and here as a pdf document. (All page references to the study in this post refer to the pdf version.)
I put on my professional statistician's hat and had a good long look at the study. In my opinion, it is statistically unsound and unreliable. The study violates the basic principle of good statistical practice by relying on a non-random sample survey. Also, the article's description of survey operations raises reliability, and perhaps even credibility, questions.
The study is based on a sample survey conducted between May and July of this year utilising a cluster sample methodology. Cluster sampling is a multi-stage procedure to select sample respondents. In the first stage, clusters, or small areas, of the region (in this case, Iraq) to be surveyed are selected. Within the clusters, neighbourhoods are selected, and then main streets; finally, particular residences are chosen and surveyed. (More details are given below.)
Forty-seven clusters were selected in proportion to the population of 16 of the country's 18 Governorates. (Originally, 50 clusters were to be surveyed representing all Governorates, but operational problems necessitated omission of three.) Within each of the clusters, administrative units and main streets were chosen at random in proportion to population; then particular residential streets were chosen at random where households were surveyed.
[S]election of survey sites was by random numbers applied to streets or blocks . . . [p. 2]
The plan was to interview 40 households per cluster but, owing to the vagaries of field operations under potentially dangerous conditions, fewer than 40 households were surveyed in some clusters. In all, a sample of 1849 households, averaging 6.9 persons per household, was surveyed, comprising a total of 12,801 individuals.
Here arises a problem with the purported randomness of the cluster selection. According to the methodology as just outlined, all of the 47 clusters were located in urban areas. Rural areas do not have “streets or blocks” as such, nor do they have residential streets with 40 adjacent households. According to the study’s own documentation, every cluster was located in an urban area; none was selected in a rural area.
According to the UN's 2004 Iraq Living Conditions Survey (ILCS), however, 7,132,000 of Iraq's total population of 27,132,000 live in rural areas. (See Table 1.6 on page 22 [numbered 21] of this pdf document.) Some 26% of Iraq's population live in rural areas, but not one of the 47 clusters was located in a rural area. The probability that, if a true random selection were made, all 47 clusters would be chosen from urban areas is 74% raised to the 47th power—a very small number indeed. It would appear that an a priori decision was made to exclude rural areas from consideration as cluster sites. In that case, the selection of sample respondents was not random. There are, I would think, good reasons for believing that armed conflict in urban areas is likely to kill more people than armed conflict in rural areas, other things being equal. It is therefore probable that the Lancet survey, because it includes only urban residents, is biased toward producing an overestimate of deaths.
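For anyone who wants to check that arithmetic, here is a minimal Python sketch. It uses only the ILCS population figures quoted above, and it assumes (as a simplification, not as a description of the study's actual multi-stage design) that each of the 47 clusters is an independent draw landing in an urban area with probability equal to the urban share of the population. The function and variable names are mine, chosen for illustration.

```python
# Population figures as quoted from the ILCS in the paragraph above.
RURAL_POPULATION = 7_132_000
TOTAL_POPULATION = 27_132_000
N_CLUSTERS = 47

rural_share = RURAL_POPULATION / TOTAL_POPULATION
urban_share = 1 - rural_share


def prob_all_urban(urban_fraction: float, n_clusters: int) -> float:
    """Chance that every cluster is urban.

    ASSUMPTION: each cluster is an independent draw that falls in an urban
    area with probability `urban_fraction`. The real design was multi-stage,
    so this is only a back-of-the-envelope check.
    """
    return urban_fraction ** n_clusters


# The rounded 74% used in the text, and the unrounded urban share.
p_rounded = prob_all_urban(0.74, N_CLUSTERS)
p_unrounded = prob_all_urban(urban_share, N_CLUSTERS)

print(f"rural share of population: {rural_share:.1%}")
print(f"P(all {N_CLUSTERS} clusters urban), 74% urban:  {p_rounded:.2e}")
print(f"P(all {N_CLUSTERS} clusters urban), unrounded: {p_unrounded:.2e}")
```

Under that independence assumption, both versions of the calculation come out at less than one chance in a million.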
Serious questions are also raised by the description of field operations, according to which the survey went more smoothly than any survey I’ve ever heard of.
There were two survey teams, each consisting of two female and two male interviewers, and one supervising field manager. The survey was in the field between 20 May and 10 July 2006. Survey respondents were chosen according to the procedure outlined above. Once a particular residential street was selected within an administrative unit within a cluster, a start household on the street was chosen at random. Beginning with that household, the interview team proceeded to survey adjacent households until forty were done. Here’s an outline of the survey content.
The survey purpose was explained to the head of household or spouse, and oral consent was obtained. Participants were assured that no unique identifiers would be gathered. No incentives were provided. The survey listed current household members by sex, and asked who had lived in this household on January 1, 2002. The interviewers then asked about births, deaths, and in-migration and out-migration, and confirmed that the reported inflow and exit of residents explained the differences in composition between the start and end of the recall period. Separation of combatant from non-combatant deaths during interviews was not attempted, since such information would probably be concealed by household informants, and to ask about this could put interviewers at risk. Deaths were recorded only if the decedent had lived in the household continuously for 3 months before the event. Additional probing was done to establish the cause and circumstances of deaths to the extent feasible, taking into account family sensitivities. At the conclusion of household interviews where deaths were reported, surveyors requested to see a copy of any death certificate and its presence was recorded. Where differences between the household account and the cause mentioned on the certificate existed, further discussions were sometimes needed to establish the primary cause of death. [p. 2]
Now check this summary of field operations:
In 16 (0·9%) dwellings, residents were absent; 15 (0·8%) households refused to participate. [p. 4]
The interview team went to 1849 households in urban areas of Iraq and encountered only 15 refusals and only 16 residences where neither the head of the household nor a spouse was in. Don’t forget that they only went to each household once: there was no follow-up whatever. If I ran a door-to-door survey with a response rate of 98.3% on the first go-round, I’d think I’d died and gone to statisticians’ heaven. That is nothing short of miraculous. That response rate implies that family heads in urban Iraq are virtually always at home.
Don’t heads of households and their spouses in urban Iraq have jobs? Don't they go out to meet friends? Do they never visit relatives in other neighbourhoods or towns? Do they not engage in any activities outside their homes? Are they never in the middle of a family meal, unwilling to be interrupted by unknown visitors asking intrusive personal questions? Never out shopping for groceries or passing the time of day at a local coffee shop or dropping off the family car at the mechanic’s? Do they just stay around the house all day every day? In short, do those folks living in urban Iraq have any semblance of normal lives?
I realise that armed conflict would impel most people to huddle in their homes behind locked doors (in which case they would be unlikely to open the door to strangers), but that possibility doesn’t enter into it because the locations selected for interview were altered if they appeared unsafe.
Decisions on sampling sites were made by the field manager. The interview team were given the responsibility and authority to change to an alternate location if they perceived the level of insecurity or risk to be unacceptable. [p. 2]
Admittedly, I have no personal experience of daily life in Iraq. Nevertheless, the 98.3% initial response rate is foreign, not just to my experience, but to any real-world survey situation imaginable.
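The response-rate figures can be reproduced from the three counts the article reports. The sketch below is illustrative only: the article does not say whether 1849 counts households approached or interviews completed, so both readings are computed, and both readings are my assumptions rather than anything the authors state. The function name is hypothetical.

```python
HOUSEHOLDS_REPORTED = 1849  # household count given in the study
ABSENT = 16                 # dwellings where residents were absent
REFUSED = 15                # households that refused to participate


def response_rate(approached: int, absent: int, refused: int) -> float:
    """Share of approached households that yielded a completed interview."""
    return (approached - absent - refused) / approached


# Reading A (assumption): 1849 is the number of households approached.
rate_a = response_rate(HOUSEHOLDS_REPORTED, ABSENT, REFUSED)

# Reading B (assumption): 1849 is the number of completed interviews,
# so the number approached is 1849 + 16 + 15.
approached_b = HOUSEHOLDS_REPORTED + ABSENT + REFUSED
rate_b = response_rate(approached_b, ABSENT, REFUSED)

# Share of households where someone eligible was at home (reading A).
at_home_rate = 1 - ABSENT / HOUSEHOLDS_REPORTED

print(f"response rate, reading A: {rate_a:.2%}")
print(f"response rate, reading B: {rate_b:.2%}")
print(f"at-home rate, reading A:  {at_home_rate:.2%}")
```

Either way, the response rate lands within a small fraction of a percentage point of the 98.3% figure, so the conclusion does not hinge on which reading is correct.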
Here's another strange remark about this survey's field operations:
One team could typically complete a cluster of 40 households in 1 day. [p. 4]
According to the summary of the survey content, quoted above, there’s a lot of ground to cover in each interview. Locate the head of household or spouse (fortunately, 99.1% of ‘em were at home when the interviewers showed up), and obtain oral consent. List by age and sex everyone living there now and everyone who lived there on a particular date over four years ago. Find out what happened to each of them and when, and write it all down. Focus on the ones who had died: find out the cause and circumstances of death; then ask to see the death certificate. If they have one (as 92% did), have them dig it out so the interviewer can take a good look at it. If there’s a discrepancy between the official cause of death and the one reported by the interviewee, hash that out. (The more I think about all that, the more unlikely that 0.8% refusal rate seems.)
Suppose each survey team is working 10-hour days. Even that’s pushing it because survey operations must be conducted with a view to finding respondents at home and willing to talk. (But apparently that's not a problem in urban Iraq.) Forty households in ten hours works out to an average of four interviews per hour, i.e., one every fifteen minutes. Granted some interviews would be short: a husband and wife living alone for the past five years would only take a few minutes. Since the average household has over six members, however, interviews are much more likely to be lengthy. Also, the interviewers need meal and other breaks. The assertion that 40 households could be interviewed in one day strains credulity.
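The time budget is simple enough to tabulate. This is a rough sketch, not a model of the actual fieldwork: the 10-hour day is the supposition made above, while the 90-minute break allowance and the 8-hour day are hypothetical values I have picked purely to show how quickly the per-household time shrinks. Note that the result covers everything done per household, including walking to the next door.

```python
def minutes_per_household(households: int, hours_worked: float,
                          break_minutes: float = 0.0) -> float:
    """Average minutes available per household in one working day.

    Includes introductions, consent, the interview itself, and moving
    on to the next residence.
    """
    if households <= 0:
        raise ValueError("households must be positive")
    return (hours_worked * 60 - break_minutes) / households


# The case supposed in the text: 40 households in a 10-hour day, no breaks.
no_breaks = minutes_per_household(40, 10)

# Hypothetical: the same day with 90 minutes of meal and other breaks.
with_breaks = minutes_per_household(40, 10, break_minutes=90)

# Hypothetical: an 8-hour working day with no breaks.
eight_hour_day = minutes_per_household(40, 8)

for label, value in [("10 h, no breaks", no_breaks),
                     ("10 h, 90 min breaks", with_breaks),
                     ("8 h, no breaks", eight_hour_day)]:
    print(f"{label:>20}: {value:.2f} minutes per household")
```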
Another discrepancy in the article’s description of operations raises the disturbing possibility that the survey could have been tainted by surveyor bias. Here’s the methodological description of the selection of respondent households.
The third stage consisted of random selection of a main street within the administrative unit from a list of all main streets. A residential street was then randomly selected from a list of residential streets crossing the main street. On the residential street, houses were numbered and a start household was randomly selected. From this start household, the team proceeded to the adjacent residence until 40 households were surveyed. For this study, a household was defined as a unit that ate together, and had a separate entrance from the street or a separate apartment entrance. [p. 2]
An administrative unit within the cluster was chosen at random, a main street within the administrative unit was chosen at random, a residential street crossing the main street was chosen at random, and a start household on the residential street was chosen at random. The interview team has no discretion whatever in the selection of survey respondents, with one exception (as already cited above):
The interview team were given the responsibility and authority to change to an alternate location if they perceived the level of insecurity or risk to be unacceptable. [p. 2]
The article doesn’t say how often the interview team exercised its discretion to change to an alternate location. To me, that is a serious omission, unless we are to understand that this never, or rarely, happened. In any case, no instances are reported of interviewers coming under fire or other threat, so that would appear to have been a very unusual circumstance.
Why then does this statement appear in the article?
Although interviewers used a robust process for identifying clusters, the potential exists for interviewers to be drawn to especially affected houses through conscious or unconscious processes. Although evidence of this bias does not exist, its potential cannot be dismissed. [p. 7, footnote omitted]
How could interviewers be “drawn” to particular houses if the selection of households was driven by a completely random process, except when interviewers felt insecure or otherwise at risk? The quoted statement doesn’t make sense in the context of what is supposed to be random choice of particular streets and households. It only raises further serious doubts about the sample selection process.
There are many other problems with the Lancet study that could be discussed. What I’ve presented here, however, is more than sufficient to demonstrate that the survey behind the estimate of “excess” deaths was statistically unsound because biased by non-random selection of interview respondents. Moreover, the article’s description of survey field operations is, in the absence of further supporting documentation, highly problematic.
In my judgment, the estimate of 655,000 deaths lacks solid foundation and therefore should not be relied upon.
UPDATE (18 Oct.): Follow-up here.
UPDATE (22 Oct.): Further critique: "Main street bias" in Lancet study
According to the study’s own documentation, every cluster was located in an urban area; none was selected in a rural area.
What documentation are you talking about? I’m only aware of the 8-page version, which doesn’t say anything more about the clusters than the governorate.
The 2004 survey said explicitly that they sampled villages; I see no reason that this would be different. Are you claiming a significant portion of the population does not even live in villages?
Douglas,
I refer to the documentation contained in the 8-page report, relevant portions of which I have quoted and summarised.
I think the difference between the two surveys as far as sampling villages/rural areas is concerned can be explained by the change in procedure with respect to GPS units: They were used in the old survey but not in the new one.
[...] Lancet study of Iraqi deaths is statistically unsound and unreliable [...]
That seems like an incredibly unfair reading of the methodology. I read it as saying that they randomly select side streets and particular households only if necessary to narrow down the number. The study is certainly biased against villages with fewer than 40 households, and probably against much larger villages.
Also, the two surveys produced the same results for the same time period, so the claim that the new survey is more biased towards cities is absurd.
I’ve read hundreds of methodological descriptions of sample surveys, and I know what they are and what information they’re intended to convey. They constitute precise records of how the sample frame was constructed and how the survey was done. You’re reading words into The Lancet article’s methodological information that aren’t actually there. If some steps in the methodology were not always followed, it should say so—explicitly—and why. It does not say that any steps were skipped, and I can only go by what they’ve written.
The statement in your second paragraph is a non sequitur. Anyway, the earlier survey presented a 95% confidence interval estimate of 8,000-194,000 deaths. Given that degree of imprecision, practically any result of the second survey could be regarded as within the first estimate’s margin of error.
[...] Turns out that the numbers, methodology, and ideology are all suspect. Ask a statistician... you know, somebody who analyzes and interprets numbers and such for a living. Says one such Canadian professional: “Admittedly, I have no personal experience of daily life in Iraq. Nevertheless, the 98.3% initial response rate is foreign, not just to my experience, but to any real-world survey situation imaginable.” And it gets worse: “Lancet researchers ignored superior study on Iraqi deaths” by the UN. [...]
Five issues:
1. Next-household methodology: The study used next-household to choose households to interview, thus reducing the number of independent sample points to just 47. If I’m not mistaken, the ILCS chose households independently and so had many more sample points. (21,000?)
2. The study states, “Decisions on sampling sites were made by the field manager. The interview team were given the responsibility and authority to change to an alternate location if they perceived the level of insecurity or risk to be unacceptable.”
Have the interviewers introduced bias in their manual selection of starting points?
3. The study states, “The third stage consisted of random selection of a main street within the administrative unit from a list of all main streets. A residential street was then randomly selected from a list of residential streets crossing the main street. On the residential street, houses were numbered and a start household was randomly selected. From this start household, the team proceeded to the adjacent residence until 40 households were surveyed.”
This restricts start points to residential streets adjoining main streets, eliminating both main streets and residential streets not intersecting main streets. Does this introduce a bias?
4. The study’s Figure 4 claims that its results are confirmed by comparing the “trend” of its rate to the “trend” of the IBC and DoD death numbers.
a. Does it make sense to compare a rate to a value (deaths per year per population to deaths)? Are the authors blowing smoke?
b. Similar “trends” in two or more time-series might confirm they’re measuring the same underlying statistic but do not confirm the absolute value. Are the authors blowing smoke?
5. The survey mentions gathering gender and birth dates. However, Moore’s WSJ article quotes Roberts as saying no demographic data was gathered. This is contradictory unless both Roberts and Moore mean extended demographic data (education, income, social status, marital status, etc.) was not gathered.
http://www.thelancet.com/webfiles/images/journals/lancet/s0140673606694919.pdf
Clarification of my point 4a above: In the study’s Figure 4, the authors seem to compare the “trend” of a value to the “trend” of a rate representing (crudely) the first derivative of the value. They seem to be saying, “Our study is confirmed because the shape of our time-series of rate looks like the time-series of the value of which the rate is the first derivative.” This is a nonsensical comparison. Worse, the value curves are near-linear (implying constant rate) while the rate curve is not constant but dramatically increasing. In the figure, the Lancet survey’s rate could not possibly be the time-derivative of the IBC and DoD curves shown.
Sceptic,
Thanks for your comments. The points you raise are all important.
1. Very good point. In both Lancet surveys, only one household was randomly chosen within each cluster. Once the starting point was selected, the rest of the households in the cluster followed in order. The ILCS surveyed ten households within each cluster, and all ten were randomly chosen.
2. That is still an unresolved question.
3. Indeed, it does. I plan to have a further post on that later today.
4a. This is one of the pitfalls of charting statistical data. Charts can be useful for illustrative purposes, but they can be manipulated by altering the scales of the axes. That is another reason for relying primarily on the statistical analysis. No chart can overcome or outweigh analytical problems.
5. The Lancet study mentions collecting age data, but not birth dates. This seems to mean that, for each person who had died, the researchers simply asked the age of the person at death. They did not gather birth dates as such. Also, Moore’s article says the researchers did not gather any demographic data on living persons.
[W]hile the gender and the age of the deceased were recorded in the 2006 Johns Hopkins study, nobody, according to Dr. Roberts, recorded demographic information for the living survey respondents. This would be the first survey I have looked at in my 15 years of looking that did not ask demographic questions of its respondents.
Business surveys generally do not gather demographic data on respondents. No one wants to know the birth date of the clerk filling out a firm’s questionnaire on manufacturing shipments. But for population surveys, Mr Moore is absolutely correct. Questions on sex and birth dates appear at or very close to the top of every household or individual survey I’ve ever seen.
I don’t understand how the peer reviewers can have missed all of this — or, for that matter, all these signatories of a letter supporting the study:
http://www.theage.com.au/news/opinion/the-iraq-deaths-study-was-valid-and-correct/2006/10/20/1160851135985.html?page=fullpage
[...] Main street bias in Lancet study By StatGuy One of the joys of blogging for me is interacting with people I'd never have met otherwise. My posts on the Lancet study of Iraqi deaths have afforded many opportunities for that. (Background here.) One in particular prompts this post. [...]
[...] CHRISTOPHER HITCHENS on the moral idiocy evident in Lancet’s work. Stick a fork in the Lancet study—it’s done; Lancet study of Iraqi deaths is statistically unsound and unreliable …. (timblair, magicstatistics) [...]
[...] I, along with many other statisticians, expressed serious reservations about the soundness of the survey and the reliability of the results. That The Lancet authors appear unwilling to answer basic questions that the original article’s inadequate documentation left unanswered only diminishes what credibility they have left. The unavoidable question now: Are they trying to hide something? [...]
[...] Now that I think of it, the authors of the Lancet study of Iraqi deaths have also refused to release their raw data. I wonder if an FOI Act request would pry those data loose. [...]
[...] The study’s description of field operations indicates that the survey was administered more quickly and more smoothly than any comparable survey I’ve ever heard of. A team of two surveyors appeared unannounced at the front doors of 1849 Iraqi households asking highly sensitive and intrusive questions about everyone who had lived there since January 2002. Interviews were successfully completed in an average of only fifteen minutes each. We are told that interviewers found family heads at home in all but 16 households. What’s most mind-boggling (to me, anyway) is that only 15 households refused to participate—a refusal rate of only 0.8%. [...]
[...] Another co-author of the Lancet study, Les Roberts of Johns Hopkins Bloomberg School of Public Health, has been asked to speak if Dr Lafta can’t make it. So, there may still be an opportunity to ask questions about the study’s methodology. [...]
[...] Was 2004 Lancet study correct to toss out Falluja data? By StatGuy I’ve written several posts about the 2006 study of Iraqi deaths, written by Burnham et al, but not about the predecessor study, authored by Roberts et al and published in 2004. (This blog started in August 2005.) Both studies analysed data collected in personal interviews at Iraqi households selected using cluster sample methodology and purported to find that tens of thousands of Iraqis have died because of the US invasion. Both were published in The Lancet. [...]
[...] 2. For example, this: The study’s description of field operations indicates that the survey was administered more quickly and more smoothly than any comparable survey I’ve ever heard of. A team of two surveyors appeared [...]