Actual Analysis blog: Inside the information cult (1)

In Canada we're in the final week of a federal election campaign. So far press coverage has been dominated by poll results. Unlike most people, it seems, I am skeptical of the utility of election poll results, and here I'll explain why.

Polls of people’s opinions can be a useful exercise if they’re done properly and the results are interpreted carefully. The polls that the press publishes, however, often fail to satisfy these criteria (at least in the form in which they are presented in the press). For one thing, they usually ask at most a handful of questions and don’t attempt to assess how meaningful the responses are. The results of this type of poll are not information but pseudo-information. Electoral poll results are obviously uninformative simply because they don’t predict election results accurately enough. Informative data must be valid, and in general poll results are not.

If you consult the excellent PollingReport website you will find a summary of final estimates by eighteen polls of the popular vote in the 2000 United States presidential election. Fourteen of those predictions had George W. Bush winning the popular vote, which in fact was won by Al Gore. Sure, the election was close, but isn’t that when you most need a good prediction? These poll results are not information, but rather some devout information cultists’ simulation of information.

In 2004, 15 of 22 polls predicted that Mr. Bush would win the popular vote, but five still predicted that John Kerry would win (two predicted a saw-off, as did two in 2000). This election wasn’t quite as close as the one in 2000, but the difference between the two candidates’ support was only about 2 percentage points. Even if the polls do give you a good idea of how people are going to vote, sampling error wipes out any utility they may have when a vote is close, which is a lot of the time. Furthermore, why would we expect polls to be all that valid as measures of what the population as a whole intends to do?

First, there’s their questionable sampling to consider. Poll results often come with statements, derived from sampling theory, saying that given the size of the sample polled, the results will be accurate within so many percentage points of the actual percentages 95% or 99% of the time; in Canada the press is required to provide such estimates. However, sampling theory assumes that the samples polled are representative (that is, that they are random samples of the population). That is not true of any political poll.
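As a rough illustration of the arithmetic behind those statements, here is a minimal Python sketch. It assumes exactly what the rest of this post disputes, namely a genuinely random sample, and it assumes a simplified two-candidate race; the function names, the sample size of 1,000 and the 51% to 49% split are illustrative choices of mine, not figures from any actual poll.

```python
import math
import random


def margin_of_error(sample_size, share=0.5, z=1.96):
    """Half-width of the usual 95% interval for a share,
    assuming a simple random sample (z=1.96 for 95%)."""
    return z * math.sqrt(share * (1 - share) / sample_size)


def share_of_polls_with_wrong_leader(true_share=0.51, sample_size=1000,
                                     n_polls=2000, seed=1):
    """Simulate many ideal random-sample polls of a two-candidate race
    and return the fraction that fail to show the true leader ahead."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wrong = 0
    for _ in range(n_polls):
        # Each respondent independently backs the true leader
        # with probability true_share.
        votes_for_leader = sum(
            rng.random() < true_share for _ in range(sample_size)
        )
        # A tie or a deficit both count as missing the true leader.
        if votes_for_leader * 2 <= sample_size:
            wrong += 1
    return wrong / n_polls


# Margin of error, in percentage points, for a sample of 1,000.
print(round(100 * margin_of_error(1000), 1))  # → 3.1

# Fraction of simulated polls that miss the leader in a 2-point race.
print(share_of_polls_with_wrong_leader())
```

With a margin of error of about 3 points on each candidate's share, a 2-point gap is well inside the noise, so in this sketch a substantial share of even perfectly drawn polls put the wrong candidate ahead or show a tie.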

A random sample is one in which each member of a population has a known probability of appearing. If you draw a simple random sample of 10% of a jar containing 2,000 jelly beans, each jelly bean will, if you draw the sample properly, have a 10% chance of appearing in the sample. However, let’s say that you want to draw a sample of 10% of the members of a club with 2,000 members so that you can ask them (the members of your sample) some questions about the club. All of a sudden you don’t know the probability that each member has of appearing in the sample, for a very simple reason.

The simple reason is that people can refuse to take part in your sample. If you mail them your questionnaire, some will forget to complete it, and some of the procrastinators will never get round to it. Some just won’t be interested. The problem is that you can’t tell in advance the people who won’t return the questionnaire from the ones who will.

You will end up drawing a random sample from the population of club members who complete questionnaires. The same is true of samples in political polls. First, most people in fact refuse to take part in political polls. Secondly, people have to be home to answer the phone before they can consent to take part in the poll. It’s likely that some large subgroups of the population (the young, for example, or the employed) are less likely to be at home than others. Thirdly, the questions have to be asked in a language the person polled understands; people who can’t understand the language of the poll well enough have to be excluded.

For these and other reasons the sample you get in a political poll is never representative of the population as a whole but rather of that minority of the population that is both able and willing to take part in polls. If that minority thinks like the majority, then your results will apply to the majority as well. If the majority doesn’t think like the minority, then the results won’t apply. The catch, of course, is that you have no idea how closely the thinking of the minority corresponds to the thinking of the majority.
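To see how much damage this can do, here is a small Python simulation. Every number in it is hypothetical: I assume a population split exactly 50/50 on some party, and I assume that the party's supporters agree to be polled 30% of the time while everyone else agrees 20% of the time. In a real poll neither response rate is known, which is the whole point.

```python
import random


def simulate_poll(true_support=0.50, respond_if_supporter=0.30,
                  respond_if_other=0.20, contacts=5000, seed=7):
    """Contact people at random, keep only those who agree to answer,
    and return (number of respondents, support among respondents).
    All default values are hypothetical, chosen for illustration."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    respondents = 0
    supporters_among_respondents = 0
    for _ in range(contacts):
        is_supporter = rng.random() < true_support
        # Willingness to take part depends on the person's opinion.
        response_rate = (respond_if_supporter if is_supporter
                         else respond_if_other)
        if rng.random() < response_rate:
            respondents += 1
            if is_supporter:
                supporters_among_respondents += 1
    return respondents, supporters_among_respondents / respondents


respondents, observed_support = simulate_poll()
print(respondents, round(100 * observed_support, 1))
```

Under these assumptions the expected support among respondents is 0.5 × 0.3 / (0.5 × 0.3 + 0.5 × 0.2) = 60%, ten points above the true 50%, and no published margin of error would warn you of it. Set the two response rates equal and the bias disappears, but a pollster has no way of knowing which situation applies.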

Even if you were able to get a representative sample, you would still have the problem that people sometimes don’t have too accurate an idea of what they’re going to do. Sometimes they change their minds between the time they take part in the poll and the time they actually vote. Sometimes they don’t know how they’re going to vote till they get in the booth. Sometimes they don’t vote. And even if they do know how they’re going to vote, why should we assume that they’ll tell us the truth?

Election polling is a cargo cult practice. We know that examining samples has been a productive practice in science, so we draw a few samples of our own to examine. However, just as the control towers at cargo cult airstrips in Melanesia don’t have the crucial operating characteristics of real control towers, the samples drawn in election polls don’t have the crucial operating characteristics of samples from which estimates of statistics (the percentage of people likely to vote for a political party, for example) can be reliably derived. And even if they did, the mutability of human intentions would probably keep them inaccurate.

Inside the information cult (1) © 2011, John FitzGerald

Tuesday, April 26, 2011
