Actual Analysis blog: The non-confidence interval

The rise of the opinion poll to pre-eminent importance has made us all familiar with statements like "These results are accurate within three percentage points, 95 times out of 100." This is a statement of what is known in sampling theory as a confidence interval.

Usually the result to which this statement refers is an estimate of the percentage of the population holding a certain opinion. The statement of the confidence interval implies that if 100 more samples of the same size had been drawn from the same population, we would expect the percentages estimated from about 95 of those samples to fall within three percentage points of the percentage in the entire population.
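To make the "95 times out of 100" reading concrete, here is a minimal simulation sketch. It is an illustration, not part of any pollster's method. The true share of 50%, the sample size of 1,067 (roughly the size that gives a three-point margin), the number of simulated polls, and the function name `simulate_coverage` are all assumptions chosen for the example. It also assumes the ideal case of a truly random sample.

```python
import random

def simulate_coverage(true_share=0.5, sample_size=1067, margin=0.03,
                      polls=2000, seed=1):
    """Fraction of simulated polls whose estimate lands within
    `margin` of the true population share (all values hypothetical)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(polls):
        # Draw one ideal random sample: each respondent says "yes"
        # with probability equal to the true share.
        yes = sum(rng.random() < true_share for _ in range(sample_size))
        estimate = yes / sample_size
        if abs(estimate - true_share) <= margin:
            hits += 1
    return hits / polls

coverage = simulate_coverage()
print(f"{coverage:.1%} of simulated polls landed within 3 points")
```

Under these idealized assumptions the printed figure should come out close to 95%.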

The mathematics used to reach this conclusion is quite elegant, and the validity of confidence intervals is inescapable as long as certain conditions are met. What is rarely mentioned, though, is that these conditions are not met very often.

First, the sample has to be a random sample from the population. In opinion polling, this assumption is never met. For one thing, most people don't co-operate with poll takers. They hang up the phone, they don't return the questionnaire in the prepaid envelope, they don't stop for the people in malls with the clipboards. At best, polling samples are random samples from that minority of the population which agrees to be sampled.
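A small arithmetic sketch shows why this matters. The response rates below are invented for illustration: the post gives no such figures, and the function name `share_among_respondents` is hypothetical. The sketch assumes that people holding an opinion agree to be polled at a different rate from people who do not hold it.

```python
def share_among_respondents(true_share, response_rate_yes, response_rate_no):
    """Share holding the opinion among those who agree to be polled.

    true_share: share of the whole population holding the opinion.
    response_rate_yes / response_rate_no: hypothetical probabilities
    that a holder / non-holder of the opinion co-operates.
    """
    yes = true_share * response_rate_yes
    no = (1 - true_share) * response_rate_no
    return yes / (yes + no)

# Invented example: opinion held by 50% of the population, but holders
# respond 30% of the time and non-holders only 20% of the time.
observed = share_among_respondents(0.5, 0.30, 0.20)
print(f"share among respondents: {observed:.0%}")
```

With these made-up rates, the people who answer are 60% in favour even though the population is split evenly. A perfectly random sample of the co-operators would estimate 60%, and a three-point margin around that estimate would not come near the true figure.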

Second, there is no one standard confidence interval for a poll. The width of the interval varies with the size of the percentage being estimated. This issue is rarely mentioned by researchers of any kind. In general, the confidence interval of a percentage becomes narrower as the percentage moves away from 50% in either direction. The decrease can be important with smaller samples. For example, an estimate of 50% based on a sample of 200 has a 95% margin of error of plus or minus 6.9 percentage points, given certain assumptions. An estimate of 80% based on a sample of the same size has a 95% margin of error of plus or minus 5.5 percentage points.
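The figures above can be reproduced with the usual normal-approximation formula for a proportion, z * sqrt(p * (1 - p) / n). The post does not name the formula it used, so treating it as this one is an assumption, as are the function name `margin_of_error` and the rounded z value of 1.96 for 95% confidence.

```python
import math

Z_95 = 1.96  # approximate z value for a 95% confidence level

def margin_of_error(share, sample_size, z=Z_95):
    """Half-width of the normal-approximation confidence interval
    for an estimated proportion (assumes a simple random sample)."""
    return z * math.sqrt(share * (1 - share) / sample_size)

for share in (0.5, 0.8):
    moe = margin_of_error(share, 200)
    print(f"estimate {share:.0%}, n=200: +/- {moe * 100:.1f} points")
# estimate 50%, n=200: +/- 6.9 points
# estimate 80%, n=200: +/- 5.5 points
```

Because p * (1 - p) is largest at p = 0.5, the margin quoted for a poll as a whole is normally the widest one, and estimates far from 50% are somewhat tighter.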

Finally, the formulas for the confidence interval assume that you are measuring something reliable. These formulas were derived originally for problems in the natural sciences, where the items being sampled have solidity and consistency. In opinion polling, on the other hand, the items being sampled tend to be ethereal and ephemeral. Today I may feel like voting for the Vegetarian Party, but by the time I get behind the screen and pick up the little pencil I may well have decided that yesterday's scandal involving the executive committee of the Vegetarian Party, a seedy restaurant, and fish cakes disguised as tofu has pretty well demonstrated the unfitness of the Vegetarian Party to govern.

At best, a poll, or any similar survey, is a measure of what the minority of people who take part in polls think at a specific hour and minute of a specific day. As a guide to action, polls need to be supplemented by other information about the issues which they investigate.

Originally published at Actual Analysis

Friday, January 14, 2011
