Friday, April 8, 2011

Lady Luck is actually very democratic

A commercial for a poker site is advising us that Lady Luck hangs out with the better players. In fact, she demonstrably doesn't.

The only meaningful conception of luck that I'm aware of is the statistical one: I identify luck with the statistical concept of error, which, as we shall see, is well suited to the role. Anyway, any result (winning a poker game, for example) can be statistically analyzed as the consequence of an effect (poker-playing skill, say) and error. Error is the sum of all those things that affect the result but aren't related to poker-playing skill — the specific cards you get, how alert you are, and so on.

Error is randomly distributed with a mean of zero (these characteristics follow from the mathematics required to distinguish effects from error). Since the variables that produce the error are not correlated with poker-playing skill, the mean error score for good players is zero, and the mean error score for poor players is also zero. And after all, there should be nothing about being a good player that makes you more likely to be dealt a pair of aces.
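
Here's a minimal sketch in Python of that claim (my own illustration, not part of the original argument; the skill values and the scoring model are invented): simulate good and poor players whose results are skill plus a random error term, and check that the error averages to zero for both groups.

    # A minimal sketch: result = effect (skill) + error (luck).
    # The error is drawn from the same mean-zero distribution for everyone.
    import numpy as np

    rng = np.random.default_rng(0)
    n_games = 100_000

    skill_good, skill_poor = 2.0, 0.5          # hypothetical skill effects
    error_good = rng.normal(0, 1, n_games)     # luck: random, mean zero
    error_poor = rng.normal(0, 1, n_games)     # same distribution for poor players

    result_good = skill_good + error_good
    result_poor = skill_poor + error_poor

    print(f"mean error, good players: {error_good.mean():+.4f}")   # about 0
    print(f"mean error, poor players: {error_poor.mean():+.4f}")   # about 0
    print(f"mean result, good players: {result_good.mean():.2f}")  # about 2.0
    print(f"mean result, poor players: {result_poor.mean():.2f}")  # about 0.5

The good players still win more often, but not because Lady Luck favours them: their error term is no more generous than anyone else's.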

Saturday, April 2, 2011

Accuracy is not enough

Data are not necessarily information. They are informative only to the extent that they reduce uncertainty. If you want to know what programs are on television tonight, knowing yesterday's television schedule will not help you. Yesterday's schedule is full of data, but the data are no longer informative.

In psychometric terms, informative data are those which are valid – which predict events of interest to you. To be valid, data must be accurate; in fact, the validity of information is limited by its accuracy. Of course, inaccurate data cannot be valid, and even accurate data have a ceiling: their maximum possible validity is the square root of their reliability coefficient.
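
A rough simulation can illustrate that ceiling (this is my own sketch, not part of the original article; the reliability of 0.64 and the true correlation of 0.9 are arbitrary choices): a measure contaminated with random error can never correlate with a criterion more strongly than the square root of its reliability.

    # Attenuation sketch: observed validity <= sqrt(reliability).
    # Reliability here is var(true score) / var(observed score).
    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000

    true_score = rng.normal(0.0, 1.0, n)
    # criterion correlates 0.9 with the true score
    criterion = 0.9 * true_score + rng.normal(0.0, np.sqrt(1 - 0.9**2), n)

    reliability = 0.64                                # assumed reliability
    noise_sd = np.sqrt((1 - reliability) / reliability)
    observed = true_score + rng.normal(0.0, noise_sd, n)

    validity = np.corrcoef(observed, criterion)[0, 1]
    print(f"observed validity:          {validity:.3f}")             # about 0.72
    print(f"ceiling, sqrt(reliability): {np.sqrt(reliability):.3f}")  # 0.800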

The minimum validity of accurate data, however, is always zero. Sometimes data are not valid simply because they are distributed in a way ill-suited to the statistics which are used to assess validity; often the distribution can be modified through a mathematical transformation and validity restored. Sometimes the data are simply irrelevant or poorly defined.

In Canada a federal election campaign is under way. As usual, the press commentary about it includes frequent presentation of poll results. At the moment the poll results are the unverifiable opinions of the 30% of the population that takes part in polls about what they think they'll be doing a month from now. People who take part in polls probably differ significantly from people who don't. They've probably got more time on their hands, for a start, which means they're likely older, better off, and so on. That is, the poll results are probably not even accurate estimates of those unverifiable opinions. If you check the excellent Polling Report website you'll find that American polls have been dependably incompetent at predicting the results of American presidential elections, which are simple two-candidate races. In Canada, with three national parties and a big regional party, they are likely to be even less effective.

Anyway, if you depend on any type of database, it should be checked regularly to ensure not only accuracy but also relevance and utility.

Accuracy is not Enough © 2001, 2011 John FitzGerald


More articles from www.ActualAnalysis.com

Friday, January 14, 2011

The non-confidence interval

The rise of the opinion poll to pre-eminent importance has made us all familiar with statements like "These results are accurate within three percentage points, 95 times out of 100." This is a statement of what is known in sampling theory as a confidence interval.

Usually the result to which this statement refers is an estimate of the percentage of the population holding a certain opinion. The statement of the confidence interval implies that if 100 more samples of the same size had been drawn from the same population, the percentages estimated from 95 of those samples would have been within three percentage points of the percentage in the entire population.
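
Here's a small simulation of that repeated-sampling interpretation (my own sketch, not the pollsters' method; the true percentage of 40% and the sample size of 1,000 are arbitrary): draw many samples from a population with a known percentage, compute a 95% confidence interval from each, and count how often the interval covers the true value.

    # Coverage of the usual 95% confidence interval for a percentage.
    import numpy as np

    rng = np.random.default_rng(2)
    true_p, n, n_samples = 0.40, 1000, 10_000

    covered = 0
    for _ in range(n_samples):
        sample = rng.random(n) < true_p          # each respondent agrees with prob. true_p
        p_hat = sample.mean()
        half_width = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width <= true_p <= p_hat + half_width:
            covered += 1

    print(f"intervals covering the true percentage: {covered / n_samples:.1%}")  # about 95%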

The mathematics used to reach this conclusion is quite elegant, and the validity of confidence intervals is inescapable as long as certain conditions are met. What is rarely mentioned, though, is that these conditions are not met very often.

First, the sample has to be a random sample from the population. In opinion polling, this assumption is never met. For one thing, most people don't co-operate with poll takers. They hang up the phone, they don't return the questionnaire in the prepaid envelope, they don't stop for the people in malls with the clipboards. At best, polling samples are random samples from that minority of the population which agrees to be sampled.

Second, there is no one standard confidence interval for a poll. The confidence interval varies with the size of the percentage being estimated. This issue is rarely mentioned by researchers of any kind. In general, the confidence interval of a percentage becomes smaller as the percentage differs from 50%. The decrease can be important with smaller samples. For example, an estimate of 50% based on a sample of 200 has a 95% confidence interval of 6.9%, given certain assumptions. An estimate of 80% based on a sample of the same size has a 95% confidence interval of 5.5%.
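
Those figures come from the usual normal approximation, in which the half-width of the 95% interval is 1.96 times the square root of p(1-p)/n. Here's a quick check of the arithmetic (my own, under that approximation):

    # Half-width of a 95% confidence interval for a proportion p, sample size n.
    from math import sqrt

    def half_width_95(p, n):
        return 1.96 * sqrt(p * (1 - p) / n)

    for p in (0.50, 0.80):
        print(f"estimate {p:.0%}, n = 200: plus or minus {100 * half_width_95(p, 200):.1f} points")
    # estimate 50%, n = 200: plus or minus 6.9 points
    # estimate 80%, n = 200: plus or minus 5.5 points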

Finally, the formulas for the confidence interval assume that you are measuring something reliable. These formulas were derived originally for problems in the natural sciences, where the items being sampled have solidity and consistency. In opinion polling, on the other hand, the items being sampled tend to be ethereal and ephemeral. Today I may feel like voting for the Vegetarian Party, but by the time I get behind the screen and pick up the little pencil I may well have decided that yesterday's scandal involving the executive committee of the Vegetarian Party, a seedy restaurant, and fish cakes disguised as tofu has pretty well demonstrated the unfitness of the Vegetarian Party to govern.

At best, a poll, or any similar survey, is a measure of what the minority of people who take part in polls think at a specific hour and minute of a specific day. As a guide to action, polls need to be supplemented by other information about the issues which they investigate.

Originally published at Actual Analysis

The Non-confidence Interval © 1995, John FitzGerald

Thursday, January 13, 2011

The TRUTH about means and medians

One of the many bees I have in my bonnet is buzzing about how people talk about the mean and the median. I just read a research report in which the mean was described as the average and the median was just called the median. It was a pretty good report, so I suspect the author was trying to help her non-statistically-trained readers. I still think this can be misleading, though.

The mean of a set of scores is simply the result of adding them up and dividing by the number of scores. The median, on the other hand, is the score which has equal numbers of other scores above and below it.

The average of a set of scores is its midpoint. That is, the median is the average. The mean is an estimate of the median. We use it because it can be manipulated more effectively and profitably than the median can, usually without affecting the validity of conclusions.

Most people know that skew may make the mean inaccurate. However, most do not know that there are criteria for deciding if use of the mean should be reconsidered. I reconsider if the skewness coefficient (which is provided by spreadsheets as well as statistical software) is greater than 1 or less than -1.
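
Here's a short sketch of that check in Python (my own example, not from the report I mentioned; the data are invented and deliberately skewed): compute the mean, the median, and the skewness coefficient, and fall back to the median when the coefficient falls outside the range from -1 to 1.

    # Skewness check: if the coefficient is outside [-1, 1], reconsider the mean.
    import numpy as np
    from scipy.stats import skew

    rng = np.random.default_rng(3)
    scores = rng.lognormal(mean=0, sigma=1, size=500)   # deliberately right-skewed

    g1 = skew(scores)
    print(f"mean   = {scores.mean():.2f}")
    print(f"median = {np.median(scores):.2f}")
    print(f"skewness coefficient = {g1:.2f}")

    if abs(g1) > 1:
        print("Skewness outside the -1 to 1 range: consider reporting the median.")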

Above all, do not do what some people do and throw out your highest and lowest scores, or the two highest and two lowest etc., as a protection against skew. While that probably does little harm, it doesn't help either. If you're using your data for descriptive purposes, the best solution is to use the median. That way you get the benefit of all your data.

If you're using the data for inferential purposes, you should of course be using statistical tests. I suggest you compare the result of a test of differences between means with the result of a test of differences between medians (this is often a good idea even with unskewed data). If you're comparing either means or medians without a test, you are wasting your time. You need to know how likely a difference is to happen by accident before you can decide how important it is.
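
Here's one way to run that comparison (my own sketch; I haven't specified particular tests above, so this uses Welch's t-test for the means and Mood's median test for the medians, on invented skewed data):

    # Compare a test of mean differences with a test of median differences.
    import numpy as np
    from scipy.stats import ttest_ind, median_test

    rng = np.random.default_rng(4)
    group_a = rng.lognormal(mean=0.0, sigma=1.0, size=150)   # skewed samples
    group_b = rng.lognormal(mean=0.3, sigma=1.0, size=150)

    t_stat, t_p = ttest_ind(group_a, group_b, equal_var=False)
    m_stat, m_p, grand_median, table = median_test(group_a, group_b)

    print(f"difference of means:   t = {t_stat:.2f}, p = {t_p:.4f}")
    print(f"difference of medians: chi2 = {m_stat:.2f}, p = {m_p:.4f}")

If the two tests lead you to different conclusions, that itself is worth investigating before you report anything.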

I should note that there are occasions when discarding data before calculating the mean can be useful; however, these are occasions that are best handled by people trained to deal with them. If you ever find yourself having to estimate the location parameter of a Cauchy distribution, discarding data from the tails before calculating a mean can be helpful, but even more helpful is having it done by someone who's been trained to do it and does it a lot.

Main site

The Truth about Means and Medians © 2011, John FitzGerald

Friday, December 17, 2010

The future belongs to the Swiss Federal Institute of Technology

The Swiss Federal Institute of Technology in Zurich has a plan to simulate the entire world. According to this report the plan is to "gather data about the planet in unheard of detail, use it to simulate the behaviour of entire economies and then to predict and prevent crises from emerging."

Back when I was being trained in research, this type of project was always held up to us as an example of what not to do. Unfortunately, many people today believe that if you collect massive amounts of data the Truth will emerge from it. Dream on.

There are many problems with this approach. One of the chief ones is that the process is entirely inductive. Relationships are identified in data from the past and then extrapolated to the future. These relationships may hold in the future, or they may not. And if they do hold, they may not hold forever.

I often think that it would help decision-makers if they spent some time handicapping horse races. Believe me, if you see that horses on the rail have been winning all week, that is no guarantee that they are going to win for you today. There is no reason they should.

The basic problem here is the lack of a theoretical approach. Ordinarily you start with a theory, or at least a hypothesis, about how something in the world works, and then you collect the data necessary to establish whether your theory or hypothesis stands up to empirical test. If the theory is confirmed, you then try to modify it to increase its explanatory power. What the Swiss Federal Institute of Technology is trying to do is get the data to think up the theory for them. But data don't have brains. The theories the data come up with are going to lead you down a lot of paths that go nowhere. Any dataset is full of relationships, many — and sometimes all — of which are spurious, products of random variation or of systematic bias. Spurious relationships are not a foundation of successful forecasting.
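
A toy demonstration of how much spurious structure pure noise contains (my own sketch, not part of the Institute's project; the numbers of observations and variables are arbitrary): generate a dataset with no real relationships at all and count how many pairwise correlations still come out "significant".

    # Pure noise still yields "significant" correlations at roughly the 5% rate.
    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(5)
    n_obs, n_vars = 100, 40
    data = rng.normal(size=(n_obs, n_vars))   # no real relationships anywhere

    spurious, tests = 0, 0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            r, p = pearsonr(data[:, i], data[:, j])
            tests += 1
            if p < 0.05:
                spurious += 1

    print(f"{spurious} of {tests} pairwise correlations are 'significant' at p < .05")
    # roughly 5% of the pairs, despite there being nothing to find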

Another important problem is that the relationships you find between variables of the kind this project will collect are typically weak. Most of the variation in them is due to the effects of other variables that you usually don't have measures of. That means that predictions from the model will be at best grossly approximate. Among other things, the developers of this model of everything want to predict economic bubbles and collapses. However, since these predictions are almost certain to be only grossly approximate, they will offer little guide to policy. If you remember last winter, you remember the extremes governments went to after an H1N1 pandemic was predicted, and how unnecessarily expensive (and ineffective) those measures were.

But the European Union is sinking a billion euros into this venture. I'd wish them good luck, but I'm confident that even with the best luck possible this project is going to fail, and fail miserably.

Link to the first in a series of related articles at the main site

The future belongs to the Swiss Federal Institute of Technology © 2010, John FitzGerald

Friday, December 3, 2010

Gross national happiness

Canada is apparently considering joining the group of countries that assess their gross national happiness. Happiness is one of those concepts that has always interested me because a) so many people think it's extremely important, and b) so few people even attempt to define it. Love is a similar concept.

In fact the definition of gross national happiness is vague. The project seems chiefly to be an attempt to link population characteristics to feelings of well-being. Why you'd want to do that is a mystery to me. Sure, they've found that countries with low rates of infant mortality have happier citizens, but surely we don't justify fighting infant mortality as necessary to keep the public happy.

Similarly, an assessment of the adequacy of a country's economy has been proposed as an indicator of happiness, but if people are happy with an unsound economy are we to take that as a Good Thing? That approach doesn't seem to have worked too well in the United States, where many people were astonishingly proud of their economy until it came crashing down a few years ago.

I'm certain — or at least I'd like to be certain — that our government doesn't intend the assessment of happiness to be a guide to policy. If that's their intent, they should logically end up doing things like legalizing marijuana — that makes lots of people happy. If they don't intend it to be a guide to policy (and there's no good reason they should), there's no good reason to assess national happiness at all.

Gross national happiness © 2010, John FitzGerald

Actual Analysis website