Thursday, January 13, 2011

The TRUTH about means and medians

One of the many bees I have in my bonnet is buzzing about how people talk about the mean and the median. I just read a research report in which the mean was described as the average and the median was just called the median. It was a pretty good report, so I suspect the author was trying to help her non-statistically-trained readers. I still think this can be misleading, though.

The mean of a set of scores is simply the result of adding them up and dividing by the number of scores. The median, on the other hand, is the score that has equal numbers of other scores above and below it (with an even number of scores, it is the value midway between the two middle scores).
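To make the two definitions concrete, here is a minimal sketch in Python using only the standard library. The scores are made up for illustration; they are not from any report.

```python
from statistics import mean, median

# Hypothetical scores with one extreme value at the top.
scores = [2, 3, 3, 4, 5, 6, 40]

# Mean: add the scores up and divide by the number of scores.
mean_score = sum(scores) / len(scores)

# Median: the middle score once the scores are sorted.
median_score = median(scores)

print(mean_score)    # 9.0
print(median_score)  # 4

# The library function gives the same mean as the hand calculation.
assert mean(scores) == mean_score
```

With these invented scores the single extreme value pulls the mean up to 9, while the median stays at 4, with three scores below it and three above.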

The average of a set of scores is its midpoint. That is, the median is the average. The mean is an estimate of the median. We use it because it can be manipulated more effectively and profitably than the median can, usually without affecting the validity of conclusions.

Most people know that skew may make the mean inaccurate. However, most do not know that there are criteria for deciding if use of the mean should be reconsidered. I reconsider if the skewness coefficient (which is provided by spreadsheets as well as statistical software) is greater than 1 or less than -1.
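If you want to see what is behind the number, the check can be sketched as follows. I am assuming the adjusted sample skewness formula, which as far as I know is what the spreadsheet SKEW function reports; the function names and the example scores are my own inventions for illustration.

```python
from statistics import mean, stdev

def sample_skewness(scores):
    """Adjusted sample skewness (assumed to match a spreadsheet's SKEW)."""
    n = len(scores)
    if n < 3:
        raise ValueError("need at least three scores")
    m = mean(scores)
    s = stdev(scores)  # sample standard deviation (n - 1 in the denominator)
    if s == 0:
        raise ValueError("all scores are identical; skewness is undefined")
    cubed = sum(((x - m) / s) ** 3 for x in scores)
    return n / ((n - 1) * (n - 2)) * cubed

def mean_needs_second_look(scores, threshold=1.0):
    """Apply the rule of thumb: reconsider the mean if |skewness| > 1."""
    return abs(sample_skewness(scores)) > threshold

# Hypothetical data: one symmetric set, one with a long right tail.
print(mean_needs_second_look([1, 2, 3, 4, 5]))
print(mean_needs_second_look([2, 3, 3, 4, 5, 6, 40]))
```

The threshold of 1 is just the criterion described above, passed in as a parameter so that a different cutoff can be tried.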

Above all, do not do what some people do and throw out your highest and lowest scores, or the two highest and two lowest etc., as a protection against skew. While that probably does little harm, it doesn't help either. If you're using your data for descriptive purposes, the best solution is to use the median. That way you get the benefit of all your data.
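A small made-up example shows the kind of thing I mean. The scores below are hypothetical and chosen so that more than one score sits in the long tail; with other data the numbers will of course differ.

```python
from statistics import mean, median

# Hypothetical skewed scores with two extreme values at the top.
scores = [2, 3, 3, 4, 5, 38, 40]

full_mean = mean(scores)

# Throw out the single highest and single lowest score, then take the mean.
trimmed_scores = sorted(scores)[1:-1]
trimmed_mean_value = mean(trimmed_scores)

# The median uses every score and is not dragged around by the tail.
middle = median(scores)

print(full_mean, trimmed_mean_value, middle)
```

Here discarding the top and bottom scores still leaves a mean of 10.6, a long way from the median of 4, and two of the seven scores have been thrown away to get it.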

If you're using the data for inferential purposes, you should of course be using statistical tests. I suggest you compare the result of a test of differences between means with the result of a test of differences between medians (this is often a good idea even with unskewed data). If you're comparing either means or medians without a test, you are wasting your time. You need to know how likely a difference is to happen by accident before you can decide how important it is.
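One way to run both comparisons with the same machinery is a permutation test, sketched below. This is my substitution for illustration, not the only or the standard choice: a t-test for means and a median test would be the more usual pairing. The function name, the number of resamples, and the two groups are all assumptions.

```python
import random
from statistics import mean, median

def permutation_p_value(group_a, group_b, statistic, n_resamples=2000, seed=0):
    """Two-sided permutation test for a difference in `statistic`.

    Estimates how often a difference at least as large as the observed one
    turns up when the scores are shuffled between the groups at random.
    """
    rng = random.Random(seed)
    observed = abs(statistic(group_a) - statistic(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(statistic(pooled[:n_a]) - statistic(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (at_least_as_large + 1) / (n_resamples + 1)

# Hypothetical scores for two groups.
group_a = [12, 15, 11, 19, 14, 13, 16, 45]
group_b = [18, 21, 17, 25, 20, 19, 22, 24]

print("means:  ", permutation_p_value(group_a, group_b, mean))
print("medians:", permutation_p_value(group_a, group_b, median))
```

If the two p-values tell different stories, that is a sign the extreme scores are driving the comparison of means and the result deserves a closer look.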

I should note that there are occasions when discarding data before calculating the mean can be useful; however, these are occasions that are best handled by people trained to deal with them. If you ever find yourself having to estimate the location parameter of a Cauchy distribution, discarding data from the tails before calculating a mean can be helpful, but even more helpful is having someone do it who's been trained to do it and does it a lot.
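For the curious, here is a rough sketch of what discarding data from the tails looks like in the Cauchy case. It is only an illustration under my own assumptions: the samples are simulated, the location of 10, the sample size, and the choice to drop a quarter of the scores from each tail are arbitrary, and none of this replaces the trained person.

```python
import math
import random
from statistics import mean

def cauchy_sample(location, scale, n, seed=0):
    """Simulate n Cauchy scores by transforming uniform random numbers."""
    rng = random.Random(seed)
    return [location + scale * math.tan(math.pi * (rng.random() - 0.5))
            for _ in range(n)]

def trimmed_mean(scores, proportion):
    """Mean after discarding `proportion` of the scores from each tail."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(scores)
    k = int(len(ordered) * proportion)  # number of scores cut from each end
    return mean(ordered[k:len(ordered) - k])

# Hypothetical simulation: true location 10, scale 1.
sample = cauchy_sample(location=10.0, scale=1.0, n=2001, seed=1)

print("plain mean:  ", mean(sample))
print("trimmed mean:", trimmed_mean(sample, 0.25))
```

The plain mean of Cauchy data can land almost anywhere because a few enormous scores dominate it, whereas the trimmed mean tends to sit near the true location. How much to trim is exactly the kind of decision that calls for experience.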

The Truth about Means and Medians © 2011, John FitzGerald
