Monday, October 31, 2011

Average vs. average

I have run across people who, when calculating a mean, will discard their two or three highest and two or three lowest pieces of data and calculate the mean for the rest of their data. What they want to do is protect themselves against the effects of skew, specifically the distortion of a mean by a few extreme scores.

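That probably doesn't hurt, but there is a simpler and usually more effective way of dealing with this problem: use the median. The median is the middle score when your data are sorted from lowest to highest, so half of the scores fall at or below it and half at or above it. Because it depends on the order of the scores rather than their size, a few extreme scores barely move it, and for skewed data it is usually a better indication of the typical score than the mean. (When the data are symmetric, the mean and the median come out about the same.) So use the MEDIAN function in your spreadsheet rather than the AVERAGE function.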
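To see the difference in numbers, here is a small Python sketch using the standard statistics module. The scores are made up for illustration, and the trimming step drops only one score from each end; neither comes from a real data set.

```python
import statistics

# Hypothetical scores: seven ordinary values and one extreme one.
scores = [12, 15, 14, 13, 16, 15, 14, 250]

mean = statistics.mean(scores)      # pulled upward by the 250
median = statistics.median(scores)  # middle of the sorted scores

# Trimmed mean: sort, drop the lowest and highest score, average the rest.
trim = 1
trimmed = sorted(scores)[trim:len(scores) - trim]
trimmed_mean = statistics.mean(trimmed)

print("mean:", mean)
print("median:", median)
print("trimmed mean:", trimmed_mean)
```

With these scores the mean is 43.625, while the median and the trimmed mean are both 14.5. The median gets there without anyone having to decide how many scores to throw away.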

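There are some exceptions to this rule, though. If you're using your data to estimate a total (the total value of donations to an organization, for example) you'd use the mean, because the mean multiplied by the number of scores gives you the total and the median does not. And if you want to compare two sets of data with a statistical test, you would usually be better off using the mean, since the most common tests are built around means.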
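The point about totals can be checked with a short Python sketch. The donation amounts are invented for illustration.

```python
import statistics

# Hypothetical donations, with one large gift.
donations = [10, 20, 20, 25, 30, 500]

n = len(donations)
actual_total = sum(donations)

# Scaling each "average" back up by the number of donations.
total_from_mean = statistics.mean(donations) * n
total_from_median = statistics.median(donations) * n

print("actual total:", actual_total)
print("mean x n:", total_from_mean)
print("median x n:", total_from_median)
```

Here the actual total is 605. The mean times the number of donations recovers that figure (up to floating-point rounding), while the median times the number of donations gives 135, because the median ignores how large the big gift is.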

And if the skewness function in your spreadsheet (SKEW in Excel, for example) gives a skewness coefficient for your set of data that is between -1.00 and 1.00, you normally don't need to worry about this at all.
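If you want to run the same check outside a spreadsheet, here is a minimal Python sketch. It assumes the bias-adjusted sample skewness formula that Excel documents for SKEW; the function name sample_skewness and both data sets are hypothetical, and other programs may use a slightly different formula.

```python
import statistics

def sample_skewness(data):
    """Bias-adjusted sample skewness (assumed to match Excel's SKEW).

    Needs at least three values that are not all identical.
    """
    n = len(data)
    if n < 3:
        raise ValueError("need at least three values")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1)
    if s == 0:
        raise ValueError("all values are identical")
    cubed = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed

# Hypothetical data: one symmetric set, one with an extreme score.
symmetric = [1, 2, 3, 4, 5]
skewed = [12, 15, 14, 13, 16, 15, 14, 250]

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    skew = sample_skewness(data)
    verdict = "mean is fine" if -1.0 < skew < 1.0 else "consider the median"
    print(name, round(skew, 2), verdict)
```

The symmetric set comes out at essentially zero, and the set with the extreme score comes out well above 1.00, which is the signal to look at the median instead.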