Tuesday, February 7, 2012

One reason selection tests may not work

Let's suppose you wanted to find out how well students' marks on graduation from high school predicted their marks in the first year of university. You select a sample of students and correlate their high school marks with their university marks. You will often fail to find a statistically significant correlation.

This result is counterintuitive, but the reason for it is simple. Only the best students get into university, so even if they do as well in university as they did in high school, their marks will fall in a very restricted range. That is, there is simply less difference in ability among these students than there would be if the full range of ability had been sampled, so any correlation between their two sets of marks is weakened and hard to detect.
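A quick simulation makes the point concrete. This is an illustrative sketch, not real data. It assumes that standardized high school and university marks follow a bivariate normal distribution with a true correlation of 0.6, and that the university admits the top 15 percent of high school marks. The function names `pearson` and `simulate` and all parameter values are hypothetical choices made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=10_000, true_r=0.6, admit_fraction=0.15, seed=1):
    """Return (correlation in whole cohort, correlation among admitted students).

    Assumption (illustrative only): standardized marks are bivariate normal
    with correlation true_r, and admission goes to the top admit_fraction
    of high school marks.
    """
    rng = random.Random(seed)
    hs, uni = [], []
    for _ in range(n):
        h = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        hs.append(h)
        # University mark depends on the high school mark plus independent noise,
        # scaled so that the population correlation equals true_r.
        uni.append(true_r * h + math.sqrt(1 - true_r ** 2) * noise)
    # Admit only the students with the highest high school marks.
    cutoff = sorted(hs, reverse=True)[int(n * admit_fraction) - 1]
    admitted = [(h, u) for h, u in zip(hs, uni) if h >= cutoff]
    adm_hs = [h for h, _ in admitted]
    adm_uni = [u for _, u in admitted]
    return pearson(hs, uni), pearson(adm_hs, adm_uni)


full_r, restricted_r = simulate()
print(f"whole cohort:      r = {full_r:.2f}")
print(f"admitted students: r = {restricted_r:.2f}")
```

The relationship between the two marks is identical for every simulated student, so the lower correlation among admitted students comes entirely from the restricted range of high school marks.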

The distribution of marks will also probably be skewed (in the statistical sense, meaning the mean will be quite different from the median), which also militates against finding a correlation.
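Selection by itself is one way such skew can arise. Assuming, for illustration only, that high school marks are normally distributed across the whole cohort, this sketch keeps the top 15 percent of a simulated cohort and compares the mean, median, and skewness of the admitted group. The `skewness` helper is a simple moment-based estimate written for this example, and the cohort size and cutoff are hypothetical.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness: mean cubed deviation divided by the cubed population SD."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)


rng = random.Random(2)
# Hypothetical cohort: standardized high school marks, normally distributed.
cohort = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
# Keep only the top 15 percent, as an admissions cutoff might.
cohort.sort(reverse=True)
admitted = cohort[:1_500]

print(f"mean     = {statistics.fmean(admitted):.2f}")
print(f"median   = {statistics.median(admitted):.2f}")
print(f"skewness = {skewness(admitted):.2f}")
```

Cutting off the lower part of a symmetric distribution leaves a long right tail, so the mean of the admitted group sits above its median and the skewness is clearly positive.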

Problems like these are why I distrust the idea that people can conduct data mining even if they have no training in inferential statistics.

One reason selection tests may not work © 2001, John FitzGerald
