Thursday, October 4, 2012

More ways to sabotage selection

Yesterday we saw how weighting the different measures you combine to rate applicants for jobs or promotions or school placements or grants can end up undermining your ratings. The measures to which you assign the highest weight end up having almost all the influence on selection, while the other measures end up with none.

There are times, though, when people don't intend to weight their measures but end up weighting them inadvertently anyway. For example, if you measure one characteristic on a scale of 10 and another on a scale of 5, the measure with a maximum score of 10 will end up having more influence (barring extraordinary and very rare circumstances).
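To see how this plays out, here is a minimal sketch in Python (the applicants and their scores are invented for illustration):

# Characteristic A is rated out of 10, characteristic B out of 5.
# Each applicant is excellent on one characteristic and average on the other.
applicant_1 = {"A": 10, "B": 2.5}   # excellent on A, average on B
applicant_2 = {"A": 5,  "B": 5.0}   # average on A, excellent on B

print(sum(applicant_1.values()))    # 12.5
print(sum(applicant_2.values()))    # 10.0 -- the 10-point measure dominates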

That problem's easy to deal with: just make sure that all your measures have scales with the same maximum score. The second problem is a little more difficult: differences in variability can also weight the measures accidentally.

Some of your measures will almost always vary over a wider range than others. The statistic most widely used to assess variability is the standard deviation. The bigger the standard deviation, the more variable the scores. An example will demonstrate the problem differences in variability create.

Let's suppose that a professor gives two tests in a course, each of which is to count for 50% of the final mark. The first test has a mean of 65 and a standard deviation of 8, while the second has a mean of 65 and a standard deviation of 16. The problem with these statistics is that two students can do equally well but end up with different final marks. We'll look at two students' possible results.

The first student finishes one standard deviation above the mean on the first test and right at the mean on the second. That is, her marks are 73 and 65, and her final mark is (73 + 65)/2, or 69. The second student finishes at the mean on the first test and one standard deviation above the mean on the second. That is, her marks are 65 and 81, and her final mark is (65 + 81)/2, or 73. So, even though each student finished at the mean on one test and one standard deviation above the mean on the other, one ended up with a higher mark than the other.
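A minimal Python sketch of the same arithmetic, using the marks from the example above:

mean1, sd1 = 65, 8     # first test
mean2, sd2 = 65, 16    # second test

student_1 = (mean1 + sd1, mean2)   # one SD above the mean, then at the mean: (73, 65)
student_2 = (mean1, mean2 + sd2)   # at the mean, then one SD above it: (65, 81)

print(sum(student_1) / 2)   # 69.0
print(sum(student_2) / 2)   # 73.0 -- the more variable test counts for more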

To eliminate this bias you can calculate standard scores. You simply subtract the mean from each applicant's score and divide by the standard deviation. That gives you standard scores with a mean of zero and a standard deviation of one; applicants with scores above the mean will have positive standard scores and applicants with scores below the mean will have negative ones. If that sounds complicated, it's not. Spreadsheets will do it for you; in Excel you use the AVERAGE function to get the mean and the STDEV function to get the standard deviation (there is a STANDARDIZE function, but since it requires you to enter the mean and standard deviation it's no faster than writing the formula yourself).
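Here is a rough sketch of the same calculation done with standard scores, using the marks from the example (in practice you would compute the means and standard deviations from the whole class's marks):

def standard_score(mark, mean, sd):
    # Subtract the mean and divide by the standard deviation.
    return (mark - mean) / sd

student_1 = (standard_score(73, 65, 8) + standard_score(65, 65, 16)) / 2
student_2 = (standard_score(65, 65, 8) + standard_score(81, 65, 16)) / 2

print(student_1, student_2)   # 0.5 0.5 -- the two students now tie, as they should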

Even if that still seems like a lot of work to you, the choice is clear: either you do the work or you sabotage your ratings. If you sabotage your ratings you sabotage your selection, and if you sabotage your selection you sabotage your organization (and maybe others, if you're doing something like selecting outside applicants for grants).

For more information about standardization click here for the first of a series of brief articles. Alternatively, the next time you're compiling ratings you can involve staff with statistical training or a consultant.

More Ways to Sabotage Selection © 2012, John FitzGerald

Wednesday, October 3, 2012

The hidden danger in selection procedures

When you’re selecting people for jobs, students for university, or projects to fund, or making any of the many other significant choices we find ourselves faced with, you’re often advised to decide what characteristics you want the successful candidate to have, rate those characteristics numerically, weight them according to the importance you think each should have, and then add up the weighted ratings.

For example, if you’re rating three characteristics, and you think one is twice as important as each of the other two, you would take 50% of the rating of the most important characteristic and 25% of the ratings of each of the other two, then add them together.
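In code, that is just a weighted sum; a minimal sketch with made-up ratings:

weights = [0.50, 0.25, 0.25]   # the first characteristic counts twice as much
ratings = [8, 6, 9]            # hypothetical ratings of one applicant

weighted_rating = sum(w * r for w, r in zip(weights, ratings))
print(weighted_rating)         # 7.75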

The problem with that procedure, though, is that in the final analysis the weight of the most important characteristic will be far higher than you intended. We can see why this happens by looking at the logic of ratings.

Let’s say you’re selecting students for a program. Your rating scale, then, is intended as a measure of ability to succeed in studying the domain the program covers. You are assessing five characteristics, and assigning weights of 50%, 30%, 10%, 5%, and 5%.

If the several measures of ability to succeed are all measuring the same concept, then they will be highly correlated – people who score high on one measure will also score high on the others. When this is true there is no reason to weight the measures – that is, if they are measures of the same thing there is no justification for making one more important than the others. The statistics of test design provide clear criteria for determining whether a group of measures are all measuring the same thing.
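A quick, informal check is to look at the correlations among the measures. Here is a rough sketch with hypothetical scores (formal test-design criteria, such as internal-consistency statistics, go further than this):

import numpy as np

# Rows are applicants, columns are the five measures (hypothetical scores).
scores = np.array([
    [8, 7, 9, 6, 7],
    [5, 6, 4, 5, 6],
    [9, 8, 8, 9, 7],
    [4, 3, 5, 4, 4],
    [7, 7, 6, 8, 5],
])

# Correlations near 1 suggest the measures are assessing the same thing;
# correlations near 0 suggest they are not.
print(np.corrcoef(scores, rowvar=False).round(2))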

If the measures are not correlated, then they are measuring different aspects of ability to succeed. If they are combined without weighting they will tend to cancel each other out – a high score on one measure will be cancelled out by a low score on another uncorrelated measure – and scores will tend to accumulate in the middle of the score range.

If weights are assigned to the measures to reflect priority, the applicants who score high on the one or two measures with highest priority will tend to have ratings in the high range. The rest of the scores will continue to cancel each other out and the rest of the candidates will accumulate in the middle range.
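A small simulation makes both patterns visible. This is only a sketch: five uncorrelated standardized measures are generated at random for a batch of imaginary applicants, then combined with and without the weights from the example above.

import numpy as np

rng = np.random.default_rng(0)
measures = rng.standard_normal((1000, 5))          # 1,000 applicants, 5 uncorrelated measures
weights = np.array([0.50, 0.30, 0.10, 0.05, 0.05])

unweighted = measures.mean(axis=1)
weighted = measures @ weights

# The combined scores are much less spread out than the individual measures,
# i.e. they pile up in the middle of the range.
print(unweighted.std().round(2), weighted.std().round(2))   # roughly 0.45 and 0.6

# And the weighted total is dominated by the most heavily weighted measure.
print(np.corrcoef(weighted, measures[:, 0])[0, 1].round(2))  # roughly 0.84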

Accumulation of scores in the middle range creates a problem for selection, because the cut-off point usually is found in the middle range, and choices must be made between applicants whose scores are very similar. For example, if one student received a mark of 55 on a test of mathematics, and another student a 57, you would not conclude that the second student was a better mathematician than the first. The difference is probably due to random variation, perhaps something as simple as the first student having a headache.

This also means that the characteristics with lower priority will usually end up having no influence on selection at all because ratings of these characteristics will cancel each other out. If you are rating uncorrelated characteristics and want each to have a specific weight in selection you will need to use a procedure that ensures they will have this weight. A simple procedure in our example would be to draw 50% of the selected applicants from those with high scores on the most important characteristic, 30% from those with high scores on the second most important one, and so on. Alternatively, the selection can be made in stages to ensure that each characteristic is evaluated according to its priority rank and separately from uncorrelated characteristics.
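A rough sketch of the quota procedure (the applicants, scores, and quotas here are invented, and a real procedure would also need tie-breaking and other rules):

def select_by_quota(applicants, quotas, n_to_select):
    # Fill each characteristic's quota, in priority order, from the
    # highest-scoring applicants not yet selected.
    selected = []
    for characteristic, share in enumerate(quotas):
        slots = round(share * n_to_select)
        pool = sorted((a for a in applicants if a not in selected),
                      key=lambda a: applicants[a][characteristic], reverse=True)
        selected.extend(pool[:slots])
    return selected

# Hypothetical scores on three characteristics, with quotas of 50%, 25%, 25%.
applicants = {"A": [9, 4, 5], "B": [3, 9, 6], "C": [7, 6, 9],
              "D": [6, 5, 4], "E": [5, 7, 3], "F": [4, 8, 8]}
print(select_by_quota(applicants, [0.50, 0.25, 0.25], 4))   # ['A', 'C', 'B', 'F']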

Of course, sometimes some characteristics will be correlated and some not. The correlated characteristics can then be combined into a single score that will be more accurate than any of the individual characteristics by itself. The other lesson to be drawn from this is that someone familiar with test design should review selection procedures to ensure that they have the intended results. Ignoring the relationships between the characteristics you are assessing means that you will be defeating your own purposes – the ones implied by the weight you assigned to each characteristic.

Tomorrow we'll look at some insidious forms of weighting that can sabotage selection even when you don't deliberately weight scores.

The Hidden Danger in Selection Procedures © 2012, John FitzGerald