For example, if you’re rating three characteristics, and you think one is twice as important as each of the other two, you would take 50% of the rating of the most important characteristic and 25% of the ratings of each of the other two, then add them together.
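As a minimal sketch of that arithmetic in Python (the function name, the 0–10 scale, and the sample ratings are made up for illustration):

```python
def weighted_rating(ratings, weights):
    """Combine several ratings into one score using the given weights."""
    if len(ratings) != len(weights):
        raise ValueError("ratings and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(r * w for r, w in zip(ratings, weights))

# Hypothetical ratings on a 0-10 scale; the first characteristic
# is treated as twice as important as each of the other two.
score = weighted_rating([8, 6, 4], [0.50, 0.25, 0.25])
print(score)  # 8*0.50 + 6*0.25 + 4*0.25 = 6.5
```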
The problem with that procedure, though, is that in the final analysis the weight of the most important characteristic will be far higher than you had intended. We can see why this happens by looking at the logic of ratings.
Let’s say you’re selecting students for a program. Your rating scale, then, is intended as a measure of ability to succeed in studying the domain the program covers. You are assessing five characteristics, and assigning weights of 50%, 30%, 10%, 5%, and 5%.
If the several measures of ability to succeed are all measuring the same concept, then they will be highly correlated – people who score high on one measure will also score high on the others. When this is true there is no reason to weight the measures – that is, if they are measures of the same thing there is no justification for making one more important than the others. The statistics of test design provides clear criteria for determining whether all of a group of measures are measuring the same thing.
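A rough first look at whether measures move together is to compute the correlation between each pair. The sketch below is only that, not the formal criteria from test design; the measure names and the six applicants' scores are invented for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def pairwise_correlations(measures):
    """Return {(name_a, name_b): correlation} for every pair of measures."""
    return {
        (a, b): pearson(measures[a], measures[b])
        for a, b in combinations(measures, 2)
    }

# Hypothetical scores for six applicants on three measures.
measures = {
    "essay": [1, 2, 3, 4, 5, 6],
    "interview": [2, 3, 3, 5, 5, 7],
    "portfolio": [4, 1, 6, 2, 5, 3],
}
correlations = pairwise_correlations(measures)
for pair, r in correlations.items():
    print(pair, round(r, 2))
```

With these invented numbers, the essay and interview scores are strongly correlated, while the portfolio scores are close to uncorrelated with the essay scores.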
If the measures are not correlated, then they are measuring different aspects of ability to succeed. If they are combined without weighting they will tend to cancel each other out – a high score on one measure will be cancelled out by a low score on another uncorrelated measure – and scores will tend to accumulate in the middle of the score range.
If weights are assigned to the measures to reflect priority, the applicants who score high on the one or two measures with highest priority will tend to have ratings in the high range. The rest of the scores will continue to cancel each other out and the rest of the candidates will accumulate in the middle range.
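One way to put numbers on this, under two assumptions that are not stated above: the five characteristics are uncorrelated, and each is rated on a scale with the same spread (variance). In that case the variance of the weighted total is the sum of the squared weights times the common variance, so each characteristic's share of the differences between applicants is proportional to its weight squared rather than to the weight itself.

```python
# Intended weights from the example: 50%, 30%, 10%, 5%, 5%.
weights = [0.50, 0.30, 0.10, 0.05, 0.05]

# Assumption: uncorrelated characteristics with equal variance, so each
# one contributes weight**2 (times the common variance) to the total.
squared = [w ** 2 for w in weights]
total = sum(squared)
shares = [s / total for s in squared]

for w, share in zip(weights, shares):
    print(f"intended weight {w:.0%} -> share of score variance {share:.1%}")
```

Under those assumptions the characteristic weighted at 50% accounts for roughly 70% of the variation in total scores, the one weighted at 30% for about 25%, and the remaining three for under 5% between them.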
Accumulation of scores in the middle range creates a problem for selection, because the cut-off point usually falls in the middle range, and choices must be made between applicants whose scores are very similar. For example, if one student received a mark of 55 on a test of mathematics, and another student a 57, you would not conclude that the second student was a better mathematician than the first. The difference is probably due to random factors, perhaps something as simple as the first student having a headache.
This also means that the characteristics with lower priority will usually end up having no influence on selection at all, because ratings of these characteristics will cancel each other out. If you are rating uncorrelated characteristics and want each to have a specific weight in selection, you will need to use a procedure that ensures they will have this weight. A simple procedure in our example would be to draw 50% of the selected applicants from those with high scores on the most important characteristic, 30% from those with high scores on the second most important one, and so on. Alternatively, the selection can be made in stages to ensure that each characteristic is evaluated according to its priority rank and separately from uncorrelated characteristics.
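The first of these procedures, drawing a fixed share of the places from the top scorers on each characteristic, might be sketched as follows. Everything here is an assumption made for illustration: the function name, the characteristic labels c1 to c5, the randomly generated scores, and the choice to skip applicants who were already selected on a higher-priority characteristic.

```python
import random

def quota_select(applicants, quotas, n_select):
    """Fill each characteristic's share of the places from its top scorers.

    applicants: dict mapping applicant id -> {characteristic: score}
    quotas: list of (characteristic, share) pairs in priority order
    n_select: total number of places to fill
    """
    remaining = dict(applicants)
    selected = []
    for characteristic, share in quotas:
        # Rounding can leave the total a place short or over for some
        # combinations of shares and n_select; check before relying on it.
        places = round(share * n_select)
        ranked = sorted(
            remaining,
            key=lambda a: remaining[a][characteristic],
            reverse=True,
        )
        for applicant in ranked[:places]:
            selected.append(applicant)
            del remaining[applicant]  # nobody is selected twice
    return selected

# Hypothetical data: 100 applicants with independent random scores (0-100).
rng = random.Random(0)
names = ["c1", "c2", "c3", "c4", "c5"]
applicants = {i: {c: rng.randint(0, 100) for c in names} for i in range(100)}
quotas = list(zip(names, [0.50, 0.30, 0.10, 0.05, 0.05]))

selected = quota_select(applicants, quotas, n_select=20)
print(len(selected))
```

With 20 places, the quotas work out to 10, 6, 2, 1, and 1 places, so even the lowest-priority characteristics decide at least one place each.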
Of course, sometimes some characteristics will be correlated and some not. The correlated characteristics can then be combined into a single score that will be more accurate than the single characteristics by themselves. The other lesson to be drawn from this is that someone familiar with test design should review selection procedures to ensure that they have the intended results. Ignoring the relationships between the characteristics you are assessing means that you will be defeating your own purposes – the ones implied by the weight you assigned to each characteristic.
Tomorrow we'll look at some insidious forms of weighting that can sabotage selection even when you don't deliberately weight scores.