Friday, December 9, 2011

Better living through multiple linear regression analysis

I probably say somewhere on the main site that multiple regression analysis is overused, and indeed it is. Nevertheless, it does have valuable uses which I don't want to frighten people away from, so here's an article about one of them.

I regularly use regression analysis to clarify for a client the factors affecting satisfaction with training and rehabilitation programs the client offers. It started with a review of a program about which the client knew that the more enthusiastic consumers were about the program on entry, the more satisfied they were at the end. The question was whether final satisfaction or dissatisfaction with the programs was simply a self-fulfilling prophecy: did consumers say they were satisfied or dissatisfied with the programs simply to justify their initial attitudes?

The client also collected information about consumers' opinions of various characteristics of their programs. This information was not correlated with initial attitude, nor were different types of this information correlated with each other. It was therefore easy, using multiple linear regression analysis, to estimate what proportion of final satisfaction could be explained by initial attitude toward the programs, and then see if characteristics of the programs explained the remainder of the final satisfaction (the residual, as it's known in regression analysis). It turned out that characteristics of the programs were twice as important as initial attitude in determining satisfaction with the programs.
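Here is a minimal sketch of that two-step calculation in Python with NumPy. The data are simulated, not the client's. The variable names (`initial_attitude`, `staff_rating`, `facility_rating`), the sample size, and the coefficients are all invented for illustration. The coefficients were chosen so that the simulated characteristics explain roughly twice as much as initial attitude.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # made-up sample size

# Simulated, mutually independent predictors (assumption: no correlation
# between them, as in the situation described above).
initial_attitude = rng.normal(size=n)
staff_rating = rng.normal(size=n)      # hypothetical program characteristic
facility_rating = rng.normal(size=n)   # hypothetical program characteristic

# Simulated outcome: each predictor contributes equally, plus random noise.
satisfaction = (initial_attitude + staff_rating + facility_rating
                + rng.normal(size=n))

def r_squared(predictors, y):
    """Proportion of variance in y explained by a least-squares fit
    on the given predictors plus an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ coef
    return 1.0 - residual.var() / y.var()

# Step 1: how much of final satisfaction does initial attitude explain?
r2_attitude = r_squared([initial_attitude], satisfaction)

# Step 2: how much more is explained once program characteristics are added?
r2_full = r_squared([initial_attitude, staff_rating, facility_rating],
                    satisfaction)
r2_added = r2_full - r2_attitude

print(f"explained by initial attitude:        {r2_attitude:.2f}")
print(f"added by program characteristics:     {r2_added:.2f}")
```

Splitting the explained variance this cleanly is only legitimate because the predictors are uncorrelated. With correlated predictors, the share credited to each one depends on the order in which they enter the model.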

So not only did multiple linear regression analysis determine that satisfaction with the programs was not a self-fulfilling prophecy, it also estimated the relative importance of initial attitude and of the actual characteristics of the programs. The analysis was made easier by the lack of correlation between the different types of information collected, but correlated information can be analyzed with more complicated designs. The possible existence of correlation, though, is the chief reason you shouldn't try this at home. Statistical and database software make it easy to do multiple linear regression analysis, but if you don't know how to deal with correlated variables or how to identify outliers (extreme observations which distort the results), you'll often get the wrong results when you use that software.
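As a rough illustration of the two checks mentioned above, the following Python sketch looks for correlation between predictors and flags extreme observations. Everything in it is an assumption made for the example: the data are simulated, `x2` is deliberately built as a near copy of `x1`, one outlier is planted by hand, and the cutoff of three standard deviations is a common rule of thumb rather than a fixed standard.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200  # made-up sample size

x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)  # deliberately almost a copy of x1
y = x1 + rng.normal(size=n)
y[0] += 15.0                              # planted outlier in observation 0

# Check 1: are the predictors correlated with each other?
corr = np.corrcoef(x1, x2)[0, 1]
print(f"correlation between x1 and x2: {corr:.2f}")

# Check 2: are there extreme observations? Fit y on x1 with an intercept,
# then express each residual in units of the residual standard deviation.
X = np.column_stack([np.ones(n), x1])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ coef
z = residual / residual.std()

# Flag observations whose residual is more than 3 standard deviations out.
flagged = np.flatnonzero(np.abs(z) > 3)
print("observations flagged as possible outliers:", flagged)
```

A high correlation between predictors, or a flagged observation, does not tell you what to do next. It only tells you that the simple analysis can't be trusted as it stands.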

We have since gone on to use this technique to determine whether the factors consumers say are important in determining their satisfaction are in fact the most important. We have frequently found that a simple count of the most popular explanations is contradicted by the multiple linear regression analysis. This is not surprising, since counting explanations, even if they are valid, gives us only a very rough estimate of the importance of different factors. The multiple linear regression analysis clarifies the issue.

Of course, it is also important that you use a proper hypothesis-testing design. Just turning multiple linear regression loose on a set of data is almost certain to produce a large proportion of unhelpful or misleading results.
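To see why, here is a small simulation in Python, offered as a sketch rather than a proof. It generates an outcome and a large set of predictors that are all pure random noise, so no predictor has any real relationship with the outcome. It then counts how many predictors nevertheless pass a rough screening threshold. The sample size, the number of predictors, and the threshold of 2 divided by the square root of n (an approximation to the usual 5 per cent significance level for a correlation) are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100             # made-up number of observations
n_predictors = 200  # made-up number of candidate predictors

# Outcome and predictors are independent noise: nothing here is really related.
y = rng.normal(size=n)
X = rng.normal(size=(n, n_predictors))

# Correlation of each predictor with the outcome.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
r = (Xc.T @ yc) / (n * X.std(axis=0) * y.std())

# Rough screening threshold, approximately the 5 per cent level.
threshold = 2.0 / np.sqrt(n)
spurious = int(np.sum(np.abs(r) > threshold))

print(f"{spurious} of {n_predictors} noise predictors look 'significant'")
```

Any predictor counted here is a false alarm by construction. Deciding in advance which hypotheses to test is what keeps false alarms like these from being reported as findings.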

Better Living through Multiple Linear Regression Analysis © 1999, 2011 John FitzGerald
