Wednesday, 27 January 2016

Strange kind of support for multiple regression analysis

Perhaps I've completely misunderstood the point made by Richard Nisbett, professor of psychology at the University of Michigan, but the arguments he adduces against multiple regression analysis seem to me, paradoxically, to be arguments for it. Here is a stretch from the transcript of his conversation on Edge:
"The thing I’m most interested in right now has become a kind of crusade against correlational statistical analysis—in particular, what’s called multiple regression analysis. Say you want to find out whether taking Vitamin E is associated with lower prostate cancer risk. You look at the correlational evidence and indeed it turns out that men who take Vitamin E have lower risk for prostate cancer. Then someone says, "Well, let’s see if we do the actual experiment, what happens." And what happens when you do the experiment is that Vitamin E contributes to the likelihood of prostate cancer. How could there be differences? These happen a lot. The correlational—the observational—evidence tells you one thing, the experimental evidence tells you something completely different. 
In the case of health data, the big problem is something that’s come to be called the healthy user bias, because the guy who’s taking Vitamin E is also doing everything else right. A doctor or an article has told him to take Vitamin E, so he does that, but he’s also the guy who’s watching his weight and his cholesterol, gets plenty of exercise, drinks alcohol in moderation, doesn’t smoke, has a high level of education, and a high income. All of these things are likely to make you live longer, to make you less subject to morbidity and mortality risks of all kinds. You pull one thing out of that correlate and it’s going to look like Vitamin E is terrific because it’s dragging all these other good things along with it. 
This is not, by any means, limited to health issues. A while back, I read a government report in The New York Times on the safety of automobiles. The measure that they used was the deaths per million drivers of each of these autos. It turns out that, for example, there are enormously more deaths per million drivers who drive Ford F150 pickups than for people who drive Volvo station wagons. Most people’s reaction, and certainly my initial reaction to it was, "Well, it sort of figures—everybody knows that Volvos are safe." 
Let’s describe two people and you tell me who you think is more likely to be driving the Volvo and who is more likely to be driving the pickup: a suburban matron in the New York area and a twenty-five-year-old cowboy in Oklahoma. It’s obvious that people are not assigned their cars. We don’t say, "Billy, you’ll be driving a powder blue Volvo station wagon." Because of this self-selection problem, you simply can’t interpret data like that. You know virtually nothing about the relative safety of cars based on that study.
I saw in The New York Times recently an article by a respected writer reporting that people who have elaborate weddings tend to have marriages that last longer. How would that be? Maybe it’s just all the darned expense and bother—you don’t want to get divorced. It’s a cognitive dissonance thing. 
Let’s think about who makes elaborate plans for expensive weddings: people who are better off financially, which is by itself a good prognosis for marriage; people who are more educated, also a better prognosis; people who are richer; people who are older—the later you get married, the more likelihood that the marriage will last, and so on.
The truth is you’ve learned nothing. It’s like saying men who are a somebody III or IV have longer-lasting marriages. Is it because of the suffix there? No, it’s because those people are the types who have a good prognosis for a lengthy marriage.
A huge range of science projects are done with multiple regression analysis. The results are often somewhere between meaningless and quite damaging."
Multiple regression analysis, as I understand it, is precisely the technique a researcher can use to find out whether a factor that correlates with an observed outcome really makes an independent contribution to that outcome or whether it is merely 'parasitic' on other, more important factors. The technique lets you see what happens when you hold these other factors constant: for instance, everything else being equal, does spending more money on your wedding still make your marriage last longer? The examples Nisbett gives to discredit multiple regression analysis (people's Vitamin E consumption, the type of car people drive, the amount of money people spend on their weddings, etc.) in fact perfectly showcase the technique's merits.
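To make "keeping the other factors constant" concrete, here is a minimal sketch in Python using NumPy. Everything in it is an assumption made for illustration: the data are simulated, not real wedding statistics, and the variable names (`wealth`, `spend`, `duration`) and the helper `fit_ols` are hypothetical. In this toy world wealth drives both wedding spending and marriage duration, while spending itself has no effect at all. It also assumes the confounder has actually been measured and entered into the model.

```python
import numpy as np


def fit_ols(predictors, outcome):
    """Return ordinary least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(outcome))] + list(predictors))
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs


# Simulated data (an assumption for illustration, not real figures).
rng = np.random.default_rng(0)
n = 5000
wealth = rng.normal(size=n)                   # the confounder
spend = 0.8 * wealth + rng.normal(size=n)     # richer couples spend more
duration = 2.0 * wealth + rng.normal(size=n)  # spending has NO direct effect

# Simple regression: duration on spending alone.
simple = fit_ols([spend], duration)

# Multiple regression: duration on spending AND wealth.
multiple = fit_ols([spend, wealth], duration)

print("spend coefficient, wealth ignored:     ", round(simple[1], 2))
print("spend coefficient, wealth held constant:", round(multiple[1], 2))
print("wealth coefficient:                     ", round(multiple[2], 2))
```

With wealth left out, spending appears to lengthen marriages; once wealth is included as a second predictor, the spending coefficient should shrink towards zero and the effect is attributed to wealth instead.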


