## Weakliem on air rage and himmicanes

Weakliem writes:

I think I see where the [air rage] analysis went wrong.

The dependent variable was whether or not an “air rage” incident happened on the flight. Two important influences on the chance of an incident are the number of passengers and how long the flight was (their data apparently don’t include the number of passengers or duration of the flight, but they do include number of seats and the distance of the flight). As a starting point, let’s suppose that every passenger has a given chance of causing an incident for every mile he or she flies. Then the chance of an incident on a particular flight is approximately:

p=knd

p is the probability of an incident, k is the chance per passenger-mile, n is the number of passengers, and d is the distance. It’s approximate because some incidents might be the second, third, etc. on a flight, but the approximation is good when the probabilities are small, which they are (a rate of about 2 incidents per thousand flights). When you take logarithms, you get

log(p)=log(k) + log(n) + log(d)
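Here is a minimal Python sketch of how good the approximation is. It assumes, as Weakliem does, that each passenger-mile independently produces an incident with the same chance k. Under that assumption the exact probability of at least one incident is 1 - (1 - k)^(nd). The value of k and the 150-passenger, 1,000-mile flight are made-up numbers, chosen so that p comes out near the quoted rate of about 2 incidents per thousand flights. `p_exact` and `p_approx` are hypothetical helper names.

```python
import math

# Hypothetical per-passenger-mile incident chance, chosen so that a
# 150-passenger, 1,000-mile flight has p close to 0.002.
K = 0.002 / (150 * 1000)

def p_exact(k, n, d):
    """P(at least one incident) when each of n*d passenger-miles independently
    triggers an incident with probability k: 1 - (1 - k)**(n*d)."""
    # log1p/expm1 keep precision when k is tiny.
    return -math.expm1(n * d * math.log1p(-k))

def p_approx(k, n, d):
    """Weakliem's approximation p = k*n*d (ignores second, third, ... incidents)."""
    return k * n * d

for scale in (1, 100):  # scale=100 pushes p to about 0.2, where the approximation strains
    k = K * scale
    exact, approx = p_exact(k, 150, 1000), p_approx(k, 150, 1000)
    print(f"k={k:.2e}: exact={exact:.5f} approx={approx:.5f} "
          f"rel. error={(approx - exact) / exact:.2%}")

# On the log scale the approximation is additive: log p = log k + log n + log d.
lhs = math.log(p_approx(K, 150, 1000))
rhs = math.log(K) + math.log(150) + math.log(1000)
print(f"log p = {lhs:.6f}, log k + log n + log d = {rhs:.6f}")
```

At p near 0.002 the two agree to within about a tenth of a percent. At p near 0.2 the gap is roughly ten percent, which is why the approximation is only claimed for small probabilities.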

DeCelles and Norton used logit models–that is, log(p)/log(1-p) was a linear function of some predictors. (When p is small, the logit is approximately log(p)). So while they included the number of seats and distance as predictors, it would have been more reasonable to include the logarithms of those variables.
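The parenthetical can be checked numerically. The sketch below uses the standard definition logit(p) = log(p/(1-p)). It shows that the gap between logit(p) and log(p), which equals -log(1-p), is about p itself. That gap is negligible at an incident rate of roughly 0.002, but it is not negligible at p = 0.5.

```python
import math

def logit(p):
    """Standard logit: log(p / (1 - p))."""
    return math.log(p / (1 - p))

# For small p the logit and log(p) differ by -log(1 - p), which is about p.
for p in (0.002, 0.01, 0.1, 0.5):
    gap = logit(p) - math.log(p)
    print(f"p={p}: logit={logit(p):.4f} log(p)={math.log(p):.4f} gap={gap:.4f}")
```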

What if the true relationship is the one I’ve given above, but you fit a logit using the number of seats as a predictor? . . . there are systematic discrepancies between the predicted and actual values. That’s relevant to the estimates of the other predictors. . . . a model that adds variables for those qualities will find that first class with front boarding has higher rates than expected given the number of seats, which is exactly what DeCelles and Norton appeared to find. . . .
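The systematic discrepancies can be reproduced with a small simulation. This is a sketch under made-up assumptions, not DeCelles and Norton's data or model:

* Distance is held fixed.
* The true incident probability is exactly proportional to the number of seats.
* Expected proportions are used instead of simulated incidents, so there is no noise.

`fit_logit` is a hypothetical hand-rolled Newton-Raphson fit of a grouped logistic regression. A real analysis would use a GLM routine from a statistics package. When seats enter linearly, the residuals are negative for the smallest and largest planes and positive in between. When log(seats) enters instead, the residuals essentially vanish.

```python
import math

def fit_logit(xs, props, weights, iters=100):
    """Grouped-data logistic regression of props on [1, x] by Newton-Raphson.

    props[i] is the incident proportion among weights[i] flights whose
    predictor value is xs[i]. Returns (intercept, slope).
    """
    # Start from the intercept-only fit.
    mean_p = sum(p * w for p, w in zip(props, weights)) / sum(weights)
    b0, b1 = math.log(mean_p / (1 - mean_p)), 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in zip(xs, props, weights):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            r, v = w * (y - p), w * p * (1 - p)  # score term, information weight
            g0, g1 = g0 + r, g1 + r * x
            h00, h01, h11 = h00 + v, h01 + v * x, h11 + v * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1

# Hypothetical setup: distance held fixed, so the true p is proportional to seats.
SEATS = list(range(50, 401, 50))
C = 0.002 / 200                      # a 200-seat flight has p = 0.002
TRUE_P = [C * n for n in SEATS]
FLIGHTS = [10_000] * len(SEATS)      # weights; expected proportions, no noise

def residuals(transform):
    """Observed minus fitted incident rate when seats enter as transform(seats)."""
    xs = [transform(n) for n in SEATS]
    b0, b1 = fit_logit(xs, TRUE_P, FLIGHTS)
    fitted = [1 / (1 + math.exp(-(b0 + b1 * x))) for x in xs]
    return [y - f for y, f in zip(TRUE_P, fitted)]

linear = residuals(lambda n: n / 100)  # seats entered linearly (rescaled)
logged = residuals(math.log)           # seats entered as log(seats)
for n, r_lin, r_log in zip(SEATS, linear, logged):
    print(f"{n:4d} seats: linear-fit residual {r_lin:+.2e}, log-fit residual {r_log:+.2e}")
```

The true log-odds are concave in seats, so a straight line in seats cannot track them. The misfit is therefore structured rather than random.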

This is the same problem that led to a spurious result in the hurricane names study.

In some sense this doesn’t matter to me because the air rage and himmicane studies were so bad anyway. It won’t matter to NPR, TED, PNAS, etc., either, because they follow the rule of publishing well-packaged work by well-connected authors that makes novel claims. Also, these researchers had the goal of publishing something dramatic from these research projects. Given that, I have no doubt that even had they followed Weakliem’s particular advice here, they would still have had enough researcher degrees of freedom to pull some rabbits out of these particular hats.

I’m sharing this because (a) it’s always fun to hear about air rage and himmicanes, and (b) Weakliem is making a good general point about regression models. Usually what I say is that the most important thing is not how you model your data, but rather what data you include in your model. But Weakliem is illustrating here that sometimes how you do it does matter.

1. Thomas says:

A typo there? logit is log(p/(1-p))
Also, it would make sense to include log(n) and log(d) as offsets, to keep the equation as is and model only the incidence rate (so to speak).
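Thomas’s offset suggestion can be sketched as follows. The logit is close to log(p) when p is small, so the sketch swaps in a Poisson model with a log link. This is a named substitution, not what the original papers fit. The model enters log(seats) + log(distance) as an offset whose coefficient is fixed at 1, with seats standing in for passengers as in the data described above. The flights are invented toy numbers. With only an intercept plus the offset, the maximum-likelihood estimate of k has a closed form: total incidents divided by total seat-miles.

```python
import math

# Hypothetical toy flights: (seats, miles, incident 0/1). Not real data.
flights = [
    (150, 800, 0), (180, 2400, 1), (120, 500, 0),
    (300, 5000, 1), (160, 1200, 0), (200, 3000, 0),
]

# Exposure in seat-miles stands in for passenger-miles (seats as a proxy for n).
exposure = [n * d for n, d, _ in flights]
incidents = [y for _, _, y in flights]

# Poisson model: log E[y] = log(k) + log(n) + log(d), where log(n) + log(d) is an
# offset (coefficient fixed at 1). The intercept-only MLE has a closed form:
k_hat = sum(incidents) / sum(exposure)
log_k_hat = math.log(k_hat)

# Check: the score equation sum(y - exposure * k) = 0 holds at the MLE.
score = sum(y - e * k_hat for y, e in zip(incidents, exposure))
print(f"k_hat = {k_hat:.3e} incidents per seat-mile, score = {score:.1e}")
```

With real predictors added, the model would be fit with a GLM routine that accepts an offset. The point here is only that the offset pins the coefficients on log(n) and log(d) at 1, so the remaining terms describe the incident rate per seat-mile.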

2. Alex says:

I’m reading through your (Gelman et al.’s) new Regression book, and that sounds a lot like an example/suggestion from the transformations chapter.

3. Michael Nelson says:

You raise an interesting point. We all know that researcher df’s create multiple paths to significant outcomes (once you ignore multiplicity), just by chance, because the number of possible variables and models and analytical methods increases exponentially. But that’s only assuming that the researcher is going to choose among valid options. If we allow that many researchers apply analyses that, knowingly or not, are technically incorrect, then we’ve added a completely new dimension to the garden, and the exponentiality itself increases exponentially.

For example: Suppose the researcher comes to her first fork when choosing between two outcome variables. Suppose these are the only two valid options, so the single path splits into two. But there’s a third prong in which the researcher averages the two outcomes, even though doing so is invalid due to substantive or technical restrictions. After several of these junctions, you end up in some sort of Borges-Escher k-dimensional garden!

And you can’t really even blame the researcher. Statistics is so complicated, and so dependent on the specific study, that it is perhaps impossible to anticipate all the wrong paths and warn researchers away from them, or even to preemptively suggest the “one” correct path that steers them away from the others.

As you say, statistics is hard, and I know that when I use an unfamiliar technique, I have to affirmatively determine whether each decision is permissible. To a non-statistician, they’re all unfamiliar. Asking them to do that kind of due diligence is unreasonable. This is self-serving, but it makes me wonder if the only solution is to declare that statisticians should be doing statistics. I mean, objectively, isn’t it weird that a researcher who may be brilliant in their substantive field gets the benefit of the doubt that their analysis is correct because they took a couple of stats classes in grad school? Would you ask your plumber to work on your electrical wiring?

• Andrew says:

Michael:

There’s a credentialed idiot in the Williams College math department who would’ve benefited from your advice a couple months ago!

• Michael Nelson says:

If the issue were that he didn’t know better, you’d have a point. I strongly suspect he was being disingenuous in order to “earn” a consulting fee or as a Trump supporter. No credential can ensure integrity. Unless we want to adopt the statistician’s version of the Hippocratic oath–“First, do no harmful analyses.”

• Andrew says:

Michael:

Sure. But when I say he was an idiot, I’m not saying just that he was an idiot for doing stupid statistics. He was also an idiot if he thought that some consulting fee or some political statement was worth destroying his reputation.

• Dale Lehman says:

I can’t agree to this suggestion. I’m an economist, so I suppose I’ll have to stop doing any statistical analysis. And you had better not try to do any economic analysis. But, then again, I’m a microeconomist so I’d better stay away from macro. Actually, I do, and have refused to teach macro for decades (having not been promoted at one time because of that refusal).

My point is that determining areas in which people are ‘allowed’ to participate cannot be subject to a litmus test of what their degree is in, what field they specialized in, what school they went to, or what their GRE scores were, for that matter. We live in a world, for better or worse, where the “solution” will require permitting people to work outside of their fields. We must ultimately judge their work on its quality and not on the basis of their credentials (although the latter can be part of the determination, with how much it counts also a matter of judgement).

For the record, I am also not a fan of saying that only electricians can work on electrical work and only plumbers can work on plumbing. I’m not even sure I would regulate that only MDs can practice medicine, but I certainly want transparency of credentials. It’s not just statistics that’s hard – good quality work in anything is hard, including judging what good quality work looks like.

• Martha (Smith) says:

Good points.

• Michael Nelson says:

No one’s suggesting you need a stats degree to be allowed to do statistical analyses. A statistician is as a statistician does, to coin a cliché. My degree’s in psychology, but I consider myself a statistician because I was trained as one, and I have the essential skills and mindset, including the inclination and ability to recognize my ignorance and to remedy it through study or consultation. My observation is that too few people who publish statistical analyses have the necessary skills and mindset. My suggestion is that social sciences might benefit if we didn’t automatically extend faith in analyses on the basis of substantive expertise.

This actually goes to your point about transparency. Right now, we really do rely on faith and a cursory peer review by people who may be just as inexpert. There is no pre-publication quality assurance. But imagine if it were standard practice for journals to require people to write up their analysis, data, code, etc., and to submit that for review even if it won’t be published in the article. And imagine if the analysis would only be accepted if one of three things were true: 1) a coauthor is a statistician, or 2) a statistician has signed off on the analysis (their name is on the paper for that specifically and not as a coauthor), or else 3) one of the reviewers will be a statistician. And I mean “statistician” in the above-defined sense, perhaps demonstrated by any one of a dozen different criteria, like affiliation, degree, publications…. That’s transparency.

Finally, and a slight digression: specialization is a fact of life, and one that’s improved the human condition immeasurably. Think how few good heart surgeons there would be if we expected them to be equally good at brain surgery. The reason for wanting electricians to do wiring isn’t elitism. There are three reasons. First, I want a bonded electrician in case my house burns down. Second, insurance companies would go bankrupt if they bonded people without proof of expertise. Third, electricians therefore use certifications to prove to the insurer that they know what they’re doing. This is how almost every field works. My contention is that we need to recognize most statistical analyses are accepted on faith, many don’t deserve it, and the consequences are bad enough that we should consider some kind of specialization.

• I agree with Dale Lehman’s comments above. You write that “My degree’s in psychology, but I consider myself a statistician because I was trained as one, and I have the essential skills and mindset.” Surely, those who do poor statistical analyses would also claim that they were “trained” to think statistically, and that they have the “essential skills and mindset.” I have a hard time believing that if you asked the himmicane people, or the others producing nonsense, if they are capable of analyzing data they would slap their foreheads in embarrassment — “now that you mention it, I actually don’t know how to deal with data!”

• Michael Nelson says:

You make my point for me: our only option currently is to extend blind faith to every analysis before publication, relying on each researcher’s egocentric opinion of their own statistical abilities, frequently allowing them to obscure their precise analytical process from reviewers and readers alike. Maybe I’m not a good statistician–maybe I’m a poet who relaxes by reading statistics blogs. Without some kind of verification, you don’t know. Same with your attorney or your doctor or your mechanic. I just happen to lean toward the notion that we should take our statistical analyses at least as seriously as we take our car’s alternator. Much more is riding on the analyses. (Pun intended!)

Besides, I’m not talking about something radical or onerous. Generally speaking, journals assign reviewers based on substantive expertise to encourage productive feedback and ensure some level of quality. Adding one quantitative reviewer might achieve the same for statistical analyses. And we don’t have to have a Socratic dialogue over defining who’s qualified to review a study in child psychology, so why should it be so hard to decide who’s a qualified statistician?

• We don’t (or shouldn’t) “extend blind faith to every analysis before publication”, or even after; rather, we should actually assess the analysis. I don’t see how “some kind of verification” of the authors’ abilities (whatever this may mean) is either necessary or sufficient. In the second paragraph you suggest ensuring that the *reviewers* have the background to analyze aspects of the paper, which I agree with, but which is, I think, different than the other parts of your suggestion.

It’s likely, though, that I’m misunderstanding your argument. Perhaps you could clarify what “some kind of verification” means: whether this is some sort of assessment of the researchers’ general skills, and how this might actually work in reality. (Would I pass this verification, and therefore be allowed to analyze data?)

• Michael Nelson says:

Well, my argument has changed because you (and Dale) have been persuasive! As you both say, direct inspection of the analysis is ideal. Given that we can’t (yet) manage to convince people to disclose code, much less the multiverse of analyses they tried, building disclosure into a confidential process like peer review seems more workable than full transparency.

Verification of ability does not ensure that the researcher was honest or made no mistakes, but it does improve the likelihood that they were capable of doing it right in the first place. Again, we have this in all parts of life, even in science (people are hired and get funding based, in part, on their CV), just not at the level of publication. But I do prefer to preserve the principle of universal access to publication. Among scientists, this is a moral virtue, as well as a practical one. A completely public, actual-peer review would be better, of course.

And I *never* meant to imply that someone should be prevented from analyzing data, only that top journals require some level of verification by someone credentialed. Were such a system to be adopted (as opposed to my now-preferred approach of qualified peer reviewers), it would change the incentives around the quality of statistical analysis. Someone whose career/reputation is based in doing good statistical analyses will be embarrassed if a paper she signed off on turns out to be numerical gibberish, so she has an incentive to do good work. I theorize that part of the reason substantive researchers keep publishing poor analyses is that getting called out for it isn’t embarrassing–their peers are doing the same analyses and so won’t judge the researcher as harshly as if she made a substantive blunder. Same thing with relying on p-values and citing retracted papers–there’s no incentive to stop if it gets you published and all your peers do it, too.

4. rm bloom says:

They left out the dependence on “snipes”