Why do we, as a discipline, have so little understanding of the methods we have created and promote? Our primary tool for gaining understanding is mathematics, which has obvious appeal: most of us trained in math, and there is no better form of information than a theorem that establishes a useful fact about a method. But the preceding sentence imposes a heavy burden: it must be possible to prove a theorem, and the facts established by the theorem must be useful. We find finite-sample facts indispensable because real datasets have finite samples, and asymptotic theorems never tell us how to apply their conclusions to finite samples. But finite-sample theorems about contemporary methods are rare; given their popularity in earlier eras, it seems inescapable that they are now at least extremely difficult to prove.

This paper considers a complementary tool for opening our black-box methods, modeled explicitly on the approach molecular biologists use to open Nature’s black boxes.

I (Dan) came across a fun and fascinating article by Jim Hodges about how we explain and understand (or, really, how we don’t) statistical models. It falls nicely within the space of things that I’ve been thinking about recently and it is well worth a read.

The focus here is very much on quite simple linear mixed effects models (and maximum likelihood-type fitting of those models), but it’s really about building up a framework for systematically understanding statistical models. The language he uses is that of scientific experiments, and it reminds me a lot of how Aki talks about his computational experiments and how he writes them up before they’re turned into papers. (See, for example, his R notebooks.)

Anyway. I have nothing specific to add except this is a thing that is worth reading and thinking about and building off.

Dan:

If you can’t come up with an interesting title, just take one from this list. That always works for me!

Quote from the blogpost: “I (Dan) came across a fun and fascinating article by Jim Hodges about how we explain and understand (or, really, how we don’t) statistical models.”

I always associate posts by Mr. Simpson in my head with music, because (if I am not mistaken) he frequently incorporates lyrics and/or song titles in his posts. I like songs and music.

The quote above reminded me of lyrics by Paul Weller in the song “Changing man” which goes like this (and could possibly be a fitting title of the post as well):

“The more I see, the more I know. The more I know, the less I understand”

https://www.youtube.com/watch?v=0v9WhRpQw8E

Safety in Numbers seems appropriate. Or maybe “¿Safety in (small) Numbers?”

Hodges wrote (p. 4),

“As a matter of strategy, is our discipline or indeed an individual researcher better off doing something relatively simple (the molecular-biological approach) and learning something quickly, or betting on the ability to produce useful facts in the long run with mathematics? A reasonable strategy would, it would seem, do some of each.”

The “some of each” process describes what often actually goes on in doing research in mathematics. A couple of quotes come to mind. (I don’t recall the exact quotes, so am giving paraphrases; I also don’t recall the names of the people I am quoting.)

“I have eyes to see where I am going and feet to get there. That is the role of intuition and proof in mathematics, no more and no less.” (From someone who was once on the math faculty at Bryn Mawr College, a contemporary of Emmy Noether.)

“I spend MWF trying to prove the theorem, and TuTh trying to find counterexamples.” (From someone giving a talk at a conference.)

To elaborate for non-mathematicians: Research in math involves two aspects: conjecturing what is true, and proving it. In actual research, these are intertwined. Someone may examine several cases where the same “givens” occur and observe that they all have an additional property in common. So the person may conjecture that the “givens” imply that additional property. They (or someone else, or several other people, either simultaneously, collaboratively, or serially) may then try to prove the conjecture. The process of trying to prove the conjecture often involves looking at examples, which may give insight into developing a proof, or which might produce a counterexample (that disproves the conjecture). But a counterexample may lead to an altered conjecture, with perhaps an additional hypothesis. And so on …

Reminds me of C.S. Peirce’s conception of mathematics as experiments performed on diagrams or symbols instead of chemicals – with the near certainty of a “proof” coming simply from how easy it is to obtain adequate replication (by any other qualified mathematician with pen, paper, and some free time).

I believe it is important to study and analyze a process carefully before applying a statistical method to it; otherwise errors in application are likely to occur. A case in point is the prescription of the Bayesian method for statistical inference in diagnosis, which has been done without studying the diagnostic process in practice. The result is that it is the incorrect method, as it is not the one employed for diagnosis in practice, for example, in any of the scores of diagnostic exercises in real patients, such as the clinicopathologic conferences (CPCs) and clinical problem-solving exercises published in the New England Journal of Medicine. The reason is that this method leads to the wrong diagnosis in some patients. For example, in a healthy young woman presenting with highly atypical chest pain, the prior probability of acute myocardial infarction (MI) is extremely low, which may lead to this disease being ruled out without testing, by interpreting the prior as strong evidence against the disease. And even if a test is performed, say an EKG showing acute Q wave and ST elevation changes (acute EKG changes), with a likelihood ratio of 13, the resulting posterior probability of 50 percent would be interpreted as equivocal evidence, from which acute MI would be inferred to be indeterminate, which would be incorrect clinically. In the clinical problem-solving exercise in which this real patient was discussed, acute MI was formulated as a diagnostic hypothesis without any prior evidence for or against it. This hypothesis was proven correct, and acute MI was correctly inferred with near certainty from the acute EKG changes, interpreted as strong evidence based on their known high frequency, 85 percent, of leading to the correct inference of acute MI in patients with varying prior probabilities. The method actually employed to make this correct diagnosis is, I believe, the frequentist method.

The frequentist method is employed in practice, as it enables us to diagnose a disease accurately in any patient regardless of prior probability, since prior probability does not play any role as (prior) evidence in this method.

“The frequentist method is employed in practice, as it enables us to diagnose a disease accurately in any patient regardless of prior probability, since prior probability does not play any role as (prior) evidence in this method.”

Ignoring prior probabilities themselves is actually a prior assumption that places a lot of weight on extreme values and outliers. This might work in some cases, but in others you will end up with extremely unreasonable results that also fail to answer your target questions. No method is perfect – like you describe, a poorly calibrated Bayesian method can lead to horribly misguided results – but thinking the frequentist interpretation of probability gives you the kind of certainty you describe is misguided at best and horribly wrong at worst.

What I describe is how diagnosis is actually performed in practice by physicians. I have no preference for the frequentist or the Bayesian method. It is just that I cannot find a study or case report in which diagnosis in a real patient has been performed in a Bayesian manner. If you know of one, please tell me about it. The problem faced by a practicing physician is that he encounters any given disease in different patients with varying prior probabilities, ranging from the very high to the very low, and his goal is to infer a disease correctly in every patient regardless of prior probability. The method that he has evolved, it seems to me, is to suspect a disease from a presentation in every patient and then look for strong evidence for it by performing a test, from which he infers the disease. He finds this method to be most accurate and therefore he employs it, and it is very similar to the frequentist method. It is not that he consciously employs the frequentist instead of the Bayesian method. Practically no physicians know what these methods are or what the difference between them is. They just employ the method for diagnosis which is most accurate in their patients!

Bimal:

I can believe that physicians employ a method for diagnosis with the goal of being most accurate for their patients. But just because they want this, that does not mean this is happening.

+1

Andrew, this is actually happening, as diagnostic accuracy in practice is 85 to 90 percent in general and 98 percent in CPCs. It is because of this high accuracy that the practice of clinical medicine is a going concern and we do not have many more malpractice suits than at present. I believe this high accuracy is achieved primarily by the frequentist method in practice and could never be achieved by the Bayesian method. What I would love to see is a study in which the Bayesian method is employed for diagnosis in about 25 published diagnostic exercises in real patients and its accuracy compared to that of the method which is now employed, which is the frequentist method.

The concepts of frequentist and Bayesian approaches become less distinct when we start talking about decision-making processes in practice rather than conducting statistical tests and computing probabilities. You could argue that a doctor is mentally constructing a prior probability distribution for the diagnosis by taking a patient’s history, then adjusting the distribution based on current complaints and observations. Probabilities are updated when lab tests return. But you could also say that the doctor is collecting parameter estimates of past and present patient characteristics, plugging those into a mental model that relates sets of symptoms and medical histories, and then deciding if the pattern that emerges is similar enough to the pattern to which observations would converge in the long run when the true diagnosis is one disease or another.

I find it a bit easier to believe that they are thinking in the first way you describe rather than the second. In a simple example, if I want to decide whether or not it is too dangerous to drive down the interstate, I can think “there is a small chance that I have a wreck” or “if I drive this road thousands of times, over the long run I would only be in a wreck 3 or 4 times.” I suspect most people think in the first way, not the second. I suspect physicians do the same. I suspect they have many, many priors about the patient based on a variety of experiences (training, particular cases they have seen, prior info on the patient, how risk-averse they are, etc.) that subconsciously influence their decision.

So maybe I am way off here, but it does seem that when thinking about decisions, people use prior probabilities with their mental models.

I was thinking along these lines. I would think diagnosticians use all sorts of priors. The prior on malaria would be almost nil in a northern climate and much higher in a tropical climate.

I believe the best way to find out what happens during diagnosis is to carefully examine the process of diagnosis in real patients in published diagnostic exercises, which I did in 50 CPCs (Diagnosis, June 2016). What I found was that a number of diseases are suspected from initially available data and formulated as diagnostic hypotheses to form a comprehensive differential diagnosis. There is no prior evidence in the form of a prior probability for any of these diagnostic hypotheses. Diagnostic reasoning then advances by seeing how well a disease explains given data, which seems to me to be primarily reasoning by likelihood. Finally, the discussing physician mentions a test whose result will establish the correct diagnosis. I found the physician to make the correct diagnosis in 49 out of 50 patients when the result of this test is revealed. The prior probability of a disease seems to play no role in this reasoning. Diseases with low as well as with high prior probabilities are diagnosed accurately by what appears to be primarily the frequentist method.

I do not find this reasoning employed by experienced physicians to have any significant element of Bayesian reasoning.

I’m not saying that some doctors are frequentist while others are Bayesian, nor that individual doctors switch back and forth between the two. I’m saying that we can be 100% certain that doctors are not mentally conducting any kind of statistical tests in their heads (though they could employ software to do so, I suppose). The idea of “Bayesian reasoning” is a subjective abstraction, or in other words, a model. My point above is that we test models based on how well they explain observations and one can generally define Bayesian and frequentist in such a way that either could explain diagnostic processes. It’s fine, for example, to say “Diagnostic reasoning…advances by seeing how well a disease explains given data which seems to me to be primarily reasoning by likelihood.” But it seems equally fine to say that doctors form a prior expectation of a disease and then update their expectation as they get more information–they start with a history (“he’s a young non-smoker so probably doesn’t have lung cancer”), adjust based on patient complaints of symptoms (“he coughs up blood and has difficulty breathing, so it’s a little more likely he has lung cancer”), and adjust again after conducting tests (“chest x-ray is inconsistent with lung cancer, so far less likely”)–which sounds Bayesian to me.
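The sequential updating described in this comment has a simple mechanical form: each new finding multiplies the current odds of the disease by a likelihood ratio. Here is a minimal sketch of that process, where every number (the starting probability and both likelihood ratios) is invented purely for illustration, since the comment gives none:

```python
def update(prob, likelihood_ratio):
    """One Bayesian update in odds form: new odds = old odds * LR."""
    odds = prob / (1 - prob) * likelihood_ratio
    return odds / (1 + odds)

# Hypothetical numbers for the lung-cancer story above.
p = 0.001            # young non-smoker: start very low
p = update(p, 20)    # coughs up blood, difficulty breathing: revise upward
p = update(p, 0.1)   # chest x-ray inconsistent with lung cancer: revise down
print(round(p, 4))
```

Because updates in odds form just multiply, the order in which the findings arrive does not matter, which matches the informal description of adjusting an expectation up and down as information comes in.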

“You could argue that …”

This may be true for some physicians, but I think a lot just “go by the seat of their pants”, trusting their instinct — and often get it wrong. (Bear in mind things like one of Gigerenzer’s findings: That a large proportion of physicians say something has a fifty-fifty chance just because there are only two alternatives. Also bear in mind that there are a lot of things that there aren’t lab tests for.)

No physician with whom I have raised the question has ever been willing to discuss the likelihood of anything with me. The answer is always, “Well, let’s just see what this next test I ordered says.”

I looked up “differential diagnosis” and this page certainly mentions a “Likelihood ratio-based method”

https://en.wikipedia.org/wiki/Differential_diagnosis

Also, anecdotally, overtesting is a problem in medicine:

“tens of billions are spent every year on ‘defensive medicine,’ marked by unnecessary tests ordered to protect doctors from the possibility of a lawsuit for missing something. Yet diagnoses are still missed, with grave consequences.”

https://www.hopkinsmedicine.org/news/media/releases/diagnostic_errors_more_common_costly_and_harmful_than_treatment_mistakes

And then there was the physician who said that he preferred a particular diagnosis “because we know how to treat it.” (I might have finally convinced him that the diagnosis the physical therapist gave was more accurate, because her treatment produced some improvement, while his treatment didn’t produce any improvement).

Physicians are famous for using prior information. “If you hear hoofbeats in Central Park, think ‘horses’ but don’t forget zebras,” don’t they still teach that in med school?

Just to make sure I understand – in your example, are you saying that the prior probability of acute MI “in a healthy young woman presenting with highly atypical chest pain” is low? Or are you saying that the prior probability of acute MI “in a healthy young woman” is low?

“acute MI correctly inferred with near certainty from acute EKG changes interpreted as strong evidence based on its known high frequency of 85 percent of leading to correct inference of acute MI in patients with varying prior probabilities.” Maybe I am misunderstanding, but shouldn’t this kind of strong evidence from the data overwhelm the prior anyway? I guess I fail to see how a Bayesian method comes to the wrong conclusion here, if it is properly applied.

In the Bayesian method, as I understand it, the low prior probability of acute MI of 7 percent in this patient is combined with the likelihood ratio of 13 for acute EKG changes to generate a posterior probability of 50 percent, which represents the total evidence, from which acute MI is inferred to be indeterminate in this patient, which is incorrect. I do not see how the strong evidence overwhelms the prior in this reasoning.
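The 50 percent figure in this comment can be checked with the odds form of Bayes’ theorem; the 7 percent prior and the likelihood ratio of 13 are the numbers given above:

```python
def posterior_prob(prior_prob, likelihood_ratio):
    """Odds form of Bayes' theorem: posterior odds = prior odds * LR."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# 7 percent prior for acute MI, LR of 13 for acute EKG changes:
print(round(posterior_prob(0.07, 13), 2))  # 0.49, i.e. roughly 50 percent
```

This is the same arithmetic a Fagan nomogram performs graphically.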

Where do you get this mysterious 7 percent? Are you saying that in the past only 7 percent of young women presenting with acute chest pain in an ER actually had MI based on some actual data?

Or, are you saying that only 7% of all women under age 30 have ever had an MI during their lifetime or some other nonsense number that has nothing to do with the situation at hand?

Seems weird to me if 93 percent of women under age 30 or so who come into an ER with acute chest pain have something other than an MI…

Here’s what I think the doctors do: they think of all the disease states they can think of which are consistent with the presentation “young woman at ER with acute chest pains”…. And then they weight each of these disease states equally, which is wrong, but basically easy for them to think about.

Suppose there are, say, 5 disease states, things like MI, GERD, bone cancer of the sternum, and a couple of others…. so the prior they are actually assigning is 20% to each of these.

Now, they order an EKG, and p(abnormal EKG | GERD or Bone cancer or etc etc) is very small like 4%, and p(abnormal EKG | MI) = 0.997.

Now, posterior probability of MI is 0.997 * 0.2 / (0.997*0.2 + 0.04 * .8) = 0.862

So they decide that the person is very likely to have MI, so they maybe begin treatment for that while ordering a second test maybe a blood draw or something…

The Bayesian calculation is going fine here, except that in fact the priors shouldn’t be 0.2 for each of the 5 disease states, probably MI is much more likely a-priori than say bone cancer of the sternum… so it should probably have been given 0.5 to begin with or something…
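The arithmetic in this comment checks out and is easy to reproduce; the uniform 0.2 prior over five disease states and the conditional probabilities 0.997 and 0.04 are the commenter’s illustrative numbers, not clinical data:

```python
# Illustrative numbers from the comment above.
p_mi = 0.2            # prior on MI (one of five equally weighted states)
p_other = 0.8         # prior mass on the other four states combined
p_ekg_given_mi = 0.997
p_ekg_given_other = 0.04

# Bayes' theorem for the posterior probability of MI given an abnormal EKG.
posterior_mi = (p_ekg_given_mi * p_mi) / (
    p_ekg_given_mi * p_mi + p_ekg_given_other * p_other)
print(round(posterior_mi, 3))  # 0.862, as in the comment

# With the suggested more realistic prior of 0.5 on MI:
posterior_mi_2 = (p_ekg_given_mi * 0.5) / (
    p_ekg_given_mi * 0.5 + p_ekg_given_other * 0.5)
print(round(posterior_mi_2, 3))  # 0.961
```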

This is exactly what I asked above. And what I was getting at with the prior. Doesn’t seem right.

Anyhow, if I understand correctly, Bimal is presenting a scenario like this: There’s a Bayesian clinician and a Frequentist clinician. They are given the information that there is a probability of 0.07 that healthy young women presenting with acute chest pain have MI (sounds wrong, but whatever). Then they are given a case where a patient as described above gets this EKG test that has a likelihood ratio of 13 for MI. The Bayesian clinician takes out his Fagan nomogram and determines that the prior probability is 7% and the likelihood ratio is 13, so the post-test probability must be about 50%. The Frequentist clinician ignores the prior probability of 7% and diagnoses it as MI based on the EKG test. The Frequentist gets it right.

That’s what the example sounds like to me, but perhaps I have misunderstood. If I did interpret the example correctly, I would argue that this is a highly artificial scenario, and I do hope that the physicians I see use their prior experience and information in an informative way (the ones I work with do).

Garbage in, garbage out… As Phil says, if the 7% number is accurate and the likelihood ratio really is 13 then about 50% is correct, but 7% seems wrong and probably based on some population wide number of how many young women have ever had MI. Using this totally unrelated number and applying it to women who present at an ER with acute chest pain will give you the wrong answer, GIGO.

> are you saying that only 7% of all women under age 30 have ever had an MI during their lifetime

Does that sound even remotely plausible to you? Do you think that one in fifteen women younger than 30 has had an MI? How many 30-year-old women do you know? How many have had an MI?

(That seems to be off by more than one order of magnitude. In the US, less than 0.5% of women aged 20-40 report having suffered an MI. Overall prevalence in women is just 2.3%.)

> Seems weird to me if 93 percent of women under age 30 or so who come into an ER with acute chest pain have something other than an MI…

In 2016 there were over 8 million patients going to an ER with chest pain and fewer than 1 million cases of MI (many of them not linked to an ER visit, but let’s assume they all go through the ER for the sake of the argument): over 88% of the people coming into an ER with acute chest pain have something other than an MI.

3.2 million women aged 15-64 had an acute chest pain emergency. I can’t find how many cases of MI there were on that group but let’s say it’s around 200k. That means that over 94% of women under age 65 who come into an ER with acute chest pain had something other than an MI.

I couldn’t figure out where the 7% was coming from, it seemed high for lifetime MI for young women, it seemed low for ER patients with acute chest pain (but apparently you’ve found info that suggests it’s not low for ER patients with acute chest pain)

The truth is, acute chest pain could obviously come from, say, being slammed in the chest by the steering wheel during a car accident, or from falling off a ladder and landing on a rock, etc., in many cases. “Woman with acute chest pain” is not as much information as the doctor would actually have from a basic history, like whether there were chest-related injuries or whether there was a history of GERD or whatever.

What are the numbers for “significantly overweight woman reports sudden onset of acute chest pain with no real warning, shortness of breath, a history of high cholesterol, little exercise, and a sedentary desk job, both father and brother died of heart attack” ?

In any case, the Bayesian solution to this conundrum isn’t “if and only if there’s a high probability of MI, treat for MI”. You don’t need a diagnosis like “98% chance of MI” in order to make the right decision… what you need is a Bayesian decision theory application, where you look at all the actions you could take, and decide which one has the best expected benefit to the patient…

if there are 17 different things that are non-life-threatening that it could be, and then there’s a 50% chance of MI, you should probably be treating as if MI until further information narrows the range of possibilities to something like GERD or fractured sternum or pleural infection or whatever.
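This decision-theoretic point can be sketched numerically. The loss values below are made-up placeholders, not clinical figures; the only point is that a 50 percent posterior for MI can still make treating for MI the clear choice when missing an MI is assumed to be far more costly than unnecessary treatment:

```python
# Hypothetical losses (arbitrary units), keyed by (action, true state).
# Assumption: an untreated MI is far worse than unnecessary MI treatment.
loss = {
    ("treat_mi", "mi"): 1,       # treated correctly
    ("treat_mi", "benign"): 10,  # unnecessary treatment
    ("wait", "mi"): 100,         # untreated MI
    ("wait", "benign"): 0,       # correctly did nothing
}

def expected_loss(action, p_mi):
    """Expected loss of an action given the posterior probability of MI."""
    return p_mi * loss[(action, "mi")] + (1 - p_mi) * loss[(action, "benign")]

for action in ("treat_mi", "wait"):
    print(action, expected_loss(action, 0.5))
# treat_mi 5.5, wait 50.0: treating dominates despite the 50/50 posterior
```

The ranking of actions, not the posterior probability by itself, is what drives the decision; with a different (also hypothetical) loss table, the same 50 percent could justify waiting.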

If 50% of young women with chest pain and these EKG characteristics have something other than MI then there is nothing wrong with the calculation. If this is not the case then there’s something wrong with the prior or the conditional probability. Bayes’s Theorem can’t generate a wrong answer, given correct inputs.

Given the obviousness of “Bayes’s Theorem can’t generate a wrong answer, given correct inputs,”

why do so many go along with “Bayes’s Theorem likely will generate useful answers, given incorrect inputs”?

This is rampant in publications that use Bayesian analysis, where default priors are used and posterior probabilities are interpreted almost literally. Whether non-informative, weakly informative, or somewhat informative, it seems to me that it needs to be assessed just how likely that “likely” is.

Some suggestions from Michael Betancourt here https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html#24_model_adequacy

> generate a posterior probability of 50 percent which represents total evidence from which acute MI is inferred to be indeterminate in this patient which is incorrect.

As far as we know, it is “incorrect” only in the sense that predicting the probability of rain tomorrow as 50% will prove “incorrect” when tomorrow comes and it will either rain or it will not.

However, if you were complaining that the guidelines had claimed 50% probability in six cases and in all of them the actual answer had been the same you would have a (p<0.05) point.

A number of important points have been raised by my posts about the method of diagnosis, which I shall now address:

1. The prior probability of acute MI of 7 percent in the 40-year-old woman is derived from its prevalence in a population of similar patients in the clinical problem-solving exercise in which this patient was presented (Pauker et al., NEJM 1992; 326:688-91).

2. In my view, the significance of the adage “If you hear hoofbeats, think of horses, not zebras” has been misunderstood. It means that we should think of a common disease first and evaluate it rather than a rare disease, given a presentation. It does not mean that a presentation (prior probability) serves as (prior) evidence for a common disease only. For “zebras” do occur in practice. For example, nearly all patients presented regularly in the Sunday New York Times Magazine are “zebras,” as they have highly atypical presentations or are rare. They are usually diagnosed correctly because they are suspected and formulated as diagnostic hypotheses, which are proven correct by appropriate tests. They would not be diagnosed if they were ruled out without testing because they are zebras.

3. My intent in my posts is not to defend the frequentist method against the Bayesian method, but to present my formal and informal findings regarding the actual method employed for diagnosis in practice.

These findings indicate the method is primarily frequentist. I cannot find any evidence for employment of the Bayesian method in any published diagnostic exercise in a real patient.

4. I am not the only person to raise concerns about the Bayesian method in diagnosis. In 1977, the eminent clinical investigator Alvan Feinstein wrote (Clin Pharmacol Ther 1977; 21:482-496):

I know of no published work in which the initial claims of a Bayesian enthusiast have been confirmed by the results found in clinical reality. I know of no clinical setting or institution in which the Bayesian diagnostic methods are being regularly used for practical diagnostic purposes in a routine or specialized manner. I know of no specific constructive, practical diagnostic decisions, involving real-world patients, data, and doctors, in which the Bayesian methods have made a prominent contribution that could not have been achieved as easily without Bayes’ formula. (If readers know of any, I hope they will tell me.)

I believe Feinstein’s words are as relevant today as they were more than 40 years back.

5. The current situation regarding the correct method of diagnosis is similar to that of a treatment which appears attractive but whose efficacy has not been proven by any study, preferably a controlled trial.

Similarly, it seems to me, the Bayesian method appears attractive, but its diagnostic accuracy has not been proven by any study.

What is needed, I think, is a study comparing the diagnostic accuracy of the Bayesian method to that of the frequentist method in, say, 50 real patients such as those presented in diagnostic exercises in the New England Journal of Medicine.

If such a study reveals the Bayesian method to be more accurate, I shall gladly accept this result. But until such a study is done, the frequentist method is the one which appears to be employed in practice, as seen in published diagnostic exercises, due to its high diagnostic accuracy in patients with varying prior probabilities.

If the 7% is a correct frequency of occurrence in your population, and the likelihood ratio you gave is a correct one in terms of frequency of occurrence, then it really is true that only 50% of your patients have MI… if you are saying that you prefer the over-diagnosis of MI, it is probably because, as I mention, it’s better to operate on the basis of MI, since this is the highest-expected-utility course of action.

Nothing you have said actually suggests that your doctors are operating on a frequentist basis, which would involve checking whether the frequency of occurrence of test results is abnormal under a given null model.

You are correct: only 50 percent of the patients similar to the given 40-year-old woman with a prior probability of 7 percent who have acute EKG changes will have acute MI. But the physician is not employing this frequency to infer acute MI definitively in the given 40-year-old woman. Instead, he is employing for inference the 85 percent frequency of acute MI in patients with varying (not the same) prior probabilities who have acute EKG changes, which does not correspond to the Bayesian posterior probability of 50 percent. As I understand it, the method is frequentist because he employs for inference a frequency in a heterogeneous series of patients with varying prior probabilities, and not a Bayesian posterior probability.

The thought behind a frequentist inference is, I believe, that the fact that 85 percent of patients with acute EKG changes have acute MI regardless of prior probability makes this test result strong evidence, from which acute MI is inferred with a certainty (confidence level?) of 85 percent in any patient regardless of prior probability.

I shall now discuss another patient to bring out the frequentist nature of inference during diagnosis.

Let us consider a patient often seen in practice: a 65-year-old man with multiple cardiac risk factors who presents with highly typical chest pain, in whom an EKG is performed which reveals non-specific T-wave changes, which have a likelihood ratio of 1 for acute MI.

The prior probability of acute MI is very high in this patient, let us say 85 percent. In the Bayesian method, this prior probability would be combined with the likelihood ratio of 1 to yield a posterior probability of 85 percent.

In the Bayesian method, acute MI would be inferred with near certainty in this patient from this very high posterior probability. I doubt, however, that this inference would be made in practice in this patient with non-specific EKG changes (LR of 1), as the frequency with which this test result leads to the correct inference of acute MI in patients with varying prior probabilities is about 50 percent, which makes it worthless.

In the frequentist method, which I believe is employed in practice, the hypothesis of acute MI in this patient will neither be ruled in nor ruled out from this frequency of 50 percent, in sharp contrast to the near-certain inference of acute MI in the Bayesian method.
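The second patient’s posterior works out exactly as described: in the odds form of Bayes’ theorem, a likelihood ratio of 1 leaves the prior untouched, so the posterior is whatever prior you started with (the 85 percent is taken from the comment above):

```python
def posterior_prob(prior_prob, likelihood_ratio):
    """Odds form of Bayes' theorem: posterior odds = prior odds * LR."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Non-specific T-wave changes with LR = 1 carry no diagnostic information:
print(round(posterior_prob(0.85, 1), 2))  # 0.85: the prior passes through
```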

The debate about whether the Bayesian or the frequentist method is employed for inference during diagnosis in practice can be easily settled experimentally, I believe, by giving data on real patients like the above two to physicians and observing how they actually infer a disease in them.

This is a response to the Aug. 2 comment of Daniel.

He writes there that when a patient comes to the ER with acute chest pain, you assume he has a 50/50 chance of acute MI, and you do an EKG which updates the posterior probability to 80+, from which acute MI is inferred.

Bimal, reply to the comment here: https://gelmanstatdev.wpengine.com/2019/07/30/i-dont-have-a-clever-title-but-this-is-an-interesting-paper/#comment-1094907

and it will appear below my reply.

Also note that if you use a less-than sign the blog will eat the stuff after it, looking for an HTML tag. Easiest is just to type the words “less than.”

This is a response to the Aug 2 comment of Daniel.

He writes there that when a patient comes to the ER with acute chest pain you assume he has a 50/50 chance of acute MI and you do an EKG which updates posterior probability to 80+ from which acute MI is inferred.

If I understand him correctly, he is saying we should assume the prior chance of 50/50 in any patient with chest pain regardless of presentation.

This approach is not too different from what I call the frequentist approach, in which acute MI is formulated as a diagnostic hypothesis, which to me is the same as giving it a 50/50 prior chance.

The difference between the two approaches appears to be that his posterior probability represents a frequency of acute MI in a homogeneous population of patients similar to the given patient, while my frequency is in a heterogeneous population of patients with varying prior probabilities.

If we adopt a Bayesian approach in practice, as Daniel suggests, in which a suspected disease in any patient has a prior chance of 50/50, then the diagnostic accuracy of this approach will be close to that of the frequentist method.

Bimal,

I would hesitate to suggest that one "should" do any particular thing (that would be normative). What I am trying to say, as a matter of description, is that what you call a "Frequentist approach" is anything but: it is actually a Bayesian approach in which one adopts a particular prior that is *different from the frequency of occurrence in some reference set*.

In Bayesian calculations, one uses probability to express a degree of credence or plausibility, sometimes called "belief." In a Frequentist approach, by contrast, one uses probability *solely to express how often a thing would occur if repeatedly performed in large quantities*.

Bayesian thinking is different in that it applies to singular events, not just collectives of events, and it expresses a state of information about the situation rather than a frequency of occurrence.

If someone does something probabilistically, or even informally and intuitively with some features of probability, and it corresponds to a particular probabilistic calculation, but that calculation isn't arrived at by plugging in the frequency with which things occurred under some repetitions or across some large population, then it is a Bayesian calculation *by definition*, because the probability isn't derived from frequency considerations.

If you ask “what is the probability that Jane Doe who is standing in front of me has an MI” then by definition you are doing Bayesian calculations, because there is not a large collection of identical Jane Does which you could sample from to even define a collection of events.

Now, if you ask "what is the frequency with which people who presented with the same symptoms as Jane Doe during the last 12 months had MI," then you are doing a Frequentist calculation. Please note that when you use Bayes' theorem but plug in frequencies, you are doing *both* a Frequentist and a Bayesian calculation; this is the only case where they coincide, and it corresponds to the state of information "all I know about this situation is the frequency of occurrence in the reference set."

Another aspect of Bayesian calculations is that they express themselves in terms of probability: a 73% probability of MI vs. 27% of not MI is a Bayesian calculation. When we want to make a decision, including a binary one like "treat for MI or not," we take this probability calculation, combine it with a utility (a measure of the goodness or badness of each outcome), and choose so as to maximize the utility (or minimize cost, which is the same thing negated).

So a decision like “treat for MI” which is arrived at by considering probability of MI vs not MI and how well the patient is likely to do if you do treat for MI vs if you don’t, is also a Bayesian calculation.

A frequentist calculation generally doesn't use utility; rather, it makes decisions like "if the test comes back positive, treat for MI" because that rule controls the frequency of being wrong, not because it maximizes utility.
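For concreteness, here is a minimal sketch of the expected-utility step described above; the posterior probability and the utility numbers are made up purely for illustration, not taken from any of the clinical examples in this thread:

```python
# Bayesian decision theory sketch: pick the action with the highest
# expected utility, given a posterior probability of MI.

def expected_utility(p_mi, utilities):
    """Expected utility of each action given P(MI) = p_mi."""
    return {
        action: p_mi * u_if_mi + (1 - p_mi) * u_if_not_mi
        for action, (u_if_mi, u_if_not_mi) in utilities.items()
    }

# Hypothetical utilities: (outcome if patient has MI, outcome if not).
# Treating a true MI is very good; missing a true MI is very bad;
# unnecessary treatment carries a modest cost.
utilities = {
    "treat":      (100, -10),
    "dont_treat": (-100, 0),
}

p_mi = 0.73  # illustrative posterior probability of MI
eu = expected_utility(p_mi, utilities)
best = max(eu, key=eu.get)  # "treat": 70.3 beats -73.0
```

Note that "treat" wins here even though we are far from certain of the diagnosis, which is exactly the point about thrombolytic therapy in the NEJM commentary discussed below.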

In response to your latest comment today, Daniel, I think there is some misunderstanding about what should be called the frequentist method of statistical inference. It is customary, I believe, to use this term to refer to the non-Bayesian method pioneered by Ronald Fisher and Jerzy Neyman in the first half of the 20th century. In this method, the parameter (disease) to be inferred is set up as a hypothesis without any prior probability attached to it based on our knowledge and experience. The hypothesis is evaluated by a test and taken to be correct if a test result is obtained which has a high frequency of leading to an accurate inference, as determined by a test of significance or by a confidence argument.

It is this frequentist method, I suggest, which is employed for statistical inference during diagnosis in practice. The known high frequency, 85 percent, of accurate inference of acute MI from acute EKG changes in patients with varying prior probabilities generates a confidence level of 85 percent, so to speak, with which acute MI is inferred definitively from acute EKG changes in any patient regardless of prior probability, such as in the 40-year-old woman.

Bimal Jain MD wrote: "The prior probability of acute MI of 7 percent in the 40 year old woman is derived from its prevalence in a population of similar patients in the clinical problem solving exercise in which this patient was presented (Pauker et al., NEJM 1992; 326:688-91)."

I took a look at the article you cited for the source of that prior probability (https://www.nejm.org/doi/full/10.1056/NEJM199203053261007 ), and I found this: “In this woman, the ECG abnormalities raised the odds some 13-fold and thus provided strong evidence of an acute myocardial infarction even though the initial likelihood was low. If the initial odds were 1:13 (7 percent) in this woman, the ECG shown in Figure 1 would raise the odds to 13:13 (50 percent).”

Now it’s pretty clear from this example that the 7 percent isn’t an empirically obtained value, but rather a hypothetical value chosen to simplify the math of an example.
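For what it's worth, the odds arithmetic in the quoted passage can be written out explicitly (posterior odds = prior odds × likelihood ratio):

```python
# The NEJM example's Bayes update in odds form.
prior_odds = 1 / 13          # 1:13, i.e. about 7 percent
likelihood_ratio = 13        # the ECG "raised the odds some 13-fold"
posterior_odds = prior_odds * likelihood_ratio   # 13:13, i.e. 1:1
posterior_prob = posterior_odds / (1 + posterior_odds)  # ≈ 0.5
```

The round numbers (1:13 in, 13:13 out) are what make it pretty clear the 7 percent was chosen for arithmetic convenience.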

There’s also the matter that the *actual* prior probability shouldn’t be for a woman picked at random from the general population, but rather for a woman who has chest pain serious enough to feel the need to go to a doctor. The chest pain obviously raises the probability of a heart problem even before the EKG is done.

Also, the commentary linked above in NEJM does not reflect the thinking processes that would indicate a frequentist solution. The commentary states “Although the young woman had neither a lifestyle nor familial characteristics suggesting a propensity for coronary artery disease, the discussant immediately questioned whether ischemic heart disease was the culprit and held fast to that hypothesis, although no supporting evidence was presented until he saw the ECG.” That is what you call a strong prior, right?

Further, the gist of the entire commentary is that the clinicians do not know with any certainty what the diagnosis is. “Physicians must often decide whether to initiate therapy in the face of diagnostic uncertainty.”

The commentary indicates that the physician chose therapy for acute MI, not because of a likelihood ratio of 13, but because, “Even given this risk, the benefit of therapy is so much higher than the risk that we must conclude that we do not have to be certain of the diagnosis of acute myocardial infarction to initiate thrombolytic therapy.”

The thinking based on this commentary sounds more like this – I have a strong hunch this is acute MI, even though acute MI is rare in this type of person; my test result has a strong likelihood of acute MI; the benefits of giving therapy for acute MI strongly outweigh the risks; so I treat for acute MI.

Importantly, at least as described in the NEJM commentary, the physician has very strong prior about the patient before the test is ever run.

Thanks jd, all of that indicates *Bayesian* thinking.

Further, I’d mention that any result for which a proper prior exists that gives the given answer is a Bayesian answer, even if the proper prior isn’t the one that data suggests should be used.

The results of the calculation which are being called "Frequentist" are the results that you'd get for a 50/50 prior on MI vs. not MI.

This is actually *more* Bayesian than the more frequentist result which is to insist that the only *correct* prior to use is the 7% prior because this is the frequency of occurrence in some particular population.

Basically, patient comes into ER with acute chest pain, doctor assumes about 50/50 chance they have MI, does a test for MI which is positive, updates their probability calculation to 80+% on the basis of this data, does a cost/benefit analysis of treating as if MI vs as if not MI, realizes that treatment for MI is the best choice using what is essentially Bayesian Decision Theory, and then treats for MI.

ALL of this is *Bayesian thinking*

Addressing Daniel’s comment https://gelmanstatdev.wpengine.com/2019/07/30/i-dont-have-a-clever-title-but-this-is-an-interesting-paper/#comment-1095651

Good points. This discussion is interesting to me, because I have been thinking along the lines that everyone thinks Bayesian in everyday life, they just don’t know it.

The premise of the first comment by Bimal was basically that diagnosis is not probabilistic. I disagree with this simply because diagnosis is performed by human brains, and I do not see that it is possible that a person making a decision like diagnosis in a critical care environment could be separated from their priors (prior experiences, training, beliefs, etc). Even if they choose to use a test result in the face of any other prior evidence, belief, etc., that in itself is a prior (either a weak one for prior experience, etc, or a strong one for the test), right?

It seems to me that the diagnostic decision making is probabilistic simply because a person is making the decision. So I disagree with the notion that diagnosis is not probabilistic.

As for real world trials, I can accept that using some sort of prior based on frequency of occurrence in a population, a likelihood ratio from a test, and something like Fagan’s nomogram to determine posterior probability, might not be the best method for a clinician. But I would argue that this is not a good example of probabilistic diagnosis because it is much oversimplified, and the choice of prior is wrong (does not reflect all prior information involved in the decision).

> There’s also the matter that the *actual* prior probability shouldn’t be for a woman picked at random from the general population, but rather for a woman who has chest pain serious enough to feel the need to go to a doctor.

That's kind of obvious, isn't it? That's what the 7% stands for. I don't know why some people here find that figure so problematic. In any case, it is orders of magnitude more plausible than saying that 7% is the probability of a random woman having an acute MI episode right now.

> The chest pain obviously raises the probability of a heart problem even before the EKG is done.

Yes, it does. To 7%, according to that case study. I don't know where they took that number from, but if anything it seems too high compared with the output of this tool: https://qxmd.com/calculate/calculator_287/pre-test-probability-of-cad-cad-consortium

Carlos, the part I find hard to believe is that 7% is the percentage of people who come to the ER thinking they are having a heart attack who actually are. A sharp chest pain on the right side when you breathe in deeply is not that relevant for MI, but it is relevant for pneumonia; similarly, chest pain after blunt trauma to the chest from a fall, sharply localized to a particular spot you can palpate, suggests a broken rib; etc.

but a computer database would list all these irrelevancies as a population with acute chest pain.

I get it that 7% is very high compared to fraction of 40 year old women who are having a heart attack now… I just find it implausibly low compared to women seeking help for what they believe is a heart attack, or people whom a doctor can not rule out heart attack by basic inspection…

Carlos, I guess that's the really relevant frequency: what fraction of people at the ER on whom a trained doctor would bother to order an EKG are having an MI? Does it seem plausible to you that 93% of the time a doctor suspects MI, they're wrong?

I find the idea of writing up notebooks before turning them into papers intriguing.

There is in the software development world a methodology called test-driven development. It works as follows. You have a piece of code to test, say some function that adds two given integers together. To begin with, you define the function with an empty implementation – say, always returning 0. Then, before implementing it, you write several tests, e.g. checking that add(0, 0) == 0, add(3, 7) == 10, add(-3, 3) == 0 and so on. And only once you’ve done that, then you are allowed to implement the function. When you are done and all the tests pass, you can have some confidence in it working as intended.
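In Python, the toy example just described might look like this (a minimal sketch of the test-driven flow, with the same `add` function and test cases):

```python
# Step 1: define the function with an empty implementation.
def add(a, b):
    return 0

# Step 2: write the tests first; run against the stub, they fail.
def run_tests():
    assert add(0, 0) == 0
    assert add(3, 7) == 10
    assert add(-3, 3) == 0

# Step 3: only now write the real implementation.
def add(a, b):
    return a + b

run_tests()  # all tests now pass
```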

I imagine non-exploratory statistical research could work in a very similar manner:

1. With your hypothesis formulated, construct a fake data set, making it have as similar a format as possible to what you expect the real data set will have.

2. Write all of your code for cleaning the data and analysing it. (In practice, this would be done e.g. using Jupyter or R notebooks.) Of course, the analysis will give you no or spurious results as we are dealing only with fake data.

3. Commit all of this using a version control system such as git, and then push it to a service like GitHub. That way anyone reading your paper can go back and inspect the methodology (including a timeline of changes to it) and be assured that you did in fact formulate all (or much) of your data cleaning and analysis before you had any inkling of what the data looked like.

4. Gather the data.

5. Now replace the fake data with the real data, and you will have the results of your analysis.
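A minimal sketch of what steps 1, 2, and 5 might look like; the column names and the difference-in-means analysis are hypothetical, chosen only for illustration, and in practice this would live in a notebook under version control:

```python
import random
import statistics

def make_fake_data(n=100, seed=0):
    """Step 1: fake data in the format we expect the real data to have."""
    rng = random.Random(seed)
    return [
        {"group": rng.choice(["treatment", "control"]),
         "outcome": rng.gauss(0, 1)}
        for _ in range(n)
    ]

def clean_and_analyse(rows):
    """Step 2: all cleaning and analysis, written before seeing real data."""
    rows = [r for r in rows if r["outcome"] is not None]
    def mean_for(group):
        return statistics.mean(r["outcome"] for r in rows if r["group"] == group)
    return mean_for("treatment") - mean_for("control")

# Steps 3-4: commit and push this code, then gather the real data.
# Step 5: swap in the real data; the analysis code itself is unchanged.
effect = clean_and_analyse(make_fake_data())
```

The point is that `clean_and_analyse` is frozen (and timestamped by version control) before any real data exists; only the argument changes at step 5.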

One objection to this would be that you don’t always know the exact format of the data before you have collected it. That may at times be true. But, so it seems to me, the more of these decisions can be done before collecting the data, the better. If some decisions related to data cleaning and processing are done post hoc, at least the data analysis is not.

The advantages of this methodology are many:

1. It strongly discourages fishing, p-hacking and multiple comparisons, as data cleaning, processing and analysis occurs prior to there being any knowledge of the data.

2. It makes clear that researchers formulated their hypotheses before they had any knowledge of the data.

3. It makes it easier to inspect and reproduce the study, as all the steps of data cleaning, processing and analysis are publicly available in code.

4. It may aid the data collection step, as you will already have thought through what data benefits the analysis and can collect accordingly. For instance, if participants' age is not relevant to your analysis, you may not need to collect that data.

5. When manually preregistering data analysis decisions, it is difficult to cover all decisions, as you may not necessarily know which decisions you need to take until you sit down to do the actual analysis. Sitting down to actually do the analysis before having the data forces you to discover these open questions before you have any knowledge of the data.

6. It allows others (editors, supervisors, colleagues) to easily scrutinise your hypotheses and methodology before you gather data, possibly catching critical issues at an earlier stage.

7. In the case of a Bayesian analysis, it also necessitates determining the prior before having any knowledge of the data (again with an independently certified timestamp).

I quote from the paper on the Garden of Forking Paths:

“In this garden of forking paths, whatever route you take seems predetermined, but that’s because the choices are done implicitly. The researchers are not trying multiple tests to see which has the best p-value; rather, they are using their scientific common sense to formulate their hypotheses in reasonable way, given the data they have. The mistake is in thinking that, if the particular path that was chosen yields statistical significance, that this is strong evidence in favor of the hypothesis.”

I suppose, following the analogy of the Garden, that this methodology translates to drawing up a detailed and timestamped route through said garden before ever setting foot in it.

But maybe I am mistaken. I myself did statistical research for my master's thesis, and engaged then, to my shame, in many of these bad practices – not out of ill intent, but just because I didn't know any better. Fortunately, I think hardly anyone ever read that thesis.

E.G.

I did something similar with medical students. The data for their analysis would not be ready before the end of their research term, so I made up fake data and encouraged them to write a serious paper based on it, arguing that once we got the real data it would take far less time to finish the paper: maybe just a bit of word processing to change what we had guessed wrong. They agreed.

As I learned more about how the supervisors of these students worked (senior faculty members of prestigious medical schools), I convinced my director to allow me to choose not to work with them. About two years later, someone from their group advised me that they (the supervisors) had decided it was too much work to get the real data, so they sent the paper off as is.

Unfortunately, I could not be sure this was actually correct (maybe the supervisors changed their minds), and I no longer had access to the files from the project or even the names of the students. I have no idea how many people have read the paper.

This is why we can’t have nice things.

This is a reasonably good summary of the differences between the cultures of the statistics and machine learning communities.

I think this belongs in this post:

Hadley Wickham, a statistician from Hamilton, has won the international 2019 COPSS Presidents’ Award.

Previously the award has primarily recognised highly theoretical contributions to statistics. This year is the first time it has been awarded for practical application.

https://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&objectid=12254723

Keith:

Congratulations to Hadley. The award is well deserved.

Regarding the quote, “This year is the first time it has been awarded for practical application.” . . . All I can say is, the award has been given for practical application at least once in the past!

OK – wrong on that ;-)

Next hypothesis? – This year is the first time it has been awarded to someone whose primary appointment is not in an academic department.

Seriously, I think that news article was trying to point to most of Hadley’s work having been about computation.

I do remember that the Statistical Society of Canada’s requirement for awards once was specified as “use of hard math in the resolution of a statistical problem.”