The abstract of Birnbaum (1962) says: “The likelihood principle states that the ‘evidential meaning’ of experimental results is characterized fully by the likelihood function, without other reference to the structure of an experiment, in contrast with standard methods in which significance and confidence levels are based on the complete experimental model.”

I guess it depends on what we understand by “evidential meaning”. I would say that the posterior distribution in Bayesian analysis is the result of combining the “evidential meaning” of the data with the prior distribution. The fact that the posterior depends on the likelihood _and_ the prior is not a problem!

A related but different question is what kinds of prior selection methods are “acceptable”. Now I see better how choosing the prior based on the structure of an experiment goes “against the spirit” of the LP. But even if the LP didn’t exist, the real problem is that it goes “against the spirit” of Bayesian analysis! The prior is supposed to reflect the plausible values of the parameter, and that shouldn’t depend on how we model the sampling distribution of the experiment, in the same way that the prior should not depend on reparametrizations, for example.

There is a weak and a strong likelihood principle. As Yuling said in his blog entry, they essentially postulate that two likelihoods that are proportional to each other should lead to the same inference about the unknown parameter. What you seem to have in mind is part of the phrase that is typically used to “prove” that Bayesian inference complies with the likelihood principle: “The posterior distribution of the parameter depends on the data only via the likelihood, hence the inference complies with the likelihood principle”. But this is a specious argument. Unless I am missing something, to comply with the likelihood principle, the posteriors would need to be proportional whenever the likelihoods are proportional, which would mean that the priors would need to be proportional.
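Spelled out with generic notation (priors $\pi_1, \pi_2$, likelihoods $L_1, L_2$ and a constant $c > 0$, all symbols introduced here just for illustration), the last step of that argument is:

$$
L_1(\theta \mid x_1) = c\,L_2(\theta \mid x_2)\ \text{for all } \theta
\;\Longrightarrow\;
\pi_1(\theta \mid x_1)
= \frac{\pi_1(\theta)\,L_1(\theta \mid x_1)}{\int \pi_1(t)\,L_1(t \mid x_1)\,dt}
= \frac{\pi_1(\theta)\,L_2(\theta \mid x_2)}{\int \pi_1(t)\,L_2(t \mid x_2)\,dt},
$$

so, wherever $L_2(\theta \mid x_2) > 0$, $\pi_1(\theta \mid x_1) = \pi_2(\theta \mid x_2)$ holds if and only if $\pi_1(\theta) \propto \pi_2(\theta)$.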

I am all in favour of doing multiple analyses with different priors as part of a sensitivity analysis. So far I had only wondered how a subjective Bayesian (i.e. somebody who sees probabilities as subjective beliefs and priors as expressing one’s beliefs) would regard such an analysis. Presumably they would think that the analyst has multiple personalities? I had never thought about what doing multiple analyses with different priors implies for claims of following the likelihood principle. But yes, I agree with you: it would violate the likelihood principle. But then, as I said at the beginning, I never thought that there is much truth to the statement that Bayesians follow the likelihood principle. :)

Yuling, my point is that somebody who uses a prior that yields a point estimate of 0.99 with n=10 and y=9 in all likelihood does not worry/care about PPC. Likewise, people who care about PPC would probably not use a prior that yields a point estimate of 0.99 with n=10 and y=9. :)

But thank you for pointing out that preprint of yours!

Thanks. I don’t remember if I had seen this argument before. I understand the likelihood principle to say something like “the inference about the parameter θ should depend on the sample data x only through its likelihood function L(θ|x)”. I don’t think it says that the inference shouldn’t depend on the model / prior.

If this is a valid objection to prior selection using reference priors, then it would be an objection to any criterion for choosing a prior. The very idea of letting inference depend on the prior would be a violation of the likelihood principle. Presented with multiple analyses that used different priors and produced different inferences, we should conclude that at most one is valid, and that would be assuming one of the priors is “right”.

The objection would be valid if the inference depended on the data through the prior, in the case where we let the data dictate the choice of model / reference prior. But that wouldn’t be an issue related to the use of reference priors in particular, the same violation of the likelihood principle would happen for any Bayesian analysis with a data-dependent prior.

Berwin, my point is that PPC looks into the tail of y (the same as hypothesis testing), and does not conform to the likelihood principle. I use a point estimate theta=.99 in the example for convenience (or equivalently a prior strongly favoring theta=1), but the conclusion would not change with other informative priors.
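To make the tail-area point concrete, here is a rough Python sketch that plugs in theta = 0.99 and computes the tail probability of 9 successes and 1 failure under two stopping rules: a fixed n = 10 (binomial) and sampling until the 9th success (negative binomial). The stopping rules, the one-sided "fewer successes than expected" direction and the function names are my assumptions, not necessarily the setup of the original post; a real PPC would also average over posterior draws rather than use a point estimate.

```python
from math import comb

# Hypothetical helpers: plug-in tail areas for the same data (9 successes,
# 1 failure) under two assumed stopping rules.

def binomial_lower_tail(n: int, y: int, theta: float) -> float:
    """P(Y <= y) for Y ~ Binomial(n, theta); fewer successes is 'extreme'."""
    return sum(comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(y + 1))


def negbin_upper_tail(r: int, f: int, theta: float) -> float:
    """P(F >= f), F = number of failures before the r-th success."""
    below = sum(comb(k + r - 1, k) * theta**r * (1 - theta) ** k for k in range(f))
    return 1.0 - below


theta_hat = 0.99  # point estimate used in the comment
tail_binom = binomial_lower_tail(10, 9, theta_hat)   # equals 1 - 0.99**10
tail_negbin = negbin_upper_tail(9, 1, theta_hat)     # equals 1 - 0.99**9
print(f"binomial: {tail_binom:.4f}, negative binomial: {tail_negbin:.4f}")
# prints "binomial: 0.0956, negative binomial: 0.0865"
```

The two likelihoods are proportional, yet the tail areas differ, because each tail sums over unobserved outcomes whose probabilities depend on the stopping rule.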

Regarding your distinction on prior-as-part-of-the-data-generating-process versus fixed-unknown-parameters, we discussed this distinction in our recent paper https://arxiv.org/pdf/2006.12335.pdf Section 2.2.

Yes, and this binomial/negative binomial model is the canonical example.

The likelihood is proportional to $p^9 (1-p)^1$, i.e. the kernel of a B(10, 2) distribution.

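The reference prior for the binomial sampling model is B(0.5, 0.5), so the posterior will be B(9.5, 1.5), since $p^{-1/2}(1-p)^{-1/2} \cdot p^9(1-p)^1 = p^{8.5}(1-p)^{0.5}$.

The reference prior for the negative binomial sampling model is B(0, 0.5), an improper prior, and the posterior will be B(9, 1.5), since $p^{-1}(1-p)^{-1/2} \cdot p^9(1-p)^1 = p^{8}(1-p)^{0.5}$.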

Likelihoods are proportional, so the likelihood principle stipulates that one should make the same inference about the parameter.
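A quick numerical check of the proportionality, assuming the negative binomial experiment means sampling until the 9th success (so 1 failure is observed); the function names are illustrative only.

```python
from math import comb

def binomial_lik(p: float, n: int = 10, y: int = 9) -> float:
    """Binomial likelihood: n trials fixed, y successes observed."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def negbin_lik(p: float, r: int = 9, f: int = 1) -> float:
    """Negative binomial likelihood: f failures observed before the r-th success."""
    return comb(f + r - 1, f) * p**r * (1 - p) ** f


for p in (0.2, 0.5, 0.9):
    ratio = binomial_lik(p) / negbin_lik(p)
    print(f"p={p}: ratio={ratio:.6f}")  # 1.111111 for every p
```

The ratio is the constant $\binom{10}{9} / \binom{9}{1} = 10/9$, so both experiments share the kernel $p^9(1-p)^1$.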

But as the priors depend on the sampling methods, and the posteriors differ, a Bayesian analysis using reference priors will lead to different inference depending on which sampling model is chosen, thus violating the likelihood principle.
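The posterior comparison can be checked with exact Beta-kernel arithmetic: a Beta(a, b) kernel is $p^{a-1}(1-p)^{b-1}$, so multiplying by $p^9(1-p)^1$ just adds 9 and 1 to the parameters. This is a minimal sketch with made-up helper names, using `fractions` to avoid rounding.

```python
from fractions import Fraction as F

def beta_posterior(prior, successes, failures):
    """Conjugate update of a Beta(a, b) prior (possibly improper) with binomial-type data."""
    a, b = prior
    return (a + successes, b + failures)


def beta_mean(params):
    a, b = params
    return a / (a + b)


jeffreys_binomial = (F(1, 2), F(1, 2))  # B(0.5, 0.5)
jeffreys_negbin = (F(0), F(1, 2))       # B(0, 0.5), improper

post_binom = beta_posterior(jeffreys_binomial, 9, 1)
post_negbin = beta_posterior(jeffreys_negbin, 9, 1)

for name, post in [("binomial", post_binom), ("negative binomial", post_negbin)]:
    a, b = post
    print(f"{name}: Beta({float(a)}, {float(b)}), mean {float(beta_mean(post)):.4f}")
# binomial: Beta(9.5, 1.5), mean 0.8636
# negative binomial: Beta(9.0, 1.5), mean 0.8571
```

Same likelihood kernel, different posteriors: the difference comes entirely from the sampling-model-dependent prior.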

See also the discussion in Chapter 7.4, p 232ff, of Lee (2012, Bayesian Statistics: An Introduction, 4th ed., John Wiley & Sons) or Lesaffre and Lawson (2012, Bayesian Biostatistics, John Wiley & Sons) who state on page 117 “Jeffreys rule can be derived using other principles [. . . ]. However, it has been criticized because of violating the likelihood principle since the prior depends on the expected value of the log-likelihood under the experiment (probability model for the data)”.
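The quoted criticism is that Jeffreys' rule uses the expected Fisher information, i.e. an expectation over the whole sample space of the experiment. The sketch below computes that expectation by brute-force summation for both sampling models and compares it with the textbook closed forms $n/(p(1-p))$ and $r/(p^2(1-p))$; since the Jeffreys prior is proportional to $\sqrt{I(p)}$, these give the B(0.5, 0.5) and B(0, 0.5) kernels respectively. Truncating the negative binomial sum at 2000 failures is my assumption, harmless for the values of p used here.

```python
from math import comb

def fisher_binomial(p: float, n: int) -> float:
    """E[score^2] for Binomial(n, p), summed exactly over y = 0..n."""
    total = 0.0
    for y in range(n + 1):
        pmf = comb(n, y) * p**y * (1 - p) ** (n - y)
        score = y / p - (n - y) / (1 - p)
        total += pmf * score**2
    return total


def fisher_negbin(p: float, r: int, max_failures: int = 2000) -> float:
    """E[score^2] for F = failures before the r-th success (sum truncated)."""
    total = 0.0
    for f in range(max_failures + 1):
        pmf = comb(f + r - 1, f) * p**r * (1 - p) ** f
        score = r / p - f / (1 - p)
        total += pmf * score**2
    return total


for p in (0.3, 0.6):
    print(f"p={p}: binomial {fisher_binomial(p, 10):.4f} vs {10 / (p * (1 - p)):.4f}; "
          f"neg. binomial {fisher_negbin(p, 9):.4f} vs {9 / (p**2 * (1 - p)):.4f}")
```

The two experiments have different sample spaces, so their expected informations, and hence their Jeffreys priors, differ even though the observed likelihoods are proportional.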

> objective Bayesians who use reference priors (…) definitely do not follow that principle

What do you mean by “do not follow that principle”? Does the use of a reference prior somehow go against the likelihood principle?

But much of what you seem to be addressing could just follow from sufficiency and from the likelihood function being a minimal sufficient statistic for the assumed model. But without the data, the assumed model cannot be checked, and so we are stuck doing math rather than statistics.

At the end of the second paragraph you say “Other informative priors can exist but is not relevant to our discussion here”. Actually, what prior are you using? You do not seem to specify any prior (on the parameter). Or do you refer to the choice of the binomial model as using an informative prior? Also, which estimator are you using?

Your calculations seem to imply that you are initially using either a flat prior (Beta(1,1), Laplace-Bayes) with the MAP estimator, or the (improper) Haldane prior (Beta(0,0)) with the posterior mean. But what are you using later, when you have n=10 and y=9 but $\hat\theta=0.99$? To obtain this estimate, would you not need at least n=100 and y=99? If so, the PPC p-values would no longer be contradictory.
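The arithmetic behind that question can be checked directly: under both readings the estimate is y/n, which is 0.9 for n=10, y=9 and only reaches 0.99 at n=100, y=99. A small sketch (helper names are mine):

```python
from fractions import Fraction

def map_flat(n: int, y: int) -> Fraction:
    """Posterior mode under a Beta(1, 1) prior: posterior Beta(y+1, n-y+1), mode y/n."""
    a, b = y + 1, n - y + 1
    return Fraction(a - 1, a + b - 2)


def mean_haldane(n: int, y: int) -> Fraction:
    """Posterior mean under the improper Beta(0, 0) prior: posterior Beta(y, n-y), mean y/n."""
    a, b = y, n - y
    return Fraction(a, a + b)


for n, y in [(10, 9), (100, 99)]:
    print(n, y, float(map_flat(n, y)), float(mean_haldane(n, y)))
# 10 9 0.9 0.9
# 100 99 0.99 0.99
```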

Finally, I don’t think that the likelihood principle is often phrased as an axiom in Bayesian statistics. I have never heard this before, and it would be highly problematic for some variations of Bayesian statistics. :-)

My take is that the likelihood principle is, as its name says, a principle: it is intuitively appealing and makes sense to many people. Furthermore, it is well known that frequentist statistics does not follow this principle. So it was/is used by (some) Bayesians to claim the moral high ground, on the basis that they did/do follow the likelihood principle. That was fine before the advent of MCMC, which made Bayesian statistics widely applicable, but it is not really tenable now. Actually, it was not really tenable then either, and it doesn’t stand up to closer scrutiny. In Good’s classification of Bayesians[^1], there might be some categories of Bayesians that follow the likelihood principle, but objective Bayesians who use reference priors and (some) subjective Bayesians definitely do not follow that principle.

[^1]: IIRC, Good actually does not use “do you follow the likelihood principle: (a) yes, (b) no” as one of his categorising questions. And, in my opinion, a more crucial question that is missing is “(a) do you think of the prior as part of the data generating process (i.e. of the parameters as truly random variables), or (b) do the parameters have fixed but unknown true values, with the prior used to encode some prior knowledge about them”.

Daniel, I guess I was just giving some loose examples of procedures that do not conform to the likelihood principle, and of when they would or would not be compatible with the rest of the workflow.

For my part, it seems that to be consistent when comparing models, we should use a mixture with a degree of credibility for each model. I think this produces a coherent comparison of models. What do you think?
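One common way to formalize such a mixture is Bayesian model averaging: weight each model by its prior credibility times its marginal likelihood, then average predictions. The sketch below is purely illustrative; the two models (theta fixed at 1/2 versus a uniform Beta(1, 1) prior), the equal prior credibilities and the data n=10, y=9 are my hypothetical choices, not something proposed in the thread.

```python
from fractions import Fraction
from math import comb, factorial

def beta_fn_int(a: int, b: int) -> Fraction:
    """Beta function for positive integers: (a-1)!(b-1)!/(a+b-1)!."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def marginal_point(n: int, y: int, theta: Fraction) -> Fraction:
    """Hypothetical model M0: theta fixed at a known value."""
    return comb(n, y) * theta**y * (1 - theta) ** (n - y)


def marginal_beta(n: int, y: int, a: int, b: int) -> Fraction:
    """Hypothetical model M1: Beta(a, b) prior on theta (integer a, b)."""
    return comb(n, y) * beta_fn_int(y + a, n - y + b) / beta_fn_int(a, b)


def posterior_model_probs(marginals, prior_probs):
    """Normalize prior credibility times marginal likelihood."""
    weights = [m * w for m, w in zip(marginals, prior_probs)]
    total = sum(weights)
    return [w / total for w in weights]


n, y = 10, 9
probs = posterior_model_probs(
    [marginal_point(n, y, Fraction(1, 2)), marginal_beta(n, y, 1, 1)],
    [Fraction(1, 2), Fraction(1, 2)],
)
# Model-averaged predictive probability of a success on the next trial:
# M0 predicts 1/2, M1 predicts the Beta(10, 2) posterior mean 10/12.
averaged = probs[0] * Fraction(1, 2) + probs[1] * Fraction(10, 12)
print([f"{float(p):.4f}" for p in probs], f"{float(averaged):.4f}")
# ['0.0970', '0.9030'] 0.8010
```

Note that the model weights depend on each model's prior through the marginal likelihood, so the mixture inherits the prior sensitivity discussed above.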
