Is that a problem? That’s why they call the prior minimally informative. “The prior is centered at θ = 0.4118 (VE=30%) which can be considered pessimistic. The prior allows considerable uncertainty; the 95% interval for θ is (0.005, 0.964) and the corresponding 95% interval for VE is (-26.2, 0.995).”

]]>pbeta(0.5, 0.700102, 1, lower.tail = FALSE)  # 0.3845, i.e. about 38%

so the prior puts a 38% chance on θ > 0.5, i.e. on the vaccine causing more cases than the placebo
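That 38% figure is easy to verify in closed form, since a Beta(a, 1) distribution has CDF F(x) = x^a. A quick sketch in Python (the R call above computes the same thing):

```python
a, b = 0.700102, 1.0  # Pfizer's prior parameters for theta

# For a Beta(a, 1) prior the CDF is F(x) = x**a, so the prior
# probability that theta exceeds 0.5 (i.e. VE < 0: more cases in
# the vaccine arm than the randomisation ratio alone would give) is:
p_harm = 1.0 - 0.5 ** a
print(f"P(theta > 0.5) = {p_harm:.4f}")  # about 0.38
```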

I’ll grant that these things are not particularly well defined. My definition of weakly informative is that the prior has “limited” influence on the posterior over the range of expected outcomes. This is a very squishy definition. So for example the normal(0,1) prior has “limited” influence over the likelihood if the parameter is in the neighborhood of the unit scale.

In this case the prior has “limited” influence on the posterior if the efficacy is in the range of 0%–99%. A flat prior for theta, or the Jeffreys prior, or flat-on-efficacy will give you posteriors in the same ballpark. This is different from what you’d get if the researchers had given their best estimate of the prior based on lab, phase I, and phase II data. I could easily imagine them coming up with something like beta(1.5, 10) truncated to <= 0.5 based on past data. That would be more opinionated, and philosophically more correct from a Bayesian standpoint.

Say what you will about Jeffreys priors, at least they provide a concrete definition of "non-informative" to argue against. Otherwise it's the "Stan developers'" definition versus the "Fellows Statistics developers'" definition versus …

]]>I was just going to post on this; to my mind beta(1,1) represents 1 pseudocount of success and 1 of failure (with these counts representing the information that both failure and success are possible).

]]>This is just Bayesian NHST, I don’t see what advantage adding an arbitrary prior has over the usual NHST. Bayesian approaches are useful when you have a mechanistic model you want to check.

The real issue is the age/comorbidity dependence of the immune response. They need to compare that to what happens if you just get infected the normal way and do a cost-benefit analysis.

A vaccine with low effectiveness and high risk of ADE in the same people at risk of severe illness from infection isn’t worth it. The people at low risk would have been much better off holding covid parties back in March (then self quarantining for two weeks after the *known* exposure) and ended this then. Your immunity is going to be much more diverse to a real infection than just to one viral peptide.

]]>Ben, Bob:

Jouni’s article is relevant to this discussion. I blogged it back in 2011 and there were 0 comments!

]]>Yeah, if you let alpha, beta -> 0+, you get Haldane’s improper prior. I don’t think it is a good idea to use it in Pfizer’s case or any other case I can imagine where someone is modeling successes. But one can see why he argued it was less informative than the uniform. If you start with a beta(1,1) prior, for any finite number of observations, you never get all the posterior probability on 0 or 1 even if the outcomes are all failures or all successes. That’s why that prior conveys the information that both failure and success are possible. If you started with a beta(0+,0+) prior and got all successes or all failures, you would end up with a beta(n, 0+) or beta(0+, n), which are degenerate but do put all the mass on one or the other.
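A small numeric sketch of that limit (the run of n all-success trials is hypothetical): as alpha = beta = eps shrinks toward 0, the posterior mean after all successes climbs to 1, while the uniform prior (eps = 1) keeps it bounded away from 1.

```python
n = 10  # hypothetical: n Bernoulli trials, all successes

# With a Beta(eps, eps) prior, n successes and 0 failures give a
# Beta(n + eps, eps) posterior, whose mean is (n + eps) / (n + 2*eps).
for eps in (1.0, 0.1, 0.01, 1e-6):
    post_mean = (n + eps) / (n + 2 * eps)
    print(f"eps = {eps:g}: posterior mean = {post_mean:.6f}")
```

As eps -> 0+ the posterior mean tends to exactly 1, which is the degenerate behavior described above.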

]]>I agree that you could construct a prior via the inverse CDF that is similar to this beta prior or any other beta prior. Also, if Pfizer wanted to put the prior on the vaccine effect and make the success probability a transformed parameter, that would be easy too (at least with Stan). But I don’t think it is good practice to make people start with prior expectations and work out what fully specified prior distribution is consistent with those prior expectations, because I think that people don’t have well-thought-out prior expectations. It seems to me that people are much more comfortable being pressed about prior quantiles than prior expectations.

]]>https://errorstatistics.com/2020/11/12/s-senn-a-vaccine-trial-from-a-to-z-with-a-postscript-guest-post/

The uninformative prior is noted on p. 111 of the report which is in my comments, but I take it people here are aware of that already. ]]>

You do have to check that the implied (inverse) CDF is increasing on (0,1) but the derivative is usually known, so it is pretty easy to check whether it has a root in (0,1) before you start sampling.
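One way to do that check, sketched here in Python for a polynomial fit to the inverse CDF (the function names and the grid-scan approach are my own; a proper root-finder on the known derivative would work just as well):

```python
def poly_derivative(coefs):
    """Derivative of a polynomial, coefficients in increasing degree."""
    return [i * c for i, c in enumerate(coefs)][1:]

def increasing_on_unit_interval(coefs, n_grid=10_000):
    """Scan (0, 1) for any point where the derivative is non-positive."""
    d = poly_derivative(coefs)
    for k in range(1, n_grid):
        x = k / n_grid
        if sum(c * x**i for i, c in enumerate(d)) <= 0:
            return False  # derivative has a root in (0, 1): not a valid inverse CDF
    return True

print(increasing_on_unit_interval([0.0, 1.0]))        # identity map: True
print(increasing_on_unit_interval([0.0, 3.0, -3.0]))  # 3x - 3x^2 turns over: False
```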

]]>From p, we can derive the relative risk and vaccine efficacy (VE). The relative risk = 0.4118/(1 − 0.4118) × 1 = 0.700 (corresponding to VE = 30%), the same as their number. That is, RR = p/(1 − p) × C, where p is the probability that an event is in the vaccine arm conditional on the total number of events, and C is the randomisation ratio.

The likelihood for p is binomial, so by conjugacy the posterior for p is also beta(0.700102 + events in vaccine, 1 + events in control). From the posterior for p, we can derive RR and VE.
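That derivation can be sketched in code (Python; the event counts below are hypothetical placeholders, not trial data):

```python
a0, b0 = 0.700102, 1.0  # Pfizer's Beta prior on theta

def ve_from_theta(theta):
    # Invert theta = (1 - VE) / (2 - VE)  =>  VE = (1 - 2*theta) / (1 - theta)
    return (1 - 2 * theta) / (1 - theta)

# Hypothetical event counts, for illustration only:
events_vaccine, events_control = 8, 86

a_post = a0 + events_vaccine   # conjugate Beta update
b_post = b0 + events_control
theta_mean = a_post / (a_post + b_post)  # posterior mean of theta

# Note: VE is a nonlinear function of theta, so this is VE evaluated at
# the posterior mean of theta, not the posterior mean of VE.
print(f"posterior mean theta = {theta_mean:.3f}")
print(f"VE at that theta     = {ve_from_theta(theta_mean):.3f}")
```

Since the map from theta to VE is monotone (decreasing), posterior quantiles of VE can be read off directly from the corresponding quantiles of theta.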

]]>I meant this more as a discussion of semantics and of how to formulate priors with general properties than as an argument about which priors to use in this example. Sensitivity analysis shows these priors have no substantial effect on the conclusions. Same with the hyperpriors Andrew and I used in our seroprevalence analysis.

And of course, I’m obliged, being on Andrew’s blog, to note that those statisticians were probably still arguing about likelihoods if they’re the kind of statisticians who like to argue about models. For example, we could’ve had logit or probit or robit models here, there can be pooling of various kinds, different power calculations based on assumptions about effects (very much like a discussion of a prior), different criteria for removing “outliers” from data, different significance thresholds, different forms of parametric or non-parametric hypothesis test, etc.

]]>Thanks, Ben. That’s an interesting perspective I hadn’t thought about. I usually try to stay away from “uninformative” as I’m not even sure what it’s supposed to mean.

I thought “consistent with 0” just meant having non-zero mass there? Or is this a problem because when we transform to log odds, we’re back to vanishing tails?

The problem I have with thinking about this is that it’s uniform on the probability scale, which seems about as “uninformative” as you can get, but of course, when you transform to log odds, it’s logistic(0, 1), which is no longer flat.
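That transform is easy to verify by simulation (a Python sketch): draw theta uniformly, map each draw to its log odds, and compare the empirical CDF against the standard logistic CDF.

```python
import math
import random

random.seed(1)
draws = [random.random() for _ in range(100_000)]      # theta ~ Uniform(0, 1)
log_odds = [math.log(u / (1 - u)) for u in draws]

# Under theta ~ Uniform(0, 1), P(log-odds <= x) should equal the
# standard logistic CDF 1 / (1 + exp(-x)).
for x in (-2.0, 0.0, 1.0):
    empirical = sum(lo <= x for lo in log_odds) / len(log_odds)
    logistic = 1 / (1 + math.exp(-x))
    print(f"x = {x:+.1f}: empirical {empirical:.3f} vs logistic {logistic:.3f}")
```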

I’m also having trouble reconciling that beta(1, 1) is anything more than maybe weakly informative in the sense of not expecting 0 or 1 values, because any central inference is still very close to what you’d get with an improper beta(0, 0). When we fit something like a baseball ability, we wind up with something like a beta(100, 200) prior, which feels much more informative to me, primarily because it involves a high pseudocount.
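The pseudocount view makes that contrast concrete (Python sketch; the 30-for-100 data are made up): given the same data, beta(1, 1) barely moves the MLE, while beta(100, 200) pulls hard toward its prior mean of 1/3.

```python
hits, at_bats = 30, 100  # hypothetical batting data
misses = at_bats - hits

# Posterior mean under a Beta(a, b) prior is (a + hits) / (a + b + at_bats).
for a, b in [(1, 1), (100, 200)]:
    post_mean = (a + hits) / (a + b + at_bats)
    print(f"beta({a}, {b}) prior -> posterior mean {post_mean:.3f}")

print(f"MLE: {hits / at_bats:.3f}")
```

The beta(1, 1) posterior mean is 31/102 ≈ 0.304, essentially the MLE of 0.300, while beta(100, 200) gives 130/400 = 0.325: the 300 pseudocounts carry as much weight as three full datasets of this size.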

How can you use an improper prior? Do you just take the limit as alpha, beta -> 0? Doesn’t that still lead to an improper posterior if you only ever see 0 or 1 outcomes? I still have never worked through the math on the Jeffreys beta(0.5, 0.5) prior.

]]>> A vaccine is useless unless the effect is huge :-)

D’oh! That makes sense. You can tell I’m not an epidemiologist :-). And thanks for the clarification in the second note.

I should’ve been clearer—I meant the Stan developers’ notion of “weakly informative”, at least insofar as represented by our wiki of prior choice recommendations.

I’d go further and say that priors only make sense relative to a likelihood *and data*. For example, if we’re regressing a length measured in meters on some predictor, then normal(0, 1) might be a weakly informative prior for the regression coefficient. But then, keeping all else the same, if you convert the outcome to millimeters, the coefficient gets multiplied by 1000, so you’d have to change the prior to normal(0, 1000) for it to have the same meaning, even without changing the model in any substantial way. Both might be weakly informative if the posterior has unit scale in the first case and scale 1000 in the second. Standardizing predictors lets us take a better stab at default priors.

I assume “these priors” are Jeffreys priors and by “scale invariance” you mean invariance under reparametrizations. The invariance property addresses the issue you mention of flat in theta being beta(1,1) but flat in log-odds being beta(0,0). The Jeffreys prior is beta(0.5, 0.5) no matter which parametrization you start from.

In the single-parameter case, Jeffreys priors have another interesting property: they maximize the expected divergence between prior and posterior. They are in that sense the “least informative” alternative. These reference priors can also be defined in multi-parameter models, but there they are no longer the same as Jeffreys priors.
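For the binomial case, the beta(0.5, 0.5) form falls out of a short standard derivation (sketched here in LaTeX, for y successes in n trials):

```latex
% Log-likelihood for y ~ Binomial(n, theta):
\ell(\theta) = y \log \theta + (n - y) \log(1 - \theta) + \text{const}

% Fisher information:
I(\theta) = -\,\mathbb{E}\big[\ell''(\theta)\big]
          = \frac{n\theta}{\theta^{2}} + \frac{n(1-\theta)}{(1-\theta)^{2}}
          = \frac{n}{\theta(1-\theta)}

% Jeffreys prior:
p_J(\theta) \propto \sqrt{I(\theta)}
            \propto \theta^{-1/2}\,(1-\theta)^{-1/2}
% which is the kernel of a Beta(1/2, 1/2) density.
```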

]]>If I understand correctly, you propose to take the idea of prior elicitation via percentiles to its extreme: take as input as many percentiles as you want and fit a distribution to them. This flexibility is interesting when you have detailed information to include in the prior, but for creating a weakly informative one it seems like overkill.

There is not much difference between the prior they use and what you get using your method with the same median (41% vaccine efficacy). Fixing two quantiles instead of one (setting the middle third to be a vaccine efficacy between −27% and 74%), your method gives a prior that matches the beta(0.7, 1) distribution pretty well.
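Those quantile claims can be checked in closed form (Python sketch), since Beta(a, 1) has CDF θ^a and quantile function p^(1/a):

```python
a = 0.700102  # Pfizer's prior: Beta(a, 1) on theta

def theta_from_ve(ve):
    return (1 - ve) / (2 - ve)

def ve_from_theta(theta):
    return (1 - 2 * theta) / (1 - theta)

# Median of Beta(a, 1): theta**a = 0.5  =>  theta = 0.5**(1/a)
median_theta = 0.5 ** (1 / a)
print(f"median VE = {ve_from_theta(median_theta):.3f}")  # about 0.41

# Prior mass on the band VE in (-27%, 74%) — smaller VE means larger theta:
mass = theta_from_ve(-0.27) ** a - theta_from_ve(0.74) ** a
print(f"P(-27% < VE < 74%) = {mass:.3f}")                # about 1/3
```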

]]>This is similar to stuff I naturally do, which is basically to choose a flexible family and then tweak the parameters until I get several quantiles the way I want them. I also often use the peak density and place that on a particular location. I don’t really care about the fact that it’s a “beta” or a “skew normal” or whatever, just whatever can set up an appropriate shape.

The Chebyshev idea is pretty good, but it can produce something that is not a valid CDF (i.e., a function that is decreasing in places).

]]>I did my StanCon presentation about such priors in August

]]>> maybe they can get more buy-in from a simple beta-binomial analysis, but I don’t like beta priors

What kind of prior would you have liked better?

]]>That is debatable. As the wikipedia page for the beta distribution emphasizes, the beta(1,1) is the “uninformative” prior distribution that conveys the information that both failure and success are possible. In this situation (and many others), it is totally reasonable to deny that a person is guaranteed to (not) get COVID as a result of the drug. That is in contrast to Haldane’s improper prior, beta(0+,0+), which is consistent with the possibility that the success probability might be exactly 0 or 1.

I’m glad Pfizer went with a more informative prior and maybe they can get more buy-in from a simple beta-binomial analysis, but I don’t like beta priors. It is one of many examples of a probability distribution that was constructed in the pre-computer era to have elementary expressions for its moments, which does not serve the analyst (or the regulators) well when they do not have a well-formed prior expectation, prior variance, etc.

]]>From your first quote,

“A minimally informative beta prior, beta (0.700102, 1), is proposed for θ = (1-VE)/(2-VE). The prior is centered at θ = 0.4118 (VE=30%) which can be considered pessimistic.”

it would seem that they fixed the second parameter to one and solved for 30% efficacy.
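That reverse-engineering is a one-liner (Python sketch): with b fixed at 1, the prior mean a/(a + 1) must equal θ0 = (1 − VE)/(2 − VE) at VE = 30%.

```python
ve0 = 0.30
theta0 = (1 - ve0) / (2 - ve0)  # 0.7 / 1.7 = 0.411765..., quoted as 0.4118

# Mean of Beta(a, 1) is a / (a + 1); solve a / (a + 1) = theta0 for a:
a = 0.4118 / (1 - 0.4118)       # using the rounded theta0 from the protocol
print(f"a = {a:.6f}")           # 0.700102, matching the protocol's value
```

Using the rounded center 0.4118 reproduces the protocol's 0.700102 exactly, which supports the guess that they fixed b = 1 and solved for a.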

]]>You’re right, the trial protocol happens to give some details on what they intended to do. :-)

> What we mean by weakly informative is that the prior determines the scale of the answer.

Who is “we”? Stan developers?

> For example a standard normal prior (normal(0, 1)), imposes a unit scale, whereas a normal(0, 100) would impose a scale of 100.

Are both equally weakly informative? What would be an example of a more or less informative prior? (By the way, Gelman et al. say that the prior can often only be understood in the context of the likelihood. This is an understatement; priors only ever make sense in the context of a model.)

> I’m really surprised they’re only looking at N = 200 and expecting something like n = 30.

Where are these numbers coming from?

> Binomial data is super noisy and thus N = 200 is a small data size unless the effect is huge.

A vaccine is useless unless the effect is huge :-)

]]>