Dan, thanks for the answer and pointer to the paper. But my question was meant more generally: Are there specific Bayesian ways to deal with ill-posed problems? Or is the idea that nothing special is necessary because having a decent prior allows one to deal with well-posed and ill-posed problems in the same way?

]]>I was being too vague (or wrong?).

It was in reference to case study 1 in the linked Representists versus Propertyists post, where the repeated sampling performance from a “silly prior” (flat) shows that a property usually taken as good (e.g. uniform coverage of credible intervals that match confidence intervals) is actually poor in some meaningful sense when a sensible prior is used (informative for small effects in a given context).

So it’s not that you need a sensible prior to get various good frequency properties, but that to know whether the properties are really good (good for what?), they should be evaluated under a sensible prior. On the other hand, if there is reason for the property to be taken as all-important/beyond question, then the silliness can be disregarded.

]]>*If penalizing corresponds to a silly prior, that should not be disregarded without very good reasons*

… such as it also corresponding to a sensible prior. Penalized estimates can correspond to multiple priors – an entire equivalence class of them – so just one of these priors being “silly” is a weak criticism of the penalized estimate.

]]>I just assumed he didn’t agree, which is fair enough.

]]>I wouldn’t do this if I needed a full predictive distribution. I mean that I can make predictions (just the mean) for new observations: before seeing any data, using my prior mean, and afterwards, using the model I’ve fit.

]]>It wasn’t clear to me whether Keith missed that point or if he got it but just wanted to discuss a different point, so I forbore to comment.

]]>> If penalizing corresponds to a silly prior, that should not be disregarded without very good reasons; and when Bayes leads to poor repeated sampling performance, that should not be disregarded without very good reasons for sticking with the prior.

I agree with the last part. I don’t agree with the first part. Because a prior has to deal with things like containment and penalties don’t, you can get away with much simpler penalties.

]]>There definitely is a lot of this about – you can always consider a penalty as a log-prior. But that prior may not work well (and for a lot of famous penalties, it doesn’t).

For example, this paper by Lassas and Siltanen shows that the Kimeldorf and Wahba correspondence doesn’t hold if you replace the L2 norm with an L1 norm:

http://iopscience.iop.org/article/10.1088/0266-5611/20/5/013
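The positive side of the correspondence is the textbook one: an L2 penalty is the negative log density of a mean-zero Gaussian prior, so the ridge estimate is exactly the MAP estimate under that prior. A minimal sketch of this equivalence (numpy only; the data and penalty weight here are simulated/invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

lam = 2.0  # penalty weight; equals (noise var) / (prior var) in the Bayesian reading

# Penalized MLE (ridge): closed-form argmin of ||y - Xb||^2 + lam * ||b||^2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# MAP under y ~ N(Xb, 1), b ~ N(0, (1/lam) I): maximize the log posterior
# directly by gradient ascent -- same stationary point as the ridge solve
beta = np.zeros(p)
for _ in range(5000):
    grad = X.T @ (y - X @ beta) - lam * beta  # d/db [log lik + log prior]
    beta += 1e-3 * grad

assert np.allclose(beta, beta_ridge, atol=1e-6)
```

The point of the check at the end is just that the two routes (penalized least squares, posterior-mode optimization) land on the same estimate.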

I’d need to know a bit more about the problem to have a firm opinion, but my default method for priors on autocorrelation is this: https://arxiv.org/abs/1608.08941

]]>You still need to get the parameters from somewhere to build a frequentist predictive distribution. Where do they come from?

]]>http://gelmanstatdev.wpengine.com/2017/04/19/representists-versus-propertyists-rabbitducks-good/

But seriously, I think the biggest loss of opportunity comes from ignoring the prior when penalizing and ignoring repeated sampling performance when doing Bayes.

If penalizing corresponds to a silly prior, that should not be disregarded without very good reasons; and when Bayes leads to poor repeated sampling performance, that should not be disregarded without very good reasons for sticking with the prior.

Rod Little argues for the latter in this talk – https://ww2.amstat.org/meetings/ssi/2017/onlineprogram/AbstractDetails.cfm?AbstractID=304106

]]>Kimeldorf and Wahba (1970), A Correspondence Between Bayesian Estimation on Stochastic Processes and Smoothing by Splines (https://projecteuclid.org/euclid.aoms/1177697089)

Akaike (1979), Likelihood and the Bayes procedure.

There is quite a large time series literature from that era about the relationship between smoothness priors and penalized likelihoods, in particular the work of Kitagawa and Gersch (summarized in 1996 in http://www.springer.com/us/book/9780387948195). While these days most of the literature points to the “British” take on time series decomposition, this sphere of work predates it IMO, though they lead to the same place. Basically, the Kalman smoother/filter can be derived as a Bayesian solution of a kind.
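To illustrate the Kitagawa–Gersch flavor of this: a second-order random-walk prior on a trend corresponds, at the posterior mode, to a second-difference penalty, and for this linear-Gaussian model the batch penalized least-squares solve returns the same trend a Kalman smoother would. A toy sketch (numpy only; the series and smoothness weight are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
t = np.arange(T)
signal = np.sin(2 * np.pi * t / 80)
y = signal + 0.3 * rng.normal(size=T)  # noisy observations of a smooth trend

lam = 100.0  # smoothness weight = (obs var) / (random-walk innovation var)

# Second-difference matrix D: (Dx)_i = x_{i+2} - 2 x_{i+1} + x_i
D = np.diff(np.eye(T), n=2, axis=0)

# Posterior mode of x under y ~ N(x, s2) with a second-order random-walk
# prior on x: minimizes ||y - x||^2 + lam * ||D x||^2
x_hat = np.linalg.solve(np.eye(T) + lam * D.T @ D, y)

# The smoothed trend should track the signal with much less noise than y does
resid_raw = np.mean((y - signal) ** 2)
resid_smooth = np.mean((x_hat - signal) ** 2)
assert resid_smooth < resid_raw
```

The same estimate could be computed in O(T) with a Kalman smoother on the state-space form of the random-walk prior; the dense solve here is just the most transparent way to show the penalty/prior identity.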

]]>I agree with this sentiment, but I think it’s much cleaner to say “use penalised maximum likelihood with an appropriate penalty”. Not everything needs to be “Bayesian” (especially not things that aren’t* Bayesian).

* I try really hard to stay away from drawing a line in the sand around what is and isn’t Bayesian, but if you change your “prior” based on your estimation method (you’d never use a boundary-avoiding prior with a full posterior calculation – it’s not what they’re made for), then it’s not a prior.

]]>It’s also worth noting the point made in BDA: sometimes it makes a lot of sense to use a different prior if you plan on using a MAP estimate rather than sampling from the posterior. Your favorite Lasso example is clearly such a case, and BDA also points out that, especially with variance terms, zero-avoiding priors are important for MAP estimates, even if they don’t reflect prior beliefs about the parameters.
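To make the Lasso case concrete: for a single observation y ~ N(θ, 1) with a Laplace prior ∝ exp(−λ|θ|), the MAP estimate is the soft-threshold of y, which is exactly zero whenever |y| ≤ λ, even though the posterior mean under the same prior is not exactly zero for nonzero y. A sketch checking this by brute force (numpy only; the numbers are made up):

```python
import numpy as np

def map_laplace(y, lam):
    """MAP of theta for y ~ N(theta, 1), prior density prop. to exp(-lam*|theta|),
    found by grid search over the negative log posterior."""
    grid = np.linspace(-5, 5, 200001)  # step 5e-5, includes 0 exactly
    nlp = 0.5 * (y - grid) ** 2 + lam * np.abs(grid)
    return grid[np.argmin(nlp)]

def soft_threshold(y, lam):
    # Closed-form MAP: shrink toward zero, snapping to exactly zero when |y| <= lam
    return np.sign(y) * max(abs(y) - lam, 0.0)

for y in [-2.3, -0.4, 0.0, 0.7, 3.1]:
    assert abs(map_laplace(y, 1.0) - soft_threshold(y, 1.0)) < 1e-4

# |y| below the threshold -> MAP is exactly zero; the posterior mean would not be
assert soft_threshold(0.7, 1.0) == 0.0
```

So the sparsity people like about the Lasso is a property of the *mode* of this posterior, not of the posterior itself – which is exactly why the same “prior” behaves so differently under MAP estimation versus full posterior sampling.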

]]>Without having watched Lance’s talk, I think the connection between penalties and priors can be big, useful news for some researchers. I work with people who use ML methods and think of Bayesian methods as “something that takes too long”. A group of them seemed surprised that penalties could be chosen intelligently using something other than cross-validation, which is not tractable with even a few tuning parameters.

But of course, your point stands: while MAP estimates are exactly equivalent to a certain penalized MLE method, MAP estimation is not exactly the gold standard for Bayesian estimation.
