Markov chain Monte Carlo is a key computational tool in Bayesian statistics, but it can be challenging to monitor the convergence of an iterative stochastic algorithm. In this paper we show that the convergence diagnostic R-hat of Gelman and Rubin (1992) has serious flaws. R-hat will fail to correctly diagnose convergence failures when the chain has a heavy tail or when the variance varies across the chains. In this paper we propose an alternative rank-based diagnostic that fixes these problems. We also introduce a collection of quantile-based local efficiency measures, along with a practical approach for computing Monte Carlo error estimates for quantiles. We suggest that common trace plots should be replaced with rank plots from multiple chains. Finally, we give recommendations for how these methods should be used in practice.
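To make the rank-based idea concrete, here is a minimal Python sketch of how such a diagnostic could be computed. It is not the paper's reference implementation. The function names are hypothetical, and the specific choices (splitting each chain in half, the 3/8 and 1/4 offsets in the rank transform, and folding the draws around the pooled median to catch scale differences) are assumptions that go beyond what the abstract states. The demo uses simulated chains that share a mean but differ in variance, which is one of the failure cases mentioned above.

```python
import random
from statistics import NormalDist, mean, median, variance


def split_chains(chains):
    """Split every chain into a first and second half (middle draw dropped if odd)."""
    halves = []
    for c in chains:
        half = len(c) // 2
        halves.append(list(c[:half]))
        halves.append(list(c[len(c) - half:]))
    return halves


def rhat(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    within = mean([variance(c) for c in chains])         # W
    between_over_n = variance([mean(c) for c in chains])  # B / n
    var_plus = (n - 1) / n * within + between_over_n
    return (var_plus / within) ** 0.5


def rank_normalize(chains):
    """Replace draws by normal scores of their pooled ranks (ties get average rank)."""
    flat = sorted(
        ((x, i, j) for i, c in enumerate(chains) for j, x in enumerate(c)),
        key=lambda t: t[0],
    )
    total = len(flat)
    z = [[0.0] * len(c) for c in chains]
    std_normal = NormalDist()
    k = 0
    while k < total:
        m = k
        while m + 1 < total and flat[m + 1][0] == flat[k][0]:
            m += 1
        avg_rank = (k + m) / 2 + 1  # average 1-based rank of the tied block
        # Assumed fractional offsets: (rank - 3/8) / (total + 1/4)
        score = std_normal.inv_cdf((avg_rank - 0.375) / (total + 0.25))
        for _, i, j in flat[k:m + 1]:
            z[i][j] = score
        k = m + 1
    return z


def rank_rhat(chains):
    """Max of rank-normalized split R-hat on the draws and on the folded draws."""
    halves = split_chains(chains)
    bulk = rhat(rank_normalize(halves))
    center = median([x for c in halves for x in c])
    folded = [[abs(x - center) for x in c] for c in halves]
    tail = rhat(rank_normalize(folded))
    return max(bulk, tail)


# Demo on simulated draws: four chains with the same mean but different scales.
random.seed(1)
demo_chains = [[random.gauss(0.0, sd) for _ in range(1000)] for sd in (1, 1, 3, 3)]
classic_value = rhat(split_chains(demo_chains))
rank_value = rank_rhat(demo_chains)
print("classic split R-hat:", round(classic_value, 3))
print("rank-based R-hat:   ", round(rank_value, 3))
```

In this simulated setup the classic statistic should stay close to 1 because the chain means agree, while the folded, rank-normalized version should be visibly larger because the chains disagree about scale.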
This article is the culmination of several years of discussion and examples, starting in 2014 or so when Kenny Shirley came across an example where multiple chains had countervailing trends, which motivated the development of split R-hat. It’s fun to be able to shoot down and then improve my own method!
I expect that we and others will do more work on this and related areas as we continue to improve aspects of Stan (and statistical computing more generally) involving adaptation and convergence. For a start, see this vignette by Jonah Gabry and Martin Modrak, “Visual MCMC diagnostics using the bayesplot package.”