I don’t think we disagree on the details, but what got me confused is that you start asking about the solution of one integral and then you solve a different (related) one.

A = Integral_{y in R-ball}[ multinormal(y) dy ]

(for simplicity I take the standard normal with mean 0 and variance I)

which would be solved using a straightforward Monte Carlo calculation as

A = Volume(R-ball) E_{y in R-ball}[ multinormal(y) ] ≈ Volume(R-ball) 1/M SUM_{m in 1:M} multinormal(y_m)

where y_m is sampled uniformly in R-ball.
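As a sketch in Python (the helper names and constants here are mine, not from the thread), the estimator for A with draws uniform in the R-ball might look like this, taking the direction from a normalized Gaussian and the radius as R times U^(1/d):

```python
import numpy as np
from math import lgamma, log, pi

def ball_volume(d, R):
    # Volume of the d-dimensional ball of radius R: pi^(d/2) * R^d / Gamma(d/2 + 1).
    return np.exp((d / 2) * log(pi) + d * log(R) - lgamma(d / 2 + 1))

def sample_uniform_ball(M, d, R, rng):
    # Direction: normalized Gaussian draw; radius: R * U^(1/d) with U uniform on (0, 1).
    z = rng.standard_normal((M, d))
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    return R * rng.uniform(size=(M, 1)) ** (1 / d) * z

def estimate_A(M, d, R, rng):
    # A ≈ Volume(R-ball) * (1/M) * sum of the standard normal density at the draws.
    y = sample_uniform_ball(M, d, R, rng)
    log_pdf = -0.5 * np.sum(y ** 2, axis=1) - (d / 2) * log(2 * pi)
    return ball_volume(d, R) * np.mean(np.exp(log_pdf))

rng = np.random.default_rng(0)
A_hat = estimate_A(100_000, 10, 5.0, rng)  # true value is close to 1 for this R
```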

The integral that you calculate is a different one and corresponds to the second factor only

B = Integral_{y in R-ball}[ multinormal(y) p(y) dy ]

where p(y) is a density function which is constant in R-ball but depends on the domain of integration:

p(y) = 1/Volume(R-ball).

The relation between them is of course

B = 1/Volume(R-ball) Integral_{y in R-ball}[ multinormal(y) dy ] = 1/Volume(R-ball) A

> That’s additive error around the true value theta that doesn’t depend on the true value, just the mcmc-se (assuming the estimator’s unbiased, of course).

In this example, looking at your integral B (the average of the multinormal function in the R-ball), as the R-ball gets larger both the expected value and the standard deviation of the Monte Carlo estimator change.

If we consider integral A, the expected value of the Monte Carlo estimator is almost constant (as the R-ball gets larger the integral approaches 1) but the standard deviation increases.

We can imagine other cases (for example a “scaling”) where both may change but the relative error remains constant, and other cases (for example a “translation”) where the standard deviation remains the same but the expected value changes, so the relative error may approach zero or diverge, as you mentioned.
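A quick numerical illustration of those two cases (Python sketch; the setup is my own, not from the thread): estimate E[X] for X ~ normal(1, 1) from independent draws and compare the relative standard errors under scaling and translation of the estimand.

```python
import numpy as np

def rel_se(x, true_value):
    # Monte Carlo standard error of the mean, relative to the true value.
    return x.std(ddof=1) / np.sqrt(len(x)) / abs(true_value)

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000) + 1.0        # E[X] = 1, sd(X) = 1

r_base = rel_se(x, 1.0)
r_scaled = rel_se(100.0 * x, 100.0)          # scaling: mean and sd scale together
r_shifted = rel_se(x + 100.0, 101.0)         # translation: sd unchanged, mean grows
# r_scaled equals r_base; r_shifted is roughly 100 times smaller.
```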

@Carlos: I suspect we’re talking past each other, because my point was very simple. So at the risk of stating the obvious…

In calculus, the volume over which one integrates is explicit; in an expectation, the function is weighted by the probability density,

E[f(Theta) | y] = INTEGRAL_{theta in Theta} f(theta) * p(theta | y) d.theta

With Monte Carlo, the expectation calculation is just

E[f(Theta) | y] ≈ 1/M SUM_{m in 1:M} f(theta(m)),

where

theta(m) ~ p_Theta|Y(theta | y).

The volume part’s implicit in generating the theta(m) in volumes of high probability mass.

The MCMC CLT governs the error of the Monte Carlo expectation estimate as

mcmc-se = sd(theta) / sqrt(ess(theta))

where ess(theta) is the effective sample size of theta(1), ..., theta(M).

The error is additive error around the expected value,

estimate = true_value + err,

where the error distribution is

err ~ normal(0, mcmc-se).

That’s additive error around the true value theta that doesn’t depend on the true value, just the mcmc-se (assuming the estimator’s unbiased, of course). Relative error, on the other hand, is a proportion, usually measured in terms of (estimate – actual) / abs(actual). Of course, that doesn’t work for actual = 0 (which is a huge pain for Stan testing).
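As a sketch (Python, with independent draws so ess = M; a real MCMC run would plug in an ESS estimate instead), the plug-in estimate, its mcmc-se, and the relative error for f(theta) = theta^2 under a standard normal:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 10_000
theta = rng.standard_normal(M)         # stand-in for draws theta(m) ~ p(theta | y)
f = theta ** 2                         # f(theta); the true E[theta^2] is 1 here

estimate = f.mean()                    # plug-in Monte Carlo estimate
ess = M                                # independent draws; MCMC would estimate this
mcmc_se = f.std(ddof=1) / np.sqrt(ess)
rel_error = abs(estimate - 1.0) / 1.0  # relative error against the known true value
```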

Bob, I don’t quite understand your discussion about relative and absolute error.

When you say “The textbook Monte Carlo approach to evaluating such an integral is to evaluate …” you’re missing a factor V (the volume of the domain of integration). The value of the integral is the volume times the average value of the function, and the Monte Carlo computation estimates that average. In this example, the value of the integral is (close to) one [*]; the discussion about the value of the integral being 0.01 doesn’t seem applicable in this context.

[*] For high values of R, which is what you’re discussing. See https://imgur.com/a/yPuXJMx for the example of a 10-dimensional standard normal and R from 1 to 10. The true value of the integral is displayed in red, the boxplots show V*f(x) for 1000 points selected uniformly from the hypersphere of radius R, and the blue points are 10 Monte Carlo estimates for the integral obtained by averaging 100 points each. As R grows the estimate of (the volume times) the mean of the function in the hyperball gets worse because (the volume times) the standard deviation of the function in the hyperball gets larger.

Hmm, maybe we should call WordPress “The Dog”.

Carlos: I didn’t mean to imply this integral is hard to compute. It was meant to illustrate that when the summands in an MCMC estimate are mostly zero, the absolute error obeys the expected bounds from the MCMC CLT, but the relative error is terrible. I brought up “arbitrary” integrals because we often talk about MCMC as being a way to compute Bayesian posterior expectations for arbitrary functions of the parameters. Those functions have to be well behaved to produce decent relative error using the plug-and-play MCMC estimate,

$latex \displaystyle\mathbb{E}[f(\theta) \mid y] \approx \frac{1}{M} \sum_{m = 1}^M f(\theta^{(m)})$

where $latex \theta^{(m)} \sim p(\theta \mid y)$.

Edit. Hmm. Looks like math doesn’t work in comments. That’s

E[f(theta) | y] = 1/M SUM_{m in 1:M} f(theta(m))

for

theta(m) ~ p(theta | y)
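To make the mostly-zero-summands point concrete (Python sketch; the tail event and constants are my own choice): estimating P(Theta > 4) for a standard normal, where the true value is about 3.2e-5, gives an absolute error well within the CLT bounds but a relative error that can be enormous.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 100_000
theta = rng.standard_normal(M)
f = (theta > 4.0).astype(float)        # summands are almost all zero

estimate = f.mean()
mcmc_se = f.std(ddof=1) / np.sqrt(M)   # absolute error scale: a few 1e-5
true_value = 3.17e-5                   # Phi(-4), roughly, for reference
# mcmc_se is tiny in absolute terms, but it is on the same order as the
# true value itself, so the relative error can easily be 50% or more.
```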

Anon: Yes, definition should be

vector<lower = -1, upper = 1>[N]

Blame WordPress for eating both of our homework.

Exactly. You just sample from the MvNormal, and then calculate the average of the function F(x) = radius(x) < r ? 1 : 0 across the samples.
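A minimal sketch of that (in Python rather than Julia, with assumed constants d = 10 and r = 5):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, M = 10, 5.0, 100_000
y = rng.standard_normal((M, d))            # draws from the standard MvNormal
inside = np.linalg.norm(y, axis=1) < r     # F(y) = 1 if radius(y) < r else 0
estimate = inside.mean()                   # estimates P(radius < r), i.e. the integral
```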

In Julia:

using Distributions, LinearAlgebra

d = 10

dir = rand(MvNormal(zeros(d), I))  # draw from the d-dimensional standard normal
dir = dir / norm(dir)              # normalize to a uniformly random direction