(This post is by Yuling, not Andrew)

Rajib Mozumder, Benjamin Bostick, Brian Mailloux, Charles Harvey, Andrew, Alexander van Geen, and I have posted a new paper on arXiv, “Making the most of imprecise measurements: Changing patterns of arsenic concentrations in shallow wells of Bangladesh from laboratory and field data”. Its abstract reads:

Millions of people in Bangladesh drink well water contaminated with arsenic. Despite the severity of this health crisis, little is known about the extent to which groundwater arsenic concentrations change over time: Are concentrations generally rising, or is arsenic being flushed out of aquifers? Are spatial patterns of high and low concentrations across wells homogenizing over time, or are these spatial gradients becoming more pronounced? To address these questions, we analyze a large set of arsenic concentrations that were sampled within a 25 km² area of Bangladesh over time. We compare two blanket surveys collected in 2000–2001 and 2012–2013 from the same villages but relying on a largely different set of wells. The early set consists of 4574 accurate laboratory measurements, but the later set poses a challenge for analysis because it is composed of 8229 less accurate categorical measurements conducted in the field with a kit. We construct a Bayesian model that jointly calibrates the measurement errors, applies spatial smoothing, and describes the spatiotemporal dynamic with a diffusion-like process model. Our statistical analysis reveals that arsenic concentrations change over time and that their mean dropped from 110 to 96 μg/L over 12 years, although one quarter of individual wells are inferred to see an increase. The largest decreases occurred at the wells with locally high concentrations where the estimated Laplacian indicated that the arsenic surface was strongly concave. However, wells with initially low concentrations were unlikely to be contaminated by nearby high concentration wells over a decade. We validate the model using a posterior predictive check on an external subset of laboratory measurements from the same 271 wells in the same study area available for 2000, 2014, and 2015.

For a long time, households with elevated arsenic levels have been advised to switch to a neighbor’s safe well. Modeling well arsenic is a familiar statistical problem that has appeared frequently in Andrew’s books and our other methodology papers. Our new paper addresses a practical question: if you switch to a well that tested safe, should you worry that it may eventually be contaminated by surrounding high-arsenic wells? Clearly groundwater may mix, and such mixing or diffusion could eventually drive all well arsenic to converge to some identical level. Then why bother to switch?

To start, we worked with a small dataset containing 271 wells that were tested in 2000, 2014, and 2015. If we run a linear regression of the later measurements on the earlier ones, the regression coefficient is smaller than one. Except that this could be just measurement error or regression to the mean. Except that having regression to the mean does not exclude the possibility that groundwater might actually be mixing.
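To see why a slope below one is not by itself evidence of mixing, here is a minimal simulation sketch. Everything in it is hypothetical (the sample size, the standard deviations, the variable names); it is not the paper's data or model. The true well values do not change at all between the two rounds, yet regressing the second noisy measurement on the first gives a slope well below one.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
n_wells = 20000                  # hypothetical, chosen only to reduce simulation noise
true_sd, noise_sd = 1.0, 1.0     # hypothetical scales (think log concentration)

# True values are identical in both rounds: no mixing, no trend.
truth = [rng.gauss(0.0, true_sd) for _ in range(n_wells)]
first = [t + rng.gauss(0.0, noise_sd) for t in truth]
second = [t + rng.gauss(0.0, noise_sd) for t in truth]

slope = ols_slope(first, second)
# Classical attenuation factor: var(truth) / (var(truth) + var(noise)).
expected = true_sd ** 2 / (true_sd ** 2 + noise_sd ** 2)
print(f"estimated slope: {slope:.3f}, attenuation factor: {expected:.3f}")
```

The slope lands near the attenuation factor rather than near one, which is why the regression on the 271 wells cannot separate measurement error from real mixing without a model for the measurement process.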

For the main analysis in the paper, we work with two blanket survey datasets that tested almost all wells in this area in 2000–2001 and 2012–2013. The statistical challenge is that (a) most wells had been reinstalled between the two surveys and (b) the tests in 2012–2013 were conducted using field kits, which exhibited large bias and variance.

We fit a Bayesian model that (a) calibrates this measurement error using an ordered logistic regression, (b) spatially smooths the well arsenic surface with bivariate splines, and (c) learns the mixing dynamic.
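For readers unfamiliar with piece (a), here is a rough sketch of how an ordered logistic model turns a latent concentration into probabilities over categorical kit readings. The cutpoints, the slope, and the function names are all made up for illustration; the fitted values and the exact parameterization in the paper may differ.

```python
import math

def inv_logit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def ordered_logistic_probs(eta, cutpoints):
    """Category probabilities for linear predictor eta and increasing cutpoints.

    With K cutpoints there are K + 1 ordered categories, and
    P(category <= k) = inv_logit(cutpoints[k] - eta).
    """
    cdf = [inv_logit(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for value in cdf:
        probs.append(value - previous)
        previous = value
    return probs

# Hypothetical calibration: linear predictor = slope * log(true concentration),
# with cutpoints placed at 10, 50, and 200 ug/L on the same scale.
slope = 2.0
cutpoints = [slope * math.log(c) for c in (10.0, 50.0, 200.0)]

def kit_probs(true_conc):
    """Probability of each of the four hypothetical kit categories."""
    return ordered_logistic_probs(slope * math.log(true_conc), cutpoints)

for conc in (2.0, 100.0, 1000.0):
    print(conc, [round(p, 2) for p in kit_probs(conc)])
```

A bias in the kit would show up as shifted cutpoints and its noisiness as a small slope, so estimating both from wells with lab measurements is one way to think about "calibrating" the kit.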

For example, if we believe diffusion is the main driving force of the arsenic mixing dynamic, we would like to estimate a diffusion equation, that is, to regress the over-time change on the Laplacian, which can be extracted from the spatial model.
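As a reference point, the textbook diffusion equation and its one-step discretization are written out below. The notation is an assumption for illustration: $y(s,t)$ is the arsenic concentration at location $s = (s_1, s_2)$ and time $t$, and $D$ is a diffusion coefficient. The paper describes its process model only as "diffusion-like", so this is not necessarily its exact specification.

```latex
\frac{\partial y(s,t)}{\partial t} = D\,\nabla^2 y(s,t),
\qquad
\nabla^2 y = \frac{\partial^2 y}{\partial s_1^2} + \frac{\partial^2 y}{\partial s_2^2},
\qquad
y(s,t_2) - y(s,t_1) \approx D\,(t_2 - t_1)\,\nabla^2 y(s,t_1).
```

Under this form, a regression of the change on the Laplacian has slope $D\,(t_2 - t_1)$, and a well sitting at a local peak, where the surface is concave and the Laplacian is negative, is predicted to decrease.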

Of course here we are unlikely to have pure diffusion. We can still regress the change on the Laplacian to understand the mixing dynamic. The main finding from this paper is that the largest decreases occurred at the wells with locally high concentrations, where the estimated Laplacian indicated that the arsenic surface was strongly concave, while wells with initially low concentrations were unlikely to be contaminated by nearby high-concentration wells over a decade. This finding should be reassuring for households who rely on well-switching to reduce arsenic exposure.
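A toy version of the regression on the Laplacian can be sketched as follows. This is a synthetic example on a regular grid with a single made-up hot spot, a hypothetical rate, and added noise; the paper instead works with a spline surface over irregular well locations and latent concentrations, so treat this only as an illustration of the idea.

```python
import math
import random

def laplacian(grid, h):
    """Five-point finite-difference Laplacian on the interior cells of a grid."""
    rows, cols = len(grid), len(grid[0])
    out = {}
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[(i, j)] = (grid[i + 1][j] + grid[i - 1][j]
                           + grid[i][j + 1] + grid[i][j - 1]
                           - 4.0 * grid[i][j]) / h ** 2
    return out

rng = random.Random(1)
size, h = 21, 1.0
center = size // 2

# Hypothetical initial arsenic surface: one smooth hot spot in the middle.
surface = [[100.0 * math.exp(-((i - center) ** 2 + (j - center) ** 2) / 18.0)
            for j in range(size)] for i in range(size)]

lap = laplacian(surface, h)

# Simulated change over the period: rate * Laplacian plus noise.
true_rate = 0.2  # hypothetical value of D * (t2 - t1)
change = {cell: true_rate * value + rng.gauss(0.0, 0.5)
          for cell, value in lap.items()}

# Least-squares regression through the origin of change on the Laplacian.
numerator = sum(lap[cell] * change[cell] for cell in lap)
denominator = sum(lap[cell] ** 2 for cell in lap)
est_rate = numerator / denominator

print(f"Laplacian at the peak: {lap[(center, center)]:.2f}")
print(f"estimated rate: {est_rate:.3f} (simulated with {true_rate})")
```

At the peak the Laplacian is negative, so a positive estimated rate translates into a predicted decrease there, which matches the direction of the main finding.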

Statistically, we do not invent anything new. But it does take some non-trivial effort to combine all the pieces: we know how to run spatial smoothing, but now we also have biased measurements; we routinely fit differential equations in Stan, but now we are dealing with a PDE on latent variables; the sample size is not very big, but we have an overparameterized model (16950 free parameters), so some sparse matrices are required; and so on. I am happy that some detailed statistical analysis can help solve real science problems.

This is great stuff, Yuling. Thanks for posting!

This relates to three of my favorite things about this blog:

1. Causal inference, water sources and shoeleather (¡my first comment!): https://gelmanstatdev.wpengine.com/2013/03/14/everyones-trading-bias-for-variance-at-some-point-its-just-done-at-different-places-in-the-analyses/#comment-143862

2. Population health and drinking water interventions: https://gelmanstatdev.wpengine.com/2016/01/07/28459/#comment-258052

3. Snarky commenting opportunities of all types: https://gelmanstatdev.wpengine.com/2017/07/24/recently-sister-blog-4/#comment-529779

Now as a researcher working on health issues in developing countries and having worked in Bangladesh and having thought about this arsenic problem before…. oh darn, dinner is here gotta go.

So how expensive is it to test for arsenic? Why not just give a protocol to test every year?

Or is there a need for a cheaper field test?

Rather than “why bother switching?”, I suspect the more important question may be “why allow your neighbor to switch to your well if there is a risk of drawing contaminated water in?”

The good news is that this offers some evidence that there is low risk to helping your neighbor.