## Weights in statistics

Thomas Lumley writes:

There are roughly three and a half distinct uses of the term weights in statistical methodology, and it’s a problem for software documentation and software development. Here, I [Lumley] want to distinguish the different uses and clarify when the differences are a problem. I also want to talk about the settings where we know how to use these sorts of weights, and the ones where we don’t. . . .

I agree with Lumley!

Weighting causes no end of confusion both in applied and theoretical statistics. People just assume because something has one name (“weights”), it is one thing. So then we get questions like, “How do you do weighted regression in Stan,” and we have to reply, “What is it that you actually want to do?”

And then there’s this whole thing where people do poststratification weighting and think they’re doing inverse probability weighting; see Section 3.3 of this article with John Carlin to see why these two sorts of weights are different.

Here’s what we wrote about weighting in Section 10.8 of Regression and Other Stories:

Three models leading to weighted regression

Weighted least squares can be derived from three different models:

1. Using observed data to represent a larger population. This is the most common way that regression weights are used in practice. A weighted regression is fit to sample data in order to estimate the (unweighted) linear model that would be obtained if it could be fit to the entire population. For example, suppose our data come from a survey that oversamples older white women, and we are interested in estimating the population regression. Then we would assign to each survey respondent a weight that is proportional to the number of people of that type in the population represented by that person in the sample. In this example, men, younger people, and members of ethnic minorities would have higher weights. Including these weights in the regression is a way to approximately minimize the sum of squared errors with respect to the population rather than the sample.

2. Duplicate observations. More directly, suppose each data point can represent one or more actual observations, so that i represents a collection of w_i data points, all of which happen to have x_i as their vector of predictors, and where y_i is the average of the corresponding w_i outcome variables. Then weighted regression on the compressed dataset, (x, y, w), is equivalent to unweighted regression on the original data.

3. Unequal variances. From a completely different direction, weighted least squares is the maximum likelihood estimate for the regression model with independent normally distributed errors with unequal variances, where sd(ε_i) is proportional to 1/√w_i. That is, measurements with higher variance get lower weight when fitting the model. As discussed further in Section 11.1, unequal variances are not typically a major issue for the goal of estimating regression coefficients, but they become more important when making predictions about individual cases.
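To make the second model concrete, here is a minimal sketch in plain Python (not the R or Stan code used in the book) that checks the claimed equivalence for a simple one-predictor regression. The data are made up, and the helper names `weighted_ls` and `compress` are hypothetical, introduced only for this illustration. It compares only the point estimates, not the standard errors.

```python
import math
from collections import defaultdict


def weighted_ls(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def compress(x, y):
    """Collapse rows sharing the same x into (x_i, mean of y, count w_i)."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    xs = sorted(groups)
    ys = [sum(groups[k]) / len(groups[k]) for k in xs]
    ws = [len(groups[k]) for k in xs]
    return xs, ys, ws


# Made-up "original" data with repeated predictor values.
x_full = [1.0, 1.0, 2.0, 3.0, 3.0, 3.0]
y_full = [2.0, 4.0, 3.0, 5.0, 6.0, 7.0]

# Unweighted fit on the original data (all weights equal to 1).
fit_full = weighted_ls(x_full, y_full, [1.0] * len(x_full))

# Weighted fit on the compressed data (x, y, w).
xc, yc, wc = compress(x_full, y_full)
fit_compressed = weighted_ls(xc, yc, wc)

print("original:  ", fit_full)
print("compressed:", fit_compressed)
```

The two fits agree up to floating-point error because each group of duplicates contributes w_i times its mean to the weighted sums, which is exactly what the individual rows contribute in the unweighted fit.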

These three models all result in the same point estimate but imply different standard errors and different predictive distributions. For the most usual scenario in which the weights are used to adjust for differences between sample and population, once the weights have been constructed and made available to us, we first renormalize the vector of weights to have mean 1 (in R, we set w <- w/mean(w)), and then we can include them as an argument in the regression (for example, stan_glm(y ~ x, data=data, weights=w)).
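The renormalization step can be sketched the same way. The following plain-Python example (again, not the `stan_glm` call itself) rescales a hypothetical vector of survey weights to have mean 1 and checks that the weighted least squares point estimate is unchanged. The data, the raw weights, and the helper names `weighted_ls` and `normalize_to_mean_one` are all assumptions made for illustration.

```python
import math


def weighted_ls(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def normalize_to_mean_one(w):
    """Python analogue of the R expression w <- w/mean(w)."""
    m = sum(w) / len(w)
    return [wi / m for wi in w]


# Made-up data and made-up raw survey weights (people represented per respondent).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1]
w_raw = [1200.0, 800.0, 2500.0, 400.0, 1500.0]

w = normalize_to_mean_one(w_raw)

fit_raw = weighted_ls(x, y, w_raw)
fit_norm = weighted_ls(x, y, w)

print("mean of normalized weights:", sum(w) / len(w))
print("raw weights:       ", fit_raw)
print("normalized weights:", fit_norm)
```

Rescaling all the weights by a constant cancels in the least squares point estimate, so the normalization matters for how the software interprets the weights when computing uncertainties, not for the fitted coefficients.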

1. Roger H says:

Stata was well ahead of the curve in supporting different types of weights – it calls your three cases above pweights, fweights and aweights respectively (there’s also a last, rarely used, command-specific ‘catch-all category’, iweights) https://www.stata.com/help.cgi?weight. It’s had this feature ever since I first used it, which was version 7, released in 2000, and probably longer.

2. John Hall says:

In time-series models, you can use weights to give the more recent period greater weight in the estimates than earlier periods, rather than directly modelling the fact that the estimates have changed over time.

3. Daniel H. says:

Suppose I have two sources of data, one high quality and one noisy? I could imagine… A) use weights to favor the good data, B) try something like a multilevel model or C) use the noisy data to derive priors for the good data analysis. This is a theoretical question, I don’t have an example here.

• Andrew says:

Daniel:

Such an example is here, where we had a small number of high-quality measurements and a large number of noisy, biased measurements. Weighting is not the way to go here. We used a hierarchical model.

4. Keith O'Rourke says:

In addition to Andrew’s example, the challenges with weights are discussed here – On the bias produced by quality scores in meta‐analysis, and a hierarchical view of proposed solutions https://academic.oup.com/biostatistics/article/2/4/463/321492

5. randall says:

…how should Weighting be properly applied to typical Public Opinion polls that we see in the media almost daily?

These polls routinely seem to have a ‘large number of noisy, biased measurements’ that are then ‘adjusted’ by the polling agencies.

• Andrew says:

Randall:

Yes, pollsters adjust for known differences between sample and population; see here, for example.