So, once early voting is underway, should polls be weighted by the proportion of voters who’ve already cast their vote?

]]>Bingo, frankly Andrew is way too smart to not realize this…

The purpose of polling is not to measure public opinion, rather to shape opinion…

If you care about prediction, focus on running your own objective polling, not playing with fun models.

]]>@Justin This isn’t the relevant point. No one has voted yet. But lots of people will have voted by the end of September, roughly halfway to the official election day. So, there is less time for them to change their minds than Nate Silver seems to imply when he talks about his model.

]]>Politifact discussion of

1. Biden lies

There are some real hooters in there, highly recommended.

Not like anyone needs it, but for balance’ sake:

2. Trump lies

]]>If an election were to be held tomorrow and “prediction” meant simply no more or less than an accurate ascertainment in advance of the counts in two categories “A” and “B” (e.g. “pro” or “con”) then if it were possible to [1] survey every individual voter today; and [2] we knew on the basis of prior experience or – what ought to amount to the same – some iron psychological law that the voters in this district always respond truthfully and moreover do not change their minds; then our complete enumeration today would constitute a prediction of the outcome tomorrow, and one which we would be justified in regarding as certain. In this situation – of complete enumeration of truthful voters – we would be justified in attaching probability of “1” to our prediction; but doing so would amount to a superfluous embellishment given the peculiar ground on which our survey was done. Let us say that the total number of votes is N, the sum of A and B; and that A is to win so that A is greater than B. We make our “prediction” that A wins with (A of N) of the votes and to this prediction we attach a probability statement: our survey procedure (with all its attendant assumptions) tells us we may regard this prediction as certain – we may as well say “Our prediction comes with an error probability of 0”. We may as well say “The probability that A wins with (A of N) is 1”. But it must be understood that the probability of which we speak has its ground in the nature of our sampling procedure; and our strong assumptions about the behavior of the voters.

Let us take a small but significant step in the direction of realism. Suppose that the voters are still known to be truthful when they respond on Monday; and that they do not change their minds before they vote on Tuesday (both these on the basis of long prior experience with voters in this district or – what amounts to the same – from understanding of some ‘iron’ psychological law which holds good for the folk there). We also suppose that all who respond on Monday are able to vote on Tuesday – that no one fails to cast a vote, by reason of illness or indisposition; and that all votes are indeed counted; and that moreover the count itself is not subject to error. These are many caveats and they are preposterously unrealistic. But holding that aside, we bring in the essential degree of freedom without which a realistic survey cannot be described. That is sampling variation. We do not survey all the voters, rather we survey n smaller than N. And this survey sample is partitioned into two groups, n being the sum of a and b. Under the same conditions as above, the only plausible prediction is that A will win a fraction (a of n) of the total votes. If (a of n) is greater than (b of n) we are compelled to “predict” that A will be the winner.

But having introduced the smaller sample n we cannot assert with complete confidence as before that if a is greater than b then it will be the case tomorrow (all our assumptions holding good: the truthfulness, veracity, ability and longevity of voters; and the reliability of the vote-counting) that A will be greater than B. For there is the possibility of reversal. In this idealized setting we can quantify the probability of reversal by counting the number of ways we may query n individuals falling into two categories: a A’s and b B’s from a total population of N. The number of such selections is (A,a)*(N-A,n-a) [I have written the combinatorial coefficient sideways because of typographical limitations]. And the selections which represent a reversal are simply those for which a is less than n – a: i.e. for which a is less than half-n. If we regard our survey as one among many parallel surveys of exactly the same design on that same Monday afternoon, on any reasonable interpretation of the word “probability”, we may quantify the degree to which such a reversal occurs by the sum of terms for all a less than half-n of the hypergeometric probability (A,a)*(N-A,n-a) divided by (N,n). The upper limit of this sum depends on the order in which the four numbers n, half-n, A, N stand in relation to one another; and a careful attention to this reveals that the problem breaks down into a handful of cases. Put another way, the propensity for our survey of size n to not suffer from reversals depends to some extent on how the sample size n stands in relation to the (unknown) outcome count A.

For definiteness call this probability of reversal R = p( A less than B given: a greater than b; a,n,A,N).

Note that the parametric dependencies (a,n,A,N) after the “;” cannot be dispensed with.
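The counting argument above can be sketched in a few lines of Python. This is only an illustration of the sampling-side quantity, under the idealized assumptions already stated (truthful voters, no changes of mind, perfect counting): for a hypothesized true count A, it sums the hypergeometric terms over all samples in which a is less than half-n. The function names are my own, and ties (a exactly equal to n – a) are left out of the sum, which is an assumption the original comment does not address.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(a, n, A, N):
    """Probability that a sample of n voters, drawn without replacement from
    N voters of whom A back candidate "A", contains exactly a A-voters."""
    # Outside this range no such sample exists, so the probability is zero.
    if a < 0 or a > A or n - a < 0 or n - a > N - A:
        return Fraction(0)
    return Fraction(comb(A, a) * comb(N - A, n - a), comb(N, n))


def minority_sample_probability(n, A, N):
    """Probability that the sample shows a < n - a for a hypothesized true
    count A. Ties (a == n - a) are not counted (an assumption)."""
    return sum(
        (hypergeom_pmf(a, n, A, N) for a in range(n + 1) if 2 * a < n),
        Fraction(0),
    )


# Tiny example: 10 voters, 6 of them for "A", sample of 3.
print(minority_sample_probability(3, 6, 10))  # 1/3
```

With complete enumeration (n equal to N) and A greater than B the sum is empty and the probability is 0, which matches the first scenario in the comment.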

Now say on Monday evening we predict “A” will achieve a fraction (a of n) of the votes and if that fraction is greater than (n-a) of n we predict that “A” will win. We also attach to our prediction a quantification of the uncertainty which goes along with it. This is the reversal probability “R”.

We may give our prediction as follows: “The prediction of A’s share is (a/n) and the probability that A wins is the non-reversal probability 1 – R.” When one says “The probability that A wins is 1-R” it must mean this.

In this simple idealized setting, the factors bringing uncertainty into our prediction are not the minds of the voters (they are rigid and truthful); not their propensity to vote on Tuesday (all of them shall); not the propensity of the vote-tabulation to be less than accurate (it is perfect). The only factor in this model in which uncertainty is grounded, and which therefore propagates uncertainty into our prediction, is the uncertain representativeness of our sample – here due entirely to the choice of a single sample of size n out of the (N,n) possible samples of size n drawn from the population of size N.

The prediction here is thus a point-estimate; and the probability is a “margin of error” due entirely to sampling variability. We can imagine the whole thing being repeated in the sense of a large group of similar samples of size n taken on the same Monday with the same method; which differ only insofar as different individual voters were queried.

Now see how complications of this sort multiply as soon as we introduce more realism into the description – i.e. degrees of freedom in the model: [1] subjects may not answer truthfully; [2] subjects may change their minds; [3] not all subjects who were queried will actually vote; [4] not all votes are counted; [5] not all votes that are counted are counted accurately; [6] time passes between the event of the survey and the election; [7] a time-series of such surveys may indicate a trend in the responses; etc. But the qualitative nature of the prediction will be the same: I come up with a point estimate a of the number who will vote for “A” and I quantify my certainty in that prediction with a non-reversal probability i.e. my confidence that given (a greater than b) I will find (A greater than B) after the election.

]]>jim –

> But then even some NPR commentators acknowledged that there’s a legit reason: first class mail volume has been and still is falling through the floor.

I’m not sure that stacks up as a reason for the policy changes to be implemented at this particular point in time. Even if, overall, it increases efficiency it still might have a particularly negative impact along a particular metric – people’s ability to vote.

Do you conclude that this isn’t happening because of a political motivation for voter suppression?

]]>Keith –

> I’m more interested in the trend for polling to underestimate Trump support.

For the most part, the error in the polling was within the margin of error. My understanding (and I would appreciate being corrected if I’m wrong) is that the larger error in predictions for the overall outcome came from underestimating the chances for an overall alignment of direction of error – and in particular among a key set of swing states.

My guess is that you think that there is a “shy Trump voter” effect? If so, do you have an evidence basis for believing that to be true? I’ve seen some arguments that there isn’t such an effect.

-snip-

Were there any possible factors for which you didn’t find evidence?

Yes. Take the hypothesis that there’s a segment of the Trump support base that does not participate in polls. If that’s true, that’s a huge problem for organizations like ours, and we need to study that and understand it if we’re ever going to fix it. But we looked for evidence of that, and we didn’t find it.

If it’s true that we’re missing a segment of the Trump support base, we would expect to find – without doing any fancy weighting, just looking at the raw data – that people in more rural, deep-red parts of the country would be underrepresented. And we didn’t find that; if anything, they were slightly overrepresented. We did a number of things with a critical eye looking for those types of problems, and did not find them. And so that gave me real reassurance that fundamentally, it’s not that the process of doing polls was broken last year.

-snip-

https://www.pewresearch.org/fact-tank/2017/05/04/qa-political-polls-and-the-2016-election/

And

–snip–

The bottom line is that Trump did better than the polls predicted, but he didn’t do so in a pattern consistent with a “shy Trump” effect. It’s more likely that polls underestimated Trump for more conventional reasons, such as underestimating the size of the Republican base or failing to capture how that base coalesced at the end of the campaign.

–snip–

https://fivethirtyeight.com/features/shy-voters-probably-arent-why-the-polls-missed-trump/

Do you have reasons to doubt these analyses? If so, what are they?

> I think that was a factor last election, and I think it might be a much bigger factor this time.

“I think it might be” is a pretty broad statement. Do you think it will be? If so, on what evidence do you base that opinion?

]]>Pasquale –

> Too much at stake for those seeking to protect or attain the power and riches that come along with victory to expect fairness and honesty from them.

Could you elaborate on what you think is a manifestation of dishonesty?

]]>Pasquale:

You could be right. I work on many projects, and I accept that some of them will be a waste of time. That’s the nature of research.

]]>Ryan:

The fundamentals model gives a multivariate normal prior (with a wide variance) for the vote in the 50 states on election day. The polls are informative about current opinion (we also allow for nonsampling error in various ways). We have a time series model for trends in public opinion during the campaign. Regarding the mean-reversion thing, see this discussion.

]]>[state_mean_it = state_mean_it-1 + phi*(state_fixed_model_i – state_mean_it-1) + u_it ] and [correlated u_it across i based on state-level factors] and [model for variance of u_it] and [national_mean = weighted sum of state means] captures some of the dynamics he’s talking about where 1) based on partisanship and other “fundamental factors” there is an attractor that you expect the polling to hover around and 2) polls may not have a unit root on any scale, i.e. many changes in polling are ephemeral and over time people forget and revert to their partisan tendencies. I guess this is a restatement of the prior debate where the big question was “do polls mean-revert” except with a specific model for the mean they’re reverting to. There is (I think) justification for pure mean-reversion since politicians can change their message depending on the current polling.
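For anyone who wants to see the first bracketed equation in motion, here is a minimal Python sketch for a single state. It covers only the mean-reverting update itself; the correlation of u_it across states, the variance model, and the national weighted sum are left out. The function name, the parameter values, and the choice of independent Gaussian shocks are all assumptions made for illustration, not anything taken from an actual forecasting model.

```python
import random


def simulate_state_mean(start, fixed_model, phi, n_days, shock_sd=0.0, seed=0):
    """Iterate state_mean_t = state_mean_{t-1}
    + phi * (fixed_model - state_mean_{t-1}) + u_t for one state.

    u_t is drawn as independent Gaussian noise with standard deviation
    shock_sd (an illustrative assumption); shock_sd=0 gives the pure pull
    toward the attractor.
    """
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_days):
        prev = path[-1]
        shock = rng.gauss(0.0, shock_sd) if shock_sd > 0 else 0.0
        path.append(prev + phi * (fixed_model - prev) + shock)
    return path


# Without shocks, the gap to the attractor shrinks by a factor (1 - phi) per step.
print(simulate_state_mean(start=0.56, fixed_model=0.52, phi=0.5, n_days=3))
```

Setting phi to 0 turns this into a random walk (the unit-root case), while phi near 1 makes almost every movement in the polls ephemeral.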

I am also very skeptical of extending fundamentals models much further into the past. The measures of interest are only vaguely available, so of course they perform less well.

]]>Jeff,

Obv, Cool as a fox :)

Keith:

In 2016, polls underestimated Trump support in some states, not others. I think the underestimation came from nonrepresentativeness of the sample; see discussion in this article.

]]>Polling is Broken, no longer just biased and unreliable.

Until that’s fixed (impossible), Nate Silver, Sam Wang, Princeton bla bla bla are as useful as the National Enquirer when trying to predict the presidential election.

Too much at stake for those seeking to protect or attain the power and riches that come along with victory to expect fairness and honesty from them.

]]>What would be the cause of the underestimate? Are people afraid to say they support Trump or will vote Republican? Are they wavering or undecided? If they’re switching at the last minute, why are they doing that?

]]>Actually at first I was *extremely* fried about that. But then even some NPR commentators acknowledged that there’s a legit reason: first class mail volume has been and still is falling through the floor.

]]>Certainly. I originally encountered it in Isaiah Berlin’s book. Only meant that Tetlock popularized it in the TED/ Long-Now talk circuit in the 2000s.

]]>Chetan:

The fox and hedgehog thing long preceded Tetlock! Also see here.

]]>And the instance in which the fox is wearing headphones wants to draw attention to the podcast. And glasses = smart, obvs.

]]>I’d wager the newer fox logo is inspired by the Hedgehog (top-down big theory) vs. Fox (eclectic generalist) dichotomy popularized by Philip Tetlock.

Question is: is that a stated preference or revealed preference?

]]>something which we should have a right to talk about if universes were as plenty as blackberries, if we could put a quantity of them in a bag, shake them well up, draw out a sample, and examine them to see what proportion of them had one arrangement and what proportion another. But, even in that case, a higher universe would contain us, in regard to whose arrangements the conception of probability could have no applicability.” Peirce.

Now substitute “outcome of an election” for “arrangement of Nature” and “elections” for “universes.”

]]>Anon:

The “delusional” thing is just silly. Elliott knows what he’s doing! Regarding our strategy vs. Nate’s: there are a bunch of differences. The clearest difference is how we’re handling the polls: we’re including state and national polls, allowing nonsampling error and time trends using a hierarchical Bayesian model (you can see details in our Stan code). I’m not quite sure what Nate’s doing but I think it’s some sort of weighted average, and that sort of thing becomes tricky when you’re trying to juggle many sources of uncertainty and correlation. Regarding the prior or fundamentals-based model: there’s no agreed-upon way to do this. I respect that Nate’s doing what he thinks is best, but ultimately N is small and you have to make judgment calls and state your assumptions clearly.

As to cross validation, that’s not such a big deal one way or another. No matter how you slice it, you’re gonna be trying out different models, fitting them to past data, and looking to see what they imply for the current election. Cross-validation is just one way of assessing that fit: it’s neither perfect nor horrible. Cross validation is just one more tool. I don’t agree with Nate that there’s useful signal going back to 1880, but I do think there’s signal in past elections, of course. Indeed, I feel that the people who would deny the value of past elections are overfitting to a single past election, 2016.

]]>I wasn’t talking about 538’s graph. I was talking about the graph on this blog post, https://gelmanstatdev.wpengine.com/wp-content/uploads/2020/08/Screen-Shot-2020-08-13-at-7.59.45-PM.png, introduced by “We give Biden a 78% chance of winning the state, and here’s our forecast of the two candidates’ vote shares in Florida:”.

]]>Thanks for the reply, Andrew.

Could you say more why you think your strategy makes sense vs. Nate’s? Why is cross-validation on a small sample a good idea here and why is Nate Cohn incorrect when he says you’re delusional? I don’t think the Nates are correct here in the sense that we’re talking about the differences in approach vs. the side effects of some kind of psychological defect. But I am wondering if there is a way to further unpack this to explain the modeling reasoning.

Fwiw, I think CV/oos sampling is reasonable because you’re taking an approach that says there is signal in past elections and in the covariates of those elections in predicting the outcome that are relevant to the current election. That seems.. reasonable. The idea that, according to Nate Cohn, if you are interested in weather forecasting, this becomes a foolish predicate, just doesn’t add up for me. The fact he doesn’t unpack his argument much further tells me his reasoning isn’t all that deep, beyond just saying the CV sample is small. But Nate C. has a lot of very smart ideas about polling so I’m wondering if there’s more to say here that I’m missing.

]]>I think clicking between the tabs ‘Today’ and ‘4 years’ makes them disappear and return

]]>Jeff:

That’s weird. When I copy the link from Alex’s comment and put it in my browser, I see a time series of approval ratings with no dots. When I click on the link from your comment, I see the time series with dots included. And they’re the same url! I have no idea what is going on here.

]]>I think Alex was saying that they already have the dots for the approval ratings:

]]>This is just crystal-balling, but I think both of those probabilities are well below 1%, so probably not enough to influence things.

I’m more interested in the trend for polling to underestimate Trump support. I think that was a factor last election, and I think it might be a much bigger factor this time. I wouldn’t be surprised if there was a two point shift in results from that alone. I don’t know how you could sensibly forecast that, though.

]]>I heard some questions at Stan-con last night about using different methods in different application areas. Comparing methods is like comparing apples and bicycles. Would I eat a bicycle? I could try, but it wouldn’t work very well. Would I use an apple as transportation? I wouldn’t get very far. But they’re both extremely useful.

Similarly, if you think that a bicycle is useless, it probably means you don’t know how to ride it. If I was a whale from mars, and I came to earth not knowing what a bicycle was because I don’t have legs, then a bike would be pretty useless to me.

I think it’s important to think about application area more than the method itself. Certain methods have been used successfully in certain application areas for years.

]]>>> Is any statistical model for a future election flexible enough to include the possibilities of an outbreak of war, assassination(s) or candidate heart attacks?

Wasn’t it said in an earlier post that the model assumes Trump and Biden are in fact the candidates?

]]>And actually, the “outcome might be in doubt” in at least one other way: what if either candidate catches COVID? If one candidate is in the hospital on Election Day (or during early voting), that could swing the vote, potentially.

]]>1) is basically zero. The US system is actually really hard to alter to that degree. Also, if there’s no election, IIRC Pelosi automatically becomes President on Inauguration Day.

There is no *way* Trump has enough support to get around this. I don’t think any US president in the remotely modern era could, but if there were one, it certainly wouldn’t be him!

>>If we have a normal election it’s hard to imagine that the outcome is in doubt.

All else being equal, yes, but all else might not be equal.

The one big caveat I see is if other nations start vaccinating their population by October, and then we do, and *Trump gets credit for cutting through bureaucratic red tape to get it approved*.

Just a vaccine approval wouldn’t be enough, though, IMO.

]]>Andrew and Natalie,

Yep, we excluded Emerson because they use MTurk, and HarrisX because of both shady data transparency and writing clearly biased questionnaires (maybe we can chalk this up to a Penn factor). The *and* is important because it shows they’re not just getting a bad sample but that they don’t really care.

Now that I’m more awake, I also looked at our rules (nb Nate S, not “subjective” criteria) and we also exclude numbers from John Zogby and American Research Group for similar transparency reasons. We keep Gravis because, though IVR is crappy, at least they pick up the phone and tell us what is going on.

I really don’t know what to do about Rasmussen. They list most of the info on their site, but the underlying demos are clearly all wacky. I’m not convinced it’s enough to say “this poll looks weird,” though; I feel the need to base the exclusion on other factors too. Weird data happens all the time!

On the other hand, if your poll is routinely biased toward Trump by 5-10 points, *and* it jumps around wildly by nearly as much, that suggests both that it has more error than you can remove by just adding a house effect and that it will probably add less value to the estimate because of increased measurement error.

Phil:

Yes, you’re right. I actually would not say that Nate is underfitting; rather, I might say that he’s overfitting to 2016. Or maybe that corresponds to underfitting to earlier elections. To the extent that he’s basing his decisions on data going back to 1880 (and I guess he is; otherwise why would he mention it?), I think he’s overfitting to those past data which I don’t think are relevant to what’s going on now.

]]>I mostly agree with this, except for the part about “Overfitting” being just a word. I mean, of course there is a literal sense in which it is just a word. But it is a word that is intended to represent a concept. The word “overfitting” is just a word, but overfitting is a real phenomenon.

Overfitting is bad.

Unfortunately, underfitting is also bad.

Nate can say you’re overfitting, and you can say he’s underfitting. Or maybe you are both overfitting or both underfitting. More likely, you are both overfitting in some ways and underfitting in others. Indeed, I would say that’s about the best you can hope for! It’s not like you can expect to fit exactly the right model in exactly the right way. I have often pointed out to my colleagues, and sometimes to my clients, that it is unrealistic to have the goal that your estimate is exactly right; what I am aiming for is not knowing which way it’s likely to be wrong.

Anyway, I’m not saying it’s worth arguing about this, I’m just saying that if you’re arguing about whether you’re overfitting, you aren’t arguing about a word but about a concept.

]]>My understanding is equally simplistic, I guess. If the uncertainty in the current state leads to +/-x, the uncertainty in the future state should (at least usually) be larger. Events between now and November could move things either direction. Imagine some kind of uncertainty multiplier for each electoral district, with the multiplier decaying towards unity as Election Day approaches…I don’t know how one would estimate what it should be right now, but it wouldn’t be 1.00000.
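One purely hypothetical way to make the "multiplier decaying towards unity" idea concrete: assume opinion drifts as a random walk between now and Election Day, so that daily drift variance simply adds to today's polling variance. The random-walk assumption, the function name, and the numbers below are all illustrative choices of mine; how to estimate the daily drift is exactly the open question raised in the comment.

```python
from math import sqrt


def uncertainty_multiplier(days_left, sd_now, daily_drift_sd):
    """Ratio of election-day SD to today's SD, assuming (hypothetically) that
    opinion follows a random walk with independent daily shocks of
    standard deviation daily_drift_sd."""
    sd_election_day = sqrt(sd_now ** 2 + days_left * daily_drift_sd ** 2)
    return sd_election_day / sd_now


# Illustrative numbers only: the multiplier shrinks to 1 as Election Day nears.
for days in (80, 40, 16, 0):
    print(days, round(uncertainty_multiplier(days, sd_now=0.03, daily_drift_sd=0.01), 3))
```

Under these assumptions the multiplier is exactly 1 only on Election Day itself and is larger than 1 at any earlier date, which is the qualitative behavior described above.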

]]>Anon:

Nate Cohn wrote to Elliott Morris: “I think you’re overfitting models on a small dataset and using regularization and out-of-sample cross-validation to delude yourself into thinking that’s not what your doing.” I don’t think Elliott, Merlin, and I are deluding ourselves. We know that we’re fitting models on a small dataset and using regularization and out-of-sample cross-validation! Are we “overfitting”? That I don’t know. “Overfitting” is just a word. We’re doing our best, that’s all.

Nate Silver wrote, “Given the nature of election data, you don’t want do to too much optimization.” I agree. You should never do too much of anything.

One thing that I’ve been thinking about is that when people talk about being conservative and not overfitting, in practice this can mean that they’re relying really heavily on the experience of 2016. Then what they’re doing is overfitting to 1 data point.

Finally, sure, I think that using an “uncertainty index” can be a good idea. No model can contain everything.

]]>Yeah, I get that.

But I throw it out there because we’re talking about the 2020 election and it’s a serious concern. We need serious people to focus on it.

]]>That’s a prediction you can make, but to be clear, it’s very divergent from 538 (and I’d assume The Economist’s) forecast.

Nate specifically wrote a long paragraph saying how this model is conditioning on the fact that democracy generally proceeds like it has in the past, i.e. no unprecedented interference. Gelman never made that statement explicit, but nowhere in his model does he try and account for that.

Wondering what these probabilities are is fair, but there’s little point in Gelman or Silver chiming in, none of their skills would be relevant at that estimation. But it’s important to note that “hard to imagine the election is in doubt” goes right against the 538 forecast. You might just disagree with Nate, but he thinks the election is very much in doubt, even in reasonable circumstances.

]]>Nate Cohn called G Elliott (and by association, you) deluded because you use oos cross-validation to test the model’s viability.

See the thread here https://twitter.com/Nate_Cohn/status/1293602797196369922

and https://twitter.com/NateSilver538/status/1293610859126755330

And Nate’s longer reply here https://thecrosstab.substack.com/p/what-makes-a-model-good-august-9

And

Last, what do you think about creating an “uncertainty index” as a bag of indicators?

]]>jim –

> 2) serious tampering happens (direct or backdoor means, such as messing w/ USPS)?

Not clear where it’s going longer-term or even what the effect might be (i.e., a backfire effect if people get their SS checks late?), but that happening is looking less and less like an uncertainty:

]]>Further, the 538 interface makes this hard to see, but it seems the 538 prior is moving things even more strongly towards 50%. Their current national polling average is +8.5 Biden which would point to a two-party vote of 54.25%-45.75%, but the projection is 53-46. Some rounding is obviously happening there, but regardless 538 seem to be expecting more than a 1% shift in the vote share vs. less than half a point for the Economist.
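The arithmetic behind the 54.25%-45.75% figure is just a conversion from a margin to two-party shares. A small Python sketch, with a function name of my own choosing, and assuming the +8.5 margin is treated as a margin in the two-party vote (third-party and undecided voters ignored):

```python
def two_party_shares(margin_points):
    """Convert a lead in percentage points into (leader, trailer) two-party
    shares, assuming the two shares sum to 100."""
    leader = 50.0 + margin_points / 2.0
    return leader, 100.0 - leader


leader, trailer = two_party_shares(8.5)
print(leader, trailer)           # 54.25 45.75
print(round(leader - 53.0, 2))   # 1.25 point gap to a 53% projection
```

The 1.25 point gap between the polling-implied share and the 53% projection is the "more than a 1% shift" mentioned above, subject to the rounding caveat.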

]]>Andrew –

> which pretty much corresponds to what we think would happen if the election were held today.

Excuse my simplistic understanding, but…

So then is it true there is a very big difference, in that for your forecast there isn’t much change between “if it were held today” and your overall estimate, as compared to a big change in the 538 forecast? If so, wouldn’t that be a really big difference and the easiest way to see that they build in a lot more uncertainty?

]]>That’s like modelling card probabilities when you’re playing against Bugs Bunny.

]]>Joshua:

The projections of the model today correspond to our estimates of today’s public opinion, which pretty much corresponds to what we think would happen if the election were held today (although this is kind of a weird way to think about it, given that people know the election will not be held today).

]]>It’s partially pooling toward the fundamentals-based prediction.

]]>Andrew –

Related…your projections for popular vote %’s change from Biden at 54.5% today to 54.1% on election day. The drop is obviously very small, relatively, but I’m curious if you have a general sense for why it changes? I’m assuming it’s not just random variation?

]]>Yea, I’d love to see the analysis he did to determine how to weight that because I do think the 538/Economist uncertainty difference is a little larger than Andrew’s illustration implies.

I think that Andrew’s examples above undersell the uncertainty difference a bit. The Trump EC advantage is looking closer to 2 points than the 3 he used in his final example. So to fit the two models’ probabilities you would need:

Econ: pnorm(0.54,0.51,0.025)=89%

538: pnorm(0.53,0.51,0.035)=72%

That would imply an increase of 40% in the assumed SD vs the 25% needed in Andrew’s example.
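For readers without R handy, the two pnorm calls can be checked in Python with the standard library's NormalDist. This is a sketch of the same back-of-envelope calculation, nothing more: the 0.51 threshold and the two SDs are the commenter's illustrative inputs, and the helper name is mine.

```python
from statistics import NormalDist


def win_probability(expected_share, threshold, sd):
    """Python analogue of R's pnorm(expected_share, threshold, sd): the normal
    CDF with mean `threshold` and standard deviation `sd`, evaluated at
    `expected_share`."""
    return NormalDist(mu=threshold, sigma=sd).cdf(expected_share)


economist = win_probability(0.54, 0.51, 0.025)
five_thirty_eight = win_probability(0.53, 0.51, 0.035)
sd_ratio = 0.035 / 0.025  # how much wider the second SD is

print(f"Econ: {economist:.3f}  538: {five_thirty_eight:.3f}  SD ratio: {sd_ratio:.2f}")
```

The results come out close to the quoted 89% and 72% (a touch under in both cases), and the SD ratio of 1.4 is the 40% increase referred to above.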

That said, I take Andrew’s point that it doesn’t take much of a parameter difference to explain the probability difference. I could totally believe that both the 2.5% and 3.5% uncertainties are within the uncertainty in the uncertainty (or 2% vs 2.5% in Andrew’s examples).

]]>…Nate says…

]]>