Rui:

I think that Nate wrote that they are using fat tails. The bigger peak and the fat tails go together, as you can see if you plot the t density. It’s clear they have fat tails in the Fivethirtyeight simulations, as that’s the only way to give Trump a chance of winning California, for example. I’m pretty sure the added errors are independent or something like it because that’s how you can have their conditional prediction that, even if Trump wins California, his chance of winning the national election is only about 50%.

Multivariate distributions are tricky, and it’s no surprise that if you slap one down and apply it to data, it can have some undesired artifacts. I think the mistake people are making is to assume by default that the Fivethirtyeight forecast (or any model) is “correct,” in some sense. Their procedure is just something they threw together. I respect the judgment of the Fivethirtyeight team; it’s just that the forecast distribution is complicated, and that will lead to trouble on things that they didn’t look at carefully when they were setting up their procedure.

Fogpine:

Thanks. It does sound like this represents a problem with the model: not one that would have much effect on the headline predictions, but still an issue. Elliott is looking into it.

Elliott Morris, Merlin Heidemanns, and Andrew Gelman,

I re-checked my points above and couldn’t find mistakes. I think the points are relevant to your model, so I hope you consider them. If scenarios like Trump winning but losing Ohio are relevant to interrogating your model, these must be too.

I put together a rare-event estimate for state results that helps explain my points further, especially in reaction to the insightful commentary on “the difficulty of model calibration” and “using anomalous events” in the Gelman, Hullman, Wlezien, and Morris article linked in the post above.

Here’s the rare event calculation:

Three state election results appear to be outside the Economist model’s 99.9% prediction intervals in backtesting for 2016. How surprising is this? If state results were independent, then in only 1 of about 49,778 presidential elections would we see 3 or more state results that are outside 99.9% intervals from a fully calibrated model.* So observing 3 states outside the 99.9% interval would strongly suggest model miscalibration and need for revisions.
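For anyone who wants to reproduce the footnoted number, here is a minimal sketch of the binomial tail calculation (my reconstruction, assuming independence across states; not the commenter’s code):

```python
# Minimal sketch: the chance that 3 or more of 51 independent state
# results fall outside a calibrated 99.9% prediction interval.
from scipy.stats import binom

n_states = 51      # 50 states + DC
p_outside = 0.001  # miss rate of a calibrated 99.9% interval

p_three_or_more = binom.sf(2, n_states, p_outside)  # P(X >= 3)
print(f"P(3+ states outside): {p_three_or_more:.3g}")
print(f"about 1 in {1 / p_three_or_more:,.0f} elections")
```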

However, state results are not independent. This makes checking calibration more difficult, yet still possible, especially by focusing on rare events. In particular, the probability of 3 or more states with results outside 99.9% intervals is bounded above by the probability of at least 1 state with results outside 99.9% intervals, which even with nonindependence cannot exceed 0.051 (0.001*51).

Further, 2 and 1 states had results outside 99.9% intervals in 2008 and 2012 backtesting. So, with a fully calibrated model there would be at most 0.051 probability of observing states outside 99.9% intervals in these years too. Given independence of the differences between model predictions and actual results across years, the probability of observing so many states outside 99.9% prediction intervals is at most 0.051^3, or about 1 in 7500. This suggests the Economist model is miscalibrated.
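Both bounds are quick to verify (again, my arithmetic rather than the commenter’s code):

```python
# Union bound within a year, then a product across the three
# backtested years, assuming prediction errors are independent
# across years (as argued above).
p_any_state = 51 * 0.001             # P(>= 1 state outside) <= 0.051
p_all_three_years = p_any_state ** 3
print(p_all_three_years)             # ~1.33e-4
print(1 / p_all_three_years)         # ~7,539, i.e. about 1 in 7,500
```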

The probability can be reduced from 1 in 7500 to below 1 in 50000 by switching to higher-coverage intervals or tightening the bound used.

Thank you very much for considering these points, and especially for posting the model on github.

*Calculated from binomial CDF using 51 states (50 + DC).

I tried comparing actual state outcomes with final model predictions for the 2008, 2012, and 2016 elections. In each election, 8 state results fell outside 95% prediction intervals from the model. However, in expectation only about 2.5 states should do so (2.5/51 is about 5%). Additionally, 4, 5, and 5 states fell outside the model’s 98% prediction intervals in 2008, 2012, and 2016, respectively. In expectation, only about 1 state should do so (1/51 is about 2%).

I understand that state results are correlated, so one should only expect state prediction intervals to show good calibration across many elections, and not necessarily within an election or a few elections. But that is exactly the value of thinking about extremely unlikely events: for extremely unlikely events, we shouldn’t see any occurrences, regardless of between-state correlation. And yet 2, 1, and 3 states fell outside the Economist model’s 99.9% prediction intervals in 2008, 2012, and 2016, respectively.

Results are shown in this figure: https://i.postimg.cc/Y9M8p0Fy/economist-model-tail-check.png

I’m not sure what the solution is. Maybe heavy tails? An alternative would be to shift the state predictive distribution means closer to actual outcomes, but I don’t see how to do that without overfitting and thereby worsening overconfidence. Or maybe no solution is needed because I misunderstood something.

My analysis has limitations too: I calculated intervals by assuming logit-transformed predictions were normally distributed. But maybe too few samples were used (1500), especially because distribution SDs were estimated from empirical 95% intervals of the sample ((high – low)/(2*1.96)).
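For concreteness, here is a sketch of the interval construction just described. The inputs are hypothetical (I don’t have the model’s simulation draws) and the function name is mine:

```python
# Sketch of the described check: logit-transform vote-share draws,
# assume normality, estimate the SD from the empirical 95% interval,
# and count actual results falling outside the 99.9% interval.
# Hypothetical inputs -- not the actual Economist model output.
import numpy as np
from scipy.special import expit, logit
from scipy.stats import norm

def count_outside(draws, actual, coverage=0.999):
    """draws: (n_states, n_samples) simulated vote shares in (0, 1);
    actual: (n_states,) observed vote shares."""
    z = logit(draws)
    center = z.mean(axis=1)
    lo95, hi95 = np.percentile(z, [2.5, 97.5], axis=1)
    sd = (hi95 - lo95) / (2 * 1.96)          # SD from empirical 95% interval
    half = norm.ppf(1 - (1 - coverage) / 2)  # ~3.29 SDs for 99.9%
    lower = expit(center - half * sd)
    upper = expit(center + half * sd)
    return int(np.sum((actual < lower) | (actual > upper)))
```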

Socrates:

Merlin Heidemanns made the graphs. Some code is here.

> How many decimal places does it make sense to report the win probability?

I pointed to that sentence when the article was shared last month. I see it was published unchanged, so I wonder whether the sentence is correct and I was wrong. If someone wants to comment on its grammatical soundness, I will be grateful.

Andrew R.:

I know nothing about this Senate model.

Your Senate model gives a 22% chance of Rs getting 51+ seats and 12% chance of a 50/50 split.

Probability of R control is 30%

This implies a probability of a Republican VP breaking ties of (30-22)/12 = 67%

How does that work with an 87% Biden win percentage?

Probability of R control should be 22 + 12*(1-.87) = 23.6%
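Spelling out that arithmetic (numbers copied from the comment above):

```python
# The commenter's arithmetic, spelled out.
p_r_51plus  = 0.22  # model: P(Republicans win 51+ seats)
p_tie       = 0.12  # model: P(50/50 split)
p_r_control = 0.30  # model: P(Republican control)
p_biden     = 0.87  # model: P(Biden wins the presidency)

# Implied P(Republican VP breaks ties | 50/50 split):
print((p_r_control - p_r_51plus) / p_tie)  # 0.667

# P(R control) if the tiebreak instead follows the presidential odds:
print(p_r_51plus + p_tie * (1 - p_biden))  # 0.2356
```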

I think it’s pretty likely that we are in some kind of realignment, but it may not be obvious except in historical hindsight. If Biden wins back the traditionally blue states (WI, MI, PA) that Trump won in 2016, does that mean 2016 was just a fluke?

Maybe not, if this election ends up being more of a referendum on Trump’s handling of COVID and other current problems. Depending on who the candidates are in 2024 and 2028, on economic trends, and on what happens to the Republican party platform post-Trump, it could be an interruption in a trend that resumes later: we could easily see a bluer Southwest (AZ going blue and TX becoming a swing state?) and a purpler Midwest going forward, IMO.

Economic/environmental policy re: fossil fuels could make a major difference in places like TX and PA too.

Daniel said,

“This is because of the difficulty of *encoding* knowledge into a formal model.”

A point well worth emphasizing, over and over again. So often people “fall in love with” their models, and get off into fairy tale land.

In this context, then, what it means to have a “wrong model” is that you don’t really believe that the background knowledge and assumptions actually imply the given form of the p function. It should be somehow different, in a way that results in lower/higher probabilities for certain events.

This is because of the difficulty of *encoding* knowledge into a formal model.

The thing is, in the Bayesian interpretation of probability, probabilities are **never** observable. They aren’t frequencies, or even limits of frequencies; they don’t exist in the world. They are useful fictions for inference from one set of accepted facts to another set of accepted facts.

Probability to a Bayesian is always conditional on a model/knowledge base. The probability basically looks like

Normalize{ p(something | parameters) p(parameters | background, assumptions) }

The background and assumptions are assumed true, and assumed to imply the form of the p function over the space of both “something” and “parameters”. We then take observations of “something”, which are additional facts we know from measurement, and back-calculate new knowledge over the space of unknown parameters.
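In more standard notation (my rendering, writing y for “something”, theta for “parameters”, and B for the background and assumptions), this is just Bayes’ rule:

```latex
p(\theta \mid y, B)
  = \frac{p(y \mid \theta, B)\, p(\theta \mid B)}
         {\int p(y \mid \theta', B)\, p(\theta' \mid B)\, d\theta'}
```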

Although the measurements of “something” are definite facts about the world (or at least about the measurement instruments) the p function itself is not observable. At best we can argue over whether the assumptions really should be allowed to dictate that given form of the p function. It’s a *logical* question, not an empirical one.

Jay:

I agree, but I’d alter your statement to say that I think the low between-state correlations in the Fivethirtyeight model arose by accident. Based on what Nate Silver has written, I’m guessing that they started with high between-state correlations and then added long-tailed errors that were independent across states, which had the effect of lowering the correlations between states. You’re right that there are legitimate reasons to allow the possibility of nonuniform swings, but I think some of the weird things going on with the Fivethirtyeight model are artifacts of the way they added in noise terms. This can happen: when you expand your model, you can introduce new problems.

I know Nate has said that a lot of what he’s modeling is “the possibility that something unexpected will happen between now and the election” (as it perhaps did last night). And I know in his podcast he’s said quite a few times, basically, that if you look back more than twenty years or so, polls were often really volatile in the last month of the election, and he’s considering that as a reasonable part of his training data, so he treats the polls as volatile.

But none of that addresses Andrew’s concern that the correlations between states are too low. If you expect most of the volatility to be in _national_ swings, as it sounds like Andrew does, then you might give Trump a higher chance of winning but you would give him a low chance of winning conditional on losing Ohio.

I think this is actually a disagreement of priors between Andrew and Nate (possibly also driven by different training sets). Nate has said that he’s taking into account the possibility that this election is a realignment (or mid-realignment) for one or more states. So WV went from being a Democratic stronghold to being a Republican stronghold really quickly, and VA went the other way. So sure, Ohio has been a red-ish state in the past few elections, but there’s some probability that it will shift back and be blue of center going forward, and Nate’s model thinks that sort of thing is non-trivial.

Is the method used to learn the covariance matrix specified anywhere? Based on the github and the Economist page, it looked like it could be based on demographics, how states voted in the past, urbanicity, religiousness, etc., but if it was described in one of the papers or calculated by a script in the github then I must have missed it. I’m wondering if the model might be a little off for states that have undergone large shifts in voting behavior over the past 4 years. In particular, Virginia has drifted in the opposite political direction from correlated states such as Michigan and Pennsylvania. Arizona has also experienced a larger change in voter behavior when compared with Florida and Nevada. Perhaps the covariances would look slightly different after taking into account the 2020 polls, as well as factors such as education, which have become more pronounced over the past decade. While practically this might not affect the predicted outcomes too significantly, I was surprised to see Virginia and Arizona with the same tipping-point probability (admittedly low at 4%).
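For what it’s worth, one common recipe for this kind of matrix is to score each state on standardized features and use feature similarity as the between-state correlation. The sketch below is purely illustrative, with invented feature values; it is not the Economist team’s actual procedure:

```python
# Illustrative only -- NOT the actual Economist procedure. Build a
# rough between-state correlation matrix from the cosine similarity
# of standardized state features (invented numbers below).
import numpy as np

# rows: states; columns: made-up standardized features
# (say, college share, white evangelical share, urbanicity)
features = np.array([
    [ 0.9, -0.5,  0.7],   # "VA"
    [-0.2,  0.4, -0.1],   # "MI"
    [-0.1,  0.5, -0.2],   # "PA"
])

unit = features / np.linalg.norm(features, axis=1, keepdims=True)
corr = unit @ unit.T      # cosine similarity, in [-1, 1]
np.fill_diagonal(corr, 1.0)
print(corr)  # "MI" and "PA" come out very similar; "VA" does not
```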

Whether this increase truly represents probability in the Bayesian sense of the word, I’m not so sure. After all, given the difficulty of modeling a result several months out, it’s not really hard to imagine them initially pumping up uncertainties by adding fudge factors, etc., in order to cover for some truly dramatic shift in the race, especially considering all the unexpected events in the past year. However, I think it’s worth noting that it wasn’t so long ago that polls could swing by 10 points in the last month of the race, so perhaps Fivethirtyeight’s model is built partially on that data. Again, I think a lot of words could have been saved had Fivethirtyeight just published their model…

You’re right that n=1 is not enough for statistical analysis. We can look at the accuracy of a set of forecasts, but that won’t tell us much about how good any particular forecast happened to be. It’s not clear what the following remark means in practical terms: “if our stated probabilities do wind up diverging from the results, we will welcome the opportunity to learn from our mistakes and do better next time.”

PS: I’ve got to admit that I myself used the term “wrong model”, but I meant this *relative* to intuition or another model, as explained in my comment – or in situations in which more can be observed, relative to later observed data.

@jim: “Yes, what about where intuition agrees with the model but both are wrong?”

Well, one of my points was that arguably (at least) in a situation like this, where eventually there is only one observation, a “true” probability does not exist. Surely it is not observable; only later can we see whether the event of interest has happened or not. So what then do you mean by “intuition and model both wrong”? “Wrong” compared to what?

Wow. Yeah, that could change things… given that early voting is already happening, the uncertainty itself could have an effect even if the case is asymptomatic.

“Figuring out the consequences of what you already think.”

Well if you’re just going to keep pruning to what you already think, then you already think you know the consequences of what you think, right? :)

I’m sure we need mathematicians. But I’m not sure we need statisticians, and I’m very sure we have *way* too many people modelling their beliefs and then claiming that the model is a respectable forecast. That’s a general opinion, not particularly pointed at election forecasts, but there are certainly shades of predetermination in the way Andrew thinks about this.

The upshot is that – at least for the moment – it hardly matters, since Biden is almost a shoo-in, and all this trimming bias is really just fine-tuning the noise.

Garrett:

I discussed this issue several months ago, in a post entitled “Do we really believe the Democrats have an 88% chance of winning the presidential election?”, with followup here. I agree that this is a big discrepancy. Part of the discrepancy could arise from concerns such as vote suppression, which are not in our model (see section 1.4 here), but I don’t think that’s the whole story. I think that bettors are overreacting to the experience of 2016.

Back in June, I looked into the possibility of betting on Biden, but I decided not to, for several reasons. First, I wasn’t quite sure exactly how to do so. Second, it didn’t seem like I could bet very much, at least not on Betfair. Third, I felt kinda bad about using my quantitative social science expertise to win money from people who I think are misunderstanding some aspect of forecasting. I don’t think it’s immoral to bet, and I’m cool about betting on sporting events or whatever, but this particular bet made me feel uncomfortable.

In any case, our model is what it is. It includes some but not all information, and people can feel free to make their own judgments about the evidence.

> The chance of that is low, since there is very little time remaining before the election

A few hours after you wrote that, Trump tested positive for coronavirus. There’s now a lot of question as to whether he gave it to Biden during the debate… Maybe neither one of them will be alive by Nov 3.

Figuring out the consequences of what you already think. There are lots of things where, even if you know basically “for sure” what certain outcomes will be, other outcomes that are downstream from that or otherwise related are not at all clear. If everyone just automatically knew every logical consequence of every assumption, we wouldn’t need mathematicians at all, we wouldn’t build models, and we wouldn’t bother to have computers.

“I like to say that sometimes the most worthwhile thing about a model is how it’s wrong and how we can learn from that. The danger always is that if we take our own intuition over what the model says, can we trust our intuition more to be right?”

Yes, what about where intuition agrees with the model but both are wrong?

Going through the model and “correcting” the things that “seem wrong” is just biasing it toward your existing beliefs and negating the purpose of building it. What’s the point in going to all that trouble if you’re just going to tune it to what you already think?

You might say you’re not a betting man, but this really is a large discrepancy and I think it would add credibility to your model if you publicly made a big bet on Biden.

I do agree though with Andrew’s point that when exploring implications of a model and maybe criticising it, it can be useful to look at such probabilities, ask whether they “seem too high”, ask how they can be explained within the model and why they seem too high, and then maybe see that there’s something missing in the model that we may want to add. I like to say that sometimes the most worthwhile thing about a model is how it’s wrong and how we can learn from that. The danger always is that if we take our own intuition over what the model says, can we trust our intuition more to be right? The attitude when doing these things should not be that “we have a more correct intuition about what the true probability is”, because there is no such thing.

This whole discussion is always difficult regarding events that can essentially not be repeated. A probability of 1% means, in a frequentist setup, that in 1 of 100 situations that are exactly the same one should expect the event to happen. If ultimately it doesn’t happen the one time it has the chance to happen, and by quite some margin, this really doesn’t say anything about whether the 1% was too high or about right. We’ve got to live with the fact that this issue can be discussed for long, but reality ultimately will not give anything remotely close to a decision about who is right.

If you’re just looking for opinions, I personally feel like slightly fatter tails would make me more comfortable. How much fatter I’m not qualified to say, but 0/40,000 makes an outcome seem absolutely inconceivable to me in a way that 10/40,000 or 40/40,000 does not. And I’m just not sure that (Biden wins Ohio but loses Wisconsin and Pennsylvania) and (Biden wins Ohio but loses the EC) feel quite at that level to me. Just gut instinct on that.

One interesting outcome: the latest (10/01) Fivethirtyeight model had a perfect Biden sweep (538 EC votes) in exactly one simulation. At first I thought it must be a fluke, but then I noticed they had seven outcomes with Biden at 537, which I assume are cases where Trump holds NE-1 or NE-3. Seems too high to me, but once again that’s just my instinct. As you point out in this post, calibrating for unprecedented events is very difficult.

One more puzzling thing that I’d be curious to hear your take on. Fivethirtyeight currently has Biden at 0.424 to win Georgia and 0.538 to win Ohio. You give Biden 0.49 to win Georgia and 0.42 to win Ohio. So you have the relative probabilities reversed between the two states: they think Biden is more likely to win Ohio than Georgia; you think Biden is more likely to win Georgia than Ohio. What significant differences between your models make this the case? (I must admit that seeing Georgia as a swing state is more surprising to me than Ohio, but that instinct is just based on past elections.)

> I actually wonder if the dots on those dotplots are not completely chosen at random.

Apparently, they are not completely chosen at random:

https://twitter.com/natesilver538/status/1293654976313655296

> I wish they’d share their code and simulation output so we wouldn’t have to do this sort of reverse engineering.

Definitely

Fred:

I was curious so I went to the Fivethirtyeight site. Unfortunately it doesn’t seem to have their simulations, but you can click on “Model outputs” and get some summaries. If I’m reading it right, they have Trump’s chance of winning California as only 0.2%. That still seems too high to me, but you’re right that it’s much less than 1%. (As you say, the Fivethirtyeight forecast gives Trump a 1% chance of winning Hawaii, which seems too high too.)

Anyway, given all that, I’m baffled as to why their dotplot for California with 100 dots shows a dot with Trump winning. Sure, this can happen, but if his chance of winning under the model is 0.2%, then with 100 dots you’d expect to see zero that have Trump winning the state.

I actually wonder if the dots on those dotplots are not completely chosen at random. Maybe they purposely try to pick at least one winning dot for each candidate, as a way of emphasizing uncertainty in the outcome. I say this because I also checked Wyoming, where Fivethirtyeight gives Biden a 0.2% chance of winning . . . but one of the 100 dots is a Biden win. Hmm . . . where else to check? Delaware: their model gives Trump a 0.08% chance of winning (that’s 0.0008 if you write it as a probability) but, again, one of the 100 dots shows Trump winning. Same with Idaho (Biden with a 0.4% chance of winning but one of the 100 dots is on his side). Also Oklahoma, NE-3, and New York. They didn’t do it with D.C., though.
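A quick back-of-the-envelope check of how surprising those dots are, assuming the 100 dots really were a random sample of the 40,000 simulations (the per-state probabilities are the ones quoted above):

```python
# If the 100 dots were a random sample, how often would at least one
# upset dot appear? Probabilities as quoted above.
p_upset = {"CA (Trump)": 0.002, "WY (Biden)": 0.002,
           "DE (Trump)": 0.0008, "ID (Biden)": 0.004}
joint = 1.0
for state, p in p_upset.items():
    p_at_least_one = 1 - (1 - p) ** 100
    joint *= p_at_least_one
    print(f"{state}: P(>=1 upset dot) = {p_at_least_one:.1%}")
print(f"all four at once: {joint:.2%}")  # well under 1 in 1,000
```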

I wonder if those plots are programmed to include at least one dot for each candidate, but with DC hardcoded as an exception.

Here’s what the dotplots say:

We simulate the election 40,000 times to see who wins most often. The sample of 100 outcomes below gives you a good idea of the range of scenarios our model thinks is possible.

“A good idea of the range” . . . so they’re not actually saying it’s a random sample.

I wish they’d share their code and simulation output so we wouldn’t have to do this sort of reverse engineering.

> Various readers have suggested that our high correlations and normally-distributed tails might be leading to some implausible conditional probabilities.

> At one point, we were using a t distribution with 7 degrees of freedom, but we scrapped it because it didn’t address other issues we were concerned with at the time.

Can I ask why the t distribution was scrapped?

There isn’t enough presidential election data to really check calibration of the model’s tails. Because tail shape can’t be calibrated, using normal tails seems similar to thinking that lighter-than-normal tails and heavier-than-normal tails are equally probable. Yet, in my and (I think) most others’ experience, lighter-than-normal tails are quite rare and heavy tails are common. So, it seems better to use a t distribution.
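To illustrate how much this choice matters in the far tails, here is a small comparison (my own illustration, not from either model) of a standard normal against a t distribution with 7 degrees of freedom rescaled to the same standard deviation:

```python
# Two-sided tail probability beyond k SDs: normal vs t(7) rescaled
# to unit variance (a t with df degrees of freedom has variance
# df / (df - 2)).
import numpy as np
from scipy.stats import norm, t

df = 7
scale = np.sqrt((df - 2) / df)  # multiply t(7) by this to get SD = 1
for k in (3, 4):
    p_normal = 2 * norm.sf(k)
    p_t7 = 2 * t.sf(k / scale, df)
    print(f"beyond {k} SD: normal {p_normal:.1e}, t(7) {p_t7:.1e}")
# The t(7) puts roughly 3x the mass beyond 3 SD and ~30x beyond 4 SD.
```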

Is it possible you didn’t tune the degrees of freedom based on personal judgement, but instead obtained that from fits to the data and got unhelpful results? If so, have you considered that, without enough data in the tails, the fitted degrees of freedom could be mostly influenced by the available data near the distribution’s center or shoulder, which can be unrepresentative of the actual, unobserved tails?

Thanks for continuing to write these blog posts on the 2020 model! They are always great to read.

Fred:

I wrote that Fivethirtyeight’s forecast “gives Biden an approximate 3% chance of winning Alabama and Trump a 1% chance of winning California” based on their graphics for the two states. Their graphic for California showed 100 dots, 99 of which corresponded to Biden winning California and 1 of which corresponded to Trump winning California. As I wrote, this is an approximation. It could be, for example, that the Fivethirtyeight forecast gave Trump a 0.8% chance of winning California, which is approximately 1% but less than 1%.

I have no idea what you mean by “misleading at best” to report this approximate probability. I’m just reading off their graph.

Their forecast of California (along with other states like Massachusetts and New York) says “Trump wins <1 in 100” compared to their Hawaii forecast of “Trump wins 1 in 100”.

Jm:

We were thinking about that idea of considering “at least one of . . . ,” but I don’t know that this would help much, given that the outcomes are so highly correlated.

No argument here about it being arbitrary, but I think you might be able to get a little bit better guess than something that varies over 4 orders of magnitude. I think part of the problem comes from trying to specify a single unlikely event: Trump wins California. That seems really hard, so I have no idea what number to put on it. But you can cheat a little bit by asking what the chances are that Trump wins at least one of CA or MA or VT or NY or… essentially integrating over enough unlikely events that you get to some ‘interpretable’ probability. Then maybe you can reasonably say this cumulative probability is definitely in the 0.5-1% range, which is something you can rationally think about.

The only real reason to do this would be to have a way to intuitively check your model output. But that could be useful if you were planning to make more principled changes to, for instance, make the tails fatter.
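Here’s a toy version of that aggregation, with made-up per-state probabilities (not the model’s numbers), showing both a correlation-free union bound and the independence case:

```python
# Toy aggregation of rare events (made-up probabilities, not the
# model's): P(Trump wins at least one safe-blue state).
import math

p = {"CA": 0.0002, "MA": 0.0005, "VT": 0.001, "NY": 0.0008}

union_bound = sum(p.values())                       # holds under any correlation
p_indep = 1 - math.prod(1 - v for v in p.values())  # exact if independent

print(f"union bound:    {union_bound:.2%}")  # 0.25%
print(f"if independent: {p_indep:.2%}")      # just under the bound
```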

Yariv:

Yes, exactly. The reason to care about these predictions of outlandish events is that they can give us insights into our model and can help us improve it. And we did indeed tune our between-state covariance matrix so as to get reasonable predictions for more predictable outcomes.

Trump winning California does seem pretty near zero.

OTOH I could totally see Biden win Alabama and a ton of other traditionally R states if there is some bad health news about Trump that leaves it uncertain whether he would actually be able to be President. The chance of that is low, since there is very little time remaining before the election, but given Trump’s age it probably wouldn’t be 1/40,000 or even 1/10,000 low even without COVID.

I think places like California would still have decent D turnout even if Biden’s health were uncertain, since it’s largely voting *against* Trump. I don’t think the opposite necessarily applies (not that R voters would vote for Biden, but that tons wouldn’t vote).

It occurs to me that this sort of thing is probably asymmetric, since Trump is much more widely disliked than Biden (a very different situation from 2016, where Trump had no political history and Clinton was widely disliked).

So if something like a COVID diagnosis for one of the candidates happens… well probably more people would still show up to vote *against* Trump (even if it were unclear whether Biden or Harris would actually be president) than vice versa.

Presumably if you were _just_ interested in Trump winning CA, the answer would be “nothing”, since you’re not really generating predictions to the third decimal. But if you have a whole background of low-probability events, or if you’re tuning your parameters to increase the size of the tails to what you think is reasonable, that could have a significant effect.

The whole thing reminds me of the physics concepts of an effective theory and the renormalization group. When we're working at a certain energy regime of a physical system and we don't know what it behaves like at much higher energies, we can pick some arbitrary, unphysical cutoff behavior, write down the model, and then tune its parameters so that the outputs fit the measurements we take within the physical regime.

Likewise, for the election model, you shouldn't really care whether outlandish results happen 1-in-1,000 simulations or 1-in-40,000 – just pick a tail model, and then make sure the 1-in-10 or 1-in-20 results make sense.

N:

We could do some version of what you suggest, but what we actually do is choose a fixed covariance matrix for the states based on old data; we don’t learn anything about it from the 2020 polls.

Jm:

I know what you mean, but there’s still the question of where you draw the line. See the second-to-last paragraph above. What’s the probability that Trump wins California? If you don’t want to say 1/40,000, would you say 1/10,000? 1/1,000? 1/100? 1/10? At some point you have to bite the bullet, right?

“The stipulation is that things are more or less like before.”

i.e., people go to the polls, vote for who they like, and their votes get counted.

What makes you think they haven’t?

Whether that means it should be included in a model is a different question. It would almost have to be something like: leave the model exactly as it is, but for every output you include a term for “there is an x% chance that this model is completely wrong and missed something absolutely critical”.

Well, I think one of the difficulties is that a lot of the extremely unlikely – or even just “pretty unlikely” or “surprising” – outcomes depend on stuff that isn’t really predictable from polling, etc., and may not be included in the model’s uncertainty.

Like game-changing last-minute news (for example, one of the candidates has some kind of bad health event), or much higher ballot rejection rates in one state, or something weird happens in Las Vegas and NV has absurdly low Election Day turnout.

And disqualifying a whole lot of ballots in one state doesn’t require or even necessarily imply ballot fraud – it could be a result of unfamiliar procedures in this election.

All models are wrong; some are useful. If you propose a change in a model, explain how the change makes it more useful.

The point of this model is to predict an election outcome, whatever that’s worth. It is not to predict the probability of Penn voting one way conditional on how Ohio votes.

Clifton thinks ballot fraud in Penn makes ballot fraud less likely in Ohio. I think it maybe correlates the other way, which would increase the between-state correlation. But without a model and data, what’s the point?
