“MRP is the Carmelo Anthony of election forecasting methods”? So we’re doing trash talking now??

What’s the deal with Nate Silver calling MRP “the Carmelo Anthony of forecasting methods”?

Someone sent this to me:

and I was like, wtf? I don’t say wtf very often—at least, not on the blog—but this just seemed weird.

For one thing, Nate and I did a project together once using MRP: this was our estimate of attitudes on health care reform by age, income, and state:

Without MRP, we couldn’t’ve done anything like it.

So, what gives?

Here’s a partial list of things that MRP has done:

– Estimating public opinion in slices of the population

– Improved analysis using the voter file

– Polling using the Xbox that outperformed conventional poll aggregates

– Changing our understanding of the role of nonresponse in polling swings

– Post-election analysis that’s a lot more trustworthy than exit polls

OK, sure, MRP has solved lots of problems, it’s revolutionized polling, no matter what Team Buggy Whip says.

That said, it’s possible that MRP is overrated. “Overrated” is a difference between rated quality and actual quality. MRP, wonderful as it is, might well be rated too highly in some quarters. I wouldn’t call MRP a “forecasting method,” but that’s another story.

I guess the thing that bugged me about the Carmelo Anthony comparison is that my impression from reading the sports news is not just that Anthony is overrated but that he’s an actual liability for his teams. Whereas I see MRP, overrated as it may be (I’ve seen no evidence that MRP is overrated but I’ll accept this for the purpose of argument), as still a valuable contributor to polling.

Ten years ago . . .

The end of the aughts. It was a simpler time. Nate Silver was willing to publish an analysis that used MRP. We all thought embodied cognition was real. Donald Trump was a reality-TV star. Kevin Spacey was cool. Nobody outside of suburban Maryland had heard of Beach Week.

And . . . Carmelo Anthony got lots of respect from the number crunchers.

Check this out:

So here’s the story according to Nate: MRP is like Carmelo Anthony because they’re both overrated. But Carmelo Anthony isn’t overrated, he’s really underrated. So maybe Nate’s MRP jab was just a backhanded MRP compliment?

Simpler story, I guess, is that back around 2010 Nate liked MRP and he liked Carmelo. Back then, he thought the people who thought Carmelo was overrated were wrong. In 2018, he isn’t so impressed with either of them. Nate’s impressions of MRP and Carmelo Anthony go up and down together. That’s consistent, I guess.

In all seriousness . . .

Unlike Nate Silver, I claim no expertise on basketball. For all I know, Tim Tebow will be starting for the Knicks next year!

I do claim some expertise on MRP, though. Nate described MRP as “not quite ‘hard’ data.” I don’t really know what Nate meant by “hard” data—ultimately, these are all just survey responses—but, in any case, I replied:

I guess MRP can mean different things to different people. All the MRP analyses I’ve ever published are entirely based on hard data. If you want to see something that’s a complete mess and is definitely overrated, try looking into the guts of classical survey weighting (see for example this paper). Meanwhile, Yair used MRP to do these great post-election summaries. Exit polls are a disaster; see for example here.

Published poll toplines are not the data, warts and all; they’re processed data, sometimes not adjusted for enough factors as in the notorious state polls in 2016. I agree with you that raw data is the best. Once you have raw data, you can make inferences for the population. That’s what Yair was doing. For understandable commercial reasons, lots of pollsters will release toplines and crosstabs but not raw data. MRP (or, more generally, RRP) is just a way of going from the raw data to make inference about the general population. It’s the general population (or the population of voters) that we care about. The people in the sample are just a means to an end.

Anyway, if you do talk about MRP and how overrated it is, you might consider pointing people to some of those links to MRP successes. Hey, here’s another one: we used MRP to estimate public opinion on health care. MRP has quite a highlight reel, more like LeBron or Steph or KD than Carmelo, I’d say!

One thing I will say is that data and analysis go together:

– No modern survey is good enough to be able to just interpret the results without any adjustment. Nonresponse is just too big a deal. Every survey gets adjusted, but some don’t get adjusted well.

– No analysis method can do it on its own without good data. All the modeling in the world won’t help you if you have serious selection bias.

Yair added:

Maybe it’s just a particularly touchy week for Melo references.

Both Andy and I would agree that MRP isn’t a silver bullet. But nothing is a silver bullet. I’ve seen people run MRP with bad survey data, bad poststratification data, and/or bad covariates in a model that’s way too sparse, and then over-promise about the results. I certainly wouldn’t endorse that. On the other side, obviously I agree with Andy that careful uses of MRP have had many successes, and it can improve survey inferences, especially compared to traditional weighting.

I think maybe you’re talking specifically about election forecasting? I haven’t seen comparisons of your forecasts to YouGov or PredictWise or whatever else. My vague sense pre-election was that they were roughly similar, i.e., that the meaty part of the curves overlapped. Maybe I’m wrong and your forecasts were much better this time—but non-MRP forecasters have also done much worse than you, so is that an indictment of MRP, or are you just really good at forecasting?

More to my main point—in one of your recent podcasts, I remember you said something about how forecasts aren’t everything, and people should look at precinct results to try to get beyond the toplines. That’s roughly what we’ve been trying to do in our post-election project, which has just gotten started. We see MRP as a way to combine all the data—pre-election voter file data, early voting, precinct results, county results, polling—into a single framework. Our estimates aren’t going to be perfect, for sure, but hopefully an improvement over what’s been out there, especially at sub-national levels. I know we’d do better if we had a lot more polling data, for instance. FWIW I get questions from clients all the time about how demographic groups voted in different states. Without state-specific survey data, which is generally unavailable and often poorly collected/weighted, not sure what else you can do except some modeling like MRP.

Maybe you’d rather see the raw unprocessed data like the precinct results. Fair enough, sometimes I do too! My sense is the people who want that level of detail are in the minority of the minority. Still, we’re going to try to do things like show the post-processed MRP estimates, but also some of the raw data to give intuition. I wonder if you think this is the right approach, or if you think something else would be better.

And Ryan Enos writes:

To follow up on this – I think you’ll all be interested in seeing the back and forth between Nate and Lynn Vavreck, who was interviewing him. It was more of a discussion of tradeoffs between different approaches than a discussion of what is wrong with MRP. Nate’s MRP alternative was to do a poll in every district, which I think we can all agree would be nice – if not entirely realistic. Although, as Nate pointed out, some of the efforts from the NY Times this cycle made that seem more realistic. In my humble opinion, Lynn did a nice job pushing Nate on the point that, even with data like the NY Times polls, you are still moving beyond raw data by weighting and, as Andrew points out, we often don’t consider how complex this can be (I have a common frustration with academic research about how much out-of-the-box survey weights are used and abused).

I don’t actually pay terribly close attention to forecasting – but in my mind, Nate and everybody else in the business is doing a fantastic job and the YouGov MRP forecasts have been a revelation. From my perspective, as somebody who cares more about what survey data can teach us about human behavior and important political phenomena, I think MRP has been a revelation in that it has allowed us to infer opinion in places, such as metro areas, where it would otherwise be missing. This has been one of the most important advances in public opinion research in my lifetime. Where the “overrated” part becomes true is that just like every other scientific advance, people can get too excited about what it can do without thinking about what assumptions are going into the method and this can lead to believing it can do more than it can, but this is true of everything.

Yair, to your question about presentation—I am a big believer in raw data and I think combining the presentation of MRP with something like precinct results, despite the dangers of ecological error, can be really valuable because it can allow people to check MRP results with priors from raw data.

It’s fine to do a poll in every district but then you’d still want to do MRP in order to adjust for nonresponse, estimate subgroups of the population, study public opinion in between the districtwide polls, etc.
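
For readers who haven't seen the mechanics, here is a minimal sketch of the two steps in the name. It is a toy, not the code behind any of the analyses linked above: the survey counts, the population counts, and the `PRIOR_STRENGTH` shrinkage rule are all made-up assumptions, and the shrinkage is only a crude stand-in for a real multilevel regression.

```python
# Toy illustration of the two MRP steps; every number below is invented.

# Raw survey responses by cell: (state, age group) -> (yes answers, respondents).
survey = {
    ("A", "young"): (3, 4),
    ("A", "old"): (30, 60),
    ("B", "young"): (1, 2),
    ("B", "old"): (20, 50),
}

# Known population counts for the same cells (for example, from a census).
population = {
    ("A", "young"): 400_000,
    ("A", "old"): 600_000,
    ("B", "young"): 700_000,
    ("B", "old"): 300_000,
}

# Assumed pseudo-sample size that pulls small cells toward the overall mean.
PRIOR_STRENGTH = 10


def partially_pooled_estimates(survey, prior_strength):
    """Shrink each cell's raw proportion toward the overall proportion.

    A crude stand-in for the multilevel regression step: cells with few
    respondents move a lot, cells with many respondents barely move.
    """
    total_yes = sum(yes for yes, _ in survey.values())
    total_n = sum(n for _, n in survey.values())
    overall = total_yes / total_n
    return {
        cell: (yes + prior_strength * overall) / (n + prior_strength)
        for cell, (yes, n) in survey.items()
    }


def poststratify(cell_estimates, population, state):
    """Average one state's cell estimates, weighted by population counts."""
    cells = [c for c in cell_estimates if c[0] == state]
    weight = sum(population[c] for c in cells)
    return sum(cell_estimates[c] * population[c] for c in cells) / weight


estimates = partially_pooled_estimates(survey, PRIOR_STRENGTH)
raw_b = (1 + 20) / (2 + 50)  # unadjusted sample proportion in state B
mrp_b = poststratify(estimates, population, "B")
print(f"State B raw: {raw_b:.3f}, poststratified: {mrp_b:.3f}")
```

In this toy, state B's sample has only two young respondents even though young people are most of its population, so the adjusted number leans on the pooled estimate for that cell rather than on a raw average dominated by older respondents.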


  1. sg says:

    My own sense is Nate Silver has taken on the social media pundit’s tendency to flame (and think) in 140-chars instead of constructing arguments anyone can really engage with. Someone recently described him as Cillizza with a spreadsheet and that increasingly seems to be the case:

    A few more Nate S. MRP quotes.

    > A lot of quants who don’t know any better think that with MRP, you can spin straw into gold. Instead, it’s like spinning straw into… a straw basket. Which is actually sort of useful? But it’s still straw. You haven’t created any information, just reformulated it.

    >I get that selection bias is a problem. I think there’s also a point at which you smooth away a lot of the information in a dataset.
    … MRP, which doesn’t do too great out-of-sample when you don’t already know the result.

    > No one’s saying there aren’t good uses for MRP, I’ve used variations of it myself since before it was cool. But I do think the technique has become overused and the Atlantic article is pretty much a textbook example of what not to do with it.

    • Andrew says:


      What’s with him saying he used MRP “since before it was cool”? MRP has always been cool, my dude.

    • gec says:

      > MRP, which doesn’t do too great out-of-sample when you don’t already know the result.

      Maybe I’m missing the context of this quote, but isn’t the whole point of (responsible) use of MRP to better understand what’s going on outside of the sample? Like any regularization method, it is designed to help steer a researcher away from the idiosyncrasies of a particular sample and generalize beyond it?

      • >steer a researcher away from the idiosyncrasies of a particular sample and generalize beyond it?

        Hence it should do “great out of sample”; that is, it should tell you what the population is like instead of just the idiosyncrasies of the sample. He’s claiming that MRP doesn’t work unless the analyst “already knows” what result should come out of it, and therefore tunes the regression and poststratification formulas to “get the right answer”.

        meh, it’s trash talk in my opinion.

        • Garnett says:

          “Trash-talk” or “ultra-cautionary tale?”

          Non-statisticians easily get worked up about the latest and greatest statistical methods, as if they can overcome weak theory and ill-defined expectations. Is Silver just playing the devil’s advocate?

          • gec says:

            Fair point–once a method is “in the wild”, it is often treated as a kind of “default” that goes uncriticized (and, as you say, is treated as some kind of panacea).

            I’d say in many scientific circles that Bayes factors have recently achieved this status, for example.

            On the other hand, Silver’s words are also often treated uncritically by the non-statistical audience, so the onus is on him to explain his reasoning.

        • gec says:

          > tunes the regression and poststratification formulas to “get the right answer”

          Ah, thanks that helps me understand the target of his criticism. Though of course it could reasonably be leveled against literally any data analysis (after all, even computing a mean entails assumptions about measurement scales, how the data are partitioned, etc.). So, as you say, silly trash talk.

          Personally, I appreciate that thinking in MRP terms forces me to confront and justify all that “tuning” and make it transparent. Funny how MRP is criticized for helping to lay bare the assumptions that are typically implicit.

  2. Garnett says:

    Many examples in this blog pertain to MRP and political science. Are there good examples of MRP in biomedical research? There must be some, especially given the scope of publicly available data such as NHANES.

  3. Jeffrey Lax says:

    Nate Silver has become the Carmelo Anthony of Nate Silvers.

  4. Dzhaughn says:

    Maybe Nate’s just saying that someone told him Mr. P would cost $2.4 M per year, and he paid.

    I wonder which is the Kurt Rambis of statistical methods?

  5. Michael Nelson says:


    If I were to play devil’s (Nate’s?) advocate, I might argue that, tone aside, Nate is not referring to how MRP is seen/used by actual social scientists for legit research. Perhaps Nate, as the editor of a data journalism site aimed at a lay audience, is mainly aware of and critiquing the use of MRP by authors of data journalism articles aimed at a lay audience. I guess I’d have to read the article to really know the context, but I’m not that good of an advocate. In any event, would you say that this very narrow, very generous interpretation holds up?

  6. John hall says:

    On “Nate described MRP as “not quite ‘hard’ data.” I don’t really know what Nate meant by “hard” data—ultimately, these are all just survey responses”

    But MRP can be used on more than just survey responses. It can be used whenever you have a multi-level regression and need to do some post-stratification. I have used it in finance, whereby you fit some multi-level model to asset returns at a particular point in time and then get group level estimates weighted by market capitalization instead of equally weighted. Same thing, no?
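
A small sketch of the weighting step described in this comment, assuming the group-level estimates have already come out of some fitted multilevel model. The sector names, estimates, and market capitalizations are hypothetical.

```python
# Hypothetical group-level return estimates from an already-fitted multilevel model.
sector_estimates = {"tech": 0.08, "energy": 0.02, "utilities": 0.04}

# Hypothetical market capitalizations (same units for every sector).
market_cap = {"tech": 6.0, "energy": 1.0, "utilities": 1.0}


def weighted_average(estimates, weights):
    """Combine group estimates using the given weights (the poststratification step)."""
    total = sum(weights[g] for g in estimates)
    return sum(estimates[g] * weights[g] for g in estimates) / total


equal_weighted = weighted_average(sector_estimates, {g: 1.0 for g in sector_estimates})
cap_weighted = weighted_average(sector_estimates, market_cap)
print(f"equal-weighted: {equal_weighted:.4f}, cap-weighted: {cap_weighted:.4f}")
```

Swapping population counts for market capitalizations is the only change from the survey setting, which is the sense in which it is the same thing.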

  7. zbicyclist says:

    OK, let’s go back to the morning of election day, 2016.

    From memory (I have the screen shot somewhere) Nate gave Trump a 32% chance of winning. Most pollsters gave Trump about a 16% chance. Some dolts at the Huffington Post gave Trump a 1.8% chance, and 2 days before the election ran an article entitled “What’s Wrong With 538?”.

    (At about the same time, the Cubs were down 3 games to 1 in the World Series, giving them about a 12.5% chance of winning a series they ultimately won. Stuff happens.)

    Nate was wrong, of course, but “1 chance in 3” is more accurate than “1 chance in 6” or “1 chance in 50”.

    Maybe he made less use of MRP type techniques than others, and concluded from this that he has a better mousetrap than MRP?

    • Dalton says:

      “Nate was wrong, of course”?

      Narrowly focusing on the 32% probability of winning prediction, how was he “wrong”? This was an estimate prior to the unobserved event. As you pointed out, outcomes with a 32% or less probability happen all the time (like Kawhi Leonard’s vicious dagger to put away the Sixers, which had 32.1% chance of being made if taken by the average player based on its distance and how quickly a defender is closing in on the shot).

      What would “right” look like on November 7, 2016? If you mean that the expected value of prediction is exactly equal to the outcome then only 100% or 0% forecast can be considered right.

      • values closer to 50% would have better reflected the fact that in the presence of serious non response and other biases the information available made it near impossible to call the election. this was the reality, but models weren’t correctly modeling this effect

      • sg says:

        His model also had only a 10% chance Clinton wins the popular vote and loses the EC. It just wasn’t calibrated well, despite all the defensiveness on his part. It feels like a reckoning is still to be had, and it is good that we’re talking about MRP, but he’s been completely unapologetic and writing defensive screeds against the media and anyone in his path ever since. Has been a sad thing to see.

        • Dalton says:

          I think there’s more of an argument here for this, given that losing the popular vote and winning the EC has happened before. So perhaps he has too much correlation among the states in his model. But still, events with 10% chance happen all the time.

          Obviously, we can’t think about this from a frequentist point of view because, eww gross, but also because the 2016 election happens only once. But I’m still trying to wrap my head around a pre-election estimate of 32% probability or even 28.5% probability as “wrong”. On November 6th, 2016 Donald Trump winning seemed unlikely to just about everybody (including Trump’s campaign!) so clearly the probability should have been less than 50%. Hell maybe he even won because it seemed unlikely. Maybe a solid chunk of the electorate voted for him simply because they believed the polls and did it as sort of a protest. Maybe if they thought he actually would win they would have changed their votes.

          Is there any research into feedback in polling and elections? Does the information gained and published through polling actually alter the thing being polled?

          • In a Bayesian model, it’s “wrong” if it doesn’t represent the state of information we actually thought it should be representing.

            In my opinion, we had information that said that polls were correlated, had serious nonresponse issues, that phone polling in general was problematic, and that some panel surveys had strange participants (unrepresentative).

            When the outcome is essentially binary as it is in the US election system, then a state of information like 70/30 should be thought of as representing a fair amount of certainty that the 70 result will happen. And other pollers had more like 90/10. It was clear to me that 90/10 was overconfidence.

            Basically the models were attributing more information to the results of polls than the polls actually had. Imagine you go out door to door and ask people their race. Suppose only 10% of people answer. Then you do a calculation using an assumption of simple random sampling and come up with something like 94 +- 2% of people are white. If it turns out that black or asian or etc families had reasons not to answer the door, then your model is over-confident because it makes poor assumptions. With appropriate assumptions you might say something like 90 +- 15% are white (obviously with a long left tail). The second model is “right” not because it gets the proportion of white people correct, but because it isn’t over-confident on what the value is.

            However, with binary data, proportion and confidence are intimately tied because if p is the probability of the first thing to happen then 1-p is the probability of the second. 70/30 or especially 80/20 or 90/10 represented overconfidence in the information extractable from polls imho.
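
The door-to-door example can be put into numbers. This is only a sketch: the true share, the two response rates, and the number of households are invented so that roughly 10% of doors get answered.

```python
import math

# All numbers are invented for illustration.
true_share_white = 0.80        # actual share of white households
response_rate_white = 0.115    # chance a white household answers the door
response_rate_other = 0.030    # chance any other household answers
households_visited = 6000

# Expected number of responding households in each group.
resp_white = households_visited * true_share_white * response_rate_white
resp_other = households_visited * (1 - true_share_white) * response_rate_other
n = resp_white + resp_other

observed_share = resp_white / n
# Standard error under the (wrong) assumption of simple random sampling.
srs_se = math.sqrt(observed_share * (1 - observed_share) / n)

print(f"overall response rate: {n / households_visited:.1%}")
print(f"naive estimate: {observed_share:.1%} +- {2 * srs_se:.1%}")
print(f"truth: {true_share_white:.0%}")
```

The naive interval is narrow because the sample is large, not because the estimate is close to the truth. Differential nonresponse moves the estimate by many standard errors, and the simple-random-sampling formula has no way to register that.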

            • Andrew says:


              Some of this discussion came up on the blog, for example here and here.

              • Yep, as I said then:

                My impression is that modelers don’t put a “the whole process could have a bias that’s normally distributed at +- 5% or so” not because they don’t believe that’s true, but because if you do that you wind up with not much better than asking your aunt Greta who she thinks will win.

                and the point is, if you look like a coin-flip, people won’t pay you because they can flip coins themselves. They want some kind of “certainty” and also some kind of “drama” (horserace).

                If you ignore what the pollers tell you are the standard errors of their polls, and you say to yourself “it’s plausible that *all* the polls are biased one way or another, and each is a random realization of this biased process” then you’d start with a prior like maybe beta(5,5) for the true underlying fraction of people voting for say Clinton, and something like a normal(0,.05) truncated to [-1,1] model for the bias, and then something like normal noise in the biased polls with +- 5% margins.

                outcome = normal(underlying+bias,noise)

                you’d run your Bayesian model, and discover that there’s basically no information you can extract about the bias separate from the underlying (ie. it’s not identifiable), so at the end the bias is still normal(0,0.05) and the polls are polling around Clinton +2%, but with a 5% bias either way, it’s all very consistent with “you don’t know jack”.

                Under that kind of model, you’d probably have gotten something like 55/45 Clinton/Trump instead of something like 90/10 or 80/20 like people were predicting.
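
A rough sketch of that calculation, using a grid approximation instead of a full Bayesian fit. The poll average, number of polls, and per-poll sd are assumptions filled in for illustration, and the quantity computed is the probability of leading in vote share, not of winning the Electoral College.

```python
import math

# Assumed inputs, loosely following the comment: polls average 51% for the
# leading candidate (a 2-point lead), 20 polls, each with sd 0.025.
POLL_AVERAGE = 0.51
N_POLLS = 20
POLL_SD = 0.025   # roughly a +-5% margin per poll
BIAS_SD = 0.05    # sd of the bias shared by every poll


def prob_lead(bias_sd, grid_size=2000):
    """Posterior P(underlying share > 0.5) by grid approximation.

    The shared bias is integrated out analytically: given the underlying
    share, the poll average is normal with variance
    bias_sd**2 + POLL_SD**2 / N_POLLS. The truncation of the bias to
    [-1, 1] is ignored because it is negligible at this scale.
    """
    sd = math.sqrt(bias_sd ** 2 + POLL_SD ** 2 / N_POLLS)
    above = total = 0.0
    for i in range(grid_size):
        theta = (i + 0.5) / grid_size              # grid midpoint in (0, 1)
        prior = theta ** 4 * (1 - theta) ** 4      # beta(5, 5), unnormalized
        likelihood = math.exp(-0.5 * ((POLL_AVERAGE - theta) / sd) ** 2)
        weight = prior * likelihood
        total += weight
        if theta > 0.5:
            above += weight
    return above / total


with_bias = prob_lead(BIAS_SD)
without_bias = prob_lead(0.0)
print(f"P(lead) with shared bias: {with_bias:.2f}")
print(f"P(lead) ignoring bias:    {without_bias:.2f}")
```

Under these assumptions the shared-bias version lands much closer to even odds than the version that takes the polls' own standard errors at face value. Adding more polls shrinks only the `POLL_SD ** 2 / N_POLLS` term, so it cannot average the shared bias away.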

            • Dalton says:

              Right, but insert the cliche about models being wrong here. But we can be more nuanced than that: models can be more wrong in some aspects than in others. Thus the idea that posterior predictive checks should be focused on the summary statistics you most care about getting right. Or rather the idea of doing multiple posterior predictive checks to see what aspects of the data generating process the model does a good job of describing and what aspects the model does a poor job of describing. I get your points about non-response bias, etc. You’re saying that we had more uncertainty in the true state of public opinion, so it seems to me you’re making more of an argument that the election simulations (which is ultimately where I think Nate comes up with his topline number) should’ve been more dispersed. Right? But I still don’t see the median of all those election simulations being at 50%.

              So to speak specifically about a detailed election forecasting model: our model might be wrong in the vote share or turnout of a particular demographic group, but it could still be good with getting the overall vote share for a particular candidate. That’s why I said, “narrowly focusing on the 32% probability.” To me, 25 – 35% feels about right. Donald Trump winning was as much of a shocker as a Clinton landslide (something like taking PA, MI, WI, AZ, FL, NC, OH, GA plus maybe even SC and TX) would’ve been.

              “With appropriate assumptions you might say something like 90 +- 15% are white (obviously with a long left tail).” So 105% could be white? ;)

              This is an aside, but in my area of expertise (fisheries) we have people publishing survival estimates for fish passage at dams where the point estimate (not just some portion of the uncertainty interval) is greater than 1. So apparently dams create fish. Those models are obviously wrong, and yet they keep pumping them out.

              • 90+-15 means 90 is the point estimate and the scale of the errors is about 15 percentage points, but the bit about the long left tail was to indicate that they weren’t symmetric and hence you couldn’t have more than 100% white.

                in my area of expertise (fisheries) we have people publishing survival estimates for fish passage at dams where the point estimate (not just some portion of the uncertainty interval) is greater than 1

                Obviously not Bayesian models. This is exactly the kind of stuff we’ve had discussions about every time Frequentist vs Bayesian comes up. Bayesian models can’t even give intervals that overlap with impossible regions provided you give the appropriate definition of impossible to the model.

            • Dalton says:


              Isn’t this something we could assess retrospectively? Obviously not if we only look at the 2016 election, but suppose we took all the cases where we had some facsimile of the kind of polling data that we had in 2016. We’d probably have to extend beyond presidential elections. But we could train models on the polling data using say the methods of 538 versus a model that introduces more forms of uncertainty, versus a coin flip. You’re suggesting any election model should be closer to a coin flip (or maybe I’m misrepresenting what you’re suggesting and you’re narrowly focused on the conditions present in the 2016 polls), and 538 is willing to be more precise than that. Given enough contests that have pre-election polling data that satisfy our criteria, we should be able to assess which models performed the best.
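
One common way to score such a comparison is the Brier score, the mean squared difference between the forecast probability and what happened (lower is better). That choice of score is an assumption here, not something proposed in the thread, and the outcomes and forecasters below are hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(outcomes)


# Hypothetical past contests: 1 means the poll leader won, 0 means an upset.
outcomes = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]

# Hypothetical forecasters, each giving the poll leader the same probability every time.
forecasters = {
    "confident": [0.90] * len(outcomes),
    "cautious": [0.70] * len(outcomes),
    "coin_flip": [0.50] * len(outcomes),
}

scores = {name: brier_score(f, outcomes) for name, f in forecasters.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.3f}")
```

With upsets in 3 of these 10 made-up contests, the forecaster whose stated probability matches the observed frequency scores best, and overconfidence is penalized enough to do no better than the coin flip. A real comparison would need many contests and forecasts that vary from race to race.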

              In fact didn’t 538 do something like this:

              • > Since you’re suggesting any election model should be closer to a coin flip based (or maybe I’m misrepresenting what you’re suggesting and you’re narrowly focused on the condition present in the 2016 polls),

                No, I’m suggesting that specifically for election models using data of the type being collected “these days”.

                Things like “random digit dial, and most people have cell phones and ignore you” or “online polling” or whatever.

                If you collect more reliable data, you could easily have predictions that *should* be something like 90/10, but it needs to be something you have strong reason to believe is truly reliable.

      • Chris Wilson says:

        Most generally, as you add uncertainty to a binary forecast, your forecast should approach 50%. Non-response bias and turnout, from recollection, are huge sources of uncertainty, that need to be ‘adjusted’ for. What’s funny is I always assumed Nate was using MRP – but then, I am far from expert in this area.

        • Andrew says:


          MRP is something you do with raw data to adjust sample to population. It’s my impression that Nate does postprocessing of reported toplines, so he’s not analyzing survey responses directly. In 2016, some of the state polls did insufficient adjustment for nonresponse, hence analyses of reported summaries from state polls led just about all analysts to conclude that Clinton would probably win. Apparently, Trump’s campaign team thought Clinton was going to win, too. Nate was better than most of the others in that he had more uncertainty in the outcome.

          It was possible to do better in 2016 using MRP on raw poll responses; see here.

          • Chris Wilson says:

            Thanks for linkage Andrew! I don’t understand his negativity about MRP. At this stage, it seems decidedly preferable to just aggregating toplines…

            • Andrew says:


              I think Nate’s objections to MRP are:

              1. MRP is not magic, and it’s being sold (not by me, but by some people) as being magic. In particular, inference for demographic slices or geographic areas with small sample sizes will be inherently model-based, so it will work well when the model has good predictors but not otherwise.

              2. MRP estimates are not raw data thus they shouldn’t be trusted, they’re in some sort of “uncanny valley” that makes Nate uncomfortable.

              Point 1 is fair enough. MRP uses predictive modeling, and predictive modeling can give bad answers where the model is off.

              But I think point 2 is misinformed. Nate uses reported toplines from polls; those toplines are based on survey adjustment. This adjustment could actually be MRP (as with YouGov) or it might be some crappy weighting method that could be improved if the survey orgs were using MRP. To oppose MRP (or, more generally, RRP) in such settings just seems foolish.

    • zbicyclist says:

      I found my screenshot from 538 on election day morning. “Who will win the presidency?” Hillary: 71.4%, Trump 28.6%

  8. Joe says:

    I would imagine that “not quite hard data” means that the results are inferential rather than produced directly. As you say, no one is actually just reporting raw data or elementary descriptive statistics. But there’s an intuitive sense in which “run a poll in every state” is a more “direct” technique. I think many people have a taste-based preference for more direct methods. I’d give the analogy of criminal cases where juries often prefer eyewitness testimony to circumstantial evidence despite the fact that we now know eyewitness testimony to be extremely unreliable.

    Second, and speculatively, my read of one of the major conceptual frames at 538 is that they have faith in pollsters not techniques. They’ve calculated “grades” and estimated bias for hundreds of different pollsters, and their basic aggregation method is to take toplines and then weight them in terms of those factors. Surely, some of this is born of necessity as you can’t usually get the raw data, but I think there’s actually a philosophical commitment here. 1) There’s a lot more to polling techniques than just the ultimate statistical analysis of the responses and 2) there are a lot of judgement calls at every step of the process. So, it may make sense to focus on the work of those who have performed well in the past without worrying too much about how they managed to do it (at least some element of which likely involves trade secrets). In that sense, if I can speculatively put words in Nate’s mouth, a good pollster has more value than a good statistical method.

  9. Rick G says:

    Being the “Carmelo Anthony” of something means that, despite all of the promise and talent, and possibly great applicability at one point in time, there is no good reason for it anymore. The analogy is that Carmelo Anthony is really good at ~2009 basketball but not 2019 basketball, basically because he isn’t a great 3-point shooter and relies on too many low-EV mid-range shots. So what Nate is saying is that MRP is really cool but there must be some set of tools that he uses now which make MRP obsolete, i.e. MRP no longer has any place in the modern world of election forecasting.

  10. > Nate described MRP as “not quite ‘hard’ data.”
    I have used similar expressions when talking about indirect estimates for non-inferiority analysis and more generally network meta-analysis.

    The general impression seems to be that although the effect assessment is not randomized it is based on data. However, whether the assessment is done by formulas, likelihoods or classic Bayes, there is a supposition that certain relative effects that can’t be directly estimated/assessed would be the same if an omitted comparison group had been in the study.

    Now it can be very sensible to make that supposition, but the implied likelihood (or approximate likelihood based formula) should be recognized as an informed prior and an appropriate Bayesian workflow followed to assess if its informativeness is appropriate and credible.

    The bigger picture here, though, is that the cost/benefit of this argument is not very attractive – it’s hard for even many statisticians to grasp and the extra uncertainty it brings out often is not that critical.

  11. Daniel:

    > In a Bayesian model, it’s “wrong” if it doesn’t represent the state of information we actually thought it should be representing.

    Yes, wrong in the sense that if you persisted in using that represented state of information (would repeatedly do polls analysed using the same state of information) you would persist in being repeatedly wrong.

    However, the one time it actually was done done in the past, it could have by (extreme) fluke given a reasonable answer. Remember if a tortoise is also predicting the event you are predicting, it is not impossible to lose to it.

    I think we may persist in this disagreement for some time ;-)

    • I’m not sure what disagreement you mean. I agree with you that the state of information “Clinton has 70% chance of winning” which is approximately what 538 was saying at the time is a state of information that is based on wrong assumptions about the world, and if you persist in using it you’ll continue to be repeatedly wrong. Some of the wrong assumptions are things like

      “polling methods that worked when phone calling was a very different activity will continue to be relevant and low bias today”

      It could have by extreme fluke been the case that phone polling and other methods were perfectly fine and we just got some weird numbers throughout the whole of the lead-up to 2016, and if we persist in using those old polling methods we’ll do fine next time, but we knew, or should have known, that it wasn’t true.

  12. Paul Alper says:

    Andrew wrote: “and I was like, wtf? I don’t say wtf very often”

    So of course google

    “wtf andrew gelman”

    and you can see for yourself how frequent that is. But perhaps more relevant is the unnecessary and ubiquitous “like.”–like–on-the-rise-/

  13. Jordan Anaya says:

    I don’t know what MRP is, but I am a basketball expert.

    Nate Silver came up with a player projection system which he named after Carmelo:

    In that introduction he links to an article about Anthony he wrote:

    Maybe he thinks Carmelo isn’t as bad as people think but still not worth his contract? But that article is old and Carmelo’s reputation has taken multiple hits since then.

    Basically he went to the Thunder and couldn’t adjust his game to be the third option, then went to the Rockets and couldn’t adjust his game to be a role player (he still thought he was a star). So maybe being a Carmelo is thinking you should be the primary option when in reality you should be the fifth or sixth line of evidence?

    • Andrew says:


      MRP is something you do with raw data, to adjust the sample to match the population. An example is here. Another example is here. When Nate does poll analysis, I think he uses published summaries: rather than taking raw survey responses, he relies on whatever adjustments were done by the polling organizations. I understand this decision on Nate’s part: there are lots of polls out there that will publish summaries but not release their raw data—but the point is that he wouldn’t be using MRP for most of what he does. So maybe he thinks MRP is overrated because he doesn’t use it. The thing is, there’s a lot that can be learned from analysis of survey data. MRP is a really powerful tool.

  14. Bob says:

    I really don’t understand the US statisticians’ obsession with polling. Well, I do a bit, but there must be much better uses of your time.

    • Andrew says:


      It’s worse than that! I also do a lot of sports analysis. If you want to talk about unimportant topics that statisticians are obsessed with, sports has got to top the list.

      • jd says:

        “a lot of sports analysis” – where??? That golf example doesn’t count.
        “It’s worse than that!” – hey now, I’d go for more sports topics on this blog:) It’s at least as important a topic as gremlins and the end of the world.

  15. jd says:

    What does MRP stand for?

    • Multilevel Regression and Poststratification

      basically a technique where you learn to predict the average results of various groups using regression on survey data, and then you figure out the average results of a full population by predicting using the regression equations for the *known* demographics of the whole population, rather than relying on the survey to accurately sample from every demographic group in the appropriate proportion.
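
To make the description above concrete, here is a minimal sketch of the poststratification step in Python. It is only an illustration, not anyone's production method: the function name `poststratify`, the `prior_strength` parameter, and the toy data are all made up for this example, and the simple shrinkage of each cell mean toward the overall mean is a crude stand-in for a real multilevel regression, which would normally be fit with a package such as Stan.

```python
def poststratify(responses, population_counts, prior_strength=5.0):
    """Estimate a population average from a survey.

    responses: list of (cell, y) pairs, with y a 0/1 survey answer.
    population_counts: dict mapping each demographic cell to its *known*
        population size (e.g. from a census).
    prior_strength: hypothetical tuning knob; each cell mean is pulled
        toward the overall sample mean as if we had this many extra
        respondents answering at the overall rate. This is an assumed
        stand-in for partial pooling in a multilevel regression.
    """
    totals, counts = {}, {}
    for cell, y in responses:
        totals[cell] = totals.get(cell, 0.0) + y
        counts[cell] = counts.get(cell, 0) + 1

    overall = sum(totals.values()) / sum(counts.values())

    # Step 1 ("regression"): a partially pooled estimate for every cell.
    cell_estimates = {}
    for cell in population_counts:
        n = counts.get(cell, 0)
        denom = n + prior_strength
        if denom == 0:
            # No data and no pooling: fall back to the overall mean.
            cell_estimates[cell] = overall
        else:
            cell_estimates[cell] = (
                totals.get(cell, 0.0) + prior_strength * overall
            ) / denom

    # Step 2 (poststratification): weight cells by known population size,
    # not by how many respondents happened to land in each cell.
    total_pop = sum(population_counts.values())
    population_estimate = sum(
        cell_estimates[c] * population_counts[c] for c in population_counts
    ) / total_pop
    return population_estimate, cell_estimates


# Toy survey that badly under-samples the "young" cell.
survey = [("young", 1), ("young", 1)] + [("old", 0)] * 8
census = {"young": 50, "old": 50}
estimate, by_cell = poststratify(survey, census)
```

In this toy survey the raw sample mean is 0.2 because older respondents are over-represented; reweighting the cell estimates by the census counts moves the answer toward what an evenly split population would give, and the shrinkage keeps the two-respondent "young" cell from being taken entirely at face value.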

  16. jd says:

    Apparently there is another “jd” out there…
