
Stan goes to the World Cup

Leo Egidi shares his 2018 World Cup model, which he’s fitting in Stan.

But I don’t like this:

First, something’s missing. Where’s the U.S.??

More seriously, what’s with that “16.74%” thing? So bogus. You might as well say you’re 66.31 inches tall.
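There is a hard floor on what a figure like "16.74%" could mean. Suppose, and this is an assumption since the post doesn't say how the numbers were produced, that each win probability is the fraction of N simulated tournaments a team won. The Monte Carlo standard error alone is then sqrt(p(1 − p)/N), before counting any uncertainty in the model itself, which is usually much larger. The helper names below are hypothetical, for illustration only.

```python
import math

def mc_standard_error(p, n_sims):
    """Monte Carlo standard error of a probability estimated as a simulation frequency."""
    return math.sqrt(p * (1.0 - p) / n_sims)

def honest_percent(p, n_sims):
    """Hypothetical helper: format p as a percentage, keeping only the decimal places
    that are not swamped by Monte Carlo noise (model uncertainty is ignored here)."""
    se_pct = 100 * mc_standard_error(p, n_sims)  # standard error in percentage points
    if se_pct == 0:
        decimals = 0
    else:
        # keep digits down to the order of magnitude of the standard error
        decimals = max(0, -math.floor(math.log10(se_pct)))
    return f"{100 * p:.{decimals}f}%"

print(honest_percent(0.1674, 1_000))    # about 1 point of noise
print(honest_percent(0.1674, 100_000))  # about 0.1 point of noise
```

With 1,000 simulations the estimate is worth "17%"; even 100,000 simulations only support "16.7%". The second decimal place is noise either way.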

Anyway, as is often the case with Bayesian models, the point here is not the particular predictions but rather the transparency of the whole process. If the above win probabilities look wrong to you: Fine. You’re saying you have prior knowledge that’s not captured in Leo’s model. The thing to do next is to formally express that knowledge, alter the model, and re-fit in Stan.


  1. Peter says:

    The probability that Russia wins the world cup is about zero (I wouldn’t bet any amount on this team).
    A model that gives Russia about the same chance of winning the world cup as France, Argentina, Portugal and Spain – is this a joke?
    In the current FIFA ranking, Russia is 70th, while France is 7th, Argentina 5th, Portugal 4th and Spain 10th.
    In other words, if Russia had had to qualify for the tournament (which it didn’t as the host), it most likely would have failed.

    I guess this model shows that if one wants to apply statistics to the real world with reasonable results, one needs to have some subject-matter knowledge.

    • Andrew says:


      Yeah, I wondered about that too. That’s one reason I wrote that last paragraph above. The point is to build a reasonable model. Spewing out all these probabilities can be helpful, as they can give a clue to what’s missing and needs to be added in for the model to make sense.

    • Curious says:

      Without having run the calculations myself, I suspect Russia’s probability reflects the fact that the home team has won 30% of World Cup championships since 1930.

      • Peter says:

        I figured that this was the case. In the terminology of regression analysis, I think there is massive omitted variable bias here. That is, home advantage only helps if you already have a good or very good team, so the true home advantage is much weaker than the naive reading of the statistic (hosts have won 30% of World Cups since 1930) suggests.

        • Curious says:

          After looking at the methodology a bit more closely, the statistical information, if it is used, would have to enter via the bookie odds as the authors do not explicitly include it in their model.

          Also, the share of top-four finishes for the host country is 65%.

          I agree that this is simply observational data and that the causal mechanism may not be well understood (though I do not know this literature), but I believe home-field advantage is often taken into account when bookies set their initial odds.

  2. Anonymous says:

    Maybe of interest:
    Why predicting the winner of the World Cup is so difficult. June 14 2018
    This article contains a table with predictions from various models.

    What makes a country good at football? June 9 2018

    Editorial: How to win the world cup. Though tainted by corruption, the tournament rewards liberalism, internationalism and open markets. [article title from the print edition of The Economist magazine]

  3. KKnight says:

    Yes, Russia’s chances of winning are very slim. However, their FIFA ranking is artificially low because this ranking weights competitive matches very heavily, and Russia, as the host nation, hasn’t played any in the past two years. Brazil suffered a similar fate in 2014, although less drastically.

    A more accurate system is the Elo ranking, which can be found online somewhere (Google it). This ranking has Russia at 41 (or so) essentially tied with Scotland and the Czech Republic.
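For readers who haven't met Elo: after every match, each team's rating moves by K times the gap between the actual result and the result the ratings predicted. Below is a minimal sketch of the basic textbook update. The K value is a hypothetical placeholder, and the published World Football Elo ratings add refinements (such as a match-importance weight, a goal-difference multiplier and a home-advantage bonus) that this sketch leaves out.

```python
def expected_score(rating_a, rating_b):
    """Expected score for team A under the standard Elo logistic curve (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, score_a, k=20.0):
    """Return updated (rating_a, rating_b) after one match.

    score_a is 1 for an A win, 0.5 for a draw, 0 for an A loss.
    k is a hypothetical weight; real systems vary it by match importance.
    """
    e_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - e_a)
    # zero-sum: whatever A gains, B loses
    return rating_a + delta, rating_b - delta

# Two equally rated teams: the winner gains k/2 points.
print(elo_update(1500.0, 1500.0, 1.0))
```

Because every match moves the ratings, a host that plays only friendlies for two years still gets rated, rather than dropping down the table for lack of competitive fixtures.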

  4. Shravan says:

    I’m not the least bit interested in watching soccer, but any model, about anything really, that puts Germany in number one position has my full support. How could it be any other way?

  5. Leo Egidi says:

    Hi Andrew, Hi all,

    Thanks for posting my model, and thanks for the further comments and suggestions. Below I summarize the main points raised by Andrew and Peter and provide some possible answers:

    1) Russia’s probabilities, and win probabilities in general: yes, I agree some of these estimated probabilities may appear weird, especially for Russia. However, predicting the whole progress of the World Cup in advance is quite artificial, and is not the main purpose of my model at all. Rather, the final group-stage rankings and single-match probabilities provided in the case study look much more realistic, and tend to favor the best teams over the weaker ones. This is mainly due to the inclusion in the model of bookies’ odds for the group stage, which of course are not available for the second part of the World Cup (quarterfinals, semifinals, …).
    So this may explain some of these probabilities. I will update them as the Cup progresses.
    Anyway, host teams, even when they are not very good, often outperform: think of the third place earned by Chile in 1962, the fourth place of Korea in 2002 (reaching the semifinals after surprisingly defeating Italy and Spain), or the good performance of South Africa in 2010. Maybe there is a host-team effect, regardless of ability.
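One way to see why whole-tournament predictions are so much shakier than single-match ones: a title probability is built by chaining several match probabilities, so any error in the later-round probabilities (where, as noted above, no bookies’ odds are available) compounds. Below is a minimal sketch, not Leo’s code, for a hypothetical four-team knockout bracket. The teams and match probabilities are made-up illustrative inputs, and draws are assumed to be resolved by extra time or penalties. It computes title odds both by simulation and exactly.

```python
import random
from collections import Counter

# Hypothetical P(first team beats second team) in a knockout match. Illustrative only.
WIN_PROB = {
    ("A", "B"): 0.60, ("A", "C"): 0.55, ("A", "D"): 0.75,
    ("B", "C"): 0.45, ("B", "D"): 0.65, ("C", "D"): 0.70,
}
SEMIS = (("A", "B"), ("C", "D"))

def p_beats(x, y):
    """Probability that x beats y, using the symmetric entry if needed."""
    if (x, y) in WIN_PROB:
        return WIN_PROB[(x, y)]
    return 1.0 - WIN_PROB[(y, x)]

def play(x, y, rng):
    """Simulate one knockout match and return the winner."""
    return x if rng.random() < p_beats(x, y) else y

def simulate_bracket(n_sims=100_000, seed=1):
    """Monte Carlo estimate of each team's probability of winning the bracket."""
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_sims):
        finalists = [play(x, y, rng) for x, y in SEMIS]
        titles[play(finalists[0], finalists[1], rng)] += 1
    return {team: titles[team] / n_sims for team in "ABCD"}

def exact_champion_prob():
    """Exact title probabilities: P(win semi) * P(win final against either possible opponent)."""
    probs = {}
    for i, (x, y) in enumerate(SEMIS):
        ox, oy = SEMIS[1 - i]  # the other semifinal
        for team, rival in ((x, y), (y, x)):
            p_final = p_beats(ox, oy) * p_beats(team, ox) + p_beats(oy, ox) * p_beats(team, oy)
            probs[team] = p_beats(team, rival) * p_final
    return probs

print(exact_champion_prob())
print(simulate_bracket())
```

Even in this tiny bracket, team A’s title chance (0.6 × 0.61 ≈ 0.37) depends on four different match probabilities. A real World Cup chains seven matches per champion, so small biases in each one add up.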

    2) Refitting in Stan: the idea is now to use the observed results after each group stage and refit the Stan models to update the predictions. Maybe, after this, we could get a glimpse of how the model should be adjusted.

    3) Model details and transparency: in the next few days I will add further model details about priors, bookies’ odds and code to my website, so whoever is interested can interact and propose changes or extensions! In my opinion, statistical modeling is a compromise between art and science, and creative statisticians can be very useful for improving information-rich sports models. Of course, soccer results are hard to predict.
