Barley, Hops, and Bayes: Predicting The World Beer Cup
This post predicts how many awards the United States will win at an international beer competition, using exploratory data analysis and Bayesian methods.
By Reginal Eps, EndlessPint.
The WBC is the World Beer Cup, or what I am starting to consider the "World" Beer Cup (wBC), in the same way Americans hold a championship for a sport only they play and declare themselves "World Champs". I think you should take the WBC tagline of "The Most Prestigious Beer Competition in the World" with a grain of salt, at least the “World” part.
WBC v. wBC
Let's take a look at some of the numbers to see what makes me skeptical of calling this a full-fledged world competition. First up: entrant count by country for the previous Cup in 2014 (4,750+ entrants).
2014 WBC Participant Countries - Top-10 by Beer Entries
The top 10 countries account for 91% of all entries, with the US alone accounting for 72% of all 2014 beer entries. Discrepancies of this sort are not uncommon [1]. Consider that each nation differs in population, GDP, and brewing history, all of which influence brewery counts. Let's drive home the point with an arguably unnecessary chart:
Twenty-two of the fifty-eight participant countries in 2014 won at least one of the 281 medals awarded. When looking at the breakdown of the winning countries we get a similar picture to the one above. I will spare you an accompanying chart but will highlight this skew by listing the top ten winning nations (totaling 94% of the medal haul).
|country|awards|% of total|
There is some difference between the two lists but predictably the top entrants tend to dominate the awards.
According to Alltech's 2015 research, the US and Europe account for 86% of the 10,000+ craft breweries worldwide. Additionally, the US, with 4,000+ breweries, has far and away the most breweries of any country. Though the WBC may not be unrepresentative of reality, it still gives me pleasure to refer to the Cup as wBC or W(n)BC, World (not-quite) Beer Cup. After all, in 2014 California had more awards (36) than every country not named "USA", and two other states, Colorado (22) and Oregon (18), each had more awards than all but one other country, Germany (27) [2, 3]. Say it with me now:
My initial interest in this year's wBC (you see I'm not letting it go) was to look into past performances of countries and make predictions on the breakdown of awards for 2016. Based on the similarity with recent past Cups, with 96 styles in competition and the US accounting for 71% of the participating breweries (accessed: APR-24, 2016), a US medal haul in the range of 193 - 207, with an over/under for the wager-inclined at 201, seems reasonable [4].
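For the curious, the arithmetic behind that range can be sketched in a few lines of Python. This is a rough reconstruction under assumptions, not the exact calculation: the function name `medal_range` is hypothetical, the past winning rates in the usage line are made-up placeholders rather than actual WBC results, and reading the "+/- 2.5%" band as percentage points is my interpretation (it does appear consistent with 193 - 207 out of 288 medals).

```python
from statistics import mean, median


def medal_range(past_rates, medals_available, band=0.025):
    """Turn past winning rates into a (low, high, over_under) medal estimate.

    past_rates       -- share of all medals won in each past Cup (0..1)
    medals_available -- medals up for grabs this year (96 styles x 3 = 288)
    band             -- half-width around the median rate, in percentage
                        points (assumed reading of "+/- 2.5%")
    """
    med = median(past_rates)
    avg = mean(past_rates)
    low = round((med - band) * medals_available)
    high = round((med + band) * medals_available)
    # Over/under naively set at the midpoint between median and average rate.
    over_under = round((med + avg) / 2 * medals_available)
    return low, high, over_under


if __name__ == "__main__":
    # Placeholder rates for illustration only, NOT real WBC history.
    hypothetical_rates = [0.66, 0.70, 0.72]
    print(medal_range(hypothetical_rates, 96 * 3))
```

Swapping in the real per-Cup US winning rates would be the only change needed to reproduce a range of this kind.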
The Bayesian model [5] run on the data is slightly more conservative in its output distribution (middle plot below), but I believe this is skewed by the low number of categories in the first few years. Even so, given this and the similarity to recent years mentioned above, it would seem a safe bet that the US gets at least 190 medals, and likely closer to 200.
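To give a flavor of what a Bayesian take on this question can look like, here is a minimal sketch. It is not the model described above; it swaps in a much simpler Beta-Binomial setup that pools past medals, treats each medal as an independent trial, and simulates the coming Cup. The function name `predict_us_medals`, the flat Beta(1, 1) prior, and the medal counts in the usage line are all assumptions for illustration, not actual WBC data.

```python
import numpy as np


def predict_us_medals(past_us_medals, past_total_medals, medals_available,
                      prior_a=1.0, prior_b=1.0, draws=20000, seed=0):
    """Posterior-predictive US medal count under a pooled Beta-Binomial model.

    Returns the 2.5th, 50th, and 97.5th percentiles of simulated medal counts.
    """
    rng = np.random.default_rng(seed)
    wins = sum(past_us_medals)
    losses = sum(past_total_medals) - wins
    # Conjugate update: Beta prior + binomial counts -> Beta posterior.
    share = rng.beta(prior_a + wins, prior_b + losses, size=draws)
    # For each sampled share, simulate how many of this year's medals go to the US.
    medals = rng.binomial(medals_available, share)
    return np.percentile(medals, [2.5, 50, 97.5])


if __name__ == "__main__":
    # Placeholder counts for illustration only, NOT real WBC history.
    low, mid, high = predict_us_medals([190, 200], [280, 284], 96 * 3)
    print(f"95% interval: {low:.0f} - {high:.0f}, median {mid:.0f}")
```

Because this version weights every past medal equally, early Cups with fewer categories pull on the estimate less than they would in a model that treats each year as one observation.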
General past percentage rate extrapolation for additional countries:
The awards are certainly an honor and I expect the recognition bestowed upon past and future winners is well earned. Truthfully I don't really care about most of the breweries and categories represented by the WBC (I'm putting the capital letter back, no hard feelings) and yet that's the beauty of the whole thing, as far as I'm concerned. We all win through the diversity and competition encouraged by having such an event.
[1] See: FIFA World Cup (hmm, again with the "Cup"); UN Security Council; Ryder Cup (Cup, again? D'oh!); etc.
[2] These numbers reflect the ones proclaimed by the "World Beer Cup 2014 Fact Sheet".
[3] Not too dissimilar to US state GDP performance versus entire nations.
[4] 288 medals up for competition (96 x 3); mind you, we are referencing listed breweries (1,452 US out of 2,057 total) as a proxy for beer entries; used +/- 2.5% around the median winning rate for range; US over/under naively set at midpoint between median and average winning rate.
[5] Hugely indebted to both IBM's Data Science Workbench tutorial as a jumping off point and especially Probabilistic Programming and Bayesian Methods for Hackers, Chapter 1, in doing this work.
Original. Reposted with permission.