Super Bowl 47 is just a few days away, and cubicle dwellers across America are gearing up for another round of their office Super Bowl Squares pool. If you’ve never participated in one, the rules are simple: A 10×10 grid is created, and players buy one or more squares, with the money going into the pool. Once all the squares have been bought, each square is randomly assigned a unique ordered pair of numbers (x, y), where x and y each range from 0 to 9. At the end of each quarter, if the two teams’ scores modulo 10 (that is, the last digit of each score) match one of your number pairs, you win a certain amount from the pool. There are several variations, but typically the ends of the first, second, and third quarters pay out a small amount, and the end of the game pays out a larger amount.
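To make the matching rule concrete, here is a minimal Python sketch. The function name and the scores are made up for illustration; they are not part of any pool software or of the data set used later.

```python
def winning_square(score_a, score_b):
    """Return the (x, y) square that wins for the given pair of team scores."""
    # Only the last digit of each score matters.
    return (score_a % 10, score_b % 10)


# Hypothetical cumulative scores at the end of each quarter.
quarter_scores = [(7, 0), (10, 7), (17, 14), (24, 17)]
for quarter, (a, b) in enumerate(quarter_scores, start=1):
    print(f"Q{quarter}: {a}-{b} -> square {winning_square(a, b)}")
```

A 24-17 final, for example, pays out to whoever holds the (4, 7) square.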
Since American football scoring is a bit odd compared to other sports (7 points for a touchdown + extra point, 3 points for a field goal, etc.), many players consider numbers like 7, 3, and 0 to be good, and numbers like 2 and 5 to be bad. But are these assumptions justified? Are there certain pairs of numbers that have a higher probability of winning than others?
To answer this question, I teamed up with fellow Continuum employee Ben Zaitlen. Earlier in the month, Ben pulled NFL play-by-play data aggregated by Brian Burke at advancednflstatistics.com. The data set contains data for every play of every NFL game dating back to the 2002 season. We did our analysis in a notebook on Wakari, which you can view here. Playing with this data also gave us the opportunity to use a new tool for collaborative analysis released by the Wakari team: bundles.
Often when doing collaborative data analysis, we need to share not just the resulting plots, or even the plots and the scripts that generated them, but the entirety of the analysis. This means a cohesive collection of analysis, plots, scripts, and data. Wakari bundles enable collaborative data analysis from data to insight. Bundles can be thought of as git repositories with some added metadata. This enables us to share IPython notebooks, Python scripts, and data. In the future, bundles will be fully integrated with Wakari and will describe the conda environments as well as plots, scripts, and data. Currently, you have to use our commands to update your bundles, but we’ll be adding a full git server soon, after which you can use git in a more complete and natural calling scheme.
Making a bundle
Creating a bundle is very easy. I execute the wk bundle command inside the directory I want to share, and specify the path to any notebooks I want to display.
The bundle command creates the associated metadata inside the bundle, listing the notebooks that we wish to display as well as any Anaconda environments we wish to package. We can then publish this bundle using the wk publish command.
The command outputs a link that will direct you to a page which has the name and description of the bundle, as well as any notebooks defined.
The publish command generates a public view of the bundle, which you can reach here. By default, all bundles that you publish are accessible by anyone. You can password-protect a bundle using the wk sharing command. The public link provides a button users can click to add the bundle to their Wakari account.
If you’ve grabbed a copy of someone’s bundle and you wish to update it, you can do so using the wk fetch command.
Let’s return to our question of what the most likely scores modulo 10 are per quarter. To answer this, Ben and I used all the data from 2002 to 2011 and generated a heat map for each quarter of a game. Each square represents the percentage of games in which that pair of scores modulo 10 occurred at the end of a given quarter. We see from the plot below that the story is a little more complicated than simply selecting 7, 3, and 0.
Note: the upper-left triangle is blacked out because a square of X,Y is equivalent to a square of Y,X.
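The counting behind a heat map like this can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the code from our notebook: the function name, the input layout (one tuple of quarter and cumulative scores per game per quarter), and the sample rows are all hypothetical.

```python
from collections import Counter, defaultdict


def square_frequencies(quarter_end_scores):
    """Map quarter -> {(hi, lo): percentage of games} for scores modulo 10.

    quarter_end_scores: iterable of (quarter, score_a, score_b) tuples,
    holding each game's cumulative scores at the end of that quarter.
    """
    counts = defaultdict(Counter)
    for quarter, score_a, score_b in quarter_end_scores:
        a, b = score_a % 10, score_b % 10
        # Treat X,Y and Y,X as the same square: store the larger digit first.
        counts[quarter][(max(a, b), min(a, b))] += 1
    return {
        quarter: {pair: 100.0 * n / sum(c.values()) for pair, n in c.items()}
        for quarter, c in counts.items()
    }


# Made-up rows for illustration only, not taken from the real data set.
sample = [(1, 7, 0), (1, 0, 7), (1, 3, 3), (1, 10, 7)]
freqs = square_frequencies(sample)
print(freqs[1])
```

Folding each pair so the larger digit comes first is what leaves one triangle of the grid empty; the resulting percentages for each quarter can then be handed to any plotting library to draw the heat map.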
There are definitely pairs of numbers that are good and bad to have at the end of the first quarter, but this becomes less true as the game progresses.
In the first quarter, the most common outcome is 7,0. This could be a touchdown with an extra point or a score of 10-7 (which is fairly common). We also see the squares of 3,3 and 3,0 occur with some frequency, though not nearly as often as 7,0. In the second quarter, we start seeing more variability. Notice that the maximum of the gradient has shifted compared to the first quarter: 7,0 is still the most likely square, but it’s half as likely as it was in the first quarter, and pairs of 3,3 and 3,0 increase their relative occurrence. In the third quarter, the maximum has again moved down as more squares occur with greater relative frequency. Pairs 7,0, 3,3, and 3,0 are still the most likely outcomes, but it’s important to notice that 7,0 only has a 10% likelihood. It’s also important to note that 7,4 has nearly the same frequency as 3,3, a portent of things to come…
Finally, in the last quarter, with the game decided and riches to be had, we see that 7,4, 3,0, and 7,0 make up the top 3 squares (6.4%, 7.6%, and 4.4% respectively). This means the standby assumption of games ending in 7, 3, and 0 doesn’t quite capture the full story. If you are lucky enough to grab the 0,7 square for your pool, you’re in good shape throughout the game; if you hold squares with 3 and 4, you should probably stick around and see how the game plays out. As for 1, 2, 5, 6, 8, and 9, it’s not unreasonable to consider your money lost, and consoling yourself with spinach-cheese dip is recommended.
Share and Re-Share
Perhaps Super Bowl games are unlike regular games. Maybe playoff games make better predictors for Super Bowl scores. If you want to explore this idea further, or explore the NFL play-by-play data set in other ways, we encourage you to sign up for the Wakari beta here and import this bundle into your Wakari environment. If you are working on an analysis collaboratively, the ability to pass code along with data not only solves a data locality problem, but also invites a type of exploration and play that can otherwise be dampened when collaborating through email, phone, etc.
As for who is going to win Super Bowl 47? Continuum Analytics doesn’t (yet) have a Python module for predicting Super Bowl winners, so I’ll do this the old-fashioned way: my gut says San Francisco by a touchdown…