In preparation for the 2020 College Football season, we did some exciting new work with something called Elo Ratings. We got them ready just in time for kickoff, only for COVID-19 to throw a wrench in the 2020 season, along with all of our lives.
Instead of waiting around, I figured I’d use this time to introduce you to this exciting new stat and show you just how powerful it can be! And we’ll cross our fingers that we actually get to use it in 2020.
What is an Elo?
You mean who is Elo? Elo Ratings were created by physicist Arpad Elo and were originally used to rank chess players (they still are today). Elo Ratings are best-suited for head-to-head games. The basic premise is that at the very start of measuring Elo, you assign every team a score of 1500. As the teams play each other, you keep track of the results, and then update each team's rating up or down depending on the outcome of the game and how good their opponent was.
What goes into College Football Elo Ratings?
Honestly, not much. All we need to calculate a team's updated Elo Rating from week to week is:
- their current Elo Rating
- the actual result of the game (win, loss, or tie)
- the expected result of the game, based on each team’s Elo going into the game (AKA pre-game win probability)
- how much weight you give to the result of each game, called the k-factor
These are the absolute minimum requirements needed to calculate Elo. However, we do throw in a few extra things for good measure. First is home field advantage. After testing a bunch of different values to see which made our calculation the most accurate, we came up with a home field advantage of +55 Elo Points to the home team. For two otherwise even-rated teams, this works out to be about an 8% increase in win probability. So nothing to scoff at. For teams that are already outmatched, being at home doesn’t help them all that much, maybe around 2% depending on the opponent.
The other thing we need to do once per season is regress each team back to the mean score of 1500. This regression factor, from 0 to 1, shows how much consistency a team can hold on to from season to season. In the NFL, teams regress by one-third, meaning they retain 67% of their strength from the previous season. In other sports it may be higher or lower. In College Football, after testing a range of values, we found that .95 was the best option, meaning that from one season to the next, each team gets to keep 95% of its Elo Rating from last year. This is really high, and we were kind of surprised at first. But that just speaks to how strong the powerhouses are at recruiting top talent year in, year out.
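In code, that once-per-season regression is a one-liner. A minimal sketch, using the post's 0.95 carryover and 1500 mean (the function name is ours):

```python
MEAN_RATING = 1500
CARRYOVER = 0.95  # the post's best-tested value: teams keep 95% of their rating

def regress_to_mean(rating, carryover=CARRYOVER, mean=MEAN_RATING):
    """Pull a team's preseason rating back toward the league mean,
    keeping `carryover` of the distance from the mean."""
    return mean + carryover * (rating - mean)

# A 1900-rated powerhouse only drops to 1880 entering the new season.
print(regress_to_mean(1900))  # 1880.0
```

With an NFL-style carryover of 0.67, that same team would fall all the way to 1768, which shows how unusually sticky the college ratings are.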
A little more info about the k-factor
The exact calculation for a team's new Elo Rating is to take their current Elo Rating and add k times the difference between the actual score and the expected score of the game. It looks like this:
New Rating = Current Rating + k * (Actual Score - Expected Score)
Now the k that we landed on is 85. This is pretty darn high. For reference, most NFL Elo Ratings use a k from 20–40, and some sports with long seasons like baseball may use a k as low as 4, meaning that each win has little significance, but the sum of many wins adds up over time. But, as we know, college football has a short 12-game regular season (even shorter this year). And when it comes to getting into the playoffs, each game is make or break. That's why it makes sense that each win holds a lot of weight, especially if a team that was expected to lose by a big margin wins (or vice-versa). This allows Elo to quickly correct itself if a low-rated team comes out and gets a few big wins, or if a powerhouse blows a cupcake game.
That said, the most a team could improve their Elo Rating in one week is 85, if they were to win a game (Actual Score = 1) that they were expected to lose with near certainty (Expected Score = 0). This would add k * 1, or 85 points, to their Elo Rating. And if they lost a game (Actual Score = 0) that they were expected to win easily (Expected Score = 1), then they would lose 85 Elo Points and their rating would go down.
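The update rule above translates directly to code. A minimal sketch with the post's k of 85 (the function name is ours):

```python
K = 85  # weight given to each game's result

def update_rating(current, actual, expected, k=K):
    """New Rating = Current Rating + k * (Actual Score - Expected Score)."""
    return current + k * (actual - expected)

# A huge upset: a team expected to lose with near certainty (expected
# score 0.02) wins, gaining close to the maximum possible 85 points.
print(update_rating(1400, actual=1.0, expected=0.02))  # ~1483.3

# A heavy favorite that loses a sure thing drops the full 85 points.
print(update_rating(1600, actual=0, expected=1.0))  # 1515.0
```

Note that the two teams' changes are mirror images: whatever the winner gains, the loser gives up, so the league's total Elo stays constant.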
One other note: the expected score, or win probability, is a bit more complicated to calculate, but it results in a number between 0 and 1. The actual score is just the outcome of the game. A 1 equals a win, a 0 equals a loss, and a 0.5 would equal a tie, although there are currently no ties in college football.
How accurate are your Elo Ratings?
From 2010–2019, with our most optimized inputs, we came out with a Brier score of .175. A Brier score is the mean squared error (MSE) between the expected score (our predicted win probability using Elo) and the actual outcome (a 1 or 0 for a win or loss). So, lower is better, and .175 is pretty low. What it amounts to is that, on average, our predictions were off by about .4. That doesn't sound great, but keep in mind that if we give two teams a 50% chance of winning each, one of those teams is going to end up winning the game, and we will have been off by .5 in the actual vs. expected scores. A more practical validation of our model is the below graph, which shows how accurate our predictions were at each confidence level.
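Computing a Brier score is just a mean squared error over (predicted probability, outcome) pairs. A minimal sketch with made-up predictions:

```python
def brier_score(predictions, outcomes):
    """Mean squared error between predicted win probabilities (0-1)
    and actual results (1 for a win, 0 for a loss). Lower is better."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

# Hypothetical slate: confident picks that all hit score very well...
print(brier_score([0.9, 0.8, 0.95], [1, 1, 1]))  # ~0.0175
# ...while pure coin flips land at exactly 0.25 no matter what happens.
print(brier_score([0.5, 0.5], [1, 0]))           # 0.25
```

That 0.25 coin-flip baseline is why a .175 over thousands of games, many of them genuine toss-ups, is a solid result.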
Graph of Actual vs. Expected Wins for Each Win Probability Prediction
When we look at each actual outcome vs. our predicted result grouped by .01, we see a pretty darn straight line, meaning that we are fairly accurate with our predictions. For instance, when we said a team had a 95% chance of winning the game, which we did for 247 teams across those seasons, in reality they won that game 96% of the time. That's pretty accurate. Likewise, when we predicted a 5% chance of a team winning, they actually won 6.4% of the time.
We do notice here that we tend to underpredict win probabilities for some lower-tiered teams, meaning there are a good number of upsets in college football. That's something we'd hope to correct in the future with the addition of more data. One example would be what FiveThirtyEight does with the NFL, adding a factor for whether the starting QB is playing, which has a significant effect on the outcome of games; perhaps that is the cause of some of these upsets. Unfortunately, with 130 teams and 65 games every Saturday, it's a bit hard for us to keep up with that at the moment. Another thought is that some of these upsets come early in the season against D-II or D-III teams that our Elo Ratings don't track weekly, since this data only covers D-I teams. As a result, a team may not have had its rating updated since the game it played against a D-I opponent a year ago, and it could be a completely different team by now. If you have any ideas on how to adjust for that, let us know (a safe bet may be to regress teams with fewer than 12 ratings per season closer to 1500).
What we see at the tail ends of the spectrum is that when a team is predicted to win with high confidence (above a 92% win probability), we tend to get the result right more often. This implies that one-sided match-ups, which occur fairly often in CFB, usually go as planned. On the other side, when we give a team below an 8% win probability, you can be fairly confident that probability is accurate.
So what do we do with this information?
Well for one thing, this is just cool to track and follow throughout the season to see how quickly teams can rise and fall. Take a look at LSU’s rise to the Championship last season. Up until the CFP Final, LSU was still the underdog in Elo Ratings.
We can also try to use this data to inform betting decisions on games. This would be most useful when Elo is giving a team a high win probability (above 92% to be safe), and the betting odds imply otherwise. In this case, it could be a good opportunity to take that bet. We’ll be monitoring that this season and giving our predictions via our new newsletter, which you can subscribe to here.
We have to be careful with betting purely based on Elo though, because if we think back to the list of factors going into Elo, it was very short. The data that sportsbooks use to set the odds are much more comprehensive, so the information-gap could potentially be large. That’s why it’s best to use Elo Ratings as a tool, along with context, to find the best options.
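To make the comparison against the market concrete, you'd convert the sportsbook's line into an implied win probability. For American moneyline odds the standard conversion looks like this (the specific lines below are hypothetical):

```python
def implied_probability(moneyline):
    """Convert American moneyline odds to an implied win probability
    (before stripping out the sportsbook's vig)."""
    if moneyline < 0:  # favorite, e.g. -300
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)  # underdog, e.g. +250

# If Elo gives a team 93% but the book prices it like a 75% favorite,
# the gap might be worth a closer look (or a sign Elo is missing context).
print(implied_probability(-300))  # 0.75
print(implied_probability(250))   # ~0.286
```

Keep in mind the raw implied probabilities across both sides of a game sum to more than 1 because of the vig, which is part of why a small Elo edge alone isn't enough.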
Where do we go from here?
We already mentioned adjusting for the starting QB being out. Once we figure out how to do that accurately every week, we'll certainly try to implement it. Another thing that could improve the usefulness of these ratings is factoring in margin of victory. Many would argue that a close game against a weak opponent hints at the flaws of a top-rated team; however, to Elo, a win is a win. We could correct for this by penalizing teams that come into a game with a high win probability and end up winning by a field goal, especially if it comes late in the game (think game control), and by rewarding teams a bit (or not penalizing them as much) for losing by a slim margin a game they were supposed to lose by a wide margin. Lastly, we could give a team a bonus for crushing a competitor in a supposed 50/50 matchup. Of course, all of this has to be tested to see if it actually improves our Brier score. We can add as much data as we like to our model, but unless it actually makes our predictions significantly more accurate, what's the point?
Thanks for reading and we look forward to sharing more Elo Ratings with you each week to see how teams are moving up and down, as well as give projected results for each upcoming game. Remember to subscribe to our new newsletter for all the key stats right in your inbox each week.