Why Expected Points and EPA are kind of broken
tl;dr EPA is difficult to calculate, hard to explain, inherently noisy, and no more stable than metrics like fantasy points, which are far easier to understand and calculate.
Why EPA should be awesome
When it was introduced by Brian Burke by way of Virgil Carter, EPA (Expected Points Added) promised to be a breakthrough in NFL player evaluation. EPA accounts for down, distance remaining to gain a first down, and field position. It also accounts for garbage time. Therefore we might expect EPA to better measure true skill in NFL players. This is exciting. Metrics with this type of potential are pretty rare.
How to calculate EPA (it’s hard)
EPA is calculated by taking the expected point value of every down, distance and field position (“game state”) combination before a play is run, and subtracting it from the expected point value of the new game state after a play is run. Positive values are better than negative values. If you sum up all the EPA values over some period, you get the Total Expected Points Added for that set of plays.
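As a minimal sketch of that definition, the subtraction and the summing step look like this in Python. The function names and the three-play drive are hypothetical, made up purely for illustration.

```python
def expected_points_added(ep_before: float, ep_after: float) -> float:
    """EPA for one play: EP of the new game state minus EP of the old one."""
    return ep_after - ep_before


def total_epa(plays) -> float:
    """Sum EPA over an iterable of (ep_before, ep_after) pairs."""
    return sum(expected_points_added(before, after) for before, after in plays)


# Hypothetical EP values for three consecutive plays on one drive.
drive = [(0.9, 1.4), (1.4, 1.1), (1.1, 2.0)]
drive_total = total_epa(drive)
```

Note that on a single uninterrupted drive the sum telescopes: the total is just the last game state's EP minus the first one's.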
EPA is extremely difficult to calculate. The typical graph you’ll see when people talk about EPA is the 1st and 10 graph. It shows the expected points for 1st and 10 at various positions on the football field.
This is all very nice to look at, but EPA is concerned with the expected points added. This means to calculate it you need the curves for second down. And the curves for third down. And fourth down.
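To make the scale of the problem concrete, here is a rough sketch of building the raw points behind those curves. It assumes (this is not spelled out above) that each play record carries the value of the next score, positive when the offense scores next and negative when the defense does, and that the EP of a game state is the average of that value. The tuple layout and the sample rows are hypothetical.

```python
from collections import defaultdict


def expected_points_table(plays):
    """Average next-score value per (down, distance, yardline) game state.

    Returns {game_state: (mean_next_score, sample_size)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for down, distance, yardline, next_score in plays:
        key = (down, distance, yardline)
        sums[key] += next_score
        counts[key] += 1
    return {key: (sums[key] / counts[key], counts[key]) for key in sums}


# Hypothetical rows: (down, distance, yardline, next_score)
plays = [
    (1, 10, 20, 7), (1, 10, 20, 0), (1, 10, 20, -3), (1, 10, 20, 3),
    (2, 6, 24, 7), (2, 6, 24, -7),
]
table = expected_points_table(plays)
```

Keeping the sample size next to each average matters: rare game states will have only a handful of plays behind them, which is where the noise comes from.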
An Expected Points sample calculation
Imagine that you are on offense and you picked up 4 yards on 1st and 10 from your own 20 yard line. What does the curve for 2nd and 6 look like? And what is the value of 2nd and 6 from the 24 so that we can calculate EPA? The plot looks like this:
Just by eyeballing it, the value of 2nd and 6 from the 24 looks like it is less than 0. Since the value of 1st and 10 from the 20 was positive, we can conclude that EPA is likely negative for this play. But we also see a problem: while there is still a strong linear trend up and to the right, the variance has increased considerably. How much confidence should we put in the idea that 2nd and 6 from the 24 has negative Expected Points?
One strategy - in fact the one used by Burke and others when writing about EPA - is to use LOESS smoothing to create a trend line through the data points. Here is what that graph looks like for 2nd and 6.
With a nice LOESS line smoothing things out, it now appears that 2nd and 6 from the 24 has a positive value. We are still just eyeballing things though. What we really need are formulae from which to make predictions. Sadly, LOESS doesn’t produce coefficients which we can extract and use to create predictions on new data.
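For readers who haven't seen the smoothing step spelled out, here is a bare-bones sketch of the idea behind LOESS: a weighted straight-line fit using only the points nearest the yard line of interest, with tricube weights. It is not the exact procedure Burke or anyone else used, the `frac` parameter and the data are hypothetical, and a real analysis would lean on a library implementation.

```python
def loess_at(xs, ys, x0, frac=0.3):
    """Locally weighted linear fit evaluated at x0 (tricube weights)."""
    n = len(xs)
    k = max(2, min(n, round(frac * n)))  # how many neighbours to use
    nearest = sorted((abs(x - x0), x, y) for x, y in zip(xs, ys))[:k]
    h = nearest[-1][0] or 1.0  # bandwidth: distance to the farthest neighbour
    weights = [(1 - (d / h) ** 3) ** 3 for d, _, _ in nearest]
    if sum(weights) == 0:  # degenerate case: every neighbour sits on the edge
        weights = [1.0] * len(nearest)
    sw = sum(weights)
    mx = sum(w * x for w, (_, x, _) in zip(weights, nearest)) / sw
    my = sum(w * y for w, (_, _, y) in zip(weights, nearest)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, (_, x, _) in zip(weights, nearest))
    sxy = sum(w * (x - mx) * (y - my) for w, (_, x, y) in zip(weights, nearest))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


# Hypothetical noisy EP values by yard line: a linear trend plus a wiggle.
yardlines = list(range(1, 100))
noisy_ep = [0.05 * x - 1.0 + (0.4 if x % 2 else -0.4) for x in yardlines]
smoothed_at_24 = loess_at(yardlines, noisy_ep, 24)
```

The function returns one smoothed value per call and nothing else, which illustrates the limitation: there is no global set of coefficients to carry over to new data.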
To get coefficients we need to fit a linear model to the data. Here’s a simple first-order model that fits the LOESS version well.
And here’s a polynomial version that is slightly better.
Depending on which model you use, the value of 2nd and 6 from the 24 is either 0.086 or 0.078, a difference of about 9%.
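Here is a sketch of how the two fits can be produced and compared. The `polyfit` helper is a plain least-squares fit written out by hand so the example has no dependencies, and the curve it is fit to is made up. Only the 0.086 and 0.078 figures at the end come from the models above.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest order first."""
    m = degree + 1
    # Normal equations A c = b for the polynomial basis 1, x, x^2, ...
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def relative_difference(a, b):
    return abs(a - b) / max(abs(a), abs(b))


# Hypothetical smoothed curve; yard lines are rescaled to 0-1 for stability.
xs = [y / 100 for y in range(1, 100)]
ys = [-1.2 + 5.0 * x + 0.8 * x * x for x in xs]
first_order = polyfit(xs, ys, 1)
second_order = polyfit(xs, ys, 2)
ep_first = predict(first_order, 0.24)
ep_second = predict(second_order, 0.24)

# The gap between the two model values quoted in the text.
gap = relative_difference(0.086, 0.078)  # roughly 0.093
```

Unlike the smoother, each fit yields a short list of coefficients that can be reused to predict any yard line.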
So now we know the Expected Point value of our game state on second down. We still need to get the EP value for our original game state.
Again there are multiple ways to model the 1st and 10 plot. Depending on which you choose, the value might be either 0.207 or 0.182. (It’s worth noting that 1st and 10 from the 20 is by far the most common game state in football, so it should have the lowest variance because of the large sample size.) It might also be above, below or between these values.
Original state EP: 0.207 or 0.182
New state EP: 0.086 or 0.078
Putting aside model choice for a minute, we can use the above values to calculate EPA. The EPA that resulted from moving from 1st and 10 at the 20 to 2nd and 6 at the 24 is somewhere between -0.129 and -0.096.
Maybe. Again, we don’t really know for sure.
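The range above comes from trying every pairing of the candidate values. A quick sketch of that arithmetic, using the four numbers from the text (the variable names are just for illustration):

```python
from itertools import product

original_state_ep = [0.207, 0.182]  # 1st and 10 at the 20, two candidate models
new_state_ep = [0.086, 0.078]       # 2nd and 6 at the 24, two candidate models

# EPA = EP after the play minus EP before it, for every model pairing.
epa_candidates = sorted(
    round(after - before, 3)
    for before, after in product(original_state_ep, new_state_ep)
)
epa_low, epa_high = epa_candidates[0], epa_candidates[-1]
# epa_candidates is [-0.129, -0.121, -0.104, -0.096]
```

Four equally defensible modeling choices give four different answers for a single play, and the spread between the extremes is about a quarter of the estimate itself.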
Worse, we have to do this for every possible down and distance. This includes down and distance combinations that are rarer, and thus have much higher variance. Take 3rd and 15, which is a downright mess.
Is the relationship here really non-linear? Or is this just noise? In other words, on 3rd and 15 near your own goal, are you really in a better position to score points than at the 20?
I’m not the first person to point out the problem with the relatively high level of uncertainty in Expected Points. Trey Causey wrote eloquently about the issue a year ago. He concluded:
Using data from 2010-2013, the 95% confidence interval for expected points for a 1st and 10 at the opponent’s 35 ranges from 3.0 to 3.68. We think that plausible values for the ‘true’ expected points from that scenario lie in that range, based on the data we’ve collected. Say the ‘true’ value is closer to 3.0, say 3.15 and we make a decision that is supposed to net us a half a point in expected points. It’s entirely possible that we haven’t really made any positive gains at all!
Clear eyes, full hearts, can’t lose
Despite these issues I went ahead and created curves for each down and distance combination and then fit linear models to the curves. I then wrote some Python code to generate EPA values for each play since 2000 using the Armchair Analysis database. Then I aggregated that data by quarterback and season and performed a join to facilitate a year-over-year analysis.
To test the stability of Passing EPA for QBs, I bootstrapped the r-squared statistic by regressing Passing EPA on itself year-over-year. I also bootstrapped fantasy points on itself year-over-year. The sample size for both analyses was 361 player-season pairs, and included only QBs who played in 12 or more games in season N. Here’s a density/histogram plot of the two results:
The x-axis is the year-over-year r-squared value, and the y-axis is the probability that a particular r-squared appears in the 5,000-sample distribution. The overlap between the two distributions is nearly complete.
In words, I found that season N Passing EPA explains roughly 30% of the variance in season N+1 Passing EPA. I also found that Passing EPA and Fantasy Points are equally stable year over year.
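For anyone who wants to run the same kind of stability check on their own metric, here is a minimal sketch of bootstrapping a year-over-year r-squared. It is not the code used for this analysis. The data is synthetic, the function names are hypothetical, and the example uses 1,000 resamples rather than 5,000 to keep it quick.

```python
import random


def r_squared(xs, ys):
    """R-squared of a simple linear regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy ** 2 / (sxx * syy)


def bootstrap_r_squared(pairs, n_boot=5000, seed=0):
    """Resample (season N, season N+1) pairs with replacement, refit each time."""
    rng = random.Random(seed)
    n = len(pairs)
    results = []
    for _ in range(n_boot):
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        xs, ys = zip(*sample)
        results.append(r_squared(xs, ys))
    return results


# Synthetic stand-in for 361 player-season pairs of some metric.
data_rng = random.Random(42)
pairs = []
for _ in range(361):
    season_n = data_rng.gauss(0, 1)
    season_n_plus_1 = 0.55 * season_n + data_rng.gauss(0, 0.85)
    pairs.append((season_n, season_n_plus_1))

distribution = sorted(bootstrap_r_squared(pairs, n_boot=1000))
low, high = distribution[25], distribution[974]  # rough 95% interval
```

Running this once per metric and overlaying the two distributions gives the kind of comparison described above.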
Stability is important because if we want to claim that a metric is capturing the skill inherent in a player, we would expect this skill to carry over from year to year. Moreover, Passing EPA and Fantasy Points are highly correlated. EPA really isn’t adding anything new to the picture.
It’s getting late on a Sunday night, so I’ll end it with this:
EPA is difficult to calculate, hard to explain, inherently noisy, and no more stable than metrics like fantasy points, which are far easier to understand and calculate. As a nod to the law of parsimony, maybe it’s best that we just stick with vanilla fantasy points until something better comes around.