Originally posted on 9/10/15
Fantasy football: where owning a sports franchise is within anyone’s reach, a year of pride is put on the line, and grown adults cry almost every Sunday. I’ve been playing competitive fantasy football for more than 10 years with 25 friends, all avid NFL fans. I’m the Peyton Manning of my league. I’m a fantastic regular season performer. Then the playoffs roll around, and every position player on my roster forgets he’s supposed to be good at football.
This fall, I’ll take any advantage I can get. One of the biggest strategic changes over the last 5+ seasons has been a power shift from position players to quarterbacks. You want a good quarterback. You really want a good quarterback. Good quarterbacks now account for the most points in almost every league, and the guys at the top account for many more points than the guys in the middle of the pack. Long story short, for the love of Jon Gruden, draft a good quarterback.
What’s cool is that instead of relying on online research and fantasy football manuals that aggregate “expert analysis” from old guys living in their mothers’ basements, I remembered that I worked for a company that, you know…does analytics for a living. The benefit of machine learning over black box predictions from places like ESPN and Yahoo! is that machine learning will not only project a quarterback’s stats, but it’ll tell me how it arrived at each prediction. Knowing the underlying components of each predictive model enables me to apply my “domain expertise” to determine whether or not the model, and the predictions, check out.
We thought it’d be interesting to a whole lot of people if we could leverage Eureqa to eat data for breakfast and have it tell us which quarterbacks we should be eyeing in our upcoming fantasy drafts. We analyzed data from 2007-2014 for all quarterbacks who started at least 10 games in a season. Using Eureqa, we generated seven unique predictive models for pass yards, passing touchdowns, interceptions, rush yards, rushing touchdowns, two-point conversions made, and fumbles lost. That is: which signals from past data are most influential in predicting a quarterback’s passing yards this season? And, based on those signals, how many yards will he actually throw for? We then aggregated all of the player’s predicted statistics to yield a “total points” column and stack-ranked our top 20 performers. Football fans will not only enjoy our recommended “cheat sheet” for quarterbacks; they’ll also be fascinated to learn about the most important signals that guided our predictions.
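For the curious, the aggregation step can be sketched in a few lines of Python. This is only a rough illustration: the scoring weights below are an assumption (a common standard-scoring setup, and leagues differ), and the two quarterbacks and their stat lines are made-up placeholders, not Eureqa output.

```python
# Assumed scoring weights (points per unit of each stat). Leagues differ;
# these are a common default, not taken from the article.
SCORING = {
    "pass_yards": 1 / 25,
    "pass_tds": 4,
    "interceptions": -2,
    "rush_yards": 1 / 10,
    "rush_tds": 6,
    "two_point_conversions": 2,
    "fumbles_lost": -2,
}


def total_points(projection):
    """Sum each projected stat times its scoring weight."""
    return sum(SCORING[stat] * projection[stat] for stat in SCORING)


def stack_rank(projections, top_n=20):
    """Return (name, points) pairs sorted from most to fewest points."""
    ranked = sorted(
        ((name, total_points(stats)) for name, stats in projections.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical stat lines, for illustration only.
projections = {
    "QB A": {"pass_yards": 4000, "pass_tds": 30, "interceptions": 10,
             "rush_yards": 200, "rush_tds": 2,
             "two_point_conversions": 1, "fumbles_lost": 3},
    "QB B": {"pass_yards": 3500, "pass_tds": 20, "interceptions": 12,
             "rush_yards": 600, "rush_tds": 5,
             "two_point_conversions": 1, "fumbles_lost": 3},
}

for name, points in stack_rank(projections):
    print(f"{name}: {points:.1f}")
```

Swap in your own league's weights before trusting any ranking; a league that gives 6 points per passing touchdown will shuffle the order.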
Here’s how things shake out:
At the top of the pack are the usual suspects. Aaron Rodgers is pretty good at throwing a football, it turns out, and Eureqa doesn’t expect that to change. He did just lose his best receiver in Jordy Nelson for the year with a torn ACL, so we’ll see if that affects his performance. If we’ve learned anything about Green Bay over the years, they’re a wide receiver factory and they’ll have guys step up. It does surprise me to see Andrew Luck ranked behind Manning, though not by much. (I’d be astounded if Luck threw for only 29 TDs, though he may be an outlier in the data.) It’s also surprising to see Eli Manning in the eighth spot, but artificial intelligence is perhaps telling us something we don’t know. Eureqa also didn’t factor in Brady’s unquantifiable “unleash hell” variable, where former teammates vow a pissed-off Brady’s going to put on a performance for the ages to spite the clowns running the NFL. (Sorry, I’m a Pats fan. Had to throw in at least one shot at Goodell.) Tony Romo, Philip Rivers, Matt Stafford, and Cam Newton also look low to me, but they’ve all also had a few pretty dud seasons mixed into their careers; this could be another one of those years. Overall, this list passes the sniff test pretty well. Let’s dig into how we got there.
Most important predictive signals for passing yards: total fantasy points (positive correlation), rush attempts (negative correlation)
Analysis: This makes some sense. A QB’s total points from last year is generally an indicator of his quality as a fantasy quarterback. A quarterback who performed well last year is likely to perform well this year, and throwing for lots of yards is a large part of that. Interestingly, last year’s rushing attempts also show up. The more a QB runs the ball, the fewer yards he tends to throw for. Easy enough.
Most important predictive signals for passing touchdowns: passing TDs (positive), fumbles lost (positive), two-point conversions (positive), sack percentage (negative)
Analysis: This is where it starts to get fun. We have a limited dataset, so we have to hypothesize what things Eureqa found that are incredibly valuable, or what things might’ve shown up as curious results simply because we didn’t have enough data. Last year’s pass TDs are highly predictive of this year’s pass TDs. Once a QB learns how to get in the end zone, assuming little roster turnover year-over-year, they’ll generally stay in the ballpark of last year’s performance. Fumbles lost (the more fumbles you lose, the more TDs you tend to throw) is the one that made me raise an eyebrow. Maybe we don’t have enough data. Maybe it’s right, and fumbles lost is an interesting proxy for how risky a quarterback is (they hold onto the ball longer, run more, etc.), which yields more big plays and touchdowns.
More two-point conversions could mean a few things. It could mean you’re scoring more touchdowns, which unsurprisingly means more opportunities to go for two. It could also mean you’re playing from behind more often, and teams tend to throw the ball more when they’re behind, which leads to more touchdowns. And lastly, and also pretty intuitively: don’t let your quarterback hit the ground. Sack percentage is the percentage of time a quarterback is sacked when he drops back for a pass play. A quarterback who doesn’t get sacked/pressured throws for more touchdowns. And he stays healthy. And he’s better friends with his offensive line.
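If you want to compute sack percentage yourself, here's a minimal sketch. It assumes dropbacks can be approximated as pass attempts plus sacks, which ignores scrambles; the data behind our models may count dropbacks differently, and the numbers in the example are hypothetical.

```python
def sack_percentage(sacks, pass_attempts):
    """Sacks as a percentage of dropbacks.

    Assumption: dropbacks are approximated as pass attempts plus sacks,
    which ignores scrambles. Other data sources may count differently.
    """
    dropbacks = pass_attempts + sacks
    if dropbacks == 0:
        return 0.0
    return 100.0 * sacks / dropbacks


# Hypothetical season: 30 sacks on 570 pass attempts (600 dropbacks).
print(sack_percentage(30, 570))  # 5.0
```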
Most important predictive signals for interceptions: sack percentage (negative)
Analysis: Here’s the tough one to rationalize, so I’ll first lead off by explaining why this could be wrong, and then I’ll move along to sound like a crackhead and vigorously explain why it could make sense. Out of all the 150 variables that could predict next year’s interceptions, Eureqa found one that’s more explanatory than all the others, or any combination of any of the others: sack percentage. The more you got sacked last year, the fewer interceptions you’ll throw this year. Really, Eureqa? You could’ve found anything: pass attempts, TDs, age, complex stats that ESPN nerds code together, etc. Instead you brought me sack percentage, and made it negatively correlated. This could very well be a case of not enough data. Or, it could be that interceptions are just ridiculously hard to predict. Many of them are fluke incidents: a ball is tipped, the wind carries a throw too far, the receiver trips, the QB is hit right as he throws the ball. These are things that just can’t be accounted for with data.
But hang on a second. Season interceptions are somewhat predictable. Who’s going to throw more picks: 16 games of Geno Smith, or 16 games of Aaron Rodgers? What’s that? Geno Smith is out for six weeks? When did this happen?! (Sorry, Jets fans. I know, I know, too soon.) The point is, some quarterbacks absolutely, consistently, throw more interceptions over the course of the season. What if a quarterback threw a lot of picks in large part because he was consistently pressured (or sacked)? That means his sack percentage would be high that year. Possible responses: (1) The quarterback works his tail off in the offseason to get the ball out of his hands more quickly, and/or (2) The coaches and front office work vigorously to improve the offensive line…which means in the following year, the quarterback has much better protection and time to throw. And most NFL QBs are talented enough to make most throws when they have the time. There is no data to support my second theory. All logic. All heavily coffee-driven logic.
Most important predictive signals for rushing yards: rush yards (positive), deep pass percentage (positive), times sacked (positive)
Analysis: I like what this one is telling me. If a QB had a lot of rushing yards last year, he’ll probably run this year. Once a runner, always a runner. Deep pass percentage is the percentage of throws in which a receiver is at least 15 yards down the field. This could be a proxy for how riskily a quarterback plays. He runs, he throws deep, he goes for big plays. I don’t know if this is true, but it could be. Lastly, number of times sacked. If a QB was sacked a bunch last year, that’s probably an indicator that he was under a lot of pressure. When you’re under pressure, you scramble more. If you were under pressure last year, you’ll probably be under pressure again this year, and keep scrambling. Oh…that directly contradicts what I said in my interceptions analysis? Let’s move along.
Most important predictive signals for rushing touchdowns: rush attempts (positive)
Analysis: Quarterbacks who run the ball more score more touchdowns. Scrambling quarterbacks continue to scramble when they’re in the red zone. Go figure.
Most important predictive signals for fumbles lost and two-point conversions: mean
Analysis: Our best models for fumbles lost (2.6) and two-point conversions (0.7) are constants. Fumbles lost and two-point conversions are both highly unpredictable, so guessing the average is our best bet to predict player-by-player outcomes. Many times, fumbles lost depends on fluke hits and bad bounces. Two-point conversions are relatively rare and dependent on unique in-game situations. While I know there will be a few players who deviate from the trend, I’ll gladly take averages here.
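Why is "guess the average" the best constant? If you score a model by squared error, the mean beats every other single number. Here's a small sketch of that idea; the fumble totals are hypothetical, and I'm assuming a squared-error metric, which may not be the one Eureqa actually optimized.

```python
from statistics import mean


def fit_constant(values):
    """The constant that minimizes total squared error is the sample mean."""
    return mean(values)


def squared_error(values, constant):
    """Total squared error of predicting `constant` for every value."""
    return sum((v - constant) ** 2 for v in values)


# Hypothetical fumbles-lost totals for five quarterback seasons.
fumbles_lost = [0, 2, 3, 3, 4]

baseline = fit_constant(fumbles_lost)
print(baseline)

# Any other constant guess does worse than the mean.
for guess in (2, baseline, 3):
    print(guess, squared_error(fumbles_lost, guess))
```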
With the above chart, and models informing us what’s likely to happen, we recommend applying any insider information you have about new player acquisitions, preseason performance, etc. to the mix to make the best judgment call possible. Man and machine, working together, to solve the world’s most pressing problems…like fantasy football.