In this piece, I analyze NFL players using the attribute ratings EA Sports assigns in Madden NFL 19, the most recent edition of its video game franchise. I reverse-engineer the player “overall” rating by position, build several classifiers to predict a player’s position from his attributes, and create a “similar players” recommender using a nearest neighbors algorithm.
I started my investigation into NFL players with a series of open-ended questions:
- Which positions have the greatest drop-off between elite-level players and average players?
- Which attributes are most important at each position?
- Which players are most similar to each other in abilities?
While traditional statistics capture on-field performance with metrics like completion percentage, tackles made per game, and field goal percentage beyond 40 yards, I was most interested in answering questions based on a direct assessment of the physical attributes and skills these players possess. Enter Madden NFL 19.
Madden NFL 19 is a video game produced by EA Sports and based on the real-life players of the National Football League (NFL). The NFL is the most popular sports league to watch in America, a spot it has held since the 1960s, and in 2017 it grossed $14 billion in revenue, with $8.1 billion distributed among its 32 teams. In August 2018, the Madden video game franchise itself was estimated to be worth $4 billion, with sales of the 2019 edition of the game projected to reach 5 million copies.
EA Sports takes the assessment process for player attributes very seriously: it has a group of evaluators, the Ratings Adjustor team, travel to games to observe players and submit notes. Additionally, the attribute ratings receive a great deal of attention and scrutiny amongst fans and players. The result is a carefully considered rating of 12 to 99 (a higher rating is better) for 54 distinct attributes related to playing football, like throwing power, kicking accuracy, tackling ability, and many more.
Getting the Data
I expected to find an existing database of player attributes in Excel or .csv format, but I wasn’t able to. A couple of sites did have player attributes, but they were either incomplete or not current through the end of the most recent season, which concluded in February 2019.
So, to get the data, I wrote a web scraper in Python using Selenium, a handy library for automating browser activity. It’s often used to build automated tests for web applications, but its ability to interact with web pages programmatically makes it ideal for scraping tasks like this one, which required scrolling to trigger the page to load more players per position. To get the data itself, I iterated over a list of player positions, visited the corresponding page on the EA Sports website, and grabbed the outerText attribute of each element with the player_container class.
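The scroll-and-collect loop described above could be sketched roughly as follows. This is not the original scraper: the URL pattern, the position list, the pause length, and every function name are placeholders I made up for illustration, and only the player_container class and the outerText property come from the description above. Selenium is imported inside scrape_all so the helper functions can be exercised without a browser.

```python
import time

# Hypothetical placeholders: the real EA Sports URL pattern is not given here.
RATINGS_URL = "https://example.com/madden-ratings?position={position}"
POSITIONS = ["QB", "WR", "CB"]  # illustrative subset of positions

GET_HEIGHT_JS = "return document.body.scrollHeight"
SCROLL_JS = "window.scrollTo(0, document.body.scrollHeight)"
GET_TEXT_JS = (
    "return Array.from(document.getElementsByClassName('player_container'))"
    ".map(el => el.outerText)"
)


def scroll_to_bottom(driver, pause=1.0, max_scrolls=50):
    """Scroll until the page height stops growing, i.e. no more players load."""
    last_height = driver.execute_script(GET_HEIGHT_JS)
    for _ in range(max_scrolls):
        driver.execute_script(SCROLL_JS)
        time.sleep(pause)  # give the page time to load the next batch
        new_height = driver.execute_script(GET_HEIGHT_JS)
        if new_height == last_height:
            break
        last_height = new_height


def scrape_position(driver, position, pause=1.0):
    """Return the outerText of every player_container element for one position."""
    driver.get(RATINGS_URL.format(position=position))
    scroll_to_bottom(driver, pause=pause)
    return driver.execute_script(GET_TEXT_JS)


def scrape_all(positions=POSITIONS):
    """Open a browser and scrape each position page in turn."""
    from selenium import webdriver  # requires Selenium and a Chrome driver

    driver = webdriver.Chrome()
    try:
        return {pos: scrape_position(driver, pos) for pos in positions}
    finally:
        driver.quit()
```

Calling scrape_all() would return a dictionary mapping each position to a list of raw text blobs, one per player, which still need to be parsed into columns.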
Cleaning the Data
After scraping the EA Sports website, I wrote and applied a series of Python scripts to transform the raw data into a pandas DataFrame I could work with.
In addition to transforming the data into a DataFrame, I also applied a slight modification to the ‘position’ feature:
- I grouped players of a position in the set [‘LT’, ‘LG’, ‘C’, ‘RG’, ‘RT’] into one position ‘OL’
- I grouped players of a position in the set [‘LE’, ‘DT’, ‘RE’] into one position ‘DL’
- I grouped players of a position in the set [‘LOLB’, ‘MLB’, ‘ROLB’] into one position ‘LB’
- I grouped players of a position in the set [‘CB’, ‘FS’, ‘SS’] into one position ‘DB’
For those without football-specific knowledge, the positions within each group are similar and are often grouped together when being discussed.
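As a minimal sketch, the grouping above can be expressed as a lookup table applied to the position column. The DataFrame contents, the column name "position", and the function name group_positions are assumptions for illustration; only the position groups themselves come from the list above.

```python
import pandas as pd

# Position groups as listed above.
POSITION_GROUPS = {
    "OL": ["LT", "LG", "C", "RG", "RT"],
    "DL": ["LE", "DT", "RE"],
    "LB": ["LOLB", "MLB", "ROLB"],
    "DB": ["CB", "FS", "SS"],
}

# Invert the mapping so each specific position points at its group.
POSITION_TO_GROUP = {
    position: group
    for group, members in POSITION_GROUPS.items()
    for position in members
}


def group_positions(df, column="position"):
    """Return a copy of df with specific positions replaced by their group."""
    out = df.copy()
    # Positions not in the mapping (QB, WR, ...) are left unchanged.
    out[column] = out[column].replace(POSITION_TO_GROUP)
    return out


# Made-up rows, purely for illustration.
players = pd.DataFrame(
    {
        "name": ["Player A", "Player B", "Player C", "Player D"],
        "position": ["LT", "QB", "FS", "MLB"],
    }
)
grouped = group_positions(players)
print(grouped)
```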
Exploring the Data
Before moving to building predictive models, I was curious to get a better understanding of the underlying data by answering a few questions that came to mind.
What Is The Distribution of Player Overall Ratings?
For the data scraped (a total of 995 players), overall rating appears to be roughly normally distributed.
Which Positions Have the Greatest Skew in Overall Ratings?
Several metrics could answer this question. I settled on the gap between the median overall rating of the five best players at a position and the overall rating of the median “starting player,” taken as the average of the 16th- and 17th-best players (there are 32 teams, so this value is the median starter). The simplest approach would be to take the difference between the mean and median ratings of starting players at each position, but here I was most interested in how steep the drop-off is between an elite player and a mid-tier player.
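To make the metric concrete, here is a small sketch of the calculation for a single position. The function name elite_gap and the sample ratings are invented for illustration; the definition (median of the top five minus the average of the 16th- and 17th-best) follows the description above.

```python
from statistics import median


def elite_gap(overalls, n_elite=5, n_teams=32):
    """Median of the n_elite best ratings minus the median starter's rating.

    The median starter is the median of the top n_teams ratings, which for
    32 teams is the average of the 16th- and 17th-best players.
    """
    ranked = sorted(overalls, reverse=True)
    if len(ranked) < n_teams:
        raise ValueError("need at least one starter per team")
    elite = median(ranked[:n_elite])
    starter = median(ranked[:n_teams])
    return elite - starter


# Invented ratings: 40 players rated 99, 98, ..., 60.
sample_ratings = list(range(99, 59, -1))
print(elite_gap(sample_ratings))  # 97 - 83.5 = 13.5
```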
For Each Position, Which Attributes Contribute Most to Overall Rating? AKA: Reverse-Engineering The ‘Overall’ Rating Feature
In 2015, FiveThirtyEight explored how Madden ratings are calculated, but significant changes have taken place since then and the formulas they arrived at no longer apply. To approximate the overall rating function, I grouped players by position and, for each position, trained a linear model to predict a player’s overall rating from his attribute ratings. I chose a linear model with Lasso regularization because I valued interpretability for this use case.
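A rough sketch of this per-position fit, using scikit-learn's Lasso on synthetic data, might look like the following. The attribute names, the alpha value, the helper top_attributes, and the generated ratings are all assumptions for illustration and not the actual data or hyperparameters. Because every attribute shares the same rating scale, the sketch skips feature standardization.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso


def top_attributes(df, attribute_cols, target="overall", alpha=1.0, k=5):
    """Fit a Lasso model and return the k largest non-zero attribute weights."""
    model = Lasso(alpha=alpha)
    model.fit(df[attribute_cols], df[target])
    weights = pd.Series(model.coef_, index=attribute_cols)
    return weights[weights != 0].sort_values(ascending=False).head(k)


# Synthetic "quarterbacks": overall depends on two attributes, not the third.
rng = np.random.default_rng(0)
attrs = ["throw_power", "medium_accuracy", "kick_power"]
qbs = pd.DataFrame(rng.integers(40, 100, size=(200, 3)), columns=attrs)
qbs["overall"] = (
    0.6 * qbs["medium_accuracy"]
    + 0.4 * qbs["throw_power"]
    + rng.normal(0, 1, size=200)
)

qb_weights = top_attributes(qbs, attrs)
print(qb_weights)
```

Running the same function once per position group yields one ranked weight list per position, which is what makes the linear model easy to interpret.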
Across the positions studied, the linear model found attributes that qualitatively make sense as being important for the position. For example, here are some of the highest-weighted attributes for QB, WR, and DB:
Quarterback (QB)
- Medium Accuracy
- Throw Power
- Throw Under Pressure
- Play Action
Wide Receiver (WR)
- Deep Route Running
- Short Route Running
- Catch in Traffic
Defensive Back (DB)
- Zone Coverage
- Man Coverage
Building a Position Classifier
Next, I was curious to see whether it was possible to reliably classify which position a player plays based on his attribute ratings. While this has limited practical application (we already know what position each player plays), a modified version could be interesting at the high school level, where players sometimes play multiple positions, by suggesting or assigning each player the role that best fits his attributes. I tried a variety of models and got the best results with random forest and XGBoost.
I used scikit-learn’s Pipeline and GridSearchCV to train a random forest model. The model correctly classified 96.6% of cases (greater detail shown in the confusion matrix below). Misclassifications within the DL/LB pair are understandable given that, in some defensive schemes, players with the title “LB” often serve the role of a DL.
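The general shape of that setup can be sketched as below. The data here is synthetic (generated with make_classification as a stand-in for player attributes and position labels), and the scaling step, parameter grid, and fold count are assumptions rather than the settings actually used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 20 "attributes" and 4 "positions".
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10, n_classes=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Chain preprocessing and the classifier so the grid search tunes them together.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("forest", RandomForestClassifier(random_state=0)),
    ]
)

# Illustrative grid; step-name prefixes route parameters to the forest.
param_grid = {
    "forest__n_estimators": [50, 100],
    "forest__max_depth": [None, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)

print(search.best_params_)
print(f"held-out accuracy: {accuracy:.3f}")
print(matrix)  # rows are true classes, columns are predicted classes
```

Off-diagonal cells of the confusion matrix show which pairs of classes get confused, which is how a DL/LB overlap would show up on real data.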
Like the random forest model, XGBoost also correctly classified 96.6% of cases, and rarely made mistakes outside of the DL/LB pair.
Suggesting Similar Players
While determining a player’s position from his attributes with a high degree of success was an interesting finding, I wondered whether a nearest neighbors approach could suggest similar players well enough to pass a qualitative test. To test the output of my model, I polled a handful of friends who follow the NFL and asked whether they felt the suggested players accurately reflected playing style and ability.
I wrote a function, get_similar_players, that takes a player’s name and the number of similar players requested (n). The function grabs all other players at the position, runs a nearest neighbors algorithm over that set to generate a similarity score for each, and returns a sorted list of the n most similar players. The model worked surprisingly well.
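One way such a function could be written is sketched below with scikit-learn's NearestNeighbors. This is not the original implementation: the column names, the use of Euclidean distance as the similarity score (lower means more similar), the extra DataFrame argument, and the sample ratings are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors


def get_similar_players(players, name, n=5):
    """Return up to n (name, distance) pairs for same-position players.

    `players` needs "name" and "position" columns; every other column is
    treated as a numeric attribute rating.
    """
    target = players.loc[players["name"] == name]
    if target.empty:
        raise KeyError(f"unknown player: {name}")

    # Restrict the search to players at the same position.
    position = target["position"].iloc[0]
    pool = players[players["position"] == position].reset_index(drop=True)
    attribute_cols = [c for c in pool.columns if c not in ("name", "position")]

    # Ask for one extra neighbor because the player is his own nearest match.
    n_neighbors = min(n + 1, len(pool))
    model = NearestNeighbors(n_neighbors=n_neighbors).fit(pool[attribute_cols])
    query = pool.loc[pool["name"] == name, attribute_cols].iloc[[0]]
    distances, indices = model.kneighbors(query)

    result = pool.iloc[indices[0]].assign(distance=distances[0])
    result = result[result["name"] != name].head(n)
    return list(zip(result["name"], result["distance"]))


# Made-up ratings, purely for illustration.
sample_players = pd.DataFrame(
    {
        "name": ["WR A", "WR B", "WR C", "WR D", "QB A"],
        "position": ["WR", "WR", "WR", "WR", "QB"],
        "speed": [99, 97, 85, 80, 70],
        "catching": [85, 84, 95, 90, 40],
    }
)
similar = get_similar_players(sample_players, "WR A", n=2)
print(similar)
```

In this toy data, "WR B" is closest to "WR A" because both ratings are nearly identical, while the quarterback is never considered because the search is limited to the same position.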
For example, the five most similar players suggested for wide receiver Tyreek Hill were Tyler Lockett, Brandin Cooks, Doug Baldwin, DeSean Jackson, and T.Y. Hilton. Tyreek Hill is distinctive for being small relative to other wide receivers and very fast, and all of these players match that description. It’s worth noting that physical characteristics like height and weight were not directly incorporated in the model, but are likely captured indirectly by attribute ratings.
Another example: the five most similar players suggested for quarterback Tom Brady were Drew Brees, Philip Rivers, Matt Ryan, Eli Manning, and Sam Darnold. All of these quarterbacks exemplify the classic “pocket passer,” who is less mobile but adept at throwing accurate passes.
This result may have practical applications in constructing a successful team around a set of particular player roles (e.g., some teams prioritize having a mobile quarterback or faster linebackers who can cover well). Football is known to be a sport where strategic roster construction can have a meaningful impact on a team’s success.
Player ratings are updated each week during the NFL season in response to on-field player performance. It would be interesting to attempt to reverse-engineer the formula applied to update player ratings by including on-field performance statistics.
It would also be interesting to extend this analysis to other professional sports, such as basketball and soccer, which have major video game titles that provide many player attribute ratings.