As a data guy, I have always enjoyed learning about the power of statistics combined with the right metric(s): together, these two components can drive effective decisions that lead to a desired outcome. A great example of this, as I am sure many folks already know, is documented in Michael Lewis’s “Moneyball”. In a nutshell, Moneyball describes how Oakland A’s General Manager Billy Beane used “The Right Metric” (in this case On-Base Percentage), combined with statistical analysis, to acquire players who led to overall team success. This approach is nothing new in the world of sports or business, but it’s good to have a constant reminder of how creative thinking (the pursuit of the right metrics, the right attributes) and modeling (using the tools of Data Science) can lead to remarkable results.
In a recent conversation with a few guys I play basketball with, we were discussing Kevin Love. His statistics this year have been simply amazing: he regularly puts up over 30 points a game and grabs 20 rebounds. We were trying to figure out how to rank him against other top players in the league, guys like Kobe Bryant, Kevin Durant, and LeBron James. We didn’t really reach a conclusion (other than the fact that my scruffy facial hair, spot-on three-point shooting, and dominant rebounding closely resemble those of Mr. Love). So I decided that it was time for a “Moneyball-Style” investigation of basketball statistics. (AUTHOR’S NOTE: Why, you may ask, did I decide this? In reality, I just like playing with data. I also like creating clever acronyms.) While this analysis is in its early stages, and will likely stay that way, I thought it would be fun to share it with the world. I welcome any creative ideas on how to improve the metric, test its applicability, or out-do my acronym.
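Let’s start with the metric itself, which is PHARTS (Points Halved, Assists, Rebounds, Turnovers, Steals). A player’s PHARTS rating is calculated as follows:

PHARTS = ([Points Halved] + [Assists] + [Rebounds] - [Turnovers] + [Steals]) / [Minutes]

In other words, it is a per-minute rating in which a point counts half as much as an assist, rebound, or steal, and each turnover subtracts one.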
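To make the formula concrete, here is a minimal Python sketch of the calculation for a single stat line. The function name `pharts` and the stat line in the usage example are hypothetical, made up for illustration rather than taken from any real box score.

```python
def pharts(points, assists, rebounds, turnovers, steals, minutes):
    """PHARTS rating: (points/2 + assists + rebounds - turnovers + steals) / minutes."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # Points count half, turnovers subtract, and the total is normalized per minute.
    return (points / 2 + assists + rebounds - turnovers + steals) / minutes


# Hypothetical game: 30 points, 4 assists, 14 rebounds, 3 turnovers, 1 steal in 38 minutes.
# (15 + 4 + 14 - 3 + 1) / 38 = 31 / 38
print(round(pharts(30, 4, 14, 3, 1, 38), 3))  # 0.816
```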
Once I had the metric, I needed some data. Since I am still in the evaluation stage, I looked for a free database of historical basketball statistics, and luckily I found one at http://www.databasebasketball.com/, with data through 2009. I downloaded the data into my sandbox (in this case Microsoft Excel) and did a little discovery of the shape of the data. For example, in the early years of the data set a player’s minutes for the season weren’t tracked, and statistics on steals and turnovers don’t show up until 1975. So, as you will see in the analyses below, I am only showing PHARTS ratings for seasons between 1975 and 2009. I also needed to filter out players based on the number of minutes they played in the season; otherwise a player who had an amazing streak of 3 games and then sat on the bench the rest of the year (can anyone say Jeremy Lin?) might show up as a top PHARTS prospect.
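The filtering step can be sketched in Python too. This assumes each season is a dict with hypothetical `player`, `year`, and `minutes` keys (with `minutes` set to `None` where it wasn’t tracked). The 1,000-minute cutoff is my own placeholder, since the post doesn’t say what threshold was used.

```python
FIRST_YEAR = 1975          # steals and turnovers first appear in the data set
LAST_YEAR = 2009           # the free data set runs through 2009
MIN_SEASON_MINUTES = 1000  # hypothetical cutoff to weed out short hot streaks


def usable_seasons(rows):
    """Yield only the season rows that can get a meaningful PHARTS rating."""
    for row in rows:
        if row["minutes"] is None:                      # minutes not tracked
            continue
        if not FIRST_YEAR <= row["year"] <= LAST_YEAR:  # no steals/turnovers yet
            continue
        if row["minutes"] < MIN_SEASON_MINUTES:         # too few minutes to trust
            continue
        yield row


# Hypothetical rows for illustration.
seasons = [
    {"player": "A", "year": 1970, "minutes": None},
    {"player": "B", "year": 1988, "minutes": 2900},
    {"player": "C", "year": 2008, "minutes": 150},
]
print([r["player"] for r in usable_seasons(seasons)])  # ['B']
```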
OK, enough talk: let’s get to the results of my analysis. To be clear, there is still work to do (most importantly, my analysis remains descriptive; I have yet to correlate PHARTS scores with an objective metric like team winning percentage). But the results are still interesting and at least merit some discussion.
Let’s start with a simple question: “Based on career PHARTS, who are the top 25 players of all time?” The result is shown below. (NOTE: This is filtered to show only players with more than 10,000 career playing minutes in the data set.)
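One way to compute a career rating is sketched below. It adds up each player’s counting stats across all usable seasons, divides by total career minutes, and keeps only players above 10,000 minutes. Pooling totals this way, rather than averaging each season’s rating, is my assumption about how “career PHARTS” is defined, and the `player` and stat field names are hypothetical.

```python
from collections import defaultdict

STATS = ("points", "assists", "rebounds", "turnovers", "steals", "minutes")


def career_pharts(seasons, min_career_minutes=10_000, top_n=25):
    """Rank players by PHARTS computed from career totals (an assumed definition)."""
    totals = defaultdict(lambda: dict.fromkeys(STATS, 0))
    for s in seasons:
        t = totals[s["player"]]
        for k in STATS:
            t[k] += s[k]

    ratings = {}
    for player, t in totals.items():
        if t["minutes"] > min_career_minutes:  # the "more than 10,000 minutes" filter
            ratings[player] = (
                t["points"] / 2 + t["assists"] + t["rebounds"] - t["turnovers"] + t["steals"]
            ) / t["minutes"]
    # Highest rating first, truncated to the top N.
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical seasons: player A has two 6,000-minute seasons, player B only 5,000 minutes.
season = {"player": "A", "points": 1000, "assists": 200, "rebounds": 300,
          "turnovers": 100, "steals": 50, "minutes": 6000}
short = {"player": "B", "points": 900, "assists": 100, "rebounds": 500,
         "turnovers": 80, "steals": 40, "minutes": 5000}
print(career_pharts([season, season, short]))
```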
Many of the names on this list are the expected ones: Magic Johnson, Larry Bird, Michael Jordan. But (at least for me) there are a few surprises. Chris Paul is up there in some rarefied air, and who are Mel Daniels, Dan Issel, and Lafayette Lever? Also, where is Kobe Bryant? One more note: since I needed to filter out the players and years where turnovers and steals weren’t tracked, some key names that would otherwise be added to the top 10 above are missing, including Wilt Chamberlain, Bob Pettit, Bill Russell, Elgin Baylor, and Oscar Robertson.
Now, you might be looking at this list and saying, “Well, isn’t this the same answer I would get if I just looked at Points/Minute?” (Hint: remember that Kobe isn’t in that top 25 list.) The chart below shows the top 25 players from the same data set, this time ranked by points per minute. I’ve color-coded the bar chart so that anyone in the Top 25 Career PHARTS list is green (dark green is the best) and anyone NOT in that list is red.
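The green/red coloring comes down to a set-membership check, sketched below. It ranks players by points per minute, ranks them again by PHARTS, and flags which top scorers also appear in the PHARTS top list. The two players and their career totals are invented for illustration, and the dark-green shading for the very best isn’t modeled.

```python
def pharts(t):
    return (t["points"] / 2 + t["assists"] + t["rebounds"] - t["turnovers"] + t["steals"]) / t["minutes"]


def points_per_minute(t):
    return t["points"] / t["minutes"]


def top_names(players, score, top_n):
    """Names of the top_n players by the given scoring function."""
    ranked = sorted(players, key=lambda name: score(players[name]), reverse=True)
    return ranked[:top_n]


def color_top_scorers(players, top_n=25):
    """Return (name, color) for each top scorer: green if also a top PHARTS player."""
    pharts_leaders = set(top_names(players, pharts, top_n))
    return [(name, "green" if name in pharts_leaders else "red")
            for name in top_names(players, points_per_minute, top_n)]


# Hypothetical career totals.
players = {
    "Pure Scorer": {"points": 2000, "assists": 100, "rebounds": 100,
                    "turnovers": 200, "steals": 50, "minutes": 3000},
    "All-Rounder": {"points": 1200, "assists": 600, "rebounds": 800,
                    "turnovers": 100, "steals": 150, "minutes": 3000},
}
print(color_top_scorers(players, top_n=1))  # [('Pure Scorer', 'red')]
```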
Michael Jordan is at the top of this list: he was a great scorer with a great PHARTS rating. But many of these top scorers are not in the original top 25 list, guys like Dominique Wilkins, Carmelo Anthony, and Kobe Bryant. These guys are great scorers, but aren’t as well rounded as the Top 25 list in terms of rebounds or assists. Now, it is fair to say that the PHARTS metric may be unduly influenced by rebounds, so a future version of this metric might look only at offensive rebounds (or at least weight them differently). But that’s outside the scope of my analysis so far.
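The rebound-weighting idea could look something like the sketch below. It assumes the data splits offensive and defensive rebounds for the seasons in question, which I haven’t verified. `def_reb_weight` is a hypothetical knob: 1.0 reproduces the original PHARTS, and 0.0 counts offensive rebounds only.

```python
def weighted_pharts(points, assists, off_reb, def_reb, turnovers, steals, minutes,
                    def_reb_weight=1.0):
    """Hypothetical PHARTS variant in which each defensive rebound counts def_reb_weight."""
    rebounds = off_reb + def_reb_weight * def_reb
    return (points / 2 + assists + rebounds - turnovers + steals) / minutes


# Hypothetical line: 30 pts, 4 ast, 4 off reb, 10 def reb, 3 TO, 1 stl, 38 min.
original = weighted_pharts(30, 4, 4, 10, 3, 1, 38)                          # 31 / 38
offense_only = weighted_pharts(30, 4, 4, 10, 3, 1, 38, def_reb_weight=0.0)  # 21 / 38
```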
There is actually a wide range of factors that contribute to a player’s PHARTS rating, not just scoring and rebounding. For example, John Stockton and Chris Paul make the Top 25 list by virtue of their ridiculously high number of assists per minute. Swen Nater (who!?) is number 26 by averaging 0.63 rebounds a minute. Lafayette Lever and John Stockton are also helped into the top 25 by having the highest number of steals per minute among their PHARTS-leading counterparts.
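To see which factor is carrying a given player, you can split the rating into its per-minute parts, as in this sketch. The component names and the sample season are hypothetical. Because the formula is a simple sum divided by minutes, the parts always add back up to the full PHARTS rating.

```python
def pharts_components(t):
    """Break a PHARTS rating into per-minute contributions (turnovers are negative)."""
    m = t["minutes"]
    return {
        "points_halved": t["points"] / 2 / m,
        "assists": t["assists"] / m,
        "rebounds": t["rebounds"] / m,
        "turnovers": -t["turnovers"] / m,
        "steals": t["steals"] / m,
    }


# Hypothetical assist-heavy point guard season.
guard = {"points": 1200, "assists": 900, "rebounds": 250,
         "turnovers": 250, "steals": 180, "minutes": 2800}
parts = pharts_components(guard)
print(max(parts, key=parts.get))      # assists
print(round(sum(parts.values()), 3))  # 0.6, i.e. (600 + 900 + 250 - 250 + 180) / 2800
```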
Another interesting thing I noticed (I actually started thinking about this while reading up on the history of folks like Fats Lever and Michael Adams, high-PHARTS guys I had never heard of) is that there was often an arc to their careers. They started with one team, had several seasons of greatness, and then were traded and never recaptured their original success. So I looked at how PHARTS varies with the number of seasons a player has been in the NBA. I also looked at the variation in performance by season (using the standard deviation of PHARTS).
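The season-by-season view can be sketched as a group-by on each player’s season number. This version numbers seasons in order within the filtered data set, so a player who debuted before 1975 would be mis-numbered, an assumption worth fixing in a real analysis. It also uses the population standard deviation, since the post doesn’t say which variant was used. The input ratings are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def pharts_by_career_season(season_ratings):
    """season_ratings: iterable of (player, year, rating) tuples.

    Returns {season_number: (mean_rating, population_std_dev)} in season order.
    """
    by_player = defaultdict(list)
    for player, year, rating in season_ratings:
        by_player[player].append((year, rating))

    by_season_number = defaultdict(list)
    for seasons in by_player.values():
        # Sort each career chronologically; the first season in the data is season 1.
        for number, (_, rating) in enumerate(sorted(seasons), start=1):
            by_season_number[number].append(rating)

    return {n: (mean(r), pstdev(r)) for n, r in sorted(by_season_number.items())}


# Hypothetical (player, year, PHARTS) ratings.
ratings = [("A", 1990, 0.5), ("A", 1991, 0.7), ("B", 2000, 0.3), ("B", 2001, 0.9)]
print(pharts_by_career_season(ratings))
```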
So what can aspiring data-driven NBA GMs learn from this? Don’t acquire a player with a great PHARTS rating after his 5th season and expect him to keep performing at that rate: on average, PHARTS scores peak in a player’s 5th or 6th season. That said, the variation in PHARTS scores starts to decrease after a player’s 9th season (if he lasts that long), so if you pick up a seasoned veteran, you should be able to predict how he will perform in subsequent years. Performance is a little more unpredictable for players in seasons 6 through 9.
OK, this is all well and good, but I suppose the real question is whether a team of strong PHART-ers (as it were) is actually a strong team. Based on my analysis, here are the top 20 teams (from 1975 through 2009) by single-season PHARTS average for the entire team.
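Here is a sketch of the team-level roll-up. The post says “PHARTS average for the entire team” without specifying a weighting, so this assumes a plain average of player ratings by default. A minutes-weighted average is offered behind a flag as a hypothetical alternative. The field names and sample rows are also assumptions.

```python
from collections import defaultdict


def team_pharts(rows, weight_by_minutes=False):
    """rows: dicts with 'team', 'year', 'minutes', and per-player 'rating'.

    Returns {(team, year): average PHARTS}.
    """
    groups = defaultdict(list)
    for r in rows:
        groups[(r["team"], r["year"])].append(r)

    averages = {}
    for key, roster in groups.items():
        if weight_by_minutes:
            total = sum(p["minutes"] for p in roster)
            averages[key] = sum(p["rating"] * p["minutes"] for p in roster) / total
        else:
            averages[key] = sum(p["rating"] for p in roster) / len(roster)
    return averages


def top_teams(averages, n=20):
    """Highest team averages first."""
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical two-player roster.
rows = [
    {"team": "X", "year": 1984, "minutes": 3000, "rating": 0.6},
    {"team": "X", "year": 1984, "minutes": 1000, "rating": 0.4},
]
print(team_pharts(rows))                          # plain mean: 0.5
print(team_pharts(rows, weight_by_minutes=True))  # pulled toward the 3,000-minute player, about 0.55
```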
Now, based on my quick analysis of the data, high TEAM PHARTS does not lead to championships. Of this list, only the 1984 Lakers (they lost) and the 1977 Sixers (they lost too) made the Finals. That’s 2 Finals appearances out of 20 teams, a 10% rate, and no titles: not so good. So I tried a different approach: does having a SINGLE top-ranked PHARTS player on your team indicate success? Here are the top 20 single-season PHARTS ratings by player (with their teams listed too).
These results are a little more promising: 6 of the 20 players on this list were on championship teams. Now, to be true to this analysis, I really should be looking at the correlation of high individual PHARTS with team winning percentage. Those of you who remember Moneyball (or who are A’s fans) know that the Billy Beane approach led to teams with consistently high winning percentages, but no World Series (although they amazingly do win the World Series in the movie of the same name).
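For that correlation, a plain Pearson coefficient is one reasonable starting point. This sketch pairs each team-season’s best individual PHARTS with that team’s winning percentage. The pairing and the sample numbers are hypothetical, not results from the data set.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical (best player PHARTS, team winning percentage) pairs.
best_pharts = [0.62, 0.71, 0.55, 0.80, 0.66]
win_pct = [0.48, 0.61, 0.40, 0.70, 0.52]
print(round(pearson(best_pharts, win_pct), 2))
```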
So, where does this leave us? Nowhere, really, except that I can now comfortably say that PHARTS seems to be an OK measure of a player’s value. Maybe not the best, and maybe I will tweak it some more in the future, but still pretty good. So the one final thing to do is to apply it to the 2011-2012 NBA player list and see if my boy, K-Love, shows up among the top players. Here’s what the PHARTS ratings look like for an arbitrary selection of top NBA players:
Whoa, wait a minute! Is it possible that Rajon Rondo is the best player in the league? Maybe he is. Both he and Kevin Love are up there above the prolific scorers on the list. And why is that? Well, because Rondo delivers more assists than anyone in the league, and because Kevin Love isn’t afraid of going inside, banging some bodies, and grabbing some boards. Just like me.
A NOTE FOR THOSE OF YOU WHO WERE PAYING ATTENTION:
The point of this entire exercise was to illustrate the process of doing analysis. I spent time finding data. I loaded the data into a tool I was comfortable with. I visualized the data with my favorite visualization tool (Tableau). I iterated on my hypotheses. And I published my insights. Now, my data set in Excel was only 20,936 rows. But imagine if you needed to do this on a billion records, or on a trillion. Could you? If you want to learn more about how to do this at scale, come visit us at http://www.greenplum.com
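Part of why this kind of analysis can scale is that career PHARTS only needs sums, and sums can be computed separately on each chunk of the data and merged afterward. The sketch below shows that split-aggregate-merge idea in plain Python with hypothetical field names. It is not Greenplum’s API, just the general pattern that parallel databases apply across many machines.

```python
from collections import Counter

STATS = ("points", "assists", "rebounds", "turnovers", "steals", "minutes")


def partial_totals(rows):
    """Aggregate one shard of season rows into per-player stat totals."""
    totals = {}
    for r in rows:
        totals.setdefault(r["player"], Counter()).update({k: r[k] for k in STATS})
    return totals


def merge(a, b):
    """Combine two shards' partial totals; the order of shards doesn't matter."""
    merged = {p: Counter(c) for p, c in a.items()}
    for player, counts in b.items():
        merged.setdefault(player, Counter()).update(counts)
    return merged


def pharts_from_totals(t):
    return (t["points"] / 2 + t["assists"] + t["rebounds"] - t["turnovers"] + t["steals"]) / t["minutes"]


# Hypothetical season row, split across two "shards".
row = {"player": "A", "points": 1000, "assists": 200, "rebounds": 300,
       "turnovers": 100, "steals": 50, "minutes": 6000}
combined = merge(partial_totals([row]), partial_totals([row]))
print(pharts_from_totals(combined["A"]))
```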