As a data guy, I have always enjoyed learning about the
power of statistics combined with the right metric(s) – together these two
components can drive effective decisions that lead to a
desired outcome. A great example of this
– as I am sure many folks are already aware – is documented in Michael Lewis’ “Moneyball”. In a nutshell, Moneyball
describes how Oakland A’s General Manager Billy Beane was able to use “The Right
Metric” – in this case On-Base Percentage – combined with statistical analysis
to effectively acquire players that led to overall team success. This approach is nothing new in the world of
sports or business, but it’s good to have this constant reminder of how
creative thinking (the pursuit of the right metrics, the right attributes) and
modeling (using the tools of Data Science) can lead to remarkable results.
In a recent conversation with a few guys that I play
basketball with, we were discussing Kevin Love.
His statistics this year have been simply amazing – he regularly puts up
over 30 points a game and grabs 20 rebounds.
We were trying to figure out how to rank him against other top players
in the league, guys like Kobe Bryant, Kevin Durant, LeBron James. We didn’t really reach a conclusion (other
than the fact that my scruffy facial hair, spot-on three-point shooting, and
dominant rebounding closely resemble those of Mr. Love). So I decided that it was time for a
“Moneyball-Style” investigation of basketball statistics. (AUTHOR’S NOTE: Why, you may ask, did I decide
this? In reality I just like playing
with data. I also like creating clever
acronyms.) While this analysis is
currently in its early stages, and will likely stay that way, I thought it
would be fun to share it with the world.
I welcome any creative ideas on how to improve on the metric, test its
applicability, or out-do my acronym.
Let’s start with the metric itself, which is PHARTS. A player’s PHARTS rating is calculated as
follows:
([Points Halved] + [Assists] + [Rebounds] – [Turnovers] + [Steals]) / [Minutes]
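For anyone who would rather see the formula as code, here is a minimal sketch in Python. The function and parameter names are my own for illustration, and the season line at the bottom is invented, not taken from the data set.

```python
def pharts(points, assists, rebounds, turnovers, steals, minutes):
    """PHARTS = (points / 2 + assists + rebounds - turnovers + steals) / minutes."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return (points / 2 + assists + rebounds - turnovers + steals) / minutes


# Hypothetical season totals, made up purely to show the arithmetic:
# (1000 + 300 + 800 - 200 + 100) / 3000 = 2000 / 3000
rating = pharts(points=2000, assists=300, rebounds=800,
                turnovers=200, steals=100, minutes=3000)
print(round(rating, 3))  # → 0.667
```

Because the total is divided by minutes, the rating is a per-minute rate, which is why a player with very few minutes can post a misleadingly high number.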
Once I had the metric, I needed to get some data. Since I am still in the evaluation stage I
tried to find a free database of historical basketball statistics – luckily I
was able to find this at http://www.databasebasketball.com/, at least through 2009. I downloaded the data into my sandbox (in
this case Microsoft Excel) and proceeded to do a little discovery of the shape
of the data – for example, in the early years of the data set a player’s minutes for the season weren't tracked.
Similarly, statistics on steals and turnovers don’t start showing up
until 1975. So as you will see in the
analyses below I am only showing PHARTS ratings for players in the seasons
between 1975 and 2009. I also needed to
filter out players based on the number of minutes they played in the season –
otherwise a player who had an amazing streak of 3 games and then sat on the
bench the rest of the year (can anyone say Jeremy Lin?) might show up as a
top-PHARTS prospect.
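The two filters described above (season range and minimum minutes) could be sketched like this. The `SeasonLine` record, the `eligible` helper, and especially the 1,000-minute cutoff are assumptions for illustration; I have not stated the actual threshold in this post, and the rows are invented.

```python
from dataclasses import dataclass


@dataclass
class SeasonLine:
    player: str
    year: int
    minutes: int


# Hypothetical cutoff, chosen only for this example.
MIN_SEASON_MINUTES = 1000


def eligible(rows, first_year=1975, last_year=2009,
             min_minutes=MIN_SEASON_MINUTES):
    """Keep seasons inside the year range where the player logged enough minutes."""
    return [
        r for r in rows
        if first_year <= r.year <= last_year and r.minutes >= min_minutes
    ]


# Made-up rows, not real players.
rows = [
    SeasonLine("Player A", 1968, 3100),  # before steals and turnovers appear
    SeasonLine("Player B", 1990, 2800),  # full season, kept
    SeasonLine("Player C", 1990, 95),    # hot streak, then the bench
]
kept = eligible(rows)
print([r.player for r in kept])  # → ['Player B']
```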
OK, so enough talk now – let’s get to the results of my
analysis. To be clear, there is still
work to do (most importantly my analysis remains descriptive – I have yet to
correlate PHARTS scores with some objective metric like team winning
percentage). But the results are still
interesting, and at least merit some discussion.
Let’s start with the simple question: “Based on Career
PHARTS, who are the top 25 players of all time?” The result of this is shown below with a lot
of names you would expect, but also with a few surprises. (NOTE: This is filtered to show only players
with more than 10,000 career playing minutes in the data set).
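Here is one way the career ranking could be computed, as a rough sketch. It assumes Career PHARTS is calculated from career totals (summing each stat over all seasons, then applying the formula), which is only one possible reading; averaging the per-season ratings would give somewhat different numbers. The row layout and the `career_pharts` name are hypothetical.

```python
from collections import defaultdict

STATS = ("points", "assists", "rebounds", "turnovers", "steals", "minutes")


def career_pharts(rows, min_career_minutes=10_000):
    """Return (player, rating) pairs sorted best-first, from career totals."""
    totals = defaultdict(lambda: dict.fromkeys(STATS, 0))
    for row in rows:
        for stat in STATS:
            totals[row["player"]][stat] += row[stat]

    ratings = {}
    for player, t in totals.items():
        # Only players with MORE than the minimum career minutes qualify.
        if t["minutes"] > min_career_minutes:
            ratings[player] = (
                t["points"] / 2 + t["assists"] + t["rebounds"]
                - t["turnovers"] + t["steals"]
            ) / t["minutes"]
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


# Invented season lines for two made-up players.
season_a = {"player": "Player A", "points": 2000, "assists": 500,
            "rebounds": 500, "turnovers": 200, "steals": 100, "minutes": 3000}
season_b = {"player": "Player B", "points": 900, "assists": 100,
            "rebounds": 200, "turnovers": 50, "steals": 30, "minutes": 1500}
rows = [season_a] * 4 + [season_b]  # A: 12,000 career minutes, B: 1,500

top_25 = career_pharts(rows)[:25]
print([player for player, _ in top_25])  # → ['Player A']
```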
A lot of the names you see on this list are the expected ones
– Magic Johnson, Larry Bird, Michael Jordan.
But (at least for me) there are a few surprises – Chris Paul is up there
in some rarefied air; and who are Mel Daniels, Dan Issel, and Lafayette Lever? Also, where is Kobe Bryant? Another note – since I needed to filter out
the players and years where turnovers and steals weren’t tracked, there are some
key names that would be added to the top 10 above, including: Wilt
Chamberlain, Bob Pettit, Bill Russell,
Elgin Baylor, and Oscar Robertson.
Now, you might be looking at this list and saying – well,
this is the same answer I would get if I just looked at Points/Minute, isn’t
it? (Hint: remember that Kobe isn’t in
that top 25 list) The chart below shows
the Top 25 players from this same data set – but this time ranked by points. I’ve color coded the bar chart so that anyone
in the Top 25 Career PHARTS list is in green (dark green is the best) and
anyone in red is NOT in the top 25 list.
Michael Jordan is at the top of this list – he was a great
scorer with a great PHARTS rating. But
many of these top scorers are not in the original top 25 list – guys like
Dominique Wilkins, Carmelo Anthony, and Kobe Bryant. These guys are great scorers, but aren’t as
well rounded as the Top 25 list in terms of rebounds or assists. Now, it is fair to say that the PHARTS metric
may be unduly influenced by rebounds, so a future version of this metric might
look only at offensive rebounds (or at least weight them differently). But that’s outside of the scope of my
analysis so far.
There is actually a wide range of factors that contribute to
a player’s PHARTS rating, and it isn’t just scoring and rebounding. For example, John Stockton and Chris Paul
make the Top 25 list by merit of their ridiculously high number of assists per
minute. Swen Nater (who!?) is number 26
by averaging 0.63 rebounds a minute.
Lafayette Lever and John Stockton are also helped into the top 25 by
having the highest number of steals per minute among their PHARTS-leading
counterparts.
Another interesting thing I noticed (I actually started
thinking about this when reading up on the history of folks like Fats Lever and
Michael Adams – high PHARTS guys I had never heard of) is that there was often
an arc to their careers. They started
with one team, had several seasons of greatness, and then were traded and never
experienced their original success. So I
looked at how PHARTS varies based on each player’s number of seasons in the
NBA. I also looked at the variation (I
used the standard deviation of PHARTS) of their performance by season.
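A sketch of that grouping, using only Python's standard library, might look like the following. It assumes each record has already been reduced to a (season number, PHARTS rating) pair, and it uses the sample standard deviation (`statistics.stdev`); whether sample or population deviation is the better choice here is a judgment call. The ratings are invented.

```python
from collections import defaultdict
from statistics import mean, stdev


def pharts_by_season_number(records):
    """Map season number -> (mean rating, sample standard deviation)."""
    groups = defaultdict(list)
    for season_number, rating in records:
        groups[season_number].append(rating)
    return {
        n: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for n, vals in sorted(groups.items())
    }


# Made-up ratings: two players in their 1st season, one in his 2nd.
records = [(1, 0.4), (1, 0.6), (2, 0.5)]
summary = pharts_by_season_number(records)
for n, (avg, spread) in summary.items():
    print(n, round(avg, 3), round(spread, 3))
```

A season number with a single player gets a spread of 0.0 here simply because a sample standard deviation needs at least two values.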
So what can aspiring data-driven NBA GMs learn from
this? Don’t acquire a player with a
great PHARTS rating after his 5th season and expect him to continue
to perform at that same rate – on average PHARTS scores peak in a player’s 5th
or 6th season. That said, the
variation among PHARTS scores starts to decrease after a player’s 9th
season (if he lasts that long) – so if
you are able to pick up a seasoned veteran you should be able to predict how he
will perform in subsequent years. It’s a
little bit more unpredictable for players in seasons 6 through 9.
OK, this is all well and good, but I suppose the real
question is whether a team of strong PHART-ers (as it were) is actually a
strong team. Based on my analysis, here
are the Top 20 teams (from 1975 through 2009) based on single season PHARTS
average for the entire team.
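One plausible way to build a team ranking like this is sketched below. It assumes the team figure is a simple, unweighted mean of each player's single-season rating; weighting by minutes played would be a reasonable alternative and could reorder the list. The team codes and ratings are made up.

```python
from collections import defaultdict
from statistics import mean


def top_team_seasons(player_seasons, n=20):
    """Rank (team, year) pairs by the unweighted mean of their players' ratings."""
    by_team = defaultdict(list)
    for team, year, rating in player_seasons:
        by_team[(team, year)].append(rating)
    averages = {key: mean(vals) for key, vals in by_team.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Invented (team, year, player rating) rows.
player_seasons = [
    ("AAA", 1980, 0.6), ("AAA", 1980, 0.4),
    ("BBB", 1980, 0.7), ("BBB", 1980, 0.5),
]
ranking = top_team_seasons(player_seasons)
print([key for key, _ in ranking])  # → [('BBB', 1980), ('AAA', 1980)]
```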
Now, based on my quick analysis of the data, high TEAM
PHARTS does not lead to championships – of this list the 1984 Lakers were in
the Finals (they lost) and the 1977 Sixers were in the Finals (they lost too). That’s just 2 of 20 teams (10%) even reaching the Finals, and none winning – not so good. So, I tried a different approach – does having
a SINGLE top ranked PHARTS player on your team indicate success? Here are the top 20 single-season PHARTS
ratings by player (with their team listed too).
These results are a little more promising – from this list 6
of the 20 were on Championship Teams.
Now – to be true to this analysis I really should be looking at the
correlation of high individual PHARTS with team winning percentage. For those of you who remember Moneyball (or
who are A’s fans) you know that the Billy Beane approach led to teams with
consistently high winning percentages, but no World Series (although they amazingly do win the World Series in the movie of the same name).
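The correlation check mentioned above has not been done yet, but a minimal sketch of it could look like this. It computes a plain Pearson correlation by hand between each team's best individual PHARTS rating and that team's winning percentage. The pairing of those two columns is an assumption about how one might set the test up, and the numbers are invented to be perfectly linear so the result is easy to verify.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up data: top individual PHARTS per team vs. team winning percentage.
top_pharts = [0.5, 0.6, 0.7]
win_pct = [0.4, 0.5, 0.6]
r = pearson(top_pharts, win_pct)
print(round(r, 3))  # → 1.0
```

A value near 1 would suggest that teams with a top PHARTS player tend to win more; a value near 0 would suggest the metric says little about team success.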
So, where does this leave us? Nowhere, really – except that I can now
comfortably say that PHARTS seems to be an OK measure of a player’s value. Maybe not the best; maybe I will tweak it
some more in the future, but still pretty good.
So the one final thing to do is to apply it to the 2011-2012 NBA player
list, and see if my boy, K-Love, shows up among the top players. Here’s what the PHARTS ratings look like for
an arbitrary selection of top NBA players:
Whoa, wait a minute!
Is it possible that Rajon Rondo is the best player in the league? Maybe he is.
Both he and Kevin Love are up there above the prolific scorers on the
list. And why is that? Well, because Rondo delivers more assists
than anyone in the league, and because Kevin Love isn’t afraid of going inside,
banging some bodies, and grabbing some boards.
Just like me.
A NOTE FOR THOSE OF YOU WHO WERE PAYING ATTENTION:
The point of this entire exercise was to illustrate the process of doing analysis. I spent time finding data. I loaded the data into a tool I was comfortable with. I visualized the data with my favorite visualization tool (Tableau). I iterated on my hypotheses. And I published my insights. Now, my data set in Excel was only 20,936 rows. But imagine if you needed to do this on a billion records. Or on a trillion. Could you? If you want to learn more about how to do this at scale, come visit us at http://www.greenplum.com