Early this year I made a big career move: after
almost 7 years working at Yahoo!, I joined Greenplum as our VP of Product Management. The new job has been exhilarating – new
industries to understand, Big Data challenges to solve, and the fast-moving
pace of a “start-up-like” company. I’ve always enjoyed learning
new things – it was what I liked best about working in the central
data team at Yahoo!. Greenplum is a place where I have continued to learn.
That said, there are a
few patterns that I experienced at Yahoo! that I continue to see as I meet with
Greenplum customers and prospects who want to tackle the world of Big
Data. The first is an ever-growing need for analytic agility.
Organizations are constantly challenged by the time and effort required to
extract insights from existing data assets and then convert these into actions
(in the form of well-informed business decisions or data-driven
applications). A portion of this challenge is solved by having the right
platform – one of my favorite parts of being at Greenplum is when I can share
with a customer how Greenplum’s Unified Analytics Platform supports an agile analytics environment.
But the right platform is only part of the solution. The
other question I consistently get from prospects and customers is: “What is the right
organizational model for me to be successful with Big Data?” And while
it’s fun for me to talk about our platform, I think it’s this second aspect of
the Big Data challenge that may be the toughest to solve, and the most
critical to success.
Strategic Data Solutions at Yahoo!
In mid-2005 I joined a
newly formed group at Yahoo! called Strategic Data Solutions (SDS). I
actually sought out a role in the group after reading an article in Information Week about the appointment of Usama Fayyad as
Yahoo!’s Chief Data Officer. My persistence paid off and I was lucky
enough to get hired, joining a group of other data-loving professionals
(including my current Greenplum colleague Annika Jimenez).
Yahoo! was among the first companies to realize that its data was in
fact a strategic asset, and to make a big bet (in the form of
what grew to a 500+ person organization) to extract the most value from this
data. It turns out that the bet Yahoo! made back in 2005 is similar to
what I see many non-Internet companies starting to do today. In
industries that people may consider to be old-school when it comes to Big Data
– insurance, manufacturing, utilities – we now see executives starting to
anoint their own Chief Data Officers and roll out strategic data
initiatives.
So, when these customers
ultimately ask, “What is the right organizational model for me to be successful
with Big Data?” – I look back to the way that the Strategic Data Solutions
organization was set up, and I see a lot of things that we did right. Of
course there is no single cookie-cutter approach to organizational design that
works in all situations, but I believe that the core philosophy that drove how
SDS was set up can help give other companies a strong framework for how to
think about their strategic data initiatives. I’ve listed below the core
components of the SDS organization – you can view these both as “product lines”
as well as organizations within the group. My assertion: when
building out your strategic data organization and its capabilities, think in
terms of the big functional areas below:
- Data Platform: at the core of any strategic data initiative is
establishing a strong data platform that meets the core data provisioning
needs of the organization’s data consumers. Be careful not to confuse a data platform initiative
with a more traditional “data warehouse” initiative. While one of the functions of the data platform may be to host or
integrate with a data warehouse, the data platform also needs to support
data sets that may not typically be in a data warehouse (documents,
machine-generated logs, etc.) and also needs to support workloads that
aren’t well suited to a traditional data warehouse (sandbox-based
analytics, feed provisioning to production systems, non-SQL data analysis,
etc.). At Yahoo! we made a big investment in
building out a core data platform (originally a home-grown file-based
system, and ultimately a combination of Hadoop and relational databases) to
serve the broad range of data consumers we had.
- Business Intelligence: one of the mistakes we made early on in SDS was to
abdicate responsibility for the delivery of the core business intelligence
needs of the various Yahoo! business units. It was a convenient
decision to make initially: it was an area with demanding consumers and
difficult-to-prove ROI, and it was frankly not as “sexy” as the other more
advanced work that we wanted to do. Over time, however, we realized
that supporting the business intelligence needs of our business
stakeholders needed to be one of our core offerings. There were
benefits in terms of data re-use, stakeholder relationships, and other
economies of scale that made this the right thing for SDS to do. By
successfully supporting the BI needs of our business partners we were able
to “earn the right” to engage with them on the more advanced analytics and
data services we had to offer. The key to success here was to
(appropriately) view our BI investment as a cost center. We avoided
getting caught up in the losing battle of trying to show the ROI of our BI
efforts by instead focusing our ROI-based initiatives in areas where we
could, in fact, show true returns (see below).
- Data Science Services: within SDS we worked hard to enable customers (the
various Yahoo! product lines & business units) to derive “actionable
insights” from the data asset we created with our Data Platform.
Often the data science skills required for anything other than traditional
reporting and BI weren’t resident in the various lines of business.
(In fact, we continue to see this challenge today, and are working to help
solve the Data Scientist skills shortage through things like our innovative partnership with
Kaggle.) So SDS built out a consultancy-oriented group
to help our customers move to the next level of analysis. The
ultimate goal of our engagements with the business was twofold: first, the
Data Science team was devoted to solving data-driven problems that
resulted in a measurable ROI (increasing ad clickthrough rates, reducing
churn, improving customer acquisition); second, we wanted to train our
internal business customers on how to use the Data Platform and associated
tools to do subsequent Data Science projects on their own.
- Data Driven Applications: the ultimate goal of a lot of our Data Science
initiatives at Yahoo! was to spur the creation of data-driven applications
that could measurably impact the bottom or top line. As the
name implies, these applications leveraged the results of some underlying
data science efforts (scoring algorithms, recommendation models, pricing
optimizations) to drive actions taken in Yahoo!’s customer-facing and
internal-facing applications. The team was structured to work on a
commissioned project basis: business units would request support to build
specific applications and back up their requests with detailed business
cases. The Data Driven Applications team would then prioritize the
long list of incoming requests and methodically tackle the highest-value
projects. This model turned out to be a win-win for both SDS and our
internal customers – the business units received value-enhancing
data-driven applications, and SDS was able to effectively show how the
investment in data as a strategic asset was driving true ROI for Yahoo!.
- Data Distribution:
a final and important aspect of the strategic data organization is the
recognition that, in addition to supporting analytical needs (via either
BI support or data science projects), it must also support
data distribution. For example, at Yahoo! the core data platform was
used to generate segment membership information for billions of users
(browser cookies) each day. These profiles needed to be distributed
out to the operational systems that consumed them – the ad targeting
platforms – so it was important to have the appropriate infrastructure and
APIs to allow the consumers of these large data sets to access and move
them. Additionally, there were consistent demands to provision
subsets of the data in the core data platform to other consumers both
inside and outside of Yahoo!. The Data Distribution challenge is one
that many of our Greenplum customers today are starting to struggle with as
well, and it’s important to think about it when scoping out a big data
strategy.
Dive Right In. The Water’s Warm!
Now I can’t guarantee
that the structure described above is perfect for every organization – there
are likely variations of this perspective that have worked for other successful
data groups. However, I do think the emerging themes are consistent, and
that if you consider the above elements while diving into the Strategic Data
Organization waters, you’ll be more likely to achieve success.
At the end of the day,
there is a bit of a leap of faith
required to make a strategic bet on big data. But the data shows that
it’s worth it. A recent article in the Harvard
Business Review revealed: “In
particular, companies in the top third of their industry in the use of
data-driven decision making were, on average, 5% more productive and 6% more
profitable than their competitors.”
Good luck!