Early this year I made a big career move: after
almost 7 years working at Yahoo!, I joined Greenplum as VP of Product Management.  The new job has been exhilarating – new
industries to understand, Big Data challenges to solve, and the fast-moving
pace of a “start-up-like” company.  I’ve always enjoyed learning
new things – it was what I liked best about working in the central
data team at Yahoo!. Greenplum is a place where I have continued to learn.
That said, there are a
few patterns that I experienced at Yahoo! that I continue to see as I meet with
Greenplum customers and prospects who want to tackle the world of Big
Data.  The first is an ever-growing need for analytic agility. 
Organizations are constantly challenged by the time and effort required to
extract insights from existing data assets and then convert these into actions
(in the form of well-informed business decisions or data-driven
applications).  A portion of this challenge is solved by having the right
platform – one of my favorite parts of being at Greenplum is when I can share
with a customer how Greenplum’s Unified Analytics Platform supports an agile analytics environment.
But the right platform is only part of the solution.  The
other question I consistently get from prospects and customers is: “What is the right
organizational model for me to be successful with Big Data?”  And while
it’s fun for me to talk about our platform, I think it’s this second aspect of
the Big Data challenge that may be the toughest to solve, and the most
critical to success.
Strategic Data Solutions at Yahoo!
In mid-2005 I joined a
newly formed group at Yahoo! called Strategic Data Solutions (SDS).  I
actually sought out a role in the group after reading an article in Information Week about the appointment of Usama Fayyad as
Yahoo!’s Chief Data Officer.  My persistence paid off and I was lucky
enough to get hired, joining a group of other data-loving professionals
(including my current Greenplum colleague Annika Jimenez).
Yahoo! was among the earlier companies to realize that its data assets were in
fact a strategic asset, and to make a big bet (in the form of
what grew to a 500+ person organization) to extract the most value from this
data.  It turns out that the bet Yahoo! made back in 2005 is similar to
what I see many non-Internet companies starting to do today.  In
industries that people may consider to be old-school when it comes to Big Data
– insurance, manufacturing, utilities – we now see executives starting to
anoint their own Chief Data Officers and roll out strategic data
initiatives.
So, when these customers
ultimately ask, “What is the right organizational model for me to be successful
with Big Data?” – I look back to the way that the Strategic Data Solutions
organization was set up, and I see a lot of things that we did right.  Of
course there is no single cookie-cutter approach to organizational design that
works in all situations, but I believe that the core philosophy that drove how
SDS was set up can help give other companies a strong framework for how to
think about their strategic data initiatives.  I’ve listed below the core
components of the SDS organization – you can view these both as “product lines”
as well as organizations within the group.   My assertion: when
building out your strategic data organization and its capabilities, think in
terms of the big functional areas below:
- Data Platform: at the core of any strategic data initiative is
     establishing a strong data platform that meets the core data provisioning
     needs of the organization’s data consumers.  Be careful not to confuse a data platform initiative
     with a more traditional “data warehouse” initiative.  While one of the functions of the data platform may be to host or
     integrate with a data warehouse, the data platform also needs to support
     data sets that may not typically be in a data warehouse (documents,
     machine-generated logs, etc.) and also needs to support workloads that
     aren’t well suited to a traditional data warehouse (sandbox-based
     analytics, feed provisioning to production systems, non-SQL data analysis,
     etc.).  At Yahoo! we made a big investment in
     building out a core data platform (originally a home-grown file-based
     system, and ultimately a combination of Hadoop and relational databases) to
     serve the broad range of data consumers we had.
- Business Intelligence: one of the mistakes we made early on in SDS was to
     abdicate responsibility for the delivery of the core business intelligence
     needs of the various Yahoo! business units.  It was a convenient
     decision to make initially: it was an area with demanding consumers,
     difficult-to-prove ROI, and was frankly not as “sexy” as the other more
     advanced work that we wanted to do.  Over time, however, we realized
     that supporting the business intelligence needs of our business
     stakeholders needed to be one of our core offerings.  There were
     benefits in terms of data re-use, stakeholder relationships, and other
     economies of scale that made this the right thing for SDS to do.  By
     successfully supporting the BI needs of our business partners we were able
     to “earn the right” to engage with them on the more advanced analytics and
     data services we had to offer.  The key to success here was to
     (appropriately) view our BI investment as a cost center.  We avoided
     getting caught up in the losing battle of trying to show the ROI of our BI
     efforts by instead focusing our ROI-based initiatives in areas where we
     could, in fact, show true returns (see below).
- Data Science Services: within SDS we worked hard to enable customers (the
     various Yahoo! product lines & business units) to derive “actionable
     insights” from the data asset we created with our Data Platform. 
     Often the data science skills required for anything other than traditional
     reporting and BI weren’t resident in the various lines of business. 
     (In fact, we continue to see this challenge today, and are working to help
     solve the Data Scientist skills shortage through things like our innovative partnership with
     Kaggle.)  So SDS built out a consultancy-oriented group
     to help our customers move to the next level of analysis.  The
     ultimate goal of our engagements with the business was twofold: first, the
     Data Science team was devoted to solving data-driven problems that
     resulted in a measurable ROI (increasing ad clickthrough rates, reducing
     churn, improving customer acquisition); second, we wanted to train our
     internal business customers on how to use the Data Platform and associated
     tools to do subsequent Data Science projects on their own.
- Data Driven Applications: the ultimate goal of a lot of our Data Science
     initiatives at Yahoo! was to spur the creation of data-driven applications
     that could measurably impact the bottom or top line.   As the
     name implies, these applications leveraged the results of some underlying
     data science efforts (scoring algorithms, recommendation models, pricing
     optimizations) to drive actions taken in Yahoo!’s customer-facing and
     internal-facing applications.  The team was structured to work on a
     commissioned project basis: business units would request support to build
     specific applications and back up their requests with detailed business
     cases.  The Data Driven Applications team would then prioritize the
     long list of incoming requests and methodically tackle the highest-value
     projects.  This model turned out to be a win-win for both SDS and our
     internal customers – the business units received value-enhancing
     data-driven applications; and SDS was able to effectively show how the
     investment in data as a strategic asset was driving true ROI for Yahoo!.
- Data Distribution:
     a final and important aspect of the strategic data organization is an
     understanding that in addition to supporting the analytical needs (either
     via BI support or data science projects) there is also the need to support
     data distribution.  For example, at Yahoo! the core data platform was
     used to generate segment membership information for billions of users
     (browser cookies) each day.  These profiles needed to be distributed
     out to the operational systems that consumed them – the ad targeting
     platforms – so it was important to have the appropriate infrastructure and
     APIs to allow the consumers of these large data sets to access and move
     them.  Additionally, there were consistent demands to provision
     subsets of the data in the core data platform to other consumers both
     inside and outside of Yahoo!.  The Data Distribution challenge is one
     that many of our Greenplum customers today are starting to struggle with as
     well, and it’s important to think about it when scoping out a big data
     strategy.
Dive Right In.  The Water’s Warm!
Now I can’t guarantee
that the structure described above is perfect for every organization – there
are likely variations of this perspective that have worked for other successful
data groups.  However, I do think the emerging themes are consistent, and
that if you consider the above elements while diving into the Strategic Data
Organization waters, you’ll be more likely to achieve success.
At the end of the day,
there is a bit of a leap of faith
required to make a strategic bet on big data.  But the data shows that
it’s worth it.  A recent article in the Harvard
Business Review revealed: “In
particular, companies in the top third of their industry in the use of
data-driven decision making were, on average, 5% more productive and 6% more
profitable than their competitors.”
Good luck!

 
Good Luck Josh.
I've been working on Big Data as well in the past year and lots of what you wrote resonates with my visions.
Brands apart we're all trying to come up with a credible story so Information Management is taken to the next level in this era of social, mobility and innovation.
All the best,
LMC