This development is not surprising, as the values of agile development should resonate with anyone who's involved in delivering data and insights. At its core, agile values:
- Individuals and interactions over processes and tools
- Working software over comprehensive documentation
- Customer collaboration over contract negotiation
- Responding to change over following a plan
Let's start with the implications of "agile" in the world of Big Data Analytics (and also Big Data Applications). Based on my experience at Yahoo! (and on what I am starting to see from Greenplum's customers), needs evolve over the lifecycle of Big Data Analytics application development - let's call this the Analytics Application Development Lifecycle. Before diving into what this lifecycle looks like, let's first talk about the environment that enterprises with Big Data are dealing with today. In general, I've seen the following characteristics, which end up informing what an Agile Big Data environment needs to support:
- Underlying Data Sets are Fast Changing: In this environment, timely analysis of new products and concepts is a competitive advantage. As a result, data processing and analysis systems need to be flexible enough to support underlying changes without requiring a rewrite or a new data model.
- Demand for Analytics is Time Sensitive: In the big data world, the ability to analyze new features that are in production and impact revenue/monetization is critical. Delays in turning around new requests can result in serious financial impact or customer risk.
- Business Questions and Data Needs are Unpredictable: Anyone who is supporting the Business Intelligence (BI) needs of a "Big-Data-Driven" organization will tell you that reporting and analysis needs for new features can’t be anticipated - additional data needs often arise as the result of first-pass analyses. This means that data query and analysis systems must be built for unpredictable demands.
- Volumes of Data and Data Consumers are Extremely Large: Analytics systems need to support deep analysis by data scientists, dashboards and reporting for larger internal user bases, and consumption by operational systems. To complicate things, all of these capabilities need to scale to support massive & growing data sets.
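To make the first and third characteristics a little more concrete, here is a minimal Python sketch of ad-hoc analysis over raw event records whose fields change over time. Everything in it is hypothetical and for illustration only: the event fields (`user`, `action`, `promo_id`) and the helper `count_by` are assumptions, not part of any particular platform.

```python
from collections import Counter

# Hypothetical raw events. The later records carry a field ("promo_id")
# that did not exist when the earlier records were logged.
raw_events = [
    {"user": "u1", "action": "click"},
    {"user": "u2", "action": "view"},
    {"user": "u1", "action": "click", "promo_id": "spring"},
    {"user": "u3", "action": "click", "promo_id": "spring"},
]


def count_by(events, field, default="(missing)"):
    """Count events by any field, tolerating records that lack it."""
    return Counter(event.get(field, default) for event in events)


# An unanticipated question: how do clicks break down by promotion?
clicks = [e for e in raw_events if e["action"] == "click"]
clicks_by_promo = count_by(clicks, "promo_id")
```

Because the grouping field is chosen at query time and missing fields are tolerated, a new attribute can be analyzed as soon as it appears in the raw data, without first changing a data model. A real platform would do this at far larger scale, but the idea is the same.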
Given the above, what does a Big Data & Analytics platform need to do? It needs to support the analytics lifecycle as shown below.
A system that can easily support the above flow - with a focus on iterative, collaborative development within the "Ad Hoc" and "Proving Ground" quadrants - is well positioned to drive success for Big Data, Big Analytics initiatives. (Obviously I am biased, but check out Greenplum's recent launch of Chorus to understand our vision here.) When evaluating whether your own platform is ready to support this lifecycle, my advice is to focus on the capabilities described in the Top Ten list below. This is not a comprehensive list, but it captures the core elements that one should be looking for as part of a data platform rollout.
- Ad-hoc access to “raw” event/user level data
- Data source agnosticism – Hadoop & RDBMS interop
- Data search and discovery
- Analysis- and Developer-friendly environment – SQL, Code
- Lower-than-average cost of change for new data, metrics
- Schedule and publish capabilities for views, tables, insights
- Unified catalog/metadata service
- 3rd Party Tool “friendliness”
- Resource management for ad-hoc & production workloads
- Enterprise features for the entire data system
There are plenty of other things to think about as well: do you have the right "Data Scientists" within your organization to leverage this platform? Are you properly instrumenting your products and processes to drive data into your data platform? Are you thinking about closing the loop by building applications and systems that can leverage the insights delivered by your data science team (operationalization, as it were)? All things to keep in mind as you venture into the exciting new world of Big Data, and Big Analytics.