Sunday, February 12, 2012
DATA, DATA EVERYWHERE
The implications of this shift (let's assume it is a shift) are numerous, but one is that the emerging practice and community of data science will require a new set of tools and capabilities. I read a great article recently (if someone can point me to the link, I can reference it appropriately) that compared the emerging data science practice to the emergence of computer scientists several decades ago. Before there were classes, tools, etc. for computer scientists, we had electrical engineers, applied mathematicians, and people from other quantitative fields contributing to the emerging field of computer science. Today you can major in computer science, and there are software packages built exclusively for computer scientists (IDEs, for example); it has simply become part of our standard nomenclature. The same thing is happening with data science and with what we refer to at Greenplum as Big Analytics: a set of tools and capabilities is emerging (and still needs to be developed) that enables the world's data scientists to do their jobs better, faster, and with bigger and bigger data.
Thursday, February 9, 2012
THE BEGINNING OF SOMETHING BIG
But at Yahoo! I learned that what I'd been doing so far was child's play. The first application I worked on, an internal tool for measuring the reach and engagement of Yahoo! properties, processed over 4 billion events a day and stored that data in a database that was 5 times larger than anything I had worked with to date. During my time at Yahoo! I worked on data pipelines and analytical applications that routinely processed tens of terabytes of raw data a day, and on data marts that easily broke the 100-terabyte mark. And for a while, I think Yahoo! was one of the few companies in the world that both needed to work with such data sizes and had the capacity to do so.
What I see today is that times have changed: what was once the rarefied air of the few is becoming the daily Big Data challenge of the many. It's no longer sufficient for Big Data systems to be run only by a select few who know how to effectively deploy and manage Hadoop clusters or manually tune and maintain an MPP database. In the emerging world of Big Data, the companies that survive and win will be those that can get past the hurdles of just "managing" Big Data and can rapidly iterate and learn using Big Analytics.
So, next week I will embark on a new journey, taking what I've learned over the past 12 years and applying it to the industry's first real Big Data and Big Analytics platform. I am sure it will be a fun ride, and a chance to do something, well... BIG.