KDnuggets Home » News » 2012 » Feb » Publications » On Big Data Analytics: Interview with Florian Waas, EMC/Greenplum.  ( < Prev | 12:n03 | Next > )

On Big Data Analytics: Interview with Florian Waas, EMC/Greenplum.


 
  


ODBMS Blog, by Roberto V. Zicari on February 1, 2012
"With terabytes, things are actually pretty simple - most conventional databases scale to terabytes these days. However, try to scale to petabytes and it's a whole different ball game." -Florian Waas.

On the subject of Big Data Analytics, I interviewed Florian Waas (flw). Florian is the Director of Software Engineering at EMC/Greenplum and heads up the Query Processing team.

Q1. What are the main technical challenges for big data analytics?

Florian Waas: Put simply, in the Big Data era the old paradigm of shipping data to the application isn't working any more. Rather, the application logic must "come" to the data or else things will break: this is counter to conventional wisdom and the established notion of strata within the database stack.
Instead of stand-alone products for ETL, BI/reporting and analytics we have to think about seamless integration: in what ways can we open up a data processing platform to enable applications to get closer?
What language interfaces, but also what resource management facilities can we offer? And so on.

At Greenplum, we've pioneered a couple of ways to make this integration a reality: a few years ago with a Map-Reduce interface for the database and more recently with MADlib, an open source in-database analytics package. In fact, both rely on a powerful query processor under the covers that automates shipping application logic directly to the data.
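
To make the contrast between "shipping data to the application" and bringing the logic to the data concrete, here is a minimal sketch. It is an illustration only: it uses Python's built-in `sqlite3` module as a single-node stand-in, not Greenplum, its Map-Reduce interface, or MADlib, and the `sales` table and all function names are hypothetical. The point it tries to show is simply how many rows cross the database boundary in each style.

```python
import sqlite3


def build_demo_db(n_rows=1000):
    """Create an in-memory database with a hypothetical `sales` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = [("east" if i % 2 == 0 else "west", float(i)) for i in range(n_rows)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def totals_ship_data(conn):
    """Old paradigm: pull every row into the application, then aggregate there."""
    totals = {}
    rows_moved = 0
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
        rows_moved += 1
    return totals, rows_moved


def totals_ship_logic(conn):
    """Push the aggregation into the database; only the results come back."""
    result = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(result), len(result)


if __name__ == "__main__":
    conn = build_demo_db()
    by_app, moved_by_app = totals_ship_data(conn)
    by_db, moved_by_db = totals_ship_logic(conn)
    print("rows moved to the application:", moved_by_app)
    print("rows moved when logic runs in the database:", moved_by_db)
    print("same totals:", by_app == by_db)
```

Both functions compute the same totals, but the first moves every row to the application while the second moves one row per group. On a distributed system the same idea would presumably apply per node, with the query processor deciding where each piece of logic runs; that part is not modeled here.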

Q2. When dealing with terabytes to petabytes of data, how do you ensure scalability and performance?

Florian Waas: With terabytes, things are actually pretty simple - most conventional databases scale to terabytes these days. However, try to scale to petabytes and it's a whole different ball game.
Scale and performance requirements strain conventional databases. Almost always, the problems are a matter of the underlying architecture. If not built for scale from the ground up, a database will ultimately hit the wall. This is what makes it so difficult for the established vendors to play in this space: you cannot simply retrofit a 20+ year-old architecture to become a distributed MPP database overnight.
Having said that, over the past few years a whole crop of new MPP database companies has demonstrated that multiple petabytes don't pose a terribly big challenge if you approach the problem with the right architecture in mind.


