Pivotal HD ODBMS Interview with Scott Yara and Florian Waas
ODBMS Editor Roberto Zicari talks to leaders of the new Pivotal about their new platform and Pivotal HD - their own Hadoop version.
by Roberto V. Zicari on April 22, 2013
On Monday, February 25th, Greenplum announced a new Hadoop distribution: Pivotal HD. I put a few questions about Pivotal HD to Scott Yara, Senior Vice President, Products and Co-Founder Greenplum/EMC, and Florian Waas, Senior Director of Advanced Research and Development at Greenplum/EMC.
Q1. What is in your opinion the status of adoption of, and investment in, open source projects such as Hadoop within the Enterprise?
Scott Yara, Florian Waas: We have seen a massive shift in perception when it comes to open source.
In the past, innovation was primarily driven by commercial R&D departments, and open source was merely trying to catch up to them. And even though a number of open source projects from that era have become household names, they weren't necessarily viewed as leaders in innovation.
This has fundamentally changed in recent years: open source has become a hotbed of innovation, in particular in infrastructure technology. Hadoop and a variety of other data management and database products are testament to this change. Enterprise customers do realize this trend and have started adopting open source on a large scale. It allows them to get their hands on new technology much faster than was the case before, and as an additional perk this technology comes without the dreaded vendor lock-in. ...
Q4. How did you expand Hadoop capabilities as a data platform with Pivotal HD?
Scott Yara, Florian Waas: Pivotal HD is a full Apache HD distribution plus some Pivotal add-ons. As we said before, the HDFS abstraction is a pretty good one, but the standard stack on top of it is severely lacking in performance and expressiveness, so we give customers better alternatives. For enterprise customers this means: you can use Pivotal HD like regular Hadoop where applicable, but if you need more, you get it in the same bundle.
Q5. What is the rationale behind introducing HAWQ, a relational database that runs on top of HDFS?
Scott Yara, Florian Waas: Not quite. We've transplanted a modern distributed query engine onto HDFS. We stripped out a lot of "incidental" database technology that databases are notorious for. HAWQ gives enterprises the best of both worlds: high-performance query processing for a query language they already know on the one hand, and scalable open storage on the other hand. And, unlike with a database, data isn't locked away in a proprietary format: in HAWQ you can access all stored data with any number of tools when you need to.