ODBMS blog, Roberto V. Zicari, September 7, 2012
How easy is to use Hadoop? What are the next generation Hadoop distributions? On these topics, I did Interview John Schroeder, Cofounder and CEO of
Q1. What is the value that Apache Hadoop provides as a Big Data analytics platform?
John Schroeder: Apache Hadoop is a software framework that supports data-intensive distributed applications. Apache Hadoop provides a new platform to analyze and process Big Data. With data growth exploding and new unstructured sources of data expanding a new approach is required to handle the volume, variety and velocity of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
Q2. Is scalability the only benefits of Apache Hadoop then?
John Schroeder: No, you can build applications that aren't feasible using traditional data warehouse platforms.
The combination of scale, ability to process unstructured data along with the availability of machine learning algorithms and recommendation engines creates the opportunity to build new game changing applications.
Q7. What is special about MapR architecture? Please give us some technical detail.
John Schroeder: MapR rethought and rebuilt the entire internal infrastructure while maintaining complete compatibility with Hadoop APIs. MapR opened up Hadoop to programming, products and data access by providing a POSIX storage layer, NFS access, ODBC/JDBC and REST. The MapR strategy is to expand use cases appropriate for Hadoop while avoiding proprietary features that would result in vendor lock-in. We rebuilt the underlying storage services layer and eliminated the "append only" limitation of the Hadoop Distributed File System (HDFS). MapR writes directly to block devices eliminating the inefficiencies caused by layering HDFS on top of a Linux local file system. In other Hadoop implementations, these continually rob the entire system of performance efficiencies.
Q11. What are the next generations of Hadoop distributions?
John Schroeder: The first generation of Hadoop surrounded the open source Apache project with services and management utilities. MapR is the next generation of Hadoop that combines standards-based innovations developed by MapR with open source components resulting in the industry's only differentiated distribution that meets the needs of both the largest and emerging Hadoop installations.