KDnuggets Home » News » 2012 » Jan » Software » Big Data Tools: HPCC vs Hadoop  ( < Prev | 12:n03 | Next > )

Big Data Tools: HPCC vs Hadoop

Four key factors differentiate HPCC from Hadoop: the Enterprise Control Language (ECL), the Roxie delivery engine, a design that goes beyond MapReduce, and enterprise readiness.


Gregory Piatetsky: These points are from the HPCC Systems website. I look forward to a response from the Hadoop camp.

The four key factors that differentiate HPCC from Hadoop.

1. HPCC Enterprise Control Language (ECL)

  • Declarative programming language: Describe what needs to be done, not how to do it
  • Powerful: Unlike Java, ECL offers high-level primitives such as JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, and MAP. Higher-level code means fewer programmers and shorter time to deliver complete projects
  • Extensible: As new attributes are defined, they become primitives that other programmers can use
  • Implicitly parallel: Parallelism is built into the underlying platform. The programmer need not be concerned with it
  • Maintainable: A high-level language with no side effects and attribute encapsulation yields code that is more succinct, reliable, and easier to troubleshoot
  • Complete: Unlike Pig and Hive, ECL provides a complete programming paradigm.
  • Homogeneous: One language to express data algorithms across the entire HPCC platform, including data ETL and delivery.
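
To make the declarative style concrete, here is a minimal sketch of ECL; the record layout, field names, and inline data are illustrative assumptions, not taken from the article:

```ecl
// Hypothetical example: a record layout, an inline dataset,
// and two of the declarative primitives listed above (SORT, PROJECT).
PersonRec := RECORD
    STRING20  name;
    UNSIGNED1 age;
END;

people := DATASET([{'Alice', 34}, {'Bob', 29}], PersonRec);

// Declarative: we state *what* we want; the platform decides
// how to parallelize and schedule the work.
byAge := SORT(people, age);

OutRec := RECORD
    STRING20 name;
END;

namesOnly := PROJECT(byAge, TRANSFORM(OutRec, SELF.name := LEFT.name));

OUTPUT(namesOnly);
```

Note how each definition (`byAge`, `namesOnly`) becomes an attribute that other code can reuse, which is the extensibility point made above.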
2. Roxie Delivery Engine
  • Low latency: Data queries are typically completed in fractions of a second
  • Not a key-value store: Unlike HBase, Cassandra and others, Roxie is not limited by the constraints of key-value data stores, allowing for complex queries, multi-key retrieval, fuzzy matching and more
  • Highly available: Roxie is designed to operate in critical environments under the most rigorous service level requirements
  • Scalable: Horizontally linear scalability provides room to accommodate future data and performance growth
  • Highly concurrent: In a typical environment, thousands of concurrent clients can be simultaneously executing transactions on the same Roxie system
  • Redundant: A shared-nothing architecture with no single points of failure provides extreme fault tolerance
  • ECL inside: One language to describe both the data transformations in Thor and the data delivery strategies in Roxie.
  • Consistent tools: Thor and Roxie share exactly the same set of tools, which provides consistency across the platform.
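
Since Roxie queries are written in the same ECL, a parameterized query can be sketched roughly as follows; the `STORED` name, record layout, and inline data here are assumptions for illustration (a real Roxie query would typically read an INDEX built by Thor rather than an inline dataset):

```ecl
// Illustrative sketch of a parameterized delivery query.
PersonRec := RECORD
    STRING25 name;
    STRING2  state;
END;

people := DATASET([{'Alice', 'NY'}, {'Bob', 'CA'}], PersonRec);

// STORED exposes this definition as an input parameter that a
// client supplies when the published query is executed.
STRING25 searchName := '' : STORED('searchName');

OUTPUT(people(name = searchName));
```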
3. Beyond MapReduce
  • Open Data Model: Unlike Hadoop, the data model is defined by the user, and it's not constrained by the limitations of a strict key-value paradigm
  • Simple: Unlike Hadoop MapReduce, solutions to complex data problems can be expressed easily and directly in terms of high level ECL primitives. With Hadoop, creating MapReduce solutions to all but the most simple data problems can be a daunting task. Many of these complexities are eliminated by the HPCC programming model
  • Truly parallel: Unlike Hadoop, nodes of a datagraph can be processed in parallel as data seamlessly flows through them. In Hadoop MapReduce (Java, Pig, Hive, Cascading, etc.), almost every complex data transformation requires a series of MapReduce cycles; each phase of these cycles cannot start until the previous phase has completed for every record, which contributes to the well-known "long tail problem" in Hadoop. HPCC avoids this, resulting in higher and more predictable performance.
  • Powerful optimizer: The HPCC optimizer ensures that submitted ECL code is executed at the maximum possible speed for the underlying hardware. Advanced techniques such as lazy execution and code reordering are thoroughly utilized to maximize performance
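
The "truly parallel" point can be illustrated by a single ECL job in which a JOIN flows directly into a grouped aggregation; in MapReduce terms this would typically be two chained jobs with a full barrier between them. The record layouts, names, and data below are hypothetical:

```ecl
// Hypothetical dataflow: a JOIN feeding a grouped aggregation,
// expressed as one job rather than chained MapReduce cycles.
CustRec := RECORD
    UNSIGNED4 id;
    STRING2   state;
END;
OrderRec := RECORD
    UNSIGNED4 custId;
    REAL8     amount;
END;

customers := DATASET([{1, 'NY'}, {2, 'CA'}], CustRec);
orders    := DATASET([{1, 10.0}, {1, 5.0}, {2, 7.5}], OrderRec);

Combined := RECORD
    STRING2 state;
    REAL8   amount;
END;

joined := JOIN(customers, orders, LEFT.id = RIGHT.custId,
               TRANSFORM(Combined,
                         SELF.state  := LEFT.state;
                         SELF.amount := RIGHT.amount));

// TABLE performs the grouped aggregation; the optimizer is free to
// pipeline it with the JOIN instead of waiting for a global barrier.
totals := TABLE(joined, {state, total := SUM(GROUP, amount)}, state);

OUTPUT(totals);
```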
4. Finally, HPCC is Enterprise Ready.

Read more.

Comments:

Jesse Shaw
I prefer HPCC over Hadoop. You couldn't have been more succinct.


