LexisNexis is releasing a set of open-source, data-processing tools that it says outperforms Hadoop and even handles workloads Hadoop presently can't.
GigaOm, Derrick Harris Jun. 15, 2011
LexisNexis is releasing a set of open-source, data-processing tools that it says outperforms Hadoop and even handles workloads Hadoop presently can't. The technology (and new business line) is called HPCC Systems, and was created 10 years ago within the LexisNexis Risk Solutions division that analyzes huge amounts of data for its customers in intelligence, financial services and other high-profile industries. There have been calls for a legitimate alternative to Hadoop, and this certainly looks like one.
According to Armando Escalante, CTO of LexisNexis Risk Solutions, the company decided to release HPCC now because it wanted to get the technology into the community before Hadoop became the de facto option for big data processing. Escalante told me during a phone call that he thinks of Hadoop as "a guy with a machete in front of a jungle - they made a trail," but that he thinks HPCC is superior.
How HPCC works
Hadoop relies on two core components to store and process huge amounts of data: the Hadoop Distributed File System and Hadoop MapReduce. However, as Cloudant CEO Mike Miller explained in a post over the weekend, MapReduce is still a relatively complex language for writing parallel-processing workflows. HPCC seeks to remedy this with its Enterprise Control Language.
Escalante says ECL is a declarative, data-centric language that abstracts a lot of the work necessary within MapReduce. For certain tasks that take a thousand lines of code in MapReduce, he said, ECL only requires 99 lines. Furthermore, he explained, ECL doesn't care how many nodes are in the cluster because the system automatically distributes data across however many nodes are present. Technically, though, HPCC could run on just a single virtual machine. And, says Escalante, HPCC is written in C++ - like the original Google MapReduce on which Hadoop MapReduce is based - which he says makes it inherently faster than the Java-based Hadoop version.