Apache Drill clones Google Dremel Real Time Big Data Tool

The Apache Drill project wants to build an open source version of Google Dremel, which allows real-time querying of Big Data, and is suitable for processing streaming data. This would be a big step beyond Hadoop, which was designed for batch processing.

TechCrunch, Klint Finley, Aug 17, 2012

[ Gregory PS: Google Dremel is a scalable, interactive ad-hoc query system for Big Data which greatly exceeds Hadoop capabilities ]

... Apache Drill is an attempt to build an open source version of Google Dremel, and the project was recently accepted into the Apache Incubator program. It's supported by MapR, a company that sells a modified version of Hadoop with proprietary customizations.

There are other open source real-time big data systems, notably Storm, which was developed at Backtype and open sourced by Twitter, and Apache S4, which was open sourced by Yahoo. Storm in particular has gotten a lot of attention lately, and Nodeable launched a cloud-hosted version of it recently.

Nodeable CEO Dave Rosenberg says the big difference between Dremel and other real-time big data systems such as Storm and S4 is that the latter are streaming engines, while Dremel is designed for ad-hoc querying, i.e., really fast search results.

Hadoop is for batch processing, meaning that queries are run on a set of data that you already have. Streaming engines process data as it comes in. The terms "streaming" and "real time" are often used interchangeably, which could lead to some confusion about Dremel/Drill since they are also referred to as real time.
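
To make the batch/streaming distinction concrete, here is a minimal Python sketch. It is not taken from the article or from any of the systems mentioned; the function names and the sample data are hypothetical, chosen only for illustration. The batch function computes one answer over data that already exists, while the streaming function updates its answer as each item arrives.

```python
from typing import Iterable, Iterator, List


def batch_total(events: List[int]) -> int:
    """Batch style: the full data set is already stored; compute one result."""
    return sum(events)


def streaming_totals(events: Iterable[int]) -> Iterator[int]:
    """Streaming style: emit an updated result for every arriving event."""
    total = 0
    for value in events:
        total += value
        yield total


# Hypothetical sample data standing in for stored or incoming records.
stored = [3, 5, 2]
print(batch_total(stored))                    # one result over the stored set
print(list(streaming_totals(iter(stored))))   # one result per arriving event
```

An ad-hoc query system in the Dremel sense is closer to the first function than the second: the data is already stored, and the aim is to answer an arbitrary question about it quickly.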

...

There's another project in the works to create an open source version of Dremel, called OpenDremel. Other projects working on speedy queries for big data include Apache CouchDB and the Cloudant-backed variant BigCouch.
