Google's Dremel is a scalable, interactive ad-hoc query system for Big Data that goes well beyond what Hadoop can do. By combining multi-level execution trees with a columnar data layout, it can run aggregation queries over trillion-row tables in seconds. And you can use Dremel today.
... Hadoop sprang from two research papers Google published in late 2003 and 2004. One described
the Google File System, a way of storing massive amounts of data across thousands of dirt-cheap computer servers, and the other detailed
MapReduce, which pooled the processing power inside all those servers and crunched all that data into something useful. Eight years later, Hadoop is widely used across the web for data analysis and all sorts of other number-crunching tasks. But Google has moved on.
In 2009, the web giant started replacing GFS and MapReduce with new technologies, and Cloudera CEO Mike Olson will tell you that these technologies are where the world is going. "If you want to know what the large-scale, high-performance data processing infrastructure of the future looks like, my advice would be to read the Google research papers that are coming out right now," Olson said during a recent panel discussion with Wired.
Since the rise of Hadoop, Google has published three particularly interesting papers on the infrastructure that underpins its massive web operation. One details Caffeine, the software platform that builds the index for Google's web search engine. Another shows off Pregel, a "graph database" designed to map the relationships between vast amounts of online information. But the most intriguing paper is the one that describes a tool called Dremel.
... you can use Dremel today - even if you're not a Google engineer. Google now offers a Dremel web service it calls
BigQuery. You can use the platform via an online API, or application programming interface. Basically, you upload your data to Google, and it lets you run queries on its internal infrastructure.
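For a flavor of what that round trip looks like, here is a minimal sketch. It assumes the `google-cloud-bigquery` Python client and hypothetical project, dataset, and table names; the real network call is left as a comment so the example stays self-contained, with the aggregation simulated over a tiny in-memory "table":

```python
import collections

# The SQL you would send to BigQuery (dataset/table names are hypothetical):
SQL = "SELECT word, COUNT(*) AS n FROM my_dataset.words GROUP BY word"

# With credentials configured, the real call is roughly:
#   from google.cloud import bigquery
#   rows = bigquery.Client(project="my-project").query(SQL).result()
# Below we simulate the same GROUP BY / COUNT locally, just to show
# what the service computes on Google's infrastructure.
table = [{"word": "dremel"}, {"word": "hadoop"}, {"word": "dremel"}]
counts = collections.Counter(row["word"] for row in table)
print(dict(counts))  # word frequencies, as the query above would return
```

The point of the service is that the same query shape runs unchanged whether the table holds three rows or a trillion.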
See also the Google Research paper
Dremel: Interactive Analysis of Web-Scale Datasets, by Sergey Melnik et al.
Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. In this paper, we describe the architecture and implementation of Dremel, and explain how it complements MapReduce-based computing. We present a novel columnar storage representation for nested records and discuss experiments on few-thousand node instances of the system.
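The columnar representation for nested records that the abstract mentions can be sketched in miniature. This is only the shredding half of the idea, and it omits the repetition and definition levels Dremel stores to reconstruct full nesting; the record and field names are illustrative:

```python
# Shred nested records into per-path columns, so a query that touches one
# field reads only that column instead of whole records. (Real Dremel also
# stores repetition/definition levels per value; omitted in this sketch.)
records = [
    {"name": "a", "links": {"forward": [2, 4]}},
    {"name": "b", "links": {"forward": [6]}},
]

columns = {"name": [], "links.forward": []}
for rec in records:
    columns["name"].append(rec["name"])
    columns["links.forward"].extend(rec["links"]["forward"])

# An aggregation over one field now scans a single contiguous column:
total_links = len(columns["links.forward"])
```

Laying values out this way is what lets an aggregation over one field skip the rest of the table entirely, which is central to Dremel's trillion-rows-in-seconds claim.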