ODBMS Blog, by Roberto V. Zicari, Dec 5, 2012
On Big Data, Analytics and Hadoop. Interview with Daniel Abadi.
On the subject of Big Data, Analytics and Hadoop, I have interviewed Daniel Abadi, associate professor of computer science at Yale University and Chief Scientist and Co-founder of Hadapt.
... RVZ: You have created a startup called Hadapt which claims to be the "first platform to combine Apache Hadoop and relational DBMS technologies". What is it? Why combine Hadoop with relational database technologies?
Daniel Abadi: Hadoop is becoming the standard platform for doing large-scale processing of data in the enterprise. Its rate of growth far exceeds that of any other "Big Data" processing platform. Some people even think that "Hadoop" and "Big Data" are synonymous (though this is an over-characterization). Unfortunately, Hadoop was designed based on a paper by Google in 2004 which was focused on use cases involving unstructured data (e.g. extracting words and phrases from Web pages in order to create Google's Web index).
Because it was not originally designed to leverage the structure in relational data in order to take shortcuts in query processing, its performance for processing relational data is suboptimal.
At Hadapt, we're bringing three decades of relational database research to Hadoop. We have added features like indexing, co-partitioned joins, broadcast joins, and SQL access (with interactive query response times) to Hadoop, in order both to accelerate its performance for queries over relational data and to provide an interface that third-party data processing and business intelligence tools are familiar with.
Therefore, we have taken Hadoop, which used to be just a tool for super-smart data scientists, and brought it to the mainstream by providing a high-performance SQL interface that business analysts and data analysis tools already know how to use. However, we've gone a step further and made it possible to include both relational data and non-relational data in the same query. What we've got now is a platform that people can use to do really new and innovative types of analytics involving both unstructured data, like tweets or blog posts, and structured data, such as traditional transactional data that usually sits in relational databases.
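For readers unfamiliar with the two join strategies named in the answer above, here is a minimal single-process sketch in Python. It is not Hadapt's implementation, which the interview does not describe. The function names, the sample rows, and the use of in-memory lists to stand in for cluster nodes are all assumptions made for illustration. A co-partitioned join relies on both tables being split by the same hash of the join key, so each node can join its own pair of partitions without moving data. A broadcast join copies a small table to every node and joins it against each partition of the large table.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Join two lists of dict rows on `key` using an in-memory hash table."""
    table = defaultdict(list)
    for row in right:
        table[row[key]].append(row)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


def partition(rows, key, n):
    """Split rows into n partitions by hashing the join key (hypothetical helper)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def co_partitioned_join(left, right, key, n):
    """Join partition i of `left` only with partition i of `right`.

    A real system would partition the data once at load time; here the
    partitioning happens on each call purely to simulate that layout.
    """
    out = []
    for lpart, rpart in zip(partition(left, key, n), partition(right, key, n)):
        out.extend(hash_join(lpart, rpart, key))  # purely local work per "node"
    return out


def broadcast_join(large_parts, small, key):
    """Send the whole small table to every partition of the large table."""
    out = []
    for part in large_parts:
        out.extend(hash_join(part, small, key))
    return out
```

In this sketch both strategies return the same rows as a plain hash join. The difference in a cluster would be how much data has to cross the network: none for the co-partitioned case, and only the small table for the broadcast case.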