KDnuggets Home » News » 2012 » Dec » Publications » On Big Data, Analytics and Hadoop, and Hadapt

On Big Data, Analytics and Hadoop, and Hadapt


 
  
Unfortunately, Hadoop was designed for unstructured data and for tasks like extracting keywords from web pages to build Google's index. Hadoop does not use the structure in relational data to speed up query processing, so its performance on relational data is suboptimal.


ODBMS Blog, by Roberto V. Zicari, Dec 5, 2012

On Big Data, Analytics and Hadoop. Interview with Daniel Abadi.

On the subject of Big Data, Analytics and Hadoop, I have interviewed Daniel Abadi, associate professor of computer science at Yale University and Chief Scientist and Co-founder of Hadapt.

... RVZ: You have created a start-up called Hadapt which claims to be the "first platform to combine Apache Hadoop and relational DBMS technologies". What is it? Why combine Hadoop with relational database technologies?

Daniel Abadi: Hadoop is becoming the standard platform for doing large scale processing of data in the enterprise. Its rate of growth far exceeds any other "Big Data" processing platform. Some people even think that "Hadoop" and "Big Data" are synonymous (though this is an over-characterization). Unfortunately, Hadoop was designed based on a paper by Google in 2004 which was focused on use cases involving unstructured data (e.g. extracting words and phrases from web pages in order to create Google's Web index).

Since it was not originally designed to leverage the structure in relational data in order to take short-cuts in query processing, its performance for processing relational data is suboptimal.

At Hadapt, we're bringing three decades of relational database research to Hadoop. We have added features like indexing, co-partitioned joins, broadcast joins, and SQL access (with interactive query response times) to Hadoop, in order to both accelerate its performance for queries over relational data and provide an interface that third-party data processing and business intelligence tools are familiar with.
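To make two of the join strategies named above concrete, here is a minimal single-process sketch in Python. It is not Hadapt's implementation: the function names, the `cust_id` key, and the sample rows are all hypothetical, and each list of rows simply stands in for the data held on one cluster node. A broadcast join ships a full copy of the small table to every node, while a co-partitioned join relies on both tables having been split by the same function of the join key, so each node can join its local pieces without moving data.

```python
from collections import defaultdict

def build_lookup(rows, key):
    """Hash rows by join key so matches can be found without a full scan."""
    lookup = defaultdict(list)
    for row in rows:
        lookup[row[key]].append(row)
    return lookup

def broadcast_join(big_partitions, small_table, key):
    """Join every partition of a large table against one full copy of a small table."""
    lookup = build_lookup(small_table, key)  # this copy would be sent to every node
    out = []
    for partition in big_partitions:         # each partition stands in for one node
        for row in partition:
            for match in lookup.get(row[key], []):
                out.append({**row, **match})
    return out

def partition_by_key(rows, key, n):
    """Split rows into n partitions using the same function of the join key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def copartitioned_join(left_parts, right_parts, key):
    """Join partition i of one table only with partition i of the other."""
    out = []
    for left, right in zip(left_parts, right_parts):
        lookup = build_lookup(right, key)    # built from local rows only
        for row in left:
            for match in lookup.get(row[key], []):
                out.append({**row, **match})
    return out

# Hypothetical sample data (integer keys keep hash() deterministic).
orders = [
    {"cust_id": 1, "amount": 10},
    {"cust_id": 2, "amount": 20},
    {"cust_id": 1, "amount": 5},
    {"cust_id": 3, "amount": 7},   # no matching customer, dropped by the inner join
]
customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]

via_broadcast = broadcast_join([orders[:2], orders[2:]], customers, "cust_id")
via_copartition = copartitioned_join(
    partition_by_key(orders, "cust_id", 2),
    partition_by_key(customers, "cust_id", 2),
    "cust_id",
)
```

Both strategies produce the same joined rows, possibly in a different order. The trade-off in this sketch is that the broadcast version copies the whole small table everywhere, whereas the co-partitioned version needs both tables to be laid out by the join key ahead of time.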

Therefore we have taken Hadoop, which used to be just a tool for super-smart data scientists, and brought it to the mainstream by providing a high performance SQL interface that business analysts and data analysis tools already know how to use. However, we've gone a step further and made it possible to include both relational data and non-relational data in the same query. What we've got now is a platform that people can use to do really new and innovative types of analytics involving both unstructured data like tweets or blog posts and structured data such as traditional transactional data that usually sits in relational databases.




