KDnuggets Interview: Amr Awadallah, CTO & Co-founder, Cloudera on the Need for Self-Service Analytics

We discuss the importance of enabling self-service analytics, partnership with Cask, Big Data vendor selection and competitive landscape.

Twitter Handle: @hey_anmol

Amr Awadallah is the Co-founder and CTO of Cloudera, Inc. Before co-founding Cloudera in 2008, Amr (@awadallah) was an Entrepreneur-in-Residence at Accel Partners. Prior to joining Accel, he served as Vice President of Product Intelligence Engineering at Yahoo!, where he ran one of the very first organizations to use Hadoop for data analysis and business intelligence. Amr joined Yahoo! after it acquired his first startup, VivaSmart, in July of 2000.

Amr holds Bachelor’s and Master’s degrees in Electrical Engineering from Cairo University, Egypt, and a Doctorate in Electrical Engineering from Stanford University.

First part of interview

Here is the second part of my interview with him:

Anmol Rajpurohit: Q5. What does the term "Self-service Analytics" mean? What makes it a major growth driver for Cloudera EDH? How does the Xplain.io acquisition help?

Amr Awadallah: Self-Service Analytics is about enabling the end users to ask new questions without having to wait months and months for the IT team to bring in the data that they need to ask these new questions. Cloudera's Enterprise Data Hub (EDH) goes to the core of solving that problem. It allows you to consolidate all of your data regardless of type (structured, semi-structured, or unstructured); later on, the business users can dynamically extract the subset of the schema that they care about without having to block on the IT team to re-architect the ETL pipeline.

The Xplain.io product allows you to analyse the query logs from a traditional RDBMS. It then infers which set of queries are ideal candidates to be moved to the EDH, makes suggestions on how the data should be laid out schema- and partition-wise, and rewrites the queries to be more efficient in the EDH. It is magic.
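Editor's illustration (not part of Amr's answer): the idea of storing data as-is and letting users pull out only the fields they care about at query time is often called schema-on-read. The snippet below is a minimal Python sketch of that idea under stated assumptions. The sample records, the field names, and the `read_with_schema` helper are all hypothetical, and nothing here represents Cloudera's EDH or its APIs.

```python
import json

# Raw records landed as-is, with no upfront schema (hypothetical sample data).
RAW_EVENTS = [
    '{"user": "u1", "action": "click", "page": "/home", "ms": 120}',
    '{"user": "u2", "action": "view", "page": "/pricing"}',
    '{"user": "u1", "action": "click", "referrer": "ad-42"}',
]


def read_with_schema(raw_lines, fields):
    """Parse each raw record and keep only the requested fields.

    Fields that a record does not contain come back as None, so a new
    question can be asked without reloading or restructuring stored data.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# A user who only cares about (user, page) applies that schema at read time.
rows = list(read_with_schema(RAW_EVENTS, ["user", "page"]))
for row in rows:
    print(row)
```

Asking a different question later, for example about `referrer`, only means passing a different field list; the stored records are untouched.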

AR: Q6. What was the main motivation behind strategic partnership with Cask? How does this impact Cloudera clients?

AA: You can think of Cask as a middleware abstraction layer on top of the EDH that simplifies a lot of the common development tasks (similar to IBM WebSphere on top of Java, for example). Cloudera customers that are building complex Big Data applications (as opposed to just using SQL or Search) are able to implement such applications much more quickly using the primitives available in the Cask development framework.

AR: Q7. What are the most common concerns that clients have while choosing a vendor for Big Data solutions?

AA: The most common concern is skillset. They are afraid that this is a new technology and they don't have the knowledge in-house to deal with it, so they stand still. That paralysis is very dangerous since it means they will deadlock while their competition leapfrogs them technologically. While Hadoop was hard to use 7 years ago, today there is a very rich ecosystem around this platform that makes it much easier to adopt (at Cloudera we have more than 300 software partners with tools and applications that simplify the adoption curve for this platform).

Cloudera also offers administrator training courses which, coupled with Cloudera Manager, allow a Linux sysadmin or DBA to learn how to operate this environment in a matter of weeks. We also offer training for the end users of the system, ranging from straightforward data analytics to more advanced data science and machine learning concepts. Check out university.cloudera.com for more details.

AR: Q8. How do you differentiate Cloudera from competitors such as Hortonworks and MapR? How has the competitive landscape changed over the past 2 years?

AA: Our product offering is more mature and more complete, especially when it comes to security and stability. First, on the security front we are the only solution with native encryption and private key management due to our Gazzang acquisition last year. Our security capabilities were also significantly enhanced when Intel partnered with Cloudera and merged their Hadoop distribution with ours.

Second, as I mentioned above, we have a lot of sophistication at Cloudera around the testing of the software to ensure the highest reliability and stability. That sophistication is further strengthened by Cloudera Manager which collects telematics from our installed customer base about how they are operating their clusters. We store all these telematics in our own EDH cluster which allows us to quickly analyse the customer operational data to support them when they have any issues. This data also allows us to do predictive maintenance for our customers, where we predict failure before it happens and reach out to our customers to make corrective changes to their environments.

Cloudera is much older than the other competitors and we have a much larger customer base, which has allowed us to create the largest, most diversified operational telematics database in the industry.

The third part of the interview will be published very soon.

Anmol Rajpurohit is a software development intern at Salesforce. He is an MDP Fellow and graduate mentor at UCI-Calit2. He has presented his research work at various conferences including IEEE Big Data 2013. He is currently a graduate student (MS, Computer Science) at UC Irvine.