KDnuggets Interview: Amr Awadallah, CTO & Co-founder, Cloudera on the Future of Information Architecture Design

We discuss Cloudera’s achievements, the story behind the name ‘Cloudera’, the CTO role, and the key attributes of an information architecture designed for the future.

Twitter Handle: @hey_anmol

Amr Awadallah is a co-founder and the CTO of Cloudera, Inc. Before co-founding Cloudera in 2008, Amr (@awadallah) was an Entrepreneur-in-Residence at Accel Partners. Prior to joining Accel he served as Vice President of Product Intelligence Engineering at Yahoo!, where he ran one of the very first organizations to use Hadoop for data analysis and business intelligence. Amr joined Yahoo! after it acquired his first startup, VivaSmart, in July of 2000.

Amr holds Bachelor’s and Master’s degrees in Electrical Engineering from Cairo University, Egypt, and a Doctorate in Electrical Engineering from Stanford University.

Here is my interview with him:

Anmol Rajpurohit: Q1. You have already described the founding story of Cloudera in a few interviews available online, so I will not ask you to repeat it. Rather, I am curious to know: how do you assess the evolution of Cloudera? What do you consider its most significant achievements so far?

Amr Awadallah: The most significant achievement is taking a technology (Apache Hadoop) which was only relevant to a couple of web companies (like Yahoo and Facebook) and then growing it into a whole industry as we see today. It is very satisfying to see such an evolution happen in a matter of 7 years.

AR: Q2. You have shared the history behind the name "Hadoop" many times. I am curious to know who came up with the name "Cloudera", and what did the founders like so much about that name?

AA: In our first iteration of the company we were planning on launching Hadoop as a cloud service. We truly believed that the era of the cloud was upon us, hence the name Cloudera. Six months after starting the company we shifted from being a cloud service to on-premises enterprise software, since that is what our customers wanted back then. We kept the name Cloudera because it was already well known, we thought it was cool :), and we always knew we would be coming back to the cloud (our software now runs in a number of cloud environments via a product called Cloudera Director).

AR: Q3. What are your top priorities currently as the CTO of Cloudera? What does your typical day look like?

AA: I have an article on Quora in which I describe what a CTO does; you can find it here: http://www.quora.com/Chief-Technology-Officers/What-does-a-CTO-do . The summary is to balance the outside world (customers, partners, developers) with the inside world (product, technology, culture) without missing any major trends.

In my role I also do a lot of evangelism, which involves giving talks at conferences all around the world. I cannot share my current top priority as that is confidential :) but in general it is to make sure that Cloudera continues to build technology that sets us up for success in the long term.

AR: Q4. With regard to the Modern Information Architecture, what capabilities do you consider the critical success factors? What are your thoughts on the flexibility vs performance trade-off?

AA: Over the last couple of years we proved that this platform is very scalable, very economical, and very flexible (any data type, any workload -- not just SQL). The most critical factors for success right now are to focus on making this platform bulletproof in terms of security, and to make sure it is extremely stable and reliable. That is easier said than done since the platform still continues to evolve very quickly with new project additions like Apache Spark or Apache Kafka. All these new additions require very extensive testing to catch all the corner cases and race conditions that might lead to stability issues in such a large scale complex system. At Cloudera we spend almost half of our engineering resources focused on these issues.

In terms of performance vs flexibility, both are important, but it is sometimes hard to achieve both at the same time. If you are running the same dashboard a thousand times every day then you care more about performance. If you are exploring and trying to ask new questions that you haven't thought about before then you care more about flexibility. The beauty of our platform is that it allows you to pick which operating mode you prefer. We have schema-on-read formats like Avro that are extremely flexible, and we have highly optimized schema-on-write formats like Parquet which are very high performance. You pick which is best for the task at hand.
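Editor's illustration: the trade-off described above can be sketched in a few lines of Python. This is a toy model using only the standard library, not the Avro or Parquet APIs; the sample records and the names `RAW_LINES`, `SCHEMA`, `read_with_schema`, `write_columnar`, and `total_clicks` are hypothetical and chosen only for this example.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (one JSON object per line).
RAW_LINES = [
    '{"user": "a", "clicks": 3}',
    '{"user": "b", "clicks": 5, "country": "EG"}',
    '{"user": "c"}',
]


def read_with_schema(lines, fields):
    """Schema-on-read: the structure is decided at query time.

    Any field can be requested later, even one nobody planned for;
    records that lack the field simply yield None.
    """
    return [{f: rec.get(f) for f in fields} for rec in map(json.loads, lines)]


# Hypothetical schema fixed up front for the schema-on-write path.
SCHEMA = {"user": str, "clicks": int}


def write_columnar(lines, schema):
    """Schema-on-write: validate and lay out the data in columns at ingest time.

    Fields outside the schema are dropped, and values of the wrong type are rejected.
    """
    columns = {name: [] for name in schema}
    for rec in map(json.loads, lines):
        for name, expected_type in schema.items():
            value = rec.get(name)
            if value is not None and not isinstance(value, expected_type):
                raise TypeError(f"{name!r} must be {expected_type.__name__}")
            columns[name].append(value)
    return columns


def total_clicks(columns):
    """A repeated dashboard-style query that scans only the one column it needs."""
    return sum(v for v in columns["clicks"] if v is not None)


if __name__ == "__main__":
    # Exploration: ask about a field ("country") that was never part of any schema.
    print(read_with_schema(RAW_LINES, ["user", "country"]))
    # Repeated reporting: pay the structuring cost once, then scan a single column.
    print(total_clicks(write_columnar(RAW_LINES, SCHEMA)))
```

In this sketch the first path keeps every original field available for new questions, while the second gives up fields outside `SCHEMA` in exchange for a layout that a repeated query can scan cheaply.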

The second part of the interview will be published soon.

Anmol Rajpurohit is a software development intern at Salesforce. He is an MDP Fellow and graduate mentor at UCI-Calit2. He has presented his research work at various conferences including IEEE Big Data 2013. He is currently a graduate student (MS, Computer Science) at UC, Irvine.