KDnuggets Interview: Amr Awadallah, CTO & Co-founder, Cloudera on the Secret Sauce of Open Source

We discuss the critical success factor for open source projects, entrepreneurial lessons, advice, desired qualities in data scientists and more.

Twitter Handle: @hey_anmol

Amr Awadallah is the Co-founder and CTO of Cloudera, Inc. Before co-founding Cloudera in 2008, Amr (@awadallah) was an Entrepreneur-in-Residence at Accel Partners. Prior to joining Accel, he served as Vice President of Product Intelligence Engineering at Yahoo!, where he ran one of the very first organizations to use Hadoop for data analysis and business intelligence. Amr joined Yahoo! after it acquired his first startup, VivaSmart, in July 2000.

Amr holds Bachelor’s and Master’s degrees in Electrical Engineering from Cairo University, Egypt, and a Doctorate in Electrical Engineering from Stanford University.

First part of interview

Second part of interview

Here is the third and last part of my interview with him:

Anmol Rajpurohit: Q9. The Hadoop ecosystem is a spectacular example of the success of Open Source. However, there are various arenas where open source has not been as successful (such as ERP systems). What factors do you consider vital for the success of Open Source projects?

Amr Awadallah: Community, community, community. The key reason open source projects succeed is that they foster a strong community of developers around them. They obviously also need to solve a real customer problem (as opposed to reinventing the wheel, or building something cool that nobody needs in the first place).

I personally think open source disruption will come to ERP systems; it is only a question of when. In fact, if you look at the current ERP-as-a-service vendors (à la NetSuite and Workday), I wouldn't be surprised if much of their backend is already being built on top of open source solutions.

AR: Q10. Today, there are more than 25 active Apache projects in Big Data. Which ones are your personal favorites? Why?

AA: I am biased, but my personal favorites are all the ones in the Cloudera Distribution :) One of our key values for our customers is curation of the Cloudera distribution by picking the best-of-breed projects to be part of it. So far our track record has been flawless in terms of making the right choices. For example, about a year and a half ago we added Apache Spark to our distribution; now everybody else is doing the same. I have no favorites among the projects in our distribution; I love them all equally. It is in the combined power of these projects, the platform, that the true power lies.

AR: Q11. What are the key entrepreneurial lessons that you have learned from the experience of launching Cloudera and bringing it to the current state? How would you do things differently if you ever plan to launch another startup?

AA: For this question I will refer your readers to this video interview which I did recently at Draper University (http://livestream.com/draperuniversity/amrawadallah). In it I cover the history of Cloudera's formation and some of the key entrepreneurial lessons that I learned along that journey.

If I were to pick one lesson, it would be: team, team, team. The success of your company will depend entirely on that. Make sure you spend your time hiring great people, and make sure to correct course when a hire isn't working out. If there is one thing I would do differently, it would be to correct hiring mistakes faster, much faster.

AR: Q12. What is the best advice you have got in your career?

AA: Don't waste time on regrets; learn from your mistakes and evolve quickly. This is advice that I formed for myself as I experienced life.

AR: Q13. What key qualities do you look for when interviewing for Data Science related positions?

AA: Our Director of Data Science (Josh Wills) has a great line he uses to describe what a data scientist is:

"A data scientist is a person who is better at statistics than any software engineer and better at software engineering than any statistician".

So the key qualities are a good statistics background and good software skills, but most importantly the investigative eye and business acumen to wrangle the data into an insight that makes sense for the business (that last part is what is hard to find).

AR: Q14. What was the last book that you read and liked? What do you like to do when you are not working?

AA: My main hobby is video games. Over the last year, the games I really liked are: Destiny, Call of Duty AW, Battlefield 4, Last of Us, and Grand Theft Auto 5. I can't wait for the new Uncharted 4 and Halo 5 to come out. As far as books, the last one I read is "A Briefer History of Time". I had read it a long time ago, but wanted to refresh my memory after seeing the movie "The Theory of Everything". A recent movie that I really liked is "The Judge" with Downey Jr. and Duvall: amazing acting. I like movies that make me tear up.

Anmol Rajpurohit is a software development intern at Salesforce. He is an MDP Fellow and graduate mentor at UCI-Calit2. He has presented his research work at various conferences, including IEEE Big Data 2013. He is currently a graduate student (MS, Computer Science) at UC Irvine.