Interview: Eli Collins, Cloudera on Evolution and Future of Big Data Ecosystem

We discuss the change in Big Data priorities, risks, Big Data ecosystem, rise of data culture in organizations, challenges, advice and more.

Eli Collins is Cloudera's Chief Technologist, currently focused on new technology introduction and strategy. He previously led the team responsible for Cloudera’s Hadoop distribution (CDH), and is an Apache Hadoop committer and PMC member. He also serves on the advisory boards of several startups. Prior to joining Cloudera, Eli worked on processor virtualization and Linux at VMware.
You can find him on Twitter at @elicollins.

Here is my interview with him:

Anmol Rajpurohit: Q1. Can you please describe your current role at Cloudera?

Eli Collins: I work in our CTO office. I’m currently focused on new technology introduction and strategy. I spend a good amount of time with customers and partners, prospective customers and partners, sharing with them and incorporating what I learn back into our various teams.

AR: Q2. In terms of a hype cycle, where would you currently place Big Data? Are we still in the hype phase? Or, are the expectations now mostly pragmatic?

EC: I’ve seen a drop in excessive big data publicity over the past year; it’s an established term at this point. The stories are shifting from stories about the technology itself to stories about how people are using data to make life better. The Internet of Things seems to be the hype du jour.

AR: Q3. How have the Big Data priorities changed in the last five years?

EC: As the basic technology matures there’s a shift towards making it more accessible. We’re talking more about methodologies than capabilities. We’re paying more attention to integrating with the rest of the data ecosystem. We’re moving up the stack as the foundational technologies get more mature.

AR: Q4. What are the major risks posed by Big Data? What risk mitigation strategy do you propose for those risks?

EC: I’m not sure there are inherent risks in big data. Like any technology it can be used for good and harm. More and more of our activities are intermediated by machines, so we can analyze the data they generate to better understand how things work and how to improve, i.e. we can apply the scientific method to more parts of life. There are, however, things we need to be mindful of. For example, if you’re looking at data that’s not representative of the whole population then using analytics can make a system less equitable. I talk about this problem a bit in this talk.

AR: Q5. How would you describe the Big Data ecosystem?

EC: The big data ecosystem evolved from the Internet companies that had these sorts of problems before others. Existing technology and products just didn’t work well for them so they ended up inventing new ones. Most of Cloudera’s founders came from this world, and this is what motivated them to start the company in 2008, when there was an opportunity to bring this new technology to the market.

Now there are dozens of big data companies that are building new products to serve this part of the data management market, and the existing players are adapting their product portfolios. For example, almost every major data management company ships or integrates with the Apache Hadoop ecosystem at this point. This is not to say Hadoop is the only big data technology, just a prominent example.

AR: Q6. We have seen great advancements in Big Data technology over the past few years. Do you think that the executive mindset and organizational culture have evolved at a similar pace to embrace Big Data?

EC: It’s evolving. I was recently in a meeting where one of the board members of a large company flew most of their executive staff halfway around the world to get educated on big data. They see how their competitors are using data, they see how new data-oriented entrants to their markets are competing successfully with incumbents. Companies are increasingly appointing chief data officers, so they’ve been thinking more about data already. Sometimes the initiative comes from the bottom up. Some companies are hiring technology people outside their vertical, for example I’ve seen financial services firms hiring people from Google. Changing culture takes time but it’s happening. A lot of companies are already very data driven, they’re changing the fastest.

AR: Q7. Currently, what are the most common bottlenecks in extracting value from Big Data projects?

EC: It really depends on the use case and the organization. With any wildly popular new technology there’s always a shortage of people and skills for a while. In our case the market has been correcting for that well. We started training people over 6 years ago. For some it’s things on the input side of the equation, e.g. getting their data supply chain in shape. For others it’s the “output”, immature tools or missing applications, so more users can get value from the system.

AR: Q8. Where do you see Big Data headed in the next 2-3 years?

EC: Hopefully, as it continues to get baked into all the products and services we consume, it’s less of a thing. Over the next 2 to 3 years it will enable analytics to be a lot more pervasive. Hopefully we’ll take it for granted.

AR: Q9. What are the key criteria that a company must assess while selecting a commercial Hadoop platform vendor?

EC: I’d start with product capabilities. Actually use the product to get a sense of the differences. Because data platforms are a long-term investment, you also want to think about things like the vendor’s track record with the product and customers, their history of innovation, product roadmap, whether they have the people to support the technology they’re offering, how aligned they are with the other products you use, whether they can be a long term sustainable business, and so on. Most of our customers are pretty strategic about who they work with.

AR: Q10. What is the best advice you have got in your career?

EC: Be open to new things and optimize for learning. Unless you’ve known what you want to do your whole life and really enjoy doing that thing, you’ll need to change, and be able to adapt well to change. A lot of other advice can be derived from this. For example, if you optimize for learning you want to be around people who are smarter and know more than you.

AR: Q11. What was the last book that you read and liked? What do you like to do when you are not working?

EC: I enjoyed Flash Boys by Michael Lewis. Unrelated, I keep hoping someone will write The Soul of a New Machine for something newer than the minicomputer. Work and spending time with my wife take up most of my time. I also enjoy spending time with friends, exercising and reading. I’m not a snowflake.