Interview: Anil Gadre, MapR on 3 Keys for Big Data Success: Reliability, Security, & Scalability

We discuss the origin of Apache Myriad, state of security in Big Data, MapR Quick Start Solutions, Hadoop vendor selection criteria, and more.

Twitter Handle: @hey_anmol

Anil Gadre is the SVP of Product Management at MapR. Prior to MapR, Anil was the EVP of Product Management at Silver Spring Networks, responsible for product strategy, planning and marketing of networking and software products focused on the Smart Grid for the energy industry.

Before that, Anil was with Sun Microsystems, a Fortune 200 technology leader, where he served as EVP of the Application Platform Software organization and had previously been Chief Marketing Officer, leading global branding, demand creation, and an extensive developer ecosystem program.

He has a BSEE from Stanford University and an MM degree from the Kellogg School at Northwestern University.

First part of interview

Here is the second part of my interview with him:

Anmol Rajpurohit: Q5. When and how was the idea conceived for Project Myriad? What is the current status? Future plans?

Anil Gadre: This is an exciting story! One of our engineers, Santosh Marella, went to a hackathon event in Silicon Valley. The event was focused on trying to bridge the world of Apache Mesos and YARN/Hadoop. He came up with a very clever answer, which was plugin-based, rather than requiring a major change to the core code. Suddenly the two worlds of Mesos and Hadoop could be bridged and he won the competition. That led to creating the Apache open source Project Myriad with the support of community members.

AR: Q6. What are your thoughts on the state of security in enterprise implementations of Big Data solutions? Is security getting the appropriate attention? Or are most of the companies still taking a reactive, rather than proactive, approach to security?

AG: We see our customers paying appropriate attention to security. One reason for this is that we have customers in financial services, telecom, and healthcare, among many other industries, who are used to living in regulated environments with high compliance requirements. There are two aspects that customers should look at. One is the core platform's security, and the second is the broader view of data governance.

AR: Q7. What were the aspirations behind the recently launched MapR Quick Start Solutions? What are the key benefits that they offer?

AG: The Quick Start Solutions were designed to speed the time to value for a customer and radically lower the risk of doing a big data project. We took our experience from hundreds of customers, found the most common use cases, and focused on making it an easy and affordable decision for a customer. The three initial solutions are data warehouse optimization and analytics, improving a website's recommendation engine, and security log analytics. Nearly every customer has at least one of these needs as they start on big data projects, and we have made it really easy to get going with little risk.

AR: Q8. What are some of the important aspects of deriving value from Big Data that often get ignored? What are the questions that clients are not asking their Hadoop distribution vendors, but they really should?

AG: It is important to make sure that there is buy-in and sponsorship from the line of business for any big data project, in addition to the IT group that has to make it happen. Secondly, they should pay attention to the people and process side just as much as to the use cases selected, because the real value will come from people using the insights.

As for questions to ask Hadoop vendors, we think customers need to make sure that the platform is ready for their future needs for reliability, datacenter-grade security, and scalability. Once an organization builds confidence in the value of big data, the use cases multiply and the need for mission-critical reliability becomes very important. Customers need to make sure they are ready for this eventuality.

AR: Q9. How do you differentiate the three major players in the Hadoop distribution landscape?

AG: While all three players offer essentially similar community software tools, we believe MapR offers more comprehensive support of the Spark environment. Once you get to the foundation, however, the difference is that MapR uniquely provides a number of innovations that are highly differentiated both on reliability and real-time capability. MapR has won three "top ranked" analyst awards while the competitors have none (Top-Ranked Hadoop Distribution from Forrester, Top-Ranked NoSQL database from Forrester, Top SQL-on-Hadoop solution from Gigaom). MapR is widely seen as the leading Hadoop provider in high-scale production environments that demand mission-critical reliability.

AR: Q10. How do you think the expectations from Data Science have evolved over time? Where do you see them headed in the future?

AG: The problem is that the data itself is changing. That means that the Data Scientist must figure out how to keep up with that change. This is one reason we are a champion of the Apache Drill project, which brings self-service data exploration to Hadoop, and discovers the data's schema on-the-fly. It is like taking an x-ray of your data to find out what the structure is. This is a radical advance in the ability to explore big data to figure out which use cases might be most valuable.

AR: Q11. What is the best advice you have received in your career?

AG: Like what you do because you will wind up doing a better job on whatever you work on!

AR: Q12. What are the key attributes that you look for when interviewing for Data Science-related positions on your team?

AG: With the data changing as rapidly as it is, we tend to prefer high-agility people. What matters is how quickly they can learn new things.

Anmol Rajpurohit is a software development intern at Salesforce. He is a former MDP Fellow and graduate mentor at UCI-Calit2. He has presented his research work at various conferences, including IEEE Big Data 2013. He is currently a graduate student (MS, Computer Science) at UC Irvine.