Interview: M.C. Srivas, CTO, MapR on Data Agility – The Next Frontier of Big Data

We discuss the competitive differentiation of MapR, challenges in consumerizing Big Data, trends, strategy recommendations, desired skills and more.

M.C. Srivas ran one of the major search infrastructure teams at Google where GFS, BigTable and MapReduce were used extensively. He wanted to provide that powerful capability to everyone, and started MapR on his vision to build the next-generation platform for semi-structured big data. His strategy was to evolve Hadoop and bring simplicity of use, extreme speed and complete reliability to Hadoop users everywhere, and make it seamlessly easy for enterprises to use this powerful new way to get deep insights. That vision is shared by all at MapR. Srivas brings to MapR his experiences at Google, Spinnaker Networks, and Transarc in building game-changing products that advance the state of the art.

Srivas was Chief Architect at Spinnaker Networks (now NTAP) which built the industry's fastest single-box NAS filer, as well as the industry's most scalable clustered filer. Previously, he managed the Andrew File System (AFS) engineering team at Transarc (now IBM). AFS is now standard classroom material in operating systems courses. When not writing code, Srivas enjoys playing tennis, badminton and volleyball. M.C. has an MS in Computer Science from the University of Delaware, and a B.Tech. in electrical engineering from IIT Delhi.

First part of interview

Here is the second and last part of my interview with him:

Anmol Rajpurohit Q7. How do you distinguish MapR from the competition such as Cloudera and Hortonworks?

M.C. Srivas: MapR is the only Hadoop distribution that brings the kind of integrity, reliability, performance, and manageability that enterprises have long desired in all their systems. The other Hadoop distributions are all pretty similar to each other, and cannot deliver the sort of reliability and manageability that MapR delivers. For example, only the MapR Distribution including Hadoop can continuously self-heal and self-repair from any component failure automatically, whether it be a disk, a server, a rack, or the whole data center itself. MapR is the only distribution that automatically protects and versions your data, letting you roll back or roll forward in case of accidental corruptions or deletions.

MapR is the only distribution for Hadoop, and in fact the only commercial data processing system, that can work across data centers worldwide, automatically move petabytes, and switch over from one to another and back, whether due to “follow the sun” operations, or due to disaster recovery. The Aadhaar project in India moved to the MapR Distribution including Hadoop to take advantage of such advanced capabilities.

AR: Q8. Given the current state of technology and talent, what do you see as the biggest challenges in the way of consumerizing Big Data on a massive scale?

MCS: The “consumerization,” as you term it, has already taken place at Yahoo, Facebook, Twitter, LinkedIn, and at many, many other websites around the world. The challenge is always to balance privacy while delivering these services. People have started to get afraid of being tracked, and have started hiding themselves.

Real life is about “forgive and forget” … you are innocent until proven guilty, and if you have done something wrong and paid your dues, then you are entitled to a new start. The web, as it gets more and more entangled in real life, needs to mimic these human aspects and needs to get more forgiving and forgetting.

If they don’t do it themselves, it will get legislated and that can take a turn for the worse. For example, the European Union has already passed some bizarre laws under which people can ask web sites to not link to content that is already a matter of public record. Big Data is a powerful tool, and it must be used with care and responsibility.

AR: Q9. Which trends do you expect to dominate across the big data industry over the next 2-3 years?

MCS: Legacy databases and data warehouses are expensive because DBA resources are required to flatten, summarize and fully structure the data. Upfront DBA costs delay access to new data sources, and the rigid structure is very difficult to alter over time. The result is that legacy databases are not agile enough to meet the needs of most organizations today. Earlier big data projects focused on storing target data sources. Now, instead of focusing on how much data is being managed, organizations are moving their attention to measuring data agility. How does the ability to process and analyze data impact operations? How quickly can they adjust and respond to changes in customer preferences, market conditions, competitive actions, and the status of operations? These questions will direct the investment and scope of big data projects in the near term.

Additionally, data lakes and data hubs have represented a popular first deployment for Hadoop. A data lake or data hub is a scalable infrastructure that’s both economically attractive (reduced per-terabyte cost) and designed for flexibility, and it has the ability to store various forms of both structured and unstructured data. During the next few years, data lakes will evolve as organizations move from batch to real-time processing and integrate file-based Hadoop and database engines into their large-scale processing platforms.

AR: Q10. What is the best advice you have got in your career?

MCS: What you do today will follow you for the rest of your life. Therefore, don’t ever compromise on pushing the ceiling. Quality is lasting. You want to be known as a person that demands and delivers the highest quality products, in every sense of the word.

AR: Q11. What strategy would you recommend to companies that are trying to make big data an integral part of their business strategy?

MCS: Start simple. Pick a small project that can succeed easily. The idea is to focus on learning how to implement and deploy the new technology first. Subsequent projects can be more complex, but it’s important to first get success on a small scale.

AR: Q12. Recently, we have seen a lot of universities offering big data-related programs and certificates. Do you think the universities are doing a good job in preparing students for a technical career in big data? What changes would you suggest to the big data-related academic curriculum?

MCS: Universities are doing a good job with big data, in many ways. Some tend to focus on the technology itself, while others look at combining various technologies to build new systems. Both are important. We have a pretty strong internship program every summer, where students in their senior or final years work at MapR for 3-6 months on various projects. We have been quite impressed by the quality of the students coming in. Many of these students have joined MapR after they’ve graduated.

AR: Q13. What key qualities do you look for when interviewing for data science-related positions on your team?

MCS: Attitude. Attitude and the ability to work as part of a team are the most important qualities. It goes without saying that the person has to have the relevant knowledge, intelligence, and good communication skills. But it’s also important to understand how they approach a problem and how they react when confronted with difficult technical issues. Do they ask for help? Do they share their issues? Do they look around to see what others have done? Do they keep trying, or do they give up? Do they get angry, or are they able to channel that frustration to try harder? Attitude is everything.

AR: Q14. What was the last book that you read and liked? What do you like to do when you are not working?

MCS: I read “The One Minute Manager” most recently. It is an easy read compared to most business books, and as a founder/CTO of a company, I have run into the issues discussed in that book quite often. For relaxation, I like to watch movies, and I recently watched the top-rated TV series “The Sopranos.” I also play a lot of badminton and chess; I follow former World Chess Champion Vishy Anand, whom I admire.