We discuss Yahoo’s contributions to the Big Data ecosystem, recommendations for Big Data vendors, predictions for Big Data, career advice, and more.

Nandu Jayakumar has been working with Big Data for over a decade. He is passionate about databases and distributed systems. At Yahoo, he is currently building data applications that power digital advertising. He is also focused on advanced analytics that aim to improve user understanding at Yahoo. As a senior leader of Yahoo’s well-regarded data team, he has built key pieces of Yahoo’s data processing platforms and tools through their several iterations. These include data repositories, data pipelines and reporting systems. In the past, he has contributed to open source projects, including Shark (part of the Apache Spark effort).

Nandu holds a Bachelor’s degree in Electronics Engineering from Bangalore University and a Master’s degree in Computer Science from Stanford University.

Anmol Rajpurohit: Q5. What are the significant upcoming contributions from Yahoo in the Big Data ecosystem?

Nandu Jayakumar: Yahoo is actively involved in several projects within the community, and continues to contribute to core, open-source projects like Hadoop, YARN, Pig, Hive, Oozie, etc.

We are also making significant contributions to newer technologies like Druid and Spark. You should soon see interesting work from Yahoo released publicly in these areas.

Beyond source code, the other major way in which web companies like Yahoo contribute is by openly discussing our often pioneering Big Data applications. These case studies often influence and grow the community more than code contributions can.

We constantly evaluate new technologies and, once we adopt them, we strongly believe in giving back to these open-source projects. You will always see a steady stream of useful contributions from Yahoo.

AR: Q6. Based on your experience, what recommendations would you offer to technology vendors implementing Big Data technology?


NJ: Standards and interoperability have been key ingredients in this ecosystem’s success. Vendors that ignore this will have a hard time selling their products.

Big Data technologies skew heavily towards open-source. Vendors must offer solutions that are both better than open-source alternatives and able to evolve faster than them.

At this point in time, a focus on stability and ease of use might help their implementations more than adding features and competing directly with open-source choices.

AR: Q7. What do you personally think about the future of Big Data? Your predictions?

NJ: I predict that Big Data will have a new name within the next five years. Our tool-chain will look completely different in the coming years.

Streaming data management and standards for interchanging large-scale data will become mainstream. I also predict more regulation and laws around the data we can collect, store and utilize.

AR: Q8. What advice would you give to people aspiring to a long career in Big Data?

NJ: This advice is targeted at individuals working towards building technical careers in Data Science. Become an expert at the fundamentals first. The theories of Databases, Distributed Systems, Machine Learning and Visualization are the basics that guide all our work on Big Data.

The next step would be to jump in and participate directly in as many actual Big Data implementations as possible. The greater the variety, the better. Finally, it is important to keep up with the latest work going on in this young and quickly evolving field. New books, videos and conferences are key resources.

AR: Q9. If you ran out of items on your to-do list on a work day, what would you do?

NJ: I wish this would happen to me more often! When it does, I tend to do one of two things. The first is to keep up with the latest goings-on, whether new projects at work or interesting developments in industry or academia. Otherwise, I find myself trying to learn something new.

AR: Q10. What book did you recently read and like?

NJ: I haven’t finished reading the entire book, but I really like and am learning a lot from Information Dashboard Design: The Effective Visual Communication of Data by Stephen Few.