Interview: Ranjan Sinha, eBay on Winner Insights from International Sorting Competitions

We discuss advancements in the field of Personalization, lessons from winning sorting competitions, Data Science trends, career advice, and more.

Twitter Handle: @hey_anmol

Ranjan Sinha is head of data science engineering & technology at eBay Inc., where he leads projects on customer analytics and personalization. Earlier, as lead data scientist, he led multiple business-impacting projects in recommendations and personalization that have significantly enhanced consumers’ shopping experiences.

Prior to eBay, Dr. Sinha was a research academic at the University of Melbourne and holds a PhD in computer science from RMIT University, Australia. He has published over 30 works, including in top-tier venues such as IEEE Big Data, VLDB, and ACM SIGMOD.

He won the sort benchmark for both JouleSort and PennySort and was amongst Wall Street Journal’s top-12 Asia-Pacific young inventors. He regularly speaks on data science and big data technologies, and co-organizes the popular Bay Area Search Meetup.

First part of interview

Here is the second part of my interview with him:

Anmol Rajpurohit: Q5. Personalization (and Recommendation) has been a high priority for quite some time. What would you consider as the most significant advancement in these fields during the last year, i.e. 2014?

Ranjan Sinha: As automated personalization and 1-to-1 targeting become more accessible, marketers are learning to leverage the wealth of existing data to focus on engagement opportunities that could not otherwise have been identified. Very soon, no two people will see the exact same content on a site. Improved access to data, predictive analytics, and real-time actionable insights are expected to deliver unprecedented lift in engagement, conversions, and revenue on an e-commerce site.

Based on recent surveys, 94% of companies agree that personalization is critical to current and future success (Econsultancy). However, 70% of brands are yet to personalize emails sent to subscribers (Experian), and 60% of marketers struggle to personalize content in real-time even though 77% believe real-time personalization is crucial (Adobe & DMA). Thus, while there is significant enthusiasm and momentum, I believe we are still in the early stages of automating personalization.

Recent advancements of personalization in e-commerce include:
  1. In-session personalization involves personalizing content in real-time based on activities within the same session. It ties together the short-term in-session behavior and the long-term behavior of customers in order to influence a consumer's shopping experience within that session. It also enables us to personalize (to a limited extent) for anonymous visitors based on activities in recent sessions.
  2. Whole-page personalization enables us to move beyond ranking the content to personalize the entire front-end page experience, which may involve re-ordering the modules on the page and changing the text.
  3. Multi-screen (or device) personalization enables us to provide a holistic experience to the consumer across all the devices they use.
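
As a rough illustration of the first item, the idea of tying short-term and long-term behavior together can be sketched as a weighted blend of two affinity signals. This is only a minimal sketch under stated assumptions, not a description of eBay's system: the function names, the category-level scores, and the `session_weight` parameter are all hypothetical.

```python
from collections import Counter
from typing import Dict, List


def blended_scores(
    long_term: Dict[str, float],
    session_events: List[str],
    session_weight: float = 0.7,  # hypothetical tuning knob, not from the interview
) -> Dict[str, float]:
    """Blend long-term category affinities with in-session activity.

    long_term maps a category to a historical affinity in [0, 1].
    session_events lists the categories touched in the current session.
    """
    counts = Counter(session_events)
    total = sum(counts.values())
    # Normalize in-session counts into shares; empty if nothing happened yet.
    short_term = {c: n / total for c, n in counts.items()} if total else {}
    # With no session activity, rely entirely on the long-term profile.
    weight = session_weight if total else 0.0
    categories = set(long_term) | set(short_term)
    return {
        c: weight * short_term.get(c, 0.0) + (1.0 - weight) * long_term.get(c, 0.0)
        for c in categories
    }


def rank_categories(
    long_term: Dict[str, float],
    session_events: List[str],
    session_weight: float = 0.7,
) -> List[str]:
    """Return categories ordered by blended score, ties broken alphabetically."""
    scores = blended_scores(long_term, session_events, session_weight)
    return sorted(scores, key=lambda c: (-scores[c], c))
```

An anonymous visitor corresponds to an empty `long_term` profile, in which case the ranking is driven by the session alone; a returning customer with no activity yet falls back to the long-term profile.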

AR: Q6. What were your key lessons from winning reputed competitions on sorting performance?

RS: Sorting is an invaluable tool that allows many common tasks to be performed efficiently. Algorithms for sorting are also of great theoretical importance and several advances in data structures and algorithmic analysis have come from their study. They have been investigated since before the first computers were constructed. The underlying reason for such interest in sorting is the potential of reaping huge computational savings, as sorting is a basic element of a wide range of computational activities.

A few lessons learnt from these competitions include:
  1. A cache-efficient design is crucial to extract performance once the data is in memory.
  2. The use of multiple threads is important in order to benefit from multi-core processors.
  3. The use of several disk drives in parallel can significantly improve sequential read/write speed and is cost-efficient.
  4. Asynchronous I/O can help in reducing the time spent waiting on I/O. For instance, while one run is being sorted or merged, another run can be read from or written to disk.
  5. More RAM helps to reduce the number of disk seeks and the number of passes. However, as this raises the costs of the overall hardware, the final RAM capacity needs to be tuned based on a few runs of the sorting application.
  6. Fast external memory or solid-state drives will help significantly but may also add to the hardware cost.
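
The "runs" and "passes" mentioned in the lessons above come from external merge sorting, where data too large for memory is sorted in memory-sized chunks (runs) that are written to disk and then merged. The following is a minimal single-threaded sketch of that idea, not the competition-winning implementation: the function names and the `run_size` parameter are illustrative, records are assumed to be strings without embedded newlines, and none of the cache, threading, or asynchronous I/O optimizations are shown.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def write_sorted_runs(records: Iterable[str], run_size: int, tmp_dir: str) -> List[str]:
    """Sort the input in chunks of at most run_size records; write each chunk to disk."""
    run_paths: List[str] = []
    buffer: List[str] = []

    def flush() -> None:
        buffer.sort()  # in-memory sort of a single run
        path = os.path.join(tmp_dir, f"run_{len(run_paths)}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(record + "\n" for record in buffer)
        run_paths.append(path)
        buffer.clear()

    for record in records:
        buffer.append(record)
        if len(buffer) >= run_size:
            flush()
    if buffer:
        flush()
    return run_paths


def read_run(path: str) -> Iterator[str]:
    """Stream one sorted run back from disk, one record at a time."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def external_sort(records: Iterable[str], run_size: int) -> Iterator[str]:
    """Yield all records in sorted order using a single k-way merge pass."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        run_paths = write_sorted_runs(records, run_size, tmp_dir)
        # heapq.merge holds only one record per run in memory at a time.
        yield from heapq.merge(*(read_run(p) for p in run_paths))
```

Here `run_size` stands in for available RAM: a larger value produces fewer runs, which is the mechanism behind the point that more memory reduces seeks and merge passes.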

Several more factors, most of them hardware-related, can help in fine-tuning the sorting application to extract the best performance from a given system. These include the CPU speed, cache hierarchy, number of cores, input/output bandwidth, disk read/write/seek time, RAM access latency, CPU frequency scaling, low-latency RAM, disabling unnecessary services, choice of motherboard, as well as choice of algorithms and data structures. Furthermore, the lessons from algorithm engineering efforts, as exemplified by publications in reputed venues such as ACM JEA, VLDB Journal, and ACM SIGMOD, can be applied in tuning the algorithms and implementations.

AR: Q7. What is the best advice you have got in your career?

RS: The advice that resonates closely with me is “Stay Hungry, Stay Foolish” by Steve Jobs. A second piece of advice is “Career is a marathon, not a sprint”. A recent blog article provides excellent insights from some of the finest technology leaders today.

AR: Q8. What trends do you expect to see dominate the Data Science and Big Data arena in the next 2-3 years?

RS: This is an incredibly exciting time to be involved in the areas of Big Data and Data Science. While it is very difficult to predict future trends in such a fast-paced field, a few trends that appear likely to continue growing in importance in the near-term include:
  1. Data science driven intelligence will become more commonplace, such as in personal analytics for monitoring health.
  2. Deeper consumer insights and analytics will enable businesses to understand and cater to their customers’ needs better.
  3. High-performance data science platforms, and the move of their capabilities to the cloud.
  4. Real-time big data analytics that involves capturing and processing data in seconds/milliseconds from multiple sources.
  5. Data governance relating to quality, privacy, and security.

AR: Q9. If you were a fresher starting in the Analytics industry today, how would you shape your career?

RS: I would be thrilled to start my career in the analytics industry today. Below are a few pointers on how I would likely shape my career:
  1. Take advantage of the online courses and classes offered by MOOC platforms such as Coursera, edX, and Udacity. Aim to have a cross-disciplinary knowledge in the areas of Data, Analytics, Programming, and Visualization.
  2. Obtain in-depth analytics experience in one or more domains such as personalization, recommendations, finance, trust, search, and security. Consider investing at least 18-24 months to understand a specific domain, develop innovative solutions, and obtain impactful results.
  3. Obtain internships in a company with strong analytics professionals and ideally in a cross-disciplinary team. This will enable you to learn from top-notch mentors and expose you to real world challenges prior to your first full-time role.
  4. Understand how an end-to-end data science pipeline works and how it may impact the customers. This will help in debugging issues during an A/B test, as often it is not the data science model in isolation that determines the final results of the test.
  5. Participate in data science competitions and analytics meetups, publish blogs/presentations, and share work on GitHub.
  6. Finally, never stop learning and look for opportunities to apply your newfound knowledge in your domain, innovate, and implement a prototype to demo (ideally accompanied by visualization).

AR: Q10. What was the last book that you read and liked? What do you like to do when you are not working?

RS: The books I enjoyed reading recently are:
  1. It Worked for Me: In Life and Leadership by Colin Powell, 2014
  2. Chai: The Experience of Indian Tea by Rekha Sarin and Rajan Kapoor, 2014

During my spare time, I enjoy outdoor activities such as hiking and biking, listening to instrumental music, cooking, playing the Conga drums, and catching up on technology advancements.