Interview: Antonio Magnaghi, Ticketmaster on Unifying Heterogeneous Analytics through Lambda Architecture
We discuss the role of the Data Science team at Ticketmaster, the characteristics of ecommerce data, analytics on highly variable data flows, infrastructure challenges, and the merits of lambda architecture.
At Ticketmaster, Antonio leads key machine learning initiatives on recommendations, predictive user modeling, forecasting, and large-scale distributed systems for real-time optimization problems. Antonio worked on algorithmic content production and marketplace design at Demand Media. He acquired an extensive background in online advertising at Yahoo!/Yahoo! Research and Fox Audience Network/The Rubicon Project. Antonio conducted research on IP networks and network data mining while at Microsoft and Fujitsu Laboratories of America.
He holds four patents and a Ph.D. from the University of Tokyo, Japan.
Here is my interview with him:
Anmol Rajpurohit: Q1. What are the core responsibilities of the Data Science team at Ticketmaster?
In broader terms, our mission is also to contribute to the Machine Learning (ML) community and to evangelize the principles and architectures that are key to delivering real-world, enterprise-grade ML platforms. For instance, we organize regular meet-ups on lambda architectures and sponsor internship programs.
AR: Q2. What are the typical characteristics of the ecommerce data at Ticketmaster?
In addition to these considerations, there are other aspects that are entirely peculiar to our business domain. Our data needs to be regarded as longitudinal in nature. We have records pertaining to user interests in live events that go back decades. The knowledge to be gained from mining this data is unique. Another distinctive feature of our data is its bursty nature. When very popular events go on sale, it becomes a social phenomenon. Traffic spikes are of a far greater magnitude than regular traffic. These spikes act like transient “denial of service” attacks. Being able to devise algorithms that can manage such varied time scales and dynamics is a very exciting challenge.
AR: Q3. What are the unique challenges of working with such a highly variable data flow? How did you address the infrastructure challenge of processing huge spikes in the incoming data?
You can easily see how the label distribution for incoming traffic can be skewed and time-varying. This poses a significant challenge to the generalization power of previously trained models. We strive to adopt and engineer state-of-the-art algorithms. Online learning and streaming algorithms are heavily utilized in our various incarnations of lambda architectures. Naturally, we also adhere to sound and well-established engineering practices that guarantee scalability, consistency and resiliency.
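As an illustration of why online learning helps when the label distribution shifts over time, here is a minimal sketch of a logistic regression model updated one example at a time with stochastic gradient descent. It is not Ticketmaster's implementation; the class name `OnlineLogisticRegression`, the learning rate, and the single-feature stream in the usage note are all assumptions chosen for illustration.

```python
import math


class OnlineLogisticRegression:
    """Hypothetical online learner: one SGD step per incoming example."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        # Linear score, clamped to avoid overflow in exp().
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        # Gradient of the log loss for a single example is (p - y) * x.
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error
```

If such a model is fed a stream in which the same input is first labeled 1 and later labeled 0, its prediction follows the change, because every new example moves the weights. A model trained once on the earlier data would keep predicting the old label.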
AR: Q4. Why did you choose Lambda Architecture for data processing? Can you provide us a use case where this hybrid approach delivers advantage over other alternatives?
AR: Q5. What are the major components of the current data architecture at Ticketmaster? How would you describe the ideal data architecture for the future?
AM: Because of the heterogeneity of our data and data sources, at Ticketmaster we have several instances of lambda architectures. Each one of them is customized to address the specific needs of each data consumer. We have been focusing on building new data products. As these products become successful and new data emerges, these separate lambda architectures will expand and integrate with each other organically.
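For readers unfamiliar with the pattern, a lambda architecture combines a batch layer that periodically recomputes views from the full event log, a speed layer that keeps incremental results for events the batch has not yet covered, and a serving step that merges the two. The sketch below is a toy, in-memory illustration of that idea for counting events. It does not describe Ticketmaster's systems, and the `LambdaCounter` class and its method names are hypothetical.

```python
from collections import Counter


class LambdaCounter:
    """Toy lambda architecture: counts events per key (illustrative only)."""

    def __init__(self):
        self.master_log = []         # append-only record of every event
        self.batch_view = Counter()  # recomputed from the full log by the batch layer
        self.speed_view = Counter()  # incremental counts since the last batch run

    def ingest(self, key):
        # New events are stored in the log and reflected in the speed layer at once.
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        # The batch layer rebuilds its view from scratch, then the speed
        # layer is reset because the batch view now covers those events.
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, key):
        # The serving step merges the batch and real-time views.
        return self.batch_view[key] + self.speed_view[key]
```

Queries stay up to date between batch runs because recent events are answered from the speed layer, while the periodic recomputation from the full log corrects any errors the incremental path may have accumulated.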
AR: Q6. How do you evaluate the success of adopting Lambda Architecture? How significant was the impact on the performance of resulting recommendations?
Second part of the interview