Interview: Antonio Magnaghi, TicketMaster on Unifying Heterogeneous Analytics through Lambda Architecture

We discuss the role of the Data Science team at Ticketmaster, the characteristics of its ecommerce data, analytics on highly variant data flows, infrastructure challenges, and the merits of the lambda architecture.

Antonio Magnaghi is Vice President, Data Sciences, at Ticketmaster, one of the world's largest eCommerce sites.

At Ticketmaster, Antonio leads key machine learning initiatives in recommendations, predictive user modeling, forecasting, and large-scale distributed systems for real-time optimization problems. Antonio worked on algorithmic content production and marketplace design at Demand Media. He acquired an extensive background in online advertising at Yahoo!/Yahoo! Research and Fox Audience Network/The Rubicon Project. Antonio conducted research on IP networks and network data mining while at Microsoft and Fujitsu Laboratories of America.

He holds four patents and a Ph.D. from the University of Tokyo, Japan.

Here is my interview with him:

Anmol Rajpurohit: Q1. What are the core responsibilities of the Data Science team at Ticketmaster?

Antonio Magnaghi: The Data Science Team at Ticketmaster is entrusted with the collection, management, and mining of varied data sets. Our mandate is to use principled and quantitative approaches for knowledge discovery and decision-making, and to use the information distilled from our data to design novel algorithms and data products that can transform the business.

In broader terms, our mission is also to contribute to the Machine Learning (ML) community and to evangelize the principles and architectures that are key to delivering real-world, enterprise-grade ML platforms. For instance, we organize regular meet-ups on lambda architectures, we sponsor internship programs, etc.

AR: Q2. What are the typical characteristics of the ecommerce data at Ticketmaster?

AM: The dimensions that typically characterize the data sets we routinely handle are those one would expect from an eCommerce site that ranks among the top five worldwide in terms of revenue. Data volumes are quite large, and the velocity at which data enters our data lakes is similarly high. Our catalog of live events and performing attractions spans all genres and multiple international markets.

In addition to these considerations, there are other aspects that are entirely peculiar to our business domain. Our data needs to be regarded as longitudinal in nature: we have records pertaining to user interest in live events that go back decades, and the knowledge to be gained from mining this data is unique. Another peculiarity of our data is its bursty nature. When very popular events go on sale, it becomes a social phenomenon. Traffic spikes are of a far greater magnitude than regular traffic; they act like "denial of service" attacks of transient duration. Being able to devise algorithms that can manage such varied time scales and dynamics is a very exciting challenge.

AR: Q3. What are the unique challenges of working on such highly variant data-flow? How did you address the infrastructure challenge of processing huge spikes in the incoming data?

AM: Anmol, as you stated, there are several challenges we face daily. As I mentioned before, on-sales are an intrinsic part of our business. On-sales embody the excitement of fans for the performers they love, and can be so large as to acquire a social dimension. It is critical, therefore, for Ticketmaster to provide the best possible experience to fans during on-sales. Our platforms and algorithms need to be highly performant under these conditions of high system stress. Other retail sites typically are not exposed to such extreme situations.

You can easily see how the label distribution of incoming traffic can be skewed and time-varying. This poses a significant challenge to the generalization power of previously trained models. We strive to adopt and engineer state-of-the-art algorithms. Online learning and streaming algorithms are heavily utilized in our various incarnations of lambda architectures. Naturally, we also adhere to sound and well-established engineering practices that guarantee scalability, consistency, and resiliency.
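To make the idea of online learning concrete, here is a minimal sketch of per-example stochastic gradient descent on a logistic model, the kind of streaming update that lets a model adapt as a skewed, time-varying label distribution arrives. This is an illustrative toy, not Ticketmaster's actual system; the feature vectors, labels, and learning rate are assumptions.

```python
import math

class OnlineLogisticModel:
    """Minimal online (streaming) logistic regression via per-example SGD."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        # Gradient step on the log-loss for a single example. The model
        # adapts immediately, which matters when traffic (and its label
        # distribution) shifts abruptly, e.g., during an on-sale spike.
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

model = OnlineLogisticModel(n_features=2)
# Hypothetical event stream: (feature vector, binary label), processed
# one example at a time rather than in a batch retraining job.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 0.2], 1), ([0.1, 1.0], 0)]
for x, y in stream:
    model.update(x, y)
```

In a full lambda architecture, a model like this would sit in the speed layer, with periodic batch retraining over the complete history correcting any drift the incremental updates accumulate.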

AR: Q4. Why did you choose Lambda Architecture for data processing? Can you provide us a use case where this hybrid approach delivers advantage over other alternatives?

AM: Lambda architectures present distinctive traits that make them particularly suitable for real-world, large-scale data products. They provide a viable answer to the need to gather insights that are a function of the entire data set. At the same time, lambda architectures are built to support data processes at different data and temporal granularities. Enterprise data pipelines, most of the time, comprise heterogeneous processing modalities: computing can be batch (e.g. Hadoop) vs. (near) real-time vs. streaming (e.g. Storm), and data storage can be file-oriented (e.g. HDFS) vs. columnar (e.g. HBase). Additionally, given the distributed nature of lambda architectures, they are designed to be scalable and fault-tolerant.
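The batch/speed split described above can be sketched in a few lines: a batch view precomputed over the full data set, a real-time view maintained incrementally, and a serving-layer query that merges the two at read time. The dictionary-backed views and function names here are illustrative assumptions standing in for Hadoop/HDFS and Storm components.

```python
# Batch view: e.g., total views per event, recomputed nightly over the
# entire history by a batch job (Hadoop over HDFS in the text's example).
batch_view = {"event_123": 10_000}

# Real-time view: increments observed since the last batch run, kept by
# the speed layer (e.g., a Storm topology writing to a low-latency store).
realtime_view = {"event_123": 250}

def record_event(event_id):
    """Speed layer: fold each incoming record into the real-time view."""
    realtime_view[event_id] = realtime_view.get(event_id, 0) + 1

def query(event_id):
    """Serving layer: merge batch and real-time views at read time."""
    return batch_view.get(event_id, 0) + realtime_view.get(event_id, 0)

record_event("event_123")
print(query("event_123"))  # 10251
```

The point of the merge is that readers always see the whole data set (batch completeness) plus the latest events (streaming freshness), and the nightly batch recomputation absorbs the real-time increments and corrects any approximation error in the speed layer.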

AR: Q5. What are the major components of the current data architecture at Ticketmaster? How would you describe the ideal data architecture for the future?

AM: Because of the heterogeneity of our data and data sources, at Ticketmaster we have several instances of lambda architectures. Each one of them is customized to address the specific needs of each data consumer. We have been focusing on building new data products. As these products become successful and new data emerges, these separate lambda architectures will expand and integrate with each other organically.

AR: Q6. How do you evaluate the success of adopting Lambda Architecture? How significant was the impact on the performance of resulting recommendations?

AM: The adoption of lambda architectures has had a notable impact on our ability to move quickly, deploy incrementally, and measure our progress. It is not uncommon to encounter significant obstacles when scaling a prototype up in a predictable and reliable manner. Because of the traits lambda architectures have built in from the get-go, it was easy for us to scale as needed. This is critical for a business of Ticketmaster's size. Additionally, having more timely data made a huge difference in terms of the algorithms (streaming/near real-time) we could support, and this has resulted in overall greater user engagement.

Second part of the interview