Interview: Cliff Lyon, StubHub, on Mastering the Art of Recommendation and Personalization Analytics

We discuss the challenges of designing recommendation and personalization systems, how to select the right metrics, and lessons learned about presenting recommendations on different channels.

Cliff Lyon joined StubHub as the head of technology recommender systems in May of 2013, where he leads a dedicated engineering and research team in the development of applied machine learning software and services. These solutions drive dynamic content delivery in both email and site-side applications. Prior to that, Cliff was Senior Director of Engineering at CBS Interactive, where, in addition to recommendation, he delivered technology for online optimization and multivariate testing.

Here is my interview with him:

Anmol Rajpurohit: Q1. What are the critical success factors in designing recommendation and personalization systems? What is the most challenging part in setting up successful recommendation systems?

Cliff Lyon: Recommendation and personalization are organizationally complex. These systems span multiple groups. And, unlike the tactical goals – the algorithm, the delivery stack, and the user experience flow – the strategic goal doesn't belong to any single contributor. The business relies on all these parts working together to create the effect it wants. So someone has to pay attention to this.

In my experience, when a system doesn't do as well as the business had hoped, it’s rarely because some part of the system – the algorithm, the engineering, or the design – was deficient or broken. Often the reason is much simpler: the system isn't driving the strategic goal.

For example, if the business wants more revenue, and you’re recommending related products while people are purchasing, you may distract them and end up losing money. The same applies at the top of the funnel: you don’t want to show an early stage laptop shopper a compatible Bluetooth mouse. That’s the sort of thing they can add to cart at checkout. These are simple examples; in practice the dysfunction can be much more subtle. The main point is that despite the fact that the individual parts of the system work, and perhaps work very well, the business may not be getting what it wants.

If you are setting up a recommendation or personalization system, my advice to you is to make sure you know what the strategic goal is, and test your system against that goal as soon as possible, with the simplest version of the system you can. Once you establish that you’re headed in the right direction, and you have a baseline, then you can invest as much time and effort as you like deepening various aspects of the system.

AR: Q2. What approach do you recommend for the selection of the right similarity metric? What are the popular metrics?

CL: We use A/B testing to see which metric or combinations of metrics work best. I would certainly recommend that as a general approach.

A popular metric is not necessarily a good metric. For example, anyone interested in recommendation will encounter the cosine similarity at a very early stage. I see it mentioned over and over again in papers for illustration, probably because it is conceptually simple and easy to explain. However, in my experience it doesn't do as well as other similarity metrics, though I am sure there are people who have had a different experience. We include it in testing anyway, in case we encounter a situation where it may work well.

There was a paper I liked from Google a while back looking at a bunch of similarity metrics that’s a decent place to start: Evaluating Similarity Measures: A Large-Scale Study in the Orkut Social Network. Also worth a look are the metrics for association rule mining: confidence, lift, leverage, and conviction. I would also recommend trying the root log-likelihood ratio (LLR) similarity implemented in Apache Mahout. The history of recommendation as we’re discussing it here is relatively short; it is easy enough to just go through the timeline, and look at the various metrics that have been introduced over time.

Bottom line, set up your system in a way that makes it easy to test different metrics, and experiment to find the right one for your application.
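As a rough illustration of the metrics named above, the following Python sketch computes cosine similarity, the four association rule measures, and a signed root log-likelihood ratio for one item pair from co-occurrence counts. The function names and the sample counts are assumptions made for this example; the LLR part follows the standard G² statistic on a 2x2 contingency table and is not taken from Apache Mahout's code.

```python
import math


def association_metrics(n_ab, n_a, n_b, n):
    """Similarity measures for items A and B from co-occurrence counts.

    n_ab: sessions containing both A and B
    n_a, n_b: sessions containing A, sessions containing B
    n: total sessions (assumes 0 < n_a < n and 0 < n_b < n)
    """
    if not (0 < n_a < n and 0 < n_b < n and 0 <= n_ab <= min(n_a, n_b)):
        raise ValueError("counts are inconsistent")
    p_a, p_b, p_ab = n_a / n, n_b / n, n_ab / n
    confidence = p_ab / p_a  # estimate of P(B | A)
    conviction = math.inf if confidence >= 1 else (1 - p_b) / (1 - confidence)
    return {
        "cosine": n_ab / math.sqrt(n_a * n_b),  # binary-vector cosine
        "confidence": confidence,
        "lift": p_ab / (p_a * p_b),
        "leverage": p_ab - p_a * p_b,
        "conviction": conviction,
        "root_llr": root_llr(n_ab, n_a, n_b, n),
    }


def root_llr(n_ab, n_a, n_b, n):
    """Signed square root of the G^2 log-likelihood ratio statistic."""
    # 2x2 contingency table: rows are A present/absent, columns are B present/absent.
    table = [[n_ab, n_a - n_ab], [n_b - n_ab, n - n_a - n_b + n_ab]]
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            k = table[i][j]
            if k > 0:  # empty cells contribute nothing
                g2 += k * math.log(k * n / (rows[i] * cols[j]))
    root = math.sqrt(max(2.0 * g2, 0.0))  # guard against tiny negative rounding
    # Negative sign when B is less common with A than without A.
    if table[0][0] / rows[0] < table[1][0] / rows[1]:
        root = -root
    return root


if __name__ == "__main__":
    # Hypothetical counts: 1000 sessions, A in 100, B in 200, both in 50.
    for name, value in association_metrics(50, 100, 200, 1000).items():
        print(f"{name}: {value:.4f}")
```

Returning every measure from one function keeps it cheap to swap the ranking metric between test arms, which is the setup the answer above argues for.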

AR: Q3. From your talk it seems that there is no cookie-cutter strategy for how and when to present recommendations to viewers, and thus every company must figure out for itself what works best through experimentation such as A/B testing. Still, based on your experience, do you have any generic suggestions on the presentation of recommendations on web, mobile, email, etc., suggestions which happen to work most of the time?

CL: There are certainly idioms. “You may also like…” from any sort of product browse or detail page, for example. There are natural places on a page, places in a flow, where recommendation works; you know these from your own positive experiences with recommendation on the web. Recommendation works well at the edges, figuratively and literally. You don’t want to distract people from what they’re doing, but you want to be there if they “pop out” of the flow, and decide to maybe go to something else. So look at the stages people go through as they interact with your catalog, and look for the in-between places that emerge. Test recommendations there, and if you get a decent response, just make sure that those clicks are also contributing to your strategic goal.

We had an interesting example some years ago where we had a recommendation module at the end of a very long page, which featured an editorial product review. Even though we were doing reasonably well, I pushed the page owner to move the placement higher on the page, because I was convinced few visitors scrolled down far enough to see the placement. I figured that if we moved the placement up, more people would see it, and we’d do better.

The reverse was true. We went from an 8% click rate at the bottom of the page to a 4% click rate at the top. In hindsight, we reasoned that when the recommendation was at the top, the placement became a casualty of the sort of natural filtering people do: they skip the ads, the navigation, and so on, and just follow their line of interest. Being at the bottom of a long piece of content ends up working really well, because when readers reach the bottom, they sort of pop out of the flow and are ready for something new.

We went on to use this same idea for message boards, UGC, and other sorts of long-form content, and it worked really well.

That’s just one example of course. In the end, it is all about experimentation – we didn’t come to understand this effect until we were faced with an experimental result that challenged our intuition.
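A placement comparison like the one above is usually checked for statistical significance before anyone acts on it. The sketch below uses a pooled two-proportion z-test, which is one common choice and not necessarily the method used in the experiment described. The interview gives only the click rates, so the impression counts here are hypothetical, and the function name is an assumption for this example.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled click rate under the null hypothesis of no difference.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical traffic: 1000 impressions per placement,
    # 8% click rate at the bottom versus 4% at the top.
    z, p = two_proportion_z_test(80, 1000, 40, 1000)
    print(f"z = {z:.2f}, p = {p:.5f}")
```

A significant difference in click rate is only the first check; as noted earlier in the interview, those clicks still need to be tested against the strategic goal.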

Second and last part of the interview.