Interview: Samaneh Moghaddam, Applied Researcher, eBay on Aspect-based Opinion Mining

We discuss aspect-based opinion mining, major challenges, cold start items, the need for accurate opinion mining models for cold start items and how factorized LDA can be leveraged.

Samaneh Moghaddam is part of the “Customer Connect Data Science Engineering” team at eBay that transforms customer service data into actionable information. She is working on topic and sentiment models to identify and summarize customers’ pain points from user feedback. Samaneh Moghaddam holds a PhD in Computer Science with a thesis on aspect-based opinion mining. She has published several papers in the area of sentiment analysis at top international conferences such as WWW, ACM SIGIR, ACM CIKM, and ACM WSDM. She has also presented tutorials at ACM SIGIR 2012 and WWW 2013 to introduce this research area to the community.

Here is my interview with her:

Anmol Rajpurohit: Q1. How do you define "aspect-based opinion mining"? What are some of the most prominent challenges related to aspect-based opinion mining?

Samaneh Moghaddam: Mining opinions at the document level or sentence level is useful in many cases. However, these levels of information are not sufficient for decision-making (e.g., whether to purchase the product). In a typical review, the reviewer usually writes about both positive and negative aspects of the reviewed item, although the reviewer's general opinion of the item may be positive or negative. Aspect-based opinion mining addresses the need for detailed information.

Given a set of reviews about an item (e.g., product, service, organization, person, etc.), the task is to identify the major aspects of the item and to predict the rating of each aspect. Aspects (also called features) are attributes or components of the item that have been commented on in a review, e.g., display, battery life, and zoom for a digital camera. The estimated rating of an aspect is a numerical value (e.g., in the range from 1 to 5) indicating the quality of that aspect. A sample of extracted aspects and their estimated ratings for a camcorder is shown in the following figure:

Figure: Extracted aspects for a camcorder

There are various challenges that make the problem of aspect-based opinion mining hard. For example, there are many aspects and sentiments that a human reader understands but that are hard for a machine to extract, e.g., “fits in my pocket pretty easily”, which implies positive sentiment about the aspect size without naming it. Another challenge is noisy information, as reviewers normally include a large amount of irrelevant information, e.g., opinions about the manufacturer. Finally, identifying aspects and ratings for cold-start items is a critical and challenging problem.

AR: Q2. What do you mean by "Cold Start items"? Why should we be concerned about them? 

SM: Items with only a few reviews are called cold start items. In real-life data sets, a large percentage of items are cold start (around 90% of items in some data sets). A cold start item can be a recently released item (a new smartphone), a rarely reviewed item (an inn resort in a small city), a very unique item, etc. Our experiments on three real-life data sets [1] show that the distributions of the number of reviews per item and the number of reviews per reviewer follow a power law.

In other words, we observed that a large number of items have only a few reviews, and a few items have a large number of reviews. These statistics indicate there is a great need for accurate opinion mining models for cold start items.

AR: Q3. Why do current technologies not serve the aspect-based opinion mining of cold start items well? How does Factorized LDA help solve this problem?

SM: In the last decade, several latent variable models have been proposed to address the problem of aspect-based opinion mining. All of these models are applied at the item level, i.e., they learn one model per item from the reviews of that item. Learning a model per item is logical, as the rating of an aspect depends on the aspect quality, which usually differs from item to item. However, an issue that has been neglected in all current work is that latent variable models are not accurate if there is not enough training data, which is exactly the situation for a cold start item with only a few reviews.

Factorized LDA is our proposed solution to address the cold-start problem. FLDA is a probabilistic model based on LDA (Latent Dirichlet Allocation) that models not only items but also reviewers. This model makes the following assumptions:
  • A category has a set of aspects, which are shared by all items in that category. For example, {zoom, battery life, shutter lag, etc.} is a set of aspects shared by all products in the category ‘digital camera’ (probabilities of occurrence of aspects can differ for different items in the category).
  • Each item has a distribution over the aspects representing what aspects of its category are mainly commented on in reviews of that item. Each of these aspects is associated with a distribution of ratings.
  • Each reviewer has a distribution over the aspects representing what aspects are more commented on by the reviewer. The reviewer is also associated, for each aspect, with a rating distribution.

This model assumes that both items and reviewers can be modeled by a set of latent factors. The factors of an item or reviewer represent its distribution over aspects and, for each aspect, its distribution over ratings. Each review in the FLDA model is generated based on the learned factors of the corresponding item and reviewer. The model first samples the aspects in a review from the aspect distributions of the corresponding item and reviewer, and then generates the rating of each aspect conditioned on that aspect and the rating distributions of that item and reviewer.
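To make the two-step generation concrete, here is a minimal Python sketch of the process described above. It is not the FLDA implementation: the interview does not say how the item and reviewer distributions are combined, so the `combine` function (a simple average) is an assumption, and the names `generate_review`, `camera`, and `reviewer`, as well as all the probabilities, are hypothetical values chosen for illustration.

```python
import random

ASPECTS = ["zoom", "battery life", "shutter lag"]
RATINGS = [1, 2, 3, 4, 5]


def combine(p, q):
    """Average two distributions (an illustrative assumption; the rule FLDA
    actually uses to combine item and reviewer factors is not given here)."""
    return [(a + b) / 2 for a, b in zip(p, q)]


def generate_review(item, reviewer, n_mentions, rng):
    """Sample (aspect, rating) pairs for one review of `item` by `reviewer`."""
    aspect_dist = combine(item["aspects"], reviewer["aspects"])
    review = []
    for _ in range(n_mentions):
        # Step 1: pick an aspect from the combined aspect distribution.
        k = rng.choices(range(len(ASPECTS)), weights=aspect_dist)[0]
        # Step 2: pick a rating conditioned on that aspect.
        rating_dist = combine(item["ratings"][k], reviewer["ratings"][k])
        rating = rng.choices(RATINGS, weights=rating_dist)[0]
        review.append((ASPECTS[k], rating))
    return review


# Hypothetical factors; in FLDA these would be learned, not hand-written.
camera = {
    "aspects": [0.6, 0.3, 0.1],
    "ratings": [
        [0.0, 0.1, 0.2, 0.3, 0.4],  # zoom: mostly high ratings
        [0.4, 0.3, 0.2, 0.1, 0.0],  # battery life: mostly low ratings
        [0.2, 0.2, 0.2, 0.2, 0.2],  # shutter lag: no clear tendency
    ],
}
reviewer = {
    "aspects": [0.2, 0.2, 0.6],
    "ratings": [
        [0.1, 0.1, 0.2, 0.3, 0.3],
        [0.2, 0.2, 0.2, 0.2, 0.2],
        [0.3, 0.3, 0.2, 0.1, 0.1],
    ],
}

review = generate_review(camera, reviewer, n_mentions=4, rng=random.Random(0))
print(review)
```

Each call returns a list of (aspect, rating) pairs, one per sampled aspect mention. Real reviews consist of words rather than aspect labels, so a full model would also need a step that generates the review text from each sampled aspect and rating; that part is left out of this sketch.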

For cold start items, the aspect and rating distributions are mainly determined by the prior aspect distribution of the category and the rating distribution of the reviewer (or the prior rating distribution of all reviewers), respectively. For non-cold start items, the aspect and rating distributions mainly depend on the observed reviews of that item.
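The way a prior dominates for cold start items and fades for well-reviewed ones can be illustrated with standard Dirichlet smoothing. The sketch below is not FLDA's inference procedure; it only shows the general effect for the aspect distribution. The function name `posterior_mean`, the `strength` parameter (how many pseudo-observations the prior is worth), and all the counts are hypothetical.

```python
def posterior_mean(counts, prior, strength):
    """Posterior mean of a multinomial under a Dirichlet prior.

    counts:   observed aspect mentions for one item
    prior:    category-level aspect distribution (sums to 1)
    strength: weight of the prior in pseudo-observations (assumed value)
    """
    total = sum(counts) + strength
    return [(c + strength * p) / total for c, p in zip(counts, prior)]


# Hypothetical category prior over (zoom, battery life, shutter lag).
category_prior = [0.5, 0.3, 0.2]

cold_item_counts = [0, 2, 0]           # an item with only 2 aspect mentions
popular_item_counts = [50, 400, 50]    # an item with 500 aspect mentions

cold = posterior_mean(cold_item_counts, category_prior, strength=10)
popular = posterior_mean(popular_item_counts, category_prior, strength=10)

print("cold start item:", [round(x, 3) for x in cold])
print("popular item:   ", [round(x, 3) for x in popular])
```

With these made-up numbers, the cold start estimate stays close to the category prior, while the popular item's estimate is close to its own observed proportions, which mirrors the behavior described above.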

Second and final part of the interview.