Sentiment Analysis Innovation Summit 2014: Day 1 Highlights

Highlights from the presentations by opinion mining experts from Twitter, eBay and Samsung on Day 1 of Sentiment Analysis Innovation Summit 2014 in San Francisco.

The Sentiment Analysis Innovation Summit 2014 (May 1-2, 2014), held in San Francisco, CA, was organized by the Innovation Enterprise. The summit brought together experts from industry and academia for two days of insightful presentations, discussions, panels and networking. It covered a wide range of topics related to sentiment analysis, such as making sense of consumer sentiment, extracting sentiment using NLP, sentiment monitoring, and sentiment analysis with Machine Learning & Solr.

I had a great time at the summit and would like to share the key points from selected talks. This KDnuggets-style summary is a quick and convenient way to revisit the key takeaways from those presentations. If you are particularly interested in any of these talks, keep checking KDnuggets, as we plan to publish exclusive interviews with some of the speakers soon.

Here are highlights from selected talks on day 1 (Thu, May 1):

Jim Skinner, Technical Program Manager at Twitter, delivered a talk on “Twitter Sentiment Analysis Using N-grams with a Dynamic Neural Network”. This is a very interesting advance that combines the semantic (lexicon-based) approach with the machine learning approach. He discussed solutions to two problems with social media sentiment: the vast amount of data and continually changing language. He gave a brief overview of sentiment analysis, explaining key terms such as lexicon, dictionary and weights. In the semantic approach, the feature set and the weights are easy to understand, and there is no need to develop a training set.

He explained various lexicon problems, such as completeness and stemming. Switching to the machine learning approach, he presented its advantages: there is no limit to the number of features, and the weights are generated from the training data. However, this approach suffers from the term-frequency problem. Next, he presented the advantages of a blended approach. Using a simple weighting algorithm based on the sum of N-gram weights, messages land in one of four sentiment tiers: Strong Negative, Mild Negative, Mild Positive and Strong Positive. The blended approach thus provides comprehensive coverage of the data and quick access to sentiment insights, while the lexicon improves continuously.
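The simple weighting idea above can be sketched in a few lines: sum lexicon weights for the n-grams found in a message, then bucket the score into the four tiers. The tiny lexicon, the greedy bigram-first matching, and the tier thresholds below are illustrative assumptions, not Twitter's actual values.

```python
# Hypothetical lexicon mapping n-grams (unigrams and bigrams) to weights.
LEXICON = {
    "love": 2.0,
    "great": 1.5,
    "not great": -1.5,  # bigram overrides the naive word-level reading
    "terrible": -2.0,
    "meh": -0.5,
}

def score(message):
    """Sum lexicon weights, preferring bigram matches over unigrams."""
    tokens = message.lower().split()
    total, i = 0.0, 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if bigram in LEXICON:
            total += LEXICON[bigram]
            i += 2
        else:
            total += LEXICON.get(tokens[i], 0.0)
            i += 1
    return total

def tier(s):
    """Bucket a score into the four sentiment tiers (assumed thresholds)."""
    if s <= -1.5:
        return "Strong Negative"
    if s < 0:
        return "Mild Negative"
    if s < 1.5:
        return "Mild Positive"
    return "Strong Positive"
```

For example, `tier(score("this phone is not great"))` yields "Strong Negative" because the bigram "not great" is matched before the positive unigram "great".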

Vita Markman, Staff Research Engineer at Samsung Research America, delivered a talk on “Integrating Linguistic Features into Sentiment Models: Sentiment Mining in Social Media within Industry Setting”. The central goal of the talk was to show how linguistically-informed algorithms can be adapted to an industry setting and tailored to the language of social media. It focused on phrasal feature discovery and auto-labeling with minimal language-processing tools. The industry setting imposes several constraints: a focus on a specific domain, limited time and resources, and off-the-shelf tools that are hard to adapt to the domain and its noise. Despite these constraints, one thing is very clear: we need to go beyond the dictionary meaning of words, due to linguistic variability (which is effectively unbounded), context sensitivity, negation, and domain specificity. She discussed the following challenges:

  1. Brevity: micro-reviews are short
  2. Phrases not words often bear key sentiment
  3. Adjectives and sentiment-bearing words may be absent
  4. Domain specificity makes some labeled data non-reusable
  5. Off-the-shelf tools are challenged by noisy data
  6. Industry constraints may disallow large-scale annotation efforts or building intricate models

Talking about inference and phrase learning, she explained that labeled seed features (‘great’, ‘terrible’) combined with anchor features (‘but’, ‘and’) yield auto-labeled phrasal features. Phrasal features are then inferred recursively, i.e., inferred phrases are added to the original seed features on the fly. She concluded by claiming that phrasal features allow for more nuanced sentiment models with minimal annotation.
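The seed/anchor bootstrapping step can be illustrated with a toy word-level stand-in for the phrasal version: ‘and’ propagates the same polarity to a neighbouring feature, ‘but’ flips it, and newly labeled features feed back in as seeds on the next pass. The tiny corpus, the anchor rules and all names here are illustrative assumptions, not the talk's actual algorithm.

```python
# Seed features with known polarity, and anchors with propagation rules:
# "and" keeps the neighbour's polarity, "but" inverts it.
SEEDS = {"great": "+", "terrible": "-"}
ANCHORS = {"and": lambda p: p, "but": {"+": "-", "-": "+"}.get}

def infer(reviews, seeds, max_iters=5):
    """Recursively auto-label features adjacent to known ones via anchors."""
    features = dict(seeds)
    for _ in range(max_iters):
        new = {}
        for review in reviews:
            tokens = review.lower().split()
            for i, tok in enumerate(tokens):
                if tok in ANCHORS and 0 < i < len(tokens) - 1:
                    left, right = tokens[i - 1], tokens[i + 1]
                    if left in features and right not in features:
                        new[right] = ANCHORS[tok](features[left])
                    elif right in features and left not in features:
                        new[left] = ANCHORS[tok](features[right])
        if not new:
            break
        features.update(new)  # recursive step: inferred features become seeds
    return features
```

On a corpus like `["sleek and great", "great but laggy", "laggy and glitchy"]`, the first pass labels ‘sleek’ positive and ‘laggy’ negative, and the second pass uses the newly learned ‘laggy’ to label ‘glitchy’ negative.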

Samaneh Moghaddam, Applied Researcher at eBay, talked about “Sentiment Analysis for Cold Start Items”. With too many reviews to read, it is difficult for eBay users to make informed decisions, and even more difficult for manufacturers to keep track of customer reviews. One of the newly emerged problems in the area of opinion mining is aspect-based opinion mining, which includes tasks such as aspect extraction, rating prediction and opinion phrase extraction. All the state-of-the-art LDA models are applied at the item level. The impact of the size of the training data was evaluated on a real-life data set from Epinions (a general consumer review site). Analyzing the results, she found the following:

  1. Item-level models work well for items with a large number of reviews
  2. They perform poorly when the training data is small
  3. The basic LDA model outperforms more complex models for these items
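To make the aspect-based tasks concrete, here is a toy illustration of extracting (opinion, aspect) pairs from a review via a naive adjective-noun pattern. Real systems, and the LDA models discussed in the talk, are far more sophisticated; the pattern and the word list are illustrative assumptions.

```python
import re

# Hypothetical mini-list of opinion adjectives for the pattern match.
ADJECTIVES = {"great", "poor", "sharp", "slow"}

def extract_pairs(review):
    """Return (opinion, aspect) pairs matching a simple 'ADJ NOUN' pattern."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [(a, b) for a, b in zip(tokens, tokens[1:]) if a in ADJECTIVES]
```

For instance, "Great screen, but slow shipping." yields the pairs (great, screen) and (slow, shipping), i.e., two aspects with their attached opinion phrases.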

Introducing the cold start problem, she referred to items with few reviews as “cold start” items. In real-life data sets, more than 90% of items are cold start. Discussing the problem of identifying aspects and estimating their ratings for cold start items, she proposed Factorized LDA (FLDA) as a solution, training at the category level. The solution assumes that both items and reviewers can be modeled by a set of latent factors. She then gave a quick intro to the intricacies of the model. The proposed FLDA model achieves lower perplexity, indicating better performance than LDA, D-PLDA and I-FLDA. She also discussed applications of this solution in item categorization and overall rating prediction.
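The data-preparation side of this setup can be sketched as follows: items below a review-count threshold are flagged as cold start, and their reviews are pooled by category so a model such as FLDA can be trained per category rather than per item. The threshold, data layout and function names are illustrative assumptions.

```python
from collections import Counter, defaultdict

def split_cold_start(reviews, min_reviews=20):
    """Split reviews into item-level vs. category-level training pools.

    reviews: list of (item_id, category, text) tuples.
    """
    counts = Counter(item for item, _, _ in reviews)
    cold = {item for item, c in counts.items() if c < min_reviews}
    per_item = defaultdict(list)      # enough data: train at item level
    per_category = defaultdict(list)  # cold start: pool reviews by category
    for item, cat, text in reviews:
        (per_category[cat] if item in cold else per_item[item]).append(text)
    return per_item, per_category, cold
```

With the > 90% cold-start rate she reported, almost all items would fall into the category-level pools, which is precisely why a per-item model degrades and a category-level model like FLDA helps.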

Next part: Highlights of talks on Day 2