Sentiment Analysis Innovation Summit 2014: Day 1 Highlights
Highlights from the presentations by opinion mining experts from Twitter, eBay and Samsung on Day 1 of Sentiment Analysis Innovation Summit 2014 in San Francisco.
I had a great time at the summit and would like to share key points from selected talks. This KDnuggets-style summary is a quick, convenient way to revisit the key takeaways from these presentations. If you are particularly interested in any of these talks, keep checking KDnuggets, as we plan to publish exclusive interviews with some of the speakers soon.
Here are highlights from selected talks on day 1 (Thu, May 1):
Jim Skinner, Technical Program Manager at Twitter, delivered a talk on “Twitter Sentiment Analysis Using N-grams with a Dynamic Neural Network”. This is a very interesting advance that combines the semantic (lexicon) approach with the machine learning approach. He discussed solutions to the following two problems with social media sentiment: the vast amount of data and continually changing language.
He explained various lexicon problems, such as completeness and stemming. Turning to the machine learning approach, he presented its advantages, such as no limit on the number of features and weights generated from the training data, but noted that it suffers from term-frequency problems. He then made the case for a blended approach: using simple weighting algorithms based on the sum of N-gram weights, messages land in one of four sentiment tiers: Strong Negative, Mild Negative, Mild Positive, and Strong Positive. The blended approach thus provides comprehensive coverage of the data and quick access to sentiment insights, while the lexicon improves continuously.
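To make the tiering concrete, here is a minimal sketch of summing n-gram weights and bucketing the result into the four tiers. The lexicon entries, weights, and tier cutoffs are my own illustrative assumptions, not Twitter's actual values.

```python
from typing import Dict, List

# Hypothetical n-gram weights; in practice these would be learned from
# training data and refreshed as language changes.
NGRAM_WEIGHTS: Dict[str, float] = {
    "love": 2.0,
    "not bad": 0.8,
    "meh": -0.5,
    "hate": -2.0,
    "epic fail": -2.5,
}

def ngrams(tokens: List[str], n: int) -> List[str]:
    """All contiguous n-grams of length n."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def score(message: str, max_n: int = 2) -> float:
    """Sum the weights of every known n-gram in the message."""
    tokens = message.lower().split()
    total = 0.0
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            total += NGRAM_WEIGHTS.get(gram, 0.0)
    return total

def tier(s: float) -> str:
    """Bucket a raw score into the four sentiment tiers (cutoffs assumed)."""
    if s <= -1.5:
        return "Strong Negative"
    if s < 0.0:
        return "Mild Negative"
    if s < 1.5:
        return "Mild Positive"
    return "Strong Positive"

print(tier(score("total epic fail I hate it")))  # Strong Negative
print(tier(score("not bad at all")))             # Mild Positive
```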
Vita Markman, Staff Research Engineer at Samsung Research America, delivered a talk on “Integrating Linguistic Features into Sentiment Models: Sentiment Mining in Social Media within Industry Setting”. She outlined the key challenges of mining sentiment from social media in an industry setting:
- Brevity: micro-reviews are short
- Phrases, not words, often bear the key sentiment
- Adjectives and sentiment-bearing words may be absent
- Domain specificity makes some labeled data non-reusable
- Off-the-shelf tools are challenged by noisy data
- Industry constraints may disallow large-scale annotation efforts or building intricate models
Talking about inference and phrase learning, she explained that labeled seed features (‘great’, ‘terrible’) combined with anchor features (‘but’, ‘and’) yield auto-labeled phrasal features. The inference is recursive: inferred phrases are added to the original seed features on the go. She concluded that phrasal features allow for more nuanced sentiment models with minimal annotation.
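Here is a rough sketch of how such seed-and-anchor inference could work, assuming a simple conjunction heuristic (‘and’ preserves the seed's polarity, ‘but’ flips it). The seed words, toy corpus, and one-token window are illustrative assumptions, not Samsung's actual system.

```python
from typing import Dict

seeds: Dict[str, int] = {"great": 1, "terrible": -1}  # labeled seed features
ANCHORS: Dict[str, int] = {"and": 1, "but": -1}       # polarity-preserving / flipping

corpus = [
    "the screen is great and super crisp",
    "battery life is great but drains overnight",
    "the ui is terrible and laggy too",
]

def infer_once(features: Dict[str, int]) -> Dict[str, int]:
    """One pass: auto-label the token that follows a 'seed anchor' pattern."""
    learned = dict(features)
    for sentence in corpus:
        tokens = sentence.split()
        for i in range(len(tokens) - 2):
            if tokens[i] in learned and tokens[i + 1] in ANCHORS:
                candidate = tokens[i + 2]
                learned.setdefault(candidate, learned[tokens[i]] * ANCHORS[tokens[i + 1]])
    return learned

# Recursively re-run, folding inferred features back in, until nothing new appears.
prev: Dict[str, int] = {}
curr = seeds
while curr != prev:
    prev, curr = curr, infer_once(curr)

print(curr)
# {'great': 1, 'terrible': -1, 'super': 1, 'drains': -1, 'laggy': -1}
```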
Samaneh Moghaddam, Applied Researcher at eBay, talked about “Sentiment Analysis for Cold Start Items”. With too many reviews to read, it is difficult for eBay users to digest the opinions they contain. On existing aspect-based sentiment models, she noted:
- Item level models work well for items with large number of reviews
- They perform poorly when training data is small
- Basic LDA model outperforms the more complex models for these items
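As a point of reference, the basic LDA baseline mentioned above can be fit in a few lines with gensim; the toy reviews and topic count below are illustrative assumptions.

```python
from gensim import corpora, models

reviews = [
    "battery lasts long and charges fast".split(),
    "screen is sharp but battery drains quickly".split(),
    "great screen great price".split(),
]

dictionary = corpora.Dictionary(reviews)               # word <-> id mapping
bow_corpus = [dictionary.doc2bow(doc) for doc in reviews]

# Fit a plain (non-factorized) LDA model -- the baseline that, per the
# talk, already beats more complex models on items with few reviews.
lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary,
                      passes=10, random_state=0)

for topic_id, words in lda.show_topics(num_topics=2, num_words=4):
    print(topic_id, words)

# gensim reports a per-word likelihood bound; perplexity = 2**(-bound),
# so a higher bound (lower perplexity) means a better fit.
print(lda.log_perplexity(bow_corpus))
```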
Introducing the cold start problem, she referred to items with few reviews as “cold start” items; in real-life data sets, more than 90% of items are cold start. To address the problem of identifying aspects and estimating their ratings for cold start items, she proposed Factorized LDA (FLDA), which trains at the category level rather than the item level. The model assumes that both items and reviewers can be modeled by a set of latent factors. She then gave a quick introduction to the intricacies of the model. The proposed FLDA model achieves lower perplexity, indicating better performance than LDA, D-PLDA, and I-FLDA. She also discussed applications of this solution in item categorization and overall rating prediction.
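For context on how to read that comparison, perplexity is conventionally computed on a held-out test set as (standard topic-model definition, not specific to this talk):

```latex
\mathrm{perplexity}(D_{\text{test}}) =
  \exp\!\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}
                      {\sum_{d=1}^{M} N_d} \right)
```

where \(\mathbf{w}_d\) are the words of test document \(d\) and \(N_d\) is its length; lower perplexity indicates better generalization to unseen reviews.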
Next part: Highlights of talks on Day 2