
Sentiment Analysis Symposium Summary and Highlights

Find out how leading analysts and researchers are exploring sentiment analysis and text mining in their fields, along with the opportunities, challenges, and use cases for sentiment analysis.

By Steve Gallant (MultiModel Research).

Here are some personal impressions of the 2015 conference, which was organized by Seth Grimes (Alta Plana).  For a more complete list of speakers, slides, videos, and other material, please visit http://sentimentsymposium.com/.

This is the eighth year that Seth has organized the conference.  There were two parallel sessions this year, including a half-day session devoted entirely to financial markets.

Prof. Bing Liu (Univ. of Illinois, Chicago) gave the keynote and emphasized “Lifelong Learning,” noting that most learning approaches do not sufficiently retain what was learned on previous tasks.  For example, reviews of different products use overlapping phrases, so it is better to keep and reuse learnings across domains.

Brook Miller (MotiveQuest) emphasized marketing to different communities of people (referred to as “tribes” in his colleague’s talk last year).  Brands can matter a great deal in some categories (e.g., technology) and very little in others (e.g., food).

Anjali Lai (Forrester Research) noted that the sophistication of data-mining tools and methodology is not the most important aspect; instead we need to focus on business problems.  She discussed three major gaps: social sentiment vs. identifiable information; qualitative insight vs. quantitative insight; and raw data vs. actionable information.

Jeff Catlin (Lexalytics) gave examples to show that syntax is both hard to do and important.  Connective words such as “until” and “so” can make a big difference in sentiment. They use unsupervised learning on large corpora, as well as deep learning methods.
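To see why a connective can flip sentiment, here is a toy sketch (my own illustration, not Lexalytics’ method) with a made-up three-word lexicon: a plain bag-of-words score misreads “I hated it until the ending,” while one rule treating “until” as a sentiment pivot scores it correctly.

```python
# Naive illustration: a tiny hand-made lexicon plus one "until" rule.
POSITIVE = {"great", "loved", "wonderful"}
NEGATIVE = {"hated", "terrible", "boring"}

def clause_polarity(words):
    """Bag-of-words score: +1 per positive word, -1 per negative word."""
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

def sentence_polarity(sentence):
    words = sentence.lower().rstrip(".").split()
    if "until" in words:
        i = words.index("until")
        before, after = words[:i], words[i + 1:]
        after_score = clause_polarity(after)
        # If the trailing clause carries its own sentiment, it dominates;
        # otherwise "until" implies the earlier state ended, so flip it.
        return after_score if after_score != 0 else -clause_polarity(before)
    return clause_polarity(words)
```

Here `sentence_polarity("I hated it until the ending")` comes out positive, whereas the bag-of-words score alone is negative.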

Dave Schubmehl (IDC) estimated the Cognitive Computing / Machine Learning market at $20 billion, with annual growth (CAGR) greater than 30%.  He noted rapid development in personal assistants (Microsoft Cortana Analytics, Google Now, and Apple Siri).  He also noted some big investments in R&D, with IBM investing $1B annually and Wipro investing $200M annually.  His prediction:  By 2018, half of all consumers will interact with Cognitive Computing Systems on a regular basis.

Robert Dale (Arria) gave quite an interesting talk on the subtleties of natural language generation from data.  There is more and more text generation: weather, earthquake reports, financial reports, medical reports, and operational reports (with recommended actions).  For marketing, this becomes the problem of “persuasive personalization.”  His prediction for 2020: more text will be generated by machines than by humans.  [My personal takeaway: generating language looks easy, but it is quite difficult to do well with multiple items, so don’t try this at home, kids!]

Jon Halestrap (TheySay) told how they were able to interactively explore sentiment data to track down hard-to-find problems at Birmingham Hospital that had large effects on patient opinion.  These included banging trash can lids disturbing patient sleep, and poor signage causing difficulties for visitors.

A nice feature of this year’s Symposium was an afternoon workshop devoted to financial applications.

Eiji Hirasawa (FTRI - NIKKEI Group) reported that incorporating unstructured data in models gave an 11% improvement in predictive performance.  He also noted that increased tweets can cause an increase in stock volatility.

Kevin Coogan (Zettacap) tracks stock price estimates on Twitter -- targets and stops.  He found that price estimates show maximum overshoots at a stock’s high, and maximum undershoots at a stock’s low.

James Ross (HedgeChatter) said that social media has now been established as the third factor in stock analysis, alongside fundamentals and price data.  He was one of several speakers who emphasized that it is vital to filter Twitter authors, especially because it is easy to generate large volumes of tweets with cheap programs.  In particular, they focus on handles that are good stock price predictors and the people they influence.  They also find reliably bad predictors and flip their signals!


Vika Abrecht (Bloomberg) gave a very interesting talk on Bloomberg’s analysis of whether a story is market-moving news, whether it is novel, and what its sentiment is.  Their sentiment models are multiclass (+, -, 0), probability-producing, non-linear support vector machines.  Inputs include n-grams, negations, tf-idf, and pointwise mutual information (PMI).  Part-of-speech tagging gave no improvement, and syntax analyzers are computationally expensive when processing over a million stories a day in multiple languages.  For tweet input, they use 10,000 curated handles, and they also tag companies.
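Of the inputs listed, PMI measures how strongly a word co-occurs with a sentiment class: PMI(w, c) = log2( P(w, c) / (P(w) P(c)) ).  A minimal sketch on a made-up four-document corpus (nothing here reflects Bloomberg’s data or code):

```python
import math

# Hypothetical toy corpus: (document text, sentiment label) pairs.
docs = [
    ("great product love it", "+"),
    ("love the great screen", "+"),
    ("terrible battery hate it", "-"),
    ("hate the terrible price", "-"),
]

def pmi(word, label, docs):
    """PMI(word, label) = log2( P(word, label) / (P(word) * P(label)) ),
    with probabilities estimated over documents."""
    n = len(docs)
    p_w = sum(word in d.split() for d, _ in docs) / n
    p_l = sum(l == label for _, l in docs) / n
    p_wl = sum(word in d.split() and l == label for d, l in docs) / n
    if p_wl == 0:
        return float("-inf")  # word never co-occurs with this class
    return math.log2(p_wl / (p_w * p_l))
```

On this corpus, “great” has PMI of 1.0 with the positive class (it appears only in positive documents) and negative infinity with the negative class; such scores can then feed a classifier as features.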


John Liu (Digital Reasoning) talked about Cognitive Computing, and focused on “Cognitive Alpha” as the intersection of sentiment analysis and behavioral economics.  He pointed out that there are market anomalies from cognitive and emotional biases.  As an interesting and amusing example, he pointed to the following bet:  If a coin comes up heads, you gain 60% of what you own; otherwise you lose 50% of what you own.  Would you take this bet for one flip?  Would you take the bet many times?  If you take the bet many times, you are extremely likely to lose money!
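The arithmetic behind the bet is worth spelling out: the expected value of a single flip is positive (0.5 x 1.6 + 0.5 x 0.5 = 1.05), but repeated bets compound multiplicatively, so the typical outcome is governed by the geometric mean of the multipliers, sqrt(1.6 x 0.5) = sqrt(0.8), which is about 0.894 and less than 1.  A quick check:

```python
import math

# Per-flip wealth multipliers: heads -> 1.6x, tails -> 0.5x.
UP, DOWN = 1.6, 0.5

# Expected value of one flip is positive...
ev_per_flip = 0.5 * UP + 0.5 * DOWN        # 1.05

# ...but repeated wealth compounds multiplicatively, so the typical
# per-flip growth factor is the geometric mean of the multipliers.
geo_mean = math.sqrt(UP * DOWN)            # sqrt(0.8), about 0.894

# After n flips, the typical bettor holds geo_mean**n of starting wealth.
typical_after_100 = geo_mean ** 100
```

After 100 flips the typical bettor retains a tiny fraction of their starting wealth, even though the expected value (driven by a few astronomically lucky paths) keeps growing.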

Bio: Steve Gallant (sgallant@mmres.com) is VP for Research at MultiModel Research, a company involved with text and machine learning for financial, medical, and marketing applications.