Sentiment and Emotion Analysis for Beginners: Types and Challenges

There are three types of emotion AI, and their combinations. In this article, I’ll briefly go through these three types and the challenges of their real-life applications.

By Veronika Vartanova, Mobility Researcher at Iflexion

Sentiment analysis, emotion AI, or, as it’s commonly referred to in terms of commercial use, opinion mining, is mostly regarded as a popular application of Natural Language Processing (NLP). However, although text processing is the largest branch of the technology, it’s far from being the only one.

There are three types, or levels, of emotion AI, and their combinations. All of them have their own challenges and are currently at various stages of development. In this article, I’ll briefly go through these three types and the challenges of their real-life applications.




Text sentiment analysis

As a subset of NLP, text analysis and written opinion mining are the simplest and most developed types of sentiment analysis to date. With high demand and a long history of development, they are also the types most widely adopted by businesses and the public sector.

Basic sentiment analysis, especially for commercial use, can be narrowed down to classification of sentences, paragraphs, and posts or documents as negative, neutral, or positive. A more complex processing of sentiment and attitude, extraction of meaning, classification of intent, and linguistics-based emotion analysis are also gaining traction.

Automated sentiment analysis is usually achieved through supervised deep machine learning, a lexicon-based unsupervised process, or a combination of both.
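To make the lexicon-based route more concrete, here is a minimal sketch of a negative/neutral/positive classifier. The word list, the weights, the negation rule, and the threshold are all hypothetical values picked for illustration; production lexicons are far larger and are usually combined with trained models.

```python
import re

# Toy lexicon: hypothetical words and weights, chosen only for illustration.
LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0, "excellent": 2.0,
    "bad": -1.0, "terrible": -2.0, "hate": -2.0, "awful": -2.0,
}
NEGATORS = {"not", "no", "never"}


def score(text):
    """Sum word polarities, flipping the sign of a word that follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        weight = LEXICON.get(token, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        total += weight
    return total


def classify(text, threshold=0.5):
    """Map the numeric score to one of the three basic sentiment classes."""
    s = score(text)
    if s > threshold:
        return "positive"
    if s < -threshold:
        return "negative"
    return "neutral"


for sentence in (
    "I love this phone, the camera is great",
    "The battery is not good",
    "The package arrived on Tuesday",
):
    print(sentence, "->", classify(sentence))
```

A supervised approach would replace the hand-written lexicon with weights learned from labelled examples, which is why the two are often combined.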

There are many ready-made datasets that often, but not always, draw on social media, various review platforms, and publicly available Q&A services. Crawling and scraping websites (those that allow it) to extract new data is also common, with Twitter and Amazon being particularly popular choices.


Visual sentiment and emotion analysis

As a part of multimedia sentiment analysis, visual emotion AI is much less developed and commercially integrated, compared to text-based analysis.

Good examples of current applications of visual emotion analysis are content search by emotion identifiers (“happiness,” “love,” “joy,” “anger”) in digital image repositories, and automated prediction of image and video tags. On the horizon is automated understanding of people’s emotions for educational, political, cultural, security, and other purposes.

Currently, combined visual/text analysis, as well as analysis of image annotations and companion text, remains the major source of input for the machine learning processes aimed at creating AI for visual sentiment analysis.

Data for visual sentiment analysis can also originate from social media: images from Flickr, Twitter, and Tumblr, and video from public hosting platforms (YouTube, Vimeo, etc.).

Thanks to many well-known sets of annotated static images, facial expressions can be interpreted and classified easily enough. Complex or abstract images, as well as video and real-time visual emotion analysis, are more of a problem, especially given that they offer less concrete signifiers to anchor to and may contain forced or insincere expressions.

Complex visual sentiment analysis requires higher levels of abstraction, cultural knowledge, understanding of subjectivity, concepts, and cues. It is harder to acquire labelled or curated datasets and create models for learning to extract and predict meaning for this purpose.

While recent studies show much promise, they are first and foremost indicative of the fact that there’s a long way to go before we arrive at visual lie-detectors and threat detection security systems, which can combine automated facial emotion and body language analyses for spotting potentially risky situations.


Audio sentiment analysis

Audio chatbots are becoming an ever bigger part of our lives. One would be hard-pressed to remember a recent customer service call without some sort of ‘canned’ response or greeting. While far from perfect, these audio assistants either already use sentiment analysis or soon will.

Detecting stress, frustration and other emotions from the tone of voice as well as the context is one of the tasks that machines can already do. Understanding of and the ability to simulate prosody and tonality is a big part of speech processing and synthesis right now.

Existing emotion-detection methods used for audio sentiment analysis usually go hand in hand with speech recognition. The parameters for this analysis are sets of detectable acoustic features: pitch, energy, tempo, spectral coefficients, and so on.
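As a rough illustration of two of these acoustic features, the sketch below computes energy (as root-mean-square amplitude) and pitch (via a simple autocorrelation peak search) for a single frame of audio. The function names, the synthetic test tone, and the 50 to 500 Hz search range are assumptions made for this example; real toolkits use more robust estimators and also cover tempo and spectral coefficients, which are not shown here.

```python
import math


def make_tone(freq_hz, sample_rate, n_samples, amplitude=1.0):
    """Generate a pure sine tone to stand in for a frame of recorded speech."""
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
        for i in range(n_samples)
    ]


def rms_energy(frame):
    """Root-mean-square amplitude of the frame, a common energy measure."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_pitch(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency from the strongest autocorrelation lag."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0


# Synthetic 200 Hz tone sampled at 8 kHz (hypothetical stand-in for real audio).
frame = make_tone(200.0, 8000, 800, amplitude=0.5)
print("energy:", rms_energy(frame))
print("pitch (Hz):", estimate_pitch(frame, 8000))
```

In practice such values are computed frame by frame across an utterance, and it is the resulting contours, rather than single numbers, that carry cues such as stress or frustration.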

One of the most recognized toolkits for emotion analysis is the Munich Open-Source Emotion and Affect Recognition Toolkit (openEAR), capable of extracting more than 4,000 features (39 functionals of 56 acoustic low-level descriptors).
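The idea of "functionals of low-level descriptors" can be sketched in a few lines: each low-level descriptor is a per-frame contour (for example, pitch over time), and each functional is a statistic that collapses that contour into a single number. The code below is a toy illustration only and does not use openEAR or reproduce its feature set; the contour values, descriptor names, and the five functionals are hypothetical.

```python
import statistics

# Hypothetical functionals: each maps a per-frame contour to one number.
FUNCTIONALS = {
    "mean": statistics.mean,
    "stdev": statistics.pstdev,
    "min": min,
    "max": max,
    "range": lambda values: max(values) - min(values),
}


def apply_functionals(contours):
    """Apply every functional to every low-level descriptor contour."""
    features = {}
    for lld_name, values in contours.items():
        for func_name, func in FUNCTIONALS.items():
            features[f"{lld_name}_{func_name}"] = func(values)
    return features


# Made-up per-frame values for two low-level descriptors.
contours = {
    "pitch_hz": [190.0, 200.0, 210.0, 220.0],
    "energy": [0.2, 0.4, 0.4, 0.2],
}
features = apply_functionals(contours)
print(len(features), "features")
for name, value in features.items():
    print(name, value)
```

Here two descriptors and five functionals yield ten features per utterance. Large feature sets arise the same way, with many more descriptors and functionals, and typically with additional derived contours such as frame-to-frame deltas.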


What are the main challenges of sentiment analysis and emotion AI?

There are several challenges that emotion AI developers still need to overcome.

This is a common notion in machine learning consulting now: the success of emotion AI ‘education’ will always depend on the quality of the input data. Bigger, better, and cleaner datasets are necessary to avoid “garbage in, garbage out” situations, such as those caused by the following challenges:

  • Text sentiment analysis challenges: inability to detect double meaning, jokes, and innuendos; inability to account for regional variations of language and non-native speech structures.

Example: sarcasm in written speech can be hard for emotion AI to process, which can result in a skewed understanding of meaning and intent. While social media are often the sources for opinion and intent mining by machine learning algorithms, the language there is admittedly specific and not necessarily a truthful representation of real-life speech. The infamous cases of ‘AI chatbot becomes a racist bigot after a day on Twitter’ and the like are comical yet still common.

  • Visual sentiment analysis challenges: inability to distinguish between genuine and forced or exaggerated emotional expressions; not accounting for body language; problems with processing concepts and abstract imagery.

Example: one of the obvious uses of emotion and sentiment analysis that comes to mind is in security and defense applications, for example, visual lie-detectors. So far, the problem associated with this level of algorithmic perceptiveness lies in understanding genuine emotions, or the lack thereof. While there are recent successful studies and developments aimed at spotting real vs. fake facial expressions, those are still relatively small-scale and extremely narrow (for example, concerning only smiles).

  • Audio analysis challenges: not accounting for various accents, regional speech patterns, personal idiosyncrasies of pronunciation, and so on.

Example: many non-native speakers retain accents when speaking a second language. Among other things, accents can manifest in carried-over tonality shifts and in variations of speed and pausing that are not characteristic of native speakers. Those will have to be specifically accounted for, since otherwise these shifts may lead to misunderstanding of sentiment and intent. The same can be true, though to a lesser degree, in the case of native speakers’ regional accents within the same language.

Those are the main issues to be overcome on the way to better chatbots, smart assistants, robot-guides in home and business settings, and ultimately to the self-aware, empathic and truly understanding AI.

Bio: Veronika Vartanova is a Mobility Researcher at Itransition, a software development company based in Denver, CO. She writes on the latest trends in mobile app development, AR and VR business integration, and mobile-first digital transformation.