5 Things You Need to Know about Sentiment Analysis and Classification
We take a look at the important things you need to know about sentiment analysis, including social media, classification, evaluation metrics and how to visualise the results.
By Symeon Symeonidis, Democritus University of Thrace
In recent years, Sentiment Analysis has become a hot topic of scientific and market research in the fields of Natural Language Processing (NLP) and Machine Learning. Below are 5 useful things you need to know about Sentiment Analysis, covering Social Media, Datasets, Machine Learning, Visualizations, and the Evaluation Methods applied by researchers and market experts. Let’s get started!
1. Social Media are the main resource
Sentiment Analysis studies texts, such as posts and reviews, that users upload to microblogging platforms, forums, and e-commerce sites, in order to determine the opinions they express about a product, service, event, person, or idea.
Figure 1. 3-Classes Sentiment Analysis 
The most common use of Sentiment Analysis is classifying a text into a class. Depending on the dataset and the goal, Sentiment Classification can be a binary (positive or negative) or a multi-class (3 or more classes) problem.
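To make the binary versus multi-class distinction concrete, here is a minimal sketch that maps a continuous polarity score onto either two or three classes. It assumes a score in roughly [-1, 1]; the `neutral_band` threshold of 0.05 is a hypothetical choice, not a standard value.

```python
def score_to_binary(score: float) -> str:
    """Binary scheme: every text is either positive or negative."""
    return "positive" if score >= 0 else "negative"


def score_to_label(score: float, neutral_band: float = 0.05) -> str:
    """3-class scheme: scores close to zero are treated as neutral.

    `neutral_band` is a hypothetical threshold chosen for illustration.
    """
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


for s in (0.6, 0.01, -0.3):
    print(s, score_to_binary(s), score_to_label(s))
```

Note how a weakly positive score such as 0.01 is forced into "positive" by the binary scheme but becomes "neutral" in the 3-class one.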
In addition, researchers and stakeholders hold views on the relation between emotion detection and sentiment analysis that range from similar to completely different, depending on their perspective. However, regardless of the result or approach, they all adopt the same techniques.
2. Before starting the Sentiment Analysis
Many labeled sentiment datasets and evaluation benchmarks have been created, especially for Twitter posts and Amazon product reviews.
The most popular and widely used are:
- Stanford Twitter Sentiment
- Sentiment Strength Twitter Dataset
- Amazon Reviews for Sentiment Analysis
- Large Movie Review Dataset
- Sanders Corpus
- SemEval (Semantic Evaluation) dataset
Alternatively, anyone can crawl and collect their own data through the APIs that many platforms and forums provide. The best known is the Twitter API.
An initial step in text and sentiment classification is pre-processing. A number of techniques are applied to the data in order to reduce textual noise, reduce dimensionality, and help improve classification effectiveness. The most popular techniques include:
- Remove numbers
- Part-of-speech tagging
- Remove punctuation
- Remove stopwords
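A minimal sketch of three of the pre-processing steps above (removing numbers, punctuation, and stopwords), using only the Python standard library. The stopword list is a tiny hypothetical one for illustration; real pipelines usually rely on a fuller list such as those shipped with NLTK or spaCy. Part-of-speech tagging needs a trained tagger (for example NLTK's `pos_tag`) and is left out here.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "this", "that",
             "and", "or", "of", "to", "it"}


def preprocess(text: str) -> list[str]:
    """Lowercase, strip numbers and punctuation, then drop stopwords."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)  # remove numbers
    # remove ASCII punctuation characters
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOPWORDS]


print(preprocess("This phone is GREAT, 10/10!!"))  # ['phone', 'great']
```

Stripping punctuation also turns contractions like "don't" into "dont", which is one reason the order and choice of steps should be tuned to the dataset.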
3. How to classify Sentiment?
The machine-learning approach employs a learning algorithm and a diverse set of features to construct a classifier that can identify text expressing sentiment. Nowadays, deep-learning methods are popular because they learn representations directly from the data.
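As an illustration of the machine-learning approach, the sketch below trains a multinomial Naive Bayes classifier on bag-of-words features. Naive Bayes is our own choice of a classic, easy-to-read learner, not necessarily what any particular study uses, and the four training sentences are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, doc):
    """Return the class with the highest log-posterior for `doc`."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(class)
        n_words = sum(word_counts[label].values())
        for tok in doc.lower().split():
            if tok not in vocab:
                continue  # skip words never seen during training
            # log P(word | class) with Laplace (add-one) smoothing
            score += math.log((word_counts[label][tok] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data.
docs = ["great phone love it", "awful battery hate it",
        "love the screen", "hate the noise terrible"]
labels = ["positive", "negative", "positive", "negative"]
model = train_naive_bayes(docs, labels)
print(predict(model, "love this great screen"))  # positive
print(predict(model, "hate the awful noise"))    # negative
```

In practice you would train on one of the labeled datasets listed earlier and, for deep-learning models, replace the word counts with learned representations.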
The lexicon-based approach uses a list of words annotated with polarity scores to determine the overall sentiment score of a given text. Its strongest asset is that it does not require any training data, while its weakest point is that many words and expressions are not included in sentiment lexicons.
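A minimal sketch of lexicon-based scoring: average the polarity of every lexicon word found in the text. The six-word lexicon and the one-word negation rule are hypothetical simplifications; published lexicons contain thousands of entries and richer rules.

```python
# Hypothetical mini-lexicon: word -> polarity score in [-1, 1].
LEXICON = {"good": 0.6, "great": 0.8, "love": 0.9,
           "bad": -0.6, "awful": -0.9, "hate": -0.9}
# Simple heuristic: a negation word directly before a lexicon word flips it.
NEGATIONS = {"not", "never", "no"}


def lexicon_score(text: str) -> float:
    """Average polarity of lexicon words in `text`; 0.0 if none are found."""
    tokens = text.lower().split()
    total, hits = 0.0, 0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            polarity = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATIONS:
                polarity = -polarity
            total += polarity
            hits += 1
    return total / hits if hits else 0.0


print(lexicon_score("the food was great but the service was bad"))
print(lexicon_score("not good"))                # -0.6
print(lexicon_score("decent value for money"))  # 0.0
```

The last call shows the weakness described above: "decent" is not in the lexicon, so the text gets no sentiment at all, even though no training data was needed.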
The combination of machine-learning and lexicon-based approaches is called the hybrid approach. Though less commonly used, it usually produces more promising results than either approach on its own.
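One simple way to build a hybrid, sketched below under our own assumptions, is score fusion: add the evidence learned from labeled data (smoothed per-word log-odds) to the evidence from a lexicon, weighted by a hypothetical parameter `alpha`. Other hybrid designs exist, for example feeding lexicon scores to a classifier as extra features; the lexicon and training sentences here are toy data.

```python
import math
from collections import Counter

# Hypothetical mini-lexicon (word -> polarity); a real system would use a
# published sentiment lexicon instead.
LEXICON = {"good": 1.0, "great": 1.0, "love": 1.0,
           "bad": -1.0, "awful": -1.0, "hate": -1.0}


def train_log_odds(docs, labels):
    """Learn per-word log-odds of 'positive' vs 'negative' (add-one smoothing)."""
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (pos if label == "positive" else neg).update(doc.lower().split())
    vocab = set(pos) | set(neg)
    n_pos, n_neg, v = sum(pos.values()), sum(neg.values()), len(vocab)
    return {
        w: math.log((pos[w] + 1) / (n_pos + v)) - math.log((neg[w] + 1) / (n_neg + v))
        for w in vocab
    }


def hybrid_predict(log_odds, text, alpha=0.5):
    """Fuse learned and lexicon evidence; alpha=1.0 uses the learned part only."""
    tokens = text.lower().split()
    learned = sum(log_odds.get(t, 0.0) for t in tokens)  # 0 for unseen words
    lexical = sum(LEXICON.get(t, 0.0) for t in tokens)   # 0 for words not in lexicon
    score = alpha * learned + (1 - alpha) * lexical
    return "positive" if score >= 0 else "negative"


# Hypothetical toy training data.
docs = ["love the screen", "great screen", "hate the battery", "battery died fast"]
labels = ["positive", "positive", "negative", "negative"]
model = train_log_odds(docs, labels)
print(hybrid_predict(model, "love the screen"))  # positive
print(hybrid_predict(model, "awful service"))    # negative
```

Neither "awful" nor "service" appears in the training data, so the learned part alone has no evidence and falls back to its default label; the lexicon part supplies the missing signal.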
Figure 2. Sentiment classification techniques 
4. Evaluation metrics
As a classification problem, Sentiment Analysis is evaluated with Precision, Recall, F-score, and Accuracy. Averaged measures such as macro, micro, and weighted F1-scores are also useful for multi-class problems. The most appropriate metric depends on the class balance of the dataset: on imbalanced data, accuracy alone can be misleading.
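The sketch below computes these metrics from scratch so the definitions are visible; libraries such as scikit-learn offer the same through `sklearn.metrics.f1_score` with its `average` parameter. The predictions are a hypothetical, deliberately imbalanced example in which a classifier always answers "positive".

```python
from collections import Counter


def sentiment_metrics(y_true, y_pred):
    """Per-class precision/recall/F1 plus accuracy and averaged F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true class t
    support = Counter(y_true)

    def safe_div(a, b):
        return a / b if b else 0.0

    per_class = {}
    for c in labels:
        precision = safe_div(tp[c], tp[c] + fp[c])
        recall = safe_div(tp[c], tp[c] + fn[c])
        f1 = safe_div(2 * precision * recall, precision + recall)
        per_class[c] = (precision, recall, f1)

    n = len(y_true)
    macro_f1 = sum(f for _, _, f in per_class.values()) / len(labels)
    weighted_f1 = sum(per_class[c][2] * support[c] for c in labels) / n
    # Micro-averaging pools TP/FP/FN over all classes before computing F1.
    micro_p = safe_div(sum(tp.values()), sum(tp.values()) + sum(fp.values()))
    micro_r = safe_div(sum(tp.values()), sum(tp.values()) + sum(fn.values()))
    micro_f1 = safe_div(2 * micro_p * micro_r, micro_p + micro_r)
    summary = {"accuracy": sum(tp.values()) / n, "macro_f1": macro_f1,
               "micro_f1": micro_f1, "weighted_f1": weighted_f1}
    return per_class, summary


y_true = ["positive"] * 4 + ["negative", "neutral"]
y_pred = ["positive"] * 6
per_class, summary = sentiment_metrics(y_true, y_pred)
print(summary)
```

Here accuracy is about 0.67 while macro F1 is only about 0.27, because the two minority classes are never predicted. For single-label multi-class problems, micro F1 equals accuracy.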
Figure 3. Steps to Evaluate Sentiment Analysis
5. Visualise Results
To visualize the results of Sentiment Analysis, many people use well-known techniques such as graphs, histograms, and confusion matrices. Because Sentiment Analysis is applied across multiple data domains and tasks, visualization approaches such as word clouds, interactive maps, and sparkline-style plots are also very popular.
Figure 4. Sentiment Word Cloud 
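A confusion matrix is usually drawn as a heatmap with a plotting library; the minimal sketch below only computes the counts (rows are true classes, columns are predicted classes) and prints them as a plain-text table. The labels and predictions are hypothetical; scikit-learn provides an equivalent `sklearn.metrics.confusion_matrix`.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows = true class, columns = predicted class, cells = counts."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def format_matrix(matrix, labels):
    """Render the matrix as an aligned plain-text table."""
    width = max(len(lab) for lab in labels) + 2
    header = " " * width + "".join(f"{lab:>{width}}" for lab in labels)
    rows = [f"{t:<{width}}" + "".join(f"{n:>{width}}" for n in row)
            for t, row in zip(labels, matrix)]
    return "\n".join([header] + rows)


labels = ["negative", "neutral", "positive"]
y_true = ["positive", "positive", "negative", "neutral", "negative"]
y_pred = ["positive", "neutral", "negative", "neutral", "positive"]
matrix = confusion_matrix(y_true, y_pred, labels)
print(format_matrix(matrix, labels))
```

The diagonal holds correct predictions; off-diagonal cells show which classes are confused, for example negative texts predicted as positive.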
To dive deeper into the fascinating world of Sentiment Analysis, we recommend the following posts from KDnuggets:
- SlangSD: A Sentiment Dictionary for Slang Words
- Mining Twitter Data with Python Part 6: Sentiment Analysis Basics
- Political Data Science: Analyzing Trump, Clinton, and Sanders Tweets and Sentiment
- Sentiment Analysis & Predictive Analytics for trading. Avoid this systematic mistake
- Tutorial: Building a Twitter Sentiment Analysis Process
Bio: Symeon Symeonidis is a PhD candidate in the area of intention and sentiment mining, at Democritus University of Thrace.