SentimentBuilder: Visual Analysis of Unstructured Texts

Sankey diagrams are mainly used to visualize energy flows, material flows and trade-offs. SentimentBuilder has found a way to apply them to unstructured text in its online NLP tool.

By Rob Potschka

When you look at words on paper, or even a continuous stream of them, can you determine what they mean? Sure you can, for a few sentences. But can you work out what they mean when you have hundreds or thousands of conversations, complaints, emails or reviews of your product or service?

Currently there is only one way to analyze unstructured texts in bulk…we use a machine.
Online Natural Language Processing is the current buzz phrase for tools that let us split texts into Sentences, Words, Themes and Subjects online. When the Parts of Speech are organized into a table, we get a glimpse of the intelligence in the texts.
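
To make the "split and tabulate" idea concrete, here is a minimal sketch in Python. It is not how the online tool works internally; the function names (split_sentences, tokenize, word_table) are made up for illustration, and the regex-based splitting is a deliberately naive stand-in for a real NLP parser.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercased word tokens; an apostrophe suffix such as 's stays attached."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", sentence.lower())


def word_table(text):
    """Rows of (word, number of sentences containing it), most frequent first."""
    counts = Counter()
    for sentence in split_sentences(text):
        # set() so a word repeated inside one sentence is counted once
        counts.update(set(tokenize(sentence)))
    return sorted(counts.items(), key=lambda row: (-row[1], row[0]))
```

Calling word_table on a batch of reviews gives the kind of table described above: one row per word with the number of sentences it appears in. A real tool would add part-of-speech tags and sentiment to each row.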

Most data scientists prefer to see things visually, but can we convert unstructured texts into an image?  And even if we could, would it show us the importance of a stream of words or relationships between them?

We can! The technique has been around for more than a hundred years. Large energy-producing companies have used it to show the flow of energy from production type to output, along with the volume produced. And now it can be used for Unstructured Texts!



Some of you know what this is, but I am sure none of you have ever seen it used in this fashion…This is a Sankey Flow Report, also known as a Sankey Diagram.

The data behind the visualization are customer reviews for a hotel. In our example the unstructured texts from customers have been parsed and organized into views; in this case we see the top keywords, each with a Sentiment identifier.

The weight, or thickness, of a Band indicates the importance of the Part of Speech, in this case Nouns. And from a quick glance at the Nodes (the rectangles) we can read off how often a Noun occurs in sentences (the noun hotel was found in 89 sentences).

In this vertical view of the diagram we can quickly identify the data relationships…For example there are 25 sentences with a Positive Sentiment that contain the word hotel.
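
For readers who want to see where the numbers on the Nodes and Bands come from, here is a small Python sketch. It assumes each sentence has already been reduced to a (sentiment, nouns) pair by some parser; that input format, the function name sankey_links and the source/target/value shape of the output are all assumptions for illustration, chosen because many Sankey charting libraries accept something similar.

```python
from collections import Counter


def sankey_links(records):
    """Build Sankey data from (sentiment, nouns) pairs, one pair per sentence.

    Returns (node_counts, links):
      node_counts[noun] -> number of sentences containing that noun
      links             -> list of {"source", "target", "value"} dicts, where
                           value is the number of sentences with that
                           sentiment that contain the noun (the band width)
    """
    node_counts = Counter()
    link_counts = Counter()
    for sentiment, nouns in records:
        for noun in set(nouns):  # count each noun once per sentence
            node_counts[noun] += 1
            link_counts[(sentiment, noun)] += 1
    ordered = sorted(link_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    links = [{"source": s, "target": n, "value": v} for (s, n), v in ordered]
    return node_counts, links
```

With the hotel data, a link such as Positive to hotel with a value of 25 would be drawn as a band 25 sentences thick, and the hotel node would carry the total across all sentiments.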

Let’s have a look at what I call the Top 3 unstructured text combinations and why they are important…

1. Comparative Adjective! An Adjective Noun combination will tell you:

  • What kind is it?
  • How many are there?
  • Which one is it?

As you can see in this example, the Online Natural Language Processing of customer hotel reviews shows us that we have Excellent Rooms, Food and Customer Service. If we were to include many more records in the diagram, we would gain additional insight. This type of information can help you identify areas that require attention and things that are running smoothly.
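
If you want to try the Adjective Noun idea yourself, the following Python sketch counts adjective-noun pairs. It assumes the sentences have already been run through a part-of-speech tagger that emits Penn Treebank style tags (JJ for adjectives, NN for nouns); the input format and the function name adjective_noun_pairs are hypothetical, not part of any particular tool.

```python
from collections import Counter


def adjective_noun_pairs(tagged_sentences):
    """Count (adjective, noun) pairs where the noun directly follows the adjective.

    tagged_sentences: list of sentences, each a list of (word, tag) tuples.
    Tags starting with "JJ" are treated as adjectives and tags starting
    with "NN" as nouns (Penn Treebank convention, assumed here).
    """
    pairs = Counter()
    for sentence in tagged_sentences:
        # zip(sentence, sentence[1:]) walks over adjacent token pairs
        for (word, tag), (next_word, next_tag) in zip(sentence, sentence[1:]):
            if tag.startswith("JJ") and next_tag.startswith("NN"):
                pairs[(word.lower(), next_word.lower())] += 1
    return pairs
```

Each pair and its count can then be fed into the diagram as a band from the adjective to the noun.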


2. Personal Pronouns! A Pronoun followed by the next keyword, plus the next five tokens for context, helps you identify personal subjects. In our example below we have 5 sentences with a negative tone. I’m sure that if we pulled all the ‘he called’ and ‘I called’ combinations from the data, we would get an idea of the room service issues.
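
Here is a rough Python sketch of the pronoun, keyword and five-token window. Two simplifications are assumed: the "next keyword" is taken to be simply the token right after the pronoun, and the pronoun list is a short hand-picked set. The names PRONOUNS and pronoun_contexts are made up for this example.

```python
import re

# A small, hand-picked set of subject pronouns (an assumption, not a full list)
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}


def pronoun_contexts(sentence, window=5):
    """Return one record per pronoun that has a token after it.

    Each record holds the pronoun, the token that follows it (treated as
    the keyword) and up to `window` further tokens for context.
    """
    tokens = re.findall(r"[A-Za-z0-9']+", sentence)
    results = []
    # tokens[:-1] skips a pronoun in last position, which has no keyword
    for i, token in enumerate(tokens[:-1]):
        if token.lower() in PRONOUNS:
            results.append({
                "pronoun": token.lower(),
                "keyword": tokens[i + 1].lower(),
                "context": tokens[i + 2 : i + 2 + window],
            })
    return results
```

Grouping the records by pronoun and keyword (for example "i" and "called") gives the bands, and the context tokens tell you what each call was about.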


3. Spatial Identifiers! A preposition links nouns, pronouns and phrases to other words in a sentence. The word or phrase that the preposition introduces usually indicates the temporal, spatial or logical relationship of its object to the rest of the sentence.

In our example below we can see that a specific area of the hotel has a time lag issue…there are customers who had to wait for service at the front desk, some waiting 30 minutes to check in. I am sure you would all agree that waiting 30 minutes for anything is too long. As with the other visualizations, if we include more Spatial records in the Sankey diagram, we gain greater insight.
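
As a last sketch, this is one way to pull out the preposition and the words it introduces, so that repeated locations such as the front desk stand out. The preposition list is a tiny hand-picked sample and the three-token limit on the object is arbitrary; both are assumptions, as are the names PREPOSITIONS and prepositional_phrases.

```python
import re
from collections import Counter

# A tiny sample of spatial prepositions (an assumption, far from complete)
PREPOSITIONS = {"at", "in", "on", "near", "by", "from"}


def prepositional_phrases(sentences, max_object_tokens=3):
    """Count phrases made of a preposition plus up to N following tokens."""
    phrases = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z0-9']+", sentence.lower())
        for i, token in enumerate(tokens):
            following = tokens[i + 1 : i + 1 + max_object_tokens]
            # skip a preposition at the end of a sentence: it has no object
            if token in PREPOSITIONS and following:
                phrases[" ".join([token] + following)] += 1
    return phrases
```

A phrase that keeps turning up in negative sentences, such as "at the front desk", is a good candidate for a closer look.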



Well, I hope you enjoyed this post, which shows how unstructured texts can be viewed visually and why various part-of-speech combinations matter. For more insight into your unstructured texts, visit my site to try a free Online Natural Language Processing tool.


Bio: Rob Potschka  is the owner and lead developer behind which appears on Wikipedia as the first web based tool for building Sankey diagrams, and the owner and developer behind which outputs intelligence from unstructured texts.