NLP Insights for the Penguin Café Orchestra


This post shows how to use expert.ai and Python to explore the reviews of a favorite band's albums.



Sponsored Post.

By Laura Gorrieri, expert.ai

Please find the notebook version of this thread here.

Let's build a small application to investigate one of my favourite artists, The Penguin Café Orchestra. If you don't know them, you are about to find out what they are about.

Our dataset: a set of reviews of their albums, taken from Piero Scaruffi's website and saved in a dedicated folder.

Our goal: to understand more about an artist using album reviews.

Our practical goal: to see how expert.ai’s NL API works and what it can do.

What is The Penguin Café Orchestra about?

First, let's see what emerges from the reviews just by analysing the words used in them. We start by concatenating all the album reviews into one variable, so that we have a single review for the whole artist. Then we look at the most frequent words in it, hoping they will reveal more about The Penguin Café Orchestra.

## Code for iterating over the artist's folder and concatenating the albums' reviews into one single artist's review
import os

artist_review = ''
artist_path = 'penguin_cafe_orchestra'
albums = os.listdir(artist_path)

for album in albums:
    album_path = os.path.join(artist_path, album)
    with open(album_path, 'r', encoding='utf8') as file:
        review = file.read()
        artist_review += review
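
If you also want to keep the reviews separate (handy for the per-album sentiment analysis later in this post), a variant along these lines collects them in a dictionary keyed by album title. This is only a sketch: the helper names `load_reviews` and `concatenate_reviews` are mine, and it assumes every review is a UTF-8 `.txt` file sitting directly in the artist folder.

```python
from pathlib import Path


def load_reviews(artist_path):
    """Return {album title: review text} for every .txt file in artist_path."""
    reviews = {}
    # sorted() makes the order deterministic; directory listings give no such guarantee
    for review_file in sorted(Path(artist_path).glob("*.txt")):
        # .stem drops the extension, e.g. "Signs of Life.txt" -> "Signs of Life"
        reviews[review_file.stem] = review_file.read_text(encoding="utf8")
    return reviews


def concatenate_reviews(reviews):
    """Join all reviews into one artist-level text, separated by newlines."""
    return "\n".join(reviews.values())
```

Using `.stem` instead of slicing off the last four characters avoids surprises if a file ever has a longer or shorter extension.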

Using a shallow-linguistics approach, we can investigate the artist review, which contains all the available reviews. To do so, we use matplotlib and the wordcloud package to produce a word cloud that tells us more about the most frequent words in the text.

 
# Import packages

import matplotlib.pyplot as plt
%matplotlib inline

# Define a function to plot word cloud
def plot_cloud(wordcloud):
    # Set figure size
    plt.figure(figsize=(30, 10))
    # Display image
    plt.imshow(wordcloud)
    # No axis details
    plt.axis("off");

# Import package
from wordcloud import WordCloud, STOPWORDS

# Generate word cloud
wordcloud = WordCloud(width = 3000, height = 2000, random_state=1, background_color='white', collocations=False, stopwords = STOPWORDS).generate(artist_review)

# Plot
plot_cloud(wordcloud)


Fig.1: A word cloud in which the most used words appear in a bigger font and the less used ones in a smaller font.
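
A word cloud is easy to scan but hard to quote numbers from. As a complement, the counts behind it can be approximated with the standard library alone. This is a rough sketch, not what the wordcloud package does internally: the function name `top_words` and the tiny stop-word set are illustrative assumptions (wordcloud's own `STOPWORDS` is much larger), so the counts will not match the cloud exactly.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list used only for this illustration
SAMPLE_STOPWORDS = {"the", "a", "an", "and", "of", "in", "is", "to", "with"}


def top_words(text, n=10, stopwords=SAMPLE_STOPWORDS):
    """Return the n most frequent words in text, ignoring case and stop words."""
    # Lowercase, then keep runs of letters (accented ones included), allowing one apostrophe
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())
    counts = Counter(word for word in words if word not in stopwords)
    return counts.most_common(n)


# Made-up sentence, not taken from the reviews
sample = "The piano and the violin meet the ukulele; the piano leads, the violin follows the piano."
print(top_words(sample, n=2))
```

On the real data you would call `top_words(artist_review)` instead of passing the sample sentence.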

How does their music make you feel?

Thanks to the word cloud, we know more about The Penguin Café Orchestra. We know that they use instruments such as the ukulele, piano and violin, and that they mix genres such as folk, ethnic, and classical.

Still, we have no idea of the artist's style. We can learn more by looking at which emotions come out of their work.

 

To do so, we are going to use expert.ai’s NL API. Please register here, find the documentation on the SDK here and on the features here.

### Install the Python SDK

!pip install expertai-nlapi

## Code for initializing the client and then using the emotional-traits taxonomy

import os

from expertai.nlapi.cloud.client import ExpertAiClient

# Set your credentials before creating the client
os.environ["EAI_USERNAME"] = 'your_username'
os.environ["EAI_PASSWORD"] = 'your_password'

client = ExpertAiClient()

emotions =[]
weights = []

output = client.classification(body={"document": {"text": artist_review}}, params={'taxonomy': 'emotional-traits', 'language': 'en'})

for category in output.categories:
    emotion = category.label
    weight = category.frequency
    emotions.append(emotion)
    weights.append(weight)

print(emotions)
print(weights)


['Happiness', 'Excitement', 'Joy', 'Amusement', 'Love']
[15.86, 31.73, 15.86, 31.73, 4.76]

To retrieve the weights we used the “frequency” attribute, which is a percentage: the frequencies of all the emotions sum to 100, up to rounding. This makes them a good candidate for a pie chart, which we plot using matplotlib.

# Import libraries

from matplotlib import pyplot as plt

# Creating plot
colors = ['#0081a7','#2a9d8f','#e9c46a','#f4a261', '#e76f51']
fig = plt.figure(figsize =(10, 7))
plt.pie(weights, labels = emotions, colors=colors, autopct='%1.1f%%')

# show plot
plt.show()

Fig.2: A pie chart representing each emotion and its percentage.
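
As a quick sanity check on the idea that the frequencies behave like percentages, the values printed above can be summed and ranked without calling the API again. This is a small sketch that hard-codes the output shown earlier; the variable names `total` and `ranked` are mine.

```python
# Values copied from the output printed above
emotions = ['Happiness', 'Excitement', 'Joy', 'Amusement', 'Love']
weights = [15.86, 31.73, 15.86, 31.73, 4.76]

# The frequencies are percentages, so they should add up to roughly 100
total = sum(weights)
print(f"Total: {total:.2f}")

# Sort emotions from most to least frequent (ties keep their original order)
ranked = sorted(zip(emotions, weights), key=lambda pair: pair[1], reverse=True)
for emotion, weight in ranked:
    print(f"{emotion}: {weight}")
```

The total comes out just under 100 because each value is rounded to two decimals, and Excitement and Amusement share the top spot.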

What's their best album?

If you wanted to start listening to them, to see whether you feel the same emotions that Scaruffi found in their work, where should you start? We can look at the sentiment analysis of each album's review to get an idea of their best albums. To do so, we iterate over the reviews and use the expert.ai NL API to retrieve the sentiment of each one and its strength.

## Code for iterating on each album and retrieving the sentiment

sentiment_ratings = []
albums_names = [album[:-4] for album in albums]

for album in albums:
    album_path = os.path.join(artist_path, album)
    with open(album_path, 'r', encoding='utf8') as file:
        review = file.read()
    # Ask the NL API for the sentiment of this album's review
    output = client.specific_resource_analysis(
        body={"document": {"text": review}},
        params={'language': 'en', 'resource': 'sentiment'})
    sentiment = output.sentiment.overall
    sentiment_ratings.append(sentiment)

print(albums_names)
print(sentiment_ratings)

['Broadcasting From Home', 'Concert Program', 'Music From the Penguin Cafe', 'Signs of Life']
[11.6, 2.7, 10.89, 3.9]

 

Now we can visualize the sentiment of each review using a bar chart. This gives us quick visual feedback on the best album of The Penguin Café Orchestra, and on their career. To do so, we use matplotlib once again.

import matplotlib.pyplot as plt
plt.style.use('ggplot')

albums_names = [name[:-4] for name in albums]

plt.bar(albums_names, sentiment_ratings, color='#70A0AF')
plt.ylabel("Album rating")
plt.title("Ratings of Penguin Cafe Orchestra's albums")
plt.xticks(albums_names, rotation=70)
plt.show()

Fig.3: A bar chart showing the sentiment rating of each album.
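
To answer the question in this section's title directly, the ratings can also be sorted instead of read off the chart. This is a minimal sketch that hard-codes the output printed above rather than calling the API again; the names `ranking`, `best_album` and `best_rating` are mine.

```python
# Values copied from the output printed earlier in this section
albums_names = ['Broadcasting From Home', 'Concert Program',
                'Music From the Penguin Cafe', 'Signs of Life']
sentiment_ratings = [11.6, 2.7, 10.89, 3.9]

# Pair each album with its rating and sort from highest to lowest sentiment
ranking = sorted(zip(albums_names, sentiment_ratings),
                 key=lambda pair: pair[1], reverse=True)

for position, (name, rating) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {rating}")

best_album, best_rating = ranking[0]
```

With these numbers, "Broadcasting From Home" comes first, closely followed by "Music From the Penguin Cafe". Keep in mind that this ranks how positive each review's language is, which is only a proxy for how good the album is.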

Originally posted here.