KDnuggets, Opinions, September 2018 (18:n35)

The Data Science of “Someone Like You” or Sentiment Analysis of Adele’s Songs


An extensive analysis of Adele's songs using Natural Language Processing (NLP) on the lyrics, to uncover the underlying emotions and sentiments.



By Preetish Panda, Prompt Cloud.


Adele is one of the most famous contemporary singers and undoubtedly one of the few artists loved by people of all ages. Her album 21 was a tremendous success and earned numerous entries in the Guinness Book of World Records. Adele was also the first female artist to have two albums in the top five of the Billboard 200 and two singles in the top five of the Billboard Hot 100 at the same time. As a lead artist, she was the first woman in the history of the Billboard Hot 100 to have three simultaneous top 10 singles: "Rolling in the Deep", "Someone Like You" and "Set Fire to the Rain".

Enough said about her singing prowess! What's fascinating for us is how easily her lyrics can be obtained through the Genius API and her songs' audio features through Spotify's API, and the potential to analyze them. In this study, I perform sentiment analysis on her three studio albums: 19, 21 and 25.

Data set

The Genius.com API allowed me to download the lyrics and associated data points. In total, the data set consists of 41 tracks with the following fields:

  • artist name
  • album name
  • track title
  • track number
  • lyric text
  • line number

Goal

The goal of this study is to apply text mining techniques, specifically Natural Language Processing (NLP), to the lyrics to uncover the underlying emotions and sentiments. These are the specific questions we'll answer:

  • Which words does she use most frequently?
  • Which positive and negative words are prevalent in her songs?
  • What are the overall sentiments expressed in her songs?
  • Which words are most often used to express each emotion?
  • Which songs are associated with each emotion?
  • What type of relationship exists between the emotions and the albums?
  • How does the valence score vary across her albums? (using the Spotify API)

Most frequent words

Let's begin by loading the required packages and downloading the lyrics of the three albums with geniusR. After that, we clean the text and create a word cloud to get an overview of word frequencies.

 
library(geniusR)
library(magrittr)
library(purrr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(tm)
library(wordcloud)
library(syuzhet)
library(circlize)
library(tidytext)
library(reshape2)

albums <-  tibble(
  artist = c(
    rep("Adele", 3)
  ),
  album = c(
    "19", "21","25"
  )
)

album_lyrics <- albums %>% 
  mutate(tracks = map2(artist, album, genius_album))

lyrics <- album_lyrics %>% 
  unnest(tracks) %>%    
  arrange(desc(artist))

lyrics <- as.data.frame(lyrics)

lyrics_text <- lyrics$lyric
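# Removing punctuation
lyrics_text<- gsub('[[:punct:]]+', '', lyrics_text)
# Collapsing runs of three or more repeated letters (e.g. "ooooh" -> "oh");
# inside an R string the back-reference must be escaped as \\1
lyrics_text<- gsub("([[:alpha:]])\\1{2,}", "\\1", lyrics_text, perl = TRUE)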
#creating a text corpus
docs <- Corpus(VectorSource(lyrics_text))
# Converting the text to lowercase
docs <- tm_map(docs, content_transformer(tolower))
# Removing english common stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))
# creating term document matrix 
tdm <- TermDocumentMatrix(docs)
# defining tdm as matrix
m <- as.matrix(tdm)
# getting word counts in decreasing order
word_freqs <- sort(rowSums(m), decreasing=TRUE)
# creating a data frame with words and their frequencies
lyrics_wc_df <- data.frame(word=names(word_freqs), freq=word_freqs)

lyrics_wc_df <- lyrics_wc_df[1:300,]

# plotting word cloud

set.seed(1234)
wordcloud(words = lyrics_wc_df$word, freq = lyrics_wc_df$freq, 
          min.freq = 1,scale=c(1.8,.5),
          max.words=200, random.order=FALSE, rot.per=0.15, 
          colors=brewer.pal(8, "Dark2"))


This gives us the following visualization:
Fig 1: Word cloud

We can see that ‘love’ is the most frequently used word, and the other high-frequency words are ‘like’, ‘never’, ‘say’, ‘heart’ and ‘ill’ (which is “I’ll” with its apostrophe stripped during cleaning). This makes sense, as many of her songs are about love and heartbreak.
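The cleaning and counting steps can be reproduced in miniature with base R. This is a minimal sketch on three made-up lyric lines (hypothetical input, not from the data set), with a tiny hand-picked stopword list standing in for tm's `stopwords("english")`; both are assumptions for illustration. It shows the escaped back-reference needed to collapse elongated words and why ‘ill’ survives stopword removal: once punctuation is stripped, "I'll" no longer matches the stopword "i'll".

```r
# Hypothetical lyric lines, made up for illustration (not from the data set)
lines <- c("I'll never love like this again",
           "Never mind, I'll find someone like you",
           "Sooooo in love, love, love")

# Toy stopword list: an assumed stand-in for tm's stopwords("english")
stop_words <- c("i'll", "this", "again", "in", "you")

clean <- gsub("[[:punct:]]+", "", lines)                          # "I'll" -> "Ill"
clean <- gsub("([[:alpha:]])\\1{2,}", "\\1", clean, perl = TRUE)  # "Sooooo" -> "So"
clean <- tolower(clean)

# Tokenize on whitespace and drop stopwords; "ill" survives because it
# no longer matches "i'll"
tokens <- unlist(strsplit(clean, "[[:space:]]+"))
tokens <- tokens[tokens != "" & !(tokens %in% stop_words)]

# Word frequencies in decreasing order, analogous to rowSums() over the TDM
word_freqs <- sort(table(tokens), decreasing = TRUE)
freq_df <- data.frame(word = names(word_freqs), freq = as.integer(word_freqs))
print(head(freq_df))
```

The term-document matrix in the article does the same counting at scale; the sketch just makes each transformation visible on a handful of lines.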

Positive and negative words in the songs

This will be another word cloud, but this time we're going to separate the words by positive and negative polarity. We use the Bing lexicon, loaded through tidytext's get_sentiments("bing"), for this visualization.

 
lyrics$lyric <- as.character(lyrics$lyric)

tidy_lyrics <- lyrics %>% 
  unnest_tokens(word,lyric)

set.seed(1234)
tidy_lyrics %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("#F8766D", "#00BFC4"), max.words = 250)


The resulting visualization shows that the predominant positive words are ‘love’, ‘like’, ‘better’, ‘right’, ‘free’ and ‘wonders’, and the predominant negative words are ‘rumours’, ‘miss’, ‘fall’, ‘fool’, ‘crazy’, ‘pain’, ‘hurts’, ‘wrong’ and ‘tired’.

Fig 2: Negative and positive words
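The comparison cloud rests on a simple lexicon lookup: keep only the tokens found in a polarity lexicon (an inner join) and count each word under its label. Below is a minimal base-R sketch with made-up tokens and a toy five-word lexicon; the lexicon is an assumption standing in for Bing, not an excerpt from it.

```r
# Hypothetical tokens, one word per row (the shape unnest_tokens() produces)
tokens <- data.frame(word = c("love", "miss", "love", "fool",
                              "free", "the", "pain", "love"))

# Toy polarity lexicon: an assumed five-word stand-in for the Bing lexicon
lexicon <- data.frame(
  word      = c("love", "free", "miss", "fool", "pain"),
  sentiment = c("positive", "positive", "negative", "negative", "negative")
)

# Inner join: tokens missing from the lexicon (here "the") are dropped
matched <- merge(tokens, lexicon, by = "word")

# Word x sentiment count matrix, the same shape acast() hands to comparison.cloud()
polarity_matrix <- table(matched$word, matched$sentiment)
print(polarity_matrix)

# Net polarity of the toy text: positive matches minus negative matches
net_polarity <- sum(matched$sentiment == "positive") -
  sum(matched$sentiment == "negative")
print(net_polarity)
```

Because one frequent word like ‘love’ is counted on every occurrence, a few repeated words can dominate the positive side of the cloud.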

Overall sentiments expressed

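This chart shows the cumulative score for the different emotions expressed in her songs. For this chart, we use the NRC lexicon through the syuzhet package's get_nrc_sentiment() function, which counts, for each lyric line, the words associated with eight emotions plus positive and negative sentiment. The code is given below: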

 
# Getting the NRC sentiment counts for each lyric line
ty_sentiment <- get_nrc_sentiment(lyrics_text)

# Data frame with the cumulative value of each sentiment
sentimentscores <- data.frame(colSums(ty_sentiment))

# Data frame with sentiment and score as columns
names(sentimentscores) <- "Score"
sentimentscores <- cbind("sentiment"=rownames(sentimentscores),sentimentscores)
rownames(sentimentscores) <- NULL

# Plot for the cumulative sentiments; theme_minimal() comes first because
# a complete theme added later would reset legend.position
ggplot(data=sentimentscores,aes(x=sentiment,y=Score))+
  geom_bar(aes(fill=sentiment),stat = "identity")+
  theme_minimal()+
  theme(legend.position="none")+
  xlab("Sentiments")+ylab("Scores")+
  ggtitle("Total sentiment based on scores")


The chart below shows that the positive score is higher than the negative score, which may result from the frequent use of ‘love’. We also see that ‘joy’, ‘anticipation’ and ‘sadness’ are the top three emotions.

Fig 3: Sentiment scores
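The bar heights are column sums of a lines-by-emotions count matrix. The following base-R sketch builds such a matrix from three made-up lines and a toy word-emotion lexicon (an assumed stand-in for NRC, not real NRC entries). It also shows how a single word carrying several labels, such as ‘love’ here, raises more than one bar at once.

```r
# Hypothetical lyric lines, already cleaned and lower-cased
lines <- c("love you like a song", "tears fall and i miss you", "love and hope")

# Toy word-emotion lexicon (an assumed stand-in for NRC);
# a single word may carry several labels
lexicon <- data.frame(
  word    = c("love", "love", "tears", "fall", "miss", "hope", "hope"),
  emotion = c("joy", "positive", "sadness", "sadness", "sadness",
              "anticipation", "positive"),
  stringsAsFactors = FALSE
)
emotions <- sort(unique(lexicon$emotion))

# One row per line, one column per emotion (the layout get_nrc_sentiment() returns)
scores <- t(vapply(strsplit(lines, " "), function(words) {
  # Each occurrence of a word adds one count to every label it carries
  hits <- merge(data.frame(word = words), lexicon, by = "word")$emotion
  as.vector(table(factor(hits, levels = emotions)))
}, integer(length(emotions))))
colnames(scores) <- emotions

# Cumulative score per emotion across all lines, as plotted in the bar chart
totals <- colSums(scores)
print(sort(totals, decreasing = TRUE))
```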

Top words used to describe different emotions

This chart shows the words most frequently associated with each emotion across all three albums. Run the following code to generate the chart.

 
# Total number of words in each track
song_wrd_count <- tidy_lyrics %>% count(track_title)

lyric_counts <- tidy_lyrics %>%
  left_join(song_wrd_count, by = "track_title") %>% 
  rename(total_words=n)

# Attaching NRC sentiment labels to each word
lyric_sentiment <- tidy_lyrics %>% 
  inner_join(get_sentiments("nrc"),by="word")

# Top 10 words per sentiment, ordered within each facet
lyric_sentiment %>% 
  count(word,sentiment,sort=TRUE) %>% 
  group_by(sentiment) %>%
  top_n(n=10, wt=n) %>% 
  ungroup() %>%
  ggplot(aes(x=reorder_within(word,n,sentiment),y=n,fill=sentiment)) +
  geom_col(show.legend = FALSE) + 
  scale_x_reordered() +
  facet_wrap(~sentiment,scales="free") +
  xlab("Words") + ylab("Count")+
  ggtitle("Top words used to express emotions and sentiments") +
  coord_flip()


The chart shows that, in terms of polarity, ‘love’, ‘baby’ and ‘young’ are the top three positive words, and ‘fall’, ‘words’ and ‘leave’ are the top negative words. As for emotions, sadness is primarily represented by words such as ‘fall’, ‘leave’ and ‘crazy’.

Fig 4: Top words for each sentiment
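The faceted chart boils down to counting each (word, sentiment) pair and keeping the most frequent words within each sentiment. Here is a base-R sketch on made-up matches; the data and the helper `top_n_per_group()` are hypothetical, not part of any package. Unlike dplyr's `top_n()`, which keeps tied rows and can return more than n per group, this version breaks ties alphabetically and returns at most n rows per sentiment.

```r
# Hypothetical (word, sentiment) matches, as produced by joining tokens with NRC
matched <- data.frame(
  word      = c("love", "love", "baby", "young", "love", "fall",
                "fall", "leave", "crazy", "fall", "leave"),
  sentiment = c("positive", "positive", "positive", "positive", "joy", "sadness",
                "negative", "sadness", "sadness", "sadness", "negative")
)

# Count every (word, sentiment) pair and drop the empty combinations
counts <- as.data.frame(table(word = matched$word, sentiment = matched$sentiment),
                        stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]

# Hypothetical helper: keep the n most frequent words within each sentiment,
# breaking ties alphabetically so each group has at most n rows
top_n_per_group <- function(df, n) {
  groups <- split(df, df$sentiment)
  do.call(rbind, lapply(groups, function(g) head(g[order(-g$Freq, g$word), ], n)))
}

top_words <- top_n_per_group(counts, 2)
print(top_words)
```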

