The Data Science of “Someone Like You” or Sentiment Analysis of Adele’s Songs
An extensive analysis of Adele's songs using Natural Language Processing (NLP) on the lyrics, to uncover the underlying emotions and sentiments.
By Preetish Panda, Prompt Cloud.
Adele is one of the most famous contemporary singers and undoubtedly one of the few artists loved by people of all ages. Her album 21 was a tremendous success and earned numerous mentions in the Guinness Book of World Records. Adele was also the first female artist to have two albums in the top five of the Billboard 200 and two singles in the top five of the Billboard Hot 100 at the same time. As a lead artist, she was the first woman in the history of the Billboard Hot 100 to have three simultaneous top 10 singles: "Rolling in the Deep", "Someone Like You" and "Set Fire to the Rain".
Enough said about her singing prowess! What’s fascinating for us is how easy it is to get her lyrics through the Genius API and her songs' audio features through Spotify’s API, and the potential to analyze them. In this study, I’ll perform sentiment analysis on her three studio albums: 19, 21 and 25.
Data set
The Genius.com API allowed me to download the lyrics and associated data points. In total, the data set consists of 41 tracks with the following data fields:
- artist name
- album name
- track title
- track number
- lyric text
- line number
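To make this layout concrete, here is a minimal base-R sketch of a data frame with one row per lyric line. The column names track_n and line, and all of the placeholder text, are assumptions for illustration. They are not the exact output of the Genius API (the code later in this post does rely on track_title and lyric).

```r
# Hypothetical toy rows illustrating a one-row-per-lyric-line layout;
# column names track_n and line and all text values are assumptions.
lyrics_toy <- data.frame(
  artist      = "Adele",
  album       = c("19", "19", "21"),
  track_title = c("Track A", "Track A", "Track B"),
  track_n     = c(1L, 1L, 1L),
  line        = c(1L, 2L, 1L),
  lyric       = c("first placeholder line", "second placeholder line",
                  "another placeholder line"),
  stringsAsFactors = FALSE
)

# Number of lyric lines per track: a quick sanity check on the layout
lines_per_track <- table(lyrics_toy$track_title)
print(lines_per_track)
```

With real data, the same table() call gives a quick check that every track came back with a plausible number of lines.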
Goal
The goal of this study is to apply text mining techniques, specifically Natural Language Processing (NLP), to the lyrics in order to uncover the underlying emotions and sentiments. Given below are the specific questions we’ll answer:
- What are the most frequent words used by her?
- Which are the positive and negative words prevalent in her songs?
- What are the overall sentiments expressed in her songs?
- Which are the top words used to describe different emotions?
- Which are the songs associated with different emotions?
- What type of relationship exists between the emotions and the albums?
- How does the valence score vary across her albums? (The Spotify API will be used here.)
Most frequent words
Let’s begin by loading the required packages. After that we will create a word cloud to get an overview of the word frequency.
library(geniusR)
library(magrittr)
library(purrr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(tm)
library(wordcloud)
library(syuzhet)
library(circlize)
library(tidytext)
library(reshape2)

# Albums to download: Adele's three studio albums
albums <- tibble(
  artist = rep("Adele", 3),
  album = c("19", "21", "25")
)

# Fetching the lyrics of every track on each album from Genius
album_lyrics <- albums %>%
  mutate(tracks = map2(artist, album, genius_album))

lyrics <- album_lyrics %>%
  unnest(tracks) %>%
  arrange(desc(artist))
lyrics <- as.data.frame(lyrics)
lyrics_text <- lyrics$lyric

# Removing punctuation
lyrics_text <- gsub('[[:punct:]]+', '', lyrics_text)
# Collapsing runs of three or more identical letters (e.g. "ooooh" -> "ooh");
# inside an R string the backreference must be written "\\1"
lyrics_text <- gsub("([[:alpha:]])\\1{2,}", "\\1\\1", lyrics_text)

# Creating a text corpus
docs <- Corpus(VectorSource(lyrics_text))
# Converting the text to lowercase
docs <- tm_map(docs, content_transformer(tolower))
# Removing common English stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))
# Creating a term-document matrix
tdm <- TermDocumentMatrix(docs)
# Converting the term-document matrix to a regular matrix
m <- as.matrix(tdm)
# Getting word counts in decreasing order
word_freqs <- sort(rowSums(m), decreasing = TRUE)
# Creating a data frame with the 300 most frequent words and their frequencies
lyrics_wc_df <- data.frame(word = names(word_freqs), freq = word_freqs)
lyrics_wc_df <- lyrics_wc_df[1:300, ]

# Plotting the word cloud
set.seed(1234)
wordcloud(words = lyrics_wc_df$word, freq = lyrics_wc_df$freq,
          min.freq = 1, scale = c(1.8, .5), max.words = 200,
          random.order = FALSE, rot.per = 0.15,
          colors = brewer.pal(8, "Dark2"))
This gives us the following visualization:
We can see that ‘love’ is the most frequently used word. Other high-frequency words are ‘like’, ‘never’, ‘say’, ‘heart’ and ‘ill’ (that is, "I’ll" after its apostrophe has been stripped). This makes sense, as many of her songs are about love and heartbreak.
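The ‘ill’ entry is a side effect of the cleaning order. Below is a minimal base-R sketch of the same steps on two made-up lines: strip punctuation, lowercase, drop stopwords, count. It shows how "I’ll" becomes "ill". The input lines and the two-word stopword list are hypothetical stand-ins for the real lyrics and tm’s stopwords("english"). tm’s English stopword list appears to store contractions with their apostrophes (for example "i’ll"), so a token that has already lost its apostrophe is presumably no longer filtered out.

```r
# Hypothetical input lines and a tiny stand-in stopword list
lyric_lines <- c("I'll love you, love you", "Never say never, my heart")
stop_toy <- c("you", "my")

clean <- gsub("[[:punct:]]+", "", lyric_lines)  # "I'll" becomes "Ill"
clean <- tolower(clean)                         # ... and then "ill"
tokens <- unlist(strsplit(clean, "[[:space:]]+"))
tokens <- tokens[tokens != "" & !tokens %in% stop_toy]

# Word frequencies in decreasing order
word_freqs <- sort(table(tokens), decreasing = TRUE)
print(word_freqs)
```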
Positive and negative words in the songs
This will be another word cloud, but this time we’re going to separate the words by positive and negative polarity. For this visualization we’ll use the ‘bing’ lexicon, loaded through the get_sentiments() function of the ‘tidytext’ package.
lyrics$lyric <- as.character(lyrics$lyric)
# Splitting the lyrics into one row per word
tidy_lyrics <- lyrics %>%
  unnest_tokens(word, lyric)

# Comparison cloud of positive vs negative words (Bing lexicon)
set.seed(1234)
tidy_lyrics %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("#F8766D", "#00BFC4"), max.words = 250)
The resulting visualization shows that the predominant positive words are ‘love’, ‘like’, ‘better’, ‘right’, ‘free’ and ‘wonders’. The predominant negative words are ‘rumours’, ‘miss’, ‘fall’, ‘fool’, ‘crazy’, ‘pain’, ‘hurts’, ‘wrong’ and ‘tired’.
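The comparison cloud is drawn from a word-by-sentiment count matrix. Here is a minimal base-R sketch of that step, assuming a hypothetical token list and a four-word stand-in for the Bing lexicon. It uses merge() as the inner join and xtabs() in place of acast().

```r
# Hypothetical tokens and a tiny stand-in for the Bing lexicon
tokens <- data.frame(word = c("love", "fool", "love", "pain", "the", "better"),
                     stringsAsFactors = FALSE)
bing_toy <- data.frame(
  word      = c("love", "better", "fool", "pain"),
  sentiment = c("positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Inner join: words missing from the lexicon (here "the") are dropped
matched <- merge(tokens, bing_toy, by = "word")

# Word x sentiment count matrix, with zeros where a word has no count
polarity_matrix <- xtabs(~ word + sentiment, data = matched)
print(polarity_matrix)
```

In the matrix, the columns come out in alphabetical order (negative, then positive).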
Overall sentiments expressed
This chart shows the cumulative score for each emotion expressed in her songs. For this chart we’ll use the ‘nrc’ method of the ‘syuzhet’ package, through its get_nrc_sentiment() function. The code is given below:
# Getting the NRC sentiment counts for each lyric line
ty_sentiment <- get_nrc_sentiment(lyrics_text)
# Data frame with the cumulative value of each sentiment
sentimentscores <- data.frame(colSums(ty_sentiment))
# Data frame with sentiment and score as columns
names(sentimentscores) <- "Score"
sentimentscores <- cbind("sentiment" = rownames(sentimentscores), sentimentscores)
rownames(sentimentscores) <- NULL
# Plot of the cumulative sentiments; theme_minimal() comes first so that
# it does not override the legend setting
ggplot(data = sentimentscores, aes(x = sentiment, y = Score)) +
  geom_bar(aes(fill = sentiment), stat = "identity") +
  theme_minimal() +
  theme(legend.position = "none") +
  xlab("Sentiments") + ylab("Scores") +
  ggtitle("Total sentiment based on scores")
The resulting chart shows that the positive score is higher than the negative score, which might be a result of the frequent use of ‘love’. We also see that ‘joy’, ‘anticipation’ and ‘sadness’ are the top three emotions.
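The reshaping step in the code above turns per-line counts into one total per category. Here is a minimal sketch that does the same in a single data.frame() call. Its hypothetical three-line input is shaped like get_nrc_sentiment()’s output but uses only six of its ten columns, and the counts are made up.

```r
# Hypothetical per-line counts shaped like get_nrc_sentiment() output
# (one row per lyric line, one column per category); values are made up
ty_sentiment <- data.frame(
  anger = c(0, 1, 0), anticipation = c(1, 0, 1), joy = c(2, 0, 1),
  sadness = c(0, 2, 1), negative = c(0, 2, 1), positive = c(2, 0, 1)
)

# Cumulative score per category as a sentiment/Score data frame
totals <- colSums(ty_sentiment)
sentimentscores <- data.frame(sentiment = names(totals),
                              Score = unname(totals),
                              stringsAsFactors = FALSE)
print(sentimentscores[order(-sentimentscores$Score), ])
```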
Top words used to describe different emotions
This chart shows the words most frequently associated with each emotion across all the albums. Run the following code to generate the chart.
# Total number of words in each song
song_wrd_count <- tidy_lyrics %>% count(track_title)
lyric_counts <- tidy_lyrics %>%
  left_join(song_wrd_count, by = "track_title") %>%
  rename(total_words = n)

# Joining the words with the NRC lexicon
lyric_sentiment <- tidy_lyrics %>%
  inner_join(get_sentiments("nrc"), by = "word")

# Top 10 words per sentiment, one facet per sentiment
lyric_sentiment %>%
  count(word, sentiment, sort = TRUE) %>%
  group_by(sentiment) %>%
  top_n(n = 10, wt = n) %>%
  ungroup() %>%
  ggplot(aes(x = reorder(word, n), y = n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free") +
  xlab("Words") + ylab("Count") +
  ggtitle("Top words used to express emotions and sentiments") +
  coord_flip()
The chart shows that, in terms of polarity, ‘love’, ‘baby’ and ‘young’ are the top three positive words, while ‘fall’, ‘words’ and ‘leave’ are the top negative words. Turning to emotions, sadness is primarily represented by words such as ‘fall’, ‘leave’ and ‘crazy’.
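The per-emotion ranking behind this chart is a group-wise top-n. Here is a minimal base-R sketch, assuming hypothetical (word, sentiment) pairs in place of the real NRC join. It counts each pair with aggregate() and keeps the n most frequent words per sentiment. The helper top_n_per_sentiment() is a hypothetical name used only for this illustration.

```r
# Hypothetical (word, sentiment) pairs standing in for the NRC join result
pairs <- data.frame(
  word      = c("fall", "leave", "fall", "crazy", "love", "love", "baby", "leave"),
  sentiment = c("sadness", "sadness", "sadness", "sadness",
                "joy", "joy", "joy", "sadness"),
  stringsAsFactors = FALSE
)

# Count occurrences of each (word, sentiment) pair
counts <- aggregate(n ~ word + sentiment, data = transform(pairs, n = 1L),
                    FUN = sum)

# Hypothetical helper: keep the top n rows by count within each sentiment
top_n_per_sentiment <- function(df, n) {
  do.call(rbind, lapply(split(df, df$sentiment), function(g) {
    head(g[order(-g$n), ], n)
  }))
}

top2 <- top_n_per_sentiment(counts, 2)
print(top2)
```

One difference from the dplyr version: top_n() keeps every row tied at the cutoff, so a facet can show more than 10 words. head() simply truncates at n rows.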