KDnuggets Home » News » 2015 » Feb » Software » Fun and Top! US States in 2 Words using twitteR ( 15:n06 )

Fun and Top! US States in 2 Words using twitteR


Combining the twitteR package with text mining techniques and visualization tools can produce interesting outputs. Find out which US state is fun and top, and which is good and crazy, according to Twitter.



by Antonio Sánchez Chinchón (@aschinchon).

R is an incredible tool for doing Data Science. Currently, there are more than 6200 packages available on CRAN to be explored (at a rate of one package a day, a single person would need around 17 years to test them all). R is almost infinite. One of the most exciting and entertaining packages I know is the twitteR package, which provides an interface to the Twitter web API. It allows you to download tweets from Twitter. The search function is very flexible: you can search for tweets containing particular words, exact phrases or some hashtag, tweets referencing a person, tweets from some person, to some person, between two dates, around a place … a detailed description of possible queries can be found here.
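To make these query options concrete, here is a minimal sketch in Python (not the author's R code) that assembles a search string from a few of them. The helper `build_query` and its parameter names are hypothetical, and the operators used (`"..."` for an exact phrase, `#tag`, `from:`, `to:`, `since:`, `until:`) are the commonly documented Twitter search operators; the exact syntax accepted should be checked against the query documentation linked above.

```python
def build_query(phrase=None, hashtag=None, from_user=None, to_user=None,
                since=None, until=None):
    """Assemble a Twitter-style search string (hypothetical helper)."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')                     # exact phrase
    if hashtag:
        parts.append("#" + hashtag.lstrip("#"))         # containing a hashtag
    if from_user:
        parts.append("from:" + from_user.lstrip("@"))   # sent by a person
    if to_user:
        parts.append("to:" + to_user.lstrip("@"))       # sent to a person
    if since:
        parts.append("since:" + since)                  # date as YYYY-MM-DD
    if until:
        parts.append("until:" + until)                  # date as YYYY-MM-DD
    return " ".join(parts)


print(build_query(phrase="Alabama is"))
print(build_query(hashtag="rstats", from_user="@aschinchon", since="2015-01-01"))
```

The resulting string is what would be passed as the search text to whichever client is used to call the API.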

Combining this powerful tool with text mining techniques and visualization tools can produce interesting outputs. In this experiment I search for tweets talking about American states. I do it by looking for tweets containing the exact phrase “[STATE NAME] is”. For example, I look for tweets containing “Alabama is”. I do the same for all state names. Once I have the search results, I clean and standardize the tweets (removing punctuation characters and transforming words to lower case) and convert them into a Corpus, so that I can use the text mining functions contained in the tm package. One of these functions allows me to cross the tweets with a list of opinion words obtained from here. Once I have all the opinions, I only have to summarize them to obtain the two most common words for each state.
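The cleaning and counting steps described above can be sketched as follows. This is an illustration in plain Python under stated assumptions, not the author's R/tm pipeline: the tweets are invented, `OPINION_WORDS` is a tiny hypothetical stand-in for the real opinion-word list, and the function names are made up for this example.

```python
import re
from collections import Counter

# Hypothetical stand-in for the opinion-word list used in the article.
OPINION_WORDS = {"good", "great", "beautiful", "crazy", "boring"}


def exact_phrase_query(state):
    """Build the exact-phrase search string for a state name."""
    return f'"{state} is"'


def clean(tweet):
    """Lower-case a tweet and replace punctuation characters with spaces."""
    return re.sub(r"[^\w\s]", " ", tweet.lower())


def top_opinion_words(tweets, opinion_words, n=2):
    """Return the n most frequent opinion words found across the tweets."""
    counts = Counter(
        word
        for tweet in tweets
        for word in clean(tweet).split()
        if word in opinion_words
    )
    return [word for word, _ in counts.most_common(n)]


# Invented tweets, for illustration only.
sample = [
    "Louisiana is GOOD!",
    "Louisiana is crazy, but good.",
    "louisiana is good... and crazy",
    "Louisiana is boring",
]

print(exact_phrase_query("Louisiana"))
print(top_opinion_words(sample, OPINION_WORDS))
```

Counting every occurrence of a word is only one possible choice; counting each word at most once per tweet would be an equally reasonable reading of "most common", and the article does not say which one was used.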

[Figure: US States in 2 Words]

Now it is time to visualize the results. The maps package lets me plot silhouettes of American states (except Hawaii and Alaska, I don’t know why). Combining it with the ggplot2 and gridExtra packages, I arrange the states and place the 2 words from the Twitter analysis inside the state boundaries. Not all phrases make sense, but all of them reflect a Twitter feeling at a particular moment in time. Of course, this is just an experiment to show how to use and combine some interesting R tools. If you don’t like what Twitter says about your state, don’t take it too seriously.

Here is the table with the top 2 words for each state:
Abbr  State           word1      word2
AL    Alabama         sweet      free
AZ    Arizona         good       lead
AR    Arkansas        good       lead
CA    California      better     great
CO    Colorado        beautiful  win
CT    Connecticut     heck       freedom
DE    Delaware        shake      reckless
FL    Florida         amazing    better
GA    Georgia         best       confused
ID    Idaho           beautiful  best
IL    Illinois        cool       good
IN    Indiana         wins       defeating
IA    Iowa            warning    good
KS    Kansas          sh*t       hate
KY    Kentucky        overrated  fast
LA    Louisiana       good       crazy
ME    Maine           love       amazing
MD    Maryland        good       free
MA    Massachusetts   fun        top
MI    Michigan        right      good
MN    Minnesota       cool       right
MS    Mississippi     best       impressive
MO    Missouri        rejecting  right
MT    Montana         sh*t       work
NE    Nebraska        enough     great
NV    Nevada          lonely     weed
NH    New Hampshire   fall       scratch
NJ    New Jersey      evil       horrible
NM    New Mexico      beautiful  desert
NY    New York        beautiful  unhappy
NC    North Carolina  excited    great
ND    North Dakota    worst      worth
OH    Ohio            love       good
OK    Oklahoma        best       pretty
OR    Oregon          stealing   beautiful
PA    Pennsylvania    top        beautiful
RI    Rhode Island    right      beautiful
SC    South Carolina  dispute    fans
SD    South Dakota    dumb       ready
TN    Tennessee       great      leading
TX    Texas           good       great
UT    Utah            great      fine
VT    Vermont         smartest   scandal
VA    Virginia        defensive  good
WA    Washington      well       beautiful
WV    West Virginia   loss       outlaw
WI    Wisconsin       good       best
WY    Wyoming         boring     finest

Antonio Sánchez Chinchón is a mathematician and works as a data scientist at Telefónica. He is the creator of Ripples, an unclassifiable blog of mathematical experiments and R programming. You can follow him at @aschinchon.
