Fun and Top! US States in 2 Words using twitteR
Combining the twitteR package with text mining techniques and visualization tools can produce interesting outputs. Find out which US state is fun and top, and which is good and crazy, according to Twitter.
by Antonio Sánchez Chinchón (@aschinchon).
R is an incredible tool for data science. Currently, there are more than 6200 packages available on CRAN to explore (testing one package a day would take a person around 17 years). R is almost infinite. One of the most exciting and entertaining packages I know is twitteR, which provides an interface to the Twitter web API and lets you download tweets from Twitter. Its search function is very flexible: you can search for tweets containing particular words, exact phrases or a hashtag, referencing a person, sent from or to a person, posted between two dates, or posted around a place. A detailed description of possible queries can be found here.
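To give an idea of what these searches look like in practice, here is a minimal sketch. `searchTwitter()` and `twListToDF()` are twitteR functions; the helper name `search_tweets`, the example queries, the tweet counts and the dates are assumptions made only for illustration, and the download itself needs your own API credentials registered beforehand with `setup_twitter_oauth()`.

```r
# Example query strings using Twitter's search operators (illustrative only).
queries <- c(
  words     = "data science",     # tweets containing both words
  exact     = '"Alabama is"',     # an exact phrase
  hashtag   = "#rstats",          # a hashtag
  from_user = "from:aschinchon",  # sent by a user
  to_user   = "to:aschinchon",    # sent to a user
  mention   = "@aschinchon"       # referencing a user
)

# Hypothetical helper: runs one search and returns the tweets as a data frame.
# Assumes twitteR is installed and setup_twitter_oauth() was already called.
search_tweets <- function(query, n = 100, since = NULL, until = NULL,
                          geocode = NULL) {
  if (!requireNamespace("twitteR", quietly = TRUE)) {
    stop("The twitteR package is required for this step.")
  }
  tweets <- twitteR::searchTwitter(query, n = n, since = since,
                                   until = until, geocode = geocode)
  twitteR::twListToDF(tweets)
}

# Example call (not run here, it needs credentials; the dates are made up):
# alabama <- search_tweets(queries[["exact"]], n = 200,
#                          since = "2015-03-01", until = "2015-03-08")
```

Dates and places are passed as arguments (`since`, `until`, `geocode`) rather than written into the query string, which keeps the query itself easy to read.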
Combining this powerful tool with text mining techniques and visualization tools can produce interesting outputs. In this experiment I search for tweets talking about American states. I do it by looking for tweets containing the exact phrase “[STATE NAME] is”. For example, I look for tweets containing “Alabama is”, and I do the same for every state name. Once I have the search results, I clean and standardize the tweets (removing punctuation characters and converting words to lower case) and convert them into a Corpus so that I can use the text mining functions contained in the tm package. One of these functions allows me to cross the tweets with a list of opinion words obtained from here. Once I have all the opinions, I only have to summarize them to obtain the two most common words for each state.
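The cleaning and counting steps can be sketched as follows. This is not the original code: it uses base R instead of a tm Corpus so that it runs without extra packages, and both the tweets and the tiny opinion lexicon are made up for illustration. The function names `clean_text` and `top_opinion_words` are hypothetical.

```r
# Toy input: made-up tweets and a tiny made-up opinion lexicon.
tweets <- data.frame(
  state = c("Alabama", "Alabama", "Alabama", "Alabama", "Texas", "Texas"),
  text  = c("Alabama is SWEET!", "Alabama is free, and sweet.",
            "Alabama is free", "Alabama is so sweet",
            "Texas is good; really good", "Texas is great!!!"),
  stringsAsFactors = FALSE
)
opinion_words <- c("sweet", "free", "good", "great", "crazy")

# Lower-case the text, replace punctuation with spaces, squeeze whitespace.
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]+", " ", x)
  trimws(gsub("[[:space:]]+", " ", x))
}

# Return the n most frequent lexicon words found in a set of tweets.
top_opinion_words <- function(texts, lexicon, n = 2) {
  words  <- unlist(strsplit(clean_text(texts), " ", fixed = TRUE))
  hits   <- words[words %in% lexicon]
  counts <- sort(table(hits), decreasing = TRUE)
  names(counts)[seq_len(min(n, length(counts)))]
}

# One result per state: a character vector with its most common opinion words.
top_words <- lapply(split(tweets$text, tweets$state), top_opinion_words,
                    lexicon = opinion_words)
```

With real data, `tweets` would hold the downloaded tweets for all states and `opinion_words` the full list of opinion words.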
Now it is time to visualize the results. The maps package allows you to plot the silhouettes of American states (except Hawaii and Alaska, I don’t know why). Combining it with the ggplot2 and gridExtra packages, I arrange the states and place the 2 words from the Twitter analysis inside each state's boundaries. Not all phrases make sense, but all of them reflect a Twitter feeling at a particular moment in time. Of course, this is just an experiment to show how to use and combine some interesting R tools. If you don't like what Twitter says about your state, don’t take it too seriously.
Here is the table with the top 2 words for each state:
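A simplified sketch of the plotting idea is shown below. It draws the chosen states on a single map with ggplot2 instead of arranging one panel per state with gridExtra, so it is an approximation of the approach rather than the original code. `map_data()` is a ggplot2 function that relies on the maps package; the helper names `label_positions` and `plot_state_words` are hypothetical, and placing each label at the midpoint of the state's bounding box is an assumption that may miss the interior of oddly shaped states.

```r
# Midpoint of each region's bounding box, used as a rough label position.
label_positions <- function(polygons) {
  mid  <- function(v) mean(range(v))
  long <- tapply(polygons$long, polygons$region, mid)
  lat  <- tapply(polygons$lat,  polygons$region, mid)
  data.frame(region = names(long), long = as.numeric(long),
             lat = as.numeric(lat), stringsAsFactors = FALSE)
}

# Words for two states, written one above the other.
state_words <- data.frame(
  region = c("alabama", "texas"),        # map_data() uses lower-case names
  label  = c("sweet\nfree", "good\ngreat"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: draws the states and writes the words inside them.
# Assumes the ggplot2 and maps packages are installed.
plot_state_words <- function(words) {
  states <- ggplot2::map_data("state")
  states <- states[states$region %in% words$region, ]
  labels <- merge(label_positions(states), words, by = "region")
  ggplot2::ggplot(states, ggplot2::aes(x = long, y = lat)) +
    ggplot2::geom_polygon(ggplot2::aes(group = group),
                          fill = "grey90", colour = "grey40") +
    ggplot2::geom_text(data = labels, ggplot2::aes(label = label), size = 4) +
    ggplot2::coord_quickmap() +
    ggplot2::theme_void()
}

# Example call (not run here):
# print(plot_state_words(state_words))
```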
Abbr | State | word1 | word2 |
AL | Alabama | sweet | free |
AZ | Arizona | good | lead |
AR | Arkansas | good | lead |
CA | California | better | great |
CO | Colorado | beautiful | win |
CT | Connecticut | heck | freedom |
DE | Delaware | shake | reckless |
FL | Florida | amazing | better |
GA | Georgia | best | confused |
ID | Idaho | beautiful | best |
IL | Illinois | cool | good |
IN | Indiana | wins | defeating |
IA | Iowa | warning | good |
KS | Kansas | sh*t | hate |
KY | Kentucky | overrated | fast |
LA | Louisiana | good | crazy |
ME | Maine | love | amazing |
MD | Maryland | good | free |
MA | Massachusetts | fun | top |
MI | Michigan | right | good |
MN | Minnesota | cool | right |
MS | Mississippi | best | impressive |
MO | Missouri | rejecting | right |
MT | Montana | sh*t | work |
NE | Nebraska | enough | great |
NV | Nevada | lonely | weed |
NH | New Hampshire | fall | scratch |
NJ | New Jersey | evil | horrible |
NM | New Mexico | beautiful | desert |
NY | New York | beautiful | unhappy |
NC | North Carolina | excited | great |
ND | North Dakota | worst | worth |
OH | Ohio | love | good |
OK | Oklahoma | best | pretty |
OR | Oregon | stealing | beautiful |
PA | Pennsylvania | top | beautiful |
RI | Rhode Island | right | beautiful |
SC | South Carolina | dispute | fans |
SD | South Dakota | dumb | ready |
TN | Tennessee | great | leading |
TX | Texas | good | great |
UT | Utah | great | fine |
VT | Vermont | smartest | scandal |
VA | Virginia | defensive | good |
WA | Washington | well | beautiful |
WV | West Virginia | loss | outlaw |
WI | Wisconsin | good | best |
WY | Wyoming | boring | finest |
Antonio Sánchez Chinchón is a mathematician and works as a data scientist at Telefónica. He is the creator of Ripples, an unclassifiable blog of mathematical experiments and R programming. You can follow him on Twitter at @aschinchon.