Extracting Tweets With R

This article gives you a brief overview of extracting tweets using R.

Twitter is a popular microblogging site that lets users tweet up to 140 characters and post pictures, videos and GIFs. What a user tweets about gives away a lot of information about them: their surroundings, likes, dislikes and preferences.

Companies, organisations and individuals have found many ways to utilise this data and extract meaningful information from it. Take, for example, tracking earthquakes, spotting the Aurora Borealis, predicting the stock market, understanding relationships among users, recommending books, and analysing sentiment about a product.

The first step towards all this is extracting the tweets about the party or event you care about into a usable format. This article will help you get started with that!

These are the steps we will take:

1. Create a Twitter application to extract data from Twitter (just a few clicks here and there)
2. Extract tweets using a search term (2 lines of code!)

Steps to Create Twitter Application

1. Use your Twitter credentials to log in here and click on ‘Create New App’.

2. Fill in the application form with the relevant details. Note that the name must be unique: it cannot have been used by anyone else before. After you have read the Twitter Developer Agreement, tick the check box and click ‘Create your Twitter application’. Oh! And you will need to have your phone linked to Twitter to create an app.

3. You will land on the application details page; move to the ‘Keys and Access Tokens’ tab, scroll down and click ‘Create my access token’. Note the values of API Key and API Secret for future use. Thou shan’t share these with anyone; anyone who gets the keys can access your account.

4. In order to extract tweets, you will need to establish a secure connection between R and Twitter as follows:

Load the necessary R packages and download the cURL certificate bundle. ROAuth is an R interface for OAuth, the open standard for token-based authorisation on the internet.

#Clear R Environment
rm(list=ls())
#Load required libraries
library(twitteR)
library(ROAuth)
# Download the file and store in your working directory
download.file(url= "http://curl.haxx.se/ca/cacert.pem", destfile= "cacert.pem")

Set up the credentials for Twitter by making a call to the OAuthFactory function:

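#Insert your consumerKey and consumerSecret below
credentials <- OAuthFactory$new(consumerKey='XXXXXXXXXXXXXXXXXX', consumerSecret='XXXXXXXXXXXXXXXXXX',
  requestURL='https://api.twitter.com/oauth/request_token', accessURL='https://api.twitter.com/oauth/access_token',
  authURL='https://api.twitter.com/oauth/authorize')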
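Since the keys must stay private, one option is to keep them out of the script altogether. Here is a minimal sketch that reads them from environment variables instead of typing them into the code. The helper name read_twitter_keys and the variable names TWITTER_API_KEY and TWITTER_API_SECRET are assumptions made for illustration, not something Twitter or ROAuth requires.

```r
# Hypothetical helper: read the API key and secret from environment variables
# so that they never appear in the script itself.
# The names TWITTER_API_KEY and TWITTER_API_SECRET are assumptions; use
# whatever names you set in your own .Renviron file or shell profile.
read_twitter_keys <- function() {
  key    <- Sys.getenv("TWITTER_API_KEY")
  secret <- Sys.getenv("TWITTER_API_SECRET")
  # Sys.getenv() returns "" for unset variables, so fail early and loudly
  if (!nzchar(key) || !nzchar(secret)) {
    stop("Set TWITTER_API_KEY and TWITTER_API_SECRET before running this script")
  }
  list(consumerKey = key, consumerSecret = secret)
}
```

You could then call keys <- read_twitter_keys() and pass keys$consumerKey and keys$consumerSecret wherever the placeholder strings appear.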

Let’s now ask Twitter for access!
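With ROAuth, the request is typically made through the handshake method of the credentials object, pointing it at the certificate file downloaded earlier. Below is a minimal sketch under those assumptions. The wrapper name request_twitter_access is hypothetical; it only adds a check that the certificate file is present before calling handshake.

```r
# Hypothetical wrapper around the ROAuth handshake.
# `credentials` is assumed to be the object created with OAuthFactory$new(),
# and `cainfo` the path to the cacert.pem file downloaded earlier.
request_twitter_access <- function(credentials, cainfo = "cacert.pem") {
  # Stop with a clear message if the certificate bundle is missing
  if (!file.exists(cainfo)) {
    stop("Certificate bundle not found: ", cainfo)
  }
  # handshake() asks Twitter for access and then prompts for the PIN
  credentials$handshake(cainfo = cainfo)
  invisible(credentials)
}
```

Calling request_twitter_access(credentials) from the console would then start the authorisation step.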



5. After executing the above code, you will be directed to Twitter’s authorisation screen. Click on Authorize App and note the PIN generated. Go back to RStudio and enter the PIN. Note, you will only need to do this once.

Save the credentials for later use:

save(credentials, file="twitter authentication.Rdata")
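It helps to know that save() stores the object under its variable name, and load() later restores it under that same name. Here is a small sketch of the round trip. It uses a stand-in list instead of real credentials and a temporary file instead of the working directory; both are assumptions made so the example can run on its own.

```r
# Stand-in object; in this article it would be the ROAuth credentials object
credentials <- list(consumerKey = "XXXX", consumerSecret = "YYYY")

# Save to a temporary file (the article saves to the working directory instead)
auth_file <- file.path(tempdir(), "twitter authentication.Rdata")
save(credentials, file = auth_file)

# Remove the object, then restore it from the file
rm(credentials)
restored <- load(auth_file)   # load() returns the names of the restored objects

print(restored)
print(exists("credentials"))
```

This is why a later session can call load() and immediately use credentials without assigning the result to anything.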


Extract Tweets

Now that we are all done with setting up gateways to reach Twitter, let’s get our hands dirty with real data. The searchTwitter function searches Twitter and returns a list of tweets containing the search text.

Below is a piece of code to extract tweets with the search string #DataLove. Explore the other parameters of this function that let you filter by time period, geography, etc.

#Load Authentication Data
load("twitter authentication.Rdata")
#Register Twitter Authentication
setup_twitter_oauth(credentials$consumerKey, credentials$consumerSecret, credentials$oauthKey, credentials$oauthSecret)
#Extract tweets matching the search string (first argument), with the number of tweets (n) and the language (lang)
tweets <- searchTwitter('#DataLove', n=10, lang="en")
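To give a feel for the other parameters, here is a hedged sketch that adds a date window and a location filter, then converts the result to a data frame with twListToDF from the twitteR package. The helper names make_geocode and search_filtered are hypothetical, and any dates or coordinates you pass in are your own choice. It assumes setup_twitter_oauth has already been called as above.

```r
# Hypothetical helper: build the "latitude,longitude,radius" string that
# searchTwitter() accepts in its geocode argument (radius given in km here)
make_geocode <- function(lat, lon, radius_km) {
  paste0(lat, ",", lon, ",", radius_km, "km")
}

# Hypothetical wrapper: search with a date window and a location filter.
# `since` and `until` are date strings in "YYYY-MM-DD" form.
# Requires the twitteR package and a registered OAuth session.
search_filtered <- function(term, since, until, geocode, n = 10) {
  tweets <- twitteR::searchTwitter(term, n = n, lang = "en",
                                   since = since, until = until,
                                   geocode = geocode)
  # Nothing matched: return an empty data frame instead of failing
  if (length(tweets) == 0) {
    return(data.frame())
  }
  # One row per tweet, with columns such as the text and creation time
  twitteR::twListToDF(tweets)
}
```

A call might look like search_filtered('#DataLove', since = "2017-01-01", until = "2017-01-07", geocode = make_geocode(51.5074, -0.1278, 50)), with the dates and coordinates swapped for the ones you care about. A data frame is usually easier to filter, plot and save than the raw list of tweets.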


Closing Note

Extracting tweets is just the beginning. This data becomes beautiful when you add visualisation, identify patterns, analyse relations and get relevant insights. Check out one such fun analysis here!

Hope you enjoyed adding a new skill to your Machine Learning portfolio!

Kritika Jalan is an experienced business analyst working in management consulting. She is skilled in R, Python, SQL, and other data analysis tools and machine learning techniques.