A Beginner’s Guide to Tweet Analytics with Pandas
Twitter provides analytics to all of its users, though I assume relatively few everyday tweeples pay much attention to them. A variety of other services can help with tweet and audience analytics, as well as further analysis such as geographic analysis and natural language processing, but when paired with some simple Python, the Twitter-supplied data can be incredibly useful.
This is a simple guide to getting your hands a bit dirty doing analysis on your own in Python. Unlike a lot of other tutorials which often pull from the real-time Twitter API, we will be using the downloadable Twitter Analytics data, and most of what we do will be done in Pandas.
Before we get started, let's get the obligatory imports out of the way.
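The imports for a walkthrough like this one might look as follows. This is a minimal sketch: it assumes only pandas plus two standard-library modules, which are enough for counting and text cleanup.

```python
# Assumed imports for this kind of analysis; the original list may have differed.
import string                      # punctuation constants for text cleanup
from collections import Counter    # simple frequency counting

import pandas as pd                # DataFrame loading and analysis
```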
Get and Inspect the Data
First we need the data. This part is easy enough; go to Twitter, click on the upper right menu (your profile pic), select Analytics, choose the Tweets tab along the top, use the date range pickers to select a time period, and choose Export Data. It doesn't matter how much data you use; our simple example will work with any amount. I chose the default, Past 28 Days.
[Image: Get that Twitter analytics data.]
Once we have the CSV file, we will want to load it into a Pandas DataFrame for analysis.
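A rough sketch of the loading step is below. The column names are assumptions based on a typical Twitter Analytics export, and a tiny inline sample stands in for the downloaded file so the snippet runs on its own. With a real export you would pass the file path to pd.read_csv instead.

```python
import io

import pandas as pd

# Stand-in for the exported CSV (hypothetical rows, assumed column names).
sample_csv = io.StringIO(
    "Tweet id,Tweet text,time,impressions,engagements,retweets,likes,"
    "promoted impressions,promoted engagements\n"
    '1,"Hello #python from @pandas_dev",2017-03-01 11:05 +0000,120,4,1,2,-,-\n'
    '2,"More #python and #datascience",2017-03-01 18:40 +0000,300,9,3,5,-,-\n'
    '3,"Thanks @pandas_dev!",2017-03-03 11:20 +0000,80,1,0,1,-,-\n'
)

df = pd.read_csv(sample_csv)

# Drop every column whose name starts with "promoted".
promoted_cols = [col for col in df.columns if col.startswith("promoted")]
df = df.drop(columns=promoted_cols)

print(df.columns.tolist())
```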
Don't mind all the dropped columns; while a lot of what is there is useful for our analysis -- tweet text, time, impressions, retweets, etc. -- many are not -- all the promoted things -- and so we will just omit them from the start.
As we would with any data analysis project, next we have a look at the data.
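A first look could be sketched like this. The small hand-made frame is a hypothetical stand-in for the loaded export, and its column names are assumed.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded from the export.
df = pd.DataFrame({
    "Tweet text": ["Hello #python", "More #datascience", "Thanks @pandas_dev"],
    "time": ["2017-03-01 11:05 +0000", "2017-03-01 18:40 +0000",
             "2017-03-03 11:20 +0000"],
    "impressions": [120, 300, 80],
    "retweets": [1, 3, 0],
    "likes": [2, 5, 1],
})

print(df.head())     # first rows (up to five)
print(df.dtypes)     # column types, useful for spotting unparsed dates

n_tweets = len(df)   # number of rows, i.e. tweets in the export
print(n_tweets)
```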
The row count tells us that I have tweeted a measly 95 times in the past 4 weeks. Not a very large dataset, and we probably would not want to make any inferences based on our findings, but a good enough toy set to start out with.
Let's see what useful analytics we can pull out of this.
Basic Tweet Stats
So, given what data is shown in the output of running head() on the dataset above, and having a rough intuition of what tweet metrics would be useful, we will grab the following stats:
- Retweets - Mean RTs per tweet & top 5 RTed tweets
- Likes - Mean likes per tweet & top 5 liked tweets
- Impressions - Mean impressions per tweet & top 5 tweets with most impressions
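The three stats above can be sketched with one small helper. The helper name summarize, the sample rows, and the column names are all illustrative assumptions, not the exact code behind the numbers discussed here.

```python
import pandas as pd

# Hypothetical sample data with assumed column names.
df = pd.DataFrame({
    "Tweet text": ["tweet a", "tweet b", "tweet c", "tweet d", "tweet e", "tweet f"],
    "retweets": [1, 3, 0, 2, 5, 1],
    "likes": [2, 5, 1, 0, 8, 2],
    "impressions": [120, 300, 80, 150, 500, 110],
})


def summarize(frame, metric, n=5):
    """Return the mean of `metric` and the top-n tweets ranked by it."""
    mean_value = frame[metric].mean()
    top = frame.nlargest(n, metric)[["Tweet text", metric]]
    return mean_value, top


for metric in ["retweets", "likes", "impressions"]:
    mean_value, top = summarize(df, metric)
    print(f"Mean {metric} per tweet: {mean_value:.2f}")
    print(top.to_string(index=False))
```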
I won't bother with any analysis of these metrics. Needless to say, I should step my social media game up.
Top #Hashtags and @Mentions
It's no secret that hashtags play an important role in Twitter, and mentions can also help grow your network and influence. Together they help put the 'social' in social networking, transforming platforms like Twitter from passive experiences to very active ones. With that, getting a handle on the most social aspect of this social network can be a helpful endeavour.
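One way to count hashtags and mentions is sketched below. It assumes a "Tweet text" column and uses made-up tweets. It lowercases each tweet, splits on whitespace, and strips punctuation from each token before counting, which also removes underscores from handles.

```python
import string
from collections import Counter

import pandas as pd

# Hypothetical tweets; the "Tweet text" column name is assumed.
df = pd.DataFrame({"Tweet text": [
    "Hello #Python from @pandas_dev!",
    "More #python and #datascience.",
    "Thanks @pandas_dev, great release",
]})

# Translation table that deletes all ASCII punctuation, including "_".
strip_punct = str.maketrans("", "", string.punctuation)

hashtags = Counter()
mentions = Counter()

for text in df["Tweet text"]:
    for word in text.lower().split():
        key = word.translate(strip_punct)
        if not key:
            continue  # token was only punctuation, e.g. a bare "#"
        if word.startswith("#"):
            hashtags[key] += 1
        elif word.startswith("@"):
            mentions[key] += 1

print(hashtags.most_common(10))
print(mentions.most_common(10))
```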
Putting aside some evident bumps like punctuation being removed from tweet text prior to checking Twitter handles (this could be a problem if you have tweeps named both Francesco_AI and FrancescoAI), this works and is at least relatively Pythonic (though I'm sure it could be more so).
Finally, let's have a look at some very basic temporal data. We will check mean impressions for tweets based -- independently -- on both the hour of day and the day of week that they were tweeted. I caution (once again) that this is based on very little data, and so nothing useful will likely be gleaned. However, given much larger amounts of tweet data, entire social media campaigns can be planned around this kind of analysis.
While this is based on impressions, it could just as reasonably (and easily) be changed to engagements, or RTs, or whatever else you please. Working in advertising and promoting tweets? Maybe you are more interested in some of those promotion* metrics we hacked off the dataset at the start.
We have to convert the Twitter-supplied date field to a legitimate Python datetime object, bin the data based on which hourly slot it falls into, identify days of week, and then capture this data in a couple of additional columns in the DataFrame, which we will pillage for stats afterward.
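These steps might be sketched as follows. The timestamp format, the column names, and the sample rows are assumptions. Times are treated as UTC, so a real analysis may want to convert to a local time zone first.

```python
import pandas as pd

# Hypothetical rows; the "time" format is assumed to be "YYYY-MM-DD HH:MM +0000".
df = pd.DataFrame({
    "time": ["2017-03-01 11:05 +0000", "2017-03-01 18:40 +0000",
             "2017-03-03 11:20 +0000", "2017-03-08 11:45 +0000"],
    "impressions": [120, 300, 80, 400],
})

# Parse the text timestamps into timezone-aware datetimes.
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%d %H:%M %z", utc=True)

# Two helper columns: the hourly slot (0-23) and the weekday name.
df["hour"] = df["time"].dt.hour
df["day_of_week"] = df["time"].dt.day_name()

# Mean impressions per hour of day and, independently, per day of week.
by_hour = df.groupby("hour")["impressions"].mean()
by_day = df.groupby("day_of_week")["impressions"].mean()

print(by_hour)
print(by_day)
```

Swapping "impressions" for another column, such as retweets, would give the same breakdown for that metric.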
It seems I tweet at rather consistent times of day. It also seems that my Wednesday tweets, 11 AM tweets, and 6 PM tweets are my bread and butter. Of course, this is based on 95 tweets, and so is meaningless and inconclusive. However, after performing these same steps on some considerably larger sets of data, some interesting trends have been observed which may help lead to business decisions. All from some simple Python.
While not earth-shattering, our simple Pandas-based Twitter analytics code is enough to get us thinking about how we may better use social media. Applied to the right data, elementary scripts can be quite powerful.