A Guide to Instagramming with Python for Data Analysis
I am writing this article to show you the basics of using Instagram in a programmatic way. You can benefit from this if you want to use it in a data analysis, computer vision, or any other cool project you can think of.
By Nour Galaby, Data Enthusiast.
Instagram is the largest photo-sharing social media platform, with 500 million monthly active users and 95 million pictures and videos uploaded every day. It holds a huge amount of data and has huge potential. This guide will help you view Instagram as a source of data and not just a platform, and show you how to use it as a developer for your projects.
About API and Tools
Instagram has an official API, but it is outdated and currently very limited in what you can do with it. So in this guide I will use LevPasha’s Unofficial Instagram API, which supports all major features such as liking, following, and uploading photos and videos. It is written in Python, and I will focus only on the data side.
I recommend using Jupyter notebooks and IPython. Plain Python will work fine, but you may miss features such as displaying images inline.
You can install the library using pip this way:
You will need ffmpeg if you don't have it. To install it on Linux:
For Windows, run this in your Python interpreter:
Logging in to Instagram Using the API
If successful, you should receive a "Login Success" message.
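The login step can be sketched roughly as follows. This is a minimal sketch, assuming the library installs a module named InstagramAPI that exposes a class of the same name with a login() method. The two helper functions and the environment variable names INSTAGRAM_USERNAME and INSTAGRAM_PASSWORD are hypothetical, chosen only so that the password does not end up saved inside a notebook.

```python
import os


def load_credentials(env=None):
    """Read the account name and password from environment variables.

    The variable names are hypothetical; use whatever fits your setup.
    """
    env = os.environ if env is None else env
    username = env.get("INSTAGRAM_USERNAME")
    password = env.get("INSTAGRAM_PASSWORD")
    if not username or not password:
        raise RuntimeError("Set INSTAGRAM_USERNAME and INSTAGRAM_PASSWORD first")
    return username, password


def create_client(username, password):
    """Build a client and log in.

    Assumption: the unofficial library provides InstagramAPI(username, password)
    and a login() method. The import is done lazily so that the rest of this
    sketch can be used even where the library is not installed.
    """
    from InstagramAPI import InstagramAPI

    client = InstagramAPI(username, password)
    client.login()
    return client
```

In a notebook you would then call create_client(*load_credentials()) once and reuse the returned client for every later request.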
With that out of the way, let's get started with our first request:
As you can see, the result is in JSON format, containing all of the requested data.
You can access it in the normal key/value way. For example:
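Once parsed, a response is an ordinary Python dictionary, so the usual indexing applies. The sample below is a hypothetical, heavily trimmed stand-in for a real response; the key names user, username, and biography are assumptions made for illustration and should be checked against your own output.

```python
# Hypothetical, heavily trimmed response; a real one has many more keys.
result = {
    "status": "ok",
    "user": {"username": "example_user", "full_name": "Example User"},
}

# Plain indexing raises KeyError if a key is missing.
username = result["user"]["username"]

# .get() returns a default instead of raising when a key is absent.
biography = result["user"].get("biography", "")

print(username)
print(sorted(result.keys()))
```

Using .get() with a default is a safer habit here, because not every post or profile carries every field.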
You can use any advanced viewing tool (such as Notepad++) to view the JSON and explore it.
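If you prefer to stay in the notebook, Python's standard json module can pretty-print the result or save it to a file that you then open in an external viewer. This is a small sketch; the helper names pretty and save_json and the sample dictionary are made up for illustration.

```python
import json


def pretty(data, max_chars=2000):
    """Return an indented JSON string, truncated so huge responses stay readable."""
    text = json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False)
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n... (truncated)"


def save_json(data, path):
    """Write the response to disk so it can be opened in an external viewer."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2, ensure_ascii=False)


# Hypothetical stand-in for a parsed response.
sample = {"status": "ok", "items": [{"id": 1}, {"id": 2}]}
print(pretty(sample))
```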
Get and View Instagram Timeline
Now let's do something more interesting. We will request the last posts in the timeline and view them in our notebook.
With this line you can get the timeline:
As with the previous request, we will use LastJson to view the result. Inspecting the resulting JSON shows that it holds a list under a key called 'items'. Each element in that list holds information about a specific post in the timeline, including elements such as:
- [text] - holds the text value for the caption written under the post, including hashtags
- [likes] - the number of likes that a post has
- [created_at] - the date the post was created
- [comments] - the comments on the post
- [image_versions] - holds links to the actual JPG file, which we will use to display it in our Jupyter notebook
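To make the list above concrete, here is a sketch that reduces one post to a small summary record. The key names simply mirror the list above and are assumptions, not a verified response layout. I also assume that created_at is a Unix timestamp and that comments is a list, so compare this against your own JSON before relying on it. The function name summarize_post is hypothetical.

```python
import re
from datetime import datetime, timezone


def summarize_post(post):
    """Reduce one timeline item to the fields used for analysis.

    All key names are assumptions taken from the list in the text.
    """
    caption = post.get("text") or ""
    created = post.get("created_at")
    return {
        "caption": caption,
        # Hashtags are pulled out of the caption text itself.
        "hashtags": re.findall(r"#(\w+)", caption),
        "likes": post.get("likes", 0),
        # Assumption: created_at is seconds since the Unix epoch.
        "created": (
            datetime.fromtimestamp(created, tz=timezone.utc).isoformat()
            if created is not None
            else None
        ),
        "comment_count": len(post.get("comments") or []),
    }


# Hypothetical post used only to exercise the function.
example_post = {
    "text": "Sunset #beach #travel",
    "likes": 42,
    "created_at": 0,
    "comments": [{"text": "Nice!"}],
}
print(summarize_post(example_post))
```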
Get_url() will iterate over the list of posts and for each post will find the URL and append it to our empty list:
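A function of this kind might look like the sketch below. It is not the original implementation: the nested layout, where image_versions holds a candidates list of dictionaries with a url key, is an assumption, and the sample posts and example.com links are made up.

```python
def get_url(posts):
    """Collect one image URL per post, skipping posts without an image."""
    urls = []
    for post in posts:
        # Assumption: image_versions -> candidates -> [{"url": ...}, ...]
        versions = post.get("image_versions") or {}
        candidates = versions.get("candidates") or []
        if candidates:
            # Take the first candidate; which size comes first is not verified here.
            urls.append(candidates[0]["url"])
    return urls


# Hypothetical posts: the second one has no image and is skipped.
sample_posts = [
    {"image_versions": {"candidates": [{"url": "https://example.com/a.jpg"}]}},
    {"text": "a post without an image"},
    {"image_versions": {"candidates": [{"url": "https://example.com/b.jpg"}]}},
]
print(get_url(sample_posts))
```

Skipping posts without an image, instead of raising an error, keeps the loop running when the timeline mixes photos with other kinds of posts.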
After it's done, we should have a list of URLs like the following:
To view the images, we can use the IPython.display module as follows:
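One way to do this is sketched below. It uses IPython.display's HTML and display helpers rather than one Image object per picture, so that several thumbnails can sit side by side. The function names build_gallery_html and show_gallery are hypothetical, and the thumbnail width is an arbitrary choice.

```python
from html import escape


def build_gallery_html(urls, width=200):
    """Build a row of <img> tags, escaping each URL for safe use in HTML."""
    tags = [
        '<img src="{}" width="{}" style="margin:4px">'.format(
            escape(url, quote=True), int(width)
        )
        for url in urls
    ]
    return "".join(tags)


def show_gallery(urls, width=200):
    """Render the thumbnails inline; this only works inside Jupyter/IPython."""
    # Imported lazily so the HTML builder can be used without IPython installed.
    from IPython.display import HTML, display

    display(HTML(build_gallery_html(urls, width)))
```

In a notebook cell, show_gallery(urls) with the list of URLs collected earlier renders the pictures directly under the cell.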
Viewing images in a notebook is very useful and we will use those functions later to view our results, as you will see.
Get Your Most Liked Posts
Now we know how to make a basic request, but what if we want something more complex? Here we will do something similar: we will get our most liked posts. To do that, we first need to get all the posts in our user profile, and then sort them by number of likes:
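The sorting half of that plan is short. The sketch below assumes each post is a dictionary with a likes field, matching the field list earlier in this guide; the real key name may differ, so it is passed in as a parameter. The function name most_liked and the sample posts are hypothetical.

```python
def most_liked(posts, n=5, key="likes"):
    """Return the n posts with the highest like count, most liked first."""
    # Posts that lack the key are treated as having zero likes.
    ranked = sorted(posts, key=lambda post: post.get(key, 0), reverse=True)
    return ranked[:n]


# Hypothetical posts used only to exercise the function.
my_posts = [
    {"id": "a", "likes": 12},
    {"id": "b", "likes": 87},
    {"id": "c"},
    {"id": "d", "likes": 40},
]
top = most_liked(my_posts, n=2)
print([post["id"] for post in top])
```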
Get All User Posts
In order to get all the posts, we will use the more_available values to iterate over our list of results:
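The paging loop can be sketched independently of the library. Here fetch_page is a hypothetical callable that takes a page marker and returns one parsed response. I assume each response carries items, more_available, and a next_max_id marker for the following page; only more_available is named in the text above, so treat the other two as assumptions.

```python
def fetch_all_posts(fetch_page, max_pages=100):
    """Follow more_available from page to page and collect every item.

    fetch_page(max_id) must return a dict shaped like one API response.
    max_pages is a safety cap so that a bad response cannot loop forever.
    """
    posts = []
    max_id = ""
    for _ in range(max_pages):
        page = fetch_page(max_id)
        posts.extend(page.get("items", []))
        if not page.get("more_available"):
            break
        # Assumption: the marker for the next page lives under next_max_id.
        max_id = page.get("next_max_id", "")
        if not max_id:
            break
    return posts


# Fake two-page feed standing in for real responses.
FAKE_PAGES = {
    "": {"items": [{"id": 1}, {"id": 2}], "more_available": True, "next_max_id": "p2"},
    "p2": {"items": [{"id": 3}], "more_available": False},
}
all_posts = fetch_all_posts(lambda max_id: FAKE_PAGES[max_id])
print(len(all_posts))
```

With the real client, fetch_page would make the user-feed request for the given marker and return the parsed JSON of that request. The exact method name depends on the installed version of the library, so check it there.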