Predicting Popularity of Online Content

A look at predicting what makes online content popular, with a particular focus on images, especially selfies.

By Tomasz Trzcinski, Tooploox.

This is a post based on the talk given at Warsaw Data Science Meetup on April 12th 2016.

It’s difficult to make predictions, especially about the future.

Nowadays, whenever we hear about a recent breakthrough in science, technology or business, it is more than likely that so-called ‘big data’ is part of the success. Whether it’s a new medical algorithm to detect early stages of cancer or a new method for reducing traffic in big cities, massive amounts of digital data are very often the ‘secret ingredient’. As a matter of fact, every one of us has probably heard at least a couple of estimates of how much data humankind creates every second, every minute or every hour. If you still have trouble grasping the scale of this process, have a look at one of the websites that show the ever-increasing amount of information created online in real time.


Now that I have convinced you that there is an abundance of data available at our fingertips, how about explaining why it might be interesting to predict which part of this content will be popular? By popular, I mean accessed, shared, modified, retrieved, etc. multiple times. Imagine that you are storing your data on your local computer and people who want to download it need to get it directly from your hard drive. How much bandwidth you will need to deliver your data to all the interested parties depends on how popular your data turns out to be. Let’s take another, more realistic example (at least for the 1.5 billion people who use Facebook): you take a picture and you upload it to Facebook. How many likes will your picture get?

As this is clearly a vital problem for humans across the world, many researchers have looked into it. One of the most interesting works, entitled “What makes an image popular?” by A. Khosla, A. Das Sarma and R. Hamid from MIT, tries to answer this question by looking into visual and social features of images from Flickr. The authors gathered a dataset of over 2.3M images from different users, together with the corresponding popularity metric, the number of views. Below you can find a set of example pictures from their dataset:


Source: What makes an image popular?
A. Khosla, A. Das Sarma and R. Hamid.

The first thing they analysed was the popularity distribution. The sheer number of pictures makes it impossible for all of them to become popular. Only a small percentage of the pictures will actually become popular, while the majority will be seen by only a few people. This is reflected in the long-tail character of the popularity distribution graph. To deal with this variation within the data, the authors transformed the view counts using a logarithmic function and normalized them by the time passed since publication. The resulting distribution looks much more balanced, and they could now get back to the original question: what makes an image popular?
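To make the transformation concrete, here is a minimal sketch of a log-scaled, time-normalized popularity score. The function name, the use of days as the time unit, the base-2 logarithm and the `+ 1` offset are assumptions made for illustration; the text above only states that view counts were log-transformed and normalized by time since publication.

```python
import math

def normalized_log_popularity(views, days_online):
    """Log-transform a view count after dividing by the time spent online.

    Hypothetical helper: the exact formula (base 2, +1 offset, days as
    the unit of time) is an assumption, not taken from the article.
    """
    if days_online <= 0:
        raise ValueError("days_online must be positive")
    # The +1 keeps the score defined (and equal to 0) for photos with no views.
    return math.log2(views / days_online + 1)

# Made-up view counts spanning several orders of magnitude,
# all collected over the same 10-day window.
raw_views = [3, 40, 500, 120000]
scores = [normalized_log_popularity(v, 10) for v in raw_views]
```

The raw counts differ by a factor of tens of thousands, while the transformed scores fall within a much narrower range, which is what makes the long-tailed distribution easier to model.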

Source: What makes an image popular?
A. Khosla, A. Das Sarma and R. Hamid.

Khosla et al. started by looking into visual features such as the colors present in an image, edge and gradient distributions, and the outputs of deep convolutional neural networks, machine learning models that have proved successful in image classification tasks such as the ImageNet Challenge. The results show that:

  • greenish and blueish colors tend to have lower importance when predicting image popularity compared to more reddish colors. This may be due to the fact that more striking colors attract more attention


Source: What makes an image popular?
A. Khosla, A. Das Sarma and R. Hamid.

  • convolutional neural networks and their outputs (object detection results) provide important insights into the popularity of an image. For instance, objects like miniskirt, bikini or perfume have a strong positive impact, while spatula, plunger or laptop have a rather negative impact on image popularity
  • when using all visual features available before the publication of the photo, the authors obtained a Spearman rank correlation of up to 0.4 (on a magnitude scale of 0 to 1), which suggests that the visual content of an image plays a role in its popularity.

They also analysed social cues, such as the mean number of views received by a user's photos, the number of tags and the length of the photo description. They concluded that social cues play an even more important role in popularity prediction than visual cues: the corresponding correlation reaches up to 0.77.
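As a rough illustration of what such social cues look like as model inputs, the sketch below derives the three features mentioned above from a list of photo records. The record layout (`user`, `views`, `tags`, `description`) and the sample data are assumptions made for this example and do not reflect the Flickr API or the authors' pipeline.

```python
from collections import defaultdict

def social_features(photos):
    """Build one feature dict per photo from simple social cues."""
    views_by_user = defaultdict(list)
    for photo in photos:
        views_by_user[photo["user"]].append(photo["views"])
    # Note: this mean includes the photo's own views; a real experiment
    # would exclude them to avoid leaking the prediction target.
    mean_views = {user: sum(v) / len(v) for user, v in views_by_user.items()}
    return [
        {
            "user_mean_views": mean_views[photo["user"]],
            "tag_count": len(photo["tags"]),
            "description_length": len(photo["description"]),
        }
        for photo in photos
    ]

# Hypothetical records for three photos from two users.
photos = [
    {"user": "alice", "views": 100, "tags": ["cat", "sun"], "description": "My cat"},
    {"user": "alice", "views": 300, "tags": [], "description": ""},
    {"user": "bob", "views": 10, "tags": ["street"], "description": "hello"},
]
features = social_features(photos)
```

Features of this kind are available before a photo gathers any views of its own, which is why they can be used for prediction at upload time.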

Finally, they implemented a demo that anyone can use to predict the popularity of their photo before it is published online. They also showed a few sample images arranged according to their true popularity as well as their predicted popularity:


Source: What makes an image popular?
A. Khosla, A. Das Sarma and R. Hamid.