Dissecting the Big Data Twitter Community through a Big Data Lens

Twitter communities generate several kinds of activity: tweets, retweets, replies, and follows. The retweet graph is a good representation of the actual connections in the network and their strengths, as well as of how information propagates through the network.
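
As a minimal sketch of how such a retweet graph could be assembled, the snippet below assumes each retweet is available as a (retweeter, original author) pair. The function name, the record layout, and the sample records are hypothetical and are not taken from the original analysis.

```python
from collections import Counter, defaultdict

def build_retweet_graph(retweets):
    """Build a weighted, directed retweet graph.

    `retweets` is an iterable of (retweeter, original_author) pairs.
    An edge author -> retweeter follows the direction in which
    information travels; its weight counts how often that happened.
    """
    weights = Counter()
    for retweeter, author in retweets:
        if retweeter != author:  # ignore self-retweets
            weights[(author, retweeter)] += 1
    graph = defaultdict(dict)
    for (author, retweeter), count in weights.items():
        graph[author][retweeter] = count
    return dict(graph)

# Hypothetical sample records, for illustration only.
sample = [("bob", "alice"), ("carol", "alice"), ("bob", "alice"), ("dave", "bob")]
graph = build_retweet_graph(sample)
print(graph)
```

The edge weights capture the strength of each connection, and the edge direction captures the path along which a tweet spread.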

Is it a Small World Network?

As the following plot shows, the retweet distribution follows a power law, while the edge distribution comes close to a power law but falls short. The network is therefore close to a scale-free network.
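
One rough way to check a power-law claim numerically is sketched below. It assumes the per-account retweet counts are available as a list of positive integers, and it uses the continuous maximum-likelihood estimator for the exponent, which is only an approximation for integer data. The function names and sample counts are hypothetical; the original analysis may have relied on the plot alone.

```python
import math
from collections import Counter

def degree_histogram(values):
    """Map each observed value to the number of nodes that have it."""
    return dict(sorted(Counter(values).items()))

def power_law_exponent(values, x_min=1):
    """Continuous maximum-likelihood estimate of the exponent alpha.

    alpha = 1 + n / sum(ln(x / x_min)) over all x >= x_min.
    For integer counts this is only an approximation.
    """
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one value above x_min")
    return 1 + len(tail) / log_sum

# Hypothetical retweet counts per account, for illustration only.
retweet_counts = [1, 1, 1, 1, 2, 2, 3, 5, 8, 40]
print(degree_histogram(retweet_counts))
print(round(power_law_exponent(retweet_counts), 2))
```

On a log-log plot, a power law shows up as a roughly straight line. A distribution that bends away from that line in the tail, as the edge distribution does here, falls short of one.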



However, the network has a very high diameter of 154 and a mean path length of 11. Hence, it is not a small-world network. Furthermore, its clustering coefficient is very small (0.0009953724), which suggests that there is very little cross chatter in the network. So the Big Data retweets do not create a cohesive community.
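
The three measures used here can be sketched with the standard library alone. The code below assumes an undirected graph stored as a dictionary of neighbor sets, and it uses the global clustering coefficient (transitivity). Both choices are assumptions: the article does not say whether the graph was treated as directed or which clustering definition produced the reported value.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Hop counts from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def path_stats(adj):
    """Return (diameter, mean shortest path length) over reachable pairs."""
    longest, total, pairs = 0, 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, (total / pairs if pairs else 0.0)

def transitivity(adj):
    """Closed triples divided by all connected triples."""
    closed, triples = 0, 0
    for node, neighbors in adj.items():
        for u, w in combinations(neighbors, 2):
            triples += 1
            if w in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0

# Hypothetical toy graph: a triangle (a, b, c) with d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
diameter, mean_path = path_stats(adj)
print(diameter, mean_path, transitivity(adj))
```

A small-world network combines short paths with high clustering. The figures reported above show long paths and almost no clustering, so the network fails on both counts.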

How can I get more Retweets?


When we talk about retweets, this is the question on everyone’s mind. The plot shows the number of tweets per day on the X axis and the number of followers on the Y axis (log scale); each point’s size and color are determined by the number of retweets the account has received.

According to the plot, having a lot of followers helps and is necessary, but it is not sufficient. However, tweeting a lot seems to help: most tweeps who tweet more than ten times a day have received at least 10 retweets. (Retweets are not included.)
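
The "more than ten tweets a day" observation can be checked with a simple ratio. The sketch below assumes one record per account with hypothetical field names `tweets_per_day` and `retweets`; the sample accounts are made up for illustration.

```python
def share_with_retweets(accounts, min_tweets_per_day=10, min_retweets=10):
    """Fraction of frequent tweeters that reached `min_retweets`.

    Returns None when no account tweets more than `min_tweets_per_day`.
    """
    frequent = [a for a in accounts if a["tweets_per_day"] > min_tweets_per_day]
    if not frequent:
        return None
    hits = sum(1 for a in frequent if a["retweets"] >= min_retweets)
    return hits / len(frequent)

# Hypothetical accounts, for illustration only.
accounts = [
    {"name": "u1", "tweets_per_day": 25, "retweets": 40},
    {"name": "u2", "tweets_per_day": 12, "retweets": 11},
    {"name": "u3", "tweets_per_day": 15, "retweets": 2},
    {"name": "u4", "tweets_per_day": 1, "retweets": 300},
]
share = share_with_retweets(accounts)
print(share)
```

A share well above one half would support the word "most" in the claim.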

Are Tweet Bots Useful?

Are retweet bots (e.g. BigDataTweetBot, NoSQLDigest) useful, or do they just create noise by retweeting things blindly? Let us investigate. To understand who the key connectors in the network are, let’s look at betweenness centrality, which measures the role each node plays in connecting the network. @Espenel takes first place and @KirkDBorne takes fourth. Second and third place go to Twitter bots (BigDataTweetBot, NoSQLDigest), which suggests that Twitter bots are indeed useful.
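
Betweenness centrality counts, for each node, how many shortest paths between other nodes pass through it. A minimal sketch using Brandes' algorithm on an unweighted, undirected graph follows. Treating the retweet graph as undirected and unweighted is an assumption, and the toy graph and node names are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # nodes in order of distance from s
        preds = {v: [] for v in adj}    # predecessors on shortest paths
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while order:                    # accumulate from the farthest node back
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}

def top_connectors(adj, k=4):
    """The k nodes with the highest betweenness, best first."""
    scores = betweenness(adj)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy graph: a hub that links three otherwise unconnected users.
adj = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
scores = betweenness(adj)
print(scores, top_connectors(adj, k=1))
```

In the toy graph every path between two users runs through the hub, which is the role a well-placed retweet bot can play.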

What Is the Community Talking About?


The following word cloud shows the most frequently used words. It contains most of the usual suspects, such as links to IoT and cloud, businesses, marketing, etc. Among companies, Google, Intel, and IBM are mentioned.

Interestingly, we do not see any of the big data tools. It is possible that related discussions happen in their own hashtags such as #hadoop and #spark.
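
The word frequencies behind a word cloud can be sketched as follows. The tokenization rule, the tiny stop-word list, and the sample tweets are all assumptions made for illustration; the original analysis does not describe its preprocessing.

```python
import re
from collections import Counter

# A deliberately small, hypothetical stop-word list.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}

def word_counts(tweets):
    """Count words across tweets, keeping hashtags and mentions intact."""
    counts = Counter()
    for text in tweets:
        text = re.sub(r"https?://\S+", " ", text.lower())  # drop links
        for token in re.findall(r"[#@]?\w+", text):
            if token not in STOP_WORDS:
                counts[token] += 1
    return counts

# Hypothetical tweets, for illustration only.
tweets = [
    "Big data and IoT http://example.com/a",
    "IoT in the cloud",
    "cloud and IoT",
]
counts = word_counts(tweets)
print(counts.most_common(2))
```

Running the same count on tool names such as hadoop or spark would show directly how rarely they appear under this hashtag.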

The following are the most retweeted tweets over the whole period:

  1. #bigdata to our users !!! check the new keyword suggestions for an improved
  2. 4 predictive #analytics and practical applications for the everyday marketer (422)
  3. marrying #data to #analytics a major theme at #hp’s conference  (153)
  4. combining analytics and security to treat vulnerabilities like ants (150)
  5. sbi uses big data mining to check defaults biz loss: when state bank of india  (141)

The following are the most retweeted tweets by day. We list only tweets that received more than 75 retweets in a single day; the number of retweets received appears in parentheses.

  1. Aug 05: guidelines to optimize #bigdata transfers (89)
  2. Aug 10: #nfl taps #bigdata to study #concussions but major game changes far off (139)
  3. Aug 10: sbi uses big data mining to check defaults biz loss: when state bank of india (sbi)  (140)
  4. Aug 12: #iot facts + how to make business sense of the internet of things (85)
  5. Aug 18: idf 2015: intel teams with google to bring realsense to project tango (113)
  6. Aug 18: marrying #data to #analytics a major theme at #hp’s conference (152)
  7. Aug 19: combining analytics and security to treat vulnerabilities like ants: bill franks chief analytics off (149)
  8. Aug 20: qantas annual profit soars to au$975m: australia’s flying kangaroo is out of the red having boosted (115)
  9. Aug 20: top news: sap oem on twitter: “top 10 #bigdata twitter handles to follow @merv (78)
  10. Aug 22: five open source big data projects to watch (132)
  11. Aug 22: 3 ways that big data are used to study #climatechange (126)
  12. Aug 22: should #bigdata be used to measure #employee #productivity? (110)
  13. Aug 23: e-commerce market #analytics to #ebay #amazon #alibaba sellers and buyers
  14. Aug 23: should #bigdata be used to measure #employee #productivity? (134)
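
A per-day list like the one above can be produced by counting retweets per (day, tweet) pair and keeping those above the threshold. The sketch below assumes one (day, text) record per retweet, with days written as ISO date strings so that they sort chronologically. The function name and the sample records are hypothetical.

```python
from collections import Counter

def top_tweets_by_day(retweets, threshold=75):
    """Return (day, text, count) rows with more than `threshold` retweets in a day.

    `retweets` is an iterable of (day, text) pairs, one per retweet.
    Rows are ordered by day, then by descending count.
    """
    counts = Counter(retweets)
    rows = [(day, text, n) for (day, text), n in counts.items() if n > threshold]
    return sorted(rows, key=lambda row: (row[0], -row[2]))

# Hypothetical records with a low threshold, for illustration only.
sample = (
    [("2015-08-10", "tweet a")] * 3
    + [("2015-08-10", "tweet b")] * 2
    + [("2015-08-05", "tweet c")] * 4
)
rows = top_tweets_by_day(sample, threshold=2)
print(rows)
```

Because the count is taken per day, a tweet that stays popular for two days appears twice, as the productivity tweet does on Aug 22 and Aug 23.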

One interesting observation is that most trending tweets were about use cases, not about tools or techniques.

Conclusion


  1. A few well-known tweeps receive a lot of retweets, and the top three each roughly have their own communities.
  2. The network is roughly scale-free, but it is not a small-world network. Nodes are weakly connected, which suggests non-cohesive communities.
  3. A large number of followers is a necessary but not a sufficient condition to receive a lot of retweets. Tweeting a lot seems to help.
  4. Tweet bots are centrally placed and likely useful.
  5. Most retweeted tweets seem to focus on use cases.

Bio: Srinath Perera is a scientist, software architect, and programmer who works on distributed systems.