KDnuggets Home :: News » 2011 » Jan » Software » How Twitter Uses NoSQL  ( < Prev | 11:n02 | Next > )
Latest News

How Twitter Uses NoSQL


How Twitter manages Big Data (12 TB/day) with Scribe, Hadoop, Pig, Cloudera, and other software

ReadWriteWeb, By Klint Finley / January 2, 2011

Twitter InfoQ has released a video of Twitter's Kevin Weil speaking at Strange Loop earlier this year on how the company uses NoSQL. Weil is quick to point out that Twitter is heavily dependent on MySQL. However, Twitter does employ NoSQL solutions for many purposes for which MySQL isn't ideal. According to Weil, Twitter users generate 12 terrabytes of data a day - about four petabytes per year. And that amount is multiplying every year. ...

Syslog stopped scaling for Twitter after a while, so instead it uses Scribe, a log collection framework created and open-sourced by Facebook. ... Twitter uses Scribe to write logs to Hadoop. Scribe made it so easy for Twitter to log data, it started to log much more data. It now logs 80 different categories of data.

Twitter needs to store more data per day than it can reliably write to a single hard drive, so it needs to store data on clusters. Twitter uses Cloudera's Hadoop distribution to power its clusters.

Weil says MySQL isn't efficient at doing analytics at the scale Twitter needs. Instead, Twitter uses Hadoop and its own open source project called FlockDB. Hadoop can run analytics and hit FlockDB in parallel to assemble social graph aggregates.

A Pig Script

Read more.

KDnuggets Home :: News » 2011 » Jan » Software » How Twitter Uses NoSQL  ( < Prev | 11:n02 | Next > )

Copyright © 2014 KDnuggets.  | SUBSCRIBE to KDnuggets News email  | Tweet Twitter | facebook Facebook | RSS RSS | About KDnuggets