The latest KDnuggets Poll asked:
What was the largest database / dataset you analyzed?
Comparing the results of the 2011 poll with a similar 2010 poll, "Largest Database Data Mined / Analyzed", we see that the median dataset size in 2011 is in the 10-20 GB range, while the median in 2010 was in the 8-10 GB range.
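The poll reports each median as a size range (10-20 GB, 8-10 GB), not as a single value. That suggests the answers were collected in size buckets. Below is a minimal sketch, assuming respondents are grouped into buckets ordered from smallest to largest, of how to find the bucket that contains the median: add up the counts and stop at the first bucket whose running total reaches half of all responses. The function name `median_bin` and the example counts are hypothetical and are not the poll's data.

```python
from itertools import accumulate


def median_bin(labels: list[str], counts: list[int]) -> str:
    """Return the label of the bucket that contains the median response.

    `labels` must be ordered from smallest to largest size range, and
    `counts[i]` is the number of respondents in bucket `labels[i]`.
    For an even total this picks the bucket holding the lower median.
    """
    if len(labels) != len(counts) or not counts:
        raise ValueError("labels and counts must be non-empty and the same length")
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one respondent")
    half = total / 2
    # Walk the running total until it first reaches half of all responses.
    for label, running in zip(labels, accumulate(counts)):
        if running >= half:
            return label
    return labels[-1]  # not reached: the final running total equals `total`


# Hypothetical toy counts for illustration only (NOT the poll's numbers).
labels = ["<1 GB", "1-10 GB", "10-100 GB", "100 GB-1 TB", "1 TB+"]
counts = [5, 20, 30, 15, 10]
print(median_bin(labels, counts))  # 10-100 GB
```

With 80 toy responses, the running totals are 5, 25 and 55. Half of the responses (40) is first reached in the third bucket, so that bucket holds the median.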
Largest dataset analyzed in 2011 vs 2010
We note steady growth in the share of analysts with experience in web-scale datasets.
In 2011, about 35.4% reported analyzing databases larger than 100 GB (vs. 32.2% in 2010), and 21.4% reported analyzing data over 1 terabyte (vs. 18.3% in 2010).
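To put numbers on "steady growth", the sketch below computes the year-over-year change in the two shares reported above. Each change is given both in percentage points and relative to the 2010 figure. It uses only the percentages from the poll. The helper name `change` is illustrative.

```python
# Shares reported in the poll, as percent of respondents: (2010, 2011).
shares = {
    "over 100 GB": (32.2, 35.4),
    "over 1 TB": (18.3, 21.4),
}


def change(old: float, new: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent of `old`)."""
    pp = new - old
    return pp, 100 * pp / old


for label, (y2010, y2011) in shares.items():
    pp, rel = change(y2010, y2011)
    print(f"{label}: +{pp:.1f} pp, +{rel:.1f}% relative")
# over 100 GB: +3.2 pp, +9.9% relative
# over 1 TB: +3.1 pp, +16.9% relative
```

Both shares rose by roughly 3 percentage points. Because the terabyte group started from a smaller base, its relative growth was larger.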
The regional breakdown shows that US/Canada leads in the percentage of data miners who have worked with terabyte-range datasets (about 30%).
(Note: the Australia/NZ region is not included because too few responses were received.)
| Region (voters) | Largest dataset analyzed (median) | % who analyzed TB+ data |
|---|---|---|
| US/Canada (53) | 11-100 GB | 30.2% |
| Europe (49) | 11-100 GB | 18.4% |
| Asia (20) | 1-10 GB | 10% |
| Latin America (15) | 1 GB | 6.7% |
| Africa/Middle East (7) | 1-10 GB | 28.6% |
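The regional percentages are based on small numbers of voters. One way to see how few respondents lie behind each figure is to work backward from the table: find the smallest whole number of voters whose share, rounded to one decimal, matches the reported percentage. This is a minimal sketch that uses only the table's values. The helper name `implied_count` is hypothetical.

```python
# Rows copied from the regional table: region -> (voters, % who analyzed TB+ data).
table = {
    "US/Canada": (53, 30.2),
    "Europe": (49, 18.4),
    "Asia": (20, 10.0),
    "Latin America": (15, 6.7),
    "Africa/Middle East": (7, 28.6),
}


def implied_count(voters: int, pct: float) -> int:
    """Smallest whole number of respondents whose share rounds to `pct` (one decimal)."""
    for k in range(voters + 1):
        if round(100 * k / voters, 1) == pct:
            return k
    raise ValueError(f"no whole count out of {voters} rounds to {pct}%")


for region, (voters, pct) in table.items():
    print(f"{region}: {implied_count(voters, pct)} of {voters}")
```

The implied counts are 16, 9, 2, 1 and 2 respondents. This means the Asia, Latin America and Africa/Middle East percentages each rest on only one or two people, so the 28.6% for Africa/Middle East should be read with caution.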
Here is another breakdown of the largest dataset analyzed, by region.
See the next KDnuggets Poll: "Which data mining/analytic tools you used in the past 12 months for a real project".
It would be interesting to know which tools these analysts use to analyze terabytes of data.