The latest KDnuggets Poll asked:
What was the largest database / dataset you analyzed?
Comparing the results of the 2011 poll with a similar 2010 poll, "Largest Database Data Mined / Analyzed", we see that the median dataset size in 2011 is in the 10-20 GB range, while the 2010 median was in the 8-10 GB range.
Largest dataset analyzed in 2011 vs 2010
We note steady growth in the share of analysts with experience in the web-scale range of datasets.
In 2011, about 35.4% reported analyzing databases over 100 GB (vs 32.2% in 2010), and
21.4% reported analyzing databases over 1 terabyte (vs 18.3% in 2010).
The regional breakdown shows that US/Canada leads in the percentage of data miners who have worked with terabyte-range datasets (about 30%).
(Note: the Australia/NZ region is not included because too few responses were received.)
Region (voters) | Largest dataset analyzed (median) | % who analyzed 1 TB+ |
---|---|---|
US/Canada (53) | 11-100 GB | 30.2% |
Europe (49) | 11-100 GB | 18.4% |
Asia (20) | 1-10 GB | 10% |
Latin America (15) | 1 GB | 6.7% |
Africa/Middle East (7) | 1-10 GB | 28.6% |
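To see how many respondents stand behind each regional percentage, a minimal Python sketch can convert the table's figures back into approximate counts. The voter totals and percentages are copied from the table; rounding to the nearest whole respondent is an assumption, since the poll reports only percentages, and the helper name `tb_respondents` is hypothetical.

```python
# Voter counts and "% who analyzed 1 TB+" values copied from the table above.
regions = {
    "US/Canada": (53, 30.2),
    "Europe": (49, 18.4),
    "Asia": (20, 10.0),
    "Latin America": (15, 6.7),
    "Africa/Middle East": (7, 28.6),
}


def tb_respondents(voters: int, pct: float) -> int:
    """Hypothetical helper: estimate how many voters analyzed 1 TB+ data.

    Assumes the published percentage was rounded from a whole-number count,
    so rounding voters * pct / 100 recovers that count.
    """
    return round(voters * pct / 100)


counts = {name: tb_respondents(v, p) for name, (v, p) in regions.items()}

for name, (voters, pct) in regions.items():
    n = counts[name]
    # Recompute the percentage from the whole count to check it matches the table.
    print(f"{name}: {n} of {voters} voters ({100 * n / voters:.1f}%)")
```

Under this assumption, the 28.6% figure for Africa/Middle East corresponds to about 2 of 7 voters, so percentages for regions with few votes rest on very few respondents.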
Here is another breakdown of Largest Dataset Analyzed by region.
Comments:
Gregory Piatetsky
See the next KDnuggets Poll: Which data mining/analytic tools have you used in the past 12 months for a real project?
www.kdnuggets.com/2011/05/new-poll-analytics-data-mining-tools.html
Zoltán Prekopcsák
It would be interesting to know what tools they use for analyzing TBs of data.