Poll Results: Where is Big Data? For most, Largest Dataset Analyzed is in laptop-size GB range
A majority of data scientists (56%) work in the Gigabyte dataset range. We note a small increase in Petabyte (web-scale) data miners and a decline in Megabyte data miners. The US, Australia/NZ, and Asia lead in the percentage of Terabyte and Petabyte analysts.
The 2015 results, based on 459 votes, show a pattern very similar to previous years. It has remained surprisingly stable since 2012 and suggests that the majority of data scientists and analysts do not work with really big data.
- The majority of answers (52.8% in 2013, 54.3% in 2014, 55.6% in 2015) are in the Gigabyte range. For each year from 2012 to 2015, the median response was between 11 and 100 GB, which comfortably fits on one laptop.
- Slight growth in responses from web-scale "peta-data-miners" who have analyzed petabyte-scale databases (from 2.5% in 2013 to 4.6% in 2015).
- A small but significant gap, with almost no answers in the 1-10 PB range, separates analysts who work with Terabyte-size commercial data warehouses from those who work with multi-petabyte Internet-scale data stores.
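Because the poll asks respondents to pick a size range rather than report an exact number, the median response is itself a range: the one that contains the middle vote once all votes are ordered from smallest to largest dataset. The following is a minimal Python sketch of that calculation, assuming votes are tallied per range and listed in ascending size order; the function name `median_bin` and its input format are hypothetical and not taken from the poll's own methodology.

```python
from typing import Sequence


def median_bin(labels: Sequence[str], counts: Sequence[int]) -> str:
    """Return the label of the size range that holds the median vote.

    Hypothetical input format: `labels` and `counts` are aligned and ordered
    from the smallest to the largest dataset-size range.
    """
    if len(labels) != len(counts):
        raise ValueError("labels and counts must have the same length")
    total = sum(counts)
    if total == 0:
        raise ValueError("no votes to summarize")
    # With N votes ordered by size, take vote number ceil(N / 2) as the
    # (lower) median; (N + 1) // 2 equals ceil(N / 2) for integer N.
    target = (total + 1) // 2
    running = 0
    for label, count in zip(labels, counts):
        running += count
        if running >= target:
            return label
    raise AssertionError("unreachable: target never exceeds total")
```

For an even number of votes this returns the lower of the two middle votes' ranges; with hundreds of votes spread over a handful of ranges, both middle votes almost always fall in the same range anyway.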
To see the trends better, we grouped the answers into ranges for Megabytes (< 1 GB), Gigabytes (1-999 GB), Terabytes (1-999 TB), and Petabytes (1 PB and above). We call data scientists whose largest analyzed dataset falls in each range Mega-analysts, Giga-analysts, and so on.
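The grouping rule can be written as a small classification function. This is a minimal Python sketch, assuming sizes are expressed in gigabytes and use decimal units (1 TB = 1,000 GB, 1 PB = 1,000,000 GB); the function name `analyst_group` is hypothetical.

```python
def analyst_group(largest_dataset_gb: float) -> str:
    """Map the largest dataset analyzed (in GB) to an analyst group.

    Assumes decimal units: 1 TB = 1,000 GB and 1 PB = 1,000,000 GB.
    """
    if largest_dataset_gb < 0:
        raise ValueError("dataset size cannot be negative")
    if largest_dataset_gb < 1:            # below 1 GB
        return "Mega-analyst"
    if largest_dataset_gb < 1_000:        # 1-999 GB
        return "Giga-analyst"
    if largest_dataset_gb < 1_000_000:    # 1-999 TB
        return "Tera-analyst"
    return "Peta-analyst"                 # 1 PB and above
```

Using half-open intervals (each upper bound is exclusive) means every size falls into exactly one group, including values such as 999.5 GB that sit between the labeled ranges.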
The global share of Giga-analysts continued to increase slightly: 52.8% in 2013, 54.3% in 2014, and 55.6% in 2015. As expected, the share of Mega-analysts has steadily declined, from 26.1% in 2013 to 21.6% in 2015. The share of Tera-analysts has remained steady at 18.3-18.6% over the three years. We do see slight growth at the upper end, with Peta-analysts rising from 2.5% in 2013 to 4.6% in 2015.
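To put these shifts on a common scale, the sketch below computes, for each group, the absolute change in percentage points and the relative change from 2013 to 2015, using only the shares quoted above. Tera-analysts are left out because the text gives a range (18.3-18.6%) rather than per-year values; the names `SHARES` and `change_2013_to_2015` are hypothetical.

```python
# 2013 and 2015 shares (%) as quoted in the poll text. Tera-analysts are
# omitted because only a range (18.3-18.6%) is given, not per-year values.
SHARES = {
    "Mega-analysts": (26.1, 21.6),
    "Giga-analysts": (52.8, 55.6),
    "Peta-analysts": (2.5, 4.6),
}


def change_2013_to_2015(share_2013: float, share_2015: float) -> tuple[float, float]:
    """Return (change in percentage points, relative change in %), rounded to 0.1."""
    points = share_2015 - share_2013
    relative = 100.0 * points / share_2013
    return round(points, 1), round(relative, 1)


for group, (s2013, s2015) in SHARES.items():
    points, relative = change_2013_to_2015(s2013, s2015)
    print(f"{group}: {points:+.1f} points ({relative:+.1f}% relative)")
```

By this arithmetic, Mega-analysts fell by 4.5 points (about 17% in relative terms) and Giga-analysts gained 2.8 points (about 5%), while the 2.1-point gain for Peta-analysts is an 84% relative increase from a small base.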
Here is a similar chart just for the US, which shows growth in Giga- and Peta-analysts and a corresponding decline in Mega- and Tera-analysts.
Regional participation was:
- 42%, US/Canada
- 29%, Europe
- 18%, Asia
- 4.1%, Latin America
- 3.9%, AU/NZ
- 2.4%, Africa/MidEast
The chart below shows the distribution of largest dataset ranges by region, sorted by the percentage of TB+ answers. In US/Canada, 26.4% of analysts worked with TB+ datasets, followed by AU/NZ (22.2%), Asia (21.7%), and Europe (20.7%).
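Combining the regional participation figures with the 459 total votes gives a rough sense of how many respondents stand behind each regional TB+ percentage. This is a minimal sketch, assuming every vote reported a region and that the percentages apply to the same 459 votes; the resulting counts are rounded estimates, not published figures, and the names `estimate_counts` and `ranked` are hypothetical. Latin America and Africa/MidEast are omitted because no TB+ share is given for them.

```python
# Figures quoted in the poll text: total votes, regional participation (%),
# and the share (%) of each region's respondents with TB+ largest datasets.
TOTAL_VOTES = 459
PARTICIPATION = {"US/Canada": 42.0, "Europe": 29.0, "Asia": 18.0, "AU/NZ": 3.9}
TB_PLUS_SHARE = {"US/Canada": 26.4, "AU/NZ": 22.2, "Asia": 21.7, "Europe": 20.7}


def estimate_counts(region: str) -> tuple[int, int]:
    """Return rounded estimates of (respondents, TB+ respondents) for a region.

    Assumes all 459 votes reported a region; these are estimates only.
    """
    respondents = round(TOTAL_VOTES * PARTICIPATION[region] / 100)
    tb_plus = round(respondents * TB_PLUS_SHARE[region] / 100)
    return respondents, tb_plus


# Rank regions by TB+ share, highest first, as in the chart.
ranked = sorted(TB_PLUS_SHARE, key=lambda r: TB_PLUS_SHARE[r], reverse=True)
for region in ranked:
    respondents, tb_plus = estimate_counts(region)
    print(f"{region}: {TB_PLUS_SHARE[region]}% TB+, about {tb_plus} of ~{respondents}")
```

Under these assumptions, the AU/NZ figure of 22.2% corresponds to roughly 4 of about 18 respondents, so a single vote there moves the percentage by more than 5 points.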
Here are the results of past polls: