Poll Results: Where is Big Data? For most, Largest Dataset Analyzed is in laptop-size GB range
A majority of data scientists (56%) work in the Gigabyte dataset range. We note a small increase in Petabyte (web-scale) data miners and a decline in Megabyte data miners. The US, Australia/NZ, and Asia lead in the percentage of Terabyte and Petabyte analysts.
The latest KDnuggets Poll asked:
What was the largest dataset you analyzed / data mined?
The 2015 results, based on 459 votes, show a pattern very similar to previous years, one that has remained surprisingly stable since 2012 and suggests that the majority of data scientists and analysts do not work with really big data.
- The majority of answers (52.8% in 2013, 54.3% in 2014, 55.6% in 2015) are in the Gigabyte range. The median response was between 11 and 100 GB (which comfortably fits on one laptop) for each year from 2012 to 2015.
- Slight growth in responses from web-scale "peta-data-miners", who have analyzed petabyte-scale databases (from 2.5% in 2013 to 4.6% in 2015).
- A small but significant gap, with almost no answers in the 1-10 PB range, which separates analysts who work with Terabyte-size commercial data warehouses from those who work with multi-petabyte Internet-scale data stores.
To see the trends better, we grouped the answers into ranges for Megabytes (< 1GB), Gigabytes (1-999 GB), Terabytes (1-999 TB), and Petabytes (>1 PB). We will call data scientists whose largest dataset analyzed falls in each range Mega-analysts, Giga-analysts, etc.
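The grouping above can be sketched in a few lines of Python. This is only an illustration, not the code used for the poll: the function names `size_range` and `analyst_label` are hypothetical, decimal units (1 GB = 10^9 bytes) are an assumption, and a dataset of exactly 1 PB is assumed to count as Petabyte range.

```python
# Decimal (SI) units are an assumption; the poll does not state a convention.
GB = 10**9
TB = 10**12
PB = 10**15


def size_range(num_bytes: int) -> str:
    """Map a dataset size in bytes to one of the four poll ranges."""
    if num_bytes < 0:
        raise ValueError("size must be non-negative")
    if num_bytes < GB:
        return "Mega"   # under 1 GB
    if num_bytes < TB:
        return "Giga"   # 1-999 GB
    if num_bytes < PB:
        return "Tera"   # 1-999 TB
    return "Peta"       # 1 PB and up (boundary handling is an assumption)


def analyst_label(num_bytes: int) -> str:
    """Build the label used in the text, e.g. 'Giga-analyst'."""
    return f"{size_range(num_bytes)}-analyst"


if __name__ == "__main__":
    # A 50 GB dataset falls in the Gigabyte range.
    print(analyst_label(50 * GB))
```

With this mapping, a respondent whose largest dataset was 50 GB would be counted as a Giga-analyst, and one with 3 TB as a Tera-analyst.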
The global percentage of Giga-analysts continued to increase slightly: 52.8% in 2013, 54.3% in 2014, and 55.6% in 2015. The percentage of Mega-analysts has steadily declined (from 26.1% in 2013 to 21.6% in 2015), as can be expected. The share of Tera-analysts has remained steady at 18.3-18.6% over the 3 years. We do see slight growth at the upper end, with Peta-analysts rising from 2.5% in 2013 to 4.6% in 2015.
Here is a similar chart just for the US, which shows growth in Giga- and Peta-analysts and a corresponding decline in Mega- and Tera-analysts.
Regional participation was:
- 42%, US/Canada
- 29%, Europe
- 18%, Asia
- 4.1%, Latin America
- 3.9%, AU/NZ
- 2.4%, Africa/MidEast
The chart below shows the distribution of largest dataset ranges by region, sorted by % of TB+ answers. In US/Canada, 26.4% of analysts worked with TB+ datasets. Next is AU/NZ, where 22.2% worked on TB+ data, followed by Asia (21.7%) and Europe (20.7%).
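To put the regional percentages in terms of respondents, the rough arithmetic can be sketched as follows. It combines the 459 total votes, the regional participation shares, and the TB+ shares quoted above. The helper name `estimated_tb_plus_voters` is hypothetical, and the results are only approximate because the published percentages are rounded.

```python
TOTAL_VOTES = 459  # total votes in the 2015 poll

# region: (share of all voters, share of that region's voters with TB+ data)
# Only the four regions whose TB+ share is quoted in the text are included.
REGIONS = {
    "US/Canada": (0.42, 0.264),
    "AU/NZ": (0.039, 0.222),
    "Asia": (0.18, 0.217),
    "Europe": (0.29, 0.207),
}


def estimated_tb_plus_voters(total: int, participation: float, tb_plus: float) -> int:
    """Approximate number of voters in a region who reported TB+ datasets."""
    return round(total * participation * tb_plus)


estimates = {
    region: estimated_tb_plus_voters(TOTAL_VOTES, participation, tb_plus)
    for region, (participation, tb_plus) in REGIONS.items()
}

if __name__ == "__main__":
    for region, count in estimates.items():
        print(f"{region}: about {count} voters with TB+ datasets")
```

Under these assumptions the estimates come to about 51 voters for US/Canada, 28 for Europe, 18 for Asia, and 4 for AU/NZ.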
Here are the results of past polls: