Computing Platforms for Analytics, Data Mining, Data Science

The poll results suggest a split between a majority of data miners and data scientists who work with growing but still "PC-size", small GB-sized data, and a smaller group of Big Data analysts who work with cloud-sized data. Cloud computing, Unix, and especially Mac gained in popularity.

By Gregory Piatetsky, @kdnuggets.

The latest KDnuggets Poll was on

Computing platform for your analytics, data mining, data science work or research.

Despite the popularity of Big Data, the majority of data miners and data scientists work with "PC-size" data, and PC remains the most popular platform. However, the median responses for the number of cores and the amount of memory used have approximately doubled since the previous such poll in 2010, and Unix and especially Mac gained in usage.

The results of the poll are based on 282 voters.

The Venn diagram below shows the relative popularity of PC/Laptop (85%), Server (30%), and Cloud platforms (24%), and also the overlaps.
Interestingly, despite the Big Data hype, PC remains the most popular platform for data mining and analytics work, although a significant share of the work is now done on servers and in the cloud.

Fig 1: Platform Popularity for Analytics / Data Mining: PC vs Server vs Cloud.

The average data miner in this poll used 1.4 platforms.
54% used only a PC/laptop, 9% used only a department server, and 5% used only the cloud.
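
The 1.4 figure is consistent with the platform shares quoted above. Because a voter could select more than one platform, the shares add up to more than 100%, and their sum divided by 100 gives the mean number of platforms per voter. Here is a minimal Python sketch of that arithmetic, assuming the three shares in the text are the complete set of platform choices (the variable names are illustrative, not from the poll):

```python
# Share of voters (in percent) who selected each platform, from the text above.
platform_share_pct = {"PC/Laptop": 85, "Server": 30, "Cloud": 24}

# Each voter contributes one count per platform selected, so the sum of the
# shares divided by 100 is the mean number of platforms per voter.
avg_platforms = sum(platform_share_pct.values()) / 100

print(f"Average platforms per voter: {avg_platforms:.2f}")
```

The result is about 1.39, which rounds to the 1.4 reported.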

Among those who used cloud computing, 59% used a private cloud, 46% used a public cloud, and 6% used both.

The next poll question was on processing power.

How many processors/cores your typical data mining job actually uses?
1 core (69)  24%
2 cores (45)  16%
3-4 cores (86)  30%
5-16 cores (51)  18%
17-64 cores (17)  6%
> 64 cores (14)  5%

The median number of cores is 3-4.
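
For binned answers like these, the median is the first bin at which the running vote count reaches half of all votes. Here is a short Python sketch using the counts from the table above; the function name `median_bin` and the data layout are illustrative choices, not part of the poll:

```python
# Vote counts per answer, copied from the poll table above (lowest bin first).
votes = [
    ("1 core", 69),
    ("2 cores", 45),
    ("3-4 cores", 86),
    ("5-16 cores", 51),
    ("17-64 cores", 17),
    ("> 64 cores", 14),
]


def median_bin(binned_votes):
    """Return the label of the first bin whose cumulative count reaches half the total."""
    total = sum(count for _, count in binned_votes)
    running = 0
    for label, count in binned_votes:
        running += count
        if 2 * running >= total:
            return label
    raise ValueError("no votes")


total_votes = sum(count for _, count in votes)
median_label = median_bin(votes)
print(total_votes, median_label)
```

The counts sum to the 282 voters reported earlier. The first two bins together hold 114 votes, fewer than half, so the median falls in the 3-4 cores bin.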

Although my note on the poll said "if you have 8-core processor but your job only uses 1, choose 1", I suspect most voters ignored this. The most common answer is 3-4 cores, which probably corresponds to the number of cores in the CPU (many popular laptops/PCs now have 4 cores), and I doubt that so many data mining jobs were parallelized well enough to actually use all 4 cores.

Comparing with a similar 2010 KDnuggets poll, Computer configuration for your main Analytics / Data Mining machine, we see that the median number of cores approximately doubled, from 2 in 2010 to 3-4 in 2015.

The next poll question was on memory.