What Data You Analyzed – KDnuggets Poll Results and Trends

Image/video data analysis is surging, JSON is replacing XML, anonymized data usage is growing in the US and Europe (but not in Asia), and itemsets and Twitter analysis are declining: some of the highlights of the KDnuggets Poll on data types used.



Over 600 readers voted in the latest KDnuggets Poll, which asked:
What data types you analyzed in the past 12 months?

Here are the highlights:
  • Table, Text, and Time Series remained the most popular types of data used
  • the usage of image/video is surging (186% up)
  • anonymized data use is growing in the US, Canada, and Europe (but not in Asia)
  • JSON usage is up, replacing XML (whose usage is down)
  • Itemsets/transaction analysis is down 24% (association rule algorithms are being replaced by more complex analysis)
  • web log data analysis is declining - perhaps because large sites are relying more on Google Analytics.
Fig. 1: Data Types Analyzed, 2017


The most popular data types analyzed in 2017 were:
  1. Table data (fixed n. columns), 69.8%, first place, as in the past polls
  2. Text, 46.4%, moved up to 2nd place compared to 2014
  3. Time series, 45.6%, dropped to 3rd place
  4. JSON, 25.5%, up from 7th place
  5. Anonymized data, 22.8%, up from 10th place
  6. Location/geo, 22.6%


Compared with the similar 2014 KDnuggets Poll, Data Types/Sources Analyzed, the largest increases in share of responses were for:
  • Images / video, from 4.9% to 14.1%, 186% up
  • Anonymized data, from 14.0% to 22.8%, 63% up
  • Other, from 7.2% to 11.2%, 56% up
  • JSON, from 17.0% to 25.5%, 50% up
  • Location/geo, from 19.7% to 22.6%, 14.9% up
The biggest decreases, compared to the 2014 poll, were for:
  • Itemsets / transactions, from 26.5% to 20.1%, 24% down
  • Web clickstream/web log, from 12.5% to 10.0%, 20% down
  • Twitter, from 17.8% to 14.7%, 17% down
  • XML, from 14% to 12%, 14% down
  • Table data (fixed n. columns), from 76.9% to 69.8%, 9.3% down
Fig. 2: Data Types Analyzed, 2017 vs 2014
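
The "up" and "down" percentages in the lists above appear to be relative changes in share of responses, not differences in percentage points. A minimal sketch of that arithmetic follows, assuming the formula (new - old) / old; the function name `relative_change` and the `shares` dictionary are illustrative, not part of the poll. Because the published shares are rounded, recomputed values can differ slightly from the figures quoted in the article.

```python
def relative_change(old_share: float, new_share: float) -> float:
    """Percent change of new_share relative to old_share."""
    return (new_share - old_share) / old_share * 100.0


# Shares of responses (in %) as published above: (2014, 2017).
shares = {
    "JSON": (17.0, 25.5),
    "Anonymized data": (14.0, 22.8),
    "Itemsets / transactions": (26.5, 20.1),
    "Web clickstream/web log": (12.5, 10.0),
}

# Relative change per data type, positive for growth, negative for decline.
changes = {name: relative_change(old, new) for name, (old, new) in shares.items()}

for name, change in changes.items():
    direction = "up" if change > 0 else "down"
    print(f"{name}: {abs(change):.0f}% {direction}")
```

For example, JSON went from 17.0% to 25.5%, a gain of 8.5 percentage points, which is a 50% relative increase.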


Regional distribution of the 632 voters:
  • US/Canada, 36.6%
  • Europe, 34.2%
  • Asia, 17.4%
  • Africa/Middle East, 4.4%
  • Latin America, 4.3%
  • Australia/NZ, 3.2%
Next, we compared the shares of the top 3 most popular data types, along with anonymized data and image/video, across the 3 largest regions.

Fig. 3: Popular Data Types Analyzed by region, 2017


Some observations:
  • US data analysts use tabular data the most;
  • text usage is similar across regions;
  • Asian data analysts use much less anonymized data, but slightly lead in image/video data analysis.