KDnuggets Home » Polls » Largest Data Size Data-Mined (May 2007)

Largest Data Size Data-Mined

Largest database or dataset you data-mined was

Size range              2007 count   2006 count (estimated)
over 10 Terabytes               15            7
1.1 to 10 Terabytes             22           14
101 GB to 1 Terabyte            21
11 to 100 GB                    35
1.1 to 10 GB                    32
101 MB to 1 GB                  22
11 to 100 MB                     8
1.1 to 10 MB                    12
less than 1 MB                   6
Total                          173          181


Note from the Editor: The above graph compares the results from the 2007 poll with the results from a similar 2006 KDnuggets Poll on largest database data-mined.

We see growth at the top end. In 2007, 37 respondents (about 21%) reported mining databases of 1 terabyte or more, about double the 11.5% who dealt with terabyte-size databases in 2006. Also, the median of the largest database size mined was in the 30-60 GB range in 2007, while for 2006 it was in the 2-4 GB range, a growth of one order of magnitude!
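
As a rough check on the figures above, the following Python sketch recomputes the terabyte share and locates the median bucket from the 2007 counts listed in the poll. The function names are illustrative, and the sketch only identifies which bucket holds the median respondent; it does not attempt the finer 30-60 GB estimate quoted in the note.

```python
# 2007 poll counts, ordered from smallest to largest size bucket
counts_2007 = [
    ("less than 1 MB", 6),
    ("1.1 to 10 MB", 12),
    ("11 to 100 MB", 8),
    ("101 MB to 1 GB", 22),
    ("1.1 to 10 GB", 32),
    ("11 to 100 GB", 35),
    ("101 GB to 1 Terabyte", 21),
    ("1.1 to 10 Terabytes", 22),
    ("over 10 Terabytes", 15),
]


def median_bucket(counts):
    """Return the label of the bucket containing the median respondent."""
    total = sum(n for _, n in counts)
    midpoint = (total + 1) / 2  # 1-based position of the median respondent
    running = 0
    for label, n in counts:
        running += n
        if running >= midpoint:
            return label
    raise ValueError("counts must not be empty")


def share_at_or_above(counts, first_label):
    """Fraction of respondents in first_label and all larger buckets."""
    labels = [label for label, _ in counts]
    start = labels.index(first_label)
    total = sum(n for _, n in counts)
    return sum(n for _, n in counts[start:]) / total


total_2007 = sum(n for _, n in counts_2007)
tb_share = share_at_or_above(counts_2007, "1.1 to 10 Terabytes")

print(total_2007)                   # number of 2007 respondents
print(median_bucket(counts_2007))   # bucket holding the median respondent
print(f"{tb_share:.1%}")            # share in the two terabyte buckets
```

With these counts the median respondent falls in the 11 to 100 GB bucket, which is consistent with the 30-60 GB range given in the note, and the two terabyte buckets together hold 37 of 173 respondents.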

Will Dwinell, Measuring Data Size
Measuring the size of the data is tricky, since:
  1. A given set of data may be stored more or less efficiently. This can make a difference of several orders of magnitude.
  2. Measured size will vary, depending on whether data in multiple tables is joined or not.
  3. The original data set may be very much larger than the one that is actually digested downstream, given activities like sampling and feature selection.
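
A small Python sketch can illustrate points 1 and 3. The data here is made up (100,000 random integers), and the three storage layouts are only examples chosen for illustration, not formats mentioned in the comment. The gap in this toy case is a modest factor; how large it gets in practice depends on the data and the formats involved. Point 2 (joins) is not covered.

```python
import random
import struct
import zlib

# Hypothetical data: 100,000 integer measurements (values are made up).
random.seed(0)
values = [random.randint(0, 999) for _ in range(100_000)]

# Point 1: the same values stored three different ways.
# Fixed-width text, 12 characters per value, one value per line.
as_text = "\n".join(f"{v:>12d}" for v in values).encode("ascii")
# Packed little-endian 2-byte integers.
as_binary = struct.pack(f"<{len(values)}h", *values)
# The packed form, additionally compressed with zlib.
as_compressed = zlib.compress(as_binary, 9)

# Point 3: only a 1% sample is passed to the downstream analysis.
sample = random.sample(values, k=len(values) // 100)
sample_binary = struct.pack(f"<{len(sample)}h", *sample)

for name, blob in [
    ("padded text", as_text),
    ("packed binary", as_binary),
    ("compressed binary", as_compressed),
    ("1% sample, packed", sample_binary),
]:
    print(f"{name:>20}: {len(blob):>9,d} bytes")
```

The same 100,000 values occupy about 1.3 MB as padded text and 200 KB as packed binary, and the sampled set handed downstream is only 2 KB, so the "size of the data" depends heavily on where and how it is measured.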

TimManns, largest database analysed
The largest single database table I access is 900GB (62 days of data, 70 million rows per day), but I commonly access data from several different tables. The sum of these tables is probably several terabytes.
The output of my analysis (churn, cross-sell, fraud detection, etc.) is usually a single flat table of approx. 1 GB, which is stored for a month.
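
For a sense of scale, the figures in this comment can be turned into a back-of-the-envelope estimate. The sketch below assumes decimal gigabytes (1 GB = 10**9 bytes), which the comment does not specify, and treats the approximate sizes as exact.

```python
DAYS = 62
ROWS_PER_DAY = 70_000_000
TABLE_BYTES = 900 * 10**9   # assumption: decimal gigabytes
OUTPUT_BYTES = 1 * 10**9    # the approx. 1 GB flat result table

total_rows = DAYS * ROWS_PER_DAY
bytes_per_row = TABLE_BYTES / total_rows
output_fraction = OUTPUT_BYTES / TABLE_BYTES

print(f"{total_rows:,} rows in the table")
print(f"about {bytes_per_row:.0f} bytes per row")
print(f"result table is about {output_fraction:.2%} of the source table")
```

Under these assumptions the table holds about 4.3 billion rows at roughly 200 bytes each, and the stored result is on the order of a tenth of a percent of the largest source table.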
