Comments
Note from the Editor:
The above graph compares the results of the 2007 poll with those of a similar
2006 KDnuggets Poll on the largest database data-mined.
We see growth at the top end. In 2007, 37 respondents (22%) reported mining databases of 1 terabyte or more, about double the 11.5% who dealt with terabyte-size databases in 2006.
Also, the median largest database size mined in 2007 was in the 30-60 GB range, compared with the 2-4 GB range in 2006: growth of about one order of magnitude!
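To see where "one order of magnitude" comes from, here is a small worked calculation on the two median ranges. It assumes (our assumption, not the poll's) that each range is best summarized by its geometric midpoint, which suits size bins that double in width; the function name is purely illustrative.

```python
import math

def growth_in_orders_of_magnitude(old_range, new_range):
    """log10 of the ratio between the geometric midpoints of two size ranges (in GB).

    Assumption for illustration: a range (lo, hi) is represented by sqrt(lo * hi).
    """
    old_mid = math.sqrt(old_range[0] * old_range[1])
    new_mid = math.sqrt(new_range[0] * new_range[1])
    return math.log10(new_mid / old_mid)

# Median ranges of the largest database mined, as reported in the poll (GB)
median_2006 = (2, 4)
median_2007 = (30, 60)

growth = growth_in_orders_of_magnitude(median_2006, median_2007)
print(f"growth = {growth:.2f} orders of magnitude")  # prints "growth = 1.18 orders of magnitude"
```

The midpoints are about 2.8 GB and 42 GB, a 15-fold increase. Even comparing range ends, the growth lies between 30/4 = 7.5x and 60/2 = 30x, i.e. roughly 0.9 to 1.5 orders of magnitude.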
Will Dwinell, Measuring Data Size
Measuring the size of the data is tricky, since:
- A given set of data may be stored more or less efficiently. This can
make a difference of several orders of magnitude.
- Measured size will vary, depending on whether data in multiple tables
is joined or not.
- The original data set may be much larger than the one actually
digested downstream, given activities like sampling and
feature selection.
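The first two points can be illustrated with a toy example. This is a sketch on made-up data (random transactions and a hypothetical 200-byte customer profile), not anything from the poll: it measures the same rows stored as CSV text, as packed binary, and as compressed binary, and then compares keeping two tables separate against joining them into one flat table.

```python
import random
import struct
import zlib

# Made-up data for illustration only: 100,000 transactions,
# each a (customer_id, amount_in_cents) pair.
random.seed(0)
N_CUSTOMERS = 1000
rows = [(random.randrange(N_CUSTOMERS), random.randrange(100_000)) for _ in range(100_000)]

# Point 1: the same rows stored three different ways.
csv_bytes = "".join(f"{cid},{amt}\n" for cid, amt in rows).encode("ascii")
binary_bytes = b"".join(struct.pack("<II", cid, amt) for cid, amt in rows)  # 8 bytes per row
compressed_bytes = zlib.compress(binary_bytes, 9)

# Point 2: a hypothetical customer table with a 200-byte text profile per customer.
profiles = {cid: "x" * 200 for cid in range(N_CUSTOMERS)}
# Stored as two tables: transactions (8 bytes/row) + customers (4-byte id + profile).
separate_size = len(binary_bytes) + sum(4 + len(p) for p in profiles.values())
# Joined into one flat table: every transaction row repeats its customer's profile.
joined_size = sum(8 + len(profiles[cid]) for cid, _ in rows)

for label, size in [
    ("CSV text", len(csv_bytes)),
    ("packed binary", len(binary_bytes)),
    ("zlib-compressed binary", len(compressed_bytes)),
    ("two separate tables", separate_size),
    ("one joined table", joined_size),
]:
    print(f"{label:>24}: {size:>12,} bytes")
```

Here the joined table (20,800,000 bytes) is about 20 times the size of the two separate tables (1,004,000 bytes). The encoding differences on this random data are modest; highly repetitive real-world fields typically compress far better, but the actual ratios depend entirely on the data.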
TimManns, largest database analysed
The largest single database table I access is 900GB (62 days of data, 70
million rows per day), but I commonly access data from several different
tables. The sum of these tables is probably several terabytes.
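As a rough sanity check, the average row width implied by the 900 GB figure can be worked out directly. This sketch assumes decimal units (1 GB = 10^9 bytes) and that the 900 GB covers the row data only; with binary units (2^30 bytes per GB) the result would be about 223 bytes per row.

```python
# Figures from the comment above; decimal GB is an assumption.
TABLE_SIZE_GB = 900
DAYS = 62
ROWS_PER_DAY = 70_000_000

total_rows = DAYS * ROWS_PER_DAY                       # 4,340,000,000 rows
bytes_per_row = TABLE_SIZE_GB * 10**9 / total_rows     # assumes 1 GB = 10**9 bytes
print(f"{total_rows:,} rows, ~{bytes_per_row:.0f} bytes per row")
# prints "4,340,000,000 rows, ~207 bytes per row"
```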
The conclusions of my analysis (churn, cross-sell, fraud detection, etc.)
are usually a single flat table of approx. 1 GB, which is stored for a
month.