KDnuggets Home » Polls » Largest database data-mined Poll (Jun 2009)

Largest database data-mined Poll


What was the largest database or dataset you data-mined? [185 votes total]

less than 1 MB (5) 3%
1.1 to 10 MB (9) 5%
11 to 100 MB (12) 6%
101 MB to 1 GB (19) 10%
1.1 to 10 GB (46) 25%
11 to 100 GB (35) 19%
101 GB to 1 Terabyte (30) 16%
1.1 to 10 Terabytes (17) 9%
over 10 Terabytes (12) 6%

The median database size is 10-20 GB, the same as in the 2008 KDnuggets Poll: largest database or dataset you data-mined.
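As a rough check on the median figure, the sketch below locates the bin containing the middle vote and interpolates within it. The vote counts are taken from the results above. The bin edges in GB, the nominal edges for the two open-ended bins, and the choice of interpolation are assumptions made for illustration, not necessarily how the poll's median was computed.

```python
# Vote counts copied from the poll results; bin edges (in GB) are assumed,
# with nominal edges for the open-ended first and last bins.
BINS = [
    ("less than 1 MB", 0.0, 0.001, 5),
    ("1.1 to 10 MB", 0.001, 0.01, 9),
    ("11 to 100 MB", 0.01, 0.1, 12),
    ("101 MB to 1 GB", 0.1, 1.0, 19),
    ("1.1 to 10 GB", 1.0, 10.0, 46),
    ("11 to 100 GB", 10.0, 100.0, 35),
    ("101 GB to 1 Terabyte", 100.0, 1000.0, 30),
    ("1.1 to 10 Terabytes", 1000.0, 10000.0, 17),
    ("over 10 Terabytes", 10000.0, float("inf"), 12),
]


def median_bin(bins):
    """Return (label, low, high, fraction) for the bin holding the median vote.

    `fraction` is how far into that bin the halfway point of all votes falls.
    """
    total = sum(count for _, _, _, count in bins)
    if total == 0:
        raise ValueError("no votes")
    half = total / 2
    cumulative = 0
    for label, low, high, count in bins:
        if count > 0 and cumulative + count >= half:
            return label, low, high, (half - cumulative) / count
        cumulative += count
    raise ValueError("median bin not found")


total_votes = sum(count for _, _, _, count in BINS)
label, low, high, fraction = median_bin(BINS)

# The bins grow by a factor of 10, so interpolate on a log scale;
# a linear estimate is shown for comparison.
log_estimate = low * (high / low) ** fraction
linear_estimate = low + (high - low) * fraction

print(f"total votes: {total_votes}")
print(f"median bin: {label}")
print(f"log-scale estimate: {log_estimate:.1f} GB")
print(f"linear estimate: {linear_estimate:.1f} GB")
```

Under these assumptions the middle vote falls just inside the 11 to 100 GB bin, and both estimates land between 10 and 20 GB, consistent with the stated median.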

We note that the 2009 poll ran for a longer period of time, so it received more votes.

We observe the same number of votes in the over 10 Terabytes range, but significantly more votes in the 101 GB to 1 Terabyte range and especially the 1.1 to 10 Terabytes range.

Olumide Sonubi, largest database or dataset
We have worked with two of the largest telecoms and one of the top ten banks in Europe in the last 2 years on various data mining projects. I can categorically say we have never worked on petabytes, mainly terabytes, even when working on CDR data or WAP logs. We tend to extract data from the warehouse based on the project objectives, which brings the data down to a manageable size.
The largest data mining project, for a telecom client with approximately 5 billion monthly CDRs over a 12 to 18 month period, came to less than 500 terabytes.

Tim Manns, why not the petabyte range?
I'm probably working on about 20-30 terabytes. Large telcos in the US, Europe or Asia would have many times this amount, perhaps into the petabyte range.
Although there aren't many, some data warehouses are reaching petabytes. Maybe you should raise the scale to include that option. I'd love to see how many responses you get :)

(Editor: will add petabyte range in the next poll) For comparison, here are the results
