KDnuggets : Polls : Data preparation (Oct 2003)
Poll
What % of time in your data mining project(s) is spent on data cleaning and preparation? [187 votes total]

over 80% (46) 25%
61 to 80% (73) 39%
41 to 60% (46) 25%
21 to 40% (7) 4%
20% or less (15) 8%

Comments

Karl Brazier, The blip at the bottom
Suspect there may be a small peak at the bottom end caused by model induction researchers like myself. Doesn't mean we don't think cleaning is important, just that our remit is to focus elsewhere. So we'll just have one or two new data sets to clean at the start of our work and probably supplement these with some of the cleaner sets from the UCI Repository. At the time of writing, I think I see this blip beginning to form. Well anyway, there's the offer of an explanation if it does.

Copyright © 2003 KDnuggets.