Data preparation in data mining projects


What % of time in your data mining project(s) is spent on data cleaning and preparation? [187 votes total]

over 80%: 46 votes (25%)
61 to 80%: 73 votes (39%)
41 to 60%: 46 votes (25%)
21 to 40%: 7 votes (4%)
20% or less: 15 votes (8%)
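
The percentages above are each answer's vote count divided by the 187 total votes. Below is a minimal Python sketch that recomputes them from the counts and also combines the two top answers. The variable names are illustrative only and are not part of the poll.

```python
# Vote counts from the poll above (187 votes total).
votes = {
    "over 80%": 46,
    "61 to 80%": 73,
    "41 to 60%": 46,
    "21 to 40%": 7,
    "20% or less": 15,
}

total = sum(votes.values())  # 187

# Share of respondents per answer, rounded to the nearest whole percent.
shares = {answer: round(100 * n / total) for answer, n in votes.items()}

# Combined share of respondents who spend more than 60% of project time on preparation.
over_60 = round(100 * (votes["over 80%"] + votes["61 to 80%"]) / total)

for answer, pct in shares.items():
    print(f"{answer:>12}: {votes[answer]:3d} votes, {pct}%")
print(f"More than 60%: {over_60}%")
```

Because each share is rounded separately, the five percentages add up to 101% rather than 100%. Taken together, roughly two thirds of respondents (119 of 187) report spending more than 60% of their time on cleaning and preparation.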


Karl Brazier, The blip at the bottom
I suspect there may be a small peak at the bottom end (the "20% or less" answer) caused by model induction researchers like myself. That doesn't mean we think cleaning is unimportant, just that our remit is to focus elsewhere. We'll typically have one or two new data sets to clean at the start of our work, and will probably supplement these with some of the cleaner sets from the UCI Repository. At the time of writing, I think I can see this blip beginning to form. Anyway, if it does, that's one possible explanation.

