By Tim Graettinger, Ph.D., Jan 2011
Did anyone ever tell you that you have "dirty data"? Were you offended? Surprised? Did you vow to do something about it? Or did you just resign yourself to living with it?
Missing data is one common reason a data set earns the unpleasant moniker "dirty data". In this installment of my series on the nuts and bolts of data mining, we'll tackle missing data. We'll start by detecting it. From there, we'll diagnose the issues, both qualitatively and quantitatively. With a proper diagnosis, we can then prescribe a treatment for each of the situations that crop up. You'll walk away with a mental framework and a set of tools and techniques that are invaluable for real-world data mining applications.
In this article, Tim Graettinger covers:
- What is Missing Data - and Why Do We Care?
- Instruments of Detection
- Diagnosis: Missing
- Prescription for Better Results
- Intensive Care
- Post-Op