Discovery Corps, April 1, 2012, by Tim Graettinger.
In the data mining process, where do data scientists like you and me add the most value? Is it in exploring the data, uncovering anomalies and seeing relationships between elements? In selecting transformations for the elements to improve their representations for modeling and analysis? In building sophisticated predictive models?
I believe we contribute the greatest value by framing the problem. We can do stellar technical work, but if the problem is framed poorly, it is all a pointless exercise, perhaps even a solution to the wrong problem. Conversely, even a mediocre model, applied to the right, well-framed problem, will provide immediate benefit to an organization.
What do I mean by "framing the problem"? Simply put, it means to clearly, explicitly define what the problem is and is not. When I frame a problem, I work through a checklist that includes these five key questions:
- What is the unit of analysis?
- Who/what is the population of interest?
- What is the outcome?
- What is the time frame?
- How will we measure success?
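One way to keep this checklist honest is to write the answers down in a fixed structure and refuse to start modeling until every question has an answer. The Python sketch below illustrates that habit; it is an assumption, not a method from the original article. The class name `ProblemFrame`, its field names, and the customer-retention example are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class ProblemFrame:
    """Answers to the five framing questions (hypothetical field names)."""
    unit_of_analysis: str  # What is the unit of analysis?
    population: str        # Who/what is the population of interest?
    outcome: str           # What is the outcome?
    time_frame: str        # What is the time frame?
    success_measure: str   # How will we measure success?

    def unanswered(self) -> list[str]:
        """Return the names of questions left blank (empty or whitespace-only)."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical example: a customer-retention problem that is still missing
# a definition of success.
frame = ProblemFrame(
    unit_of_analysis="customer account",
    population="accounts active for at least 90 days",
    outcome="account cancelled (yes/no)",
    time_frame="cancellation within the next 60 days",
    success_measure="",
)
print(frame.unanswered())  # ['success_measure']
```

The point of the structure is not the code itself but the forcing function: a blank field is a question the team has not yet agreed on.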