Discovery Corps, April 1, 2012, by Tim Graettinger.
In the data mining process, where do data scientists like you and me add the most value? Is it in exploring the data, uncovering anomalies and seeing relationships between elements? In selecting transformations for the elements to improve their representations for modeling and analysis? In building sophisticated predictive models?
I believe we contribute the greatest value by framing the problem. We can do stellar technical work, but if the problem is framed poorly, it's all a pointless exercise, and we may even end up solving the wrong problem. Conversely, even a mediocre model, applied to the right, well-framed problem, will provide immediate benefit to an organization.
What do I mean by "framing the problem"? Simply put, it means to clearly, explicitly define what the problem is and is not. When I frame a problem, I work through a checklist that includes these five key questions:
- What is the unit of analysis?
- Who/what is the population of interest?
- What is the outcome?
- What is the time frame?
- How will we measure success?
By answering these questions, we frame the problem. In the rest of this article, we will look in depth at the first two of these "framing questions." We will tackle the remaining three in the next article in the series. Let's get started!