KDnuggets Home » News » 2012 » Jul » Publications » Framing the Data Mining Problem - Part 2  ( < Prev | 12:n17 | Next > )

Framing the Data Mining Problem - Part 2

Tim Graettinger focuses on three key questions that help to clearly and explicitly define the problem to be solved at the outset of a data mining project.

By Tim Graettinger, Discovery Corps Inc, July 2012

In Part 1 of this mini-series, we posed the question,

"Where in the data mining process do humans like you and me - the data scientists - add the most value?"

My response was that we contribute the greatest value by framing the problem well. In this context, framing means defining, clearly and explicitly, what the problem is and is not. My framing checklist includes these five key questions:

  • What is the unit of analysis?
  • Who/what is the population of interest?
  • What is the outcome?
  • What is the time frame?
  • How will we measure success?
Part 1 addressed the first two framing questions above. In this article, we will tackle the remaining three topics. Keep in mind that framing is an ongoing conversation between you and your client. It is a continual process of discovery and refinement. Nothing is more important to the success of your project, because ...

The solution you build is determined by the way you frame the problem.

What is the Outcome?

Outcome. Output. Observed result. For us, these terms are synonymous. They all answer the question, "What happened?" For instance, in telecommunications, an outcome of frequent interest is renewal - did a customer renew their contract or terminate it? In the world of non-profit fundraising, response to a marketing campaign is an outcome. That is, did a prospective donor make a contribution or not? Both of these outcomes have a yes/no flavor to them. Outcomes can be more diverse than yes or no, however. For a residential real estate application, we might choose the selling price, in dollars, as the outcome.

Perhaps the selected outcomes above seem fairly obvious, and in certain instances they are. Other times, the situation is not so clear-cut, and a choice must be made - consciously, and with buy-in from your client.

Consider the notion of renewal in fundraising. Gifts to a non-profit are freely made on a date chosen by the donor. There is no termination of services if no gift is made. Nevertheless, it is useful for organizations to think about donors who make gifts on a regular, "renewing" basis and those who lapse. Stop reading for a moment and ruminate on what outcome you might define for this situation.

You did stop and think for at least a minute, didn't you? Go ahead, you'll thank me later. In my work with various non-profit fundraisers, we have typically defined a "lapsed" donor as one who has not made a gift for at least 13 months. This choice is appropriate and pretty common in fundraising circles since many donors make contributions on an annual basis. Making the time window 13 months (rather than 12) gives ample room for the vagaries of human behavior.
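To make the 13-month rule concrete, here is a minimal sketch in Python of how a "lapsed" outcome might be computed from each donor's most recent gift date. The function names, the donor IDs, the sample dates, and the choice to count whole calendar months are all assumptions made for illustration; they are not taken from any particular client project, and your own definition should be agreed with your client.

```python
from datetime import date

# Window from the article: no gift for at least 13 months means "lapsed".
LAPSE_MONTHS = 13


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later` (an assumed convention)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # If the day-of-month has not been reached yet, the last month is incomplete.
    if later.day < earlier.day:
        months -= 1
    return months


def is_lapsed(last_gift: date, as_of: date, window: int = LAPSE_MONTHS) -> bool:
    """Yes/no outcome: True when the donor has gone `window` or more months without a gift."""
    return months_between(last_gift, as_of) >= window


# Hypothetical donors and last-gift dates, evaluated as of a chosen reference date.
last_gifts = {
    "donor_a": date(2011, 5, 20),
    "donor_b": date(2012, 1, 10),
}
as_of = date(2012, 7, 1)
outcomes = {donor: is_lapsed(gift, as_of) for donor, gift in last_gifts.items()}
```

Note that the outcome depends on the reference date (`as_of`) as much as on the window length, so both are worth stating explicitly when you define the outcome. The whole-month count is only approximate around month ends, such as a gift made on the 31st.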

