KDnuggets Home » News » 2012 » Nov » Publications » Tim Graettinger: Data Mining Misconceptions The 50-50 Problem  ( < Prev | 12:n26 | Next > )

Tim Graettinger: Data Mining Misconceptions
The 50-50 Problem


 
  
When creating a predictive model, we need to "tune" it to reflect how our client weighs false alarms against missed opportunities. Where we set the cut-off point between 'promising' and 'unpromising' depends largely on which of the two concerns the client more.


By Tim Graettinger, Discovery Corps, Nov 2012

... Some misconceptions arise from simple errors in logic. Often, they stem from a lack of familiarity or experience. None are particularly technical problems. All are easily remedied with simple examples and simple explanations. In this article, I will focus on one misconception that I call the "50/50 problem."

An Example of the 50/50 Problem
Recently, I was working with a very bright, energetic client in the biotech industry. Her firm builds imaging equipment and provides services to pharmaceutical companies. The imaging equipment (calling it a complex, microscope-like camera is far too wordy) generated data that she wanted to use to classify chemical compounds as promising or unpromising candidates for drugs. It turns out that in the vast world of chemical compounds, there are more unpromising drug candidates than promising ones - a lot more. My job was to use data mining techniques to create a classifier (a mathematical formula or a set of rules) that would successfully distinguish promising drug candidates from unpromising ones - using data produced by the imaging equipment.

After some initial work, I presented a classifier to my client. I happily reported that the classifier correctly labeled promising compounds as promising 10% of the time. My client was completely underwhelmed. Her knee-jerk response was, "But you can do 50% just by flipping a coin!"

The 50/50 Problem in a Nutshell

Is a misconception becoming evident? My client, like many intelligent people, made a simple error in thinking. She made the assumption that, because there were two possible outcomes (promising and unpromising), then the outcomes were both 50% likely. This is the "50/50 problem."
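To make the arithmetic concrete, here is a minimal sketch of what a coin flip would deliver when promising compounds are rare. Two things in it are assumptions, not figures from the project: the 1% base rate is purely illustrative (all we know is that unpromising compounds outnumber promising ones "a lot"), and success is read as the share of compounds labeled promising that truly are promising. The function name `coin_flip_outcomes` is likewise hypothetical.

```python
def coin_flip_outcomes(n_compounds: int, base_rate: float) -> dict:
    """Expected results of labeling each compound 'promising' by a fair coin flip.

    base_rate is the assumed fraction of compounds that are truly promising.
    """
    n_promising = n_compounds * base_rate
    n_unpromising = n_compounds - n_promising

    # A fair coin labels half of each group 'promising', regardless of the truth.
    true_alarms = 0.5 * n_promising
    false_alarms = 0.5 * n_unpromising
    flagged = true_alarms + false_alarms

    return {
        "flagged": flagged,
        "truly_promising_among_flagged": true_alarms,
        # Share of flagged compounds that are truly promising.
        "precision": true_alarms / flagged,
    }


# Illustrative numbers only: 10,000 compounds, 1% of them truly promising.
result = coin_flip_outcomes(n_compounds=10_000, base_rate=0.01)
print(result)
```

Under these assumptions, the coin flags half of all compounds, and the share of flagged compounds that are truly promising simply equals the base rate. The two outcomes are not equally likely, so the coin does not get to 50%.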

My own theory is that many of us are victims of our own education. All of my probability textbooks introduced the subject with discussions about flipping coins. With that as a starting point, perhaps it's no wonder that people make the 50/50 assumption without even thinking about it.


