| Author |
Message |
prakash.sridharan Contributor
Joined: 16 Jan 2008 Posts: 5 Location: Mumbai
|
Posted: Thu Jan 15, 2009 9:58 am Post subject: Use of Unsupervised models for fraud detection |
|
|
Hi,
Has anybody come across the use of cluster analysis or other unsupervised models for fraud detection? I'm facing a specific challenge: using an unsupervised model to detect fraud. The dataset has no variable that can serve as a target (dependent) variable.
This is in the field of health insurance. I'm unable to divulge many details, but the variables we have are similar to those encountered in banking (credit card fraud, etc.).
Thanks in advance for your help.
Prakash |
|
TimManns Data Mining Guru
Joined: 25 Sep 2006 Posts: 37 Location: Sydney
|
Posted: Sun Jan 18, 2009 4:24 pm Post subject: yes, try searching for info on 'network intrusion detection' |
|
|
Over time or at a specific point in time?
- If over a period of time (looking for changing or new events):
Some of the early virus detection programs worked this way (i.e., anything new on the system was considered a virus or problem).
I've done a couple of projects using Kohonen networks (self-organizing maps) to observe a large sample and build a model, then watch for changes in any individual. Kohonen was useful because you can fit the clusters on a 2D grid map and apply flexible criteria about how many X or Y points an individual moves over time from their original cluster classification.
If an individual changes clusters, this may suggest a change in behaviour and/or fraudulent behaviour (a stolen credit card, etc.).
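The cluster-change idea above can be sketched roughly as follows. This is a minimal pure-Python Kohonen map, not the setup used in those projects: the function names, the linear decay schedules, the Manhattan grid distance and the `max_shift` threshold are all assumptions made for illustration, and inputs are assumed to be scaled to the 0-1 range.

```python
import math
import random


def best_matching_unit(weights, row):
    """Grid coordinate of the node whose weight vector is closest to `row`."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], row)),
    )


def train_som(rows, width, height, epochs=30, seed=0):
    """Fit a small Kohonen map; returns {(x, y): weight_vector}."""
    rng = random.Random(seed)
    dim = len(rows[0])
    weights = {
        (x, y): [rng.random() for _ in range(dim)]
        for x in range(width)
        for y in range(height)
    }
    total_steps = epochs * len(rows)
    step = 0
    for _ in range(epochs):
        for row in rng.sample(rows, len(rows)):  # shuffled pass over the data
            frac = step / total_steps
            lr = 0.5 * (1.0 - frac)  # learning rate decays linearly towards 0
            radius = 0.5 + (max(width, height) / 2) * (1.0 - frac)
            bx, by = best_matching_unit(weights, row)
            for (x, y), w in weights.items():
                grid_d2 = (x - bx) ** 2 + (y - by) ** 2
                pull = lr * math.exp(-grid_d2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += pull * (row[i] - w[i])
            step += 1
    return weights


def grid_shift(weights, before, after):
    """Manhattan distance on the grid between an individual's two positions."""
    x0, y0 = best_matching_unit(weights, before)
    x1, y1 = best_matching_unit(weights, after)
    return abs(x1 - x0) + abs(y1 - y0)


def has_moved(weights, before, after, max_shift=1):
    """True when the individual moved more than `max_shift` grid cells."""
    return grid_shift(weights, before, after) > max_shift
```

In use, one would train the map once on a baseline sample, then call `has_moved` with each individual's earlier and later feature vectors. Anyone who drifts more than `max_shift` cells from their original position becomes a candidate for review, not a confirmed fraud case.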
- If at a snapshot (a fixed point in time):
Just cluster the data. Any rows that do not cluster easily may be fraudulent or exceptional for some reason. Simple checking of outliers often helps too.
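One way to read "rows that do not cluster easily" is "rows far from their assigned cluster centre". The sketch below uses plain k-means for that; the choice of k-means, the function names and the `top_n` cut-off are assumptions for illustration, since the post does not name a specific clustering algorithm.

```python
import math
import random


def kmeans(rows, k, iters=20, seed=0, init=None):
    """Plain k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(rows, k)
    centroids = [list(c) for c in start]

    def nearest(row):
        return min(range(k), key=lambda j: math.dist(row, centroids[j]))

    for _ in range(iters):
        labels = [nearest(row) for row in rows]
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(row) for row in rows]  # final assignment
    return centroids, labels


def centroid_distances(rows, centroids, labels):
    """Distance from each row to the centroid of its assigned cluster."""
    return [math.dist(row, centroids[lab]) for row, lab in zip(rows, labels)]


def least_clustered(rows, k, top_n=1, **kwargs):
    """Indices of the `top_n` rows farthest from their assigned centroid."""
    centroids, labels = kmeans(rows, k, **kwargs)
    dist = centroid_distances(rows, centroids, labels)
    return sorted(range(len(rows)), key=lambda i: dist[i], reverse=True)[:top_n]
```

A caveat on this design: a very extreme row can end up with a centroid of its own and then score a distance near zero, so very small clusters are worth inspecting alongside the distance ranking.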
Cheers
Tim |
|
r_bhatt
Joined: 20 Jan 2009 Posts: 1
|
Posted: Tue Jan 20, 2009 7:50 am Post subject: Unsupervised models for fraud detection |
|
|
Prakash:
One could use fuzzy logic and clustering to solve the problem.
First, create variables that measure potentially abnormal patterns in claims behavior. Examples of such variables:
- Number of claims filed through the physician/service provider in the past month, 3 months, and 6 months
- Number of claims filed for a medical condition code, divided by the historical average number of claims for that code
- Number of claims filed by customer ID
- Disparity features that measure the mismatch between age, gender, occupation, and medical condition code (e.g., osteoporosis claims filed by a 20-year-old male student); this has to be done per medical condition code
Each claim can then be scored on a percentile basis on each variable (looking back 3-6 months). You can then cluster the claims on these percentile scores to find outliers, or use simple scoring algorithms based on business understanding.
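The percentile scoring step could look roughly like this. It is only a sketch of the "simple scoring" option: the feature names and values are hypothetical, and averaging the percentiles with equal weights is an assumption that business knowledge would normally replace.

```python
from bisect import bisect_right


def percentile_ranks(values):
    """Percentage of values that are <= each value, on a 0-100 scale."""
    ordered = sorted(values)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, v) / n for v in values]


def score_claims(claims, features):
    """Average percentile across features; higher means more unusual."""
    ranks = {f: percentile_ranks([c[f] for c in claims]) for f in features}
    return [
        sum(ranks[f][i] for f in features) / len(features)
        for i in range(len(claims))
    ]


# Hypothetical feature names and values, for illustration only.
FEATURES = ["provider_claims_3m", "condition_ratio", "customer_claims_6m"]
claims = [
    {"provider_claims_3m": 12, "condition_ratio": 0.9, "customer_claims_6m": 1},
    {"provider_claims_3m": 15, "condition_ratio": 1.1, "customer_claims_6m": 2},
    {"provider_claims_3m": 14, "condition_ratio": 1.0, "customer_claims_6m": 1},
    {"provider_claims_3m": 90, "condition_ratio": 4.2, "customer_claims_6m": 9},
]
scores = score_claims(claims, FEATURES)
suspect = max(range(len(claims)), key=lambda i: scores[i])  # highest-scoring claim
```

The per-feature percentile vectors in `ranks` could equally be passed to a clustering routine instead of being averaged, which is the other option mentioned above.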
You can contact me (raj at knowledgefoundry.net) if you need more details.
cheers |
|
clifton.phua
Joined: 10 Jul 2007 Posts: 3
|
Posted: Tue Jan 20, 2009 8:27 am Post subject: peer group analysis, spike detection, anomaly detection |
|
|
| If your purpose is to detect fraud in real time, you could try peer group analysis, spike detection, or anomaly detection approaches (they have been applied to various kinds of fraud detection before). However, you still need class labels to evaluate your models/algorithms. |
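As a rough illustration of these ideas, the sketch below scores a new observation against an entity's own history (spike detection) or against comparable entities (a much simplified stand-in for peer group analysis), then uses labels to check how many of the top-ranked cases were real fraud. The z-score formulation, the function names and the precision-at-k metric are assumptions for illustration, not something prescribed in the post.

```python
from statistics import mean, pstdev


def _z(reference, value):
    """Z-score of `value` against a reference sample."""
    sd = pstdev(reference)
    if sd == 0:
        return 0.0 if value == mean(reference) else float("inf")
    return (value - mean(reference)) / sd


def spike_score(history, current):
    """How unusual the current count is versus the entity's own past counts."""
    return _z(history, current)


def peer_score(peer_values, value):
    """How unusual an entity is versus comparable entities in the same period."""
    return _z(peer_values, value)


def precision_at_k(scores, labels, k):
    """Share of confirmed fraud (label 1) among the k highest-scoring cases."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(labels[i] for i in top) / k
```

The last function is where the class labels come in: without at least a small set of confirmed cases there is no way to tell whether the highest scores correspond to fraud or merely to unusual but legitimate behaviour.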
|
prakash.sridharan Contributor
Joined: 16 Jan 2008 Posts: 5 Location: Mumbai
|
Posted: Fri Jan 23, 2009 1:31 am Post subject: |
|
|
Raj,
Thank you very much for your suggestion. It's a very interesting idea, and I would like to learn more about it. I'll contact you.
Clifton,
This is the approach we are exploring at the moment.
Thanks all for your suggestions. |
|