Poll
Industries/fields where you applied data mining in the past 12 months [111 voters, 278 votes total]

| Industry/field (votes) | Share of voters |
| CRM (43) | 39.1% |
| Fraud Detection (24) | 21.8% |
| Direct Marketing/ Fundraising (22) | 20.0% |
| Credit Scoring (21) | 19.1% |
| Biotech/Genomics (17) | 15.5% |
| Web content mining/Search (15) | 13.6% |
| Other (15) | 13.6% |
| Telecom (14) | 12.7% |
| Web usage mining (12) | 10.9% |
| Science (12) | 10.9% |
| Insurance (12) | 10.9% |
| Retail (11) | 10.0% |
| Investment / Stocks (11) | 10.0% |
| Medical/ Pharma (8) | 7.3% |
| Manufacturing (7) | 6.4% |
| Government/Military (7) | 6.4% |
| e-commerce (6) | 5.5% |
| Travel/Hospitality (5) | 4.5% |
| Security / Anti-terrorism (5) | 4.5% |
| Health care/ HR (5) | 4.5% |
| Junk email / Anti-spam (2) | 1.8% |
| Entertainment/ Music (2) | 1.8% |
| Banking (1) | 0.9% |

Note: percentages are relative to the number of voters.
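
As a rough illustration of that note, the sketch below computes an option's share as votes divided by voters, so shares can sum to well over 100% when each voter picks several fields. The function name and the base of 110 voters are assumptions made for this example: the listed figures are reproduced with 110, whereas 111 would give 38.7% for CRM. That base is inferred from the arithmetic and is not stated by the poll.

```python
def share_of_voters(votes: int, voters: int) -> float:
    """Percentage of voters who picked an option, rounded to one decimal."""
    if voters <= 0:
        raise ValueError("voters must be positive")
    return round(100 * votes / voters, 1)


# Assumed base (inferred from the listed percentages, not stated in the poll).
ASSUMED_VOTERS = 110

# A few vote counts taken from the poll results.
sample = {"CRM": 43, "Fraud Detection": 24, "Banking": 1}

shares = {field: share_of_voters(n, ASSUMED_VOTERS) for field, n in sample.items()}

for field, pct in shares.items():
    print(f"{field}: {pct}%")
```

Dividing by voters rather than by total votes answers "what fraction of respondents work in this field", which is the more natural reading for a multiple-choice poll.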
Oren Etzioni, Farecast.com
On June 27th, Farecast announced the Public Beta launch of
Farecast.com,
the first and only airfare prediction site on the Web. Their predictions
are based on data-mining methods originally developed at the University
of Washington by Prof. Oren Etzioni in collaboration with his student
Alex Yates as well as Dr. Craig Knoblock and Rattapoom Tuchinda of USC.
Gunnar Blix, Financials / Lending
Financials / Lending is absent from the list. There are a number of
interesting applications in that area, including default and prepayment
risk, pipeline conversion, etc. Some applications may fall under CRM
and marketing, but certainly not all.
Karl Brazier, Other Fields
I recently worked on a study to try out some data mining in a social
policy studies application.
See http://www.ccp.uea.ac.uk/publications.asp, paper 06-1, for a social
science paper, or
http://www.actapress.com/Content_Of_Proceeding.aspx?ProceedingID=303,
paper 468-089, for a bit more explanation of the DM.
(Sorry, no free
download for this one. If you're really keen, e-mail me at
karl.brazier2(a)norwich-union.co.uk and I'll see what I can do.)
I think this is a potentially rich and currently under-exploited field
for DM research. It has generated a lot of data from questionnaire
surveys, often not targeted at answering a specific question, and
continues to do so. The data are usually a complex mixture of categorical,
numerical and free-text fields, and the number of fields is often high.
There may also be a need to induce models on different versions of the
outcome variable, because its best definition is not universally
agreed.
Opening up this field seems to require overcoming
resistance from its strongly classical statistical culture, which is
rather sceptical of an approach that searches hypothesis spaces instead
of following the traditional propose-then-test route. But I think DM is so well
suited to the material, both the data and the problems posed, that this
resistance needs to be challenged.