B
Joined: 24 Aug 2006 Posts: 1
|
Posted: Thu Aug 31, 2006 10:28 am Post subject: selecting Training data |
|
|
Hi,
Background:
We have a quality control procedure, during which we collect metrics, and after which we calculate an ROI (return on investment). We introduced a less rigorous procedure for less demanding situations and our ROI plummeted!
I've been given the task of seeing whether any data mining technique can help explain what we should change in the less rigorous procedure to improve ROI. (We have someone else running statistical analyses, and he has not yet found any particular aspect that needs to be changed.)
My question:
I understand that in a "real" data mining application, one creates models on a subset of the data and then evaluates performance on a different subset (to avoid overfitting, so the model remains valid for future data). But since my main goal is to see what the model tells me about our existing data, would I be justified in creating my models using all the currently available data as training data?
BTW, in case it makes any difference ... I'm a rank beginner, having had only one short undergraduate course in data mining. And I'm using Weka to do my task.
TIA for any help ...
B |
editor Site Admin
Joined: 04 Oct 2005 Posts: 120 Location: Boston, MA
|
Posted: Thu Aug 31, 2006 4:38 pm Post subject: Can you create models using all training data |
|
|
The short answer is NO. If you do that, your models are likely to overfit the data and report much better accuracy than you should expect on new data. As a test, try your procedure on randomized data, where you randomly permute the values of your key variable (e.g. ROI), and see if you get results as good as with the original data. If you do, the model is fitting noise rather than a real relationship.
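The permutation test described above can be sketched roughly as follows. This is only an illustration, not something from the original reply: the data is synthetic, the helper names (training_accuracy_1nn, permutation_check) are made up for the example, and a 1-nearest-neighbour model stands in for whatever model you actually build in Weka.

```python
import random


def training_accuracy_1nn(rows, labels):
    """Accuracy of a 1-nearest-neighbour model scored on its own training rows."""
    correct = 0
    for x, y in zip(rows, labels):
        # Nearest training row to x; x itself is in the training set.
        nearest = min(
            range(len(rows)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(rows[i], x)),
        )
        correct += labels[nearest] == y
    return correct / len(rows)


def permutation_check(score_fn, rows, labels, n_permutations=20, seed=0):
    """Score the real labels, then score several random permutations of them."""
    rng = random.Random(seed)
    real_score = score_fn(rows, labels)
    shuffled_scores = []
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # breaks any real link between rows and labels
        shuffled_scores.append(score_fn(rows, shuffled))
    return real_score, shuffled_scores


# Synthetic stand-in data (an assumption): two metrics per row, ROI class from the first.
data_rng = random.Random(42)
rows = [(data_rng.random(), data_rng.random()) for _ in range(40)]
labels = ["high" if r[0] > 0.5 else "low" for r in rows]

real_score, shuffled_scores = permutation_check(training_accuracy_1nn, rows, labels)
print("real labels:", real_score, "worst shuffled run:", min(shuffled_scores))
```

Here the model scores perfectly on the shuffled labels as well, because it simply memorizes each training row. That is the warning sign: a score measured on the training data cannot tell a real pattern from noise.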
If you have a small amount of data, you should use 10-fold cross-validation.
See lecture 10 of my course at www.kdnuggets.com/data_mining_course/ for an explanation of cross-validation.
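For readers who want to see the mechanics, here is a minimal sketch of 10-fold cross-validation. It is an illustration under assumptions, not the lecture's code: the data is synthetic, the function names are hypothetical, and a 1-nearest-neighbour model is used only because it is short to write.

```python
import random


def k_fold_splits(n_rows, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs; each row is held out exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        yield train, held_out


def predict_1nn(train_rows, train_labels, x):
    """Label of the training row closest to x (squared Euclidean distance)."""
    nearest = min(
        range(len(train_rows)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_rows[i], x)),
    )
    return train_labels[nearest]


def cross_val_accuracy(rows, labels, k=10, seed=0):
    """Fraction of rows predicted correctly while they were held out of training."""
    correct = 0
    for train, test in k_fold_splits(len(rows), k, seed):
        train_rows = [rows[i] for i in train]
        train_labels = [labels[i] for i in train]
        for i in test:
            correct += predict_1nn(train_rows, train_labels, rows[i]) == labels[i]
    return correct / len(rows)


# Synthetic stand-in data (an assumption): ROI class depends on the first metric only.
data_rng = random.Random(7)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
labels = ["high" if r[0] > 0.5 else "low" for r in rows]

shuffled_labels = list(labels)
data_rng.shuffle(shuffled_labels)

real_cv = cross_val_accuracy(rows, labels)
shuffled_cv = cross_val_accuracy(rows, shuffled_labels)
print("CV accuracy, real labels:", real_cv)
print("CV accuracy, shuffled labels:", shuffled_cv)
```

Because every row is scored only by a model that never saw it, the accuracy on shuffled labels should fall to roughly chance level, while a real relationship should still show up. You can still inspect a model built on all the data afterwards; the cross-validated score tells you how much to trust what it shows.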
Gregory Piatetsky |