dataminer101

Joined: 09 Jan 2006
Posts: 2
Location: Montreal

 Posted: Mon Jan 09, 2006 1:00 pm    Post subject: Help! Agriculture, finding best param values Hi everybody and happy new year to all of you. I am asking for your help because I am really not getting anywhere with my project. I am a newbie to data mining and I have no clue how to get started. The approach I have been following is to read books about data mining and machine learning so that I can understand and compare the numerous algorithms out there, and then find the one(s) that apply to my case. The first problem with this approach is that I did not find any good resources (either the subjects are treated superficially or they are made overly complex and hard to follow). The second problem is that it is incredibly time consuming. So I am wondering if I should continue on this path or proceed differently. I am sure that a lot of you have been in the same position and have struggled with this problem just like me, so I am hoping that you can suggest a method that would help me get started.
The project I am working on is in the field of agriculture. Its objective is to find the best values of all the parameters that affect the outcome (the amount of meat produced) of an animal production (could be dairy, poultry, pork, etc.). The approach is to run one or more algorithms on historical data for a certain type of production (poultry, for example) and find the operating conditions that would maximize the growth of the animals (weight) while minimizing production costs.
A few examples of the questions that this project is trying to answer: When, and for how long, should the barns be lit? When and how much food should we give the animals? What is the best operating temperature set point? When and how much cooling/heating should be done? As you can see, all these questions concern the optimization of the operating conditions and, most importantly, the reduction of operating costs. Huge amounts (tens of GB) of historical data for these operating conditions are to be used for this purpose.
PS: I am trying to use the Weka learning environment (Java-based and open source). I hope that you will be kind enough to help me work my way through this. I would appreciate your help and advice, and I thank in advance all of you who took the time to read this lengthy post. Cheers.
editor

Joined: 04 Oct 2005
Posts: 124
Location: Boston, MA

 Posted: Mon Jan 09, 2006 4:25 pm    Post subject: Stuck with data mining The specific problem you describe -- optimization for a certain type of production -- is not an easy problem. I do not think Weka would be the right tool for questions like "When and how much food should we give the animals". You probably need a custom solution, since your questions are not the typical classification or clustering questions. It is difficult to get started with data mining by only reading books. To get results quickly, you can consider expert consulting, available from many companies listed at www.kdnuggets.com/companies/consulting.html. If you do not have any budget for consulting, maybe you can contact professors at local universities who teach data mining, and they can put you in touch with some bright students who can help as part of their studies.
Guest

 Posted: Mon Jan 09, 2006 5:48 pm    Post subject: How about linear programming and orthogonal experiment design?
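To make "orthogonal experiment design" a little more concrete, here is a minimal sketch of a two-level half-fraction design for three factors. The factor names (light_hours, feed_amount, temperature) and the coded levels -1/+1 are assumptions chosen to match the questions in the original post, not settings taken from any real barn. The third factor is set to the product of the first two, so only 4 runs are needed instead of 8, at the price that the temperature effect cannot be separated from the light-by-feed interaction.

```python
from itertools import combinations, product

# Hypothetical factor names; -1 and +1 are coded "low" and "high" levels.
FACTORS = ["light_hours", "feed_amount", "temperature"]

def half_fraction_design():
    """Build a 2^(3-1) design: full factorial in the first two factors,
    third factor generated as the product of the first two."""
    return [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

def is_orthogonal(runs):
    """A two-level design is orthogonal and balanced when every column sums
    to zero and every pair of columns has a zero dot product."""
    columns = list(zip(*runs))
    balanced = all(sum(col) == 0 for col in columns)
    uncorrelated = all(
        sum(x * y for x, y in zip(c1, c2)) == 0
        for c1, c2 in combinations(columns, 2)
    )
    return balanced and uncorrelated

runs = half_fraction_design()
for run in runs:
    print(dict(zip(FACTORS, run)))
print("orthogonal:", is_orthogonal(runs))
```

Each printed row is one experimental run to carry out. Because the columns are orthogonal, the main effect of each factor can be estimated independently of the other two.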
dataminer101

Joined: 09 Jan 2006
Posts: 2
Location: Montreal

 Posted: Tue Jan 10, 2006 9:30 am    Post subject: Re: PLEASE HELP ME, I am stuck !!!! Hi there and thanks for your replies. I definitely do not have any budget for consulting, but thanks for suggesting that anyway. I posted the same message in another forum and some people were kind enough to help me. One person suggested neural networks in combination with genetic algorithms, another talked about self-organizing maps, a third suggested ANOVA, and another also talked about linear programming. I would appreciate it if you could tell me your thoughts on the methods above and why you think they are useful or useless. (I am assuming you are familiar with some of these; I might be wrong!) Thanks again for your help. I will try to get familiar with all these methods so that I can compare them, but in the meantime I am counting on experienced people like you to help me get a head start. Cheers.
gsafarz

Joined: 28 Jan 2006
Posts: 3

 Posted: Sat Jan 28, 2006 9:25 pm    Post subject: It sounds like you have some historical data that you can mine to find out what may have the biggest effects on your outcome. You can use simple regression to find the relevant effects, using all the possible inputs (amount of feed, light, temperature, etc.) to predict your outcome. Be sure to include polynomial and interaction terms in the predictor set, along with as much information as you can gather. You can use stepwise selection or another variable selection technique to arrive at the final model. Also, do not ignore collinear variables: make a note of them, because they may drop out during some selection techniques and they may be important for the next steps.
Do not stop there. From the regression you can see what may be having the most effect. Use this knowledge to start experimentation. Look into fractional factorial experimental design and response surface analysis. These techniques will get you where you want to go: they are designed to lead you to the optimal settings through a series of experiments. It may take a while to get optimal results, especially since you are measuring things like growth that take time. If you do not have a lot of subject matter expertise (agriculture in this case), team up with those who do; they can be invaluable.
If you think this path may get you where you are going, I would suggest looking into a class on design of experiments. I happen to be enrolled in an online class that starts next week; see www.statistics.com. I have no affiliation with the website or class whatsoever. As a matter of fact, I am unsure how the online format will work for me, so this is something of a test of my own. One of my employees is enrolled in the class as well. Hope this helps you out.
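The regression and response surface steps described above can be sketched as follows. This is an illustration under stated assumptions, not a recipe for the real barn data: the two inputs (feed and temp, in arbitrary coded units), the "true" coefficients, and the noise-free synthetic responses are all made up so that the fit can be checked. The model contains the main effects, the squared (polynomial) terms, and the feed-by-temp interaction; the fitted surface is then solved for its stationary point, which is a maximum only when both squared-term coefficients are negative and 4*b11*b22 > b12^2, as they are here. Variable selection (stepwise or otherwise) is not shown.

```python
from itertools import product

def design_row(feed, temp):
    """Predictors of a full quadratic model in two inputs:
    intercept, main effects, squared terms, and the interaction."""
    return [1.0, feed, temp, feed * feed, temp * temp, feed * temp]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_least_squares(rows, ys):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical "true" surface used only to generate synthetic data.
TRUE_COEF = [2.0, 0.5, 0.3, -0.1, -0.05, 0.02]

def true_response(feed, temp):
    return sum(c * v for c, v in zip(TRUE_COEF, design_row(feed, temp)))

# Synthetic observations on a 5 x 5 grid of coded settings.
levels = [0.0, 1.0, 2.0, 3.0, 4.0]
points = list(product(levels, levels))
rows = [design_row(f, t) for f, t in points]
ys = [true_response(f, t) for f, t in points]

coef = fit_least_squares(rows, ys)
b0, b1, b2, b11, b22, b12 = coef

# Stationary point: set both partial derivatives of the fitted surface to zero.
#   b1 + 2*b11*feed + b12*temp = 0
#   b2 + b12*feed + 2*b22*temp = 0
feed_opt, temp_opt = solve([[2 * b11, b12], [b12, 2 * b22]], [-b1, -b2])

print("fitted coefficients:", [round(c, 4) for c in coef])
print("stationary point (feed, temp):", round(feed_opt, 3), round(temp_opt, 3))
```

With real, noisy data the recovered coefficients would only approximate the underlying effects, and the stationary point should be treated as a region to explore with further experiments rather than a final answer.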
