KDnuggets » Forums

What does it take to make it as a professional data miner?

 
www.kdnuggets.com Forum Index -> Data Mining Open Forum
al.khwarizmi



Joined: 14 Sep 2012
Posts: 1

Posted: Fri Sep 14, 2012 6:08 pm    Post subject: What does it take to make it as a professional data miner?

Hi forum. To cut a very long story short, I couldn't graduate from university (due to psychiatric hospitalization, not academic dishonesty or something like that) and now I have a bunch of debt and dubious (formal) qualifications. (My academic experience was in comp sci and linguistics.) However, this guy I know who does data mining in the Bay Area says I should totally apply for jobs that normally require a master's degree. He says he knows people with lesser qualifications who have jobs in this area. Suffice it to say, though, that I am still intimidated by this prospect.

So I'll describe where I am re: career skills and ask you whether he's right or not and where I should improve. I want to be able to make a strong case that I would be a good worker in this field. My pathology is partly psychotic in nature, and as such dealing with people is something I need to make as pain-free for me as possible ... I'd be really disheartened to receive an endless stream of rejection notices if I'm not good enough.

So, OK, here's where I am. I can get pretty good results on interesting public domain datasets I find, like this wine quality dataset:

http://archive.ics.uci.edu/ml/datasets/Wine+Quality

As you can see, the predicted variable is an integer quality score on the interval [0, 10]. On the hypothesis that this rating system is an interval scale, I predicted a real value, rounded it to the nearest integer, and achieved an RMSE of ~0.75 on my test set. So, not bad.
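For anyone unfamiliar with that evaluation, here is a minimal sketch of it, not my actual code. The function name `rounded_rmse`, the clamping to [0, 10], and the toy numbers are all made up for illustration; they are not taken from the wine dataset.

```python
import math

def rounded_rmse(y_true, y_pred, lo=0, hi=10):
    """RMSE after rounding each real-valued prediction to the nearest integer score.

    Assumption: rounded scores are clamped to the rating interval [lo, hi].
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    squared_errors = []
    for truth, pred in zip(y_true, y_pred):
        # Round half up, then clamp into the valid score range.
        rounded = min(hi, max(lo, math.floor(pred + 0.5)))
        squared_errors.append((truth - rounded) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical toy data: integer quality scores and real-valued model outputs.
true_scores = [5, 6, 7, 5]
predictions = [5.4, 6.6, 6.2, 4.9]
print(rounded_rmse(true_scores, predictions))
```

Rounding throws away information, so it is worth comparing this figure against the RMSE of the unrounded predictions as well.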

I've also used Wall Street Survivor and consistently been able to achieve returns better than the S&P 500 as a whole, and gotten results nearly as good as those of the winner in the KDD Cup of 2001.

Beyond that, I try not to be a one-trick pony. For career reasons but mainly for my own edification I have aimed for a very broad education. For instance, I currently know German and Swedish (the latter of which I use quite frequently) and am learning Spanish; these sorts of things are useful in the NLP domain. I have read a good deal in the philosophy of science and the philosophy of statistics, something directly applicable in data mining, which could be conceived of, at least in some ways, as a highly formalized version of scientific inquiry. Over the past few months, I've learned about as much chemistry as one would normally learn in a university over the span of a year, short of having a lab, verifying my knowledge with many exercises and on the Chemistry StackExchange. I will move on directly to organic, bio- and medicinal chemistry afterwards; these things are useful in various commercially viable data mining domains.

To be clear, even though I'm unemployed and everything, there isn't much slouching here.

Programming-wise: I have used OCaml and Python the most, and am very handy in general with the *nix command line. I do not, however, know very much about Java or C++ (bad programming languages), which are unfortunately often demanded, at least in the Bay Area. Perhaps a greater weakness: I hate R with a passion, owing to what I perceive as an excessively domain-specific language. I could learn it in earnest but I'd kick myself while doing so. I am interested in learning more Octave/MATLAB, as well as RapidMiner.

As far as career-relevant personality traits go: I am extremely persistent when it comes to tasks that interest me and could well be expected to work past normal hours in such a case. I am also considered highly knowledgeable and creative by peers. The downside is that I sometimes have a hard time getting on with others and have been known to behave strangely.

Is this kind of profile good enough or should I try competing on Kaggle or something? Etc....
phil123
Data Mining Guru


Joined: 05 Mar 2012
Posts: 50
Location: Canada

Posted: Sun Sep 16, 2012 9:19 am    Post subject: My opinion

My opinion is that you should be good at either math and statistics or computer science, or preferably both.

For computer science, a programmer should at least know C++ or a language like Java or C#. Moreover, he should preferably be familiar with distributed algorithms and stuff like that, and be good at programming, algorithmics, etc. so that he can write efficient code. In data mining, the complexity and scalability of algorithms are very important.

But if someone is not good at computer science, it could be OK if he is good at statistics/math. Instead of developing his own algorithms, he may use MATLAB, RapidMiner and other software packages. Moreover, someone who is good at math may approach problems from a statistics perspective and better see what is statistically significant and so on...

So in my opinion, you should be strong in at least one of these two areas.

Philippe
All times are GMT - 5 Hours

Copyright © 2012 KDnuggets.