Top Data Scientist Claudia Perlich’s Favorite Machine Learning Algorithm
Interested in the reasons why a top data scientist is partial to one particular algorithm over others? Read on to find out.
By Claudia Perlich, Dstillery.
Hands down, logistic regression (with many bells and whistles like stochastic gradient descent, feature hashing and penalties).
I know that in this day and age of deep learning this seems like a really odd answer. So let’s start with a bit of background:
From 1995 to 1998 I was using neural networks, from 1998 to 2002 I was working mostly with tree-based methods, and from 2002 on, logistic regression (and linear models in general, including quantile regression, Poisson regression, etc.) slowly made its way into my heart. In 2003 I published a paper in Machine Learning comparing tree-based methods against logistic regression on 35 (at the time large) datasets.
The short answer (if you want to skip the 30 pages): if the signal-to-noise ratio is high, trees tend to win. But if you have a very noisy problem and the best model has an AUC < 0.8, logistic regression beats the trees almost always. Ultimately this is not very surprising: if the signal is too weak, high-variance models get lost in the weeds.
So what does this mean in practice? The types of problems I tend to deal with are super noisy, with a low level of predictability. Think of it as a spectrum from deterministic (chess) all the way to random (supposedly the stock market). Some problems are just more predictable (given the data you have) than others. And this is not a question of the algorithms but rather a conceptual statement about the world.
Most problems I am interested in are very close to the stock market end of the spectrum. Deep learning is really great on the other end: “Is this picture showing a cat?”. In the world of uncertainty, the bias-variance tradeoff still often favors more bias, meaning you want a ‘simple’, very constrained model. And this is where logistic regression comes in. I personally have found it much easier to ‘beef up’ a simple linear model by adding complicated features than to try to constrain a very powerful (high-variance) model class. In fact, each and every one of the data mining competitions I have won (KDD CUP 07–09) used a linear model.
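To make the idea of ‘beefing up’ a linear model a bit more concrete, here is a minimal sketch in plain Python of logistic regression trained with stochastic gradient descent, feature hashing and an L2 penalty. Everything specific in it is an assumption made for illustration and not taken from any real project: the toy `site` and `device` fields, the 70/30 label noise, the CRC32 hash, and the hyperparameters. The label depends only on the interaction between the two fields, so the plain linear model has nothing to learn until a hand-built crossed feature is added.

```python
import math
import random
import zlib

N_BUCKETS = 2 ** 18  # illustrative size; a large table keeps hash collisions unlikely


def hash_features(tokens, n_buckets=N_BUCKETS):
    """Feature hashing: map string tokens to a sparse {index: value} vector."""
    x = {}
    for tok in tokens:
        idx = zlib.crc32(tok.encode("utf-8")) % n_buckets
        x[idx] = x.get(idx, 0.0) + 1.0
    return x


def featurize(site, device, with_cross):
    tokens = ["bias", "site=" + site, "device=" + device]
    if with_cross:
        # The added "complicated" feature: an interaction the linear model
        # cannot construct from the two main effects on its own.
        tokens.append("site_x_device=" + site + "|" + device)
    return hash_features(tokens)


def predict(w, x):
    z = sum(w[i] * v for i, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(rows, with_cross, epochs=5, lr=0.02, l2=1e-4, seed=0):
    rng = random.Random(seed)
    w = [0.0] * N_BUCKETS
    data = [(featurize(s, d, with_cross), y) for s, d, y in rows]
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = predict(w, x) - y  # gradient of the log loss w.r.t. the logit
            for i, v in x.items():
                # L2 penalty applied only to the weights active in this example
                w[i] -= lr * (err * v + l2 * w[i])
    return w


def log_loss(w, rows, with_cross):
    total = 0.0
    for s, d, y in rows:
        p = predict(w, featurize(s, d, with_cross))
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(rows)


def make_rows(n, rng):
    """Noisy toy data: the label depends only on the site/device interaction."""
    rows = []
    for _ in range(n):
        s = rng.choice(["news", "sports"])
        d = rng.choice(["mobile", "desktop"])
        p = 0.7 if (s == "news") != (d == "mobile") else 0.3
        rows.append((s, d, 1 if rng.random() < p else 0))
    return rows


rng = random.Random(42)
train, test = make_rows(4000, rng), make_rows(2000, rng)
loss_plain = log_loss(train_sgd(train, False), test, False)
loss_cross = log_loss(train_sgd(train, True), test, True)
print("plain:", round(loss_plain, 3), "with crossed feature:", round(loss_cross, 3))
```

The model class stays the same in both runs; only the feature set changes. That is the appeal of this route: the extra capacity is added one deliberate feature at a time, while the penalty and the linear form keep the variance in check.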
Beyond the performance, linear models are robust and tend to need much less handholding (ok, fine, stochastic gradient descent and penalties make it a bit harder). This is extremely important when you want to do predictive modeling in industry, where you do not have the luxury of spending 3 months on building the perfect model.
And finally, I have a better chance of maybe understanding what is going on from a linear model.
Dstillery is a data analytics company that uses machine learning and predictive modeling to provide intelligent solutions for brand marketing and other business challenges. Drawing from a unique 360 degree view of digital, physical and offline activity, we generate insights and predictions about the behaviors of individuals and discrete populations.
Original. Reposted with permission.