Ten Machine Learning Algorithms You Should Know to Become a Data Scientist
It's important for data scientists to have a broad range of knowledge, keeping themselves updated with the latest trends. With that being said, we take a look at the top 10 machine learning algorithms every data scientist should know.
By Muktabh Mayank, ParallelDots.
Machine learning practitioners have different personalities. Some say, “I am an expert in X, and X can train on any type of data,” where X is some algorithm; others are “right tool for the right job” people. Many also subscribe to a “jack of all trades, master of one” strategy, where they have one area of deep expertise and know a little about other fields of machine learning. That said, as practicing data scientists we have to know the basics of some common machine learning algorithms, which helps us engage with problems in a new domain. This is a whirlwind tour of common machine learning algorithms, with quick resources that can help you get started on them.
1. Principal Component Analysis (PCA)/SVD
PCA is an unsupervised method for understanding global properties of a dataset consisting of vectors. It analyzes the covariance matrix of the data points to find which dimensions (mostly) or data points (sometimes) are more important, i.e. have high variance among themselves but low covariance with others. One way to think of the top PCs is as the eigenvectors of the covariance matrix with the highest eigenvalues. SVD is essentially another way to calculate ordered components, but you don’t need to form the covariance matrix of the points to get it.
This algorithm helps fight the curse of dimensionality by producing data points with reduced dimensions.
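To make the link between the two routes concrete, here is a minimal NumPy sketch that computes principal components both ways and then reduces the data to two dimensions. The toy data, the seed, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 points in 3-D with very different spread per axis.
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

# PCA works on mean-centered points.
Xc = X - X.mean(axis=0)

# Route 1: eigendecomposition of the covariance matrix.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]           # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data, with no covariance matrix formed.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vars = S**2 / (len(Xc) - 1)             # variance explained by each component

# Reduce dimensionality: project onto the top two components.
X_reduced = Xc @ Vt[:2].T
```

The rows of `Vt` and the columns of `eigvecs` describe the same directions up to sign, and `svd_vars` matches `eigvals`, which is why libraries can compute PCA through SVD.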
2a. Least Squares and Polynomial Fitting
Remember your numerical analysis code in college, where you fit lines and curves to points to get an equation? You can use the same methods to fit curves in machine learning for very small datasets with low dimensions. (For large data or datasets with many dimensions, you might just end up terribly overfitting, so don’t bother.) OLS has a closed-form solution, so you don’t need complex optimization techniques.
As is obvious, use this algorithm to fit simple curves and regression lines.
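As a rough sketch of that closed form, the snippet below fits a quadratic to noisy points using the normal equations, then repeats the fit with NumPy's least-squares solver. The ground-truth curve, noise level, and variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
# Hypothetical ground truth: y = 1 + 2x - 3x^2, plus a little noise.
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.05, size=x.shape)

# Design matrix with columns [1, x, x^2].
A = np.vander(x, N=3, increasing=True)

# Closed form via the normal equations: w = (A^T A)^{-1} A^T y.
w_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Same least-squares problem, solved in a numerically safer way.
w_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Both routes recover coefficients close to the ones used to generate the data. For anything beyond a toy problem, `np.linalg.lstsq` is usually the safer choice because it avoids forming `A.T @ A` explicitly.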
2b. Constrained Linear Regression
Least squares can get confused by outliers, spurious fields, and noise in the data. We thus need constraints to decrease the variance of the line we fit to a dataset. The right way to do this is to fit a regularized linear regression model, which adds a penalty that keeps the weights from misbehaving. The penalty can use the L1 norm (LASSO), the L2 norm (Ridge Regression), or both (Elastic Net). The mean squared loss, plus the penalty, is what gets optimized.
Use these algorithms to fit regression lines with constraints, avoiding overfitting and masking noise dimensions from the model.
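Here is a small NumPy sketch of the two penalties in action. Ridge has a closed form. For LASSO, this example swaps in a simple proximal-gradient loop (ISTA) rather than the coordinate descent most libraries use. The data, the penalty strengths, and the function names `ridge` and `lasso_ista` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
# Hypothetical truth: only the first two features matter, the rest are noise dimensions.
w_true = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y = X @ w_true + rng.normal(scale=0.5, size=n)

def ridge(X, y, alpha):
    """Minimize ||Xw - y||^2 + alpha * ||w||_2^2 using the closed form."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def lasso_ista(X, y, alpha, n_iter=500):
    """Minimize (1/2n) * ||Xw - y||^2 + alpha * ||w||_1 by proximal gradient (ISTA)."""
    n = X.shape[0]
    step = n / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        z = w - step * grad
        # Soft-thresholding: shrink every weight, set small ones exactly to zero.
        w = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)
    return w

w_ols = ridge(X, y, alpha=0.0)      # alpha = 0 reduces to plain least squares
w_ridge = ridge(X, y, alpha=50.0)   # L2 penalty shrinks all weights
w_lasso = lasso_ista(X, y, alpha=0.3)  # L1 penalty zeroes out noise dimensions
```

Comparing `w_ols`, `w_ridge`, and `w_lasso` shows the difference in behavior: the L2 penalty pulls every weight toward zero, while the L1 penalty drops the uninformative features entirely.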
3. K-means Clustering
Everyone’s favorite unsupervised clustering algorithm. Given a set of data points in the form of vectors, we can make clusters of points based on the distances between them. It’s an Expectation-Maximization-style algorithm that alternates between assigning each point to its nearest cluster center and moving each center to the mean of its assigned points. The inputs to the algorithm are the number of clusters to generate and the maximum number of iterations in which it will try to converge.
As is obvious from the name, you can use this algorithm to create K clusters in a dataset.
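The two alternating steps fit in a few lines of NumPy. This is a bare-bones sketch, assuming random initialization from the data points and Euclidean distance; the function name `kmeans`, its parameters, and the toy blobs are all illustrative.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: returns (centers, labels) after at most n_iter rounds."""
    rng = np.random.default_rng(seed)
    # Initialize centers by picking k distinct data points at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its points
        # (an empty cluster keeps its old center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break                      # centers stopped moving: converged
        centers = new_centers
    return centers, labels

# Hypothetical toy data: two well-separated blobs in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
               rng.normal(loc=5.0, scale=0.5, size=(50, 2))])
centers, labels = kmeans(X, k=2)
```

Real implementations add smarter initialization (such as k-means++) and multiple restarts, since plain random initialization can get stuck in poor local optima on harder data.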
4. Logistic Regression
Logistic Regression is constrained Linear Regression with a nonlinearity applied after the weights (the sigmoid function is used mostly, though you can use tanh too), which restricts the outputs to values close to the +/- classes (1 and 0 in the case of sigmoid). The cross-entropy loss is optimized using methods like Gradient Descent or L-BFGS. A note to beginners: Logistic Regression is used for classification, not regression. You can also think of Logistic Regression as a one-layer neural network. NLP people often call it a Maximum Entropy Classifier.
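This is what a sigmoid looks like: an S-shaped curve, σ(z) = 1 / (1 + e^(−z)), that rises from 0 to 1 and passes through 0.5 at z = 0.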
Use LR to train simple, but very robust classifiers.
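For a feel of how little machinery is involved, here is a minimal sketch of Logistic Regression trained with batch gradient descent on the cross-entropy loss. The learning rate, iteration count, toy data, and the helper names `fit_logistic` and `predict` are assumptions for the example, not tuned or recommended values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Batch gradient descent on the mean cross-entropy loss; y must hold 0/1 labels."""
    Xb = np.hstack([X, np.ones((len(X), 1))])    # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)                      # predicted probability of class 1
        grad = Xb.T @ (p - y) / len(y)           # gradient of the cross-entropy loss
        w -= lr * grad
    return w

def predict(X, w):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (sigmoid(Xb @ w) >= 0.5).astype(int)  # threshold the probability at 0.5

# Hypothetical toy data: two Gaussian blobs, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
               rng.normal(2.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

w = fit_logistic(X, y)
train_accuracy = (predict(X, w) == y).mean()
```

The weights-then-sigmoid structure in `fit_logistic` is exactly the one-layer network view mentioned above.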
5. SVM (Support Vector Machines)
SVMs are linear models like Linear/Logistic Regression; the difference is that they use a margin-based loss function. (The derivation of support vectors is one of the most beautiful mathematical results I have seen, along with eigenvalue calculation.) You can optimize the loss function using methods like L-BFGS or even SGD.
Another innovation in SVMs is the use of kernels on data for feature engineering. If you have good domain insight, you can replace the good old RBF kernel with smarter ones and profit.
One unique thing that SVMs can do is learn one-class classifiers.
SVMs can be used to train classifiers (and even regressors).
Note: SGD-based training of both Logistic Regression and SVMs is available in scikit-learn’s SGDClassifier (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html), which I often use as it lets me check both LR and SVM through a common interface. You can also train it on datasets larger than RAM using mini-batches.