The Great Algorithm Tutorial Roundup
This is a collection of tutorials relating to the results of the recent KDnuggets algorithms poll. If you are interested in learning or brushing up on the most used algorithms, as per our readers, look here for suggestions on doing so!
KDnuggets recently ran a poll asking our readers "Which methods/algorithms you used in the past 12 months for an actual Data Science-related application?"
844 voters participated, with the top 10 algorithms shown below:
The results were summarized and some analysis was offered in this post, which is a great read if you are looking for further breakdowns of what algorithms were reported by which types of respondents, respondent locations, etc.
As a result, we thought that the following resources may be useful to readers looking to plug holes in their knowledge of these particular algorithms, as well as machine learning algorithms in general.
Algorithm Basics
For algorithms basics, including many of the top reported algorithms outlined in the above graphic, the following posts are good places to start:
- The 10 Algorithms Machine Learning Engineers Need to Know
Read this introductory list of important contemporary machine learning algorithms that every engineer should understand.
- Top 10 Data Mining Algorithms, Explained
The top 10 data mining algorithms, as selected by top researchers, are explained here, including what they do, the intuition behind each algorithm, available implementations, why to use them, and interesting applications.
- Machine Learning Key Terms, Explained
An overview of 12 important machine learning concepts, presented in a no-frills, straightforward definition style.
- All Machine Learning Models Have Flaws
This classic post examines what is right and wrong with different models of machine learning, including Bayesian learning, Graphical Models, Convex Loss Optimization, Statistical Learning, and more.
Algorithm Specifics
What follows are select tutorials and additional information for each of the top 10 algorithms appearing in the poll.
Regression
- A Brief Primer on Linear Regression – Part 1
This introduction to linear regression discusses a simple linear regression model with one predictor variable, and then extends it to the multiple linear regression model with at least two predictors.
- A Brief Primer on Linear Regression – Part 2
This second part of an introduction to linear regression moves past the topics covered in the first to discuss linearity, normality, outliers, and other topics of interest.
- Regression & Correlation for Military Promotion: A Tutorial
A clear and well-written tutorial covering the concepts of regression and correlation, focusing on military commander promotion as a use case.
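As a rough companion to the primers above, here is a minimal sketch of fitting a simple linear regression with one predictor by ordinary least squares. It is not taken from any of the linked tutorials; the function name `fit_simple_linear_regression` and the toy data are assumptions made purely for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of the predictor
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data lying exactly on the line y = 1 + 2x (made up for illustration)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_linear_regression(xs, ys)
```

The closed-form solution is enough for a single predictor; multiple regression with two or more predictors is usually handled with a linear algebra library instead.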
Clustering
- Data Science 102: K-means clustering is not a free lunch
K-means is a widely used method in cluster analysis, but what are its underlying assumptions and drawbacks? We examine what happens for non-spherical data and unevenly sized clusters.
- A Tutorial on the Expectation Maximization (EM) Algorithm
This is a short tutorial on the Expectation Maximization algorithm and how it can be used to estimate parameters for multivariate data.
Decision Tree/Rules
- Decision Trees: A Disastrous Tutorial
Get a concise overview of decision trees here, one of the most used KDnuggets reader algorithms as measured in a recent poll.
- Dealing with Unbalanced Classes, SVMs, Random Forests, and Decision Trees in Python
An overview of dealing with unbalanced classes, and implementing SVMs, Random Forests, and Decision Trees in Python.
Visualization
- 4 Lessons for Brilliant Data Visualization
Get some pointers on data visualization from a noted expert in the field, and gain some insight into creating your own brilliant visualizations by following these 4 lessons.
- Three Simple Resolutions to Design Better DataViz
Start your New Year off with resolutions to produce better data visualizations: visualize your data, remove chart legends, and try new things.
K-nearest Neighbors
- Implementing Your Own k-Nearest Neighbour Algorithm Using Python
A detailed explanation of one of the most used machine learning algorithms, k-Nearest Neighbors, and its implementation from scratch in Python. Enhance your algorithmic understanding with this hands-on coding exercise.
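For readers who want a feel for what a from-scratch implementation involves, the following is a minimal sketch of k-nearest neighbors classification using only the Python standard library. It is not the code from the linked tutorial; the function name `knn_predict`, the choice of Euclidean distance, and the toy data are all assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated groups (made up for illustration)
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 6.0)]
labels = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(points, labels, (1.2, 1.4), k=3)
```

This brute-force version compares the query against every training point, which is fine for small data sets; larger ones typically call for a spatial index or a library implementation.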
Principal Component Analysis (PCA)
- Nutrition & Principal Component Analysis: A Tutorial
A great overview of Principal Component Analysis (PCA), with an example application in the field of nutrition.
- A comparison between PCA and hierarchical clustering
Graphical representations of high-dimensional data sets are the backbone of exploratory data analysis. We examine 2 of the most commonly used methods: heatmaps combined with hierarchical clustering and principal component analysis (PCA).
Statistics
- What Statistics Topics are Needed for Excelling at Data Science?
Here is a list of skills and statistical concepts suggested for excelling at data science, roughly in order of increasing complexity.
- Central Limit Theorem for Data Science
This post is an introductory explanation of the Central Limit Theorem, and why it is (or should be) of importance to data scientists.
- Understanding the Empirical Law of Large Numbers and the Gambler’s Fallacy
The law of large numbers is an important concept for practising data scientists. In this post, the empirical law of large numbers is demonstrated via a simple simulation approach using the Bernoulli process.
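The kind of simulation described in the last item can be sketched in a few lines. This is an illustrative version only, not the code from the linked post; the function name `running_proportion`, the success probability, and the fixed seed are assumptions chosen to keep the example reproducible.

```python
import random


def running_proportion(p, n_trials, seed=0):
    """Simulate n_trials Bernoulli(p) trials and return the running share of successes."""
    rng = random.Random(seed)  # seeded generator so repeated runs match
    successes = 0
    proportions = []
    for trial in range(1, n_trials + 1):
        if rng.random() < p:  # one Bernoulli trial with success probability p
            successes += 1
        proportions.append(successes / trial)
    return proportions


# Fair-coin example: the running proportion should settle near 0.5
props = running_proportion(0.5, 10_000)
early, late = props[9], props[-1]  # share of successes after 10 and 10,000 trials
```

Comparing `early` with `late` shows the idea: the proportion after a handful of trials can sit far from `p`, while the proportion after many trials tends to be close to it. Individual trials remain independent throughout, which is why a streak does not make the opposite outcome "due".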
Random Forests
- Random Forest: A Criminal Tutorial
Get an overview of Random Forest here, one of the most used algorithms by KDnuggets readers according to a recent poll.
- When Does Deep Learning Work Better Than SVMs or Random Forests?
Some advice on when a deep neural network may or may not outperform Support Vector Machines or Random Forests.
- Dealing with Unbalanced Classes, SVMs, Random Forests, and Decision Trees in Python
An overview of dealing with unbalanced classes, and implementing SVMs, Random Forests, and Decision Trees in Python.
Time Series/Sequence
- A simple approach to anomaly detection in periodic big data streams
We describe a simple and scalable algorithm that can detect rare and potentially irregular behavior in a time series with periodic patterns. It performs similarly to Twitter's more complex approach.
- Anomaly Detection in Predictive Maintenance with Time Series Analysis
How can we predict something we have never seen, an event that is not in the historical data? This requires a shift in the analytics perspective! Understand how to standardize the time and perform time series analysis on sensor data.
Text Mining
- Mining Twitter Data with Python Part 1: Collecting Data
Part 1 of a 7 part series focusing on mining Twitter data for a variety of use cases. This first post lays the groundwork, and focuses on data collection.
- Text Mining 101: Topic Modeling
We introduce the concept of topic modelling and explain two methods: Latent Dirichlet Allocation and TextRank. The techniques are ingenious in how they work – try them yourself.
Going Further
Here are a few posts which bring some of the concepts of machine learning algorithms together, or leverage some of them for different or novel approaches.
- Top 10 Quora Machine Learning Writers and Their Best Advice
Top Quora machine learning writers give their advice on pursuing a career in the field, academic research, and selecting and using appropriate technologies.
- Top 10 IPython Notebook Tutorials for Data Science and Machine Learning
A list of 10 useful Github repositories made up of IPython (Jupyter) notebooks, focused on teaching data science and machine learning. Python is the clear target here, but general principles are transferable.
- 7 Steps to Mastering Machine Learning With Python
There are many Python machine learning resources freely available online. Where to begin? How to proceed? Go from zero to Python machine learning hero in 7 steps!
- Why Implement Machine Learning Algorithms From Scratch?
Even with machine learning libraries covering almost any algorithm implementation you could imagine, there are often still good reasons to write your own. Read on to find out what these reasons are.
- Doing Statistics with SQL
This post covers how to perform some basic in-database statistical analysis using SQL.
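To give a sense of what in-database statistics can look like, here is a small sketch that computes summary statistics with plain SQL aggregates, run through Python's built-in `sqlite3` module. It is not drawn from the linked post; the table name `measurements`, the column `x`, and the helper `summarize` are hypothetical, and SQLite is simply a convenient stand-in for whatever database you use.

```python
import sqlite3


def summarize(values):
    """Load values into an in-memory SQLite table and compute basic statistics in SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE measurements (x REAL)")
        conn.executemany(
            "INSERT INTO measurements (x) VALUES (?)", [(v,) for v in values]
        )
        # Population variance via the identity Var(x) = E[x^2] - (E[x])^2
        row = conn.execute(
            "SELECT COUNT(x), AVG(x), MIN(x), MAX(x), "
            "AVG(x * x) - AVG(x) * AVG(x) "
            "FROM measurements"
        ).fetchone()
    finally:
        conn.close()
    count, mean, minimum, maximum, variance = row
    return {
        "count": count,
        "mean": mean,
        "min": minimum,
        "max": maximum,
        "variance": variance,
    }


# Toy values (made up for illustration)
stats = summarize([2, 4, 4, 4, 5, 5, 7, 9])
```

Doing the aggregation inside the database means only the summary row leaves it, which is the main appeal when the table is too large to pull into memory.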
As always, we thank our guest bloggers for their ongoing fantastic contributions in the realm of machine learning and all other areas of data science.