Machine Learning Algorithms: A Concise Technical Overview – Part 1
Whether you are a newcomer to machine learning, a newbie to specific algorithms or concepts, or a seasoned ML vet looking for a once-over of an algorithm you haven't seen or used in a while, these short and to-the-point tutorials may provide the assistance you are looking for. Each of these posts concisely covers a single, specific machine learning concept.
Support Vector Machines (SVMs) are a particular classification strategy. SVMs work by transforming the training dataset into a higher dimension, which is then inspected for the optimal separation boundary, or boundaries, between classes. In SVMs, these boundaries are referred to as hyperplanes, which are identified by locating support vectors, the instances that most essentially define classes, and their margins, the lines parallel to the hyperplane at the shortest distance between the hyperplane and its support vectors. Because a boundary that is linear in the higher-dimensional space can correspond to a nonlinear boundary in the original space, SVMs are able to classify both linear and nonlinear data.
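To make the "higher dimension" idea concrete, here is a minimal sketch rather than a real SVM solver. It assumes a made-up one-dimensional dataset and a hypothetical `lift` mapping x -> (x, x^2), and it only searches for a threshold on the squared feature instead of solving the full margin-maximization problem. For this symmetric toy data that threshold happens to be a separating hyperplane with the widest gap.

```python
# Toy 1-D data: the "inner" class sits between the "outer" class,
# so no single cut point on the number line separates them.
inner = [-1.0, 0.0, 1.0]         # label +1
outer = [-3.0, -2.0, 2.0, 3.0]   # label -1


def lift(x):
    """Hypothetical feature map: send a 1-D value to the 2-D point (x, x^2)."""
    return (x, x * x)


inner_lifted = [lift(x) for x in inner]
outer_lifted = [lift(x) for x in outer]

# In the lifted space the classes are separated along the second coordinate.
upper_inner = max(p[1] for p in inner_lifted)   # highest inner point
lower_outer = min(p[1] for p in outer_lifted)   # lowest outer point

# Place the boundary halfway across the gap; the margin is half the gap width.
threshold = (upper_inner + lower_outer) / 2
margin = (lower_outer - upper_inner) / 2

# The instances lying exactly on the margins play the role of support vectors.
support_vectors = (
    [p for p in inner_lifted if p[1] == upper_inner]
    + [p for p in outer_lifted if p[1] == lower_outer]
)


def classify(x):
    """Label a new 1-D value by which side of the lifted boundary it falls on."""
    return 1 if lift(x)[1] < threshold else -1
```

Here `classify(0.5)` lands on the inner side and `classify(-2.5)` on the outer side, even though the boundary is a single straight line in the lifted space. Practical SVM implementations find the hyperplane by optimization and usually rely on kernel functions instead of computing the mapping explicitly.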
Clustering is used for analyzing data which does not include pre-labeled classes. Data instances are grouped together by maximizing intraclass similarity and minimizing the similarity between differing classes. This translates to the clustering algorithm identifying and grouping instances which are very similar to one another, while keeping them apart from instances in other groups, to which they are much less similar. As clustering does not require the pre-labeling of classes, it is a form of unsupervised learning.
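As one possible illustration, the sketch below uses k-means, which is only one of many clustering algorithms and is not named in the text above. The function name `kmeans`, the naive choice of the first k points as starting centroids, and the sample points are all assumptions made for this example.

```python
import math


def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning each point
    to its nearest centroid and recomputing the centroids."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters


# Unlabeled 2-D instances forming two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
```

No class labels are supplied anywhere; the two groups emerge purely from the distances between instances, which is what makes this unsupervised.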
Frequent pattern mining is most easily explained by introducing market basket analysis (or affinity analysis), a typical use for which it is well known. Market basket analysis attempts to identify associations, or patterns, between the various items that have been chosen by a particular shopper and placed in their market basket, be it real or virtual, and assigns support and confidence measures for comparison. Support is the fraction of all baskets that contain a given set of items, while confidence is the fraction of baskets containing one set of items that also contain another. The value of this lies in cross-marketing and customer behavior analysis.
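The support and confidence measures can be sketched in a few lines. The baskets below are invented for illustration, and the helper names `count`, `support`, and `confidence` are hypothetical. A real frequent pattern miner (Apriori, for example) would also search for candidate itemsets rather than scoring a single hand-picked rule.

```python
# Each set is one shopper's basket (made-up data).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset):
    """Number of baskets containing every item in the itemset."""
    return sum(set(itemset) <= basket for basket in transactions)


def support(itemset):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that
    also contain the consequent."""
    return count(set(antecedent) | set(consequent)) / count(antecedent)


rule_support = support({"diapers", "beer"})          # 3 of 5 baskets
rule_confidence = confidence({"diapers"}, {"beer"})  # 3 of the 4 diaper baskets
```

For the rule "diapers implies beer", support comes out to 0.6 and confidence to 0.75, figures that can then be compared against other candidate rules.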
As you are likely aware, decision trees are a type of flowchart which assists in the decision-making process. Internal nodes represent tests on particular attributes, branches exiting a node each represent a single test outcome, and leaf nodes represent class labels.
In machine learning, decision trees have been used for decades as effective and easily understandable data classifiers (contrast that with the numerous black-box classifiers in existence).
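The node, branch, and leaf structure described above can be sketched with nested dictionaries. The tree below is hand-built for illustration, and its attributes (`outlook`, `humidity`, `windy`) and class labels are invented. A learning algorithm would normally induce the tree from training data, a step this sketch leaves out.

```python
# Internal nodes are dicts naming the attribute to test; each branch maps one
# test outcome to a subtree; leaves are plain strings holding the class label.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "stay in", "normal": "play"},
        },
        "overcast": "play",
        "rainy": {
            "attribute": "windy",
            "branches": {True: "stay in", False: "play"},
        },
    },
}


def classify(node, instance):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(node, dict):
        outcome = instance[node["attribute"]]
        node = node["branches"][outcome]
    return node


label = classify(tree, {"outlook": "sunny", "humidity": "normal", "windy": False})
```

Because every prediction is just a path of attribute tests from the root to a leaf, the reasoning behind a classification can be read straight off the tree.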
Linear regression is a simple algebraic tool which attempts to find the “best” (generally straight) line fitting two or more attributes, with one attribute (simple linear regression), or a combination of several (multiple linear regression), being used to predict another, the class attribute. A set of training instances is used to compute the linear model, with one attribute, or a set of attributes, being plotted against another. The model then predicts the class attribute of a new instance by identifying where the instance would lie on the regression line, given the values of its predictor attributes.
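For the simple (single-predictor) case, the line can be computed in closed form. The sketch below assumes ordinary least squares as the definition of "best", which the text above does not specify, and uses made-up training values. The function names are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the class attribute for a new instance's predictor value."""
    return slope * x + intercept


# Training instances: one predictor attribute (xs) and the class attribute (ys).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
prediction = predict(6.0, slope, intercept)
```

With these perfectly linear training values the fitted line is y = 2x + 1, so a new instance with a predictor value of 6 is assigned a predicted class attribute of 13. Real data would scatter around the line instead of sitting on it.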