Machine Learning Key Terms, Explained
An overview of 12 important machine learning concepts, presented in a no frills, straightforward definition style.
There are many posts on KDnuggets covering the explanation of key terms and concepts in the areas of Data Science, Machine Learning, Deep Learning, Big Data, etc. (see here, here, and here). In fact, it's one of the tasks that KDnuggets takes quite seriously: introducing and clarifying concepts in the minds of new and seasoned practitioners alike. In many of these posts, concepts and terminology are often expounded upon and fit into The Big Picture, sometimes miring down the key concept in exchange for defining some greater notion.
This is the first in a series of such posts on KDnuggets which will offer concise explanations of a related set of terms (machine learning, in this case), specifically taking a no-frills approach for those looking to isolate and define. After some thought, it was determined that these foundational-yet-informative types of posts have not been given enough exposure in the past, with future iterations likely to include:
- Deep Learning
- Natural Language Processing
- Data Mining and Data Science
- Other interesting topics we can think of :)
Not enough information provided in these definitions for you? No worries, since each term listed links to related posts on KDnuggets for further investigation.
So, let's start with a look at machine learning and related topics.
1. Machine Learning
According to Mitchell, machine learning is "concerned with the question of how to construct computer programs that automatically improve with experience." Machine learning is interdisciplinary in nature, and employs techniques from the fields of computer science, statistics, and artificial intelligence, among others. The main artefacts of machine learning research are algorithms which facilitate this automatic improvement from experience, algorithms which can be applied in such diverse fields as computer vision, artificial intelligence, and data mining.
Classification is concerned with building models that separate data into distinct classes. These models are built by inputting a set of training data for which the classes are pre-labelled in order for the algorithm to learn from. The model is then used by inputting a different dataset for which the classes are withheld, allowing the model to predict their class membership based on what it has learned from the training set. Well-known classification schemes include decision trees and support vector machines. As this type of algorithm requires explicit class labelling, classification is a form of supervised learning.
Regression is very closely related to classification. While classification is concerned with the prediction of discrete classes, regression is applied when the "class" to be predicted is made up of continuous numerical values. Linear regression is an example of a regression technique.
Clustering is used for analyzing data which does not include pre-labeled classes, or even a class attribute at all. Data instances are grouped together using the concept of "maximizing the intraclass similarity and minimizing the interclass similarity," as concisely described by Han, Kamber & Pei. This translates to the clustering algorithm identifying and grouping instances which are very similar, as opposed to ungrouped instances which are much less-similar to one another. k-means clustering is perhaps the most well-known example of a clustering algorithm. As clustering does not require the pre-labeling of instance classes, it is a form of unsupervised learning, meaning that it learns by observation as opposed to learning by example.
Association is most easily explained by introducing market basket analysis, a typical task for which it is well-known. Market basket analysis attempts to identify associations between the various items that have been chosen by a particular shopper and placed in their market basket, be it real or virtual, and assigns support and confidence measures for comparison. The value of this lies in cross-marketing and customer behavior analysis. Association is a generalization of market basket analysis, and is similar to classification except that any attribute can be predicted in association. Apriori enjoys success as the most well-known example of an association algorithm. Association is another example of unsupervised learning.