Nine Laws of Data Mining, part 2

The second group of data mining laws includes: There are always patterns; Data mining amplifies perception in the business domain; Prediction increases information locally by generalisation; the Value Law; and the Law of Change. Tom Khabaza explains.

By Tom Khabaza, Society of Data Miners.

Continued from 9 Laws of Data Mining, part 1.

5th Law of Data Mining – “Watkins’ Law”:

 There are always patterns

This law was first stated by David Watkins.  We might expect that a proportion of data mining projects would fail because the patterns needed to solve the business problem are not present in the data, but this does not accord with the experience of practising data miners.

Previous explanations have suggested that this is because:

  • There is always something interesting to be found in a business-relevant dataset, so that even if the expected patterns were not found, something else useful would be found (this does accord with data miners’ experience), and
  • A data mining project would not be undertaken unless business experts expected that patterns would be present, and it should not be surprising that the experts are usually right.

However, Watkins formulated this in a simpler and more direct way: “There are always patterns”, and this accords more accurately with the experience of data miners than either of the previous explanations.  Watkins later narrowed this to the claim that in data mining projects about customer relationships, there are always patterns connecting customers’ previous behaviour with their future behaviour, and that these patterns can be used profitably (“Watkins’ CRM Law”).  However, data miners’ experience is that this is not limited to CRM problems – there are always patterns in any data mining problem (“Watkins’ General Law”).

The explanation of Watkins’ General Law is as follows:

  • The business objective of a data mining project defines the domain of interest, and this is reflected in the data mining goal.
  • Data relevant to the business objective and consequent data mining goal is generated by processes within the domain.
  • These processes are governed by rules, and the data that is generated by the processes reflects those rules.
  • In these terms, the purpose of the data mining process is to reveal the domain rules by combining pattern-discovery technology (data mining algorithms) with the business knowledge required to interpret the results of the algorithms in terms of the domain.
  • Data mining requires relevant data, that is, data generated by the domain processes in question, which inevitably holds patterns arising from the rules which govern these processes.

To summarise this argument: there are always patterns because they are an inevitable by-product of the processes which produce the data.  To find the patterns, start from the process or what you know of it – the business knowledge.

Discovery of these patterns also forms an iterative process with business knowledge; the patterns contribute to business knowledge, and business knowledge is the key component required to interpret the patterns.  In this iterative process, data mining algorithms simply link business knowledge to patterns which cannot be observed with the naked eye.

If this explanation is correct, then Watkins’ law is entirely general.  There will always be patterns for every data mining problem in every domain unless there is no relevant data; this is guaranteed by the definition of relevance.

6th Law of Data Mining – “Insight Law”:

 Data mining amplifies perception in the business domain

How does data mining produce insight?  This law approaches the heart of data mining – why it must be a business process and not a technical one.  Business problems are solved by people, not by algorithms.  The data miner and the business expert “see” the solution to a problem, that is, the patterns in the domain that allow the business objective to be achieved. Thus data mining is, or assists as part of, a perceptual process.  Data mining algorithms reveal patterns that are not normally visible to human perception.  The data mining process integrates these algorithms with the normal human perceptual process, which is active in nature. Within the data mining process, the human problem solver interprets the results of data mining algorithms and integrates them into their business understanding, and thence into a business process.

This is similar to the concept of an “intelligence amplifier”.  Early in the field of Artificial Intelligence, it was suggested that the first practical outcomes from AI would be not intelligent machines, but rather tools which acted as “intelligence amplifiers”, assisting human users by boosting their mental capacities and therefore their effective intelligence.  Data mining provides a kind of intelligence amplifier, helping business experts to solve business problems in a way which they could not achieve unaided. 

In summary: Data mining algorithms provide a capability to detect patterns beyond normal human capabilities.  The data mining process allows data miners and business experts to integrate this capability into their own problem solving and into business processes.

7th Law of Data Mining – “Prediction Law”:

Prediction increases information locally by generalisation

The term “prediction” has become the accepted description of what data mining models do – we talk about “predictive models” and “predictive analytics”.  This is because some of the most popular data mining models are often used to “predict the most likely outcome” (as well as indicating how likely the outcome may be).  This is the typical use of classification and regression models in data mining solutions. 

However, other kinds of data mining models, such as clustering and association models, are also characterised as “predictive”; this is a much looser sense of the term.  A clustering model might be described as “predicting” the group into which an individual falls, and an association model might be described as “predicting” one or more attributes on the basis of those that are known.
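As a rough illustration of this looser sense of “prediction”, the sketch below assigns a new customer to the nearest of three cluster centres.  The centroid values, the segment names and the function assign_cluster are all hypothetical, invented for this example; in practice the centres would come from a clustering algorithm run on real data.

```python
import math

# Hypothetical cluster centres, as might be produced by an earlier clustering run.
# Each centre is (average monthly spend, average visits per month).
CENTROIDS = {
    "occasional": (20.0, 1.0),
    "regular": (80.0, 4.0),
    "high_value": (300.0, 10.0),
}


def assign_cluster(point, centroids=CENTROIDS):
    """Return the name of the centre closest to `point` (Euclidean distance)."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))


# "Predicting" the group for an individual the model has not seen before.
new_customer = (95.0, 5.0)
segment = assign_cluster(new_customer)
print(segment)
```

Nothing about the future is being forecast here: the model simply attaches a group label to the individual, which is why this counts as “prediction” only in the broad sense described above.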

Similarly we might analyse the use of the term “predict” in different domains: a classification model might be said to predict customer behaviour – more properly we might say that it predicts which customers should be targeted in a certain way, even though not all the targeted individuals will behave in the “predicted” manner.  A fraud detection model might be said to predict whether individual transactions should be treated as high-risk, even though not all those so treated are in fact cases of fraud.

These broad uses of the term “prediction” have led to the term “predictive analytics” as an umbrella term for data mining and the application of its results in business solutions.  But we should remain aware that this is not the ordinary everyday meaning of “prediction” – we cannot expect to predict the behaviour of a specific individual, or the outcome of a specific fraud investigation.

What, then, is “prediction” in this sense?  What do classification, regression, clustering and association algorithms and their resultant models have in common?  The answer lies in “scoring”, that is, the application of a predictive model to a new example.  The model produces a prediction, or score, which is a new piece of information about the example.  The available information about the example in question has been increased, locally, on the basis of the patterns found by the algorithm and embodied in the model, that is, on the basis of generalisation or induction.  It is important to remember that this new information is not “data”, in the sense of a “given”; it is information only in the statistical sense.
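Scoring can be sketched in a few lines of code.  The example below is a deliberately minimal stand-in for a real model: it generalises from a handful of made-up historical records by computing a response rate per segment, then scores a new example by attaching that rate to it.  The data, the field names and the functions fit_rate_model and score are assumptions made for illustration, not part of any particular data mining tool.

```python
from collections import defaultdict


def fit_rate_model(history):
    """Generalise from past examples: the observed response rate per segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [responders, total]
    for segment, responded in history:
        counts[segment][0] += int(responded)
        counts[segment][1] += 1
    return {seg: hits / total for seg, (hits, total) in counts.items()}


def score(model, example, default=0.0):
    """Return a copy of `example` with a new 'score' field added by the model."""
    scored = dict(example)
    scored["score"] = model.get(example["segment"], default)
    return scored


# Made-up historical records: (segment, did the customer respond?)
history = [
    ("student", True), ("student", False), ("student", False), ("student", False),
    ("retired", True), ("retired", True), ("retired", False), ("retired", True),
]
model = fit_rate_model(history)

# A new example: before scoring, all we know is the identifier and the segment.
new_example = {"customer_id": "C-001", "segment": "retired"}
scored = score(model, new_example)
print(scored)
```

The score is new information about this one example, but it was never observed for this customer; it is inferred from the pattern in the historical records, which is the sense in which it is information only statistically.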