Cellular Automata in Stream Learning
In this post, we will start presenting CA as pattern recognition methods for stream learning. Finally, we will briefly mention two recent CA-based solutions for stream learning. Both are highly interpretable as their cellular structure represents directly the mapping between the feature space and the labels to be predicted.
By Jesús López, Data Scientist and Researcher in Machine Learning and Artificial Intelligence
This post is dedicated to John Horton Conway and Tom Fawcett, who recently passed away, for their noted contributions to the field of cellular automata and machine learning.
Introduction
With the advent of fast data streams, real-time machine learning has become a challenging task. They can be affected by the concept drift effect, by which stream learning methods have to detect changes and adapt to these evolving conditions. Several emerging paradigms such as the so-called “Smart Dust”, “Utility Fog”, “TinyML” or “Swarm Robotics” are in need for efficient and scalable solutions in real-time scenarios. Cellular Automata (CA), as low-bias and robust-to-noise pattern recognition methods with competitive classification performances, meet the requirements imposed by the aforementioned paradigms mainly due to their simplicity and parallel nature.
In this post, we will start presenting CA as pattern recognition methods for stream learning. Finally, we will briefly mention two recent CA-based solutions for stream learning: LUNAR [1], which is capable of learning incrementally while adapting to the concept drift effect, and CURIE [2], that is able to act as a competitive drift detector in evolving scenarios. As we will see, both are highly interpretable as their cellular structure represents directly the mapping between the feature space and the labels to be predicted.
Cellular Automata in a nutshell
The journey of CA was initiated by John von Neumann [3] and Ulam, but they became really fashionable thanks to John Horton Conway and the popularity of Conway’s Game of Life [4] in the field of artificial life. Arguably, the most scientifically significant and elaborated work on the study of CA arrived in 2002 with the thoughtful studies of Stephen Wolfram [5]. In recent years, the notion of complex systems proved to be a very useful concept to define, describe, and study various natural phenomena observed in a vast number of scientific disciplines. Despite their simplicity, CA are able to describe and reproduce many complex phenomena. Besides, the capability of performing universal computation has been one of their most celebrated features. They have also garnered the attention of the research community due to the fact that an arbitrary Turing machine can be simulated by a cellular automaton, so universal computation is possible.
We can find four mutually interdependent parts in CA:
- A lattice is created by a grid of elements (cells), which can be composed in one, two or higher dimensional space, but typically composed of uniform squared cells in two dimensions.
- CA contain a finite set of discrete states, whose number and range are dictated by the phenomenon under study. We find the simplest CA built using only one Boolean state in one dimension.
- The neighborhood, which is used to evaluate a local rule, is defined by a set of neighboring (adjacent) cells. A neighboring cell is any cell within a certain radius of the cell in question. Additionally, it is important to specify whether the radius R applies only along the axes of the cell space (von Neumann neighborhood), or if it can be applied diagonally (Moore neighborhood). In two dimensions, the R=1 von Neumann neighborhood or the R=1 Moore neighborhood are often selected (see Figure 1).
- Finally, a local rule defines the evolution of each CA; it is usually realized by taking all states from all cells within the neighborhood, and by evaluating a set of logical or arithmetical operations written in the form of an algorithm (see Figure 2).
CA in Pattern Recognition
CA were (and are) not commonly used for pattern recognition, but Tom Fawcett changed that forever with his work [6], where CA were used as a form of instance-based classifiers. Thus, he put CA in value for the pattern recognition community by introducing them as a low-bias data mining method, simple but powerful for attaining massively fine-grained parallelism, non-parametric, with an effective and competitive classification performance in many cases (similar to that produced by other complex data mining models), and robust to noise. Besides, CA were found to perform well with relatively scarce data. All this prior evidence makes CA suited for pattern recognition tasks.
If we add two more parts, CA can turn into pattern recognition methods:
- Initialization: the grid is seeded depending on the feature values of the instances of the training dataset. Specifically, the state of each cell is assigned the label corresponding to the majority of training data instance with feature values falling within the range covered by the cell. As a result, cells will organize themselves into regions of similar labels (Figure 3).
- Generations: after the initialization step, some cells can remain unassigned. Then, it is necessary to run the CA (generations) until no cells are left empty, and then continuing until either no changes are made or a fixed threshold is exceeded. Along this process, each cell computes its new state by applying the update rule over the cells in its immediate neighborhood. Each cell follows the same update rule, and all cells are updated simultaneously and synchronously. A critical characteristic of CA is that the update rule examines only its neighboring cells, so the processing is entirely local. No global or macro grid characteristics are computed whatsoever (Figure 3).
CA in Stream Learning
In a real-time data mining process (stream learning), stream learning algorithms must operate under a set of constrained conditions: each instance can be processed only once, the processing time of each instance and the usage of memory must be low, and we must consider that data streams evolve along time (changes in data may occur). Since we cannot explicitly store all past data to detect or quantify these changes, concept drift detection and adaptation are acknowledged challenges for real-time processing algorithms.
Unfortunately, the original form of CA for pattern recognition does not allow for stream processing. The developed LUNAR algorithm [1] is a streamified learning version of CA which transforms CA into real incremental learners that incorporate an embedded mechanism for drift adaptation. And CURIE [2] is a drift detector relying on CA.
LUNAR
Here we introduce the key modifications to streamify a CA pattern recognition approach:
- Preparatory instances: a small portion of the data stream (preparatory instances is used to perform an initialization of LUNAR. Then, the grid is seeded with these data values, and LUNAR progresses through several generations (and applying the local rule) until all cells are assigned a state. With the traditional CA for pattern recognition, the whole dataset is available from the beginning. Therefore, the CA initialization is not required.
- An updated representation of the instance space: since historical data is not available and data grow continuously, LUNAR updates the cell bins of the grid every time a new data instance arrives. By doing this, we ensure that the range of values of the instance space is updated during the streaming process. With the traditional CA for pattern recognition, as the whole training dataset is static, the grid boundaries and cell bins are calculated at the beginning of the data mining process, and remain unaltered during the test phase.
- LUNAR’s cells are continuously updated: LUNAR’s cells are also updated when new data arrives by assigning the incoming label (state) to the corresponding cell. Instead of examining the cells in the immediate neighborhood of this corresponding cell during several generations (which would take time and computational resources), LUNAR just checks the previous state of the cell and the current state provided by the recent data instance. When both coincide, no change is required, but when they differ, the cell adopts the state of the incoming data instance. In this sense, LUNAR always represents the current data distribution. This is the way LUNAR works as an incremental learner. However, with the traditional CA for pattern recognition, once initialized, a local rule is applied over successive generations to set the state of a given cell depending on the states of its neighbors.
Regarding the adaptation capability of LUNAR to the drift, it is able to achieve such skill by hybridating with the approach of [7] (“Paired Learners”). Here, a stable learner is incrementally trained and predicts considering all of its experience during the streaming process, whereas a reactive learner is trained and predicts based on its experience over a time window. Figure 5 shows the successful adaptation of this mechanism.
CURIE
The design of a concept drift detector with high performance is not trivial, yet it is crucial to achieve more reliable streaming algorithms. The design objective is to develop techniques that detect all existing drifts in the stream with low latency and as few false alarms and missed detections as possible.
As shown in Figures 6 and 7, the detection mechanism of CURIE hinges on the evidence that a recent number of mutations in the neighborhood of a cell that has just mutated, may serve as an symptomatic indicator of the occurrence of a drift. If the previous state of the cell (before the arrival of the incoming instance) is different from the label of the incoming instance, mutation has happened. When there is a mutation, we assign the current time step to the cell in a grid of time steps. Then, CURIE checks the state of the neighboring cells in a radius in a specific period of time. If the number of neighboring mutants exceeds a threshold, CURIE considers that a drift has occurred.
Conclusions
Stream learning, due to its application to real-time data analysis, is probably one of the hottest topics in machine learning. This, combined with the fact that some emerging paradigms (such as “Smart Dust”, “Utility Fog”, “TinyML” or “Swarm Robotics”) are in need for tailor-made solutions for real-time data, makes CA a very interesting technique. The presented LUNAR and CURIE methods, based on the findings of John Horton Conway and Tom Fawcett among others, are promising researches that address these topics.
References
[1] http://www.sciencedirect.com/science/article/pii/S0020025520308306
[2] https://arxiv.org/abs/2009.09677
[3] Neumann, J., Burks, A. W. et al. (1966). Theory of self-reproducing automata volume 1102024. University of Illinois Press Urbana.
[4] Games, M. (1970). The fantastic combinations of john conways new solitaire game life by martin gardner. Scientific American, 223, 120–123.
[5] Wolfram, S. (2002). A new kind of science volume 5. Wolfram media Champaign, IL.
[6] Fawcett, T. (2008). Data mining with cellular automata. ACM SIGKDD Explorations Newsletter, 10, 32–39.
[7] Bach, S. H., & Maloof, M. A. (2008). Paired learners for concept drift. In 2008 Eighth IEEE International Conference on Data Mining (pp. 23–32). IEEE.
Bio: Dr. Jesús López (@txuslopez) is currently based in Bilbao (Spain) working at TECNALIA as Data Scientist and Researcher in Machine Learning and Artificial Intelligence. His research topics are real-time data mining (stream learning), concept drift, continual learning, anomaly detection, spiking neural networks, and cellular automata for data mining. Not forgetting the leisure side, he also loves to go outdoors to surf all over the globe.
Original. Reposted with permission.
Related:
- The ravages of concept drift in stream learning applications and how to deal with it
- Compute Goes Brrr: Revisiting Sutton’s Bitter Lesson for AI
- Society of Machines: The Complex Interaction of Agents