Anomaly Detection, A Key Task for AI and Machine Learning, Explained

One way to process data faster and more efficiently is to detect abnormal events, changes, or shifts in datasets. Anomaly detection refers to the identification of items or events that do not conform to an expected pattern or to the other items in a dataset, and that usually go undetected by a human expert.



By Sciforce.

It is true that the Industrial Internet of Things will change the world someday. So far, it is the abundance of data that makes the world spin faster. Piled into sometimes unmanageable datasets, big data has turned from the Holy Grail into a problem, pushing businesses and organizations to make faster decisions in real time. One way to process data faster and more efficiently is to detect abnormal events, changes, or shifts in datasets. Thus anomaly detection, a technology that relies on Artificial Intelligence to identify abnormal behavior within a pool of collected data, has become one of the main objectives of the Industrial IoT.

Anomaly detection refers to the identification of items or events that do not conform to an expected pattern or to the other items in a dataset, and that usually go undetected by a human expert. Such anomalies can usually be translated into problems such as structural defects, errors, or fraud.

 

Examples of potential anomalies:

 

  • A leaking connection pipe that leads to the shutdown of the entire production line;
  • Multiple failed login attempts indicating possible suspicious cyber activity;
  • Fraud detection in financial transactions.

 

Why is it important?

 
Modern businesses are beginning to understand the importance of interconnected operations for getting the full picture of their business. They also need to respond to fast-moving changes in data promptly, especially in the case of cybersecurity threats. Anomaly detection can be key to catching such intrusions: perturbations of normal behavior indicate the presence of intended or unintended attacks, defects, faults, and the like.

Unfortunately, there is no effective way to handle and analyze constantly growing datasets manually. In dynamic systems with numerous components in perpetual motion, where “normal” behavior is constantly being redefined, a new, proactive approach to identifying anomalous behavior is needed.

 

Statistical Process Control

 
Statistical Process Control, or SPC, is a gold-standard methodology for measuring and controlling quality in the course of manufacturing. Quality data in the form of product or process measurements are obtained in real-time during the manufacturing process and plotted on a graph with predetermined control limits that reflect the capability of the process. Data that falls within the control limits indicates that everything is operating as expected. Any variation within the control limits is likely due to a common cause — the natural variation that is expected as part of the process. If data falls outside of the control limits, this indicates that an assignable cause might be the source of the product variation, and something within the process needs to be addressed and changed to fix the issue before defects occur. In this way, SPC is an effective method to drive continuous improvement. By monitoring and controlling a process, we can assure that it operates at its fullest potential and detect anomalies at early stages.
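As a rough illustration of how control limits work in code, the sketch below (Python with NumPy; the baseline measurements are entirely hypothetical) places the limits three standard deviations around the mean of in-control data and reports any new readings that fall outside them:

import numpy as np

def control_limits(baseline, sigmas=3.0):
    # Center line and control limits estimated from in-control (baseline) data
    center = np.mean(baseline)
    spread = np.std(baseline, ddof=1)
    return center - sigmas * spread, center, center + sigmas * spread

def out_of_control(measurements, lcl, ucl):
    # Indices of measurements outside the control limits (assignable-cause candidates)
    measurements = np.asarray(measurements)
    return np.where((measurements < lcl) | (measurements > ucl))[0]

# Hypothetical example: part diameters from a stable process, then four new readings
baseline = np.random.normal(loc=10.0, scale=0.05, size=200)
lcl, center, ucl = control_limits(baseline)
print(out_of_control([10.01, 9.98, 10.22, 10.03], lcl, ucl))  # 10.22 should be flagged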

Introduced in 1924, the method is likely to stay at the heart of industrial quality assurance forever. However, integrating it with Artificial Intelligence techniques can make it more accurate and precise and give more insight into the manufacturing process and the nature of anomalies.

 

Tasks for Artificial Intelligence

 
When human resources are not enough to handle the elastic environment of cloud infrastructure, microservices and containers, Artificial Intelligence comes in, offering help in many aspects:

[Figure: Tasks for Artificial Intelligence]

 

Automation: AI-driven anomaly detection algorithms can automatically analyze datasets, dynamically fine-tune the parameters of normal behavior and identify breaches in the patterns.

Real-time analysis: AI solutions can interpret data activity in real time. The moment a pattern isn’t recognized by the system, it sends a signal.

Scrupulousness: Anomaly detection platforms provide end-to-end, gap-free monitoring that goes through the minutiae of data and identifies the smallest anomalies that would go unnoticed by humans.

Accuracy: AI enhances the accuracy of anomaly detection avoiding nuisance alerts and false positives/negatives triggered by static thresholds.

Self-learning: AI-driven algorithms constitute the core of self-learning systems that are able to learn from data patterns and deliver predictions or answers as required.

 

Learning Process of AI Systems

 
One of the best things about AI systems and ML-based solutions is that they can learn on the go and deliver better and more precise results with every iteration. The pipeline of the learning process is much the same for every system and comprises the following automatic and human-assisted stages (a minimal code sketch of the loop follows the figure below):

  • Datasets are fed to an AI system
  • Data models are developed based on the datasets
  • A potential anomaly is raised each time a transaction deviates from the model
  • A domain expert approves the deviation as an anomaly
  • The system learns from the action and builds upon the data model for future predictions
  • The system continues to accumulate patterns based on the preset conditions
[Figure: Learning Process of AI Systems]
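The loop below is a minimal sketch of these stages in Python; the model object, the scoring threshold, and the expert_review callback are placeholders for illustration, not the pipeline of any particular product.

def anomaly_learning_loop(data_stream, model, threshold, expert_review):
    # Score each transaction against the current data model, raise deviations,
    # let a domain expert confirm or reject them, and fold the label back in.
    confirmed = []
    for transaction in data_stream:
        score = model.score(transaction)            # deviation from the learned model
        is_anomaly = score > threshold              # potential anomaly raised
        if is_anomaly:
            is_anomaly = expert_review(transaction, score)  # human-assisted approval
            if is_anomaly:
                confirmed.append(transaction)
        model.update(transaction, is_anomaly)       # build upon the model for future predictions
    return confirmed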

 

As elsewhere in AI-powered solutions, the algorithms to detect anomalies are built on supervised or unsupervised machine learning techniques.

 

Supervised Machine Learning for Anomaly Detection

 
The supervised method requires a labeled training set containing both normal and anomalous samples in order to construct a predictive model. The most common supervised methods include supervised neural networks, support vector machines, k-nearest neighbors, Bayesian networks, and decision trees.

Probably the most popular nonparametric technique is k-nearest neighbor (k-NN), which calculates the approximate distances between points in the input space and assigns an unlabeled point to the class of its k nearest neighbors. Another effective model is the Bayesian network, which encodes probabilistic relationships among the variables of interest.
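For illustration only, the snippet below fits a k-NN classifier (via scikit-learn) on a labeled set of normal and anomalous samples; the three-feature synthetic data is an assumption made just for this sketch:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(500, 3))       # labeled normal samples
X_anomalous = rng.normal(4.0, 1.0, size=(25, 3))      # labeled anomalous samples
X = np.vstack([X_normal, X_anomalous])
y = np.concatenate([np.zeros(len(X_normal)), np.ones(len(X_anomalous))])

# k-NN assigns an unlabeled point to the majority class of its k nearest neighbors
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(clf.predict([[0.1, -0.2, 0.3], [4.2, 3.9, 4.1]]))   # expected: [0. 1.]
print(clf.predict_proba([[4.2, 3.9, 4.1]]))               # class probabilities as a confidence score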

Supervised models are believed to provide a better detection rate than unsupervised methods due to their capability of encoding interdependencies between variables, along with their ability to incorporate both prior knowledge and data and to return a confidence score with the model output.

 

Unsupervised Machine Learning for Anomaly Detection

 
Unsupervised techniques do not require manually labeled training data. They presume that most of the network connections are normal traffic and that only a small percentage is abnormal, and they anticipate that malicious traffic is statistically different from normal traffic. Based on these two assumptions, groups of frequent, similar instances are assumed to be normal, while infrequent groups of instances are categorized as malicious.

The most popular unsupervised algorithms include K-means, autoencoders, Gaussian mixture models (GMMs), PCA, and analysis based on hypothesis tests.

[Figure: The most popular unsupervised algorithms]
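As a small unsupervised sketch along these lines, K-means (here via scikit-learn, on synthetic data) clusters the bulk of the traffic, and the distance to the nearest centroid serves as an anomaly score; the 97th-percentile cut-off is purely illustrative:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(950, 2)),    # dense, frequent "normal" connections
    rng.uniform(-8.0, 8.0, size=(50, 2)),   # sparse, scattered connections
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)
distances = np.min(kmeans.transform(X), axis=1)   # distance to the nearest centroid
threshold = np.quantile(distances, 0.97)          # illustrative cut-off
anomalies = np.where(distances > threshold)[0]
print(len(anomalies), "connections flagged as potentially malicious")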

 

 

SciForce’s Chase for Anomalies

 
Like probably any company specializing in Artificial Intelligence and IoT solutions, we found ourselves hunting for anomalies for a client from the manufacturing industry. Using generative models for likelihood estimation, we detected defects, sped up the regular processing algorithms, increased the system’s stability, and created a customized processing routine that takes care of anomalies.

For anomaly detection to be used commercially, it needs to encompass two parts: anomaly detection itself and prediction of future anomalies.

 

Anomaly detection part

 
For the anomaly detection part, we relied on autoencoders: models that map input data into a hidden representation and then attempt to restore the original input from this internal representation. For regular pieces of data, such a reconstruction will be accurate, while in the case of anomalies the decoded result will differ noticeably from the input.
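A minimal autoencoder of this kind might look as follows; this is a PyTorch sketch with assumed layer sizes and input width, not necessarily the architecture used in the project:

import torch
import torch.nn as nn

class SensorAutoencoder(nn.Module):
    # Maps a sensor window to a small hidden representation and back;
    # a large reconstruction error suggests an anomaly. Sizes are assumptions.
    def __init__(self, n_inputs, n_hidden=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_inputs, 32), nn.ReLU(),
                                     nn.Linear(32, n_hidden))
        self.decoder = nn.Sequential(nn.Linear(n_hidden, 32), nn.ReLU(),
                                     nn.Linear(32, n_inputs))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = SensorAutoencoder(n_inputs=16)
x = torch.randn(4, 16)                              # a batch of (scaled) sensor windows
error = torch.mean((x - model(x)) ** 2, dim=1)      # per-sample reconstruction error
print(error)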

[Figure: Results of our anomaly detection model; potential anomalies are marked in red.]

 

In addition to the autoencoder model, we used a quantitative assessment of the similarity between the reconstruction and the original input. For this, we first computed sliding-window averages of the sensor inputs, i.e. the average value of each sensor over a 1-minute interval, computed every 30 seconds, and fed the data to the autoencoder model. Afterwards, we calculated the distances between the input data and the reconstructions on a set of data and computed quantiles of the distance distribution. These quantiles allowed us to translate an abstract distance number into a meaningful measure and to mark samples that exceeded a preset threshold (97%) as anomalies.
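The windowing and thresholding steps can be sketched roughly as below (pandas and NumPy); the helper names are ours, and only the 1-minute/30-second windows and the 97% quantile come from the description above:

import numpy as np
import pandas as pd

def window_averages(df, step="30s"):
    # Approximate 1-minute averages computed every 30 seconds: resample raw sensor
    # readings to 30-second means, then average each pair of consecutive bins.
    return df.resample(step).mean().rolling(2).mean().dropna()

def anomaly_mask(inputs, reconstructions, quantile=0.97):
    # Distance between each input window and its reconstruction; samples above the
    # preset quantile of the distance distribution are marked as anomalies.
    distances = np.linalg.norm(inputs - reconstructions, axis=1)
    threshold = np.quantile(distances, quantile)
    return distances > threshold, distances, threshold

# Hypothetical usage: df is a DataFrame of raw sensor readings with a DatetimeIndex,
# and reconstruct() is the trained autoencoder applied to the averaged windows.
# averaged = window_averages(df)
# mask, distances, threshold = anomaly_mask(averaged.values, reconstruct(averaged.values))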

 

Sensor readings prediction

 
With enough training data, quantiles can serve as an input for prediction models based on recurrent neural networks (RNNs). The goal of our prediction model was to estimate future sensor readings.

Though we used each sensor to predict the other sensors’ behavior, we trained a separate model for each sensor. Since the trends in the data samples were clear enough, we used linear autoregressive models that rely on previous readings to predict future values.

Similarly to the anomaly detection part, we averaged each sensor’s values over a 1-minute interval every 30 seconds. Then we built a 30-minute context (the number of previous timesteps) by stacking 30 consecutive windows. The resulting data was fed into the prediction models for each sensor, and the predictions were saved as estimates of the sensor readings for the following 1-minute window. To extend the forecast further in time, we gradually substituted the older windows with predicted values.
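A rough per-sensor sketch of such an autoregressive model is given below, with an assumed context of 30 windows and the rolling substitution of predictions for older values; the readings are synthetic:

import numpy as np
from sklearn.linear_model import LinearRegression

CONTEXT = 30   # number of previous windows stacked as the model's input

def fit_autoregressive(series):
    # Fit a linear model that predicts the next window from the previous CONTEXT windows
    X = np.array([series[i:i + CONTEXT] for i in range(len(series) - CONTEXT)])
    y = series[CONTEXT:]
    return LinearRegression().fit(X, y)

def forecast(model, history, steps):
    # Roll the model forward, gradually substituting older windows with predicted values
    context = list(history[-CONTEXT:])
    predictions = []
    for _ in range(steps):
        next_value = model.predict([context])[0]
        predictions.append(next_value)
        context = context[1:] + [next_value]
    return predictions

# Synthetic readings standing in for one sensor's averaged values
readings = np.sin(np.linspace(0, 20, 300)) + np.random.normal(0, 0.05, 300)
model = fit_autoregressive(readings)
print(forecast(model, readings, steps=20))   # roughly 10 minutes ahead at 30-second steps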

[Figure: Results of the prediction models, with historical data marked in blue and predictions in green.]

 

It turned out that the context is crucial for predicting the next time step. With the scarce data available and relatively small context windows we could make accurate predictions for up to 10 minutes ahead.

 

Conclusion

 
Anomaly detection alone, or coupled with prediction functionality, can be an effective means of catching fraud and discovering strange activity in large and complex datasets. It may be crucial for banking security, medicine, marketing, the natural sciences, and manufacturing industries that depend on smooth and secure operations. With Artificial Intelligence, businesses can increase the effectiveness and safety of their digital operations, preferably with our help.

 
Original. Reposted with permission.
