What's the best way to detect anomalies in your data? We explore different approaches to getting the most value from your industrial data to help you avoid downtime.
“Prediction is very difficult, especially about the future.”… Niels Bohr, Danish physicist
Industrial downtime for today’s complex businesses means much more than a simple inconvenience. The cost of unplanned interruptions, the impact of unforeseen failures and the effect of unexpected breakdowns cumulatively add up to far more than factory workers merely being prevented from completing their normal tasks.
Whenever a technician on the factory floor has to address unplanned maintenance…
Every time a simple repair swallows hours of precious productive labor…
Each time service personnel are called in for an abrupt overhaul…
Every time this happens, an industrial unit stands to lose $540,000 per hour from a preventable technical failure.
Industrial asset downtime is in fact a $647 billion problem. Addressing this problem begins with getting the foundation right: a rock-solid foundation built on scientific predictive analytics and accurate anomaly detection.
To help secure this foundation, a number of asset-intensive industries are investing in real-time Anomaly Detection and Prediction. Accurate Anomaly Detection can go a long way toward successfully predicting failures of critical assets.
Most industrial Anomaly Detection efforts fail because these systems end up identifying either too many anomalies (false positives) or too few (false negatives). Anomaly Detection is also a complex concept in itself, because one is often searching for anomalies without knowing what constitutes an anomaly, or looking for potential abnormalities without knowing what even qualifies as an “abnormal state.”
Since the process involves scouting for those “unknown unknowns” amidst a sea of industrial data patterns, using the right approach is extremely important. After all, the objective is to cull out uncertainties and add predictability to industrial operations.
Below, we discuss some of the most common approaches used for Anomaly Detection and Prediction. We also highlight the advantages of using certain suggested approaches as best practices, for nailing those anomalies on time.
Anomaly Detection systems fall broadly into two camps: rule-based or supervised machine learning systems on one hand, and unsupervised or semi-supervised machine learning systems on the other. Rule-based systems are designed by defining specific rules that describe an anomaly and assigning thresholds and limits to them. They typically rely on the experience of industry experts and are ideal for detecting “known anomalies.” These known anomalies are familiar to us because we recognize what is normal and what is not.
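As a rough illustration, a rule-based check can be as simple as comparing each reading with expert-set limits. The sketch below is a minimal Python example; the `Rule` class, the sensor names and the limit values are all hypothetical. Note that it can only flag the deviations someone thought to write a rule for.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A hypothetical expert-defined rule: one sensor with fixed lower and upper limits."""
    sensor: str
    low: float
    high: float


def check_rules(reading: dict[str, float], rules: list[Rule]) -> list[str]:
    """Return the names of sensors whose value falls outside its rule's limits."""
    violations = []
    for rule in rules:
        value = reading.get(rule.sensor)
        if value is None:
            continue  # sensor missing from this reading, so this rule cannot fire
        if value < rule.low or value > rule.high:
            violations.append(rule.sensor)
    return violations


# Limits an expert might set from experience (illustrative values only).
RULES = [
    Rule("bearing_temp_c", low=10.0, high=85.0),
    Rule("vibration_mm_s", low=0.0, high=7.1),
]

print(check_rules({"bearing_temp_c": 92.0, "vibration_mm_s": 3.2}, RULES))  # → ['bearing_temp_c']
```

Any failure that does not push one of these two sensors past its limits goes unnoticed, which is exactly the “known anomalies only” limitation.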
One of the major flaws of rule-based and supervised systems is that they don’t adapt automatically as patterns change. To learn new patterns, the rules have to be redefined or a new model built each time with labeled data. As a result, these models are not suitable for dynamic, high-velocity data. Further, the data labeling process itself can be manually intensive and error-prone, which can lead to poor model performance. Basically, it is quite a challenge to capture the “unknown unknowns” with supervised methods.
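The “unknown unknowns” problem can be seen in a toy supervised classifier. This sketch uses a nearest-centroid model, chosen here only for brevity and not a method the article prescribes, trained on hypothetical labeled [temperature, pressure] readings. Because it can only answer with labels it has already seen, a new kind of failure is silently mapped to the closest known class.

```python
import math


def train_centroids(samples: list[tuple[list[float], str]]) -> dict[str, list[float]]:
    """Average the labeled feature vectors of each class into one centroid per label."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(features: list[float], centroids: dict[str, list[float]]) -> str:
    """Return the label of the nearest centroid; only previously labeled classes are possible."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Hypothetical labeled history of [temperature, pressure] readings.
history = [
    ([60.0, 5.0], "normal"), ([62.0, 5.1], "normal"),
    ([95.0, 5.0], "overheat"), ([97.0, 4.9], "overheat"),
]
centroids = train_centroids(history)

# A never-labeled failure mode (pressure collapse) is still forced into a known class.
print(classify([61.0, 0.5], centroids))  # → normal
```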
Since real business scenarios are quite complex and full of uncertainties, not everything happens the “known” way. For anything that is out of the ordinary or not yet discovered, rule-based Anomaly Detection is ineffective. In such cases, unsupervised or semi-supervised machine learning-based detection systems are most appropriate. These are cognitively enabled systems that use machine learning algorithms designed to predict anomalies arising from unusual situations.
Unsupervised learning can help deduce patterns that are unusual and alert plant operators accordingly. Just as the human brain tries to predict the next note in a melody, these machine learning algorithms constantly predict what is likely to happen next in the metric data stream. Because they can predict multiple data patterns at once, they assign a likelihood score to each prediction. As each new metric data point arrives, the algorithms compare their prediction with the new input to see whether the prediction was accurate. The final result is a clear “yes” or “no.” You finally have an intelligent way to detect and predict “unknown” anomalies, using a cognitive system, with greater accuracy and before the incident occurs.
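The predict, compare and decide loop described above can be sketched in a few lines of Python. This is not the specific algorithm the article refers to. As a stand-in, it assumes an exponentially weighted mean as the “prediction” and an exponentially weighted variance as the expected spread, turns the prediction error into a 0-to-1 score with the normal distribution, and makes the yes/no call with a z-score threshold. The class name and default parameters are illustrative.

```python
import math


class OnlinePredictor:
    """Predict the next value of a metric stream and score each new point against it.

    Illustrative stand-in: the "prediction" is an exponentially weighted mean,
    and the expected spread is an exponentially weighted variance.
    """

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # adaptation rate: higher values forget the past faster
        self.threshold = threshold  # z-score above which the answer is "yes, anomaly"
        self.warmup = warmup        # points to learn from before giving yes/no answers
        self.mean = None
        self.var = 0.0
        self.seen = 0

    def update(self, x):
        """Score x against the current prediction, then learn from it.

        Returns (score, is_anomaly). The score lies in [0, 1]; values near 1
        mean x was very unlikely under the model.
        """
        if self.mean is None:  # first point: nothing to compare against yet
            self.mean, self.seen = x, 1
            return 0.0, False
        prediction = self.mean
        std = math.sqrt(self.var) if self.var > 0 else 1e-9
        z = abs(x - prediction) / std
        # Probability that a standard normal error falls within +/- z.
        score = math.erf(z / math.sqrt(2))
        is_anomaly = self.seen >= self.warmup and z > self.threshold
        # Incremental exponentially weighted updates of the mean and variance.
        diff = x - self.mean
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        self.seen += 1
        return score, is_anomaly


stream = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 25.0]
model = OnlinePredictor()
flags = [model.update(x)[1] for x in stream]
print(flags)  # only the final reading (25.0) is flagged
```

Because the model keeps updating after every point, a slow, legitimate shift in the metric gradually becomes the new “normal” instead of being flagged forever; `alpha` controls how quickly that happens.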
In the traditional top-down approach, different features are calculated for each sensor. All the features generated from each sensor are then put together to define the state space. The motivation behind generating these features is to capture different characteristics of each sensor during different stages of operation.
For example, when there is an upward or downward trend in the signal, a rate-of-change feature becomes very useful. Similarly, when a specific sensor’s values show a well-defined periodicity, features from the frequency domain are very useful. While each generated feature is useful, its utility is generally local to a signal with a specific characteristic, and it may not be as useful in a global setting once the signal stops displaying that characteristic. So, when a particular sensor leaves the phase in which it followed a positive or negative trend, its rate-of-change features will mostly be close to zero.
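A minimal sketch of the top-down approach, under stated assumptions: four engineered features per sensor (mean, standard deviation, a least-squares rate of change and the dominant frequency from a plain discrete Fourier transform) are concatenated into one state vector. The feature set and all function names are hypothetical choices for illustration.

```python
import cmath
import math
import statistics


def slope(values: list[float]) -> float:
    """Least-squares rate of change per sample (the "rate of change" feature)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def dominant_frequency(values: list[float]) -> float:
    """Frequency, in cycles per sample, of the strongest non-constant DFT component."""
    n = len(values)
    mean = statistics.fmean(values)
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum((v - mean) * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k / n


def sensor_features(values: list[float]) -> list[float]:
    """The same four engineered features, computed for every sensor regardless of its behavior."""
    return [statistics.fmean(values), statistics.pstdev(values), slope(values), dominant_frequency(values)]


def state_vector(window: dict[str, list[float]]) -> list[float]:
    """Top-down state space: every sensor's features concatenated in a fixed order."""
    vector: list[float] = []
    for name in sorted(window):
        vector.extend(sensor_features(window[name]))
    return vector


window = {
    "temperature": [float(t) for t in range(16)],                 # steady upward trend
    "vibration": [math.sin(2 * math.pi * t / 4) for t in range(16)],  # period of 4 samples
}
print(len(state_vector(window)))  # → 8
```

The trend feature matters for `temperature` and the frequency feature for `vibration`, yet both are computed for both sensors. With two sensors the vector already has 8 dimensions, and each additional sensor adds four more whether or not those features suit its signal.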
In addition, not all sensors exhibit the same characteristics, which makes some of the generated features less useful for a particular sensor. But since the engineered features are applied to all of the data, these calculations increase the dimension of the state space with features that carry little information. This sparseness of information generally increases with the number of sensors and as each sensor passes through different stages of operation.
In the bottom-up approach, however, the state space of the machine is developed with the recognition that each sensor maps a portion of the dynamic process generating the data. The approach also recognizes that sensors can pass through different stages as the process, environment or configuration of the machine changes.
Before the state space of the machine is defined, each sensor’s data is broken into its respective stages. Hence the approach does not rely on feature engineering to encode information from different stages of operation implicitly.
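One simple way to break a sensor into stages, assumed here purely for illustration, is to start a new stage whenever a reading jumps away from the running mean of the current stage by more than a tolerance. Production systems would likely use more robust change-point or regime detection; `split_into_stages` and its `jump` tolerance are hypothetical.

```python
import statistics


def split_into_stages(values: list[float], jump: float) -> list[list[float]]:
    """Split a sensor series into contiguous stages.

    A new stage starts whenever a sample differs from the running mean of the
    current stage by more than `jump` (a hand-picked tolerance).
    """
    stages = [[values[0]]]
    for v in values[1:]:
        current = stages[-1]
        if abs(v - statistics.fmean(current)) > jump:
            stages.append([v])  # level shift: this reading opens a new stage
        else:
            current.append(v)   # still within the current stage
    return stages


# Hypothetical temperature trace: idle, then under load, then idle again.
trace = [20.0, 21.0, 20.0, 60.0, 61.0, 59.0, 60.0, 20.0, 21.0]
stages = split_into_stages(trace, jump=10.0)
print([len(s) for s in stages])  # → [3, 4, 2]
```

Each stage can then be modeled on its own terms before the machine-level state space is assembled.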
This also takes into account the fact that some anomalies show up only in a specific sub-group of sensors. Such anomalies may be difficult to detect in the machine state space, especially when a large number of sensors are present. At the same time, machine-level anomalies resulting from the interaction between sensors will be difficult to detect at the individual sensor level.
Hence, anomalies are detected within each stage of each sensor and then again in the machine state space. The approach can also identify sequence-based anomalies, where individually normal-looking stages appear out of their known sequence, at both the sensor and the machine level.
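The sensor-level half of this idea can be sketched as follows, assuming the stages of one sensor have already been identified and given labels. Each stage gets its own baseline, and the stage transitions seen in normal history become the “known sequence.” The stage labels, readings and function names are hypothetical, and the second, machine-level pass is not shown.

```python
import statistics

Stage = tuple[str, list[float]]  # (stage label, readings recorded during that stage)


def stage_baselines(history: list[Stage]) -> dict[str, tuple[float, float]]:
    """Learn a separate mean and standard deviation for each stage label."""
    grouped: dict[str, list[float]] = {}
    for label, values in history:
        grouped.setdefault(label, []).extend(values)
    return {label: (statistics.fmean(v), statistics.pstdev(v)) for label, v in grouped.items()}


def known_transitions(history: list[Stage]) -> set[tuple[str, str]]:
    """Collect every stage-to-stage transition seen during normal operation."""
    return {(a[0], b[0]) for a, b in zip(history, history[1:])}


def check_run(run: list[Stage], baselines, transitions, z_limit: float = 3.0):
    """Return (in-stage value anomalies, out-of-sequence transitions) for a new run.

    Assumes every stage label in `run` also appears in the history.
    """
    value_anomalies = []
    for label, values in run:
        mean, std = baselines[label]
        for v in values:
            if std > 0 and abs(v - mean) / std > z_limit:
                value_anomalies.append((label, v))
    sequence_anomalies = [
        (a[0], b[0]) for a, b in zip(run, run[1:]) if (a[0], b[0]) not in transitions
    ]
    return value_anomalies, sequence_anomalies


history = [
    ("idle", [20.0, 21.0, 19.0, 20.0]),
    ("load", [60.0, 61.0, 59.0, 60.0]),
    ("cooldown", [35.0, 36.0, 34.0, 35.0]),
    ("idle", [20.0, 21.0, 19.0, 20.0]),
]
run = [("idle", [20.0, 21.0]), ("load", [60.0, 35.0]), ("idle", [20.0])]
print(check_run(run, stage_baselines(history), known_transitions(history)))
# → ([('load', 35.0)], [('load', 'idle')])
```

The reading 35.0 would look perfectly normal to a model of the whole series, since it is a typical cooldown value, but it is anomalous within the load stage. Likewise, every stage in the run looks normal on its own, yet skipping cooldown produces an unknown `load` to `idle` transition.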
A manual approach to Anomaly Detection is good at detecting outliers, the extreme-value points that signal anomalies. It relies on sample data to train and build machine learning models. However, since anomalies are rare events, the chosen data samples may not contain all failures or signals.
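A quick back-of-the-envelope calculation shows why rarity matters. If each reading is anomalous independently with probability p, a random sample of n readings contains no anomalies at all with probability (1 - p)^n. The independence assumption, the anomaly rate and the sample size below are hypothetical.

```python
def chance_sample_misses_all_anomalies(anomaly_rate: float, sample_size: int) -> float:
    """Probability that a random sample contains no anomalous points at all.

    Assumes each sampled point is anomalous independently with probability anomaly_rate.
    """
    return (1.0 - anomaly_rate) ** sample_size


# Hypothetical numbers: 1 anomalous reading in 1,000, and a 500-point training sample.
print(round(chance_sample_misses_all_anomalies(0.001, 500), 3))  # → 0.606
```

Under these assumptions, roughly six out of ten such samples would contain no anomaly to learn from.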
The biggest flaw here is that it is reactive in nature and past anomalies may not always be indicative of future problems. Also, it tries to fit a single behavioral pattern model for the entire time series of data. This results in building a single predictive model for all entities.
Since every entity passes through multiple stages, its behavior cannot be explained by a single model. Every entity is also different in the real world because of environmental influences, so a single model can’t learn and predict for all of them.
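The cost of a single model for all entities can be illustrated with two hypothetical pumps that run at different normal temperatures. The sketch assumes a plain z-score test as the “model”; it is fitted once on pooled data and once on each pump’s own history.

```python
import statistics


def is_anomalous(value: float, baseline: list[float], z_limit: float = 3.0) -> bool:
    """Flag value if it lies more than z_limit standard deviations from the baseline mean."""
    mean = statistics.fmean(baseline)
    std = statistics.pstdev(baseline)
    return std > 0 and abs(value - mean) / std > z_limit


# Hypothetical temperature histories: pump_b runs hotter because of its environment.
history = {
    "pump_a": [49.0, 50.0, 51.0, 50.0],
    "pump_b": [79.0, 80.0, 81.0, 80.0],
}
pooled = [v for values in history.values() for v in values]

entity, reading = "pump_a", 62.0
print(is_anomalous(reading, pooled))           # single model for all entities → False
print(is_anomalous(reading, history[entity]))  # one model per entity → True
```

Pooled together, the two pumps look like one wide distribution centred near 65, so 62 °C passes unnoticed; against pump_a’s own history, the same reading is far outside its normal range.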
Further, even normal values that occur out of sequence are anomalies, and they might be the critical ones. Therefore, just identifying anomalies isn’t enough; each anomaly also needs to be assigned an importance based on outcomes.
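One simple way to assign importance, assumed here only for illustration, is to weight each detector score by the outcome historically associated with that kind of anomaly, for example average downtime hours. The `Anomaly` class, anomaly kinds, weights and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Anomaly:
    asset: str
    kind: str
    score: float  # detector confidence in [0, 1]


# Hypothetical outcome weights, e.g. average downtime hours that followed each kind of anomaly.
OUTCOME_WEIGHT = {"out_of_sequence": 8.0, "vibration_spike": 2.0, "temp_drift": 0.5}


def prioritize(anomalies: list[Anomaly]) -> list[Anomaly]:
    """Rank anomalies by detector score times expected outcome severity (unknown kinds weigh 1.0)."""
    return sorted(anomalies, key=lambda a: a.score * OUTCOME_WEIGHT.get(a.kind, 1.0), reverse=True)


found = [
    Anomaly("press_1", "temp_drift", 0.95),
    Anomaly("press_2", "out_of_sequence", 0.40),
    Anomaly("press_3", "vibration_spike", 0.70),
]
for a in prioritize(found):
    print(a.asset, a.kind)
```

Here the low-confidence out-of-sequence event on press_2 ranks first and the near-certain but low-impact drift on press_1 ranks last, which mirrors the point that an individually normal value out of sequence may be the critical one.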
A cognitive approach to Anomaly Detection and Prediction, on the other hand, applies a machine-first approach. It creates a mechanism where the algorithms learn the domain from the data and the subject matter expert feedback. The process starts by creating unique data signatures to identify similar and different entities. It then extracts data patterns to learn and model normal states and then identify anomalous states. These become the “anti-patterns” for each and every asset.
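The first step, creating data signatures to find similar entities, might look like this in a very reduced form. The sketch assumes a signature of just (mean, standard deviation) and a greedy distance-based grouping; both choices, the entity names and the distance threshold are hypothetical rather than the method described here.

```python
import math
import statistics


def signature(values: list[float]) -> tuple[float, float]:
    """A deliberately simple data signature: (mean, population standard deviation)."""
    return statistics.fmean(values), statistics.pstdev(values)


def group_similar(series: dict[str, list[float]], max_distance: float) -> list[list[str]]:
    """Greedily group entities whose signature lies within max_distance of a group's first member."""
    groups: list[tuple[tuple[float, float], list[str]]] = []
    for name in sorted(series):
        sig = signature(series[name])
        for anchor, members in groups:
            if math.dist(sig, anchor) <= max_distance:
                members.append(name)
                break
        else:  # no existing group is close enough, so start a new one
            groups.append((sig, [name]))
    return [members for _, members in groups]


series = {
    "pump_a": [49.0, 50.0, 51.0, 50.0],
    "pump_b": [50.0, 51.0, 49.0, 50.0],
    "pump_c": [79.0, 80.0, 81.0, 80.0],
}
print(group_similar(series, max_distance=5.0))  # → [['pump_a', 'pump_b'], ['pump_c']]
```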
Next, it trains an ensemble of models from many similar entities. It runs and learns from multiple data science experiments. Finally, it applies the ensemble of models to predict the outcomes for each and every entity. By learning from meta-data, it improves models continuously. These results are finally validated with subject matter experts.
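Training an ensemble from similar entities can be sketched as pooling the history of a group of similar assets, fitting several simple detectors on it, and letting them vote. The three detectors (z-score, padded min/max range, and interquartile range) and the majority vote are illustrative assumptions, not the specific models or experiments the article refers to.

```python
import statistics
from typing import Callable

Detector = Callable[[float], bool]


def zscore_detector(train: list[float], z_limit: float = 3.0) -> Detector:
    mean, std = statistics.fmean(train), statistics.pstdev(train)
    return lambda x: std > 0 and abs(x - mean) / std > z_limit


def range_detector(train: list[float], margin: float = 0.1) -> Detector:
    lo, hi = min(train), max(train)
    pad = (hi - lo) * margin  # tolerate values slightly outside the observed range
    return lambda x: x < lo - pad or x > hi + pad


def iqr_detector(train: list[float]) -> Detector:
    q1, _, q3 = statistics.quantiles(train, n=4)
    iqr = q3 - q1
    return lambda x: x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr


def train_ensemble(group_series: list[list[float]]) -> list[Detector]:
    """Pool the history of similar entities and fit every detector on the pooled data."""
    pooled = [v for series in group_series for v in series]
    return [zscore_detector(pooled), range_detector(pooled), iqr_detector(pooled)]


def predict(ensemble: list[Detector], x: float) -> bool:
    """Majority vote: x is anomalous if more than half of the detectors flag it."""
    votes = sum(1 for detector in ensemble if detector(x))
    return votes * 2 > len(ensemble)


# Hypothetical group of two similar pumps.
ensemble = train_ensemble([[49.0, 50.0, 51.0, 50.0], [50.0, 51.0, 49.0, 50.0]])
for reading in (50.5, 52.0, 55.0):
    print(reading, predict(ensemble, reading))
```

A reading of 52.0 trips only the range detector, so the vote keeps it from raising an alarm, while 55.0 is flagged by all three. Pooling similar entities gives each detector more normal data to learn from than any single asset would provide.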
You eventually get a fully automated and cognitively enabled machine learning system, where anomalies are detected and predicted before they occur. All of this with minimal manual intervention.
While there are multiple approaches to Anomaly Detection and Prediction, how does one get started? What are the critical factors to keep in mind when choosing the best approach for your industry?
Here is a comprehensive checklist to help you sift through the various dimensions of these approaches. Download this checklist and make an informed decision.
Anita Raj is a Product Marketing and Growth Hacking Strategist on the Progress DataRPM team, with over 10 years of experience working in the field of big data, cloud and machine learning. She brings deep expertise in growth marketing, along with leadership experience at multi-billion-dollar enterprise companies and high-growth start-ups.
Copyright © 2018 Progress Software Corporation and/or its subsidiaries or affiliates.
All Rights Reserved.
Progress, Telerik, and certain product names used herein are trademarks or registered trademarks of Progress Software Corporation and/or one of its subsidiaries or affiliates in the U.S. and/or other countries. See Trademarks for appropriate markings.