What Are Unsupervised Anomaly Detection Techniques?
Overview
Unsupervised anomaly detection (UAD) is a machine learning approach that identifies unusual or abnormal patterns in data without the need for labeled training data.
Unlike supervised learning, which requires labeled examples of normal and anomalous data, UAD techniques aim to learn the intrinsic characteristics of normal data and then flag any significant deviations as potential anomalies.
This makes UAD particularly useful in scenarios where labeled data is scarce or difficult to obtain, such as in industrial monitoring, cybersecurity, and healthcare applications. (Faura et al., 2021; Ghajari et al., 2024; Alnutefy & Alsuwayh, 2024)
Key Unsupervised Anomaly Detection Techniques
Clustering-based Methods
- K-means Clustering: Partitions the data into K clusters and flags as outliers the data points that lie far from their nearest cluster centroid. (Alnutefy & Alsuwayh, 2024)
- Cluster-based Local Outlier Factor (CBLOF): Extends the Local Outlier Factor (LOF) idea by scoring points with respect to the clustering structure, taking cluster size and density into account when identifying outliers. (Faura et al., 2021)
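The K-means variant above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation from the cited works: the synthetic two-cluster data, the fixed number of clusters, the deterministic initialization from the first k points, and the 0.99-quantile distance threshold are all assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption): two tight clusters plus one obvious outlier.
cluster_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
cluster_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
X = np.vstack([cluster_a, cluster_b, [[20.0, 20.0]]])

def kmeans_outliers(X, k=2, n_iter=50, quantile=0.99):
    """Flag points whose distance to the nearest centroid exceeds a quantile."""
    centroids = X[:k].copy()  # simple deterministic init (an assumption)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster empties.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    dist = np.linalg.norm(X - centroids[labels], axis=1)
    return dist > np.quantile(dist, quantile)

flags = kmeans_outliers(X)  # the injected point at (20, 20) is flagged
```

The quantile threshold is one common heuristic; in practice it would be tuned to the expected contamination rate of the data.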
Density-based Methods
- Local Outlier Factor (LOF): Measures the local density deviation of a data point relative to its neighbors; points whose local density is substantially lower than that of their neighbors are flagged as outliers. (Ghajari et al., 2024; Hong et al., 2024)
- Isolation Forest: Builds an ensemble of randomly constructed trees that recursively split the feature space; anomalies are easier to isolate and therefore end up with unusually short average path lengths. (Faura et al., 2021; Hong et al., 2024)
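Both density-based detectors are available off the shelf in scikit-learn; the sketch below uses that library as an assumption (the cited works may use other implementations), with synthetic data and a 1% contamination rate chosen purely for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)
# Synthetic data (an assumption): one dense cluster plus a single far-off point.
X = np.vstack([rng.normal(0, 0.5, size=(100, 2)), [[8.0, 8.0]]])

# LOF: fit_predict returns -1 for outliers and +1 for inliers.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
lof_labels = lof.fit_predict(X)

# Isolation Forest: anomalies receive shorter average path lengths,
# hence lower scores, and the lowest-scoring fraction is flagged.
iso = IsolationForest(contamination=0.01, random_state=0)
iso_labels = iso.fit_predict(X)
```

Both models flag the injected point at (8, 8); `contamination` encodes the assumed fraction of anomalies and controls the decision threshold.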
Statistical Methods
- Gaussian/Elliptic Envelope: Fits a Gaussian (elliptic) distribution to the data and flags points that fall outside the fitted envelope as outliers. (Alnutefy & Alsuwayh, 2024)
- Markov Chain: Models the data as a Markov chain and flags states with a low probability of occurrence as anomalies. (Alnutefy & Alsuwayh, 2024)
- PCA Reconstruction Error: Uses Principal Component Analysis (PCA) to project the data into a lower-dimensional subspace and flags points with a high reconstruction error as outliers. (Faura et al., 2021)
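The PCA reconstruction-error method can be implemented directly with an SVD. The sketch below is a minimal illustration under assumptions of my own: synthetic data lying near a one-dimensional subspace, one retained component, and a mean-plus-three-standard-deviations threshold on the error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (an assumption): points near a 1-D line in 3-D,
# plus one anomaly with a large off-subspace component.
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 2.0, -1.0]]) + rng.normal(scale=0.05, size=(200, 3))
X = np.vstack([X, [[0.0, 0.0, 5.0]]])

def pca_reconstruction_error(X, n_components=1):
    """Project onto the top principal components and measure the residual."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # SVD of the centered data: principal directions are the rows of Vt.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                   # (d, k) basis of the subspace
    X_hat = Xc @ V @ V.T + mu                 # reconstruct from k components
    return np.linalg.norm(X - X_hat, axis=1)  # per-point reconstruction error

errors = pca_reconstruction_error(X, n_components=1)
flags = errors > errors.mean() + 3 * errors.std()  # threshold is an assumption
```

Normal points reconstruct almost perfectly because they lie in the retained subspace; the anomaly's large orthogonal component survives as reconstruction error.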
Machine Learning-based Methods
- One-Class Support Vector Machine (SVM): Learns a decision boundary around the normal data and flags points that fall outside the boundary as outliers. (Faura et al., 2021; Hong et al., 2024)
- Long Short-Term Memory (LSTM): Uses a recurrent neural network to model temporal patterns in the data and flags deviations from the learned patterns as anomalies. (Faura et al., 2021; Kakar et al., 2024)
- LSTM Encoder-Decoder: Extends the LSTM approach with an encoder-decoder architecture that learns a compressed representation of the normal data; points with high reconstruction error are flagged as anomalies. (Faura et al., 2021; Kakar et al., 2024)
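Of these, the One-Class SVM is the simplest to demonstrate; the sketch below uses scikit-learn's `OneClassSVM` with synthetic Gaussian "normal" data (both the data and the `nu`/kernel settings are assumptions for illustration, and the LSTM variants are omitted since they require a deep-learning framework).

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(7)
X_train = rng.normal(0, 1, size=(200, 2))  # "normal" data only (an assumption)

# nu upper-bounds the fraction of training points treated as outliers.
oc_svm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
oc_svm.fit(X_train)

X_test = np.array([[0.0, 0.0],    # well inside the normal region
                   [6.0, 6.0]])   # far outside the learned boundary
pred = oc_svm.predict(X_test)     # +1 = inlier, -1 = outlier
```

Because only normal data is used for fitting, this is a one-class (unsupervised) setup: the boundary encloses the bulk of the training distribution, and anything outside it is scored as anomalous.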
Other Techniques
- Mean Linear Regression Residual: Trains a linear regression model for each variable, using the remaining variables as predictors, and flags data points with a high mean residual across these models as anomalies. (Faura et al., 2021)
- Data-Driven Metric Learning-based Anomaly Detection: Learns a distance metric from the data and flags points that lie far from the normal data distribution under that metric as anomalies. (Kakar et al., 2024)
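One plausible reading of the mean-regression-residual idea can be sketched with ordinary least squares in NumPy. This is an illustrative interpretation, not the procedure from Faura et al. (2021): the per-column regressions, the synthetic correlated data, and the three-sigma threshold on the mean residual are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data (an assumption): x1 tracks 2*x0; one point breaks the relation.
x0 = rng.normal(size=200)
x1 = 2 * x0 + rng.normal(scale=0.1, size=200)
X = np.column_stack([x0, x1])
X = np.vstack([X, [[1.0, -2.0]]])  # anomaly: violates x1 ≈ 2*x0

def mean_regression_residual(X):
    """Regress each column on the remaining columns (plus an intercept)
    and average the absolute residuals per data point."""
    n, d = X.shape
    residuals = np.zeros((n, d))
    for j in range(d):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([others, np.ones(n)])  # design matrix with bias
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residuals[:, j] = np.abs(y - A @ coef)
    return residuals.mean(axis=1)

scores = mean_regression_residual(X)
flags = scores > scores.mean() + 3 * scores.std()  # threshold is an assumption
```

Points consistent with the inter-variable relationships have small residuals in every regression, while the injected point is poorly predicted from either direction and stands out in the mean residual.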