Journals & Magazines >IEEE Access >Volume: 10

Quantile Autoencoder With Abnormality Accumulation for Anomaly Detection of Multivariate Sensor Data

Illustration of quantile autoencoder and abnormality accumulation technique. Quantile autoencoder reconstructs input data with uncertain boundary, and anomaly score is de...

Abstract:

Anomaly detection (AD) is a crucial task in various industrial sectors where large amounts of data are generated from multiple sensors. Deep learning-based methods have m...Show More

Metadata

Abstract:

Anomaly detection (AD) is a crucial task in various industrial sectors where large amounts of data are generated from multiple sensors. Deep learning-based methods have made significant progress in AD, owing to big data and deep neural networks (DNN). Most methods for deep anomaly detection (DAD) utilize reconstruction error (i.e., the difference between the original and reconstructed values) as a measure of abnormality. However, AD performance can be improved by diversifying the source of anomaly score. To support this, we introduce the concept of anomaly source diversification and provide mathematical proofs to support this idea. In this regard, we propose a quantile autoencoder (QAE) with abnormality accumulation (AA) as a novel DAD approach that leverages data uncertainty and iteratively obtains reconstruction errors as additional sources. The anomaly score with QAE is derived from both the reconstruction error and the uncertainty term which is the range between the two quantiles. In addition, AA aggregates the errors obtained from the recursive reconstruction of the input, after which calculates the anomaly score based on the Mahalanobis distance. This process induces the score distributions of both the normal and abnormal samples farther apart by narrowing the width of the distributions, which contributes to the improvement of AD performance. The performance of the proposed QAE-AA was verified through the experiments on multi-variate sensor datasets in various domains; QAE-AA achieves 4-23% higher AUROC score on average compared to the other AD methodologies.

Illustration of quantile autoencoder and abnormality accumulation technique. Quantile autoencoder reconstructs input data with uncertain boundary, and anomaly score is de...

Published in: IEEE Access ( Volume: 10)

Page(s): 70428 - 70439

Date of Publication: 30 June 2022

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2022.3187426

Funding Agency:

Contents

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

SECTION I.

Introduction

Anomaly detection (AD), which is also known as novelty detection or outlier detection, is a task that involves identifying abnormal cases in a pool of collected data or a data stream. Since the 1960s, it has been widely studied [1] and utilized in broad range of applications, such as, fraud detection [2]–[4], network security [5]–[7], video surveillance [8]–[10], medical diagnosis [11]–[13] and multiple sensor data [14]–[18]. In particular, as IoT and big data technologies have become common, acquiring meaningful features for performing AD from massive sensor data becomes more challenging. Under these circumstances, recent advancements in neural networks and deep learning have significantly influenced the field of AD. Deep anomaly detection (DAD) methods have demonstrated improved performance in many complicated AD tasks [19].

DAD framework can be classified into supervised, unsupervised, and semi-supervised AD based on the problem and data formulation. Supervised AD can be achieved when both normal and abnormal samples are sufficient and labeled [20]. However, in general, abnormal samples are neither sufficient nor labeled in many real-world applications [19]. This is the reason why semi-supervised and unsupervised AD methods have been applied widely in DAD. Unlike supervised AD, which directly determines whether the input is normal or not, both the semi-supervised and unsupervised AD methods learn the features of normality, after which they calculate the anomaly score to measure a degree of abnormality.

A recent review of [21] further categorized DAD according to the role of deep neural network (DNN) in AD process. Two major branches among them are generic normality feature learning and anomaly measure-dependent feature learning. The former is based on general feature extraction methods including autoencoders and generative adversarial networks (GAN). In doing this, networks learn generic representations to well reconstruct/generate/predict normal data. Typically, these methods utilize reconstruction error as an anomaly score. Feature extraction of the latter approach is more dependent on the anomaly scoring function. By designing a loss function for specific anomaly measures, deep learning model could learn score-dependent latent features, e.g., nearest neighbor distance, one-class classification (DSVDD [22]), and clustering-based scores (DAGMM [23]).

We focus on the former approach, in particular, autoencoder-based methods with semi-supervised AD settings which are easy to implement and have straightforward intuitions in detecting anomalies [21]. Deep autoencoders perform dimensionality reduction, such as principal component analysis [24]–[26] and random projection [27]–[29]. By training an autoencoder with normal samples, the network shows lower reconstruction error on normal data, but poor on abnormal data that are not exposed during the training. This difference allows us to separate two classes by reconstruction error. However, AD performance can be further improved because the objective function of minimizing reconstruction error is not identical to maximizing anomaly detection performance. In this regard, one approach to improve DAD performance is leveraging additional sources for anomaly scoring (i.e., anomaly source diversification), and here we exploit aleatoric uncertainty and recursive reconstruction errors.

There are two types of uncertainty in deep learning depending on its causes: epistemic uncertainty and aleatoric uncertainty [30]. Epistemic uncertainty, also called model uncertainty, originates from the difference between training results (difference between the models). Epistemic uncertainty can be reduced by acquiring more data. On the other hand, aleatoric uncertainty, which is also called data uncertainty, is attributed to the data itself. Therefore, it is inherent and irreducible. Recently, epistemic uncertainty has been considered in DAD, because the reconstruction for the abnormal sample has a significant variance depending on the model [31]–[35]. On the other hand, aleatoric uncertainty has received less attention in the field of DAD and is used as a threshold for the classification of normal and abnormal [36].

In this study, we propose a novel DAD framework that considers aleatoric uncertainty by introducing a quantile autoencoder (QAE). We leverage aleatoric uncertainty under the assumption of channel-wise consistency in normal conditions; that is, the inherent deviation of normal data would be less than abnormal data. Aleatoric uncertainty in terms of the range between two quantiles is used in the proposed framework with reconstruction errors. In addition, we propose the abnormality accumulation (AA) technique, which aggregates and calculates the anomaly score based on the errors of recursive reconstructions. This makes the difference between normal and abnormal distributions more evident. We verified the proposed framework with multi-variate sensor datasets in different domains. Each of the two methods contributes to anomaly source diversification. We further provide theoretical grounds that support the idea of anomaly source diversification under the assumption of Gaussian error distribution.

The main contributions of this study are as follows:

We propose QAE for uncertainty-based DAD. Aleatoric uncertainty term, which is the range between the two quantiles, is additionally considered in anomaly scoring. To the best of our knowledge, it is the first time to utilize QAE and quantile range in AD as the source of anomaly score.
We propose AA, in which recursive reconstruction error is additionally considered in anomaly scoring. AA decreases the overlapping region between anomaly score distributions on normal and abnormal data. Therefore, the two score distributions become more distinguishable, which facilitates the determination of the normal and anomalies. The performance of the proposed QAE-AA is tested using various multi-variate sensor datasets, and it demonstrates a significant improvement in AD performance in terms of AUROC.
We introduce the concept of anomaly source diversification, which means it becomes easier to distinguish normals and abnormals as we diversify and gather more error sources in calculating anomaly scores. We provide mathematical proofs of why anomaly source diversification is helpful in reconstruction error-based DAD methods under the assumption of Gaussian error distribution. This explains the utilization of QAE, AA, and Mahalanobis distance in the proposed framework, and we further show that the empirical error distributions can be modeled by a mixture of Gaussians.

This paper consists of five sections, including the introduction. In Section II, we introduce the related DAD methods. Section III describes the proposed framework, QAE-AA, and concept of anomaly source diversification. The experiments’ results and conclusions are presented in Sections IV and V, respectively.

SECTION II.

Related Works

A. Reconstruction Error Based Methods

In DAD by generic normality feature learning, the difference between an input and the reconstructed output (e.g., mean squared error) is typically used as a measure of abnormality. In general, the autoencoder (AE) [45]–[47] and the variational autoencoder (VAE) [37], [48], [49] have been utilized because of their superior capability to learn latent representations. Once a neural network is trained using normal samples to minimize its reconstruction error, the normal samples are reconstructed effectively from lower-dimensional latent features, but the abnormal samples are not. Therefore, anomalies produce larger reconstruction errors and can be distinguished by properly specifying a threshold for the normal cases. In a recent work of [39], a novel DAD methodology namely reconstruction along projection pathways (RAPP) was introduced which leverages reconstruction errors in latent spaces. It first collects the latent features in each layer of the encoder from the first forward path. By inputting the output into the encoder network again, the difference in the latent features of the second forward path is used as an anomaly score.

Another branch is based on generative adversarial networks (GANs) [11], [38], [43], [50], [51]. By utilizing GAN architecture, a neural network can model the distribution of normal samples. Anomaly scoring in GAN approaches is also based on the reconstruction error, and discriminator loss is additionally used with the reconstruction error [11], [43], [50], [51]. GANomaly [38] performs anomaly scoring based on the reconstruction error of bottleneck features. Thus, the score is defined in the latent space. The unified framework of GAN-based DAD is well summarized in [43], and it shows that the ensemble of anomaly scores from GAN variants further improves detection performance.

Previous studies show the utilization of the latent reconstruction errors, discriminator loss, and ensemble of reconstruction errors has improved the AD performance. This can be explained by the concept of anomaly source diversification. Similarly, uncertainty can also be an additional source, and contribute to the improved performance of DAD methods.

B. Anomaly Detection With Uncertainty

There are various methods for dealing with uncertainty within a deep learning framework, such as Bayesian deep learning, the Monte-Carlo (MC) dropout technique, and deep ensembles [52]. In particular, the MC dropout technique is adopted in uncertainty-based DAD applications [31]–[35]. MC dropout utilizes dropout in both the training and inference stages. Therefore a single neural network can generate different outputs based on the probabilistic connections between neurons. The statistical information of the outputs can be obtained through MC sampling, where the dropout is activated during the inference stage. This approach mainly aims to exploit epistemic uncertainty. The following examples use the MC dropout technique for epistemic uncertainty quantification: [31] conducted a study on a deep learning-based time-series prediction with a confidence interval for the Uber dataset in which AD was performed by triggering an alarm when the observed value fell outside the 95% predictive interval. Uncertainty in terms of the variance of the reconstruction errors was utilized as a weighting factor for the anomaly score [32]. In the field of medical imaging, uncertainty was adopted to diagnose diabetic retinopathy from fundus images in [33]. In the study conducted by [34], pixel-wise variations in retinal optical coherence tomography images were derived and utilized for the segmentation of abnormal areas. Similarly, the uncertainty of abnormal images in the MVTec-AD dataset was derived, after which the area under the receiver operating characteristic (AUROC) scores between the residual-based and uncertainty-based detection results was compared [35].

In this study, we consider aleatoric uncertainty via QAE and use the uncertainty term to diversify the sources for anomaly scoring. Our approach is different from the previous methods in terms of the method for measuring uncertainty (aleatoric uncertainty with multiple quantile regression) and the way in which uncertainty is used for anomaly scoring (Mahalanobis distance-based anomaly score).

SECTION III.

Proposed Methodology

In this section, we describe the proposed QAE model and AA technique in detail.

A. Quantile Autoencoder for Anomaly Detection

The aleatoric uncertainty is utilized for AD with QAE. Its basic concept is that the reconstruction of normal samples produces stable outputs within a certain range of variations for each input channel. In other words, normal cases reconstructed using normal-fitted latent features are likely to have low variability compared to the abnormal cases (due to the loss function that induces minimizing the variance of output). The consistency of normal samples is independently valid for each channel. To leverage aleatoric uncertainty, we propose a QAE which predicts multiple quantiles with a single neural network. Then, the range between two quantiles (upper and lower) is used as a degree of uncertainty. Then, the anomaly score is derived by Mahalanobis distance from both the reconstruction error and the uncertainty term, as illustrated in Fig. 1.

FIGURE 1.

Structure of the QAE for AD. The QAE predicts the median value and two quantiles. The reconstruction error and the aleatoric uncertainty are used for anomaly scoring.

Show All

1) Quantile Autoencoder

The basic AE performs a single-value reconstruction, which is the mean of Gaussian distribution by minimizing the mean squared error (MSE). The proposed QAE is a variation of the AE that predicts the different quantiles of the output distribution by minimizing the sum of pinball losses. Thus, the QAE performs multiple quantile regressions through a single neural network, which can be regarded as multi-task learning.

Let $X$ be a random variable with a cumulative distribution function $F_{X}(x)$ . Next, the $\tau$ -th quantile $x_\tau$ is $F_{X}^{-1}(\tau)$ , where $\tau \in (0,1)$ . The proposed QAE $Q$ comprises an encoder $Q_{enc}$ and a quantile decoder $Q_{dec}$ as follows: $\begin{align*} Q_{enc}(x)=&z, \\ Q_{dec}(z)=&[\hat {x}_{\tau _{l}},\hat {x}_{\tau _{m}},\hat {x}_{\tau _{u}}]=\hat {\mathbf {x}}_\tau,\tag{1}\end{align*}$ View Source where $\tau _{l} < 0.5$ , $\tau _{u}>0.5$ , and $\tau _{m}=0.5$ , that represent the lower, upper quantiles and median.

Quantile regression can be performed by minimizing the pinball loss [53]. For the $\tau$ -quantile, the pinball loss is defined as follows: $\begin{align*} L_\tau (x,\hat {x}_\tau) = \begin{cases} \tau (x-\hat {x}_\tau) & \text {if $x \geq \hat {x}_\tau $} \\ (1-\tau)(\hat {x}_\tau -x) & \text {if $x < \hat {x}_\tau $} \end{cases}\tag{2}\end{align*}$ View Source where $x$ represents an input and $\hat {x}_\tau$ represents a prediction of $\tau$ -quantile. The pinball loss function is a tilted absolute error function weighted by the target $\tau$ . Intuitively, the pinball loss results in a higher penalty of overestimation ( $\hat {x}_\tau \geq x$ ) for low quantiles ( $\tau < 0.5$ ). Therefore, the network is trained to underpredict. Similarly, the network overpredicts for high quantiles $\tau > 0.5$ . Finally, the loss for the proposed QAE, $L_{Q}$ , is a summation of the pinball losses with multiple $\tau \text{s}$ : $\begin{equation*} L_{Q}(x,\hat {\mathbf {x}}_\tau) = L_{\tau _{l}}(x,\hat {x}_{\tau _{l}})+L_{\tau _{m}}(x,\hat {x}_{\tau _{m}})+L_{\tau _{u}}(x,\hat {x}_{\tau _{u}})\tag{3}\end{equation*}$ View Source By training multiple quantiles at once, QAE could alleviate crossing quantiles problem [54].

2) Quantile-Based Anomaly Scoring

In the proposed approach, anomaly scores are derived from the reconstruction error and the aleatoric uncertainty term which is the range between the predicted upper and lower quantiles. From the output of the QAE $\hat {\mathbf {x}}_\tau$ , the reconstruction error $\epsilon _{rec}$ and the uncertainty term $\epsilon _{unc}$ in the form of row vector are defined as follows: $\begin{align*} \epsilon _{rec}=&x - \hat {x}_{\tau _{m}}, \\ \epsilon _{unc}=&\hat {x}_{\tau _{u}}-\hat {x}_{\tau _{l}}.\tag{4}\end{align*}$ View Source Then, typical MSE-based anomaly score $A_{r}(x)$ and a simple version of the quantile-based anomaly score $A_{q}(x)$ with $\epsilon = [\epsilon _{rec},\epsilon _{unc}]$ is expressed as follows: $\begin{align*} A_{r}(x)=&||\epsilon _{rec}||^{2}/d_{\epsilon _{rec}}, \tag{5}\\ A_{q}(x)=&||\epsilon ||^{2}/d_{\epsilon }\tag{6}\end{align*}$ View Source where $d$ is the dimension of corresponding vector. Because $\epsilon _{rec}$ and $\epsilon _{unc}$ exist in different domains, the difference between their magnitude ranges has a significant impact on anomaly scoring. This problem can be alleviated through normalization. Previous work of [39] reported a similar problem when utilizing reconstruction errors in both the original space and the latent spaces. To match the range of the error terms, a normalized distance with orthogonalization and scaling was proposed which is identical to the Mahalanobis distance. We also calculate the Mahalanobis distance-based normalized anomaly score $A_{nq}(x)$ as follows: $\begin{equation*} A_{nq}(x)=(\epsilon -\mu)S^{-1}(\epsilon -\mu)^{T},\tag{7}\end{equation*}$ View Source where $\mu$ and $S$ represent the channel-wise mean and covariance matrix obtained from $\epsilon$ of the training set, respectively. This approach is also adopted in the AA technique described below.

B. Abnormality Accumulation

In addition to the QAE, we propose an anomaly scoring technique, namely abnormality accumulation (AA), to further improve AD performance. The basic concept of AA is calculating the anomaly score based on the aggregated errors using a single AE. Let the superscript $i$ represent the number of iterations for AA. Then, $Q^{i}$ is a composite function $Q\circ Q\cdots \circ Q$ that $Q$ is repeated $i$ times; thus, $Q^{1}(x)$ is equal to $Q(x)$ . After the first iteration, $\hat {x}^{i}_{\tau _{m}}$ is used as the input for the next iteration. Consequently, $\epsilon ^{i}$ contains the reconstruction error and uncertainty term obtained from the result of $Q^{i}(x)$ . With a predefined number of iterations $N$ , we aggregate $\epsilon ^{i}\text{s}$ into $[\epsilon ^{i}]_{i=1}^{N}$ and calculate normalized anomaly score with AA as follows: $\begin{equation*} A_{nqa}(x)=([\epsilon ^{i}]_{i=1}^{N}-\mu _{A})S^{-1}_{A}([\epsilon ^{i}]_{i=1}^{N}-\mu _{A})^{T},\tag{8}\end{equation*}$ View Source where $\mu _{A}$ and $S_{A}$ represent the channel-wise mean and covariance matrix obtained from $[\epsilon ^{i}]_{i=1}^{N}$ of the training set. The illustration and algorithm of AA are presented in Fig. 2 and Algorithm 1.

Algorithm 1 Abnormality Accumulation

Input: input sample $x$ , QAE $Q$ , number of iteration $N$ .

set initial value $\hat {x}_{\tau _{m}}^{0}=x$ ;

for i = $1,2,3,\ldots,N$ do

obtain $[\hat {x}^{i}_{\tau _{l}},\hat {x}^{i}_{\tau _{m}},\hat {x}^{i}_{\tau _{u}}]=Q(\hat {x}^{i-1}_{\tau _{m}})$ ;

compute $\epsilon _{rec}^{i}=x-\hat {x}^{i}_{\tau _{m}}$ and $\epsilon _{unc}^{i}=\hat {x}^{i}_{\tau _{u}}-\hat {x}^{i}_{\tau _{l}}$ ;

concatenate $\epsilon _{rec}^{i}$ and $\epsilon _{unc}^{i}$ into $\epsilon ^{i}=[\epsilon _{rec}^{i},\epsilon _{unc}^{i}]$ ;

end for

aggregate $\epsilon ^{i}\text{s}$ to create a concatenated vector

$[\epsilon ^{i}]_{i=1}^{N}=[\epsilon ^{1}, \epsilon ^{2}, \epsilon ^{3},\ldots,\epsilon ^{N}]$ ;

10:

compute the channel-wise mean $\mu _{A}$ and the covariance matrix $S_{A}$ from $[\epsilon ^{i}]_{i=1}^{N}$ ;

11:

Output: anomaly score with AA,

12:

$A_{nqa}(x)=([\epsilon ^{i}]_{i=1}^{N}-\mu _{A})S^{-1}_{A}([\epsilon ^{i}]_{i=1}^{N}-\mu _{A})^{T}$ .

$FIGURE 2. - Illustration of AA. Abnormality becomes accumulated by aggregating $\epsilon ^{i}$ .$

FIGURE 2.

Illustration of AA. Abnormality becomes accumulated by aggregating $\epsilon ^{i}$ .

Show All

C. Anomaly Source Diversification

As previously stated, the use of QAE and AA with Mahalanobis distance in the proposed DAD methods can be supported by the concept of anomaly source diversification which is based on the following proposition.

Proposition 1:

For $k$ i.i.d. zero-mean Gaussian reconstruction error where the variance of normal is smaller than that of abnormal, MSE-based anomaly score distributions on normal and abnormal samples become more distinguishable as $k$ increases.

Proof:

Let $X_{n}$ and $X_{a}$ be random variables of normal and anomaly that follow the distributions of $P_{X_{n}}$ and $P_{X_{a}}$ , respectively, where $P_{X_{n}}\not \approx P_{X_{a}}$ . For AD, an AE $Q$ with its weight $\theta$ is trained to minimize MSE between input and reconstructed output for normal data. $\begin{equation*} \theta ^{*} = \mathop {\mathrm {arg\,min}} _\theta \mathbb {E}_{X_{n}}[||x-Q_\theta (x)||^{2}]. \tag{9}\end{equation*}$ View Source We further assume that the well-trained AE $Q_\theta$ has a zero-mean Gaussian distribution on its reconstruction errors, $E_{n}=X_{n}-Q_\theta (X_{n})\sim N(0,\sigma _{n}^{2})$ and $E_{a}=X_{a}-Q_\theta (X_{a})\sim N(0,\sigma _{a}^{2})$ with inequality of $\sigma _{n}\leq \sigma _{a}$ from minimization of Eq. 9. Then, the squared errors, $E_{n}^{2}$ and $E_{a}^{2}$ , follow the gamma distributions of $\Gamma (0.5,2\sigma ^{2}_{n})$ and $\Gamma (0.5,2\sigma ^{2}_{a})$ , respectively.

If we consider multivariate case of $k~i.i.d$ . error distributions $E_{1},\cdots, E_{k}$ in addition to the subscripts ( $n$ and $a$ ), $\bar {E^{2}}=\frac {1}{k} \sum _{i}^{k} E^{2}_{i}$ follows the gamma distribution $\Gamma (k/2,2\sigma ^{2}/k)$ , and its mean and variance can be obtained by the parameters of the gamma distribution, i.e., $\sigma ^{2}$ , and $2\sigma ^{4}/k$ . Then, $\bar {E}^{2}_{n}$ and $\bar {E}^{2}_{a}$ have fixed means which are $\sigma ^{2}_{n}$ and $\sigma ^{2}_{a}$ . On the other hand, variances of the two distribution decreases as $k$ increases because $Var(\bar {E_{n}^{2}})=2\sigma ^{4}_{n}/k$ and $Var(\bar {E_{a}^{2}})=2\sigma ^{4}_{a}/k$ . This makes the overlapping area between anomaly score distributions $\bar {E}^{2}_{n}$ and $\bar {E}^{2}_{a}$ decreases; thus, normal and abnormal become more distinguishable.

Fig. 3 shows the changes in the anomaly score distribution according to different $k$ . As the number of $k$ increases, the separation between both distributions becomes evident.

FIGURE 3.

Difference between gamma distributions of normal and abnormal data based on the increase in $k$ : (a) $k=1$ and $k=10$ , (b) $k=1$ and $k=100$ . The normal and abnormal distributions of $k=100$ are more distinguishable than those of $k=10$ .

Show All

The above proposition emphasizes the importance of obtaining as many $i.i.d$ Gaussian errors as possible in AD. Anomaly source diversification aims to obtain different error sources to achieve the effect of increasing $k$ . Typically, AD deals with multivariate data $x\in \mathbb {R}^{d}$ , and the proposed AA technique further expands $k=d$ to $k=N\cdot d$ using a single neural network, where $N$ indicates the number of iterations. Also, Mahalanobis distance corresponds to $\bar {E}^{2}$ because orthogonalization and scaling make rotated axes to be independent. Therefore the AD performance can be improved by the proposed methodology.

Note that Proposition 1 assumes the ideal case of zero-mean Gaussian error distributions on both normal and abnormal data. Although normally distributed empirical errors can be found in the literature [55], the assumption of zero-mean Gaussianity on untrained abnormal data can be questioned because empirical errors are more likely to be skewed and biased. In this regard, we analyze the empirical error distributions of the real-world datasets used in this study. Fig. 4 shows histograms of reconstruction errors on abnormal data with Gaussian mixture model fitting results. As can be seen, the Gaussian mixture model well fits empirical error distribution, and we further generalize Proposition 1 to Gaussian distribution with arbitrary mean and variance as follows.

FIGURE 4.

Error distribution with Guassian mixture model fitting results: (a) MI-F, (b) SNSR.

Show All

Proposition 2:

For $k$ i.i.d. non-zero-mean Gaussian reconstruction errors, MSE-based anomaly score distributions on normal and abnormal samples become separable as $k$ increases.

Proof:

For random variable $X\sim N(\mu _{x},1)$ , $Z=\sum _{i=1}^{k} X_{i}^{2}$ follows non-central chi-squared distribution with $k$ degree of freedom and non-centrality parameter $\lambda = \sum _{i=1}^{k} \mu _{x}^{2}$ . The mean and variance of $Z$ is $k+\lambda$ and $2(k+2\lambda)$ , respectively. If we denote reconstruction error with arbitrary mean and variance as $Y=\sigma _{y} X \sim N(\mu _{y},\sigma _{y}^{2}) = N(\mu _{x}\sigma _{y}, \sigma _{y}^{2})$ , then squared mean of $k~i.i.d~Y\text{s}$ can be represented by $Z$ as follows. $\begin{equation*} \frac {1}{k}\sum _{i=1}^{k} Y_{i}^{2} =\frac {\sigma _{y}^{2}}{k}\sum _{i=1}^{k} X_{i}^{2} = \frac {\sigma _{y}^{2}}{k} Z\tag{10}\end{equation*}$ View Source Then, the mean and variance of $\frac {1}{k}\sum _{i=1}^{k} Y_{i}^{2}$ are $\sigma _{y}^{2}+\mu _{y}^{2}$ and. $\frac {2\sigma _{y}^{2}}{k}(\sigma _{y}^{2}+2~\mu _{y}^{2})$ , respectively. Likewise to the Proposition 1, variance of anomaly score distribution decreases as $k$ increases but their mean is further apart by the bias of reconstruction error $\mu _{y}^{2}$ ; mean and variance of mean square of $k$ i.i.d zero-mean Gaussian $N(0,\sigma ^{2})$ are $\sigma ^{2}$ and $2\sigma ^{4}/k$ .

Thus, increasing $k$ is still helpful for Gaussian with arbitrary mean and variance. In this perspective, the assumption of zero-mean can be considered the most difficult case in AD, because it is impossible to classify the normal and abnormal when $\sigma _{a}$ is equal to or less than $\sigma _{n}$ .

SECTION IV.

Experiments

To verify the effectiveness of the proposed methodology, we compare the AUROC score which is a well-known evaluation metric for classification models. In binary classification, the perfect classifier has AUROC of 1, and the uniformly random classifier has AUROC of 0.5. During the experiment, we refer to the verification framework and datasets presented in the work of [39].

A. Datasets and Problem Settings

We compare the results of five different datasets; MI-F, MI-V, EOPT, RARM (binary class), and SNSR (multi class). Description of datasets is given in Table 2. For experiments, we set two normality conditions; unimodal normality and multimodal normality. Unimodal normality indicates a single normal class, and multimodal normality means a normal dataset is composed of multiple normal classes. The binary class datasets have labeled normal class. Therefore, the experiment is performed with unimodal normality. On the other hand, a multi-class dataset has no explicit labels for normal. Therefore, we set a target class that is considered normal in unimodal normality and abnormal in multimodal normality (thus remaining classes become normal, and in the following tables, MM is used for the abbreviation of multimodal normality.) We report averaged results on a different target class. For the training-test split, randomly selected 60% of the data in the normal class is used for training, and each half of the remaining data is used for the validation and normal test sets, respectively. All input features are normalized with z-score normalization.

TABLE 1 Summary of Literature on the Recent Relevant Anomaly Detection Models

TABLE 2 Description of the Benchmark Datasets

B. Network Structure and Experimental Setup

We build QAE with the same backbone network structure in [39] except for the final layer and loss function to provide the prediction of multiple quantiles $\hat {\mathbf {x}}_\tau$ . The target quantiles were set to $\tau _{l}=0.3, \tau _{m}=0.5$ and $\tau _{u}=0.7$ . Each $Q_{enc}$ and $Q_{dec}$ have 10 dense layers with the Leaky-ReLU activation function. The dimensions of $z=Q_{enc}(x)$ listed in Table 2 are derived from the PCA results. The networks were trained using the Adam optimizer [60] and built using Pytorch [61].

We compared the results of the proposed methodology by following two steps for the ablation study. First, we compared the QAE with different anomaly score settings: $A_{r}, A_{q}, A_{nq}$ . In doing so, MSE-based anomaly score from a basic AE trained to minimize MSE was used as the benchmark result. Second, we compared the results of $A_{nqa}$ according to different numbers of iterations. Finally, we compared the AUROC score achieved via the proposed QAE with AA to other AD methodologies, including machine learning-based and recent deep learning-based methodologies reported in the works of [39], [40] (OCSVM to XAE); DN2 from [44]. In addition, we also compared the result of DSVDD [22] and DAGMM [23] models which are non-reconstruction error-based DAD methods. The above experimental setups are identical to those described in the work of [39], and the two models (DSVDD, DAGMM) are implemented with the same backbone networks as the QAE. Note that the experimental results described below are a summary of ten trials per model setting from different random seeds.

C. Experimental Results

Table 3 compares the mean and standard deviation of the AUROC results from the QAE with different scoring functions. Utilizing uncertainty terms without normalization ( $A_{q}$ ) induced a performance decrease compared to the $A_{r}$ scores. But after the normalization, the AUROC of $A_{nq}$ shows the best performance for all cases except EOPT. Compared to the $A_{r}$ of basic AE, AD performance of $A_{nq}$ is increased by 9% on average. Specifically, AUROC score is increased by 23% and 17% for MI-F and RARM, respectively.

TABLE 3 AUROC (%) of QAE

Table 4 summarizes the AUROC of $A_{nqa}$ (mean and standard deviation) by changing the number of iterations $N$ from one to six ( $N=1$ is equal to $A_{nq}$ ). Although the increment is different for each dataset, the result shows that the AD performance can be improved by using additional reconstruction errors through iteration. Compared to the result without AA ( $A_{nq}$ ), MI-F and $\text {SNSR}_{\text {MM}}$ achieved 9% and 11% higher AUROC scores. Unlike the ideal case in Proposition 1, the iteration at which the performance is maximized differs for each data. But in most cases, significant performance changes occurred in the initial 2~3 iteration, and we compared the result of $A_{nqa}$ with fixed iteration $N=3$ in Table 5.

TABLE 4 AUROC (%) of QAE-AA

TABLE 5 Comparison of AUROCs (%) to Other Benchmark Results

Finally, Table 5 shows the performance comparison between the proposed methods and other AD methods. The characteristics of anomalies are different for each dataset, thus it is difficult to find a unique model that overwhelms the others for all cases. However, the proposed QAE-AA shows the best AUROC score in four out of six datasets; MI-V, RARM, SNSR, $\text {SNSR}_{\text {MM}}$ . For these datasets, AUROC increased 4.6% on average compared to the second-best result. In addition, in terms of the average score for all six datasets described at the bottom of the table, QAE-AA shows 4-23% higher AUROC; AUROC is increased 12% in overall.

This comparison result to various AD methodologies verifies the effectiveness of the proposed methodology, and shows that the concept of anomaly source diversification could be embodied by utilizing aleatoric uncertainty and iterative reconstruction errors.

D. Visualization on Anomaly Score Sources

Fig. 5 shows examples of the QAE output (i.e., square of anomaly sources $\epsilon _{rec}$ and $\epsilon _{unc}$ ) on different target channels. The first two columns show the results of a single iteration of QAE, whereas the right two columns show the results from the fourth iteration. Examples in Fig. 5 indicate that both aleatoric uncertainty term and reconstruction error after iterations could be good sources for anomaly scoring. As can be seen, abnormal samples are distributed in a higher error range. Thus, they can be classified according to the anomaly threshold.

$FIGURE 5. - Examples of squared anomaly sources $\epsilon _{rec}^{2}$ and $\epsilon _{unc}^{2}$ on different target channel. $x$ -axis indicates index of data and $y$ -axis indicates value of corresponding anomaly source: (a) MI-F, Channel 1, (b) RARM, Channel 3.$

FIGURE 5.

Examples of squared anomaly sources $\epsilon _{rec}^{2}$ and $\epsilon _{unc}^{2}$ on different target channel. $x$ -axis indicates index of data and $y$ -axis indicates value of corresponding anomaly source: (a) MI-F, Channel 1, (b) RARM, Channel 3.

Show All

E. Analysis on Score Distribution

Fig. 6 shows the changes in anomaly score distributions for MI datasets. Anomaly score distributions of normal and abnormal samples are illustrated in blue and orange histograms, respectively. Although the shapes do not perfectly match the ideal cases presented in Fig. 3, it is observed that the overlapping region between normal and abnormal distributions is reduced when applying QAE and AA. Therefore higher AUROC score can be achieved by the proposed methodology.

FIGURE 6.

Examples of anomaly score distribution of normal and abnormal data: (a) MI-F, (b) MI-V.

Show All

SECTION V.

Conclusion

In this research, we investigate the concept of anomaly score diversification, and propose QAE network and AA technique. The effectiveness of the proposed framework is verified with experiments on real-world datasets. Anomaly source diversification is inspired by the idea that diversifying error sources in the calculation of anomaly scores induces performance improvement in anomaly detection. We provide a theoretical background for this by showing that the distributions of mean square error anomaly scores on normal and abnormal become farther apart as the number of error sources under the assumption of Gaussian error increases.

In doing this, we propose a QAE that produces not only median but also quantiles to leverage aleatoric uncertainty as an additional error source for anomaly scoring. The outputs reconstructed from the abnormal samples are likely to have larger channel-wise uncertainty than that of normal samples likewise to the reconstruction errors. In addition, we introduce the AA technique that aggregates the errors via recursive reconstructions and then calculates anomaly score by using Mahalanobis distance. As the dimension of the errors is increased by the recursion, the difference between the anomaly score distributions of normal and abnormal samples becomes more apparent. The effectiveness of the proposed QAE-AA is verified with various datasets in the real world. QAE-AA obtained the highest AUROC score in four out of six datasets and achieved an average 4% to 23% higher AUROC score. These experimental results show that the proposed methodology can improve AD performance.

Recent works [44], [62] reported notable AD performance in some benchmark datasets by utilizing adversarial examples and time series AD setting, which can be additionally applied to the proposed QAE-AA framework. In this regard, our future research is moving toward overcoming the limitation and further improving AD performance. For example, AD performance on image data can be enhanced by additionally considering epistemic uncertainty or latent space errors.

ACKNOWLEDGMENT

An earlier version of this paper was presented in part at the Workshop on AI for Design and Manufacturing (ADAM) during the 36th AAAI Conference on Artificial Intelligence (AAAI-22), in February 2022 [63].

References is not available for this document.

MIT Libraries

MIT Libraries

Quantile Autoencoder With Abnormality Accumulation for Anomaly Detection of Multivariate Sensor Data

Abstract:

Metadata

Abstract:

Funding Agency:

Introduction