Introduction
Direction of arrival (DoA) estimation is the process of determining the direction from which signals arrive at a group of sensors. DoA estimation is performed in many fields, such as radar, telecommunications, speech, and sonar [1], [2]. This problem has been studied for many decades, beginning with the development of conventional beamforming (CBF) by Bartlett in 1948 [3], followed by methods including Capon’s beamformer in 1969 [4], the minimum norm method [5], [6] in 1979, multiple signal classification (MUSIC) [7] in 1986, estimation of signal parameters via rotational invariance techniques (ESPRIT) [8] in 1989, and more recently compressive sensing [9]. Starting in the 1990s, machine learning (ML) based approaches began to be applied to source localization [10]; however, the use of ML methods (also known as model-free or data-driven methods) has increased significantly in the last decade due to both the advancement and use of graphics processing units (GPUs) for ML and the increased availability of off-the-shelf ML packages such as TensorFlow, PyTorch, or MATLAB’s own toolboxes.
Many different types of ML approaches have been studied for DoA estimation such as multi-layer perceptrons (MLPs), support vector machines (SVMs), convolutional neural networks (CNNs), autoencoders, residual networks (ResNets), and vision transformers (ViTs) along with hybrid approaches combining multiple network types.
MLPs were among the first ML methods used for DoA estimation: [11] used an MLP with non-linear activation functions to estimate DoA for signals in random noise. MLPs continue to be used today, such as in [12], which developed a deep neural network (DNN), functionally similar to an MLP, to perform DoA estimation for up to four sources while also estimating the number of sources present. Additionally, MLPs have been used to estimate both DoA and the number of sources in non-Gaussian noise, as illustrated in [13], and they have been extended to subarray sampling use cases [14].
SVMs have also been applied to the DoA estimation problem including [15] which developed a multi-class SVM algorithm for DoA estimation and [16] which used support vector regression in wireless communication which was then validated with experimental data in [17]. More recently a sensitivity study of SVM error performance to hyperparameters has been performed [18].
The use of CNNs includes [19], which developed a CNN focused on DoA estimation in low SNR that performed well for low numbers of snapshots and in cases of SNR mismatch while also estimating the number of sources present. Reference [20] studied the improvements in CNN classification accuracy when imposing the natural shift-invariant structure present in ULAs (and other arrays) on the input data to the CNN. Reference [21] then extended this work to CNNs but posed the ML problem as a regression problem. Reference [22] extended CNNs to three-dimensional DoA estimation while also developing an output layer formulation that allows probabilities to be calculated at each output neuron, providing the ability to apply confidence intervals to predictions.
Autoencoders, more specifically denoising autoencoders, have been leveraged in DoA estimation in order to improve the estimate of the sample covariance matrix [23]. The denoised sample covariance matrix is then used in classical covariance based DoA estimation schemes. An autoencoder was also used in [24] to perform DoA estimation but included array imperfections seeking to account for real world scenarios. ResNets have also been applied to DoA estimation in [25]. ResNets were originally developed to overcome the vanishing gradient problem [26] by including skip connections across single or multiple layers of a neural network (NN). More recently, ViTs have been used for multi-class classification of DoA in [27]. This work is an outgrowth of the success of transformers in natural language processing [28], and their extension to image classification with ViTs which have been shown to outperform CNNs in image classification [29] and [30] leading to a natural extension of applying ViTs to DoA estimation due to the highly successful results from CNNs.
In addition to these different machine learning architectures, there are different approaches to the use of each architecture. For example, NNs can be developed to directly calculate the DoA via regression, or to output probabilities of each possible DoA within a discretized domain, known as classification. Work such as [23] leverages NNs to denoise the estimate of the covariance matrix, which can then be fed to classical covariance-based DoA estimation methods. Other works leverage the structure of classical methods and seek to improve on them, such as [31] and [32], which both utilize the structure of MUSIC in the development of their ML-based approach. More recently, there has been work specifically designing NNs for DoA estimation that account for imperfect data, whether due to array perturbations, sensor mutual coupling, or gain and phase errors [24], [32].
For a more comprehensive review of the current status of ML-based DoA estimation methods the reader is referred to the following review papers [33] and [34].
The rapid expansion in the use of ML for DoA estimation has resulted in a significant amount of work to identify reductions in DoA error compared to classical methods without discussing the trade-offs (other than computational cost and training time). Specifically, the missing unbiased constraint in the development of ML-based DoA estimators is a significant trade-off that needs to be studied. This is especially true in relatively simple problems with perfect information where efficient estimators are well known, for example, the CBF for the single source in additive white Gaussian noise (AWGN) problem [1]. The application of ML-based methods to cases with efficient estimators may be applicable but a discussion of the bias-variance trade-off should be included and is often missing.
It is the authors’ belief that a more thorough discussion of the trade-offs of ML-based methods is required. For example, the use of ML-based methods in imperfect information cases (such as perturbed arrays, arrays with missing sensors, muffled arrays, or mutual sensor coupling) is a promising use case for ML-based methods because efficient estimators do not exist due to the unknown state of the array. Additionally, the use of ML-based methods such as denoising autoencoders to improve the estimate of the sample covariance matrix prior to the use of traditional covariance based DoA estimation schemes [23], as discussed above, is another interesting application.
The advantage of the ML-based methods is that no assumptions about the signal model need to be made a priori. ML algorithms utilize the training data to learn a model that represents the physics of the training data. The key point of training is to develop a model that is able to generalize to cover all data. This is especially important if the training data does not span the entire physics of the system. This is where ML methods have the greatest advantage over classical methods, as no signal structure is assumed in training and developing models of signal structure can be difficult when considering real-world effects. CBF, Capon’s beamformer, and MUSIC all make assumptions about the state of the sensors when estimating DoA. This is problematic when there is a mismatch between the assumed physical model and the actual physical model. This mismatch can be caused by failed sensors, sensor perturbations, sensor calibration drift, non-isotropic sensors, near-field sources, and others.
Furthermore, in conventional estimation problems, when developing an estimator it is common to first attempt to derive an unbiased estimator [35]. Unbiased estimators are typically desired because the expected value of the estimator, \begin{equation*} MSE = bias^{2} + variance \tag {1}\end{equation*}
This can result in the ML methods achieving significantly lower mean square error (MSE) compared to classical methods, but this MSE improvement is achieved by trading off bias for variance. This becomes especially important when coupled with the overuse of ML-based methods where classical approaches are well suited.
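The decomposition in (1) can be checked numerically. The following is a minimal sketch, not taken from this work: it compares an unbiased estimator (a raw noisy observation) with a deliberately biased one (the same observation shrunk toward zero). The true value, noise level, and shrinkage factor are all hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

true_value = 0.5      # hypothetical true parameter
sigma = 1.0           # hypothetical noise standard deviation
n_trials = 200_000

# Unbiased estimator: the noisy observation itself.
observations = true_value + sigma * rng.standard_normal(n_trials)
# Biased estimator: shrink every observation toward zero (factor is arbitrary).
shrunk = 0.5 * observations

def decompose(estimates, truth):
    """Return (bias, variance, mse) of a set of estimates of a known truth."""
    bias = estimates.mean() - truth
    variance = estimates.var()
    mse = np.mean((estimates - truth) ** 2)
    return bias, variance, mse

results = {
    "unbiased": decompose(observations, true_value),
    "shrunk": decompose(shrunk, true_value),
}
for name, (bias, variance, mse) in results.items():
    print(f"{name}: bias={bias:+.3f} variance={variance:.3f} mse={mse:.3f}")
```

In this toy setting the shrunk estimator reaches a lower MSE than the unbiased one, but only by accepting a bias that does not shrink as more trials are averaged.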
The specific contributions of this research are:
To illustrate that it is unnecessary to utilize ML-based methods for DoA estimation problems with perfect information a priori when unbiased estimators are desired.
To demonstrate the ability of ML-based methods (in this case a CNN) to generalize to imperfect information cases better than classical methods such as MUSIC and CBF.
To show that CNNs can learn a more general function for DoA estimation compared to classical methods when accounting for imperfect information.
To illustrate that the CNNs can converge to a biased estimator.
To outline that MUSIC in general has lower errors in DoA estimation for muffled sensors compared to missing sensors.
This paper is organized as follows: Section II outlines the signal model used to generate the training, validation, and test data for this work including details of the different imperfect information cases; Section III reviews important background topics; Section IV provides an overview of the CNN architecture and the training, validation, and prediction process; Section V presents the results and discussion; and Section VI provides the conclusions.
Conventions: Boldface lowercase math symbols denote vectors and boldface uppercase math symbols denote matrices. The superscript H denotes the Hermitian (conjugate) transpose.
Signal Model
This paper utilizes a uniform linear array (ULA) with L sensors and inter-sensor spacing \begin{equation*} \mathbf {X}=\sum _{m=1}^{M} \mathbf {v}_{m} \mathbf {s}_{m} + \mathbf {N} \tag {2}\end{equation*}
where the array manifold vector is denoted by v with size
When multiple independent snapshots are combined, the result is a \begin{equation*} \mathbf {X}=\begin{bmatrix} \mathbf {x}_{1}& \quad \mathbf {x}_{2}& \quad \ldots & \quad \mathbf {x}_{K} \end{bmatrix} \tag {3}\end{equation*}
This signal model is then used as the basis to create four different array conditions (shown in Figure 1) to evaluate MUSIC, CBF, and the CNNs under both single-source and two-source present cases. The following subsections outline the four array cases, the reason for studying each case, and the modifications to the signal model to model each of these cases.
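A snapshot matrix of the form in (2) and (3) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' data generator: half-wavelength sensor spacing, unit-variance circular complex Gaussian noise, and Gaussian source signals whose power is set by the SNR are all assumed, and the function names are hypothetical.

```python
import numpy as np

def manifold(u, positions):
    """Array manifold for direction cosines u.

    positions are sensor locations in units of half-wavelengths (assumed),
    so the phase at sensor n for direction cosine u is pi * n * u.
    """
    return np.exp(1j * np.pi * np.outer(positions, np.atleast_1d(u)))  # (L, M)

def complex_gaussian(shape, rng):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def snapshot_matrix(u_sources, snr_db, n_snapshots, n_sensors=10, rng=None):
    """Return the L x K snapshot matrix X = V S + N."""
    rng = np.random.default_rng() if rng is None else rng
    V = manifold(u_sources, np.arange(n_sensors))          # (L, M)
    amplitude = 10 ** (snr_db / 20)                        # per-source amplitude
    S = amplitude * complex_gaussian((V.shape[1], n_snapshots), rng)
    N = complex_gaussian((n_sensors, n_snapshots), rng)    # unit noise variance
    return V @ S + N

X = snapshot_matrix([-0.2, 0.3], snr_db=0.0, n_snapshots=50,
                    rng=np.random.default_rng(0))
print(X.shape)  # (10, 50)
```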
A. Perfect Array Data
The perfect array data case is used to train, validate, and test the ML models. No modifications to the signal model outlined above are required.
For the single-source case, the u-space is discretized into 100 bins and
For the two-source case, the same process is followed except for the locations of the sources. For the training and validation data, each data sample randomly places two uncorrelated sources of equal strength in the u-space domain by sampling uniformly from the u-space discretization. For the test data, the equal strength sources are located at the half-power beam width (HPBW) of the array for each data sample.
B. Perturbed Array Data
The perturbed array data case is used only as a test case for the ML models and classical methods. The purpose of introducing this test case is to demonstrate the ability of ML methods to generalize to cases of imperfect information. The perturbed case models the real-world scenario where over time sensor positions will drift, resulting in decreased array performance. The number of test data samples and u-space discretization for the perturbed case is the same as for the perfect array test cases. The perturbed array data is generated by adding a perturbation value to each sensor location in the array manifold vector. The array perturbation values are calculated by sampling a Gaussian random variable, with a mean of zero and variance of 0.4, independently for each sensor. This sampling operation is conducted independently for each test data sample.
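The perturbation step can be sketched as below. The units of the perturbation are not stated in this section, so the sketch assumes sensor positions expressed in half-wavelengths and applies the variance of 0.4 in those same units; the helper names are hypothetical.

```python
import numpy as np

def perturbed_positions(n_sensors=10, variance=0.4, rng=None):
    """Nominal ULA positions plus independent zero-mean Gaussian offsets."""
    rng = np.random.default_rng() if rng is None else rng
    nominal = np.arange(n_sensors, dtype=float)
    return nominal + np.sqrt(variance) * rng.standard_normal(n_sensors)

def manifold_vector(u, positions):
    """Manifold vector for direction cosine u (positions in half-wavelengths, assumed)."""
    return np.exp(1j * np.pi * positions * u)

rng = np.random.default_rng(0)
positions = perturbed_positions(rng=rng)        # redrawn for every test sample
v_true = manifold_vector(0.3, positions)        # what the data actually contains
v_assumed = manifold_vector(0.3, np.arange(10)) # what CBF and MUSIC assume

# Normalized mismatch between the assumed and actual manifold (0 = identical).
mismatch = 1 - np.abs(np.vdot(v_assumed, v_true)) / len(positions)
print(f"manifold mismatch: {mismatch:.3f}")
```

The mismatch value gives a rough sense of how far the true steering vector has moved from the one a model-based method would scan with.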
C. Missing Array Data
The missing array data case is used only as a test case for the ML models. The purpose of introducing this test case is to demonstrate the ability of ML methods to generalize to cases of imperfect information where random sensors in the array may fail over time. The missing array data is generated by setting the rows of the data matrix that correspond to missing sensors to 0 utilizing the same u-space discretization and number of data samples as the previous cases.
The missing sensor cases analyzed in this work are all combinations of 4 failed sensors in a 10 sensor array resulting in 210 array combinations. The majority of the figures throughout this paper will be presented for a specific SNR and all array combinations. Each combination has been assigned a configuration number to condense the plots. The mapping between array combinations and configuration number is provided in Figure 2.
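The count of 210 configurations follows from choosing 4 failed sensors out of 10. A minimal sketch of enumerating the configurations and zeroing the corresponding rows is given below; the enumeration order is an assumption and need not match the configuration numbering of Figure 2.

```python
from itertools import combinations

import numpy as np

N_SENSORS, N_FAILED = 10, 4

# Every way of choosing 4 failed sensors from 10 (order is an assumption).
configurations = list(combinations(range(N_SENSORS), N_FAILED))
print(len(configurations))  # 210

def apply_missing(X, failed):
    """Return a copy of the snapshot matrix with the failed sensor rows set to 0."""
    X_missing = np.array(X, copy=True)
    X_missing[list(failed), :] = 0
    return X_missing

X = np.ones((N_SENSORS, 5), dtype=complex)        # placeholder data
X_missing = apply_missing(X, configurations[0])   # sensors (0, 1, 2, 3) fail
```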
D. Muffled Array Data
The muffled array data case is used as a test case for the ML models. The purpose of introducing this test case is to demonstrate the ability of ML methods to generalize to the scenario of sensor performance decreasing due to increased noise on the sensors. The muffled array data is generated by multiplying the muffled sensor data by 0.1 after the frequency domain snapshot data matrix is calculated. This represents the effects of increased noise on the desired sensors. The muffled array data is generated using the same u-space discretization and number of data samples as the previous cases. The same sensor combinations as those used for the missing sensor array are used for the muffled sensor array except each configuration represents muffled sensors.
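The muffling operation differs from the missing sensor case only in that the affected rows are scaled rather than zeroed. A minimal sketch, with a hypothetical function name and placeholder data:

```python
import numpy as np

def apply_muffled(X, muffled, gain=0.1):
    """Return a copy of the snapshot matrix with the muffled sensor rows scaled by gain."""
    X_muffled = np.array(X, copy=True)
    X_muffled[list(muffled), :] *= gain
    return X_muffled

X = np.ones((10, 5), dtype=complex)             # placeholder data
X_muffled = apply_muffled(X, (0, 3, 4, 9))      # an arbitrary 4-sensor configuration
```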
Review of Important Topics
A. Conventional Beamforming
CBF is a classical approach used to estimate DoA. In the frequency domain, it relies on the assumed constant phase shift between each successive sensor in the array [37]. Thus, CBF works by scanning across the entire u-space domain and for each u-space discretization: (a) applies the appropriate phase shift to each sensor, (b) coherently sums the phase-shifted data, and (c) divides the sum by the number of snapshots. The resulting level is then set as the spectral amplitude of that DoA. For the single-source case, the maximum amplitude in the resulting spectrum is located at the predicted DoA. CBF is conducted for all test cases, as it is expected to have poor performance in the imperfect information cases since it relies on the constant phase-shift assumption, which fails when the array is not perfect.
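The scanning procedure described above can be sketched as follows. This is one common formulation, assumed here for illustration: half-wavelength spacing, and beam power averaged over snapshots as the spectral level. The grid size, source location, and noise level in the example are hypothetical.

```python
import numpy as np

def cbf_spectrum(X, u_grid):
    """Conventional beamformer power for each direction cosine in u_grid."""
    n_sensors, _ = X.shape
    # Steering vectors for every scan direction (half-wavelength spacing assumed).
    A = np.exp(1j * np.pi * np.outer(np.arange(n_sensors), u_grid))  # (L, G)
    # (a) undo the assumed phase shift per sensor, (b) sum coherently over sensors.
    beams = A.conj().T @ X                                           # (G, K)
    # (c) average the beam power over snapshots.
    return np.mean(np.abs(beams) ** 2, axis=1)

def cbf_estimate(X, u_grid):
    """Single-source DoA estimate: location of the spectrum maximum."""
    return u_grid[np.argmax(cbf_spectrum(X, u_grid))]

# Example with one source at u = 0.3 in light noise.
rng = np.random.default_rng(1)
L, K = 10, 50
v = np.exp(1j * np.pi * np.arange(L) * 0.3)
s = rng.standard_normal(K) + 1j * rng.standard_normal(K)
noise = 0.1 * (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K)))
X = np.outer(v, s) + noise

u_grid = np.linspace(-1, 1, 201)
u_hat = cbf_estimate(X, u_grid)
print(f"CBF estimate: {u_hat:.2f}")
```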
B. Overview of Multiple Signal Classification
MUSIC is a subspace-based (or super-resolution) method for DoA estimation [1] and is used as a benchmark comparison for the CNNs developed in this work. MUSIC relies on the orthogonality of the signal-plus-noise and noise subspaces in the frequency domain snapshot model. MUSIC works by calculating the eigenvalues and eigenvectors of the sample covariance matrix given by (4) and sorting them from largest to smallest eigenvalue. The first n eigenvectors are the signal eigenvectors and the remaining are the noise eigenvectors, denoted by \begin{align*} \mathbf {C_{s}} & = \mathbf {X} \mathbf {X}^{H}/K \tag {4}\\ \mathbf {f} & = \frac {1}{\mathbf {v}^{H} \mathbf {e_{n}} \mathbf {e_{n}}^{H} \mathbf {v}} \tag {5}\end{align*}
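The steps above can be sketched in a few lines. This is a minimal illustration of the standard MUSIC pseudo-spectrum, 1 / (v^H E_n E_n^H v) with E_n the matrix of noise eigenvectors, under the assumption of half-wavelength spacing and a known number of sources; it is not the authors' implementation, and the example data are hypothetical.

```python
import numpy as np

def music_spectrum(X, n_sources, u_grid):
    """MUSIC pseudo-spectrum over u_grid for an L x K snapshot matrix X."""
    n_sensors, n_snapshots = X.shape
    C = X @ X.conj().T / n_snapshots            # sample covariance matrix
    # eigh returns eigenvalues in ascending order, so the noise eigenvectors
    # (smallest eigenvalues) are the first L - n_sources columns.
    _, eigvecs = np.linalg.eigh(C)
    E_n = eigvecs[:, : n_sensors - n_sources]
    A = np.exp(1j * np.pi * np.outer(np.arange(n_sensors), u_grid))  # (L, G)
    # v^H E_n E_n^H v equals the squared norm of E_n^H v for each scan direction.
    denom = np.sum(np.abs(E_n.conj().T @ A) ** 2, axis=0)
    return 1.0 / denom

# Example with one source at u = 0.3 in light noise.
rng = np.random.default_rng(2)
L, K = 10, 50
v = np.exp(1j * np.pi * np.arange(L) * 0.3)
s = rng.standard_normal(K) + 1j * rng.standard_normal(K)
noise = 0.1 * (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K)))
X = np.outer(v, s) + noise

u_grid = np.linspace(-1, 1, 201)
spectrum = music_spectrum(X, n_sources=1, u_grid=u_grid)
u_hat = u_grid[np.argmax(spectrum)]
print(f"MUSIC estimate: {u_hat:.2f}")
```

For more than one source, the estimates would be taken from the largest local peaks of the spectrum rather than the single global maximum; that peak-picking step is omitted here.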
C. Overview of Cramer-Rao Lower Bound
The Cramer-Rao Lower Bound (CRLB) represents the lower bound on the variance of any unbiased estimator; an estimator that attains it is the minimum variance unbiased estimator (MVUE) [35]. The CRLB is given by (6), assuming that the regularity condition in (7) is satisfied, where p is the probability density function (PDF) of the random variable x with unknown parameter \begin{align*} & var\left ({{ \hat {\theta } }}\right ) \geq \frac {1}{-\mathbf {E} \left [{{ \frac {\partial ^{2} \ln p\left ({{\mathbf {x};\theta }}\right )}{\partial \theta ^{2}}}}\right ]} \tag {6}\\ & \mathbf {E} \left [{{ \frac {\partial \ln p\left ({{\mathbf {x};\theta }}\right )}{\partial \theta } }}\right ] = 0 \tag {7}\end{align*}
The CRLB for DoA estimation with a line array for plane waves in AWGN is given by (8) [1] where \begin{align*} \mathbf {C}_{CR} \left ({{ u }}\right ) & = \frac {\sigma ^{2}_{w}}{2K} \{ \Re \left [{{ \left ({{ \mathbf {S_{f}} \mathbf {V}^{H} \mathbf {S_{x}}^{-1} \mathbf {V} \mathbf {S_{f}} }}\right ) \odot \mathbf {H^{T}} }}\right ] \} ^{-1} \tag {8}\\ \mathbf {S_{x}} & = \mathbf {V} \mathbf {S_{f}} \mathbf {V}^{H} + \sigma ^{2}_{w} \mathbf {I} \tag {9}\\ \mathbf {P_{V}}^{\perp } & = \left [{{ \mathbf {I} - \mathbf {V} \left ({{ \mathbf {V}^{H} \mathbf {V} }}\right )^{-1} \mathbf {V}^{H} }}\right ] \tag {10}\\ \mathbf {H} & \triangleq \mathbf {D}^{H} \mathbf {P_{V}}^{\perp } \mathbf {D} \tag {11}\end{align*}
The CRLB for each array case is calculated by modifying the array manifold vector as outlined in Section II. Note that the standard CRLB does not apply to the muffled array case. For this reason, the CRLB for the missing sensor case is used as a reference CRLB for the muffled case throughout this paper. Additionally, no CRLB is provided for the perturbed case because each array manifold vector perturbation value is a random variable, so creating Q perturbed data samples would result in Q CRLBs, which is impractical to plot.
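A numerical evaluation of (8) through (11) for a perfect array can be sketched as follows. The sketch assumes half-wavelength spacing, uncorrelated sources of equal power, and that D holds the derivative of each manifold vector with respect to its direction cosine; these are illustrative assumptions, and the function name is hypothetical.

```python
import numpy as np

def crlb_u(u_sources, snr_db, n_snapshots, n_sensors=10, noise_var=1.0):
    """Evaluate the CRLB matrix of (8) for direction cosines u_sources."""
    u = np.atleast_1d(np.asarray(u_sources, dtype=float))
    n = np.arange(n_sensors)[:, None]
    V = np.exp(1j * np.pi * n * u[None, :])        # manifold matrix (L, M)
    D = 1j * np.pi * n * V                         # dV/du, column by column
    # Equal-power, uncorrelated sources (assumption).
    S_f = noise_var * 10 ** (snr_db / 10) * np.eye(len(u))
    S_x = V @ S_f @ V.conj().T + noise_var * np.eye(n_sensors)           # (9)
    P_perp = np.eye(n_sensors) - V @ np.linalg.inv(V.conj().T @ V) @ V.conj().T  # (10)
    H = D.conj().T @ P_perp @ D                                          # (11)
    inner = (S_f @ V.conj().T @ np.linalg.inv(S_x) @ V @ S_f) * H.T      # Hadamard product
    return noise_var / (2 * n_snapshots) * np.linalg.inv(np.real(inner)) # (8)

bound_single = crlb_u([0.2], snr_db=0.0, n_snapshots=100)
bound_two = crlb_u([-0.1, 0.1], snr_db=0.0, n_snapshots=100)
print(10 * np.log10(bound_single[0, 0]))   # single-source bound in dB
print(10 * np.log10(np.diag(bound_two)))   # per-source bounds in dB
```

The diagonal entries are the variance bounds for each source; for a missing sensor configuration, the corresponding rows of V and D would be zeroed before evaluating the same expressions.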
Convolutional Neural Network Details
A CNN is an ML network architecture that utilizes filters to extract information about input data [38] to learn a generalized model of the data. The layers in which these filters are applied are called convolutional layers. These convolutional layers increase the dimensionality of the data as multiple filters of a defined size are applied to the input data. These convolutional layers are then typically followed by normalization, activation, pooling, and dropout layers. The batch normalization layers are used to reduce the impacts of the changing statistical distribution of each layer during training, which can lead to the vanishing gradient problem [26]. The change in statistical properties during training is commonly called the internal covariate shift [39]. Next, activation layers are used to allow the network to learn non-linear trends [40], and pooling layers can be used to reduce the dimensionality of the data [41]. These layers are then typically followed by a dropout layer to aid in the generalization capabilities of the neural network [42]. These operations can be grouped together and performed multiple times to allow each group of operations to learn different features of the data.
In supervised learning, training and validation data with known outputs are used as input to the network, and the network makes predictions based on the input training data. The error between the predicted output of the neural network and the known output is calculated. The weights and biases are then updated based on the magnitude of the error and the learning rate through a process called back propagation [43]. A block diagram of this process is provided in Figure 3. The process is repeated by looping over all the data (each full pass is called an epoch) until the optimal performance for the given network architecture and initialization conditions is reached. The optimal performance of the network is found by leveraging the validation data, which is not used to update the weights and biases. Instead, it is used to periodically check the error of the network during training. Separate validation data is needed to confirm the accuracy of a network because the neural network would eventually memorize the training data, which is called overfitting [40]. The optimal network is typically the one that produces the minimum error on the validation data, not on the training data, as illustrated in Figure 4.
A. Overview of Network Structure & Training Parameters
An overview of the NN architecture used in this work is provided in Figure 5. The network consists of an input layer followed by a two-dimensional convolutional layer, batch normalization layer, activation layer, dropout layer, and max pooling layer. The preceding 5 layers are then repeated 3 times creating three blocks of layers, with the only change being a doubling in the total number of filters for each successive convolutional layer. This is then followed by another dropout layer, a flattening layer, and the final fully connected layer with the number of neurons equal to the number of sources.
The designed CNN where 2D conv indicates 2D convolutional layers with the number of filters and size of the kernel specified, m indicates the slope for the negative portion of ReLU, p denotes the percentage of neurons dropped in a dropout layer, and max pooling indicates a maximum pooling layer with the window and stride size indicated.
The filter size used for all convolutional layers in this network is
The network size and hyperparameters were selected by training a few trial CNNs and by relying on past work by the authors. The trial CNNs used one to three of the previously described blocks of layers, and the validation error of the networks was monitored to select the final number of blocks. Additional trial CNNs were developed to select the dropout percentage and slope parameter for leaky ReLU. The size of the convolution filters was selected by leveraging previous research indicating that larger filter sizes directly lead to lower error in DoA estimation [21]. A filter size of 20 was selected because the size of the input data in this work was
B. Training and Validation Results
For this work, SNR-specific and number-of-sources-specific CNNs were trained using the perfect array data generated by the signal model outlined in Section II. Each data sample generated by the signal model is an \begin{equation*} \mathbf {X}(i) = \frac {2\left ({{\mathbf {X}(i) - \min (\mathbf {X})}}\right )}{\max (\mathbf {X}) - \min (\mathbf {X})} - 1 \tag {12}\end{equation*}
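The normalization can be sketched as a min-max scaling to the range −1 to 1, with the minimum and maximum taken from the training data and then reused unchanged for the test data. The function names and the placeholder data below are hypothetical; the real-valued arrays stand in for the snapshot data after conversion from complex to real values, the details of which are not reproduced here.

```python
import numpy as np

def fit_minmax(train):
    """Minimum and maximum of the (real-valued) training data."""
    return float(np.min(train)), float(np.max(train))

def scale(X, lo, hi):
    """Map lo -> -1 and hi -> +1; values outside [lo, hi] fall outside [-1, 1]."""
    return 2.0 * (X - lo) / (hi - lo) - 1.0

rng = np.random.default_rng(0)
train = rng.standard_normal((1000, 20))        # placeholder training data
test = 1.5 * rng.standard_normal((200, 20))    # placeholder test data, wider spread

lo, hi = fit_minmax(train)
train_scaled = scale(train, lo, hi)
test_scaled = scale(test, lo, hi)              # training statistics reused

print(train_scaled.min(), train_scaled.max())  # -1.0 1.0
```

Because the test data is scaled with the training statistics, its values are not forced into the −1 to 1 range.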
The training and validation loss curves, plotted on a semi-log plot, for a CNN trained on the single-source 0 dB SNR case for 100 epochs are shown in Figure 7. This plot shows two training plateaus in the data, one between 5 and 20 epochs and a second above 20 epochs. This indicates that additional training time beyond the 10 epochs conducted for all the networks may have led to improved results on the test data. However, training beyond 10 epochs was not pursued, as the performance of the CNNs after 10 epochs of training was sufficient to demonstrate the desired results without the prohibitive increase in training time for all 210 CNNs. It should be noted that this loss plot does not show the typical parabola-shaped validation loss because the training and validation data are generated from the same signal model, and thus the training data spans the validation data statistics. If a different set of validation data were used, for example the perturbed array data, the validation loss curve would have the parabolic shape that indicates overfitting.
Training and validation loss curves for the single-source 0 dB SNR CNN trained for 100 epochs.
C. Predictions
After training, the CNNs can be used to make predictions utilizing the test data. Predictions are generated for the perfect, perturbed, missing, and muffled array cases for both the single-source and two-source cases. Before making predictions, as outlined above, the complex-valued data must be converted to real values and normalized. It is important to note in this step that the data after normalization is not bounded on the range −1 to 1, as the maximum and minimum values used in (12) are the maximum and minimum from the training data. This is done because the testing data may have data outside the bounds of what was included in the training data, and thus compressing it to the same range as the training data is not statistically valid. The MSE in dB between the neural network prediction and the true value is then calculated by averaging the squared error across all N test samples, accounting for the periodicity of direction cosines, as shown in (13).\begin{align*} MSE = 10\log _{10} \left ({{\frac {1}{N} \sum _{i=1}^{N} \min \left ({{ \begin{array}{l} {\left ({{ \hat {\theta }_{i} - \theta _{i} }}\right )^{2},} \\ {\left ({{ \hat {\theta }_{i} - \left ({{ \theta _{i}+2 }}\right ) }}\right )^{2},} \\ {\left ({{ \hat {\theta }_{i} - \left ({{ \theta _{i}-2 }}\right ) }}\right )^{2}} \end{array} }}\right ) }}\right ) \tag {13}\end{align*}
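The wrapped error metric in (13) can be sketched as below. The function name is hypothetical; the sketch takes the smallest of the three squared errors per sample (direct, and with the true value shifted by plus or minus 2) before averaging and converting to dB.

```python
import numpy as np

def wrapped_mse_db(u_hat, u_true):
    """MSE in dB between predicted and true direction cosines with period-2 wrapping."""
    u_hat = np.asarray(u_hat, dtype=float)
    u_true = np.asarray(u_true, dtype=float)
    candidates = np.stack([
        (u_hat - u_true) ** 2,
        (u_hat - (u_true + 2)) ** 2,
        (u_hat - (u_true - 2)) ** 2,
    ])
    # Per-sample minimum over the three candidates, then average over samples.
    return 10 * np.log10(np.mean(candidates.min(axis=0)))

# A prediction of 0.99 for a true value of -0.99 is only 0.02 away once wrapped.
wrapped = wrapped_mse_db([0.99], [-0.99])
unwrapped = 10 * np.log10((0.99 - (-0.99)) ** 2)
print(f"wrapped: {wrapped:.2f} dB, unwrapped: {unwrapped:.2f} dB")
```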
Results and Discussion
The following section outlines the results when comparing the MSE in dB for the CNNs, MUSIC, CBF, and the CRLB.
A. Perfect Array Results
Figure 8 plots the MSE in dB against SNR for the single-source (top figure) and two-source (bottom figure) cases. For the single-source case, it can be readily seen that at low SNR the CNNs and MUSIC are nearly equivalent. At higher SNR, the CNNs outperform MUSIC and achieve an MSE below the CRLB. For the two-source case, the CNNs outperform MUSIC at lower SNR only; however, the low SNR case is typically the most relevant case, as locating high SNR signals is less challenging. Additionally, it is expected that MUSIC will perform well in the perfect data case, as there is no model mismatch (perfect information) between the assumptions in MUSIC and the actual data. For both the single-source and two-source cases, MUSIC and the CNNs both outperform CBF, as expected.
Comparison of MSE between MUSIC, CBF, and the CNNs with the CRLB for a perfect array with 1 source (top) and 2 sources (bottom).
For both signal cases, the CNNs are below the CRLB for at least a portion of the SNR range. This indicates that the CNNs are biased estimators. This becomes apparent when looking at the definition of MSE in terms of bias and variance as previously defined in (1). Since the CRLB is the lower bound of the variance for an unbiased estimator, the only way to reduce the MSE below the CRLB is to introduce bias into the estimator [45]. Additionally, at lower SNR where the CNNs are above the CRLB the CNN-based estimator may still be biased because the training data is not guaranteed to have the same statistical properties of the test data which would result in a biased estimator for the test data.
B. Perturbed Array Results
Figure 9 plots the results of estimating DoA for a randomly perturbed array using CNNs trained on perfect array data, MUSIC, and CBF. Unlike the perfect array case, there is a model mismatch (imperfect information) between the data and assumptions in all estimation methods. This results in significantly degraded performance of MUSIC and CBF for both signal cases. The CNNs also suffer from degraded performance relative to the perfect array case but outperform MUSIC and CBF. This indicates that the CNNs have learned a more generalizable function for DoA estimation compared to MUSIC. However, this is likely at the cost of bias, as discussed in the perfect array results section. This case highlights the trade-offs of utilizing CNNs. The size of the CNNs can be made arbitrarily large to drive the MSE lower while also being more generalizable than classical methods, but this comes with the trade-off of bias in the estimator which cannot be removed with additional sampling [35]. The CNNs will converge to their biased estimate, while an unbiased estimator will converge to the true answer as the number of snapshots increases.
Comparison of MSE between MUSIC, CBF, and the CNNs with the CRLB for a perturbed array with 1 source (top) and 2 sources (bottom).
C. Missing Sensor Array Results
Figures 10 and 11 show the results at −10, 0, and 10 dB SNR for all 210 missing sensor combinations, as outlined in Section II. The single-source case results show that at −10 dB SNR the CNNs outperform MUSIC and CBF for all possible missing sensor combinations. The performance advantage of the CNNs relative to MUSIC then decreases as SNR increases. For the two-source case, the CNNs outperform MUSIC for almost all missing sensor combinations up to 0 dB SNR. At higher SNRs the performance of MUSIC improves faster than that of the CNNs; however, as stated previously, the low SNR cases are typically the ones of more interest. It should additionally be noted that for the two-source case the CNNs achieve an MSE below the CRLB, indicating that they are biased estimators.
Comparison of MSE between MUSIC, CBF, and the CNNs with the CRLB for 1 source at −10 (top), 0 (middle), and 10 (bottom) dB for all missing sensor combinations.
Comparison of MSE between MUSIC, CBF, and the CNNs with the CRLB for 2 sources at HPBW at −10 (top), 0 (middle), and 10 (bottom) dB for all missing sensor combinations.
The performance differences between MUSIC and the CNNs specifically are better visualized in the colormaps provided in Figure 12. Each green portion of the plot indicates an SNR and array configuration where a CNN has a lower MSE than MUSIC. These results further indicate that at lower SNR values the CNNs are able to learn a more generalizable function for DoA estimation compared to MUSIC.
Heatmaps indicating in green the SNR values and missing sensor configurations where the CNNs outperform MUSIC for 1 source (top) and 2 sources at HPBW (bottom).
The colormaps also highlight two trends from the data, the first being that for both the single-source and two-source cases, there are threshold SNR values that are tipping points where large numbers of array configurations switch from being better modeled by MUSIC to being better modeled by the CNNs. Secondly, the vertical striping indicates that some missing sensor combinations are very well represented by the CNNs and are almost always better than MUSIC. Further research is required to understand these two trends.
D. Muffled Sensor Array Results
Figures 13 & 14 are the results at
Comparison of MSE between MUSIC, CBF, and the CNNs with the CRLB for 1 source at −10 (top), 0 (middle), and 10 (bottom) dB for all muffled sensor combinations.
Comparison of MSE between MUSIC, CBF, and the CNNs with the CRLB for 2 sources at HPBW at −10 (top), 0 (middle), and 10 (bottom) dB for all muffled sensor combinations.
Heatmaps indicating in green the SNR values and muffled sensor configurations where the CNNs outperform MUSIC for 1 source (top) and 2 sources at HPBW (bottom).
The colormaps also illustrate the same threshold SNR trend seen in the missing sensor case. The vertical striping is also present for the muffled case, but only for the single-source case unlike in the missing sensor case. Further research is required to understand these two trends.
Conclusion
In this study, the trade-offs between ML-based and classical methods for DoA estimation were examined. We trained SNR-specific and number-of-sources-specific CNNs utilizing perfect array assumptions for the ML-based approach. We then tested the CNNs against a test data set that included perfect, perturbed, missing, and muffled array cases. The MUSIC and CBF algorithms were also evaluated against these same four array cases. The results demonstrate two findings. First, we illustrated that it is unnecessary to utilize ML-based methods for DoA estimation problems with perfect information a priori when unbiased estimators are desired. For the perfect array case, MUSIC achieved results similar to those of the CNNs developed for this work, because there was no mismatch between the model and the data.
Second, we found that for the other array cases (perturbed, missing, and muffled), there is a mismatch between the model and the data, resulting in an imperfect information scenario. In these cases, we have shown that CNNs generally performed better than MUSIC and CBF at low SNR. This indicates that the model-free method (the CNN) learned a more general model of the ULA compared to the assumptions in MUSIC. These results illustrate that CNNs can offer improved robustness to real-world array cases with imperfect information, since the CNNs were trained using only the perfect array case. Although the use of CNNs for DoA estimation in imperfect information cases offers improved accuracy compared to classical methods, this comes at the cost of the CNNs being biased estimators, as illustrated by the CNN MSE at times falling below the CRLB. Thus, the estimator will not converge to the true answer as the number of snapshots increases. Additionally, it should be noted that the CNN results could be improved further (via a larger network and additional hyperparameter tuning), which would further highlight this trend. Lastly, this paper demonstrated that MUSIC is more robust to muffled sensors than to missing sensors.
This paper has shown there is minimal benefit to utilizing CNNs for perfect array geometry. The utility of CNNs is their ability to learn generalized functions, allowing something trained on a specific signal model (in this case the perfect array) to be applied to a different signal model (perturbed, missing, and muffled signal models) and achieve satisfactory results.