
Convolutional Neural Networks for Direction of Arrival Estimation Compared to Classical Estimators and Bounds

The use of convolution neural networks (CNN) for direction of arrival estimation is best suited to cases of imperfect information where classical methods such as multiple...

Abstract:

Recently, there has been a proliferation of applied machine learning (ML) research, including the use of convolutional neural networks (CNNs) for direction of arrival (DoA) estimation. With the increasing amount of research in this area, it is important to balance the performance and computational costs of CNNs with classical methods of DoA estimation such as Multiple Signal Classification (MUSIC). We outline the performance of both methods of DoA estimation for single-source and two-source cases for multiple array conditions. The results are also compared to the Cramer-Rao lower bound (CRLB) and conventional beamforming. For each source case, CNNs were trained for a perfect uniform line array (ULA) and tested against data from a perfect ULA, perturbed ULAs, ULAs with missing sensors, and ULAs with muffled sensors. We show that for the single-source case, the CNNs do not offer any performance improvement relative to MUSIC at low signal-to-noise ratio (SNR). For the two-source cases, the CNNs perform better than MUSIC but only at low SNR. For the remaining array cases, the CNNs outperform MUSIC. These results indicate that the performance improvements from CNNs are highest for situations where there is a mismatch between the signal model and the data (imperfect information). This work also illustrates that the CNN estimators developed here exceed the CRLB and are biased estimators, a consequence of the lack of an unbiased constraint in the loss function during training of the CNNs.

Published in: IEEE Access (Volume: 13)

Page(s): 25533 - 25545

Date of Publication: 04 February 2025

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2025.3538997

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

SECTION I.

Introduction

Direction of arrival (DoA) estimation is the process of determining the direction from which signals arrive at a group of sensors. DoA estimation is performed in many fields, such as radar, telecommunications, speech, and sonar [1], [2]. This problem has been studied for many decades, beginning with the development of conventional beamforming (CBF) by Bartlett in 1948 [3], followed by methods including Capon’s beamformer in 1969 [4], the minimum norm method [5], [6] in 1979, multiple signal classification (MUSIC) [7] in 1986, estimation of signal parameters via rotational invariance techniques (ESPRIT) [8] in 1989, and more recently compressive sensing [9]. Starting in the 1990s, machine learning (ML) based approaches began to be applied to source localization [10]; however, the use of ML methods (also known as model-free or data-driven methods) has increased significantly in the last decade due to both the advancement and use of graphics processing units (GPUs) for ML and the ready availability of off-the-shelf ML packages such as TensorFlow, PyTorch, or MATLAB’s own toolboxes.

Many different types of ML approaches have been studied for DoA estimation such as multi-layer perceptrons (MLPs), support vector machines (SVMs), convolutional neural networks (CNNs), autoencoders, residual networks (ResNets), and vision transformers (ViTs) along with hybrid approaches combining multiple network types.

MLPs were among the first ML methods used for DoA estimation: [11] used an MLP with non-linear activation functions to estimate DoA for signals in random noise. MLPs continue to be used today, such as in [12], which developed a deep neural network (DNN), functionally similar to an MLP, to perform DoA estimation for up to four sources while also estimating the number of sources present. Additionally, MLPs have been used to perform DoA estimation and number-of-sources estimation in non-Gaussian noise, as illustrated in [13], and they have been extended to subarray sampling use cases [14].

SVMs have also been applied to the DoA estimation problem including [15] which developed a multi-class SVM algorithm for DoA estimation and [16] which used support vector regression in wireless communication which was then validated with experimental data in [17]. More recently a sensitivity study of SVM error performance to hyperparameters has been performed [18].

The use of CNNs includes [19], which developed a CNN focused on DoA estimation in low SNR that performed well for low numbers of snapshots and in cases of SNR mismatches while also estimating the number of sources present. Reference [20] studied the improvements in CNN classification accuracy when imposing the natural shift-invariant structure present in ULAs (and other arrays) on the input data to the CNN. Reference [21] then extended this work to CNNs but posed the ML problem as a regression problem. Reference [22] has also extended CNNs to three-dimensional DoA estimation while developing an output layer formulation that allows probabilities at each output neuron to be calculated, making it possible to apply confidence intervals to predictions.

Autoencoders, more specifically denoising autoencoders, have been leveraged in DoA estimation in order to improve the estimate of the sample covariance matrix [23]. The denoised sample covariance matrix is then used in classical covariance-based DoA estimation schemes. An autoencoder was also used in [24] to perform DoA estimation, but with array imperfections included in order to account for real-world scenarios. ResNets have also been applied to DoA estimation in [25]. ResNets were originally developed to overcome the vanishing gradient problem [26] by including skip connections across single or multiple layers of a neural network (NN). More recently, ViTs have been used for multi-class classification of DoA in [27]. This work is an outgrowth of the success of transformers in natural language processing [28] and their extension to image classification through ViTs, which have been shown to outperform CNNs in image classification [29], [30]. Given the highly successful results from CNNs, applying ViTs to DoA estimation is a natural extension.

In addition to these different machine learning architectures, there are different approaches in the use of each architecture. For example, NNs can be developed to directly calculate the DoA via regression, or output probabilities of each possible DoA within a discretized domain, known as classification. Work such as [23] leverages NNs to denoise the estimate of the covariance matrix, which can be fed to classical covariance-based DoA estimation methods. Other works leverage the structure of classical methods and seek to improve on them such as [31] and [32] which both utilize the structure of MUSIC in the development of their ML-based approach. More recently, there has been work specifically designing NNs for DoA estimation accounting for imperfect data whether due to array perturbations, sensor mutual coupling, and gain or phase errors [24], [32].

For a more comprehensive review of the current status of ML-based DoA estimation methods the reader is referred to the following review papers [33] and [34].

The rapid expansion in the use of ML for DoA estimation has resulted in a significant amount of work to identify reductions in DoA error compared to classical methods without discussing the trade-offs (other than computational cost and training time). Specifically, the missing unbiased constraint in the development of ML-based DoA estimators is a significant trade-off that needs to be studied. This is especially true in relatively simple problems with perfect information where efficient estimators are well known, for example, the CBF for the single source in additive white Gaussian noise (AWGN) problem [1]. The application of ML-based methods to cases with efficient estimators may be applicable but a discussion of the bias-variance trade-off should be included and is often missing.

It is the authors’ belief that a more thorough discussion of the trade-offs of ML-based methods is required. For example, the use of ML-based methods in imperfect information cases (such as perturbed arrays, arrays with missing sensors, muffled arrays, or mutual sensor coupling) is a promising use case for ML-based methods because efficient estimators do not exist due to the unknown state of the array. Additionally, the use of ML-based methods such as denoising autoencoders to improve the estimate of the sample covariance matrix prior to the use of traditional covariance based DoA estimation schemes [23], as discussed above, is another interesting application.

The advantage of the ML-based methods is that no assumptions about the signal model need to be made a priori. ML algorithms utilize the training data to learn a model that represents the physics of the training data. The key point of training is to develop a model that is able to generalize to cover all data. This is especially important if the training data does not span the entire physics of the system. This is where ML methods have the greatest advantage over classical methods, as no signal structure is assumed in training and developing models of signal structure can be difficult when considering real-world effects. CBF, Capon’s beamformer, and MUSIC all make assumptions about the state of the sensors when estimating DoA. This is problematic when there is a mismatch between the assumed physical model and the actual physical model. This mismatch can be caused by failed sensors, sensor perturbations, sensor calibration drift, non-isotropic sensors, near-field sources, and others.

Furthermore, in conventional estimation problems, when developing an estimator it is common to first attempt to derive an unbiased estimator [35]. Unbiased estimators are typically desired because the expected value of the estimator, $E(\hat{\theta})$, will be the true value, $\theta$, while a biased estimator will only converge to $\theta + b$, where $b$ is the bias of the estimator. If an unbiased estimator cannot be derived, the unbiased constraint can then be removed from the derivation of the estimator. Although an unbiased constraint can be easily applied in classical estimation theory, ML-based methods utilize gradient descent methods, such as Adam [36], to iteratively converge to a solution utilizing training data and a loss function, typically root mean square error (RMSE) in regression problems. This can result in the ML estimator being biased for two reasons. First, if the training data for the ML-based method does not span the entire statistics of the test data, predictions of the test data will be biased because the statistics of the training data do not match the statistics of the test data. Second, the use of RMSE or mean square error (MSE) as the loss function during training results in no unbiased constraint being applied to the weight updates during back propagation. For example, using MSE, defined in (1), the gradient-descent algorithm will seek to minimize MSE, which does not impose an explicit constraint on the bias term. This results in the ML-based method finding an estimator that achieves the lowest error but that is not guaranteed to be unbiased, and therefore not guaranteed to be efficient.
\begin{equation*} \mathrm{MSE} = \mathrm{bias}^{2} + \mathrm{variance} \tag{1}\end{equation*}

This can result in the ML methods achieving significantly lower MSE compared to classical methods, but this MSE improvement is achieved by accepting bias in exchange for lower variance. This becomes especially important when coupled with the overuse of ML-based methods where classical approaches are well suited.
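The decomposition in (1) can be checked numerically. The following is a minimal sketch, not the experiment performed in this paper: it assumes a scalar parameter observed in Gaussian noise and compares the sample mean (unbiased) with a hypothetical shrinkage estimator (biased, lower variance). The names and values (`theta`, `n_obs`, `sigma`, and the shrinkage factor 0.5) are illustrative assumptions only.

```python
import numpy as np


def mse_decomposition(estimates, true_value):
    """Return (mse, bias, variance) for a set of Monte Carlo estimates."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    variance = estimates.var()  # population variance (ddof=0)
    mse = np.mean((estimates - true_value) ** 2)
    return mse, bias, variance


rng = np.random.default_rng(0)
theta = 0.5                      # assumed true parameter (hypothetical)
n_trials, n_obs, sigma = 20000, 5, 1.0

# Each row holds n_obs noisy observations of theta for one trial.
obs = theta + sigma * rng.standard_normal((n_trials, n_obs))

unbiased = obs.mean(axis=1)      # sample mean: unbiased estimator
shrunk = 0.5 * unbiased          # shrinkage toward zero: biased estimator

mse_u, bias_u, var_u = mse_decomposition(unbiased, theta)
mse_s, bias_s, var_s = mse_decomposition(shrunk, theta)

for name, (mse, bias, var) in {
    "unbiased": (mse_u, bias_u, var_u),
    "shrunk": (mse_s, bias_s, var_s),
}.items():
    print(f"{name:9s} mse={mse:.4f} bias={bias:+.4f} var={var:.4f} "
          f"bias^2+var={bias**2 + var:.4f}")
```

Under these assumed settings, theory gives a variance of $\sigma^{2}/n = 0.2$ for the sample mean, and a bias of $-0.25$ with variance $0.05$ for the shrunk estimator, so the biased estimator has the lower MSE ($0.1125$ versus $0.2$) even though its expected value is not the true parameter.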

The specific contributions of this research are:

To illustrate that it is unnecessary to utilize ML-based methods for DoA estimation problems with perfect information a priori when unbiased estimators are desired.
To demonstrate the ability of ML-based methods (in this case a CNN) to generalize to imperfect information cases better than classical methods such as MUSIC and CBF.
To show that CNNs can learn a more general function for DoA estimation compared to classical methods when accounting for imperfect information.
To illustrate that the CNNs can converge to a biased estimator.
To outline that MUSIC in general has lower errors in DoA estimation for muffled sensors compared to missing sensors.

This paper is organized as follows: Section II outlines the signal model used to generate the training, validation, and test data for this work including details of the different imperfect information cases; Section III reviews important background topics; Section IV provides an overview of the CNN architecture and the training, validation, and prediction process; Section V presents the results and discussion; and Section VI provides the conclusions.

Conventions: Boldface lowercase math symbols denote vectors and boldface uppercase math symbols denote matrices. $^{H}$ denotes the Hermitian (conjugate) transpose. $\odot$ denotes the Hadamard product. $\mathbf{a}\sim\mathcal{CN}(\boldsymbol{\mu},\boldsymbol{C})$ indicates that $\mathbf{a}$ is a complex random vector with normal distribution, mean $\boldsymbol{\mu}$, and covariance $\boldsymbol{C}$. $\mathop{\mathrm{diag}}\nolimits([a_{0},a_{1},\ldots,a_{L-1}])$ is an $L\times L$ diagonal matrix containing the elements $a_{0}, a_{1}, \ldots, a_{L-1}$ along its diagonal. $\mathbf{I}$ denotes an identity matrix. $K$ denotes the number of snapshots. $\lambda$ indicates wavelength. $^{\perp}$ denotes the orthogonal operation.

SECTION II.

Signal Model

This paper utilizes a uniform linear array (ULA) with $L$ sensors and inter-sensor spacing $d=\lambda/2$. The sensors are placed on the z-axis at coordinates $0,\lambda/2,\ldots,(L-1)\lambda/2$. The plane wave assumption is made, so that a specific wave arrives at an incident angle $\theta_{s}$ with respect to the array, with direction cosine $u_{s}=\cos(\theta_{s})$, where $s$ denotes the specific plane wave [1]. The frequency domain signal snapshot model is given by (2),
\begin{equation*} \mathbf{X}=\sum_{m=1}^{M} \mathbf{v}_{m} \mathbf{s}_{m} + \mathbf{N} \tag{2}\end{equation*}

where the array manifold vector is denoted by $\mathbf{v}$ with size $L \times 1$, with each row entry corresponding to a specific direction cosine $u_{s}$. The $i^{th}$ row of $\mathbf{v}$ is $\exp(j\pi u_{s}(i-1))$ [1]. The vector $\mathbf{s}_{m}$ is the complex amplitude of the signal with distribution $\mathcal{CN}(0,\sigma_{s}^{2})$ and size $1 \times K$. $\mathbf{N}$ is AWGN with size $L \times K$. Lastly, $m$ denotes the specific source.

When multiple independent snapshots are combined, the result is an $L\times K$ matrix given by (3).
\begin{equation*} \mathbf{X}=\begin{bmatrix} \mathbf{x}_{1} & \mathbf{x}_{2} & \ldots & \mathbf{x}_{K} \end{bmatrix} \tag{3}\end{equation*}
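The snapshot model in (2) and (3) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' data generation code: the function names (`ula_manifold`, `complex_normal`, `simulate_snapshots`), the noise variance parameter `sigma_n2`, and the example values of $L$, $K$, and the source angles are all hypothetical. Angles are measured from the array (z) axis so that $u=\cos(\theta)$, as in the text.

```python
import numpy as np


def ula_manifold(u, L):
    """Manifold vector of a half-wavelength ULA: exp(j*pi*u*(i-1)), i = 1..L."""
    return np.exp(1j * np.pi * u * np.arange(L))


def complex_normal(rng, shape, variance):
    """Zero-mean circular complex Gaussian samples with the given variance."""
    scale = np.sqrt(variance / 2.0)
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))


def simulate_snapshots(thetas_deg, L, K, sigma_s2=1.0, sigma_n2=1.0, rng=None):
    """Build the L x K matrix X = sum_m v_m s_m + N for the given source angles."""
    rng = np.random.default_rng() if rng is None else rng
    X = complex_normal(rng, (L, K), sigma_n2)          # AWGN term N
    for theta in np.atleast_1d(thetas_deg):
        u = np.cos(np.deg2rad(theta))                  # direction cosine
        v = ula_manifold(u, L)                         # length-L manifold vector
        s = complex_normal(rng, K, sigma_s2)           # K complex amplitudes
        X = X + np.outer(v, s)                         # rank-one source term
    return X


# Two-source example with assumed (hypothetical) array and source parameters.
X = simulate_snapshots([60.0, 100.0], L=10, K=50, rng=np.random.default_rng(1))
print(X.shape)
```

Each column of `X` is one snapshot $\mathbf{x}_{k}$, so the returned array has the $L\times K$ layout of (3). Passing a single angle reproduces the single-source case.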

This signal model is then used as the basis to create four different array conditions (shown in Figure 1) to evaluate MUSIC, CBF, and the CNNs under both single-source and two-source present cases. The following subsections outline the four array cases, the reason for studying each case, and the modifications to the signal model to model each of these cases.

FIGURE 1.

Array cases.
