Introduction
Accurate user positioning is an enabler of several future services and technologies [1], [2], [3], [4], such as location-aware communication, vehicle-to-everything (V2X) applications, the industrial internet of things (IIoT), cooperating robots, and commercial applications. For this purpose, radio-based positioning of user equipment (UE) in wireless communication networks can be considered [5]. Multiple base stations (BSs) deployed in such networks allow the collection of channel state information (CSI) over distributed links, which can be exploited for positioning a UE. The CSI describes the channel across the spatial and frequency domains, where the large number of antennas and the large available bandwidth of current and future communication networks [4], e.g., fifth generation (5G) or upcoming sixth generation (6G), can provide the high angular and temporal resolution needed for high-accuracy positioning.
Conventional radio-based positioning methods are generally model-based and usually follow a two-step approach. With CSI estimated at one BS [6] or at multiple BSs [7], relevant parameters or measurements, e.g., path delay, angle of arrival (AoA), reference signal receive power (RSRP), or time difference of arrival (TDoA), are determined in a first step, and the UE’s position is computed from them in a second step. Recently, machine learning (ML) and artificial intelligence (AI)-based techniques have also been proposed for radio-based UE positioning [8], [9], [10], [11], [12], which are primarily data-driven and not model-based. In particular, deep learning (DL) methods, especially convolutional neural networks (CNNs), have shown promising results [13], [14], [15], [16], achieving sub-meter accuracy. In such data-driven models, the CSI over subcarriers and antennas of a UE at a given position is considered a fingerprint associated with the UE’s position. By leveraging the ability of wireless networks to collect large amounts of data, a database of CSI fingerprints associated with different UE positions, along with the respective position labels, can be constructed. With DL-based positioning methods, a neural network (NN) is trained on such a database, so that afterwards the NN can estimate a UE’s position from the CSI of the UE provided as its input. Different types of fingerprints have been considered in the literature, including the received signal strength (RSS) and the magnitude and/or phase of the CSI over subcarriers in the frequency domain and across antennas in the spatial domain [15], [16], [17], [18], [19].
With the CSI of a UE available across multiple BSs, early fusion or late fusion can be considered for the DL-based positioning methods [20], [21]. In early fusion, the CSI fingerprints from multiple BSs are collected and bundled together to constitute a single CSI fingerprint associated with the UE’s position. Thus, with early fusion, only one NN needs to be trained, with a database comprising fingerprints of the CSI across multiple BSs [20]. With late fusion, on the other hand, one NN is assumed at each BS, where the CSI is considered a fingerprint of the UE’s location associated only with the given BS [21]. The NN associated with that BS is trained with a database of CSI fingerprints from that BS, enabling the NN to determine the UE’s position based only on the CSI estimated by that BS. Afterwards, a final UE position estimate is obtained by combining the position estimates obtained by the NNs across the multiple BSs [21], [22], e.g., with a weighted average.
The choice between early and late fusion generally depends on the application [23]. However, when considering changes in the UE-BS channel between the training phase and deployment phase, e.g., due to a blockage of the line of sight (LOS) between a UE and a BS, late fusion can benefit from uncertainty estimation [21]. In particular, the NN at each BS can be trained to estimate the uncertainty of its own UE position estimate. This enables the late fusion approach to determine the final position estimate for the UE while considering the uncertainty of the multiple position estimates obtained across multiple BSs. In practice, the most reliable position estimates have a larger impact on the final UE position. Uncertainty can be estimated with simple approaches like Monte Carlo Dropout (MCD) [24] and Deep Ensembles (DEs) [25], which characterize the uncertainty based on the variance of the positioning error obtained with multiple NNs, i.e., similar position estimates across the different NNs indicate lower uncertainty. Uncertainty estimation methods have also been proposed in [26] and [27] to detect corrupted fingerprints.
Most conventional approaches to positioning require a strong line-of-sight (LOS) path and may be impaired in non-LOS (NLOS) conditions or in the presence of strong multipath. Recent works such as [6] and [28] have shown how to take advantage of the multipath information for single-anchor UE positioning, but they are limited to multiple-input multiple-output (MIMO) systems and require prior knowledge of the nature of the incoming paths (i.e., LOS or NLOS). On the other hand, DL-based methods can still be employed in strong multipath scenarios and do not require multiple antennas at both receiver and transmitter. Nevertheless, since the multipath profile is susceptible to environmental changes, a DL model trained with CSI fingerprints from one environment may perform poorly for UE positioning in another environment [21], [29].
The lack of direct transferability of the knowledge acquired in one environment to other environments is one of the challenges of DL-based positioning [30]. The most straightforward way to address this is to retrain the NN from scratch with CSI fingerprints from the new environment, which may however be resource-expensive and may not always be feasible. The resource-intensive position labeling that this requires can be reduced by employing channel charting [31] and by considering distance metrics between CSI fingerprints to create a map of the deployment scenario [32], [33] using no or very few position labels. On the other hand, several approaches can be considered for improving the generalizability of a trained model, so that it adapts to environmental changes or to a new environment, including transfer learning, domain generalization, multi-task learning and meta learning. With transfer learning, a previously trained model is used as an initial model that is fine-tuned with reduced training data from a new environment [19], [29], which speeds up the training and improves the performance compared to training from scratch.
Furthermore, with multi-task learning (MTL) the aim is to jointly learn multiple models by training them while also sharing some or all of their parameters, thereby benefiting from regularization [34]. Consequently, by considering positioning in different environments as different tasks, the positioning across multiple environments can be improved. When training a MTL scheme, the relative importance of each task has to be chosen. The hardest to learn tasks should be weighted less, so that the model focuses more on tasks that are easier to learn. Based on the uncertainty of each task, a method was proposed in [35] that takes into account the importance of each task. This method not only provides a way to tune the importance of different tasks but also simultaneously learns the uncertainty for each task, which, as shown in [21], is beneficial for DL-based positioning using CSI fingerprints.
Another approach aiming at improving the generalizability of NN models is meta-learning. With meta-learning, a model is trained on multiple tasks or environments such that the minimization of the loss function in an unseen task is done more efficiently. Training is done by considering a meta-level objective such as the average positioning error across the multiple environments [30], [36]. Meta-learning aims at having a trained model that generalizes better not only across the trained tasks but also facilitates learning an unseen task with a lower number of training samples, in contrast to MTL which only aims at learning better the trained tasks.
Motivated by the two-step approach of conventional positioning methods, i.e., with parameter extraction from the CSI in a first step and a position determination in a second step, a two-part model trained with multi-task learning and a meta-level objective has been recently proposed in [37]. For UE positioning in different environments, i.e., different training tasks, different models are assumed, with the first part of the models being common across all tasks and trained with CSI samples from all tasks (multi-task learning), aiming at minimizing the sum positioning error across all tasks (meta-level objective). The second part of the model of each task is trained to be environment-specific by using only training data from that environment. The approach proposed in [37] improves the positioning accuracy in the trained environments and achieves better generalizability when transferring the first part of the model and fine-tuning the two-part model with CSI samples of a new environment.
A. Contributions
As proposed in [35], MTL benefits from uncertainty estimation. The training in MTL can be improved by determining the relative weighting of the losses of each task based on the associated uncertainty estimate [35]. For this reason, in this paper we combine the results from [21] and [37] to benefit from the MTL of different positioning tasks and from late fusion using uncertainty estimation. For a setup with multiple BSs and considering the positioning of a UE using each BS as a separate task, we show that employing a MTL scheme with uncertainty estimation and late fusion achieves high positioning accuracy. Additionally, even though this is outside the scope of the current paper, it was shown in [37] that a model trained with the MTL scheme can be further used for transfer learning in a new environment, reducing the time and amount of data that needs to be gathered.
Moreover, we extend the work in [21] by employing a method described in [38] for sensor fusion that takes into account the possibility that one or more sensors may be spurious. In the case of DL-based positioning, a model estimate could be spurious if the purported uncertainty is low but the real error is high. We employ this method in a late fusion scheme and show that it is beneficial in improving the positioning accuracy especially in dynamic environments.
Lastly, we aim not only to minimize the positioning error, but also evaluate the reliability of the uncertainty estimation. It would be beneficial if the estimated uncertainty truly reflects the model’s uncertainty about the current measurement, such that a high uncertainty should indicate high positioning error and vice-versa. To evaluate the quality of uncertainty estimates we consider the area under sparsification error (AUSE) metric [27], [39]. In addition to the AUSE metric, we evaluate the integrity of the positioning results with respect to the integrity risk (IR) which is used in global navigation satellite system (GNSS) applications and has been recently proposed in the Third Generation Partnership Project (3GPP) as a positioning key performance indicator (KPI) for 5G positioning [40].
The paper is structured as follows. In Section II, the considered system model is described along with the different types of fusion, and the MTL scheme is introduced. In Section III, the simulation setup and the DL-model structure are described. The results and conclusion are then presented in Sections IV and V, respectively.
System Model
We consider an uplink setup with \begin{equation*} \tilde {\boldsymbol {H}_{n}} = [\tilde {\boldsymbol {h}}_{0}^{n}, \tilde {\boldsymbol {h}}_{1}^{n}, \ldots, \tilde {\boldsymbol {h}}_{N_{C}-1}^{n} ] \in \mathbb {C}^{N_{R} \times N_{C}}, \tag {1}\end{equation*}
A. DL Based Positioning With Fingerprints
Deep learning based localization using CSI fingerprints as inputs consists of two phases, namely the training and the deployment phase, which are often alternatively termed offline and online phases, respectively. During the training phase, CSI fingerprints are collected throughout the area of interest along with a label corresponding to the UE position associated with each CSI fingerprint. In order to collect fingerprints along with their labels, positioning reference units (PRUs) can be employed, i.e., devices with a known position, obtained with another positioning method or with sensors [43]. Without loss of generality, we assume that the UEs lie on a two-dimensional plane. Subsequently, the CSI fingerprints and the position labels
The key idea behind positioning with CSI fingerprints is that the CSI at each position is considered unique to that position. This stems from the fact that the channel between UE and BS is a rich source of information, since it is influenced by various environmental factors such as walls, objects, or other obstacles. All this information is indirectly incorporated into the multipath propagation of the channel, which includes direct paths (LOS) and indirect paths (NLOS), and is extracted during the training phase of the NN. Consequently, positioning using fingerprints belongs to the modern positioning techniques, such as [28], that leverage both LOS and NLOS paths. Additionally, as shown in [44], a LOS path is not necessarily needed at all, since NLOS paths already contain information that can make the fingerprints unique and useful for positioning. The basic assumption is that the propagation environment does not change significantly between the training and deployment phases, since that would degrade the performance of the NN.
Two different approaches for positioning using CSI fingerprints from multiple
1) Early Fusion
In early fusion, a single DL-model is trained for the UE positioning, having as input the concatenation of the CSI fingerprints from all BSs, i.e., the single NN model
2) Late Fusion
With late fusion, a UE’s position estimate is determined based on the CSI fingerprint at each BS. For this purpose, a separate NN model is trained at each BS based on the CSI at each BS as input. The parameters
Compared to early fusion, this method generates much less traffic in the network, because only the output of each model needs to be collected instead of the whole CSI fingerprints, as is the case for early fusion. On the other hand, an appropriate model for the combination of the multiple estimates has to be developed. In this paper we build upon the work in [21] and propose and compare methods to appropriately combine the position estimates considering the uncertainty.
B. Uncertainty Estimation
Normally, for DL based positioning with CSI fingerprints, the parameters \begin{equation*} \mathcal {L}(f_{\epsilon }) = {\mathcal {L}}_{x} + {\mathcal {L}}_{y}, \tag {2}\end{equation*}
The drawback of using such a loss function is that the model does not acquire any knowledge about the uncertainty that is present in the measurements. In the following, we discuss different types of uncertainties.
1) Aleatoric Uncertainty
The data dependent uncertainty is called aleatoric uncertainty and it reflects the uncertainty that a measurement has about the specific task. In a case of positioning using CSI fingerprints, a particular CSI measurement would have high uncertainty if it has low receive SNR for example. This type of aleatoric uncertainty, i.e. instance-dependent uncertainty, is called heteroscedastic uncertainty. Since the aleatoric uncertainty in positioning using fingerprints is data-dependent, it can also be learned from the data. In [45] a modification to the MSE loss was proposed in order to train a model to simultaneously calculate the position and the aleatoric uncertainty of the current position estimate. The loss function which shall be minimized with respect to the model parameters \begin{equation*} \mathcal {L}'(f_{\epsilon }) = \frac {1}{2 {(\sigma _{x}^{\alpha })}^{2}}{\mathcal {L}}_{x}+ \frac {1}{2 {(\sigma _{y}^{\alpha })}^{2}}{\mathcal {L}}_{y} + \log ({\sigma _{x}^{\alpha }}{\sigma _{y}^{\alpha }}), \tag {3}\end{equation*}
Subsequently, the output of the model has to be modified to include the learned aleatoric uncertainty
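As an illustration of the loss in Eq. (3), the following minimal NumPy sketch evaluates it for a batch of position estimates. It assumes that the per-coordinate terms of Eq. (2) are squared errors and that the model outputs positive standard deviations; the function name `nll_loss` and the array layout are hypothetical and not taken from the paper.

```python
import numpy as np

def nll_loss(pos_true, pos_pred, sigma_pred):
    """Eq. (3)-style loss, averaged over a batch.

    pos_true, pos_pred: arrays of shape (batch, 2) holding (x, y).
    sigma_pred: array of shape (batch, 2) holding positive aleatoric
    standard deviations for x and y (assumed model outputs).
    """
    pos_true = np.asarray(pos_true, dtype=float)
    pos_pred = np.asarray(pos_pred, dtype=float)
    sigma_pred = np.asarray(sigma_pred, dtype=float)
    # Assumption: L_x and L_y are per-coordinate squared errors.
    sq_err = (pos_pred - pos_true) ** 2
    # Error terms scaled by 1 / (2 sigma^2), as in Eq. (3).
    weighted = sq_err / (2.0 * sigma_pred ** 2)
    # Regularizer log(sigma_x * sigma_y) penalizes large uncertainties.
    log_term = np.log(sigma_pred[:, 0] * sigma_pred[:, 1])
    return float(np.mean(weighted.sum(axis=1) + log_term))
```

A larger predicted standard deviation shrinks the contribution of the squared error but is penalized by the logarithmic term, so the model cannot reduce the loss by reporting arbitrarily high uncertainty.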
2) Epistemic Uncertainty
Aleatoric uncertainty is not the only type of uncertainty present in a DL model. The other type is called epistemic uncertainty, and it accounts for uncertainty in the model’s parameters [45]. Estimates with high epistemic uncertainty indicate that the input comes from a distribution that was not learned by the model. In a DL localization model with fingerprints, the epistemic uncertainty would be high for a region in space where no data were collected or when a CSI measurement is corrupted.
In [24], an approach for capturing a model’s epistemic uncertainty called Monte Carlo dropout (MCD) was introduced. When employing dropout, random neurons in every weight layer of the deep learning model are deactivated with a predefined probability. Typically, dropout is utilized solely during training as a regularization technique [42], but with MCD the same dropout probability is retained during the deployment phase. Each successive forward pass through the DL model with MCD generates a unique configuration, and conducting multiple forward passes is akin to sampling from an approximate posterior distribution of the model’s parameters given the dataset [24]. The variance of the estimates from the different model configurations during these forward passes serves as an indicator of the model’s epistemic uncertainty.
After T forward passes, the combined aleatoric and epistemic uncertainty of the coordinate x is [45]:\begin{equation*} \sigma _{x} = \frac {1}{T} \sum _{t=1}^{T}\tilde {x}_{t}^{2} - \left ({{\frac {1}{T} \sum _{t=1}^{T}\tilde {x}_{t}}}\right)^{2} + {\sigma _{x}^{\alpha }}, \tag {4}\end{equation*}
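The computation in Eq. (4) can be sketched as follows for one coordinate. This is only an illustrative sketch: the function name `mcd_uncertainty` is hypothetical, and the aleatoric term is passed in as a single value because the text does not state how it is aggregated over the T passes.

```python
import numpy as np

def mcd_uncertainty(x_samples, sigma_aleatoric):
    """Combine epistemic and aleatoric terms as written in Eq. (4).

    x_samples: T estimates of the x coordinate, one per stochastic
    forward pass with dropout kept active.
    sigma_aleatoric: aleatoric term for the same input (assumed to be
    supplied as one scalar).
    """
    x_samples = np.asarray(x_samples, dtype=float)
    # Epistemic part: mean of squares minus square of the mean.
    epistemic = np.mean(x_samples ** 2) - np.mean(x_samples) ** 2
    return float(epistemic + sigma_aleatoric)
```

For example, two forward passes returning 1 and 3 give an epistemic term of 1, to which the aleatoric term is added.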
Even though epistemic uncertainty estimation is an efficient way for the DL model to report on its own knowledge about the current measurement, it is not always accurate. There are cases where the epistemic uncertainty is low but the mapping to the position is highly inaccurate. In those instances the model provides a spurious estimate, which has to be identified and eliminated from the fusion scheme. For the late fusion in [21], the method employed to combine the different estimates is based on the assumption that each estimate follows a known Gaussian distribution whose variance corresponds to the estimated combined aleatoric and epistemic uncertainty. As this assumption does not always hold, we take such model inconsistencies into account by incorporating a method described in [38] for fusing measurements from multiple sensors. The basic idea of this method is to assign a lower weight to the estimate that is most inconsistent with the other estimates. Note that this method leverages multiple position and uncertainty estimates from different BSs and can therefore only be employed in a late fusion scheme, as described next.
In our setup we consider \begin{equation*} p_{n} = \exp \left ({{\frac {-(x - \tilde {x}_{n})^{2}}{\alpha _{n}}}}\right) \tag {5}\end{equation*}
\begin{equation*} \alpha _{n} = \frac {b_{n}}{\prod _{ l=1, l \neq n}^{N_{B}}(\tilde {x}_{n} - \tilde {x}_{l}) ^{2}} \tag {6}\end{equation*}
From \begin{equation*} \sigma _{x, n}'^{2} = \frac {{\sigma _{x,n}}^{2}b_{n}^{2}}{b_{n}^{2} - 2\sigma _{x, n}^{2}\prod _{ l=1, l \neq n}^{N_{B}}(\tilde {x}_{n} - \tilde {x}_{l}) ^{2}} \tag {7}\end{equation*}
\begin{equation*} b_{n}^{2} = 2\sigma _{x, n}^{2}\prod _{ l=1, l \neq n}^{N_{B}}(\tilde {x}_{n} - \tilde {x}_{l} + \lambda) ^{2} \tag {8}\end{equation*}
By choosing the parameter
C. Multi Task Learning
When considering the late fusion approach, i.e. a separate NN for each of the BSs, we propose sharing some parameters across models of different BSs as described in [37]. The n-th model which corresponds to the n-th BS
Two-part models with multi-task learning. The models’ outputs comprise the UE’s position estimate and the aleatoric uncertainty (when considered).
This method of training is not possible with early fusion, since there is only one model available. Furthermore, when comparing STL late fusion to MTL late fusion, we see that MTL requires the data from all BSs to be collected in order to train the models, since the models share some parameters. For STL late fusion, each BS trains its own model and then only shares the result of the model, so there is no need to share input data between BSs. However, the fact that the models in MTL late fusion share parameters can enable training by means of federated learning [47]. Federated learning refers to the technique whereby multiple nodes train a model by partially training it locally and then sharing the model’s parameters instead of sharing the data. This method can reduce data transfer requirements between nodes and also preserve privacy. Federated learning is also not applicable in the early fusion case, since no single BS is able to do partial training on the model (see Fig. 1), as it needs CSI data from all BSs at its input to predict a single UE position.
The naive approach to optimizing a MTL scheme is to minimize a linear sum of the losses of the individual tasks, i.e., of the positioning at each BS. Thus, the loss for the training of the models in the MTL scheme would be:\begin{equation*} {\mathcal {L}}_{\text {MTL}}(f_{\theta, \epsilon _{1}}, f_{\theta, \epsilon _{2}}, \ldots, f_{\theta, \epsilon _{N_{B}}})=\sum _{n=1}^{N_{B}}{\mathcal {L}}_{n}(f_{\theta, \epsilon _{n}}), \tag {9}\end{equation*}
In the context of DL based localization using fingerprints, each model generates a 2-dimensional position estimate. By assuming that each output of the n-th DL model follows a Gaussian distribution \begin{align*} & \hspace {-.1pc}\mathcal {L}'_{\text {MTL}}(f_{\theta, \epsilon _{1}}, f_{\theta, \epsilon _{2}}, \ldots, f_{\theta, \epsilon _{N_{B}}}) \\ & =\sum _{n=1}^{N_{B}}\mathcal {L}'_{n}(f_{\theta, \epsilon _{n}}) \\ & = \sum _{n=1}^{N_{B}}\left [{{\frac {1}{2 {(\sigma _{x, n}^{\alpha })}^{2}}\mathcal {L}_{x, n}+ \frac {1}{2 {(\sigma _{y, n}^{\alpha })}^{2}}\mathcal {L}_{y, n} + \log ({\sigma _{x, n}^{\alpha }}{\sigma _{y, n}^{\alpha }})}}\right ] \tag {10}\end{align*}
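To make the task weighting in Eq. (10) concrete, the sketch below sums the uncertainty-weighted loss over the BS-specific models. It is a minimal illustration under assumptions: the per-coordinate losses are taken to be squared errors, all models are evaluated on the same batch of UE positions, and the name `mtl_nll_loss` and the array shapes are hypothetical.

```python
import numpy as np

def mtl_nll_loss(pos_true, pos_pred, sigma_pred):
    """Eq. (10)-style loss summed over N_B tasks (one task per BS).

    pos_true: (batch, 2) ground-truth positions shared by all tasks.
    pos_pred, sigma_pred: (n_bs, batch, 2) position estimates and
    positive aleatoric standard deviations of each BS model.
    Returns the total loss and the per-task losses.
    """
    pos_true = np.asarray(pos_true, dtype=float)
    pos_pred = np.asarray(pos_pred, dtype=float)
    sigma_pred = np.asarray(sigma_pred, dtype=float)
    # Assumption: L_{x,n} and L_{y,n} are squared errors per coordinate.
    sq_err = (pos_pred - pos_true[None, :, :]) ** 2
    weighted = (sq_err / (2.0 * sigma_pred ** 2)).sum(axis=2)
    log_term = np.log(sigma_pred.prod(axis=2))
    per_task = (weighted + log_term).mean(axis=1)  # shape (n_bs,)
    return float(per_task.sum()), per_task
```

A BS whose model reports a larger standard deviation has its squared error scaled down, so its task contributes less to the gradient of the shared parameters.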
D. Late Fusion With Uncertainty Estimation
The assumption that the outputs follow a Gaussian distribution, for which we have estimates of the mean value and variance, can be leveraged during the data fusion process. The \begin{equation*} \sigma _{x}^{2}= \frac {1}{\sum _{n=1}^{N_{B}}1/\sigma _{x, n}^{2}},\;\;\; \sigma _{y}^{2} =\frac {1}{ \sum _{n=1}^{N_{B}}1/\sigma _{y, n}^{2}} \tag {11}\end{equation*}
\begin{align*} \tilde {x}_{\text {ML}} & = \frac {\sum _{n=1}^{N_{B}}{\tilde {x}_{n}}/{\sigma _{x, n}^{2}}}{1/\sigma _{x}^{2}}, \\ \tilde {y}_{\text {ML}} & = \frac {\sum _{n=1}^{N_{B}}{\tilde {y}_{n}}/{\sigma _{y, n}^{2}}}{1/\sigma _{y}^{2}} \tag {12}\end{align*}
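Eqs. (11) and (12) amount to an inverse-variance weighted average of the per-BS estimates. A minimal sketch for one coordinate is given below; the function name `fuse_estimates` is hypothetical, and the variances are assumed to be strictly positive.

```python
import numpy as np

def fuse_estimates(estimates, variances):
    """Fuse per-BS estimates of one coordinate as in Eqs. (11)-(12).

    estimates: N_B estimates of the coordinate, one per BS.
    variances: N_B strictly positive variances, one per BS.
    Returns the fused estimate and the fused variance.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precision = 1.0 / variances
    fused_var = 1.0 / precision.sum()                   # Eq. (11)
    fused = fused_var * (estimates * precision).sum()   # Eq. (12)
    return float(fused), float(fused_var)
```

For instance, estimates 0 and 4 with variances 1 and 3 yield a fused estimate of 1 and a fused variance of 0.75: the result is pulled toward the more certain estimate, and the fused variance is smaller than either individual variance.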
E. Quality of Uncertainty Estimation
After presenting different ways to calculate the uncertainty for each estimate, we now discuss how to assess the quality of these uncertainty estimates. As shown in the previous section, instead of providing a single position estimate, each model provides a different probability distribution for each individual input, for which the variance corresponds to the uncertainty. Normally to assess whether the output of the model indeed conforms to a probability distribution we would repeatedly produce samples from the model for a single input, calculate the empirical mean and variance and determine how close they are to the estimated model’s distribution. However, in the context of localization using CSI fingerprints we have at most a couple of CSI samples for a given location, therefore any empirical calculation would be unreliable. Instead we use a method to determine the reliability of the uncertainty estimation process which is called the area under sparsification error curve (AUSE).
1) Area Under Sparsification Error Curve
This metric builds on so-called sparsification plots and on the sparsification error [39]. Before defining the sparsification error, we first need to define the oracle error. We define an array of the errors of each position sample as:\begin{align*} e = \left [{{||\tilde {x}_{0} - x_{0}||_{2}, ||\tilde {x}_{1} - x_{1}||_{2}, \ldots, ||\tilde {x}_{N_{\text {test}}-1} - x_{N_{\text {test}}-1} ||_{2}}}\right ] \tag {13}\end{align*}
\begin{equation*} \text { O}_{N} = \sqrt {\frac {\sum e'_{N:N_{\text {test}}-1}}{N_{\text {test}}-N}}, \tag {14}\end{equation*}
To define the sparsification error, we first define an uncertainty vector \begin{equation*} S_{N} = \sqrt {\frac {\sum e^{S}_{N:N_{\text {test}}-1}}{N_{\text {test}}-N}}, \tag {15}\end{equation*}
\begin{equation*} \text { AUSE} = \frac {\sum _{N=1}^{N_{\text {test}}-1}(S_{N}-O_{N})}{N_{\text {test}}-1}. \tag {16}\end{equation*}
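The AUSE computation of Eqs. (13) to (16) can be sketched as follows. The sketch assumes that the oracle ordering sorts the samples by decreasing true error and that the sparsification ordering sorts them by decreasing estimated uncertainty, so that in both cases the first N samples are the ones removed; the function names are hypothetical.

```python
import numpy as np

def sparsification_curve(errors, order):
    """Curve of Eq. (14)/(15): drop the first N samples of `order`."""
    e = np.asarray(errors, dtype=float)[order]
    n_test = e.size
    return np.array(
        [np.sqrt(e[n:].sum() / (n_test - n)) for n in range(1, n_test)]
    )

def ause(errors, uncertainties):
    """Area under the sparsification error curve, Eq. (16)."""
    errors = np.asarray(errors, dtype=float)
    uncertainties = np.asarray(uncertainties, dtype=float)
    # Assumption: both orderings are descending.
    oracle_order = np.argsort(-errors, kind="stable")
    spars_order = np.argsort(-uncertainties, kind="stable")
    oracle = sparsification_curve(errors, oracle_order)
    spars = sparsification_curve(errors, spars_order)
    return float(np.mean(spars - oracle))
```

When the uncertainty ranks the samples exactly as the true error does, both curves coincide and the AUSE is zero; any mismatch in the ranking yields a positive value.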
2) Integrity Risk
We additionally use the integrity risk (IR) metric which was recently proposed by 3GPP as a key performance indicator for positioning integrity [40]. Normally, if the uncertainty is high, the system should give a warning that the respective error is also high. The integrity risk is defined as the probability that the unknown positioning error exceeds an application specific alert limit (AL) without warning. The available information from each user is the position and the uncertainty estimate, therefore we define an indicator function \begin{align*} \mathbb {1}_{\text {AL}}[||\boldsymbol {\sigma }_{i}||_{2}] = \begin{cases} \displaystyle 1 & \text {for}~ ||\boldsymbol {\sigma }_{i}||_{2} \leq \gamma \\ \displaystyle 0 & \text {for}~ ||\boldsymbol {\sigma }_{i}||_{2} \gt \gamma \end{cases}. \tag {17}\end{align*}
\begin{equation*} \text { IR} = \frac {\sum _{\{i | \mathbb {1}_{\text {AL}}(||\boldsymbol {\sigma }_{i}||_{2})\}}\mathbb {1}_{\text {AL}}[e_{i}]}{N_{\text {test}}}, \tag {18}\end{equation*}
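The verbal definition of the integrity risk can be sketched as an empirical fraction over the test set. This sketch follows the prose definition given above and is not a literal transcription of Eq. (18): it assumes that a warning is raised whenever the norm of the uncertainty exceeds the threshold of Eq. (17), and it treats the alert limit and that threshold as two separate, hypothetical parameters.

```python
import numpy as np

def integrity_risk(errors, sigma_norms, alert_limit, gamma):
    """Fraction of test samples whose error exceeds the alert limit
    although no warning was raised.

    errors: positioning errors of the test samples.
    sigma_norms: norms of the estimated uncertainty per sample.
    alert_limit: application-specific alert limit (AL).
    gamma: uncertainty threshold; no warning if sigma_norm <= gamma.
    """
    errors = np.asarray(errors, dtype=float)
    sigma_norms = np.asarray(sigma_norms, dtype=float)
    no_warning = sigma_norms <= gamma      # indicator of Eq. (17) is 1
    exceeds = errors > alert_limit         # error above the alert limit
    return float(np.count_nonzero(no_warning & exceeds) / errors.size)
```

With this definition, a sample with a large error only counts toward the IR if the model reported a low uncertainty for it.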
F. Database Description
To evaluate our proposals we use the Dichasus channel measurements described in [49], that were collected at four antenna arrays distributed on the corners of an industrial area shown in Fig. 4. Each of the antenna arrays consists of a
In a real deployment of a positioning system using fingerprints, the CSI fingerprints are influenced by variations in the environment such as movement of objects or people throughout the area of interest. To model a change in the environment for a UE at a given position, i.e., between the training and deployment phase, we consider the attenuation of the strongest path from the UE to a BS, i.e., due to blocking by a nearby person or object. Please note that the strongest path to a BS may correspond to a NLOS path, as some areas do not have a LOS to a BS.
Simulation Setup
A. Dynamic Scenario
We assume that the wireless signal propagates along a number of different paths to the BSs. Each path is associated with a complex gain, time-of-arrival and an angle-of-arrival. The totality of the paths and their parameters can fully describe the channel between a UE and a BS. In order to model the attenuation of the strongest path we first transform the channel from the antenna-subcarrier domain to the angle-delay domain by means of the discrete Fourier transform (DFT) as described in [51].
After transforming the matrix to the angle-delay domain, we identify the strongest path as the largest element of the matrix and attenuate it by 20 dB, which corresponds to the attenuation caused by a human body at a similar carrier frequency [43]. In a real system such as the one considered, the power of each path leaks into the neighboring elements, although the matrix remains mostly sparse. To take the power leakage into account, we attenuate by the same amount a grid of
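The attenuation procedure can be sketched as follows. This is a sketch under assumptions, not the exact transform of [51]: the angle domain is obtained with a DFT over the antenna axis and the delay domain with an inverse DFT over the subcarrier axis, and the size of the attenuated neighborhood is a hypothetical parameter `half_width`, since the grid size is not specified here.

```python
import numpy as np

def attenuate_strongest_path(H, atten_db=20.0, half_width=1):
    """Attenuate the strongest angle-delay element and its neighbors.

    H: complex CSI of shape (n_antennas, n_subcarriers).
    atten_db: power attenuation in dB applied to the selected grid.
    half_width: hypothetical half size of the grid around the peak.
    """
    H = np.asarray(H, dtype=complex)
    # Antenna axis -> angle (DFT), subcarrier axis -> delay (inverse DFT).
    G = np.fft.ifft(np.fft.fft(H, axis=0), axis=1)
    a0, d0 = np.unravel_index(np.argmax(np.abs(G)), G.shape)
    # A power loss of atten_db corresponds to this amplitude factor.
    scale = 10.0 ** (-atten_db / 20.0)
    # Neighborhood indices wrap around because the DFT is circular.
    rows = np.arange(a0 - half_width, a0 + half_width + 1) % G.shape[0]
    cols = np.arange(d0 - half_width, d0 + half_width + 1) % G.shape[1]
    G[np.ix_(rows, cols)] *= scale
    # Transform back to the antenna-subcarrier domain.
    return np.fft.ifft(np.fft.fft(G, axis=1), axis=0)
```

Applying the function to the test fingerprints only, and leaving the training fingerprints untouched, would emulate a change of the environment between training and deployment.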
Consequently, we consider scenarios with and without the above-mentioned human body attenuation. When no attenuation is present, we assume that there is no change in the environment between the training and deployment phases, i.e., the environment is static, as shown in Fig. 5. Additionally, we consider scenarios where attenuation affects the signal between the UE and each of the BSs, but only in the deployment phase. This implies a change in the environment between the training and the deployment phases, which we define as a dynamic scenario. We consider four different cases for the dynamic scenario, with a path attenuation from the UE to one, two, three, or all BSs. In the dynamic scenario example in Fig. 5, the strongest paths of BSs C and D are attenuated.
Static and dynamic scenarios example (strongest paths of 2 BSs attenuated). The UE is depicted as a red dot inside the area of interest and the arrows indicate the strongest paths between the UE and the BSs.
Lastly, we compare all the different fusion approaches. An overview of all the considered approaches is shown in Fig. 6, covering different types of training loss, i.e., MSE or NLL loss, and different fusion methods, i.e., early or late fusion. For late fusion specifically, we consider different types of model training. First, we consider STL late fusion, which is the method described in [21], where each BS corresponds to a single DL-model and each model is trained only on data from that BS. We also consider MTL late fusion, which assumes that the models of the BSs share the parameters of their initial layers and are trained using the MTL scheme described in Section II-C. Finally, we consider different ways of combining the estimates from the multiple models, namely averaging, MCD or SP. With early fusion only one model is trained, with either MSE or NLL loss.
B. Neural Network Configuration
The considered neural network is shown in Fig. 7. In the MTL late fusion we consider that the models across BSs share the parameters of the first four blocks. Both early and late fusion use the same overall structure but with different input size, depending on the considered fingerprint.
The basis of the considered neural network is the convolutional layer as it has shown promising results for positioning using CSI fingerprints [8], [11], [13], [14], [16], [21], [37]. In general, the convolutional layer is followed by a pooling layer whose purpose is to downsample its input but recently [52] it has been shown that using strided convolution instead of a pooling layer may improve the model’s performance. Therefore, for the DL-model in this work we only use strided convolution and no pooling layers. A strided convolution can be thought of as a learned pooling layer, where the input is downsampled but the method of downsampling is learned during training [42, chapter 9.5].
Additionally, to further reduce information loss during downsampling, we implement the method of pooling blocks introduced in [53] which was also used for CSI based positioning in [41]. In a pooling block, a convolutional layer doubles the number of learned convolutional filters before downsampling and then the spatial size is reduced by a strided convolution. A final convolutional layer is used to reduce again the number of learned convolutional filters to the original size. In this way, it is expected that the pooling block will learn to transfer the important information from the spatial dimension to the convolutional filters and preserve it.
Lastly, in order to avoid problems with vanishing gradients, we employ skip connections [54]. By combining all the aforementioned methods we create a pooling block, which is shown in Fig. 8. Two pooling blocks are placed one after the other and are followed by 3 dense layers with 128 neurons each and finally by a dense layer with 2 neurons that outputs the estimated 2-dimensional position. When considering the aleatoric uncertainty, the last dense layer has 4 neurons, which correspond to the 2-dimensional position plus the aleatoric uncertainty of the x and y position coordinates, learned with the NLL loss. Each convolutional layer has 32 filters with a
Depending on the method used, i.e., early fusion, STL late fusion or MTL late fusion, the total number of model parameters is different. For the early fusion method, the input’s dimension is
Simulation Results
We test the different proposed schemes using the Dichasus database [49] in the deployment area shown in Fig. 4 by incrementally increasing the number of training samples
A. Static Scenario
Initially, we compare the different results for a static environment, i.e., when there is no change between the training and deployment phases. Before comparing the different fusion approaches listed in Table 1, we first show the gain obtained by applying MTL training to the late fusion approach. For both the STL and MTL late fusion approaches, one model is trained at each BS by using the MSE loss in Eq. (2) as the objective function. However, for STL late fusion the model at each BS is trained only with data from that BS, while for MTL late fusion the first part of the model at each BS is trained with data from all 4 BSs. In the following, when mentioning MTL late fusion, we refer to joint training of the first common part of the models. Fig. 9 shows the performance of the models at each BS with STL and MTL. The gain of joint training can clearly be seen across the models at each BS. By training the models jointly, their common part incorporates information from multiple BSs, effectively increasing its training set size. Thus, for a late fusion scheme, jointly training the models at each BS (using MTL), instead of separately (STL), leads to a decreased ME at each BS.
Comparison of ME of each BS in a static scenario when using STL or MTL scheme and MSE loss.
Next, we compare the different fusion schemes with respect to the ME on the test set. Specifically, for both STL and MTL late fusion we consider three different methods of combining the estimates of the multiple models, as described in Sec. II-D. The first combining method is averaging, where a simple average of all the model outputs is performed to calculate the overall estimate. The second is MCD-based combining, where the variance of each estimate is estimated using the MCD method shown in Eq. (4). Lastly, we also consider SP-based combining, where the variance of the different estimates, shown in Eq. (7), is modified by taking into account spurious measurements. The fused estimate for both MCD and SP is calculated using Eq. (12). From Fig. 10 we can observe that early fusion outperforms all late fusion methods in a static environment, i.e., when there are no changes in the environment between the training and the deployment phase. This is in contrast to the result from [21], where it was shown that late fusion outperformed early fusion in the static case. The difference between the conclusions of [21] and the results shown in Fig. 10 can be explained by the different environments considered. Whereas in [21] there is always a LOS between the UE and each BS, in this work the link between the UE and each BS can be either LOS or NLOS.
Comparison of ME for different fusion methods in a static scenario when training using STL or MTL scheme and MSE loss.
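To make the difference between averaging and uncertainty-based combining concrete, the sketch below contrasts a simple average with an inverse-variance weighted average of per-BS estimates. Inverse-variance weighting is an assumed combining rule used here only for illustration; it is not claimed to reproduce Eq. (12), and the numbers are hypothetical.

```python
import numpy as np

def fuse_average(estimates):
    """Simple average of per-BS position estimates, shape (num_bs, 2)."""
    return np.mean(np.asarray(estimates, dtype=float), axis=0)

def fuse_inverse_variance(estimates, variances):
    """Weight each per-BS estimate by the inverse of its variance.

    estimates, variances: arrays of shape (num_bs, 2) holding the x/y
    estimate and its variance for each BS. This weighting rule is an
    assumption made for illustration.
    """
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    return np.sum(weights * estimates, axis=0) / np.sum(weights, axis=0)

# Hypothetical estimates from 4 BSs. The last BS is far off but reports a
# large variance, so it barely affects the weighted result.
est = np.array([[1.0, 2.0], [1.2, 2.1], [0.9, 1.9], [5.0, 6.0]])
var = np.array([[0.1, 0.1], [0.1, 0.1], [0.1, 0.1], [10.0, 10.0]])
print(fuse_average(est))
print(fuse_inverse_variance(est, var))
```

In this toy case the plain average is pulled toward the outlying estimate, whereas the weighted estimate stays close to the three consistent ones, provided the outlier reports a correspondingly large variance.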
We now consider the performance of the different fusion approaches when the models are trained using the NLL loss function shown in Eq. (3). Fig. 11 shows the comparison of the models at each BS with separate and joint learning when using the NLL loss function. Similar to the MSE loss function case, the performance of each model improves when the models are trained jointly in an MTL scheme. As explained in [35], when using an MTL scheme, which enables joint late fusion, the aleatoric uncertainty is implicitly used as a learned weighting parameter between the losses corresponding to each task, i.e., the positioning at each BS, which can increase the performance for each task.
Comparison of ME of each BS in a static scenario when using STL or MTL scheme and NLL loss.
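The role of the aleatoric uncertainty in the NLL loss can be illustrated with a short sketch. The form below is the commonly used heteroscedastic Gaussian NLL with a predicted log-variance per coordinate; it is an assumption made for illustration and may differ in detail from Eq. (3).

```python
import numpy as np

def gaussian_nll(pos_true, pos_pred, log_var):
    """Heteroscedastic Gaussian NLL (up to a constant), averaged over samples.

    pos_true, pos_pred, log_var: arrays of shape (num_samples, 2) for the
    x and y coordinates. log_var is the predicted log-variance, i.e. the
    assumed parameterization of the aleatoric uncertainty.
    """
    pos_true = np.asarray(pos_true, dtype=float)
    pos_pred = np.asarray(pos_pred, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    sq_err = (pos_true - pos_pred) ** 2
    # Squared error is scaled down where the predicted variance is high;
    # the log-variance term penalizes predicting a high variance everywhere.
    per_coord = 0.5 * np.exp(-log_var) * sq_err + 0.5 * log_var
    return float(np.mean(np.sum(per_coord, axis=1)))

# Hypothetical sample with a 3 m error in x and no error in y.
true = np.array([[0.0, 0.0]])
pred = np.array([[3.0, 0.0]])
print(gaussian_nll(true, pred, np.zeros((1, 2))))                 # unit variance
print(gaussian_nll(true, pred, np.array([[np.log(9.0), 0.0]])))   # variance 9 in x
```

For a sample with a large error, predicting a larger variance lowers the loss, so such samples contribute less to training. This is the mechanism by which the loss emphasizes low-uncertainty cases.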
We further compare the STL and MTL late fusion methods and the early fusion method when training the models using the NLL loss. The results are shown in Fig. 12. Similar to the results in Fig. 10, where the MSE loss was used for training, jointly training the models significantly improves the performance compared to training each model separately. This effect is particularly strong for a small number of training samples. We also see that, although early fusion still outperforms the late fusion approaches for a small number of training samples, this is no longer the case as the number of training samples increases, i.e., the late fusion methods perform similarly to or better than the early fusion method. As when training with the MSE loss, late fusion with averaging is the worst performing option.
Comparison of ME for different fusion methods in a static scenario when using STL or MTL scheme and NLL loss.
Additionally, we compare the different fusion methods when training the models using the MSE loss in Eq. (2) or the NLL loss in Eq. (3). We see in Fig. 13 that every late fusion scheme benefits from training the models with the NLL loss function, regardless of the number of training samples. Essentially, the inclusion of the aleatoric uncertainty improves the MTL late fusion schemes with MCD or SP-based combining, such that they come close to the performance of the early fusion scheme, given an adequate number of training samples. For the early fusion scheme we see the opposite effect, namely that the inclusion of the aleatoric uncertainty during training impairs the performance, albeit slightly. This different behavior between late and early fusion may stem from the fact that, in early fusion, each measurement always includes some BSs that have a LOS to the UE. This limits the usefulness of the aleatoric uncertainty, since in a LOS measurement the uncertainty is low anyway. For late fusion this is not the case, since every BS may experience NLOS conditions, which have high aleatoric uncertainty and make positioning more challenging than in the LOS case. As explained in Sec. II-B.1, the model prioritizes cases where the aleatoric uncertainty is low, i.e., LOS cases.
Comparison of ME for different fusion methods in a static scenario when training jointly with MSE or aleatoric loss.
Furthermore, we compare the quality of the uncertainty estimation of the different fusion methods in Table 2. During our evaluations, we noticed that the AUSE value remains mostly constant over the number of training samples, and therefore we show the average over the training samples in Table 2. The averaging late fusion method is not included in the table, as it does not provide uncertainty information. A lower AUSE value means that the ordering of the measurements by positioning error corresponds more closely to their ordering by uncertainty, making the uncertainty a good indicator of the actual positioning error. We see from the table that, for every fusion method, training with the NLL loss function improves the quality of the uncertainty estimates according to AUSE. This makes sense, since there is no aleatoric uncertainty information when training with the MSE loss function, and only the epistemic uncertainty is used instead.
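The idea behind AUSE can be sketched as follows. Samples are removed in order of decreasing uncertainty and the mean error of the remaining samples is tracked; the same is done with an oracle that removes samples in order of decreasing true error, and AUSE is the area between the two curves. The number of removal steps, the trapezoidal integration, and the absence of normalization are assumptions of this sketch and may differ from the exact evaluation used for Table 2.

```python
import numpy as np

def sparsification_curve(errors, ranking, fractions):
    """Mean error of the samples kept after removing the top-ranked fraction."""
    order = np.argsort(-ranking)      # highest ranking value is removed first
    sorted_err = errors[order]
    n = len(errors)
    return np.array([sorted_err[int(np.floor(f * n)):].mean() for f in fractions])

def ause(errors, uncertainties, num_steps=20):
    """Area between the uncertainty-based and the oracle sparsification curves."""
    errors = np.asarray(errors, dtype=float)
    uncertainties = np.asarray(uncertainties, dtype=float)
    # Removal fractions stop below 1 so that at least one sample remains.
    fractions = np.linspace(0.0, 1.0, num_steps, endpoint=False)
    by_uncertainty = sparsification_curve(errors, uncertainties, fractions)
    oracle = sparsification_curve(errors, errors, fractions)
    gap = by_uncertainty - oracle
    # Trapezoidal rule over the removal fractions.
    return float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(fractions)))

# Hypothetical positioning errors in meters.
errors = np.array([0.1, 0.5, 0.2, 2.0, 1.0])
print(ause(errors, errors))    # uncertainty ranks samples exactly like the error
print(ause(errors, -errors))   # uncertainty ranks samples in the opposite order
```

An uncertainty that orders the samples exactly like the true error gives a value of zero, and any other ordering gives a positive value, which matches the interpretation that lower AUSE is better.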
B. Dynamic Scenario
Next, we explore the positioning in a dynamic scenario, where the channel between the UE and one or more BSs experiences a change between the training and deployment phases, i.e., a 20 dB attenuation of the strongest path as described in Sec. III-A. In the following, we refer to this attenuation of the strongest path to a given BS as a change. The effect of this change on the uncertainty of the estimates can be seen in Fig. 15, which depicts the aleatoric uncertainty at BS A when the strongest path to BS A is attenuated, compared to the aleatoric uncertainty without attenuation shown in Fig. 14. While the uncertainty in the static case remains fairly low throughout the area, i.e., around 0.25, when there is a change the uncertainty can be up to 8 times larger. We see that the most affected regions are those where there is a LOS path to the BS. This path carries most of the energy of the CSI, so attenuating it strongly alters the CSI. In the NLOS region we see that the uncertainty remains low, since the position information is spread over multiple paths, i.e., no single path contains most of the energy in the CSI fingerprint.
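A minimal sketch of such a change is given below, assuming that it is applied to a channel impulse response in the delay domain and that each tap is treated as one path. Both are simplifications made for illustration; the exact procedure is the one described in Sec. III-A. A 20 dB attenuation corresponds to scaling the amplitude of the strongest tap by a factor of 0.1.

```python
import numpy as np

def attenuate_strongest_path(cir, attenuation_db=20.0):
    """Return a copy of the impulse response with its strongest tap attenuated.

    cir: complex array of shape (num_taps,). Treating each tap as a path is
    an illustrative simplification.
    """
    out = np.array(cir, dtype=complex)            # copy, input is not modified
    strongest = int(np.argmax(np.abs(out)))
    # Attenuation in dB refers to power, so the amplitude scales by 10^(-dB/20).
    out[strongest] *= 10.0 ** (-attenuation_db / 20.0)
    return out

# Hypothetical 3-tap impulse response with a dominant second tap.
cir = np.array([0.1 + 0.0j, 1.0 + 0.0j, 0.0 + 0.3j])
print(attenuate_strongest_path(cir))
```

In this toy example the dominant tap drops from an amplitude of 1.0 to 0.1, so the weaker taps carry most of the remaining energy, which is why LOS-dominated fingerprints change the most.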
Fig. 16 depicts the mean error of the different fusion schemes when the channel between the UE and BS A experiences the change described above in the deployment phase. Compared to the static case (see Fig. 13), we observe a severe degradation in performance for late fusion with averaging, as well as a large degradation for early fusion. As expected, the late fusion schemes with MCD and SP combining perform much better than the other fusion methods. Specifically, the SP-based late fusion method outperforms all others, since it most reliably disregards the spurious measurements caused by the change in the channel to BS A.
Comparison of ME for different fusion methods in a dynamic scenario when using MTL with MSE or NLL loss.
Furthermore, we now consider a change in the channels between the UE and multiple BSs. In Fig. 17, we depict the performance of early fusion and SP-based MTL late fusion when training with the MSE and NLL loss functions, as a function of the number of BSs whose strongest path is attenuated, considering 40 000 training samples. The solid lines show the error over all positions in the test set. As the uncertainty varies across positions, we also consider the performance at the positions where the uncertainty is below a threshold. Using logistic regression in the static scenario, we determine for each method an uncertainty threshold above which the positioning error exceeds 1 m. Then, for each method, we exclude the measurements whose uncertainty exceeds this method-specific threshold, and we see that the error decreases, as depicted by the dashed lines in Fig. 17. The difference is more pronounced for a larger number of BSs with a change; in those cases, more measurements exceed the uncertainty threshold and are therefore excluded. Interestingly, even though the difference between using the MSE loss and the NLL loss is relatively large for a small number of blocked BSs, with NLL loss training performing better, the difference becomes smaller when more BSs are blocked. The reason is that when more BSs experience a change, the epistemic uncertainty dominates, since the measurements differ more from the static scenario.
Comparison of the error of different fusion methods for different numbers of blocked BSs, considering 40 000 training samples.
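The threshold selection described above can be sketched as a one-dimensional logistic regression that predicts whether the positioning error exceeds 1 m from a scalar uncertainty, with the threshold placed where the predicted probability equals 0.5. The plain gradient-descent fit, the learning rate, the 0.5 decision level, and the data are assumptions of this sketch, not details taken from this work.

```python
import numpy as np

def fit_uncertainty_threshold(uncertainty, error, error_limit=1.0,
                              lr=0.1, num_iters=5000):
    """Fit P(error > error_limit | uncertainty) and return the 0.5 boundary."""
    u = np.asarray(uncertainty, dtype=float)
    y = (np.asarray(error, dtype=float) > error_limit).astype(float)
    mu, sigma = u.mean(), u.std()
    z = (u - mu) / sigma                  # standardize for a stable fit
    w, b = 0.0, 0.0
    for _ in range(num_iters):
        p = 1.0 / (1.0 + np.exp(-(w * z + b)))
        w -= lr * np.mean((p - y) * z)    # gradient of the mean log loss w.r.t. w
        b -= lr * np.mean(p - y)          # gradient of the mean log loss w.r.t. b
    # p = 0.5 where w * z + b = 0; map the boundary back to the original scale.
    return float(mu - sigma * b / w)

# Hypothetical static-scenario data: scalar uncertainty and error in meters.
uncertainty = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
error = np.array([0.2, 0.3, 0.4, 0.5, 1.5, 1.2, 2.0, 1.8])
threshold = fit_uncertainty_threshold(uncertainty, error)
keep = uncertainty <= threshold           # discard estimates above the threshold
print(threshold, error.mean(), error[keep].mean())
```

Discarding the estimates above the fitted threshold lowers the mean error of the remaining ones in this toy example, at the cost of providing no estimate for the excluded measurements.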
Next, we investigate the reliability of the uncertainty estimates in a dynamic scenario with respect to the AUSE, as shown in Table 3. As in the static case, we provide the average over all training set sizes, since we again noticed that the AUSE value remains mostly constant over the number of training samples. First, we see that for almost every late fusion approach, combining NLL loss training with MTL results in the most reliable uncertainty estimates.
Lastly, we depict in Fig. 18 the integrity risk for 40 000 training samples considering a channel change to one or more BSs. The integrity risk is described in equation (18) and we consider
We note that here we provide only a simple method to calculate the uncertainty threshold, based on the l2-norm of the uncertainty vector, and provide an IR value to show the effectiveness of the uncertainty estimation. Better methods to calculate the threshold could be developed by using both elements of the uncertainty vector and by using other classification methods such as support vector machines (SVMs). Moreover, the considered metric does not indicate how many estimates exceed the uncertainty threshold for each method, which may need to be considered in some use cases.
Conclusion
In this paper, we examined different fusion methods for positioning using deep learning and CSI fingerprints from multiple BSs. For early fusion, only one model is used for estimating the UE’s position based on the CSI fingerprint across multiple base stations. For late fusion, one model per BS is employed and the overall UE’s position is determined by combining the outputs of the models across the BSs. The performance of the trained models was evaluated in a static scenario, where the channel remains the same between the training phase and the deployment phase, as well as in a dynamic scenario, where the channel between the UE and one or more BSs experiences a change (attenuation of the strongest path) in the deployment phase. While early fusion schemes may normally perform better in static scenarios, changes in the environment lead to decreased positioning performance with early fusion, as the model is not able to adapt in a dynamic scenario. On the other hand, our results indicate that late fusion approaches are more robust to changes in the environment, which is an important aspect to be addressed for AI-based localization with CSI fingerprints in real deployments. Among the different late fusion approaches considered, we have shown the advantage of multi-task learning, in which shared parameters of the models are jointly trained across the base stations, so that the common part of the models benefits from a larger number of training samples.
For the late fusion approaches, different methods for combining the positioning estimates from the BS models have also been investigated. In particular, we have considered simple averaging as well as combining based on uncertainty estimation, namely MCD and SP, where the outputs of the different models are weighted based on the learned aleatoric uncertainty. We show that fusing the multiple estimates based on their uncertainty not only improves the positioning accuracy in both the static and the dynamic scenario but also ultimately gives more reliable uncertainty estimates. The reliability of the uncertainty estimates is determined in terms of the AUSE, which considers whether the uncertainty corresponds to the real positioning error, and in terms of the IR, which demonstrates a model’s ability to discard unreliable estimates. Additionally, we consider that some of the estimates may be spurious, i.e., they falsely indicate low uncertainty despite a large actual positioning error, and we employ a technique to identify and disregard such estimates.
Overall, we show that the late fusion scheme with multi-task learning and uncertainty estimation is the most accurate and reliable in the considered scenarios. This also holds for the dynamic scenario, which is one of the main challenges limiting the deployment of AI-based localization with CSI fingerprints.