I. Introduction
Multiple-input multiple-output (MIMO) communication with large arrays and large bandwidths, such as those used in millimeter wave (mmWave) bands, enables high data rates for wireless links while improving angle and delay resolvability during channel parameter estimation. Moreover, the sparsity of these channels enhances the sensing performance of the communication waveform [2]. Despite these advantages, state-of-the-art mmWave localization solutions still struggle to achieve the accuracy required by use cases that depend on precise position information. Current approaches fall into three groups: ultra-dense deployments for beam-based coordinated measurements from multiple base stations (BSs) [3], [4], [5]; deep networks that exploit power delay profiles or other channel parameters [6], [7], [8], [9]; and two-stage methods, in which a sparse channel estimation stage is followed by a stage that maps the estimated channel parameters to the position and orientation of the user by exploiting geometric relationships [10], [11], [12], [13], [14], [15], [16], [17].