SoundSpring: Loss-Resilient Audio Transceiver With Dual-Functional Masked Language Modeling


Abstract:

In this paper, we propose “SoundSpring”, a cutting-edge error-resilient audio transceiver that combines the robustness benefits of joint source-channel coding (JSCC) with compatibility with current digital communication systems. Unlike recent deep JSCC transceivers, which learn to directly map audio signals to analog channel-input symbols via neural networks, SoundSpring adopts a layered architecture that separates audio compression from digital coded transmission, while fully exploiting the in-context predictive capabilities of large language (foundation) models. Integrated with a causal-order mask learning strategy, our single model operates in the latent feature domain and serves a dual function: as an efficient audio compressor at the transmitter and as an effective packet loss concealment mechanism at the receiver. By jointly optimizing for both audio compression efficiency and transmission error resiliency, we show that mask-learned language models are indeed powerful contextual predictors, and that our dual-functional compression and concealment framework offers fresh perspectives on the application of foundation language models in audio communication. Through extensive experimental evaluations, we establish that SoundSpring outperforms contemporary audio transmission systems in terms of signal fidelity metrics and perceptual quality scores. These findings not only advocate for the practical deployment of SoundSpring in learning-based audio communication systems but also inspire the development of future audio semantic transceivers.
Published in: IEEE Journal on Selected Areas in Communications (Volume: 43, Issue: 4, April 2025)
Page(s): 1308 - 1322
Date of Publication: 20 January 2025

I. Introduction

The emergence of next-generation wireless networks, particularly 6G, heralds a transformative era in connectivity and a wide array of new applications. Benefiting from data-oriented signal processing techniques, future wireless networks are expected not only to pursue accurate communication at the bit level but also to offer new functionalities such as semantic-aware intelligent tasks. Joint signal processing at the transceiver is expected to improve the end-to-end system gain by sensing and exploiting the intrinsic nature of source signals. Among these scenarios, audio communication remains indispensable. The traditional transceiver design for audio communication follows a divide-and-conquer paradigm: audio codecs compress the audio [1], [2], [3], while transmission robustness is ensured by channel coding and other error control techniques. Rate control works in cooperation with the codec to strategically allocate bits across and within audio frames, thereby optimizing the overall communication efficiency. Channel coding is designed to achieve a low average error rate. In practice, however, residual bit errors often prove uncorrectable and cause packet loss, in which case the general solution is to request retransmission of the lost packets. Retransmission, however, is suitable only for scenarios with short round-trip times (RTTs). In most real-time communication (RTC) applications, e.g., FaceTime and WeChat, audio frames are expected to be played as soon as they are decoded, so erroneous audio frames may not be recoverable via retransmission in time. Resending lost audio packets adds to the overall delay, ultimately leading to a poor user quality of experience (QoE).
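The RTT argument above can be made concrete with a small timing check. The following is only an illustrative sketch, not part of the paper: the function name, its parameters, and the symmetric-path timeline are all assumptions introduced here. It asks whether a retransmitted packet could still arrive before the frame's playout deadline, measured from the moment the original packet was sent.

```python
def retransmission_arrives_in_time(one_way_delay_ms: float,
                                   rtt_ms: float,
                                   playout_deadline_ms: float,
                                   loss_detect_ms: float = 0.0) -> bool:
    """Return True if a retransmitted packet can still be played on time.

    Hypothetical timeline (all times relative to the original send):
      1. The original packet would have arrived after one_way_delay_ms.
      2. The receiver needs loss_detect_ms more to notice it is missing.
      3. The retransmission request and the resent packet together cost
         one full round trip, rtt_ms.
    """
    arrival_ms = one_way_delay_ms + loss_detect_ms + rtt_ms
    return arrival_ms <= playout_deadline_ms


# Short RTT: the resent packet arrives at 10 + 20 = 30 ms, within 100 ms.
print(retransmission_arrives_in_time(10, 20, 100))

# Long RTT: the resent packet arrives at 80 + 160 = 240 ms, past 100 ms.
print(retransmission_arrives_in_time(80, 160, 100))
```

Under these assumed numbers only the short-RTT case meets the deadline, which is the situation where retransmission stops being viable and receiver-side concealment becomes the fallback.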
