Introduction
The Internet of Things (IoT) interconnects everyday objects using information and communication technologies [1], [2]. As this technology advances, it transforms various industries by introducing innovative features and enhancing overall efficiency. Recently, domains such as sound, music computing, and audio data have begun to leverage the advantages of the IoT [3]. Nevertheless, the incorporation of IoT technology into these audio domains is still in its infancy. The IoT combined with speech sensing and processing can have a significant impact in various fields such as healthcare [4], consumer electronics [5], and smart environments [6]. In healthcare in particular, the combination of the IoT and audio technologies holds great promise, with the potential to revolutionise treatments and improve patient care.
The healthcare system, despite being foundational to human progress, continues to face multiple inherent challenges [15]. Firstly, it lacks a patient-centric approach that, for even minor ailments, entails patients making in-person visits to medical facilities, disrupting their daily routines and often sidelining preventative care or regular health checks. Secondly, the prevalent one-size-fits-all approach means treatments are not tailored to individuals, ignoring potential variations in medical history or genetic makeup. Thirdly, an accessibility disparity persists, transforming quality healthcare into a privilege rather than a universal right, accessible only to specific demographics based on socio-economic status, geographic location, or ethnicity. While other industries have rapidly evolved in a data-driven age, healthcare is yet to fully incorporate available technologies into care delivery [16], [17], [18]. This lag results in preventable errors, costing not just economically but also in terms of human lives.
In response to healthcare challenges, there has been a surge in research focused on leveraging technological advancements to improve healthcare outcomes [15], [19], [20]. These innovations promise to usher in an era of personalised, efficient, and universally accessible healthcare. Central to this article is the exploration of the Internet of Audio Things for Healthcare (IoAuT4H), a concept that intertwines IoT with audio technologies to form a network of devices capable of capturing, processing, and interpreting audio signals for healthcare applications [8]. IoAuT4H is a paradigm at the intersection of the Internet of Audio Things (IoAuT) and the Internet of Medical Things (IoMT). IoAuT4H's potential to make healthcare more patient-centric is particularly notable. It enables remote monitoring, where audio-based vital signs and health indicators can be measured to detect anomalies [21], reducing the need for frequent in-person visits. IoAuT4H's applications range from patient monitoring to diagnostics and therapeutic feedback, utilising auditory data like vocal patterns, ambient sounds, or medical equipment feedback to optimise patient care. Its real-time audio data analysis allows for treatment personalisation, adapting to a patient's current state. Moreover, IoAuT devices can significantly increase healthcare accessibility, especially in remote areas with limited traditional infrastructure.
A. Scope and Contributions of the Article
Several review articles [9], [10], [22] in the literature discuss the potential of the IoT in the healthcare industry. However, these existing surveys lack a specific emphasis on the IoAuT and fail to explore its potential healthcare applications. Turchet et al. provide a comprehensive review of the Internet of Audio Things (IoAuT) in [8] and the Internet of Sounds (a merger of IoAuT and the Internet of Musical Things) in [14], identifying potential challenges and implications. However, their work does not extend to the applications of IoAuT in the healthcare system [23]. IoAuT4H, a subfield of IoAuT, lies at the intersection of the IoAuT and the IoMT. Although our previous work [7] explores speech technology in healthcare, focusing on speech-based solutions, it does not extend to the broader audio spectrum. A comparative analysis is presented in Table 1, which identifies a fragmented landscape in IoAuT research, often focused on specific technologies or isolated applications. This article bridges previous gaps by exploring the potential of acoustic sensing through IoAuT4H, leveraging audio-enabled devices for a range of sound signals, including ambient noises and healthcare indicators, as shown in Fig. 1. We discuss the broader spectrum covered by IoAuT4H, encompassing IoSoT and IoSpT, and how this integration offers new healthcare possibilities, as depicted in Fig. 2.
IoAuT as IoT's central pillar, inclusive of the Internet of Sound Things (IoSoT) and the Internet of Speech Things (IoSpT), which is pioneering healthcare transformation through acoustic sensing.
Applications of IoAuT4H showcasing enhanced capabilities through the integration of general acoustic and speech processing technologies.
The main contributions of our article are as follows:
We examine IoAuT4H in depth, together with the hurdles confronting current healthcare systems.
We explore the transformative potential of integrating IoAuT4H with the existing healthcare system to improve patient care and well-being.
We elaborate on the complexities of the IoAuT4H, with an emphasis on its future prospects for seamless incorporation into the continually evolving landscape of the healthcare system.
B. Methodology
In this work, we conducted an in-depth literature survey, sourcing material from prominent databases such as IEEE Xplore, Google Scholar, Scopus, and Web of Science. Our search terms were chosen to encompass areas related to acoustic sensing, IoT in healthcare, and audio signal processing. In particular, we used search terms such as “acoustic sensing in medicine”, “Internet of Audio Things in healthcare”, “speech processing for healthcare”, “applications of acoustic sensors in the medical field”, “innovations in acoustic monitoring for health”, “IoAuT and healthcare technology”, “sound-based diagnostics in medicine”, “acoustic sensing technologies for patient monitoring”, “the role of audio in digital health innovations”, “advancements in audio sensing for medical applications”, and “integration of audio technologies in clinical settings”. To ensure a thorough exploration of the topic, we also scrutinised the bibliographies of key papers, uncovering additional resources that broadened our understanding of the application of acoustic technologies in medical settings. This multifaceted search approach enabled us to include both foundational studies and the latest research, thereby fostering a comprehensive discourse on the application and evolution of acoustic sensing in healthcare. We selected publications that were instrumental in advancing the field, focusing on research works published after 2015 to ensure the inclusion of the most recent advancements. Our review provides a holistic view of the current landscape and future potential of acoustic sensing in healthcare, offering valuable insights into how this emerging technology can revolutionise patient care and health monitoring.
C. Organization of the Article
The rest of the article is organized as follows. Section II provides a comprehensive background of the IoAuT4H, presenting its architecture along with commonly used audio sensors and their applications in IoAuT4H. Section III elaborates on the key challenges faced by the existing healthcare system. In Section IV, we delve into the opportunities offered by the IoAuT4H to elevate healthcare services, highlighting its potential to drive towards a patient-centric healthcare system. The challenges associated with the integration of the IoAuT with the healthcare system and the way forward to effectively address these challenges are discussed in Section V. Finally, we conclude the article in Section VI.
Background
This section outlines the fundamental aspects of the IoAuT4H, audio technology, and edge computing.
A. Internet of Audio Things for Healthcare
The convergence of cutting-edge research across the domains of the IoT, AI applied to audio technology, and human-computer interaction has given rise to the IoAuT paradigm [8]. IoAuT is part of the broader field of the IoSoT or Internet of Sounds (IoS). In IoS, musical and non-musical sound devices can interact with each other and other devices on the internet to facilitate sound-based services and applications [14]. Audio signals are also being explored as a communication channel to transmit internet data, a phenomenon called audio internet [24]. Recent works have also explored the sustainable and environment-friendly operation of audio devices using ambient energy [25], [26].
IoAuT4H is a paradigm at the intersection of IoAuT and IoMT. IoAuT4H is conceived with the overarching goal of facilitating the seamless processing and transmission of audio data among an array of devices, each possessing diverse sensing, computation, and communication capabilities. The IoAuT4H ecosystem is structured around the following three essential components:
Audio Things: Audio things represent a specialised subset of IoT devices and have the distinctive attributes commonly associated with IoT devices, such as the capacity to collect, analyse, transmit, and receive data [2]. In addition to these capabilities, audio things can generate and manipulate audio-related content, as well as perform sophisticated analysis of data stemming from audio events. Examples of audio things include acoustic sensors and devices designed to respond to user audio commands, such as Amazon Echo Dot, Amazon Echo Pop, and Google Nest Audio, as well as nodes integrated into wireless acoustic sensor networks [28]. Audio things can utilise noise-invariant feature pooling techniques to deal with noisy data in IoAuT [29]. Table 2 provides a consolidated list of audio sensors and their applications in the context of IoAuT4H.
Networking and Connectivity: The second pivotal facet of the IoAuT4H ecosystem is networking and connectivity, which enables audio things to establish connections among themselves and to access the broader internet, using wired or wireless communication methods. Networking in IoAuT plays an important role because of the real-time constraints of health applications, such as emergencies or a patient's critical condition. Furthermore, networking in IoAuT4H becomes more important because communication is two-way: from the user to the care provider and from the care provider to the user. This communication must be secure and follow standard communication protocols between wireless and wired devices for scalable deployments.
Applications and Services: Exploiting the interconnected audio things, a vast array of applications and services can be developed, harnessing the potential of IoAuT4H to enhance user experiences and enable innovative functionalities and solutions in the healthcare space. IoAuT4H can be used to monitor the health and activities of individuals and provide real-time assistance using conversational agents. Additionally, this technology can help overcome language barriers and geographical distances between clinicians and patients.
The typical architecture of the IoAuT4H is depicted in Fig. 3. The processing of audio signals can be done either on the embedded sensing devices, on the gateway device or on the cloud depending on the application or the service. The emergence of the IoAuT4H presents a paradigm shift in the utilisation of audio data across a spectrum of technological landscapes, promising to revolutionise the way we interact and leverage audio in our daily lives. This novel ecosystem holds great potential for enhancing audio-related applications, services, and user experiences. The functionality of audio things within the IoAuT4H hinges significantly on the progression of audio technology, which is pivotal in processing audio data and deriving valuable insights. In the subsequent discussion, we elucidate the significant strides made in audio technology, specifically focusing on the processing and utilisation of audio data.
Architecture of IoAuT4H with audio devices connected to the cloud server through gateways. The data processing from audio devices can be implemented at the edge or on the cloud depending upon the type of application.
B. Audio Data Processing and IoAuT4H
Audio signal processing is a cornerstone of IoAuT4H, enabling the analysis and interpretation of speech and ambient sounds to empower sophisticated auditory applications in healthcare. In this domain, Deep Learning (DL) has emerged as a transformative force, especially within the context of mental and neurological healthcare, facilitating the development of innovative diagnostic and therapeutic tools [30], [31]. Pre-processing is a fundamental phase in audio signal processing that involves enhancing the audio signal quality for subsequent analysis. Techniques such as noise reduction and silence removal are employed to refine signal quality, which is essential for accurate system performance [32]. The extraction of audio features is crucial, serving to transform raw audio input into a set of descriptors for subsequent analysis. These features are mainly influenced by human auditory models and include spectral and temporal characteristics [33], [34]. Recently, DL has revolutionised feature extraction with the advent of pre-trained models. These models, when fine-tuned, provide enhanced representations of raw audio signals. The integration of raw audio with DL architectures, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), facilitates the direct learning of task-specific features. Comparing these DL-based features with traditional ones in health-related applications is an active area of research, aiming to pinpoint the most effective methodology for a particular use case.
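The pre-processing and hand-crafted feature extraction steps described above can be illustrated with a minimal sketch. It is not drawn from any of the cited systems: the frame length, hop size, energy threshold, and the choice of three features (short-time energy, zero-crossing rate, and spectral centroid) are assumptions made for illustration, and the input is a synthetic tone rather than clinical audio.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample list into overlapping frames (trailing partial frame dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Temporal feature: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Temporal feature: fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid(frame, sample_rate):
    """Spectral feature: magnitude-weighted mean frequency (naive DFT, small frames only)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        mags.append(abs(coeff))
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def extract_features(samples, sample_rate, frame_len=64, hop=32, energy_floor=1e-4):
    """Drop near-silent frames, then describe each remaining frame."""
    features = []
    for frame in frame_signal(samples, frame_len, hop):
        energy = short_time_energy(frame)
        if energy < energy_floor:  # silence removal (threshold is an assumption)
            continue
        features.append({
            "energy": energy,
            "zcr": zero_crossing_rate(frame),
            "centroid_hz": spectral_centroid(frame, sample_rate),
        })
    return features


# Synthetic input: 128 silent samples followed by a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]
signal = [0.0] * 128 + tone
features = extract_features(signal, SAMPLE_RATE)
```

In this sketch the fully silent frames are discarded before any feature is computed, and the descriptors of the remaining frames are what a downstream classifier would consume. A practical system would use an FFT library and perceptually motivated features rather than the naive DFT shown here.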
In the broader scope of DL, neural networks including CNNs and long short-term memory networks (LSTMs), a type of RNN, play a critical role in deciphering complex data patterns for predictive modelling and decision-making processes [35]. The training of these networks through forward and backpropagation is vital in refining the models to achieve high accuracy in tasks such as diagnosis, classification, and predictive modelling of neurological and mental health conditions [36], [37], [38]. The application of DL extends to the creation of intelligent virtual assistants and chatbots, providing support and assistance to patients, thereby elevating the standard of care in mental health services [39], [40]. The convergence of audio signal processing with DL within the IoAuT is paving the way for groundbreaking advancements in healthcare. It is enhancing the precision of diagnostics and the personalisation of treatment plans, and introducing patient support systems that are more responsive and effective. Additionally, certain DL models may need to reside at the edge owing to the stringent latency constraints of healthcare applications. In the subsequent discussion, we provide an overview of edge computing, exploring its relevance and implications in this context.
C. Edge Computing and IoAuT4H
The IoAuT4H is a notable advancement in healthcare technology, characterised by its extensive network of audio sensors and sophisticated analytical tools. It promises to enhance healthcare delivery with applications ranging from continuous heartbeat monitoring to detecting health-related changes in speech and gait patterns. The effectiveness of the IoAuT4H, especially its capacity for rapid processing and immediate feedback, hinges on implementing an efficient computational and networking framework. Edge computing and embedded processing provide this necessary support by enabling localised, swift data processing [41].
Edge computing [42] represents a paradigm shift in data processing. Instead of relying on distant cloud servers, edge computing processes data closer to its source, be it audio sensors or any IoAuT-enabled healthcare device. This not only ensures minimal latency [43] but also amplifies the efficiency and immediacy of data processing [44]. For instance, in a healthcare setting equipped with IoAuT, real-time audio data from patients can be swiftly processed at the edge or an embedded device, facilitating instant diagnostic feedback or alerts. Such rapid, decentralised processing is particularly vital for applications requiring split-second decision-making, like detecting and alerting about anomalies in critical health parameters [45]. Moreover, with the growth of IoAuT4H devices, there is an inherent increase in the volume of generated data. Transmitting this colossal amount of data to central servers can strain network resources and lead to delays. Edge computing alleviates this issue by enabling localised processing, reducing the amount of data that needs to be sent back and forth [46], [47]. In the context of IoAuT4H, this means that only pertinent, processed information might be relayed to central servers or the cloud, ensuring efficient bandwidth utilisation and faster response times. A recent trend is to employ collaborative learning such as federated learning to exploit the resources of edge and cloud servers to preserve user privacy and develop efficient models for health monitoring [44], [48]. All in all, distributed and edge computing acts as the backbone, empowering the IoAuT4H framework in healthcare. Its role in optimising real-time audio data processing, ensuring low latency, and judiciously managing network and communication resources emphasises its pivotal importance in the successful integration and implementation of IoAuT in modern healthcare scenarios.
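The bandwidth argument above can be made concrete with a small sketch. It assumes a hypothetical edge node that computes the RMS level of each audio frame locally and forwards a short JSON message only when the level crosses a threshold; the function name `edge_process`, the threshold, the 16-bit sample size, and the message fields are all illustrative assumptions rather than part of any cited system.

```python
import json
import math


def rms(frame):
    """Root-mean-square level of one audio frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def edge_process(frames, alert_rms=0.3, bytes_per_sample=2):
    """Summarise frames locally and forward only those at or above alert_rms.

    Returns the messages that would be sent upstream, the number of bytes a
    raw upload of every frame would need, and the bytes actually sent.
    """
    messages = []
    raw_bytes = 0
    for index, frame in enumerate(frames):
        raw_bytes += len(frame) * bytes_per_sample
        level = rms(frame)
        if level >= alert_rms:
            messages.append({"frame": index, "rms": round(level, 3),
                             "event": "loud_sound"})
    sent_bytes = sum(len(json.dumps(m).encode("utf-8")) for m in messages)
    return messages, raw_bytes, sent_bytes


# Synthetic frames: three quiet frames and one loud frame of 1600 samples each.
quiet = [0.01] * 1600
loud = [0.5] * 1600
frames = [quiet, quiet, loud, quiet]
messages, raw_bytes, sent_bytes = edge_process(frames)
```

Here only the loud frame produces an upstream message, so the bytes sent are a small fraction of what uploading all raw samples would require. Real deployments would replace the RMS threshold with an application-specific detector.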
Key Healthcare Challenges
In this section, we detail healthcare challenges and briefly outline how IoAuT4H can potentially address them, setting the stage for a more comprehensive discussion on its potential in a later section.
A. Scarce Medical Staff and Rising Needs
The current healthcare infrastructure faces significant strain primarily attributed to a shortage of medical professionals and care workers [49], [50], [51], as well as an ageing population and increased numbers of individuals living with disabilities or health conditions [52], [53], [54]. Consequently, the current healthcare system encounters substantial challenges in delivering quality healthcare services to the public at an affordable cost [55]. Addressing these challenges necessitates the adoption of innovative systems, telehealth, and technology-driven monitoring, decision-making, and support systems to provide personalised care to patients in their dwellings.
B. Challenges in Achieving Universal Healthcare Access
Universal healthcare access is hindered by several challenges, including the lack of updated health information, variable care quality, limited infrastructure, financial and geographical barriers, and socio-economic disparities [56], [57], [58], [59]. In remote areas, these challenges are exacerbated by the diminishing number of medical professionals and care workers, making regular face-to-face diagnostics and medical assistance increasingly difficult [49], [60]. These areas also struggle with inadequate infrastructure, telecommunication problems, and a lack of preventive health programs [61], [62]. Addressing these widespread issues requires a holistic approach that combines technological solutions like IoAuT with policy reforms, community initiatives, and increased rural healthcare funding. Telemedicine and e-health solutions, in particular, can play a pivotal role in improving access and care quality, both in general and specifically in remote areas, helping to bridge the healthcare gap and reduce disparities.
C. Remote Patient Monitoring and Chronic Condition Long-Term Care
The effective care of patients with chronic conditions necessitates continuous, long-term monitoring and the facilitation of patient self-management, a challenge intensified by staff limitations in conventional care facilities [63]. Remote patient monitoring, while instrumental, encounters obstacles such as technology barriers, data privacy and security concerns, interoperability, regulatory compliance, and ensuring the quality and accuracy of data [64], [65], [66], [67], [68], [69]. Additionally, the integration and processing of high volumes of data from various wearable and ambient IoT sensors monitoring physiological and cognitive conditions pose significant challenges, including accuracy, reliability, clinical decision support, and ethical and legal considerations [70], [71], [72], [73], [74]. Particularly for long-term care beneficiaries, such as the elderly, difficulties with computer literacy often limit the benefits of these technologies [75], [76], [77]. Continuous disease monitoring, essential for conditions like cardiovascular diseases, adds to these challenges [78], [79], [80], [81], [82], [83]. To mitigate these issues, the IoAuT offers promising solutions, such as voice assistants and sound sensors, that can enhance patient independence and improve remote monitoring and patient engagement, thus advancing healthcare outcomes for chronic conditions.
D. Challenges in Providing Real-Time Assistance and Support
In the healthcare sector, especially given reduced clinic times and the heavy workload of clinicians and healthcare workers, real-time support via voice assistants can be highly beneficial in a patient's recovery journey [91], [92], [93], [94], [95]. The main goals of such real-time support are continuous symptom screening, promoting patient self-management, and answering patients' questions and concerns, as in the iHeartU system developed by Zhang et al. [96]. Besides conversational agents, sound sensing via microphones can also help detect user states during everyday activities, such as eating, and identify falls [97], [98], [99]. These unobtrusive methods help understand user states in real time for emergency alerts and facilitate context-aware interventions, like medication reminders and diet tracking, based on the detected user states.
E. Barrier Between Patients and Medical Professionals
Healthcare challenges stemming from barriers between patients and medical professionals are multifaceted and can have significant consequences. These challenges include communication breakdown, lack of trust, technological divide, time constraints, cultural incompetence, stigma and discrimination, mental health issues, and difficulty maintaining accurate and timely documentation and record keeping [100], [101], [102], [103]. As a whole, these healthcare barriers may result in healthcare delivery that is inefficient, inequitable, and not patient-centred. Addressing these barriers requires a multifaceted approach, including fostering better communication, improving cultural competence, reducing healthcare disparities, and developing patient-centred care models that prioritise trust and collaboration between patients and medical professionals.
F. Clinical Record-Keeping Challenges
Managing health information is essential for developing effective patient care strategies, but it faces challenges including lack of staff training, data privacy issues, information overload, integration problems, resource constraints, as well as difficulties in data storage and retrieval [15], [104], [105], [106], [107], [108]. Similarly, clinical documentation confronts issues like data-entry errors, standardisation problems, maintaining privacy and security, Electronic Health Record (EHR) usability, interoperability, regulatory compliance, and integration into clinical workflows [109], [110], [111], [112], [113], [114], [115]. Addressing these multifaceted challenges necessitates an integrated approach involving the adoption of standardized data formats, improved cybersecurity, and collaboration between healthcare providers, EHR vendors, policymakers, and IT professionals. IoAuT4H offers a potential solution to these challenges using audio signals, providing innovative methods for streamlining health information management and clinical documentation [116].
G. Financial Hurdles in Healthcare
The cost challenge in the current healthcare system refers to the significant financial burden placed on individuals, healthcare providers, and governments due to the high costs associated with medical care and services. This challenge encompasses several key issues, including rising healthcare costs [117]; limited access to care [118]; the financial strain on individuals [119]; racial and ethnic healthcare disparities [120]; and the impact on healthcare providers and government budgets [121]. Therefore, it is important to address the cost challenge in the healthcare system using technology-driven solutions, cost-containment strategies, healthcare reform initiatives, price transparency, and the promotion of preventative care to reduce the long-term burden on the system.
The Potential of IoAuT4H to Address Challenges in Healthcare
The challenges discussed in the previous section are driving the development of technological solutions to ease the burden on healthcare facilities and enhance service efficacy and accessibility. This section explores the potential of the IoAuT4H to address these challenges and improve the quality of healthcare services. Before delving into the intricate details, we explore key pilot projects that illustrate how speech, non-speech, and other acoustic signals (such as falls or screams) have been effectively utilized to enhance healthcare infrastructure. The detailed architectures of these IoAuT4H pilot projects are presented in Fig. 4. Furthermore, we present specific case studies in Table 3 that illustrate the use of acoustic sensing in healthcare, underscoring the profound impact and expansive scope of IoAuT4H in this critical sector.
IoAuT4H Pilot Projects: A. Harlie, a conversational virtual assistant, monitors vocal markers to detect early signs of diseases such as Parkinson's and dementia. B. Suki AI is a voice-based digital assistant designed to help clinicians by recording notes based on patients' live data, including vital signs. C. Extension of pediatric cardiology services to mobile health clinics via broadband internet at the Miller School of Medicine, University of Miami. D. Developed by the NHS AI Lab, this project involves detecting abnormal behavior of residents through acoustic monitoring and issuing timely alerts.
Voice-based Virtual Assistants / Chatbots: With the widespread availability of smartphones, the use of voice-based conversational virtual assistants and chatbots in the healthcare industry is rapidly increasing. In this context, Ireland et al. [133] developed a conversational agent named ‘Harlie’, which monitors vocal markers to detect early signs of diseases such as Parkinson's, dementia, and others. Harlie, which operates on acoustic signals, utilizes Google's speech-to-text and text-to-speech APIs to convert a patient's spoken words into digital text and transform digital text into synthetic speech, respectively. To enhance security, speech signals are transmitted offshore with random voice modulation to prevent interceptors or attackers from identifying the user based on speech patterns. Harlie also employs AIML (Artificial Intelligence Markup Language) [134] to generate meaningful responses from text input, enabling it to engage in coherent and deterministic conversations. Similarly, Suki AI [135] assists clinicians in making notes allowing them to concentrate on patient care and build more engaging relationships with patients. Suki AI leverages FHIR (Fast Healthcare Interoperability Resources) to integrate live data from patients' electronic medical records (EMR), including vital signs, to create accurate and timely medical notes.
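To give a sense of what integrating FHIR vital signs into a clinical note can involve, the sketch below turns a FHIR Observation resource into one line of note text. It is not Suki AI's implementation, which the source does not describe. The sample resource is hand-written for illustration and contains no real patient data, and the helper name `vital_sign_note_line` is hypothetical. The field names follow the FHIR Observation structure, and 8867-4 is the LOINC code for heart rate.

```python
import json

# Hand-written sample resource for illustration only; not real patient data.
observation_json = """
{
  "resourceType": "Observation",
  "status": "final",
  "code": {
    "coding": [
      {"system": "http://loinc.org", "code": "8867-4", "display": "Heart rate"}
    ]
  },
  "subject": {"reference": "Patient/example"},
  "effectiveDateTime": "2024-01-01T09:30:00Z",
  "valueQuantity": {
    "value": 72,
    "unit": "beats/minute",
    "system": "http://unitsofmeasure.org",
    "code": "/min"
  }
}
"""


def vital_sign_note_line(resource):
    """Format one FHIR Observation as a single line of clinical-note text."""
    if resource.get("resourceType") != "Observation":
        raise ValueError("expected a FHIR Observation resource")
    codings = resource.get("code", {}).get("coding", [])
    # Prefer the human-readable display of the LOINC coding, else its code.
    label = next(
        (c.get("display") or c.get("code")
         for c in codings if c.get("system") == "http://loinc.org"),
        "Unknown observation",
    )
    quantity = resource.get("valueQuantity")
    if quantity is None:
        return f"{label}: no numeric value recorded"
    when = resource.get("effectiveDateTime", "time unknown")
    return f"{label}: {quantity['value']} {quantity.get('unit', '')} (recorded {when})"


note_line = vital_sign_note_line(json.loads(observation_json))
```

A production integration would retrieve such resources from an EMR's FHIR server with proper authorisation and would handle components, interpretations, and missing data far more carefully than this sketch does.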
Tele-cardiology Service for Mobile Health Clinics: Ardhanari et al. [136] designed a tele-cardiology system for the mobile health clinics at the Miller School of Medicine, University of Miami to enhance patient access to paediatric cardiology services in remote areas. Patients are initially evaluated by a paediatrician at a mobile health unit. Those requiring further cardiac evaluation based on their history and/or initial findings are then scheduled for a tele-cardiology consultation within the mobile unit, which is equipped with tele-auscultation, tele-electrocardiography, tele-echocardiography, and videoconferencing capabilities. A paediatric cardiologist at a tertiary health care unit utilizes broadband-enabled workstations running AGNES telemedicine software to extend expert cardiology services to the mobile units.
Acoustic Monitoring Integrated with Electronic Care Planning: The NHS AI Lab developed this pilot project to create acoustics-based activity profiles of residents, enabling the detection of abnormal behaviour and triggering timely alerts [137]. Residents' sounds are captured using Allycare's wireless-enabled devices (installed in homes with the consent of residents) [138] and are classified and analysed to detect unusual events or abnormal behaviour. Alerts generated from these detections are integrated with KareInn's electronic care management system [139], which provides a comprehensive view of each resident, including their medical history, health trends such as infections, falls, vital signs, and daily activities. In the nine months following its launch in September 2019, the system achieved a 55% reduction in nighttime falls and a 20% decrease in hospital admissions across three care homes with a total of 90 registered beds, compared to the previous year. Simultaneously, unnecessary nighttime checks by healthcare staff were reduced by 75%, freeing up their valuable time for other care planning activities.
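The idea of an acoustics-based activity profile can be sketched in a simplified form. The source does not describe the pilot's actual detection method, so the following is only an assumed stand-in: it builds a per-hour baseline from counts of detected sound events on previous nights and flags hours whose count deviates strongly from that baseline. The data, the z-score threshold, and the minimum standard deviation are all invented for illustration.

```python
from statistics import mean, pstdev


def build_profile(history):
    """Per-hour baseline from past nights.

    history is a list of nights, each a list of per-hour sound-event counts.
    Returns a list of (mean, population standard deviation) per hour.
    """
    hours = len(history[0])
    profile = []
    for h in range(hours):
        counts = [night[h] for night in history]
        profile.append((mean(counts), pstdev(counts)))
    return profile


def flag_unusual_hours(profile, tonight, z_threshold=3.0, min_std=1.0):
    """Return hours whose event count deviates strongly from the baseline.

    min_std guards against division by zero and over-sensitive alerts when
    past counts were nearly constant (an assumed design choice).
    """
    alerts = []
    for hour, (count, (mu, sigma)) in enumerate(zip(tonight, profile)):
        z = (count - mu) / max(sigma, min_std)
        if abs(z) >= z_threshold:
            alerts.append({"hour": hour, "count": count, "z": round(z, 2)})
    return alerts


# Invented example: four past nights of six hourly counts, then tonight.
history = [
    [2, 1, 0, 0, 1, 3],
    [3, 1, 0, 1, 1, 2],
    [2, 0, 0, 0, 2, 3],
    [3, 2, 0, 1, 0, 2],
]
tonight = [2, 1, 9, 0, 1, 3]
profile = build_profile(history)
alerts = flag_unusual_hours(profile, tonight)
```

In this example only the third hour, which is normally silent, is flagged. A deployed system would classify sound types rather than merely count events and would route alerts to the care management system.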
After highlighting the key aspects of IoAuT4H pilot projects and detailing the technical workings of acoustic sensing and IoAuT within healthcare applications, we provide a thorough overview of IoAuT4H's potential to tackle critical challenges faced by healthcare infrastructures worldwide, particularly in underdeveloped regions, as elaborated in Table 4. This includes examining how IoAuT4H can significantly improve healthcare infrastructure and operational efficiency, empowering healthcare providers to create safer, more efficient, and patient-centric hospital environments. Additionally, we explore how IoAuT4H contributes to enhancing various aspects of the healthcare domain, such as providing real-time alerts and emergency notifications, enabling remote patient monitoring, improving care for elderly and pediatric populations, and delivering specialized healthcare services to underserved areas, ensuring equitable access for everyone.
A. Enhancing Hospital Infrastructure and Efficiency
IoAuT4H can play a pivotal role in enhancing hospital infrastructure and operational efficiency by facilitating real-time patient monitoring, asset tracking, and predictive maintenance. IoAuT4H devices enhance patient experiences, improve the security of sensitive areas, and optimise resource allocation. IoAuT4H's data analysis capabilities empower healthcare providers to make data-driven decisions, ultimately leading to safer, more efficient, and patient-centric hospital environments. This technology is poised to drive significant advancements in healthcare, making hospitals more responsive and adaptive to the needs of both patients and medical professionals. Furthermore, IoAuT4H can monitor ambient environmental factors such as noise levels, air quality, and other factors in healthcare centres to maintain optimal conditions for patient care. To keep environmental noise pollution in hospitals under control, IoAuT4H can employ noise-cancellation technologies and smart infrastructure solutions that respond dynamically to noise levels.
B. Remote Patient Monitoring
Since the COVID-19 outbreak, remote healthcare has seen a significant rise, with 95% of healthcare facilities now offering remote services, up from 43% pre-pandemic [122]. Central to this shift is remote patient monitoring, which involves monitoring patient health through technology [140]. IoAuT4H, with its audio-based technologies, is pivotal in this context. IoAuT4H not only enables continuous monitoring of vital signs and health indicators like heart rate and respiratory rate but also assists in detecting diseases, including mental and cognitive disorders. For example, Pramono et al. [141] proposed a method to detect cough events from acoustic signals using spectral features. Ghayvat et al. [98] presented a deep-learning model for remote monitoring that detects distinct acoustic events in everyday situations, allowing both individuals and healthcare professionals to monitor each person's ongoing status remotely. Additionally, IoAuT4H facilitates tele-auscultation services, extending cost-effective healthcare to remote areas. Johanson et al.’s prototype enables real-time tele-auscultation over the internet, incorporating audio channels for auscultation data and communication between physicians and patients [126]. A similar prototype by Kamolphiwong et al. [127] improves the quality of transmitted auscultation sounds by managing packet delay variations. Faurholt-Jepsen et al. have also contributed with their MONARCA software, which analyses smartphone data, including speech, for bipolar disorder symptom management [128].
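To make the notion of a spectral feature concrete, the sketch below computes the spectral centroid of a short audio frame, that is, the magnitude-weighted mean frequency of its spectrum. This is only one common spectral descriptor and is not claimed to be the feature set used in [141]; the function names are illustrative, and a naive discrete Fourier transform is used so that the example needs nothing beyond the standard library.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: no meaningful centroid
    n = len(frame)
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(64)]
    print(round(spectral_centroid(tone, rate)))
```

In a detection pipeline, descriptors of this kind would be computed per frame and passed to a classifier; a fast Fourier transform would replace the naive DFT for realistic frame sizes.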
C. Promoting General Wellness and Preventive Care
The significance of wellness and preventive care in maintaining a healthy and thriving life cannot be overemphasised. This not only empowers individuals to lead a low-risk and healthy life but also has far-reaching positive outcomes for society. IoAuT4H-based devices can be employed to monitor vital signs and health indicators (such as sleep patterns and daily activities), thus enabling healthcare providers to manage potential health problems proactively. Moreover, regular audio reminders and alarms can be provided via IoAuT4H, ensuring people adhere to required health maintenance activities, such as prescribed medication plans or blood sugar monitoring. Acoustic sensing devices can also play a role in this regard. For example, Mallegni et al. [80] discuss devices that detect and process acoustic signals to provide a more reliable description of their features (e.g., amplitude and frequency bandwidth), which can be helpful in hearing aids and in wearable devices that monitor heart sounds. In another study, Fang et al. [142] propose two multimodal learning frameworks to classify common voice disorders by combining acoustic signals and medical records. IoAuT4H can also be employed to detect voice biomarkers that indicate cognitive health. Studies have shown that alterations in voice characteristics can be indicative of future dementia with high accuracy [143], [144], [145]. For instance, Lin et al. present a voice-based linear classifier that predicts future dementia risk for asymptomatic individuals [146]. The proposed model, utilising acoustic features associated with cognitive impairment, can enhance preventive care for individuals at risk of dementia. A similar study is presented in [147] that distinguishes patients with Alzheimer's disease based on speech disruptions during a picture description task.
D. Real-Time Alerts and Emergency Notifications
In recent years, there has been an observable increase in the frequency and severity of catastrophic world events, including natural disasters and public health crises [148]. This trend, coupled with the limitations of traditional emergency response systems [149], has prompted the exploration of more reliable and efficient alternatives [150], including the use of automated alerts for rapid response as highlighted by previous work [151], [152]. IoAuT4H, encompassing interconnected audio-enabled devices, is envisaged to yield significant dividends by providing real-time alerts and notifications in the wake of emergencies. These devices and sensors can trigger real-time and automated alerts via the detection of various acoustic cues, including emotions, falls, distress calls and sirens, thus resulting in rapid emergency response. For instance, the authors in [131] propose an audio-based emergency detection system, which works on human scream detection using a pre-trained machine learning model. Similarly, a perception sensor network presented in [153] employs a Kinect microphone array for the acquisition of audio signals. These audio signals are then classified for scream detection and localised to dispatch appropriate reinforcement. Acoustics-based fall detection systems for elderly people are proposed in [154] to signal care providers for timely assistance. The British Geological Survey developed an app called ALARMS (Assessment of Landslides using Acoustic Real-time Monitoring Systems) for disseminating early warnings about landslides [155]. Integrating IoAuT4H with existing disaster management systems [156], [157], [158] can introduce a robust layer of notifications, enhancing system resilience. Furthermore, IoAuT4H-based emergency alerts can improve the accessibility of disaster management for individuals with visual impairments or specific communication needs [155], [159].
E. Real-Time Support Using Digital Assistants
Interactive voice-based assistants such as Apple Siri and Amazon Alexa are currently integrated into many homes to help people with some of their routine tasks. In addition to the impact real-time digital assistants have on improving the patient's well-being, they also have the potential to alleviate the stress experienced by caregivers [91], as they feel satisfied that the patient is continuously looked after. These digital assistants also form part of the IoAuT4H ecosystem and research has consistently shown that personal digital assistants (or conversational agents) can be very beneficial in helping patients' disease management [91], [92], [93], [94], [95]. For instance, real-time monitoring with regular verbal check-ins with the user can reveal any additional symptoms and side effects either via the conversations or directly from speech abnormalities, which can be crucial for timely intervention in case of emergencies and also help in self-anamnesis [91], [92], [160]. Using personal digital assistants can also help set reminders for appointments with clinicians, performing physical activities, and eating/hydration, which leads to better adherence [91]. The relevance of acoustic sensing extends to other aspects as well such as cough detection [141] and remote monitoring from acoustic signals which can lead to real-time response by these digital assistants.
F. Improved Patient Engagement and Medication Management
Medication non-compliance is the cause of 11% of the total hospitalisations in the US [161]. Several approaches have proved beneficial for medication management, such as using trackers in the medicine box [161]. Some previous works have also explored using sound sensors to detect medication adherence for specific types of disease management. To improve medication adherence, Nousias et al. [162] detect the sounds produced when patients with respiratory illness use pressurised metered dose inhalers. Sounds that indicate inhaler actuation, inhalation sounds, exhalation sounds, and background/environmental sounds can help keep track of medicine usage. In addition to medication adherence, due to the precision of the audio classification of actions when using the device, their system can also be used to check that beginners are using these devices properly. Apart from sound sensors, voice-based medication reminders are one of the most straightforward and efficient ways to manage medication [163], [164]. In addition, these voice-based interactive agents can also provide emotional and social support with continuous interactions, and use effective behaviour change techniques to improve current health behaviours, such as physical activity and healthy eating [165], [166], [167].
G. Access to Healthcare in Underserved Areas
In underdeveloped regions, where healthcare access and resources are limited, the IoAuT4H plays a crucial role in enhancing healthcare delivery. IoAuT4H improves awareness and education through interactive audio-based workshops on platforms like Twitter Spaces and disseminates vital health information via radio, automated calls, voice messages, and podcasts. Remote consultations become more efficient through virtual clinics and telemedicine points, making healthcare more accessible. Mobile health clinics equipped with IoAuT4H devices help extend healthcare services to remote areas. Abdellatif et al. [125] developed a telemedicine system using IoMT to connect patients with healthcare providers, a concept further advanced by Alenoghena et al. [168] with a direct consultation hotline. These innovations facilitate appointment scheduling and ensure swift patient-professional connections [169], significantly improving healthcare accessibility in under-resourced areas.
Voice-based chatbots [170] trained on healthcare data are particularly important in underdeveloped locations, where people can ask questions frequently and receive accurate, verified responses. These systems can also be employed for remote monitoring of patients, including cough monitoring [45], sleep pattern detection [171], medication reminders [172], and psychiatric illness detection [45], [173]. Smart devices can aid medicine management by tracking stock and issuing voice reminders [172] on patients' phones to ensure the prescribed schedule is followed, for example by tracking medicine consumption and prompting patients by voice to restock. IoAuT4H devices can also be utilised to collect data and perform predictive analysis, which may help healthcare workers make informed decisions about diagnosing and treating common diseases in underserved areas.
H. Transforming Elderly Care
The IoAuT4H holds the potential to revolutionise elderly care through real-time monitoring of their emotions, health, and overall well-being. This groundbreaking technology can realise these benefits without the need for deploying additional sensors; instead, it can harness the existing devices at our disposal, such as smartphones, smartwatches, and tablets, to continuously gather real-time audio data. This invaluable data can enable medical professionals to access a comprehensive historical record and identify deviations from an individual's typical routine, thereby facilitating the early detection of potential health issues or signs of diseases.
Furthermore, IoAuT4H can play a crucial role in providing much-needed support to elderly individuals who reside alone in their own homes, offering independent home-based ageing [28]. It can address critical challenges related to loneliness, social isolation, and depression that can significantly impact the quality of life for this demographic [174]. For example, Yalamanchili et al. [175] used acoustic features to train a classification model to classify an individual as experiencing depression or not. Similarly, Liu et al. [173] studied the correlation between depression and speech to identify a set of speech features conducive to the detection, assessment, and potential prediction of depression. By fostering connectivity and facilitating meaningful interactions, the IoAuT4H can potentially offer a lifeline of companionship and emotional support, enhancing the overall well-being of the elderly and contributing to their independence and happiness.
I. Enhancing Pediatric Healthcare
Paediatric healthcare, focusing on the well-being of infants to adolescents, has seen increased demand since the COVID-19 pandemic, particularly in low and middle-income countries [176]. In this context, IoAuT4H offers innovative solutions. Smart IoAuT4H devices, such as wearables with acoustic sensors, play a crucial role in monitoring vital signs like heart rate and body temperature. They are particularly useful in detecting changes in a child's emotional state through voice and sound analysis [177]. These devices facilitate timely medical interventions and aid in tasks like monitoring temperature fluctuations, providing first-aid guidance, and managing common symptoms. IoAuT4H enhances paediatric care management, from tracking medical equipment availability through voice queries to ensuring patient safety with location monitoring via smart bands. It also supports chronic disease management, like using IoT-enabled sphygmomanometers for patients with depression and anxiety, or smart inhalers and glucose meters for asthma and diabetes management [177]. Additionally, applications synchronised with IoAuT4H devices empower parents and caregivers with real-time data for informed decision-making, providing essential education and awareness.
IoAuT4H: The Way Forward
This section examines the challenges associated with incorporating IoAuT4H utilising speech technology into healthcare and discusses possible strategies to expedite its integration. For a quick overview, Table 5 provides a concise summary, showcasing current solutions, pinpointing gaps, and suggesting future directions for IoAuT-based healthcare advancements. Furthermore, Table 6 provides a summary of a few promising applications of emerging acoustic technology in healthcare.
A. Privacy and Ethical Concerns
The IoAuT4H has the potential to revolutionise healthcare as discussed in Section IV. However, its integration into realistic healthcare settings is hindered by data privacy and ethical concerns. Various researchers have focused on this dilemma in the literature and attempted to answer important questions about privacy and security challenges in IoT-empowered healthcare. For instance, Awotunde et al. [178] suggested using access controls and anonymisation techniques as potential solutions to address privacy and security risks in IoT healthcare. To protect the sensitive attributes in the audio data, Hassan et al. [179] proposed the use of differential privacy techniques for IoAuT. Similarly, Alshathri et al. [180] proposed an audio watermarking scheme for robust data protection. However, the utility of these approaches remains valid only in limited scenarios. Therefore, it is imperative to devise novel approaches to ensure the anonymity and security of continuous data streams in IoAuT4H systems. One possible solution could be to follow the privacy and ethics by design approach advocated by Latif et al. [181], which involves embedding ethical and privacy considerations into technology development, ensuring transparency and providing data control options to users.
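As an illustration of what a differential privacy technique can look like in this setting, the sketch below applies the standard Laplace mechanism to a count derived from audio, such as the number of cough events detected in a day. It is a generic textbook construction and is not claimed to be the scheme proposed in [179]; the function names and the sensitivity of 1 (adding or removing one event changes the count by at most one) are assumptions made for the example.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Release a count with epsilon-differential privacy via the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Noise scale grows as the privacy budget epsilon shrinks.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    print(private_count(17, epsilon=0.5, rng=rng))
```

Smaller values of epsilon give stronger privacy at the cost of noisier statistics, which is one reason such mechanisms suit aggregate reporting better than protecting raw continuous audio streams.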
In addition to privacy concerns, the integration of IoAuT in healthcare faces several technical limitations. The high volume of data generated by audio sensors requires robust and scalable storage solutions, and real-time processing capabilities are essential to handle continuous data streams effectively. Ensuring seamless interoperability among diverse IoT devices and healthcare systems is another significant barrier, often complicated by varying standards and protocols used by different manufacturers [182]. For instance, inconsistent data formats and communication protocols can lead to integration failures and increased complexity in system management. Therefore, addressing such interoperability issues is crucial for the safe, reliable, and robust deployment of IoAuT in healthcare settings.
B. Adversarially Robust Audio Analytics
AI and ML techniques are becoming increasingly instrumental in healthcare, facilitating the efficient assessment of patient conditions and helping professionals in the diagnosis and treatment through speech analytics. However, like other critical systems, speech processing in healthcare is susceptible to both conscious and unconscious adversarial ML attacks, which can undermine system reliability and safety [13]. These attacks pose significant challenges to the integration of IoAuT in healthcare by exacerbating existing concerns related to data privacy, technical limitations, and interoperability. For instance, adversarial attacks can compromise the privacy of sensitive patient data by exploiting vulnerabilities in ML models, leading to unauthorized access or manipulation of audio data. This risk amplifies the ethical and privacy challenges already inherent in IoAuT4H systems.
In the literature, various studies have highlighted the threat of adversarial ML for speech processing applications [183], [184], [185]. While various defence strategies have been proposed to counter such attacks [186], [187], their efficacy is often limited to specific tasks (e.g., speech-to-text, speaker recognition, emotion recognition) and the underlying data distributions. In addition, it is crucial to develop robust and scalable defensive techniques that are capable of processing large volumes of audio data in real time. Similarly, ensuring interoperability is also complicated, as defence mechanisms against adversarial attacks must be compatible with diverse IoT devices and healthcare systems, which often use varying standards and protocols. However, the literature highlights that the attention devoted to developing novel defence strategies for adversarial ML attacks is not proportionate with the one given to developing novel attacks [188]. Therefore, developing defensive and scalable solutions for IoAuT4H remains an open research challenge.
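To indicate why such attacks are a concern, the following sketch applies a perturbation in the style of the fast gradient sign method to a toy linear classifier over acoustic features. The weights, bias, and feature values are invented for illustration and do not come from any cited system; for a linear score the gradient with respect to the input is simply the weight vector, so each feature is shifted by a small amount against the sign of its weight.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def linear_score(weights, bias, features):
    """Decision score of a linear classifier; positive means 'event detected'."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def fgsm_linear(weights, features, epsilon):
    """Shift every feature by epsilon against the score gradient (the weights)."""
    return [x - epsilon * sign(w) for w, x in zip(weights, features)]


if __name__ == "__main__":
    weights, bias = [2.0, -1.0, 0.5], -0.5   # hypothetical model
    features = [0.5, 0.2, 0.4]               # hypothetical acoustic features
    adversarial = fgsm_linear(weights, features, epsilon=0.2)
    print(linear_score(weights, bias, features) > 0)
    print(linear_score(weights, bias, adversarial) > 0)
```

Under these assumed numbers, a bounded change of 0.2 per feature is enough to move the score across the decision boundary, which conveys in miniature how small input perturbations can alter a model's output.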
C. Exploring Emerging Acoustic Technologies
Recent advancements in acoustic technologies are reshaping healthcare, enabling innovative applications across domains including diagnostics, treatment, and tissue engineering. For instance, acoustic tweezers developed by Rufo et al. [199], which utilise sound waves to manipulate microscopic particles and cells, have shown promise in separating tumour cells. Focused ultrasound has emerged as a powerful tool for studying mechanotransduction, the process by which cells convert mechanical stimuli into biochemical signals, offering non-invasive means to influence cellular behaviour [189]. Acoustofluidic diagnostics using acoustic tweezers [190], [191], [192], [193], [194] are improving early infection detection, while acoustic holography and tweezing [195], [196] advance tissue engineering. In vivo acoustic manipulation [197], [198] is transforming non-invasive surgeries, promising advancements in drug delivery and tissue engineering. Despite such promising applications, several open research issues persist, including the scalability and precision of these technologies in clinical settings, their integration with existing medical systems, the safety and efficacy of non-invasive manipulation techniques, the development of standardized protocols and regulatory frameworks, and the long-term effects and biocompatibility of acoustic interventions. Therefore, it is crucial to address these challenges and explore innovative solutions that advance acoustic technology in healthcare while ensuring its safe, successful, and widespread adoption.
D. Audio Data Quality for Model Training
Developing intelligent solutions for audio-based healthcare applications is significantly hindered by the need for high-quality, annotated audio data [178]. In real-world settings, audio data quality is often compromised by noise, equipment sounds, and voice variability [11], making it difficult to build systems that perform well with authentic audio. Training models typically rely on data from controlled environments, which may not perform effectively in more variable real-world conditions [200]. This highlights the urgent need for realistic audio datasets that accurately capture a diverse range of environmental conditions. However, collecting and labelling real healthcare audio data on a large scale is laborious, time consuming, and costly [201], [202]. Recent advances in language and speech models have made strides in synthetic data generation and dataset annotation [203], [204]. Despite these advancements, effectively managing noisy healthcare audio data remains a significant challenge. Critical issues include handling acoustic variations, managing overlapping speech, and mitigating environmental noise, which complicate the extraction of relevant information from audio recordings. Ongoing research is focused on developing advanced noise reduction and data augmentation techniques to enhance decision-making capabilities when working with noisy and limited data. Bridging the gap between controlled training scenarios and the complexities of real healthcare audio environments is crucial for fully realising the potential of the IoAuT4H in medical applications. This will be key to improving the robustness and reliability of these systems, thereby enhancing their overall capabilities and effectiveness in real-world deployments.
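One widely used augmentation of the kind mentioned above is to mix clean recordings with recorded background noise at a controlled signal-to-noise ratio (SNR). The sketch below is a minimal standard-library version of that idea; the function names are illustrative, and both signals are assumed to be equal-length lists of samples at the same sampling rate.

```python
import math


def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean, scaled so the mixture has the requested SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean, p_noise = power(clean), power(noise)
    if p_clean == 0.0 or p_noise == 0.0:
        raise ValueError("both signals need non-zero power")
    # Solve p_clean / (gain^2 * p_noise) = 10^(snr_db / 10) for gain.
    gain = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [c + gain * n for c, n in zip(clean, noise)]


if __name__ == "__main__":
    speech = [math.sin(0.1 * t) for t in range(1000)]   # stand-in for clean audio
    ward = [math.cos(0.37 * t) for t in range(1000)]    # stand-in for ward noise
    noisy = mix_at_snr(speech, ward, snr_db=5.0)
    print(len(noisy))
```

Sweeping the SNR over a range during training is a simple way to expose a model to conditions between the controlled recordings and a noisy clinical environment.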
E. System Integration, Compatibility, and Interoperability
Integrating the IoAuT4H with existing health data portals and clinical decision support systems presents challenges due to diverse EHR systems, varied medical devices and software, extensive audio data, real-time processing requirements, and concerns such as contextual understanding, adaptability, user training, and ethical/legal issues [7], [205]. Ensuring interoperability with current healthcare infrastructure is crucial for maximising benefits in monitoring and decision-making [7]. An integrated system, rather than a standalone speech processing solution, allows for consolidating health information sources, enhancing usability for service providers, clinicians, and patients, and potentially reducing costs by leveraging existing communication and data resources. Moreover, fusing data from audio, physiological, and ambient sensors can enhance prediction algorithms, improving early detection and recognition of health conditions. The integration of health IoT systems with IoAuT4H promises a scalable, customisable platform, facilitating the development of personalised monitoring and support systems that improve the quality of care. Such advancements are aligned with Healthcare 4.0 goals, which emphasise the need for innovative and efficient healthcare solutions [206].
F. Optimising Network Infrastructure for IoAuT4H
Applications of IoAuT4H, particularly those requiring real-time event detection, demand a network infrastructure that supports low-latency communication. This is crucial for scenarios like immediate detection of patient emergencies [207]. With the expected surge in data from numerous audio devices, particularly wearable technology with limited processing power, the network must possess robust data handling capabilities to ensure effective IoAuT4H implementation [20]. Moreover, the transfer of large-scale data introduces privacy, security, and bandwidth management challenges. In this regard, access to extensive and representative datasets is crucial for the efficiency of DL algorithms in IoAuT4H, particularly given the inherent noise in the audio data. Addressing the challenge of limited computational resources at edge nodes for DL model training is critical. A network infrastructure that enables distributed DL training is therefore essential, with a design that supports distributed learning and optimisation specifically for audio technology. In mobile IoAuT4H applications such as connected ambulances [208], [209], where uninterrupted connectivity is a necessity, the infrastructure must efficiently manage real-time healthcare audio services across mobile ambulances, wearable devices, and sensors in vehicles. Proactive handover and mobility management between 5G base stations are critical for maintaining seamless connectivity. ML-based handover solutions [210], [211], [212], [213], [214], [215] are integral to the infrastructure, ensuring robust support for these mobile IoAuT4H applications by optimising handovers and reducing latency during critical transitions.
Conclusion
The integration of acoustic sensing facilitated by the IoAuT4H promises to uplift the quality of healthcare. As delineated in this article, the harmony of acoustic sensors with the healthcare system not only streamlines intricate patient monitoring but also fosters innovative avenues for diagnostics, therapeutic interventions, and health awareness. The impact of the IoAuT4H, however, is not confined to one domain. By bridging the information gap and ensuring real-time data availability, acoustic sensing amalgamates the domains of patient autonomy, healthcare accessibility, and remote health monitoring. This fusion is pivotal, especially in an era where emerging medical challenges continually test the global health infrastructure. It is important to develop this technology thoughtfully, with a focus on inclusivity, ethics, and integration. As digital health becomes more prevalent, the IoAuT4H stands to offer significant improvements in health and wellness.
Moving forward, IoAuT4H is set to undergo significant growth in the healthcare sector. As acoustic sensor technologies continue to advance, their precision, accuracy, and usability in diverse healthcare scenarios are expected to increase. The integration of AI, edge computing, and Big Data analytics with IoAuT4H will enhance its functionality, leading to more customised, predictive, and proactive healthcare solutions. This progression, however, comes with its own set of challenges, including the ethical handling of data privacy, the robustness of sensing systems, and the necessity for collaboration across disciplines, particularly among technologists, healthcare professionals, and policymakers. Effectively addressing these challenges will be crucial in unlocking the full potential of IoAuT4H, building on the groundwork laid by recent advancements in remote consultations, telemedicine, and widespread health information dissemination.