I. Introduction
The World Health Organization announced in March 2020 that more than 5% of the global population lives with disabling hearing loss [1]. As one of the most natural ways of conveying semantic information, sign language enables fluent communication for the community with deafness and hearing loss (D&HL) and helps its members integrate more fully into society. Many approaches to capturing signs with commercial off-the-shelf sensors have been proposed and studied. Meanwhile, rapid developments in visual and wearable sensors facilitate the translation of sign language into text that computers and other electronic devices can interpret in the Internet-of-Things (IoT) era.

In 2015, Tubaiz et al. [2] used data gloves to recognize 40 short Arabic sign language sentences containing 80 words, combining the gloves with an optical camera to segment and align hand movements with their corresponding sign words and achieving a sentence-level recognition rate above 98%. Wu et al. [3] proposed a wearable real-time sign language recognition (SLR) system in 2016 that recognized 80 isolated American Sign Language (ASL) words by capturing hand and arm movements with an inertial measurement unit and surface electromyography (sEMG) sensors; recognition accuracy reached 96.16% for the same volunteers and 85.24% for different volunteers. Koller et al. [4] proposed a new approach to large-vocabulary continuous SLR across multiple signers in 2015 and released a publicly available large-vocabulary database that has contributed to the development of SLR. Later, Ibrahim et al. [5] proposed a computer vision-based SLR system in 2017 and tested it on isolated Arabic words with a recognition accuracy of 97%. Chong and Lee [6] used a Leap Motion controller in 2018 to classify 10 digits, 26 letters, and 36 sign words in ASL based on finger and hand movements, with a recognition accuracy of up to 72.79%.

Despite years of development, the aforementioned research still faces problems: contact-based wearable inertial sensors cause unnatural hand movement because of the wires and circuitry worn on the hands, while computer vision-based SLR systems depend on optical cameras, whose applications are limited by darkness and raise potential privacy concerns. Therefore, finding new alternatives that enable intelligent, efficient, and accurate human–computer interaction (HCI) for the D&HL community is of great significance.