Abstract:
Google’s voice assistant has a Voice Match feature that recognizes only its user’s voice. However, it cannot distinguish between an authentic human voice and an audio-replayed replica of the same person’s voice. This work develops a shallow-learning Gaussian Naive Bayes (GNB) voice-replay detector to add this missing layer of verification. In the front-end feature extraction stage, the model extracts the Mel-Frequency Cepstral Coefficients (MFCC) and Constant-Q Cepstral Coefficients (CQCC) from the input voice signal. The extracted features are passed to the developed GNB classifier, which classifies the input speech as either genuine (from a live source) or replayed (from a previously recorded source). The GNB classifier is trained on extensive datasets of labeled speech feature samples from both classes. The Equal Error Rate (%EER) statistic measures the classifier’s performance. The trained GNB classifier is run on extensive development and evaluation datasets to optimize its performance under various reduction, normalization, and filtration settings. The best %EER values for the GNB classifier are 14.3553% on the development set and 19.8722% on the evaluation set. A real-time experiment is conducted with the developed learning model to support the obtained performance results.
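As a rough illustration of the classification and scoring stages described above, the following is a minimal sketch of a Gaussian Naive Bayes detector scored by %EER. It is not the authors' implementation: the feature vectors are synthetic stand-ins for MFCC/CQCC features, and all names (`fit_gnb`, `genuine_score`, `equal_error_rate`, `make_synthetic`) and parameters (4 dimensions, 200 samples per class, class centers at ±1.0) are hypothetical choices made only for this example.

```python
import math
import random

def fit_gnb(features, labels):
    """Estimate a log-prior plus per-feature mean and variance for each class."""
    model = {}
    for cls in set(labels):
        rows = [x for x, y in zip(features, labels) if y == cls]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[cls] = (math.log(n / len(labels)), means, variances)
    return model

def log_likelihood(model, cls, x):
    """Log prior plus the sum of independent Gaussian log-densities."""
    log_prior, means, variances = model[cls]
    return log_prior + sum(
        -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        for v, m, var in zip(x, means, variances))

def genuine_score(model, x):
    """Log-likelihood ratio: positive favors genuine (1) over replayed (0)."""
    return log_likelihood(model, 1, x) - log_likelihood(model, 0, x)

def equal_error_rate(scores, labels):
    """Approximate %EER: the threshold where false-accept and false-reject rates are closest."""
    n_genuine = sum(labels)
    n_replay = len(labels) - n_genuine
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        far = sum(s >= t and y == 0 for s, y in zip(scores, labels)) / n_replay
        frr = sum(s < t and y == 1 for s, y in zip(scores, labels)) / n_genuine
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return 100.0 * eer

def make_synthetic(n_per_class, dim, rng):
    """ASSUMPTION: synthetic Gaussian vectors standing in for MFCC/CQCC features."""
    features, labels = [], []
    for cls, center in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            features.append([rng.gauss(center, 1.0) for _ in range(dim)])
            labels.append(cls)
    return features, labels

rng = random.Random(0)
train_x, train_y = make_synthetic(200, 4, rng)
dev_x, dev_y = make_synthetic(200, 4, rng)

model = fit_gnb(train_x, train_y)
dev_scores = [genuine_score(model, x) for x in dev_x]
eer_percent = equal_error_rate(dev_scores, dev_y)
print(f"synthetic dev EER: {eer_percent:.2f}%")
```

The printed value reflects only the synthetic data and has no bearing on the %EER figures reported in the abstract. In a real pipeline, the rows of `train_x` and `dev_x` would be replaced by MFCC and CQCC vectors extracted from genuine and replayed recordings.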
Published in: 2022 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT)
Date of Conference: 20-21 November 2022
Date Added to IEEE Xplore: 30 December 2022