IEEE Xplore Search Results

Showing 1-25 of 2,577 results

Results

Human-machine interaction through the voice modality has recently given rise to both research and business use cases. Major business organizations are shifting towards developing their state-of-the-art voice assistants, whose accuracy in understanding human voice commands is not affected by surrounding noise. One of the first steps towards developing such an interactive model is understand...
We propose a novel voice activity detection (VAD) model in a low-resource environment. Our key idea is to model VAD as a denoising task and construct a network that is designed to identify nuisance features for a speech classification task. We train the model to simultaneously identify irrelevant features while predicting the type of speech event. Our model contains only 7.8K parameters, outperfor...
Voice activity detection (VAD) based on deep learning has achieved remarkable success. However, when the traditional features (e.g., raw waveforms and MFCCs) are directly fed to the deep neural network model, the performance decreases because of noise interference. Here, we propose a robust VAD approach using a masked auditory encoder based convolutional neural network (M-AECNN). First, we analyze...
Voice Activity Detection (VAD) is becoming an essential front-end component in various speech processing systems. As those systems are commonly deployed in environments with diverse noise types and low signal-to-noise ratios (SNRs), an effective VAD method should perform robust detection of speech region out of noisy background signals. In this paper, we propose adversarial domain adaptive VAD (AD...
In recent years, DNN-based systems have become extremely popular for automatic speech recognition (ASR) tasks. They have shown better performance than other methods. To perform the ASR task efficiently, segmentation of the input data also has to be accurate. There are different kinds of voice activity detection (VAD) methods, including power-based statistical methods. But to incorporate with the A...
It is of great significance to explore the influence mechanism of verbal cues on self-voice perception under different conditions. The aim of this study is to probe the effects of verbal and non-verbal conditions on an implicit task (experiment 1) and an explicit task (experiment 2) of self-voice. Binaural auditory channels were used to present two voices successively, and the speakers included the part...
We introduce a two-stage approach using LSTM for voice activity detection with sound event classification. This approach proves to be effective when training data is limited. Moreover, it achieves better performance than a pre-trained model using a large-scale data set (AudioSet). Apart from clip-level accuracy, we also introduce two metrics for evaluating overall audio segmentation accuracy: mean $\m...
The simulation of voice disorders attracts a great deal of interest among clinicians and scientists concerned with the simulation of voice timbres. Jitter and shimmer are commonly used as acoustic cues for the analysis and clinical assessment of the voice. In this paper, a voice synthesizer based on direct digital synthesis (DDS) is proposed to model jitter and shimmer. The major advantage of the DDS-based ...
Detecting an anchor’s voice in live musical streams is an important preprocessing step for music and speech signal processing. Existing approaches to voice activity detection (VAD) primarily rely on audio; however, audio-based VAD struggles to focus on the target voice in noisy environments. This paper proposes a rule-embedded network to fuse the audio-visual (A-V) inputs for better de...
Physicians utilize sound records for diagnosing respiratory diseases, but human error in judgment can lead to misdiagnosis and delayed treatment. We propose an AI-inspired system separated into event detection and event classification. For event detection, we use voice activity detection to detect the presence of sound events. For event classification, we use the trained DenseNet model to classify v...
The human voice is characteristic of an individual. The ability to recognize a speaker by his/her voice can be a valuable biometric tool with enormous commercial as well as academic potential. Commercially, it can be utilized for ensuring secure access to any system. Academically, it can shed light on the speech processing abilities of the brain as well as the speech mechanism. In fact, this feature ...
The article considers the pre-processing of voice signals for voice recognition systems based on artificial neural networks. The pre-processing is based on segmentation of the speech signal according to a phonetic transcription of the language, in order to reduce the amount of data supplied to the input of the neural network, which considerably improves its sensitivity to the input data. Application of n...
This paper presents the characteristics and traffic modeling of VoIP conversations. The traffic data are measured from the operating IP network of Telephone Organization of Thailand (TOT) Corporation. The observed distributions of talkspurt and silent durations differ considerably from the standard ON-OFF model (ITU-T P.59). It is proposed that a new state "long burst" representing the background no...
Background: Parkinson’s disease (PD) is a multi-symptom neurodegenerative disease generally managed with medications, of which levodopa is the most effective. Determining the dosage of levodopa requires regular meetings where motor function can be observed. Speech impairment is an early symptom in PD and has been proposed for early detection and monitoring of the disease. However, findings from pr...
In this paper, the significance of Cepstral Mean and Variance Normalization (CMVN) is investigated for the replay Spoofed Speech Detection (SSD) task. The literature shows that applying CMVN produces significantly better performance on many feature sets, which is counter-intuitive for the replay SSD task. This behaviour is analyzed by performing experiments for environment-independent and dependent ...
The use of the photoplethysmogram (PPG) signal for heart and sleep monitoring is now common in smartphones and wrist wearables. Besides common usages, it has been proposed and reported that information about a person can be extracted from PPG for other uses, like biometry tasks. In this work, we explore several end-to-end convolutional neural network architectures for detection of human's charac...
Recent developments in EEG-based overt speech recognition have shown that speech recorded with an EEG can be classified well; however, actual applications have yet to be developed for it. This is most likely due to the EEG setup being unintuitive to the layperson. The gel-based electrodes used in most of the literature are both hard and time-consuming to set up. To move towards a more user friendly a...
Speech activity detection (SAD) is a critical preparation process for speech-based applications. It is used to identify the speech in an audio recording. This paper proposes a CNN-based speech activity detection method for the entertainment media domain. The fusion of two Dense Convolutional Networks (DenseNet) with different feature extraction by using Dempster-Shafer ...
An integrated environment for monitoring and management of educational institutions has been developed on the Home Assistant platform. The system allows surveillance, monitoring and process management of the indoor and outdoor environment, the consumption (and production) of electrical energy, the presence of people etc. The paper presents the capabilities of the system for managing environmental ...
With the rapid development of artificial intelligence technology, the impact of intelligent interactive robots on human life is growing. This paper constructs an intelligent voice interaction robot system based on the cloud platform and Android platform. This system uses a microphone array for audio data collection, environmental noise reduction, and voice activation. The audio processing technolo...
Studies show that many people have difficulties in understanding dialogue in movies when watching TV, especially hard-of-hearing listeners or in adverse listening environments. In order to overcome this problem, we propose an efficient methodology to enhance the speech component of a stereo signal. The method is designed with low computational complexity in mind, and consists of first extracting a...
Rotary voice coil actuators (RVCAs) have been widely used for precision motion control applications and have drawn more and more attention in recent years. In this paper, the eddy current damping effects of RVCAs of both dual and single magnetic circuit types are studied, the electromotive forces in rotor frame arms of the RVCAs are analyzed and verified by FEM calculation, then corresponding rest...
In the present study, we empirically explore how the voice quality of depression patients (as experimental group) differs from that of healthy people (as control group), in terms of jitter, shimmer, HNR and pitch. Our analysis results reveal that the shimmer, maximum HNR and minimum HNR of patients are significantly different from those of the control group. Specifically, the patients tend to have...
Navigational voice interaction is an important factor affecting driving behavior and safety under complex road conditions. However, much of the research on navigational speech has focused on aspects such as commands and tone, and little on voice familiarity. To investigate the effects of navigation voice familiarity on driving stress and emotions, this study was conducted through simulated driving...
Electronic door locks offer advantages over traditional locks, but many require physical contact for password entry, raising virus transmission concerns. This study presents a voice recognition electronic door lock prototype, allowing users to unlock doors through smartphone voice commands, eliminating physical contact. Key findings include successful integration of voice recognition technology an...