
Comparative analysis of KNN and CNN for Localization of Single Sound Source


Abstract:

This paper proposes an integrated sound source localization approach using machine learning models, namely KNN (k-nearest neighbours) and CNN (convolutional neural network). Numerous industries, including robotics, surveillance, virtual reality, home entertainment, telecommunications, automotive systems, environmental monitoring, and speech processing, use sound source localization. Localization is performed using the sound source angle and inter-phase difference (IPD) acoustic features; the accuracy obtained is 99.47% for KNN and 95.24% for CNN. A comparison of KNN and CNN is also presented in the paper.
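As a rough illustration of the feature-and-classifier pipeline the abstract describes, the sketch below extracts inter-phase difference (IPD) features from a two-channel recording and fits a k-nearest-neighbours classifier over quantised source angles. The frame sizes, angle grid, and placeholder data are assumptions for illustration, not the exact pipeline used in this paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def ipd_features(left, right, n_fft=512, hop=256):
    """Average inter-phase difference per frequency bin over STFT frames
    (a naive frame mean; circular statistics are omitted for brevity)."""
    n_frames = 1 + (len(left) - n_fft) // hop
    window = np.hanning(n_fft)
    ipds = []
    for t in range(n_frames):
        seg_l = left[t * hop:t * hop + n_fft] * window
        seg_r = right[t * hop:t * hop + n_fft] * window
        L, R = np.fft.rfft(seg_l), np.fft.rfft(seg_r)
        ipds.append(np.angle(L * np.conj(R)))   # phase difference, wrapped to (-pi, pi]
    return np.mean(ipds, axis=0)

rng = np.random.default_rng(0)

# Example IPD vector from a synthetic stereo pair (placeholder signals).
left = rng.standard_normal(16000)
right = np.roll(left, 2)                        # crude 2-sample inter-channel delay
print(ipd_features(left, right).shape)          # -> (257,)

# Placeholder training data: in practice each row would be ipd_features() of a
# labelled recording and each label a quantised source angle (0-180 degrees here).
X = rng.standard_normal((200, 257))
y = rng.choice(np.arange(0, 181, 10), size=200)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)
print(knn.predict(X[:3]))
```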
Date of Conference: 01-02 September 2023
Date Added to IEEE Xplore: 17 October 2023
Conference Location: Bengaluru, India

I. Introduction

Sound Source Localization (SSL) is the task of finding the specific position or direction from which a sound originates. It requires analysing a sound source's location and angle with reference to a microphone or sensor network. SSL is essential to applications that improve user experiences, such as audio scene analysis, voice recognition, and acoustic tracking. Using a pair of microphones and the direction of arrival (DOA) of sound waves, the location of the sound source can be ascertained. Based on differences in the arrival time, amplitude, and frequency content of the sound signals recorded by the sensors, several techniques, such as time delay estimation, phase analysis, intensity disparities, and spectral cues, are used to pinpoint the source's position.

In [1], SSL for Indoors using Deep Learning (SSLIDE), based on a CNN and a DNN with an encoder and two decoders, is proposed, where prediction of the location and mitigation of multipath artefacts are performed simultaneously. The results show that SSLIDE outperforms CNN in terms of MAE (mean absolute error). In [2], GCC-PHAT (Generalized Cross-Correlation with Phase Transform) combined with room impulse responses (RIR) is used for feature extraction to train a 3D CNN. TIMIT speech data is used, and a microphone array of six omnidirectional microphones forming a hexagon with a side length of 0.15 m is considered. The model is tested with different reverberation times and SNRs and exhibits 95.8% accuracy, making CNN a strong candidate algorithm for SSL. In [3], a technique that differs from blind source separation (BSS) but accurately separates sources is introduced. Delay-and-sum beamforming is used for source identification, and side lobes are disregarded to provide a more precise signal; each sound source's signal can then be acquired through signal reconstruction. In [4], a microphone array with 4 microphones and a 2 cm radius and one with 12 microphones and an 11.9 cm radius were tested in the circular harmonic domain. The directivity of the 12-microphone, 11.9 cm array was found to be better, but the authors focused on the 4-microphone, 2 cm arrangement. Sensitivity to reverberation and noise degrades DOA estimation, although the deep-learning SSL approach with the CH-E-MMP-CNN method gives a remarkable accuracy of 86.36% and good stability. In [5], a DSCNN was used for the first time for SSL. The TIMIT Acoustic-Phonetic Continuous Speech Corpus was used with a tetrahedral microphone array of 4 cardioid microphones. 3D and 2D CNNs together with an RNN were employed, and it was observed that the 3D CNN achieves the highest accuracy, with DOA errors of 9.39 and 10.29 degrees, but results in very high model complexity, whereas the DSCNN maintains a balance between accuracy and complexity. The authors in [6] used convolutional neural networks (CNNs) to localise responses with the lowest variance and least distortion. By boosting data fusion and reducing the negative effects of noise and reverberation artefacts on the localization technique, that research examines the application of CNNs to improve direction-of-arrival estimation using a ULA in noisy and reverberant environments. As in [6], the method in [7] also introduces 3D SSL using a tetrahedral microphone array and a CNN with STFT phase input features, and it is evaluated on a semi-synthetic audio data set. The experiment yields at least 31% lower MAE (mean absolute error) for SSL; for active speech, the azimuthal MAE is 18.97% and the elevation MAE is 48.49 degrees, which is very low. Furthermore, the method can be extended to multiple active sound sources.
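Since several of the cited systems ([2], [11]) build on GCC-PHAT and time-delay estimation, the following minimal sketch shows how the delay between two microphones can be estimated with GCC-PHAT and converted to a far-field DOA. The sampling rate, 0.15 m spacing, speed of sound, and synthetic test signal are assumptions for illustration, not the configurations used in the cited works.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None, interp=16):
    """Estimate the time delay (seconds) of sig relative to ref via GCC-PHAT."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12                      # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=interp * n)
    max_shift = interp * n // 2
    if max_tau is not None:
        max_shift = min(int(interp * fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(interp * fs)

fs, d, c = 16000, 0.15, 343.0                   # assumed: 16 kHz, 0.15 m spacing
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
mic1 = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(fs)
mic2 = np.roll(mic1, 3)                         # simulate a 3-sample inter-mic delay

tau = gcc_phat(mic2, mic1, fs, max_tau=d / c)
theta = np.degrees(np.arcsin(np.clip(tau * c / d, -1.0, 1.0)))
print(f"estimated delay {tau * 1e6:.0f} us -> DOA {theta:.1f} degrees")
```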
The paper in [8], which heavily influences this comparative analysis, uses an 8-microphone uniform linear array (ULA) and proposes a CNN-based classification method for broadband DOA estimation of a single continuous sound source under noisy and reverberant conditions. In contrast to our suggested algorithm, that system is trained using synthetic noise signals rather than real-time sound sources as inputs, and it performs very well in terms of accuracy. In [9], DOA estimation for a single sound source uses the designated phase difference between the signals for each direction and each microphone, an approach previously employed by the same authors and also used in [7] and [8]. In [10], CNNs were employed for MVDR-based (minimum variance distortionless response) localization methods, concentrating in particular on the SRP-WMVDR (steered response power weighted MVDR) beamformer to improve accuracy in circumstances with a single source and no interferences. By properly allocating component weights, the CNNs successfully boosted coherent frequency fusion of the narrowband response power, improving localization performance and reducing noise and reverberation artefacts. In [11], using convolutional recurrent neural networks (CRNNs) and time difference of arrival (TDOA) estimation on a 4-microphone array, the study offers a system for sound source localization and identification. By integrating these two strategies, the suggested method outperforms the DCASE 2019 baseline system in recognising and localising sound events from multichannel recordings.
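To make the CNN-as-classifier formulation in [8] concrete, the sketch below defines a small convolutional network that maps an STFT phase map from an 8-microphone ULA to one of a set of discretised DOA classes. The layer sizes, the 37-class 5-degree grid, and the 257-bin input resolution are illustrative assumptions, not the architecture reported in [8].

```python
import torch
import torch.nn as nn

N_MICS, N_BINS, N_CLASSES = 8, 257, 37   # assumed: 0-180 degrees in 5-degree steps

class DoaCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # Treat the phase map as a 1-channel "image" of size N_MICS x N_BINS.
            nn.Conv2d(1, 64, kernel_size=(2, 3), padding=(0, 1)), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=(2, 3), padding=(0, 1)), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=(2, 3), padding=(0, 1)), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * (N_MICS - 3) * N_BINS, 512), nn.ReLU(),
            nn.Linear(512, N_CLASSES),               # one logit per DOA class
        )

    def forward(self, phase_map):                    # (batch, 1, N_MICS, N_BINS)
        return self.classifier(self.features(phase_map))

model = DoaCNN()
dummy = torch.randn(4, 1, N_MICS, N_BINS)            # stand-in phase maps
print(model(dummy).shape)                             # -> torch.Size([4, 37])
```

Framing DOA estimation as classification over a discrete angle grid, as in [8], lets the network output a posterior over candidate directions instead of regressing a single angle.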

