Introduction
Recent developments in speech signal processing have shown numerous clinical applications for the non-invasive diagnosis of diseases, enabling effective remote health monitoring and remote healthcare facilities [1], [2], [3], [4]. In the Coronavirus Disease 2019 (COVID-19) pandemic scenario, such a speech-based remote health monitoring system can play a crucial role. According to World Health Organization data, more than 579 million people had been infected, including over six million reported deaths, as of August 8, 2022 [5]. The standard and reliable test for COVID-19 is the reverse transcription-polymerase chain reaction (RT-PCR) test, which is expensive (about US $125 per test package, and over US $15,000 to set up a processing lab) and time-consuming (4–6 hours of processing time, and a turn-around of 2–4 days, including shipping) [6]. To deal with this challenging situation, large-scale testing is required for isolating infected individuals and for contact tracing [7]. Under this scenario, speech-based COVID-19 detection (CD) is one of the simplest, safest, and most cost-effective methods [8].
Several temporal and spectral acoustic features have been used as inputs to a random forest model to classify speech into nine categories: shallow and deep breathing, shallow and heavy cough, sustained vowel phonation (/o/, /e/, /a/), and normal and fast counting [9]; a detection accuracy of 66.74% is reported in that study. In [10], respiratory sounds such as cough and breathing are employed to distinguish COVID-19 from asthma using 733-dimensional features comprising 477 handcrafted features and 256 VGGNet-based features; a logistic regression classifier yields an area under the receiver operating characteristic curve (ROC-AUC) above 80%. CD from openly available speech data has been carried out using phoneme-level analysis, Mel filter bank features, and an SVM classifier, achieving a reported accuracy of 88.6% on a limited set of 19 speakers [11]. An automated machine learning-based COVID-19 classification model has been developed using glottal, prosodic, and spectral features from short-duration speech segments, yielding a classification accuracy of 80% [12]. Modified cepstral features extracted from two speech databases and fed to support vector machine (SVM) classifiers achieve a maximum accuracy of 85% [13]. Transfer learning-based deep neural network classifiers have been applied to cough, breath, and speech for CD, with ROC-AUCs of 0.982, 0.942, and 0.923, respectively [14]. Several machine learning algorithms have been analyzed for mobile health CD solutions, with the SVM technique providing the highest accuracy of 97% on the Coswara database [15]. A mobile application combining a symptom checker with voice, breath, and cough signals has been developed using deep CNNs and gradient boosting for robust performance on openly sourced and noisy datasets [16].
Even though several speech-based CD methods have been proposed, there is still scope for improvement in detection accuracy, computational complexity, and testing on multiple datasets across different categories of speech. Since early CD is essential, higher and more reliable detection accuracy is very important, as it would drastically reduce the spread of the disease and the associated medical emergencies. Additionally, many researchers have focused on chest X-rays for CD using several image processing techniques [17], [18], [19], [20], [21]. Although these achieve superior accuracy, the acquisition of chest X-rays is a cumbersome task: a physical visit, a well-trained technician for successful data acquisition, and a medical practitioner are all required. In light of these considerations, the current research focuses on the development of an improved speech-based CD system. For efficient extraction of information from the speech samples, an effective combination of speech features is used in this paper along with the Light Gradient Boosting Machine, which was proposed by Microsoft in 2016 [22]. It provides improved training performance with minimal memory requirements, parallel processing ability, and the capacity to handle large-scale data compared to traditional machine learning algorithms. In recent years, it has been employed for genomics data analysis [23], speech processing [16], image processing [24], arrhythmia detection [25], and other tasks. Because of these advantages, the gradient boosting technique is chosen in the current implementation to achieve better classification performance. The main research contributions of the paper are listed below:
Application of intelligent preprocessing techniques to bring the speech quality of the different real-life recordings to comparable acoustic levels.
Extraction of spectral, cepstral, and periodicity features at the frame level, efficiently combined into high-dimensional relevant audio features at the sample level, to accurately detect several respiratory diseases including COVID-19 and Asthma.
Development of a Gradient Boosting Machine classifier and comparison of the detection performance metrics of the proposed method with those obtained from standard methods using five datasets in thirteen different categories.
Assessment of the generalization ability of the proposed model, which can be deployed as a clinical application wherein the model is trained on a large number of speech samples from the cough category of multiple datasets and can then predict a patient's condition from his/her cough sound.
The paper is organized into four sections. Section I deals with the introduction, literature review, motivations, and objectives of the investigation. The details of the materials and methods employed are dealt with in Section II. Section III contains an analysis of results and contributions in terms of research findings. The outcome of the research, limitations, and future research scope are presented in Section IV.
Materials and Methods
The block diagram of the proposed speech-based COVID-19 detection scheme is presented in Fig. 1, consisting of the following steps: dataset collection, preprocessing and feature extraction, scaling of features, classification model training and validation, and performance evaluation.
A. Datasets
Five datasets have been used to evaluate the performance of the suggested model in this study: Coswara (Dataset-1) [9], the crowdsourced respiratory dataset of the University of Cambridge (Dataset-2) [10], Virufy (Dataset-3) [26], recorded interviews from online platforms in telephone-quality speech (Dataset-4) [11], and Coughvid (Dataset-5) [7]. Of these, Dataset-2 is used for both binary (COVID-19 positive and healthy) and multi-class classification (COVID-19 positive, Asthma positive, and healthy), whereas Datasets 1, 3, 4, and 5 are used for the binary classification task. These datasets contain speech samples of subjects from more than 50 countries. The dataset preparation follows a standard technique, as shown in Fig. 2. Due to the highly contagious nature of COVID-19, the speech samples in most of these datasets are recorded online using mobile or web-based applications [7], [9], [10], [11], [26]. Along with the audio samples, the COVID-19 status, location, gender, age, and health conditions of the patients are also stored. Brief details of these five datasets are listed in Table I. A total of 4178 speech samples have been used in the simulation study. Complete details of these datasets are given in supplementary information S1.
B. Preprocessing
Speech preprocessing is critical to the overall success of developing a robust and efficient speech recognition system [27]. When speech is recorded by different users in different environments, the speech quality varies drastically within one category of a dataset as well as across datasets [28]. The background noise level significantly affects the overall performance of a speech recognition system [29], [30]. For highly non-stationary situations, the noise level is computed using the noise estimation algorithm of [31]. To evaluate the effect of preprocessing, the variation in noise level and the coefficient of variation are plotted in Figures 3 and 4 for the two cases, before and after preprocessing. The coefficient of variation measures the variation in noise level as the ratio between the standard deviation and the mean of the estimated noise levels for one class [32]. For noise level estimation, the cough category sound is used for Datasets 1, 2, 3, and 5, and complete-sentence sounds for Dataset-4. The steps involved in preprocessing are described below.
Fig. 4. Change in coefficient of variation (CV) of noise level between the positive and negative classes.
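The coefficient-of-variation measure plotted above is straightforward to compute; the following is a minimal sketch (the function name and the use of the sample standard deviation are our own choices, not specified in the paper):

```python
import numpy as np

def coefficient_of_variation(noise_levels):
    """CV of the estimated noise levels of one class: std / mean [32]."""
    levels = np.asarray(noise_levels, dtype=float)
    return levels.std(ddof=1) / levels.mean()
```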
1) Low Pass Filtering
The sampling frequency of the speech signals differs across datasets. However, significant information is found within an 8 kHz bandwidth [33], as is also evident from Fig. 5, where the time-frequency representation of one cough signal of Dataset-2 is plotted using the spectrogram. To remove unwanted signal components not associated with human speech, all audio signals are passed through a low-pass filter with a 10 kHz cutoff. To maintain a uniform sampling rate and extract the same number of features for each frame, all speech signals are resampled at the maximum sampling frequency available across the datasets (48 kHz).
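As an illustration, this step could be realized as in the sketch below. This is not the authors' code: the filter order (4) and the choice of a zero-phase Butterworth filter are our assumptions, since the paper specifies only the 10 kHz cutoff and the 48 kHz target rate.

```python
import numpy as np
from fractions import Fraction
from scipy.signal import butter, filtfilt, resample_poly

TARGET_FS = 48_000   # common sampling rate across all datasets
CUTOFF_HZ = 10_000   # low-pass cutoff retaining the speech band

def lowpass_and_resample(x: np.ndarray, fs: int) -> np.ndarray:
    """Low-pass filter at 10 kHz (when the rate permits) and resample to 48 kHz."""
    if fs > 2 * CUTOFF_HZ:  # cutoff must lie below the Nyquist frequency
        b, a = butter(4, CUTOFF_HZ, btype="low", fs=fs)  # order 4 is an assumption
        x = filtfilt(b, a, x)  # zero-phase filtering avoids group delay
    # Rational resampling to the common 48 kHz rate.
    frac = Fraction(TARGET_FS, int(fs)).limit_denominator(1000)
    return resample_poly(x, frac.numerator, frac.denominator)
```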
2) Speech Enhancement
The multi-band spectral subtraction approach has been employed to denoise the speech samples of all five datasets [34]. This is a simple and effective method for denoising signals affected by colored noise, in which spectral subtraction is performed separately in different frequency bands.
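To make the mechanism concrete, a simplified single-band version is sketched below; the multi-band method of [34] applies the same subtraction per frequency band with band-specific over-subtraction factors. The STFT size, the noise-only leading segment, and the spectral floor are all illustrative assumptions, not parameters from the paper.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(x, fs, noise_dur=0.25, floor=0.02, nperseg=512):
    """Single-band spectral subtraction: subtract an average noise
    magnitude spectrum (estimated from the first `noise_dur` seconds,
    assumed noise-only) from every frame, then resynthesize."""
    _, _, X = stft(x, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(X), np.angle(X)
    hop = nperseg // 2                       # stft default: 50% overlap
    n_noise = max(1, int(noise_dur * fs / hop))
    noise_mag = mag[:, :n_noise].mean(axis=1, keepdims=True)
    clean = np.maximum(mag - noise_mag, floor * mag)  # spectral floor
    _, y = istft(clean * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return y
```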
3) Voice Activity Detection and Dynamic Level Control
To separate the voiced frames from the unvoiced frames, a simple short-term energy-based voice activity detection (VAD) algorithm is used. The voiced frames are then passed through a Dynamic Level Controller (DLC). It is made up of an expander and a compressor, with the expander boosting low signal levels and the compressor lowering peak levels [35].
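A minimal sketch of such an energy-based VAD is given below; the frame length and the relative threshold are illustrative choices, as the paper does not report its VAD parameters, and dynamic level control is omitted here.

```python
import numpy as np

def energy_vad(x, fs, frame_ms=25, threshold_ratio=0.05):
    """Mark frames as voiced when their short-term energy exceeds a
    fixed fraction of the maximum frame energy (threshold is assumed)."""
    frame_len = int(fs * frame_ms / 1000)
    hop = frame_len // 2  # 50% overlap, as used for feature extraction
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    energy = np.array([
        np.sum(x[i * hop : i * hop + frame_len] ** 2) for i in range(n_frames)
    ])
    return energy > threshold_ratio * energy.max()
```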
C. Features Extraction
In this section, the details of the audio feature extraction techniques used in the investigation are presented. Numerous audio features are extracted at the frame and sample levels in the frequency, structural, statistical, and temporal domains. The complete recording of a single user in one category comprises one sample, while a frame is a subset of the audio data in a sample. Assuming there are n frames in each sample, the frame-level features are described below. The features are named f followed by a serial number, from f1 to f5701.
Spectral Features — The speech signal is non-stationary, but its properties remain approximately constant over short intervals of 10–30 ms. The short-time spectral features are obtained by converting the time-domain signal into the frequency domain using different transform techniques. These features capture spectral information, which plays an important role in speech recognition [36]. In this work, the Hamming window is chosen as it produces less spectral leakage, its side lobes being lower than those of other windows [37]. A window of 25 ms duration with 50% overlap between successive frames is used. The spectral features extracted are: Linear Spectrum (n×512), Mel Spectrum (n×32), Bark Spectrum (n×32), and Equivalent Rectangular Bandwidth (ERB) Spectrum (n×44). Therefore, the total dimension of the spectral features is (n×620).
Cepstral Features — The cepstral features help in extracting relevant speech information for speech emotion recognition tasks by using filter banks based on human speech perception [13]. The cepstral features are Mel-frequency cepstral coefficients (MFCC), MFCC Delta, MFCC Delta-Delta, Gammatone cepstral coefficients (GTCC), GTCC Delta, and GTCC Delta-Delta, each of dimension (n×13). Therefore, the total dimension of the cepstral features is (n×78).
Spectral Descriptors — These features summarize the high-dimensional spectral features statistically. They are widely used in speaker, music, and mood recognition and classification tasks [38]. The spectral descriptors used are: Centroid, Crest, Decrease, Entropy, Flatness, Flux, Kurtosis, Roll-off Point, Skewness, Slope, and Spread, each of dimension (n×1). The total dimension of the spectral descriptors is (n×11).
Periodicity Features — These features provide important time-domain information about speech, which helps in monaural speech analysis [39]. The features used are: Pitch (n×1) and Harmonic Ratio (n×1).
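Although the extraction in this work is done in MATLAB (as noted next), a small Python analogue via librosa conveys the setup; only a few of the 712 per-frame features are sketched, and the Mel band count, pitch range, and pitch estimator shown are assumptions chosen to mirror the dimensions quoted above.

```python
import numpy as np
import librosa

def frame_level_features(x, sr, frame_ms=25):
    """Sketch of frame-level extraction: 32-band Mel spectrum, 13 MFCCs,
    spectral centroid, and pitch, with a 25 ms Hamming window and 50% overlap."""
    n_fft = int(sr * frame_ms / 1000)
    hop = n_fft // 2  # 50% overlap between successive frames
    mel = librosa.feature.melspectrogram(
        y=x, sr=sr, n_fft=n_fft, hop_length=hop, window="hamming", n_mels=32)
    mfcc = librosa.feature.mfcc(S=librosa.power_to_db(mel), n_mfcc=13)
    centroid = librosa.feature.spectral_centroid(
        y=x, sr=sr, n_fft=n_fft, hop_length=hop, window="hamming")
    pitch = librosa.yin(x, fmin=60, fmax=400, sr=sr,  # pitch range is assumed
                        frame_length=n_fft, hop_length=hop)
    return np.vstack([mel, mfcc, centroid, pitch[None, :]]).T  # (n_frames, d)
```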
For this purpose, the MATLAB-based audioFeatureExtractor is used [40], [41]. The fusion of spectral features, cepstral features, spectral descriptors, and periodicity features yields an (n×712)-dimensional feature matrix for each speech sample. As the number of frames varies from sample to sample, training a machine learning model directly on frame-level features is difficult. Therefore, in this work, statistical measures are computed at the sample level, providing a fixed-length feature vector for each sample. To capture the statistical distribution at the sample level, several statistical features are computed from the frame-level features [10]. The sample-level features are: mean (f1:f712), median (f713:f1424), root-mean-square (RMS) (f1425:f2136), maximum (f2137:f2848), minimum (f2849:f3560), quartiles (1st and 3rd quartile, interquartile range) (f3561:f3563), standard deviation (SD) (f3564:f4275), skewness (f4276:f4987), and kurtosis (f4988:f5699) of all frame-level features. In addition, the zero crossing rate (ZCR) (f5700) and short-time energy (STE) (f5701) are calculated per sample. Each combined feature vector is the concatenation of these sample-level features, giving a 5701-dimensional feature vector. Outliers in the high-dimensional feature vector can impair the learning algorithm's performance; feature scaling is therefore an important preprocessing step. The robust scaler removes the median and scales the data according to the quantile range, suppressing the effect of outliers in the features [42].
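The sample-level pooling can be sketched as follows. The exact quartile handling of f3561:f3563 is not fully specified in the text, so only the per-dimension functionals are shown, with scikit-learn's RobustScaler standing in for the scaling step.

```python
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.preprocessing import RobustScaler

def pool_sample_features(F):
    """Collapse an (n_frames x 712) frame-level matrix into a fixed-length
    sample-level vector of statistical functionals."""
    stats = [
        F.mean(axis=0),
        np.median(F, axis=0),
        np.sqrt((F ** 2).mean(axis=0)),   # RMS
        F.max(axis=0),
        F.min(axis=0),
        F.std(axis=0),                    # SD
        skew(F, axis=0),
        kurtosis(F, axis=0),
    ]
    return np.concatenate(stats)

# Example use (all_samples is a hypothetical list of frame-level matrices):
# X = np.vstack([pool_sample_features(F) for F in all_samples])
# X_scaled = RobustScaler().fit_transform(X)   # median/IQR scaling [42]
```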
D. LightGBM (LGM)
The LGM is an effective gradient boosting decision tree with gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB) to increase computational efficiency without affecting accuracy [22]. The steps involved in LGM modeling are: (i) defining the loss function, (ii) performing GOSS sampling and identifying the optimal segmentation point using a histogram-based algorithm, (iii) reducing the feature dimension by the EFB method, (iv) applying the leaf-wise algorithm to combine the samples to fit residuals, and (v) splitting the nodes based on the objective function and generating a decision tree.
Let us consider X as the input feature vector and Y as the class labels. The aim of LGM is to determine the approximation function that minimizes the expected loss
\begin{align*}
\widehat{F}(x)=\underset{F}{\arg\min}\;E_{x,y}\left[L(y,F(x))\right]. \tag{1}
\end{align*}
The model is built additively as an ensemble of M base learners,
\begin{align*}
F_{M}(X)=\sum\limits_{m=1}^{M}F_{m}(X). \tag{2}
\end{align*}
At iteration m, the loss is expanded around the current model $F_{m-1}$ using a second-order Taylor approximation, where $g_{i}$ and $h_{i}$ denote the first- and second-order gradients of the loss with respect to the prediction for sample i:
\begin{align*}
\tau_{m}&=\sum\limits_{i=1}^{n}L\left(y_{i},\;F_{m-1}(x_{i})+F_{m}(x_{i})\right)\\
&\cong\sum\limits_{i=1}^{n}\left(g_{i}F_{m}(x_{i})+\frac{1}{2}h_{i}F_{m}^{2}(x_{i})\right). \tag{3}
\end{align*}
Rewriting the sum over the J leaves of the tree, with $I_{j}$ the set of samples in leaf j, $w_{j}$ the leaf weight, and $\lambda$ a regularization parameter,
\begin{align*}
\tau_{m}=\sum\limits_{j=1}^{J}\left(\left(\sum\limits_{i\in I_{j}}g_{i}\right)w_{j}+\frac{1}{2}\left(\sum\limits_{i\in I_{j}}h_{i}+\lambda\right)w_{j}^{2}\right). \tag{4}
\end{align*}
Minimizing (4) with respect to $w_{j}$ gives the optimal weight of each leaf,
\begin{align*}
w_{j}^{\ast}=-\frac{\sum\nolimits_{i\in I_{j}}g_{i}}{\sum\nolimits_{i\in I_{j}}h_{i}+\lambda}. \tag{5}
\end{align*}
Finally, the gain of splitting a node with sample set I into left and right children $I_{L}$ and $I_{R}$ is
\begin{align*}
G=\frac{1}{2}\left(\frac{\left(\sum\nolimits_{i\in I_{L}}g_{i}\right)^{2}}{\sum\nolimits_{i\in I_{L}}h_{i}+\lambda}+\frac{\left(\sum\nolimits_{i\in I_{R}}g_{i}\right)^{2}}{\sum\nolimits_{i\in I_{R}}h_{i}+\lambda}-\frac{\left(\sum\nolimits_{i\in I}g_{i}\right)^{2}}{\sum\nolimits_{i\in I}h_{i}+\lambda}\right), \tag{6}
\end{align*}
and the split with the largest gain is selected.
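In code, the closed-form solutions (5) and (6) reduce to a few lines. The sketch below is our own illustration of these formulas, not LightGBM source code; it shows how a node's optimal weight and split gain follow directly from the summed gradient statistics.

```python
def leaf_weight(g_sum, h_sum, lam):
    """Optimal leaf weight w* from Eq. (5)."""
    return -g_sum / (h_sum + lam)

def split_gain(g_left, h_left, g_right, h_right, lam):
    """Split gain G from Eq. (6): loss reduction of splitting a node
    (with gradient sums g and Hessian sums h) into left/right children."""
    def score(g, h):
        return g ** 2 / (h + lam)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right))
```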
Results and Discussions
The performance of the proposed model is assessed for two tasks: (I) a binary classification task to predict whether a speech sample is COVID-19 positive or negative, and (II) a multiclass classification task to distinguish COVID-19 positive, Asthma positive, and healthy speech samples. To perform this, the speech samples are first passed through the preprocessing blocks, namely low-pass filtering, speech enhancement, voice activity detection, and dynamic level control; here, the preprocessing block is treated as part of feature extraction. A total of 5701 features are then extracted from each sample. These features are fed to the LGM classifier and to three baseline classifiers, Random Forest (RF) [9], SVM [10], [11], and K-Nearest Neighbor (KNN) [44], for the speech classification task. For the development of the classification models, a five-fold stratified cross-validation scheme is employed. Standard performance measures as reported in [45], namely Classification Accuracy (CA), F-2 Score (F-2), Precision (PR), Recall (RC), and area under the curve (AUC), are employed in this study. The details of the performance measures are described in supplementary information S2. Grid search is used to find the optimal parameters of the classifiers; these parameters are listed in supplementary information S3.
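A minimal sketch of this evaluation setup in Python is shown below. The hyperparameters are left at library defaults for brevity (the paper tunes them by grid search, listed in S3), and the input file names are placeholders.

```python
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder inputs: X is the (n_samples x 5701) feature matrix, y the labels.
X = np.load("features.npy")
y = np.load("labels.npy")

model = make_pipeline(
    RobustScaler(),                  # median/IQR scaling of the features
    LGBMClassifier(random_state=0),  # defaults; the paper grid-searches these
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"mean AUC over 5 folds: {scores.mean():.3f}")
```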
A. Performance Evaluation as a Binary Classification Task
The comparative study of the performance of the LGM, SVM, RF, and KNN classifiers for the binary classification task is presented in Tables II and III. The LGM classifier provides an average accuracy of 0.978, an F-2 Score of 0.979, and an AUC of 0.976 across all the categories in the five datasets. The average accuracy, F-2 Score, and AUC of the SVM classifier are 0.749, 0.717, and 0.712, respectively. Similarly, for the RF classifier, the average accuracy, F-2 Score, and AUC are found to be 0.967, 0.966, and 0.963, respectively. For the KNN classifier, the corresponding values are 0.753, 0.745, and 0.728. The results show that the LGM classifier performs better on the high-dimensional features than the SVM, RF, and KNN classifiers.
B. Performance Evaluation as a Three-Class Classification Task
To further evaluate the prediction ability of the classifiers, an assessment on multi-class data has been carried out for Dataset-2, which contains samples of COVID-19 positive, Asthma positive, and healthy subjects in the cough and breathing sound categories. The results are listed in Table IV. It is observed that the performance of the LGM classifier is superior in all performance measures compared to the SVM, RF, and KNN classifiers. ROC curves are two-dimensional plots that depict the relative trade-off between the true-positive and false-positive rates [45]. The ROC curves of Dataset-2 in the cough category are shown in Fig. 6 (binary) and Fig. 7 (multi-class). According to the ROC curves, the proposed approach has a high true-positive rate and a low false-positive rate. The AUC of the proposed model is 0.99, which is better than those of the RF, SVM, and KNN models. The proposed features with the additional preprocessing provide better results compared to standard features and classifiers.
Fig. 6. Comparison of ROC curves of different classifiers for binary classification in the cough category of Dataset-2.
Fig. 7. Comparison of ROC curves of different classifiers for multiclass classification in the cough category of Dataset-2.
C. Comparison With Baseline Models and Combined Datasets
A comparative analysis of the proposed model against the existing methods used on the five datasets is shown in Table V, with the improvement in detection performance noted in the last column. The proposed model shows consistent performance across all the datasets as well as on the combined dataset, with minimum improvements in CD performance of approximately 30%, 15%, 25%, 9%, and 20% for Datasets 1–5, respectively. To assess the generalization ability of the proposed model, a combined dataset is prepared from the cough-category speech signals of Datasets 1, 2, 3, and 5, containing a total of 1528 samples from the healthy category and 1344 samples from the COVID-19 positive category. The performance of all four methods is evaluated and the results are listed in Table VI; the proposed model shows the highest accuracy of 0.983 over the other three standard models.
Overall, the minimum CD performance of the proposed method is approximately 97% across all sound categories, databases, and cross-validation schemes.
D. Statistical Analysis of Classifier Models
The statistical comparison of the performance of the LGM model with the standard machine learning models SVM, RF, and KNN over the five datasets is listed in Table VII. For this purpose, the t-statistic between two classifiers is computed as in (7).
\begin{align*}
t=\frac{c_{1}-c_{2}}{\sqrt{v_{1}^{2}+v_{2}^{2}}} \tag{7}
\end{align*}
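The symbols in (7) are not defined explicitly in the text; reading c as the mean cross-validation score of a classifier and v as the corresponding standard error, the statistic can be computed as in the following sketch.

```python
import numpy as np

def t_statistic(scores_a, scores_b):
    """t-value of Eq. (7) for two classifiers' fold-wise scores, taking
    c as the mean score and v as the standard error (our interpretation)."""
    c1, c2 = np.mean(scores_a), np.mean(scores_b)
    v1 = np.std(scores_a, ddof=1) / np.sqrt(len(scores_a))
    v2 = np.std(scores_b, ddof=1) / np.sqrt(len(scores_b))
    return (c1 - c2) / np.sqrt(v1 ** 2 + v2 ** 2)
```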
Conclusion
In the current study, a non-invasive and effective respiratory disease detection scheme is developed and tested for COVID-19 and Asthma. The major contributions of the investigation are the use of improved preprocessing techniques and an effective combination of spectral, cepstral, and periodicity features, along with the implementation of a gradient boosting machine, for robust and consistent performance across multiple datasets. The proposed model can be used for early and fast automatic diagnosis of COVID-19 without the subject visiting a hospital and without the assistance of a medical professional. However, it is suggested that the detection produced by the proposed intelligent model be verified by a medical professional before a prescription is initiated. It may be noted that the proposed detection scheme involves considerable computation and training time; there is still room to reduce the method's computational complexity for faster implementations. The effective preprocessing techniques, as well as the combination of audio features, can be further implemented and tested for other speech-based recognition tasks, including emotion recognition, Parkinson's disease detection, and heart disease detection.
ACKNOWLEDGMENT
The authors express their gratitude to Professor Cecilia Mascolo, Department of Computer Science and Technology, and the Chancellor, Masters, and Scholars of the University of Cambridge for sharing the COVID-19 speech database [10].