Fundamental frequency estimation based on the joint time-frequency analysis of harmonic spectral structure


Abstract:

In this paper, we propose a new scheme that analyzes the spectral structure of speech signals for fundamental frequency estimation. First, we propose a pitch measure that detects the harmonic characteristics of voiced sounds in the spectrum of a speech signal. This measure relies on two properties: distinct impulses are located at the fundamental frequency and its harmonics, and the energy of a voiced sound is dominated by the energy of these harmonic impulses. The spectrum can be obtained by the fast Fourier transform (FFT); however, its harmonic structure may be destroyed when the speech is corrupted by additive noise. To make the proposed scheme more robust in noisy environments, we apply the joint time-frequency analysis (JTFA) technique to obtain an adaptive representation of the spectrum of speech signals. The adaptive representation accurately extracts the important harmonic structure of noisy speech signals, but at a high computational cost. To address this cost, we further propose a fast adaptive representation (FAR) algorithm, which reduces the computational complexity of the original algorithm by 50%. The proposed fundamental frequency estimation scheme is evaluated on a large database, both with and without additive noise, and compared with other approaches on the same database. The experimental results show that the proposed scheme performs well on clean speech and is robust in noisy environments.
Published in: IEEE Transactions on Speech and Audio Processing (Volume: 9, Issue: 6, September 2001)
Page(s): 609 - 621
Date of Publication: 30 September 2001
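
The abstract describes a pitch measure built on two properties of voiced speech: spectral peaks sit at the fundamental frequency and its harmonics, and those peaks carry most of the energy. The sketch below is a simplified stand-in for that idea, not the paper's measure and not its JTFA or FAR algorithms. It scores each candidate fundamental by the fraction of FFT power found at its first few multiples. All names (`harmonic_energy_ratio`, `estimate_f0`, `synth_voiced`) and all parameter values (five harmonics, a 60 to 400 Hz search range, a 4096-point FFT, an 8 kHz sampling rate) are assumptions chosen for illustration.

```python
import numpy as np


def harmonic_energy_ratio(power, fs, nfft, f0, n_harmonics=5):
    """Fraction of total spectral power at the first multiples of f0.

    Illustrative measure only; the paper's own pitch measure is not
    reproduced here.
    """
    harmonics = f0 * np.arange(1, n_harmonics + 1)
    harmonics = harmonics[harmonics < fs / 2]           # stay below Nyquist
    bins = np.round(harmonics * nfft / fs).astype(int)  # nearest FFT bins
    total = power.sum()
    return power[bins].sum() / total if total > 0 else 0.0


def estimate_f0(frame, fs, fmin=60.0, fmax=400.0, nfft=4096):
    """Return (f0_estimate_hz, score) for one frame of speech samples."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed, n=nfft)) ** 2  # zero-padded spectrum
    candidates = np.arange(fmin, fmax + 1.0, 1.0)       # 1 Hz search grid
    scores = np.array(
        [harmonic_energy_ratio(power, fs, nfft, f) for f in candidates]
    )
    best = int(np.argmax(scores))
    return float(candidates[best]), float(scores[best])


def synth_voiced(f0, fs=8000, n=512, n_harmonics=5):
    """Synthetic 'voiced' frame: harmonics of f0 with 1/k amplitudes."""
    t = np.arange(n) / fs
    return sum(np.cos(2 * np.pi * k * f0 * t) / k
               for k in range(1, n_harmonics + 1))


if __name__ == "__main__":
    fs = 8000
    clean = synth_voiced(200.0, fs=fs)
    rng = np.random.default_rng(0)
    noisy = clean + 0.1 * rng.standard_normal(len(clean))  # additive noise
    print("clean:", estimate_f0(clean, fs))
    print("noisy:", estimate_f0(noisy, fs))
```

Using a fixed number of harmonics is a deliberate choice in this sketch: a candidate at half the true fundamental then covers fewer of the real peaks than the true fundamental does, which reduces subharmonic errors. Because the score reads single FFT bins, it depends directly on the spectral peaks surviving the noise, which is the weakness of a plain FFT spectrum that the abstract points to.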

I. Introduction

The estimation of fundamental frequency is an essential component of many speech processing systems, such as speech analysis-synthesis systems and speech coding systems [1], [2]. The fundamental frequency contour (i.e., the pitch contour) also plays an important role in language communication [3]–[6]. Although the fundamental frequency can often be observed by visual inspection, estimating it is difficult for several reasons. First, voiced speech is not a perfectly periodic waveform, because the fundamental frequency varies and the vocal tract moves. Second, it is difficult to estimate the fundamental frequency of low-level voiced speech at the beginning and end of a voiced segment. Third, estimation performance degrades when the speech signal is corrupted by noise.
