Abstract:
We present PEFAC, a fundamental frequency estimation algorithm for speech that is able to identify voiced frames and estimate pitch reliably even at negative signal-to-noise ratios. The algorithm combines a normalization stage, to remove channel dependency and to attenuate strong noise components, with a harmonic summing filter applied in the log-frequency power spectral domain, the impulse response of which is chosen to sum the energy of the fundamental frequency harmonics while attenuating smoothly-varying noise components. Temporal continuity constraints are applied to the selected pitch candidates and a voiced speech probability is computed from the likelihood ratio of two classifiers, one for voiced speech and one for unvoiced speech/silence. We compare the performance of our algorithm with that of other widely used algorithms and demonstrate that it performs well in both high and low levels of additive noise.
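The harmonic-summing idea can be illustrated with a minimal sketch. This is not the PEFAC implementation: it omits the normalization stage, the temporal continuity constraints, and the voiced/unvoiced classifiers, and it replaces the paper's filter with a simple zero-mean impulse train, which is an assumption made here for illustration. The function name, the parameter defaults, and the synthetic test frame are likewise hypothetical. On a log-frequency axis the harmonics of any fundamental sit at fixed offsets log2(k) above it, so a single kernel can be slid along the axis to score every pitch candidate; subtracting the kernel mean makes a smoothly varying noise floor contribute roughly nothing.

```python
import numpy as np


def estimate_f0_log_harmonic_sum(frame, fs, f_min=60.0, f_max=400.0,
                                 n_harmonics=8, bins_per_octave=96):
    """Rough F0 estimate (Hz) for one frame via log-frequency harmonic summing.

    Illustrative sketch only; all names and defaults are assumptions.
    """
    if f_max * n_harmonics > fs / 2:
        raise ValueError("highest harmonic of f_max must lie below Nyquist")

    # Power spectrum of the Hann-windowed, zero-padded frame.
    n_fft = 4 * len(frame)
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    # Resample the power spectrum onto a log-spaced frequency grid that
    # runs from f_min up to the highest harmonic of f_max.
    n_bins = int(np.ceil(bins_per_octave
                         * np.log2(f_max * n_harmonics / f_min))) + 1
    log_freqs = f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    power_logf = np.interp(log_freqs, freqs, power)

    # Kernel with unit impulses at the log2(k) offsets of harmonics 1..K.
    harmonics = np.arange(1, n_harmonics + 1)
    offsets = np.round(bins_per_octave * np.log2(harmonics)).astype(int)
    kernel = np.zeros(offsets[-1] + 1)
    kernel[offsets] = 1.0
    # Zero-mean kernel: a flat or slowly varying spectrum scores near zero.
    kernel -= kernel.mean()

    # scores[i] sums harmonic energy for the candidate F0 log_freqs[i].
    scores = np.correlate(power_logf, kernel, mode="valid")
    return float(log_freqs[np.argmax(scores)])


# Usage: a synthetic 90 ms frame with eight harmonics of 200 Hz.
fs = 16000
t = np.arange(1440) / fs
frame = sum(np.sin(2 * np.pi * 200.0 * k * t) / np.sqrt(k) for k in range(1, 9))
f0 = estimate_f0_log_harmonic_sum(frame, fs)
```

The estimate is quantized to the log-frequency grid, so `bins_per_octave` sets the achievable pitch resolution. An equal-weight impulse train is also prone to octave errors on real speech, which is one reason a practical estimator would add the further stages the abstract describes.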
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing (Volume: 22, Issue: 2, February 2014)