
Speech Synthesis Based on Hidden Markov Models


Abstract:

This paper gives a general overview of hidden Markov model (HMM)-based speech synthesis, which has recently been demonstrated to be very effective in synthesizing speech. The main advantage of this approach is its flexibility in changing speaker identities, emotions, and speaking styles. This paper also discusses the relation between the HMM-based approach and the more conventional unit-selection approach that has dominated over the last decades. Finally, advanced techniques for future developments are described.
Published in: Proceedings of the IEEE ( Volume: 101, Issue: 5, May 2013)
Page(s): 1234 - 1252
Date of Publication: 09 April 2013


I. Introduction

Text-to-speech (TTS) synthesis is a technique for generating intelligible, natural-sounding artificial speech for a given input text. It has been used widely in various applications including in-car navigation systems, e-book readers, voice-over functions for the visually impaired, and communication aids for the speech impaired. More recent applications include spoken dialog systems, communicative robots, singing speech synthesizers, and speech-to-speech translation systems.
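At a high level, a TTS system of the kind described above runs text through three stages: a front end that turns raw text into a linguistic specification, an acoustic model that maps that specification to speech parameters, and a waveform generator (vocoder). The toy sketch below illustrates only this stage structure; the lexicon, parameter values, and function names are illustrative assumptions, not the system described in this paper.

```python
# Toy sketch of the three canonical TTS pipeline stages.
# All names and values here are illustrative placeholders.

def text_analysis(text):
    """Front end: convert raw text into a linguistic specification,
    here a toy phone sequence from a one-word lexicon."""
    toy_lexicon = {"hello": ["HH", "AH", "L", "OW"]}
    phones = []
    for word in text.lower().split():
        phones.extend(toy_lexicon.get(word, ["?"]))
    return phones

def acoustic_model(phones):
    """Statistical back end: map each phone to speech parameters.
    Stand-in values: (duration in seconds, pitch in Hz)."""
    return [(0.1, 120.0) for _ in phones]

def vocoder(params):
    """Waveform generation stage, reduced here to reporting the
    total utterance duration instead of synthesizing a signal."""
    return sum(duration for duration, _pitch in params)

phones = text_analysis("hello")
params = acoustic_model(phones)
total_duration = vocoder(params)
```

The unit-selection and HMM-based approaches contrasted in the abstract differ mainly in the middle and final stages: unit selection concatenates stored waveform segments, while the HMM-based approach generates parameters from trained statistical models and feeds them to a vocoder.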
