
Speech Synthesis Based on Hidden Markov Models


Abstract:

This paper gives a general overview of hidden Markov model (HMM)-based speech synthesis, which has recently been demonstrated to be very effective in synthesizing speech. The main advantage of this approach is its flexibility in changing speaker identities, emotions, and speaking styles. This paper also discusses the relation between the HMM-based approach and the more conventional unit-selection approach that has dominated over the last decades. Finally, advanced techniques for future developments are described.
Published in: Proceedings of the IEEE ( Volume: 101, Issue: 5, May 2013)
Page(s): 1234 - 1252
Date of Publication: 09 April 2013


I. Introduction

Text-to-speech (TTS) synthesis is a technique for generating intelligible, natural-sounding artificial speech for a given input text. It has been used widely in various applications including in-car navigation systems, e-book readers, voice-over functions for the visually impaired, and communication aids for the speech impaired. More recent applications include spoken dialog systems, communicative robots, singing speech synthesizers, and speech-to-speech translation systems.
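At a high level, a TTS system of the kind described above runs text through three stages: a front end that turns raw text into a linguistic specification, an acoustic model that maps that specification to speech parameters, and a waveform generator (vocoder). The toy sketch below illustrates only this stage structure; the lexicon, parameter values, and function names are illustrative assumptions, not the system described in this paper.

```python
# Toy sketch of the three canonical TTS pipeline stages.
# All names and values here are illustrative placeholders.

def text_analysis(text):
    """Front end: convert raw text into a linguistic specification,
    here a toy phone sequence from a one-word lexicon."""
    toy_lexicon = {"hello": ["HH", "AH", "L", "OW"]}
    phones = []
    for word in text.lower().split():
        phones.extend(toy_lexicon.get(word, ["?"]))
    return phones

def acoustic_model(phones):
    """Statistical back end: map each phone to speech parameters.
    Stand-in values: (duration in seconds, pitch in Hz)."""
    return [(0.1, 120.0) for _ in phones]

def vocoder(params):
    """Waveform generation stage, reduced here to reporting the
    total utterance duration instead of synthesizing a signal."""
    return sum(duration for duration, _pitch in params)

phones = text_analysis("hello")
params = acoustic_model(phones)
total_duration = vocoder(params)
```

The unit-selection and HMM-based approaches contrasted in the abstract differ mainly in the middle and final stages: unit selection concatenates stored waveform segments, while the HMM-based approach generates parameters from trained statistical models and feeds them to a vocoder.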
