I. Introduction
Voice conversion is a technique for transforming the voice of one speaker so that it is perceived as the voice of another. Many transformation approaches have been proposed, including vector quantization [1]–[3], Gaussian mixture models [4]–[7], pitch-synchronous overlap-add [8], artificial neural networks [9], and multiple functions [10], [11]. All of these techniques share two stages: training and transformation. In the training stage, the voice conversion system gathers information on the voices of the source and target speakers and automatically formulates voice conversion rules. This requires a process called data alignment, in which a relationship between the acoustic parameter spaces of the two speakers is estimated. The transformation stage then applies the mapping obtained during training to modify the source voice so that it matches the characteristics of the target speaker.

Most methods proposed in the literature assume the availability of parallel training sentences, referred to as a text-dependent corpus, for the source and target speakers. In these approaches, the source and target voices can be aligned using, for example, dynamic time warping [4]. For research purposes, the requirement of parallel speech databases is not prohibitive, but from the viewpoint of practical applications it is inconvenient and sometimes hard to fulfill. In some applications it may even be impossible to obtain parallel speech corpora, e.g., in cross-lingual voice conversion, where the source and target speakers speak different languages. To address this problem, text-independent voice conversion techniques that use nonparallel databases have been developed.
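As a concrete illustration of the alignment step, the following is a minimal sketch of dynamic time warping over two parameter sequences. It is not taken from the cited systems: the function name and the use of scalar features are hypothetical simplifications, whereas a real voice conversion system would align multidimensional spectral feature vectors (e.g., cepstral frames) with a vector distance.

```python
def dtw_align(source, target, dist=lambda a, b: abs(a - b)):
    """Return the total alignment cost and the warping path that pairs
    source frame indices with target frame indices."""
    n, m = len(source), len(target)
    INF = float("inf")
    # cost[i][j] = minimal cumulative distance aligning source[:i] with target[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(source[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # source frame repeated
                                 cost[i][j - 1],      # target frame repeated
                                 cost[i - 1][j - 1])  # both frames advance
    # Backtrack from (n, m) to recover the frame pairing.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        # Move to the cheapest predecessor cell.
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path

cost, path = dtw_align([1, 2, 3], [1, 2, 2, 3])
# The path pairs each source frame with one or more target frames,
# absorbing the timing difference between the two utterances.
```

In a training pipeline, the returned path would be used to collect paired source/target feature vectors, from which the conversion function (e.g., a Gaussian mixture mapping) is estimated.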