1. Introduction
One of the challenges in the speech recognition community is building an ASR system with limited in-domain data, so data-hungry speech recognition training algorithms have to be modified to handle such limits. This also applies to neural networks (NNs), which are part of essentially any state-of-the-art ASR system today. They serve either as feature extractors or as acoustic models estimating probabilities of sub-phoneme classes. NNs have to be trained on a large amount of in-domain data in order to perform well; today's deep neural networks are largely under-trained from the perspective of research done on tasks with rich resources [1]. The need for more training data can be alleviated by layer-wise training [2] or unsupervised pre-training [3]. Other techniques, such as dropout [4] and maxout [5], effectively reduce the number of parameters in the neural network, as sketched below.
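To make the last point concrete, the following is a minimal NumPy sketch of the two cited techniques, not the implementation used in this work: dropout [4] samples a thinned sub-network at each training step, and a maxout unit [5] takes the maximum over k affine pieces, so one unit stands in for several conventional units. All shapes and variable names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, train=True):
    """Zero each activation with probability p at training time [4].

    Each forward pass samples a thinned sub-network; parameters are
    shared across these sub-networks, which regularizes the model.
    """
    if not train:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)  # inverted dropout: rescale kept units

def maxout(x, W, b):
    """Maxout unit [5]: activation is the max over k affine pieces.

    W has shape (k, d_in, d_out); each output is the maximum of k
    linear projections, replacing a fixed pointwise nonlinearity.
    """
    z = np.einsum('i,kij->kj', x, W) + b  # (k, d_out) affine pieces
    return z.max(axis=0)

# Toy forward pass through one hidden layer (hypothetical sizes).
x = rng.standard_normal(16)                # input features
W = rng.standard_normal((3, 16, 8)) * 0.1  # k=3 pieces, 16 -> 8 units
b = rng.standard_normal((3, 8)) * 0.1
h = dropout(maxout(x, W, b), p=0.5, train=True)
```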