
Further investigation into multilingual training and adaptation of stacked bottle-neck neural network structure


Abstract:

Multilingual training of neural networks for ASR is widely studied these days. It has been shown that languages with little training data can benefit greatly from multilingual resources. We have evaluated possible ways of adapting a multilingual stacked bottle-neck hierarchy to a target domain. This paper extends our latest work and focuses on the impact certain aspects have on the performance of an adapted neural-network feature extractor. First, the performance of adapted multilingual networks initially trained on different languages is studied. Next, the effect of different target units - phonemes vs. triphone states - used for multilingual NN training is evaluated. Then, the impact of an increasing number of languages used for multilingual NN training is investigated. Here, a constant-amount-of-data condition is added so that the influence of greater language variability can be controlled separately from that of a larger amount of data. The effect of adding languages from a different domain is also evaluated. Finally, a study is performed in which a language whose phonetic structure is similar to that of the target language is added to the multilingual training data.
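As background, the stacked bottle-neck (SBN) hierarchy studied here can be pictured with a short sketch. The following is a minimal PyTorch illustration under assumptions of ours, not the paper's configuration: the layer widths, the 80-dimensional bottleneck, the 5-frame context between stages, and all target-unit counts are placeholders.

    import torch.nn as nn

    class BottleneckStage(nn.Module):
        # One stage of the stacked bottle-neck hierarchy: sigmoid hidden
        # layers, a narrow linear bottleneck, and a classification head
        # over the chosen target units (phonemes or triphone states).
        def __init__(self, in_dim, hidden=1500, bn_dim=80, n_targets=120):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.Sigmoid(),
                nn.Linear(hidden, hidden), nn.Sigmoid(),
                nn.Linear(hidden, bn_dim),        # linear bottleneck
            )
            self.head = nn.Sequential(
                nn.Sigmoid(),
                nn.Linear(bn_dim, hidden), nn.Sigmoid(),
                nn.Linear(hidden, n_targets),     # logits over target units
            )

        def forward(self, x):
            bn = self.body(x)   # bottleneck features, input to the next stage
            return bn, self.head(bn)

    # The first stage sees acoustic features; the second consumes a stack of
    # bottleneck vectors from several context frames (5 is an assumption).
    stage1 = BottleneckStage(in_dim=440, n_targets=9000)  # e.g. triphone states
    stage2 = BottleneckStage(in_dim=5 * 80, n_targets=9000)

    # Adaptation to the target language: keep the multilingually trained body,
    # replace the classification layer to match the target units (130 here is
    # a made-up count), and fine-tune on the small target-language data set.
    stage2.head[-1] = nn.Linear(1500, 130)

In this reading, the experiments above vary what the classification layer predicts (phonemes vs. triphone states), which and how many languages train the shared body, and how the adaptation step is carried out.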
Date of Conference: 07-10 December 2014
Date Added to IEEE Xplore: 02 April 2015
Electronic ISBN: 978-1-4799-7129-9
Conference Location: South Lake Tahoe, NV, USA
Brno University of Technology, Speech@FIT and IT4I Center of Excellence, Brno, Czech Republic

1. Introduction

One of the challenges in the speech recognition community is to build an ASR system with limited in-domain data. The data-hungry training algorithms used in speech recognition therefore have to be modified to handle such limits. This also applies to neural networks (NNs), which are part of essentially any state-of-the-art ASR system today. They serve either as feature extractors or as acoustic models estimating probabilities of sub-phoneme classes. NNs have to be trained on a large amount of in-domain data in order to perform well. Today's deep neural networks are largely under-trained from the perspective of the research done on tasks with rich resources [1]. The need for more training data can be alleviated by layer-wise training [2] or unsupervised pre-training [3]. Other techniques, such as dropout [4] and maxout [5], effectively reduce the number of parameters in the neural network.
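To make the last two techniques concrete, here is a hedged PyTorch sketch: dropout is the standard nn.Dropout module, and the maxout unit is written out by hand. The piece count k and all layer widths are illustrative assumptions, not values from the cited papers.

    import torch.nn as nn

    class Maxout(nn.Module):
        # Maxout unit [5]: k linear pieces per output, reduced with an
        # element-wise maximum, so the layer yields k times fewer outputs
        # than a plain linear layer with the same number of parameters.
        def __init__(self, in_dim, out_dim, k=3):
            super().__init__()
            self.k = k
            self.pieces = nn.Linear(in_dim, out_dim * k)

        def forward(self, x):
            z = self.pieces(x)                     # (batch, out_dim * k)
            z = z.view(*x.shape[:-1], -1, self.k)  # (batch, out_dim, k)
            return z.max(dim=-1).values            # max over the k pieces

    # A small feed-forward block combining both regularizers; sizes are
    # illustrative only.
    net = nn.Sequential(
        Maxout(440, 512),
        nn.Dropout(p=0.5),   # randomly zeroes activations during training
        Maxout(512, 512),
        nn.Dropout(p=0.5),
        nn.Linear(512, 120), # logits over sub-phoneme classes
    )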
