Asynchronous stochastic optimization for sequence training of deep neural networks


Abstract:

This paper explores asynchronous stochastic optimization for sequence training of deep neural networks. Sequence training requires more computation than frame-level training using pre-computed frame data. This leads to several complications for stochastic optimization, arising from significant asynchrony in model updates under massive parallelization, and limited data shuffling due to utterance-chunked processing. We analyze the impact of these two issues on the efficiency and performance of sequence training. In particular, we suggest a framework to formalize the reasoning about the asynchrony and present experimental results on both small and large scale Voice Search tasks to validate the effectiveness and efficiency of asynchronous stochastic optimization.
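The asynchrony the abstract describes arises when many workers update a shared model without waiting for each other, so each gradient may be computed against a stale copy of the parameters. The following is a minimal sketch of that idea (not code from the paper): worker threads optimize a toy quadratic objective via stale-gradient asynchronous SGD; the objective, learning rate, and thread counts are illustrative assumptions.

```python
import threading

# Toy objective (illustrative, not from the paper):
# minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
params = {"w": 0.0}
lock = threading.Lock()
LR = 0.05

def gradient(w):
    return 2.0 * (w - 3.0)

def worker(steps):
    for _ in range(steps):
        # Read a (possibly stale) snapshot of the shared parameters.
        w_snapshot = params["w"]
        g = gradient(w_snapshot)
        # Apply the update without coordinating with other workers;
        # params["w"] may have changed since the snapshot was taken,
        # so this update is based on an outdated model.
        with lock:
            params["w"] -= LR * g

threads = [threading.Thread(target=worker, args=(200,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(params["w"])  # converges near the optimum w = 3 despite staleness
```

For this convex toy problem the stale updates still converge; the paper's contribution is analyzing when and how such asynchrony affects the much harder setting of lattice-based sequence training at scale.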
Date of Conference: 04-09 May 2014
Date Added to IEEE Xplore: 14 July 2014
Electronic ISBN: 978-1-4799-2893-4

Conference Location: Florence, Italy

1. Introduction

Deep neural networks (DNNs) have become the dominant acoustic model for speech recognition [1], [2], [3], [4], [5]. Two frameworks for incorporating DNNs into an existing HMM-based decoder are the hybrid [6] and tandem [7] approaches. In this paper, we focus on the hybrid approach.

References
1. F. Seide, G. Li, and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," in INTERSPEECH, 2011, pp. 437-440.
2. G.E. Dahl, D. Yu, and L. Deng, "Context-dependent pretrained deep neural networks for large vocabulary speech recognition," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2011.
3. G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large vocabulary speech recognition," IEEE Transactions on Audio, Speech, and Language Processing, Special Issue on Deep Learning for Speech and Language Processing, vol. 20, no. 1, pp. 30-42, 2012.
4. N. Jaitly, P. Nguyen, A. Senior, and V. Vanhoucke, "Application of pretrained deep neural networks to large vocabulary speech recognition," in INTERSPEECH, 2012.
5. G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, 2012.
6. H. Bourlard and N. Morgan, Connectionist Speech Recognition, Kluwer Academic Publishers, 1994.
7. H. Hermansky, D.P.W. Ellis, and S. Sharma, "Tandem connectionist feature extraction for conventional HMM systems," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Istanbul, Turkey, June 2000.
8. Y. Normandin, Hidden Markov Models, Maximum Mutual Information, and the Speech Recognition Problem, Ph.D. thesis, McGill University, Montreal, Canada, 1991.
9. D. Povey, Discriminative Training for Large Vocabulary Speech Recognition, Ph.D. thesis, University of Cambridge, Cambridge, England, 2004.
10. B. Kingsbury, "Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, Apr. 2009, pp. 3761-3764.
11. B. Kingsbury, T. N. Sainath, and H. Soltau, "Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization," in INTERSPEECH, 2012.
12. H. Su, G. Li, D. Yu, and F. Seide, "Error back propagation for sequence training of context-dependent deep networks for conversational speech transcription," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013, pp. 6664-6668.
13. K. Vesely, A. Ghoshal, L. Burget, and D. Povey, "Sequence-discriminative training of deep neural networks," in INTERSPEECH, 2013.
14. L. Bottou, "Stochastic gradient learning in neural networks," in Neuro-Nimes, 1991.
15. Y. Kubo, T. Hori, and A. Nakamura, "Large vocabulary continuous speech recognition based on WFST structured classifiers and deep bottleneck features," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013.
16. Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng, "Building high-level features using large scale unsupervised learning," in International Conference on Machine Learning, 2012, pp. 81-88.
17. J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Ng, "Large scale distributed deep networks," in Advances in Neural Information Processing Systems (NIPS), 2012.
18. G. Heigold, V. Vanhoucke, A. Senior, P. Nguyen, M. Ranzato, M. Devin, and J. Dean, "Multilingual acoustic models using distributed deep neural networks," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, Canada, Apr. 2013, vol. 1.
19. C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
20. G. Heigold, A Log-Linear Discriminative Modeling Framework for Speech Recognition, Ph.D. thesis, RWTH Aachen University, Aachen, Germany, June 2010.
21. J. Schalkwyk, D. Beeferman, F. Beaufays, B. Byrne, C. Chelba, M. Cohen, M. Kamvar, and B. Strope, "'Your word is my command': Google search by voice: A case study," in Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics, chapter 4, pp. 61-90, Springer, 2010.
