Dialogue enhancement of stereo sound | IEEE Conference Publication | IEEE Xplore



Abstract:

Studies show that many people have difficulty understanding dialogue in movies when watching TV, especially hard-of-hearing listeners and listeners in adverse listening environments. To overcome this problem, we propose an efficient method to enhance the speech component of a stereo signal. The method is designed for low computational complexity and first extracts a center channel from the stereo signal. Novel methods for speech enhancement and voice activity detection are proposed that exploit the stereo information. A speech enhancement filter is estimated from the relationship between the extracted center channel and all other channels. Subjective and objective evaluations show that the method successfully enhances dialogue intelligibility without degrading the overall sound quality.
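The abstract names center-channel extraction as the first processing step but does not detail it. As a rough illustration of what such a step can look like, the following Python sketch operates on one frame of precomputed STFT bins and uses a generic per-bin soft mask, the normalized real cross-correlation of the two channels, loosely in the spirit of the similarity-based upmix in [5]. This is an assumed, swapped-in technique, not the extraction method of this paper; the functions `center_mask`, `extract_center`, `remix` and the `dialogue_gain` parameter are hypothetical, and the paper's speech enhancement filter and voice activity detection are not modeled.

```python
# Hypothetical sketch: generic per-bin center extraction, NOT the paper's method.


def center_mask(l: complex, r: complex, eps: float = 1e-12) -> float:
    """Soft mask for one STFT bin from the normalized real cross-correlation.

    The value is ~1 when both channels are identical (center-panned content),
    0 when one channel is silent (hard-panned content), and it is clipped to 0
    for anti-phase content.
    """
    num = 2.0 * (l * r.conjugate()).real
    den = abs(l) ** 2 + abs(r) ** 2 + eps
    return max(0.0, min(1.0, num / den))


def extract_center(left, right):
    """Split one frame of stereo STFT bins into center and residual side bins."""
    center, side_l, side_r = [], [], []
    for l, r in zip(left, right):
        c = center_mask(l, r) * 0.5 * (l + r)  # masked mid (L+R)/2 signal
        center.append(c)
        side_l.append(l - c)  # what remains in the left channel
        side_r.append(r - c)  # what remains in the right channel
    return center, side_l, side_r


def remix(center, side_l, side_r, dialogue_gain=2.0):
    """Re-sum the bins with a user-controlled gain on the center estimate."""
    out_l = [dialogue_gain * c + s for c, s in zip(center, side_l)]
    out_r = [dialogue_gain * c + s for c, s in zip(center, side_r)]
    return out_l, out_r


# Usage on three bins: identical, hard-left, and anti-phase content.
left = [1 + 0j, 1 + 1j, 0.5 + 0j]
right = [1 + 0j, 0j, -0.5 + 0j]
center, side_l, side_r = extract_center(left, right)
# center is approximately [1, 0, 0]: only the identical bin counts as center.
boosted_l, boosted_r = remix(center, side_l, side_r, dialogue_gain=2.0)
```

Because the side bins are defined as the input minus the center estimate, `dialogue_gain=1.0` reproduces the input, so the user control falls back to the unprocessed signal. A practical system would run this per frame of an STFT with overlap-add resynthesis and would typically smooth the cross-correlation statistics over time, as similarity-based upmix methods such as [5] do, rather than rely on instantaneous per-bin values.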
Date of Conference: 31 August 2015 - 04 September 2015
Date Added to IEEE Xplore: 28 December 2015
Electronic ISBN: 978-0-9928-6263-3
Electronic ISSN: 2076-1465
Conference Location: Nice, France

1. Introduction

Recent studies show that many people, especially hearing-impaired listeners, have difficulty understanding dialogue in TV sound [1], [2]. Although movie soundtracks are normally mixed carefully to achieve good speech intelligibility, problems can still arise in suboptimal listening conditions. To address this, several approaches have been proposed that give the user a control mechanism for improving speech intelligibility. A straightforward method for enhancing the dialogue in discrete 5.1 mixes is proposed in [2]: based on the assumption that the relevant dialogue is mixed into the center channel, it attenuates all non-center channels. A similar approach is proposed in [3]. High-quality content delivery channels typically provide such discrete multichannel signals. For everyday broadcasting and streaming (e.g., YouTube), however, content is typically available only as a stereo downmix, which lacks a discrete center channel. In this case, more sophisticated methods for dialogue enhancement are necessary.
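The channel-attenuation approach of [2] for discrete 5.1 mixes can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from [2]: it assumes the common L, R, C, LFE, Ls, Rs channel order, and the function `attenuate_non_center` and its `attenuation_db` parameter (standing in for the user control) are hypothetical. Details such as whether the LFE channel is treated differently may differ in [2].

```python
# Assumed 5.1 channel order (L, R, C, LFE, Ls, Rs); real files may differ.
CHANNELS_51 = ("L", "R", "C", "LFE", "Ls", "Rs")


def db_to_gain(db: float) -> float:
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def attenuate_non_center(frames, attenuation_db=6.0):
    """Attenuate all non-center channels of a discrete 5.1 signal.

    `frames` is a sequence of 6-sample frames (one sample per channel).
    The center channel, assumed to carry the dialogue, passes unchanged.
    """
    gain = db_to_gain(-attenuation_db)
    out = []
    for frame in frames:
        if len(frame) != len(CHANNELS_51):
            raise ValueError("each frame must hold exactly six samples")
        out.append([s if name == "C" else gain * s
                    for name, s in zip(CHANNELS_51, frame)])
    return out


# Usage: 20 dB attenuation scales every non-center sample by 0.1.
enhanced = attenuate_non_center([[1.0] * 6], attenuation_db=20.0)
```

Because the method depends on a discrete center channel, it cannot be applied directly to a stereo downmix, which is the case this paper addresses.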

References
[1] M. Armstrong, "Audio processing and speech intelligibility: a literature review", BBC Research & Development White Paper, 2011.
[2] B. G. Shirley, "Improving television sound for people with hearing impairments", 2013.
[3] H. Fuchs, S. Tuff and C. Bustad, "Dialogue enhancement - technology and experiments", EBU Technical Review, vol. 2, p. 1, 2012.
[4] E. Vickers, "Frequency-domain two- to three-channel upmix for center channel derivation and speech enhancement", AES Convention 127, 2009.
[5] C. Avendano and J.-M. Jot, "A frequency-domain approach to multichannel upmix", Journal of the Audio Engineering Society, vol. 52, no. 7/8, pp. 740-749, 2004.
[6] C. Uhle, O. Hellmuth and J. Weigel, "Speech enhancement of movie sound", AES Convention, 2008.
[7] F. Rumsey, "Hearing enhancement", Journal of the Audio Engineering Society, vol. 57, no. 5, pp. 353-359, 2009.
[8] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 443-445, 1985.
[9] T. Virtanen, "Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria", IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 3, pp. 1066-1074, 2007.
[10] Y. Xu, J. Du, L.-R. Dai and C.-H. Lee, "An experimental study on speech enhancement based on deep neural networks", IEEE Signal Processing Letters, vol. 21, no. 1, pp. 65-68, 2014.
[11] "Speech enhancement", pp. 191-467, 2011.
[12] E. Scheirer and M. Slaney, "Construction and evaluation of a robust multifeature speech/music discriminator", Proc. ICASSP, pp. 1331-1334, 1997.
[13] R. A. Bradley and M. E. Terry, "Rank analysis of incomplete block designs: I. The method of paired comparisons", Biometrika, vol. 39, no. 3/4, pp. 324-345, 1952.
[14] S. Choisel and F. Wickelmaier, "Ratio-scaling of listener preference of multichannel reproduced sound", Proc. DAGA, 2005.
[15] A. W. Rix, J. G. Beerends, M. P. Hollier and A. P. Hekstra, "Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs", Proc. ICASSP, pp. 749-752, 2001.
[16] J. H. Hansen and B. L. Pellom, "An effective quality evaluation protocol for speech enhancement algorithms", Proc. ICSLP, pp. 2819-2822, 1998.
[17] R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics", IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp. 504-512, 2001.
