1. Introduction
Recent studies show that many people, especially hearing-impaired listeners, have difficulty understanding dialogue in TV sound [1], [2]. Although movie soundtracks are normally mixed carefully to achieve good speech intelligibility, problems can still arise under suboptimal listening conditions. To address this, approaches have been proposed that give the user a control mechanism for improving speech intelligibility. A straightforward method for enhancing the dialogue in discrete 5.1 mixes is proposed in [2]. Based on the assumption that the relevant dialogue is mixed into the center channel, this approach attenuates all non-center channels. A similar approach is proposed in [3]. For high-quality content delivery channels, such discrete multi-channel signals are typically available. For everyday broadcasting and streaming (e.g. YouTube), however, content is typically available only as a stereo downmix, which lacks a discrete center channel. In this case, more sophisticated methods for dialogue enhancement are necessary.
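The center-channel approach described above for discrete 5.1 mixes can be illustrated with a minimal sketch. It is not the implementation of [2] or [3]; the channel order, the function names, and the default attenuation of 6 dB are assumptions chosen only for illustration.

```python
# Hypothetical channel order for one discrete 5.1 frame; real files may differ.
CHANNELS = ("L", "R", "C", "LFE", "Ls", "Rs")
CENTER = CHANNELS.index("C")


def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def enhance_dialogue(frames, attenuation_db=6.0):
    """Attenuate every non-center channel by `attenuation_db` decibels.

    `frames` is a list of 6-element sample tuples in CHANNELS order.
    The center channel, assumed to carry the dialogue, is left untouched.
    """
    gain = db_to_gain(-attenuation_db)
    return [
        tuple(s if i == CENTER else s * gain for i, s in enumerate(frame))
        for frame in frames
    ]
```

Exposing `attenuation_db` as a parameter corresponds to the user control mentioned above: a value of 0 dB leaves the mix unchanged, and larger values push the non-center channels further into the background. The sketch depends entirely on a discrete center channel being present, which is why it does not carry over to a stereo downmix.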