I. Introduction
Spoken language processing has progressed significantly in recent years. A main goal of developing such technologies is the construction of effective human-centric information systems, a required feature of which is that the user communicates with the system in natural language. This necessitates a hands-free speech interface that can recognize and understand human language in the form of spontaneous utterances. In addition, information is increasingly available as multimedia content, such as broadcast programs and recordings of meetings and lectures, in which speech is very often produced spontaneously. Processing and extracting information from spontaneous utterances in such multimedia content is thus another highly desired feature of human-centric information systems.