The TC-STAR (Technology and Corpora for Speech to Speech Translation) project, financed by the European Commission within the Sixth Framework Programme (Project No. FP6-506738), is a long-term effort to advance research in speech-to-speech translation technologies. The primary goal of the TC-STAR project is to produce an end-to-end system in English and Spanish that accepts parliamentary speeches in one language and transcribes, translates and synthesizes them into another language, while significantly reducing the gap between the performance of a human interpreter and that of a machine. To support this goal, each component technology, namely speech recognition (ASR), machine translation (MT) and text-to-speech (TTS), is optimized to produce the best output at its respective stage [1]. The European Parliament Plenary Sessions (EPPS) corpus comprises recordings of over 800 politicians discussing current affairs during several public sessions of the European Parliament in multiple languages, together with the minutes of these sessions edited by the European Parliament, also known as the Final Text Editions (FTE). In the 2007 evaluation, the training, development and evaluation data comprised recordings made between April 1996 and May 2006. Within the TC-STAR project, the evaluation is done under three different conditions:
public, which allows the use of any data that is publicly available, such as broadcast news, web-mined data released by the University of Washington, and data from the British Parliament sessions, in addition to the EPPS acoustic training data and Final Text Editions;
restricted, which allows the use of EPPS data only;
open, which allows the use of publicly available and any in-house material in addition to the EPPS data.
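The three conditions differ only in which origins of training data they admit, so they can be read as nested sets. The following is a minimal sketch of that reading, not part of the TC-STAR tooling: the names `Origin`, `ALLOWED`, `is_allowed`, `filter_sources` and `SOURCES` are hypothetical, and the "in-house recordings" entry is a placeholder added for illustration.

```python
from enum import Enum


class Origin(Enum):
    """Where a training resource comes from (illustrative categories)."""
    EPPS = "epps"
    PUBLIC = "public"
    IN_HOUSE = "in_house"


# Assumed reading of the three conditions as nested sets of allowed origins.
ALLOWED = {
    "restricted": {Origin.EPPS},
    "public": {Origin.EPPS, Origin.PUBLIC},
    "open": {Origin.EPPS, Origin.PUBLIC, Origin.IN_HOUSE},
}


def is_allowed(condition, origin):
    """Return True if data of this origin may be used under the condition."""
    return origin in ALLOWED[condition]


def filter_sources(condition, sources):
    """Keep the names of the sources that the condition admits, in order."""
    return [name for name, origin in sources if is_allowed(condition, origin)]


# Resources named in the text, plus one hypothetical in-house entry.
SOURCES = [
    ("EPPS acoustic training data", Origin.EPPS),
    ("EPPS Final Text Editions", Origin.EPPS),
    ("Broadcast news", Origin.PUBLIC),
    ("University of Washington web data", Origin.PUBLIC),
    ("British Parliament sessions", Origin.PUBLIC),
    ("in-house recordings", Origin.IN_HOUSE),  # hypothetical placeholder
]

if __name__ == "__main__":
    for condition in ("restricted", "public", "open"):
        print(condition, filter_sources(condition, SOURCES))
```

Under this reading, every resource admitted by the restricted condition is also admitted by the public condition, and every resource admitted by the public condition is also admitted by the open condition.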