1. INTRODUCTION
Intent in speech is manifested in how a sentence is delivered: its phrasing, rhythm, intonation, energy and voice quality. Given the same sentence, speakers vary widely and have considerable freedom in which concept (word) they focus on and in the degree of emphasis they place on it. The resulting prominence pattern over the words of an utterance carries information about each word's relevance, its given or new status, and so on, in addition to the general style of the speaker. These aspects broadly fall under the ‘augmentative’ and ‘affective’ parts of prosody, the extra information in speech that ensures the intended message is unambiguously decoded by listeners [1]. In this work, we use the term ‘intent’ to refer exclusively to such aspects within intonation. While text-to-speech (TTS) systems should ideally synthesize speech in a manner appropriate to the underlying meaning of the sentence, intent is largely under-represented in text. However, there are certain domains where this information may be accessible to the synthesizer. In this work, we deal with one such domain, speech-to-speech machine translation (S2SMT). The goal of S2SMT is to take speech in one language as input and automatically ‘dub’ it, generating as output a translated sentence with the same meaning spoken in another language.