1. INTRODUCTION
Speech synthesis is typically formulated as the conversion of text to speech (TTS). This formulation, however, leaves out control over all the aspects of speech not contained in the text. Here we approach the problem of expressive speech synthesis, which takes into account not just the text but also other characteristics such as pitch, rhythm and emphasis. Some formulations of expressive speech synthesis require animated and emotive voice data. This is a drawback given the limited access to such data. In our approach, we can make a voice emote and sing without any such data.