1. INTRODUCTION
Spoken Language Identification (SLID) is the task of identifying the language spoken in an audio clip. SLID is useful in personalized voice assistants, automatic speech translation systems, and multilingual speech recognition systems, and it has been used in call centers to automatically route calls to an operator who speaks the caller's language. Earlier studies [1], [2] used phonetic, phonotactic, prosodic, and lexical features for SLID. Classical SLID models first extract i-vectors [3] or x-vectors [4] and then train a separate classifier on top of them. Acoustic features such as MFCCs and filter banks are also commonly used as input features [5].
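The two-stage structure of the classical pipeline (a fixed-length utterance embedding followed by an independently trained classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: mean pooling over frames is only a stand-in for an i-vector or x-vector extractor, the nearest-centroid rule is a stand-in for whatever back-end classifier a given system uses, and the two-dimensional frame values and language labels are invented toy data rather than real acoustic features.

```python
from math import dist
from statistics import fmean


def embed(frames):
    """Stage 1 (stand-in): mean-pool per-frame features into one fixed-length vector.

    This is NOT an i-vector or x-vector; it only mimics the idea of mapping a
    variable-length utterance to a fixed-length representation.
    """
    return [fmean(column) for column in zip(*frames)]


def fit_centroids(embeddings, labels):
    """Stage 2 (stand-in): train a separate classifier, here one centroid per language."""
    grouped = {}
    for vector, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(vector)
    return {
        label: [fmean(column) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, embedding):
    """Return the language whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], embedding))


# Invented toy utterances: each is a list of 2-dimensional "frames" plus a label.
train_utterances = [
    ([[0.9, 0.1], [1.1, 0.0]], "en"),
    ([[1.0, 0.2], [0.8, 0.1], [1.2, 0.0]], "en"),
    ([[0.0, 1.0], [0.2, 0.9]], "hi"),
    ([[0.1, 1.1], [0.0, 0.8]], "hi"),
]

train_embeddings = [embed(frames) for frames, _ in train_utterances]
train_labels = [label for _, label in train_utterances]
centroids = fit_centroids(train_embeddings, train_labels)

# Utterances of different lengths map to embeddings of the same dimension.
test_utterance = [[0.95, 0.05], [1.05, 0.15], [1.0, 0.1]]
predicted_language = predict(centroids, embed(test_utterance))
```

The point of the sketch is the separation of concerns: the embedding stage knows nothing about language labels, and the classifier sees only fixed-length vectors, so either stage could be swapped out without changing the other.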