1. INTRODUCTION
Interfaces that support human language as a medium of communication between humans and computers have been of interest for decades [1], [2]. Early such systems, known as Natural Language Interfaces (NLIs), saw limited success owing to the difficulty of endowing computers with the ability to understand natural language. Progress in language understanding has since led to renewed interest in NLIs [3]. In particular, several studies have focused on NLIs to databases (NLIDBs) [4], [5], [6]. NLIDBs, when fully realized, stand to support users who are not proficient in query languages. The primary focus of NLIDB research has been on parsing natural language text utterances into executable SQL queries (text-to-SQL parsing). Motivated by the rise of speech-driven digital assistants on smartphones, tablets, and other small handheld devices, we study the task of parsing spoken natural language into executable SQL queries (speech-to-SQL parsing).
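To make the task concrete, the minimal sketch below (an illustration of ours, with a hypothetical students table rather than an example drawn from any benchmark) pairs a natural-language question with the executable SQL query a text-to-SQL parser would be expected to produce; in the speech-to-SQL setting, the input is the spoken form of the same question rather than its transcript.

```python
import sqlite3

# Hypothetical toy database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, major TEXT)")
conn.executemany("INSERT INTO students (major) VALUES (?)",
                 [("CS",), ("Math",), ("CS",)])

# Input utterance (text here; audio of the same question in speech-to-SQL)
question = "How many students major in CS?"
# Target output: an executable SQL query answering the question.
sql = "SELECT COUNT(*) FROM students WHERE major = 'CS'"

print(question, "->", conn.execute(sql).fetchone()[0])  # -> 2
```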