I. Introduction
Traditionally, computers are operated with a keyboard and mouse or by touch, and they provide feedback to the user primarily through visual cues on a display. This human-computer interaction model can be unintuitive at first, but it allows users to express their intent clearly, as long as their goal is supported and they have sufficient knowledge to operate the machine. A spoken dialogue system (SDS) aims to make human-computer interaction more intuitive by equipping computers with the ability to translate between human and computer language, thereby relieving humans of this burden. More specifically, the objective of an SDS is to help a human user achieve their goal in a specific domain (e.g., hotel booking), using speech as the form of communication. Recent advances in artificial intelligence (AI) and reinforcement learning (RL) have established the technology needed to build the first generation of commercial SDSs deployable as regular household items. Examples of such systems are Amazon's Alexa, Google Home, and Apple's Siri. While initially built as voice-command systems, these systems have over the years become capable of sustaining dialogues that span a few turns.