1 Introduction
Despite its expressive richness, speech represented through captions is typically reduced to its words, and its words only. Whatever nuance the speaker originally conveyed by modulating their voice (their mood, emotions, dispositions, etc.) is lost in this flattened textual representation. This is particularly relevant when captions are used not as a complement to a readily available audio channel, but as its replacement. This is often the case for deaf and hard of hearing (DHH) persons, but it can potentially affect anyone, including hearing individuals facing a situational hearing impairment, e.g., someone watching a film on their mobile phone in a noisy environment [1]. If so much of communication is expressed in nuances not captured by written text, what is to be said of the experience of those who have no direct access to acoustic speech but only to its written forms?