Reconstructing Speech From CNN Embeddings


Abstract:

A complete understanding of the decision-making process of Convolutional Neural Networks (CNNs) is still far from being reached. Many researchers have proposed techniques to interpret what a network actually “learns” from data. Nevertheless, many questions remain unanswered. In this work, we study one aspect of this problem by reconstructing speech from the intermediate embeddings computed by a CNN. Specifically, we consider a pre-trained network that acts as a feature extractor from speech audio. We investigate the possibility of inverting these features, reconstructing the input signals in a black-box scenario, and quantitatively assess the reconstruction quality by measuring the word-error-rate of an off-the-shelf ASR model. Experiments performed using two different CNN architectures trained for six different classification tasks show that it is possible to reconstruct time-domain speech signals that preserve the semantic content whenever the embeddings are extracted before the fully connected layers.
Published in: IEEE Signal Processing Letters (Volume: 28)
Page(s): 952 - 956
Date of Publication: 16 April 2021
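
The abstract evaluates reconstructions through the word-error-rate (WER) of an ASR transcript. As a rough illustration of that metric only, the sketch below computes WER as the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the lowercase, whitespace-based tokenization are assumptions made for this example. The ASR model and scoring pipeline used by the authors are not described in this excerpt.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); tokenization is a simple
    lowercase whitespace split, which is an assumption of this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words; it is updated one reference word at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

In a setup like the one described, `reference` would be the ground-truth transcript of the original utterance and `hypothesis` the ASR output for the reconstructed signal. A lower value suggests that more of the spoken content survived the reconstruction.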

I. Introduction

Thanks to the availability of large amounts of data and increased computational power, CNNs have replaced multiple state-of-the-art techniques in a wide variety of fields, from image analysis to audio processing. Despite the undeniable gains that CNNs offer in several tasks, a complete understanding of the intricate and hidden processes behind a CNN-based model has not yet been reached. For instance, researchers are still investigating whether learned features are interpretable [1]. Other authors are studying which portion of a CNN input actually triggers a specific classification result [2]–[4]. Answering these questions not only helps to develop more accurate solutions, but also makes CNN results easier to explain.
