Reconstructing Speech From CNN Embeddings

Abstract:

A complete understanding of the decision-making process of Convolutional Neural Networks (CNNs) is still far from being reached. Many researchers have proposed techniques to interpret what a network actually “learns” from data. Nevertheless, many questions remain unanswered. In this work we study one aspect of this problem by reconstructing speech from the intermediate embeddings computed by a CNN. Specifically, we consider a pre-trained network that acts as a feature extractor for speech audio. We investigate the possibility of inverting these features to reconstruct the input signals in a black-box scenario, and we quantify the reconstruction quality using the word error rate (WER) of an off-the-shelf automatic speech recognition (ASR) model. Experiments performed using two different CNN architectures trained for six different classification tasks show that it is possible to reconstruct time-domain speech signals that preserve the semantic content whenever the embeddings are extracted before the fully connected layers.
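As an illustration of the evaluation metric mentioned in the abstract, the following is a minimal sketch of how a word error rate could be computed between a reference transcript and the transcript an ASR model produces for a reconstructed signal. The function name `word_error_rate` and the lowercase, whitespace-based tokenization are assumptions made for this example; the abstract does not specify the ASR model or the text normalization used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration: tokenization (lowercase, split on
    whitespace) is an assumption, not taken from the paper.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Transcript of the original audio vs. a made-up ASR transcript
    # of a reconstructed signal.
    print(word_error_rate("turn the lights on", "turn the light on"))
```

A lower value means the ASR transcript of the reconstruction is closer to the reference. The value can exceed 1 when the hypothesis contains many inserted words.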

Published in: IEEE Signal Processing Letters (Volume: 28)

Page(s): 952 - 956

Date of Publication: 16 April 2021

DOI: 10.1109/LSP.2021.3073628

I. Introduction

Thanks to the availability of large amounts of data and increased computational power, CNNs have replaced multiple state-of-the-art techniques in a wide variety of fields, from image analysis to audio processing. Despite the undeniable gains that CNNs offer in several tasks, a complete understanding of the intricate and hidden processes that lie behind a CNN-based model has not yet been reached. For instance, researchers are still investigating whether learned features are interpretable [1]. Other authors are studying which portion of a CNN input actually triggers a specific classification result [2]–[4]. Answering these questions not only helps to develop more accurate solutions, but also makes CNN results easier to explain.
