Reconstructing Speech From CNN Embeddings

Abstract:

A complete understanding of the decision-making process of Convolutional Neural Networks (CNNs) is still far from being reached. Many researchers have proposed techniques to interpret what a network actually “learns” from data. Nevertheless, many questions remain unanswered. In this work, we study one aspect of this problem by reconstructing speech from the intermediate embeddings computed by a CNN. Specifically, we consider a pre-trained network that acts as a feature extractor for speech audio. We investigate the possibility of inverting these features to reconstruct the input signals in a black-box scenario, and we quantify the reconstruction quality through the word error rate of an off-the-shelf automatic speech recognition (ASR) model. Experiments performed using two different CNN architectures trained for six different classification tasks show that it is possible to reconstruct time-domain speech signals that preserve the semantic content whenever the embeddings are extracted before the fully connected layers.
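The abstract scores reconstructions by the word error rate (WER) of an ASR model's transcript: the lower the WER, the more of the spoken content survived the inversion. As a minimal sketch, assuming the standard definition WER = (S + D + I) / N (substitutions, deletions and insertions over the number of reference words) and transcripts that are already text-normalized (lowercased, punctuation stripped), WER can be computed as a word-level Levenshtein distance. The function name `word_error_rate` is hypothetical; this is an illustration of the metric, not the authors' evaluation code, and which transcript serves as the reference is an assumption left to the caller.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: (substitutions + deletions + insertions) / len(reference).

    Assumes both strings are already normalized; words are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev holds one row of the edit-distance table: prev[j] is the distance
    # between the reference words seen so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or exact match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deletion out of six reference words, so WER = 1/6.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER is not bounded by 1: a hypothesis with many inserted words can produce more errors than there are reference words.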

Published in: IEEE Signal Processing Letters (Volume: 28)

Page(s): 952–956

Date of Publication: 16 April 2021

I. Introduction

Thanks to the availability of large amounts of data and increased computational power, CNNs have replaced multiple state-of-the-art techniques in a wide variety of fields, from image analysis to audio processing. Despite the undeniable gains that CNNs offer in several tasks, a complete understanding of the intricate and hidden processes behind a CNN-based model has not yet been reached. For instance, researchers are still investigating whether learned features are interpretable [1]. Other authors are studying which portion of a CNN input actually triggers a specific classification result [2]–[4]. Answering these questions not only helps to develop more accurate solutions, but also makes CNN results easier to explain.
