FIRNet: Fundamental Frequency Controllable Fast Neural Vocoder With Trainable Finite Impulse Response Filter

Abstract:

Some neural vocoders with fundamental frequency (f0) control have succeeded in performing real-time inference on a single CPU while preserving the quality of the synthetic speech. However, compared with legacy vocoders based on signal processing, their inference speeds are still low. This paper proposes a neural vocoder based on the source-filter model with trainable time-variant finite impulse response (FIR) filters, to achieve a similar inference speed to legacy vocoders. In the proposed model, FIRNet, multiple FIR coefficients are predicted using the neural networks, and the speech waveform is then generated by convolving a mixed excitation signal with these FIR coefficients. Experimental results show that FIRNet can achieve an inference speed similar to legacy vocoders while maintaining f0 controllability and natural speech quality.
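The source-filter idea in the abstract, a mixed excitation signal convolved with frame-wise FIR coefficients, can be illustrated with a short NumPy sketch. This is not the authors' implementation: the helper names `mixed_excitation` and `time_variant_fir`, the pulse-plus-noise excitation, the single filter per frame, the overlap-add scheme, and every parameter value are assumptions chosen for illustration. The random taps only stand in for what a neural network would predict in FIRNet.

```python
import numpy as np


def mixed_excitation(f0, sample_rate, hop, noise_gain=0.1, seed=0):
    """Build a pulse-plus-noise excitation from a frame-level f0 contour.

    f0 holds one value in Hz per frame; 0 marks an unvoiced frame.
    (Hypothetical helper, not taken from the paper.)
    """
    rng = np.random.default_rng(seed)
    f0_per_sample = np.repeat(np.asarray(f0, dtype=float), hop)
    # Accumulate phase in cycles; a pulse fires when the phase passes an integer.
    phase = np.cumsum(f0_per_sample / sample_rate)
    pulses = np.diff(np.floor(phase), prepend=0.0) > 0
    noise = rng.standard_normal(f0_per_sample.size)
    voiced = f0_per_sample > 0
    # Voiced samples: pulse train plus weak noise. Unvoiced samples: noise only.
    return np.where(voiced, pulses.astype(float) + noise_gain * noise, noise)


def time_variant_fir(excitation, fir_coeffs, hop):
    """Filter each hop-sized frame with its own FIR taps and overlap-add the tails.

    fir_coeffs has shape (n_frames, n_taps); excitation has n_frames * hop samples.
    (Hypothetical helper, not taken from the paper.)
    """
    n_frames, n_taps = fir_coeffs.shape
    out = np.zeros(n_frames * hop + n_taps - 1)
    for i in range(n_frames):
        frame = excitation[i * hop:(i + 1) * hop]
        # Linear convolution of one frame; its tail spills into the next frames.
        out[i * hop:(i + 1) * hop + n_taps - 1] += np.convolve(frame, fir_coeffs[i])
    return out[:n_frames * hop]


# Assumed settings: 16 kHz audio, 5 ms hop, 32 taps per frame.
sample_rate, hop, n_taps = 16000, 80, 32
f0 = np.array([0.0, 0.0, 120.0, 120.0, 125.0, 130.0, 0.0, 0.0])
excitation = mixed_excitation(f0, sample_rate, hop)

# Stand-in for network output: random taps with an exponential decay.
rng = np.random.default_rng(1)
fir_coeffs = rng.standard_normal((f0.size, n_taps)) * np.exp(-np.arange(n_taps) / 4.0)

waveform = time_variant_fir(excitation, fir_coeffs, hop)
print(waveform.shape)
```

In this sketch, changing the values in `f0` changes only the pulse spacing of the excitation, while the filter taps shape the spectrum. That separation is what makes f0 control straightforward in a source-filter design, and the per-sample cost is a short convolution rather than a deep network pass.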

Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference: 14-19 April 2024

Date Added to IEEE Xplore: 18 March 2024

DOI: 10.1109/ICASSP48485.2024.10446960

Conference Location: Seoul, Republic of Korea

1. INTRODUCTION

A neural vocoder is a well-known neural waveform generation technique that allows us to convert acoustic features to high-quality speech waveforms. Since the invention of WaveNet [1], many neural vocoders have been proposed [2]–[6] and applied in speech generation systems, such as text-to-speech (TTS), voice conversion, and singing voice synthesis. For practical use, they require fundamental frequency (f₀) controllability and real-time generation speed on a single CPU. Therefore, it is important to develop neural vocoders that satisfy these requirements.
