1. INTRODUCTION
A neural vocoder is a neural waveform generation model that converts acoustic features into high-quality speech waveforms. Since the introduction of WaveNet [1], many neural vocoders have been proposed [2]–[6] and applied in speech generation systems such as text-to-speech (TTS), voice conversion, and singing voice synthesis. For practical use in these systems, a neural vocoder needs fundamental frequency (f0) controllability and real-time generation speed on a single CPU. It is therefore important to develop neural vocoders that satisfy both requirements.