1. INTRODUCTION
In recent years, deep generative models have made promising progress in the field of symbolic music generation [1]–[3]. In particular, the music inpainting task [4]–[7] has attracted considerable research attention due to its great practical value in human-computer music co-creation [8]. The general setting is that human composers create some parts of a piece, while the algorithm inpaints (or infills) the rest. However, long-term generation remains a challenging task: when the inpainting scope exceeds several beats, current methods cannot yet preserve a natural structure or the overall musicality.