Adapting Pitch-Based Self Supervised Learning Models for Tempo Estimation

Abstract:

Tempo estimation is the task of estimating the periodicity of the dominant rhythm pulse of a music audio signal. It is therefore closely related to dominant pitch estimation. Recently, both tasks have been addressed in a Self-Supervised Learning (SSL) fashion so as to leverage unlabelled data for training. In this work, we study the applicability of two successful pitch-based SSL models, SPICE and PESTO, to tempo estimation. Both rely on Siamese networks whose two branches receive pitch-shifted views of the same input. To apply these models to tempo estimation, we represent the audio signal by the Constant-Q transform (CQT) of its onset-strength function and adapt their view generation to use time-stretching (instead of pitch-shifting), which is efficiently implemented by shifting the CQT. In a large-scale experiment, we show that simply adapting PESTO in this way yields better results than the previous SSL approach to tempo estimation on most datasets used in the reference benchmark. Further, since PESTO is lightweight and requires only a small amount of training data, we study a new learning scheme in which the downstream datasets are processed directly in an SSL fashion (without access to labels), and show that this is an interesting alternative that further improves performance on some datasets.

Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference: 14-19 April 2024

Date Added to IEEE Xplore: 18 March 2024

DOI: 10.1109/ICASSP48485.2024.10447129

Conference Location: Seoul, Korea, Republic of

1. INTRODUCTION

Given its wide range of applications (recommendation, playlist generation, synchronization, DJing, audio or audio/video editing, beat-synchronous analysis), tempo estimation remains a major task in Music Information Retrieval (MIR). At its core, tempo estimation seeks to estimate the periodicity of the dominant rhythm pulse of a music audio signal, often expressed in beats per minute (BPM). Formulated in this way, it strongly resembles the task of pitch estimation. Recently, pitch estimation has shifted notably towards Self-Supervised Learning (SSL), which has shown superiority over conventional supervised models [1], [2]. In this work, we explore the adaptation of such pitch-based SSL systems to the task of tempo estimation.
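The resemblance to pitch estimation can be made concrete with a small numerical sketch. A tempo in BPM is a pulse frequency in Hz (BPM / 60), and on a log-spaced frequency axis such as that of a CQT, multiplying every frequency by a constant factor moves every peak by the same number of bins. This is why time-stretching can be emulated by shifting the CQT of the onset-strength function. The code below is an illustration under assumed settings, not the authors' implementation: the values of `BINS_PER_OCTAVE` and `F_MIN_HZ`, the function names, and the zero-filling used at the edges are all hypothetical choices made for this example.

```python
import numpy as np

# Hypothetical settings for illustration only (not taken from the paper).
BINS_PER_OCTAVE = 24          # resolution of the log-frequency axis
F_MIN_HZ = 30.0 / 60.0        # lowest bin corresponds to 30 BPM (0.5 Hz)
N_BINS = 4 * BINS_PER_OCTAVE  # four octaves: 30 BPM up to (just below) 480 BPM


def bpm_to_bin(bpm):
    """Fractional bin index of a tempo on the log-spaced axis."""
    return BINS_PER_OCTAVE * np.log2((bpm / 60.0) / F_MIN_HZ)


def bin_to_bpm(index):
    """Tempo in BPM at a given (possibly fractional) bin index."""
    return 60.0 * F_MIN_HZ * 2.0 ** (index / BINS_PER_OCTAVE)


def stretch_rate_to_shift(rate):
    """Bin shift equivalent to playing the signal `rate` times faster."""
    return BINS_PER_OCTAVE * np.log2(rate)


def shift_bins(spectrum, k):
    """Shift a 1-D log-frequency spectrum by k integer bins, zero-filling."""
    out = np.zeros_like(spectrum)
    if k > 0:
        out[k:] = spectrum[:-k]
    elif k < 0:
        out[:k] = spectrum[-k:]
    else:
        out[:] = spectrum
    return out


# A toy "tempo spectrum" with a single peak at 120 BPM.
spectrum = np.zeros(N_BINS)
peak_bin = int(round(bpm_to_bin(120.0)))
spectrum[peak_bin] = 1.0

# Doubling the playback speed is a shift of exactly one octave of bins.
shift = int(round(stretch_rate_to_shift(2.0)))
stretched = shift_bins(spectrum, shift)
new_peak_bin = int(np.argmax(stretched))
new_tempo_bpm = bin_to_bpm(new_peak_bin)
```

With these assumed settings, the peak moves from the bin for 120 BPM to the bin for 240 BPM, which is the tempo that a twofold speed-up of the audio would produce. Non-integer shifts (for example a rate of 1.5) fall between bins and would need rounding or interpolation in practice.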
