Adapting Pitch-Based Self Supervised Learning Models for Tempo Estimation


Abstract:

Tempo estimation is the task of estimating the periodicity of the dominant rhythm pulse of a music audio signal. It therefore has a close relationship with dominant pitch estimation. Recently, both tasks have been addressed in a Self-Supervised Learning (SSL) fashion so as to leverage unlabelled data for training. In this work, we study the applicability of two successful pitch-based SSL models, SPICE and PESTO, to tempo estimation. Both successfully exploit Siamese networks with pitch-shifting view generation between the two branches. To apply these models to tempo estimation, we represent the audio signal by the Constant-Q transform (CQT) of its onset-strength function and adapt their view generation using time-stretching (instead of pitch-shifting), which is efficiently implemented by shifting the CQT. In a large experiment, we show that simply adapting PESTO in this way yields better results than the previous SSL approach to tempo estimation for most datasets used in the reference benchmark. Further, since PESTO is lightweight and requires only a small amount of training data, we study a new learning scheme where the downstream datasets are processed directly in an SSL fashion (without access to labels), showing that this is an interesting alternative that further improves performance on some datasets.
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024
Conference Location: Seoul, Korea, Republic of

1. INTRODUCTION

Given its wide range of applications (recommendation, playlist generation, synchronization, DJing, audio or audio/video editing, beat-synchronous analysis), tempo estimation remains a major task in Music Information Retrieval (MIR). At its core, tempo estimation seeks to estimate the periodicity of the dominant rhythm pulse of a music audio signal, often expressed in beats per minute (BPM). Formulated in this way, it strongly resembles the task of pitch estimation. Recently, pitch estimation has shifted notably towards Self-Supervised Learning (SSL), which has been shown to outperform conventional supervised models [1], [2]. In this work, we explore the adaptation of such pitch-based SSL systems to the task of tempo estimation.
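
The abstract's remark that time-stretching can be implemented by shifting the CQT follows from the log-spaced frequency axis: speeding a signal up by a factor 2^(k/B) multiplies every periodicity by that factor, which corresponds to a translation of k bins when there are B bins per octave. The following is a minimal sketch of that property only, not the paper's implementation. The helper names (`log_tempo_spectrum`, `time_stretch`), the cosine test signal at 120 BPM, the 100 Hz frame rate, and the direct projection onto log-spaced sinusoids used in place of a real CQT are all illustrative assumptions.

```python
import numpy as np

def log_tempo_spectrum(osf, frame_rate, bpm_min=30.0, bins_per_octave=12, n_bins=48):
    """Magnitude of `osf` at geometrically spaced tempi (a simple stand-in for a CQT)."""
    bpms = bpm_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    t = np.arange(len(osf)) / frame_rate  # frame times in seconds
    # One complex sinusoid per tempo bin; tempo in BPM divided by 60 gives Hz.
    basis = np.exp(-2j * np.pi * (bpms[:, None] / 60.0) * t[None, :])
    return bpms, np.abs(basis @ (osf * np.hanning(len(osf))))

def time_stretch(osf, rate):
    """Speed the signal up by `rate` with linear interpolation: y[k] = x[rate * k]."""
    n = len(osf)
    return np.interp(np.arange(0, n - 1, rate), np.arange(n), osf)

# Assumed toy settings, chosen so that 120 BPM falls exactly on a bin centre.
frame_rate = 100.0       # onset-strength frames per second
bins_per_octave = 12
shift_bins = 3           # desired translation along the log-tempo axis
rate = 2.0 ** (shift_bins / bins_per_octave)

# Synthetic, non-negative onset-strength function pulsing at 120 BPM for 30 s.
t = np.arange(int(30 * frame_rate)) / frame_rate
osf = 1.0 + np.cos(2 * np.pi * (120.0 / 60.0) * t)

bpms, spec = log_tempo_spectrum(osf, frame_rate, bins_per_octave=bins_per_octave)
_, spec_stretched = log_tempo_spectrum(
    time_stretch(osf, rate), frame_rate, bins_per_octave=bins_per_octave
)

peak_original = int(np.argmax(spec))
peak_stretched = int(np.argmax(spec_stretched))
print(bpms[peak_original], bpms[peak_stretched], peak_stretched - peak_original)
```

In this toy setting the spectral peak of the stretched signal is expected to sit `shift_bins` bins above the original peak, so a second training view could plausibly be obtained by translating the log-tempo representation rather than resampling the signal.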
