Roland Badeau - IEEE Xplore Author Profile

Showing 1-25 of 105 results

Results

Supervised deep learning approaches to underdetermined audio source separation achieve state-of-the-art performance but require a dataset of mixtures along with their corresponding isolated source signals. Such datasets can be extremely costly to obtain for musical mixtures. This raises a need for unsupervised methods. We propose a novel unsupervised model-based deep learning approach to musical s...
Estimating mixtures of damped chirp sinusoids in noise is a problem that affects audio analysis, coding, and synthesis applications. Phase-based non-stationary parameter estimators assume that sinusoids can be resolved in the Fourier transform domain, whereas high-resolution methods estimate superimposed components with accuracy close to the theoretical limits, but only for sinusoids with constant...
In various audio signal processing applications, such as source separation and dereverberation, accurate mathematical modeling of both source signals and room reverberation is needed to properly describe the audio data. In a previous paper, we introduced a stochastic room impulse response model based on the image source principle, and we proposed an expectation-maximization algorithm that was able...
Supervised source separation requires expensive synthetic datasets containing clean, ground-truth source signals, while unsupervised separation requires only data mixtures. Existing unsupervised methods still use supervision to avoid over-separation and compete with fully supervised methods. We present a new method of completely unsupervised single-channel blind source separation, based on variati...
The goal of singing voice separation is to recover the vocals signal from music mixtures. State-of-the-art performance is achieved by deep neural networks trained in a supervised fashion. Since training data are scarce and music signals are extremely diverse, it remains challenging to achieve high separation quality across various recording and mixing conditions as well as music styles. In this pa...
State space models have been extensively applied to model and control dynamical systems in disciplines including neuroscience, target tracking, and audio processing. A common modeling assumption is that both the state and data noise are Gaussian because it simplifies the estimation of the system's state and model parameters. However, in many real-world scenarios where the noise is heavy-tailed or ...
Approximate Bayesian Computation (ABC) is a popular method for approximate inference in generative models with intractable but easy-to-sample likelihood. It constructs an approximate posterior distribution by finding parameters for which the simulated data are close to the observations in terms of summary statistics. These statistics are defined beforehand and might induce a loss of information, w...
Variational inference of a Bayesian linear dynamical system is a powerful method for estimating latent variable sequences and learning sparse dynamic models in domains ranging from neuroscience to audio processing. The hardest part of the method is inferring the model's latent variable sequence. Here, we propose a solution using matrix inversion lemmas to derive what may be considered as the Bayes...
Speech separation quality can be improved by exploiting textual information. However, this usually requires text-to-speech alignment at phoneme level. Classical alignment methods are made for rather clean speech and do not work as well on corrupted speech. We propose to perform text-informed speech-music separation and phoneme alignment jointly using recurrent neural networks and the attention mec...
Prior information about the target source can improve audio source separation quality but is usually not available with the necessary level of audio alignment. This has limited its usability in the past. We propose a separation model that can nevertheless exploit such weak information for the separation task while aligning it on the mixture as a byproduct using an attention mechanism. We demonstra...
We propose a semi-supervised multichannel speech enhancement system based on a probabilistic model which assumes that both speech and noise follow the heavy-tailed multivariate complex Cauchy distribution. As we advocate, this allows handling strong and adverse noisy conditions. Consequently, the model is parameterized by the source magnitude spectrograms and the source spatial scatter matrices. ...
In the field of room acoustics, it is well known that reverberation can be characterized statistically in a particular region of the time-frequency domain (after the transition time and above Schroeder's frequency). Since the 1950s, various formulas have been established, focusing on particular aspects of reverberation: exponential decay over time, correlations between frequencies, correlations be...
In this study, we propose a novel probabilistic model for separating clean speech signals from noisy mixtures by decomposing the mixture spectra into a structured speech part and a more flexible residual part. The main novelty in our model is that it uses a family of heavy-tailed distributions, the so-called α-stable distributions, for modeling the residual signal. We develop an expectation-maximi...
This paper presents a Bayesian framework for under-determined audio source separation in multichannel reverberant mixtures. We model the source signals as Student's t latent random variables in a time-frequency domain. The specific structure of musical signals in this domain is exploited by means of a nonnegative matrix factorization model. Conversely, we design the mixing model in the time domain...
For audio source separation applications, it is common to estimate the magnitude of the short-time Fourier transform (STFT) of each source. In order to further synthesize time-domain signals, it is necessary to recover the phase of the corresponding complex-valued STFT. Most authors in this field choose a Wiener-like filtering approach, which boils down to using the phase of the original mixture. In...
While most dereverberation methods focus on how to estimate the magnitude of an anechoic signal in the time-frequency domain, we propose a method which also takes the phase into account. By applying a harmonic model to the anechoic signal, we derive a formulation to compute the amplitude and phase of each harmonic. These parameters are then estimated by our method in the presence of reverberation. As ...
This paper addresses the problem of under-determined audio source separation in multichannel reverberant mixtures. We target a semiblind scenario assuming that the mixing filters are known. Source separation is performed from the time-domain mixture signals in order to accurately model the convolutive mixing process. The source signals are however modeled as latent variables in a time-frequency do...
Source separation, which consists of decomposing data into meaningful structured components, is an active research topic in many fields including music signal processing. In this paper, we introduce the Positive α-stable (PαS) distributions, a subclass of the stable distribution family, to model the latent sources. They notably permit us to model random variables that are both nonnegativ...
This paper introduces a new method for single-channel denoising that sheds new light on classical early developments on this topic that occurred in the 1970s and 1980s with Wiener filtering and spectral subtraction. Both operating in the short-time Fourier transform domain, these methods consist of estimating the power spectral density (PSD) of the noise without speech. Then, the clean speech signal...
Hyperparameter estimation is a recurrent problem in the signal and statistics literature. Popular strategies are cross-validation or Bayesian inference, yet it remains an active topic of research in order to offer better or faster algorithms. The models considered here are sparse regression models with convex or non-convex group-Lasso-like penalties. Following the recent work of Pereyra et al. [1]...
While most dereverberation methods focus on how to estimate the amplitude of an anechoic signal, we propose a method which also takes the phase into account. By applying a sinusoidal model to the anechoic signal, we derive a formulation to compute the amplitude and phase of each sinusoid. These parameters are then estimated by our method in the reverberant case. As we jointly estimate the amplitud...
This paper addresses the problem of multichannel audio source separation in under-determined convolutive mixtures. We target a semi-blind scenario assuming that the mixing filters are known. The convolutive mixing process is exactly modeled using the time-domain impulse responses of the mixing filters. We propose a Student's t time-frequency source model based on non-negative matrix factorization ...
In this paper, we focus on the problem of sound source localization and we propose a technique that exploits the known and arbitrary geometry of the microphone array. While most probabilistic techniques presented in the past rely on Gaussian models, we go further in this direction and detail a method for source localization that is based on the recently proposed α-stable harmonizable processes. Th...
In this paper, we focus on modeling multichannel audio signals in the short-time Fourier transform domain for the purpose of source separation. We propose a probabilistic model based on a class of heavy-tailed distributions, in which the observed mixtures and the latent sources are jointly modeled by using a certain class of multivariate alpha-stable distributions. As opposed to the conventional G...
Phase reconstruction of complex components in the time-frequency domain is a challenging but necessary task for audio source separation. While traditional approaches do not exploit phase constraints that originate from signal modeling, some prior information about the phase can be obtained from sinusoidal modeling. In this paper, we introduce a probabilistic mixture model which allows us to incorp...