Jiashuo Yu - IEEE Xplore Author Profile

Although audio-visual representation has been proven to be applicable in many downstream tasks, the representation of dancing videos, which is more specific and always accompanied by music with complex auditory contents, remains challenging and uninvestigated. Considering the intrinsic alignment between the cadent movement of the dancer and music rhythm, we introduce MuDaR, a novel Music-Dance Rep...
Audio-visual event localization aims to localize an event that is both audible and visible in the wild, which is a widespread audio-visual scene analysis task for unconstrained videos. To address this task, we propose a Multimodal Parallel Network (MPN), which can perceive global semantics and unmixed local information in parallel. Specifically, our MPN framework consists of a classification subnet...
Speech enhancement in realistic scenarios still poses many challenges, such as complex background signals and data limitations. In this paper, we present a co-attention based framework that incorporates self-supervised and curriculum learning to derive the target speech in noisy environments. Specifically, we first leverage self-supervision to pre-train the co-attention model on the task of audi...