Li Song - IEEE Xplore Author Profile

Showing 1-25 of 197 results

With the increasing consumption of 3D displays and virtual reality, multi-view video has become a promising format. However, its high resolution and multi-camera shooting result in a substantial increase in data volume, making storage and transmission a challenging task. To tackle these difficulties, we propose an implicit-explicit integrated representation for multi-view video compression. Specif...
Realistic image restoration is a crucial task in computer vision, and diffusion-based models for image restoration have garnered significant attention due to their ability to produce realistic results. Restoration can be seen as controllable generation conditioned on priors. However, due to the severity of image degradation, existing diffusion-based restoration methods cannot fully exploit prio...
Recently, numerous complexity control approaches have been proposed to achieve a target encoding complexity. However, only a few of them were developed for VVC encoders. This paper fills this gap by proposing an efficient and flexible complexity control approach for VVC. The support for both Acceleration Ratio Control (ARC) and Encoding Time Control (ETC) makes our method highly versatile for vari...
In streaming media services, video transcoding is a common practice to alleviate bandwidth demands. Unfortunately, traditional methods employing a uniform rate factor (RF) across all videos often result in significant inefficiencies. Content-adaptive encoding (CAE) techniques address this by dynamically adjusting encoding parameters based on video content characteristics. However, existing CAE met...
The rapid advancements in medical imaging have led to a growing demand for high-performance lossless compression of large 3D medical image datasets. Unlike natural images, medical images typically feature three-dimensional structures and high bit depth, necessitating specialized compression techniques. Based on a decoder-only transformer, we propose a learnable dual-decoder model for lossless com...
Learned image compression (LIC) methods often employ symmetrical encoder and decoder architectures, inevitably increasing decoding time. However, practical scenarios demand an asymmetric design, where the decoder requires low complexity to cater to diverse low-end devices, while the encoder can accommodate higher complexity to improve coding performance. In this paper, we propose an asymmetric light...
HTTP adaptive streaming (HAS) constructs bitrate ladders to deliver videos with the best possible quality under varying network conditions. Though per-shot content adaptive encoding (CAE) largely improves the compression efficiency by constructing the optimal bitrate ladder for each video shot, it suffers from excessive encoding complexity as all the points in the operating space (typically resolu...
Generating photo-realistic avatars from audio plays an important role in extended reality (XR) and the metaverse. In this paper, we lift the input audio from speech to singing, which has rarely been studied. The significant distinction between singing and talking poses great challenges for adapting talking face generation methods to the singing regime. To address this, we propose a high-fidelity singi...
Volumetric videos, benefiting from immersive 3D realism and interactivity, hold vast potential for various applications, while the tremendous data volume poses significant challenges for compression. Recently, NeRF has demonstrated remarkable potential in volumetric video compression thanks to its simple representation and powerful 3D modeling capabilities, where a notable work is ReRF. However, R...
Recently, NVS in human-object interaction scenes has received increasing attention. Existing human-object interaction datasets mainly consist of static data with limited views, offering only RGB images or videos, mostly containing interactions between a single person and objects. Moreover, these datasets exhibit complexities in lighting environments, poor synchronization, and low resolution, hinde...
In recent years, numerous learned video compression (LVC) methods have emerged, demonstrating rapid development and satisfactory performance. However, most previous methods use only the single previous frame as a reference. Although some works introduce the use of multiple previous frames, the exploitation of temporal information is not comprehensive. Our proposed method not only utilize...
Reconstructing the human body mesh often faces challenges like self-occlusion, object occlusion, and interference from other people. However, focusing on the model's robustness in scenarios of occlusion leads to a compromise in the accuracy of estimating non-occluded humans. Striking the right balance is a research question worth exploring. In this study, we introduce the Visibility-aware Human Me...
Colorizing grayscale images offers an engaging visual experience. Existing automatic colorization methods often fail to generate satisfactory results due to incorrect semantic colors and unsaturated colors. In this work, we propose an automatic colorization pipeline to overcome these challenges. We leverage the extraordinary generative ability of the diffusion prior to synthesize color with plausi...
We introduce LFCAVE, an interactive 3D display system comprised of display and interaction modules. In the display aspect, we have developed a multi-screen light field model that incorporates multiple consumer-grade light field displays for seamless multi-screen presentations. Compared to traditional single-screen setups, our system offers an expanded viewing angle and accommodates a larger number...
Restoring old photos that contain numerous unknown and complex defects is a challenging and ill-posed problem. Traditional methods often struggle to address both structured and unstructured defects in real old photos, frequently leading to over-smoothed and incomplete results. In this paper, we exploit powerful diffusion priors to construct a novel solution for the restoration of old photos. Our ...
The advent of Free Viewpoint Video (FVV) marks a significant evolution in internet video services, moving from traditional formats to more interactive and immersive experiences. Current free viewpoint video transmission systems face several challenges, including insufficient scalability in high-concurrency scenarios and additional response delays during interaction. To address these issues, we prop...
With the rise of deep learning and the widespread use of face recognition, face image privacy has become a critical research issue. Face de-identification is acknowledged as effective for protecting identity privacy. As media formats diversify, it is imperative to extend privacy protection to videos. Addressing the core problem of identity consistency between frames, we propose a video de-identifi...
The multi-path data scheduler is the pivotal element influencing the performance of any multipath transport. Multi-site parallel downloading (MPD), which has emerged as an alternative to the costly traditional dedicated Content Delivery Network (CDN), requests video segments from multiple economical edge data nodes simultaneously. However, the existing data scheduler for MPD solely p...
This paper proposes a novel no-reference quality assessment method for text-to-image generation. Text-to-image refers to the process of generating image content from textual descriptions using deep learning models. Although advances in technology and improvements in models have made it possible to generate some high-quality images, some generated images still exhibit unique distortions that reflec...
The use of social media networks and mobile devices has experienced tremendous growth in recent years. This has led to a surge in the number of videos recorded and uploaded to social media platforms like TikTok and YouTube. However, this increase has also resulted in the rise of illegal duplicate videos, which are essentially the same as the original videos but with minor editing effects and varia...
Talking face generation aims at generating photo-realistic video portraits of a target person driven by input audio. Owing to the nature of the audio-to-lip-motion mapping, the same speech content may have different appearances even for the same person on different occasions. This one-to-many mapping problem brings ambiguity during training and thus causes inferior visual results. Although this o...
Novel-view synthesis with sparse input views is important for practical applications such as AR/VR and autonomous driving. Many works in this field have already integrated depth information into NeRF, utilizing depth priors for assistance in geometric and spatial understanding. However, most existing work tends to either overlook the inaccuracies in depth maps or only handle them roughly, limiting...
Text patterns typically exhibit distinct boundaries and sparse color histograms. However, in current hybrid codec frameworks, the positions of coding units are often misaligned with the text patterns, resulting in prediction and color mapping tools consuming a large number of bits to indicate these patterns. Nowadays, some text detection and recognition methods have been proposed to accurately loc...
Depth image-based rendering (DIBR) view synthesis is the most widely employed method in real-time FVV research. Despite recent progress, most DIBR-based FVV synthesis approaches are not sufficiently simple and effective in filling holes and removing artifacts. Additionally, they either use RGB-D cameras, which are difficult to adopt widely, or take considerable time to estimate high-quality depth images. This arti...
Recent advancements in SDRTV-to-HDRTV conversion have yielded impressive results in reconstructing high dynamic range television (HDRTV) videos from standard dynamic range television (SDRTV) videos. However, the practical applications of these techniques are limited for ultra-high definition (UHD) video systems due to their high computational and memory costs. In this paper, we propose EffiHDR, an...