
UniHuman: A Unified Model For Editing Human Images in the Wild


Abstract:

Human image editing includes tasks like changing a person's pose, their clothing, or editing the image according to a text prompt. However, prior work often tackles these tasks separately, overlooking the benefit of mutual reinforcement from learning them jointly. In this paper, we propose UniHuman, a unified model that addresses multiple facets of human image editing in real-world settings. To enhance the model's generation quality and generalization capacity, we leverage guidance from human visual encoders and introduce a lightweight pose-warping module that can exploit different pose representations, accommodating unseen textures and patterns. Furthermore, to bridge the disparity between existing human editing benchmarks and real-world data, we curated 400K high-quality human image-text pairs for training and collected 2K human images for out-of-domain testing, both encompassing diverse clothing styles, backgrounds, and age groups. Experiments on both in-domain and out-of-domain test sets demonstrate that UniHuman outperforms task-specific models by a significant margin. In user studies, UniHuman is preferred by users in an average of 77% of cases. Our project is available at this link.
Date of Conference: 16-22 June 2024
Date Added to IEEE Xplore: 16 September 2024
Conference Location: Seattle, WA, USA

1. Introduction

In computer graphics and computer vision, the synthesis and manipulation of human images have evolved into a rich and impactful field with applications across a range of domains: reposing strives to generate a new pose of a person given a target pose [2], [43], [45], [47], virtual try-on aims to seamlessly fit a new garment onto a person [23], [26], [48], and text-to-image editing manipulates a person's clothing style based on text prompts [5], [11], [12], [40]. However, most approaches address these tasks in isolation, neglecting the benefits of learning them jointly so that they mutually reinforce one another through the auxiliary information provided by related tasks [9], [16], [42]. In addition, few studies have explored effective ways to adapt to unseen human-in-the-wild cases.
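The joint-task framing above can be made concrete with a small sketch. The snippet below is purely illustrative: the class, method, and field names (UnifiedHumanEditor, EditConditions, edit) are assumptions for exposition, not the paper's actual interface or architecture. It only shows how a single model could serve reposing, virtual try-on, and text-based editing by switching on which conditioning signals are supplied.

```python
# Illustrative sketch only: all names here are hypothetical and do not
# reflect UniHuman's actual implementation.
from dataclasses import dataclass
from typing import Optional


@dataclass
class EditConditions:
    """Conditioning signals; the editing task is implied by which fields are set."""
    target_pose: Optional[object] = None    # e.g., keypoints or a dense pose map (reposing)
    garment_image: Optional[object] = None  # reference garment image (virtual try-on)
    text_prompt: Optional[str] = None       # natural-language instruction (text-based editing)


class UnifiedHumanEditor:
    """A single model serving all three editing tasks through one entry point."""

    def edit(self, source_image: object, cond: EditConditions) -> object:
        # A unified model can encode the source person once and then condition
        # generation on whichever signals are provided, letting the tasks share
        # representations instead of being trained in isolation.
        raise NotImplementedError("Placeholder for a shared conditional generator.")
```

The design point this sketch illustrates is the one argued above: because all tasks flow through a shared encoder and generator, supervision from one task (e.g., garment textures seen during try-on) can benefit the others.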

Figure 1. The results of UniHuman on diverse real-world images. UniHuman learns informative representations by leveraging multiple data sources and connections between related tasks, achieving high-quality results across various human image editing objectives.

