Shengfeng He - IEEE Xplore Author Profile

We introduce a new task, Open-set Mixed Domain Adaptation (OSMDA), which considers the potential mixture of multiple distributions in the target domains, thereby better simulating real-world scenarios. To tackle the semantic ambiguity arising from multiple domains, our key idea is that the linguistic representation can serve as a universal descriptor for samples of the same category across various...
In recent years, significant progress has been made in prototype-based learning methods for few-shot semantic segmentation. However, prototype features originating from the support images are interfered with by intra-class diversity and thus cannot be aligned with the query foreground, resulting in poor segmentation accuracy. Therefore, we propose a novel self-support prototype-aware (SSPA) networ...
The vulnerability of 3D point cloud analysis to unpredictable rotations poses an open yet challenging problem: orientation-aware 3D domain generalization. Cross-domain robustness and adaptability of 3D representations are crucial but not easily achieved through rotation augmentation. Motivated by the inherent advantages of intricate orientations in enhancing generalizability, we propose an innovat...
In this paper, we propose a novel translation model, UniTranslator, for transforming representations between visually distinct domains under conditions of limited training data and significant visual differences. The main idea behind our approach is leveraging the domain-neutral capabilities of CLIP as a bridging mechanism, while utilizing a separate module to extract abstract, domain-agnostic sem...
Crowd counting has drawn increasing attention across various fields. However, existing crowd counting tasks primarily focus on estimating the overall population, ignoring the behavioral and semantic information of different social groups within the crowd. In this paper, we aim to address a newly proposed research problem, namely fine-grained crowd counting, which involves identifying different cat...
RGB-Thermal Salient Object Detection (RGB-T SOD) aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images. A key challenge lies in bridging the inherent disparities between RGB and Thermal modalities for effective saliency map prediction. Traditional encoder-decoder architectures, while designed for cross-modality feature interactions, may not have adequately ...
Throughout history, static paintings have captivated viewers within display frames, yet the possibility of making these masterpieces vividly interactive remains intriguing. This research paper introduces 3DArtmator, a novel approach that aims to represent artforms in a highly interpretable stylized space, enabling 3D-aware animatable reconstruction and editing. Our rationale is to transfer the int...
Monocular depth prediction has received significant attention in recent years. However, the impact of illumination variations, which can shift scenes to unseen domains, has often been overlooked. To address this, we introduce the first indoor scene dataset featuring RGB-D images captured under multiple illumination conditions, allowing for a comprehensive exploration of indoor depth prediction. Ad...
Although the pre-trained large-scale generative models of the StyleGAN series have proven to be effective in various editing and translation tasks, they are limited to a pre-defined fixed aspect ratio. To overcome this limitation, we propose StyleGAN-$\infty$, a model that enables pre-trained StyleGAN to perform arbitrary-ratio conditional synthesis. Our key insight is to distill the expressive StyleGAN featur...
Current sketch extraction methods either require extensive training or fail to capture a wide range of artistic styles, limiting their practical applicability and versatility. We introduce Mixture-of-Self-Attention (MixSA), a training-free sketch extraction method that leverages strong diffusion priors for enhanced sketch perception. At its core, MixSA employs a mixture-of-self-attention technique...
Multi-view representation learning aims to derive robust representations that are both view-consistent and view-specific from diverse data sources. This paper presents an in-depth analysis of existing approaches in this domain, highlighting a commonly overlooked aspect: the redundancy between view-consistent and view-specific representations. To this end, we propose an innovative framework for mul...
We propose a voxel-based optimization framework, ReVoRF, for few-shot radiance fields that strategically addresses the unreliability in pseudo novel view synthesis. Our method pivots on the insight that relative depth relationships within neighboring regions are more reliable than the absolute color values in disoccluded areas. Consequently, we devise a bilateral geometric consistency loss that c...
Point-based interactive editing serves as an essential tool to complement the controllability of existing generative models. A concurrent work, DragDiffusion, updates the diffusion latent map in response to user inputs, causing global latent map alterations. This results in imprecise preservation of the original content and unsuccessful editing due to gradient vanishing. In contrast, we present...
Existing methods for asymmetric image retrieval employ a rigid pairwise similarity constraint between the query network and the larger gallery network. However, these one-to-one constraint approaches often fail to maintain retrieval order consistency, especially when the query network has limited representational capacity. To overcome this problem, we introduce the Decoupled Differential Distillat...
In this paper, we delve into a novel aspect of learning novel diffusion conditions with datasets an order of magnitude smaller. The rationale behind our approach is the elimination of textual constraints during the few-shot learning process. To that end, we implement two optimization strategies. The first, prompt-free conditional learning, utilizes a prompt-free encoder derived from a pre-trained ...
Reversible face anonymization, unlike traditional face pixelization, seeks to replace sensitive identity information in facial images with synthesized alternatives, preserving privacy without sacrificing image clarity. Traditional methods, such as encoder-decoder networks, often result in significant loss of facial details due to their limited learning capacity. Additionally, relying on latent man...
Mild cognitive impairment (MCI) represents an early stage of Alzheimer’s disease (AD), characterized by subtle clinical symptoms that pose challenges for accurate diagnosis. The quest for the identification of MCI individuals has highlighted the importance of comprehending the underlying mechanisms of disease causation. Integrated analysis of brain imaging and genomics offers a promising avenue fo...
3D neural rendering enables photo-realistic reconstruction of a specific scene by encoding discontinuous inputs into a neural representation. Despite the remarkable rendering results, the storage of network parameters is not transmission-friendly and not extendable to metaverse applications. In this paper, we propose an invertible neural rendering approach that enables generating an interactive 3D...
In Visual Question Answering (VQA), addressing language prior bias, where models excessively rely on superficial correlations between questions and answers, is crucial. This issue becomes more pronounced in real-world applications with diverse domains and varied question-answer distributions during testing. To tackle this challenge, Test-time Adaptation (TTA) has emerged, allowing pre-trained VQA ...
The degradation of printed photographs due to inadequate preservation is a major problem that can be addressed through deep learning-based restoration methods. However, these methods are often limited by their reliance on annotated data, making them less effective for new domains with limited training samples. In this paper, we propose a semi-supervised old photo restoration network that employs a...
Text-to-image generation models have significantly broadened the horizons of creative expression through the power of natural language. However, navigating these models to generate unique concepts, alter their appearance, or reimagine them in unfamiliar roles presents an intricate challenge. For instance, how can we exploit language-guided models to transpose an anime character into a different ar...
Enabling efficient and accurate deep neural network (DNN) inference on microcontrollers is non-trivial due to the constrained on-chip resources. Current methodologies primarily focus on compressing larger models yet at the expense of model accuracy. In this paper, we rethink the problem from the inverse perspective by constructing small/weak models directly and improving their accuracy. Thus, we i...
Crowd image is arguably one of the most laborious data to annotate. In this paper, we aim to reduce the massive demand for densely labeled crowd data, and propose a novel weakly-supervised setting, in which we leverage the binary ranking of two images with high-contrast crowd counts as training guidance. To enable training under this new setting, we convert the crowd count regression problem to a ...
The fully convolutional network (FCN) has dominated salient object detection for a long period. However, the locality of CNN requires the model to be deep enough to have a global receptive field, and such a deep model always leads to the loss of local details. In this paper, we introduce a new attention-based encoder, vision transformer, into salient object detection to ensure the globalization of the re...
HD map reconstruction is crucial for autonomous driving. LiDAR-based methods are limited due to expensive sensors and time-consuming computation. Camera-based methods usually need to perform road segmentation and view transformation separately, which often causes distortion and missing content. To push the limits of the technology, we present a novel framework that reconstructs a local map formed ...