Hao Tang - IEEE Xplore Author Profile

Showing 1-25 of 77 results

Results

In this paper, we present an innovative solution tailored for the intricate challenges of the virtual try-on task: our novel Hierarchical Cross-Attention Network, HCANet. HCANet is meticulously crafted with two primary stages: geometric matching and try-on, each playing a crucial role in delivering realistic and visually convincing virtual try-on outcomes. A distinctive feature of HCANet is the inc...
Over the years, deep learning-based automatic target recognition (ATR) in synthetic aperture radar (SAR) imagery has made remarkable progress on the assumption that the target category library is immutable. However, the target category library will continue to expand over time in real-world scenarios, so the ATR model should be updated to acquire reasoning capabilities for subsequently acquired targ...
Autonomous driving platforms encounter diverse driving scenarios, each with varying hardware resources and precision requirements. Given the computational limitations of embedded devices, it is crucial to consider computing costs when deploying on target platforms like the DRIVE PX 2. Our objective is to customize the semantic segmentation network according to the computing power and specific scen...
Extracting keypoint locations from input hand frames, known as 3D hand pose estimation, is a critical task in various human-computer interaction applications. Essentially, the 3D hand pose estimation can be regarded as a 3D point subset generative problem conditioned on input frames. Thanks to the recent significant progress on diffusion-based generative models, hand pose estimation can also benef...
Despite the impressive achievements of Deep Neural Networks (DNNs) in computer vision, their vulnerability to adversarial attacks remains a critical concern. Extensive research has demonstrated that incorporating sophisticated perturbations into input images can lead to a catastrophic degradation in DNNs’ performance. This perplexing phenomenon not only exists in the digital space but also in the ...
The rise of deep learning has given a strong boost to the rapid development of automatic target recognition (ATR) in synthetic aperture radar (SAR) imagery. Existing SAR ATR methods can achieve impressive results when a great many labeled samples are available. However, in real SAR application scenarios, the acquisition of a large number of SAR samples is costly or sometimes infeasible. Thus, SAR...
As deep learning technology advances, human fall detection (HFD) leveraging convolutional neural networks (CNNs) has recently garnered significant interest within the research community. However, most existing works ignore the cross-frame association of skeleton keypoints and aggregation of feature representations. To address this, we first introduce an image preprocessing (IPP) module, which enha...
Vision foundation models (VFMs), such as the segment anything model (SAM), allow zero-shot or interactive segmentation of visual contents; thus, they are quickly applied in a variety of visual scenes. However, their direct use in many remote sensing (RS) applications is often unsatisfactory due to the special imaging properties of RS images (RSIs). In this work, we aim to utilize the strong visual...
We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for challenging graph-constrained architectural layout generation tasks. The proposed graph-Transformer-based generator includes a novel graph Transformer encoder that combines graph convolutions and self-attentions in a Transformer to model both local and gl...
High Dynamic Range (HDR) images can be recovered from several Low Dynamic Range (LDR) images by existing Deep Neural Network (DNN) techniques. Despite the remarkable progress, DNN-based methods still generate ghosting artifacts when LDR images have saturation and large motion, which hinders potential applications in real-world scenarios. To address this challenge, we formulate the HDR deghosting...
Jointly processing information from multiple sensors is crucial to achieving accurate and robust perception for reliable autonomous driving systems. However, current 3D perception research follows a modality-specific paradigm, leading to additional computation overheads and inefficient collaboration between different sensor data. In this paper, we present an efficient multi-modal backbone for outd...
Owing to the large distribution gap between the heterogeneous data in Visible-Infrared Person Re-identification (VI Re-ID), we point out that existing paradigms often suffer from the inter-modal semantic misalignment issue and thus fail to align and compare local details properly. In this paper, we present Concordant Attention Learning (CAL), a novel framework that learns semantic-aligned represen...
We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for the challenging graph-constrained house generation task. The proposed graph-Transformer-based generator includes a novel graph Transformer encoder that combines graph convolutions and self-attentions in a Transformer to model both local and global interac...
Generating a high-quality High Dynamic Range (HDR) image from dynamic scenes has recently been extensively studied by exploiting Deep Neural Networks (DNNs). Most DNN-based methods require a large amount of training data with ground truth, requiring tedious and time-consuming work. Few-shot HDR imaging aims to generate satisfactory images with limited data. However, it is difficult for modern DNN...
With the ever-increasing popularity of edge devices, it is necessary to implement real-time segmentation on the edge for autonomous driving and many other applications. Vision Transformers (ViTs) have shown considerably stronger results for many vision tasks. However, ViTs with the full attention mechanism usually consume a large number of computational resources, leading to difficulties for real- ...
Synthesizing high-fidelity complex images from text is challenging. Based on large pretraining, the autoregressive and diffusion models can synthesize photo-realistic images. Although these large models have shown notable progress, there remain three flaws. 1) These models require tremendous training data and parameters to achieve good performance. 2) The multi-step generation design slows the ima...
The rapid advances in Vision Transformers (ViTs) have refreshed the state-of-the-art performance in various vision tasks, overshadowing the conventional CNN-based models. This has prompted several recent "strike-back" studies in the CNN world showing that pure CNN models can achieve performance as good as ViT models when carefully tuned. While encouraging, designing such high-performance CNN models is challengi...
Transformer-based models have recently achieved favorable performance in artistic style transfer thanks to their global receptive field and powerful multi-head/layer attention operations. Nevertheless, the over-parameterized multi-layer structure increases parameters significantly and thus presents a heavy burden for training. Moreover, for the task of style transfer, vanilla Transformer that fuses content ...
The aim of this paper is to propose a large scale dataset for image restoration (LSDIR). Recent work in image restoration has been focused on the design of deep neural networks. The datasets used to train these networks ‘only’ contain some thousands of images, which is still incomparable with the large scale datasets for other vision tasks such as visual recognition and object detection. The small...
We propose a novel edge guided generative adversarial network with contrastive learning (ECGAN) for the challenging semantic image synthesis task. Although considerable improvements have been achieved by the community in the recent period, the quality of synthesized images is far from satisfactory due to three largely unresolved challenges. 1) The semantic labels do not provide detailed structural...
Data-driven automatic target recognition (ATR) methods have become the mainstream in the synthetic aperture radar (SAR) community at this stage. However, in real SAR application scenarios, the scarcity of training samples is a common problem. Especially in military application scenarios, only a small number of samples of each type of target are usually available. In the case of limited sample avai...
3D-aware GANs have shown impressive power in 3D control of synthesized portraits. While plausible facial realism is achieved, the inherent 3D properties of the generated results have not actually been well analyzed. One of the reasons is that the widely used metrics, such as Inception Score (IS) or Fréchet Inception Distance (FID), focus more on the perceptual features rather than e...
For semantic-guided cross-view image translation, it is crucial to learn where to sample pixels from the source view image and where to reallocate them guided by the target view semantic map, especially when there is little overlap or drastic view difference between the source and target images. Hence, one not only needs to encode the long-range dependencies among pixels in both the source view i...
Brain vessel image segmentation can be used as a promising biomarker for better prevention and treatment of different diseases. One successful approach is to consider the segmentation as an image-to-image translation task and use a conditional Generative Adversarial Network (cGAN) to learn a transformation between two distributions. In this paper, we present a novel multi-view approach, MLP-GA...
Deep-learning-based synthetic aperture radar (SAR) automatic target recognition (ATR) algorithms have achieved outstanding performance under the condition of hundreds or thousands of training samples in recent years. Nevertheless, it is often difficult to acquire great quantities of target samples in real SAR application scenarios. This article proposes a novel ATR method called transductive prototypic...