Loading [MathJax]/extensions/MathZoom.js
In So Kweon - IEEE Xplore Author Profile

Showing 1-25 of 277 results

Results

Unsupervised domain adaptation (UDA) has been a potent technique to handle the lack of annotations in the target domain, particularly in semantic segmentation task. This study introduces a different UDA scenarios where the target domain contains unlabeled video frames. Drawing upon recent advancements of self-supervised learning of the object motion from unlabeled videos with geometric constraint,...Show More
We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models. Constructing a large-scale labeled image captioning dataset is expensive in terms of labor, time, and cost. In contrast to manually annotating all the training samples, separately collecting uni-modal datasets is immensely easier, e.g., a large-scale image dataset and a sentence da...Show More
Diffusion models have gained significant popularity for image-to-image translation tasks. Previous efforts applying diffusion models to image super-resolution have demonstrated that iteratively refining pure Gaussian noise using a U-Net architecture trained on denoising at various noise levels can yield satisfactory high-resolution images from low-resolution inputs. However, this iterative refinem...Show More
Full-sky autonomous star identification is one of the key technologies in the research on star sensors. As one of the classical pattern-based star identification methods, the grid algorithm has shown promising performance. Na further modified it to improve its robustness to position noise. However, the inherent alignment star mismatch and pattern inconsistency are still not solved. To address thes...Show More
Actual image super-resolution is an extremely challenging task due to complex degradations existing in the image. To solve this problem, two dominant methodologies have emerged: degradation-estimation-based Addressing actual image super-resolution remains a formidable challenge due to the intricate degradations present in images. Two primary methodologies have emerged: degradation-estimation-based...Show More
Diffusion probabilistic models (DPM) have been widely adopted in image-to-image translation to generate high-quality images. Prior attempts at applying the DPM to image super-resolution (SR) have shown that iteratively refining a pure Gaussian noise with a conditional image using a U-Net trained on denoising at various-level noises can help obtain a satisfied high-resolution image for the low-reso...Show More
Prior to the deployment of robotic systems, pre-training the deep-recognition models on all potential visual cases is infeasible in practice. Hence, test-time adaptation (TTA) allows the model to adapt itself to novel environments and improve its performance during test time (i.e., lifelong adaptation). Several works for TTA have shown promising adaptation performances in continuously changing env...Show More
Test-time adaptation methods have been gaining attention recently as a practical solution for addressing source-to-target domain gaps by gradually updating the model without requiring labels on the target data. In this paper, we propose a method of test-time adaptation for category-level object pose estimation called TTA-COPE. We design a pose ensemble approach with a self-training loss using pose...Show More
In this paper, we propose a single image scale estimation method based on a novel scale field representation. A scale field defines the local pixel-to-metric conversion ratio along the gravity direction on all the ground pixels. This representation resolves the ambiguity in camera parameters, allowing us to use a simple yet effective way to collect scale annotations on arbitrary images from human ...Show More
Robust and accurate geometric understanding against adverse weather conditions is one top prioritized conditions to achieve a high-level autonomy of self-driving cars. However, autonomous driving algorithms relying on the visible spectrum band are easily impacted by weather and lighting conditions. A long-wave infrared camera, also known as a thermal imaging camera, is a potential rescue to achiev...Show More
Mask-guided matting has shown great practicality compared to traditional trimap-based methods. The mask-guided approach takes an easily-obtainable coarse mask as guidance and produces an accurate alpha matte. To extend the success toward practical usage, we tackle mask-guided matting in the wild, which covers a wide range of categories in their complex context robustly. To this end, we propose a s...Show More
The task of Visual Question Answering (VQA) is known to be plagued by the issue of VQA models exploiting biases within the dataset to make its final prediction. Various previous ensemble based debiasing methods have been proposed where an additional model is purposefully trained to be biased in order to train a robust target model. However, these methods compute the bias for a model simply from th...Show More
This paper presents a simple yet effective approach that improves continual test-time adaptation (TTA) in a memory-efficient manner. TTA may primarily be conducted on edge devices with limited memory, so reducing memory is crucial but has been overlooked in previous TTA studies. In addition, long-term adaptation often leads to catastrophic forgetting and error accumulation, which hinders applying ...Show More
Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt [33], have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they ...Show More
Counterfactuals have been shown to be a powerful method in Visual Question Answering in the alleviation of Visual Question Answering’s unimodal bias. However, existing counterfactual methods tend to generate samples that are not diverse or require auxiliary models to synthesize additional data. In this regard, we propose a more diverse and simple counterfactual sample synthesis method called Count...Show More
The goal of this work is to develop self-sufficient framework for Continuous Sign Language Recognition (CSLR) that addresses key issues of sign language recognition. These include the need for complex multi-scale features such as hands, face, and mouth for understanding, and absence of frame-level annotations. To this end, we propose (1) Divide and Focus Convolution (DFConv) which extracts both ma...Show More
Universal Domain Adaptation aims to transfer the knowledge between the datasets by handling two shifts: domain-shift and category-shift. The main challenge is correctly distinguishing the unknown target samples while adapting the distribution of known class knowledge from source to target. Most existing methods approach this problem by first training the target adapted known classifier and then re...Show More
To understand our surrounding world, our brain is continuously inundated with multisensory information and their complex interactions coming from the outside world at any given moment. While processing this information might seem effortless for human brains, it is challenging to build a machine that can perform similar tasks since complex interactions cannot be dealt with a single type of integrat...Show More
Recently, thermal image based 3D understanding is gradually attracting attention for an illumination condition agnostic machine vision. However, the difficulty of the thermal image lies in insufficient training supervision due to its low-contrast and texturesless properties. Also, introducing additional modality requires further constraints such as complicated multi-sensor calibration and synchron...Show More
Recent state-of-the-art active learning methods have mostly leveraged generative adversarial networks (GANs) for sample acquisition; however, GAN is usually known to suffer from instability and sensitivity to hyperparameters. In contrast to these methods, in this article, we propose a novel active learning framework that we call Maximum Classifier Discrepancy for Active Learning (MCDAL) that takes...Show More
With the increasing social demands of disaster response, methods of visual observation for rescue and safety have become increasingly important. However, because of the shortage of datasets for disaster scenarios, there has been little progress in computer vision and robotics in this field. With this in mind, we present the first large-scale synthetic dataset of egocentric viewpoints for disaster ...Show More
In this paper, we propose a multi-objective camera ISP framework that utilizes Deep Reinforcement Learning (DRL) and camera ISP toolbox that consist of network-based and conventional ISP tools. The proposed DRL-based camera ISP framework iteratively selects a proper tool from the toolbox and applies it to the image to maximize a given vision task-specific reward function. For this purpose, we impl...Show More
Test-time adaptation approaches have recently emerged as a practical solution for handling domain shift without access to the source domain data. In this paper, we propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation. We find that, directly applying existing methods usually results in performance instability at test time, because multi-modal input is...Show More
Recently, memory-based approaches show promising results on semi-supervised video object segmentation. These methods predict object masks frame-by-frame with the help of frequently updated memory of the previous mask. Different from this per-frame inference, we investigate an alternative perspective by treating video object segmentation as clip-wise mask propagation. In this per-clip inference sch...Show More
The capability of the traditional semi-supervised learning (SSL) methods is far from real-world application due to severely biased pseudo-labels caused by (1) class imbalance and (2) class distribution mismatch between labeled and unlabeled data. This paper addresses such a relatively under-explored problem. First, we propose a general pseudo-labeling framework that class-adaptively blends the sem...Show More