
Real-Time User-guided Adaptive Colorization with Vision Transformer



Abstract:

Recently, the vision transformer (ViT) has achieved remarkable performance in computer vision tasks and has been actively utilized in colorization. The vision transformer uses multi-head self-attention to effectively propagate user hints to distant, relevant areas of the image. However, despite the success of vision transformers in image colorization, the heavy underlying ViT architecture and its large computational cost hinder active real-time user interaction in colorization applications. Several studies have removed redundant image patches to reduce the computational cost of ViT in image classification, but these efficient ViT methods cause severe performance degradation in colorization because they remove the redundant patches entirely. We therefore propose AdaColViT, a novel efficient ViT architecture for real-time interactive colorization that determines which redundant image patches and layers to reduce in the ViT. Unlike existing methods, our pruning method alleviates the performance drop and flexibly allocates computational resources across input samples, achieving actual acceleration. Extensive experiments on the ImageNet-ctest10k, Oxford 102flowers, and CUB-200 datasets demonstrate that our method outperforms the baseline methods.
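To make the adaptive pruning idea concrete, below is a minimal, hypothetical PyTorch sketch; it is not the paper's actual AdaColViT implementation, and `PatchRouter`, `scorer`, and `keep_ratio` are illustrative names. A learned scorer ranks the patch tokens entering a layer, only the top-k tokens are processed by the transformer block, and the remaining tokens bypass it unchanged, so computation is reallocated rather than information being discarded as in hard patch removal.

```python
import torch
import torch.nn as nn

class PatchRouter(nn.Module):
    """Hypothetical per-layer router: only the top-k scored patch tokens
    are processed by the transformer block, while the rest bypass it via
    an identity path instead of being removed outright."""

    def __init__(self, dim: int, keep_ratio: float = 0.5):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)  # learned importance score per token
        self.keep_ratio = keep_ratio

    def forward(self, tokens: torch.Tensor, block: nn.Module) -> torch.Tensor:
        # tokens: (B, N, D) patch embeddings
        B, N, D = tokens.shape
        k = max(1, int(N * self.keep_ratio))
        scores = self.scorer(tokens).squeeze(-1)       # (B, N)
        keep_idx = scores.topk(k, dim=1).indices       # (B, k)

        # Run only the selected tokens through the (costly) block.
        gather_idx = keep_idx.unsqueeze(-1).expand(-1, -1, D)
        active = block(tokens.gather(1, gather_idx))   # (B, k, D)

        # Scatter the processed tokens back; the others pass through
        # unchanged, so no patch information is permanently discarded.
        out = tokens.clone()
        out.scatter_(1, gather_idx, active)
        return out

# Example: route half of 196 patch tokens through one encoder layer.
dim = 192
block = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
router = PatchRouter(dim, keep_ratio=0.5)
out = router(torch.randn(2, 196, dim), block)  # (2, 196, 192)
```

Because the bypassed tokens survive to later layers, a router like this can keep per-sample cost flexible (fewer active tokens for easy inputs) without the complete removal that hurts dense-prediction tasks such as colorization.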
Date of Conference: 03-08 January 2024
Date Added to IEEE Xplore: 09 April 2024
Conference Location: Waikoloa, HI, USA


1. Introduction

Colorization is challenging because it requires a semantic understanding of the scene and of the natural colors that occur in the wild; nevertheless, various user-guided image colorization methods have shown remarkable results in restoring grayscale photographs as well as black-and-white films. Among user-guided approaches, point-interactive colorization methods [12], [27], [36] colorize an image from sparse user-provided color hints while minimizing the interaction required of the user. In particular, [36] proposed a U-Net-based colorization model trained on ImageNet [3] with synthetic user hints generated through 2-D Gaussian sampling (see the sketch below). However, prior works suffer from partial colorization, in which image regions with unclear boundaries are not colored successfully. They also fail to colorize consistently, because hints are difficult to propagate to large and distant semantic regions. To tackle this problem, [33] leverages the vision transformer (ViT) architecture, allowing the model to learn to propagate user hints to distant but similar regions through self-attention. Despite the exceptional performance of ViT in colorization applications, transformer-based models contain redundant computations that slow inference, limiting users' active interaction in real-time colorization applications.
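The synthetic hint generation mentioned above can be sketched as follows. This is an illustrative reconstruction rather than the exact procedure of [36]: hint locations are drawn from a 2-D Gaussian centered on the image, and each hint reveals the mean ground-truth ab chrominance of a small patch. The hint count, patch size, and Gaussian scale here are placeholder values.

```python
import numpy as np

def sample_hints(ab: np.ndarray, num_hints: int = 10,
                 patch: int = 3, sigma_frac: float = 0.25):
    """Simulate user color hints for training, loosely following [36].

    ab:      (H, W, 2) ground-truth chrominance channels.
    Returns a sparse hint image (H, W, 2) and a binary mask (H, W, 1).
    """
    H, W, _ = ab.shape
    hints = np.zeros_like(ab)
    mask = np.zeros((H, W, 1), dtype=ab.dtype)
    for _ in range(num_hints):
        # Gaussian-distributed hint location, clipped to the image bounds.
        y = int(np.clip(np.random.normal(H / 2, H * sigma_frac), 0, H - 1))
        x = int(np.clip(np.random.normal(W / 2, W * sigma_frac), 0, W - 1))
        y0, y1 = max(0, y - patch // 2), min(H, y + patch // 2 + 1)
        x0, x1 = max(0, x - patch // 2), min(W, x + patch // 2 + 1)
        # Reveal the average ground-truth color of the patch at this point.
        hints[y0:y1, x0:x1] = ab[y0:y1, x0:x1].mean(axis=(0, 1))
        mask[y0:y1, x0:x1] = 1.0
    return hints, mask
```

At training time, the grayscale L channel, the hint image, and the mask would typically be concatenated as the network input, so the model learns to spread sparse local hints over the full image.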

