1. Introduction
Comics represent an important part of cultural heritage, preserving decades of artistic expression, stories, societal views, and lore that predate digital media. Appreciated across age groups, the medium has undergone significant evolution over the past decade. In particular, digital comics have proliferated rapidly as a consequence of reduced costs, easier distribution, and ubiquitous access. The growing demand for comics digitization across platforms such as computers, tablets, and mobile phones calls for the automated extraction and identification of the relevant elements within comic books. This process of automatically transferring the semantic or graphical units of comics from one publication medium to another is referred to as the reconfiguration of comics. To achieve such reconfiguration across publishing platforms, a viable first step is to investigate the relations between comics elements by means of computational modeling. However, this is a significantly challenging problem due to 1) the disparate styles of comics panels, 2) different text layouts, 3) the changing appearances of comics characters, and 4) the different image scales of elements, panels, etc. Existing studies that investigate the various elements in comics are typically restricted to speech balloon segmentation [1], [20], [33], text detection [32], panel detection [31], [34], [46], comics character detection [10], [26], and region-of-interest detection [29]. Recently, [3] presented a depth estimation method for comics that exploits the scene context. While these methods achieve promising results, they do not produce strong cues for reconfiguring comics. Moreover, none of these techniques offers a unified approach that simultaneously produces multiple dense predictions, such as semantic segmentation and depth estimation, for the graphical elements.
In this paper, we present a multi-task learning (MTL) method to segment and estimate the depth of the graphical contents within a comics panel, both of which are significant cues for reconfiguring comics across different media, as shown in Figure 1. This would help comics authors disseminate their work across diverse publication channels, thereby benefiting the comics industry. Our contributions are as follows: 1) We introduce a cross-domain multi-task method that performs dense predictions by leveraging an off-the-shelf unsupervised image-to-image (I2I) translation method and a vision transformer backbone. 2) We exploit long-range transformer attention [21] to achieve segmentation and depth predictions in the comics domain. To this end, we use a domain-transferable attention mechanism that enforces similarity between the domains. 3) We combine our dense MTL predictions with an existing retargeting algorithm that successfully reconfigures comics panels across different media.
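The core idea of multi-task dense prediction can be sketched as a shared encoder whose features feed two per-pixel heads, one for segmentation and one for depth. The following toy NumPy sketch illustrates only this shared-backbone structure; all shapes, layer choices, and function names are illustrative assumptions, not the architecture proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # Pointwise projection: x (H, W, C_in) @ w (C_in, C_out) -> (H, W, C_out).
    return x @ w

def shared_backbone(img, w):
    # Stand-in for the transformer encoder: a single projection + ReLU.
    return np.maximum(conv1x1(img, w), 0.0)

def segmentation_head(feat, w):
    # Per-pixel class logits, normalized with a softmax over classes.
    logits = conv1x1(feat, w)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def depth_head(feat, w):
    # Per-pixel non-negative depth value.
    return np.maximum(conv1x1(feat, w), 0.0)[..., 0]

# Toy sizes: image height/width, channels, feature dim, number of classes.
H, W, C, D, K = 8, 8, 3, 16, 5
img = rng.random((H, W, C))
w_enc = rng.standard_normal((C, D)) * 0.1
w_seg = rng.standard_normal((D, K)) * 0.1
w_dep = rng.standard_normal((D, 1)) * 0.1

feat = shared_backbone(img, w_enc)    # computed once, shared by both tasks
seg = segmentation_head(feat, w_seg)  # (H, W, K) class probabilities
depth = depth_head(feat, w_dep)       # (H, W) depth map
```

The point of the sketch is that both dense outputs are derived from the same feature map, so the encoder is evaluated once per panel rather than once per task.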