
Dense Multitask Learning to Reconfigure Comics


Abstract:

In this paper, we develop a MultiTask Learning (MTL) model to achieve dense predictions for comics panels to, in turn, facilitate the transfer of comics from one publication channel to another by assisting authors in the task of reconfiguring their narratives. Our MTL method can successfully identify the semantic units as well as the embedded notion of 3D in comics panels. This is a significantly challenging problem because comics comprise disparate artistic styles, illustrations, layouts, and object scales that depend on the author's creative process. Typically, dense image-based prediction techniques require a large corpus of data. Finding an automated solution for dense prediction in the comics domain therefore becomes more difficult given the lack of ground-truth dense annotations for comics images. To address these challenges, we develop the following solutions: 1) we leverage a commonly used strategy known as unsupervised image-to-image translation, which allows us to utilize a large corpus of real-world annotations; 2) we utilize the results of the translations to develop our multitasking approach, which is based on a vision transformer backbone and a domain transferable attention module; 3) we study the feasibility of integrating our MTL dense-prediction method with an existing retargeting method, thereby reconfiguring comics.
Date of Conference: 17-24 June 2023
Date Added to IEEE Xplore: 14 August 2023
Conference Location: Vancouver, BC, Canada


1. Introduction

Comics represent an important part of cultural heritage, preserving decades of artistic expression, stories, societal views, and lore that predate digital media. Appreciated across age groups, the medium of comics has undergone significant evolution over the past decade. In particular, there has been a rapid proliferation of digital comics as a consequence of reduced costs, easier distribution, and ubiquitous access. The increasing demand for comics digitization across platforms such as computers, tablets, and mobile phones calls for the automated extraction and identification of the relevant elements within comic books. This process of automatically transferring the semantic or graphical units of comics from one publication medium to another is referred to as the reconfiguration of comics. To achieve such reconfigurations across various publishing platforms, a viable step is investigating the relations between the comics elements by means of computational modeling. However, this is a significantly challenging problem due to 1) the disparate styles of comics panels, 2) differing text layouts, 3) the changing appearance of comics characters, and 4) the varying image scales of elements, panels, etc. Typically, the existing studies that investigate the various elements in comics are restricted to speech balloon segmentation [1], [20], [33], text detection [32], panel detection [31], [34], [46], comics character detection [10], [26], and region-of-interest detection [29]. Recently, [3] presented a depth estimation method on comics by exploiting the scene context. While these methods achieve promising results, they do not produce strong cues for reconfiguring comics. Further, none of these existing techniques presents a unified approach that produces multiple dense predictions of the graphical elements, such as semantic segmentation and depth estimation, simultaneously.
In this paper, we present a multitasking method to segment and estimate the depth of the graphical contents within a comics panel, which are significant cues for reconfiguring comics across different media, as shown in Figure 1. This would help comics authors disseminate their work across diverse publication channels, thereby benefiting the comics industry. Our contributions are as follows: 1) We introduce a cross-domain multitasking method to perform dense predictions by leveraging an off-the-shelf unsupervised image-to-image (I2I) translation method and a vision transformer backbone. 2) We exploit the long-range transformer attention [21] to achieve segmentation and depth predictions in the comics domain. To this end, we use a domain transferable attention mechanism that enforces similarity between the domains. 3) We utilize our dense MTL predictions with an existing retargeting algorithm that successfully reconfigures comics panels across different media.
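The shared-backbone multitask setup described in the contributions can be sketched in miniature. Everything below is a hypothetical stand-in, not the paper's implementation: the random-projection "backbone" substitutes for the vision transformer encoder, and the head shapes and loss weights (`w_seg`, `w_depth`) are illustrative choices. The sketch only shows the structural idea: one shared per-pixel representation feeding two dense heads (segmentation and depth), trained under a single weighted joint objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_backbone(panel, dim=8):
    # Stand-in for the shared vision-transformer encoder: a fixed random
    # projection that turns each pixel into a feature vector.
    h, w, c = panel.shape
    proj = rng.standard_normal((c, dim))
    return panel.reshape(h * w, c) @ proj          # (h*w, dim) features

def seg_head(feats, n_classes=4):
    # Dense task 1: per-pixel class logits (semantic segmentation).
    w = rng.standard_normal((feats.shape[1], n_classes))
    return feats @ w                               # (h*w, n_classes)

def depth_head(feats):
    # Dense task 2: per-pixel scalar depth estimate.
    w = rng.standard_normal((feats.shape[1], 1))
    return feats @ w                               # (h*w, 1)

def multitask_loss(seg_logits, seg_gt, depth_pred, depth_gt,
                   w_seg=1.0, w_depth=0.5):
    # Joint objective: cross-entropy for segmentation plus L1 for depth,
    # combined with hypothetical per-task weights.
    logits = seg_logits - seg_logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(seg_gt)), seg_gt].mean()
    l1 = np.abs(depth_pred.ravel() - depth_gt.ravel()).mean()
    return w_seg * ce + w_depth * l1

panel = rng.standard_normal((16, 16, 3))           # toy comics panel
feats = shared_backbone(panel)                     # one shared encoding...
seg = seg_head(feats)                              # ...feeds both heads
depth = depth_head(feats)
seg_gt = rng.integers(0, 4, size=16 * 16)          # dummy dense labels
depth_gt = rng.standard_normal((16 * 16, 1))
loss = multitask_loss(seg, seg_gt, depth, depth_gt)
```

The design point the sketch illustrates is that both dense tasks share every backbone parameter, so supervision translated from the real-world domain can benefit both predictions at once; only the lightweight heads are task-specific.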
