1. Introduction
Comics represent an important part of cultural heritage, preserving decades of artistic expression, stories, societal views, and lore that predate digital media. Appreciated across age groups, the medium has undergone significant evolution over the past decade. In particular, digital comics have proliferated rapidly as a consequence of reduced costs, easier distribution, and ubiquitous access. The growing demand for comics digitization across platforms such as computers, tablets, and mobile phones calls for the automated extraction and identification of the relevant elements within comic books. This process of automatically transferring the semantic or graphical units of comics from one publication medium to another is referred to as the reconfiguration of comics. To achieve such reconfiguration across publishing platforms, a viable first step is to investigate the relations between comics elements by means of computational modeling. However, this is a significantly challenging problem due to 1) the disparate styles of comics panels, 2) different text layouts, 3) the changing appearances of comics characters, and 4) the different image scales of elements, panels, etc. Existing studies that investigate the various elements in comics are typically restricted to speech balloon segmentation [1], [20], [33], text detection [32], panel detection [31], [34], [46], comics character detection [10], [26], and region-of-interest detection [29]. Recently, [3] presented a depth estimation method for comics that exploits the scene context. While these methods achieve promising results, they do not produce strong cues for reconfiguring comics. Moreover, none of these techniques offers a unified approach that simultaneously produces multiple dense predictions, such as semantic segmentation and depth estimation, for the graphical elements.
In this paper, we present a multi-task learning (MTL) method to segment and estimate the depth of the graphical contents within a comics panel, both of which are significant cues for reconfiguring comics across different media, as shown in Figure 1. This would help comics authors disseminate their work across diverse publication channels, thereby benefiting the comics industry. Our contributions are as follows: 1) We introduce a cross-domain multi-task method that performs dense predictions by leveraging an off-the-shelf unsupervised image-to-image (I2I) translation method and a vision transformer backbone. 2) We exploit long-range transformer attention [21] to achieve segmentation and depth predictions in the comics domain. To this end, we use a domain-transferable attention mechanism that enforces similarity between the domains. 3) We combine our dense MTL predictions with an existing retargeting algorithm that successfully reconfigures comics panels across different media.
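The core idea of multi-task dense prediction can be sketched as a shared encoder whose features feed two per-pixel heads, one for segmentation and one for depth. The following toy NumPy sketch illustrates only this shared-backbone structure; all shapes, layer choices, and function names are illustrative assumptions, not the architecture proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # Pointwise projection: x (H, W, C_in) @ w (C_in, C_out) -> (H, W, C_out).
    return x @ w

def shared_backbone(img, w):
    # Stand-in for the transformer encoder: a single projection + ReLU.
    return np.maximum(conv1x1(img, w), 0.0)

def segmentation_head(feat, w):
    # Per-pixel class logits, normalized with a softmax over classes.
    logits = conv1x1(feat, w)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def depth_head(feat, w):
    # Per-pixel non-negative depth value.
    return np.maximum(conv1x1(feat, w), 0.0)[..., 0]

# Toy sizes: image height/width, channels, feature dim, number of classes.
H, W, C, D, K = 8, 8, 3, 16, 5
img = rng.random((H, W, C))
w_enc = rng.standard_normal((C, D)) * 0.1
w_seg = rng.standard_normal((D, K)) * 0.1
w_dep = rng.standard_normal((D, 1)) * 0.1

feat = shared_backbone(img, w_enc)    # computed once, shared by both tasks
seg = segmentation_head(feat, w_seg)  # (H, W, K) class probabilities
depth = depth_head(feat, w_dep)       # (H, W) depth map
```

The point of the sketch is that both dense outputs are derived from the same feature map, so the encoder is evaluated once per panel rather than once per task.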