Introduction
The range of colors that a device is able to reproduce is called its color gamut. A very common and convenient way of describing colors is to ignore their luminance component and just represent the chromatic content on a 2D plane known as the CIE xy chromaticity diagram, shown in Fig. 1. In this figure the tongue-shaped region corresponds to the chromaticities of all the colors a standard observer can perceive. Most existing displays are based on the trichromacy property of human vision, creating colors by mixing three well-chosen red, green and blue primaries in different proportions. The chromaticities of these primaries determine a triangle in the CIE xy chromaticity diagram, and this triangle is the color gamut of the display in question. Therefore, for any given three-primary display there will be many colors that we could perceive but that the display is not able to generate, i.e., all the colors with chromaticities outside the triangle associated with the display. Also, devices with different sets of primaries will have different gamuts. For this reason, in order to facilitate interoperability, a number of standard distribution gamuts have been defined, and for cinema the most relevant ones are shown in Fig. 1: DCI-P3 [1] is the standard gamut used in cinema postproduction and recommended for digital cinema projection, BT.709 [2] is used for cable and broadcast TV, DVD, Blu-Ray and streaming, and BT.2020 [3] is a very wide color gamut for next-generation UHDTV, currently only achievable by some state-of-the-art laser projectors. Fig. 1 also shows Pointer's gamut [4], which covers all the frequently occurring real surface colors; we can see how only BT.2020 is able to completely include Pointer's gamut.
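To make the gamut-triangle notion concrete, the following minimal sketch (ours, for illustration) tests whether a chromaticity lies inside a gamut triangle using barycentric coordinates; the primaries are the standard BT.709 and DCI-P3 values.

```python
import numpy as np

# Standard xy chromaticities of the BT.709 and DCI-P3 primaries (R, G, B).
BT709 = np.array([[0.640, 0.330], [0.300, 0.600], [0.150, 0.060]])
DCI_P3 = np.array([[0.680, 0.320], [0.265, 0.690], [0.150, 0.060]])

def in_gamut(xy, primaries):
    """True if chromaticity xy falls inside the triangle of the primaries."""
    a, b, c = primaries
    # Barycentric coordinates of xy with respect to the triangle (a, b, c).
    m = np.array([b - a, c - a]).T
    u, v = np.linalg.solve(m, np.asarray(xy) - a)
    return u >= 0 and v >= 0 and (u + v) <= 1

print(in_gamut([0.26, 0.65], BT709))   # False: outside BT.709...
print(in_gamut([0.26, 0.65], DCI_P3))  # True: ...but inside DCI-P3
```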
The adaptation to a standard gamut implies altering the range of colors (and contrast) of the original content. This process is either carried out within the camera in live TV broadcasts (or low-budget movie productions), or performed off-line by expert technicians in the cinema industry. In practice, for the purpose of modifying the movie gamut, colorists at the post-production stage build 3D look-up tables (LUTs) for each movie or for specific scenes in it. These LUTs contain millions of entries, yet colorists only specify a few colors manually, while the rest are interpolated regardless of their spatial or temporal distribution [5]. As a result, the reproduced movie may have false colors that were not originally present. To tackle this problem, colorists usually perform intensive manual correction on a shot-by-shot, object-by-object basis. This process is difficult, time-consuming and expensive, which makes an automated procedure, called gamut mapping (GM), very desirable: GM transforms an image so that its colors better fit the target gamut.
In general, there are two types of GM procedures. First is gamut reduction (GR), in which colors are mapped from a larger source gamut to a smaller destination gamut. A common situation where GR is necessary is when a movie intended for cinema viewing is displayed on a TV [6], [7]. Second is gamut extension (GE), which involves mapping colors from a smaller source gamut to a larger destination gamut. For example, state-of-the-art wide-gamut displays often receive movies that are encoded with limited gamuts as a precaution against regular (or poor) display devices; therefore, we cannot exploit the full color rendering potential of these new devices unless we use a GE procedure [5]. The process of GE is gaining importance with the introduction of new display technologies and laser projectors [8], [9]. These new displays use pure (very saturated) color primaries which enable them to cover much wider gamuts, so now a tablet screen may have a DCI-P3 gamut, for instance, while all the content it shows comes in the smaller BT.709 standard.
At this point, we present a clarification on how gamut reduction and gamut extension differ in practice. Gamut reduction is required, not optional, when the colors of the input image fall outside the display's gamut; without GR, the display will reproduce the image with artifacts and loss of spatial detail. In contrast, gamut extension is not essential; rather, it is considered an enhancement operation [10]. For example, BT.709 footage presented as-is on a wide-gamut BT.2020 display device will not show any visual artifacts; we will simply be missing the color rendering potential of the wide-gamut screen.
As a main contribution, in this paper we present a framework for gamut mapping that is based on models from vision science and that allows us to perform both gamut reduction and gamut extension. It is computationally efficient and yields results that outperform state-of-the-art methods, as validated using psychophysical tests. Another contribution of our research is to highlight the limitations of existing image quality metrics when applied to the GM problem: none of them, including two state-of-the-art deep learning metrics for image perception trained over large and very large scale databases (20,000+ images in one case, 160,000+ in the other), is able to predict the preferences of the observers.
We believe our results are of importance to the computer vision community for two main reasons. First, because they provide another example that drawing insights from vision science and developing algorithms based on vision models can yield state-of-the-art performance for computer vision applications. And second, because our results demonstrate how deep learning approaches are not yet suitable to emulate perception with an adequate degree of accuracy, even when using very large databases with a huge number of human annotations. This raises the question of whether this failure is due to limitations in the network architecture, or whether a more intrinsic issue is at hand; for instance, it has been argued that the convolution-based spatial summation of artificial neural networks cannot constitute a proper model of how biological networks process information [11].
Related Work
A large number of gamut mapping algorithms (GMAs) have been proposed in the literature; we refer the interested reader to the comprehensive book of Morovič [10]. GMAs can be divided into two main categories: gamut reduction algorithms (GRAs) and gamut extension algorithms (GEAs). Both GRAs and GEAs can further be classified into two sub-classes: global and local. Global (also known as non-local or non-adaptive) methods map the colors of an image to the target gamut independently, completely ignoring the spatial distribution of colors in the image, whereas local (also known as spatial) methods modify pixel values by taking into account their neighborhoods; as a result, two identical values surrounded by different neighborhoods will be mapped to two different values.
Global GRAs. One class of global GRAs consists of gamut clipping methods [12], [13], [14]. Gamut clipping is a very common approach to perform gamut reduction where colors that lie inside the destination gamut are left untouched, while those colors that fall outside are projected onto the destination gamut boundary. In order to produce reduced-gamut images, gamut clipping techniques use particular strategies and mapping directions. For example, clipping the chroma of the out-of-gamut (OOG) colors along lines of constant hue and lightness [15]; or clipping each OOG color to the color on the destination gamut boundary that minimizes the $\Delta E$ difference [16].
Local GRAs. The frequency-based local GRAs [26], [27], [28] first reduce the gamut of the source image using a global method, and then in a second stage the high-frequency image detail (obtained by using a spatial filter) is added to the reduced-gamut image. In these GRAs, another stage of gamut clipping is integrated to process the resulting image in case the spatial filtering operation places a few pixels outside the destination gamut. Local GRAs that are inspired by the Retinex framework perform spatial comparisons to retain source image gradients in the reproduced images [29], [30], [31], [32]. Some spatial GRAs [33], [34], [35], [36], [37] pose gamut mapping as an optimization scheme where, given a source image and its gamut mapped version, the aim is to keep perturbing the gamut mapped image until its difference with respect to the source image is minimized according to an error metric. Finally, an image energy functional [38] has been introduced to decrease the contrast of the input image in order to perform gamut reduction.
Global GEAs. While the majority of the published GMAs deal with the problem of gamut reduction, the case is very different for gamut extension: only a few works have been proposed in this direction. One simple solution to perform gamut extension is to take any compression GRA and use it in the reverse direction [39], [40], [41]. However, this way of approaching GE may yield images that are unnatural and unpleasant in appearance. The pioneering global GEAs [42], [43] map limited-gamut printed images to the wide gamut of HDTV in two stages: first the lightness is mapped using a non-linear tone reproduction curve, and second the chroma is extended along lines of constant hue and lightness. A few methods [44], [45] perform gamut extension using functions learned from user studies. Unlike the aforementioned GEAs, some global methods [46], [47], [48] first classify the colors of the input image according to a criterion, and then perform gamut extension differently for each class. For example, labelling each color of a given image as skin or non-skin [46]; dealing with objects of low chroma and high chroma differently [47]; identifying certain memory colors such as green grass and blue sky, and rendering them independently [48]. Other approaches [49], [50] propose three types of extensions: chroma extension, extension along lines from the origin, and adaptive mapping that is a compromise between the first two strategies. Some global GEAs [51], [52], [53] aim at preserving skin tones in the reproduced images.
Local GEAs. The local GEAs extend colors by taking into account their spatial distribution in the input image. This property certainly makes local GEAs adaptive and flexible, but at the same time far more complex and computationally expensive than global GEAs. The multilevel GEA [54] in its first stage extends the source gamut using a non-linear hue-varying function, and in the second stage applies an image-dependent chroma smoothing operation to avoid an over-enhancement of contrast and to preserve detail in the final image. Recent works [38], [55], [56] perform spatial gamut extension using partial differential equations. In particular, the contrast of the input image is enhanced by minimizing an energy functional [38]; a monotonically increasing function [55] is applied to the saturation channel of the input image in HSV color space, which makes it possible to increase contrast without decreasing the image saturation values; and the GEA of [56] operates only on the chromatic components of CIELAB color space, while taking into account the analysis of distortions in hue, chroma and saturation.
In this paper we propose a novel framework that overcomes several issues of current GM approaches, performing both GR and GE at a low computational cost and producing results that are free from spatio-temporal artifacts. In the next section we briefly review some facts and models from the vision science literature that form the basis of our GM framework, which is introduced in Section 4.
Some Vision Facts and Models for Gamut Mapping
Light reaching the retina is transformed into electrical signals by the photoreceptors, rods and cones. At photopic light levels, rods are saturated and the visual information comes from cones, of which there are three types according to the wavelengths they are most sensitive to: L (long), M (medium), and S (short). The response of all photoreceptors is non-linear and, for a single cell without feedback, can be well approximated by the Naka-Rushton equation [57], which is a particular instance of a divisive normalization operation [58], i.e., a process that computes the ratio between the response of an individual neuron and some weighted average of the activity of its neighbors; this in turn allows the photoreceptor response to adapt to the average light level, thereby optimizing its operative range.
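For illustration, the Naka-Rushton response has the form $R/R_{max} = I^n/(I^n + \sigma^n)$; a minimal sketch follows, where the values of $n$ and $\sigma$ are illustrative choices, not values taken from [57].

```python
import numpy as np

def naka_rushton(intensity, sigma=0.18, n=1.0, r_max=1.0):
    """Naka-Rushton photoreceptor response: a divisive normalization
    that compresses a wide range of intensities into [0, r_max)."""
    i_n = np.power(intensity, n)
    return r_max * i_n / (i_n + sigma**n)

# A 4-decade range of input intensities maps into a bounded response range.
for i in [0.001, 0.01, 0.1, 1.0, 10.0]:
    print(f"I = {i:7.3f} -> R = {naka_rushton(i):.3f}")
```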
Photoreceptors do not operate individually though; they receive negative (inhibitory) feedback from horizontal cells, which receive excitatory input from cones and generate inhibitory input to cones. Cone output goes to bipolar cells, which also receive lateral inhibition from horizontal cells and from another type of retinal neurons called amacrine cells. Bipolars feed into retinal ganglion cells (RGCs), which also receive input from amacrine cells, and the axons of the ganglion cells form the optic nerve, sending visual signals to the lateral geniculate nucleus (LGN) in the thalamus, where the signals are re-organized into different layers, each projecting to a specific layer in the cortex. There are numerous axons providing feedback from the cortex to the LGN, but their influence on color vision is not known [59].
Lateral inhibition, or center-surround processing, in which a cell's output corresponds to the difference between the activity of the cell's closest neighbors and the activity of the cells in the near (and possibly far) surround, makes it possible to encode and enhance contrast, and is therefore key for efficient representation; it is present at every stage of visual processing from the retina to the cortex. The size of the receptive field (the visual region to which a neuron is sensitive) tends to increase as we progress along the visual pathway. Lateral inhibition is often modeled as a linear operation, a convolution with a kernel shaped as a difference of Gaussians (DoG). In recent studies, the surround receptive field of RGCs is modeled as a sum of Gaussians [60]. RGCs produce an achromatic signal by combining information coming from the three cone types (L+M+S), and produce chromatic opponent signals by performing center-surround processing on signals coming from cones of different types: (L+M)-S roughly corresponds to “Yellow - Blue” opponency, and L-M to “Red - Green”. Achromatic and color-opponent signals are kept separate in the LGN and onto the cortex.
There are two types of bipolars, one that is excited by light increments but does not respond to decrements, and the other that responds only to light decrements; they are organized in parallel channels that separately transmit lightness and darkness, and that are maintained separate from the retina to the cortex throughout the whole visual pathway.
In the vision science literature the response of a cell is often (but not exclusively) modeled as a linear operation (weighted summation of the neighbors’ activity, e.g., for lateral inhibition) followed by a non-linear operation (e.g., rectification, so as to consider only increments or decrements, but not both). For the linear part, DoG filters and oriented DoG (ODoG) filters are useful in predicting many perceptual phenomena [61], while common models for the non-linear part include rectification, divisive normalization and power-laws. For instance, non-linear photoreceptor response followed by linear filtering produces bandpass contrast enhancement that correlates with the contrast sensitivity function (CSF) of the human visual system [62].
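As a concrete illustration of this linear-plus-nonlinear scheme, here is a sketch of a DoG filter followed by half-wave rectification; the kernel sizes and weights are arbitrary choices for illustration, not values from the cited models.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_2d(size, sigma):
    """Normalized 2D Gaussian kernel of a given side length."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return g / g.sum()

def dog_kernel(size, sigma_center, sigma_surround):
    """Difference-of-Gaussians: narrow excitatory center minus wide
    inhibitory surround, a standard model of lateral inhibition."""
    return gaussian_2d(size, sigma_center) - gaussian_2d(size, sigma_surround)

def linear_nonlinear(image, kernel):
    """Linear stage (convolution) followed by a nonlinearity
    (half-wave rectification, keeping only positive responses)."""
    response = convolve2d(image, kernel, mode="same", boundary="symm")
    return np.maximum(response, 0.0)  # ON-channel style rectification
```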
The purity of a color is represented by its saturation, i.e., its colorfulness judged in proportion to its brightness.
Finally, and very importantly for the GM problem, we shall mention the so-called Helmholtz-Kohlrausch effect, which implies that brightness perception depends on both luminance and chrominance: color patches of the same luminance but different hue appear to have different brightness, as do color patches of the same luminance and hue but different saturation (the higher the saturation, the higher the perceived brightness). As a consequence, if we were to modify the saturation of a color while preserving all other attributes, its brightness would appear to change. The interaction between achromatic and chromatic signals that produces brightness perception has been shown to happen at V1 [65], not before; there are some models for this, e.g., [66], [67], with a review in [68].
Proposed Gamut Mapping Framework
In this section we first describe the basic functionality of our gamut extension and gamut reduction methods, and later provide implementation details in Section 4.3.
4.1 Gamut Extension
In previous works we have shown how a contrast enhancement method, implemented as a partial differential equation that minimizes a certain energy, produces gamut extension when applied independently to the R, G and B channels of an input image [38], but also when applied only to the color opponent channels [56] or only to the saturation channel [55]. See Fig. 2 for an illustration.
Contrast enhancement produces gamut extension. Top row: (a) Input image, (b) enhancing the contrast of all channels in RGB [38], (c) enhancing the contrast of chroma in CIELAB [56], and (d) enhancing the contrast of saturation in HSV [55]. Bottom row: corresponding gamuts in CIE diagram. Note that the source gamut (black) and target gamut (red) are fixed as they correspond to the gamut of the display devices.
Based on this gamut-extension ability of contrast enhancement, and considering some of the vision models enumerated in the previous section, we now propose the following basic gamut extension method: perform contrast enhancement on the saturation channel by center-surround filtering (using a model of lateral inhibition), followed by rectification so as to ensure that the saturation does not decrease (based on a model of nonlinear processing by single neurons and the existence of ON and OFF pathways), and finally modify the brightness to account for the Helmholtz-Kohlrausch (H-K) effect (using a modified model of brightness based on neural activity data).
Basic GE method:
The inputs are an image sequence, whose gamut we want to extend, and the specifications of the source and target gamuts.
1. Convert each input frame into HSV color space (see Appendix, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPAMI.2019.2938499). We will keep $H$ constant.
2. Using the specifications of source and target gamuts, define a linear filter $K_e$ similar to a DoG. This filter is then convolved with $S$, obtaining $S_1$, which is contrast-enhanced. Fig. 3a (left) shows an example $K_e$ filter.
3. Add a constant value image $C$ to $S_1$, obtaining $S_2$. This step attempts to preserve the mean of the original image.
4. Rectify $(S_2-S)$ and produce $S_3 = S + \text{rectified}(S_2-S)$, where $\text{rectified}(S_2-S)=\max(S_2-S,0)$. This ensures that $S_3(x) \geq S(x)$ for each and every pixel $x$, i.e., that the process increases the saturation for all pixels with respect to their value in the original image.
5. Modify $V$ to compensate for the Helmholtz-Kohlrausch effect, correcting $V$ so that perceived brightness does not change for those colors whose saturation has been modified. This is done using a simplified version of the model by Pridmore [67] that yields $V_1=V\left(\frac{S}{S_3}\right)^\rho$.
6. The final result is the image with channels $(H,S_3,V_1)$.
Fig. 3. Examples of kernels used in our framework: (a) for gamut extension; (b) for gamut reduction.
As an enhancement to the method we can add a logistic function:
\begin{equation*}
\tau (S(x)) = 1-\frac{1}{\left(1+0.55e^{-1.74S(x)}\right)^2}. \tag{1}
\end{equation*}
Comparison of gamut extension results: (a) Input image, (b) extended-gamut image ignoring the H-K effect, and (c) extended-gamut image considering the H-K effect.
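For illustration, steps (1)-(6) translate almost directly into code. The sketch below assumes an image already converted to HSV with channels in [0, 1], approximates $K_e$ with a DoG, realizes the constant image $C$ as the mean difference, and uses an illustrative value for the exponent $\rho$; none of these specific choices should be read as the paper's exact settings.

```python
import numpy as np
from scipy.signal import fftconvolve

def dog_kernel(size, sigma_c, sigma_s, w=0.9):
    """Center-surround (DoG-like) filter standing in for K_e."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    g = lambda sd: np.exp(-(xx**2 + yy**2) / (2 * sd**2))
    c, s_ = g(sigma_c), g(sigma_s)
    return c / c.sum() - w * s_ / s_.sum()

def gamut_extend_hsv(h, s, v, k_e, rho=0.6):
    """Steps (1)-(6) of the basic GE method; H is left untouched.
    rho is an assumed value for the H-K correction exponent."""
    s1 = fftconvolve(s, k_e, mode="same")        # step 2: contrast-enhance S
    s2 = s1 + (s.mean() - s1.mean())             # step 3: constant C preserves the mean
    s3 = np.clip(s + np.maximum(s2 - s, 0.0), 0.0, 1.0)            # step 4: S3 >= S
    v1 = np.clip(v * (s / np.maximum(s3, 1e-8)) ** rho, 0.0, 1.0)  # step 5: H-K
    return h, s3, v1                             # step 6: (H, S3, V1)
```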
4.2 Gamut Reduction
Essentially, gamut extension can be seen as the inverse of the gamut reduction problem [10]. Since GE can be achieved by contrast enhancement, GR can be obtained by decreasing contrast, as we proved in [38]. In [69], Kim et al. showed that convolution with a low-pass kernel decreases image contrast and can thus be used to perform gamut reduction.
Basic GR method:
The inputs are an image sequence, whose gamut we want to reduce, and the specifications of the source and target gamuts.
1. Convert each input frame into HSV color space. We will keep $H$ constant.
2. Use a linear filter $K_r$, similar to a sum of Gaussians, to convolve with $S$, obtaining $S_1$, which is contrast-decreased.
3. Add a constant value image $C$ to $S_1$, obtaining $S_2$. This step attempts to preserve the mean of the original image.
4. Rectify $(S - S_2)$ and produce $S_3 = S - \text{rectified}(S - S_2)$, where $\text{rectified}(S-S_2)=\max(S-S_2,0)$. This ensures that $S_3(x) \leq S(x)$ for each and every pixel $x$, i.e., that the process decreases the saturation for all pixels with respect to their value in the original image.
5. Modify $V$ to compensate for the Helmholtz-Kohlrausch effect: $V_1=V\left(\frac{S}{S_3}\right)^\rho$.
6. The final result is the image with channels $(H,S_3,V_1)$.
See Fig. 6, comparing the original image (left) with the intermediate result that replaces $S$ by $S_3$ while ignoring the H-K effect (middle), and the final result after the H-K correction of $V$ (right).
Comparison of gamut reduction results: (a) Input image, (b) reduced-gamut image ignoring the H-K effect, and (c) reduced-gamut image considering the H-K effect.
While one pass of this basic method already performs GR, we have found that it gives better results to iterate steps (2) to (4) with a sequence of filters $K_r$ of progressively larger spatial extent.
Effect of increasing kernel size on image gamut. Row 1: Example of kernels with progressively larger spatial extent. Row 2: Reduced-gamut images corresponding to each kernel. Last column: Evolution of image gamut that progressively decreases with an increase in the spatial extent of the kernel.
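For completeness, a sketch of the GR counterpart, including the iteration with kernels of increasing spatial extent; the sum-of-Gaussians surrogate for $K_r$, the size schedule, and the in-gamut stopping test are illustrative assumptions on our part.

```python
import numpy as np
from scipy.signal import fftconvolve

def sum_of_gaussians_kernel(size, sigma1, sigma2, w=0.5):
    """Low-pass kernel standing in for K_r (a sum of two Gaussians)."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    g = lambda sd: np.exp(-(xx**2 + yy**2) / (2 * sd**2))
    k = w * g(sigma1) + (1 - w) * g(sigma2)
    return k / k.sum()

def gr_iteration(s, k_r):
    """Steps (2)-(4) of the basic GR method on the saturation channel."""
    s1 = fftconvolve(s, k_r, mode="same")   # contrast decrease
    s2 = s1 + (s.mean() - s1.mean())        # constant C preserves the mean
    return s - np.maximum(s - s2, 0.0)      # rectified decrease: S3 <= S

def gamut_reduce_saturation(s, in_target_gamut, sizes=(7, 15, 31, 63)):
    """Iterate with kernels of increasing spatial extent until the image
    gamut fits the target; in_target_gamut is a user-supplied predicate
    (a real test would check the full colors, not saturation alone)."""
    s3 = s.copy()
    for size in sizes:
        if in_target_gamut(s3):
            break
        k_r = sum_of_gaussians_kernel(size, size / 6.0, size / 3.0)
        s3 = gr_iteration(s3, k_r)
    return s3
```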
4.3 Implementation Details
4.3.1 Computation of the Convolution Kernel
The kernel $K_e$ used for gamut extension is computed in the Fourier domain ($\mathcal{F}$ denoting the Fourier transform) as
\begin{equation*}
K_e = \mathcal {F}^{-1} \left(\frac{1}{1 - \gamma (\frac{19}{20}- \mathcal {F}(\omega))} \right), \tag{2}
\end{equation*}
The kernel $K_r$ used for gamut reduction is computed analogously as
\begin{equation*}
K_r = \mathcal {F}^{-1} \left(\frac{1}{1 - \gamma (\frac{21}{20}- \mathcal {F}(\omega))}\right), \tag{3}
\end{equation*}
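The following sketch shows how these kernels can be realized numerically, under our illustrative assumption that $\omega$ is a normalized Gaussian window and $\mathcal{F}$ the 2D FFT; the sizes, $\sigma$ and $\gamma$ values are placeholders.

```python
import numpy as np

def gaussian_window(shape, sigma):
    """Normalized Gaussian omega, centered at the origin of the
    circular (FFT) coordinate system."""
    h, w = shape
    y = np.minimum(np.arange(h), h - np.arange(h))  # circular distances
    x = np.minimum(np.arange(w), w - np.arange(w))
    yy, xx = np.meshgrid(y, x, indexing="ij")
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return g / g.sum()

def kernel_fourier(shape, sigma, gamma, c):
    """K = F^{-1}( 1 / (1 - gamma*(c - F(omega))) );
    c = 19/20 gives K_e (Eq. 2), c = 21/20 gives K_r (Eq. 3)."""
    w_hat = np.fft.fft2(gaussian_window(shape, sigma))
    k_hat = 1.0 / (1.0 - gamma * (c - w_hat))
    return np.real(np.fft.ifft2(k_hat))

K_e = kernel_fourier((256, 256), sigma=20.0, gamma=0.5, c=19 / 20)
K_r = kernel_fourier((256, 256), sigma=20.0, gamma=0.5, c=21 / 20)
```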
4.3.2 Computation of the Optimal $\gamma$ Value
The basic GR method that we have proposed in Section 4.2 is already capable of mapping the colors of a wide-gamut image to a small destination gamut. However, we observed that the same method yields better results if used iteratively in the following manner. At iteration level one we apply steps (2)-(4) of the basic GR method with the first kernel of the sequence; each subsequent iteration repeats these steps with a kernel of larger spatial extent, which progressively reduces the image gamut.
In the case of gamut extension, the gamut of the optimal result (in terms of appearance) usually does not extend to the target gamut boundaries; it lies somewhere in between the source and the target gamuts. This implies that for an input image there can be many plausible extension results, so an adequate value of $\gamma$ has to be selected.
Given a pair of source and destination gamuts (3-primaries triangles), we compute the area of the source gamut ($SG_{area}$) and the area of the destination gamut ($DG_{area}$) on the chromaticity diagram, and define their difference as
\begin{equation*}
d_{g} = |SG_{area} - DG_{area}|. \tag{4}
\end{equation*}
Considering the color difference between the source and target gamuts ($d_g$), we define a base value for $\gamma$ as
\begin{equation*}
\gamma _{base} = \sqrt[3]{d_g}, \tag{5}
\end{equation*}
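Eqs. (4) and (5) can be computed directly from the primaries with the shoelace formula; a minimal sketch:

```python
import numpy as np

def triangle_area(primaries):
    """Shoelace formula for the area of a 3-primary gamut triangle
    given as three (x, y) chromaticities."""
    (x1, y1), (x2, y2), (x3, y3) = primaries
    return 0.5 * abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))

def gamma_base(src_primaries, dst_primaries):
    d_g = abs(triangle_area(src_primaries) - triangle_area(dst_primaries))  # Eq. (4)
    return d_g ** (1.0 / 3.0)                                               # Eq. (5)

BT709 = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
DCI_P3 = [(0.680, 0.320), (0.265, 0.690), (0.150, 0.060)]
print(gamma_base(BT709, DCI_P3))
```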
For each image, we created gamut-extended images using several different values of $\gamma$; based on these tests, the final value is set as
\begin{equation*}
\gamma = \gamma _{base} + (T_S - P_{LS})\gamma _{base}, \tag{6}
\end{equation*}
4.3.3 Temporal Aspect
Both for GR and GE our method is applied independently to each frame of the video input. We have not needed to impose any sort of temporal consistency to our algorithm and this is due to the effectively large size of the kernels we use, which remove or strongly reduce the influence of sudden changes and make the results stable and the framework very robust.
4.3.4 The HDR Case
The input is assumed to be of standard dynamic range (SDR), as is done in the GM literature. This assumption would correspond (going back to the vision fundamentals mentioned in Section 3) to having as input image for our GM framework the signal generated by the photoreceptors. The reason for this is the well-known fact that the Naka-Rushton equation that models photoreceptor responses optimizes the performance efficiency of cones and rods by adapting the possibly high dynamic range (HDR) input intensities to the SDR representation capabilities of photoreceptors [70]. In fact, several successful tone mapping approaches (that convert HDR images into SDR) in the computer graphics and image processing communities use non-linear curves based on the Naka-Rushton model (e.g., [71], [72]).
Therefore, if the input video to be gamut-mapped is in HDR, our framework requires that it is tone-mapped first and then processed with our GM algorithm. This is consistent with the workflow for GM of HDR content proposed in [73].
The output of our GM method will also be in SDR form. If it were required to be in HDR, then an inverse tone mapping method should be applied to the output, preferably respecting the artistic intent present in the material, as in the case of [74].
Psychophysical Evaluation
The goal of gamut mapping for cinema is to develop GMAs that reproduce content respecting as much as possible the artist's vision, an important feature for a GMA to be adopted by the movie industry. This can be achieved by including in the psychophysical tests the reference images, which act as a stand-in for the content creator's intent. Therefore we conduct psychophysical experiments in order to compare the performance of the proposed GMAs with other methods in cinematic conditions, using a digital cinema projector (Barco-DP-1200 [75]) and a large projection screen.
5.1 Viewing Conditions and Evaluation Protocol
To emulate real cinema-like conditions, we used a large hall with an ambient illuminance of 1 lux; the illuminance measured at the screen was around 750 lux. During the experiments there were no strongly colored objects present in the observers' field of view. We used a glare-free screen that was 3 meters wide and 2 meters high. Each observer was instructed to sit approximately 5 meters away from the screen.
In this study, we used a forced-choice pairwise comparison technique to gather raw experiment data (independently) for both the gamut reduction and gamut extension problems. Each observer was shown three images simultaneously on the projection screen: the reference image (in the middle) and a pair of reproductions (one image on the left side and the other on the right side of the reference image). We asked each observer to make selections according to these instructions: a) if there are any sort of artifacts in one of the reproductions, choose the other, and b) if both of the reproductions have artifacts or are free from artifacts, choose the one which is perceptually closer to the reference image. In pair comparison evaluation [21], in order to calculate differences among $n$ chosen GMAs, observers need to compare $n(n-1)/2$ pairs of reproductions for each test image.
Finally, a total of 15 observers participated in each of the experiments we performed in our lab, as this is the number of observers for pair comparison tests suggested by several technical recommendation documents (e.g., [79], [80]). All the observers (12 male and 3 female, with ages in the range of 23 to 36 years) had normal color vision, as tested with Ishihara's test of color deficiency.
5.2 Image Media
The DCI-P3 wide-gamut test images that we used in evaluating GEAs and GRAs are, respectively, shown in Figs. 8 and 9. (Note that these are sRGB images because we are limited to the sRGB standard to show results on paper.) Some of these test images were taken from the publicly available datasets [76], [77], while others were from [56] and from mainstream feature films.
5.3 Experiment 1: Evaluation of GEAs
In the case of the psychophysical evaluation of GEAs, the first step is to create limited-gamut input images. This is achieved by applying a clipping operation on the DCI-P3 reference images in order to map the out-of-gamut colors to the boundary of the BT.709 gamut (or any other desired gamut). To perform the clipping we used the xyY color space, and clipped the chromaticities of the out-of-gamut colors of a given image to the boundary of the destination gamut along lines towards a focal point, the white point 'D65'. The experimental gamuts that we use in this paper are depicted in Fig. 10, and their primaries are listed in Table 1. The procedure for computing the
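A sketch of this clipping operation on chromaticities follows; the bisection search and its tolerance are our illustrative choices, and any boundary-intersection routine would do.

```python
import numpy as np

D65 = np.array([0.3127, 0.3290])  # focal point for the clipping

def in_gamut(xy, primaries):
    """Barycentric point-in-triangle test, as in the earlier sketch."""
    a, b, c = np.asarray(primaries, float)
    m = np.array([b - a, c - a]).T
    u, v = np.linalg.solve(m, np.asarray(xy) - a)
    return u >= -1e-12 and v >= -1e-12 and u + v <= 1 + 1e-12

def clip_towards_white(xy, primaries, iters=40):
    """Project an out-of-gamut chromaticity onto the gamut boundary
    along the line joining it to D65, by bisection on the segment."""
    xy = np.asarray(xy, float)
    if in_gamut(xy, primaries):
        return xy
    lo, hi = 0.0, 1.0          # 0 -> D65 (inside), 1 -> xy (outside)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        p = D65 + mid * (xy - D65)
        lo, hi = (mid, hi) if in_gamut(p, primaries) else (lo, mid)
    return D65 + lo * (xy - D65)
```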
5.3.1 Experimental Setups for GE
We have defined the following two different experimental setups for the evaluation of GEAs.
Setup 1: Mapping from small gamut to DCI-P3 gamut. As quantum dot displays [81] and laser projectors [82] with their extended gamut capabilities are becoming popular, in the near future the common case will be to have large color differences between the standard gamut and the gamut of the display. Therefore, this setup is created to investigate how different GEAs will perform when the difference between source and target gamuts is large. To this end, we map the source images from the small ‘Toy’ gamut (slightly smaller than the BT.709 gamut) to the large DCI-P3 gamut. On the chromaticity diagram, the difference in gamuts for this setup is nearly equal to the difference between BT.709 and BT.2020. This represents the future scenario where we need to show on a wide-gamut display some content that was mastered for TV.
Setup 2: Mapping from BT.709 to DCI-P3 gamut. In this setup we mimic the practical situation where the source material has BT.709 gamut and we map the source colors to the colors of the DCI-P3 gamut.
5.3.2 Competing GEAs
For each setup, we compare the proposed method with the top-ranked GEAs from a recent work [56]. These GEAs are briefly explained as follows.
The Same Drive Signal (SDS) method linearly maps the RGB primaries of the source gamut to the RGB primaries of the destination device gamut, therefore making full use of the gamut of the target display.
Hybrid Color Mapping (HCM) is a combination of the SDS algorithm and the true-color algorithm. The true-color algorithm represents the input image in the target gamut without applying any extension.
The HCM algorithm [50] analyzes the saturation of the input image and then linearly combines the output of the true-color method and the SDS method:
\begin{equation*}
\begin{bmatrix}R \\ G \\ B \end{bmatrix}_{HCM} = (1-\kappa) \begin{bmatrix}R \\ G \\ B \end{bmatrix}_{true-color} +\kappa \begin{bmatrix}R \\ G \\ B \end{bmatrix}_{SDS}, \tag{7}
\end{equation*}
where $\kappa$ is a mixing factor that works as a function of saturation:
\begin{equation*}
\kappa (S)=\left\lbrace \begin{array}{ll}0, &\text{if } S \leq S_{L} \\ \frac{S-S_L}{S_H-S_L}, &\text{if } S_L < S < S_H \\ 1, &\text{if } S \geq S_H \end{array} \right. \tag{8}
\end{equation*}
where $S_L$ and $S_H$ are constants that define the ranges of saturation for the mixing function $\kappa$; the values used in our experiments are $S_L = 0.8$ and $S_H = 1$, as defined in [56]. The HCM method aims at preserving natural colors by leaving unchanged the low-saturated colors such as flesh tones, while mapping the high-saturated colors using the SDS method.
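Eqs. (7) and (8) amount to a per-pixel blend; a minimal sketch, where the array shapes are our assumption:

```python
import numpy as np

S_L, S_H = 0.8, 1.0  # saturation thresholds from [56]

def kappa(s):
    """Mixing factor of Eq. (8): 0 below S_L, 1 above S_H, linear ramp between."""
    return np.clip((s - S_L) / (S_H - S_L), 0.0, 1.0)

def hcm(rgb_true_color, rgb_sds, saturation):
    """Eq. (7): per-pixel blend of the true-color and SDS reproductions.
    rgb_*: arrays of shape (H, W, 3); saturation: per-pixel input
    saturation of shape (H, W)."""
    k = kappa(saturation)[..., None]  # broadcast over the RGB channels
    return (1.0 - k) * rgb_true_color + k * rgb_sds
```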
GEA of Zamir et al. [56] is a spatially-variant GEA, implemented as a PDE-based optimization procedure, that performs gamut extension in CIELAB color space by taking into account the analysis of distortions in hue, chroma and saturation.
5.3.3 Results of GEAs Under Experimental Setup 1 and Setup 2
Once the reproductions were obtained by applying GEAs on the input images of both setups, we conducted a psychophysical evaluation separately for each setup using the 15 observers mentioned in Section 5.1.
Fig. 11a presents the accuracy scores computed by analyzing the psychophysical data of setup 1, where it can be seen that, when the difference between the source gamut and the destination gamut is large, the proposed GEA yields images that are perceptually more faithful to the reference images than the other competing algorithms. The observers rated SDS [50] as the least accurate method, whereas the algorithm of [56] ranked second.
Accuracy scores of competing GEAs: 15 observers took part in each experiment and 30 images were used.
In Fig. 11b we present results for the setup 2 where it can be seen that, when the color difference between the source-destination gamut pair is small, our algorithm ranks first, followed by the HCM algorithm [50] and the method of [56].
5.4 Experiment 2: Evaluation of GRAs
This section is devoted to examining the image reproduction quality of competing GRAs. To obtain the reduced-gamut images, we apply the proposed GRA to the saturation channel of the input images in an iterative manner (over the sequence of kernels of increasing spatial extent described in Section 4.3.2).
5.4.1 Experimental Setups for GR
All the competing GRAs receive as input the wide-gamut DCI-P3 images and generate reproductions for the following two different experimental setups.
Setup 1: Mapping from DCI-P3 gamut to a small gamut. We created this particular setup with a large difference between source and target gamuts, nearly as large as the difference between the BT.2020 and BT.709 gamuts. An experimental setup with such a large difference in gamuts allows us not only to evaluate the performance of competing GRAs reliably, but also provides an indication of how these GRAs might perform when BT.2020 content becomes commonly available and needs to be mapped to BT.709 displays or DCI-P3 cinema projectors. To compute the results using the competing GRAs, we map the colors of the 15 DCI-P3 test images shown in Fig. 9 to the challenging smaller 'Toy' gamut.
Setup 2: Mapping from DCI-P3 to BT.709 gamut. Colorists perform this gamut reduction procedure by using 3D LUTs (as mentioned in the introduction). Therefore, we engaged a professional colorist from a post-production company to apply their own in-house 3D LUTs to our DCI-P3 test images in order to create reduced-gamut BT.709 images. We also perform GR using the following competing GRAs.
5.4.2 Competing GRAs
LCLIP [15] clips the chroma of the out-of-gamut colors to the destination gamut boundary along lines of constant hue and lightness.
Hue Preserving Minimum $\Delta E$ (HPMINDE) [16] clips each out-of-gamut color to the closest color, in terms of $\Delta E$ error, on the boundary of the destination gamut along lines of constant hue.
Alsam and Farup [29] proposed an iterative GRA that at iteration level zero behaves as a gamut clipping algorithm, and as the number of iterations increases the solution approaches spatial gamut mapping.
Schweiger et al. [25] make use of a compression function that squeezes colors near the destination gamut boundary in order to accommodate the out-of-gamut colors. This is a method proposed and used by the British Broadcasting Corporation (BBC).
5.4.3 Results of GRAs Under Experimental Setup 1 and Setup 2
The 15 observers that took part in the GR experiment for setup 1 were the same observers that participated in the evaluation of the GE algorithms. The analysis of psychophysical data gathered for GRAs is presented in Fig. 12a. It can be seen in this figure that the proposed GRA produces images that are perceptually more faithful to the original images than any other competing method. It is evident from Fig. 12a that observers did not prefer the HPMINDE algorithm for most of the test images, and therefore rated it as the least accurate method. The algorithms of Schweiger et al. [25], Alsam and Farup [29] and LCLIP [15] are, respectively, ranked second, third and fourth by the observers.
Accuracy scores of competing GRAs: 15 observers took part in each experiment and 15 images were used.
For the experimental setup 2 we also ran the psychophysical tests with 15 observers, of which 9 had experience in image processing and the other 6 were skilled technicians (colorists and editors) from a post-production company.
In order to reduce the number of pair comparisons, in this particular setup we opted to use the reproduced images of the top three ranked methods from setup 1 and the reduced-gamut images created by using the custom LUT of the same post-production company. Fig. 12b shows the result for all the observers. Observers preferred the in-house LUT results over the other methods, with our GRA being ranked second.
More specifically, we can focus our attention on the result for this experiment when considering only the 6 skilled technicians. This result is shown in Fig. 13. In this case, we can see that the trend is very similar to the one obtained with the full set of 15 observers (the ranking of the algorithms is not modified), but also that the skilled technicians are more inclined to select the in-house LUT of their post-production company, probably because it is the solution they are used to working with. Also, the use of a LUT is well suited for the case of DCI-P3 to BT.709 reduction, where the blue primary is essentially the same for both gamuts and the differences in the other two primaries are rather small; for larger gamut differences, however, the LUT approach might be hard to generalize.
Accuracy scores of competing GRAs for the skilled technicians case. The experiment was performed by six experts, and 15 images were used.
5.5 Video Results
In order to test the temporal coherence we apply the proposed gamut reduction and gamut extension methods to all frames of videos independently. We confirm that the results produced by our algorithms are free from artifacts. The videos are available at http://ip4ec.upf.edu/GamutMappingVideos.
Does Any Error Metric Approximate our Psychophysical Results?
In this section we evaluate whether there exists any image metric able to predict the results of our psychophysical tests, following the same strategy we used for the GE case in [56]. To this end we consider a total of 10 metrics: a perceptually-based color image difference (CID) metric [83] particularly tailored for the gamut mapping problem, its more recent extension iCID [36], CIE $\Delta E$ color difference metrics, the normalized absolute error (NAE), and two state-of-the-art deep learning metrics, PieAPP [86] and LPIPS [87], among others.
In order to perform a fair comparison to our experiment, we consider the metrics as if they were observers in our pair comparison test. This means that, for each metric, we will run all the possible comparisons, and in each comparison we will give a 1 to the image with better metric value and a 0 to the image with worse metric value. Later, we will apply the Thurstone Case V analysis [78] to each of the image metrics to end up with the preference values for each of the methods. These preference values will therefore be comparable to the ones shown for the psychophysical analysis in our previous section. For readers interested in the exact numerical values of the metrics (e.g., the mean value for each method, etc.), we provide them in the supplementary material, available online.
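A sketch of this procedure, treating a metric as an observer whose pairwise wins are aggregated with Thurstone Case V scaling; the clipping of preference rates to avoid infinite z-scores is a standard practical simplification, not a detail from the paper.

```python
import numpy as np
from scipy.stats import norm

def thurstone_case_v(wins):
    """wins[i, j]: number of times method i was preferred over method j.
    Returns one preference score per method (higher is better)."""
    n = wins + wins.T
    with np.errstate(divide="ignore", invalid="ignore"):
        p = np.where(n > 0, wins / n, 0.5)  # pairwise preference rates
    p = np.clip(p, 0.01, 0.99)              # avoid infinite z-scores
    z = norm.ppf(p)                         # probit transform
    np.fill_diagonal(z, 0.0)
    return z.mean(axis=1)                   # Case V scale values

# Example: a metric "observer" compared 4 methods over a set of images.
wins = np.array([[0, 8, 9, 10],
                 [2, 0, 6,  7],
                 [1, 4, 0,  6],
                 [0, 3, 4,  0]])
print(thurstone_case_v(wins))
```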
Fig. 14 shows the result of the aforementioned analysis. Each of the experimental setups is individually colored with a color code where the hue goes from pure red for the lowest value to pure green for the highest one. Therefore, for any metric to be able to predict the psychophysical results, its color code should match that of the results of the observers, shown in the last row. We can see that there is only one specific case where we could argue that this is happening, the NAE metric in the first setup of gamut extension. However, the same metric is not able to predict the observers’ response in any of the other three cases.
Comparison between the results of different image metrics and the results from psychophysical evaluation. Metrics were considered as observers in a pair comparison experiment. Each experiment is color coded individually. Color codes are green for the best result and red for the worst one.
It is interesting to mention that the CID and iCID metrics, which were specifically developed for gamut mapping, do not match the observer data; one possible reason is that these metrics were designed for input images in the BT.709 gamut, while in this paper the input images are in DCI-P3, which may explain the limitations of CID and iCID in the context of our problem.
Another significant result is that two state-of-the-art deep learning metrics, PieAPP [86] and LPIPS [87], designed to predict perceptual image error like human observers and trained on large-scale datasets (20K images in one case, 160K in the other) labeled with pair-comparison preferences and, in the case of [87], using close to 500K human judgements, are not able to predict the observers' preference in any of the experimental setups that we have tested. This result is important, as it suggests that current deep learning approaches are not accurate enough for validating (and therefore developing) GM methods for cinema applications. It is not a surprising result, though, in the sense that several very recent works have also shown that deep neural networks trained on large or very large scale image databases (250,000+ images) to predict user preference suffer a remarkable performance decay when used on images that belong to a dataset different from the one used for training [87], [88].
In summary, there does not seem to be an adequate metric that is able to predict the preference of observers for GM results. This has two important consequences. First, that in order to evaluate gamut mapping methods, we still need to rely only on psychophysical studies. However, conducting subjective studies is operationally difficult, hard to replicate from experiment to experiment, economically costly and may require special equipment (cinema projector, large screen, etc.). Second, that we cannot develop GM methods or optimize GM results simply by maximizing an image appearance metric or minimizing an error metric (as it is done in other contexts, e.g., [89]), which of course would be extremely practical. Therefore, the results presented in this section point out the importance of working towards defining better metrics for the gamut mapping problem that are able to predict the observers’ preference, which we strongly believe would be of great importance for the color imaging community.
Conclusion
We have introduced a GM framework based on some basic vision science facts and models. The algorithms for GE and GR are simple, of low computational complexity, produce results without artifacts and outperform state-of-the-art methods according to psychophysical experiments.
We also tested a number of established and state-of-the-art objective metrics on the results produced by the GM methods we have compared, and we have observed that these metrics correlate poorly with the choices made by the observers. Therefore there is a need for developing an image quality metric for GM: this would be a very significant contribution, as it would greatly simplify the validation of new methods and would also allow optimizing GM results directly against said metric.
ACKNOWLEDGMENTS
The authors are grateful to all the participants of the psychophysical experiments. Special thanks to Dirk Maes from Barco N.V. and to Stephane Cattan from Deluxe-Spain for their invaluable help and unwavering support. This work has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant agreement number 761544 (project HDR4EU) and under Grant agreement number 780470 (project SAUCE), and by the Spanish government and FEDER Fund, grant ref. PGC2018-099651-B-I00 (MCIU/AEI/FEDER, UE). The work of J. Vazquez-Corral was supported by the Spanish government under Grant IJCI-2014-19516.