Introduction
Recently, there have been rapid advancements in 3D display techniques. Three-dimensional displays with a directional diffuser [1]–[4] can enhance 3D image quality and effectively increase the viewing angle to 100 degrees. To provide a continuous 3D image, the number of viewpoints of super multi-view (SMV) 3D displays must be at least 50 [5]. With increasing viewpoint number and viewing angle, traditional real-time computer-generated image methods for SMV 3D displays face great challenges. The traditional methods for SMV rendering can be divided into two broad categories.
A. Geometry-Based Rendering Methods
The input data of geometry-based rendering methods are 3D geometry, textures, etc. These methods generate the 3D image directly, without a complex intermediate process. The simplest geometry-based rendering method is each camera viewpoint-independent rendering (ECVIR) [6], which sets up one camera for each viewpoint and renders all the viewpoint images in turn. Consequently, its rendering efficiency decreases quickly as the viewpoint number increases. The multiview rendering (MVR) method [7] first renders the scene to epipolar plane images and then transforms them into viewpoint images. MVR can improve rendering efficiency in theory, but it is not supported by standard computer graphics rendering engines. The backward ray tracing (BRT) method [8] for SMV displays is difficult to run in real time because of the large number of ray-object intersection calculations. Although great progress has been made in real-time ray tracing, according to an NVIDIA report [9], real-time ultrahigh-resolution rendering of complex virtual scenes on a PC will remain out of reach within the next decade.
B. Image-Based Rendering Methods
The input data of image-based rendering methods are an array of images. The common image-based rendering methods for SMV displays are volume rendering and light field rendering. Volume rendering based on the ray-casting technique [10] has the same problem as the BRT method. Light field rendering for SMV requires considerable graphics memory, so it is impractical for rendering a large virtual scene in real time. The depth image-based rendering (DIBR) technique [11]–[19] is a promising 3D rendering method, but only for traditional SMV 3D displays with viewing angles of less than 10 degrees. In addition, DIBR generates images of poor quality and has difficulty filling holes and handling lighting in complex scenes. The multiple view plus depth (MVD) 3D representation carries both multiview color information and partial scene geometry (carried by the multiview depth maps), which can be combined to render image data for an SMV 3D display.
Recently, many works on MVD have focused on removing holes and improving image quality. The Gaussian mixture modeling (GMM) method for virtual view synthesis with MVD [20] effectively addresses image quality degradation. Holes emerging in a target virtual view can be greatly alleviated by making good use of other neighboring complementary views in addition to the two closest neighboring primary views [21]. Reference [22] presents the multiview video plus depth retargeting (MVDRT) technique for stereoscopic 3D displays, which takes shape preservation, line bending and visual comfort constraints into account and simultaneously optimizes the horizontal, vertical and depth coordinates in the display space. Reference [23] applies color correction to the reference views and combines depth-based image fusion with direct color image fusion to reduce ghosting; the remaining cracks are filled using depth filtering and inverse warping. To accelerate the generation of new viewpoint images, several works [24], [25] adopt parallel computing schemes.
Although the MVD methods effectively remove holes and improve image quality, they do not consider the lighting problem, so the newly generated viewpoint images suffer from color distortion and inaccurate lighting, which degrades the image quality. There are three types of light sources in virtual scenes: directional lights, point lights and spotlights. A material in a virtual scene contains four components: ambience, diffuseness, specularity and shininess. However, the scenes used in the above MVD methods consider only the ambient light and the diffuseness of the material in order to report higher PSNR than other algorithms. This may be adequate for certain special video sequences, but the resulting image quality is unacceptable for virtual scene rendering.
In computer graphics, lighting is very important for rendering [26] and is based on a simple model of the interaction of materials and light sources. There are three familiar lighting models for real-time rendering: Phong [27], Blinn-Phong [28] and Cook-Torrance [29]. Forward shading is a straightforward approach: we render an object, light it according to all the light sources in the scene, and then repeat this sequence for each object. This process is computationally intensive because each rendered fragment of each object must iterate over every light source. Deferred shading [30] overcomes this issue and is widely used in games and interactive 3D programs. It consists of a geometry pass and a lighting pass. The geometry pass retrieves geometric information from the objects and stores it in a collection of textures (G-buffers). The lighting pass then calculates the lighting for each fragment using the geometric information stored in the textures; it is therefore an image-based rendering step. The great advantage of deferred shading over forward shading is its worst-case computational complexity of O(No + Nl) rather than O(No · Nl), where No and Nl denote the number of objects and the number of light sources, respectively.
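To make the cost argument concrete, the following is a minimal CPU sketch of deferred shading written with NumPy (not the paper's shader code): a geometry pass fills toy G-buffers once per pixel, and a lighting pass then loops over the lights once, so the work grows as O(No + Nl) rather than O(No · Nl). The scene contents and buffer names are illustrative assumptions.

```python
# Minimal deferred-shading sketch: geometry pass writes G-buffers,
# lighting pass iterates over lights once for the whole image.
import numpy as np

H, W = 4, 4                                   # tiny screen for demonstration

# ---- Geometry pass: per-pixel attributes stored in G-buffers ----
g_position = np.zeros((H, W, 3))              # world-space position per pixel
g_normal   = np.zeros((H, W, 3))              # surface normal per pixel
g_diffuse  = np.zeros((H, W, 3))              # material diffuse color per pixel
g_normal[..., 2] = 1.0                        # toy scene: all normals face +z
g_diffuse[...]   = (0.8, 0.2, 0.2)            # toy scene: one red material

# ---- Lighting pass: shade the whole image once per light ----
lights = [                                    # (position, color) point lights
    (np.array([0.0, 0.0, 5.0]), np.array([1.0, 1.0, 1.0])),
    (np.array([3.0, 3.0, 5.0]), np.array([0.3, 0.3, 0.6])),
]
color = np.zeros((H, W, 3))
for light_pos, light_col in lights:           # O(Nl) iterations
    to_light = light_pos - g_position                     # (H, W, 3)
    to_light /= np.linalg.norm(to_light, axis=-1, keepdims=True)
    n_dot_l = np.clip(np.sum(to_light * g_normal, axis=-1), 0.0, None)
    color += n_dot_l[..., None] * g_diffuse * light_col   # diffuse term only

print(color[0, 0])                            # shaded color of one pixel
```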
In computer graphics, hybrid rendering is a common and important technique for addressing rendering problems such as quality and efficiency. For instance, a hybrid method that combines color-coded surface rendering with volume rendering exploits the advantages of both rendering methods, provides an excellent overview of the tracheobronchial system and allows clear depiction of the complex spatial relationships of anatomical and pathological features [31]. A multiview rendering hardware architecture consisting of hybrid parallel DIBR and pipeline interlacing is proposed in [32] to improve performance; it can process full HD (1920 × 1080) content at 60 frames per second.
The hybrid rendering technique, which combines rasterization and real-time ray tracing, has made great progress since 2018, when the revolutionary NVIDIA Turing™ architecture [34] was proposed. In 2019, Barré-Brisebois et al. [35] proposed a hybrid rendering pipeline in which rasterization, compute, and ray tracing shaders work together to bring real-time visuals close to the quality of offline path tracing.
Here, an entirely new SMV rendering pipeline based on a hybrid rendering technique (HRT) is presented to address the problems of previous SMV rendering methods. The proposed method introduces additional normal, diffuseness and shininess information and combines the advantages of ECVIR (superior-quality 3D images without viewing-angle limitations), the high rendering efficiency of the MVD technique, and the accurate lighting of deferred shading; in this sense, it can be regarded as a hybrid rendering technique.
The HRT rendering pipeline contains four steps. First, images of sparse reference viewpoints are generated. Then, multiple-view reprojection and hole filling are applied to generate images of new viewpoints. Next, the target-view images (depth, normal, diffuseness and specular images) are composited into target color images with accurate lighting using the deferred shading technique. Finally, the reconstructed 3D image is generated according to the viewpoint arrangement of the SMV 3D display.
The remainder of the paper is organized as follows. In Section II, a new SMV rendering pipeline based on the HRT method is proposed, and the principles of generating SMV 3D images with large viewing angles are illustrated. In Section III, experiments are carried out to demonstrate the validity of the HRT method. Finally, we conclude our work in Section IV.
The HRT SMV Rendering Pipeline and Principles of Generating SMV 3D Images
The HRT SMV rendering pipeline is shown in Figure 1. It contains four stages to render one frame of an SMV 3D image: sparse reference viewpoint image generation, dense viewpoint image generation, deferred shading and image synthesis. The first two stages increase the viewing angle and improve rendering efficiency. Deferred shading generates an accurate lighting effect for every viewpoint image. The image synthesis stage generates the reconstructed 3D image according to the parameters of the SMV 3D display.
A. Sparse Reference Viewpoint Image Generation
The first stage applies the render-to-texture technique [36] and programmable shaders to generate the sparse reference viewpoint images, including depth images, normal images, diffuseness images and specular images. There are two common render-to-texture techniques for creating multiview images: single pass stereo (SPS) and Turing multiview rendering (TMVR). TMVR is a newer technique that can generate four viewpoint images simultaneously, and its rendering speed is 2-3 times that of the SPS technique [9]. Therefore, we choose TMVR to generate the reference viewpoint images and store them in G-buffers, as shown in Figure 1. TMVR is traditionally used only to generate two viewpoint color images for VR devices, but here it is used to generate four kinds of images for the sparse reference viewpoints.
There are two kinds of input data in this stage: a 3D virtual scene and an N-view virtual camera array. As shown in Figure 2, each viewpoint image is generated from a translational-offset virtual camera, which corresponds to a different off-axis asymmetric sheared view frustum with parallel view directions. Each virtual camera is determined by its view matrix and projection matrix; the view matrix Mvn and projection matrix Mpn of the n-th virtual camera in the camera array are given by Equation (1) [37]. The view matrix sets the position and direction of the virtual camera, and the projection matrix projects 3D world objects in homogeneous coordinates onto the image plane. Mvc and Mpc are the view matrix and projection matrix of the center virtual camera in the virtual camera array, respectively. Because the virtual cameras at different positions have the same direction, the view matrix of each camera can be obtained by multiplying the translation matrix MT by Mvc. The translation matrix is determined by the distance d between adjacent virtual cameras and the index n of the virtual camera. The projection matrix of each virtual camera is obtained by multiplying Mpc by the shear matrix Mshear:
\begin{align*} \mathrm{M}_{\mathrm{vn}} &= \mathrm{M}_{\mathrm{T}}\,\mathrm{M}_{\mathrm{vc}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ (n-N/2)d & 0 & 0 & 1 \end{pmatrix} \mathrm{M}_{\mathrm{vc}} \\ \mathrm{M}_{\mathrm{pn}} &= \mathrm{M}_{\mathrm{pc}}\,\mathrm{M}_{\mathrm{shear}} = \mathrm{M}_{\mathrm{pc}} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ (n-N/2)d/\mathrm{d}_{\mathrm{h}} & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{1} \end{align*}
Assuming that the viewing angle of the SMV 3D display is θ, the distance d between adjacent virtual cameras is
\begin{align*} \mathrm{d} = \begin{cases} 2\mathrm{d}_{\mathrm{h}} \tan\dfrac{\theta}{2}/(\mathrm{N}-1) & \mathrm{N}~\text{is odd} \\[0.5pc] 2\mathrm{d}_{\mathrm{h}} \tan\dfrac{\theta}{2}/\mathrm{N} & \mathrm{N}~\text{is even} \end{cases} \tag{2} \end{align*}
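The following NumPy sketch shows one way Equations (1) and (2) can be evaluated. It assumes a row-vector convention (translation stored in the last matrix row, matching the matrices above) and treats dh as the distance from the camera plane to the zero-parallax plane, which the excerpt does not state explicitly.

```python
# Sketch of Equations (1) and (2): view/projection matrices of the n-th camera.
import numpy as np

def camera_spacing(theta_deg, d_h, N):
    """Equation (2): distance d between adjacent virtual cameras."""
    half_width = d_h * np.tan(np.radians(theta_deg) / 2.0)
    return 2.0 * half_width / (N - 1 if N % 2 == 1 else N)

def camera_matrices(n, N, d, d_h, Mvc, Mpc):
    """Equation (1): view and projection matrices of the n-th virtual camera."""
    offset = (n - N / 2.0) * d
    M_T = np.eye(4)
    M_T[3, 0] = offset              # translation in the last row (row vectors)
    M_shear = np.eye(4)
    M_shear[2, 0] = offset / d_h    # shear keeping the zero-parallax plane fixed
    return M_T @ Mvc, Mpc @ M_shear

# Example: one camera of a 100-view, 80-degree setup with identity center matrices
d = camera_spacing(theta_deg=80.0, d_h=1.0, N=100)
Mvn, Mpn = camera_matrices(n=50, N=100, d=d, d_h=1.0, Mvc=np.eye(4), Mpc=np.eye(4))
```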
Each virtual camera generates one group of four images. Using these images as intermediate results, the viewpoint color images are created with the deferred shading technique, as illustrated in the third stage.
B. Dense Viewpoint Generation
In this stage, dense target viewpoint images are generated from the sparse reference viewpoint images by an improved MVD technique. The difference between the traditional MVD technique and this stage is that the latter processes depth, normal, diffuseness and shininess information, whereas the former processes only depth and color information. In addition, the depth precision of the traditional MVD technique is 8 bits, while the depth precision in this stage is 32 bits. The basic approach for generating new viewpoint images includes view reprojection and hole filling. In the first step, the points in a reference view are projected into 3D space and then projected to a new view (the reprojection view). Holes are introduced in this process, which decreases the image quality of the reprojection view; however, most of the holes can be inpainted from the other reference views to avoid degrading the 3D image quality. In the hole-filling step, the remaining holes are filled by linear interpolation.
Each pixel in the depth image corresponds to one virtual point, and its position $\vec{\mathrm{p}}$ in world space can be computed as
\begin{equation*} \vec{\mathrm{p}} = (\mathrm{u},\mathrm{v},\mathrm{d},1)\,\mathrm{M}_{\mathrm{pn}}^{-1} \mathrm{M}_{\mathrm{vn}}^{-1} \tag{3} \end{equation*}
where (u, v) are the pixel coordinates and d is the depth value.
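A direct NumPy transcription of Equation (3) is given below; the final perspective divide and the identity-matrix example are added for illustration and are not part of the excerpt.

```python
# Sketch of Equation (3): back-project a pixel to a world-space point.
import numpy as np

def unproject(u, v, d, Mvn, Mpn):
    """Equation (3): back-project pixel (u, v) with depth d using inverse matrices."""
    p = np.array([u, v, d, 1.0]) @ np.linalg.inv(Mpn) @ np.linalg.inv(Mvn)
    return p[:3] / p[3]             # perspective divide (illustrative addition)

# With identity matrices, camera space coincides with world space
print(unproject(0.25, -0.5, 0.8, np.eye(4), np.eye(4)))   # -> [ 0.25 -0.5  0.8 ]
```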
The reprojection view is obtained by shifting the image values in the horizontal direction according to the depth values. As illustrated in Figure 3(c), two virtual points (p1, p2) with different depth values (d1, d2) can share the same projection point pj in the reprojection view. Their shift values are
\begin{align*} \Delta\mathrm{s}_{1} &= \mathrm{d}_{1}\,\Delta\mathrm{x}/(\mathrm{d}_{\mathrm{h}}-\mathrm{d}_{1}) \\ \Delta\mathrm{s}_{2} &= \mathrm{d}_{2}\,\Delta\mathrm{x}/(\mathrm{d}_{\mathrm{h}}-\mathrm{d}_{2}) \tag{4} \end{align*}
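As a small sketch, Equation (4) can be evaluated as follows. Here Δx is taken to be the horizontal offset between the reference camera and the target camera, which is an assumption based on the camera-array geometry rather than a definition given in the excerpt.

```python
# Sketch of Equation (4): horizontal shift of a reprojected pixel.
def horizontal_shift(depth, delta_x, d_h):
    """Equation (4): shift value of a virtual point at the given depth."""
    return depth * delta_x / (d_h - depth)

# Two points projecting to the same target pixel shift by different amounts,
# so a depth test is needed to decide which one remains visible.
s1 = horizontal_shift(depth=0.3, delta_x=0.05, d_h=1.0)   # ~0.021
s2 = horizontal_shift(depth=0.6, delta_x=0.05, d_h=1.0)   # ~0.075
```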
There are N reference views in our display system, so each viewpoint image in the dense target views has N reprojection views. As the number of reference views increases, more holes in the target view can be inpainted from the complementary reprojection views.
Linear interpolation is the simplest approach for hole filling. Assuming that pixel (x, y) in target view m is a hole pixel, the nearest valid pixel to its left is pl and the nearest valid pixel to its right is pr, as shown in Figure 4. The filling value for pixel (x, y, m) is then calculated as
\begin{equation*} \mathrm{V}_{(\mathrm{x},\mathrm{y},\mathrm{m})} = (\mathrm{d}_{\mathrm{r}} \mathrm{V}_{\mathrm{l}} + \mathrm{d}_{\mathrm{l}} \mathrm{V}_{\mathrm{r}})/(\mathrm{d}_{\mathrm{l}} + \mathrm{d}_{\mathrm{r}}) \tag{5} \end{equation*}
where Vl and Vr are the values of pl and pr, and dl and dr are the distances from the hole pixel to pl and pr, respectively.
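A minimal NumPy sketch of the hole-filling rule in Equation (5) for a single scanline is shown below; the hole marker and the border handling are illustrative assumptions, not taken from the paper.

```python
# Sketch of Equation (5): distance-weighted hole filling along one scanline.
import numpy as np

def fill_scanline(values, hole=-1.0):
    """Fill hole pixels in one scanline using Equation (5)."""
    filled = values.astype(float).copy()
    valid = np.where(values != hole)[0]
    for x in np.where(values == hole)[0]:
        left, right = valid[valid < x], valid[valid > x]
        if len(left) and len(right):
            xl, xr = left[-1], right[0]
            d_l, d_r = x - xl, xr - x
            # Equation (5): the closer valid neighbor receives the larger weight
            filled[x] = (d_r * values[xl] + d_l * values[xr]) / (d_l + d_r)
        elif len(left):                  # hole touches the right image border
            filled[x] = values[left[-1]]
        elif len(right):                 # hole touches the left image border
            filled[x] = values[right[0]]
    return filled

print(fill_scanline(np.array([0.2, -1.0, -1.0, 0.8])))    # [0.2 0.4 0.6 0.8]
```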
Parallel computing is implemented in this stage to achieve real-time view reprojection and hole filling. Given the resolution of a target view, the reprojection and hole-filling operations for all pixels are dispatched to the GPU in parallel.
C. Deferred Shading Stage
In this stage, the color image of a viewpoint with correct lighting effects is generated from its four G-buffer images, as shown in Figure 6. The lighting model used in this stage is the Blinn-Phong model, which is the most popular model in interactive 3D programs. The detailed process of deferred shading follows [38].
Color texture is generated from the other four images with the deferred shading technique.
For convenience, we only consider point lights, as shown in Figure 7. The total lighting equation of the Blinn-Phong model is
\begin{equation*} \mathrm{I}_{\mathrm{tot}} = \mathrm{I}_{\mathrm{amb}} + \mathrm{I}_{\mathrm{diff}} + \mathrm{I}_{\mathrm{spec}} \tag{6} \end{equation*}
Subpixel-viewpoint arrangement of SMV 3D display and the corresponding parameters.
The diffuseness term Idiff, which represents the diffuse parts of both the light source and the material, can be computed as
\begin{equation*} \mathrm{I}_{\mathrm{diff}} = \mathrm{MAX}(0, \vec{\mathrm{l}} \cdot \vec{\mathrm{n}})\,\mathrm{M}_{\mathrm{diff}} \otimes \mathrm{S}_{\mathrm{diff}} \tag{7} \end{equation*}
where $\vec{\mathrm{l}}$ is the unit vector from the surface point to the light source, $\vec{\mathrm{n}}$ is the unit surface normal, Mdiff and Sdiff are the diffuse components of the material and the light source, and ⊗ denotes componentwise multiplication.
The specular term is a key component in determining the brightness of specular highlights, and together with the shininess it determines the size of the highlights. The specular term Ispec, which represents the specular parts of both the light source and the material, can be computed as
\begin{equation*} \mathrm{I}_{\mathrm{spec}} = \mathrm{MAX}(0, \vec{\mathrm{h}} \cdot \vec{\mathrm{n}})^{\mathrm{M}_{\mathrm{shi}}}\,\mathrm{M}_{\mathrm{spec}} \otimes \mathrm{S}_{\mathrm{spec}} \tag{8} \end{equation*}
where the unit half vector $\vec{\mathrm{h}}$ between the light vector $\vec{\mathrm{l}}$ and the view vector $\vec{\mathrm{v}}$ is
\begin{equation*} \vec{\mathrm{h}} = \frac{\vec{\mathrm{l}} + \vec{\mathrm{v}}}{\left\| \vec{\mathrm{l}} + \vec{\mathrm{v}} \right\|} \tag{9} \end{equation*}
The ambient term Iamb is computed from the material diffuse color and a constant ambient factor:
\begin{equation*} \mathrm{I}_{\mathrm{amb}} = \mathrm{M}_{\mathrm{diff}} \otimes \mathrm{Factor}_{\mathrm{amb}} \tag{10} \end{equation*}
From Equations (6)–(10), the color images can be easily calculated in the compute shader.
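As a reference, the following NumPy sketch evaluates Equations (6)–(10) for a single surface point and one point light. It is a CPU stand-in for the paper's compute-shader implementation; the example material and light values (including the Mspec and Sspec values reused from the experiments below) are only illustrative.

```python
# Sketch of Equations (6)-(10): Blinn-Phong shading for one point and one light.
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def blinn_phong(n, l, v, M_diff, M_spec, M_shi, S_diff, S_spec, Factor_amb):
    h = normalize(l + v)                                           # Eq. (9)
    I_amb  = M_diff * Factor_amb                                   # Eq. (10)
    I_diff = max(0.0, np.dot(l, n)) * M_diff * S_diff              # Eq. (7)
    I_spec = max(0.0, np.dot(h, n)) ** M_shi * M_spec * S_spec     # Eq. (8)
    return I_amb + I_diff + I_spec                                 # Eq. (6)

color = blinn_phong(n=np.array([0.0, 0.0, 1.0]),                  # surface normal
                    l=normalize(np.array([1.0, 1.0, 1.0])),       # toward the light
                    v=np.array([0.0, 0.0, 1.0]),                  # toward the viewer
                    M_diff=np.array([0.8, 0.2, 0.2]),
                    M_spec=np.array([1.0, 1.0, 0.0]),
                    M_shi=16.0,
                    S_diff=np.array([0.7, 0.7, 0.7]),
                    S_spec=np.array([0.7, 0.7, 0.7]),
                    Factor_amb=np.array([0.1, 0.1, 0.1]))
print(color)
```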
D. Image Synthesis Stage
The viewpoint mask used as input data is a 2D buffer that records the viewpoint arrangement of the SMV 3D display. The relationship between the viewpoint number vkl and the subpixel index (k, l) can be represented by the following equation [39], [40]:
\begin{equation*} \mathrm{v}_{\mathrm{kl}} = \frac{(k - 3l\tan\alpha) \bmod \left((m+1)\mathrm{p}_{\mathrm{u}}/(m\,\mathrm{p}_{\mathrm{h}}\cos\alpha)\right)}{(m+1)\mathrm{p}_{\mathrm{u}}/(m\,\mathrm{p}_{\mathrm{h}}\cos\alpha)}\,\mathrm{N}_{\mathrm{tot}} \tag{11} \end{equation*}
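A NumPy sketch of how the viewpoint mask can be precomputed from Equation (11) is given below; the display parameter values (slant angle α, pitches pu and ph, m and Ntot) are placeholders chosen only to make the example run, not the parameters of the actual display.

```python
# Sketch of Equation (11): precomputing the subpixel-to-viewpoint mask.
import numpy as np

def viewpoint_index(k, l, alpha, p_u, p_h, m, N_tot):
    """Equation (11): viewpoint number of the subpixel with index (k, l)."""
    period = (m + 1) * p_u / (m * p_h * np.cos(alpha))   # horizontal period
    return ((k - 3 * l * np.tan(alpha)) % period) / period * N_tot

# Viewpoint mask for a toy panel of 8 x 8 subpixels (placeholder parameters)
k, l = np.meshgrid(np.arange(8), np.arange(8), indexing='xy')
mask = viewpoint_index(k, l, alpha=np.arctan(1.0 / 3.0),
                       p_u=1.0, p_h=0.3, m=3, N_tot=100)
print(np.floor(mask).astype(int))
```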
Experiments and Results
The HRT method is implemented in the compute shader and fragment shader. The PC hardware includes an Intel® i9-9980XE (4.26 GHz) CPU with 16 GB of RAM and an NVIDIA GeForce 1080 GPU with 8 GB of RAM. The GPU is the main factor that affects the rendering frame rate. Six 3D models, including monkey, car, heart, buildings, Manhattan and furniture, are used to test the performance of the HRT method. Monkey, car, heart and buildings are simple 3D models, while Manhattan and furniture are complex 3D models. The numbers of vertices, faces and triangles in the models are listed in Table 1. An 8k SMV 3D display with an 80-degree viewing angle and 100 viewpoints is used in the experiments, and the other parameters of the display are given in Table 2.
In our experiments, PSNR is used to measure the squared intensity differences between the synthesized and ideal view image pixels. The ideal view image is obtained with the ECVIR or BRT method. Then, based on the average PSNR, we compare the HRT method with state-of-the-art methods, namely, GMMDIBR [20] and MVDRT [22]. Different hole-filling methods are applied in GMMDIBR, MVDRT and HRT to refine the blended image. The input data of GMMDIBR and MVDRT are color images and depth images, while the input data of HRT are depth images, normal images, diffuseness images and specular images. Figure 10 shows that the proposed HRT method provides better performance than the existing state-of-the-art MVD methods. The number of reference views is four, the number of frames is 200, and the resolution of the reference views is
Because the color of a virtual point is view dependent, the same virtual point has different colors in different reference views; as a result, the new viewpoint images generated by GMMDIBR and MVDRT contain many erroneous pixels, as shown in Figure 11. Color images and depth images cannot provide enough material information to generate new viewpoint images with accurate lighting, so the PSNR values of these methods are lower than that of the HRT method.
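For reference, the PSNR used in this comparison is the standard definition sketched below (mean squared error against the ideal view, converted to decibels); the 8-bit peak value of 255 is the usual assumption for color images and is not stated explicitly in the excerpt.

```python
# Standard PSNR between a synthesized view and the ideal (ECVIR/BRT) view.
import numpy as np

def psnr(synthesized, ideal, peak=255.0):
    """PSNR in dB between a synthesized view and the ideal reference view."""
    diff = synthesized.astype(np.float64) - ideal.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Example with a random 8-bit image and a slightly distorted copy
rng = np.random.default_rng(0)
a = rng.integers(0, 256, (512, 512, 3), dtype=np.uint8)
b = np.clip(a.astype(np.int16) + 2, 0, 255).astype(np.uint8)
print(psnr(a, b))                      # small distortion -> high PSNR (~42 dB)
```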
The new viewpoint images generated with different methods and their ideal view images. The HRT images have accurate lighting.
Figure 12 illustrates that with an increasing number of reference views, the proposed HRT pipeline provides better performance. The improvement ranges from 2.169 dB to 3.303 dB, with an average of 2.432 dB, when three reference views are used, and from 3.585 dB to 4.630 dB, with an average of 3.726 dB, when four reference views are used.
PSNR comparison for rendering monkey with 200 frames by using different numbers of reference views in the HRT pipeline.
Observers standing at different positions should see different colors for the same virtual point because of the different unit half vectors in Equation (8). The specular term Ispec therefore directly affects the image quality of the newly generated viewpoint, as shown in Figure 13. The values of Mspec and Sspec are (1.0, 1.0, 0.0) and (0.7, 0.7, 0.7), respectively. The monkey material is an ideal diffuse material when Mshi is zero; in this case, the PSNRs of GMMDIBR and MVDRT are higher than that of the HRT method because their hole-filling methods perform better than that of HRT under these conditions. With increasing Mshi, the PSNR values of GMMDIBR and MVDRT decrease quickly, while the PSNR value of HRT increases slowly. Therefore, HRT is more suitable than GMMDIBR and MVDRT for rendering most virtual scenes.
PSNR comparison for new viewpoint images with different specular materials obtained by setting different values of Mshi.
The rendering times of the different methods and of every stage of HRT are shown in Table 3 and Table 4. As shown in Table 3, the rendering time is related to the complexity of the virtual scene. The results also illustrate that the HRT method has an obvious advantage in rendering efficiency over the ECVIR and BRT methods. Because HRT must process more input data and has more stages than GMMDIBR and MVDRT, it requires more time to render one frame of a 3D image, but the frame rate is still more than 35 fps on average. As shown in Table 4, the time consumption of the last three stages is independent of the 3D model.
Figure 14 illustrates the main factors affecting rendering efficiency. As shown in Figure 14(a), the rendering frame rates of the six models decrease rapidly with an increasing number of target views; however, the frame rate still remains above 20 fps when the viewpoint number reaches 200. As shown in Figure 14(b), with increasing target view resolution, the frame rate decreases rapidly because the number of computing units in the GPU is limited. In addition, as shown in Figure 14(c), increasing the number of reference viewpoints reduces the frame rate because the first stage consumes more time.
The factors affecting the rendering efficiency: (a) The number of target views. The resolution of the target view is
The final 3D image that is displayed on the LCD panel of the SMV 3D display can be generated by the image synthesis stage, as shown in Figure 15. The proposed HRT algorithm is implemented on an 8k SMV 3D display. The real-time reconstructed 3D images from different perspectives are shown in Figure 16.
Conclusion
In summary, an entirely new SMV rendering pipeline based on the hybrid rendering technique (HRT) is constructed that can generate accurate lighting effects in real time when the number of viewpoints of the SMV 3D display is greater than 50, the viewing angle is greater than 100 degrees, the resolution of a single viewpoint image is more than 512 × 512 and the resolution of the LCD panel is 7680 × 4320; in particular, complex scenes can be rendered in real time. Real-time 3D optical reconstruction with accurate lighting effects is realized on an 8k SMV 3D display with an 80-degree viewing angle and 100 viewpoints. The main factors affecting rendering efficiency are the number of target views, the number of reference views and the resolution of the target views. Experiments demonstrate that with four reference views, the HRT method renders one frame of a 3D image at more than 35 fps on average.