Introduction
Generating an all-in-focus 3D image with accurate depths is an essential need in many computer vision applications that require crucial details of a 3D scene. In this article, an all-in-focus image means an approximately computable image in which every pixel is in focus. Creating a virtual 3D object, or 3D image synthesis, is essential for virtual/mixed reality applications, especially medical diagnosis and surgical training. In any conventional optical camera system used for visible-light intensity imaging, imaging a real-world 3D scene or 3D object typically involves three intrinsic components, namely, (i) the 3D object space, (ii) the image space, and (iii) the optical imaging system [1]. Among the known optical imaging systems spanning various ranges of the electromagnetic spectrum, we consider only photographic systems based on visible light. Moreover, the optical sensors used in the image space of most digital imaging systems are either CMOS or CCD sensors, which are inherently 2D when capturing the image of a 3D object, often as a low-cost compulsion. Recovering depth information to represent a 3D image therefore requires indirect means of interpreting the irradiant rays projected from the scene/object surface onto the 2D sensor plane. To date, most suggested approaches are based on single-view, stereo-view, or multi-view imaging [1], [2] for generating a qualitative 3D perception of the real 3D scene from the irradiant optical stimulations (mathematically a one-to-many mapping) [3], which we term a "3D perception image". 
Among the approaches that produce a 3D perception image, single-view-based 3D image generation is the most challenging, though it is advantageous for two reasons, namely, (i) it requires less optics at the cost of extensive computational complexity, and (ii) it is applicable in scenarios where stereo or multi-view imaging is not possible. The generation of a 3D image from a single-view 2D image remains an active area of research for the following reasons: (i) suitable imagery can be obtained with a commercially available, affordable camera with either minor modifications or plug-in attachments, in contrast to devices that involve time-of-flight [4], structured illumination [5], or stereo cameras [5]; (ii) unlike a conventional camera, a computational camera combines geometrical imaging optics with built-in image processing/analysis functionality and computational resources [6]; (iii) especially in medical imaging applications, all-in-focus images give good clarity for identifying pathological symptoms. In any imaging system, the exposure parameters, namely shutter speed and aperture size, may be used, singly or together, to deliver the desired amount of light radiation onto the sensor. The sensitivity of the sensor also plays a crucial role. However, a camera is designed around a chosen sensor and lens, so their characteristics are fixed; in conventional cameras, flexibility lies only in aperture and shutter speed. Flexibility through the aperture has been achieved in many ways, viz., (i) the radius or size of the aperture [1], (ii) multiple apertures (e.g., color filter aperture) [7]–[9], (iii) sparse apertures (e.g., color-coded apertures) [10], (iv) offset aperture [11], and (v) differential aperture photometry [12]. 
In recent years, computational photography systems have gained importance over traditional photography because they comprise not only optics and electronics to capture the image but also software-based computational manipulation to improve imaging capability [13]. These manipulations may involve computing attributes such as shape, position, resolution, structure, unfocused pixel values or regions, aperture variation, and control of the amount of incoming light. It is important to distinguish between a 3D perception image and the more precise 3D virtual scene or image. In a 3D perception image, the depths between the various portions of the image are merely proportional, giving a feel for the 3D environment. In a 3D precise image, those depths are more accurate estimates that depict the real measurements of the scene, as required for autonomous maneuvering of robots or vehicles. This article extends the computational 3D perception imaging system comprising a camera, a multi-color filter aperture (MCA), and computation of depth from disparities across boundaries identified using a 2.1D sketch for a single image [14]. We adapt computational photography techniques to generate a 3D image from a single RGB image acquired by a conventional digital camera with an add-on MCA using red, green, and blue filters. We propose to obtain multi-plane images (MPIs) that constitute all-in-focus image regions based on the geometric optics principle, as shown in Fig. 1, with associated inter-depths between them obtained by suitable computational processes. Ultimately, we aim to deduce a quantitative model that uses the associated camera parameters to arrive at the depth parameter from the acquired 2D RGB image. 
In this research effort, we develop a quantitative model for the computational imaging system suggested in [14], which employs both depth cues from the MCA [8] and composition of a 3D image from MPIs obtained via multi-region image (MRI) decomposition of the acquired image using a 2.1D sketch [15] as semantic segmentation. The primary goal is to arrive at more accurate 3D images or 3D virtual objects in terms of the associated geometric and intrinsic camera parameters. In brief, the method relates accurate inter-object depths of multiple object layers in the DoF to the corresponding inter-image depths of MPIs in the depth-of-focus (DoFo) region. The inter-depths of the decomposed MPIs are associated with disparity parameters derived from the MCA 2D image and the 2.1D sketch. The result is an affordable device requiring only a minor optics modification: the MCA with the red, green, and blue filter arrangement shown in Fig. 1, embedded in a CANON 50mm f1.8 II lens on a readily available CANON 70D DSLR camera.
In summary, our contributions are:
Deduction of an explicit relation between the inter-depths associated with the MPIs of the 3D perception image and the inter-object depths of the real-world 3D scene under a shallow DoF constraint.
Generation of a more accurate geometric-depth 3D image from a single 2D RGB image in three steps, namely, (i) generation of a 3D perception image as a set of MPIs, using inter-image-plane depths and image surfaces; (ii) the set of MPIs is obtained from multiple regions with varying CoCs, a consequence of shallow DoF, determined using the 2.1D sketch as semantic image segmentation, with inter-image-plane depths from the disparities across the boundaries caused by the MCA, and smooth surfaces obtained by adjusting the image textures from the respective regions of the acquired 2D MCA image; and (iii) the final, more accurate-depth 3D image is obtained from the 3D perception image by realigning the MPIs to their inter-object depths.
Section 2 reviews relevant work on depth estimation using MCA and the associated image representations for view synthesis, along with their limitations. The motivation for this work is highlighted in section 3. Section 4 explains the geometric 3D perception image built from a multi-plane image based on an ordered multi-region image and inter-depth estimation. Section 5 describes the 3D accurate-depth image or virtual object obtained from MPIs based on a newly deduced relationship between the inter-depth of the perception image and its 3D accurate-depth counterpart. Section 6 presents the experimental simulations used to validate the developed 3D image, and Section 7 gives concluding remarks with the future scope of research.
Related Research Works
Although substantial research has been carried out on depth estimation and reconstruction of the 3D perception image (typically termed the all-in-focus image) using an MCA, we notice that hardly any discussion addresses precise depth. As mentioned earlier, our main aim is 3D imaging from a single image with a conventional camera, with more accurate depth based on cues associated with the MCA and DoF. Some crucial computer vision applications, such as robotics, medical diagnosis, and surgery planning, need more exact shape-, size-, and volume-based object recognition. In the technique considered here, this boils down to establishing an exclusive relationship between the perception depth image and the actual depth of the object in the 3D scene. Hence, our review is centered on depth estimation using both DoF and MCA to arrive at all-in-focus image synthesis.
A. Review on Depth Estimation From Disparity Using MCA
Jaehyun Im et al. [15] suggested an optically modified color-filter aperture (CFA) for tracking objects, with both depth and position expressed as color shifts, under the assumption that color discontinuities arise when two different objects in the scene are located at different depths. Phase correlation is employed in an adaptive mean-shift algorithm for real-time implementation of the CFA approach. However, in this method, occlusion handling depends entirely on empirical, pixel-based threshold selection, and the method does not guarantee handling of multiple objects of the same color at different positions along the optical axis. Yosuke Bando et al. [8] proposed improved depth estimation and the use of a matte to distinguish foreground from background using color misalignment, utilizing several image-editing techniques. More crucially, the disparities increase when the lens is focused nearer, so the inter-depths between any two objects in the scene may not be consistent. Better inter-depth results are reported when the objects lie between 50 cm and 250 cm and the background is about twice as far from the camera as the objects. Performance degrades for smaller color misalignments and for objects of the same color; the restriction to distinctly colored objects limits the practical utility of this approach.
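The color-shift disparity these CFA methods rely on can be estimated by phase correlation between color channels. The following is a minimal sketch of that idea (not the reviewed authors' implementation), assuming numpy, integer shifts, and a synthetic test pattern:

```python
import numpy as np

def channel_shift(ref, shifted):
    """Estimate the integer (dy, dx) translation of `shifted` relative
    to `ref` from the phase of the cross-power spectrum."""
    cps = np.conj(np.fft.fft2(ref)) * np.fft.fft2(shifted)
    cps /= np.abs(cps) + 1e-12            # whiten: keep only the phase
    corr = np.real(np.fft.ifft2(cps))     # impulse at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape                      # wrap to signed offsets
    return (dy - h if dy > h // 2 else dy,
            dx - w if dx > w // 2 else dx)

# Hypothetical example: a green channel displaced by (2, -3) pixels
# relative to the red channel, as an MCA would produce for a
# defocused region.
rng = np.random.default_rng(0)
red = rng.random((64, 64))
green = np.roll(red, (2, -3), axis=(0, 1))
```

Sub-pixel refinement and the occlusion-aware windowing that the reviewed methods require are deliberately omitted here.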
The color-coded aperture (CCA) based method of Ivan Panchenko et al. [16] needs calibration to arrive at good results for different coded aperture designs, and depends on the calibration and the effective focal length. The demerits of CCAs are their limited field of view (FOV) [17], and they are claimed to outperform pinhole collimators only with a very localized point source.
A light-efficient CCA was designed and calibrated in skillful ways by Vladimir Paramonov et al. [18] to achieve millimeter depth resolution within a given image frame; their work demonstrates real-time 3D scene generation and depth-based image effects on DSLR, smartphone, and compact cameras. However, this resolution applies only at the center of the acquired image, and both of their methods exploit a GeForce GTX 780 GPGPU to handle the computational intensiveness. There is no guarantee of correct inter-depths for objects with the same color and texture. Moreover, the accuracy of the CCA method depends on many factors, such as the working range, the need for strong texture information for depth extraction, lower accuracy in strongly defocused areas, and the image restoration required to recover a sharp image. The single-MCA-camera methodology suggested by Seungwon Lee et al. [19], and the modified multi-focusing image-misalignment approach of Sangjin Kim et al. [19] for tracking objects by sparsely extracting edges from the red, green, and blue color-shift vectors of each channel, assure a good depth estimate only in the central region of the image, without any mention of occlusion handling. Recently, we proposed a novel method of generating a 3D image from a single 2D RGB image of a scene, utilizing depth from disparity based on the MCA, that handles multiple objects in the scene both with occlusions and with the same colors at different depths along the optical axis. It utilizes a 2.1D sketch instead of an image matte on the MCA images to determine the ordered multi-region decomposition.
Although the previously discussed algorithms address various aspects of arriving at a 3D perception image with an all-in-focus image surface and proportional depth information, acceptable in some computer vision applications, they may not be adequate in applications such as robotic environments, autonomous driving, and some mixed-reality medical diagnostic studies. We also note that they exploit only image-side information, without many camera parameters, to arrive at the 3D perception image; this lays the foundation for our motivation, discussed in the following subsection.
B. Review on Layered Representation for 3D Perception Image
More accurate inter-depths between objects are essential for geometry-aware manipulation of 3D virtual objects in most virtual or augmented reality applications when the 3D scene comprises more than one object. It is important to note that any 3D object in the scene possesses non-shape-changing attributes, such as color, overall size, and orientation, and shape-changing attributes, such as the number of sides (parallel or non-parallel, straight or curved), vertices, edges, and boundaries. Many of these attributes may exist in the 3D perception image in general, except for accurate depth information.
In the literature, we notice that the layered depth image (LDI) representation is used as a compact coding representation for multi-view synthesis to address occlusion and hidden information associated with a 3D scene [20]. Vincent Jantet et al. [21] suggested an improved virtual synthesis based on an object-level distinction between background and foreground, using a region-growing segmentation technique, but this algorithm was developed for multi-image views.
The multi-plane image (MPI) representation is another way to synthesize a 3D image; it enables rendering each pixel to obtain scene-independent new views with consistent handling of occlusion when multiple objects are involved in the scene. Each image plane is considered an RGBA image belonging to part of a frustum with its apex at the lens, positioned at fixed, equally spaced depths obtained as the inverse of disparity [22]–[27]. Recently, we have utilized MPIs as such a layered representation for generating the 3D perception image [14].
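View synthesis from the RGBA planes described above reduces to back-to-front alpha compositing with the standard over operator. A minimal sketch, assuming numpy and planes ordered far to near (a simplification of the rendering in [22]–[27], which also warps planes per view):

```python
import numpy as np

def compose_mpi(planes):
    """Composite a list of H x W x 4 RGBA planes, ordered far to near,
    with the over operator: out = rgb * alpha + out * (1 - alpha)."""
    h, w, _ = planes[0].shape
    out = np.zeros((h, w, 3))
    for p in planes:                          # far plane first
        rgb, a = p[..., :3], p[..., 3:4]
        out = rgb * a + out * (1.0 - a)
    return out

# Hypothetical example: an opaque red back plane, and a front plane
# that is opaque green on the left half and fully transparent on the
# right half.
back = np.zeros((2, 2, 4)); back[..., 0] = 1.0; back[..., 3] = 1.0
front = np.zeros((2, 2, 4)); front[:, 0, 1] = 1.0; front[:, 0, 3] = 1.0
result = compose_mpi([back, front])
```

The front plane correctly occludes the back plane only where its alpha is nonzero, which is the property that makes MPIs occlusion-consistent across views.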
Motivation
In this article, before bringing up the salient points for quantification of depth in 3D imaging, we distinguish two kinds of 3D visual luminance image: the 3D perception image, realized by adding an illusion of depth to the 2D luminance image, and the 3D virtual object or 3D precisely synthesized image, realized by adding a true depth, estimated from realistic parameters, to the 2D luminance image. We further point out that the 3D virtual object or precisely synthesized 3D image is advantageous in various real-world applications. The points relevant to our quantified 3D imaging system using shallow DoF are listed below:
The insufficiency of the intuitively obtained 3D perception image: the central theme for generating an all-in-focus 3D image from a single image is decomposing the given image data into appropriately ordered multiple layered images. Using only the image data, as a set of variously blurred or defocused regions, we can generate only 3D perception images from depth-from-disparity and the decomposed multiple layer images.
The inability to acquire all-in-focus 2D images, or accurate depth-based 3D images of all the 3D objects in the scene, with a 2D image sensor, as there is no simple and inexpensive 3D image sensor other than costly sensors such as time-of-flight [2] or the Microsoft Kinect device [28],
Limitations associated with DoF, especially under limited illumination conditions [1],
Technically, deep to shallow DoF zones are parametrically controllable by aperture, focal length, and distance of the object from the objective lens of the camera [1],
The change in aperture radius will not only vary DoF zone but also FoV [1].
Interestingly, for a fixed aperture size or FoV, on one hand, DoF is directly proportional to object distance with focal length held constant, and on the other hand, DoF is inversely proportional to focal length when the object distance is held constant [1].
A very shallow DoF results in a very small all-in-focus 2D image region [1],
DoF is sensitive to the digital format size, i.e., the CoC criterion, of a camera [1],
Professional photographers practice DoF photography, controlling deep to shallow DoF for creative effects. A shallow DoF in particular isolates the required portion of the image within the larger scene or FoV, since a shallow DoF zone in object space leads to a small all-in-focus region along the optical axis, while the regions before or after it are out of focus [1],
Changing the DoF by varying focal length is impossible once the lens is chosen for the camera, unless it is a liquid lens, which incurs extra cost.
The DoFo zone interval in image space is related to the DoF zone interval in object space and is dependent on both intrinsic and geometric parameters of the given camera [1].
The concept of exploiting DoF is also used in computer graphics, visual synthesis, and virtual reality [20], but only as a software simulation for generating a 3D perception image to convey photorealism.
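Several of the DoF properties enumerated above can be checked numerically with the standard thin-lens, hyperfocal-distance formulas. The sketch below is illustrative only; the focal length, f-number, and CoC values are hypothetical, not the parameters of the camera used in this article:

```python
def dof_limits(f, N, c, s):
    """Near/far limits of the depth of field for focal length f,
    f-number N, CoC criterion c, and focus distance s (all in metres),
    using the standard hyperfocal-distance approximation."""
    H = f * f / (N * c) + f                 # hyperfocal distance
    near = s * (H - f) / (H + s - 2.0 * f)
    far = s * (H - f) / (H - s) if s < H else float('inf')
    return near, far

def dof_width(f, N, c, s):
    near, far = dof_limits(f, N, c, s)
    return far - near
```

The assertions below confirm two of the listed properties: DoF grows with focus distance at fixed focal length, and shrinks with focal length at fixed focus distance.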
The above-enlisted issues and properties have motivated us to arrive at an affordable 3D image generation system, with applications in 3D imaging of biological surface tissues, collision-free autonomous driving, and maneuvering of industrial robots, where the objects to be imaged are always very close to the camera lens. This small focal distance (object-to-lens distance) is an obvious consequence of a wide aperture angle and FoV [1]. A shallow DoF is a direct consequence of the small focal distance; it corresponds to a small DoFo, leading to a small in-focus image region while many image regions are out of focus and blurred due to CoCs [1]. These factors prompted us to derive virtual positions of multiple image-sensor planes, as consequences of the disparities caused by blurriness on the fixed image sensor of a pre-configured camera setup, for the acquired small in-focus region and the remaining out-of-focus regions during a single image acquisition of the scene. Among the many interpretations of the 3D image corresponding to a natural 3D scene, our focus is on the geometric aspects, restricted either to a 3D perception image or to an actual 3D image from a single image. The 3D perception image refers to an image of a scene that represents the geometric environment displayable as an all-in-focus image with qualitative depth information, whereas a precise 3D image, as a virtual object, can be projected with more realistic depth information.
Geometric 3D Perception Image From Multi-Plane Image
In our previous work, we considered a single 3D scene image representation as a composition of multi-plane images built from 2D multi-region images, exploiting objects with varying amounts of blurriness lying at varying depths, which enabled us to arrive at perception depth [19].
A. Decomposition of Multiple Image Regions and Computation of Inter-Depths
Typically, any given scene comprises multiple objects of the same or different colors at varying depths, with or without overlaps. These objects produce varying blurriness in a captured 2D image due to the DoF configuration. The first step is to decompose the given 2D image into various image regions corresponding to the objects with varying amounts of blurriness.
(a) MCA camera set up for varying defocused captured 2D image and computed 3D perception image as MPIs and transforming to 3D virtual object/ estimated 3D image. (b) 3D perception image as MPIs (c) 3D Virtual object or estimated 3D image.
Especially in this article, we consider the 2.1D sketch, since it segments the image regions while handling occlusion explicitly, deciding the partial ordering of regions by indicating their ordering. This method exploits depth cues from edges, curves, cusps, crack tips, and T-junctions.
Any given 2D image is described with domain \begin{equation*} \mathcal {D} = \bigcup _{l=0}^{ \left ({ L - 1 }\right ) } \mathcal {R}_{l}', {~\text {where }} \mathcal {R}_{l}' = \bigcup _{ l < k, \mathcal {R}_{l} < \mathcal {R}_{k} } \mathcal {R}_{l} \tag{1}\end{equation*}
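In implementation terms, (1) requires the ordered regions $\mathcal{R}_{l}'$ to tile the image domain exactly once. A small validity check with boolean masks, assuming numpy (the mask shapes below are hypothetical):

```python
import numpy as np

def is_valid_decomposition(masks):
    """Check that the boolean region masks R'_l of (1) cover the
    domain exactly once: union is the whole domain, pairwise disjoint."""
    total = np.zeros(masks[0].shape, dtype=int)
    for m in masks:
        total += m.astype(int)
    return bool(np.all(total == 1))   # 0 anywhere => gap, >1 => overlap

# Hypothetical example: two complementary half-image regions
m1 = np.zeros((4, 4), dtype=bool); m1[:2] = True
m2 = ~m1
```

Counting coverage per pixel detects both gaps and overlaps in a single pass, mirroring the union and disjointness conditions of (1).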
B. Revisiting the Perception 3D Image From MPI
We recall from our earlier research work that the all-in-focus image, as a 3D perception image, is represented with inter-depths \begin{equation*} \Delta d_{0l} = \frac {\delta _{l}\times d_{ls}} {\left ({ d_{a} + \delta _{l} }\right ) }, \quad l=1,2,3 \cdots , \left ({ L - 1 }\right ) \tag{2}\end{equation*}
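Relation (2) maps a measured boundary disparity $\delta_{l}$ to a perception inter-depth. A direct transcription, with $d_{ls}$ the lens-to-sensor distance and $d_{a}$ the aperture diameter (the numeric values below are hypothetical, not from the experimental setup):

```python
def perception_inter_depth(delta_l, d_ls, d_a):
    """Eq. (2): inter-depth of the l-th MPI from its disparity delta_l,
    lens-to-sensor distance d_ls, and aperture diameter d_a (metres)."""
    return delta_l * d_ls / (d_a + delta_l)

# Hypothetical values: delta_l = 0.5 mm, d_ls = 55 mm, d_a = 20 mm
example = perception_inter_depth(5e-4, 0.055, 0.02)
```

Note that the inter-depth grows monotonically with the disparity, as expected from (2).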
In a natural scenario, the partial or full occlusions between the objects in a 3D scene may not be well ordered; these occlusions are caused by objects at different heights and at locations other than the focus distance. Further, as per geometric optics, a shallow DoF results in shift-varying blurred regions across the captured single image. In our previous research, we used the 2.1D sketch since it enables occlusion-aware image segmentation for decomposing the single image into MRIs, allowing us to account for the partial occlusion of farther objects by nearer ones [1]. Briefly, the salient steps of the perception 3D image generation algorithm are enumerated below:
Identify various image regions using salient boundaries based on the change in disparities and obtain inter-depths between the regions using (2).
Order the MRIs from foreground to background layers using depth cues as per 2.1D sketch.
Generate the 3D image as a composition of $L$ fronto-parallel MPIs forming part of a frustum referenced to the image sensor plane, as shown in Fig. 2(b) (dotted box marked MPI frustum), with the $l^{th}$ MPI, $ l=0,1, \cdots , \left ({ L -1 }\right )$ , placed at distance $\Delta d_{0l}$ and its smooth surface given by $ g \left ({ i, j }\right ) \in \mathcal {R}_{l} +\Delta d_{l}$ .
The 3D perception image obtained above can be expressed as:\begin{align*} g \left ({ i, j }\right ) \in \mathcal {D}=&\bigoplus _{l=0}^{ \left ({ L - 1 }\right )} \left ({ g \left ({i, j }\right ) \in \mathcal R_{l}' }\right ) + \left ({ \Delta d_{0l} }\right ) \\ \mathcal {D}=&\bigcup \limits _{l=0}^{ \left ({ L -1 }\right )} R_{l}' \;\; \& \bigcap \limits _{l=0}^{\left ({ L -1 }\right )} R_{l}' = \emptyset \tag{3}\end{align*}
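Operationally, (3) pairs each ordered region with its texture from the acquired image and its inter-depth. A sketch representing the result as a list of layers, assuming numpy boolean masks (the arrays below are hypothetical placeholders):

```python
import numpy as np

def build_perception_mpi(image, masks, inter_depths):
    """Eq. (3): pair each ordered region R'_l (boolean mask) with its
    texture taken from the acquired image and its inter-depth."""
    layers = []
    for mask, d in zip(masks, inter_depths):
        texture = np.where(mask[..., None], image, 0.0)  # zero outside R'_l
        layers.append({'mask': mask, 'texture': texture, 'depth': d})
    return layers

# Hypothetical example: a uniform image split into two ordered regions
img = np.ones((4, 4, 3))
m1 = np.zeros((4, 4), dtype=bool); m1[:2] = True
layers = build_perception_mpi(img, [m1, ~m1], [0.0, 0.1])
```

Each layer keeps only the pixels of its own region, so re-compositing the stack recovers the full domain, consistent with the union/disjointness conditions in (3).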
C. Limitation in the 3D Perception Image
The inter-image depths obtained above from the boundaries of the decomposed multiple image regions, and the 3D perception image synthesized from them, may not assure accurate depth, which is a vital requirement for an all-in-focus 3D image or virtual 3D object. Essentially, a virtual 3D object is a 3D model of the realistic scene object. Indeed, 3D image generation with accurate inter-image-region depth is crucial for applications such as biomedical imaging, industrial robotics, and autonomous vehicle maneuvering. It is easy to infer that the perception depth is not scale-invariant; for example, robotic navigation or AR/VR applications need a 3D scene that is both photo-realistic and deformable. We address this in the next section by relating the perception depth to the actual depth.
Improved Depth 3D Image or Virtual 3D Object Using Geometric Optics
Although the 3D perception image of the previous section uses the blur caused by the DoF phenomenon, sharpness in image space is not a specific point but an acceptable area, caused by the CoC, with no noticeable distortion. This implies that a relationship must exist between object space and image space for any camera, in terms of specific parameters, owing to the following characteristics:
For a given FoV, the distant DoF zone (far DoF) maps to the nearest DoFo zone (near DoFo) of the camera, and vice versa; the other points in the DoF zone, from far to near, map in reverse order in the DoFo zone. These DoF and DoFo zones are subsets of object and image space, respectively. Within them, there is negligible degradation or blurriness in the image if the image is well focused.
For a specific FoV of the camera and a 3D object placed in the DoF space, the irradiant rays originating from points on the various virtual planes in the DoF zone form the least CoC, i.e., the most focused image points, at specific points in DoFo space if the sensor plane is positioned at those locations along the optical axis; at any location before or beyond, they form unfocused image points with a larger CoC (refer to Fig. 3(a) and 3(b)).
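The CoC behaviour just described follows from the Gaussian lens equation. The following numeric sketch, with hypothetical parameters, shows the blur circle vanishing at the conjugate sensor position and growing with defocus:

```python
def image_distance(f, d_o):
    """Gaussian lens equation 1/d_o + 1/d_i = 1/f, solved for d_i."""
    return 1.0 / (1.0 / f - 1.0 / d_o)

def coc_diameter(f, aperture, d_o, d_sensor):
    """Blur-circle diameter on a sensor at d_sensor for a point at
    object distance d_o, by similar triangles through the aperture."""
    d_i = image_distance(f, d_o)
    return aperture * abs(d_sensor - d_i) / d_i

# Hypothetical setup: f = 50 mm lens, 28 mm aperture, sensor placed at
# the conjugate of an object 1 m away.
sensor = image_distance(0.05, 1.0)
```

With the sensor at the conjugate distance the CoC is zero; points farther from the focused plane produce progressively larger CoCs, as in Fig. 3(a) and 3(b).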
(a) Thin lens model for near focus planes (b) Thin lens model for far focus planes, (c) Thin lens model for lateral magnification.
Briefly, the far-field and near-field 3D scene points in object space form near-focus and far-focus image points in image space, as shown in Fig. 3(a) and 3(b), respectively. The actual positions of the above spaces depend on four parameters, namely (1) focal length,
Given these pre-configured parameters, for a shallow DoF range, a relationship is deduced between the inter-depths of objects in the 3D scene and the computed inter-depths of the corresponding MPIs, for non-transparent, non-specular 3D objects held stationary while imaging with the optical setup shown in Fig. 3(b). These configurations of the imaging optics yield the following characteristics when object space and image space are considered together.
There exists a specific point on the DoFo zone in image space along the optical axis where the image sensor could be positioned to get a focused image points corresponding to a specific object surface belonging to the 3D scene in the DoF zone of the object space of the camera.
In any conventional camera, the 2D sensor is fixed at a distance $d_{ls}$ from the lens to capture the in-focus image region for a specific shallow surface of the object in the 3D scene, along with many out-of-focus image regions whose CoC sizes vary in proportion to the positions, along the optical axis in the DoF zone, of the remaining objects relative to the focused surface. Though the widths of the DoF and DoFo spaces are not equal, and are distributed unequally, for the above specific camera setup it is possible to relate the far and near field points in DoF space to the corresponding near and far focus points in DoFo space.
A. All-in-Focus Image as Multi-Plane Images
In Fig. 3(a) and (b), we have shown only
B. Multi-Plane Objects as 3D Scene
In this subsection, we note that any 3D scene comprising many objects to be imaged can be represented as slices of multi-plane objects, i.e., a layered DoF, as described by David C. Schedl and Michael Wimmer [20], [33]. This is along similar lines to the 3D image as multi-plane images discussed in the previous subsection IV(B). It can be viewed in two ways: multi-plane object layers as part of a frustum in object space, intended to yield an in-focus image region in DoFo space, with planes arranged towards the far-focus point of the DoF zone; or multi-plane object layers as part of a frustum referenced to the object plane, intended to yield an in-focus image in DoFo, with planes arranged towards the near-focus point of the DoF region. In both cases, the 3D scene is viewed as
C. Relationship Between Inter-Image Planes Depth in DoFo and Inter-Objects Depth in DoF
By the geometric optics principle of any camera system, for a specific object position in the FoV and given camera configurations, the back and front DoF regions in object space form the corresponding front and back DoFo regions in image space, respectively. To arrive at
As per [32], 3D image is generated using fusion of focused multiple layer image planes at points
In order to compute a 3D image with more realistic depth, we consider the object, image, and camera as shown in Fig. 3(b). For simplicity, we position the given 3D scene so that, for its top portion, similar triangles give \begin{equation*} \frac {D_{lo} + \Delta D_{0l}} { M d_{a}} = \frac {\Delta D_{0l}} {\delta _{l}}, \quad {~\text {where }} l=1,2,\cdots , \left ({ L - 1 }\right ). \tag{4}\end{equation*}
\begin{equation*} D_{lo} = \frac { \Delta D_{0l} \left ({ M d_{a} - \delta _{l} }\right )} {\delta _{l}} \tag{5}\end{equation*}
Depth estimation using geometric optics relies on the Gaussian lens equation:\begin{equation*} \frac {1}{\left ({ D_{lo}+ \Delta D_{0l} }\right )} +\frac {1}{d_{ls}-\Delta d_{0l} } = \frac {1}{f}, \tag{6}\end{equation*}
On substituting $\Delta d_{0l}$ from (2) into (6), this reduces to \begin{equation*} \frac {1}{\left ({ D_{lo} + \Delta D_{0l} }\right ) } =\frac { d_{a} \left ({ d_{ls} {-} f }\right ) {-} f \delta _{l}} { d_{a} f d_{ls}}. \tag{7}\end{equation*}
Further, we would like to mention that in J. N. P. Martel et al. [34], real-time depth estimation is carried out using a focal-plane processor array with a tunable-focus lens. That method is not directly applicable to a conventional camera because it involves changing both the image sensor and the lens; moreover, tunable lenses are unstable in performance. Our intention here is to estimate depth with a rigid lens and fixed sensor, using a simple optical attachment to the camera and a computational approach based on disparities. The magnification is \begin{equation*} M= \frac {d_{ls} {-} f } {f},\;\;\; {~\text {where }} f {~\text {represents the focal length. }}\quad \tag{8}\end{equation*}
Using (8), (7) can be re-written as \begin{equation*} \Delta D_{0l}= \frac { \left ({ d_{ls} d_{a} }\right )}{\left ({ M d_{a} - \delta _{l} }\right )} - D_{lo}, \tag{9}\end{equation*}
On substituting (5) into (9), we obtain \begin{equation*} \Delta D_{0l} = \frac {\left ({ d_{ls} \delta _{l} }\right )} { M\left ({ M d_{a} - \delta _{l}}\right ) } \tag{10}\end{equation*}
Re-writing (2) for $\delta _{l}$ gives \begin{equation*} \delta _{l} = \frac { d_{a} \Delta d_{0l} } { \left ({ d_{ls} - \Delta d_{0l} }\right ) } \;\;\; l=1, 2,3 \cdots , \left ({ L - 1 }\right ) \tag{11}\end{equation*}
Substituting (11) into (10) yields \begin{equation*} \Delta D_{0l} = \frac { \Delta d_{0l}} { M \left ({ M - \frac {\left ({ M + 1 }\right )}{d_{ls}} \Delta d_{0l}}\right ) }. \tag{12}\end{equation*}
Alternatively, (12) can be expressed in the form below:\begin{equation*} \Delta D_{0l} = \frac { \Delta d_{0l}} { M^{2} \left ({ 1 - \frac {\left ({ M + 1 }\right )}{M}\frac {\Delta d_{0l}}{d_{ls}} }\right ) }. \tag{13}\end{equation*}
Along similar lines, for the far-focus configuration we can obtain \begin{equation*} \Delta D_{0l} = \frac { \Delta d_{0l}} { M^{2} \left ({ 1 + \frac {\left ({ M + 1 }\right )}{M}\frac {\Delta d_{0l}}{d_{ls}} }\right ) }. \tag{14}\end{equation*}
The interesting point of the above relationship is that the inter-depth of the virtual 3D object is expressed only in terms of image-space parameters and the magnification of the camera, the latter measured without direct object-space parameters. In Fig. 3(c), based on the geometric optics principle, we infer that the transverse and axial magnifications of the camera setup alter not only the size of the image but also the widths of the object layers and the corresponding image layers. We also note that equidistant points on the optical axis in object space are distributed unequally in image space along the optical axis. The relationship in (13) is used to obtain the true depth at various points on the 3D object surface from the corresponding points on the generated 3D image surface. Note that the relationship in (10) can also be used to identify the inter-depth directly from the disparity $\delta _{l}$.
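The chain (2), (8), (10), (11), (13) lets the true inter-depth be computed from image-side quantities alone. The sketch below cross-checks the disparity route (10) against the image-depth route (13); all parameter values are hypothetical, chosen only to satisfy $M d_{a} > \delta_{l}$:

```python
def magnification(d_ls, f):
    """Eq. (8): M = (d_ls - f) / f."""
    return (d_ls - f) / f

def true_inter_depth_13(delta_d, d_ls, f):
    """Eq. (13): object-space inter-depth from image-space depth
    delta_d = Δd_{0l}."""
    M = magnification(d_ls, f)
    return delta_d / (M * M * (1.0 - (M + 1.0) / M * delta_d / d_ls))

def true_inter_depth_10(delta_l, d_ls, d_a, f):
    """Eq. (10): the same quantity directly from the disparity δ_l."""
    M = magnification(d_ls, f)
    return d_ls * delta_l / (M * (M * d_a - delta_l))

# Hypothetical setup: f = 50 mm, d_ls = 55 mm (so M = 0.1),
# aperture d_a = 20 mm, disparity δ_l = 0.5 mm.
f, d_ls, d_a, delta_l = 0.05, 0.055, 0.02, 5e-4
delta_d = delta_l * d_ls / (d_a + delta_l)        # eq. (2)
```

Since (13) is derived from (10) by eliminating $\delta_{l}$ via (11), both routes must agree numerically; the valid range requires $M d_{a} > \delta_{l}$.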
D. 3D Virtual Object From Multiple Plane Images and Improved Inter-Depths
A 3D topographic surface of a non-transparent and non-specular 3D object visible within the FoV of a given camera along its optical axis is defined as a discrete luminance function represented in the form of a 2D array as below:\begin{equation*} G_{\left ({ i,j }\right )} = Z_{i,j} \hspace {0.3cm} \left ({i, j }\right ) \in \mathcal {D} \tag{15}\end{equation*}
E. Synthesis of Virtual 3D Object From 3D Perception Image
In this subsection, we utilize multiple planes for the object representation, similar to the MPIs shown in Fig. 2(a), in one-to-one correspondence with the layers of the MPI, except that each layer is displaced by its inter-depth $\Delta D_{0l}$:\begin{align*} G \left ({ i, j }\right ) =&\bigoplus \limits _{l=0}^ {\left ({ L - 1 }\right ) } \left [{ \left ({ g \left ({ i, j }\right ) \in \mathcal R_{l}' }\right ) + \Delta D_{0l} }\right ], \quad \left ({ i, j }\right ) \in \mathcal {D} \\ \mathcal {D}=&\bigcup \limits _{l=0}^{\left ({ L -1 }\right )} \mathcal R_{l}' \quad \& \quad \bigcap \limits _{l=0}^{\left ({ L -1 }\right )} \mathcal R_{l}' = \emptyset \tag{16}\end{align*}
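A minimal sketch of this layered composition, assuming hypothetical region masks and inter-depths (the partition of $\mathcal{D}$ into disjoint regions $\mathcal{R}_l'$ mirrors (16)):

```python
import numpy as np

# Hypothetical 4x4 domain D partitioned into L = 2 disjoint regions R'_l.
H, W, L = 4, 4, 2
masks = [np.zeros((H, W), dtype=bool) for _ in range(L)]
masks[0][:, :2] = True            # R'_0: left half of the domain
masks[1][:, 2:] = True            # R'_1: right half of the domain
delta_D = [0.0, 3.0]              # illustrative inter-depths Delta D_0l (cm)

# Compose the topographic surface G(i, j): each pixel takes the depth
# offset of the layer whose region it belongs to, as in (16).
G = np.zeros((H, W))
for l in range(L):
    G[masks[l]] += delta_D[l]

# The masks must be pairwise disjoint and must jointly cover D.
assert not np.any(masks[0] & masks[1])
assert np.all(masks[0] | masks[1])
```

The disjointness and coverage assertions correspond directly to the empty-intersection and union constraints on the regions in (16).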
Discussion on Experimental Evaluations for 3D Imaging System
In this section, we discuss simulated experimental results to validate the quantified 3D image generation formulated in the previous sections.
A. MCA Camera Setup for Image Acquisition
The experimental setup is formed using the scheme shown in Fig. 3(a) and 3(b) with the specifications as the diameter of the aperture being
B. Data Set Creations for Evaluations
To demonstrate and evaluate the practical performance of the proposed approach, we need to create a relevant image data set, because there seems to be no standard color filter aperture image database. We have created the following sample MCA 2D RGB images with various arrangements and known inter-depths between the objects:
Scene consisting of non-overlapping similar-color objects arranged one behind the other, with the focal distance set on one of the intermediate objects,
Scene similar to the above but with objects of different colors,
Scene in which one object overlaps another, with varying sizes and colors,
Scene with an object in a slanted position, overlapped at its middle portion and visible on both sides,
Scene with objects of different colors and sizes placed one behind the other such that their mid portions are occluded,
Scene with both overlapped and non-overlapped objects.
In most of the above scenes, the inter-object depth is maintained between 0.5 cm and 10.5 cm.
C. Experimental Evaluations
This subsection describes both qualitative and quantitative results on images of non-trivial scenarios captured using the MCA camera with the color filter arrangement shown in Fig. 1.
The qualitative evaluations of the depth maps for all the generated data sets are discussed, and the results are displayed in Fig. 4.
In col. (II) of all rows, we note that the 2.1D sketch yields all the regions in the scene image with comparatively good boundary accuracy irrespective of occlusions, which plays a vital role in arriving at a good 3D perception and an accurate 3D image.
The depth maps for the MCA images shown in col. (III) of Fig. 4 are represented in terms of gray levels, where a higher gray value for a region implies that it is nearer to the lens. The maps also have a very narrow range of gray levels, since the inter-object depth gap is minimal.
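The gray-level encoding described here (nearer regions brighter) can be sketched as below; the function name and the linear inverse scaling are our assumptions for illustration, not the paper's exact mapping:

```python
import numpy as np

def depth_to_gray(depth):
    """Map depths to 8-bit gray levels so that NEARER surfaces are BRIGHTER.

    Linear inverse scaling over the observed min/max is an illustrative
    choice; without such stretching, a small inter-object depth gap would
    occupy only a narrow band of gray levels.
    """
    d = np.asarray(depth, dtype=float)
    span = d.max() - d.min()
    if span == 0.0:
        return np.zeros_like(d, dtype=np.uint8)
    nearness = (d.max() - d) / span      # 1.0 at the nearest depth
    return np.round(nearness * 255.0).astype(np.uint8)
```

For example, `depth_to_gray([12.0, 14.0, 20.0])` maps the nearest depth (12.0) to gray level 255 and the farthest (20.0) to 0.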
Col. (IV) of all rows shows the scatter map. Rows (b), (e), (f), and (g) indicate that the overlaps present in the images are detected very accurately, and this is utilized to arrive at the MPIs, resulting in the good 3D images shown in col. (V) of the respective rows.
From col. (IV), we see that the scatter map shows that the proposed algorithm resolves all regions very well.
Col. (VI) displays the running time in seconds to execute the algorithm for arriving at the 3D virtual objects corresponding to the acquired 2D MCA image. The quoted runtimes apply to a MATLAB-based implementation running on a specific hardware and software platform, an Intel® Core i5-4460 CPU @ 3.20 GHz with Ubuntu 16.04 LTS and 8.00 GB of memory, without any effort toward the many possible optimizations. There is considerable scope for improving the execution time by optimizing the algorithm, not only with respect to the hardware and software of the intended application scenario but also by eliminating redundant computations and memory requirements.
Illustrative scene images of salient scenarios, with overlaps, inclined positioning with respect to the optical axis of the MCA camera, and various combinations, are displayed as rows, with their intermediate and final-stage results as columns. Col. (I): MCA images, col. (II): 2.1D sketch, col. (III): depth map, col. (IV): scatter map, col. (V): accurate inter-depth 3D image plot, col. (VI): runtime in seconds. Row (a): MCA image of pots placed one behind the other with a gap of 3 cm and no overlap; row (b): MCA image of non-overlapped blue and green blocks with a gap of 3.5 cm, and an orange block overlapped by the blue block at one end and the green block at the other end at a gap of 4 cm; row (c): six color blocks one behind the other at a gap of 3 cm; row (d): MCA image of four similar-color bottles kept one behind the other with gaps of 7 cm, 9 cm, and 8 cm; row (e): MCA image of a wooden spatula overlapped by a red tomato, an orange carrot, and a green tomato, placed at gaps of 0.5 cm, 4 cm, 4 cm, and 6.2 cm; row (f): MCA images of three cups of different colors and sizes overlapping each other, placed one behind the other at gaps of 6.5 cm and 10.5 cm; row (g): MCA image of a knife in a slanted position along the optical axis, with one end 3 cm in front and the other end 10 cm behind, overlapping an orange carrot.
We present the detailed performance analysis as below:
On referring to row (c), row (e), and row (f), we see that the error in estimating the respective inter-region depths is significantly small.
From row (a), row (c), and row (d), we notice that the error in the estimated inter-depth is smaller when the actual gap between the objects is smaller.
The estimated inter-depth value seems to be less erroneous when the object is near the focused object (refer to rows (a), (b), (d), (e), and (f)).
From Table 1, we note that the estimated inter-region depth in the perception 3D image is not close to the ground-truth inter-object depth. On the other hand, the inter-image depth in the 3D image computed from the newly derived relationship yields better results.
From the results shown in row (a), we could infer that the error increases as the actual gap between the objects increases.
As per row (e), the error is small when the gap between the objects is small; on the other hand, the error is comparatively larger for the object that lies on the near-focus side, even though the object gap is the same. Similar results are exhibited in rows (d) through (g).
On observing all rows (a) through (g), we infer that errors are minimal when the object depth is nearer to the focal distance.
The performance accuracy is similar in both rows (c) and (e).
From the results depicted in row (b), we infer that the estimated depth is the same for objects A and B, with a smaller estimation error than for object C, which is behind object D. We also note that the slant orientation is exhibited as expected.
From row (g), the estimated inter-depth error between B and C is comparatively smaller than that between A and B, since the former is estimated under near-focus conditions. A similar characteristic is exhibited in row (c).
From Table 2, we observe that errors are less for near focus compared to far focus depth for all figures in rows (a) through (g).
We notice in Table 2 that, for rows (b), (c), and (d), the errors near the focus region are smaller than the errors away from the focus region.
Especially in rows (a), (c), (d), and (e), where the camera focus is on one of the intermediate objects in the scene instead of the object nearest to the lens, the inter-depth estimation errors are smaller for objects behind the focused object than for objects in front of it. The same trend holds in rows (b), (f), and (g), where the focal distance is nearer to the lens.
In general, the accuracy of the inter-object depth results is better when one object is focused and the remaining objects are placed between the focal distance and the far endpoint of the DoF zone in object space.
Table 2 shows that the relative depth measured for a near-focus object has an error of 10%, and the percentage error increases gradually up to 57% as objects are placed laterally away from the focused object.
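The percentage figures above presumably follow the standard relative-error definition; a minimal sketch (the function name is ours, and the values in the usage note are illustrative, not taken from Table 2):

```python
def inter_depth_percent_error(estimated, ground_truth):
    """Relative error (%) of an estimated inter-object depth
    against its ground-truth value."""
    return abs(estimated - ground_truth) / abs(ground_truth) * 100.0
```

For example, an estimate of 9.0 cm against a 10.0 cm ground-truth gap gives a 10% error.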
Conclusion and Future Scope
We have presented a novel method of synthesizing a more accurate geometric-depth 3D image from the 3D perception depth image using the newly formulated inter-depth relationship. We have derived a specific non-linear dependency between the inter-depths of two objects lying in a 3D scene and the corresponding inter-depths of two MPIs in image-space parameters under shallow-DoF-zone constraints. On this basis, the more accurate geometric-depth 3D image synthesis from a single image is arrived at in three steps: (i) generation of the 3D perception image from MPIs using its inter-depths computed from inter-image region-boundary disparities, (ii) decomposition of the single 2D image into one in-focus region and many out-of-focus regions caused by varying CoCs, using the 2.1D sketch as a semantic image segmentation, and (iii) composition of the 3D image by re-aligning the image surface and the corresponding MPIs with their accurate respective inter-depths.
The partial ordering of image regions using the 2.1D sketch enables us to determine the occlusions among the multiple objects in the scene better than alpha-matting. Experiments show that the proposed method gives better results not only when the scene comprises multiple objects lying at different depths with dissimilar colors but also when the objects share the same color. Further, the proposed method has been demonstrated to resolve inter-depths of the order of a few millimeters (gaps of 5 mm to 105 mm between the objects in the 3D scene), making it well suited for real-time applications.
A few future research directions are listed below:
Improving the accuracy of depth estimation for the objects lying on the off-axis with respect to the camera’s optical axis.
Comparative studies of the suggested approach against plenoptic imaging would be worthwhile, to understand the importance of optical lens arrays.
The extension of the above-discussed methods for microscopic 3D image analysis could be interesting for many machine inspection and bio-imaging applications.
Exploration of deep neural networks for robust and more precise 3D image generation from a single image, to improve upon the approach suggested in this article.
Exploration of the computation of 3D virtual and 3D perception images using the front and back DoF zones is a worthwhile exercise to assess the accuracy of the inter-depths.