Introduction
Generating an all-in-focus 3D image with accurate depths is an essential need in many computer vision applications that require crucial details of a 3D scene. In this article, an all-in-focus image means an approximately computable image in which every pixel is in focus. Creating a virtual 3D object, or 3D image synthesis, is essential for virtual/mixed reality applications, especially medical diagnosis and surgical training. In any conventional optical camera system used for visible-light intensity imaging, imaging a real-world 3D scene or 3D object typically involves three intrinsic components, namely, (i) the 3D object space, (ii) the image space, and (iii) the optical imaging system [1]. Among the known optical imaging systems spanning various ranges of the electromagnetic spectrum, we consider only photographic systems based on visible light. Moreover, the optical sensors used in the image space of most digital imaging systems are either CMOS or CCD sensors, which are inherently 2D when capturing the image of a 3D object, often as a low-cost compulsion. Recovering depth information to represent a 3D image therefore requires indirect means of interpreting the irradiant rays projected from the scene/object surface onto the 2D sensor plane. To date, most suggested approaches are based on single-view, stereo-view, or multi-view imaging [1], [2] for generating a qualitative 3D perception of the real 3D scene from the irradiant optical stimulations (mathematically a one-to-many mapping) [3], which we term a "3D perception image". 
Among the approaches that produce a 3D perception image, single-view-based 3D image generation is the most challenging, though it is advantageous for two reasons, namely, (i) it requires less optics at the cost of extensive computational complexity, and (ii) it is applicable in scenarios where stereo or multi-view imaging is not possible. The generation of a 3D image from a single-view 2D image remains an active area of research for the following reasons: (i) suitable imagery can be obtained with a commercially available, affordable camera with either minor modifications or plug-in attachments, in contrast to devices that involve time-of-flight [4], structured illumination [5], or stereo cameras [5]; (ii) unlike a conventional camera, a computational camera combines geometrical imaging optics with built-in image processing/analysis functionality and computational resources [6]; (iii) especially in medical imaging applications, all-in-focus images give good clarity for identifying pathological symptoms. In any imaging system, the exposure parameters, namely shutter speed and aperture size, may be used, singly or together, to deliver the desired amount of light radiation onto the sensor. The sensitivity of the sensor also plays a crucial role. However, a camera is designed around a chosen sensor and lens, so their characteristics are fixed; in conventional cameras, flexibility lies only in aperture and shutter speed. Flexibility through the aperture has been achieved in many ways, viz., (i) the radius or size of the aperture [1], (ii) multiple apertures (e.g., color filter aperture) [7]–[9], (iii) sparse apertures (e.g., color-coded apertures) [10], (iv) offset aperture [11], and (v) differential aperture photometry [12]. 
In recent years, computational photography systems have gained importance over traditional photography because they comprise not only optics and electronics to capture the image but also software-based computational manipulation to improve imaging capability [13]. These manipulations may involve computing attributes such as shape, position, resolution, structure, unfocused pixel values or regions, aperture variation, and control of the amount of incoming light. It is important to distinguish between a 3D perception image and the more precise 3D virtual scene or image. In a 3D perception image, the depths between the various portions of the image are merely proportional, giving a feel for the 3D environment. In a 3D precise image, those depths are more accurate estimates that depict the real measurements of the scene, as required for autonomous maneuvering of robots or vehicles. This article extends the computational 3D perception imaging system comprising a camera, a multi-color filter aperture (MCA), and computation of depth from disparities across boundaries identified using a 2.1D sketch for a single image [14]. We adapt computational photography techniques to generate a 3D image from a single RGB image acquired by a conventional digital camera with an add-on MCA using red, green, and blue filters. We propose to obtain multi-plane images (MPIs) that constitute all-in-focus image regions based on the geometric optics principle, as shown in Fig. 1, with associated inter-depths between them obtained by suitable computational processes. Ultimately, we aim to deduce a quantitative model that uses the associated camera parameters to arrive at the depth parameter from the acquired 2D RGB image. 
In this research effort, we develop a quantitative model for the computational imaging system suggested in [14], which employs both depth cues from the MCA [8] and composition of a 3D image from MPIs obtained via multi-region image (MRI) decomposition of the acquired image using a 2.1D sketch [15] as semantic segmentation. The primary goal is to arrive at more accurate 3D images or 3D virtual objects in terms of the associated geometric and intrinsic camera parameters. In brief, the method relates accurate inter-object depths of multiple object layers in the DoF to the corresponding inter-image depths of MPIs in the depth-of-focus (DoFo) region. The inter-depths of the decomposed MPIs are associated with disparity parameters derived from the MCA 2D image and the 2.1D sketch. The result is an affordable device requiring only a minor optics modification: the MCA with the red, green, and blue filter arrangement shown in Fig. 1, embedded in a CANON 50mm f1.8 II lens on a readily available CANON 70D DSLR camera.
In summary, our contributions are:
Deduction of an explicit relation between the inter-depths associated with the MPIs of the 3D perception image and the inter-object depths of the real-world 3D scene under a shallow DoF constraint.
Generation of a more accurate geometric-depth 3D image from a single 2D RGB image in three steps, namely, (i) generation of a 3D perception image as a set of MPIs, using inter-image-plane depths and image surfaces; (ii) the set of MPIs is obtained from multiple regions with varying CoCs, a consequence of shallow DoF, determined using the 2.1D sketch as semantic image segmentation, with inter-image-plane depths from the disparities across the boundaries caused by the MCA, and smooth surfaces obtained by adjusting the image textures from the respective regions of the acquired 2D MCA image; and (iii) the final, more accurate-depth 3D image is obtained from the 3D perception image by realigning the MPIs to their inter-object depths.
Section 2 reviews relevant work on depth estimation using MCA and the associated image representations for view synthesis, along with their limitations. The motivation for this work is highlighted in section 3. Section 4 explains the geometric 3D perception image built from a multi-plane image based on an ordered multi-region image and inter-depth estimation. Section 5 describes the 3D accurate-depth image or virtual object obtained from MPIs based on a newly deduced relationship between the inter-depth of the perception image and its 3D accurate-depth counterpart. Section 6 presents the experimental simulations used to validate the developed 3D image, and Section 7 gives concluding remarks with the future scope of research.
Related Research Works
Although substantial research has been carried out on depth estimation and reconstruction of the 3D perception image (typically termed the all-in-focus image) using an MCA, we notice that hardly any discussion addresses precise depth. As mentioned earlier, our main aim is 3D imaging from a single image with a conventional camera, with more accurate depth based on cues associated with the MCA and DoF. Some crucial computer vision applications, such as robotics, medical diagnosis, and surgery planning, need more exact shape-, size-, and volume-based object recognition. In the technique considered here, this boils down to establishing an exclusive relationship between the perception depth image and the actual depth of the object in the 3D scene. Hence, our review is centered on depth estimation using both DoF and MCA to arrive at all-in-focus image synthesis.
A. Review on Depth Estimation From Disparity Using MCA
Jaehyun Im et al. [15] suggested an optically modified color-filter aperture (CFA) for tracking objects, with both depth and position expressed as color shifts, under the assumption that color discontinuities arise when two different objects in the scene are located at different depths. Phase correlation is employed in an adaptive mean-shift algorithm for real-time implementation of the CFA approach. However, in this method, occlusion handling depends entirely on empirical, pixel-based threshold selection, and the method does not guarantee handling of multiple objects of the same color at different positions along the optical axis. Yosuke Bando et al. [8] proposed improved depth estimation and the use of a matte to distinguish foreground from background using color misalignment, utilizing several image-editing techniques. More crucially, the disparities increase when the lens is focused nearer, so the inter-depths between any two objects in the scene may not be consistent. Better inter-depth results are reported when the objects lie between 50 cm and 250 cm and the background is about twice as far from the camera as the objects. Performance degrades for smaller color misalignments and for objects of the same color; the restriction to distinctly colored objects limits the practical utility of this approach.
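The color-shift disparity these CFA methods rely on can be estimated by phase correlation between color channels. The following is a minimal sketch of that idea (not the reviewed authors' implementation), assuming numpy, integer shifts, and a synthetic test pattern:

```python
import numpy as np

def channel_shift(ref, shifted):
    """Estimate the integer (dy, dx) translation of `shifted` relative
    to `ref` from the phase of the cross-power spectrum."""
    cps = np.conj(np.fft.fft2(ref)) * np.fft.fft2(shifted)
    cps /= np.abs(cps) + 1e-12            # whiten: keep only the phase
    corr = np.real(np.fft.ifft2(cps))     # impulse at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape                      # wrap to signed offsets
    return (dy - h if dy > h // 2 else dy,
            dx - w if dx > w // 2 else dx)

# Hypothetical example: a green channel displaced by (2, -3) pixels
# relative to the red channel, as an MCA would produce for a
# defocused region.
rng = np.random.default_rng(0)
red = rng.random((64, 64))
green = np.roll(red, (2, -3), axis=(0, 1))
```

Sub-pixel refinement and the occlusion-aware windowing that the reviewed methods require are deliberately omitted here.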
The color-coded aperture (CCA) based method of Ivan Panchenko et al. [16] needs calibration to arrive at good results for different coded aperture designs, and depends on the calibration and the effective focal length. The demerits of CCAs are their limited field of view (FOV) [17], and they are claimed to outperform pinhole collimators only with a very localized point source.
A light-efficient CCA was designed and calibrated in skillful ways by Vladimir Paramonov et al. [18] to achieve millimeter depth resolution within a given image frame; their work demonstrates real-time 3D scene generation and depth-based image effects on DSLR, smartphone, and compact cameras. However, this resolution applies only at the center of the acquired image, and both of their methods exploit a GeForce GTX 780 GPGPU to handle the computational intensiveness. There is no guarantee of correct inter-depths for objects with the same color and texture. Moreover, the accuracy of the CCA method depends on many factors, such as the working range, the need for strong texture information for depth extraction, lower accuracy in strongly defocused areas, and the image restoration required to recover a sharp image. The single-MCA-camera methodology suggested by Seungwon Lee et al. [19], and the modified multi-focusing image-misalignment approach of Sangjin Kim et al. [19] for tracking objects by sparsely extracting edges from the red, green, and blue color-shift vectors of each channel, assure a good depth estimate only in the central region of the image, without any mention of occlusion handling. Recently, we proposed a novel method of generating a 3D image from a single 2D RGB image of a scene, utilizing depth from disparity based on the MCA, that handles multiple objects in the scene both with occlusions and with the same colors at different depths along the optical axis. It utilizes a 2.1D sketch instead of an image matte on the MCA images to determine the ordered multi-region decomposition.
Although the previously discussed algorithms address various aspects of arriving at a 3D perception image with an all-in-focus image surface and proportional depth information, acceptable in some computer vision applications, they may not be adequate in applications such as robotic environments, autonomous driving, and some mixed-reality medical diagnostic studies. We also note that they exploit only image-side information, without many camera parameters, to arrive at the 3D perception image; this lays the foundation for our motivation, discussed in the following subsection.
B. Review on Layered Representation for 3D Perception Image
More accurate inter-depths between objects are essential for geometry-aware manipulation of 3D virtual objects in most virtual or augmented reality applications when the 3D scene comprises more than one object. It is important to note that any 3D object in the scene possesses non-shape-changing attributes, such as color, overall size, and orientation, and shape-changing attributes, such as the number of sides (parallel or non-parallel, straight or curved), vertices, edges, and boundaries. Many of these attributes may exist in the 3D perception image in general, except for accurate depth information.
In the literature, we notice that the layered depth image (LDI) representation is used as a compact coding representation for multi-view synthesis to address occlusion and hidden information associated with a 3D scene [20]. Vincent Jantet et al. [21] suggested an improved virtual synthesis based on an object-level distinction between background and foreground, using a region-growing segmentation technique, but this algorithm was developed for multi-image views.
The multi-plane image (MPI) representation is another way to synthesize a 3D image; it enables rendering each pixel to obtain scene-independent new views with consistent handling of occlusion when multiple objects are involved in the scene. Each image plane is considered an RGBA image belonging to part of a frustum with its apex at the lens, positioned at fixed, equally spaced depths obtained as the inverse of disparity [22]–[27]. Recently, we have utilized MPIs as such a layered representation for generating the 3D perception image [14].
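View synthesis from the RGBA planes described above reduces to back-to-front alpha compositing with the standard over operator. A minimal sketch, assuming numpy and planes ordered far to near (a simplification of the rendering in [22]–[27], which also warps planes per view):

```python
import numpy as np

def compose_mpi(planes):
    """Composite a list of H x W x 4 RGBA planes, ordered far to near,
    with the over operator: out = rgb * alpha + out * (1 - alpha)."""
    h, w, _ = planes[0].shape
    out = np.zeros((h, w, 3))
    for p in planes:                          # far plane first
        rgb, a = p[..., :3], p[..., 3:4]
        out = rgb * a + out * (1.0 - a)
    return out

# Hypothetical example: an opaque red back plane, and a front plane
# that is opaque green on the left half and fully transparent on the
# right half.
back = np.zeros((2, 2, 4)); back[..., 0] = 1.0; back[..., 3] = 1.0
front = np.zeros((2, 2, 4)); front[:, 0, 1] = 1.0; front[:, 0, 3] = 1.0
result = compose_mpi([back, front])
```

The front plane correctly occludes the back plane only where its alpha is nonzero, which is the property that makes MPIs occlusion-consistent across views.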
Motivation
In this article, before bringing up the salient points for quantification of depth in 3D imaging, we distinguish two kinds of 3D visual luminance image: the 3D perception image, realized by adding an illusion of depth to the 2D luminance image, and the 3D virtual object or 3D precisely synthesized image, realized by adding a true depth, estimated from realistic parameters, to the 2D luminance image. We further point out that the 3D virtual object or precisely synthesized 3D image is advantageous in various real-world applications. The points relevant to our quantified 3D imaging system using shallow DoF are listed below:
The insufficiency of the intuitively obtained 3D perception image: the central theme for generating an all-in-focus 3D image from a single image is decomposing the given image data into appropriately ordered multiple layered images. Using only the image data, as a set of variously blurred or defocused regions, we can generate only 3D perception images from depth-from-disparity and the decomposed multiple layer images.
The inability to acquire all-in-focus 2D images, or accurate depth-based 3D images of all the 3D objects in the scene, with a 2D image sensor, as there is no simple and inexpensive 3D image sensor other than costly sensors such as time-of-flight [2] or the Microsoft Kinect device [28],
Limitations associated with DoF, especially under limited illumination conditions [1],
Technically, deep to shallow DoF zones are parametrically controllable by aperture, focal length, and distance of the object from the objective lens of the camera [1],
The change in aperture radius will not only vary DoF zone but also FoV [1].
Interestingly, for a fixed aperture size or FoV, on one hand, DoF is directly proportional to object distance with focal length held constant, and on the other hand, DoF is inversely proportional to focal length when the object distance is held constant [1].
A very shallow DoF results in a very small all-in-focus 2D image region [1],
DoF is sensitive to the digital format size, i.e., the CoC criterion, of a camera [1],
Professional photographers practice DoF photography, controlling deep to shallow DoF for creative effects. A shallow DoF in particular isolates the required portion of the image within the larger scene or FoV, since a shallow DoF zone in object space leads to a small all-in-focus region along the optical axis, while the regions before or after it are out of focus [1],
Changing the DoF by varying focal length is impossible once the lens is chosen for the camera, unless it is a liquid lens, which incurs extra cost.
The DoFo zone interval in image space is related to the DoF zone interval in object space and is dependent on both intrinsic and geometric parameters of the given camera [1].
The concept of exploiting DoF is also used in computer graphics, visual synthesis, and virtual reality [20], but only as a software simulation for generating a 3D perception image to convey photorealism.
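Several of the DoF properties enumerated above can be checked numerically with the standard thin-lens, hyperfocal-distance formulas. The sketch below is illustrative only; the focal length, f-number, and CoC values are hypothetical, not the parameters of the camera used in this article:

```python
def dof_limits(f, N, c, s):
    """Near/far limits of the depth of field for focal length f,
    f-number N, CoC criterion c, and focus distance s (all in metres),
    using the standard hyperfocal-distance approximation."""
    H = f * f / (N * c) + f                 # hyperfocal distance
    near = s * (H - f) / (H + s - 2.0 * f)
    far = s * (H - f) / (H - s) if s < H else float('inf')
    return near, far

def dof_width(f, N, c, s):
    near, far = dof_limits(f, N, c, s)
    return far - near
```

The assertions below confirm two of the listed properties: DoF grows with focus distance at fixed focal length, and shrinks with focal length at fixed focus distance.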
The above-enlisted issues and properties have motivated us to arrive at an affordable 3D image generation system, with applications in 3D imaging of biological surface tissues, collision-free autonomous driving, and maneuvering of industrial robots, where the objects to be imaged are always very close to the camera lens. This small focal distance (object-to-lens distance) is an obvious consequence of a wide aperture angle and FoV [1]. A shallow DoF is a direct consequence of the small focal distance; it corresponds to a small DoFo, leading to a small in-focus image region while many image regions are out of focus and blurred due to CoCs [1]. These factors prompted us to derive virtual positions of multiple image-sensor planes, as consequences of the disparities caused by blurriness on the fixed image sensor of a pre-configured camera setup, for the acquired small in-focus region and the remaining out-of-focus regions during a single image acquisition of the scene. Among the many interpretations of the 3D image corresponding to a natural 3D scene, our focus is on the geometric aspects, restricted either to a 3D perception image or to an actual 3D image from a single image. The 3D perception image refers to an image of a scene that represents the geometric environment displayable as an all-in-focus image with qualitative depth information, whereas a precise 3D image, as a virtual object, can be projected with more realistic depth information.
Geometric 3D Perception Image From Multi-Plane Image
In our previous work, we considered a single 3D scene image representation as a composition of multi-plane images built from 2D multi-region images, exploiting objects with varying amounts of blurriness lying at varying depths, which enabled us to arrive at perception depth [19].
A. Decomposition of Multiple Image Regions and Computation of Inter-Depths
Typically, any given scene comprises multiple objects of the same or different colors at varying depths, with or without overlaps. These objects produce varying blurriness in a captured 2D image due to the DoF configuration. The first step is to decompose the given 2D image into various image regions corresponding to the objects with varying amounts of blurriness.
(a) MCA camera set up for varying defocused captured 2D image and computed 3D perception image as MPIs and transforming to 3D virtual object/ estimated 3D image. (b) 3D perception image as MPIs (c) 3D Virtual object or estimated 3D image.
Especially in this article, we consider the 2.1D sketch, since it segments the image regions while handling occlusion explicitly, deciding the partial ordering of regions by indicating their ordering. This method exploits depth cues from edges, curves, cusps, crack tips, and T-junctions.
Any given 2D image is described with domain \begin{equation*} \mathcal {D} = \bigcup _{l=0}^{ \left ({ L - 1 }\right ) } \mathcal {R}_{l}', {~\text {where }} \mathcal {R}_{l}' = \bigcup _{ l < k, \mathcal {R}_{l} < \mathcal {R}_{k} } \mathcal {R}_{l} \tag{1}\end{equation*}
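In implementation terms, (1) requires the ordered regions $\mathcal{R}_{l}'$ to tile the image domain exactly once. A small validity check with boolean masks, assuming numpy (the mask shapes below are hypothetical):

```python
import numpy as np

def is_valid_decomposition(masks):
    """Check that the boolean region masks R'_l of (1) cover the
    domain exactly once: union is the whole domain, pairwise disjoint."""
    total = np.zeros(masks[0].shape, dtype=int)
    for m in masks:
        total += m.astype(int)
    return bool(np.all(total == 1))   # 0 anywhere => gap, >1 => overlap

# Hypothetical example: two complementary half-image regions
m1 = np.zeros((4, 4), dtype=bool); m1[:2] = True
m2 = ~m1
```

Counting coverage per pixel detects both gaps and overlaps in a single pass, mirroring the union and disjointness conditions of (1).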
B. Revisiting the Perception 3D Image From MPI
We recall from our earlier research work that the all-in-focus image, as a 3D perception image, is represented with inter-depths \begin{equation*} \Delta d_{0l} = \frac {\delta _{l}\times d_{ls}} {\left ({ d_{a} + \delta _{l} }\right ) }, \quad l=1,2,3 \cdots , \left ({ L - 1 }\right ) \tag{2}\end{equation*}
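Relation (2) maps a measured boundary disparity $\delta_{l}$ to a perception inter-depth. A direct transcription, with $d_{ls}$ the lens-to-sensor distance and $d_{a}$ the aperture diameter (the numeric values below are hypothetical, not from the experimental setup):

```python
def perception_inter_depth(delta_l, d_ls, d_a):
    """Eq. (2): inter-depth of the l-th MPI from its disparity delta_l,
    lens-to-sensor distance d_ls, and aperture diameter d_a (metres)."""
    return delta_l * d_ls / (d_a + delta_l)

# Hypothetical values: delta_l = 0.5 mm, d_ls = 55 mm, d_a = 20 mm
example = perception_inter_depth(5e-4, 0.055, 0.02)
```

Note that the inter-depth grows monotonically with the disparity, as expected from (2).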
In a natural scenario, the partial or full occlusions between the objects in a 3D scene may not be well ordered; these occlusions are caused by objects at different heights and at locations other than the focus distance. Further, as per geometric optics, a shallow DoF results in shift-varying blurred regions across the captured single image. In our previous research, we used the 2.1D sketch since it enables occlusion-aware image segmentation for decomposing the single image into MRIs, allowing us to account for the partial occlusion of farther objects by nearer ones [1]. Briefly, the salient steps of the perception 3D image generation algorithm are enumerated below:
Identify various image regions using salient boundaries based on the change in disparities and obtain inter-depths between the regions using (2).
Order the MRIs from foreground to background layers using depth cues as per 2.1D sketch.
Generate the 3D image as a composition of $L$ fronto-parallel MPIs forming part of a frustum referenced to the image sensor plane, as shown in Fig. 2(b) (dotted box marked MPI frustum), with the $l^{th}$ MPI, $ l=0,1, \cdots , \left ({ L -1 }\right )$ , placed at distance $\Delta d_{0l}$ and its smooth surface given by $ g \left ({ i, j }\right ) \in \mathcal {R}_{l} +\Delta d_{l}$ .
The 3D perception image obtained above can be expressed as:\begin{align*} g \left ({ i, j }\right ) \in \mathcal {D}=&\bigoplus _{l=0}^{ \left ({ L - 1 }\right )} \left ({ g \left ({i, j }\right ) \in \mathcal R_{l}' }\right ) + \left ({ \Delta d_{0l} }\right ) \\ \mathcal {D}=&\bigcup \limits _{l=0}^{ \left ({ L -1 }\right )} R_{l}' \;\; \& \bigcap \limits _{l=0}^{\left ({ L -1 }\right )} R_{l}' = \emptyset \tag{3}\end{align*}
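Operationally, (3) pairs each ordered region with its texture from the acquired image and its inter-depth. A sketch representing the result as a list of layers, assuming numpy boolean masks (the arrays below are hypothetical placeholders):

```python
import numpy as np

def build_perception_mpi(image, masks, inter_depths):
    """Eq. (3): pair each ordered region R'_l (boolean mask) with its
    texture taken from the acquired image and its inter-depth."""
    layers = []
    for mask, d in zip(masks, inter_depths):
        texture = np.where(mask[..., None], image, 0.0)  # zero outside R'_l
        layers.append({'mask': mask, 'texture': texture, 'depth': d})
    return layers

# Hypothetical example: a uniform image split into two ordered regions
img = np.ones((4, 4, 3))
m1 = np.zeros((4, 4), dtype=bool); m1[:2] = True
layers = build_perception_mpi(img, [m1, ~m1], [0.0, 0.1])
```

Each layer keeps only the pixels of its own region, so re-compositing the stack recovers the full domain, consistent with the union/disjointness conditions in (3).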
C. Limitation in the 3D Perception Image
The inter-image depths obtained above from the boundaries of the decomposed multiple image regions, and the 3D perception image synthesized from them, may not assure accurate depth, which is a vital requirement for an all-in-focus 3D image or virtual 3D object. Essentially, a virtual 3D object is a 3D model of the realistic scene object. Indeed, 3D image generation with accurate inter-image-region depth is crucial for applications such as biomedical imaging, industrial robotics, and autonomous vehicle maneuvering. It is easy to infer that the perception depth is not scale-invariant; for example, robotic navigation or AR/VR applications need a 3D scene that is both photo-realistic and deformable. We address this in the next section by relating the perception depth to the actual depth.
Improved Depth 3D Image or Virtual 3D Object Using Geometric Optics
Although the 3D perception image of the previous section uses the blur caused by the DoF phenomenon, sharpness in image space is not a specific point but an acceptable area, caused by the CoC, with no noticeable distortion. This implies that a relationship must exist between object space and image space for any camera, in terms of specific parameters, owing to the following characteristics:
For a given FoV, the distant DoF zone (far DoF) maps to the nearest DoFo zone (near DoFo) of the camera, and vice versa; the other points in the DoF zone, from far to near, map in reverse order in the DoFo zone. These DoF and DoFo zones are subsets of object and image space, respectively. Within them, there is negligible degradation or blurriness in the image if the image is well focused.
For a specific FoV of the camera and a 3D object placed in the DoF space, the irradiant rays originating from points on the various virtual planes in the DoF zone form the least CoC, i.e., the most focused image points, at specific points in DoFo space if the sensor plane is positioned at those locations along the optical axis; at any location before or beyond, they form unfocused image points with a larger CoC (refer to Fig. 3(a) and 3(b)).
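The CoC behaviour just described follows from the Gaussian lens equation. The following numeric sketch, with hypothetical parameters, shows the blur circle vanishing at the conjugate sensor position and growing with defocus:

```python
def image_distance(f, d_o):
    """Gaussian lens equation 1/d_o + 1/d_i = 1/f, solved for d_i."""
    return 1.0 / (1.0 / f - 1.0 / d_o)

def coc_diameter(f, aperture, d_o, d_sensor):
    """Blur-circle diameter on a sensor at d_sensor for a point at
    object distance d_o, by similar triangles through the aperture."""
    d_i = image_distance(f, d_o)
    return aperture * abs(d_sensor - d_i) / d_i

# Hypothetical setup: f = 50 mm lens, 28 mm aperture, sensor placed at
# the conjugate of an object 1 m away.
sensor = image_distance(0.05, 1.0)
```

With the sensor at the conjugate distance the CoC is zero; points farther from the focused plane produce progressively larger CoCs, as in Fig. 3(a) and 3(b).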
(a) Thin lens model for near focus planes (b) Thin lens model for far focus planes, (c) Thin lens model for lateral magnification.
Briefly, the far-field and near-field 3D scene points in object space form near-focus and far-focus image points in image space, as shown in Fig. 3(a) and 3(b), respectively. The actual positions of the above spaces depend on four parameters, namely (1) focal length,
Given these pre-configured parameters, for a shallow DoF range, a relationship is deduced between the inter-depths of objects in the 3D scene and the computed inter-depths of the corresponding MPIs, for non-transparent, non-specular 3D objects held stationary while imaging with the optical setup shown in Fig. 3(b). These configurations of the imaging optics yield the following characteristics when object space and image space are considered together.
There exists a specific point on the DoFo zone in image space along the optical axis where the image sensor could be positioned to get a focused image points corresponding to a specific object surface belonging to the 3D scene in the DoF zone of the object space of the camera.
In any conventional camera, the 2D sensor is fixed at a distance $d_{ls}$ from the lens to capture the in-focus image region for a specific shallow surface of the object in the 3D scene, along with many out-of-focus image regions whose CoC sizes vary in proportion to the positions, along the optical axis in the DoF zone, of the remaining objects relative to the focused surface. Though the widths of the DoF and DoFo spaces are not equal, and are distributed unequally, for the above specific camera setup it is possible to relate the far and near field points in DoF space to the corresponding near and far focus points in DoFo space.
A. All-in-Focus Image as Multi-Plane Images
In Fig. 3(a) and (b), we have shown only
B. Multi-Plane Objects as 3D Scene
In this subsection, we note that any 3D scene comprising many objects to be imaged can be represented as slices of multi-plane objects, i.e., a layered DoF, as described by David C. Schedl and Michael Wimmer [20], [33]. This is along similar lines to the 3D image as multi-plane images discussed in the previous subsection IV(B). It can be viewed in two ways: multi-plane object layers as part of a frustum in object space, intended to yield an in-focus image region in DoFo space, with planes arranged towards the far-focus point of the DoF zone; or multi-plane object layers as part of a frustum referenced to the object plane, intended to yield an in-focus image in DoFo, with planes arranged towards the near-focus point of the DoF region. In both cases, the 3D scene is viewed as
C. Relationship Between Inter-Image Planes Depth in DoFo and Inter-Objects Depth in DoF
By the geometric optics principle of any camera system, for a specific object position in the FoV and given camera configurations, the back and front DoF regions in object space form the corresponding front and back DoFo regions in image space, respectively. To arrive at
As per [32], 3D image is generated using fusion of focused multiple layer image planes at points
In order to compute a 3D image with more realistic depth, we consider the object, image, and camera as shown in Fig. 3(b). For simplicity, we position the given 3D scene so that, for its top portion, similar triangles give \begin{equation*} \frac {D_{lo} + \Delta D_{0l}} { M d_{a}} = \frac {\Delta D_{0l}} {\delta _{l}}, \quad {~\text {where }} l=1,2,\cdots , \left ({ L - 1 }\right ). \tag{4}\end{equation*}
\begin{equation*} D_{lo} = \frac { \Delta D_{0l} \left ({ M d_{a} - \delta _{l} }\right )} {\delta _{l}} \tag{5}\end{equation*}
Depth estimation using geometric optics relies on the Gaussian lens equation:\begin{equation*} \frac {1}{\left ({ D_{lo}+ \Delta D_{0l} }\right )} +\frac {1}{d_{ls}-\Delta d_{0l} } = \frac {1}{f}, \tag{6}\end{equation*}
On substituting $\Delta d_{0l}$ from (2) into (6), this reduces to \begin{equation*} \frac {1}{\left ({ D_{lo} + \Delta D_{0l} }\right ) } =\frac { d_{a} \left ({ d_{ls} {-} f }\right ) {-} f \delta _{l}} { d_{a} f d_{ls}}. \tag{7}\end{equation*}
Further, we would like to mention that in J. N. P. Martel et al. [34], real-time depth estimation is carried out using a focal-plane processor array with a tunable-focus lens. That method is not directly applicable to a conventional camera because it involves changing both the image sensor and the lens; moreover, tunable lenses are unstable in performance. Our intention here is to estimate depth with a rigid lens and fixed sensor, using a simple optical attachment to the camera and a computational approach based on disparities. The magnification is \begin{equation*} M= \frac {d_{ls} {-} f } {f},\;\;\; {~\text {where }} f {~\text {represents the focal length. }}\quad \tag{8}\end{equation*}
Using (8), (7) can be re-written as \begin{equation*} \Delta D_{0l}= \frac { \left ({ d_{ls} d_{a} }\right )}{\left ({ M d_{a} - \delta _{l} }\right )} - D_{lo}, \tag{9}\end{equation*}
On substituting (5) into (9), we obtain \begin{equation*} \Delta D_{0l} = \frac {\left ({ d_{ls} \delta _{l} }\right )} { M\left ({ M d_{a} - \delta _{l}}\right ) } \tag{10}\end{equation*}
Re-writing (2) for $\delta _{l}$ gives \begin{equation*} \delta _{l} = \frac { d_{a} \Delta d_{0l} } { \left ({ d_{ls} - \Delta d_{0l} }\right ) } \;\;\; l=1, 2,3 \cdots , \left ({ L - 1 }\right ) \tag{11}\end{equation*}
Substituting (11) into (10) yields \begin{equation*} \Delta D_{0l} = \frac { \Delta d_{0l}} { M \left ({ M - \frac {\left ({ M + 1 }\right )}{d_{ls}} \Delta d_{0l}}\right ) }. \tag{12}\end{equation*}
Alternatively, (12) can be expressed in the form below:\begin{equation*} \Delta D_{0l} = \frac { \Delta d_{0l}} { M^{2} \left ({ 1 - \frac {\left ({ M + 1 }\right )}{M}\frac {\Delta d_{0l}}{d_{ls}} }\right ) }. \tag{13}\end{equation*}
Along similar lines, for the far-focus configuration we can obtain \begin{equation*} \Delta D_{0l} = \frac { \Delta d_{0l}} { M^{2} \left ({ 1 + \frac {\left ({ M + 1 }\right )}{M}\frac {\Delta d_{0l}}{d_{ls}} }\right ) }. \tag{14}\end{equation*}
The interesting point of the above relationship is that the inter-depth of the virtual 3D object is expressed only in terms of image-space parameters and the magnification of the camera, the latter measured without direct object-space parameters. In Fig. 3(c), based on the geometric optics principle, we infer that the transverse and axial magnifications of the camera setup alter not only the size of the image but also the widths of the object layers and the corresponding image layers. We also note that equidistant points on the optical axis in object space are distributed unequally in image space along the optical axis. The relationship in (13) is used to obtain the true depth at various points on the 3D object surface from the corresponding points on the generated 3D image surface. Note that the relationship in (10) can also be used to identify the inter-depth directly from the disparity $\delta _{l}$.
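The chain (2), (8), (10), (11), (13) lets the true inter-depth be computed from image-side quantities alone. The sketch below cross-checks the disparity route (10) against the image-depth route (13); all parameter values are hypothetical, chosen only to satisfy $M d_{a} > \delta_{l}$:

```python
def magnification(d_ls, f):
    """Eq. (8): M = (d_ls - f) / f."""
    return (d_ls - f) / f

def true_inter_depth_13(delta_d, d_ls, f):
    """Eq. (13): object-space inter-depth from image-space depth
    delta_d = Δd_{0l}."""
    M = magnification(d_ls, f)
    return delta_d / (M * M * (1.0 - (M + 1.0) / M * delta_d / d_ls))

def true_inter_depth_10(delta_l, d_ls, d_a, f):
    """Eq. (10): the same quantity directly from the disparity δ_l."""
    M = magnification(d_ls, f)
    return d_ls * delta_l / (M * (M * d_a - delta_l))

# Hypothetical setup: f = 50 mm, d_ls = 55 mm (so M = 0.1),
# aperture d_a = 20 mm, disparity δ_l = 0.5 mm.
f, d_ls, d_a, delta_l = 0.05, 0.055, 0.02, 5e-4
delta_d = delta_l * d_ls / (d_a + delta_l)        # eq. (2)
```

Since (13) is derived from (10) by eliminating $\delta_{l}$ via (11), both routes must agree numerically; the valid range requires $M d_{a} > \delta_{l}$.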
D. 3D Virtual Object From Multiple Plane Images and Improved Inter-Depths
A 3D topographic surface of a non-transparent and non-specular 3D object visible within the FoV of a given camera along its optical axis is defined as a discrete luminance function represented in the form of a 2D array as below:\begin{equation*} G_{\left ({ i,j }\right )} = Z_{i,j} \hspace {0.3cm} \left ({i, j }\right ) \in \mathcal {D} \tag{15}\end{equation*}
E. Synthesis of Virtual 3D Object From 3D Perception Image
In this subsection, we utilize multiple planes for the object representation, similar to the MPIs shown in Fig. 2(a), in one-to-one correspondence with the layers of the MPI, except that each layer is displaced by its inter-depth $\Delta D_{0l}$:\begin{align*} G \left ({ i, j }\right ) =&\bigoplus \limits _{l=0}^ {\left ({ L - 1 }\right ) } \left [{ \left ({ g \left ({ i, j }\right ) \in \mathcal R_{l}' }\right ) + \Delta D_{0l} }\right ], \quad \left ({ i, j }\right ) \in \mathcal {D} \\ \mathcal {D}=&\bigcup \limits _{l=0}^{\left ({ L -1 }\right )} \mathcal R_{l}' \quad \& \quad \bigcap \limits _{l=0}^{\left ({ L -1 }\right )} \mathcal R_{l}' = \emptyset \tag{16}\end{align*}
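A minimal sketch of this layered composition, assuming hypothetical region masks and inter-depths (the partition of $\mathcal{D}$ into disjoint regions $\mathcal{R}_l'$ mirrors (16)):

```python
import numpy as np

# Hypothetical 4x4 domain D partitioned into L = 2 disjoint regions R'_l.
H, W, L = 4, 4, 2
masks = [np.zeros((H, W), dtype=bool) for _ in range(L)]
masks[0][:, :2] = True            # R'_0: left half of the domain
masks[1][:, 2:] = True            # R'_1: right half of the domain
delta_D = [0.0, 3.0]              # illustrative inter-depths Delta D_0l (cm)

# Compose the topographic surface G(i, j): each pixel takes the depth
# offset of the layer whose region it belongs to, as in (16).
G = np.zeros((H, W))
for l in range(L):
    G[masks[l]] += delta_D[l]

# The masks must be pairwise disjoint and must jointly cover D.
assert not np.any(masks[0] & masks[1])
assert np.all(masks[0] | masks[1])
```

The disjointness and coverage assertions correspond directly to the empty-intersection and union constraints on the regions in (16).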
Discussion on Experimental Evaluations for 3D Imaging System
In this section, we discuss simulated experimental results to validate the quantified 3D image generation formulated in the previous sections.
A. MCA Camera Setup for Image Acquisition
The experimental setup is formed using the scheme shown in Fig. 3(a) and 3(b) with the specifications as the diameter of the aperture being
B. Data Set Creations for Evaluations
To demonstrate and evaluate the practical performance of the proposed approach, we need to create a relevant image data set, because there seems to be no standard color filter aperture image database. We have created the following sample MCA 2D RGB images with various arrangements and known inter-depths between the objects:
Scene consisting of non-overlapping similar-color objects arranged one behind the other, with the focal distance set on one of the intermediate objects,
Scene similar to the above but with objects of different colors,
Scene in which one object overlaps another, with varying sizes and colors,
Scene with an object in a slanted position, overlapped at its middle portion and visible on both sides,
Scene with objects of different colors and sizes placed one behind the other such that their mid portions are occluded,
Scene with both overlapped and non-overlapped objects.
In most of the above scenes, the inter-object depth is maintained between 0.5 cm and 10.5 cm.
C. Experimental Evaluations
This subsection describes both qualitative and quantitative results on images of non-trivial scenarios captured using the MCA camera with the color filter arrangement shown in Fig. 1.
The qualitative evaluations of the depth maps for all the generated data sets are discussed, and the results are displayed in Fig. 4.
In col. (II) of all rows, we note that the 2.1D sketch yields all the regions in the scene image with comparatively good boundary accuracy irrespective of occlusions, which plays a vital role in arriving at a good 3D perception and an accurate 3D image.
The depth maps for the MCA images shown in col. (III) of Fig. 4 are represented in terms of gray levels, where a higher gray value for a region implies that it is nearer to the lens. The maps also have a very narrow range of gray levels, since the inter-object depth gap is minimal.
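The gray-level encoding described here (nearer regions brighter) can be sketched as below; the function name and the linear inverse scaling are our assumptions for illustration, not the paper's exact mapping:

```python
import numpy as np

def depth_to_gray(depth):
    """Map depths to 8-bit gray levels so that NEARER surfaces are BRIGHTER.

    Linear inverse scaling over the observed min/max is an illustrative
    choice; without such stretching, a small inter-object depth gap would
    occupy only a narrow band of gray levels.
    """
    d = np.asarray(depth, dtype=float)
    span = d.max() - d.min()
    if span == 0.0:
        return np.zeros_like(d, dtype=np.uint8)
    nearness = (d.max() - d) / span      # 1.0 at the nearest depth
    return np.round(nearness * 255.0).astype(np.uint8)
```

For example, `depth_to_gray([12.0, 14.0, 20.0])` maps the nearest depth (12.0) to gray level 255 and the farthest (20.0) to 0.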
Col. (IV) of all rows shows the scatter map. Rows (b), (e), (f), and (g) indicate that the overlaps present in the images are detected very accurately, and this is utilized to arrive at the MPIs, resulting in the good 3D images shown in col. (V) of the respective rows.
From col. (IV), we see that the scatter map shows that the proposed algorithm resolves all regions very well.
Col. (VI) displays the running time in seconds to execute the algorithm for arriving at the 3D virtual objects corresponding to the acquired 2D MCA image. The quoted runtimes apply to a MATLAB-based implementation running on a specific hardware and software platform, an Intel® Core i5-4460 CPU @ 3.20 GHz with Ubuntu 16.04 LTS and 8.00 GB of memory, without any effort toward the many possible optimizations. There is considerable scope for improving the execution time by optimizing the algorithm, not only with respect to the hardware and software of the intended application scenario but also by eliminating redundant computations and memory requirements.
Illustrative scene images of salient scenarios, with overlaps, inclined positioning with respect to the optical axis of the MCA camera, and various combinations, are displayed as rows, with their intermediate and final-stage results as columns. Col. (I): MCA images, col. (II): 2.1D sketch, col. (III): depth map, col. (IV): scatter map, col. (V): accurate inter-depth 3D image plot, col. (VI): runtime in seconds. Row (a): MCA image of pots placed one behind the other with a gap of 3 cm and no overlap; row (b): MCA image of non-overlapped blue and green blocks with a gap of 3.5 cm, and an orange block overlapped by the blue block at one end and the green block at the other end at a gap of 4 cm; row (c): six color blocks one behind the other at a gap of 3 cm; row (d): MCA image of four similar-color bottles kept one behind the other with gaps of 7 cm, 9 cm, and 8 cm; row (e): MCA image of a wooden spatula overlapped by a red tomato, an orange carrot, and a green tomato, placed at gaps of 0.5 cm, 4 cm, 4 cm, and 6.2 cm; row (f): MCA images of three cups of different colors and sizes overlapping each other, placed one behind the other at gaps of 6.5 cm and 10.5 cm; row (g): MCA image of a knife in a slanted position along the optical axis, with one end 3 cm in front and the other end 10 cm behind, overlapping an orange carrot.
We present the detailed performance analysis as below:
On referring to row (c), row (e), and row (f), we see that the error in estimating the respective inter-region depths is significantly small.
From row (a), row (c), and row (d), we notice that the error in the estimated inter-depth is smaller when the actual gap between the objects is smaller.
The estimated inter-depth value seems to be less erroneous when the object is near the focused object (refer to rows (a), (b), (d), (e), and (f)).
From Table 1, we note that the estimated inter-region depth in the perception 3D image is not close to the ground-truth inter-object depth. On the other hand, the inter-image depth in the 3D image computed from the newly derived relationship yields better results.
From the results shown in row (a), we could infer that the error increases as the actual gap between the objects increases.
As per row (e), the error is small when the gap between the objects is small; on the other hand, the error is comparatively larger for the object that lies on the near-focus side, even though the object gap is the same. Similar results are exhibited in rows (d) through (g).
On observing all rows (a) through (g), we infer that errors are minimal when the object depth is nearer to the focal distance.
The performance accuracy is similar in both rows (c) and (e).
From the results depicted in row (b), we infer that the estimated depth is the same for objects A and B, with a smaller estimation error than for object C, which is behind object D. We also note that the slant orientation is exhibited as expected.
From row (g), the estimated inter-depth error between B and C is comparatively smaller than that between A and B, since the former is estimated under near-focus conditions. A similar characteristic is exhibited in row (c).
From Table 2, we observe that errors are less for near focus compared to far focus depth for all figures in rows (a) through (g).
We notice in Table 2 that, for rows (b), (c), and (d), the errors near the focus region are smaller than the errors away from the focus region.
Especially in rows (a), (c), (d), and (e), where the camera focus is on one of the intermediate objects in the scene instead of the object nearest to the lens, the inter-depth estimation errors are smaller for objects behind the focused object than for objects in front of it. The same trend holds in rows (b), (f), and (g), where the focal distance is nearer to the lens.
In general, the accuracy of the inter-object depth results is better when one object is focused and the remaining objects are placed between the focal distance and the far endpoint of the DoF zone in object space.
Table 2 shows that the relative depth measured for a near-focus object has an error of 10%, and the percentage error increases gradually up to 57% as objects are placed laterally away from the focused object.
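The percentage figures above presumably follow the standard relative-error definition; a minimal sketch (the function name is ours, and the values in the usage note are illustrative, not taken from Table 2):

```python
def inter_depth_percent_error(estimated, ground_truth):
    """Relative error (%) of an estimated inter-object depth
    against its ground-truth value."""
    return abs(estimated - ground_truth) / abs(ground_truth) * 100.0
```

For example, an estimate of 9.0 cm against a 10.0 cm ground-truth gap gives a 10% error.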
Conclusion and Future Scope
We have presented a novel method of synthesizing a more accurate geometric-depth 3D image from the 3D perception depth image using the newly formulated inter-depth relationship. We have derived a specific non-linear dependency between the inter-depths of two objects lying in a 3D scene and the corresponding inter-depths of two MPIs in image-space parameters under shallow-DoF-zone constraints. On this basis, the more accurate geometric-depth 3D image synthesis from a single image is arrived at in three steps: (i) generation of the 3D perception image from MPIs using its inter-depths computed from inter-image region-boundary disparities, (ii) decomposition of the single 2D image into one in-focus region and many out-of-focus regions caused by varying CoCs, using the 2.1D sketch as a semantic image segmentation, and (iii) composition of the 3D image by re-aligning the image surface and the corresponding MPIs with their accurate respective inter-depths.
The partial ordering of image regions using the 2.1D sketch enables us to determine the occlusions among the multiple objects in the scene better than alpha-matting. Experiments show that the proposed method gives better results not only when the scene comprises multiple objects lying at different depths with dissimilar colors but also when the objects share the same color. Further, the proposed method has been demonstrated to resolve inter-depths of the order of a few millimeters (gaps of 5 mm to 105 mm between the objects in the 3D scene), making it well suited for real-time applications.
A few future research directions are listed below:
Improving the accuracy of depth estimation for the objects lying on the off-axis with respect to the camera’s optical axis.
Comparative studies of the suggested approach against plenoptic imaging would be worthwhile, to understand the importance of optical lens arrays.
The extension of the above-discussed methods for microscopic 3D image analysis could be interesting for many machine inspection and bio-imaging applications.
Exploration of deep neural networks for robust and more precise 3D image generation from a single image, to improve upon the approach suggested in this article.
Exploration of the computation of 3D virtual and 3D perception images using the front and back DoF zones is a worthwhile exercise to assess the accuracy of the inter-depths.