
A Non-Rigid Three-Dimensional Image Reconstruction Algorithm Based on Deformable Shape Reliability




Abstract:

Most reconstruction algorithms for non-rigid three-dimensional (3D) images assume that non-rigidity can be represented as a linear combination of a fixed number of rigid bases. However, this assumption struggles to establish reliable shape functions and initial values for nonlinear and non-rigid motions, decreasing reconstruction reliability. This paper introduces an enhanced-reliability reconstruction algorithm for non-rigid 3D images. Our algorithm models the dynamic non-rigid shape basis as a low-rank matrix composed of image points and depth factors, improving the restoration of non-rigid shape base changes and providing accurate parameters for constructing objective functions. By leveraging manifold alignment and physical continuity constraints, our method optimizes the function structures. Assuming minimal reconstruction error and shape change, we solve for the motion structure parameters and select the key initial shape basis value by minimizing the objective function with the L-M nonlinear optimization method. Our experimental results on 3D image sequence reconstructions demonstrate significant error reduction, underscoring our model’s credibility, robust reliability, and minimal re-projection error.
Society Section: IEEE Reliability Society Section
Calculate the low rank matrix of dynamic shape basis, use continuity as a constraint, design an objective function to suppress the amplitude of shape changes, and improve...
Published in: IEEE Access ( Volume: 12)
Page(s): 76995 - 77008
Date of Publication: 14 May 2024
Electronic ISSN: 2169-3536


SECTION I.

Introduction

The analysis and understanding of visual information by computers facilitate a more accurate simulation of the real world, giving rise to the field of computer vision [1]. This discipline primarily focuses on acquiring various real-world details through the capture of 2D images, encompassing shape and motion recognition of 3D scenes [2]. A prominent research area within this field is the recovery of motion scenes and their corresponding parameters from continuous image sequences, commonly referred to as 3D motion vision. While significant advances have been made in the reconstruction of rigid bodies, exploring non-rigid body reconstructions continues to present considerable challenges [3]. Non-rigid 3D image reconstruction seeks to recover a 3D model of non-rigid objects from 2D images captured from multiple viewpoints, employing image processing and computer vision techniques [4]. This technology has wide-ranging applications in movie production, game development, and industrial design [5], [6]. However, the complexity and diversity of non-rigid motion render non-rigid 3D image reconstruction a challenging task, with ensuring the reliability of shape bases computation standing out as a particular concern.

In the realm of 3D image reconstruction, Greffier et al. employed deep learning algorithms to reconstruct original images using methods such as Filtered Back Projection (FBP), enhanced AIDR 3D (AIDR 3De), and AiCE at three levels (mild, standard, and strong) [7]. This approach, which relies on linear assumptions for reconstruction, falls short in adequately addressing the impacts of nonlinear deformations, thus proving ineffective in establishing stable and reliable shape objective functions for nonlinear, non-rigid motion. On the other hand, Terris et al. suggested training deep neural networks (DNNs) as denoisers to learn a priori image models, replacing hand-crafted proximal regularization operators in optimization algorithms. Their AIRI framework, aimed at imaging complex intensity structures from visibility data, merges the robustness and interpretability of optimization with the efficiency and learning capability of neural networks [8]. Nevertheless, this method encounters difficulties in practical applications, particularly in selecting suitable initial values for nonlinear non-rigid motion, resulting in low reliability of reconstruction results.

Lin et al. approached the challenge by converting optimal spatial deformation into a nonlinear regularized variational optimization problem, incorporating local smoothing and input constraints. They leveraged data parallelism and flash memory optimization strategies for online tracking and reconstruction of non-rigid scenes [9]. Despite these efforts, the method struggled to effectively manage non-linear non-rigid motion during the research phase, impacting the establishment of reliable shape objective functions and leading to suboptimal reconstruction outcomes. Murase’s study utilized a helical digital body phantom to generate degraded projection data, employing system function graphs and Gaussian noise. The entire system matrix (SM) was calculated by linking each projection data set with slices for 3D image reconstruction [10]. However, this approach did not adequately capture the characteristics and variations of non-rigid motion, resulting in unreliable initial values for non-linear non-rigid motion and diminished reconstruction reliability.

In Jo’s research, attention modules were introduced, and simple algebraic pre-smoothing techniques like Gaussian filtering were applied to data. This pre-smoothed data served to derive an operator for image reconstruction through dynamic mode decomposition [11]. Although this method facilitated camera calibration with rigid parts and subsequent application to non-rigid reconstruction, it fell short in establishing stable and reliable shape objective functions and initial values for all non-rigid motions, posing challenges for generalization in practical applications.

SECTION II.

Design of Non-Rigid 3D Image Reconstruction Algorithm

A. The Calculation of Low-Rank Matrices for Dynamic Shape Bases in Non-Rigid 3D Image Reconstruction

Non-rigid body shapes undergo specific transformations at different time points. To effectively complete non-rigid 3D image reconstruction, it is essential to accurately describe these dynamic shape changes. In this context, we utilize a low-rank matrix, combined with image points and depth factors, to represent the data structure of basis parameters in dynamic shape for non-rigid 3D image reconstruction. This approach enables the capture of both local and global features of the shape, thereby better restoring the original non-rigid shape basis’s change process and providing precise manifold parameter values for the structure of the objective function.

The process for calculating the low-rank matrix for the dynamic shape basis in non-rigid 3D image reconstruction involves several steps:

Digitalization of Non-Rigid 3D Images: Utilize the coordinates and depth factors of all image points to digitize the non-rigid 3D image.

Construction of the Point Cloud Matrix: Based on the 3D image, construct a point cloud matrix of the shape for a specific frame.

Singular Value Decomposition (SVD): Use singular value decomposition to arrange the singular values of the constructed matrix in descending order of importance. Then, retain the first k singular values to obtain the low-rank matrix describing the dynamic shape basis in non-rigid 3D image reconstruction (see the sketch after this list).
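As a concrete illustration of the SVD step, the following minimal Python sketch keeps the k largest singular values of a per-frame point-cloud matrix; the function name and shapes are illustrative, not from the paper.

```python
import numpy as np

def low_rank_shape_basis(Y, k):
    # Truncated SVD: keep the k largest singular values of the per-frame
    # point-cloud matrix Y (3 x N) to obtain its rank-k approximation.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)   # s is sorted descending
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

Y = np.random.rand(3, 100)              # 100 3D points of one frame
Y_k = low_rank_shape_basis(Y, k=2)
print(np.linalg.matrix_rank(Y_k))       # -> 2
```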

It is assumed that a classic pinhole camera model describes the non-rigid 3D image acquisition process [12], [13], so the non-rigid 3D image can be written as \begin{equation*} \lambda \mathbf{U}=\mathbf{K}\left(\mathbf{R}\,\mathbf{t}\right)\mathbf{w}=\mathbf{P}\mathbf{w} \tag{1}\end{equation*} where \lambda denotes the depth factor, \mathbf{U}=\left(u \; v \; 1\right)^{\mathrm{T}} is the homogeneous coordinate vector of the image point, \mathbf{K}, \mathbf{R}, and \mathbf{t} denote the camera's intrinsic matrix and the rotation matrix and translation vector of the shooting position (\mathbf{R} and \mathbf{t} together form the camera's extrinsic matrix), and \mathbf{w}=\left(x \; y \; z \; 1\right)^{\mathrm{T}} and \mathbf{P}=\mathbf{K}\left(\mathbf{R}\,\mathbf{t}\right) denote the homogeneous coordinates of a 3D point and the camera projection matrix, respectively.

With F frames and N 3D spatial points in total, the equation for the f-th frame becomes \begin{equation*} \mathbf{U}_f \lambda_f=\mathbf{K}\left(\mathbf{R}_f \mathbf{t}_f\right) \mathbf{Y}_f \tag{2}\end{equation*} where \mathbf{U}_f=(u_{f,1},u_{f,2},\ldots,u_{f,N}) is the image matrix consisting of the points of the f-th image, \lambda_f is the diagonal matrix composed of all depth factors \lambda_{f,i}, and \mathbf{Y}_f=(w_{f,1},w_{f,2},\ldots,w_{f,N}) is a low-rank matrix composed of all 3D spatial points at time f.

Finally, singular value decomposition is performed on the shape point-cloud matrix of the constructed f-th frame image. The singular values are arranged in descending order, and the first k are retained. The low-rank matrix of the dynamic shape basis for non-rigid 3D image reconstruction is then \begin{equation*} \mathbf{Y}_f=\frac{\mathbf{I}_k u_{f_k}\lambda_{f_k}}{\mathbf{K}(\mathbf{R}_{f_k}\mathbf{t}_{f_k})} \tag{3}\end{equation*} where \mathbf{I}_k represents an orthogonal matrix.

This completes the calculation of the low-rank matrix of the dynamic shape basis for non-rigid 3D image reconstruction. When an object undergoes rigid motion, the 3D spatial point matrix \mathbf{Y}_f remains unchanged throughout the motion; otherwise, \mathbf{Y}_f changes at each time instant f, representing a dynamic process of shape-basis variation.

B. Design of A Strongly Reliable Objective Function for Non-Rigid Image Reconstruction

1) The Proposed Model

The enhancement of reliability in our current method fundamentally addresses a nonlinear optimization challenge for dynamic shape bases. This involves identifying the minimum value within a specific parameter structure. During the training of parameters for non-rigid motion structures, the chosen objective function aims to minimize the discrepancy between the 3D coordinates of the manifold and the transformed 2D coordinates of the manifold group. This is based on assumed motion structure parameters within the photography model, ensuring that the derived motion structure parameters align closely with the actual observed image points and their theoretically predicted 3D shapes. By reducing this discrepancy, we enhance both the quality and accuracy of the reconstruction results.

Thus, the developed objective function effectively synchronizes the motion structure parameters with the manifold, ensuring reliable shape reconstruction. In the context of non-rigid motion, the structure parameters across continuous frames exhibit minimal changes. Such physical continuity is leveraged as a constraint in deriving the motion structure parameter matrix. By incorporating physical continuity, the objective function is designed to dampen the amplitude of shape changes, thereby bolstering the reliability of the motion structure parameters. In essence, the objective function employs manifold alignment and physical continuity constraints to reconcile motion structure parameters with observed data, yielding accurate and reliable outcomes. This approach also stably navigates the challenges posed by imperfect data.

Therefore, building on the previously described calculation of the low-rank matrix for the dynamic shape basis in non-rigid 3D image reconstruction, this process delineates the data structure of parameters associated with this dynamic shape basis. Additionally, it facilitates the acquisition of manifold parameter values amid changes in the non-rigid shape basis. During the projection of the manifold motion group, the manifolds c,d,e,f and the centroid manifold C are used to describe the translation of the object. Let \boldsymbol{\Phi}_i denote the matrix formed by the 2D coordinates of these five points in the i-th frame, and \boldsymbol{\Phi}_i^{\prime} their initial 3D coordinates; then \begin{align*} \boldsymbol{\Phi}_i&=\begin{bmatrix} u'_{ic} & u'_{id} & u'_{ie} & u'_{if} & u'_{iC} \\ v'_{ic} & v'_{id} & v'_{ie} & v'_{if} & v'_{iC} \end{bmatrix} \tag{4}\\ \boldsymbol{\Phi}_i^{\prime}&=\begin{bmatrix} c_{ix} & d_{ix} & e_{ix} & f_{ix} & C_{ix} \\ c_{iy} & d_{iy} & e_{iy} & f_{iy} & C_{iy} \\ c_{iz} & d_{iz} & e_{iz} & f_{iz} & C_{iz} \end{bmatrix} \tag{5}\end{align*}

The transformation function resulting from the equation after camera transformation is \begin{equation*} \boldsymbol{\Phi}_i=\left[\mathrm{quater}(\mathbf{R})\right]\times\left[\mathrm{quater}(\mathbf{Q}_i)\times\boldsymbol{\Phi}_i^{\prime}+\mathbf{T}_i\right] \tag{6}\end{equation*}

In Equation (6), \mathrm{quater}(\mathbf{R}) represents the camera rotation matrix, \mathrm{quater}(\mathbf{Q}_i) represents the rotation of the i-th shape basis, and \mathbf{T}_i represents the translation vector of the i-th shape basis.
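A minimal sketch of the transformation in Equation (6) under assumptions: SciPy's (x, y, z, w) quaternion convention stands in for quater(·), and the projection from the resulting 3D points down to the 2D matrix of Equation (4) is omitted; the function name is hypothetical.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def transform_manifold(q_cam, q_shape, t_shape, Phi0):
    # Eq. (6): rotate the initial manifold points by the shape-basis rotation
    # quater(Q_i), translate by T_i, then rotate into the camera frame quater(R).
    R_cam = Rotation.from_quat(q_cam).as_matrix()
    R_shape = Rotation.from_quat(q_shape).as_matrix()
    return R_cam @ (R_shape @ Phi0 + t_shape[:, None])

Phi0 = np.random.rand(3, 5)             # manifolds c, d, e, f and centroid C
pts = transform_manifold([0, 0, 0, 1], [0, 0, 0, 1], np.zeros(3), Phi0)  # (3, 5)
```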

By subjecting each assumed manifold for each frame to the aforementioned camera transformation, we can obtain a 2D matrix \boldsymbol{\Phi} composed of the five manifolds in the F frames: \begin{align*} \boldsymbol{\Phi}=\begin{bmatrix} u'_{1c} & u'_{1d} & u'_{1e} & u'_{1f} & u'_{1C} \\ v'_{1c} & v'_{1d} & v'_{1e} & v'_{1f} & v'_{1C} \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ u'_{Fc} & u'_{Fd} & u'_{Fe} & u'_{Ff} & u'_{FC} \\ v'_{Fc} & v'_{Fd} & v'_{Fe} & v'_{Ff} & v'_{FC} \end{bmatrix} \tag{7}\end{align*}

Equation (7) provides the 2D manifold coordinates obtained by camera transformation of the assumed 3D manifold coordinates. Taking the difference between the matrix \boldsymbol{\Phi} and the actual measured 2D coordinates of the manifold group, as shown in Equation (8), yields the matrix \mathbf{DE}_1; the two-norm of all its elements gives \mathrm{de}_1: \begin{equation*} \mathrm{de}_1=\sqrt{\sum_{j=1}^{5}\sum_{i=1}^{2F}\left\|\mathbf{DE}_1\right\|^{2}}=\sqrt{\sum_{j=1}^{5}\sum_{i=1}^{2F}\left\|\boldsymbol{\Phi}_{ij}-\mathbf{M}_{ij}\right\|^{2}} \tag{8}\end{equation*}

During the optimization and training of the entire motion structure matrix, the displacement and velocity changes of feature points between consecutive frames are slow for a high-speed camera [14], [15]. This property is reflected in the rotation matrix Q and translation matrix T of non-rigid body motion: the variation in the rotation matrix reflects the change in rotation angle between consecutive frames according to the displacement difference formula \Delta S=S_i-S_{i-1} (where S_i represents the displacement of a point in the i-th frame), and the displacement difference in the translation matrix T represents the change in the non-rigid body's translational position during motion. Therefore, the parameters of the i-th frame should differ very little from those of the (i-1)-th frame. Here, these two matrices are combined into one matrix \mathbf{RS} of size F\times 6: \begin{align*} \mathbf{RS}_{F\times 6}=\begin{bmatrix} \theta_{1X} & \theta_{1Y} & \theta_{1Z} & X_1 & Y_1 & Z_1 \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ \theta_{iX} & \theta_{iY} & \theta_{iZ} & X_i & Y_i & Z_i \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ \theta_{FX} & \theta_{FY} & \theta_{FZ} & X_F & Y_F & Z_F \end{bmatrix} \tag{9}\end{align*}

Based on the above discussion, the differences between the i-th and (i-1)-th frames of \mathbf{RS}_{F\times 6} form a new matrix \mathbf{RS}^{\prime} of size (F-1)\times 6, given in Equation (10): \begin{align*} \mathbf{RS}^{\prime}_{(F-1)\times 6}=\begin{bmatrix} \theta_{2X}-\theta_{1X} & \theta_{2Y}-\theta_{1Y} & \theta_{2Z}-\theta_{1Z} & X_2-X_1 & Y_2-Y_1 & Z_2-Z_1 \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ \theta_{FX}-\theta_{(F-1)X} & \theta_{FY}-\theta_{(F-1)Y} & \theta_{FZ}-\theta_{(F-1)Z} & X_F-X_{F-1} & Y_F-Y_{F-1} & Z_F-Z_{F-1} \end{bmatrix} \tag{10}\end{align*} Each element of this matrix represents the variation of a parameter between consecutive frames of non-rigid body motion; such changes are minimal during the capture process of a camera.

Due to the continuity of non-rigid body motion, the motion structure parameters reflected in an image sequence obtained through high-speed capture [16], [17] vary very little between consecutive frames. This introduces the first constraint \mathrm{de}_2 added during the construction of the objective function, describing the displacement variation of the parameters between consecutive frames: \begin{equation*} \mathrm{de}_2=\sqrt{\sum_{j=1}^{6}\sum_{i=1}^{F-1}\left\|\mathbf{RS}^{\prime}_{ij}\right\|^{2}} \tag{11}\end{equation*}

Image sequences taken from the high-speed camera have a resolution of 640 \times 480 and a frame rate of at least 1000 frames per second, capturing rapidly moving objects and subtle changes and providing more accurate details. Assume the time interval between frames of the extracted image sequence is t, and that the capture interval of each frame is a small constant. The laws of physical motion give \begin{equation*} \bar{v}_i=\frac{\Delta s}{t}=\frac{S_i-S_{i-1}}{t} \quad\text{and}\quad \Delta v=\bar{v}_i-\bar{v}_{i-1} \tag{12}\end{equation*}

The ratio of the displacement difference of an object during this time interval to t is the average velocity of the motion. Similarly, for a high-speed image sequence, such changes in average velocity are also very small. Let the average velocity from the (i-1)-th to the i-th frame be \bar{v}_i; the average velocity between the i-th and (i+1)-th frames is then \bar{v}_{i+1}. Applying this physical law to the motion parameter matrix of the non-rigid body yields the second-difference matrix \mathbf{RS}^{\prime\prime} of size (F-2)\times 6. This gives a constraint \mathrm{de}_3 describing the velocity changes in non-rigid body motion: \begin{align*} \mathrm{de}_3&=\sqrt{\sum_{j=1}^{6}\sum_{i=1}^{F-2}\left\|\mathbf{RS}^{\prime\prime}_{ij}\right\|^{2}} \\ &=\sqrt{\sum_{j=1}^{6}\sum_{i=2}^{F-1}\left\|\mathbf{RS}^{\prime}_{ij}-\mathbf{RS}^{\prime}_{(i-1)j}\right\|^{2}} \tag{13}\end{align*}

After this discussion, we have the error \mathrm{de}_1 between the points of the transformed manifold motion group and the measured points as the main quantity to minimize, together with the two constraints \mathrm{de}_2 and \mathrm{de}_3 describing the displacement and velocity changes of the parameters during non-rigid body motion. In a set of high-speed captured image sequences, the variation of these two constraints between consecutive frames should be very small. This results in a strongly reliable objective function f(\mathbf{R},\mathbf{Q},\mathbf{T}) used to determine the motion structure parameter matrix: \begin{align*} f(\mathbf{R},\mathbf{Q},\mathbf{T})&=\sum_{i=1}^{3} w_i\,\mathrm{de}_i \\ &=w_1\sqrt{\sum_{j=1}^{5}\sum_{i=1}^{2F}\left\|\boldsymbol{\Phi}_{ij}-\mathbf{M}_{ij}\right\|^{2}} \\ &\quad+w_2\sqrt{\sum_{j=1}^{6}\sum_{i=1}^{F-1}\left\|\mathbf{RS}^{\prime}_{ij}\right\|^{2}} \\ &\quad+w_3\sqrt{\sum_{j=1}^{6}\sum_{i=2}^{F-1}\left\|\mathbf{RS}^{\prime}_{ij}-\mathbf{RS}^{\prime}_{(i-1)j}\right\|^{2}} \tag{14}\end{align*}

\mathop{\mathrm{Min}}_{\mathbf{Q},\mathbf{R},\mathbf{T}}\left\|f(\mathbf{Q},\mathbf{R},\mathbf{T})\right\| is the objective function used to train the motion structure parameter matrix, which consists of the three motion structure parameter matrices \mathbf{Q},\mathbf{R},\mathbf{T} describing the camera transformations: the rotation matrix of the shape basis, the camera rotation matrix, and the translation matrix of the shape basis, respectively; w_1,w_2,w_3 are the weight coefficients of the different constraint parts of the objective function. By imposing the constraints of small reconstruction error and minimal frame-to-frame change before and after shape deformation, the objective function is driven to its minimum, ensuring that the reconstruction results stay close to the actual observation data while suppressing unreasonable shape changes. Minimizing the objective function f(\mathbf{Q},\mathbf{R},\mathbf{T}) therefore improves the reliability of the reconstruction results. Next, the Levenberg-Marquardt (L-M) nonlinear optimization method is employed to determine these motion structure parameters [18] and to complete the selection of the key initial values of the shape basis.
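The following sketch shows how Equations (8)-(14) and the L-M step might be assembled with SciPy, under simplifying assumptions: a weak-perspective project() helper (hypothetical, not the paper's exact camera model) stands in for the transform of Equation (6), rotations are parameterized by Euler angles for brevity, and the weights are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(q_cam, RS, Phi0):
    # Stand-in for Eq. (6) under weak perspective: rotate the five manifold
    # points per frame, translate, rotate into the camera frame, and keep
    # the two image-plane rows (cf. Eq. (7)).
    R_cam = Rotation.from_quat(q_cam).as_matrix()
    rows = []
    for frame in RS:
        R_i = Rotation.from_euler('xyz', frame[:3]).as_matrix()
        P = R_cam @ (R_i @ Phi0 + frame[3:, None])
        rows.append(P[:2])
    return np.vstack(rows)                       # (2F, 5), like Eq. (7)

def residuals(x, M, Phi0, w):
    q_cam = x[:4] / np.linalg.norm(x[:4])        # camera rotation quaternion
    RS = x[4:].reshape(-1, 6)                    # per-frame angles and translations, Eq. (9)
    r1 = (project(q_cam, RS, Phi0) - M).ravel()  # de1, Eq. (8)
    r2 = np.diff(RS, axis=0).ravel()             # de2, Eqs. (10)-(11)
    r3 = np.diff(RS, n=2, axis=0).ravel()        # de3, Eq. (13)
    return np.concatenate([w[0] * r1, w[1] * r2, w[2] * r3])

F = 10
Phi0 = np.random.rand(3, 5)                      # assumed 3D manifold points c, d, e, f, C
M = np.random.rand(2 * F, 5)                     # measured 2D manifold coordinates
x0 = np.concatenate([[0.0, 0.0, 0.0, 1.0], np.zeros(6 * F)])
sol = least_squares(residuals, x0, args=(M, Phi0, (1.0, 0.1, 0.1)), method='lm')
```

Note that SciPy's L-M variant minimizes the sum of squared residuals, so the stacked, weighted residual vector above plays the role of the weighted sum of Equation (14).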

2) The Selection of Initial Values

The essence of solving the problem with nonlinear optimization is to determine the rotation matrix Q and the translation matrix T. In the L-M optimization method, these two parameters are treated as unknowns. We first introduce the initialization method for the rotation matrix Q.

In this paper, a method combining factorization techniques is used to initialize the rotation matrix Q [19]. Consider a known measurement matrix \mathbf{W}_{2F\times P} composed of the feature points \begin{bmatrix} u_{ij} \\ v_{ij} \end{bmatrix}, i=1,\ldots,F,\; j=1,\ldots,P, from each frame of images (where F is the number of image frames and P is the number of feature points in one frame); the objective is to determine the 3D structure \tilde{S}_i of size 3\times P and the rotation matrix \mathbf{R}_i of size 3\times 3 for each frame.

Assuming that the 3D shape of the non-rigid body is a weighted linear combination of shape bases, we have \tilde{S}_i=\sum_{l=1}^{K}\omega_{il}S_l, where \omega_{il} are the weight coefficients, S_l are the shape bases, and K is the number of shape bases. When K=1 and \omega_{il}=1, this reduces to the case of a rigid object; when K>1, it corresponds to a non-rigid object.
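For instance, the weighted combination \tilde{S}_i=\sum_l \omega_{il}S_l is a one-line tensor contraction; the shapes and names below are illustrative.

```python
import numpy as np

def nonrigid_shape(weights, bases):
    # S_i = sum_l omega_il * S_l: per-frame shape as a weighted sum of K bases.
    # weights: (K,) coefficients omega_il; bases: (K, 3, P) stack of bases S_l.
    return np.tensordot(weights, bases, axes=1)        # (3, P)

bases = np.random.rand(4, 3, 100)                      # K = 4 bases, P = 100 points
S_i = nonrigid_shape(np.array([0.4, 0.3, 0.2, 0.1]), bases)
# K = 1 with weight 1 recovers the rigid case.
```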

Under the weak perspective projection model, we have \begin{align*} \begin{bmatrix} u_{i1},\ldots,u_{iP} \\ v_{i1},\ldots,v_{iP} \end{bmatrix}=\overline{R}_i\left(\sum_{l=1}^{K}\omega_{il}S_l\right)+\overline{T}_i\mathbf{e}_n^{\mathrm{T}} \tag{15}\end{align*} where \overline{R}_i represents the first two rows of the rotation matrix \mathbf{R}_i, \overline{T}_i is the first two elements of the translation vector \mathbf{T}_i, and \mathbf{e}_n^{\mathrm{T}}=[1,\ldots,1]_{1\times n} represents the position vector of the object in the camera coordinate system.

Transforming Equation (15) with the 2D coordinate origin at the centroid gives \bar{\mathbf{W}}=\mathbf{M}_{2F\times 3K}\mathbf{B}_{3K\times P}, where \mathbf{M}_{2F\times 3K} represents the projection from the 3D coordinates of the shape basis to the 2D image plane, and \mathbf{B}_{3K\times P} represents the 3D shape coordinates obtained from the shape transformation of the P feature points using the shape basis matrix B.

The matrix \mathbf{M}_{2F\times 3K} contains information about the rotation matrix. To further decompose the motion matrix M and obtain the rotation matrices \overline{\mathbf{R}}_i and the weighting coefficients \omega_{il}, we perform singular value decomposition (SVD) [20], [21] on M after rearranging it into matrix blocks: \begin{equation*} \mathbf{M}=\left[\mathbf{M}_1^{\mathrm{T}},\mathbf{M}_2^{\mathrm{T}},\ldots,\mathbf{M}_F^{\mathrm{T}}\right]^{\mathrm{T}} \tag{16}\end{equation*}

Then, each matrix block is given by \begin{equation*} \mathbf{M}_i=\left[\omega_{i1}\overline{\mathbf{R}}_i \;\; \cdots \;\; \omega_{iK}\overline{\mathbf{R}}_i\right]_{2\times 3K} \tag{17}\end{equation*} where \overline{\mathbf{R}}_i=\begin{bmatrix} r_{i1} & r_{i2} & r_{i3} \\ r_{i4} & r_{i5} & r_{i6} \end{bmatrix}. Reordering the matrix blocks, we have \begin{align*} \widetilde{\mathbf{M}}_i=\begin{bmatrix} \omega_{i1} \\ \omega_{i2} \\ \vdots \\ \omega_{iK} \end{bmatrix}\left[r_{i1} \;\; r_{i2} \;\; \cdots \;\; r_{i6}\right]=\bar{\boldsymbol{\Omega}}_i\overline{\boldsymbol{\mathfrak{R}}}_i \tag{18}\end{align*}

Clearly, the rank of \widetilde{\mathbf{M}}_i is at most 1. Therefore, performing SVD on \widetilde{\mathbf{M}}_i decomposes it into the deformed rotation matrix \overline{\boldsymbol{\mathfrak{R}}}_i and the weighted coefficient matrix \bar{\boldsymbol{\Omega}}_i. Applying SVD to each of the F matrices \widetilde{\mathbf{M}}_i yields the deformed rotation matrix \overline{\boldsymbol{\mathfrak{R}}}_i and the weighted coefficient matrix \bar{\boldsymbol{\Omega}}_i for each frame.
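A sketch of this rank-1 recovery via NumPy's SVD: the leading singular triplet gives the two factors of Equation (18), up to the scale C resolved by Equation (19) below; the function name is hypothetical.

```python
import numpy as np

def factor_Mi(M_tilde):
    # Rank-1 factorization of the reordered block of Eq. (18): the leading
    # singular triplet gives the weighted coefficient column and the
    # deformed rotation row, up to a scale constant C.
    U, s, Vt = np.linalg.svd(M_tilde, full_matrices=False)
    return U[:, 0] * s[0], Vt[0]          # omega_i (K,), r_i (6,)

omega_true, r_true = np.array([0.5, 0.3, 0.2]), np.random.rand(6)
omega, r = factor_Mi(np.outer(omega_true, r_true))    # recovered up to scale
```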

The decomposition result of Equation (18) is still not unique, since \widetilde{\mathbf{M}}_i=\bar{\boldsymbol{\Omega}}_i C\frac{1}{C}\overline{\boldsymbol{\mathfrak{R}}}_i for any non-zero constant C. The constant C can be optimized by minimizing Equation (19): \begin{equation*} f(C)=\min\left\|\boldsymbol{\Omega}_i-\boldsymbol{\Omega}_{i-1}\right\|_F \tag{19}\end{equation*} We then obtain \boldsymbol{\mathfrak{R}}_i=\frac{1}{C}\overline{\boldsymbol{\mathfrak{R}}}_i and \boldsymbol{\Omega}_i=\bar{\boldsymbol{\Omega}}_iC.

Rearranging the row vector \boldsymbol{\mathfrak{R}}_i yields the 2\times 3 rotation matrix \overline{\mathbf{R}}_i=\begin{bmatrix} r_{i1} & r_{i2} & r_{i3} \\ r_{i4} & r_{i5} & r_{i6} \end{bmatrix} obtained from the singular value decomposition of \widetilde{\mathbf{M}}_i. To avoid singular-value problems, maintain the continuity of rotation, and reduce numerical errors, the quaternion method is employed to handle rotation matrices in this study [22]. The rotation matrix part of each frame is expressed using three parameters, \mathbf{R}_i=\left[a_i,b_i,c_i\right].

According to the quaternion method and the exponential mapping of the rotation matrix, a_i=r_{i6}, b_i=-r_{i3}, c_i=r_{i2} is used as the initial value of the rotation matrix \mathbf{Q}_i in the i-th frame [23]. Once \mathbf{Q}_i is obtained, it is substituted into the objective function. \mathbf{T}_i is then initialized by a random initialization method, and applying a nonlinear optimization method to Equation (15) yields a set of values for \mathbf{T}_i, denoted \mathbf{T}_i^{\prime}. Finally, \mathbf{T}_i^{\prime} and \mathbf{Q}_i serve as the initial values for the final estimation of the motion structure parameter matrix. This process significantly enhances the reliability of the subsequent reconstruction.

C. Euclidean Reconstruction

With the motion structure parameter matrices \mathbf{Q},\mathbf{R},\mathbf{T} known, the 2D feature points of the image sequence are optimized to obtain the 3D coordinates of each point. Let the known 2D feature points \begin{bmatrix} u_{ij} \\ v_{ij} \end{bmatrix}, i=1,\ldots,F,\; j=1,\ldots,P, form the measurement matrix \begin{align*} \mathbf{W}_{2F\times P}=\begin{bmatrix} u_{11} & \ldots & u_{1P} \\ v_{11} & \ldots & v_{1P} \\ \ldots & \ldots & \ldots \\ u_{F1} & \ldots & u_{FP} \\ v_{F1} & \ldots & v_{FP} \end{bmatrix}\end{align*} (where F is the number of frames and P is the number of feature points per image). The reconstruction of the non-rigid motion then proceeds as follows. For each of the P points over the F frames, the known 2D coordinates of the j-th point can be represented as \begin{equation*} \mathbf{W}_j=\left[u_{1j},v_{1j},\ldots,u_{Fj},v_{Fj}\right]^{\mathrm{T}} \tag{20}\end{equation*}

\mathbf{W}_j represents the j-th feature point in the image sequence, a known feature point obtained from the image sequence. The 3D reconstruction based on nonlinear optimization takes an assumed 3D feature point \mathbf{W}_j^{\prime}=\begin{bmatrix} X_{1j} & Y_{1j} & Z_{1j} \\ \ldots & \ldots & \ldots \\ X_{Fj} & Y_{Fj} & Z_{Fj} \end{bmatrix} as the parameters to be solved. Transforming it with the obtained parameter matrices yields the 2D coordinates \mathbf{T}_j=\left[u'_{1j},v'_{1j},\ldots,u'_{Fj},v'_{Fj}\right]^{\mathrm{T}} of the point.

The matrix \mathrm{de}_4 then gives the difference between \mathbf{W}_j and \mathbf{T}_j, which is the main part of the objective function. For the j-th feature point in the i-th frame, the camera transformation is \begin{align*} \mathrm{de}_4=\mathbf{T}_{ij}-\mathbf{W}_{ij}=\left[\mathrm{quater}(\mathbf{R})\right]\times\left[\mathrm{quater}(\mathbf{Q}_i)\mathbf{W}^{\prime}_{ij}+\mathbf{T}_i\right]-\mathbf{W}_{ij} \tag{21}\end{align*}

In Equation (21), the known parameter is the j-th feature point of the measurement matrix, and the unknown parameter of the objective function is the 3D coordinate point \mathbf{W}^{\prime}_{ij} to be determined. Owing to the continuity of the object's motion, the displacement change of the matrix \mathbf{DE}_4 between consecutive frames should be very small according to the displacement difference formula \Delta S=S_i-S_{i-1}. Based on this constraint, the strongly reliable objective function for the nonlinear optimization of the j-th feature point \mathbf{W}_j^{\prime} is obtained: \begin{equation*} \mathop{\mathrm{Min}}_{\mathbf{W}_j^{\prime}}\left\|f(\mathbf{W}_j^{\prime})\right\|=\sqrt{\sum_{i=3}^{2F}\left\|\mathbf{DE4}_{i1}-\mathbf{DE4}_{(i-2)1}\right\|^{2}} \tag{22}\end{equation*}

Equation (22) is the objective function for the 3D reconstruction of the j-th feature point. Using the L-M nonlinear optimization method, \mathbf{W}_j^{\prime} is calculated as the 3D coordinates of the j-th point, of size 3F. Applying the same 3D reconstruction to all feature points, the 3D coordinates of the P feature points are collected in \mathbf{TS}: \begin{align*} \mathbf{TS}_{3F\times P}=\begin{bmatrix} \mathbf{W}_1^{\mathrm{T}} \\ \mathbf{W}_2^{\mathrm{T}} \\ \ldots \\ \mathbf{W}_P^{\mathrm{T}} \end{bmatrix} \tag{23}\end{align*}

Equation (23) represents a matrix of size 3F\times P containing the result of the 3D reconstruction of all feature points, i.e., their 3D coordinates, which enhances the reliability of the reconstruction.
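A compact sketch of the per-point optimization of Equations (20)-(23) with SciPy's L-M solver. To keep this toy problem well-posed, the raw residual de_4 is retained alongside the continuity differences (Equation (22) keeps only the latter); the rotation and translation inputs are placeholders.

```python
import numpy as np
from scipy.optimize import least_squares

def de4(Wj_prime, R_cam, Q, T, Wj):
    # Eq. (21): transform the assumed 3D track of point j frame by frame
    # and subtract the measured 2D track.
    proj = np.stack([(R_cam @ (Q[i] @ Wj_prime[i] + T[i]))[:2]
                     for i in range(len(Q))])    # (F, 2)
    return proj - Wj

def objective(x, R_cam, Q, T, Wj):
    d = de4(x.reshape(-1, 3), R_cam, Q, T, Wj)
    # Raw residual plus the frame-to-frame continuity differences of Eq. (22).
    return np.concatenate([d.ravel(), np.diff(d, axis=0).ravel()])

F = 20
R_cam, Q, T = np.eye(3), [np.eye(3)] * F, np.zeros((F, 3))  # placeholder parameters
Wj = np.random.rand(F, 2)                        # measured 2D track of point j
sol = least_squares(objective, np.zeros(3 * F), args=(R_cam, Q, T, Wj), method='lm')
Wj_3d = sol.x.reshape(F, 3)                      # reconstructed 3D trajectory of point j
```

Running this once per feature point and stacking the results row-wise gives the matrix \mathbf{TS} of Equation (23).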

In summary, the proposed algorithm for non-rigid 3D image reconstruction addresses the significant limitations identified in existing algorithms, particularly their challenges in establishing stable shape objective functions and determining initial values for nonlinear, non-rigid body movements. The methodology of the algorithm unfolds in a series of meticulously designed steps. Initially, it employs image points and depth factors to construct a low-rank matrix that accurately describes the dynamic shape basis of non-rigid bodies. This matrix not only facilitates a better restoration of the transformations of the non-rigid body shape basis but also provides precise manifold parameters essential for the formulation of objective functions. Subsequently, the algorithm leverages manifold alignment and physical continuity constraints to optimize the construction of the objective function. This optimization step is crucial for aligning the motion structure parameters with the observed data and reducing the amplitude of shape changes, thereby ensuring the accuracy and reliability of the reconstruction results.

Further enhancing the robustness of the algorithm, constraints on reconstruction error and minimal frame-to-frame shape changes are integrated. These constraints ensure that the minimization of the objective function yields reconstruction results that closely align with actual observational data, while also preventing unrealistic alterations in shape, thereby improving the reliability of the reconstruction outcomes. Following this, the L-M (Levenberg-Marquardt) nonlinear optimization method is applied to solve for the motion structure parameters and to select the key initial values of the shape basis, based on the minimized objective function. The culmination of these steps is the Euclidean reconstruction, which, by using the solved motion structure parameters, obtains the 3D coordinates of each point, achieving a reliable reconstruction process. This comprehensive approach not only overcomes the deficiencies of prior models but also sets a new benchmark for accuracy and reliability in non-rigid 3D image reconstruction.

SECTION III.

Experimental Results

A. Experimental Parameters

The experimental data is the Stanford 40 Actions dataset from the Stanford Image Library, a large-scale image and video database created by the Stanford Vision Lab. The database contains multiple datasets covering different tasks in fields such as computer vision and machine learning.

The Stanford 40 Actions dataset contains approximately 9500 video clips, of which 6000 feature male subjects and 3500 female subjects. There are 2500, 5000, and 2500 video clips for the age groups of adolescents, young people, and elderly people, respectively, and the dataset covers 40 different categories of human actions. This dataset is therefore used to test the reconstruction performance of the proposed algorithm. To ensure the validity of the test, the experimental parameters are set as shown in Table 1.

TABLE 1 Parameter Settings

B. Impact of the Image Number on the Proposed Model

Taking dynamic facial images as an example to verify the impact of the number of images on the proposed algorithm [24], a set of changing camera intrinsic parameters f_i=800+i (i denoting the i-th dynamic image) is first generated, with scale factor 1.1, distortion factor 0.5, and principal point \left(u_0,v_0\right)=\left(320,240\right).

Then, 100 3D points within a unit sphere are randomly generated and divided into three rigid body elements: the first consists of the first 50% of the space points, the second of the middle 30%, and the third of the last 20%. Simultaneously, the external parameter matrix of the camera is varied to generate 640 \times 480 images, from 20 to 200 frames. Adding 1 pixel of Gaussian noise to each image 100 times for each image count, the average reprojection error e of Equation (24) is calculated; it measures the discrepancy between the original image points and the reconstructed 3D model after reprojection back onto the image plane, emphasizing the overall consistency between the reconstructed 3D model and the original image. \begin{align*} e=\frac{1}{mn}\sum_{j=1}^{n}\sum_{i=1}^{m}\frac{1}{\lambda_{i,j}}\left\|\mathbf{m}_{i,j}-\mathbf{P}_i\begin{pmatrix} X_{i,j} \\ 1 \end{pmatrix}\right\| \tag{24}\end{align*}

In Equation (24), e represents the reprojection error; i and j denote the i-th frame and the j-th space point; m and n represent the numbers of images and space points; \mathbf{m}_{i,j} and \mathbf{P}_i represent the j-th image point of the i-th frame and the projection matrix of the camera; X_{i,j} and \left\|\cdot\right\| represent the j-th 3D space point of the i-th frame and the 2-norm, respectively; and \lambda_{i,j} represents the depth factor of the j-th 3D spatial point in the i-th frame.
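Read literally, Equation (24) averages the depth-weighted norm of the difference between each homogeneous image point and its reprojection; a direct NumPy transcription follows, with array shapes assumed for illustration.

```python
import numpy as np

def reprojection_error(m, P, X, lam):
    # Eq. (24): depth-weighted average distance between the homogeneous
    # image points m_ij and the reprojections P_i (X_ij, 1)^T.
    Xh = np.concatenate([X, np.ones(X.shape[:2] + (1,))], axis=2)  # (F, N, 4)
    proj = np.einsum('fab,fnb->fna', P, Xh)                        # (F, N, 3)
    return np.mean(np.linalg.norm(m - proj, axis=2) / lam)

F, N = 5, 10
m = np.random.rand(F, N, 3)      # homogeneous image points
P = np.random.rand(F, 3, 4)      # per-frame projection matrices
X = np.random.rand(F, N, 3)      # reconstructed 3D points
lam = np.ones((F, N))            # depth factors
print(reprojection_error(m, P, X, lam))
```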

Therefore, using this indicator, the impact of the image number on the proposed model is analyzed to demonstrate the reconstruction effect. The experimental results are shown in Figure 1.

FIGURE 1. Variation of reprojection error with the number of images.

The physical meaning of Equation (24) is that it calculates the average error between the reprojected image points and the actual image points. A smaller reprojection error indicates a higher reconstruction accuracy, while a larger error implies lower reconstruction accuracy. Additionally, to investigate the impact of the number of space points on the algorithm, following the same procedure as above, the camera’s internal parameters were varied, while the number of space points ranged from 20 to 150. These space points were divided into three rigid body elements according to the aforementioned ratio. Using these 3D space points, 150 images were generated, and 1 pixel of Gaussian noise was added to each image. The algorithm was run 100 times for each number of space points, and the average reprojection error was calculated, as shown in Figure 2.

FIGURE 2. Variation of reprojection error with the number of spatial points.

From Figures 1 and 2, as the number of spatial points and images increases, the reprojection error of our algorithm first increases and then decreases. The reason is that when the numbers of spatial points and images are small, the numbers of equations and unknowns are close, so the solution is relatively unstable and more susceptible to noise, yielding a larger residual error. Conversely, with more spatial points and images, the equations outnumber the unknowns, the solving process is more overconstrained and stable, and the residual is smaller. Hence, as the number of spatial points and images increases, the residual first increases, then decreases, and eventually stabilizes. This indicates that the number of images affects the proposed model: the more images there are, the smaller the reprojection error, and the better the reconstruction effect of the proposed algorithm.

C. Analysis of the Impact of Depth Factor Values on Capturing Global Feature Quantity

In non-rigid 3D image reconstruction, the value of the depth factor determines the field of view of camera imaging. Based on the requirements of the high-speed camera used in this work, the range of depth factor values is set to [0.01, 0.04]. One image is randomly selected from the Stanford 40 Actions dataset and imaged at different depth factor values, and the number of global features that can be captured is counted; the more features captured, the better the subsequent reconstruction effect. The captured global feature counts are shown in Table 2.

TABLE 2 Results of Capturing Global Feature Quantity

According to the results presented in Table 2, as the depth factor increases, the number of global features that can be captured trends upward. This is because, as the depth factor rises, distant objects within the camera's imaging range become encompassed within the field of view, expanding the number of global features that can be identified and captured. Therefore, under the above parameter settings, the chosen depth factor values are effective and provide reliable support for subsequent 3D image reconstruction, improving the reconstruction effect.

D. Reconstruction Results

Selecting any two samples from the moving facial images and three frames from each (the 62nd, 98th, and 141st frames, and the 51st, 52nd, and 58th frames, respectively), we obtain Figure 3.

FIGURE 3. Moving facial image sequence.

The selected three frames were reconstructed using the algorithm proposed in this paper, and the results are shown in Figure 4.

FIGURE 4. Model reconstruction results of the proposed algorithm.

Additionally, to further illustrate the superior performance of the proposed algorithm in non-rigid 3D image reconstruction, five comparison algorithms were selected: the deep learning method of [7], the regularization-operator method of [8], the translation kernel factorization method of [9], the projection-based method of [10], and the constrained bilateral smoothing and dynamic mode decomposition method of [11]. These five comparison algorithms were used to reconstruct the selected images, and the results are shown in Figures 5–9.

FIGURE 5. Reconstruction model based on deep learning algorithm.

FIGURE 6. Reconstruction model based on regularization operator.

FIGURE 7. Reconstruction based on translation kernel factorization.

FIGURE 8. Projection based reconstruction model.

FIGURE 9. Reconstruction model based on constrained bilateral smoothing and dynamic mode decomposition.

By analyzing Figures 4–9, it can be concluded that the proposed algorithm effectively restores the 3D structure and motion of non-rigid bodies. The reconstruction results of the selected images are largely consistent with the initial images, indicating that the proposed algorithm effectively achieves the goal of 3D non-rigid body reconstruction.

There are certain differences between the reconstruction results of the five comparison algorithms and the initial image. Among them, the reconstruction based solely on deep learning differs only slightly from the initial image, but it exhibits angle issues and missing details. Comparing the reconstruction results of the five comparison algorithms with those of the proposed algorithm shows that the robustness and reconstruction quality of the proposed algorithm are superior. The main reason is that the proposed algorithm solves the robustness degradation caused by the continuously changing shape basis in non-rigid motion images. In contrast to the fixed shape-basis computation applied in the comparison algorithms, the proposed algorithm determines the low-rank matrix of the dynamic shape basis for non-rigid 3D image reconstruction, describes the data structure of the dynamic shape-basis variables, captures local and global features of the shape, and better represents the motion process. It therefore achieves higher reconstruction robustness and effectiveness than the comparison algorithms.

To further validate the local reconstruction results of the proposed algorithm, AIDR 3D reconstruction technology was selected for comparison. AIDR 3D is an advanced medical image processing technique that uses iterative methods to reconstruct high-quality 3D images; it builds on a series of mathematical and physical principles, including filtering, backprojection, and reconstruction algorithms, and can shorten imaging time, reduce radiation dose, and lower imaging costs. Taking the first set of male image sequences as an example, AIDR 3D reconstruction and the proposed algorithm were used to reconstruct the frontal, lateral, and top views of the local nasal tip, with the compared results shown in Figures 10 and 11.

FIGURE 10. Comparison of frontal, lateral, and overhead views of the reconstructed nasal tip using the proposed algorithm.

FIGURE 11. Comparison of frontal, lateral, and overhead views of the reconstructed nasal tip using the AIDR 3D algorithm.

In Figures 10 and 11, the red circles and lines represent the locally reconstructed nasal tip image, while the blue dots and lines represent the local feature points of the nasal tip in the dynamic facial image. From Figure 10, the nasal tip reconstructed locally by the proposed algorithm conforms to the local features of the nasal tip in the dynamic facial images. From Figure 11, the frontal view of the nasal tip reconstructed with AIDR 3D is basically consistent with that of the proposed algorithm and represents the local feature points of the nasal tip well. However, in the lateral and top views, the AIDR 3D results deviate significantly from the local feature points of the nasal tip, and its reconstruction effect is inferior to that of the proposed algorithm. This indicates that the nasal tip reconstructed by the proposed algorithm better reflects the characteristics of high nasal bridges in European and American individuals, demonstrating the effectiveness of the proposed algorithm.

To visually illustrate the effect of incorporating the constraint term for non-rigid motion and velocity variation into the proposed algorithm, the left eye reconstruction results of the proposed algorithm were compared with those of the five comparison algorithms, as shown in Figure 12.

FIGURE 12. Left eye reconstruction results of the proposed algorithm and five comparison algorithms.

In Figure 12, the red circles represent the left eye reconstruction results under the constraint of non-rigid motion and velocity variation, while the blue dots indicate the relative positions of the eye and eyebrow. The results in Figure 12 show that the proposed algorithm clearly reflects the left eye reconstruction under this constraint, consistent with the reconstructed model results presented above, and that the relative positions of the eye and eyebrow indicated by the blue dots are accurate. This indicates that incorporating the constraint for non-rigid motion and velocity variation yields a more continuous deformation of the non-rigid body and hence a more accurate reconstruction.

E. Reconstruction Performance

Back projection error refers to the error generated when projecting a 3D model back to a 2D image. It measures the difference between the pixel values projected by the model onto the image plane and the actual pixel values. Compared to the reprojection error, it focuses more on the accuracy of individual pixels and is more fine-grained, so it can be used to evaluate the accuracy and reliability of reconstruction algorithms. This article therefore uses the variation of the back projection error during the iteration process to demonstrate the accuracy and reliability of the algorithm. The back projection error is defined as \begin{equation*} \sigma=\frac{\left\|\mathbf{W}-\mathbf{W}_{\mathrm{r}}\right\|_F}{\left\|\mathbf{W}\right\|_F}\times 100\% \tag{25}\end{equation*}

In Equation (25), W is the original measurement matrix, and \mathbf {W}_{\mathrm {r}} is the measurement matrix obtained from the reconstructed backprojection. The backprojection errors obtained under the algorithm of this article and five comparison algorithms are shown in Figure 13.
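Equation (25) is a relative Frobenius-norm error; a direct sketch follows, where the 0.3% noise level in the demo exists only to exercise the function.

```python
import numpy as np

def back_projection_error(W, W_r):
    # Eq. (25): relative Frobenius distance between the measurement matrix W
    # and the matrix W_r obtained from the reconstructed back projection.
    return np.linalg.norm(W - W_r) / np.linalg.norm(W) * 100.0  # percent

W = np.random.rand(40, 100)                       # 2F x P measurement matrix
W_r = W + 0.003 * np.random.randn(*W.shape)       # perturbed stand-in for W_r
print(f"sigma = {back_projection_error(W, W_r):.2f}%")
```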

FIGURE 13. Backprojection errors obtained under the proposed algorithm and five comparative algorithms.

From Figure 13, when the number of iterations reaches 1000, the average back projection errors of the reconstruction algorithms based on deep learning, the regularization operator, translation kernel factorization, projection, and constrained bilateral smoothing and dynamic mode decomposition are 0.59%, 0.80%, 0.90%, 1.32%, and 1.51%, respectively. In contrast, the average back projection error of the proposed algorithm is only 0.33%.

This indicates that the proposed algorithm has the smallest back projection error of the six, meaning its reconstruction results are more accurate and reliable. This is because, before reconstruction, the proposed algorithm constructs a low-rank matrix describing the dynamic non-rigid shape basis by combining image points and depth factors; it can thus effectively restore the process of shape-basis change and optimize the objective function using manifold alignment and physical continuity constraints. The L-M nonlinear optimization method is then used to solve the problem and obtain the key initial values of the shape basis, further reducing the back projection error and bringing the results closer to the actual shape changes, giving accurate, reliable reconstruction with good performance for non-rigid 3D images. To further verify the reconstruction performance of the proposed algorithm, the 3D ShapeNet dataset was selected for a comparative test of the back projection error against the above five comparison algorithms. Five sets of images, each containing 100 images, were randomly selected from this dataset for 3D reconstruction. The back projection errors of each algorithm are shown in Table 3.

TABLE 3 Back Projection Error Result

According to the results in Table 3, the proposed algorithm also performs well on this dataset, with an average back projection error of 0.30%, whereas the reconstruction algorithms based on deep learning, the regularization operator, translation kernel factorization, projection, and constrained bilateral smoothing and dynamic mode decomposition yield average back projection errors of 0.60%, 0.88%, 1.01%, 1.32%, and 1.59%, respectively. The proposed algorithm thus maintains a low back projection error, good reconstruction accuracy, and reliable result quality. The reason is that the low-rank matrix formed from image points and depth factors better captures the changing characteristics of the dynamic non-rigid shape basis; the low-rank representation effectively reduces the dimensionality of the original data while retaining the important information, which limits information loss during reconstruction and improves accuracy. Moreover, the manifold alignment and physical continuity constraints keep the shape changes during reconstruction within a reasonable range, avoiding unreasonable distortion or deformation, thereby maintaining the authenticity and stability of the reconstruction results and reducing back projection errors.

The Structural Similarity Index (SSIM) is a metric used to measure the degree of similarity between two images. It can be employed to assess quality loss in images and is widely used in image processing and compression algorithms. SSIM evaluates similarity by comparing three aspects: brightness, contrast, and structure. Brightness compares the average intensity of the images, contrast compares their differences in contrast, and structure compares their structural information. All three aspects account for the characteristics of human visual perception, making the evaluation more consistent with how humans judge image similarity. The SSIM value typically ranges from -1 to 1, where 1 indicates that the two images are identical, 0 indicates no similarity, and -1 indicates complete dissimilarity. To verify how well the proposed algorithm preserves detail features in the reconstructed 3D motion facial images, its restoration of image details was compared with the five comparison algorithms through SSIM analysis. As shown in Figure 3, the moving face image of Sample 1 was selected as the sample, 68 facial feature points were extracted, and the SSIM indices of the images obtained by the proposed algorithm and the five comparison algorithms were compared, as shown in Figure 14.

FIGURE 14. SSIM index of images obtained by the proposed algorithm and five comparison algorithms.

From Figure 14, it can be observed that for the 68 facial feature points, the SSIM indices of the images obtained using the reconstruction algorithms based on deep learning, regularization operator, translation kernel decomposition, projection reconstruction, and constrained bilateral smoothing and dynamic mode decomposition reach 0.962, 0.125, 0.569, 0.124, and 0.397, respectively.

When the proposed algorithm is used for non-rigid 3D image reconstruction, the SSIM index reaches a maximum of 0.998, close to 1, indicating that the difference between the reconstructed image and the moving facial image is small. This is because the algorithm uses image points and depth factors to form a low-rank matrix describing the dynamic non-rigid shape basis, which better captures facial motion and deformation features, and it uses manifold alignment and physical continuity constraints to optimize the objective function. This provides reliable support for the subsequent reconstruction and effectively maintains consistency between the reconstruction results and the real facial motion.
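For reference, an SSIM comparison of this kind can be reproduced with standard tooling. The sketch below uses scikit-image's structural_similarity on a pair of synthetic grayscale images; the images are placeholders, not the experiment's facial frames.

```python
import numpy as np
from skimage.metrics import structural_similarity

# Placeholder grayscale images in [0, 1]: the "reconstructed" frame is the
# original plus a small perturbation.
rng = np.random.default_rng(1)
original = rng.random((480, 640))
reconstructed = np.clip(original + 0.02 * rng.standard_normal((480, 640)), 0.0, 1.0)

# data_range must be given explicitly for floating-point images.
score = structural_similarity(original, reconstructed, data_range=1.0)
print(f"SSIM = {score:.3f}")  # approaches 1 as the images become identical
```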

Curvature consistency refers to the degree of similarity, in terms of curvature, between the reconstructed result and the real image. The closer the curvature values of the reconstruction are to those of the real image, the better the reconstruction algorithm captures the curvature features of the real image and restores the details of curvature changes, indicating good reconstruction performance and accurate non-rigid 3D image reconstruction. To verify the degree of detail restoration of non-rigid 3D images reconstructed with the proposed algorithm, 8 images were randomly selected from the Stanford 40 Actions dataset, and the curvature values of the images reconstructed by the proposed algorithm were compared with those obtained from the five comparison algorithms, as shown in Figure 15.

FIGURE 15. Curvature values of reconstructed images calculated using the proposed algorithm and five comparison algorithms.

From Figure 15, it can be seen that for all 8 selected images, the curvature values of the images reconstructed by the algorithms based on deep learning, regularization operator, translation kernel decomposition, projection reconstruction, and constrained bilateral smoothing and dynamic mode decomposition differ significantly from those of the actual images, whereas the curvature values of the images reconstructed by the proposed algorithm are consistent with those of the actual images. This demonstrates the effectiveness of the proposed algorithm in improving detail restoration in 3D image reconstruction, and thus in enhancing the quality of image restoration. The reason is that the proposed algorithm optimizes the construction of the objective function through manifold alignment and physical continuity constraints, making the reconstruction results better conform to the shape and characteristics of real objects, and it uses the L-M nonlinear optimization method to solve for the key initial values of the shape basis, which better guides and optimizes the reconstruction process and improves the degree of detail restoration.
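The paper does not spell out its curvature estimator, so the following sketch shows one common finite-difference estimate of curvature along a sampled planar contour; comparing such per-point values between a reconstruction and the ground truth gives a curvature-consistency check of the kind used above.

```python
import numpy as np

def curve_curvature(x, y):
    """Finite-difference curvature of a sampled planar curve:
    kappa = (x' y'' - y' x'') / (x'^2 + y'^2)^(3/2).
    """
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    return (dx * ddy - dy * ddx) / np.power(dx**2 + dy**2, 1.5)

# Sanity check on a circle of radius r: curvature should be ~1/r everywhere.
t = np.linspace(0.0, 2.0 * np.pi, 400)
r = 2.0
kappa = curve_curvature(r * np.cos(t), r * np.sin(t))
print("mean curvature:", kappa[5:-5].mean(), "expected:", 1.0 / r)
```

A reconstruction whose per-point curvature values track those of the ground truth in this way indicates that fine geometric detail has been preserved.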

SECTION IV.

Detailed Discussion

A. Experimental Description

Three sets of experiments were conducted to evaluate the proposed non-rigid 3D image reconstruction algorithm based on deformable shape reliability.

Set 1: This experiment investigates the impact of the number of moving facial images on the proposed algorithm. First, variable camera intrinsics f_i = 800 + i were generated, distortion factors were set to 1.1 and 0.5, and the principal point in the intrinsic matrix was set to (u_0, v_0) = (320, 240). Second, 100 3D spatial points were randomly generated on a unit sphere and divided into 3 rigid elements. Finally, using different camera extrinsic matrices, 20 to 200 images of size 640 × 480 were generated; 1-pixel Gaussian noise was applied to each image using MATLAB's IMNOISE function, each image count was run 100 times, and the average reprojection error was computed.
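A minimal sketch of this synthetic setup is given below. It assumes a pinhole projection without the distortion terms and adds the 1-pixel Gaussian noise directly to the projected coordinates rather than through IMNOISE, so it only approximates the experiment's protocol.

```python
import numpy as np

rng = np.random.default_rng(42)

# 100 random 3D points on the unit sphere.
X = rng.standard_normal((100, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)

u0, v0 = 320.0, 240.0          # principal point (u_0, v_0)
errors = []
for i in range(20):            # 20 views for brevity; the experiment uses 20-200
    f = 800.0 + i              # varying focal length f_i = 800 + i
    R, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthogonal matrix
    if np.linalg.det(R) < 0:   # ensure a proper rotation, not a reflection
        R[:, 0] = -R[:, 0]
    t = np.array([0.0, 0.0, 5.0])    # keep the scene in front of the camera
    Xc = X @ R.T + t                 # points in camera coordinates
    u = f * Xc[:, 0] / Xc[:, 2] + u0
    v = f * Xc[:, 1] / Xc[:, 2] + v0
    proj = np.column_stack([u, v])
    noisy = proj + rng.standard_normal(proj.shape)     # ~1-pixel Gaussian noise
    errors.append(np.linalg.norm(noisy - proj, axis=1).mean())

print("mean reprojection error (pixels):", np.mean(errors))
```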

Set 2: In the first step, 68 feature points were marked on the moving face using a white marker pen, as shown in Figure 16.

FIGURE 16. Facial feature points marked with a white marker pen in a certain frame of the image sequence.

These marker points are used as the features to be extracted. They cover essentially all areas of the face, including regions such as the eye sockets and the bridge of the nose, which change little during facial expressions, as well as highly mobile points such as the mouth. A larger number of points was placed around the mouth in this first step, so that the points with larger displacements during expression changes can be displayed more clearly.

In the second step, a SONY HDR-XR150E high-speed camera with a resolution of 4.2 megapixels was used to capture the images, which is sufficient to accurately extract the coordinates of the feature points. Approximately 10 seconds of facial expression changes were recorded, from which 150 frames were captured using professional image capture software.

In the third step, MATLAB was used to extract the 2D coordinates of these feature points in each frame, forming the measurement matrix

\mathbf{W}_{2F \times P} = \begin{bmatrix} u_{11} & \cdots & u_{1P} \\ v_{11} & \cdots & v_{1P} \\ \vdots & & \vdots \\ u_{F1} & \cdots & u_{FP} \\ v_{F1} & \cdots & v_{FP} \end{bmatrix}

where F is the number of image frames and P is the number of feature points per image.
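As a sketch of how such a measurement matrix is assembled from tracked coordinates (the coordinate arrays here are random placeholders):

```python
import numpy as np

def build_measurement_matrix(u, v):
    """Interleave per-frame u- and v-rows into a 2F x P measurement matrix.

    u, v: (F, P) arrays of image coordinates for F frames and P points.
    """
    F, P = u.shape
    W = np.empty((2 * F, P))
    W[0::2] = u   # rows 0, 2, 4, ... hold u-coordinates
    W[1::2] = v   # rows 1, 3, 5, ... hold v-coordinates
    return W

# Placeholder tracked coordinates: 150 frames, 68 feature points.
F, P = 150, 68
rng = np.random.default_rng(0)
W = build_measurement_matrix(rng.random((F, P)) * 640, rng.random((F, P)) * 480)
print(W.shape)  # (300, 68)
```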

In the fourth step, the 3D reconstruction based on deformable shape reliability was carried out on the measurement matrix. For each feature point, its 3D coordinates were recovered across the 150 frames. After all feature points were reconstructed, the per-point trajectories were stacked into the 360 × 68 matrix

\mathbf{TS}_{3FP} = \begin{bmatrix} \mathbf{W}_{1}^{\mathrm{T}} \\ \mathbf{W}_{2}^{\mathrm{T}} \\ \vdots \\ \mathbf{W}_{P}^{\mathrm{T}} \end{bmatrix}

where \mathbf{W}_{i}^{\mathrm{T}} holds the 3D coordinates of point i across the 150 frames.
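The reconstruction step itself solves a nonlinear objective with the Levenberg-Marquardt method. The sketch below is a simplified stand-in: it fits shape-basis coefficients to synthetic observations with SciPy's LM solver, and the linear residual model is a placeholder for the paper's objective (which also includes manifold alignment and continuity terms).

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(3)

# Placeholder problem: recover shape-basis coefficients c from noisy
# observations w = B @ c + noise, standing in for the real objective.
F, K = 150, 5
B = rng.standard_normal((2 * F, K))            # fixed shape basis (2F x K)
c_true = rng.standard_normal(K)
w_obs = B @ c_true + 0.01 * rng.standard_normal(2 * F)

def residual(c):
    """Reprojection-style residual between the model and the observations."""
    return B @ c - w_obs

# method='lm' selects SciPy's Levenberg-Marquardt implementation.
sol = least_squares(residual, x0=np.zeros(K), method='lm')
print("recovered:", np.round(sol.x, 3))
print("true:     ", np.round(c_true, 3))
```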

Set 3: This experiment evaluates the performance of non-rigid 3D image reconstruction using the back projection error, the similarity index (SSIM), and curvature consistency. First, non-rigid 3D image data were acquired and projection data were recorded from different angles. Using the collected projection data, images were reconstructed with the proposed algorithm and the five comparison algorithms, and the reconstructions were compared with the moving facial images. The 68 feature points marked on the moving faces were loaded into the non-rigid 3D image reconstruction program based on deformable shape reliability, and the experimental study was carried out.

B. Results Discussion

In the first set of experiments, as the number of spatial points and images increases, the reprojection error first increases and then decreases, eventually stabilizing. This is mainly because when there are many spatial points and images, the number of equations exceeds the number of unknowns, so the solution is more strongly overconstrained and therefore more stable, and the residual is relatively small [25].
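A small numerical illustration of this overconstraining effect, using a generic least-squares system rather than the paper's equations: as the number of equations m grows past the number of unknowns, the recovered solution stabilizes.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10                                 # number of unknowns
x_true = rng.standard_normal(n)

for m in (10, 20, 50, 200, 1000):      # number of equations
    A = rng.standard_normal((m, n))
    b = A @ x_true + 0.05 * rng.standard_normal(m)   # noisy observations
    x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(f"m = {m:4d}: parameter error = {np.linalg.norm(x_hat - x_true):.4f}")
```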

In the second set of experiments, the proposed algorithm effectively restores the 3D structure and motion of non-rigid bodies, and the reconstruction results of the selected images are essentially consistent with the initial images. This is mainly because, when training the motion structure parameters of the non-rigid object, the objective function takes the image model as its subject and minimizes the difference between the 2D coordinates and the image manifold group. When a non-rigid object moves, the variation of its motion structure parameters between consecutive frames is very small; this physical continuity provides a constraint for recovering the motion structure parameter matrix. Combining these two aspects yields a highly reliable objective function, which effectively alleviates the reliability degradation caused by the continuously changing shape basis in non-rigid motion images. The proposed algorithm also achieves good local reconstruction. For the local image of the nasal tip, it recovers local features that match the facial motion, reflecting the high nasal tip characteristic of the European and American subjects; this follows from the choice of reconstruction algorithm and the correct selection and optimization of its parameters. The reconstruction of the left eye under the velocity-variation constraint of non-rigid motion is likewise consistent with the moving face image, and the relative position of the eye and eyebrow is accurate. This is mainly because the algorithm incorporates physical laws into the motion parameter matrix of the non-rigid body, yielding a constraint that describes the velocity change of the non-rigid motion. It minimizes the error between the points transformed by the manifold motion group and the actual measured points, together with two constraints describing the displacement and velocity change of the parameters between consecutive frames; by adding the velocity-variation constraint, the deformation of the non-rigid body becomes more continuous and the position of the reconstructed 3D structure more accurate [26].
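One common way to encode this frame-to-frame continuity, shown here as a hedged sketch rather than the paper's exact formulation, is to penalize the first differences (displacement) and second differences (velocity change) of the per-frame parameters:

```python
import numpy as np

def continuity_penalty(theta, w1=1.0, w2=1.0):
    """Penalize displacement and velocity change of per-frame parameters.

    theta: (F, K) array, one K-dimensional parameter vector per frame.
    Returns w1 * sum ||theta_f - theta_{f-1}||^2
          + w2 * sum ||theta_{f+1} - 2 theta_f + theta_{f-1}||^2.
    """
    d1 = np.diff(theta, n=1, axis=0)   # frame-to-frame displacement
    d2 = np.diff(theta, n=2, axis=0)   # change of velocity
    return w1 * np.sum(d1**2) + w2 * np.sum(d2**2)

# A smooth trajectory is penalized far less than a jittery one.
F, K = 150, 5
t = np.linspace(0.0, 1.0, F)[:, None]
smooth = np.sin(2.0 * np.pi * t) * np.ones((1, K))
jittery = smooth + 0.1 * np.random.default_rng(5).standard_normal((F, K))
print(continuity_penalty(smooth), continuity_penalty(jittery))
```

Adding such a term to the reconstruction objective favors parameter sequences that vary smoothly between frames, which is the behavior the constraint described above enforces.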

In the third set of experiments, the back projection error of the proposed algorithm is the smallest, indicating that the low-rank approximation of the measurement matrix obtained by the proposed algorithm is more accurate. This is mainly because the algorithm effectively controls model complexity and reduces the risk of overfitting: by retaining the dominant low-rank structure, it describes the essential features of the data without being distorted by excessive noise and detail. The SSIM index obtained from non-rigid 3D image reconstruction with this algorithm is close to 1, confirming that the difference between the reconstructed image and the moving face image in Figure 3 is small. This is mainly because the algorithm solves the parameter estimation problem by nonlinear optimization; choosing appropriate initial values accelerates the convergence of the optimization and helps avoid local optima, and providing an initial value close to the optimal solution increases the similarity between the two images, thereby reducing the reconstruction difference. The curvature values of the reconstructed images are consistent with those of the actual images, which improves the degree of detail restoration in 3D image reconstruction and thus the quality of non-rigid 3D image restoration. This is mainly because the algorithm optimizes the construction of the objective function through manifold alignment and physical continuity constraints, making the reconstruction results better conform to the shape and characteristics of real objects, and uses the L-M nonlinear optimization method to solve for the key initial values of the shape basis, better guiding and optimizing the reconstruction process and improving image quality [27].

SECTION V.

Conclusion

This paper presents a novel reconstruction algorithm for non-rigid 3D images, leveraging a low-rank matrix derived from image points and depth factors to accurately represent dynamic non-rigid shapes. By incorporating an improved method for defining the nonlinear objective function and selecting initial values, alongside classical nonlinear optimization techniques, the algorithm effectively reconstructs the 3D structure and parameter matrices for non-rigid forms. Its key strengths include optimized objective function and initial value computation, and a precise approach to transformation matrix calculation, ensuring consistent image treatment. Simulation results confirm the algorithm’s high reliability and its success in minimizing back projection errors, highlighting its potential to advance non-rigid 3D image reconstruction.
