Introduction
The analysis and understanding of visual information by computers facilitate a more accurate simulation of the real world, giving rise to the field of computer vision [1]. This discipline primarily focuses on acquiring various real-world details through the capture of 2D images, encompassing shape and motion recognition of 3D scenes [2]. A prominent research area within this field is the recovery of motion scenes and their corresponding parameters from continuous image sequences, commonly referred to as 3D motion vision. While significant advances have been made in the reconstruction of rigid bodies, exploring non-rigid body reconstructions continues to present considerable challenges [3]. Non-rigid 3D image reconstruction seeks to recover a 3D model of non-rigid objects from 2D images captured from multiple viewpoints, employing image processing and computer vision techniques [4]. This technology has wide-ranging applications in movie production, game development, and industrial design [5], [6]. However, the complexity and diversity of non-rigid motion render non-rigid 3D image reconstruction a challenging task, with ensuring the reliability of shape bases computation standing out as a particular concern.
In the realm of 3D image reconstruction, Greffier et al. employed deep learning algorithms to reconstruct original images using methods such as Filtered Back Projection (FBP), enhanced AIDR 3D (AIDR 3De), and AiCE at three levels (mild, standard, and strong) [7]. This approach, which relies on linear assumptions for reconstruction, falls short in adequately addressing the impacts of nonlinear deformations, and thus proves ineffective in establishing stable and reliable shape objective functions for nonlinear, non-rigid motion. On the other hand, Terris et al. suggested training deep neural networks (DNNs) as denoisers to learn a priori image models, which would replace hand-crafted proximal regularization operators in optimization algorithms. Their AIRI framework, aimed at imaging complex intensity structures from visibility data, merges the robustness and interpretability of optimization with the efficiency and learning capability of neural networks [8]. Nevertheless, this method encounters difficulties in practical applications, particularly in selecting suitable initial values for nonlinear non-rigid motion, resulting in low reliability of reconstruction results.
Lin et al. approached the challenge by converting optimal spatial deformation into a nonlinear regularized variational optimization problem, incorporating local smoothing and input constraints. They leveraged data parallelism and flash memory optimization strategies for online tracking and reconstruction of non-rigid scenes [9]. Despite these efforts, the method struggled to effectively manage non-linear non-rigid motion during the research phase, impacting the establishment of reliable shape objective functions and leading to suboptimal reconstruction outcomes. Murase’s study utilized a helical digital body phantom to generate degraded projection data, employing system function graphs and Gaussian noise. The entire system matrix (SM) was calculated by linking each projection data set with slices for 3D image reconstruction [10]. However, this approach did not adequately capture the characteristics and variations of non-rigid motion, resulting in unreliable initial values for non-linear non-rigid motion and diminished reconstruction reliability.
In Jo’s research, attention modules were introduced, and simple algebraic pre-smoothing techniques like Gaussian filtering were applied to data. This pre-smoothed data served to derive an operator for image reconstruction through dynamic mode decomposition [11]. Although this method facilitated camera calibration with rigid parts and subsequent application to non-rigid reconstruction, it fell short in establishing stable and reliable shape objective functions and initial values for all non-rigid motions, posing challenges for generalization in practical applications.
Design of Non-Rigid 3D Image Reconstruction Algorithm
A. The Calculation of Low-Rank Matrices for Dynamic Shape Bases in Non-Rigid 3D Image Reconstruction
Non-rigid body shapes undergo specific transformations at different time points. To effectively complete non-rigid 3D image reconstruction, it is essential to accurately describe these dynamic shape changes. In this context, we utilize a low-rank matrix, combined with image points and depth factors, to represent the data structure of basis parameters in dynamic shape for non-rigid 3D image reconstruction. This approach enables the capture of both local and global features of the shape, thereby better restoring the original non-rigid shape basis’s change process and providing precise manifold parameter values for the structure of the objective function.
The process for calculating the low-rank matrix for the dynamic shape basis in non-rigid 3D image reconstruction involves several steps:
Digitalization of Non-Rigid 3D Images: Utilize the coordinates and depth factors of all image points to digitize the non-rigid 3D image.
Construction of the Point Cloud Matrix: Based on the 3D image, construct a point cloud matrix of the shape for a specific frame.
Singular Value Decomposition (SVD): Apply singular value decomposition to the constructed matrix and arrange the singular values in descending order of importance. Then, retain the first k singular values to obtain the low-rank matrix describing the dynamic shape basis in non-rigid 3D image reconstruction.
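The three steps above can be sketched with NumPy's SVD. This is a minimal illustration, not the paper's implementation: the function name, the (3, N) point-cloud layout, and the choice of k are assumptions.

```python
import numpy as np

def low_rank_shape_basis(points_3d, k):
    """Truncated-SVD low-rank approximation of one frame's point cloud matrix.

    points_3d : (3, N) array of 3D point coordinates for a single frame
    k         : number of leading singular values to retain
    Returns the rank-k approximation and the retained singular values.
    """
    # SVD returns singular values already sorted in descending order.
    U, s, Vt = np.linalg.svd(points_3d, full_matrices=False)
    # Keep only the first k singular triplets to form the low-rank matrix.
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return approx, s[:k]
```

Because the truncation keeps the largest singular values, the retained matrix captures the dominant (global) structure of the shape while discarding small-scale noise.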
It is assumed that a classic pinhole camera model facilitates the non-rigid 3D image acquisition process [12], [13], allowing the non-rigid 3D image to be described as follows:\begin{equation*} \lambda \mathbf {U=\mathbf {K}}\left ({{ \mathbf {Rt} }}\right )\mathbf {w}=\mathbf {Pw} \tag {1}\end{equation*}
With total F frames and N 3D spatial points, the equation is transformed as follows for the fth frame:\begin{equation*} \mathbf{U}_f \lambda_f=\mathbf{K}\left(\mathbf{R}_f \mathbf{t}_f\right) \mathbf{Y}_f \tag {2}\end{equation*}
Finally, singular value decomposition is performed on the shape point cloud matrix of the constructed fth frame image. The singular values are arranged in descending order, and the first k singular values are retained. The low-rank matrix of the dynamic shape basis for non-rigid 3D image reconstruction can be obtained as:\begin{equation*} \mathbf {Y}_{f}=\frac {\mathbf {I}_{k}u_{f_{k}}\lambda _{f_{k}}}{\mathbf {K}(\mathbf {R}_{f_{k}}\mathbf {t}_{f_{k}})} \tag {3}\end{equation*}
Thus, the low-rank matrix calculation of the dynamic shape basis for non-rigid 3D image reconstruction is completed. When an object undergoes rigid motion, the 3D spatial points preserve fixed relative positions across frames, so the point cloud matrix is inherently low-rank; the dynamic shape basis generalizes this property to the non-rigid case.
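The pinhole model of Equation (1) can be illustrated with a short NumPy sketch; the function name and argument layout are illustrative assumptions.

```python
import numpy as np

def project_point(K, R, t, w):
    """Pinhole projection of Equation (1): lambda * u = K (R t) w.

    K : (3, 3) intrinsic matrix; R : (3, 3) rotation; t : (3,) translation
    w : (3,) 3D point. Returns the 2D image point u and the depth factor lambda.
    """
    x = K @ (R @ w + t)   # homogeneous image coordinates
    lam = x[2]            # depth factor (projective scale)
    return x[:2] / lam, lam
```

With K = I, R = I, t = 0, the point (1, 2, 4) projects to (0.25, 0.5) with depth factor 4, matching the scaling by lambda in Equation (1).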
B. Design of A Strongly Reliable Objective Function for Non-Rigid Image Reconstruction
1) The Proposed Model
The enhancement of reliability in our current method fundamentally addresses a nonlinear optimization challenge for dynamic shape bases. This involves identifying the minimum value within a specific parameter structure. During the training of parameters for non-rigid motion structures, the chosen objective function aims to minimize the discrepancy between the 3D coordinates of the manifold and the transformed 2D coordinates of the manifold group. This is based on assumed motion structure parameters within the photography model, ensuring that the derived motion structure parameters align closely with the actual observed image points and their theoretically predicted 3D shapes. By reducing this discrepancy, we enhance both the quality and accuracy of the reconstruction results.
Thus, the developed objective function effectively synchronizes the motion structure parameters with the manifold, ensuring reliable shape reconstruction. In the context of non-rigid motion, the structure parameters across continuous frames exhibit minimal changes. Such physical continuity is leveraged as a constraint in deriving the motion structure parameter matrix. By incorporating physical continuity, the objective function is designed to dampen the amplitude of shape changes, thereby bolstering the reliability of the motion structure parameters. In essence, the objective function employs manifold alignment and physical continuity constraints to reconcile motion structure parameters with observed data, yielding accurate and reliable outcomes. This approach also stably navigates the challenges posed by imperfect data.
Therefore, building on the previously described calculation of the low-rank matrix for the dynamic shape basis in non-rigid 3D image reconstruction, this process delineates the data structure of parameters associated with this dynamic shape basis. Additionally, it facilitates the acquisition of manifold parameter values amid changes in the non-rigid shape basis. During the projection of the manifold motion group, the 2D image manifold and the assumed 3D manifold of the ith frame are represented as \begin{align*} \boldsymbol {\Phi }_{i}& =\left [{{\begin{array}{ccccc} u_{ic}^{\prime } & u_{id}^{\prime } & u_{ie}^{\prime } & u_{if}^{\prime } & u_{iC}^{\prime } \\ v_{ic}^{\prime } & v_{id}^{\prime } & v_{ie}^{\prime } & v_{if}^{\prime } & v_{iC}^{\prime } \\ \end{array}}}\right ] \tag {4}\\ \boldsymbol {\Phi }_{i}^{\prime }& =\left [{{\begin{array}{ccccc} x_{ic} & x_{id} & x_{ie} & x_{if} & x_{iC} \\ y_{ic} & y_{id} & y_{ie} & y_{if} & y_{iC} \\ z_{ic} & z_{id} & z_{ie} & z_{if} & z_{iC} \\ \end{array}}}\right ] \tag {5}\end{align*}
The transformation function resulting from the equation after camera transformation is:\begin{equation*} \boldsymbol{\Phi}_{\mathbf{i}}=[\text { quater }(\mathrm{R})] \times\left[\text { quater }\left(\mathrm{Q}_i\right) \times \boldsymbol{\Phi}_{\mathbf{i}}^{\prime}+\mathrm{T}_i\right] \tag {6}\end{equation*}
In Equation (6), quater(·) denotes the quaternion representation of a rotation, while Q_i and T_i are the rotation and translation of the ith frame, respectively.
By subjecting each assumed manifold for each frame to the aforementioned camera transformation, we can obtain the 2D matrix of projected manifold coordinates over all F frames: \begin{align*} \boldsymbol {\Phi }=\left [{{\begin{array}{ccccc} u_{1c}^{\prime } & u_{1d}^{\prime } & u_{1e}^{\prime } & u_{1f}^{\prime } & u_{1C}^{\prime } \\ v_{1c}^{\prime } & v_{1d}^{\prime } & v_{1e}^{\prime } & v_{1f}^{\prime } & v_{1C}^{\prime } \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ u_{Fc}^{\prime } & u_{Fd}^{\prime } & u_{Fe}^{\prime } & u_{Ff}^{\prime } & u_{FC}^{\prime } \\ v_{Fc}^{\prime } & v_{Fd}^{\prime } & v_{Fe}^{\prime } & v_{Ff}^{\prime } & v_{FC}^{\prime } \\ \end{array}}}\right ] \tag {7}\end{align*}
Equation (7) provides the 2D manifold coordinates obtained by applying the camera transformation to the assumed 3D coordinates of the manifolds. Taking the difference between this matrix and the observed measurement matrix M yields the first error term:\begin{equation*} {\mathrm {de}}_{1}=\sqrt {\sum \limits _{j=1}^{5} \sum \limits _{i=1}^{2F} \left \|{{ {\mathbf {DE}}_{1} }}\right \|^{2}} =\sqrt {\sum \limits _{j=1}^{5} \sum \limits _{i=1}^{2F} \left \|{{ \mathrm {\boldsymbol {\Phi }}_{ij}-\mathbf {M}_{ij} }}\right \|^{2}} \tag {8}\end{equation*}
During the optimization and training of the entire motion structure matrix, the displacement and velocity changes of feature points between consecutive frames are slow for a high-speed camera [14], [15]. Reflecting this property in the rotation matrix Q and translation matrix T of non-rigid body motion, the motion structure parameters of all F frames are collected into the F-by-6 matrix \begin{align*} {\mathbf {RS}}_{F6}=\left [{{\begin{array}{cccccc} \theta _{1X} & \theta _{1Y} & \theta _{1Z} & X_{1} & Y_{1} & Z_{1} \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ \theta _{iX} & \theta _{iY} & \theta _{iZ} & X_{i} & Y_{i} & Z_{i} \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ \theta _{FX} & \theta _{FY} & \theta _{FZ} & X_{F} & Y_{F} & Z_{F} \\ \end{array}}}\right ] \tag {9}\end{align*}
Based on the above discussion, the difference between the motion structure parameters of consecutive frames forms the (F-1)-by-6 matrix \begin{align*} {\mathbf {RS}}_{(F\mathrm {-1)6}}^{\prime }=\left [{{\begin{array}{cccccc} \theta _{2X}-\theta _{1X} & \theta _{2Y}-\theta _{1Y} & \theta _{2Z}-\theta _{1Z} & X_{2}-X_{1} & Y_{2}-Y_{1} & Z_{2}-Z_{1} \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ \theta _{FX}-\theta _{F-1X} & \theta _{FY}-\theta _{F-1Y} & \theta _{FZ}-\theta _{F-1Z} & X_{F}-X_{F-1} & Y_{F}-Y_{F-1} & Z_{F}-Z_{F-1} \\ \end{array}}}\right ] \tag {10}\end{align*}
Due to the continuity of non-rigid body motion and the motion structure parameters reflected in the image sequence obtained through high-speed capture [16], [17], the variation between consecutive frames is very small. This introduces the first constraint term:\begin{equation*} {\mathrm {de}}_{2}=\sqrt { \sum \nolimits _{j=1}^{6} \sum \nolimits _{i=1}^{F-1} \left \|{{ {\mathbf {RS}}_{ij}^{\prime }}}\right \|^{2}} \tag {11}\end{equation*}
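Equations (10)-(11) amount to the Frobenius norm of the first differences of the motion structure matrix. A minimal NumPy sketch (the function name and the (F, 6) layout are assumptions):

```python
import numpy as np

def continuity_penalty(RS):
    """First continuity constraint de2 of Equation (11).

    RS : (F, 6) motion structure matrix, one row per frame
         (three rotation angles and three translations).
    Returns the Frobenius norm of the frame-to-frame differences RS'.
    """
    RS_diff = np.diff(RS, axis=0)        # (F-1, 6) matrix of Equation (10)
    return np.sqrt(np.sum(RS_diff ** 2))
```

A constant motion matrix gives a penalty of zero, so the term only discourages abrupt frame-to-frame changes, exactly the physical continuity assumption in the text.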
Image sequences taken from the high-speed camera have a fixed resolution and frame interval t. The average velocity of a feature point over this interval and its change between consecutive intervals are \begin{equation*} \bar {v_{i}}=\frac {\Delta s}{t}=\frac {S_{i}-S_{i-1}}{t} \, \, \mathrm { and } \, \, \Delta v=\bar {v_{i}}-\bar {v_{i-1}} \tag {12}\end{equation*}
The ratio of the displacement difference of an object during this time interval to t is the average velocity of the motion. Similarly, for a high-speed image sequence, such changes in average velocity are also very small. Assuming the change in average velocity between consecutive frame intervals is small, the second constraint term is \begin{align*} {\mathrm {de}}_{3}& =\sqrt { \sum \nolimits _{j=1}^{6} \sum \nolimits _{i=2}^{F-1} \left \|{{ {\mathbf {RS}}_{ij}^{\prime \prime } }}\right \|^{2}} =\sqrt { \sum \nolimits _{j=1}^{6} \sum \nolimits _{i=2}^{F-1} \left \|{{ {\mathbf {RS}}_{ij}^{\prime }-{\mathbf {RS}}_{i-1j}^{\prime } }}\right \|^{2}} \tag {13}\end{align*}
After this discussion, we have obtained the three error terms de1, de2, and de3, which are combined with weights w1, w2, and w3 into the overall objective function \begin{align*} f\left ({{ \mathbf {R,Q,T} }}\right )& =\sum \nolimits _{i=1}^{3} {w_{i}{\mathrm {de}}_{i}} \\ & =w_{1}\sqrt {\sum \nolimits _{j=1}^{5} \sum \nolimits _{i=1}^{2F} \left \|{{ \mathrm {\boldsymbol {\Phi }}_{ij}-\mathbf {M}_{ij} }}\right \|^{2}} \\ & \quad +w_{2}\sqrt {\sum \nolimits _{j=1}^{6} \sum \nolimits _{i=1}^{F-1} \left \|{{ {\mathbf {RS}}_{ij}^{\prime }}}\right \|^{2}} \\ & \quad +w_{3}\sqrt { \sum \nolimits _{j=1}^{6} \sum \nolimits _{i=2}^{F-1} \left \|{{ {\mathbf {RS}}_{ij}^{\prime }-{\mathbf {RS}}_{i-1j}^{\prime }}}\right \|^{2}} \tag {14}\end{align*}
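A hedged sketch of how a weighted objective of this kind could be minimized with Levenberg-Marquardt via SciPy. `make_objective`, the toy `de1_fn`, and the parameter packing are all illustrative assumptions, not the paper's code; the continuity terms are implemented as first and second frame-to-frame differences of the motion matrix.

```python
import numpy as np
from scipy.optimize import least_squares

def make_objective(de1_fn, w1, w2, w3, F):
    """Build a stacked residual vector in the spirit of Equation (14).

    de1_fn maps an (F, 6) motion matrix to projection residuals (the de1
    term); the de2/de3 terms are the first and second differences between
    consecutive frames. Minimizing the stacked residuals with method="lm"
    corresponds to the L-M optimization described in the text.
    """
    def residuals(params):
        RS = params.reshape(F, 6)                     # per-frame parameters
        return np.concatenate([
            w1 * de1_fn(RS).ravel(),                  # data (projection) term
            w2 * np.diff(RS, axis=0).ravel(),         # de2: continuity
            w3 * np.diff(RS, n=2, axis=0).ravel(),    # de3: velocity change
        ])
    return residuals

# Toy usage: recover per-frame parameters from a linear data term
# while regularizing frame-to-frame changes.
F = 5
target = np.tile(np.arange(6, dtype=float), (F, 1))   # identical rows
obj = make_objective(lambda RS: RS - target, 1.0, 0.1, 0.1, F)
sol = least_squares(obj, x0=np.zeros(6 * F), method="lm")
```

Because the target rows are identical, the continuity penalties vanish at the optimum and the solver recovers the target exactly; with real projection residuals the same structure trades data fit against smoothness.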
2) The Selection of Initial Values
The essence of solving the problem using nonlinear optimization is to determine the description of the rotation matrix Q and the translation matrix T. In the L-M optimization method, these two parameters are treated as unknown. First, introducing the initialization method for the rotation matrix Q:
In this paper, a method combining factorization techniques is used for the initialization of the rotation matrix Q [19]. As analyzed in the preceding section, consider a known measurement matrix composed of the tracked image points.
Assuming that the 3D shape of the non-rigid body is a weighted linear combination of shape bases, we have:\begin{equation*} \mathbf {S}_{i}=\sum \nolimits _{l=1}^{K} {\omega _{il}\mathbf {S}_{l}} \end{equation*}
Under the weak perspective projection model, there is \begin{align*} \left [{{\begin{array}{cccccccccccccccccccc} u_{i1},\ldots ,u_{iP} \\ v_{i1},\ldots ,v_{iP} \end{array}}}\right ]=\overline R _{i}\left ({{\sum \nolimits _{\mathrm {l=1}}^{K} {\omega _{il}S_{l}} }}\right )+\overline T_{i}\mathbf {e}_{n}^{\mathrm {T}} \tag {15}\end{align*}
Transforming Equation (15) with the 2D coordinate origin placed at the centroid of the image points eliminates the translation term.
The resulting measurement matrix M is stacked from the per-frame blocks:\begin{equation*} \mathbf {M=}\left [{{ \mathbf {M}_{1}^{\mathrm {T}},\mathbf {M}_{2}^{\mathrm {T}},\ldots ,\mathbf {M}_{F}^{\mathrm {T}} }}\right ]^{\mathrm {T}} \tag {16}\end{equation*}
Then, each matrix block is given by:\begin{equation*} \boldsymbol {M}_{i}=\left [{{\begin{array}{cccccccccccccccccccc} \omega _{i1}\overline {\mathbf {R}}_{i} & {.} & {.} & {.} & \omega _{iK}\overline {\mathbf {R}}_{i} \end{array}}}\right ]_{\mathrm {2 \times 3}K} \tag {17}\end{equation*}
\begin{align*} \mathbf {\widetilde {M}}_{i}=\left [{{\begin{array}{c} \omega _{i1} \\ \omega _{i2} \\ \vdots \\ \omega _{iK} \\ \end{array}}}\right ]\left [{{\begin{array}{cccccc} r_{i1} & r_{i2} & r_{i3} & r_{i4} & r_{i5} & r_{i6} \end{array}}}\right ]=\bar {\boldsymbol {\Omega }}_{i}\overline {\boldsymbol {\mathfrak {R}}}_{i} \tag {18}\end{align*}
Clearly, the rank of this matrix is at most 1, since it is the outer product of a column vector and a row vector.
The decomposition result of Equation (18) is still not unique. To resolve this ambiguity, the correction is chosen to minimize the change in the weight coefficients between consecutive frames:\begin{equation*} f(C)=\min \left \|{{ \boldsymbol {\Omega }_{i}-\boldsymbol {\Omega }_{i-1} }}\right \|_{F} \tag {19}\end{equation*}
Adjusting the row vectors of the decomposition accordingly, and applying the quaternion method together with the exponential mapping of the rotation matrix, the initial value of the rotation matrix Q is obtained.
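The quaternion-to-rotation conversion used here follows the standard formula; the helper name below is an assumption for illustration.

```python
import numpy as np

def quat_to_rot(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.

    The quaternion is normalized first, so any nonzero q is accepted.
    """
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
```

The identity quaternion (1, 0, 0, 0) maps to the identity matrix, and a quaternion with w = z = sqrt(1/2) gives a 90-degree rotation about the z axis, which is a quick sanity check for the sign conventions.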
C. Euclidean Reconstruction
With the motion structure parameter matrices solved, the measurement matrix stacking the image coordinates of all P feature points over F frames is \begin{align*} \mathbf {W}_{2F \times P}=\left [{{\begin{array}{ccc} u_{11} & \ldots & u_{1P} \\ v_{11} & \ldots & v_{1P} \\ \ldots & \ldots & \ldots \\ u_{F1} & \ldots & u_{FP} \\ v_{F1} & \ldots & v_{FP} \\ \end{array}}}\right ]\end{align*}
\begin{equation*} \mathbf {W}_{j}=\left [{{ u_{1j},v_{1j},\ldots ,u_{Fj},v_{Fj} }}\right ]^{\mathrm {T}} \tag {20}\end{equation*}
Then the reconstruction error between the transformed coordinates and the observed image coordinates of each point is defined as:\begin{align*} \mathrm {de}_{4}=\mathrm {T}_{ij}-\mathrm {W}_{ij}=\left [{{ \mathrm {quater}\left ({{ \mathrm {R} }}\right ) }}\right ] \times \left [{{ \mathrm {quater}\left ({{ \mathrm {Q}_{i} }}\right )\mathrm {W}_{ij}^{\prime }+\mathrm {T}_{i} }}\right ]-\mathrm {W}_{ij} \tag {21}\end{align*}
In Equation (21), the known parameter is the observed image coordinate W_{ij}, and the unknown 3D coordinates W_j' are obtained by minimizing \begin{equation*} \mathop {\mathrm {Min}}\limits _{\mathbf {W}_{j}^{\prime }}\left \|{{ f(\mathbf {W}_{j}^{\prime }) }}\right \|=\sqrt {\sum \nolimits _{i=3}^{2F} \left \|{{ \mathbf {DE4}_{i1}-\mathbf {DE4}_{i-2,1} }}\right \|^{2}} \tag {22}\end{equation*}
Equation (22) is the objective function for the 3D reconstruction of the jth point. Solving it for each of the P points yields the overall reconstruction matrix \begin{align*} \mathbf {T}\mathbf {S}_{3FP}=\left [{{\begin{array}{c} \mathbf {W}_{1}^{\mathrm {T}} \\[2.3pt] \mathbf {W}_{2}^{\mathrm {T}} \\[2.3pt] \mathrm {\ldots } \\[2.3pt] \mathbf {W}_{P}^{\mathrm {T}} \end{array}}}\right ] \tag {23}\end{align*}
Equation (23) represents a matrix collecting the reconstructed coordinates of all P points, completing the Euclidean reconstruction of the non-rigid body.
In summary, the proposed algorithm for non-rigid 3D image reconstruction addresses the significant limitations identified in existing algorithms, particularly their challenges in establishing stable shape objective functions and determining initial values for nonlinear, non-rigid body movements. The methodology of the algorithm unfolds in a series of meticulously designed steps. Initially, it employs image points and depth factors to construct a low-rank matrix that accurately describes the dynamic shape basis of non-rigid bodies. This matrix not only facilitates a better restoration of the transformations of the non-rigid body shape basis but also provides precise manifold parameters essential for the formulation of objective functions. Subsequently, the algorithm leverages manifold alignment and physical continuity constraints to optimize the construction of the objective function. This optimization step is crucial for aligning the motion structure parameters with the observed data and reducing the amplitude of shape changes, thereby ensuring the accuracy and reliability of the reconstruction results.
Further enhancing the robustness of the algorithm, constraints on reconstruction error and minimal frame-to-frame shape changes are integrated. These constraints ensure that the minimization of the objective function yields reconstruction results that closely align with actual observational data, while also preventing unrealistic alterations in shape, thereby improving the reliability of the reconstruction outcomes. Following this, the L-M (Levenberg-Marquardt) nonlinear optimization method is applied to solve for the motion structure parameters and to select the key initial values of the shape basis, based on the minimized objective function. The culmination of these steps is the Euclidean reconstruction, which, by using the solved motion structure parameters, obtains the 3D coordinates of each point, achieving a reliable reconstruction process. This comprehensive approach not only overcomes the deficiencies of prior models but also sets a new benchmark for accuracy and reliability in non-rigid 3D image reconstruction.
Experimental Results
A. Experimental Parameters
Experiment data is the Stanford 40 Actions dataset from the Stanford Image Library, which is a large-scale image and video database created by the Stanford Vision Lab. This database contains multiple datasets, covering different tasks in fields such as computer vision and machine learning.
The Stanford 40 Actions dataset contains approximately 9500 video clips, of which 6000 feature male subjects and 3500 female subjects. The age distribution comprises 2500, 5000, and 2500 clips for adolescents, young adults, and elderly people, respectively, and the dataset covers 40 different categories of human actions. This dataset is therefore used to test the reconstruction performance of the proposed algorithm. To ensure the effectiveness of the test, the experimental parameters are set as shown in Table 1.
B. Impact of the Image Number on the Proposed Model
Taking dynamic facial images as an example to verify the impact of the image number on the proposed algorithm [24], a set of varying camera intrinsic parameters is first generated.
Then, 100 3D points within a unit sphere are randomly generated and divided into three rigid body elements: the first consists of the first 50% of the space points, the second of the middle 30%, and the third of the last 20%. Simultaneously, the external parameter matrix of the camera is varied to generate a sequence of images. The reprojection error is defined as \begin{align*} e=\frac {1}{mn}\sum \nolimits _{j=1}^{n} \sum \nolimits _{i=1}^{m} {\frac {1}{\lambda _{i,j}}\left \|{{ \mathbf {m}_{i,j}-\left ({{ \mathbf {P}_{i}\left ({{\begin{array}{l} X_{i,j} \\ 1 \end{array}}}\right ) }}\right ) }}\right \|} \tag {24}\end{align*}
In Equation (24), e represents the reprojection error, i and j denote the image and point indices, m_{i,j} is the observed image point, P_i is the projection matrix of the ith image, X_{i,j} is the reconstructed 3D point, and λ_{i,j} is the corresponding depth factor.
Therefore, using this indicator, the impact of the image number on the proposed model is analyzed to demonstrate the reconstruction effect. The experimental results are shown in Figure 1.
The physical meaning of Equation (24) is that it calculates the average error between the reprojected image points and the actual image points. A smaller reprojection error indicates a higher reconstruction accuracy, while a larger error implies lower reconstruction accuracy. Additionally, to investigate the impact of the number of space points on the algorithm, following the same procedure as above, the camera’s internal parameters were varied, while the number of space points ranged from 20 to 150. These space points were divided into three rigid body elements according to the aforementioned ratio. Using these 3D space points, 150 images were generated, and 1 pixel of Gaussian noise was added to each image. The algorithm was run 100 times for each number of space points, and the average reprojection error was calculated, as shown in Figure 2.
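Equation (24) can be computed directly. Below is a minimal NumPy sketch; the function name and the array shapes are assumptions for illustration.

```python
import numpy as np

def mean_reprojection_error(m_obs, P, X, lam):
    """Average reprojection error of Equation (24).

    m_obs : (m, n, 3) observed homogeneous image points
    P     : (m, 3, 4) projection matrices, one per image
    X     : (m, n, 3) reconstructed 3D points
    lam   : (m, n) depth factors
    """
    m_imgs, n_pts = lam.shape
    total = 0.0
    for i in range(m_imgs):
        for j in range(n_pts):
            Xh = np.append(X[i, j], 1.0)          # homogeneous 3D point
            proj = P[i] @ Xh                      # reprojected point
            total += np.linalg.norm(m_obs[i, j] - proj) / lam[i, j]
    return total / (m_imgs * n_pts)
```

For a perfect reconstruction the reprojected points coincide with the observations and the error is zero, matching the interpretation of Equation (24) given in the text.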
From Figures 1 and 2, it can be seen that as the number of spatial points and images increases, the reprojection error of our algorithm first increases and then decreases. When the number of spatial points and images is small, the number of equations is close to the number of unknowns, making the solution relatively unstable and more susceptible to noise, which results in a larger residual error. Conversely, when there are more spatial points and images, the number of equations exceeds the number of unknowns, the solving process is more overconstrained and stable, and the residual error is correspondingly smaller. Hence, as the number of spatial points and images increases, the residual error first increases, then decreases, and eventually stabilizes. This indicates that the number of images affects the proposed model: the more images there are, the smaller the reprojection error and the better the reconstruction effect of the proposed algorithm.
C. Analysis of the Impact of Depth Factor Values on Capturing Global Feature Quantity
In non-rigid 3D image reconstruction, the value of the depth factor determines the field of view of camera imaging. Based on the requirements of the high-speed camera used in this article, the range of depth factor values is set to [0.01, 0.04]. One image is randomly selected from the Stanford 40 Actions dataset and imaged at different depth factor values, and the number of global features that can be captured is counted; the more features captured, the better the subsequent reconstruction effect. The captured global feature counts are shown in Table 2.
According to the results presented in Table 2, as the depth factor increases, there is an upward trend in the number of global features that can be captured. This trend is attributable to the fact that, as the depth factor rises, distant objects within the camera's imaging range become encompassed within the field of view, expanding the number of global features that can be identified and captured. Therefore, the depth factor values set above are effective and provide reliable support for subsequent 3D image reconstruction, improving the reconstruction effect.
D. Reconstruction Results
Any two samples are selected from the moving facial images, with three frames taken from each.
The selected three frames were reconstructed using the algorithm proposed in this paper, and the results are shown in Figure 4.
Additionally, to further illustrate the superior performance of the proposed algorithm in non-rigid 3D image reconstruction, five comparison algorithms from literature [7] based on deep learning, literature [8] based on regularization operators, literature [9] based on translation kernel factorization, literature [10] based on projection, and literature [11] based on constrained bilateral smoothing and dynamic mode decomposition were selected. These five comparison algorithms were used to reconstruct the selected images in the study, and the results are shown in Figures 5–9.
Reconstruction model based on constrained bilateral smoothing and dynamic mode decomposition.
By analyzing Figures 4–9, it can be concluded that the proposed algorithm can effectively restore the 3D structure and motion of non-rigid bodies. The reconstruction results of the selected image are basically consistent with the initial image, indicating that the proposed algorithm can effectively achieve the goal of 3D non-rigid body reconstruction.
There are certain differences between the reconstruction results of the five comparative algorithms and the initial image. Among them, the reconstruction results based solely on deep learning differ only slightly from the initial image, but exhibit angle issues and missing details. Comparing the reconstruction results of the five comparative algorithms with those of the proposed algorithm shows that the robustness and reconstruction effect of the proposed algorithm are superior. The main reason is that the proposed algorithm addresses the robustness degradation caused by the continuously changing shape basis in non-rigid motion images. In contrast to the fixed shape basis computation applied in the comparative algorithms, the proposed algorithm determines the low-rank matrix of the dynamic shape basis for non-rigid 3D image reconstruction, describes the data structure of the dynamic shape basis variables, captures local and global features of the shape, and better represents the motion process. Therefore, it achieves higher reconstruction robustness and effectiveness than the comparative algorithms.
To further validate the local reconstruction results of the proposed algorithm, AIDR 3D reconstruction technology, an advanced medical image processing technique that uses iterative methods to reconstruct high-quality 3D images, was selected for comparison. This technology is based on a series of mathematical and physical principles, including filtering, backprojection, and reconstruction algorithms, and can shorten imaging time, reduce radiation dose, and lower imaging costs. Taking the first set of male image sequences as an example, AIDR 3D reconstruction and the proposed algorithm were used to reconstruct the frontal, lateral, and top views of the local nasal tip; the compared results are shown in Figures 10 and 11.
Comparison of frontal, lateral, and overhead views of the reconstructed nasal tip using the proposed algorithm.
Comparison of frontal, lateral, and overhead views of the reconstructed nasal tip using the AIDR 3D algorithm.
In Figures 10 and 11, the red circles and lines represent the locally reconstructed nasal tip image using the proposed algorithm, while the blue dots and lines represent the local feature points of the nasal tip in the dynamic facial image. From Figure 10, it can be observed that the locally reconstructed nasal tip image using the proposed algorithm conforms to the local features of the nasal tip in dynamic facial images. From Figure 11, it can be observed that the frontal image of the nasal tip reconstructed locally using AIDR 3D reconstruction technology is basically consistent with the proposed algorithm, and can well represent the local feature points of the nasal tip. However, when reconstructing the side and top views, the reconstruction results are significantly different from the local feature points of the nasal tip, and the reconstruction effect is not good compared to the proposed algorithm. This indicates that the reconstructed nasal tip of the proposed algorithm can better reflect the characteristics of high nasal bridges in European and American individuals, proving the effectiveness of the proposed algorithm.
In order to visually illustrate the effect of incorporating a constraint term for non-rigid motion and velocity variation into the proposed algorithm, the left eye reconstruction results of the proposed algorithm were compared with those of five contrastive algorithms, as shown in Figure 12.
Left eye reconstruction results of the proposed algorithm and five comparison algorithms.
In Figure 12, the red circles represent the left eye reconstruction results under the constraint of non-rigid motion and velocity variation, while the blue dots indicate the relative positions of the eye and eyebrow. Based on the results in Figure 12, it can be observed that the proposed algorithm clearly reflects the left eye reconstruction results under the constraint of non-rigid motion and velocity variation, which is consistent with the reconstructed model results presented in this paper, and the relative positions of the eye and eyebrow indicated by the blue dots are relatively accurate. This indicates that the incorporation of a constraint for non-rigid motion and velocity variation in the proposed algorithm results in a more continuous deformation of the non-rigid body, leading to a more accurate reconstruction.
E. Reconstruction Performance
Back projection error refers to the error generated during the projection process from a 3D model to a 2D image. It measures the difference between the pixel values projected by the model onto the image plane and the actual pixel values. Compared to reprojection error, it focuses more on the accuracy of individual pixel points and is more detailed, and it can be used to evaluate the accuracy and reliability of reconstruction algorithms. Therefore, this article uses the variation of back projection error during the iteration process to demonstrate the accuracy and reliability of the algorithm. The back projection error is defined as:\begin{equation*} \sigma =\frac {\left \|{{ \mathbf {W}-\mathbf {W}_{\mathrm {r}} }}\right \|_{F}}{\left \|{{ \mathbf {W} }}\right \|_{F}}\times 100\% \tag {25}\end{equation*}
In Equation (25), $\mathbf{W}$ is the original measurement matrix, and $\mathbf{W}_{\mathrm{r}}$ is the measurement matrix recovered from the reconstruction.
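Equation (25) is a direct ratio of Frobenius norms, so it is straightforward to compute. A minimal sketch (the matrices here are synthetic placeholders, not the paper's data):

```python
import numpy as np

def back_projection_error(W, W_r):
    """Relative back projection error (Eq. 25): sigma = ||W - W_r||_F / ||W||_F * 100%."""
    return np.linalg.norm(W - W_r, ord="fro") / np.linalg.norm(W, ord="fro") * 100.0

# Example: a reconstruction that deviates from the measurement matrix
# by about 1% additive noise yields a back projection error near 1%.
rng = np.random.default_rng(0)
W = rng.standard_normal((10, 8))
W_r = W + 0.01 * rng.standard_normal((10, 8))
print(round(back_projection_error(W, W_r), 2))  # roughly 1% for 1% noise
```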
Back projection errors obtained under the proposed algorithm and five comparative algorithms.
From Figure 13, it can be seen that when the number of iterations reaches 1000, the average back projection errors of the reconstruction algorithms based on deep learning, regularization operators, translation kernel decomposition, projection reconstruction, and constrained bilateral smoothing and dynamic mode decomposition are 0.59%, 0.80%, 0.90%, 1.32%, and 1.51%, respectively. In contrast, the average back projection error of the proposed algorithm is only 0.33%.
This shows that the proposed algorithm has the smallest back projection error among the six algorithms, i.e., its reconstruction results are more accurate and reliable. The reason is that, before reconstruction, the proposed algorithm combines image points and depth factors to construct a low-rank matrix describing the dynamic non-rigid shape basis. The algorithm can effectively recover the evolution of the non-rigid shape basis and optimizes the construction of the objective function using manifold alignment and physical continuity constraints. The L-M nonlinear optimization method is then used to solve for the key initial values of the shape basis. This further reduces the back projection error of the reconstruction results, bringing them closer to the actual shape changes and making them more accurate and reliable, so the algorithm can effectively achieve accurate reconstruction of non-rigid 3D images.

To further verify the reconstruction performance of the proposed algorithm, the ShapeNet 3D dataset was selected for a comparative test of the back projection error against the five comparison algorithms above. Five sets of images, each containing 100 images, were randomly selected from this dataset for 3D reconstruction. The back projection error of each algorithm's reconstruction is shown in Table 3.
According to the results in Table 3, the proposed algorithm also performs well on this dataset, with an average back projection error of 0.30%. The reconstruction algorithms based on deep learning, regularization operators, translation kernel decomposition, projection reconstruction, and constrained bilateral smoothing and dynamic mode decomposition have average back projection errors of 0.60%, 0.88%, 1.01%, 1.32%, and 1.59%, respectively. The proposed algorithm thus maintains a low back projection error, good reconstruction accuracy, and reliable reconstruction quality. The reason is that the proposed algorithm uses image points and depth factors to form a low-rank matrix, which better captures the changing characteristics of the dynamic non-rigid shape basis. The low-rank representation effectively reduces the dimensionality of the original data while retaining the important information, which reduces information loss during reconstruction and improves its accuracy. In addition, manifold alignment and physical continuity constraints keep the shape changes during reconstruction within a reasonable range and avoid unreasonable distortion or deformation, which helps maintain the authenticity and stability of the reconstruction results and reduces the back projection error.
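The dimensionality-reduction benefit of a low-rank representation can be sketched with a plain truncated SVD. This is only an illustration of the low-rank idea on synthetic data, not the paper's exact construction from image points and depth factors:

```python
import numpy as np

def low_rank_approx(W, k):
    """Best rank-k approximation of W (in Frobenius norm) via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(1)
# Synthetic measurement matrix that is (nearly) rank 3, mimicking a small shape basis
W = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))
W += 1e-3 * rng.standard_normal(W.shape)  # small observation noise

W3 = low_rank_approx(W, 3)
err = np.linalg.norm(W - W3, "fro") / np.linalg.norm(W, "fro") * 100
print(f"rank-3 back projection error: {err:.3f}%")  # small, since W is nearly rank 3
```

Keeping only the leading rank-3 structure discards the noise component while preserving the dominant shape variation, which is the mechanism the text appeals to.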
The Structural Similarity Index (SSIM) is a metric for measuring the degree of similarity between two images. It can be used to assess quality loss in images and is widely used in image processing and compression algorithms. It evaluates similarity by comparing three components: luminance, contrast, and structure. The luminance term compares the average brightness of the images, the contrast term compares their contrast, and the structure term compares their structural information. All three components account for the characteristics of human visual perception, so the evaluation agrees better with human judgments of image similarity. The SSIM value typically ranges from -1 to 1, where 1 indicates that the two images are identical, 0 indicates no similarity, and -1 indicates complete dissimilarity. To verify how well the 3D moving facial images reconstructed by the proposed algorithm preserve detail features, the detail-restoration performance of the algorithm was compared with that of the five comparison algorithms through SSIM analysis. As in Figure 3, the moving face image of Sample 1 was selected as the sample, 68 facial feature points were extracted, and the SSIM indices of the images obtained by the proposed algorithm and the five comparison algorithms were compared, as shown in Figure 14.
SSIM index of images obtained by the proposed algorithm and five comparison algorithms.
From Figure 14, it can be observed that, over the 68 facial feature points, the SSIM indices of the images obtained by the reconstruction algorithms based on deep learning, regularization operators, translation kernel decomposition, projection reconstruction, and constrained bilateral smoothing and dynamic mode decomposition reach 0.962, 0.125, 0.569, 0.124, and 0.397, respectively.
When the proposed algorithm is used for non-rigid 3D image reconstruction, the SSIM index reaches a maximum of 0.998, close to 1, indicating that the difference between the reconstructed image and the moving facial image is small. This is because the proposed algorithm uses image points and depth factors to form a low-rank matrix describing the dynamic non-rigid shape basis, which better captures facial motion and deformation features, and optimizes the objective function with manifold alignment and physical continuity constraints, providing reliable support for the subsequent reconstruction and effectively keeping the reconstruction results consistent with the real facial motion.
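The luminance-contrast-structure combination described above can be sketched in its simplest, single-window form (practical SSIM, as in the standard definition, averages this over local sliding windows; the image here is a synthetic placeholder):

```python
import numpy as np

def ssim_global(x, y, L=255.0):
    """Single-window SSIM combining luminance, contrast, and structure terms.
    C1 and C2 are the usual stabilizing constants for dynamic range L."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / ((mx**2 + my**2 + C1) * (vx + vy + C2))

rng = np.random.default_rng(2)
img = rng.uniform(0, 255, (32, 32))
print(round(ssim_global(img, img), 3))  # 1.0 for identical images
```

An image compared with itself scores exactly 1, and any brightness, contrast, or structural difference pulls the score below 1, which is why values near 1 (such as the 0.998 reported here) indicate a faithful reconstruction.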
Curvature consistency refers to the degree of similarity, in terms of curvature properties, between the reconstructed result and the real image. Curvature values are typically used to measure this similarity: the closer the curvature of the reconstruction is to that of the real image, the better the reconstruction algorithm captures the curvature features of the real image and the more accurately it restores the details of curvature changes, indicating good reconstruction performance and accurate non-rigid 3D image reconstruction. To verify the degree of detail restoration of non-rigid 3D images reconstructed by the proposed algorithm, 8 images were randomly selected from the Stanford 40 Actions dataset, and the curvature values of the images reconstructed by the proposed algorithm were compared with those of the five comparison algorithms on the curvature consistency index, as shown in Figure 15.
Curvature values of images reconstructed by the proposed algorithm and the five comparison algorithms.
From Figure 15, it can be seen that for all 8 selected images, there is a significant difference between the curvature values of the images reconstructed by the algorithms based on deep learning, regularization operators, translation kernel decomposition, projection reconstruction, and constrained bilateral smoothing and dynamic mode decomposition and those of the actual images, whereas the curvature values of the images reconstructed by the proposed algorithm are consistent with those of the actual images. This demonstrates the effectiveness of the proposed algorithm in improving detail restoration in 3D image reconstruction, thereby effectively enhancing the quality of image restoration. This is because the proposed algorithm optimizes the construction of the objective function through manifold alignment and physical continuity constraints, making the reconstruction results better match the shape and characteristics of real objects, and uses the L-M nonlinear optimization method to solve for the shape-basis key initial values, better guiding and optimizing the reconstruction process and improving detail restoration.
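The paper does not specify how curvature is computed; one standard discrete formula for a sampled planar curve (e.g., a reconstructed facial contour) is the parametric curvature $\kappa = |x'y'' - y'x''| / (x'^2 + y'^2)^{3/2}$, sketched here on a synthetic circle where the answer is known in closed form:

```python
import numpy as np

def curve_curvature(x, y):
    """Discrete curvature kappa = |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2)
    along a sampled curve, using central differences in the interior."""
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    return np.abs(dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5

# Sanity check: a circle of radius 2 has constant curvature 1/2
t = np.linspace(0, 2 * np.pi, 400)
kappa = curve_curvature(2 * np.cos(t), 2 * np.sin(t))
print(round(float(kappa[200]), 3))  # 0.5 away from the endpoints
```

Comparing such curvature profiles between a reconstructed contour and the ground-truth contour gives the per-image consistency measure used in Figure 15.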
Detailed Discussion
A. Experimental Description
In the course of this work, the non-rigid 3D image reconstruction algorithm based on deformable shape-basis reliability was subjected to three sets of experiments.
Set 1: This experiment investigates the impact of the number of moving facial images on the proposed algorithm, using moving facial images as an example. First, variable camera intrinsic parameters are generated
Set 2: In the first step, 68 feature points were marked on the moving face using a white marker pen, as shown in Figure 16.
Facial feature points marked with a white marker pen in a certain frame of the image sequence.
These marker points are used as features for extraction, as shown in Figure 16. They cover essentially all areas of the face, including parts such as the eye sockets and nose bridge that change little during facial expressions, as well as regions like the mouth. A larger number of points was placed around the mouth in this first step, so that these points, which change the most during facial expressions, can be displayed more clearly.
The second step is to capture images with the SONY HDR-XR150E high-speed camera, whose 4.2-megapixel resolution allows the coordinates of these feature points to be extracted accurately. Approximately 10 seconds of facial expression changes were recorded, and 150 frames were then captured from the footage using professional image capture software.
In step 3, MATLAB was used to extract the 2D coordinates of these feature points in each frame, forming a measurement matrix
In step 4, a 3D reconstruction process based on variable shape-basis reliability was carried out on the measurement matrix. For each feature point, the 3D coordinates were obtained across the 150 frames. After reconstructing all feature points, a matrix consisting of the 68 feature points was formed, with each feature point represented by its 3D coordinates as a
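The paper does not spell out its stacking convention for the measurement matrix; a common structure-from-motion layout stacks the per-frame x- and y-coordinates of all tracked points into a $2F \times P$ matrix. A sketch with the frame and point counts used in this experiment (the coordinates here are random placeholders, not the tracked data):

```python
import numpy as np

# F frames and P feature points, matching the experiment (150 frames, 68 points)
F, P = 150, 68
rng = np.random.default_rng(3)
u = rng.uniform(0, 640, (F, P))  # x-coordinate of each feature point per frame
v = rng.uniform(0, 480, (F, P))  # y-coordinate of each feature point per frame

# Interleave rows as (u_f; v_f) per frame into the 2F x P measurement matrix
W = np.empty((2 * F, P))
W[0::2] = u
W[1::2] = v
print(W.shape)  # (300, 68)
```

Each column of this matrix then traces one feature point's image trajectory over the sequence, which is the input to the shape-basis factorization.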
Set 3: This experiment evaluates the performance of non-rigid 3D image reconstruction, with indicators including back projection error, similarity index, and curvature consistency. First, non-rigid 3D image data were obtained and projection data were recorded from different angles. Using the collected projection data, the images were reconstructed with the proposed algorithm and the five comparison algorithms, and the reconstructed images were compared with the moving facial images. The 68 feature points marked on the moving faces were loaded into the non-rigid 3D image reconstruction program based on deformable shape-basis reliability, and the experimental research was conducted.
B. Results Discussion
In the first set of experiments, as the number of spatial points and images increases, the reprojection error first increases and then decreases. This is mainly because when there are many spatial points and images, the number of equations exceeds the number of unknowns, so the solving process is more over-constrained and stable and the residual is relatively small. Hence, as the number of spatial points and images increases, the residual first increases, then decreases, and eventually stabilizes [25].
In the second set of experiments, the proposed algorithm effectively restores the 3D structure and motion of non-rigid bodies, and the reconstruction results for the selected images are essentially consistent with the initial images. This is mainly because, when training the motion-structure parameters of non-rigid objects, the selected objective function takes the image model as its subject and minimizes the difference between its 2D coordinates and the image manifold group. When a non-rigid object is in motion, the variation of its motion-structure parameters between adjacent frames is very small, and this physical continuity provides constraints for obtaining the motion-structure parameter matrix. The proposed algorithm builds a highly reliable objective function from these two aspects, effectively solving the reliability degradation caused by the continuously changing shape basis in non-rigid motion images, and thus achieves higher reconstruction reliability.

The proposed algorithm also has good local reconstruction performance. For the local image of the nasal tip, it obtains local features that match the motion of the facial images, better reflecting the high nasal-tip characteristics of European and American faces. This is mainly because an appropriate reconstruction algorithm was selected and its parameters were chosen and optimized correctly, which effectively yields good local reconstruction results. The reconstruction of the left eye under the velocity-variation constraint of non-rigid motion is consistent with the reconstruction of the moving face image, and the relative position of the eye and eyebrow is comparatively accurate.
This is mainly because the proposed algorithm incorporates physical laws into the motion parameter matrix of the non-rigid body and thereby obtains a constraint describing the velocity change during non-rigid motion. It takes as its objective the minimization of the error between the points transformed by the manifold motion group and the actual measurement points, together with two constraints describing the displacement and velocity change of the parameters between adjacent frames during non-rigid motion. By adding the velocity-variation constraint, the deformation of the non-rigid body becomes more continuous and the position of the reconstructed 3D structure is more accurate [26].
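The displacement and velocity-change constraints described above amount to penalizing the first and second differences of the motion parameters over frames. A hypothetical sketch (the function name, weights, and coefficient matrix are illustrative, not the paper's exact objective):

```python
import numpy as np

def continuity_penalty(S, lam_d=1.0, lam_v=1.0):
    """Hypothetical smoothness terms for S, an (F, k) matrix of shape-basis
    coefficients over F frames: penalize frame-to-frame displacement
    (first difference) and velocity change (second difference)."""
    disp = np.diff(S, axis=0)             # parameter change between adjacent frames
    vel_change = np.diff(S, n=2, axis=0)  # change of that change (acceleration)
    return lam_d * np.sum(disp**2) + lam_v * np.sum(vel_change**2)

# Coefficients drifting at constant velocity incur no velocity-change penalty
S_lin = np.linspace(0.0, 1.0, 10)[:, None] * np.ones((1, 3))
print(continuity_penalty(S_lin, lam_d=0.0))  # essentially zero
```

Adding such terms to the data-fitting error biases the solver toward motions whose parameters evolve smoothly, which is what makes the reconstructed deformation more continuous.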
In the third set of experiments, the back projection error of the proposed algorithm was the smallest, indicating that the low-rank approximation of the measurement matrix obtained by the algorithm is more accurate. This is mainly because the algorithm effectively controls the complexity of the model and reduces the possibility of overfitting: by retaining the main low-rank structures, it describes the essential features of the data without being affected by excessive noise and detail. The SSIM index obtained for non-rigid 3D image reconstruction with this algorithm is close to 1, showing that the difference between the reconstructed image and the moving face image in Figure 3 is small. This is mainly because the algorithm solves the parameter problem with nonlinear optimization; selecting appropriate initial values helps accelerate the convergence of the optimization and avoid getting stuck in local optima, and providing an initial value close to the optimal solution improves the similarity between the two images, thereby reducing the differences in the non-rigid 3D images. The curvature values of the reconstructed images are consistent with those of the actual images, which is effective in improving detail restoration in 3D image reconstruction and thus the quality of non-rigid 3D image restoration. This is mainly because the algorithm optimizes the construction of the objective function through manifold alignment and physical continuity constraints, making the reconstruction results better match the shape and characteristics of real objects.
The L-M nonlinear optimization method is used to solve for the shape-basis key initial values, better guiding and optimizing the reconstruction process, improving detail restoration, and thus improving image quality [27].
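The Levenberg-Marquardt (L-M) step can be illustrated with an off-the-shelf solver; this is a toy curve-fitting problem standing in for the paper's shape-basis objective, to show how a good initial value feeds the optimizer (the model and data here are invented for the example):

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic observations from y = 2 * exp(-1.5 * t) plus small noise
rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 30)
y = 2.0 * np.exp(-1.5 * t) + 0.01 * rng.standard_normal(t.size)

def residuals(p):
    """Residual vector to be minimized; p = (a, b) in a * exp(b * t)."""
    a, b = p
    return a * np.exp(b * t) - y

# method="lm" selects the Levenberg-Marquardt algorithm; the initial
# guess plays the role of the key initial values discussed in the text.
fit = least_squares(residuals, x0=[1.0, -1.0], method="lm")
print(np.round(fit.x, 2))  # close to [2.0, -1.5]
```

As the discussion notes, starting from an initial value near the true solution lets L-M converge quickly and avoids poor local minima.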
Conclusion
This paper presents a novel reconstruction algorithm for non-rigid 3D images, leveraging a low-rank matrix derived from image points and depth factors to accurately represent dynamic non-rigid shapes. By incorporating an improved method for defining the nonlinear objective function and selecting initial values, alongside classical nonlinear optimization techniques, the algorithm effectively reconstructs the 3D structure and parameter matrices for non-rigid forms. Its key strengths include optimized objective function and initial value computation, and a precise approach to transformation matrix calculation, ensuring consistent image treatment. Simulation results confirm the algorithm’s high reliability and its success in minimizing back projection errors, highlighting its potential to advance non-rigid 3D image reconstruction.