Introduction
Optical synthetic aperture imaging is an important approach to high-resolution imaging. It overcomes the aperture size limits of monolithic primary mirror systems while offering lighter weight and lower manufacturing cost. However, phase differences between sub-apertures degrade the optical interference and damage image quality. To achieve optimal imaging performance, piston sensing technology, which plays a critical role in synthetic aperture imaging, is used to co-phase the sub-apertures.
The present piston sensing methods can be categorized into two types: specific optics-based methods and image-based methods. The modified Shack-Hartmann sensor [1], dispersed fringe sensor [2] and pyramid sensor [3] measure piston errors from pupil information modulated by specially designed hardware, which inevitably increases the system complexity. The image-based methods, such as phase diversity [4] and phase retrieval [5], measure piston errors by analyzing the intensity distribution in the image plane, so a simpler optical design with significantly less hardware can produce the required image data. However, although image-based piston sensing makes the sparse aperture system much more compact, it requires a large amount of iterative optimization and therefore fails to realize instant correction.
With the advent of the era of artificial intelligence, neural network methods have successfully resolved many optical imaging problems, piston sensing included [6], [7], [8]. Unlike conventional optimization methods, a neural network allows a computational model to describe a specific problem holistically by use of massive training data, which makes prediction more efficient. In effect, the time spent on iterative optimization during online sensing is transferred to the offline training stage. For piston sensing, the data-driven mode enables a multi-layer network to extract abstract features from the Point Spread Function (PSF), so as to establish the mapping between image information and pistons. Guerra-Ramos et al. demonstrated the feasibility of a deep learning-based piston sensing method in simulation. They trained two shallow convolutional neural networks for fine phasing and coarse phasing respectively, adopting a multi-wavelength approach to resolve the 2π ambiguity problem and expand the capture range [9]. Subsequently, piston sensing methods using a single network were developed, which can predict pistons directly from the raw broadband focal image with only one iteration in the testing phase [10], [11]. This end-to-end mode makes faster sensing realizable.
Benefiting from the strong parallel computing capability of the graphics processing unit (GPU), the large number of mathematical operations in deep networks and other iterative algorithms can be executed efficiently. However, due to the inherent limitation of the von Neumann architecture in hardware processing speed, the traditional electrical neural network has encountered bottlenecks. In addition, the image acquisition speed of the CCD also hinders real-time sensing with image-based methods. Recently, diffractive optical elements (DOEs) have been used to build deep learning frameworks, with progress reported in specific tasks including image classification [12], [13], object detection [14], and pupil phase retrieval [15], among others. The parallel computing capability and high-speed optical transmission allow these tasks to be executed more efficiently.
Inspired by research on diffractive networks, we propose a real-time piston sensing method based on an all-optical diffractive neural network (ODNN). The principle of the all-optical piston sensing is described in Section II. The simulation results and some discussions are presented in Section III. By using a three-layer phase-only ODNN, a numerical blind testing accuracy of 97% is achieved for a two-aperture imaging system, which corresponds to fine piston sensing with an average RMSE of 0.024λ. In addition, it is shown that a well-trained network is robust to aberration. Finally, concluding thoughts and relevant discussions are offered in Section IV.
Method Description
Unlike the electrical neural network method, which needs intensity images recorded by a CCD as input data, an ODNN composed of several DOEs can process the light field at the speed of light without any image acquisition or hardware processing. As shown in the conceptual illustration of the piston sensing model in Fig. 1, the collimated incident light, which is sampled by a pupil with two sub-apertures and carries the piston information, is focused on the first layer of the designed network. The interference light field is then modulated by the diffractive units and transformed into a specific intensity distribution that corresponds to the predicted piston value. The whole piston sensing process is based on free-space optical propagation and is implemented with all-optical computing, thus achieving light-speed sensing. Note that there is no imaging device in the practical piston sensing module. For real-time sensing, several photosensitive elements will be placed at the output plane in future experimental research. They will convert the optical signal into a voltage signal that actuates the fast reflecting mirror to compensate the piston errors.
Conceptual illustration of the piston sensing model using all-optical diffractive network.
The training flow diagram of the ODNN-based method is shown in Fig. 2. To begin with, the simulation model of a sparse aperture imaging system is constructed to produce the complex amplitude of the PSF, which is stored as two components: amplitude distribution and phase distribution. The diffractive network is then trained to learn the mapping between the complex-valued PSF and the intensity distribution at the output plane. Through iterative optimization, the optimum predicted intensity distributions are obtained as the loss function converges. In the computer-based training and testing, the simulated complex-valued PSF of the sparse aperture system is generated and saved as the input dataset. Once the network parameters are determined and the physical system is constructed, the digital input is no longer required.
Schematic of training procedure of the ODNN-based piston sensing method. The subregions numbered 0-9 (R0-R9) on the output plane correspond to 10 subranges (P0-P9) of the piston sensing range.
According to the synthetic imaging principle, the generalized pupil function at the monochromatic wavelength λ can be expressed as:
\begin{equation*}
A\left( {{{x}_0},{{y}_0}} \right) \!=\! \sum\limits_{n = 1}^N {{{A}_{sub}}({{x}_0} - {{x}_n},{{y}_0} - {{y}_n})} \exp \left(\frac{{2\pi i}}{\lambda }OP{{D}_n}\right). \tag{1}
\end{equation*}
where N is the number of sub-apertures, A_sub is the pupil function of a single sub-aperture, (x_n, y_n) is the center of the n-th sub-aperture, and OPD_n is its optical path difference (piston). The complex amplitude at the focal plane is given by the Fourier transform (FT) of the generalized pupil function:
\begin{equation*}
U\left( {{{x}_f},{{y}_f}} \right) = FT\left\{ {A({{x}_0},{{y}_0})} \right\}. \tag{2}
\end{equation*}
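Eqs. (1) and (2) can be sketched numerically as follows. This is a minimal illustration, not the simulation code used in this paper: the grid size, sub-aperture diameter and separation (all in pixels), and the function names `two_aperture_pupil` and `focal_field` are assumptions chosen for the example, and the piston is applied to the second sub-aperture only.

```python
import numpy as np

def two_aperture_pupil(n=256, sub_diam_px=32, sep_px=64, piston_waves=0.0):
    """Generalized pupil of Eq. (1) for N = 2 circular sub-apertures.

    piston_waves is OPD / lambda, applied to the second sub-aperture.
    Grid size, diameter and separation (pixels) are illustrative values.
    """
    y, x = np.indices((n, n)) - n // 2
    pupil = np.zeros((n, n), dtype=complex)
    centers = [(-sep_px // 2, 0), (sep_px // 2, 0)]
    opd_waves = [0.0, piston_waves]
    for (xc, yc), p in zip(centers, opd_waves):
        mask = (x - xc) ** 2 + (y - yc) ** 2 <= (sub_diam_px / 2) ** 2
        pupil += mask * np.exp(2j * np.pi * p)
    return pupil

def focal_field(pupil):
    """Eq. (2): complex amplitude at the focal plane via a centered 2-D FFT."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))

field = focal_field(two_aperture_pupil(piston_waves=0.25))
# The two components stored as network input: amplitude and phase.
amplitude, phase = np.abs(field), np.angle(field)
```

With a piston of 0.5λ the two sub-apertures are in antiphase, so the on-axis amplitude of the sketched PSF vanishes, which is the expected interference behavior.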
For a monochromatic source, the capture range (−0.5λ, 0.5λ) can be divided into 10 subranges at 0.1λ intervals, numbered P0-P9: {P0: (−0.5λ, −0.4λ), P1: (−0.4λ, −0.3λ), P2: (−0.3λ, −0.2λ), P3: (−0.2λ, −0.1λ), P4: (−0.1λ, 0), P5: (0, 0.1λ), P6: (0.1λ, 0.2λ), P7: (0.2λ, 0.3λ), P8: (0.3λ, 0.4λ), P9: (0.4λ, 0.5λ)}. Correspondingly, the output plane is divided into 10 subregions (R0-R9). For example, when the piston of the system belongs to subrange P0: (−0.5λ, −0.4λ), the well-trained ODNN will ideally focus the maximum optical signal on subregion 0 (R0), as in the target shown in Fig. 2.
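The subrange labeling can be written as a short function. This is a sketch under one assumption that the text does not specify: a piston that falls exactly on a boundary is assigned to the upper subrange. The name `piston_to_label` is hypothetical.

```python
import numpy as np

def piston_to_label(piston_waves, n_classes=10, lo=-0.5, hi=0.5):
    """Map a piston (in units of lambda) to its subrange index P0..P9.

    Assumption: subranges are treated as half-open [lower, upper).
    """
    width = (hi - lo) / n_classes
    idx = np.floor((np.asarray(piston_waves) - lo) / width).astype(int)
    # Clip so that values at the very end of the capture range stay in P9.
    return np.clip(idx, 0, n_classes - 1)

labels = piston_to_label([-0.45, -0.18, 0.45])
```

For example, a piston of −0.18λ is assigned to P3, so the trained network should concentrate its output on R3.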
Forward propagation of information is realized through diffraction between adjacent layers. Following the Rayleigh-Sommerfeld diffraction equation, each neuron is treated as a secondary source whose wave can be written as:
\begin{equation*}
w_i^l\left( {x,y,z} \right) = \frac{{z - {{z}_i}}}{{{{r}^2}}}\left( {\frac{1}{{2\pi r}} + \frac{1}{{j\lambda }}} \right)\exp \left( {\frac{{j2\pi r}}{\lambda }} \right), \tag{3}
\end{equation*}
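where r = √((x − x_i)² + (y − y_i)² + (z − z_i)²) is the distance from the i-th neuron of the l-th layer, located at (x_i, y_i, z_i) and acting as the source, to the point (x, y, z), and j is the imaginary unit. The transmission coefficient of this neuron consists of an amplitude term and a phase term: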
\begin{equation*}
t_i^l\left( {{{x}_i},{{y}_i},{{z}_i}} \right) = a_i^l\left( {{{x}_i},{{y}_i},{{z}_i}} \right)\exp \left( {j\varphi _i^l\left( {{{x}_i},{{y}_i},{{z}_i}} \right)} \right), \tag{4}
\end{equation*}
The output of the i-th neuron in the l-th layer is the product of its input complex amplitude information and transmission coefficients:
\begin{equation*}
u_i^l\left( {x,y,z} \right) = w_i^l\left( {x,y,z} \right)t_i^l\left( {{{x}_i},{{y}_i},{{z}_i}} \right)\sum_k {u_k^{l - 1}\left( {{{x}_i},{{y}_i},{{z}_i}} \right)}, \tag{5}
\end{equation*}
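where the sum over k runs over the neurons of the (l − 1)-th layer and gives the input complex amplitude at the i-th neuron. For a network with M diffractive layers, the intensity at the output plane, i.e., the (M + 1)-th layer, is: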
\begin{equation*}
S_i^{M + 1} = {{\left| {u_i^{M + 1}} \right|}^2}. \tag{6}
\end{equation*}
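The forward model of Eqs. (3)-(6) can be sketched with a direct summation over neurons. This is an illustrative, small-scale version rather than the implementation used for the results: a 16 × 16 grid replaces the 400 × 400 layers, the phases are random, the input field is a uniform placeholder, and the area element `pitch**2` is an added assumption for discretizing the diffraction integral. The names `rs_kernel` and `forward` are hypothetical.

```python
import numpy as np

def rs_kernel(src_xy, dst_xy, dz, wavelength):
    """Eq. (3) between every source neuron (columns) and target point (rows)."""
    dx = dst_xy[:, None, 0] - src_xy[None, :, 0]
    dy = dst_xy[:, None, 1] - src_xy[None, :, 1]
    r = np.sqrt(dx**2 + dy**2 + dz**2)
    return (dz / r**2) * (1 / (2 * np.pi * r) + 1 / (1j * wavelength)) \
        * np.exp(2j * np.pi * r / wavelength)

def forward(field_in, phases, xy, dz, wavelength, pitch):
    """Propagate a field through phase-only layers; return output intensity.

    field_in: complex field arriving at the first layer, shape (n,)
    phases:   list of per-layer phase arrays, each of shape (n,)
    """
    w = rs_kernel(xy, xy, dz, wavelength) * pitch**2  # assumed area element
    u = field_in
    for phi in phases:
        # Eqs. (4)-(5) with unit amplitude: modulate, then diffract onward.
        u = w @ (np.exp(1j * phi) * u)
    return np.abs(u) ** 2  # Eq. (6)

n_side, pitch = 16, 15e-6  # small grid for illustration only
c = (np.arange(n_side) - n_side / 2 + 0.5) * pitch
xy = np.stack(np.meshgrid(c, c), axis=-1).reshape(-1, 2)
rng = np.random.default_rng(0)
phases = [rng.uniform(0, 2 * np.pi, n_side**2) for _ in range(3)]
field_in = np.ones(n_side**2, dtype=complex)
intensity = forward(field_in, phases, xy, dz=0.11, wavelength=600e-9, pitch=pitch)
```

The dense kernel matrix scales with the square of the number of neurons, so a full-size simulation would more plausibly use an FFT-based propagator; the direct sum is kept here because it mirrors the equations term by term.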
If the number of pixels at the output plane is K, the loss function is defined as the mean squared error between the intensity on the output plane, S_k^{M+1}, and the target intensity, g_k^{M+1}:
\begin{equation*}
E = \frac{1}{K}\sum\nolimits_k {{{\left( {S_k^{M + 1} - g_k^{M + 1}} \right)}^2}}. \tag{7}
\end{equation*}
In the error backpropagation process, a stochastic gradient descent algorithm is used to iteratively update the network parameters until the loss function is minimized. Most of the output light is then focused on the target region, which achieves the mapping from the imaging optical field to piston values. The final parameters of the model are saved, and the architecture of the diffractive network used to perform the piston sensing task is obtained. Once the model is physically fabricated, the piston sensing task can be performed at light speed.
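The loss of Eq. (7), read as a per-pixel mean squared error over the K output pixels, can be sketched as below. The target layout is an assumption made only for illustration: ten square boxes arranged as 2 rows × 5 columns on a 20 × 50 grid, with unit intensity in the target box. The actual arrangement of R0-R9 is the one shown in Fig. 2, and the names `target_map` and `mse_loss` are hypothetical.

```python
import numpy as np

def target_map(label, shape=(20, 50), box=6):
    """Hypothetical target g: unit intensity inside one of 10 boxes (2 x 5)."""
    g = np.zeros(shape)
    row, col = divmod(label, 5)
    r0, c0 = row * 10 + 2, col * 10 + 2
    g[r0:r0 + box, c0:c0 + box] = 1.0
    return g

def mse_loss(intensity, target):
    """Mean squared error between output intensity and target over K pixels."""
    return np.mean((intensity - target) ** 2)

perfect = mse_loss(target_map(3), target_map(3))   # all light in the right box
wrong = mse_loss(target_map(4), target_map(3))     # all light in a wrong box
```

In training, this scalar would be differentiated with respect to the layer phases by an automatic differentiation framework; that step is omitted here.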
Results
To demonstrate the feasibility of the proposed ODNN-based piston sensing method, several simulations have been carried out. First, we model a two-aperture imaging system with a 600 nm monochromatic light source, 10 mm diameter sub-apertures, and a 2 m focal length, according to the configuration of the multi-aperture system in our lab. For each subrange, 1000 random values are generated as pistons to produce the input data of the network, and these are divided into 800 training samples and 200 testing samples. This yields a training dataset with 8000 samples and a testing dataset with 2000 samples.
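The dataset split described above can be sketched as follows. The text only states that the pistons are random, so uniform sampling within each subrange is an assumption, as are the seed and the name `make_piston_split`.

```python
import numpy as np

def make_piston_split(n_classes=10, per_class=1000, n_train=800, seed=0):
    """Draw pistons (in units of lambda) per subrange and split train/test.

    Assumption: pistons are uniformly distributed within each subrange.
    Returns ((train_pistons, train_labels), (test_pistons, test_labels)).
    """
    rng = np.random.default_rng(seed)
    edges = np.linspace(-0.5, 0.5, n_classes + 1)
    pistons = rng.uniform(edges[:-1, None], edges[1:, None],
                          size=(n_classes, per_class))
    labels = np.repeat(np.arange(n_classes)[:, None], per_class, axis=1)
    train = (pistons[:, :n_train].ravel(), labels[:, :n_train].ravel())
    test = (pistons[:, n_train:].ravel(), labels[:, n_train:].ravel())
    return train, test

(train_p, train_y), (test_p, test_y) = make_piston_split()
```

Each piston would then be passed through the imaging model to produce the complex-valued PSF that forms one input sample.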
The ODNN used in this paper comprises three transmissive diffractive layers. Each neuron has a size of 15 μm, and each layer consists of 400 × 400 = 160000 neurons, giving a layer size of 6 mm × 6 mm. The distance between adjacent layers is 11 cm, and the first layer is located at the focal plane of the multi-aperture imaging system. The network is trained on a desktop computer with an NVIDIA GTX 1080 Ti graphics card. The parameters are optimized using the AMSGrad algorithm [16] with a learning rate of 0.01 and a batch size of 64. When the training stage is completed, the phase parameters of each diffractive layer are determined, and a diffractive network capable of executing the piston sensing task all-optically is obtained. In the numerical simulations, the trained phase parameters can take any value between 0 and 2π. Because the diffractive units will be fabricated with an eight-level binary optical technique, we finalize the phase values using eight-step quantization to account for this manufacturing constraint. All the simulation results in this paper are obtained with these quantized phase parameters.
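Eight-step quantization of the trained phases can be sketched as below. The paper does not state the quantization rule, so rounding to the nearest of eight equally spaced levels in [0, 2π) is an assumption, and `quantize_phase` is a hypothetical name.

```python
import numpy as np

def quantize_phase(phi, levels=8):
    """Round phases to the nearest of `levels` equally spaced steps in [0, 2*pi).

    Assumption: uniform levels 0, 2*pi/levels, ..., with wrap-around at 2*pi.
    """
    step = 2 * np.pi / levels
    wrapped = np.mod(phi, 2 * np.pi)
    # A phase just below 2*pi rounds up to level `levels`, which wraps to 0.
    return (np.round(wrapped / step) % levels) * step

rng = np.random.default_rng(0)
phi = rng.uniform(0, 2 * np.pi, size=(400, 400))   # stand-in for trained phases
phi_q = quantize_phase(phi)
```

Under this rule the quantization error of any neuron is at most π/8 in phase, which is one way to gauge how far the fabricated layers can deviate from the trained ones.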
The case without aberration is investigated first. In the numerical testing, the testing dataset with 2000 samples is input to the ODNN, and the corresponding predicted intensity distributions are output. The confusion matrix for 1000 testing samples randomly selected from the testing dataset is shown in Fig. 3(a). It gives a statistical overview of correct and incorrect classifications for each subrange. According to the simulation results, the blind testing accuracy of the subrange prediction is approximately 97%.
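The accuracy and confusion matrix can be computed from the energy collected in each subregion, as sketched below. The function names and the toy energies are illustrative assumptions; the prediction rule (the subregion with the maximum optical signal) follows the description of the output plane.

```python
import numpy as np

def predict_labels(region_energy):
    """Pick the subregion (R0..R9) collecting the most energy for each sample."""
    return np.argmax(region_energy, axis=1)

def confusion_matrix(true_labels, pred_labels, n_classes=10):
    """Rows are ground-truth subranges, columns are predicted subranges."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(true_labels), np.asarray(pred_labels)), 1)
    return cm

def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    return np.trace(cm) / cm.sum()

# Toy example with 3 classes: the last sample is misclassified as class 1.
energy = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7],
                   [0.1, 0.5, 0.4]])
cm = confusion_matrix([0, 1, 2, 2], predict_labels(energy), n_classes=3)
```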
The confusion matrices of testing results (a) without aberration and (b) with aberration.
Here we take the middle value of the predicted subrange as the piston sensing result. For example, when the predicted subrange is P9: (0.4λ, 0.5λ), the estimated piston value is taken as 0.45λ. When the predicted label is consistent with the ground-truth label, the maximum deviation between the estimated result and the ground-truth piston value is 0.05λ. For incorrect predictions, the confusion matrix in Fig. 3(a) shows that the wrongly predicted labels are adjacent to the ground-truth labels, which means the maximum deviation is no more than 0.15λ. The statistical result indicates that the average RMSE for the testing samples is about 0.024λ. The histograms in Fig. 4 display the intensity ratios of the different subregions in 10 blind tests where the predicted labels are 0 to 9 respectively. There is a remarkable intensity contrast between the expected label and the others. This contrast is a meaningful index for the subsequent piston correction stage, as it greatly reduces the possibility of misjudgment and supports the subrange prediction accuracy.
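The midpoint estimate and the two deviation bounds quoted above can be checked with a few lines. The function names are hypothetical, and no attempt is made here to reproduce the reported RMSE, which depends on the actual test predictions.

```python
import numpy as np

def label_to_piston(label, n_classes=10, lo=-0.5, hi=0.5):
    """Midpoint of the predicted subrange, in units of lambda."""
    width = (hi - lo) / n_classes
    return lo + (np.asarray(label) + 0.5) * width

def rmse(estimate, truth):
    """Root-mean-square error between estimated and ground-truth pistons."""
    return np.sqrt(np.mean((np.asarray(estimate) - np.asarray(truth)) ** 2))

# Correct label: a true piston at the edge of P9 is off by half a subrange.
worst_correct = abs(label_to_piston(9) - 0.5)
# Adjacent wrong label: true piston at the upper edge of P8, predicted P7.
worst_adjacent = abs(label_to_piston(7) - 0.4)
```

The first case gives 0.05λ and the second 0.15λ, matching the bounds stated in the text.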
Intensity distribution ratios of 10 subregions for imaging system without aberrations when the predicted labels are 0 to 9, respectively. The pink bar in each histogram corresponds to energy distribution percentage of the target subregion.
This paper focuses on piston sensing, so we have so far assumed that the other wavefront aberrations are eliminated. However, tip-tilt and sub-aperture aberrations cannot be corrected entirely. Thus, slight distortions of 0.05λ, generated using the 2nd to 15th Zernike coefficients, are introduced into the imaging system. The Zernike coefficients of the aberrations loaded on the two sub-apertures are generated randomly. The data generation approach is similar to that in the simulation without aberration: 1000 samples are generated in each subrange, giving 10000 samples over the 10 subranges, of which 8000 form the training dataset and 2000 the testing dataset. Numerical simulation is carried out to evaluate the impact of aberration on this method. According to the confusion matrix shown in Fig. 3(b), the testing accuracy reaches 97% as well. For a piston value of −0.18λ, which belongs to subrange 3, the intensity distributions at the output plane without aberration and with aberration are presented in Fig. 5(a) and (b) respectively. In both cases almost all the light energy is focused on the 10 defined subregions. Compared with the intensity distribution without aberration, where there is little background noise outside the subregions R0-R9, the noise in the undesired regions is relatively greater when aberration exists. However, this background noise does not interfere with the subregion recognition.
The intensity distributions at the output plane for the simulated samples (a) without aberrations and (b) with aberrations.
To further evaluate the robustness of the method against different aberration coefficients, testing data with various aberration coefficients are input to the network. The corresponding testing results are shown in Fig. 6. When the aberration coefficient is 0.05λ, the testing accuracy is 97%. When the aberration coefficient is 0.03λ, 0.04λ, 0.06λ or 0.07λ, the testing accuracy decreases slightly, to varying degrees. Taking the case with 0.07λ aberration as an example, where the lowest accuracy of 89% is obtained, we present the corresponding confusion matrix of the testing result in Fig. 7. The average RMSE for the testing samples is calculated to be about 0.031λ. Compared with the RMSE of 0.024λ in the case with 0.05λ aberrations, an RMSE of 0.031λ is still acceptable, which indicates that the proposed method has good robustness against various aberrations.
Two aspects of the diffractive network design for the piston sensing task deserve discussion. The first is the neuron size. A larger neuron size reduces the difficulty of fabrication and of the transverse alignment of the network layers in experiment, where the transverse alignment error refers to the offsets between the centers of the diffractive layers and the optical axis of the imaging system. However, for our multi-aperture imaging system, the PSF only occupies an area of hundreds of microns. A larger neuron discretizes the complex amplitude information into fewer pixels, and the resulting low sampling rate leads to failure in network training. According to the hyper-parameter tuning results, the neuron size is set to 15 μm in our simulations. It should be noted that the transverse alignment error has a serious influence on the sensing accuracy. When the transverse alignment error is equal to one neuron size, the accuracy decreases from 97% to 54%. Thus, high-precision translation stages are needed to keep the transverse alignment error within one neuron size.
The second is the axial distance between adjacent layers. On the one hand, the distance needs to be large enough for the light beam to cover more neurons in forward propagation, which guarantees a higher degree of connectivity in the network. On the other hand, a larger distance improves the tolerance of the network to the axial alignment error, which refers to the deviation from the pre-set distance between adjacent diffractive layers. As shown in Fig. 8, when the deviation of the axial distance between successive layers is 1.25 mm, the accuracy of the method decreases from 97% to 41% for a distance of 4 cm and to 61% for a distance of 8 cm, whereas a high accuracy of 92% is still achieved when the distance is 11 cm. A reasonable choice of parameters can therefore effectively improve the piston sensing performance of the network.
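The sampling argument for the neuron size can be made concrete with a rough estimate. The calculation below assumes that the extent of the PSF is set by the Airy pattern of a single 10 mm circular sub-aperture (first-null radius 1.22λf/D); the interference fringes of the two-aperture system lie inside this envelope. The variable names are illustrative.

```python
wavelength = 600e-9   # m, monochromatic source
focal_length = 2.0    # m
sub_diameter = 10e-3  # m, sub-aperture diameter
pitch = 15e-6         # m, neuron size

# First-null radius of the Airy pattern of one circular sub-aperture.
airy_radius = 1.22 * wavelength * focal_length / sub_diameter
# Number of neurons spanning the central lobe (its full diameter).
neurons_across = 2 * airy_radius / pitch
# Physical side length of a 400 x 400 layer.
layer_side = 400 * pitch

print(f"Airy radius: {airy_radius * 1e6:.1f} um")
print(f"Neurons across central lobe: {neurons_across:.1f}")
print(f"Layer side: {layer_side * 1e3:.1f} mm")
```

Under this assumption the central lobe is roughly 293 μm wide, so about 20 neurons of 15 μm sample it, and the 400 × 400 layer spans 6 mm, consistent with the stated layer size. Doubling the neuron size would halve the number of samples across the lobe.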
The effect on accuracy of alignment error in axial distance with different distance parameters d.
Conclusion and Discussion
In conclusion, this paper has demonstrated that an ODNN is capable of implementing piston sensing in an all-optical manner. The designed three-layer phase-only ODNN can directly convert the imaging optical field into an intensity distribution that represents the piston value. Fine phasing is achieved both without aberration and with aberration. As this is a preliminary feasibility study, there are still some restrictions on the all-optical realization of piston sensing. Firstly, because a classification model is adopted instead of numerical fitting for the sake of real-time performance, the method becomes more intricate for systems with more apertures. This may be resolved by using multiple diffractive networks to estimate the piston value between every two sub-apertures. Additionally, networks with smaller neurons and a more compact design will be used in future research. Secondly, other, better piston representation models need to be explored to address the difficulties of piston sensing with extended objects.