Introduction
Deep learning is a machine learning method that allows computers to perform complex tasks through artificial neural networks (ANNs) [1]. As the scale of integrated circuits (ICs) has expanded, deep learning has witnessed significant advancements in applications such as image recognition [2], autonomous driving [3], signal processing [4], and natural language processing [5]. However, with the rapid growth of data processing requirements, IC-based deep learning faces challenges such as high energy consumption and long processing time [6], [7].
Optical diffractive neural networks (ODNNs) have attracted extensive research interest for their high energy efficiency and low latency [8], [9], [10], [11]. Compared to IC-based Von Neumann solutions such as CPUs and GPUs, which must frequently read/write network parameters and intermediate data from memory, ODNNs use neuromorphic photonic neurons that intrinsically achieve neuron interconnection and activation. Users only need to load the image at the input layer and then read the result at the output layer; the neural network calculations are performed automatically with almost no latency. In this way, the latency, power consumption, and throughput of an ODNN may be greatly improved over IC-based solutions.
Utilizing passive diffractive devices, such as 3D-printed surfaces with variable thicknesses [9] or etched metasurfaces with variable patterns [12], researchers have achieved commendable results in tasks like image recognition. After fabrication of these passive devices, only the illuminating light and the photodetectors (PDs) consume power, so these schemes are usually more power-efficient than conventional IC-based neural networks [9], [12]. However, the neurons of the passive diffractive devices in an ODNN are usually on the subwavelength scale, so precise fabrication of their thickness or pattern can become a major challenge [11]. Additionally, the performance of an ODNN is susceptible to misalignment and manufacturing errors, so an additional electrical network layer at the backend is used to compensate for these errors [13], [14], [15], [16]. Compared with passive diffractive devices, a reconfigurable ODNN (R-ODNN) is able to compensate for production errors and perform different tasks without remanufacturing. Various reconfigurable ODNNs have been demonstrated using active devices, such as digital micromirror devices (DMDs) [15], [16], spatial light modulators (SLMs) [17], [18], [19], and reprogrammable metasurfaces with homebuilt microwave antenna transceivers [20]. These designs exhibit remarkable reconfigurability and accuracy. However, unlike passive designs [14], [15], these reconfigurable active layers require a continuous power supply, which may significantly increase power consumption. Recently, phase-change materials (PCMs) have been widely used in silicon photonic integrated platforms, providing an attractive opportunity for non-volatile photonic applications such as filtering [21] and optical matrix computation [22], [23]. While the photonic integration approach has significant potential benefits, a free-space reconfigurable and non-volatile ODNN is still lacking.
In this work, we propose and design a reconfigurable digital all-optical diffractive neural network (ROD2NN) powered by phase-change material (PCM). The PCM-based scheme provides a robust platform with convenient regulation and non-volatile modulation: the network can be reprogrammed for different image recognition tasks without remanufacturing its structure or supplying constant energy. All programmable optical neurons share an identical, simple structure, potentially enhancing consistency in large-scale manufacturing. Depending on whether the built-in PCM is in the completely amorphous or completely crystalline state, an optical neuron demonstrates a phase shift of 0 or π, respectively [24]. Compared with intermediate states, switching to a completely crystalline/amorphous state may be more robust. Since the optical neurons have only binary states, the network is digitalized [25]. In the training process, we employ an efficient gradient optimization algorithm based on the variable density method, treating neurons as continuously varying entities. To ensure the digitization of the final optical neurons, we apply a dynamic binarization penalty factor. Featuring three diffractive layers, each containing 120 × 120 neurons, and ten photodetectors as output, our model attains a recognition accuracy of 93.8% on the MNIST handwritten digit dataset. The performance is further enhanced to 94.46% with the incorporation of a correcting resistor network. Since the correcting resistor network only performs a linear matrix multiplication, it can be implemented with passive linear electronics. Given that both the optical and electrical parts of the network are passive, the static power consumption is exceptionally low, comprising only the light source, photodetectors, transimpedance amplifiers, and operational amplifiers. We then discuss the manufacturing and assembly robustness of this device.
We find that the reconfigurable correcting resistor network may also improve resilience to misalignment errors. We also train on the Fashion-MNIST dataset to demonstrate the reconfigurability of the network.
Digital Diffractive Neural Network Structure
Fig. 1 shows the structure of the ROD2NN. It consists of three digital diffractive layers, a photodetector layer, and a correcting layer. Each diffractive layer is 120 μm × 120 μm in size and consists of 120 × 120 non-volatile digital optical neurons. The distance between neighboring diffractive layers is 50 μm. The photodetector layer consists of ten photodetectors and converts optical information into electrical information. The surface area of each PD is 1 μm × 1 μm. The simple correcting layer is implemented with resistors and performs a 10 × 10 linear matrix operation. Fig. 1(b) shows the equivalent neural network mathematical structure. It consists of three hidden linear fully-connected layers, a nonlinear activation layer, and a 10 × 10 linear fully-connected output layer, which model the PCM diffractive layers, the photodetectors, and the correcting layer, respectively. The wavelength of the incident light is 1.55 μm.
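As a rough illustration of this equivalent mathematical structure, the PyTorch sketch below mirrors the three binary-phase layers, the intensity readout at the detectors, and the 10 × 10 correcting layer. The class name, the placeholder propagation operator, and the detector masks are our own illustrative assumptions, not the authors' code.

```python
import torch


class ROD2NN(torch.nn.Module):
    """Illustrative skeleton of the equivalent network in Fig. 1(b).

    Each diffractive layer is an element-wise binary phase mask followed by
    free-space diffraction (abstracted here as a placeholder linear operator);
    the photodetector layer is the nonlinear |.|^2 readout.
    """

    def __init__(self, n=120, n_classes=10):
        super().__init__()
        # Density variables rho in [0, 1]; the neuron phase is rho * pi,
        # so rho becomes binary (0 or 1) after training.
        self.rho = torch.nn.ParameterList(
            [torch.nn.Parameter(torch.rand(n, n)) for _ in range(3)])
        self.correct = torch.nn.Linear(n_classes, n_classes, bias=False)

    def propagate(self, field):
        # Placeholder for angular-spectrum diffraction between layers.
        return torch.fft.ifft2(torch.fft.fft2(field))

    def forward(self, field, detector_mask):
        for rho in self.rho:
            field = field * torch.exp(1j * torch.pi * rho)  # binary phase neuron
            field = self.propagate(field)
        intensity = field.abs() ** 2                         # photodetection
        # Sum the intensity falling on each of the ten detector areas.
        currents = (intensity.unsqueeze(1) * detector_mask).sum(dim=(-2, -1))
        return self.correct(currents)                        # correcting layer
```

In a faithful model, `propagate` would apply the angular-spectrum transfer function between layers and `detector_mask` would encode the ten 1 μm × 1 μm PD areas.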
The Schematics of the R-ODNN. (a) The physical structure of the R-ODNN. (b) The equivalent neural network mathematical structure. (c) Partial top view of the diffractive layer. (d) Single neuron structure and parameters. (e) Forward propagation and neuron transmission function in the equivalent neural network. The behavior of a single neuron is shown in the orange box. The blue line represents the diffracted transmission of light from the upper layer to the next. The accumulation sign “Σ” represents the interference of light at the input surface of an optical neuron.
Fig. 1(c) and (d) show the basic structure of the optical neuron. Optical PCM fills rectangular etched holes in the silicon dioxide substrate. Each etched hole has a depth of 1.1 μm and a side length of 0.67 μm, arranged on a 1 μm × 1 μm grid in the diffractive layer. In this case, the transmittance phase shift difference of a neuron between the amorphous and crystalline states is equal to π. For light with a wavelength of 1.55 μm, the refractive indices of the Sb2Se3 phase-change material in the completely amorphous and completely crystalline states are 3.28 and 4.05, respectively [24], [26], which allows the neurons to have a large phase modulation between the two states. The intrinsic absorption of Sb2Se3 at this wavelength is low in both states, so the neuron mainly modulates the phase of the transmitted light.
The forward propagation process of the layers in the equivalent neural network is shown in Fig. 1(e). In this model, each neuron is treated as a point receiver and transmitter. When light passes through a neuron, it acquires an amplitude and phase modulation determined by the neuron's state:
\begin{align*}
&n_i^l = m_i^l\ T_i^l \tag{1}\\
&T_i^l = k_i^l\ \exp \left( {j\theta _i^l} \right)\tag{2}
\end{align*}
\begin{align*}
& {{{\mathcal{N}}^l}\ \left( {{{f}_x},{{f}_y}} \right) = \ {\mathcal{F}}\left( {{{n}^l}\left( {x,y} \right)} \right)} \tag{3}\\
& {{{\mathcal{M}}^{l + 1}}\ \left( {{{f}_x},{{f}_y}} \right) = {{\mathcal{N}}^l}\ \left( {{{f}_x},{{f}_y}} \right) \odot {\mathcal{H}}\left( {{{f}_x},{{f}_y}} \right)} \tag{4}\\
&{{{m}^{l + 1}}\ \left( {x,y} \right) = {{\mathcal{F}}^{ - 1}}\ \left( {{{\mathcal{M}}^{l + 1}}\left( {{{f}_x},{{f}_y}} \right)} \right)} \tag{5}
\end{align*}
\begin{equation*}
{\mathcal{H} ( {{{f}_x},{{f}_y}}) = {{e}^{ik{\rm{\Delta }}z\sqrt {1 - {{{\left( {\lambda {{f}_x}} \right)}}^2} - {{{\left( {\lambda {{f}_y}} \right)}}^2}} }}} \tag{6}
\end{equation*}
\begin{equation*}
{m_i^{l + 1}\ \left( {x,y} \right) = n_i^l\ \left( {x^{\prime},y^{\prime}} \right)*{{\mathcal{F}}^{ - 1}}\left[ {\mathcal{H}\left( {{{f}_x},{{f}_y}} \right)} \right]} \tag{7}
\end{equation*}
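Equations (3)–(6) describe standard angular-spectrum free-space propagation, which can be sketched in a few lines of NumPy. The function name and arguments are illustrative (the paper's simulations use PyTorch):

```python
import numpy as np


def angular_spectrum(field, dz, wavelength, pixel):
    """Propagate a complex field n^l(x, y) by a distance dz, following (3)-(6).

    field: square complex array sampled on a grid with the given pixel pitch.
    """
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pixel)                    # spatial frequencies f_x, f_y
    FX, FY = np.meshgrid(fx, fx, indexing="ij")
    # Transfer function H of (6); the complex sqrt makes evanescent components decay.
    arg = (1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2).astype(complex)
    H = np.exp(1j * (2 * np.pi / wavelength) * dz * np.sqrt(arg))
    # (3)-(5): FFT, multiply by H, inverse FFT.
    return np.fft.ifft2(np.fft.fft2(field) * H)
```

For the parameters in the text (λ = 1.55 μm, 1 μm pixel pitch, Δz = 50 μm), a uniform input field simply acquires the plane-wave phase e^{ikΔz}, which is a convenient sanity check of the transfer function.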
The cascaded correcting layer is a fully-connected layer that further boosts accuracy and compensates for system errors. In the physical model, this operation may be implemented with a variable resistor network or a digital signal processor. The inputs of this network are the response currents of the ten photodetectors, and the corrected outputs are obtained through the linear matrix operation
\begin{equation*}
{{R}_{10}} = {{W}_{10 \times 10}}\ {{I}_{10}} \tag{8}
\end{equation*}
Because the correcting layer is mainly used to compensate for the errors of the optical layers, we train the optical layers and the correcting layer separately, dividing the training process into two stages: the optical stage and the correcting stage. In the optical stage, we optimize the binary states of the optical neurons to improve the recognition rate of the optical part. In the correcting stage, we cascade the corrective section onto the optical network and use it as the final output of the neural network; in this stage, we freeze the states of the optical layers and only optimize the linear matrix weights of the correcting layer.
In the optical stage, we optimize the recognition rate of the optical part. The training process is shown in Fig. 2(a). The forward complex electric field diffraction process is described by (1)–(5). We simulate the process using the PyTorch tensor computing package. We then use the cross-entropy between the softmax-normalized photodetector intensities and the one-hot-encoded training label as the accuracy loss function. We use the backpropagation (BP) method to calculate the gradient of the loss function by
\begin{align*}
&\frac{{\partial Loss}}{{\partial n_i^l}} = \frac{{\partial Loss}}{{\partial n_i^{l + 1}}}\ \cdot T_i^{l + 1}*{{\mathcal{F}}^{ - 1}}\left[ {\mathcal{H}\left( {{{f}_x},{{f}_y}} \right)} \right]\tag{9}\\
&\frac{{\partial Loss}}{{\partial \theta _i^l}} = \frac{{\partial Loss}}{{\partial n_i^l}}\ \cdot m_i^lk_i^l\exp \left( {j\left( {\theta _i^l + \frac{\pi }{2}} \right)} \right)\tag{10}
\end{align*}
\begin{equation*}
\theta _i^l = \left( {1 - \rho _i^l} \right)\ \cdot 0 + \rho _i^l \cdot \pi \tag{11}
\end{equation*}
\begin{equation*}
\frac{{\partial Loss}}{{\partial \rho _i^l}} = \frac{{\partial Loss}}{{\partial \theta _i^l}}\ \cdot \frac{{\partial \theta _i^l}}{{\partial \rho _i^l}} \tag{12}
\end{equation*}
(a) The training process of the optical layers of the equivalent neural network in the first stage. (b) The physical structure and the schematics of the correcting resistor network. The 10 × 10 fully connected layer is the equivalent mathematical model of the resistance correction network.
The calculations in (9)–(12) can be easily implemented with the automatic differentiation tool of the PyTorch package. To drive the neurons in the optimization results toward binary states, we also introduce a binarization penalty function:
\begin{align*}
&P_i^l = {{{\left( {\rho _i^l - 0} \right)}}^2}\ + {{{\left( {\rho _i^l - 1} \right)}}^2}\tag{13}\\
&\frac{{dP_i^l}}{{d\rho _i^l}} = \ 4\rho _i^l - 2\tag{14}
\end{align*}
where the gradient of the penalty function is added to the accuracy-loss gradient through a dynamic penalty coefficient $T$, giving the total gradient \begin{equation*}
g_i^l = \frac{{\partial Loss}}{{\partial \rho _i^l}}\ + T \cdot \frac{{dP_i^l}}{{d\rho _i^l}}\tag{15}
\end{equation*}
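As a minimal sketch (function and variable names are ours), the density parameterization (11), the penalty (13), and the combined gradient (15) can be handed to PyTorch's autograd in one step:

```python
import math

import torch


def total_gradient(rho, accuracy_loss, T):
    """Gradient of the combined objective, following (11)-(15).

    rho: density variables in [0, 1]; accuracy_loss: callable mapping the
    phase map theta to the scalar accuracy loss; T: dynamic penalty coefficient.
    """
    rho = rho.clone().detach().requires_grad_(True)
    theta = rho * math.pi                                    # (11): theta = rho * pi
    penalty = ((rho - 0.0) ** 2 + (rho - 1.0) ** 2).sum()    # (13)
    (accuracy_loss(theta) + T * penalty).backward()          # (12), (14), (15) via autograd
    return rho.grad
```

Note that at rho = 0.5 the penalty gradient 4·rho − 2 vanishes, so the penalty only pushes densities that have already drifted toward one of the binary states.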
In the second stage, we cascade the corrective section onto the optical network to further improve accuracy. As Fig. 3(b) shows, the responsive current of each photodetector is an input to the corrective section, and the outputs of the ten adders give the final classification result of the neural network. During the training process, we fix the optical network model parameters and only train the correction layer. We again use softmax normalization and the cross-entropy loss function to calculate the loss, and the Adam optimizer to optimize the weights of this layer. As shown in Fig. 3, the accuracy of this correction-assisted network is boosted to 94.6% after 230 epochs. The initial values of the correcting section are randomly set, so the accuracy drops during the first few epochs.
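Since the optical network is frozen, this second stage reduces to ordinary softmax-regression training of the 10 × 10 layer on the detector currents. A compact PyTorch sketch, with illustrative names and hyperparameters, is:

```python
import torch

# Sketch of the second training stage: the optical network is frozen, so the
# photodetector currents are fixed features and only W_{10x10} of (8) is learned.
correcting = torch.nn.Linear(10, 10, bias=False)             # W_{10x10}
optimizer = torch.optim.Adam(correcting.parameters(), lr=1e-2)
criterion = torch.nn.CrossEntropyLoss()                      # softmax + cross-entropy


def correcting_step(pd_currents, labels):
    """One optimization step; pd_currents: (batch, 10), labels: (batch,)."""
    optimizer.zero_grad()
    loss = criterion(correcting(pd_currents), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the model is linear and the loss is cross-entropy, this stage is a convex problem, which is why it can also be retrained cheaply online to compensate for assembly errors.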
Measured network accuracy with and without correcting the resistor network. The first 300 epochs use a dynamic penalty coefficient to achieve binarized weights.
Results and Discussion
We perform a 3D finite-difference time-domain (FDTD) full-vector simulation to verify the feasibility and accuracy of the neural network. In the ideal diffraction model above, each neuron is treated as a point receiver and transmitter; it does not consider the impact of the physical structure of the optical neurons on the incident light. In the FDTD simulation, we use the real physical structures to obtain a more realistic optical transmission process. We use commercial software (Lumerical FDTD: 3D Electromagnetic Simulator) for the simulation and directly set the PCM refractive indices in the amorphous and crystalline states to 3.28 and 4.05, respectively. We then obtain the input/output intensity distribution of each diffractive layer through the frequency-domain field profile monitor. An example of the results with the pattern “5” as input is shown in Fig. 4(a).
(a) The intensity of light on each layer. The output diffraction pattern and the position of photodetectors are shown on the right. (b) The results of optical diffraction model calculation and FDTD full-vector simulation for handwritten digit classification. The energy distribution simulated by the two methods matches well in most situations.
Using a similar approach, we simulated the intensity distribution at the photodetector layer under different input patterns and compare the output intensity distributions of the ideal diffraction model and the FDTD simulation in Fig. 4(b). The output results of the two simulation models are very similar, which supports the accuracy of the ideal diffraction model. However, minor hot spots in the final diffraction patterns could interfere with the classification results; these problems can be addressed by training the correcting network online.
According to Fig. 4, our ideal diffraction theory-based neural network simulates the actual physical situation well in general. Therefore, we can approximate the accuracy of the physical model with that of the neural network. The accuracy is shown in Fig. 5: the version with the correcting network reaches 94.5%, while the optical-only model achieves 93.6%.
The confusion matrix of the neural network. (a) accuracy without the correcting layer; (b) accuracy with the correcting layer.
We also trained the optical diffraction network at different layer distances. Fig. 6(a) shows the accuracy of the network after training for layer spacings of 30, 35, 40, 50, 55, 60, 65, and 70 μm. The classification accuracy changes by less than 0.03% as the layer distance varies. Considering the assembly process and energy loss, we choose a layer spacing of 50 μm for our model.
(a) The accuracy degradation of the ROD2NN under layer distance error. (b) The accuracy degradation of the ROD2NN under optical axis misalignment. The orange lines show the restoration of performance brought by the correcting layer.
Moreover, we examine assembly and manufacturing errors to further evaluate the robustness of our ROD2NN. Assembly error is caused by the displacement of the diffractive layers. We specified the spacing between diffractive layers to be 50 μm during training, but it is difficult to achieve this exactly during assembly. As shown in Fig. 6(b), the ROD2NN is sensitive to optical axis misalignment, and severe accuracy degradation occurs at a deviation of about 1 μm. However, we can retrain the correcting layer online to compensate for errors after the optical structure of the ROD2NN has been fabricated. In this way, the ROD2NN maintains an accuracy above 85% over a 1 μm tolerance.
We also analyzed the impact of neuron manufacturing errors on network accuracy. Manufacturing errors include errors in neuron size, thickness, and refractive index; these errors lead to changes in the neurons' transmittance and phase shift. We first discuss the accuracy of the ROD2NN under different transmittance ratios between the two neuron states.
Degradation of network accuracy due to the error of the phase shift difference
Fig. 8(a)–(c) show the accuracy degradation of the ROD2NN under neuron edge length, thickness, and refractive index errors, respectively.
The degradation of network accuracy under neuron manufacturing errors: (a) edge length error; (b) thickness error; (c) refractive index error.
Our ROD2NN can perform different identification tasks without remanufacturing. We performed an additional simulation on the Fashion-MNIST dataset to verify this reconfigurability; the network achieved an accuracy of 84.5% for the corrective-section-included setup and 83.7% for the optical-only setup. The ROD2NN achieves this by simply reconfiguring the states of the optical neurons and the weights of the correcting layer. As shown in Fig. 9(b) and (c), the errors for the labels “0 (T-shirt)”, “4 (Coat)”, and “6 (Shirt)” are significantly higher. This is probably because the Fashion-MNIST image classification dataset is more complex than MNIST. Since the current optical network is relatively small, its accuracy can drop when facing complex images; adding more optical layers or increasing the size of each layer would further improve the classification accuracy.
The results of neural network calculation and FDTD vector simulation on the Fashion-MNIST dataset. (a) Diffraction pattern of neural network calculation and FDTD simulation. (b) Confusion matrix of the complete ROD2NN setup trained with the Fashion-MNIST dataset. (c) Confusion matrix of the optical-section-only ROD2NN setup trained with the Fashion-MNIST dataset.
We compare the static power, reconfigurability, and neuron configuration time of our ROD2NN with other ODNNs in Table I. Our scheme, along with other schemes based on active devices [15], [17], [20], can be reprogrammed to adapt to different tasks. However, these active devices, such as DMDs [15], SLMs [15], [17], CMOS [15], and homebuilt microwave antenna transceivers [20], tend to consume more power in the intermediate calculation (excluding I/O). In contrast, passive device-based schemes [9], [12], like ours, only consume energy at the optical I/O, with the transmission part consuming no energy. The state of a PCM neuron can typically be switched using either optical or electrical modulation. Optical methods include serial direct laser writing [34], [35] and parallel pattern exposure [36]; electrical regulation requires arranging heating units and transparent indium tin oxide (ITO) circuits on the diffraction surface [37], [38], so the switching speed of the neurons mainly depends on the heater response time and may be on the millisecond level. In actual experiments, the impact of the circuits and heating elements on the light field must be considered; Fresnel reflections between different materials may be mitigated by coatings designed to reduce reflectivity [39]. Therefore, a comprehensive design of the neuron structure and circuitry is essential when developing experimental schemes.
Conclusion
We propose and design a non-volatile reconfigurable digital optical diffractive neural network based on phase-change material. The network consists of three digital diffractive layers, each 1 μm thick, in which digital neurons are placed on a 1 μm × 1 μm grid with a side length of 0.8 μm. With the resistive correcting network, the R-ODNN achieves 94.46% classification accuracy on the MNIST handwritten digit dataset and remains above 93% under the following errors: optical axis alignment error below 1 μm, neuron edge length error within −22 nm to +18 nm, thickness error within −24 nm to +36 nm, or a refractive index error of the phase-change material within −0.09 to +0.06, demonstrating that our model is tolerant of manufacturing errors.
This work provides a fundamental scheme for an easily reconfigurable, off-line trainable, and on-chip integrable R-ODNN based on PCMs. The binarized neural network in this work highlights that simple, linear structures can exhibit strong performance in optical systems. Moreover, the introduction of nonlinear optical layers may significantly enhance the accuracy of our network; we look forward to discovering new materials or structures that can achieve efficient nonlinear operations, and we anticipate explicit performance results from subsequent physical fabrication experiments.