Introduction
Deep learning is a machine learning method that allows computers to perform complex tasks through artificial neural networks (ANNs) [1]. As the scale of integrated circuits (ICs) has expanded, deep learning has witnessed significant advancements in applications such as image recognition [2], autonomous driving [3], signal processing [4], and natural language processing [5]. However, with the rapid growth of data processing requirements, IC-based deep learning faces challenges such as high energy consumption and long processing time [6], [7].
Optical diffractive neural networks (ODNNs) have attracted extensive research interest for their high energy efficiency and low latency [8], [9], [10], [11]. Compared to IC-based Von Neumann solutions such as CPUs and GPUs, which must frequently read/write network parameters and intermediate data from memory, ODNNs use neuromorphic photonic neurons that intrinsically achieve neuron interconnection and activation. Users only need to load the image at the input layer and then read the result at the output layer; the neural network calculations are performed automatically with almost no latency. In this way, the latency, power consumption, and throughput of an ODNN may be greatly improved over IC-based solutions.
Utilizing passive diffractive devices, such as 3D-printed surfaces with variable thicknesses [9] or etched metasurfaces with variable patterns [12], researchers have achieved commendable results in tasks like image recognition. After fabrication of these passive devices, only the illuminating light and the photodetectors (PDs) consume power, so these schemes are usually more power-efficient than conventional IC-based neural networks [9], [12]. However, the neurons of the passive diffractive devices in an ODNN are usually on the subwavelength scale, so precise fabrication of their thickness or pattern can become a major challenge [11]. Additionally, the performance of an ODNN is susceptible to misalignment and manufacturing errors, so an additional electrical network layer at the backend is used to compensate for these errors [13], [14], [15], [16]. Compared with passive diffractive devices, a reconfigurable ODNN (R-ODNN) is able to compensate for production errors and perform different tasks without remanufacturing. Various reconfigurable ODNNs have been demonstrated using active devices, such as digital micromirror devices (DMDs) [15], [16], spatial light modulators (SLMs) [17], [18], [19], and reprogrammable metasurfaces with homebuilt microwave antenna transceivers [20]. These designs exhibit remarkable reconfigurability and accuracy. However, unlike passive designs [14], [15], these reconfigurable active layers require a continuous power supply, which may significantly increase power consumption. Recently, phase-change materials (PCMs) have been widely used in silicon photonic integrated platforms, providing an attractive opportunity for non-volatile photonic applications such as filtering [21] and optical matrix computation [22], [23]. While the photonic integration approach has significant potential benefits, a free-space reconfigurable and non-volatile ODNN is still lacking.
In this work, we propose and design a reconfigurable digital all-optical diffractive neural network (ROD2NN) powered by phase-change material (PCM). The PCM-based scheme provides a robust platform with convenient regulation and non-volatile modulation: the network can be reprogrammed for different image recognition tasks without remanufacturing its structure or supplying constant energy. All programmable optical neurons share an identical, simple structure, potentially enhancing consistency in large-scale manufacturing. Depending on whether the built-in PCM is in the completely amorphous or completely crystalline state, an optical neuron demonstrates a phase shift of 0 or π, respectively [24]. Compared with intermediate states, switching to a completely crystalline/amorphous state may be more robust. Since the optical neurons have only binary states, the network is digitalized [25]. In the training process, we employ an efficient gradient optimization algorithm based on the variable density method, treating neurons as continuously varying entities. To ensure the digitization of the final optical neurons, we apply a dynamic binarization penalty factor. Featuring three diffractive layers, each containing 120 × 120 neurons, and ten photodetectors as output, our model attains a recognition accuracy of 93.8% on the MNIST handwritten digit dataset. The performance is further enhanced to 94.46% with the incorporation of a correcting resistor network. Since the correcting resistor network only performs a linear matrix multiplication, it can be implemented with passive linear electronics. Given that both the optical and electrical parts of the network are passive, the static power consumption is exceptionally low, comprising only the light source, photodetectors, transimpedance amplifiers, and operational amplifiers. We then discuss the manufacturing and assembly robustness of this device.
We find that the reconfigurable correcting resistor network may also improve resilience to misalignment errors. We also train on the Fashion-MNIST dataset to demonstrate the reconfigurability of the network.
Digital Diffractive Neural Network Structure
Fig. 1 shows the structure of the ROD2NN. It consists of three digital diffractive layers, a photodetector layer, and a correcting layer. Each diffractive layer is 120 μm × 120 μm in size and consists of 120 × 120 non-volatile digital optical neurons. The distance between neighboring diffractive layers is 50 μm. The photodetector layer consists of ten photodetectors and converts optical information into electrical information. The surface area of each PD is 1 μm × 1 μm. The simple correcting layer is implemented with resistors and performs a 10 × 10 linear matrix operation. Fig. 1(b) shows the equivalent neural network mathematical structure. It consists of three hidden linear fully-connected layers, a nonlinear activation layer, and a 10 × 10 linear fully-connected output layer, which model the PCM diffractive layers, the photodetectors, and the correcting layer, respectively. The wavelength of the incident light is 1.55 μm.
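As a rough illustration of this equivalent mathematical structure, the PyTorch sketch below mirrors the three binary-phase layers, the intensity readout at the detectors, and the 10 × 10 correcting layer. The class name, the placeholder propagation operator, and the detector masks are our own illustrative assumptions, not the authors' code.

```python
import torch


class ROD2NN(torch.nn.Module):
    """Illustrative skeleton of the equivalent network in Fig. 1(b).

    Each diffractive layer is an element-wise binary phase mask followed by
    free-space diffraction (abstracted here as a placeholder linear operator);
    the photodetector layer is the nonlinear |.|^2 readout.
    """

    def __init__(self, n=120, n_classes=10):
        super().__init__()
        # Density variables rho in [0, 1]; the neuron phase is rho * pi,
        # so rho becomes binary (0 or 1) after training.
        self.rho = torch.nn.ParameterList(
            [torch.nn.Parameter(torch.rand(n, n)) for _ in range(3)])
        self.correct = torch.nn.Linear(n_classes, n_classes, bias=False)

    def propagate(self, field):
        # Placeholder for angular-spectrum diffraction between layers.
        return torch.fft.ifft2(torch.fft.fft2(field))

    def forward(self, field, detector_mask):
        for rho in self.rho:
            field = field * torch.exp(1j * torch.pi * rho)  # binary phase neuron
            field = self.propagate(field)
        intensity = field.abs() ** 2                         # photodetection
        # Sum the intensity falling on each of the ten detector areas.
        currents = (intensity.unsqueeze(1) * detector_mask).sum(dim=(-2, -1))
        return self.correct(currents)                        # correcting layer
```

In a faithful model, `propagate` would apply the angular-spectrum transfer function between layers and `detector_mask` would encode the ten 1 μm × 1 μm PD areas.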
The Schematics of the R-ODNN. (a) The physical structure of the R-ODNN. (b) The equivalent neural network mathematical structure. (c) Partial top view of the diffractive layer. (d) Single neuron structure and parameters. (e) Forward propagation and neuron transmission function in the equivalent neural network. The behavior of a single neuron is shown in the orange box. The blue line represents the diffracted transmission of light from the upper layer to the next. The accumulation sign “Σ” represents the interference of light at the input surface of an optical neuron.
Fig. 1(c) and (d) show the basic structure of the optical neuron. Optical PCM fills rectangular etched holes in the silicon dioxide substrate. Each etched hole has a depth of 1.1 μm and a side length of 0.67 μm, arranged on a 1 μm × 1 μm grid in the diffractive layer. In this case, the transmittance phase shift difference of a neuron between the amorphous and crystalline states is equal to π. For light with a wavelength of 1.55 μm, the refractive indices of the Sb2Se3 phase-change material in the completely amorphous and completely crystalline states are 3.28 and 4.05, respectively [24], [26], which allows the neurons to have a large phase modulation between the two states. The intrinsic absorption of Sb2Se3 at this wavelength is low in both states, so the neuron mainly modulates the phase of the transmitted light.
The forward propagation process of the layers in the equivalent neural network is shown in Fig. 1(e). In this model, each neuron is treated as a point receiver and transmitter. When light passes through a neuron, it acquires an amplitude and phase modulation determined by the neuron's state:
\begin{align*}
&n_i^l = m_i^l\ T_i^l \tag{1}\\
&T_i^l = k_i^l\ \exp \left( {j\theta _i^l} \right)\tag{2}
\end{align*}
\begin{align*}
& {{{\mathcal{N}}^l}\ \left( {{{f}_x},{{f}_y}} \right) = \ {\mathcal{F}}\left( {{{n}^l}\left( {x,y} \right)} \right)} \tag{3}\\
& {{{\mathcal{M}}^{l + 1}}\ \left( {{{f}_x},{{f}_y}} \right) = {{\mathcal{N}}^l}\ \left( {{{f}_x},{{f}_y}} \right) \odot {\mathcal{H}}\left( {{{f}_x},{{f}_y}} \right)} \tag{4}\\
&{{{m}^{l + 1}}\ \left( {x,y} \right) = {{\mathcal{F}}^{ - 1}}\ \left( {{{\mathcal{M}}^{l + 1}}\left( {{{f}_x},{{f}_y}} \right)} \right)} \tag{5}
\end{align*}
\begin{equation*}
{\mathcal{H} ( {{{f}_x},{{f}_y}}) = {{e}^{ik{\rm{\Delta }}z\sqrt {1 - {{{\left( {\lambda {{f}_x}} \right)}}^2} - {{{\left( {\lambda {{f}_y}} \right)}}^2}} }}} \tag{6}
\end{equation*}
\begin{equation*}
{m_i^{l + 1}\ \left( {x,y} \right) = n_i^l\ \left( {x^{\prime},y^{\prime}} \right)*{{\mathcal{F}}^{ - 1}}\left[ {\mathcal{H}\left( {{{f}_x},{{f}_y}} \right)} \right]} \tag{7}
\end{equation*}
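Equations (3)–(6) describe standard angular-spectrum free-space propagation, which can be sketched in a few lines of NumPy. The function name and arguments are illustrative (the paper's simulations use PyTorch):

```python
import numpy as np


def angular_spectrum(field, dz, wavelength, pixel):
    """Propagate a complex field n^l(x, y) by a distance dz, following (3)-(6).

    field: square complex array sampled on a grid with the given pixel pitch.
    """
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pixel)                    # spatial frequencies f_x, f_y
    FX, FY = np.meshgrid(fx, fx, indexing="ij")
    # Transfer function H of (6); the complex sqrt makes evanescent components decay.
    arg = (1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2).astype(complex)
    H = np.exp(1j * (2 * np.pi / wavelength) * dz * np.sqrt(arg))
    # (3)-(5): FFT, multiply by H, inverse FFT.
    return np.fft.ifft2(np.fft.fft2(field) * H)
```

For the parameters in the text (λ = 1.55 μm, 1 μm pixel pitch, Δz = 50 μm), a uniform input field simply acquires the plane-wave phase e^{ikΔz}, which is a convenient sanity check of the transfer function.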
The cascaded correcting layer is a fully-connected layer that further boosts accuracy and compensates for system errors. In the physical model, this operation may be implemented with a variable resistor network or a digital signal processor. The inputs of this network are the response currents of the ten photodetectors, and the corrected outputs are obtained through the linear matrix operation
\begin{equation*}
{{R}_{10}} = {{W}_{10 \times 10}}\ {{I}_{10}} \tag{8}
\end{equation*}
Because the correcting layer is mainly used to compensate for the errors of the optical layers, we train the optical layers and the correcting layer separately, dividing the training process into two stages: the optical stage and the correcting stage. In the optical stage, we optimize the binary states of the optical neurons to improve the recognition rate of the optical part. In the correcting stage, we cascade the corrective section onto the optical network and use it as the final output of the neural network; in this stage, we freeze the states of the optical layers and only optimize the linear matrix weights of the correcting layer.
In the optical stage, we optimize the recognition rate of the optical part. The training process is shown in Fig. 2(a). The forward complex electric field diffraction process is described by (1)–(5). We simulate the process using the PyTorch tensor computing package. We then use the cross-entropy between the softmax-normalized photodetector intensities and the one-hot-encoded training label as the accuracy loss function. We use the backpropagation (BP) method to calculate the gradient of the loss function by
\begin{align*}
&\frac{{\partial Loss}}{{\partial n_i^l}} = \frac{{\partial Loss}}{{\partial n_i^{l + 1}}}\ \cdot T_i^{l + 1}*{{\mathcal{F}}^{ - 1}}\left[ {\mathcal{H}\left( {{{f}_x},{{f}_y}} \right)} \right]\tag{9}\\
&\frac{{\partial Loss}}{{\partial \theta _i^l}} = \frac{{\partial Loss}}{{\partial n_i^l}}\ \cdot m_i^lk_i^l\exp \left( {j\left( {\theta _i^l + \frac{\pi }{2}} \right)} \right)\tag{10}
\end{align*}
\begin{equation*}
\theta _i^l = \left( {1 - \rho _i^l} \right)\ \cdot 0 + \rho _i^l \cdot \pi \tag{11}
\end{equation*}
\begin{equation*}
\frac{{\partial Loss}}{{\partial \rho _i^l}} = \frac{{\partial Loss}}{{\partial \theta _i^l}}\ \cdot \frac{{\partial \theta _i^l}}{{\partial \rho _i^l}} \tag{12}
\end{equation*}
(a) The training process of the optical layers of the equivalent neural network in the first stage. (b) The physical structure and the schematics of the correcting resistor network. The 10 × 10 fully connected layer is the equivalent mathematical model of the resistance correction network.
The calculations in (9)–(12) can be easily implemented with the automatic differentiation tool of the PyTorch package. To drive the neurons in the optimization results toward binary states, we also introduce a binarization penalty function:
\begin{align*}
&P_i^l = {{{\left( {\rho _i^l - 0} \right)}}^2}\ + {{{\left( {\rho _i^l - 1} \right)}}^2}\tag{13}\\
&\frac{{dP_i^l}}{{d\rho _i^l}} = \ 4\rho _i^l - 2\tag{14}
\end{align*}
where the gradient of the penalty function is added to the accuracy-loss gradient through a dynamic penalty coefficient $T$, giving the total gradient \begin{equation*}
g_i^l = \frac{{\partial Loss}}{{\partial \rho _i^l}}\ + T \cdot \frac{{dP_i^l}}{{d\rho _i^l}}\tag{15}
\end{equation*}
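As a minimal sketch (function and variable names are ours), the density parameterization (11), the penalty (13), and the combined gradient (15) can be handed to PyTorch's autograd in one step:

```python
import math

import torch


def total_gradient(rho, accuracy_loss, T):
    """Gradient of the combined objective, following (11)-(15).

    rho: density variables in [0, 1]; accuracy_loss: callable mapping the
    phase map theta to the scalar accuracy loss; T: dynamic penalty coefficient.
    """
    rho = rho.clone().detach().requires_grad_(True)
    theta = rho * math.pi                                    # (11): theta = rho * pi
    penalty = ((rho - 0.0) ** 2 + (rho - 1.0) ** 2).sum()    # (13)
    (accuracy_loss(theta) + T * penalty).backward()          # (12), (14), (15) via autograd
    return rho.grad
```

Note that at rho = 0.5 the penalty gradient 4·rho − 2 vanishes, so the penalty only pushes densities that have already drifted toward one of the binary states.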
In the second stage, we cascade the corrective section onto the optical network to further improve accuracy. As Fig. 3(b) shows, the responsive current of each photodetector is an input to the corrective section, and the outputs of the ten adders give the final classification result of the neural network. During the training process, we fix the optical network model parameters and only train the correction layer. We again use softmax normalization and the cross-entropy loss function to calculate the loss, and the Adam optimizer to optimize the weights of this layer. As shown in Fig. 3, the accuracy of this correction-assisted network is boosted to 94.6% after 230 epochs. The initial values of the correcting section are randomly set, so the accuracy drops during the first few epochs.
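Since the optical network is frozen, this second stage reduces to ordinary softmax-regression training of the 10 × 10 layer on the detector currents. A compact PyTorch sketch, with illustrative names and hyperparameters, is:

```python
import torch

# Sketch of the second training stage: the optical network is frozen, so the
# photodetector currents are fixed features and only W_{10x10} of (8) is learned.
correcting = torch.nn.Linear(10, 10, bias=False)             # W_{10x10}
optimizer = torch.optim.Adam(correcting.parameters(), lr=1e-2)
criterion = torch.nn.CrossEntropyLoss()                      # softmax + cross-entropy


def correcting_step(pd_currents, labels):
    """One optimization step; pd_currents: (batch, 10), labels: (batch,)."""
    optimizer.zero_grad()
    loss = criterion(correcting(pd_currents), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the model is linear and the loss is cross-entropy, this stage is a convex problem, which is why it can also be retrained cheaply online to compensate for assembly errors.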
Measured network accuracy with and without correcting the resistor network. The first 300 epochs use a dynamic penalty coefficient to achieve binarized weights.
Results and Discussion
We perform a 3D finite-difference time-domain (FDTD) full-vector simulation to verify the feasibility and accuracy of the neural network. In the ideal diffraction model above, each neuron is treated as a point receiver and transmitter; it does not consider the impact of the physical structure of the optical neurons on the incident light. In the FDTD simulation, we use the real physical structures to obtain a more realistic optical transmission process. We use commercial software (Lumerical FDTD: 3D Electromagnetic Simulator) for the simulation and directly set the PCM refractive indices in the amorphous and crystalline states to 3.28 and 4.05, respectively. We then obtain the input/output intensity distribution of each diffractive layer through the frequency-domain field profile monitor. An example of the results with the pattern “5” as input is shown in Fig. 4(a).
(a) The intensity of light on each layer. The output diffraction pattern and the position of photodetectors are shown on the right. (b) The results of optical diffraction model calculation and FDTD full-vector simulation for handwritten digit classification. The energy distribution simulated by the two methods matches well in most situations.
Using a similar approach, we simulated the intensity distribution at the photodetector layer under different input patterns and compare the output intensity distributions of the ideal diffraction model and the FDTD simulation in Fig. 4(b). The output results of the two simulation models are very similar, which supports the accuracy of the ideal diffraction model. However, minor hot spots in the final diffraction patterns could interfere with the classification results; these problems can be addressed by training the correcting network online.
According to Fig. 4, our ideal diffraction theory-based neural network simulates the actual physical situation well in general. Therefore, we can approximate the accuracy of the physical model with that of the neural network. The accuracy is shown in Fig. 5: the version with the correcting network reaches 94.5%, while the optical-only model achieves 93.6%.
The confusion matrix of the neural network. (a) accuracy without the correcting layer; (b) accuracy with the correcting layer.
We also trained the optical diffraction network at different layer distances. Fig. 6(a) shows the accuracy of the network after training for layer spacings of 30, 35, 40, 50, 55, 60, 65, and 70 μm. The classification accuracy changes by less than 0.03% as the layer distance varies. Considering the assembly process and energy loss, we choose a layer spacing of 50 μm for our model.
(a) The accuracy degradation of the ROD2NN under layer distance error. (b) The accuracy degradation of the ROD2NN under optical axis misalignment. The orange lines show the restoration of performance brought by the correcting layer.
Moreover, we examine assembly and manufacturing errors to further evaluate the robustness of our ROD2NN. Assembly error is caused by the displacement of the diffractive layers. We specified the spacing between diffractive layers to be 50 μm during training, but it is difficult to achieve this exactly during assembly. As shown in Fig. 6(b), the ROD2NN is sensitive to optical axis misalignment, and severe accuracy degradation occurs at a deviation of about 1 μm. However, we can retrain the correcting layer online to compensate for errors after the optical structure of the ROD2NN has been fabricated. In this way, the ROD2NN maintains an accuracy above 85% over a 1 μm tolerance.
We also analyzed the impact of neuron manufacturing errors on network accuracy. Manufacturing errors include errors in neuron size, thickness, and refractive index; these errors lead to changes in the neurons' transmittance and phase shift. We first discuss the accuracy of the ROD2NN under different transmittance ratios between the two neuron states.
Degradation of network accuracy due to the error of the phase shift difference
Fig. 8(a)–(c) show the accuracy degradation of the ROD2NN under neuron edge length, thickness, and refractive index errors, respectively.
The degradation of network accuracy under neuron manufacturing errors: (a) edge length error; (b) thickness error; (c) refractive index error.
Our ROD2NN can perform different identification tasks without remanufacturing. We performed an additional simulation on the Fashion-MNIST dataset to verify this reconfigurability; the network achieved an accuracy of 84.5% for the corrective-section-included setup and 83.7% for the optical-only setup. The ROD2NN achieves this by simply reconfiguring the states of the optical neurons and the weights of the correcting layer. As shown in Fig. 9(b) and (c), the errors for the labels “0 (T-shirt)”, “4 (Coat)”, and “6 (Shirt)” are significantly higher. This is probably because the Fashion-MNIST image classification dataset is more complex than MNIST. Since the current optical network is relatively small, its accuracy can drop when facing complex images; adding more optical layers or increasing the size of each layer would further improve the classification accuracy.
The results of neural network calculation and FDTD vector simulation on the Fashion-MNIST dataset. (a) Diffraction pattern of neural network calculation and FDTD simulation. (b) Confusion matrix of the complete ROD2NN setup trained with the Fashion-MNIST dataset. (c) Confusion matrix of the optical-section-only ROD2NN setup trained with the Fashion-MNIST dataset.
We compare the static power, reconfigurability, and neuron configuration time of our ROD2NN with other ODNNs in Table I. Our scheme, along with other schemes based on active devices [15], [17], [20], can be reprogrammed to adapt to different tasks. However, these active devices, such as DMDs [15], SLMs [15], [17], CMOS [15], and homebuilt microwave antenna transceivers [20], tend to consume more power in the intermediate calculation (excluding I/O). In contrast, passive device-based schemes [9], [12], like ours, only consume energy at the optical I/O, with the transmission part consuming no energy. The state of a PCM neuron can typically be switched using either optical or electrical modulation. Optical methods include serial direct laser writing [34], [35] and parallel pattern exposure [36]; electrical regulation requires arranging heating units and transparent indium tin oxide (ITO) circuits on the diffraction surface [37], [38], so the switching speed of the neurons mainly depends on the heater response time and may be on the millisecond level. In actual experiments, the impact of the circuits and heating elements on the light field must be considered; Fresnel reflections between different materials may be mitigated by coatings designed to reduce reflectivity [39]. Therefore, a comprehensive design of the neuron structure and circuitry is essential when developing experimental schemes.
Conclusion
We propose and design a non-volatile reconfigurable digital optical diffractive neural network based on phase-change material. The network consists of three digital diffractive layers, each 1 μm thick, in which digital neurons are placed on a 1 μm × 1 μm grid with a side length of 0.8 μm. With the resistive correcting network, the R-ODNN achieves 94.46% classification accuracy on the MNIST handwritten digit dataset and remains above 93% under the following errors: optical axis alignment error below 1 μm, neuron edge length error within −22 nm to +18 nm, thickness error within −24 nm to +36 nm, or a refractive index error of the phase-change material within −0.09 to +0.06, demonstrating that our model is tolerant of manufacturing errors.
This work provides a fundamental scheme for an easily reconfigurable, off-line trainable, and on-chip integrable R-ODNN based on PCMs. The binarized neural network in this work highlights that simple, linear structures can exhibit strong performance in optical systems. Moreover, the introduction of nonlinear optical layers may significantly enhance the accuracy of our network; we look forward to discovering new materials or structures that can achieve efficient nonlinear operations, and we anticipate explicit performance results from subsequent physical fabrication experiments.