
# A 4.0 μm Stacked Digital Pixel Sensor Operating in a Dual Quantization Mode for High Dynamic Range

Kazuya Mori<sup>®</sup>, Naoto Yasuda, Ken Miyauchi<sup>®</sup>, *Member, IEEE*, Toshiyuki Isozaki, Isao Takayanagi<sup>®</sup>, *Member, IEEE*, Junichi Nakamura<sup>®</sup>, *Fellow, IEEE*, Ho-Ching Chien, Ken Fu, Shou-Gwo Wuu, *Member, IEEE*, Andrew Berkovich<sup>®</sup>, Song Chen, Wei Gao, and Chiao Liu, *Member, IEEE* 

Abstract—It is anticipated that ubiquitous computer vision (CV) and artificial intelligence (AI) applications used on mobile devices will grow significantly. Such applications require battery-powered, always-on mobile devices to support indoor/outdoor, day/night usages. A global shutter (GS), stacked digital pixel sensor (DPS) is a promising candidate to meet such requirements because of its potential for ultralow power, ultrahigh dynamic range (HDR), and a small form factor. This article presents a prototype 4.0- $\mu$ m stacked DPS operating in a dual quantization (2Q) mode to realize HDR. The 4.0- $\mu$ m DPS pixel is formed with two layers: a backside illuminated pinned photodiode (PD) pixel on the top layer and an in-pixel analog-to-digital conversion (ADC) circuit with 9-bit static random access memory (SRAM) on the bottom layer. A Cu-to-Cu hybrid bonding (HB) technology is used to connect the two layers via pixel-level interconnects. In the 2Q scheme, a time-stamp (TS) quantization and a linear ADC are performed sequentially in the same frame, which extends the dynamic range (DR) with as few as 9 ADC bits. The DPS with a 1024  $\times$  832 pixel array has achieved an ultra HDR of 107 dB with a single exposure in a single frame. The nonlinear conversion characteristic of the TS mode provides an equivalent full well capacity (FWC) of 2000 ke<sup>-</sup>, while the noise floor in the linear ADC mode is 8.3 e<sup>-</sup>.

*Index Terms*—CMOS image sensor (CIS), computer vision (CV) sensor, digital pixel sensor (DPS), global shutter (GS), high dynamic range (HDR), noise analysis, stacked process.

#### I. INTRODUCTION

THE digital pixel sensor (DPS) architecture [1], [2] was studied from the 1990s to the early 2000s with high expectations of achieving high-speed, low-power, and high dynamic range (HDR) [3], [4] operation with a global shutter (GS). However, its

Manuscript received February 25, 2022; revised March 25, 2022; accepted March 26, 2022. Date of publication April 25, 2022; date of current version May 24, 2022. The review of this article was arranged by Editor R. M. Guidash. (*Corresponding author: Kazuya Mori.*)

Kazuya Mori, Naoto Yasuda, Ken Miyauchi, Toshiyuki Isozaki, Isao Takayanagi, and Junichi Nakamura are with Brillnics Japan Inc., Tokyo 140-0013, Japan (e-mail: mori.kazuya@brillnics.com).

Ho-Ching Chien, Ken Fu, and Shou-Gwo Wuu are with Brillnics Inc., Zhubei 302, Taiwan.

Andrew Berkovich, Song Chen, Wei Gao, and Chiao Liu are with Reality Labs Research, Meta Inc., Redmond, WA 98052 USA.

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TED.2022.3164032.

Digital Object Identifier 10.1109/TED.2022.3164032

optical performance was poorer than that of the then conventional GS pixels, simply because they used front-side illumination (FSI) devices, which resulted in a lower photodiode (PD) fill factor and a larger pixel size. However, recent progress in the state-of-the-art CMOS fabrication processes allows a stacked backside illuminated DPS approach where the CMOS image sensor (CIS) pixel and the pixel-level analog-to-digital conversion (ADC) and in-pixel memories are fabricated on separate, stacked wafers, which permit a smaller pixel size with higher image quality [5]–[7].

In recent camera applications, significant growth in computer vision (CV) and artificial intelligence (AI) applications on mobile platforms is expected. Thus, there is an urgent need for a new generation of GS sensors that can simultaneously deliver low power, high dynamic range (DR), high sensitivity, and low latency, whereas single-exposure HDR technologies have so far been studied extensively for rolling shutter sensors [8]–[16].

In such context, we have developed a stacked DPS with a triple quantization (3Q) scheme that achieves 127-dB DR and ultralow power of 5.7 mW at 30 frames/s with 10 bits/pixel ADC in a 4.6- $\mu$ m pixel size [17], [18].

The DR of 127 dB corresponds to 22 bits in linear coding. The scheme we have adopted to realize the 127-dB DR with only 9-bit resolution was a combination of compressive coding using the time-to-saturation (TTS) quantization scheme and two linear ADCs [19]. Using fewer bits may result in fine image details not being captured [7], and combining three different operation modes may result in poorer SNR characteristics, especially at the junction points between two different operation modes. Nevertheless, the reconstructed linear photoresponse obtained in the 3Q scheme exhibits SNR gaps between modes of only approximately 4 dB [18]. Thus, the compressed representations in this scheme are best suited for CV applications, where the relevant information relies on image features [17]–[19].
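The correspondence between a DR figure and a linear code width can be checked with a few lines of Python. This is only an arithmetic sketch: the helper name `linear_bits_for_dr` is ours, and it assumes that one bit of linear coding covers a factor of two in signal, i.e., about 6.02 dB.

```python
import math


def linear_bits_for_dr(dr_db: float) -> int:
    """Smallest number of linear-code bits whose 2**bits span covers dr_db.

    Assumes DR is expressed as 20*log10(max/min), so one bit is ~6.02 dB.
    """
    return math.ceil(dr_db / (20 * math.log10(2)))


# 127 dB (3Q device) and 107 dB (this 2Q device)
print(linear_bits_for_dr(127), linear_bits_for_dr(107))
```

Under the same assumption, the 107-dB DR of the 2Q device would need 18 bits in linear coding, which illustrates why a compressive code is attractive for a 9-bit in-pixel memory.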

Before the development of 3Q DPS, a proof-of-concept device was designed, fabricated, and characterized to verify the basic operation of a combined quantization scheme with a time-domain quantization and a linear ADC [20]. This article reports its operating principle, pixel structure, and characterization results, together with a comparison with the 3Q scheme.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/



Fig. 1. Circuit/block diagram and cross-sectional view of the stacked DPS.



Fig. 2. Conceptual operation of time-to-stamp DPSs.

The rest of the article is organized as follows. In Section II, the concept of the dual-quantization (2Q) scheme and its difference from the 3Q scheme is presented, and then, circuit implementation and operation are described in Section III, followed by characterization results of the 2Q device and comparison with the 3Q device in Section IV; finally, the conclusion is given in Section V.

# II. 2Q SCHEME FOR ULTRA HDR

# A. Stacked DPS Architecture

Fig. 1 shows a circuit diagram and a cross-sectional view of the stacked DPS pixel. The pixel circuit is formed with a two-wafer stack consisting of a CIS layer (top) and an ADC layer (bottom), and these two layers are connected using the pixel-level hybrid bonding (HB) technology [21], [22]. This stacked DPS structure has the advantages of small pixel size, low parasitic light sensitivity (PLS), and large PD fill factor, even though each pixel typically contains more than 50 components.

# B. TS Quantization

Fig. 2 shows the simplified conceptual diagram of TS operation. During the exposure period, an in-pixel comparator continuously compares the voltage at the floating diffusion (FD) node ( $V_{FD}$ ) with a ramping reference voltage ( $V_{FD\_REF}$ ) while the time code (N) is counted. The pinned photodiode (PPD) starts to overflow at  $t_{of}$ , which depends on the incident light intensity. When  $V_{FD}$  reaches  $V_{FD\_REF}$  at  $t_{flip}$ , the comparator flips and the time code  $N_{flip}$  is latched in the in-pixel memory.



Fig. 3. Conceptual diagram of sensor output signals in (a) 2Q scheme and (b) 3Q scheme. A conceptual diagram showing coverage of each quantization scheme for several light levels.

Time  $t_{\text{flip}}$  is inversely proportional to the light intensity. The equivalent signal charges  $Q_{\text{EQ\_TS}}$  can be estimated as

$$Q_{\rm EQ\_TS} = \left(\frac{t_{\rm int}}{t_{\rm flip}}\right) \times \left(Q_{\rm FD} + Q_{\rm PD\_FWC}\right) \tag{1}$$

where  $t_{\text{int}}$ ,  $t_{\text{flip}}$ ,  $Q_{\text{FD}}$ , and  $Q_{\text{PD\_FWC}}$  are the integration time, the flipping time, the signal charge of FD at  $t_{\text{flip}}$ , and the full well capacity (FWC) of PD, respectively. Note that  $Q_{\text{FD}}$  is a function of  $t_{\text{flip}}$ , and thus is a function of the light intensity. For example, when 8 bits [256 digital numbers (DNs)] are allocated to the TS region, the maximum equivalent FWC  $Q_{\text{EQ\_TS}}$  is 256 × ( $Q_{\text{PD}} + Q_{\text{FD\_REF}}$ ), which means that a DR extension of 2<sup>8</sup> times, or 48 dB, can be achieved.
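As a numerical illustration of (1), the following Python sketch evaluates the equivalent charge and the DR extension obtained from the number of TS codes. The function names and the charge values used in the example (3000 e⁻ on FD, 4000 e⁻ PD FWC) are illustrative assumptions, not measured parameters of the device.

```python
import math


def equivalent_charge(t_int: float, t_flip: float,
                      q_fd: float, q_pd_fwc: float) -> float:
    """Eq. (1): charge that the same light level would generate over t_int."""
    if not 0 < t_flip <= t_int:
        raise ValueError("t_flip must lie in (0, t_int]")
    return (t_int / t_flip) * (q_fd + q_pd_fwc)


def dr_extension_db(num_ts_codes: int) -> float:
    """DR extension when num_ts_codes time slots are allocated to the TS region."""
    return 20 * math.log10(num_ts_codes)


# Hypothetical pixel: comparator flips in the first of 256 time slots.
q_eq = equivalent_charge(t_int=1.0, t_flip=1.0 / 256, q_fd=3000.0, q_pd_fwc=4000.0)
print(q_eq, dr_extension_db(256))
```

With 256 time slots the extension evaluates to about 48.2 dB, consistent with the 48 dB quoted above.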

#### C. 2Q Mode Scheme

As described in Section II-A and Section II-B, the stacked DPS operating in the TS mode has the potential of extending the DR even with a small pixel size. However, low light performance could be a serious issue if the TS mode is used alone, because of its high noise. To achieve better noise performance at the low light end while simultaneously extending the DR toward the high light end within a single exposure, a 2Q scheme, which combines the TS mode and the linear ADC mode, is investigated.

Fig. 3 shows conceptual plots of the charge-domain photoresponse characteristics of the 2Q scheme (left) and of the 3Q scheme (right) [17], [18] as a function of exposure time. During the exposure period, when the PD well is filled by the photo-generated charge, excess charge starts to overflow from the PD to the FD at  $t_{of}$.

In such high light conditions in the 2Q scheme, the TS scheme detects the light intensity from  $t_{\text{flip}}$, the time at which the FD voltage crosses the FD-referred ramping reference  $V_{\text{FD\_REF}}$, and  $t_{\text{flip}}$  is quantized as a DN. The reference voltage  $V_{\text{REF}}$  needs to be ramping to avoid a large gap between the two quantization modes, as shown in Figs. 2 and 3(a). In contrast,  $V_{\text{REF}}$  is set at a constant level in the 3Q scheme, as shown in Fig. 3(b), so the amount of charge detected during the TS mode is constant and is set slightly below the FD saturation level. Therefore, the TS mode in the 3Q scheme is referred to as the TTS mode [17]. Note that the TTS mode in the 3Q scheme operates in a low conversion gain mode, with the sense node capacitance being the sum of the FD capacitance and an additional in-pixel capacitance  $C_{\rm S}$, to obtain a smooth transition in the photo-transfer characteristic between the TTS mode and the following low-conversion-gain linear ADC mode, named the FDADC mode [18]. On the other hand, a minor signal gap in the 2Q scheme cannot be intrinsically avoided, because the junction point between the two quantization schemes in the photoresponse of an individual pixel cannot always be fit into the desirable ADC operation range, mainly due to PD full well variation ( $\Delta Q_{PD}$ ); this component is discussed in Section IV.

After exposure, linear ADC is performed for the charge accumulated in lower light conditions. A single ADC is performed in the 2Q mode, while two ADCs are performed in the 3Q mode. In both cases, the digital data from one of the quantization modes are automatically selected and stored in the in-pixel memory. As a result, an HDR image can be captured within a single exposure and a single frame with a minimum number of digital bits.

# **III. CIRCUIT IMPLEMENTATION AND OPERATION**

In this section, circuit implementation and operation of the overlapped 2Q DPS are described. Figs. 4–6 show the pixel circuit schematic, the timing diagram, and the potential diagrams, respectively.

#### A. CIS Pixel

In the back side illumination (BSI) pixel, the CIS layer circuit consists of a conventional 4T pixel that has a deep pinned PD with enhanced near-infrared (NIR) response [23], a transfer gate (TG), a reset gate (RST), and a source follower gate (SF). In addition, an in-pixel bias transistor ( $V_{bn}$ ) is formed to provide a bias current for the pixel SF, which enables pixel-parallel/global signal sampling [24], [25]. A buried overflow path is formed beneath TG, through which excess charges flow from PD to FD during the exposure period [15]. As a result, good overflow performance can be achieved while dark current generation under TG is suppressed by a negative TG bias voltage.

# B. In-Pixel ADC Circuit

The in-pixel ADC circuit is composed of a comparator, a word-line selection gate, and a 9-bit static random access memory (SRAM) that includes the "Flag" bit implemented with memory "Lock" control logic, as shown in Fig. 4. The comparator consists of an input stage and a gain stage with a feedback loop. A coupling capacitor CC is inserted at the input of the comparator to compensate for the variations in SF output and the comparator threshold voltage in each pixel by the auto-zeroing


Fig. 4. Pixel circuit schematic of 2Q DPS.

(AZ) operation. In the comparator reset phases (at  $t_0$  and  $t_2$  in Fig. 5), Comp\_RST goes high to equalize the negative input node voltage of the comparator,  $V_{\rm IN}$, and then the Comp\_REF level is lowered to force the comparator output (Comp\_out) to logical "0." Comp\_out is then connected to the word line of the SRAM. When the comparator flips and Comp\_out goes to "1," the corresponding ADC code on the bitline is latched as the ADC result. The ADC circuit is reused for the two quantization modes, which are differentiated by the "Flag" bit. The "Flag" bit, together with the memory "Lock" control logic, is used for automatic quantization mode adaptation, which is enabled by the dynamic memory write control logic working as an overwrite-protect circuit. The "Flag" bit state can be toggled when Comp\_CHK is "H" and the Comp\_out state flips during exposure. By placing the flag bit memory cell within the partial memory array block in the pixel array, its layout can be contiguous with the rest of the SRAM, and area efficiency is achieved without layout overhead.

#### C. Pixel Operation in 2Q Scheme

Detailed operations of each quantization mode are described in this section.

1) TS Mode: Fig. 5 shows the timing diagram during TS mode, where the FD-referred ramp signal (red dotted line:  $V_{\rm FD\_REF}$ ) and FD voltages (solid lines) are illustrated. The FD voltage and its corresponding comparator flip time (the DN code) are a function of the input light intensity. The corresponding potential diagrams are illustrated as well at the bottom of this figure. As an example, three scenarios, (a) "very bright," (b) "bright," and (c) "just over PD saturation," are shown. Pixel driving pulses of TG and RST are set high for shutter operation at the beginning of exposure at  $t = t_0$. Then, the AZ operation takes place by pulsing Comp\_RST after the setup time ( $t_{\text{setup\_TS}}$ ), which triggers Comp\_REF to start ramping and SRAM bit lines to receive the digital code. During exposure, the FD voltage is continuously compared with  $V_{\text{FD\_REF}}$  (or  $V_{\text{IN}}$  is continuously compared with Comp\_REF) to detect  $t_{\text{flip}}$. At  $t_{\text{flip}}$, the flag bit is set to 1 and a corresponding digital code DN (note that DN has 8 bits and counts down over time from 255 to 0) is stored in the SRAM. On the other hand, if the FD voltage does not reach  $V_{\rm FD\_REF}$  during the exposure period, the flag bit is kept at "0," the conventional linear ADC takes place after the charge is transferred from PD to FD, and the resulting digital code is stored in the SRAM.
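The TS sequence described above can be summarized by a simple behavioral model. The Python sketch below is not the authors' circuit or timing: it works in the charge domain rather than the voltage domain, assumes that the FD-referred threshold ramps linearly from a high level to a small positive level over the exposure, and uses hypothetical values for the PD FWC and the ramp end points. The function name `quantize_2q` and all parameters are ours.

```python
def quantize_2q(photo_rate: float, t_int: float = 1.0,
                q_pd_fwc: float = 4000.0,
                q_ref_start: float = 3800.0, q_ref_end: float = 50.0,
                steps: int = 256) -> tuple:
    """Return (flag, code) for one pixel under an idealized 2Q model.

    photo_rate is the photo-generated charge rate in e-/s. All default
    values are illustrative assumptions.
    """
    for k in range(steps):
        t = (k + 1) * t_int / steps                    # end of time slot k
        q_fd = max(0.0, photo_rate * t - q_pd_fwc)     # charge overflowed to FD
        # FD-referred threshold, assumed to ramp linearly toward q_ref_end
        q_ref = q_ref_start + (q_ref_end - q_ref_start) * k / (steps - 1)
        if q_fd >= q_ref:
            # Comparator flips: flag = 1, time code counts down 255 -> 0
            return 1, (steps - 1) - k
    # No flip during exposure: flag stays 0 and the PD charge is
    # converted by an idealized linear ADC.
    q_pd = min(photo_rate * t_int, q_pd_fwc)
    code = min(steps - 1, int(q_pd * steps / q_pd_fwc))
    return 0, code


for rate in (0.0, 2000.0, 8000.0, 16000.0, 1.024e7):
    print(rate, quantize_2q(rate))
```

In this model, pixels below PD saturation return flag 0 with a linear code, while brighter pixels return flag 1 with a time code that grows with light intensity, mirroring the "just over PD saturation," "bright," and "very bright" scenarios.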

Critical parameters, including the duration between the end of exposure and the start time of the linear ADC,  $t_{TS\_Lin}$ , the



Fig. 5. TS operation and potential diagram.

PD full well variation,  $\Delta Q_{PD}$ , and the ADC offset,  $\Delta V_{AZ\_error}$ , are shown in Fig. 5 and discussed in Section IV-B.

As we first introduced in Section II-B, the time to flip during the exposure period is inversely proportional to the input light intensity. Based on (1), the equivalent signal charge  $N_{\rm EQ\_TS}$, obtained by the TS quantization mode, can be expressed approximately by the following:

$$N_{\rm EQ\_TS} = \frac{256}{256 - {\rm DN}} \cdot \left(N_{\rm FD} + N_{\rm PD\_FWC}\right) \tag{2}$$

where DN is the stored TS digital code, and  $N_{\rm FD}$  and  $N_{\rm PD\_FWC}$  denote the number of stored charges on FD at the comparator flip timing and the PD FWC, respectively. The nonlinear data code in this TS scheme can be linearized easily by off-chip data processing.
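The off-chip linearization can be sketched as follows. This is a minimal illustration rather than the processing actually used: it treats the sum of the FD charge and the PD FWC as a single constant (in the device the FD charge depends on the flip time), and both default values are hypothetical. The 7800 e⁻ constant is our back-calculation so that DN = 255 gives about 2.0 × 10⁶ e⁻, and 16 e⁻/LSB is an assumed linear-mode gain.

```python
def linearize_2q(flag: int, dn: int,
                 n_fd_plus_fwc: float = 7800.0,
                 e_per_lsb: float = 16.0) -> float:
    """Convert one (flag, 8-bit code) pair to an estimated charge in electrons.

    flag = 1: TS code, linearized with Eq. (2).
    flag = 0: linear ADC code, scaled by an assumed e-/LSB factor.
    """
    if not 0 <= dn <= 255:
        raise ValueError("dn must be an 8-bit code")
    if flag:
        return 256.0 / (256 - dn) * n_fd_plus_fwc
    return dn * e_per_lsb


print(linearize_2q(1, 255))   # brightest TS code
print(linearize_2q(1, 0))     # TS code latched in the last time slot
print(linearize_2q(0, 100))   # linear-mode code
```

A more faithful reconstruction would replace the constant with a per-code lookup of the FD-referred ramp level at each time slot.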

Throughout this operation, the total static bias current of the top and bottom pixel circuits is below 100 nA.

2) Linear Mode: A timing diagram and the associated potential plots are shown in Fig. 6. After the end of exposure, the linear ADC mode is activated. First, AZ is performed to equalize the comparator input voltage  $V_{\rm IN}$  to  $V_{\rm COMP\_REF}$  at  $t = t_2$. Immediately afterward, the PD charge is transferred to the FD by the TG pulse, and the single-slope ADC operation follows. Correlated double sampling (CDS) is realized in this mode, which cancels the kTC noise associated with FD reset. Note that the charge accumulated on the FD node by leakage current is also excluded by this AZ.

# D. Sensor Chip Architecture

The sensor block diagram is shown in Fig. 7. The sensor chip was fabricated using a 45-nm enhanced NIR sensitivity CIS and 65-nm logic stacked sensor process. The two layers are connected by pixel-level interconnects using a Cu-to-Cu HB technology.

The pixel array contains  $1024 \times 832$  effective pixels. Because of the nature of a proof-of-concept device, sensor control signals and ramp voltage for quantization are generated off-chip using FPGA, and post digital signal processing is performed off-chip as well. ADC codes are input/output from the chip. IN/OUT control is also done by FPGA.

A chip photograph is shown in Fig. 8. The die size of the sensor is  $4.0 \times 4.0 \text{ mm}^2$ . The optical center is aligned to the die center.

# **IV. CHARACTERIZATION RESULTS**

In this section, the characterization system, photoresponse, noise characteristics, and the noise approximation model in the TS and linear modes of operation are described. Figs. 9–11



Fig. 6. Pixel potential and timing diagram of linear ADC operation.



Fig. 7. Sensor chip architecture by the stacked process.



Fig. 8. Sensor chip photograph.

show photoresponse, noise characteristics, and sample images, respectively.

#### A. Characterization System

As described in Section III-D, sensor control signals are generated by an off-chip FPGA. Although the operation timing needs to be fully optimized to obtain the best sensor performance by minimizing a signal gap between the two modes of operation, the characterization system had some limitations in manipulating control signals. Thus, there are some limitations in optimizing the test chip sensor performance. In particular, it is highly dependent on some critical parameters such as the time delay from the end of exposure to the start of the linear ADC (described in Section II-C).

#### B. Photoresponse and Noise Characteristics

The photoresponse characteristics are shown in Fig. 9 [Fig. 9(a): linear ADC (8 b), Fig. 9(b): 2Q (9 b), Fig. 9(c): TS (8 b)]. Two operation modes are sequentially performed and form a 9-bit digital code, as shown in Fig. 9(b). Thanks to the combined operation sequence, the wide range of photoresponse from low light to very bright light is obtained.

In reality, the variations in individual pixel performance, such as the variations in PD FWC and those in ADC offset and delay time, cause pixel-to-pixel variations or fixed pattern noise (FPN), especially at the junction point between the two operation modes. Furthermore, a signal gap, which appears as a charge gap, is observed at the junction point; it is caused by the duration ( $t_{TS\_Lin}$ ) between the TS and linear ADC operations.

As described in the previous section, PD saturation variation  $(\Delta Q_{\rm PD})$  causes variation in  $t_{\rm of}$  ( $\Delta t_{\rm of}$ ), which in turn results in time-to-flip variation ( $\Delta t_{\rm flip}$ ). The ADC offset, namely, the auto zero offset variation ( $\Delta V_{\rm AZ\_error}$ ), is another factor that causes flipping time variation and signal gap. Also, FD leakage current causes a false saturation signal.

For better understanding of photoresponse characteristics, the entire range in a photoresponse is divided into



Fig. 9. Photoresponse characteristics (left: linear ADC with offsets subtraction, center: 2Q (9 b), right: TS). (a) 1Q linear ADC only (with high gain at  $4 \times$ ). (b) 2Q with flag bit operation (full 9 bit function). (c) 1Q TS only (with theoretical model plot).



Fig. 10. Noise characteristics in a TS operation for FPN and temporal noise components.

individual operation. Photoresponse and noise characteristics in each segment are described next.

# C. TS Mode

Fig. 9(c) shows the measured photoresponse and a theoretical model of the TS quantization. Since the overflow charge is detected only after PD saturation, the ADC output stays fixed at an overhead code, which secures the lower side of the operation range, until the PD is fully saturated and the photoresponse starts. The photoresponse estimated from the model described in (1) agrees well with the measurement data. As described in Section III-B, the pixel equivalent maximum detectable signal is estimated to be  $2.0 \times 10^6 \text{ e}^-$  from (2), and the measured read noise is  $8.3\text{ e}^-$  in the linear ADC mode. This yields a DR of 107 dB in the resulting 2Q mode.
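The DR figure follows directly from the two numbers quoted above. The short Python check below assumes the usual definition of DR as 20·log10 of the ratio between the maximum detectable signal and the noise floor; the function name is ours. The second call applies the same formula to the 3Q values listed in Table I (9000 ke⁻ and 4.2 e⁻) purely as a consistency check.

```python
import math


def dynamic_range_db(max_signal_e: float, noise_floor_e: float) -> float:
    """DR in dB, assuming DR = 20*log10(max signal / noise floor)."""
    return 20 * math.log10(max_signal_e / noise_floor_e)


dr_2q = dynamic_range_db(2.0e6, 8.3)   # this work
dr_3q = dynamic_range_db(9.0e6, 4.2)   # 3Q values from Table I
print(round(dr_2q, 1), round(dr_3q, 1))
```

Both results land within about half a decibel of the reported 107 dB and 127 dB.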

The FPN and temporal noise in the TS operation are shown in Fig. 10 as a function of the signal level, together with approximation lines given by the equations below. Both the FPN and the temporal noise decrease monotonically as the signal level and the input light intensity increase, except at the lower codes up to around 40 LSB, where they increase; this region is assumed to be the mode transition area for the TS operation. As discussed in Section III-C2, the FPN in the TS operation is caused by the variation in the comparator flip time  $t_{flip}$, and its main contribution is considered to be the variation in PD FWC. When the light intensity is strong enough for the comparator to flip right after the exposure starts, the variation in time to fill the PD well is negligibly small. However, when the input light intensity is lower and the comparator flipping occurs closer to the end of the exposure time, the variation in PD full well, which in turn causes the variation in  $t_{of}$, results in large FPN.

The FPN component  $\sigma_{\rm FPN\_TS}$  in time can be approximately modeled as

$$\sigma_{\rm FPN\_TS} = \frac{t_{\rm flip}}{t_{\rm int}} \times \sigma_{\rm FPN\_PDFW} \tag{3}$$

where  $\sigma_{\rm FPN\_PDFW}$  is the FPN caused by the PD full well variation,  $\Delta Q_{PD}$. This FPN component in time is digitized by the TS quantization and is observed as an SNR drop at the transition from the linear mode to the TS mode: the SNR falls from 36 dB, where it is dominated by photon shot noise at the maximum linear ADC code, to about 23 dB.

On the other hand, the major temporal noise components in the TS mode are the ADC noise and the photon shot noise. In case that ADC noise component is sufficiently small, the major contribution to the temporal noise comes from the photon shot noise, which is represented by  $(Q_{\rm PD} + Q_{\rm FD}(t_{\rm flip}))^{1/2}$ .

Similar to the formula for FPN above, the temporal noise,  $\sigma_{\rm TN\_TS}$, in the TS mode is modeled as

$$\sigma_{\rm TN\_TS} = \frac{t_{\rm flip}}{t_{\rm int}} \times \sigma_{\rm TN\_SN} \tag{4}$$

where  $t_{\text{int}}$ ,  $t_{\text{flip}}$ , and  $\sigma_{\rm TN\_SN}$  are the integration time, the flipping time, and the shot noise at  $t_{\text{flip}}$ , respectively.
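Equations (3) and (4) share the same form, a source noise term scaled by the ratio of flip time to integration time, so a single helper covers both. The Python sketch below implements that scaling literally and also checks the shot-noise-limited SNR at the top of the linear range, assuming a linear full well of about 4 ke⁻ as listed in Table I. The function names and the example source noise of 100 are illustrative assumptions.

```python
import math


def sigma_ts(t_flip: float, t_int: float, sigma_source: float) -> float:
    """Eqs. (3)/(4): source noise scaled by t_flip / t_int.

    sigma_source is the PD full-well FPN for Eq. (3) or the shot noise
    at t_flip for Eq. (4).
    """
    if not 0 < t_flip <= t_int:
        raise ValueError("t_flip must lie in (0, t_int]")
    return (t_flip / t_int) * sigma_source


def shot_noise_snr_db(signal_e: float) -> float:
    """SNR in dB when photon shot noise, sqrt(signal), is the only noise."""
    return 20 * math.log10(signal_e / math.sqrt(signal_e))


# Noise contribution shrinks as the comparator flips earlier (brighter light).
print([sigma_ts(t, 1.0, 100.0) for t in (1.0, 0.5, 0.1)])
print(round(shot_noise_snr_db(4000.0), 1))
```

The shot-noise-limited SNR at 4 ke⁻ evaluates to about 36 dB, in line with the value quoted for the maximum linear ADC code.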

The model described by (3) and (4) agrees well with the measurement data. Therefore, it is confirmed that the above



Fig. 11. HDR image of a real-life scene with 2Q operation. (a) 9-bit raw image. (b) TS-only image. (c) Linear-only image. (d) Binary flag bit image.

noise approximation model is valid and the major FPN and temporal noise sources are the PD full well variation and photon shot noise, respectively.

1) Linear Mode: As shown in Fig. 9(a), a linear photoresponse has been obtained up to the full code (offsets are subtracted). In the linear ADC mode, the FPN and temporal noise in the dark are  $63.9e^-$  and  $8.3e^-$, respectively. For both the temporal noise and the FPN, the main dark noise source in this linear mode is considered to be noise coupling within the in-pixel ADC module, due to its unoptimized layout. The linear photoresponse in high-gain ADC operation (4×) is demonstrated as well. For the FPN, we have confirmed two major contributing factors: one is a dc offset variation in the first-stage amplifier of the comparator due to its limited dc gain, and the other is the delay variation in the comparator, mainly due to the bias current variation [18]. The pixel-to-pixel photoresponse non-uniformity (PRNU) was 1.5% at around 50% of PD saturation.

#### D. Sample Images

The sample images are shown in Fig. 11. The photograph at the top left is a full 9-bit HDR image in the 2Q mode. A filament in a light bulb and a name card behind the light bulb are clearly seen, which clearly demonstrates the HDR capability of this sensor. The top right image and the bottom left image are 8-bit images that are reproduced with digital codes of the TS mode only and of the linear ADC mode only, respectively, both extracted from the single shot image data shown at the top left. To distinguish the two modes, the flag bit is used and is shown in the bottom right as a flag bit map, which demonstrates the per-pixel selection of operational mode.
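The separation of a single 2Q frame into TS-only, linear-only, and flag-map images can be sketched in plain Python. The bit layout is an assumption made for illustration: the flag is taken to be bit 8 of each 9-bit word and the 8-bit code occupies bits 7 to 0. Pixels that do not belong to a given mode are set to 0 here, which is also our choice rather than a documented behavior.

```python
def split_2q_frame(raw):
    """Split a 2-D list of 9-bit words into (ts_only, linear_only, flag_map).

    Assumed layout: bit 8 = flag (1 = TS mode), bits 7..0 = code.
    """
    ts_only, linear_only, flag_map = [], [], []
    for row in raw:
        flags = [(word >> 8) & 1 for word in row]
        codes = [word & 0xFF for word in row]
        flag_map.append(flags)
        ts_only.append([c if f else 0 for f, c in zip(flags, codes)])
        linear_only.append([0 if f else c for f, c in zip(flags, codes)])
    return ts_only, linear_only, flag_map


# Tiny 2x2 example frame with hypothetical codes.
raw = [[0x000, 0x07F],
       [0x1FF, 0x120]]
ts, lin, flags = split_2q_frame(raw)
print(ts, lin, flags)
```

The flag map returned by this helper plays the role of the binary image in Fig. 11(d).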

# E. Comparison With 3Q Operation

Table I summarizes the pixel performance of this prototype device in comparison to the stacked DPS with a 3Q scheme [17].

 TABLE I

 PIXEL PERFORMANCE SUMMARY WITH 3Q-DPS

| Specification                          | This work [20]      | IEDM2020[17]      |
|----------------------------------------|---------------------|-------------------|
| Quantization scheme                    | 2Q<br>(TS & Linear) | 3Q<br>(PD-FD-TTS) |
| Pixel size [µm]                        | 4.0                 | 4.6               |
| In pixel Memory bit #<br>(state bit #) | 9b(1)               | 10b(2)            |
| QE max [%]                             | >90                 | >90               |
| Dynamic range [dB]                     | 107                 | 127               |
| CG [µV/e <sup>-</sup> ]                | 150                 | 150/12            |
| Linear full well [ke <sup>-</sup> ] *  | 4/2000              | 3.8/51/9000       |
| Noise floor [e <sup>-</sup> ]          | 8.3                 | 4.2               |
| Dark FPN [e <sup>-</sup> ]             | 63.9                | 47                |

\*Equivalent maximum handling photo charge at each readout segment.

The smaller pixel size in the 2Q scheme is due to the simpler operation which requires simpler control logic in the ADC layer pixel circuit. Dominant dark FPN and temporal noise in the linear mode come from performance variation in the in-pixel ADC [17]. In the latest implementation for the 3Q operation, both FPN and temporal noise are reduced by suppressing noise coupling within the in-pixel ADC module and by solving the operating timing issue that is described in Section III-C1.

The equivalent FWC is represented by (2) with DN being 255. The difference in DR between the 2Q and 3Q schemes comes from the differences in the handling capacity at the charge sense node and temporal noise floor. As described in Section II-C, the sense node capacitance in the TTS mode in the 3Q scheme is  $C_{\text{FD}} + C_S$ , while it is  $C_{\text{FD}}$  in the TS mode in the 2Q scheme. The PLS, which is defined for the GS pixel by the ratio of the memory node responsivity to the PD responsivity, is extremely low, due to the nature of DPS, i.e., the stored data in the memory node are static digital data, and it is under the detection limit of our measurement setup, while the parasitic responsivity at the FD node is measured to be -55 dB.

Note that FPN can be corrected with dark frames, and its value in the reconstructed code for the 2Q and 3Q DPS can be negligibly small (<1 LSB) [18].

# V. CONCLUSION

We have developed a  $1024 \times 832$  pixel, stacked DPS with a 2Q scheme using the advanced wafer-to-wafer HB technology. The basic operation of a combined quantization scheme with a time-domain quantization and a linear ADC has been demonstrated. Also, its operating principle, pixel structure, and characterization results are presented, and detailed comparison with the 3Q scheme is discussed.

The proposed 2Q operation incorporates TS quantization and linear ADC sequentially in the same frame. The nonlinear conversion characteristic of the TS mode makes it possible to extend the DR with only 8 ADC bits, while a 1-bit flag distinguishes the two quantization modes. The DPS with a 4.0- $\mu$ m pixel size, which is, to the best of our knowledge, the smallest pixel size ever reported, achieves an ultra HDR of 107 dB with an equivalent FWC of 2000 ke<sup>-</sup> and a noise floor of 8.3 e<sup>-</sup>.

# ACKNOWLEDGMENT

The authors are deeply indebted to the outstanding group of researchers and engineers, as well as technology visionaries across Reality Labs Research, Meta Inc., Redmond, WA, USA, Brillnics, Tokyo, Japan, and TSMC. The authors wish to acknowledge Dr. Toshinori Otaka of Tokyo University of Science, Tokyo, Japan, for initiation in design and characterization of these accomplishments.

#### REFERENCES

- [1] A. El Gamal *et al.*, "Pixel-level processing: Why, what, and how?" *Proc. SPIE*, vol. 3650, pp. 2–13, Mar. 1999.
- [2] O. Skorka *et al.*, "CMOS digital pixel sensors: Technology and applications," *Proc. SPIE*, vol. 9060, Apr. 2014, Art. no. 90600G.
- [3] A. Guilvard *et al.*, "A digital high dynamic range CMOS Image sensor with multi-integration and pixel readout request," *Proc. SPIE*, vol. 6501, Feb. 2007, Art. no. 65010L.
- [4] S. Kavusi and A. El Gamal, "Quantitative study of high-dynamic-range image sensor architectures," *Proc. SPIE*, vol. 5301, pp. 264–275, Jun. 2004.
- [5] M. Sakakibara *et al.*, "A 6.9-μm pixel-pitch back-illuminated global shutter CMOS image sensor with pixel-parallel 14-bit subthreshold ADC," *IEEE J. Solid-State Circuits*, vol. 53, no. 11, pp. 3017–3025, Feb. 2018.
- [6] H. Sugo *et al.*, "A dead-time free global shutter CMOS image sensor with in-pixel LOFIC and ADC using pixel-wise connections," in *Proc. IEEE Symp. VLSI Circuits (VLSI-Circuits)*, Jun. 2016, pp. 1–2.
- [7] H. Sugo et al., "A dead-time free global shutter CMOS image sensor with in-pixel LOFIC and ADC using pixel-wise connections," in Proc. IEEE Symp. VLSI Circuits (VLSI-Circuits), Jun. 2016, pp. 1–2.
- [8] C. Liu, A. Berkovich, S. Chen, H. Reyserhove, S. S. Sarwar, and T.-H. Tsai, "Intelligent vision systems-bringing human-machine interface to AR/VR," in *IEDM Tech. Dig.*, Dec. 2019, pp. 218–221.
- [9] M. Abrash, "Creating the future: Augmented reality, the next human-machine interface," in *IEDM Tech. Dig.*, San Francisco, CA, USA, Dec. 2021, pp. 9–19.

- [10] C. Liu, S. Chen, T.-H. Tsai, B. de Salvo, and J. Gomez, "Augmented reality—The next frontier of image sensors and compute systems," in *Proc. IEEE Int. Solid- State Circuits Conf. (ISSCC)*, San Francisco, CA, USA, Feb. 2022, pp. 426–428.
- [11] M. Oh *et al.*, "3.0 μm backside illuminated, lateral overflow, high dynamic range, LED flicker mitigation image sensor," in *Proc. Int. Image Sensor Workshop (IISW)*, vol. 34, 2019, p. 262.
- [12] S. Iida *et al.*, "A 0.68e-rms random-noise 121 dB dynamic-range subpixel architecture CMOS image sensor with LED flicker mitigation," in *IEDM Tech. Dig.*, Dec. 2018, p. 10.
- [13] M. Innocent *et al.*, "Automotive 8.3 MP CMOS image sensor with 150 dB dynamic range and light flicker mitigation," in *IEDM Tech. Dig.*, Dec. 2021, p. 30.
- [14] J. Solhusvik *et al.*, "A 1280×960 2.8 μm HDR CIS with DCG and split-pixel combined," in *Proc. Int. Image Sensor Workshop (IISW)*, vol. 32, 2019, pp. 254–257.
- [15] I. Takayanagi, K. Miyauchi, S. Okura, K. Mori, J. Nakamura, and S. Sugawa, "A 120-ke<sup>-</sup> full-well capacity 160-μV/e<sup>-</sup> conversion gain 2.8-μm backside-illuminated pixel with a lateral overflow integration capacitor," *Sensors*, vol. 19, no. 24, p. 5572, Dec. 2019.
- [16] X. Guo, X. Qi, and J. G. Harris, "A time-to-first-spike CMOS image sensor," *IEEE Sensors J.*, vol. 7, no. 8, pp. 1165–1175, Jun. 2007.
- [17] C. Liu et al., "A 4.6 μm, 512×512, ultra-low power stacked digital pixel sensor with triple quantization and 127dB dynamic range," in *IEDM Tech. Dig.*, San Francisco, CA, USA, Dec. 2020, pp. 327–330.
- [18] R. Ikeno *et al.*, "A 4.6-μm, 127-dB dynamic range, ultra-low power stacked digital pixel sensor with overlapped triple quantization," *IEEE Trans. Electron Devices*, early access, Jan. 13, 2022, doi: 10.1109/TED.2021.3121352.
- [19] S. Vargas-Sierra, G. Liñán-Cembrano, and A. Rodríguez-Vázquez, "A 151 dB high dynamic range CMOS image sensor chip architecture with tone mapping compression embedded in-pixel," *IEEE Sensors J.*, vol. 15, no. 1, pp. 180–195, Jan. 2015.
- [20] K. Mori *et al.*, "A 4.0μm stacked digital pixel sensor operating in a dual quantization mode for over 120 dB dynamic range," in *Proc. Int. Image Sensor Workshop*, 2021, Paper R47.
- [21] C.-T. Ko and K.-N. Chen, "Wafer-level bonding/stacking technology for 3D integration," *Microelectron. Rel.*, vol. 50, no. 4, pp. 481–488, Apr. 2010.
- [22] P. Ramm, J. J-Q. Lu, and M. M. V. Taklo, *Handbook of Wafer Bonding*, Hoboken, NJ, USA: Wiley, 2012.
- [23] S. Tanaka *et al.*, "Single exposure type wide dynamic range CMOS image sensor with enhanced NIR sensitivity," *ITE Trans. Media Technol. Appl.*, vol. 6, no. 3, pp. 195–201, 2018.
- [24] K. Mori *et al.*, "Back side illuminated high dynamic range 4.0 μm voltage domain global shutter pixel with multiple gain readout," in *Proc. Int. Image Sensor Workshop*, 2019, pp. 326–329.
- [25] K. Miyauchi *et al.*, "A stacked back side-illuminated voltage domain global shutter CMOS image sensor with a 4.0 μm multiple gain readout pixel," *Sensors*, vol. 20, no. 2, p. 486, Jan. 2020.