Introduction
Frame-based image sensors are widely used in machine vision, but they suffer from high power consumption and long data-processing latency. The event-driven dynamic vision sensor (DVS) [1]–[5] reduces redundant data for post-processing by reporting only changes in scene reflectance, with latencies and temporal precision down to microseconds. These features, along with the wide dynamic range of the DVS, make it well suited to applications involving fast motion and uncontrolled lighting.
Despite the efficiency of data processing enabled by the sparse output of the DVS, traditional computer vision algorithms cannot be applied directly because no static scene information is encoded. To capture static light intensity, the ATIS sensor [4] uses the DVS pixel as an event trigger and asynchronously updates the pixel intensity value based on pulse width modulation (PWM). However, this time-based intensity readout has three drawbacks: the additional photodiode and complex intensity readout circuit make an ATIS pixel twice as large as a DVS pixel; the PWM intensity readout triples the asynchronous event data; and the non-uniform exposure can cause strong motion artifacts. To illustrate the last point: a dark, narrow moving object can become invisible in the ATIS intensity readout because its intensity measurements are repeatedly reset before completion by consecutive event triggers arriving at short intervals.
In conventional CMOS image sensors (CIS), motion artifacts can be minimized by using a global shutter. The proposed "dynamic and active pixel vision sensor" (DAVIS, formerly called apsDVS [9], [10]) combines the DVS with an active pixel sensor (APS) at the pixel level, allowing simultaneous output of asynchronous events and synchronous frames. Even though the two readout circuits share the same photodiode, they operate independently. The SBRet10 chip [9], the first-generation DAVIS, produces motion artifacts because it uses a rolling shutter scheme [10]. The improved DAVIS design implemented in the SBRet20 chip presented in this paper resolves this problem by allowing global shutter operation. Global shutter image sensors are often preferred in machine vision applications where minimal motion artifacts are required.
The combined static and dynamic output of the DAVIS makes it promising in a range of applications: The DVS output can be used to track and segment fast moving objects, while the APS output allows for the recognition and classification of these objects using established machine vision techniques. Because tracking is done using only DVS events, the frame rate of the APS output can be set arbitrarily low. The combined advantage of the dual outputs makes the DAVIS sensor well-suited for mobile applications or distributed sensor networks with a tight power budget, because it allows low latency at low system-level power consumption.
The rest of this paper is organized as follows. Section II explains the DAVIS concept, Section III describes the pixel design, Section IV describes the chip implementation, Section V presents characterization results and Section VI concludes the paper.
DAVIS Concept
A useful property of the DVS pixel is that it does not consume the photocurrent during its operation, so the absolute light intensity can be measured by integrating this current, as is done in conventional CIS. In the proposed DAVIS pixel, the intensity readout uses an active pixel readout scheme, the most common readout scheme for CIS. The only major disadvantage of the APS readout is its limited intra-scene dynamic range, which requires an intelligent exposure control strategy. On the other hand, this way of measuring intensity has several advantages over the event-triggered approach used by the ATIS. The shared photodiode and the small size of the APS circuit lead to a pixel area only 5% larger than a DVS-only pixel. The frames have a uniform exposure and are acquired using a synchronous global shutter, which makes them compatible with existing machine vision algorithms and applications. The sampling rate and the region of interest (ROI) can be chosen according to the demands of the application, and the readout is independent of DVS activity.
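To first order (a sketch that neglects readout nonlinearity and noise; $C_{\mathrm{aps}}$ and $t_{\mathrm{exp}}$ denote the integration capacitance and exposure time defined later in the text), the integrated APS voltage falls linearly with the photocurrent $I_{\mathrm{ph}}$:

$$V_{\mathrm{aps}}(t_{\mathrm{exp}}) = V_{\mathrm{reset}} - \frac{I_{\mathrm{ph}}\,t_{\mathrm{exp}}}{C_{\mathrm{aps}}}$$

so the difference $V_{\mathrm{reset}} - V_{\mathrm{aps}}$ measured by the double sampling described in Section IV is proportional to the absolute light intensity.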
Pixel Design
A. Fusion of the APS and DVS Pixel Design
The pixel schematic (Fig. 1) shows how the temporal contrast detection and the frame-based intensity readout operate simultaneously without interfering with each other. The DVS photoreceptor circuit (MN6/MN7/MP1) holds the photodiode PD at virtual ground, and the photocurrent, which is available at the drain of MN6, enables the 4T APS [11] intensity readout through transistors MN1 to MN4. The cascode transistor MN5 protects the drain of MN6 from the large reset transient of Vdr. The APS signal is integrated on the parasitic gate capacitance Caps. To prevent saturated pixels from affecting the DVS pathway during integration, the column reset CR is set to an intermediate overflow-protection voltage Vov of around 1.5 V instead of ground, which ensures that MN5 always stays in saturation.
The DVS part of the pixel is similar to [1]: the photocurrent is continuously and logarithmically encoded by the photoreceptor output voltage Vpr at the gate of MN6. A source follower buffer isolates Vpr from the self-timed switched-capacitor amplifier, which amplifies the change from the last reset level with a gain of −20. This amplified change is compared against ON and OFF thresholds by two common-source static inverter comparators with different bias voltages on the static load transistor gates. As soon as one of the thresholds is crossed, an event is communicated to the periphery (ON for increasing intensity, OFF for decreasing intensity) and, concurrently, the switched-capacitor amplifier is reset by a pulse generated by the AER logic block. This reset stores the new illumination level. Each output event thus encodes a fixed amount of change in log intensity since the last event.
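This behavior can be summarized in a short behavioral model (a sketch, not the circuit; the threshold value theta and the sampled input are assumptions for illustration): each event marks a fixed step in log intensity relative to the level stored at the last reset.

```python
import math

def dvs_events(intensities, theta=0.05):
    """Behavioral model of a single DVS pixel: emit an ON/OFF event each
    time log intensity moves theta away from the last reset level."""
    events = []
    ref = math.log(intensities[0])          # level stored at the last reset
    for t, i in enumerate(intensities):
        d = math.log(i) - ref               # change since the last reset
        while abs(d) >= theta:              # threshold crossing -> event
            events.append((t, 'ON' if d > 0 else 'OFF'))
            ref += theta if d > 0 else -theta   # reset stores the new level
            d = math.log(i) - ref
    return events

# A brightening then dimming pixel produces ON events followed by OFF events.
print(dvs_events([1.0, 1.1, 1.3, 1.2, 1.0]))
```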
B. Comparison to Other Designs
The DAVIS uses a logarithmic transimpedance photoreceptor circuit [1] in which the photocurrent is sourced from the n-FET MN6, whose gate is driven by feedback; this will be called an n-FET photoreceptor. An alternative arrangement used by [2] sources the photocurrent from a p-FET whose source is driven by feedback while the gate is held at a common voltage; this will be called a common-gate photoreceptor. Both circuits use a high-gain inverting forward amplifier (MN7/MP1 in the DAVIS pixel). The common-gate photoreceptor reduces the Miller capacitance of the circuit and thus improves the speed of the pixel. However, it has two disadvantages. First, the photoreceptor forward amplifier bias current Ipr must be at least as large as the photocurrent, which makes it harder to limit the pixel bandwidth; this capability is sometimes required to filter out flickering light sources or to reduce noise. Second, the common-gate photoreceptor prevents direct access to the photocurrent because the current is sourced from the forward amplifier rather than being directly available at the drain of the feedback transistor MN6. Therefore, a shared photodiode for an APS-DVS circuit is not possible with a common-gate photoreceptor. Although a current mirror allows copying the photocurrent (as used in [2] for global gain control), this copied current would be too imprecise for an acceptable APS output.
A practical problem with the n-FET photoreceptor, observed in earlier unpublished designs where MN6 and MN7 were transistors with normal threshold voltage, can arise when the photocurrent is so small that the Vgs of MN6 becomes negative. This condition decreases the gain of the MN7-MP1 amplifier because MN7 may enter the linear region, and the decreased gain in turn decreases the photoreceptor bandwidth. In both SBRet10 and SBRet20, MN6 and MN7 were therefore replaced by high-threshold thick-gate transistors, as illustrated in Fig. 1. Measurements from a test pixel show that even in the dark, Vpr stays high enough to keep MN7 in saturation.
In the ATIS [4], all the analog parts of the pixel are built with a 3.3 V power supply, requiring large n-well design rules for the p-FETs. In the DAVIS sensor, this problem is addressed by running the DVS part of the pixel from a 1.8 V supply. Because the APS part of the pixel consists only of n-FETs, no 3.3 V n-well is needed, and a compact design is achieved by using only the 1.8 V n-well design rules for the p-FETs in the DVS circuit.
Sensor Design
This section discusses the communication interface for DVS events, the acquisition process of intensity images as well as pixel layout considerations.
A. DVS Readout Design
The DVS events are communicated using word-serial burst-mode address-event representation (AER) circuits adapted from [12]. The circuits work as follows (block diagram in Fig. 2): if a pixel crosses threshold, it requests to transmit an event by pulling down a row request line shared by all pixels in a row (RR in Fig. 1). A fair arbitration circuit [13] chooses among all active requesting rows and acknowledges a single one (RA). In this acknowledged row, all pixels that have crossed threshold then request in the column direction; the polarity of the event is encoded by different request lines (CRON for ON events, CROFF for OFF events). In the meantime, the row address is registered and transmitted off-chip. While the column addresses and polarities of the acknowledged row are read out and transmitted as a burst, the arbiter can already begin servicing the next row request.
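To make the word-serial format concrete, the following sketch decodes a hypothetical word stream into events (the word layout, with a flag bit marking row-address words and assumed field widths, is invented for illustration; the chip's actual encoding is defined by its interface logic):

```python
def decode_word_serial(words):
    """Decode a hypothetical word-serial AER stream into (row, col, polarity)
    events. Assumed layout: bit 15 set marks a row-address word; following
    words carry a column address in bits 1..14 and the polarity in bit 0."""
    events, row = [], None
    for w in words:
        if w & 0x8000:                  # row-address word starts a new burst
            row = w & 0x7FFF
        else:                           # column word: one event in this row
            col = (w >> 1) & 0x3FFF
            events.append((row, col, 'ON' if (w & 1) else 'OFF'))
    return events

# One row burst: row 42, events at columns 3 (ON) and 7 (OFF).
print(decode_word_serial([0x8000 | 42, (3 << 1) | 1, (7 << 1) | 0]))
```

The key property of the scheme, visible in the decoder, is that all events of a row share one row-address word, which is why an event costs only slightly more than 2 bytes on average (Section V-B).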
B. APS Readout Design
For the APS readout, differential double sampling is essential to remove the significant fixed-pattern noise (FPN) mainly caused by MN2. The first sample is the reset voltage of the pixel and the second sample is the voltage at the end of the exposure period.
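In software terms (a minimal sketch; the array names and synthetic values are placeholders), the digital pixel value is simply the difference of the two samples, which cancels static per-pixel offsets such as the MN2 threshold variation:

```python
import numpy as np

def differential_double_sampling(reset_samples, signal_samples):
    """Pixel value = reset - signal; static per-pixel offsets present in
    both samples (the dominant FPN source) cancel in the difference."""
    return reset_samples - signal_samples

rng = np.random.default_rng(0)
offsets = rng.normal(0, 50, size=(180, 240))           # per-pixel FPN offsets
reset = 800.0 + offsets                                # reset-level sample
signal = reset - 300.0                                 # after exposure
frame = differential_double_sampling(reset, signal)    # 300 DN, FPN-free
print(frame.mean(), frame.std())                       # 300.0, 0.0
```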
The rolling shutter DAVIS readout uses a shift register along the x-axis that carries two bit patterns, which are decoded by column-parallel logic. Using these bit patterns simplifies the interface because it requires only a single clock and data input. The first bit pattern selects the column that is reset and sampled; the second selects the column that is read out. This separation of the reset and signal columns allows sequential readout of the two values in different columns. The distance between the two columns corresponds to the time between the reset read and the signal read, i.e., the exposure time, as illustrated in the sketch below. The first-generation SBRet10 chip [9] is only capable of rolling shutter readout. The SBRet20 sensor reported here has both rolling and global shutter readout modes, as explained below. Fig. 3 compares images of a fan captured from SBRet20 in rolling and global shutter modes to show how the motion artifacts of rolling shutter readout are eliminated by global shutter readout. In Fig. 3(a), the rolling shutter exposure corresponds to the distance between the reset and signal readout planes (marked as exposure). Because the columns are exposed at different times, the exposure plane is tilted and strong motion artifacts can be observed for fast moving objects, as seen in the highly distorted fan shape in Fig. 3(c).
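The following behavioral sketch illustrates the two-token scheme (column count and token spacing are arbitrary): two markers travel through the same shift register, and their separation fixes every column's exposure in clock cycles.

```python
def rolling_shutter_schedule(num_cols, exposure_cycles):
    """Yield (cycle, reset_col, read_col) as the two bit patterns shift
    across the array; read_col trails reset_col by exposure_cycles, so
    every column integrates for exactly that many cycles."""
    for cycle in range(num_cols + exposure_cycles):
        reset_col = cycle if cycle < num_cols else None
        read_col = cycle - exposure_cycles if cycle >= exposure_cycles else None
        yield cycle, reset_col, read_col

for cycle, rst, rd in rolling_shutter_schedule(num_cols=6, exposure_cycles=2):
    print(f"cycle {cycle}: reset col {rst}, read col {rd}")
```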
To allow a synchronous global exposure while still maintaining the differential double sampling, a globally-controlled switch TX (MN3) is added to the SBRet10 pixel, and the pixel APS reset signal CR is made controllable both globally and column-wise. The global shutter readout timing diagram is shown in Fig. 4. To start the frame read, the integration capacitance (Caps) is charged by applying the reset globally; the stored reset levels are then read out column by column. The global exposure is performed by switching TX on, and when TX is switched off again the integrated signal levels are held on Caps and read out column by column, completing the differential double sampling.
The global shutter readout has the additional potential benefit of separating the readout and exposure phases, which could allow an ROI readout. An exposure can be started and stopped at any time, e.g., based on DVS activity. The signal readout could then also be sped up by reading out only part of the array if deemed useful; this partial readout is possible precisely because the global shutter operation separates the exposure and readout phases.
In a rolling shutter readout, the TX switch is always on. After a column reset, all the reset values are read from the column. After this, the column reset CR is set to the overflow-protection voltage Vov and the column integrates; when the signal readout bit pattern reaches the column after the exposure time, the signal values are read out.
C. Layout and Process
The SBRet20 DAVIS chip was built in a 180 nm 6M1P CMOS image sensor (CIS) process with MiM capacitors and uses the surface photodiodes available in this CIS process, rather than the n-well photodiodes of the standard mixed-signal process in which SBRet10 was fabricated [9]. The CIS process used for SBRet20 offers several additional potential advantages that have not yet been utilized, including microlenses, anti-reflection coating, and annealing steps that reduce junction leakage currents; these will be included in a future implementation.
Fig. 5(a) shows a microphotograph of the 5 mm × 5 mm die with an inlay of a 2×2 pixel layout. The pixel layout uses a quad-mirrored form to share source contacts and bias lines, and to group the analog and digital parts of adjoining pixels [1]. It is essential to minimize parasitic capacitive coupling in the pixel layout, particularly to the high-impedance node PR. Parasitic capacitance extraction tools were used to identify and minimize these capacitances; this was successful except for one coupling, as discussed in Section V-E. The pixel pitch is 18.5 μm.
D. USB Camera Implementation
A bus-powered USB camera implementation (Fig. 5(b)) uses a Cypress FX2 high-speed USB 2.0 interface and a Lattice CPLD reconfigurable logic chip to capture time-stamped DVS events with a timestamp resolution of 1 μs, together with the APS frame data.
Characterization
In the following sections, the overall sensor performance, the DVS readout, the APS readout and their interaction are characterized. Table I summarizes the measurements of this section and compares the SBRet20 DAVIS design with recent DVS-based vision sensor implementations.
A. Power Consumption
The 240×180 sensor has a total power consumption between 5 mW and 14 mW depending on the DVS activity (not including the ADC for the APS output). Most of the power is consumed by circuits on the 3.3 V digital supply, mainly the digital output pads: 1.2 mW at low DVS activity and 8.3 mW at high DVS activity. The circuits on the analog 3.3 V supply (mainly the analog output pads) also consume a considerable amount of power: 3.3 mW at high activity with APS readout, but only 0.1 mW without APS readout (pad buffer off) and low DVS activity. The remaining power is consumed by circuits on the 1.8 V supplies, that is, the AER circuits, the pixel backend, and the bias generator.
B. Example Data From Natural Scenes
The portability of this system is demonstrated by the recorded outputs of the sensor shown in Fig. 6 for two example outdoor natural scenes. Fig. 6(a)–(c) show a tennis player hitting a backhand stroke; this data can be viewed in the video "DAVIS tennis.wmv" attached as supplementary material. The global shutter 0.3 ms exposure APS readout in Fig. 6(a) shows no motion artifacts despite the APS frame readout time of about 20 ms. The short time slices of DVS output in Fig. 6(b) show the sparse output of the DVS readout between the APS frames. During the backhand stroke, the average DVS event rate is 60 keps (thousands of events per second), with peak rates of about 250 keps. Each event requires slightly more than 2 bytes on average to transmit because events from the same row share the same row address and timestamp. Therefore, the data rate of the DVS events was about 150 kBps.
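These figures, and the frame-camera comparison in the next paragraph, can be checked with a few lines of arithmetic (the event size and frame parameters are taken directly from the text):

```python
# DVS data rate during the stroke: ~60k events/s at ~2.5 bytes/event.
dvs_rate = 60_000 * 2.5
print(f"DVS: {dvs_rate / 1e3:.0f} kBps")               # ~150 kBps

# Hypothetical 1 kfps frame camera at the same 240x180 resolution, 10-bit.
frame_rate = 240 * 180 * 1000 * 10 / 8
print(f"1 kfps APS: {frame_rate / 1e6:.0f} MBps")      # 54 MBps
print(f"ratio: {frame_rate / dvs_rate:.0f}x")          # ~360x

# Actual APS output in the recording: ~5.7 fps, 10-bit samples.
aps_rate = 240 * 180 * 5.7 * 10 / 8
print(f"APS @ 5.7 fps: {aps_rate / 1e3:.0f} kBps")     # ~310 kBps
```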
As a comparison, a 1 kfps APS camera of equal spatial resolution would produce data at a rate of 240 × 180 × (1000 fps) × (10 bits / 8 bits/byte) = 54 MBps. This rate is about 360 times higher than the DVS data rate, and the temporal resolution of 1 ms would still be below that of the DVS events. The frame rate in the recording was approximately 6 fps, and the data rate of the APS output at the host was about 310 kBps (240 × 180 × 5.7 fps × (10 bits / 8 bits/byte)), twice the data rate of the DVS output. Fig. 6(c) shows a close-up of the arm and racket over a 31 ms time slice where the gray scale level shows the time of the DVS events during the slice. The gray level changes smoothly and continuously over the time slice, demonstrating the continuous analog-time representation of the DVS event timestamps. The fine temporal resolution of the DVS events allows efficient event-driven extraction of dynamic information such as optical flow [15] and object tracking [16]. Fig. 6(d) and (e) show a street scene recorded from the dashboard of a moving vehicle. The synchronous APS output frame allows conventional image processing on this scene. Because the cyclist in the middle of the scene and the tram are moving, they stand out in the 134 μs time slice of the DVS output shown in Fig. 6(d). The non-moving far-away clouds in the sky do not produce DVS events, while the leading and trailing edges of the expanding high-contrast street markings and the traffic signals produce events. The DVS output thus provides a sparse edge map of the moving parts of the scene in each short time slice.

C. DVS Characterization
DVS specifications in Table I were measured using methods established in [1]. Compared to these other implementations, the DAVIS achieves the smallest pixel size, shortest latency, lowest power consumption, highest fill factor and largest output bandwidth, with all other specifications being comparable except for the higher contrast sensitivity of [2].
To measure the maximum output bandwidth of the AER, the chip request line Req was shorted to the chip acknowledge Ack (Fig. 2). The bias settings were chosen so that all pixels fired continuously, resulting in a request rate of 55 MHz. To translate this request rate into the actual event rate (the two are not the same in the word-serial AER scheme), the frequency of the bit signaling a row request was also measured. Subtracting the resulting 5 MHz row request rate from the 55 MHz request rate leads to an AER event bandwidth limit of about 50 Meps. However, the address registration, encoding, and communication through USB 2.0 with the given setup do not allow transmission of more than 12 Meps.
D. APS Characterization
The APS readout was characterized using the photon transfer curve (PTC) technique [17]. The PTC was measured at the camera level using the off-chip ADC, plotting the noise variance against the signal for global shutter readout. According to [17], the conversion gain can be computed from the shot noise statistics of the signal using

$$s_{\mathrm{tot}}^{2} = k^{2}s_{R}^{2} + k\,(S_{\mathrm{tot}} - S_{\mathrm{off}}) \quad (1)$$

where, in the standard PTC formulation, $s_{\mathrm{tot}}^{2}$ is the total temporal noise variance, $s_{R}$ the input-referred read noise, $k$ the conversion gain, $S_{\mathrm{tot}}$ the mean signal, and $S_{\mathrm{off}}$ the dark offset.
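As a sketch of how the fit yields the conversion gain (the synthetic data and function names are placeholders; a real measurement would use the statistics of captured frames), the slope of the variance-versus-signal line in Eq. (1) is k:

```python
import numpy as np

def fit_conversion_gain(mean_signal, noise_var, s_off):
    """Fit noise variance vs. (signal - offset); per Eq. (1) the slope is
    the conversion gain k and the intercept is the read-noise term."""
    slope, intercept = np.polyfit(mean_signal - s_off, noise_var, 1)
    return slope, intercept

# Synthetic PTC data: k = 0.09 DN/e-, read-noise term 4 DN^2, offset 20 DN.
k_true, s_off = 0.09, 20.0
sig = np.linspace(50, 800, 30)
var = k_true * (sig - s_off) + 4.0
k_fit, read_term = fit_conversion_gain(sig, var, s_off)
print(f"k = {k_fit:.3f} DN/e-, read-noise term = {read_term:.2f} DN^2")
```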
E. Characterization of the APS-DVS Interaction
In order to exploit the advantages of the two parallel readouts, the readouts should not interfere with each other. The measured interference of the APS readout on the DVS readout and vice versa is discussed next.
1. Influence of DVS Event Rate on APS SNR
Ideally, the APS SNR is not corrupted by the transmission of DVS events, and the image quality should not depend on the DVS event rate. To verify this, the chip was illuminated using an integrating sphere to create spatially uniform, sinusoidally modulated light at different frequencies, producing different DVS event rates. The temporal noise of the APS readout was measured as the standard deviation of the pixel signals over a series of frames. To remove the APS sample variations caused by the modulation of the light, which would inflate the computed noise value, the mean value of each frame was subtracted from each pixel measurement. Measurements at DVS event rates ranging from 0 to more than 1 keps show that the temporal noise in the APS output is not correlated with the event rate (the data are not plotted because the noise-versus-event-rate curve is flat, with no measurable correlation). The influence of DVS activity on APS noise is therefore negligible.
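The noise computation can be sketched as follows (a minimal version of the described procedure; the synthetic frame stack is a stand-in for captured frames):

```python
import numpy as np

def aps_temporal_noise(frames):
    """Per-pixel temporal noise with the global light modulation removed:
    subtract each frame's mean, then take the std over frames per pixel."""
    frames = frames.astype(np.float64)
    detrended = frames - frames.mean(axis=(1, 2), keepdims=True)
    return detrended.std(axis=0).mean()    # average per-pixel noise, in DN

# Synthetic check: a sinusoidal global modulation adds no per-pixel noise.
rng = np.random.default_rng(1)
mod = 100 * np.sin(np.linspace(0, 6, 50))[:, None, None]
frames = 500 + mod + rng.normal(0, 2, size=(50, 180, 240))
print(aps_temporal_noise(frames))          # ~2 DN, independent of mod
```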
2. Influence of APS Sample Rate on DVS Output
To estimate how many DVS events are triggered by the readout of each frame, the event rate was measured as a function of the frame rate while the APS exposure time was kept constant. To ensure that all events were caused by noise, the sensor was illuminated with uniform, non-changing light using an integrating sphere. As seen in Fig. 7, the events per second (eps) are a linear function of the frames per second (fps), and the slope corresponds to the number of events triggered by the acquisition of a frame. The vertical offset of the measurement is due to the background activity caused by leakage in the reset transistor [1]. The global shutter readout scheme led to about 110 events per frame (0.25% of the pixels) triggered by the readout of the frame. These events might be caused by capacitive coupling between the TX and Vpr nodes. To verify this, the sensor was operated in rolling shutter mode, where TX is not clocked. Under these conditions, the event rate was no longer correlated with the frame rate, which indicates that the frame-readout-triggered noise events can be avoided by shielding the Vpr node from the TX gate. Post-layout simulation confirmed an undesirable coupling between TX and Vpr; unfortunately, this problem was not identified before tapeout.
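A linear fit of this kind (a sketch with synthetic numbers standing in for the Fig. 7 measurement) separates the per-frame event count from the leakage background:

```python
import numpy as np

# Synthetic stand-in for the Fig. 7 measurement: event rate vs. frame rate.
fps = np.array([0.0, 5.0, 10.0, 20.0, 40.0])
eps = 110.0 * fps + 900.0      # 110 events/frame plus a leakage background

slope, offset = np.polyfit(fps, eps, 1)    # slope = events per frame
print(f"{slope:.0f} events/frame, {offset:.0f} eps background")
```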
Conclusion
The combination of frame-based image readout and asynchronous event-based temporal contrast readout allows the DAVIS to capture information on static as well as dynamic visual scene content. This dual readout is achieved through a shared photodiode and by adding five transistors to the original DVS pixel, increasing the DVS pixel area by only about 5%. While the DVS readout does not affect the APS readout, the APS readout triggers DVS events, but this can be overcome by an improved pixel layout. The 240×180 sensor has a power consumption (not including the ADC for the APS output) between 5 mW and 14 mW depending on the DVS activity. The DVS has a minimum latency of 3 μs.
The temporal contrast DVS events enhance the frame-based approach to machine vision in several ways. In mobile devices, autonomous robots, and other applications where power consumption and latency are critical, the events can be used to track moving features in a scene and to analyze motion. Using the events decreases the processing latency and cost compared with conventional frame-based cameras. When frames are needed, a low frame rate can be used to reduce power consumption. Online analysis of the DVS events (e.g., by counting events, detecting increases in event rate, or detecting tracked objects) can be used to trigger frame exposure and optional readout when the data is determined to be necessary, and ROI readout control can further limit the data rate; a sketch of such a trigger policy follows below. Overall, this combination of functionalities can enable faster and lower-power vision.
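As an illustration of this system-level policy (a hypothetical sketch: the camera interface, its method names, and the threshold are invented for illustration and are not part of the sensor's API), a host could trigger frame capture only when the DVS event rate rises:

```python
def event_triggered_capture(camera, rate_threshold_eps=10_000, window_s=0.01):
    """Poll DVS events and request an APS frame only when the event rate in
    the last window exceeds the threshold. `camera` is a hypothetical host
    interface with read_events(window_s) and capture_frame() methods."""
    while True:
        events = camera.read_events(window_s)   # DVS events, always on
        rate = len(events) / window_s
        if rate > rate_threshold_eps:           # scene activity detected
            frame = camera.capture_frame()      # fire a global exposure
            process(frame, events)              # recognition on demand

def process(frame, events):
    """Placeholder for tracking (events) plus recognition (frame)."""
    pass
```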