# A 200-Mb/s Energy Efficient Transcranial Transmitter Using Inductive Coupling

Wen Li, Member, IEEE, Yida Duan, Member, IEEE, and Jan Rabaey, Fellow, IEEE

Abstract—This paper presents an energy-efficient wireless transmitter (TX) for neural implants. It uses inductive coupling with a de-Q'ed TX inductor to achieve 200-Mb/s throughput. An ultra-low-power injection-locked phase-locked loop with background frequency calibration generates a clean 200-MHz TX clock from a 10-MHz reference. The TX chip is fabricated in a TSMC 65-nm CMOS process, and the $10 \times 10 \text{ mm}^2$ coupled inductors are implemented on two-layer printed circuit boards. A custom receiver is fabricated in the same CMOS process to facilitate measurements. The prototype transceiver achieves a 5e-11 bit error rate (BER) over the 11.8-mm-thick skull of an eight-week-old primordial piglet carcass and <1e-12 BER over an 11-mm air gap. The entire TX chip consumes 300 $\mu$W from a single 0.5-V supply, giving a TX energy efficiency of 1.5 pJ/b.

*Index Terms*—Brain machine interface, biomedical implant, inductive coupling, injection locking, transcranial wireless links, ultra-low power transceiver.

## I. INTRODUCTION

**R**ECENT advances in brain machine interface (BMI) technology have enabled doctors and neuroscientists to understand and decode complex brain functions. Applications of BMI include sensory-motor restoration for paralyzed patients [1]–[3], pre-surgical mapping to find seizure locations [4], and deep brain stimulation to treat Parkinson's disease [5]. Typical BMI systems record neural signals in real time and transmit the collected data outside the skull to be processed or translated into action. Since through-the-skull wires that connect neuron readout channels to external devices pose a major risk of infection, wireless data transmission is preferred. However, the large amount of data generated by thousands of neurons imposes a stringent requirement on the transmitter throughput. For example, decoding simple brain functions such as rudimentary movement of an arm requires signals from close to a hundred neurons to be analyzed [1]. The next-generation 1024-channel neural recorder, which uses an 8-b, 20-kS/s ADC per channel, generates a data stream of up to 164 Mb/s [6], [7]. In addition, the implanted TX does not have access to an external power supply, so its power consumption must be kept as low as possible to prolong battery life. In contrast, the power constraint on the wireless receiver outside the skull is much more relaxed.

Manuscript received July 16, 2018; revised September 3, 2018 and October 28, 2018; accepted November 24, 2018. Date of publication January 25, 2018; date of current version March 22, 2019. This work was supported in part by DARPA Subnets, in part by STARnet Human Intranet, in part by BWRC, and in part by StarNet Terraswarm Center (SRC/DARPA sponsored) with Award 034206-002 and Sponsor Award ID 2013-MA-2386. This paper was recommended by Associate Editor M. Ghovanloo. (*Corresponding author: Wen Li.*)

W. Li is with Qualcomm Atheros, San Jose, CA 95110 USA (e-mail: wen.li@berkeley.edu).

Y. Duan is with Inphi Corporation, Santa Clara, CA 95054 USA (e-mail: yidaduan@gmail.com).

J. Rabaey is with the University of California, Berkeley, Berkeley, CA 94720 USA (e-mail: jan@eecs.berkeley.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
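The data-rate figure quoted for the 1024-channel recorder follows directly from channel count × resolution × sample rate. The short Python sketch below reproduces it, assuming raw samples with no compression and no framing overhead (an assumption; the cited recorders may add either).

```python
def recorder_data_rate(channels: int, bits_per_sample: int, sample_rate_hz: float) -> float:
    """Raw output data rate (b/s) of a multichannel recorder, ignoring framing/compression."""
    return channels * bits_per_sample * sample_rate_hz


# Channel count, ADC resolution and sample rate as quoted in the text.
rate = recorder_data_rate(1024, 8, 20e3)
print(f"{rate / 1e6:.2f} Mb/s")  # 163.84 Mb/s
```

The result, about 164 Mb/s, sits just below the 200-Mb/s TX throughput targeted in this work.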

Most state-of-the-art wireless TXs for biomedical implants that use ultra-wide-band (UWB), backscattering, or pulse-harmonic modulation (PHM) cannot satisfy both the throughput and the power requirements of next-generation neural recorders. Although UWB has the potential for high throughput [8], [9], it requires a bulky broadband antenna with a 50-$\Omega$ input impedance. A large inverter must be used to match the 50-$\Omega$ antenna impedance, and significant power is consumed just to charge the input capacitance of this inverter driver. The throughput of a backscattering TX, on the other hand, is limited to a small fraction of the carrier frequency by its narrowband modulation scheme and is usually <10 Mb/s [10], [11]. The recently proposed PHM technique is promising for high energy efficiency [12], [13], but its on-off keying (OOK) modulation makes it difficult to extend the throughput to hundreds of Mb/s. Other techniques such as ultrasound usually have even lower data rates and have difficulty penetrating the skull because of reflection at the boundary between skull bone and brain [14], [15].

To overcome these challenges, this work uses inductive coupling to reduce the driver size and significantly improve TX power efficiency. The TX inductor is de-Q'ed to achieve high throughput, and an injection-locked phase-locked loop (IL-PLL) is used to minimize the impact of jitter on BER. The paper is organized as follows. After a brief overview of the proposed transceiver system, Section II introduces inductively-coupled data transmission in the context of transcranial links, discusses the coupled-inductor design, and explores methods to improve the transmission throughput. Section III focuses on ultra-low-power clock generation for the implanted TX. After a brief description of the die photo and the custom receiver used for testing, measurement results are presented in Section IV. Section V concludes the paper.


1932-4545 © 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

Digital Object Identifier 10.1109/TBCAS.2018.2889802



Fig. 1. (a) System overview of the proposed transcranial transceiver, (b) layered tissue model for cranium channel, and (c) dielectric constant and loss of bio-tissues used in the model.

## II. ARCHITECTURE AND DESIGN OF INDUCTIVELY-COUPLED TRANSCRANIAL TRANSMITTER

#### A. System Overview and Coupled Inductor Design

The overall system diagram of the proposed inductively-coupled transcranial transceiver is shown in Fig. 1a. The implanted TX chip consists of a PLL, a pseudo-random bit sequence (PRBS) generator, and an inductor driver. The PLL generates the 200-MHz clock from a 10-MHz reference. The PRBS generator produces a 200-Mb/s random bit stream, and the inductor driver sends the bit stream to the TX inductor directly, without carrier modulation. The TX data pulses are transmitted over the skull channel to the RX inductor through magnetic coupling. The RX chain consists of 4 cascaded pre-amplifiers and a comparator for bit detection. The comparator is clocked at the same rate as the transmitted data. This work focuses on the TX architecture and design; the following subsections discuss data modulation and transmission in detail.
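The paper does not say which PRBS polynomial the on-chip generator uses. Purely as an illustration, the sketch below assumes the common PRBS7 pattern (polynomial $x^7 + x^6 + 1$) implemented as a 7-bit Fibonacci LFSR; the function name `prbs7` and its seed are hypothetical. At 200 Mb/s each output bit would occupy one 5-ns bit slot.

```python
def prbs7(n_bits: int, seed: int = 0x7F) -> list[int]:
    """Generate n_bits of a PRBS7 (x^7 + x^6 + 1) sequence from a 7-bit LFSR.

    The polynomial is an illustrative assumption, not taken from the paper.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("an all-zero LFSR state never leaves zero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of taps 7 and 6
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F      # shift and insert feedback bit
    return bits


seq = prbs7(254)
print(sum(seq[:127]), seq[:127] == seq[127:])  # 64 True
```

A maximal-length 7-bit LFSR repeats every 127 bits and contains 64 ones and 63 zeros per period, which is why such sequences are a convenient stand-in for random data during BER testing.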

A typical cranium channel, shown in Fig. 1b, is 11 mm thick and consists of 7 mm of skull bone, 2 mm of fat, and 2 mm of skin [16]. As shown in Fig. 1c, the bio-tissues in the channel have electrical properties similar to lossy dielectrics [17]. The skull is sandwiched between the TX and RX inductors, with the RX inductor on top of the skin and the implanted TX inductor below the skull bone. For good signal-to-noise ratio (SNR), the received signal must be significantly above the noise floor, so the mutual inductance cannot be too small. On-chip inductors are limited by the die size and cannot achieve the required mutual inductance, so $10 \times 10 \text{ mm}^2$ off-chip coupled inductors are used. As shown in Fig. 2a, both TX and RX inductors are implemented using 0.2-mm-wide traces on two-layer PCBs. A 2-turn TX coil is used to further increase the mutual inductance<sup>1</sup> at the cost of a lower self-resonant frequency. As will be discussed in Section II-C, the issues caused by the low TX self-resonant frequency can be addressed using the de-Q technique, whereas the effect of a low self-resonant frequency in the RX inductor cannot be alleviated without reducing SNR. Therefore, a 1-turn RX coil is used for its high self-resonant frequency. The coupled-inductor structure, along with the layered bio-tissue model, is carefully simulated in the electromagnetic simulator HFSS. The extracted self and mutual inductances are shown in Fig. 2b. The mutual inductance is around 1 nH. The RX inductance is 30 nH with its self-resonance at 2.4 GHz. The TX inductance is 92 nH, but its self-resonance is as low as 312 MHz. This low TX self-resonance can cause inter-symbol interference (ISI) and significantly worsen BER, as discussed further in Section II-B. All on-board clock and data traces except the inductor coils are shielded underneath by a ground plane to minimize undesired magnetic coupling to the inductor coils.

Fig. 2. (a) PCB photos of coupled inductors, and (b) simulated self and mutual inductance.

#### B. Data Modulation and Pulse Response

To understand data modulation in inductively-coupled links, we first use ideal coupled inductors shown in Fig. 3a as an example. The voltages across the TX and RX inductors  $(V_{T,R})$  can be written as a function of the current flowing into them  $(I_{T,R})$  using the following equation:

$$\begin{bmatrix} V_R \\ V_T \end{bmatrix} = \begin{bmatrix} sL_R & sM \\ sM & sL_T \end{bmatrix} \cdot \begin{bmatrix} I_R \\ I_T \end{bmatrix} \tag{1}$$

If the RX load impedance $Z_L$ is large, the RX current $I_R$ is approximately 0. The RX voltage is then simply $V_R = sMI_T$, i.e., $M$ times the derivative of the TX current. Taking advantage of this fact, the TX can be designed as a simple current driver that injects return-to-zero current pulses into the TX inductor, as shown in Fig. 3b [18], [19]. The transmitted bits are encoded in the polarity of the TX current. The received voltage waveform $V_R$ is a series of pulses that correspond to the transitions of $I_T$: a positive pulse followed by a negative pulse represents bit 1, and the opposite represents bit 0.

<sup>1</sup>Note that the mutual inductance is proportional to the number of turns in the TX coil because more turns generate a larger magnetic flux. The coupling factor, on the other hand, is relatively independent of the number of turns.

Fig. 3. (a) Circuit model and (b) signal waveforms of ideal coupled inductors; (c) circuit model of real coupled inductors including parasitic elements.

In reality, the $10 \times 10 \text{ mm}^2$ on-board inductors shown in Fig. 2a have parasitic resistors ($R_R$, $R_T$) that cause loss and parasitic capacitors ($C_R$, $C_T$) that give rise to self-resonance (Fig. 3c). Equation (1) must be modified to take these parasitic elements into account:

$$\begin{bmatrix} V_R - (I_R - sC_R V_R) R_R \\ V_T - (I_T - sC_T V_T) R_T \end{bmatrix} = \begin{bmatrix} sL_R & sM \\ sM & sL_T \end{bmatrix} \cdot \begin{bmatrix} I_R - sC_R V_R \\ I_T - sC_T V_T \end{bmatrix} \tag{2}$$

Define the self-resonant frequencies of the inductors as $f_{T,R} = \omega_{T,R}/2\pi = 1/\left(2\pi\sqrt{L_{T,R}C_{T,R}}\right)$, their quality factors as $Q_{T,R} = \sqrt{L_{T,R}/C_{T,R}}/R_{T,R}$, and the coupling factor as $k = M/\sqrt{L_T L_R}$. With $I_R = 0$, the transfer function from $I_T$ to $V_R$ can then be derived from Equation (2):

$$V_R = \frac{sM \cdot I_T}{\left[1 + \frac{s}{Q_T \omega_T} + \left(\frac{s}{\omega_T}\right)^2\right] \left[1 + \frac{s}{Q_R \omega_R} + \left(\frac{s}{\omega_R}\right)^2\right] - \left(\frac{k s^2}{\omega_T \omega_R}\right)^2} \tag{3}$$

Since $M \ll \sqrt{L_T L_R}$ (Fig. 2b), $k \ll 1$ (about 0.02 for the simulated values), so the coupling term in the denominator of Equation (3) can be neglected:

$$V_R \approx \frac{sM \cdot I_T}{\left[1 + \frac{s}{Q_T \omega_T} + \left(\frac{s}{\omega_T}\right)^2\right] \left[1 + \frac{s}{Q_R \omega_R} + \left(\frac{s}{\omega_R}\right)^2\right]} \tag{4}$$
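The step from Equation (2) to Equations (3) and (4) can be checked numerically. The Python sketch below solves the two rows of Equation (2) with $I_R = 0$ by Cramer's rule at a single frequency and compares the result with the closed forms. $L_T$, $L_R$, $M$ and the self-resonant frequencies are the Fig. 2b values quoted in the text; $C_T$ and $C_R$ are back-computed assuming a lumped model, and $R_T$, $R_R$ are hypothetical placeholders. Solved directly from Equation (2), the coupling term enters the denominator with a negative sign.

```python
import math

# Fig. 2b values from the text; C_T, C_R back-computed (lumped-model assumption).
L_T, L_R, M = 92e-9, 30e-9, 1e-9
W_T, W_R = 2 * math.pi * 312e6, 2 * math.pi * 2.4e9
C_T, C_R = 1 / (W_T**2 * L_T), 1 / (W_R**2 * L_R)
R_T, R_R = 262.5, 1.0  # hypothetical series resistances (ohm)

Q_T = math.sqrt(L_T / C_T) / R_T
Q_R = math.sqrt(L_R / C_R) / R_R
K = M / math.sqrt(L_T * L_R)


def v_r_from_eq2(s: complex, i_t: complex = 1.0) -> complex:
    """Solve the two rows of Eq. (2) for V_R with I_R = 0 (Cramer's rule)."""
    d_r = 1 + s * C_R * R_R + s**2 * L_R * C_R
    d_t = 1 + s * C_T * R_T + s**2 * L_T * C_T
    # Row R: d_r * V_R + s^2 M C_T * V_T = s M I_T
    # Row T: s^2 M C_R * V_R + d_t * V_T = (R_T + s L_T) I_T
    a12, a21 = s**2 * M * C_T, s**2 * M * C_R
    b1, b2 = s * M * i_t, (R_T + s * L_T) * i_t
    return (b1 * d_t - a12 * b2) / (d_r * d_t - a12 * a21)


def resonance(s: complex, w0: float, q: float) -> complex:
    """Second-order factor 1 + s/(Q w0) + (s/w0)^2."""
    return 1 + s / (q * w0) + (s / w0) ** 2


def v_r_eq3(s: complex) -> complex:
    coupling = (K * s**2 / (W_T * W_R)) ** 2
    return s * M / (resonance(s, W_T, Q_T) * resonance(s, W_R, Q_R) - coupling)


def v_r_eq4(s: complex) -> complex:
    return s * M / (resonance(s, W_T, Q_T) * resonance(s, W_R, Q_R))


s = 2j * math.pi * 100e6  # evaluate at 100 MHz
exact = v_r_from_eq2(s)
print(f"k = {K:.4f}")
print(f"Eq. (3) rel. error: {abs(v_r_eq3(s) - exact) / abs(exact):.1e}")
print(f"Eq. (4) rel. error: {abs(v_r_eq4(s) - exact) / abs(exact):.1e}")
```

With these values $k \approx 0.019$, and dropping the $k^2$ term changes $V_R$ by well under one part in a million at 100 MHz.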

The numerator of Equation (4) is the same as that of the ideal coupled inductors, while the denominator represents the $2^{nd}$-order RLC responses caused by the TX and RX self-resonances. Since $Q_{T,R}$ of the off-chip inductors are much greater than 0.5, both TX and RX coils are underdamped, which causes ringing in the pulse response. Since this design targets a 200-Mb/s data rate, a 2.5-ns-wide current pulse (half of the 5-ns bit time) with 100-ps transition time is injected into the TX coil to simulate the transmitted current pulse. As shown in Fig. 4, the RX voltage waveform is so heavily distorted by ringing that the positive and negative $V_R$ pulses of the ideal coupled inductors can no longer be distinguished. The small-amplitude, high-frequency ringing is caused by the RX inductor resonance, whereas the large-amplitude, low-frequency ringing is caused by the TX resonance. Note that the TX ringing lasts much longer than the 5-ns bit time, which results in significant ISI that corrupts subsequent bits and causes detection errors. Therefore, this ringing-induced ISI must be minimized.

Fig. 4. Pulse response of the coupled inductors with and without de-Q.
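For reference, the ideal return-to-zero waveforms of Fig. 3b, which the ringing in Fig. 4 obscures, can be reproduced with a short Python sketch. It assumes the parasitic-free model of Equation (1) with an open-circuit RX ($I_R = 0$), so $v_R = M\,dI_T/dt$, approximated here by a backward difference; the helper names `rz_current` and `ideal_rx_voltage` are hypothetical.

```python
def rz_current(bits, i_peak=1.0, samples_per_bit=8):
    """Return-to-zero TX current: +i_peak (bit 1) or -i_peak (bit 0) during the
    first half of each bit period, then zero for the second half."""
    half = samples_per_bit // 2
    wave = []
    for b in bits:
        level = i_peak if b else -i_peak
        wave += [level] * half + [0.0] * (samples_per_bit - half)
    return wave


def ideal_rx_voltage(i_t, m=1e-9, dt=5e-9 / 8):
    """Ideal coupled inductors with open-circuit RX: v_R = M * dI_T/dt,
    approximated by a backward difference with time step dt."""
    v, prev = [], 0.0
    for i in i_t:
        v.append(m * (i - prev) / dt)
        prev = i
    return v


# Normalized units (M = 1, dt = 1) to show only the pulse polarities.
v = ideal_rx_voltage(rz_current([1, 0], samples_per_bit=4), m=1.0, dt=1.0)
print(v)  # [1.0, 0.0, -1.0, 0.0, -1.0, 0.0, 1.0, 0.0]
```

Bit 1 yields a positive pulse followed by a negative one and bit 0 the reverse, matching the encoding described for Fig. 3b.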

#### C. ISI Suppression by Inductor De-Q

Although equalization techniques such as continuous-time linear equalization (CTLE), feed-forward equalization (FFE), and decision feedback equalization (DFE) can in theory remove ISI, none of them is effective against ringing. The CTLE approach inverts the frequency response of the channel using a continuous-time filter. To equalize the resonant peak caused by the TX inductor, the CTLE must have a sharp notch at exactly the resonant frequency, which is difficult to implement in practice because the location of the notch is sensitive to process and temperature variations. Discrete-time equalizers such as FFE and DFE, on the other hand, rely on accurate signal samples. Ringing makes the sampled signal extremely sensitive to sampling-time error, so the BER of FFE and DFE approaches can be severely limited unless TX jitter is small. Ringing must therefore be removed before sampling or bit detection. In this work, a simple but effective method, inductor de-Q, is used to alleviate ringing.

The ringing arises because large $Q_{T,R}$ values cause an underdamped response, so its effect can be reduced simply by adding series resistance $R_{T,R}$ to lower $Q_{T,R}$; hence the name de-Q. Since adding $R_{T,R}$ increases noise and can degrade SNR, de-Q is used only in the TX coil, where the signal amplitude is large. Fortunately, the self-resonant frequency of the 1-turn RX inductor is much higher than the signal bandwidth (~200 MHz), so the small RX ringing can be suppressed by a simple low-pass filter. To completely remove TX ringing, $Q_T$ is set to 0.5 to meet the critical damping condition, and Equation (4) becomes:

$$V_R \approx \frac{sM \cdot I_T}{\left(1 + \frac{s}{\omega_T}\right)^2 \left[1 + \frac{s}{Q_R \omega_R} + \left(\frac{s}{\omega_R}\right)^2\right]} \tag{5}$$

With $Q_T = 0.5$, the TX factor $1 + 2s/\omega_T + (s/\omega_T)^2$ is a perfect square, so the pair of complex-conjugate poles caused by the TX self-resonance is converted into two coincident real poles at $s = -\omega_T$. As a result, TX ringing is completely removed.
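How $Q_T$ moves the TX poles can be seen by solving $s^2 + (\omega_T/Q_T)s + \omega_T^2 = 0$ directly. The Python sketch below uses the 312-MHz TX self-resonance from the text; the comparison values $Q_T = 5$ and $Q_T = 1$ are hypothetical, since the paper does not give the coil's un-de-Q'ed $Q$.

```python
import cmath
import math


def tx_poles(w0: float, q: float) -> tuple[complex, complex]:
    """Roots of 1 + s/(Q*w0) + (s/w0)^2 = 0, i.e. of s^2 + (w0/Q) s + w0^2 = 0."""
    disc = cmath.sqrt((w0 / q) ** 2 - 4 * w0**2)
    return (-w0 / q + disc) / 2, (-w0 / q - disc) / 2


W_T = 2 * math.pi * 312e6  # TX self-resonance quoted in the text
for q in (5.0, 1.0, 0.5):  # 5.0 and 1.0 are hypothetical; 0.5 is critical damping
    p1, p2 = tx_poles(W_T, q)
    tau_ns = 1e9 / min(abs(p1.real), abs(p2.real))  # slowest exponential decay
    print(f"Q = {q}: poles / w_T = {p1 / W_T:.3f}, {p2 / W_T:.3f}; tau = {tau_ns:.2f} ns")
```

For an underdamped coil the envelope decays with time constant $2Q_T/\omega_T$, about 5 ns for $Q_T = 5$, i.e. a full bit time per e-fold; at $Q_T = 0.5$ both poles sit at $-\omega_T$ and the response settles in roughly half a nanosecond per time constant with no oscillation.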



Fig. 5. (a) Schematic and (b) signal waveform of TX.

The de-Q resistor value can be found by setting $Q_T = \sqrt{L_T/C_T}/R_T = 0.5$. One drawback of the de-Q technique is that the added resistor reduces the maximum RX voltage amplitude: the larger the de-Q resistance, the smaller the amplitude. In this design, simulation shows that making the TX coil slightly underdamped gives the maximum RX eye opening, which translates into a total de-Q resistance of roughly 262.5 $\Omega$. To implement the de-Q resistor, 7 surface-mount resistors of 37.5 $\Omega$ each are distributed along the TX coil, as shown in Fig. 2a. The simulated pulse response after de-Q in Fig. 4 shows that the large-amplitude ringing is eliminated. The pair of $V_R$ pulses that corresponds to the up and down $I_T$ transitions can be clearly identified, and ISI is no longer significant after 5 ns.
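The de-Q resistor arithmetic can be sketched with the quoted $L_T = 92$ nH and $f_T = 312$ MHz. This Python sketch assumes a lumped series-RLC model in which $C_T$ is back-computed from $f_T$, the coil's own trace resistance is ignored, and the resonance shift caused by surrounding tissue is not modeled; these are simplifying assumptions, not the paper's extraction.

```python
import math

L_T = 92e-9   # TX inductance (Fig. 2b)
F_T = 312e6   # TX self-resonant frequency (Fig. 2b)
N_RES = 7     # number of distributed surface-mount resistors

w_t = 2 * math.pi * F_T
c_t = 1 / (w_t**2 * L_T)      # lumped parasitic capacitance implied by F_T
z0 = math.sqrt(L_T / c_t)     # sqrt(L_T / C_T), equal to w_T * L_T

r_critical = z0 / 0.5         # R_T that gives Q_T = 0.5 (critical damping)
r_design = 262.5              # total de-Q resistance chosen in the text
q_design = z0 / r_design      # resulting Q_T (> 0.5 means slightly underdamped)

print(f"C_T ~ {c_t * 1e12:.2f} pF, R_crit ~ {r_critical:.1f} ohm")
print(f"Q_T with {r_design} ohm ~ {q_design:.3f}")
print(f"per-resistor value = {r_design / N_RES} ohm")
```

Under this model, critical damping would need about 361 $\Omega$, while the chosen 262.5 $\Omega$ gives $Q_T \approx 0.69$, consistent with the "slightly underdamped" choice described above.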

#### D. Transmitter Implementation

The TX circuit shown in Fig. 5a consists of an inductor driver ($M_{1-12}$), a phase-locked loop (PLL), and a pseudo-random bit sequence (PRBS) generator for testing. A 0.5-V supply is used for the entire TX. The PLL generates the 200-MHz TX clock (*ck*) from a 10-MHz off-chip reference. The differential outputs of the PRBS generator, *data* and $\overline{data}$, are synchronized to the rising edge of *ck*. The inductor driver design is similar to [20], [21]. The output stage $M_{1-4}$ directly drives the TX inductor. Transistors $M_{5-8}$ and $M_{9-12}$ form two current-starved inverters that limit the edge rate of the inductor current $I_T$. Depending on the polarity of the data, either $M_{2,3}$ or $M_{1,4}$ are turned on at any given time to control the direction of $I_T$. When *data* = 1, $M_{1,4}$ are off and $M_2$ is turned on immediately at the rising edge of *ck*. The current-starved inverter ($M_{5-8}$) slowly discharges node X, which gradually turns on $M_3$ and ramps up $I_T$. Since X is gated by *ck* through a NOR gate and a current-starved inverter, the falling edge of *ck* slowly charges X back to 0.5 V to shut off $M_3$, causing $I_T$ to gradually return to zero. When *data* = 0, $M_{1,4}$ are turned on during the high phase of *ck* to steer $I_T$ in the opposite direction.

Fig. 6. (a) Schematic, (b) phase noise model, and (c) timing diagram of an injection-locked ring oscillator.

## III. POWER EFFICIENT CLOCK GENERATION USING INJECTION LOCKING

In addition to ISI, timing errors caused by random and deterministic jitter can also limit the system BER if left unchecked. Because the de-Q'ed TX inductor and driver have finite bandwidth, the RX pulses have non-zero rise and fall times, which convert timing error into voltage error. A clean clock source is therefore desired, and generating it within the stringent power budget of an implant is challenging. This section discusses circuit and system techniques to reduce clock jitter while keeping the clock-generator power as low as possible.

#### A. Noise Rejection by Injection Locking

The largest contributor to the overall clock jitter is the oscillator. The implanted TX requires a clock generator with both low power consumption and low jitter. Although an LC oscillator has the potential to meet these requirements, its large inductor makes it unattractive. An ultra-low-power ring oscillator (RO) is a viable alternative, but it can be extremely noisy. A conventional PLL is insufficient to suppress the large RO phase noise because of its low bandwidth: in most designs, the bandwidth of a conventional PLL is limited to around one tenth of the reference frequency ($f_r$) to ensure loop stability, so oscillator phase noise beyond the PLL bandwidth is not effectively rejected. An effective method to extend the noise-rejection bandwidth is injection locking [22].

Fig. 7. Phase noise transfer function of injection locking and conventional PLL.

Fig. 6a shows an example of an injection-locked ring oscillator (ILRO) circuit, and Fig. 6c shows its clock waveforms. At the beginning of every reference clock period, $T_r$, an injection clock with a narrow pulse width, $CK_{inj}$, briefly turns on switch S<sub>1</sub> to reset $CK_{out}$. In the phase domain, the injection corrects the phase error accumulated during the previous reference clock period. If the injection is strong enough, the accumulated phase error is reset to 0 every time a $CK_{inj}$ pulse arrives [22], [23] (Fig. 6c). This mechanism can be modeled as the feedforward noise-cancellation circuit shown in Fig. 6b. The phase noise of the free-running RO, $\phi_{e,RO}$, is sampled at the rising edges of $CK_{inj}$ and subsequently subtracted from itself to produce the output phase error, $\phi_{e,o}$. In the frequency domain, $\phi_{e,o}$ can be written as:

$$\phi_{e,o}(f) = f_r \left[\delta\left(\frac{f}{f_r}\right) - \sum_{n=-\infty}^{\infty} \delta\left(f - nf_r\right) \operatorname{sinc}\left(\frac{f}{f_r}\right) e^{\frac{j\pi f}{f_r}}\right] \otimes \phi_{e,RO}(f) \tag{6}$$

where the operator $\otimes$ denotes convolution. If the jitter bandwidth of the RO is much lower than $f_r$, all terms inside the summation except $n = 0$ can be ignored, and Equation (6) simplifies to:

$$\phi_{e,o}\left(f\right) = \left[1 - \operatorname{sinc}\left(\frac{f}{f_r}\right)e^{\frac{j\pi f}{f_r}}\right]\phi_{e,RO}\left(f\right) \tag{7}$$

For $f_r = 10$ MHz, the jitter transfer function described by Equation (7) is plotted in Fig. 7. For comparison, the jitter transfer function of a 2<sup>nd</sup>-order conventional PLL with 1-MHz bandwidth and a damping factor of 0.5 is also shown. Injection locking rejects oscillator phase noise significantly better than the conventional PLL.
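The comparison in Fig. 7 can be approximated numerically. The Python sketch below evaluates the magnitude of Equation (7) with $\operatorname{sinc}(x) = \sin(\pi x)/(\pi x)$, and the VCO-noise transfer $s^2/(s^2 + 2\zeta\omega_n s + \omega_n^2)$ of a type-II, 2nd-order PLL. Interpreting the PLL's "1-MHz bandwidth" as its natural frequency $f_n$ is an assumption made here for illustration.

```python
import cmath
import math


def injection_tf(f: float, f_r: float) -> float:
    """|1 - sinc(f/f_r) * exp(j*pi*f/f_r)| from Eq. (7), sinc(x) = sin(pi x)/(pi x)."""
    x = f / f_r
    sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
    return abs(1 - sinc * cmath.exp(1j * math.pi * x))


def pll_vco_tf(f: float, f_n: float, zeta: float) -> float:
    """|s^2 / (s^2 + 2*zeta*w_n*s + w_n^2)|: VCO-noise transfer of a type-II PLL.
    Using the quoted 1-MHz bandwidth as f_n is an assumption."""
    s = 2j * math.pi * f
    w_n = 2 * math.pi * f_n
    return abs(s**2 / (s**2 + 2 * zeta * w_n * s + w_n**2))


F_R = 10e6
for f in (1e5, 1e6, 3e6):
    inj = 20 * math.log10(injection_tf(f, F_R))
    pll = 20 * math.log10(pll_vco_tf(f, 1e6, 0.5))
    print(f"{f / 1e6:4.1f} MHz: injection {inj:6.1f} dB, PLL {pll:6.1f} dB")
```

Under these assumptions, injection locking attenuates oscillator noise by about 10 dB at 1 MHz, where the PLL provides no rejection at all. Far below the PLL bandwidth (e.g., 100 kHz), the PLL's second-order high-pass response falls faster than the roughly first-order injection response.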

#### B. Injection Spur and Frequency Calibration

One major issue with an injection-locked PLL is the injection spur. As mentioned earlier, the injection pulses reset the oscillator phase error once per reference clock period, $T_r$. As shown in Fig. 8a, the oscillator runs undisturbed at its natural frequency, $f_{nat}$, between $CK_{inj}$ pulses. For a division ratio of 20, if $f_{nat}$ is not exactly 20 times the reference frequency ($f_{nat} \neq 20f_r$), the instantaneous period at the injection instant, $T_{20}$, must change to compensate for the phase error ($\phi_e$) accumulated during $T_r$. For example, if $f_{nat} < 20f_r$, as shown in Fig. 8a, $CK_{inj}$ shortens $T_{20}$ to compensate for the positive $\phi_e$. The remaining oscillation cycles, $T_{1-19}$, are unchanged and keep their period equal to $T_{nat}$. This ensures that the average frequency of $CK_{out}$ equals $20f_r$. Conversely, if $f_{nat} > 20f_r$, $T_{20}$ is widened to compensate for the negative $\phi_e$. The resulting periodic variation of $\phi_e$ manifests as spurs in the spectrum of $CK_{out}$. If left uncontrolled, $\phi_e$ causes large deterministic jitter and degrades the link BER.

Fig. 8. Clock waveforms and phase error for the injection-locked oscillator in the case of (a) $f_{nat} < 20f_r$ and (b) $f_{nat} > 20f_r$. (c) Block diagram of the background frequency calibration circuit.
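The size of this deterministic error can be estimated with an idealized model: cycles 1 to 19 run freely at $f_{nat}$, and a strong injection forces the 20th edge back onto the reference grid. The Python sketch below computes $T_{20}$ and the per-edge timing error for a hypothetical $f_{nat} = 199$ MHz (0.5% below $20f_r$); the function name and the perfectly strong injection are assumptions.

```python
def injection_period_profile(f_nat: float, f_ref: float = 10e6, n_div: int = 20):
    """Edge-timing error of an injection-locked oscillator over one reference period.

    Idealized model (an assumption): cycles 1..n_div-1 run at f_nat undisturbed,
    and a strong injection re-aligns the last edge with the reference.
    Returns (per-edge errors in s, positive = late; length of the last period T_n_div).
    """
    t_ideal = 1.0 / (n_div * f_ref)
    t_nat = 1.0 / f_nat
    errors = [k * (t_nat - t_ideal) for k in range(1, n_div)]
    errors.append(0.0)  # edge n_div is pulled back by CK_inj
    t_last = 1.0 / f_ref - (n_div - 1) * t_nat
    return errors, t_last


errors, t_20 = injection_period_profile(199e6)  # hypothetical f_nat < 20 * f_r
print(f"T_20 = {t_20 * 1e9:.3f} ns vs T_nat = {1e9 / 199e6:.3f} ns")
print(f"peak edge error = {max(abs(e) for e in errors) * 1e12:.0f} ps")
```

In this example the 20th period shrinks to about 4.52 ns, and the edge error ramps up to roughly 480 ps before each injection, a sizeable fraction of the 5-ns bit time, which is why $f_{nat}$ must be calibrated.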

To alleviate the injection spur, $f_{nat}$ must be calibrated to be exactly $20 f_r$. An injection-locked voltage-controlled oscillator (IL-VCO) can be used to adjust $f_{nat}$. However, $f_{nat}$ is very sensitive to temperature and can drift over time, so frequency calibration needs to run continuously in the background. A common approach to background frequency calibration is to run a conventional PLL in parallel with injection locking [22], [24]. In this approach, injection locking and the PLL may try to force the oscillator phase to converge to different values, so the delay of the PLL divider output clock must be carefully tuned to avoid conflicting convergence. An elegant alternative is to use the difference between the instantaneous oscillation period at the injection instant ($T_{20}$) and the subsequent period ($T_1$) as the calibration indicator [23]. The least-mean-squares (LMS) engine shown in Fig. 8c can then update the IL-VCO frequency until $T_1 = T_{20}$. This calibration method directly detects the error between the natural frequency of the IL-VCO and the desired value, so it does not conflict with the injection-locking mechanism. The next subsection describes the implementation of this frequency calibration loop in detail.

#### C. Injection-Locked PLL Architecture and Oscillator Schematic

Fig. 9b shows the schematic of the ring-based IL-VCO used in this design. It consists of 5 current-starved inverters operating from a 0.5-V supply. The bottom NMOS transistor $M_1$ of each inverter is biased to limit its discharge current. The bulk node of $M_1$, rather than its gate, is chosen as the control knob to lower the oscillator's frequency sensitivity to $V_c$, i.e., its gain $K_{vco}$. Since $V_c$ is generated by a digital-to-analog converter (DAC), the quantization noise of the DAC is converted into clock jitter by $K_{vco}$; lowering $K_{vco}$ therefore reduces the impact of the quantization noise. Note that the higher the bulk voltage, the stronger the drive of $M_1$, so the oscillator frequency increases with $V_c$. Thanks to the low supply voltage, the forward-biased bulk-source junction of $M_1$ does not draw significant current from the DAC.

Fig. 9. (a) Injection-locked PLL architecture, (b) the IL-VCO schematic, and (c) its timing diagram.

The block diagram of the PLL implementation is shown in Fig. 9a. $CK_{inj}$ is generated from a 50% duty-cycle reference clock by a pulse generator. A shift-register counter consisting of 21 cascaded flip-flops (FF<sub>1-21</sub>) counts the number of $CK_{out}$ cycles that elapse after $CK_{inj}$ goes low. It produces two pulses, $P_1$ and $P_{20}$, whose widths equal $T_1$ and $T_{20}$, respectively. A pulse-width comparator, consisting of a pair of integrators followed by a voltage comparator, detects the difference between $T_1$ and $T_{20}$; the results are averaged and accumulated by a digital calibration engine before being fed back to the IL-VCO through a DAC. If $T_{20} > T_1$, the oscillator is running too fast, and $V_c$ is decreased to lower its natural frequency; if $T_{20} < T_1$, $V_c$ is increased to speed up the oscillation. Note that two additional flip-flops inside the shift-register counter, FF<sub>S0,S1</sub>, latch the outputs of FF<sub>20,21</sub> at the rising edge of $CK_{inj}$. Together with the multiplexer MUX<sub>2</sub> and the shift-register counter, FF<sub>S0,S1</sub> form a frequency-tracking loop that forces the division ratio to equal 20. MUX<sub>2</sub> selects the pulse-width comparator result only if the 2-bit output of FF<sub>S0,S1</sub>, S[1:0], equals 1, which is the condition that the shift register counts exactly 20 cycles before reset. A count greater than 20 makes S[1:0] = 3, which reduces $V_c$; a count of less than 20 cycles (S[1:0] = 0) increases $V_c$. The timing offset of the pulse-width comparator is calibrated in the foreground, while the entire digital frequency and spur calibration loop runs continuously during normal operation.
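The decision rules above can be illustrated with a behavioral Python sketch. It is not the paper's implementation: it assumes a linear oscillator model $f_{nat} = F_0 + K_{vco} V_c$ with hypothetical $F_0$, $K_{vco}$ and step size, a crude cycle counter (whole free-running cycles per $T_r$) standing in for the shift register and S[1:0], an ideal pulse-width comparator, and a strong injection that absorbs the residual phase in $T_{20}$. Averaging in the digital engine is omitted.

```python
F_REF, N_DIV = 10e6, 20   # from the text
F0 = 180e6                # hypothetical free-running frequency at v_c = 0 (Hz)
KVCO = 100e6              # hypothetical bulk-tuning gain (Hz/V)
STEP = 1e-3               # hypothetical DAC/accumulator step on v_c (V)


def cycle_count(f_nat: float) -> int:
    """Crude stand-in for the shift-register counter: whole cycles per T_r."""
    return int(f_nat / F_REF)


def update(v_c: float) -> float:
    """One calibration step following the decision rules described in the text."""
    f_nat = F0 + KVCO * v_c
    count = cycle_count(f_nat)
    if count > N_DIV:          # like S[1:0] = 3: too many cycles, slow down
        return v_c - STEP
    if count < N_DIV:          # like S[1:0] = 0: too few cycles, speed up
        return v_c + STEP
    t_1 = 1.0 / f_nat                          # first period after injection runs freely
    t_20 = 1.0 / F_REF - (N_DIV - 1) * t_1     # injection absorbs the residual phase
    if t_20 > t_1:             # oscillator too fast -> lower v_c
        return v_c - STEP
    if t_20 < t_1:             # oscillator too slow -> raise v_c
        return v_c + STEP
    return v_c


def calibrate(v_c: float, iterations: int) -> float:
    for _ in range(iterations):
        v_c = update(v_c)
    return v_c


for start in (0.0, 0.5):
    v = calibrate(start, 1000)
    print(f"start {start} V -> f_nat = {(F0 + KVCO * v) / 1e6:.2f} MHz")
```

From either side the loop settles within about one step (100 kHz here) of 200 MHz and then dithers around it. A real loop would average the comparator decisions to reduce this dither.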



Fig. 10. (a) Prototype die photo, and (b) on-chip RX block diagram.



Fig. 11. Measurement setup.



Fig. 12. Measured (a) phase noise and (b) time-domain total jitter of IL-PLL.



Fig. 13. Measured (a) RX eye-diagram and (b) bathtub curve for over-the-air and over-the-skull channels.



Fig. 14. Measured bathtub curve for over-the-air channel at various offsets and distances between TX and RX coils.



Fig. 15. TX power breakdown.

## IV. MEASUREMENT RESULTS

To demonstrate the proposed architecture and circuits, the 200-Mb/s TX prototype is fabricated in a 65-nm CMOS process. As shown in Fig. 10a, the die measures $1.5 \times 1.5 \text{ mm}^2$, and the active TX circuits occupy only a small portion of it. To facilitate measurements, a simple receiver (RX), shown in Fig. 10b, is implemented on the same die to pair with the TX. The RX consists of 5 cascaded pre-amplifiers followed by a comparator. The outputs of the pre-amplifiers can be sent off chip to plot eye diagrams. The 200-MHz comparator strobe clock is derived from an external 400-MHz clock, and its delay can be adjusted by a phase interpolator (PI). The root-mean-square input-referred noise of the RX is around 23 $\mu$V, and the RX bandwidth is roughly 330 MHz.

TABLE I

COMPARISON TO STATE-OF-THE-ART TXS FOR BIO-IMPLANTS

|              | Technology  | Data Rate | Comm. Method /<br>Modulation      | Antenna Size                   | Distance | Channel Media            | BER    | Power                    | FOM<sup>a</sup>       |
|--------------|-------------|-----------|-----------------------------------|--------------------------------|----------|--------------------------|--------|--------------------------|-----------------------|
| [8]          | 0.35µm CMOS | 90Mb/s    | UWB / OOK                         | 10×5mm<sup>2</sup> <sup>b</sup> | N/A     | N/A                      | N/A    | 1.6mW<sup>c</sup>        | 17.8pJ/b<sup>c</sup>  |
| [9]          | 130nm CMOS  | 10Mb/s    | UWB / OOK                         | N/A                            | 5cm      | N/A                      | 5e-3   | 100µW<sup>c</sup>        | 10pJ/b<sup>c</sup>    |
| [10]         | 65nm CMOS   | 1Mb/s     | Backscattering /<br>OOK           | 6.5×6.5mm<sup>2</sup>          | 10mm     | in vivo                  | <1e-7  | 13µW                     | 13pJ/b                |
| [12]         | 0.5µm CMOS  | 10.2Mb/s  | PHM / OOK                         | 10×10mm<sup>2</sup>            | 10mm     | N/A                      | 6.3e-8 | 3.52mW<sup>c</sup>       | 345pJ/b<sup>c</sup>   |
| [13]         | 0.35µm CMOS | 1Mb/s     | PHM / OOK                         | 1×1mm<sup>2</sup>              | 18mm     | in vivo                  | 5e-6   | 8.86µW<sup>c,d</sup>     | 8.86pJ/b<sup>c</sup>  |
| [14]         | 65nm CMOS   | 95kb/s    | Ultrasound / OOK                  | 0.55×0.55mm<sup>2</sup>        | 8.5cm    | animal tissue            | <1e-4  | 157µW<sup>e</sup>        | 1.65nJ/b<sup>e</sup>  |
| This<br>work | 65nm CMOS   | 200Mb/s   | Ind. coupling /<br>Return-to-Zero | 10×10mm<sup>2</sup>            | 11mm     | piglet skin and<br>skull | 5e-11  | 300µW                    | 1.5pJ/b               |

<sup>a</sup>FOM is defined as the energy required to transmit one bit of information (energy efficiency).

<sup>b</sup>Antenna size is estimated from the PCB photo.

<sup>c</sup>Power consumption does not include clock generation.

<sup>d</sup>Calculated from FOM and data rate.

<sup>e</sup>Power is estimated by adding clock-generation power to peak PA power.
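The quoted 23-$\mu$V input-referred noise sets a lower bound on the signal needed for a given BER. The Python sketch below assumes a binary decision at zero threshold with additive Gaussian noise only (no ISI, jitter, or offset), so $\text{BER} = Q(A/\sigma)$; it does not represent the measured received amplitude, which the text does not give.

```python
import math


def ber_gaussian(amplitude: float, sigma: float) -> float:
    """BER for +/-amplitude symbols, zero threshold, Gaussian noise of RMS sigma:
    Q(A/sigma) = 0.5 * erfc(A / (sigma * sqrt(2))). Ignores ISI and jitter."""
    return 0.5 * math.erfc(amplitude / (sigma * math.sqrt(2)))


def required_snr(target_ber: float) -> float:
    """Smallest A/sigma that reaches target_ber, found by bisection on Q(x)."""
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ber_gaussian(mid, 1.0) > target_ber:
            lo = mid
        else:
            hi = mid
    return hi


SIGMA = 23e-6  # RX input-referred RMS noise quoted in the text
x = required_snr(1e-12)
print(f"A/sigma = {x:.2f}, A >= {x * SIGMA * 1e6:.0f} uV")
```

Under these assumptions a 1e-12 BER needs roughly $7\sigma$, i.e. about 160 $\mu$V at the RX input; any residual ISI or jitter raises this requirement.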

The measurement setup is shown in Fig. 11. The head of an 8-week-old primordial piglet carcass is used to model the human skull. The piglet skull is 11.8 mm thick. The RX board is mounted on the outer side of the skull, while the TX board is placed on the inner side. Measurements are performed with the skull placed on top of the piglet brain. To minimize over-the-air electromagnetic interference, a small aluminum-foil cover is placed on top of the RX board. A signal generator (SG) provides the 400-MHz clock input for the RX chip, and its 10-MHz reference output is used as the TX PLL reference. The TX PLL phase noise is measured by a signal source analyzer (SSA). A digital sampling oscilloscope (DSO) measures the RX eye diagram, and a bit-error-rate tester (BERT) measures the BER of the strobed RX outputs (comparator outputs). The TX and RX chips are configured by two FPGA boards.

The measured TX PLL phase noise is shown in Fig. 12a. With the help of injection locking and the background frequency calibration loop, the PLL achieves a -43 dBc spur and 59 ps total integrated jitter. As shown in Fig. 12b, the time-domain jitter measurement yields 60 ps total jitter, which confirms the accuracy of the phase noise measurement. The entire PLL consumes only 130  $\mu$W from a 0.5 V supply.

The RX eye diagrams for both the 11 mm air-gap channel and the 11.8 mm skull channel are plotted in Fig. 13a. Two RX eyes are present in one unit interval (UI) because the induced RX voltage is proportional to the time derivative of the TX current, so the rising and falling edges of every return-to-zero TX current pulse generate two RX voltage pulses of opposite polarity. Since the second eye carries redundant information, it is discarded for simplicity. The first eye of the skull channel is narrower than that of the air-gap channel because of the conductive loss of the bio-tissue. However, the second eye of the skull channel is less distorted than that of the air-gap channel. This is because the de-Q resistor value is chosen specifically for the environment inside the skull: the surrounding bio-tissue makes the self-resonant frequency of the TX inductor inside the skull quite different from that in air, so residual ringing is more pronounced in the over-the-air measurement. The RX BER versus sampling phase (bathtub curve) for both channels is shown in Fig. 13b. The inductively coupled link achieves 5e-11 BER over the piglet skull. For the air-gap channel, the link operates without a single error for more than 1e13 bits. The impact of inductor misalignment and spacing variation is investigated by repeating the over-the-air BER measurement at different offsets and spacings between the two inductors. The results are shown in Fig. 14. With an offset as large as 6.4 mm, the transceiver achieves 1e-9 BER. Even at 17.5 mm spacing, the BER is still as low as 1e-7.
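Two claims in the paragraph above lend themselves to a quick numerical check. The sketch below is illustrative only and is not the authors' design or measurement code. It assumes an idealized link in which the RX voltage equals M·dI/dt, with hypothetical unit values for the mutual inductance and current, an RZ current pulse occupying the first half of each UI, and discrete samples standing in for continuous time; under those assumptions each transmitted 1 produces one positive and one negative RX pulse per UI. It also computes the standard upper confidence bound on BER when zero errors are observed in N bits, obtained by solving (1 - p)^N = 1 - CL for p, to show why an error-free run of 1e13 bits supports a BER below 1e-12. The function names are hypothetical.

```python
import math


def rx_voltage_from_rz(bits, samples_per_ui=8, mutual_inductance=1.0):
    """Idealized inductive link: v_rx = M * dI_tx/dt (hypothetical unit values).

    Each 1 is a return-to-zero current pulse that is high for the first half
    of the UI; each 0 sends no current. The first difference of the sampled
    TX current stands in for the time derivative.
    """
    half = samples_per_ui // 2
    current = [0]  # link is idle before the first bit
    for b in bits:
        current += [b] * half + [0] * (samples_per_ui - half)
    return [mutual_inductance * (current[i + 1] - current[i])
            for i in range(len(current) - 1)]


def ber_upper_bound_zero_errors(bits_tested, confidence=0.95):
    """Upper bound on BER when no errors are observed in bits_tested bits.

    Solves (1 - p)**N = 1 - confidence for p; expm1 keeps precision when
    N is very large and p is tiny.
    """
    return -math.expm1(math.log(1.0 - confidence) / bits_tested)


if __name__ == "__main__":
    v = rx_voltage_from_rz([1, 0, 1])
    pulses = [(i, x) for i, x in enumerate(v) if x != 0]
    print("nonzero RX samples (index, value):", pulses)
    bound = ber_upper_bound_zero_errors(1e13)
    print(f"BER bound after 1e13 error-free bits: {bound:.2e}")
```

The 95% confidence level is a common convention in BER testing and is an assumption here, not a value stated in this paper; at that level, 1e13 error-free bits bound the BER at roughly 3e-13.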

The power consumption of the entire TX chip is 300  $\mu$W. As shown in Fig. 15, the PLL consumes 130  $\mu$W, the inductor driver consumes 165  $\mu$W, and the remaining 5  $\mu$W is consumed by peripheral circuits, including the pseudo-random binary sequence (PRBS) generator and the clock and data buffers. The entire TX chip achieves 1.5 pJ/b energy efficiency. Compared to the state-of-the-art TXs for bio-implants listed in Table I, this work achieves more than 2× higher throughput, lowers the BER by more than 3 orders of magnitude, and improves energy efficiency by 6×.
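The efficiency figure follows directly from the measured power and the data rate: E_b = P/R = 300  $\mu$W / 200 Mb/s = 1.5 pJ/b. The short sketch below reproduces that arithmetic and the per-block breakdown of Fig. 15. The dictionary keys and the helper name are illustrative; the only inputs are the power values and data rate stated in this section.

```python
DATA_RATE_BPS = 200e6  # TX throughput: 200 Mb/s

# Measured TX power breakdown in microwatts (values stated in the text, Fig. 15).
POWER_UW = {
    "PLL": 130.0,
    "inductor driver": 165.0,
    "peripherals (PRBS generator, clock/data buffers)": 5.0,
}


def energy_per_bit_pj(power_w, rate_bps):
    """Energy per bit in pJ/b: E_b = P / R, converted from J/b to pJ/b."""
    return power_w / rate_bps * 1e12


total_uw = sum(POWER_UW.values())
total_pj_per_bit = energy_per_bit_pj(total_uw * 1e-6, DATA_RATE_BPS)

if __name__ == "__main__":
    for block, p_uw in POWER_UW.items():
        share = 100.0 * p_uw / total_uw
        e_b = energy_per_bit_pj(p_uw * 1e-6, DATA_RATE_BPS)
        print(f"{block}: {p_uw:.0f} uW ({share:.1f} %), {e_b:.3f} pJ/b")
    print(f"total: {total_uw:.0f} uW -> {total_pj_per_bit:.2f} pJ/b")
```

The breakdown shows that the inductor driver (55%) and the PLL (about 43%) dominate the budget, while the peripheral logic contributes under 2%.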

#### V. CONCLUSION

This work demonstrates that inductive coupling is a viable solution for next-generation, high-throughput wireless transcranial data communication. A straightforward return-to-zero data modulation combined with the inductor de-Q technique enables energy-efficient data transmission at hundreds of Mb/s. With the help of injection locking and background frequency calibration, a clean clock can be generated for the implanted TX with ultra-low power consumption. With these techniques, robust wireless links with very low BER can be established for brain implants.

#### ACKNOWLEDGMENT

The authors would like to thank Dr. Karim Abdelhalim, Dr. Belal Helal, Dr. Charles Chu, Ravidran Mohanavelu, Dr. Jorge Pernillo, and Lawrence Tse for technical input and testing support; Dr. Yue Lu, Dr. Michael Mark, and Dr. Pierluigi Nuzzo for helpful discussions; and TSMC for chip fabrication.

#### REFERENCES

- [1] L. Hochberg *et al.*, "Neuronal ensemble control of prosthetic devices by a human with tetraplegia," *Nature*, vol. 442, pp. 164–171, Jul. 13, 2006.
- [2] J. Collinger *et al.*, "High-performance neuroprosthetic control by an individual with tetraplegia," *Lancet*, vol. 381, pp. 557–564, Feb. 16, 2013.
- [3] A. Ajiboye *et al.*, "Restoration of reaching and grasping movements through brain-controlled muscle stimulation in a person with tetraplegia: A proof-of-concept demonstration," *Lancet*, vol. 389, pp. 1821–1830, May 6, 2017.
- [4] A. Kuruvilla and R. Flink, "Intraoperative electrocorticography in epilepsy surgery: Useful or not?" *Seizure*, vol. 12, no. 8, pp. 577–584, Dec. 2003.
- [5] M. C. Rodriguez-Oroz *et al.*, "Bilateral deep brain stimulation in Parkinson's disease: A multicentre study with 4 years follow-up," *Brain*, vol. 128, no. 10, pp. 2240–2249, Oct. 1, 2005.
- [6] M. Ballini *et al.*, "A 1024-channel CMOS microelectrode-array system with 26400 electrodes for recording and stimulation of electroactive cells in-vitro," in *Proc. Symp. Very Large Scale Integr. Circuits*, Kyoto, Japan, 2013, pp. C54–C55.
- [7] S. Ha, J. Park, Y. Chi, J. Viventi, J. Rogers, and G. Cauwenberghs, "85 dB dynamic range 1.2 mW 156 kS/s biopotential recording IC for high-density ECoG flexible active electrode array," in *Proc. Eur. Solid-State Circuits Conf.*, Sep. 2013, pp. 141–144.
- [8] M. Chae *et al.*, "A 128-channel 6 mW wireless neural recording IC with on-the-fly spike sorting and UWB transmitter," in *Proc. IEEE Int. Solid-State Circuits Conf.—Dig. Tech. Papers*, Feb. 2008, pp. 146–147.
- [9] K. Abdelhalim, H. M. Jafari, L. Kokarovtseva, J. L. P. Velazquez, and R. Genov, "64-channel UWB wireless neural vector analyzer SOC with a closed-loop phase synchrony-triggered neurostimulator," *IEEE J. Solid-State Circuits*, vol. 48, no. 6, pp. 2494–2510, Oct. 2013.
- [10] R. Muller *et al.*, "A miniaturized 64-channel 225 μW wireless electrocorticographic neural sensor," in *Proc. IEEE Int. Solid-State Circuits Conf.—Dig. Tech. Papers*, Feb. 2014, pp. 412–413.
- [11] C. Sutardja and J. Rabaey, "Isolator-less near-field RFID reader for subcranial power/data link of mm-sized implants," in *Proc. 43rd IEEE Eur. Solid State Circuits Conf.*, Sep. 2017, pp. 372–375.
- [12] F. Inanlou, M. Kiani, and M. Ghovanloo, "A 10.2 Mbps pulse harmonic modulation based transceiver for implantable medical devices," *IEEE J. Solid-State Circuits*, vol. 46, no. 6, pp. 1296–1306, Jun. 2011.
- [13] P. Yeon, S. A. Mirbozorgi, J. Lim, and M. Ghovanloo, "Feasibility study on active back telemetry and power transmission through an inductive link for millimeter-sized biomedical implants," *IEEE Trans. Biomed. Circuits Syst.*, vol. 11, no. 6, pp. 1366–1376, Dec. 2017.
- [14] T. Chang, M. Wang, J. Charthad, M. Weber, and A. Arbabian, "A 30.5 mm<sup>3</sup> fully packaged implantable device with duplex ultrasonic data and power links achieving 95 kb/s with <10<sup>-4</sup> BER at 8.5 cm depth," in *Proc. IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2017, pp. 460–461.
- [15] D. Seo *et al.*, "Wireless recording in the peripheral nervous system with ultrasonic neural dust," *Neuron*, vol. 91, no. 3, pp. 529–539, 2016.
- [16] M. Mark *et al.*, "Wireless channel characterization for mm-size neural implants," in *Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc.*, Sep. 2010, pp. 1565–1568.
- [17] C. Gabriel, S. Gabriel, and E. Corthout, "The dielectric properties of biological tissues: I. Literature survey," *Phys. Med. Biol.*, vol. 41, pp. 2231–2249, 1996.
- [18] "Return-to-zero," Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Return-to-zero. Accessed on: Oct. 20, 2018.
- [19] W. Li, Y. Duan, and J. Rabaey, "A 200 Mb/s inductively coupled wireless transcranial transceiver achieving 5e-11 BER and 1.5 pJ/b transmit energy efficiency," in *Proc. IEEE Int. Solid-State Circuits Conf.—Dig. Tech. Papers*, Feb. 2018, pp. 290–292.
- [20] S. Kawai, H. Ishikuro, and T. Kuroda, "A 2.5 Gb/s/ch 4PAM inductive-coupling transceiver for non-contact memory card," in *Proc. IEEE Int. Solid-State Circuits Conf.—Dig. Tech. Papers*, Feb. 2010, pp. 264–265.
- [21] S. Lee, K. Song, J. Yoo, and H. Yoo, "A low-energy inductive coupling transceiver with Cm-range 50-Mbps data communication in mobile devices applications," *IEEE J. Solid-State Circuits*, vol. 45, no. 11, pp. 2366–2374, Nov. 2010.
- [22] S. Ye, L. Jansson, and I. Galton, "A multiple-crystal interface PLL with VCO realignment to reduce phase noise," *IEEE J. Solid-State Circuits*, vol. 37, no. 12, pp. 1795–1803, Dec. 2002.
- [23] B. Helal, M. Straayer, G. Y. Wei, and M. Perrott, "A highly digital MDLL-based clock multiplier that leverages a self-scrambling time-to-digital converter to achieve subpicosecond jitter performance," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 855–863, Apr. 2008.
- [24] J. Lee and H. Wang, "Study of subharmonically injection-locked PLLs," *IEEE J. Solid-State Circuits*, vol. 44, no. 5, pp. 1539–1553, May 2009.



Wen Li (S'08–M'18) received the B.Eng. degree in electrical engineering from The Chinese University of Hong Kong, Hong Kong, in 2008, and the Ph.D. degree from the University of California—Berkeley, Berkeley, CA, USA, in 2017.

Since 2018, she has been with Qualcomm Atheros, San Jose, CA, USA. Her research interests include circuits for biomedical applications, low-power wireless transceivers, power-efficient serial-link transceivers, low-power PLLs, and low-power analog-to-digital converters.



**Yida Duan** (S'08–M'15) received the B.S., M.S., and Ph.D. degrees in electrical engineering and computer sciences from the University of California— Berkeley, Berkeley, CA, USA, in 2007, 2015, and 2015, respectively.

In Summer 2010, he was an Analog Design Intern with the NXP Semiconductor, San Jose, CA, USA. From May 2012 to December 2012, he was with the Marvell Semiconductor, Santa Clara, CA, USA. Since 2015, he has been an Analog/Mixed-Signal Design Engineer with the Inphi Corporation, Santa Clara, CA, USA. His research interests include high-speed analog-to-digital converters, high-speed serial-link transceivers, high-frequency PLLs, and circuits and systems for biomedical applications.

Dr. Duan was the recipient of the IEEE Custom Integrated Circuits Conference Best Student Paper Award in 2013.



Jan Rabaey (M'87–SM'92–F'95) holds the Donald O. Pederson Distinguished Professorship at the University of California—Berkeley, Berkeley, CA, USA. Before joining the faculty at the University of California—Berkeley, he was a Research Manager with IMEC from 1985 to 1987. He is a Founding Director of the Berkeley Wireless Research Center and the Berkeley Ubiquitous SwarmLab, and has served as the Electrical Engineering Division Chair at Berkeley twice.

He has made high-impact contributions to a number of fields, including advanced wireless systems, low-power integrated circuits, mobile devices, sensor networks, and ubiquitous computing. His current interests include the conception of next-generation distributed systems, as well as the exploration of the interaction between the cyber and the biological world.

Prof. Rabaey is the recipient of major awards, such as the IEEE Mac Van Valkenburg Award, the European Design Automation Association Lifetime Achievement award, the Semiconductor Industry Association University Researcher Award, and the SRC Aristotle Award. He is a member of the Royal Flemish Academy of Sciences and Arts of Belgium, and has received honorary doctorates from Lund (Sweden), Antwerp (Belgium), and Tampere (Finland). He has been involved in a broad variety of start-up ventures, including Cortera Neurotechnologies, of which he is a co-founder.