I. Introduction
Global IP traffic triples every five years and is projected to exceed 200 exabytes per month by 2020 [1]. Over the past decades, electrical links have supported this data boom with non-return-to-zero (NRZ) signaling, owing to its simplicity and the steady advancement of CMOS technology. Now, however, CMOS scaling faces challenges such as unsustainable cost, and NRZ signaling is losing its appeal as future electrical links demand higher data rates. Four-level pulse amplitude modulation (PAM4), which doubles bandwidth (BW) efficiency relative to NRZ, has become the most promising solution for next-generation Ethernet. Power-efficient PAM4 transceivers are therefore highly desirable for reducing the cost of hyperscale data centers.
Besides equalization and clock and data recovery (CDR), PAM4 receivers also require a PAM4-to-NRZ decoder for further digital processing. Both analog-to-digital converter (ADC)-based receivers [2], [3] and mixed-signal receivers [4], [5] have been reported. In ADC-based receivers, most of the PAM4 signal processing takes place in the digital domain, which simplifies both the decoder design and the implementation of advanced equalization. ADC-based receivers also offer good design flexibility and process portability, but at an inferior energy efficiency of ~10 pJ/bit [2]. In contrast, mixed-signal receivers can achieve a better efficiency of <4 pJ/bit by employing power-efficient analog circuit techniques [5], [6], which makes them more attractive for low-power designs. Decoders in mixed-signal receivers usually consist of three comparators followed by thermometer-to-binary logic. The comparator without an extra reference decodes the most significant bit (MSB) of the PAM4 symbol, while the other two comparators, whose references are proportional to the signal amplitude, decode the least significant bit (LSB). To accommodate different input amplitudes, the LSB reference voltage must either be generated adaptively or be held constant with the help of a variable-gain amplifier (VGA).
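The three-comparator decoder described above can be sketched behaviorally as follows. This is a minimal model, not the circuit of any cited work: it assumes four evenly spaced levels at -3a, -a, +a, +3a (so the peak-to-peak amplitude is 6a) and a Gray symbol mapping (00, 01, 11, 10 from bottom to top). Under these assumptions the two amplitude-proportional LSB thresholds sit at ±V_pp/3, and with Gray coding the LSB is simply the XOR of the two outer comparator outputs. The names `Pam4Thresholds`, `thresholds_from_pp`, and `decode_pam4` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pam4Thresholds:
    """Decision thresholds for one PAM4 sample (differential volts)."""
    low: float   # negative-side LSB comparator reference
    mid: float   # MSB comparator, no extra reference (0 V)
    high: float  # positive-side LSB comparator reference


def thresholds_from_pp(v_pp: float) -> Pam4Thresholds:
    # Assumption: evenly spaced levels -3a, -a, +a, +3a, so v_pp = 6a.
    # The LSB thresholds lie midway between outer and inner levels (+/-2a),
    # i.e. at +/- v_pp / 3, so they are separated by 2/3 of v_pp.
    return Pam4Thresholds(low=-v_pp / 3, mid=0.0, high=v_pp / 3)


def decode_pam4(sample: float, th: Pam4Thresholds) -> tuple[int, int]:
    """Three comparators -> thermometer code -> (MSB, LSB), Gray mapping assumed."""
    c_low = int(sample > th.low)
    c_mid = int(sample > th.mid)
    c_high = int(sample > th.high)
    msb = c_mid            # the reference-free comparator alone gives the MSB
    lsb = c_low ^ c_high   # Gray code: only the outer comparators decide the LSB
    return msb, lsb


th = thresholds_from_pp(0.6)  # hypothetical 600 mV peak-to-peak input
for level in (-0.3, -0.1, 0.1, 0.3):
    print(level, decode_pam4(level, th))
```

Because `thresholds_from_pp` scales with the detected amplitude, it models the adaptive-reference option; the constant-reference option would instead fix the thresholds and rely on a VGA to scale the input to match them.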
In [7], the adaptively generated reference is equal to 2/3 of the detected peak-to-peak amplitude. In [8], the reference is equal to the vertical opening of the PAM4 middle eye, but the additional analog adaptation path consumes extra power. For a decoder with a constant reference, the VGA must amplify the input signal to a peak-to-peak amplitude of 3/2 times that reference. In all of these methods, nonlinearity degrades LSB decoding performance because of the small PAM4 eye opening. In a full-rate PAM4 receiver, current-mode logic (CML) circuits are usually adopted to achieve high-speed operation; consequently, the VGA and the decoder consume considerable power in the tradeoff between power and speed. Subrate topologies are therefore preferred, especially when the data rate approaches the process limit [9]. In a 1/4-rate topology, although the number of blocks quadruples, the total power consumption need not increase, because the 1/4-rate blocks can adopt more power-efficient digital logic instead of CML. Both digital logic and CML circuits can achieve high speed; however, to reach higher speeds, CML must burn much more power than digital logic. Digital logic circuits benefit from technology scaling and operate at ultralow power. For example, the power consumption of a StrongARM latch-based comparator is proportional to its operating speed and far lower than that of a CML configuration [10]. To save further power, merging functions into one block, such as merging the VGA into the decoder, is also effective.
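The argument that a 1/4-rate topology need not increase total power can be checked with the first-order switching-power model P = α·C·V²·f for clocked digital logic such as a StrongARM latch. The sketch below is only that first-order model: it ignores leakage, multiphase clock generation and distribution, and any capacitance added by interleaving. The capacitance, supply, and clock values are hypothetical placeholders, as are the function names `dynamic_power` and `subrate_total_power`.

```python
def dynamic_power(c_sw: float, vdd: float, f_clk: float, activity: float = 1.0) -> float:
    """First-order CV^2f switching power of a clocked (e.g. StrongARM) latch, in watts."""
    return activity * c_sw * vdd ** 2 * f_clk


def subrate_total_power(n_ways: int, c_sw: float, vdd: float, f_full: float) -> float:
    """Total power of n_ways interleaved latches, each clocked at f_full / n_ways."""
    return n_ways * dynamic_power(c_sw, vdd, f_full / n_ways)


C_SW = 10e-15   # hypothetical switched capacitance per latch: 10 fF
VDD = 0.9       # hypothetical supply voltage: 0.9 V
F_FULL = 28e9   # hypothetical full-rate clock: 28 GHz

full = dynamic_power(C_SW, VDD, F_FULL)
quarter = subrate_total_power(4, C_SW, VDD, F_FULL)
print(f"full-rate latch: {full * 1e6:.1f} uW, four 1/4-rate latches: {quarter * 1e6:.1f} uW")
```

In this model the fourfold block count is exactly offset by the fourfold lower clock rate. A CML gate, by contrast, draws its tail current continuously regardless of the clock rate, so the same cancellation does not follow from this simple model.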