Marios Papaefthymiou - IEEE Xplore Author Profile

Showing 1-25 of 89 results

This work proposes a one-step-ahead electrical load forecasting strategy based on machine learning. The approach relies on a novel data-efficient online training data selection mechanism that uses an improved dynamic time warping method to dynamically subselect the most relevant training data from the entire dataset. It also uses an adaptive prediction result optimization method to further enhance...
This paper proposes a real-time electrical load forecasting framework that supports two prediction modes: one-step-ahead and one-day-ahead. The one-step-ahead predictor relies on a feedback mechanism to reduce the impact of random electrical load activity on prediction results. A prediction evaluator assesses previous prediction outcomes to automatically determine the most suitable of the two fore...
A deep-learning processor is presented for achieving ultra-low-power operation in mobile applications. Using a heterogeneous architecture that includes a low-power always-on front-end and a selectively-enabled high-performance backend, the processor dynamically adjusts computational resources at runtime to support conditional execution in neural networks and meet performance targets with increased...
An AES core designed for low-cost and energy-efficient IoT security applications is fabricated in a 65nm CMOS technology. A novel Dual-Rail Flush Logic (DRFL) with switching-independent power profile is used to yield intrinsic resistance against Differential Power Analysis (DPA) attacks with minimum area and energy consumption. Measurement results show that this 0.048mm² core achieves energy consu...
This paper presents an ANSI S1.11 1/3-octave filter-bank chip for binaural hearing aids with two microphones per ear. Binaural multimicrophone systems significantly suppress noise interference and preserve interaural time cues at the cost of significantly higher computational and power requirements than monophonic single-microphone systems. With clock rates around the 1MHz mark, these systems are ...
This paper presents an ultra-high-performance neural network engine fabricated in a 65nm CMOS technology. The 0.9mm² core relies on an energy-efficient resonant clock mesh running at 5.5GHz to achieve 0.76 8-bit TOPS, improving throughput by over 4×, area efficiency by over 8×, and energy-delay-area product by over 1.8× compared to previous state-of-the-art neural network designs. Achieving a char...
The share of clock power in a system is rapidly increasing with the continued growth of clock frequency and clock resources. Over the last two decades, considerable research attention has been paid to minimizing clock power. Recently, it has been shown that resonant clock networks are very effective in saving power, since they can store electric energy in the inserted inductors rather than dissi...
A 576-bit LDPC decoder is designed using a charge-recovery logic family and in-package inductors. The decoder testchip is fabricated in a 65nm CMOS flip-chip process. Unlike all previously published high-performance charge-recovery chips, which use on-chip inductors to recover charge from parasitic capacitance, this charge-recovery design uses in-package inductors, avoiding the area overheads of o...
A 128-bit Advanced Encryption Standard (AES) core targeted for high-performance security applications is fabricated in a 65nm CMOS technology. A novel charge-recovery logic family, called Bridge Boost Logic (BBL), is introduced in this design to achieve switching-independent energy dissipation for an intrinsic high resistance against Differential Power Analysis (DPA) attacks. Based on measurements...
The share of clock power in a system is rapidly increasing with the continued growth of clock frequency and clock resources. Over the last two decades, considerable research attention has been paid to minimizing clock power. Recently, it has been shown that resonant clock networks are very effective in saving power, since they can store electric energy in the inserted inductors rather than dissi...
Computational sprinting has been proposed to improve responsiveness for the intermittent computational demands of many current and emerging mobile applications by briefly activating reserve cores and/or boosting frequency and voltage to power levels that far exceed the system's sustained cooling capability. In this work, we focus on the thermal consequences of computational sprinting, studying the...
This paper presents a 576b LDPC decoder test-chip designed using a charge-recovery logic family. The chip has been fabricated in a 65nm CMOS process and relies on 16 integrated inductors to achieve energy-efficient operation by recovering charge from gate fanouts. When self-oscillating at 821MHz, the chip recovers 51.4% of the energy supplied to it. In terms of device count, this chip is more than...
Computational sprinting activates dark silicon to improve responsiveness by briefly but intensely exceeding a system's sustainable power limit. This article focuses on the energy implications of sprinting. The authors observe that sprinting can save energy even while improving responsiveness by enabling execution in chip configurations that, though thermally unsustainable, improve energy efficienc...
The tight thermal constraints of mobile devices, which limit sustainable performance, and the bursty nature of interactive mobile applications call for a new design focus: enhancing user responsiveness rather than sustained throughput. To that end, this article explores computational sprinting, wherein a mobile device temporarily exceeds sustainable thermal limits to provide a brief, intense burst...
AMD's 32-nm x86-64 core code-named “Piledriver” features a resonant global clock distribution to reduce clock distribution power while maintaining a low clock skew. To support a wide range of operating frequencies expected of the core, the global clock system operates in two modes: a resonant-clock (rclk) mode for energy-efficient operation over a desired frequency range and a conventional, direct...
AMD's 4+ GHz x86-64 core codenamed “Piledriver” employs resonant clocking to reduce clock distribution power up to 24% while maintaining a low clock-skew target. To support testability and robust operation at the wide range of operating frequencies required of a commercial processor, the clock system operates in two modes: direct-drive (cclk) and resonant (rclk). Leveraging favorable factors such ...
Although transistor density continues to increase, voltage scaling has stalled and thus power density is increasing each technology generation. Particularly in mobile devices, which have limited cooling options, these trends lead to a utilization wall in which sustained chip performance is limited primarily by power rather than area. However, many mobile applications do not demand sustained perfor...
This paper presents a 14-tap 8-bit finite impulse response (FIR) test-chip that has been designed using a novel charge-recovery logic family, called Enhanced Boost Logic (EBL), to achieve high-speed and low-power operation. Compared to previous charge-recovery circuitry, EBL achieves increased gate overdrive, resulting in low latency overhead over static CMOS design. The EBL-based FIR has been des...
A 65nm CMOS 5.5GS/s non-interleaved 5-bit flash ADC with resonant clocking is presented. An on-chip 0.77nH inductor resonates the entire clock distribution network to achieve energy-efficient operation. The ADC occupies 0.035mm² and consumes 28mW when operating at 5.5GHz, yielding 396fJ per conversion step. The clock network dissipates only 10.7% of total power, consuming 54% lower energy over CV²...
This paper presents an 8-cycle 64 FO4 single-precision fused-multiply-add floating-point unit (FPU) chip with fine-grain resonant clocking and dynamic-evaluation static-latch logic to achieve dynamic-logic-level performance with significant power reduction. Fabricated in a 90nm low-power RVT technology, the resonant FPU achieves clock speeds up to 2.07GHz. At its resonant frequency of 1.81GHz, it...
This paper presents a finite impulse response (FIR) filter chip that relies on a charge-recovery logic family to achieve multi-MHz clock frequencies with subthreshold DC supply levels. Fabricated in a 0.13 µm CMOS process with Vth,nmos = 0.40 V, the FIR operates with a two-phase power-clock in the 5 MHz-187 MHz range and with DC supplies in the 0.16 V-0.36 V range. Using a single DC supply, the ch...
An ARM926EJ-S™ microcontroller with a fully resonant clock distribution network and 16KB data and instruction caches has been implemented in 130nm bulk silicon. Workloads execute successfully across process and temperature corners, and at room temperature, typical-process chips run at clock speeds up to 200MHz with 1.2V supply. At resonance, the microcontroller core dissipates 0.23mW/MHz, recover...
We present a 14-tap 8-bit FIR chip designed using a novel charge-recovery logic family with only 1.5 cycles of additional latency over the best possible static CMOS design. Fabricated in a 0.13 µm CMOS process, the chip operates in the 365-600 MHz range with a 3 nH on-chip inductor. At its resonant frequency of 466 MHz, it dissipates 39.1 mW and recovers 45% of the energy supplied to it.
A 187MHz FIR filter with a single subthreshold DC supply is designed using a charge-recovery logic family. Fabricated in a 0.13µm CMOS process with Vth,nmos = 0.40V, the FIR operates in the 80MHz–187MHz range with DC supply in the 0.30V–0.36V range, respectively. Resonating at 100MHz, it consumes 21.27pJ per cycle and achieves 23.7nW/Tap/MHz/InBit/CoeffBit.
This paper describes RF1 and RF2, two level-clocked test-chips that deploy resonant clocking to reduce power consumption in their clock distribution networks. It also highlights RCL, a novel resonant-clock latch-based methodology that was used to design the two test-chips. RF1 and RF2 are 8-bit 14-tap finite-impulse response (FIR) filters with identical architectures. Designed using a fully automa...