H. Yao - IEEE Xplore Author Profile

Showing 1-20 of 20 results


While vision transformers (ViTs) have shown consistent progress in computer vision, deploying them for real-time decision-making scenarios (<1 ms) is challenging. Current computing platforms like CPUs, GPUs, or FPGA-based solutions struggle to meet this deterministic low-latency real-time requirement, even with quantized ViT models. Some approaches use pruning or sparsity to reduce the model size …
This paper introduces SDA, the first effort to adapt the expensive stable diffusion (SD) model for edge FPGA deployment. First, we apply quantization-aware training to quantize its weights to 4-bit and activations to 8-bit (W4A8) with a negligible accuracy loss. Based on that, we propose a high-performance hybrid systolic array (hybridSA) architecture that natively executes convolution and …
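The W4A8 setting above (4-bit weights, 8-bit activations) can be illustrated with a generic uniform symmetric quantizer. This is a minimal sketch of the standard technique, not the SDA paper's actual quantization-aware training procedure; the scale values are assumed for illustration.

```python
def quantize(x, bits, scale):
    """Uniform symmetric quantization: round x/scale, clamp to the signed range."""
    qmax = 2 ** (bits - 1) - 1
    return max(-qmax - 1, min(qmax, round(x / scale)))

def dequantize(q, scale):
    """Map an integer code back to the real-valued grid point it represents."""
    return q * scale

# Weights to 4-bit, activations to 8-bit (the W4A8 setting).
w_scale = 0.5 / 7    # assumed per-tensor scale: |w| <= 0.5 maps into [-7, 7]
a_scale = 6.0 / 127  # assumed activation scale for a [0, 6] ReLU6-style range

w = 0.37
q_w = quantize(w, 4, w_scale)
print(q_w, round(dequantize(q_w, w_scale), 4))  # → 5 0.3571
```

Quantization-aware training inserts this quantize/dequantize pair into the forward pass (with a straight-through gradient estimator) so the network learns weights that survive the rounding.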
Image aesthetic quality assessment (IAA) is aimed at predicting the general aesthetic evaluation of images by the public. However, the common neural network has a fixed size requirement for the input and ignores the close relation between image composition and aesthetic rating. Therefore, in this paper, we design and implement a concise and efficient IAA algorithm based on graph convolutional neur…
Diffusion-based text-to-image generative models, e.g., Stable Diffusion, have revolutionized the field of content generation, enabling significant advancements in areas like image editing and video synthesis. Despite their formidable capabilities, these models are not without their limitations. It is still challenging to synthesize an image that aligns well with the input text, and multiple runs w…
Deformable convolutions can improve detection accuracy in Convolution Neural Networks (CNNs) by leveraging flexible spatial sampling in augmenting kernels with learnable offsets. However, the resulting irregular memory access patterns and additional pixel lookup overhead introduced by deformable layers pose inherent challenges when executed on high-throughput devices such as GPUs. To address these…
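The irregular memory accesses mentioned above come from the fact that each deformable kernel tap reads the feature map at a learned fractional offset, which requires bilinear interpolation over four neighboring pixels. A minimal sketch of that core sampling operation (not the paper's accelerator design):

```python
def bilinear_sample(img, y, x):
    """Sample img (a list of rows) at fractional, non-negative (y, x) via
    bilinear interpolation -- the per-tap lookup a deformable convolution
    performs at each offset position."""
    h, w = len(img), len(img[0])
    y0, x0 = int(y), int(x)                    # floor for non-negative coords
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bot = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bot * dy

img = [[0.0, 1.0],
       [2.0, 3.0]]
print(bilinear_sample(img, 0.5, 0.5))  # → 1.5
```

Because (y, x) depends on learned offsets, the four addresses read per tap are data-dependent, which is exactly what defeats the regular access patterns GPUs and systolic arrays rely on.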
As Deep Neural Networks (DNNs) become popular in mobile systems, their high computational and memory demands make them major power consumers, especially in limited-budget scenarios. In this paper, we propose DACO, a DNN-Adaptive CPU-GPU CO-optimization technique, to reduce the power consumption of DNNs. First, a resource-oriented classifier is proposed to quantify the computation/memory intensity …
With the success of Vision Transformers (ViTs) in computer vision tasks, recent works try to optimize the performance and complexity of ViTs to enable efficient deployment on mobile devices. Multiple approaches are proposed to accelerate the attention mechanism, improve inefficient designs, or incorporate mobile-friendly lightweight convolutions to form hybrid architectures. However, ViT and its varian…
With the emergence of DNN applications on mobile devices, their optimization has attracted plenty of attention. However, the impact of DNN inference tasks on device power consumption still lacks a comprehensive study. In this work, we propose MOC, a Multi-Objective deep reinforcement learning-assisted DNN inference stage-adaptive CPU-GPU Co-optimization approach. We find through exper…
As edge devices become readily available and indispensable, there is an urgent need for effective and efficient intelligent applications to be deployed widely. However, fairness has always been an issue, especially in edge medical applications. Although many approaches have been proposed to mitigate the unfairness problem, their edge performance is not desirable. By examining the fairness perf…
As edge devices become readily available and indispensable, there is an urgent need for effective and efficient intelligent applications to be deployed widely. However, fairness has always been an issue, especially in edge medical applications. Compared to convolutional neural networks (CNNs), Vision Transformer (ViT) has a better ability to extract global information, which will contribute to…
With the ever-increasing popularity of edge devices, it is necessary to implement real-time segmentation on the edge for autonomous driving and many other applications. Vision Transformers (ViTs) have shown considerably stronger results for many vision tasks. However, ViTs with the full-attention mechanism usually consume a large number of computational resources, leading to difficulties for real-…
Stochastic rounding is crucial in the low-bit (e.g., 8-bit) training of deep neural networks (DNNs) to achieve high accuracy. One of the drawbacks of prior studies is that they require a large number of high-precision stochastic rounding units (SRUs) to guarantee low-bit DNN accuracy, which involves considerable hardware overhead. In this paper, we use extremely low-bit SRUs (ESRUs) to save a larg…
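Stochastic rounding, the operation the SRUs above implement in hardware, rounds up with probability equal to the fractional part, so the rounding is unbiased in expectation. A software sketch of the general technique (not the paper's ESRU design):

```python
import math
import random

def stochastic_round(x, rng=random):
    """Round x down with probability (1 - frac) and up with probability frac,
    so E[stochastic_round(x)] = x. This unbiasedness is why low-bit DNN
    training keeps gradient information that round-to-nearest would erase."""
    lo = math.floor(x)
    frac = x - lo
    return lo + (1 if rng.random() < frac else 0)

random.seed(0)
samples = [stochastic_round(2.3) for _ in range(10000)]
print(sum(samples) / len(samples))  # close to 2.3 on average
```

A hardware SRU realizes the `rng.random() < frac` comparison with a pseudo-random bit source; the paper's point is that this source can itself be extremely low-precision.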
Over-the-air analog computation allows offloading computation to the wireless environment through carefully constructed transmitted signals. In this paper, we design and implement the first-of-its-kind convolution that uses over-the-air computation and demonstrate it for inference tasks in a convolutional neural network (CNN). We engineer the ambient wireless propagation environment through reconf…
Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks. However, their complex architecture and enormous computation/storage demand impose urgent needs for new hardware accelerator design methodology. This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization. To the best of our knowledge, thi…
Orthogonal Frequency Division Multiplexing (OFDM)-based waveforms are used for communication links in many current and emerging Internet of Things (IoT) applications, including the latest WiFi standards. For such OFDM-based transceivers, many core physical layer functions related to channel estimation, demapping, and decoding are implemented for specific choices of channel types and modulation sch…
In-memory deep neural network (DNN) accelerators will be the key for energy-efficient autonomous edge systems. The resistive random access memory (ReRAM) is a potential solution for the non-CMOS-based in-memory computing platform for energy-efficient autonomous edge systems, thanks to its promising characteristics, such as near-zero leakage-power and non-volatility. However, due to the hardware in…
Generating custom modulation patterns as well as dynamically varying the mapping of the constellation points to their corresponding bit representations are some existing methods for mitigating eavesdropping attacks. In such cases, the custom symbol to bit mapping needs to be conveyed to the receiver through a secure and reliable channel. Instead of sending the representations of the modified symbo…
This work proposes a novel Deep Neural Network (DNN) quantization framework, namely RMSMP, with a Row-wise Mixed-Scheme and Multi-Precision approach. Specifically, this is the first effort to assign mixed quantization schemes and multiple precisions within layers – among rows of the DNN weight matrix, for simplified operations in hardware inference, while preserving accuracy. Furthermore, this pap…
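The row-wise multi-precision idea above can be sketched as assigning each row of a weight matrix its own bit-width based on some sensitivity score. This toy uses mean absolute weight as the score and a fixed high-precision fraction; RMSMP's actual assignment criterion and scheme selection differ, so everything here is an illustrative assumption.

```python
def assign_row_precision(weight_rows, hi_bits=8, lo_bits=4, top_frac=0.25):
    """Toy per-row precision assignment: the rows judged most sensitive
    (proxied here by mean |w|) get a higher bit-width, the rest a lower one.
    Illustrates row-wise mixed precision, not the RMSMP criterion itself."""
    scores = [sum(abs(v) for v in row) / len(row) for row in weight_rows]
    k = max(0, int(len(scores) * top_frac) - 1)
    cutoff = sorted(scores, reverse=True)[k]  # score of the k-th largest row
    return [hi_bits if s >= cutoff else lo_bits for s in scores]

rows = [[1.0, 1.0], [0.1, 0.1], [0.2, 0.2], [0.05, 0.05]]
print(assign_row_precision(rows))  # → [8, 4, 4, 4]
```

Keeping the precision decision at row granularity (rather than per-element) is what makes the hardware datapath simple: every multiply-accumulate in a row uses the same bit-width.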
With the increasing demand to efficiently deploy DNNs on mobile edge devices, it becomes much more important to reduce unnecessary computation and increase the execution speed. Prior methods towards this goal, including model compression and network architecture search (NAS), are largely performed independently, and do not fully consider compiler-level optimizations, which are a must for mobile a…
Deep Neural Networks (DNNs) have achieved extraordinary performance in various application domains. To support diverse DNN models, efficient implementations of DNN inference on edge-computing platforms, e.g., ASICs, FPGAs, and embedded systems, are extensively investigated. Due to the huge model size and computation amount, model compression is a critical step to deploy DNN models on edge devices. …