A. Khapikov - IEEE Xplore Author Profile

Showing 1-19 of 19 results


Deep learning algorithms are increasingly employed at the edge. However, edge devices are resource-constrained and thus require efficient deployment of deep neural networks. Pruning methods are a key tool for edge deployment as they can improve storage, compute, memory bandwidth, and energy usage. In this paper, we propose a novel accurate pruning technique that allows precise control over the outp…
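
The abstract is cut off before the method is described, so the following is only a generic magnitude-pruning sketch in Python (numpy), not the paper's technique; the function name `magnitude_prune` and the target-sparsity interface are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    # Generic magnitude pruning (illustrative, not the paper's method):
    # zero out the smallest-magnitude weights so that roughly a
    # `sparsity` fraction of entries become zero.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)                  # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return weights * (np.abs(weights) > threshold)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w_pruned = magnitude_prune(w, sparsity=0.9)
print(1.0 - np.count_nonzero(w_pruned) / w.size)   # ~0.9
```
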
Current state-of-the-art Neural Architecture Search (NAS) methods neither efficiently scale to multiple hardware platforms nor handle diverse architectural search spaces. To remedy this, we present DONNA (Distilling Optimal Neural Network Architectures), a novel pipeline for rapid, scalable, and diverse NAS that scales to many user scenarios. DONNA consists of three phases. First, an accuracy pre…
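
The abstract truncates before the three phases are spelled out, so as a loose illustration only, the toy function below shows the general predictor-driven search idea (rank candidates with a cheap accuracy predictor under a hardware budget) rather than DONNA's actual pipeline; all names and the latency-budget interface are hypothetical.

```python
def predictor_search(candidates, predict_accuracy, measure_latency, budget_ms):
    # Toy predictor-driven NAS (not DONNA's pipeline): discard candidate
    # architectures that miss the latency budget, then pick the one the
    # trained predictor scores highest, avoiding full training per candidate.
    feasible = [c for c in candidates if measure_latency(c) <= budget_ms]
    return max(feasible, key=predict_accuracy)
```
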
The trend of pushing inference from cloud to edge due to concerns of latency, bandwidth, and privacy has created demand for energy-efficient neural network hardware. This paper presents a mixed-signal binary convolutional neural network (CNN) processor for always-on inference applications that achieves 3.8 μJ/classification at 86% accuracy on the CIFAR-10 image classification data set. The goal of…
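
The arithmetic core of a binary CNN is commonly realized as an XNOR-popcount dot product over {-1,+1} values stored as bits. A minimal functional sketch of that operation follows; it is a software illustration, unrelated to the chip's mixed-signal circuits.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_dot(a_bits, w_bits):
    # Dot product of {-1,+1} vectors stored as {0,1} bits: XNOR then
    # popcount, the core operation of binary-CNN hardware.
    matches = int((~(a_bits ^ w_bits) & 1).sum())  # 1 where signs agree
    return 2 * matches - a_bits.size               # popcount -> +/-1 sum

a = rng.integers(0, 2, 1024, dtype=np.uint8)
w = rng.integers(0, 2, 1024, dtype=np.uint8)
ref = ((2 * a.astype(int) - 1) * (2 * w.astype(int) - 1)).sum()
assert binary_dot(a, w) == ref
```
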
The energy efficiency demands of future abundant-data applications, e.g., those which use inference-based techniques to classify large amounts of data, exceed the capabilities of digital systems today. Field-effect transistors (FETs) built using nanotechnologies, such as carbon nanotubes (CNTs), can improve energy efficiency significantly. However, carbon nanotube FETs (CNFETs) are subject to proc…
Recently, there has been an increasing demand for advanced classification capabilities embedded on wearable, battery-constrained devices, such as smartphones or watches. Achieving such functionality with a tight power and energy budget has proven a real challenge, specifically for large-scale neural network-based applications. Previously, cascaded systems have been proposed to minimize energy consu…
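
A cascaded system saves energy by letting a cheap classifier handle easy inputs and waking an expensive one only on low-confidence inputs. A minimal sketch of that control flow, assuming hypothetical `small_model` and `big_model` callables that return class-probability vectors:

```python
import numpy as np

def cascade_predict(x, small_model, big_model, threshold=0.9):
    # Two-stage cascade: the cheap always-on classifier answers when it
    # is confident; otherwise the expensive network is woken up.
    p = small_model(x)                       # class probabilities
    if p.max() >= threshold:
        return int(p.argmax())               # early exit saves energy
    return int(big_model(x).argmax())
```
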
This paper introduces BinarEye: the first digital processor for always-on Binary Convolutional Neural Networks. The chip maximizes data reuse through a Neuron Array exploiting local weight Flip-Flops. It stores full network models and feature maps and hence requires no off-chip bandwidth, which leads to a 230 1b-TOPS/W peak efficiency. Its 3 levels of flexibility - (a) weight reconfiguration, (b) …
Deployment of convolutional neural networks (ConvNets) in always-on Internet of Everything (IoE) edge devices is severely constrained by the high memory energy consumption of hardware ConvNet implementations. Leveraging the error resilience of ConvNets by accepting bit errors at reduced voltages presents a viable option for energy savings, but few implementations utilize this due to the limited qu…
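
Accepting bit errors at reduced voltage can be studied in software by flipping stored weight bits at a given rate. A small numpy sketch of such fault injection; the error model here, an independent flip per bit, is an assumption for illustration only.

```python
import numpy as np

def inject_bit_errors(weights_int8, bit_error_rate, rng):
    # Emulate low-voltage SRAM read failures: flip each stored bit
    # independently with probability `bit_error_rate`.
    bits = np.unpackbits(weights_int8.view(np.uint8))
    flips = rng.random(bits.size) < bit_error_rate
    noisy = np.packbits(bits ^ flips).view(np.int8)
    return noisy.reshape(weights_int8.shape)

rng = np.random.default_rng(0)
w = rng.integers(-128, 128, size=(32, 32), dtype=np.int8)
w_noisy = inject_bit_errors(w, bit_error_rate=1e-3, rng=rng)
```
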
The trend of pushing deep learning from cloud to edge due to concerns of latency, bandwidth, and privacy has created demand for low-energy deep convolutional neural networks (CNNs). The single-layer classifier in [1] achieves sub-nJ operation, but is limited to moderate accuracy on low-complexity tasks (90% on MNIST). Larger CNN chips provide dataflow computing for high-complexity tasks (AlexNet) …
This work targets the automated minimum-energy optimization of Quantized Neural Networks (QNNs) - networks using low-precision weights and activations. These networks are trained from scratch at an arbitrary fixed-point precision. At iso-accuracy, QNNs using fewer bits require deeper and wider network architectures than networks using higher precision operators, while they require less complex ari…
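
A QNN constrains weights and activations to low-precision fixed point. Since the paper's specific scheme is truncated, the sketch below shows only generic uniform symmetric quantization, the operation such training typically simulates.

```python
import numpy as np

def quantize(x, n_bits):
    # Uniform symmetric quantization to n_bits: scale to the integer
    # grid, round, clip, then dequantize to simulate low precision.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax           # assumes x is not all zero
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

x = np.linspace(-1.0, 1.0, 9)
print(quantize(x, n_bits=4))                 # 4-bit version of x
```
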
Deep learning has recently become immensely popular for image recognition, as well as for other recognition and pattern matching tasks in, e.g., speech processing, natural language processing, and so forth. The online evaluation of deep neural networks, however, comes with significant computational complexity, making it, until recently, feasible only on power-hungry server platforms in the cloud. …
Several applications in machine learning and machine-to-human interactions tolerate small deviations in their computations. Digital systems can exploit this fault-tolerance to increase their energy-efficiency, which is crucial in embedded applications. Hence, this paper introduces a new means of Approximate Computing: Dynamic-Voltage-Accuracy-Frequency-Scaling (DVAFS), a circuit-level technique en…
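
DVAFS is a circuit-level technique, so software can only emulate its accuracy dimension: shortening operand word length so that idle LSB circuitry permits lower voltage and frequency. A rough functional sketch of that precision scaling; the int8 format and bit counts are assumptions.

```python
import numpy as np

def scale_precision(x_int8, active_bits):
    # Keep only the `active_bits` most significant bits of each int8
    # operand; in DVAFS hardware the disabled LSBs are what enables
    # voltage/frequency scaling, which software cannot reproduce.
    drop = 8 - active_bits
    return (x_int8 >> drop) << drop

x = np.array([100, -100, 7, -7], dtype=np.int8)
print(scale_precision(x, active_bits=4))     # coarse 4-bit versions
```
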
ConvNets, or Convolutional Neural Networks (CNN), are state-of-the-art classification algorithms, achieving near-human performance in visual recognition [1]. New trends such as augmented reality demand always-on visual processing in wearable devices. Yet, advanced ConvNets achieving high recognition rates are too expensive in terms of energy as they require substantial data movement and billions o…
A precision-scalable processor for low-power ConvNets or convolutional neural networks is implemented in a 40-nm CMOS technology. To minimize energy consumption while maintaining throughput, this paper is the first to implement dynamic precision and energy scaling and exploit the sparsity of convolutions in a dedicated processor architecture. The processor's 256 parallel processing units achieve a…
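
Exploiting the sparsity of convolutions typically means guarding multiplications on zero operands. A toy scalar version of that zero-skipping multiply-accumulate follows; the paper's circuit-level mechanism is of course more involved.

```python
def sparse_mac(activations, weights):
    # Zero-guarded multiply-accumulate: skip the multiply whenever the
    # activation is zero, the software analogue of sparsity gating.
    acc = 0
    for a, w in zip(activations, weights):
        if a != 0:
            acc += a * w
    return acc

print(sparse_mac([0, 3, 0, 2], [5, 7, 9, 4]))   # 29, with only 2 multiplies
```
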
A low-power precision-scalable processor for ConvNets or convolutional neural networks (CNN) is implemented in a 40-nm technology. Its 256 parallel processing units achieve a peak 102 GOPS running at 204 MHz. To minimize energy consumption while maintaining throughput, this work is the first to both exploit the sparsity of convolutions and to implement dynamic precision-scalability enabling supply-…
Recently, convolutional neural networks (ConvNets) have emerged as state-of-the-art classification and detection algorithms, achieving near-human performance in visual detection. However, ConvNet algorithms are typically very computation- and memory-intensive. In order to embed ConvNet-based classification into wearable platforms and embedded systems such as smartphones or ubiquitous elec…
A wide variety of existing and emerging applications in recognition, mining and synthesis and machine-to-human interactions tolerate small errors or deviations in their computational results. Digital systems can exploit this error tolerance to increase their energy efficiency, which is crucial in high-performance wearable electronics and in emerging low-power systems for the internet-of-things. A…
The continued scaling of feature sizes in integrated circuit technology leads to more uncertainty and unreliability in circuit behaviour. Maintaining the paradigm of deterministic Boolean computing therefore becomes increasingly challenging. Stochastic computing (SC) processes digital data in the form of pseudo-random bit-streams denoting probabilities and is therefore less vulnerable to uncertain…
The continued scaling of feature sizes in integrated circuit technology leads to more uncertainty and unreliability in circuit behavior. Maintaining the paradigm of deterministic Boolean computing therefore becomes increasingly challenging. Stochastic computing (SC) processes digital data in the form of long pseudo-random bit-streams denoting probabilities and is therefore less vulnerable to uncer…
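
Stochastic computing represents a value as the probability of a 1 in a bit-stream, so a single AND gate multiplies two independent streams. A short numpy sketch of that textbook SC multiplier; the stream length and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sc_encode(p, length):
    # Encode probability p as a pseudo-random bit-stream: each bit is 1
    # with probability p.
    return (rng.random(length) < p).astype(np.uint8)

def sc_multiply(p_a, p_b, length=10_000):
    # An AND of two independent streams has 1-probability p_a * p_b,
    # so averaging the output stream recovers the product.
    return float((sc_encode(p_a, length) & sc_encode(p_b, length)).mean())

print(sc_multiply(0.5, 0.6))   # ~0.30, up to bit-stream noise
```
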