
P2M-DeTrack: Processing-in-Pixel-in-Memory for Energy-efficient and Real-Time Multi-Object Detection and Tracking

Publisher: IEEE


Abstract:

Today’s high-resolution, high-frame-rate cameras in autonomous vehicles generate a large volume of data that needs to be transferred and processed by a downstream processor or machine learning (ML) accelerator to enable intelligent computing tasks, such as multi-object detection and tracking. The massive amount of data transfer incurs significant energy, latency, and bandwidth bottlenecks, which hinders real-time processing. To mitigate this problem, we propose an algorithm-hardware co-design framework called Processing-in-Pixel-in-Memory-based object Detection and Tracking (P2M-DeTrack). P2M-DeTrack is based on a custom Faster R-CNN-based model that is distributed partly inside the pixel array (front-end) and partly in a separate FPGA/ASIC (back-end). The proposed front-end in-pixel processing down-samples the input feature maps significantly with judiciously optimized strided convolution and pooling. Compared to a conventional baseline design that transfers frames of RGB pixels to the back-end, the resulting P2M-DeTrack designs reduce the data bandwidth between sensor and back-end by up to 24×. The designs also reduce the sensor and total energy (obtained from in-house circuit simulations at the GlobalFoundries 22nm technology node) per frame by 5.7× and 1.14×, respectively. Lastly, they reduce the sensing and total frame latency by an estimated 1.7× and 3×, respectively. We evaluate our approach on the multi-object detection (tracking) task of the large-scale BDD100K dataset and observe only a 0.5% reduction in the mean average precision (0.8% reduction in the identification F1 score) compared to the state-of-the-art.
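The headline 24× bandwidth reduction follows from simple arithmetic on the front-end's down-sampling: strided convolution and pooling shrink the spatial dimensions, and the transmitted feature maps can use fewer channels and bits than raw RGB. The sketch below illustrates how such a factor can arise; the abstract does not specify the actual stride, pooling, channel count, or bit-width, so the values here are purely illustrative assumptions, not the paper's configuration.

```python
# Back-of-the-envelope sensor-to-back-end bandwidth comparison for an
# in-pixel front-end, in the spirit of P2M-DeTrack. All layer parameters
# below (stride, pool size, channels, bit-width) are ASSUMPTIONS chosen
# to show how a ~24x reduction can arise; the paper's abstract does not
# state them.

def frame_bytes(height, width, channels, bits_per_value):
    """Bytes needed to transmit one frame / feature map off-sensor."""
    return height * width * channels * bits_per_value // 8

# Baseline: full-resolution 8-bit RGB frame sent to the back-end.
baseline = frame_bytes(1280, 720, 3, 8)

# Hypothetical front-end: stride-2 convolution followed by 2x2 pooling
# gives 4x down-sampling per spatial axis (16x in area), emitting
# 4 output channels quantized to 4 bits each.
stride, pool = 2, 2
down = stride * pool
front_end = frame_bytes(1280 // down, 720 // down, 4, 4)

print(f"bandwidth reduction: {baseline / front_end:.1f}x")
```

With these assumed numbers the spatial down-sampling contributes 16×, and the channel/bit-width trade (3 channels at 8 bits vs. 4 channels at 4 bits) contributes another 1.5×, for 24× overall; other combinations of stride, pooling, and quantization can reach the same factor.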
Date of Conference: 03-05 October 2022
Date Added to IEEE Xplore: 08 November 2022
Conference Location: Patras, Greece

I. Introduction & Related Work

Artificial intelligence (AI)-enabled video processing presents a challenging problem because the high resolution, high dynamic range, and high frame rates of image sensors generate large amounts of data that must be processed in real-time [14], [28]. In particular, the data transmission between the image sensor and the off-chip processing unit leads to significant latency, energy, and bandwidth bottlenecks. This problem is further exacerbated in an autonomous driving scenario, where there are a plethora of other sensors, such as radars and inertial measurement units (IMUs), that also need to transmit data for intelligence processing, including perception and localization [30].
