Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report

Abstract:

Depth estimation is an important computer vision problem with many practical applications on mobile devices. While many solutions have been proposed for this task, they are usually very computationally expensive and thus unsuitable for on-device inference. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop end-to-end deep learning-based depth estimation solutions that can demonstrate nearly real-time performance on smartphones and IoT platforms. For this, the participants were provided with a new large-scale dataset containing RGB-depth image pairs obtained with a dedicated ZED stereo camera producing high-resolution depth maps for objects located up to 50 meters away. The run-time of all models was evaluated on the popular Raspberry Pi 4 platform with a mobile ARM-based Broadcom chipset. The proposed solutions can generate VGA-resolution depth maps at up to 10 FPS on the Raspberry Pi 4 while achieving high-fidelity results, and are compatible with any Android or Linux-based mobile device. A detailed description of all models developed in the challenge is provided in this paper.

Published in: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Date of Conference: 19-25 June 2021

Date Added to IEEE Xplore: 01 September 2021

DOI: 10.1109/CVPRW53098.2021.00288

Conference Location: Nashville, TN, USA

Andrey Ignatov

Computer Vision Lab, ETH, Zurich, Switzerland

AI Witchlabs, Switzerland

Grigory Malivenko

Raspberry Pi (Trading) Ltd

David Plowman

Computer Vision Lab, ETH, Zurich, Switzerland

Samarth Shukla

Computer Vision Lab, ETH, Zurich, Switzerland

Radu Timofte

Computer Vision Lab, ETH, Zurich, Switzerland

AI Witchlabs, Switzerland

Ziyu Zhang

Tencent GY-Lab, China

Yicheng Wang

Tencent GY-Lab, China

Zilong Huang

Tencent GY-Lab, China

Guozhong Luo

Tencent GY-Lab, China

Gang Yu

Tencent GY-Lab, China

Bin Fu

Tencent GY-Lab, China

Yiran Wang

Key Laboratory of Image Processing and Intelligent Control, Ministry of Education, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China

Xingyi Li

Key Laboratory of Image Processing and Intelligent Control, Ministry of Education, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China

Min Shi

Key Laboratory of Image Processing and Intelligent Control, Ministry of Education, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China

Ke Xian

Key Laboratory of Image Processing and Intelligent Control, Ministry of Education, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China

Zhiguo Cao

Key Laboratory of Image Processing and Intelligent Control, Ministry of Education, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China

Jin-Hua Du

Nanjing Artificial Intelligence Chip Research, Institute of Automation, China

Pei-Lin Wu

Nanjing Artificial Intelligence Chip Research, Institute of Automation, China

Chao Ge

Nanjing Artificial Intelligence Chip Research, Institute of Automation, China

Jiaoyang Yao

Black Sesame Technologies Inc, Singapore

Fangwen Tu

Black Sesame Technologies Inc, Singapore

Bo Li

Black Sesame Technologies Inc, Singapore

Jung Eun Yoo

Visual Media Lab, KAIST, South Korea

Kwanggyoon Seo

Visual Media Lab, KAIST, South Korea

Jialei Xu

Harbin Institute of Technology, China

Peng Cheng Laboratory, China

Zhenyu Li

Harbin Institute of Technology, China

Peng Cheng Laboratory, China

Xianming Liu

Harbin Institute of Technology, China

Peng Cheng Laboratory, China

Junjun Jiang

Harbin Institute of Technology, China

Peng Cheng Laboratory, China

Wei-Chi Chen

Multimedia and Computer Vision Laboratory, National Cheng Kung University, Taiwan

Shayan Joya

Samsung Research, UK, United Kingdom

Huanhuan Fan

OPPO Research Institute, China

Zhaobing Kang

OPPO Research Institute, China

Ang Li

OPPO Research Institute, China

Tianpeng Feng

OPPO Research Institute, China

Yang Liu

OPPO Research Institute, China

Chuannan Sheng

OPPO Research Institute, China

Jian Yin

OPPO Research Institute, China

Fausto T. Benavides

ETH, Zurich, Switzerland

1. Introduction

The wide range of depth-guided problems related to augmented reality, gesture recognition, object segmentation, autonomous driving and bokeh effect rendering has created a strong demand for fast and efficient single-image depth estimation approaches that can run on portable low-power hardware. While many accurate deep learning-based solutions have been proposed for this problem in the past [46], [16], [14], [47], [48], [42], [15], [10], they were optimized only for high-fidelity results, without taking into account computational efficiency and mobile-related constraints, which are essential for image processing tasks [23], [24], [37] on mobile devices. As a result, these solutions require powerful high-end GPUs and consume gigabytes of RAM even when processing low-resolution input data, making them incompatible with resource-constrained mobile hardware. In this challenge, we change the current depth estimation benchmarking paradigm by using a new depth estimation dataset collected in the wild and by imposing additional efficiency-related constraints on the designed solutions.
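An efficiency constraint of this kind is easiest to reason about as a per-frame latency budget: producing VGA (640×480) depth maps at 10 FPS, the rate reported in the abstract, leaves at most 100 ms per frame. The Python sketch below is an illustration only, not the challenge's official benchmarking procedure. It times a hypothetical `model` callable on a VGA-sized RGB input and converts the median latency to frames per second. The names `measure_fps` and `fps_from_latency_ms`, and the random-input setup, are assumptions made for this example.

```python
import statistics
import time
from typing import Callable, List

import numpy as np

# VGA resolution (640x480), the output size reported in the abstract.
VGA_H, VGA_W = 480, 640


def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


def measure_fps(model: Callable[[np.ndarray], np.ndarray],
                runs: int = 20, warmup: int = 3) -> float:
    """Time `model` on a random VGA RGB frame; return FPS from the median latency.

    `model` is a hypothetical callable mapping an HxWx3 float32 image to an
    HxW depth map. It stands in for whatever inference runtime is under test.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    frame = np.random.rand(VGA_H, VGA_W, 3).astype(np.float32)
    for _ in range(warmup):
        model(frame)  # warm-up calls are not timed (caches, lazy initialization)
    latencies_ms: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        depth = model(frame)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    if depth.shape != (VGA_H, VGA_W):
        raise ValueError(f"expected a {VGA_H}x{VGA_W} depth map, got {depth.shape}")
    # The median is less sensitive to occasional scheduler hiccups than the mean;
    # the floor guards against a zero reading from an extremely fast placeholder.
    median_ms = max(statistics.median(latencies_ms), 1e-6)
    return fps_from_latency_ms(median_ms)


if __name__ == "__main__":
    # A 10 FPS target corresponds to a 100 ms budget per VGA frame.
    print(fps_from_latency_ms(100.0))  # 10.0
    # Placeholder "model": per-pixel mean over RGB channels (not a depth network).
    print(f"{measure_fps(lambda img: img.mean(axis=2)):.1f} FPS")
```

On real hardware such as the Raspberry Pi 4, the measured figure will depend on the inference runtime, thread count and numeric precision used, so this sketch only shows how a latency measurement maps onto an FPS target.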
