
Driver Head Pose Estimation with Multimodal Temporal Fusion of Color and Depth Modeling Networks



Abstract:

For in-vehicle systems, head pose estimation (HPE) is a primitive task for many safety indicators, including driver attention modeling, visual awareness estimation, behavior detection, and gaze detection. The driver’s head pose information is also used to augment human-vehicle interfaces for infotainment and navigation. HPE is challenging, especially in the context of driving, due to sudden variations in illumination, extreme poses, and occlusions. Because of these challenges, driver HPE based only on 2D color data is unreliable. 3D depth data can address these challenges to an extent. We observe that features from 2D and 3D data complement each other: 2D data provides detailed, localized features but is sensitive to illumination variations, whereas 3D data provides topological and geometric features and is robust to lighting conditions. Motivated by these observations, we propose a robust HPE model that fuses data obtained from color and depth cameras (i.e., 2D and 3D). The depth feature representation is obtained with a model based on PointNet++. The color images are processed with the ResNet-50 model. We also add temporal modeling to our framework to exploit the time-continuous nature of head pose trajectories. We evaluate our proposed model on the multimodal driving monitoring (MDM) corpus, a naturalistic driving database. We present our results with a detailed ablation study of unimodal and multimodal implementations, showing improvements in head pose estimation. We compare our results with baseline HPE models that use regular cameras, including OpenFace 2.0 and HopeNet. Our fusion model achieves the best performance, obtaining an average root mean square error (RMSE) of 4.38 degrees.
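To make the fusion pipeline described above concrete, the following is a minimal PyTorch sketch of a color-depth fusion network with a temporal stage for head pose regression. This is not the authors' implementation: the PointNet++ branch is replaced by a hypothetical placeholder encoder (DepthEncoderStub), the temporal stage is sketched as an LSTM over per-frame features, fusion is plain concatenation, and all module names, feature dimensions, and the three-angle (yaw, pitch, roll) output are illustrative assumptions.

```python
# Minimal sketch of a color + depth fusion model for head pose regression.
# Assumptions (not taken from the paper): PyTorch, an LSTM for the temporal
# stage, a placeholder MLP standing in for the PointNet++ depth branch, and
# simple feature concatenation as the fusion step.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class DepthEncoderStub(nn.Module):
    """Placeholder for a PointNet++-style point cloud encoder.

    Maps a point cloud of shape (B, N, 3) to a global feature vector.
    A real PointNet++ would use hierarchical set abstraction layers."""

    def __init__(self, out_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        per_point = self.mlp(points)           # (B, N, out_dim)
        return per_point.max(dim=1).values     # global max pooling -> (B, out_dim)


class ColorDepthTemporalHPE(nn.Module):
    """Fuses per-frame ResNet-50 color features with depth features and
    models the head pose trajectory over time with an LSTM."""

    def __init__(self, depth_dim: int = 256, hidden_dim: int = 256):
        super().__init__()
        backbone = resnet50(weights=None)
        backbone.fc = nn.Identity()            # expose the 2048-d pooled feature
        self.color_encoder = backbone
        self.depth_encoder = DepthEncoderStub(depth_dim)
        self.temporal = nn.LSTM(2048 + depth_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 3)   # yaw, pitch, roll in degrees

    def forward(self, frames: torch.Tensor, clouds: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, H, W) color clips; clouds: (B, T, N, 3) point clouds
        B, T = frames.shape[:2]
        color_feat = self.color_encoder(frames.flatten(0, 1)).view(B, T, -1)
        depth_feat = self.depth_encoder(clouds.flatten(0, 1)).view(B, T, -1)
        fused = torch.cat([color_feat, depth_feat], dim=-1)
        out, _ = self.temporal(fused)
        return self.head(out)                  # (B, T, 3) pose per frame


if __name__ == "__main__":
    model = ColorDepthTemporalHPE()
    frames = torch.randn(2, 8, 3, 224, 224)    # two clips of 8 color frames
    clouds = torch.randn(2, 8, 1024, 3)        # 1024 points per frame
    print(model(frames, clouds).shape)         # torch.Size([2, 8, 3])
```

Under these assumptions, training would minimize a regression loss (e.g., mean squared error) between the predicted and annotated per-frame head pose angles; the RMSE reported in the abstract would then be computed over the held-out frames.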
Date of Conference: 02-05 June 2024
Conference Location: Jeju Island, Korea, Republic of

I. INTRODUCTION

In the field of advanced driver-assistance systems (ADAS), head pose estimation (HPE) of the driver is a primitive task for determining several safety metrics for in-vehicle systems. For example, HPE is a key technology for driver attention modeling [27], [28]. HPE can also be instrumental for other tasks such as predicting the driver's gaze [16], [17], [20], [21], [26] or estimating the driver's drowsiness level [43]. In addition to facilitating safety systems, HPE also plays a key role in improving driver-vehicle interfaces for navigation and infotainment purposes [1], [31]. As vehicles transition toward autonomy, it is also important to assess the driver's visual awareness for take-over tasks [37]. These applications highlight the need for robust in-vehicle solutions for HPE.
