
Driver Head Pose Estimation with Multimodal Temporal Fusion of Color and Depth Modeling Networks


Abstract:

For in-vehicle systems, head pose estimation (HPE) is a primitive task for many safety indicators, including driver attention modeling, visual awareness estimation, behavior detection, and gaze detection. The driver’s head pose information is also used to augment human-vehicle interfaces for infotainment and navigation. HPE is challenging, especially in the context of driving, due to sudden variations in illumination, extreme poses, and occlusions. Because of these challenges, driver HPE based only on 2D color data is unreliable. These challenges can be addressed, to an extent, with 3D depth data. We observe that features from 2D and 3D data complement each other: 2D data provides detailed localized features but is sensitive to illumination variations, whereas 3D data provides topological geometrical features and is robust to lighting conditions. Motivated by these observations, we propose a robust HPE model which fuses data obtained from color and depth cameras (i.e., 2D and 3D). The depth feature representation is obtained with a model based on PointNet++. The color images are processed with the ResNet-50 model. In addition, we add temporal modeling to our framework to exploit the time-continuous nature of head pose trajectories. We train and evaluate our proposed model on the multimodal driver monitoring (MDM) corpus, which is a naturalistic driving database. We present a detailed ablation study with unimodal and multimodal implementations, showing improvements in head pose estimation. We compare our results with baseline HPE models using regular cameras, including OpenFace 2.0 and HopeNet. Our fusion model achieves the best performance, obtaining an average root mean square error (RMSE) equal to 4.38 degrees.
Date of Conference: 02-05 June 2024
Date Added to IEEE Xplore: 15 July 2024
Conference Location: Jeju Island, Korea, Republic of

I. INTRODUCTION

In the field of advanced driver-assistance systems (ADAS), head pose estimation (HPE) of the driver is a primitive task for determining several safety metrics for in-vehicle systems. For example, HPE is a key technology for driver attention modeling [27], [28]. HPE can also be instrumental for other tasks, such as predicting the driver’s gaze [16], [17], [20], [21], [26] or estimating the drowsiness level of a driver [43]. Beyond safety systems, HPE also plays a key role in improving driver-vehicle interfaces for navigation and infotainment purposes [1], [31]. As we transition to autonomous vehicles, it is also important to identify the visual awareness of the driver for take-over tasks [37]. These applications highlight the need for robust in-vehicle solutions for HPE.
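The performance of the approach is reported as an average root mean square error (RMSE) over the predicted head pose angles. As a minimal, self-contained sketch of how such a score is typically computed (per-axis RMSE over yaw, pitch, and roll, then averaged), consider the snippet below; all numbers are synthetic placeholders, not values from the paper:

```python
import math

def rmse(pred, true):
    """Root mean square error over a sequence of angle predictions (degrees)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))

# Hypothetical per-frame predictions vs. ground truth for each rotation axis.
yaw_pred,   yaw_true   = [10.0, -5.0, 30.0], [12.0, -4.0, 27.0]
pitch_pred, pitch_true = [0.0, 2.0, -1.0],   [1.0, 1.0, -3.0]
roll_pred,  roll_true  = [5.0, 5.0, 5.0],    [4.0, 6.0, 5.0]

# Average the per-axis errors into a single summary score.
per_axis = [rmse(yaw_pred, yaw_true),
            rmse(pitch_pred, pitch_true),
            rmse(roll_pred, roll_true)]
avg_rmse = sum(per_axis) / len(per_axis)
```

Averaging per-axis RMSE values is one common convention for summarizing head pose error as a single number; the exact aggregation used in the paper may differ.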

REFERENCES
[1] A. Aftab, M. von der Beeck and M. Feld, "You have a point there: Object selection inside an automobile using gaze, head pose and finger pointing", ACM International Conference on Multimodal Interaction (ICMI 2020), pp. 595-603, October 2020.
[2] T. Baltrušaitis, A. Zadeh, Y. C. Lim and L. Morency, "OpenFace 2.0: Facial behavior analysis toolkit", IEEE Conference on Automatic Face and Gesture Recognition (FG 2018), pp. 59-66, May 2018.
[3] T. Bär, J. Reuter and J. Zöllner, "Driver head pose and gaze estimation based on multi-template ICP 3-D point cloud alignment", International IEEE Conference on Intelligent Transportation Systems (ITSC 2012), pp. 1797-1802, September 2012.
[4] F. J. Chang, A. T. Tran, T. Hassner, I. Masi, R. Nevatia and G. Medioni, "FacePoseNet: Making a case for landmark-free face alignment", IEEE International Conference on Computer Vision Workshops (ICCVW 2017), pp. 1599-1608, October 2017.
[5] H. Chen, S. Liu, W. Chen, H. Li and R. Hill, "Equivariant point network for 3D point cloud analysis", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), pp. 14509-14518, June 2021.
[6] B. Czupryński and A. Strupczewski, "High accuracy head pose tracking survey", International Conference on Active Media Technology (AMT 2014), vol. 8610, pp. 407-420, August 2014.
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database", IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), pp. 248-255, June 2009.
[8] G. Fanelli, J. Gall and L. Van Gool, "Real time head pose estimation with random regression forests", IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), pp. 617-624, June 2011.
[9] S. Foix, G. Alenya and C. Torras, "Lock-in time-of-flight (ToF) cameras: A survey", IEEE Sensors Journal, vol. 11, no. 9, pp. 1917-1926, September 2011.
[10] L. Goncalves and C. Busso, "AuxFormer: Robust approach to audiovisual emotion recognition", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022), pp. 7357-7361, May 2022.
[11] L. Goncalves and C. Busso, "Robust audiovisual emotion recognition: Aligning modalities, capturing temporal information, and handling missing features", IEEE Transactions on Affective Computing, vol. 13, no. 4, pp. 2156-2170, October-December 2022.
[12] L. Goncalves, S.-G. Leem, W.-C. Lin, B. Sisman and C. Busso, "Versatile audiovisual learning for handling single and multi modalities in emotion regression and classification tasks", ArXiv e-prints (arXiv:2305.07216), pp. 1-14, May 2023.
[13] K. He, X. Zhang, S. Ren and J. Sun, "Deep residual learning for image recognition", IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), pp. 770-778, June-July 2016.
[14] T. Hu, S. Jha and C. Busso, "Robust driver head pose estimation in naturalistic conditions from point-cloud data", IEEE Intelligent Vehicles Symposium (IV 2020), pp. 1176-1182, October-November 2020.
[15] T. Hu, S. Jha and C. Busso, "Temporal head pose estimation from point cloud in naturalistic driving conditions", IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 7, pp. 8063-8076, July 2022.
[16] S. Jha, N. Al-Dhahir and C. Busso, "Driver visual attention estimation using head pose and eye appearance information", IEEE Open Journal of Intelligent Transportation Systems, vol. 4, pp. 216-231, March 2023.
[17] S. Jha and C. Busso, "Analyzing the relationship between head pose and gaze to model driver visual attention", IEEE International Conference on Intelligent Transportation Systems (ITSC 2016), pp. 2157-2162, November 2016.
[18] S. Jha and C. Busso, "Challenges in head pose estimation of drivers in naturalistic recordings using existing tools", IEEE International Conference on Intelligent Transportation Systems (ITSC 2017), pp. 1624-1629, October 2017.
[19] S. Jha and C. Busso, "Fi-Cap: Robust framework to benchmark head pose estimation in challenging environments", IEEE International Conference on Multimedia and Expo (ICME 2018), pp. 1-6, July 2018.
[20] S. Jha and C. Busso, "Probabilistic estimation of the gaze region of the driver using dense classification", IEEE International Conference on Intelligent Transportation Systems (ITSC 2018), pp. 697-702, November 2018.
[21] S. Jha and C. Busso, "Estimation of driver’s gaze region from head position and orientation using probabilistic confidence regions", IEEE Transactions on Intelligent Vehicles, vol. 8, no. 1, pp. 59-72, January 2023.
[22] S. Jha, M. Marzban, T. Hu, M. Mahmoud, N. Al-Dhahir and C. Busso, "The multimodal driver monitoring database: A naturalistic corpus to study driver attention", IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 8, pp. 10736-10752, August 2022.
[23] W. Kabsch, "A discussion of the solution for the best rotation to relate two sets of vectors", Acta Crystallographica Section A, vol. A34, no. 5, pp. 827-828, September 1978.
[24] D. Kingma and J. Ba, "Adam: A method for stochastic optimization", International Conference on Learning Representations (ICLR 2015), pp. 1-13, May 2015.
[25] A. Kumar, A. Alavi and R. Chellappa, "KEPLER: Keypoint and pose estimation of unconstrained faces by learning efficient H-CNN regressors", IEEE International Conference on Automatic Face and Gesture Recognition (FG 2017), pp. 258-265, May-June 2017.
[26] S. J. Lee, J. Jo, H. G. Jung, K. R. Park and J. Kim, "Real-time gaze estimator based on driver’s head orientation for forward collision warning system", IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 1, pp. 254-267, March 2011.
[27] N. Li and C. Busso, "Analysis of facial features of drivers under cognitive and visual distractions", IEEE International Conference on Multimedia and Expo (ICME 2013), pp. 1-6, July 2013.
[28] N. Li, J. Jain and C. Busso, "Modeling of driver behavior in real world scenarios using multiple noninvasive sensors", IEEE Transactions on Multimedia, vol. 15, no. 5, pp. 1213-1225, August 2013.
[29] Y. K. Li, Y. Z. Yu, Y. L. Liu and C. Gou, "MS-GCN: Multi-stream graph convolution network for driver head pose estimation", IEEE International Conference on Intelligent Transportation Systems (ITSC 2022), pp. 3819-3824, October 2022.
[30] G. P. Meyer, S. Gupta, I. Frosio, D. Reddy and J. Kautz, "Robust model-based 3D head pose estimation", IEEE International Conference on Computer Vision (ICCV 2015), pp. 3649-3657, December 2015.
