
Multimodal Road Sign Interpretation for Autonomous Vehicles



Abstract:

Autonomous vehicles (AVs) are becoming increasingly prevalent. However, current AVs are unable to handle unexpected traffic signs (e.g., construction zones, road closures) encountered on the road. To address this limitation, we propose MOSER, a Multimodal rOad Sign intERpretation system that enables automated detection and interpretation of diverse road signs. Our system is a pipeline architecture with three main components: perception, text processing, and planning. The perception component detects arbitrary road signs and extracts the sign text into properly grouped and ordered units. The text processing component then identifies the high-level semantics of the text and determines whether any actions are required of the autonomous vehicle. Based on the interpretation of the signs, the planning component provides navigation guidance, such as instructing the vehicle to stop at a specific location or adding rules to its internal map. To the best of our knowledge, this is the first attempt to address the interpretation of arbitrary road signs using a multimodal processing strategy. Our work provides important insights and capabilities in support of Level 4 autonomous vehicles, ensuring their safe and smooth operation.
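The three-stage pipeline described above can be sketched as follows. This is an illustrative toy only: the rule table, function names, and directive strings are assumptions for demonstration, not the paper's actual learned components.

```python
from typing import List, Optional

def process_text(sign_lines: List[str]) -> dict:
    """Map grouped, ordered sign text (perception output) to high-level
    semantics and an action flag. Toy keyword rules stand in for the
    paper's learned text-processing component."""
    text = " ".join(sign_lines).lower()
    if "road closed" in text:
        return {"semantics": "road_closure", "action_required": True}
    if "construction" in text:
        return {"semantics": "construction_zone", "action_required": True}
    return {"semantics": "informational", "action_required": False}

def plan(interpretation: dict) -> Optional[str]:
    """Turn an interpretation into a navigation directive, or None
    when the sign requires no action from the vehicle."""
    if not interpretation["action_required"]:
        return None
    directives = {
        "road_closure": "reroute",
        "construction_zone": "slow_and_merge",
    }
    return directives[interpretation["semantics"]]

# Pipeline: perception output (grouped text) -> semantics -> guidance
directive = plan(process_text(["ROAD", "CLOSED", "AHEAD"]))
print(directive)  # reroute
```

The staged design mirrors the abstract: each component consumes the previous one's output, so the perception front end can be swapped (e.g., for a different detector) without touching the planning logic.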
Date of Conference: 17-20 December 2022
Date Added to IEEE Xplore: 26 January 2023
Conference Location: Osaka, Japan

I. Introduction

Within the last several years, autonomous vehicle (AV) technology has seen significant progress due to rapid advances in computer vision [Sun+20]; [Liu+20]; [Vou+18] and artificial intelligence research [Wan+21]; [Bad+21]. AVs have recently been deployed in a growing number of major cities across the United States, including San Francisco, Las Vegas, and Phoenix, with imminent plans to launch driverless taxis [NPR22]. However, one limitation of current AVs is that they often rely on pre-generated, high-definition maps – data abstractions that capture "pre-recorded" information about an environment, which AVs use to navigate. Consequently, any acute change to the environment can drastically degrade their ability to operate, with a significant negative bearing on road safety.

References

1. Ray Smith, "An Overview of the Tesseract OCR Engine", Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 629-633, 2007.
2. Laurens van der Maaten and Geoffrey Hinton, "Visualizing Data Using t-SNE", Journal of Machine Learning Research, vol. 9, no. 11, 2008.
3. Angel X. Chang and Christopher D. Manning, "SUTime: A Library for Recognizing and Normalizing Time Expressions", LREC, 2012.
4. Jack Greenhalgh and Majid Mirmehdi, "Recognizing Text-Based Traffic Signs", IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 3, pp. 1360-1369, 2014.
5. Xuejian Rong, Chucai Yi and Yingli Tian, "Recognizing Text-Based Traffic Guide Panels with Cascaded Localization Network", European Conference on Computer Vision, 2016.
6. Seles Xavier and R. Reshmi, "Automatic Detection and Recognition of Text in Traffic Sign Boards Based on Word Recognizer", vol. 3, no. 4, 2016.
7. Alexey Dosovitskiy et al., "CARLA: An Open Urban Driving Simulator", Proceedings of the 1st Annual Conference on Robot Learning, 2017.
8. Kaiming He et al., "Mask R-CNN", Proceedings of the IEEE International Conference on Computer Vision, pp. 2961-2969, 2017.
9. Gerhard Neuhold et al., "The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes", Proceedings of the IEEE International Conference on Computer Vision, 2017.
10. Xinyu Zhou et al., "EAST: An Efficient and Accurate Scene Text Detector", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5551-5560, 2017.
11. Daniel Cer et al., "Universal Sentence Encoder", 2018.
12. Minghui Liao, Baoguang Shi and Xiang Bai, "TextBoxes++: A Single-Shot Oriented Scene Text Detector", IEEE Transactions on Image Processing, vol. 27, no. 8, pp. 3676-3690, 2018.
13. Hengliang Luo et al., "Traffic Sign Recognition Using a Multi-Task Convolutional Neural Network", IEEE Transactions on Intelligent Transportation Systems, 2018.
14. Christian S. Perone, Roberto Silveira and Thomas S. Paula, "Evaluation of Sentence Embeddings in Downstream and Linguistic Probing Tasks", 2018.
15. Athanasios Voulodimos et al., "Deep Learning for Computer Vision: A Brief Review", Computational Intelligence and Neuroscience, vol. 2018, 2018.
16. Jeonghun Baek et al., "What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis", International Conference on Computer Vision (ICCV), 2019.
17. Youngmin Baek et al., "Character Region Awareness for Text Detection", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9365-9374, 2019.
18. Jia Li and Zengfu Wang, "Real-Time Traffic Sign Recognition Based on Efficient CNNs in the Wild", IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 3, pp. 975-984, 2019.
19. Christian Ertler et al., "The Mapillary Traffic Sign Dataset for Detection and Classification on a Global Scale", European Conference on Computer Vision, pp. 68-84, 2020.
20. Glenn Jocher et al., ultralytics/yolov5: v3.1 - Bug Fixes and Performance Improvements, 2020.
21. Minghui Liao et al., "Real-Time Scene Text Detection with Differentiable Binarization", Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11474-11481, 2020.
22. Li Liu et al., "Deep Learning for Generic Object Detection: A Survey", International Journal of Computer Vision, vol. 128, no. 2, pp. 261-318, 2020.
23. Jamshed Memon et al., "Handwritten Optical Character Recognition (OCR): A Comprehensive Systematic Literature Review (SLR)", IEEE Access, vol. 8, pp. 142642-142668, 2020.
24. Pei Sun et al., "Scalability in Perception for Autonomous Driving: Waymo Open Dataset", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
25. Domen Tabernik and Danijel Skočaj, "Deep Learning for Large-Scale Traffic-Sign Detection and Recognition", IEEE Transactions on Intelligent Transportation Systems, 2020.
26. Claudine Badue et al., "Self-Driving Cars: A Survey", Expert Systems with Applications, vol. 165, p. 113816, 2021.
27. Ee Heng Chen et al., "Investigating Binary Neural Networks for Traffic Sign Detection and Recognition", IEEE Intelligent Vehicles Symposium, 2021.
28. Ze Liu et al., "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows", Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021.
29. Yousef Taki and Elmoukhtar Zemmouri, "An Overview of Real-Time Traffic Sign Detection and Classification", International Conference on Smart City Applications, 2021.
30. Mingyu Wang et al., "Game-Theoretic Planning for Self-Driving Cars in Multivehicle Competitive Scenarios", IEEE Transactions on Robotics, vol. 37, no. 4, pp. 1313-1325, 2021.
