Lite Vision Transformer with Enhanced Self-Attention | IEEE Conference Publication | IEEE Xplore



Abstract:

Despite the impressive representation capacity of vision transformer models, current lightweight vision transformer models still suffer from inconsistent and incorrect dense predictions in local regions. We suspect that the power of their self-attention mechanism is limited in shallower and thinner networks. We propose Lite Vision Transformer (LVT), a novel lightweight transformer network with two enhanced self-attention mechanisms that improve model performance for mobile deployment. For low-level features, we introduce Convolutional Self-Attention (CSA). Unlike previous approaches that merge convolution and self-attention, CSA introduces local self-attention into the convolution within a 3×3 kernel to enrich low-level features in the first stage of LVT. For high-level features, we propose Recursive Atrous Self-Attention (RASA), which uses multi-scale context when calculating the similarity map, together with a recursive mechanism that increases representation capability at marginal extra parameter cost. The superiority of LVT is demonstrated on ImageNet recognition, ADE20K semantic segmentation, and COCO panoptic segmentation. The code is publicly available at https://github.com/Chenglin-Yang/LVT.
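To make the two ingredients in the abstract more concrete, here is a toy, pure-Python sketch of (a) self-attention restricted to a 3×3 neighbourhood, optionally dilated ("atrous"), and (b) a recursive wrapper that reapplies the same operator. This is not the LVT implementation of CSA or RASA (the repository above is the authoritative source); it only illustrates the general techniques under loudly stated assumptions. Queries, keys, and values are the raw features (identity projections, so there are no learned parameters), out-of-bounds neighbours are skipped, "multi-scale context" is approximated by averaging local attention over the hypothetical dilation rates 1, 2, and 3, and recursion is modelled as repeated application with a residual add. All function names are hypothetical.

```python
import math
from typing import List, Sequence

# H x W x C feature map as nested lists: grid[row][col][channel]
Grid = List[List[List[float]]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_self_attention(x: Grid, dilation: int = 1) -> Grid:
    """Each pixel attends to its 3x3 neighbourhood, sampled `dilation` pixels apart.

    Assumptions (not from the paper): identity query/key/value projections,
    and neighbours falling outside the map are skipped rather than zero-padded.
    """
    h, w, c = len(x), len(x[0]), len(x[0][0])
    scale = 1.0 / math.sqrt(c)
    out: Grid = []
    for i in range(h):
        row = []
        for j in range(w):
            q = x[i][j]
            neighbours = [
                x[i + di * dilation][j + dj * dilation]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if 0 <= i + di * dilation < h and 0 <= j + dj * dilation < w
            ]
            # Scaled dot-product similarity between the query and each neighbouring key.
            weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in neighbours])
            # Attention-weighted sum of the neighbouring values.
            row.append([sum(wt * v[ch] for wt, v in zip(weights, neighbours)) for ch in range(c)])
        out.append(row)
    return out


def multi_dilation_attention(x: Grid, dilations: Sequence[int] = (1, 2, 3)) -> Grid:
    """Toy multi-scale context: average local attention over several dilation rates."""
    outs = [local_self_attention(x, d) for d in dilations]
    h, w, c = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(o[i][j][ch] for o in outs) / len(outs) for ch in range(c)] for j in range(w)]
        for i in range(h)
    ]


def recursive_attention(x: Grid, steps: int = 2, dilations: Sequence[int] = (1, 2, 3)) -> Grid:
    """Reapply the same operator `steps` times, adding a residual connection each time.

    With learned projections, reusing one set of weights across steps keeps the
    parameter count constant as `steps` grows (the toy operator here has none).
    """
    for _ in range(steps):
        y = multi_dilation_attention(x, dilations)
        x = [[[a + b for a, b in zip(px, py)] for px, py in zip(rx, ry)] for rx, ry in zip(x, y)]
    return x
```

For example, on the 1×2 single-channel map `[[[1.0], [0.0]]]`, `local_self_attention` gives the first pixel e/(e+1) ≈ 0.731, because its query favours the matching neighbour. It gives the second pixel 0.5, because a zero query assigns both neighbours equal weight. The dilation parameter widens the neighbourhood without adding sampled positions, which is the usual motivation for atrous operators.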
Date of Conference: 18-24 June 2022
Date Added to IEEE Xplore: 27 September 2022
Conference Location: New Orleans, LA, USA

1. Introduction

Transformer-based architectures have recently achieved remarkable success, demonstrating superior performance on a variety of vision tasks, including visual recognition [65], object detection [38], [56], semantic segmentation [10], [60], and others [32], [54], [55].

Cites in Papers - IEEE (43)

1.
Jing Xu, Wentao Shi, Pan Gao, Qizhu Li, Zhengwei Wang, "MUSTER: A Multi-Scale Transformer-Based Decoder for Semantic Segmentation", IEEE Transactions on Emerging Topics in Computational Intelligence, vol.9, no.1, pp.202-212, 2025.
2.
Bingchao Huang, Chuantao Yin, Chao Wang, Hui Chen, Yanmei Chai, Yuanxin Ouyang, "Video-Based Recognition of Online Learning Behaviors Using Attention Mechanisms", 2024 IEEE International Conference on Teaching, Assessment and Learning for Engineering (TALE), pp.1-7, 2024.
3.
Yining Liu, Ziyao Wang, "Enhanced Transformer-LSTM Daily Sales Forecasting Model Based on Attention Mechanism", 2024 6th International Conference on Robotics, Intelligent Control and Artificial Intelligence (RICAI), pp.730-734, 2024.
4.
Mazen Amria, Aziz M. Qaroush, Mohammad Jubran, Alaa Zuhd, Ahmad Khatib, "Distillation-Based Model Compression Framework for Swin Transformer", 2024 IEEE International Conference on Future Machine Learning and Data Science (FMLDS), pp.545-551, 2024.
5.
Jiwon Yoo, Jangwon Lee, Gyeonghwan Kim, "A Decoding Scheme With Successive Aggregation of Multi-Level Features For Light-Weight Semantic Segmentation", 2024 IEEE International Conference on Image Processing (ICIP), pp.1071-1077, 2024.
6.
Bouzid Arezki, Anissa Mokraoui, Fangchen Feng, "Efficient Image Compression Using Advanced State Space Models", 2024 IEEE 26th International Workshop on Multimedia Signal Processing (MMSP), pp.1-6, 2024.
7.
Yawen Lu, Cheng Han, Qifan Wang, Heng Fan, Zhaodan Kong, Dongfang Liu, Yingjie Chen, "Optical Flow as Spatial-Temporal Attention Learners", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.46, no.12, pp.11491-11506, 2024.
8.
Ruiping Liu, Kailun Yang, Alina Roitberg, Jiaming Zhang, Kunyu Peng, Huayao Liu, Yaonan Wang, Rainer Stiefelhagen, "TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation", IEEE Transactions on Intelligent Transportation Systems, vol.25, no.12, pp.20933-20949, 2024.
9.
Liangliang Su, Junyu Song, Yalong Yang, Xianyu Ge, Yubin Liu, "LVALH: An Image Hashing Method Based on Lightweight ViT and Asymmetric Learning", 2024 17th International Conference on Advanced Computer Theory and Engineering (ICACTE), pp.297-304, 2024.
10.
Jiayu Dai, Jinshan Pan, Fenglei Xu, Zhongwei Shen, "Lightweight Vit-Based Pedestrian Multi-Object Detection in Building Construction Scenarios", 2024 2nd International Conference on Machine Vision, Image Processing & Imaging Technology (MVIPIT), pp.108-112, 2024.
11.
Chuanyu Dong, Qing Wang, Guan Kai, Zhiqiang Wu, Zhicheng Dong, "LFSC: A Lite and Fidelity-Enhanced Semantic Communication Scheme", 2024 IEEE/CIC International Conference on Communications in China (ICCC), pp.1615-1620, 2024.
12.
Wenfeng Song, Xuan Wang, Yuting Guo, Shuai Li, Bin Xia, Aimin Hao, "CenterFormer: A Novel Cluster Center Enhanced Transformer for Unconstrained Dental Plaque Segmentation", IEEE Transactions on Multimedia, vol.26, pp.10965-10978, 2024.
13.
Qin Yang, Jin Yang, Shengqiao Ni, Jing Zhang, Hang Ren, Nuo Qun, "Research on the classification model of Thangka subjects based on efficient PatchEmbed", 2024 International Joint Conference on Neural Networks (IJCNN), pp.1-8, 2024.
14.
Qihang Fan, Huaibo Huang, Mingrui Chen, Hongmin Liu, Ran He, "RMT: Retentive Networks Meet Vision Transformers", 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.5641-5651, 2024.
15.
Xuan-Thuy Vo, Duy-Linh Nguyen, Adri Priadana, Kang-Hyun Jo, "Reweighting Foveal Visual Representations", 2024 IEEE 33rd International Symposium on Industrial Electronics (ISIE), pp.1-7, 2024.
16.
Jinyang Liu, Shutao Li, Renwei Dian, Ze Song, Xudong Kang, "MDENet: Multidomain Differential Excavating Network for Remote Sensing Image Change Detection", IEEE Transactions on Geoscience and Remote Sensing, vol.62, pp.1-11, 2024.
17.
Yang Chen, Peiliang Zhang, Tong Wu, Jiwei Hu, Xin Zhang, Yi Liu, "LCPTCN: Lightweight Temporal Convolutional Network with Cross-Group Pruning for Dynamic Load Forecasting", 2024 International Conference on Cloud and Network Computing (ICCNC), pp.87-93, 2024.
18.
Mingxin Yu, Ji Zhang, Lianqing Zhu, Shengjun Liang, Wenshuai Lu, Xinglong Ji, "An Intelligent System for Outfall Detection in UAV Images Using Lightweight Convolutional Vision Transformer Network", IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol.17, pp.6265-6277, 2024.
19.
Xiaying Chen, Yue Zhou, "Efficient Hierarchical Stripe Attention for Lightweight Image Super-Resolution", ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.3770-3774, 2024.
20.
Qiming Li, Jinghang Cheng, Yin Gao, Jun Li, "Learning Geometric Information via Transformer Network for Key-Points Based Motion Segmentation", IEEE Transactions on Circuits and Systems for Video Technology, vol.34, no.9, pp.7856-7869, 2024.
21.
Abolfazl Younesi, Mohsen Ansari, Mohammadamin Fazli, Alireza Ejlali, Muhammad Shafique, Jörg Henkel, "A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends", IEEE Access, vol.12, pp.41180-41218, 2024.
22.
Pan Huang, Hualiang Xiao, Peng He, Chentao Li, Xiaodong Guo, Sukun Tian, Peng Feng, Hu Chen, Yuchun Sun, Francesco Mercaldo, Antonella Santone, Jing Qin, "LA-ViT: A Network With Transformers Constrained by Learned-Parameter-Free Attention for Interpretable Grading in a New Laryngeal Histopathology Image Dataset", IEEE Journal of Biomedical and Health Informatics, vol.28, no.6, pp.3557-3570, 2024.
23.
Iksoo Shin, Changsik Cho, Seon-Tae Kim, "Method for Expanding Search Space With Hybrid Operations in DynamicNAS", IEEE Access, vol.12, pp.10242-10253, 2024.
24.
Wei Dai, Rui Liu, Tianyi Wu, Min Wang, Jianqin Yin, Jun Liu, "Deeply Supervised Skin Lesions Diagnosis With Stage and Branch Attention", IEEE Journal of Biomedical and Health Informatics, vol.28, no.2, pp.719-729, 2024.
25.
Nannan Li, Yaran Chen, Weifan Li, Zixiang Ding, Dongbin Zhao, Shuai Nie, "BViT: Broad Attention-Based Vision Transformer", IEEE Transactions on Neural Networks and Learning Systems, vol.35, no.9, pp.12772-12783, 2024.
26.
Ji Zhang, Zhihao Chen, Yiyuan Ge, Mingxin Yu, "An Efficient Convolutional Multi-Scale Vision Transformer for Image Classification", 2023 International Conference on Image Processing, Computer Vision and Machine Learning (ICICML), pp.344-347, 2023.
27.
Yizeng Han, Dongchen Han, Zeyu Liu, Yulin Wang, Xuran Pan, Yifan Pu, Chao Deng, Junlan Feng, Shiji Song, Gao Huang, "Dynamic Perceiver for Efficient Visual Recognition", 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp.5969-5979, 2023.
28.
Jiahao Zheng, Longqi Yang, Yiying Li, Ke Yang, Zhiyuan Wang, Jun Zhou, "Lightweight Vision Transformer with Spatial and Channel Enhanced Self-Attention", 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp.1484-1488, 2023.
29.
Yichen Yuan, Yifan Wang, Lijun Wang, Xiaoqi Zhao, Huchuan Lu, Yu Wang, Weibo Su, Lei Zhang, "Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation", 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp.966-976, 2023.
30.
Huilong Xie, Wenwei Song, Wenxiong Kang, "MSBA-Net: Multiscale Behavior Analysis Network for Random Hand Gesture Authentication", IEEE Transactions on Instrumentation and Measurement, vol.72, pp.1-13, 2023.

Cites in Papers - Other Publishers (33)

1.
Jiacheng Yang, "AFM-DViT: A framework for IoT-driven medical image analysis", Alexandria Engineering Journal, vol.113, pp.294, 2025.
2.
Matej Vitek, Vitomir Štruc, Peter Peer, "GazeNet: A lightweight multitask sclera feature extractor", Alexandria Engineering Journal, vol.112, pp.661, 2025.
3.
Tianping Li, Xiaolong Yang, Zhenyi Zhang, Zhaotong Cui, Zhou Maoxia, "Mix-layers semantic extraction and multi-scale aggregation transformer for semantic segmentation", Complex & Intelligent Systems, vol.11, no.1, 2025.
4.
Xuan-Thuy Vo, Duy-Linh Nguyen, Adri Priadana, Kang-Hyun Jo, "Efficient Vision Transformers with Partial Attention", Computer Vision – ECCV 2024, vol.15141, pp.298, 2025.
5.
Hyunwoo Yu, Yubin Cho, Beoungwoo Kang, Seunghun Moon, Kyeongbo Kong, Suk-Ju Kang, "Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation", Computer Vision – ECCV 2024, vol.15100, pp.92, 2025.
6.
Pan Li, Xiaofang Yuan, Haozhi Xu, Jinlei Wang, Yaonan Wang, "EMPViT: Efficient multi-path vision transformer for security risks detection in power distribution network", Neurocomputing, pp.128967, 2024.
7.
Zehan Tan, Weidong Yang, Zhiwei Zhang, "PyraBiNet: A Hybrid Semantic Segmentation Network Combining PVT and BiSeNet for Deformable Objects in Indoor Environments", Neural Information Processing, vol.1968, pp.552, 2024.
8.
Ji Zhang, Mingxin Yu, Wenshuai Lu, Yuxiang Dai, Huiyu Shi, Rui You, "A novel dual-granularity lightweight transformer for vision tasks", Intelligent Data Analysis, pp.1, 2024.
9.
Nannan Li, Yaran Chen, Dongbin Zhao, "Adaptive search for broad attention based vision transformers", Neurocomputing, pp.128696, 2024.
10.
Xinxin Zhang, Weisong Mu, "GMamba: State space model with convolution for Grape leaf disease segmentation", Computers and Electronics in Agriculture, vol.225, pp.109290, 2024.
11.
Mohamed A. Massoud, Mohamed E. El-Bouridy, Wael A. Ahmed, "Revolutionizing Alzheimer’s detection: an advanced telemedicine system integrating Internet-of-Things and convolutional neural networks", Neural Computing and Applications, 2024.
12.
Xiaomei Liao, Lirong He, Jiayou Mao, Meng Xu, "Spectral Superresolution Using Transformer with Convolutional Spectral Self-Attention", Remote Sensing, vol.16, no.10, pp.1688, 2024.
13.
Jun Chen, Yiping Huang, Ling Zhang, Guangzhen Si, Juzhen Wang, "CSCNN: Lightweight Modulation Recognition Model for Mobile Multimedia Intelligent Information Processing", Mobile Networks and Applications, 2024.
14.
Seung Il Lee, Kwanghyun Koo, Jong Ho Lee, Gilha Lee, Sangbeom Jeong, Seongjun O, Hyun Kim, "Vision transformer models for mobile/edge devices: a survey", Multimedia Systems, vol.30, no.2, 2024.
15.
Shiyan Cui, Bin Hui, "Dual-Dependency Attention Transformer for Fine-Grained Visual Classification", Sensors, vol.24, no.7, pp.2337, 2024.
16.
Deguang Chen, Jianrui Chen, Chaowei Fang, Zhichao Zhang, "Complex visual question answering based on uniform form and content", Applied Intelligence, 2024.
17.
Jiangning Zhang, Xiangtai Li, Yabiao Wang, Chengjie Wang, Yibo Yang, Yong Liu, Dacheng Tao, "EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm", International Journal of Computer Vision, 2024.
18.
Jianming Zhang, Zi Xing, Mingshuang Wu, Yan Gui, Bin Zheng, "Enhancing low-light images via skip cross-attention fusion and multi-scale lightweight transformer", Journal of Real-Time Image Processing, vol.21, no.2, 2024.
19.
Xinxin Zhang, Fei Li, Haiying Zheng, Weisong Mu, "UPFormer: U-sharped Perception lightweight Transformer for segmentation of field grape leaf diseases", Expert Systems with Applications, pp.123546, 2024.
20.
Jian Feng, Peng Wu, Renjie Xu, Xiaoming Zhang, Tao Wang, Xuan Li, "CSFNet: a compact and efficient convolution-transformer hybrid vision model", Multimedia Tools and Applications, 2024.
21.
Jiaoju Wang, Jiewen Luo, Jiehui Liang, Yangbo Cao, Jing Feng, Lingjie Tan, Zhengcheng Wang, Jingming Li, Alphonse Houssou Hounye, Muzhou Hou, Jinshen He, "Lightweight Attentive Graph Neural Network with Conditional Random Field for Diagnosis of Anterior Cruciate Ligament Tear", Journal of Imaging Informatics in Medicine, 2024.
22.
Gary Y. Li, Junyu Chen, Se-In Jang, Kuang Gong, Quanzheng Li, "SwinCross: Cross-modal Swin transformer for head-and-neck tumor segmentation in PET/CT images", Medical Physics, 2023.
23.
Kunyu Feng, Li Lun, Xiaofeng Wang, Xiaoxin Cui, "LRTransDet: A Real-Time SAR Ship-Detection Network with Lightweight ViT and Multi-Scale Feature Fusion", Remote Sensing, vol.15, no.22, pp.5309, 2023.
24.
Shengjun Liang, Mingxin Yu, Wenshuai Lu, Xinglong Ji, Xiongxin Tang, Xiaolin Liu, Rui You, "A lightweight vision transformer with symmetric modules for vision tasks", Intelligent Data Analysis, pp.1, 2023.
25.
Krishna Teja Chitty-Venkata, Sparsh Mittal, Murali Emani, Venkatram Vishwanath, Arun K. Somani, "A survey of techniques for optimizing transformer inference", Journal of Systems Architecture, vol.144, pp.102990, 2023.
26.
Yuanlun Xie, Wenhong Tian, Zitong Yu, "Robust facial expression recognition with Transformer Block Enhancement Module", Engineering Applications of Artificial Intelligence, vol.126, pp.106795, 2023.
27.
Jing Li, Xueping Luo, "A Study of Weather-Image Classification Combining VIT and a Dual Enhanced-Attention Module", Electronics, vol.12, no.5, pp.1213, 2023.
28.
Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Judy Hoffman, "Hydra Attention: Efficient Attention with Many Heads", Computer Vision – ECCV 2022 Workshops, vol.13807, pp.35, 2023.
29.
Pierluigi Carcagni, Marco Leo, Marco Del Coco, Cosimo Distante, Andrea De Salve, "Convolution Neural Networks and Self-Attention Learners for Alzheimer Dementia Diagnosis from Brain MRI", Sensors, vol.23, no.3, pp.1694, 2023.
30.
Liang Chen, Yuyi Yang, Zhenheng Wang, Jian Zhang, Shaowu Zhou, Lianghong Wu, "Underwater Target Detection Lightweight Algorithm Based on Multi-Scale Feature Fusion", Journal of Marine Science and Engineering, vol.11, no.2, pp.320, 2023.