I. Introduction
Medical image analysis plays a critical role in healthcare by providing automated solutions for the diagnosis and treatment of various medical conditions. According to [1], the AI in Healthcare market is expected to grow from around USD 14.6 billion to USD 102.7 billion by 2028, as the generation of vast and intricate healthcare datasets continues to grow. The primary objective of medical image analysis is to accurately detect and classify diseases from medical imaging data, ultimately improving patient outcomes and reducing the workload of medical professionals.

In this work, we focus on coronavirus disease 2019 (COVID-19), a highly infectious respiratory illness caused by the SARS-CoV-2 virus [2]. To improve the accuracy and efficiency of COVID-19 diagnosis, we propose ViTMed, a Vision Transformer model that classifies Computed Tomography (CT) scan images. Unlike traditional convolutional neural networks (CNNs), which process images with convolutions, ViTMed treats an image as a sequence of patches: each patch is linearly projected into an embedding, and the resulting sequence is processed by multi-head self-attention layers. This approach has been shown to outperform CNN-based approaches on some datasets [3].

The limited availability of medical images is a persistent challenge, which we mitigate with techniques such as data augmentation. Another challenge is capturing the salient features of the images for accurate classification; we therefore study various algorithms and models to identify the approach that yields the most accurate results. The contributions of this paper are summarized as follows: