Butterfly construction-based Vision Transformer (B_ViT) using Global-Local Attention for Visualization-based Malware Classification.
Abstract:
In recent studies, convolutional neural networks (CNNs) have mostly been used as dynamic techniques for visualization-based malware classification and detection. Although the vision transformer (ViT) has proved its efficiency in image classification, only a few earlier studies have developed a ViT-based malware classifier. This paper proposes a butterfly construction-based vision transformer (B_ViT) model for visualization-based malware classification and detection. B_ViT has four phases: (1) image partitioning and patch embedding; (2) local attention; (3) global attention; and (4) training and malware classification. B_ViT is an enhanced ViT architecture that supports parallel processing of image patches and captures both local and global spatial representations of malware images. B_ViT is a transfer learning-based model that uses a ViT model pre-trained on the ImageNet dataset to initialize the training parameters of the transformers. Four B_ViT variants are evaluated on grayscale malware images collected from the MalImg and Microsoft BIG datasets or converted from portable executable imports. The experiments show that the B_ViT variants outperform the Input Enhanced vision transformer (IEViT) and ViT variants, achieving accuracies of 99.49% and 99.99% for malware classification and detection, respectively. The experiments also show that B_ViT is time-effective for malware classification and detection: the average speed-ups of the B_ViT variants over the IEViT and ViT variants are 2.42 and 1.81, respectively. The analysis demonstrates the efficiency of texture-based malware detection as well as the resilience of B_ViT to polymorphic obfuscation. Finally, the proposed B_ViT-based malware classifier also outperforms CNN-based malware classification methods.
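As a rough illustration of phase (1), the sketch below splits a grayscale image into non-overlapping square patches and flattens each one, which is the usual first step before patches are linearly projected into embeddings in a ViT-style model. It is not the authors' implementation: the abstract does not state the patch size, image resolution, or how patches are grouped for local and global attention, so the function name `partition_into_patches`, the toy 4x4 image, and the 2x2 patch size are all assumptions chosen for readability.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel intensities (0-255)


def partition_into_patches(image: Image, patch_size: int) -> List[List[int]]:
    """Split an H x W image into non-overlapping patch_size x patch_size patches.

    Each patch is flattened row by row, and patches are returned in
    row-major order (left to right, top to bottom).
    Hypothetical helper for illustration only.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten the pixels of one patch into a single vector.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4x4 image with pixel values 0..15, split into four 2x2 patches
# (sizes are assumed, not taken from the paper).
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
toy_patches = partition_into_patches(toy_image, patch_size=2)
```

In a full model, each flattened patch would then be multiplied by a learned projection matrix and combined with a position embedding before entering the attention phases; those steps are omitted here because the abstract gives no details about them.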
Published in: IEEE Access (Volume: 11)
- IEEE Keywords
- Index Terms
- Vision Transformer
- Convolutional Neural Network
- Image Classification
- Detection Model
- Image Patches
- Transformer Model
- Image Representation
- Spatial Representation
- Global Attention
- ImageNet Dataset
- Global Representation
- Local Attention
- Local Features
- Input Image
- Global Features
- Detection Approach
- Grayscale Images
- Convolutional Neural Network Model
- Transformer Encoder
- Convolutional Neural Network Architecture
- Spatial Information Of Images
- Linear Projection
- Local Position
- Ransomware
- Number Of Heads
- Running Costs
- Phase Step
- Malicious Activities