Introduction
Malware is malicious software that is purposefully designed to affect computer systems [1]. It is used to infiltrate, attack, or gain access to digital resources, can be highly complex, and can cause loss or other undesirable outcomes in a system [2]. Its main aim is to cause damage and to attack resources that are not openly accessible. Approximately 360,000 new malware files are identified every day, and the number of files created daily has risen by 5.2% [3]. The rapid evolution and widespread dissemination of malware are made possible by automated and sophisticated malware creation tools [4]. Anti-virus software relies on two commonly used methods, behaviour-based and signature-based detection, to identify and classify malware [5]. Malware signatures are generally obtained from known malware through static analysis (without executing the malware) [6]. The signatures gathered from different malicious objects form a database that can be used for malware identification and classification [7]. Although signature-based detection is extremely fast and accurate, it can easily be evaded by using obfuscation techniques (for example, metamorphism, packing, encryption, and polymorphism) to produce a new variant [8].
Furthermore, the signature database, which typically relies on static analysis, has traditionally been updated and curated manually, a labour-intensive and time-consuming process [9]. In behaviour-based detection, by contrast, the behaviours of the malware are observed and recorded while the malicious code is executed under dynamic analysis [10]. This method can therefore recognize metamorphic and polymorphic viruses based on their behaviour. However, storing run-time behavioural patterns requires extensive resources [11]. Additionally, the behaviour dataset must be updated whenever a new malware type is identified. An alternative technique of great importance for malware classification relies on image processing [12]. The malware binary is transformed into an image, and a classifier uses the texture of that image to identify and categorize malware samples.
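The transformation of a malware binary into an image can be sketched as follows. This is a minimal illustration rather than the pipeline used in this study: the function name `bytes_to_grayscale`, the fixed row width, and the zero-padding of the last row are all assumptions made for the example.

```python
def bytes_to_grayscale(data: bytes, width: int = 4) -> list[list[int]]:
    """Reshape raw bytes into rows of `width` pixels (grayscale values 0-255)."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        # Assumption: pad the final short row with zeros (black pixels).
        row += [0] * (width - len(row))
        rows.append(row)
    return rows


# Hypothetical 6-byte input, for illustration only.
image = bytes_to_grayscale(bytes([77, 90, 144, 0, 3, 255]), width=4)
print(image)  # [[77, 90, 144, 0], [3, 255, 0, 0]]
```

Each byte maps directly to one pixel intensity, so byte patterns that recur within a malware family appear as similar textures in the resulting image.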
Various researchers have developed numerous innovative methods, a few of which build on earlier classification techniques [13]. Initially, shallow machine learning (ML) methods were applied to the classification of malware instances. Convolutional neural networks (CNNs) emerged in the 1990s [14]. They have since become widely used for computer vision (CV) and related tasks, particularly on images, speech, and time series [15]. They are also highly effective in object detection, identification, and classification problems. Motivated by the success of CNNs in classification problems, several studies have investigated the use of CNNs in malware classification [16]. A CNN is a deep neural network (DNN) that relies heavily on convolution operations and comprises layers such as convolution, pooling, and fully connected (FC) layers, which are designed to learn spatial hierarchies of features automatically and adaptively through backpropagation (BP) [17]. The CNN model automates feature extraction and combines the learned features with the input data, creating a robust tool for classifying images or data with a large number of features [18].
This study introduces a new Snake Optimization Algorithm with Deep Convolutional Neural Network for Image-Based Malware Classification (SODCNN-IMC) technique. In the SODCNN-IMC model, the ShuffleNet technique is applied for the effective derivation of feature vectors. In addition, the SO algorithm is used to improve the choice of hyperparameters in the ShuffleNet architecture. For the detection and classification of malware images, an attention-based bi-directional long short-term memory (ABiLSTM) approach is used. The performance of the SODCNN-IMC technique is validated using the Malimg malware dataset. The experimental values indicate that the SODCNN-IMC methodology outperforms other methods across diverse evaluation measures. In short, the key contributions of the study are as follows:
Design a robust malware detection method, SODCNN-IMC, that combines the features extracted by ShuffleNet, hyperparameters fine-tuned by Snake Optimization, and the attention module of ABiLSTM.
Utilize ShuffleNet for effective feature extraction from images, which considerably decreases computational complexity while maintaining superior performance, making it suitable for resource-constrained environments.
Develop the SO technique to automatically fine-tune the hyperparameters of the ShuffleNet model by effectively exploring the hyperparameter space, which enables the model to adapt and enhance its performance on various datasets.
Apply BiLSTM layers to capture long-range dependencies and focus on significant parts of the input images. The attention module improves the model’s capability to distinguish between benign and malicious features, enhancing the overall detection accuracy.
Hyperparameter optimization of the ABiLSTM algorithm using the SO technique with cross-validation helps to boost the predictive performance of the presented technique on unseen data.
The rest of the paper is organized as follows. Section II reviews the related works and Section III presents the proposed model. Section IV then gives the result analysis, and Section V concludes the paper.
Related Works
Zhao et al. [19] presented a static malware identification method based on the AlexNet CNN model. Unlike existing solutions, the method converts the bytes of each malware sample into colour images. It also provides an enhanced AlexNet framework and addresses imbalanced datasets with a data augmentation algorithm. In [20], a new visual malware recognition architecture based on DNNs was developed. First, executable file samples were collected and converted into bytes and asm files using disassembly. Next, visualization combined with data augmentation was used to convert the samples into 3-channel RGB images. Lastly, a DNN model was introduced, namely SEResNet50 + Bi-LSTM + Attention (SERLA). Chaganti, Ravi and Pham [21] proposed an efficient NN architecture, EfficientNetB1, using byte-level image representations of malware. To reduce the computational resources consumed by training and testing different CNN-based deep learning (DL) techniques, the authors assessed the task performance and computational efficiency of various pre-trained CNN architectures in order to choose the best CNN model for classifying malware.
In [22], a new DeepGray technique was presented to classify multi-class malware using grayscale images and DL. The method converts executable files into a format suitable for DL. In the data preprocessing stage, Principal Component Analysis (PCA) was applied. ViT, VGG16, EfficientNet, and modified CNN frameworks were employed for classification. Bakour and Ünver [23] proposed a hybrid DL algorithm named DeepVisDroid. Two kinds of image-based features, global and local, are extracted. Subsequently, an NN architecture based on 1D convolutional layers was designed and trained. Additionally, two standard NN frameworks based on 2D convolutional layers were developed, and two well-known DL methods were examined.
In [24], a lightweight malware classification method named IMCLNet was introduced, which operates on malware images without requiring domain knowledge or feature engineering. In developing the architecture, the method balanced accuracy, number of parameters, and computational cost, and incorporated Coordinate Attention, Global Context Embedding, and Depth-wise Separable Convolution.
Copiaco et al. [25] designed a new multi-functional malware recognition architecture. The method explores numerous pre-trained networks, including traditional and compact networks with series and directed acyclic graph structures, for classifying malware. The algorithm uses grayscale transform-based features as a consistent representation, enabling malware classification across different file categories. The technique incorporates several databases into the training model. In [26], an innovative method named Jadeite was presented that employs image-based DL classification. Specifically, Jadeite extracts the Inter-procedural Control Flow Graph (ICFG) from a given Java bytecode file, prunes the ICFG, and transforms it into an adjacency matrix. The architecture leverages an object identification technique in a DCNN model to detect maliciousness.
In [27], a multi-headed attention-based technique is combined with a CNN to locate and classify the small infected regions in the complete image. The performance of the proposed multi-headed attention-based CNN technique was compared with numerous non-attention CNN-based techniques on several training and testing splits of a benchmark malware image dataset. Chaganti et al. [28] proposed a DL-based CNN method for malware identification on Portable Executable (PE) binary files using a fusion feature set model. They presented a broad performance assessment of numerous DL architectures and ML classifiers, such as the Support Vector Machine (SVM), on multi-aspect feature sets covering dynamic, static, and image features in order to select the proposed CNN method. Reilly et al. [4] explored the effectiveness of training DL methods with Generative Adversarial Network-generated data to improve their robustness against such attacks. Ben Abdel Ouahab et al. [29] developed and tested a malware classifier capable of assigning every input malware sample to its corresponding family. To do so, they used the multi-layer perceptron technique with a malware visualization model.
The Proposed Method
In this research, we have developed an innovative SODCNN-IMC system. The foremost goal of the SODCNN-IMC model is to apply a hyperparameter-tuned DL method for recognizing and categorizing malware images. The SODCNN-IMC technique contains three major processes, namely ShuffleNet-based feature extraction, SO-based parameter tuning, and ABiLSTM-based classification process. Fig. 1 demonstrates the complete procedure of the SODCNN-IMC system.
A. Feature Extraction
The ShuffleNet architecture is applied for the effective derivation of feature vectors. ShuffleNet is an extremely efficient DL model designed for mobile devices [30]. Given the available computation resources (hardware), we applied the pre-trained ShuffleNet V1 model to obtain the best outcome at a lower computation cost. The network is deeper than a typical CNN, with 50 learnable layers: one FC layer, one convolutional (Conv) layer, and 48 group Conv layers. In total, the model has 172 layers, including 49 batch normalization (BN) layers, a classification layer, one max pooling layer, four average pooling layers, a softmax layer, and 33 ReLU layers.
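Group convolutions process each channel group separately, so ShuffleNet interleaves channels between groups with a channel shuffle operation. The sketch below shows that reordering on a plain list of channel labels; the function name `channel_shuffle` and the toy labels are assumptions for illustration, and a real implementation would operate on tensors.

```python
def channel_shuffle(channels: list, groups: int) -> list:
    """Interleave channels across groups: reshape (g, n) -> transpose -> flatten."""
    if groups <= 0 or len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    # Output position k * groups + g takes the k-th channel of group g.
    return [channels[g * per_group + k]
            for k in range(per_group)
            for g in range(groups)]


# Two groups (a and b) of three channels each.
shuffled = channel_shuffle(["a0", "a1", "a2", "b0", "b1", "b2"], groups=2)
print(shuffled)  # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```

After the shuffle, every group in the next group convolution receives channels that originated in different groups, which lets information flow across groups at no extra arithmetic cost.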
The primary layer is the input layer and accepts an image size of \begin{align*} s\left ({{ i,j }}\right)& =\left ({{ I\times K }}\right)\left ({{ i,j }}\right) \\ & =\sum \limits _{n}^{} \sum \limits _{m}^{} I \left ({{ m,n }}\right)K\left ({{ i-m,j-n }}\right) \tag {1}\end{align*}
In Eq. (1), s denotes the output feature map, I indicates the input image, and K denotes the convolution kernel.
The ShuffleNet architecture with stride (shift) of \begin{align*} f\left ({{ x }}\right)= \begin{cases} \displaystyle 0, & x\lt 0 \\ \displaystyle x, & x\ge 0 \end{cases} \tag {2}\end{align*}
ReLU passes positive values through unchanged and sets negative values to 0. The architecture has a 3-by-3 average pooling on the shortcut path. It has 16 consecutive ShuffleNet units. It also has 50 layers, all of which contain trained feature maps [32] and perform feature extraction. Softmax activation is employed to compute the class probabilities used by the final classification layer. The output of the FC layer is given by Eq. (3):\begin{equation*} a_{i}=\sum \limits _{j=0}^{m\times n-1} w_{ij} \times x_{j}+b_{i} \tag {3}\end{equation*}
In Eq. (3), b denotes the bias, w represents the weights, and i refers to the index of the FC layer output.
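Eqs. (1) to (3) can be checked numerically with a small pure-Python sketch. It is illustrative only: the function names, the zero-padded "full" convolution boundary handling, and the toy image, kernel, and weights are assumptions, and the input term in Eq. (3) is read as the j-th element of the flattened input.

```python
def conv2d_full(image, kernel):
    """Eq. (1): s(i, j) = sum_m sum_n I(m, n) * K(i - m, j - n), zero outside bounds."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (w + kw - 1) for _ in range(h + kh - 1)]
    for m in range(h):
        for n in range(w):
            for p in range(kh):
                for q in range(kw):
                    # With p = i - m and q = j - n, the product lands at (m + p, n + q).
                    out[m + p][n + q] += image[m][n] * kernel[p][q]
    return out


def relu(x):
    """Eq. (2): 0 for negative inputs, identity otherwise."""
    return x if x >= 0 else 0.0


def fully_connected(x, weights, bias):
    """Eq. (3): a_i = sum_j w_ij * x_j + b_i for a flattened input x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weights, bias)]


feature_map = conv2d_full([[1, 2], [3, 4]], [[1, 0], [0, -1]])
activated = [relu(v) for row in feature_map for v in row]   # flatten after ReLU
output = fully_connected(activated, weights=[[1.0] * 9], bias=[0.5])
print(feature_map)  # [[1.0, 2.0, 0.0], [3.0, 3.0, -2.0], [0.0, -3.0, -4.0]]
print(output)       # [9.5]
```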
B. Hyperparameter Tuning Using SO Model
In addition, the SO algorithm is used to select the optimum hyperparameters for the ShuffleNet architecture. SO was developed to handle a diverse set of optimization problems by mimicking the distinctive mating behaviour of snakes [33]. All snakes (male and female) compete for the best partner when the available quantity of food is sufficient and the temperature is low. Like other swarm-based methods, SO relies on two main phases: global search (exploration) and local search (exploitation). When food is scarce, snakes in the exploration stage spread out across the search region. The exploitation phase, in turn, is broken down into several smaller phases. This part explains the mathematical model of the SO algorithm through the following steps:
Initialize the solutions: SO begins with a set of random solutions generated in the search space using Eq. (4), where R is a random number in [0, 1] and S_min and S_max are the lower and upper bounds of the search space. These solutions form the snake population, which is improved in the following steps:\begin{equation*} S_{i}=S_{\min }+R\times \left ({{ S_{\max }-S_{\min } }}\right) \tag {4}\end{equation*}
Snake population separation: Using Eqs. (5) and (6), the population is divided equally into two groups, 50 per cent male and 50 per cent female:\begin{align*} N_{m}& =\frac {N}{2} \tag {5}\\ N_{f}& \approx N-N_{m} \tag {6}\end{align*}
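The initialization and separation steps in Eqs. (4) to (6) can be sketched as follows. The function names, the population size, the two-dimensional search space, and the bounds are hypothetical values chosen for the example; integer division is used for N/2 so that odd population sizes are handled.

```python
import random


def init_population(n, dim, s_min, s_max, rng):
    """Eq. (4): S_i = S_min + R * (S_max - S_min), with R drawn uniformly from [0, 1)."""
    return [[s_min + rng.random() * (s_max - s_min) for _ in range(dim)]
            for _ in range(n)]


def split_population(population):
    """Eqs. (5)-(6): N_m = N / 2 (integer division here), N_f = N - N_m."""
    n_m = len(population) // 2
    return population[:n_m], population[n_m:]


rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
snakes = init_population(n=7, dim=2, s_min=-1.0, s_max=1.0, rng=rng)
males, females = split_population(snakes)
print(len(males), len(females))  # 3 4
```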
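Evaluation of snakes: Identify the best snake in each of the male and female groups. The temperature T and the food quantity Q are then computed using Eqs. (7) and (8), where c is the current iteration, C is the maximum number of iterations, and k_1 is a constant:\begin{equation*} T=e^{\frac {-c}{C}} \tag {7}\end{equation*}
\begin{equation*} Q=k_{1}\times \exp \left ( \frac {c-C}{C} \right ) \tag {8}\end{equation*}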
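The behaviour of the temperature and food quantity over the run can be illustrated with a short sketch. It takes Eq. (8) in the exponential form Q = k_1 exp((c - C)/C) used in the original SO formulation; the value k_1 = 0.5 and the iteration budget of 100 are assumptions, not settings reported in this study.

```python
import math


def temperature(c: int, C: int) -> float:
    """Eq. (7): T = exp(-c / C), with c the current iteration and C the maximum."""
    return math.exp(-c / C)


def food_quantity(c: int, C: int, k1: float = 0.5) -> float:
    """Eq. (8), read as Q = k1 * exp((c - C) / C); k1 = 0.5 is an assumed constant."""
    return k1 * math.exp((c - C) / C)


C = 100  # hypothetical iteration budget
for c in (1, 50, 100):
    print(c, round(temperature(c, C), 3), round(food_quantity(c, C), 3))
```

Under these assumptions T decays from about 1 towards exp(-1) while Q grows towards k_1, so Q stays below the 0.25 threshold for roughly the first 30% of the iterations and the search moves from exploration to exploitation afterwards.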
Exploring the search region (food is not found): This phase depends on a selected threshold value. When Q is below 0.25, the snakes update their positions relative to a randomly chosen position in order to search globally, as defined in Eqs. (9) to (12).\begin{align*} S_{mi}\left ({{ c+1 }}\right)& =S_{mR}\left ({{ c }}\right)\pm K_{2}\times AB_{m} \\ & \qquad \qquad \quad \times \left ({{ \left ({{ S_{\max }-S_{\min } }}\right)\times R+S_{\min } }}\right) \tag {9}\end{align*}
\begin{equation*} AB_{m}=e^{\left ({{ -F_{mR} }}\right)\mathrm {/(}F_{mi})} \tag {10}\end{equation*}
\begin{align*} S_{fi}\left ({{ c+1 }}\right)& =S_{fR}\left ({{ c }}\right)\pm \mathrm { }K_{2}\times AB_{f} \\ & \qquad \quad \times \left ({{ \left ({{ S_{\max }-S_{\min } }}\right)\mathrm {\times }R+S_{\min } }}\right) \tag {11}\end{align*}
\begin{equation*} AB_{f}=e^{\left ({{ -F_{fR} }}\right)\mathrm {/(}F_{fi})} \tag {12}\end{equation*}
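The exploration update in Eqs. (9) to (12) has the same form for the male and female groups, so one function can serve both. The sketch below is a hedged illustration: the constant K_2 = 0.05, the small epsilon guarding the division, the random choice of sign, and the clipping to the search bounds are assumptions that the text does not specify.

```python
import math
import random


def explore_step(group, fitness, s_min, s_max, rng, k2=0.05, eps=1e-12):
    """One exploration update for a male or female group, following Eqs. (9)-(12)."""
    new_group = []
    for i in range(len(group)):
        r_idx = rng.randrange(len(group))            # index of a random snake S_R
        # Eq. (10)/(12): ability to find food, AB = exp(-F_R / F_i).
        ability = math.exp(-fitness[r_idx] / (fitness[i] + eps))
        sign = rng.choice((-1.0, 1.0))               # the +/- in Eq. (9)/(11)
        new_pos = []
        for x_r in group[r_idx]:
            step = k2 * ability * ((s_max - s_min) * rng.random() + s_min)
            # Assumption: clip to the search bounds after the move.
            new_pos.append(min(s_max, max(s_min, x_r + sign * step)))
        new_group.append(new_pos)
    return new_group


rng = random.Random(1)
males = [[0.2, -0.4], [0.9, 0.1], [-0.7, 0.5]]       # hypothetical positions
male_fitness = [12.0, 5.0, 30.0]                     # hypothetical error rates (%)
moved = explore_step(males, male_fitness, -1.0, 1.0, rng)
print(moved)
```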
Exploiting the search region (food is found): When the quantity of food exceeds the pre-determined threshold, the temperature is checked. When the temperature is high, the snakes move only towards the food, following Eq. (13):\begin{align*} S_{\left ({{ i,j }}\right)}\left ({{ c+1 }}\right)=L_{food}\pm K_{3}\times T\times R\times \left ({{ L_{food}-S_{\left ({{ i,j }}\right)}\left ({{ c }}\right) }}\right) \tag {13}\end{align*}
The snakes are in fighting or mating mode when the temperature is low. Eqs. (14) to (17) describe the fighting mode, and Eqs. (18) to (21) describe the mating mode:\begin{align*} S_{mi}\left ({{ c+1 }}\right)=S_{mi}\left ({{ c }}\right)\pm \mathrm { }K_{3}\times FA_{m}\times R\times \left ({{ S_{fbest}-S_{mi}\left ({{ c }}\right) }}\right) \tag {14}\end{align*}
\begin{align*} S_{fi}\left ({{ c+1 }}\right)=S_{fi}\left ({{ c }}\right)\pm \mathrm { }K_{3}\times FA_{f}\times R\times \left ({{ S_{mbest}-S_{fi}\left ({{ c }}\right) }}\right) \tag {15}\end{align*}
\begin{align*} FA_{m}& =e^{\mathrm {(-}F_{fbest}\mathrm {)/(}F_{i})} \tag {16}\\ FA_{f}& =e^{\mathrm {(-}F_{mbest}\mathrm {)/(}F_{i})} \tag {17}\end{align*}
\begin{align*} S_{mi}\left ({{ c+1 }}\right)& =S_{mi}\left ({{ c }}\right)\pm \mathrm { }K_{3}\times MA_{m}\times R \\ & \quad \times \left ({{ Q\times S_{fi}\left ({{ c }}\right)-S_{mi}\left ({{ c }}\right) }}\right) \tag {18}\\ S_{fi}\left ({{ c+1 }}\right)& =S_{fi}\left ({{ c }}\right)\pm \mathrm { }K_{3}\times MA_{f}\times R \\ & \quad \times \left ({{ Q\times S_{mi}\left ({{ c }}\right)-S_{fi}\left ({{ c }}\right) }}\right) \tag {19}\end{align*}
\begin{align*} MA_{m}& =e^{\mathrm {(-}F_{fi}\mathrm {)/(}F_{mi})} \tag {20}\\ MA_{f}& =e^{\mathrm {(-}F_{mi}\mathrm {)/(}F_{fi})} \tag {21}\end{align*}
If an egg hatches, select the worst male and female snakes and replace them:\begin{align*} S_{mworst}& =S_{\min }+R\times \left ({{ S_{\max }-S_{\min } }}\right) \tag {22}\\ S_{fworst}& =S_{\min }+R\times \left ({{ S_{\max }-S_{\min } }}\right) \tag {23}\end{align*}
The SO technique derives a fitness function to attain improved classification performance. It assigns a positive value to indicate the quality of each candidate solution. In this study, the classification error rate, which is to be minimized, is used as the fitness function, as specified in Eq. (24).\begin{align*} fitness\left ({{ x_{i} }}\right)& =ClassifierErrorRate\left ({{ x_{i} }}\right) \\ & =\frac {No\mathrm {.}of~misclassified~instances}{Total~no\mathrm {.}of~instances} \ast 100 \tag {24}\end{align*}
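The exploitation updates in Eqs. (13) to (21) can be sketched as three small functions. This is an illustration under stated assumptions: K_3 = 2.0 is an assumed constant, the sign is chosen at random, fitness values are assumed to be strictly positive, and the logic that selects between the modes is omitted because its thresholds are not given in the text.

```python
import math
import random

K3 = 2.0  # assumed constant; its value is not stated in the text


def move_to_food(pos, food, temp, rng):
    """Eq. (13): S(c+1) = L_food +/- K3 * T * R * (L_food - S(c))."""
    sign = rng.choice((-1.0, 1.0))
    return [f + sign * K3 * temp * rng.random() * (f - x) for x, f in zip(pos, food)]


def fight(pos, fit, rival_best, rival_best_fit, rng):
    """Eqs. (14)-(17): move relative to the best snake of the opposite group."""
    ability = math.exp(-rival_best_fit / fit)        # FA in Eqs. (16)-(17)
    sign = rng.choice((-1.0, 1.0))
    return [x + sign * K3 * ability * rng.random() * (b - x)
            for x, b in zip(pos, rival_best)]


def mate(pos, fit, partner, partner_fit, food_q, rng):
    """Eqs. (18)-(21): move relative to the paired snake, scaled by Q."""
    ability = math.exp(-partner_fit / fit)           # MA in Eqs. (20)-(21)
    sign = rng.choice((-1.0, 1.0))
    return [x + sign * K3 * ability * rng.random() * (food_q * p - x)
            for x, p in zip(pos, partner)]


rng = random.Random(0)
print(move_to_food([0.2, 0.8], [0.5, 0.5], temp=0.7, rng=rng))
```

A snake that already sits on the target (the food position, or the rival's position in fighting mode) does not move, because the difference term in each update is zero.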
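The fitness in Eq. (24) is a plain error percentage, as the following sketch shows. The function name and the toy labels are hypothetical; in the tuning loop the predictions would come from a model trained with the candidate hyperparameters.

```python
def classifier_error_rate(y_true, y_pred):
    """Eq. (24): misclassified instances / total instances * 100."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true) * 100


labels = [0, 1, 1, 2]   # hypothetical ground-truth classes
preds = [0, 1, 0, 2]    # hypothetical predictions: one mistake out of four
print(classifier_error_rate(labels, preds))  # 25.0
```

Because SO minimizes this value, a candidate with a lower error rate is treated as the better snake.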
C. ABiLSTM-Based Classification
Finally, the SODCNN-IMC approach employs the ABiLSTM model for classification. A deep neural network (DNN) such as LSTM or Bi-LSTM is introduced for extracting temporal features between network packets, since network traffic is a continuous flow of sequence data in packet bytes [36]. LSTM is a variant of the recurrent neural network (RNN) designed to address the exploding and vanishing gradient problems. Compared with a plain RNN, LSTM networks are significantly more effective at detecting long-term dependencies within sequence data, making them a strong choice for extracting temporal features from network traffic data. The LSTM architecture has forget, input, and output gates. Fig. 2 shows the framework of the ABiLSTM model.
A forget gate controls how much data is retained or discarded. The data from the previous hidden layers (HLs) and the current input are passed through the sigmoid function, which produces values between 0 and 1. The closer the value is to 1, the more likely the information is to be remembered; the closer it is to 0, the more likely it is to be forgotten. Eq. (25) evaluates the forget vector, where the parameters W_f and b_f are the weight matrix and bias of the forget gate, h_{t-1} is the previous hidden state, and x_t is the current input:\begin{equation*} f_{t}=\sigma \left ({{ W_{f}\mathrm {.}\left [{{ h_{t-1},x_{t} }}\right ]+b_{f} }}\right) \tag {25}\end{equation*}
The input gate evaluates the value of new information to be added to the long-term memory. It controls how much of the current input is written to the cell state:\begin{equation*} i_{t}=\sigma \left ({{ W_{i}\cdot \left [{{ h_{t-1},x_{t} }}\right ]+b_{i} }}\right) \tag {26}\end{equation*}
The forget gate and the input gate together update the cell state: Eq. (27) computes the candidate state, and Eq. (28) combines it with the previous cell state:\begin{align*} \tilde {C}_{t}& =tanh\left ({{ W_{c}\mathrm {.}\left [{{ h_{t-1},x_{t} }}\right ]+b_{c} }}\right) \tag {27}\\ C_{t}& =f_{t}\ast C_{t-1}+i_{t}\ast \tilde {C}_{t} \tag {28}\end{align*}
The output gate in LSTM determines which part of the long-term memory is passed to the output: \begin{align*} o_{t}& =\sigma \left ({{ W_{o}\cdot \left [{{ h_{t-1},x_{t} }}\right ]+b_{o} }}\right) \tag {29}\\ h_{t}& =o_{t}\ast tanh\left ({{ C_{t} }}\right) \tag {30}\end{align*}
BiLSTM is a sequence processing model composed of two LSTMs, one running in the forward direction and one in the reverse direction [38]. The LSTM network is used to address the long-term dependency problem. Owing to its architecture, LSTM can memorize data over time and learn long-term information. This study uses the BiLSTM model rather than a single LSTM because of its higher prediction accuracy. The attention module focuses on the data generated by the HL of the BiLSTM.
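A single LSTM step following Eqs. (25) to (30) can be written out in pure Python. This is a sketch for checking the gate arithmetic: the helper names, the dictionary layout of the parameters, and the all-zero toy weights are assumptions, not the trained model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W . v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step following Eqs. (25)-(30); params maps gate name -> (W, b)."""
    z = h_prev + x_t                                             # [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(*params["f"], z)]          # Eq. (25)
    i_t = [sigmoid(a) for a in affine(*params["i"], z)]          # Eq. (26)
    c_hat = [math.tanh(a) for a in affine(*params["c"], z)]      # Eq. (27)
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]       # Eq. (28)
    o_t = [sigmoid(a) for a in affine(*params["o"], z)]          # Eq. (29)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]           # Eq. (30)
    return h_t, c_t


# Toy setting: hidden size 1, input size 2, all weights and biases zero.
zero = ([[0.0, 0.0, 0.0]], [0.0])
params = {"f": zero, "i": zero, "c": zero, "o": zero}
h_t, c_t = lstm_step(x_t=[0.3, -0.2], h_prev=[0.0], c_prev=[1.0], params=params)
print(c_t)  # [0.5]
```

With zero weights every gate outputs 0.5 and the candidate state is 0, so the cell state is simply halved, which makes the role of the forget gate in Eq. (28) easy to see.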
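The text does not give the attention formula, so the sketch below uses one common choice as an assumption: a dot-product score per time step, a softmax over the scores, and a weighted sum of the concatenated forward and backward hidden states. The function names, the scoring vector, and the toy states are all hypothetical.

```python
import math


def bidirectional_concat(forward, backward):
    """Pair each forward state with the backward state of the same time step."""
    return [f + b for f, b in zip(forward, backward)]


def attention_pool(hidden_states, score_vector):
    """Softmax-weighted sum of per-step hidden states (one common attention form)."""
    scores = [sum(w * h for w, h in zip(score_vector, h_t)) for h_t in hidden_states]
    top = max(scores)                                 # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [sum(a * h_t[k] for a, h_t in zip(weights, hidden_states))
               for k in range(dim)]
    return context, weights


# Two time steps; backward states are assumed to be aligned by time step already.
states = bidirectional_concat([[1.0], [2.0]], [[0.0], [1.0]])
context, weights = attention_pool(states, score_vector=[0.0, 0.0])
print(weights)  # [0.5, 0.5]
print(context)  # [1.5, 0.5]
```

With a zero scoring vector every step receives equal weight; a learned scoring vector would instead shift the weights towards the time steps that matter most for the class decision.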
Performance Validation
The experimental evaluation of the SODCNN-IMC system is carried out on the Malimg malware database [39]. The database encompasses 1709 samples under ten classes, as described in Table 1. The Malimg dataset is a commonly used benchmark in computer security and malware analysis. It includes a collection of grayscale images representing different kinds of malware samples collected from the wild. Every image in the dataset corresponds to a specific malware sample and is resized to a uniform size for consistency in the study. The dataset offers a valuable resource for practitioners and researchers to develop and assess image-based malware recognition models and approaches. With its diverse range of malware samples and families, the Malimg dataset facilitates the study of numerous aspects of image-based malware analysis, including feature extraction, detection, and classification.
Fig. 3 displays the classifier performance of the SODCNN-IMC system on the test dataset. Figs. 3a and 3b present the confusion matrices obtained by the SODCNN-IMC method at an 80:20 split of training phase (TRPH) and testing phase (TSPH). The figure shows that the SODCNN-IMC technique identifies and categorizes samples across all ten classes. Next, Fig. 3c shows the precision-recall (PR) curve of the SODCNN-IMC system, indicating that the technique achieves high PR performance. Lastly, Fig. 3d shows the ROC analysis of the SODCNN-IMC methodology. This result indicates that the SODCNN-IMC algorithm offers effective outcomes with high ROC values across the classes.
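Per-class precision and recall, as well as overall accuracy, follow directly from a confusion matrix. The sketch below uses a hypothetical 3-class matrix rather than the values in Fig. 3, and assumes that rows hold the true classes and columns the predicted classes.

```python
def per_class_metrics(cm):
    """Precision and recall per class from a square confusion matrix
    (rows = true class, columns = predicted class)."""
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[r][k] for r in range(n))   # column sum
        actual = sum(cm[k])                           # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        metrics.append((precision, recall))
    return metrics


def accuracy(cm):
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


# Hypothetical 3-class confusion matrix, for illustration only.
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
print(accuracy(cm))           # 0.9
print(per_class_metrics(cm))
```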
The malware classification performance of the SODCNN-IMC technique on the applied dataset is reported in Table 2 and Fig. 4. These results show that the SODCNN-IMC technique performs well across all 10 class labels. With 80% of TRPH, the SODCNN-IMC technique gives an average
Fig. 6 provides a summary of the training (TR) and testing (TS) loss values of the SODCNN-IMC technique over successive epochs. The TR loss decreases steadily as the model updates its weights to reduce classification errors on the datasets. The loss curves illustrate how well the model fits the TR data and indicate that it captures patterns effectively in both datasets. Also noteworthy is the continuous refinement of parameters in the SODCNN-IMC methodology, which aims to reduce the discrepancy between predictions and actual TR labels.
An extensive comparative study of the SODCNN-IMC method is provided with recent approaches [40] in Table 3. Fig. 7 represents a comparison result of the SODCNN-IMC approach in respect of
Fig. 8 illustrates a comparison of the SODCNN-IMC technique with other methods in terms of computation time (CT). The results show that the SODCNN-IMC algorithm achieves better performance with lower CT values. The SODCNN-IMC technique provides a reduced CT of 1.41s, while the GoogleNet, ResNET, VGG16+ SVM, SSPNet1, Multi-Objective learning, CapsNet, Kernel-based ELM, and CNN with VGG16 techniques obtain higher CT values of 5.01s, 4.89s, 4.94s, 3.30s, 3.85s, 5.01s, 2.85s, and 2.97s, respectively.
The SODCNN-IMC method performs well in malware image classification because it combines recent methods chosen specifically for this task. ShuffleNet-based feature extraction offers a solid framework for efficiently capturing relevant image features and economizing computational resources without sacrificing classification accuracy. By leveraging the efficient channel shuffling operations and hierarchical feature extraction abilities of ShuffleNet, the method recognizes intricate patterns in the malware images, improving its discriminative power. In addition, the SO-based hyperparameter tuning process allows fine-grained optimization of model parameters, adjusting dynamically to the difficulties and nuances present in the dataset. This adaptive tuning improves the method’s flexibility and generalization ability, as it can yield better solutions across various malware samples. Furthermore, the ABiLSTM classifier enables the model to capture temporal dependencies and contextual information in the sequence of extracted features, further refining the classification decisions. Overall, the SODCNN-IMC method integrates effective feature extraction, adaptive hyperparameter tuning, and refined sequence modelling, which leads to higher performance than the other methods.
Conclusion
In this research, we have presented a new SODCNN-IMC methodology. The core aim of the SODCNN-IMC methodology is to apply a hyperparameter-tuned DL method for the detection and classification of malware images. The SODCNN-IMC model contains three main processes: ShuffleNet-based feature extraction, SO-based parameter tuning, and ABiLSTM-based classification. The SO-based hyperparameter tuning process helps improve the overall recognition rate of the proposed model. The experimental evaluation of the SODCNN-IMC technique on the Malimg malware dataset demonstrates superior performance over other models, with a maximum accuracy of 98.42%. Therefore, the proposed model provides a promising avenue to enhance cybersecurity measures.
While the SODCNN-IMC method provides major advances in image-based malware recognition, it is important to recognize its limitations. First, in spite of the efficiency gains obtained through ShuffleNet and SO, the method may still face challenges in processing higher-resolution images or larger-scale datasets owing to inherent computational limits. Moreover, although the ABiLSTM structure improves feature discrimination and representation, it may struggle to capture very complex spatial relationships or subtle patterns within images, possibly leading to misclassification or false negatives. Finally, the effectiveness of the SO model depends heavily on the quality and representativeness of the initial parameter configuration, which may not always guarantee optimum performance across all datasets or conditions.
Future work can focus on combining advanced methods such as transfer learning (TL) and ensemble learning to improve the robustness and generalisation of these classification methods. Additionally, as cyber threats continue to evolve, there may be a growing focus on real-time recognition and response mechanisms that leverage image-based malware classification, allowing proactive defence against emerging threats.