
Enhanced Image-Based Malware Classification Using Snake Optimization Algorithm With Deep Convolutional Neural Network




Abstract:

Malware is malicious software intended to cause damage to computer systems. In recent times, a significant proliferation of malware used for illegal and malicious purposes has been recorded. Several machine learning and deep learning methods are widely used for the detection and classification of malware. Image-based malware detection uses machine learning and computer vision models to analyze visual representations of malware, such as binary images or screenshots, in order to detect malicious behavior. This technique has the potential to identify previously unseen or polymorphic malware variants from their visual features, which provides a further layer of defense against emerging cyber-attacks. This study introduces a new Snake Optimization Algorithm with Deep Convolutional Neural Network for Image-Based Malware Classification technique. The primary aim of the proposed technique is to apply a hyperparameter-tuned deep learning method to identify and classify malware images. First, the ShuffleNet model is used to derive the feature vectors. Next, the snake optimization algorithm is deployed to improve the choice of hyperparameters for the ShuffleNet model. Finally, an attention-based bi-directional long short-term memory model is used for the recognition and classification of malware images. The proposed algorithm is evaluated by simulation on the Malimg malware dataset. The experimental results indicate that the proposed methodology achieves promising performance, with a maximum accuracy of 98.42%, compared to existing models.
Published in: IEEE Access ( Volume: 12)
Page(s): 95047 - 95057
Date of Publication: 09 July 2024
Electronic ISSN: 2169-3536



SECTION I.

Introduction

Malware is malicious software that is purposefully developed to affect computer systems [1]. It is used to infiltrate, attack, or gain access to digital resources; it can be highly complex and can cause loss or undesirable outcomes in a system [2]. Its main aim is to cause damage and to attack resources that are not openly accessible. Approximately 360,000 new malware files are identified every day, and the number of files created daily has grown by 5.2% [3]. The rapid evolution and widespread dissemination of malware are made possible by automated and sophisticated malware creation tools [4]. Anti-virus software relies on two commonly used methods, behaviour-based and signature-based detection, to identify and classify malware [5]. Malware signatures are generally obtained from known malware through static analysis (that is, without executing the malware) [6]. The signatures gathered from different malicious objects form a database that can be used for malware identification and classification [7]. Although signature-based detection is extremely fast and accurate, it can easily be evaded by using obfuscation techniques (for example, metamorphism, packing, encryption, and polymorphism) to produce a new variant [8].

Furthermore, the signature database, which typically relies on static analysis, has traditionally been updated and curated manually, a labour-intensive and time-consuming process [9]. In behaviour-based detection, by contrast, the behaviours of the malware are observed and recorded while the malicious code is executed under dynamic analysis [10]. This method can therefore recognize metamorphic and polymorphic viruses from their behaviour. However, storing run-time behavioural patterns requires extensive resources [11]. Additionally, the behaviour dataset must be updated whenever a new malware type is identified. An innovative technique based on image processing is therefore of great importance for malware classification [12]. Using the texture of malware images, which are produced by transforming the collected malware binaries, this approach allows a classifier to identify and categorize malware samples.

Various researchers have developed numerous innovative methods, a few of which build on earlier classification techniques [13]. Initially, shallow machine learning (ML) methods were applied to the classification of malware instances. Convolutional neural networks (CNNs) emerged in the 1990s [14]. They became an area of strong interest for computer vision (CV) related tasks and are widely used on images, speech, and time series [15]. They are also highly effective in object detection, identification, and classification problems. Motivated by the success of CNNs in classification problems, several studies have investigated the use of CNNs in malware classification [16]. A CNN is a deep neural network (DNN) that relies heavily on convolution operations and comprises layers such as convolution layers, pooling layers, and fully connected (FC) layers, which are designed to learn spatial hierarchies of features automatically and adaptively through backpropagation (BP) [17]. The CNN model automates feature extraction and learns features directly from the input data, which makes it a robust tool for classifying images or data with a very large number of features [18].

This study introduces a new Snake Optimization Algorithm with Deep Convolutional Neural Network for Image-Based Malware Classification (SODCNN-IMC) technique. In the SODCNN-IMC model, the ShuffleNet technique is applied to derive the feature vectors efficiently. The SO algorithm is then used to improve the choice of hyperparameters in the ShuffleNet architecture. For the detection and classification of malware images, an attention-based bi-directional long short-term memory (ABiLSTM) approach is used. The performance of the SODCNN-IMC technique is validated using the Malimg malware dataset. The experimental results indicate that the SODCNN-IMC methodology outperforms other methods on several evaluation measures. In short, the key contributions of the study are as follows:

  • Designs a robust malware detection method, SODCNN-IMC, that combines the features extracted by ShuffleNet, hyperparameters fine-tuned through Snake Optimization, and the attention module of ABiLSTM.

  • Uses ShuffleNet for effective feature extraction from images, which considerably reduces the computational complexity while maintaining strong performance, making it suitable for resource-constrained environments.

  • Develops an SO technique to fine-tune the hyperparameters of the ShuffleNet model automatically. The technique explores the hyperparameter space effectively, which enables the model to adapt and improve its performance on various datasets.

  • Applies BiLSTM layers to capture long-range dependencies and to focus on significant parts of the input images. The attention module improves the model’s capability to distinguish between benign and malicious features, enhancing the overall detection accuracy.

  • Optimizes the hyperparameters of the ABiLSTM algorithm using the SO technique with cross-validation, which helps improve the prediction outcomes of the presented technique on unseen data.

The rest of the paper is organized as follows. Section II provides the related works and section III offers the proposed model. Then, section IV gives the result analysis and section V concludes the paper.

SECTION II.

Related Works

Zhao et al. [19] presented a static malware identification method based on the AlexNet CNN model. Unlike existing solutions, the method converts the bytes of each malicious file into colour images. It also provides an enhanced AlexNet framework and handles unbalanced datasets with a data augmentation algorithm. In [20], a new visual malware recognition architecture based on DNNs was developed. First, executable file samples were collected and transformed into bytes and asm files using disassembly technology. Next, visualization technology combined with data augmentation was used to convert the samples into 3-channel RGB images. Lastly, a DNN model was introduced, namely SEResNet50 + Bi-LSTM + Attention (SERLA). Chaganti, Ravi and Pham [21] proposed an efficient NN architecture, EfficientNetB1, that operates on byte-level image representations of malware. To reduce the computational resources consumed when training and testing different CNN-based DL methods, the study assessed the task performance and computational efficiency of several pre-trained CNN architectures in order to choose the best CNN model for classifying malware.

In [22], a new DeepGray technique was presented to classify multi-class malware using grayscale images and DL. This method converts executable files into a format suitable for DL. In the data preprocessing stage, Principal Component Analysis (PCA) was applied. ViT, VGG16, EfficientNet, and modified CNN frameworks were employed for classification. Bakour and Ünver [23] implemented a hybrid DL algorithm named DeepVisDroid. Two kinds of image-based features, global and local, are extracted. Subsequently, an NN architecture based on 1D convolutional layers was designed and trained. Additionally, two standard NN frameworks based on 2D convolutional layers were developed, and two well-known DL methods were examined.

In [24], a lightweight malware classification method named IMCLNet was introduced, which requires only malware images and no domain knowledge or feature engineering. In designing the architecture, the method weighs accuracy, the number of parameters, and computational speed, and incorporates Coordinate Attention, Global Context Embedding, and Depth-wise Separable Convolution.

Copiaco et al. [25] designed a new multi-functional malware recognition architecture. This method explores numerous pre-trained networks, comprising conventional and compact networks as well as series and directed acyclic graph structures, for classifying malware. The algorithm uses grayscale transform-based features as a uniform feature set, which allows malware classification across different file categories. The technique incorporates several databases into the training of the model. In [26], an innovative method named Jadeite, which employs image-based DL classification, was presented. Specifically, Jadeite extracts the Inter-procedural Control Flow Graph (ICFG) from a given Java bytecode file, then prunes the ICFG and transforms it into an adjacency matrix. The architecture uses an object detection technique in a DCNN model to detect maliciousness.

In [27], a multi-headed attention-based technique is combined with a CNN to find and classify the small infected regions in the complete image. The performance of the proposed multi-headed attention-based CNN technique was compared with numerous non-attention CNN-based techniques on several training and testing splits of a malware image benchmark dataset. Chaganti et al. [28] proposed a DL-based CNN method to perform malware identification on Portable Executable (PE) binary files using a fused feature set model. The authors presented a broad performance assessment of numerous DL model structures and ML classifiers, e.g. Support Vector Machine (SVM), on multi-aspect feature sets covering dynamic, static, and image features in order to select the proposed CNN method. Reilly et al. [4] explore the effectiveness of training DL methods with Generative Adversarial Network-generated data to improve their robustness against attacks. Ben Abdel Ouahab et al. [29] developed and tested a malware classifier able to assign each input malware sample to its corresponding family. To do so, they used a multi-layer perceptron with a malware visualization model.

SECTION III.

The Proposed Method

In this research, we have developed an innovative SODCNN-IMC system. The foremost goal of the SODCNN-IMC model is to apply a hyperparameter-tuned DL method for recognizing and categorizing malware images. The SODCNN-IMC technique contains three major processes, namely ShuffleNet-based feature extraction, SO-based parameter tuning, and ABiLSTM-based classification process. Fig. 1 demonstrates the complete procedure of the SODCNN-IMC system.

FIGURE 1. Overall procedure of the SODCNN-IMC algorithm.

A. Feature Extraction

For feature extraction, the ShuffleNet architecture is applied to derive the feature vectors efficiently. ShuffleNet is an extremely efficient DL model designed for mobile devices [30]. Given the available computation resources (hardware), we applied the ShuffleNet V1 version of the pre-trained ShuffleNet model to obtain the best outcome at a lower computation cost. The network is deeper than a typical CNN, with 50 learnable layers: one FC layer, one convolutional (Conv) layer, and 48 group Conv layers. In total, the CNN model has 172 layers, including 49 batch normalization (BN) layers, a classification layer, one max pooling layer, four average pooling layers, a softmax layer, and 33 ReLU layers.

The first layer is the input layer, which accepts images of size $224\times 224$ [31]. The first Conv layer extracts features from the $224\times 224$ input image using 24 filters (kernels) of size $3\times 3$ with a stride of $2\times 2$ to generate the feature map. The output of the Conv layer is computed as:

\begin{align*} s(i,j) &= (I \ast K)(i,j) \\ &= \sum_{n} \sum_{m} I(m,n)\, K(i-m,\, j-n) \tag{1}\end{align*}

In Eq. (1), $s$ denotes the output feature map, $I$ indicates the input image, and $K$ is the filter of the Conv layer. Applying the Conv operation to the input image produces an output of size $o=(i-k+2p)/s+1$, where, in this expression, $i$ denotes the input size, $k$ the kernel size, $p$ the padding, and $s$ the stride.
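As a quick check of the layer dimensions, the standard convolution output-size relation $o=\lfloor (i-k+2p)/s \rfloor + 1$ can be evaluated for the first Conv layer described above. This is a minimal sketch: the function name is hypothetical, and the padding values are assumptions because the text does not state the padding.

```python
def conv_output_size(i: int, k: int, p: int, s: int) -> int:
    """Output width/height of a convolution: floor((i - k + 2p) / s) + 1."""
    if s <= 0:
        raise ValueError("stride must be positive")
    return (i - k + 2 * p) // s + 1


# First Conv layer from the text: 224x224 input, 3x3 kernel, stride 2.
# The padding value is an assumption; the text does not state it.
for padding in (0, 1):
    size = conv_output_size(224, 3, padding, 2)
    print(f"padding={padding}: {size}x{size} feature map, 24 channels")
```

With a padding of 1 the stride-2 layer halves the spatial resolution (224 to 112); without padding the map is one pixel smaller.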

The first ShuffleNet unit, with a stride of $2\times 2$, receives the feature map produced by the first Conv layer. Each unit contains three Conv operations: one $3\times 3$ depthwise Conv and two $1\times 1$ pointwise group Convs. The first pointwise group Conv is followed by BN, a channel shuffle operation, and the ReLU activation function. ReLU activation is used because it is simple and efficient:

\begin{align*} f(x)= \begin{cases} 0, & x\lt 0 \\ x, & x\ge 0 \end{cases} \tag{2}\end{align*}
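The channel shuffle and ReLU operations mentioned above can be illustrated on a flat list of channel indices. This is a minimal sketch, assuming the usual formulation of channel shuffle (view the channels as a groups-by-channels-per-group grid, transpose it, and flatten); the function names are hypothetical and real implementations operate on tensors.

```python
def relu(x: float) -> float:
    """ReLU as in Eq. (2): 0 for negative inputs, identity otherwise."""
    return x if x >= 0 else 0.0


def channel_shuffle(channels: list, groups: int) -> list:
    """Interleave channels so each output group mixes all input groups."""
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # Read the (groups x per_group) grid column by column (a transpose).
    return [channels[g * per_group + c]
            for c in range(per_group)
            for g in range(groups)]


# Six channels in two groups: [0, 1, 2] and [3, 4, 5].
print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))
print([relu(v) for v in (-1.5, 0.0, 2.0)])
```

After the shuffle, the next group convolution sees channels that originated in different groups, which is what lets information flow between groups.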

ReLU passes positive values through unchanged and sets negative values to 0. The architecture has a $3\times 3$ average pooling on the shortcut path. It has 16 consecutive ShuffleNet units. In total it has 50 learnable layers, each of which holds trained feature maps [32] and contributes to feature extraction. Softmax activation is used by the final classification layer to compute the class probabilities. The output of the FC layer is given by:

\begin{equation*} a_{i}=\sum_{j=0}^{m\times n\times d-1} w_{ij}\, x_{j}+b_{i} \tag{3}\end{equation*}

In Eq. (3), $b$ denotes the bias and $w$ represents the weights; $i$ is the index of the FC layer output, and $m$, $n$, and $d$ are the width, height, and depth of the feature map fed to the FC layer. The pre-trained softmax layer produces classification probabilities for up to 1000 distinct classes.
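The FC layer followed by softmax can be sketched in a few lines. This is an illustrative example only, assuming the flattened feature map is a plain list `x`; the function names and the toy weights are hypothetical.

```python
import math


def fully_connected(x, weights, biases):
    """FC layer: a_i = sum_j w_ij * x_j + b_i for each output index i."""
    return [sum(w * xj for w, xj in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(a):
    """Turn FC outputs into class probabilities that sum to 1."""
    shift = max(a)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in a]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: 2 input features, 2 output classes (hypothetical values).
logits = fully_connected([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5])
print(logits, softmax(logits))
```

For a ten-class malware task, the weight matrix would have ten rows, one per class, instead of the 1000 rows of the pre-trained classifier.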

B. Hyperparameter Tuning Using SO Model

Next, the SO algorithm is used to select the optimum hyperparameters for the ShuffleNet architecture. SO was developed to handle a diverse set of optimization problems and mimics the distinctive mating behavior of snakes [33]. All snakes (male and female) compete for the best partner when the available food is sufficient and the temperature is low. Like other swarm-based methods, the SO architecture depends on two main phases: global search (exploration) and local search (exploitation). When food is scarce and the conditions are unfavourable, snakes in the exploration stage spread out across the hunting region. The exploitation phase, in turn, is broken down into several smaller phases. This part describes the mathematical model of the SO algorithm. The steps below outline the SO method:

Initialize the solutions: SO begins by generating a set of random solutions in the search space using Eq. (4). These solutions form the snake population, which is improved in the following steps:

\begin{equation*} S_{i}=S_{\min}+R\times (S_{\max}-S_{\min}) \tag{4}\end{equation*}

where $S_{i}$ is the position of the $i$th snake, $R$ is a random number in $[0, 1]$, and $S_{\max}$ and $S_{\min}$ are the maximum and minimum possible values, respectively.

Snake population separation: Using Eqs. (5) and (6), the population is divided into two equal parts, 50 per cent male and 50 per cent female:

\begin{align*} N_{m}& \approx \frac{N}{2} \tag{5}\\ N_{f}& = N-N_{m} \tag{6}\end{align*}

where $N$ denotes the total population size, and $N_{m}$ and $N_{f}$ represent the numbers of male and female snakes.
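The initialization and separation steps can be sketched as follows. This is a minimal illustration, assuming each snake is a list of real-valued coordinates with the same bounds in every dimension and that the first half of the population is treated as male; the function names are hypothetical.

```python
import random


def init_population(n, s_min, s_max, dim, rng):
    """Eq. (4): each coordinate is S_min + R * (S_max - S_min), R in [0, 1)."""
    return [[s_min + rng.random() * (s_max - s_min) for _ in range(dim)]
            for _ in range(n)]


def split_population(population):
    """Eqs. (5)-(6): about half the snakes are male, the rest female."""
    n_m = len(population) // 2
    return population[:n_m], population[n_m:]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
snakes = init_population(n=7, s_min=-1.0, s_max=1.0, dim=3, rng=rng)
males, females = split_population(snakes)
print(len(males), len(females))
```

With an odd population size the female group receives the extra snake, which is why Eq. (5) is only approximate.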

Evaluation of snakes: Find the best snake in the male and female groups ($S_{mbest}$ and $S_{fbest}$) and the food position ($L_{food}$). Two further quantities, the temperature ($T$) and the food quantity ($Q$), are defined in Eqs. (7) and (8), respectively:

\begin{equation*} T=e^{\frac{-c}{C}} \tag{7}\end{equation*}

where $c$ denotes the current iteration and $C$ represents the maximum number of iterations.

\begin{equation*} Q=K_{1}\times \exp\left(\frac{c-C}{C}\right) \tag{8}\end{equation*}

where $K_{1}$ is a constant equal to 0.5.
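The way temperature and food quantity steer the search can be made concrete with a small schedule function. This is a sketch under stated assumptions: the food quantity is taken as $Q=K_{1}\exp((c-C)/C)$, the form used in the standard SO formulation, and the thresholds 0.25 and 0.6 are the ones quoted in this section; the function names are hypothetical.

```python
import math


def temperature(c: int, C: int) -> float:
    """Temperature T = exp(-c / C); starts at 1 and cools over iterations."""
    return math.exp(-c / C)


def food_quantity(c: int, C: int, k1: float = 0.5) -> float:
    """Food quantity Q = K1 * exp((c - C) / C) (assumed form of Eq. (8))."""
    return k1 * math.exp((c - C) / C)


def phase(c: int, C: int) -> str:
    """Select the SO behaviour for iteration c out of C."""
    if food_quantity(c, C) < 0.25:
        return "explore"            # food is scarce: global search
    if temperature(c, C) > 0.6:
        return "move_to_food"       # food available and hot
    return "fight_or_mate"          # food available and cold


for c in (10, 40, 80):
    print(c, round(temperature(c, 100), 3), round(food_quantity(c, 100), 3),
          phase(c, 100))
```

Under these assumptions the run starts in exploration, moves toward the food once $Q$ exceeds 0.25, and ends in fighting or mating mode once $T$ falls below 0.6.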

Exploring the search region (food is not found): This depends on a selected threshold value. When $Q\lt 0.25$, the snakes search globally by updating their positions relative to a randomly chosen position, as defined in Eqs. (9) to (12).

\begin{align*} S_{mi}(c+1)& =S_{mR}(c)\pm K_{2}\times AB_{m} \\ & \quad \times \left((S_{\max}-S_{\min})\times R+S_{\min}\right) \tag{9}\end{align*}

where $R$ denotes a random number in $[0, 1]$, and $S_{mi}$ and $S_{mR}$ refer to the positions of the $i$th and a random male snake. The ability of the male snake to find food is denoted by $AB_{m}$ and computed using Eq. (10):

\begin{equation*} AB_{m}=e^{-F_{mR}/F_{mi}} \tag{10}\end{equation*}

where $K_{2}$ is a constant equal to 0.05, $F_{mR}$ refers to the fitness of the $S_{mR}$ snake, and $F_{mi}$ is the fitness of the $i$th snake in the male group.

\begin{align*} S_{fi}(c+1)& =S_{fR}(c)\pm K_{2}\times AB_{f} \\ & \quad \times \left((S_{\max}-S_{\min})\times R+S_{\min}\right) \tag{11}\end{align*}

where $S_{fR}$ denotes the position of a random female snake, $R$ is a random number in $[0, 1]$, $S_{fi}$ is the position of the $i$th female snake, and $AB_{f}$ is the ability of the female snake to find food.

\begin{equation*} AB_{f}=e^{-F_{fR}/F_{fi}} \tag{12}\end{equation*}

where $F_{fR}$ is the fitness of the random female snake $S_{fR}$, $F_{fi}$ represents the fitness of the $i$th snake in the female group, and $K_{2}$ is a constant equal to 0.05.
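The exploration update has the same form for both groups, so one function can serve males and females. This is a minimal sketch of Eqs. (9) to (12), with several assumptions that are not in the text: the sign of $\pm$ is drawn once per snake, a fresh $R$ is drawn per coordinate, a small epsilon guards against division by a zero fitness, and updated positions are clamped to the bounds.

```python
import math
import random


def explore_step(positions, fitness, s_min, s_max, rng, k2=0.05):
    """Move each snake relative to a randomly chosen snake of its own group."""
    eps = 1e-12  # assumption: avoids division by zero when fitness is 0
    updated = []
    for i in range(len(positions)):
        r_idx = rng.randrange(len(positions))              # random snake S_R
        ability = math.exp(-fitness[r_idx] / (fitness[i] + eps))  # AB
        sign = rng.choice((-1.0, 1.0))                     # the +/- operator
        new_pos = []
        for x_r in positions[r_idx]:
            step = k2 * ability * ((s_max - s_min) * rng.random() + s_min)
            # Clamping to [s_min, s_max] is an assumption of this sketch.
            new_pos.append(min(s_max, max(s_min, x_r + sign * step)))
        updated.append(new_pos)
    return updated


rng = random.Random(0)
males = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(explore_step(males, [1.0, 2.0, 3.0], 0.0, 1.0, rng))
```

Calling the function once on the male group and once on the female group reproduces the pair of updates in Eqs. (9) and (11).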

Exploiting the search region (food is found): When the quantity of food exceeds the pre-determined threshold, $Q\gt 0.25$, the temperature is checked. If $T\gt 0.6$ (hot), the solutions move toward the food:

\begin{align*} S_{i,j}(c+1)=L_{food}\pm K_{3}\times T\times R\times \left(L_{food}-S_{i,j}(c)\right) \tag{13}\end{align*}

where $S_{i,j}$ denotes a snake’s position, $L_{food}$ is the position of the best snake, and $K_{3}$ is a constant equal to 2 [34].

If $T\lt 0.6$ (cold), the snake is in either fighting or mating mode. In fighting mode:

\begin{align*} S_{mi}(c+1)=S_{mi}(c)\pm K_{3}\times FA_{m}\times R\times \left(S_{fbest}-S_{mi}(c)\right) \tag{14}\end{align*}

where $S_{mi}$ is the position of the $i$th male, $S_{fbest}$ denotes the position of the best snake in the female group, and $FA_{m}$ is the fighting ability of the male snake.

\begin{align*} S_{fi}(c+1)=S_{fi}(c)\pm K_{3}\times FA_{f}\times R\times \left(S_{mbest}-S_{fi}(c)\right) \tag{15}\end{align*}

where $S_{fi}$ is the position of the $i$th female, $S_{mbest}$ denotes the position of the best snake in the male group, and $FA_{f}$ represents the fighting ability of the female snake. $FA_{m}$ and $FA_{f}$ are computed as follows:

\begin{align*} FA_{m}& =e^{-F_{fbest}/F_{i}} \tag{16}\\ FA_{f}& =e^{-F_{mbest}/F_{i}} \tag{17}\end{align*}

where $F_{mbest}$ is the fitness of the best male snake, $F_{i}$ represents the fitness of the $i$th snake, and $F_{fbest}$ is the fitness of the best female snake. In mating mode:

\begin{align*} S_{mi}(c+1)& =S_{mi}(c)\pm K_{3}\times MA_{m}\times R\times \left(Q\times S_{fi}(c)-S_{mi}(c)\right) \tag{18}\\ S_{fi}(c+1)& =S_{fi}(c)\pm K_{3}\times MA_{f}\times R\times \left(Q\times S_{mi}(c)-S_{fi}(c)\right) \tag{19}\end{align*}

where $S_{fi}$ and $S_{mi}$ denote the positions of the $i$th female and male snakes, respectively, and $MA_{m}$ and $MA_{f}$ are the mating abilities of males and females, computed as follows:

\begin{align*} MA_{m}& =e^{-F_{fi}/F_{mi}} \tag{20}\\ MA_{f}& =e^{-F_{mi}/F_{fi}} \tag{21}\end{align*}

If an egg hatches, the worst male and female snakes are selected and replaced:

\begin{align*} S_{mworst}& =S_{\min}+R\times (S_{\max}-S_{\min}) \tag{22}\\ S_{fworst}& =S_{\min}+R\times (S_{\max}-S_{\min}) \tag{23}\end{align*}

$S_{fworst}$ and $S_{mworst}$ denote the worst snakes in the female and male groups, respectively [35]. The $\pm$ sign is a diversity factor operator that offers the choice of increasing or decreasing the snake positions. It allows the snakes to change position in either direction within the search space.
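The three exploitation moves (toward the food, fighting, and mating) share one pattern: step from a reference point toward a target, scaled by a constant, an ability term, and a random factor. This is a minimal sketch with hypothetical function names; drawing the $\pm$ sign once per snake, drawing $R$ per coordinate, and the small epsilon in the ability terms are assumptions of the sketch.

```python
import math
import random

EPS = 1e-12  # assumption: guards the fitness ratios against division by zero


def move_to_food(pos, food, temp, rng, k3=2.0):
    """Hot phase: L_food +/- K3 * T * R * (L_food - S)."""
    sign = rng.choice((-1.0, 1.0))
    return [f + sign * k3 * temp * rng.random() * (f - x)
            for x, f in zip(pos, food)]


def fight_step(pos, best_other, f_best_other, f_i, rng, k3=2.0):
    """Fighting mode: S +/- K3 * FA * R * (S_best_other - S)."""
    ability = math.exp(-f_best_other / (f_i + EPS))   # FA
    sign = rng.choice((-1.0, 1.0))
    return [x + sign * k3 * ability * rng.random() * (b - x)
            for x, b in zip(pos, best_other)]


def mate_step(pos, partner, f_partner, f_i, q, rng, k3=2.0):
    """Mating mode: S +/- K3 * MA * R * (Q * S_partner - S)."""
    ability = math.exp(-f_partner / (f_i + EPS))      # MA
    sign = rng.choice((-1.0, 1.0))
    return [x + sign * k3 * ability * rng.random() * (q * p - x)
            for x, p in zip(pos, partner)]


rng = random.Random(1)
print(move_to_food([0.2, 0.9], [0.5, 0.5], temp=0.7, rng=rng))
print(fight_step([0.2, 0.9], [0.4, 0.6], 1.0, 2.0, rng))
print(mate_step([0.2, 0.9], [0.3, 0.8], 1.5, 2.0, q=0.4, rng=rng))
```

A snake that already sits on its target does not move in any of the three updates, because the difference term is zero.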

The SO technique uses a fitness function to attain improved classification performance. It assigns a positive value to each candidate solution, where a smaller value represents better performance. In this study, the classification error rate to be minimized is taken as the fitness function, as specified in Eq. (24).

\begin{align*} fitness(x_{i})& =ClassifierErrorRate(x_{i}) \\ & =\frac{\text{No. of misclassified instances}}{\text{Total no. of instances}} \times 100 \tag{24}\end{align*}
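The fitness of a candidate hyperparameter set reduces to a percentage error rate over a labelled set. This is a minimal sketch: the function name is hypothetical, and in practice the predictions would come from a model trained with the candidate's hyperparameters.

```python
def classifier_error_rate(y_true, y_pred):
    """Eq. (24): misclassified instances / total instances * 100."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return 100.0 * wrong / len(y_true)


# One of four toy predictions is wrong, so the fitness is 25 percent.
print(classifier_error_rate([0, 1, 2, 2], [0, 1, 1, 2]))
```

Because SO minimizes this value, a candidate with a lower error rate is the better snake.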

C. ABiLSTM-Based Classification

Finally, the SODCNN-IMC approach uses the ABiLSTM model for classification. Deep neural networks (DNNs) such as LSTM and Bi-LSTM have been introduced for extracting temporal features between network packets, since network traffic is a continuous flow of sequence data in packet bytes [36]. LSTM is a variant of RNN designed to address the exploding and vanishing gradient problems. Compared to plain RNNs, LSTM networks are significantly more effective at detecting long-term dependencies within sequence data, which makes them a strong choice for extracting temporal features from sequence data such as network traffic. The LSTM architecture has forget, input, and output gates. Fig. 2 shows the framework of the ABiLSTM model.

FIGURE 2. Architecture of ABiLSTM model.

The forget gate controls how much information is retained or discarded. The previous hidden layer (HL) state and the current input are passed through the sigmoid function, which produces values between 0 and 1. A value closer to 1 means the information is more likely to be remembered; a value closer to 0 means it is more likely to be forgotten. Eq. (25) computes the forget vector, where $W_{f}$ and $b_{f}$ are the forget gate parameters, $x_{t}$ denotes the input vector at step $t$, and $h_{t-1}$ is the HL vector at step $t-1$.

\begin{equation*} f_{t}=\sigma \left(W_{f}\cdot [h_{t-1},x_{t}]+b_{f}\right) \tag{25}\end{equation*}

The input gate evaluates which new information is written to long-term memory. It controls how much of the input $x_{t}$ is added to the cell state $C_{t}$ [37]. When the input gate value is closer to 0, more information is discarded; when it is closer to 1, more information is retained in long-term memory. $W_{i}$ and $b_{i}$ are the input gate parameters.

\begin{equation*} i_{t}=\sigma \left(W_{i}\cdot [h_{t-1},x_{t}]+b_{i}\right) \tag{26}\end{equation*}

The forget gate $f_{t}$ is multiplied point-wise by $C_{t-1}$ (the previous cell state vector), and the input gate vector $i_{t}$ is multiplied point-wise by $\tilde{C}_{t}$, the candidate cell state computed from the HL vector and the current input. The two products are summed to give the new cell state.

\begin{align*} \tilde{C}_{t}& =\tanh\left(W_{c}\cdot [h_{t-1},x_{t}]+b_{c}\right) \tag{27}\\ C_{t}& =f_{t}\ast C_{t-1}+i_{t}\ast \tilde{C}_{t} \tag{28}\end{align*}

The output gate in LSTM determines which part of the long-term memory is passed to the output. $W_{o}$ and $b_{o}$ are the output gate parameters.

\begin{align*} o_{t}& =\sigma \left(W_{o}\cdot [h_{t-1},x_{t}]+b_{o}\right) \tag{29}\\ h_{t}& =o_{t}\ast \tanh(C_{t}) \tag{30}\end{align*}
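The gate equations (25) to (30) combine into a single LSTM step. This is a minimal sketch using plain lists instead of tensors, assuming each weight matrix acts on the concatenation $[h_{t-1}, x_{t}]$; the function names and the parameter dictionary layout are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W . v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p maps 'W_f', 'b_f', 'W_i', ... to parameters."""
    v = h_prev + x_t                                              # [h_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(p["W_f"], p["b_f"], v)]       # Eq. (25)
    i = [sigmoid(z) for z in affine(p["W_i"], p["b_i"], v)]       # Eq. (26)
    c_hat = [math.tanh(z) for z in affine(p["W_c"], p["b_c"], v)] # Eq. (27)
    c = [ft * cp + it * ch
         for ft, cp, it, ch in zip(f, c_prev, i, c_hat)]          # Eq. (28)
    o = [sigmoid(z) for z in affine(p["W_o"], p["b_o"], v)]       # Eq. (29)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]              # Eq. (30)
    return h, c


# One hidden unit, one input feature, all-zero parameters (toy values).
zero = {k: [[0.0, 0.0]] for k in ("W_f", "W_i", "W_c", "W_o")}
zero.update({k: [0.0] for k in ("b_f", "b_i", "b_c", "b_o")})
print(lstm_step([1.0], [0.0], [2.0], zero))
```

With all-zero parameters every gate outputs 0.5 and the candidate state is 0, so the cell state is simply halved, which makes the toy case easy to check by hand.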

BiLSTM is a sequence processing model composed of two LSTMs, one processing the sequence in the forward direction and the other in the reverse direction [38]. The LSTM network addresses the long-term dependency problem: owing to its architecture, LSTM can memorize data over time and learn long-term information. This study uses the BiLSTM model rather than a unidirectional LSTM because of its higher prediction accuracy. The attention module then focuses on the outputs generated by the HL of the BiLSTM.
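The text does not give the attention equations, so the following is only one plausible reading: the forward and backward hidden states are concatenated per time step, each step receives a score from a learned vector, and the softmax-weighted sum forms the context vector passed to the classifier. The function names and the dot-product scoring are assumptions of this sketch.

```python
import math


def concat_directions(forward_states, backward_states):
    """Align and join both directions; the backward LSTM ran on the
    reversed sequence, so its state list is reversed before joining."""
    return [f + b for f, b in zip(forward_states, reversed(backward_states))]


def attention_pool(hidden_states, score_vector):
    """Softmax-weighted sum of hidden states (dot-product scoring)."""
    scores = [sum(w * h for w, h in zip(score_vector, h_t))
              for h_t in hidden_states]
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [sum(a * h_t[d] for a, h_t in zip(weights, hidden_states))
               for d in range(dim)]
    return context, weights


states = concat_directions([[1.0], [2.0]], [[20.0], [10.0]])
context, weights = attention_pool(states, [0.0, 0.0])
print(states, weights, context)
```

With a zero score vector every time step gets the same weight and the context vector is the plain average of the hidden states; a trained score vector would instead emphasize the most informative steps.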

SECTION IV.

Performance Validation

The SODCNN-IMC system is evaluated experimentally on the Malimg malware database [39]. The database used here comprises 1709 samples in ten classes, as described in Table 1. The Malimg dataset is a commonly used benchmark in computer security and malware analysis. It consists of grayscale images representing different kinds of malware samples collected in the wild. Every image in the dataset corresponds to a specific malware sample and is resized to a uniform size for consistency. The dataset is a valuable resource for practitioners and researchers who develop and assess image-based malware recognition models and approaches. With its wide range of malware samples and families, the Malimg dataset supports the study of many aspects of image-based malware analysis, including feature extraction, detection, and classification.

TABLE 1. Details of the Dataset

Fig. 3 displays the classifier performance of the SODCNN-IMC system on the test dataset. Figs. 3a and 3b show the confusion matrices obtained by the SODCNN-IMC method with an 80:20 split between the training phase (TRPH) and the testing phase (TSPH). These figures indicate that the SODCNN-IMC technique identifies and categorizes samples across all ten classes. Next, Fig. 3c shows the precision-recall (PR) curves of the SODCNN-IMC system, which reach high PR values for every class. Lastly, Fig. 3d shows the ROC analysis of the SODCNN-IMC methodology. This result indicates that the SODCNN-IMC algorithm achieves high ROC values across the classes.

FIGURE 3. Classifier performance of (a-b) Confusion matrices and (c-d) PR and ROC curves.

The malware classification performance of the SODCNN-IMC technique on the applied dataset is reported in Table 2 and Fig. 4. These simulation values show that the SODCNN-IMC technique performs well across the 10 class labels. With 80% of TRPH, the SODCNN-IMC technique gives an average $accu_{y}$ of 98.24%, $prec_{n}$ of 90.44%, $reca_{l}$ of 90.10%, $F_{score}$ of 90.20%, and MCC of 89.26%. Additionally, with 20% of TSPH, the SODCNN-IMC technique offers an average $accu_{y}$ of 98.42%, $prec_{n}$ of 91.36%, $reca_{l}$ of 91.38%, $F_{score}$ of 91.31%, and MCC of 90.46%.
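The averaged measures above can be derived from a multi-class confusion matrix. The sketch below assumes one-vs-rest counts per class followed by a plain (macro) average, which is one common convention; whether the paper uses exactly this averaging is an assumption, and the function names are hypothetical.

```python
import math


def per_class_metrics(cm, k):
    """One-vs-rest accuracy, precision, recall, F-score, and MCC for class k.

    cm[r][c] counts samples of true class r predicted as class c.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(row[k] for row in cm) - tp
    tn = total - tp - fn - fp
    acc = (tp + tn) / total
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, prec, rec, f1, mcc


def macro_average(cm):
    """Average each metric over all classes."""
    rows = [per_class_metrics(cm, k) for k in range(len(cm))]
    return tuple(sum(col) / len(rows) for col in zip(*rows))


# Toy two-class matrix (hypothetical counts, not the paper's data).
print(macro_average([[4, 1], [1, 4]]))
```

Under this convention the true negatives of the other classes enter each per-class accuracy, so the averaged accuracy can sit well above the averaged precision and recall when there are many classes.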

TABLE 2. Malware Classifier Analysis of SODCNN-IMC Technique With 80:20 of TRPH/TSPH

FIGURE 4. Average of SODCNN-IMC technique with 80:20 of TRPH/TSPH.

The $accu_{y}$ curves for training (TR) and testing (TS) shown in Fig. 5 give insight into the performance of the SODCNN-IMC technique over the epochs. Both TR and TS $accu_{y}$ improve consistently as the number of epochs increases, indicating that the model learns and recognizes patterns in both the TR and TS data. The upward trend in TS $accu_{y}$ shows that the model does not merely fit the TR dataset but also makes correct predictions on unseen data, which points to consistent generalization.

FIGURE 5. - $Accu_{y}$ curve of the SODCNN-IMC methodology.

Fig. 6 summarizes the TR and TS loss values of the SODCNN-IMC technique across epochs. The TR loss decreases steadily as the model updates its weights to reduce classification errors. The loss curves illustrate how well the model fits the data and its ability to capture patterns in both datasets. Notably, the parameters of the SODCNN-IMC methodology are refined continuously to reduce the discrepancy between the predictions and the actual TR labels.
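To make concrete what a single point on accuracy and loss curves of this kind represents, the sketch below computes the mean loss and the accuracy for one pass over a dataset from predicted class probabilities. It assumes a categorical cross-entropy loss, which the text does not state explicitly; the function name and the toy probabilities are likewise illustrative assumptions.

```python
import math
from typing import List, Tuple


def epoch_point(probs: List[List[float]], labels: List[int]) -> Tuple[float, float]:
    """Return (mean cross-entropy loss, accuracy) for one pass over a dataset."""
    eps = 1e-12  # guards against log(0)
    loss = 0.0
    correct = 0
    for p, y in zip(probs, labels):
        # Cross-entropy penalizes low probability assigned to the true class y.
        loss += -math.log(max(p[y], eps))
        # The predicted class is the one with the largest probability.
        predicted = max(range(len(p)), key=lambda c: p[c])
        correct += int(predicted == y)
    n = len(labels)
    return loss / n, correct / n


# Toy predictions for three samples and two classes (hypothetical values).
toy_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
toy_labels = [0, 1, 1]
toy_loss, toy_acc = epoch_point(toy_probs, toy_labels)
```

Evaluating such a function on the TR and TS data after every epoch would give one loss curve and one accuracy curve per dataset.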

FIGURE 6. - Loss curve of the SODCNN-IMC methodology.

An extensive comparative study of the SODCNN-IMC method with recent approaches [40] is provided in Table 3. Fig. 7 presents the comparison in terms of $accu_{y}$. The results show that the SODCNN-IMC method attains the highest $accu_{y}$ among the compared models. The SODCNN-IMC technique offers an $accu_{y}$ of 98.42%, while the GoogleNet, ResNET, VGG16+SVM, SSPNet1, Multi-Objective learning, CapsNet, Kernel-based ELM, and CNN with VGG16 methods obtain lower $accu_{y}$ values of 84%, 86.01%, 92.97%, 96.60%, 96.86%, 96.58%, 94.25%, and 97.62%, respectively.

TABLE 3 Comparison Analysis of the SODCNN-IMC Model With Other Models
FIGURE 7. - $Accu_{y}$ analysis of the SODCNN-IMC system with other models.

Fig. 8 compares the SODCNN-IMC technique with the other models in terms of CT. The results show that the SODCNN-IMC algorithm achieves the lowest CT of all compared models. The SODCNN-IMC technique provides a reduced CT of 1.41 s, while the GoogleNet, ResNET, VGG16+SVM, SSPNet1, Multi-Objective learning, CapsNet, Kernel-based ELM, and CNN with VGG16 techniques obtain higher CT values of 5.01 s, 4.89 s, 4.94 s, 3.30 s, 3.85 s, 5.01 s, 2.85 s, and 2.97 s, respectively.
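The size of these margins can be read off with a few lines of arithmetic. The sketch below uses only the accuracy and CT values quoted in the comparison above; the dictionary layout and the helper name are illustrative assumptions.

```python
from typing import Dict, Tuple

PROPOSED = "SODCNN-IMC"

# Accuracy (%) and CT (seconds) as quoted in the text.
accuracy: Dict[str, float] = {
    "SODCNN-IMC": 98.42, "GoogleNet": 84.00, "ResNET": 86.01,
    "VGG16+SVM": 92.97, "SSPNet1": 96.60, "Multi-Objective learning": 96.86,
    "CapsNet": 96.58, "Kernel-based ELM": 94.25, "CNN with VGG16": 97.62,
}
ct_seconds: Dict[str, float] = {
    "SODCNN-IMC": 1.41, "GoogleNet": 5.01, "ResNET": 4.89,
    "VGG16+SVM": 4.94, "SSPNet1": 3.30, "Multi-Objective learning": 3.85,
    "CapsNet": 5.01, "Kernel-based ELM": 2.85, "CNN with VGG16": 2.97,
}


def closest_rivals() -> Tuple[str, float, str, float]:
    """Return the most accurate and the fastest competing models with their margins."""
    rivals = [name for name in accuracy if name != PROPOSED]
    best_acc = max(rivals, key=lambda name: accuracy[name])
    fastest = min(rivals, key=lambda name: ct_seconds[name])
    # Accuracy margin in percentage points over the most accurate rival.
    acc_gap = round(accuracy[PROPOSED] - accuracy[best_acc], 2)
    # How many times longer the fastest rival takes than the proposed model.
    ct_ratio = round(ct_seconds[fastest] / ct_seconds[PROPOSED], 2)
    return best_acc, acc_gap, fastest, ct_ratio


best_acc_name, acc_gap, fastest_name, ct_ratio = closest_rivals()
```

On these figures, the closest competitor in accuracy is CNN with VGG16 (a margin of 0.80 percentage points), and the fastest competitor, Kernel-based ELM, takes roughly twice as long as SODCNN-IMC.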

FIGURE 8. - CT outcome of SODCNN-IMC technique with other systems.

The SODCNN-IMC method for malware image classification performs strongly because it combines recent techniques chosen specifically for this task. The ShuffleNet-based feature extraction offers a solid framework for capturing relevant image features efficiently and economizing computational resources without sacrificing classifier accuracy. By leveraging the efficient channel shuffle operation and hierarchical feature extraction of ShuffleNet, the method recognizes intricate patterns in the malware images, improving its discriminative power. In addition, the SO-based hyperparameter tuning process allows fine-grained optimization of the model parameters, adapting to the difficulties and nuances present in the dataset. This adaptive tuning improves the method’s flexibility and generalization ability, as it can yield better solutions across various malware samples. Also, the ABiLSTM classifier enables the model to capture temporal dependencies and contextual information in the sequence of extracted features, further refining the classification decisions. Overall, the SODCNN-IMC method excels in malware image classification by synergistically integrating efficient feature extraction, adaptive hyperparameter tuning, and refined sequence modelling, which leads to higher performance relative to other methods.
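The channel shuffle operation of ShuffleNet mentioned above can be illustrated in a few lines. The sketch shows the generic operation (view the channels as a groups-by-n grid, transpose it, and flatten it again) on a plain list of channel identifiers. It is a minimal illustration of the general technique and is not taken from the SODCNN-IMC implementation; the function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Interleave channels across groups: reshape to (g, n), transpose, flatten."""
    if groups <= 0 or len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    per_group = len(channels) // groups
    # Output position j * groups + g takes input channel g * per_group + j.
    return [channels[g * per_group + j]
            for j in range(per_group)
            for g in range(groups)]


# Six channels in two groups: [0, 1, 2] and [3, 4, 5].
shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)
print(shuffled)  # [0, 3, 1, 4, 2, 5]
```

After the shuffle, each consecutive block of channels contains members of every original group, so a subsequent grouped convolution can mix information across groups at no extra arithmetic cost.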

SECTION V.

Conclusion

In this research, we have presented a new SODCNN-IMC methodology. The main aim of the SODCNN-IMC methodology is to apply a hyperparameter-tuned DL method to the detection and classification of malware images. The SODCNN-IMC model comprises three major processes: ShuffleNet-based feature extraction, SO-based hyperparameter tuning, and ABiLSTM-based classification. The SO-based hyperparameter tuning process helps improve the overall recognition rate of the proposed model. The experimental evaluation of the SODCNN-IMC technique on the Malimg malware dataset demonstrates superior performance over other models, with a maximum accuracy of 98.42%. Therefore, the proposed model provides a promising avenue for enhancing cybersecurity measures.

While the SODCNN-IMC method provides major advances in image-based malware recognition, it is important to recognize its limitations. First, despite the efficiency gains obtained through ShuffleNet and SO, the method might still face challenges in processing higher-resolution images or larger-scale datasets owing to inherent computational limits. Moreover, although the ABiLSTM structure improves feature discrimination and representation, it may struggle to capture very complex spatial relationships or subtle patterns within images, possibly leading to misclassifications or false negatives. Finally, the effectiveness of the SO model relies heavily on the quality and representativeness of the initial parameter configuration, which may not always guarantee optimal performance across all datasets or scenarios.

Future work can focus on combining advanced methods such as transfer learning (TL) and ensemble learning to improve the robustness and generalization of these classification methods. Additionally, as cyber threats continue to evolve, there may be a growing focus on real-time detection and response mechanisms that leverage image-based malware classification, enabling proactive defense against emerging threats.
