Introduction
Machine learning (ML) has become integral to modern cybersecurity, marking a breakthrough in the detection of zero-day malware. The success of ML techniques has led cyber defense researchers and antivirus vendors to increasingly adopt these methods to address the evolving landscape of malware variants [1], [2]. ML-based malware detection essentially involves analyzing benign and malicious files, extracting features through static and dynamic analysis, and utilizing these features to train ML models [3], [4], [5], [6]. These detection systems have demonstrated high success rates in identifying both known malware and novel threats [7], [8]. However, ML systems are particularly vulnerable to adversarial attacks, where slight modifications to input samples can deceive detectors into misclassifying malware as benignware, posing severe cybersecurity risks.
Adversarial attacks pose even greater challenges in resource-constrained IoT systems. Machine learning-based IoT malware detection remains less developed compared to its Windows counterpart. The limitations in IoT environments, such as restricted computational resources and diverse CPU architectures, necessitate lightweight and efficient solutions, making direct adaptation of Windows-based techniques difficult [6]. To address these challenges, researchers in IoT malware detection have predominantly relied on structural features, such as control flow graphs (CFGs) and function call graphs (FCGs) [9], [10], [11]. These features are particularly effective for detecting malware across the diverse CPU architectures in IoT systems, as noted by Li et al. [12].
Similarly, adversarial attacks on IoT malware detection are still in their infancy compared to those targeting Windows systems. Most studies on adversarial attacks in the IoT domain focus on payload injections into malware samples and involve feature-space manipulations. For instance, [13], [14] embed graphs from benign samples into malware CFGs to evade detection. Likewise, Esmaeili et al. [15] merge CFGs from benign and malware samples to generate adversarial examples, and propose a GNN-based adversarial detector that learns the distribution of benign samples to filter out the adversarial ones. Sandor et al. [16] append extra bytes from malware and benign samples into malware binaries to evade detection, followed by adversarial training to harden the detector. Abusnaina et al. [17] demonstrate that most ML IoT malware detection approaches are vulnerable to simple manipulations like packing, stripping, and padding. Khormali et al. [18] introduce the COPYCAT attack, appending adversarial images to malware for IoT and Windows detection evasion. Ngo et al. [19] utilize reinforcement learning to modify PSI-graphs with dummy vertices and edges, followed by adversarial training to improve detector robustness. While padding and payload injections can trick some malware detectors, these methods can often be mitigated by removing the padded bytes before classification.
This study evaluates the robustness of ML-based IoT malware detection systems against adversarial attacks, focusing on structural detectors due to their prominence in IoT environments. We introduce a novel semantic-preserving black-box adversarial attack on IoT structural detectors. A multi-structural substitute detector is trained on a large IoT dataset using CFG and FCG graphical features, with Explainable AI guiding binary-level manipulations to induce misclassification. Advanced binary diversification methods—function inlining, branch function insertion, control flow graph flattening, basic block merging, and basic block reordering—are used to modify malware binaries at both the basic block and function levels, successfully evading detection. To our knowledge, this is the first use of these techniques in adversarial attacks on ML-based malware detection. The generated adversarial examples demonstrate high transferability, evading detection by four structural detectors, several commercial antivirus engines, and a recent IoT adversarial detector. Our main contributions are summarized below.
We introduce a novel black-box functionality-preserving adversarial attack to evaluate the robustness of ML-based structural IoT malware detectors. Our approach employs advanced binary diversification techniques, such as function inlining, branch function insertion, control flow graph flattening, basic block merging, and basic block reordering, to modify malware samples and evade detection. Unlike common methods like payload injection and padding, our strategy does not leave obvious signatures, making it more challenging to defend against.
We compile a comprehensive IoT dataset containing over 248,000 Executable and Linkable Format (ELF) binary files from various CPU architectures, including benign and malicious samples from diverse IoT malware families, for our experiments. We then train a multi-structural substitute detector, utilizing both CFG and FCG graphical features, achieving high detection rates of up to 98.27%.
Leveraging SHAP (SHapley Additive exPlanations) analysis [20], we execute the attack on the substitute detector, generating practical adversarial examples with minimal attack cost. These samples exhibit high transferability, evading four detectors [8], [9], [12], [21] trained on different structural features, with evasion rates up to 100% and an average binary size increase of just 8.35%. Additionally, the adversarial samples evade a recent IoT adversarial detector [15] and several commercial antivirus engines.
The remainder of the paper is organized as follows: Section II covers related work and background information, Section III presents the proposed methodology, Section IV discusses the experimental results and analysis, and Section V concludes the study.
Background Information and Related Work
This section reviews background information and related work, including ML-based malware detection, a literature review of adversarial attacks on malware detection, and binary diversification techniques.
A. Machine Learning Malware Detection
Malware detection is critical across various computing platforms, including Windows, Android, and IoT. Considerable efforts have been devoted to effectively detecting malware. Traditional approaches, rooted in signature-based methods, rely on extensive databases of known malware signatures. When a suspicious file is encountered, its signature is compared against those stored in the database. However, this method’s reliance on predefined signatures renders it ineffective against novel and unknown malware variants and inadequate for emerging cybersecurity threats. To overcome these limitations, and inspired by the success of machine learning in other domains, ML models have been adapted for malware detection, demonstrating strong generalization capabilities for identifying new and unseen (zero-day) malware variants [22]. ML-based malware detection comprises three main steps: data collection, feature engineering, and model training and evaluation.
1) Data Collection
This step involves collecting and labeling sufficient malware and benign samples. Labeling is typically done using malware analysis services like VirusTotal, which aggregates verdicts from about 70 modern antivirus engines [23]. However, detection results for the same file may vary across engines. To address this, either the verdict of the most recognized antivirus engine is adopted, or a voting-based approach is used.
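As an illustration, such a voting rule can be sketched as follows; the report structure and field names below are hypothetical stand-ins for a parsed VirusTotal response, not its actual API schema:

```python
from collections import Counter

def label_sample(vt_report: dict) -> str:
    """Majority-vote labeling from a parsed VirusTotal-style report,
    mapping engine name -> {"category": ..., "result": ...}."""
    verdicts = list(vt_report.values())
    malicious = [v for v in verdicts if v.get("category") == "malicious"]
    if len(malicious) <= len(verdicts) // 2:      # no majority: benign
        return "benign"
    # Naive family vote over the raw detection strings.
    names = [v["result"].lower() for v in malicious if v.get("result")]
    family = Counter(names).most_common(1)[0][0] if names else "unknown"
    return f"malware:{family}"
```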
2) Feature Engineering
As machine learning models only operate on numeric inputs, feature engineering is a pivotal step in ML malware detection. It involves extracting intrinsic features from the collected files and converting them into corresponding numeric representations, which are then used to train the models to distinguish between benign and malicious files. In malware detection, features fall into three categories based on their extraction method: static, dynamic, and hybrid [3], [24].
Static features, derived directly from samples without the need for execution, are widely employed in malware detection due to their ease of extraction and effectiveness. For instance, printable strings [1], [11], [24], byte sequences [25], [26], PE/ELF headers [5], [27], and grayscale images [3], [22], [28], [29] have proven effective in detecting Windows, Android, and IoT malware.
Dynamic features involve executing binaries in isolated environments like virtual machines or sandboxes and monitoring runtime statuses of system resources, networks, registries, and files. Metrics such as CPU usage, I/O requests, and memory usage are then used to train malware detectors [30], [31]. File status features, obtained through counting and logging of created, deleted, modified, or accessed files, have also proven effective in malware detection [31].
Hybrid features are extracted through a combination of static and dynamic analysis methods. Hybrid features such as opcodes (n-gram sequences, images, frequencies, etc.) [11], [28], [29], function call graphs (FCGs) [8], [10], control flow graphs (CFGs) [9], [11], and API/system calls (sequences, lists, graphs, etc.) [1], [4] have been successfully utilized in malware detection.
3) Model Training and Evaluation
After extracting numeric features, selecting a suitable machine-learning model for malware detection is crucial. Numerous algorithms, including Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs) [9], [29], Long Short-Term Memory (LSTM) networks, Multi-Layer Perceptrons (MLPs) [8], Graph Neural Networks (GNNs) [12], Support Vector Machines (SVMs) [1], Random Forests (RFs) [8], and Decision Trees (DTs) [27], have been proposed and rigorously evaluated for malware detection. These models exhibit varying success rates depending on different experimental setups and parameter settings.
B. Adversarial Attacks on Malware Detection
Despite recent advancements, ML-based malware detection systems remain inherently vulnerable to adversarial attacks that seek to undermine their decision-making processes [13], [40]. These attacks can be categorized based on the attacker’s space and knowledge level. The attacker’s space categorization includes feature-space attacks, which involve modifications to the input features, and problem-space attacks, which entail modifying real-world inputs like binary executables or source code to deceive the target detector. Based on the attacker’s knowledge, adversarial attacks can be categorized as white-box or black-box attacks. In white-box attacks, the attacker has complete knowledge of the target model, while in black-box attacks, adversaries typically have minimal information, usually only the model’s prediction output. Gray-box attacks fall between these two extremes, with varying levels of knowledge.
From these categorizations, four fundamental types of adversarial attacks are identified in the existing literature and discussed below. While this paper focuses on adversarial attacks in IoT malware detection, this section will also cover related attacks on Windows and Android platforms to provide a comprehensive overview of the relevant literature.
1) Feature-Space White-Box Adversarial Attacks
Esmaeili et al. [15] propose a structural attack on CFG-based IoT malware detectors, similar to the GEA and SGEA frameworks by Abusnaina et al. [13], [14]. Their approach merges control flow graphs (CFGs) from benign samples with target malware CFGs to create adversarial CFGs intended for a graph neural network (GNN)-based detector. They then train an adversarial detector to recognize benign CFG properties and filter out adversarial CFGs before classification.
In another attack on IoT malware detection, Ngo et al. [19] propose a reinforcement learning-based method that performs adversarial attacks on PSI (printable string information) graphs by adding dummy vertices and edges to deceive detectors. They counter these attacks with adversarial retraining.
Kreuk et al. [32] and Suciu et al. [33] successfully execute an adversarial attack against MalConv [7], a prominent raw byte-based Windows malware detector. Kreuk et al. utilize the Fast Gradient Sign Method (FGSM) to append adversarial payloads to the end of the file (append-FGSM) and into the slack regions of the sample (slack-FGSM). Suciu et al. [33] extend this approach by comparing slack-FGSM and append-FGSM, observing that slack-FGSM is more effective than append-FGSM.
Al-Dujaili et al. [35] and Verwer et al. [34] employ FGSM for white-box adversarial attacks on API Call List-based PE malware detectors. These attacks alter the malware’s binary feature vector by flipping bits in the feature space. Verwer et al.’s attack dynamically adjusts the flipped bits based on solution quality, effectively evading detection by adding irrelevant API calls.
Other attacks, such as ATMPA [37], COPYCAT [18], and AMAO [36] are aimed at image-based detectors. In ATMPA, Liu et al. [37] initially convert malware into a grayscale image and then utilize FGSM and C&W to generate adversarial examples. Similarly, COPYCAT by Khormali et al. [18] employs generic adversarial attacks to generate an adversarial image, which is subsequently appended to the original malware image. Park et al. [36] propose the AMAO adversarial attack, wherein a non-executable adversarial image is first generated using off-the-shelf adversarial attacks. They then attempt to maintain functionality by inserting semantic NOPs into the original malware, making it as similar as possible to the generated non-executable adversarial image.
2) Problem-Space White-Box Attacks
Abusnaina et al. [13] introduce Graph Embedding and Augmentation (GEA), a structural adversarial attack on CFG-based IoT malware detectors. GEA induces misclassification by inserting a benign code into the target malware sample, directly modifying its CFG. Subsequently, they propose Sub-GEA (SGEA) [14], which reduces the required embedded graph size for misclassification.
In another study, Abusnaina et al. [17] evaluate the robustness of various machine-learning IoT malware detectors against simple functionality-preserving modifications, such as padding, packing, and stripping. Their findings confirm that these detection systems remain largely vulnerable to such manipulations.
Sandor et al. [16] propose two adversarial strategies for IoT byte-based malware detection: Chunker, which appends chunks of malware to itself, and Disguiser, which embeds malware in benign files. The generated adversarial examples are then used to retrain and harden the target detector.
Kolosnjaji et al. [39] introduce AMB (Adversarial Malware Binary), a gradient-based attack specifically tailored for PE byte-based malware detectors such as MalConv [7]. This method involves appending adversarial bytes, generated via gradient descent, to the end of the original malware binary. Aryal et al. [40] similarly apply gradient-based methods to generate adversarial examples by injecting code into intra-section caves, successfully evading the MalConv detector [7].
Demetrio et al. [38] employ the integrated gradient explainability technique to assess the feature importance of MalConv detector [7]. Realizing MalConv’s reliance on PE header features, they then perform a white-box attack by modifying specific bytes in the PE binary’s DOS header, successfully evading detection.
Implementing two functionality-preserving modifications, Shift and Extend, Demetrio et al. [41] develop the RAMEn attack framework against the MalConv detector. By shifting the content of the first section of the PE file and extending the DOS header, the authors inject a carefully crafted adversarial payload, successfully evading detection.
Sharif et al. [42] introduce functionality-preserving binary diversification techniques for adversarial attacks on malware detection to enhance attack effectiveness and stealthiness. They employ code displacement and in-place randomization to conduct a white-box attack using gradient ascent, ultimately achieving high evasion rates.
Zhao et al. [43] introduce the Heuristic Optimization Integrated Reinforcement Learning Attack (HRAT), a code-level structural attack against graph-based Android malware detection. HRAT involves subtle modifications to Function Call Graphs (FCGs), including node deletion, insertion, and edge manipulation.
3) Feature-Space Black-Box Attacks
Hu and Tan [44] propose the MalGAN attack against an API call list-based PE malware detector in a black-box setting by training a substitute model. Adversarial examples are generated by appending irrelevant API calls to the original malware samples. Kawai et al. [45] extend MalGAN to Improved-MalGAN, addressing limitations of the original version by using different API call lists to train MalGAN and the substitute detector.
In another study, Hu and Tan [46] devise a generative model to evade RNN-based PE malware detectors. They generate spurious API call sequences using a generative RNN and insert them into the API call sequence of the original malware. A similar strategy is employed by Rosenberg et al. [47] in an attack named GADGET, which targets detectors trained on API call sequences. Utilizing the transferability property, GADGET first trains a surrogate model, conducts a white-box attack, and then heuristically uses the generated adversarial API call sequences to evade the target detector. Subsequently, Rosenberg et al. [48] propose a similar attack framework named BADGER, which limits the number of queries made to the target detector.
In [49], Zhang et al. introduce SRL, a functionality-preserving reinforcement learning-based attack against graph-based (CFG) PE malware detectors. This attack employs a reinforcement learning agent to iteratively select semantic NOPs for insertion into the CFG blocks of the original malware until the generated adversarial samples successfully evade the target detector.
4) Problem-Space Black-Box Attacks
This category represents the most realistic and challenging adversarial attacks, as they are completely agnostic to specific malware detectors. Black-box attacks in the problem space often use strategies like heuristic algorithms, evolutionary algorithms, reinforcement learning, and GANs. For example, Anderson et al. [26], [50] employ reinforcement learning in their Gym-malware attack framework to automatically generate functionality-preserving adversarial examples that deceive static malware detectors and antivirus engines. Gym-malware’s success inspires further research [51], [62], [63]. Some studies expand the action space [51], while others reduce it and use deterministic sequence selection to improve effectiveness and stealth [62], [63].
Castro et al. [55] introduce ARMED (Automatic Random Malware Modifications), which employs random algorithms to apply nine functionality-preserving modifications from [50] and [26] to malware samples until evasion is achieved. They assess the functionality of the resulting adversarial samples using the Cuckoo sandbox. Similarly, Chen et al. [64] generate adversarial examples by randomly appending blocks of data from benignware to malware, successfully evading the MalConv [7] detector.
Some attacks use evolutionary algorithms, such as AIMED by Castro et al. [53], which applies nine format-preserving modifications using genetic programming. AIMED iteratively modifies the malware binary, reaching evasion roughly 50% faster than random modification. Similarly, MDEA [54] uses a genetic algorithm to generate adversarial examples with ten functionality-preserving modifications, while GAMMA [38] employs the same strategy to modify malware files through section injection and padding.
Yuan et al. [56] propose the GAPGAN framework, which utilizes Generative Adversarial Networks (GANs) to deceive the MalConv [7] detector. The framework trains a generator and discriminator to create adversarial payloads appended to malware samples. The discriminator simulates a black-box attack, achieving up to a 100% evasion rate. Similarly, Zhong et al. [57] develop MalFox using a Convolutional GAN to generate adversarial samples that preserve the original functionality of malware and evade detection by antivirus engines.
Lucas et al. [58] introduce a black-box adversarial attack using binary diversifications, such as in-place randomization and code displacement. Unlike the white-box version [42], this method uses a hill-climbing algorithm and accepts transformations only if the benign probability increases after querying the model.
Chen et al. [52] extract and modify APK source code by injecting non-executable code and repackaging it, altering features like permissions, API calls, and CFG structure to evade detection. Similarly, Bostani et al. [61] use payload injection in malware samples to deceive Android malware detectors.
From the reviewed literature, it is evident that few adversarial attacks specifically target IoT malware detection. Most rely on padding and code injection methods, which can be easily identified and filtered before classification. In many studies, these attacks are conducted in the feature space and assume white-box access, which is less realistic in real-world scenarios. As discussed above, only two papers on PE malware detection have explored binary diversification in this context [42], [58]. When implemented correctly, binary diversification preserves the original functionality of the binary while modifying functional parts, making it stealthier and more challenging to defend against. Therefore, we employ binary diversification to manipulate the structural properties of binaries and evade detection.
C. Binary Diversification Techniques
Binary diversification, designed to enhance security against attacks like code reuse, injection, and memory corruption [65], [66], [67], involves creating multiple program versions with identical functionality. Lucas et al. [58] and Sharif et al. [42] pioneered its application in adversarial contexts, using semantic-preserving modifications like in-place randomization and code displacement to bypass raw byte-based PE malware detection. Building on this, we propose an attack framework that uses advanced binary diversification to evade IoT graph-based malware detectors. Unlike Lucas et al.’s method, which relies on instruction-level changes, our approach incorporates structural binary manipulations, such as function inlining, branch function insertion, control flow graph flattening, basic block merging, and basic block reordering [65], [66] (discussed in Section III-C). Additionally, we leverage explainable AI (XAI) and a greedy algorithm, differentiating our strategy from Lucas et al.’s [58] use of reinforcement learning.
Proposed Method
In this section, we present the proposed attack framework, detailing the system model, feature importance analysis, action set, and adversarial example generation algorithm. Figure 1 illustrates the workflow, consisting of four modules that will be discussed in detail later in this section.
Fig. 1. The proposed attack framework: AE denotes Adversarial Example, ‘BB_’ prefixes indicate CFG-based features, and ‘F_’ prefixes represent FCG-based features.
A. System Model
1) Threat Model
Our attack scenario assumes the adversary has black-box access to the target detector, meaning they can only receive the prediction confidence that a file is benign or malicious after querying the model. The goal is to use binary diversification to modify malware samples in the problem space until they are misclassified as benign by the target structural detector while preserving their malicious functionality. With limited black-box access, we build a multi-structural substitute detector trained on control flow graph and function call graph features, execute the attack, and transfer it to the target detector.
2) Problem Formulation
In this paper, ML-based malware detection is modeled as follows. Let $\mathcal{X}$ denote the problem space of executable binaries and $\mathcal{Z}$ the corresponding feature space. A feature extraction function $\tau$ maps each binary to a structural feature vector:
\begin{equation*} \tau : \mathcal {X} \mapsto \mathcal {Z} \subseteq \mathbb {R}^{n}. \tag {1}\end{equation*}
The attacker seeks a functionality-preserving problem-space modification $\delta$ of a malware sample $x \in \mathcal{X}$ such that the modified sample $\tilde{x} = x + \delta$ yields a feature vector that the detector classifies as benign:
\begin{align*} & \tilde {z} = \tau (x + \delta) = \tau (\tilde {x}), \\ & \tilde {z} \in \mathcal {Z}, \text { and }~ \tilde {x} \in \mathcal {X}. \tag {2}\end{align*}
To effectively execute the attack, we employ the SHAP [20] algorithm from explainable AI (XAI) to identify the most influential features for the detector $\mathbb{D}$. For each feature vector $z_{i}$, the SHAP explainer $\gamma$ assigns an importance weight to each of the $m$ features:
\begin{equation*} \gamma (\mathbb {D}, z_{i}) = w_{i} = \left [{{w_{i,0}, w_{i,1}, \cdots, w_{i,j}, \cdots, w_{i,m} }}\right ], \tag {3}\end{equation*}
where $w_{i,j}$ quantifies the contribution of feature $j$ to the prediction for sample $i$. A feature $j$ is deemed influential if its mean absolute SHAP value over the $n$ samples is at least the average of that quantity across all $m$ features:
\begin{equation*} \frac {1}{n}\sum _{i=0}^{n-1} \left |{{w_{i,j}}}\right | \geq \frac {1}{m}\sum _{k=0}^{m-1} \left ({{\frac {1}{n}\sum _{i=0}^{n-1} \left |{{w_{i,k}}}\right |}}\right). \tag {4}\end{equation*}
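As a concrete illustration, the selection rule in (4) reduces to a few lines of NumPy, assuming the SHAP values are available as an n-by-m matrix (e.g., as produced by shap.TreeExplainer for the RF model):

```python
import numpy as np

def select_influential_features(shap_values: np.ndarray) -> np.ndarray:
    """Eq. (4): keep feature j if its mean |SHAP| value over all samples
    is at least the average of that quantity across all features.

    shap_values: shape (n_samples, m_features), entries w[i, j].
    Returns the indices of the selected (influential) features."""
    per_feature = np.abs(shap_values).mean(axis=0)  # (1/n) * sum_i |w_ij|
    return np.flatnonzero(per_feature >= per_feature.mean())
```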
Next, we apply binary-level modifications that specifically target these influential features to deceive the detector into classifying a malicious file as benign. This approach focuses on manipulating the most critical features. Our attack strategy is designed to work seamlessly within the problem space, preserving the original functionality of the sample while enhancing the attack’s imperceptibility.
3) Feature Set
To train the substitute detector, we extract structural features at both the basic block and function levels. Using Radare2 [69], we derive function call graphs (FCGs) from all training binaries and compute various graph properties with NetworkX [70], including nodes, edges, density, connected components, reciprocity coefficient, and the minimum, maximum, and mean values of closeness centrality, betweenness centrality, degree centrality, and shortest path. For basic block-level features, we use the Angr framework [68] to extract control flow graphs (CFGs) and compute the same set of graphical features as for FCGs. In total, we generate 34 features from both CFGs and FCGs (see Table 2) to train the substitute detector, referred to as a multi-structural detector. Preliminary experiments indicate that training on both CFG and FCG features yields a more robust detector compared to training on either feature set alone.
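As an illustration, the FCG side of this pipeline can be sketched as follows. The sketch assumes radare2's `agCj` global call-graph command, whose JSON schema varies across radare2 versions, and shows only a subset of the 34 properties with hypothetical feature names; the 'BB_' features are computed analogously from the Angr CFG:

```python
import networkx as nx
import r2pipe

def fcg_features(binary_path):
    """Extract an FCG with radare2 and compute a subset of the 'F_'
    graph properties used to train the substitute detector."""
    r2 = r2pipe.open(binary_path, flags=["-2"])   # -2: quiet stderr
    r2.cmd("aaa")                                  # full analysis pass
    callgraph = r2.cmdj("agCj") or []              # global call graph
    r2.quit()

    g = nx.DiGraph()
    for node in callgraph:
        g.add_node(node["name"])
        for callee in node.get("imports", []):     # outgoing calls
            g.add_edge(node["name"], callee)

    closeness = list(nx.closeness_centrality(g).values()) or [0.0]
    return {
        "F_nodes": g.number_of_nodes(),
        "F_edges": g.number_of_edges(),
        "F_density": nx.density(g),
        "F_components": nx.number_weakly_connected_components(g),
        "F_reciprocity": nx.reciprocity(g) if g.number_of_edges() else 0.0,
        "F_closenessCent_min": min(closeness),
        "F_closenessCent_max": max(closeness),
        "F_closenessCent_mean": sum(closeness) / len(closeness),
    }
```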
B. Feature Importance Analysis
The foundation of our imperceptible adversarial attacks is depicted in part (a) of Figure 1. After training the substitute detector, we use the SHAP [20] technique to analyze feature importance and understand the correlation between each feature and the model’s prediction results. Figure 2 shows the distribution of SHAP values for the top 20 features, enabling an intuitive analysis of the predictions. Each row represents the distribution of a feature’s SHAP values across all samples, with higher-ranked features having more influence. The color intensity of each point, representing a test dataset sample, indicates its corresponding SHAP value. We select the top 12 influential features, six FCG-based (‘F_’ prefix) and six CFG-based (‘BB_’ prefix), as targets for modification using the binary diversification techniques.
Fig. 2. The SHAP value distribution of the top 20 features on the testing dataset when label = malicious (RF).
The SHAP analysis results, illustrated in Fig. 2, provide several key insights. Notably, the model is more likely to classify a sample as malicious when the top-ranked FCG features, among them the centrality-based measures, take values that deviate from those typical of benign binaries. At the basic block level, features such as BB_density exert a comparable influence on the prediction, confirming that both graph views contribute to the detector’s decisions.
C. Action Set
Based on the feature importance analysis discussed above, we develop five functionality-preserving modifications to alter the binary structure at both the basic block and function levels. These modifications utilize binary diversification techniques originally proposed to protect against code-reuse attacks and similar threats [65], [66], [67]. When implemented correctly, these techniques preserve binary semantics, as demonstrated by Wang et al. [67], who generated diverse ELF binary versions using various diversification methods. To mislead structural target detectors, we adopt several techniques employed by Wang et al. [67], including function inlining, branch function insertion, control flow graph flattening, basic block merging, and basic block reordering, which are detailed below.
1) Function Inlining
Function inlining is an optimization technique used in compilers. It involves replacing function calls with the body of the called function (callee) at the call site. To do this, the call instructions are replaced by jump and push instructions to maintain the original semantics. In each iteration, we randomly select a function, excluding the main function, and inline it at its direct call sites if its size is less than 300 bytes. For each inlined function, its return instruction is changed to a jump instruction, targeting the instruction adjacent to the original call site in the caller function. This transformation significantly alters the structure of the function call graph by reducing the number of edges and nodes, thus reducing the values of the node- and edge-related FCG features.
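For illustration, the following simplified sketch performs this rewrite over a textual-assembly representation. This is a hypothetical simplification: each function is a list of instruction strings, callee-internal labels are assumed unique per splice, and relocation and address fix-ups are delegated to the disassembly-reassembly tool [71]:

```python
def inline_callee(caller, callee, callee_name):
    """Splice the callee body at each direct call site; the callee's
    `ret` becomes a jump back to the instruction after the call."""
    out, site = [], 0
    for ins in caller:
        if ins.strip() == f"call {callee_name}":
            resume = f".resume_{callee_name}_{site}"
            site += 1
            # The callee body replaces the call instruction.
            out += [f"jmp {resume}" if c.strip() == "ret" else c
                    for c in callee]
            out.append(f"{resume}:")  # execution resumes here
        else:
            out.append(ins)
    return out
```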
2) Branch Function Insertion
Branch function insertion (shown in Fig. 3) is a technique that substitutes jump instructions with function calls to a predefined “branch routine” function, redirecting the control flow to the original jump destination. In each iteration, we randomly select 1% of the jump instructions for conversion into function calls. These calls are directed to simple functions that reroute the flow to the original destination addresses of the jump instructions. This modification, while having minimal impact on binary size and runtime performance, significantly alters the binary’s structural properties by increasing the number of nodes and edges, thereby achieving the desired effect on the node- and edge-related features.
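A minimal sketch of this rewrite, again over a hypothetical textual-assembly representation; the `lea` idiom is chosen because, unlike `add`, it discards the pushed return address without clobbering the CPU flags, which may still be live at the original jump site:

```python
import random

def insert_branch_functions(instrs, rate=0.01, rng=random.Random(0)):
    """Rewrite a fraction of direct `jmp <target>` instructions as calls
    to generated branch routines that reroute control flow to the
    original destination. Returns (rewritten code, new routines)."""
    out, routines = [], []
    for ins in instrs:
        parts = ins.split()
        if len(parts) == 2 and parts[0] == "jmp" and rng.random() < rate:
            name = f"__branch_{len(routines)}"
            routines.append([
                f"{name}:",
                "  lea rsp, [rsp + 8]   ; drop return address, flags intact",
                f"  jmp {parts[1]}",    # continue at the original target
            ])
            out.append(f"call {name}")
        else:
            out.append(ins)
    return out, routines
```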
3) Control Flow Graph (CFG) Flattening
This method, as shown in Fig. 4, transforms a function’s control flow graph into a “switch” structure using dispatcher blocks to redirect execution flow while preserving the program’s functionality [65], [66]. In this study, we avoid obfuscating functions with indirect jumps due to the complexity of determining control flow destinations. Given the high computational cost of CFG flattening, we adopt a conservative approach by flattening only a small, randomly selected subset of functions. Specifically, in each iteration, we randomly select 1% of functions without indirect jumps for CFG flattening. This modification significantly alters the structure of a function’s CFG and achieves the desired effects on the targeted basic block-level (BB_) features.
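The sketch below conveys the idea for the simple case of single-successor blocks. The representation is hypothetical (`blocks` maps a label to its instructions and sole successor), and a full pass must also rewrite conditional branches to update the state variable:

```python
def flatten_cfg(blocks, entry):
    """Emit a flattened layout: every block stores its successor's ID in
    a state variable and jumps back to a central dispatcher that
    re-routes execution, turning the CFG into a 'switch' structure."""
    ids = {label: i for i, label in enumerate(blocks)}
    out = [f"  mov dword [state], {ids[entry]}", "dispatch:"]
    for label, i in ids.items():          # dispatcher: state -> block
        out += [f"  cmp dword [state], {i}", f"  je {label}"]
    for label, (body, succ) in blocks.items():
        out.append(f"{label}:")
        out += [f"  {ins}" for ins in body]
        if succ is None:
            out.append("  ret")           # exit block
        else:
            out += [f"  mov dword [state], {ids[succ]}", "  jmp dispatch"]
    return out
```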
4) Basic Block Merging
Basic block merging consolidates two basic blocks into one, adjusting flow control instructions to preserve semantics. In each iteration, we randomly select five pairs of basic blocks for merging. Each pair must belong to the same function and be directly connected with exactly one incoming and one outgoing connection. This process significantly alters the binary’s structure at the basic block level without introducing significant overhead in size or performance. Block merging achieves the desired outcome of reducing the values of node- and edge-related basic block (BB_) features.
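Eligible pairs can be identified directly on the extracted CFG; a minimal NetworkX sketch follows, where `func_of` is a hypothetical mapping from each basic block to its enclosing function:

```python
import networkx as nx

def mergeable_pairs(cfg: nx.DiGraph, func_of: dict):
    """Yield (u, v) pairs eligible for merging: v is u's only successor,
    u is v's only predecessor, and both lie in the same function.
    Merging then appends v's instructions to u and drops the
    intervening control transfer."""
    for u, v in list(cfg.edges()):
        if (cfg.out_degree(u) == 1 and cfg.in_degree(v) == 1
                and func_of[u] == func_of[v]):
            yield u, v
```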
5) Basic Block Reordering
Basic block reordering involves changing the relative positions of two or more basic blocks. To maintain functionality, additional control transfers are introduced, which increases the number of edges in the control flow graph. During each iteration, we examine functions with more than three basic blocks and randomly adjust the positions of a selected pair. While this modification increases the number of edges and achieves the desired effects on edge-related CFG features, the added control transfers introduce only a modest size overhead.
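A compact sketch of the reordering step, under a hypothetical representation of a function as an ordered list of `(label, instructions)` pairs; fall-through edges are made explicit before the swap so control flow is preserved:

```python
import random

def reorder_blocks(blocks, rng=random.Random(0)):
    """Swap the physical positions of two basic blocks, adding explicit
    jumps so each block still reaches its original fall-through
    successor (hence the edge-count increase noted above)."""
    if len(blocks) <= 3:
        return blocks                 # only longer functions qualify
    patched = []
    for idx, (label, body) in enumerate(blocks):
        body = list(body)
        last = body[-1].split()[0] if body else ""
        if idx + 1 < len(blocks) and last not in ("jmp", "ret"):
            body.append(f"jmp {blocks[idx + 1][0]}")  # explicit fall-through
        patched.append((label, body))
    i, j = rng.sample(range(len(patched)), 2)
    patched[i], patched[j] = patched[j], patched[i]
    return patched
```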
D. Adversarial Example (AE) Generation Algorithm
In this subsection, we present the details of the algorithm behind the attack framework depicted in Fig. 1, part (c). The target detector is accessed strictly as a black box: each query returns only the prediction confidence that the submitted sample is malicious or benign.
In each iteration, we start with an ELF malware binary and transform it from the problem space x to the feature space z using the transformation function τ defined in (1). Guided by the SHAP-based feature ranking, we then apply a randomly selected transformation from the action set, reassemble the binary, and query the substitute detector. A modification is retained only if it lowers the malicious prediction confidence; otherwise, it is discarded. This greedy process repeats until the sample is misclassified as benign or a maximum number of iterations is reached.
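A condensed sketch of this greedy loop is given below. Helper names such as `extract_features` and the action callables are hypothetical stand-ins for the feature extraction of Section III-A3 and the action set of Section III-C, and class 1 is assumed to be the malicious label:

```python
import random

def generate_adversarial_example(binary, detector, actions,
                                 extract_features, max_iters=200):
    """Greedy black-box AE generation: apply a functionality-preserving
    transformation, keep it only if the substitute detector's malicious
    confidence drops, and stop once the sample is classified benign."""
    best = binary
    best_conf = detector.predict_proba([extract_features(best)])[0][1]
    for _ in range(max_iters):
        if best_conf < 0.5:                  # classified benign: success
            return best, True
        action = random.choice(actions)      # e.g., branch insertion
        candidate = action(best)             # modify and reassemble
        conf = detector.predict_proba([extract_features(candidate)])[0][1]
        if conf < best_conf:                 # greedy accept
            best, best_conf = candidate, conf
    return best, best_conf < 0.5
```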
It is noteworthy that the disassembly, modification, and reassembly of binaries require careful handling to mitigate potential errors. In our implementation, we utilize an open-source disassembly-reassembly tool proposed by Wang et al. [71], which is specifically designed for the automatic disassembly of executables in a manner that supports their subsequent reassembly into functional binaries.
Experimental Results and Analysis
This section presents the experimental results and analysis. It begins with an overview of the dataset used in our experiments, followed by a detailed evaluation of the detection results for the substitute detector and the four structural IoT detectors [8], [9], [12], [21] used to assess the proposed attack. Next, the efficacy of the structural attack is examined, followed by a transferability analysis of the generated adversarial examples against these four IoT detectors, an adversarial detector [15], and commercial antivirus engines.
A. Dataset
To evaluate the effectiveness of the proposed attack framework, a large-scale dataset comprising 248,276 IoT Executable and Linkable Format (ELF) binary files representing diverse CPU architectures, including x86-64, x86, ARM, SPARC, PowerPC, and MIPS, was compiled. Sample labeling was conducted using VirusTotal [23], leveraging its extensive database of over 70 antivirus software vendors. The final classification of samples was determined by a majority voting criterion based on the VirusTotal detection report, establishing both the class label and the specific malware family associated with each malicious sample.
The dataset comprised 115,823 benign and 132,453 malware IoT ELF files spanning different families, including Mirai, Android, Tsunami, Bashlite, Hajime, Dofloo, Xorddos, and Pnscan. Mirai emerged as the predominant family, underscoring its prevalence within the IoT domain. The dataset was split, with 80% designated for the training set and 20% for the test set.
B. IoT Malware Detection
1) Multi-Structural Substitute Detector
Upon data preparation, we built a multi-structural detector to serve as our substitute detector in the proposed black-box attack. This detector was trained on a comprehensive set of 34 features extracted from both the FCGs and CFGs of the IoT ELF binaries, as explained in Section III-A3. We trained four ML models, namely Random Forest (RF), K-Nearest Neighbors (KNN), Deep Neural Networks (DNN), and Support Vector Machines (SVM), achieving accuracy scores ranging from 95.61% to 98.24%. Detailed results are presented in Table 3.
2) Alasmary et al. [9] Malware Detector
To implement the malware detector proposed by [9], we utilized r2pipe, a Radare2 Python API, to extract the FCGs from the binaries [69]. Subsequently, we employed NetworkX [70] to compute various graphical properties of the FCGs as proposed by [9]. In total, 23 features were extracted and used to train RF, KNN, DNN, and SVM machine learning models. We obtained detection results ranging from 87.01% to 97.09%, as detailed in Table 4.
3) Gramac Malware Detector [21]
We also implemented the structural malware detector proposed by [21] to further assess the efficacy of the proposed attack. This detector is based on the caller-callee relationships of sensitive API calls. Specifically, we used radare2 to extract API call graphs and subsequently employed NetworkX to extract various graphical features. Seven features—number of nodes, edges, indegree, outdegree, loops, connected components, and parallel edges—were used to train RF, KNN, DNN, and SVM models. The detection results range from 86.16% to 97.42%, as presented in Table 4.
4) Wu et al. [8] Malware Detector
We implemented the malware detector proposed by [8] to further evaluate our proposed structural attack. This detector leverages structural features such as nodes, edges, and density, as well as graph embedding features extracted using Graph2Vec. It enhances function-call graphs by unifying user-defined functions (UDFs) through matching opcode sequences and assigning universal identifiers. RF, KNN, MLP, and SVM models were trained, yielding impressive results, as detailed in Table 4.
5) Li et al. [12] Malware Detector
We also retrained the Graph Neural Network (GNN)-based malware detector proposed by Li et al. [12]. This method integrates semantic information from Opcodes with structural information from function call graphs through three modules: an instruction-level module for semantic extraction, a structure-level module using GraphSAGE for graph embeddings, and a classification module with a Multi-Layer Perceptron (MLP) for malware detection. This detector achieved an accuracy of 98.98%, precision of 98.03%, recall of 98.88%, and F1 score of 98.77%.
C. Diversification-Based Adversarial Attack
To evaluate the effectiveness of the proposed attack, we assembled a test set of 544 IoT ELF malware binaries from the x86 CPU architecture. Using the pre-trained substitute detector discussed previously, we generated adversarial examples with Algorithm 1 across Random Forest (RF), K-Nearest Neighbors (KNN), Deep Neural Networks (DNN), and Support Vector Machines (SVM) models. With a minimal attack cost, defined by the number of iterations and the percentage change in binary size, we produced effective adversarial examples. Specifically, the average percentage changes in the size of the modified binaries for KNN, RF, DNN, and SVM are 8.35%, 12.61%, 15.84%, and 22.51%, respectively. Figures 5 and 6 illustrate the variation in evasion rates with changes in the number of iterations and binary size, respectively.
To assess the robustness of each model, we tested the effectiveness of adversarial examples generated by one model on other models within the substitute detector. Our results show that the Support Vector Machine (SVM) model, with the lowest detection rate, is the most robust against adversarial examples from other models. In contrast, despite having high detection rates, the K-Nearest Neighbors (KNN) model is the least robust. Figure 7 illustrates the transferability of adversarial examples generated by one model to others within the substitute detector.
D. Transferability Analysis
1) Evading the Structural Malware Detectors
First, we tested the generated adversarial examples on the detector by Alasmary et al. [9], achieving high evasion rates of up to 99.07%. The SVM and DNN models proved more resilient compared to KNN and RF. Figure 8 shows a heatmap demonstrating how samples generated by the substitute detector models deceive the Alasmary et al. [9] detector.
We also evaluated our generated samples on the Gramac detector [21], achieving evasion rates of up to 100%, with the lowest being 51.88%. The results are detailed in Figure 9.
Similarly, the adversarial examples were successful on the Wu et al. [8] detector, achieving evasion rates of up to 100% with a minimum of 30.30%. Detailed results are shown in Figure 10.
The GNN-based detector by Li et al. [12] proved the most resilient compared to the other detectors. The adversarial examples generated by the SVM model were the most effective, attaining an evasion rate of 74% on the GNN-based detector. Samples generated by the RF, DNN, and KNN models achieved evasion rates of 62.01%, 53.79%, and 30.31%, respectively.
In further experiments, we evaluated how limiting the allowed change in binary size affects the evasion rate. Our results show that generating adversarial examples increases the binary size, potentially impacting performance. Consequently, we restricted the maximum allowable change in binary size to 30% and studied its effect on the evasion rate of the generated samples across the four structural malware detectors. The results, detailed in Table 5, indicate evasion rates exceeding 97% for some detectors.
2) Evading the Esmaeili et al. [15] Adversarial Detector
Esmaeili et al. [15] generated adversarial control flow graphs (CFGs) by merging the CFGs of selected benign IoT samples with those of the target malware. They then trained a GNN-based adversarial detector to learn the characteristics of benign CFGs, enabling it to identify and filter out adversarial CFGs before classification. We tested the CFGs of our generated adversarial examples on this detector to determine whether they would be flagged as adversarial. The adversarial detector did not flag our adversarial CFGs and misclassified 95.9% of them as benign.
3) Evading Commercial Antivirus Engines
To further evaluate the effectiveness of our attack approach, we submitted the generated adversarial examples to VirusTotal [23] and compared the detection reports with those of the original malware samples. The original malware samples were flagged as malicious by an average of 44.84 antivirus engines. In contrast, the adversarial samples generated by SVM, DNN, RF, and KNN were flagged by 29.00, 28.97, 29.35, and 29.21 engines, respectively. This indicates that more than 15 antivirus engines were deceived by our adversarial examples. Detailed results are shown in Figure 11.
E. Comparison With Existing Similar Work
Additionally, we compared the adversarial CFGs generated by Esmaeili et al. [15] with those of our generated examples. Esmaeili et al. employed an approach similar to GEA [13], focusing on feature-space manipulations rather than generating executable adversarial examples, and argued theoretically that such an attack could be implemented in the problem space. Our analysis shows that their adversarial CFGs introduced significantly more nodes, edges, and instructions than ours, leading to a substantial increase in binary size, as demonstrated in Figure 12.
Conclusion
Despite significant advancements, machine learning-based malware detection systems remain highly susceptible to adversarial attacks that disguise malware as benignware. This study evaluated the robustness of structural IoT malware detectors against such attacks through binary-level manipulations. We introduced a novel, functionality-preserving black-box attack that successfully deceived four structural detectors, an adversarial detector, and several commercial antivirus engines, achieving up to 100% evasion with minimal binary size increase. These findings underscore the urgent need for more resilient and adaptive cybersecurity defenses.
However, our study focused on structural IoT malware detectors, excluding other types of detectors that also merit investigation. Additionally, challenges in the disassembly-reassembly process led to failures with some malware samples. Future work will employ a more advanced disassembly-reassembly tool and expand the scope to assess the robustness of a broader range of detection systems. Furthermore, we plan to explore defense strategies against adversarial attacks on malware detection.