Journals & Magazines >IEEE Access >Volume: 12

An Adversarial Attack on ML-Based IoT Malware Detection Using Binary Diversification Techniques

This graphical abstract illustrates the proposed attack framework. The framework generates practical adversarial examples (AEs) by training a substitute detector using fe...

Abstract:

The integration of machine learning (ML) has revolutionized malware detection, enabling accurate identification of subtle distinctions between malware and benignware. As ...Show More

Metadata

Abstract:

The integration of machine learning (ML) has revolutionized malware detection, enabling accurate identification of subtle distinctions between malware and benignware. As the threat landscape continually evolves and new malware strains emerge, conventional signature-based detectors are becoming increasingly inadequate, leading to a growing reliance on ML-based detectors. However, ML-based detection systems are particularly vulnerable to adversarial attacks, where subtle alterations to input samples can deceive detectors into misclassifying malware as benignware, highlighting the need for robustness studies, as such misclassifications can lead to significant damage. To this end, we stage a black-box attack on IoT malware detection systems, specifically targeting structure-based detectors, which are predominant due to their ability to detect malware across diverse CPU architectures in IoT environments. Our strategy employs semantic-preserving binary diversification techniques, including function inlining, branch function insertion, control flow graph flattening, and basic block merging and reordering, to modify malware binaries and evade detection. We train a multi-structural substitute detector (based on a combination of control flow graph and function call graph features) on a large-scale dataset of IoT ELF binaries, achieving detection rates of up to 98.24%. Using explainable AI (XAI), we transfer the attack to four structural target detectors, achieving evasion rates of up to 100% on certain detectors, with an average binary size increase of just 8.35%. The modified samples evade detection by a state-of-the-art adversarial detector and several commercial antivirus engines, highlighting the persistent challenge of defending against adversarial threats and emphasizing the need for enhanced and multi-faceted defense mechanisms.

This graphical abstract illustrates the proposed attack framework. The framework generates practical adversarial examples (AEs) by training a substitute detector using fe...

Published in: IEEE Access ( Volume: 12)

Page(s): 185172 - 185186

Date of Publication: 09 December 2024

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2024.3513713

Funding Agency:

Contents

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

SECTION I.

Introduction

Machine learning (ML) has become integral to modern cybersecurity, marking a breakthrough in the detection of zero-day malware. The success of ML techniques has led cyber defense researchers and antivirus vendors to increasingly adopt these methods to address the evolving landscape of malware variants [1], [2]. ML-based malware detection essentially involves analyzing benign and malicious files, extracting features through static and dynamic analysis, and utilizing these features to train ML models [3], [4], [5], [6]. These detection systems have demonstrated high success rates in identifying both known malware and novel threats [7], [8]. However, ML systems are particularly vulnerable to adversarial attacks, where slight modifications to input samples can deceive detectors into misclassifying malware as benignware, posing severe cybersecurity risks.

Adversarial attacks pose even greater challenges in resource-constrained IoT systems. Machine learning-based IoT malware detection remains less developed compared to its Windows counterpart. The limitations in IoT environments, such as restricted computational resources and diverse CPU architectures, necessitate lightweight and efficient solutions, making direct adaptation of Windows-based techniques difficult [6]. To address these challenges, researchers in IoT malware detection have predominantly relied on structural features, such as control flow graphs (CFGs) and function call graphs (FCGs) [9], [10], [11]. These features are particularly effective for detecting malware across the diverse CPU architectures in IoT systems, as noted by Li et al. [12].

Similarly, adversarial attacks on IoT malware detection are still in their infancy compared to those targeting Windows systems. Most studies on adversarial attacks in the IoT domain focus on payload injections into malware samples and involve feature-space manipulations. For instance, [13], [14] embed graphs from benign samples into the malware CFGs to evade detection. Likewise, Esmaeili et al. [15] propose a GNN-based adversarial detector that involves merging CFGs from benign and malware samples to generate adversarial samples, learning the distribution of benign samples to filter out the adversarial ones. Sandor et al. [16] append extra bytes from malware and benign samples into malware binaries to evade detection, followed by adversarial training to harden the detector. Abusnaina et al. [17] demonstrate that most ML IoT malware detection approaches are vulnerable to simple manipulations like packing, stripping, and padding. Khormali et al. [18] introduce the COPYCAT attack, appending adversarial images to malware for IoT and Windows detection evasion. Ngo et al. [19] utilize reinforcement learning to modify PSI-graphs with dummy vertices and edges, followed by adversarial training to improve the detector robustness. While padding and payload injections can trick some malware detectors, these methods can often be mitigated by removing the padded bytes before classification.

This study evaluates the robustness of ML-based IoT malware detection systems against adversarial attacks, focusing on structural detectors due to their prominence in IoT environments. We introduce a novel semantic-preserving black-box adversarial attack on IoT structural detectors. A multi-structural substitute detector is trained on a large IoT dataset using CFG and FCG graphical features, with Explainable AI guiding binary-level manipulations to induce misclassification. Advanced binary diversification methods—function inlining, branch function insertion, control flow graph flattening, basic block merging, and basic block reordering—are used to modify malware binaries at both the basic block and function levels, successfully evading detection. To our knowledge, this is the first use of these techniques in adversarial attacks on ML-based malware detection. The generated adversarial examples demonstrate high transferability, evading detection by four structural detectors, several commercial antivirus engines, and a recent IoT adversarial detector. Our main contributions are summarized below.

We introduce a novel black-box functionality-preserving adversarial attack to evaluate the robustness of ML-based structural IoT malware detectors. Our approach employs advanced binary diversification techniques, such as function inlining, branch function insertion, control flow graph flattening, basic block merging, and basic block reordering, to modify malware samples and evade detection. Unlike common methods like payload injection and padding, our strategy does not leave obvious signatures, making it more challenging to defend against.
We compile a comprehensive IoT dataset containing over 248,000 Executable and Linkable Format (ELF) binary files from various CPU architectures, including benign and malicious samples from diverse IoT malware families, for our experiments. We then train a multi-structural substitute detector, utilizing both CFG and FCG graphical features, achieving high detection rates of up to 98.27%
Leveraging SHAP (SHapley Additive exPlanations) analysis [20], we execute the attack on the substitute detector, generating practical adversarial examples with minimal attack cost. These samples exhibit high transferability, evading four detectors [8], [9], [12], [21] trained on different structural features, with evasion rates up to 100% and an average binary size increase of just 8.35%. Additionally, the adversarial samples evade a recent IoT adversarial detector [15] and several commercial antivirus engines.

The remainder of the paper is organized as follows: Section II covers related work and background information, Section III presents the proposed methodology, Section IV discusses the experimental results and analysis, and Section V concludes the study.

SECTION II.

Background Information and Related Work

This section reviews background information and related work, including ML-based malware detection, a literature review of adversarial attacks on malware detection, and binary diversification techniques.

A. Machine Learning Malware Detection

Malware detection is critical across various computing platforms, including Windows, Android, and IoT. Considerable efforts have been devoted to effectively detecting malware. Traditional approaches, rooted in signature-based methods, rely on extensive databases of known malware signatures. When a suspicious file is encountered, its signature is compared against those stored in the database. However, this method’s reliance on predefined signatures renders it ineffective against novel and unknown malware variants and inadequate for emerging cybersecurity threats. To overcome these limitations, and inspired by the success of machine learning in other domains, ML models have been adapted for malware detection, demonstrating strong generalization capabilities for identifying new and unseen (zero-day) malware variants [22]. ML-based malware detection comprises three main steps: data collection, feature engineering, and model training and evaluation.

1) Data Collection

This step involves collecting and labeling sufficient malware and benign samples. Labeling is typically done using malware analysis tools like VirusTotal, which detects malware across about 70 modern antivirus engines [23]. However, detection results for the same file may vary across engines. To address this, either the most recognized antivirus engine is chosen, or a voting-based approach is used.

2) Feature Engineering

As machine learning models only operate on numeric inputs, feature engineering is a pivotal step in ML malware detection. It involves extracting intrinsic features from the collected files and converting them into corresponding numeric representations, which are then used to train the models to distinguish between benign and malicious files. In malware detection, features fall into three categories based on their extraction method: static, dynamic, and hybrid [3], [24].

Static features, derived directly from samples without the need for execution, are widely employed in malware detection due to their ease of extraction and effectiveness. For instance, printable strings [1], [11], [24], byte sequences [25], [26], PE/ELF headers [5], [27], and grayscale images [3], [22], [28], [29] have proven effective in detecting Windows, Android, and IoT malware.

Dynamic features involve executing binaries in isolated environments like virtual machines or sandboxes and monitoring runtime statuses of system resources, networks, registries, and files. Metrics such as CPU usage, I/O requests, and memory usage are then used to train malware detectors [30], [31]. File status features, obtained through counting and logging of created, deleted, modified, or accessed files, have also proven effective in malware detection [31].

Hybrid features are extracted through a combination of static and dynamic analysis methods. Hybrid features such as opcodes (n-gram sequence, images, frequency, etc.) [11], [28], [29], function call graphs (FCGs) [8], [10], Control flow graphs (CFGs) [9], [11], and API/system calls (sequence, list, graphs, etc.) [1], [4] have been successfully utilized in malware detection.

3) Model Training and Evaluation

After extracting numeric features, selecting a suitable machine-learning model for malware detection is crucial. Numerous algorithms, including Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs) [9], [29], Long Short-Term Memory (LSTM) networks, Multi-Layer Perceptrons (MLPs) [8], Graph Neural Networks (GNNs) [12], Support Vector Machines (SVMs) [1], Random Forests (RFs) [8], and Decision Trees (DTs) [27], have been proposed and rigorously evaluated for malware detection. These models exhibit varying success rates depending on different experimental setups and parameter settings.

B. Adversarial Attacks on Malware Detection

Despite recent advancements, ML-based malware detection systems remain inherently vulnerable to adversarial attacks that seek to undermine their decision-making processes [13], [40]. These attacks can be categorized based on the attacker’s space and knowledge level. The attacker’s space categorization includes feature-space attacks, which involve modifications to the input features, and problem-space attacks, which entail modifying real-world inputs like binary executables or source code to deceive the target detector. Based on the attacker’s knowledge, adversarial attacks can be categorized as white-box or black-box attacks. In white-box attacks, the attacker has complete knowledge of the target model, while in black-box attacks, adversaries typically have minimal information, usually only the model’s prediction output. Gray-box attacks fall between these two extremes, with varying levels of knowledge.

From these categorizations, four fundamental types of adversarial attacks are identified in the existing literature and discussed below. While this paper focuses on adversarial attacks in IoT malware detection, this section will also cover related attacks on Windows and Android platforms to provide a comprehensive overview of the relevant literature.

1) Feature-Space White-Box Adversarial Attacks

Esmaeili et al. [15] propose a structural attack on CFG-based IoT malware detectors, similar to the GEA and SGEA frameworks by Abusnaina et al. [13], [14]. Their approach merges control flow graphs (CFGs) from benign samples with target malware CFGs to create adversarial CFGs intended for a graph neural network (GNN)-based detector. They then train an adversarial detector to recognize benign CFG properties and filter out adversarial CFGs before classification.

In another attack on IoT malware detection, Ngo et al. [19] propose a reinforcement learning-based method that performs adversarial attacks on PSI (printable string information) graphs by adding dummy vertices and edges to deceive detectors. They counter these attacks with adversarial retraining.

Kreuk et al. [32] and Suciu et al. [33] successfully execute an adversarial attack against MalConv [7], a prominent raw byte-based Windows malware detector. Kreuk et al. utilize the Fast Gradient Sign Method (FGSM) to append adversarial payloads to the end of the file (append-FGSM) and into the slack regions of the sample (slack-FGSM). Suciu et al. [33] extend this approach by comparing slack-FGSM and append-FGSM, observing that slack-FGSM is more effective than append-FGSM.

Al-Dujaili et al. [35] and Verwer et al. [34] employ FGSM for white-box adversarial attacks on API Call List-based PE malware detectors. These attacks alter the malware’s binary feature vector by flipping bits in the feature space. Verwer et al.’s attack dynamically adjusts the flipped bits based on solution quality, effectively evading detection by adding irrelevant API calls.

Other attacks, such as ATMPA [37], COPYCAT [18], and AMAO [36] are aimed at image-based detectors. In ATMPA, Liu et al. [37] initially convert malware into a grayscale image and then utilize FGSM and C&W to generate adversarial examples. Similarly, COPYCAT by Khormali et al. [18] employs generic adversarial attacks to generate an adversarial image, which is subsequently appended to the original malware image. Park et al. [36] propose the AMAO adversarial attack, wherein a non-executable adversarial image is first generated using off-the-shelf adversarial attacks. They then attempt to maintain functionality by inserting semantic NOPs into the original malware, making it as similar as possible to the generated non-executable adversarial image.

2) Problem-Space White-Box Attacks

Abusnaina et al. [13] introduce Graph Embedding and Augmentation (GEA), a structural adversarial attack on CFG-based IoT malware detectors. GEA induces misclassification by inserting a benign code into the target malware sample, directly modifying its CFG. Subsequently, they propose Sub-GEA (SGEA) [14], which reduces the required embedded graph size for misclassification.

In another study, Abusnaina et al. [17] evaluate the robustness of various machine-learning IoT malware detectors against simple functionality-preserving modifications, such as padding, packing, and stripping. Their findings confirm that these detection systems remain largely vulnerable to such manipulations.

Sandor et al. [16] propose two adversarial strategies for IoT byte-based malware detection: Chunker, which appends chunks of malware to itself, and Disguiser, which embeds malware in benign files. The generated adversarial examples are then used to retrain and harden the target detector.

Kolosnjaji et al. [39] introduce AMB (Adversarial Malware Binary), a gradient-based attack specifically tailored for PE byte-based malware detectors such as MalConv [7]. This method involves appending adversarial bytes, generated via gradient descent, to the end of the original malware binary. Aryal et al. [40] similarly apply gradient-based methods to generate adversarial examples by injecting code into intra-section caves, successfully evading the MalConv detector [7].

Demetrio et al. [38] employ the integrated gradient explainability technique to assess the feature importance of MalConv detector [7]. Realizing MalConv’s reliance on PE header features, they then perform a white-box attack by modifying specific bytes in the PE binary’s DOS header, successfully evading detection.

Implementing two functionality-preserving modifications, Shift and Extend, Demetrio et al. [41] develop the RAMEn attack framework against the MalConv detector. By shifting the content of the first section of the PE file and extending the DOS header, the authors inject a carefully crafted adversarial payload, successfully evading detection.

Sharif et al. [42] introduce functionality-preserving binary diversification techniques for adversarial attacks on malware detection to enhance attack effectiveness and stealthiness. They employ code displacement and in-place randomization to conduct a white-box attack using gradient ascent, ultimately achieving high evasion rates.

Zhao et al. [43] introduce the Heuristic Optimization Integrated Reinforcement Learning Attack (HRAT), a code-level structural attack against graph-based Android malware detection. HRAT involves subtle modifications to Function Call Graphs (FCGs), including node deletion, insertion, and edge manipulation.

3) Feature-Space Black-Box Attacks

Hu and Tan [44] propose the MalGAN attack against an API call list-based PE malware detector in a black-box setting by training a substitute model. Adversarial examples are generated by appending irrelevant API calls to the original malware samples. Kawais et al. [45] extend MalGAN to Improved-MalGAN, addressing limitations of the original version by using different API call lists to train MalGAN and the substitute detector.

In another study, Hu and Tan [46] devise a generative model to evade RNN-based PE malware detectors. They generate spurious API call sequences using a generative RNN and insert them into the API call sequence of the original malware. A similar strategy is employed by Rosenberg et al. [47] in an attack named GADGET, which targets detectors trained on API call sequences. Utilizing the transferability property, GADGET first trains a surrogate model, conducts a white-box attack, and then heuristically uses the generated adversarial API call sequences to evade the target detector. Subsequently, Rosenberg et al. [48] propose a similar attack framework named BADGER, which limits the number of queries made to the target detector.

In [49], Zhang et al. introduce SRL, a functionality-preserving reinforcement learning-based attack against graph-based (CFG) PE malware detectors. This attack employs a reinforcement learning agent to iteratively select semantic NOPs for insertion into the CFG blocks of the original malware until the generated adversarial samples successfully evade the target detector.

4) Problem-Space Black-Box Attacks

This category represents the most realistic and challenging adversarial attacks, as they are completely agnostic to specific malware detectors. Black-box attacks in the problem space often use strategies like heuristic algorithms, evolutionary algorithms, reinforcement learning, and GANs. For example, Anderson et al. [26], [50] employ reinforcement learning in their Gym-malware attack framework to automatically generate functionality-preserving adversarial examples that deceive static malware detectors and antivirus engines. Gym-malware’s success inspires further research [51], [62], [63]. Some studies expand the action space [51], while others reduce it and use deterministic sequence selection to improve effectiveness and stealth [62], [63].

Castro et al. [55] introduce ARMED (Automatic Random Malware Modifications), which employs random algorithms to apply nine functionality-preserving modifications from [50] and [26] to malware samples until evasion is achieved. They assess the functionality of the resulting adversarial samples using the Cuckoo sandbox. Similarly, Chen et al. [64] generate adversarial examples by randomly appending blocks of data from benignware to malware, successfully evading the MalConv [7] detector.

Some attacks use evolutionary algorithms, such as AIMED by Castro et al. [53], which applies nine format-preserving modifications using genetic programming. AIMED iteratively modifies the malware binary, achieving a 50% speed increase over randomization. Similarly, the MDEA uses a genetic algorithm to generate adversarial examples with ten functionality-preserving modifications [54], while GAMMA [38] employs the same strategy to modify malware files through section injection and padding.

Yuan et al. [56] propose the GAPGAN framework, which utilizes Generative Adversarial Networks (GANs) to deceive the MalConv [7] detector. The framework trains a generator and discriminator to create adversarial payloads appended to malware samples. The discriminator simulates a black-box attack, achieving up to a 100% evasion rate. Similarly, Zhong et al. [57] develop MalFox using a Convolutional GAN to generate adversarial samples that preserve the original functionality of malware and evade detection by antivirus engines.

Lucas et al. [58] introduce a black-box adversarial attack using binary diversifications, such as in-place randomization and code displacement. Unlike the white-box version [42], this method uses a hill-climbing algorithm and accepts transformations only if the benign probability increases after querying the model.

Chen et al. [52] extract and modify APK source code by injecting non-executable code and repackaging it, altering features like permissions, API calls, and CFG structure to evade detection. Similarly, Bostani et al. [61] use payload injection in malware samples to deceive Android malware detectors.

From the reviewed literature, it is evident that few adversarial attacks specifically target IoT malware detection. Most rely on padding and code injection methods, which can be easily identified and filtered before classification. In many studies, these attacks are conducted in the feature space and assume white-box access, which is less realistic in real-world scenarios. As discussed above, only two papers on PE malware detection have explored binary diversification in this context [42], [58]. When implemented correctly, binary diversification preserves the original functionality of the binary while modifying functional parts, making it stealthier and more challenging to defend against. Therefore, we employ binary diversification to manipulate the structural properties of binaries and evade detection.

C. Binary Diversification Techniques

Binary diversification, designed to enhance security against attacks like code reuse, injection, and memory corruption [65], [66], [67], involves creating multiple program versions with identical functionality. Lucas et al. [58] and Sharif et al. [42] pioneered its application in adversarial contexts, using semantic-preserving modifications like in-place randomization and code displacement to bypass raw byte-based PE malware detection. Building on this, we propose an attack framework that uses advanced binary diversification to evade IoT graph-based malware detectors. Unlike Lucas et al.’s method, which relies on instruction-level changes, our approach incorporates structural binary manipulations, such as function inlining, branch function insertion, control flow graph flattening, basic block merging, and basic block reordering [65], [66] (discussed in Section III-C). Additionally, we leverage explainable AI (XAI) and a greedy algorithm, differentiating our strategy from Lucas et al.’s [58] use of reinforcement learning.

SECTION III.

Proposed Method

In this section, we present the proposed attack framework, detailing the system model, feature importance analysis, action set, and adversarial example generation algorithm. Figure 1 illustrates the workflow, consisting of four modules that will be discussed in detail later in this section.

FIGURE 1.

The proposed attack framework: AE denotes Adversarial Example, ‘BB_’ prefixes indicate CFG-based features, and ‘F_’ prefixes represent FCG-based features.

Show All

A. System Model

1) Threat Model

Our attack scenario assumes the adversary has black-box access to the target detector, meaning they can only receive the prediction confidence that a file is benign or malicious after querying the model. The goal is to use binary diversification to modify malware samples in the problem space until they are misclassified as benign by the target structural detector while preserving their malicious functionality. With limited black-box access, we build a multi-structural substitute detector trained on control flow graph and function call graph features, execute the attack, and transfer it to the target detector.

2) Problem Formulation

In this paper, $\mathcal {X}$ represents the input space (problem space), which includes ELF binary files. Each binary sample $x \in \mathcal {X}$ is associated with a label $y \in \mathcal {Y}$ , where $\mathcal {Y} = \{0, 1\}$ . Using reverse engineering, we transform each binary sample $x \in \mathcal {X}$ into an n-dimensional feature vector $z \in \mathcal {Z}$ , as defined in (1).

$\begin{equation*} \tau : \mathcal {X} \mapsto \mathcal {Z} \subseteq \mathbb {R}^{n}. \tag {1}\end{equation*}$ View Source

Specifically, by leveraging the Angr [68] and Radare2 [69] frameworks, we extract control flow graphs and function call graphs from the ELF binary files and utilize NetworkX [70] to extract the graphical features described in Section III-A3. Given the restricted black-box access, the adversary is limited to obtaining prediction probabilities from the target detector

$\mathbb {D}: \mathcal {Z} \rightarrow [{0, 1}]$

. The adversary’s primary objective is to modify the malware sample

$x \in \mathcal {X}$

to mislead the target detector

$\mathbb {D}$

into misclassifying the malware as benignware. The proposed attack method is defined as

$\begin{align*} & \tilde {z} = \tau (x + \delta) = \tau (\tilde {x}), \\ & \tilde {z} \in \mathcal {Z}, \text {and}~ \tilde {x} \in \mathcal {X}, \tag {2}\end{align*}$

View Source

where

$\delta$

is the perturbation of the malware samples in the problem space, which entails functionality-preserving modifications achieved through binary diversification techniques discussed in III-C.

To effectively execute the attack, we employ the SHAP [20] algorithm from explainable AI (XAI) to identify the most influential features for the detector $\mathbb {D}$ . The explainability analysis, based on the model’s prediction results, determines the positive or negative contribution of each feature. The contribution of the i-th sample to the predicted probability can be expressed as:

$\begin{equation*} \gamma (\mathbb {D}, z_{i}) = w_{i} = \left [{{w_{i,0}, w_{i,1}, \cdots, w_{i,j}, \cdots, w_{i,m} }}\right ], \tag {3}\end{equation*}$ View Source

where

$w_{i,j}$

represents the j-th feature’s contribution for the i-th sample. To determine the most influential features for the model prediction, we select the target features based on the following constraint:

$\begin{equation*} \frac {1}{n}\sum _{i=0}^{n-1} \left |{{w_{i,j}}}\right | \geq \frac {1}{m}\sum _{k=0}^{m-1} \left ({{\frac {1}{n}\sum _{i=0}^{n-1} \left |{{w_{i,k}}}\right |}}\right), \tag {4}\end{equation*}$

View Source

for all j in the range

$0, 1, \ldots, m$

Next, we apply binary-level modifications that specifically target these influential features to deceive the detector into classifying a malicious file as benign. This approach focuses on manipulating the most critical features. Our attack strategy is designed to work seamlessly within the problem space, preserving the original functionality of the sample while enhancing the attack’s imperceptibility.

3) Feature Set

To train the substitute detector, we extract structural features at both the basic block and function levels. Using Radare2 [69], we derive function call graphs (FCGs) from all training binaries and compute various graph properties with NetworkX [70], including nodes, edges, density, connected components, reciprocity coefficient, and the minimum, maximum, and mean values of closeness centrality, betweenness centrality, degree centrality, and shortest path. For basic block-level features, we use the Angr framework [68] to extract control flow graphs (CFGs) and compute the same set of graphical features as for FCGs. In total, we generate 34 features from both CFGs and FCGs (see Table 2) to train the substitute detector, referred to as a multi-structural detector. Preliminary experiments indicate that training on both CFG and FCG features yields a more robust detector compared to training on either feature set alone.

TABLE 1 A summary of the Literature Review on Adversarial Attacks. The Short Forms in the Table are as Follows: BB for Black Box, WB for White Box, FS for Feature Space, PS for Problem Space, and FP Stands for Functionality Preserving

TABLE 2 Feature Set for Multi-Structural Detector

B. Feature Importance Analysis

The foundation of our imperceptible adversarial attacks is depicted in part (a) of Figure 1. After training the substitute detector, we use the SHAP [20] technique to analyze feature importance and understand the correlation between each feature and the model’s prediction results. Figure 2 shows the distribution of SHAP values for the top 20 features, enabling an intuitive analysis of the predictions. Each row represents the distribution of a feature’s SHAP values across all samples, with higher-ranked features having more influence. The color intensity of each point, representing a test dataset sample, indicates its corresponding SHAP value. We select the top 12 influential features to target for modification using binary diversification techniques. These features include six from the FCGs: F_MaxCloCent, F_MaxShortPath, F_MaxBtwCent, F_MeanShortPath, F_reciprocity, and F_CC, as well as six from the CFGs (basic block level): BB_MaxShortPath, BB_MeanBtwCent, BB_MaxBtw

FIGURE 2.

The SHAP value distribution of testing dataset top 20 features when label = malicious (RF).

Show All

Cent, BB_density, BB_reciprocity, and BB_nodes.

The SHAP analysis results, illustrated in Fig. 2, provide several key insights. Notably, the model is more likely to classify a sample as malicious when the FCG features F_MaxCloCent, F_MaxBtwCent, and F_reciprocity have higher values. Consequently, our strategy involves reducing the values of these features through structural modifications implemented using binary diversification techniques. Conversely, lower values of the features F_MaxShortPath, F_CC, and F_MeanShortPath increase the likelihood of a sample being classified as malicious. Therefore, we aim to increase the values of these features to cause the malware samples to be misclassified as benign.

At the basic block level, features such as BB_nodes, BB_MaxShortPath, BB_MaxBtwCent, as well as BB_reciprocity, exhibit a positive correlation with a sample’s classification as malware. Our strategy, therefore, is to reduce these features’ values, thereby deceiving the target detector into misclassifying malicious samples as benign. Conversely, features such as BB_MeanBtwCent and BB_density show a negative correlation with a sample’s classification as malware. As such, our approach involves increasing the values of these features to mislead the detector into misclassifying malware as benign.

C. Action Set

Based on the feature importance analysis discussed above, we develop five functionality-preserving modifications to alter the binary structure at both the basic block and function levels. These modifications utilize binary diversification techniques originally proposed to protect against code-reuse attacks and similar threats [65], [66], [67]. When implemented correctly, these techniques preserve binary semantics, as demonstrated by Wang et al. [67], who generated diverse ELF binary versions using various diversification methods. To mislead structural target detectors, we adopt several techniques employed by Wang et al. [67], including function inlining, branch function insertion, control flow graph flattening, basic block merging, and basic block reordering, which are detailed below.

1) Function Inlining

Function inlining is an optimization technique used in compilers. It involves replacing function calls with the body of the called function (callee) at the call site. To do this, the call instructions are replaced by jump and push instructions to maintain the original semantics. In each iteration, we randomly select a function, excluding the main function, and inline it at its direct call sites if its size is less than 300 bytes. For each inlined function, its return instruction is changed to a jump instruction, targeting the instruction adjacent to the original call site in the caller function. This transformation significantly alters the structure of the function call graph by reducing the number of edges and nodes, thus reducing the values of F_MaxCloCent, F_MaxBtwCent, and F_reciprocity.

2) Branch Function Insertion

Branch function insertion (shown in Fig. 3) is a technique that substitutes jump instructions with function calls to a predefined “branch routine” function, redirecting the control flow to the original jump destination. In each iteration, we randomly select 1% of the jump instructions for conversion into function calls. These calls are directed to simple functions that reroute the flow to the original destination addresses of the jump instructions. This modification, while minimally impacting the size and performance complexity, significantly alters the binary’s structural properties by increasing the number of nodes and edges, thereby achieving the desired effect on features F_MaxShortPath, F_CC, and F_MeanShortPath.

FIGURE 3.

Branch function insertion.

Show All

3) Control Flow Graph (CFG) Flattening

This method, as shown in Fig. 4, transforms a function’s control flow graph into a “switch” structure using dispatcher blocks to redirect execution flow while preserving the program’s functionality [65], [66]. In this study, we avoid obfuscating functions with indirect jumps due to the complexity of determining control flow destinations. Given the high computational cost of CFG flattening, we adopt a conservative approach by flattening only a small, randomly selected subset of functions. Specifically, in each iteration, we randomly select 1% of functions without indirect jumps for CFG flattening. This modification significantly alters the structure of a function’s CFG and achieves the desired effects on features such as BB_MaxShortPath, BB_MaxBtwCent, BB_MeanBtwCent, and BB_density.

FIGURE 4.

Control flow graph flattening.

Show All

4) Basic Block Merging

Basic block merging consolidates two basic blocks into one, adjusting flow control instructions to preserve semantics. In each iteration, we randomly select five pairs of basic blocks for merging. Each pair must belong to the same function and be directly connected with exactly one incoming and one outgoing connection. This process significantly alters the binary’s structure at the basic block level without introducing significant overhead in size or performance complexity. Block merging achieves the desired outcomes of reducing the values of BB_MaxShortPath and BB_nodes as well as increasing the value of BB_density.

5) Basic Block Reordering

Basic block reordering involves changing the relative positions of two or more basic blocks. To maintain functionality, additional control transfers are introduced, which increases the number of edges in the control flow graph. During each iteration, we examine functions with more than three basic blocks and randomly adjust the positions of a selected pair. While this modification increases the number of edges and achieves desired effects on features BB_MaxShortPath, BB_density, and BB_MeanBtwCent, it can also complicate the graph’s structure, potentially leading to higher performance costs.

D. Adversarial Example (AE) Generation Algorithm

In this subsection, we present the details of the algorithm behind the attack framework depicted in Fig. 1, part (c). The target detector $\mathbb {D}$ takes as input the feature vector z extracted from the binary sample x and outputs a confidence score, $\mathbb {D}(\mathbf {z}) \rightarrow [{0, 1}]$ . If the score exceeds 0.5, the sample is classified as malware; otherwise, it is labeled benign. Our primary goal is to modify the malware sample x using binary diversification techniques discussed in III-C until the target detector classifies it as benign with a confidence score below 0.5, i.e., $\mathbb {D}(\tilde {\mathbf {z}}) \lt 0.5$ . To this end, we design Algorithm 1, a greedy algorithm specially tailored to effectively deceive the target detector while minimizing the attack cost.

Algorithm 1

Diversification-Based Adversarial Attack

Show All

In each iteration, we start with an ELF malware binary and transform it from the problem space x to the feature space z using the transformation function $\tau$ . In particular, we employ the Angr [68] and Radare2 [69] frameworks to extract graphical features (z) from the control flow graph and function call graph derived from the ELF binary. We then feed the extracted feature vector z into the target detector $\mathbb {D}$ to obtain a prediction probability. If the probability is greater than 0.5, indicating that the sample is classified as malware, we disassemble the binary into its assembly code, select an action from the action set $\mathcal {A}$ , and apply the action to the disassembled code. We then reassemble the binary, transform it into the feature vector $\tilde {\mathbf {z}}$ , and evaluate its prediction probability. This process is repeated for all actions in $\mathcal {A}$ , and the action resulting in the lowest prediction probability is selected. If no action reduces the confidence score, the step counter is incremented. To handle stagnation, a maximum stagnation limit $N = 10$ is introduced; if the confidence score does not change in 10 consecutive iterations, the algorithm terminates and moves to the next sample. To facilitate exploration and avoid local minima, a randomized parameter is used during action selection; with probability $\epsilon$ , a random action is chosen from $\mathcal {A}$ instead of the best-performing action. This process continues until the prediction probability of the transformed binary is below 0.5 or the maximum stagnation limit is reached.

It is noteworthy that the disassembly, modification, and reassembly of binaries require careful handling to mitigate potential errors. In our implementation, we utilize an open-source disassembly-reassembly tool proposed by Wang et al. [71], which is specifically designed for the automatic disassembly of executables in a manner that supports their subsequent reassembly into functional binaries.

SECTION IV.

Experimental Results and Analysis

This section presents the experimental results and analysis. It begins with an overview of the dataset used in our experiments, followed by a detailed evaluation of the detection results for the substitute detector and the four structural IoT detectors [8], [9], [12], [21] used to assess the proposed attack. Next, the efficacy of the structural attack is examined, followed by a transferability analysis of the generated adversarial examples against these four IoT detectors, an adversarial detector [15], and commercial antivirus engines.

A. Dataset

To evaluate the effectiveness of the proposed attack framework, a large-scale dataset comprising 248,276 IoT Executable and Linkable Format (ELF) binary files representing diverse CPU architectures, including x86-64, x86, ARM, SPARC, PowerPC, and MIPS, was compiled. Sample labeling was conducted using VirusTotal [23], leveraging its extensive database of over 70 antivirus software vendors. The final classification of samples was determined by a majority voting criterion based on the VirusTotal detection report, establishing both the class label and the specific malware family associated with each malicious sample.

The dataset comprised 115,823 benign and 132,453 malware IoT ELF files spanning different families, including Mirai, Android, Tsunami, Bashlite, Hajime, Dofloo, Xorddos, and Pnscan. Mirai emerged as the predominant family, underscoring its prevalence within the IoT domain. The dataset was split, with 80% designated for the training set and 20% for the test set.

B. IoT Malware Detection

1) Multi-Structural Substitute Detector

Upon data preparation, we built a multi-structural detector to serve as our substitute detector in the proposed black-box attack. This detector was trained on a comprehensive set of 34 features extracted from both the FCGs and CFGs of the IoT ELF binaries, as explained in section III-A3. We trained and selected the best four ML models, including Random Forest (RF), K-Nearest Neighbors (KNN), Deep Neural Networks (DNN), and Support Vector Machines (SVM), achieving accuracy scores ranging from 95.61% to 98.24%. Detailed results are presented in Table 3.

TABLE 3 Substitute Detector Results in %

2) Alasmary Et Al. [9] Malware Detector

To implement the malware detector proposed by [9], we utilized r2pipe, a Radare2 Python API, to extract the FCGs from the binaries [69]. Subsequently, we employed NetworkX [70] to compute various graphical properties of the FCGs as proposed by [9]. In total, 23 features were extracted and used to train RF, KNN, DNN, and SVM machine learning models. We obtained detection results ranging from 87.01% to 97.09%, as detailed in Table 4.

TABLE 4 Structural IoT Malware Detection Results in Percentage (%)

3) Gramac Malware Detector [21]

We also implemented the structural malware detector proposed by [21] to further assess the efficacy of the proposed attack. This detector is based on the caller-callee relationships of sensitive API calls. Specifically, we used radare2 to extract API call graphs and subsequently employed NetworkX to extract various graphical features. Seven features—number of nodes, edges, indegree, outdegree, loops, connected components, and parallel edges—were used to train RF, KNN, DNN, and SVM models. The detection results range from 86.16% to 97.42%, as presented in Table 4.

4) Wu Et Al. [8] Malware Detector

We implemented the malware detector proposed by [8] to further evaluate our proposed structural attack. This detector leverages structural features such as nodes, edges, and density, as well as graph embedding features extracted using Graph2Vec. It enhances function-call graphs by unifying user-defined functions (UDFs) through matching opcode sequences and assigning universal identifiers. RF, KNN, MLP, and SVM models were trained, yielding impressive results, as detailed in Table 4.

5) Li Et Al. [12] Malware Detector

We also retrained the Graph Neural Network (GNN)-based malware detector proposed by Li et al. [12]. This method integrates semantic information from Opcodes with structural information from function call graphs through three modules: an instruction-level module for semantic extraction, a structure-level module using GraphSAGE for graph embeddings, and a classification module with a Multi-Layer Perceptron (MLP) for malware detection. This detector achieved an accuracy of 98.98%, precision of 98.03%, recall of 98.88%, and F1 score of 98.77%.

C. Diversification-Based Adversarial Attack

To evaluate the effectiveness of the proposed attack, we assembled a test set of 544 IoT ELF malware binaries from the x86 CPU architecture. Using the pre-trained substitute detector discussed previously, we generated adversarial examples with Algorithm 1 across Random Forest (RF), K-Nearest Neighbors (KNN), Deep Neural Networks (DNN), and Support Vector Machines (SVM) models. With a minimal attack cost, defined by the number of iterations and the percentage change in binary size, we produced effective adversarial examples. Specifically, the average percentage changes in the size of the modified binaries for KNN, RF, DNN, and SVM are 8.35%, 12.61%, 15.84%, and 22.51%, respectively. Figures 5 and 6 illustrate the variation in evasion rates with changes in the number of iterations and binary size, respectively.

FIGURE 5.

Evasion rate Vs Number of iterations.

Show All

FIGURE 6.

Evasion rate Vs % Change in size.

Show All

To assess the robustness of each model, we tested the effectiveness of adversarial examples generated by one model on other models within the substitute detector. Our results show that the Support Vector Machine (SVM) model, with the lowest detection rate, is the most robust against adversarial examples from other models. In contrast, despite having high detection rates, the K-Nearest Neighbors (KNN) model is the least robust. Figure 7 illustrates the transferability of adversarial examples generated by one model to others within the substitute detector.

FIGURE 7.

Transferability within the substitute detector.

Show All

D. Transferability Analysis

1) Evading the Structural Malware Detectors

First, we tested the generated adversarial examples on the detector by Alasmary et al. [9], achieving high evasion rates of up to 99.07%. The SVM and DNN models proved more resilient compared to KNN and RF. Figure 8 shows a heatmap demonstrating how samples generated by the substitute detector models deceive the Alasmary et al. [9] detector.

FIGURE 8.

Evasion rate on [9] detector.

Show All

We also evaluated our generated samples on the Gramac detector [21], achieving evasion rates of up to 100%, with the lowest being 51.88%. The results are detailed in Figure 9.

FIGURE 9.

Evasion rate on [21] detector.

Show All

Similarly, the adversarial examples were successful on the Wu et al. [8] detector, achieving evasion rates of up to 100% with a minimum of 30.30%. Detailed results are shown in Figure 10.

FIGURE 10.

Evasion rate on [8] detector.

Show All

The GNN-based detector by Li et al. [12] proved the most resilient compared to the other detectors. The adversarial examples generated by the SVM model were the most effective, attaining an evasion rate of 74% on the GNN-based detector. Samples generated by the RF, DNN, and KNN models achieved evasion rates of 62.01%, 53.79%, and 30.31%, respectively.

In further experiments, we evaluated how limiting the allowed change in binary size affects the evasion rate. Our results show that generating adversarial examples increases the binary size, potentially impacting performance. Consequently, we restricted the maximum allowable change in binary size to 30% and studied its effect on the evasion rate of the generated samples across the four structural malware detectors. The results, detailed in Table 5, indicate evasion rates exceeding 97% for some detectors.

TABLE 5 Transferability Across Detectors With 30% Maximum Size Change

2) Evading Esmaeili Et Al. [15] Adversarial Detector

Esmaeili et al. [15] generated adversarial control flow graphs (CFGs) by merging the CFGs of selected benign IoT samples with those of the target malware. They then trained a GNN-based adversarial detector to learn the characteristics of benign CFGs, enabling it to identify and filter out adversarial CFGs before classification. We tested the CFGs of our generated adversarial examples on this detector to determine whether they would be flagged as adversarial. The adversarial detector did not flag our adversarial CFGs and misclassified 95.9% of them as benign.

3) Evading Commercial Antivirus Engines

To further evaluate the effectiveness of our attack approach, we submitted the generated adversarial examples to VirusTotal [23] and compared the detection reports with that of the original malware samples. The original malware samples were flagged as malicious by an average of 44.84 antivirus engines. In contrast, the adversarial samples generated by SVM, DNN, RF, and KNN were flagged by 29.00, 28.97, 29.35, and 29.21 engines, respectively. This indicates that more than 15 antivirus engines were deceived by our adversarial examples. Detailed results are shown in Figure 11.

FIGURE 11.

Detection rate of original and manipulated IoT malware by antivirus engines.

Show All

E. Comparison With Existing Similar Work

Additionally, we compared the adversarial CFGs generated by Esmaeili et al. [15] with those of our generated examples. Esmaeili et al. employed an approach similar to GEA [13], focusing on feature-space manipulations rather than generating executable adversarial examples, and argued theoretically that such an attack could be implemented in the problem space. Our analysis shows that their adversarial CFGs introduced significantly more nodes, edges, and instructions than ours, leading to a substantial increase in binary size, as demonstrated in Figure 12.

FIGURE 12.

Comparison with Esmaeili et al. [15] approach.

Show All

SECTION V.

Conclusion

Despite significant advancements, machine learning-based malware detection systems remain highly susceptible to adversarial attacks that disguise malware as benignware. This study evaluated the robustness of structural IoT malware detectors against such attacks through binary-level manipulations. We introduced a novel, functionality-preserving black-box attack that successfully deceived four structural detectors, an adversarial detector, and several commercial antivirus engines, achieving up to 100% evasion with minimal binary size increase. These findings underscore the urgent need for more resilient and adaptive cybersecurity defenses.

However, our study focused on structural IoT malware detectors, excluding other types of detectors that also merit investigation. Additionally, challenges in the disassembly-reassembly process led to failures with some malware samples. Future work will employ a more advanced disassembly-reassembly tool and expand the scope to assess the robustness of a broader range of detection systems. Furthermore, we plan to explore defense strategies against adversarial attacks on malware detection.

References is not available for this document.

MIT Libraries

MIT Libraries

An Adversarial Attack on ML-Based IoT Malware Detection Using Binary Diversification Techniques

Alerts

Abstract:

Metadata

Abstract:

Funding Agency:

Introduction

Background Information and Related Work

A. Machine Learning Malware Detection

1) Data Collection

2) Feature Engineering

3) Model Training and Evaluation

B. Adversarial Attacks on Malware Detection

1) Feature-Space White-Box Adversarial Attacks

2) Problem-Space White-Box Attacks

3) Feature-Space Black-Box Attacks

4) Problem-Space Black-Box Attacks

C. Binary Diversification Techniques

Proposed Method

A. System Model

1) Threat Model

2) Problem Formulation

3) Feature Set

B. Feature Importance Analysis

C. Action Set

1) Function Inlining

2) Branch Function Insertion

3) Control Flow Graph (CFG) Flattening

4) Basic Block Merging

5) Basic Block Reordering

D. Adversarial Example (AE) Generation Algorithm

Experimental Results and Analysis

A. Dataset

B. IoT Malware Detection

1) Multi-Structural Substitute Detector

2) Alasmary Et Al. [9] Malware Detector

3) Gramac Malware Detector [21]

4) Wu Et Al. [8] Malware Detector

5) Li Et Al. [12] Malware Detector

C. Diversification-Based Adversarial Attack

D. Transferability Analysis

1) Evading the Structural Malware Detectors

2) Evading Esmaeili Et Al. [15] Adversarial Detector

3) Evading Commercial Antivirus Engines

E. Comparison With Existing Similar Work

Conclusion

References