
Android Ransomware Detection From Traffic Analysis Using Metaheuristic Feature Selection




Abstract:

Among the prevalent cyberattacks on Android devices, the ransomware attack is the most common and damaging. Although there are many solutions for detecting Android ransomware attacks, existing solutions have limited detection accuracy and high computational complexity. This paper proposes a new Android ransomware detection method based on traffic analysis to address these limitations. We exploit particle swarm optimization (PSO) to select traffic characteristics. Then, based on the selected traffic features, we classify the data traffic using decision tree and random forest classifiers. We examine ransomware cyberattacks in two distinct circumstances. In the first case, we detect ransomware traffic; in the second, we identify a specific type of ransomware traffic among benign traffic. The proposed PSO-assisted feature selection enables the classifier to improve the detection accuracy significantly. The random forest is found to achieve the highest performance in detecting ransomware, whereas the decision tree is the best for detecting the types of ransomware. The accuracy improvements are 2.26% and 3.7% in the first and second scenarios, respectively. The proposed method removes 56.01% to 91.95% of the features. The proposed method converges quickly, as the optimization reaches an optimum value in about ten iterations.
Published in: IEEE Access ( Volume: 10)
Page(s): 128754 - 128763
Date of Publication: 08 December 2022
Electronic ISSN: 2169-3536

SECTION I.

Introduction

With our increasing reliance on mobile devices, people keep valuable information on their devices and in online storage. Malicious actors are lured by the potential access to this confidential information, mainly for illicit financial gain and other purposes [1]. They carry out cyberattacks to steal that information or to hamper the regular use of the devices. One of the most common attacks on mobile devices is the ransomware attack, in which an attacker either encrypts the data on a device or locks the device, thereby preventing legitimate users from accessing the data or the device. Since Android is the most popular operating system for mobile devices [2], Android devices are a common target of ransomware attacks. An effective ransomware detection technique is therefore needed to ensure the security of Android devices.

Next, notable approaches for detecting Android malware are discussed. In [3], the authors propose a permission-based malware detection method to differentiate between benign and malware applications when the signature is unknown. This method involves two steps: feature selection using information gain and classification using Naïve Bayes, random forest, and decision tree (J48). The authors use the Genome Project dataset together with 100 newly downloaded benign applications and 100 malware samples, and extract permissions from the APK files. The random forest gives the highest accuracy of 89.8%, with a true positive rate (TPR) of 0.890 and a false positive rate (FPR) of 0.110. In [4], the authors aim to find the best feature selection set together with the best classifier for detecting Android malware. A dataset of 30 malicious and 30 benign applications is created. Four sets of features are selected on the basis of four feature selection algorithms. Then, five classifiers, namely Naïve Bayes, k-nearest neighbor, decision tree (J48), multi-layer perceptron (MLP), and random forest, are trained against the selected features. The MLP classifier with Chi-square and information gain achieves the best overall result, with a TPR of 0.9, an FPR of 0.23, and an accuracy of 0.83. Notably, a 70% reduction in features still yields good performance. In [5], a genetic algorithm is used to select the best subset from the feature set. The genetic algorithm's fitness function is defined so that a chromosome that gives the machine learning-based classifier high accuracy is assigned a higher weight. The authors work on 40,000 APK files with 99 extracted features; the genetic algorithm selects 33 features when paired with the SVM classifier and 40 features when paired with the MLP classifier. In [6], the authors propose a feature selection method based on a self-variant genetic algorithm (SV-GA) for detecting Android malware. The proposed algorithm is compared to the conventional genetic algorithm using the UCI dataset [7]. Four classifiers are trained using 88-feature datasets. The SV-GA attains 93.6% accuracy.

A static Android malware detection method is proposed that uses ML classifiers based on the symmetric features of Android malware applications, with three feature selection techniques used to find a lightweight ML algorithm that can reduce data size and the number of features. In [8], a framework is proposed that uses machine learning to prevent permission-based malware attacks. The authors first extract the features from the Android .apk files and then create a dataset from the features extracted from the Android applications. In feature selection, the k-best information gain method is used, and a random forest is found to outperform the other classifiers. A comparative study of various combinations of features and classification algorithms for classifying Android malware is presented in [9]. The feature selection algorithms include the gain ratio attribute evaluator, the consistency subset evaluator, the relief attribute evaluator, and the correlation feature selection (CFS) subset evaluator. The random forest with the CFS subset evaluator provides the highest accuracy (94.9%). In [10], feature selection based on Linux kernel features is proposed to detect malware in Android 5 and later versions. The authors start with a total of 59 features and detect malware using an SVM classifier; removing 23 features gives the best result with the remaining 36 features. In [11], the authors propose a method for detecting malicious Android applications using feature selection. Three types of features, namely opcodes, methods, and strings, are selected from each Android file using feature selection techniques such as Goodman-Kruskal, information gain, and CFS. The CFS with the AdaBoost classifier achieves the highest malware detection accuracy of 88.75%; however, the model is biased toward benign apps due to an imbalanced dataset. In [12], a lightweight system is proposed that combines static and dynamic analysis to detect malware on Android devices. In the PCA-RELIEF feature selection method, principal component analysis (PCA) is used to reduce the dimensionality of the features, RELIEF then provides the corresponding weight for each feature, and the features with higher weights are chosen from the dataset. Androguard is used for static analysis and DroidBox for dynamic analysis to extract the features. The PCA-RELIEF method achieves the best accuracy, which is 95.2%. Article [13] proposes a method to protect a network from adversarial attacks by creating adversarial samples through evolutionary optimization; the adversarial and conventional samples are then used to train a deep-learning model. In [14], an improvised particle swarm optimization (PSO) algorithm and a feature selection technique based on a rough set are proposed to select features in the permission-based detection of Android malware. A new random key encoding method converts a classical continuous PSO to the discrete domain. The performance of the proposed method is evaluated using two datasets: Wang's repository dataset and the Contagiodump dataset. The proposed technique outperforms feature selection algorithms such as Pearson correlation, information gain, gain ratio, chi-square, and so on. Article [15] introduces a new method to detect ransomware using an evolutionary-based machine learning approach. The SVM algorithm is used for classification, and the dataset is generated using API calls. Whereas PSO is used to select features, five oversampling methods are applied to balance the datasets; the SMOTE-PSO-SVM approach provides the best result. In [16], the effect of feature selection on malware detection is investigated. This study uses both supervised and unsupervised machine learning, covering both classification and clustering. The dataset consists of 149 samples, of which 68 are malicious and 81 are benign. For dynamic analysis, a Cuckoo sandbox is used. In the machine learning phase, six supervised learning algorithms are used, and in unsupervised learning, the authors use the expectation-maximization (EM) algorithm. The MLP is found to achieve the highest accuracy of 77.18%.

In contrast to most of the above works, in which API analysis is considered, a ransomware detection method based on traffic analysis is proposed in [17]. Several feature selection algorithms are used to find the most influential features, and a total of 19 features are selected. Then, a long short-term memory (LSTM) network is used to classify the ransomware. Only binary classification (ransomware vs. benign traffic) is performed, and an accuracy of 97.08% is achieved. Article [18] suggests another method to find ransomware based on traffic analysis. This study investigates the accuracy of detecting different ransomware families; in each case, a binary classification is performed between a type of ransomware and benign traffic. The random forest is found to achieve the best performance among various machine learning algorithms. A semi-supervised approach is used in [19], where a binary classification between ransomware and benign traffic is performed; in addition, a family classification similar to [18] is also performed. In [17], [18], and [19], the CICAndMal2017 dataset is used. Among the studies discussed above, most work with API calls to detect ransomware. The PSO is used in [15] for feature selection in ransomware detection; however, the number of features is about 200, which incurs substantial computational complexity, and traffic analysis is not explored. The detection of a specific class of ransomware is done in [20], where a decision tree algorithm is used and hyperparameter tuning is performed using grid search; no feature selection is made in [20]. In [21], about 97% precision is obtained with 19 features. However, flow information such as the source IP, destination IP, protocol, and timestamp is included among the features. These features are experiment-dependent, and their values do not carry over to a production network; as a result, using them gives the experiment a very high degree of accuracy, whereas the accuracy in a production network would be much lower. In [18] and [19], the source and destination IP addresses, protocol, and timestamp features are removed to eliminate the experiment dependency, but the accuracy of ransomware detection is limited and feature selection techniques are not applied. Because of the above limitations, we need a ransomware detection technique that uses a minimum number of features while providing better accuracy. This paper proposes an Android ransomware detection method from traffic analysis using PSO-based feature selection. We separately select the most influential features for both binary and multiclass classification. An extensive experiment compares the proposed method's performance with various feature selection methods and classifiers.

SECTION II.

Proposed Ransomware Detection Algorithm

In this section, we present the proposed Android ransomware detection system. The system model is shown in Fig. 1. The details of each constituent of the proposed system are described in the following.

FIGURE 1. Block diagram of the proposed Android ransomware detection system.

A. Dataset

The CICAndMal2017 dataset [22] is generated by the Canadian Institute for Cybersecurity (CIC) at the University of New Brunswick. Besides benign traffic, this dataset contains traffic information for four different types of malware: scareware, adware, ransomware, and SMS malware. The applications were installed on real Android devices because various malware samples hide their behavior in an emulator environment. Traffic data were collected in three different phases: (i) just after installing the apps, (ii) 15 minutes before rebooting the devices, and (iii) 15 minutes after rebooting the devices. Each of the malware categories has about ten families. For example, the ransomware category comprises ten families: charger, jisut, koler, lockerpin, simplocker, pletor, porndroid, ransomBO, svpeng, and wannalocker. Each instance of the dataset contains 84 features. Since the detection of Android ransomware is the subject of this study, we consider the samples that correspond to ransomware. To do this, we create a new dataset that includes both ransomware traffic and benign traffic. In Table 1, we give the number of instances per ransomware class. We merge all ransomware families into a single file and add 54,161 instances of benign traffic captured in 2017 to reduce the imbalance in the dataset. Thus, the number of samples in the dataset (ransomware and benign) becomes 402,834.
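For concreteness, the following Python sketch illustrates this aggregation step. It assumes the flow features have already been exported to one CSV file per ransomware family plus one benign-traffic CSV; the file paths are hypothetical placeholders, not the exact scripts used in this work.

import glob
import pandas as pd

# Hypothetical layout: one CSV per ransomware family plus one benign-traffic CSV.
ransomware_files = glob.glob("CICAndMal2017/ransomware/*.csv")
benign_file = "CICAndMal2017/benign_2017.csv"

frames = [pd.read_csv(path) for path in ransomware_files]
frames.append(pd.read_csv(benign_file))

# Concatenate all ransomware families and the benign samples into one working dataset.
dataset = pd.concat(frames, ignore_index=True)
print(dataset.shape)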

TABLE 1 Number of Instances Per Ransomware Class in the CICAndMal2017 Dataset

In the aggregated dataset, the F1 (Flow ID), F2 (Source IP), F4 (Destination IP), and F7 (Timestamp) features depend on the experimental setup and cannot be matched in a production network. For this reason, we remove these features, as they have no impact on the traffic categories, following [18] and [19]. In addition, some features have zero values for all instances. Since the values of these features do not change across instances, they do not correlate with the class labels, and we remove them as well. The all-zero features are F38, F39, F40, F52, F56, F57, F63, F64, F65, F66, F67, and F68. We also remove the F62 feature, as it duplicates the F41 feature. After removing the above features, we have 68 features left in the dataset.
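Assuming the dataset columns follow the F-numbering used above (a naming assumption for illustration only), this pruning step can be written as:

# Columns whose values depend on the experimental setup.
experiment_dependent = ["F1", "F2", "F4", "F7"]   # Flow ID, Source IP, Destination IP, Timestamp
# Columns that are zero for every instance.
all_zero = ["F38", "F39", "F40", "F52", "F56", "F57",
            "F63", "F64", "F65", "F66", "F67", "F68"]
duplicate = ["F62"]                               # duplicates F41

dataset = dataset.drop(columns=experiment_dependent + all_zero + duplicate)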

B. Preprocessing

1) Normalization

We normalize the dataset to lessen the dominance of features with high values over features with low values in the prediction results. In this paper, we use min-max normalization, which is defined as follows:\begin{equation*} x_{i}=\frac {x_{i}-x_{\mathrm {min}}}{x_{\mathrm {max}}-x_{\mathrm {min}}}\tag{1}\end{equation*}

where x_{i} is the value of the i^{th} instance of a feature, and x_{\mathrm {min}} and x_{\mathrm {max}} are the smallest and largest values of the feature, respectively.
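As a point of reference, the following sketch (not part of the original implementation) applies the column-wise min-max mapping of (1); scikit-learn's MinMaxScaler provides an equivalent transform.

import numpy as np

def min_max_normalize(X: np.ndarray) -> np.ndarray:
    # Scale every feature column of X to the [0, 1] range as in (1).
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # Guard against constant columns to avoid division by zero (an added safeguard;
    # all-zero columns are removed earlier in the pipeline anyway).
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / span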

2) Oversampling

The number of samples per class of ransomware differs significantly, which leads to a severe imbalance in the aggregated ransomware dataset. A machine learning model trained against an imbalanced dataset suffers from the effect of bias toward the classes having a higher number of samples. It predicts more frequently that the class has more samples to achieve higher accuracy. Such a model cannot perform in the production network as efficiently as on the test set. To eliminate the bias effect, we perform an over-sampling of the dataset to produce an equal number of samples in each class. We perform the oversampling by replicating the samples of the classes that have a lower number of samples.
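A minimal sketch of this replication-based oversampling is given below; the function name and the use of NumPy are illustrative assumptions, and sampling with replacement is used so that every class reaches the size of the largest class.

import numpy as np

def oversample_by_replication(X: np.ndarray, y: np.ndarray, seed: int = 0):
    # Replicate samples of the minority classes until all classes have as many
    # samples as the largest class.
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    indices = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        indices.append(np.concatenate([cls_idx, extra]))
    indices = np.concatenate(indices)
    return X[indices], y[indices]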

C. Feature Selection

Once the preprocessing is done, we split the dataset into training and testing parts. We select the set of most influential features from the training dataset by applying particle swarm optimization. Then, a machine learning model is trained against the training dataset with the selected features only. The unselected features are likewise removed from the test dataset, keeping only the most influential features, and finally the trained model is evaluated against the modified test dataset. The proposed feature selection method uses a metaheuristic algorithm to find the feature subset that best improves the classification accuracy. We use particle swarm optimization (PSO) as the optimizer. The reasons for using PSO include its easy implementation, low memory requirement, and high convergence speed [23]. PSO mimics the techniques that various groups of living beings, such as a flock of birds or a school of fish, follow to find a desired target in a search space. The movements of the birds in the flock are influenced by their current directions of searching, their best individual results (so far), the best results found by all birds (so far), and a random perturbation. Each bird keeps track of its own best solution and exchanges its position with the other birds. This exchange of information among the birds in a flock enables them to find the desired target: a single bird's search effort cannot achieve substantial progress in finding the target, but the flock as a whole can. Next, we describe the use of the PSO algorithm for selecting the best features of the CICAndMal2017 dataset to achieve the highest accuracy with lower computational complexity.

The PSO algorithm for the proposed feature selection is given in Algorithm 1. We use the standard PSO proposed in [24]. The algorithm starts with a random population, where the population is the set of positions of a group of particles. Each particle represents a possible solution, where a solution is the subset of features that meets some performance criteria most efficiently. The position is defined in an N_{f} -dimensional space, where N_{f} is the number of features. If the dataset consists of N_{f} features, the position of a particle is represented using an N_{f} -dimensional vector. Suppose that we consider N_{p} particles. Then, the population is represented by \boldsymbol {S} , and S_{i} indicates the position of the i^{th} particle. That is, \boldsymbol {S} is a matrix of size N_{p}\times N_{f} . The position represents a subset of features for training a machine learning model. The position of each particle is generated randomly, independently of the positions of the other particles, following the uniform distribution. Afterward, the population is discretized. The lower and upper bounds of each coordinate of the position are '0' and '1', respectively, and each coordinate is rounded to its nearest integer: if a coordinate's value is more than 0.5, it is rounded to '1'; otherwise, it is rounded to '0'. In this way, the position of a particle is represented as a binary sequence. Repeating this procedure for all particles, we get a binary population, where each row of the population matrix represents the position (a binary sequence) of a particle. A '0' coordinate of a particle's position indicates that the respective feature is excluded from the dataset when training an ML model, whereas a '1' indicates that the corresponding feature is included. For example, if N_{f}=4 and S_{1}=[0\;1\;1\;0] , where S_{1} represents the position of the first particle, the first feature subset includes the second and the third features for training, whereas the first and fourth features are excluded from the dataset. Next, the velocity matrix, \boldsymbol {V} , is generated randomly following the uniform distribution. It consists of the velocities of the N_{p} particles, with each velocity vector, v_{i} , having N_{f} elements. We use the notation v_{i} to denote a row of the matrix \boldsymbol {V} . Once the population and velocity matrices are initialized, we compute the fitness of each particle, where the fitness is the value of the following fitness function (also known as the objective function):\begin{equation*} F_{i}=qa_{i}+\frac {1-q}{N_{f}^{sel,i}}\tag{2}\end{equation*}

where F_{i} is the fitness of the i^{th} particle, a_{i} is the accuracy obtained with the i^{th} particle, and N_{f}^{sel,i} is the number of features selected by the i^{th} particle. We use multi-objective optimization because we want to achieve higher detection accuracy with a lower number of features. The parameter q is the weight placed on these two objectives: a higher value of q emphasizes accuracy in maximizing the fitness, whereas a lower value of q emphasizes reducing the number of selected features. To compute the fitness function, we must first select a subset of features (line 2). For each particle, we get a subset of features. The selected features are then separated from the dataset and used to train an ML classification algorithm, and the accuracy of the trained classifier is evaluated. In line 4, \boldsymbol {N} represents the set of features in the dataset.
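A hedged sketch of this fitness evaluation is shown below. The decision tree classifier, the held-out validation split, and the handling of an empty feature subset are illustrative assumptions; the essential point is the combination of accuracy and subset size from (2).

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

def fitness(position, X_train, y_train, X_val, y_val, q=0.85):
    # position is the binary vector S_i; a '1' keeps the corresponding feature.
    mask = np.asarray(position, dtype=bool)
    n_selected = int(mask.sum())
    if n_selected == 0:
        return 0.0  # an empty subset cannot train a classifier (assumed convention)
    clf = DecisionTreeClassifier()
    clf.fit(X_train[:, mask], y_train)
    acc = accuracy_score(y_val, clf.predict(X_val[:, mask]))
    # Fitness as defined in (2): weighted accuracy plus a reward for fewer features.
    return q * acc + (1.0 - q) / n_selected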

Algorithm 1: Feature Selection Using PSO

Next, we select the best fitness value for each particle (line 5). Since this is the initialization phase, the best fitness value of a particle is simply the fitness value computed in line 4. For the N_{p} particles, we thus have N_{p} fitness values, one for each particle. Of them, we select the highest fitness, called the global best fitness value, F^{gb} , in line 6. Then we find the position of the particle that provides the highest fitness, called the global best particle, P^{gb} .

An iterative process begins once the global best fitness value and the global best particle are calculated. For each particle, the velocity vector, v_{i} , is updated using the following equation:\begin{equation*} v_{i}=wv_{i}+r_{p} \left ({P_{i}^{pb}-S_{i}}\right)+r_{g} \left ({P^{gb}-S_{i}}\right)\tag{3}\end{equation*}

where w is the weight of the inertia of a particle's movement; a higher value of w causes more exploration, whereas a lower value leads to more exploitation [24]. The notations r_{p} and r_{g} are random numbers drawn from [0,c_{p}] and [0,c_{g}] , respectively. The notations c_{p} and c_{g} represent the acceleration coefficients toward the particle best and global best positions, respectively; the acceleration coefficients apply random forces in the respective directions to perturb the movement of the particle. The second term on the right side of (3) is the random weighted difference between the particle's best position (so far) and the i^{th} particle's latest position. A larger difference leads to more exploration of the search space, whereas a small value of this term causes the particle to be confined to a local minimum. The third term on the right side of (3) is the weighted difference between the best position among all particles and the latest position of the i^{th} particle. A larger value of this term results in more exploitation. A balance between exploration and exploitation leads to a stable optimization process [24]; for this reason, c_{p} and c_{g} can be assumed to be equal [24]. Then, the position of the i^{th} particle is updated by adding its velocity to its current position (line 11). Afterward, the fitness of the new position of the particle is computed and compared with the particle's best fitness as well as the global best fitness (lines 12-18). If the new fitness is higher than the particle's best fitness, the new fitness is taken as the particle's best fitness (line 14), and the position that yields the new fitness is taken as the particle's best position (line 15). The particle's new best fitness is then compared against the best fitness of the swarm; if it is higher than the global best of the swarm, it is taken as the global best of the swarm (line 17), and the position of the particle is taken as the closest position (so far) to the target (line 18). This procedure of updating the velocity vector and, if required, the personal and global best positions is performed for each particle. When the operation has been performed for all particles, a new iteration starts, and the above procedure is repeated. At the end of all iterations, the global best fitness, F^{gb} , represents the best fitness value that can be achieved using the subset of features defined by the global best particle's position, P^{gb} .
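The iteration described above can be summarized by the following sketch. It assumes that fitness is a one-argument function closing over the training data (for example, a lambda that binds the data to the fitness sketch given earlier), and the parameter values are illustrative rather than those of Table 2.

import numpy as np

def pso_feature_selection(fitness, n_features, n_particles=20, n_iter=30,
                          w=0.7, c_p=1.5, c_g=1.5, seed=0):
    rng = np.random.default_rng(seed)
    # Random binary positions and uniform velocities (initialization of Algorithm 1).
    S = (rng.uniform(size=(n_particles, n_features)) > 0.5).astype(float)
    V = rng.uniform(-1.0, 1.0, size=(n_particles, n_features))
    F = np.array([fitness(s) for s in S])
    P_pb, F_pb = S.copy(), F.copy()              # personal best positions and fitness
    g = int(F_pb.argmax())
    P_gb, F_gb = P_pb[g].copy(), F_pb[g]         # global best position and fitness

    for _ in range(n_iter):
        for i in range(n_particles):
            r_p = rng.uniform(0.0, c_p, size=n_features)
            r_g = rng.uniform(0.0, c_g, size=n_features)
            # Velocity update as in (3), then position update and re-discretization.
            V[i] = w * V[i] + r_p * (P_pb[i] - S[i]) + r_g * (P_gb - S[i])
            S[i] = (np.clip(S[i] + V[i], 0.0, 1.0) > 0.5).astype(float)
            f_i = fitness(S[i])
            if f_i > F_pb[i]:                    # update the personal best
                P_pb[i], F_pb[i] = S[i].copy(), f_i
                if f_i > F_gb:                   # update the global best
                    P_gb, F_gb = S[i].copy(), f_i
    return P_gb, F_gb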

SECTION III.

Experimental Results and Analysis

In this section, we evaluate the performance of the proposed Android ransomware detection method. We use the decision tree classifier to classify traffic, whereas PSO is used to optimize the system performance. The parameters of the proposed method are given in Table 2. The decision tree is implemented using the default parameters of scikit-learn. In this study, we consider two scenarios: Scenario I, binary classification between ransomware traffic and benign traffic, and Scenario II, multiclass classification among ten types of ransomware traffic and benign traffic.
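To make the experimental setup concrete, the following sketch shows how each scenario can be evaluated with scikit-learn's default decision tree. The 80/20 split, the variable names, and the mask variables holding the PSO-selected features are assumptions made for illustration.

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def evaluate_scenario(X, y, selected, random_state=0):
    # Keep only the PSO-selected columns (selected is the binary position vector).
    X_sel = X[:, np.asarray(selected, dtype=bool)]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_sel, y, test_size=0.2, random_state=random_state)
    clf = DecisionTreeClassifier()   # scikit-learn default parameters, as in the paper
    clf.fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))

# Scenario I (binary):      acc_i  = evaluate_scenario(X, y_binary, mask_binary)
# Scenario II (multiclass): acc_ii = evaluate_scenario(X, y_family, mask_family)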

TABLE 2 Simulation Parameters

In the first scenario, we present the proposed ransomware detection method for binary classification; the goal is to determine whether there is ransomware in the traffic. Before detection, we use the PSO-based feature selection algorithm to select the most important features, which improves the detection accuracy and provides a high detection rate. Table 3 shows the selected features. We perform experiments for three different weights q ; the case q=1 indicates that the emphasis is placed entirely on increasing the accuracy, and the number of selected features is completely ignored when calculating the objective function. In this case, the proposed method selects 26 features from 62 features, thereby reducing the computational overhead by more than 50%. In the second case, q=0.85 , 13 features are selected; even though the reduction in the number of features is considered in this case, accuracy remains the dominant objective. In the third case, q=0.70 , more importance is placed on reducing the number of features, and only 8 features are selected. The features selected in all three cases include F3, F5, F13, F17, F44, F49, F73, and F74. In multiclass classification, fewer features are selected (see Table 4). The numbers of features selected for q=1 , 0.85, and 0.7 are 23, 8, and 5, respectively. The common features in all three cases are F3, F5, F73, and F74; these four features are also common to all cases of the binary and multiclass classification scenarios.

TABLE 3 Selected Features for Binary Classification
TABLE 4 Selected Features for Multiclass Classification

We investigate the impact of the proposed method on the fitness function in Fig. 2. We see that the value of q considerably impacts the fitness: there is roughly a 0.1 difference in fitness for every 0.15 change in the value of q . After the first iteration, the highest and the lowest fitness values, 0.781 and 0.56, are obtained for q=1 and q=0.7 , respectively. The fitness starts to increase after the first iteration, and the improvement in the fitness value almost stops at about 10 iterations, showing the quick convergence speed of the method. The final fitness values are 0.80, 0.69, and 0.595 for q=1 , 0.85, and 0.70, respectively.

FIGURE 2. Optimization of the fitness with respect to iteration for binary classification.

The impact of the feature selection on the accuracy is shown in Fig. 3 for binary classification. It is evident that the accuracy improves significantly because of the proposed technique, and the value of q considerably impacts the accuracy. We obtain the highest accuracy when q=1 ; that is, the proposed method achieves the highest accuracy of 80.14% when no requirement for a reduced number of features is imposed. The accuracy has to be compromised slightly when the requirement is imposed, but the sacrifice is only 0.47%. The accuracies for q values of 1, 0.85, and 0.70 are 80.14%, 79.88%, and 79.68%, respectively. This small sacrifice of accuracy leads to a dramatic reduction in the number of features: whereas an accuracy of 80.14% is obtained with a 58.06% reduction of the features, sacrificing only 0.47% of accuracy allows 87.1% of the features to be shed.

FIGURE 3. Illustration of the optimization of the accuracy with respect to the iteration for ransomware detection.

To investigate how the number of selected features is optimized, we can consider Fig. 4. We can see that just after one iteration, the number of selected features is reduced by 41.93%, 46.77%, and 54.84% in the q=1 , 0.85, and 0.70 cases, respectively. Although the reduction in the number of features initially occurs rapidly, the number becomes steady around the 10^{th} iteration. Finally, the reductions in the number of features are 58.06% (26 features), 79.03% (13 features), and 87.1% (8 features), respectively, in the three cases.

FIGURE 4. Optimization of the number of selected features with the iteration for ransomware detection.

Next, we investigate the performance of the proposed method for ransomware family classification. There are 11 classes: 10 ransomware types and benign traffic. Fig. 5 shows the optimization of the fitness over the iterations. As is seen, the fitness function improves with the iterations. The fitness values are 0.678, 0.596, and 0.53 in the q=1 , 0.85, and 0.70 cases, respectively. However, the fitness values are comparatively lower than those for binary classification (compare Fig. 5 with Fig. 2); the reductions are 0.122, 0.094, and 0.065 for the q=1 , 0.85, and 0.70 cases, respectively. The reduction might be due to the higher number of classes: in multiclass classification, there are 11 classes, and the higher number of classes leads to more classification errors.

FIGURE 5. Optimization of the fitness with respect to iteration for ransomware family classification.

The accuracy of the classification of the ransomware types is shown in Fig. 6. The accuracy ranges from 67.1% to 67.9%, which is 12%-13% less than that of the binary classification. In this scenario, we see that q=0.85 gives the highest accuracy. A drop in the accuracy is observed for q=0.70 in iteration 8. The decline in accuracy likely results from the optimizer selecting a smaller number of features to attain slightly higher fitness: since improving the fitness is the main objective of the optimization algorithm, when it encounters an opportunity to enhance the fitness by trading a small loss of accuracy for a reduced number of features, it takes this path. However, we see an improvement in accuracy in the 14^{th} iteration, which leads to an accuracy of 67.06%. Fig. 7 shows the gradual reduction in the number of features. We see that the number of features is reduced from 62 to 31, 35, and 29 after one iteration of the algorithm. The number of features becomes stable within a few iterations after the 10^{th} iteration. Finally, the numbers of features selected are 26, 8, and 5, respectively, for q=1 , 0.85, and 0.7, thus shedding 58.06%, 87.1%, and 91.94% of the features. Comparing Fig. 7 with Fig. 4, we can say that the multiclass classification of ransomware can be performed faster with the proposed method, as the method reduces the number of features more in the multiclass classification than in the binary classification.

FIGURE 6. Illustration of the optimization of the accuracy with respect to the iteration for ransomware family classification.

FIGURE 7. Optimization of the number of selected features with the iteration for ransomware family classification.

A performance comparison in terms of accuracy between the proposed method and four prominent ML algorithms is shown in Fig. 8. In the experiments with the proposed feature selection method, we use the random forest for binary classification and the decision tree for multiclass classification, as these two classifiers work best without feature selection. The proposed method outperforms all other methods in detecting ransomware (binary classification) and its classes (multiclass classification). In binary classification, the proposed method with the random forest achieves 2.26% more accuracy than the random forest (RF) ensemble learning method alone. In [19], the ransomware detection accuracy is 69.5%, which is 12.08% less than that of our proposed method. The proposed method also performs better than any other method in detecting the types of ransomware: whereas the proposed method provides 67.8% accuracy, the decision tree, which offers the best performance among the other techniques, provides 64.1% accuracy. Thus, the improvement in this case is 3.7%. Although XGBoost is an ensemble classifier, its performance is not even close to that of the proposed method.

FIGURE 8. Performance comparison of the proposed method with various methods.

We compare the performance of the proposed feature selection method with four benchmark feature selection methods: recursive feature elimination (RFE), ANOVA-F, Chi-square, and information gain. The decision tree is used as the classifier for all feature selection methods, and we select the same number of features in all cases. The accuracy of the various feature selection methods for ransomware classification is shown in Table 5. For 26 features, whereas the RFE obtains the highest accuracy of 77.27% among the benchmarks, the proposed method attains 2.87% higher accuracy than the RFE. The gains in accuracy for 13 and 8 features are 2.82% and 2.29%, respectively. In the classification of ransomware types (see Table 6), the gains in accuracy for 23, 8, and 5 features are 3.81%, 7.64%, and 10.59%, respectively. Comparing Tables 5 and 6, it is found that, among the benchmarks, the RFE method works best when a higher number of features is selected, whereas information gain performs best for a minimal number of features. In Table 7, the performance of the proposed method is compared with that of the existing techniques. The proposed method outperforms the existing methods in both ransomware detection and ransomware type detection.

TABLE 5 Comparison of the Proposed Method With Various Feature Selection Methods for Ransomware Detection [%]
TABLE 6 Comparison of the Proposed Method With Various Feature Selection Methods for Types of Ransomware Detection [%]
TABLE 7 Comparison With Existing Methods [%]

SECTION IV.

Conclusion

In this paper, we proposed a method for detecting Android ransomware using machine learning. The core of the method is the nature-inspired optimization algorithm PSO, which iteratively selects a subset of the available features to optimize the accuracy and computational overhead of the detection method. We exploited the traffic of Android devices to train a machine learning classifier. Throughout the experiments, we showed that the proposed method outperformed the conventional classifiers with respect to the optimization variables. It achieved up to 81.58% accuracy in detecting ransomware attacks. Furthermore, the types of ransomware attacks could be detected with up to 67.8% accuracy, which is better than the conventional methods. The computational overhead could be reduced significantly by shedding up to 91.94% of the features. The proposed optimization method converges quickly (in around 10 iterations). Compared to traditional feature selection methods, the proposed method attained up to 10.59% higher accuracy for the same number of features. In this paper, we used the default parameter settings of the various classifiers. In the future, we plan to employ optimization at two levels: at the first level, we are interested in optimizing the hyperparameters of the classifiers, and at the second level, we aim to improve the accuracy of the ransomware detection system.
