Adversarial Malware Binaries: Evading Deep Learning for Malware Detection in Executables


Abstract:

Machine learning has already been exploited as a useful tool for detecting malicious executable files. Data retrieved from malware samples, such as header fields, instruction sequences, or even raw bytes, is leveraged to learn models that discriminate between benign and malicious software. However, it has also been shown that machine learning and deep neural networks can be fooled by evasion attacks (also known as adversarial examples), i.e., small changes to the input data that cause misclassification at test time. In this work, we investigate the vulnerability of malware detection methods that use deep networks to learn from raw bytes. We propose a gradient-based attack that is capable of evading a recently proposed deep network suited to this purpose by changing only a few specific bytes at the end of each malware sample, while preserving its intrusive functionality. Promising results show that our adversarial malware binaries evade the targeted network with high probability, even though less than 1% of their bytes are modified.
Date of Conference: 03-07 September 2018
Date Added to IEEE Xplore: 02 December 2018
Conference Location: Rome, Italy

I. Introduction

Detection of malicious binaries still constitutes one of the major challenges in computer security [22]. To counter their growing number, sophistication, and variability, machine-learning-based solutions are increasingly being adopted by anti-malware companies as well [13]. Although past research on binary malware detection has explored the use of traditional learning algorithms on n-gram-based, system-call-based, or behavior-based features [1], [19], [21], [26], more recent work has considered using deep-learning algorithms on raw bytes as an effective way to improve accuracy on a wide range of samples [18]. The rationale is that such algorithms should automatically learn the relationships among the various sections of the executable file, thus extracting features that correctly represent the role of specific byte groups in specific sections (e.g., whether a byte belongs to the code section or is simply part of a section pointer).
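
As a rough illustration of what learning from raw bytes can involve, the following minimal Python sketch maps an executable's bytes to a fixed-length sequence of integer tokens that a byte-level network could consume. The function name `bytes_to_model_input`, the padding index `PAD_TOKEN`, and the `max_len` parameter are assumptions made for illustration only; they are not taken from the network discussed in [18].

```python
from typing import List

# Hypothetical padding index, kept distinct from real byte values 0..255.
PAD_TOKEN = 256


def bytes_to_model_input(raw: bytes, max_len: int) -> List[int]:
    """Map a raw executable to a fixed-length sequence of integer tokens.

    Real bytes keep their values (0..255). Files shorter than max_len are
    right-padded with PAD_TOKEN; longer files are truncated to max_len.
    """
    tokens = list(raw[:max_len])
    tokens.extend([PAD_TOKEN] * (max_len - len(tokens)))
    return tokens


# Toy stand-in for an executable: the "MZ" magic followed by three bytes.
sample = b"MZ\x90\x00\x03"
encoded = bytes_to_model_input(sample, max_len=8)
print(encoded)  # [77, 90, 144, 0, 3, 256, 256, 256]
```

No hand-crafted features are extracted in this representation: every position simply carries a byte value, and it is left to the learning algorithm to infer the role each byte plays in the file.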
