I. Introduction
Cyber security threats and attacks are growing steadily in today's interconnected world. The persistence of malicious outbreaks has earned them a place in the global risk report [1] alongside natural disasters, climate change and other adverse conditions. Attackers use manipulated software binaries to launch cyberattacks. Every fortnight, about 3,470,074 new malware variants are found [2]. In the last decade, the number of malware samples grew more than thirteenfold [3], from 99.71 million in 2011 to 1298.18 million in 2021. Most of these samples followed similar trends and patterns when infecting a system, but their signatures differed. Further investigation of the code showed that these binaries belong to well-known canonical malware families and have been obfuscated to sidestep the conventional methods used in detection systems. To keep pace with these modified binaries and mitigate their effects, the conventional frameworks used by prevention and detection engines have to add extra layers of armour to strengthen their defences. The most consistent and swift way to identify a malware sample is through its unique signature, a characteristic stream of data bytes. This identifier is compared against those of known malware and their families to determine the nature of the binary file. For newly engineered malware, this method cannot be applied because the new identifier will not match that of any known malware. In addition, a slight change to the code that leaves the functionality of the binary unchanged alters the signature, so the binary goes undetected. Moreover, with the ever-increasing volume of malware, storing and maintaining a separate database of all signatures demands a large storage repository and efficient search algorithms, which increases the complexity of processing the data and degrades system performance [4].
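The fragility of signature matching described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a signature is the SHA-256 digest of the whole file; real detection engines may use byte-pattern signatures instead. The sample bytes, the family name and the helper names (signature, lookup, signature_db) are hypothetical and do not come from any cited work.

```python
import hashlib
from typing import Optional


def signature(data: bytes) -> str:
    """Return a SHA-256 hex digest, used here as a stand-in for a malware signature."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical byte stream standing in for a known malicious binary.
known_sample = b"MZ\x90\x00" + b"\x55\x8b\xec" * 16

# Hypothetical signature database mapping digests to malware family names.
signature_db = {signature(known_sample): "ExampleFamily"}


def lookup(data: bytes) -> Optional[str]:
    """Return the family name if the signature is known, otherwise None."""
    return signature_db.get(signature(data))


# A variant that differs only by one appended padding byte.
variant = known_sample + b"\x00"

print(lookup(known_sample))  # the known sample is matched
print(lookup(variant))       # the padded variant is not found in the database
```

In this sketch a single appended byte is enough to make the lookup miss, and the database grows by one entry for every variant that has to be recognised, which mirrors the storage and search burden noted above.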
In recent years, several methods have been explored for handling the sheer volume of data involved in malware detection [5] [6] [7]. These methods focus on classifying a binary file by dissecting the features of its code, which helps to distinguish malware from a legitimate file. Studies [8] [9] [10] present various approaches for dealing with altered binaries by introducing modifications in every segment and section of a binary file. They added random padding to induce misclassification and used different approaches to detect the malicious file structure. Because these modifications do not alter the actual source code of the PE file, these methods do not perform well against real-world malware that uses advanced packers. Other studies [11] [12] [13] apply complex, deep pattern learning to identify malicious code in a binary file but still fail to detect the presence of packers in the binary. To defend real-world systems against binaries that can perform a metamorphic attack, this paper proposes a comprehensive static malware analyser (CASMA). This tool is capable of predicting the capability of any binary on major operating systems such as Windows, Linux and Mac OS.