Malicious URL Detection using Logistic Regression


Abstract:

One of the major challenges facing the Internet today is securing the web against an ever-growing and diverse range of threats. Machine learning algorithms offer promising techniques for detecting malicious websites that carry out unethical, anonymous activities on the Internet. Attackers continuously update their techniques for targeting web users through malicious Uniform Resource Locators (URLs). The main objective of such attacks is financial gain through the acquisition of personal information. In this research, a machine learning (ML)-based approach is proposed to identify malicious users from URL data. An ML model is implemented using Logistic Regression to detect malicious URLs. The data set used in the study is collected from well-known sources such as PhishTank, Kaggle.com, and Github.com. Our novel framework is further evaluated against traditional malicious URL detection models, and our results highlight the positive steps forward made by the proposed approach.
Date of Conference: 23-25 August 2021
Date Added to IEEE Xplore: 02 September 2021
Conference Location: Barcelona, Spain

I. Introduction

Globally, cybercriminals choose the Internet as the best option for conducting illegal activities [1]. The number of victims of malicious Uniform Resource Locators (URLs) has been increasing every year. According to reports, users in China lose more than 20 billion Yuan annually to cyberattacks carried out through malicious URLs [2]. URLs are the basic infrastructure for all such online activities. A malicious URL is a link that, once clicked, redirects the user to a malicious or fraudulent website that looks authentic. The objectives behind such pages are harmful, such as promoting falsified political agendas, stealing personal and corporate data, and laundering money. Malicious URLs fall into the following categories:

Command-and-control: These URLs and domains are used by compromised systems to communicate with the attacker’s remote server and receive malicious commands or falsified data.

Malware: These are malicious files or programs that are extremely harmful to the user. Malware includes computer viruses, worms, Trojan horses, and spyware, which can steal information, encrypt or delete sensitive data, hijack core computing functions, and monitor user activity without authorization [3], [4].

Phishing: This type of URL hosts phishing pages or performs phishing to extract sensitive personal information. It also includes web content that uses social engineering techniques to mislead users into sharing confidential information such as login credentials, account numbers, Personal Identification Numbers (PINs), and credit card information [5].

Grayware: This type of URL delivers software programs that attempt to secretly collect data from the user and transmit it back to the host. This type of risk includes adware, internet cookies, surveillance tools, and Trojans. The information collected by grayware includes browsing history, credit card information, social security numbers (SSNs), login information, passwords, and various other types of information [6]–[11].

Dynamic Domain Name System (DNS): This category includes host and domain names for systems with dynamically assigned IP addresses. They are used to deliver malware payloads and carry command-and-control (C2) traffic. Dynamic DNS domains do not go through the same screening process as domains registered through a reputable domain registration company. Hence, the risk of tampering and loss of integrity is considerably higher [12].

New Domains: These are newly registered domains, often produced by domain generation algorithms, that are created solely for conducting malicious activity [13].

Copyright Breaches: This category includes URLs that host illegal content or allow the downloading of software in violation of intellectual property policies. Such activities are generally risky for users. These malicious URLs can also undermine child protection laws. In some cases, they allow copyrighted material to be shared over the Internet without permission [14], [15].

Extremist URLs: These URLs promote terrorism, fascism, and extremist racism, discriminating against people based on their ethnicity, skin colour, and background [16].
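To make the approach named in the abstract more concrete, the following is a minimal sketch of a logistic regression classifier over URL strings. It is not the implementation used in this study: the five lexical features, their scaling constants, the learning rate, and the toy URLs (built from reserved example domains and documentation IP ranges) are all assumptions chosen for illustration, and the printed probabilities describe only this toy set.

```python
import math
from urllib.parse import urlparse

def url_features(url):
    """Map a URL to a small vector of hypothetical lexical features."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return [
        len(url) / 100.0,                       # scaled URL length
        host.count(".") / 5.0,                  # scaled number of dots in the host
        sum(c.isdigit() for c in url) / 20.0,   # scaled digit count
        1.0 if "@" in url else 0.0,             # '@' can hide the real host
        1.0 if host.replace(".", "").isdigit() else 0.0,  # host is a raw IPv4 address
    ]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(url, w, b):
    """Estimated probability that the URL is malicious."""
    x = url_features(url)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Invented toy data for illustration only (label 1 = malicious, 0 = benign).
TOY_DATA = [
    ("https://example.com/", 0),
    ("https://www.example.org/about", 0),
    ("https://docs.example.com/guide/index.html", 0),
    ("https://example.net/news", 0),
    ("http://192.0.2.10/login/update/account/verify.php?id=8847291", 1),
    ("http://secure.login.account.update.example.test/confirm?session=9938271123", 1),
    ("http://user@203.0.113.7/bank/signin/00112233", 1),
    ("http://paypa1.account.verify.example.invalid/webscr/8123/8123/8123", 1),
]

X = [url_features(u) for u, _ in TOY_DATA]
y = [label for _, label in TOY_DATA]
w, b = train(X, y)

for url, label in TOY_DATA:
    print(f"label={label} p_malicious={predict_proba(url, w, b):.3f} {url}")
```

A practical system would draw labelled URLs from sources such as those listed in the abstract, use a much richer feature set, and evaluate on held-out data rather than on the training examples.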

References
1.
Y.-C. Chen, Y.-W. Ma and J.-L. Chen, "Intelligent malicious url detection with feature analysis", 2020 IEEE Symposium on Computers and Communications (ISCC), pp. 1-5, 2020.
2.
F. Alkhudair, M. Alassaf, R. U. Khan and S. Alfarraj, "Detecting malicious url", 2020 International Conference on Computing and Information Technology (ICCIT-1441), pp. 1-5, 2020.
3.
J. McGahagan, D. Bhansali, C. Pinto-Coelho and M. Cukier, "Discovering features for detecting malicious websites: An empirical study", Computers & Security, pp. 102374, 2021.
4.
V. Mothukuri, R. M. Parizi, S. Pouriyeh, Y. Huang, A. Dehghantanha and G. Srivastava, "A survey on security and privacy of federated learning", Future Generation Computer Systems, vol. 115, pp. 619-640, 2021.
5.
S. S. M. M. Rahman, L. Gope, T. Islam and M. Alazab, "Intanti-phish: an intelligent anti-phishing framework using backpropagation neural network", Machine Intelligence and Big Data Analytics for Cybersecurity Applications, pp. 217-230, 2021.
6.
K. Tian, G. Tan, B. G. Ryder and D. D. Yao, "Prioritizing data flows and sinks for app security transformation", Computers & Security, vol. 92, pp. 101750, 2020.
7.
P. Kumar, R. Kumar, G. Srivastava, G. P. Gupta, R. Tripathi, T. R. Gadekallu, et al., "Ppsf: A privacy-preserving and secure framework using blockchain-based machine-learning for iot-driven smart cities", IEEE Transactions on Network Science and Engineering, 2021.
8.
R. Kumar, R. Tripathi, N. Marchang, G. Srivastava, T. R. Gadekallu and N. N. Xiong, "A secured distributed detection system based on ipfs and blockchain for industrial image and video data security", Journal of Parallel and Distributed Computing, vol. 152, pp. 128-143, 2021.
9.
A. Mubashar, K. Asghar, A. R. Javed, M. Rizwan, G. Srivastava, T. R. Gadekallu, et al., "Storage and proximity management for centralized personal health records using an ipfs-based optimization algorithm", Journal of Circuits Systems and Computers, pp. 2250010, 2021.
10.
C. Iwendi, A. Allen and K. Offor, "Smart security implementation for wireless sensor network nodes", Journal of Wireless Sensor Network, vol. 1, no. 1, 2015.
11.
H. Khan, M. U. Asghar, M. Z. Asghar, G. Srivastava, P. K. R. Maddikunta and T. R. Gadekallu, "Fake review classification using supervised machine learning", Pattern Recognition. ICPR International Workshops and Challenges: Virtual Event, pp. 269-288, January 10–15, 2021.
12.
M. Nabeel, I. M. Khalil, B. Guan and T. Yu, "Following passive dns traces to detect stealthy malicious domains via graph inference", ACM Transactions on Privacy and Security (TOPS), vol. 23, no. 4, pp. 1-36, 2020.
13.
X. Yan, Y. Xu, B. Cui, S. Zhang, T. Guo and C. Li, "Learning url embedding for malicious website detection", IEEE Transactions on Industrial Informatics, vol. 16, no. 10, pp. 6673-6681, 2020.
14.
A. K. Jain, S. R. Sahoo and J. Kaubiyal, "Online social networks security and privacy: comprehensive review and analysis", Complex & Intelligent Systems, pp. 1-21, 2021.
15.
A. Garba, A. D. Dwivedi, M. Kamal, G. Srivastava, M. Tariq, M. A. Hasan, et al., "A digital rights management system based on a scalable blockchain", Peer-to-Peer Networking and Applications, pp. 1-16, 2020.
16.
D. Ranathunga, M. Roughan and H. Nguyen, "Verifiable policy-defined networking using metagraphs", IEEE Transactions on Dependable and Secure Computing, 2020.
17.
Z. Abbass, Z. Ali, M. Ali, B. Akbar and A. Saleem, "A framework to predict social crime through twitter tweets by using machine learning", 2020 IEEE 14th International Conference on Semantic Computing (ICSC), pp. 363-368, 2020.
18.
S. Banerjee, T. Swearingen, R. Shillair, J. M. Bauer, T. Holt and A. Ross, "Using machine learning to examine cyberattack motivations on web defacement data", Social Science Computer Review, pp. 0894439321994234, 2021.
19.
W. A. Al-Khater, S. Al-Maadeed, A. A. Ahmed, A. S. Sadiq and M. K. Khan, "Comprehensive review of cybercrime detection techniques", IEEE Access, vol. 8, pp. 137293-137311, 2020.
20.
Y.-H. Chen and J.-L. Chen, "Ai@ntiphish: machine learning mechanisms for cyber-phishing attack", IEICE Transactions on Information and Systems, vol. 102, no. 5, pp. 878-887, 2019.
21.
T. Islam, S. Latif and N. Ahmed, "Using social networks to detect malicious bangla text content", 2019 1st International Conference on Advances in Science Engineering and Robotics Technology (ICASERT), pp. 1-4, 2019.
22.
T. Li, G. Kou and Y. Peng, "Improving malicious urls detection via feature engineering: Linear and nonlinear space transformation methods", Information Systems, vol. 91, pp. 101494, 2020.
23.
G. T. Reddy, M. P. K. Reddy, K. Lakshmanna, R. Kaluri, D. S. Rajput, G. Srivastava, et al., "Analysis of dimensionality reduction techniques on big data", IEEE Access, vol. 8, pp. 54776-54788, 2020.
24.
G. Srivastava, N. Deepa, B. Prabadevi and P. K. Reddy M, "An ensemble model for intrusion detection in the internet of softwarized things", Adjunct proceedings of the 2021 international conference on distributed computing and networking, pp. 25-30, 2021.
25.
A. Razaque, M. Aloqaily, M. Almiani, Y. Jararweh and G. Srivastava, "Efficient and reliable forensics using intelligent edge computing", Future Generation Computer Systems, vol. 118, pp. 230-239, 2021.
26.
P. K. R. Maddikunta, G. Srivastava, T. R. Gadekallu, N. Deepa and P. Boopathy, "Predictive model for battery life in iot networks", IET Intelligent Transport Systems, vol. 14, no. 11, pp. 1388-1395, 2020.
27.
A. P, S. S. G, G. Srivastava, P. K. R. Maddikunta and T. R. Gadekallu, "A two-stage text feature selection algorithm for improving text classification", vol. 20, no. 3, 2021, [online] Available: https://doi.org/10.1145/3425781.
28.
S. Agrawal, S. Sarkar, G. Srivastava, P. K. R. Maddikunta and T. R. Gadekallu, "Genetically optimized prediction of remaining useful life", Sustainable Computing: Informatics and Systems, vol. 31, pp. 100565, 2021.
29.
O. K. Sahingoz, E. Buber, O. Demir and B. Diri, "Machine learning based phishing detection from urls", Expert Systems with Applications, vol. 117, pp. 345-357, 2019.
30.
C. Ding, "Automatic detection of malicious urls using fine-tuned classification model", 2020 5th International Conference on Information Science Computer Technology and Transportation (ISCTT), pp. 302-320, 2020.
