
Malicious URL Detection using Logistic Regression



Abstract:

One of the major challenges facing the Internet today is securing the web against an ever-growing variety of threats. Machine learning algorithms offer promising techniques for detecting malicious websites that carry out unethical, anonymous activities on the Internet. Attackers continuously evolve their techniques for targeting web users with malicious Uniform Resource Locators (URLs). The main objective of such attacks is financial gain through the acquisition of personal information. In the present research, a machine learning (ML)-based approach is proposed to identify malicious users from URL data. An ML model is implemented using Logistic Regression to detect malicious URLs. The data set used in the study is collected from well-known sources such as PhishTank, Kaggle.com, and Github.com. Our novel framework is further evaluated against traditional malicious URL models, and our results highlight the positive steps forward made by the proposed approach.

Published in: 2021 IEEE International Conference on Omni-Layer Intelligent Systems (COINS)

Date of Conference: 23-25 August 2021

Date Added to IEEE Xplore: 02 September 2021


DOI: 10.1109/COINS51742.2021.9524269

Conference Location: Barcelona, Spain


I. Introduction

Globally, cybercriminals choose the Internet as the best option for conducting illegal activities [1]. Every year, the rate of malicious Uniform Resource Locator (URL) victims has been increasing¹. According to reports, users in China lose more than 20 billion Yuan annually to cyberattacks carried out through malicious URLs [2]. URLs are the basic infrastructure for all such online activities. A malicious URL is a link that, when clicked, redirects users to malicious or fraudulent websites that look authentic. The objectives behind the creation of such pages are harmful, such as promoting falsified political agendas, stealing personal and corporate data, and laundering money. There are various categories of malicious URLs, as follows:

• Command-and-control: These URLs and domains are used by compromised systems to communicate with the attacker’s remote server to receive malicious commands or falsified data.

• Malware: These are malicious files or programs that are extremely harmful to the user. Malware includes computer viruses, worms, Trojan horses, and spyware capable of carrying out various malicious functions such as stealing information, encrypting data, deleting sensitive data, hijacking core computing functions, and monitoring user activity on computers without authorization [3], [4].

• Phishing: This type of URL hosts phishing pages or performs phishing to extract sensitive personal information. It also includes web content that misleads users into sharing secured information such as login credentials, account numbers, Personal Identification Numbers (PINs), and credit card information through various social engineering techniques [5].

• Grayware: This type of URL includes software programs that attempt to secretly collect data from the user and transmit it back to the host. This category consists of adware, internet cookies, surveillance tools, and Trojans. The information collected by grayware includes the history of previously visited websites, credit card information, social security numbers (SSNs), login information, passwords, and various other types of information [6]–[11].

• Dynamic Domain Name System (DNS): This includes host and domain names for systems that have dynamically assigned IP addresses. These are used to deliver malware payloads and carry command-and-control (C2) traffic. Dynamic DNS domains do not pass through the same screening process that applies to registration with a reputable domain registration company. Hence, the possibilities of tampering and loss of integrity increase substantially [12].

• New Domains: These are newly registered domains, often produced by domain generation algorithms, that are created for the sole purpose of conducting malicious activity [13].

• Copyright Breaches: This includes URLs that contain illegal content and allow the downloading of software in violation of intellectual property policies. Such activities are generally risky for users. These types of malicious URLs also undermine child protection laws. In some cases, these URLs allow copyrighted material from the Internet to be shared without permission [14], [15].

• Extremist URLs: These URLs promote terrorism, fascism, and extremist racism, discriminating against people based on their ethnicity, skin colour, and background [16].
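To make the connection between these URL categories and the Logistic Regression model mentioned in the abstract more concrete, the following is a minimal sketch of how a URL string could be turned into numeric features and scored. It is not the authors' implementation: this section does not state the paper's feature set, training procedure, or hyperparameters, so the five lexical features, the toy URLs (built from reserved example domains and documentation IP ranges), the learning rate, and the epoch count are all assumptions chosen for illustration.

```python
import ipaddress
import math
from urllib.parse import urlparse


def lexical_features(url):
    """Map a URL string to a small, hypothetical lexical feature vector."""
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1.0  # raw IP address used instead of a domain name
    except ValueError:
        host_is_ip = 0.0
    return [
        len(url) / 100.0,                     # scaled URL length
        float(host.count(".")),               # number of dots in the host
        host_is_ip,
        1.0 if "@" in url else 0.0,           # '@' anywhere in the URL
        sum(ch.isdigit() for ch in host) / max(len(host), 1),  # digit ratio
    ]


def predict_proba(weights, bias, features):
    """Logistic model: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, lr=0.1, epochs=20000):
    """Fit weights by batch gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    m = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Illustrative toy data only (label 1 = malicious-looking, 0 = benign-looking).
# These are NOT samples from PhishTank, Kaggle, or the paper's data set.
TOY_DATA = [
    ("https://www.example.com/index.html", 0),
    ("https://example.org/docs", 0),
    ("https://news.example.net/article", 0),
    ("http://192.0.2.10/login/verify/update.php?id=12345", 1),
    ("http://198.51.100.7/a/b/signin.php", 1),
    ("http://example.com.login.verify.example.test/confirm@session", 1),
]

samples = [lexical_features(url) for url, _ in TOY_DATA]
labels = [label for _, label in TOY_DATA]
weights, bias = train(samples, labels)

for url, label in TOY_DATA:
    score = predict_proba(weights, bias, lexical_features(url))
    print(f"label={label} score={score:.3f} {url}")
```

The printed score is the model's estimated probability that a URL is malicious; thresholding it at 0.5 is one common choice, although the threshold used in the paper is not stated in this section. A realistic detector would need far more data and richer features than this toy example.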
