I. Introduction
Globally, cybercriminals favor the Internet as the medium for conducting illegal activities [1]. The number of victims of malicious Uniform Resource Locators (URLs) has been increasing every year. According to reports, users in China lose more than 20 billion Yuan annually to cyberattacks carried out through malicious URLs [2]. The URL is the basic infrastructure for all such online activities. A malicious URL is a link that, when clicked, redirects users to malicious or fraudulent websites that look authentic. The objectives behind such pages are harmful and include promoting falsified political agendas, stealing personal and corporate data, and laundering money. Malicious URLs fall into the following categories:
Command and Control (C2): These URLs and domains are used by compromised systems to communicate with the attacker’s remote server and receive malicious commands or falsified data.
Malware: These are malicious files or programs that are extremely harmful to the user. Malware includes computer viruses, worms, Trojan horses, and spyware, which can steal information, encrypt or delete sensitive data, hijack core computing functions, and monitor user activity without authorization [3], [4].
Phishing: This type of URL hosts phishing pages that extract sensitive personal information. It also includes web content that uses social engineering techniques to mislead users into sharing confidential information such as login credentials, account numbers, Personal Identification Numbers (PINs), and credit card details [5].
Grayware: This type of URL delivers software that attempts to secretly collect data from the user and transmit it back to the host. Grayware includes adware, internet cookies, surveillance tools, and Trojans. The information it collects includes the history of previously visited websites, credit card information, social security numbers (SSNs), login information, passwords, and various other types of data [6]–[11].
Dynamic Domain Name System (DNS): This category covers host and domain names for systems that have dynamically assigned IP addresses. These names are used to deliver malware payloads and to carry command-and-control (C2) traffic. Dynamic DNS domains do not undergo the screening that normally applies to domains registered through a reputable domain registration company. Hence the risk of tampering and loss of integrity is substantially higher [12].
New Domains: These are newly registered domains, created using domain generation algorithms, whose sole purpose is to conduct malicious activity [13].
Copyright Breaches: This category includes URLs that host illegal content or allow the downloading of software in violation of intellectual property policies. Such activities are generally risky for users, and these URLs can also undermine child protection laws. In some cases, they allow copyrighted material from the Internet to be shared without permission [14], [15].
Extremist URLs: These URLs promote terrorism, fascism, and extreme racism that discriminates against people based on their ethnicity, skin colour, and background [16].
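
Several of the categories above, notably Dynamic DNS and New Domains, are characterized by the host name rather than by the page content. As a rough illustration only, the following sketch extracts the host name from a URL with Python's standard `urllib.parse` module and checks it against a list of dynamic-DNS suffixes. The suffix list, the function names `hostname_of` and `uses_dynamic_dns`, and the example URLs are hypothetical placeholders introduced here; they are not taken from this paper or from any real provider list.

```python
from urllib.parse import urlsplit

# Hypothetical dynamic-DNS suffixes, for illustration only (assumption).
DYNAMIC_DNS_SUFFIXES = ("dyn.example", "ddns.example")


def hostname_of(url: str) -> str:
    """Return the lower-cased host name of a URL, or '' if none is present."""
    return (urlsplit(url).hostname or "").lower()


def uses_dynamic_dns(url: str, suffixes=DYNAMIC_DNS_SUFFIXES) -> bool:
    """Return True if the URL's host equals or is a subdomain of a listed suffix."""
    host = hostname_of(url)
    # Match on whole labels so "notddns.example" does not match "ddns.example".
    return any(host == s or host.endswith("." + s) for s in suffixes)


if __name__ == "__main__":
    for candidate in (
        "http://host1.ddns.example:8080/payload",
        "https://www.example.org/index.html",
    ):
        print(candidate, uses_dynamic_dns(candidate))
```

A suffix check of this kind only flags that a host belongs to a listed provider; it says nothing about whether the URL is actually malicious, so it would at most serve as one input feature among many.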