Introduction
DNS translates human-readable domain names into network-layer IP addresses. As a fundamental piece of Internet infrastructure, it underpins almost all Internet services, such as email and the web, so its resilience is a major concern. One major threat to DNS resilience stems from its plaintext and connectionless nature: DNS is frequently subject to attacks such as packet manipulation and eavesdropping, which enable content censorship and access blocking [1]–[3]. We consider the issue from two perspectives.
Who: Country policies and actions influence the DNS ecosystem. DNS is often used as a tool by policymakers to censor, manage, or monitor the Internet. Countries have the ability and the motivation to manipulate or eavesdrop on the system, and some have acted on them. In recent years, some countries have even “unplugged” their networks from the Internet [4]–[6], which may lead to more serious consequences.
Where: Countries on the resolution path are equally important. Many previous works consider only DNS servers and the networks in which they are located, e.g., the topological distribution of authoritative nameservers. This is insufficient, since DNS manipulation is generally carried out using man-in-the-middle methods [2].
Until now, there has been no consensus on how to quantify the real-world impact of a country on the DNS, though the answer is essential in guiding how the Internet should evolve. Only a few previous studies have looked into the geolocation distribution of root servers and TLD servers [7] and the influence of Autonomous Systems (ASes) [8]. Still, country-level impact cannot be derived from their results. Therefore, the main effort of this work is to collect relevant data and develop a new methodology to assess country-wise importance in DNS, taking path information into account.
Achieving this goal is non-trivial, however. It is impossible to “turn off” and “turn on” a specific country’s network to learn its precise real-world impact. Although researchers have developed various platforms and client debugging applications for DNS measurement, we found that they cannot be directly applied to our problem (e.g., they do not scale to analyzing the whole DNS infrastructure). Besides, we did not find any evaluation metric for country-wise importance.
A. Our Approach
To address these challenges, we design a new approach named
B. Main Findings
We utilize
C. Contributions
The contributions of this paper are listed as follows:
We propose a new approach, DNSWeight, to quantify the importance of a specific country/region to the entire DNS infrastructure.
We use DNSWeight to perform a large-scale measurement study and obtain a suite of insights about DNS reliability through the lens of countries.
We will release the source code of DNSWeight to help other researchers study related issues.
Background
A. Domain Name System
The Domain Name System (DNS) is a distributed, hierarchical database. Resource Records (RRs) of domain names are stored on authoritative name servers (ANS). Figure 1 shows the standard DNS resolution process. When a client requests the resolution of a domain name, the stub resolver usually sends a DNS request to a recursive resolver (step 1). The recursive resolver then iteratively queries the root, Top-Level Domain (TLD), Second-Level Domain (SLD), and deeper-level name servers (steps 2-4). Finally, the DNS response is returned to the client (step 5).
For DNS queries to function properly, some NS records of a domain name need to be stored in its parent zone. NS records contain the domain names of name servers. To reach a name server, one must first resolve the domain names in the NS records. Sometimes, additional glue A or AAAA records are required. On the other hand, some NS records are stored in the domain’s authoritative zone. Figure 1 illustrates the two zones for
B. Border Gateway Protocol
The Internet is a decentralized network consisting of more than 60,000 interconnected network entities called autonomous systems (ASes). The Border Gateway Protocol (BGP) is designed to exchange routing and reachability information between ASes. Routers running BGP propagate their IP prefixes and routing information to peers iteratively. In particular, a router maintains and propagates an AS path for each IP prefix, which represents a routing path from the router toward that prefix, annotated with AS information. The origin of the prefix is also included. As shown in Figure 2, for the routers in AS 500, the entry of IP prefix 3.2.1.0/24 has an AS path “
Used as a reference for route selection, an AS path also reveals the path from an AS to a given IP prefix. We use the BGP data kept by research projects [11]–[13] to construct the routing paths from users to ANS. Section III-C elaborates on how we process the BGP data.
C. Network Sovereignty
Network sovereignty refers to the effort of an authority to create boundaries around the network it governs, for purposes such as information control [14]. In the post-Snowden era, more countries are seeking network sovereignty [15]. Various countries in the East [16]–[18] and the West [19] censor web content as a step towards network sovereignty. While these authorities often argue that it enables better network management, network sovereignty raises concerns not only because it compromises the open Internet, but also because of the potential collateral damage to other countries [20]. In an extreme case, Russia carried out an experiment in 2019 that unplugged its network from the Internet [4].
While the issue is known, we argue that its implications for DNS infrastructure need to be better studied. As advocated by the Internet community, domain owners should install multiple ANS with broad geographical distribution. Techniques like IP Anycast automatically distribute users’ DNS requests from one country to another. Taking Russia as an example, unplugging its network could adversely affect DNS resolution for users outside Russia. No study has attempted to quantify a country’s impact, let alone perform a “what-if” analysis of the consequences when a country’s network changes drastically (e.g., what if China blocks all DNS requests from outside). We aim to fill this gap in understanding and present our measurement results in Section IV-C.
Methodology
We design a system named
A. Overview
Though we cannot “turn off” or “turn on” a network block to learn its precise impact, we can use the publicly available data from DNS and routers to estimate it.
Therefore, Internet Service Providers (ISPs) or even governments may be in a position to manipulate DNS traffic. To quantify the likelihood of this kind of threat, we want to measure “country importance”. It helps us answer the following question: once a query for a random domain name is issued, what is the probability that the resolution process could be influenced by a specified country/region? Intuitively, a higher probability indicates a higher impact. To this end, a large-scale DNS dataset augmented with routing information has to be gathered and analyzed.
Firstly, we choose authoritative name servers as our measurement targets, for several reasons. Unlike recursive resolvers, authoritative name servers are the owners of domain names. A stub resolver may connect to different recursive resolvers under different configurations, but the query will eventually reach the authoritative name server of the domain name. Recursive resolvers are often located near stub resolvers for performance reasons, while authoritative name servers are distributed across different networks, which are governed by the countries in which they are located. In addition, unlike traffic protected by DNS-over-HTTPS or DNS-over-TLS, the traffic between recursive resolvers and authoritative name servers is still mostly plaintext.
Secondly, to obtain a global landscape of country importance, we want to build a comprehensive and representative domain name list and consider all of its entries in our measurement study. The number of target domain names is on the order of millions, which may pose a challenge to existing measurement platforms.
Further, because a country in the middle of the network path traveled by a DNS query may also be able to inspect or hijack DNS packets, we must consider a country’s importance not only as the destination but also as an intermediate hop on the path. Therefore, two types of importance scores are considered in our approach. One is “destination-wise importance”, which indicates the probability of a random query going towards a certain country/region. The other is “path-wise importance”, which measures the probability of a random query going through a country/region. The detailed process is discussed in Section III-D.
Note that the impact of recursive resolvers, especially public resolvers, on the DNS ecosystem cannot be ignored, particularly from the user’s perspective. A country can collect a large volume of user DNS queries from all over the world if it runs a popular international recursive resolver. However, this paper discusses the impact from the perspective of the name owner, i.e., the name servers. We leave the study of recursive resolvers to future work.
While previous studies have introduced measurement platforms, client-side debugging tools, and data sources for DNS measurement, we found they cannot be easily applied to our setting. Measurement platforms like RIPE Atlas [10], [21] can be leveraged to probe any DNS entity with a selected vantage point, but using them to probe a large number of DNS entities is not scalable. DNS data sources like zone files [22] and passive DNS [23], though they offer a broad view of DNS ecosystem, are not cataloged by countries/regions and the routing information (i.e., the routers and nameservers passed by a DNS request) is not included. In addition, we are not aware of any metric that is tailored to measure country-wise importance on DNS and answer the question we raise at the beginning of this section.
Our approach uses both DNS data and BGP data. On the one hand, we collect information about a large number of name servers through active DNS measurements. On the other hand, domain queries can be wiretapped or even tampered with by networks on the path, so attention needs to be paid to the path, too. As discussed above, existing approaches cannot perform path measurements at large scale, so we use BGP data to measure the resolution path. Overall, by leveraging a hybrid model that combines active and passive measurements, we are able to obtain network path information for the specified authoritative name servers from thousands of globally distributed vantage points.
B. ANS Crawler
To quantify the country-wise importance with DNS, we take a collection of ANS associated with popular domains as the measurement target. We set the size of the domain list to be above 1 million, so the majority of users’ DNS requests can be covered. While one can obtain a larger list of domains by downloading TLD zone files (e.g.,
1) Global List
We downloaded domain name lists from Alexa Top Sites and the Tranco list on March 26th, 2020, to populate the global list. Alexa ranks popular domains according to incoming traffic volume [25], and it has been widely used in previous DNS measurement studies [8], [26]–[28]. Tranco is another domain list, proposed by Pochat et al. [29], which overcomes issues of the Alexa list such as instability and ranking manipulation. Merging the two lists yields 1,415,146 unique domains in total.
2) Local List
The global list helps us understand a country’s impact on global DNS users. However, the impact could vary when the scope is reduced to a region, as regional users have different preferences in the domains they visit. For comparison, we create a local list of popular domains based on the DNS requests sent to a large public resolver in China. Counting source IP addresses, we estimate that the resolver has 10 million users. The list contains 1,048,575 unique domain names ordered by the number of DNS queries within one month of 2020. The two lists have only 127,678 domains in common (9.0% of the global list and 12.2% of the local list).
3) Collecting Nameserver Records
We need to extract the IP addresses of all ANS associated with each domain from the domain lists for our measurement study. To this end, we fetch the Nameserver Resource Records (
For each domain name in the list, we perform a recursive resolution starting from the root server. A chain of responses from different levels of authoritative servers (e.g., root, TLD, and SLD) will be returned and we extract the records related to ANS by looking for the response (termed
Algorithm 1 Parent Zone Crawling
domain name
while
if
break;
end if
end while
return
Following, we extract the child
Algorithm 2 Child Zone Crawling
domain name
while True do
if
break;
end if
end while
return
With this recursive searching process, all ANS connected to a domain can be identified. However, this process might incur high overhead on our crawler and on DNS servers. We optimize it by caching the responses locally. Our key observation is that many domain names share the same group of ANS, usually operated by public name service providers like Cloudflare [8]. Thus, if two domains share the same name server, it does not need to be queried twice for its IP address. Therefore, in our approach, all query responses are cached locally during the measurement. A request is sent only after an attempt to retrieve the answer from the local cache has failed. This optimization significantly reduces the workload of not only the recursive resolver, but also the target name servers.
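The caching idea described above can be sketched as follows. This is a minimal illustration rather than the crawler's actual code: the class name `CachedLookup`, the counter `network_queries`, and the stand-in resolver `fake_resolve` are all hypothetical, and a real crawler would plug in an actual DNS query function in place of the stub.

```python
from typing import Callable, Dict, List


class CachedLookup:
    """Wraps a resolver callable and remembers answers per name-server hostname."""

    def __init__(self, resolve: Callable[[str], List[str]]) -> None:
        self._resolve = resolve                  # hypothetical: performs the real DNS query
        self._cache: Dict[str, List[str]] = {}   # hostname -> list of IP addresses
        self.network_queries = 0                 # how many lookups actually left the cache

    def addresses(self, ns_name: str) -> List[str]:
        # Normalise the key so "NS1.Example.NET." and "ns1.example.net" share an entry.
        key = ns_name.rstrip(".").lower()
        if key not in self._cache:
            # Cache miss: only now is a request sent out.
            self.network_queries += 1
            self._cache[key] = self._resolve(key)
        return self._cache[key]


def fake_resolve(name: str) -> List[str]:
    """Stand-in for a real DNS query (documentation addresses, assumption)."""
    table = {"ns1.example.net": ["192.0.2.1"], "ns2.example.net": ["192.0.2.2"]}
    return table.get(name, [])


lookup = CachedLookup(fake_resolve)
# Two domains that share ns1.example.net trigger a single upstream query for it.
for ns in ["ns1.example.net", "NS1.example.net.", "ns2.example.net"]:
    lookup.addresses(ns)
print(lookup.network_queries)
```

Because many popular domains are served by the same few providers, the number of upstream queries in such a scheme grows with the number of distinct name servers rather than with the number of domains.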
We deployed our crawler on a VPS (Virtual private server) in Japan. It takes two days to crawl the global and local lists. For the global list, 1,380,395 (97.5%) names could be resolved. We found 314,270 distinct
Taking a closer look at our data, we found a non-negligible inconsistency between the parent zone and the child zone, which illustrates the necessity of crawling ANS across different name servers. More than 43% of the collected domains have more IPv4 addresses in the child zone than in the parent zone. About 6% of domains have IP addresses in the parent zone that do not belong to the domains’ child zone, which might be dangling records [30]. Our result about
4) Influence of Vantage Point (VP)
Crawling from different parts of the world may lead to different
5) Influence of Temporal IP Churn
In addition to the spatial choice of VP, we also consider the influence of temporal IP churn. Crawling all domains in the list is time-consuming, and the results might vary over time and with network conditions. IP churn related to a domain could lead to inconsistency in the later route lookup stage. To estimate the likelihood of such churn, we crawled the domains again a month later and found that only 1.3% of domains had at least one IP address changed, and all of the changes involved IPv6 addresses. Therefore, we conclude that our dataset is longitudinally stable over the course of our experiment.
C. Country Path Finder
With the ANS collected, we obtain their associated country paths. A country path is defined as all the countries that a request passes through from a vantage point to the destination IP address of an ANS. To achieve this goal, we first collect the routing data and locate the AS paths related to the studied ANS. With the AS-to-country mapping, we then convert each AS path to a country path. Below we elaborate on the three steps.
1) Step One: Collecting Routes
We download BGP tables from routers around the world and extract the routes. Three sources are leveraged: 1) RIPE Routing Information Service [11], 2) RouteViews [12], and 3) Isolario Project [13]. The details of the data are listed in Table 2. The routes from the three sources are merged, and inconsistencies between routers must be resolved. To this end, when multiple routers are observed in an AS, we keep only the router with the largest number of routing entries. IPv4 routers and IPv6 routers are selected separately. We collected snapshots of the route tables from the same period as the ANS crawling. In the end, our dataset contains routes from 654 IPv4 routers and 477 IPv6 routers, covering 679 different ASes.
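The per-AS router selection rule can be illustrated with a short sketch. The input format (tuples of router identifier, AS number, and number of routing entries) and the function name `select_routers` are assumptions made for illustration, and ties are broken here by keeping the first router seen, which the text does not specify.

```python
from typing import Dict, Iterable, Tuple

# (router_id, asn, number_of_routing_entries); all values below are hypothetical.
Router = Tuple[str, int, int]


def select_routers(routers: Iterable[Router]) -> Dict[int, str]:
    """Keep, for each AS, the router that holds the most routing entries."""
    best: Dict[int, Tuple[int, str]] = {}
    for router_id, asn, n_entries in routers:
        # Strict ">" keeps the first router seen on ties (an assumption).
        if asn not in best or n_entries > best[asn][0]:
            best[asn] = (n_entries, router_id)
    return {asn: router_id for asn, (_, router_id) in best.items()}


candidates = [
    ("rtr-a", 64500, 820_000),
    ("rtr-b", 64500, 910_000),   # same AS, more entries: this one is kept
    ("rtr-c", 64501, 400_000),
]
print(select_routers(candidates))
```

In line with the text, this selection would be run once on the IPv4 routers and once on the IPv6 routers.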
2) Step Two: Locating AS Path
A BGP route in the downloaded BGP data consists of a path of AS numbers (called AS path) and an entry IP prefix, which specifies how a packet should travel from one router to another. We locate the AS path between a vantage point router (
3) Step Three: Country Path Mapping
After an AS path is located, we map each AS node to a country to construct a country path, using an AS-to-country database, ASRank [33]. The derived path may contain consecutive identical country nodes, which we merge with a node-collapsing process.
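A minimal sketch of the mapping and node-collapsing step is given below. The AS numbers and the mapping table are made up for illustration, and labelling an AS that is missing from the database as `"??"` is an assumption, since the text does not say how unmapped ASes are treated.

```python
from itertools import groupby
from typing import Dict, List, Sequence


def country_path(as_path: Sequence[int], as_to_country: Dict[int, str]) -> List[str]:
    """Map each AS to its country, then collapse runs of the same country."""
    # Unmapped ASes are labelled "??" (assumption, not stated in the text).
    countries = [as_to_country.get(asn, "??") for asn in as_path]
    # groupby merges only *consecutive* duplicates, so a later return to the
    # same country (a country loop) is preserved.
    return [country for country, _ in groupby(countries)]


# Hypothetical AS-to-country table and AS path.
mapping = {64500: "DE", 64501: "DE", 64502: "US", 64503: "US", 64504: "DE"}
print(country_path([64500, 64501, 64502, 64503, 64504], mapping))
# ['DE', 'US', 'DE']
```

Collapsing only consecutive duplicates matters for the later analysis: a path that leaves a country and re-enters it keeps both occurrences.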
As shown in Figure 4, AS path “
4) Limitation of Router Selection
The vantage points used by our study are actually routers affiliated with the three data sources. Each router has a number of routes but the distribution is not even. In addition, the routers of some data sources are concentrated in one region. For example, the data we collected has far more routers in Europe (see Table 3). We divide vantage points by RIR in the later measurement study, which alleviates the issue of uneven distribution of routers and gives us a more accurate understanding of user perspectives on each continent. Also, the routing data is acquired from public shared routing tables, so private peering [35] is not considered in our study. We leave that for our future work.
5) Country-Path & Geographical Path
We use the AS’s owner country to construct a country path, instead of its routers’ residential information. Sometimes, the residing country and owner country differ [36]. For instance, though managed by a German network provider, a router forwarding the traffic could reside in an IXP (Internet exchange point) in the US. We argue that, even if a router is not located in its registered country, it is still managed by the network to which it belongs, and thus influenced by the policies of the country to which the AS belongs. For instance, network surveillance or traffic hijacking could be deployed. In our study, we want to estimate the influence of the country on the DNS system, so we define the country path based on the aforementioned observation. On the other hand, the geographical path of a route is also discussed in the later section IV-E.
D. Importance Calculator
After we obtain the country paths from all vantage points to each ANS, we calculate the importance score of each country (its country-wise importance) within DNS. Importance is computed separately for the country at the destination and for countries in the middle of the path.
If a country is important as a destination, it is expected to have a strong influence on the resolution result. More specifically, if a country’s destination-wise importance is 1, all queries for all domains in the list will eventually go into this country.
On the other hand, if a country is important in the middle of the path, it is more likely to impact the resolution topology, e.g., deciding the next country to receive the DNS requests. If the country’s path-wise importance is 1, it means that all DNS queries in our measurement will go through this country on the path.
The country-wise importance is computed separately for IPv4 and IPv6.
While the choices of importance metrics are abundant, we would like the metric to measure the probability that, on average, a DNS query passes through/to a country towards a domain name. Here, we use
Based on the above requirement, we design three levels of importance. The higher-level importance is composed of the lower-level ones. A set of notations are defined here: we define the list of domains as
1) IP-Level Importance
We use betweenness centrality [37], a concept in graph theory, to reflect the degree of interaction between one country node and others. It is built on the number of shortest paths passing through the node. This metric has also been used by routing studies [28], [38].
For \begin{equation*} r_{dst}(c_{i}, IP_{j}) = \frac {|\{ p_{k} | p_{k} \in \sigma _{IP_{j}}, c_{i} = last(p_{k}) \}|}{|\sigma _{IP_{j}}|}\end{equation*}
Similarly, the path-wise importance could be defined as:\begin{equation*} r_{path}(c_{i}, IP_{j}) = \frac {|\{ p_{k} | p_{k} \in \sigma _{IP_{j}}, c_{i} \in path(p_{k}) \}|}{|\sigma _{IP_{j}}|}\end{equation*}
To sum up, given a country
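The two IP-level ratios can be computed directly from the set of country paths observed towards one ANS IP address. The sketch below assumes each path is a list of country codes ordered from vantage point to destination, and it reads path(p_k) as every node on the path, endpoints included; if the intended definition excludes the endpoints, the membership test would be applied to `p[1:-1]` instead. Function names and sample paths are illustrative.

```python
from typing import List, Sequence

CountryPath = Sequence[str]


def r_dst(country: str, paths: List[CountryPath]) -> float:
    """Fraction of paths whose last node is `country`."""
    if not paths:
        return 0.0
    return sum(1 for p in paths if p[-1] == country) / len(paths)


def r_path(country: str, paths: List[CountryPath]) -> float:
    """Fraction of paths that contain `country` anywhere (assumption: endpoints count)."""
    if not paths:
        return 0.0
    return sum(1 for p in paths if country in p) / len(paths)


# Hypothetical country paths from four vantage points to one ANS IP address.
sigma = [["DE", "US"], ["JP", "US"], ["FR", "DE", "NL"], ["BR", "US", "DE"]]
print(r_dst("US", sigma), r_path("US", sigma))   # 0.5 0.75
print(r_dst("DE", sigma), r_path("DE", sigma))   # 0.25 0.75
```

Under this reading, a country's path-wise ratio is never smaller than its destination-wise ratio for the same IP address.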
2) Domain-Level Importance
For a country \begin{align*} s_{dst}(c_{i}, d_{k})=&\frac {1}{|N_{k}|}\sum _{A_{j} \in N_{k}}{\frac {1}{|A_{j}|} \sum _{IP_{m} \in A_{j}} {r_{dst}(c_{i}, IP_{m})}} \\ s_{path}(c_{i}, d_{k})=&\frac {1}{|N_{k}|}\sum _{A_{j} \in N_{k}}{\frac {1}{|A_{j}|} \sum _{IP_{m} \in A_{j}} {r_{path}(c_{i}, IP_{m})}}\end{align*}
Domain-level importance is the average of IP-level importance values calculated for each IP address of ANS. It represents how likely a DNS packet goes through/to country
3) DNS-Level Importance
For a country \begin{align*} t_{dst}(c_{i})=&\frac {1}{|D|}\sum _{d_{k} \in D}{s_{dst}(c_{i}, d_{k})} \\ t_{path}(c_{i})=&\frac {1}{|D|}\sum _{d_{k} \in D}{s_{path}(c_{i}, d_{k})}\end{align*}
It represents how likely a DNS packet goes through/to country
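The domain-level and DNS-level formulas are nested averages, which the following sketch makes explicit. It assumes, based on the equations, that N_k is the set of name servers of domain d_k and A_j is the set of IP addresses of name server j; the IP-level values are passed in as a precomputed dictionary for one fixed country, and all names and numbers are illustrative.

```python
from typing import Dict, List

# One inner list of IP addresses per name server of a domain.
NameServers = List[List[str]]


def domain_level(name_servers: NameServers, r: Dict[str, float]) -> float:
    """Average over name servers of the average IP-level importance of their IPs."""
    per_ns = [sum(r[ip] for ip in ips) / len(ips) for ips in name_servers]
    return sum(per_ns) / len(per_ns)


def dns_level(domains: Dict[str, NameServers], r: Dict[str, float]) -> float:
    """Average of the domain-level importance over all domains in the list."""
    return sum(domain_level(ns, r) for ns in domains.values()) / len(domains)


# Hypothetical IP-level importance of one country for three ANS addresses.
r_values = {"192.0.2.1": 1.0, "192.0.2.2": 0.5, "198.51.100.1": 0.0}
domain_list = {
    "example.com": [["192.0.2.1", "192.0.2.2"], ["198.51.100.1"]],
    "example.org": [["192.0.2.1"]],
}
print(domain_level(domain_list["example.com"], r_values))  # 0.375
print(dns_level(domain_list, r_values))                    # 0.6875
```

The same two functions serve for both the destination-wise and the path-wise variant; only the dictionary of IP-level values changes. Averaging per name server first means that a name server with many addresses does not outweigh one with a single address.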
4) Discussion
The BGP dump is the snapshot of the current routing table. If a path in it fails due to some reason, it is highly likely for the router to choose another viable route path. Therefore, instead of all possible routes, we only consider the current path in the snapshot in our approach to reduce the computation overhead and depict current importance. Still, the volume of paths (over tens of millions) presents a representative view of the country’s importance in DNS.
Measurement Result
With the routing data collected for the domains in our lists, we assess each country’s importance in the DNS infrastructure and report our findings in this section. We first look into the impact of different countries in the IPv4 and IPv6 address spaces and each country’s influence on domain names. Then, we switch the perspective from country to domain and measure the geographical patterns of ANS deployment. Next, we analyze the patterns of country paths, and loops in particular. Finally, we study how the local list affects the measurement result.
A. IPv4 Country Importance
Maintaining ANS diversity is important for the robustness of DNS resolution and is strongly advocated by the Internet community. In particular, RFC 1034 requires at least 2 ANS to be maintained for each DNS zone [41], and RFC 2182 asks for geographical and topological diversity of ANS [42]. While ANS diversity within zone files has been measured [8], the role a country plays and how it affects DNS resolution have not been assessed. The data we collect allows us to answer these questions.
Specifically, we choose the domain names in the global list (1,380,395) and their IPv4 ANS (154,333 in parent zone and 223,154 in child zone) to measure the country’s importance in the IPv4 space. For the routes reaching the ANS, we separate them by the routers’ RIR to more precisely assess the impact based on users’ geo-locations, since the routing paths could differ based on where users initiate DNS request. The 5 RIRs are AfriNIC (Africa), APNIC (East Asia, Oceania, South Asia, and Southeast Asia), ARIN (Antarctica, Canada, parts of the Caribbean, and the United States), LACNIC (the Caribbean and all of Latin America) and RIPE NCC (Europe, Central Asia, Russia, and West Asia). Table 3 lists the statistics of our routers based on RIR. Though our collected paths are unbalanced among different RIRs, we are able to have sufficient paths for each one to obtain meaningful results.
We apply
Our first finding is the United States (US) plays the dominant role in DNS, which echoes with other studies about Internet infrastructures [43], [44]. Not only it serves most ANS (
Secondly, we find that in different regions, some countries have a greater local impact. For instance, in APNIC, countries/regions like Hong Kong (HK, 0.1233), China (CN, 0.057) and Singapore (SG, 0.038) have a large share of the importance score. In LACNIC, by contrast, Spain (ES, 0.143), Italy (IT, 0.066) and Brazil (BR, 0.031) play important roles. The reason for such diversity may be due to differences in user interests (large
Thirdly, as shown in Figure 5, some countries such as China and Russia have a large
World HeatMap about country-wise importance (in logarithmic scale) aggregated across RIRs.
B. IPv6 Country Importance
Though the push for broader IPv6 adoption is strong, section III-B shows yet the ANS support of IPv6 is disproportional to IPv4 (e.g.,
Table 5 lists the top 10 countries/regions in a similar way as Table 4. US still tops the overall impact (
Each country’s impact on an RIR differs, with some countries having a much stronger presence than others, and we want to quantify this gap within each RIR. To this end, we compute the Gini coefficient [48] among countries, which is the most widely used measure of inequality in economics. It has also been used to measure inequality in social networks, e-commerce, and the digital divide [49]–[51]. Assume \begin{equation*} G = \frac {\sum _{i=1}^{n} { \sum _{j=1}^{n} {|t_{i}-t_{j}|} }}{2n\sum _{i=1}^{n}{t_{i}}}\end{equation*}
If the Gini coefficient equals 0, there is total equality among countries. Conversely, if the Gini coefficient equals 1, one country has full control of the entire DNS in the region. As shown in Table 6, the Gini coefficients are all relatively high (over 0.9) for both IPv4 and IPv6 in all 5 RIRs, meaning that the inequality gap is prominent. Compared to IPv4, IPv6 has even larger Gini coefficients, which can be explained by the vastly different levels of investment each country makes in IPv6 development. To make the DNS infrastructure more robust, such inequality should be addressed through continuous efforts from the Internet community.
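The Gini coefficient over a list of per-country importance scores can be computed with a direct translation of the mean-absolute-difference formula. This is a minimal sketch: the function name `gini` and the sample scores are illustrative, and returning 0 for an empty or all-zero input is a convention chosen here, not something the text specifies.

```python
from typing import Sequence


def gini(scores: Sequence[float]) -> float:
    """Sum of |t_i - t_j| over all ordered pairs, divided by 2 * n * sum(t)."""
    n = len(scores)
    total = sum(scores)
    if n == 0 or total == 0:
        return 0.0  # convention for degenerate input (assumption)
    pairwise = sum(abs(a - b) for a in scores for b in scores)
    return pairwise / (2 * n * total)


print(gini([1.0, 1.0, 1.0, 1.0]))  # 0.0: every country equally important
print(gini([0.0, 0.0, 0.0, 1.0]))  # 0.75: one country holds everything
```

With this form of the coefficient, a single dominant country among n yields (n - 1)/n, so the value approaches 1 as the number of countries grows. The quadratic loop is adequate for a few hundred countries; a sorted, linear-time variant would only matter for much larger inputs.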
C. Country’s Influence on Domains
After examining a country’s impact on the entire DNS, we drill down to the level of the individual domain. A domain could be influenced by a country on several levels. If a domain’s ANS is located in a country, then all requests will be affected no matter where they come from. The resolution of this domain could be disrupted if the country chooses to cut off the links from the Internet. The country on the path can also eavesdrop or manipulate DNS packets when encryption is not enforced, which is still the dominant case [52]. Therefore, we break down a country’s influence on domain names into four levels and measure them separately:
Absolute: All paths to all ANS of a domain are directed towards that country (the domain will not be resolvable after an Internet cut-off).
Semi-Absolute: Excluding the absolute case, the country appears in every path to all ANS of the domain (the country has the ability to inspect and manipulate all queries about this domain).
Influential: Excluding the above two cases, the country appears in at least one path to the domain’s ANS.
None: The country does not appear in any path.
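The four levels can be expressed as a small classification function. The sketch assumes that `paths` holds every country path from every vantage point to every ANS of one domain, each ordered from vantage point to destination, and it reads "directed towards that country" as the path ending in that country. The function name and sample paths are illustrative.

```python
from typing import Iterable, List, Sequence


def influence_level(country: str, paths: Iterable[Sequence[str]]) -> str:
    """Classify a country's influence on one domain from its country paths."""
    all_paths: List[Sequence[str]] = list(paths)
    if not all_paths:
        return "none"
    # Absolute: every path terminates in the country.
    if all(p[-1] == country for p in all_paths):
        return "absolute"
    # Semi-absolute: not absolute, but the country sits on every path.
    if all(country in p for p in all_paths):
        return "semi-absolute"
    # Influential: the country sits on at least one path.
    if any(country in p for p in all_paths):
        return "influential"
    return "none"


# Hypothetical paths to the ANS of one domain.
domain_paths = [["DE", "US"], ["US", "CA"]]
for cc in ["US", "DE", "FR"]:
    print(cc, influence_level(cc, domain_paths))
```

The checks are ordered from strongest to weakest, so each domain and country pair falls into exactly one level, matching the "excluding the above" wording of the definitions.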
We combine all IPv4/IPv6 paths from all 5 RIRs to compute every country’s influence level on every domain in the global list. We find
Next, we want to answer a “what-if” question: when a country isolates itself from the Internet, how much impact does this have on the DNS ecosystem? We investigate a list of countries that have previously been found cutting off the Internet [4], [5], [53]–[55], together with their influence levels; some of them are also listed in Table 7. Among these countries with a history of “Internet cut-off”, China (709,899), Russia (328,389), and India (767,454) are the top 3 in
D. ANS Deployment Patterns
The previous measurement tasks investigate the impact of countries on the resolution paths. In this section, we change the view from country to domain and analyze the preferences of domain owners in installing ANS into the zone files. The deployment pattern of ANS has been measured by previous work [8] and we complement this work by adding another view about countries: we compute country-wise diversity, or the number of countries associated with all ANS, for each domain.
Figure 7a illustrates the country-wise diversity of domain names in the global list (1,380,395 domains), with the IPv4 and IPv6 data combined. We found that 77% of domains place all of their ANS in one country, which makes them vulnerable when that country’s Internet is disrupted.
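Country-wise diversity, as defined above, is the number of distinct countries across all ANS of a domain. The sketch below assumes each ANS is represented by a set of country codes (more than one when its IP address is announced from ASes in several countries); the function names and the sample data are illustrative and do not reproduce the measured 77% figure.

```python
from typing import Dict, List, Set


def country_diversity(ans_countries: List[Set[str]]) -> int:
    """Number of distinct countries associated with all ANS of one domain."""
    return len(set().union(*ans_countries)) if ans_countries else 0


def single_country_ratio(domains: Dict[str, List[Set[str]]]) -> float:
    """Share of domains whose ANS are all associated with exactly one country."""
    if not domains:
        return 0.0
    single = sum(1 for ans in domains.values() if country_diversity(ans) == 1)
    return single / len(domains)


# Hypothetical per-domain data: one set of countries per ANS.
sample = {
    "example.com": [{"US"}, {"US"}],          # diversity 1
    "example.org": [{"US"}, {"DE"}],          # diversity 2 via multiple ANS
    "example.net": [{"US", "NL"}],            # diversity 2 via multiple origin ASes
    "example.edu": [{"JP"}],                  # diversity 1
}
print({d: country_diversity(a) for d, a in sample.items()})
print(single_country_ratio(sample))
```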
Country-wise diversity versus the ratio of domains. “MOAS”, “MANS”, “both” and “none” characterize the reason behind the country distribution.
We further characterize the reason behind the country distribution, which is also illustrated in Figure 7a. Two prominent reasons are identified: Multiple Origin AS (MOAS) [56] and multiple ANS (MANS). MOAS happens when an IP of an ANS is announced by multiple ASes, which is usually caused by IP Anycast, load-balancing with multi-homing [57] or misconfiguration of routers [56]. MANS occurs when the domain owner intentionally installs multiple ANS in different countries. We classify a domain with multiple ANS into MOAS, MANS, and both, using BGP and AS-to-Country data. We found MANS is the dominant category when more than two countries are involved and it can be explained by the use of third-party DNS providers. For instance, many domain names are hosted by
Then we look into domain names with dual-stack name resolution (supporting both IPv4 and IPv6) [58]. Out of 806,361 domains with at least one IPv6 ANS, we found that the country-wise diversity of 27.2% of domains is greater than 1, a higher ratio than for domains with only IPv4 ANS (16.4%). Still, the majority (72.8%) of dual-stack domain names have all of their IPv4 and IPv6 ANS located in one country. On the other hand, for 17,114 domains, the IPv6 ANS introduce at least one country not covered by the IPv4 ANS, suggesting that these domains use different sets of ANS for IPv4 and IPv6 resolution. Yet IPv6 ANS are far fewer than IPv4 ANS, and we recommend broadening the deployment of IPv6 ANS for more robust IPv6 and overall DNS resolution. Figure 8 compares the distribution of country-wise diversity between dual-stack and pure IPv4 resolution.
CDF of the country-wise diversity for domains supporting dual-stack/pure IPv4 resolution.
The prior measurement is carried out on the global list, which contains both popular and less popular domains. We are interested in whether similar country-wise diversity is observed among the most popular domains. To this end, we select the top 1K domains from the Alexa list; the measurement result is illustrated in Figure 7b. It turns out that the top domains are more likely to concentrate their ANS: about 88% of them use one country for their ANS, compared to 77% of all domains in the global list. A possible reason is that these sites prefer self-hosting, or are served by DNS providers that do not support country diversity. When more than one country hosts ANS, MANS is the major reason.
E. Country Path and Country Loop
With Country Path Finder of
Among the country paths, country loops, which travel the same country/region twice, deserve more attention because they could introduce inefficient or privacy-violating routing [59]. Previous research uses BGP or traceroute data to identify the loops solely on the routing plane [14], [59]. Our study extends it to the DNS resolution setting. Specifically, we analyze all 78,904,677 country paths (IPv4 and IPv6 together) collected from 2,035 routers and identify the ones with repeated country/region nodes. In total, 2,997,486 paths are detected, constituting 4.6% of all paths. Then, we select the ones with the same starting and ending country/region node and obtain 1,372,231 paths from 1,274 routers, about 1.7% of all paths. They are considered as the country loops of our interest and we divide them based on the RIR of the country nodes. The numbers are shown in Table 8.
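The two filtering steps described above (paths with a repeated country node, then paths that also start and end in the same country) can be sketched as follows. The sketch assumes the paths have already been collapsed so that consecutive duplicates are merged, which means any repetition corresponds to leaving a country and re-entering it; function names and sample paths are illustrative.

```python
from typing import List, Sequence


def has_repeated_country(path: Sequence[str]) -> bool:
    """True if some country/region appears more than once on a collapsed path."""
    return len(set(path)) < len(path)


def is_country_loop(path: Sequence[str]) -> bool:
    """True if the path starts and ends in the same country after leaving it."""
    # On a collapsed path, equal endpoints imply at least one foreign hop between them.
    return len(path) >= 3 and path[0] == path[-1]


# Hypothetical collapsed country paths.
paths: List[List[str]] = [
    ["CA", "US", "CA"],        # loop: starts and ends in CA
    ["DE", "US", "NL", "US"],  # repeated node (US) but different endpoints
    ["JP", "US"],              # neither
]
repeated = [p for p in paths if has_repeated_country(p)]
loops = [p for p in repeated if is_country_loop(p)]
print(len(repeated), len(loops))  # 2 1
```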
Although country loops constitute only a small portion of all country paths, they appear on 62.6% of routers (1,274 out of 2,035). We find that 62.1% of country loops follow the path “ARIN -> RIPE NCC -> ARIN”, suggesting a strong connection between Europe and North America. In some countries, such as Poland, Japan, and Australia, over 90% of routers have at least one country loop.
To further analyze the reasons behind country loops, we cross-check some of the path data with PeeringDB [36], a database storing the geographic locations of exchange point facilities. We select the paths in which every hop has a match in PeeringDB, so that the geo-location of every hop can be inferred from the facility location of the exchange point. We find that many country loops (15,636 different paths in our data) are possibly located in a single country, since every hop, as well as the start and the end of the path, has at least one facility in the same country/region. However, this could still be an issue because, as we discuss in Section III-C, a router is managed by its network owner and thus could be influenced by the policy of the country behind it, e.g., placed under network surveillance. On the other hand, a portion of the loops crosses countries if no private peering is assumed. For instance, European countries are well connected to each other, and many loops (202 in our data) emerge among them. Frequent data exchange between Hong Kong and Singapore sometimes leads to loops (28 in our data) as well. We also find some “Canada-US-Canada” loops (157 in our data), which echo previous work [14].
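The single-country check described above amounts to a set intersection. The sketch below assumes that, for every hop of a path (including its start and end), the set of countries in which the corresponding network has a facility has already been extracted from PeeringDB. The extraction step is omitted, and the function names and example data are hypothetical.

```python
from typing import List, Set


def shared_countries(hop_facility_countries: List[Set[str]]) -> Set[str]:
    """Countries in which every hop has at least one facility."""
    if not hop_facility_countries:
        return set()
    return set.intersection(*hop_facility_countries)


def possibly_single_country(hop_facility_countries: List[Set[str]]) -> bool:
    """True if the whole path could be located inside one country."""
    return bool(shared_countries(hop_facility_countries))


if __name__ == "__main__":
    # Hypothetical facility sets for a three-hop "CA-US-CA" country loop.
    loop = [{"CA", "US"}, {"US", "CA", "GB"}, {"CA"}]
    print(shared_countries(loop), possibly_single_country(loop))
```

A non-empty intersection only shows that the path may stay within one country. It does not establish where the traffic actually flows.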
F. Local List
We use the global list for the prior measurement tasks. However, country-wise importance could differ when a different set of domains is inspected. To quantify the impact of domain selection, we switch from the global list to the local list described in Section III-B, which consists of 1,048,575 domains encountered by a public resolver in China. The routes to those domains are collected and we use
In Table 9 we show the top 10 countries based on
In contrast to
In the previous experiments, we treat all domains in the target list equally in order to represent the impact of a region on the entire domain name system. In this setting, a region that can influence more domains receives a greater importance score. Alternatively, we can account for the different levels of importance of individual domain names. As introduced in Section III-D, previous research [40] shows that user access to domain names follows a power-law distribution, which implies that a small number of the most popular domains attract most of the visits. For the domains in the local list, we collect the total number of times each was queried. In the following experiment, we weight each domain by its query count, normalize the weights, and use them to calculate the importance score. In this setting, a region that can influence more queries, i.e., more popular domains, receives a greater importance score.
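The weighting and normalization step can be illustrated with a minimal sketch. It does not reproduce the importance algorithm itself. It only assumes that the set of domains a region can influence is already known, and compares the share of domains with the share of queries. All names and numbers below are hypothetical.

```python
from typing import Dict, Set


def normalized_weights(query_counts: Dict[str, int]) -> Dict[str, float]:
    """Turn per-domain query counts into weights that sum to 1."""
    total = sum(query_counts.values())
    if total == 0:
        raise ValueError("query counts sum to zero")
    return {domain: n / total for domain, n in query_counts.items()}


def weighted_share(influenced: Set[str], weights: Dict[str, float]) -> float:
    """Total weight of the domains a region can influence."""
    return sum(weights.get(domain, 0.0) for domain in influenced)


if __name__ == "__main__":
    counts = {"a.example": 900, "b.example": 90, "c.example": 10}  # hypothetical
    influenced = {"a.example"}  # hypothetical: domains one region can influence
    uniform = {d: 1 / len(counts) for d in counts}
    print(weighted_share(influenced, uniform))                     # share of domains
    print(weighted_share(influenced, normalized_weights(counts)))  # share of queries
```

With a power-law query distribution, a region influencing a few very popular domains can hold a large query-weighted share while its share of domains is small.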
The result is shown in Table 10. In general, the regions/countries that are more important under the unweighted calculation remain so, such as China (CN), the United States (US), Hong Kong (HK), and Singapore (SG). This suggests that the regions that affect more domain queries generally also affect more domain names. Compared to IPv6, IPv4 scores are more concentrated in China, because many popular websites in the list are used by Chinese users and their DNS is also deployed in China. Our further fine-grained analysis reveals that, for IPv4 authoritative name servers, 97.6% of the importance score is related to just 1% of the IP addresses. This result is consistent with previous studies [8]. Overall, the different methods of importance calculation give a consistent picture.
Related Works
Our study measures the importance of a country through the lens of DNS. Below we first review measurement studies focusing on routing and on DNS, and then those using both kinds of data.
A. Measurement of Routing
To measure the dynamics of Internet routes, two data sources are mainly used [60]: BGP [59], [61]–[67] and traceroute [14], [28], [68]–[71]. Traceroute requires active probing, which is not feasible for our large-scale analysis [28], [71]. In addition, prior studies show that AS information derived from traceroute data could be inaccurate [72], [73]. As such, we use BGP data, which offers better route coverage and accuracy. Nevertheless, neither data source is perfect [60], [74], [75], and inconsistencies between them have been observed [76]. We plan to investigate how to augment the BGP data with traceroute data to obtain more precise results in the future.
Previous studies have focused on measuring the geographic characteristics of routing. Reference [65] measured country-to-country importance in routing, that is, how likely a country is to be on the routes between any two other countries. References [66] and [67] evaluated AS-to-AS and AS-to-country importance similarly. Some works estimated country-to-web [28], [71] or AS-to-web [64] importance, that is, how likely a country or an AS is to be on the routes for visiting popular websites. Routing detours are measured in [14], [28], [59], [71], and [77] found that anycast traffic is sometimes routed to an out-of-country PoP (Point of Presence) even when an in-country PoP is available.
Compared to the prior studies, the main contribution of this work is to offer new insights into country-to-DNS importance, which has not been investigated before. In addition, the number of websites and countries we select to assess the importance is significantly larger than in prior works, which use hundreds of popular sites and a few countries [28], [64], [71].
B. Measurement of DNS
Numerous works have measured the DNS infrastructure with passive or active data collection. The first approach [78], [79] obtains DNS logs from DNS servers [77] and historical databases [23], or downloads zone files [8]. The second approach issues DNS queries from vantage points against DNS servers for performance measurement or anomaly identification. Platforms like RIPE Atlas [9], [10], [21], [80], [81], proxy networks [2], [82], and ad networks [83] have been leveraged as crowd-sourced vantage points. In addition, researchers use open resolvers, which can be identified by scanning the IPv4 address space [3], [39], [84], to forward DNS requests and conduct active measurements [26], [80], [85].
Our study attempts to assess country importance based on the distribution of nameservers. Previous works have extensively measured nameservers, focusing on aspects such as performance [79], [81], [86], [87], security [30], [88], privacy [52], [89], configuration issues [8], [90], and record inconsistencies [9], [10], [21]. Though DNS is designed as a distributed system for reliability, recent studies have revealed a trend toward centralizing DNS services. For example, [8] showed that SLD names increasingly share nameservers. References [27], [91], [92] discovered that the nameservers of popular websites run on a small number of hosting or cloud services. Inter-dependencies were identified between zone files [93], [94], which could potentially damage the reliability of DNS [95]. The results of our study complement prior works on the distribution of DNS services, showing that some countries have a prominent impact on the whole DNS infrastructure.
C. Measurement on DNS and Routing Jointly
To optimize DNS resolution performance, the routes between users and nameservers are heavily engineered. IP anycast is a technique leveraged to this end, and it has been measured using passive or active analysis [77], [96]–[99]. On the other hand, route hijacking against DNS has been discovered, and DNS and routing data have been combined to detect such incidents [1], [98], [100]–[102]. Yet no prior study has assessed the importance of particular parties to the DNS infrastructure through the lens of routes, and we make the first attempt.
Conclusion
To quantify the importance of a country on the entire DNS infrastructure, we present
Here we revisit the measurement results and highlight the key insights. Firstly, importance is quite unbalanced among countries. The US plays the dominant role in DNS infrastructure in every region, scoring over 0.75 on AfriNIC, LACNIC, and RIPE NCC, while the second country/region scores less than 0.25. European countries like Germany also have a consistently strong influence across RIRs. The gap widens further when IPv6 is inspected, with the US reaching over 0.9 on all RIRs except ARIN. Such an observation could be unique to the DNS infrastructure, as diversifying nameservers is recommended by IETF RFCs and the network infrastructure of the US and European countries is more likely to be relied on. Secondly, countries with a history of network sovereignty have a significant impact on a large number of domains if they choose to isolate themselves from the Internet. Out of the 1.38 million domains we surveyed, China and Russia can achieve absolute control over 50K domains, while China and India can influence the resolution of over 700K domains. This result shows that the DNS world is highly connected and that DNS reliability needs to be reconsidered in the context of country politics. Thirdly, the routes of DNS resolution are far from optimal, with the average length of a country path being 2.7 (excluding the US) and country loops observed on 62.6% of the routers we surveyed.
While we are pleased to see the community’s effort in making DNS more reliable (e.g., multiple nameservers for one domain), we are concerned about the inequality of investment in DNS infrastructure between countries and the potential impact of network sovereignty exercised by certain countries. The issues with DNS routing should also be addressed to improve users’ DNS experience. We believe country-wise importance should be considered an essential factor when structuring DNS infrastructure, and further research is warranted.