I. Introduction
Internet applications have greatly improved our lives. As the scale of the network grows rapidly, the network becomes far more complex than before, and with the accompanying increase in traffic, the problem of network management becomes much more prominent [1]. Network management is essential for operators to guarantee the safety and efficiency of the network. In addition, effective network management provides users with a high quality of service and protects the network against congestion, Distributed Denial of Service (DDoS) and other network attacks [2]–[4]. A crucial input to network management is the traffic matrix (TM), which describes how the traffic of the origin-destination (OD) flows in the network evolves over time. There are many taxonomies of methods for obtaining the traffic matrix. Generally, these methods fall into two classes: direct measurement methods and network traffic estimation methods [5]. Network traffic estimation methods infer the traffic matrix from the relationship between the traffic matrix and the link loads. However, this inference problem is highly ill-posed, typically because there are far more OD flows than links, so an accurate traffic matrix estimator is very difficult to obtain. Direct measurement, on the other hand, can obtain a precise traffic matrix, but it increases the network load and consumes substantial router resources (e.g., CPU and memory) [6].
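The ill-posed nature of traffic matrix estimation can be illustrated with a minimal sketch. It assumes the commonly used linear model in which the link loads equal a routing matrix multiplied by the vector of OD flows; the three-node line topology, the name `ROUTING`, the helper `link_loads` and all flow values are hypothetical and chosen only for illustration. Because there are three OD flows but only two links, two different traffic matrices produce identical link loads.

```python
# Toy line topology A -> B -> C with two directed links (hypothetical example).
# Rows are links (A-B, B-C); columns are OD flows in the order (A->B, B->C, A->C).
ROUTING = [
    [1, 0, 1],  # link A-B carries flows A->B and A->C
    [0, 1, 1],  # link B-C carries flows B->C and A->C
]


def link_loads(routing, flows):
    """Return the load on each link: the product of the routing matrix and the flow vector."""
    return [sum(r * x for r, x in zip(row, flows)) for row in routing]


# Assumed "true" OD flow volumes (arbitrary units).
true_flows = [3, 5, 2]

# The direction (1, 1, -1) lies in the null space of ROUTING, so shifting the
# flows along it changes the traffic matrix but leaves every link load unchanged.
null_dir = [1, 1, -1]
alt_flows = [x + 2 * d for x, d in zip(true_flows, null_dir)]

observed = link_loads(ROUTING, true_flows)
print("link loads from true flows:", observed)
print("link loads from alternative flows:", link_loads(ROUTING, alt_flows))
```

Both calls report the same link loads, so link measurements alone cannot distinguish the two candidate traffic matrices; an estimator needs additional prior information or direct measurements to choose between them.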
An illustration of traffic measurement optimization based on a reinforcement learning scheme.