I. Introduction
The number of Internet-connected devices has grown exponentially and is expected to exceed 50 billion by 2020, heralding the era of the Internet of Things and complex networks [1]. The increase in the size and complexity of such networks magnifies the difficulty of network operation and management. In fault management in particular, the large number of network devices and the connections between them implies a high frequency of failures. At the same time, the failure recovery task becomes much more challenging due to the complex nature of the networks. A failure of a crucial network component, if not recovered in a timely manner, could lead to severe consequences such as service disruption to many customers and revenue loss for network operators. Further, anecdotal evidence indicates that it can take hours or days to resolve a failure using a manual recovery approach based on a combination of ping, traceroute, and other functionalities for Ethernet and MPLS [2]. Thus, it is desirable that the network be equipped with an efficient fault management system that is able to first diagnose failures as soon as possible and then quickly recover the network from such failures, e.g., using a proactive failure recovery mechanism [3].