1 Introduction
The growing complexity of information systems has significantly increased the difficulty and cost of maintaining and managing large-scale distributed systems. According to an IDC study, operational system management accounted for 20 percent of IT costs in 1990, and this share is now approaching 70 percent [21], [20]. Large information systems such as Google.com and Amazon.com consist of thousands of components, including servers, software, networking devices, and storage equipment. Although each of these components is complex enough by itself, the dynamic interaction among them introduces another dimension of complexity. The complexity of information systems originates not only from their scale but also from their dynamics and heterogeneity. For example, user behaviors and loads are always changing, software and hardware components are frequently replaced or upgraded, and a system may itself contain sources of uncertainty such as caching. Meanwhile, each information system integrates various software and hardware components, which are usually supplied by many different vendors and have their own specific configurations. Therefore, maintaining and managing distributed systems of such scale and complexity has been a great challenge.