I. Introduction
Skew-Minimized Buffered Clock-Tree Synthesis (BCTS) plays an important role in high-performance very large-scale integration (VLSI) designs for synchronous circuits. Because existing timing models are not accurate enough for modern chip design, embedding a simulation process into the clock-tree synthesis flow has become inevitable [9]. Consequently, the running time of clock-tree synthesis becomes prohibitively long as the complexity of chip designs grows rapidly. It is therefore desirable to develop an efficient mechanism for synthesizing large-scale clock trees.