I. Introduction
As embedded systems must meet tight constraints on performance and power efficiency, coarse-grained reconfigurable architectures (CGRAs) have received much attention for their high performance and energy efficiency [1], [2]. Recently, the dual-$V_{dd}$ technique has been applied to CGRAs to reduce both dynamic and static power [3], [4]. In dual-$V_{dd}$ CGRAs, a high voltage ($V_{ddH}$) is assigned to the processing elements (PEs) that execute long-delay operations (e.g., multiplication), and a low voltage ($V_{ddL}$) is assigned to PEs that execute short-delay operations (e.g., addition). In other words, long-delay operations must be executed on PEs assigned $V_{ddH}$, whereas short-delay operations can be executed on PEs assigned either $V_{ddH}$ or $V_{ddL}$.
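The placement rule stated above (long-delay operations only on high-voltage PEs, short-delay operations on either) can be illustrated with a minimal Python sketch. All names here (`Voltage`, `LONG_DELAY_OPS`, `allowed_voltages`, `is_valid_mapping`) are hypothetical and introduced only for illustration. The operation classes contain just the two examples given in the text, and the sketch checks voltage compatibility only, not timing, routing, or any other mapping constraint.

```python
from enum import Enum
from typing import Dict, Set


class Voltage(Enum):
    HIGH = "high"  # high supply voltage
    LOW = "low"    # low supply voltage


# Assumption: classification taken only from the examples in the text.
LONG_DELAY_OPS = {"mul"}
SHORT_DELAY_OPS = {"add"}


def allowed_voltages(op: str) -> Set[Voltage]:
    """Return the PE voltages on which an operation may execute."""
    if op in LONG_DELAY_OPS:
        # Long-delay operations require a high-voltage PE.
        return {Voltage.HIGH}
    if op in SHORT_DELAY_OPS:
        # Short-delay operations tolerate either voltage.
        return {Voltage.HIGH, Voltage.LOW}
    raise ValueError(f"unknown operation: {op}")


def is_valid_mapping(
    ops: Dict[int, str],
    pe_voltage: Dict[str, Voltage],
    placement: Dict[int, str],
) -> bool:
    """Check that every operation sits on a PE with a compatible voltage."""
    return all(
        pe_voltage[placement[op_id]] in allowed_voltages(op)
        for op_id, op in ops.items()
    )
```

For example, with one high-voltage PE and one low-voltage PE, placing the multiplication on the high-voltage PE and the addition on the low-voltage PE satisfies the rule, while swapping the two does not.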