1 Introduction
Due to limits in technology scaling, software developers have come to rely on thread-level parallelism to obtain sustainable performance improvements. However, except when the computation is massively parallel (e.g., data-parallel applications), the performance of threaded applications is often limited by how inter-thread synchronization is performed. For example, coarse-grained locks can limit scalability, since the execution of lock-guarded critical sections is inherently serialized. Fine-grained locks, in contrast, may provide good scalability, but they increase locking overhead and can often lead to subtle bugs.
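
The trade-off can be made concrete with a minimal sketch, which is an illustration under our own assumptions and is not drawn from any particular system. It shows a table of counters protected first by a single lock and then by one lock per bucket. The names `CoarseCounterTable`, `FineCounterTable`, and `hammer` are hypothetical and chosen only for this example.

```python
import threading


class CoarseCounterTable:
    """One lock guards every bucket: simple, but all updates serialize."""

    def __init__(self, num_buckets):
        self._lock = threading.Lock()
        self._counts = [0] * num_buckets

    def increment(self, key):
        with self._lock:  # every thread contends on this one lock
            self._counts[hash(key) % len(self._counts)] += 1

    def total(self):
        with self._lock:
            return sum(self._counts)


class FineCounterTable:
    """One lock per bucket: updates to different buckets do not wait on each other."""

    def __init__(self, num_buckets):
        self._locks = [threading.Lock() for _ in range(num_buckets)]
        self._counts = [0] * num_buckets

    def increment(self, key):
        i = hash(key) % len(self._counts)
        with self._locks[i]:  # only threads hitting bucket i contend
            self._counts[i] += 1

    def total(self):
        # A consistent snapshot needs every lock. Acquiring them in one
        # fixed order avoids deadlock; getting this wrong is the kind of
        # subtle bug that fine-grained locking invites.
        for lock in self._locks:
            lock.acquire()
        try:
            return sum(self._counts)
        finally:
            for lock in self._locks:
                lock.release()


def hammer(table, num_threads=4, increments=1000):
    """Run concurrent increments against a table and return its final total."""

    def worker(tid):
        for i in range(increments):
            table.increment(tid * increments + i)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.total()
```

Both variants produce the same totals. They differ in how much of the work is serialized and in how much locking logic must be written correctly: the fine-grained `total` already requires a lock-ordering discipline that the coarse-grained version does not.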