I. Introduction
Parallel preconditioned solvers based on Krylov iterative methods are widely used in scientific and engineering applications. The overhead of global communication is a critical issue when these solvers run on large-scale, massively parallel supercomputers. Communication-computation (CC) overlapping in halo exchanges [1] is a well-known remedy for communication overhead in stencil computations [2].