I. Introduction
Because optical lenses have a limited depth of focus, it is difficult to capture a complex scene accurately in a single image [1]. In wireless visual sensor networks, multiple sensors acquire images of the same scene, and a centralized fusion centre combines the source images from these sensors into a single image that is better suited to human visual perception and machine perception [2]. The fused image is then transmitted to an upper node.