I. Introduction
The latest video coding standard, H.264, like other modern standards, uses intra (I) frames and inter frames (predicted (P) and bi-directional (B) frames) for improved video coding [1]. An I-frame is encoded using only its own information. It can therefore be used to prevent error propagation and to support fast backward/forward play, random access, indexing, and so on. A P-frame or B-frame, on the other hand, is coded with the help of previously encoded I- or P-frame(s) for efficient coding. In the H.264 standard, frames are coded in groups of pictures (GOPs), each comprising one I-frame followed by inter-frames. I-frames are far fewer than inter-frames because an I-frame typically requires several times more bits than its inter-coded counterpart for the same image quality. An I-frame serves as the anchor frame that the subsequent inter-frames of a GOP reference, directly or indirectly. Thus, the encoding error (due to quantization) of an I-frame is propagated and accumulated toward the end of the GOP. As a result, image quality degrades and the bit requirement increases toward the end of the GOP. When another I-frame is inserted for the next GOP, better image quality is recovered (at the cost of more bits), and quality then degrades again toward the end of that GOP. Consequently, the farther an inter-frame is from the I-frame, the lower its quality. This fluctuation of image quality (or bits per frame) is undesirable for perceptual quality (or bit-rate control) [2]–[4]. By selecting the first frame of each GOP as an I-frame without verifying its suitability, we sacrifice 1) overall rate-distortion performance, because of a poor choice of I-frame, and 2) perceptual image quality, because of the resulting quality fluctuation.
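The fixed GOP structure described above can be sketched as follows. In this structure the first frame of every GOP becomes an I-frame regardless of its content. This minimal Python sketch rests on several assumptions: a simple IPPP... pattern with no B-frames, a fixed GOP length, and hypothetical function names (`gop_frame_types`, `distance_from_i_frame`) that are not part of H.264 or any encoder API. It only illustrates how each inter-frame's distance from its anchor I-frame grows toward the end of the GOP. That distance is the direction along which the quantization error of the I-frame accumulates.

```python
def gop_frame_types(num_frames: int, gop_size: int) -> list[str]:
    """Assign 'I' to the first frame of each fixed-length GOP and 'P' to the rest.

    Hypothetical helper: assumes an IPPP... pattern (no B-frames) and a fixed
    GOP length, i.e. the I-frame is chosen by position, not by suitability.
    """
    if gop_size < 1:
        raise ValueError("gop_size must be at least 1")
    return ["I" if i % gop_size == 0 else "P" for i in range(num_frames)]


def distance_from_i_frame(frame_types: list[str]) -> list[int]:
    """Frames since the most recent I-frame (0 for the I-frame itself).

    In a P-frame chain each frame references the previous one, so this count
    is how many prediction steps separate a frame from its anchor I-frame.
    """
    distances = []
    last_i = None
    for idx, frame_type in enumerate(frame_types):
        if frame_type == "I":
            last_i = idx
        if last_i is None:
            raise ValueError("sequence must start with an I-frame")
        distances.append(idx - last_i)
    return distances


if __name__ == "__main__":
    types = gop_frame_types(8, 4)
    print("".join(types))                # IPPPIPPP
    print(distance_from_i_frame(types))  # [0, 1, 2, 3, 0, 1, 2, 3]
```

The distance resets to zero at every I-frame and grows until the next one. This mirrors the saw-tooth pattern of quality (and bits per frame) that the text describes.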