I. Introduction
The use of visual information in speech recognition and speaker verification has attracted considerable research interest in recent years [1]–[5], [7], because the visual information carried by lip movement can enhance the robustness of the system [6]. Most speech recognition systems suffer performance degradation in the presence of noise. With the aid of a lip image sequence, however, recognition performance can be considerably improved.