1. Introduction
Talking head synthesis, which aims to generate video portraits driven by a given audio input, is a challenging and promising research topic. The technique is widely applied in practical scenarios such as animation, virtual avatars, online education, and video conferencing [4], [45], [48], [51], [54].