I. Introduction
The marketing industry now offers many applications that automatically create videos from text. Video generation is a challenging task in computer vision, and it is related to applications such as video retrieval, video description, and video summarization. One popular use of automatic video generation is on online video platforms such as YouTube, where it enhances the user experience by providing information, recommendations, and entertainment. However, the problem has not been fully solved, and work to improve the automatic video creation process is ongoing. Animated videos are visually engaging and help explain complex concepts, making them easier for students to understand [1], [2].