1. INTRODUCTION
With the recent development of Artificial Intelligence (AI) technology, there is growing interest in combining AI with human activity to solve problems and improve everyday life. Such support is also needed in everyday human-to-human conversations, especially as virtual meetings and video conferencing become more important. Among the many problems in human-to-human conversation, there is an increasing need for technologies that accurately recognize what is being said when the voice signal is barely available. Such technology is promising because it can help people follow a conversation in situations such as a crowded shopping mall, a party with many people and loud music, or a silent video conference.