1. Introduction
Recently, deep learning algorithms have successfully addressed problems in various fields, such as image classification, machine translation, speech recognition, text-to-speech synthesis, and other machine-learning-related areas [1, 2, 3]. Similarly, applying deep learning algorithms to statistical speech processing has yielded substantial performance improvements [4]. These advances have led researchers to investigate further aspects of human nature that have long been objects of study. One such topic is understanding human emotions and reflecting them in machine intelligence, as in emotional dialogue models [5, 6].