1. INTRODUCTION
Speech emotion, as one kind of meta-information beyond the text itself, plays an important role in understanding speakers’ psychology and responses. The relevant research area, called speech emotion recognition (SER), aims to automatically recognize the emotional category of a given speech utterance. Since emotions are usually conveyed in subtle and variable ways, it has been challenging to identify emotion embeddings, as representations of an utterance, that can effectively distinguish emotion categories.