Applying Segment-Level Attention on Bi-Modal Transformer Encoder for Audio-Visual Emotion Recognition | IEEE Journals & Magazine | IEEE Xplore