Multimodal Cross- and Self-Attention Network for Speech Emotion Recognition | IEEE Conference Publication | IEEE Xplore