
Context- and Knowledge-Aware Graph Convolutional Network for Multimodal Emotion Recognition


Abstract:

This work proposes an approach for emotion recognition in conversation that leverages context modeling, knowledge enrichment, and multimodal (text and audio) learning based on a graph convolutional network (GCN). We first construct two distinct graphs to model the contextual interaction and the knowledge dynamics. We then introduce an affective lexicon into knowledge graph construction to enrich the emotional polarity of each concept, that is, the related knowledge of each token in an utterance. Next, we balance the context and the affect-enriched knowledge by incorporating both into the construction of a new adjacency matrix for the GCN architecture, and train the model jointly over multiple modalities to effectively capture the semantics-sensitive and knowledge-sensitive contextual dependence of each conversation. Our model outperforms state-of-the-art benchmarks with over 22.6% and 11% relative error reduction in weighted F1 on the IEMOCAP and MELD databases, respectively, demonstrating the superiority of our method for emotion recognition.
Published in: IEEE MultiMedia (Volume: 29, Issue: 3, July-Sept. 2022)
Page(s): 91 - 100
Date of Publication: 10 May 2022


Emotion recognition in conversations (ERC) has attracted increasing attention because it is a necessary step for a number of applications, including the analysis of social media threads (e.g., on YouTube, Facebook, and Twitter) and human–computer interaction. Unlike nonconversational settings, “context” is a vital component of ERC: it denotes the preceding dialog content of a target utterance. The intention and emotion of a target utterance are largely shaped by the surrounding context, as the conversations in Figure 1 illustrate. It is therefore important, but challenging, to effectively model the contextual dependence within conversations.
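The graph construction described in the abstract can be pictured with a short, self-contained sketch. The following is a minimal illustration, not the authors' implementation: it mixes cosine similarity between utterance features (semantic context edges) with a simple knowledge-derived affect score per utterance (knowledge edges), restricts edges to a conversational context window, and feeds the normalized graph to a standard GCN layer. Names such as affect_scores, context_window, and alpha are illustrative assumptions rather than quantities defined in the paper.

```python
# Minimal sketch (assumed, not the authors' code): combine semantic-context and
# knowledge-based affect edges into one adjacency matrix, then apply a GCN layer.
import torch
import torch.nn as nn
import torch.nn.functional as F


def build_adjacency(feats: torch.Tensor, affect_scores: torch.Tensor,
                    context_window: int = 4, alpha: float = 0.5) -> torch.Tensor:
    """feats: (N, D) fused utterance features; affect_scores: (N,) lexicon-derived
    affective weights in [0, 1]; returns a normalized (N, N) adjacency matrix."""
    n = feats.size(0)
    # semantic context edges: pairwise cosine similarity between utterances
    sim = F.cosine_similarity(feats.unsqueeze(1), feats.unsqueeze(0), dim=-1)
    # knowledge edges: average of the two utterances' affect scores
    know = 0.5 * (affect_scores.unsqueeze(1) + affect_scores.unsqueeze(0))
    # balance context vs. knowledge with a mixing weight alpha (assumed)
    adj = alpha * sim + (1.0 - alpha) * know
    # keep only edges inside the conversational context window
    idx = torch.arange(n)
    mask = (idx.unsqueeze(1) - idx.unsqueeze(0)).abs() <= context_window
    adj = adj * mask.float()
    # symmetric normalization D^{-1/2} (A + I) D^{-1/2}, as in a standard GCN
    adj = adj + torch.eye(n)
    deg_inv_sqrt = adj.sum(dim=1).clamp(min=1e-6).pow(-0.5)
    return deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)


class GCNLayer(nn.Module):
    """One graph-convolution layer: H' = ReLU(A_hat H W)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, adj: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        return F.relu(adj @ self.linear(h))


if __name__ == "__main__":
    feats = torch.randn(6, 100)        # toy conversation: 6 utterances, 100-dim features
    affect = torch.rand(6)             # e.g., lexicon-based valence per utterance
    adj = build_adjacency(feats, affect)
    out = GCNLayer(100, 6)(adj, feats) # logits over 6 emotion classes
    print(out.shape)                   # torch.Size([6, 6])
```

In this toy setting, alpha controls the trade-off between semantic and knowledge edges; the paper's actual adjacency construction, features, and training procedure differ in detail.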

References
1. S. Poria, E. Cambria, D. Hazarika, N. Majumder, A. Zadeh, and L.-P. Morency, "Context-dependent sentiment analysis in user-generated videos," Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, pp. 873-883, 2017.
2. N. Majumder et al., "DialogueRNN: An attentive RNN for emotion detection in conversations," Proc. AAAI Conf. Artif. Intell., vol. 33, pp. 6818-6825, 2019.
3. D. Ghosal, N. Majumder, S. Poria, and A. Gelbukh, "DialogueGCN: A graph convolutional neural network for emotion recognition in conversation," Proc. Conf. Empirical Methods Natural Lang. Process. 9th Int. Joint Conf. Natural Lang. Process., pp. 154-164, 2019.
4. C. Busso et al., "IEMOCAP: Interactive emotional dyadic motion capture database," Lang. Resour. Eval., vol. 42, no. 4, pp. 335-359, 2008.
5. Y. Fu et al., "ConSK-GCN: Conversational semantic- and knowledge-oriented graph convolutional network for multimodal emotion recognition," Proc. Int. Conf. Multimedia Expo, pp. 1-6, 2021.
6. P. Tzirakis, G. Trigeorgis, M. A. Nicolaou, B. W. Schuller, and S. Zafeiriou, "End-to-end multimodal emotion recognition using deep neural networks," IEEE J. Sel. Topics Signal Process., vol. 11, no. 8, pp. 1301-1309, Dec. 2017.
7. N. Li, B. Liu, Z. Han, Y.-S. Liu, and J. Fu, "Emotion reinforced visual storytelling," Proc. Int. Conf. Multimedia Retrieval, pp. 297-305, 2019.
8. T. Mittal, P. Guhan, U. Bhattacharya, R. Chandra, A. Bera, and D. Manocha, "EmotiCon: Context-aware multimodal emotion recognition using Frege's principle," Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 14234-14243, 2020.
9. M. Schlichtkrull et al., "Modeling relational data with graph convolutional networks," Proc. Eur. Semantic Web Conf., pp. 593-607, 2018.
10. S. Tripathi, S. Tripathi, and H. Beigi, "Multi-modal emotion recognition on IEMOCAP dataset using deep learning," 2018.
11. Y. Kim, "Convolutional neural networks for sentence classification," Proc. Conf. Empirical Methods Natural Lang. Process., pp. 1746-1751, 2014.
12. T. Young, E. Cambria, I. Chaturvedi, H. Zhou, S. Biswas, and M. Huang, "Augmenting end-to-end dialogue systems with commonsense knowledge," Proc. AAAI Conf. Artif. Intell., vol. 32, no. 1, pp. 4970-4977, 2018.
13. P. Zhong, D. Wang, and C. Miao, "Knowledge-enriched transformer for emotion detection in textual conversations," Proc. Conf. Empirical Methods Natural Lang. Process. 9th Int. Joint Conf. Natural Lang. Process., pp. 165-176, 2019.
14. R. Speer, J. Chin, and C. Havasi, "ConceptNet 5.5: An open multilingual graph of general knowledge," Proc. AAAI Conf. Artif. Intell., pp. 4444-4451, 2017.
15. S. Mohammad, "Obtaining reliable human ratings of valence, arousal, and dominance for 20,000 English words," Proc. 56th Annu. Meeting Assoc. Comput. Linguistics, pp. 174-184, 2018.
16. P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, "Enriching word vectors with subword information," Trans. Assoc. Comput. Linguistics, vol. 5, pp. 135-146, 2017.
17. S. Poria, D. Hazarika, N. Majumder, G. Naik, E. Cambria, and R. Mihalcea, "MELD: A multimodal multi-party dataset for emotion recognition in conversations," Proc. 57th Annu. Meeting Assoc. Comput. Linguistics, pp. 527-536, 2019.
18. L. Guo, L. Wang, J. Dang, L. Zhang, and H. Guan, "A feature fusion method based on extreme learning machine for speech emotion recognition," Proc. IEEE Int. Conf. Acoust. Speech Signal Process., pp. 2666-2670, 2018.
19. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Human Lang. Technol., pp. 4171-4186, 2019.
20. C. E. Osgood, "The nature and measurement of meaning," Psychol. Bull., vol. 49, no. 3, pp. 197-237, 1952.
