TY  - JOUR
KW  - Attention Model
KW  - Interlocutor State
KW  - Context Awareness
KW  - Emotion Recognition
KW  - Multimodal
KW  - Sentiment Analysis
AU  - Huddar, Mahesh G.
AU  - Sannakki, Sanjeev S.
AU  - Rajpurohit, Vijay S.
AB  - With the availability of an enormous quantity of multimodal data and its widespread applications, automatic sentiment analysis and emotion classification in conversation have become an interesting research topic in the research community. The interlocutor state, the contextual state between neighboring utterances, and multimodal fusion play an important role in multimodal sentiment analysis and emotion detection in conversation. In this article, a recurrent neural network (RNN) based method is developed to capture the interlocutor state and the contextual state between utterances. A pair-wise attention mechanism is used to model the relationships between the modalities and their relative importance before fusion. First, modalities are fused two at a time, and finally all modalities are fused to form a trimodal feature representation. Experiments are conducted on three standard datasets: IEMOCAP, CMU-MOSEI, and CMU-MOSI. The proposed model is evaluated using accuracy and F1-score, and the results demonstrate that it outperforms standard baselines.
IS  - Regular Issue
M1  - 6
PY  - 2021
SP  - 112
EP  - 121
T2  - International Journal of Interactive Multimedia and Artificial Intelligence
TI  - Attention-based Multi-modal Sentiment Analysis and Emotion Detection in Conversation using RNN
UR  - https://www.ijimai.org/journal/sites/default/files/2021-05/ijimai_6_6_12.pdf
VL  - 6
SN  - 1989-1660
ER  -