Recognition of Emotions using Energy Based Bimodal Information Fusion and Correlation

Authors

Asawa, K. and Manchanda, P.

DOI:

https://doi.org/10.9781/ijimai.2014.272

Keywords:

Intelligent Systems, Emotion recognition, Machine Learning, Bimodal Fusion, Energy Map
Supporting Agencies
The work reported in this paper is supported by the grant received from All India Council for Technical Education; A Statutory body of the Govt. of India. vide f. no. 8023/BOR/RID/RPS-129/2008-09.

Abstract

Multi-sensor information fusion is a rapidly developing research area that forms the backbone of numerous essential technologies, such as intelligent robotic control, sensor networks, and video and image processing. In this paper, we develop a novel technique to analyze and correlate human emotions expressed in voice tone and facial expression. Audio and video streams are captured to populate bimodal data sets that sense the emotions expressed in voice tone and facial expression, respectively. An energy-based mapping is performed to overcome the inherent heterogeneity of the recorded bimodal signals. The fusion process uses the sampled and mapped energy signals of both modalities' data streams, and the overall emotional component is then recognized by a Support Vector Machine (SVM) classifier with an accuracy of 93.06%.
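The pipeline described above can be illustrated with a minimal sketch: compute a short-time energy signal per modality, normalize each onto a common scale to offset modality heterogeneity, resample both to a fixed length, concatenate, and classify with an SVM. All function names, frame sizes, and the synthetic data below are illustrative assumptions, not the paper's actual parameters or implementation.

```python
import numpy as np
from sklearn.svm import SVC

def short_time_energy(signal, frame_len=256, hop=128):
    # Sum of squared samples per frame (frame sizes are assumed, not from the paper).
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sum(f.astype(float) ** 2) for f in frames])

def normalize(e):
    # Map energies onto a common [0, 1] scale so the two modalities are comparable.
    return (e - e.min()) / (e.max() - e.min() + 1e-12)

def fuse(audio_energy, video_energy, n=32):
    # Resample both normalized energy sequences to a fixed length, then concatenate.
    grid = np.linspace(0, 1, n)
    a = np.interp(grid, np.linspace(0, 1, len(audio_energy)), normalize(audio_energy))
    v = np.interp(grid, np.linspace(0, 1, len(video_energy)), normalize(video_energy))
    return np.concatenate([a, v])

# Toy training run on synthetic signals standing in for the bimodal recordings.
rng = np.random.default_rng(0)
X, y = [], []
for label in (0, 1):
    for _ in range(10):
        audio = rng.normal(scale=1.0 + label, size=4000)   # stand-in audio samples
        video = rng.normal(scale=0.5 + label, size=300)    # stand-in per-frame video signal
        X.append(fuse(short_time_energy(audio),
                      short_time_energy(video, frame_len=16, hop=8)))
        y.append(label)

clf = SVC(kernel="rbf").fit(X, y)  # the paper reports an SVM classifier
```

Each fused feature vector has length 2 × 32 = 64 here; the actual sampling rates, energy windows, and SVM configuration would follow the paper's experimental setup.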


Published

2014-09-01

How to Cite

Asawa, K. and Manchanda, P. (2014). Recognition of Emotions using Energy Based Bimodal Information Fusion and Correlation. International Journal of Interactive Multimedia and Artificial Intelligence, 2(7), 17–21. https://doi.org/10.9781/ijimai.2014.272