A New Feature Selection Method based on Intuitionistic Fuzzy Entropy to Categorize Text Documents

Authors

B. S. Harish, M. B. Revanasiddappa

Keywords

Feature Selection, High Dimensionality, Intuitionistic Fuzzy Entropy, Text Categorization

Abstract

Selecting highly discriminative features from text documents is a major challenge in text categorization. Feature selection reduces the dimensionality of the feature matrix, which in turn improves categorization performance. This article presents a new feature selection method for text categorization based on Intuitionistic Fuzzy Entropy (IFE). First, the Intuitionistic Fuzzy C-Means (IFCM) clustering method is used to compute intuitionistic membership values. These values are then used to estimate the intuitionistic fuzzy entropy of each feature via a match degree. Finally, the features with lower entropy values are selected to categorize the text documents. To evaluate the efficacy of the proposed method, experiments are conducted on three standard benchmark datasets using three classifiers, with the F-measure used to assess classifier performance. The proposed method compares favorably with other well-known feature selection methods. Moreover, the properties of Intuitionistic Fuzzy Sets (IFS) address the uncertainty limitations of traditional fuzzy sets.
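The entropy-based selection step can be illustrated with a small sketch. It assumes a membership matrix U (rows are features or terms, columns are clusters) has already been produced by a fuzzy clustering run such as IFCM; the clustering itself is not reproduced here. Non-membership is derived with a Yager-type negation, nu = (1 - mu^alpha)^(1/alpha), which keeps mu + nu <= 1 for alpha in (0, 1]; alpha = 0.85 is an illustrative value, not one taken from the article. Because the abstract does not give the match-degree formula, the Szmidt-Kacprzyk ratio entropy is swapped in as the intuitionistic fuzzy entropy. The function names (to_intuitionistic, ifs_entropy, select_features) and the toy values in U are hypothetical.

```python
from typing import List, Sequence, Tuple

# (membership mu, non-membership nu, hesitation pi) for one element.
IFValue = Tuple[float, float, float]


def to_intuitionistic(mu: float, alpha: float = 0.85) -> IFValue:
    """Turn a fuzzy membership into an intuitionistic triple.

    Non-membership uses a Yager-type negation nu = (1 - mu**alpha)**(1/alpha);
    for alpha in (0, 1] this keeps mu + nu <= 1, and the remainder
    1 - mu - nu is the hesitation degree pi.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not 0.0 <= mu <= 1.0:
        raise ValueError("membership must lie in [0, 1]")
    nu = (1.0 - mu ** alpha) ** (1.0 / alpha)
    pi = max(0.0, 1.0 - mu - nu)  # clamp tiny negative rounding errors
    return mu, nu, pi


def ifs_entropy(values: Sequence[IFValue]) -> float:
    """Szmidt-Kacprzyk ratio entropy (a stand-in for the match-degree
    formula): mean of (min(mu, nu) + pi) / (max(mu, nu) + pi).

    Each ratio is 0 for a crisp element and 1 when mu == nu.
    """
    return sum(
        (min(mu, nu) + pi) / (max(mu, nu) + pi) for mu, nu, pi in values
    ) / len(values)


def select_features(memberships: Sequence[Sequence[float]], k: int,
                    alpha: float = 0.85) -> List[int]:
    """Return the indices of the k features with the lowest entropy.

    memberships[j][i] is feature j's membership in cluster i, assumed to
    come from a prior fuzzy clustering run (IFCM in the article).
    """
    scored = sorted(
        (ifs_entropy([to_intuitionistic(mu, alpha) for mu in row]), j)
        for j, row in enumerate(memberships)
    )
    return [j for _, j in scored[:k]]


# Toy membership matrix (hypothetical values): 3 features x 3 clusters.
U = [
    [0.96, 0.02, 0.02],  # concentrated in cluster 0
    [0.34, 0.33, 0.33],  # spread almost evenly
    [0.70, 0.20, 0.10],  # moderately concentrated
]
print(select_features(U, k=2))  # → [0, 2]
```

A feature whose membership is concentrated in one cluster scores close to 0, while a feature spread evenly over the clusters scores much higher (about 0.67 for the second row of U), so keeping the lowest-entropy features favors the more discriminative terms.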

Published

2018-12-01

How to Cite

Harish, B. S. and Revanasiddappa, M. B. (2018). A New Feature Selection Method based on Intuitionistic Fuzzy Entropy to Categorize Text Documents. International Journal of Interactive Multimedia and Artificial Intelligence, 5(3), 106–117. Retrieved from https://www.ijimai.org/index.php/ijimai/article/view/6183
