S-Divergence-Based Internal Clustering Validation Index.

Authors

DOI:

https://doi.org/10.9781/ijimai.2023.10.001

Keywords:

Clustering Quality Indexes, Generalized Mean, K-Nearest Neighbors, S-distance, S-divergence, Spectral Clustering, Symmetry Favored

Supporting Agencies

This work is partially supported by the project “Smart Solutions in Ubiquitous Computing Environments”, Grant Agency of Excellence (under ID: UHKFIM-GE-2023), University of Hradec Kralove, Faculty of Informatics and Management, Czech Republic.

Abstract

A clustering validation index (CVI) is used to evaluate the results produced by a clustering algorithm. CVIs are generally divided into three classes: internal, external, and relative cluster validation. Most existing internal CVIs are built on two measures, compactness (CM) and separation (SM). SM quantifies the distance between cluster centers, whereas CM measures the variance within a cluster. However, SM does not always capture the separation between groups accurately when classes overlap heavily. In this article, we devise a novel internal CVI that can be regarded as a complementary measure to the landscape of available internal CVIs. First, the clusters of a dataset are modeled as non-parametric density functions estimated using kernel density estimation. Then the S-divergence (SD) and the S-distance are introduced to measure the SM and the CM, respectively. The SD is defined based on the concept of Hermitian positive definite matrices applied to density functions. The proposed internal CVI (PM) is the ratio of CM to SM. According to empirical results obtained with four popular clustering algorithms (fuzzy k-means, spectral clustering, density peak clustering, and density-based spatial clustering of applications with noise), the PM outperforms the legacy measures presented in the literature on both synthetic and real-world datasets in various scenarios.
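
As a rough illustration of the quantities named in the abstract, the sketch below computes the S-divergence between two symmetric positive definite matrices using the standard definition from Sra's work, log det((X + Y)/2) - (1/2) log det(XY), and takes the S-distance as its square root. The function names `s_divergence` and `s_distance` are hypothetical, and the abstract does not specify how the kernel density estimates are turned into inputs for these measures or how CM and SM are aggregated over clusters, so this is a sketch of the building block only and not the authors' index.

```python
import numpy as np

def s_divergence(X, Y):
    """S-divergence between two symmetric positive definite matrices:
    log det((X + Y) / 2) - 0.5 * log det(X Y).
    (Hypothetical helper name; the formula follows Sra's definition.)"""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # slogdet returns (sign, log|det|) and avoids overflow in the determinant.
    sign_m, logdet_m = np.linalg.slogdet((X + Y) / 2.0)
    sign_x, logdet_x = np.linalg.slogdet(X)
    sign_y, logdet_y = np.linalg.slogdet(Y)
    # Cheap sanity check only: a positive determinant is necessary,
    # not sufficient, for positive definiteness.
    if min(sign_m, sign_x, sign_y) <= 0:
        raise ValueError("inputs must be positive definite")
    # log det(X Y) = log det X + log det Y
    return logdet_m - 0.5 * (logdet_x + logdet_y)

def s_distance(X, Y):
    """Square root of the S-divergence; clamps tiny negative rounding errors."""
    return float(np.sqrt(max(s_divergence(X, Y), 0.0)))

# Toy inputs (assumed for illustration, not taken from the paper).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.eye(2)
print(s_divergence(A, B), s_distance(A, B))
```

The divergence is zero when both arguments coincide and symmetric in its arguments, which is what makes its square root usable as a distance-like measure.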

Published

2023-12-01

How to Cite

Kumar Sharma, K., Seal, A., Yazidi, A., and Krejcar, O. (2023). S-Divergence-Based Internal Clustering Validation Index. International Journal of Interactive Multimedia and Artificial Intelligence, 8(4), 127–139. https://doi.org/10.9781/ijimai.2023.10.001