Use of Data Mining for Intelligent Evaluation of Imputation Methods

la David Red; Carlos R. Primorac

Author	la David Red; Carlos R. Primorac
Keywords	Computer Science; Imputation; Data Mining; Interdisciplinary Applications; Performance Evaluation
Abstract	In real-world situations, researchers frequently face the problem of missing values (MV), i.e., values not observed in a data set. Data imputation techniques estimate MV using different algorithms, so that important data can be imputed for a particular instance. Most of the literature in this field deals with different imputation methods. However, few studies offer a comparative evaluation of those methods that would provide more appropriate guidelines for selecting the method to apply in specific situations. The objective of this work is to present a methodology for evaluating the performance of imputation methods by means of new metrics derived from the quality metrics of data mining models. We started from a complete dataset, which was amputated with different amputation mechanisms to generate 63 datasets with MV; these were imputed using the Median, k-NN, k-Means and Hot-Deck imputation methods. The performance of the imputation methods was evaluated using new metrics derived from the quality metrics of the data mining processes performed with the original full file and with the imputed files. This evaluation is not based on measuring the imputation error (the usual approach), but on the similarity between the quality metrics of the data mining processes obtained with the original file and those obtained with the imputed files. The results show that, considered globally and according to the newly proposed metric, the imputation methods with the best performance were k-NN and k-Means. An additional advantage of the proposed methodology is that it yields predictive data mining models that can be used a posteriori.
Year of Publication	2025
Journal	International Journal of Interactive Multimedia and Artificial Intelligence
Volume	9
Start Page	82
Issue	Regular issue
Number	3
Pages	82-95
Date Published	06/2025
ISSN Number	1989-1660
URL	https://www.ijimai.org/journal/bibcite/reference/3291
DOI	10.9781/ijimai.2023.03.002
Attachment	ijimai9_3_8.pdf (791.65 KB)
Acknowledgment	This work was developed in the context of the Research Project code SIUTIRE0005231TC of the Resistencia Regional Faculty of the National Technological University, Argentina. We would like to thank the Co-Director of this project, Dr. Marcelo Karanik, for reviewing this work; Dr. Jorge Emilio Monzón, for reviewing the English version; and the scholarship holder, student Alejandro Nadal, for his effort and dedication to the multiple data mining processes.
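
The evaluation idea summarized in the abstract (ampute a complete dataset, impute it, then compare the quality metric of a model built on the imputed file with that of a model built on the original file) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the synthetic two-feature dataset, the MCAR amputation at a 20% rate, the nearest-centroid classifier standing in for the data mining process, accuracy as the quality metric, and the similarity score 1 - |difference in accuracy|. Only median and k-NN imputation are shown, and all function names are hypothetical.

```python
import random
import statistics

def make_dataset(n=200, seed=0):
    """Synthetic complete dataset (assumption): two numeric features, binary label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 3.0 * label
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows

def ampute_mcar(rows, rate, seed=1):
    """Delete feature values completely at random; None marks a missing value."""
    rng = random.Random(seed)
    return [([None if rng.random() < rate else v for v in x], y) for x, y in rows]

def column_medians(rows):
    """Median of the observed values of each feature."""
    n_feat = len(rows[0][0])
    return [statistics.median(x[j] for x, _ in rows if x[j] is not None)
            for j in range(n_feat)]

def impute_median(rows):
    """Replace each missing value with the median of its feature."""
    med = column_medians(rows)
    return [([med[j] if v is None else v for j, v in enumerate(x)], y)
            for x, y in rows]

def impute_knn(rows, k=5):
    """Replace each missing value with the mean over the k nearest donor rows.
    Distance uses only features observed in both rows; if no donor shares an
    observed feature, fall back to the feature median."""
    med = column_medians(rows)
    out = []
    for x, y in rows:
        filled = list(x)
        for j, v in enumerate(x):
            if v is not None:
                continue
            donors = []
            for d, _ in rows:
                if d[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(x, d)
                          if a is not None and b is not None]
                if shared:
                    donors.append((sum(shared) / len(shared), d[j]))
            if donors:
                donors.sort(key=lambda t: t[0])  # nearest donors first
                filled[j] = statistics.mean(val for _, val in donors[:k])
            else:
                filled[j] = med[j]
        out.append((filled, y))
    return out

def centroid_accuracy(rows, train_frac=0.7):
    """Hold-out accuracy of a nearest-centroid classifier (stand-in model)."""
    cut = int(len(rows) * train_frac)
    train, test = rows[:cut], rows[cut:]
    centroids = {}
    for label in {y for _, y in train}:
        members = [x for x, y in train if y == label]
        centroids[label] = [statistics.mean(col) for col in zip(*members)]
    def predict(x):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return sum(predict(x) == y for x, y in test) / len(test)

full = make_dataset()
amputed = ampute_mcar(full, rate=0.2)
acc_full = centroid_accuracy(full)

# Similarity of the quality metric: 1.0 means the model built on the imputed
# file scores exactly like the model built on the original file.
scores = {}
for name, method in {"median": impute_median, "knn": impute_knn}.items():
    scores[name] = 1.0 - abs(acc_full - centroid_accuracy(method(amputed)))

print(scores)
```

In this sketch a method is ranked by how closely the downstream model's accuracy matches the accuracy obtained on the complete data, not by the distance between imputed and true values. Reproducing the study itself would require the authors' dataset, their amputation mechanisms, all four imputation methods, and the metrics defined in the paper.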