Efficient and Robust Model Benchmarks with Item Response Theory and Adaptive Testing

Hao Song; Peter Flach

Author	Hao Song; Peter Flach
Keywords	Item Response Theory; Adaptive Testing; Model Evaluation; Benchmark
Abstract	Progress in predictive machine learning is typically measured on the basis of performance comparisons on benchmark datasets. Traditionally, such empirical evaluations are carried out on large numbers of datasets, but this is becoming increasingly hard due to computational requirements and the often large number of alternative methods to compare against. In this paper we investigate adaptive approaches to achieve better efficiency in model benchmarking. For a large collection of datasets, rather than training and testing a given approach on every individual dataset, we seek methods that allow us to pick only a few representative datasets to quantify the model’s goodness, from which to extrapolate to performance on other datasets. To this end, we adapt existing approaches from psychometrics: specifically, Item Response Theory and Adaptive Testing. Both are well-founded frameworks designed for educational tests. We propose certain modifications following the requirements of machine learning experiments, and present experimental results to validate the approach.
Year of Publication	2021
Journal	International Journal of Interactive Multimedia and Artificial Intelligence
Volume	6
Issue	Special Issue on Artificial Intelligence, Paving the Way to the Future
Number	5
Pages	110-118
Date Published	03/2021
ISSN Number	1989-1660
URL	https://www.ijimai.org/journal/sites/default/files/2021-02/ijimai_6_5_11_0.pdf
DOI	10.9781/ijimai.2021.02.009
Attachment	ijimai_6_5_11_0.pdf (1.48 MB)