02674nas a2200289 4500000000100000000000100001008004100002260001200043653002000055653001700075653002200092653002600114653001900140100001500159700001200174700001700186700001300203700001400216700001600230700001800246245011400264856007900378300000900457490000600466520189800472022001402370 2023 d c03/202310aActive Learning10aData Quality10aEfficient Dataset10aEvolving Environments10aGeneralization1 aZhuo Zhang1 aYang Li1 aYicheng Gong1 aYue Yang1 aShukun Ma1 aXiaolan Guo1 aSezai Ercisli00aDataset and Baselines for IID and OOD Image Classification Considering Data Quality and Evolving Environments uhttps://www.ijimai.org/journal/sites/default/files/2023-02/ijimai8_1_1.pdf a6-120 v83 aAt present, artificial intelligence is in a period of rapid development, and deep learning has begun to be applied in various fields. Data, as a key part of the deep learning, its efficiency and stability, will directly affect the performance of the model, so it is valued by people. In order to make the dataset efficient, many active learning methods have been proposed, the dataset containing independent identically distribution (IID) samples is reduced with excellent performance; in order to make the dataset more stable, it should be solved that the model encounters out-of-distribution (OOD) samples to improve generalization performance. However, the current active learning method design and the method of adding OOD samples lack guidance, and people do not know what samples should be selected and which OOD samples will be added to better improve the generalization performance. In this paper, we propose a dataset containing a variety of elements called a dataset with Complete Sample Elements(CSE), the labels such as rotation angle and distance in addition to the common classification labels. These labels can help people analyze the distribution characteristics of each element of an efficient dataset, thereby inspiring new active learning methods; we also construct a corresponding OOD test set, which can not only detect the generalization performance of the model, but also helps explore metrics between OOD samples and existing dataset to guide the selected method of OOD samples, so that it can improve generalization efficiently. In this paper, we explore the distribution characteristics of efficient datasets in terms of angle element, and confirm that an efficient dataset tends to contain samples with different appearance. At the same time, experiments have proved the positive influence of the addition of OOD samples on the generalization performance of dataset. a1989-1660