CompareML: A Novel Approach to Supporting Preliminary Data Analysis Decision Making.

Antonio Jesús Fernández García; Juan Carlos Preciado; Alvaro E. Prieto; Fernando Sánchez Figueroa; Juan D. Gutiérrez

doi:10.9781/ijimai.2021.08.001

Authors

Antonio Jesús Fernández García, Universidad Internacional De La Rioja
Juan Carlos Preciado, Universidad de Extremadura
Alvaro E. Prieto, Universidad de Extremadura
Fernando Sánchez Figueroa, Universidad de Extremadura
Juan D. Gutiérrez, Universidad de Extremadura

Keywords:

Classification, Decision Support System, Knowledge Elicitation, Machine Learning, Regression, Software

Supporting Agencies

This work was developed with the support of (i) Ministerio de Ciencia, Innovación y Universidades (MCIU), Agencia Estatal de Investigación (AEI), and European Regional Development Fund (ERDF): project RTI2018-098652-B-I00, and (ii) European Regional Development Fund (ERDF) and Junta de Extremadura: projects IB16055, IB18034, and GR18112.

Abstract

A large number of machine learning algorithms exist, along with a wide range of libraries and services for creating predictive models. As machine learning and artificial intelligence play a growing role in engineering problems, practising engineers often arrive in the field so overwhelmed by the multitude of options that they must resolve selection difficulties before any actual work can begin. Datasets have intrinsic properties that make it hard to select the algorithm best suited to a specific objective, and the ever-increasing number of providers makes this selection harder still. These difficulties motivated the design of CompareML, an approach that supports the evaluation and comparison of machine learning libraries and services by users without deep machine learning knowledge. CompareML makes it easy to compare the performance of different models by using well-known classification and regression algorithms already made available by some of the most widely used providers. It facilitates the practical application of artificial intelligence methods and techniques, letting a practising engineer decide whether they might be used to resolve hitherto intractable problems. Thus, researchers and engineering practitioners can uncover the potential of their datasets for inferring new knowledge by selecting the most appropriate machine learning algorithm and determining the provider best suited to their data.
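The kind of comparison the abstract describes can be illustrated with a minimal sketch. It is not CompareML's implementation: the two "providers" are hypothetical stand-ins written as plain Python functions, the dataset is invented toy data, and accuracy on a held-out split is assumed as the only metric. The point is only the shape of the workflow: train every candidate on the same data, score each on the same test set, and rank the results.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[float, ...]
Model = Callable[[Row], str]
Trainer = Callable[[Sequence[Row], Sequence[str]], Model]


def train_majority(X: Sequence[Row], y: Sequence[str]) -> Model:
    """Hypothetical baseline: always predict the most frequent training label."""
    labels = list(y)
    most_common = max(sorted(set(labels)), key=labels.count)
    return lambda row: most_common


def train_nearest_neighbour(X: Sequence[Row], y: Sequence[str]) -> Model:
    """Hypothetical 1-nearest-neighbour classifier (squared Euclidean distance)."""
    def predict(row: Row) -> str:
        distances = [sum((a - b) ** 2 for a, b in zip(row, other)) for other in X]
        return y[distances.index(min(distances))]
    return predict


def accuracy(model: Model, X: Sequence[Row], y: Sequence[str]) -> float:
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(1 for row, label in zip(X, y) if model(row) == label)
    return hits / len(y)


def compare(trainers: Dict[str, Trainer],
            X_train: Sequence[Row], y_train: Sequence[str],
            X_test: Sequence[Row], y_test: Sequence[str]) -> List[Tuple[str, float]]:
    """Train every candidate on the same split and rank by test accuracy."""
    scores = [(name, accuracy(train(X_train, y_train), X_test, y_test))
              for name, train in trainers.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Invented toy data: two well-separated classes in two dimensions.
X_train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8)]
y_train = ["low", "low", "low", "high", "high"]
X_test = [(1.1, 0.9), (4.9, 5.1), (5.1, 5.0), (0.8, 1.2)]
y_test = ["low", "high", "high", "low"]

ranking = compare(
    {"majority_baseline": train_majority, "nearest_neighbour": train_nearest_neighbour},
    X_train, y_train, X_test, y_test,
)
for name, score in ranking:
    print(f"{name}: accuracy={score:.2f}")
```

Keeping each candidate behind the same train-then-predict interface is what makes the ranking fair: every model sees identical training and test data, so differences in score reflect the algorithm or provider rather than the split.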
