Improving Pipelining Tools for Pre-processing Data

María Novo-Lourés; Yeray Lage; Reyes Pavón; Rosalía Laza; David Ruano-Ordás; José Ramón Méndez

Authors	María Novo-Lourés; Yeray Lage; Reyes Pavón; Rosalía Laza; David Ruano-Ordás; José Ramón Méndez
Keywords	Burst Processing, Data Pre-processing, Java, Pipeline Frameworks
Abstract	The last several years have seen the emergence of data mining and its transformation into a powerful tool that adds value to business and research. Data mining makes it possible to explore and find previously unseen connections between variables and facts observed in different domains, helping us to better understand reality. The programming methods and frameworks used to analyse data have evolved over time. Currently, pipelining schemes are the most reliable way of analysing data, and several important companies therefore offer this kind of service. Moreover, several frameworks compatible with different programming languages are available for the development of computational pipelines, and many research studies have addressed the optimization of data processing speed. However, as this study shows, these frameworks provide very limited support for early error detection and for assisting developers. To address this, the study introduces several improvements: different types of constraints for the early detection of errors, functions that facilitate debugging of specific tasks included in a pipeline, the invalidation of erroneous instances, and a burst-processing scheme. Incorporating these functionalities, we developed Big Data Pipelining for Java (BDP4J, https://github.com/sing-group/bdp4j), a fully functional new pipelining framework that shows the potential of these features.
Year of Publication	2022
Journal	International Journal of Interactive Multimedia and Artificial Intelligence
Volume	7
Issue	Regular Issue
Number	4
Pages	214-224
Date Published	06/2022
ISSN Number	1989-1660
URL	https://www.ijimai.org/journal/sites/default/files/2022-05/ijimai_7_4_19.pdf
DOI	10.9781/ijimai.2021.10.004
Attachment	ijimai_7_4_19.pdf (773.63 KB)
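The abstract names constraints for the early detection of errors and the invalidation of erroneous instances among the features added in BDP4J. The Java sketch below illustrates both ideas in a minimal form, under our own assumptions. `Task`, `Instance`, `Pipeline` and `PipelineDemo` are hypothetical names, not the BDP4J API. The only constraint checked is input/output type compatibility between consecutive tasks, which is just one possible kind of constraint. Running each task over the whole batch before starting the next one is a design choice made here, not necessarily how BDP4J implements its burst-processing scheme.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: Task, Instance and Pipeline are illustrative names, not the BDP4J API.

/** A pipeline stage that declares the type it consumes and the type it produces. */
final class Task<I, O> {
    final String name;
    final Class<I> inputType;
    final Class<O> outputType;
    final Function<I, O> body;

    Task(String name, Class<I> inputType, Class<O> outputType, Function<I, O> body) {
        this.name = name;
        this.inputType = inputType;
        this.outputType = outputType;
        this.body = body;
    }
}

/** A data item travelling through the pipeline; it can be marked invalid. */
final class Instance {
    Object data;
    boolean valid = true;
    String invalidReason;

    Instance(Object data) {
        this.data = data;
    }
}

final class Pipeline {
    private final List<Task<?, ?>> tasks;

    Pipeline(List<Task<?, ?>> tasks) {
        // Early error detection (assumed constraint): each task's input type must accept
        // the previous task's output type. This is checked before any instance is processed.
        for (int i = 1; i < tasks.size(); i++) {
            Task<?, ?> prev = tasks.get(i - 1);
            Task<?, ?> next = tasks.get(i);
            if (!next.inputType.isAssignableFrom(prev.outputType)) {
                throw new IllegalArgumentException("Task '" + next.name + "' expects "
                        + next.inputType.getSimpleName() + " but '" + prev.name
                        + "' produces " + prev.outputType.getSimpleName());
            }
        }
        this.tasks = List.copyOf(tasks);
    }

    /** Runs each task over the whole batch before starting the next task. */
    List<Instance> run(List<Instance> batch) {
        for (Task<?, ?> task : tasks) {
            for (Instance inst : batch) {
                if (!inst.valid) {
                    continue; // invalidated instances skip all remaining tasks
                }
                try {
                    inst.data = apply(task, inst.data);
                } catch (RuntimeException e) {
                    // Invalidate only the failing instance instead of aborting the whole run.
                    inst.valid = false;
                    inst.invalidReason = task.name + ": " + e.getMessage();
                }
            }
        }
        return batch;
    }

    // Captures the task's wildcard types so the input can be cast and passed to the body.
    private static <I, O> O apply(Task<I, O> task, Object data) {
        return task.body.apply(task.inputType.cast(data));
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        Task<String, String> trim = new Task<>("trim", String.class, String.class, String::trim);
        Task<String, Integer> parse =
                new Task<>("parse", String.class, Integer.class, Integer::parseInt);
        List<Instance> out = new Pipeline(List.of(trim, parse))
                .run(List.of(new Instance(" 42 "), new Instance("x")));
        for (Instance inst : out) {
            System.out.println(inst.valid ? "ok: " + inst.data : "invalid: " + inst.invalidReason);
        }
    }
}
```

In this sketch, `new Pipeline(List.of(parse, trim))` fails as soon as it is constructed, before any data is read, because `parse` produces an `Integer` while `trim` expects a `String`. In the demo run, the instance `"x"` makes `parse` throw. That instance is marked invalid with the failing task's name, while `" 42 "` still completes the pipeline.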