=LDR  02504nas a2200241 4500
=008  2019 d
=260  \\$c03/2019
=653  10$aClustering
=653  10$aCloud Computing
=653  10$aWorkflow Data Scheduling
=653  10$aData Transformation
=653  10$aClustering Quality Indexes
=653  10$aCloudSim
=100  1\$aSid Ahmed Makhlouf
=700  1\$aBelabbas Yagoubi
=245  00$aData-Aware Scheduling Strategy for Scientific Workflow Applications in IaaS Cloud Computing
=856  \\$uhttp://www.ijimai.org/journal/sites/default/files/files/2018/07/ijimai_5_4_9_pdf_20202.pdf
=300  \\$a75-85
=490  0\$v5
=520  3\$aScientific workflows benefit from the cloud computing paradigm, which offers access to virtual resources provisioned on a pay-as-you-go, on-demand basis. Minimizing resource costs to meet the user's budget is very important in a cloud environment. Several optimization approaches have been proposed to improve the performance and cost of data-intensive scientific workflow scheduling (DiSWS) in cloud computing. However, most DiSWS approaches in the literature have relied on heuristics and metaheuristics as their optimization methods. Furthermore, the task hierarchy in data-intensive scientific workflows has not been extensively explored in the current literature. In this paper, a data-intensive scientific workflow is represented as a hierarchy that specifies the hierarchical relations between workflow tasks, and an approach for scheduling data-intensive workflow applications is proposed. In this approach, the datasets and workflow tasks are first modeled as a conditional probability matrix (CPM). Second, several data transformations and hierarchical clustering are applied to the CPM to determine the minimum number of virtual machines needed to execute the workflow. The hierarchical clustering is performed with respect to the budget imposed by the user. After data transformation and hierarchical clustering, the amount of data transmitted between clusters can be reduced. This can improve the cost and makespan of the workflow by optimizing the use of virtual resources and network bandwidth. Performance and cost are analyzed using an extension of the CloudSim simulation tool and compared with existing multi-objective approaches. The results demonstrate that our approach reduces resource costs with respect to user budgets.
=022  \\$a1989-1660