02585nas a2200229 4500000000100000000000100001008004100002260001200043653001900055653002000074653001800094653002000112100002500132700001600157700001900173245007900192856007400271300001000345490000600355520198000361022001402341 2012 d c06/201210aGrid computing10aFault Tolerance10aCheckpointing10aMessage-logging1 aNdeye Massata Ndiaye1 aPierre Sens1 aOusmane Thiare00aPerformance comparison of hierarchical checkpoint protocols grid computing uhttp://www.ijimai.org/journal/sites/default/files/IJIMAI20121_5_6.pdf a46-530 v13 aA grid infrastructure is a large set of geographically distributed nodes connected by a communication network. In this context, fault tolerance is a necessity imposed by distribution, which raises problems related to the heterogeneity of hardware, operating systems, networks, middleware, and applications, the dynamic nature of resources, scalability, the lack of common memory and of a common clock, and asynchronous communication between processes. To improve the robustness of supercomputing applications in the presence of failures, many techniques have been developed to make the system resistant to these faults. Fault tolerance is intended to allow the system to provide service as specified in spite of the occurrence of faults. It is an indispensable element of distributed systems. To meet this need, several techniques have been proposed in the literature. We study protocols based on rollback recovery. These protocols fall into two categories: coordinated checkpointing and rollback protocols, and log-based independent checkpointing protocols or message-logging protocols. However, the performance of a protocol depends on the characteristics of the system, the network, and the applications running. Faced with the constraints of large-scale environments, many of the algorithms in the literature have proved inadequate. 
Given an application environment and a system, it is not easy to identify the recovery protocol that is most appropriate for a cluster or hierarchical environment such as grid computing. While some protocols have been used successfully at small scale, they are not suitable for use at large scale. Hence there is a need to implement these protocols in a hierarchical fashion in order to compare their performance in grid computing. In this paper, we propose hierarchical versions of four well-known protocols. We have implemented these protocols and compared their performance in clusters and grid computing using the OMNeT++ simulator. a1989-1660