The Center for Education and Research in Information Assurance and Security (CERIAS)

Design and Analysis of an Integrated Checkpointing and Recovery Scheme for Distributed Applications

Author

Ramamurty, Bina; Upadhyaya, Shambhu; Bhargava, Bharat

Entry type

article

Abstract

An integrated checkpointing and recovery scheme that exploits the low latency and high coverage characteristics of a concurrent error detection scheme is presented. Message dependency, which is the main source of multistep rollback in distributed systems, is minimized by using a new message validation technique derived from the notion of concurrent error detection. The concept of a new global state matrix is introduced to track error checking and message dependency in a distributed system and assist in the recovery. The analytical model, algorithms, and data structures to support an easy implementation of the new scheme are presented. The completeness and correctness of the algorithms are proved. A number of scenarios and illustrations that give the details of the analytical model are presented. The benefits of the integrated checkpointing scheme are quantified by means of simulation using an object-oriented test framework.

Date

1999-08-08

Journal

IEEE Transactions on Knowledge and Data Engineering

Key alpha

Ramamurty

Publisher

IEEE Computer Society

Volume

12

Affiliation

IEEE

Publication Date

1999-08-08

Language

English

Location

A hard-copy of this is in the Papers Cabinet

Subject

Design and Analysis of an Integrated Checkpointing and Recovery Scheme for Distributed Applications
