On Increasing Reliability and Availability in Distributed Database Systems

Author

Shy-Renn Lian

Entry type

phdthesis

Abstract

This thesis proposes several mechanisms to deal with recovery and data availability issues in distributed systems. Checkpointing in a distributed system is essential for recovery to a globally consistent state after failure. We present a checkpointing/rollback recovery algorithm in which each process takes checkpoints independently. In our approach, a number of checkpointing processes, a number of rollback processes, and computations on operational processes can proceed concurrently while tolerating the failure of an arbitrary number of processes. During recovery after a failure, a process invokes a two-phase rollback algorithm. In the first phase, it collects information about relevant message exchanges in the system and uses it in the second phase to determine both the set of processes that must roll back and the set of checkpoints up to which rollback must occur.
We have implemented the checkpointing and rollback recovery algorithm and evaluated its performance in a real processing environment. The evaluation measures the overhead due to time spent in executing the algorithm and the cost in terms of computational time and message traffic. We identify the components that make up the execution time of the algorithm and study how each of them contributes to the total execution time.
A typed token scheme for managing replicated data in distributed database systems is proposed in this thesis. In contrast to previous schemes, a set of tokens is used for each object. Each token represents a specific capability for the allowable operations on the object. By distributing tokens to different physical copies of the object, the object can be made available for different operations in the various partitions created by a network failure. Two types of replication for each of these tokens are proposed. One is based on the semantics of operations and the other is based on the semantics of the object. When failures are anticipated, tokens can be redistributed to maintain high availability.
We present a merge recovery scheme for efficient recovery of database consistency from dynamic partitioning and merge. In the merge recovery scheme, information needed for recovery is organized in a partition tree so that missing updates can be efficiently carried out when partitions merge. The merge recovery scheme is used to support the typed token scheme for efficient partition merge. It can also be used to extend other replica control protocols.

Key alpha

Lian

Note

December 1990

School

Purdue University

Publication Date

1990-12

Contents

1. Introduction
2. Independent Checkpointing and Rollback Recovery
3. Experimental Evaluation of the Concurrent Checkpointing and Recovery Algorithms
4. Increasing Availability Without Sacrificing Performance for Database Application
5. Dynamic Partitioning and Merge
6. Conclusions

Location

A hard copy of this thesis is in REC 216

BibTeX-formatted data

To refer to this entry, select and copy the text below and paste it into your BibTeX document. Note that the text may not contain all macros that BibTeX supports.

@Phdthesis{ Lian,
	title = "On Increasing Reliability and Availability in Distributed Database Systems",
	author = "Shy-Renn Lian",
	note = "December 1990",
	year = "1990",
	school = "Purdue University",
	abstract = "This thesis proposes several mechanisms to deal with recovery and data availability issues in distributed systems.
Checkpointing in a distributed system is essential for recovery to a globally consistent state after failure.  We present a checkpointing/rollback recovery algorithm in which each process takes checkpoints independently.  In our approach, a number of checkpointing processes, a number of rollback processes, and computations on operational processes can proceed concurrently while tolerating the failure of an arbitrary number of processes.  During recovery after a failure, a process invokes a two-phase rollback algorithm.  In the first phase, it collects information about relevant message exchanges in the system and uses it in the second phase to determine both the set of processes that must roll back and the set of checkpoints up to which rollback must occur.
We have implemented the checkpointing and rollback recovery algorithm and evaluated its performance in a real processing environment.  The evaluation measures the overhead due to time spent in executing the algorithm and the cost in terms of computational time and message traffic.  We identify the components that make up the execution time of the algorithm and study how each of them contributes to the total execution time.
A typed token scheme for managing replicated data in distributed database systems is proposed in this thesis.  In contrast to previous schemes, a set of tokens is used for each object.  Each token represents a specific capability for the allowable operations on the object.  By distributing tokens to different physical copies of the object, the object can be made available for different operations in the various partitions created by a network failure.  Two types of replication for each of these tokens are proposed.  One is based on the semantics of operations and the other is based on the semantics of the object.  When failures are anticipated, tokens can be redistributed to maintain high availability.
We present a merge recovery scheme for efficient recovery of database consistency from dynamic partitioning and merge.  In the merge recovery scheme, information needed for recovery is organized in a partition tree so that missing updates can be efficiently carried out when partitions merge.  The merge recovery scheme is used to support the typed token scheme for efficient partition merge.  It can also be used to extend other replica control protocols.",
	contents = "1. Introduction
2. Independent Checkpointing and Rollback Recovery
3. Experimental Evaluation of the Concurrent Checkpointing and Recovery Algorithms
4. Increasing Availability Without Sacrificing Performance for Database Application
5. Dynamic Partitioning and Merge
6. Conclusions",
}