Experimental Analysis of Replication in Distributed Systems

Author

Abdelsalam Ali Helal

Entry type

phdthesis

Abstract

The main objective of replication in distributed database systems is to increase data availability. However, the overhead associated with replication may impair the performance of transaction processing. Moreover, in the presence of changing failure and transaction characteristics, static replication schemes are so restrictive that they may actually decrease availability. The purpose of this research is to show how adaptability and data reconfiguration can be used in conjunction with static replication schemes to achieve and maintain higher levels of availability. The basis of this research is an integrated study of availability and performance of replication methods. The integrated study was suggested as future work in [Pu85]. The availability analysis part of the study is performed through an analytical model. The performance evaluation part is conducted on the second version of the RAID distributed database system developed at Purdue. We classify availability into two categories: algorithmic and operational. While algorithmic availability measures the fault-tolerance provided by replication methods against component failures, operational availability examines the effect of performance failure on the validity of the algorithmic measure. Performance failure can result from an inefficient implementation of an expensive fault-tolerant replication method. Algorithmic availability is studied through an analytical model that encompasses transaction and database parameters, site and communication link failures, and replication methods' parameters. Operational availability is examined experimentally through near-saturation performance measurements of an actual implementation of replication methods.
We implement a variety of replication mechanisms and an infrastructure for adaptability to detected and predicted failures. The implementation includes off-line replication management, a stand-alone replication control server, a quorum-based interface to a library of replication methods, quorum selection heuristics, a surveillance facility, and a dynamic data reconfiguration protocol. The effectiveness and performance of our implementation, methods, and ideas are tested through experimental measurements of transaction performance. The experiments use an extended version of the standard DebitCredit benchmark. Using the availability model and the RAID system combined, a series of experiments is conducted. We study static replication schemes and develop local policies for their efficient use. We then examine how to adapt the use of these schemes to perturbations in parameters, including transaction read/write mix and site and communication link reliabilities. The experiments give a number of insights about how to adapt replication in order to increase the availability of fault-intolerant replication methods and reduce the performance penalties of methods that are highly fault-tolerant.

Key alpha

Helal

Note

May 1991

School

Purdue University

Publication Date

1900-01-01

Contents

1. Introduction
2. Related Research
3. Measures of Availability
4. The Design and Implementation of Replication in the RAID System
5. Experimental Replication Policies in RAID
6. Conclusion and Future Work

Location

A hard copy of this thesis is in REC 216

BibTeX-formatted data

To refer to this entry, select and copy the text below and paste it into your BibTeX document. Note that the text may not contain all macros that BibTeX supports.

@Phdthesis{ Helal,
	title = "Experimental Analysis of Replication in Distributed Systems",
	author = "Abdelsalam Ali Helal",
	note = "May 1991",
	school = "Purdue University",
	abstract = "The main objective of replication in distributed database systems is to increase data availability.  However, the overhead associated with replication may impair the performance of transaction processing.  Moreover, in the presence of changing failure and transaction characteristics, static replication schemes are so restrictive that they may actually decrease availability.
The purpose of this research is to show how adaptability and data reconfiguration can be used in conjunction with static replication schemes to achieve and maintain higher levels of availability.  The basis of this research is an integrated study of availability and performance of replication methods.  The integrated study was suggested as future work in [Pu85].  The availability analysis part of the study is performed through an analytical model.  The performance evaluation part is conducted on the second version of the RAID distributed database system developed at Purdue.
We classify availability into two categories: algorithmic and operational.  While algorithmic availability measures the fault-tolerance provided by replication methods against component failures, operational availability examines the effect of performance failure on the validity of the algorithmic measure.  Performance failure can result from an inefficient implementation of an expensive fault-tolerant replication method.  Algorithmic availability is studied through an analytical model that encompasses transaction and database parameters, site and communication link failures, and replication methods' parameters.  Operational availability is examined experimentally through near-saturation performance measurements of an actual implementation of replication methods.
We implement a variety of replication mechanisms and an infrastructure for adaptability to detected and predicted failures.  The implementation includes off-line replication management, a stand-alone replication control server, a quorum-based interface to a library of replication methods, quorum selection heuristics, a surveillance facility, and a dynamic data reconfiguration protocol.  The effectiveness and performance of our implementation, methods, and ideas are tested through experimental measurements of transaction performance.  The experiments use an extended version of the standard DebitCredit benchmark.
Using the availability model and the RAID system combined, a series of experiments is conducted.  We study static replication schemes and develop local policies for their efficient use.  We then examine how to adapt the use of these schemes to perturbations in parameters, including transaction read/write mix and site and communication link reliabilities.  The experiments give a number of insights about how to adapt replication in order to increase the availability of fault-intolerant replication methods and reduce the performance penalties of methods that are highly fault-tolerant.",
	contents = "1.  Introduction
2.  Related Research
3.  Measures of Availability
4.  The Design and Implementation of Replication in the RAID System
5.  Experimental Replication Policies in RAID
6.  Conclusion and Future Work",
}