Information about Error Avalanche

Byzantine fault tolerance is the name given to a sub-field of error tolerance research inspired by the Byzantine Generals' Problem, which is a generalized version of the Two Generals' Problem.

The object of Byzantine fault tolerance is to be able to defend against a Byzantine failure, in which a component of some system not only behaves erroneously, but also fails to behave consistently when interacting with multiple other components. Correctly functioning components of a Byzantine fault tolerant system will be able to reach the same group decisions regardless of Byzantine faulty components. There are upper bounds on the percentage of traitorous or unreliable components, however.

Byzantine failures

A Byzantine fault is an arbitrary fault that occurs during the execution of an algorithm by a distributed system. It encompasses those faults that are commonly referred to as "crash failures" and "send and omission failures". When a Byzantine failure has occurred, the system may respond in any unpredictable way, unless it is designed to have Byzantine fault tolerance.

These arbitrary failures may be loosely categorized as follows:
  • a failure to take another step in the algorithm, also known as a crash failure;
  • a failure to correctly execute a step of the algorithm; and
  • arbitrary execution of a step other than the one indicated by the algorithm.
For example, if the output of one function is the input of another, then small round-off errors in the first function can produce much larger errors in the second. If the second function were fed into a third, the problem could grow even larger, until the values produced are worthless. Another example is in compiling source code. One minor syntactical error early on in the code can produce large numbers of perceived errors later, as the compiler gets out-of-phase with the lexical and syntactic information in the source program.

Steps are taken by processes, the abstractions that execute the algorithms. A faulty process is one that at some point exhibits one of the above failures. A process that is not faulty is correct.

The Byzantine failure assumption models real-world environments in which computers and networks may behave in unexpected ways due to hardware failures, network congestion and disconnection, as well as malicious attacks. Byzantine failure-tolerant algorithms must cope with such failures and still satisfy the specifications of the problems they are designed to solve. Such algorithms are commonly characterized by their resilience t, the number of faulty processes with which an algorithm can cope.

Many classic agreement problems, such as the Byzantine Generals' Problem, have no solution unless t < n / 3, where n is the number of processes in the system.

The Two Generals' Problem is a specific case which assumes that processes are reliable but communication between processes is not reliable.

Origin

Byzantine refers to the Byzantine Generals' Problem, an agreement problem in which generals of the Byzantine Empire's army must decide unanimously whether to attack some enemy army. The problem is complicated by the geographic separation of the generals, who must communicate by sending messengers to each other, and by the presence of traitors amongst the generals. These traitors can act arbitrarily in order to achieve the following aims: trick some generals into attacking; force a decision that is not consistent with the generals' desires, e.g. forcing an attack when no general wished to attack; or confusing some generals to the point that they are unable to make up their minds. If the traitors succeed in any of these goals, any resulting attack is doomed, as only a concerted effort can result in victory.

Byzantine fault tolerance can be achieved if the loyal (non-faulty) generals have a unanimous agreement on their strategy. Note that if the source general is correct, all loyal generals must agree upon that value. Otherwise, the choice of strategy agreed upon is irrelevant.

Solutions

Several solutions were originally described by Lamport, Shostak, and Pease in 1982. They began by noting that the Generals' Problem can be reduced to solving a "Commander and Lieutenants" problem where Loyal Lieutenants must all act in unison and that their action must correspond to what the Commander ordered in the case that the Commander is Loyal. Roughly speaking, the Generals vote by treating each others' orders as votes.
  • One solution considers scenarios in which messages may be forged, but which will be Byzantine-fault-tolerant as long as the number of traitorous generals does not equal or exceed one third. The impossibility of dealing with one-third or more traitors ultimately reduces to proving that the 1 Commander + 2 Lieutenants problem cannot be solved if the Commander is traitorous. The reason is, if we have three commanders, A, B, and C, and A is the traitor: when A tells B to attack and C to retreat, and B and C send messages to each other, forwarding A's message, neither B nor C can figure out who is the traitor, since it isn't necessarily A - the other commander could have forged the message purportedly from A. It can be shown that if n is the number of generals in total, and t is the number of traitors in that n, then there are solutions to the problem only when n is greater than or equal to 3 times t + 1.
  • A second solution requires unforgeable signatures (in modern computer systems, this may be achieved through public-key cryptography), but maintains Byzantine fault tolerance in the presence of an arbitrary number of traitorous generals.
  • Also presented is a variation on the first two solutions allowing Byzantine-fault-tolerant behavior in some situations where not all generals can communicate directly with each other.

See also

References

External links

An error-tolerant design is one that does not unduly penalize user errors. It is the human equivalent of fault tolerant design that allows equipment to continue functioning in the presence of hardware faults, such as a "limp-in" mode for an automobile electronics unit that would be
..... Click the link for more information.
In computing, the Two Generals' Problem is a thought experiment meant to illustrate the pitfalls and design challenges of attempting to coordinate an action by communicating over an unreliable link.
..... Click the link for more information.
In document ISO/CD 10303-226, a fault is defined as an abnormal condition or defect at the component, equipment, or sub-system level which may lead to a failure.

According to the Federal Standard 1037C of the United States, the term fault
..... Click the link for more information.
In mathematics, computing, linguistics, and related disciplines, an algorithm is a finite list of well-defined instructions for accomplishing some task that, given an initial state, will proceed through a well-defined series of successive states, eventually terminating in an
..... Click the link for more information.
Distributed computing is a method of computer processing in which different parts of a program run simultaneously on two or more computers that are communicating with each other over a network.
..... Click the link for more information.
crash in computing is a condition where a program (either an application or part of the operating system) stops performing its expected function and also stops responding to other parts of the system. Often the offending program may simply appear to freeze.
..... Click the link for more information.
In computer science, a subroutine (function, method, procedure, or subprogram) is a portion of code within a larger program, which performs a specific task and can be relatively independent of the remaining code.
..... Click the link for more information.
A round-off error, also called rounding error, is the difference between the calculated approximation of a number and its exact mathematical value. Numerical analysis specifically tries to estimate this error when using approximation equations and/or algorithms, especially
..... Click the link for more information.
compiler is a computer program (or set of programs) that translates text written in a computer language (the source language) into another computer language (the target language).
..... Click the link for more information.
source code (commonly just source or code) is any sequence of statements and/or declarations written in some human-readable computer programming language.
..... Click the link for more information.
In computing, the Two Generals' Problem is a thought experiment meant to illustrate the pitfalls and design challenges of attempting to coordinate an action by communicating over an unreliable link.
..... Click the link for more information.
Byzantine Empire or Byzantium is the term conventionally used since the 19th century to describe the Greek-speaking Roman Empire of the Middle Ages, centered on its capital of Constantinople.
..... Click the link for more information.
digital signature or digital signature scheme is a type of asymmetric cryptography used to simulate the security properties of a signature in digital, rather than written, form.
..... Click the link for more information.
Public-key cryptography, also known as asymmetric cryptography, is a form of cryptography in which a user has a pair of cryptographic keys - a public key and a private key. The private key is kept secret, while the public key may be widely distributed.
..... Click the link for more information.
peer-to-peer (or "P2P") computer network exploits diverse connectivity between participants in a network and the cumulative bandwidth of network participants rather than conventional centralized resources where a relatively low number of servers provide the core value to a
..... Click the link for more information.
Association for Computing Machinery

Formation 1947
Headquarters New York, NY
Membership 83,000
President Stuart Feldman
Website [1]

The Association for Computing Machinery, or ACM
..... Click the link for more information.
Barbara Liskov (born Barbara Huberman, 1939), is a prominent computer scientist. She is currently the Ford Professor of Engineering in the Electrical Engineering and Computer Science department at the Massachusetts Institute of Technology.
..... Click the link for more information.
Operating Systems Design and Implementation (OSDI) is an annual Computer Science academic conference organized by USENIX.

See also

  • List of important publications in computer science

..... Click the link for more information.


This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus


page counter