From Crash Fault-Tolerance to Arbitrary-Fault Tolerance:

(1)

From Crash Fault-Tolerance to Arbitrary-Fault Tolerance:

Towards a Modular Approach

Roberto B ALDONI

Jean-Michel H ELARY

^y

Michel R AYNAL

^y

Abstract

This paper presents a generic methodology to transform a pro- tocol resilient to process crashes into one resilient to arbitrary failures in the case where processes run the same text and regu- larly exchange messages (i.e., the case of round-based protocols).

The methodology follows a modular approach encapsulating the detection of arbitrary failures in specific modules. This can be the starting point for designing tools that allow automatic transfor- mation. We show an application of this methodology to the case of consensus.

1 Introduction

Providing specifications and designing solutions for dis- tributed algorithms in an environment where processes can exhibit arbitrary behavior (e.g., omit to execute a statement or corrupt the value of a local variable) is notably more dif- ficult than in a crash context. Additionally, most of the time both specifications used in the crash model are inadequate because if a process shows an arbitrary faulty behavior from the beginning of its execution, it is impossible for others to detect it. Solutions used in the crash model become inade- quate because a malicious process can exhibit failures more subtle than crashes and these failures can lead to the viola- tion of the correctness criteria of the algorithm.

Our goal is to propose a methodology to transform dis- tributed algorithms resilient to process crashes into algo- rithms resilient to arbitrary failures. This method relies on a modular approach that aims to “encapsulate” each type of failure in a specific module

¹

. Because of the semantic na- ture of arbitrary failures, the actual design of some of these modules cannot be performed independently of the algo- rithm that will use them. However, our aim is to propose a

Universit´a “La Sapienza”, Roma, Italy. [email protected].

yIRISA, Campus de Beaulieu, Universit´e Rennes1, Rennes, France.

fhelary,raynal^g@irisa.fr.

1Previous alternative approaches by different authors are summarized in the Chapter 12 of [1]. In the asynchronous case, they provide only a masking of arbitrary faulty messages by identical faulty messages and thus, do not address all types of arbitrary failures.

design method that is independent of a particular algorithm.

In that sense, this paper contributes to a better understand- ing of the tools that can be used to produce algorithms re- silient to arbitrary failures. To illustrate this methodology, we show how it works in the case of consensus.

Consensus: a Case Study. Consensus is a fundamental paradigm for fault-tolerant distributed systems. Each pro- cess proposes a value to the others. All correct processes have to agree (Termination) on the same value (Agreement) which must be one of the initially proposed values (Validi- ty). Solving consensus in asynchronous distributed systems where processes can crash is a well-known difficult task:

Fischer, Lynch and Paterson have proved an impossibility result [7] stating that there is no deterministic solution to the consensus problem in the presence of even a single pro- cess crash . A way to circumvent this impossibility result is to use the concept of unreliable failure detectors introduced by Chandra and Toueg [3]. Each process is equipped with a failure detector module that provides it with a list of pro- cesses it currently suspects to have crashed. A failure detec- tor module can make mistakes by not suspecting a crashed process or by erroneously suspecting a correct one. Formal- ly, a failure detector module is defined by two properties:

completeness (a property on the actual detection of process crashes), and accuracy (a property that restricts the mistakes on erroneous suspicions). In a context where processes en- counter arbitrary failures, the specifications and solutions used in the crash model are inadequate, as discussed here- after.

Specification issues. The traditional Validity property is not adequate to define the consensus problem in an arbitrary failure setting. A faulty process can initially propose an ir- relevant value (i.e., a value different from the one it should propose) and then behave correctly. There is no way for the other processes to detect this failure. Consequently, the set of correct processes could agree on an irrelevant value

v proposed by a process. An advance has been done by

Doudou and Schiper [5] to circumvent this drawback: they

have introduced a new Validity property, namely the Vector

Validity property. This property requires that the decision

value is a vector containing a certain number (at least one)

(2)

of initial values of correct processes. So, processes have first to construct vectors (from processes initial values) and then to agree on one of these vectors (Vector Consensus).

One interesting feature of this construction is the following:

if a process falsifies an entry from a process, it will be de- tected as faulty by correct processes. In a sense, this allows to “certify” initial values of correct processes.

Solution issues. Malkhi and Reiter [10] have been the first to propose an extension of failure detectors able to cope with other types of failures. They have introduced a failure de- tector class

^3S

( bz ) based on the notion of a quiet process.

A process p is quiet if there exists a time after which some correct process does not receive anymore messages from p ^.

(Note that a crashed process is a quiet process). More re- cently, Doudou et al. [6] have pointed out that, in presence of arbitrary failures, it is no longer possible to define fail- ure detection modules independently of the protocols that rely on them: unlike process crashes, other types of failure are not context-free. In particular, the notion of quiet pro- cess is not an extension of the notion of crashed process: a process can be quiet (or mute) to some processes with re- spect to a given protocol, without being crashed. To cope with this notion, the authors have introduced the notion of muteness detectors , and particularly the class

³

M

^A

^{, where}

A

denotes a particular protocol. It can be seen as an in- stantiation of a generic class of omission failure detectors, introduced by Dolev et al. [4]. Moreover, the specification of this failure detector makes sense only in the context of regular round-based distributed algorithms (a precise defi- nition of this class of algorithms is provided in [6]).

Faulty processes are prone to all kinds of non-muteness failures, (e.g. corrupting the content of a message). K- ihlstrom, Moser and Melliar-Smith [9] have pointed out that messages need to be signed as well as certified. The signa- ture allows a receiver to authenticate the sender, while the certificate is a well-defined amount of redundant informa- tion carried by messages that allows a receiver to check if the content of a message is valid and if the sending of the message was done “at the right time”. In other words, a certificate allows a receiver to “look into the sender pro- cess” in order to see if the actions that produced the sending were correct. Using these tools, several consensus pro- tocols have been proposed for the arbitrary failure context ([5],[9],[10]). All these solutions use unreliable failure de- tectors to detect muteness failures, but leave the detection of other types of failure to the protocols themselves.

Results of the Paper. In the methodology proposed in this paper, each type of failure exhibited by a faulty process is detected by a specific module. We make two assumptions on the class of protocols to which the proposed approach applies. First, to meet the requirement of muteness failure detectors, we consider only regular round-based protocols (as in [6]). Roughly speaking, we require from such proto-

cols that they have a regular communication pattern (asyn- chronous rounds), i.e., that each correct process communi- cates regularly with others (note that this does not assume synchronous systems). Second, we assume that each pro- cess “knows” the program of every other process partici- pating in the algorithm (this is satisfied when all processes are assumed to follow the same program). From the ap- plication of the methodology it follows that each process has to be actually composed of five modules: (i) a round- based protocol module which executes the algorithm; (ii) a muteness failure detection module which detects perma- nent omission failures; (iii) a non-muteness failure detec- tion module which reveals other types of failures; (iv) a signature module which filters out messages in which the sender must be identified; (v) a certification module which manages certificates to be associated with messages. It is important to assume that certificates themselves cannot be corrupted. We will explain how this assumption can be en- forced.

This methodology is then applied to the transformation of the Hurfin and Raynal Consensus protocol [8] designed for the crash model, into a Consensus protocol resilient to arbitrary failures. The complete process structure of the re- sulting protocol is shown. We use a muteness failure detec- tor of the class

^3M

, whose implementation has been dis- cussed in [6]. The design of uncorruptible certificates and a reliable implementation of the non-muteness failure detec- tion module are described in detail, according to the general methodology. As far as we know, such an issue had never been addressed before.

The resulting protocol assumes a maximum number of non-correct processes F

min(

^b

( n

^;

1) = 2

^c

;C ) ^where C

is the maximum number of faulty processes that the under- lying certification service used by the protocol can cope with

²

. The protocol satisfies Agreement, Termination and Vector Validity with at least = n

^;

2 F entries from correct processes (note that, due to definition of F ^{, we have}

1 ^).

The paper is made of five sections. Section 2 presents the asynchronous model with arbitrary failures. Section 3 ad- dresses the general methodology to transform process crash resilient protocols into arbitrary failure resilient ones. The case study of consensus is presented in Sections 4 (protocol in the crash model) and 5 (transformed protocol).

2 The Model

A Distributed Computation. We consider a system con- sisting of n > 1 ^processes =

^f

p

¹

;p

²

;::: ;p _n

^g

^executing

the same program text. In the following, a correct process

2Usual certification mechanisms require

C =

^b

(n

^;

1)=3

^c^.

(3)

is a non-faulty process, as explained below. Every pair of processes is connected by a reliable channel. We also as- sume that channels are FIFO (this simplifies the solution when addressing arbitrary failures) and that each process p

possesses a private key and a public key. The private key is used by p to sign, in an unforgeable way, outgoing mes- sages. There is no assumption about the relative speed of processes or the message transfer delays.

Faulty Process. The very concept of arbitrary failures is intended to englobe any possible behavior as seen from out- side the process [12]. However, the design of tools for de- tecting such faulty behaviors rests upon a careful analysis of their cause. This analysis leads to classify failures into two different types: (1) Muteness failures (Permanent Message Omission.) The process stops sending messages it is sup- posed to send. This includes, but is not limited to, the pro- cess crash, where the process stops executing statements of its program and halts. (2) Non-Muteness Failures, includ- ing: corruption of a variable value, transient omissions of statements, duplication of a statement, execution of a spuri- ous statement, misevaluation of an expression, etc.

³

Note that a process may behave correctly with respect to some process, but exhibit a faulty behavior with respect to some other process. However, by definition, a process that makes even a single fault with respect to only one process is a faulty process.

3 Detecting Arbitrary Behaviors

In this section, we consider generic methods to design failure detection tools. Here, genericity means that meth- ods themselves are independent of the protocols using the detection tools, even if detection tools are actually designed to be used by particular protocols (and sometimes mixed with them). The situation is similar to the one encountered when, e.g., designing loops for sequential programs: the de- sign method (based on invariants, stop conditions, etc.) is independent of particular programs, whereas each loop is actually designed in the context of a particular program.

Manifestation of Non-Muteness Failures. In an asyn- chronous distributed computation, failures experienced by a process cannot be revealed to other processes unless the faulty process has to communicate with others (according to the program specification), via protocol messages

⁴

. As a consequence, detection of failures is closely related to the receipt of protocol messages. Then, the key idea is: each process has to check whether the right message has been sent by the right process at the right time with the right ar- guments.

3A more detailed analysis can be found in the full version [2].

4If a process must not send any message (according to the program specification), its behavior can be considered as non-relevant since it has no effect on the other processes.

Since the detection of (i) permanent omission fail- ures (muteness) (ii) processes falsifying their identity (i.e., wrong sender) has been extensively addressed in [6] and [9]

respectively, we will concentrate on the detection of other failures. These failures can be revealed to other processes via messages. We have identified two kinds of “externally”

visible wrong messages:

1. Out-of-order messages (i.e., wrong time), revealing either a non-permanent sending omission or a sending du- plication. This case includes a message that does not belong to the set of messages that can be generated by the text of the program.

2. Wrong Expected Messages (i.e., right time, but wrong message or wrong content). This case includes messages sent after misevaluation of a condition statement (substitut- ed messages), or messages whose content is syntactically or semantically incorrect.

Tools for Detection of Non-Muteness Failures. These tools are based, on the one hand, on certification mecha- nisms, and, on the other hand, on state machines built from the text of the protocol.

Certificates. A certificate is a piece of redundant informa- tion, including a part of the process history. This history includes internal, send and receipt events. The certificate is appended to the message sent, and is used by the receiv- er to check if the content of the message is consistent with the sender’s history (no semantically incorrect message). It also allows the receiver to check that the decision to send this message (and not another one, in case of choice) is the correct one (no substituted message).

Consider a message m ^from p ^to q , containing a value

v . This value has been updated by p according to its own history. Similarly, the sending event of m is a consequence of the receipt of other messages, and is enabled by a set of conditions involving local variables of p . The certificate ap- pended to m must contain proper information able to wit- ness: the value v , the fact that the required receipt events have been correctly taken into account, and the values of

p ’s local variables involved in the enabling condition.

Let us remark that we have to assume that certificates

themselves cannot be corrupted, since a corrupted certify-

ing information could be consistent with a corrupted in-

formation to certify. The concept of reliable Certification

Module encapsulates this assumption. Technically, it can be

enforced by the very composition of certificates: they are

composed of a set of signed messages, e.g. messages whose

receipt is the cause of the sending of m , or whose content

has influenced the update of a local variable whose value

is involved in m . Reliability results from the fact that no

process can falsify the content of a signed message without

being detected as faulty by a correct receiver, and from the

cardinality of the set of signed messages allowing majority

tests to be performed. The correctness of a certificate can

(4)

thus be verified at the recipient side, by a certificate analyz- er (implemented by a state machine, as explained below).

We will say that a certificate attached to m is well-formed with respect to a value v if it has been analyzed as correct and if the receiver can extract information consistent with the value of v and with the action to send m ^.

The design of certificates depends on the protocol to transform. The previous principles constitute a ”guideline”

for this design. If the protocol has been proved correct in the crash model, it remains only to prove that certificates are well-formed with respect to (1) values carried by mes- sages and (2) to decisions enabling their send event.

State machines. Under the assumption that every process knows the program text of the other processes, every pro- cess can build an ad-hoc state machine (e.g., a finite state automaton) modeling the expected behavior of another pro- cess. Let

^SM

p ( q ) be the state machine modeling the be- havior of process q with respect to process p . In this state machine, transitions are triggered when p receives a mes- sage from q . In every state, a set of receipt events are enabled. Out-of-order messages are those whose receipt events are not enabled. Wrong Expected messages are ei- ther (i) those whose receipt event is enabled, but whose syntactic composition is not consistent with the one of the corresponding expected message or (ii) those whose receipt event is enabled, but whose certificate is not well formed with respect to either its arguments or the action to send that particular message. When such events occur, they trigger a transition to a particular terminal state, called faulty state.

The actual design of a particular state machine has to be done in the particular context of the protocol to transform.

Handling local variables. Each local variable is a way that a faulty process can use to “attack” correct processes by corrupting its value. This may happen when the value of a local variable is initialized, or updated independently of the receipt of messages. As a consequence, expressions involving local variables should be, as much as possible, replaced with expressions involving only values of certifi- cates, because certificates cannot be corrupted. Moreover, it can happen – from the definition of the protocol – that some variables cannot be certified. In order to secure an up- date, the transformed protocol must generate an exchange of messages between all processes. After this exchange, each process has a vector of values, ensuring that a given number (at least one) of entries are correct. An entry is correct if 1) it is the value of a correct process, and 2) if a process falsifies this entry, it will be detected as faulty by correct processes. In other words, this vector is certified by messages exchanged to build it. Such a technique is called Vector certification.

General Methodology for Protocol Transformation. A process of the transformed protocol consists of five modules

whose role has been presented in the introduction. More precisely, the structure of a process p i is given in Figure 1.

The same figure also shows the path followed by a message

m ^(resp. m

⁰

) received (resp. sent) by p i ^.

Signature module. Each message arriving at p i is first pro- cessed by this module which verifies the signature of the sender (by using its public key). If the signature of the mes- sage is inconsistent with the identity field contained in the message, the message is discarded and its sender identity (known thanks to the unforgeable signature), is passed to the non-muteness failure detection module to be added to a set faulty i . Otherwise, the message is passed to the mute- ness failure detection module. Also, each message sent by

p i is signed by the signature module just before leaving the process. This module is generic, in the sense that it can be implemented independently of the protocols using it [13].

Muteness failure detection module. This module is devot- ed to the detection of mute processes [6]. It manages a set

suspected i of processes suspected to be mute to p i ^{. The}

receipt of a message will be taken into account by this de- tector, that will remove the sender from the set suspected i

(if necessary). Then, the message is passed to p i ^{’s non-}

muteness failure detection module.

Non-muteness failure detection module. This module re- ceives messages from the muteness failure detection mod- ule and checks if they are properly formed and follow the program specification of the sender (i.e., it actually imple- ments the state machine described above). In the affirma- tive, it passes the message to p _i ’s certification module. This module maintains a set ( faulty _i ) of processes it detected to experience one of the manifestations of non-muteness fail- ure. We say “process p i ^declares p j to be faulty” at some time, if, at that time, p j

²

faulty i . We assume that that this module is reliable, i.e., if p i is correct and p j

²

faulty i ^,

then p j has experienced an incorrect behavior detected by the non-muteness failure detection module of p i ^{. Finally,}

as for the set suspected i ^, p i ’s round-based protocol mod- ule can only read faulty i (note that, if p i is faulty, it can misevaluate an expression involving faulty i ^).

Reliable Certification Module. This module is responsible, upon the receipt of a message from the non-muteness failure detection module, for updating the corresponding certificate local variable. It is also in charge of appending properly formed certificates to the messages that are sent by p i ^.

Round-based Protocol Module. This module is the one that

executes the protocol and it is generated by the transfor-

mation of the round-based protocol in the crash model by

applying the rules defined above in this Section.

(5)

signature module

m

network

network certification module

signature module

consensus module

m

⁰

detection module non-muteness failure

muteness failure detection module

Figure 1. Structure of a process p i ^.

4 Hurfin-Raynal’s Consensus Protocol

This section provides a brief description of a version of Hurfin-Raynal’s protocol [8] that assumes

FIFO

channel- s (this constraint is not required by the original protocol).

It will be used as an example to show the transformation methodology. As other crash-resilient consensus protocol- s, it assumes a majority of correct (i.e., non-crashed) pro- cesses and uses a failure detector of the class

^3S

(Strong Completeness and Eventual Weak Accuracy). It proceeds in successive asynchronous rounds, using the rotating coor- dinator paradigm. During a round, a predetermined process (the round coordinator) tries to impose a value as the de- cision value. To attain this goal, each process votes: either (vote

CURRENT

) in favor of the value proposed by the round coordinator (when it has received one), or (vote

During each round, the behavior of each process p i ^is

determined by a finite state automaton. This automaton is composed of 3 states. The local variable state i ^{will de-}

note the automaton state in which p i currently is. During a round, the states of the automaton have the following mean- ing:

state i = q

⁰

^: p i has not yet voted ( q

⁰

is the automaton initial state).

state _i = q

¹

^: p _i ^{has voted}

^CURRENT

and has not changed its mind ( p _i ^{moves from} q

⁰

^to q

¹

^).

state i = q

²

^: p i ^{has voted}

^NEXT

^.

The protocol manages the progression of each process p i

within its automaton, according to the following rules. At the beginning of round r ^, state i = q

⁰

. Then, during r ^{, the}

transitions are:

Transition q

⁰ ^!

q

¹

( p _i first votes

CURRENT

) ^{. This}

transition occurs when p _i ^{receives a}

^CURRENT

vote (line 7), and is in the initial state q

⁰

(line 10). This means that p i ^is

the round coordinator, or has not previously suspected the round coordinator. Moreover, when p i ^{moves to} q

¹

^{and is}

not the current coordinator, it broadcasts a

CURRENT

vote (line 10).

Transition q

⁰ ^!

q

²

( p _i first votes

⁰

^{, suspects}

the current coordinator (line 13). This means that p i ^{has not}

previously received a

CURRENT

vote. Moreover, when p i

moves to q

²

, it broadcasts a

Transition q

¹ ^!

q

²

( p i changes its mind ) . This tran- sition (executed by statements at line 15) is used to prevent a possible deadlock. A process p i that has issued a

CUR

-

RENT

vote is allowed to change its mind if p i has received a (

CURRENT

or

CURRENT

votes (so it cannot decide), nor a majority of

¹

)

^{^}

(

^j

rec from i

^j

> n= 2) ^.

Protocol Description. In addition to the local variable

state i ^{, process} p i manages the following four local vari- ables:

r i defines the current round number.

est i contains the current estimation by p i of the deci- sion value.

nb current i ^(resp. nb next i ) counts the number of

CURRENT

(resp.

rec from i is a set composed of the process identities from which p i has received a (

CURRENT

or

Finally, suspected i is a set managed by the associated failure detector module; p i can only read this set.

Function consensus() consists of two concurrent tasks.

The first task handles the receipt of a

DECIDE

message (line 2); it ensures that if a process p i decides (line 2 or line 12), then all correct processes will also receive a

DE

-

CIDE

message. The second task (lines 3-18) describes a round: it consists of a loop that constitutes the core of the protocol. Each (

CURRENT

or

⁵

.

5In any round

r

ⁱ, only votes related to round

r

ⁱcan be received. A vote from

p

^krelated to a past round is discarded and a vote related to a future round

r

^k^(with

r

^k

> r

ⁱ) is buffered and delivered when

r

ⁱ

= r

^k^.

(6)

functionconsensus(

v

ⁱ⁾

(1)

r

ⁱ

0

^;

est

ⁱ

v

ⁱ^;

cobegin

(2) ^jjupon receipt of DECIDE(

p

^k

;est

^k^{) send}^DECIDE⁽

p

ⁱ

;est

^k^{) to}

^{; return(}

est

^k⁾

(3) ^jjloop % on a sequence of asynchronous rounds %

(4)

c (r

ⁱ

mod n) + 1

^;

r

ⁱ

r

ⁱ

+ 1

^;

state

ⁱ

q

⁰^;

rec from

ⁱ ^;^;

nb next

ⁱ

0

^;

nb current

ⁱ

0

^;

(5) if (

i = c

) then sendCURRENT(

p

ⁱ

;r

ⁱ

;est

ⁱ^{) to}

^endif;

(6) while (

nb next

ⁱ

n=2

) do % wait until a branch can be selected, and then execute it % (7) upon receipt ofCURRENT(

p

^k

;r

ⁱ

;est

^k⁾

(8)

nb current

ⁱ

nb current

ⁱ

+ 1

^;

rec from

ⁱ

rec from

ⁱ^[^f

p

^k^g^;

(9) if

(nb current

ⁱ

= 1)

^then

est

ⁱ

est

^k^endif;

(10) if

(state

ⁱ

= q

⁰

)

^then

state

ⁱ

q

¹^{; if}

i

⁶

= c

^{then send}^CURRENT⁽

p

ⁱ

;r

ⁱ

;est

ⁱ^{) to}

^endif;

(11) endif;

(12) if (

nb current

ⁱ

> n=2

) then sendDECIDE(

p

ⁱ

;est

ⁱ^{) to}

^{; return(}

est

ⁱ^{) endif}

(13) upon (

p

^c²

suspected

ⁱ^{) if (}

state

ⁱ

= q

⁰^{) then}

state

ⁱ

q

²^{; send}^NEXT⁽

p

ⁱ

;r

ⁱ^{) to}

^endif

(14) upon receipt ofNEXT(

p

^k

;r

ⁱ⁾

nb next

ⁱ

nb next

ⁱ

+ 1

^;

rec from

ⁱ

rec from

ⁱ^[^f

p

^k^g

(15) upon (

change mind

⁾

state

ⁱ

q

p

ⁱ

;r

ⁱ^{) to}

(16) endwhile

(17) if (

state

ⁱ⁶

= q

²^{) then}

state

ⁱ

q

p

ⁱ

;r

ⁱ^{) to}

^endif

(18) endloop coend

Figure 2. Hurfin-Raynal’s

^3S

-based Consensus Protocol (adapted to

FIFO

channels)

At the beginning of a round r , the current coordinator

p c proposes its estimate v c to become the decision value by broadcasting a

CURRENT

vote carrying this value (line 5).

Each time a process p i receives a (

CURRENT

or

NEX

-

T

) vote, it updates the corresponding counter and the set

rec from i (lines 8 and 14).

When a process receives a

CURRENT

vote for the first time, namely,

CURRENT

( p _k ^, r ^, est _k ), it adopts est _k ^{as its}

current estimate est _i (line 9). If, in addition, it is in state

q

⁰

, it moves to state q

¹

^{(line 10).}

A process p i decides on an estimate proposed by the current coordinator as soon as it has received a majority of

CURRENT

votes, i.e., a majority of votes that agree to con- clude during the current round (line 12).

When a process progresses from round r ^{to round} r +1

it issues a

These

The complete analysis and proof of this protocol can be found in [8].

5 Making the Protocol Resilient to Arbitrary Failures

In this section, we concentrate on the certification, the non-muteness failure detection and the resulting consensus

modules (muteness failure detection is ensured by a module of class

^3M

[6], and signature module is independent of the protocol).

5.1 Designing Certificates

Certificates must witness values of the local variables, and correct evaluation of conditions enabling the sending events, as described in Section 3, in order to allow a receiver to detect a sender failure by using a non-muteness failure detection module (i.e., a state machine).

Initial values. Values v i proposed by the processes can- not be certified, because they are initial values. So, it is necessary to apply the Vector Certification technique. In the context of consensus, this leads to the Vector Consensus problem

⁶

, specified by the classical Agreement and Termi- nation properties and the following Vector Validity property [5].

Every process decides on a vector vect ^{of size} n ^:

For every process p i ^{: if} p i is correct, then either

vect [ i ] = v i ^or vect [ i ] = null ^{, and}

At least

1 elements of vect are initial values of correct processes.

6The Vector Consensus notion has first been proposed in synchronous systems where it is called Interactive Consistency problem [11].

(7)

The transformation that must be applied to the original protocol is standard. It consists in a preliminary phase dur- ing which every process sends its proposition to every other process. Thus, each process constructs a vector whose j ^-th

entry is the value received from process p j . For each pro- cess p i , the problem lies in obtaining a vector of proposed values certifying initial values of correct processes (certi- fied vector). This certified vector will then be used as the value proposed by p i to the consensus protocol.

Let

INIT

denote the messages used in this preliminary phase. Each message

INIT

sent by p i carries its proposed value v i ^{. Messages}

^INIT

have an empty certificate. Howev- er, they are signed. This allows the vector built by a process

p i to be certified, in the following way: each time p i ^re-

ceives

^hINIT

( p j ;v j ) ;cert j

ⁱ

^(here, cert j =

^;

^{) from} p j ^{, its}

certification module adds this signed message to the certifi- cate est init i ^.

The text of this procedure for process p i is described in Figure 3, lines 4-9.

Each process initially broadcasts its value v i ^{and then}

waits for ( n

^;

F ) values from other processes. Each time

p _i receives a value, the message is added to est cert _i ^.

Exiting from the initial while loop (lines 6-9) we say that

“ est cert i is well-formed with respect to a value est vect i ^”

if the following conditions are satisfied:

j

est cert i

^j

= ( n

^;

F ) ^{, and}

The value est vect i is correct with respect to the ( n

^;

F )

^INIT

messages contained in est cert i (intuitively, this means that those ( n

^;

F ) messages “witness” that est vect i

is a correct value, because n

^;

F

n

^;

C ^).

We have the following results:

Proposition 1 Eventually, every correct process builds a vector est vect i ^{such that} est vect i [ i ] = v i ^and

⁸

k

⁶

= i : est vect _i [ k ] = v _k ^or est vect _i [ k ] = null ^{, and} est cert _i is well-formed with respect to est vect _i ^.

Proposition 2 No process can build two different initial certified vectors.

Due to space limitations, the proof of these propositions is omitted

⁷

.

Certifying values carried by the other messages. Apart from

INIT

messages, three types of messages are exchanged in the original protocol, namely,

CURRENT

,

DE

-

CIDE

. These messages carry the identity of their sender, the round number where they have been sent, and (except

Identity of the sender. This value is certified by the signature of the message, as explained in Section 3.

Estimate value. In the transformed protocol, estimate val- ues are vectors. The current estimate value of p i ^{will be}

denoted est vect i . The initial value of est vect i is certified

7It can be found in the full paper [2].

by est cert i , as explained above. This variable can then take successive values: it can be updated at most once per round due to the delivery of the first

CURRENT

message re- ceived during this round. When this occurs, the certificate

est cert i is also updated, with the certificate of the

CUR

-

RENT

message. Since this message is properly formed, it- s certificate cert k contains a correct certificate est cert k

(i.e., a certificate well-formed with respect to the value

est vect k contained in the

CURRENT

message). During a round, est cert i is said to be “well-formed with respect to

est vect i ” if the value est vect i is the value included in the ( n

^;

F ) messages contained in est cert k . Otherwise, it means that process p i has either omitted to execute the update of est vect i , or corrupted its value.

Similarly, a process decides a value est vect c ^{at round} r

either when it has received ( n

^;

F ) ^valid

^CURRENT

^mes-

sages (lines 20-21) or when it receives a valid

DECIDE

mes- sage from another process (lines 2-3). In the first case the process authenticates its decision by using a certificate

current cert i made of the set of the ( n

^;

F ) ^signed

^CUR

^-

RENT

messages (line 21), in the second case the message

DECIDE

(with the same certificate) is relayed to the other processes (line 3).

We say that current cert i is well formed with respect to r ^and est vect if the following conditions are satisfied:

j

current cert i

^j

= ( n

^;

F ) (otherwise process p i ^mi-

sevaluated the condition of line 20).

the certificate of each message in current cert i ^con-

tains a est cert c well formed with respect to est vect ^.

the certificate of each message in current cert i ^con-

tains a next cert well formed with respect to r ^.

Round number. A process progress from round r

^;

1 ^to

round r ⁽ r > 1 ) when it has received enough

^d

n

2

e

in the original protocol, at least

( n

^;

F ) in the transformed protocol). So, the new value of r (resulting from the assignment r r + 1 ^performed

at the beginning of the new round) is certified by the signed messages

whose reception has triggered the start of the new round. This set of signed messages constitute the certificate next cert i . This certificate is reset to the emp- ty set at the beginning of each round. When a new round

r ⁽ r > 1 ) starts (before resetting next cert i ), we say that

next cert _i is well-formed with respect to r

^;

1 if the fol- lowing two conditions are satisfied:

j

next cert _i

^j

= ( n

^;

F ) (otherwise process p _i ^{has mise-}

valuated the condition allowing to start a new round, line 14 in the text).

The value r

^;

1 is consistent with respect to the in- formation in the ( n

^;

F )

^NEXT

messages contained in

est cert _i (i.e., all messages refer to round r

^;

1 ). Otherwise process p _i has corrupted the value of r at the beginning of the round, line 11 in the text).

As far as round 1 is concerned, this value cannot be cer-

(8)

tified by

^;

(Otherwise ei- ther p i has corrupted r ^or c at the beginning of the round 1 (line 11) or has misevaluated the condition i = c ^{(line 12).}

Certifying sending events of messages.

Messages

CURRENT

.

At the beginning of a round r , the current coordi- nator p c proposes its estimate est vect c to become the decision value by broadcasting a

CURRENT

message car- rying this value (line 12). This message is certified by

est cert _c

^[

next cert _c ^. est cert _c is used to certify the val- ue proposed by the coordinator est vect _c ^{, that is,} est cert _c

must be well formed with respect to est vect _c ^. next cert _c

is used to certify the value of the current round r ^(i.e., next cert c must be well formed with respect to r

^;

1 ^).

When p i receives the first

CURRENT

valid message while in state q

⁰

(line 18), it relays a

CURRENT

mes- sage (line 19) by using the signed valid message

CURREN

-

T

just received as a certificate. This certificate contains

est cert c

^[

next cert c used to certify r ^and est vect c ^(as

above).

Messages

If, while it is in the initial state q

⁰

^, p i ^suspect-

s the current coordinator, it broadcasts a

²

. This message is certified by

est cert i

^[

current cert i

^[

next cert i . Those certificates ( current cert i ^and next cert i ) will be used by the non- muteness failure detection module of the receiver to decide whether p i has misevaluated or not the sending condition (at line 23). Moreover, as

When the predicate at line 28 becomes true, in order to avoid a deadlock, process p i broadcasts a

^[

next cert i ^. current cert i

and next cert i are used to certify the non-misevaluation of the predicate change mind .

When a process progresses from round r ^{to round} r +1

it issues a

These

messages are used to prevent other processes from remaining blocked in round r (line 31). This mes- sage is certified by next cert i which will allow a receiver to check the correct evaluation of the condition at line 14 by verifying if next cert i is well formed with respect to r ^.

Messages

DECIDE

. We have discussed above the use of cer- tificates current cert to authenticate the decision to send those messages.

Certifying other local variables. Apart from the local vari- ables r; est vect ^{, and} c (which depends directly on r ^{), cer-}

tified as explained above, the protocol manages other local

variables that are not transmitted in messages, but are nev- ertheless involved in some evaluations. These variables are

nb current ^, nb next ^, rec from ^and state . Hence, these local variables must be certified. A way to do this is to re- place them by expressions obtained from certificates. In the original protocol augmented with the certificates designed above, we will obtain:

nb current i ^(resp. nb next i ) can be replaced by us- ing the cardinality of the certificate current cert i ^(resp.

next cert i ^).

state i can assume three values ( q

⁰

;q

¹

;q

²

). Each state can be identified (when necessary) by using certificates in the following way:

- state i = q

⁰

^{: no}

^CURRENT

message has been re- ceived by p i ^and p i has not sent a

^j

current cert i

^j

= 0)

^{^} ^h^NEXT

( p i ;r i ) ;cert

ⁱ

i

⁶²

next cert i ^.

- state i = q

¹

^{: a}

^CURRENT

message has been received by

p i ^and p i has not sent a

^j

current cert i

^j

1)

^{^}^h^NEXT

( p i ;r i ) ;cert

ⁱ

i

⁶²

next cert i ^.

- state i = q

²

^: p i ^{has sent a}

^NEXT

^message:

hNEXT

( p i ;r i ) ;cert

ⁱ

i

²

next cert i ^.

rec from i ^can ^be replaced by the variable

REC FROM i using certificates in the following way:

REC FROM i

^f

p `

^jh^NEXT

( p ` ;r ` ) ;cert

ⁱ

`

²

next cert i

^_

h CURRENT

( p ` ;est vect ` ;r ` ) ;cert

ⁱ

`

²

current cert i

^g

^.

The predicate change mind . It follows that the predicate change mind

( state i = q

¹

)

^{^}

(

^j

rec from i

^j

> ( n

^;

F )) is now expressed in terms of certificates:

(

^j

current cert i

^j

1)

^{^} ^h^NEXT

( p i ;r i ) ;cert

ⁱ

i

⁶²

next cert i

^{^}^j

REC FROM i

^j

( n

^;

F ))

5.2 Consensus and Certification Module

The text of the protocol executed by these two mod- ules is presented in Figure 3. To facilitate the com- parison with the original protocol described Figure 2, the new parts are presented in gray . For better reading, as- signments performed by the reliable certification module (update of certificates) are displayed inside a box .

5.3 Non-Muteness Failure Detection Module

The non-muteness failure detection module of process p i

is composed of a set of finite state automata, one for each process. The module associated with p i is depicted in Fig- ure 4. It represents the view p i has, during the current round

r _i , on the behavior of p _k with respect to its automaton dur- ing the same round.

Automaton States. The automaton of process p _i ^related

to a process p _k is composed of 6 states. Three states are

(9)

functionconsensus(

v

ⁱ⁾

(1)

next cert

ⁱ ^; ^;

est cert

ⁱ ^; ^;

r

ⁱ

0

^;

cobegin

(2)^jjupon receipt of a properly signed and formed^hDECIDE

(p

^k

;est vect

^k

)

^,

cert

^k ⁱ^k

(3) send^hDECIDE

(p

ⁱ

;est vect

^k

)

^,

cert

^k ⁱⁱ^to

^{; return(}

est vect

^k⁾

(4)^jj for

k 1

^to

n

^do

est vect

ⁱ

[k] null

^endfor;

(5) send^hINIT

(p

ⁱ

;v

ⁱ

);

^;iⁱ^to

^;

(6) while ^j

est cert

^ij⁶

= (n

^;

F)

^do

(7) wait receipt of^hINIT

(p

^k

;v

^k

);cert

^kⁱ^k^;

(8)

est cert

ⁱ

est cert

ⁱ^[^h^INIT

(p

^k

;v

^k

);cert

^kⁱ^k ^;

est vect

ⁱ

[k] v

^k

(9) end while

(10) loop % on a sequence of asynchronous rounds % (11)

c (r

ⁱ

mod n) + 1

^;

r

ⁱ

r

ⁱ

+ 1

^;

(12) if (

i = c

) then send^hCURRENT

(p

ⁱ

;r

ⁱ

;est vect

ⁱ

)

^,

est cert

ⁱ^[

next cert

ⁱ ⁱⁱ^to

^{endif; %}

q

⁰^!

q

¹^for

i = c

^%

(13)

next cert

ⁱ ^; ^;

current cert

ⁱ ^; ^;

(14) while (^j

next cert

^ij

(n

^;

F)

^{) do}

(15) upon receipt of a properly signed and formed^hCURRENT

(p

^k

;r

ⁱ

;est vect

^k

)

^,

cert

^kⁱ^k

(16)

current cert

ⁱ

current cert

ⁱ^[^h^CURRENT

(p

^k

;r

ⁱ

;est vect

^k

);cert

^kⁱ^k ^;

(17) if

(

^j

current cert

ⁱ^j

= 1)

^then

est cert

ⁱ

cert

^k ^;

est vect

ⁱ

est vect

^k^endif;

(18) if

(

^j

current cert

ⁱ^j

= 1)

^{^}

(

^h^NEXT

(p

ⁱ

;r

ⁱ

);cert

ⁱⁱⁱ⁶²

next cert

ⁱ

)

^{^}

(i

⁶

= c)

^%

q

⁰^!

q

¹^for

i

⁶

= c

^%

(19) then send^hCURRENT

(p

ⁱ

;r

ⁱ

;est vect

ⁱ

)

^,

current cert

ⁱ ⁱⁱ^to

^endif;

(20) if (^j

current cert

ⁱ^j

= (n

^;

F)

^{) then}

(21) send^hDECIDE

(p

ⁱ

;est vect

ⁱ

)

^,

est cert

ⁱ ⁱⁱ^to

^{; return(}

est vect

ⁱ^{) endif}

(22) upon (

p

^c²

(suspected

ⁱ ^_

faulty

ⁱ

)

⁾

(23) if

((

^j

current cert

ⁱ^j

= 0)

^{^}^h^NEXT

(p

ⁱ

;r

ⁱ

);cert

ⁱⁱⁱ⁶²

next cert

ⁱ

)

(24) then send^hNEXT

(p

ⁱ

;r

ⁱ

)

^,

current cert

ⁱ^[

next cert

ⁱ^[

est cert

ⁱ ⁱⁱ^to

^%

q

⁰^!

q

²^%

(25) endif

(26) upon receipt of a properly signed and formed^hNEXT

(p

^k

;r

ⁱ

)

^,

cert

^k ⁱ^k

(27)

next cert

ⁱ

next cert

ⁱ^[^h^NEXT

(p

^k

;r

ⁱ

);cert

^kⁱ^k

(28) upon (change mind) %

q

¹^!

q2

^%

(29) send^hNEXT

(p

ⁱ

;r

ⁱ

)

^,

current cert

ⁱ^[

next cert

ⁱ ⁱⁱ^to

(30) endwhile

(31) if (^hNEXT

(p

ⁱ

;r

ⁱ

);cert

ⁱⁱⁱ⁶²

next cert

ⁱ) then send^hNEXT

(p

ⁱ

;r

ⁱ

)

^,

next cert

ⁱ ⁱⁱ^to

^{endif %}

q

⁰

=q

¹^!

q

²^%

(32) endloop coend

Figure 3. Merging of the Consensus and Certification Modules of Process p _i

(10)

q

¹

q

²

_

q

⁰

r r

⁺¹

_

start

_

:(

PF

^0;1(

current

^k)

:(

PF

^0;2(

PF

^1;2(

current

^k

current

^k^_

final current

^k^_

decide

^k

init

^k

decide

^k^_

init

^k

init

^k_^_

decide

^k ^:(

PF

^1;final⁽

decide

^k⁾⁾

PF

^1;final⁽

decide

^k⁾

:(

PF

^2;final(

decide

^k))^_

current

^k^_

PF

^1;2(

PF

^0;1(

current

^k⁾

PF

^0;2⁽

faulty PF

^2;final⁽

decide

^k⁾

Figure 4. Automaton of the p i ’s non-muteness failure detection module to monitor process p k ^.

related to a single round (as the ones described in Section 4: q

⁰

;q

¹

;q

²

), one is the initial state ( start ), one is the fi- nal state ( final ) and one is the state declaring p k ^{is faulty}

( faulty ^).

Automaton Transitions. A transition is triggered each time a message arrives form p k . More precisely, a condition is evaluated and, according to the result of this evaluation, the automaton moves from its current state a to another state

b . In particular, if the condition is evaluated to false, the automaton moves from the current state to the sate faulty ^.

The condition is composed of the following predicates.

The predicate type of message _k is true if the mes- sage of that type is not an Out-of-order message and is syn- tactically correct.

PF a;b ( type of message ) ^returns ^true ^if type of message k is true, and if the message is not a Wrong Expected Message. Thus, the set of predicates

PF implements the certificate analyzer (see section 3).

Due to space limitations, the detailed explanation of transitions (depicted on Figure 4) is omitted (see [2]).

References

[1] Attiya H. and Welch J., Distributed Computing, Mac Graw Hill, 1998.

[2] Baldoni R., Helary J.M., and Raynal M., From Crash Fault-Tolerance to Arbitrary Fault- Tolerance: Towards a Modular Approach, http://www.irisa.fr/EXTERNE/bibli/pi/1307/1307.html.

[3] Chandra T. and Toueg S., Unreliable Failure Detectors for Reliable Distributed Systems. JACM, 34(1):225–267, 1996.

[4] Dolev D., Friedman R., Keidar I. and Malkhi D., Failure De- tectors in Omission Failure Environments, Brief announce- ment in Proc. 16th ACM Symposium on PODC, p.286, 1997.

[5] Doudou A. and Schiper A., Muteness Failure Detectors for Consensus with Byzantine Processes, Brief announcement in Proc. 17th ACM Symposium on PODC, p. 315, 1998.

[6] Doudou A., Garbinato B., Guerraoui R. and Schiper A., Muteness Failure Detectors: Specification and Implemen- tation. Proc. Third EDCC, LNCS 1667, pp. 71-87, 1999.

[7] Fischer M.J., Lynch N. and Paterson M.S., Impossibility of Distributed Consensus with One Faulty Process. JACM, 32(2):374–382, April 1985.

[8] Hurfin M. and Raynal M., A Simple and Fast Asynchronous Consensus Protocol Based on a Weak Failure Detector, Dis- tributed Computing, 12(4):209-233, 1999.

[9] Kihlstrom K.P., Moser L.E. and Melliar-Smith P.M., Solving Consensus in a Byzantine Environment Using an Unreliable Fault Detector. OPODIS’97, Chantilly, pp. 61-76, 1997.

[10] Malkhi D. and Reiter M., Unreliable Intrusion Detection in Distributed Computations. In Proc. of the 10th IEEE CSFW, pp. 116–124, Rockport (MA), 1997.

[11] Pease L., Shostak R. and Lamport L., Reaching Agreement in Presence of Faults. JACM, 27(2):228-234, 1980.

[12] Powell D., Failure Mode Assumptions and Assumption Coverage, in Proc. of the 22nd Int. Symp. on Fault-Tolerant Computing (FTCS-22), Boston, MA, pp.386-95, 1992.

From Crash Fault-Tolerance to Arbitrary-Fault Tolerance: