Intrusion-Tolerant Reliable Broadcast

(1)

Intrusion-Tolerant Reliable Broadcast

Silvia Bonomi, Antonella Del Pozzo, Roberto Baldoni Sapienza Universit`a di Roma, Via Ariosto 25, 00185 Roma, Italy

{bonomi, delpozzo, baldoni}@dis.uniroma1.it

MIDLABTECHNICALREPORT2/13 2013

Abstract. We consider a system with n processes where some of them can show a byzantine behavior. A byzantine process can deviate arbitrarily from the protocol, e.g., dropping messages, changing the content of a message (different recipients may receive different content of the same broadcast message), delivering messages not sent by any process or creating fake messages. This paper introduces a new broadcast abstraction, namely Intrusion-tolerant Reliable Broadcast (IT-RB). IT-RB ensures that if a messageM(v) (where v is the content ofM) is broadcast by a correct process, all correct processes will deliver v. Consider now {v1, . . . , vk} (with 1 ≤ k < n) be the set of contents associated withMand sent to other processes by a byzantine one, IT-RB ensures: if the sender of the broadcastMis byzantine, then a subset of correct processes deliver the same content v ∈ {v1, . . . , vk} while the rest of correct processes deliver ⊥.

We first provide a protocol P implementing IT-RB and prove its correctness. P can deliver an infinite number of ⊥ in an infinite run. Secondly, we introduce an oracle detecting byzantine processes running P, namely P-LO, and provide an implementation of P-LO. Such oracle exploits the information collected by a process running P to infer bad process behaviors and to remove them from the computation. Considering an infinite run generated by P^∗, i.e. a protocol that runs together P plus P-LO, we finally show that the number of ⊥ delivered by correct processes is finite.

Keywords: Intrusion-tolerant Reliable Broadcast, Byzantine Detector Oracle, Byzantine failures, Intruders, Synchronous Systems.

1 Introduction

Dependable abstractions are a pillar of many complex modern software systems (from avionics to cloud computing environments) and byzantine-fault-tolerance (BFT) is one of the main techniques employed to ensure both safety and liveness properties that have to be guaranteed despite any type of failures, including malicious ones. More recently, special attention has been spent in designing intrusion-tolerant (IT) systems, i.e., systems tolerating the presence of a particular type of byzantine behaviours, called intruders, coming from the outside of the computation and governed by an entity that intentionally attacks the system with the aim of compromising its security properties (e.g. data integrity) [24]. In this paper, we propose a basic distributed communication primitive, namely Intrusion-tolerant Reliable Broadcast (IT-RB), allowing a set of n processes to reliably communicate also in the presence of f intruders, i.e. byzantine processes that deviate from the protocol by dropping messages, changing the content of a message (different recipients may receive different content of the same broadcast message), delivering messages not sent by any process or creating fake messages, with the aim of breaking the agreement on the delivered value. IT-RB ensures that if a messageM(v) (where v is the content ofM) is broadcast by a correct process, all correct processes will deliver v. Consider now {v₁, . . . , v_k} (with 1 ≤ k < n) be the set of contents associated withMand sent to other processes by an intruder, IT-RB ensures that a subset of correct processes deliver the same content v ∈ {v1, . . . , vk} while the rest of correct processes deliver ⊥. In the following, we first provide a protocol P, implementing IT-RB, that can deliver an infinite number of ⊥ in an infinite run and we prove its correctness. Secondly, we introduce an oracle detecting byzantine processes running P, namely P-LO, and provide an implementation for it.

Such oracle exploits the information collected by a process running P to detect intruders and to exclude them from the computation. Finally, we show that combining P and P-LO in a new IT-RB protocol (called P^∗), the number of ⊥ delivered by correct processes in an infinite run becomes finite.

(2)

The rest of the paper is organized as follows: Section 2 discusses related works, Section 3 introduces the system model and Section 4 presents the specification and a preliminary implementation of the Intrusion-Tolerant Reliable Broadcast primitive. Section 5 presents a second protocol implementing IT- RB and using a local oracle to detect byzantine processes. Finally, Section 6 concludes the paper.

2 Related Work

Byzantine Reliable Broadcast Implementations. A Reliable Broadcast implementation in the presence of byzantine processes can be easily obtained by running a byzantine consensus [15] protocol. In [15], Lamport et al. show that, in a distributed system with f byzantine processes, it is not possible to reach agreement if n ≤ 3f and they proposed a first protocol taking f + 1 rounds to converge with an exponential cost. Fischer and Lynch [11] also showed that f + 1 rounds were necessary in the worst- case run of any byzantine agreement protocol, while Fischer, Lynch e Merritt [12] shown that it is not possible to reach byzantine agreement if the communication graphs has connectivity smaller than 2f + 1 (i.e. each process must be able to communicate with at least 2f + 1 other processes). Following the pillar result in [15], many effort has been done to reduce the number of rounds needed to reach the agreement (e.g. [9], [8], [10], [4], [3], [20], [5], [6] and [13]) and the number of rounds required to reach strong agreement is linear with the number of faulty processes f . Differently, the agreement considered in our IT-RB primitive is weaker than the one obtained trough consensus and this allows us to provide a solution working in a fixed number of steps, regardless the number of byzantine processes. This happens at the cost of delivering some ⊥ values meaning that, during the broadcast execution, something bad happened and leaving to the application the decision about the management of the consistency (e.g. requiring a retransmission).

More recently, in [16] Liang and Vaidya provided an algorithm to solve byzantine agreement that reduces the size of messages exchanged and introduce a weak form of detection, i.e. they use a mechanism that compares messages and if a process detects a mismatch, it triggers a consensus instance to exclude the suspected process from the computation. This approach is similar to the one proposed in our paper but they delegate the decision about the detection and a faulty process exclusion to a consensus protocol resulting thus expensive (every detection requires generally f + 1 rounds). On the contrary, the oracle proposed in this paper, works locally leaving to the protocol P^∗ the management of a faulty process removal that happens in two communication steps.

In [21], Most´efaoui and Raynal consider a validated broadcast primitive, i.e. a communication primitive that delivers a message m if it has been validated by a sufficient large number of processes, otherwise it delivers a default value (i.e. ⊥). Such primitive is similar to the one proposed in this paper since it considers the possibility to deliver a default value but it differs for three main reasons: (i) it realize a n-to-n communication scheme, requiring to any process to broadcast a value, (ii) it implements, in some sense, a strong agreement between any correct process (i.e. if a correct process delivers ⊥, then all the correct processes deliver ⊥) and (iii) it relays on a lower level reliable broadcast primitive. In contrast to the strong agreement considered in the byzantine consensus, there exist other works, like [2], [18], proposing a reliable broadcast with a weaker requirement on the agreement, i.e. only in case of a correct sender, correct processes are required to agree on the value to be delivered. Let us note that the broadcast specification proposed in this paper is stronger than the one in [18].

Byzantine Failure Detectors. Byzantine detector oracles are extremely complex to be implemented and they are strongly coupled to the application they monitor. This conjecture has been formalized by Malkhi and Reiter in [17] where they tried to apply the methodology used in the design of classical failure detectors by defining the abstraction of a byzantine failure detector. Such oracle has the aim to trace messages and to suspect a byzantine process when it interferes in the progress of the computation (i.e. in case of omission or non-well-formed messages). Kihlstrom et al. [14] builds an oracle that suspects faults as soon as a process omits a message or sends messages that are not expected. Similarly to these oracles,

(3)

the byzantine detector introduced in this paper exploits the message exchanges occurred at the broadcast level. Differently from [17], our oracle checks also the content of the messages exchanged to detect additional bad behaviors. Moses and Tuttle [19] shown that faulty messages do not identify the set of faulty processes but they rather place constrains on their identities. So, given a labeled communication graph of that constrains, identify the faulty processes can be translated in to computing the size of minimal vertex cover of the graph (which is know to be a NP-complete problem). Finally, Baldoni et al.

[1] present a general methodology to generate a byzantine failure detector for any given protocol without modifying it.

3 System Model

The distributed system is composed of a set of n processes Π = {p1, p2, . . . pn} each one with a unique identifier. Processes can communicate only by exchanging messages through point-to-point channels.

We assume that processes are arranged in a fully connected topology, i.e. for any pair of process pi, p_j ∈ Π, there exists a bi-directional channel connecting them.

Communication Model. Channels are reliable, that is messages are not created, duplicated or dropped by the channel. Channels are also timely, i.e. there exists an upper bound δ on the time taken by a message m to be delivered to its destination. More formally, channels satisfy the following property:

– Point-to-point Timely Delivery: there exists an integer δ, known by processes, such that if p_isends a message m to pjat time t, then pj receives m by time t + δ.

As in [18], we assume that channels are authenticated (or “oral” model) , i.e. when a process pj

receives a message m from a process pi, then pjknows that m has been generated by pi.

Failure Model. Processes of the distributed system are partitioned into two disjoint sub-sets: correct processes and malicious processes (also called byzantine or attacker). Correct processes behave according to the protocol executed in the distributed system (and discussed in the following Section) while malicious processes deviate arbitrarily from the protocol i.e., by dropping messages (omission failures), changing the content of a message, creating spurious messages etc... We assume that up to f processes may be malicious (with f < n/4).

4 Intrusion-tolerant Reliable Broadcast

In the following, we first provide the specification of our Intrusion-tolerant Reliable Broadcast (IT-RB) primitive and then we will propose a protocol implementing the primitive. Processes may broadcast values trough the IT-RB primitive by invoking the IT − RBcast(v), allowing them to send a value v to all the other processes that will deliver the value trough the IT − RBDeliver(v) notification. Informally, IT-RB primitive guarantees that if a correct process sends a value trough the IT − RBcast(v) interface, then v will be delivered by all the correct processes. In addition, it also guarantees that if a byzantine process invokes the IT − RBcast() interface and a correct process delivers a value v coming from it, then also other correct processes will eventually deliver a v or ⊥. More formally:

– Termination: if a process p_i broadcasts a messageM(v) with some content v and a correct process p_jdelivers a non-⊥ value from pi, then any other correct process eventually delivers a value (either

⊥) by pi.

– No-Creation: if a correct process p_idelivers a value v (with v 6= ⊥), then v was previously broadcast from a process pj.

– Validity: if a correct process p_ibroadcasts a message M(v) with content v, then any correct process will eventually deliver v.

(4)

M(v)

M(v) M(v)

M(v)

BRBcast()

BRBDeliver (v) BRBDeliver (v) BRBDeliver (v) p1

pi

pj

pn

M'(v3) M'(v2)

M'(v1) BRBcast()

BRBDeliver (v4)

BRBDeliver(v2)

BRBDeliver (v2)

BRBDeliver (⊥)

t

Fig. 1. Example of a IT − RBcast() run.

– Weak-Agreement: if two correct processes pi, pj deliver respectively viand vj then either (i) vi= v_j or (ii) at least one of the two is ⊥.

As an example, Figure 1 shows a possible execution of the IT-RB primitive. Let us notice that the IT-RB specification is stronger than the reliable broadcast considered in [18], where agreement on values is guaranteed only if the sender is correct, but it is weaker than consensus [15]. Indeed, IT-RB is able to provide a weak form of agreement among correct processes that, in the worse case, are partitioned into two disjoint sets: members belonging to the first set will deliver the same value v selected among those sent out by the byzantine sender (e.g. considering the previous example p_iand p_j) while members of the second set will deliver a default message, i.e. ⊥ (e.g. pnin the previous example).

4.1 The Intrusion-tolerant Reliable Broadcast Protocol P

Figure 2 shows the protocol P implementing the IT-RB primitive. P is divided in 3 steps and it is similar to the solution provided by Pease et al. in [22]. At each step, values are re-transmitted to all the processes and following such message propagation flow, each process builds its local “computation knowledge”

according which it can select the value to deliver.

To simplify the notation, in the following, we will describe the protocol for a single broadcast event.

If more that one broadcast happens concurrently, pireplicates the data structure for each broadcast event.

For each IT − RBcast() invocation, each process creates and maintains the following data structures:

– value_i: stores the message to be delivered;

– vectori[]: is an array variable having an entry for each process currently in the system. At the j- th position, pi stores its knowledge about what pj received from the sender (i.e. the value that pj

received from the sender and forwarded to pi).

– matrixi[][]: is a matrix n × n, where n is the number of processes of the system. In the position (k, j), pistores its knowledge about what pjsent to p_k(as declared by p_k).

When a process p_i invokes the IT − RBcast() primitive, it sends the value v to all other processes (line 01). The IT − RBcast() protocol starts at each correct process pi when it delivers a IT- RBCAST(v, src) message, containing the value v and the identifier of the sender src.

– Step 1: value forwarding (lines 02 - 04). On delivering the IT-RBCAST(v, src) message, each correct process pisets a flag valid to remember that a value has been received trough a IT-RBCAST

message and to avoid to consider spurious values sent out by byzantine processes to create noise.

Then, p_i forwards the value, together with the source identifier and its one, to all other processes by sending aVALUE FORWARD(v, src, i) message. In the following, we will say that p_iis acting as forwarder(or simply piis a forwarder) to refer piin this action.

(5)

upon eventIT-RBcast(v):

(01) ∀j ∈ [1, n] send IT-RBCAST(v, i) to pj;

—————————————————————————————————————————————————- when IT-RBCAST(v, src) is delivered to pi:

(02) ∀j ∈ [1, n] sendVALUE FORWARD(v, src, i) to pj; (03) valid ← true; waitingi← true;

(04) set(timer P 2i, 2δ);

—————————————————————————————————————————————————- whenVALUE FORWARD(v, src, j) is delivered to pi:

(05) if ((vectori[j] =< >, src >) ∨ (vectori[j] = ∅)) then vectori[j] ← {< v, src >};

(06) if (¬ valid) then ∀j ∈ [1, n] sendVALUE FORWARD(>, src, i) to pj;

(07) valid ← true;

(08) endif

(09) if (¬ waitingi) then waitingi← true;

(11) endif

—————————————————————————————————————————————————- when timer P 2i= 0:

(12) ∀j ∈ [1, n] sendVECTOR FORWARD(vectori, i) to pj; (13) set(timer P 3i, 2δ);

—————————————————————————————————————————————————- whenVECTOR FORWARD(vec, k) is delivered to pi:

(14) if (valid) then matrixi[k][] ← vec;

(15) if (¬ waitingi) then waitingi← true;

(17) endif

(18) else ∀j, matrixi[k][j] ← >;

(19) endif

—————————————————————————————————————————————————- when timer P 3i= 0:

(20) valuei← select value with frequency(vector_i, 2f + 1);

(21) if (valuei6= >) then let C = {j ∈ [1, n]| select value with frequency(matrix_i[][j], n − f − 1) 6= valuei};

(22) if (|C| > f ) then valuei← ⊥;

(23) trigger IT − RBDeliver(valuei);

(24) endif

(25) validi← false; waiting_i← false;

Fig. 2. Protocol P implementing IT − RBcast primitive (code for process pi).

– Step 2: preliminary value validation (lines 05 - 13). When a process p_idelivers aVALUE FORWARD

(v, src, j) message from a process pj, it stores the pair < v, src > in the j-th entry of vectori

(meaning that pinow knows that pjsaw a value v originally sent by src) if such value is the first one forwarded by p_j or if p_j is updating a > value previously sent due to concurrency among messages sent respectively by the sender and by a faulty forwarder. In addition, pichecks if the validiflag is set to true, meaning that it has already received the original broadcast message sent by the sender. On the contrary, v could be a spurious value created by p_j; thus, p_isends aVALUE FORWARD(>, src, j) message to ensure that all the other correct processes start executing the protocol. Note that, in the latter case, pisends a special value > meaning that it has received a spurious message. In any case, pi

remains waiting for 2δ time to be sure that every correct process start the execution of the protocol.

Finally, when unblocked by the wait statement, pisends aVECTOR FORWARD(vector_i, i) message, containing the vectorivariable filled it with all the values it received as witnesses. In the following, we will say that p_iis acting as vector sender (or simply p_iis a vector sender) to refer p_iin this action.

– Step 3: value selection and delivery (lines 14 - 25). On receiving a VECTOR FORWARD(vk[], k) message, p_i checks if the valid flag is set to true (meaning that the broadcast protocol is running) and if so, it updates its matrixivariable by copying in the k-th row the vector vk[] received from p_k. In addition, on receiving the first “valid” vector pialso sets a timer that puts it in the waiting state.

Upon time-out, p_i selects a reference value, trough the select value with frequency() function, and stores it in the valuei variable. The select value with frequency(set, #occ) takes as parameters a set and an integer number representing the minimum required frequency for the value to be selected.

(6)

If all the values occur less than #occ times, then the function returns the default value ⊥.

In particular, in line 20, piwants to select the value in vectori occurring at least 2f + 1 times if it exists. Once a reference value is selected, it is used to decide the value to be delivered by comparing it with those stored in the matrix. In particular, if the reference value is >, nothing happen and no value will be delivered since the current broadcast has been generated by a spurious message.

Conversely, the reference value will be delivered if in the matrix of pi there exist less than f + 1 columns having a value, different from the reference one, occurring too often (i.e. occurs more than n − f − 1 time in a column) (cfr. line 21-22). On the contrary, p_iwill deliver ⊥ because the matrix contains too “corrupted” values and it is not able to decide a value in agreement with other processes.

4.2 Correctness Proofs of Protocol P

In the following, we will show that the protocol P in Figure 2 satisfies IT-RB properties.

Lemma 1 (Termination). Let n > 4f . If a process pi broadcasts a message M(v) with some content v and a correct process p_j delivers a non-⊥ value from p_i, then any other correct process eventually delivers a value (either⊥) by p_i

Proof When pj delivers a value from pithere are two cases: (i) piis correct or (ii) piis byzantine.

Case (i): p_iis correct. This case is trivial, since the sender is correct and it will send a IT-RBCAST(v, i) message to any process. Such message will activate the chain of messages IT-RBCAST(v, i) →VALUE FORWARD (v, id, i) → VECTOR FORWARD(vectori, i) that will bring any correct process to select a value and to trigger a IT − RBDeliver event.

Case (ii): p_i is byzantine. In this case, p_i is not forced to deliver any value since it is faulty. In the following, we will show that in any case, any other correct process pkwill deliver a value.

Let us suppose that p_jdelivers v and another correct process p_kdoes not deliver a value (i.e. it will never execute line 23). Then there exists two cases:

(a) the timer timer P 3knever reaches 0 (line 20).

(b) in vector_kthere are more than 2f values equal to > (line 21).

Case (ii.a):Since p_j is correct and it delivered a message, it means it executed line 23, Figure 2.

This means that pj received a VECTOR FORWARD(vectori, i) message by some other process and its timer P 3jhas been started. Such timer is started as soon as pjdelivers the first validVECTOR FORWARD

(vec, id) message and this happens only after that p_jsent aVALUE FORWARD(v, id, i) to any other process. As in the previous case, this triggers the the chain of messages VALUE FORWARD(v, id, i) →

VECTOR FORWARD(vectori, i) that will bring any correct process to select a value and to trigger a IT − RBDeliver event.

Case (ii.b): If in vectorkthere are more than 2f values equal to > and in vectorj more than 2f values equals to v then between the two vectors there are at least 2f + 1 different values. Since there are at most f faulty forwarders, the remaining different values have to be due to sender actions, which impacts along the whole related columns. This means that pjwill see at least f + 1 columns different from v in which > occurs at least n − f − 1 times, forcing pito deliver ⊥ (line 22), which is a contradiction.

2Lemma 1

(7)

Lemma 2 (No-creation). Let n > 4f . If a correct process p_i delivers a valuev (with v 6= ⊥), then v was previously broadcast from a processp_j.

Proof If pi delivers v then it means that v occurred at least 2f + 1 times in vectori. Considering that vector_iis initially empty and it is filled in upon the delivery of aVALUE FORWARD(v, id, i) message, it means that at least 2f + 1 processes previously sent such message. Considering that: (i) a correct process sends aVALUE FORWARD(v, id, i) message only when it receives a IT-RBCAST(v, i) (line 02) and (ii) byzantine processes are at most f , it means that there exists at least f + 1 correct processes that received a message with content v from the sender and forwarded it.

2Lemma 2

Lemma 3 (Validity). Let n > 4f . If a correct process broadcasts a messageM(v) with content v, then any correct process will eventually deliverv.

Proof Let us suppose by contradiction that there exists a process pidelivering a value v⁰different from the value v sent by the correct sender. Two cases can happen: (i) v⁰ occurs in vector_imore than 2f + 1 times (cfr. line 20) or (ii) there exist in matriximore than f columns each containing at least n − f − 1 v⁰(cfr. lines 21 -22).

Case (i). Since byzantine are at most f , they can only change at most f entries of vectori. Considering that vector_i has n entries, n > 4f , and any correct process will deliver and forward the value v sent by the sender, we have that the most frequent value in vectoriwill be v occurring at least 3f times and we have a contradiction.

Case (ii). As in the previous case since byzantine are at most f , they can only affect at most f columns of matrixi. Considering that matrixi has n column, n > 4f , and any correct process will deliver and forward values without changing their content, we have that at most f columns may contain values different from v and in any of the remaining n − f columns, at most f values may be altered. As a consequence, the condition in lines 21 - 22 will be always false and we have a contradiction.

2Lemma 3

In order to prove the Weak − Agreement property of the IT-RB specification, let us first prove two intermediate results.

Lemma 4. Let vectorithe vector obtained by a correct processp_ibefore executing line 12. Letv_ref be the most frequent value occurring at least2f + 1 times in vectori. Ifvref occurs less thann − f − 1 times then the broadcast sender is byzantine.

Proof Let us suppose by contradiction that v_ref occurs less than n − f − 1 times then the sender is correct. If vref occurs less than n − f − 1 in vectori we have at least f + 1 values different from it.

Since the sender is correct then these values are due to the forwarders actions, but them are at most f ,

and we have a contradiction. 2Lemma 4

Lemma 5. Let vectori the vector obtained by a correct processpi before executing line 12. Letvref

be the most frequent value occurring at least2f + 1 times in vector_i. Letx be the number of values v different fromv_ref invector_i. Ifx > f then, for any correct process p_k, there exits at leastx − (f − 1) columns inmatrixkin which the most frequent valuev⁰is different fromvrefand occurs at leastn−f −1 times.

Proof If x > f , due to Lemma 4, we have that the sender is byzantine. Since it is faulty then there are at most f − 1 faulty forwarders which could have sent values v 6= vref. Since x > f and forwarders are

(8)

at most f − 1 we have that at least the remaining x − (f − 1)value are due to sender actions. So this remaining values will impact in all the matrices at least along x − (f − 1) columns. 2Lemma 5

Lemma 6 (Weak-agreement). Let n > 4f . If two correct processes pi,p_j deliver respectivelyv_iand vj then (i)viarevjare the same or (ii) at least one of the two is⊥.

Proof Let us suppose by contradiction that p_i and pj deliver respectively vi and vj, that vi 6= v_j and that none of them is ⊥.

Since pidelivered vi 6= ⊥, it means that vi is the value occurring more often in vectoriand it occurs at least 2f + 1 times (line 20 Figure 2) and matrixi has less than f + 1 columns with a value different from vi(lines 21-22 Figure 2 ). Thus, vectoristored by piwill appear as:

p1, p2. . . pi. . . p2f +1 p2f +2. . . pn−1

vectori v_i v_j

The same reasoning applies to vj and vectorj will look like:

p1, p2. . . p_2f p_{2f +1}. . . pj. . . pn−1

vectorj v_i v_j

Considering that n > 4f , there exist x entries in vector_i containing values different from v_i that could have been changed by byzantine processes (with x ≤ 2f ). In the worst case, such x entries will contain the same value vj. But also in vectorj there exists y entries containing values different from vj

that could have been changed by byzantine processes. Two cases can happen: (i) x ≤ f of (ii) x > f . Case (i): x ≤ f . In this case, we have that vioccurs in vectoriat least n − 1 − x times. Since byzantine processes are at most f , it means that at least (n − 1 − x) − f correct processes see vi in step 1 and forwarded v_i to all the others (including p_j) in step 2. As a consequence, vector_j must have at least (n − 1 − x) − f occurrences of v_i. Considering that (i) the occurrences of viin vectorj(n − 1 − x) − f is less than n − 2f and (ii) n > 4f , we have that vi is the most frequent values in vectorj and it occurs at least 2f . Thus, when selecting the value in line 20, p_j will select v_iand we have an absurd.

Case (ii): x > f . Also in this case, vioccurs in vectoriat least n − 1 − x times and since viis selected by pi as reference value, it means that it occurs in vectori at least 2f times (i.e. n − 1 − x ≥ 2f ). As a consequence, x ≤ n − 2f − 1 and considering that n > 4f , we get f < x ≤ 2f . Therefore, due to Lemma 5, any correct process will see in its matrix at least x−f +1 columns containing a value different from vi and occurring at lest n − f − 1 times. Thus, the matrix of a any correct process (including pj) will have at least f + 1 columns with values different from v_i. As a consequence, when executing line 21, pjwill change its valuej to ⊥ and we have an absurd.

2Lemma 6

Theorem 1. If n > 4f then the algorithm P shown in Figure 2 implements an IT-RB primitive.

Proof The theorem follows from Lemma 1, Lemma 2, Lemma 3 and Lemma 6. 2T heorem 1

(9)

5 Intrusion-tolerant Reliable Broadcast with the P-LO Oracle

Let us note that, considering an infinite run the protocol presented in Figure 2 may allow the delivery of

⊥ values infinitely often. This is actually due to the fact that protocol P does not detect faulty processes but it is only able to understand that a deviation occurred at some process. Thus, every time that a correct process detects something wrong in the processing of the current broadcast and it is not sure to select a value in agreement with others, it delivers ⊥ in order to assure safety. However, by analyzing the computation knowledge stored in the matrix obtained from the IT-RB protocol execution, each process can infer most of the deviations done by faulty processes and it can detect them with the immediate advantage of an improvement of IT-RBcast protocol performance.

In the following, we will show how it is possible to extend the protocol P presented in Figure 2 with a Byzantine Detector oracle, namely P − LO, obtained by leveraging the information obtained while running P. The new resulting IT-RB protocol, namely P^∗, will gain in performance by limiting the number of ⊥ values that will be delivered from correct processes. In particular, we first provide the specification of the P − LO oracle, then we will show how to extend P to get P^∗ enhancing its performance and finally we will show a protocol implementing P − LO oracle by exploiting the knowledge stored in the matrixivariable filled in during the broadcast execution.

Byzantine Detector Oracle Specification. The byzantine detector P − LO is an oracle, running locally at each process pi, and it notifies the identifiers of detected faulty processes through the detect(p) interfaces. The oracle is defined by the following properties:

– Completeness: If a process p_i is byzantine, it will at least suspected by some correct process pj. – Accuracy: If a process piis detected by some process, then piis byzantine.

5.1 A protocol P^∗Enhancing the IT-RBcast Protocol trough a Byzantine Detector P-LO The protocol P^∗is obtained by adding the instructions of Figure 3 to the protocol P presented in Figure 2 (i.e. handling of a malicious process detection).

The basic idea is to add one further phase to the IT-RBcast protocol where each correct process dissem- inates the set of detected ones. To this aim, each correct process pikeeps additional local variables: (i) correctistoring the set of processes that pibelieves correct; (ii) f_maxⁱ that is an integer value representing the current upper bound to the number of faulty processes; (iii) vote_i[] is an array storing at the j-th position the number of processes that voted pj as faulty; (iv) pendeing removali is a set storing the identifiers of processes detected that still need to be removed from the list of correct ones.

When the P-LO oracle notifies the detection of a faulty process p_j, p_i sends an ALERT(j, i) message to all the processes it sees as corrects (line 02). When a correct process pi delivers an ALERT(j, src) from a process pj, it adds the vote of src to votei[j] and then checks how many other processes detected the failure of p_j. As soon as f_maxⁱ + 1 processes detected the fault, p_i can trust p_j as effectively faulty and can send an alert too (line 04). A necessary condition for the removal of pj from the list of correct processes is that n − f_maxⁱ − 1 processes triggered an alert for it. When this happens, pichecks if there is a broadcast running and if so, p_j is inserted in the pending removal_ito be removed after the value delivery (line 05). Note that, in this case the removal of pjcan cause a change of the value to be delivered, so to avoid loss of the agreement, pi compares the value supposed to be delivered with the one that it would obtain if p_j is removed; if they are the same, p_iproceeds delivering the value, otherwise it deliver

⊥ (lines 11 - 15). If no broadcast is running, p_jcan be immediately removed and the upper bound on the number of faulty processes will be decreased (line 06).Such removal will impact all the data structures, including those maintained by the oracle.

Note that this approach has the immediate advantage to let the system adaptive to failures, in the sense that processes are able to reconfigure them self as soon as “enough” processes detected an intrusion and

(10)

Init

(01) correcti← {p₁, p2. . . pn}; f_maxⁱ ← f ; ∀j : votei[j] ← ∅;

—————————————————————————————————————————————————- upon event detect(j)

(02) ∀pk∈ correct_isendALERT(j, i) to pk;

—————————————————————————————————————————————————- whenALERT(j, src) is delivered to pi:

(03) votei[j] ← votei[j] ∪ {src};

—————————————————————————————————————————————————- when ∃ j such that |votei[j]| > f_maxⁱ :

(04) ∀pk∈ correct_isendALERT(j, i) to pk;

—————————————————————————————————————————————————- when ∃ pj∈ correct_isuch that |votei[j]| ≥ n − f_maxⁱ − 1:

(05) if (timer P 316= 0) then pending removal_i← pending removal_i∪ {p_j};

(06) else correcti← correcti\ {pj}; fmaxⁱ ← f_maxⁱ − 1;

(07) endIf

—————————————————————————————————————————————————- when timer P 3i= 0:

(08) valuei← select value with frequency(vectori, 2f_maxⁱ + 1);

(09) if (valuei6= >) then let C = {j ∈ [1, n]| select value with frequency(matrix_i[][j], n − f_maxⁱ − 1) 6= value_i};

(10) if (|C| > f_maxⁱ ) then valuei← ⊥;

(11) if ((valuei6= ⊥) ∧ (pending removali6= ∅))

(12) then clean matrixi← remove faulty(pending removal_i, matrixi);

(13) v_ref⁰ ← extract value(clean matrix_i);

(14) if (v⁰_ref6= value_i) then valuei← ⊥;

(15) endif

(16) trigger IT − RBDeliver(valuei);

(17) endif

(18) validi← false; waiting_i← false;

(19) correcti← correcti\ pending removali; (20) f_maxⁱ ← f_maxⁱ − |pending removal_i|;

(21) pending removali← ∅;

Fig. 3. Protocol P^∗implementing IT − RBcast primitive (code for process pi).

the intruder will be automatically removed from the computation. In addition, the proposed mechanism prevents malicious processes from taking certain types of actions since their discovery bring them out from the system.

5.2 The P − LO Byzantine Detector Protocol

The basic idea of the protocol is to exploit the knowledge acquired by a process during message- exchanges occurred while running P (i.e. matrixi) to infer malicious actions. Let us note that, in P a byzantine process may omit to send (or to forward) a value, change the content of messages (i.e.

change the value), create fake messages/values etc... All this incorrect actions are reflected in the matrix structure, depending both on the specific action taken by the faulty process (omission, creation or value alteration) and the step at which the action is taken (i.e. step 1, step 2 or step 3).

As an example, let us consider the scenario depicted in Figure 4, where p1, p3and p5are byzantine processes and p1issues the IT − RBcast primitive. In particular, in the first step p1sends the value a to the correct processes p₂and p₄while it sends the value v to all the others. During step 2, every correct process will forward the received value, while a byzantine process may change the value it is forwarding (e.g. process p3received v and then forwards a to p10and b to p9or p5received v and then forward a to p₂). Collecting the forwarded values, each process p_iwill build its vector_i variable that in turns will be propagated during step 3 to build the matrix. In Figure 5, we show the matrix obtained by p4 at the end of the broadcast protocol execution.

Let us notice that, just by looking to the matrix, p₄is immediately able to understand that something went wrong in the broadcast protocol and that at least one byzantine process acted. Once matrixiis available, pi builds its suspect graph SGi(or enrich SGiif it already exists) with the information collected during

(11)

5 1

7

2

3

8 6

9

4 10

11 12 13 14

a

v v a v

5 1

7

2

3

8 6

9 4

10

11 12 13 14

a

v

a

v

a b a

Step 2 Step 1

v v

Fig. 4. Example of Byzantine actions in the IT − RBcast protocol.

p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14

p1

p2 a b a a v v v v v v v v v

p3 a b a v v v v v v v v v v

p10 a a a v v v v v v v v v v

Fig. 5. Matrix of p4corresponding to the IT − RBcast message pattern generated by p1.

the current broadcast. Such graph represents previous detections and suspects of processes and it will be used for further detection. Every time that the suspect graph changes, piwill check its topology to detect other faulty processes.

The P − LO protocol implementing the byzantine detector is shown in Figure 6 and it uses the following local variable:

– suspect_i: is a set storing all the process identifiers, correct or not, suspected to be faulty;

– detecti: is a set storing all the identifiers of detected processes;

– accusei: is a set of pairs < pi, pj > representing a suspect interaction between piand pj, following by the matrix analysis.

The algorithm is triggered by every correct process pias soon as a matrix, following from a broadcast message pattern, is built and contains at least two different values (meaning that at least one byzantine process acted) and it works as follows:

– pi checks if in the matrix there exists some row k containing only > values (line 03); in this case p_k is detected as faulty because it sent out a spurious vector not preceded by any real broadcast invocations (line 04). The matrix is then cleaned by changing > values into ⊥ values (line 05).

– piselects the reference value v_refⁱ by looking to i-th row of matrixiand checks if it is a real value or the default value ⊥. If the reference value is not equal to ⊥, p_iscans the matrix in the following order: (1) diagonal scan, (2) columns scan and finally (3) rows scan (lines 08 - 10). On the contrary, pifirst checks along the diagonal if there are enough different values to detect the sender (line 11)

(12)

upon Init

(01) suspecti← ∅; detecti← ∅; accusei← ∅; SGi(Vi, Ei) = empty graph;

——————————————————————————————————————

when (matrixi) is available

(02) if (∃ v1, v2∈ matrix | v16= v2) then (03) if (∃ k | matrix[k][j] = >, ∀j)

(04) then ∀ k such that ∀j matrix[k][j] = >, detecti← detect_i∪ {k};

(05) ∀ k, j ∈ [1, n]: if (matrixi[k][j] = >) then matrixi[k][j] ← ⊥;

(06) else vⁱ_ref← most frequent(matrix[i][]);

(07) if (vⁱ_ref6= ⊥)

(08) then trigger diagonalScan(matrixi, vⁱ_ref);

(09) for each j ∈ [1, n], with j 6= sender do trigger columnScan(matrixi, vⁱ_ref, j);

(10) for each k ∈ [1, n], with k 6= sender do trigger rowScan(matrixi, vⁱ_ref, k);

(11) else if (∃ v1, v2. . . vf +1|(matrix_i[k][k] = vi) ∧ (vi6= ⊥)) then detect_i← detect_i∪ {sender};

(12) for each j ∈ [1, n], with j 6= sender do (13) msg ← most frequent(matrix[][j]);

(14) trigger coloumsScan(matrixi, msg, j);

(15) endFor

(16) endif

(17) for each (< pj, pk>∈ accusei: pj∈ detect_i∨ p_k∈ detect_i) do accusei← accuse_i\ {< p_j, pk>};

(18) SG_i← create suspect graph((detect_i∪ suspect_i), accusei);

(19) find Byzantine(SGi);

(20) endif

(21) for each p ∈ detectido trigger detect(p);

(22) endif

Fig. 6. P-LO Byzantine Detector Protocol (code for process pi).

and then, in any case, it will perform a columns scan, selecting for each column the reference value, i.e., the most frequent value (lines 12 - 15).

– trough the execution of the scan functions, p_ipopulates the detect_i, suspect_iand accuse_ivariables according with it will create its suspect graph SGi. In particular, SGi = (Vi, Ei) where the set of vertexes Viis the union of suspected and detected processes identified while scanning a matrix and the set of edges E_i represents the suspect interactions detected by the scan functions.

– Once the suspect graph SG_i is built, p_iwill execute the find Byzantine(SG_i) function that, looking to its topology, will try to identify faulty processes.

– finally, p_iwill trigger to the application every detected failure.

Scan Operations. These scans are used to generate SGiby identifying clearly faulty processes and suspect interactions. In the following, we describe the three scan functions according to the order followed by processes in the matrix analysis.

5.3 Diagonal scan

The diagonalScan(matrix, v) function, shown in Figure 7, takes as parameter the matrix built while running the protocol P and a reference value v.

functiondiagonalScan(matrix, v):

(01) if (matrix[i][i] 6= v) then detecti← detecti∪ {sender};

(02) else let Mdsuch that {m|matrix[j][j] 6= v, ∀j ∈ [1, n], j 6= sender};

(03) ∀ m_i∈ M_d, fmi← frequency(m_i, Md);

(04) if (P

(m_i∈M_d)fm_i> f ) then detecti← detect_i∪ {sender};

(05) endif

Fig. 7. Diagonal scan function.

(13)

Executing such function, each process p_ilooks for evidences of the faulty sender. In particular, given the reference value v, pi checks the value that each process declares to have received from the sender (i.e. values along the matrix diagonal) and it detects the sender as faulty in the following cases:

1. the value received by piis different from the reference value (line 01) 2. along the diagonal there exist more than f values different from v (line 04).

As an example, let us consider the case where a faulty process p1issues an IT-RBcast and we have f = 3 faulty processes. Figure 5.3 shows the matrix obtained by both p2 and p5 after the execution of the protocol P in Figure 2. Let the reference value vref be v for both p2 and p5. While executing the diagonalScan() function, p2 will detect the faulty sender p1as matrix2[2][2] 6= v. Contrarily, p5is not able to detect pisince it see only f (matrixi[2][2], matrixi[3][3]andmatrixi[4][4]) values different from the reference one along the diagonal.

p1 p2 p3 p4 p5 p6 p7 p8 p9 p10p11p12p13p14

p1

p₄ a b a v v v v v v v v v v

p₁₁ a a a v v v v v v v v v v

p1 p2 p3 p4 p5 p6 p7 p8 p9 p10p11p12p13p14

p1

p₄ a b a v v v v v v v v v v

p₁₁ a a a v v v v v v v v v v

Fig. 8. Matrix obtained by p2 and p5after the execution of the IT-RB protocol P shown in Figure 2 (where sender row ad column are blank).

5.4 Column scan

The columnScan(matrix, v, j) function, shown in Figure 9, takes as parameter the matrix built while running the protocol P, a reference value v and the column identifier j of the column to check. The aim of this function is to suspect or detect any process involved in message exchange having as value a value v⁰different from the reference one.

When a process p_i scans a column j, it first selects the value v⁰ occurring at least n − f − 1 in j, i.e., the value pj sent to most of the processes (line 05). If there not exists a value occurring enough times (i.e. v⁰ = ⊥), then process pj is clearly faulty and it is detected (line 26-27) because it sent too many different values to other processes. An example of a column representing this scenario is shown in Figure 10(d).

Conversely, when v⁰ exists (cfr. line 06 or 17), pi checks if the value pj declares to have delivered from the sender is different form v⁰ (i.e. it checks if matrix[j][j] 6= v⁰). If this happens, p_i detects p_j as faulty as it would say that pj forwarded to many times v⁰ that is not the value received by the sender (cfr. Figure 10(a), where v⁰ = m and matrix[j][j] = v 6= v⁰).