• Non ci sono risultati.

Brief Announcement: Validity Bound of Regular Registers with Churn and Byzantine Processes ∗

N/A
N/A
Protected

Academic year: 2021

Condividi "Brief Announcement: Validity Bound of Regular Registers with Churn and Byzantine Processes ∗"

Copied!
2
0
0

Testo completo

(1)

Brief Announcement: Validity Bound of Regular Registers

with Churn and Byzantine Processes

Roberto Baldoni Silvia Bonomi Amir Soltani Nezhad

Dipartimento di Informatica e Sistemistica "Antonio Ruberti"

Università degli Studi di Roma "La Sapienza"

Via Ariosto 25, I-00185 Roma, Italy

baldoni,bonomi,amir@dis.uniroma1.it

ABSTRACT

This paper studies the problem of building a byzantine fault tolerant storage service in a distributed system affected by servers join and leave (i.e., servers churn). We show a bound for ensuring both validity of read operations and the persis- tence of a value written by a write operation. This bound correlates the churn rate, the number of faulty processes and the time taken by register operations (i.e., join, read and write operations).

Categories and Subject Descriptors

H.3.4 [Information Storage and Retrieval]: System and Software—distributed systems; D.4.2 [Operating Systems]:

Storage Management—distributed memories; D.4.5 [Operating Systems]: Reliability—fault-tolerance

General Terms

Reliability, Theory

Keywords

Regular Register, Dynamic Distributed Systems, Byzantine failure, Churn.

1. INTRODUCTION

Building a storage able to tolerate arbitrary failures is a central component in any mission-critical system that has to ensure both correct and highly-available operation execu- tions despite software and accidental errors as well as ma- licious operations. Protocols that can tolerate such failures are said to be Byzantine Fault-Tolerant (BFT) [5]. In the re- cent years, BFT storage has become a first class abstraction in large scale distributed systems to implement services such as directory services, sequencer services, namespaces, key management etc. These services are usually implemented by dozens of servers geographycally deployed over large scale settings (e.g. interconnected datacenters) that notoriously have to withstand various types, patterns, degrees and rates of churn. In such environments, churn creates unpredictable patterns of servers that leave and join a computation and no

∗This work is partially supported by the European STREP project SM4All

Copyright is held by the author/owner(s).

ACM X-XXXXX-XX-X/XX/XX.

assumption can be done on the duration of the churn period and on the distance in time between two successive churn periods (i.e., churn is non-quiescent). Thus, churn adds a new complexity dimension to the implementation of BFT storage, and it must be handled to avoid the storage to be- come unavailable.

To the best of our knowledge, this is the first paper analyzing implications of building a BFT storage under churn. Imple- mentations of registers resilient to crash failures have been provided indeed on the top of a distributed system prone to quiescent and non-quiescent churn respectively.

2. SYSTEM MODEL

The distributed system is composed of a universe of clients Uc(i.e. clients system) and of a disjoint universe of servers Us(i.e. servers system). The clients system is composed of a finite arbitrary number of processes, while the servers system is dynamic, i.e. processes may join and leave the system at their will. A server enters the server system by executing the join System procedure. Such an operation aims at connect- ing the new process to both clients and servers that already belonged to the system. A server leaves the system by means of the leave System operation. In order to model processes continuously arriving to and departing from the system, we assume the infinite arrival model. The set of processes that can participate in the servers system, i.e. the server sys- tem population, is composed of a potentially infinite set of processes Us= {. . . si, sj, sk. . . }, each one having a unique identifier (i.e. its index). However, the servers system is composed, at each time, of a finite subset of the population.

Clients and servers can communicate only by exchanging messages through reliable and authenticated channels. We assume the existence of a protocol managing the arrival and the departure of servers from the distributed system; such a protocol is also responsible for the connectivity maintenance among processes of the system.

Distributed Computation. At time t0, when the server system is set up, n servers belong to the servers compu- tation. A server si, belonging to the servers system, that wants to join the distributed computation has to execute the join Server() operation. Such an operation, invoked at some time t, is not instantaneous and takes time to be executed;

how much this time is, depends on the specific implementa- tion provided for the join Server() operation. However, from time t, the server si can receive and process messages sent by any other processes participating in the computation.

When a server sjparticipating in the distributed computa- tion wishes to leave the computation, it executes the leave Server() operation. We model the leave Server() operation as

(2)

an implicit operation, i.e., when a process pileaves the com- putation, it just stops to send and process messages related to the register computation. Moreover, we assume that if a server leaves the computation and later wishes to re-join, it executes again the join Server() operation with a new iden- tity. We assume at most f ≤ n/3 servers can deviate from their specifications during the whole life of the computation.

The remaining servers are correct.

Non-Quiescent Churn. The computation alternates peri- ods of churn (Tc) and periods of stability (Ts). More specif- ically, there exist some periods Tc, Tc0, Tc00 etc. in which servers join and leave the computation, and between two churn periods Tcand Tc0, there exists a period Tswhere the computation becomes stable and no join or leave operations occur. However, no assumption is made about how long Tc

and Ts are. During the churn periods, we assume that the churn refreshes a fraction of the computation participants at each time unit. More precisely, we define as churn rate, de- noted c, the percentage of processes that are “refreshed” at every time unit (c ∈ [0, 1]). This means that while the num- ber of servers remains constant (equal to n), in every time unit in Tc, c × n processes leave the computation and the same number of processes invok the join Server() operation.

3. REGULAR REGISTER

A register is a shared variable accessed by a set of pro- cesses, i.e. clients, through two operations, namely read() and write(). We consider a regular register specified as fol- lows:

Termination: If a correct process participating in the compu- tation invokes an operation and does not leave the system, it eventually returns from that operation.

Validity: A read operation returns the last value written be- fore its invocation, or a value written by a write operation concurrent with it.

Due to the presence of churn, we add one more property, namely write persistency, that an algorithm has to satisfy:

WritePersistency: Servers maintain the last value written by a write operation despite server departures.

Protocol Model. A protocol Preg implementing a regular register is a collection of distributed algorithms, one for each operation (i.e. Preg={AJ S, AR, AW} where AJ S, AR, AW

are respectively the distributed algorithms implementing the join Server(), the read() and the write() operations). Each al- gorithm is composed of a sequence of computation steps and communication steps. A computation step is represented by the computation executed locally to each process while a communication step is represented by the sending and the delivering events of a message.

A register is maintaied by a set of servers (possibly a subset of the ones belonging to the computation). No agreement abstraction is assumed to be available at a server. Clients do not maintain any register information; they can just trig- ger operations and interact with servers through message exchanges. Moreover, we assume that each server has the same role in the distributed computation (i.e. there is no special server acting as a coordinator).

For the sake of simplicity, we assume a single register repli- cated at each server belonging to the computation.

4. VALIDITY BOUND

Consider a generic protocol Preg implementing a regular register such that (i) every operation terminates and that (ii) there exists a period of churn longer than the longest operation issued on the register. Starting form the lower bound of 3f + 1 processes needed to tolerate f failures [7], in [3] we prove the following Theorem.

Theorem 1. Let AJ S, ARand AW be the algorithms im- plementing respectively join Server(), read() and write() op- erations. Let ∆tj, ∆tr and ∆tw be the maximum time in- tervals needed by AJ S to terminate an operation. If c ≥ min{n×∆tn−3f

r,n×(∆tn−3f

j+∆tw)}, then it is not possible to ensure both the write-persistency and the read-validity.

Interestingly, the bound is affected by the time taken by the operations to be executed, which depends on the algo- rithm implementing the operations and on the synchrony assumptions of the underlying distributed systems. In the case of a synchronous distributed system, the values of ∆tj,

∆tr and ∆tw are finite and thus, the value of c can be cal- culated exactly. Moving towards an asynchronous system, the values of ∆tj, ∆trand ∆twcan only be estimated; thus, there is no more a deterministic guarantee that the system works, but it is still possible to provide an estimation of the churn rate that can be tolerated by the system. This re- sult confirms the impossibility result found in [2] that is no algorithm can implement a regular register in the presence of non quiescent churn on the top of a fully asynchronous distributed system.

5. FUTURE WORK

This paper has established a bound for ensuring validity of read operations issued on a regular register which is able to resist arbitrary failures and non-quiescent churn. Future work will study if register storage is weaker than consensus not only in distributed systems prone to both crash failures and quiescent churn (as shown in [1]) but also in a setting including byzantine failures.

6. REFERENCES

[1] Aguilera M. K., Keidar I., Malkhi D., Shraer A., Dynamic atomic storage without consensus, in Proceedings of 28th Annual ACM Symposium on Principles of Distributed Computing (PODC) 2009, 17-25

[2] Baldoni R., Bonomi S., Kermarrec A.M., Raynal M., Implementing a Register in a Dynamic Distributed System, in Proceedings of the 29th IEEE International Conference on Distributed Computing Systems (ICDCS’09), IEEE Computer Society Press, Montreal (Canada), June 2009.

[3] Baldoni R., Bonomi S., Soltani Nezhad A. Regular Registers in Dynamic Distributed Systems with Byzantine Processes: Bounds and Performance Analysis, Technical report - MIDLAB 3/11 - 2011

http://www.dis.uniroma1.it/~midlab/articoli/

[4] Lamport. L., On Interprocess Communication, Part 1:

Models, Part 2: Algorirhms, Distributed Computing, 1(2):77-101, 1986.

[5] Lamport L., Shostak R. E., Pease M. C., The Byzantine Generals Problem, ACM Transactions on Programming Languages and Systems (TOPLAS) 4(3), 382-401, 1982.

[6] Malkhi D., Reiter M. K. Byzantine Quorum Systems, Distributed Computing 11(4): 203-213, 1998

[7] Martin J., Alvisi L., Dahlin M.. Minimal Byzantine Storage, in Proceedings of the 16th International Symposium on Distributed Computing (DISC 2002), pp.

311-325, 2002.

Riferimenti

Documenti correlati

Studiosi/se italiani e tedeschi, provenienti dal campo prevalente dell’architettura e della ricerca architettonica, in parte anche dalle discipline umanistiche, discuteranno nei

for 20 days between March and April of 2018, to evaluate the behavior and the performance of four PV/T panels connected in parallel used as the cold source of a heat pump in

Dowkrxjk wkh lvvxh lv qrw ixoo| uhvroyhg/ pdq| revhuyhuv djuhh wkdw qrplqdo fkdqjhv lq wkh txdqwlw| ri prqh| kdyh uhdo vkruw0uxq hhfwv rq rxwsxw dqg hpsor|phqw1 Wklv kdv ohg wr d

This section presents a protocol implementing a regular register in an eventually synchronous distributed system with continuous churn and where the number of processes participating

While leave operations are instantaneous, a join operation takes time because a new server has to receive the correct value of the storage, say v, by a certain number of server

Upon this churn model, we prove that: (i) a regular register cannot be build in an asynchronous system, (ii) a regular register can be implemented in an eventually

The Genna Selole Formation was deposited within an active extensional tectonic regime under a warm and humid climate (Costamagna 2015 ), possibly related with the early Middle

La geografia napoletana del sacro di cui l’agiografo di IX secolo si era fatto nobile interprete, una geografia tutta basata sulla vetustas e sulla remota antichità degli edifici