Self-Maintenance of Overlay Connectivity

(1)

Self-Maintenance of Overlay Connectivity

Sara Tucci Piergiovanni^∗and Roberto Baldoni Dipartimento di Informatica e Sistemistica

Universit`a di Roma “La Sapienza”

Via Salaria 113, 00198 Roma, Italia

Abstract

Overlay maintenance is fundamental in peer-to-peer (P2P) systems. In short, the problem consists in preserving the topological properties of an overlay despite continual arrivals and departure of nodes. Current approaches to overlay maintenance address different variants of the problem under different kinds of assumptions, for the problem and its underlying computational model have never been precisely defined.

This paper gives a specification of the problem in a precise computing model that captures the dynamics of P2P systems in terms of concurrency and failures. Specifically, a safety property is presented which encapsulates overlay connectivity (the core property of any overlay allowing any two nodes to communicate) along with a liveness property enforcing the ability of new nodes to participate in the overlay. The precise definition of the problem allows to point-out limitations of the use of redundant topologies to ensure connectivity. Moreover we show that the specification is implementable by introducing a protocol which maintains overlay connectivity most of the time and it is able to dynamically repair the overlay when failures occur. A simulation study conveys the very fact that the presented protocol is practically reasonable, as it achieves good scalability properties in periods when the overlay population changes very slowly.

1 Introduction

Peer-to-peer (P2P) systems are distributed systems consisting of interconnected nodes that are able to self-organize into network topologies with the purpose of sharing resources (content, CPU cycles, storage, bandwidth). P2P systems have recently been the subject of a vast research effort in

∗Sara Tucci Piergiovanni is eligible for the William C. Carter Award.

(2)

several fields: information systems, parallel computing, database management, distributed systems, communication networks. The reasons behind their increasing popularity comes, to a large extent, from the ability of P2P systems to span multiple administration domains, and self-organize in the presence of highly transient population and failures [5].

In order to effectively support P2P applications (e.g. content distribution [2, 1, 16]), two basic issues have to be settled by a P2P system however: (i) data location/efficient routing and (ii) efficient multicast communication. In both cases, the challenge lies in devising scalable solutions to handle a potentially infinitely wide and continually running system with a transient population of nodes.

The notion of Overlay-network has emerged as a viable idea to settle such issues. An overlay network is formed on top of – and generally independently from – the underlying physical computer network, by the peers (nodes) of the P2P system. The edges connecting the nodes of the overlay network reflect some logical relationship among nodes defined at the application level. Scalability is achieved by choosing a proper ideal overlay structure, e.g. a tree structure for an efficient multicast [18, 7], a ring for an efficient data location/retrieval [17]. The dynamics of the population is taken into account by devising overlay maintenance protocols that are able to add and remove peers from the overlay. These maintenance protocols aim at properly rearranging the overlay to keep its structure as close as possible to the ideal one, despite such population dynamics. The ability to permanently approximate the ideal overlay has a drastic impact on the performance of the basic P2P functions.

Motivation. As conveyed by the literature in the area, overlay maintenance is a recent and difficult problem: not surprisingly, it is usually studied under various simplifying assumptions. For instance, many overlay maintenance protocols are proved to work with high probability assuming that concurrent overlay changes do not drastically affect intersecting sets of neighbors [26, 32].

Some protocols tolerate unexpected departures assuming threshold of failures [21, 28, 27]. Other protocols do not explicitly consider the rate of overlay changes [11], or assume the prior knowledge of the system size and the overlay changing rate [9]. Whilst these assumptions are usually reasonable and indeed lead to pragmatic solutions, their variety makes the corresponding protocols difficult to compare. In fact, no optimality result can be established about the problem, for this problem has never been precisely defined per se; it is more generally encompassed within the application to which the overlay is devoted.

(3)

We believe that it is time to come up with a precise characterization of this problem and a precise framework to express various underlying assumptions. The contribution of this paper is a first step in this direction.

Contribution. We present a distributed computing model that captures the dynamics of an evolv- ing P2P system, as well as a specification defining the overlay connectivity maintenance problem.

We illustrate the benefits of our specification by exhibiting inherent limitations of maintaining redundant overlays topologies. Moreover we show that the specification is implementable in that computational model by presenting an overlay maintenance protocol along with its correctness proof and a simulation study.

Our model abstracts the inherent characteristics of a P2P system [24, 9]: an unpredictable and potentially infinite number of processes that possibly continuously and simultaneously join and leave the system. We borrow from the rich theory of distributed computing concepts from the “infinite arrival” model and “unbounded level of concurrency” [12, 23]. Specifically, assuming unbounded concurrency means that the (finite) number of processes simultaneously joining/leaving the overlay is arbitrary and unknown. Assuming infinite arrival abstracts a permanently running overlay of a continuously growing size. This also implies that failures accumulate over time, i.e.

their number tend to infinity. It is worth to note that this computational model departs from classical static distributed system models in which: (i) the number of processes is fixed and a- priori known and (ii) solutions are designed to behave correctly under fixed and known threshold of failures [24].

We highlight the basic property that an overlay maintenance protocol should preserve over time:

overlay connectivity, i.e. the possibility for any pair of nodes to communicate. Many other properties are desirable for an overlay: i.e. low diameter, balanced degree among nodes, etc. However, we position overlay connectivity at the heart of these. More specifically, we introduce the notion of Eventual Strong Connectivity, which expresses the ability for overlay nodes to communicate from an arbitrary time onwards. Complementary properties are specified to guarantee overlay progress, i.e. to avoid trivial maintenance protocols that systematically prevent joins to the overlay.

We define our properties in a deterministic manner. It is important to notice that, while doing so, we do not rule out randomized solutions to the problem. Nevertheless, we rather believe that probabilistic specifications (and corresponding randomized protocols) are best patterned after deterministic specifications.

(4)

Thanks to our precise specification, we revisit the intuition that redundancy enables to maintain ideal topologies under concurrency and failures. Specifically, we prove that it is impossible to maintain a k-connected topology (i.e. a topology still connected after the simultaneous deletion of any subset of nodes with cardinality fewer that k[10]¹) if the number of nodes in the overlay can go under k at some point of time. In fact, even assuming practically that the number of nodes in the overlay never goes under a certain threshold, a k-connected overlay topology may partition after the occurrence of k failures. These considerations motivate the need for a dynamic restoring mechanism to recover from partitions after k failures, and highlight the importance of maintaining a 1-connected graph, namely a tree, as ideal topology. Our protocol can precisely be viewed as an implementation of these ideas to ensure eventual strong connectivity under concurrency and failures. In particular, the protocol is purely reactive to failures: each failure triggers a restoring.

Moreover, to reduce the number of recoveries/partitions, voluntary leaves are managed in a different way than failures. In particular, voluntary leaves are handled in an active way, i.e. a piece of code is executed before the actual deletion of the vertex from the overlay. The objective of the active handling of leaves is to maintain connectivity at all times when the overlay is affected only by voluntary leaves. We present performance measures that evaluate the practicality of our protocol.

We use these measures to identify a trade-off between the level of connectivity and the scalability and we discuss various ways to handle the trade-off.

The paper is organized as follows. Section 2 presents our distributed computing model while Section 3 defines the overlay connectivity maintenance problem and establishes the impossibility of maintaining a redundant topology over time. Section 4 presents a tree-based protocol solving the problem. In Section 5 the overlay maintained by the protocol is evaluated. Section 6 presents related work, and Section 7 concludes the paper. Due to lack of space the proof of the protocol correctness and the proof of the impossibility of maintaining a redundant topology are in [31].

2 System Model and Basic Definitions

We consider an infinite set of processes Π, uniquely identified. We denote processes by i, j, k. Pro- cesses communicate by exchanging messages through point-to-point reliable channels. We assume no bound on process relative speeds and message transmission delays. A process may be correct or faulty. A correct process never fails. A faulty process fails by crashing. To simplify the description

1A clique is a particular case of k-connected topology.

(5)

without losing generality, we assume the existence of a fictional global clock, whose output is the set of positive integers denoted by T .

Process State. Each process i ∈ Π has a finite set i.X of variables x ∈ Π ∪ {nil}, where nil is a special process that does not belong to Π. There is no communication channel from/to nil. i.x denotes a neighbor variable x of process i. By definition the nil process does not have any neighbor variable. The state of a process i is represented by i.X. Each neighbor variable i.x initially assumes a nil value.

Overlay Definition. At any time, the global state constituted by the set of processes and their neighbor variables define the overlay. Formally,

Definition 1 (Overlay O(t)). The overlay O(t) is the set constituted by each pair hi, i.Xi :

∃i.x ∈ i.X 6= nil at time t.

Any process i : hi, i.Xi ∈ O(t) is called a vertex of the overlay O(t).

Process Actions. Each process i is characterized by the following actions: init_i, join_i, leave_i, stop_i. The init_i action is an output action coming from the outside world (by an application). The effect of the init_i action is the initialization of the neighbor variables.

The join_i action is an internal action. The effect of the join_i action is the assignment of a non-nil value to some neighbor variable of i, from a state in which all neighbor variables have a nil value, i.e. the join_i action determines the addition of the process i to the overlay.

The leave_i action is an internal action. The effect of the leave_i action is the assignment of nil values to each neighbor variable of i, from a state in which at least one neighbor variable has a non-nil value, i.e. the leave_i action determines the deletion of the process i from the overlay.

The stop_i action is an input action that models a crash of the process. The effect of this action is the same for the leave_i action, i.e. the stop_i action determines the deletion of the process i from the overlay. After stop_i no other action occurs.

Concurrency. The admitted level of concurrency is finite and unbounded [23, 12]. The number of processes that may be concurrently added/deleted to/from the overlay at any time is finite, arbitrary and unknown. This implies that at any time t the number of pairs hi, i.Xi ∈ O(t) is finite but can grow without bound.

(6)

We assume an infinite arrival model [23, 12], i.e., the number of processes that might be added to the overlay (for any infinite system execution) is infinite. However, once a vertex i is deleted from the overlay (by leave_i or stop_i), it is deleted forever. This assumption is not restrictive as a process can become a vertex with a different identifier an infinite number of times (remember that the set Π of processes we consider is infinite). The set of vertices in the overlay may change infinitely often. Hence we make no eventual stability assumption, i.e. there is no point of time t after which O(t) = O(t⁰) for each t⁰ > t.

A vertex may be stable or transient. A stable vertex, once added to the overlay, belongs to the overlay forever. A transient process belongs to the overlay only a finite interval of time.

3 Overlay Connectivity Maintenance Problem

The goal of a connectivity maintenance protocol is dynamically manages the overlay in order to add and delete vertices while keeping the overlay connected. In the following, we precisely define the properties that such a protocol should ensure. Namely, we give (i) a property which captures a fundamental requirement on the state of the overlay and (ii) a set of properties that guarantee the progress of the overlay.

3.1 Overlay Connectivity

The overlay O(t) is said to be connected if for any pair of vertices there is at least a path of neighbor variables connecting them. At any time the overlay may be viewed as a directed graph (digraph) with an arc (i, j) iff j ∈ i.X. With some abuse of terminology we will refer to the overlay either as a global state or as the graph defined by its neighbor variables.

A directed path from i to j at time t is denoted by i →_tj and is defined as follows:

Definition 2 (Directed Path). For any i, j : hi, i.Xi, hj, j.Xi ∈ O(t), then i →_t j iff at time t one of the following conditions holds:

∃i.x ∈ i.X : i.x = j or ∃k ∈ Π : (i →_tk) ∧ (k →_tj)

Two vertices are weakly connected if there exists a directed path in between; more precisely:

Definition 3 (Weak Connection between Two Vertices). For any i, j :

hi, i.Xi, hj, j.Xi ∈ O(t) (i may be equal to j) are weakly connected, iff i →_tj ∨ j →_ti.

(7)

Two vertices are strongly connected if there exist two directed paths (one in each direction) in between; more precisely:

Definition 4 (Strong Connection between Two Vertices). For any i, j :

hi, i.Xi, hj, j.Xi ∈ O(t) (i may be equal to j) are strongly connected, iff i →_tj ∧ j →_ti.

The following property, called Eventual Strong Connectivity (ES), guarantees that every pair of vertices in the overlay are strongly connected from an arbitrary point of time onwards. A protocol ensuring eventual strong connectivity is not necessarily able to maintain strong connectivity at any time, e.g. some overlay disconnection is allowed. In this case the protocol should be able to restore the overlay. The meaning of having an eventual connectivity instead of a connectivity at any time comes from the impossibility of maintaining connectivity at any time in the model assumed (see Section 3.3).

Property 1 (Eventual Strong Connectivity (ES )). ∃t : ∀t⁰≥ t,

∀i, j : hi, i.Xi, hj, j.Xi ∈ O(t⁰) ⇒ (i →_t⁰ j) ∧ (j →_t⁰ i)

3.2 Overlay Progress

The ES property characterizes the connectivity of an overlay that may change infinitely often due to the level of concurrency and the infinite arrival model assumed. However, a trivial (and bogus) connectivity maintenance protocol may systematically avoid the addition of any element in the overlay.

Thus, to prevent trivial implementations, properties on the progress of the overlay are needed.

We define two different progress properties: Global Progress and No-Lockout.

Property 2 (Global Progress). If there exists a correct i ∈ Π that executes the join_i action at time t, then ∃t⁰, j : t⁰ > t ∧ hj, j.Xi ∈ O(t⁰) ∧ hj, j.Xi 6∈ O(t)

This property does not guarantee that the access to the overlay is granted to all processes executing the connectivity protocol, i.e. it allows a process j to be repeatedly granted the access to the overlay, while another process i trying to obtain the access is forever prevented from doing so. The following stronger progress property precludes this scenario.

Property 3 (No-Lockout). For each correct process i ∈ Π that executes the join_i action, ∃t : hi, i.Xi ∈ O(t).

(8)

3.3 Redundant vs Non-redundant Overlay Topologies

Our overlay connectivity maintenance specification does not impose any particular ideal topology to maintain. Its aim is to maintain a connected topology over time. In this context, it is natural to study if selecting as ideal topology a redundant one, such as a clique or a k-connected overlay, can help in ensuring connectivity.

Let n be the number of nodes in the system at a given time and consider that these nodes form a clique. At first glance, a clique is a robust topology which would be able to tolerate n deletions without disconnecting. However, from the computational model immediately comes the impossibility of maintaining a clique, as stated by the following Theorem (the proof is in [31]).

Theorem 1. If there exists a time t such that ∀hi, i.Xi ∈ O(t), i.X variables define a clique, then there exists also a time t⁰ > t in which the graph defined by each hi, i.Xi ∈ O(t⁰) has connectivity k < |O(t⁰)|.

Due to the fact that n is arbitrary, unknown and changing over time, possibly being less than or equal to k, we immediately derive that it is impossible to maintain even a k-connected graph.

This impossibility, however, does not apply under the practical assumption that the number of processes in the system is greater than k at any time. However, in a system in which (i) an unbounded set of vertices may belong to the overlay and (ii) that set may change infinitely often, e.g. it may grow without any bound; a k-connected overlay may partition after the deletions of an unknown and arbitrary small fraction of faulty vertices [24]. This means that a restoring mechanism is necessary to guarantee eventual strong connectivity (as in a non-redundant overlay topology).

This consideration, combined with the fact that managing a redundant overlay topology over time is much harder (and more expensive) than managing a non-redundant one, leads to revisit the benefits the use of redundant overlay topologies.

In the following section, we present a connectivity maintenance protocol that maintains a 1- connected graph, namely a tree, as ideal topology.

4 A Tree-based Connectivity Maintenance Protocol

Our protocol arranges the vertices of the overlay in a routed tree topology. A process i that wishes to join the overlay establishes a strong connection with an arbitrary node of the tree j, becoming a child of j. A node i that wishes to leave the overlay ensures first that the sub-tree rooted by i is

(9)

weakly connected to i’s parent. Any i’s failure in the tree is handled trough the re-connection of the sub-tree routed by i to another live parent in the tree.

Process state. In the following the overlay will be denoted as T (t). Each element of T (t) is composed of a pair hi, (i.parent, i.children)i where i.parent and each x ∈ i.children are neighbor variables. With respect to the process state described in Section 2, the state is enriched by the variable s; the variable s can contain values {in, out, joining, leaving, restoring}.

Process actions. Each process i performs a Tree() task to maintain the overlay. We detail the effects of actions init_i, join_i, leave_i described in Section 2. The init_i action: (i) it initializes the neighbor variables to nil values²; (ii) it initializes the s variable to out and (iii) invokes the Tree().

The join_i action sets the s variable to joining. The leave_i action sets the s variable to leaving.

T (t) initialization. The overlay is initialized by an initiator r, i.e. T (t₀) = hr, (r.parent = r, ∀x ∈ r.children, x := nil)i ∧ r.state = in. It means that r is a special process with an init_r action occurring at time t₀, which is slightly different from the init of all other processes. Upon the init_r action, r (i) sets s := in, (ii) r.parent := r and ∀x ∈ r.children, x := nil. Then, r invokes task T ree().

The contact() function. Any process, to be added to the overlay, must set one (or more) of its own neighbor variables to a non-nil value. The need for getting at least one vertex identifier arises as no a-priori knowledge is assumed. To join the overlay, a process invokes a contact() function.

The contact() function returns a set of processes called contacts. This function returns an arbitrary set of vertices.

The T ree() task. The T ree() task handles the addition/deletion³of the pair hi, (i.parent, i.children)i to/from T (t), i.e. it sets and updates neighbor variables.

As soon as a process i wishes to join the overlay, it sets its variable s to joining. Any vertex in the tree wishing to leave the overlay sets variable s to leaving. Any vertex in the tree wishing to

2The set of neighbor variables is unbounded, i.e. the number of neighbor variables is a-priori-unknown, thus this static initialization models a dynamic initialization of neighbors variables.

3Deletions of only voluntary leaves.

(10)

restore after disconnection has its s variable equal to restoring. T ree() handles the s state transi- tions: [joining/in], [leaving/out] and [restoring/in]. The completion of the transition [joining/in]

at time t denotes the addition of the pair hi, (i.parent, i.children) to T (t). The completion of the transition [leaving/out] at time t⁰, denotes the deletion of the pair hi, (i.parent, i.children)i from T (t). With this mechanism, all in processes are vertices of the tree, while all out processes are not vertices of the tree, i.e. each out process i has i.parent = nil ∧ ∀x ∈ i.children, x = nil.

A faulty vertex disconnects all its sub-tree (if any). To cope with faults T ree() has access to a failure detector module F D_i reporting a list of faulty processes. In this case all children enter a restoring state when their FD module reports their parent. T ree() recovers the overlay reconnecting restoring nodes to a new live parent. We assume that the failure detection is perfect [8], i.e. the F D_i list eventually comprises every faulty process (in the i’s neighborhood) and every process in the list is faulty.

In Figure 4 the T ree() pseudo-code is shown. In the following we detail the mechanism to join, leave, and restore.

The join mechanism. When the variable i.s assumes the joining value (line 4), i obtains a vertex a of the tree through the contact() function. Then i sends a “JOIN” message to a (line 5). Upon receiving the “JOIN” message, a adds i to its children if its a.s = in (the role of the rank variable is detailed for the restore mechanism), by setting a.children to i and sending an

“ACK JOIN” message back to i (lines 16,19). Upon receiving the “ACK JOIN” message, i sets i.parent to a (line 8) and sets the variable i.s = in (lines 31-32). At this point the [joining/in]

transition completes. With this mechanism, if the joining process establishes its connection at time t with a vertex j, i.e. i →_t j, then there exists a time t⁰ < t in which j →_t⁰ i (see Fig.1). This mechanism allows to couple i to a possible transient parent. As we see in the following, this parent before leaving will give a notice to all its children, i comprised. At line 9, the process i sets also its own rank to the rank of its parent plus one.

Due to concurrency, asynchrony, and failures the a vertex given by the contact() function could be out or failed before receiving the “JOIN” message from i ⁴. In this case we assume that F D_i will report a in its list, at this point i restarts by getting another contact a.

4We assume that a process which has left the overlay (an out process) is not obligated to respond to messages associated with the maintenance of the overlay connectivity.

(11)

j i

JOIN k

parent=k j ∈children

j i

k

i ∈children

j i

k

ACK_JOIN

parent=j

s=in s=in s=joining

s=in s=in

j i

k

Figure 1: Change of topology and message pattern for joining

The leave mechanism. When the variable i.s assumes the leaving value, the protocol allows to update the i’s neighbors before deleting i from the overlay. In practice, it ensures that the parent of i becomes the parent of each child of i. In case of two (or more) concurrent leaving processes i, j which are adjacent on a same path, the deletion of the pairs hi, i.Xi and hj, j.Xi during the concurrent diffusion of the corresponding updates may lead to a partition. To avoid partition, the two updates are serialized: before deleting j, the i⁰s update takes effect (or vice versa). After that, j updates its neighbor variables and sends a new update to new neighbors. In the presented protocol the deletion order is given by the distance from the root. The process closest to the root leaves first.

In particular, the pseudo-code includes a sequence of asynchronous basic steps, where a basic step is the sending and the receiving of a “LEAVE” and its “ACK LEAVE” message, respectively (lines 12-15). The [leaving/out]transition completes when a basic step succeeds (line 15), s = out.

More in details, during each step, a “LEAVE” message containing the children of i is sent to update the neighbor variables of the current i’s parent (line 12).

In the simple case where the i’s parent k is in when the “LEAVE”(i.children) message is received (line 21), k updates its children variable, it sends a “NEW PARENT”(k) message to all new children and it sends an “ACK LEAVE” message to i (line 23). In Fig. 2 the message pattern and the consequent changing of connectivity relations is described.

After line 23 (executed at time t), we have: (i)∀j ∈ i.children, j ∈ k.children, then k →_t j and (ii) k 6→_ti by eliminating i from k.children. When at time t⁰ the “ACK LEAVE” is received by i (line 13), i sets its neighbor variables to nil, then only the relation k →_t⁰ j holds. When at time t⁰⁰ “NEW PARENT”(k) is received by each j (lines 24-25), then the following relations hold:

k →_t⁰⁰j ∧ j →_t⁰⁰k.

(12)

j i

LEAVE(i) k

parent=k j ∈children

j i

k

i ∈children

j i

k

ACK_LEAVE

parent=j s=leaving

s=in

nil i

k

i ∈children-{j}

s=in

children=NIL s=out parent=nil

NEW _PARENT

parent=k

nil i

k

Figure 2: Change of topology and message pattern for leaving

In a more difficult case, the i’s parent k is leaving when the “LEAVE” message is received. In that case an order on the departures of i and its parent k is needed, to break the tie. The rule is the following: if i and its parent k have concurrent leaves, k leaves the system before j. In this case, while a step is running for i, a basic step succeeds for k. Then a new (strong) connectivity relation involves i and the k’s parent j, i.e. i →_t⁰⁰ j ∧ j →_t⁰⁰ i. A this point i will have to start a new step since the variable parent change has been set to true (line 19).

In the worst case scenario all ancestors of i are leaving processes. In this case the tree root is also involved in a leave, and thus a new root has to be found. Mechanisms to elect a new root are out of the scope of the paper. Thus, we consider only the case where the root is a stable process avoiding to consider election mechanisms. In this case the number of steps, before i ends successfully a basic step (at line 13), is bounded by its depth in the tree.

The restoring mechanism. Initially, the transition [joining/in] brings the process i to select one parent, namely j, which includes i as one of its children. When i belongs to the overlay (line 8), thanks to lines 27-28, the previous selected parent j is monitored. If the failure detector reports the parent j as faulty, then the variable degree turns to 0 and the process i switches in a restoring state (lines 29-30). Subsequently, i is in charge of restoring the overlay by connecting its sub-tree to a new live parent. To avoid cycles, each process maintains an additional variable rank that gives an indication of the position of the vertex in the overlay. The value of rank is defined during the [joining/in] transition and it is never modified. The root r has r.rank = 0. If a process i joins through a process j, i.rank is set to j.rank + 1. When a process i turns into a restoring state, it avoids to select as parent any parent k with k.rank ≥ i.rank.

(13)

We depict in Fig. 3 a possible overlay evolution and the use of ranks. After a growing phase in which only joins occur (Fig.3(a)), a leaving phase in which only leaves occur is depicted (Fig.

3(b)). Then a failure and the corresponding restoring is described in Fig. 3(c) and Fig. 3(d).

0

1 1 1

2 2

2 2 2 2

2

3 3

3 3 3

4 4 4 4 4 4

5

6

(a) overlay after a joining phase

0

1 1 1

2 2 2

2 2

2

3 3

4 3

4 4 6 4

(b) overlay after a leaving phase

0

1 1 1

2 2 2 2 2

2

3 3

3 4

4 4 6 4

(c) overlay affected by a crash

0

1 1

2 2

2 2 2

2

3 3 3

4 4

4 6 4

(d) overlay after a restoring phase

Figure 3: An example of overlay evolution

Protocol Correctness. We give in [31] formal proofs of the protocol correctness. We prove that the protocol is able to maintain strong connectivity in stabilization periods of the overlay. When the overlay is affected by failures the protocol does not avoid partitioning but is able to eventually restore the overlay. We discuss here the assumptions underlying the correctness of our protocol, giving an intuition of their necessity.

Stable Root Assumption. By the protocol, anytime i switches in a restoring state, if i never completes lines 4 − 8, then ES will be violated. This is precisely why we assume the following:

(14)

Assumption 1. The overlay O(t₀) = {hr, (r.parent = r, r.children = N IL)i} such that r is stable, s = in and rank = 0 .

Assumption 2. The contact() function eventually returns a stable vertex with s = in and rank = 0.

These assumptions guarantee the completion of the transition [restoring/in]. Basically, the contact() function should eventually return to a restoring process i a stable process which does not belong to the sub-tree of i.

Assumption 1 might appear very strong in presence of failures. It is, however, practically satisfied by every P2P overlays relying on a set of super-peers like Kazaa [2]. In a pure peer-to-peer model, in which super-peers are not present, recent studies have revealed the presence of a stable component inside the network [29]. Mechanisms to select the root among these stable peers may be pursued. From a theoretically point of view, Assumption 1 strengthens the non-determinism highlighting the need for an overlay component that is not affected by non-deterministic behaviors.

The protocol is based on a perfect failure detection mechanism[8]. Specifically, we assume the following assumptions:

Assumption 3 (completeness). There exists a time t after which a faulty vertex is permanently suspected by every correct process.

Assumption 4 (accuracy). No process is suspected before it crashes.

It is worth to note that completeness is necessary as it ensures that a broken overlay will be eventually restored. On the other hand accuracy is not strictly necessary but simplifies the problem.

The effects of a non perfect failure detection are extensively studied in [30].

5 Achieving Scalability under Dynamics

Defining what is an ideal overlay topology, i.e. individuating all desirable properties an overlay should maintain, strictly depends on application requirements. For our purposes we analyze two basic topology characteristics achieving scalability: (i) absence of overloaded nodes and (ii) low overlay network diameter. In case of a tree, scalability can be achieved by a k-ary balanced tree, i.e. a tree with all internal nodes with at most k children and where no leaf is much farther away

(15)

T ree()i

1 var : parent change := > Boolean;

2 degree := null {0, 1, null};

3 rank := ∞ {null} ∪ Integer;

4 when (s = joining ∨ s = restoring) do 5 a := contact(); send [“JOIN”, rank] to a;

6 wait until (degree = 1 ∨ a ∈ F Di∨ receive [“RETRY”] from a) 7 when (receive [“ACK JOIN”, r] from a) do

8 parent := a; degree := 1;

9 if (rank = ∞) 10 then rank = r + 1;

11 when ((s = leaving) and (parent change)) do 12 send [“LEAVE”(children)] to parent;

13 parent change := ⊥;

14 when (receive [“ACK LEAVE”] from a = parent) do 15 parent := nil; ∀k ∈ children := nil; s := out;

16 when ((receive [“JOIN,r”] from j)) do 17 if ((s = in) ∧ (rank < r))

18 then children := children ∪ {j};

19 send [“ACK JOIN”,rank] to j 20 else send [“RETRY”] to j

21 when ((receive [“LEAVE”(j.children)] from j ∈ children) ∧ (s = in)) do 22 children := childrenS

j.children − {j};

23 send [“NEW PARENT”(i)] to each j.children; send [“ACK LEAVE”] to j;

24 when (receive [“NEW PARENT”(j)] from a) do 25 parent := j;

26 parent change := >; degree := degree + 1;

27 while parent 6= nil do

28 degree := |parent| − |parent ∩ F Di|;

29 when (degree = 0) do 30 s := restoring;

31 when (degree = 1) do 32 s := in;

Figure 4: The Tree-based Protocol at i

from the root than any other leaf ⁵. In this case in an overlay with N nodes with k children per-nodes, the height of the tree h = dlog_kN (k − 1) + 1e.

The star effect. The level of connectivity assured by the active handling of leaves is at the cost of a loss of scalability. In fact, with a highly transient population of nodes, the topology may converge to some tree configurations that work very badly in terms of scalability, e.g. topology showing a very tailed distribution of node-degrees. This is because the root (like any stable node at low level) in the tree works like a “safe-buoy” for connectivity. Immediately after periods characterized by high churn rates (intended as the number of joins and leaves per unit of time) all vertices that were at high tree level before churning periods are being connected to these safe-buoys bringing to

5There may exist a difference between two leaves of at most 1 level.

(16)

a star-like overlay.

The re-balancing mechanism. To contend with the star effect a mechanism to dynamically re-balance the tree should be pursued. In this section we propose a very simple mechanism to re-balance the tree with the aim at evaluating its effectiveness under dynamics.

Assuming that the topology in the ideal state is a k-ary balanced tree, the mechanism is the following: any node i periodically selects a number of exceeding children equal to |i.children| − k.

These children may be chosen at random or following more sophisticated rules (e.g. choosing the youngest children). We consider a random choice. The node i sends a message to these exceeding children, inviting them to find another parent between their (not exceeding) siblings. Then each exceeding child j that is a leaf immediately tries to connect to one of its siblings k. If the connection successes, (i) j results connected to k with a rank j.rank = k.rank + 1 (ii) i deletes j from its children. In a pure star overlay this mechanism is very effective, cause all the children of the overloaded node are leaves. Unfortunately, if an overloaded node has any children with just one children the mechanism does not work anymore. For this reason, each exceeding child i dismisses all its sub-tree to become a leaf, and the mechanism provides to reconnect the sub-tree as an unique flat level around a not exceeding sibling of i.

Experimental Results Simulation is conducted by using Ns-2 simulator [3]. The aim of the simulation is point-out how fast the topology converges to a star under different churn rates. Then the re-balancing mechanism is evaluated under churn.

Convergence to a star-like topology under churn. Each simulation starts with a topology in an ideal state. In particular, at initial time t^sim₀ , T (t^sim₀ ) is a k-ary balanced tree with k = 7 and 1000 nodes. The simulation time is dived in time intervals called rounds. We fixed the round duration to 20sec. Let µ the churn rate, at each round of the protocol, µ nodes join and µ nodes leave. Then, at the end of each round the number of vertices is still 1000. The state of the overlay is checked every second. We report what is the velocity (in term of seconds) of reaching a star-like topology varying the churn rate ⁶. For pointing out the relation between the churn rate and the time for converging to a star-like topology, the y-axis reports round numbers instead of the seconds.

6Each point in the plots has been computed as an average of 40 simulation distinct runs. For each point all the results of these runs were within 4% each other, thus variance is not reported in the plots.

(17)

In this simulation we have considered that the overlay reaches a star topology when at least one node has a number of children greater than 100⁷. The plot in Fig. 5(a) shows that an increasing churn rate reduces the time the overlay converges to a star-like topology.

(a) convergence to a star-like topology (b) re-balancing effect under churn rate µ = 200

(c) re-balancing effect under different churn rates

Figure 5: Protocol effects on topology without and with the re-balancing mechanism

Re-balancing mechanism evaluation. The effects of the re-balancing mechanism are evaluated under a churn rate equal to µ = 100 (see fig. 5(b)) and under different churn rates (see fig. 5(c)).

We consider also a fixed percentage of failures equal to 1 for all experiments. The churn perturbs the overlay for 10 rounds (200 seconds) and after churn subsides. We have evaluated the degree of the most overloaded node in the overlay at every second and the height of tree after the churn subsides.

The plot in fig. 5(b) points out a periodic counter-effect due to the re-balancing mechanism striving the star convergence during the churn period. In the first round the most overloaded node arrives

7One order of magnitude greater than log₇1000.

(18)

to a degree equal to 25 but at the end of the round the re-balancing mechanism shrinks the degree to 8. The effect of a continuous churn leads to increase the minimum degree and maximum degree per round (e.g. in round 8, the maximum degree of the most overloaded is 36 and at the end of the round the degree goes to 15). At the end of perturbation the re-balancing mechanism brings the overlay to converge again to a well-balanced tree with a degree of most overloaded equal to 7 and a height of the tree equal to 8 (the height does not appear in the plots).

The impact of different churn rates in Fig. 5(c) reveals that the more is the churn rate the faster the convergence to a star, and less effective is the re-balancing mechanism during the perturbation interval. In any case in absence of churn the overlay converges to a balanced tree (the height is always equal to 8). The time for converging again to a balanced tree does not depend on the churn suffered previously.

6 Related Work

To the best of our knowledge, [21] is the only work which presents a precise definition of the overlay maintenance problem. In particular, an invariant for a ring maintenance protocol was defined stipulating that the ring topology is eventually restored after overlay changes subside. A protocol that satisfies the invariant in the absence of failures was then presented. A ring-based redundant structure was also proposed, however, the model they assume considers a finite number of processes and they do not deal with failures accumulating over the time. Our problem specification is more general in the sense that we consider any topology. In addition, our protocol tolerates failures while not assuming a known set of nodes. Even tough, we know of no other provably correct protocols, except [21], solving overlay maintenance problem, several approaches have been considered in the literature. We discuss them below and highlight their underlying assumptions. We classify them as being either structured and unstructured.

Structured overlays impose stringent neighbor relationships [17, 28, 27, 32]. The goal is typically to achieve efficient data location and discovery by providing a many-to-one mapping between data items and peers. Each data item is identified by a key and nodes are organized into a structured overlay topology that maps each key to a responsible node. Many different topologies have been considered in the literature [17, 27, 15, 32, 28, 6, 19, 22]. For instance, in Chord [17] each node and data item is associated with a unique key in an uni-dimensional space, specifically a ring. Chord enables discovery of a data item given its key in only O(logN ) hops with only O(logN ) graph

(19)

neighbors per node. However, as argued in Liben-Nowell et al. [9] none of those considers the fully evolutionary nature of P2P systems, as their (probabilistic) solutions are not proved to work in the face of continuous evolution. In particular, [9] points out the characteristics of a P2P system that should be considered, by defining a metric called half-life of the network, which essentially measures the time for replacement of half the nodes in the network by new arrivals. They prove a lower-bound on the generated traffic to maintain connectivity. In particular, they prove that Ω(logN ) nodes have to be notified of new nodes in the network with a frequency defined by the half-life. They consequently propose a maintenance protocol that with high probability repairs a ring topology over the time. This protocol assumes prior knowledge of the half-life and the system size.

Unstructured overlays do not aim at maintaining any deterministic topology. The goal is typically to support large scale information dissemination and in some cases content-based searching (flooding-based)[25]. There is however no mapping between content (e.g. data) and location (e.g.

node address) as in a structured overlay. In a sense, the goal of a connectivity maintenance protocol for unstructured overlays is to build, in a scalable manner, overlay topologies that exhibit good global properties like connectivity, low-diameter and constant-degree. Recently, several approaches to the construction of topologies based on a randomized choice of neighbors [20, 25] have been proposed. In [25] the authors propose a local heuristic that leads to the construction of a connected overlay of constant degree and logarithmic diameter with high probability. The drawback is that the proposed solution requires a central server representing a possible bottleneck with high join rates, and a single point of failures.

Distributed group membership protocols supporting gossip-based dissemination [11, 13, 4] rep- resent a particular class of maintenance protocols. These protocols provide each node with a small local view of the overlay membership at each node and membership information spreads in an epidemic style [14]. However, [11, 13] do not take into account the issue of the overlay changing rate explicitly. In [4] the authors evaluate the time expected for the overlay to partition as a function of the overlay size, the local view size and the overlay changing rate (called churn rate). The local view size needs to be larger than the churn rate for the expected time until partitioning to be exponential in the square of the local view size. The protocol proposed, however, assumes prior knowledge of the overlay size and the overlay changing rate to tune its parameters.

(20)

7 Concluding Remarks

The paper has focused on yielding a formal definition of the overlay connectivity problem. Doing so, a computational distributed system model has been specified in order to capture the dynamic nature of P2P systems encompassing the notion of infinite arrival and unbounded level of concurrency.

Then, the specification of the problem has been given in terms of both safety and liveness properties.

In particular it has been presented the property of Eventual Strong Connectivity which states that from an arbitrary point of time any pair of nodes must be able to communicate. We believe that this form of connectivity has the double advantage to be enough loose to encompass a wide range of protocols and enough strong to be useful in practice.

The paper has presented a protocol that maintains a tree as overlay topology. This is the actually first provably correct protocol guaranteeing eventual connectivity in a realistic model including unbounded concurrency and failures accumulating over the time.

Finally, the paper studied how to achieve scalability under dynamics. Specifically, with a simple extension of the protocol, simulations studies has revealed that a k-ary tree ideal topology can be reached in stabilization periods of the overlay.

References

[1] The gnutella web site, http://www.gnutella.com.

[2] The kazaa web site, http://www.kazaa.com.

[3] Ns-2 simulator, http://www.isi.edu/nsnam/ns.

[4] A. Allavena, A. Demers, and J. E. Hopcroft, Correctness of a Gossip Based Membership Protocol, Proceedings of the 24th ACM annual symposium on Principles of Distributed Computing (PODC05), 2005, pp. 292–301.

[5] S. Androutsellis-Theotokis and D. Spinellis, A Survey of Peer-to-Peer Content Distribution Technologies, ACM Computing Surveys 36 (2004), no. 4, 335–371.

[6] J. Aspnes and G. Shah, Skip graphs, Proceedings of the 14th annual ACM-SIAM symposium on Discrete algo- rithms, 2003, pp. 384–393.

[7] M. Castro, P. Druschel, A. Kermarrec, and A. Rowstron, SCRIBE: A large-scale and decentralized application- level multicast infrastructure, IEEE Journal on Selected Areas in communications 20 (2002), no. 8, 1489–1499.

[8] T. Chandra and S. Toueg, Unreliable Failure Detectors for Reliable Distributed Systems, Journal of the ACM 43 (1996), no. 2, 225–267.

[9] D. Karger D. Liben-Nowell, H. Balakrishnan, Analysis of the Evolution of Peer-to-Peer Systems, Proceedings of the 21st ACM Annual Symposium on Principles of Distributed Computing (PODC02), pp. 233–242.

[10] R. Diestel, Graph Theory, Springer, 2000.

[11] P. Th. Eugster, R. Guerraoui, S. B. Handurukande, P. Kouznetsov, and A.-M. Kermarrec, Lightweight Proba- bilistic Broadcast, ACM Transanctions on Computer Systems 21 (2003), no. 4, 341–374.

[12] E. Gafni, M. Merritt, and G. Taubenfeld, The Concurrency Hierarchy, and Algorithms for Unbounded Concur- rency, Proceedings of the 20th annual ACM symposium on Principles of Distributed Computing (PODC01), 2001, pp. 161–169.

[13] A. Ganesh, A. Kermarrec, and L. Massoulie, Peer-to-Peer Membership Management for Gossip-based Protocols, IEEE Transactions on Computers 52 (2003), no. 2, 139–149.

(21)

[14] R. A. Golding and K. Taylor, Group Membership in the Epidemic Style, Tech. Report UCSC-CRL-92-13, 1992.

[15] K. Hildrum, J. D. Kubiatowicz, S. Rao, and B. Y. Zhao, Distributed Data Location in a Dynamic Network, Tech.

Report UCB/CSD-02-1178, 2002, Updated version in ACM SPAA 2002.

[16] B. Wiley I. Clark, O. Sandberg and T. W. Hong, Freenet: a Distributed Anonymous Information Storage and Retrieval System, in Proceedings of the International Workshop on Design Issues in Anonymity and Unobserv- ability, 2001, pp. 46–52.

[17] D. Karger M. F. Kaashoek I. Stoica, R. Morris and H. Balakrishnan, Chord: A scalable Peer-To-Peer lookup service for internet applications, Proceedings of the 2001 ACM annual conference of the Special Interest Group on Data Communication (SIGCOMM01), 2001, pp. 149–160.

[18] J. Jannotti, D. K. Gifford, K. L. Johnson, M. F. Kaashoek, and J. W. O’Toole, Jr., Overcast: Reliable Multi- casting with an Overlay Network, Proceedings of the 4th Symposium on Operating System Design and Imple- mentation, 2000, pp. 197–212.

[19] M. F. Kaashoek and D. R. Karger, Koorde: A Simple Degree-optimal Distributed Hash Table, Proceedings of the 2nd International Workshop on Peer-to-Peer Systems (IPTPS03), pp. 98–107.

[20] C. Law and K. Siu, Distributed Construction of Random Expander Networks, Proceedings of 22nd IEEE annual conference on Computer and Communications Societies (Infocom03), 2003.

[21] X. Li, J. Misra, and G. Plaxton, Active and Concurrent Topology Maintenance, Proceedings of 18th Annual Conference on Distributed Computing (DISC04), pp. 320–334.

[22] D. Malkhi, M. Naor, and D. Ratajczak, Viceroy: a Scalable and Dynamic Emulation of the Butterfly, Proceedings of the 21st ACM Symposium on Principles of Distributed Computing (PODC02), 2002, pp. 183–192.

[23] M. Merritt and G. Taubenfeld, Computing with Infinitely Many Processes, Proceedings of the 14th International Conference on Distributed Computing, 2000, pp. 164–178.

[24] A. Mostefaoui, M. Raynal, C. Travers, S. Patterson, D. Agrawal, and A. El Abbadi, From Static Distributed Systems to Dynamic Systems, Tech. Report PI-1713, 2005, T.

[25] G. Pandurangan, P. Raghavan, and E. Upfal, Building Low-Diameter p2p Networks, IEEE Symposium on Foundations of Computer Science (FOCS01), 2001, pp. 492–499.

[26] C. G. Plaxton, R. Rajaraman, and A. W. Richa, Accessing Nearby Copies of Replicated Objects in a Distributed Environment, ACM Symposium on Parallel Algorithms and Architectures (SPAA97), 1997, pp. 311–320.

[27] A. Rowstron and P. Druschel, Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems, In Proceedings of the 18th IFIP/ACM International Conference on Distributed Systems Platforms (Middleware01), vol. 2218, 2001, pp. 329–250.

[28] M. Handley R. Karp S. Ratnasamy, P. Francis and S. Shenker, A Scalable Content-Addressable Network, Proceed- ings of the 2001 ACM annual conference of the Special Interest Group on Data Communication (SIGCOMM01), 2001, pp. 161–172.

[29] D. Stutzbach and Reza Rejaie, Towards a better understanding of churn in peer-to-peer networks, Tech. Report CIS-TR-04-06, University of Oregon, 2004.

[30] S. Tucci-Piergiovanni, Concurrent connectivity maintenance with infinitely many processes. ph.d. thesis, http://www.dis.uniroma1.it/∼midlab/publications.

[31] S. Tucci-Piergiovanni and R.Baldoni, Self-maintenance of overlay connectivity, http://www.dis.uniroma1.it/∼midlab/publications.

[32] B. Y. Zhao, J. D. Kubiatowicz, and A. D. Joseph, Tapestry: An infrastructure for fault-tolerant wide-area location and routing, Tech. Report UCB/CSD-01-1141, UC Berkeley, 2001.

Appendix

The Impossibility of a Clique. In this section we give a formal proof of the impossibility of maintaining a clique at any time. The proof is based on the following abstract solution of the problem: let us consider a very powerful contact() function that allows processes to get the complete

(22)

snapshot of the overlay at the time they invoke the function. contact() returns in contacts the finite number (by finite concurrency) of vertices in the overlay. Then, on the basis of the information got from the contact function, the process enters the overlay invoking an update() function. Once the update function returns the process has been added in the overlay ⁸. The update consists in adding i as new neighbor of each vertex in contacts and adding each vertex in contacts as neighbor of i.

To the aim of proving the impossibility of a clique at any time, the two operations can be assumed atomic⁹. Formally:

Theorem 2. If there exists a time t: ∀hi, i.Xi ∈ O(t), i.X variables define a clique, then there exists a time t⁰ > t in which the graph defined by each hi, i.Xi ∈ O(t⁰) it is not a clique and has connectivity k < |O(t⁰)|.

Proof. Let us suppose by contradiction that for all t⁰ > t, the graph defined by hi, i.Xi ∈ O(t⁰) is a (undirected) clique. This implies that for each t⁰ > t, ∀hi, i.Xi, hj, j.Xi ∈ O(t⁰), ⇒ j ∈ i.X ∧ i ∈ j.X.

Let us consider any run R in which only two processes i, j ∈ Π add to the overlay after time t. In particular, they invokes the contact() function at the time, respectively, t^r_i, t^r_j > t and the update() function at time, respectively, t^w_i , t^w_j > max(t^r_i, t^r_j). The contact() function returns all processes in O(t) (not comprising i and j). Then at time t⁰ = max(t^w_i , t^w_j ) the overlay O(t⁰) contains hi, i.Xi and hj, j.Xi but i 6∈ j.X and j 6∈ i.X with a connectivity k = |O(t)| = |O(t⁰)| − 2 contradicting the initial hypothesis.

Protocol Correctness Proof. The protocol correctness proof is organized as follows: Lemma 1 and Corollary 2 show that in absence of failures the protocol never disconnects and it is able to maintain eventual strong connectivity. Then to show that even in presence of failures the protocol assures eventual strong connectivity it is also shown that: any restore occurs in a finite time (Lemma 2) and at the end of the restoring phase the graph is still a tree (no cycle arise) by Lemma 3. Then Theorem 3 shows that the protocol never partitions after a given point onwards and Corollary 3 shows that the protocol assures eventual connectivity. Theorem 4 and 5 show that the protocol assures the progress of the overlay.

Lemma 1. In absence of failures, at any time t, T ree() arranges each vertex i ∈ O(t) in a connected digraph T (t) defined by parent and children variables.

Proof. initiated by a never changing root r. The proof is by induction on the connected digraph T (t).

8To avoid to solve our problem with the contact function, we also suppose that the invocation of the contact function is only intended to solve the bootstrap, i.e. it can be invoked at most once and before the invocation of the update() function.

9The impossibility arises from the non-atomic execution of contact() and the subsequent update()

(23)

Base case. As long as there are no state transitions, T (t₀) = {hr, (r.parent = r, r.children = nil)i}.

Induction. Suppose that all vertices in O(t) belong to the connected digraph T (t) defined by their parent and children variables.

The proof first considers a prefix of any run in which the only transitions of the variable s are [out/joining] transitions. Then, any prefix is extended to comprise even the [in/leaving] transitions of s.

[out/joining] transitions of s. Let us suppose that s_i changes from out to joining at time t at a process i. When i.parent changes its state from nil to j, the Tree Protocol had already established (in the following order): (i) one weak connection between i and a vertex j such that i ∈ j.children, (ii) a strong connection by establishing the weak connection i.parent = j. Then, when at time t, i becomes a vertex of O(t), i ∈ T (t).

[in/leaving] transitions of s. Let us suppose that s_i changes from in to leaving at time t at a process i. Let us suppose that at time t (i) i belongs to a branch B(t) of T (t) (ii) the branch has depth d and (iii) i is at level ` > 0 with with d, ` finite and unknown (by finite and unbounded concurrency).

Let B(t)= {k_d, . . . k_`, . . . k_m, k_m−1, . . . k₀} such that k₀ = r, i = k_`, k_m−1 = k_m.parent (m = 1 . . . d − 1).

case1. Suppose that i is the only vertex which triggers a transition [in/leaving] in B(t).

The deletion of i from the overlay would partition B(t) in two disconnected sub-branches B(t)⁰= {k_d, . . . k_`−1}, B(t)⁰⁰= {k_`+1. . . k₀}.

The Tree protocol assures that when i leaves, at least the weak connection k_`−1 → k_`+1: k_`+1 ∈ k_`−1.children is established. In particular, i sends the message “LEAVE(i.children)” to k_`−1 and remains in the overlay until an answer is received. When the “LEAVE” message is received, since k_`−1.s = in, k_`−1 inserts k_`+1 in k_`−1.children and an “ACK LEAVE” message is sent from k_`−1 to i. Thus, when i is deleted from the overlay, k_`+1 and k_`−1 are still (weak) connected, i.e. k_`−1 and k_`+1 ∈ T (t).

case2. Suppose that all processes k_m with ` ≤ m ≤ 1 concurrently trigger a transition [in/leaving]

in B(t). The simultaneous deletion of all k_m from the overlay would partition B(t) in two discon- nected sub-branches B(t)⁰= {k_d. . . k_`−1}, B(t)⁰⁰= {k₀}.

The Tree Protocol provides that each process k_mwith ` ≤ m ≤ 1 sends a “LEAVE (k_m.children)”

message to k_m−1 (line 12).

All vertices k_m−1 with m > 1 discard any “LEAVE(k_m.children)” while they have k_m−1.s = leaving. Thus, each k_m remains in the overlay at least until k₁ receives an “ACK LEAVE” mes- sage (line 14) sent by k₀ (line 23) (k₀.s = in by Assumption 1). As in case 1, when k₁ re- ceives this acknowledgment a weak connection between k₂ and k₀ : k₂ ∈ k₀.children has been established. Thus, each k_m with ` ≤ m ≤ 1 belong to a connected directed branch B(t₁)=

{k_d, . . . k_`, . . . k_m, k_m−1, . . . k₂, k₀} with t₁ the time in which k₁ leaves. At line 23, the vertex k₀ sends also a “NEW PARENT” message to k₂. Note that, the “LEAVE(k₂.children)” message cannot be acknowledged until the “NEW PARENT” message from k₀ is received. Only when this message is received k₂ sets k₂.parent to a vertex with the variable s = in. At this point k₂ retransmits “LEAVE(k₂.children)” to the new parent k₀. From the time in which k₀ receives the “LEAVE(k₁.children)” message to the time in which k₂ receives the “ACK LEAVE” message (from k₀), each k_m with ` ≤ m ≤ 1 belong to a weakly connected branch B₁ = B(t₁). When the

“ACK LEAVE” message arrives to k₂ at t₂, each k_m with ` ≤ m ≤ 1 belong to a connected directed branch B(t₂) s.t. k₃ ∈ k₀.children, and so on until k_`+1 becomes a child of k₀ after the leaves of all processes k_m with ` ≤ m ≤ 1. Then all k_m processes with ` < m ≤ d belong to T (t⁰) for each