Churn Resilience of Peer-to-Peer Group Membership: a Performance Analysis

(1)

Churn Resilience of Peer-to-Peer Group Membership: a

Performance Analysis

Roberto Baldoni Adnan Noor Mian Sirio Scipioni Sara Tucci-Piergiovanni

DIS, Universit´a di Roma La Sapienza, Via Salaria 113, Roma, Italia.

baldoni,adnan,tucci@dis.uniroma1.it

Abstract

It is a well-known fact that the partition is one of the main problems in p2p group membership. This problem rises when failures and dynamics of peer participation, or churn, occur in the overlay topology created by a group membership protocol connecting the group of peers.

Solutions based on Gossip-based Group Membership (GGM) cope well with the failures while suffer from network dynamics. This paper shows a performance analysis of SCAMP, one of the most interesting GGM protocol. The analysis points out that the probability of partitioning of the overlay topology created by SCAMP increases as the churn rate increases. We also compare SCAMP with another protocol, namely DET, that deterministically avoids partitions of the overlay. The comparison points out an interesting trade-off between reliability, in terms of guaranteeing overlay connectivity at any churn rate, and scalability in terms of creating scalable overlay topologies where latencies experienced by a peer during join and leave operations do not increase linearly with the number of peers in the group.

1 Introduction

Peer to peer (p2p) systems are rapidly increasing in popularity. Their interest comes from the fact that a peer-to-peer system is a distributed system without any centralized control. Thus there is no need of a costly infrastructure for direct communication among clients. Another specific characteristic of these systems concern peer participation that is each peer joins and leaves the system at any arbitrary time. In fact the dynamics of peer participation, or churn (the continuous arrival and departure of nodes) is an inherently property of a p2p system. Online statistics of GNUTELLA 2 for example, shows that the network size varies along the day hours between 250.000 and 370.000 nodes [13]. The peers communicate through application-level multicast protocols over an overlay network formed by the peers themselves [15, 6]. Due to churn, the overlay continuously changes. This implies that the group membership management protocols are crucial to the success of multicasting. Two issues are usually taken into account by such group membership protocols:

(i) scalability, that is, the operational overhead will not grow linearly with the size of the network and (ii) reliability which is the capacity to keep the overlay network connected in face of network dynamics.

(2)

Epidemic or gossip-based protocols [10, 11] are considered good candidates to cope with the issues of scalability and reliability. More specifically message redundancy cope with random network failures (reliability) and operational overhead grows only logarithmically with the network size (scalability). However their behavior in systems with high churn rates had received little attention in the literature. Only very recently this issue has been addressed in [1] which shows through analytical model that the probability of partitioning increases with the increase of churn rate. This has been also indirectly confirmed by the authors of one of the most interesting Gossip-based Group Membership protocols (GGM), namely SCAMP [11]. Authors state that GGM protocols (including SCAMP) are well-suited to fairly static systems, i.e. systems with low churn rates. SCAMP is an adaptive GGM in the sense that with the changing the size of the group, it maintains a reasonable overhead for each node and a certain degree of reliability.

The contribution of this paper is in understanding the churn resilience of group membership protocols in p2p systems in terms of reliability and scalability. More specifically we compare SCAMP against a deterministic group membership protocol, namely DET, a protocol that is able to deterministically maintain overlay connectivity assuming that a certain threshold of failures holds. This protocol is presented in a companion paper [2] ¹.

In setting up the simulation experiments, we faced the difficulty to come up with meaningful measures in such a continuously evolving systems. For example, to evaluate the reliability of the overlay networks, simulation studies often consider ”the proportion of the nodes reached by an application-level multicast”. But in the presence of high dynamics (with high rates of joins and leaves), it is hard to define ”which nodes are considered to be the intended receivers of a given message”.

We, therefore, used a well-known methodology which divides the simulation time into three intervals: a bootstrapping interval, a perturbation interval and a measurement interval. Boot- strapping is the interval of time in which only joins are allowed. During the perturbation interval all joins and leaves occurs at a given churn rate. At the end of the perturbation period, the churn terminates and after a transitory in which the system stabilizes, the measurement interval starts. In this interval no change to the overlay occur and we evaluate (i) reliability by using an application- level multicast protocol and calculating the proportion of nodes reached by the multicast and (ii) scalability by analyzing the overlay topology w.r.t. join/leave latencies.

Using this methodology, experiments confirm that SCAMP suffers a poor reliability under churn while it scales for any churn rates. In particular under a churn rate equal to 1 membership change (a join or a leave) per second the proportion of nodes reached by the multicast is the 80%. However, the node degree (for every node) remains always equal to the logarithm of the size of the group.

The analysis of DET shows that the overlay remains connected for any churn rate. This form of determinism is at the cost of more interactions required by a node which involve other nodes of the group (actually its neighbors) during the join and leave operations.This leads to an increase of latency of join and leave operations (note that these operations are instantaneous in SCAMP).

In particular, it is shown that if either DET protocol adopts policies to maintain low latencies for join/leave operations or the churn rate increases, the overlay converges to a star topology, thus

1For the sake of clarity, the paper contains a short description of DET in Section 2 and in Appendix A. The interested reader can refer the companion paper for a complete description of the protocol and its properties.

(3)

resulting in an overloading of one node completely. This means that scalability is compromised in any case.

The paper in Section 2 presents a brief description of SCAMP and DET protocols. In Section 3 experimental results are shown. In these experiments the protocols work with a bootstrapping time which is set to zero. Section 4 discusses the related work and section 5 concludes the paper.

Due to the lack of space we have presented and evaluated another scenario of SCAMP in the Appendix B. This scenario considers a bootstrapping phase. These experiments point out that the SCAMP, in the bootstrapping interval builds a cluster of nodes which are very-well connected. But this cluster remains poorly connected with nodes that joins during the perturbation interval. At this point reliability depends on ”who leaves the system”, i.e. if all the nodes, forming the cluster, leave then the reliability of the overlay will be low, leading to partitions and nodes isolations.

2 Group Membership Protocols

In the following we briefly describe the two protocols, namely SCAMP [11] and DET [2], which are evaluated in the simulations. Before that let us introduce the system model.

2.1 System Model

The system consists of an unbounded set of nodes Π (Π is finite). Any node may fail either by crashing or by leaving the system without using the defined protocol. A node that never fails is correct. The system is asynchronous: there is no global clock and there is no timing assumption on node scheduling and message transfer delays. Each pair of nodes p_i, p_j may communicate along point-to-point unidirectional fair lossy links[3]. Roughly speaking, a fair lossy link may intermittently drop a finite number of messages, and may do so infinitely often, but if p_i repeatedly sends some message to p_j and p_j never crashes, then p_j eventually receives the message.

Group Membership. Each node p_i ∈ Π may subscribe (join) and unsubscribe (leave) from the group G. The set of nodes constituting the group G at a certain point of time is a subset of Π with size unbounded and finite. The rules defining the membership of G are the following: (i) a node p ∈ Π becomes a member of G immediately after the completion of the subscription operation, (ii) a node p ceases to be member of G immediately after the completion of the unsubscription operation.

2.2 The SCAMP Probabilistic Protocol[11]

Scamp (Scalable Membership Protocol) is a gossip-based protocol, which is fully decentralized and provides each node with a partial view of the membership. It is adaptive w.r.t. a-priori unknown size of the group, by resizing partial views when necessary.

Data Structures. Each node maintains two lists, a PartialView of nodes it sends gossip messages to, and an InView of nodes that it receives gossip messages from, namely nodes that contain its node-id in their partial views.

Subscription Algorithm. New nodes join the group by sending a subscription request to an arbitrary member, called a contact. They start with a PartialView consisting of just their contact.

(4)

When a node receives a new subscription request, it forwards the new node-id to all members of its own PartialView. It also creates c additional copies of the new subscription (c is a design parameter that determines the proportion of failures tolerated) and forwards them to randomly chosen nodes in its PartialView. When a node receives a forwarded subscription, provided the subscription is not already present in its PartialView, it integrates the new subscriber in its PartialView with a probability p = 1/(1 + sizeof P artialV iew_n). If it doesn’t keep the new subscriber, it forwards the subscription to a node randomly chosen from its PartialView. If a node i decides to keep the subscription of node j, it places the id of node j in its PartialView. It also sends a message to node j telling it to keep the node-id of i in its InView.

Unsubscription Algorithm. Assume the unsubscribing node has ordered the id’s in its Par- tialView as i(1), i(2), ..., i(l) and the id’s in its InView as j(1), j(2), ..., j(l). The unsubscribing node will then inform nodes j(1), j(2), ..., j(l − c − 1) to replace its id with i(1), i(2), ..., i(l − c − 1) respectively (wrapping around if (l − c − 1) > l). It will inform nodes j(l − c), ..., j(l) to remove it from their list but without replacing it by any node id.

Recovery from isolation. A node becomes isolated from the graph when all nodes containing its identifier in their PartialViews have either failed or left. In order to reconnect such nodes, a heartbeat mechanism is used. Each node periodically sends heartbeat messages to the nodes in its PartialView. A node that has not received any heartbeat message in a long time resubscribes through an arbitrary node in its PartialView.

Indirection. This mechanism lets new subscriptions to be targeted uniformly at existing members. This is done by forwarding the newcomer’s subscription request to a node that is chosen approximately at random among existing members. The interested reader may refer to [11] for further details.

Lease mechanism. Each subscription has been given a finite lifetime called its lease. When a subscription expires, every node holding it in its PartialView removes it from the PartialView.

Each node re-subscribes at the time that its subscription expires. Nodes re-subscribe to a member chosen randomly from their PartialView. Re-subscriptions differ from ordinary subscriptions in that the partial view of a re-subscribing node is not modified.

2.3 A Deterministic (DET) Protocol [2]

The DET algorithm deterministically avoids the partition of the overlay. In particular, it provides each member with a partial view of at least 2f + 1 members, where f is the number of tolerated failures. The other important feature of the algorithm consists in imposing a partial order on nodes to manage concurrent leaves that potentially may cause a partition.

Data Structures. Each node p_i maintains two sets sponsors_iand sponsored_i. The union of these two sets is the partial view of the nodes p_i sends messages to. An integer variable rank_i gives an indication of the position of p_i in the overelay, inducing a partial order on nodes. A boolean variable leaving is initialized to ⊥.

Initialization of the group. A set of nodes {p₁, ...p_{3f +1}} ⊆ Π totally interconnected and defined in the initialization phase instantiates the group. All these nodes have rank rank_i = 0. They are special nodes having the property that they never leave the group.

Subscription Algorithm. Each node p_i joins the group by sending² a subscription request to an

(5)

arbitrary set of members, called contacts. When p_ireceives 2f +1 acknowledgments: (1) p_iincludes in sponsors_i all the senders; (2) it sets rank_i = max(rank_k, ∀senderp_k)+1. The subscription operation locally returns. When p_i receives a subscription request from p_j and p_i is already a member: (1) p_i inserts p_j in sponsored_i; (2) it sends an acknowledgement to p_j along with its own rank rank_i. At the end of subscription operation, a newly joined member has 2f + 1 members around itself. Note that, differently from SCAMP, the newly node becomes a group member only after 2f + 1 connections to current members have been established.

Unsubscription Algorithm. Each node p_i leaves the group by setting leaving_i = > and by sending an unsubscription request to sponsors_i along with (i) its own rank rank_i and (ii) nodes is responsible for (sponsored_i). When p_i receives a majority of acknowledgments from its sponsors the unsubscription operation locally returns. When p_i receives an unsubscription request from p_j and rank_i < rank_j and leaving_i = ⊥ (p_i is not concurrently leaving): (1) p_i inserts the nodes p_j that was responsible for in sponsored_i; (2) it sends an acknowledgement to p_j and (3) sends a notification to all nodes previously sponsored by p_j to notify that p_j has been replaced by itself.

When p_i receives a notification from p_j it replaces the old sponsor with p_j.

We refer to the Appendix A for a deepening on (i) the way in which the algorithm guarantees connectivity, (ii) techniques to keep DET scalable in terms of node-degree and (iii) a comparison between SCAMP and DET.

3 Simulations

Simulation is conducted by using Ns-2 simulator [18]³.

The aim of the simulation is to evaluate the behavior of both SCAMP and Det under churn. In order to understand what is the real impact of dynamics (joins and leaves/sec) on these protocols we conducted simulations in which no failure is simulated but only events of join and leave.

3.1 Simulation Framework

Each simulation involves a global number of nodes n_tot. Each simulation is divided in four inter- vals: the bootstrap interval, the perturbation interval, the transitory interval and the measurement interval (see Fig.1).

t₀ t₁ t₂ t₃

∆b

n=n₀ n=n₁ ∆p ∆t ⁿ⁼ⁿ^f ∆m

t₀ t₁ t₂ t₃

∆b

n=n₀ n=n₁ ∆p ∆t ⁿ⁼ⁿ^f ∆m

Figure 1: Time intervals of a simulation run

2Each message is sent through a fair lossy link, the send primitive embeds a retransmission mechanism that ends to retransmit until an acknowledgement is received. The send primitive is supposed non-blocking.

3The choice of Ns-2 was mainly due to the reason that we want to use the full protocol stack and test our protocol at the application level to have as far as possible real scenario of the working of the protocol. But, also as remarked in [1], due to the exponential nature of the phenomena it was only possible to simulate for small view size and / or high churn rates.

(6)

The bootstrap interval. The bootstrap interval ∆_b is intended as the phase in which the group grows (until a desired value is reached) and no leave occurs. In the bootstrap interval the group starts at t₀ with n₀ bootstrap nodes. At the end of the bootstrap interval (t₁) the group contains n₁ nodes. This means that the membership changes in the bootstrap interval consist in n₁ joins.

The perturbation interval and the transitory interval. The perturbation interval ∆_p is intended as the interval in which all membership changes (joins and leaves) are injected in the system.

The transitory interval ∆_t is intended as the interval in which all membership changes injected in the perturbation interval take effect. In each simulation the group starts this interval at t₁ with a number of nodes n₁ obtained after the bootstrapping it ends at t₃ with a number of nodes n_f = ¹₂n_tot⁴. In the perturbation interval we have a total number of leaves equal to ¹₂n_tot and a number of joins equal to n_tot− n₁.

The measurement interval. In the measurement interval ∆_m all measures are taken. In particular we test for both protocols (i) the proportion of nodes reached by a set of (data) messages sent by each node during the measurement interval, (ii) the average node degree, i.e. the average number of active connections per-node. The first and second metrics are related to the level of reliability shown by the protocols. Moreover, the second metrics show the overhead of the protocol. In the case of our protocol we also test the average latency of leaves, i.e. the average time between the leave invocation and the actual departure of the node from the group.

All measures are taken by varying the dynamics rate in the perturbation interval. To charac- terize the dynamics we use the churn rate metrics. The churn rate is the ratio between the number of membership changes, i.e. joins and leaves, and the duration of the perturbation interval. By considering a fixed number of joins and leaves, the churn rate varies by varying the duration of ∆_p. In particular for each simulation ∆_p varies from 5sec to 200sec. Arrivals and up-times follow an exponential distribution.

Simulated Scenarios. All the following simulations have n_tot = 160. We have first compared the two protocols in a scenario in which no bootstrap occurs. Then we have also evaluated SCAMP in another scenario to study the impact of bootstrapping (see Appendix B). Table 1 resumes the simulated scenarios. The scenario in which ∆_b > 0 is subdivided in three scenarios. In particular, we have evaluated what is the SCAMP behavior when (i) 1/4n_tot= 40 nodes enter the perturbation interval and all of them leave the group during the perturbation (0% of stable nodes), (ii) 1/4n_tot = 40 nodes enter the perturbation interval and half of them leave the group during the perturbation (50% of stable nodes) and (iii)1/4n_tot= 40 nodes enter the perturbation interval and none of them leave the group during the perturbation (100% of stable nodes).

n0 n1 joins during ∆p leaves during ∆p nf % stable nodes

∆b= 0 - 1 160 80 80 -

∆b> 0 1 40 120 80 80

0 50 100

4While for SCAMP, leaves and joins are instantaneous, for DET joins and leaves take some time. For this reason we cannot always consider that. Also at t2 the total number of nodes is nf, as some join issued in the perturbation interval could take effect only in the transitory.

(7)

In the first scenario and in the second scenario, at the beginning of ∆_b, ∆_p respectively, the starting node has a partial view which contains only itself.

Each point in the plots has been computed as an average of 40 simulation distinct runs. For each point all the results of these runs were within 4% each other, thus variance is not reported in the plots.

Protocols parameters. As no failure are simulated, we consider for DET f = 0 and for SCAMP c = 0. Even if we do not consider failures we have implemented for SCAMP a heartbeat mechanism to avoid isolation due to leaves ⁵. The heartbeat mechanism we figured out forces a node to re- subscribe if it has not received any heartbeat from its InView in 2.5 seconds. In some plot we have also implemented the lease mechanism for SCAMP. We set the lease duration equal to 50 seconds⁶. The determination of the contacts in DET. In this version of DET we use the following mechanism to join the group: each joining node sends a message to a list of contacts in which the node with rank 0 is always comprised and the other nodes are arbitrary. This allows to always get an acknowledgment in a short time (from the node with rank 0) when other contacts are not in the group. When the joining node gets more than one acknowledgement it selects as its sponsor the node with highest rank. Since all contacted members add the joining node in their partial views even if this node will select only one sponsor among all contacts, extra-messages are needed to purge non-necessary connections ⁷. More sophisticated mechanism can be considered for the join operation (as pointed out in [2] and reminded in the Appendix) at the cost of a high latency upon join/leave operations.

The determination of the contacts in SCAMP. For SCAMP the contact is only one and there is no special node that always belongs to the group (as the node of rank 0 in DET). For this reason, in order to augment the probability of finding an active contact we have implemented the an extra mechanism in which the joining node broadcasts a message to Π and chooses its contact inside the list of active nodes that have replied to the broadcast. Clearly, for high churn rates this node may choose a contact that has become inactive immediately after the reply. Note that for SCAMP once a subscription is sent, the node is logically a member of the group. Then, an inactive contact is a real problem that affects reliability. Note that even the indirection mechanism does not solve this problem as it is a mechanism that works well in fairly static systems [11].

3.2 Experimental Results with no Bootstrapping

Evaluating Reliability of the Topology generated by SCAMP & DET. In the measurement interval the group is freezed in a certain configuration.

Thus, members at the beginning of this interval remains in the group till the end of the simulation and no new member is added. The plot in Fig.2(a) shows that DET is able to guarantee

5Isolation may occur since contacts leave the system without giving a notice to nodes which joined through it.

6This value could be chosen smaller or bigger, but a value of 50 secs allows a reasonable number of resubscriptions in the considered intervals both in terms of scalability and in terms of added reliability.

7Mechanisms to purge non-necessary connections are discussed in [2].

(8)

(a) (b)

Figure 2: A comparison between SCAMP and DET after a ∆_p with churn rate equal to 240/∆_p

that each message sent by a group member in the measurement interval is delivered by every group member independently of the churn rate suffered during the perturbation.

On the other hand, SCAMP is sensitive to different churn rates suffered in the perturbation interval. In particular, only with churn rates lower than 1.5 membership changes (joins or leaves) per-second the proportion of nodes reached by a multicast is the 80% of nodes. Plot in Fig.2(b) shows as the poor reliability of SCAMP is due to a small average degree (from 2.1 to 2.7). This degree is ever less than the threshold of log(n_f = 80) = 4.38 to be reached for a successful working of SCAMP.

The average node-degree of DET only points out that the built topology is a tree, it has no direct relation with reliability. In the next paragraph we discuss scalability of DET considering the node-degree distribution.

Evaluating Scalability of the Topology generated by SCAMP and DET. To evaluate the scalability for SCAMP and DET it is necessary to examine the structure of the topology that they build.

In particular for DET the size of the contact list has a huge impact on the overlay topology since a small contacts list contributes to keep the message overhead small but the obtained topology converges to a star topology with the node of rank 0 in the middle. In the Fig. 3 plots showing the distribution of the node-degree in case of a contact list with size equal to n_tot (Fig. 3(a)) and equal to n_tot/10 (Fig. 3(b)). Note that the size of the contact list is a predominant parameter with respect to the churn rate. In particular, if the contact list is small the topology converges to a star even for low churn rates (Fig. 3(b) curve for ∆_p= 200sec). With a large contact list the topology converges to a star (more properly, the topology shows a set of hubs) only for high churn rates (Fig. 3(a) curve for ∆_p = 5sec), but the tree become deeper for low churn rates (Fig. 3(a) curve for ∆_p= 200sec).

Thus, it is confirmed for DET that the faster joins (a join takes a time equal to the maximum round-trip time between contacts links) the least scalable is the topology.

Note that the more sophisticated mechanisms to join, pointed out in [2] and reminded in the

(9)

(a) (b)

Figure 3: Degree Distribution for DET at t₃ for |contacts| = n_tot and |contacts| = n_tot/10

Appendix A, try to maintain a small contact list and a scalable generated topology at the same time.

However, these mechanism with high churn rates may lead to unpredictable latency of join/leave operations.

On the contrary, SCAMP is always able to balance the degree for each node (see Fig. 4(b)) showing a great scalability. The churn rate impacts only on the average degree and in the distribution degree (see Fig. 4(a)) affecting reliability.

(a) (b)

Figure 4: Degree distribution and node-degree of each node for SCAMP

Evaluating Leave Latency for DET. We evaluate leave latency as the average time that passes from the invocation of a leave (the sending of an unsubscription message) to the actual departure of the node from the group (the receiving of an acknowledgement). Note that for each node of rank 1, this time is equal to the round-trip time on the link connecting the node with the node with rank 0. As the rank increases the latency may increase as well. In the worst case a node with rank i may concurrently invoke its leave with all nodes with lower rank belonging to its branch.

(10)

In this case the latency becomes proportional to the rank of a node. Three factors influence the latency of a leave (i) the depth of the tree (deeper trees bring higher latency), (ii) the rate of leaves (higher rates brings higher latency) and (iii) link delays. The first factor depends on the size of the contact list. In practice, with a contact list very small (as pointed out in the previous paragraph) the tree converges to a star. In this case the average latency is equal to the round-trip time on the link connecting the node with the node with rank 0. The third factor depends on the underlying network behavior, then it may unpredictable. To avoid that an unexpected network behavior biases our analysis we consider (only for this particular evaluation) that all links have a RTT equal to 0.02ms ⁸. Then, we have chosen to evaluate the leave latency in the case in which (i) all node leaves the system at the same time⁹, (ii) the contact list is very large (|contacts| = n_tot) and (iii) the churn rate is low (∆_p= 200s). In this way the tree is a branch and we can evaluate the worst case for leave latency but the best case for scalability of the topology.

In Fig. 5 the latency distribution and the number of conflicts for each node, i.e. the number of unsubscription messages received by a node when it was leaving, is shown.

(a) (b)

Figure 5: leave latency and number of conflicts for the most scalable topology generated by DET This behavior confirm that the most scalable topology for DET is at the cost of latency of leaves and joins (as pointed out in the previous paragraph). For this reason DET provides reliability at the cost of scalability (either in terms of a not scalable topology or in terms of join/leave latency).

3.2.1 The impact of the lease mechanism in SCAMP

The plots in Fig. 6 shows as the lease impacts the reliability of SCAMP under churn. In practice, the lease mechanism does not influence in the average the reliability of SCAMP. What the lease produces is a high clustering of the group, i.e. most of the nodes are very-well connected and some nodes are isolated. To point out this behavior see (i) Figure 7(a) in which the average number of isolated nodes is in SCAMP higher than in SCAMP without lease and (ii) Fig.7(b) in which not isolated nodes have an average degree higher than in the case of SCAMP without lease.

8This value has been chosen so small only for convenience.

(11)

(a) (b)

Figure 6: The impact of the lease in SCAMP on reliability and on the average degree

(a) (b)

Figure 7: Node isolation and degree for SCAMP with lease and SCAMP without lease

The reason underlying this behavior is that the lease mechanism forces even a connected node to re-subscribe contacting an arbitrary member of its partial view. With high churn this member may be inactive. Even if the lease is repeated the same scenario may occur. On the other hand, for those nodes that find an active node upon the resubscription, there is a new dissemination in the system of their node identifier that enlarges partial views. It is clear that in fairly static systems (systems with very low churn rates) the lease mechanism has a valuable impact as shown in [11].

4 Related Work

The group membership problem has been extensively studied, and many specifications and implementations exist in literature. The first papers to study the group membership problem were [9]

9It means that during the perturbation there are two phases: a growing phase in which all joins occur and then a phase in which all leaves occur at the same time.

(12)

(in the context of synchronous systems) and [19] (in the context of asynchronous systems). Later a good survey of group communication and group membership has appeared in [7]. The ISIS system [5, 4] and a group membership mechanism by Cristian [8] provide each node with the same sequence of group views, Similarly the Arjuna system [17] maintains a logically centralized group. These group membership mechanisms ensure greater consistency of group views at the expense of latency and communication overhead.

Probabilistic gossip-based algorithms are being widely studied now. These are easy to deploy due to their simplicity and implement a pro-active reliability mechanism [12]. However, this simplicity is achieved at the expense of more traffic on the network. While gossip protocols are scalable in terms of the communication load imposed on each node, they usually rely on a non-scalable membership algorithm. This has motivated work on distributing membership management [10, 11]

in order to provide each node with a random partial view of the system, without any node having global knowledge of the membership. However, Jelasity et al. in [14], through an extensive and valuable experimental analysis (not comprising SCAMP), point out the inability of GGMs to make a uniform sampling of peers. Allavena, Demers and Hopcroft have recently proposed a new scalable gossip based protocol[1] for local view maintenance without requiring the assumption of uniformly random views but based on a so-called reinforcement mechanism. They have also given theoretical proofs regarding the connectivity of the graph under churn. They also show that all GGM protocols that does not enjoy a reinforcement mechanism converge to star topology under churn.

Liben-Nowell et al. [16] has given a theoretical analysis of structured peer to peer networks operating in the face of concurrent joins and unexpected departures. They define a metric, half-life of the network, which essentially measures the time for replacement of half the nodes in the network by new arrivals. This metrics is coarser than churn rate which measures the number of membership changes per second. On the other hand the half-life is useful when the size of the network is fixed.

Stutzbach and Rejaie in [20] have tried to explain the dynamics of P2P network systems or the dynamics of peer participation in terms of churn. They have characterized churn in Gnutella Network based on the fine-grained monitoring of the entire population. They have observed that the behavior of arrival and departure follows a power-law distribution, contradicting previous studies assuming a poisson distribution of arrivals and an exponential distribution of departures.

5 Conclusion

Through an experimental analysis which compares two p2p group membership protocols, this paper has pointed-out a sharp trade-off between reliability of the generated overlay topology and its ability to scale under churn.

In particular, maintaining an overlay scalable under high churn rates and without sacrificing reliability, latencies of joins and leaves operations become unpredictable. On the other hand, keeping latencies reasonably small (at least predictable) under high churn rates without sacrificing reliability means obtaining not-scalable overlays as stars. In fact, the simulation study pointed out that to obtain overlay scalability and small join/leave latencies in dynamic systems, reliability is compromised. On the contrary, to obtain overlay reliability and join/leave latencies predictable in dynamic systems, overlay scalability is compromised.

(13)

References

[1] Andr´e Allavena, Alan Demers, John E. Hopcroft. Correctness of a Gossip Based Membership Protocol, To appear in Proceedings of ACM Conference on Principles of Distributed Computing (2005)

[2] Roberto Baldoni and Sara Tucci Piergiovanni, Group Membership for Peer-to-Peer Communication, Techni- cal Report May 2005, available on http://www.dis.uniroma1.it/∼midlab/publications

[3] Anindya Basu, Bernardette Charron-Bost, Sam Toeug: Simulating Reliable Links with Unreliable Links in the Presence of Process Crashes. Proceedings of the 10th International Workshop on Distributed Algorithms:

105 - 122 (1996)

[4] Kenneth P. Birman, Robert Cooper, and Barry Gleeson. Programming with process groups: group and multicast semantics. Technical report TR-91-1185. Department of Computer Science, Cornell University (1991)

[5] Kenneth P. Birman and Thomas A. Joseph. Reliable Communication in the Presence of Failures. ACM Transactions on Computer Systems, 5(1): 47-76 (1987).

[6] Miguel Castro, Peter Druschel, Anne-Marie Kermarrec, Antony Rowstron.Scribe: A Large-scale and De- centralized Application-level Multicast Infrastructure. IEEE Journal on Selected Areas in communications (2002)

[7] Gregory Chockler, Idit Keidar, Roman Vitenberg. Group Communication Specifications: a Comprehensive Study. ACM Comput. Surv. 33(4): 427-469 (2001)

[8] Flaviu Cristian. A Probabilistic Approach to Distributed Clock Synchronization. Proceedings of 9th Inter- national Conference on Distributed Computing Systems (Newport Beach, CA), pages 288-96 (1989). IEEE Computer Society Press.

[9] Flaviu Cristian. Reaching Agreement on Processor Group Membership in Synchronous Distributed Systems.

Distributed Computing, 4(4):175-187, April 1991.

[10] Patrick Th. Eugster, Rachid Guerraoui, Sidath B. Handurukande, Petr Kouznetsov, Anne-Marie Kermar- rec.Lightweight Probabilistic Broadcast. ACM Trans. Comput. Syst. 21(4): 341-374 (2003)

[11] Ayalvadi J. Ganesh, Anne-Marie Kermarrec, Laurent Massouli´e: Peer-to-Peer Membership Management for Gossip-Based Protocols. IEEE Trans. Computers 52(2): 139-149 (2003)

[12] I. Gupta, K.P. Birman, and R. van Renesse. Fighting Fire with Fire: using Randomized Gossip to Combat Stochastic Scalability Limits. Special Issue of the Journal on Quality and Reliability Engineering Interna- tional: Secure, Reliable Computer and Network Systems, 29(8):165–184, May 2002.

[13] The Aenea Gnutella2 Crawler. http://crawler.instantnetworks.net.

[14] M´ark Jelasity, Richard Guerraoui, Anne-Marie Kermarrec and Maarten van Steen. The Peer Sampling Ser- vice: Experimental Evaluation of Unstructured Gossip-Based Implementations, Middleware 2004, volume 3231 of Lecture Notes in Computer Science, pages 79-98. Springer-Verlag, (2004).

[15] John Jannotti, David K. Gifford, Kirk L. Johnson, M. Frans Kaashoek, James W. O’Toole. Overcast: Reliable Multicasting with an Overlay Network. Proceedings of 4th Symposium on Operating System Design and Implementation, pages 197-212, San Diego (2000)

[16] David Liben-Nowell, Hari Balakrishnan, David Karger: Analysis of the Evolution of Peer-to-Peer Systems.

Proceedings of ACM Conf. on Principles of Distributed Computing (2002).

[17] Mark C. Little and Santosh K. Shrivastava. Replicated k-resilient Objects in Arjuna. Proceedings of Workshop on Management of Replicated Data, pages 53-8, Houston, Texas (1990).

(14)

[18] Ns-2 simulator. http://www.isi.edu/nsnam/ns.

[19] A. M. Ricciardi and K. P. Birman. Using Process Groups to Implement Failure Detection in Asynchronous Environments. In Proc. of the 10th ACM Symposium on Principles of Distributed Computing, pages 341-352, August 1991.

[20] Daniel Stutzbach, Reza Rejaie, Towards a Better Understanding of Churn in Peer-to-Peer Net- works.Technical Report CIS-TR-04-06, Department of Computer Science, University of Oregon, November, 2004

(15)

Appendix A

DET Behavior

When a node p_i terminates the unsubscription operation d(2f + 1)/2e = f + 1 sponsors of p_i has already become the new sponsors of those nodes previously sponsored by p_i. Those new sponsors have a rank greater than the rank of the newly sponsored nodes. Moreover, as p_i can pick at random those k + 1 own sponsors among its 2k + 1 sponsors, different graphs may arise. The algorithm works since, among 2f + 1 sponsors, at least f + 1 are correct. In this way, any subset of f + 1 nodes picked up by a leaving node contains at least one correct node. This assures that upon a leave of a node p_i, around each node sponsored by p_i the sponsors number does not decrease, i.e.

it remains always at least equal to 2f + 1, among which only f nodes are faulty.

Note that that algorithm manages concurrent leaves that potentially may partition the overlay.

Figure 8a shows the scenario in which a concurrent departure of all node of rank 1 and rank 2 leads to a partition. In practice, the algorithm sequences the leaves, by allowing a leave of a node of

0

0 0 0

1

2 2 2

subgraph

(a)

0

0 0

0 ^subgraph

(b)

0

2 2 2

subgraph 0

0

2 2 2

subgraph

(c)

Figure 8: non-sequenced and sequenced departures of all nodes of rank 1 and 2.

rank rank_j only if none of its sponsors p_i with rank rank_i < rank_j is concurrently leaving. Note that p_j remains blocked as long as new sponsors of p_j concurrently leave. Eventually, if all nodes with rank less than rank_j leave, then p_j will have all sponsors with rank 0. Let us remind that nodes with rank 0 cannot leave, then p_j may eventually leave. Fig. 8c shows how the algorithm avoids the partition by allowing the departures of only nodes with rank 1 upon the concurrent leave invocation of all nodes of rank 1 and 2. Note that DET, differently from SCAMP, delays the actual departure of a leaving node. In the simulations we measure the average delay suffered by leaving nodes upon their departure.

Techniques to keep scalable partial views in DET. The overhead per-node is proportional to its degree. This degree (number of sponsors + number of sponsored) may be predefined or mechanisms to adapt the degree to the total number of nodes may be figured out. First of all, if the number of accepted connections (number of sponsored) is defined and never changes, it means that when a node p_i leaves, some of its sponsored nodes may be not accepted by a p_i’s sponsor. In this case the non-accepted nodes should re-join the group.

An adaptive mechanism for resizing partial views can be the same used by SCAMP. In practice, an indirection mechanism may be pursued to allow that a joining node chooses its contact uniformly at random, starting from a node with rank 0. Moreover, each node p_i accepts with a probability equal to 1/(|sponsored| + 1) both (i) a newly subscription by a node p_j (in this case p_i is the

(16)

contact of p_j) and (ii) an already group member p_k whose sponsor has left. If p_i does not accept the subscription of a newly node p_j, p_j is in charge to retransmit the subscription (to a node uniformly chosen at random). If p_i does not accept an already member p_k, p_i sends a message to p_k to warn it that its node-degree is being decreased. When p_k knows that its degree has been decreased, it can re-subscribe (maintaining its current partial view) in order to come back to its previous node degree by finding another sponsor.

It is worth to note that these sophisticated mechanisms to maintain scalable partial views have a high cost in terms of join/leave latency. In fact, a joining node should wait that 2f + 1 members accept its connection and a leaving node should have 2f + 1 sponsors before leaving. In a fairly static system the latency experienced would be reasonable, but it is not the case when the group dynamics is high.

It means that scalability of partial view is achieved at the cost of latency.

SCAMP vs DET. In SCAMP when a node wants to join or leave, it simply sends a join or leave message and does not wait for acknowledgement messages where as in DET whenever a node wants to join or leave the group, it has to wait for the acknowledgement message after sending the request. The advantage of this type of behavior is that it supports connectivity. But obviously this advantage is not without any cost. The cost incurred is the latency in the join and leave process.

In this respect DET requires a more responsible node behavior than SCAMP when a node leaves the group. This may be a drawback in a peer-to-peer environment in which nodes are supposed to be selfish, i.e. they do something only if they gain something.

Appendix B

The Impact of Bootstrapping in SCAMP

In these simulations we want to evaluate the SCAMP behavior under the condition in which leaves occur only after the group has reached a size equal to a good fraction of the total number of node involved. For this reason we have considered that in the bootstrap phase the protocol builds a starting group of n₁ = 1/4n_tot = 40.

Evaluating the join dynamics rate in ∆_b. SCAMP is sensitive to different join dynamics rates (number of subscriptions per second) injected in the bootstrap interval to get quota n₁. The following plots (Fig. 9(a) and Fig. 9(b)) shows that the n₁ nodes are better connected if the join dynamics rate is at most equal to 2 joins per second. Note that the partial view size (in the growing phase to equal the node degree) does not increase even if the rate is lower than 2s⁻¹. Then, in the following we consider a bootstrap interval of 40s, i.e. a join dynamics rate of 1s⁻¹.

Evaluating Reliability of the Topology generated by SCAMP. In the plots of Fig.10 the effect of bootstrapping is evaluated in three different scenarios: (i) the whole group obtained after the bootstrap stably remains in the group for the entire duration of the perturbation, (ii) only half of the group obtained after the bootstrap remains stably in the group for the perturbation duration and (iii) the whole group obtained after the bootstrap leaves the group during the perturbation.

(17)

(a) (b)

Figure 9: Measures taken at time t₁ after a join dynamics rate of 40/∆_b

The worst case is in the third scenario. At first sight it could seem surprising that this scenario is even worse than the scenario with no bootstrapping (see Fig.2(a)). The explanation behind this behavior lies in the fact that in the bootstrap interval SCAMP creates a cluster very well-connected of 40 nodes. This cluster remains poorly connected with other nodes that successively enter the group during the perturbation¹⁰. Thus, if nodes of this cluster leave (0% stable nodes), the graph at the end of the perturbation interval is poorly connected or may be directly partitioned. For the same reason, the reliability provided by SCAMP in the best case, showed in the first scenario, does not depend on the churn rate.

(a) (b)

Figure 10: SCAMP reliability with ∆_b > 0 and a ∆_p with churn rate equal to 200/∆_p

10This is due to the fact that nodes joining the group during the perturbation may never get the forwarded subscriptions of the nodes forming the cluster, as the forwarding is already over. In this case the new nodes may never add in their partial views the identifiers of cluster nodes.