Group Communication in Ad-hoc Networks: a few Results and a Case Study

(1)

Group Communication in Ad-hoc Networks: a few Results and a Case Study

^∗

Roberto B

ALDONI^†

Jean-Michel H ´

ELARY^∗

Sara T

UCCI

-P

IERGIOVANNI^†

†

DIS, Universit´a di Roma La Sapienza, Via Salaria 113, Roma, Italia.

∗

IRISA, Campus de Beaulieu, 35042 Rennes-cedex, France.

baldoni,tucci@dis.uniroma1.it helary@irisa.fr

Abstract

In the context of mobile ad-hoc networks, this paper stud- ies the problem of reliable communication among a set of processes forming a group in a dynamic system with an un- bounded number of processes. A communication is reliable if any message sent by a member of the group that never leaves or leaves the group in an intentional way is eventu- ally delivered to every member that doesn’t leave the group.

The problem is studied in a setting where communication links model a temporary disconnection among group mem- bers. The paper shows that additional assumptions on the system model are necessary to assure reliable communica- tion. Then it presents a protocol, based on these assump- tions, solving the problem in a meeting/conference room ap- plication scenario.

1. Introduction

An ad-hoc network is a network established ”on the fly”

by a group of (mobile) nodes. The communication does not rely on an underlying (preexisting) infrastructure: the communication infrastructure is made by the very mobile nodes. This feature makes ad hoc networks potentially more pervasive than fixed networks. For instance, an ad-hoc network allows communication where there is no possibility to install a fixed infrastructure or where a disaster has dam- aged the existing infrastructure. Additionally, ad hoc networks are inherently scalable: there is no upper bound on the number of nodes constituting the network. Per contra, the reliability of the communication (as the probability that a message reaches intended receivers) may be threatened by the absence of a fixed infrastructure.

Our aim is to study from a theoretical point of view re- liable communication among a group of nodes/processes running a distributed application in an ad-hoc network. To

∗The work described in this paper was partially supported by the Italian Ministry of Education, University, and Research (MIUR) under the IS- MANET project.

model this environment, we assume that (i) communica- tion among nodes occurs through intermittently links which model temporary disconnections and (ii) no upper bound on message transfer and node scheduling delays holds (asyn- chrony).

In fixed networks it is well known that a message may be reliably sent, i.e. there exists the guarantee that the message will reach the (non-faulty) destination, only if the sender never fails. A non-faulty sender may reliably send its messages since it can ideally retransmit messages infinitely of- ten (coping with message losses). Extending this result to a mobile context, we immediately find that a stationary node, i.e. an up node indefinitely participating in the group, may reliably send its messages. Our aim is going beyond this trivial result, studying under which conditions there exists the deterministic guarantee that a non-stationary node that intentionally leaves the group (from now on called tran- sient) may reliably send its messages. The importance of this study is tightly related to the following reasons: (i) a transient sender models many real scenarios, e.g. a user of a conference/meeting room ad hoc-networking (building coverage) that can control its own ability to communicate with other connected users (e.g. it does not move outside the building) while the application is running and (ii) in a mobile environment a stationary node represents an excep- tion, then it may be useful extending reliability to messages sent by non-stationary nodes.

Therefore, we present a formal specification of the communication reliability problem inside a group (called DGC and formally stated in Section 4.1) well suited to mobile en- vironments. The paper shows that the DGC problem can be solved only if the system model is augmented with some additional assumption. The paper also presents a simple protocol solving DGC in a conference/meeting room application scenario.

The paper is organized as follows. Section 2 presents the system model. Section 3 addresses the problem of reliable point-to-point communication. In Section 4 the DGC problem is specified, and some impossibility result is presented.

In Section 5 the case study of a conference/meeting room

(2)

application is considered and a protocol tolerating t faults is presented. Section 6 presents the related work.

2. System Model

The system is composed by a set Π of processes. The cardinality of Π is unbounded and finite. Processes communicate via message passing, the system is asynchronous:

there is no global clock and there is no timing assumption on process scheduling and message transfer delays. The group G is a subset of Π. At any time the size of G is unbounded.

The rules defining the membership of G are the following:

• a node p ∈ Π becomes a member of G immediately after it generates the joinp(G) event.

• a node p 6∈ G if it has not still generated joinp(G) or immediately after it generates leave_p(G).

• a node can join G only once. It can rejoin with another identifier.

Therefore, p belongs to G during the time between the joinp(G) and the leavep(G) events.

Processes may be correct or faulty. A faulty process may fail either by crashing or may suffer a permanent partition from the group. We indicate both these events as f ail_p. After a f ail_pno event occurs at p. A correct process never produces f ail_p.

Moreover, processes may be stationary or transient. A stationary process p never generates leave_pneither f ails_p. Then, a stationary process is correct. Note that a stationary process can join the group at any time. A transient process p produces the leavepevent within a finite time. Note that a transient process p does not suffer any failure before leaving the group, i.e. till leavep. This means that leavepmodels only an intentional exit completely under the control of the process. A transient process p may be correct or faulty (after leavepgenerates f ail_p).

A pair of processes p, q ∈ Π may communicate along point-to-point unidirectional fair lossy links. A link from a process p to a process q is characterized by the following properties [5]: (i) Uniform Integrity : for all k ≥ 1, if q receives a message m from p exactly k times, then p sends m to q at least k times, and (ii) Fair Loss : if p sends a message m to q an infinite number of times, then q receives m from p an infinite number of times or q is faulty or q leaves.

The event f lp2pSendp(m, q) denotes the sending by p of message m to q through a fair lossy link, and the event f lp2pReceivep(m, q) denotes the receipt by p of message m, sent by q over a fair lossy link (both events occur at p).

3. Reliable point-to-point Communication

In this section the problem of reliable point-to-point communication is presented. We first present the formal

specification that a reliable point-to-point communication should satisfy in a mobile environment, then we show some impossibility result.

3.1. Specification

The point-to-point reliable communication abstraction between two processes p and q involves the two statements rp2pSend(m, q) andrp2pReceive(m, q). The execution of therp2pReceive(m, q) statement by p, produces only one event, namely rp2pReceivep(m, q).

The communication statements are captured by the following specification.

reliable delivery. If a stationary or transient process p executes a rp2pSend(m, q) then rp2pReceiveq(m, p), leaveq or f ailq occurs.

no creation. If rp2pReceive_p(m, q) occurs, then m was previously sent trough arp2pSend(m, p) by q.

no duplication. For each message m,

rp2pReceivep(m, q) occurs at most once.

3.2. Implementation Issues: a few Results

Formally, our aim is to address the problem of implementing the reliable point-to-point communication abstraction specified above.

Let us assume that p executes a rp2psend(m, q) statement and that p generates its leave_p event after the statement returns.

A simple implementation of such a statement can be done by continuously producing the event f lp2pSendp(m, q) [5]. However, an acknowledgement mechanism must be added to this simple protocol, to deal with the case where p is transient. In fact, without such a mechanism, p needs to generate an infinite number of f lp2pSendp(m, q) events in order to be sure that m reaches q [5] . This implies that reliable delivery cannot be assured by this communication protocol in a finite time, i.e.

reliable delivery cannot be assured for transient senders that by definition generate leavep at some (finite) point of time. From this arguments we have the following trivial assertion:

Assertion 1. The use of an acknowledgement mechanism between p and q is a necessary condition to implement reli- able point-to-point communication.

Let us now consider the following protocol implement- ingrp2pSend(m, q) andrp2pReceive(m, p):

1. Each process p has a variable tx_p, and repeatedly sends each message m ∈ tx_pto q, using a fair lossy point-to-point link, producing a sequence of f lp2pSend_p(m, q) events.

2. When p executes a rp2psend(m, q), it produces an event rp2pSend_p(m, q) and then inserts m in txp.

(3)

3. Every time q receives m from p, (i.e., each time f lp2pReceiveq(m, p) occurs), it sends an acknowledgment for m to p generating thus a f lp2pSend_q(ack(m), p) event.

Moreover, the first time q receives m from p, an event rp2pReceiveq(m, p) is generated.

4. When ack(m) is received by p, it generates an event f lp2preceivep(ack(m), q) and discards m from txp (i.e., it stops sending m).

5. The statementrp2pSend(m, q) returns when tx_p= ∅.

Consider the case where q is stationary. If p is station- ary or transient, eventually q receives m and p receives ack(m). Then when p executes a rp2psend(m, q) the statement assures reliable delivery in a finite time. It fol- lows that either p is stationary or transient the specification is satisfied. Let us note that, in this case, the protocol is also quiescent [2], i.e., the protocol eventually stops send- ing messages on the fair lossy link between p and q.

Let us now assume that q is transient or is faulty. It might happen that q leaves (e.g., if tx_q = ∅) and then crashes or the link fails, or that it is faulty, and consequently that p never receives from q an acknowledgement for m. If p is stationary, then the protocol still works, because, from the reliable delivery property, q may either leave (in that case, the protocol is no more quiescent, because p gener- ates an infinite sequence of f lp2pSendp(m, q) events) or fails. However, the protocol cannot guarantee p that m was reliably sent in finite time, it cannot satisfy the reliable de- livery property for a transient p violating the specification.

From the previous considerations, we have the following impossibility result.

Impossibility Result 1. There exists no reliable point-to- point communication protocol if both sender and receiver are transient.

Proof. (Sketch) Let us assume now that p and q are both transient processes and leave at the same time and both sent a message over the fair lossy link. Doing so, q and p de- lay their own leave event enough time to receive an ac- knowledgement from p and q respectively. It is clear that a deadlock arises. This deadlock forces both processes de- lay indefinitely the generation of their leave contradicting the fact that they are transient processes.

We can conclude that to realize a reliable point-to-point communication, over a fair-lossy link, the following necessary condition must hold:

Condition 1. Between two processes linked by a fair lossy link, either the sender or the receiver is stationary.

Our result shows that any implementation – even a non- quiescent one – is impossible even between non-faulty processes but with the possibility of intentional leaves. This apparently surprising result contradicts the common intu- ition, according to which intentional leaves are “easier” to

deal with than crashes, as the latter can be seen as “uninten- tional” leaves. The reason lies in the fact that in our model, messages sent by a transient process must have guarantee to be received, due to the fact that transient processes can be correct, while in the crash model without intentional leaves, messages sent by a faulty process have no guarantee to be received.

4. Reliable Dynamic Group Communication

In this section we discuss reliable communication inside a dynamic group. Interestingly, this section shows that a protocol implementing reliable communication inside a group cannot be designed if the group never contains at least one stationary node.

4.1. The DGC Problem Specification

We now formally define our problem, namely Dynamic Group Communication (DGC) problem.

A process can communicate inside the

group through the rsend group and

rreceive group statements. The execution of the rreceive group(G, m) statement by p, produces only one event, namely rreceive groupp(G, m). The communication statements are specified as it follows:

reliable delivery. If a stationary or transient process p ∈ G executes arsend group(G, m), then for each q ∈ G rreceive groupq(G, m), leaveq(G) or f ailqoccurs.

no creation. If rreceive groupp(G, m) occurs, then some process has previously executed rsend group(G, m).

no duplication. For each message m,

rreceive groupp(G, m) occurs at most once.

Note that the first property assures that each message sent by (i) a stationary process or (ii) a process that correctly leaves the group, will be eventually delivered by all station- ary nodes that either already belong to the group when the message is sent or will join the group in a successive point of time.

4.2. Impossibility Results

The presence of a stationary process. The following impossibility result states that if the group never contains a stationary process, the DGC problem cannot be solved.

Impossibility Result 2. If G does not eventually contain a stationary process, then DGC problem cannot be solved.

Proof. (sketch) By contradiction, suppose that Π is only formed by transient processes. Let us suppose that each process q ∈ Π belongs to G between time moments t⁰_q and

(4)

t^f_q. Let p be a member that wants to send a message m to the group, at a moment of time when G = {q}. As the system is asynchronous, the delay experienced by m on the fair lossy link connecting p and q, could be greater than t^f_q − t⁰_q and then the message of p does not meet q. The same scenario could occur for each p’s message. It implies that p may not reach any group member in a finite time. That forces p de- lays indefinitely the generation of its leave_p contradicting the fact that p is a transient process.

The initialization problem. In an ad-hoc network there are several methods to discover other nodes in the network, e.g. message broadcasting, lookup of devices using the same base station or gateway, etc. Without loss of gener- ality we suppose that a mobile node, that wants to join G, repeatedly broadcasts a special message ALIVE to discover other nodes of the network belonging to G. This mecha- nism provides the node with a local view of the group pos- sibly partial (the local view does not contain some process in the group) and inaccurate (the local view contains some process not in the group) at some point of time. A not par- tial local view at any point of time is complete. Formally, p is provided with a local view LVp satisfying the follow- ing invariant LVp ⊆ Π. After joinp(G) occurs, p is able to establish a fair lossy link with any process that belongs to LVp∩ G.

The following impossibility result states that the DGC problem cannot be solved if a process p generates join_p(G) when LV_p = ∅. Intuitively, that impossibility result is due to the asynchronism of the system that prevents LV_p to be complete: a joining process can erroneously infer to be alone.

Impossibility Result 3. If when join_p(G) occurs, LVp∩ G = ∅, DGC problem cannot be solved.

Proof. (sketch) Let us assume a process p generates joinp(G) at some point of time t and at this point of time LVp = ∅ and G = ∅. Without loss of generality, let us assume that LV_phas never contained any process q. Sup- posing that the discovery mechanism on p is the broadcast of ALIV E, it implies that no ALIV E message issued by other processes in the group has yet arrived to p. Then, p generates join_p(G) basing on a timeout strategy ¹. How- ever, if p generates join_p(G) basing on a timeout strategy it does not mean that the group is empty in every run. For sake of simplicity let us suppose that at time t, p is near to only one stationary process q of G. It means that they are able to communicate with a fair lossy link. Let then assume that they both have sent a ALIVE message at time t. Let us suppose that both these messages suffer a delay ∆. How- ever, due to the asynchronism, it may happen that when the

1Timeout strategy abstracts every local deterministic rule, e.g. a time- out elapsing after k broadcasts of ALIVE.

time out elapses on p at time t⁰ the ALIV E message sent by q is not yet arrived to p as t + ∆ > t⁰. In this case p enters the group, generating join_p(G) with LVp∩ q = ∅.

If p is transient it leaves the group at some point of time t⁰⁰. If t⁰⁰ < t + ∆ then as long as p belongs to the group LVp∩ q = ∅. In this case even every message m sent by p through arsend group(G,m)may not reach q violating reliable delivery.

This impossibility result is very interesting. The first im- plication is that join_p(G) can be generated only if p has already contacted at least one member of G, i.e. if it has re- ceived at least one ALIV E sent by a member of the group.

However, if the group has not yet contained any process, ev- ery process p that wants to join (e.g. they want to run a con- ference/meeting room application) cannot get any ALIV E message from a member of the group, as no member there exists. In this case, none joins the group and the application never starts. Then, an initialization problem arises: there is the need of a special process that creates the group. Clearly, if this special process leaves the group, at least another process must be inside the group, otherwise the group should become empty preventing successive joins. Then, at any point of time |G| > 0.

5. Protocol for a Meeting Room Scenario

In this section we propose a protocol well suited to a group of nodes participating a meeting/conference room application. A group member is an user that runs the meeting application. Accordingly to this scenario each process p ∈ G has a local view that eventually includes all station- ary members of the group. The user may exploit an inter- mittent link with each member contained in its local view.

It implies that the user is eventually connected to stationary processes and it may leave the room for a short period of time without losing the connection. The user fails if it definitively exits the room without previously leaving the group. The protocol tolerates t faults (process or connec- tion failures).

To overcome the impossibility results, let us assume:

Assumption 1. The existence of t + 1 stationary processes that eventually joins G.

This assumption and local views eventually including stationary processes imply that at least one stationary node is eventually reachable by each member of the group.

Let us consider the following protocol implementing the rsend group(G,m) and rreceive group(G,m) statement. The protocol provides that when a process p be- longing to G wants to leave it executes aleave(G)state- ment whose aim is to produce the leave_pevent.

Each process p endows the following data structures:

state. state ∈ { in, leaving}. It is initialized to in.

(5)

charges. set of messages. A message m ∈ charges is repeatedly sent to each member of LV_p, using a fair lossy link. It is initialized to ∅.

1. A process p, executing rsend group(G,m), if state = in, inserts m in charges.

2. Each time p receives m from ` and state = in, it sends back ack(m) to `. Moreover, p inserts m in charges. If m is received for the first time, rreceive group occurs.

3. When process p receives t + 1 ack(m) and state = leaving, it discards m from charges.

4. A process p, executing leave(G), sets state to leaving, then waits until charges = ∅ and generates leavep.

From step 4, it follows that a process p, sender of a message m, waits an acknowledgment ack(m) from t + 1 processes. From Assumption 1 and local views eventually including stationary processes, it follows that p will even- tually receive such a number of acknowledgments in a fi- nite time preserving reliable delivery in case of a transient sender. Moreover, if p gets t + 1 acknowledgments, at least one stationary or transient process q is “responsible” to re- lay to G the message m. If q is stationary, retransmission will be done forever to all the members of G, ensuring thus reliable delivery. If q is transient, then it relays the message to other t + 1 processes, i.e. a transient or stationary process is responsible of m. As Π is finite the protocol works.

6. Related Work

Communication problems have been extensively ad- dressed in the literature, in the context of static distributed, asynchronous, unreliable systems [1, 3, 8] (to cite just a few). According to the reliability assumptions and to the problem specifications, some problems have been shown to be impossible. For example, [5] shows that Uniform Reliable Broadcast cannot be solved in distributed asynchronous systems where a majority of processes can crash and links are only eventually reliable, but can be solved if less than half of processes can crash and links are only fair lossy. [2] focuses on the problem of reliable point-to-point communication. They show that reliable links can be im- plemented over fair-lossy links, even when processes can crash, but this implementation cannot be quiescent unless the system is enriched with a failure detector having unbounded output. In the context of dynamic systems with unbounded number of participants there are fewer results.

In [6] the authors introduce a new problem, namely Dy- namic Atomic Broadcast, and they give a solution to this problem in a synchronous crash failure model.

All above approaches, even though consider a dynamic system, are not specific to mobile systems. A group communication service specific for mobile ad hoc networks was proposed in [11]. In particular, each mobile node is pro-

vided with a group view built by a three-round group membership protocol based on the three round protocol proposed in [7]. They showed how the mobility of nodes injects inaccuracy in the determination of group membership. This is in addition to membership inaccuracy due to failures and recoveries in static distributed systems.

Broadcast algorithms [4, 9, 10, 12] traditionally referred in ad-hoc mobile networks work at level of the physical network: the broadcast consists in a flooding to reach all nodes physically close each other. The group, instead, lives on the top of the physical network (e.g. at application level) in which a node physically near to a group member may be not a group member.

References

[1] Y. Afek, H. Attiya, A.D. Fekete, M. Fischer, N. Lynch, Y. Man- sour, D. Wang, L. Zuck: Reliable communication over unreliable channel. JACM, 41(6):1267-1297, 1994.

[2] Marcos Kawazoe Aguilera, Wei Chen, Sam Toueg: Heartbeat: A Timeout-Free Failure Detector for Quiescent Reliable Commu- nication. WDAG 1997: 126-140.

[3] B. Awerbuch, S. Even: Reliable broadcast protocols in unreliable networks, Networks: An International Journal, 16, 1986.

[4] K. M. Alzoubi, P.J. Wan, O. Frieder: New distributed algorithm for connected dominating set in wireless ad hoc networks. In pro- cedeeings of IEEE HICSS35, 2002.

[5] A. Basu, B. Charron-Bost, S. Toueg: Simulating Reliable Links with Unreliable Links in the Presence of Process Crashes.

WDAG 1997: 105-122.

[6] Z. Bar-Joseph, I. Keidar, N.A. Lynch: Early-Delivery Dynamic Atomic Broadcast. DISC 2002: 1-16.

[7] Flaviu Cristian, Frank Schmuck: Agreeing on Processor Group Membership in Timed Asynchronous Distributed Systems.

UCSD Technical Report CSE95-428, University of California, San Diego, 1995.

[8] V.Hadzilacos, S. Toueg: Fault-Tolerant Broadcast and Related Problems. In S. Mullender, editor, Distributed Systems, chap. 16.

Addison Wesley, 1993.

[9] HyoJun Lim, Chongkwon Kim: Flooding in Wireless Ad hoc Networks, Computer Communications, Vol. 24, (2001) [10] S. Ni, Y. Tseng, Y. Chen, and J. Sheu: The broadcast storm prob-

lem in a mobile ad hoc network. In Proceedings of the Fifth An- nual ACM/IEEE International Conference on Mobile Computing and Networking: 152–162 (1999).

[11] Ravi Prakash, Roberto Baldoni: Architecture for Group Commu- nication in Mobile Systems. In proceedings of 17th IEEE Sym- posium on Reliable Distributed Systems: 235-244 (1998) [12] Yoav Sasson, David Cavin, Andre Schiper: Probabilistic Broad-

cast for Flooding in Wireless Mobile Ad hoc Networks. In Pro- ceedings of IEEE Wireless Communications and Networking Conference.