33 Jean-Michel Hklary, Michel Raynal Roberto Baldoni Rollback-Dependency Trackability: Visible Characterizations

(1)

Rollback-Dependency Trackability:

Visible Characterizations

Roberto Baldoni

Dipartimento di Informatica e Systemistica Universit6 di Roma ” La Sapienza”

Via Salaria 113, Roma, Italy E.mail: baldoniadis .uniromal . it

Abstract

When we consider an asynchronous distributed computation on which local checkpoints have been defined (namely, a communication and checkpoint pattern - in brief, CCP), two types of dependencies between its local checkpoints can be observed. The first type is due to causal sequences of messages that establish on-line trackable dependencies. The second type is due to non- causal sequences of messages (called Z-paths) that establish “hidden” dependencies between local checkpoints (a dependency is “hidden” if it can not be tracked on- line). The Rollback Dependency Trackability (RDT) property, defined by Y.-M. Wang, has been introduced to study CCPs. A CCP satisfies the RDT property if every pair of local checkpoints that are connected by a “hidden” dependency are also connected by a causal sequence of messages. The RDT property has a great interest: CCPs that satisfy this property allow relatively simple solutions to a lot of practical problems. This paper first introduces the notion of RDT-compliant property. In a given CCP, an X-path is a Z-path that satisfies a property X. The property X is RDT-compliant if every CCP without X-paths satisfies the RDT property.

Then, the paper presents a particular RDT-compliant property. This property enjoys several very interesting features. (1) It is “visible” (i.e., it can be tested on-line).

(2) It is stronger than previously known RDT-compliant properties. Consequently, this property provides a characterization of RDT better than the previous ones. The question of the minimal characterization of the RDT property is finally investigated.

1 Introduction

Long running scientific applications and service provid- ing facilities use rollback-recovery techniques to increase their fault tolerance and their availability. This is done by saving onto stable storage the state of processes (i.e.,

Permission to make digital or hard copies of all or part ofthis work for personal or c~assroon~ use is granted without fee provided that copies arc not made or distrihutcd for protit or commercial advantage and that copies bear this notice and the full citation 011 the first page. ‘I’0 copy other&c. to republish, to post on servers or to redistribute to lists.

rquircs prior specific permission and:or a fcx.

ISPD ‘09 Monterey CA USA

Jean-Michel Hklary, Michel Raynal

IRISA Campus de Beaulieu 35042 Rennes Cedex, FRANCE E.mail: {helary,raynal}@irisa.fr

local checkpoints) involved in the application. These savings aim at reducing either the service-down time or the amount of lost work in case of a failure (e.g., soft- ware bug or hardware failure). The application can then resume from a consistent global checkpoint that consists of a collection of local checkpoints one for each process in which no one happens-before another [3, 4).

Rollback-recovery involves major problems such as recovery line computation, garbage wllection of old checkpoints and output commits. Simple and decentralized solutions to the previous problems can be achieved if one is able to eficiently calculate the maximum and the minimum consistent global checkpoint that contain a given set of local checkpoints (91. These solutions would en- large the set of distributed applications that can benefit of the rollback-recovery technique.

Checkpoint-based rollback-recovery is one of the rollback recovery techniques. It is popular, because, unlike other approaches (e.g., log-based rollback-recovery) it does not require applications to satisfy some particular assumptions, such as Piecewise Deterministic behavior.

It relies only on checkpoints, and studying its properties passes through the study of a checkpoint and communication pattern of a distributed computation which consists of the set of local checkpoints of that computation and a dependency relation on those checkpoints. In a pi- oneering work [5], Netzer and Xu have shown that such dependencies are actually created by sequences of messages, called zigzag-paths. They have proved an important theorem stating that any set of local checkpoints can be extended to obtain a consistent global checkpoint if and only if they are not pairwise related by a zigzag- path’. Two categories of zigzag-paths have been iden- tified: causal and non-causal (Z-paths). Dependencies created by causal paths are on-the-fly trackable (e.g., by means of a transitive dependency vector). On the contrary, Z-paths create “hidden” dependencies.

In a seminal paper [9], Y.M. Wang has pointed out that if all the dependencies between local checkpoints are on-the-fly trackable (i.e., non-hidden dependencies), one can calculate very efficiently the minimum and the

‘This result is actually still valid in a more general setting including shared memory as well as message passing systems [2].

(2)

maximum consistent global checkpoints that contain a given set of local checkpoints. To formalize this situa- tion, Wang has defined the Rollback-Dependency Back- ability property (RDT) which imposes some restriction on the checkpoint and communication pattern associated with a distributed computation. A checkpoint and communication pattern satisfies the RDT property if all Z-paths occurring in this pattern are causally doubled.

A Z-path is doubled by a causal one if the pair of checkpoints related by that Z-path is also related by a causal path. In short, the RDT property can be formulated as

“there is no non-doubled Z-paths”.

Since the RDT property has been introduced, several characterizations of this property in terms of checkpoint and communication patterns have been proposed.

One of these characterizations has been given by Y.M.

Wang in [9]. Called FDAS (Fixed Dependency After Send), it requires that, in the checkpoint and communication pattern associated with the computation, “after the first message-sending event in any interval, the transitive dependency vector remains unchanged until the next checkpoint”. According to the terminol- ogy introduced in the present paper, this characterization can be reformulated as “if there are no prime Z- paths, then the RDT property is ensured”. These characterizations have revealed to be important not only from a theoretical point of view, but also from a practical one, when considering the task of maintaining the RDT property on-the-fly and without adding control messages while letting processes to take local checkpoints independently. Protocols that achieve this goal direct processes to take additional forced checkpoints when the RDT-characterization upon which they are based is about to be violated. Such protocols are called communication-induced checkpointing protocols. A communication-induced checkpointing protocol based on the FDAS characterization of the RDT property has been proposed in [9]. This protocol ensures the RDT prop erty on-line by directing processes to take forced checkpoints before delivering a message that would violate the FDAS condition (i.e., by “breaking” all prime Z- paths). Other communication-induced protocols, based on weaker characterizations [6, 81, were previously proposed. All are less efficient than the FDAS protocol in terms of the number of forced checkpoints [7].

The starting point of this work lies on the observation that, when considering the FDAS characterization, it suffices to break a subset of Z-paths (namely the subset of prime Z-paths) in order that RDT be satisfied. This observation raises the following new question: “How to determine subsets of Z-paths whose elimination (by taking forced checkpoints that break them) will be sufficient to ensure that all remaining Z-paths are doubled?‘. The first contribution of this paper is an an- swer to this question, introducing the concept of RDT- compliant property. Consider a property X defined on Z-paths. Z-paths satisfying this property are denoted X-paths. Informally, X is an RDT-compliant property if every checkpoint and communication pattern without X-paths enjoys the RDT property. A characterization based on a RDT-compliant property allows to test only the subset of X-paths instead of the complete set of Z-

paths for potential breaking. For example, the FDAS characterization is based on the property Prime that will be introduced later. As a consequence, the stronger is the property, the more efficient is the characterization, in the sense that there will be less Z-paths to consider for potential breaking.

As previously pointed out, this contribution is important also from a practical point of view, when considering the design of communication-induced checkpointing protocols. This claim is based on recent results obtained by Tsai et al. [7]. In their work, they have shown that communication-induced protocols based on stronger properties than FDAS will force less additional checkpoints than the FDAS protocol. The protocol presented in (11, for example, falls into this category. How- ever, it is important to remark that during a distributed computation its future is not available. Consequently, from a practical point of view, the properties that are interesting to consider are those that rely only on information contained in the causal pest of events upon which the property is evaluated (namely, the message arrivals). Such properties will be called visible pwper- ties. All previously known RDT-compliant properties (No-Receive-After-Send, FDAS) are visible.

In this paper, the following question is addressed: “is it possible to define visible RDT-compliant properties stronger than FDAST’ To our knowledge, this question has not been addressed in previous works. For example, none of the previously known visible RDT- compliant property took into account the fact that a Z- path could be perceived as causally doubled (i.e., visibly doubled), in which case it is not necessary to break this Z-path by forcing an additional checkpoint (as an example, FDAS protocol breaks all Prime-paths, whether they are perceived as doubled or not). Our approach introduces successive visible properties on more and more constrained Z-paths, namely, Z-paths of order two (CC- paths), Causal-Message-Z-paths (CM-paths), Simple- Causal-Message-Z-paths (SCM-paths), Prime-Simple- Causal-Message-Z-paths (PSCM-paths), and finally, Elementary-Prime-Simple-Causal-Message-Z-paths

(EPSCM-paths). The following results are stated (where X means, successively, CC, CM, SCM, PSCM, EPSCM):

(1) a CCP satisfies RDT if and only if it has no non- doubled X-path (i.e., the properties non-doubled X are RDT-compliant) ; and (2) the properties non-uisibly- doubled X are visible RDT-compliant properties.

A visible RDT-compliant property is minimal if it cannot be implied by any other visible RDT-compliant property. This minimality problem is an open problem. We conjecture that the non-visibly-doubled EPSCM property is a minimal visible RDT-compliant property. This paper contributes to this important question by show- ing that “reasonable” visible properties stronger than non-visibly-doubled EPSCM are not RDT-compliant.

The paper consists of five sections. Sections 2 and 3 introduce checkpoints and RDT, respectively. Then, Sec- tion 4 presents RDT-compliant visible properties. Fi- nally, Section 5 concludes the paper.

(3)

2 Checkpoints and Communication Patterns 2.1 Distributed Computations

A distributed computation consists of a finite set of n processes {A, Pz, . . . , P,,} connected by a communication network, that communicate and synchronize only by exchanging messages through the network. We assume that each ordered pair of processes is connected by an asynchronous directed logical channel whose trans- mission delays are unpredictable. Each process runs on a processor. Processors do not share either a common memory or a common clock value; there is no bound for their relative speeds.

A process can execute internal, send and delivery statements. An internal statement does not involve communication. When pi executes the statement

“send(m) to Pj I it puts the message m into the channel from Pi to Pj. When Pi executes the statement

“‘deliver(m)“, it is blocked until at least one message directed to Pi has arrived; then a message is withdrawn from one of its input channels and delivered to P;. Ex- ecutions of internal, send and delivery statements are modeled by internal, send and delivery events.

Processes of a distributed computation are sequential, in other words, each process R produces a sequence of events ei,l . . . ei,, . . . This sequence can be finite or infi- nite, but the number of events preceding another one is finite. Every process Pi has an initial local state denoted Oi.0. The local state bi,s (s > 0) results from the exe- cution of the sequence ei.1 . . . ei,= applied to the initial state u,,o. More precisely, the event ei,s moves Pi from the local state ci,s-l to the local state ui,s. By definition we say that “ei,r belongs to Oj,s” if i = j and x 5 s.

Let H be the set of all the events produced by a distributed computation. This computation is modeled by the partially ordered set I? = (H, 3), where 3 denotes the well-known Lamport’s happened-before relation defined in [4].

2.2 Local Checkpoints

A local checkpoint C is a recorded state of a process. A local state is not necessarily recorded as a local checkpoint, so the set of local checkpoints is only a subset of the set of local states.

Figure 1: A Checkpoint and Communication Pattern

Definition 2.1-A checkpoint-and communication pattern is a pair (H,Cg) where H is a distributed computation and Cg is a set of local checkpoints defined on ii.

Ci,, represents the z-th local checkpoint of process Pi.

The local checkpoint Ci,, corresponds to some local state 0i.s with x 5 s. Figure 1 shows an example of checkpoint and communication pattern. We assume that each process Pi takes an initial local checkpoint Ci,o (corresponding to Qi,o), and after each event a checkpoint will eventually be taken. Thus, each process al- ways begins, and ends, with a checkpoint.

3 Rollback-Dependency Trackability

The concepts introduced in this Section are related to a given checkpoint and communication pattern.

Zigzag-Paths, C-paths and Z-Paths The sequence of events occurring at Pi between Ci,,-1 and Ci,, (z > 0) is called checkpoint interval and is denoted by Ii,,; z is called the indez of this checkpoint interval.

Definition 3.1 A zigzag-path is a sequence of messages _bl,mz,... ,m,l (q 2 1) such that, va, (1 5 a 5 q - 1) 3k (1 5 k 5 n), 39, 3t (s, t >

0) with deliver(m,) E Ik,s A send(m,+l) E Ik,t As 5 t.

To our knowledge this notion has been introduced for the first time by Netzer and Xu in 151. If a zigzag- path [ml, . . . ,mp] is such that send(ml) E Ii,, and deliveT E Ij,g we say that this zigzag-path is from

Ii,, to Ij,p.

In the checkpoint and communication pattern shown Figure 1, [ms,mg] is a zigzag-path of length 2, from Ik,l to Ii.2 ; [ms, m4] and [m5, ms] are two zigzag-paths

Of length 2, from Ii.3 to Ik.2.

Definition 3.2 A zigzag-path is causal if the delivery event of each message (but the last one) locally precedes the send event of the nezt message in the sequence: such a zigzag-path is called C-path. A Z-path is a non-causal zigzag-path2.

‘Note that here, differently from other papers, a zigzag path

and a Z-path are different notions.

(4)

Z-paths &c-pattjs

CM-paths

I

I Pl

is a C-path

I SIMPLE 1 IPRIME

I I ELEME

A zigzag-path made up of a single message is a C-path.

As au example the zigzag-paths [ma, mz] and [mr,, m4]

depicted in Figure 1 are Z-paths, while [ms, me] is a C- path.

Notation. The following notation will be used in the rest of the paper. In a path C, the first (resp. the last) message will be denoted C.first (resp. {Just). The length of a path C is the number of messages forming C and will be denoted as ] c ]. Let < and <’ be two paths, e.g., C = [ml] and C’ = (m2, m3]. The concatenation of C and C’ will be equivalently denoted Cm<‘, or <.[m2, m3], or [ml] . C’, or else [ml, m2, m3].

Causal Doublings

Definition 3.3 A Z-path from Ii,= to Ij,y is causally doubled if i = j A x 5 y or

if

there exists a C-path ,u from Ii,=1 to Ij,v’ where x 2 X' and y’ 5 y.

In the checkpoint and communication pattern shown Figure 1, the Z-path [ms,md] is causally doubled by

[m5, m6].

Rollback-Dependency Trackability

The following concept, Rollback-Dependency i%zckabil- ity, has been introduced by Y.M. Wang in [Q]. It can be defined as follows3:

Definition 3.4 A checkpoint and communication pattern satisfies the Rollback-Dependency Trackability property (RDT)

if

and only if it has no non-doubled Z-

paths.

‘Though expressed differently, this definition (based on the causal doubling notion) is equivalent to Wang’s one.

Figure 2: Successive Embedded Subsets of Z-paths

4 RDT-Compliant Properties 4.1 The Concept of RDT-Compliance

Given a property X defined on Z-paths, with each checkpoint and communication pattern (H, C,) is associated the set of Z-paths that satisfy X in this pattern. These Z-paths will be called X-paths.

Definition 4.1 A property X defined on Z-paths is RD-T-compliant

if:

V(H,C%) : (there is no X-path in (i?, Cg) =+ (‘6,C%) satisfies RDT).

This section introduces successive embedded properties X on Z-paths, and states that the corresponding non- doubled X properties are RDT-compliant. Moreover, re- placing doubled by visibly-doubled, visible RDT-compliant properties non-visibly-doubled X are obtained. Figure 2 summarizes this hierarchy of properties, based on the concepts of simple causal path, prime causal path and elementary causal path.

4.2 Embedded RDT-Compliant Properties Z-paths of order two (CC-paths)

A Z-path of order two is composed of exactly two C- paths. In the checkpoint and communication pattern shown Figure 1, [m2,ms,m4,ms] is a Z-path of order two: it is composed of the C-paths [mz, ms], and [m4, ms].

The corresponding property will be denoted CC.

Lemma 4.1 If all CC-paths are causally doubled, then all Z-paths are causally doubled.

Proof For the class formed by CC-paths, we introduce a specific notation: when C = ~1 . ~2, with ] ~1 ]= ei and ] cl2 ]= t% we say that C is a (-!l,e2)-Z-path.

(5)

w Z-path to be doubled -- -3- Causal Doubling

9: t

x x* x*

Figure 3: Proof of Lemma 4.1

P y’ Y

i

Case where r”=t (v.p; is not causal)

*

W Z-path to be. doubled -- 3- Causal Doubling t

x x’ X”

Let doubled(e) denote the predicate: “all Z-paths of order e are causally doubled”. The lemma can be formulated as doubled(2) j V.! 2 2 : &bled(e). Thus, it is sufficient to prove by induction: Ve > 2 : doubled(!) * doubled(e+l). Let C = ~1 .p2.. .w.w+~ be a Z-path of order e+ 1 from Ii,, to Ij,v (Figure 3). C’ = ~1 .pz . . ./AC is a Z-path of order L from li,, (to some Ih,,). From the induction assumption, C’ is causally doubled. Thus, there exists a C-path v from Ii,,, to I,,,# with z < x’

and z’ 5 z. Since /.~l+r is a C-path from I&,, to Ij,y, we have Y. /.~+i is a C-path or a CC-path, from Ii,,l to 1j.y. In the former case, C is causally doubled by v. p(+r; in the latter case, from the lemma assumption, this CC-path is causally doubled and thus there exists a C-path p from Ii Zn, to Ij,v’ with Z’ < d” and y’ 5 y.

Thus, z 5 2” and d 5 y, which shows that C is causally

doubled by p. 0

CM-paths

A CM-path is a CC-path ~1 . ~2, where ~2 is a single message m.

Lemma 4.2 If all CM-paths are causally doubled, then all CC-paths are causally doubled.

Proof Let doubled(e) denote the predicate “all (-t-,e)- CC-paths are causally doubled”. The lemma can be for-

mulated as doubled(l) =F- VL 2 1 : doubled(l). Thus it is sufficient to prove, inductively: Ve 2 1 : d&led(l) j doubled(e + 1). Let C = ~1 . ~2 with ) cl2 I= .f + 1 a (+,e + 1)-CC-path from li,z to Ij,p (Figure 4). Let m be the first message of ~2, going from Ik,l to &,,t. We have: /AZ = [m] . & where & is a C-path from Ip,t, to Ij,g, with t 5 t’ and ] & (= e; ~11 . [m] is a CM- path from Ii,= to &,t. By the lemma assumption, there exists a C-path v from Ii,,, to Ir,t,, with x 5 z’ and t” _< t. The path v . & from I;,,, to Ij,y is either a C- path and since z 5 z’ it is a causal doubling of C. Or it is a (+,C)-CC-path (this the case shown on the figure, where t = t’ = t” must hold) and thus there exists a C-path 1-1’ from li,,tl to Ij,yl with 2’ < 2” and y’ < y;

thus, x < x” and C is causally doubled by ~1’. I3

Simple C-paths and SCM-paths

A C-path p = [ml,mz,. . . ,m,] is simple if, for each i (1 5 i 5 q - 1) the two events deliveT and send(mi+l) occur (in this order) in the same interval.

In other words, a simple C-path does not “include” local checkpoints. A non-simple C-path can be written as p = p1 . p2 . . . . pt, where each component pi is simple, and there is a local checkpoint between the events deliver(pi.last) and send(pi+l. first). In the checkpoint and communication pattern shown in Figure 1

(6)

e Z-path to be doubled -- s Causal Doubling

P k

P. J

Y Y’ Y”

a) p’ is simple N Z-path to be doubled b) V’ is not simple: II’ = ~‘1 ... pi - - -> Causal doubling

the non-simple C-path [mr,, me, ma] is composed of the two simple causal paths [ms, me] and [me].

A SCM-path is a CM-path ,u . [m] where p is simple.

Lemma 4.3 If all SCM-paths are causally doubled, then all CM-paths are causally doubled.

Proof Let doubled(e) denote the predicate “All CM- paths Jo . [ml, where p is a C-path composed

of

e simple C-paths, are doubled”. The lemma can be formulated as doubled(l) * W > 1 : doubled(e). Thus it is sufficient to prove, inductively: W 2 1 : doubled(e) * doubled(e+l). Let p~+i.p~.. . ..pi.[rn] a CM-path from li,, to Ij,y (Figure 5). By construction, /~+i *pt.. . . ‘~1 is a C-path and each pi (1 5 i 5 f + 1) is simple. Sup- pose that c(~+i is from Ii,, to Ik,=. Thus, PC.. . . .,UI . [m]

is a CM-path (whose causal part is composed of L simple C-paths) from Ik,z~ to Ij,y, with z C E’. By induction assumption, this CM-path is doubled; thus, there exists a C-path /.I’, from Ik,z~, to Ij,v’, with Z’ 5 L” and y’ 5 y. Since z < z”, we have deliver(pl+l.last) -!+

send(p’.first) and thus pc+i . p’ is a C-path from 1i.z to Ij.y’7 with v’ 5 y. This shows that the CM-path Pf+l*/.Q...- . ~1 . [m] is causally doubled. 0

Prime C-paths and PSCM-paths

A C-path from Ij,y to li,, is prime if there is no C-path /.I’ from Ij,p, with 9 5 y’, such that deliver($.last) -$

deliver(p.last). Intuitively, a prime C-path from Ij,y to Ii,, is the 6rst one including the existence of checkpoint Cj,r-i into the causal past of P,!s current state. In the checkpoint and communication pattern shown in Figure 1, the C-path [mi,m4] is prime, while [ml,m7] is not.

A PSCM-path is a SCM-path p. [m] where the C-path p is prime.

Lemma 4.4 Zf all PSCM-paths are causally doubled, then all SCM-paths at-e causally doubled.

Proof Note that, if p is not a prime C-path from 1j.g

to Ii,,, then there must exist a prime C-path ~1’ from

Ij,gl to Ii,, with y 5 y’; this is due to the fact that the number of events occurring on pi before deliver(p.last) is finite.

Let C = /J* [m] be an SCM-path, with p from 1j.r to li,, and m from Ii,, t0 IkVr.

If Jo is prime, then C is causally doubled, by the Iemma assumption.

(7)

‘k P i

‘i / *

Y

If Jo is not prime, then there exists a prime C-path p’

from Ij,y, to Ii,,, with y < y’ . There are two cases:

i) p’ is simple (Figure 6.a). In that case, by the lemma assumption, ~1’ . [m] is causally doubled. Thus, there exists a C-path v from 1j.y” to Ik,=,, with y’ 5 y” and z’ 5 L and C = p . [m] is causally doubled by V.

ii) CL’ is not simple (Figure 6.b). Then, ~1’ = & . . . . .&, where each /.L; is simple. In particular, pi . [m] is a PSCM-path. From the lemma assumption, it is causally doubled by ul. Thus, v = p’i . . . .VC is a C-path doubling

c=P.bl. 0

Elementary C-paths and EPSCM-paths

A C-cycle is a C-path from a process to itself. A C- path is elementary if it does not include any C-cycle (in other words, the sequence of processes it traverses has no repetition).

An EPSCM-path is a PSCM-path p . [m] where /.J is elementary.

Lemma 4.5 If all EPSCM-paths are causally doubled, then all PSCM-paths are causally doubled.

Note that, since processes do not send messages to them- selves, the length of a C-cycle is g 2. Note also that the length of an elementary C-path IS 5 n - 1. Every C- cycle is from an interval Ii,, to an interval li,,l, with z < 2’. When z < z’ we will say that the C-cycle is s&t. The following reduction observation is straight- forward: consider a prime simple C-path ~1.7. ~2 from some interval 1j.l to an interval Ik+, where ~1 and p2

are elementary and 7 is a non strict C-cycle. Then,

~1 . ~2 is an elementary prime simple C-path, from Ij,g foci; j (on th e contrary, if 7 is strict, ~1 .j~z is no more Proof Let C = p . [m) be a PSCM-path, where p is from Ij,g to Ii,= and m from 1i,z to Ik,l.

If p is elementary, then C is causally doubled, by the lemma assumption.

If p is not elementary, remark that it cannot include a C-cycle 7 from some Ii,,) to Ii,, (I’ 5 z), which would

contradict its primality. Let Ii be the path obtained by removing from /J all non strict C-cycles. By the reduction observation and the previous remark, ii is a prime simple C-path from 1j.y to Ii,,. If it is elementary, then 7-i. [m] is causally doubled, by the lemma assumption, and thus, 5 is also causally doubled, by construction.

Let us now consider the case where ji is not elementary. By construction, all the C-cycles included in F are strict. Let 7 be the last C-cycle included in j% This C-cycle is from 1~1 to Ic,t, with t’ < t (strictness of 7) where PC is one of the processes traversed by ji, and e # i (Figure 7). Then ji = p’ .7.,o”, where p’ is a prime C-path from 1j.r to Zt t, and p” is a C-path from It,t

t0 Ii,,. Observe that &’ is elementary, by construction, and simple (because ji is simple). Moreover, it is prime w.r.t If,t, otherwise there would exist a C-path V’ starting from PC after the checkpoint Cl,t, and arriving at Pi before CL”, and the C-path p’ . Y’ would contradict the primality of ji. Thus, cc” . [m] is an EPSCM-path and, from the lemma assumption, it is causally doubled by a C-path Y. Since v starts from Pf after the checkpoint c (,tj, we have: P’.V is a causal path, that doubles ji.[m]

and thus doubles C. 0

4.3 A Characterization of RDT

Theorem 4.6 A checkpoint and wmmunication pattern satisfies the RDT property

if

and only

if it

has no non- doubled EPSCM-paths.

This theorem is an immediate consequence of the previous lemmas.

Proof

i) Only if part. Trivial, since RZVimplies that all Z-paths are causally doubled, and EPSCM-paths are Z- paths.

ii) If part. Suppose that all EPSCM-paths are causally doubled. From Lemma 4.5, all PSCM-paths are causally doubled. From lemma 4.4, all SCM-paths are causally doubled. From Lemma 4.3, all CM-paths are causally doubled. From Lemma 4.2, all z-paths of order 2 are causally doubled. Finally, from Lemma 4.1, all Z-paths are causally doubled, i.e., RVTis satisfied. cl

(8)

Pj I

pi .

.

pk n .

* yk,yp *_

>

a) No visibility: p’ causally doubles ~1. m b) visibility: p’ causally doubles p . m but this doubling is not visible. and this doubling is visible because of v.

Figure 8: Non-visibility of doubling.

4.4 Visible RDT-Compliants Properties

Properties CC, CM, SCM, PSCM and EPSCM are uis- ible, in the sense that they can be tested on-line upon the arrival of a message. In fact, each of these properties involves information that is entirely contained in the causal past of each arrival event. Such an information can be encoded in the local context of the receiving process and in the message itself. Let m’ be an arriving message. Then, for example, CM is equivalent to a message m has been sent previously in the same checkpoint interval; Simple can be tested if m’ carries the sequences of checkpoints that are included in the C-paths terminated by m’; Prime can be tested by using Dependency Vectors ([9]); Elementary can be tested by including in m’ the sequences of processes that have been traversed by the C-paths terminated by this m’.

Let us note that, practically, some of these properties can be more efficiently tested, using appropriate data structures.

On the contrary, the corresponding non-doubled properties on Z-paths (i.e., non-doubled CC, non-doubled CM, . . . . and non-doubled EPSCM) are not visible in the sense that they cannot be tested on-line upon the arrival of a message. As an example see the EPSCM path p . m shown in Figure 8.a. When the message p.last is delivered, the process pi does not know anything about the causal doubling executed by cc’.

Note that (i) none of previous visible property checks if a causal doubling really occurred in the causal past of a message, and that (ii) the property causally doubled is sometimes visible (i.e., on-line detectable), as shown by Figure 8.b. At the time of the receipt of a message forming a Z-path, how can a process know that the Z- path is causally doubled, without knowing the future of the computation? This observation motivates the introduction of detectability of doubling (only CM-paths are considered):

Definition 4.2 Let C be an CM-path p.[m] in (fi,Cg).

C is visibly doubled if

l it is causally doubled by a C-path p’ and,

l deliwer(p’.last) 3 deliver(p.last).

Intuitively a causal doubling of a CM-path C is visible at process pi upon the delivery of the message p.last, if the C-path p’, which causally doubles C, belongs to the causal past of the event deliver(p.last). Obviously, visibly doubled implies doubled, but the converse is not true. So, each of the Lemmas 4.2 to 4.5 can now be reformulated as suficient conditions to ensure RDT. As an example, we obtain:

Theorem 4.7 Zf a checkpoint and communication pattern has no non-visibly-doubled EPSCM-paths then it satisfies the RDT property.

These results are weaker than the characterization results obtained in Sections 4.2 and 4.3. But, unlike these characterizations, they involve properties that can be tested on-line.

As we said in the introduction, using the previous ter- minology, FDAS characterization can be reformulated as: PCM is an RDT-compliant property. Since every non-visibly-doubled EPSCM-path is a PCM-path the non-visibly-doubled EPSCM property is stronger than the PCM property. So, according to the results of [7], communication-induced protocols that break only non-visibly-doubled EPSCM-paths, such as the one proposed in [l], force less checkpoints than FDAS protocol.

These theoretical results are confirmed by simulation studies presented in the same paper.

4.5 Minimality of the Characterization

The previous results show that non-visibly-doubled EP- SCM is a visible RDT-compliant property, stronger than all previously used visible RDT-compliants properties such as FDAS. But this result raises an important new question: “is it possible to find a minimal visible RDT- compliant property, i.e., one that cannot be implied by any other visible RDT-compliant property”? If such a property is found, then it corresponds to a minimal subset of Z-paths that must be tested for potential on- line removal in order to ensure the RDT property. As indicated in the Introduction, we conjecture that the non-visibly-doubled EPSCM property is minimal. In fact, on the one hand, no counter-example has been produced yet (i.e., no strictly stronger visible RDT- compliant property has been found); on the other hand,

(9)

Figure 9: A counter-example

no formal proof of minimality for visible properties has been proposed yet. As an example to enforce the conjecture, we look at which restrictions could still be im- posed to EPSCM-paths. For example one of them could be the restriction on the length of the causal component of CM-paths. The next lemma shows that this stronger visible property is not RDT-compliant.

Proposition 4.8 The property X E (C is an EPSCM- path p. [m] with ICI 5 n - 2) (where n is the number of processes) is not RDT-compliant.

Proof

Consider the checkpoint and communication pattern de- scribed as follows (an example with n = 5 is depicted in Figure 9).

l There are n processes denoted PI, Pz, . . . , P,, (n 1 4)

l On each process P; there is only one checkpoint interval Zi,i

l On process PI, there is only one event: send(ml) to Pz

l On each process Pi with 2 5 i 5 n - 2, the following three events occur in that order: send(m:) to P,-I, re&pt(mi-1) from E-1, send mi to Pi+1

l on process P,,-1 , the following n - 1 events occur in that order: receipt(m&) from Pz, receipt(mb) from Ps, . . . . receipt(m’,-2) from Pn-2, send(m) to Pn, receipt(m,-2) from P,-2.

In this pattern, X-paths are:

Vi : 2<i<n-2,Vj: l_<j<i-1:

[mj,mj+i,... ,mi-l]‘[m:]

Each of them is respectively doubled by

[mj,mj+i,... , mi-1, m;]. SO, all X-paths are causally

doubled. Note that the CM-path [ml, mz, . . . , 7nn-z] . [m] is not an X-path because its length is equal to n - 1 and% (2 <i 5 n-2) theCM-path [mi,... ,m,-z].[m]

is not an X-path because of message m: that makes

t”j’.- ’

m,,-z] not prime. However, [ml, mz, . . . , 7nn-z]

m is an EPSCM-path, and it is not doubled. So, for every n (with n > 4) there exists a checkpoint and communication pattern where all X-paths are doubled and that does not satisfy the BDT property. cl Another class of properties can also be eliminated from BDT-compliance, as shown by the next proposition:

Proposition 4.9 Zj the existence of non-doubled EPSCM-paths does not imply the existence ojX-paths, then X is not RDT-compliant.

Proof By contradiction, suppose that X is BDT-- compliant. From the definition of BDT-compliance, we have: in every checkpoint and communication pattern no X-path implies no non-doubled-Z-path. This is equivalent to: in every checkpoint and communication pattern, the existence of a non-doubled EPSCM-path implies the existence of an X-path, a contradiction. 0 An example of such a property is Non-Doubled-Hamilto- nian-EPSCM (i.e., a non-doubled EPSCM-path travers- ing the n processes).

5 Conclusion

The Rollback-Dependency Backability (RDT) property is satisfied by a checkpoint and communication pattern if all the dependencies between its local checkpoints are on-line trackable. This paper has provided new characterizations of the RDT property. Based on the BDT- compliance notion, these characterizations are stronger than previously known characterizations of RDT (e.g., FDAS). Consequently, there will be less Z-paths to consider for potential breaking. We conjecture that the property “Non-Visibly-Doubled-EPSCM-path” is the strongest one in the class of visible BDT-compliant properties., i.e. the set of Non-Visibly-Doubled-EPSCM paths is the smallest one to test for breaking in order to ensure on-line the RDT property.

Finally, as initially pointed out by Y.M. Wang [9], we would like to insist on the fact that ensuring RDT in a distributed computation allows efficient, on-line and

(10)

distributed implementation of algorithms for recovery line computation, garbage collection of old checkpoints and output commits. These issues are core problems one has to address when adopting a checkpointing-based rollback-recovery technique. Protocols based on a strong characterization of the RDT (e.g., [l]) allow to get these advantages with a small additional checkpointing over- head.

Acknowledgements

We wish to thank S.Y. Kuo, J. Tsai and Y.M. Wang for very interesting discussions and exchanges on topics related to Rollback Dependency Trackability Property.

References

[l] Baldoni, R., Helary, J.M., and Raynal, M., A communication-Induced Checkpointing Proto- col that Ensures Rollback-Dependency Trackability.

in Proc. 27th IEEE Symposium on Fault-Tolemnt Computing (FTC’S 27), Seattle, pp. 68-77, June 1997.

[2] Baldoni, R., Helary, J.M., and Raynal, M., Consis- tent Records in Asynchronous Computations, Acta Informatica, 35441-445, 1998.

[3] zo;dy, K.M. and Lamport, L., Distributed-Snap : Determmmg Global States of Distributed Systems, ACM lkansactions on Computer Systems, 3(1):63-75, 1985.

[4] Lamport, L., Time, Clocks and the Ordering of Events in a Distributed System, Communications of the ACM, 21(7):558-565, 1978.

[5] Netzer, R.H.B., and Xu, J., Necessary and Sufficient Conditions for Consistent Global Snapshots, IEEE

l’kunsactions on Parallel and Distributed Systems, 6(2):165-169, 1995.

[6] Russell, D.L., State Restoration in Systems of Com- municating Processes, IEEE Transactions on Soft- ware Engineering, SE6(2):183-194, 1980.

(71 Tsai, J., Wang, Y.M., and Kuo, S.Y., Theoretical Analysis for Communication-Induced Checkpoint- ing Protocols with Rollback-Dependency Trackabil- ity, IEEE tinsactions on Pamllel and Distn’buted Systems, 9(10):963-971, October 1998.

[8] Venkatesh, K., Radhakrishanan, T. and Li, H.L., Optimal Checkpointing and Local Recording for Dominofree Rollback-Recovery, Information Pm- cessing Letters, 25:295-303, 1987.

[9] Wang, Y.M. Consistent Global Checkpoints That Contain a Given Set of Local Checkpoints. IEEE tinsactions on Computers, 46(4):456-468, 1997.