Efficient Partial Replication to Improve Locality in Large-Scale Systems

Jean-Michel Hélary, Alessia Milani

IRISA, Campus de Beaulieu, 35042 Rennes-cedex, France.
DIS, Università di Roma La Sapienza, Via Salaria 113, Roma, Italia.

helary@irisa.fr, Alessia.Milani@dis.uniroma1.it

Abstract

This work addresses data replication and consistency maintenance in large-scale systems. Based on the observation that partial replication of data reduces network traffic and preserves locality, particularly in large-scale systems, it focuses on the constraints necessary to maintain consistency when partial replication is used. More precisely, it points out that the transfer of information necessary to manage consistency requirements might cancel the decrease in network load and the locality properties offered by partial replication. It is shown that the causal consistency criterion suffers from this drawback but that, on the contrary, the PRAM consistency criterion allows efficient partial replication. The usefulness of PRAM is illustrated with the Bellman-Ford shortest path algorithm, which takes advantage of graph locality to partially replicate its variables and is solvable in PRAM distributed shared memories.

1 Introduction

Large-scale systems spanning geographically distant sites are environments naturally suited to distributed applications supporting collaboration [4]. As information and users proliferate through the Internet and intranets, large-scale systems must cope with the challenge of scale. To be scalable, a system should accommodate a large number of processes and should allow applications to manage a great deal of data. CPU processing, local storage and network bandwidth are the traditional metrics for scalability. In order to reduce network load, locality has to be exploited. In other words, locality awareness enables a system to limit the impact of local computation on wide-area performance, both during regular operation and under fault conditions [8]. It has been proven that caching and replication are two effective techniques to improve locality. Replicating data closer to their users reduces network traffic and the latency of accesses to such data: a user experiences low latency in accessing data if the data are close to it.

Moreover, locality-awareness is crucial for the viability of large internets, since costs are proportional to the actual distance of the interacting parties [1]. In this sense, to improve performance and allow system scalability, it is desirable that each process collaborating in a distributed application performs its task by accessing only a subset of the system's information, the one closest to it. For these reasons, partial replication seems to be the most appropriate replication technique in large-scale systems.

On the other hand, replication requires consistency management, that is, copies of data have to be kept consistent, and this can entail the following drawback: the transfer of information necessary to manage consistency requirements cancels the decrease in network load and the locality properties offered by partial replication. Consequently, in designing distributed applications, it is essential to analyze the entire infrastructure, that is, the algorithm, the replication technique, the consistency criterion necessary to run the given application, etc. [8].

This work addresses data replication and consistency maintenance in large-scale systems, in order to determine whether a consistency criterion suffers from the above drawback. Section 2 presents the main result obtained so far, concerning the possibility of efficiently implementing partial replication.

In particular, it is shown that causally consistent memories [2] suffer from the above drawback, while PRAM consistent memories [6] do not. This result is based on a graph model of data replication. Finally, Section 3 shows how a distributed application solving the widespread problem of routing can be efficiently implemented with partial replication on top of a distributed shared memory, thus preserving locality properties.

2 The problem of efficient partial replication implementation in large-scale systems

The distributed shared memory abstraction is traditionally realized through a distributed memory consistency system (MCS) on top of a message-passing system providing a communication primitive with a certain quality of service in terms of ordering and reliability [3]. Such a system consists of a collection of nodes. On each node there is an application process and an MCS process.

An application process invokes an operation through its local MCS process, which is in charge of the actual execution of the operation. Assuming a partially replicated environment means that, corresponding to the fact that each application process ap_i accesses only a subset of the shared variables X, denoted X_i, each MCS process p_i manages a replica of a variable x iff x ∈ X_i. Our aim is to determine which MCS processes are concerned by information on the occurrence of operations performed on a variable x in the system. More precisely, given a variable x, we say that an MCS process p_i is x-relevant if, in at least one computation, it has to transmit some information on the occurrence of operations performed on variable x in that computation in order to enforce the consistency criterion. Of course, each process managing a replica of x is x-relevant, but the problem is that other processes could also be x-relevant. The main result of this section is a characterization of x-relevant processes when considering causally consistent shared memories. From this result, it follows that in causally consistent shared memories all MCS processes are x-relevant, for every variable x, while in PRAM consistent shared memories, only the MCS processes managing a replica of x are x-relevant.

This result is based on the notion of a share graph. Given a distribution of variables {X_1, . . . , X_N}, the share graph SG is an undirected (symmetric) graph whose vertices are the processes, and where an edge (i, j) exists between p_i and p_j iff there exists a variable x replicated on both p_i and p_j (i.e. x ∈ X_i ∩ X_j).

Each variable x defines a sub-graph C(x) of SG, spanned by the processes on which x is replicated. This subgraph C(x) is a clique, i.e. there is an edge between every pair of its vertices. The share graph is the union of all cliques C(x).
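As an illustration, the construction above can be sketched in a few lines of Python; the variable distribution used in the example is hypothetical, not taken from the paper:

```python
from itertools import combinations

def share_graph(distribution):
    """Build the share graph SG from a variable distribution.

    distribution: dict mapping process id -> set of variable names X_i.
    Returns the set of undirected edges (i, j), with i < j.
    """
    edges = set()
    for i, j in combinations(sorted(distribution), 2):
        if distribution[i] & distribution[j]:  # p_i and p_j share a variable
            edges.add((i, j))
    return edges

def clique(distribution, x):
    """C(x): the processes on which variable x is replicated."""
    return {i for i, vars_i in distribution.items() if x in vars_i}

# Hypothetical distribution over four processes.
dist = {1: {"x", "y"}, 2: {"x", "z"}, 3: {"y", "z"}, 4: {"z"}}
print(share_graph(dist))   # edges induced by shared variables
print(clique(dist, "x"))   # {1, 2}
```

Note that SG is exactly the union of the cliques C(x): every edge produced by `share_graph` belongs to C(x) for at least one shared variable x.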

Given a variable x, we call an x-hoop any path of SG between two distinct processes in C(x) whose intermediate vertices do not belong to C(x) (Figure 1). Formally:

Definition 1 (hoop). Given a variable x and two processes p_a and p_b in C(x), we say that there is an x-hoop between p_a and p_b if there exists a path [p_a = p_0, p_1, . . . , p_k = p_b] in SG such that:

i) p_h ∉ C(x) (1 ≤ h ≤ k − 1), and

ii) each consecutive pair (p_{h−1}, p_h) shares a variable x_h such that x_h ≠ x (1 ≤ h ≤ k).

Let us remark that the notion of hoop depends only on the distribution of the variables on the processes, i.e. on the topology of the corresponding share graph. In particular, it is independent of any particular computation.

[Figure omitted in this text version: a clique C(x) whose processes are connected by an x-hoop passing through processes outside C(x).]

Figure 1: An x-hoop

Theorem 1. Given a variable x, a process pi is x-relevant if, and only if, it belongs to C(x) or it belongs to an x-hoop.

The idea underlying this result is the following: in a causally consistent execution, an indirect causal dependency can be created between a write operation performed by a process p_a on variable x and another (read or write) operation performed by a process p_b on the same variable. This indirect dependency is created by operations performed on other variables, shared by the intermediary processes. As a consequence, unless a particular variable distribution avoiding all hoops is assumed, each MCS process must transmit control information regarding all the shared data in order to guarantee causal consistency in a partially replicated environment.
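To make Theorem 1 concrete, the following self-contained sketch computes the x-relevant set as C(x) plus every intermediate vertex of an x-hoop, by brute-force enumeration of simple paths (suitable only for small examples; the distribution shown is hypothetical):

```python
def x_relevant(distribution, x):
    """Processes that are x-relevant according to Theorem 1:
    C(x) plus every process lying on some x-hoop.

    distribution: dict mapping process id -> set of variable names X_i.
    """
    procs = sorted(distribution)
    # Adjacency of the share graph SG.
    adj = {i: {j for j in procs
               if j != i and distribution[i] & distribution[j]}
           for i in procs}
    cx = {i for i in procs if x in distribution[i]}
    relevant = set(cx)

    def dfs(node, target, path):
        if node == target:
            relevant.update(path[1:-1])  # intermediate vertices of the hoop
            return
        for nxt in adj[node]:
            if nxt in path:
                continue
            if nxt in cx and nxt != target:
                continue                 # intermediates must avoid C(x)
            if not (distribution[node] & distribution[nxt]) - {x}:
                continue                 # each edge must share some x_h != x
            dfs(nxt, target, path + [nxt])

    for a in cx:
        for b in cx:
            if a < b:
                dfs(a, b, [a])
    return relevant

# Hypothetical distribution: p1 and p4 replicate x; p2 and p3 form a hoop.
dist = {1: {"x", "y"}, 2: {"y", "z"}, 3: {"z", "w"}, 4: {"x", "w"}}
print(sorted(x_relevant(dist, "x")))  # [1, 2, 3, 4]: all processes x-relevant
```

In this example the hoop p_1–p_2–p_3–p_4 makes even the processes that never replicate x become x-relevant, which is exactly the drawback the theorem captures.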

On the contrary, PRAM memories [6] can be efficiently implemented in a partially replicated environment, because the PRAM consistency criterion is only concerned with "direct relations" between pairs of "adjacent" processes, and not with transitivity due to intermediary processes [7]. Thus, even if hoops exist, they cannot create such indirect dependencies. In other words, it is not necessary to propagate global information to maintain PRAM consistency. In the next section we propose, as a case study, a distributed implementation of the Bellman-Ford shortest path algorithm on top of a PRAM distributed shared memory, efficiently exploiting partial replication of variables.

3 Implementing the Bellman-Ford algorithm exploiting partial replication in large-scale systems

Routing is a common problem in large-scale systems. Accordingly, in the following we propose a distributed implementation of the Bellman-Ford algorithm [5] to compute the minimum path from a source node to every other node in the system. In this implementation, each node hosts an application process and its associated MCS process.

In such a computation, locality-awareness is essential since it is desirable that each process only manipulates information shared with its neighbors.

Moreover, since we want to reduce network load in order to ensure scalability, we exploit locality in accessing shared data. In other words, we assume that each process has a local copy of the shared data it is interested in, to compute its task. This means considering partial replication to efficiently distribute the computation.

The system (network) is composed of N processes ap_1, . . . , ap_N and is modelled as a graph G = (V, Γ), where V is the set of vertices, one for each process in the system, and Γ is the set of edges (i, j, w_ij) such that ap_i and ap_j belong to V and there exists a link between i and j with cost w_ij. Let Γ^{-1}(i) = {j ∈ V | (i, j, w_ij) ∈ Γ} be the set of predecessors of process ap_i. The algorithm proceeds in stages. For each successive k ≥ 0:

i) x^k_s = 0, for all k,

ii) ∀ i ≠ s, x^{k+1}_i = min_{j ∈ Γ^{-1}(i)} [x^k_j + w_ji],

where s is the source node and initially x^0_i = ∞ for all i ≠ s. It is well known that, if there are no negative-cost cycles, the algorithm converges in at most N steps.
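The stage recurrence above can be sketched sequentially in Python before distributing it; the edge list below is a hypothetical example, and node ids are 0-based:

```python
import math

def bellman_ford_stages(n, edges, s):
    """Synchronous Bellman-Ford: apply the stage recurrence
    x_i^{k+1} = min over predecessors j of (x_j^k + w_ji)
    for N stages. edges: list of directed links (j, i, w_ji)."""
    x = [math.inf] * n
    x[s] = 0
    for _ in range(n):  # N stages suffice without negative-cost cycles
        new_x = list(x)
        for j, i, w in edges:
            if x[j] + w < new_x[i]:
                new_x[i] = x[j] + w
        new_x[s] = 0    # the source stays at 0 in every stage
        x = new_x
    return x

# Hypothetical 4-node network: 0 -> 1 -> 2 -> 3, with a shortcut 0 -> 2.
edges = [(0, 1, 1), (1, 2, 2), (0, 2, 5), (2, 3, 1)]
print(bellman_ford_stages(4, edges, 0))  # [0, 1, 3, 4]
```

Each stage reads only the previous stage's values (`x`) and writes fresh ones (`new_x`), mirroring the synchronization points introduced below for the distributed version.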

Without loss of generality, we assume that process ap_1 is the source node. We denote by x_i the current minimum distance from node 1 to node i. Then, to compute all the minimum paths from process ap_1 to every other process in the system, the processes cooperate by reading and writing the following set of shared variables: X = {x_1, x_2, . . . , x_N}. Moreover, since the algorithm is iterative, in order to ensure liveness we need to introduce synchronization points in the computation. In particular, we want to ensure that at the beginning of each step, each process ap_i reads the values written by its predecessors Γ^{-1}(i) during the previous step. Thus each process knows that after at most N iterations it has computed the shortest path. To this aim, we introduce the following set of shared variables: S = {k_1, k_2, . . . , k_N}.

Each application process ap_i only accesses a subset of the shared variables. More precisely, ap_i accesses x_h ∈ X and k_h ∈ S such that h = i or ap_h is a predecessor of ap_i.

Since each variable x_i and k_i is written only by one process, namely ap_i, it is easy to see that the algorithm in Figure 2 runs correctly on a PRAM shared memory, and thus also on a causally consistent shared memory. But, as seen in the previous section, PRAM allows efficient partial replication, and thus scalability, while causal consistency does not. The former memory will obviously be chosen.


Minimum Path

    k_i := 0;
    if (i == 1) then x_i := 0 else x_i := ∞;
    while (k_i < N) do
        while (⋁_{h ∈ Γ^{-1}(i)} (k_h < k_i)) do skip;   (busy-wait until every predecessor has completed step k_i)
        x_i := min{ x_j + w_ji : j ∈ Γ^{-1}(i) };
        k_i := k_i + 1

Figure 2: pseudocode executed by process ap_i
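The per-process loop of Figure 2 can be simulated with one thread per application process, sharing the replicated variables x and k; this is a sketch under assumptions of this note (0-based ids, plain Python lists standing in for the MCS replicas), not the authors' implementation:

```python
import math
import threading
import time

def run(n, pred, w, source=0):
    """Simulate Figure 2. pred[i]: predecessors of i;
    w[(j, i)]: cost of the link j -> i."""
    x = [math.inf] * n   # shared distance estimates x_1..x_N
    k = [0] * n          # shared step counters k_1..k_N
    x[source] = 0

    def process(i):
        for _ in range(n):                        # while k_i < N
            while any(k[h] < k[i] for h in pred[i]):
                time.sleep(0.001)                 # busy-wait on predecessors
            if i != source and pred[i]:
                x[i] = min(x[j] + w[(j, i)] for j in pred[i])
            k[i] += 1
        # After N steps, x[i] holds the shortest distance from the source.

    threads = [threading.Thread(target=process, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x

# Hypothetical 4-node network: 0 -> 1 -> 2 -> 3, with a shortcut 0 -> 2.
pred = {0: [], 1: [0], 2: [0, 1], 3: [2]}
w = {(0, 1): 1, (0, 2): 5, (1, 2): 2, (2, 3): 1}
print(run(4, pred, w))  # [0, 1, 3, 4]
```

Because the distance estimates only decrease and each process waits until its predecessors have completed its current step, reading a "too fresh" value from a faster predecessor is harmless: every x[j] read is at most the synchronous stage value, so N local steps still suffice.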

References

[1] I. Abraham and D. Malkhi. Principles of Locality-Aware Networks for Locating Nearest Copies of Data. Technical Report Leibnitz Center TR 2003-84, School of Computer Science and Engineering, The Hebrew University, 2003.

[2] M. Ahamad, G. Neiger, J.E. Burns, P. Kohli and P.W. Hutto. Causal Memory: Definitions, Implementation and Programming. Distributed Computing, 9(1): 37-49, 1995.

[3] H. Attiya and J. Welch. Distributed Computing (second edition). Wiley, 2004.

[4] O. Babaoglu, A. Bartoli and G. Dini. Replicated File Management in Large-Scale Distributed Systems. Technical Report UBLCS-94-16, June 1994 (revised January 1995).

[5] R. E. Bellman. On a routing problem. Quarterly of Applied Mathematics, XVI(1): 87-90, 1958.

[6] R. Lipton and J. Sandberg. PRAM: a Scalable Shared Memory. Technical Report CS-TR-180-88, Princeton University, 1988.

[7] M. Raynal and A. Schiper. A suite of formal definitions for consistency criteria in distributed shared memories. Technical Report PI 968, IRISA, 1995. http://www.irisa.fr/bibli/publi/pi/1995/968/968.html

[8] B. Y. Zhao, A. D. Joseph and J. Kubiatowicz. Locality Aware Mechanisms for Large-scale Networks. Proc. of the Workshop on Future Directions in Distributed Computing, 2002.
