A heuristic for optimum allocation of real-time service workflows

(1)

A Heuristic for Optimum Allocation of Real-Time Service Workflows

Tommaso Cucinotta and Gaetano F. Anastasi Real-Time Systems Laboratory

Scuola Superiore Sant’Anna Pisa, Italy

Email:{t.cucinotta,g.anastasi}@sssup.it

Abstract—In this paper, the problem of optimum allocation of real-time service workflows over a set of heterogeneous re-sources is tackled. In previous works, this problem was formally stated in terms of a Mixed-Integer Non-Linear Programming optimization program, that could be solved by recurring to commercial solvers. However, due to the big dimension of the solution space to be searched, finding the absolutely optimum solution: might take too much time in order to be concretely useful; it may preclude the use of these techniques in large-scale infrastructures; it makes the technique hardly usable adaptively in response to corrective actions that may be needed when some bad event occurs while the services are running (e.g., hardware-level failures). Therefore, in this paper a heuristic algorithm based on graph-matching is introduced that may find very efficiently a reasonably good, albeit non-necessarily optimum, solution. The algorithm is described, and its performance assessed by a set of synthetic experiments.

Keywords-real-time; workflow; resource allocation; heuristic

I. INTRODUCTION

Nowadays, more and more applications are being de-veloped and deployed according to a distributed comput-ing paradigm. Interactive and multimedia applications are examples of real-time services that can be conveniently virtualized [1] but they possess strict timing requirements in terms of the maximum end-to-end latency that can be tolerated by the users. In this context, it is useful to address the problem of how to deploy distributed service workflows with end-to-end deadline constraints over a network of het-erogeneous computing resources, as they may be available within a provider domain. In the decision process, multiple factors may have to be accounted for, including the relative importance among the workflows under admission, their as-sociated revenues for the provider, as well as their asas-sociated computation, networking and storage requirements.

This problem may be formally stated in terms of an optimization program [2], maximizing a given objective-function depending on the provider business policy, under a set of constraints due to the availability of physical resources within the domain, and to the maximum tolerable application latencies. However, due to the big dimension

The research leading to these results has received funding from the European Community’s Seventh Framework Programme FP7 under grant agreement n.248465 S(o)OS Service-oriented Operating Systems.

of the solution space to be searched, finding the absolutely optimum solution may not be convenient.

Therefore, in this paper this problem is tackled by design-ing a proper heuristic based on a graph-matchdesign-ing algorithm. This takes into account the information on the availability of resources, the workload requirements of the workflows as well as their timing constraints. The ability of the pro-posed technique to meet end-to-end real-time constraints of the workflow relies on a clear model of computation for the workflows to be deployed, and on an algorithm that compensates possible shortages in the resource allocation for one service by increasing the allocation on the ones that follow. For example, this allows for trading computing and networking allocations for meeting a given end-to-end deadline. At the same time, the algorithm has some flexibility in that it allows a provider to maximize different business-related objective functions during the allocation process. The obtained execution times for the proposed heuristic are smaller of orders of magnitude, compared to the time needed for finding a theoretically optimum solution.

II. RELATEDWORK

The problem of allocating physical resources to workflow applications has been tackled many times in the past, es-pecially in the Grid and service-oriented communities. For example, Mika et al. [3] propose an application model that resembles the one used in this paper. However, end-to-end real-time constraints are not addressed in that work.

In the real-time community, many authors designed archi-tectures for QoS-aware resource allocation and scheduling for distributed applications [4]. However, in this kind of approaches, services are already assigned to hosts, whilst this allocation/mapping is one of the variables of our problem.

In the Cloud Computing, services are encapsulated in Virtual Machines (VMs) and thus scheduling a service may comprise finding enough resources for a VM to be deployed. As an example, the work by Wang et al. [5] addresses the problem of scheduling parallel tasks within a SOA by taking into account multiple resources required by the corresponding VMs. Our work considers multiple resources as well. We assume the presence of underlying scheduling mechanisms able to offer proper guarantees to individual services or VMs [6], [7].

(2)

III. MODEL OFCOMPUTATION ANDNOTATION In this section, the main notational elements used through-out the paper are introduced. The physical resources consti-tuting the provider infrastructure are defined as follows:

• A set of computing nodes: H = {1, . . . , NH} . Each host h ∈ H is characterized by a computing capacity Uh, expressed in terms of availability of processor(s) share, and a memory capacityΩh (in bytes1).

• A set of available subnets: S = {1, . . . , S} . Each subnet is characterized by a maximum aggregate band-width Bs (in bytes/s), and a latency Ls, that depend on the adopted type of medium, packet scheduling algorithm and protocol for QoS assurance.

• The network topology information, specifying what hostsHs⊂ H are connected to each subnet s ∈ S. In this work we focus on the problem of admitting a single real-time application workflowA into the infrastruc-ture. The application is a linear workflow of n services A, {1, . . . , n} , denoted also as (τ1, . . . , τn) . Each service performs some CPU-intensive computation, then transmits some data to the next service in the workflow, which in turn starts its own computations, and so on. Activation requests to the workflow arrive with a minimumT inter-arrival time. The following elements denote computing, memory and networking requirements, as well as timing constraints, for the considered set of applications:

• Computation timeci, j exhibited by each serviceτi of applicationA, if deployed on physical node j ∈ H. • Amount of (volatile or virtual) memory ωi (in bytes)

needed byτi on the node where it will be deployed. • Numbermi of bytes to be transmitted by each service

τi∈ A, to τi+1, each time τi completes an activation, withi ∈ {1, . . . , n − 1}.

• Response-timeρi of each serviceτi ofA.

• End-to-end response-timeρ of the whole application. A. Scheduling

Each service of the application workflow is assumed to be implemented in form of a process being activated for each workflow item that needs to traverse the whole chain of computations. Also, it is assumed that each host is capable of scheduling the CPU(s) by an algorithm achieving temporal isolationamong concurrently running services, and allowing for the allocation of processor shares to the individual tasks. Furthermore, each host must allow for a simple utilization-based admission control test, i.e., ensuring that the allocation of all the services deployed on the same processor (or processor group) does not exceed a maximum capacityUj. Generally speaking, when using resource-reservation [8] schedulers, an allocation is specified for each service in terms of a budget ofqi time units guaranteed every period

1_{This can be volatile or virtual memory. Note that the model can easily}

be extended to consider both of them.

of di time units. It can be shown [9] that the time needed by the service to complete is bounded by: ρi ≤

l_c

i, j

qi

m di. However, if the budget is sufficient to sustain the worst-case execution time (as assumed from here on), then such value simply reduces to di.

It is also assumed that subnets exhibit proper packet scheduling capabilities, so that a precise bandwidthbican be assigned to each data flow needed by τi for transmitting its result (mi bytes) to τi+1, ensuring the temporal isolation among multiple data flows. For example, weighted fair queueing algorithms [10] would meet such a requirement. Note that, as a corner case, a subnet may also represent a point-to-point link, or the local “loopback” connection.

IV. FORMALIZATION OF THE PROBLEM

Using the definitions and notation introduced in Sec-tion III, the problem under study may now be formalized. Let {xi, j: i ∈ A, j ∈ H} represent unknown boolean variables representing allocations of services to hosts:∀i ∈ {1, . . . n} , ∀j ∈ H, xi, j = 1 if τi is deployed on host j and 0 otherwise. Note that, ∀i ∈ A, P

j∈Hxi, j = 1. Also, let {yi, s: i ∈ A\ {n} , s ∈ S} represent the derivative un-known boolean variables (introduced for clarity) represent-ing allocations of services to subnets:∀s ∈ S, yi, s= 1 if τi is deployed on some nodej ∈ Hs. The relationship between the two sets of variables is given by: ∀i ∈ A, ∀s ∈ S, yi, s =P_j∈H_sxi, j. These variables need to be flanked by other ones quantifying the resource allocations performed on each one of the hosts. Finally, let ui = Pj∈Hxi, jci, j/di be the CPU share allocated forτi on hostj.

The end-to-end response-timeρ of the application work-flow to be admitted may now be formalized and constrained to be lower than or equal to the maximum allowed end-to-end valueR, as coming from the SLA:

ρ = X i∈A\{n} di+ mi bi +X s∈S yi, sLs ! + dn≤ R (1) The resource allocation constraints to be respected are:

     P i∈Auixi, j ≤ Uj ∀j ∈ H P i∈A\{n}biyi, s ≤ Bs∀s ∈ S P i∈Aωixi, j ≤ Ωj, ∀j ∈ H ,

whereUj and Bs denote the available resource utilizations at the time of admitting the new application. If the admission test succeeds, then these values are updated as follows:

     Uj= Uj− ui f or j s.t. xi, j= 1 Bs= Bs− bi f or s s.t. yi, s = 1 Ωj = Ωj− ωi f or j s.t. xi, j= 1 (2)

Additionally, in order to ensure that each token is fully processed before the next one arrives along the pipeline, we

(3)

need the further additional constraints (see also [2]): ∀i ∈ A, di≤ T and ∀i ∈ A, m_b_ii +Ps∈Syi, sLs≤ T.

V. ALGORITHMDESCRIPTION

In this section we propose an algorithm for allocating a distributed real-time application A on a set of connected nodes H while respecting an end-to-end constraint R, as previously discussed. The proposed algorithm is capable of returning the following information:

• the Boolean variables {xi,j} that specify whether or not the servicei is allocated on host j;

• the allocation parametersdi andbi for each servicei. From a high-level perspective, the algorithm performs a graph visit, searching for a path capable of hosting the workflow services, given their workload requirements and end-to-end deadlineR. Specifically, a deadline-splitting methodology (see Algorithm 3) is used to compute a tenta-tiveresources requirements along the various services. Then, instead of merely failing the allocation when not enough resources are available at a step of the algorithm, we exploit the ability for the subsequent stages of the pipeline to com-pensate for possible shortages in the current step. Therefore, the allocation at each step is tuned so as to ensure that the end-to-end response-time till the current service is within the range [−αn thrR, αthrR] ≡ [−∆n thr, ∆thr], with αn thr andαthr tunable thresholds ranging in[0, 1] ⊂ R.

The main procedure of the proposed algorithm is pre-sented in Algorithm 1. Such procedure tries to find a set of nodes satisfying the workflow requirements, by performing a search on the graphG describing the underlying network. It also takes as input the vertexu describing the beginning host of the search and the indexi representing the service compo-nentτi to be allocated, which is increased at each recursive invocation. Basically, the performed search is based on the well-know Depth-First Search (DFS) method, enhanced by considering the arcs at each node in the order dictated by a specific ordering criterion (see line 7 in Algorithm 1), by which particular business policies can be enforced.

The core of the proposed algorithm can be considered the

SERVICE ALLOCATIONprocedure (see Algorithm 2), that al-locates each service componentτion a vertexj. If τican be allocated onj, an assignment is performed for its parameters diandbiand the graph is updated (line 6) for reflecting such allocation, according to Eq. (2). In particular, the allocation procedure performs an initial assignment(d0

i, b0i) of(di, bi), taking into consideration the minimum requirements of the whole workflow (see Algorithm 3), derived by inflating the minimum allocation (dmin

i , bmini ) due to the arrival rate T of the tokens, by a factor β expressing the ratio between the delay obtained with the minimum allocation and the maximum allowed latencyR. At this time the network over which each service will be allocated is unknown, thus the corresponding delayLs is upper-bounded by the maximum delay among the available networks.

Algorithm 1 WORKFLOW ALLOC(G, u, i) 1: oc←_{get vertices ordering criterion()}

2: if all services are allocated then

3: return true

4: end if

5: for allj ∈ Adjacent(u) do

6: if mark[j]=WHITE then

7: queue ← push(j, oc)

8: end if

9: end for

10: while queue is not empty do

11: j ← pop()

12: if SERVICE ALLOCATION(i, j)=false and j satured then

13: mark[j] ← BLACK

14: else if WORKFLOW ALLOC(G, j, i+1) = true then

15: return true

16: end if

17: end while

18: return false

Algorithm 2 SERVICE ALLOCATION(i, j) 1: (d0

i, b0i) ← INIT PARAMS(i)

2: tcij←d0_i− ci U_j 3: tsij←m_b0i i −mi rj

4: if VERIFYREQUIREMENTS(i) = true then 5: (di, bi) ← ASSIGNMENTS(i, j) 6: update status() 7: return true 8: else 9: return false 10: end if

After the initial allocation, Algorithm 2 calculates the residual times tcij andtsij, respectively referring to com-puting and networking power, that are positive if hostj has enough resources to sustain the allocation of servicei. As an example, iftcij = 1 it means that the host, after allocating a CPU share for sustaining a service response equal tod0

i, can allocate additional shares for sustaining at most a response equal to 1 time unit. The same reasoning can be done for tsij, except that in this case rj represents the remaining outbound bandwidth of j.

The computation oftsij andtcij, along with the residual ∆t(i−1) _{coming from previous allocations (please note that} for service i = 1, ∆t(0) _{= 0), permits to decide if the} service i can be allocated on host j. In particular, the

VERIFYREQUIREMENTSsubprocedure calculates such quan-tity∆t(i)tmp, tsij+ tcij+ ∆t(i−1)and denies the allocation (by returning FALSE) if∆t(i)tmp< 0 ∧ |∆t

(i)

tmp| > ∆tn thr. In case ofi = n, it is sufficient that ∆t(i)tmp< 0 for denying the allocation, because a deficit on the last service allocation cannot be compensated further.

Once verified the timing requirement for allocating the service τi on the host j, the ASSIGNMENTS procedure is performed for assigning the pair (di, bi) that could poten-tially differ from the initial assignment (d0

i, b0i). The ratio behind this procedure, not detailed due to space constraints,

(4)

Algorithm 3 INIT PARAMS(i) 1: N ← 2n − 1 2: (dmin i , b min i ) ← (T, mi/T ) 3: β ← N T R−(n−1) maxs∈S{Ls} 4: ifβ > 1 then 5: (d0 i, b0i) ← (d min i /β, β b min i ) 6: else 7: (d0 i, b0i) ← (dmini , bmini ) 8: end if 9: return (d0 i, b0i) Table I

ALLOCATION STATISTICS BY USINGDRUCRITERION.

(αn thr, admitted used allocated util. on used avg exec αthr) wf(%) hosts(%) util.(%) hosts(%) time(ms)

(0, 0) 0.610 0.816 0.452 0.555 0.962 (0.1, 0) 0.666 0.874 0.507 0.579 0.941 (0.2, 0) 0.703 0.910 0.551 0.605 0.922 (0.1, 0.1) 0.653 0.882 0.562 0.639 0.878 (0.2, 0.2) 0.500 0.726 0.512 0.599 0.732 (0.2, 0.1) 0.676 0.906 0.591 0.653 0.867 is keeping∆t(i) _{constrained in the range}_[−∆t

n thr, ∆tthr]. Briefly, if the condition∆t(i)tmp > ∆tthr is not verified, all the remaining resources of j can be assigned to τi. Vice versa, we only allocate resources needed for obtaining a temporal surplus equal to∆tmis, ∆tthr− ∆t(i−1).

VI. EXPERIMENTS

This section describes a set of experiments performed for evaluating the performance of the proposed algorithm. In particular, three workflows have been subsequently submit-ted to the algorithm for deciding about the admission on a sample grid of 5 homogeneous hosts, each one with a network capacity of 100M b/s and a maximum utilization Uj = 0.95. Each workflow is composed of three services, whose requirements have been randomly generated. A total of100 repetitions have been performed for different values ofαn thr andαthr and by varying the criterion for ordering hosts during the graph visit phase.

Initially, we have evaluated our algorithm by ordering the hosts using the decreasing residual utilization (DRU) criterion and the statistics are reported in Table I.

We repeated the experiment (results are not reported due to space constraints) by ordering the hosts with the increasing residual utilization(IRU) criterion and we noticed that it leads to higher server consolidation levels, with the use of generally a lower number of hosts that tend to be more saturated. In both cases, it can be well appreciated the impact of theαthr andαn thr parameters of the allocation heuristic on the achieved results.

Regarding the average execution times of the algorithm, they are in both cases in the order of ms. It could be interesting to note that the former optimum problem [2] applied on a comparable case study has been solved by a commercial product requiring a time in the order of seconds.

Please note that this comparison is quite rough, as the model used in this paper has been simplified with respect to the former one. A more thorough comparison is deferred to future work.

VII. CONCLUSIONS ANDFUTUREWORK

In this paper, a heuristic based on graph-matching was proposed to solve the problem of allocation of distributed services over a heterogeneous network. From the presented preliminary results, the heuristic promises to be very ef-ficient so as to be usable for on-line allocation decisions to be taken in dynamic SOA environments. In the future, a more extensive evaluation needs to be performed on the proposed technique, especially considering large networks of resources with high numbers of workflows to be deployed.

REFERENCES

[1] T. Cucinotta, F. Checconi, G. Kousiouris, D. Kyriazis, T. Var-varigou, A. Mazzetti, Z. Zlatev, J. Papay, M. Boniface, S. Berger, D. Lamp, T. Voith, and M. Stein, “Virtualised e-learning with real-time guarantees on the irmos platform,” in

Proceedings of the IEEE International Conference on Service-Oriented Computing and Applications (SOCA 2010), Perth, Australia, 12 2010, pp. 1–8.

[2] K. Konstanteli, T. Cucinotta, and T. Varvarigou, “Optimum allocation of distributed service workflows with probabilistic real-time guarantees,” Service Oriented Computing and

Ap-plications, vol. 4, pp. 229–243, 2010.

[3] M. Mika, G. Waligra, and J. Wglarz, “Modelling and solving grid resource allocation problem withnetwork resources for workflow applications,” Journal of Scheduling, vol. 14, pp. 291–306, 2011.

[4] K. Nahrstedt, H.-h. Chu, and S. Narayan, “Qos-aware re-source management for distributed multimedia applications,”

J. High Speed Netw., vol. 7, pp. 229–257, December 1998. [5] L. Wang, G. von Laszewski, M. Kunze, and J. Tao, “Schedule

distributed virtual machines in a service oriented environ-ment,” in Proceedings of the 2010 24th IEEE International

Conference on Advanced Information Networking and Ap-plications, ser. AINA ’10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 230–236.

[6] T. Cucinotta, A. Mancina, G. F. Anastasi, G. Lipari, L. Mangeruca, R. Checcozzo, and F. Rusina’, “A real-time service-oriented architecture for industrial automation,” IEEE

Trans. on Industrial Informatics, vol. 5, no. 3, Aug 2009. [7] T. Cucinotta, G. F. Anastasi, and L. Abeni, “Respecting

temporal constraints in virtualised services,” in Computer

Software and Applications Conference, 2009. COMPSAC ’09. 33rd Annual IEEE International, vol. 2, July 2009, pp. 73–78. [8] C. W. Mercer, S. Savage, and H. Tokuda, “Processor capacity reserves for multimedia operating systems,” Carnegie Mellon University, Pittsburgh, Tech. Rep. CMU-CS-93-157, 1993. [9] L. Palopoli, T. Cucinotta, L. Marzario, and G. Lipari,

“AQu-oSA — adaptive quality of service architecture,” Software –

Practice and Experience, vol. 39, no. 1, pp. 1–31, 2009. [10] J. C. R. Bennett and H. Zhang, “Hierarchical packet fair

queueing algorithms,” IEEE/ACM Transactions on

Network-ing, vol. 5(5), pp. 675–689, October 1997.