
Performance evaluation of replication policies in microservice based architectures

Marco Gribaudo ²

Politecnico di Milano, via Ponzio 34/5, 20133 Milano (Italy)

Mauro Iacono ³

Università degli Studi della Campania "L. Vanvitelli", viale Lincoln 5, 81100 Caserta (Italy)

Daniele Manini ¹

Università degli Studi di Torino, corso Svizzera 185, 10129 Torino (Italy)

¹ manini@unito.it
² gribaudo@elet.polimi.it
³ mauro.iacono@unicampania.it

Abstract

Nowadays applications tend to be executed on distributed environments provisioned using on-demand infrastructures. The use of techniques such as application containers simplifies the orchestration of complex systems. In this context, microservice based architectures offer a promising solution for what concerns software development and scalability. In this paper, we propose an approach to study the automatic scalability of microservices architectures deployed in public and private clouds. A Fluid Petri Net model describes the characteristics of the platform, and a real trace drives the approach to consider a realistic scenario. Our focus is on evaluating the performances, costs and energy consumptions from both the service provider and infrastructure provider point of view.

Keywords: Performance evaluation, microservices.

1 Introduction

The need for a fast, short development cycle and an agile management of Software-as-a-Service (SaaS) applications has led to the rise of the Microservice-Based Software Architecture (MBSA). This paradigm is based on the parcelization of complex software applications into a high number of small software services, namely MicroServices (MSs). This approach has been proposed by the industry, as a result of the application of the concepts of the Service Oriented Architecture (SOA) to the design and implementation of cloud provisioned software. SOA concepts, abstracted from the SOA implementation issues, lead to a definition of an application as an interaction (basically, a workflow) of several independent software units that provide self-contained logical functionalities. Each software service is developed and managed independently from the others, but offers a well defined and known interface to other services, thus decoupling the overall design and life cycle of an application from the design and life cycle of each service. In the MBSA approach, this is, in principle, pushed towards a higher granularity of services per application. Three advantages characterise this approach: i) services are very simple, so that they can be developed and managed by very small teams with very short cycles (compatible with DevOps and, in general, agile development approaches); ii) the management of a service implies a small amount of knowledge, and the development team may easily compensate a high-frequency turnover of its members; iii) the resulting architecture is, in principle, highly scalable, as every service may be executed in a variable number of instances according to the instant needs generated by workload fluctuations, and fault tolerant, as a fault in a service does not cause a failure of the application, and compensation is easy by using another instance of the same service. A MS is an isolated, loosely-coupled unit of development that works on a single concern. This usually means that MSs tend to avoid interdependencies: if one MS has a hard requirement for other MSs, then the point is to understand whether it makes sense to make them all part of the same unit. A monolithic application is split into several components that can run independently and may be implemented with different programming languages. The resulting independent programs are executable by themselves; these smaller components are then grouped together to deliver all the functionalities of the monolithic application. The cloud infrastructure supporting the execution and the algorithms defining resource assignment to MSs should be optimized in order to keep the overall service reliable and performing.

In [6] we already presented a modelling technique that allows us to give a general characterisation of MBSAs, in terms of performances and resource utilisation. In this work, we focus on the problems related to scalability with respect to cloud architectures. As cloud architectures are the most likely hosting platforms for MS based applications, the interactions between scaling and cloud management and cost policies should be investigated, in order to understand the implications of design and management decisions. In normal operations on cloud systems, costs depend on the number of Virtual Machines (VMs) that are used per time frame, which is generally defined as one hour: consequently, different management policies may cause different costs on the same platform with the same overall workload, depending on the usage patterns of the available VMs. We propose a modelling approach that allows us to evaluate the effects of different possible scaling policies, and the costs that derive from different VM usage strategies. The modelling approach is presented in two steps, to focus on the two problems and combine different scaling and usage strategies.

The rest of this paper is organised as follows: Section 2 presents the literature describing and evaluating MBSAs in cloud environments; in Section 3 the proposed approach is presented and the model is applied to different case studies; Section 4 draws conclusions and outlines future developments.

2 Related works

MBSA emerged as a solution to support a fast and agile development of large applications made of very simple independent services, each with its own separate development and maintenance cycle and team. An application is then composed of a set of (not exclusively) integrated microservices, that interact via HTTP or socket based messaging, and is executed, in general, on a cloud system. A commonly agreed definition of MBSA has been presented for the first time in [1], while a more general point of view on this model and its implications has been presented in [4], which is a suitable reference for a first approach to the theme. A very comprehensive discussion of all the aspects of the architecture and the typical workload, and a comparison between a monolithic and a MBSA application in different implementations and cases, are available in [20]. The problems of costs and development cycle management have been analyzed in [21] and [22], with a comparison between MBSA, monolithic and AWS Lambda based solutions. In [7] it is discussed how microservices support scalability in terms of both runtime performance and development performance, via polyglot persistence, eventual consistency, loose coupling, open source frameworks, and continuous monitoring for elastic capacity management. The use of containers has been considered in [10], where scalability issues for the Docker technology are evaluated. A similar study has been developed in [12], which identifies the challenges for a full development of container based systems. In both [8] and [5], the operating conditions are analyzed: the former describes a proposal for resilience testing, while the latter formulates a proposal for decentralized autonomic behaviour in MS infrastructures. From the applications point of view, among the works presented in the literature, [15] and [3] are an interesting starting reference for readers, since they give a clear and systematic picture of the application scenario. [17] presents general autoscaling techniques in cloud environments. In [18] the authors propose an automated approach for the selection and configuration of cloud providers for multi-cloud MS-based applications. [13] presents benchmark results that quantify the impacts of containers, software defined networking, and encryption on network performance. In [14] the authors present an approach to model the deployment costs, including compute and IO costs, of MS-based applications deployed to a public cloud. [19] proposes a novel architecture that enables scalable and resilient self-management of MS applications on clouds. An interesting and current state-of-the-art review of cloud container technologies is reported in [16]. Finally, [2] is suggested as a more extensive reference list.

3 Modelling approach

The goal of our modelling process is to evaluate the performances, costs and energy consumptions related to the provisioning of a MBSA from both the service provider and infrastructure provider point of view. In particular, we divide the evaluation into two parts: first we consider the autoscaling strategy, then we focus on the provisioning scheme. The autoscaling strategies will be described using pseudo-code algorithms, while the provisioning schemes will be modelled with Fluid Stochastic Petri Nets (FSPNs).

The main purpose of an autoscaling strategy is deciding when to increase or decrease the amount of resources used to support an application. In this work, we will consider a static strategy that, based on the current workload λ, decides how many instances of each MS are required to execute the application in a stable way. Each strategy is described by a set of tuples S = {C_i}, where each element C_i = (λ_L, λ_U, n, m, A) defines an infrastructure configuration. Given the current workload λ, the system must be in the configuration C_i such that λ_L ≤ λ < λ_U (λ_U is the upper bound and λ_L is the lower bound). Whenever the workload exceeds λ_U, the system moves to configuration C_{i+1}; if the workload becomes lower than λ_L, the system switches to configuration C_{i−1}. To avoid alternating behaviours, given two consecutive configurations C_i and C_{i+1}, we must have that: λ_U^(i) > λ_L^(i), λ_U^(i+1) > λ_U^(i), λ_L^(i+1) > λ_L^(i) (configuration C_{i+1} must handle a larger workload than C_i), and λ_U^(i) > λ_L^(i+1) (if the system returns from configuration C_{i+1} to C_i, it must not immediately return to C_{i+1})⁴. Element n represents the number of virtual machines over which the application is deployed in configuration C_i, and m = (m_1, …, m_{N_ms}) ∈ ℕ^{N_ms} is a vector whose component m_j represents the number of instances on which MS j is currently replicated. Let us call M = Σ_{j=1}^{N_ms} m_j the total number of instances of MSs used in the considered configuration. Matrix A = |a_jk| ∈ ℕ^{N_ms×n} defines the allocation of microservices on the available VMs: a_jk, 1 ≤ j ≤ N_ms, 1 ≤ k ≤ n, defines the number of type-j MSs currently running on VM k. For the set of configurations S to be valid, we need the following additional constraints: n^(i+1) ≥ n^(i) (the number of provisioned VMs can only increase), ∃ j,k : a_jk^(i+1) > a_jk^(i) (the number of instances of at least one MS must increase in configuration C_{i+1} compared to configuration C_i), and ∀ j, Σ_{k=1}^{n} a_jk = m_j (all MS instances must be allocated to a VM).

⁴ We use the notation •^(i) to denote the element • of the tuple corresponding to configuration C_i.

Following [6], we have generated MBSAs randomly. Each topology is composed of a random number N_ms of MSs that follows a Poisson distribution of parameter µ. The MSs themselves are characterised by their average service demand D_k, which is computed as the product of an average number of visits v_k and the average time spent in service at each visit S_k, that is:

$$D_k = v_k \cdot S_k, \quad 1 \le k \le N_{ms}$$

Visits are randomly determined according to a rule that is based on the Zipf law, which is defined using four parameters:

$$v_k = \frac{c}{(k + q + \beta \cdot u)^s} \qquad (1)$$

where k is the index of the considered MS, c is the scale parameter (that defines the order of magnitude of the visits), s is the shape parameter (that defines the ratio between the more popular and less popular MSs), q is the shift parameter (that fine tunes the range of the visits), β is the randomness parameter (if β = 0, visits are determined entirely by the Zipf law, otherwise they are randomly modulated), and u is a random number in the [0, 1] range. In this way, MSs characterised by a lower index k are more popular and receive more visits.

Service demands are instead randomly determined from instances of an Erlang distribution, summarised by a rate parameter γ and a number of stages equal to KS.
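To make the generation procedure concrete, the following Python sketch (our illustration, not code from the paper; numpy assumed, the helper name generate_mbsa is hypothetical) draws one random MBSA: a Poisson-distributed number of MSs, visits modulated as in Eq. (1), and Erlang service times obtained as Gamma variates with integer shape K_S.

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_mbsa(mu, c, s, q, beta, gamma, k_s):
    """Draw one random MBSA demand vector D.

    mu:    mean of the Poisson-distributed number of MSs
    c,s,q: scale, shape and shift parameters of the Zipf law, Eq. (1)
    beta:  randomness parameter of Eq. (1)
    gamma: rate of each Erlang stage; k_s: number of stages K_S
    """
    n_ms = max(1, rng.poisson(mu))      # topology size N_ms
    k = np.arange(1, n_ms + 1)          # MS index: lower k = more popular
    u = rng.uniform(size=n_ms)          # random modulation u in [0, 1]
    v = c / (k + q + beta * u) ** s     # visits, Eq. (1)
    # An Erlang(k_s, gamma) variate is a Gamma variate with integer shape
    s_k = rng.gamma(shape=k_s, scale=1.0 / gamma, size=n_ms)
    return v * s_k                      # D_k = v_k * S_k

# Parameters used in Sec. 3.2
D = generate_mbsa(mu=10, c=6, s=1.5, q=2, beta=1, gamma=25, k_s=4)
```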

3.1 First step: autoscaling strategies

The considered autoscaling strategies model and evaluate the effects of different allocation and consolidation⁵ policies for MSs, in order to understand the impact of this choice on the usage profile of a set of VMs. In particular, the main point is the evaluation of the influence of allocation and consolidation on relevant indexes, which are, for the goals of our work, the required number of VMs as a function of the workload and the utilisation of the VMs.

⁵ In the following, the term consolidation refers to the execution of two or more MSs on a single VM, in analogy with its current use, which refers to the execution of two or more virtual machines on a single physical machine.

Autoscaling strategies are characterised by two important features: the initial allocation, and the consolidation policies. The initial allocation policy defines the main logic according to which the different MSs are mapped onto available VMs when the system starts the operations. The consolidation policy defines how the MSs are mapped onto available VMs according to the dynamics of the workload, i.e. the growth or decrease of demand for the various instances for the different MSs.

Given a set of autoscaling strategies, it is possible to compare them by looking at the number of VMs needed and their utilisation, as a function of the workload.

We have then fixed a maximum arrival rate the system might have to serve, λ_Max, and used the strategies to generate the corresponding set of configurations S. In particular, we have generated |S| configurations such that λ_U^(|S|−1) ≤ λ_Max < λ_U^(|S|).

In the following, we will consider three autoscaling strategies.

Strategy A: one MS per VM

The first strategy follows a trivial approach, used as a baseline for a comparison with the others: it consists in simply mapping one MS instance on one VM. The initial allocation, for a configuration composed of N_ms MSs characterised by demand vector D, is described in Algorithm 1, where function ones(N_ms) returns a vector of N_ms components, all equal to 1, and eye(N_ms) returns an identity matrix of size N_ms. Parameter U_max represents the maximum average utilisation a VM is allowed to have. Basically, applying the utilisation law, the algorithm determines the maximum workload the initial configuration is able to serve (λ_U^(0)), then it creates the configuration where the number of virtual machines is n = N_ms, i.e. each MS runs as a single instance allocated on a different VM.

Algorithm 1 autoScalingOneMsPerVM.init(D, N_ms, U_max)
1: λ_U^(0) = U_max / max_{1≤k≤N_ms} D_k;
2: return C_0 = ( 0, λ_U^(0), N_ms, ones(N_ms), eye(N_ms) );

Algorithm 2 describes how the next configuration C_{i+1} can be determined from configuration C_i. The extra parameter U_min defines the minimum utilisation the bottleneck server can have before the system can downgrade to the previous configuration. The algorithm first determines the bottleneck MS j in configuration C_i (line 1); then it increases the number of instances of the corresponding MS (line 2), and it determines, using the utilisation law, the new values of λ_L^(i+1) and λ_U^(i+1) (line 3). Since in this strategy each MS instance is allocated on a new VM, the new configuration will be characterised by n^(i+1) = n^(i) + 1 and A^(i+1) = eye(n^(i) + 1).

Algorithm 2 autoScalingOneMsPerVM.consolidation(C_i, D, N_ms, U_min, U_max)
1: j = argmax_{1≤k≤N_ms} ( D_k / m_k^(i) );
2: m^(i+1) = m^(i); m_j^(i+1) = m_j^(i) + 1;
3: λ_U^(i+1) = U_max / max_{1≤k≤N_ms} ( D_k / m_k^(i+1) ); λ_L^(i+1) = U_min / max_{1≤k≤N_ms} ( D_k / m_k^(i+1) );
4: return C_{i+1} = ( λ_L^(i+1), λ_U^(i+1), n^(i) + 1, m^(i+1), eye(n^(i) + 1) );
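As an illustration of Algorithms 1 and 2, here is a minimal Python sketch (ours, with hypothetical names; numpy assumed) that stores each configuration as a plain tuple (λ_L, λ_U, n, m, A) and builds the configuration set for strategy A up to a target workload λ_Max:

```python
import numpy as np

def strategy_a_init(D, u_max):
    """Algorithm 1: one MS instance per VM; m = ones, A = identity."""
    n_ms = len(D)
    lam_u = u_max / np.max(D)      # utilisation law: U = lambda * D
    return (0.0, lam_u, n_ms,
            np.ones(n_ms, dtype=int), np.eye(n_ms, dtype=int))

def strategy_a_step(conf, D, u_min, u_max):
    """Algorithm 2: replicate the bottleneck MS on a fresh VM."""
    _, _, n, m, _ = conf
    j = np.argmax(D / m)           # bottleneck (line 1)
    m = m.copy(); m[j] += 1        # one more replica of MS j (line 2)
    d_max = np.max(D / m)          # new bottleneck demand
    # thresholds by the utilisation law (line 3); A = eye(n+1) as in the paper
    return (u_min / d_max, u_max / d_max, n + 1, m, np.eye(n + 1, dtype=int))

def build_configurations(D, u_min, u_max, lam_max, step):
    """Generate S = {C_i} until lambda_U^(|S|) exceeds lam_max."""
    S = [strategy_a_init(D, u_max)]
    while S[-1][1] <= lam_max:
        S.append(step(S[-1], D, u_min, u_max))
    return S

# demands in ms, hence workloads in jobs/ms
S = build_configurations(np.array([12.0, 6.0, 3.0]), u_min=0.4,
                         u_max=0.8, lam_max=0.5, step=strategy_a_step)
```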

Strategy B: start with one MS per VM, but then try to consolidate MSs on less utilised VMs

The second strategy set has the same initial allocation strategy as the first (Algorithm 1), with one VM for each MS, but it uses consolidation to limit the requests for more VMs when the workload increases: a new VM is requested only when all existing VMs are saturated. This is described in Algorithm 3. In this case, an extra parameter U_C is required, to control how consolidation is operated. In particular, the procedure works in this way: whenever a MS is replicated, it tries to start it on the VM that is currently least utilised. If, after adding the new instance of the bottleneck MS, the VM remains at a utilisation lower than U_C, consolidation is actually performed. Otherwise, if the machine with the smallest workload would jump to a utilisation level greater than U_C after starting the new instance of the considered MS, then consolidation is not performed, and the new instance of the MS is started on a new VM. This avoids saturating a VM with consolidation, and prevents Zeno behaviours that could continuously replicate MSs without actually increasing the capacity of the system. In order to determine which MS to replicate in a configuration, the algorithm first determines the most utilised VM h (line 1), and then the MS j that has the most impact on the workload of that node (line 2). Then it looks for the best candidate VM c where to place the replica of MS j, as the least loaded VM under the new replication scheme m^(i+1) (line 4). Consolidation occurs only if the selected VM c, after hosting MS j, will not exceed the U_C utilisation threshold with the workload that caused the system to enter configuration C_{i+1}, that is λ_U^(i) (line 5). In this case the number of VMs does not change, and the allocation matrix A increases by one unit the element a_jc, corresponding to the start of another instance of MS j on VM c (line 6). Otherwise, a new VM is provisioned, and the new instance of MS j is started on it (line 8); function add1col(A^(i)) adds one extra column to matrix A^(i). The new limits for the workload are again computed by applying the utilisation law to all VMs running in configuration C_{i+1} (line 10).

Algorithm 3 autoScalingConsMSonVMs.consolidation(C_i, D, N_ms, U_min, U_max, U_C)
1: h = argmax_{1≤l≤n^(i)} Σ_{k=1}^{N_ms} a_kl^(i) · D_k / m_k^(i);
2: j = argmax_{1≤k≤N_ms} a_kh^(i) · D_k / m_k^(i);
3: m^(i+1) = m^(i); m_j^(i+1) = m_j^(i) + 1;
4: c = argmin_{1≤l≤n^(i)} Σ_{k=1}^{N_ms} a_kl · D_k / m_k^(i+1);
5: if ( Σ_{k=1}^{N_ms} a_kc · D_k / m_k^(i+1) + D_j / m_j^(i+1) ) · λ_U^(i) < U_C then
6:   A^(i+1) = A^(i); n^(i+1) = n^(i); a_jc^(i+1) = a_jc^(i) + 1;
7: else
8:   A^(i+1) = add1col(A^(i)); n^(i+1) = n^(i) + 1; a_{j,n^(i+1)}^(i+1) = 1;
9: end if
10: λ_U^(i+1) = U_max / max_{1≤l≤n^(i+1)} Σ_{k=1}^{N_ms} a_kl^(i+1) · D_k / m_k^(i+1); λ_L^(i+1) = U_min / max_{1≤l≤n^(i+1)} Σ_{k=1}^{N_ms} a_kl^(i+1) · D_k / m_k^(i+1);
11: return C_{i+1} = ( λ_L^(i+1), λ_U^(i+1), n^(i+1), m^(i+1), A^(i+1) );
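The consolidation step of Algorithm 3 can be sketched in the same style (ours; A is kept as the N_ms × n integer allocation matrix, so that A.T @ (D / m) is the vector of per-VM demands). The resulting function can be plugged into the same configuration-building loop shown for strategy A.

```python
import numpy as np

def strategy_b_step(conf, D, u_min, u_max, u_c):
    """Algorithm 3: replicate the bottleneck MS, consolidating it on the
    least loaded VM if that VM stays below u_c, else on a new VM."""
    _, lam_u, n, m, A = conf
    h = np.argmax(A.T @ (D / m))        # busiest VM (line 1)
    j = np.argmax(A[:, h] * (D / m))    # heaviest MS on it (line 2)
    m = m.copy(); m[j] += 1             # new replica of MS j (line 3)
    per_vm = A.T @ (D / m)              # per-VM demand with the new m
    c = np.argmin(per_vm)               # candidate VM (line 4)
    if (per_vm[c] + D[j] / m[j]) * lam_u < u_c:   # consolidation test (line 5)
        A = A.copy(); A[j, c] += 1      # place the replica on VM c (line 6)
    else:                               # open a new VM for it (line 8)
        col = np.zeros((len(D), 1), dtype=int); col[j] = 1
        A = np.hstack([A, col]); n += 1
    d_max = np.max(A.T @ (D / m))       # utilisation law (line 10)
    return (u_min / d_max, u_max / d_max, n, m, A)
```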

Strategy C: start with MSs consolidated on VMs, and then try to consolidate MSs on less utilised VMs

The third strategy set has the same consolidation strategy as the one presented in Algorithm 3, but it uses a different initial allocation strategy, consolidating the MSs on the VMs even with a small workload. This technique, as described in Algorithm 4, requires an additional parameter α_VM that defines the ratio of the initial number of VMs with respect to the number of MSs (line 1). The algorithm starts by allocating all the MSs on different VMs (line 2), and then it consolidates them until the target number of VMs is reached (line 3). Consolidation is performed by first finding the two least loaded VMs h and l (line 4). Then the MSs running on the two VMs are consolidated on a single VM, and one VM is released (line 6). In this case, the notation A_:h refers to the h-th column of matrix A, and function rem1col(A, l) removes column l from matrix A. At the end, the upper limit to the workload the configuration can handle is computed using the utilisation law (line 8).

Algorithm 4 autoScalingConsMSonVMs.init(C_i, D, N_ms, U_max, α_VM)
1: n^(0) = ⌈α_VM · N_ms⌉;
2: n = N_ms; A = eye(N_ms);
3: while n > n^(0) do
4:   [h, l] = { h ≠ l | ∀t ≠ h, t ≠ l : Σ_{k=1}^{N_ms} a_kt D_k ≥ Σ_{k=1}^{N_ms} a_kh D_k ∧ Σ_{k=1}^{N_ms} a_kt D_k ≥ Σ_{k=1}^{N_ms} a_kl D_k }
5:   (with 1 ≤ h, l, t ≤ n);
6:   A_:h = A_:h + A_:l; A = rem1col(A, l); n = n − 1;
7: end while
8: λ_U^(0) = U_max / max_{1≤l≤n^(0)} Σ_{k=1}^{N_ms} a_kl · D_k;
9: return C_0 = ( 0, λ_U^(0), n^(0), ones(N_ms), A );
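A sketch of the initial consolidation of Algorithm 4 (ours; columns of A are merged until the target count ⌈α_VM · N_ms⌉ is reached):

```python
import numpy as np

def strategy_c_init(D, u_max, alpha_vm):
    """Algorithm 4: start one MS per VM, then repeatedly merge the two
    least loaded VMs until only ceil(alpha_vm * N_ms) VMs remain."""
    n_ms = len(D)
    n_target = int(np.ceil(alpha_vm * n_ms))   # line 1
    A = np.eye(n_ms, dtype=int)                # line 2
    n = n_ms
    while n > n_target:                        # line 3
        load = A.T @ D                         # per-VM demand (m = ones here)
        h, l = np.argsort(load)[:2]            # two least loaded VMs (line 4)
        A[:, h] += A[:, l]                     # merge VM l into VM h (line 6)
        A = np.delete(A, l, axis=1)            # rem1col(A, l)
        n -= 1
    lam_u = u_max / np.max(A.T @ D)            # utilisation law (line 8)
    return (0.0, lam_u, n, np.ones(n_ms, dtype=int), A)
```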

3.2 Evaluating the autoscaling strategies

We have generated several MBSA demands using the following parameters: µ = [5 . . . 100], c = 6, s = 1.5, q = 2, β = 1, γ = 25 and K_S = 4. The thresholds used are … and U_C = 0.8. Each MBSA is characterised by a different number of MSs (N_MS) and a different demand vector D. Figure 1 shows how the demands change for the different MSs, for five MBSAs with µ = 10 (a) and µ = 50 (b). As can be seen, although the Zipf law makes the demand decrease as the id of the MS increases, the randomness introduced by setting parameter β = 1 prevents them from being a monotonic function.

Fig. 1. Demands characterising two generated MBSA: a) µ = 10, b) µ = 50.

Figure 2 shows the evolution of the number of VMs required to handle a given workload λ for topologies generated with a different average number of MSs, µ = [5 . . . 100]. To show the differences between applications generated with the same µ, the plot shows three different traces for µ = 10. The assumption that MSs are initially deployed on different VMs makes the curves depend on the total number of MSs for very small workloads. Instead, as the workload increases, the number of VMs tends to be proportional to the total demand of the system, D = Σ_{i=1}^{N_MS} D_i. Due to the Zipf assumption, D increases very slowly as a function of µ. For these reasons, in the following we will focus only on two sample MBSAs, generated respectively with µ = 10 and µ = 100.

Fig. 2. Required VMs for Strategy A (no consolidation) as a function of the workload, for different average numbers of MSs µ. For the case µ = 10, the values for three different traces are shown.

The three proposed scaling strategies are compared in Figures 3 and 4. When the workload is high compared to the capacity of the servers and a large number of instances is required to handle the requests, all policies behave more or less in the same way. Instead, when the traffic is low, consolidation policies are much more efficient than the non-consolidating one: this is particularly evident in Figure 3b, for the case with µ = 100, and in the zoom provided in Figure 4, which considers a MBSA with N_MS = 9. However, from Figure 4 it is also clear that as soon as traffic increases a bit, the differences between the three policies become less and less evident, and with relatively high workloads (Figure 3a), strategy A, which does not perform consolidation, performs even better than some of the other policies.

Fig. 3. Comparison of the different scaling strategies: a) µ = 10, b) µ = 100.

Fig. 4. Comparison of the different scaling strategies for a MBSA with N_MS = 9 and light workload.

The effect of the different policies on the utilisation of the VMs is studied in Figure 5.

The utilisation of a VM h is computed considering the total demand of the selected VM in the first configuration C_i that can handle the target workload λ, and applying the Utilisation Law:

$$U_h(\lambda) = \lambda \sum_{k=1}^{N_{ms}} \frac{a_{kh}^{(i)} \cdot D_k}{m_k^{(i)}}, \qquad i : \lambda_U^{(i-1)} < \lambda \le \lambda_U^{(i)} \qquad (2)$$

As can be seen, non-consolidating policies (Figure 5a and d) provide a more even sharing of the workload among the configurations. Instead, consolidation creates a large variability in the minimum utilisation (Figure 5c and f): this is caused by the fact that when a new VM is started, it is usually assigned a MS that already has a large number of replicas, and thus it receives a small amount of requests. This, however, is almost immediately corrected, since the newly introduced VMs are targeted to host the consolidated replica of the next MS that becomes the bottleneck. Since for scenario B consolidation occurs only after the initial stage, this phenomenon on the minimum utilisation occurs only when the workload reaches a certain level.
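Eq. (2) translates directly into a small helper (our sketch) that selects the first adequate configuration for a target workload and returns the per-VM utilisations, whose minimum, average and maximum are the quantities plotted in Figure 5:

```python
import numpy as np

def vm_utilisations(lam, S, D):
    """Eq. (2): utilisations of the VMs in the first configuration C_i
    whose upper threshold covers the target workload lam."""
    for _, lam_u, n, m, A in S:
        if lam <= lam_u:
            return lam * (A.T @ (D / m))   # U_h = lam * demand of VM h
    raise ValueError("workload beyond the last configuration")

# e.g. the three curves of Figure 5 at a given workload:
# u = vm_utilisations(0.3, S, D); u.min(), u.mean(), u.max()
```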

To summarise, from the previous analysis we can conclude that it is better to use consolidating policies (strategy C) when either the workload is very low, or the number of MSs is very high. Otherwise, the non-consolidating policy (strategy A) provides similar results in terms of required VMs, but a more uniform evolution in the utilisation of the VMs, leading to more predictable performances.

3.3 Second step: cloud resource management policies

In the second step, the available cloud resources and their management policy are considered. As costs are dictated by the usage of resources in time and volume, the number of VMs in use is the most relevant cost factor. Two main cases are in the scope of our study: private clouds and public clouds.

Fig. 5. Average, minimum and maximum utilisation of the VMs: a,b,c) µ = 10; d,e,f) µ = 100; a,d) Strategy A, no consolidation; b,e) Strategy B, initially one VM per MS, then consolidation; c,f) Strategy C, start consolidated at α_VM = 0.4, then continue consolidating resources.

In the case of private clouds, the infrastructure is owned by the provider and used for running the application: consequently, the cost of a running VM does not depend on a fee connected to the type and number of used resources and the usage time according to a contract, but depends on the real expenses deriving from running the physical system that hosts the VMs. The minimisation of costs is thus connected to power saving, rather than to the limitation of VM usage.

In the case of public clouds, instead, costs are standardised according to the number of VMs used per billing time unit (e.g., one hour): the minimisation of costs is thus connected to a better usage of the minimal number of VMs possible, but with the maximum exploitation of resources in the billing time unit.

The evaluation of the two different cases is performed by means of the FSPN models depicted in Fig. 6 and Fig. 7.

Fig. 6. Model for the description of the provisioning process in private clouds.

Case I: private cloud

We will first consider the case of private clouds (Fig. 6). The current workload of the considered MS is expressed as the number of requests per second it has to serve. In the proposed model, this value corresponds to the marking of place Load. The workload evolves with time, and it might either increase or decrease: this is modelled by the two time-dependent fluid transitions Increase and Decrease. In our study, we will make them fire to make the marking of place Load follow the fluctuations of a publicly available workload trace. The marking of place VMs represents the number of currently deployed VMs. Let us call #P the marking of place P. The acquisition of a new resource is modelled by immediate transition NewVM: in particular, it fires whenever the test arc that connects it to fluid place Load detects that the workload has exceeded the upper threshold λ_U(#VMs) of the configuration C_i with the maximum number of MS instances and a number of VMs n^(i) = #VMs equal to the one currently running (test arc that connects it to place VMs). In the same way, immediate transition Release models the de-provisioning of a VM, and fires whenever it detects that the workload (connection with the inhibitor arc to place Load) is less than the threshold λ_L(#VMs) of the configuration C_{i'} with n^(i') = #VMs virtual machines and the least number of MS instances (test arc that connects it to place VMs). More formally:

$$\lambda_U(n) = \{\lambda_U^{(i)} \in C_i \mid n^{(i)} = n \wedge n^{(i)}+1 = n^{(i+1)}\}, \qquad (3)$$
$$\lambda_L(n) = \{\lambda_L^{(i')} \in C_{i'} \mid n^{(i')} = n \wedge n^{(i')} = n^{(i'-1)}+1\} \qquad (4)$$

VMs are characterised by a startup time T_up and a shutdown time T_down. In these phases, VMs are running, but cannot be used to serve any of the incoming traffic. Since they are running, they consume energy: note that the instances of the MSs running on them are considered to be inactive and do not share the workload with the other instances. The startup phase is modelled by deterministic transition Ready (characterised by firing time T_up) and place Starting. Shutdown is modelled by deterministic transition Off (characterised by firing time T_down) and place ShuttingDown. Both transitions are infinite server, since more VMs can be starting or shutting down at the same time.

Since users in a private cloud pay for the energy required to run the VMs, the corresponding strategy aims at starting up the VMs as late as possible, and shutting them down as early as possible. Note also that the presence of startup times can lead to moments in which the system is unstable, that is, the number of requests arriving per second is greater than the number the system can serve.
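The qualitative behaviour of the net in Fig. 6 can be reproduced with a simple discrete-time simulation (our sketch, not the FSPN solution technique used in the paper; lam_u and lam_l are the threshold functions of Eqs. (3)-(4)):

```python
def simulate_private(trace, lam_u, lam_l, t_up, t_down, n0=1, dt=1.0):
    """Discrete-time sketch of the private cloud model of Fig. 6.
    Returns, per step, the active VMs (#VMs) and the energy-consuming
    VMs (#Starting + #VMs + #ShuttingDown)."""
    vms, starting, stopping = n0, [], []     # markings of the net
    history = []
    for lam in trace:                        # place Load follows the trace
        starting = [t - dt for t in starting]
        vms += sum(1 for t in starting if t <= 0)   # transition Ready fires
        starting = [t for t in starting if t > 0]
        stopping = [t - dt for t in stopping if t - dt > 0]  # transition Off
        target = vms + len(starting)         # VMs provisioned so far
        if lam >= lam_u(target):             # transition NewVM fires
            starting.append(t_up)
        elif lam < lam_l(target) and vms > n0:   # transition Release fires
            vms -= 1
            stopping.append(t_down)
        history.append((vms, vms + len(starting) + len(stopping)))
    return history
```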

Fig. 7. Model for the description of the provisioning process in public clouds.

Case II: public cloud

Next, we describe VM provisioning when using public clouds (Fig. 7). Workload evolution and provisioning of VMs work exactly as for private clouds (places Load, Starting and VMs; transitions Increase, Decrease, NewVM and Ready). De-provisioning, instead, accounts for the fact that once a VM has been acquired, it is charged in a time-quantised way: for example, even if a VM is used for just a few minutes, it is charged for an entire hour. For this reason, once a VM has been started, it can be used to improve the performance of the application, even if not strictly necessary to support the current workload. The VM can then be released at the end of the billing period if the workload has not increased in the meantime. This process is modelled by the infinite server deterministic transition EndTS, whose firing time T_bill corresponds to the length of the billing interval applied by the public cloud provider. As soon as a VM is ready, every T_bill it is checked whether it should be kept or released: transition EndTS fires and moves the token corresponding to the VM to place DeprovChk, to check whether it could be released or should be kept. If the current workload is less than λ_L(#VMs) (the inhibitor arc that connects to fluid place Load), then the VM can be released and immediate transition Release fires. If, instead, the workload is greater than λ_L(#VMs), immediate transition Keep fires (thanks to the test arc that connects it to fluid place Load). In this case, the token is immediately returned to place VMs, keeping the number of provisioned VMs constant. When a VM is released, a token is moved to place ShuttingDown, where it remains for the shutdown time modelled by transition Off: this behaviour is identical to the one seen in private clouds.
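The EndTS / Keep / Release logic can be sketched as follows (ours; one countdown timer per provisioned VM plays the role of the tokens cycling through EndTS and DeprovChk):

```python
def billing_step(lam, vms, timers, lam_l, t_bill, dt=1.0):
    """One step of the public cloud de-provisioning of Fig. 7: when a
    VM's billing period expires, release it only if the workload is
    below lam_l(#VMs); otherwise keep it for another period."""
    kept, released = [], 0
    for t in timers:                       # one timer per running VM
        t -= dt
        if t > 0:
            kept.append(t)                 # billing period still open
        elif lam < lam_l(vms - released):  # transition Release fires
            released += 1                  # token moves to ShuttingDown
        else:                              # transition Keep fires
            kept.append(t_bill)            # token returns to place VMs
    return vms - released, kept
```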


3.4 Evaluation of the cloud resource management policies

We can now use the previously presented models to evaluate the related performance metrics. Interesting indexes are the energy cost (private cloud case) or billing cost (public cloud case) that characterise a policy, the response time of the MSs, and the stability of the system as its workload evolves. In particular, we use the sets of configurations S = {C_i} defined in Sec. 3.1 to define the thresholds of the test and inhibitor arcs of the FSPNs presented in Sec. 3.3.

Fluid transitions Increase and Decrease are programmed to mimic a realistic variable stream of requests. In particular, they follow the traffic data of the Olympic Web site from February 9, 1998 through February 16, 1998, as presented in [11], and scale such trace with an appropriate constant to bring it into the workload range intended for the type of MBSA considered in this work. Figure 8 shows (on the right axis) the number of requests per second Λ(T) at time T used in our experiments. Let us call #P(T) the marking of place P (either fluid or discrete) at time T. The marking of the FSPN models evolves such that:

$$\#Load(T) = \Lambda(T) \qquad (5)$$

We start by focusing on the private cloud case modelled in Figure 6. Figure 8 also shows the number of VMs required to handle the considered workload (marking #VMs(T)) when the setup and de-provisioning times of the VMs are supposed to be negligible (T_up = T_down = 0). The considered MBSA has been generated with the parameters given in Sec. 3.1 and µ = 10. Following the discussion in Sec. 3.2, strategy A uses the largest number of VMs, while strategy B has a lower adaptation speed when the requests decrease.

Fig. 8. The system workload coming from the Olympic Web site from February 9, 1998 through February 16, 1998, used as a guideline to define a variable workload (right axis), and the minimum number of VMs required according to the three considered autoscaling scenarios (left axis) in the private cloud scenario.

Next we consider the effects of provisioning and release times by setting T_up = 40 min. and T_down = 2 h. Such values have been slightly enlarged compared to the actual durations that can be experienced in common cloud solutions, to emphasise their effect. Results for the different autoscaling scenarios are shown in Figure 9 for the time range T ∈ [5000, 8000]. The curve called Target refers to the ideal case with negligible startup and shutdown times; curve Active corresponds to the number of VMs currently running (#VMs(T)), while line Current refers to the total number of VMs that are currently consuming energy (#Starting(T) + #VMs(T) + #ShuttingDown(T)).

Fig. 9. Effects of VM startup and shutdown times on the number of active and currently running VMs for the considered autoscaling scenarios.

The presence of startup and shutdown times creates a delay which in some cases might make a VM available to support the MSs only after the workload has already reduced. For this reason, policies with a lower hysteresis, such as strategy B, provide better support to the variable workload.

To further investigate the effects of the autoscaling strategies and of the provisioning and de-provisioning times, we compute the average response time as a function of time. In particular, we treat the MBSA as a separable queueing network, and we compute its average system response time R(T) as:

$$R(T) = \sum_{h=1}^{n^{(i)}} \frac{\sum_{k=1}^{N_{ms}} \frac{a_{kh}^{(i)} \cdot D_k}{m_k^{(i)}}}{1 - \#Load(T) \sum_{k=1}^{N_{ms}} \frac{a_{kh}^{(i)} \cdot D_k}{m_k^{(i)}}}, \qquad i = \#VMs(T) \qquad (6)$$

where the numerator computes the demand of each VM, and the denominator accounts for one minus its utilisation. Results are shown in Figure 10. The policy that replicates each MS on a dedicated VM (strategy A) is the one that gives the users the best response time. On the contrary, the full consolidation policy (strategy C) is affected by the higher load its VMs are experiencing, providing poor performance, which in some cases might lead to system instability (shown in the figure with vertical lines).
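Eq. (6) corresponds to the following helper (our sketch), which also makes the instability condition explicit:

```python
import numpy as np

def response_time(load, conf, D):
    """Eq. (6): separable queueing network response time for the
    configuration selected by #VMs(T); inf marks instability (the
    vertical lines of Figure 10)."""
    _, _, n, m, A = conf
    dem = A.T @ (D / m)            # demand of each VM (the numerator)
    util = load * dem              # utilisation of each VM
    if np.any(util >= 1.0):
        return float("inf")        # a saturated VM: unstable system
    return float(np.sum(dem / (1.0 - util)))
```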

Fig. 10. Response time of a MBSA when run on a private cloud.

We then focus on the power consumption, applying the simple rule given in [9]:

$$P_{VM} = P_{Idle} + U_{VM} \cdot (P_{Max} - P_{Idle}) \qquad (7)$$

where the power consumed by a VM is approximated as a constant contribution (P_Idle), plus a term that depends on the utilisation U_VM and on P_Max, the maximum power that a VM can require. In this context we set P_Idle = 16.5 W and P_Max = 34.5 W, we suppose that VMs that are starting or shutting down require P_Idle W, and we estimate the power consumption P(T) of the MBSA at time T as:

$$P(T) = u \cdot P_{Idle} + \#Load(T) \cdot (P_{Max} - P_{Idle}) \sum_{h=1}^{n^{(i)}} \sum_{k=1}^{N_{ms}} \frac{a_{kh}^{(i)} \cdot D_k}{m_k^{(i)}},$$
$$i = \#VMs(T), \qquad u = \#Starting(T) + \#VMs(T) + \#ShuttingDown(T) \qquad (8)$$

Fig. 11. Power consumption of a MBSA when run on a private cloud.

Results are shown in Figure 11. It is interesting to see that, with the given values, the full consolidation strategy has a higher power consumption with respect to the non-consolidating strategy, even if it uses a lower number of VMs.
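Eqs. (7)-(8) similarly reduce to a few lines (our sketch; the counts of starting and stopping VMs would come from the simulation state):

```python
import numpy as np

def power(load, conf, D, n_starting, n_stopping,
          p_idle=16.5, p_max=34.5):
    """Eq. (8): every powered VM draws P_Idle; serving VMs add a dynamic
    term proportional to their total utilisation, as in Eq. (7)."""
    _, _, n, m, A = conf
    u = n_starting + n + n_stopping               # energy-consuming VMs
    total_util = load * float(np.sum(A.T @ (D / m)))
    return u * p_idle + total_util * (p_max - p_idle)
```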

Finally, we study the public cloud case modelled in Figure 7. Figure 12 shows the number of VMs required to support the MBSA. In particular, curve Active refers to the number of VMs that would be available using the same strategy as in private clouds, while Pay Opt. shows the number of VMs that can be used without increasing the cost, by exploiting the time-slot based billing policy usually applied by providers. In this study, we have set T_bill = 8 h., again slightly larger than the value commonly used by cloud providers, to emphasise its effect. As can be seen, in many cases keeping the VMs active until the next billing period can leave the system ready to accommodate new workload fluctuations, and generally gives more resources than the ones actually needed, allowing a better quality of service to be provided.

Fig. 12. Number of VMs supporting the MBSA in the public cloud case for the three considered autoscaling scenarios (µ = 10).

Figure 13 shows the time that must be purchased from the provider (identified by the term pay) in order to provide the required service. The minimum time that could be purchased, if charged at the second level, is instead shown by the curves identified by run. As expected, the actual purchased VM time grows as a step function, due to the quantisation of billing periods. The full consolidation policy, strategy C, is the one that provides the least cost in terms of VM running hours, and the non-consolidating policy is the one that results to be the most expensive.

4 Conclusions and future work

In this paper, we propose a framework able to evaluate the performances, costs and energy consumptions related to the provisioning of a MBSA. The resulting model allows such systems to be studied from both the service provider and the infrastructure provider point of view. In particular, the autoscaling processes are described by pseudo-code algorithms that allow different strategies to be evaluated. It emerges that consolidating strategies work better if the load is rather low or the number of MSs is relevantly high. Otherwise, the non-consolidating policy has the advantage of supplying more predictable behaviours of the architecture.

The infrastructure analysis has been performed by modelling the provisioning schemes with a FSPN in which the available cloud resources and their management policy are considered. In order to evaluate the cost factors affected by resource utilisation, two cases have been compared: private clouds and public clouds. The derivation of interesting indexes, such as the energy cost for the private cloud case and the billing cost for the public cloud case, together with the workload of the VMs and the execution time experienced by the MSs, supplies an accurate analysis of the evolution of MBSAs. In the case of private clouds, the costs are mainly related to power concerns, whereas in public clouds the number of VMs used per billing time unit is the dominant factor.

We also evaluated a MBSA loaded with a realistic variable stream of requests, the traffic data of the Olympic Web site from February 9, 1998 through February 16, 1998. The proposed approach can be easily adapted to describe new scaling strategies and infrastructure cases, providing a flexible tool able to take into account many features of these complex systems.

Fig. 13. VM running hours for the different autoscaling strategies in public clouds.

References

[1] Microservices (a definition of this new architectural term), https://martinfowler.com/articles/microservices.html, accessed: 2017-01-25.

[2] Alshuqayran, N., N. Ali and R. Evans, A systematic mapping study in microservice architecture, in: 2016 IEEE 9th International Conference on Service-Oriented Computing and Applications (SOCA), 2016, pp. 44–51.

[3] Butzin, B., F. Golatowski and D. Timmermann, Microservices approach for the internet of things, in: 2016 IEEE 21st International Conference on Emerging Technologies and Factory Automation (ETFA), 2016, pp. 1–6.

[4] Esposito, C., A. Castiglione and K. K. R. Choo, Challenges in delivering software in the cloud as microservices, IEEE Cloud Computing 3 (2016), pp. 10–14.

[5] Florio, L. and E. D. Nitto, Gru: An approach to introduce decentralized autonomic behavior in microservices architectures, in: 2016 IEEE International Conference on Autonomic Computing (ICAC), 2016, pp. 357–362.

[6] Gribaudo, M., M. Iacono and D. Manini, Performance evaluation of massively distributed microservices based applications, in: Proc. of the 2017 ECMS, 2017.

[7] Hasselbring, W., Microservices for scalability: Keynote talk abstract, in: Proc. of the 2016 ACM/SPEC International Conference on Performance Engineering (ICPE), 2016.

[8] Heorhiadi, V., S. Rajagopalan, H. Jamjoom, M. K. Reiter and V. Sekar, Gremlin: Systematic resilience testing of microservices, in: 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS), 2016, pp. 57–66.

[9] Ho, T. T. N., M. Gribaudo and B. Pernici, Characterizing energy per job in cloud applications, Electronics 5 (2016).

[10] Inagaki, T., Y. Ueda and M. Ohara, Container management as emerging workload for operating systems, in: 2016 IEEE International Symposium on Workload Characterization (IISWC), 2016, pp. 1–10.

[11] Iyengar, A. K., M. S. Squillante and L. Zhang, Analysis and characterization of large-scale web server access patterns and performance, World Wide Web 2 (1999), pp. 85–100.


[12] Kang, H., M. Le and S. Tao, Container and microservice driven design for cloud infrastructure DevOps, in: 2016 IEEE International Conference on Cloud Engineering (IC2E), 2016, pp. 202–211.

[13] Kratzke, N., About microservices, containers and their underestimated impact on network performance, in: Proc. of CLOUD COMPUTING 2015: The Sixth International Conference on Cloud Computing, GRIDs, and Virtualization, 2015.

[14] Leitner, P., J. Cito and E. Stöckli, Modelling and managing deployment costs of microservice-based cloud applications, in: Proceedings of the 9th International Conference on Utility and Cloud Computing, UCC '16 (2016), pp. 165–174. URL: http://doi.acm.org/10.1145/2996890.2996901

[15] Melis, A., S. Mirri, C. Prandi, M. Prandini, P. Salomoni and F. Callegati, Crowdsensing for smart mobility through a service-oriented architecture, in: 2016 IEEE International Smart Cities Conference (ISC2), 2016, pp. 1–2.

[16] Pahl, C., A. Brogi, J. Soldani and P. Jamshidi, Cloud container technologies: a state-of-the-art review, IEEE Transactions on Cloud Computing PP (2017), pp. 1–1. URL: http://ieeexplore.ieee.org/document/7922500/

[17] Kounev, S., Stochastic models for self-aware computing in data centers: Tutorial, in: Proc. of the 2017 ASMTA, 2017.

[18] Sousa, G., W. Rudametkin and L. Duchien, Automated setup of multi-cloud environments for microservices applications, in: Proc. of the 2016 IEEE 9th International Conference on Cloud Computing (CLOUD), 2016.

[19] Toffetti, G., S. Brunner, M. Blöchlinger, F. Dudouet and A. Edmonds, An architecture for self-managing microservices, in: Proceedings of the 1st International Workshop on Automated Incident Management in Cloud, AIMC '15 (2015), pp. 19–24. URL: http://doi.acm.org/10.1145/2747470.2747474

[20] Ueda, T., T. Nakaike and M. Ohara, Workload characterization for microservices, in: 2016 IEEE International Symposium on Workload Characterization (IISWC), 2016, pp. 1–10.

[21] Villamizar, M., O. Garcés, H. Castro, M. Verano, L. Salamanca, R. Casallas and S. Gil, Evaluating the monolithic and the microservice architecture pattern to deploy web applications in the cloud, in: 2015 10th Computing Colombian Conference (10CCC), 2015, pp. 583–590.

[22] Villamizar, M., O. Garcés, L. Ochoa, H. Castro, L. Salamanca, M. Verano, R. Casallas, S. Gil, C. Valencia, A. Zambrano and M. Lang, Infrastructure cost comparison of running web applications in the cloud using AWS Lambda and monolithic and microservice architectures, in: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2016, pp. 179–182.
