A Least Flow-Time First Load Sharing Approach for Distributed Server Farm

Zahir Tari (1), James Broberg (2), Albert Zomaya (3), Roberto Baldoni (4)

Abstract

The most critical property exhibited by a heavy-tailed workload distribution (found in many WWW workloads) is that a very small fraction of tasks makes up a large fraction of the workload, making the load very difficult to distribute in a distributed system. Load balancing and load sharing are the two predominant load distribution strategies used in such systems. Load sharing generally has better response time than load balancing because the latter can exhibit excessive overheads in selecting servers and partitioning tasks. We therefore further explored the Least-Loaded-First (LLF) load sharing approach and found two important limitations:

(a) LLF does not consider the order of processing, and (b) when it assigns a task, LLF does not consider the processing capacity of servers. The high task size variation that exists in heavy-tailed workloads often causes smaller tasks to be severely delayed by large tasks.

This paper proposes a size-based approach, called the least flow-time first (LFF-SIZE), which reduces the delay caused by size variation while maintaining a balanced load in the system. LFF-SIZE takes the relative processing time of a task into account and dynamically assigns a task to the fittest server, i.e. the one with a lighter load and higher processing capacity. LFF-SIZE also uses a multi-section queue to separate larger tasks from smaller ones. This arrangement effectively reduces the delay of smaller tasks by larger ones, as small tasks are given a higher priority to be processed. Performance results obtained with the LFF-SIZE implementation show a substantial improvement over existing load sharing and static size-based approaches under realistic heavy-tailed workloads.

1 RMIT University, Australia

2 RMIT University, Australia

3 University of Sydney, Australia

4 University of Rome, Italy


1 Introduction

Load distribution improves system response time by making full use of the available resources in a distributed system, aiming to maximise system throughput. Due to uneven task arrival, these resources (i.e. CPU, memory, hard-disk space and network interface I/O) are often poorly utilised.

Resource starvation caused by unbalanced load distribution results in poor system response time, and simply adding more resources is a short-term and expensive solution.

There is growing evidence [16,11] of higher variability in task size distributions in computer workloads than the traditionally assumed exponential distribution. This type of task size distribution is called a heavy-tailed distribution, where there are typically many short jobs and fewer long jobs. This distribution is defined as follows

P[X > x] ≈ x^(-α), 0 < α < 2    (1)

where α represents the size variation. The variation of task size increases when α decreases. The empirically measured α value is typically 0.9 - 1.3 [2,9,7]. In practical terms, a heavy-tail task distribution has some very large tasks and a large number of small tasks. These few large tasks can contribute 50% of the total workload. With the presence of very large task sizes, the system load is even more difficult to balance. A very large task can cause the amount of load assigned to a server to be out of proportion to its processing capability. In addition, the high task variation in the heavy-tailed task distribution causes small tasks to be severely delayed by large tasks.

Previous work [18,12,17] showed that better performance can be obtained (under a heavy-tailed distribution) by un-balancing the load. These approaches assume that the task size is known in advance by the dispatcher, which un-balances the load by assigning tasks based on task size: larger and smaller tasks are assigned to different queues in such a way that the mean waiting time per task is reduced. These approaches have an important limitation, because they may assign tasks to heavily loaded servers. Therefore balancing the load amongst servers, in addition to the size-based assignment, may improve the overall performance.

There are two ways to balance the load: load balancing and load sharing. Load balancing assigns tasks proportionally to the processing time of the remaining tasks on servers [16][1]. Load sharing assigns a task to the least-loaded server first (LLF) [13][14][20]. As explained in Section 4.1, load balancing has drawbacks because (i) it has higher overheads for task assignment, and (ii) it has longer mean waiting time in the server queues. Load sharing has better performance than load balancing in the distributed system studied. The question is whether the LLF load sharing approach can be further improved. Two limitations are identified. First, LLF does not consider the order of processing. Second, when it assigns a task, LLF does not consider the processing capacity of servers. This paper proposes a solution, called the least flow-time first (LFF-SIZE), which addresses these two limitations.

LFF-SIZE uses a multi-section queue to reduce the task delay caused by size variation. Larger tasks are separated from the section holding smaller tasks. As smaller tasks are given a higher priority to be processed, the task delay caused by larger tasks is greatly reduced. Also, by normalising a server load with the server's processing capacity, more accurate information about the server load can be obtained, and a performance gain therefore results from a balanced system load. LFF-SIZE dynamically assigns a task to the fittest server in terms of lighter load and higher processing capacity. Experimental tests were performed and, as detailed in Section 6, the results showed a substantial performance improvement of the LFF-SIZE approach over existing approaches (e.g. SITA-E [12]).

The rest of the paper is organised as follows. Section 2 reviews recent work on load distribution in distributed server systems. Section 3 describes the formalisation of the basic load distribution model. Section 4 compares load sharing with load balancing and discusses the performance improvements obtained by addressing the limitations of LLF. A detailed description of LFF-SIZE is given in Section 5. Section 6 describes a series of experiments and demonstrates the performance improvement of LFF-SIZE. Finally we conclude with future work in Section 7.

2 Related Work

The most adverse effect introduced by a heavy-tailed task size distribution is the task delay caused by very large tasks. High size variation makes it harder to balance the waiting time of tasks in a queue. Recent work on load distribution focuses on methods to remove the delay caused by size variation in a distributed server farm. In this section we focus on just a few of these load distribution policies; more extensive reviews are available in [3,19].

M. Crovella et al. [8] proposed a task assignment policy to unbalance load, called SITA-V (Size Interval Task Assignment with Variable Load). SITA-V purposely un-balances the server load between two servers when the task variation is high (α<1): small tasks are assigned to the underloaded server and large tasks are assigned to the overloaded server. When the task variation is very high (α=0.5), SITA-V has a good mean waiting time per task: almost all tasks (a 0.9999 fraction) are on an underloaded server and they are quickly processed. However, the mean slow down of SITA-V is not good, as the very large tasks are processed on a heavily loaded server. Better performance is only achievable when the task variation is very high. When the task variation is either close to the empirically measured range (0.9≤α≤1.3) or not high (α>1.3), SITA-V exhibits poor performance, as the gain in the waiting time to process the majority of small tasks on the lightly loaded server can no longer outweigh the cost of load un-balance on the heavily loaded server.

Harchol-Balter et al. [11] proposed a size-based load balancing approach, called SITA-E (Size Interval Task Assignment with Equal Load). SITA-E defines the size range associated with a server so that the total load assigned to every server is the same. Comparing SITA-E to the least-loaded first policy (LLF), we can observe that LLF is preferable over SITA-E when the task variation is not high, whereas SITA-E is preferable over LLF when the task variation is close to the empirically measured value (α≈1.1). The performance gain of SITA-E indicates that removing the delay caused by size variation provides better performance than LLF when the size variation is high. The performance improvement provided by a load-balanced approach cannot be ignored, however, which explains the better performance of LLF over SITA-E when task variation is low.

Another interesting load balancing approach is the global-class policy proposed by Awerbuch et al. [2]. This approach attempts to balance load by assigning a task to an idle server. One of the problems of this approach is that tasks are not necessarily processed by the fittest server (the one that can complete a task in the shortest flow time): processing a task on an idle server does not guarantee the shortest flow time. For example, consider a server farm with two servers whose processing capacities are in the ratio 1:3. A task of size 1 takes 1 unit of processing time on the low-capacity server and 1/3 on the high-capacity server. Assume the first task is executed on the higher-capacity server, the lower-capacity server is idle, and another task of the same size arrives. The mean flow time is either 2/3 of a unit of time (if the new task is executed on the idle lower-capacity server) or 1/2 (if it is queued on the non-idle higher-capacity server).
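The arithmetic of this example can be checked with a small sketch; the helper function and the task tuples below are illustrative, not from the paper:

```python
def mean_flow_time(assignments):
    """assignments: one (task_size, server_capacity, start_delay) per task.
    Flow time = waiting time (start_delay) + task_size / capacity."""
    flows = [delay + size / cap for size, cap, delay in assignments]
    return sum(flows) / len(flows)

# Task 1 runs on the fast server (capacity 3); a second unit task arrives.
# Option A: send it to the idle slow server (capacity 1).
option_a = mean_flow_time([(1, 3, 0), (1, 1, 0)])      # (1/3 + 1) / 2 = 2/3
# Option B: queue it behind task 1 on the fast server (it waits 1/3).
option_b = mean_flow_time([(1, 3, 0), (1, 3, 1 / 3)])  # (1/3 + 2/3) / 2 = 1/2
```

Option B wins even though its server is busy, which is exactly the case the idle-server policy misses.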

3 Basic Model

This section provides basic concepts which will be used in later sections (e.g. Sections 4 and 5) to describe the proposed load distribution approach.

A heavy-tailed distribution was defined by Equation 1 in Section 1. The heavy-tailed distribution has the properties that (i) it has an infinite variance (and an infinite mean when α≤1), and (ii) there are very few (that is, <1%) very large tasks, but they make up a large portion of the total load. If we assume that the task size distribution has a maximum value, then the task size can be modelled with a well-known distribution, called the Bounded Pareto distribution, which has the following probability density function

f(x) = α k^α x^-(α+1) / (1 - (k/p)^α)    (2)

where k ≤ x ≤ p, α represents the tail index, k is the smallest possible observation, and p is the largest possible observation [12]. Our implementation uses the Bounded Pareto distribution for the task size distribution for ease of analysis.
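Equation 2 can be sampled by inverting its CDF, F(x) = (1 - (k/x)^α)/(1 - (k/p)^α). The sketch below (parameter values are illustrative, not from the paper) also exhibits the heavy-tail property that the largest few tasks carry a disproportionate share of the load:

```python
import random

def bounded_pareto(alpha, k, p, rng=random.random):
    """One Bounded Pareto sample via the inverse-CDF transform."""
    u = rng()
    return k / (1.0 - u * (1.0 - (k / p) ** alpha)) ** (1.0 / alpha)

random.seed(0)
sizes = sorted(bounded_pareto(1.1, 512.0, 1e7) for _ in range(100_000))
top_1pct_share = sum(sizes[-1000:]) / sum(sizes)  # load carried by largest 1%
```

With α around 1.1 the top 1% of tasks typically carries a large fraction of the total load, matching the observation in Section 1.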

The architecture model for the proposed load distribution approach is depicted in Figure 1. A central dispatcher manages load distribution based on specified policies: it partitions the arriving tasks among the servers in the system. All tasks first arrive at the central dispatcher. Each task corresponds to a request for a service provided by the system. There are also n mirrored servers which provide the same set of services. The processing capability of each server is denoted as P_i. The task size follows a heavy-tailed distribution, and task arrivals follow a Poisson process.

We assume the task size is known and that a task cannot be divided into parts. This is a fair assumption for some batch distributed computing servers, where a user assigns an upper bound to their task's processing requirement. In a Web environment (e.g. serving HTTP requests for static web objects) the size of files can easily be known at the dispatcher (using Cisco's LocalDirector [5], for example), enabling the dispatcher to approximate a task's size upon arrival.

[Figure: a central dispatcher assigns tasks t1-t6 to three mirrored servers S1, S2, S3 with processing capacities P1, P2, P3.]

Fig. 1. A Distributed Server Farm

The processing time of a task depends on the processing capability of a server: the higher the processing capability, the shorter the processing time. The processing capability refers to the CPU frequency, memory size and buffer size of a server. The mean processing time (for a server S_k) is then defined by the following function

MeanProcessingTime_k = (Σ_{i=1}^{n_k} TaskSize_i) / (n_k × ProcessingCapability_k)

The flow time of a task in the queue of a server consists of the processing time and the waiting time before the task is processed. The waiting time of a task is the total processing time of the preceding tasks in the queue. The flow time of a task i, and the mean flow time of n tasks in the queue, are defined as

FlowTime_i = WaitingTime_i + ProcessingTime_i = Σ_{k=1}^{i} ProcessingTime_k

MeanFlowTime = (Σ_{i=1}^{n} FlowTime_i)/n = (Σ_{i=1}^{n} (n-i+1) × ProcessingTime_i)/n

A task slows down in the queue as it waits for its turn to be processed. The slow down of a task is defined as its waiting time normalised by its processing time in the queue of a server. The slow down of a task and the mean slow down of n tasks in the queue are defined as

SlowDown_i = WaitingTime_i / ProcessingTime_i = (Σ_{k=1}^{i-1} ProcessingTime_k) / ProcessingTime_i

MeanSlowDown = (Σ_{k=1}^{n} SlowDown_k)/n = (Σ_{k=1}^{n} ((Σ_{j=1}^{k-1} ProcessingTime_j)/ProcessingTime_k))/n

The system response time of a task runs from the time the task enters the system until the time it leaves, and consists of TaskAssigTime and FlowTime. The task assignment time is the time taken by the load distribution algorithm to allocate the task to one of the participating servers, that is, from the time the task arrives in the system to the time the task is assigned to one of the servers. TaskAssigTime depends on (i) the number of tasks arriving in a time interval and (ii) the complexity of the task assignment algorithm. The FlowTime of a task includes WaitingTime and ProcessingTime.

If we assume there are n tasks arriving in the i-server system in a period of time t, and the numbers of tasks assigned to the servers are k_1, k_2, ..., k_i, then the average system response time of the task set is defined as

TaskAssigTime(n) + AverageFlowTime
= TaskAssigTime(n) + [(Σ_{j=1}^{k_1} (k_1-j+1) × ProcessingTime_j)/k_1 + (Σ_{j=1}^{k_2} (k_2-j+1) × ProcessingTime_j)/k_2 + · · · + (Σ_{j=1}^{k_i} (k_i-j+1) × ProcessingTime_j)/k_i]/i
= TaskAssigTime(n) + (Σ_{p=1}^{i} (Σ_{j=1}^{k_p} (k_p-j+1) × ProcessingTime_j)/k_p)/i

If there are n tasks of varied size in the queue of a server S_i and the processing capacity of the server is P_i at time t_i, then the load of the server at t_i is defined as

ServerLoad_i = (Σ_{k=1}^{n} TaskSize_k)/P_i
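The Section 3 queue metrics can be computed directly from a server's per-task processing times. A minimal sketch (the function name is ours):

```python
def queue_metrics(processing_times):
    """Mean flow time and mean slow down of one FIFO queue."""
    flows, slows, elapsed = [], [], 0.0
    for pt in processing_times:
        slows.append(elapsed / pt)  # SlowDown_i = WaitingTime_i / ProcessingTime_i
        elapsed += pt               # elapsed now equals FlowTime_i
        flows.append(elapsed)
    n = len(processing_times)
    return sum(flows) / n, sum(slows) / n

mean_flow, mean_slow = queue_metrics([1.0, 2.0, 3.0])
# flow times [1, 3, 6] -> mean 10/3; slow downs [0, 0.5, 1.0] -> mean 0.5
```

Note how a single large task at the head of the queue inflates the slow down of every task behind it, which is the effect the later sections set out to remove.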

Given the above distributed server system, we are interested in finding the best load distribution policy. Three load distribution policies are commonly used, i.e. random, size-based, and the least-loaded first.

• Random: Tasks are statically assigned to each participating server with equal probability.

• Size-based: In size-based task assignment, the task size distribution is broken into a number of size ranges. Each participating server is designated to execute tasks of a certain range, and tasks are statically assigned based on task size, e.g. SITA-V [18], SITA-E [12].

• The least-loaded first (LLF): dynamically assigns a task to the least loaded of the participating servers. It requires the workload of the participating servers to be available to the dispatcher.
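The three policies can be sketched as dispatcher functions returning a server index; the function names and the cutoff representation for the size-based policy are our own:

```python
import random

def random_policy(n_servers, rng=random):
    """Random: each server is picked with equal probability."""
    return rng.randrange(n_servers)

def size_based(task_size, cutoffs):
    """Size-based (SITA-style): server i serves sizes below cutoffs[i];
    tasks at or above the last cutoff go to the last server."""
    for i, cut in enumerate(cutoffs):
        if task_size < cut:
            return i
    return len(cutoffs)

def least_loaded_first(server_loads):
    """LLF: pick the server with the minimum outstanding load."""
    return min(range(len(server_loads)), key=server_loads.__getitem__)
```

Only LLF needs dynamic load information at the dispatcher; the other two are static, which is the distinction the next section builds on.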

4 Load Distribution in Distributed Server Farm

As explained earlier, unbalancing load often does not provide good performance in practical computer systems. In an extended paper [19] we compared two global approaches, load sharing and load balancing, to evaluate the performance of different load distribution strategies in a distributed server farm. Under the empirically measured task size distribution, simulation results demonstrate that load sharing has a better response time than load balancing.

The static random policy has poor performance in mean waiting time, mean slow down and mean flow time compared to the two dynamic policies, i.e. load sharing and load balancing. The performance of the random policy degenerates when the server load is very high (1.0≤ρ≤1.6). First, owing to the heterogeneous processing capacities of the servers, the policy should assign more tasks to the servers with higher processing capacity to fully utilise their resources. Second, owing to the size variation in the task distribution, assigning tasks of different sizes to the servers with equal probability leads to load imbalance among the servers. The study also shows that load balancing has a much higher overhead in task assignment (that is, a higher mean dispatch time) than load sharing. The reason is that load balancing involves sorts and searches in selecting servers and allocating tasks to each participating server.

Load sharing (LS) performs better in all metrics than load balancing (LB). To understand this result, we need to take a closer look at the way load balancing assigns tasks. As described in [19], load balancing partitions the current set of tasks arriving in the system among all the participating servers.


If the system has more tasks than the number of participating servers, load balancing allocates more than one task to each server at a time. As tasks are processed in FIFO order at each server, all but the first task are queued, and the queued tasks start waiting as soon as the first task starts being processed. Furthermore, the assignment overhead of load balancing increases the probability of having more tasks buffered in the dispatcher each time a task assignment decision is made. So load balancing has a much higher chance than load sharing of assigning a consecutive run of tasks to the same server, which causes the tasks assigned by load balancing to have a longer mean waiting time in the queues.

Load sharing does not incur so much overhead in task assignment: each task is assigned to the least loaded server in a comparably much shorter time. If a server happens to be the least loaded for a consecutive run of tasks, those tasks arrive one after the other, and each starts queuing while the preceding task is in the middle of processing. Therefore, load sharing is a better approach than load balancing in terms of system response time. From here onwards, load balancing is not studied further, and load sharing is investigated in more depth.

4.1 Improving Performance by Considering the Limitations of Load Sharing

From the performance results of the previous section, the least-loaded first (LLF) load sharing approach outperforms the load balancing approach by assigning a task to the server with the minimum waiting time. However, the question remains whether least-loaded first (LLF) is the best load distribution policy in a heterogeneous server farm system. This section identifies and explains in depth two limitations of the LLF load sharing approach, and proposes solutions to address them. First, LLF does not consider the order of processing. Second, when it assigns a task, LLF does not consider the processing capacity of servers.

(a) Optimising Local Scheduling: the Order of Processing. One of the limitations of LLF concerns the task delay caused by size variation. This falls into queuing theory: the mean waiting time of tasks in a single queue is optimal if the tasks are processed in non-decreasing order of remaining processing time, as in SRPF (Smallest Remaining Process First) [6]. There are at least three possible solutions to the task delay problem. One solution is to sort the tasks. As the tasks have to be processed online, it is not acceptable to delay the tasks in the queue in order to process all the tasks allocated to a server in non-decreasing order, so the ordering can only apply to the current set of tasks in the queue. However, as tasks arrive randomly, dynamic insertions of new tasks are required to maintain the order of the tasks in the queue. The cost of insertion is linear in the number of tasks in the queue if a naive data structure is used; it can be reduced with better data structures, but a cost is still incurred for each incoming task. Performance degrades if the insertion cost outweighs the saving from task ordering. In a heavy-tailed distribution, small tasks contribute around 80% of the total task count, and the size variation among them is small; it is likely to cost more to order this large number of small tasks than is saved by processing them in non-decreasing order. Furthermore, there are chances that incoming tasks are smaller than the task currently being executed. If a task cannot be stopped once it starts executing (i.e. tasks are non-preemptive), then small tasks can still be delayed by large tasks.

The second approach is to preempt the tasks, which does not require ordering. This approach can be implemented in several ways, such as time-sharing: a processor allows a task a limited amount of CPU time. If the task completes within that time, the processor moves on to the next task in the queue. Otherwise, the task is suspended, the stacks and buffers associated with it are saved, and the task joins the end of the queue. When its turn comes to be processed again, the processor first loads the saved parameters into stacks and buffers and executes the remainder of the task. The cost of a context switch can be expensive, so preemption should only be used when its savings outweigh its cost. In a heavy-tailed task distribution, a few very large tasks contribute over 50% of the total task size, and the saving in mean waiting time from preempting a very large task in execution can be huge when there are a large number of small tasks in the queue.
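A minimal time-sharing sketch illustrates how preemption rescues a small task queued behind a large one; the quantum value and names are ours, and context-switch cost is deliberately ignored:

```python
from collections import deque

def round_robin(task_sizes, quantum):
    """Run tasks in round-robin slices; return each task's completion time."""
    remaining = list(task_sizes)
    ready = deque(range(len(task_sizes)))
    clock, finish = 0.0, [0.0] * len(task_sizes)
    while ready:
        i = ready.popleft()
        run = min(quantum, remaining[i])
        clock += run
        remaining[i] -= run
        if remaining[i] <= 0:
            finish[i] = clock      # task i completes here
        else:
            ready.append(i)        # suspended, rejoins the end of the queue
    return finish

# A size-10 task ahead of a size-1 task: under FIFO the small task would
# finish at time 11; with a quantum of 1 it finishes at time 2.
finish_times = round_robin([10, 1], quantum=1)
```

In a real system each suspension above would also pay the context-switch cost discussed in the text, which is why preemption only pays off for the very large tasks.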

The third approach allocates tasks based on task size. A number of slots are defined with low and high size boundaries, and tasks of similar size go to the same slot. Tasks are processed in order from the slots of small size range to the slots of large size range. As the number of slots is generally much smaller than the total number of tasks, the size-based approach largely reduces the per-task insertion overhead while still reducing the delay caused by large tasks. There are a number of ways to partition a size distribution: for instance, into equal size ranges, or unequally. As the frequency of tasks is skewed, it is more efficient to partition into increasing size ranges than into equal size ranges.

(b) Optimising Global Scheduling: the Non-Optimal Processing Time.

The second limitation of LLF is related to non-optimal task assignment in terms of processing time. The task assignment in LLF is based only on absolute server loads: LLF assigns a task to the least loaded server. In a heterogeneous server farm, servers may have different processing capacities. It takes relatively less time to process a task on a server with higher processing capacity than on a server with lower processing capacity. Obviously, the processing time of a task is optimal when it is processed on the server with the highest processing capacity in the server farm. However, the mean waiting time increases if all tasks are queued at the same server. Clearly there is a trade-off between the least loaded server and the most powerful server.

The main question is how to get the best out of the two performance parameters. As described in Section 3, the flow time of a task consists of the waiting time and the processing time. The flow time of a new task can be estimated, before it is assigned, from the server loads and the processing capacities of the servers. The mean flow time is optimal if each task is assigned to the server with the shortest flow time. The Least Flow-Time First (LFF) policy we propose dynamically assigns a task based on the remaining processing time of tasks on the servers and the processing time of the current task if it were executed on each server. In this way it assigns the task to the server with the minimum flow time, and therefore optimises both the mean waiting time and the mean flow time.
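The LFF assignment rule can be sketched as follows; the estimate divides outstanding work plus the new task's size by server capacity, and the names are ours:

```python
def lff_assign(task_size, loads, capacities):
    """Least Flow-Time First: choose the server minimising the estimated
    flow time (outstanding work + this task) / processing capacity."""
    best = min(range(len(loads)),
               key=lambda i: (loads[i] + task_size) / capacities[i])
    loads[best] += task_size  # dispatcher tracks each server's outstanding work
    return best

# Two servers with capacities in the ratio 1:3, as in Section 2's example.
loads = [0.0, 0.0]
first = lff_assign(1.0, loads, [1.0, 3.0])   # est. flows: 1 vs 1/3 -> server 1
second = lff_assign(1.0, loads, [1.0, 3.0])  # est. flows: 1 vs 2/3 -> server 1
```

Both unit tasks go to the fast server, reproducing the mean flow time of 1/2 from Section 2 rather than the 2/3 an idle-server policy would achieve.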

5 The Proposed Model: the Least Flow-Time First Load Sharing Approach with Distributed Size-Based Multi-Section Queue

This section proposes a least flow-time first load sharing approach with a distributed size-based multi-section queue (hereinafter called LFF-SIZE). As shown in Figure 2, a central dispatcher assigns a task to one of the participating servers based on the least flow-time first (LFF) policy. A number of mirrored servers with processing capacities P_i receive tasks from the dispatcher and execute them by invoking the requested services. A size-based multi-section queue is implemented on each server. A task goes to one of the sections based on its size, i.e. tasks of size between 10^i and 10^(i+1) go into section i. Sections are processed in increasing size range, and tasks are processed in FIFO order within the same section. Task size is known and follows the Pareto distribution, and task arrivals follow the Poisson distribution. Tasks are indivisible and non-preemptive. The main features of the proposed load sharing approach are described as follows.

1. Balanced load. Load is balanced by assigning a task to the fittest server in the system. A server is the fittest if it can complete a task in the shortest flow time.

2. Optimise flow time by considering the servers' processing capacity. The task assignment policy ensures the flow time of a task is optimised on the server with lower load and higher processing capacity.

3. Size-based multi-section queue. A size-based multi-section queue buffers the tasks and puts them in different sections of a queue based on size range. Tasks are executed in the order of increasing size sections, and tasks in the same size section are executed in first-in-first-out (FIFO) order. As small tasks are always processed ahead of large tasks, the size-based multi-section queue drastically reduces the delay caused by task variation.

[Figure: a central dispatcher in front of servers S1, S2, S3 with processing capacities P1, P2, P3, each holding a multi-section queue.]

Fig. 2. A Load Sharing Approach with Distributed Size-Based Multi-Section Queue

4. Distributed. The proposed load sharing approach implements the size-based multi-section queue at each server. Tasks are pushed to each server according to the least flow-time first (LFF) policy at the central dispatcher. This distributed arrangement keeps tasks buffered at each server and immediately available once a server is idle, which removes the idle time otherwise spent fetching a new task across the network each time a server completes a task.

5. Non-preemptive. We believe that preemption provides a limited performance gain relative to the cost of context switches in a size-based load sharing approach. With the deployment of the size-based multi-section queue, the delay caused by task variation has already been greatly reduced. Preemption helps only in the situation where the task in execution is in a higher size range than a newly arrived task. As small tasks make up more than 90% of the total number of tasks in a heavy-tailed task distribution, the probability of a small-task arrival is much higher than that of a large one, so it is unlikely that a large task will be processed ahead of a small task.
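The per-server multi-section queue can be sketched as below (class and method names are ours): tasks of size in [10^i, 10^(i+1)) share section i, and the lowest non-empty section is always drained first, FIFO within a section:

```python
import math
from collections import deque

class MultiSectionQueue:
    def __init__(self):
        self.sections = {}                 # section index -> FIFO deque

    def push(self, task_size):
        i = int(math.log10(task_size))     # 10^i <= task_size < 10^(i+1)
        self.sections.setdefault(i, deque()).append(task_size)

    def pop(self):
        i = min(self.sections)             # lowest non-empty section
        task = self.sections[i].popleft()
        if not self.sections[i]:
            del self.sections[i]           # drop emptied sections
        return task

q = MultiSectionQueue()
for size in (5000, 3, 42):
    q.push(size)
order = [q.pop(), q.pop(), q.pop()]        # small tasks come out first
```

Push and pop each touch at most one section, so the overhead stays proportional to the (small) number of sections rather than the number of queued tasks, matching the argument for the slot-based approach in Section 4.1.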

6 Performance Evaluation of the Proposed Model

This section gives a detailed description of the performance evaluation. We first describe the different load distribution policies used in the simulation; the simulation results are then evaluated and analysed. As shown in Figure 2, all tasks arrive at the central dispatcher in a server farm and are then assigned to the participating servers based on different load distribution policies. The simulation settings are described in [19], along with additional results evaluating LLF against LFF and comparing SITA-E variations. The proposed load distribution approach is compared to related existing load distribution policies.

We summarise these load distribution policies as follows.

• Random: A base line static task assignment policy without knowledge of server load.

• SITA-E: A static task assignment policy based on task size. Each server is assigned an equal load. The policy has no knowledge of server load. Two variants of SITA-E are used in the simulation: SITA-large2high, which assigns large tasks to the high-capacity server first, and SITA-large2low, which assigns large tasks to the low-capacity server first.

• LLF: A dynamic task assignment policy based on server load.

• LFF: A dynamic task assignment policy based on server load and processing capacity of server.

• LLF-SORT: An LLF-based task assignment policy with partial sorting.

• LLF-SIZE: An LLF-based task assignment policy with a distributed size-based multi-section queue.

• LFF-SIZE: The proposed policy. An LFF-based task assignment policy with a distributed size-based multi-section queue.

6.1 Performance gain in size-based policies

This test is performed to understand the relative performance improvement of size-based policies under a heavy-tailed task distribution. The size variation parameter (α = 1.1) is a typical value found in practical computer systems. Three different size-based algorithms are compared with the non-size-based least-loaded first policy. The naive LLF-SORT does not perform well, due to its partial ordering of task sizes and its high overhead (see Figure 3). Both LLF-SIZE and SITA-large2low achieve better mean waiting time and mean slow down than LLF. This suggests it is crucial to reduce the delay caused by size variation in a heavy-tailed task distribution. Notice that SITA-large2low has a very poor mean flow time compared to the other policies, as shown in the bottom-right graph of Figure 3. Comparing its mean slow down to its mean flow time (see the bottom graphs of Figure 3), it is clear that SITA-large2low has an inferior processing time. The reason is that loads are not balanced by SITA-large2low. As SITA policies assign tasks based on task size, the server that processes a task is fixed regardless of the server loads in the system. Even though there are lightly loaded servers in the system, SITA policies may still assign tasks to highly loaded servers! LLF-SIZE has superior mean waiting, flow and slow-down times compared to SITA-large2low. The result shows that a size-based load sharing policy outperforms a static size-based policy. The size-based load sharing policy, i.e. LLF-SIZE, also outperforms the non-size-based load sharing policy, i.e. LLF, in all performance metrics (see Figure 3). This agrees with the previous analysis that a size-based approach provides additional performance improvement on top of load sharing.

[Figure: mean waiting time, mean slow down and mean flow time plotted against server load (0.5-1.5, Alpha = 1.1) for LLF, SITA-large2low, LLF-SIZE and LLF-SORT.]

Fig. 3. Size-based Policies

6.2 The better performance of the proposed LFF-SIZE policy

It has been shown in the previous testings that performance improves by 1) selecting the optimal server according to the least flow time first policy, and 2) using multi-section queue to reduce task delay. This testing is to verify whether further performance gain is possible by combining these two strategies. Two size-based load sharing policies are compared, the LFF-SIZE according to the least flow time first and the LLF-SIZE according to the least loaded first.

Figure 4 shows that LFF-SIZE outperforms LLF-SIZE on all performance metrics. The results agree with the previous analysis.
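The difference between the two selection rules can be sketched in a few lines. This is a minimal illustration under our own assumptions (each server is modelled as a pair of queued work and relative processing capacity; the function names are ours, not the paper's):

```python
def pick_llf(servers):
    # Least-Loaded First: ignore capacity, choose the smallest queued work.
    return min(range(len(servers)), key=lambda i: servers[i][0])

def pick_lff(servers, task_size):
    # Least Flow-Time First: choose the server that would finish this task
    # soonest, i.e. minimise (queued work + task size) / processing capacity.
    return min(range(len(servers)),
               key=lambda i: (servers[i][0] + task_size) / servers[i][1])

servers = [(100.0, 1.0),   # lightly loaded but slow
           (150.0, 4.0)]   # more queued work, but 4x faster

print(pick_llf(servers))        # 0: least queued work
print(pick_lff(servers, 20.0))  # 1: (150+20)/4 = 42.5 < (100+20)/1 = 120
```

The example shows why taking processing capacity into account matters: the least-loaded server is not necessarily the one that completes the task first.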


Fig. 4. LFF-SIZE vs. LLF-SIZE: mean waiting time, mean slowdown and mean flow time vs. server load (α = 1.1).

6.3 The impact of size variation on algorithms

All the previous results use a fixed size variation value, i.e. α = 1.1. To understand the impact of size variation on different algorithms, a range of α values (1.1 ≤ α ≤ 2) is used in this test. Three policies are selected: LLF, LFF-SIZE and SITA-large2low. The latter two are size-based, and LLF serves as a baseline. The simulation was performed under a moderate system load (system load = 0.8). In Figure 5, the graphs in the left column show the effect of task size variation on mean waiting time, mean flow time and mean slowdown; the graphs in the right column show the effect of additional servers on the same metrics.

The top-left graph shows that the mean waiting time of SITA-large2low increases as size variation decreases (i.e. as α increases) under moderate system load. The reason is that the majority of tasks grow in size (although the average task size stays the same) when sizes are less variable, which increases the mean waiting time. The graphs on the right-hand side show that mean waiting time, mean flow time and mean slowdown steadily decrease for LLF and LFF-SIZE as the number of servers increases, with LFF-SIZE performing better on all metrics.
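The effect of α on workload variability can be illustrated by sampling a bounded Pareto distribution, a common model for such workloads, and measuring how much of the total work the largest 1% of tasks carry. This is a sketch under our own parameter choices, not the paper's simulator:

```python
import random

def bounded_pareto(alpha, u, lo=1.0, hi=1e6):
    # Inverse-transform sample from a bounded Pareto(alpha, lo, hi)
    # given a uniform variate u in [0, 1).
    return lo * (1 - u * (1 - (lo / hi) ** alpha)) ** (-1.0 / alpha)

def top1_work_share(alpha, n=50000, seed=1):
    # Fraction of the total work carried by the largest 1% of tasks.
    rng = random.Random(seed)
    xs = sorted(bounded_pareto(alpha, rng.random()) for _ in range(n))
    return sum(xs[int(0.99 * n):]) / sum(xs)

# As alpha falls toward 1, a tiny fraction of tasks dominates the workload,
# which is exactly the regime where size-based policies pay off.
print(top1_work_share(1.1))  # a large share of all work
print(top1_work_share(1.9))  # a much smaller share
```

This is the "heavy-tailed property" the paper relies on: at α ≈ 1.1, a handful of huge tasks dominate total work, so their placement and ordering dominate performance.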


SITA-large2low experiences a significant reduction in slowdown as size variation decreases, but it is still clearly outperformed by the non-size-based LLF, whose mean slowdown also decreases (see the middle-left graph in Figure 5). LLF, however, has a poor slowdown when size variation is high (α ≈ 1.1) compared to LFF-SIZE. This indicates that reducing the delay caused by size variation is crucial when the task size distribution exhibits high variability.

The bottom-left graph in Figure 5 shows that the load sharing policies keep mean flow time roughly constant as size variation changes, while the mean flow time of SITA-large2low decreases sharply as size variation drops. This confirms that SITA policies fail to perform when size variation is high: the load imbalance caused by a few very large tasks costs too much in processing time, and the performance gain from balancing the load is most pronounced precisely in that regime. The graph also shows that all three policies converge in mean flow time as the size distribution becomes less variable, suggesting that the size-based approach makes little difference when the task size distribution exhibits only small variation.

7 Conclusion

This paper investigated load distribution strategies in a distributed server farm. It clarified the relative performance of load sharing and load balancing: performance tests conducted with an empirically measured task size distribution showed that load sharing has a better response time than load balancing. The inferior performance of load balancing is caused by the overhead of server selection and task allocation, and by its tendency to assign tasks in groups, which often happens when tasks are buffered in the dispatcher due to the assignment overhead.

Further study identified two limitations of the existing load sharing approach, least loaded first (LLF): 1) LLF does not consider the order of processing, and 2) when assigning a task, LLF does not consider the processing capacity of servers. Size variation causes smaller tasks to be severely delayed by larger tasks, and a task-size-based approach improves performance by reducing this delay. A size-based load sharing approach performs better than a static size-based approach; the gain comes from taking server loads into account. By removing the other limitation of LLF, both mean waiting time and mean slowdown can be further improved by assigning each task to the fittest server, i.e. the one combining a lighter load with a higher processing capacity.

Fig. 5. Effect of Size Variation: mean waiting time, mean slowdown and mean flow time, in the left column vs. α (system load = 0.8) for SITA-large2low, LLF and LFF-SIZE, and in the right column vs. the number of servers (system load = 0.8, α = 1.1) for LLF and LFF-SIZE.

One of the limitations of the proposed approach is the static partition of the size range in the multi-section queue. Although the increasing size-range partition effectively reduces the delay for a given size variation, it does not adapt when the lower and upper bounds of the task size distribution change substantially at run time. This problem may arise in a multiple-service distributed server system. For example, a distributed file server may provide access to text, audio and video files, and each class of files may exhibit a heavy-tailed distribution with a very different mean file size (in the unbounded heavy-tailed model, mean file sizes can even be infinite).

A fixed increasing size-range partition does not handle mixed file types well: with a single shared partition, the slot wide enough to hold small video files inevitably also receives much larger video files, so the delay caused by size variation is not reduced. One possible way to solve the problem is to treat the task size distribution of each service type separately, using a dedicated multi-section queue per service type.
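A minimal sketch of the structure under discussion is given below. The class, boundaries and serving order are illustrative assumptions, not the paper's implementation: a multi-section queue with static, increasing size-range sections, served smallest-range first, with a separate instance per service type as suggested above.

```python
import bisect
from collections import deque

class MultiSectionQueue:
    # Sketch: static, increasing size-range sections; smaller-size
    # sections are served first so small tasks are not stuck behind
    # very large ones.
    def __init__(self, boundaries):
        self.boundaries = boundaries  # section upper bounds, increasing
        self.sections = [deque() for _ in range(len(boundaries) + 1)]

    def enqueue(self, task_size):
        # Place the task in the section whose size range contains it.
        i = bisect.bisect_left(self.boundaries, task_size)
        self.sections[i].append(task_size)

    def dequeue(self):
        # Serve the lowest (smallest-size) non-empty section first.
        for section in self.sections:
            if section:
                return section.popleft()
        return None

# Per-service-type queues, with boundaries scaled to each type's own
# size distribution (values here are purely illustrative).
queues = {"text": MultiSectionQueue([1e3, 1e4, 1e5]),
          "video": MultiSectionQueue([1e6, 1e7, 1e8])}

q = MultiSectionQueue([10, 100, 1000])
for s in (500, 3, 2000, 40):
    q.enqueue(s)
print(q.dequeue())  # 3: the smallest non-empty size range is served first
```

Because each service type gets boundaries matched to its own distribution, small video files no longer share a slot with files several orders of magnitude larger.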

8 Acknowledgment

This project was fully supported by ARC Discovery Grant no. DP0346545, awarded by the Australian Research Council (ARC) for 2003-2005. We also acknowledge the contribution of Ling Tan from Monash University, Australia, to this paper.

