PhD Activities Report
Author: Florin-Adrian Spătaru
Supervisors: Prof. Univ. Dr. Dana Petcu, West University of Timișoara
Assoc. Prof. Dr. Laura Ricci, University of Pisa
Timișoara and Pisa, 2021
Table of Contents
1 CloudLightning Contributions
  1.1 Gateway Service
  1.2 Self-Organizing Self-Managing Framework
  1.3 Framework
      1.3.1 Coalition Formation Strategies
  1.4 System performance measurement
2 Cloud Decentralization
  2.1 A fault tolerant decentralized Cloud
  2.2 Component Administration Networks
  2.3 Fault tolerant Orchestration
3 Further contributions
  3.1 Electricity consumption prediction
  3.2 Genetic algorithms for weather prediction optimization
  3.3 Data-Centric programming models
Chapter 1
CloudLightning Contributions
The first two years of activity focused on Cloud flexibility and efficiency, and important contributions were made within the scope of the CloudLightning Horizon 2020 project. The majority of the effort was dedicated to designing, implementing and integrating the Gateway Service components with the rest of the system. Additionally, the author contributed to the design of the Self-Organising Self-Managing Framework for resource management and experimented with different Coalition Formation Strategies. At the end of the project, the author collaborated with the Technical Leader of the project to design, implement and execute a series of experiments measuring the overhead and performance of the system.
1.1 Gateway Service
The Gateway Service is a collection of components allowing for the definition, composition, optimization and deployment of HPC Services using the Cloud paradigm. The key contributions are:
• the modelling of infrastructure (VMs, Containers, Bare Metal servers, hardware Accelerators), services and the relationships between them using the TOSCA specification
• the modelling of abstract services which can be instantiated by different explicit implementations through the process of Service Optimization
• the implementation of a User Interface by extending the Alien4Cloud platform with plugins that allow for the Service Optimization process and deployment on CloudLightning infrastructure
• the design of the specification and protocol, as well as the implementation, for the Service Optimization process, allowing for communication between the Gateway and the resource manager
• the implementation of the CloudLightning Orchestrator, which is able to deploy Applications composed of Services using heterogeneous resources (e.g. a VM Service linked to a Container Service which makes use of a hardware accelerator).
This work has been published as a chapter in an open access book [1]. All components have been published as open-source software on Bitbucket [gateway-bitbucket]. The Gateway Service allows Application Developers to publish Service Definitions and requirements. The End User is able to select and link several of these services into an Application. Depending on the parameters chosen by the End User (cost, performance) and the state of the system (load, energy efficiency), a scheduling system recommends the placement of the services on the infrastructure. The Gateway then proceeds with deploying the Application services and informs the user about operational metrics: service status, service endpoints, and credentials.
Several components are required to manage the life-cycle of an Application. A User Interface allows for the management of service definitions and application deployments. A Service Portfolio provides the means for storing Service Definitions (e.g., requirements, dependencies) and Application Topology definitions, consisting of one or multiple Services and the relationships between them. An Application Developer (AD) uses this component to store such definitions, which can later be used by an Application Operator (AO) to create a new Application Topology or deploy a version designed by the developer.
Figure 1.1: Gateway Service components and interactions
A novel concept of Application Abstraction allows a user of the platform to define an Application Topology consisting of Abstract Services. These kinds of Service Definitions are abstractions of the explicit Service Implementations, defining only dependencies on other services, but no requirements on the hardware type or accelerators. The Service Optimization Engine (SOE) inspects the Service Portfolio in search of the explicit implementations and provides the SOSM System with a Blueprint of all combinations of implementations for the Application. After the most suitable resources for a Blueprint are chosen, the user is presented with the explicit Application Topology. This topology is deployed by an Orchestrator, which manages the life-cycle of the Application.
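To make the Blueprint construction step concrete, the following minimal Python sketch enumerates every combination of explicit implementations for an Application composed of abstract services. The portfolio contents, service names and function are hypothetical; the actual SOE operates on TOSCA definitions stored in the Service Portfolio.

```python
# Minimal sketch (not the CloudLightning code base): enumerating the explicit
# implementation combinations that the Service Optimization Engine could send
# to the SOSM System as a Blueprint. Names and structures are illustrative.
from itertools import product

# Abstract services in an Application Topology, each mapped to the explicit
# implementations found in the Service Portfolio (hypothetical catalogue).
portfolio = {
    "ray-tracing-engine": ["rt-cpu-vm", "rt-gpu-container", "rt-fpga-baremetal"],
    "ray-tracing-ui": ["ui-container"],
}

def build_blueprint(abstract_services):
    """Return every combination of explicit implementations (one per service)."""
    names = list(abstract_services)
    options = [portfolio[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*options)]

if __name__ == "__main__":
    for candidate in build_blueprint(["ray-tracing-engine", "ray-tracing-ui"]):
        print(candidate)
```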
The migration of HPC-aware services to Cloud deployments introduced challenges concerning their performance, encapsulation, and definition. There is little support for Applications that use heterogeneous resources, and it usually involves the selection of a specific version of software that works with a specific model of hardware accelerator. The End-User is responsible for managing both software and resource selection, which may lead to over-provisioning and to conflicts in software-to-software and software-to-hardware communication.
The Gateway Service components presented in this Chapter unify the solutions to the identified challenges. They provide a Service Portfolio for storing service definitions, and an architecture defined by CloudLightning Base Types, which gives the guideline for defining HPC-like services in a portable manner. They facilitate the design of Applications and Application templates through an easy to use interface, and manage the deployment of heterogeneous Services both in terms of hardware (conventional machines, accelerators) and encapsulation (i.e., Bare Metal, VM, and Container).
The challenges are overcome through several processes. First, Services and dependent libraries are packaged as VM or Container images by an Application Developer, who can experiment with different configurations and determine the appropriate hardware characteristics for a given performance level. Second, we leverage the portability of the TOSCA specification to define the relationships between services, conventional infrastructure and hardware accelerators. Third, by means of the Resource Discovery and Service Optimization processes, we improve the flexibility of Application design and performance (for the End-User) and the flexibility of Resource Selection (for the Cloud Service Provider).
1.2 Self-Organizing Self-Managing Framework
The motivation for designing a hierarchical system is to divide the search space for finding a subset of specific components on the bottom level of the hierarchy. This level is represented by computing infrastructure, and the purpose of the management components (in the intermediate levels) is to reduce the space for reaching specific compute resources. Each management component possesses two types of Strategies, self-management and self-organizing, which dictate the actions needed to move the component closer to an individual goal. The key contributions of this chapter are:
• the design of a novel, generic, framework for self-organization and self-management in hierarchical systems; this includes mechanisms for communicating the desire of the top level components down the hierarchy and to assess the state and efficiency of bottom level components up the hierarchy.
• the design and implementation of two Coalition Formation strategies for the bottom level of the hierarchy, responsible for matchmaking Service requirements with physical machine capabilities.
• the experimental evaluation of a self-organizing self-managing resource manager prototype using open source trace data; our system outperforms the original resource assignments (from the data set) in terms of resource utilization.
The work on the resource management platform has been published in the proceedings of the ISPDC conference [2]. The work on coalition formation strategies has been published in the proceedings of CCGrid and ICA3PP conferences, respectively [4, 7].
1.3 Framework
Our proposed framework is visually represented in Figure 1.2. Generally, actions taken by a management component will affect the state of the components in the neighbouring levels. Some actions are communicated to a component down the hierarchy in order to update individual goals, with the purpose of aligning them with the global goal. Such an action is further referred to as Impetus, and the process of transmitting Impetus is referred to as Directed Evolution. Since underlying components possess individual goals, the Impetus will generally be integrated taking them into account. The Impetus is transmitted as a vector of Weights.
A management component evaluates the actions taken by underlying components by receiving a vector of Metrics, which offers the managing component a Perception of the evolution taking place in the underlying levels. These metrics will further influence the Directed Evolution actions. Finally, the computing infrastructure properties and state must be assessed by the managing components. Our architecture considers the use of Assessment functions which can be weighted corresponding to an individual component's goal, and thus determine the performance of the underlying infrastructure.

Figure 1.2: Proposed Framework
Management Components engage in Self-organization in order to minimize the management cost of the level they are a part of. The Self-organization process takes place within a single level, and can have as outcome any of the following operations: component creation, destruction, splitting and merging, and transfer of underlying nodes between components. We define an individual goal of the managing components as reaching an equilibrium state. This state is reached when the Directed Evolution actions, transmitted from a superior level, result in no significant variation in the Perception of the inferior components. To determine the amount of contribution a component is providing, we use the notion of Suitability Index. Therefore, the individual goal of each managing component is to maximize its Suitability Index and to take actions that result in a greater Suitability Index for the managed components.
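The following is a minimal sketch of how a Suitability Index could be computed as a weighted combination of Assessment functions, with the weights arriving as Impetus from the level above. The assessment functions, metric names and weight values are illustrative assumptions, not the exact formulas used by the SOSM System.

```python
# Illustrative sketch of the weighted-assessment idea behind the Suitability
# Index; the concrete assessment functions and weight semantics in the
# CloudLightning SOSM system may differ.
from typing import Callable, Dict

Assessment = Callable[[Dict[str, float]], float]

# Hypothetical assessment functions over a component's metrics vector.
assessments: Dict[str, Assessment] = {
    "utilization": lambda m: m["used_cores"] / m["total_cores"],
    "energy": lambda m: 1.0 - m["power_draw"] / m["power_cap"],
}

def suitability_index(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of assessment functions; the weights arrive as Impetus from above."""
    return sum(weights[name] * fn(metrics) for name, fn in assessments.items())

# Example: a parent component transmits weights favouring energy efficiency.
impetus = {"utilization": 0.3, "energy": 0.7}
child_metrics = {"used_cores": 48, "total_cores": 64, "power_draw": 300, "power_cap": 500}
print(suitability_index(child_metrics, impetus))
```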
1.3.1 Coalition Formation Strategies
In Figure 1.3 we present a mechanism to aggregate historical execution data and create coalitions that may be useful for satisfying future requests. An Aggregator will consume the task request history and create histograms specific to a Coalition Formation Strategy. Another instance of the Aggregator will consume the task usage history and determine the usage for each machine. The resources are then filtered, and machines which have available slots can participate in Coalition Formation for the next epoch. When a resource request reaches a bottom level component, a solution is searched for within the formed coalitions. If no suitable solution is found, coalitions can be enlarged with resources from other coalitions.
Figure 1.3: Data processing for Coalition Formation
Size Frequency Similarity is a Coalition Formation Strategy which uses the size of previously created coalitions to determine the most frequent coalition cardinality. The more frequent a coalition size, the more coalitions of this size will be created. We assume that a server can only join one coalition in a given epoch. However, considering the containerization/virtualization of the resources in a coalition, multiple services can be assigned to the same coalition, given that the requirements do not exceed the capacity of any server. To achieve this, the strategy must employ coalition formation and selection procedures which aggregate usage information in order to filter out unavailable servers.
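The following Python sketch illustrates the Size Frequency Similarity idea under simplifying assumptions: coalition sizes are planned for the next epoch in proportion to how often each cardinality appeared in the request history, subject to the number of available servers. The data structures and the proportional allocation rule are illustrative, not the published strategy verbatim.

```python
# Sketch of the Size Frequency Similarity idea: count how often each coalition
# cardinality appeared in the request history and form new coalitions for the
# next epoch in proportion to those frequencies. Data structures are illustrative.
from collections import Counter

def plan_coalitions(previous_sizes, available_servers):
    """Return a list of coalition sizes to form, most frequent sizes first."""
    histogram = Counter(previous_sizes)          # size -> frequency
    total = sum(histogram.values())
    budget = len(available_servers)              # servers free in the next epoch
    plan = []
    for size, freq in histogram.most_common():
        count = max(1, round(budget * freq / total / size))
        for _ in range(count):
            if budget >= size:
                plan.append(size)
                budget -= size
    return plan

print(plan_coalitions(previous_sizes=[2, 2, 4, 2, 8, 4], available_servers=range(20)))
```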
In a similar manner, clusters can be constructed considering Constraint Frequency Similarity, in relation to recently requested constraints in task requirements. For example, in a dataset describing physical machines from a Google data centre [reiss2011google], resources present 42 attributes, a subset of which have values in the realm of hundreds. Using all of them to compute similarity is not only computationally inefficient, but also yields small and very similar clusters which do not improve the chance of finding a suitable coalition for specific jobs.
1.4 System performance measurement
The CloudLightning system has been evaluated through experiments on a small-scale testbed installed at the Norwegian University of Science and Technology (NTNU). The contribution of this work is twofold:
• presents a detailed visualization of the inner workings of the resource management system, on a small scale;
• presents evidence that the system does not impact the execution of Services in any significant aspect of their performance.
A detailed inspection of the Suitability Index values behind the decisions taken by the SOSM System reveals the flexibility of the System with respect to Service delivery. The evaluation has shown the ability of the System to satisfy the user's performance objectives while pursuing its own goal of efficiently managing resources and power consumption.
To summarize the experiments carried out, the only overhead incurred by the CloudLightning system is the delay due to provisioning the resources, which accounts for around 8 seconds (on the small-scale testbed), regardless of which service is deployed. We consider this delay a small cost compared to the facilities offered by our platform, namely: automatic selection and deployment of a variety of implementations, resource provisioning (and management), and service discovery. Alternative solutions generally require the End-User to manually provision the resources, deploy the services and link them (in the case of multiple dependent services).
Except for the provisioning delay (which may also be experienced on other Cloud platforms), our platform introduces no other negative effect on application performance (execution time) or power consumption when running the applications in Containers. All evaluation results are available as open-source data on Bitbucket [bbevaluationresults].
Chapter 2
Cloud Decentralization
Starting with the enrolment at the University of Pisa, the author focused on decentralized protocols for Cloud operations. The first focus was the implementation of a decentralized scheduling mechanism based on Ethereum Smart Contracts and an investigation of the operational constraints and the gas cost for managing resource assignments. This work has been published in the proceedings of the BCCA conference [6], where it received the Best Paper award, and has the following novel contributions:
• an architecture for a decentralized Cloud platform making use of the Gateway Service presented in Chapter 1, which connects to an Ethereum Smart Contract for resource selection and payment;
• a study on the impact of four scheduling methods regarding transaction cost and latency;
• an investigation of the constraints under which asynchronous interaction with the Smart Contract can take place, together with design principles for maintaining it;
• an experimental evaluation using a scenario built from Cloud usage traces, which allows for a deeper investigation of the cost and latency in a real world setting.
The most efficient approach is to read the resource information from the Smart Contract, apply a selection optimization function offline, and ask the Cloud Contract to assign a specific resource. This method is significantly improved if the clients synchronize their decisions to limit the chance of selecting the same resource. Applying a simple optimization function inside the Smart Contract proved 6.21 times more expensive than the offline synchronized variant in our experiments. Indeed, depending on the size of the set of resources, the Contract variant may cost up to 50 times more than the offline version in terms of gas, for 800 resources. This also increases the latency of the system, because fewer transactions are mined in each block. The inconvenience of the Offline Selected method is that conflicts with other users selecting the same resource may occur. In the experimental evaluation, the number of transactions rejected for this reason was more than half of all transactions. This can be mitigated by having the users synchronize their decisions through an external component.
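The sketch below illustrates the Offline Selected flow discussed above, with the Cloud Contract mocked as a Python object: the client reads the resource list, ranks it locally, and submits an assignment for one specific resource, which may be rejected if another user selected it first. On a real deployment these calls would be Ethereum read calls and transactions; the class and function names are invented for the example.

```python
# Minimal sketch of the "Offline Selected" approach: contract calls are mocked,
# whereas a real deployment would issue Ethereum transactions, and a second
# client selecting the same resource would see its transaction rejected.
import random

class MockCloudContract:
    def __init__(self, resources):
        self.resources = resources            # id -> {"price": ..., "free": ...}

    def read_resources(self):
        return dict(self.resources)           # cheap: a read, not a transaction

    def assign(self, resource_id):
        if not self.resources[resource_id]["free"]:
            return "REJECTED"                 # conflict with another user
        self.resources[resource_id]["free"] = False
        return "ASSIGNED"

def offline_select(contract, rank=lambda r: r["price"]):
    """Read the resource list, rank it offline, ask for one specific resource."""
    snapshot = contract.read_resources()
    best = min((rid for rid, r in snapshot.items() if r["free"]),
               key=lambda rid: rank(snapshot[rid]))
    return contract.assign(best)

contract = MockCloudContract({i: {"price": random.uniform(0.1, 1.0), "free": True}
                              for i in range(10)})
print(offline_select(contract))
```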
In order to maximize the number of transactions that can be sent asynchronously, order must be maintained in the Service Instance array of each user. However, this implies a high cost for terminating instances, since all following instances need to be copied to fill the gap. We identified this cost to grow by 27,880 gas units (accounting for 0.0006 ETH = 0.06 USD) for each instance that needs to be copied. Moreover, the Reverse Order removal reduces the impact of this cost by not copying Service Instances if they need to be removed in the same time frame. If the Ethereum public network is used to implement the presented system, the cost for running a Service will consist of 0.53 USD for the gas usage plus the price of the resource for a given amount of time. The gas price alone is equivalent to renting an n1-standard-1 Virtual Machine (1 vCPU, 3.75 GB) on Google Compute Engine for 12 hours. If we consider the price for a resource on our platform to be half the price of Google, then a Service must have a run time of at least 24 hours before starting to benefit from the reduced price offered. Services with a shorter run time will pay more for the gas than for the actual resource utilization. Latency will also be substantially higher, because the block rate is slower and the number of other transactions is higher. A better variant is to create a new network where the only transactions are related to the Cloud platform. The latency of this system will be higher than in our experiments, depending on the number of resources and Cloud Contracts.
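The 24-hour break-even figure can be reproduced with a short calculation; the Google Compute Engine hourly rate below is not taken from a price list but inferred from the statement that the 0.53 USD gas cost corresponds to roughly 12 hours of an n1-standard-1 instance.

```python
# Worked break-even arithmetic for the figures quoted above (a sketch; the GCE
# hourly rate is inferred from the 12-hour equivalence stated in the text).
GAS_COST_USD = 0.53
GCE_HOURLY = GAS_COST_USD / 12          # implied n1-standard-1 rate, ~0.044 USD/h
PLATFORM_HOURLY = GCE_HOURLY / 2        # assumption: our platform charges half

# Running on our platform is cheaper once the saving covers the gas overhead:
#   GCE_HOURLY * t  >  GAS_COST_USD + PLATFORM_HOURLY * t
break_even_hours = GAS_COST_USD / (GCE_HOURLY - PLATFORM_HOURLY)
print(f"break-even after ~{break_even_hours:.0f} hours")   # ~24 hours
```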
2.1 A fault tolerant decentralized Cloud
The final year of the studies focused on the decentralized management of a public Cloud platform aggregating privately owned resources, which ensures Application Continuity in the presence of Service failures, Component Continuity in the case of Management Component failures, and a fair price in the case of node failures which cannot be mitigated. This effort has materialized in a journal article, published in Scalable Computing: Practice and Experience [8]. The key contributions are:
• an architecture allowing for the decentralization of the resource registration and assignment, using Smart Contracts;
• the design of Component Administration Networks and their corresponding protocols, which act as a bridge between the Smart Contract world and the software world and ensure Component Continuity by assigning components work, saving checkpoints, and monitoring their availability;
• a model for fault tolerant Application Orchestration using Component Administration Networks, which also ensures that the user is charged only for the amount of time a Service has executed.
In Figure 2.1 we present the proposed architecture. Central to this is a public Blockchain capable of Smart Contract execution. Although any such Blockchain can be used, we are constructing our protocols for the Ethereum network. The following Smart Contracts are to be deployed on the Blockchain:
1. Registry Contract – this is the entry point to our system. It contains information about registered Clouds and their status as well as registered Services and Applications.
2. Cloud Contract – contains information about resources, price, Cell Manager and Plug&Play Service endpoints.
3. Application Contract – one is created for each deployed Application and is used to track status and payments.
The Gateway Service does not need to hold a priori knowledge of any Scheduler instance. It can read which Clouds are registered in the Registry Contract and is able to contact them. Service and Application catalogues are also stored in the Registry Contract. This allows any Gateway Service instance to have access to the same information, which in turn allows the decentralization of this component. The user is not required to reach a node where the Gateway is located, as he/she can also run it locally.
Figure 2.1: Augmented decentralized architecture
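As an illustration of the state the Gateway reads from the chain, the following Python sketch models the three contracts as plain data classes. The field names are assumptions made for the example; the actual contracts are Solidity code deployed on Ethereum.

```python
# Illustrative data model (not the actual Solidity code) of the state the
# Gateway Service reads from the Registry Contract and the per-Cloud contracts.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class CloudContract:
    cell_manager_endpoint: str
    pnp_endpoint: str
    resources: Dict[str, dict] = field(default_factory=dict)   # id -> properties, price

@dataclass
class ApplicationContract:
    owner: str
    services: List[str]
    status: str = "CREATED"
    balance_wei: int = 0

@dataclass
class RegistryContract:
    clouds: Dict[str, CloudContract] = field(default_factory=dict)
    service_catalogue: Dict[str, dict] = field(default_factory=dict)
    applications: Dict[str, ApplicationContract] = field(default_factory=dict)

# Any Gateway instance (even one run locally by the user) can discover Clouds:
registry = RegistryContract(clouds={"cloud-1": CloudContract("http://cm1", "http://pnp1")})
print(list(registry.clouds))
```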
The Orchestration process is decoupled from the Gateway. The nodes running the Orchestration Service make use of a replicated data storage mechanism provided by a Component Administration Network (CAN) that ensures the availability of the Orchestration process and the continuity of deployment steps. The CAN nodes are responsible for the availability of the Orchestration Service.
The Plug and Play Service used for registering resources is augmented with Blockchain reading and writing capabilities, acting as an interface between the resources and the Blockchain. The Resource Manager is also capable of deregistering a resource if it is deemed unavailable.
2.2 Component Administration Networks
A Component Administration Network has a two-fold purpose. First, it bridges the land of Smart Contracts with the land of Software Components. Second, it provides the means for monitoring and enforcing a set of replicas in order to tolerate faults. The network of nodes implements a replicated state machine which has two functions. First, it maintains a ledger of transactions related to the network of nodes and the supervised components, distinct from the Ethereum ledger. Second, the nodes execute a replicated file system to store data associated with the supervised components.
Figure 2.2 presents the layered architecture of this proposal. On the bottom layer, there is the peer-to-peer network that collaborates for maintaining the Ethereum Blockchain, where the Registry Contract is deployed. Some of these nodes can be part of the Component Administration Network.
The middle layer is concerned with operations for managing the nodes. This is the first layer where we identify the leader of the network, which is responsible for ordering and validating all transactions related to the CAN. The replica nodes will accept any state update from the leader. For this layer we propose the following protocols:
• Join – protocol for a new node to join the network
• RemoveNode – protocol for removing a node that has been discovered to be faulty
• LeadElect – protocol for electing a new leader if the current one has been discovered to be faulty

Figure 2.2: Layered architecture of a Component Administration Network
The top layer is concerned with the administration of Components. Again, the CAN leader is responsible for ordering and validating state updates at this level. This layer is concerned with the following protocols:
• Register – a component gets registered with the network
• Deregister – a component has been unresponsive and is removed
• AssignWork – a component is assigned work
• CheckpointWork – a component requests the network to store a Checkpoint
• ReassignWork – a component has been removed and its work is reassigned to another
• FinishWork – a component is requested to terminate the execution of a given piece of work (termination of a Cloud Application).
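A minimal sketch of how a CAN leader could order and replicate the component administration operations listed above, assuming a simple leader-and-followers log; the data structures, method names and the replication transport are illustrative, not the implemented protocol.

```python
# Sketch of the leader-ordered state machine behind the component
# administration protocols; protocol names follow the list above, the rest is
# illustrative only.
class CANLeader:
    def __init__(self, replicas):
        self.replicas = replicas          # follower nodes that apply the log
        self.log = []                     # CAN-local ledger (separate from Ethereum)
        self.components = {}              # component id -> {"work": ..., "checkpoints": []}

    def _commit(self, entry):
        self.log.append(entry)            # in a real CAN: broadcast and wait for acks
        for replica in self.replicas:
            replica.apply(entry)

    def register(self, cid):
        self.components[cid] = {"work": None, "checkpoints": []}
        self._commit(("Register", cid))

    def assign_work(self, cid, work):
        self.components[cid]["work"] = work
        self._commit(("AssignWork", cid, work))

    def checkpoint_work(self, cid, checkpoint):
        self.components[cid]["checkpoints"].append(checkpoint)
        self._commit(("CheckpointWork", cid, checkpoint))

    def reassign_work(self, failed_cid, new_cid):
        state = self.components.pop(failed_cid)
        self.components[new_cid] = state  # the new component resumes from the checkpoints
        self._commit(("ReassignWork", failed_cid, new_cid))

leader = CANLeader(replicas=[])
leader.register("orchestrator-1")
leader.assign_work("orchestrator-1", "app-contract-0x42")
```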
2.3 Fault tolerant Orchestration
Using the concepts defined in the previous section, Orchestrators are components which register with the Component Administration Network. Work is represented by the Application Contracts, which the Orchestrator reads in order to perform the deployment.
Figure 2.3 presents the deployment continuity of an Application composed of two Services (e.g. Ray Tracing UI and Engine) in the presence of an Orchestrator failure. After the successful resource discovery process, the End-User requests the Cloud Contract to create an Application Contract. The Orchestration CAN (OCAN) leader is then informed about this contract. Alternatively, the leader can subscribe to Ethereum events and get notified when a new Application Contract is created. In both cases, the leader will select an Orchestrator replica, broadcast an AssignWork transaction and inform the assigned Orchestrator.
The Orchestrator reads the content of the Application Contract and proceeds with deployment. After each service deployment, the Orchestrator will initiate the checkpoint mechanism and will save information about the relationship between a service definition (from the Application Contract) and a service instance (the unique identifier used by the Hypervisor). In this manner, if the assigned Orchestrator fails during the deployment of multiple services, another replica can use the checkpoints.

Figure 2.3: Example deployment continuity with failing Orchestrator
Additionally, the Orchestrator will issue a checkpoint at intervals set in the Application Contract. This checkpoint is used to collect payments, serving as proof that all Services executed for the set interval. When a service fails, a checkpoint is made and a set number of redeployments are attempted. If the service can be redeployed, a checkpoint is made. If the service cannot be redeployed, the problem must lie with the Cloud, and a forced shut down of the Application is requested from the leader. The leader will call the corresponding function on the Application Contract.
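The checkpoint-driven deployment loop can be sketched as follows; deploy(), the CAN interface and the retry limit are placeholders standing in for the Orchestrator implementation and for values read from the Application Contract.

```python
# Sketch of the checkpoint-driven deployment loop described above; not the
# actual Orchestrator code. `can` stands for a Component Administration
# Network client; its methods are hypothetical.
MAX_REDEPLOYMENTS = 3   # assumed limit taken from the Application Contract

def orchestrate(application_contract, can, deploy, resume_from=None):
    """Deploy each service, checkpointing the definition -> instance mapping."""
    deployed = dict(resume_from or {})            # checkpoints left by a failed replica
    for service_def in application_contract["services"]:
        if service_def in deployed:
            continue                               # already deployed before the failure
        for _attempt in range(MAX_REDEPLOYMENTS):
            instance_id = deploy(service_def)      # hypervisor-level identifier
            if instance_id is not None:
                deployed[service_def] = instance_id
                can.checkpoint_work("orchestrator-1", dict(deployed))
                break
        else:
            # Service cannot be (re)deployed: ask the leader to force-shut the Application.
            can.force_shutdown(application_contract["id"])
            return deployed
    return deployed
```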
The OCAN leader is responsible for updating the Application Contract with checkpoint properties, such as checkpoint type, timestamp and proposer; the actual checkpoint data can be further inquired from the OCAN network. The OCAN leader can also request payment of the registered checkpoints periodically. For this, the Application Contract must be filled with currency by the End-User. Nevertheless, the End-User will only pay for the actual time the services have been running, based on the checkpoints.
During the execution of the Application, several entities dedicate their resources to enforce the fault tolerance of the System and Application, and therefore need to be reimbursed. These entities are the resources, the Cloud managing the resources, the Component Administration Network and the Orchestrator(s). We consider a method for making interim payments in order to lower the burden of computing all payments in a single transaction, which may run out of gas. This research focused on a decentralized, fault tolerant mechanism for running Cloud Applications on privately owned resources. The architecture of the CloudLightning framework is augmented to achieve decentralization and fault tolerance of the Application and Orchestration Service. A Blockchain capable of Smart Contract execution is considered to intermediate payments and hold information about the entities of the System.
Chapter 3
Further contributions
The author also contributed to research in directions different from the main PhD studies.
3.1 Electricity consumption prediction
Predicting the consumption of individual customers using machine learning techniques requires a lot of time due to the size of the data and the increasing number of customers connected to the smart grid. One solution to avoid individual predictions is to cluster customers together based on similar patterns. We investigate the efficiency of using cluster information derived from our proposed Adaptive DBSCAN to predict individual consumption. We compare the results against standard ARIMA and the best found seasonal ARIMA model. Results on real-life data show an average deterioration of 30% with respect to the MAPE of the best found model when there are enough clusters and their centres are used as baseline prediction models. The results of this study were published as a conference article at ISGT-Europe [5].
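A rough sketch of the cluster-then-predict idea follows, using plain DBSCAN from scikit-learn as a stand-in for the proposed Adaptive DBSCAN and synthetic data in place of the real smart-grid readings; each cluster centre serves as the baseline prediction for its members and is scored with MAPE.

```python
# Sketch of the cluster-then-predict idea with plain DBSCAN standing in for the
# Adaptive DBSCAN variant; data and parameters are synthetic placeholders.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
consumption = rng.gamma(2.0, 1.0, size=(200, 96))     # 200 customers x 96 readings

labels = DBSCAN(eps=20.0, min_samples=5).fit_predict(consumption)

def mape(actual, predicted):
    return np.mean(np.abs((actual - predicted) / np.maximum(actual, 1e-9))) * 100

# Use each cluster's centre as the baseline prediction for all of its members.
for label in set(labels) - {-1}:
    members = consumption[labels == label]
    centre = members.mean(axis=0)
    print(f"cluster {label}: {len(members)} customers, "
          f"MAPE vs centre = {mape(members, centre):.1f}%")
```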
3.2 Genetic algorithms for weather prediction optimization
A study investigating the use of genetic algorithms in conjunction with the WRF (Weather Research and Forecasting) numerical weather prediction system to optimize the physical parametrization configuration has been developed, with the aim of improving the forecast of two important atmospheric parameters: 2 metre temperature and relative humidity. Our research showed good results in improving the average prediction error in a limited number of iterations, which could prove helpful in building GA-optimized ensemble forecasts, especially when focusing on specific atmospheric parameters. The optimization process performed well in finding optimal physical configurations for humidity prediction, but showed poor results for temperature forecasts; more experiments need to be conducted in order to have a clear view of the utility of GA techniques for physical parametrization optimization. The paper has been published at the SYNASC 2016 conference [3].
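The overall GA loop can be sketched as follows; the parametrization options and the fitness function are placeholders, since in the actual study the fitness of a configuration is the forecast error obtained from a full WRF run.

```python
# Toy sketch of a GA over WRF physics parametrization options; the fitness
# function is a placeholder where a real setup would run WRF and score the
# forecast error for 2 m temperature or relative humidity.
import random

OPTIONS = {"microphysics": range(1, 7), "cumulus": range(1, 4),
           "boundary_layer": range(1, 3), "radiation": range(1, 5)}

def random_config():
    return {k: random.choice(list(v)) for k, v in OPTIONS.items()}

def fitness(config):
    # Placeholder: in the study this is the average forecast error from a WRF run.
    return sum(config.values()) + random.random()

def evolve(pop_size=12, generations=10, mutation_rate=0.2):
    population = [random_config() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)                      # lower error is better
        parents = population[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = {k: random.choice([a[k], b[k]]) for k in OPTIONS}   # crossover
            if random.random() < mutation_rate:                         # mutation
                key = random.choice(list(OPTIONS))
                child[key] = random.choice(list(OPTIONS[key]))
            children.append(child)
        population = parents + children
    return min(population, key=fitness)

print(evolve())
```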
3.3 Data-Centric programming models
During his work on the ASPIDE Horizon 2020 Project, the author contributed to a paper [9] that presents the main features and the programming constructs of the DCEx programming model, designed for the implementation of data-centric large-scale parallel applications on Exascale computing platforms. To support scalable parallelism, the DCEx programming model employs private data structures and limits the amount of shared data among parallel threads. The basic idea of DCEx is structuring programs into data-parallel blocks to be managed by a large number of parallel threads. Parallel blocks are the units of shared- and distributed-memory parallel computation, communication, and migration in the memory/storage hierarchy. Threads execute close to data, using near-data synchronization according to the PGAS model. A use case is also discussed, showing the DCEx features for Exascale programming.
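As a loose illustration (not the DCEx API) of the data-parallel block idea: the data is split into blocks, each block is processed by a worker holding only private state, and a single small reduction value is shared at the end.

```python
# Loose illustration of data-parallel blocks: private per-block computation,
# only a small aggregated value shared between workers. Not the DCEx API.
from concurrent.futures import ThreadPoolExecutor

def process_block(block):
    # Private, near-data computation on one block; nothing shared while running.
    return sum(x * x for x in block)

data = list(range(1_000_000))
block_size = 100_000
blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_block, blocks))   # one task per block

print(sum(partials))    # the only shared/aggregated value
```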
List of Publications
[1] Ioan Dragan, Teodor-Florin Fortiș, Marian Neagul, Dana Petcu, Teodora Selea, and Adrian Spataru. "Application Blueprints and Service Description". In: Heterogeneity, High Performance Computing, Self-Organization and the Cloud. Springer, 2018, pp. 89–117.
[2] Christos Filelis-Papadopoulos, Huanhuan Xiong, Adrian Spataru, Gabriel G. Castañé, Dapeng Dong, George A. Gravvanis, and John P. Morrison. "A generic framework supporting self-organisation and self-management in hierarchical systems". In: Parallel and Distributed Computing (ISPDC), 2017 16th International Symposium on. IEEE, 2017, pp. 149–156.
[3] Liviu Oana and Adrian Spataru. "Use of genetic algorithms in numerical weather prediction". In: Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2016 18th International Symposium on. IEEE, 2016, pp. 456–461.
[4] Teodora Selea, Adrian Spataru, and Marc Frincu. "Reusing Resource Coalitions for Efficient Scheduling on the Intercloud". In: Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on. IEEE, 2016, pp. 621–626.
[5] Adrian Spataru and Marc Frincu. "Using cluster information to predict individual customer consumption". In: Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), 2017 IEEE PES. IEEE, 2017, pp. 1–6.
[6] Adrian Spataru, Laura Ricci, Dana Petcu, and Barbara Guidi. "Decentralized Cloud Scheduling via Smart Contracts. Operational constraints and costs". In: The International Symposium on Blockchain Computing and Applications (BCCA 2019). 2019.
[7] Adrian Spataru, Teodora Selea, and Marc Frincu. "Online Resource Coalition Reorganization for Efficient Scheduling on the Intercloud". In: International Conference on Algorithms and Architectures for Parallel Processing. Springer, 2016, pp. 143–161.
[8] Adrian Spătaru. "Decentralized and fault tolerant Cloud Service Orchestration". In: Scalable Computing: Practice and Experience 21.4 (2020), pp. 709–725.
[9] Domenico Talia, Paolo Trunfio, Fabrizio Marozzo, Loris Belcastro, Javier Garcia-Blas, David del Rio, Philippe Couvee, Gael Goret, Lionel Vincent, Alberto Fernandez-Pena, Daniel Martin de Blas, Mirko Nardi, Teresa Pizzuti, Adrian Spataru, and Marek Justyna. "A Novel Data-Centric Programming Model for Large-Scale Parallel Systems". In: Euro-Par Workshop LSDVE. 2019.