Decentralized Cloud Computing

Doctoral Thesis

Author: Florin-Adrian Spătaru

Supervisors: Prof. Univ. Dr. Dana Petcu, West University of Timișoara
             Assoc. Prof. Dr. Laura Ricci, University of Pisa

Timișoara and Pisa, 2021


Abstract

This thesis proposes a decentralized Cloud platform, where Services can use infrastructure and hardware accelerators from a network of private compute resources managed through Smart Contracts deployed on a public Blockchain. This is achieved by leveraging and extending existing open-source technologies for Cloud Orchestration and designing a decentralized resource selection mechanism and scheduling protocols for resource allocation. The decentralized resource selection mechanism is facilitated by a Smart Contract on a public Blockchain and a Self-Organizing Self-Managing Resource Management System. The main contribution of this part is the investigation of the operational constraints and costs associated with outsourcing the selection logic to the Smart Contract. Moreover, the selection mechanism is optimized to increase the throughput of scheduling decisions.

The platform allows for the definition, composition, optimization, and deployment of Cloud Services. It improves the state of the art by allowing a user to design applications composed of abstract services, for which explicit implementations and resources are selected by our platform, depending on the user constraints and resource availability. This contribution improves the flexibility of both the Cloud user and the Cloud provider, allowing for a more efficient Cloud. The novel resource management framework allows for the self-organization of collaborative resource management components. Each component makes use of a Suitability Index in order to take self-organization decisions and guide resource requests to the most suitable resource given the system state. Experimental evaluation shows that our System incurs a small provisioning delay compared to traditional on-premises deployment, with no significant degradation in the performance of the Services under investigation.

Another important contribution is the concept of Component Administration Networks, designed to monitor and maintain a set of replicas that store system state data used for recovery in case of component or service failure.

The integrated results of the thesis allow for the creation of a free, decentralized market for Cloud Applications to be deployed on different types of infrastructure. Resources are managed efficiently, according to high-level business metrics, and the fault tolerance of Applications is enforced.

Keywords: Distributed Systems, Cloud Service Orchestration, Blockchain, Smart Contracts

Table of Contents

Abstract

List of Figures

List of Tables

1 Introduction

2 State of the Art
  2.1 Cloud Computing
    2.1.1 Clouds in the Internet of Things
  2.2 Peer to peer systems
    2.2.1 Desktop grids
  2.3 Distributed Ledgers
    2.3.1 Consensus protocols
    2.3.2 Distributed Ledger Technology
  2.4 Blockchain-based Clouds
    2.4.1 Enigma
    2.4.2 iExec
    2.4.3 DECENTER
  2.5 Summary

3 Gateway Service
  3.1 The CloudLightning System
  3.2 Architecture
  3.3 Entities definitions
    3.3.1 CloudLightning Base Nodes
    3.3.2 Capabilities and Relationships
  3.4 User Interface
  3.5 Service Optimization Engine
    3.5.1 Resource Discovery protocol
  3.6 Deployment Orchestration
    3.6.1 Deployment of CLContainers
    3.6.2 Deployment of CLSoftware
  3.7 User Experience
    3.7.1 Traditional deployment
    3.7.2 Comparison

  4.1 Background
  4.2 Self-organization Self-Management Framework
    4.2.1 Example Scenario
  4.3 Coalition Formation Strategies
    4.3.1 Size Frequency Similarity
    4.3.2 Constraint Frequency Similarity
  4.4 Simulation Using Open Source Trace Data
    4.4.1 Coalition Formation within a Single vRM
    4.4.2 Coalition Self-organization
    4.4.3 Coalition Formation via Self-organization Framework
  4.5 Summary

5 Small-Scale CloudLightning System Experimentation and Evaluation
  5.1 Exploiting Service Performance Characteristics
    5.1.1 Task execution experiment
  5.2 System overhead
    5.2.1 Genomics
    5.2.2 Upscale Engine
    5.2.3 RayTracing: MIC and CPU execution
    5.2.4 Dense Linear Algebra Libraries
  5.3 Conclusion

6 Resource assignment using Ethereum Smart Contracts
  6.1 Proposed system
    6.1.1 Scheduling Methods
    6.1.2 Finish and cleanup
    6.1.3 Gas usage analysis
  6.2 Experimental Evaluation
    6.2.1 Workload description
    6.2.2 Results
  6.3 Conclusion

7 A fault tolerant decentralized Cloud
  7.1 CloudLightning Architecture
  7.2 Component Decentralization
    7.2.1 Smart Contract Functions
    7.2.2 System Initialization
    7.2.3 Application Deployment
  7.3 Component Administration Networks
    7.3.1 Network Management Layer
    7.3.2 Components Administration Layer
  7.4 Fault tolerant Orchestration
    7.4.1 Payment
  7.5 Conclusion

8 Conclusion
  8.1 Impact
  8.2 Open Questions

Appendices

A CloudLightning Use Cases TOSCA definitions
  A.1 Ray Tracing Types
  A.2 Oil and Gas OPM Upscaling types
  A.3 Genomics types

B Data schemas and JSON examples
  B.1 Gateway Requests and Responses
  B.2 Resource Template JSON Message Schema Example

List of Figures

2.1 Internet of Things Architecture
2.2 Credibility based fault tolerance
3.1 CloudLightning Components Architecture
3.2 Gateway Service components and interactions
3.3 Accelerated Services Examples
3.4 Example Service requirements and performance configuration
3.5 Load CloudLightning plugin
3.6 Load CloudLightning CSAR
3.7 Application creation
3.8 Topology editor interface starting from an Application template
3.9 Example instantiations for the Abstract RayTracing Application
3.10 Resource Discovery and Service Optimization Interface
3.11 Ray Tracing Application deployment
3.12 Operational metrics retrieved from Ray Tracing deployment
3.13 CloudLightning Data Model
3.14 Gateway-SOSM Resource Discovery protocol
3.15 GW-SOSM Resource Release protocol
4.1 Generic hierarchical architecture
4.2 Proposed Framework
4.3 CloudLightning hierarchical architecture
4.4 Data processing for Coalition Formation
4.5 Size of jobs and resource utilization
4.6 Coalition Discovery Success with and without Reordering
4.7 Coalition Discovery Success with and without Reordering – Job scheduling success
4.8 Scheduling success
4.9 Average Machine Query steps until finding all requested resources
4.10 Average Coalition Query steps until finding all requested resources
4.11 Resource utilization for SOSM management and vRM evolution
5.1 Example Service performance selection
5.2 Step 1: Suitability Index
5.3 Step 5: Suitability Index
5.4 Step 6: Suitability Index
5.5 Step 8 re-deployment: Suitability Index
5.6 CPU utilization for the CPU-only genomics application
5.9 Genomics CPU utilisation and power consumption without and with SOSM
5.10 Oil and Gas CPU Execution
5.11 Oil and Gas Use Case GPU Execution Patterns
5.12 CPU and Virtual Memory utilization for the CPU-only Ray Tracing application
5.13 MIC utilization for the MIC Ray Tracing application
5.14 Power Consumption for RayTracing use case
5.15 CPU-based BLAS execution comparisons
5.16 GPU-based BLAS execution comparisons
6.1 Components Architecture for the proposed Decentralized Cloud
6.2 Index Order and Reverse Order termination example
6.3 Client view and actions for terminating instances by filling the gap with the last instance
6.4 Assignment and termination of 100 Service Instances on 100, 200, 400, and 800 resources: First Match (blue), Best Match (orange), Offline Selected (green) and Finish (red)
6.5 Experiment Methodology
6.6 Application concurrency for chosen data subset
6.7 Cumulative scheduling and finishing events
6.8 In-depth view of the cumulative scheduling and finishing events
6.9 Cost for accepted assign transactions (blue), rejected assign transactions (red) and finish transactions (orange)
6.10 Box Plot for Start and Finish Latency for average block interval = 4s
7.1 CloudLightning Components Architecture
7.2 Augmented decentralized architecture
7.3 Layered architecture of a Component Administration Network
7.4 Size of a CAN for different join and failure probabilities for Nmax = 10
7.5 Size of a CAN for different join and failure probabilities for Nmax = 100
7.6 Example deployment continuity with failing Orchestrator

List of Tables

2.1 Compute infrastructure available in Clouds
2.2 Peer to peer applications
2.3 Consensus algorithms
2.4 Blockchain properties
2.5 Blockchain based Technologies for Cloud Services
2.6 iExec entities
3.1 Deployment Strategies for the different types of resources
3.2 TOSCA Interfaces mapping to Brooklyn commands
4.1 Framework Terminology
4.2 Server capacities
4.3 Job size histogram
4.4 Machine properties with respect to CL framework
5.1 Service Implementations present in the Service Portfolio
6.1 Cost in gas units (ETH)
6.2 Job size distribution
6.3 Average Cost per Service instance in ETH (USD)


Chapter 1

Introduction

During the rise of the Internet, researchers envisioned a system of interconnected computers, structured as a peer to peer network, where each node is able to make its content accessible to the others, without the presence of a centralized server. Although this vision remains valid, the majority of today's Internet services are designed as centralized applications, which store and process impressive amounts of data in warehouse-scale computing centers. This has been advanced by the field of Cloud Computing, where resources in large data centres are rented to users. Cloud Service Providers are generally considered trusted actors that provide Quality of Service mechanisms.

In this context, peer to peer systems have continued to be used for file-sharing services and volunteer computing applications. However, the field of Internet of Things has introduced a need for storage and computational infrastructure close to the devices at the edge of the network. Using a peer to peer network to meet these needs can help sensors and devices access computational and storage resources closer to their location. This, in turn, would reduce network congestion by preprocessing data close to its source, rather than sending it to the Cloud directly. Additionally, it can provide an economic return for the owner of the infrastructure which hosts such Services.

Lately, the emergence of Blockchain technologies, such as BitCoin [70] and Ethereum [16], has created economic incentives for participation in a globally distributed network of computers. Blockchains construct a public immutable ledger of transactions which serves two main purposes: transparency and auditability. The Ethereum network consists of over 25,000 nodes that collaborate to store and update the Ethereum Virtual Machine, a global replicated state machine capable of executing arbitrary code instructions assembled into a Smart Contract. The state machine is updated by transactions that are organized into blocks on a Blockchain. This, in turn, requires each node participating in the network to execute all transactions in a block locally, which hinders the performance of the system. The advantage is that each node can access the state machine information locally.

In order to limit misuse of the platform, each operation has a cost, and storing data is the most expensive. Therefore, a Blockchain should not be used as a means to store large amounts of data, but only the minimum information needed to enforce the Application and Business logic. Hence, even if the Blockchain paradigm provides a decentralized mechanism for advancing its state, users must rely on external components to create a decentralized application. This opens a market for providing the storage and computational power needed to create a fully decentralized application. Several companies are trying to fill this gap and are presented in Section 2.4.

This thesis proposes a Decentralized Cloud Platform, where Virtual Machine Instances, Containers, and accelerators like Graphics Processing Units (GPU), Many Integrated Core (MIC) cards or Data Flow Engines (DFE) can be provisioned from a peer-to-peer network maintained using Smart Contracts deployed on a public Blockchain. In order to achieve this, we leverage existing open-source technologies for Cloud Orchestration and design a Decentralized Resource Selection mechanism and Scheduling Protocols for resource allocation. The Cloud Orchestration software provides facilities for defining and deploying services that depend on infrastructure and on other services.

The Decentralized Resource Selection mechanism is facilitated by a Smart Contract on a public Blockchain and a Self-Organizing Self-Managing Resource Management System. The Smart Contract exposes operations for registering and removing worker nodes, and for creating Application Contracts. At first, we inspect the operational constraints and costs associated with outsourcing the selection logic to the Smart Contract. Then, we devise a selection mechanism that takes place outside the Smart Contract, which accepts an assignment based on resource reservation promises signed by the resource manager.

The main contributions of this thesis are:

1. The implementation of the Gateway Service, which bundles together several components that allow for the definition, composition, optimization, and deployment of Cloud Services using HPC infrastructure such as GPUs, MICs or DFEs. Additionally, the flexibility of the Cloud Service Provider is enhanced by allowing a User to design an Application composed of Abstract Services, for which an implementation is chosen based on the current state of the resources managed by the Cloud.

2. The design of a resource management framework which allows for the self-organization of collaborative management components that have individual goals. This Self-Organizing Self-Management (SOSM) framework allows for setting and updating global goals which are transmitted down the hierarchy, influencing the behaviour of the components downstream in order to align with the global objective. The system is experimentally evaluated with respect to the Service optimization process and decisions made by the SOSM framework, and with respect to the overhead incurred by our System.

3. The design of a decentralized Cloud platform managing privately owned resources and ensuring fault tolerance in case of Service or Component failure. Component fault tolerance is enforced through the novel concept of a Component Administration Network, which is responsible for monitoring component state, assigning work, and saving checkpoints related to that work. Service fault tolerance is managed by an Orchestrator Component which saves checkpoints related to the Service state.

The thesis is organized as follows. Chapter 2 presents a survey of the literature related to Cloud Services, Peer to Peer networks, consensus protocols, and Blockchain technology. Section 2.4 presents current efforts in the same direction as this thesis.

Chapter 3 presents the implementation of the Gateway Service components and provides a comparison with current approaches for Service Delivery. Chapter 4 presents the SOSM Framework and two predictive methods to create coalitions of resources that can be used to ease the resource selection process. The framework is evaluated by experimentation on an open source trace data set. Chapter 5 presents an experimental evaluation of the CloudLightning System on a small scale testbed.

Chapter 6 investigates the operational constraints associated with outsourcing the resource selection mechanism to a Smart Contract. It provides both a static analysis of the cost and a dynamic analysis of cost and latency, using the same open source data set. Chapter 7 constructs a decentralized, fault-tolerant Cloud platform based on the results of the previous chapter. It defines the concept of Component Administration Networks and presents protocols for joining, leaving, and monitoring the network and the components managed by the network. Finally, conclusions are given in Chapter 8.

Acknowledgment

The work presented in Chapters 3, 4, and 5 has been funded by the European Union Horizon 2020 Research and Innovation Programme through the CloudLightning project under Grant Agreement Number 643946.


Chapter 2

State of the Art

This chapter surveys the literature outlining the technologies and principles required to build a decentralized Cloud.

First, we explain the concept of the Cloud and what types of computing infrastructure it offers. We then describe the integration between Cloud, Edge and Fog layers in the Internet of Things field. Next, we outline the concept of peer to peer systems and show examples that provide a computation framework similar to the Cloud.

Second, we show how Distributed Ledgers provide a novel method for achieving consensus in the context of a replicated state machine, and analyze several implementations. Next, we outline several peer to peer systems capable of offering computing infrastructure by using a tamper-proof ledger to make scheduling decisions.

Finally, we outline existing opportunities and challenges in the creation of a Cloud system able to store data and reason about peer resources, performing scheduling in a decentralized fashion.

2.1 Cloud Computing

NIST defines Cloud Computing as “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” [63].

Cloud Computing has led to advances in the field of computer applications, reducing the time-to-market through ease of deployment, simpler management, and cost reduction. Most Cloud Service Providers deliver Software as a Service solutions such as databases, message queues, MapReduce clusters and Machine Learning frameworks.

Clouds are increasing the spectrum of capabilities they offer, adding Graphics Processing Units (GPUs), Many Integrated Core units (MICs), and lately Field Programmable Gate Arrays (FPGAs). Nevertheless, these resources are only accessible to experienced users, because they require in-depth configuration of the application to communicate with the accelerators.

In Table 2.1 we outline the computing facilities offered by Cloud Service Providers (CSPs) for integration in consumer applications. Amazon Web Services (AWS), Google Compute, and Microsoft Azure are leading in terms of customers, infrastructure and service variety. Other cloud providers that fall under this common interpretation are IBM Cloud and Oracle Cloud. Cloud Providers such as Digital Ocean, which offer only Virtual Machine instances, are omitted from this analysis.

Docker [64] is a technology that performs virtualization at the operating system level using the Linux kernel for resource isolation and namespace isolation to limit the view on the operating system. In contrast to Virtual Machines, which include a separate operating system, Containers are built on layers which may be shared by similar applications. This enables a higher flexibility with respect to the number of applications that can run on a server or inside a Virtual Machine.


Table 2.1: Compute infrastructure available in Clouds

Compute type    | Support     | Details
Virtual Machine | Very Common | The most common compute infrastructure, including a separate operating system from the host.
Container       | Common      | A process running in a different namespace, isolated from other applications running on the same host.
GPU             | Common      | GPUs can be attached to VM instances on-the-fly.
FPGA            | AWS         | AWS offers a development kit for writing FPGA images.
Serverless      | Common      | The Serverless computing paradigm allows users to trigger code in response to events.

Docker also provides a public Container Registry, called Docker Hub, where developers can host their Docker Images. Additionally, registries can be installed on-premises in order to reduce bandwidth and download time. Cloud Service Providers have thus stepped up to provide support for containers which can be run on demand, together with container registries for fast deployment inside their data centres.

Kubernetes [14] is an open-source resource management platform that allows the scaling and automation of container applications. Most Cloud Providers that offer containers also offer the possibility to create a Kubernetes cluster to better manage application elasticity. A similar open-source technology, Mesos [48], has a master-agent architecture with a pull-based mechanism for scheduling application containers.

A new paradigm called Serverless Computing allows cloud users to be charged per function invocation, removing the server configuration and management overhead while also reducing operational costs, since billing occurs at the request level [94].

The most common storage service is Block Storage, available at all Cloud Providers. This storage is used for virtual machine and container instances. Additionally, users can mount external volumes which are independent of any such instance. Some CSPs have augmented the storage service to ease the management overhead for the consumer. They offer solutions for SQL, NoSQL and Big Data use cases through automatic configuration and backup of the database system. They also offer orchestration for distributed file systems such as Hadoop DFS, providing data-locality awareness to MapReduce and Spark applications. Message brokering is another service offered by large cloud providers; it provides the infrastructure to store and relay messages to be processed asynchronously.

2.1.1 Clouds in the Internet of Things

The increased connectivity and computational capabilities displayed by small devices have led to the emergence of the concept of the Internet of Things (IoT).

The majority of IoT research divides the processing spectrum into three layers, with limited computation units at the edge of the network and the data centers at the top, as depicted in Figure 2.1.

Figure 2.1: Internet of Things Architecture

The field of Edge Computing is concerned with migrating some of the application logic and data processing to the nodes which are close to the source of the data. The main reason for this is to avoid the network congestion generated if all devices transferred all data directly to the Cloud for processing. For example, most Smart Meters transmit the total consumption over the last 15 minutes, but this degrades the resolution and thus the amount of intelligence that can be gathered about energy consumption patterns.

The gap between the edge devices and the Cloud is closed by the field of Fog Computing. Any computational unit capable of storing and processing data can be a fog node for the local network of devices. Thus, computation can be offloaded to nearby laptops or desktops which are capable of storing and analyzing private information or performing intense computations with the help of the graphics processing unit. Another scenario is the usage of fog nodes for content delivery purposes.

2.2 Peer to peer systems

The concept of Peer to Peer (P2P) has been made popular by file sharing services, beginning with Napster in 1999. In general, a peer to peer network is composed of nodes which play both the role of a client and the role of a server, contributing their resources (storage, bandwidth, computation) to the network. In Table 2.2 we list several peer to peer applications and some of their usage scenarios.

File sharing peer to peer applications incentivize their users to share their storage resources using a reputation based mechanism. If a node downloads more data than it shares with its peers, it will be forbidden from further downloads until its reputation rises back above a given threshold. Desktop grids usually provide resources voluntarily for the advancement of science. Digital currency peer to peer systems reward the nodes which provide a solution to a cryptographic puzzle in order to advance the state of the underlying transaction history.

2.2.1 Desktop grids

We further analyze the Desktop grid environment, presenting some of the challenges with regard to the correctness of a task execution. Desktop grids are peer to peer networks where resources are shared for scientific applications during the idle time of the computer; usually, while the screen saver is active, the node requests and processes some data. Some notable Desktop Grids are BOINC (Berkeley Open Infrastructure for Network Computing) [2] and XtremWeb [37]. BOINC supports diverse applications, including those with large storage or communication requirements. XtremWeb, on the other hand, focuses on distributing Java and MapReduce jobs.

Table 2.2: Peer to peer applications

Application         | Examples                                          | Usage
File sharing        | BitTorrent, Gnutella, IPFS                        | Content delivery networks, software distribution (Linux, games).
Anonymity           | I2P, Tor                                          | Anonymous and censorship-resistant protocols relay user traffic through several peers out of tens of thousands using end-to-end encryption.
Desktop grids       | BOINC [2], XtremWeb [37], SETI@home, Folding@home | Peer-to-peer file transfers and volunteer computation for the scientific community.
Distributed Ledgers | BitCoin, Ethereum, etc.                           | The rise of digital currencies gives any peer the possibility to impose the order of transactions (mine a block) and append it to the distributed ledger.

Another perspective is given by a cooperative computing grid where users "submit applications for execution on a system that is composed of resources from at least three types of infrastructure: Internet Desktop Grids (IDG), Best Effort Grids (BEG) and Cloud" [66]. Each computing resource can be characterized by computing power, price, power efficiency, and security. This framework considers a centralized scheduling platform which is able to perform resource allocation based on multiple criteria such as cost, security parameters, and execution latency. For example, in order to be confident in the result of a processing task, a single volunteer node will not be enough because of its lack of credibility. Therefore, the platform will need either to duplicate the task on several volunteer nodes until the result receives enough votes, or to schedule the task on a single Cloud node which can be trusted.

Sarmenta [81] investigated the problem of sabotage tolerance in volunteer computing systems and introduced the concept of credibility-based fault tolerance. The paper defines four types of credibility: of the worker, of the result, of a result group, and of the work entry. The credibility of a worker is estimated based on the number of spot-checks it passed (i.e., the number of times it has provided a correct result) and the expected fraction of faulty workers, f. A new worker will have a small credibility level, which increases as its number of contributions to correct results grows. The credibility Cr(X), where X is one of the last three types, is calculated as the conditional probability that, given the current state of the system, object X will yield a correct result.

In Figure 2.2 we present the original example [81] of how credibility is calculated inside a worker pool. In this example, a work entry is considered accepted if its credibility is greater than θ = 0.999, and the fraction of faulty workers is assumed to be f ≤ 0.2. A work entry has the credibility of its underlying Result Groups. A Result Group is a table where each row contains the result (res), the worker id (pid), and the credibility of the worker. Each worker has a credibility computed based on the number of spot-checks passed (k).

Work entry number 0 has only one result, provided by a new worker. This does not carry enough credibility, so at least one more result must be provided. In the case of Work entry 1 there are two result groups, because two workers provided different results. Work entry 998 also has two result groups, but one of them achieved enough credibility to be considered the correct one. Worker P8, which provided the bad result, will be punished by reducing its number of passed spot-checks. Work entry 999 achieves enough credibility from a single result, because the credibility of the worker alone exceeds the minimum credibility threshold.

Figure 2.2: Credibility based fault tolerance
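To make the acceptance rule concrete, the following Python sketch reproduces the threshold loop on the example above. The credibility formulas are simplified placeholders (credibility grows with the number of passed spot-checks, and agreeing workers are combined under an independence assumption), not the exact expressions from Sarmenta [81].

    THETA = 0.999  # acceptance threshold used in the example
    F = 0.2        # assumed fraction of faulty workers

    def worker_credibility(k):
        # Placeholder: credibility grows with the number k of passed spot-checks.
        return 1 - F / (k + 1)

    def group_credibility(spot_checks):
        # Placeholder: probability that at least one agreeing worker is honest,
        # assuming workers fail independently.
        p_all_faulty = 1.0
        for k in spot_checks:
            p_all_faulty *= 1 - worker_credibility(k)
        return 1 - p_all_faulty

    def accepted_result(result_groups):
        # result_groups maps a result to the spot-check counts of the workers
        # that returned it; a work entry is accepted once one group clears THETA.
        for result, spot_checks in result_groups.items():
            if group_credibility(spot_checks) >= THETA:
                return result
        return None  # more results needed, as for work entries 0 and 1

    print(accepted_result({"res-A": [50, 40], "res-B": [0]}))  # -> res-A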

2.3 Distributed Ledgers

A Distributed Ledger is a distributed state machine which is replicated across a network of peers using a Consensus Protocol, without the need for centralized storage or management. The most popular example of a Distributed Ledger is a Blockchain, where transactions are organized in blocks which are linked through a cryptographic hash, forming a chain. The alternative to a Blockchain is a Transaction Directed Acyclic Graph (TDAG), where transactions are linked to other transactions instead of being organized into blocks. Using a TDAG only ensures partial ordering, since no conclusion can be drawn about the precedence of transactions on different branches of the graph, although the same is true for unrelated transactions that are part of the same block.
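As a minimal illustration of the chaining rule, the Python sketch below links blocks through a cryptographic hash, so that altering any block invalidates all blocks after it; the block layout is illustrative, not that of any particular ledger.

    import hashlib, json

    def block_hash(block):
        # Hash a canonical serialization of the block contents.
        data = json.dumps(block, sort_keys=True).encode()
        return hashlib.sha256(data).hexdigest()

    genesis = {"prev": None, "transactions": ["tx0"]}
    block1 = {"prev": block_hash(genesis), "transactions": ["tx1", "tx2"]}

    def verify_chain(chain):
        # Each block must commit to the hash of its predecessor.
        return all(child["prev"] == block_hash(parent)
                   for parent, child in zip(chain, chain[1:]))

    assert verify_chain([genesis, block1])
    genesis["transactions"] = ["forged"]        # tamper with history...
    assert not verify_chain([genesis, block1])  # ...and the chain breaks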

The majority of Blockchain implementations are crypto-currency systems and only allow the transfer of money, possibly with a small scripting language to take care of conditionals. A Smart Contract is a piece of code that allows for arbitrary computation on the distributed state machine, including loops and calls to other Smart Contract functions. This paves the way for Decentralized Applications, which have the replicated data available locally when they need to process it. However, each transition of the Smart Contract state has to be validated and appended into a block, which leads to long confirmation times.

In the following we analyze some of the most popular Consensus Protocols and instances of Distributed Ledgers.

2.3.1 Consensus protocols

The development of algorithms for building resilient servers and distributed systems through replication started with the work of Lamport on Byzantine agreement [56, 75], evolved over the years, and is well summarized in [23]. The problem of achieving consensus in a group of nodes can rely on two parts: "(1) a (deterministic) state machine that implements the logic of the service to be replicated; and (2) a consensus protocol to disseminate requests among the nodes, such that each node executes the same sequence of requests on its instance of the service." [18] In Table 2.3 we present the families of consensus protocols used in replicated state machines and distributed ledgers.

Table 2.3: Consensus algorithms

Name                      | Description
Paxos                     | Family of protocols offering crash fault tolerance using two-phase communication: Proposal and Acceptance.
Proof of Work             | Family of protocols aiming to prevent DoS attacks and spam by requiring the requester to solve a computational puzzle.
Proof of Stake            | Leader election protocol, where stakeholders are selected based on their number of shares.
Proof of Elapsed Time     | Achieved by verifiably waiting a random amount of time before creating the next block.
Proof of Burn             | Alternative to Proof of Work in which "miners" provide a proof that they sent some money to an unspendable address.
Proof of Useful Work      | Family of Proof of Work protocols aiming at providing solutions to actual problems: Traveling Salesman instances, gaps between large prime numbers.
Avalanche                 | Family of protocols built on a metastable mechanism with strong probabilistic safety guarantees in the presence of Byzantine adversaries.
Hashgraph Virtual Voting  | Peers randomly send the list of known transactions to other peers, as well as the lists sent to them before. When more than 2/3 of nodes know about a transaction, it becomes persistent.

Paxos is a family of consensus protocols for ensuring the progress of a replicated state machine in the absence of failures. The basic protocol decides on a single output value for a given round in two phases, sketched in code after this list:

1. Prepare/Promise: in this phase the Proposer sends the request, associated with a number N, to a set of Acceptors. If N is the largest number an Acceptor has seen, it promises the Proposer that it will accept this request.

2. Accept/Accepted: if the Proposer received enough promises from the Acceptors, it sends the Accept message together with the proposal and the value. The Acceptor accepts this proposal only if it has not already promised to accept a proposal with a higher number.
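The following Python sketch shows the Acceptor side of these two phases for a single-decree round; the class and function names are illustrative, and message passing is reduced to direct calls.

    class Acceptor:
        def __init__(self):
            self.promised_n = -1       # highest proposal number promised
            self.accepted_n = -1       # number of the accepted proposal, if any
            self.accepted_value = None

        def prepare(self, n):
            # Phase 1: promise to ignore proposals numbered below n, and report
            # any previously accepted proposal so the Proposer re-proposes it.
            if n > self.promised_n:
                self.promised_n = n
                return ("promise", self.accepted_n, self.accepted_value)
            return ("reject",)

        def accept(self, n, value):
            # Phase 2: accept unless a higher-numbered promise was already made.
            if n >= self.promised_n:
                self.promised_n = self.accepted_n = n
                self.accepted_value = value
                return ("accepted", n, value)
            return ("reject",)

    # A Proposer needs promises from a majority of Acceptors before Phase 2.
    acceptors = [Acceptor() for _ in range(3)]
    promises = [a.prepare(1) for a in acceptors]
    if sum(p[0] == "promise" for p in promises) > len(acceptors) // 2:
        results = [a.accept(1, "x=42") for a in acceptors]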

Practical Byzantine Fault Tolerance (PBFT), also known as Byzantine Paxos, extends Paxos with a new message between Acceptors before proceeding with the Accept operation. Acceptors need to receive f + 1 such verification messages in order to proceed.

The concept of Proof of Work was introduced by Dwork [33] in the context of combating junk mail. In this scenario, the sender is required to solve a cryptographic puzzle: given the contents of a message, the sender should find a nonce which, concatenated with the message, yields a SHA-1 stamp with a minimum number of leading zeroes. This minimum number of leading zeroes is known as the difficulty of the proof.
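The following runnable Python sketch implements such a hashcash-style stamp, counting leading zero bits of a SHA-1 digest; the message and difficulty are illustrative.

    import hashlib

    def proof_of_work(message, difficulty_bits):
        # SHA-1 digests are 160 bits; a stamp is valid when the digest value
        # is below this target, i.e. it starts with difficulty_bits zero bits.
        target = 1 << (160 - difficulty_bits)
        nonce = 0
        while True:
            digest = hashlib.sha1(message + str(nonce).encode()).digest()
            if int.from_bytes(digest, "big") < target:
                return nonce
            nonce += 1

    nonce = proof_of_work(b"example mail body", difficulty_bits=16)

Verification is a single hash: the receiver recomputes the digest for the claimed nonce, while the sender had to try on the order of 2^16 nonces to find it.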

In the BitCoin payment system, the Proof of Work is used to append a block of transactions to the chain (also known as mining a block). The miner is required to find a nonce which, concatenated with the contents of the block, solves the crypto-puzzle. If two blocks are mined at the same time, only the one with the longest chain built after it will survive. This is a computationally expensive process, and the difficulty increases with the hash-power of the network. Today the BitCoin network is estimated to consume around 75 TWh/year (https://digiconomist.net/bitcoin-energy-consumption), more than double the energy consumed by Denmark (32 TWh/year).

Proof of Stake is a mechanism for selecting the creator of the next block at random, based on the distribution of shares among the nodes. It has been investigated as a green alternative to Proof of Work in Blockchains, and has been formally analyzed and compared to Proof of Work in [11]. An extension, Delegated Proof of Stake, lets the stakeholders vote on the creator of the next block. On its own this mechanism does not qualify as an ideal consensus protocol, because nodes can stall progress by consistently voting on different histories, but economic incentives such as punishing dishonest peers can reduce the amount of harm. Ouroboros [51] is a provably secure Proof of Stake protocol which selects responsible stakeholders in epochs. These stakeholders execute a distributed coin-flipping protocol to select a leader, which is responsible for mining the next block.
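A minimal Python sketch of stake-proportional selection follows, assuming a public stake distribution; in a real protocol such as Ouroboros, the randomness must additionally be unbiased and verifiable, which the distributed coin-flipping protocol provides.

    import random

    stakes = {"node-a": 60, "node-b": 30, "node-c": 10}  # illustrative shares

    def select_block_creator(stakes):
        # Each node is chosen with probability proportional to its stake.
        nodes = list(stakes)
        return random.choices(nodes, weights=[stakes[n] for n in nodes], k=1)[0]

    leader = select_block_creator(stakes)  # "node-a" about 60% of the time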

Proof of Elapsed Time (PoET) [24] is an alternative to Proof of Work. It makes use of Intel's Software Guard Extensions (SGX), which can attest the code executed on the CPU, to ensure that nodes wait for a given amount of time before being entitled to generate a new block. This is achieved by the SGX providing a cryptographic attestation of the software that ran on the node.

Another direction is to generate useful puzzles in the Proof of Work scenario. Several Proofs of Useful Work have been compiled in a survey [9]. Conquering Generals [60] uses a Proof of Work with small difficulty to generate the input to a Traveling Salesman instance. The solutions to the PoW with the most leading zeroes are selected to participate in the next round. Then, the node which finds the best solution is declared the winner and is entitled to extend the chain. Other examples of useful Proofs of Work are Cuckoo Cycle (graph-theoretic) [90], PrimeCoin (finding gaps between large prime numbers) [52], Momentum (birthday collision) [57], and REM (resource-efficient mining using SGX) [99].

Avalanche [79] proposes a mechanism for partial ordering of transactions while guaranteeing the safety and liveness of the protocol with probability 1 − ε. Metastability is introduced in the system by allowing nodes to repeatedly query random peers regarding a binary decision until αk votes for an option are collected. This continues for m rounds, after which the node retains the final value. The authors argue that even in a 50/50 partition the nodes will decide, due to random perturbations during the sampling of the peers. Avalanche builds a Transaction Graph where nodes need to decide on conflicting transactions based on the aforementioned strategy.

Hashgraph Virtual Voting [8] operates in a "completely asynchronous" model, in contrast to the previous protocols. Consensus is reached via virtual voting, by continuously sending the list of known events to a random node and checking that enough members have decided on the validity of a transaction. The white paper states that the algorithm circumvents the FLP impossibility [43] by sending the history of known events to random peers at each step. The algorithm reaches agreement with probability 1 [8, Thm. 5.16]: if a member takes a binary decision, then all members will take the same decision within the following two rounds.

2.3.2 Distributed Ledger Technology

A Blockchain is a replicated state machine where state transitions are represented as transactions and are organized in blocks, each referencing the hash of the previous block. Depending on who is allowed to append blocks, we distinguish three types of Blockchains: permissionless, permissioned, and private.



Permissionless Blockchain protocols allow anyone to join the network and run a node which is able to broadcast transactions and contribute to the state of the system by mining blocks. On the other hand, permissioned Blockchains are managed by known parties, which allows them to choose which nodes are able to modify the state of the ledger, and possibly select which nodes are able to send transactions into the system. If a Blockchain is managed by one single entity, it is called a private Blockchain.

Permissioned systems can benefit from the vast literature on consensus, state replication and transaction broadcast in asynchronous networks where connectivity is uncertain, or where nodes are subject to crashes or subversion by an adversary. Nevertheless, many start-ups are developing Blockchain protocols based on pure intuition, without relying on established research.

In Table 2.4 we present several distributed ledger technologies, and indicate their type, consensus protocol, and support for Smart Contracts.

Table 2.4: Blockchain properties

Ledger               | Type           | Consensus      | Smart contracts
BitCoin              | Permissionless | PoW            | No
Ethereum             | Permissionless | PoW            | Yes
HyperLedger Fabric   | Permissioned   | PBFT           | Future
HyperLedger Sawtooth | Permissionless | PoET           | No
Hashgraph            | Permissioned   | Virtual voting | No
Ripple               | Permissioned   | Round voting   | Future
Stellar              | Permissioned   | Federated BA   | No

The Bitcoin cryptocurrency [70] has motivated users to share their computational power to assemble transactions into blocks, for which they must solve a crypto-puzzle in order to add them to the Blockchain. The difficulty is adjusted based on the time spent mining previous blocks, such that a block is mined approximately every 10 minutes. This represents a bottleneck for the transaction throughput of the network. Moreover, greedy miners invested in Application Specific Integrated Circuits (ASICs) in order to gain more hashing power, but this had the effect of increasing the block difficulty and therefore the energy footprint of the system.

In Nakamoto consensus, nodes take part in a randomized selection, each node being selected after a random waiting time. After some mining, a node will be able to show its peers a proof that it has waited for some amount of time in order to extend the Blockchain. The node broadcasts the proof as fast as possible, aiming to be part of the longest chain.

The HyperLedger Sawtooth platform presents a novel Proof of Elapsed Time (PoET) consensus. Every node executes the waiting step in an enclave, which provides an attestation that the node is entitled to extend the Blockchain. This presumably reduces the energy footprint of mining a block, but economic investment can still influence the protocol, as the chance of becoming a leader is proportional to the number of units one controls.

Ethereum [16] and newer Blockchain technologies have chosen ASIC-resistant puzzle algorithms in order to combat this problem. Additionally, Ethereum provides a quasi-Turing-complete programming paradigm to create Smart Contracts which execute state transitions, in addition to currency-transfer transactions. This in turn forces all verifiers to execute all smart contracts sequentially, forever, in order to check that the system transitions from one state to another. Although this does not bottleneck simple smart contracts, more complex smart contracts are hard to achieve due to the gas price (in Ether currency), which has to be paid on a per-instruction basis.

HyperLedger Fabric (https://github.com/hyperledger/fabric) consensus is based on an ordering service which uses the Practical Byzantine Fault Tolerance (PBFT) [22] protocol to achieve resilience against subverted nodes. The new architecture, Fabric V1 [3], separates the smart-contract transactions from the ordering transactions, offering more scalability, support for smart contracts that use randomness, distributed smart contracts, and modular consensus implementations [95].

The Swirlds Hashgraph algorithm, presented in a whitepaper [8], is a patented [7] technology implemented in the proprietary Hedera Hashgraph [58] platform. Developers can create swirlds with which applications can interact by means of an SDK (https://www.swirlds.com/download/). The protocol has also been implemented as open source, in Babble (https://github.com/babbleio/babble). It outlines the notion of fairness of transactions, arguing that the ordering of transactions is based on the moment a transaction reaches 2/3 of the network. Therefore, if two transactions are issued at roughly the same time, fairness reduces to the activity of the nearby community. When it comes to timestamping a transaction, the algorithm chooses the median of the timestamps at which the members heard about the transaction. The authors also consider extensions in the form of Proof of Stake consensus, where timestamps are selected as a median weighted by the stake of the validators.

Ripple (https://ripple.com) and Stellar (https://www.stellar.org) are two currency-related Blockchains which do not involve mining, but function similarly to a permissioned Blockchain.

The Ripple [93] Blockchain is advanced by validating nodes which vote in rounds on the content of an entry. The entry is accepted by a node if it receives a proportion of matching signed updates above a threshold p, which starts above 50% and increases by 10% each round up to 80%. The documentation states that decisions are made if 4/5 of the n validators are correct; traditional distributed systems would tolerate Byzantine behavior from 1/5 of the nodes. There is a need for a minimum overlap between the convincing sets, which has been shown in [6] to be 2/5 in order to tolerate 1/5 corruption.

The Stellar Protocol [62] uses a federated Byzantine agreement within its protocol. Each validator declares its convincing-set, which should overlap with the sets of the other validators in order to prevent forks. A transaction is accepted if a given proportion of its convincing-set confirms it. The examples in [62] show a multiple-level hierarchy, with different thresholds for accepting a transaction. For example:

67% = (3 out of 4) Groups, Groups = {Comp, Acc, Cons, Fr}
51% = (2 out of 3) Companies, Comp = {Comp1, Comp2, Comp3}
58% = (5 out of 7) Accountants, Acc = {A1, A2, C1, C2, E1, E2, G1}
51% = (2 out of 3) Consultants, Cons = {C1, C2, A1}
1% = (1 out of 26) Friends, Fr = {Anna, Bruno, Charles, ..., Zack}

Similar structures are known in the literature as Byzantine Quorum Systems (BQS). In order for the Blockchain not to fork, the convincing-sets should intersect at the top of the hierarchy, which introduces some amount of centralization, similar to BQS [61].
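The hierarchical thresholds above can be evaluated recursively, as in the Python sketch below; the structure follows the example (with the Friends list truncated as in the original), under the simplified rule that a group is convinced when the stated number of its members is convinced.

    def convinced(group, confirmations):
        # group = (needed, members); a member is a name or a nested group.
        needed, members = group
        count = sum(convinced(m, confirmations) if isinstance(m, tuple)
                    else m in confirmations
                    for m in members)
        return count >= needed

    companies   = (2, ["Comp1", "Comp2", "Comp3"])
    accountants = (5, ["A1", "A2", "C1", "C2", "E1", "E2", "G1"])
    consultants = (2, ["C1", "C2", "A1"])
    friends     = (1, ["Anna", "Bruno", "Zack"])  # truncated as in the example
    top         = (3, [companies, accountants, consultants, friends])

    print(convinced(top, {"Comp1", "Comp2", "A1", "A2", "C1", "C2", "E1", "Anna"}))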



2.4 Blockchain-based Clouds

Several companies are tackling the offering of Cloud services through the means of Blockchains. Ethereum itself is touted as the world computer, though its capabilities for storing data and executing code are drastically limited by the price of smart contracts. Thus, these technologies rely on the Ethereum Blockchain to create tokens for their platform, and raise investment funds through Initial Coin Offerings (ICOs).

Table 2.5: Blockchain based Technologies for Cloud Services

Name     | Blockchain | Market | Storage     | Services
Golem    | Ethereum   | No     | No          | SaaS
SONM     | Ethereum   | Yes    | Yes         | IaaS
Enigma   | External   | Yes    | Encrypted   | PaaS
iExec    | Ethereum   | Yes    | Yes         | PaaS
Decenter | Ethereum   | Yes    | Not defined | IaaS

Golem (https://golem.network/) uses IPFS [10] as the means to distribute file blocks in a network of worker nodes which process data at block level; the user collects and merges the results computed in parallel. For now, it supports only object rendering, though it plans to support distributed machine learning.

SONM uses Docker for executing Container Images and achieves a higher level of abstraction, getting close to a generic Cloud platform. Orders are managed through a side chain which is an extension of Ethereum; Suppliers must interact with this chain in order to set up workers acting on their behalf. The workers expose resources such as CPU, RAM, storage and bandwidth in the form of benchmark identifiers, e.g., 20 GFLOPS. A user can rent resources for a limited amount of time, or on a pay-as-you-go model. There is limited documentation with regard to how the system matches user requests with resources.

2.4.1 Enigma

“Enigma is a peer-to-peer network, which enables parties to store and run computations on data while keeping the data completely private. The computational model is based on secure multi-party computation, using a sharing scheme” [100]. For storage, “Enigma holds secret-shared pieces of data in a distributed hash table. An external Blockchain is used as the controller of the platform, to manage identities, and events log” [86]. Enigma has an architecture composed of three layers: a Public Ledger, an off-chain Distributed Hash Table (DHT), and a Multi-Party Computation (MPC) Layer.

The Public Ledger information is available on all nodes, and any update is propagated to all of them; it maintains a history of the actions. Private data is encrypted locally before being sent for storage in the DHT, and the user can additionally set access rights. The MPC layer allows for secure multi-party computation: when an operation needs to be performed on the data, the data is shared with a number of peers using a Linear Secret Sharing Scheme. The data can be referenced through its shares, which can be used to instrument addition and multiplication. Enigma code is developed in C++, with public procedures executed on the Blockchain and private code executed through the platform.
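To illustrate why shares support addition, here is a minimal Python sketch of additive secret sharing over a prime field; Enigma's actual scheme is more general (it must also support multiplication), so this shows only the simplest linear case.

    import random

    P = 2**61 - 1  # a prime modulus (illustrative choice)

    def share(secret, n):
        # Split the secret into n additive shares; any n-1 of them reveal nothing.
        parts = [random.randrange(P) for _ in range(n - 1)]
        parts.append((secret - sum(parts)) % P)
        return parts

    def reconstruct(shares):
        return sum(shares) % P

    a, b = share(10, 3), share(32, 3)
    sum_shares = [(x + y) % P for x, y in zip(a, b)]  # share-wise addition
    assert reconstruct(sum_shares) == 42  # equals the sum of the secrets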




2.4.2 iExec

iExec relies on research on volunteer computing [37, 66]. It uses the Ethereum Blockchain to manage the tokens used on the platform, and implements the platform logic on a side chain. The platform defines several entities, which are shown in Table 2.6.

Table 2.6: iExec entities

iExec Hub and Marketplace | an auditable smart contract used to manage the stakes and keep track of the history of the actors.
Dataset providers         | individuals who sell access to their data on the platform.
Application providers     | individuals who deploy applications on the platform; applications can be free or ask for a price.
Workers                   | individuals or companies which expose their resources on the marketplace.
Worker pools              | smart contracts to which workers can subscribe; the smart contract takes care of workers' contributions and receives fees for managing the underlying infrastructure. This contract stores scheduler settings, required in order to handle the stake and payments.

Their current work focuses on a Proof of Contribution (PoCo) protocol for reaching consensus on the result of some work; an update has been given in a blog post [50]. The PoCo is intended to link two entities: the iExec marketplace (where deals are made) and the computing infrastructure (based on the XtremWeb-HEP middleware [37]).

Anyone can implement a worker pool smart contract (scheduler), which workers can join; the worker pools are intended to compete with one another. This smart contract needs to implement a Sarmenta-style [81] sabotage-tolerant protocol in order to settle on a result, and a worker pool is assumed to be homogeneous. The scheduler selects randomly from the available workers and decides on a result based on the number of votes for a specific result and on the reputation of the solvers. A worker only solves a task once; if it provides a faulty answer, it loses the stake it locked when it asked for work. The workers providing correct answers split the stake accumulated from the fees and the faulty workers' stakes. The platform currently supports only deterministic applications, with plans to support non-deterministic ones as well.
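The settlement step can be pictured with the simplified Python sketch below: a reputation-weighted vote picks the result, and the stakes locked by disagreeing workers are split among the agreeing ones. The actual PoCo rules are more involved; all names and values here are illustrative.

    def settle(contributions, reputation, stake):
        # contributions: {worker: result}; the reputation-weighted majority wins.
        weight = {}
        for worker, result in contributions.items():
            weight[result] = weight.get(result, 0.0) + reputation[worker]
        winner = max(weight, key=weight.get)

        winners = [w for w, r in contributions.items() if r == winner]
        losers = [w for w, r in contributions.items() if r != winner]
        pot = stake * len(losers)  # faulty workers lose their locked stake
        return winner, {w: pot / len(winners) for w in winners}

    result, rewards = settle(
        {"w1": "0xabc", "w2": "0xdef", "w3": "0xabc"},
        reputation={"w1": 5.0, "w2": 1.0, "w3": 4.0},
        stake=10,
    )  # result = "0xabc"; w1 and w3 split w2's stake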

2.4.3 DECENTER

DECENTER is a Horizon 2020 financed project aiming to provide a federated brokering platform for fog resources [82]. Its proposed architecture is centered around the Resource Exchange Broker (REB) Smart Contract. Resource providers deploy a REB Contract which manages the selection of resources and signals Orchestration Components to deploy applications. A resource provider is required to have a full Cloud management software stack installed: infrastructure management and provisioning, together with service orchestration components.

Recent publications present an architecture that allows for the definition of Service Level Agreements (SLAs) using Smart Contracts [53]. Quality of Service (QoS) parameters, such as network throughput or the latency between different tiers of the same Application, are then used by a decision-making layer composed of the monitoring and orchestration components.

2.5 Summary

Cloud Service Providers offer a vast amount of facilities, ranging from Virtual Machines to GPUs, FPGAs and Virtual Clusters, on a pay-as-you-go model. The Fog layer extends the functionality of the Cloud by providing the same category of services, namely compute, storage, and network, provisioned by computers between the Cloud and the Edge devices. This thesis investigates the deployment of Cloud Services by matching the user's desire for performance with the Cloud's desire for efficient resource utilization. It does this by introducing Abstract Service Definitions, which have multiple possible implementations. Through the process of Service Optimization, an appropriate implementation is selected based on user constraints and the current efficiency measure of the Cloud. The concept is later extended to the management of privately owned resources which constitute the Fog layer.

Blockchains have paved the way for peer to peer systems to grow, providing novel mechanisms for achieving consensus on a global scale. With the introduction of Smart Contracts, Blockchains allow arbitrary computation to take place on the replicated data. However, storing data on the Blockchain is costly and not recommended. Thus, solutions have been proposed to use the Blockchain as a payment system and to provide another layer for storage and complex computation. Current efforts decouple the Blockchain infrastructure from the scheduling logic, using a trusted oracle (for randomized matchmaking) and smart contracts that play the role of schedulers. The incentive is that workers will tend to join worker pools where the payoff is better, which may cause centralization under a few large worker pools. In this scenario, the side-chain and scheduling logic should be able to scale with the number of placement alternatives. This thesis investigates the cost and latency associated with such a system.

In contrast to the credibility-based sabotage tolerance used by the iExec platform, our proposed solution relies on monitoring components that assess the state of the management and orchestration components, which in turn assess the compliance of the system. In contrast to the DECENTER project, where the resource provider must also provide the orchestration stack, we consider that the resource provider cannot be trusted and, therefore, randomly assigned Orchestrators must assess the availability of the resources and services running on them.


Chapter 3

Gateway Service

This chapter presents the CloudLightning Gateway Service, a collection of components that allow for the definition, composition, optimization and deployment of HPC Services using the Cloud Computing paradigm. The key contributions are:

• the modelling of infrastructure (VMs, Containers, Bare Metal servers, hardware Accelerators), services, and the relationships between them using the TOSCA specification

• the modelling of abstract services that can be instantiated by different explicit implementations through the process of Service Optimization

• the implementation of a User Interface by extending the Alien4Cloud platform with plugins that allow for the Service Optimization process and deployment on CloudLightning infrastructure

• the design of the specification and protocol, as well as the implementation, for the Service Optimization process, allowing for communication between the Gateway and the resource manager

• the implementation of the CloudLightning Orchestrator, which is able to deploy Applications composed of Services using heterogeneous resources (e.g., a VM Service linked to a Container Service which makes use of a hardware accelerator)

Section 3.1 briefly presents the CloudLightning System and the responsibility of the Gateway Service and Section 3.2 presents the architecture. Section 3.3 defines the CloudLightning Entities using the TOSCA specification and offers instructions for the development of new Services which follow the CloudLightning specification. Section 3.4 shows the User Interface and several examples for Application Composition and Service Optimization and Deployment visualization. A preliminary version of this work has been published previously [32]. This is revised and enriched to present the final version of these components. Section 3.5 presents the Service Optimization Engine and the interaction with the resource manager, and Section 3.6 presents the Orchestrator and the interaction with the computing infrastructure. Finally, Section 3.7 evaluates the deployment of Services using the CloudLightning Gateway Service compared to alternatives such as on-site deployments or other Cloud frameworks.

3.1 The CloudLightning System

The CloudLightning Project [69] presents a novel framework with the purpose of facilitating the resource selection and deployment for the end user. Its architecture is presented in Figure 3.1.

The Gateway Service is the entry point of the system, where Application Developers publish Service definitions and requirements. The Cloud User is able to select and link several services in order to create an Application. Depending on the parameters chosen by the user (cost, performance) and the state of the system (load, energy efficiency), a scheduling system will recommend the placement of the services on the infrastructure. The Gateway then proceeds with deploying the Application services and informs the user about operational metrics: service status, service endpoints, and credentials.


Figure 3.1: CloudLightning Components Architecture


A generic Self-Organizing Self-Managing (SOSM) framework for Cloud infrastructure management has been proposed in [41]. The CloudLightning SOSM System is a hierarchical system with four layers. The entry point of the system is the Cell Manager, which routes any incoming service requests to one of the pRouters underneath it. There is one pRouter for each type of resource. Under each pRouter there is a number of pSwitches, each with a number of vRackManagers. The number of pSwitches and vRackManagers varies depending on the management cost and the fragmentation of the underlying infrastructure. The system routes the requests through its layers based on availability information passed from the bottom up. It has been proven to achieve better scheduling success, resource utilization, and energy efficiency compared to other hierarchical schedulers [40].
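To make the routing structure concrete, the following is a minimal, hypothetical YAML sketch of such a four-layer hierarchy; the component names mirror the description above, while the structure and attribute names are invented for illustration and are not taken from the CloudLightning code base:

  # Requests enter at the Cell Manager and are routed downwards, while
  # availability information flows from the bottom up.
  cell_manager:
    p_routers:
      - resource_type: cpu_only            # one pRouter per resource type
        p_switches:
          - v_rack_managers: [vrm-1, vrm-2]
      - resource_type: gpu_accelerated
        p_switches:
          - v_rack_managers: [vrm-3]
          - v_rack_managers: [vrm-4, vrm-5]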

The CloudLightning framework proposed a Plug and Play mechanism for registering resources as infrastructure. These resources are requested to send monitoring data to a time-series database (InfluxDB), from where it is consumed by the vRackManagers, which aggregate the availability information needed by the scheduling system.

After the successful provisioning of resources for the requested services, the deployment is managed through a custom implementation of Apache Brooklyn in the case of Bare Metal servers or Virtual Machines, and through Marathon in the case of Containers.

3.2 Architecture

Several components are required to manage the life-cycle of an Application. A User Interface allows for the management of service definitions and application deployments. A Service Portfolio provides the means for storing Service Definitions (e.g., requirements, dependencies) and Application Topology definitions, consisting of one or multiple Services and the relationships between them. An Application Developer (AD) uses this component to store such definitions, which can later be used by an Application Operator (AO) to create a new Application Topology or deploy a version designed by the developer.



Figure 3.2: Gateway Service components and interactions

We introduce the concept of Application Abstraction to allow a user to define an Application Topology consisting of Abstract Services. These kinds of Service Definitions are abstractions of the explicit Service Implementations, defining only dependencies on other services, but no requirements on the hardware type or accelerators. The Service Optimization Engine (SOE) inspects the Service Portfolio in search of the explicit implementations and provides the SOSM System with a Blueprint of all combinations of implementations for the Application. After the most suitable resources for a Blueprint are chosen, the user is presented with the explicit Application Topology. This topology is deployed by an Orchestrator, which manages the life-cycle of the Application.
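As an illustration, the following hypothetical TOSCA fragment sketches an Application Topology built from Abstract Services; the node and type names are invented for this example, and the SOE would replace each abstract node with an explicit implementation selected by the SOSM System:

  topology_template:
    node_templates:
      database:
        type: cl.nodes.AbstractDatabase    # abstract: no hardware or accelerator requirements
      ray_tracer:
        type: cl.nodes.AbstractRayTracer
        requirements:
          - connection:                    # only the dependency is declared
              node: database
              relationship: tosca.relationships.ConnectsTo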

Part of the Gateway Service consists of extensions to the Alien4Cloud [36] platform. The Service Portfolio uses the platform's storage service to store TOSCA definitions. The UI has been augmented to allow for the Service Optimization process. More specifically, a Javascript plugin, a4c-SOE, has been developed to communicate with the SOE. The SOE has been developed from scratch and consists of an Asynchronous REST Server and logic to query the Service Portfolio, create Blueprints, and modify the TOSCA Topology definition upon the selection of an explicit implementation. The technical implementation of the cooperation with the SOSM System and example interactions are shown in Section 3.5.

Additionally, an Orchestrator plugin has been developed to handle the deployment of Services. A novel feature offered by this plugin is the ability to deploy an Application consisting of services managed by different Resource Managers, achieving interoperability between the different infrastructure types. It communicates with the CloudLightning Brooklyn Orchestrator for deploying services under the form of VMs or Bare Metal servers, and with the Marathon Orchestrator for deploying services as Containers. CloudLightning Brooklyn is an extension of the Apache Brooklyn [4] Orchestrator that allows for the interoperability between the CloudLightning TOSCA definitions and Brooklyn's logic for service deployment. Marathon [65] is an Orchestrator for the Apache MESOS [5] Resource Manager that offers a REST API for the deployment of containers and is installed in situ with the resources.
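A hypothetical topology of this kind is sketched below (all template names are illustrative): the Orchestrator plugin would forward the VM-hosted service to the CloudLightning Brooklyn Orchestrator and the container to Marathon, then wire the resulting endpoints together:

  topology_template:
    node_templates:
      vm_host:
        type: cl.nodes.CLResource              # provisioned as a VM through Brooklyn
      frontend:
        type: cl.nodes.CLService.CLSoftware    # deployed through CloudLightning Brooklyn
        requirements:
          - host: vm_host
          - dependency:
              node: backend
              relationship: tosca.relationships.ConnectsTo
      backend:
        type: cl.nodes.CLContainer             # deployed through Marathon; no host node needed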



3.3 Entities definitions

TOSCA (Topology and Orchestration Specification for Cloud Applications) [71] is a specification designed by OASIS that allows for modelling the full software stack of an application, in order to automate its deployment and management in a Cloud. Since Cloud platforms vary widely, the purpose of TOSCA is to allow for the design of portable applications which have reproducible behaviour when migrating from one Cloud platform to another. The specification details a meta-model for defining both the structure and the management of applications. The Services composing an Application, and the relationships between them and the infrastructure, are defined using a Topology entity. Services and Infrastructure are represented as nodes, and relationships must have a source and a target node; additionally, relationships can have parameters that may be of use to the Orchestration Engine.

This description is portable and can be automatically deployed on different execution environments (e.g., Clouds, local VMs). For example, a component can specify host requirements (CPU, RAM) together with specific libraries that should be installed. In the case of Clouds, specific Virtual Machines can be provisioned with the required software as part of the image. In the case of HPC, this can be used to schedule tasks on nodes already containing the required software, or to collect the libraries on the nodes. In both cases, the workflow user does not need to handle the installation steps of the workflow in question.

TOSCA Nodes model entities like a physical or virtual machine, a software component, a data volume, or a hardware accelerator. Relationships between nodes are modeled with the use of node Capabilities, which express the competences of a given node, and node Requirements, which express the competences required from another node. For example, a node representing a Virtual Machine presents a host capability which a Software Component can use to fulfill a host requirement, through the use of a HostedOn relationship. Additionally, the Software Component can present an Endpoint capability, which other Software Components can use via a ConnectsTo relationship to inform the Orchestrator about their dependency. A workflow is created based on the relationships within an Application. In the aforementioned example, first the hosts for the Software Components are provisioned, then the first Software Component is started, and only after the Endpoint is available is the second Software Component started, using the information exposed by the Endpoint (e.g., IP, port).
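The workflow just described can be sketched as the following TOSCA topology fragment; the template names are illustrative and the property values are assumptions:

  topology_template:
    node_templates:
      vm:
        type: tosca.nodes.Compute            # provides the host capability
        capabilities:
          host:
            properties: { num_cpus: 2, mem_size: 4 GB }
      first_component:
        type: tosca.nodes.SoftwareComponent
        requirements:
          - host: vm                         # HostedOn: the VM is provisioned first
      second_component:
        type: tosca.nodes.SoftwareComponent
        requirements:
          - host: vm
          - dependency:                      # ConnectsTo: started only once the first
              node: first_component          # component's Endpoint (IP, port) is available
              relationship: tosca.relationships.ConnectsTo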

3.3.1 CloudLightning Base Nodes

The CloudLightning Base Nodes model the CloudLightning Resource types, hardware accelerators, and abstract Services that can be extended to make use of the CloudLightning features.

Listing 1 presents the CLResource infrastructure type. It is derived from the TOSCA standard Compute definition and is augmented with the following:

1. host – this capability is overridden to distinguish it as a type of resource that a CloudLightning Service can be hosted on. All extensions of a CLResource will inherit this capability.

2. accelerator – this capability is a new addition and expresses the ability to have an accelerator attached. Services which require accelerators will be hosted on a node which needs to know how to provide access to the accelerator; this is provided by the AcceleratorAttachment relationship associated with this capability.

Listing 1: CloudLightning base infrastructure type

cl.nodes.CLResource:
  "derived_from": "tosca.nodes.Compute"
  "description": "Abstract CloudLightning Resource type"
  "capabilities":
    - "host": "cl.capabilities.CLResource"
    - "accelerator": "cl.capabilities.AcceleratorAttachment"

The CLService (Listing 2) represents the standard CloudLightning Service and exposes a CLService capability, which is the base capability allowing for the service optimization process. For example, an abstract definition of service X will define a capability cl.capabilities.ServiceX, derived from cl.capabilities.CLService, and all Services derived from Service X will inherit this capability; a sketch of this pattern is given after Listing 2.

Listing 2: CloudLightning base Service definition

cl.nodes.CLService:
  "derived_from": "tosca.nodes.SoftwareComponent"
  "description": "Abstract CL Service"
  "abstract": true
  "capabilities":
    - "service": "cl.capabilities.CLService"
  "requirements":
    - "host":
        "capability": "cl.capabilities.CLResource"
        "relationship": "tosca.relationships.HostedOn"

A Container instance may be managed through a resource manager, in which case information about the host may not be known. For this reason, the CLService is specialized along two different paths, namely CLSoftware and CLContainer. The CLSoftware (Listing 3) requires a host node to be provisioned, while the CLContainer (Listing 4) does not, since it can be managed through an Orchestrator like Marathon. Additionally, the CLSoftware definition provides different scripts for initializing, starting, and stopping the corresponding software. In the case of Containers, these are provided as properties and managed through Marathon.

Listing 3: CloudLightning base Software definition

cl.nodes.CLService.CLSoftware:
  "derived_from": "cl.nodes.CLService"
  "requirements":
    - "host":
        "capability": "cl.capabilities.CLResource"
        "relationship": "tosca.relationships.HostedOn"
  "interfaces":
    "Standard":
      "create": "scripts/create.sh"
      "start":
        "description": "Script for starting the service"
        "implementation": "scripts/start.sh"
      "stop": "scripts/stop.sh"

In some cases, a CLContainer may require a specific host. This happens when the SOSM System selects a specific host or when an accelerator is required. An Accelerator must be mounted on a host, and the CLContainer must also run on the same host. This is achieved by extending the CLResource into a DockerHost (a sketch of this extension is given after Listing 4). A CLContainer allows for two occurrence values on the host requirement: 0 in case no host-specific information is needed, and 1 in the aforementioned cases.

Listing 4: CloudLightning base Container definition

cl.nodes.CLContainer:
  "derived_from": "cl.nodes.CLService"
  "description": "Abstract CL Docker Container"
  "abstract": true
  "properties":
    "cpu_share":
      "type": float
      "default": 1.0
    "mem_share":
      "type": scalar-unit.size
      "default": "128 MB"
      "constraints":
        - "greater_or_equal": "0 MB"
    "docker_run_args":
      "type": list
      "required": false
      "entry_schema":
        "type": string
    "docker_run_cmd":
      "type": string
      "required": false
    "docker_env_vars":
      "type": map
      "required": false
      "entry_schema":
        "type": string
  "attributes":
    "endpoint":
      "type": string
  "capabilities":
    "attach": "alien.capabilities.DockerVolumeAttachment"
    "scalable": "tosca.capabilities.Scalable"
  "requirements":
    - "host":
        "capability": "tosca.capabilities.DockerHost"
        "relationship": "tosca.relationships.HostedOn"
        "occurrences": [0, 1]

Every extension of a CLResource will allow for the attachment of accelerators. The abstract Accelerator node requires an AcceleratorAttachment (offered by a CLResource) and offers an AcceleratedBy capability which can satisfy a Service's requirement for an Accelerator. A new accelerator can extend this specification, providing an explicit AcceleratedBy capability, for example AcceleratedByGPU.


Listing 5: CloudLightning base Accelerator definition

cl.nodes.Accelerator:
  "abstract": true
  "capabilities":
    "accelerator": "cl.capabilities.AcceleratedBy"
  "requirements":
    - "attachment":
        "capability": "cl.capabilities.AcceleratorAttachment"
        "relationship": "cl.relationships.MountAccelerator"
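For instance, a GPU node could be defined as follows; this is a hedged sketch, as the explicit accelerator definitions are not reproduced in this section:

  cl.nodes.Accelerator.GPU:
    "derived_from": "cl.nodes.Accelerator"
    "capabilities":
      "accelerator": "cl.capabilities.AcceleratedByGPU"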

Figure 3.3 presents example relationships for the nodes previously described. In general, a CLService and an Accelerator are hosted on a CLResource. A GPUSoftware and a GPU are hosted on a CLResource, and the GPU satisfies the GPUSoftware's requirement. A MICContainer and a Many Integrated Cores (MIC) unit are hosted on a DockerHost, and the MIC satisfies the MICContainer's requirement for a MIC. The AcceleratedBy relationship is depicted as a thin green line.

Figure 3.3: Accelerated Services Examples

The CloudLightning Base Nodes also include abstract definitions of Accelerated Services for the three accelerator types: MICSoftware, GPUSoftware, and DFESoftware for VM- and Bare Metal-hosted software, and MICContainer, GPUContainer, and DFEContainer for Container-packaged services. These definitions are provided as base definitions for the Application Developer to extend with specific software properties and prerequisites. Listing 6 presents such a definition for the DFEContainer, the alternatives being similar.

Listing 6: Example of Container requiring a DFE

cl.nodes.CLService.DFEContainer:
  "derived_from": "cl.nodes.CLContainer"
  "description": "CL Container accelerated by DFE"
  "requirements":
    - "accelerator":
        "capability": "cl.capabilities.AcceleratedByDFE"
        "relationship": "cl.relationships.AcceleratedByDFE"

3.3.2 Capabilities and Relationships

Capabilities and Relationships model the interaction between the different types of Nodes and offer polymorphism for requirements. For example, a MIC is agnostic of the type of resource it is mounted on, as long as that resource presents an AcceleratorAttachment capability. The same is valid for
