
QoS-aware Deployment

Through the Fog

Candidate: Stefano Forti

Supervisor: Prof. Antonio Brogi

A thesis submitted for the degree of

MSc in Computer Science and Networking

University of Pisa and SSSUP Sant’Anna

Academic Year 2015/2016


Abstract

Fog computing aims at extending the Cloud by bringing computational power, storage and communication capabilities to the edge of the network, in support of the Internet of Things (IoT). Segmentation, distribution and adaptive deployment of services over the continuum from Things to Cloud are challenging tasks, due to the intrinsic heterogeneity, hierarchical structure and worldwide scale of the infrastructure they will have to leverage.

This thesis proposes a simple, yet general, model to support QoS-aware deployment of multi-component applications over Fog infrastructures. The model describes operational systemic qualities of the available infrastructure (latency, bandwidth), interactions among the involved software components, as well as business policies. Algorithms to determine eligible deployment plans for an application over a Fog infrastructure are presented. A motivating example is used to illustrate the applicability of both the model and the algorithms.


Contents

1 Introduction
   1.1 Context
   1.2 Problem Considered
   1.3 Objective of This Thesis
   1.4 Thesis Outline

2 Fog Computing
   2.1 Why the Fog?
   2.2 What is the Fog?
   2.3 Designing the Fog
      2.3.1 Expected Features
      2.3.2 Architecture Proposals
   2.4 Research Challenges

3 Motivating Example
   3.1 The SmartFields Application
      3.1.1 Components Interactions
      3.1.2 QoS Requirements
   3.2 Old MacDonald’s Farm

4 Modelling the Fog
   4.1 QoS Profiles
   4.2 Fog Infrastructures
      4.2.1 Motivating Example: Fog Infrastructures
   4.3 Applications
      4.3.1 Motivating Example: Applications
   4.4 Deployments
      4.4.1 Motivating Example: Deployments

5 Finding Eligible Deployments
   5.1 Problem Complexity
   5.2 Algorithms
      5.2.1 Preprocessing
      5.2.2 Backtracking Search
      5.2.3 Heuristics
      5.2.4 Motivating Example: Backtracking
      5.2.5 Exhaustive Search
   5.3 FogTorch Prototype
      5.3.1 Implementation
      5.3.2 Motivating Example: FogTorch

6 Conclusions and Discussion
   6.1 Summary
   6.2 Related work
   6.3 Assessment of Contributions
      6.3.1 Fog connectivity
      6.3.2 System requirements and infrastructures
      6.3.3 IoT requests and business policies
      6.3.4 Supporting life-cycle of Fog applications
   6.4 Limitations, extensions and future work


List of Figures

1.1 A pictorial representation of a Fog system architecture. On the right-hand side, the type of infrastructure used and an estimate of the number of nodes for each layer are reported, as in [16].

2.1 A data latency hierarchy, showing many uses of the same data, as in [16].

2.2 A sketch of the Fog architecture proposal, as in [16].

3.1 The components of the SmartFields application. Arrows represent interactions among components through proper interfaces.

3.2 The Fog infrastructure available at Farm X.

4.1 The Fog infrastructure available at Old MacDonald’s Farm. Links report the estimated QoS values for latency and bandwidth. When the bandwidth is asymmetric (i.e., b↓ ≠ b↑), the related values are reported as b↓/b↑ and the arrow of the link indicates the upload direction. The software offerings are on the right-hand side (black points) and the hardware offerings on the left-hand side (white points) of each node.

4.2 The SmartFields application. Links report the required QoS values with the same conventions as in Figure 4.1. For each component, software requirements are on the right-hand side (white diamonds) and hardware requirements on the left-hand side (oblique lines). Things requirements are at the bottom of gateway modules.

4.3 The infrastructure of Figure 4.1, when deploying SmartFields. Exploited Things are coloured as the software component that uses them. The RAM capability of each node is shown graphically beside Fog nodes: each rectangle is 1 GB and is coloured as the component consuming it. Link bandwidths have been updated, and the interactions mapped on each link are listed under the QoS profiles.

5.1 The search space for CDP.

5.2 The search for an eligible deployment. In red, the branches pruned at search time; in grey, the nodes that are not expanded; in black, the path towards the solution.

5.3 The set of all eligible deployments of the Old MacDonald’s application.

5.4 Alternative deployments after the link breakdown between local_1 and consortium_1.


Chapter 1

Introduction

1.1

Context

Cloud computing is nowadays a reality widely exploited and experienced both by IT experts and users. Current research in this field is focusing on easing the process of designing, deploying and maintaining distributed scalable software systems running over the Cloud, by taking into account their Quality of Service (QoS) [11], other non-functional requirements such as power consumption [37], and both functional and non-functional dependencies among different components [18, 25].

At the same time, the Internet of Things (IoT) is becoming more and more widespread through the pervasive introduction of connected sensors and actuators in hundreds of different scenarios, e.g., healthcare, businesses, buildings, cars and wearable devices, among others [27]. As a consequence, enormous amounts of data – the so-called Big Data [35] – are collected by those sensors to be stored in Cloud data centres [30]. There, they are subsequently analysed to determine reactions to events or to extract analytics and statistics. Due to the volume, variety and velocity of the data being generated, this approach does not fit all applications, especially those that must meet compelling time or security constraints [16, 20], e.g., control systems for industrial plants or public buildings, precision agriculture applications, lifesaving connected devices within a hospital, or smart traffic light infrastructures.


To address this problem, recent research efforts are investigating how to better exploit network capabilities at the edge of the Internet to support the IoT and its needs. In between the Cloud and the IoT, new computational nodes are finding their natural elbow room, acting both as processing capabilities closer to the ground and as filters over the data streams directed towards the Cloud [21]. Such geographically distributed nodes can make it possible to respect real-time deadlines and to implement security policies that the Cloud alone could not ensure. The new intermediate layer connecting the ground to the sky goes under the name of Fog computing [17, 3]. Figure 1.1 sketches the relations between IoT devices, Fog nodes and Cloud datacentres.

Figure 1.1: A pictorial representation of a Fog system architecture. On the right-hand side, the type of infrastructure used and an estimate of the number of nodes for each layer are reported, as in [16].

1.2

Problem Considered

One of the problems raised by the aforementioned scenario is how to master the complexity of deploying and managing applications over the Fog, mainly due to the scale and heterogeneity of the Fog infrastructure. Whilst various solutions to deploy, manage and monitor composite applications have been studied for Cloud-only environments [15, 19, 25, 8], Fog computing calls for new techniques to effectively guarantee tolerable latencies between the processing of cyber-physical events and actuation.

Fog computing needs geographical location and QoS parameters to be considered during the continuous and adaptive deployment process over a dynamic and heterogeneous infrastructure made out of smart Things, Fog nodes and Cloud data centres. New technologies, methods and tools are to be devised so as to properly install and manage across the Fog the distributed components of IoT applications, by enabling real-time interactions, leveraging different connection technologies, and forecasting interoperability and federation among multiple providers.

Currently, to the best of our knowledge, there are no tools that specifically support the deployment and management of Fog applications [10]. The choices in the infrastructure design and selection, the identification of criticalities during the deployment and the recognition of possible adjustments at monitoring time are all left in the hands of IT experts who make decisions on the basis of their experience in Cloud computing and without any formal guidance for the novel Fog extension. Questions like:

• “How many and how powerful Fog nodes should I (buy and) install to adequately deploy my application?”

• “Should I deploy this component onto the Cloud, onto the new Fog-as-a-Service (FaaS) opened in my city or on my premises gateway?”

• “Is there any component I’d better deploy on a different node after this link failure?”

may be hard to answer promptly even for simple applications, and harder still for the complex composite software systems we deal with nowadays.

While some functions are naturally suited to the Cloud (e.g., service backends) and others to Fog computing (e.g., industrial control loops), this is not always the case. Future tools should support adaptive deployment and segmentation of tasks over the available infrastructure, dynamically taking into account both the application's specificity and the current state of the network as concerns hardware and software capabilities, link bandwidths and latencies, fault events and cost targets [39]. Lifting the programmer from having to partition the functions of her application between the Edge and the Cloud is crucial to achieve scalability and extensibility, and hence for the success of the new Fog [47].

1.3

Objective of This Thesis

The objective of this thesis is to propose a simple, yet general, model to support QoS-aware deployment of applications over Fog infrastructures.

The model aims at describing, at a suitably abstract level, the characteristics of interest and some operational systemic qualities of the existing (or planned) Fog facilities and of the composite application to be deployed. As such, the model can be exploited to determine the eligible deployments (if any) of an application over a given, intrinsically hierarchical, Fog infrastructure.

While abstracting from lower-level details, such as the business logic of the application or the adopted communication technologies, the model stays general enough to describe different types of systems and applications, in spite of some limitations that will be discussed later on.
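To give a flavour of the level of abstraction the model targets, the following sketch renders QoS profiles, Fog nodes and application components as simple data types, with an eligibility check for a single placement. All names and values are illustrative assumptions, not the thesis's actual formalisation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QoSProfile:
    latency_ms: float      # round-trip latency (offered by a link, or required)
    bandwidth_mbps: float  # bandwidth (offered by a link, or required)

    def satisfies(self, required: "QoSProfile") -> bool:
        # A link profile meets a requirement if it is at least as fast
        # and at least as wide as the interaction demands.
        return (self.latency_ms <= required.latency_ms
                and self.bandwidth_mbps >= required.bandwidth_mbps)

@dataclass
class FogNode:
    name: str
    software: set   # software capabilities offered, e.g. {"linux"}
    ram_gb: float   # hardware capability offered

@dataclass
class Component:
    name: str
    software: set   # software requirements
    ram_gb: float   # hardware requirement

def can_host(node: FogNode, component: Component) -> bool:
    """True iff `node` offers the software and hardware `component` requires."""
    return component.software <= node.software and node.ram_gb >= component.ram_gb

# Example: checking a single placement and a single link.
gw = FogNode("farm_gateway", {"linux"}, ram_gb=2)
collector = Component("data_collector", {"linux"}, ram_gb=1)
fast_link = QoSProfile(latency_ms=10, bandwidth_mbps=50)
need = QoSProfile(latency_ms=20, bandwidth_mbps=10)
assert can_host(gw, collector) and fast_link.satisfies(need)
```

An eligible deployment of a whole application would then be a mapping from components to nodes in which every placement and every component interaction passes checks of this kind.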

The potential applicability of tools based on the proposed model can be found in at least three different scenarios:

• at design time, to perform what-if analyses for evaluating performance and identifying beforehand possible critical deficiencies in the Edge network capabilities,

• at deployment time, to decide where each component of the composite soft-ware system can be deployed according to the specified functional and non-functional constraints,

• at run time, to drive the monitoring of deployed applications and to trigger, if needed, reconfiguration or migration processes.

The model can be extended to accommodate new needs or to introduce valuable optimisations as research in the field advances. As an example, the design and adoption of a cost model would improve the search for eligible deployments, driving it towards the optimal (or preferred) solutions first. Also, the introduction of appropriate metrics could indicate which Fog nodes may incur overloading situations, both at deployment and at monitoring time. Finally, more efficient algorithms could be devised for the partial (re-)deployment case, in which some components have already been installed or cannot be moved from where they are (e.g., third-party services).

What is discussed hereinafter aims at contributing to this new research topic by setting a possible starting point in the formal understanding of both Fog infrastructures and applications: an original and extensible attempt to tame the complexity of a paradigm that is yet to be standardised and defined in many of its details.

1.4

Thesis Outline

The rest of the manuscript is organised as follows. Chapter 2 offers an overview of Fog computing.

Chapter 3 describes a lifelike motivating example in the context of smart precision agriculture. The example is then used throughout the rest of the thesis to demonstrate the applicability of the proposed model.

Chapter 4 proposes a theoretical model to support QoS-aware deployment and management of applications over Fog infrastructures. The model permits deployment designers, system administrators and software users to specify significant functional and non-functional details of an application and of a Fog infrastructure, so as to drive the construction of an eligible (initial) deployment plan.

Chapter 5 illustrates and discusses backtracking algorithms and heuristics to find one or more eligible deployments.

Chapter 6 summarises the contributions of this thesis, critically assessing the proposed methodology and proposing possible extensions and future work based on it.
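As a preview of the search problem addressed in Chapter 5, the following is a toy backtracking sketch that assigns components to nodes subject to a single resource constraint. It is a simplified reconstruction under stated assumptions (only RAM is checked; software capabilities and link QoS are ignored) and the node and component names are invented for illustration, not taken from the thesis.

```python
# Toy backtracking search for an eligible deployment: place each component
# on some node with enough free RAM, undoing placements on dead ends.
nodes = {"cloud_1": 8, "fog_1": 2, "gateway": 1}                   # node -> free RAM (GB)
components = [("backend", 4), ("analytics", 2), ("collector", 1)]  # component -> RAM needed

def search(i, free, plan):
    if i == len(components):          # all components placed: eligible deployment found
        return dict(plan)
    comp, need = components[i]
    for node in free:
        if free[node] >= need:        # candidate node satisfies the requirement
            free[node] -= need
            plan[comp] = node
            result = search(i + 1, free, plan)
            if result is not None:
                return result
            free[node] += need        # undo the placement and backtrack
            del plan[comp]
    return None                       # this branch admits no eligible deployment

deployment = search(0, dict(nodes), {})
print(deployment)  # a mapping component -> node, or None
```

The thesis's actual algorithms additionally prune branches using QoS constraints on links, which is what makes the search tractable on realistic infrastructures.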


Chapter 2

Fog Computing

2.1

Why the Fog?

Connected devices are changing the way we live and work. In the coming years, the IoT is expected to bring more and more intelligence around us, embedded in or interacting with the objects we use every day. By 2020, CISCO expects 50 billion connected devices [21], an average of almost 7 per person. Self-driving cars, autonomous domotics systems, energy production plants, agricultural lands, supermarkets, healthcare, schools will exploit Things that are an integral part of the Internet and of our existence, without us being aware of them. As Weiser foresaw in 1991, technologies are disappearing, weaving themselves into the fabric of everyday life until they are indistinguishable from it [50].

The term IoT was first used in 1999 [12]. Many possible applicability scenarios have been studied and designed since then, but few have actually been implemented, due to infrastructural limitations such as the difficulty of supporting real-time tasks and/or the mandatory requirement for an Internet connection. Research [21, 16, 20, 39] agrees that Cloud computing alone is not enough to adequately support the forthcoming pervasive digital transformation, a transformation that sees machine-to-machine interactions grow faster than machine-to-human ones.


Cloud computing established multi-tenancy as a more powerful and cost-effective alternative to owning, operating and maintaining on-premise computational assets. It enabled elastic, on-demand access to a pool of resources and services through the extensive use of virtualisation technologies and homogeneous hardware. Despite its benefits, the public Cloud deployment model is based on a limited number of huge datacentres. Computing and storage resources reside there, and data must be brought there for processing. This approach is, in a sense, centralised, and favours neither scalability nor management in the IoT context.

Whilst data-processing speeds have risen rapidly, the bandwidth to carry data to and from datacentres has not increased as fast [47]. On the one hand, supporting the transfer of data from/to billions of IoT devices is becoming a hard task to accomplish in the Cloud-only scenario, due to the volume and geo-distribution of those devices. On the other hand, the need to reduce latency, to eliminate mandatory connectivity requirements, and to support computation and storage closer to where data is generated 24/7 is evident [16]. The time has come to extend the Cloud all through to the IoT, so as to virtualise and exploit a new hierarchy of resources from the core towards the edge of the network, where data can be used for prompter decision making and support. The additionally exploited resources are going to be many and heterogeneous, e.g., set-top boxes, mobile devices, routers, switches, micro-datacentres. The overcoming of the Cloud-only paradigm is likely to happen as the IoT grows, and it has been termed Fog (or Edge) computing [16]. Leveraging a large number of highly distributed nodes, Fog computing is expected to selectively support time-sensitive, geo-distributed or mobile applications, in which smart Things are exploited in hundreds of different cyber-physical processing contexts and services.

2.2

What is the Fog?

Despite the technology trend and shift that the Fog will bring, a precise definition of “Fog” is yet to be given. In what follows, we go through some of the definitions of Fog computing that have been tentatively proposed over the last four years. Since no standard definition exists, these paragraphs do not aim at fully clarifying ideas nor at giving an exhaustive answer to the question “What is the Fog?”. Rather, they aim at illustrating the current scientific debate towards a widely accepted consensus definition of the new paradigm, as happened for the Cloud.

One of the first endeavours to define Fog computing was, in 2012, by Bonomi et al. [17]:

Fog Computing is a highly virtualized platform that provides compute, storage, and networking services between end devices and traditional Cloud computing data centers, typically, but not exclusively located at the edge of network.

Throughout [17], the authors highlight how such a platform for service delivery has to be widely geographically distributed and hierarchically organised, so as to ensure that the analysis of local information happens at the ground, whilst more in-depth historical analytics are performed in the Cloud, as shown in Figure 2.1. According to [17], Fog computing characteristics include: location awareness, low latency, widely geographically distributed deployment, a large number of nodes, mobility, real-time interactions, predominance of wireless access, and interoperability and federation between providers. This first definition is in line with the one given by CISCO [21] in the same year, and clarifies that Fog computing is not going to substitute for nor cannibalise the Cloud, but rather to fruitfully extend it.

In 2014, Vaquero et al. [48] extended the definition of Fog. Their vision aims at supporting a more fluid concept of Fog, which is not limited to a Cloud extension but is composed of:

ubiquitous and decentralised devices [that] communicate and poten-tially cooperate among them and with the network to perform storage and processing tasks [...] for supporting basic network functions or new services and applications that run in a sandboxed environment.

Additionally, [48] highlights that users leasing part of their devices to host these services get incentives for doing so. This broad definition captures more of the potential that the Fog paradigm is likely to express: a paradigm in which users and their devices may become a central and integral part of the deployment. Other researchers support and extend the definition of the Fog in this human-inclusive direction. Garcia et al. [28] advocate for Edge-centric computing and


Figure 2.1: A data latency hierarchy, showing many uses of the same data as in [16].


stress the missed opportunity of exploiting the enormous amount of computational, communication and storage power of modern personal devices, highlighting how the new paradigm may represent the natural convergence and evolution of many other fields in Computer Science. Concepts born in Content Delivery Networks (CDN), Peer-to-Peer (P2P) overlays and decentralised Cloud architectures may represent the basis for the Fog transformation. The scenario depicted in [46] likewise expects the same application code to run on, and adaptively migrate across, devices spanning from smartphones, to specialised routers and industrial switches, to virtual nodes in powerful Cloud servers. In line with previous proposals, although less drastic, Chiang [20] defined Fog networking as:

an architecture that uses one or a collaborative multitude of end-user clients or near-user edge devices to carry out a substantial amount of storage, communication and management.

According to this definition, the Fog not only sits in between end devices and Cloud computing datacentres, but also includes the former by exploiting a cooperative mesh of devices. Fog nodes are any computing or networking capability deployed between data sources and Cloud-based datacentres. [20] expects the paradigm shift to move to (or near) the end user the storage now offered by datacentres, the communication currently carried out by the backbone network, and the management presently performed at the network gateways.

As of 2016, the standardisation of Fog computing frameworks is still at its beginning. Consortia such as Mobile Edge Computing (MEC) [2], started in 2014 with a focus on cellular networks, and OpenFog (OFC) [3], started in 2015, are fostering work in the field, involving both industry and academia. In their Reference Architecture (RA) draft [40], the OpenFog Consortium defines Fog computing as:

a system-level horizontal architecture that distributes resources and services of computing, storage, control and networking anywhere along the continuum from Cloud to Things, thereby accelerating the velocity of decision making. Fog-centric architecture serves a specific subset of business problems that cannot be successfully implemented using only traditional cloud based architectures or solely intelligent edge devices.


This definition shows some traits of generality, whilst including characterising information about the Fog. It focuses on the problem that Fog computing is trying to solve – filling the gap between the Cloud and the IoT – and on how – by bringing resources and services closer to data sources. It stresses the mutually beneficial continuity of the Fog with existing technologies and suggests the presence of intelligence to be exploited at all layers of the network. At the same time, the definition depicts the Fog as a paradigm “on its own” and as a powerful enabling complement to the Cloud-IoT scenario. Possibly, it misses an explicit reference to the ability of Fog nodes to cooperate with each other, which is declared repeatedly elsewhere in the RA draft document. Also according to OpenFog, it is critical that Fog nodes have the ability to communicate horizontally (East-West or P2P), and to discover, trust and invoke services of other nodes, so as to guarantee reliability, availability and serviceability of the new infrastructure.

As is often the case, the final definition of the Fog will probably lie somewhere in between the many proposals. In this work we assume the OpenFog definition, which currently seems to capture most of the potential and perspectives of Fog computing.

2.3

Designing the Fog

The design of the Fog infrastructure will require significant efforts so as to ensure that functionalities delegated to the Edge of the network are carried out efficiently, reliably and securely. Also, designers should embed programmability and scalability in their vision [47].

In this section, after a quick recap of the features that are expected from the Fog infrastructure, we discuss some possible designs and deployment models that are being proposed for its realisation.

2.3.1

Expected Features

In contrast to Cloud datacentres, any Fog architecture will require three main features to enable high scalability, reliability and availability of services:

• the capability to dynamically adapt to the current status of the network, so as to provide tolerable latency and sufficient bandwidth,

• location-awareness, in that it should be possible to exactly determine the position of the involved Things and computational nodes, so as to handle fluidity and mobility of the computation,

• context-awareness, the involved nodes being able to discover the resources and services available around them, so as to sense and react according to their current environment, and to pool and provision capabilities while meeting application requirements.

Guaranteeing those attributes is not the only task to be accomplished. As for the Cloud, the proposed Fog architectures shall scale to millions of nodes. This means dealing not only with fault-handling and elasticity (as in the Cloud), but also with network dynamicity and churn, in such a way that if a device temporarily disconnects or runs out of battery, the current state of the computation is not lost and services do not incur disruption. Tools to support application deployment should permit rapid and flexible provisioning, depending on the software, hardware, connections and virtualisation technologies available at a given node.

Although researchers do not yet agree on where the Fog hierarchical intelligence will end up – whether only in ISP/telco/mobile network infrastructures or also in end-user devices – there is a clearly stated intention to leverage existing technologies by devising convenient interfaces. Any Fog architecture should provide a middleware layer to virtualise (via virtual machines or containers¹) and manage resources (i.e., Fog nodes, Things, networking) in a distributed, protocol-agnostic fashion. As for the Cloud, Fog computing will have to manage multi-tenant deployments and on-demand provisioning of resources all through the involved tiers. In this regard, easy-to-use frameworks for application development and deployment, as well as standard interfaces for Things-to-Fog, Fog-to-Fog and Fog-to-Cloud communication, will determine much about the future of the new paradigm. As aforementioned, Fog architectures must envision a hierarchical organisation, comprising at least one layer of Fog nodes. Closer to

¹ Containerisation is a lightweight virtualisation method where the operating system kernel permits running isolated, full-fledged, virtual environments as guest user-space processes.


the Things, M2M processes are controlled by quickly analysing the sensed data; intermediate Fog layers analyse data streams and perform operationally oriented analytics in human-readable form; the Cloud works on data coming from multiple systems, applying more sophisticated techniques [16, 3].

To sum up, Fog systems should be able to autonomously:

• seamlessly discover and manage the available heterogeneous resources (Things, Fog nodes, Clouds),

• provide services when Cloud connectivity is neither necessary nor available,

• manage and orchestrate the service life-cycle from initial deployment to deinstallation, also complying with specified policies and QoS requirements,

• take decisions and collaborate with each other (and with the Cloud) to complete business missions.

2.3.2

Architecture Proposals

Currently, three proposals have been advanced to implement all the expected features discussed before: the first by Bonomi et al. [16], the second by the OpenFog Consortium [39], the third by Dastjerdi et al. [24]. They are substantially similar and will be discussed generally hereinafter, referring to Figure 2.2. We adopt the naming convention of [16] and report the correspondence with the OpenFog jargon for each layer.

The Physical Layer is composed of the actual heterogeneous devices, by different vendors, that will compose the Fog infrastructure and perform compute, storage and network functionalities. To master their variety, Fog architectures must provide an Abstraction Layer (OpenFog Fabric, or Software Backplane/Node Management in [40]) that hides platform heterogeneity and offers APIs for multi-tenant monitoring, provisioning and controlling of physical resources. The Abstraction Layer provides built-in mechanisms for accessing the Things at each node, so that a data collector module can be deployed on any compatible node without worrying about protocol bridging and gateway functionalities. The Abstraction Layer is also in charge of supporting the different virtualisation technologies that the Fog will exploit, and enables multi-tenancy.


Figure 2.2: A sketch of the Fog architecture proposal, as in [16].

On top of it, the Orchestration Layer (OpenFog Services) provides dynamic, QoS-aware life-cycle management of Fog services and applications, directly interfacing with the Abstraction APIs. At this layer, software agents (Bonomi et al. [16] name them Foglets) should orchestrate functionalities, manage distributed storage of data and enable message exchange via a bus for service orchestration and resource management. This level should monitor the health and status of each physical machine and of the deployed services, so as to manage application deployment and service provisioning in a QoS-aware manner. The presence of a distributed storage system is crucial to guarantee scalability and fault-tolerance, and to support both the computations run over the Fog-Cloud system and the Orchestration Layer itself. A policy engine is included in the Orchestration Layer to let administrators specify various QoS constraints for deployment, such as tolerated network latency, needed bandwidth, needed software or hardware capabilities, load balancing, power requirements, security and other types of policies. A distributed policy manager must ensure those constraints are adaptively met by the running instances of each service, exploiting a global-view capability engine.
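By way of illustration, a policy engine of this kind might accept declarative constraints such as the following. The format, field names and values are purely hypothetical assumptions, sketched only to show how deployment-time QoS checks could be expressed; no Fog platform API is being described.

```python
# Hypothetical declarative QoS policy for one application component; all
# field names are assumptions for illustration, not an actual Fog API.
policy = {
    "component": "machine_controller",
    "max_latency_ms": 25,       # tolerated network latency
    "min_bandwidth_mbps": 5,    # needed bandwidth
    "software": {"linux"},      # needed software capabilities
    "min_ram_gb": 1,            # needed hardware capabilities
}

def link_ok(policy, latency_ms, bandwidth_mbps):
    """Check a candidate link against the component's QoS constraints."""
    return (latency_ms <= policy["max_latency_ms"]
            and bandwidth_mbps >= policy["min_bandwidth_mbps"])

def node_ok(policy, node_software, node_ram_gb):
    """Check a candidate node's software and hardware offerings."""
    return (policy["software"] <= node_software
            and node_ram_gb >= policy["min_ram_gb"])

# A monitoring loop could periodically re-evaluate these predicates
# against the observed infrastructure state to trigger migration.
assert link_ok(policy, 10, 20) and not link_ok(policy, 40, 20)
assert node_ok(policy, {"linux", "mySQL"}, 2)
```

The policy manager's job is then to keep such predicates satisfied for every running instance, re-deploying components when the observed latency, bandwidth or node capabilities drift out of bounds.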

The Fog framework must expose APIs to the Applications Layer (OpenFog Application/Devices) that enable programmers to manage the distributed data store (put/get interface), to control how applications are deployed over the infrastructure (policy management, service requests, etc.) and to handle interactions with other services and systems.

Needless to say, Fog architectures should also support security and agile DevOps at all layers. Encryption of private or sensitive information, secure rendezvous among devices, access control to all types of resources and mechanisms for chain-trusting, as well as autonomy of orchestration and management, programmability and support for continuous/adaptive deployment, cannot be left out of actual implementations of the presented stack.

From a wide perspective, the Abstraction and Orchestration layers together form a secure middleware layer that exploits Edge, Cloud and IoT resources to enable [24]:

• multi-tenant resource management (task placement and resource scheduling, raw data management, monitoring and profiling),

• API and service management (API discovery, authorisation and authentication, composition),

• support for different programming models.

2.4

Research Challenges

The emerging Fog paradigm inevitably brings new and interesting research challenges. We briefly survey some of them, as reported in recent literature [51, 47, 24].

QoS-aware deployment: Most early efforts to segment functionalities through the Fog have been tuned manually or have considered only tree-based network topologies. Automated tools that distribute application functionalities vertically and horizontally over the available infrastructure are to be devised and exploited in the Fog. Such tools, integrated within the Orchestration Layer and supported by the underlying Abstraction Layer, must guarantee the desired QoS policies for the deployed application components. This is the challenge that our work aims at contributing to.


Programmability: In order to port applications to the Fog computing platform, a scalable programming model must be devised that permits components to be location-, context- and connection-aware at runtime. Particularly, the programmer should be lifted from the duty of orchestrating her applications over the Fog hierarchy and of adapting the source code for each deployment.

Resource management and reliability: Naming schemes for identifying, discovering and managing IoT devices, Fog and Cloud nodes are crucial to support service and resource management. IoT applications over the Fog will opportunistically exploit nearby Things and computing capabilities for analytics or to trigger prompt responses to events. Hence, the Things and computing capabilities available in a certain area should be identifiable, also accounting for mobility patterns. Due to the heterogeneity of the software and devices involved in the Fog, sensors, networks, platforms and applications will fail. The reliability of Fog infrastructures must be ensured by exploiting distributed techniques for handling churn, failures and dynamics in the network.

Privacy and security: Authentication, access control, intrusion detection, anti-tampering solutions and cryptography will play a protagonist role in the Fog. The involved users and service providers will all benefit from acting in a trusted execution environment and will require it to be so. The lack of tools to guarantee data security and privacy in the Fog could easily cause the novel paradigm to sink.

Energy consumption: Thanks to its ability to process data nearer to where it is produced or used, Fog computing can lead to a wiser use of resources and to power savings. However, devices at the edge must adopt energy-efficient standards and protocols (like CoAP) to guarantee battery savings and to prevent highly distributed computation from being less efficient.


Chapter 3

Motivating Example

The agricultural sector is going to face non-trivial challenges in order to keep pace and feed the 9.6 billion people that FAO predicts are going to inhabit planet Earth by 2050. On the one hand, food production will have to increase by 70% in the next 30 years. On the other hand, this process must happen whilst reducing as much as possible the negative impacts on the environment and the climate [9].

Fog computing is a promising candidate to boost the introduction of new technologies that ensure the improvement, optimisation and maximisation of the outcomes for small and medium farms, which are looking forward to a more appropriate, effective and sustainable use of resources (e.g., arable land, fresh water, energy, chemicals, labour force, etc.). While large farms have already adopted digital solutions, smaller producers will go through this revolution in the years to come, due both to technological and economical barriers. Particularly, rural realities must cope with an Internet infrastructure that is usually poorly developed or absent – not supporting a performant connection to the Cloud – and governments have only recently started funding the modernisation of their agricultural industry [42].

In this context, the filtering action performed by Fog nodes and their ability to manage sensed data locally would significantly ease and encourage the adoption of the aforementioned digital solutions. Indeed, Fog computing can potentially reach full expression, at a relatively low cost, for [49]:


• remote monitoring, e.g., of pressure, environment, gates, valves, cameras, storage facilities, etc.,

• remote control, e.g., opening and closing of valves, lights, pumps, heaters, robotic vehicles, drones, etc.,

• information transfer, e.g., incorporation of environmental data into decision support systems, weather, market and operational information spreading, real-time information,

• communication, by means of text, graphical and video messages between the farm operators,

• asset tracking, locating vehicles, drones, irrigation systems and livestock,

• remote diagnosis, from external experts who are granted permission to access the collected or processed data.

Whereas other areas of application of Fog computing, such as smart cars and drone traffic control, have been reported as examples in the literature [40], smart agriculture use cases are yet to be investigated. Within this chapter, a lifelike (fake) application for precision agriculture is discussed, which will be adopted as the motivating example in the rest of this thesis work.

3.1 The SmartFields Application

PlantsML LTD produces smart solutions for agricultural purposes: IoT devices that monitor the temperature, soil moisture, UV exposure and nutrients of a given plantation. Their flagship product, SmartFields, includes an application that is able to respond to sensed data by properly irrigating and fertilising the plantation, also based on historical information which is periodically analysed by a machine learning engine in order to improve the overall efficacy of the solution. The latest release of the product includes fire and flood sensors triggering immediate alerts and intervention in case one of the two unfortunate events happens.

SmartFields is a multi-component application, as shown in Figure 3.1. The installed sensors and actuators (“Things” from now on) require a software surrogate, commonly referred to as a gateway, to collect and transmit data to the system.


In the Fog scenario, this surrogate interacts with the middleware distributed storage and may collect data also from Things connected to nodes other than the one upon which it is deployed. Hence we will refer to these components as data collectors (or collectors). In the next paragraphs each of the software components of SmartFields is presented, mainly focusing on its interactions with the other ones.

Figure 3.1: The components of the SmartFields application. Arrows represent interactions among components through proper interfaces.

3.1.1 Components Interactions

The Irrigation and Fertilisation component of SmartFields is responsible for processing the information coming from a videocamera and from the UV, soil moisture, salinity and temperature sensors, in order to perform the irrigation and fertilisation of the plantation without wasting water or wrongly dosing chemicals, always adapting to the kind of cultivated crops and to the current weather conditions. In doing so, the component interfaces with:

• the Fire and Flood Detection component, which must be informed of watering to avoid false flooding alarms during large irrigation phases and which must in turn trigger the stop of fertilisation in case of fire detection,

• the Insights Backend, accessible via dedicated thin clients and responsible for the visualisation of the collected data and for sending back to the actuators manual commands from the final user,

• the Machine Learning Engine, to which it sends average data to improve the information within the plants database – to perform further analyses over the cultivation – and from which it receives updates for the local reduced version of the database, related to the cultivated plants only.

In turn, the Fire and Flood Detection component should interact with the Insights Backend, to which it rapidly reports the necessary alerts coming from the connected gateway. In case of fire, the module also activates the extinguishers; in case of flooding, it opens the floodgates. To conclude, the roles of the Insights Backend and the Machine Learning Engine have been outlined above, except for the fact that the latter component is actually a service, centrally deployed and managed by PlantsML LTD over a Cloud datacentre in Oregon.

Now that the structure of the application is sufficiently framed, some considerations about the expected behaviour of its deployment are to be made, from the point of view of the final users and with respect to some QoS parameters.

3.1.2 QoS Requirements

To achieve their purposes of promptness and data analysis near the end user, the interactions among IoT application components may have to meet some requirements in terms of both latency and bandwidth. Those values can be determined by domain experts, in light of their experience, or by assessing the final user needs, and may be subject to change after the application has been deployed and monitored for a while. In practice, the latency requirement is going to be more stringent the more time-sensitive a performed task is, whilst the bandwidth requirement is going to be more demanding the more data the Things exploited by a component produce.

In the case of our SmartFields example, the owner of the fields would expect the fire extinguishers to start as fast as possible once the alarm has been raised by the detection component. Analogously, the watering valves should be closed within a reasonable time span, starting from when the plants have got the necessary amount of water and fertiliser. Then, the farmer should get insights about the cultivation within a reasonable amount of time, i.e. not days after they have been produced. Last but not least, the machine learning engine should get data from the sensors to improve its model, but it can definitely wait for an aggregate update, since it does not require real-time data or almost continuous control as the flooding component does.

Similar considerations would show how the required bandwidth decreases going bottom-up from the Things-connected components to the machine learning engine. Indeed, the data produced by the installed Things – huge and regular during the day – is reduced by the aggregating operators available at each component and exploited to perform analyses and return insights. In the next chapter, while discussing the proposed model for QoS-aware deployment, the example will be detailed with the introduction of lifelike values.

3.2 Old Mac Donald’s Farm

In the depicted scenario, suppose that Old Mac Donald’s Farm – a medium-sized farm – is considering adopting the SmartFields solution to monitor, improve and secure its cultivations.

Within Old Mac Donald’s agricultural lands, the hired IT experts plan to install two Fog nodes, connected to the Internet via the 4G network and interconnected through an Ethernet cable. At the same time, the farm joined two other local farms in a Consortium and agreed to adopt a WLAN solution to connect to a shared Fog node, accessing the Internet via a 100 Mbps VDSL connection. The sensors and actuators owned by the farm are connected to their local Fog nodes through the ZigBee protocol, while the Consortium node hosts a complete IoT weather station, serving the entire area. The described infrastructure is sketched in Figure 3.2.

Figure 3.2: The Fog infrastructure available at Old Mac Donald’s Farm.

3.3 Objectives

The model proposed in Chapter 4 and the algorithms discussed in Chapter 5 will help the system designers in understanding whether the selected infrastructure is sufficient to support the SmartFields solution and where each component should go to guarantee that certain QoS requirements are met. Considering latency and bandwidth constraints, as well as business policies and exploited Things capabilities, the model will be able to identify the right node on which to deploy each component of SmartFields. Hence, system administrators will be able to evaluate their decisions about capacity planning and feasibility.

Suppose that buying one of the two Fog nodes without the wireless module leads to money savings, or that Old MacDonald’s staff want to check what happens in case the wireless connectivity of one of the two nodes does not work. The model will be able to simulate that particular scenario and determine an alternative deployment of the application, if any exists.


Chapter 4

Modelling the Fog

To the best of our knowledge, the first and only attempt to provide a formal model of the Fog computing architecture has been proposed in [45], which focuses on the analysis of energy consumption and advocates Fog computing as a greener solution with respect to traditional Cloud computing. In their model, the authors consider as terminal nodes whole devices such as smart cars, mobile phones and smart meters. The model that we propose adopts instead a finer level of detail, identifying the connected sensors and actuators that collect data within those and other types of devices, such as small computational nodes in the Fog layer. We adopt this approach since, at the state of the art, the majority of cost-effective IoT devices feature limited processing power or data storage, while all of them provide some kind of wireless connectivity [1].

A Fog infrastructure is a system including IoT devices, one or more layers of Fog computing nodes, and at least one Cloud provider.

As we will see, given a Fog infrastructure I and an application A, our model makes it possible to identify the deployments of A on I that ensure that the QoS requirements and business policies of A will be fulfilled.

4.1 QoS Profiles

Capturing QoS requirements in the model requires defining which operational systemic qualities are of interest in the Fog scenario. Among the possible QoS metrics, we consider only latency and bandwidth since, whilst loss and jitter can be remedied through retransmission or buffering respectively, nothing can be done at run time to tame the former two. Additionally, since Fog computing will exploit wireless access to the Internet and will bring computation at or nearer to the end user, it is realistic to model asymmetric link bandwidth. Currently, devices at the Edge of the network that provide wireless connectivity and/or access to the Internet (e.g., gateways, connected cars, mobile phones, etc.) often exploit a connection that does not feature the same uplink and downlink bandwidth, which also depends on the environment where they are placed (e.g., presence of physical or electromagnetic obstacles).

The following definition accordingly frames the set of QoS profiles, which will be used both to denote the QoS of an actual communication link and to specify the QoS required for the link supporting the communication between two application components.

Definition 1. The set Q of QoS profiles is a set of pairs ⟨ℓ, b⟩ where ℓ and b denote respectively the average latency and bandwidth featured by (or required for) a communication link. The bandwidth is a pair (b↓, b↑), distinguishing the download and upload bandwidth of a link. Unknown/unspecified values are denoted by ⊥.

The average latency and bandwidth of a link can be measured via monitoring tools and periodically updated.
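To make the notation concrete, a QoS profile can be sketched as a small data structure. The following Python fragment is an illustrative encoding, not part of the model; it uses `None` for the unspecified value ⊥:

```python
from dataclasses import dataclass
from typing import Optional

# A QoS profile <latency, (b_down, b_up)> as in Definition 1.
# None encodes the "unknown/unspecified" value.
@dataclass(frozen=True)
class QoSProfile:
    latency_ms: Optional[float]           # average latency
    bandwidth_down_mbps: Optional[float]  # download bandwidth
    bandwidth_up_mbps: Optional[float]    # upload bandwidth

# Example: a link offering 5 ms latency and a symmetric 20 Mbps bandwidth.
link_qos = QoSProfile(latency_ms=5, bandwidth_down_mbps=20, bandwidth_up_mbps=20)
```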

4.2 Fog Infrastructures

At this stage, the concept of a Fog infrastructure has to be formalised so as to precisely frame the available target infrastructure, assuming that IoT devices, Fog nodes and Cloud datacentres will be part of such infrastructure.

Definition 2. A Fog infrastructure is a 4-tuple ⟨T, F, C, L⟩ where

• T is a set of Things, each denoted by a tuple t = ⟨i, π, τ⟩ where i is the identifier of t, π denotes its (geographical) location and τ its type,

• F is a set of Fog nodes, each denoted by a tuple f = ⟨i, π, H, Σ, Θ⟩ where i is the identifier of f, π denotes its (geographical) location, H and Σ are the hardware and software capabilities it provides, and Θ ⊆ T contains all Things directly reachable from f,

• C is a set of available Cloud datacentres, each denoted by a tuple c = ⟨i, π, Σ⟩ where i is the identifier of c, π denotes its (geographical) location and Σ the software capabilities it provides,

• L = {⟨n, n′, q⟩ | (n, n′) ∈ (F × F) ∪ (F × C) ∪ (C × F) ∪ (C × C) ∧ q ∈ Q} is a set of available Fog-to-Fog, Fog-to-Cloud and Cloud-to-Cloud communication links¹, each associated with its QoS profile.

Some observations are now due to justify the choices made in the definition of the model.

Firstly, all the elements included in a Fog infrastructure are characterised by their current location, assuming it is known at some level of detail. On the one hand, we assume that sensors/actuators and Fog nodes are provided with geo-spatial location technologies such as GPS. On the other hand, Cloud providers usually disclose to customers the geographical area of their datacentres. This assumption permits the inclusion of location awareness among the parameters considered when searching for valid deployments. Indeed, Cloud datacentres cannot be moved from where they are (not overnight, at least), but (some) Fog nodes and Things are expected to be able to move from one place to another, and the deployment of applications over them should embrace and adapt to their mobility patterns. When some ad hoc taxonomy for identifying geographical locations is devised in the context of Fog computing, it will be easy to include it in the model². As a side note, identifiers for all the modelled entities are needed to describe the case in which more than one entity resides in the same location.

¹We assume that if ⟨n, n′, q⟩ ∈ L then ⟨n, n′, q′⟩ ∉ L with q ≠ q′. We also assume that if ⟨n, n′, ⟨ℓ, b↓, b↑⟩⟩ ∈ L and ⟨n′, n, ⟨ℓ′, b′↓, b′↑⟩⟩ ∈ L then ℓ = ℓ′, b↓ = b′↑ and b↑ = b′↓.

²For instance, if GPS coordinates are used to model the location of Things, Fog nodes and Cloud datacentres, then π = ⟨πlat, πlon⟩.


Secondly, we abstract from the type of connection technologies employed both at the Wireless Sensor Network (Bluetooth, ZigBee, RFID, etc.) and at the Access Network (xDSL, FttX, 4G, etc.) levels, since our focus is on the QoS a given communication link between two endpoints can offer in terms of latency and bandwidth. As a consequence, all available links are modelled uniformly in Θ and L. For the purposes of the model, Things in Θ directly connected to a Fog node – either via wired or wireless connection – are assumed to have negligible latency and infinite bandwidth. Furthermore, current proposals for the Fog architecture [40, 16] include the design of a middleware responsible for protocol bridging, resource discovery and exposing APIs to make the collected data available to other Fog and Cloud services and systems. In these regards, since security issues are considered orthogonal to (and outside the scope of) this thesis, we assume that Fog nodes can reach the sensors and actuators of all their neighbouring nodes through some middleware interfaces, experiencing the QoS of the associated links in L.

Thirdly, the model deliberately does not bind to any particular standard for hardware and software capabilities specification. Realistically, a hardware specification will include both consumable (e.g., RAM, storage) and non-consumable resources (e.g., architecture, CPU cores), whereas software capabilities may concern the available OS (Linux, Windows, etc.) and the installed platforms or frameworks (.NET, JDK, Hadoop MapReduce, etc.). The choice of how to specify hardware and software capabilities – e.g., with TOSCA YAML [7] as in the SeaClouds project [19, 13], or with other formalisms – is however not bound in the model.

Finally, Cloud computing is modelled according to the hypothesis that it can offer a virtually unlimited amount of hardware capabilities to its customers. This simplification permits the description of any among SaaS, PaaS and IaaS providers, and eliminates the need to describe any particular commercial offering in terms of Virtual Machine (VM) types. Overall, when compared to Fog node capabilities, it is true that a Cloud customer can always add processing power and storage by purchasing extra VM instances, as if they were unbounded.


4.2.1 Motivating Example: Fog Infrastructures

In this section the modelling of the infrastructure for the Old MacDonald’s Farm example from Chapter 3 is described. Figure 4.1 details the QoS parameters determined at design time by the IT experts hired by the farm. Starting from the Things, we give some examples of items in the set T:

T = { ⟨moisture0, π0, moisture⟩, ⟨videocamera0, π0, videocamera⟩, ⟨UV0, π0, UV⟩, ... }

Afterwards, it is the turn of the processing capabilities, starting from the Fog nodes. For the sake of the example, software offerings are represented as strings and hardware offerings only consider the amount of RAM available at a node in Gigabytes, represented as natural numbers. For denoting the connected Things, we use identifiers as a shortcut notation instead of the whole tuples.

F = {
⟨local_1, π0, 2, {Linux, C++, Python}, {moisture0, videocamera0, UV0, salts0, water0}⟩,
⟨local_2, π1, 4, {Linux, C++, Python}, {fertiliser0, extinguisher0, flood0, fire0, floodgates0, water1}⟩,
⟨consortium_1, π2, 10, {Linux, C++, Python, MySQL}, {wind0, temperature0, pressure0}⟩
}

Figure 4.1: The Fog infrastructure available at Old MacDonald’s Farm. Links report the estimated QoS values for latency and bandwidth. When the bandwidth is asymmetric (i.e. b↓ ≠ b↑) the related values are reported as b↓/b↑ and the arrow of the link indicates the upload direction. The software offerings are on the right-hand side (black points) and the hardware offerings on the left-hand side (white points) of each node.

Analogously, the two Cloud datacentres are reported below with their software capabilities. Locations are denoted by the names of the States where they are situated:

C = {
⟨cloud_1, Netherlands, {Java, .NET, Ruby, MySQL}⟩,
⟨cloud_2, Oregon, {C++, Spark, MySQL, Linux, Windows, Python}⟩
}

Finally, the list of the exploitable links involving local_1 is listed below.

L = {
⟨local_1, local_2, ⟨1 ms, (100 Mbps, 100 Mbps)⟩⟩,
⟨local_1, consortium_1, ⟨5 ms, (20 Mbps, 20 Mbps)⟩⟩,
⟨local_1, cloud_1, ⟨130 ms, (10 Mbps, 8 Mbps)⟩⟩,
⟨local_1, cloud_2, ⟨200 ms, (12 Mbps, 10 Mbps)⟩⟩,
...
}

The information about all other links is reported in Figure 4.1. For Things directly connected to Fog nodes, we consider null latency and infinite bandwidth.
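As a purely illustrative sketch, the example sets T, F, C and L could be encoded as plain Python dictionaries; identifiers and values mirror the running example, while the representation itself (tuples keyed by identifier) is an assumption of this fragment, not mandated by the model:

```python
things = {  # identifier -> (location, type)
    "moisture0": ("p0", "moisture"),
    "videocamera0": ("p0", "videocamera"),
    "UV0": ("p0", "UV"),
}

fog_nodes = {  # identifier -> (location, RAM in GB, software, connected Things)
    "local_1": ("p0", 2, {"Linux", "C++", "Python"},
                {"moisture0", "videocamera0", "UV0", "salts0", "water0"}),
    "local_2": ("p1", 4, {"Linux", "C++", "Python"},
                {"fertiliser0", "extinguisher0", "flood0", "fire0",
                 "floodgates0", "water1"}),
    "consortium_1": ("p2", 10, {"Linux", "C++", "Python", "MySQL"},
                     {"wind0", "temperature0", "pressure0"}),
}

clouds = {  # identifier -> (location, software)
    "cloud_1": ("Netherlands", {"Java", ".NET", "Ruby", "MySQL"}),
    "cloud_2": ("Oregon", {"C++", "Spark", "MySQL", "Linux", "Windows", "Python"}),
}

links = {  # (endpoint, endpoint) -> (latency in ms, (down, up) in Mbps)
    ("local_1", "local_2"): (1, (100, 100)),
    ("local_1", "consortium_1"): (5, (20, 20)),
    ("local_1", "cloud_1"): (130, (10, 8)),
    ("local_1", "cloud_2"): (200, (12, 10)),
}
```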

4.3 Applications

At this stage, it is crucial to model applications in such a way as to permit checking which nodes can run them in the Fog and Cloud layers, and so that their definition includes non-functional requirements. Modern large-scale applications are not monolithic anymore. They are made of independently deployable components and services that work together, exploiting proper interfaces to interact with each other. Therefore, an application running over a Fog computing infrastructure can realistically be thought of as a set of software components that work together and must meet some QoS constraints.


Definition 3. An application A is a triple ⟨Γ, Λ, Θ⟩ where

• Γ is a set of software components, each denoted by a tuple γ = ⟨i, H, Σ⟩ where i is the identifier of γ, and H and Σ the hardware and software requirements it has,

• Λ = {⟨γ, γ′, q⟩ | (γ, γ′) ∈ (Γ × Γ) ∧ q ∈ Q} denotes the existing interactions among components¹ in Γ, each expressing the desired QoS profile for the connection that will support it,

• Θ is a set of Things requests, each denoted by ⟨γ, τ, q⟩, where γ ∈ Γ is a software component and τ denotes a type of Thing the component needs to reach with QoS profile q so as to work properly.

The modelling of applications comprises QoS profiles for the interactions between components, to express the desired operational systemic qualities, together with hardware, software and Things requirements for a component to work properly². We now define the notion of compatibility of a software component with a node of a Fog infrastructure, be it a Fog node or a Cloud datacentre. Fog nodes must offer the needed software and non-consumable hardware capabilities, and enough consumable hardware to support at least that component. Compatibility of Cloud datacentres only requires the needed software platforms and frameworks to be available for deployment. Intuitively speaking, a node is compatible with a software component if it can potentially be used to deploy it.

Definition 4. Let A = ⟨Γ, Λ, Θ⟩ be an application and I = ⟨T, F, C, L⟩ a Fog infrastructure. A component ⟨i, H, Σ⟩ ∈ Γ is compatible with a node n ∈ F ∪ C if and only if either one of the following holds³:

¹As before, we assume that if ⟨γ, γ′, q⟩ ∈ Λ then ⟨γ, γ′, q′⟩ ∉ Λ with q ≠ q′. We also assume that if ⟨γ, γ′, ⟨ℓ, b↓, b↑⟩⟩ ∈ Λ and ⟨γ′, γ, ⟨ℓ′, b′↓, b′↑⟩⟩ ∈ Λ then ℓ = ℓ′, b↓ = b′↑ and b↑ = b′↓.

²For the sake of simplicity, we consider that one component can only require one Thing of each type. A straightforward change of Θ to a multiset would permit modelling more than one Thing requirement of the same type.

³The relations ⊑ and ⪯ can be read as “is satisfied by”. Σ ⊑ Σ′ when the software offerings Σ′ at a node fulfil all the software requirements Σ of a component; H ⪯ H′ when the non-consumable and consumable hardware resources H′ at a node are enough to support the requirements H of a given component.


• if n = ⟨j, π, H′, Σ′, Θ′⟩ ∈ F, then Σ ⊑ Σ′ and H ⪯ H′,

• if n = ⟨j, π, Σ′⟩ ∈ C, then Σ ⊑ Σ′.
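A minimal sketch of the compatibility check of Definition 4, assuming (as in the running example) that hardware capabilities reduce to an amount of RAM and software capabilities to sets of strings; the tuple shapes and names below are illustrative:

```python
def compatible_with_fog(component, fog_node):
    """Definition 4, Fog case: software subset and enough RAM."""
    _, hw_req, sw_req = component              # <i, H, Sigma>
    _, _, ram_gb, sw_off, _ = fog_node         # <i, pi, H, Sigma, Theta>
    return sw_req <= sw_off and hw_req <= ram_gb

def compatible_with_cloud(component, cloud):
    """Definition 4, Cloud case: hardware is assumed unbounded."""
    _, _, sw_req = component
    _, _, sw_off = cloud                       # <i, pi, Sigma>
    return sw_req <= sw_off

# The irrigation component is compatible with the consortium node:
irrigation = ("irrigation", 2, {"MySQL", "Python", "C++"})
consortium_1 = ("consortium_1", "p2", 10,
                {"Linux", "C++", "Python", "MySQL"},
                {"wind0", "temperature0", "pressure0"})
assert compatible_with_fog(irrigation, consortium_1)
```

Note that Python's `<=` on sets is the subset test, which plays the role of ⊑ in this sketch.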

4.3.1 Motivating Example: Applications

As for the infrastructure, it is now time to describe the composite application of the example in Chapter 3 as A = ⟨Γ, Λ, Θ⟩, which is sketched in Figure 4.2. Starting with the set Γ of software components of the SmartFields application, we get the set reported below.

Γ = {
⟨irrigationDC, 1, {Linux, C++}⟩,
⟨firefloodDC, 1, {Linux, C++}⟩,
⟨irrigation, 2, {MySQL, Python, C++}⟩,
⟨fireflood, 1, {Python, C++}⟩,
⟨insights, 4, {.NET, MySQL}⟩,
⟨MLengine, 8, {Spark, MySQL}⟩
}

Software interactions are represented analogously to the infrastructure links.

Λ = {
⟨irrigationDC, irrigation, ⟨15 ms, (8 Mbps, 1 Mbps)⟩⟩,
⟨firefloodDC, fireflood, ⟨5 ms, (2 Mbps, 2 Mbps)⟩⟩,
⟨irrigation, fireflood, ⟨15 ms, (1 Mbps, 1 Mbps)⟩⟩,
⟨irrigation, insights, ⟨60 ms, (4 Mbps, 1 Mbps)⟩⟩,
⟨irrigation, MLengine, ⟨200 ms, (3 Mbps, 6 Mbps)⟩⟩,
⟨fireflood, insights, ⟨15 ms, (1 Mbps, 1 Mbps)⟩⟩,
⟨insights, MLengine, ⟨60 ms, (2 Mbps, 2 Mbps)⟩⟩
}


Figure 4.2: The SmartFields application. Links report the required QoS values, with the same conventions as in Figure 4.1. For each component, software requirements are on the right-hand side (white diamonds) and hardware requirements are on the left-hand side (oblique lines). Things requirements are at the bottom of the gateway modules.


Finally, the set Θ is reported.

Θ = {
⟨irrigationDC, videocamera, ⟨30 ms, ⊥⟩⟩,
⟨irrigationDC, water, ⟨30 ms, ⊥⟩⟩,
⟨irrigationDC, fertiliser, ⟨30 ms, ⊥⟩⟩,
⟨irrigationDC, moisture, ⟨30 ms, ⊥⟩⟩,
⟨irrigationDC, UV, ⟨30 ms, ⊥⟩⟩,
⟨irrigationDC, salts, ⟨30 ms, ⊥⟩⟩,
⟨irrigationDC, flood, ⟨30 ms, ⊥⟩⟩,
⟨firefloodDC, fire, ⟨0 ms, ⊥⟩⟩,
⟨firefloodDC, extinguisher, ⟨0 ms, ⊥⟩⟩,
⟨firefloodDC, floodgates, ⟨0 ms, ⊥⟩⟩
}
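Analogously to the infrastructure, the application triple could be sketched in Python; again, the dictionary-based representation is an illustrative assumption, with values taken from the example above:

```python
components = {  # identifier -> (RAM in GB, software requirements)
    "irrigationDC": (1, {"Linux", "C++"}),
    "firefloodDC": (1, {"Linux", "C++"}),
    "irrigation": (2, {"MySQL", "Python", "C++"}),
    "fireflood": (1, {"Python", "C++"}),
    "insights": (4, {".NET", "MySQL"}),
    "MLengine": (8, {"Spark", "MySQL"}),
}

interactions = {  # (component, component) -> (latency in ms, (down, up) in Mbps)
    ("irrigationDC", "irrigation"): (15, (8, 1)),
    ("firefloodDC", "fireflood"): (5, (2, 2)),
    ("irrigation", "fireflood"): (15, (1, 1)),
    ("irrigation", "insights"): (60, (4, 1)),
    ("irrigation", "MLengine"): (200, (3, 6)),
    ("fireflood", "insights"): (15, (1, 1)),
    ("insights", "MLengine"): (60, (2, 2)),
}

thing_requests = {  # (component, Thing type) -> max latency in ms (bandwidth unspecified)
    ("irrigationDC", "videocamera"): 30,
    ("irrigationDC", "water"): 30,
    ("firefloodDC", "fire"): 0,
    ("firefloodDC", "extinguisher"): 0,
}
```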

4.4 Deployments

The ultimate goal of this work is to determine one or more eligible deployments of an application A over a Fog infrastructure I that satisfy all the hardware, software and QoS requirements and business policies of A. Yet some details are missing before it is possible to achieve our goal. We first formalise the notion of deployment.

Definition 5. Let A = ⟨Γ, Λ, Θ⟩ be an application and I = ⟨T, F, C, L⟩ a Fog infrastructure. A deployment for A over I is a mapping ∆ : Γ → F ∪ C from each software component of A onto a Fog or Cloud node of I.

Since an IoT application deployment is closely related to the Things it should manage, a deployment should not only consider mapping software components onto the nodes that will host them, but also Things requirements onto the physical-world sensors and actuators that the application will exploit at runtime. The deployment designer, given the specification of an application A = ⟨Γ, Λ, Θ⟩, should be able to define the Things the application will rely upon once deployed, i.e. she should be allowed to specify the bindings between Things requirements and Things in the real world. Consider the case of an application that smartly manages household appliances (e.g., washing machine, oven, etc.): it is easy to convince ourselves that sensors and actuators must be exactly identified for each running instance of the application.

Additionally, the deployment of multi-scale software systems may depend on legal, commercial or political business policies. For instance:

• a start-up sponsored by a specific Cloud provider may want to enforce the (free) use of the datacentres owned by its sponsor for the deployment of its application,

• an automated industrial plant may be interested in keeping on local Fog nodes those components that include undisclosable business procedures or data,

• one of the components in the application can be an invoked third-party service, already deployed at some specified endpoint(s),

• a Cloud provider also manages a business that represents a direct competitor of the application to be deployed, so the system administrator does not want to exploit the datacentres owned by that provider.

It is hence fundamental for the model to be expressive enough to capture requirements for Things specification and business policies.

Definition 6. Let A = ⟨Γ, Λ, Θ⟩ be an application and I = ⟨T, F, C, L⟩ a Fog infrastructure. A deployment specification for A over I is a couple ⟨∆P, ϑP⟩ such that:

• ∆P : Γ → 2^(F∪C) specifies, for each component of A, the set of Fog or Cloud nodes where its deployment is permitted (or has already been performed)¹,

• ϑP : Θ → T maps each Thing request onto a specific Thing.

¹We assume the shortcut notation ∆
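A deployment specification can be sketched as a pair of mappings. In the fragment below, the whitelists and Thing bindings are invented for illustration (e.g., forcing MLengine onto cloud_2, coherently with the service being centrally managed by PlantsML LTD in Oregon), and are not part of the model itself:

```python
delta_P = {  # component -> nodes where its deployment is permitted
    "irrigationDC": {"local_1", "local_2"},
    "firefloodDC": {"local_2"},
    "irrigation": {"local_1", "local_2", "consortium_1"},
    "fireflood": {"local_1", "local_2", "consortium_1"},
    "insights": {"consortium_1", "cloud_1", "cloud_2"},
    "MLengine": {"cloud_2"},  # a third-party service, already deployed there
}

theta_P = {  # Thing request -> concrete Thing bound to it
    ("irrigationDC", "videocamera"): "videocamera0",
    ("irrigationDC", "moisture"): "moisture0",
    ("firefloodDC", "fire"): "fire0",
}

# A candidate deployment Delta is then just a mapping from components to nodes:
delta = {"irrigationDC": "local_1", "firefloodDC": "local_2",
         "irrigation": "local_2", "fireflood": "local_2",
         "insights": "consortium_1", "MLengine": "cloud_2"}

# A first necessary check: every component lands on a whitelisted node.
assert all(delta[c] in delta_P[c] for c in delta)
```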


Both elements in the definition are important. The function ∆P specifies the nodes where a certain component can be (or is already) deployed, according to the current business policies. It implements a whitelisting strategy, which is safer to avoid exploiting undesired computational capabilities for deployment purposes¹. The function ϑP ensures that one exactly knows and controls the Things that the application will exploit, which is essential for the whole deployment to work, sensing from and actuating upon the correct Things.

What has been discussed so far states necessary but not sufficient conditions for a deployment to be eligible. They do not include any checks on compatibility, nor on whether the Fog infrastructure can support the interactions among software components and between software components and Things. Particularly, if more than one component is mapped onto the same node, the model shall account for the fact that the consumable hardware resources are enough to support them all. Analogously, if more than one interaction ends up onto the same communication link, the latency requirements must be met and the available link bandwidth shall not be exceeded by the requested bandwidth.

Definition 7. Let A = hΓ, Λ, Θi be an application, I = hT, F, C, Li a Fog infras-tructure and h∆P, ϑPi a related deployment specification. A deployment ∆ for A

over I that complies with h∆P, ϑPi is eligible if and only if:

1. for each γ ∈ Γ, γ is compatible with ∆(γ) and ∆(γ) ∈ ∆P(γ),

2. let Γf = {γ ∈ Γ | ∆(γ) = f } be the set of components mapped onto f ∈ F .

Then, for each f = hi, π,H, Σ, Θi ∈ F X

hj,H,Σi∈Γf

H  H 2,

3. for each Thing request hγ, τ, qi ∈ Θ such that ϑP(hγ, τ, qi) = t, there exists

f = hi, π,H, Σ, Θi ∈ F such that t ∈ Θ and ∆(γ) = f or h∆(γ), f, q0i ∈ L.

1In some cases, it would be more concise to adopt a blacklisting mechanism to specify those nodes that are not suitable for deployment. Even if this approach may lead to problems (suppose not to know a competitor inaugurates a new datacentre), nothing prevents extensions of the model to support either strategy.

2We assume it is possible to “sum” the consumable hardware requirements so as to compare their “total” with the available offerings at a given Fog node.


Chapter 4. Modelling the Fog

4. let

   Λ(m,n) = {⟨γ, γ′, ⟨ℓ, b⟩⟩ ∈ Λ | ∆(γ) = m ∧ ∆(γ′) = n}

be the set of software dependencies that map onto the communication link between m and n, and let

   Θ̄(m,n) = {⟨γ, τ, ⟨ℓ, b⟩⟩ ∈ Θ̄ | ∆(γ) = m ∧ ϑ(⟨γ, τ, ⟨ℓ, b⟩⟩) = t ∧ n = ⟨i, π, H̄, Σ, Θ⟩ ∈ F : t ∈ Θ}

be the set of Thing requests that exploit non-directly connected Things and map onto the communication link between m and n. Then, for each ⟨m, n, ⟨ℓ̄, b̄⟩⟩ ∈ L

   (∀ ⟨x, y, ⟨ℓ, b⟩⟩ ∈ Λ(m,n) ∪ Θ̄(m,n). ℓ̄ ≤ ℓ)  ∧  ∑_{⟨x, y, ⟨ℓ, b⟩⟩ ∈ Λ(m,n) ∪ Θ̄(m,n)} b ≤ b̄.

This definition states all the conditions for a deployment of A over I to be eligible in our model, according to the current state of the Fog infrastructure. Condition (1) guarantees that the business policies of A are met and checks the hardware and software compatibility of each component with the node onto which it will be deployed, as in Definition 4. Condition (2) checks those hardware capabilities that are consumed when installing more than one component onto the same Fog node, e.g. RAM and storage: the components deployed onto a single node cannot exceed its hardware capacity, e.g. two components requiring on average 3 GB of RAM each cannot be deployed onto the same node that only offers 4 GB of RAM. Condition (3) ensures that the Thing requests of each component are satisfied. A component γ ∈ Γ deployed onto n ∈ F ∪ C can reach Things directly (when n directly connects to them) or access them remotely (when n reaches a Fog node m that directly connects to them); both situations are taken into account by the model. Condition (4) ensures that the latency and bandwidth of the communication links can support all the interactions that will map onto them. This concerns interactions in Λ as well as remote Thing access. As in condition (3), the model assumes that a Fog node can exploit neighbouring Things


when needed, which consumes bandwidth over links. This is important, since the amount of data generated by Things can be considerable, e.g. in the case of constant tracking and monitoring of a fleet of flying drones for package delivery [5]. Overall, the latency offered by a link must not exceed the required one, and the bandwidth consumed by component interactions and by remote Thing access over the same link must not exceed the link capacity.
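As an illustrative sketch (hypothetical Python, not part of FogTorch; RAM is taken as the only consumable hardware resource, and links are keyed by ordered node pairs), conditions (2) and (4) of Definition 7 can be checked as follows:

```python
from collections import defaultdict

def check_capacity(delta, ram_req, ram_cap):
    """Condition (2): components mapped onto the same node must fit its RAM."""
    used = defaultdict(int)
    for comp, node in delta.items():
        used[node] += ram_req[comp]
    return all(used[node] <= ram_cap[node] for node in used)

def check_links(delta, interactions, links):
    """Condition (4): each link must offer at most the latency required by every
    interaction mapped onto it, and the total requested bandwidth must not
    exceed the link's capacity. interactions: (c1, c2) -> (latency, bandwidth);
    links: (m, n) -> (offered latency, offered bandwidth)."""
    demand = defaultdict(int)
    for (c1, c2), (lat_req, bw_req) in interactions.items():
        m, n = delta[c1], delta[c2]
        if m == n:                    # co-located components traverse no link
            continue
        lat_off, _ = links[(m, n)]
        if lat_off > lat_req:         # offered latency exceeds the requirement
            return False
        demand[(m, n)] += bw_req
    return all(bw <= links[link][1] for link, bw in demand.items())

# Toy example: two components of 3 GB each cannot share a 4 GB node.
print(check_capacity({"c1": "f1", "c2": "f1"},
                     {"c1": 3, "c2": 3}, {"f1": 4}))   # False
```

Note how condition (4) fails fast on latency, while bandwidth is only checked after accumulating the total demand per link, mirroring the conjunction in the definition.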

4.4.1 Motivating Example: Deployments

In the Old MacDonald's Farm scenario, the only components that require connections to Things are the data collectors, and the only component subject to business constraints is the ML Engine, which is deployed and maintained by Plants LTD onto cloud_2. We start from the definition of ∆P; as aforementioned, the shortcut ⊤ stands for F ∪ C.

   ∆P(γ) = cloud_2   if γ = MLengine
   ∆P(γ) = ⊤         otherwise

Then, we explicitly list the function ϑP mapping the Thing requests of the application onto actual Things of the Fog infrastructure.

ϑP(⟨irrigationDC, videocamera, ⟨30 ms, ⊥⟩⟩) = videocamera0

ϑP(⟨irrigationDC, water, ⟨30 ms, ⊥⟩⟩) = water1

ϑP(⟨irrigationDC, moisture, ⟨30 ms, ⊥⟩⟩) = moisture0

ϑP(⟨irrigationDC, UV, ⟨30 ms, ⊥⟩⟩) = UV0

ϑP(⟨irrigationDC, salts, ⟨30 ms, ⊥⟩⟩) = salts0

ϑP(⟨irrigationDC, temperature, ⟨30 ms, ⊥⟩⟩) = temperature0

It is worth noting that all sensors exploited by irrigationDC are directly connected to local_1, except for the temperature sensor (at the Consortium node) and


the watering sprinkler (at local_2).

ϑP(⟨firefloodDC, fire, ⟨30 ms, ⊥⟩⟩) = fire0

ϑP(⟨firefloodDC, extinguisher, ⟨30 ms, ⊥⟩⟩) = extinguisher0

ϑP(⟨firefloodDC, flood, ⟨30 ms, ⊥⟩⟩) = flood0

ϑP(⟨firefloodDC, floodgates, ⟨30 ms, ⊥⟩⟩) = floodgates0

At this point, all the elements are there to show a valid deployment ∆ of A over I.

   ∆(irrigationDC) = local_1
   ∆(firefloodDC) = local_2
   ∆(irrigation) = consortium_1
   ∆(fireflood) = consortium_1
   ∆(insights) = consortium_1
   ∆(MLengine) = cloud_2

Figure 4.3 shows the result of the proposed eligible deployment. In the next chapter, the problem of finding one (or all) eligible deployments is discussed.
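For illustration, the deployment specification and the deployment above can be encoded in a few lines (hypothetical Python, not part of FogTorch; the node set lists only the nodes named in the example, possibly a subset of F ∪ C), making the whitelisting part of condition (1) a one-line check:

```python
# Nodes named in the example (assumed to stand for F ∪ C here).
ALL_NODES = {"local_1", "local_2", "consortium_1", "cloud_2"}

def delta_P(comp):
    """Business policies: MLengine is bound to cloud_2, any node otherwise."""
    return {"cloud_2"} if comp == "MLengine" else ALL_NODES

# The eligible deployment ∆ shown above.
delta = {
    "irrigationDC": "local_1",
    "firefloodDC": "local_2",
    "irrigation": "consortium_1",
    "fireflood": "consortium_1",
    "insights": "consortium_1",
    "MLengine": "cloud_2",
}

# Condition (1), whitelisting part: every component lands on an allowed node.
print(all(node in delta_P(comp) for comp, node in delta.items()))  # True
```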


Figure 4.3: The infrastructure of Figure 4.2.1, when deploying SmartFields. Exploited Things are coloured as the software component that uses them. The RAM capability of each node is shown graphically beside Fog nodes: each rectangle is 1 GB and is coloured as the component consuming it. Link bandwidths have been updated and the interactions mapped onto each link are listed under the QoS profiles.


Chapter 5

Finding Eligible Deployments

The problem of finding an eligible deployment for an application over a Fog infrastructure is analogous to multi-constrained planning problems that have been studied in contexts other than Fog computing, such as Virtual Network mapping [34], deployment over Wide Area Networks [32] or, more recently, over the Cloud [36, 19]. Algorithms to determine eligible deployments over Fog infrastructures should manage latencies, uplink and downlink bandwidths, constraints over the Things that a component uses, business policies, location awareness, resource allocation, the possibility to deploy more than one component onto a single computational node, and more than one interaction onto a single communication link. Starting from an empty (or partial) deployment, they should search for eligible (Definition 7) assignments ∆ mapping all software components to Fog or Cloud nodes.

After discussing the complexity of our Components Deployment Problem (CDP), procedures are presented to preprocess the input data and to search for an eligible solution, also exploiting heuristic strategies.

We have also prototyped a simple offline Java tool, named FogTorch¹, that implements (the core of) the devised algorithms with the purpose of demonstrating their technical feasibility. The tool has been tested over the motivating example of this thesis and is briefly discussed at the end of this chapter.

¹ Systemic QoS-aware Deployment over the Fog. Available at:


Figure 5.1: The search space for CDP.

5.1 Problem Complexity

Given an application A = ⟨Γ, Λ, Θ̄⟩ and a Fog infrastructure I = ⟨T, F, C, L⟩, defined as in Chapter 4, finding one or more eligible deployments ∆ : Γ → F ∪ C is a decidable problem that requires a worst-case search among O(N^|Γ|) candidate deployments, where N = |F ∪ C|. Indeed, it is possible to brute-force the deployment of all components onto all nodes in F ∪ C, one at a time. This corresponds to visiting a (finite) search-space tree with branching factor at most N and height |Γ|, as sketched in Figure 5.1. Each node in the state space represents a (partial) deployment, where the number of deployed components corresponds to the level of the node, i.e. the root corresponds to an empty deployment, nodes at level i are partial deployments of i components, and leaves at level |Γ| contain complete deployments. Edges from one node to another represent the action of deploying a component onto some Fog or Cloud node.
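The exhaustive visit just described can be sketched as follows (illustrative Python, with hypothetical names): it enumerates all N^|Γ| leaves of the search tree and filters them through an eligibility predicate standing in for Definition 7.

```python
from itertools import product

def brute_force(components, nodes, eligible):
    """Enumerate all len(nodes) ** len(components) complete deployments (the
    leaves of the search tree of Figure 5.1) and yield the eligible ones."""
    for assignment in product(nodes, repeat=len(components)):
        delta = dict(zip(components, assignment))
        if eligible(delta):
            yield delta

# Toy run: 2 components over 3 nodes gives 9 candidates; a predicate
# forbidding co-location leaves 6 eligible deployments.
components = ["gamma1", "gamma2"]
nodes = ["f1", "f2", "cloud"]
eligible = lambda d: d["gamma1"] != d["gamma2"]
print(len(list(brute_force(components, nodes, eligible))))  # 6
```

In practice one would prune partial deployments as soon as a condition of Definition 7 is violated, rather than generating complete leaves first; this sketch only illustrates the size of the search space.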
