
Analytical assessment of binary data serialization techniques in IoT context (evaluating protocol buffers, flat buffers, message pack, and BSON for sensor nodes)


Academic year: 2021


Analytical Assessment of Binary Data Serialization Techniques in IoT Context

[Evaluating Protocol Buffers, FlatBuffers, MessagePack, and BSON for Sensor Nodes]

Amrit Kumar Biswal

Obada Almallah

Student Id: 873052

Student Id: 894304

Advisor: Prof. Marco Brambilla

Dipartimento di Elettronica, Informazione e Bioingegneria

Politecnico di Milano

This thesis is submitted for the degree of

Master of Science in Computer Science and Engineering

December 2019


Acknowledgments

First and foremost, we would like to express our appreciation for the power of teamwork and the synchronized efforts that drove the execution of this thesis.

We would like to express our gratitude to our supervisor Prof. Marco Brambilla, who guided us in the right direction and helped us overcome the obstacles we faced.

Moreover, we would like to acknowledge the vigorous efforts of the open-source community in developing and maintaining the technologies that helped us conduct our experiments.

Last but not least, we would like to thank our families and friends for the support, motivation, and inspiration that helped us stay on track. Special


Abstract

The exceptional development of IoT-enabled technologies has been pervasively fostering smart solutions that drive innovation in various industries. This wide-scale and rapid development of the IoT market faces challenges related to interoperability, caused by fragmented technologies and applications, and to the need for more IoT-specific standardization of the IoT infrastructure that takes the constraints and limitations of IoT applications into account. These issues can be traced back to the need for better resource efficiency in IoT systems, given the resource-constrained nature of IoT devices and the limited bandwidth of low-power networks, which in turn pushes the IoT industry to apply different measures to achieve efficiency on the devices and on the network. These measures aim at reducing overall traffic at the network level as well as power and memory consumption at the device level. With optimal network and device settings, an IoT system gains the capacity to offer more services at the application level, which accelerates the achievement of the vision and expectations of IoT. One of the operations affecting device-level efficiency is the mechanism applied to structure and serialize the payload data. In this thesis we evaluate the resource efficiency of a group of binary-based mechanisms in handling sensor node data, with the purpose of arriving at candidate technique(s) that have the potential to be standardized within the IoT infrastructure as interoperable data format protocol(s).

Keywords: IoT, Internet-of-things, Performance, Evaluation, Binary, Semantic, Data Serialization, Protocol Buffers, FlatBuffers, MessagePack, BSON


Sommario

The exceptional development of IoT-enabled technologies has pervasively promoted the adoption of smart solutions that drive innovation in various industrial sectors. This rapid and substantial development of the IoT market must nevertheless face, because of the fragmentation of the technologies and applications in use, several challenges tied to interoperability and to the need for more specific standardization of the IoT infrastructure that takes the constraints and limitations of IoT applications into account. These issues are rooted in the need for a more efficient use of the resources available in IoT systems, owing to the limited means inherent in IoT devices and to the constraints imposed by network capacity, which in turn spur the IoT industry to apply different solutions aimed at the efficiency of such devices on such networks. These solutions aim to reduce overall traffic at the network level while limiting energy and memory usage at the device level. With optimal settings for both the network and the device, the IoT system becomes able to provide more services at the application level, which in turn accelerate progress toward the goal of IoT. One of the operations that influence device-level efficiency lies in the mechanism used to structure and serialize the data payload. In this thesis, we evaluate a group of binary-based mechanisms on the basis of their efficiency in handling sensor node data, with the aim of isolating the techniques that have the potential to be standardized within the IoT infrastructure as interoperable data format protocols.


Contents

Acknowledgments
Abstract
Sommario
1 Introduction
1.1 Context & Motivation
1.2 Research Goals
1.3 Thesis Structure
2 Background
2.1 Internet of Things
2.2 Key Features and Challenges
2.3 The Internet of Things Architecture
2.4 Connectivity Protocols/Standards
2.5 Models
2.6 Data Exchange Protocols
2.7 Data Format Protocols
2.7.1 Text-Based Protocols
2.7.2 Binary-based Protocols
3 Related Work
3.1 Data Exchange Protocols Evaluation
3.2 Payload Optimization Approaches
3.3 Comparative Discussion
4 Binary Data Serialization in Sensor Nodes
4.1 Introduction and Case Statement
4.2 Implemented Protocols
4.3.2 Quantitative Analysis
5 Hardware Setup and Implementations
5.1 NodeMCU Specifications
5.2 Arduino IDE
5.3 Platform Configuration
5.4 Protocol Buffers Implementation
5.5 FlatBuffers Implementation
5.6 MessagePack Implementation
5.7 BSON Implementation
6 Simulation Setup and Implementation
6.1 Contiki OS
6.2 Cooja Simulator
6.3 Simulation setup
6.4 Benchmarking
6.5 Nanopb (Protocol Buffers)
6.6 FlatCC (FlatBuffers)
6.7 CWPack (MessagePack)
6.8 BINSON (BSON partial implementation)
7 Experiments and Discussion
7.1 Payload Structure Types
7.2 Hardware Runs
7.3 Simulation Runs
7.4 Results
7.4.1 Hardware Results & Discussion
7.4.2 Simulation Results & Discussion
7.5 Qualitative Outlook
8 Conclusions
8.1 Workflow Summary
8.2 Critical Discussion
8.3 Future Work

List of Tables

7.1 Resulted Payload Size (H/W)
7.2 Serialization Speed (H/W)
7.3 Deserialization Speed (H/W)
7.4 Total Speed Performance (H/W)
7.5 Library Overhead (H/W)
7.6 Total Payload Size (simulation)
7.7 Serialization time (simulation)
7.8 Deserialization time (simulation)
7.9 Library Overhead (simulation)
7.10 Memory Footprint - serialization (simulation)
7.11 Memory Footprint - deserialization (simulation)
7.12 Memory Footprint – Serialization with Deserialization (simulation)
7.13 Average Power Consumption (simulation)
7.14 Qualitative View

List of Figures

2.1 IoT Verticals
2.2 IoT Building Blocks
2.3 IoT Application Model
2.4 Bus-Based Model
2.5 Broker-Based Model
6.1 Size command outputs
7.1 Visualizing Average Power Consumption


Introduction

1.1 Context & Motivation

The Internet of Things has been the driving force behind the intelligence embedded in various environments. These complex networks, which connect millions of devices and people around the world, are the backbone of “Industry 4.0”, enabling fast and autonomous bi-directional data communication between machinery and devices. With its constant growth and incorporation into numerous and diverse industries, the expectations and figures for IoT applications and technologies show how IoT will become an essential infrastructure powering people’s daily activities and interactions, owing to the rapidly increasing number of connected “things”. The capabilities and advancements of self-identification, localization, and data acquisition through positioning and sensing technologies remain at the heart of IoT-powered innovations.

Sensor data constitutes the majority of the IoT business logic, since IoT applications rely mainly on sensor data to perform the monitoring, control, and optimization operations that eventually enable automation and pave the way for novel applications. In an IoT system, sensor devices can be deployed in Wireless Sensor Networks (WSN), where sensors communicate with each other in a multi-hop or single-hop fashion, relying on radio frequency mechanisms to exchange information and then send target data over the internet. Alternatively, they can be deployed in a client/server architecture, where each sensor node sends data over the internet to an application server, which processes it to offer services to end-users such as smart parking, smart factory, and smart city applications.

Regardless of the IoT system architecture, applications feed intensively on sensor data, and these data-harvesting devices have resource constraints related to computing power, memory, energy consumption, and limited bandwidth tolerance. These constraints stem from the nature of sensor nodes, which run on batteries and on low-powered microprocessors and microcontrollers with limited memory and clock frequency. Also, for IoT environment setup, deployment, and operations to achieve the IoT vision and expectations, connected devices need to communicate seamlessly through low-bandwidth networks (e.g. 6LoWPAN) where the data link layer payload is extremely limited (e.g. to 81 bytes once the IEEE 802.15.4 link headers are accounted for), so fragmentation becomes more probable, increasing the number of data packets sent and received. This in turn results in bandwidth exhaustion over time, preventing other services from being offered on the network. The main factors that increase the amount of data traffic are the overheads of IoT application layer protocols, the payload size, and retransmissions caused by data loss [1]. While several papers address and evaluate the impact of application layer protocols on the overall data traffic, for example [2, 3, 4], the payload size remains an essential contributing factor, as it can cause fragmentation at the data link layer, which should be avoided because it shortens the battery life of the node [5, 6]. The payload size also typically contributes to the usage of the limited storage capacity of sensor nodes.

In addition, due to the vast and rapid development of IoT technologies, several interoperability issues have emerged that threaten the success and service quality of IoT applications, including issues related to semantic and cross-domain interoperability [7]. These issues threaten the IoT vision defined by the International Telecommunication Union’s Telecommunication Standardization Sector (ITU-T) of IoT as a global infrastructure enabled by interoperable information and communication technologies [8], a vision that emphasizes the role of the machine-to-machine (M2M) paradigm as a key driver of IoT. To achieve that, the ITU highlights the importance of IoT standardization that should underlie the different components of the IoT infrastructure. In fact, standardization and interoperability constitute the major challenges in scaling out IoT and M2M systems to match the various market demands [9]. One of these components is the methodology of data representation formats, which should also adhere to the interoperability condition. Several research papers have surveyed, analyzed, and evaluated enabling technologies across the different layers of the IoT stack in order to provide a standardization reference to follow; a few examples are [10, 11, 12].

Therefore, optimizing the structure and size of the payload is an important task, and the mechanisms chosen to perform it are required to support heterogeneity and interoperability in order to fully achieve the IoT vision and expectations. In this thesis, the mechanisms we evaluate operate on the semantic layer, which is concerned with the format of the IoT sensor node payload. Currently, text-based formats such as JSON and XML remain common practice for structuring and serializing sensor node data. These formats came into use mainly because of the increasing number of embedded systems connected to the internet, where connectivity requires a simple yet efficient format for interchanging data between connected devices \cite{6861361}, and between the devices and the web applications that in many cases rely mainly on these formats. This is due to their standardized support for the interoperability and simplicity requirements of web applications. In an IoT system, the advantages of these formats, namely human readability and the simplicity of the serialization and parsing processes, are outweighed by the resulting size of the messages exchanged, caused by the overhead of the additional characters and labels required, and by the processing effort needed to serialize and deserialize these messages. These costs are affordable in standard web application computer systems but significant in resource-constrained IoT sensor nodes. Moreover, in M2M communication, which does not require human involvement [8], lightweight messages, speed, and power efficiency are major priorities, while a human-readable message format is not necessary.
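The overhead of field labels and punctuation in text-based formats can be made concrete with a small comparison. The following is a minimal sketch, not one of the four techniques evaluated in this thesis: it encodes a hypothetical sensor reading once as compact JSON and once as a fixed binary layout using Python's standard `struct` module. The field names, values, and the layout itself are assumptions chosen for illustration.

```python
import json
import struct

# Hypothetical sensor reading; field names and values are illustrative only.
reading = {
    "node_id": 17,
    "temperature": 23.4,
    "humidity": 41.2,
    "timestamp": 1575158400,
}

# Text-based encoding: labels and punctuation travel with every message.
json_bytes = json.dumps(reading, separators=(",", ":")).encode("utf-8")

# Assumed fixed layout known to both ends: little-endian uint16 node id,
# two float32 measurements, uint32 timestamp. No labels are transmitted.
LAYOUT = struct.Struct("<HffI")
binary_bytes = LAYOUT.pack(
    reading["node_id"],
    reading["temperature"],
    reading["humidity"],
    reading["timestamp"],
)

# Decoding needs the same layout on the receiving side.
node_id, temperature, humidity, timestamp = LAYOUT.unpack(binary_bytes)

print(len(json_bytes), "bytes as compact JSON")
print(len(binary_bytes), "bytes as packed binary")
```

The binary form is smaller because the structure is agreed on in advance rather than described in every message, at the cost of human readability and of float32 precision in this particular layout.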

1.2 Research Goals

The research aims at evaluating the performance of selected binary-based data serialization techniques in structuring and preparing sensor data. The goal is an analytical assessment of these techniques, obtained by testing each protocol's performance under different payload settings, in order to explore and examine the resource optimization gains that these techniques could bring in terms of the resulting payload size and the effort of serializing and deserializing such payloads, and thus to achieve more optimal and resource-efficient IoT settings that could also support the interoperable IoT vision.

The selected data serialization techniques are Protocol Buffers, FlatBuffers, MessagePack, and Binary JSON (BSON). The hypothesis is that binary-based serialization techniques will show an advantage in terms of the resulting payload size and a lower consumption of sensor node resources, mainly memory, processing, and energy, regardless of the temporal/spatial nature of the transmitted data. It is also hypothesized that a ranking of the techniques can be established for each performance parameter, arriving at candidate technique(s) that could become standard data formats in IoT systems. To carry out the experiments that constitute the benchmark of our comparison, the selected data serialization techniques are tested on two platforms: a microcontroller-based platform, NodeMCU, and an OS-based platform, where the Cooja simulator is used to run tests on Zolertia Z1 motes running Contiki OS.


1.3 Thesis Structure

The rest of this thesis is organized as follows:

Chapter 2 walks through the background knowledge describing the concepts that are related to the work that has been performed for this thesis.

Chapter 3 presents the literature work relevant to this thesis along with the insights they provide.

Chapter 4 explains the details of the employed methodology, tools and technology used in this thesis.

Chapter 5 describes the development and implementations of the experiments done on Hardware.

Chapter 6 describes the development and implementations of the Simulation based experiments.

Chapter 7 presents the results of the experiments and discusses the outcomes.

Chapter 8 concludes this report by summarizing the work, providing a critical discussion and advising on the outlook of future work.


Background

This chapter provides a primer on the theoretical background in the domain of IoT, which should prove useful for following the rest of the thesis.

2.1 Internet of Things (IoT)

According to the McKinsey Global Institute’s 2015 report, The Internet of Things: Mapping the Value Beyond the Hype (see footnote 1), IoT is estimated to have a potential economic impact of as much as $11.1 trillion per year by 2025 [13]. As per current trends, Gartner forecasts a 32.9% compound annual growth rate (CAGR) from 2015 through 2020 [14], reaching an installed base of 20.4 billion units for Internet of Things endpoints. Cisco estimates that IoT will consist of 50 billion devices connected to the Internet by 2020 [15]. With a paradigm shift towards service-driven businesses, the potential for IoT seems limitless. IoT application settings can be organized into multiple verticals. The main aim of IoT is to create smart environments for applications that can be deployed in: Cities - public health, environment and transportation; Vehicles - autonomous vehicles and maintenance; Home - chore automation and security; Offices - security and energy; Factories - operations and equipment optimization; Retail environments - transactions and customer experience; Work Sites - operations, regulations, health and safety; Human - health and fitness; Outside - logistics and navigation.

Examples of these application solutions include: Smart Parking, Infrastructure Health maintenance and management, Smart Lighting solutions, Waste Management, City Safety and Forest Fire detection, Air Pollution monitoring, Snow Level monitoring, Earthquake early detection, Gas & Water Leakage monitoring, Supply Chain Control solutions, Inventory management, NFC payments, Smart Shopping applications, Medical Fridges and Transport, Heart rate and Sleep Quality monitoring, Energy & Water Consumption, Remote Control Appliances, Security Systems, Indoor Climate Control, Smart Power Grid, Water Flow control, Goods Shipment Control, Fleet Tracking and many others.

Figure 2.1: Verticals in IoT

1 https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/the-internet-of-things-the-value-of-digitizing-the-physical-world

2.2 Key Features and Challenges

In recent times, developments in AI, smart sensors, communication technologies, and communication protocols have empowered IoT, fuelling the concept of smart environments around us. Our environments have been pervasively enabled with physical, virtual, augmented and mixed reality experiences by smart objects that communicate with each other (M2M) as well as with us (H2M). All of this falls under the broad umbrella of IoT, where the Internet of Things has become an enabler of new solutions and experiences impacting our daily lives.

Key Features

Some key features that characterize an IoT system include: Intelligence - some degree of intelligence aids smarter decision making; Connectivity - anything can be connected to the internet; Dynamic nature - the ever-changing and ever-evolving state of things around us provides pools of data that enable modern problem solving; Enormous scale - scalability is at the pinnacle of needs for today’s high-demand applications; Sensing - sensors are key to data acquisition; Things-related services - services in the physical world; Heterogeneity - differences in networks or in device hardware and software are an integral part of the solution and should not pose a problem; Seamless connectivity - effortless transitions across use case scenarios; Security - the privacy of data and the security of deployed systems are crucial; Autonomy - devices should operate free from intervention; Pervasiveness - device integration can range from natural to austere environments.


Challenges

The term IoT refers to a concept in which interconnected things are integrated into situational decision support, asset management and new services which in turn bring business opportunities and add to the existing complexity of IT. IoT is a fast-emerging ecosystem that integrates both IP-connected and non-IP devices. To accommodate the diversity of the IoT, there is a heterogeneous mix of communication technologies, which need to be adapted in order to address the needs of IoT applications such as energy efficiency, security, and reliability.

IoT solutions have to constantly make careful design decisions that deal with balancing among several trade-offs across a board of factors starting from Cost, Throughput, Complexity, Intelligence, Autonomy, Robustness, Efficiency, and Energy requirements. These factors are crucial to evaluating any IoT solution.

To briefly summarize the challenges that IoT faces, we can group them under five main categories, namely Connectivity, Power, Security, Complexity and Rapid Evolution.

• Connectivity: There is no single clear winner among connectivity standards. For now, cellular networks appear to be ahead by virtue of their established infrastructure, but the wide variety of wired and wireless standards, together with proprietary implementations growing in popularity, poses the challenge of getting connectivity standards to talk to one another with one common worldwide data currency.

• Power management: Self-sustaining, energy-harvesting and battery powered devices are on an endless path trying to extract every bit of efficiency with respect to energy consumption and device life.

• Security: With data serving as the most important asset of IoT, the privacy of data and the security of both data and devices are paramount. The challenges here include improving the security of connectivity protocols and hardware as well as educating consumers about it.

• Complexity: From designing new solutions to choosing between options at every layer of the architecture, simplifying poses another challenge.

• Rapid evolution: IoT is fast evolving although it is considered to be in its early days. The industry needs to be prepared to face new challenges, opportunities and find new solutions every day. Adaptability is a key quality any new development should possess.

Data is the essence in IoT applications, and much development has been done regarding the core philosophies of IoT. To utilize the collected data into services, data must be propagated. Therefore, a wide variety of wired and wireless connectivity technologies have been developed to meet the demands of the IoT industry. Moreover, various types of hardware platforms with connectivity solutions and power-efficient technologies continuously being developed provide the interface to IoT solutions.


Figure 2.2: Building Blocks of IoT

In the following sections, we first describe the IoT architecture, communication protocols/ standards and models then we briefly cover some of the most common data exchange protocols (also referred to as application layer and sometimes session layer protocols) before finally describing the data format protocols that we primarily focus on in this thesis.

2.3 The Internet of Things Architecture

An architecture for IoT implementations consists of several layers: from the data acquisition layer at the bottom to the application layer at the top \cite{8058399}. The layered architecture has two distinct divisions with an Internet layer in between to serve the purpose of a common media for communication. The two lower layers contribute to data capturing while the two layers at the top are responsible for supporting Internet of Things applications \cite{8058399}.

Figure 2.3 below represents an implementation model of the Internet of Things for data utilization in applications.

If we visualize the umbrella term of IoT Solutions, these solutions encompass either one or all of the following: Hardware (end nodes and sensor devices), The Internetworking (Wired, wireless mediums and protocols) and the Software (Node software, Cloud deployment, remote applications and dashboards). Having this segregation enables decoupled development of parts of a complete solution.

Figure 2.3: Model for IoT Applications

An interesting aspect of IoT applications is that most of them are deployed using new wireless communication standards and protocols. In the context of IoT, sensor networks seem set to dominate in the future. When it comes to deployment, networks can be configured in a variety of ways; broadly, these can be single-hop or multi-hop networks, and the topology can use the application layer communication protocols accordingly. HTTP is slow (because of connection acknowledgments), and its header size overhead is not feasible for energy-constrained devices. The IIoT landscape is dominated by two application protocols: MQTT and CoAP. On top of these, data/message interchange relies on multiple protocols and libraries that enable more efficient communication. At the base of these protocols lies the data/message itself. By design, this data needs to be serialized to be sent across the network over the protocol. Usually it contains telemetry and identification data, and serialization solves many issues that would exist without it. Popular serialization formats include JSON and BSON. However, their applicability extends far beyond the context of IoT into the Distributed Systems domain.
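The remark that telemetry data must be serialized before it is sent over the network can be illustrated with a hand-written encoder for a small subset of MessagePack, one of the binary formats named in the title of this thesis. This is a minimal sketch under stated assumptions: it is not a complete MessagePack implementation, the function name `pack` and the sample reading are hypothetical, and only booleans, unsigned integers below 2^32, float64 values, strings up to 31 bytes, and maps up to 15 entries are handled.

```python
import json
import struct


def pack(value):
    """Encode a value using a small subset of the MessagePack format."""
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return b"\xc3" if value else b"\xc2"
    if isinstance(value, int):
        if 0 <= value <= 0x7F:
            return bytes([value])                      # positive fixint
        if 0 <= value <= 0xFF:
            return b"\xcc" + struct.pack(">B", value)  # uint 8
        if 0 <= value <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", value)  # uint 16
        if 0 <= value <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", value)  # uint 32
        raise ValueError("integer outside the subset handled by this sketch")
    if isinstance(value, float):
        return b"\xcb" + struct.pack(">d", value)      # float 64, big-endian
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) > 31:
            raise ValueError("only fixstr (up to 31 bytes) is handled")
        return bytes([0xA0 | len(raw)]) + raw          # fixstr
    if isinstance(value, dict):
        if len(value) > 15:
            raise ValueError("only fixmap (up to 15 entries) is handled")
        out = bytes([0x80 | len(value)])               # fixmap
        for key, item in value.items():
            out += pack(key) + pack(item)
        return out
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical telemetry message with identification and measurement fields.
reading = {"id": 7, "t": 21.5, "ok": True}
packed = pack(reading)
as_json = json.dumps(reading, separators=(",", ":")).encode("utf-8")

print(packed.hex())
print(len(packed), "bytes packed vs", len(as_json), "bytes as compact JSON")
```

Unlike a fixed binary layout, this encoding remains self-describing: keys and type tags are still transmitted, so the saving over JSON comes from compact type prefixes and binary numbers rather than from dropping the labels.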

2.4 Connectivity Protocols/Standards

Taking into account the range covered and characteristics of connectivity protocols they can be divided into:

• Personal Area Network – ANT, Bluetooth/LE, 6LoWPAN

• Local Area Network – ZigBee, WiFi

• Cellular – GPRS, 3G, LTE

• Cellular-like Radio – LoRaWAN, SigFox, Weightless

• Radio Frequency Network – RFID, NFC

Given that this thesis focuses on optimizing payload operations in sensor node IoT applications, the connectivity protocols discussed in this chapter are limited to those designed to deal with relatively small, intermittent payloads generated by devices that require low-power networks and hence benefit from payload optimization techniques that further reduce resource consumption. The connectivity protocols used in other, more power-, data-, and bandwidth-demanding IoT applications (e.g. real-time video surveillance) are not discussed for the sake of brevity. This limits the discussion to commonly used protocols deployed in sensor applications: 6LoWPAN, ZigBee, cellular technologies, Wi-Fi, and LoRaWAN.

6LoWPAN

As the full name implies, “IPv6 over Low-Power Wireless Personal Area Networks”, 6LoWPAN is a networking technology or adaptation layer that allows IPv6 packets to be carried efficiently within small link layer frames, such as those defined by IEEE 802.15.4. Low-power, IP-driven nodes and large mesh network support make this technology a great option for Internet of Things (IoT) applications [5]. The 6LoWPAN gateway that connects the end-nodes with the IPv6 domain forwards packets to the destination IP-enabled device by using the IP address. The protocol also uses IP communication, implementing open IP standards including TCP, UDP, HTTP, CoAP, MQTT, and WebSockets. The standard offers end-to-end addressable nodes, allowing a router to connect the network to IP and leveraging the wide spectrum of IP applications [5]. 6LoWPAN is a mesh network that is robust, scalable and self-healing, with 128-bit AES link layer encryption. It is ideal for direct connectivity to internet applications in the context of Home Automation, Smart Infrastructure, Smart Agriculture and Industrial IoT. However, supporting IPv6 over IEEE 802.15.4 networks in a 6LoWPAN topology poses different challenges because resource-constrained devices operate in a low-power mesh network. One of the main challenges is that the IEEE 802.15.4 link headers can limit the effective payload size to 81 bytes, which can be further reduced by the IPv6 and UDP headers, leaving 33 bytes for application data [16]. This limitation can result in packet fragmentation and hence more traffic on the network. Therefore, efficient data format techniques that optimize the payload size are essential in 6LoWPAN networks.

ZigBee [17]

ZigBee is one of the most widely used transceiver standards in wireless sensor networks. ZigBee over IEEE 802.15.4 defines specifications for low data rate WPAN (LR-WPAN) to support low-power monitoring and controlling devices. ZigBee and 802.15.4 are not the same: the ZigBee Alliance is responsible for development from the network layer all the way up to the application layer of the ZigBee stack. ZigBee supports multiple network topologies and has a low duty cycle, low latency and 128-bit AES encryption. With a low throughput of about 250 kbps, ZigBee systems are useful for applications in Home Automation and control, Residential & commercial utility systems, and Health care. Despite the ZigBee-specific development of the network layer, the reliance on the IEEE 802.15.4 global standard for low-power wireless communication in the lower layers imposes the same payload limitations [5], and hence optimized payload formats are also beneficial.
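The IEEE 802.15.4 payload limits that apply to both 6LoWPAN and ZigBee can be worked through as simple arithmetic. The sketch below reproduces the 81-byte and 33-byte figures quoted in the 6LoWPAN discussion. The individual byte counts (127-byte maximum frame, 25-byte maximum MAC header, 21-byte link-layer security overhead, uncompressed 40-byte IPv6 and 8-byte UDP headers) are assumptions taken from the worst-case breakdown in RFC 4944, not from the thesis text, and the helper names are hypothetical.

```python
# Worst-case byte counts, assumed from RFC 4944 (not stated in the thesis).
FRAME_MAX = 127          # maximum IEEE 802.15.4 physical-layer packet size
MAC_HEADER_MAX = 25      # maximum MAC frame overhead
LINK_SECURITY_MAX = 21   # AES-CCM-128 link-layer security overhead
IPV6_HEADER = 40         # uncompressed IPv6 header
UDP_HEADER = 8           # uncompressed UDP header


def link_payload():
    """Bytes left in one frame after the link-layer headers."""
    return FRAME_MAX - MAC_HEADER_MAX - LINK_SECURITY_MAX


def application_budget():
    """Bytes left for application data after IPv6 and UDP headers."""
    return link_payload() - IPV6_HEADER - UDP_HEADER


def fits_in_one_frame(payload_size):
    """True if a serialized payload avoids link-layer fragmentation."""
    return payload_size <= application_budget()


print(link_payload(), "bytes after link headers")
print(application_budget(), "bytes for application data")
# Hypothetical payload sizes: a 72-byte text message and a 14-byte binary one.
print(fits_in_one_frame(72), fits_in_one_frame(14))
```

Header compression, which 6LoWPAN supports, would enlarge this budget; the sketch deliberately models only the uncompressed worst case that the quoted figures correspond to.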

Cellular

Cellular technologies are used in IoT because of the powerful cellular infrastructure, which supports fast and large data transmission and was consolidated with the 4G standard. Cellular technologies encompass those developed and maintained by the 3rd Generation Partnership Project (3GPP), such as GSM with its 2G and 2.5G standards, UMTS with its 3G standards, LTE with its 4G standards, and 5G NR with its 5G standards. These technologies were essentially developed for mobile phone systems, offering high-quality voice and data services with powerful range and high bandwidth. These benefits come with a trade-off in the form of higher power consumption, which is not critical for easily chargeable or mains-connected devices but has a drastically draining impact on IoT applications that rely on battery-powered devices meant to last as long as possible [18]. Additionally, since connectivity depends on cellular coverage, their applicability to remote IoT applications is limited. However, as the mobile phone market has been moving away from 2G cellular networks, the freed-up network bandwidth of 128 Kbps (for data) is being used by IoT applications. In addition, cellular technologies that specifically target IoT applications have been developed, such as Narrowband IoT (NB-IoT) and LTE-M, providing lower-cost and lower-power connectivity [18]. Nevertheless, cellular technologies remain power hungry compared to other connectivity technologies, and connected devices would therefore benefit from optimizations at the payload structuring level that require less processing and hence less power.

Wi-Fi

Being IEEE standardized under the IEEE 802.11 wireless standards family, Wi-Fi is a wireless con-nectivity protocol that was initially designed in purpose of replacing Ethernet providing short-range wireless connectivity that supports interoperability. It was essentially targeting desktop devices and due to its ubiquitous coverage, it has been utilized by mobile devices. Additionally, this common deployment of Wi-Fi access points has made it a popular easy solution to utilize by IoT devices. However, power efficiency was not a built-in requirement and therefore it’s considered power drain-ing when applied in IoT context. For that reason, new standards under the family of 802.11 have


been introduced (802.11ah and 802.11ax) to address the power efficiency requirements of IoT applications [19, 20]. However, several factors still highlight the importance of optimizing payload operations for IoT applications relying on Wi-Fi: in the majority of 802.11 deployments, payloads are recommended to be at most 1500 bytes [21]; the IoT-specific 802.11 amendments are still considered more power consuming than other low-power technologies; and power consumption increases due to congestion when Wi-Fi is used to reach a target cellular network in order to eventually connect to IoT servers [22].

LoRaWAN [23]

LoRaWAN is a Low Power Wide Area Network (LPWAN) specification intended for wireless battery-operated things. With long-range communication capabilities, LoRa enables solutions to connect to the broader network of things without additional expensive hardware. The LoRaWAN network architecture typically uses a star-of-stars topology to enable communication between end-nodes and servers via gateways, which connect to the internet servers via standard IP connections. End-nodes communicate with the gateways only via single-hop wireless communications. LoRa also supports multicast messages due to its complete bidirectional communication model. Utilizing multiple frequency channels, LoRaWAN networks regulate the data rates depending on range and payload. An adaptive data rate (ADR) scheme allows the data rates and radio frequency parameters of each end node to be managed individually. Today LoRaWAN is already being used across the world to scale citywide IoT application projects. However, it is limited in its maximum transmission unit (MTU) payload size: this size depends on the distance of the nodes from the gateway and cannot be known beforehand, which in turn drives firmware developers to assume the smallest available MTU. The typical maximum payload size ranges between 51 and 222 bytes, and could be as small as 12 bytes [23]. This means that the larger the payload, the more fragmentation is imposed on the network. It is also recommended that LoRaWAN devices send small payloads in order to achieve higher throughput and packet success rate [23]. Therefore, payload optimization is also beneficial in LoRaWAN networks.
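
The effect of the payload limit on fragmentation can be illustrated with simple arithmetic. The sketch below is only an illustration: the function name and the 120-byte message are hypothetical, and it assumes an application-level fragmentation scheme that adds no per-fragment header.

```python
import math


def fragments_needed(payload_len: int, max_payload: int) -> int:
    """Return how many frames are needed to carry payload_len bytes.

    Assumes (hypothetically) that fragments carry no extra header bytes.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if payload_len <= 0:
        return 0
    return math.ceil(payload_len / max_payload)


if __name__ == "__main__":
    message_len = 120  # hypothetical serialized sensor message, in bytes
    # Payload limits quoted in the text above (bytes).
    for limit in (12, 51, 222):
        print(f"limit={limit:3d} bytes -> {fragments_needed(message_len, limit)} frame(s)")
```

Under these assumptions, halving the serialized size of a message can reduce the number of frames sent at the most restrictive limits, which is the motivation for compact payload formats.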

When it comes to choosing among the IoT protocols made available by industry organizations and standards bodies, in both commercial solutions and open source projects, the choice is based on the main functionalities provided, architectures employed, communication models, protocols, data exchange mechanisms and messaging techniques [24].

These communication protocols can be viewed as either IP based, where the IP network is leveraged to propagate information directly to the cloud, or non-IP based, where gateways or protocol translators act as intermediaries connecting to the IP network. This distinction is important in the IoT context as it helps solution architects choose protocols based on their device characteristics and capabilities. The non-IP based protocols help connect the last-mile devices. A complete IoT solution can contain both IP-based and non-IP based protocols; for example, a non-IP network


might ultimately connect to the IP network to enable its solutions through the cloud.

From the above protocols we can clearly observe that, first, low-bandwidth or low-throughput networks are critical when it comes to protocol design; hence the need to further decrease the network load. Second, the role of middleware hardware and software in IoT systems cannot be overlooked, as they are an absolute necessity in the context of IoT. Finally, protocol overheads are a contributing factor in IoT system implementation choices, and thus there is scope for optimization at every level.

2.5 Models

IoT communication can be conceptualized as four major communication models, which are mostly derived from the traditional web models, namely the Request-Response model, the Publish-Subscribe model, the Push-Pull model and the Exclusive Pair model.

The Request-Response communication model, simply put, is the client sending requests to the server and the server responding to the requests. Request-response is a stateless communication model and each request-response pair is independent of others. Examples are implementations using HTTP or CoAP protocols.

The Publish-Subscribe communication model involves publishers, brokers and subscribers (consumers). Publishers are the source of data [25]. Publishers send the data to topics, which are managed by the broker, and are not aware of the subscribers. Consumers subscribe to the topics managed by the broker. When the broker receives data for a topic from a publisher, it sends the data to all the subscribed consumers. Examples are implementations using RabbitMQ (AMQP), Mosquitto (MQTT), ejabberd (XMPP), and ZeroMQ [25].
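
The decoupling between publishers and subscribers described above can be sketched with a minimal in-memory broker. This is an illustrative toy, not the API of any of the brokers named above; the class name, method names and topic string are assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, bytes], None]


class Broker:
    """Toy in-memory broker: keeps a list of subscriber callbacks per topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        # The subscriber registers interest in a topic, not in a publisher.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: bytes) -> int:
        # The publisher only knows the topic; the broker fans the data out.
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)  # number of subscribers that were notified


if __name__ == "__main__":
    broker = Broker()
    broker.subscribe("sensors/temperature", lambda t, p: print("consumer A:", t, p))
    broker.subscribe("sensors/temperature", lambda t, p: print("consumer B:", t, p))
    broker.publish("sensors/temperature", b"21.5")
```

Note that the publisher never holds a reference to a consumer; all routing knowledge lives in the broker.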

The Push-Pull communication model uses queueing buffers [26]. Data producers push the data to queues and the consumers pull the data from the queues. Producers do not need to be aware of the consumers. Queues help decouple the messaging between the producers and consumers [26]. Buffering with queues helps in situations where there is a mismatch between the rate at which the producers push data and the rate at which the consumers pull data. Examples are implementations that exploit MQTT or AMQP to achieve Push-Pull functionalities [26].
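
The buffering role of the queue can be sketched with Python's standard library. This is a minimal single-process illustration, assuming one producer and one consumer; the function names and the `None` end-of-stream sentinel are choices made for the example, not part of any protocol discussed here.

```python
import queue
import threading
from typing import List


def producer(q: "queue.Queue", readings: List[float]) -> None:
    """Push readings into the queue; blocks while the bounded queue is full."""
    for reading in readings:
        q.put(reading)
    q.put(None)  # sentinel: tells the consumer that no more data will arrive


def consumer(q: "queue.Queue", out: List[float]) -> None:
    """Pull readings from the queue at the consumer's own pace."""
    while True:
        item = q.get()
        if item is None:
            break
        out.append(item)


def run_demo(readings: List[float]) -> List[float]:
    q: "queue.Queue" = queue.Queue(maxsize=4)  # small buffer absorbs rate mismatch
    out: List[float] = []
    threads = [
        threading.Thread(target=producer, args=(q, readings)),
        threading.Thread(target=consumer, args=(q, out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    print(run_demo([20.5, 20.7, 21.0, 21.2, 21.5, 21.9]))
```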

The Exclusive Pair communication model uses a persistent connection to achieve full-duplex, bidirectional data transfer. This implementation conducts stateful communication in which the server keeps the connection open until the client requests to close it. Communication can happen only after a connection is established, and the server is fully aware of all the connections. This model is more of a consequence of the choice of connectivity protocol, examples of which are implementations over Bluetooth and NFC.

It is important to understand that most of the deployments today use commercially available IoT platforms ranging from IIoT to Smart cities. Here once data acquisition is complete at the


edge sensor nodes, the information is sent all the way up to these platforms (usually cloud), and as a result the communication models mentioned above can be adapted into: direct communication (directly using MQTT or CoAP), using a gateway (ZigBee, Z-Wave), or agent-based (for distributed deployments).

One common scenario when developing IoT systems is protocol translation. This is a consequence of protocol fragmentation in IoT, and is achieved by support mechanisms designed to exchange information across the myriad of protocols that empower these systems. One of the most used paradigms is the use of gateways. Based on complexity, gateways can be basic bridges or embedded control devices. End sensor nodes use gateways for protocol translation, which connect them to the bigger backbone of internetworking. Here we would also like to introduce the data exchange protocols, also known as application layer communication protocols. Henceforth, we refer to these as data exchange protocols; they make use of data format protocols that, as the name suggests, format the data contained in the payload that the data exchange protocols carry. Some of these data exchange protocols can also be called cloud-enabling standards, as they help connect edge-sensor networks to the broader IP-based Internet as we know it. In data exchange protocols, bus-based and broker-based are the two reigning architectural styles [27].

Bus-based

In the bus-based architecture, a bus coordinator is in charge of delivering messages to the intended subscriber. This architecture offers a queuing subsystem that mediates access between publisher and subscriber. Examples are DDS, REST and XMPP, which support a broker-less architecture [27].


Broker-based

In this case, broker(s) act as controller(s) for the distribution of information: they forward, filter, prioritize and even store published packets on their way from the publisher (data source) nodes to the subscriber (data consumer) nodes. Depending on their objectives, clients can switch between publisher and subscriber roles. One additional important feature of brokers is the implementation of Quality of Service (QoS) levels. Examples of broker-based protocols include Advanced Message Queuing Protocol (AMQP), Constrained Application Protocol (CoAP), Message Queue Telemetry Transport (MQTT) and Java Message Service API (JMS) [27].

Figure 2.5: Broker-Based Model (Note: Node can be any application)

Alternatively, we can classify these protocols as data-centric or message-centric. A data-centric protocol such as CoAP focuses on delivering the data and assumes the data is understood by the receiver. Middleware understands the data and ensures that the subscribers have a synchronized and consistent view of the data. Message-centric protocols such as AMQP, MQTT, and REST focus on the delivery of the message to the intended receiver, regardless of the data payload it contains. Additionally, an important aspect of these protocols is whether they are web-based, like CoAP, or application-based, as with MQTT and AMQP.

Sensor networks juggle resource constraints on energy, storage and bandwidth while communicating over lightweight protocols to offer diverse solutions and services. Efficiency is required in both communication and computation. This makes addressing resource optimization a primary task. Further evolution in IoT has led to optimizations at the application layer. This has led to a myriad of data exchange protocols, also known as application layer protocols, along with many novel data format protocols, which we describe in the following pages.


2.6 Data Exchange Protocols

Application layer protocols define how applications (clients and servers) pass messages (data) to each other. They run on the application layer of the communication model, hence the name. Such a protocol defines the syntax to use, the semantics to follow and the types of messages that can be sent, along with the rules determining when and how messages are exchanged. The term data exchange protocol can alternatively be used for most of these protocols. Henceforth, in this document, we refer to our selection of protocols below as Data Exchange Protocols.

MQTT

MQTT⁴ stands for Message Queuing Telemetry Transport. It is a protocol based on the publish/subscribe model. The design principles behind MQTT were to provide a simple, lightweight and open source messaging protocol for constrained devices on low-bandwidth, high-latency networks, whilst also ensuring some degree of reliability and assurance of delivery. The protocol enables exchanging messages among publisher and subscriber nodes using a broker-based architecture, which makes it well suited to Machine to Machine (M2M) communication contexts. Available brokers for MQTT include, but are not limited to, ActiveMQ, Apollo, HiveMQ, IBM MessageSight, JoramMQ, Mosquitto and RabbitMQ. These brokers vary in how communication is established and in the additional features they offer over standard MQTT functionalities. Although MQTT is very efficient, a high number of client nodes connected to the central broker might lead to poor performance.

MQTT-SN

MQTT-SN [28] is a derivative of the original MQTT protocol with one specific goal, which is to cater to wireless sensor networks (WSNs). The SN abbreviation stands for Sensor Networks. Such networks are dynamic and consist of a large number of wirelessly connected devices which are prone to link instability and failure. This problem is addressed with the Pub-Sub paradigm's data-centric communication approach rather than an address-based one. The difference is that information is delivered based on contents and interests rather than the addresses of subscribers. All of this is already provided by the MQTT protocol mentioned above; however, MQTT requires an underlying network, such as TCP/IP, that provides an ordered, lossless connection capability, and this is too complex for very simple, small-footprint, low-cost devices such as wireless sensor nodes. MQTT-SN is designed to be network protocol agnostic; in other words, it supports any network that provides bidirectional communication between any node and a gateway, for example UDP. Compared to MQTT, MQTT-SN can cope with short message lengths, and provides predefined topic ids and short


topic names and an auto-discovery procedure to cater to multiple brokers in the same network for load sharing. All of these work in favour of more constrained devices and scalability of deployments.

CoAP

The Constrained Application Protocol⁵ (CoAP) is a data exchange protocol derived from the web for use with constrained nodes and constrained (e.g., low-power, lossy) networks. It has been widely adopted for nodes with limited memory over constrained networks such as IPv6 over Low-Power Wireless Personal Area Networks (6LoWPANs). It supports a one-to-one protocol for transferring state information between client and server. CoAP utilizes the User Datagram Protocol (UDP) and supports broadcast and multicast addressing. Its base specification does not run over TCP. CoAP communication is through connectionless datagrams and can be used on top of other packet-based communication protocols, with low header overhead and parsing complexity. CoAP supports content negotiation and discovery, allowing devices to probe each other to find ways to exchange data. CoAP was designed for interoperability with the web (including HTTP and RESTful protocols).
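
The low header overhead mentioned above can be made concrete. The sketch below packs and parses the 4-byte fixed header defined in the CoAP base specification (RFC 7252): a 2-bit version, a 2-bit message type, a 4-bit token length, an 8-bit code and a 16-bit message ID. The function names are hypothetical, and tokens, options and payload are deliberately left out.

```python
import struct

COAP_VERSION = 1
TYPE_CON, TYPE_NON, TYPE_ACK, TYPE_RST = 0, 1, 2, 3
CODE_GET = 0x01  # request code 0.01


def pack_header(msg_type: int, code: int, message_id: int, token_len: int = 0) -> bytes:
    """Build the 4-byte fixed CoAP header (token, options and payload omitted)."""
    if not 0 <= msg_type <= 3:
        raise ValueError("message type must be 0..3")
    if not 0 <= token_len <= 8:
        raise ValueError("token length must be 0..8")
    first = (COAP_VERSION << 6) | (msg_type << 4) | token_len
    # "!" selects network (big-endian) byte order: two bytes plus one uint16.
    return struct.pack("!BBH", first, code, message_id)


def unpack_header(data: bytes) -> dict:
    """Parse the fixed header back into its fields."""
    first, code, message_id = struct.unpack("!BBH", data[:4])
    return {
        "version": first >> 6,
        "type": (first >> 4) & 0x3,
        "token_len": first & 0xF,
        "code": code,
        "message_id": message_id,
    }


if __name__ == "__main__":
    header = pack_header(TYPE_CON, CODE_GET, 0x1234)
    print(header.hex(), unpack_header(header))
```

Parsing needs only one fixed-size unpack and a few bit operations, which is what makes the header cheap to handle on constrained nodes.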

AMQP

AMQP [29] stands for Advanced Message Queuing Protocol, which is an open source undertaking as part of the Apache Software Foundation Incubator application. AMQP is a message-oriented protocol designed with end-to-end routing and queuing of data. It is a brokered protocol in which the broker is capable of making routing choices, and as a consequence it falls under the category of middleware protocols. AMQP is a binary transfer protocol and is widely used in IoT applications today. Being a binary protocol, it has less overhead than text-based protocols like HTTP. For security, though, it depends on an underlying reliable transport layer protocol. With multiplexing capabilities, each connection allows for separated data flows in client-server as well as peer-to-peer communications. The reliability features in AMQP provide three different levels of QoS (Quality of Service).

REST

REST is a paradigm rather than a protocol. REST stands for Representational State Transfer. It is defined as an architectural style for designing applications utilizing HTTP. REST enables web-based services, referred to as RESTful services, which use the same verbs as HTTP (GET, POST, PUT, DELETE). Utilizing URIs (Uniform Resource Identifiers) as points of access, REST enables a point-to-point, stateless, cacheable protocol for simple client/server (request/reply) communications from devices to the cloud over TCP/IP. These characteristics make REST fit for use over the internet. It is a preferred choice when it comes to web applications, and certainly for cloud services. JSON


is a common data exchange format used in RESTful services. REST comfortably finds its place in IoT solutions [27] today where the web is leveraged to offer service-oriented applications.

In the review, Towards Efficient Mobile M2M Communications: Survey and Open Challenges [30], the authors conclude, “Future work should focus in exploring [. . . ] mechanisms to efficiently collect and aggregate data while attaining the time requirements of data, to reduce energy and bandwidth consumption, as resource usage efficiency is a common denominator in the literature due to mass scale envisioned for mobile M2M communications”. We tend to agree and add that one such mechanism is optimization at the data translation level, where information is formatted for optimal transmission over these protocols. Each protocol, when used, adds more data in the form of headers for its own functioning. Since every protocol designed for IoT aims to reduce the header overhead it introduces, we can safely say that reducing the overhead introduced by data format protocols serves the same goal. Hence, as argued in this thesis, we believe that these data exchange protocols can also benefit from optimizations in payload structuring and in the transmission process, which in turn add up to a reduction of the overall system traffic.

2.7 Data Format Protocols

Data format protocols are concerned with the semantics and representation of IoT data generated from connected devices. This involves structuring and converting the payload data into a format using mechanisms that support reliability and interoperability with an emphasis on being lightweight. In this section we discuss the need for data serialization in IoT and present the common text-based data serialization formats used in IoT applications; we then introduce a background discussion on the selected binary-based data serialization techniques.

Data Serialization in IoT: The objective of IoT applications remains to exploit the power of data harvested from connected devices. This is achieved through different processes that run in reaction to the collected data and involve monitoring, reporting, control, prediction, and other analytical solutions that aim at providing a wide spectrum of services delivering value to an increasingly wide range of users. This communication requires IoT devices to have a structured messaging method for interchanging data between different endpoints. Considering that these endpoints could range from other sensor nodes to mobile devices, web applications, or other back-end tiers, it is very common for them to be developed in different environments, utilizing different platforms and programming languages, while harnessing data from mutual sources. These scenarios create the necessity of maintaining the structure and content of messages as well as supporting different platforms and application requirements that are subject to change [31]. To attain that, data serialization mechanisms are utilized to accommodate heterogeneity and support multiple systems. Data serialization deals with converting data structures into another format that enables efficient storage and transmission


of data. The main feature of this process is that it enables relatively fast and easy reconstruction of the serialized data, so that it can be used by another, completely different platform, given that the serialization mechanism is known by both endpoints. Several standardized data serialization techniques exist that were initially developed and optimized for high-level programming languages; the main examples are JSON and XML. The strong standardization and support of these formats have driven their popularity in IoT devices [32]. However, such devices use low-level programming languages such as C/C++, are resource-constrained, and deal with lightweight messages; they therefore require more optimized data serialization mechanisms to handle the message format. In the next sections, we briefly describe the mechanisms highlighted in this research, with emphasis on the target techniques to be analyzed in the context of sensor nodes. These data serialization techniques are referred to as Data Format Protocols given that they are applied to the process of structuring and formatting the message payload.

We distinguish between three different ways of classifications these protocols can follow:

1. Schema and Schema-less: Schema-based protocols have predefined schemas, supported by an interface description language (IDL), that describe the data to be propagated. For example, HTTP, XML (optional), Protobuf and FlatBuffers are schema-based, whereas JSON, MessagePack, and BSON are schema-less. The absence of a schema adds more flexibility but increases the size of the encoded data, as the names of the data fields need to be serialized too.

2. Encoding: The payload could be encoded as text, utilizing further character encoding mechanisms such as ASCII or UTF-X, or encoded directly in binary format.

3. Library: Protobuf, FlatBuffers and MessagePack require library support, whereas XML and JSON work with plain text. This classification results from the encoding mechanism used: text-based encoding merely requires additional characters to be added for parsing and unparsing the data, whereas binary-based encoding differs from one protocol to another depending on how each protocol handles the encoding of different data types.
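
The size impact of serializing field names, noted in the first classification, can be sketched with the Python standard library alone. The example compares a schema-less text encoding (compact JSON) with a fixed binary layout that both endpoints are assumed to know in advance. The field names, the sample reading and the `"<HfB"` layout are hypothetical and do not correspond to any of the protocols evaluated in this thesis.

```python
import json
import struct

# Hypothetical sensor reading used only for illustration.
reading = {"node_id": 7, "temperature": 21.5, "humidity": 40}

# Schema-less, text-based: field names travel inside every message.
text_payload = json.dumps(reading, separators=(",", ":")).encode("utf-8")

# Schema-based, binary: the layout is agreed beforehand, so only values travel.
# "<HfB" = little-endian uint16 node_id, float32 temperature, uint8 humidity.
SCHEMA = "<HfB"
binary_payload = struct.pack(
    SCHEMA, reading["node_id"], reading["temperature"], reading["humidity"]
)


def decode(payload: bytes) -> dict:
    """Rebuild the reading; the field names come from the shared schema."""
    node_id, temperature, humidity = struct.unpack(SCHEMA, payload)
    return {"node_id": node_id, "temperature": temperature, "humidity": humidity}


if __name__ == "__main__":
    print(len(text_payload), "bytes as JSON")
    print(len(binary_payload), "bytes with a fixed binary schema")
    print(decode(binary_payload))
```

The trade-off is the one stated above: the binary form is much smaller, but it cannot be interpreted without the schema, and any change to the fields must be coordinated between both endpoints.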

In this thesis, the encoding mechanism is used as the main classification method due to its uniformity with respect to our context.

2.7.1 Text-Based Protocols

XML

Extensible Markup Language⁶ (XML) is a flexible, standardized format known as self-describing or self-defining, meaning that the structure of the data is embedded with the data, in addition to text-based headers called tags that define the specifications of each element. Thus, when the data arrives there is no need to pre-build the structure to store the data; it is dynamically understood. XML


allows binary data to be stored; however, this data must be converted to text using Base64 encoding, which requires additional bytes. The criticism XML faces stems from the nature of the mechanism, which makes it verbose [33] and thus not appropriate for resource-constrained devices. Moreover, the official XML specification states that compactness is not a priority [33]. This makes XML inefficient in IoT applications, due to its size overhead and complex encoding mechanisms, which increase resource and bandwidth consumption and impact the performance of IoT devices. Several benchmarking projects and papers that are referred to in the following chapters have shown results supporting the argument that XML is not an efficient data format for resource-constrained IoT devices.
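
The additional bytes required by Base64 can be quantified: every 3 bytes of binary data become 4 text characters, with padding up to a multiple of 4. The sketch below assumes a hypothetical payload of three 32-bit floats and a hypothetical `<reading>` element name.

```python
import base64
import struct


def base64_len(n_bytes: int) -> int:
    """Length of the padded Base64 text for n_bytes of binary data."""
    return 4 * ((n_bytes + 2) // 3)


# Hypothetical binary payload: three little-endian float32 sensor values.
raw = struct.pack("<3f", 21.5, 40.0, 1013.25)

encoded = base64.b64encode(raw)
xml_message = b"<reading>" + encoded + b"</reading>"

if __name__ == "__main__":
    print(len(raw), "bytes of binary data")
    print(len(encoded), "bytes after Base64")
    print(len(xml_message), "bytes once wrapped in a single XML element")
```

Even before the tags are counted, the Base64 step alone inflates the data by roughly one third.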

JSON

JavaScript Object Notation⁷ (JSON) is a lightweight data-interchange format. One of its main features is that it advertises itself as more human readable than many other formats. It is easy for machines to parse and generate. JSON is a text-based format that is completely language independent but uses conventions that are familiar to programmers and to the majority of popular programming languages. These properties make JSON an ideal data-interchange language.

JSON is built on two structures. The first is a collection of name/value pairs; in various languages, this is realized as an object, record, structure, dictionary, hash table, keyed list, or associative array. The other structure is an ordered list of values; in most languages, this is realized as an array, vector, list, or sequence. These are universal data structures, and virtually all modern programming languages support them in one form or another. It makes sense that a data format that is interchangeable with programming languages also be based on these structures. Moreover, JSON requires additional characters for its notation, including:

• Objects: Objects begin and end with curly braces ({}).

• Object Members: Members consist of strings and values, separated by colon (:). Members are separated by commas.

• Arrays: Arrays begin and end with square brackets ([]) and contain values. Values are separated by commas.

• Values: A value can be a string, a number, an object, an array, or the literals true, false or null.

• Strings: Strings are surrounded by double quotes and contain Unicode characters or common backslash escapes.

JSON is referred to as more lightweight and efficient compared to XML [34]. However, in addition to the fact that JSON encodes the field labels and field values as strings, the additional


required characters for JSON formatting impose larger payload sizes, longer times to encode and decode the payloads, and thus additional power consumption on the IoT device. This is where binary-based data serialization mechanisms prove to be more efficient; this is discussed further in Chapter 3, where we refer to research papers comparing JSON with binary-based techniques.

2.7.2 Binary-based Protocols

The choice of this set of binary-based protocols for evaluation is based on the efficiency and speed claims made on the official implementation sites, supported by several benchmarking results that endorse these claims. In addition, proper documentation, the presence of an active community, maintained implementations, and platform and language neutrality remain essential contributing factors to the choice.

Protocol Buffers

Protocol Buffers, also known as protobuf⁸, is an open-source data serialization mechanism developed and maintained by Google. It is used to serialize structured data for use in communication protocols, data storage and other application scenarios where cross-platform communication is heavily applied. Protocol buffers are platform-neutral and language-neutral, supporting a flexible, efficient, automated mechanism for serializing. In practice, using a schema-based technique, the structured information to be serialized is specified by defining protocol buffer message types in “.proto” schema files. The schema is structured to define two types of objects: a “message” and an “enum”. A “message” is the aggregate object type in which all fields and “enum” objects are included. An “enum” is used for a message field that takes one of a predefined list of values. Built-in data types supported in protobuf include all the major scalar data types excluding 16-bit scalar types, in addition to string and byte data types. The schema also supports the definition of several messages and nested messages. Each protocol buffer message is a small logical record of information, containing a series of name-value pairs. Once the data has been structured, the protocol buffers compiler is run on the .proto file; it generates code in the chosen language for working with the message types described in the schema file, including getting and setting field values, serializing messages to an output stream, and parsing them from an input stream. The encoding concatenates keys and values into a byte stream. The keys are constructed using the field number defined in the schema and the wire type that protocol buffers defines for each group of data types, in a way that makes protocol buffer messages self-explanatory.
Depending on the programming language (C++, Java, or Python for example), the compiler generates a “.java”, “.py” or “.h” and “.c” file from each “.proto”, with a class for each message type described in the file. Each protocol buffer class has methods for writing and reading messages of the chosen type using the protocol buffer binary format. Protocol buffers are claimed to


outperform both JSON and XML, specifically for numeric data, where protobuf encoding and decoding performance is claimed to be significantly faster. The Google Developers Protocol Buffers site claims that, compared to XML, protocol buffers are simpler, 3 to 10 times smaller, 20 to 100 times faster, less ambiguous, and easier to use programmatically [35]. Additionally, several online benchmarking projects show protobuf outperforming JSON, being 5 to 6 times faster and showing better performance even in environments that run JavaScript, where JSON is a native format; examples are the open-source benchmarking project by a Google developer [36] and a comprehensive benchmark by Bruno Krebs on the ‘auth0’ platform [37]. Protobuf shows significant advantages when dealing with numeric data, which makes it a very interesting candidate to be tested in IoT application scenarios, and specifically in sensor nodes that deal mostly with numeric sensor readings.
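
The key construction described above can be sketched by hand for the simplest case, an unsigned integer field. In the protobuf wire format the key is the varint encoding of `(field_number << 3) | wire_type`, and wire type 0 denotes a varint value. The function names below are hypothetical helpers written for this illustration; they are not part of the generated protobuf API and cover only non-negative integers.

```python
WIRE_VARINT = 0  # wire type used for int32, int64, uint32, uint64, bool, enum


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """Key = field number from the schema combined with the wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_uint_field(field_number: int, value: int) -> bytes:
    """Serialize one unsigned integer field as key followed by value."""
    return encode_key(field_number, WIRE_VARINT) + encode_varint(value)


if __name__ == "__main__":
    # Field number 1 holding the value 150.
    print(encode_uint_field(1, 150).hex())
```

Because the field name never appears on the wire and small numbers occupy a single byte, a numeric sensor reading costs only a few bytes, which is consistent with the size advantage claimed for numeric data.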

FlatBuffers

FlatBuffers, also known as flatbuf⁹, is another open-source, cross-platform, binary-based data serialization mechanism developed and maintained by Google. It was essentially developed for game development and other performance-critical applications. It is claimed to have better memory efficiency and speed than protobuf, due to its use of a flat binary buffer that does not require unpacking or parsing into another representation before the data can be accessed. It is also a platform-neutral, language-neutral, schema-based communication protocol, with additional support for schema language features such as unions. Built-in data types supported in flatbuf include all the major scalar data types, including 8-bit and 16-bit integer data types, in addition to strings that can hold only UTF-8 or 7-bit ASCII, and byte data types. The flatbuf schema language relies on two ways of defining objects: tables and structs. Tables are the main and less constrained way, and can include fields of different data types, structs and other nested tables. Structs are more constrained, as they can only include scalar values or other structs; they are used for smaller objects and in turn can be accessed faster. In all cases, a ‘root type’ definition is needed at the end; it declares which table is the root of the serialized data in case multiple tables are used. Flatbuf works like protobuf: once the “.fbs” schema file is written, it is compiled using the flatbuf compiler for the target language, and the compilation generates header files that include helper classes to access and construct serialized data. To read the buffer back, only the pointer to the root object needs to be obtained in order to have in-place access to the fields. According to Google benchmarks, flatbuf is claimed to be approximately 60 times faster than protobuf and 200 times faster than JSON in encoding.
In decoding, given its implementation, flatbuf far outperforms protocols that require an unpacking step before the data can be read, being approximately 3000 times faster than protobuf and achieving up to 7000 times faster performance than JSON [flatbufbench].
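
The idea of in-place access can be sketched with the standard library. The layout below is deliberately simplified and hypothetical: it is not the actual FlatBuffers wire format (which uses vtables and relative offsets), only an illustration of reading one field directly from a flat buffer at a known offset without decoding the rest.

```python
import struct

# Hypothetical fixed layout (NOT the real FlatBuffers format):
#   offset 0: uint16 node_id, offset 2: uint16 humidity, offset 4: float32 temperature
OFFSETS = {
    "node_id": (0, "<H"),
    "humidity": (2, "<H"),
    "temperature": (4, "<f"),
}


def build(node_id: int, humidity: int, temperature: float) -> bytes:
    """Write all fields into one flat little-endian buffer."""
    return struct.pack("<HHf", node_id, humidity, temperature)


def read_field(buf: bytes, name: str):
    """Read a single field in place; no other field is touched or copied."""
    offset, fmt = OFFSETS[name]
    return struct.unpack_from(fmt, buf, offset)[0]


if __name__ == "__main__":
    buf = build(node_id=7, humidity=40, temperature=21.5)
    print(len(buf), "bytes")
    print(read_field(buf, "temperature"))
```

A text format such as JSON would need the whole message parsed into an intermediate object before a single value could be read, which is the unpacking cost that the flat-buffer approach avoids.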


MessagePack

MessagePack¹⁰ is an open-source binary-based object serialization mechanism. Since it is a schema-less mechanism, the implementation concept is simpler than that of protobuf and flatbuf. It relies mainly on two concepts, the Type System and the Formats. The Type System defines the supported data types, which cover the main data types with some limitations on the integer value range and the maximum byte size of String objects. The type system also includes the concept of a Map, which represents key-value pairs of objects, a concept that is utilized in our implementation to encode the field label and the value. The Formats concept describes how each data type is binary-encoded. For example, a single byte is used to encode small integers, while short strings require one additional byte besides the encoded string itself. MessagePack uses packing/unpacking terminology to refer to the encoding/decoding processes. The mechanism is designed to be lightweight and to have a small memory footprint, which makes it applicable and beneficial to utilize in IoT devices.
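
The Formats concept can be illustrated by hand-encoding a tiny subset of MessagePack: positive fixint (one byte for integers 0 to 127), fixstr (a header byte `0xA0 | length` for strings up to 31 bytes) and fixmap (a header byte `0x80 | count` for maps of up to 15 pairs). The function name is hypothetical and the sketch is not a replacement for a MessagePack library; it rejects anything outside this subset.

```python
import json


def pack_small(obj) -> bytes:
    """Encode a small int, short str, or small dict using MessagePack fix formats."""
    if isinstance(obj, bool):
        raise TypeError("booleans are outside this sketch")
    if isinstance(obj, int):
        if not 0 <= obj <= 0x7F:
            raise ValueError("only positive fixint (0..127) is supported here")
        return bytes([obj])  # positive fixint: the value is the byte itself
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) > 31:
            raise ValueError("only fixstr (up to 31 bytes) is supported here")
        return bytes([0xA0 | len(raw)]) + raw  # one header byte, then the text
    if isinstance(obj, dict):
        if len(obj) > 15:
            raise ValueError("only fixmap (up to 15 pairs) is supported here")
        out = bytearray([0x80 | len(obj)])  # one header byte with the pair count
        for key, value in obj.items():
            out += pack_small(key) + pack_small(value)
        return bytes(out)
    raise TypeError(f"unsupported type: {type(obj).__name__}")


if __name__ == "__main__":
    reading = {"t": 21}  # hypothetical field label and value
    packed = pack_small(reading)
    as_json = json.dumps(reading, separators=(",", ":")).encode("utf-8")
    print(packed.hex(), len(packed), "bytes vs", len(as_json), "bytes as JSON")
```

The field label is still serialized, as expected for a schema-less format, but the quotes, colon and braces of the text notation are replaced by single header bytes.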

BSON

BSON, short for Binary JSON, is, like MessagePack, an open-source schema-less binary data serialization mechanism¹¹. The protocol is claimed to be designed to be lightweight, minimizing spatial overhead; easily traversable, as it is the main data representation format in MongoDB; and efficient, due to the use of C data types, which results in relatively fast encoding and decoding. BSON deals with data as documents that store key/value pairs as a single entity, where keys are represented as strings and values can hold any of the supported data types. MongoDB was the first project to utilize BSON as a data format, as it prioritizes encoding/decoding speed and faster traversal over the size of the encoded message. BSON also has different open-source library implementations that support numerous languages and platforms, including C and C++ library implementations. Although many BSON libraries focus on MongoDB applications, it is stated that these libraries will be made stand-alone and independent of MongoDB to allow other applications to benefit from the protocol. These criteria make BSON a candidate protocol for the IoT context, and especially for sensor nodes, where data types tend to be basic.
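
The document structure described above can be sketched by hand for the simplest case, a document containing only 32-bit integers. Per the BSON specification, a document is a little-endian int32 total length, followed by its elements and a terminating zero byte; an int32 element is the type byte `0x10`, the key as a null-terminated string, and the 4-byte value. The function name is a hypothetical helper for this illustration and not part of any BSON library.

```python
import struct
from typing import Dict


def bson_int32_document(fields: Dict[str, int]) -> bytes:
    """Encode a flat document whose values are all 32-bit signed integers."""
    body = bytearray()
    for name, value in fields.items():
        body += b"\x10"                          # element type: 32-bit integer
        body += name.encode("utf-8") + b"\x00"   # key as a null-terminated string
        body += struct.pack("<i", value)         # value as little-endian int32
    total = 4 + len(body) + 1                    # length prefix + elements + terminator
    return struct.pack("<i", total) + bytes(body) + b"\x00"


if __name__ == "__main__":
    doc = bson_int32_document({"t": 21})  # hypothetical field label and value
    print(doc.hex(), len(doc), "bytes")
```

For this hypothetical reading the BSON document occupies 12 bytes, more than the 8 bytes of the compact JSON text `{"t":21}`. The fixed-width value and the length prefix favour fast traversal over size, which matches the priority stated above.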

Given the efficient mechanisms, specifications, and advantages that these binary data serialization techniques are claimed to have over text-based data formats, it is worthwhile to explore them and evaluate the value they could bring to IoT applications, which would benefit greatly from resource efficiency and further optimization.

¹⁰ https://msgpack.org
¹¹ http://bsonspec.org


Related Work

The rapid implementation and scaling of IoT systems have emphasized the need to explore and implement optimization techniques that reduce the amount of data transmitted between devices, in order to save memory, power, and network bandwidth. Several approaches have therefore addressed these issues from different angles, and this section surveys some of them. We classify the research into two categories. The first covers approaches that analyze and evaluate data exchange protocols, also referred to as application layer protocols and session layer protocols, and their impact on the mentioned metrics. The second covers approaches that address payload data optimization techniques. In both categories, our interest lies in reducing the overall data transmission, either by targeting the overheads of the data exchange protocols or by targeting the methodology of collecting and storing payload data.

3.1 Data Exchange Protocols Evaluation

Because the requirements of IoT systems vary from one application to another, systems cannot rely on a single messaging protocol, which creates a dilemma for the IoT industry [38]. This has motivated the academic community to perform analytical studies aimed at comparing and ranking protocols based on different performance metrics that impact the overall IoT system. These studies focus on protocol-generated overheads, communication speed, and power consumption, with the goal of driving optimality in resource-constrained IoT applications. In [2], a comparative analysis is carried out over a subset of protocols commonly implemented in IoT applications, namely MQTT, CoAP, and AMQP, with HTTP as a baseline. The comparative analysis uses several metrics to compare performance, but what makes it relevant to us is how the choice of protocol significantly affects message size and overhead, power and resource consumption, bandwidth consumption, and latency. The performance variances can be attributed to the way each protocol was implemented. The paper illustrates that CoAP performs better in all these metrics due to its reliance on UDP rather than TCP, which eliminates the connection overheads that TCP introduces and that impact the performance metrics. MQTT then follows as a TCP-based but lightweight protocol. From the angle of adoption among industries and organizations, however, MQTT is an established M2M protocol used by many reputed and leading organizations. It is followed by AMQP, which has seen international and large-scale adoption in big projects such as Oceanography's monitoring of the Mid-Atlantic Ridge, NASA's Nebula Cloud Computing, and India's Aadhaar Project. These significant adoption trends are powered by the ability of these protocols to perform in a nimble and lightweight manner, minimizing the processing overheads that exhaust both the device resources and the operating network.

In [39], a more quantitative approach is taken, comparing CoAP, MQTT, and XMPP with emphasis on real-time communication of sensor data. The results show that MQTT outperforms the other two protocols in packet creation and transmission. That is, despite relying on TCP, the design of MQTT optimizes the process of message generation and minimizes synchronization delays, resulting in packet delivery that is two times faster than CoAP. Moreover, XMPP shows by far the slowest performance, mainly because it relies on XML formats for reading and sending messages.

With a focus on the optimization advantages that can be leveraged by using UDP as the underlying transport protocol, a comparative analysis is carried out in [40] between CoAP and MQTT-SN. MQTT-SN is a UDP-based extension of MQTT and an even more lightweight implementation, designed with low bandwidth and small message payloads in mind, which makes it a fit for resource-constrained sensor devices. The experiment uses seven text files with sizes ranging from 2.25 Kbit to 63 Kbit in order to obtain comprehensive results on the efficiency of each protocol. The results suggest that MQTT-SN is approximately 1000 times faster than CoAP. Thus, from the perspective of resource utilization in sensor devices and WSNs, MQTT-SN shows more efficient and optimal behavior, with significantly less message overhead and transmission time, which is beneficial in a WSN environment where data is transmitted frequently.

Moreover, in a 2016 Master's thesis by Thomas Wickman [1], different communication protocols that are lightweight and known to be efficient are compared with emphasis on the data transmission overheads they generate. Applied to the context of vehicle-to-server communication, the author examines the performance of MQTT, MQTT-SN, and AMQP. The scope of the study is motivated by data trends at Scania and by the priority of optimizing the amount of data its vehicles transmit: it is expected that in 2020 Scania will have 600 thousand connected vehicles, and thus the amount of data these vehicles generate is also expected to increase significantly, along with the number of offered services. This increasing trend does not apply only to Scania's vehicles. A report by McKinsey suggests that the number of connected cars will continue to increase and that in 2020 one in every five cars will be connected, which amounts to roughly 290 million connected cars [41]. This highlights the importance of reducing the aggregate amount of data transferred between connected vehicles and the server. In Scania's case, each vehicle has a data cap of 10 MB per day, so the lower the protocol overheads, the more data remains available for other services. These protocols, although designed to minimize transmission overheads and speed up communication, still show variances in their performance [1]. In addition to protocol-generated overheads, the thesis focuses on how protocols behave under packet loss in terms of the amount of data needed for retransmission. Scania's GSM network has a 4% retransmission rate and is subject to high packet loss because vehicles operate in remote areas with unstable or non-existent connections. The thesis tests the protocols by sending an aggregate of 1 MB of data constructed in three different payload sizes of 100B, 1KB, and 10KB. The thesis concludes that, despite the fact that MQTT-SN failed to work for the 10KB payload messages, MQTT-SN was still considered the potential candidate for the lightest protocol of the three. The breakdown of MQTT-SN was attributed to its minimal and nimble implementation, which results in considerably more overhead due to fragmentation at the application layer. The thesis also highlights that the overhead size of each of the three protocols grows in direct proportion to the payload size, and that MQTT-SN performs considerably better at the 100B payloads, with 217% overhead compared to the 366% and 389% overheads generated by MQTT and AMQP respectively. This conclusion supports the motivation of our thesis: if the payload itself is reduced, then the protocol overheads are reduced and, in turn, the total data transmitted is reduced. This argument can be understood clearly from the overhead formula used in Wickman's thesis to theoretically calculate the MQTT-SN overhead.

The formula shows that even on 1KB payloads, which make up 98% of Scania's payloads, MQTT-SN generates 19% overhead for a 1 MB transfer, a performance that beats the 32% overhead of MQTT and the 34% overhead of AMQP. It also highlights the direct impact of larger payloads on MQTT-SN overheads.
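The practical effect of these percentages can be illustrated with a short calculation. The sketch below is not the overhead formula from [1]; it only applies the reported percentages under the assumptions that overhead is expressed relative to the payload bytes, that 1 MB equals 1,000,000 bytes, and that the daily cap of 10 MB equals 10,000,000 bytes. The function names are hypothetical.

```python
# Overhead percentages as reported in the text, keyed by payload size.
REPORTED_OVERHEAD_PCT = {
    "100B": {"MQTT-SN": 217, "MQTT": 366, "AMQP": 389},
    "1KB": {"MQTT-SN": 19, "MQTT": 32, "AMQP": 34},
}


def total_bytes(payload_bytes, overhead_pct):
    """Bytes on the wire, assuming overhead is a percentage of the payload."""
    return payload_bytes * (100 + overhead_pct) // 100


def usable_payload(cap_bytes, overhead_pct):
    """Payload bytes that fit under a data cap at the given overhead (rounded down)."""
    return cap_bytes * 100 // (100 + overhead_pct)


ONE_MB = 1_000_000        # assumption: decimal megabyte
DAILY_CAP = 10_000_000    # assumption: 10 MB cap read as decimal megabytes

for size, protocols in REPORTED_OVERHEAD_PCT.items():
    for name, pct in protocols.items():
        sent = total_bytes(ONE_MB, pct)
        usable = usable_payload(DAILY_CAP, pct)
        print(f"{size:>4} {name:<8} sent={sent:>9} usable_under_cap={usable:>9}")
```

Under these assumptions, a 1 MB transfer in 100B payloads puts 3,170,000 bytes on the wire with MQTT-SN, and at 1KB payloads roughly 8.4 MB of the 10 MB daily cap remains available for actual payload.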

In this section of related work, we observe that protocols designed to be lightweight remain the favoured communication protocols in the context of IoT. This emphasizes the importance of minimal overheads and fast data transmission in all IoT applications, in order to leave capacity for the increasing number of services made available. While these investigations focus their analysis on data exchange protocols operating at the application layer, our thesis focuses on the stage where payload acquisition and structuring happen, at the physical layer and before the payload is encapsulated through the data link layer and transmitted through the upper layers. We share this focus with several other investigations that are discussed in the second section of this chapter.

3.2 Payload Optimization Approaches

In this section, we review approaches that target the optimization of transmitted payloads with the aim of minimizing the resulting data traffic and reducing resource consumption resource-constrained
