Enhancing data privacy and security in Internet of Things through decentralized models and services


UNIVERSITÀ DEGLI STUDI DELL’INSUBRIA - VARESE

DiSTA

Dipartimento di Scienze Teoriche e Applicate

PhD THESIS

to obtain the title of

Doctor of Science

Specialty: Computer Science

Defended by

GÖKHAN SAĞIRLAR

ENHANCING DATA PRIVACY AND SECURITY IN INTERNET OF THINGS THROUGH DECENTRALIZED MODELS AND SERVICES

Advisor: Prof. Barbara Carminati

Advisor: Prof. Elena Ferrari

Defended on October 5, 2018

Jury:

Reviewers: Dr. Federica Paci - University of Southampton

Dr. Dan Lin - Missouri University of Science and Tech.

President: Dr. Elena Ferrari - University of Insubria

Examiners: Dr. Claudio Agostino Ardagna - University of Milan

Dr. Pierluigi Gallo - University of Palermo


In the name of God, the Most Gracious, the Most Merciful

The strength you need exists in the noble blood in your veins!

How happy is the one who says, "I am a Turk!"


Dedicated to my family, who are dearer to me than life...


Acknowledgement

I would like to express my deepest gratitude to my advisors Dr. Elena Ferrari and Dr. Barbara Carminati for their academic guidance. I have learnt a great deal about privacy, security, and how to do research from you. I will always be grateful to you for giving me the opportunity to take on the biggest challenge of my life so far, when I was a senior bachelor student. Your patience with my mistakes has always motivated and encouraged me to continue my research. I believe your guidance, through more than 1000 emails and over 100 meetings in the last 3 years, has transformed me from a motivated but confused rookie into a skilled professional with high potential and goals. I will also always be grateful that you let me work on different topics as I wished. I hope to continue learning from you throughout my career.

I would like to thank Dr. Emanuele Ragnoli for supporting my research activities and helping me improve my skills. I will always be thankful for your collaboration on my research and your trust in me. There are many things to do and many things to learn, and I will continue to do my best.

I would like to thank Dr. Pietro Colombo. I have learnt many things from you about technical matters, social matters, and convenient solutions to the countless questions I asked you. I will always envy your work ethic, discipline, and helpful attitude.

I thank my Ph.D. colleagues Bikash, Tu, Alberto, Stefania, and Zulfikar for the good times we spent together during my stay in Italy.


Abstract

Wearable devices tracking our fitness activities and health status, smart home technologies supporting home automation services, and smart city technologies improving the quality and performance of urban services are just some examples of Internet-connected “things” clearly proving that the Internet of Things (IoT) is already upon us, impacting our everyday lives. The promise of IoT is to make the world smarter, more profitable, more autonomous, more connected, and more efficient. So far, IoT has been applied to several environments: healthcare, manufacturing, retail, buildings, cities, automotive, transportation, energy, etc. Indeed, according to IHS Markit, as of 2018 the number of connected IoT devices had reached 27 billion.¹

However, the challenges posed by IoT have grown along with its popularity. Among others, challenges related to data privacy, security, and the limited decentralization of IoT systems introduce major threats to the future of IoT [140, 117]. Given that, in this thesis, we focus on data privacy and security issues in IoT under the decentralized model.

In the first part of the thesis, we focus on data privacy issues. In a typical IoT scenario, individuals' privacy can easily be violated due to the high volume of managed personal data. In particular, confidential information about individuals may be revealed to unauthorized parties, or the combination of different data may allow sensitive information about individuals to be inferred. In addition, in a decentralized IoT scenario, where IoT devices (i.e., smart objects) share data with each other, privacy protection is even more challenging, as it is harder to control how data are combined and used by smart objects when future operations on data are unknown. Therefore, the first challenge we take on in this thesis is enhancing data privacy in IoT with a user-centric model. First, we propose a privacy enforcement framework for centralized IoT systems. Then, we extend it to decentralized IoT systems, where compliance with users' individual privacy preferences is checked directly by smart objects.

Decentralization, if coupled with proper security mechanisms, would have many advantages over centralized infrastructures for IoT, such as, among others: better privacy guarantees for data owners, more resilient and secure systems, improved interoperability between services, and concerted and autonomous operations. Notably, blockchain is a promising decentralized platform due to its ability to achieve distributed consensus [128] and its intrinsic security features that ensure data integrity. Given that, we shift our focus to addressing issues related to the security and decentralization of IoT systems with blockchain-based systems. To this end, we first deal with security issues in IoT, as resource-constrained IoT devices do not employ strong security mechanisms and are easy targets for attackers. One of the most relevant attacks occurs when attackers exploit the vulnerabilities of IoT devices and compromise them to add them to botnets, that is, collections of compromised Internet computers controlled by attackers.

¹ cdn.ihs.com/www/pdf/IoT_ebook.pdf

Then, attackers use their botnets for malicious purposes, such as performing Distributed Denial of Service (DDoS) attacks. Moreover, to increase their attacks' chance of success and their resilience against defence mechanisms, modern botnets often have a decentralized P2P structure, which makes them harder to detect. To deal with this problem, we take a first step towards detecting P2P botnets in IoT by proposing AutoBotCatcher. AutoBotCatcher exploits a Byzantine Fault Tolerant (BFT) blockchain to perform collaborative and dynamic botnet detection by collecting and auditing IoT devices' network traffic flows as blockchain transactions. Second, we take on the challenge of decentralizing IoT and design a hybrid blockchain architecture for IoT, called Hybrid-IoT. In Hybrid-IoT, subgroups of IoT devices form PoW blockchains, referred to as PoW sub-blockchains. Connections among the PoW sub-blockchains rely on a BFT inter-connector framework. We focus on the formation of PoW sub-blockchains, guided by a set of guidelines based on a set of dimensions, metrics, and bounds.


List of Figures

2.1 An example blockchain with 4 blocks

4.1 Reference IoT platform

4.2 Example of data category tree

4.3 Example of purpose tree

4.4 Experiment 1 Results

4.5 Experiment 2 Results

5.1 Smart home scenario

5.2 Privacy enforcement by SO roles

5.3 Graph-based SQL query

5.4 Varying query complexity - Processor SO I

5.5 Varying query complexity - SO network - time overhead

5.6 Varying the number of sensing SOs in Processor SO II scenario

5.7 Varying the number of sensing SOs in Consumer SO scenario

5.8 Varying number of sensing SOs with associated a PP in Processor SO II scenario

5.9 Varying number of sensing SOs with associated a PP in Consumer SO scenario

6.1 Round and state relation

6.2 AutoBotCatcher system flow

7.1 Hybrid-IoT

7.2 CPU utilization of light peer devices in Performance Evaluation I

7.3 CPU utilization of full peer devices in Performance Evaluation I


List of Tables

3.1 Features of the considered state-of-art approaches. If the approach contains the feature: "✓".

3.2 Features of the considered state-of-art approaches. If the approach contains the feature: "✓".

4.1 Experiments configuration

5.1 Experiments: X - time overhead, X * time and bandwidth overhead

5.2 Varying query complexity - Processor SO II

5.3 Varying query complexity - Consumer SO

5.4 Varying query complexity - SO network - bandwidth overhead

5.5 Overhead by varying PP complexity

7.1 PoW Blockchain - IoT Integration Metrics

7.2 Evaluation I: Block sizes and block generation intervals

7.3 Evaluation II: Device Locations

7.4 Evaluation III: Number of IoT Devices - Experiment (A): Fixed difficulty setting

7.5 Evaluation III: Number of IoT Devices - Experiment (B): Fixed interval setting

7.6 Perf Eva II: Performance Statistics

7.7 Security Experiments' Results


Chapter 1 Introduction

The Internet, one of the greatest inventions of humankind, started out as a government-funded defense project in 1962 and evolved into ARPANET in 1969, which, significantly, opened the way to innovation. In the early 1990s, the Internet saw Tim Berners-Lee's invention of the World Wide Web (WWW) [18], which merged networked information retrieval and hypertext documents. With many more inventions over the last decades, the Internet has been a great success. Indeed, today the Internet has evolved into a huge network offering many services, such as e-commerce websites, online social networks, personal web blogs, news media sites, and so on.

As a tool, the Internet opened the way for engineers and researchers to create new technologies and inventions. In 1982, a group of graduate students came up with the idea of connecting their building's soda machine to the Internet, in order to check whether the machine was empty or the sodas were cold before walking to the machine to buy one.¹ As the famous English proverb² says, necessity is the mother of invention, and the Internet was the enabling tool for this one. According to many sources, this is considered the very first example of a new kind of device, belonging to what we call today the Internet of Things, or IoT for short. Yet, the term IoT was coined much later by British inventor Kevin Ashton³. Roughly, the term IoT refers to the network of physical objects, so-called things, such as sensors, RFID tags, or various kinds of physical devices that are able to communicate with each other, or with servers or the cloud, over the Internet. IoT transforms physical objects from traditional to smart by enabling them to see, hear, and perform tasks, and by letting them share information with each other [4]. Indeed, the promise of IoT is to make various environments smarter, more efficient, more connected, and more autonomous through intelligent decision making [4].

In the last decade, we have witnessed a great growth of IoT. Indeed, according to IHS Markit, as of 2018 the number of IoT devices is over 27 billion. Moreover, IoT devices have already become everyday objects in our lives, with wearable devices, smart home applications, smart cities, and so on. IoT has also been applied to several industries, such as food processing, agriculture, healthcare, environmental monitoring, transportation and logistics, mining production monitoring, security surveillance, and so on [42]. As suggested by the growth in the number of devices and use cases, and as also stated by the US National Intelligence Council [98], IoT is certainly one of today's disruptive technologies.

¹ ibm.com/blogs/industries/little-known-story-first-iot-device

² The proverb was first used by William Horman in his book Vulgaria, in Latin, as "Mater artium necessitas".

³ rfidjournal.com/articles/view?4986

Despite its potential, many use cases, rapid growth, and many futuristic ideas about its future, IoT also poses important challenges. Some of the key challenges are, among others, data privacy, security, and decentralization of IoT systems [140, 117, 4]. In this thesis, we deal with data privacy and security issues in IoT to help IoT reach its full potential. In doing so, we take decentralization as our main approach and model, as we believe it will play a fundamental role in future IoT systems. In the following, we describe the main motivations for dealing with data privacy and security issues in IoT, and why decentralization matters for IoT.

Data privacy issues in IoT

IoT is impacting our everyday lives with many applications that process personal and confidential data, such as wearable devices that track the fitness activities and health status of individuals. With the extensive number of IoT devices collecting and processing personal and confidential data, protecting individuals' privacy naturally arises as a major challenge. In fact, data privacy issues in IoT have been widely investigated in the literature [86, 103, 140, 117].

Moreover, there are data protection laws and regulatory frameworks, such as the European Union General Data Protection Regulation (GDPR) [106]. To ensure the protection of EU citizens' personal data, the GDPR introduces data protection principles and data subjects' rights [84]. As stated by the GDPR, data subjects should be given more transparency on how their data are processed, and they should be in charge of their personal data, i.e., any information related to an identifiable person.⁴

The first data privacy issue in IoT is the disclosure of personal information to unauthorized parties or data consumers. The second privacy issue arises from further processing of data: sensitive information about data owners may be inferred through data analytics processes (e.g., data joins, aggregations). For example, by joining and combining data related to movements, heartbeat, and breath rate, it is possible to infer possible psychological disorders due to insomnia.

Privacy issues get even more complicated when we consider a decentralized IoT scenario where devices are able to process and share data with each other. Indeed, in such a scenario, we have no prior knowledge of how data are going to be shared and processed. Therefore, the third data privacy issue is protecting data owners' privacy even in unknown future uses of their data.

For example, the walked distance and number of steps sensed by a smart watch can be combined to infer an individual's height with some approximation. By combining this height information with the weight sensed by a smart scale, one can then infer the individual's body mass index. As such, future operations performed over data may introduce additional privacy violations; thus, users' privacy should be enforced also in future uses of their data.
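The inference chain in this example needs only simple arithmetic, which is what makes it a realistic threat. The Python sketch below assumes the commonly quoted pedometer rule of thumb that stride length is roughly 0.415 times body height; this ratio, the function names, and the sample readings are illustrative assumptions, not values taken from this thesis.

```python
# Assumption (not from the thesis): stride length is roughly 0.415 x body height,
# a rule of thumb often used to calibrate pedometers.
STRIDE_TO_HEIGHT_RATIO = 0.415


def estimate_height_m(walked_distance_m: float, steps: int) -> float:
    """Approximate height from smart-watch data (walked distance and step count)."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    stride_m = walked_distance_m / steps      # average length of a single step
    return stride_m / STRIDE_TO_HEIGHT_RATIO  # invert the assumed stride/height ratio


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


# Hypothetical readings: smart watch reports 3735 m in 5000 steps, smart scale 81 kg.
height = estimate_height_m(3735.0, 5000)
bmi = body_mass_index(81.0, height)
print(f"estimated height: {height:.2f} m, BMI: {bmi:.1f}")  # estimated height: 1.80 m, BMI: 25.0
```

Neither device alone reveals the BMI; it only emerges once their data are combined, which is why a privacy check must consider combinations of data rather than single data items.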

⁴ eugdpr.org/the-regulation/gdpr-faqs

Security issues in IoT

Device vendors do not treat security as a primary goal when producing IoT devices [9], since they aim to release devices quickly to catch market trends and protect their business profits. However, this has made IoT an amplifying platform for cyberattacks, where malicious parties can easily take control of IoT devices [140, 117, 9]. Compromised IoT devices pose serious threats to the security of online services: for example, attackers may control compromised devices for malicious purposes and form malicious botnets, or steal confidential information stored by the devices. We focus on the malicious botnet threat due to its relevance and its huge destructive effects on victims. A malicious botnet is a collection of compromised Internet computers remotely controlled by attackers for malicious and illegal purposes, such as performing cyberattacks (e.g., Distributed Denial of Service (DDoS) attacks) [132]. Typically, attackers try to infect as many devices as possible in order to increase the power and effect of their attacks. Indeed, in 2016 the Mirai malware [8] infected many IoT devices to perform DDoS attacks generating an extensive amount of Internet traffic (more than 1 Tbps).

Why does decentralization matter?

Today, most IoT systems rely on centralized, cloud-based computing infrastructures. However, centralized IoT infrastructures present a number of drawbacks, among others: high cloud server maintenance costs, weak support for time-critical IoT applications, and security issues (e.g., Single Point of Failure (SPoF)). Moreover, in centralized systems, different parties (device vendors, service providers) have to trust each other or a third party in order to collaborate. This limits the interoperability between different IoT applications and services.

On the other hand, decentralization, if achieved, would have many advantages over centralized infrastructures. The most prominent outcome of decentralizing IoT is achieving distributed consensus among IoT devices which, if properly governed, improves the security of IoT systems and provides users with better privacy guarantees through effective data protection mechanisms. With that, IoT devices are able to perform concerted and autonomous operations, which increases the data utility of IoT systems thanks to enhanced data processing and information generation. Moreover, decentralized systems reduce both the amount of data transferred to the cloud for processing and the cloud maintenance costs. Last but not least, decentralization would be instrumental in improving the security and privacy of the managed data by assuring data security and accountability, and by eliminating the SPoF problem of centralized systems.

Decentralized IoT systems have to be able to process a high throughput of transactions and to scale to many peers when achieving consensus without a trusted central authority. Therefore, IoT decentralization requires frameworks that employ scalable and performant distributed consensus among peers. The lack of such frameworks has been a bottleneck against the successful decentralization of many domains, including IoT. However, the rise of Bitcoin [92], the peer-to-peer digital currency, has the potential to bring a paradigm shift in decentralization. In particular, the invention of blockchain as the underlying technology of Bitcoin has opened a way to overcome distributed consensus bottlenecks in decentralized settings for large-scale applications. Fundamentally, a blockchain is a distributed ledger maintained by a peer-to-peer network; it allows the peers of a P2P network to reach consensus without needing a central authority or pre-established trust among them. Moreover, blockchain employs cryptographic techniques, such as asymmetric cryptography and hashing, to ensure the integrity of the blockchain data structure (cf. Chapter 2 for background information on blockchain). Given that, we consider blockchain a promising decentralization platform, and we use blockchain technology as a tool in designing decentralized IoT systems.
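To make the integrity argument concrete, the following Python sketch links blocks by storing each predecessor's SHA-256 hash. It is a simplified illustration, not the design used in later chapters: it omits signatures, networking, and consensus entirely, and the class and function names are hypothetical. It only shows why altering an earlier record becomes detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Block:
    index: int
    prev_hash: str
    transactions: list
    block_hash: str = field(init=False)  # computed from the fields above

    def __post_init__(self) -> None:
        self.block_hash = self.compute_hash()

    def compute_hash(self) -> str:
        # Hash the block contents together with the previous block's hash,
        # so every block commits to the entire history before it.
        payload = json.dumps(
            {"index": self.index, "prev_hash": self.prev_hash,
             "transactions": self.transactions},
            sort_keys=True,
        ).encode()
        return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, transactions: list) -> None:
    prev_hash = chain[-1].block_hash if chain else "0" * 64  # genesis has no predecessor
    chain.append(Block(len(chain), prev_hash, transactions))


def is_valid(chain: list) -> bool:
    for i, block in enumerate(chain):
        if block.block_hash != block.compute_hash():
            return False  # block contents changed after the hash was computed
        if i > 0 and block.prev_hash != chain[i - 1].block_hash:
            return False  # link to the previous block is broken
    return True


chain: list = []
append_block(chain, [{"from": "sensor-1", "reading": 21.5}])
append_block(chain, [{"from": "sensor-2", "reading": 19.0}])
print(is_valid(chain))                      # True
chain[0].transactions[0]["reading"] = 99.9  # tamper with an early record
print(is_valid(chain))                      # False
```

Recomputing the hash of a tampered block does not help an attacker, because the next block still stores the old value; in a real blockchain, the consensus protocol is what prevents the attacker from rewriting all later blocks as well.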

1.1 Thesis Objective

Based on the motivations discussed above, we formulate our objective as follows:

OBJECTIVE: Establish and develop efficient and practical alternative architectures, frameworks, and models that enhance individuals' privacy in IoT scenarios and improve the security of IoT systems, especially under the decentralized model.

A notable challenge in enhancing data privacy and security in IoT under the decentralized model is developing lightweight and scalable decentralized solutions. In fact, decentralization comes at a cost, as it might introduce additional complexity and overhead, especially in cases that require distributed consensus. Therefore, in addition to the challenges of enhancing data privacy and security, in this thesis we also deal with the intrinsic challenge of building lightweight and scalable decentralized systems for IoT. To this end, we use blockchain technology as our decentralization tool.

1.2 Terminology

For the reader's convenience, we provide definitions of the key concepts used in this thesis. These definitions are mainly based on the common understanding in the literature, but they also reflect how the terms are used in this dissertation.

• Data: Information generated by IoT devices.

• Smart objects: IoT devices that are able not only to sense data, but also to process and aggregate data and to interact with other IoT devices.

• Privacy: Individuals' right to control how their data can be shared with others (e.g., third-party data consumers).

• Security: Protection of data from disclosure, alteration, destruction and loss [116].

• Data owner: Owner of the IoT device that generates data.


• Blockchain: A distributed data structure shared across the peers of a P2P network. The blockchain data structure is secured using cryptography to protect the integrity of its records. We provide background on blockchain technology in Chapter 2.

1.3 Main Contributions

In summary, this thesis provides the following main research contributions:

• A core framework to enforce users' privacy in centralized IoT systems with low overhead. For this framework, we design a novel privacy preference model that handles privacy with a user-centric approach. In particular, through their privacy preferences, users are able to state which portion of their personal data can be accessed and how these data can be combined, as well as what cannot be inferred from their data through any kind of analytics process. Experimental results show the efficiency of the proposed enforcement mechanism.

• An extension of the above-mentioned core framework to decentralized IoT smart objects, with low overhead. The novel problem tackled by this framework is privacy enforcement without a centralized reference monitor. To solve this problem, we leverage the privacy model developed for the core framework and extend it to deal with decentralized ecosystems. The enforcement mechanism, instead, relies on ad hoc-designed security metadata, called Privacy-Enhanced Attribute Schema (PEAS), attached to each piece of data to enable decentralized privacy enforcement. Compliance with privacy preferences is checked before data are released to third-party data consumers. The proposed framework has been tested on different scenarios, and the obtained results show the feasibility of our approach.

• Design of the first blockchain-based botnet detection architecture for IoT, called AutoBotCatcher. AutoBotCatcher performs dynamic and collaborative botnet detection and prevention for IoT using a dynamic community detection methodology. In AutoBotCatcher, a blockchain enables multiple parties to collaborate on botnet detection without needing to trust each other or a central server/database. Moreover, the blockchain is exploited to model the botnet detection process as a set of shared application states of the parties collaborating in botnet detection.

• A hybrid blockchain architecture to decentralize and secure IoT, called Hybrid-IoT. Hybrid-IoT exploits Proof of Work (PoW) blockchains to achieve distributed consensus on the operations performed by IoT devices. By virtue of that, in Hybrid-IoT, IoT devices are able to execute their operations collectively and autonomously by carrying out their machine-to-machine (M2M) communications in the form of blockchain transactions. This also guarantees the accountability and security of the stored data. To measure the performance of PoW blockchains (i.e., transaction throughput), we define a set of PoW blockchain-IoT integration metrics. We also provide a measurement study of the performance of PoW blockchains in IoT, subject to these integration metrics. Finally, we test the performance and security of the proposed approach.
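As a rough illustration of the user-centric preferences described in the first contribution, the Python sketch below checks a consumer request against a preference that states which data categories may be accessed, for which purposes, and which information must not be inferred. It is a hypothetical simplification: the class and function names, the inference rules, and the flat sets (the model in Chapter 4 organizes data categories and purposes into trees) are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical inference rules: combinations of data categories from which a
# sensitive piece of information could be derived (illustrative only).
INFERENCE_RULES = {
    frozenset({"movement", "heart_rate", "breath_rate"}): "sleep_disorder",
    frozenset({"walked_distance", "steps", "weight"}): "body_mass_index",
}


@dataclass(frozen=True)
class PrivacyPreference:
    allowed_categories: frozenset    # which portion of the owner's data may be accessed
    allowed_purposes: frozenset      # purposes for which the data may be used
    forbidden_inferences: frozenset  # information that must not be derived


def inferable(categories: frozenset) -> set:
    """Return every sensitive item derivable from the given set of categories."""
    return {info for combo, info in INFERENCE_RULES.items() if combo <= categories}


def complies(pref: PrivacyPreference, categories: frozenset, purpose: str) -> bool:
    """Check a consumer request (requested categories + purpose) against a preference."""
    if not categories <= pref.allowed_categories:
        return False  # asks for data the owner did not release
    if purpose not in pref.allowed_purposes:
        return False  # purpose not authorized by the owner
    # Even allowed categories must not be combined into a forbidden inference.
    return not (inferable(categories) & pref.forbidden_inferences)


pref = PrivacyPreference(
    allowed_categories=frozenset({"movement", "heart_rate", "breath_rate", "steps"}),
    allowed_purposes=frozenset({"fitness_tracking"}),
    forbidden_inferences=frozenset({"sleep_disorder"}),
)
print(complies(pref, frozenset({"heart_rate", "steps"}), "fitness_tracking"))  # True
print(complies(pref, frozenset({"movement", "heart_rate", "breath_rate"}), "fitness_tracking"))  # False
print(complies(pref, frozenset({"steps"}), "advertising"))  # False
```

The second request is rejected even though every category in it is individually allowed, which mirrors the point that users can state what must not be inferred, not only what may be accessed. The sketch does not model PEAS or where the check runs in the decentralized setting.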

1.4 Thesis Organization

The dissertation is organized into eight chapters, briefly described in the following:

Chapter 2: Background - Blockchain

In this chapter, we first provide background information on IoT architectures and protocols. Then, we describe the cryptographic techniques used in blockchains, present blockchain, and, finally, introduce the consensus problem and the consensus protocols employed by blockchains.

Chapter 3: Literature Review

We review the literature on proposals dealing with data privacy in IoT, botnet detection, and blockchain based systems for IoT.

Chapter 4: Enhancing User Privacy in IoT

In this chapter, we present the core framework to enforce user’s privacy in centralized IoT systems.

Chapter 5: Decentralizing Privacy Enforcement in IoT

In this chapter, we present the enhanced privacy enforcement framework, tailored to enforce users' privacy in decentralized IoT systems consisting of smart objects.

Chapter 6: Blockchain-based P2P Botnet Detection for IoT

In this chapter, we present AutoBotCatcher, a blockchain-based P2P botnet detection mechanism for IoT.

Chapter 7: Hybrid Blockchain Architecture for IoT

In this chapter, we present Hybrid-IoT, a hybrid blockchain architecture for IoT.

Chapter 8: Conclusions

In this chapter, we summarize the thesis by discussing its main arguments and contributions, and we outline future work.


1.5 Related Publications

The research activities described in this thesis have led to the following publications:

• Barbara Carminati, Elena Ferrari, Pietro Colombo, Gokhan Sagirlar, "Enhancing user control on personal data usage in Internet of things ecosystems", in 2016 IEEE International Conference on Services Computing (SCC), pp. 291-298, in San Francisco, USA;

• Gokhan Sagirlar, Barbara Carminati, Elena Ferrari, "Decentralizing Privacy Enforcement for Internet of Things Smart Objects" in Elsevier Computer Networks Journal, Volume 143, 9 October 2018, Pages 112-125;

• Gokhan Sagirlar, Barbara Carminati, Elena Ferrari, "AutoBotCatcher: Blockchain-based P2P Botnet Detection for the Internet of Things", in 2018 IEEE International Conference on Collaboration and Internet Computing (CIC), in Philadelphia, Pennsylvania, USA;

• Gokhan Sagirlar, Barbara Carminati, Elena Ferrari, John D. Sheehan, Emanuele Ragnoli, "Hybrid-IoT: Hybrid Blockchain Architecture for Internet of Things - PoW Sub-blockchains", in 2018 IEEE International Conference on Blockchain, in Halifax, Canada;

• Gokhan Sagirlar, Barbara Carminati, Elena Ferrari, John D. Sheehan, Emanuele Ragnoli, "Hybrid-IoT: Scalable and Hybrid Blockchain Architecture for Internet of Things", under preparation.


Chapter 2 Background

The work conducted in this thesis focuses on enhancing data privacy and security of IoT systems under the decentralized model. Given that, in this chapter, we provide background information on architecture models and protocols for IoT (i.e., Section 2.1). Moreover, throughout the thesis, we exploited different technologies and tools. The most significant one is blockchain technology, which we use to design a P2P botnet detection framework (Chapter 6) and a hybrid blockchain architecture for IoT (Chapter 7). Therefore, in this chapter (i.e., Section 2.2), we also provide background information on blockchain technology.

2.1 IoT Architectures and Protocols

In this section, we provide an overview of architecture models for IoT systems in Section 2.1.1 and of protocols commonly used in IoT in Section 2.1.2.

2.1.1 IoT Architectures

Despite some efforts to define a reference IoT architecture (e.g., IoT-A [31]), there is no de facto architecture model for IoT. Yet, by reviewing relevant works in the literature, such as [138, 11, 73, 57, 4], we identify common patterns in IoT architectures. The basic IoT architecture is the five-layer model presented in [4] and [73], which includes the following layers:

• Object (Perception) Layer: where IoT devices reside.

• Object Abstraction (Network) Layer: where IoT data are securely transferred (e.g., via ZigBee, RFID) and managed by cloud computing or data management systems.

• Service Management (Middleware) Layer: where data generated by heterogeneous IoT devices are processed, decisions are made, and services are paired and delivered to the requesters.

• Application Layer: where services are provided to customers.

• Business Layer: where the overall system is managed according to a business model.

An important aspect in discussing architecture models for IoT is the set of elements required to deliver the functionality of IoT systems. According to [4], the six main elements of IoT are:

• Identification: naming and matching IoT services and devices;

• Sensing: gathering data from the physical world within the network;

• Communication: connecting IoT devices together securely to deliver smart services;

• Computation: processing IoT data; it represents the brain of the IoT application;

• Services: the services that IoT applications offer to customers or to other services;

• Semantics: the ability to extract knowledge smartly to provide the required services.

There are two types of IoT architectures: centralized and decentralized. They differ mainly in how they handle the computation, services, and communication elements of IoT systems. In the following, we provide background information on centralized and decentralized IoT architectures.

Centralized IoT Architectures

In centralized IoT architectures, IoT devices are linked to a central hub, such as a server or the cloud, which provides backend services to smart devices [141]. In such architectures, the main task of IoT devices is sensing data from the physical world around them and sharing the sensed data with the central hub. The central hub performs the operations related to the computation, services, and semantics elements of IoT systems, such as real-time analysis, event processing and management, data management, and decision making. To sum up, in centralized IoT architectures the role of IoT devices is limited, and operations that require processing and data storage are left to the central hub. In the following, we briefly discuss cloud computing-based IoT systems as an example of centralized IoT architectures.

Cloud-based IoT Systems. The amount of data generated by IoT systems is huge and is often referred to as big data. Many IoT scenarios, such as smart grids and health monitoring, require real-time analytics, processing, and decision making on such big data. Cloud computing offers a new way to manage and process big data generated in IoT scenarios [4], and currently the vast majority of IoT applications are based on cloud solutions. In cloud-based architectures, data sensed and sent by IoT devices are pooled in a single cloud infrastructure or in geographically distributed ones, which process the data generated by IoT devices, generate services for IoT applications, take decisions according to the stored data, and provide services to users and customers on demand. Many cloud platforms are in use today, such as Amazon Web Services¹ and IBM Watson IoT². In addition, the academic literature offers several works proposing cloud-based IoT systems, such as [2, 53, 74, 95, 90], where cloud computing has been used to store, process, and manage IoT data for various application scenarios.

Despite their wide usage, centralized systems have many drawbacks for IoT. The drawbacks most relevant to this thesis are the data privacy and security issues of centralized IoT systems discussed in Chapter 1.

Decentralized IoT Architectures

Decentralized IoT architectures are essentially decentralized systems of cooperating smart objects. Such smart objects are able not only to sense data, but also to interact with other objects and to aggregate data sensed through different sensors. This allows smart objects to create new knowledge locally, which can be used to make decisions, such as quickly triggering actions on the environment if needed. Smart objects are very heterogeneous in terms of data sensing and data processing capabilities: some can only sense data, while others can perform basic or complex operations on them. Such a scenario enacts the transition from the Internet of Things to the Internet of Everything, a new definition of IoT seen as a loosely coupled, decentralized system of cooperating smart objects, which leverages alternative architectural patterns with respect to the centralized cloud-based one. Unlike in centralized IoT architectures, the computation and service elements of IoT platforms are also handled by the smart objects themselves: devices perform some operations over sensed data and take decisions, depending on their computation and storage capabilities. The communication element is also handled differently than in centralized architectures, since smart objects share their data with other smart objects, possibly autonomously, provided that they are able to convert and understand each other's communication protocols (e.g., with the help of a middleware layer). In the following, we briefly discuss fog computing as an example of decentralized IoT architectures.

¹ aws.amazon.com/iot

² ibm.com/internet-of-things

Fog computing. Fog computing is a distributed computing paradigm that extends the cloud to the edge of the network [21], where a large number of heterogeneous, decentralized and ubiquitous IoT devices and gateways communicate and cooperate with each other to perform computation and storage tasks [129]. Fog computing is not an alternative to cloud computing; it is a supplementary paradigm that improves the localization of services and reduces the amount of data transferred to the cloud, thus acting as a bridge between smart devices and large-scale cloud computing and storage services [4]. Specifically, in fog computing, devices at different hierarchical levels are equipped with "intelligence" to determine whether an application request requires the intervention of the cloud computing tier or not. Fog computing also has the potential to reduce the delay in delivering services to end users, since services are localized in close proximity to them. In fact, as measured in [113], as the number of latency-sensitive IoT services increases, fog computing outperforms cloud computing by decreasing the overall latency of the services. Specifically, in [113] Sarkar et al. showed that, in an environment where 50% of applications request real-time services, fog computing reduces the overall service latency by 50.09% compared to cloud computing. Fog computing plays an important role in many IoT scenarios, such as connected vehicles, smart grids, and wireless sensor and actuator networks [21]. Additionally, the academic literature offers interesting proposals that make use of fog computing in IoT applications, such as [126, 62, 1].
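The decision of whether a request can be served at the fog layer or must be escalated to the cloud tier can be illustrated with a minimal Python sketch. The class names, the abstract capacity measure (`cpu_units`) and the latency figures are hypothetical assumptions chosen for illustration; they are not taken from [113] or [21].

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    max_latency_ms: float  # latency budget of the application request
    cpu_units: int         # hypothetical measure of the computation it needs


@dataclass
class FogNode:
    free_cpu_units: int
    local_latency_ms: float = 5.0    # assumed round trip to a nearby fog node
    cloud_latency_ms: float = 120.0  # assumed round trip to the remote cloud

    def dispatch(self, req: Request) -> str:
        """Serve the request locally if possible, otherwise escalate it to the cloud tier."""
        fits_locally = req.cpu_units <= self.free_cpu_units
        if fits_locally and self.local_latency_ms <= req.max_latency_ms:
            self.free_cpu_units -= req.cpu_units  # reserve local resources
            return "fog"
        if self.cloud_latency_ms <= req.max_latency_ms:
            return "cloud"
        return "reject"  # neither tier can meet the latency budget


if __name__ == "__main__":
    node = FogNode(free_cpu_units=4)
    print(node.dispatch(Request("brake-alert", max_latency_ms=20, cpu_units=2)))        # fog
    print(node.dispatch(Request("video-analytics", max_latency_ms=1000, cpu_units=8)))  # cloud
    print(node.dispatch(Request("collision-warning", max_latency_ms=30, cpu_units=8)))  # reject
```

In this toy policy, latency-sensitive requests stay at the edge while heavy but delay-tolerant work is pushed to the cloud, which mirrors the bridging role described above.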

Given the increasing storage, communication and computing capabilities of smart objects, in line with Moore's Law, we can expect the number of decentralized IoT systems to increase [125]. Despite their potential and benefits, as described in Chapter 1, decentralized systems face many new challenges in assuring data privacy and security, which are addressed in this thesis.

2.1.2 IoT Protocols

Due to the variety of protocols adopted by available IoT devices, IoT is a very heterogeneous domain. Therefore, in order to increase the interoperability of IoT services and applications, various working groups and consortia from organizations such as the Institute of Electrical and Electronics Engineers (IEEE), the European Telecommunications Standards Institute (ETSI), the ITU Telecommunication Standardization Sector (ITU-T) and the Internet Engineering Task Force (IETF) are working to define standardized protocols that bridge the gaps between different protocols. In the following we present some prominent examples of such standardization efforts, namely the Constrained Application Protocol (CoAP), Message Queue Telemetry Transport (MQTT), the Internet Protocol version 6 (IPv6) and IPv6 over Low-Power Wireless Personal Area Networks (6LoWPAN).

The Constrained Application Protocol (CoAP)

The IETF Constrained RESTful Environments (CoRE) working group designed CoAP [115, 22] with the aim of making the Representational State Transfer (REST) paradigm of the Hypertext Transfer Protocol (HTTP) available to constrained IoT devices and networks. Accordingly, the CoAP protocol stack is similar to, but less complex than, the HTTP protocol stack [22], and, like HTTP, CoAP is an application layer protocol. Since it shares the REST architecture with HTTP, CoAP can easily interact with several device types. In order to reduce the complexity of HTTP, CoAP uses the User Datagram Protocol (UDP) as its transport layer protocol rather than the Transmission Control Protocol (TCP) used by HTTP.
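To show how compact CoAP messages are compared with textual HTTP requests, the following Python sketch encodes a confirmable GET request following the binary message format of the CoAP specification (RFC 7252): a 4-byte fixed header (version, type, token length, code, message ID), the token, and one Uri-Path option per path segment. The function name `coap_get` is hypothetical, and the extended option encodings needed for long option values are deliberately left out.

```python
import struct

COAP_VERSION = 1
TYPE_CON, TYPE_NON, TYPE_ACK, TYPE_RST = 0, 1, 2, 3
CODE_GET = 0x01          # request code 0.01
OPTION_URI_PATH = 11     # Uri-Path option number


def coap_get(message_id: int, token: bytes, path: str) -> bytes:
    """Build a confirmable CoAP GET: 4-byte header, token, then Uri-Path options."""
    if len(token) > 8:
        raise ValueError("CoAP tokens are at most 8 bytes")
    first_byte = (COAP_VERSION << 6) | (TYPE_CON << 4) | len(token)
    msg = struct.pack("!BBH", first_byte, CODE_GET, message_id) + token
    last_option = 0
    for segment in path.strip("/").split("/"):
        value = segment.encode("utf-8")
        delta = OPTION_URI_PATH - last_option  # options are delta-encoded
        # This sketch only supports deltas and lengths below 13 (single-nibble form).
        if delta >= 13 or len(value) >= 13:
            raise ValueError("extended option encoding not implemented in this sketch")
        msg += bytes([(delta << 4) | len(value)]) + value
        last_option = OPTION_URI_PATH
    return msg


if __name__ == "__main__":
    packet = coap_get(0x1234, b"\x01\x02", "/sensors/temp")
    print(len(packet), packet.hex())
```

The resulting 19-byte datagram would be sent over UDP (default port 5683); for comparison, the request line alone of an equivalent HTTP/1.1 request, `GET /sensors/temp HTTP/1.1`, is already 26 bytes long.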

On top of UDP, CoAP handles the asynchronous nature of interactions as well as request/response interactions [115]. Moreover, for applications that require security, CoAP can be used on top of Datagram Transport Layer Security (DTLS), which protects the confidentiality and integrity of message contents [22]. A device may adopt different security levels, divided into four modes [115]: NoSec mode, where DTLS is disabled and there is no protocol-level security; PreSharedKey mode, where DTLS is enabled and there is a list of pre-shared keys (each key includes a list of the nodes it can be used to communicate with); RawPublicKey mode, where DTLS is enabled and the device has an asymmetric key pair without a certificate; and Certificate mode, where DTLS is enabled and the device has an asymmetric key pair with a certificate signed by some common trust root. CoAP is also capable of carrying different types of payload and supports different data formats such as XML and JSON.
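The four security modes can be summarized as a small configuration table. The sketch below is a hypothetical Python encoding of that list; the only additional fact it relies on is that, per RFC 7252, plain `coap` and DTLS-secured `coaps` endpoints use UDP ports 5683 and 5684 by default.

```python
from enum import Enum


class CoapSecurityMode(Enum):
    """Security modes of a CoAP device: value = (DTLS enabled?, credential held by the device)."""
    NO_SEC = (False, "none")
    PRE_SHARED_KEY = (True, "list of pre-shared keys, each bound to the nodes it may talk to")
    RAW_PUBLIC_KEY = (True, "asymmetric key pair without a certificate")
    CERTIFICATE = (True, "asymmetric key pair with a certificate from a common trust root")

    @property
    def dtls_enabled(self) -> bool:
        return self.value[0]

    @property
    def credential(self) -> str:
        return self.value[1]


def default_port(mode: CoapSecurityMode) -> int:
    """Default UDP port: 5683 for coap://, 5684 for coaps:// (CoAP over DTLS)."""
    return 5684 if mode.dtls_enabled else 5683


if __name__ == "__main__":
    for mode in CoapSecurityMode:
        print(f"{mode.name:15} DTLS={mode.dtls_enabled!s:5} port={default_port(mode)}")
```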

Message Queue Telemetry Transport (MQTT)

MQTT is a lightweight publish/subscribe messaging protocol that was invented in 1999 by Andy Stanford-Clark of IBM and Arlen Nipper of Arcom [130]; it was submitted to OASIS in 2013 and became an OASIS standard (version 3.1.1) in 2014. MQTT is designed for constrained devices and is suitable for low-bandwidth, unreliable and high-latency networks.³ MQTT's publish/subscribe model has three main components: publisher, subscriber and broker [4]. IoT devices acting as subscribers subscribe to the topics they are interested in, which are published by publishers. When a publisher publishes a message, the broker forwards it to every IoT device that has subscribed to the topic of the published message.
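The broker-mediated fan-out can be sketched in a few lines of Python. `ToyBroker` is a hypothetical in-memory stand-in, not an MQTT implementation: there is no networking, QoS, or session state. Topic filters do follow MQTT's wildcard rules, where `+` matches exactly one topic level and a trailing `#` matches any number of remaining levels (including none); the special treatment of topics starting with `$` is omitted.

```python
from collections import defaultdict
from typing import Callable


def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT-style matching: '+' matches one level, a final '#' matches all remaining levels."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return True  # '#' is always last; it also matches the parent level itself
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


class ToyBroker:
    """Keeps subscriptions and forwards each published message to matching subscribers."""

    def __init__(self) -> None:
        self.subscriptions = defaultdict(list)  # topic filter -> list of callbacks

    def subscribe(self, topic_filter: str, callback: Callable[[str, bytes], None]) -> None:
        self.subscriptions[topic_filter].append(callback)

    def publish(self, topic: str, payload: bytes) -> int:
        """Deliver the message; subscribers do not reply. Returns the number of deliveries."""
        delivered = 0
        for topic_filter, callbacks in self.subscriptions.items():
            if topic_matches(topic_filter, topic):
                for callback in callbacks:
                    callback(topic, payload)
                    delivered += 1
        return delivered


if __name__ == "__main__":
    broker = ToyBroker()
    broker.subscribe("home/+/temp", lambda t, p: print("thermostat got", t, p))
    broker.subscribe("home/#", lambda t, p: print("logger got", t, p))
    print(broker.publish("home/kitchen/temp", b"21.5"))  # 2
```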

³ mqtt.org/faq

Unlike CoAP, MQTT works on top of the TCP/IP protocol suite. For applications that require security measures, MQTT connections can be complemented with Transport Layer Security (TLS). Even though MQTT runs on TCP/IP rather than on the more lightweight UDP, it is designed to have low overhead and, thanks to its publish/subscribe model, subscribers do not respond to the messages they receive on the topics they have subscribed to [68]. Therefore, less network bandwidth and fewer device resources are used. Moreover, as it runs over TCP/IP, it provides some assurance of delivery, even in unreliable networks.
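The low overhead can be made concrete by encoding a PUBLISH packet. The following Python sketch assumes MQTT version 3.1.1 and QoS 0 (no packet identifier, DUP and RETAIN flags cleared): the fixed header is a single control byte (0x30) followed by the "remaining length", encoded in 1 to 4 bytes that each carry 7 data bits plus a continuation bit. The function names are illustrative, not part of any MQTT library.

```python
import struct


def encode_remaining_length(n: int) -> bytes:
    """MQTT variable-length integer: 7 data bits per byte, high bit set if more bytes follow."""
    if not 0 <= n <= 268_435_455:
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80  # continuation bit
        out.append(byte)
        if n == 0:
            return bytes(out)


def publish_qos0(topic: str, payload: bytes) -> bytes:
    """Build a QoS 0 PUBLISH packet: fixed header, length-prefixed topic, raw payload."""
    topic_bytes = topic.encode("utf-8")
    variable_header = struct.pack("!H", len(topic_bytes)) + topic_bytes
    body = variable_header + payload
    return bytes([0x30]) + encode_remaining_length(len(body)) + body


if __name__ == "__main__":
    packet = publish_qos0("a/b", b"hi")
    print(len(packet), packet)  # 9 b'0\x07\x00\x03a/bhi'
```

For a short message such as `hi` on topic `a/b`, the whole packet is 9 bytes, of which only 2 belong to the fixed header.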

The Internet Protocol version 6 (IPv6)

IPv6 was developed by the IETF as a new version of the Internet Protocol, intended as the successor of IPv4.

IPv6 was standardized by the IETF as an Internet Standard in July 2017 with the publication of RFC 8200. In order to solve the problems related to the depletion of unallocated addresses in the IPv4 address space, IPv6 expands the address size from the 32 bits of IPv4 to 128 bits, and is thus able to support a much greater number of addresses and more levels of addressing hierarchy [44]. In addition, in order to reduce the bandwidth consumption of the IPv6 header and the packet processing cost, some of the fields included in the IPv4 header have been dropped or made optional. By using the Security Architecture for the Internet Protocol (IPsec) defined in RFC 4301 [72], the integrity and confidentiality of IPv6 packets can be protected. IPv6 packets can also be protected by upper layer protocols such as TLS and Secure Shell (SSH) [44].
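The difference in address size (32 versus 128 bits) can be checked with Python's standard `ipaddress` module. This snippet only illustrates the address formats; it says nothing about the behaviour of an IPv6 stack. The addresses used come from the documentation ranges 192.0.2.0/24 and 2001:db8::/32.

```python
import ipaddress

# Size of the whole address space of each protocol version.
ipv4_space = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_space = ipaddress.ip_network("::/0").num_addresses
print(ipv4_space == 2**32, ipv6_space == 2**128)  # True True

# Address lengths are measured in bits.
print(ipaddress.IPv4Address("192.0.2.1").max_prefixlen)    # 32
print(ipaddress.IPv6Address("2001:db8::1").max_prefixlen)  # 128

# Two textual forms of the same 128-bit address.
addr = ipaddress.ip_address("2001:0db8:0000:0000:0000:0000:0000:0001")
print(addr.compressed)   # 2001:db8::1
print(addr.exploded)     # 2001:0db8:0000:0000:0000:0000:0000:0001
print(len(addr.packed))  # 16 (bytes on the wire)
```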

IPv6 over Low-Power Wireless Personal Area Networks (6LoWPAN)

In 2007, the IETF 6LoWPAN working group defined the 6LoWPAN protocol to enable IPv6 packets to be carried on top of low-power wireless networks, specifically IEEE 802.15.4, in order to apply the Internet Protocol (IP) to constrained and small devices [89]. Thanks to 6LoWPAN, existing network architectures can be reused, and constrained devices can easily connect to other IP-based networks without proxies and translation gateways [114]. For complexity and performance reasons, the most common transport protocol used with 6LoWPAN is UDP [114]. In 6LoWPAN, the need for configuration servers such as DHCP and for NAT is eliminated. Moreover, 6LoWPAN implementations can easily fit into 32 KB of flash memory [89]. Finally, in 6LoWPAN the overhead of the most common packets is much lower than in other protocols [89].
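One reason a DHCP server is not needed is stateless address autoconfiguration: RFC 4944 forms the interface identifier of an IEEE 802.15.4 node from its 64-bit extended address (EUI-64) by inverting the universal/local bit, and the node can combine it with the link-local prefix fe80::/64 on its own. The Python sketch below assumes an example EUI-64 value; the function name is hypothetical.

```python
import ipaddress


def link_local_from_eui64(eui64: str) -> ipaddress.IPv6Address:
    """Derive an IPv6 link-local address from a 64-bit IEEE extended address."""
    raw = bytearray(bytes.fromhex(eui64.replace(":", "")))
    if len(raw) != 8:
        raise ValueError("expected a 64-bit EUI-64 address")
    raw[0] ^= 0x02  # invert the universal/local bit to obtain the interface identifier
    prefix = bytes.fromhex("fe80000000000000")  # fe80::/64 link-local prefix
    return ipaddress.IPv6Address(prefix + bytes(raw))


if __name__ == "__main__":
    print(link_local_from_eui64("00:12:4b:00:01:02:03:04"))  # fe80::212:4b00:102:304
```

Because the prefix is well known and the identifier comes from the radio's own hardware address, no configuration server has to be contacted to obtain this address.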

2.2 Blockchain Technology

The term blockchain was first used to define the underlying technology of the first digital currency (aka cryptocurrency), called Bitcoin⁴. Bitcoin was proposed in 2008 by a person or group using the pseudonym Satoshi Nakamoto [92]. The main invention of Bitcoin was its ability to perform money transfers without relying on any trust relation.

In essence, a blockchain is a cryptographically secure distributed data structure shared across the peers of a p2p network, where no trust among peers is required to achieve consensus between them.

⁴ bitcoin.org

In this section we provide an overview of blockchain technology, specifically focusing on the parts that will be needed later. First, in order to lay the foundation of important concepts, we provide background information on the cryptographic techniques used by blockchains in Section 2.2.1. Then, in Section 2.2.2, we elaborate on blockchain technology. In Section 2.2.3 we introduce the consensus problem and discuss the consensus methodologies of blockchains. Finally, in Section 2.2.4, we present three blockchain platforms relevant to this thesis.

2.2.1 Cryptography

Cryptography, a discipline with a long history, is the study of mathematical techniques related to information security (e.g., data integrity, authentication, data confidentiality) [71]. Cryptographic techniques lay the foundation for many modern security tools, such as encryption techniques and digital signatures. From the point of view of distributed systems, cryptography is probably the key enabling technology for protecting their security [6]. In fact, as a distributed system, blockchain technology makes extensive use of cryptographic techniques. Thus, let us introduce some of the most fundamental cryptography concepts relevant to blockchain technology, namely asymmetric cryptography, digital signatures, and hash functions.

Asymmetric cryptography (Public key cryptography)

Asymmetric cryptography algorithms use a pair of keys, namely a public key and a private key, which are mathematically linked so that each key can undo or verify the operation performed with the other, enabling two counterpart cryptographic operations (e.g., encryption and decryption) [116]. The private key is the secret component of the asymmetric key pair, meant to be known only by its owner. The public key, on the other hand, is the public component of the pair, which the owner can disclose to any party she wishes. As an example use case of asymmetric cryptography relevant to blockchains, let us consider encryption and decryption. Assume that a user U₁ wishes to send a message m to a user U₂ securely, that is, by assuring the confidentiality of the message.

Let us also assume that U₁ holds the asymmetric key pair Pb₁ (public key) and Pr₁ (private key), and U₂ holds Pb₂ and Pr₂. In order to send m securely, U₁ encrypts m with the receiver's public key Pb₂, using an encryption algorithm that U₂ is aware of (e.g., the RSA algorithm [107]), and then sends the encrypted message to U₂. The property that ensures the confidentiality of the message is that the only way to decrypt it is by using Pr₂, which we assume only U₂ has.
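The U₁/U₂ exchange can be traced with textbook RSA in Python. The tiny primes (p = 61, q = 53), the public exponent 17 and the message m = 65 are illustrative assumptions; textbook RSA without padding and with such small keys is insecure, and real deployments use keys of 2048 bits or more together with a padding scheme such as OAEP.

```python
def make_textbook_rsa(p: int, q: int, e: int):
    """Return ((n, e), (n, d)): a public key and the matching private key."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return (n, e), (n, d)


def encrypt(public_key, m: int) -> int:
    n, e = public_key
    if not 0 <= m < n:
        raise ValueError("message must be an integer in [0, n)")
    return pow(m, e, n)


def decrypt(private_key, c: int) -> int:
    n, d = private_key
    return pow(c, d, n)


# U2 generates its key pair and publishes Pb2; Pr2 stays secret.
pb2, pr2 = make_textbook_rsa(61, 53, 17)
c = encrypt(pb2, 65)    # U1 encrypts m = 65 with U2's public key
print(c)                # 2790
print(decrypt(pr2, c))  # 65, recoverable only with Pr2
```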

Digital signatures

In the above scheme, if U₁ also applies his private key Pr₁ to m (i.e., signs it), the authenticity and integrity of the message are also ensured, as U₂ can check it using the public key Pb₁. Thus, she can be sure that the message has been generated by U₁ and has not been modified on the way, since producing a valid signature for a modified message would require Pr₁. This is an example of the methodology known as digital signatures, which is an application of asymmetric cryptography.

The basic idea of digital signatures is that they can be created by only one party, but can be verified by everyone [6].
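A hash-then-sign variant with the same toy RSA parameters illustrates why only the holder of Pr₁ can produce a valid signature while anyone holding Pb₁ can check it. The key values are textbook assumptions, and reducing the SHA-256 digest modulo n is a simplification needed only because n is tiny; deployed schemes (e.g., RSA-PSS, or the ECDSA signatures used by Bitcoin) differ in these details.

```python
import hashlib

# Toy key pair of U1 (textbook values from p = 61, q = 53): Pb1 = (N, E), Pr1 = (N, D).
N, E, D = 3233, 17, 2753


def digest_mod_n(message: bytes, n: int = N) -> int:
    """Hash the message and map the digest into [0, n) (only needed because n is tiny)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes, n: int = N, d: int = D) -> int:
    """Only the holder of the private exponent d can compute this value."""
    return pow(digest_mod_n(message, n), d, n)


def verify(message: bytes, signature: int, n: int = N, e: int = E) -> bool:
    """Anyone with the public key (n, e) can check the signature."""
    return pow(signature, e, n) == digest_mod_n(message, n)


if __name__ == "__main__":
    m = b"transfer 5 units from U1 to U2"
    s = sign(m)
    print(verify(m, s))            # True
    print(verify(m, (s + 1) % N))  # False: an altered signature is rejected
```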

Hash functions

Essentially, hashing refers to mapping larger domains to smaller ranges [71]. In cryptography, hash functions are used to ensure data integrity by mapping a variable-length string to a fixed-length string [116]. Since the input domain is larger than the output range, collisions (distinct inputs with the same hashed output) necessarily exist; a good cryptographic hash function is therefore collision-resistant, meaning that it is computationally infeasible to find such a pair of inputs. In this thesis, we assume that the considered hashing algorithms are collision-free in this practical sense. Today, algorithms belonging to the Secure Hash Algorithm (SHA) family, such as SHA-2, are widely used by blockchain protocols such as Bitcoin. Hash functions are extensively used in blockchain systems to ensure the integrity of the data structure by linking blocks to each other in a chain structure (explained in detail below). In addition, Proof of Work blockchains (explained in detail below) use hash functions as proof of work functions.
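Python's standard `hashlib` shows the fixed-length property of SHA-256, a member of the SHA-2 family: inputs of any size map to a 256-bit digest, and a one-byte change in the input yields an unrelated digest.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """SHA-256 digest as 64 hexadecimal characters (256 bits)."""
    return hashlib.sha256(data).hexdigest()


short_digest = sha256_hex(b"abc")
long_digest = sha256_hex(b"abc" * 10_000)  # 30,000-byte input

print(short_digest)  # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(len(short_digest), len(long_digest))  # 64 64
print(sha256_hex(b"abd") == short_digest)   # False
```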

2.2.2 Blockchain

Blockchain relies on the concept of a distributed ledger maintained by a peer-to-peer network [128]. The novelty of blockchain technology lies in its ability to achieve coordination and verification of individual activities carried out by different parties without a centralized authority or trusted third party, which allows the decentralization of application execution through concerted and autonomous operations.

In a blockchain, transactions transfer information (i.e., data packets) between peers. Each transaction has a unique identifier (transaction-id) and input data, and transactions are bundled into data chunks referred to as blocks. Block generator peers of the blockchain broadcast blocks by exploiting public-key cryptography. Blocks are recorded in the blockchain in a strict order.

Briefly, a block contains: a set of transactions; a timestamp; a reference to the preceding block that identifies the block's place in the blockchain; and an authenticated data structure (e.g., a Merkle tree) to ensure block integrity.⁵ The block height is the block's distance from the genesis block (the first block of the chain), which has height 0. An example blockchain containing four blocks is presented in Figure 2.1.
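The block structure described above, and a four-block chain like the one in Figure 2.1, can be sketched in Python. This is a simplified, hypothetical model: the header is serialized with JSON, each transaction is hashed once with SHA-256 to form a Merkle leaf, and an odd node on a Merkle level is paired with itself as in Bitcoin; real block formats differ across protocols.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def compute_merkle_root(transactions: List[str]) -> str:
    """Hash transactions pairwise, level by level, until a single root remains."""
    level = [sha256(tx.encode()) for tx in transactions] or [sha256(b"")]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


@dataclass
class Block:
    height: int               # distance from the genesis block (height 0)
    prev_hash: str            # reference to the preceding block
    transactions: List[str]
    timestamp: float = field(default_factory=time.time)

    def block_hash(self) -> str:
        header = [self.height, self.prev_hash, compute_merkle_root(self.transactions), self.timestamp]
        return sha256(json.dumps(header).encode()).hex()


def append_block(chain: List[Block], transactions: List[str]) -> None:
    tip = chain[-1]
    chain.append(Block(tip.height + 1, tip.block_hash(), transactions))


def is_valid(chain: List[Block]) -> bool:
    """Every block must point to its predecessor's hash and extend its height by one."""
    return all(
        cur.prev_hash == prev.block_hash() and cur.height == prev.height + 1
        for prev, cur in zip(chain, chain[1:])
    )


if __name__ == "__main__":
    chain = [Block(0, "0" * 64, ["genesis"])]
    for txs in (["tx1", "tx2"], ["tx3"], ["tx4", "tx5", "tx6"]):
        append_block(chain, txs)
    print(len(chain), is_valid(chain))       # 4 True
    chain[1].transactions[0] = "tx1-forged"  # tamper with a recorded transaction
    print(is_valid(chain))                   # False: block 2 no longer matches block 1's hash
```

Changing any transaction changes the Merkle root, hence the block hash, which breaks the reference stored in the following block.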

Blockchains can be classified into two groups, public and permissioned, according to the way they regulate peers' participation in blockchain operations. In public blockchains, any peer can read from and write to the blockchain, meaning that anyone can participate in the consensus process. In permissioned blockchains, instead, only a set of previously identified peers can write to the blockchain and participate in the consensus, while read rights may be public or limited to pre-identified peers.

⁵ Block structure varies across blockchain protocols; here we list the most common elements.


Figure 2.1: An example blockchain with 4 blocks

Stale blocks

Blockchains may have forks, that is, different branches in the chain structure, due to malicious manipulations or propagation delays. The longest fork of the blockchain is accepted as the main branch and remains the agreed-upon one [128] (for the rest of the thesis we refer to blocks included in the main branch as genuine blocks). In general, blocks in shorter forks are not included in the blockchain; they are referred to as stale blocks, and the transactions in stale blocks are considered unprocessed by the network.
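The longest-branch rule and the resulting stale blocks can be sketched in Python over a block tree given as parent pointers. Block identifiers are hypothetical, and ties between equally long branches are broken arbitrarily here (by insertion order). Bitcoin, for instance, selects the branch with the most cumulative proof of work, which coincides with the longest branch when difficulty is constant.

```python
from typing import Dict, List, Optional, Tuple


def split_main_and_stale(parents: Dict[str, Optional[str]]) -> Tuple[List[str], List[str]]:
    """parents maps block id -> parent id (None for the genesis block).

    Returns the main branch (genesis to tip) and the sorted list of stale blocks.
    """
    heights: Dict[str, int] = {}

    def height(block: Optional[str]) -> int:
        # Walk up to the nearest block with a known height, then fill in the path.
        path = []
        while block is not None and block not in heights:
            path.append(block)
            block = parents[block]
        h = -1 if block is None else heights[block]
        for b in reversed(path):
            h += 1
            heights[b] = h
        return h

    tip = max(parents, key=height)  # tip of the longest branch
    main: List[str] = []
    block: Optional[str] = tip
    while block is not None:
        main.append(block)
        block = parents[block]
    main.reverse()
    stale = sorted(set(parents) - set(main))
    return main, stale


if __name__ == "__main__":
    # Two miners extended block A at about the same time (B and B2); C extended B.
    tree = {"G": None, "A": "G", "B": "A", "B2": "A", "C": "B"}
    print(split_main_and_stale(tree))  # (['G', 'A', 'B', 'C'], ['B2'])
```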

Smart contracts

Modern blockchains employ deterministic and self-executing contractual clauses called smart contracts. The smart contract concept was first introduced by Szabo in [124] as "a computerized transaction protocol that executes the terms of a contract". Smart contracts are executable scripts stored permanently in the blockchain, where nobody can modify or control them.

Each smart contract has a unique address on the blockchain, and its clauses can be triggered by peers by sending transactions to that address.
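The idea that contract clauses execute deterministically, and only in response to transactions sent to the contract's address, can be illustrated with a hypothetical Python toy. `ToyLedger`, `CounterContract` and the address derivation are invented for illustration and do not reflect any particular platform (e.g., Ethereum's account model or EVM semantics).

```python
import hashlib
from typing import Dict, List, Tuple


class CounterContract:
    """Hypothetical contract: anyone can increment the counter, only the owner can reset it."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.count = 0

    def increment(self, sender: str) -> None:
        self.count += 1

    def reset(self, sender: str) -> None:
        if sender != self.owner:
            raise PermissionError("only the owner may reset the counter")
        self.count = 0


class ToyLedger:
    """Stores deployed contracts by address and applies transactions to them in order."""

    def __init__(self) -> None:
        self.contracts: Dict[str, object] = {}
        self.log: List[Tuple[str, str, str]] = []  # accepted transactions

    def deploy(self, contract: object, deployer: str) -> str:
        # Hypothetical address scheme: hash of the deployer and the deployment index.
        seed = f"{deployer}:{len(self.contracts)}".encode()
        address = "0x" + hashlib.sha256(seed).hexdigest()[:40]
        self.contracts[address] = contract
        return address

    def send_transaction(self, sender: str, address: str, clause: str) -> bool:
        """Trigger a contract clause; a rejected transaction leaves the state unchanged."""
        contract = self.contracts[address]
        try:
            getattr(contract, clause)(sender)
        except PermissionError:
            return False
        self.log.append((sender, address, clause))
        return True


if __name__ == "__main__":
    ledger = ToyLedger()
    addr = ledger.deploy(CounterContract(owner="alice"), deployer="alice")
    ledger.send_transaction("bob", addr, "increment")
    ledger.send_transaction("carol", addr, "increment")
    print(ledger.send_transaction("bob", addr, "reset"))  # False
    print(ledger.contracts[addr].count)                   # 2
```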

2.2.3 Consensus

In essence, any kind of distributed system includes a set of processes (i.e., abstract units that are able to perform computations) and, in order to execute properly, it seeks to achieve some kind of cooperation among these processes [28]. Consensus is a form of agreement among a set of processes: processes use consensus to agree on a common result value after their operation [28]. In a distributed setting, achieving consensus among a set of processes is not an easy objective, considering that any of the processes may fail or crash, communication among processes may be delayed or even blocked (e.g., due to network latency and/or network partitions), or some processes may act maliciously and not follow the protocol. The consensus problem has been widely studied in the literature [76, 77, 32, 83, 28], and many different models and systems have been proposed with various fault model abstractions of processes.

Please refer to [28] for a broad discussion of processes in distributed systems, the consensus problem, and different consensus models.

Consensus Protocols

As blockchains are essentially distributed systems, the consensus problem is one of the most fundamental problems that blockchain protocols must handle. In fact, the different participants in the blockchain have to achieve consensus on the latest state of the ledger in order to coordinate the processes that they perform on the ledger. There are different methodologies to achieve consensus in a blockchain; we explain the related consensus protocols in the following.

Proof of Work (PoW) protocols. As introduced by Back in HashCash [14], PoW consensus mechanisms require performing some computation that consumes hardware resources and energy in order to prove the legitimacy of the performed operation. Similarly, in blockchains using PoW-based consensus protocols (for the rest of the thesis we refer to such blockchains as PoW blockchains), such as Bitcoin [92], block generators have to solve a cryptographic puzzle to generate a valid block. Roughly, the PoW mechanism works as follows: upon forming a new block, block generators add a nonce to the block and compute the hash of the block; if the hash value satisfies the predefined threshold, they can seal the block and publish it to the blockchain network, otherwise they try a different nonce. The block generation operation is called mining, and block generators are referred to as miners. In PoW consensus, it is hard to generate a block, but it is easy to confirm its validity once it has been generated. The main virtue of PoW is preventing instant block generation, which reduces conflicts and mitigates attacks such as double spending and Sybil attacks.
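The nonce search can be sketched in Python. Here the "predefined threshold" is modeled as a numeric target that the SHA-256 digest, read as an integer, must fall below, so a difficulty of k bits means the digest starts with k zero bits. The block contents, the 8-byte nonce encoding and the function names are assumptions; Bitcoin, for example, applies double SHA-256 to an 80-byte header with an adjustable target.

```python
import hashlib


def pow_hash(block_data: bytes, nonce: int) -> bytes:
    return hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()


def meets_target(digest: bytes, difficulty_bits: int) -> bool:
    """True if the digest, read as a 256-bit integer, has `difficulty_bits` leading zero bits."""
    return int.from_bytes(digest, "big") < 2 ** (256 - difficulty_bits)


def mine(block_data: bytes, difficulty_bits: int, max_nonce: int = 2**32) -> int:
    """Try nonces in order until the hash satisfies the threshold (~2**difficulty_bits tries)."""
    for nonce in range(max_nonce):
        if meets_target(pow_hash(block_data, nonce), difficulty_bits):
            return nonce
    raise RuntimeError("nonce space exhausted; the block contents must change")


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verification costs a single hash, whereas mining costs many."""
    return meets_target(pow_hash(block_data, nonce), difficulty_bits)


if __name__ == "__main__":
    block = b"example block contents"
    nonce = mine(block, 16)
    print(nonce, verify(block, nonce, 16))
    print(pow_hash(block, nonce).hex()[:4])  # 0000
```

The asymmetry between `mine` and `verify` is exactly the "hard to generate, easy to confirm" property described above.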

PoW consensus protocols require a system where the majority (i.e., more than one-half) of the mining power follows the protocol, that is, belongs to honest peers. If the majority is not honest, malicious miners may be able to generate malicious blocks (i.e., blocks that contain bogus transactions) more frequently than the honest minority and build the longest branch of the blockchain, which violates the correctness of the consensus. Therefore, throughout this thesis, we assume that in PoW blockchains more than one-half of the total mining power is honest.

PoW is a probabilistic consensus method, meaning that the probability of a block or transaction being in the correct branch of the blockchain increases as more blocks are added on top of it as confirmations. Indeed, the longer the genuine blockchain becomes, the harder it is for an attacker to generate enough blocks to form an alternative branch, as each block requires solving the PoW puzzle.
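The Bitcoin paper [92] quantifies this effect. If honest miners hold a fraction p of the mining power and the attacker holds q = 1 − p, the probability that the attacker ever catches up from z blocks behind is P = 1 − Σ_{k=0..z} (λ^k e^{−λ} / k!) (1 − (q/p)^{z−k}), with λ = z·q/p, and P = 1 whenever q ≥ p. The Python function below is a transcription of that calculation, shown only to illustrate why additional confirmations make a transaction safer under the honest-majority assumption; the function name is ours.

```python
import math


def attacker_success_probability(q: float, z: int) -> float:
    """Probability that an attacker with mining-power share q catches up from z blocks behind."""
    p = 1.0 - q
    if q >= p:
        return 1.0  # without an honest majority the attacker eventually wins
    lam = z * (q / p)  # expected attacker progress while honest miners find z blocks
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        total -= poisson * (1 - (q / p) ** (z - k))
    return total


if __name__ == "__main__":
    for z in range(0, 11, 2):
        print(z, f"{attacker_success_probability(0.1, z):.7f}")
```

With q = 0.1, the probability drops to about 0.09% (0.0009137 in the paper's table) after z = 5 confirmations, while it stays at 1 for q ≥ 0.5, which matches the honest-majority assumption above.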

Byzantine Fault Tolerant (BFT) protocols. Let us first briefly describe the concepts of Byzantine processes and Byzantine Fault Tolerance. Byzantine processes are malicious processes that may deviate arbitrarily, in any possible way, from their algorithm and task [28]. For example, Byzantine processes may not follow the protocol they have been assigned, they may stop responding, they may reject connection requests, they may selectively drop messages, or they may lie and propagate false information. A distributed system designed to be BFT must be able to operate correctly and achieve consensus in the presence of such arbitrarily behaving processes. In order to achieve BFT consensus, it is a well-known fact that, in an asynchronous network (where messages between processes may be delayed for unbounded times), the best we can do is to assume that fewer than one-third of all processes are
