Performance analysis of cryptographic protocols for the Internet of Things


Second Cycle (D.M. 270/2004) Final thesis


Supervisor: Chiar.mo Prof. Riccardo Focardi

Co-Supervisor:

Graduand: Marco Noalettin, Matriculation Number 838175

Academic Year:


With the growth of the Internet of Things (IoT), more and more devices are connected to the Internet to exchange data. These devices are used in many different fields; most of them are resource constrained and battery powered, so efficient protocols are needed to maximize battery life. On the other hand, these devices must also be secured against possible cyber attacks. The MQ Telemetry Transport (MQTT) protocol meets the efficiency requirement, but it is inherently insecure. A simple solution is to use the TLS protocol to secure the communication channel. Unfortunately, the use of TLS has a negative impact on performance. In this thesis we measure the cost of TLS on MQTT and we propose some efficient solutions and alternatives to secure MQTT.


Contents

Abstract

List of Figures

List of Tables

1 Introduction

1.1 Structure of the thesis

1.1.0.1 Background

1.1.0.2 Performance analysis of the MQ Telemetry Transport protocol

1.1.0.3 Performance analysis of TLS in MQTT

1.1.0.4 Alternatives to TLS on MQTT

1.2 Contribution

2 Background

2.1 The MQ Telemetry Transport protocol

2.1.1 Architecture

2.1.1.1 CONNECT

2.1.1.2 SUBSCRIBE

2.1.1.3 PUBLISH

2.1.2 Protocol diffusion

2.2 Transport Layer Security protocol

2.2.1 Architecture of TLS

3 Performance analysis of the MQ Telemetry Transport protocol

3.1 Experimental Design

3.1.1 Experimental Environments

3.1.2 Software used for profiling

3.1.3 Methodology

3.2 Profiling of MQTT

3.2.1 Test of MQTT without TLS

3.2.2 Test of MQTT with TLS

3.2.3 Comparison between MQTT with and without TLS

4 Performance analysis of TLS in MQTT

4.1 Varying the RSA key size

4.2 Comparison between ECDSA and RSA

4.3 Analysis of the TLS impact on the payload

4.4 TLS behaviour on different types of network

4.5 Test with the ESP8266

4.6 Summary

5 Alternatives to TLS on MQTT

5.1 Related works

5.2 New proposal

5.2.1 Resume an already negotiated TLS session

5.2.2 Design of a new method to secure MQTT

6 Conclusions


List of Figures

2.1 MQTT protocol architecture

2.2 MQTT packet structure

2.3 List of all the Message Types in MQTT

2.4 Message flow of the MQTT CONNECT function

2.5 MQTT servers found with Shodan

2.6 Highlight of the TLS 4-way handshake phase

3.1 Output from Callgrind showing the CPU cost of the CONNECT function

3.2 MQTTClient connect call graph

3.3 Output from Wireshark showing the packets exchanged during the connection phase

3.4 Detail of the output from Massif showing the Heap RAM consumption of the Connect function

3.5 Analysis of the CPU cost of the PUBLISH command with Callgrind

3.6 Analysis in Wireshark of the PUBLISH command

3.7 Output from Callgrind showing the cost for the MQTTClient subscribe function

3.8 Detail of the output from Wireshark showing the size in bytes for the MQTTClient subscribe function

3.9 Output from Callgrind showing the cost of the CONNECT function when used with TLS

3.10 Call graph of the MQTTClient Connect function when using TLS

3.11 Call graph generated for the PUBLISH function with TLS

3.12 Output from Wireshark showing the packets exchanged by the CONNECT function encrypted with TLS

3.13 Detail of the output from Massif showing the Heap RAM consumption of the Connect function when using TLS

3.14 Barplot showing the comparison of the CPU times for the CONNECT function with and without TLS

3.15 The panda-polar bear relationship

4.1 Graph showing the CPU time for different RSA keys length

4.2 Detail from Callgrind showing the CPU cost of the verify and decrypt functions for a 1024 bit key

4.5 Call graph of the RSA verification step

4.6 Call graph of the ECDSA verification step

4.7 Comparison of the CPU times using different cipher suites

4.8 Comparison of the dimension in bytes of the payload with different ciphers

4.9 Comparison of the CPU time using different ciphers

4.10 Output from Wireshark showing the ECDSA bytes exchanged by the client and server Hello message

4.11 Output from Wireshark showing the RSA bytes exchanged by the client and server Hello message

4.12 Total time RSA and ECDSA with different data rates

4.13 The panda-polar bear relationship

4.14 Comparison of the total RAM usage with and without TLS

5.1 CPU time comparison between session resumption and full handshake

5.2 CPU time comparison between session resumption and full handshake


List of Tables

3.1 Hardware specifications for the Raspberry Pi and the ESP8266

3.2 CPU time of MQTTClient connect function

3.3 CPU time of MQTTClient publish function

3.4 CPU time of MQTTClient subscribe function

3.5 Summary of the CPU time values for the connect, publish and subscribe functions obtained on the PC

3.6 Summary of the CPU time values for the connect, publish and subscribe functions obtained using the RPi

3.7 CPU time of the CONNECT function with TLS

3.8 CPU time of PUBLISH function with TLS

3.9 CPU time of the SUBSCRIBE function with TLS on the RPi

3.10 Summary of the CPU time values for the connect, publish and subscribe functions obtained using the RPi with TLS

4.1 CPU time using different keys size measured on a RPi

4.2 Summary of the RAM usage with different RSA keys length

4.3 Summary of the packet size exchanged during the connection phase using different RSA keys length

4.4 Summary of the CPU time, RAM consumption and packets size for different RSA keys size obtained using the RPi

4.5 List of the most common TLS cipher suites

4.6 Time comparison between RSA and ECDSA

4.7 CPU time comparison of the MQTT CONNECT function used with RSA and ECDSA

4.8 CPU time comparison between ECDHE and ECDH

4.9 Data rates and latency for different types of mobile connections

4.10 Comparison of the total times and RAM usage on ESP8266 with and without TLS


Introduction

When we transmit information over a network without any security measure, a potential attacker can easily read and manipulate the data, compromising confidentiality, integrity and authenticity. It is therefore important to secure the communication in order to prevent this type of attack. In the new era of the Internet of Things and Big Data, a large number of connected devices constantly transmit information, primarily to monitor and control data in real time from a remote location. These smart devices include light sensors and weather sensors, but also trackers, fitness devices, alarms and so on. Thus, it is crucial to protect the transmitted information in order to preserve user privacy and to prevent malicious users from modifying data or injecting fake messages. To let these devices communicate, IoT needs standard protocols. Various protocols are in use, such as the Constrained Application Protocol (CoAP), the Advanced Message Queuing Protocol (AMQP), MQ Telemetry Transport (MQTT), the Data Distribution Service (DDS), the Hypertext Transfer Protocol (HTTP) and many others. Each of these protocols has advantages and drawbacks, but all are designed to work with resource-constrained devices. Almost none of them has any form of built-in encryption or integrity check, so it is important to implement solutions that provide a certain level of security. Unfortunately, it is not always easy to design a new efficient protocol from scratch to secure the devices. For these reasons, TLS can be considered a good candidate, since it provides both confidentiality and integrity and can be easily implemented. But there is a price to pay: the use of TLS has an impact on performance. Devices with very limited resources simply cannot run TLS and, even on the resource-constrained devices that do support it, the overhead it introduces can decrease battery life. For this reason, many implementers decide to give more importance to performance and give up on security.


In this work we study the performance impact of using TLS as a transport protocol for MQTT. We study MQTT and measure its performance first without encryption and then with TLS enabled. We choose MQTT because it is one of the most widely used protocols in IoT. We aim at finding where the overhead of TLS comes from, and then we study whether it is possible to configure the protocol so as to reduce this overhead. Finally, we propose some solutions to reduce the overhead of TLS.

1.1 Structure of the thesis

The structure of the thesis is defined as follows:

1.1.0.1 Background

This chapter gives the reader an overview of the MQ Telemetry Transport protocol and its architecture, followed by a section that shows some statistics about the general usage and security level of this protocol. Subsequently, the Transport Layer Security (TLS) protocol is described, together with some very basic notions of public key cryptography that are useful to understand the mechanism on which TLS is based.

1.1.0.2 Performance analysis of the MQ Telemetry Transport protocol

This is the main chapter, dealing with the experimental part of the thesis, where we analyze the performance of MQTT. The chapter is divided into three sections. The first section introduces the design of the experiments, the variables, and the hardware and software used. The second section analyzes the performance of the MQTT protocol, first without TLS and then with TLS enabled. The analysis includes CPU, RAM and network usage. The goal is to find where the overhead of TLS is located. Finally, in the third section, we summarize the results of the tests conducted.

1.1.0.3 Performance analysis of TLS in MQTT

This chapter analyzes the use of the TLS protocol in MQTT in more depth. In particular, starting from the results obtained previously, we vary several TLS parameters, such as the RSA key length, the cipher suite, etc., in order to find a configuration that can reduce the total cost. Furthermore, we present a study on the behaviour of different TLS authentication algorithms under different types of networks. Finally, a test with and without TLS is performed using the embedded device ESP8266.

1.1.0.4 Alternatives to TLS on MQTT

This chapter presents some alternatives to TLS and some solutions to secure the MQTT protocol. The chapter is divided into two parts: in the first part we present some related works, and in the second part we present our solutions. In particular, we describe a solution that reduces the impact of TLS using session resumption, and the design of a new method to secure the MQTT protocol that involves the use of authenticated encryption.

1.2 Contribution

The main contributions of this thesis can be summarized as follows:

• We present an experimental performance study of the impact of using TLS on MQTT.

• We provide a detailed analysis and profiling of the MQ Telemetry Transport protocol with and without TLS, and we identify the source of the overhead of TLS in the connection phase.

• We experiment with many TLS configurations, trying to find the best one in terms of CPU, RAM and network cost.

• We study the behaviour of RSA and ECDSA under different types of networks, and discover under which conditions it is better to use one over the other. We also propose a solution that, depending on the type of network, can switch between RSA and ECDSA to save time. This may be helpful on battery-powered embedded devices, in order to increase battery life.

• We propose new solutions to solve the problem of the TLS overhead on resource constrained devices.


Background

This chapter is intended to provide a brief introduction to the topics that will be used in what follows. We do not intend to provide a complete description of the topics covered in this chapter. Therefore, to fully understand the content, the reader should already be familiar with some basic notions of cryptography, programming and networks.

Chapter structure

This chapter is divided into two parts:

• In the first part we give an introduction to the MQTT protocol. In particular we introduce the architecture and we provide some statistics about the diffusion of the protocol.

• In the second part we briefly introduce a few basic concepts of asymmetric encryption, the TLS protocol and its architecture.

2.1 The MQ Telemetry Transport protocol

MQ Telemetry Transport (MQTT) is a lightweight publish/subscribe messaging protocol that runs on top of TCP/IP and allows resource-constrained network clients to send and receive telemetry information. With MQTT, devices can send (publish) information about a given topic to a server called the message broker. The broker then forwards the information to those clients that have subscribed to that topic. The protocol was invented in 1999 by Andy Stanford-Clark and Arlen Nipper to connect oil pipeline telemetry systems over satellite, with the goal of reducing the costs of transmission. Although at the beginning it was an IBM proprietary protocol, it was released royalty free in 2010 and became an OASIS standard on October 29th, 2014, after a one-year standardization process. MQTT is now quickly becoming one of the main protocols for IoT (Internet of Things) deployments.

The original key points and goals of the protocol were:

• simplicity of implementation,

• bandwidth efficiency,

• QoS data delivery,

• minimum requirements for power.

Historically, the "MQ" in MQTT comes from the message queueing (MQ) architecture used by IBM, but there is actually no queueing in MQTT. Facebook uses MQTT for its Messenger app, not only to minimize battery usage but also because the protocol allows messages to be delivered efficiently, in milliseconds.

MQTT 3.1.1 is the current version of the protocol and we refer to this version throughout this thesis, although the MQTT Technical Committee is working on the next version, called MQTT version 5, which is in the draft stage.

2.1.1 Architecture

In this section we describe how MQTT works. For a more detailed explanation, the actual MQTT V3.1 specification [1] is freely available to everyone.

From the architectural point of view, MQTT is a message-oriented protocol based on the publish/subscribe design pattern, in which two different components interact with each other: the client and the broker.

Every device that intends to communicate with another through the protocol can be viewed as a client connected via TCP to a broker. A broker is essentially a server whose task is to deliver all the messages received from the clients on a specific topic to all the clients that are interested in that particular topic. There are two types of clients: the publisher and the subscriber. The former is responsible for sending messages to topics, thus making them available to all the subscribers of those topics; the latter can subscribe to the topics it is interested in and so receive whatever messages are published to those topics. The key point is that the publisher and the subscriber do not know about the existence of one another and never connect directly to each other. This property is also known as space decoupling. MQTT also provides time decoupling, since publisher and subscriber do not need to run at the same time, and synchronization decoupling, because operations on both components are not halted during publishing or subscribing.

Figure 2.1: MQTT protocol architecture

Figure 2.1 depicts a simplified version of the MQTT architecture.

The broker is primarily responsible for receiving all messages, filtering them based on topics, deciding which clients are interested in them, and sending the messages to all the subscribed clients. Another important concept for this thesis is that of a topic. A topic is a UTF-8 string that the broker uses to filter messages for each client. It consists of one or more topic levels separated by a slash, so topics are organized in a hierarchy, like a file system. A topic must be at least 1 character long and can contain spaces. A client can create a new topic simply by publishing to it, without any prior initialization. Every message sent to an MQTT broker must be linked to a topic, and clients that want to receive messages from a broker must subscribe to one or more topics. MQTT also defines special topics that start with the $ symbol and are reserved for the internal statistics of the MQTT broker. As a consequence, a client cannot publish messages to these topics.
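The topic rules described above can be illustrated with a short Python sketch. It is only an illustration of those rules, not code taken from any MQTT library: the function names and the example topics (including `$SYS/broker/uptime`) are our own assumptions.

```python
def split_topic(topic):
    """Split a topic name into its hierarchy levels (separated by '/')."""
    if len(topic) < 1:
        raise ValueError("a topic must be at least 1 character long")
    return topic.split("/")


def is_reserved(topic):
    """Topics starting with '$' are reserved for broker-internal statistics."""
    return topic.startswith("$")


if __name__ == "__main__":
    # Spaces are allowed inside a topic level.
    print(split_topic("home/living room/temperature"))
    # Hypothetical example of a reserved topic name.
    print(is_reserved("$SYS/broker/uptime"))
```

A client-side check such as `is_reserved` could be used to avoid publishing to topics that the broker would reject anyway.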

As we mentioned before, one characteristic of MQTT is the presence of Quality of Service (QoS). The QoS is simply an agreement between the sender and the receiver of a message on the guarantees of delivering that message. QoS is an important feature, especially on IoT devices, since it makes communication over unreliable networks easier. MQTT allows the QoS level to be specified for each message, and this determines how the client and the broker interact with each other.

There are 3 QoS levels in MQTT:

• level 0 - At most once: This is the fastest mode of transfer. The message is delivered at most once, or not at all, and its delivery across the network is not acknowledged. For this reason, QoS level 0 is also called “fire and forget”.

• level 1 - At least once: This is the default mode of transfer. When using QoS level 1, the message is always delivered at least once. The broker notifies the publisher that it has received the message by sending it a PUBACK command message. If the PUBACK is not received within a certain amount of time, the publisher resends the message with the duplicate (DUP) flag set to true.

• level 2 - Exactly once: In this mode of operation the message is always delivered exactly once. It is the safest method, but the transfer takes more time, since there is a double exchange of acknowledgements between the client and the broker.
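To make the QoS level 1 behaviour more concrete, the following Python sketch simulates a publisher that retransmits a message until a PUBACK arrives. It is a simplified model under our own assumptions: the network is replaced by a hypothetical `channel` callable that replays a fixed list of outcomes, and the names `publish_qos1` and `make_lossy_channel` are illustrative only.

```python
def publish_qos1(message, channel, max_attempts=5):
    """Send `message` until the simulated broker acknowledges it.

    `channel` is a hypothetical callable taking (message, dup) and returning
    True when a PUBACK is received before the timeout, False otherwise.
    Returns the number of transmissions that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        dup = attempt > 1  # every retransmission carries the DUP flag
        if channel(message, dup):
            return attempt
    raise TimeoutError("no PUBACK received")


def make_lossy_channel(outcomes, log):
    """Build a fake channel that replays a fixed list of PUBACK outcomes."""
    remaining = iter(outcomes)

    def channel(message, dup):
        log.append(dup)  # record the DUP flag of each transmission
        return next(remaining)

    return channel


if __name__ == "__main__":
    sent_flags = []
    channel = make_lossy_channel([False, False, True], sent_flags)
    attempts = publish_qos1(b"21.5", channel)
    print(attempts, sent_flags)
```

In this run the first two transmissions are lost, so the message is sent three times and only the first copy has the DUP flag cleared. This is why a subscriber may receive the same QoS 1 message more than once.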

Finally, we explain the concept of packet in MQTT and how it is composed, since it is a fundamental part of the MQTT architecture.

The MQTT protocol works by exchanging a series of MQTT control packets in a defined way. Each of these packets is composed of a header of fixed length and another of variable length, as shown in figure 2.2.

Figure 2.2: MQTT packet structure.

The fixed header is composed of four different flags:

• Message Type flag specifies the command to be sent. Figure 2.3 briefly describes all the possible command types.

• DUP specifies if the packet is a duplicate of a preceding one.

• QoS flag specifies the guarantees of delivery.

• RETAIN flag specifies if a message of type PUBLISH has to be stored in memory by the broker in order to be sent to all the new clients that will subscribe to a specific topic.

Figure 2.3: List of all the Message Types in MQTT
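The four flags listed above are packed into the first byte of the fixed header. The sketch below shows one way to encode and decode that byte in Python, assuming the bit layout of MQTT 3.1.1 (message type in bits 7-4, DUP in bit 3, QoS in bits 2-1, RETAIN in bit 0). The function names are our own and the code is only meant as an illustration.

```python
PUBLISH = 3  # message type value of the PUBLISH command in MQTT 3.1.1


def encode_first_byte(msg_type, dup=False, qos=0, retain=False):
    """Pack message type, DUP, QoS and RETAIN into one header byte."""
    if not 0 <= msg_type <= 15:
        raise ValueError("the message type must fit in 4 bits")
    if qos not in (0, 1, 2):
        raise ValueError("QoS must be 0, 1 or 2")
    return (msg_type << 4) | (int(dup) << 3) | (qos << 1) | int(retain)


def decode_first_byte(byte):
    """Unpack the first header byte into its four fields."""
    return {
        "type": byte >> 4,
        "dup": bool(byte & 0x08),
        "qos": (byte >> 1) & 0x03,
        "retain": bool(byte & 0x01),
    }


if __name__ == "__main__":
    first = encode_first_byte(PUBLISH, dup=True, qos=1)
    print(hex(first), decode_first_byte(first))
```

Since all four fields fit in a single byte, the per-packet cost of this part of the header is constant regardless of the command being sent.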

In what follows we describe in detail the three most significant Message Types that we will use in the next chapter.

2.1.1.1 CONNECT

This is one of the most used command messages. The packet is sent from the client to the broker to initiate a connection. The CONNECT packet contains the following mandatory fields:

• ClientId: This is the client identifier and contains a UTF-8 string unique per broker.

• CleanSession: This flag can assume either the value 0 or 1 and indicates whether to start a new persistent session or not. If the flag is set to 0, the broker stores all the subscriptions of the client and all the messages it missed. If the flag is set to 1, the broker does not store anything.

• KeepAlive: This field is an integer that specifies a time interval in seconds after which the client sends a PING Request message to the broker. The broker responds with a PING Response, and this determines whether both parties are still alive.

The optional fields include the username and the password for authenticating the client, if necessary. However, the password is sent in plaintext. Other optional fields include the "will message", which allows other clients to be notified when a client disconnects ungracefully. When a client sends a CONNECT command message, the broker must respond with a CONNACK message that contains a return code. Figure 2.4 depicts this simple flow. In case of success, the return code is 0.

Figure 2.4: Message flow of the MQTT CONNECT function
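As an illustration of this exchange, the following Python sketch builds the bytes of a minimal CONNECT packet and reads the return code of a CONNACK packet, following the MQTT 3.1.1 packet layout. It is a simplified sketch under stated assumptions: only the mandatory fields are included (no username, password or will message), the packet body is assumed to be shorter than 128 bytes so that its length fits in one byte, and the function names and the client identifier `sensor-1` are hypothetical.

```python
import struct


def utf8_field(text):
    """Encode a string as a 2-byte big-endian length followed by UTF-8 data."""
    data = text.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def build_connect(client_id, keep_alive=60, clean_session=True):
    """Build a minimal MQTT 3.1.1 CONNECT packet (mandatory fields only)."""
    connect_flags = 0x02 if clean_session else 0x00  # CleanSession is bit 1
    variable_header = (
        utf8_field("MQTT")               # protocol name
        + bytes([0x04, connect_flags])   # protocol level 4 = MQTT 3.1.1
        + struct.pack("!H", keep_alive)  # KeepAlive in seconds
    )
    body = variable_header + utf8_field(client_id)
    if len(body) > 127:
        # Simplification: longer bodies need a multi-byte length field.
        raise ValueError("body too long for this simplified sketch")
    return bytes([0x10, len(body)]) + body  # 0x10 = CONNECT message type


def parse_connack(packet):
    """Return the return code carried by a CONNACK packet (0 = success)."""
    if len(packet) != 4 or packet[0] != 0x20 or packet[1] != 0x02:
        raise ValueError("not a CONNACK packet")
    return packet[3]


if __name__ == "__main__":
    print(build_connect("sensor-1").hex())
    print(parse_connack(bytes([0x20, 0x02, 0x00, 0x00])))
```

With an 8-character client identifier the whole CONNECT packet is 22 bytes long, which gives an idea of how small the unencrypted connection request is.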

2.1.1.2 SUBSCRIBE

This packet is sent from the client to the broker and allows the client to subscribe to one or more topics, in order to receive the relevant messages. This command message contains a packet identifier, along with a list of topics and, for each topic, the requested QoS level, depending on the guarantees the client needs. The broker responds to a SUBSCRIBE message with a SUBACK command message.

2.1.1.3 PUBLISH

The PUBLISH message is sent from the client to the broker when the client needs to send a message to all the subscribers of a specific topic. Thus, a PUBLISH message must have a topic name field and a payload, which contains the message to send. It is completely up to the sender whether to send binary data, textual data or JSON. It is possible to set a QoS level for each message to be sent, to determine the guarantee of a message reaching the other end. When the QoS is greater than 0, a Packet Identifier field must be included in order to identify the message in a message flow. Finally, there is a DUP flag that specifies whether a message is a duplicate or not.
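The structure of a PUBLISH packet can be sketched in Python as follows, again assuming the MQTT 3.1.1 layout: one byte with the message type and the flags, the variable-length "remaining length" field, the topic name, the packet identifier (only when the QoS is greater than 0) and the payload. The function names are ours and the sketch is not taken from any MQTT library.

```python
import struct


def encode_remaining_length(length):
    """Encode a length with the 7-bits-per-byte scheme of the fixed header."""
    if not 0 <= length <= 268_435_455:
        raise ValueError("length out of range")
    out = bytearray()
    while True:
        length, digit = divmod(length, 128)
        if length > 0:
            digit |= 0x80  # continuation bit: more length bytes follow
        out.append(digit)
        if length == 0:
            return bytes(out)


def build_publish(topic, payload, qos=0, retain=False, dup=False, packet_id=None):
    """Build an MQTT 3.1.1 PUBLISH packet; `payload` must be bytes."""
    if qos not in (0, 1, 2):
        raise ValueError("QoS must be 0, 1 or 2")
    if qos > 0 and packet_id is None:
        raise ValueError("QoS 1 and 2 require a packet identifier")
    first_byte = 0x30 | (int(dup) << 3) | (qos << 1) | int(retain)
    topic_bytes = topic.encode("utf-8")
    body = struct.pack("!H", len(topic_bytes)) + topic_bytes
    if qos > 0:
        body += struct.pack("!H", packet_id)
    body += payload
    return bytes([first_byte]) + encode_remaining_length(len(body)) + body


if __name__ == "__main__":
    print(build_publish("a/b", b"hi").hex())
    print(build_publish("a/b", b"hi", qos=1, packet_id=10).hex())
```

For the topic `a/b` and a 2-byte payload, the QoS 0 packet is 9 bytes long and the QoS 1 packet is 11 bytes long, the difference being the 2-byte packet identifier.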


2.1.2 Protocol diffusion

In this section we briefly discuss the diffusion of the protocol and the contexts in which it is used. MQTT has found a large field of application in the IoT market, thanks to the fact that it is lightweight and can run on resource-constrained devices. In fact, in comparison to other protocols used to connect IoT devices, like HTTP REST (Representational State Transfer), MQTT keeps the number of bytes transferred to a minimum. This is a major advantage, since Internet of Things devices deal with a large amount of data. For this reason MQTT is used in the industrial energy field, in logistics, in security and surveillance, and also in the medical and health field.

Shodan [2] is a search engine for Internet-connected devices created in 2009 by John Matherly; it lets the user find specific types of computers connected to the Internet by using many filters. We can use Shodan to see how many public servers there are right now simply by typing the string port:1883 in the search box. In fact, MQTT by default accepts all incoming connections on port 1883. There is also a port used for the secure version of MQTT, which is 8883.

Figure 2.5: MQTT servers found with Shodan

Figure 2.5 shows the result of the search. At the time of writing there are more than 23 thousand active MQTT servers. Most of them are located in the United States and China. However, this result is only indicative since, in total, there are many more MQTT servers. Many of them are not listed, simply because they use ports other than 1883 or are not reachable from the public Internet.

When we try to connect to a broker, the server always responds with a CONNACK message and, if the Connection Code is 0, the connection is successful. So, we can check the return code from CONNACK to know how many servers do not use authentication. Moreover, with MQTT one can also subscribe to the special wildcard topic, identified by the character #, which allows a client to receive messages from all the topics. Shodan already does this automatically, so we can get more statistical information simply by searching for: port:1883 MQTT Connection Code: 0. This search shows that around 18 thousand servers do not require any authentication method. This means that about 75% of the MQTT servers are free to access. For completeness, we also report the list of the most used topics, which gives a real indication of what MQTT is used for. The list includes:

• Location services: OwnTracks reporting GPS positions.

• Weather stations reporting values about temperature, humidity, etc.

• Light sensors reporting values.

• Home automation devices.

• Health devices reporting values.

This list is not meant to be exhaustive, but it gives us an idea of the main fields of application of IoT, as well as of the security level.

2.2 Transport Layer Security protocol

This section summarizes a few elements of cryptography and of the TLS protocol. To this purpose, let us first introduce the concept of public key cryptography. Public key cryptography is any cryptographic system that uses a pair of keys: a private key, which only the owner knows, and a public key, which is known by everybody. The private and public keys are mathematically related, but knowledge of the public key does not give any information about the private key. The public key is used to encrypt a message, while the private key is used to decrypt it. The strength of an asymmetric encryption algorithm relies on the fact that it is computationally infeasible to determine a properly generated private key from its corresponding public key. This is obtained thanks to the use of one-way trap-door functions. We recall that an injective, invertible function $f$ is one-way if and only if $f(x)$ is easy to compute but the inverse function $f^{-1}(x)$ is infeasible to compute. One of the most popular public key cryptographic systems is RSA, whose security is based on the difficulty of factoring the product of two large prime numbers, which yields a one-way trap-door function.
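A toy example may help to fix these ideas. The Python sketch below implements "textbook" RSA with very small primes. The values p = 61, q = 53 and e = 17 are a common classroom choice and are assumptions of this example; real keys use primes of more than a thousand bits together with a padding scheme, so this code must not be used to protect real data.

```python
# Toy key generation with tiny primes (illustration only, not secure).
p, q = 61, 53
n = p * q                # public modulus
phi = (p - 1) * (q - 1)  # Euler's totient of n
e = 17                   # public exponent, coprime with phi
d = pow(e, -1, phi)      # private exponent (requires Python 3.8+)


def encrypt(m):
    """Encrypt an integer m < n with the public key (n, e)."""
    return pow(m, e, n)


def decrypt(c):
    """Decrypt an integer c with the private key (n, d)."""
    return pow(c, d, n)


if __name__ == "__main__":
    c = encrypt(65)
    print(n, d, c, decrypt(c))
```

Computing `encrypt` only needs the public values n and e, while recovering d requires knowing phi, that is, the factors p and q of n. With primes this small the factorization is trivial, which is exactly why real keys must be much larger.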

Now that we have introduced the basic concepts of public key cryptography, let us briefly introduce the TLS protocol. For more details see [3–5].

The TLS protocol provides privacy and data integrity between two communicating applications on TCP/IP networks. Basically, it is the evolution of the SSL v3 protocol and it is defined in RFC 2246 [6]. Currently, the latest standard version is TLS v1.2, but the upcoming TLS v1.3 is already in the draft stage. One of the most common uses of SSL/TLS is in the HTTPS protocol, to encrypt the data sent, such as credit card information, names or addresses. Even if it was initially designed to solve the security problems of the WWW, TLS can be adopted to secure many different protocols. In the following section we describe the basic architecture of TLS.

2.2.1 Architecture of TLS

TLS is located between the transport layer and the application layer of the TCP/IP stack. From an architectural point of view, TLS can be divided into two phases:

• the TLS Handshake phase;

• the bulk data encryption phase.

The TLS Handshake Protocol allows the client and the server to agree on certain parameters and then to start an authenticated communication. For the purpose of this thesis we now describe the Handshake protocol in more detail. It can be divided into five steps:

• Agree on the protocol version to be used;

• Agree on a set of cryptographic algorithms;

• Validate each other by exchanging certificates;

• Generate a shared key using asymmetric encryption;

• Encrypt all the traffic with a symmetric encryption algorithm using the shared key.

This is achieved by exchanging messages between the client and the server, in what is known as the 4-way handshake. The steps involved in the 4-way handshake are:

1. The client sends a "client hello" message containing information about the TLS version and a list of cipher suites it intends to use.

2. The server responds with a "server hello" message containing the cipher suite chosen from the list provided by the client and its certificate. If the server requires a client certificate, it also sends a "client certificate request" message.

3. The client verifies the server certificate and sends to the server a random byte string encrypted with the server public key, which allows both the server and the client to compute a shared key that is used in the next steps. If it has received a "client certificate request" message from the server, it also sends its certificate.

4. The client sends a "finished" message encrypted with the shared key, which indicates to the server that the handshake is complete.

5. The server sends a "finished" message encrypted with the shared key, which indicates to the client that the handshake is complete.

6. From now on, all the communications are encrypted with a symmetric algorithm using the shared key.

Figure 2.6 gives a graphical representation of all these steps.

Figure 2.6: Highlight of the TLS 4-way handshake phase

During all these phases, different algorithms are used. This set of algorithms is called a cipher suite. In particular, one algorithm is used to exchange the key securely, another is used to verify the certificate and, finally, an algorithm is used for the symmetric encryption and the integrity verification.


• For the key exchange: RSA, Diffie-Hellman, ECDH, SRP, PSK;

• For the authentication: RSA, DSA, ECDSA;

• For the symmetric encryption: RC4, DES, Triple DES, AES, IDEA or Camellia;

• For the integrity: HMAC-MD5 or HMAC-SHA.
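The way these algorithms are combined is visible in the standard names of the cipher suites. As an illustration, the Python sketch below splits a name written in the IANA style used up to TLS 1.2 (for example `TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256`) into its components. The function is our own simplified parser and assumes that the name contains the `_WITH_` separator; it does not cover every registered suite.

```python
def parse_cipher_suite(name):
    """Split an IANA-style TLS cipher suite name into its components."""
    if not name.startswith("TLS_") or "_WITH_" not in name:
        raise ValueError("unsupported cipher suite name: " + name)
    left, right = name[len("TLS_"):].split("_WITH_", 1)
    parts = left.split("_")
    key_exchange = parts[0]
    # When a single algorithm is listed (e.g. RSA), it is used for both
    # the key exchange and the authentication.
    authentication = parts[1] if len(parts) > 1 else parts[0]
    # The last component is the hash: it names the HMAC for CBC suites,
    # while AEAD suites such as GCM provide integrity by themselves.
    cipher, digest = right.rsplit("_", 1)
    return {
        "key_exchange": key_exchange,
        "authentication": authentication,
        "cipher": cipher,
        "hash": digest,
    }


if __name__ == "__main__":
    print(parse_cipher_suite("TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256"))
    print(parse_cipher_suite("TLS_RSA_WITH_AES_128_CBC_SHA"))
```

Reading the name in this way makes it easy to see which suites differ only in the authentication algorithm, for instance RSA versus ECDSA with the same key exchange and symmetric cipher.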

The choice of the algorithms to be used depends on many factors. In this thesis, since we are looking for performance, we focus on ECDH to exchange the key and on RSA and ECDSA for authentication. The bulk data encryption phase uses the shared key to encrypt all the traffic, using symmetric encryption.

In this thesis we use TLS to encrypt the data sent with MQTT and we study its performance. In fact, MQTT is not inherently secure. The reason why the designers decided not to include a built-in cryptographic system is to keep MQTT simple and lightweight.


Performance analysis of the MQ Telemetry Transport protocol

MQ Telemetry Transport is one of the most widely adopted protocols in the IoT world. This is mainly due to the fact that it is very light and can run on constrained devices. But when we enable TLS to secure the communication, the performance degrades to the point where some resource-constrained devices are no longer able to support it.

In this chapter we study the performance of MQTT with TLS, with the goal of finding where the overhead is concentrated. This helps in finding new solutions to reduce the impact of TLS and to design new protocols, that can guarantee the same level of security.

Chapter structure

The chapter is divided into three sections:

• The first section describes the hardware setup, used to perform the tests, and all the software used to profile an analyze the performance. We describe also the methodology and the metrics adopted to perform the experiments.

• The second part deals with the tests performed on MQTT and it is divided into two subsections. In the first subsection we analyze the MQTT protocol without TLS, to establish a baseline useful for the subsequent tests. Then, we proceed with the analysis of the performance with TLS enabled. In particular, we profile the MQTT CONNECT, the PUBLISH and the SUBSCRIBE commands separately, with the aim at finding where the overhead is located.

• In the last section we summarize the results obtained. 15


3.1 Experimental Design

This section introduces the hardware used for the experiments, as well as the list of software used for the analysis of MQTT. Then, we describe the strategies and the approach used to conduct the tests.

3.1.1 Experimental Environments

This section briefly describes the hardware used during the tests. For the purposes of these experiments, we study the behaviour of MQTT on three different devices: a laptop PC, a Raspberry Pi 1 Model B (RPi) and, finally, the embedded device NodeMCU ESP8266. The laptop PC has an Intel i3-4030U processor and 4 GB of RAM, with Ubuntu 17.04 Linux installed.

We further install Mosquitto to serve as the MQTT broker on the PC. The code written for this experiment uses the Eclipse Paho MQTT C library, which is fully compatible with the Mosquitto broker. The Raspberry Pi is a single-board computer created by the not-for-profit Raspberry Pi Foundation. The core of the RPi is an ARM processor mounted on a Broadcom BCM2835 SoC. The technical specifications are shown in table 3.1. The RPi comes with Raspbian OS installed. It is interesting to test the performance of MQTT on this device because it does not have a high computational capability: it can be considered similar to an embedded device, and it also lacks the CPU optimizations of a laptop PC.

Finally, the NodeMCU ESP8266 is a self-contained SoC that integrates a small processor and operates at a low supply voltage of 3.3 V. The processor runs at 80 MHz but can be safely overclocked to 160 MHz. The NodeMCU is a real embedded device that can be easily interfaced to a PC via the microUSB port and easily programmed in the Lua language, using the Arduino IDE. For these reasons, it is frequently used to experiment with IoT projects in the home automation context. The ESP8266 we used for the tests incorporates the latest version of the NodeMCU firmware, based on Lua 5.1.4. Further technical specifications are given in table 3.1. The ESP also has a built-in WiFi module that allows an easy connection to the network.

                Raspberry Pi              ESP8266
Model           1 Model B                 NodeMCU
CPU type        ARM 1176JZF-S 700 MHz     Tensilica Xtensa LX106 80 MHz
RAM size        512 MB                    96 KB
Network type    Ethernet                  WiFi


Due to its characteristics, the ESP8266 represents a real use case and, for this reason, we use it to measure the real impact of TLS. With these three devices we believe we cover a representative range of use cases, from a general-purpose PC to a constrained embedded device, which makes our results more reliable.

3.1.2 Software used for profiling

In this section we present the tools and the software used for the analysis and the profiling of the code. Profiling is a technique that consists in measuring the performance of a program and identifying its bottlenecks. This can be done either manually or with tools but, using a tool, one can achieve a more precise and detailed result. Typically a profiler collects data regarding:

• function calls,

• time spent within a function,

• number of times a function is called.

For the purpose of the thesis we analyze the CPU, the RAM and the network usage. For the analysis of CPU and RAM, preference was given to the Valgrind tool suite. It is an open source instrumentation framework that includes many tools that can detect memory-management errors and profile a program in detail. Among them there are a memory error detector, a time profiler and a space profiler. For the purpose of this thesis, two tools have been used: Callgrind and Massif.

For the analysis of the CPU usage and the CPU time we use the tool called Callgrind. Callgrind is a free profiling tool that shows how a program distributes its execution time among its functions during a particular execution path. The collected data consist of the number of instructions executed, their relationship to the source lines, the caller/callee relationship between functions and the number of such calls. Together with Callgrind, KCachegrind is used to visualize the data graphically and to navigate easily through the large amount of data produced by Callgrind. One thing we can do with Callgrind is generate the call graph. A call graph is basically a control flow graph that represents the calling relationships between functions. Each node is identified by a function, while a link represents an immediate caller-callee relationship. With a call graph we can easily identify hot execution paths, i.e. paths that require long execution times.

As regards the analysis of RAM, we used the tool called Massif. It is a heap profiler that measures how much heap memory a program uses. The heap is the region of memory which is allocated with functions like malloc. The data generated by Massif are written to a file called massif.out.pid, where pid is the PID of the program and, similarly to Callgrind, there is a tool called Massif Visualizer that displays the Massif data graphically.

Another tool that deserves a mention is gperftools, a performance analysis tool for Unix applications, useful to measure the CPU time.

Finally, for the network usage we used Wireshark. Wireshark is a network packet analyzer. It captures all the packets that pass through a specific interface and displays these packets with detailed protocol information. It allows a deep inspection of hundreds of protocols and also supports filters on many criteria, so as to show only the packets of interest.

3.1.3 Methodology

This subsection outlines our profiling methodology, what we measured and the data collected and analyzed.

The goal is to study the performance of MQTT when it is encrypted with TLS and how much TLS degrades the performance. We first study the performance of MQTT without TLS, to have a baseline and a term of comparison, and then with TLS. After that, we investigate TLS more deeply by varying some parameters, with the aim of finding a configuration with a smaller overhead. The parameters that we vary in this second part of the tests are: the size of the RSA key (1024, 2048 and 4096 bits), the key exchange algorithm (ECDHE, ECDH), the authentication algorithm (RSA, ECDSA) and the symmetric cipher (AES, DES, RC4).

Our test program is written in C and uses the Eclipse Paho MQTT C library, which enables applications to connect to an MQTT broker to publish messages, to subscribe to topics and to receive published messages. We preferred the use of C because it is the most widely used language for programming in the IoT context. We study the publish and the connect commands separately. In fact, these two commands have the greatest impact on the performance and are the most widely used in MQTT. Finally, we test the performance over different types of networks with different latency and bandwidth. As for the broker, we use Mosquitto since it is one of the most widely used and it is free. We analyze the CPU usage, in terms of instructions executed and CPU time, the RAM usage and the network usage, in terms of bytes transferred.


We consider three test cases:

• A local case where both the client and the server run on the same PC. This test allows us to measure the cost of each operation with more precision because, by running both the client and the server on the same PC, we do not have to account for I/O and network latency.

• The second test case involves the Raspberry Pi and the PC: the client is installed on the RPi and the broker on the PC, and the two are connected via Ethernet. This second test case represents a more realistic situation, where the client runs on a device that does not have the benefits of the PC optimizations.

• The last test is done with the ESP8266, which is connected via WiFi to the PC where the broker runs. This third test represents a realistic situation, with network latency included and the computational power of a real IoT device.

As for the TLS library used in these tests, we use the OpenSSL library for the first two cases and the axTLS library for the test with the ESP8266. axTLS is a TLS library specifically designed for resource constrained devices.

To obtain more accurate measurements, we repeat each test 15 times using the same configuration, and we also save the output data to a text file for further analysis.

3.2 Profiling of MQTT

This section is devoted to profiling the MQTT protocol, first without TLS and then with TLS. We analyze the CPU, the RAM and the network consumption separately. For the purpose of this test we wrote a simple C program that first connects to the broker and then publishes a message. To achieve a more accurate analysis, we compiled our program in a separate build with CMAKE_BUILD_TYPE set to RelWithDebInfo, in order to have a good compromise between performance and availability of debugging information, compared with simply using the -g flag with GCC.

3.2.1 Test of MQTT without TLS

We now describe the first test, which does not use TLS. For this test we set the QoS level to 0: this means that the message is not acknowledged by the broker, which slightly reduces the network overhead. QoS 0 can be used when we do not care if one or more messages are lost once in a while. The payload sent is the constant string "Hello World!". Let us start by analyzing the cost of the connect function which, as the name suggests, simply performs the connection to the MQTT broker. In listing 3.1 we report the piece of our C code that we are going to analyze. The function is called MQTTClient_connect and returns an integer that is equal to 0 (MQTTCLIENT_SUCCESS) if the connection is successful, or a non-zero error code (for example -1) if the connection fails. The conn_opts argument specifies the options for the connection, such as the cleansession flag and the keepAliveInterval.

    if ((rc = MQTTClient_connect(client, &conn_opts)) != MQTTCLIENT_SUCCESS) {
        printf("Failed to connect, return code %d\n", rc);
        exit(EXIT_FAILURE);
    }

Listing 3.1: C code of the CONNECT function

We ran this code with Callgrind and we also collected the information about the total CPU time. Table 3.2 summarizes the statistics about the total CPU time, in milliseconds, of the MQTTClient_connect function.

CPU time CONNECT
Min.         Mean         Max.
0.3410 ms    0.4477 ms    0.5820 ms

Table 3.2: CPU time of the MQTTClient_connect function

The total CPU time is a good index, but it depends on the underlying hardware. To know exactly how and where the time is spent in the MQTTClient_connect function, let us analyze the output from Callgrind. Figure 3.1 shows the profiling result for the CONNECT, obtained using Callgrind and then visualized with KCachegrind. The left panel shows the functions called, sorted by percentage of time spent inside each function, while the right panel shows the source code and the callees of each function.

Figure 3.1: Output from Callgrind showing the CPU cost of the CONNECT function

As we can see, the MQTTClient_connect function takes 18.51% of the CPU time. We can also see, at the bottom right, that the total number of instructions executed is equal to 94252. We also generate and analyze the call graph for MQTTClient_connect, as depicted in figure 3.2. In the tree we can observe how the CPU time is distributed among the function calls and the relationships between subroutines. For example, we can see from the graph that MQTTClient_connect calls MQTTClient_connectURI, which finally calls the function MQTTClient_connectURIVersion. From the Paho MQTT library documentation we know that this function attempts to connect a previously-created client.

The subtree underneath shows how the time of MQTTClient_connectURIVersion is split. Basically, all the functions in the subtree create a socket and connect to it, and this process takes in total 18.51% of the CPU time.


As for the network part, we can use Wireshark to inspect the protocol. We recall, from the documentation of the MQTT protocol, that the connection is initiated by a client sending a CONNECT message to the broker, and that the broker then responds with a CONNACK and a status code. Figure 3.3 shows the output from Wireshark. We can see both the Connect Command of 96 bytes, containing information such as the QoS value, the Keep Alive interval, the client id, etc., and the Connect Ack of 70 bytes. The total number of bytes transferred is thus 166.

Figure 3.3: Output from Wireshark showing the packets exchanged during the connection phase

Finally, we analyze the RAM usage with the Massif tool. In particular, we analyze the heap segment. Figure 3.4 shows the output of Massif visualized with the utility massif-visualizer. In the diagram we can observe that the total heap consumption increases steadily up to a value of 126 KiB, which represents the peak. This is the total RAM size used. Valgrind shows by default the top 10 most memory-consuming calls. The box below the graph of figure 3.4 can be expanded to analyze each function call in more detail. In this case the RAM is consumed by the function MQTTClient_create, shown in listing 3.2.

    MQTTClient_create(&client, ADDRESS, CLIENTID, MQTTCLIENT_PERSISTENCE_NONE, NULL);


Figure 3.4: Detail of the output from Massif showing the Heap RAM consumption of the Connect function

This function creates an instance of the client, stored in the variable client: it creates the structure and assigns to it parameters such as the address of the broker to connect to, the id of the client, etc. This initialized client is then passed to the MQTTClient_connect function.

Now we can basically repeat the same process to analyze the PUBLISH function. The function is shown in listing 3.3.

    int MQTTClient_publish(MQTTClient handle, const char* topicName,
                           int payloadlen, void* payload, int qos, int retained,
                           MQTTClient_deliveryToken* dt);

Listing 3.3: C code for the MQTT publish function

This function simply publishes a message payload under the topic topicName, with QoS qos. The retained flag indicates whether the broker has to store the message and its QoS for that topic. If so, each client that subscribes to that topic will receive the retained message immediately after subscribing.

We report in table 3.3 the CPU time measured for the MQTTClient_publish function. As we can see, this function takes roughly half of the time of the connect function.

CPU time PUBLISH
Min.         Mean         Max.
0.1550 ms    0.2516 ms    0.3310 ms

Table 3.3: CPU time of the MQTTClient_publish function

We now inspect the output of Callgrind, shown in figure 3.5. We discover that the function MQTTClient_publish takes 4.79% of the total CPU time and executes 24376 instructions. This also explains why its CPU time is lower than that of the MQTTClient_connect function.

Figure 3.5: Analysis of the CPU cost of the PUBLISH command with Callgrind

Let us now move to the analysis of the network consumption. We recall, from chapter 2, that the PUBLISH command simply sends a packet containing the topic name, the payload and four other flags, as shown in the output of Wireshark in figure 3.6. Clearly, in this case, the network consumption depends on the payload. In our case, the payload is the constant string Hello World!. With the QoS level set to 0, the client sends the publish message and does not wait for an ACK. The amount of data transferred during the PUBLISH is thus 86 bytes, much less than for the connect command.


Figure 3.6: Analysis in Wireshark of the PUBLISH command

We complete the tests by also analyzing the subscribe function. It is a very simple function, shown in listing 3.4.

    const char* topic = "test";
    int qos = 0;
    rc = MQTTClient_subscribe(client, topic, qos);

Listing 3.4: Fragment of the C code of the subscribe function

This function simply subscribes to a topic with a certain QoS level. Table 3.4 shows the CPU time measured for the MQTTClient_subscribe function.

CPU time SUBSCRIBE
Min.         Mean         Max.
0.1150 ms    0.1497 ms    0.1980 ms

Table 3.4: CPU time of the MQTTClient_subscribe function

We observe that the CPU time is lower than that of the publish function. From the Callgrind output, shown in figure 3.7, we observe that, in fact, the cost of this function is very low.

The total bandwidth consumption is the sum of the subscribe request, containing the name of the topic to subscribe to and the QoS level, plus the Subscribe Ack received from the broker, as shown in figure 3.8.


Figure 3.7: Output from Callgrind showing the cost for the MQTTClient_subscribe function

Figure 3.8: Detail of the output from Wireshark showing the size in bytes for the MQTTClient_subscribe function

From this preliminary analysis we found that the CONNECT function is heavier in terms of computation time than the PUBLISH and SUBSCRIBE functions. In table 3.5 we report a summary of the CPU times for all the functions analyzed so far.

Functions    Min          Max          Mean
CONNECT      0.3410 ms    0.5820 ms    0.4477 ms
PUBLISH      0.1550 ms    0.3310 ms    0.2516 ms
SUBSCRIBE    0.1150 ms    0.1980 ms    0.1497 ms

Table 3.5: Summary of the CPU time values for the connect, publish and subscribe functions obtained on the PC

We conclude this section by performing the same tests, but now using the Raspberry Pi connected to the broker via Ethernet. Since the code for the connect, the publish and the subscribe is the same, the analysis with Callgrind, as well as the network part, is the same, so we do not report it again. Instead, we report in table 3.6 the CPU times measured for the connect, the publish and the subscribe functions. We show the minimum, the maximum and the mean value for each of the three functions. Of course, the RPi has a lower computational capability than the PC, so we expect to see higher CPU times.

Table 3.6: Summary of the CPU time values for the connect, publish and subscribe functions obtained using the Rpi

These results seem to be in line with the previously measured times, and they confirm that the connect function is the most CPU-heavy.


3.2.2 Test of MQTT with TLS

In this section we analyze the performance of the MQTT protocol encrypted with TLS. We basically do the same tests performed in the previous section, so we analyze the CPU, the RAM and the network consumption. The TLS version used is 1.2. First we need to generate the certificates and enable the broker to allow connections on port 8883, the standard port for secure connections in the MQTT protocol. The RSA key has a dimension of 2048 bits.

Let us start by analyzing the MQTTClient_connect function. The code for the connect function is the same as in listing 3.1.

We start from the analysis with Callgrind, shown in figure 3.9. We can observe that the MQTTClient_connect function now takes 40.86% of the total CPU time and executes 4097338 instructions. As a share of the total CPU time, this is slightly more than twice the value of the same function without TLS (18.51%), while the number of instructions grows from 94252 to 4097338, i.e. by a factor of about 43.

Figure 3.9: Output from Callgrind showing the cost of the CONNECT function when used with TLS

It is interesting to investigate the reason for this increase in CPU time, and this can be done by inspecting the call graph generated with KCachegrind. After digging through all the functions of the connect command, we come to a set of functions, shown in figure 3.10, that use 32% of the total time, which is around 80% of the total time of the MQTTClient_connect function.

Figure 3.10: Call graph of the MQTTClient_connect function when using TLS

These functions are ECDH_compute_key, X509_verify_cert and RSA_verify. All of them are OpenSSL functions that are used to perform the public-key cryptographic operations. For example, ECDH_compute_key is used for key agreement. It uses complex mathematical operations involving elliptic curves to securely generate and exchange a shared secret key. This value is used to encrypt the forthcoming traffic. Then, RSA_verify is used in the 4-way handshake to check the signature. From the OpenSSL manual we read [7]: “RSA_verify verifies that the signature matches a given message digest m”. Among these operations, RSA_verify has the greatest cost.

Finally, X509_verify_cert is used to discover and verify the X509 certificate chain.

We are now sure that the overhead of TLS is concentrated in the 4-way handshake phase and is due to all the cryptographic operations.


Let us now analyze the publish function. We would expect a lower CPU time, especially with small payloads, since all the public-key operations have already been done and the only cryptographic operations involved are functions performing symmetric encryption. The MQTTClient_publish call is also identical to the one in listing 3.3.

Table 3.8 shows the CPU time for the MQTTClient_publish function with TLS; for comparison, table 3.7 reports the CPU time of the CONNECT function with TLS.

CPU time CONNECT with TLS
Min.         Mean         Max.
1.173 ms     3.094 ms     4.415 ms

Table 3.7: CPU time of the CONNECT function with TLS

CPU time PUBLISH with TLS
Min.         Mean         Max.
0.2110 ms    0.2811 ms    0.3180 ms

Table 3.8: CPU time of the PUBLISH function with TLS

It is evident that the values obtained are almost equal to the CPU times of MQTTClient_publish without TLS. Figure 3.11 shows the section of the call graph of the publish function that differs from the one without TLS. We now clearly see that the tree contains all the functions necessary to perform the symmetric cryptography. With a bigger payload, one observes an increase in CPU consumption.

Figure 3.11: Call graph generated for the PUBLISH function with TLS

As for the network part, unlike the version without TLS, the amount of information exchanged between the client and the broker is now higher because of the 4-way handshake. Figure 3.12 shows the capture in Wireshark.

Figure 3.12: Output from Wireshark showing the packets exchanged by the CONNECT function encrypted with TLS

The total number of bytes sent is equal to 5410, more than 30 times the 166 bytes of the non-TLS version of connect. This can be a problem, especially in slow networks. The increase is due to the exchange of the RSA certificate and of all the parameters necessary to negotiate the secure connection.

The last function we analyze is MQTTClient_subscribe. Table 3.9 shows the CPU time of the MQTTClient_subscribe function.

CPU time of the SUBSCRIBE function with TLS
Min.         Mean         Max.
0.1360 ms    0.1653 ms    0.2230 ms

Table 3.9: CPU time of the SUBSCRIBE function with TLS on the RPi

Finally, we analyze the RAM consumption. Figure 3.13 shows the RAM utilization. We measured a total of 339 KiB of heap RAM used. Of course, using TLS implies the need to allocate additional buffers to store the keys. We observe from figure 3.13 that the functions CRYPTO_malloc and CRYPTO_realloc are responsible for the RAM consumption. Listing 3.5 shows the additional code needed to initialize the TLS connection. This additional memory can be a problem on embedded devices with little memory space.


Figure 3.13: Detail of the output from Massif showing the Heap RAM consumption of the Connect function when using TLS

    MQTTClient_SSLOptions sslOptions = MQTTClient_SSLOptions_initializer;
    sslOptions.enableServerCertAuth = 1;
    sslOptions.trustStore = "ca.crt";
    conn_opts.ssl = &sslOptions;

Listing 3.5: C code needed to initialize the TLS connection

We conclude this section by performing the same test with the Raspberry Pi. We report in table 3.10 the summary of the CPU time values measured for the connect, the publish and the subscribe functions.

Table 3.10: Summary of the CPU time values for the connect, publish and subscribe functions obtained using the Rpi with TLS

Using the RPi we start to notice the big difference between the TLS version and the cleartext version of MQTT, especially for the connect function. In the next section we summarize and compare the results obtained so far.


3.2.3 Comparison between MQTT with and without TLS

In this subsection we summarize the results obtained in the previous tests, plotting them as bar plots to make the differences easier to see. We compare the values obtained with the Raspberry Pi.

Figure 3.14 shows the comparison, in terms of CPU time, between the connect function used with TLS (red) and without TLS (blue).

Figure 3.14: Barplot showing the comparison of the CPU times for the CONNECT function with and without TLS

Figure 3.15a shows the difference in RAM consumption between MQTT with TLS and without TLS, while figure 3.15b compares the total number of bytes exchanged during the connection phase without TLS (blue) and with TLS (red).

(a) RAM consumption (b) Bytes exchanged

Figure 3.15: Comparison of the RAM consumption on the left and the total number of bytes exchanged on the right


We can summarize the results of our experiments as follows:

• Without TLS, the cost of the connect function stands at a value between 13% and 18% of the total CPU time.

• The overhead of TLS is concentrated in the handshake phase and is due to the public key cryptography operations; in particular, RSA_verify has the greatest computational cost. We measured that the total TLS cost ranges from 40% to 45% of the total time.

• The RAM consumption of the non-TLS version is around 130 KiB, while in the TLS version this value increases to 340 KiB; the non-TLS version thus uses about 61% less memory. This is due to the additional buffers needed for the cryptographic operations and allocated during the initialization phase.

• We notice a minor impact on the publish and subscribe functions, even if the publish cost depends on the payload size.

• The network consumption is higher in TLS because we need to send the certificate and all the initialization parameters, during the 4-way handshake.

In the next chapter we analyze the connect command while varying different TLS parameters, such as the cipher suite and the key length, with the aim of studying whether it is possible to reduce the overhead of TLS.


Performance analysis of TLS in MQTT

In the previous chapter we saw that the overhead of TLS is located in the connection phase and, in particular, that it is due to all the public key cryptographic operations performed during the handshake phase. We also identified RSA_verify as the cryptographic operation with the greatest cost.

In this chapter we investigate TLS more deeply. In particular, we repeat all the tests performed in the previous chapter while varying different parameters of the TLS connection, such as the key size and the algorithms used during the handshake process, in order to reduce the overhead of TLS. Finally, in the second part we perform a final test with the embedded device ESP8266, using the best TLS configuration found.

4.1 Varying the RSA key size

When we generate an RSA key, we need to specify the key length in bits: for an RSA key, the length of the key is the number of bits in the modulus. The longer the key, the more secure the system. However, choosing a long key has a side effect: with every doubling of the RSA key length, decryption becomes 6-7 times slower [8], and encryption is affected as well.

We test what happens on the client side when varying only the RSA key length. We measure the CPU time, the RAM consumption and the amount of data transferred over the network, using different key sizes, starting from 1024 bits up to 8192 bits.

Table 4.1 shows the CPU time in milliseconds for each key size, measured on the RPi during the connection phase.


Time         RSA key size
68.862 ms    1024 bits
74.05 ms     2048 bits
91.544 ms    4096 bits

Table 4.1: CPU time using different keys size measured on a Rpi

A graphical representation of these data is shown in figure 4.1. The curve depicted is a polynomial curve, so it is easy to predict what happens to the time when the key size increases. In particular, at each step of the RSA_verify function, a square-and-multiply operation is performed. The square-and-multiply function, shown in listing 4.1, computes the result of the exponentiation efficiently by reducing the number of multiplications to be performed. Thus, by using this function, the complexity of the RSA verification process is $O(k^3)$, where k is the size of the key.

    function square_and_multiply(b, e)   // b: base, e: exponent of t bits
        p = 1
        for j = t-1 downto 0 do
            p = p*p
            if e_j = 1 then
                p = p*b
        return p                         // p = b^e

Listing 4.1: Pseudocode of the square and multiply function

Figure 4.1: Graph showing the CPU time for different RSA keys length

From the inspection with the Callgrind tool we can investigate the RSA_verify function, which is responsible for this increase in CPU time. Figures 4.2, 4.3 and 4.4 show that, by increasing the key length, the CPU cost needed to verify the signature also increases. This is rather obvious, since increasing the size of the key leads to modular exponentiations over larger numbers.

Figure 4.2: Detail from Callgrind showing the CPU cost of the verify and decrypt functions for a 1024 bit key

Increasing the key size affects not only the CPU time but also the memory usage. In fact, the additional memory is caused by the need to allocate additional buffers for TLS, in addition to the key storage. This can be a problem especially on embedded devices with a limited amount of memory.

Table 4.2 shows the RAM consumption in relation to the key length.

RAM usage    RSA key size
337.3 KiB    1024 bits
340.2 KiB    2048 bits
347.2 KiB    4096 bits

Table 4.2: Summary of the RAM usage with different RSA keys length

Increasing the key size also increases the amount of data sent over the network. Table 4.3 shows the bytes sent by the connect command in relation to the key size. This is a problem especially on slow networks, since the time to transfer a packet grows with its size.

It is evident, from this test, that the choice of the key size affects the performance, especially on constrained devices. Table 4.4 shows a summary of the values obtained for each key size. We recall that we identified RSA_verify as the operation with the greatest cost during the handshake phase.


Packet size    RSA key size
4750 bytes     1024 bits
5410 bytes     2048 bits
6690 bytes     4096 bits

Table 4.3: Summary of the packet size exchanged during the connection phase using different RSA keys length

RSA key size    CPU time     RAM usage    Packet size
1024 bits       68.862 ms    337.3 KiB    4750 bytes
2048 bits       74.05 ms     340.2 KiB    5410 bytes
4096 bits       91.544 ms    347.2 KiB    6690 bytes

Table 4.4: Summary of the CPU time, RAM consumption and packet size for different RSA key sizes, obtained using the RPi

4.2 Comparison between ECDSA and RSA

In the previous chapter we saw that RSA_verify is the function with the greatest computational cost among the cryptographic functions. But RSA is not the only algorithm available to verify a certificate. In this section we consider the use of different cipher suites. A cipher suite is a set of cryptographic algorithms used for the following stages:

• key exchange

• authentication

• symmetric encryption

• message authentication

Table 4.5 shows the most common algorithms used in TLS v1.2.

Key exchange      Authentication    Symmetric ciphers    MAC
ECDHE             RSA               RC4                  SHA
Diffie-Hellman    DSA               AES                  MD5
ECDH              ECDSA             DES

Table 4.5: List of the most common TLS cipher suites

An example of a string used to describe a TLS cipher suite is: TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256.


In this section we focus on the key exchange and authentication algorithms. These two algorithms have the greatest impact on the connect function, while the choice of the block or stream cipher and of the message authentication has a greater impact on the publish function. In particular, we now focus on two different authentication algorithms: RSA and ECDSA. Since we are looking for performance, as for the key exchange algorithm we consider ECDHE instead of DH. This is because plain Diffie-Hellman is computationally heavier than its elliptic curve version (ECDH). The same holds for ECDSA, the elliptic curve version of the DSA algorithm.

First we test the differences between RSA and ECDSA. There are two things to consider when comparing these two signature algorithms. On the one hand, ECDSA signatures and public keys are much smaller than RSA signatures and public keys of similar security level: a 192-bit ECDSA key is roughly comparable to a 1024-bit RSA key. On the other hand, ECDSA verification is slower than RSA verification, although ECDSA signing is faster. We can verify this with an OpenSSL benchmark. The results for a 192-bit ECDSA key and a 1024-bit RSA key on an RPi are shown in Table 4.6.

                 sign         verify       sign/s    verify/s
rsa 1024 bits    0.015545s    0.000872s    64.3      1147.2
192-bit ecdsa    0.0036s      0.0122s      279.1     82.0

Table 4.6: Time comparison between RSA and ECDSA
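The relation between the columns of Table 4.6 can be checked with a few lines of arithmetic. The sketch below only reuses the per-operation times reported in the table; the variable and function names are chosen for this example. The operations per second are the reciprocal of the time per operation, and the two ratios show in which direction each algorithm is faster.

```python
# Per-operation times in seconds, copied from Table 4.6.
timings = {
    "rsa 1024 bits": {"sign": 0.015545, "verify": 0.000872},
    "192-bit ecdsa": {"sign": 0.0036, "verify": 0.0122},
}


def ops_per_second(seconds_per_operation):
    """Operations per second are the reciprocal of the time per operation."""
    return 1.0 / seconds_per_operation


rsa = timings["rsa 1024 bits"]
ecdsa = timings["192-bit ecdsa"]

# How many times slower ECDSA is when verifying, and RSA when signing.
verify_ratio = ecdsa["verify"] / rsa["verify"]
sign_ratio = rsa["sign"] / ecdsa["sign"]

for name, row in timings.items():
    print(name, round(ops_per_second(row["sign"]), 1), "sign/s",
          round(ops_per_second(row["verify"]), 1), "verify/s")
print("ECDSA verify is about", round(verify_ratio), "times slower than RSA verify")
print("RSA sign is about", round(sign_ratio, 1), "times slower than ECDSA sign")
```

Since a TLS client verifies the server certificate and the handshake signature, the verification column is the one that matters for the client side measured here.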

Let us now see how this affects the connect command in the MQTT protocol. Table 4.7 compares the CPU time on the RPi with RSA and ECDSA.

         Time
RSA      71.515 ms
ECDSA    104.947 ms

Table 4.7: CPU time comparison of the MQTT CONNECT function used with RSA and ECDSA

Let us now inspect the output from Callgrind. Figure 4.6 shows the ECDSA call graph, while Figure 4.5 shows the RSA call graph. ECDSA takes 16.33% of the CPU time, while RSA takes only 7%. This confirms that RSA is much faster than ECDSA in terms of verification, most likely because ECDSA relies on elliptic curve cryptography, whose operations are more complex than those of RSA. In Figure 4.6 we can easily spot a function named EC POINT that uses 12.82% of the total time. This function performs the so-called elliptic curve point multiplication, which repeatedly adds a point to itself along an elliptic curve in order to produce a one-way function. In RSA, on the other hand, verification is very fast because it is simpler in terms of cryptographic computations: only a small number of modular multiplications are necessary. For a complete explanation of elliptic curve cryptography see [9].
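To make the idea of point multiplication concrete, the following sketch implements the textbook double-and-add method on a toy curve. The curve y^2 = x^3 + 2x + 2 over the integers modulo 17 and the base point (5, 1) are assumptions chosen only because they are small enough to follow by hand; real curves use primes of 192 bits or more. This is not the routine used by the TLS library, which relies on heavily optimized variants, and it requires Python 3.8 or later for the modular inverse computed with `pow`.

```python
# Toy curve y^2 = x^3 + A*x + B over the integers modulo P (illustrative only).
P = 17
A = 2
B = 2
G = (5, 1)        # base point on the toy curve
INFINITY = None   # neutral element of the point addition


def point_add(p1, p2):
    """Add two points of the curve using the chord-and-tangent rule."""
    if p1 is INFINITY:
        return p2
    if p2 is INFINITY:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return INFINITY  # a point plus its opposite
    if p1 == p2:
        # Tangent slope for point doubling.
        slope = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P
    else:
        # Chord slope for two distinct points.
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point with double-and-add, scanning the bits of k."""
    result = INFINITY
    addend = point
    while k > 0:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)  # doubling step
        k >>= 1
    return result


print(scalar_mult(2, G))
```

Each bit of the scalar costs one point doubling and possibly one point addition, and each of these requires a modular inversion and several modular multiplications. An RSA verification with a small public exponent, by comparison, needs only a handful of modular multiplications in total.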

Figure 4.5: Call graph of the RSA verification step

With the test above we have compared the two lighter authentication algorithms and we have learned that ECDHE-RSA is faster than ECDHE-ECDSA. Now we compare two key exchange algorithms: ECDHE and ECDH. Both are elliptic curve key exchange algorithms, but ECDHE is ephemeral (the final 'E' stands for 'ephemeral'). Ephemeral keys are temporary rather than static: a new key is generated for every connection, so the same key is never used twice, as happens in ECDH. In general it is recommended to use ECDHE because it is more secure, since it provides Perfect Forward Secrecy (PFS). PFS ensures that, if a long-term key is compromised, the session keys established with it remain safe. From a performance perspective, however, ECDH is slightly faster than ECDHE, since ECDHE requires more cryptographic operations. Table 4.8 shows the CPU time comparison in MQTT connect between ECDH and ECDHE.

              Time
ECDH-RSA      55.4 ms
ECDH-ECDSA    81.205 ms

Table 4.8: CPU time comparison between ECDHE and ECDH

Even when ECDH is used as the key exchange algorithm, the RSA variant remains faster than the ECDSA one (55.4 ms against 81.205 ms). ECDHE-RSA can therefore be a good compromise in terms of speed and security, but not in terms of memory requirements, where ECDSA is by far better.

Figure 4.7 shows a comparison between all the combinations tested in this section.

4.3 Analysis of the TLS impact on the payload

In the previous section we tested the performance of the connect function under different algorithms for key exchange and authentication. We now continue investigating the use of different cipher suites, but considering the performance of the publish command, which is determined by the choice of the symmetric encryption algorithm. We therefore test different symmetric ciphers along with message authentication codes (MAC), with the aim of finding an encryption algorithm that is efficient in terms of both CPU and network consumption. Especially on slow networks, it is important to reduce the number of bytes to be transferred, in order to reduce the time needed to send them through the network.

The ciphers that we have taken into consideration are:

• AES128-GCM-SHA256
• AES128-CBC-SHA256
• DES-CBC3-SHA
• RC4-SHA
• AES256-SHA

The same payload size has been used for all the tests. Figure 4.8 shows the size of the resulting payload with the different ciphers. In this case we want to reduce the total size in bytes, so lower is better.

Figure 4.8: Comparison of the dimension in bytes of the payload with different ciphers

This is because GCM does not require padding, nor does it require plaintext to be a multiple of the block size as CBC does.
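The effect of padding can be estimated with a short calculation. The sketch below assumes the TLS 1.2 record layout (for CBC suites an explicit IV of one block, a MAC and padding up to the next block boundary; for AES-GCM an 8-byte explicit nonce and a 16-byte authentication tag), assumes that the minimum amount of padding is used, and ignores the 5-byte record header common to all suites. The function names and the 100-byte payload are hypothetical, and the resulting figures are not the measurements of Figure 4.8.

```python
def cbc_record_overhead(plaintext_len, block_size, mac_len):
    """Bytes added by a TLS 1.2 CBC suite: explicit IV + MAC + padding."""
    explicit_iv = block_size
    # The padding, including its length byte, fills up to the next block
    # boundary; at least one byte is always added.
    padding = block_size - (plaintext_len + mac_len) % block_size
    return explicit_iv + mac_len + padding


def gcm_record_overhead():
    """Bytes added by a TLS 1.2 AES-GCM suite: explicit nonce + tag."""
    explicit_nonce = 8
    tag = 16
    return explicit_nonce + tag


payload_len = 100  # hypothetical payload size in bytes

overheads = {
    "AES128-GCM-SHA256": gcm_record_overhead(),
    "AES128-CBC-SHA256": cbc_record_overhead(payload_len, 16, 32),
    "AES256-SHA": cbc_record_overhead(payload_len, 16, 20),
    "DES-CBC3-SHA": cbc_record_overhead(payload_len, 8, 20),
}

for suite_name, extra in overheads.items():
    print(suite_name, "adds", extra, "bytes to a", payload_len, "byte payload")
```

Under these assumptions the GCM overhead is constant, while the CBC overhead depends on how far the plaintext plus MAC is from a multiple of the block size.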

Figure 4.9 shows the CPU times with different ciphers.

Figure 4.9: Comparison of the CPU time using different ciphers

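In terms of CPU usage AES128-GCM is not the best one. This is because GCM, in addition to encrypting, computes an authentication tag through multiplications in the Galois field GF(2^128), which are costly when no hardware support is available. We will discuss GCM in more detail in Chapter 5.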

However, by taking into consideration the size of the payload in bytes and the CPU time, it seems that AES128-GCM-SHA256 is a good compromise.

4.4 TLS behaviour on different types of network

In Section 4.2 we saw that, in terms of CPU time, ECDH-RSA is more convenient but, in terms of key size, it is better to use ECDHE-ECDSA. Since the difference in key size between ECDSA and RSA is significant, it is interesting to know how this impacts network performance. For most embedded devices, and especially for those powered by a battery, the less time they stay turned on, the longer the battery lasts. In addition, most IoT devices operate on networks with low throughput and high latency. The question is therefore: on slow networks, is it more convenient to use ECDSA or RSA? Can the small size of the ECDSA key compensate for the high computation time?

In this section we try to answer this question by performing some tests and simulations over different types of networks. For simplicity, we only consider two variables: data rate and latency. We test the most common types of mobile networks, which are listed in Table 4.9 together with their performance in terms of speed and latency.
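The trade-off between computation time and bytes on the wire can be sketched with a simple first-order model, in which the connection time is the sum of the CPU time, the transmission time of the handshake bytes and the latency of the round trips. The function `connect_time_ms`, the link parameters (100 kbit/s, 300 ms) and the number of round trips are assumptions made for this example and are not values measured in this thesis; the CPU times and packet sizes are the RSA figures of Table 4.4, used here only to show how the model behaves.

```python
def connect_time_ms(cpu_ms, handshake_bytes, data_rate_kbps, latency_ms, round_trips):
    """First-order estimate of the connection time in milliseconds."""
    # bits divided by kbit/s gives milliseconds
    transfer_ms = handshake_bytes * 8 / data_rate_kbps
    return cpu_ms + transfer_ms + round_trips * latency_ms


# Hypothetical slow link, not one of the networks measured in the thesis.
DATA_RATE_KBPS = 100
LATENCY_MS = 300
ROUND_TRIPS = 2  # assumed number of round trips for the handshake

# CPU time and packet size taken from Table 4.4.
rsa_1024 = connect_time_ms(68.862, 4750, DATA_RATE_KBPS, LATENCY_MS, ROUND_TRIPS)
rsa_4096 = connect_time_ms(91.544, 6690, DATA_RATE_KBPS, LATENCY_MS, ROUND_TRIPS)

print("RSA 1024:", round(rsa_1024, 1), "ms")
print("RSA 4096:", round(rsa_4096, 1), "ms")
print("share of CPU time for RSA 1024:", round(68.862 / rsa_1024 * 100, 1), "%")
```

On such a link the transmission and latency terms dominate the CPU term, which is why a smaller handshake may outweigh a slower verification. The same function can be evaluated with the ECDSA measurements once the corresponding handshake size is known.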