
University of Pisa and Scuola Superiore Sant’Anna

Master Degree in Computer Science and Networking

Master Thesis

WebRTC: energy-efficiency characterization for constrained devices

Supervisor

Nicola Tonellotto

Co-supervisor

Alberto Gotta

Candidate

Patrik Rosini

Academic Year 2016/2017


Abstract

In this master thesis, I performed the energy profiling of the WebRTC framework on top of a resource-constrained platform. WebRTC is a standard solution for real-time video streaming, and it is an appealing solution in a wide range of application scenarios. This work focuses on the use case of unmanned aerial vehicles distributing real-time video streams over a WebRTC session to a peer on the ground. The Raspberry Pi is a reference platform in the single-board computer category; thus, it has been used as the streaming device. A large testbed campaign has been conducted to emulate the scenario under consideration. The results show the empirical relationship between power consumption and QoE metrics such as PSNR and SSIM, highlighting the configurations that enable energy-efficient WebRTC-based video streaming.


Contents

1 Introduction

2 Background
2.1 WebRTC framework
2.2 Video quality measurements

3 Experimental analysis
3.1 Problem statement
3.2 Testbed environment
3.2.1 Power measurement
3.2.2 Preliminary testing
3.2.3 Parameters under consideration
3.2.4 Testing setup

4 Performance evaluation
4.1 Performance metrics
4.2 Automatic mode
4.3 Manual mode

5 Conclusions and Future Works

Appendix A
Appendix B
Appendix C
Appendix D


List of Figures

2.1 WebRTC architecture
2.2 WebRTC protocol stack
2.3 Google Congestion Control architecture
2.4 Remote rate controller FSM
3.1 WebRTC sender media engine
3.2 Power meter schematic
3.3 Measurement system overview
3.4 PMF of baseline power consumption
3.5 CPU stress power consumption
3.6 WiFi stress power absorption
3.7 Experimental testing set-up overview
4.1 Automatic mode performance
4.2 Manual mode performance (high framerate with min. CPU and max. encoding threads)
4.3 Manual mode performance (high framerate with max. CPU and min. encoding threads)
4.4 Manual mode performance (high quality)
4.5 Automatic mode vs. manual mode performance comparison
B.1 ArduINA power meter


List of Tables

2.1 ITU-R quality and impairment scale
3.1 Baseline power consumption characterization
3.2 Testbed parameters considered
4.1 Automatic mode configurations
4.2 Manual mode configurations (high framerate with min. CPU and max. encoding threads)
4.3 Manual mode configurations (high framerate with max. CPU and min. encoding threads)
4.4 Manual mode configurations (high quality)


Acronyms

AEC Acoustic Echo Cancellation
API Application Program Interface
ARM Advanced RISC Machine
COTS Commercial off-the-shelf
CPU Central Processing Unit
CSV Comma-Separated Values
DC Direct Current
DCT Discrete Cosine Transform
DTLS Datagram Transport Layer Security
DVFS Dynamic Voltage Frequency Scaling
FEC Forward Error Correction
FSM Finite-State Machine
GCC Google Congestion Control
HDMI High-Definition Multimedia Interface
HTTP Hypertext Transfer Protocol
ICE Interactive Connectivity Establishment
IDE Integrated Development Environment
IETF Internet Engineering Task Force
iLBC internet Low Bitrate Codec
IoT Internet of Things
IP Internet Protocol
iSAC internet Speech Audio Codec
ITU-R International Telecommunication Union Radiocommunication Sector
LAN Local Area Network
LCD Liquid-Crystal Display
LED Light-Emitting Diode
MOS Mean Opinion Score
MSE Mean Squared Error
NAT Network Address Translation
NR Noise Reduction
OS Operating System
PER Packet Error Rate
PM Power Management
PMF Probability Mass Function
QoE Quality of Experience
QP Quantization Parameters
RAM Random Access Memory
REMB Receiver Estimated Maximum Bitrate
RTC Real-Time Communication
RTCP RTP Control Protocol
RTP Real-time Transport Protocol
SCTP Stream Control Transmission Protocol
SD Secure Digital
SDP Session Description Protocol
SoC System on Chip
SRTCP Secure RTCP
SRTP Secure RTP
SSIM Structural SIMilarity
STUN Session Traversal Utilities for NAT
TURN Traversal Using Relays around NAT
UAV Unmanned Aerial Vehicle
UDP User Datagram Protocol
URL Uniform Resource Locator
USB Universal Serial Bus


Acknowledgements

I would like to express my gratitude to Professor Nicola Tonellotto for his guidance, motivation and patience. My sincere thanks to Alberto Gotta for giving me the opportunity to conduct this thesis. A special thanks to Manlio Bacco for his great support. You were always willing to help me. In addition, I thank Matteo Catena for all his advice.

I would like to thank Claudio, Michele, Cesira, Alessia, Agostina, Luigi, Thomas, Chiara, Riccardo, Giuseppe and Paolo. Your friendship will be unforgettable.

Most importantly, thank you to my mother for allowing me to pursue the master's degree, and to Sara, who believed in me throughout my studies.


Chapter 1

Introduction

In recent years, we have witnessed the growth of Real-Time Communication (RTC) over networks. Its use spans from video chat to conference event streaming, and it has gained the interest of regular and business users alike [1]. The main purpose of implementing RTC on top of the Web architecture is to allow peer-to-peer communication within the browser context or between browsers and other systems. The historical approach to providing that functionality has been the adoption of proprietary protocols and plugins, thus hindering the development of real-time interaction on the Web and creating fragmentation among heterogeneous systems. To overcome these limitations, a new approach was necessary, leading the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF) to propose a set of open standards that simplify the development and deployment of RTC and enable seamless and rich communications between browsers and other clients.

The two technologies representing those approaches are Adobe Flash and WebRTC [2]. The former has long been the most common third-party solution; the latter is an open-source native Application Program Interface (API) and the candidate to become the preferred solution for RTC. Adobe Flash became popular thanks to its cross-browser and cross-platform interoperability but, for several reasons (e.g., frequent vulnerabilities, mobile incompatibility, antiquated architecture), its support will terminate at the end of 2020, as officially stated by Adobe1. In general, the main factor determining the decline of third-party technologies for RTC is the rise of reliable open-standard solutions. WebRTC is proposed as a collection of communication protocols and APIs (independent of platforms and devices) that provide a flexible and modular technology stack. In fact, it is adopted for security systems (for instance, the Amaryllo camera), remote presence on work sites [3], telehealth [4][5], and integration in Internet of Things (IoT) scenarios [6].

RTC on computing platforms with limited resources is an interesting topic for the industrial and research communities, especially for the purpose of multimedia applications and services. In particular, the wireless video streaming scenario from an Unmanned Aerial Vehicle (UAV) is a relevant subject of study, and this work focuses on such an application scenario. A Commercial off-the-shelf (COTS) single-board computer can provide enough computational power in a credit-card form factor. On the other hand, a UAV typically has a limited energy resource supplied by an on-board battery. Hence, it is important to optimize the power consumption by concentrating on the streaming efficiency. The objective function could be the minimization of the power absorption while keeping the video quality at an acceptable level. In addition, any requirements imposed by the use case under consideration must be respected as well.

The aim of this master thesis is to provide an energy profiling of the WebRTC framework when running on top of a constrained platform (a Raspberry Pi 3), which delivers a real-time video stream to a fixed receiver via WiFi. The main result is a characterization of the achievable performance level, focusing on the trade-off between video quality and energy-efficiency.

The remainder of this work is organized as follows: Chapter 2 describes WebRTC and the video quality metrics used to represent the collected data. In Chapter 3, the scenario definition is presented, as well as the experimental testbed, along with some preliminary tests. Then, in Chapter 4, the testbed results are analyzed and discussed. Chapter 5 draws the conclusions and proposes some research directions that may be of interest in the near future.

1 https://blogs.adobe.com/conversations/2017/07/adobe-flash-update.html

Chapter 2

Background

This chapter provides the necessary background on WebRTC in Section 2.1; then, Section 2.2 introduces the metrics used in this work to analyze the measurements collected during the tests, which are presented and discussed in Chapter 4.

2.1 WebRTC framework

WebRTC is the outcome of the work done by the W3C1 and the IETF in defining a framework, protocols, and APIs enabling real-time interactive voice, video, and data over peer-to-peer connections [7]. It is currently a draft standard (even if already largely used) and, even though initially designed for browsers, it also provides support for mobile platforms and IoT devices. The contributors to the initiative are Google, which is the major author, Mozilla, and Opera, amongst others.

The overall architecture is summarized in Figure 2.1. Two layers can be seen, starting from the top:

• the Web API is a Javascript API for the development of web-based, video chat-like applications (usually in conjunction with HTML5);

1 Web Real-Time Comm. Working Group, https://www.w3.org/2011/04/webrtc/


Figure 2.1: WebRTC architecture

• the WebRTC C++ API is a low-level API offering direct access to the core of the framework, allowing the development of native RTC applications or the implementation of a custom higher-level API.

The core is composed of a few complex blocks which cooperate with each other: the voice engine, the video engine, transport, and session [8]. The voice engine takes care of the audio media chain, from microphone input to network transmission. It can manage several audio codecs, such as iSAC, iLBC, and Opus, and it has some tuning components, like NetEQ for Voice to deal with network jitter and packet loss, Acoustic Echo Cancellation (AEC) for acoustic echo removal, and Noise Reduction (NR) for background noise cancellation. In turn, the video engine is responsible for the video media chain, from camera image input to the network and to the screen. The available video codecs are VP8, VP9, and H.264, together with specialized components like the Video Jitter Buffer, for video concealment against the effects of jitter and packet loss, and an image enhancement component to remove


the noise from the input images of the camera.

The transport block consists of a Real-time Transport Protocol (RTP) stack, a multiplexing utility, and other protocols, plus a dedicated framework used to establish a peer-to-peer network. Audio and video streams are transported by RTP, which is an approved standard for real-time media transport, used over UDP. The RTP Control Protocol (RTCP) is used to provide feedback to the sender. Many extensions exist for both RTP and RTCP, like the Receiver Estimated Maximum Bitrate (REMB) RTCP message adopted in the congestion control algorithm. WebRTC also provides the transport of generic data between the endpoints in a peer-to-peer fashion. Data transport can be reliable or partially reliable, with in-order or out-of-order delivery of the sent messages; the Stream Control Transmission Protocol (SCTP) is the protocol chosen to realize it. Multiplexing of the RTP media streams and of the data is applied by default, for the purpose of minimizing failures in NAT and firewall traversal: this technique aggregates all the types of messages carried in one RTP session into a single UDP flow. In addition to the components listed so far, the Interactive Connectivity Establishment (ICE) framework, with the Session Traversal Utilities for NAT (STUN) and Traversal Using Relays around NAT (TURN) protocols, is an essential part of the transport block. They establish a connection between peers in the presence of NATs and firewalls. ICE orchestrates the connection phase, trying the STUN protocol on a first attempt and, if it fails, relying on the TURN protocol at the expense of increased latency and jitter on the media.

The last block is the session block. It is an abstraction in the architecture, since its implementation is left to the developer, who is supposed to define a signalling protocol, suitable for the specific scenario, to coordinate the communication and send control messages. The Session Description Protocol (SDP) is the only component required by the standard. It is mandatory for initiating the communication between endpoints, since it describes the parameters of the peer-to-peer connection once a signaling channel is established.
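As an illustration of the kind of information SDP conveys, the fragment below sketches a minimal, video-only session description; the field values are hypothetical and are not taken from the application developed in this thesis.

v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=video 9 UDP/TLS/RTP/SAVPF 96
c=IN IP4 0.0.0.0
a=rtpmap:96 VP8/90000
a=sendonly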

Concerning the security between peers, Secure RTP (SRTP) can protect the RTP session, while Secure RTCP (SRTCP) secures the flow of RTP Control Protocol (RTCP) messages. Key management is delegated to the DTLS-SRTP negotiation mechanism. In the case of data transport, SCTP is secured with a DTLS association, so it is encapsulated inside DTLS, which in turn runs on top of UDP [8]4. Figure 2.2 explicitly shows the aforementioned protocol stack.

Figure 2.2: WebRTC protocol stack

Here, a finer-grained overview of the audio and video stream delivery optimization is provided. A description of how the communication bandwidth is managed is useful to point out the relation with the video engine block, and for further improvements building on this thesis.

Real-time audio and video flows are delay-sensitive, so a congestion control is necessary to manage the connection latency. WebRTC implements the Google Congestion Control (GCC) algorithm to guarantee the highest possible application quality [9]. An overview of its architecture is shown in Figure 2.3. Consider a peer connection where the sender emits RTP packets and receives periodic RTCP control messages generated by the receiver. The GCC is composed of two components:

4 Other aspects of the security architecture are outside the scope of this thesis; the reader is referred to the official documentation.


Figure 2.3: Google Congestion Control architecture [9]

• a loss-based controller at the sender, which evaluates and uses a target sending rate As;

• a delay-based controller at the receiver, which computes an ideal target rate Ar and communicates it to the sender.

The loss-based controller takes into account the fraction of lost packets, fl(tk), carried by the RTCP messages, and the REMB messages carrying the ideal target rate Ar computed by the receiver. Every time tk the new value of fl(tk) in the k-th RTCP message is received, the sender computes the target sending rate As(tk) by means of the equation:

$$A_s(t_k) = \begin{cases} A_s(t_{k-1})\,\bigl(1 - 0.5\,f_l(t_k)\bigr) & f_l(t_k) > 0.1 \\ 1.05\,A_s(t_{k-1}) & f_l(t_k) < 0.02 \\ A_s(t_{k-1}) & \text{otherwise} \end{cases} \qquad (2.1.1)$$

The logic for the computation of the target sending rate is straightforward. If the fraction of lost packets is moderate (between 2% and 10%), the new As is not computed and the rate is kept constant. If the estimated loss fraction is high (fl(tk) > 0.1), then the rate is decreased by a factor (1 − 0.5 fl(tk)). Otherwise, if the loss fraction is negligible (fl(tk) < 0.02), then the rate is increased by a factor equal to 1.05.

Figure 2.4: Remote rate controller FSM [9]

At the receiver side, the rate controller considers the delay of the packet sequence forming the i-th video frame. It is defined by a Finite-State Machine (Figure 2.4), where a state σ changes upon a signal s coming from the over-use detector component. At the same time, the adaptive threshold component dynamically changes the threshold on the basis of the network conditions. The over-use detector produces an overuse signal if the network is considered congested, an underuse signal if the network is considered underused, or a normal signal if the estimated network condition is within the threshold bounds. Depending on the state, Ar is decreased, increased, or kept constant, and it is computed according to the equation:

$$A_r(t_i) = \begin{cases} \eta\,A_r(t_{i-1}) & \sigma = \text{Increase} \\ \alpha\,R_r(t_i) & \sigma = \text{Decrease} \\ A_r(t_{i-1}) & \sigma = \text{Hold} \end{cases} \qquad (2.1.2)$$

where ti indicates the reception time of the i-th video frame, η = 1.05, α = 0.85, and Rr(ti) is the receiving rate. Once the ideal target rate is evaluated, it is passed to the REMB processing component, which normally sends a REMB message to the sender every 1 s, or immediately when Ar decreases by more than 3%. It is important to remember that Ar is upper-bounded by the value 1.5 Rr(ti). The sending target rate actuated by the GCC is defined as A = min(Ar, As), and A is forwarded to the pacer and encoder components. The encoder does its best to encode with a bitrate as close as possible to the target rate but, to avoid flickering distortion, it may not match the rate A precisely. The pacer reacts in two different ways, based on the outcome of the encoding phase. If the encoder uses a rate higher than the target rate, the pacer drains its queue at a rate fA, where f is a constant factor, thus avoiding delay due to the queue at the sender. If the pacer rate is lower than the rate A, padding data or FEC are added. The whole described process is able to generate an average sending target rate approximating A.

Scale  Quality    Impairment
5      Excellent  Imperceptible
4      Good       Perceptible, but not annoying
3      Fair       Slightly annoying
2      Poor       Annoying
1      Bad        Very annoying

Table 2.1: ITU-R quality and impairment scale [10]
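To make the two controllers concrete, the following sketch restates Equations 2.1.1 and 2.1.2 in C++. It is a minimal illustration of the update rules only: the function and type names are mine and do not correspond to the actual WebRTC sources.

#include <algorithm>

// Sender side, Eq. 2.1.1: loss-based update of the target sending rate As.
double UpdateSenderRate(double prev_as, double loss_fraction) {
  if (loss_fraction > 0.1)   // high loss: back off multiplicatively
    return prev_as * (1.0 - 0.5 * loss_fraction);
  if (loss_fraction < 0.02)  // negligible loss: probe for more bandwidth
    return prev_as * 1.05;
  return prev_as;            // moderate loss: hold the current rate
}

enum class RateState { Increase, Decrease, Hold };

// Receiver side, Eq. 2.1.2: delay-based update of the ideal target rate Ar,
// with eta = 1.05, alpha = 0.85, and recv_rate playing the role of Rr(ti).
double UpdateReceiverRate(double prev_ar, double recv_rate, RateState state) {
  double ar = prev_ar;
  switch (state) {
    case RateState::Increase: ar = 1.05 * prev_ar;   break;
    case RateState::Decrease: ar = 0.85 * recv_rate; break;
    case RateState::Hold:                            break;
  }
  return std::min(ar, 1.5 * recv_rate);  // Ar is upper-bounded by 1.5 Rr
}

// The rate actually actuated by GCC is A = min(Ar, As).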

2.2 Video quality measurements

In order to assess the video quality of a video stream, it is necessary to adopt proper evaluation metrics, briefly described below. Video quality measurements can be based on two methodologies: subjective or objective. The subjective evaluation is performed by asking the users of a video system to rate the quality of a watched video sequence based on their impressions. The users' opinion is given on a scale called Mean Opinion Score (MOS), ranging from 1 (lowest quality) to 5 (highest quality). These values are commonly mapped to the rating categories shown in Table 2.1. Such a methodology is expensive and complex, because of its time, manpower, and special equipment requirements, so it is normally not affordable.

Objective evaluations are best suited for automated tests since they are based on numerical criteria and are therefore free from any human interpretation. Among all the objective video quality metrics, the most referenced are the Peak Signal-to-Noise Ratio (PSNR) and the Structural SIMilarity (SSIM) index. They are both full-reference metrics, meaning that the quality difference is computed by comparing the original uncompressed video sequence with the resulting distorted video sequence, frame by frame. Considering f and g as the reference and test images, respectively, both of size M×N,

the PSNR is defined as:

$$\mathrm{PSNR}(f,g) = 10\,\log_{10}\!\left(\frac{255^2}{\mathrm{MSE}(f,g)}\right) \qquad (2.2.1)$$

where

$$\mathrm{MSE}(f,g) = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} (f_{ij} - g_{ij})^2 \qquad (2.2.2)$$

is the Mean Squared Error. The PSNR is expressed in dB, where higher values correspond to higher image quality. The SSIM measures the similarity between two images, and it was created to improve upon metrics such as PSNR. It measures the image distortion as a combination of luminance distortion, contrast distortion, and loss of correlation. The SSIM is defined as:

$$\mathrm{SSIM}(f,g) = l(f,g)\,c(f,g)\,s(f,g) \qquad (2.2.3)$$

where

$$l(f,g) = \frac{2\mu_f\mu_g + C_1}{\mu_f^2 + \mu_g^2 + C_1} \qquad (2.2.4)$$

$$c(f,g) = \frac{2\sigma_f\sigma_g + C_2}{\sigma_f^2 + \sigma_g^2 + C_2} \qquad (2.2.5)$$

$$s(f,g) = \frac{\sigma_{fg} + C_3}{\sigma_f\sigma_g + C_3} \qquad (2.2.6)$$

with $\mu_f$ and $\mu_g$ the mean luminances, $\sigma_f$ and $\sigma_g$ the contrast standard deviations, and $\sigma_{fg}$ the covariance of f and g. The SSIM takes positive values within the interval [0, 1], where 1 means f = g.

Although there are no specific recommendations for preferring PSNR over SSIM or vice versa, their sensitivities to some common image degradations, like Gaussian blur, additive Gaussian noise, JPEG, and JPEG2000 compression, appear dissimilar: PSNR is more sensitive to additive Gaussian noise than SSIM, while SSIM is more sensitive to JPEG compression than PSNR [11]. For this reason, both are used in the results analysis discussed later, in order to evaluate the video quality comprehensively.
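As a worked illustration of Equations 2.2.1 and 2.2.2, the following C++ helper computes the PSNR between two 8-bit frames stored as flat byte arrays. It is a self-contained sketch for exposition only; the measurements in this thesis are produced by the psnr_ssim_analyzer tool shipped with WebRTC.

#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// PSNR between a reference frame f and a test frame g of equal size,
// both given as flat arrays of 8-bit samples (e.g., the Y plane).
double Psnr(const std::vector<uint8_t>& f, const std::vector<uint8_t>& g) {
  double mse = 0.0;
  for (std::size_t i = 0; i < f.size(); ++i) {
    const double diff = static_cast<double>(f[i]) - static_cast<double>(g[i]);
    mse += diff * diff;  // accumulate the squared error (Eq. 2.2.2)
  }
  mse /= static_cast<double>(f.size());
  if (mse == 0.0)  // identical frames: PSNR is unbounded
    return std::numeric_limits<double>::infinity();
  return 10.0 * std::log10(255.0 * 255.0 / mse);  // Eq. 2.2.1
}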


Chapter 3

Experimental analysis

In this chapter, Section 3.1 provides a description of the problem to be tackled; then, Section 3.2 details the designed testbed and discusses some preliminary tests.

3.1 Problem statement

In a wireless video communication from a battery-powered Unmanned Aerial Vehicle (UAV) delivering a video stream to a terrestrial fixed receiver, the limited energy poses a constraint that must be taken into account.

The energy consumption of a video streaming application mainly depends on computation and transmission tasks [12]. In fact, in the computation phase, energy is mostly absorbed to perform encoding operations, while during the transmission phase energy is spent to deliver the video sequence; these two tasks should be jointly accounted for [12]. A typical encoding system is formed by several modules: usually motion estimation, motion compensation, Discrete Cosine Transform (DCT), quantization, entropy encoding of the quantized DCT coefficients, inverse quantization, inverse DCT, picture reconstruction, and interpolation. Such modules can be associated with a set of parameters of the video codec in use, in order to control the complexity of the entire encoding process. The computational complexity is strictly correlated with the power consumption of the device CPU during the compression of the video sequence; therefore, a fine tuning of specific complexity parameters could enhance the energy efficiency of the streaming device [13]. As anticipated, video transmission also has an impact on the overall power consumption. In general, the energy consumed while transmitting a video sequence depends on wireless channel conditions, like the instantaneous channel fading factor and the channel noise, and on transmission parameters, like frequency bandwidth, Packet Error Rate (PER), modulation scheme, and coding scheme [13]. In this work, only the transmission rate and the power management technology of the device transmitter are considered. An additional lever for power management is given by the well-known Dynamic Voltage Frequency Scaling (DVFS) feature, offered by the majority of modern microprocessors. Exploiting DVFS, possibly jointly with an optimized encoding process, could further improve the energy-efficiency of the video application [14]. Merging all of this together, the complexity parameters, the transmission power control, and DVFS are potential ways to reduce the power absorption of an application or service running on a dedicated resource-constrained device, streaming a real-time video sequence from a UAV.

Figure 3.1: WebRTC sender media engine [15]

The main objective of this work is to use WebRTC hosted by a single-board computer and to find a way to execute it energy-efficiently while still respecting any requirements on the video quality. In order to do so, the structure of the WebRTC media engine, as depicted in Figure 3.1, must be carefully analyzed. A module of interest is the Video Encode one, because it is in charge of the encoding process. This module represents all the steps of the transformation from a raw frame to a compressed one. After an exhaustive study of the native API internals (directly from the source code, because of the lack of official documentation at this time), it has been possible to analyze the default video encoder implementation, so that a simplified functioning of the Video Encode module can be read in Algorithm 1.

Result: Encoded frame
InitialConfiguration();
while true do
    AddVideoFrame(newFrame);
    if isTargetBitrateChanged or isFramerateChanged then
        SetNewRates(newTargetBitrate, newFramerate);
    end
    Encode(newFrame);
end

Algorithm 1: Video Encode implementation

The InitialConfiguration() operation refers to the set-up phase of the video encoder: all the parameters of the video codec in use are configured. Once the encoding process is in place, the encoder strives to respect the target sending bitrate, as seen in Section 2.1, at a certain framerate; the framerate is internally evaluated in order to optimize the encoding process. Next, the if statement indicates a check on the new values of target bitrate and framerate so that, if newer values are present, they are used until the next update. Finally, the raw frame can be encoded.

3.2 Testbed environment

Considering the exemplary scenario described in Section 3.1, a testbed has been designed. The choice of the resource-constrained device has been subject to some requirements: a small and lightweight platform, easy to deploy on a UAV; low power consumption, to smoothly run on batteries; and a COTS device, to contain the operating costs. The Raspberry Pi has been chosen for its wide popularity. Despite its low cost, the features offered by this single-board computer are remarkable and, as stated in Chapter 1, it has been adopted in many research activities. For these reasons, the Raspberry Pi 3 Model B, featuring an ARM CPU (quad-core 1,200 MHz Broadcom BCM2837, 64 bit), a WiFi chipset (Broadcom BCM43438), and 1 GB of RAM, equipped with Raspbian 8 (Jessie), is the platform selected to run WebRTC in this work.

The experiments have been conducted in a controlled environment (an almost ideal wireless link), and the connection between the Raspberry Pi and the receiver has been realized by means of a commodity WiFi access point. The receiver consists of a standard laptop running Ubuntu 16.04.1, and it has been used to perform additional tasks during the testing. The WebRTC application has been developed using the Native C++ API, and it is based on the client and server sample applications included in the framework examples2. The native API of WebRTC is publicly available in a bundle providing the toolchain for all the major platforms (e.g., ARM, x86, x64). Due to its fast development cycle, it was not possible to use the latest version; therefore, the M57 revision3 (Feb. 2017) is considered here. More details about the application and the framework customization are listed in Appendix A.

2 https://chromium.googlesource.com/external/webrtc/+/branch-heads/57/webrtc/examples/peerconnection/
3 https://chromium.googlesource.com/external/webrtc/+/branch-heads/57

3.2.1 Power measurement

In order to provide an energy profiling of WebRTC, a tool to monitor the power consumption was necessary. An energy profile can be obtained by means of two methodologies: software-based or hardware-based [16]. The former uses software components to collect and analyze samples of the power consumption, while the latter relies on external hardware, like a power meter, to gather the measurements. A software-based tool introduces overhead on the measurement readings and, to the best of my knowledge, no compatible solution for the Raspberry Pi exists. On the other hand, a hardware-based tool provides more accurate measurements, but at the expense of some extra equipment. The adopted solution is a trade-off relying on open-source products: an Arduino4 prototype board and an Adafruit INA219 current sensor breakout5. Arduino is a powerful single-board microcontroller for prototyping purposes, giving the ability, through its programming language and input/output interfaces, to build digital devices and interactive objects. Along with Arduino, the Adafruit INA219 current sensor breakout has been chosen for its precision, since it can be configured for the application at hand. Therefore, a custom power meter has been designed and built at a limited equipment cost, while providing fine measurement accuracy (see Appendix B). Figure 3.2 shows the schematic used to connect the Raspberry Pi to the power meter. The power sensor measures the voltage drop over a shunt resistor, which is connected in series with the device under evaluation. It is controlled by the Arduino board by loading a sketch, that is, a proper script written for the Arduino board. The sketch, listed in Appendix C, reads the current and voltage from the current sensor and estimates the absorbed power as follows:

P = V I    (3.2.1)

4 https://www.arduino.cc/
5 https://cdn-learn.adafruit.com/downloads/pdf/adafruit-ina219-current-sensor-breakout.pdf

Figure 3.2: Power meter schematic

The voltage, current, and power readings, along with a timestamp, are composed into a string. A new string is created with a predefined frequency and stored in a CSV file. At bootstrap, the power meter waits for its input parameters: a filename, to assign a name to the CSV file; the reading interval, to configure a specific time interval between readings; and the test time, to define the total running time of the sensor readings. The input parameters are sent from the Raspberry Pi to the Arduino via a USB serial connection. Furthermore, the CSV file is stored onto an SD card on the Arduino shield and, once the sensor reading time has expired, it can be retrieved with an HTTP GET request using the URL structure http://ip-arduino/CSV-file-name.csv. An overview of the measurement system is illustrated in Figure 3.3.

Figure 3.3: Measurement system overview

3.2.2 Preliminary testing

A preliminary test campaign has been conducted to estimate the power consumption of the Raspberry Pi in idle, when the CPU is utilized by multithreaded tasks, and when the WiFi interface is transmitting. The purpose of this testing is to assess a reference baseline and to dimension the impact of the CPU and of the wireless transmitter in isolation. The HDMI connector of the Raspberry Pi has been disabled to reduce unwanted power absorption. Regarding the power meter, a 100 ms time interval between readings has been configured; in this way, current and voltage values are sampled at a rate of 10 Hz. The same set-up has been used for the testbed evaluation reported in Chapter 4.

Figure 3.4: PMF of baseline power consumption

CPU frequency   Average    5th percentile   95th percentile
600 MHz         1,232 mW   1,186 mW         1,362 mW
1,200 MHz       1,436 mW   1,377 mW         1,637 mW

Table 3.1: Baseline power consumption characterization

The reference baseline is obtained by estimating the power consumption of the Raspberry Pi in idle conditions (i.e., the OS running in idle mode), with the WiFi interface powered on but not transmitting. The on-board CPU provides DVFS functionality through the CPUFreq driver6. Two CPU frequencies can be set: 600 and 1,200 MHz7. The baseline test has been conducted by collecting readings at both CPU frequencies, represented in Figure 3.4. As expected, the Raspberry Pi board exhibits a very low power consumption in idle conditions. Table 3.1 shows the average values and the 5th and 95th percentiles for both of the aforementioned CPU frequencies.

6 https://www.kernel.org/doc/Documentation/cpu-freq/cpu-drivers.txt
7 This limitation could be mainly due to the current SoC firmware installed on the board.

Figure 3.5: CPU stress power consumption

The CPU stress test has been conducted by executing a script8 simulating different utilization

of the CPU hardware resources. The script has been executed using from 1 to 4 CPU cores and a utilization range from 10% to 100%, in steps of 10%, at both CPU frequencies. Figure 3.5 shows the CPU power consumption for 1 to 4 cores under different utilization conditions. The average power consumption is linear with respect to the CPU utilization in all the considered configurations. However, it is worth remarking that the energy demand of the CPU shows larger skews when 1-2 cores are used than when 3-4 are, at both CPU frequencies: reading the chart, a margin of 17% is gained when no more than 2 cores are used, at either the low or the high clock frequency, while only a small margin is obtained when 3 or 4 cores are used, at both clock frequencies.

Then, the WiFi stress tests have been run. To simulate a video streaming flow, the iPerf3 software has been used, with negligible computational overhead. Along with the CPU frequencies, the Power Management (PM) option of the WiFi interface is taken into account. The transmission rate ranges from 170 Kbit/s to 1,700 Kbit/s, in steps of 170 Kbit/s. The chosen range corresponds to the set of bitrate values defined in WebRTC when a video sequence composed of frames with resolution 640x480 is transmitted, so the maximum imposed bitrate is 1,700 Kbit/s. Figure 3.6 depicts the WiFi test results. It is straightforward to notice that the power consumption of the WiFi component falls within the range 1,364-1,420 mW in all configurations; thus, the energy demand of the wireless chip can be considered almost constant.

8 https://gist.github.com/catenamatteo/eec3899a0acc12238692670a291a288c

Figure 3.6: WiFi stress power absorption

3.2.3 Parameters under consideration

In this section, computational and transmission parameters are evaluated. Their combinations realize the set of configurations used for the experiments on the designed testbed. The considered computational parameters are related to the encoding process, and they are:

• source rate: a set of bitrate values (described in Section 3.2.2);

• encoding quality: the cpu_used parameter of the VP8 codec, here indicated as cpu_speed. It provides a trade-off between quality and speed according to the equation:

target_cpu_utilization = (100 (16 − cpu_speed)/16)%;    (3.2.2)


Parameter name     Value or range of values
source rate        170 - 1,700 Kbit/s
encoding speed     10 - 15
encoding threads   1 - 2
CPU frequency      600 MHz - 1,200 MHz

Table 3.2: Testbed parameters considered

The cpu_speed parameter ranges over 16 possible values: the lower the value, the heavier the use of the CPU (and vice versa);

• encoding threads: the number of threads allocated for the encoding process, indicated as g_threads.

The choice of these specific encoder parameters has been made by analyzing the WebRTC encoder implementation, which sets them depending on the underlying hardware platform and on the video source resolution. In addition to the encoder parameters, the two CPU frequencies provided by DVFS have been taken into account. Regarding the encoding speed parameter, six possible values have been chosen, from -10 to -15, since -10 is the default one set by WebRTC when running on top of an ARM-powered platform; the negative values force the encoder to comply with the required speed, as recommended by the codec documentation. Recalling the preliminary tests on the CPU in Section 3.2.2, a maximum of two encoding threads has been chosen because, when using more than two concurrent threads, the power absorption due to the CPU increases sensibly while no substantial margin is obtained at either frequency. For the sake of simplicity, transmission parameters have not been evaluated: the WiFi interface does not seem to impact considerably the overall power consumption of the device since, as observed in the preliminary tests, it has a rather constant power absorption in the transmission rate range of interest; moreover, the PM option does not effectively change the energy demand of the transmitter. The parameters considered for the testbed are summarized in Table 3.2.
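As a quick numerical check of Equation 3.2.2, the following self-contained C++ fragment (illustrative only) prints the target CPU utilization implied by each encoding speed value considered in this work; note also that the chosen ranges yield 10 × 6 × 2 × 2 = 240 manual configurations, the number evaluated in Chapter 4.

#include <cstdio>

int main() {
  // Eq. 3.2.2: target_cpu_utilization = 100 * (16 - cpu_speed) / 16 %.
  for (int cpu_speed = 10; cpu_speed <= 15; ++cpu_speed) {
    double utilization = 100.0 * (16 - cpu_speed) / 16.0;
    std::printf("cpu_speed %d -> target CPU utilization %.2f%%\n",
                cpu_speed, utilization);  // 37.50% down to 6.25%
  }
  return 0;
}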


Figure 3.7: Experimental testing set-up overview

3.2.4 Testing setup

In the described testbed, the crowd run video sequence has been used. Its original resolution has been converted to 640x480 pixels, with a framerate of 30 fps; the aspect ratio is 4:3 and the pixel format is YUV420. This video has been chosen because of its highly dynamic scenes, captured from above, in order to mimic a video stream from a UAV, which is the reference test case of this thesis. The video codec used is VP8, i.e., the default one provided by WebRTC. Each test has been executed three times, with a duration of three minutes. A test consists in streaming the looped video sequence from the Raspberry Pi to the laptop. A pre-encoded video sequence is saved at the sender side, while the encoded one is saved at the receiver side. When a test is concluded, the pre-encoded video sequence is transferred, along with some statistical data in CSV format (e.g., source rate, fps), and stored on the laptop. Then, the stored video sequences can be processed to extract the video quality metrics (PSNR and SSIM, by means of the psnr_ssim_analyzer program provided by the WebRTC package). The overall testing procedure has been automated by custom scripts, which can be found in Appendix D. Moreover, some WebRTC features, like resolution scaling, CPU overuse detection, frame dropping, and any FEC mechanisms, have been disabled to avoid interference during the experiments. Figure 3.7 illustrates an overview of the full experimental set-up realized for testing.


Chapter 4

Performance evaluation

This chapter provides a performance evaluation of the results obtained with the proposed testbed. Section 4.1 briefly presents the performance metrics used in the analysis; afterwards, the automatic configuration tests are reported in Section 4.2, while the manual configuration tests are evaluated and discussed in Section 4.3.

4.1 Performance metrics

In order to evaluate the results of the experiments, with the objective of trading off video quality against power consumption, the following performance metrics have been chosen:

• framerate: the number of frames per second, characterizing the fluidity of the video;

• PSNR and SSIM: they evaluate the video quality, as fully described in Section 2.2;

• the average instantaneous power consumption.

In the next sections, the absolute values of the experimental results are provided. They are organized in Tables 4.1, 4.2, 4.3, and 4.4 as follows: the first column reports the reference configuration; the second the source rate; the third the CPU clock frequency; the fourth the encoding quality/speed level (cpu_speed); the fifth the number of threads used for the encoding process (g_threads); the sixth the average framerate of the video sequence; and the seventh the Quantization Parameters (QP), additional information indicating the quality of a frame in the range 1-56, where the higher the QP, the lower the frame quality (QP values are output by the encoder after each frame is encoded). The eighth column reports the PSNR and the ninth the SSIM, both evaluated as the average over all the frames composing the video sequence of the related test configuration. The tenth column reports the average instantaneous power consumption, which the eleventh column compares to the baseline in Table 3.1; in other words, the last column shows the additional power consumption, with respect to the baseline, due to running WebRTC on top of a Raspberry Pi. Moreover, Figures 4.1, 4.2, 4.3, and 4.4 show the performance level when WebRTC runs with default settings and when it is manually configured. The values in the charts are shown on a 1-100% scale, where 100% is the highest value obtained in the experimental results.
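As a reading aid, the "additional power consumption" column is consistent with the relative overhead over the matching baseline row of Table 3.1 (1,436 mW at 1,200 MHz; 1,232 mW at 600 MHz); the formula below is a reconstruction of that computation, not stated explicitly alongside the original tables:

$$\Delta P = \frac{\bar{P}_{\mathrm{avg}} - \bar{P}_{\mathrm{baseline}}}{\bar{P}_{\mathrm{baseline}}} \cdot 100\%, \qquad \text{e.g. } \frac{3{,}207 - 1{,}436}{1{,}436} \approx +123\%.$$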

4.2 Automatic mode

In this section, the performance of WebRTC in automatic mode (i.e., when default settings are used) is evaluated. Table 4.1 shows the absolute values of the automatic configurations. As can be noticed, the values of cpu_speed and g_threads are fixed: the former has been discussed in Section 3.2.3, while the latter is based on the video source resolution and on the CPU resources (the number of available cores). Regarding the CPU frequency, during the testing the OS default CPUFreq governor1 policy has been used. It is named ondemand, and it scales the CPU clock frequency dynamically according to the current load. However, thanks to the cpufrequtils package, it has been possible to verify that the highest CPU frequency was set while WebRTC was running, so this value is reported in Table 4.1.

1 https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt


Cfg  Source rate  CPU freq.  Encoding speed  Encoding threads  Avg  QP  PSNR  SSIM   Avg power         Additional power
     [Kbit/s]     [MHz]      [cpu_speed]     [g_threads]       fps      [dB]         consumption [mW]  consumption
1    170          1,200      -10             1                 9    56  27.9  0.798  3,207             +123%
2    340          1,200      -10             1                 3    48  30.3  0.864  3,321             +131%
3    510          1,200      -10             1                 2    38  33.4  0.920  3,348             +133%
4    680          1,200      -10             1                 2    29  35.8  0.948  3,388             +136%
5    850          1,200      -10             1                 2    21  37.5  0.962  3,408             +137%
6    1,020        1,200      -10             1                 2    16  39.0  0.970  3,416             +138%
7    1,190        1,200      -10             1                 2    12  40.3  0.976  3,431             +139%
8    1,359        1,200      -10             1                 2    10  41.5  0.980  3,455             +141%
9    1,529        1,200      -10             1                 2    7   42.6  0.983  3,445             +140%
10   1,698        1,200      -10             1                 2    6   43.5  0.986  3,450             +140%

Table 4.1: Automatic mode configurations

For the following tests, only the source rate has been set as an input parameter, obtaining a total of 10 configurations. Figure 4.1 shows the performance results for the WebRTC framework in automatic mode. Clearly, the main goal is video quality, at the cost of average framerate and power consumption: the default value of the cpu_speed parameter forces the encoder to keep a reasonable quality for each frame, leading to a costly computational effort. As a consequence, the best average framerate is 9 fps at 170 Kbit/s, i.e., when the source rate is considerably low. On the other hand, the PSNR and SSIM values are acceptable for all configurations. Even though the CPU frequency is set at the maximum value, the limited performance is also due to the computing capabilities of the Raspberry Pi. With default settings, the power consumption is unavoidably high, so energy-efficiency cannot be considered a primary optimization objective.

4.3 Manual mode

In this section, the WebRTC framework is set with manual settings and its performance is reported. The input parameters taken into account, detailed in Section 3.2.3, give a total number of 240 configurations. In the following tests, the CPU frequency has been manually imposed by relying on the userspace CPUFreq governor policy (by means of the testing scripts). Two performance profiles have been extracted: high framerate and high quality.


Figure 4.1: Automatic mode performance

Cfg  Source rate  CPU freq.  Encoding speed  Encoding threads  Avg  QP  PSNR  SSIM   Avg power         Additional power
     [Kbit/s]     [MHz]      [cpu_speed]     [g_threads]       fps      [dB]         consumption [mW]  consumption
1    170          600        11              2                 12   56  27.9  0.794  2,359             +92%
2    170          600        12              2                 14   56  27.8  0.792  2,320             +88%
3    170          600        13              2                 14   56  27.8  0.793  2,320             +88%
4    170          600        14              2                 19   56  28.0  0.798  2,286             +86%
5    170          600        15              2                 19   56  28.1  0.798  2,260             +83%

Table 4.2: Manual mode configurations (high framerate with min. CPU and max. encoding threads)


Figure 4.2: Manual mode performance (high framerate with min. CPU and max. encoding threads)

In the first profile, the configurations providing a fluid framerate are evaluated. They have been selected by comparing their power consumption and framerate against the best configuration in automatic mode. The absolute values of reference are shown in Tables 4.2 and 4.3. Figure 4.2 shows the configurations when the CPU frequency is kept at the minimum and the number of encoding threads (g_threads) at the maximum. Conversely, Figure 4.3 shows the configurations when the CPU frequency is kept at the maximum and the number of encoding threads at the minimum. These configurations share the same objective and point out that CPU frequency and encoding threads are inversely proportional. In general, to reach a good fluidity the video quality must be sacrificed. This is because a low source rate must be chosen and, thus,


Cfg  Source rate  CPU freq.  Encoding speed  Encoding threads  Avg  QP  PSNR  SSIM   Avg power         Additional power
     [Kbit/s]     [MHz]      [cpu_speed]     [g_threads]       fps      [dB]         consumption [mW]  consumption
1    170          1,200      11              1                 14   56  27.9  0.796  3,076             +114%
2    170          1,200      12              1                 16   56  27.9  0.795  3,051             +112%
3    170          1,200      13              1                 16   56  27.8  0.794  3,044             +112%
4    170          1,200      14              1                 19   56  28.0  0.797  2,976             +107%
5    170          1,200      15              1                 20   56  28.0  0.797  2,960             +106%

Table 4.3: Manual mode configurations (high framerate with max. CPU and min. encoding threads)

Figure 4.3: Manual mode performance (high framerate with max. CPU and min. encoding threads)


Cfg  Source rate  CPU freq.  Encoding speed  Encoding threads  Avg  QP  PSNR  SSIM   Avg power         Additional power
     [Kbit/s]     [MHz]      [cpu_speed]     [g_threads]       fps      [dB]         consumption [mW]  consumption
1    1,178        600        13              1                 1    3   46.2  0.991  2,325             +89%
2    1,187        600        10              1                 1    2   46.5  0.992  2,314             +88%
3    1,350        600        13              1                 1    2   46.1  0.991  2,346             +90%
4    1,357        600        10              1                 1    2   46.4  0.989  2,376             +93%
5    1,516        600        13              1                 1    2   46.1  0.991  2,350             +91%
6    1,529        600        10              1                 1    2   46.6  0.992  2,331             +89%
7    1,683        600        15              1                 1    2   46.0  0.990  2,375             +93%
8    1,686        600        11              1                 1    2   46.1  0.988  2,339             +90%
9    1,691        600        10              1                 1    2   46.2  0.988  2,362             +92%
10   1,700        600        14              1                 1    2   46.2  0.991  2,375             +93%

Table 4.4: Manual mode configurations (high quality)

low/medium values of PSNR and SSIM are obtained. Moreover, the more cpu_speed is increased, the higher the framerate, since higher values force the encoder to process the frames faster, at the cost of quality. The power consumption is comparable to the automatic mode when configurations with the highest CPU frequency are chosen; however, there is no advantage in using the maximum CPU frequency, since no performance gain is observed. To summarize, the high framerate profile should be used when a fluid video is preferred over the overall video quality.

Regarding the second profile, the highest video quality is evaluated. In this case, the related configurations have been selected by comparing their power consumption and PSNR/SSIM against the best configuration in automatic mode. Table 4.4 reports the absolute values of reference, while the corresponding performance is shown in Figure 4.4. The PSNR and SSIM reach peak values, but at the cost of a very low framerate. It is straightforward to realize that such a video quality is obtained when the source rate is high, so that the encoder can take advantage of it when a frame is processed. Moreover, the CPU frequency can be kept at the minimum, and only one encoding thread is required. The encoding quality is mostly set around the default value and, even when it is higher, the impact on the video quality is negligible. As can be observed, the power consumption is not affected by the increasing source rate; thus, the maximum value can be chosen without additional cost. Finally, the high quality profile should be used in particular scenarios where high image quality is preferred despite a scarce framerate.


Figure 4.4: Manual mode performance (high quality)

Source rate  Avg power consumption  Avg power consumption  Gain
[Kbit/s]     AUTOMATIC [mW]         MANUAL [mW]
170          3,207                  2,263                  42%
340          3,321                  2,429                  37%
510          3,348                  2,468                  36%
680          3,388                  2,472                  37%
850          3,408                  2,501                  36%
1,020        3,416                  2,485                  37%
1,190        3,431                  2,490                  38%
1,360        3,455                  2,509                  38%
1,530        3,445                  2,511                  37%
1,700        3,450                  2,518                  37%

Table 4.5: Automatic mode vs. manual mode power consumption comparison


Figure 4.5: Automatic mode vs. Manual mode performance comparison

To show the potential energy saving achievable with WebRTC, the automatic mode and the manual mode have been compared. For each source rate, the best manual configurations for energy-efficiency have been selected and compared to the automatic ones with comparable framerate and PSNR/SSIM. The absolute values considered can be found in Table 4.5, while Figure 4.5 depicts the comparison of the configurations. By using specific computational settings, it is possible to consume, on average, 37% less power.


Chapter 5

Conclusions and Future Works

In this master thesis, WebRTC has been tested in a real-time video streaming scenario involving the use of constrained platforms. The Raspberry Pi 3, a low-cost and constrained platform, has been chosen as the host device to evaluate the power consumption of the framework. An extensive testbed has been designed and run, targeting the use case of a video stream over a WebRTC session transmitted from a UAV to a fixed receiver via a WiFi connection. A custom power meter has been realized with open-source tools, in order to obtain a reasonably accurate energy profile. The main goal of the testbed is the evaluation of the energy-efficiency of WebRTC in the aforementioned scenario, assessing the achievable video quality while targeting low power consumption. Towards this objective, after some preliminary tests, comprehensive experiments have been conducted to compare the achievable performance in automatic mode (default settings) and in manual mode (custom settings). The results show that the default settings of WebRTC privilege video quality over energy-efficiency. By moving from default settings to custom ones, the power consumption can be reduced at the expense of video quality/fluidity. The main outcome of this work is the energy profiling of WebRTC on a Raspberry Pi 3 board, which allows selecting different working profiles to target different requirements in different application scenarios, where low-power requirements can be privileged over video quality (or vice versa).


To further investigate and extend this thesis, some future works are suggested. Firstly, since the proposed experimental set-up has been tested in a controlled environment, testing on board a UAV is necessary to better characterize the impact of a realistic communication channel. Secondly, the majority of constrained platforms on the market have dedicated hardware for encoding/decoding purposes. The Raspberry Pi 3 has the ability to process the H.264/MPEG-4 codec by using the on-board media processor1. However, WebRTC does not yet provide a hardware implementation for any of the supported video codecs. Therefore, introducing hardware acceleration for the encoding process would surely provide benefits to the video quality/fluidity, while the overall power consumption would probably have to be re-evaluated. Lastly, the ongoing diffusion of battery-powered platforms suggests a new approach to software development: an energy-aware development cycle [17] should be taken into account to introduce new features with the objective of improving the energy-efficiency of the WebRTC framework.

1 https://docs.broadcom.com/docs-and-downloads/docs/support/videocore/VideoCoreIV-AG100-R.pdf


Appendix A

In this Appendix, the most relevant parts of the developed client/server application and of the edited source code of the WebRTC framework are listed. The sender and receiver client implementations can be found in the src/webrtc/thesis folder. They are very similar to the client sample application, but converted into command-line programs. The sender expects the following input parameters, declared in pc_headless_sender/params.h:

struct test_params {
  static std::string server;
  static int port;
  static bool autoconnect;
  static bool autocall;
  static std::string reference_file_name;
  static std::string test_file_name;
  static std::string stats_file_name;
  static int cpu_speed;
  static unsigned int g_threads;
  static bool denoisingOn;
  static bool frameDroppingOn;
  static uint32_t bitrate;
};

On the other side, the receiver expects the input parameters declared in pc_headless_receiver/params.h:

struct test_params {
  static std::string server;
  static int port;
  static bool autoconnect;
  static bool autocall;
  static std::string test_file_name;
};

The sender-to-receiver peer connection is constrained to offer a video stream only, as implemented in pc_headless_sender/conductor.cc:

void Conductor::ConnectToPeer(int peer_id) {
  RTC_DCHECK(peer_id_ == -1);
  RTC_DCHECK(peer_id != -1);

  if (peer_connection_.get()) {
    std::cout << "Error, we only support connecting to one peer at a time" << "\n";
    return;
  }

  webrtc::FakeConstraints constraints3;
  if (InitializePeerConnection()) {
    peer_id_ = peer_id;
    constraints3.AddMandatory(webrtc::MediaConstraintsInterface::kOfferToReceiveVideo,
                              "true");
    constraints3.AddMandatory(webrtc::MediaConstraintsInterface::kOfferToReceiveAudio,
                              "false");
    peer_connection_->CreateOffer(this, &constraints3);
  } else {
    std::cout << "Error, failed to initialize PeerConnection" << "\n";
  }
}

In turn, the video stream is constrained to the desired frame resolution in the AddStream() function:

webrtc::FakeConstraints constraints2;
constraints2.AddMandatory(webrtc::MediaConstraintsInterface::kMinWidth, "640");
constraints2.AddMandatory(webrtc::MediaConstraintsInterface::kMaxWidth, "640");
constraints2.AddMandatory(webrtc::MediaConstraintsInterface::kMinHeight, "480");
constraints2.AddMandatory(webrtc::MediaConstraintsInterface::kMaxHeight, "480");

The server used to enable the connection between the two peers is the one provided by the server example.

The WebRTC Native C++ API (M57 revision) has been edited to allow custom test configurations. In the webrtcvideoengine2.cc file, the function ConfigureVideoEncoderSettings(const VideoCodec& codec) disables some optimization features in order to avoid any interference with the testing:

if (CodecNamesEq(codec.name, kVp8CodecName)) {
  webrtc::VideoCodecVP8 vp8_settings =
      webrtc::VideoEncoder::GetDefaultVp8Settings();
  vp8_settings.automaticResizeOn = false;
  LOG(LS_INFO) << __FUNCTION__ << " automaticResizeOn: "
               << vp8_settings.automaticResizeOn;
  // VP8 denoising is enabled by default.
  vp8_settings.denoisingOn = test_params::denoisingOn;
  LOG(LS_INFO) << __FUNCTION__ << " denoisingOn: "
               << vp8_settings.denoisingOn;
  LOG(LS_TEST) << __FUNCTION__ << " denoisingOn: "
               << std::boolalpha << vp8_settings.denoisingOn;
  vp8_settings.frameDroppingOn = test_params::frameDroppingOn;
  LOG(LS_INFO) << __FUNCTION__ << " frameDroppingOn: "
               << vp8_settings.frameDroppingOn;
  LOG(LS_TEST) << __FUNCTION__ << " frameDroppingOn: "
               << std::boolalpha << vp8_settings.frameDroppingOn;
  return new rtc::RefCountedObject<
      webrtc::VideoEncoderConfig::Vp8EncoderSpecificSettings>(vp8_settings);
}

The function SetVideoSend(uint32_t ssrc, bool enable, const VideoOptions* options, rtc::VideoSourceInterface<webrtc::VideoFrame>* source) implements a dummy video source, which is the chosen input file for testing:

if (test_params::reference_filename != "") {
  frame_generator =
      webrtc::test::FrameGeneratorCapturer::CreateFromYuvFile(
          test_params::reference_filename,
          640, 480, 30, webrtc::Clock::GetRealTimeClock());
  frame_generator->Start();
  return kv->second->SetVideoSend(enable, options, frame_generator);
}
else
  return kv->second->SetVideoSend(enable, options, source);

In the bitrate_allocator.cc file, the function OnNetworkChanged(uint32_t target_bitrate_bps, uint8_t fraction_loss, int64_t rtt, int64_t probing_interval_ms) uses the bitrate given as an input parameter in order to impose the selected source rate on the Video Encoder module:

uint32_t input_target_bitrate_bps =
    static_cast<uint32_t>(test_params::bitrate);
if (last_bitrate_bps_ == 0)
  LOG(LS_TEST) << __FUNCTION__
               << " bitrate: " << input_target_bitrate_bps;
last_bitrate_bps_ = input_target_bitrate_bps;
last_non_zero_bitrate_bps_ =
    input_target_bitrate_bps > 0 ? input_target_bitrate_bps
                                 : last_non_zero_bitrate_bps_;
last_fraction_loss_ = fraction_loss;
last_rtt_ = rtt;
last_probing_interval_ms_ = probing_interval_ms;

// Periodically log the incoming BWE.
int64_t now = clock_->TimeInMilliseconds();
if (now > last_bwe_log_time_ + kBweLogIntervalMs) {
  LOG(LS_INFO) << "Current BWE " << test_params::bitrate;
  last_bwe_log_time_ = now;
}

ObserverAllocation allocation = AllocateBitrates(input_target_bitrate_bps);

Finally, in the file vp8_impl.cc the class FileRenderPassthroughSenderInternal implements the method used in both clients to generate a YUV file from raw frames:

class FileRenderPassthroughSenderInternal {
 public:
  FileRenderPassthroughSenderInternal()
      : file_(nullptr),
        last_width_(0),
        last_height_(0) {}

  ~FileRenderPassthroughSenderInternal() {
    if (file_) {
      fclose(file_);
    }
  }

  void FrameOnYuvInternal(const webrtc::VideoFrame& video_frame) {
    rec_stop_ms_internal = rtc::TimeMillis();
    rec_ms_internal = rtc::TimeDiff(rec_stop_ms_internal, rec_start_ms_internal);
    if (rec_ms_internal >= rec_max_ms_internal && file_ && stats_file) {
      LOG(LS_INFO) << __FUNCTION__ << " rec_ms_internal: " << rec_ms_internal;
      completed_internal = true;
      fclose(file_);
      fclose(stats_file);
    }
    else {
      if (last_width_ != video_frame.width() ||
          last_height_ != video_frame.height()) {
        file_ = fopen(test_params::test_filename.c_str(), "wb");
        LOG(LS_INFO) << __FUNCTION__ << " test_filename: "
                     << test_params::test_filename;
        if (!file_)
          LOG(LS_INFO) << __FUNCTION__ << test_params::test_filename << " error";
        rec_start_ms_internal = rtc::TimeMillis();
      }
      last_width_ = video_frame.width();
      last_height_ = video_frame.height();
      PrintVideoFrame(video_frame, file_);
    }
  }

  FILE* file_;
  int last_width_;
  int last_height_;
};

The function InitEncode(const VideoCodec* inst, int number_of_cores, size_t /*maxPayloadSize*/) sets up the encoder with the sender input parameters in order to test the selected custom configuration:

cpu_speed_[0] = test_params::cpu_speed;
LOG(LS_INFO) << __FUNCTION__ << " cpu_speed: " << cpu_speed_[0];

and

configurations_[0].g_threads = test_params::g_threads;
LOG(LS_INFO) << __FUNCTION__ << " g_threads: " << configurations_[0].g_threads;

At the sender side, the function Encode(const VideoFrame& frame, const CodecSpecificInfo* codec_specific_info, const std::vector<FrameType>* frame_types) implements the call to the FrameOnYuvInternal(const webrtc::VideoFrame& video_frame) method before the frame is encoded:

if (frame_count_test_internal >= 51 && !completed_internal) {
  file_passthrough_sender_internal.FrameOnYuvInternal(frame);
}

Moreover, in the function GetEncodedPartitions(const VideoFrame& input_image, bool only_predicting_from_key_frame), statistical data are updated after the frame is encoded:


frame_stats = input_image.ntp_time_ms();
qparams_stats = qp;
if (frame_count_test_internal >= 51 && !completed_internal) {
  fprintf(stats_file, "%llu,%u,%d,%d\n", frame_stats, bitrate_stats,
          framerate_stats, qparams_stats);
}
frame_count_test_internal++;

At the receiver side, the function ReturnFrame(const vpx_image_t* img, uint32_t timestamp, int64_t ntp_time_ms) implements the call to the FrameOnYuvInternal(const webrtc::VideoFrame& video_frame) method after the frame is decoded:

if (frame_count_test_internal >= 51 && !completed_internal) {
  file_passthrough_receiver_internal.FrameOnYuvInternal(decoded_image);
}
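Since the sender dumps each raw frame just before encoding and the receiver dumps the matching frame right after decoding, the two YUV files can be compared offline frame by frame. The following is a minimal sketch of how the per-frame PSNR on the luma plane could be computed from the two dumps; the file names are placeholders, and perfect frame alignment between the two files is assumed:

// Minimal sketch: per-frame PSNR on the Y plane of two aligned I420 dumps.
// File names and the 640x480 geometry are assumptions matching the tests.
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
  const int kWidth = 640, kHeight = 480;
  const size_t kYSize = static_cast<size_t>(kWidth) * kHeight;  // luma samples
  const size_t kFrameSize = kYSize * 3 / 2;                     // full I420 frame
  std::FILE* ref = std::fopen("sender.yuv", "rb");    // raw frames before encoding
  std::FILE* rec = std::fopen("receiver.yuv", "rb");  // frames after decoding
  if (!ref || !rec) return 1;
  std::vector<unsigned char> a(kFrameSize), b(kFrameSize);
  int frame = 0;
  while (std::fread(a.data(), 1, kFrameSize, ref) == kFrameSize &&
         std::fread(b.data(), 1, kFrameSize, rec) == kFrameSize) {
    double mse = 0.0;
    for (size_t i = 0; i < kYSize; ++i) {
      const double d = static_cast<double>(a[i]) - b[i];
      mse += d * d;
    }
    mse /= kYSize;
    // PSNR = 10 log10(MAX^2 / MSE), with MAX = 255 for 8-bit samples.
    const double psnr =
        (mse == 0.0) ? 99.0 : 10.0 * std::log10(255.0 * 255.0 / mse);
    std::printf("frame %d: PSNR-Y = %.2f dB\n", frame++, psnr);
  }
  std::fclose(ref);
  std::fclose(rec);
  return 0;
}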


Appendix B

In this Appendix, the custom power meter used to measure the voltage, current, and power absorbed by the Raspberry Pi (or similar devices) is detailed. Its main components are an Arduino Uno and the INA219 high-side DC current sensor breakout. The full list of parts follows:

• Arduino Uno;

• Arduino Ethernet Shield 2;

• INA219 high-side DC current sensor breakout;
• 16x2 LCD display;

• Breadboard and hook up wires.

The wiring connections, shown in Figure 3.2, are made as follows:

• Arduino 5V to INA219 VCC, LCD VDD, LCD LED+ through a 220 Ohm resistor;

• Arduino GND to INA219 GND, LCD VSS, LCD LED-, LCD R/W;
• Arduino I2C SCL to INA219 SCL;

• Arduino I2C SDA to INA219 SDA;
• Arduino digital pin 8 to LCD E;
• Arduino digital pin 9 to LCD RS;


• Arduino digital pin 2 to LCD DB7;
• Arduino digital pin 3 to LCD DB6;
• Arduino digital pin 5 to LCD DB4;
• Arduino digital pin 6 to LCD DB5;
• LCD V0 to LCD VSS;

• Device power adapter VCC to INA219 Vin+;
• INA219 Vin- to Device power plug VCC;
• Device power adapter GND to Arduino GND;
• Device power plug GND to Arduino GND.

The sketch (an Arduino program) can be developed according to specific needs by means of the Arduino IDE. Depending on the use case, some components may be unnecessary: for example, the Arduino Ethernet Shield 2, if connecting the board to a LAN is not needed, or the LCD display, if visual interaction with the running sketch is not necessary. The ArduINA power meter is illustrated in Figure B.1.


Appendix C

In this Appendix, the sketch developed for the custom power meter is listed. The sketch has been designed specifically for the proposed testbed, but it can be modified to suit other use cases.

#include <Wire.h>
#include <INA219.h>
#include "SdFat.h"
#include "SdFatUtil.h"
#include "FreeStack.h"
#include <Ethernet2.h>
#include <SPI.h>
#include <LiquidCrystal.h>

/* *********** INA219 STUFF *********** */
INA219 ina219;
LiquidCrystal lcd(9, 8, 5, 6, 3, 2);
unsigned long loggingInterval = 1000;
unsigned long lastReadingTime = 0;
unsigned long loggingTime = 0;
unsigned long startMillis = 0;
boolean startStopLogger = false;
boolean inputRequested = false;
char* logFilename;

/* *********** ETHERNET STUFF *********** */
byte mac[] = { 0x90, 0xA2, 0xDA, 0x10, 0xF1, 0x5A };
IPAddress ip(192, 168, 0, 5);
IPAddress subnet(255, 255, 255, 0);
IPAddress gateway(192, 168, 0, 2);
IPAddress dnServer(8, 8, 8, 8);
EthernetServer server(80);

/* *********** SDCARD STUFF *********** */
SdFat sd;
SdFile root;
SdFile file;

void setup() {
  Serial.begin(57600);
  pinMode(10, OUTPUT);
  digitalWrite(10, HIGH);
  ina219.begin();
  lcd.begin(16, 2);
  lcd.print(F("logger ready"));

  Ethernet.begin(mac, ip, dnServer, gateway, subnet);
  server.begin();
}

void getPower(unsigned long lastReadingTime) {
  // retry for up to 500 ms waiting for a ready conversion
  int count = 0;
  float busvoltage = ina219.busVoltage();

  // waits for conversion ready
  while (!ina219.ready() && count < 500) {
    count++;
    delay(1);
    busvoltage = ina219.busVoltage();
  }

  float shuntvoltage = ina219.shuntVoltage();
  float current_A = ina219.shuntCurrent();
  float power_mW2 = ina219.busPower() * 1000;
  float loadvoltage = busvoltage + shuntvoltage;
  float power_mW1 = (loadvoltage * current_A) * 1000;
  current_A = current_A * 1000;

  if (file.isOpen()) {
    file.print((String)lastReadingTime);
    file.print(",");
    file.print(loadvoltage, 3);
    file.print(",");
    file.print(current_A, 3);
    file.print(",");
    file.print(power_mW1, 3);
    file.print(",");
    file.println(power_mW2, 3);
  }
  else {
    Serial.println(F("error writing"));
  }
  return;
}

#define BUFSIZE 50

void loop() {
  char clientline[BUFSIZE];
  int index = 0;
  EthernetClient client = server.available();

  if (startStopLogger) {
    unsigned long currentMillis = millis();

    if ((unsigned long)(currentMillis - lastReadingTime) >= loggingInterval) {
      lastReadingTime = millis();
      getPower(lastReadingTime);

      // force data to SD and update the directory entry to avoid data loss
      if (!file.sync() || file.getWriteError()) {
        Serial.println(F("write error"));
      }
    }
    else if ((unsigned long)(currentMillis - startMillis) > loggingTime) {
      file.close();
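Each call to getPower() appends one CSV row of the form timestamp,load_voltage_V,current_mA,power_mW1,power_mW2 to the log file on the SD card. As a minimal offline sketch (the log file name is a placeholder), the average power absorbed during a run could be computed from such a log as follows:

// Minimal sketch: average power over a run from the ArduINA log
// (power_mW1 column); the log file name is a placeholder.
#include <cstdio>

int main() {
  std::FILE* log = std::fopen("power.csv", "r");  // hypothetical log name
  if (!log) return 1;
  unsigned long ts;
  float v, i_mA, p1, p2;
  double sum = 0.0;
  long n = 0;
  // each row: timestamp,load_voltage_V,current_mA,power_mW1,power_mW2
  while (std::fscanf(log, "%lu,%f,%f,%f,%f", &ts, &v, &i_mA, &p1, &p2) == 5) {
    sum += p1;
    ++n;
  }
  std::fclose(log);
  if (n > 0)
    std::printf("average power: %.1f mW over %ld samples\n", sum / n, n);
  return 0;
}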
