High Performance Audio Streaming for Bluetooth Low Energy Applications

Author: Diego A. Parra Guzman
Supervisors: Dr. Marco Danelutto, Dr. Roberto Sannino

Thesis submitted for the degree of MSc in Computer Science and Networking
Department of Computer Science and Networking
University of Pisa and SSSUP Sant'Anna

High Performance Audio Streaming for Bluetooth Low Energy Applications
Diego Alejandro Parra Guzman
Submitted for the degree of MSc in Computer Science and Networking
2017
Abstract
Supported by the meaningful impact of several computer science branches, such as software service engineering, large-scale data processing, and digital communications, as well as the influence of high-performance computing on embedded systems, modern societies are able to be connected all the time and everywhere: entities known as clients request and process data provided by servers that encapsulate the data as services and stream it, commonly making use of middle- or large-range communication networks.
As a result, there is nowadays an increasing demand for sensing and processing data without human intervention, establishing an interconnection between physical objects through the internet. This paradigm, recently called the Internet of Things, or simply IoT, is full of challenges, principally in the case of multimedia wireless devices, which are affected by the common wireless vulnerabilities, protocol standardization issues, energy consumption, and hardware and software limitations.
Two important issues associated with multimedia wireless embedded systems have been considered in this thesis. (1) Exploring the low-level DSP and SIMD mechanisms of common middle-class hardware architectures to improve the audio stream quality for constrained devices based on the ARMv7E-M architecture specification; in this thesis we used the versatile OPUS audio codec to compress the audio frames. (2) Subsequently, we investigated efficient audio frame transmission regarding real-time performance, scheduling, and robustness, to achieve group synchronization for Bluetooth Low Energy (BLE) applications, a novel solution with respect to the patterns reported in the literature.
Acknowledgements

The author would especially like to thank his original supervisors, Dr. Marco Danelutto and Dr. Matteo Petracca, and, not least important, his co-supervisor Dr. Roberto Sannino, for the commitment, support, and dedication that have inspired me to research in these fields. I am grateful for the assistance and support received from my colleagues in the Audio and Sensor Platforms research group. Special thanks go to Marco Santacaterinna, who followed my work and whose great ideas made it possible to complete this thesis in the best way possible. Thanks also to STMicroelectronics for allowing me to work each day in a comfortable environment and with the necessary resources.
Finally, I would like to express my great appreciation to my family and other friends not mentioned here, who continued to provide support and encouragement throughout my master's degree in computer science and networking.
Contents
Abstract iii
Acknowledgements iv
1 Introduction 1
1.1 Context . . . 2
1.1.1 IoT trends, principal aspects, and challenges . . . 2
1.1.2 Multimedia wireless sensor networks challenges and trends . . . 5
1.2 Contribution of This Thesis . . . 10
1.3 Thesis Structure . . . 11
2 Background 12
2.1 Audio coding . . . 12
2.1.1 Lossless compression of digital audio . . . 13
2.1.2 Lossy compression of digital audio . . . 21
2.2 Audio Coding Attributes . . . 21
2.2.1 Rate . . . 22
2.2.2 Quality . . . 22
2.2.3 Robustness and Reliability . . . 23
2.2.4 Delay . . . 23
2.2.5 Computation and Memory Requirements . . . 23
2.3 Embedded Systems: Programming Models For Real-Time Multimedia Applications . . . 24
2.3.1 Real time operating systems . . . 25
2.3.2 Event Driven Model . . . 27
2.4 Synchronization techniques for real-time multimedia applications . . . 30
2.4.1 Clock model . . . 30
2.4.2 Clock synchronization concepts . . . 31
2.4.3 Basic clock synchronization techniques . . . 33
2.4.4 Synchronization in distributed multimedia presentation . . . . 35
3 Hardware Devices 40
3.1 Host Microcontroller . . . 40
3.1.1 Hardware Characteristics . . . 41
3.1.2 Application Level Architecture . . . 43
3.1.3 Instruction Set . . . 44
3.2 BLE SoC . . . 47
3.2.1 BLUENRG Hardware characteristics . . . 48
3.2.2 Application Level Architecture . . . 49
4 OPUS CODEC 51
4.1 SILK Speech Codec . . . 52
4.1.1 SILK Encoder . . . 52
4.1.2 SILK Decoder . . . 58
4.2 CELT Speech Codec . . . 59
4.2.1 CELT Encoder . . . 61
4.2.2 CELT Decoder . . . 65
4.3 OPUS: CELT and SILK Hybrid Codec . . . 67
4.4 OPUS Implementation and Optimization Points . . . 68
4.4.1 OPUS implementation Tree . . . 70
4.4.2 OPUS Optimization points . . . 71
5 OPUS Optimization Strategies and Experimental Results 72
5.1 OPUS Optimization Steps . . . 72
5.2 OPUS Optimization First Strategy . . . 73
5.3 OPUS Optimization Experiments and Results . . . 78
5.3.1 Scenario 1 Results: OPUS configured for music . . . 78
5.3.2 Scenario 2 Results: OPUS configured for voice . . . 83
5.3.3 Scenario 3 Results: OPUS configured for low delay . . . 87
5.4 OPUS Optimization Second Strategy . . . 91
5.4.1 Profile Framework Design and Implementation . . . 91
5.4.2 Profile Framework Simple Example . . . 94
5.4.3 Profile Framework Some Comments . . . 95
5.5 OPUS Profiling Results . . . 96
5.5.1 Encoder Results . . . 97
5.5.2 Decoder Results . . . 100
5.6 OPUS Optimization Final Comments . . . 100
6 Bluetooth low energy (BLE) network architecture 102
6.1 BLE Protocol Stack . . . 102
6.1.2 Link Layer (LL) . . . 103
6.1.3 Host Controller Interface (HCI) . . . 107
6.1.4 Logical Link Control and Adaptation Protocol (L2CAP) . . . 108
6.1.5 Attribute Protocol (ATT) . . . 108
6.1.6 Security Management (SM) . . . 109
6.1.7 Generic Attribute Profile (GATT) . . . 110
6.1.8 Generic Access Profile (GAP) . . . 111
6.1.9 BLE Stack Example . . . 112
6.2 BLE Point To Multi-point Communication Strategies . . . 113
6.2.1 BLE broadcaster/observer communication . . . 113
6.2.2 BLE scheduling oriented communication . . . 114
6.3 BLE Group Synchronization . . . 116
6.3.1 BLE Group Synchronization broadcaster/observer Approach . . . 116
6.3.2 BLE Group Synchronization scheduling Approach . . . 117
7 BLE Group Synchronization Protocols and Results 119
7.1 How to Achieve Synchronization in BLE? . . . 120
7.1.1 The Common Notion of Time . . . 120
7.1.2 Synchronization Control Mechanisms . . . 123
7.2 Group Synchronization Implementation . . . 126
7.2.1 BLE Multinode Connection Platform . . . 126
7.2.2 BLE Clock Synchronization Service . . . 127
7.2.3 BLE Control Synchronization Service . . . 130
7.2.4 OPUS and BLE Group Synchronization Complete Solution . . 130
7.3 BLE Group Synchronization Experiments and Results . . . 132
7.3.1 Scenario 1: Clock Synchronization Service . . . 132
7.3.2 Scenario 2: Control Synchronization Service . . . 138
7.4 Group Synchronization Some Comments . . . 140
8 Conclusions 141
Bibliography 143
Appendix 153
A OPUS Optimization 153
A.1 OPUS Source Code Generic Operations . . . 153
A.2 OPUS Optimized Source Code . . . 159
A.3 Profile Framework Source Code . . . 175
B BLE Group Synchronization Source Code 181
B.1 BLE Multinode Platform . . . 181
B.1.1 Application support . . . 181
B.1.2 Connection Handler . . . 186
B.1.3 Service Handler . . . 193
B.1.4 EventHandler . . . 197
B.1.5 Network . . . 203
B.1.6 firmware typedef . . . 220
B.2 BLE Clock Synchronization Service . . . 224
B.2.1 PTP Core . . . 224
B.2.2 PTP Interrupt . . . 238
B.3 BLE Control Synchronization Service . . . 242
B.3.1 Control Synchronization client . . . 242
List of Figures
1.1 Research trends for IoT: some of the most critical topics are security, VLSI technology, connectivity, interoperability, power consumption, and quality of service . . . 3
1.2 Hardware design trends in MWSN, adapted from [14] . . . 6
1.3 Network communication subsystem trends: a) single SoC/NoC, b) multiple radio standard support SoC/NoC, c) network co-processor (NCP) . . . 8
2.1 Basic operations in most lossless compression algorithms: (1) framing and quantization of the input signal, then (2) decorrelation of the 1 to n audio channels, and finally (3) entropy coding of the processed signal. Adapted from [21] . . . 14
2.2 Illustrative representation of the framing operation: the audio signal is divided into 6 frames with 50% overlap between adjacent frame intervals. Adapted from [28] . . . 15
2.3 Vector quantization approach: a scalar input is grouped into L-dimensional vectors; quantization consists of associating the closest of M L-dimensional code-vectors minimizing the rate-distortion metric, whose index is forwarded to the decoder, which executes the inverse operation; both coder and decoder must share the same codebook definition. Adapted from [28] . . . 17
2.4 Intra-channel decorrelation, prediction based: the error e[n] is estimated as the difference between the original audio signal X[n] and an approximation signal X̂[n]. Adapted from [35] . . . 18
2.5 (a) FIR/IIR predictor model used by the DVD standard, adapted from [21]; (b) fitting curve and polynomial model used by AudioPaK [22] and SHORTEN [26], adapted from [19] . . . 19
2.6 Active objects system structure and components [109] . . . 27
2.7 Event-driven packet diagram illustrating the relationship among the real-time framework, the kernel/RTOS, and the application. Extracted from [109] . . . 28
2.8 Uni- and multi-directional synchronization: a node Nj determines the offset of its local clock relative to another node Ni, a) using unidirectional communication, b) using a single bidirectional communication, or c) allowing both nodes to measure a round-trip time. Extracted from [111] . . . 33
2.9 Delay representation of two real-time multimedia scenarios: (a) ideal scenario without network transmission errors; (b) representation of packet loss and re-connection effects for a typical real-time multimedia streaming application. Extracted from [115] . . . 35
2.10 (a) Intra-media synchronization: temporal order between the source and destination stream; (b) inter-media synchronization: temporal order between multiple streams at the receivers; (c) inter-destination synchronization: the presentation of the received stream by all receivers has to occur at approximately the same time. Extracted from [115] . . . 36
3.1 Three Cortex-M architectures organized in terms of hardware capabilities such as CPU, memory, buses, and port connections: (a) Cortex-M3, (b) Cortex-M4, (c) Cortex-M7. Extracted from [102] . . . 41
3.2 Three basic DSP operations: (a) SMUADX, (b) SMLADX, (c) SMLALX. Extracted from [102] . . . 47
3.3 BlueNRG architecture: (a) BlueNRG firmware blocks, (b) BlueNRG Application Controller Interface (ACI), (c) ACI framework . . . 48
4.1 SILK encoder block diagram. Extracted from [66] . . . 53
4.2 Shaping analysis results: the bitrate for encoding the excitation signal is proportional to the area between the de-emphasized spectrum (weighted input) and the quantization noise; without de-emphasis, the entropy is proportional to the area between the input spectrum and the quantization noise, which is clearly higher. Extracted from [66] . . . 55
4.3 Noise shaping quantization diagram: Fana(z) and Fsyn(z) are the analysis and synthesis noise shaping filters, while P(z) is the predictor containing both the LPC and LTP filters; the quantized excitation indices are represented by i(n), x(n) is the input signal received from the prefilter block, and y(n) is the quantized excitation output. Extracted from [69] . . . 58
4.4 The SILK decoder receives a range-encoded bitstream and decodes its parameters; these parameters (3, 4, 5) are used to generate the excitation signal, which is fed to an optional long-term prediction (LTP) filter and a short-term prediction (LPC) filter, producing the decoded signal (6). Extracted from [69] . . . 59
4.5 CELT codec architecture: the dotted lines represent the signalization data, while the solid lines represent the pure codified data. Adapted from [87] . . . 60
4.6 CELT: (a) low-overlap windows used in 5 ms frames as a reference; (b) CELT critical bands layout referenced to the Bark scale from 0 to 20 kHz. Adapted from [87] . . . 62
4.7 OPUS: hybrid architecture operating at 48 kHz. Adapted from [87] . . . 67
4.8 OPUS (core files): representation of the principal files within OPUS taken into account during the optimization . . . 69
5.1 Cortex-M three-stage pipeline execution: (a) ideal execution for arithmetic operations; (b) real scenario affected by the LOAD instruction. Extracted from [107] . . . 77
5.2 Extraction of the auto-correlation function within pitch_xcorr_arm-gnu2.S . . . 77
5.3 OPUS average clock cycles spent to encode: the left graph corresponds to the results of a generic OPUS encoder without any optimization, while the right graph shows the same results after applying the corresponding optimization techniques . . . 80
5.4 OPUS average clock cycles spent to decode: the left graph corresponds to the results of a generic OPUS decoder without any optimization, while the right graph shows the same results after applying the corresponding optimization techniques . . . 80
5.5 OPUS speedup: the left graph shows the speedup achieved at the encoder, the right graph the speedup achieved at the decoder . . . 81
5.6 OPUS differences between the encoder and decoder: the left graph shows the ratio of clock cycles spent by the encoder with respect to the decoder without any optimization, the right graph the results after applying the corresponding optimization techniques . . . 81
5.7 OPUS average clock cycles spent to encode: the left graph corresponds to the results of a generic OPUS encoder without any optimization, while the right graph shows the same results after applying the corresponding optimization techniques . . . 84
5.8 OPUS average clock cycles spent to decode: the left graph corresponds to the results of a generic OPUS decoder without any optimization, while the right graph shows the same results after applying the corresponding optimization techniques . . . 84
5.9 OPUS speedup: the left graph shows the speedup achieved at the encoder, the right graph the speedup achieved at the decoder . . . 85
5.10 OPUS differences between the encoder and decoder: the left graph shows the ratio of clock cycles spent by the encoder with respect to the decoder without any optimization, the right graph the results after applying the corresponding optimization techniques . . . 85
5.11 OPUS average clock cycles spent by the encoder: the left graph corresponds to the results of a generic OPUS encoder without any optimization, while the right graph shows the same results after applying the corresponding optimization techniques . . . 88
5.12 OPUS average clock cycles spent by the decoder: the left graph corresponds to the results of a generic OPUS decoder without any optimization, while the right graph shows the same results after applying the corresponding optimization techniques . . . 88
5.13 OPUS speedup: the left graph shows the speedup achieved at the encoder, the right graph the speedup achieved at the decoder . . . 89
5.14 OPUS differences between the encoder and decoder: the left graph shows the ratio of clock cycles spent by the encoder with respect to the decoder without any optimization, the right graph the results after applying the corresponding optimization techniques . . . 89
5.15 Profiling diagram representing the interaction between an external computer and the Cortex-M4 target, connected through a JTAG interface . . . 92
5.16 Profiling example including three functions OP1, OP2, OP3, marked to be profiled and called sequentially from main: the left figure represents the source code before compilation, while the right figure shows the same code with the instrumentation functions inserted at the beginning and at the end of each function . . . 94
5.17 Profiling example including three functions OP1, OP2, OP3, marked to be profiled and called sequentially from main: at run time the functions inserted by the compiler capture the number of cycles as well as the caller and callee addresses; the report also contains the number of times the called function is accessed. When program execution reaches the trace_end() function, an XML file is generated with the described information . . . 95
5.18 OPUS profiling results from the encoder side (MUSIC): the left graph represents the computation of OPUS operating in CELT mode, configured for MUSIC at complexity 4, while the right graph shows what is obtained at complexity 5. The pitch prediction is clearly included to improve the quality of the coder, causing an increment in the number of clock cycles . . . 98
5.19 OPUS profiling results from the encoder side (VOICE): the left graph represents the computation of CELT configured for VOICE at complexity 4, while the right graph shows what is obtained at complexity 5 . . . 98
5.20 OPUS profiling results from the encoder side (LOW DELAY): the left graph represents the computation of CELT configured for LOW DELAY at complexity 4, while the right graph shows what is obtained at complexity 5 . . . 99
5.21 OPUS profiling results from the decoder side (MUSIC): the left graph represents the computation of CELT configured for MUSIC at complexity 4, while the right graph shows what is obtained at complexity 5 . . . 100
6.1 Bluetooth Low Energy Network stack . . . 103
6.2 Bluetooth Low Energy Link layer status . . . 104
6.3 BLE network stack example . . . 112
6.4 Bluetooth LE broadcaster (advertising) and observer (scanning): (a) advertisement events depend on the advertisement interval, plus a random delay from 0 to 10 ms defined by the link layer; (b) data exchanges based on advertisement events, where the broadcaster must forward the packets on the three advertisement channels to increase the probability of packet reception, since the observer cannot acknowledge the reception. Extracted from [94] . . . 114
6.5 Bluetooth LE scheduling time slot allocation, inspired by the description referenced in [94] . . . 115
7.1 PTP protocol example: representation of a simple clock synchronization process between a pair of master and slave devices interchanging the messages (SYNC, FOLLOW UP, DELAY REQ, DELAY RESP), where at the end the client is able to correctly estimate the clock offset and the transmission delay, and therefore its adjustment error is close to zero . . . 122
7.2 BLE multiple connections example: three connections have been allocated; notice that once the third is allocated there is no chance for a new connection establishment, since there are no more free slots for that anchor period . . . 124
7.3 Connection interval information seen from the BLE master node: (a) connection interval and connection events for three connections following the round-robin fashion; notice that the maximum delay corresponds exactly to the last connection scheduled; (b) method to achieve group synchronization based on the connection parameters and the maximum delay, determined either statically or making use of the PTP and RTP/RTCP protocols . . . 125
7.4 Relationship between the application, the multinode platform, and the low-level libraries provided by the BLE chip . . . 128
7.5 BLE clock synchronization service and characteristics configuration . . . 129
7.6 Control synchronization service: (a) static approach in which the transmitter estimates the delay metric based on the pure connection interval period and length; for this case only one characteristic, TXctrl char, is needed; (b) alternative in which the transmitter estimates the delay metric employing the PTP clock synchronization protocol; two characteristics are needed in order to transmit RTCP reports in both directions . . . 131
7.7 OPUS and group synchronization implementation scheme: a representation of the three services OPUS, control synchronization (RTCP), and clock synchronization (PTP). These three services cooperate with each other and interact with the multinode platform to achieve group synchronization for voice over Bluetooth Low Energy . . . 132
7.8 Clock dispersion and drift in normal conditions with a sampling period of 200 ms and an observation window of 15 minutes: (a) clock dispersion and drift between the server (master clock) and both clients (slave clocks); (b) clock dispersion and drift between a pair of clients (slave clocks) . . . 134
7.9 Messages transmitted by the clock synchronization service . . . 135
7.10 Clock drift of two clients (slave clocks) with respect to a reference server (master clock) as a function of synchronization periods of 0.2 s, measured during 5 minutes . . . 135
7.11 Clock drift of two clients (slave clocks) as a function of synchronization periods of 0.2 s: (a) clock drift considering the events of the connection interval; (b) clock drift without considering the events of the connection interval . . . 136
7.12 Group synchronization results for random data transmission between one master and two slaves, reporting the transmission period, presentation delay, and presentation drift for one transmission period . . . 139
7.13 Group synchronization results for random data transmission between one master and two slaves, reporting the evolution of the presentation time drift during two synchronization periods . . . 139
List of Tables
3.1 Host Micro-controller Technical Characteristics . . . 42
3.2 SIMD instructions multiply and MAC 32bits words . . . 45
3.3 SIMD instructions multiply and MAC 16bits half-words: x,y could be combined as (B: lower half) (T: upper half) . . . 45
3.4 SIMD Multiple Load-Store Instructions: x,y could be combined as (IA: increment after) or (DB: decrement before) . . . 46
3.5 SIMD: Single Load and Store Instructions . . . 46
3.6 SIMD: Shift and bit-reverse operations . . . 46
4.1 SILK codec specification, extracted from [66] . . . 52
4.2 OPUS v1.2 Architectures and Characteristics . . . 69
4.3 OPUS: Architecture and implementation source files relationship . . . 70
5.1 OPUS Common MULT Fixed Point Operations . . . 74
5.2 OPUS Common bit manipulation operations . . . 75
5.3 CMSIS intrinsic functions . . . 76
5.4 CMSIS Specialized math Functions for ARMv7e . . . 76
5.5 OPUS Experiment’s Configuration . . . 78
5.6 OPUS Profiling Scenarios . . . 96
6.1 BLE Advertisement Parameters . . . 105
6.2 BLE Scanning Parameters . . . 106
6.3 BLE Connection Parameters . . . 107
6.4 BLE Attribute Definition . . . 109
6.5 BLE GAP Roles, Modes, and Procedures specification . . . 111
7.1 Clock Synchronization Service Test Parameters . . . 133
Chapter 1
Introduction
The fast growth of cloud computing research, the vital expansion of the high-performance microprocessor market, and the increasing use of short-range wireless personal area networks have made it possible to supply small devices with a certain intelligence. Thousands of these tiny devices can be sensed, controlled, and programmed directly or indirectly through well-defined network architectures [1].
Nowadays there are around 5 billion smart devices connected in IoT systems, such as thermostats, energy meters, lighting control systems, and music streaming and control systems. Most of those systems are connected through the internet, commonly using web services, or through smartphone applications. A prominent view of IoT expects an annual growth rate higher than 5.4% from 2014, up to 50 billion connected devices in 2020; this could motivate a demand increment in the embedded market from 152.4 billion to 198.5 billion in the next five years [2].
Many embedded architectures can produce a high level of performance running a few kilobytes of code, facilitated by the integration of an enhanced CPU equipped with independent co-processors for low-level operations, including for example floating-point units (FPUs), single-instruction multiple-data (SIMD) extensions, and interrupt priority units (IPUs). However, a particular group of small/middle architectures does not support those facilities at all, which raises significant challenges and requirements limiting the future vision of smart objects, especially in multimedia applications, where factors like real-time multimedia requirements, network constraints, and microprocessor performance are tremendously important [3].
This thesis addresses two of these factors. The first part concerns the low-level optimization of middle-class architectures for multimedia applications. Specifically, for efficient audio streaming we included the open-source OPUS codec, a high-quality audio codec that supports high-performance audio streaming as well as interactive speech. OPUS reports a notable enhancement over competitors such as G.722, G.722.1C, and G.711 regarding the relation between coding quality and bitrate [59].
The second part concerns the way the coded audio frames are streamed; this thesis offers a novel solution for networks based on Bluetooth Low Energy (BLE) technology, which is typically employed for single point-to-point communications (one master and one slave).
The idea is to extend the communication facilities of BLE from point-to-point to point-to-multipoint. Also, we make use of the precision time protocol (PTP), as well as a control structure inspired by the real-time protocols (RTP/RTCP), to achieve inter-destination media synchronization1 (IDMS) [108].
The rest of this chapter describes the context in which this thesis was developed, especially the relevant characteristics, challenges, and trends carried by the technologies involved. The contributions of the internet of things and multimedia sensor networks will then be mentioned.
1.1 Context
Currently, IoT is getting particular attention from several actors, involving issues associated particularly with embedded systems, sensor networks, and software services engineering. Figure 1.1 summarizes different research trends associated with IoT.
1.1.1 IoT trends, principal aspects, and challenges
Standardization
Naturally, each distributed IoT sensor needs to interact with its neighbors, perhaps to change its behavior, trigger an action, forward or receive information, or influence some other sensor. Nevertheless, vendors commonly force their products to share the same communication protocols and operating system. As a result, network interoperability is one of the most critical aspects affecting the growth of IoT; currently, the communication layers are inconsistent between different manufacturers [1].

1IDMS is the process of synchronizing playout across multiple media receivers.

Figure 1.1: Research trends for IoT: some of the most critical topics are security, VLSI technology, connectivity, interoperability, power consumption, and quality of service.
Some standards organizations, including IEEE and IETF, are putting significant investment and effort into ensuring interoperability between the different networking stacks. Consequently, standards like IPv6 over 6LoWPAN and the IEEE 802.15.4 MAC are becoming very popular, which has motivated several organizations, like Bluetooth, to integrate IPv6 (6LoWPAN) into their communication stacks. This will help to allow billions of devices to be connected and to exchange information through the standard internet protocol [4].
Interoperability
Considering the flexibility of modern IoT systems, integration between cloud platforms is crucial to facilitate cooperation between subsystems; for example, pressure and temperature sensors possibly associated with two different web services need to cooperate to apply an efficient regulation command to an actuator. As a consequence, companies like IFTTT.com or Zapier.com are working on web-service integration platforms. Following the "if this, then that" logic, such cloud platforms are able to support better interactions, including for example more sophisticated flow-control strategies or extensions of the current communication protocols [1].
Security
In the future, IoT sensors will be completely integrated into our lives, with many new nodes being added to dynamic network structures. This will lead to the emergence of malicious actors that might take advantage of the well-known wireless vulnerabilities and of the limitations associated with embedded system architectures, such as their limited memory, constrained middleware, and in some cases their computational overhead [5] [6]; notably, it will require a substantial re-engineering of those tiny devices.
Trusted computing [5] is a multi-layered security approach that reuses the results obtained so far in security and applies them to the entire IoT infrastructure. This approach explores concepts like secure booting, access control, device authentication, and deep packet inspection, as well as support for updates and patches. As an example, operating systems such as Wind River are deeply involved in securing devices that perform critical functions, typically in highly dangerous environments such as the nuclear and electrical industries [7].
Multilanguage validation and interpretation
Thanks to the increasing demand for new services and applications, the embedded system industry has been in continuous change and evolution, providing multimedia processors characterized by high processing capabilities. But even for such processors, the IoT perspective demands solid compatibility between different applications, particularly in the codecs used to stream audio or video; additionally, devices are expected to provide multilingual support, validation, and interpretation [5]. The most popular consumer electronics companies are putting effort into reducing this gap; applications like Apple Siri, Alexa, and Google Home are trivial examples. Unfortunately, these are far from a sustainable solution [8].
Energy consumption
For several years IoT infrastructures have been constrained by energy consumption, and recently several alternatives have been devised to provide suitable power management for IoT sensors. One of these alternatives explores various energy harvesting strategies, concentrated principally on guaranteeing portability: a sensor can be charged by drawing out energy present in the medium and converting it into electrical energy, taking advantage of, for example, vibration, pressure, light, temperature, and even radio frequency. The latter has motivated wireless charging to become one of the most popular solutions, especially in the automotive industry, which should provide a sustainable charging strategy. The wireless charging model even considers the possibility of designing battery-free devices able to work on demand during periodic intervals, picking the energy needed directly from the environment [9].
Additionally, power management research seeks ultra-low-power devices, principally during the transmission interval, implementing for example radio duty-cycle protocols; these protocols are responsible for ensuring a short period between the measurement and the actual transmission of the data. Short-range protocols like ZigBee, Bluetooth Low Energy, and IEEE 802.15.4 are popular and desirable, and they are taking a potential billion-dollar share of the wireless sensor network market [5].
1.1.2 Multimedia wireless sensor networks challenges and trends
A multimedia wireless sensor can be considered a distributed autonomous embedded system principally used to retrieve audio and video over wireless networks. Typically, the sensors incorporate low-power, inexpensive microphones and cameras [10]. Essential implementations require robust digital signal processing capabilities and multimedia source coding techniques [11], which represent a significant challenge during the implementation of multimedia wireless sensor networks (MWSN). As the devices increase their capabilities, more sophisticated solutions are needed to cover the principal performance drawbacks: we need to provide support for both fixed and mobile sensors, reduce power consumption, and defeat the unpredictable performance of the network infrastructure.
Wireless networks are gaining special popularity, since cable-based interconnection is limited to scenarios without the restrictions associated with mobility, price, or infrastructure. However, the implementation of wireless communication introduces additional unpredictable issues like delay, noise, interference, or security. Therefore, wireless sensor networks have been restricted to carrying only scalar data such as temperature or pressure, with remarkable results and applications primarily in industrial and home automation, agriculture, or simply in areas of difficult access [12].

Figure 1.2: Hardware design trends in MWSN, adapted from [14]
Hardware Design
New results in circuit design (such as CMOS and VLSI) have enabled the introduction of low-cost cameras and microphones in sensor devices, which expands the limits of wireless sensor networks, since they can now satisfy a wide range of applications requiring audio and video capabilities [10] [13]. At the same time, however, as their presence increases, so does the complexity, due to the extra challenges that have to be faced (QoS, bandwidth, and the real-time restrictions required to stream multimedia data). Figure 1.2 summarizes the different hardware design trends in multimedia wireless sensor networks [13].
The technology associated with MWSN follows two main branches: one according to the hardware used for processing and transferring data, and the other according to the manufacturing technology adopted for the audio and video peripherals. Basically, it is possible to classify the hardware platforms, also called motes, into three classes: lightweight, intermediate, and PDA-class. This classification covers the devices from lower to higher complexity concerning processing capabilities, memory, data rate, and power consumption [14].
The technology adopted by the audio and video peripherals has also shifted. In the case of microphones, it has moved from the electret condenser microphone (ECM) to micro-electro-mechanical systems (MEMS). This reduced the complexity needed to integrate microphones into the MWSN system, because a MEMS microphone comes with an integrated analog-to-digital converter (ADC), power management hardware, and a control communication port (a digital serial link). On the other hand, CMOS image sensors took over from charge-coupled device (CCD) technology, because CMOS enhances the design of video sensors thanks to a reduction in the manufacturing scale, which allows more space on the chip for photosites while guaranteeing better low-light performance [14].
Possibly the network communication subsystem is the design area most influenced by the era of IoT, since thousands of components are developed every day. They are the workhorse of many vendors, each offering its products as the most innovative on the market. In practice, vendors follow two trends when implementing their systems: one is called system on chip (SoC) or network on chip (NoC), and the second one network co-processor (NCP).
SoC/NoC implements the network components directly in physical silicon. This technology brings an important improvement in comparison with conventional network structures such as buses or crossbars, because it is a low-cost technology with very low power consumption, scalable and suitable for parallelization. Some of the most representative examples are VxWorks, RTLinux, or QNX. Figure 1.3 presents two variants of this technology [15].
A network co-processor (NCP) expands the possibilities offered by SoC/NoC, since it aggregates a host processor used to decouple the application layer from the SoC/NoC. This method is useful for applications with high processing demand, since the host controller can sleep most of the time while the SoC/NoC remains active to maintain the connection to the network. Another advantage of the NCP is that the deployment and testing of applications become completely independent of the network. Figure 1.3 provides a pictographic representation of this technology [16].

Figure 1.3: Network communication subsystem trends: a) single SoC/NoC, b) multiple radio standard support SoC/NoC, c) network co-processor (NCP).
Network Architecture Design
The introduction of multimedia wireless sensor networks comes with a positive research increment in fields like network architecture, network standardization, distributed systems, multimedia applications, QoS, QoE, and energy consumption [10], [11], [13], [14].
The literature classifies the architectures associated with multimedia sensor networks into three types [11]:
1. Homogeneous MWSN: all the sensors in the network are homogeneous, i.e., they have the same capabilities and functionalities.
2. Heterogeneous MWSN: the network contains multiple types of sensors, i.e., high-performance multimedia sensors and scalar data sensors coexist.
3. Clustering MWSN: also called a specialized network architecture. In this structure, a cluster groups a set of sensors, and each cluster solves a specific task in the overall network chain. Consider for example a simple object-tracking application where a particular cluster of sensors without multimedia capabilities is responsible for detecting the movement of the object, while another, more sophisticated group is capable of taking high-resolution images from different angles and points. Both clusters can cooperate to reduce energy consumption, since the more advanced cluster can stay in a sleep state most of the time and switch on only when an event is triggered by the movement-detection cluster. Notably, this approach covers most of the cases in MWSN, given the drawbacks associated with energy consumption.
Concerning the MWSN network stack, there are four layers: physical, link, network, and application. The network stack is one of the most significant challenges in MWSN due to standardization issues. At this point, it is essential to remark on the great efforts spent standardizing at least the first three layers [10], [14]. In contrast, the application layer, which is responsible for processing, coding, and decoding audio, data, or video, remains unstandardized, since it depends on the encoding and compression methods employed.
Media coding methods have strong dependencies on the application, the quality of the signal, and the real-time constraints; these are the principal sources of computational demand in the entire multimedia streaming pipeline. Coding and compression schemes essentially have to satisfy three principal requirements:
• Data compression must be as computationally simple as possible.
• It must be adjustable to different network requirements (bandwidth, QoE, QoS).
• In the case of real-time streaming, it must be tolerant to packet loss.
Based on this, individual and distributed coding strategies have been investigated for MWSN. The distributed coding strategy is playing a leading role in this process, as it reduces the encode/decode computational complexity by distributing it efficiently between the different entities in the network, so that the sensor nodes require fewer resources to process and send encoded data. Results like [17] [18] [13] are considered the most relevant in this field.
1.2 Contribution of This Thesis
Following the vision of IoT, and given the limitations carried by MWSN, the principal objective of this thesis is to improve the streaming of audio in multimedia wireless sensor networks by implementing a high-performance audio codec (OPUS) [59] on an intermediate-class platform.
Several points have been considered to achieve this objective:
1. Optimization: OPUS has been designed and optimized for high-performance processors (ARM with NEON unit, x86, or x64) running high-level operating systems. One of our contributions is to make it usable by intermediate-class processors, especially ARMv7E-M based architectures.
The optimization follows two directions: 1) analyze the structure of OPUS, then work on the identification of bottlenecks; 2) characterize the architecture and explore low-level optimizations related to the usage of DSP and SIMD instructions, data types, dynamic memory allocation, and algorithms (a minimal sketch follows this list).
2. Integration: the optimization is only the tip of the iceberg, since the scope of OPUS is just the application layer. OPUS must be supported by a network architecture and by some intermediate firmware capable of dealing with processing delays, multi-node synchronization, and the unpredictable issues tied to the network.
3. Multi-node Synchronization: Bluetooth Low Energy technology has been characterized principally by point-to-point, client-server communication, which has facilitated its use in multiple applications such as machine-to-machine (M2M) and stereo communications. However, its limited broadcast communication support and lack of global internet protocol (IP) support constrain its use against other technologies like IEEE 802.15.4 or Wi-Fi.
Despite these limitations, this thesis explains the advantages of using the Bluetooth Low Energy standard in MWSN, and proposes and evaluates a point-to-point multi-node synchronization solution for a problem that until now has mostly been solved with broadcast communications.
4. Independence from the network architecture: the solution proposed by this thesis contemplates a Bluetooth Low Energy 4.1 SoC as the component responsible for interconnecting multiple nodes; nevertheless, the solution is easily portable to other wireless standards.
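As promised in point 1, the following minimal sketch gives a concrete flavor of the DSP/SIMD direction: a generic C multiply-accumulate loop versus one using the CMSIS __SMLAD intrinsic available on ARMv7E-M cores such as the Cortex-M4. This is an illustration under stated assumptions, not code from the OPUS port itself: the function names are hypothetical, and it assumes even-length, word-aligned 16-bit buffers.

/* Sketch: inner product of two int16 vectors, generic C versus the
 * __SMLAD dual 16-bit multiply-accumulate of ARMv7E-M. */
#include <stdint.h>
#include "cmsis_compiler.h"   /* assumed CMSIS header providing __SMLAD */

static int32_t dot_c(const int16_t *a, const int16_t *b, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];        /* one 16x16 MAC per iteration */
    return acc;
}

static int32_t dot_smlad(const int16_t *a, const int16_t *b, int n)
{
    /* Assumes n is even and both buffers are 32-bit aligned. */
    const uint32_t *pa = (const uint32_t *)a;   /* two samples per word */
    const uint32_t *pb = (const uint32_t *)b;
    uint32_t acc = 0;
    for (int i = 0; i < n / 2; i++)
        acc = __SMLAD(pa[i], pb[i], acc);       /* two MACs per instruction */
    return (int32_t)acc;
}

Halving the loop count this way is representative of the speedups discussed in Chapter 5, where such inner products dominate the OPUS pitch-analysis cost.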
1.3 Thesis Structure
The remaining sections of the document are organized as follows.
Chapter 2 brings the fundamental background required to understand the structure of OPUS, followed by a description of the different programming models adopted by embedded systems, including various optimization methods. Finally, a background on the multiple synchronization techniques used in real-time multimedia applications is presented.
Chapter 3 describes the distinct hardware components used during the implementation, i.e., the host and the BLE SoC firmware. Additionally, this chapter includes the firmware optimization facilities used for optimizing OPUS on Cortex-M4 microcontrollers.
Chapter 4 gives an introduction to the OPUS audio codec, showing its essential elements and characteristics. It also examines the critical features of its implementation on common performance architectures.
Chapter 5 includes the optimization techniques employed to integrate OPUS into the hardware architecture described in Chapter 3, together with a set of experimental results.
Chapter 6 provides an introduction to the Bluetooth Low Energy (BLE) network architecture. This chapter offers an in-depth analysis of the inter-destination synchronization issues attached to BLE.
Chapter 7 contains the protocols and results obtained for inter-destination synchronization under BLE.
Chapter 8 includes the conclusions and the contribution of this thesis regarding the vision of IoT. It also suggests future work based on the achieved results.
Chapter 2
Background
This background provides the common knowledge needed to understand the more advanced topics discussed in the following chapters, as mentioned in the introduction of this thesis. It has been divided into three sections that widely cover the literature regarding audio coding, the most popular programming models for handling network communications in embedded systems, and the principles of synchronization for multimedia applications.
2.1 Audio coding
A humanly perceivable audio signal is a representation of a sound wave, characterized by a variation in the air pressure level that fluctuates in time in the range of roughly 20 to 20,000 Hz. This signal is continuous in value and time, and thus carries an infinite amount of information [19]. In the eighties, with the emergence of the telecommunications industry and the introduction of the compact disc (CD), its digitization became necessary so that it could be transmitted and stored efficiently, using the minimum number of bits [19].
Audio codification techniques vary substantially according to the algorithms employed. In detail, lossless codification techniques codify the audio signal in such a way that the signal is perfectly reconstructed at the decoder without losing quality. Lossy compression instead does not provide a perfect reconstruction of the signal but ensures high compression ratios, which has promoted its use in wireless communication. Lossless and lossy techniques are able to operate up to the limits of human hearing bandwidth: for instance, the classic PSTN bandwidth for speech occupied the narrowband from 200 to 3400 Hz, while the wideband covered the range between 50 and 7000 Hz, which finally became 20 to 20,000 Hz for fullband [20].
Speech and high-fidelity audio are the principal branches in which these coding techniques are employed. The former is the technology used by the PSTN or VoIP, where lossy compression is clearly more relevant, given that in speech transmission we need to minimize the end-to-end delay of the communication. High-fidelity audio covers high-definition audio for portable audio players and high-definition audio and video streaming or storage; this is based entirely on lossless coding, or on a combination of both lossless and lossy codification algorithms [20].
The transmission of an audio signal over any communication network requires special treatment to satisfy the application requirements and the network constraints, such as bandwidth, QoS, and QoE. Logically, a trade-off exists between the compression ratio and the audio quality, since an important reduction of the bit rate is typically traded for better transmission capacity [20]. Recently, several coding methodologies have been used; in general, they employ techniques that take advantage of perceptual irrelevancies and statistical redundancies to minimize the number of bits needed to represent the input signal [19].
2.1.1 Lossless compression of digital audio
The objective of lossless audio compression algorithms is to ensure high fidelity between the source and the compressed audio signals. Typically the compression ratio oscillates between 2:1 and 4:1, depending on whether it is used in real-time audio storage or streaming scenarios. Lossless compression is widely used in high-fidelity recording in professional surroundings.
A large number of codecs have been developed following the philosophy of lossless audio compression, such as [22] [23] [24] [25] [26] [27]. Lossless compression basically takes into account the redundancy and correlation present in the audio signal to reduce the number of bits used to represent it.
Correlation between audio samples has a special relationship with the amount of information that needs to be codified: the higher the association between two audio samples, the fewer symbols need to be encoded, given an efficient coding scheme. Figure 2.1 gives an overview of the three phases used by most lossless compression algorithms.
Figure 2.1: Basic operations in most lossless compression algorithms: (1) framing and quantization of the input signal, then (2) decorrelation of the 1 to n audio channels, and finally (3) entropy coding of the processed signal. Adapted from [21].
Framing or Windowing
Statistical analysis of the audio signal demonstrates that it is not stationary1, even if it can be viewed as stationary over short time periods close to 20 ms. Thus, framing is considered one of the most significant and critical operations to ensure stationarity in audio signal processing. It consists of dividing the input signal into constant or dynamic time slots, commonly limiting the signal in band using an anti-aliasing filter [57]. In practice, the input signal is passed through a window of constant amplitude, limited in time, that separates the signal into independent frames of fixed or dynamic duration, always close to 20 ms or, more generally, around 13 to 26 ms. Each frame corresponds to n = t_s f_s samples, where t_s is the frame duration and f_s is the sampling frequency of the original signal [19]; for example, a 20 ms frame at 48 kHz contains 960 samples.

During the framing operation it is essential to consider how to subdivide the input signal into frames in order to avoid introducing spurious harmonics2, because they distort the signal, especially in the frequency domain. Audio signal processing theory overcomes this limitation by using overlapped windows, which have been well studied in the literature for transform-based analysis such as Fourier or wavelet transforms. It is possible to find plenty of overlapping windows, characterized either in time or in frequency, and their selection clearly depends on the application: sometimes it is made statically, but it can also be dynamic, with multiple overlapped windows varying according to the characteristics of the input signal [32]. Figure 2.2 represents the framing process pictographically.
1A stationary signal is one whose statistical properties, such as mean, variance, autocorrelation, etc., are constant over time.
2A harmonic is a signal or wave whose frequency is a whole multiple of the fundamental frequency.
Figure 2.2: Illustrative representation of the framing operation: the audio signal is divided into 6 frames with 50% overlap between adjacent frame intervals. Adapted from [28].
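As a concrete illustration of the framing step just described, the following minimal C sketch extracts 50%-overlapped, windowed frames from a PCM buffer. The 20 ms/48 kHz figures match the values discussed above; the function name and the choice of a Hann window are illustrative assumptions, not taken from any codec discussed here.

/* Sketch: 50%-overlapped framing of a PCM signal (cf. Figure 2.2),
 * assuming a 20 ms frame at fs = 48 kHz, i.e. n = ts*fs = 960 samples. */
#include <math.h>
#include <stddef.h>

#define FRAME_LEN 960               /* 0.020 s * 48000 Hz */
#define HOP       (FRAME_LEN / 2)   /* 50% overlap between frames */

/* Copy frame k of `signal` into `frame`, applying a Hann window to
 * smooth the frame boundaries before any transform analysis; samples
 * past the end of the signal are zero-padded. */
void extract_frame(const float *signal, size_t total, size_t k, float *frame)
{
    const float two_pi = 6.283185307f;
    size_t start = k * HOP;
    for (size_t i = 0; i < FRAME_LEN; i++) {
        float w = 0.5f - 0.5f * cosf(two_pi * (float)i / (FRAME_LEN - 1));
        float s = (start + i < total) ? signal[start + i] : 0.0f;
        frame[i] = w * s;
    }
}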
Sampling and Quantization

In order to represent an analog signal, continuous in time, as a digital signal, it is necessary to perform two tasks. The first one is sampling, the process of obtaining the value of the continuous-time signal at regular time intervals T_s, called the sample period. Sampling is conditioned by the Shannon theorem, which states that a continuous-time signal can be reconstructed exactly from its samples when the highest frequency f_h present in the signal is lower than half of the sampling frequency f_s (this bound, f_s/2, is also known as the Nyquist frequency); f_s corresponds to the inverse of the sample period:

f_h ≤ f_s / 2.    (2.1.1)

For example, audio sampled at f_s = 48 kHz can exactly represent content up to 24 kHz, beyond the 20 kHz limit of human hearing.
Once the analog signal is sampled, the second task consists of mapping the sample sequence, whose values lie in a continuous range, onto a finite digital representation. There are infinitely many possible values, and quantization is the process of remapping these numbers to a representation with a finite number of digits so that they can be stored and processed by any computer.
The process of quantization causes loss of information, since a digital representation cannot map exactly the infinite set of possible signal values. In audio signal processing theory, quantization noise is typically the major cause of distortion in the coding process [28] [29]. Therefore, the strategy used in the literature is to model the audio signal frames as a random process in terms of their probability density function, and then use scalar or vector quantization methods in order to produce a close discrete representation and reduce the quantization noise relative to the signal-to-noise ratio3 [19].
Various alternatives have been investigated in scalar quantization, all of them variations of classical uniform pulse code modulation (PCM) [28]. The most basic solution consists in quantizing the amplitudes of the sampled signal with a constant step size. Ideally, this method provides a signal-to-noise ratio characterized by SNR_pcm = 6.02 R_b + K_i (dB) [30], where R_b is the number of bits used by the quantizer and K_i is a constant related to the quantization step size. Differential PCM [33] is a quantization method that exploits the autocorrelation between two consecutive samples: instead of quantizing the amplitude of the signal, it quantizes the variation in amplitude between two consecutive samples, which is smaller than the amplitude itself [30]. Therefore, this method improves on the quantization noise introduced by the quantizer, since it reduces the number of bits (less information to transmit) and possibly also the quantization step size [19].
The technique called vector quantization is a natural extension of scalar quantization, applied over a multidimensional vector of data instead of a single scalar value [31]. Vector quantization is therefore a form of pattern recognition, where an input pattern s_i = [s_i(0), s_i(1), s_i(2), ..., s_i(N-1)]^T is approximated by one of a predefined set of patterns, templates, or codewords ŝ_i, selected through a distortion metric, typically calculated as the sum of squared errors between the original and the approximation vector [19] [34].

Vector quantization and some of its characteristics, such as the use of look-up tables to store the codewords, lead to a decrease in the number of bits used to encode and decode, and consequently to reduced coding complexity. In the last few years vector quantization has become an important technique for data compression in speech recognition, where it is also employed in lossy compression techniques. Figure 2.3 represents a typical vector quantization approach.
Figure 2.3: Vector quantization approach: a scalar input is grouped into L-dimensional vectors; quantization consists of associating the closest of M L-dimensional code-vectors minimizing the rate-distortion metric, whose index is forwarded to the decoder, which executes the inverse operation. Of course, both coder and decoder must share the same codebook definition. Adapted from [28].

3Signal-to-noise ratio (SNR): a measure that compares the level of a desired signal to the level of background noise.

Intra-channel decorrelation

As already mentioned, lossless compression extracts the redundancy of the audio signal; in the multi-channel case this is also known as intra-channel decorrelation, where the idea is to take advantage of the mutual information between the signal samples of each channel. Two categories have been widely recognized. One is prediction based, which consists of obtaining a close approximation of the input signal and then computing a low-variance prediction residual that can later be coded as efficiently as its entropy allows. The other is transform based, characterized by decorrelating frame samples using, for example, Fourier-based transforms like the modified discrete cosine transform (MDCT), whose use is more popular in lossy compression algorithms. The lossless transform-based strategy therefore codifies the difference between the original signal and the result obtained by applying a lossy compression representation.
Prediction Based
Figure 2.4 represents the prediction-based model, where the error signal e[n] is the difference between the original signal X[n] and the prediction X̂[n]. X̂[n] is obtained as a function of past samples of the original signal, i.e., X̂[n] = f(X[n-1], X[n-2], X[n-3], ..., X[n-n_f]).
The prediction block is implemented using several mathematical approaches. For example, in the GSM standard the linear predictive coding (LPC) model uses an autoregressive method, which is basically a combination of finite impulse response (FIR) and infinite impulse response (IIR) filters [35]. However, given the computational effort associated with implementing IIR filters, FIR has enjoyed a great reputation; nevertheless, standards like DVD [22] adopt IIR and demonstrate that IIR prediction schemes achieve an outstanding performance in comparison with simple FIR schemes [19].

Figure 2.4: Intra-channel decorrelation, prediction based: the error e[n] is estimated as the difference between the original audio signal X[n] and an approximation signal X̂[n]. Adapted from [35].

On the other hand, audio codec solutions like AudioPaK [22] and SHORTEN [26] use a prediction based on curve fitting, also called interpolation and least squares, with a simple polynomial approximation: fundamentally, the predicted samples X[n], X[n+1], X[n+2], and so on are calculated by fitting an L-order polynomial to the last L points. This method chooses the polynomial with the closest similarity to a particular frame as the estimated frame. Of course, this method does not always bring a great data-rate compression, since it depends on how well the statistics approximate the input signal [36]. Figure 2.5 (b) gives a pictorial representation of this method.
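As a concrete sketch of the fitting-curve idea, the fixed polynomial predictors commonly reported for SHORTEN build the estimate from differences of the previous samples, and the encoder keeps only the low-variance residual e[n] = x[n] - x̂[n]. This is a minimal illustration; the exact predictor set and framing details of [22] [26] are not reproduced here.

/* Sketch: SHORTEN-style fixed polynomial predictors. The residual
 * e[n] = x[n] - xhat[n] has lower variance than x[n] and is what gets
 * entropy-coded afterwards (e.g., with Rice codes). */
#include <stdint.h>

static int32_t predict(const int32_t *x, int n, int order)
{
    switch (order) {
    case 0:  return 0;                             /* xhat[n] = 0       */
    case 1:  return x[n-1];                        /* constant model    */
    case 2:  return 2*x[n-1] - x[n-2];             /* straight line     */
    default: return 3*x[n-1] - 3*x[n-2] + x[n-3];  /* parabola          */
    }
}

/* Produce the residual signal for one frame. */
void residuals(const int32_t *x, int32_t *e, int len, int order)
{
    for (int n = 0; n < 3 && n < len; n++)
        e[n] = x[n];                       /* warm-up samples sent verbatim */
    for (int n = 3; n < len; n++)
        e[n] = x[n] - predict(x, n, order);
}

In practice the encoder would try each order on the frame and keep the one whose residual has the smallest magnitude, which is how the polynomial "closest to the frame" is selected.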
Transform Based
Codecs such as LTAC [23] and IntDCT [37] employ different coding alternatives that take advantage of results coming from lossy compression algorithms. Remarkably, the synthesis and analysis processes are carried out on the input signal in the frequency domain, using for example psychoacoustic analysis, filter banks, pre-filtering, windowing, and so on [19] [38]. The principal advantage of a transform-based solution is the favorable increase in compression ratio. Fundamentally, a perceptual model of the input signal is generated, selecting the relevant parameters and the distortion linked to the input audio signal, which are then codified. Subsequently, the decoder is able to replicate an exact copy of the input signal based on that perceptual model and some extra configuration, normally defined statically.
Figure 2.5: (a) FIR/IIR predictor model used by the DVD standard, adapted from [21]. (b) Curve-fitting polynomial model used by AudioPaK [22] and SHORTEN [26], adapted from [19].
To clarify, consider Lossless Transform Audio Coding (LTAC), which uses the modified discrete cosine transform (MDCT) to describe the energy of the input signal in the frequency domain. LTAC quantizes the coefficients coming out of the transformation, but also measures the quantization error within a local decoder. Using these two parameters it is possible to create an exact replica of the input signal [19].
The principal disadvantage of the transform-based technique is its coding complexity, due to the fact that these solutions commonly require a substantial increase in complex mathematical operations and also demand the inclusion of a local decoder inside the coder. Therefore a trade-off usually exists between compression ratio and coding complexity.
Entropy coding
Once the redundancy of the input signal is derived, it is necessary to codify the result with an efficient coding scheme. For that, entropy coding algorithms are used, for example Lempel-Ziv coding [39], Huffman coding [40] [41], Rice coding [42], run-length coding [43], Golomb coding [44], or high-order context modeling coding [45]. These coding schemes play an important role in audio compression: classic general-purpose solutions such as *.zip and *.rar achieve a size reduction
of no more than about 10% of the original size, while through the use of Rice or Huffman coding it is viable to compress the audio down to about 50% of its original size.
Entropy coding is one of the relevant topics in information theory. The objective is to decrease the average code length while guaranteeing a perfect reconstruction of the audio signal from its code representation. In other words, the objective is to provide a minimum-redundancy encoding such that the code is uniquely decodable. The goal is usually achieved by mapping a set of information symbols into codes according to their frequency of appearance.
From Shannon's results it is possible to define the minimum number of bits required to encode a signal, which is given by the entropy value H_e(X), also called uncertainty. The entropy of an input signal X can be defined as follows. Given a set of samples of the input vector X = [x_1, x_2, x_3, ..., x_N] of length N, and p_i as the probability that the i-th symbol of the symbol set V = [v_1, v_2, v_3, ..., v_K] is transmitted, the entropy is given by:

H_e(X) = -\sum_{i=1}^{K} p_i \log_2(p_i).   (2.1.1)
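Equation (2.1.1) translates directly into code; the following sketch evaluates the entropy in bits per symbol for a given probability vector (names are illustrative):

    #include <math.h>

    /* Direct evaluation of equation (2.1.1): entropy in bits per symbol
     * of a source with K symbols whose probabilities are in p[0..K-1].
     * Zero-probability symbols contribute nothing, consistent with the
     * limit p * log2(p) -> 0 as p -> 0. */
    double entropy_bits(const double *p, int K)
    {
        double H = 0.0;
        for (int i = 0; i < K; i++)
            if (p[i] > 0.0)
                H -= p[i] * log2(p[i]);
        return H;
    }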
Unfortunately, conventional entropy coding alone is not enough, since the result is also influenced by several factors, such as quantization noise or masking effects such as tones or the presence of harmonics, that must be taken into account.
Diverse codecs such as IntMDCT [37], MPEG-I [46], and DVD [47] employ Huffman coding because of its simplicity [40]. Statistical analysis of audio signals confirms that they are characterized by a Gaussian distribution at shorter frame lengths, while at longer frame lengths they are characterized by gamma or Laplacian densities. Hence, the usage of Huffman codes designed for Gaussian or Laplacian densities can ensure a minimal-redundancy entropy code [19], which commonly depends on the probability associated with the appearance of each symbol in the original audio file. Rice coding consists in representing a number I in four parts (a sign bit, the m LSBs in binary, the unary representation of the remaining MSBs, and finally a stop bit), where m is defined as log_2(log_e(2) E(|x|)). The main characteristic Rice coding offers is its efficient performance when the input signal exhibits a Laplacian distribution over long frame lengths [48], which justifies its use in codecs like SHORTEN and LTAC [19].
AudioPaK reports fascinating results using a technique known as Golomb coding,
which enhances the use of prefix codes in audio signal processing. Golomb coding represents an integer number I, using a free parameter m, in two parts: the unary representation of the quotient floor(I/m) followed by a stop bit, and the binary representation of the remainder (I mod m). This makes Golomb-based codecs optimal for exponentially decaying distributions, such as occur in audio frames characterized by small positive integer residuals [19].
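To make the structure of these codes concrete, the following sketch implements Rice coding, i.e. Golomb coding restricted to m = 2^k, for a non-negative integer; the sign bit and the signed-to-unsigned mapping used for real residuals are omitted, and emit_bit stands for a hypothetical bitstream writer:

    /* Rice coding sketch (Golomb coding with m = 2^k): a non-negative
     * integer n is split into a quotient q = n >> k, transmitted in
     * unary (q one-bits closed by a zero, the "stop bit"), followed by
     * the k least significant bits of n in plain binary. */
    void rice_encode(unsigned n, unsigned k, void (*emit_bit)(int bit))
    {
        unsigned q = n >> k;                 /* unary quotient */
        while (q--)
            emit_bit(1);
        emit_bit(0);                         /* stop bit */
        for (int b = (int)k - 1; b >= 0; b--)
            emit_bit((int)((n >> b) & 1u));  /* k-bit binary remainder */
    }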
2.1.2 Lossy compression of digital audio
Lossy compression has been extensively used in multimedia applications since it can achieve high compression ratios. The compressed signal obtained from lossy compression algorithms is not identical to the original, but perceptually close to it. In this case, distortion metrics produce a quantitative measure of how close the approximation is to the original signal [49], [50], [51].
The mean square error (MSE) [52] is a distortion metric often used to estimate the average of the squared errors, generally used for image and speech compression. For streaming multimedia data, the signal-to-noise ratio (SNR) [19] is most frequently used, measuring the relative error of the signal; alternatively, the peak signal-to-noise ratio (PSNR) [53] represents the error with respect to the peak value of the signal [19].
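As a sketch, these two metrics can be computed over a frame of N samples as follows (plain double-precision reference code, not an optimized implementation):

    #include <math.h>

    /* Mean square error between an original frame x[] and its lossy
     * reconstruction y[], over N samples. */
    double mse(const double *x, const double *y, int N)
    {
        double acc = 0.0;
        for (int i = 0; i < N; i++) {
            double e = x[i] - y[i];
            acc += e * e;
        }
        return acc / N;
    }

    /* SNR in dB: ratio of signal energy to error energy. Assumes the
     * reconstruction is not bit-exact (nonzero error), otherwise the
     * ratio diverges. */
    double snr_db(const double *x, const double *y, int N)
    {
        double sig = 0.0, err = 0.0;
        for (int i = 0; i < N; i++) {
            double e = x[i] - y[i];
            sig += x[i] * x[i];
            err += e * e;
        }
        return 10.0 * log10(sig / err);
    }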
In contrast to lossless techniques, lossy techniques employ large transform coding algorithms such as the MDCT, Fourier, or wavelet transforms, and employ vector quantization as well to reduce the coding complexity. A promising new audio codec, known as OPUS, has been designed as a hybrid compressor: it is built on both lossless and lossy compression techniques to expand its use either as high-fidelity audio compression or as speech coding for multimedia streaming applications [58].
2.2 Audio Coding Attributes
The performance of speech codecs lies in a set of properties and attributes, of which the most relevant are bit rate, speech quality, robustness and reliability, delay, and computational cost. Unfortunately, in general we must look for a trade-off between two or more conflicting objectives, since a good performance on some attributes implies a lower performance on others. This section describes the most important attributes and the contexts in which they are relevant.
2.2.1 Rate
The rate of a speech codec is measured as the average number of bits per second. Two strategies have been adopted by a wide number of coders: fixed-rate coders, in which the data rate is constant for each encoding block, and variable-rate coders, in which the number of bits per second changes over time [91].
Traditionally, fixed-rate coders have been more popular, because they were initially developed for circuit-switched communication systems and because a priori knowledge of the bit allocations has a significant impact on the design and structure of the codec. Consider, for example, that the transmission of quantization indices becomes trivial [91].
Variable-rate coders are increasingly important with the emergence of packet-switched communication systems. They lead to higher coding efficiency and are very common in applications that combine audio and/or video signals, since these must cope with low rates, delays, and constrained packet payloads. In this type of coder, the bit allocation within a particular block for the parameters or variables depends on the signal and varies with the quantization index, and the mapping from a quantization index to a particular codeword is performed using table look-up or through complex computations [91].
2.2.2 Quality
In practical scenarios, lossy and lossless techniques must always manage the reduction of the precision used to represent the sound signals and the internal codec parameters. This clearly reduces the precision of the reconstructed signal, which is no longer a perfect copy of the original digital signal. It is therefore fundamental to ensure that its quality meets a certain standard [91].
The literature mentions two principal standards: overall quality and distortion measures. The former can be obtained directly from the scoring of speech utterances by humans, commonly using standardized listening tests where listeners are asked to score an utterance on an absolute scale; the most common overall measure is the mean opinion score (MOS) [92]. The latter is used to decide how to encode each signal block (typically of duration 5-25 ms), but it is also used during the design and training of codebooks. The squared-error criterion is the most commonly used, since it allows a fast evaluation for coding purposes [91].
2.2.3 Robustness and Reliability
Robust and reliable solutions are highly demanded in modern communication systems, due to the fact that bit errors and packet losses often occur. Multiple high-performance solutions, such as joint source-channel codes, are not considered in wireless communications because of the considerable reduction in modularity they incur. Therefore, error control usually resides in the transport layer, which in fact commonly uses automatic repeat request, e.g. in TCP, and this is not appropriate for real-time communications. Thus, the most appropriate solution seems to be that these defects be managed directly by the codec (or the application layer), for example by exploiting the redundancy of the signal or the user's perception of distortion [91].
2.2.4 Delay
Practical audio signal coding implementations incur a delay during the transfer of messages. However, long delays are impractical because they are not tolerated by real-time applications, even when they are associated with complex but useful computational methods [91].
Generally, in real-time applications a high-performance codec aims to keep the one-way delay below 100 ms, although 200 ms is still considered a usable bound. This is because significant delays dramatically affect the quality of the conversation. Moreover, at delays above 20 ms an echo can be perceived, and therefore low-delay codecs have been designed to keep the effect of such echo to a minimum [91].
2.2.5 Computation and Memory Requirements
The adoption of audio coders depends on the computational and memory requirements of the whole coding system. The principal performance metric is the computational complexity, measured in terms of the number of instructions required to perform the coding and decoding on a particular silicon device [91].
The fixed-point operation count is one of the complexity factors that needs particular attention when codecs are designed; optimizing it requires significant development effort compared with the floating-point count. On the other side, vector quantization is an effective technique for finding an optimal rate-versus-quality trade-off. Unfortunately, such techniques require a very high computational cost, and their use commonly results in impractical codecs. Thus,
during the past two decades, research into more efficient techniques able to face both computational cost and quality has been pursued without significant advances [91].
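As a small illustration of the care fixed-point design demands, the following sketch shows a rounded, saturating Q15 multiply, the kind of elementary operation such instruction counts are built from (our example, not taken from any particular codec):

    #include <stdint.h>

    /* Rounded, saturating Q15 multiply: both operands are 16-bit samples
     * interpreted as fractions in [-1, 1). The product is accumulated in
     * 32 bits (Q30), rounded, shifted back to Q15, and saturated so that
     * (-1) * (-1) does not wrap around. */
    static inline int16_t q15_mul(int16_t a, int16_t b)
    {
        int32_t acc = (int32_t)a * (int32_t)b;  /* Q30 intermediate */
        acc += 1 << 14;                         /* round to nearest */
        int32_t r = acc >> 15;                  /* back to Q15      */
        if (r > INT16_MAX) r = INT16_MAX;       /* saturate         */
        return (int16_t)r;
    }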
2.3 Embedded Systems: Programming Models For Real-Time Multimedia Applications
Embedded systems are a mixture of software and hardware used to process information in hundreds of applications. They have been seamlessly integrated into daily life, automation, and telecommunication systems. For example, they are capable of controlling the mechanical components of the engine, brakes, seat belts, airbag, and audio system in a vehicle, or of monitoring the energy production in a power plant [54].
Technically, most modern embedded systems are equipped with a single processor, together with a set of peripherals such as timers, analog-to-digital converters, flash, ROM, and RAM, all connected through physical buses. They have commonly been dedicated to solving a single job, which has given them the possibility of being adjusted and designed to reduce the cost of the products [54] [55].
However, only a few years ago research on embedded systems moved towards the more recent trends in the design of sustainable IoT platforms. The computer science community recognized that the engineering techniques required to analyze and program these systems are different, mainly due to their reduced resources: processing capabilities, memory size, and power consumption.
Since embedded systems have been designed to solve a particular task, many embedded solutions are based on the idea of a cyclic execution or "super loop" running sequentially on a bare-metal CPU: an undefined number of tasks run within a "while(true) do" loop, without concurrency support. This strategy is suitable for low-complexity applications, as sketched below. However, when the complexity of the applications increases, and particularly when the correctness of the computation depends not only on the logical results but also on the time at which the results are produced, an embedded real-time operating system provides a stable and mature solution, as it can handle multiple parallel events concurrently, switching and scheduling the CPU.
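A minimal sketch of this super-loop pattern, with purely illustrative task names (in a real firmware these would be the application's own routines):

    /* Hypothetical tasks, stubbed here so the sketch is self-contained. */
    static void hw_init(void)             { /* clocks, peripherals, buses */ }
    static void poll_sensors(void)        { /* read inputs                */ }
    static void process_audio_frame(void) { /* run the codec              */ }
    static void update_outputs(void)      { /* drive DAC, radio, LEDs     */ }

    int main(void)
    {
        hw_init();
        for (;;) {      /* the "super loop": tasks run to completion,
                           in a fixed order, with no preemption; timing
                           correctness rests on each task being short */
            poll_sensors();
            process_audio_frame();
            update_outputs();
        }
    }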