Video Streaming Authentication Combining Signature Amortization and Watermarking


Academic year: 2021

Gabriele Oligeri


Acknowledgements

I would like to thank my supervisors, Dr. Erina Ferro and Prof. Gene Tsudik. They have given me a very rewarding research experience. I deeply appreciate their supervision, mentoring, and understanding.

I would also like to thank Dr. Stefano Chessa, Roberto Di Pietro, and Gaetano Giunta for their help and their valuable comments on my papers.

I would like to express my thanks to all the members of the WnLAB who have helped me, directly and indirectly, in accomplishing this work, and who gave me a learning environment in which to grow both personally and professionally.


Abstract

Video broadcasting is one of the most important services in many environments. To enable widespread and trusted dissemination of streamed media, integrity and authenticity guarantees are necessary in this scenario. Users need assurance that the multimedia content originates from the supposed sender and that malicious entities cannot replace parts of the stream with different material.

The contribution of this work is twofold: a theoretical framework for watermark embedding and extraction procedures, and two different approaches to video stream authentication.

A theoretical analysis of the watermark embedding and extraction procedures is presented. Simulation results are shown to support the analysis, and a set of optimal embedding parameters is derived to maximize the mark extraction ratio while minimizing the impairments to the video pictures.

This work also presents two different solutions to video stream authentication. A watermarking technique has been combined with two different signature amortization schemes, obtaining two different approaches to video stream authentication. In the first case, a redundant hash chain is adopted, achieving bit-by-bit authentication of the video stream. In the second case, authentication of features extracted from each video picture is performed. A feature extraction algorithm is introduced, and the TESLA packet-level authentication algorithm is integrated with the watermarking algorithm.

Simulation results for both authentication procedures are presented, proving robustness to packet loss, to transcoding algorithms (MPEG-2 like), and, finally, to commonly used attacks like superimposing or picture-in-picture.


Sommario

The transmission of multimedia content is one of the most important and widely used services available on the Internet today. Authenticity and integrity of the content become essential requirements for enabling the spread of such services. The user needs the guarantee that the multimedia content he is enjoying comes from the site he supposes, and that nobody, during transmission, has been able to manipulate the video in any way.

The contribution of this work is twofold: a theoretical analysis of the embedding and extraction algorithms of a watermarking procedure, and two new procedures for authenticating the transmission of multimedia content. The theoretical analysis of the watermarking algorithm is complemented with simulation results that demonstrate its correctness. Moreover, a set of parameters is proposed to maximize the probability of watermark extraction while minimizing the impact on video quality.

This work presents two new algorithms for multimedia content authentication. A first solution concerns bit-by-bit authentication, obtained through a redundant hash chain. A second approach combines the TESLA packet-level authentication protocol with an algorithm for extracting image features, so as to authenticate not the bit sequence but its content in terms of extracted features. Simulation results are presented concerning the robustness of the proposed algorithms to packet loss, to transcoding algorithms, and, finally, to common attacks such as superimposing images or inserting new pictures among the authentic ones.


Contents

1 Introduction
  1.1 Motivations
  1.2 Thesis overview
  1.3 Thesis outline

2 Related Work
  2.1 Watermarking
  2.2 Authentication
  2.3 Hybrid Solutions

3 Scenario and adversarial model
  3.0.1 Satellite network
  3.0.2 Hybrid network
  3.0.3 Adversarial model

4 The watermarking algorithm
  4.1 Preliminary Definitions
  4.2 Embedding the mark
  4.3 Retrieval of information

5 Theoretical framework for mark extraction probability
  5.1 Gaussianity test of random variables
  5.2 Mark extraction probability

6 Simulations results on video quality degradation
  6.1 Raw video
  6.2 MPEG-2 video transcoding

7 Hard authentication: the baseline

8 Hard authentication algorithm
  8.1 Dealing with noisy channels

9 Fragile Authentication Algorithm
  9.1 Scenario and Adversarial Model
  9.2 Authentication overview and an example
  9.3 Video streaming authentication
    9.3.1 Sender side authentication algorithm
    9.3.2 Receiver side verification algorithm
    9.3.3 Features extraction and comparison
  9.4 Performance evaluation
    9.4.1 Forged video samples generation
    9.4.2 Compare forged with corrupted video samples
    9.4.3 Robustness to transcoding processes

10 Conclusions and future work


Chapter 1

Introduction

1.1 Motivations

Every day, large amounts of multimedia data are distributed throughout the Internet, while at the same time advances in computing hardware, software, and networks have created threats to copyright protection and content integrity. Moreover, popular video editing software makes it easy to tamper with video content, thus making video authenticity unreliable.

Easy editing and perfect reproduction in the digital domain make the protection of ownership and the prevention of unauthorized tampering of multimedia data (audio, image, video, and document) important concerns. Digital watermarking and data hiding techniques are well known schemes to protect the copyright of still images and video content. Embedded data plays the role of a copyright notice, which cannot be removed by an adversary without destroying (or severely damaging) the document itself. In recent years, embedding secondary information in multimedia content has made considerable progress and attracted attention from both academia and industry. Data hiding has been found useful as a general tool to send side information in multimedia communications, achieving additional functionalities and enhancing performance, e.g. movie subtitles, financial information, device control, upgrading of legacy systems, etc. Moreover, modern applications of data hiding involve copyright protection, traitor tracing, content authentication, and media forensics. In fact, information hiding techniques can also be leveraged to communicate security information to end users. Several techniques have been proposed for a variety of applications, including source authentication, ownership protection, data integrity, and access control. In this way, end users can enjoy both the copyright protection inherent in information hiding techniques and the source and content authentication transmitted in the side communication channel.

Hybrid techniques, i.e. those leveraging watermarking to transmit security information, have strong trade-offs to undergo. In fact, information hiding techniques have to deal with quality degradation, embedded data length, and robustness to both transcoding algorithms and geometrical distortions; on the other hand, these kinds of approach are completely independent of the network architecture and of the end user's hardware equipment. Generally speaking, increasing the quantity of embedded information decreases the quality of the video stream; that is, the more bits are changed by the embedding process, the more the video pixels are affected by impairments. On the other hand, information hiding techniques work at the application layer, regardless of which security framework is adopted in the lower communication layers: when the multimedia content is transcoded to be compliant with the network it has to go through, the embedded authentication information follows it. Moreover, end users are not asked to upgrade their hardware to play the video; just a software upgrade is needed. That is, users not authorized or not compliant with the new architecture will continue to enjoy the video, but not the authentication algorithm. This thesis addresses the issues regarding data hiding and its applications in the field of video streaming authentication.

1.2 Thesis overview

This work presents a formal definition of a watermarking procedure, i.e. both the embedding and the extraction algorithms are described. We introduce the algorithms and provide a theoretical analysis of the procedures. Then we present a theoretical framework for the mark extraction probability, and we derive a set of optimal parameters that guarantee the best trade-off between quality and mark extraction probability.

We present two different authentication algorithms: the first one aims at hard authentication, while the second one achieves fragile authentication. The former is characterized by bit-by-bit authentication of the video pictures, i.e. if even a single bit is corrupted during the video transmission, the authentication procedure fails; the latter is based on feature authentication of the video pictures, i.e. a feature extraction algorithm that extracts features from video pictures is presented, together with an authentication algorithm that guarantees the authentication of those features.

We introduce a preliminary authentication scheme that leverages the embedding of a simple hash chain to achieve hard authentication of the video pictures. Subsequently, a specifically designed hash chain robust to packet loss is presented, and its effectiveness is proved over a simulated corrupted channel.

Finally, we present an authentication procedure robust to transcoding algorithms (MPEG-2/4 like), and we show the results of an extensive simulation campaign proving the robustness of the proposed approach against corrupted channels, bit-rate changes, and frequently used attacks.


1.3 Thesis outline

Chapter 2 provides a review of the current literature on the related topics. Section 2.1 reviews the most important works related to watermarking, while section 2.2 gives an overview of the most important packet level authentication algorithms.

Chapter 3 introduces the system model with a detailed description of two scenarios that we take into account in this work.

Chapter 4 presents the description of the watermarking algorithm, with the formalization and the mathematical description of both the embedding and extraction procedures. After a brief description of the most important watermarking requirements, a list of definitions is presented in section 4.1; finally, sections 4.2 and 4.3 show the details of the embedding and extraction algorithms, respectively.

Chapter 5 presents the theoretical analysis of the mark extraction probability as a function of the embedding parameters. Simulations have been performed to derive optimal parameters for the embedding process, with the aim of maximizing the mark extraction ratio.

Chapter 6 shows the watermarking performance in terms of quality degradation and mark extraction ratio. A quality metric has been defined and used to assess the best trade-off between mark extraction ratio and quality degradation. We take into account two different scenarios, i.e. watermarking performance using both raw video samples and MPEG-2 coded video samples, highlighting the influence of MPEG-family coding algorithms on our watermarking algorithm.

Chapter 7 briefly introduces the idea of a hybrid authentication scheme that leverages both watermarking and authentication schemes. This approach belongs to the family of hard authentication algorithms, i.e. the video stream is authenticated bit by bit: if even a single bit of a video stream picture is corrupted, the authentication of the whole video sample is broken.

Chapter 8 presents a hard authentication algorithm that can deal with noisy channels. Only two hash chains are used, leveraging the ability of the watermarking scheme to guarantee mark extraction even in the presence of minor picture corruption. Simulations are presented to show the bit error rate robustness of the proposed solution.

Chapter 9 introduces a fragile authentication algorithm able to authenticate features of a video sample. Robustness to bit errors is achieved by combining the watermarking procedure with the TESLA authentication protocol. A feature extraction algorithm is presented, and simulations related to both transcoding robustness and tampering attacks are shown.


Chapter 2

Related Work

2.1 Watermarking

Several methods have been designed to verify the originality of multimedia content and to protect copyrights.

Steganography stands for techniques that allow secret communication, usually by embedding or hiding the secret information in other, unsuspected data. Steganographic methods generally rely on the assumption that the existence of the hidden communication is unknown to third parties, and they are mainly used in secret peer-to-peer communications between trusted parties. As a result, steganographic methods are in general not robust, i.e., the hidden information cannot be recovered after data manipulation.

Watermarking, as opposed to steganography, has the additional notion of robustness against attacks or unauthorized manipulation. Even if the existence of the hidden information is known, it should ideally be impossible for the adversary to destroy the embedded watermark, even if the watermark embedding procedure is known.

Data hiding and data embedding are used in varying contexts, but they typically denote applications where the existence of the embedded data is publicly known and there is no need to protect it. This is typically the case of the embedded transmission of secondary information or services that are publicly available and do not relate to copyright protection or conditional access functionalities.

Fingerprinting and labeling denote special applications of watermarking. They relate to copyright protection applications where information about originator and recipient of digital data is embedded as a watermark. The individual watermarks, which are unique codes out of a series of codes, are called ”fingerprints” or ”labels”.

Visible watermarks, as the name says, are visual patterns, like logos, which are inserted into or overlaid on images (or video), much like visible paper watermarks. Visible watermarks are mainly applied to images, for example to visibly mark preview images available in image databases or on the World Wide Web, in order to prevent people from making commercial use of such images.


Digital watermarking was the first solution developed to prevent the abuse of digital images. Many digital watermark schemes, such as spectrum watermarks [1], quantization watermarks [2], and blind detection watermarks [3], have been proposed to protect digital images. A digital watermark is basically an identification code that carries information about the copyright owner. It can be invisible and permanently embedded in digital data for copyright protection, source authentication, and content integrity verification. Generally speaking, the effectiveness of a watermark-based copy detection system depends to a large extent on the robustness of the associated digital watermarking method [4].

Many watermarking techniques have been proposed in the past decades; they can be classified into two families: spatial-domain and transform-domain techniques. Transform-domain techniques are currently more popular due to their higher robustness to image processing.

An earlier technique proposed by Cox et al. [2] selects 1000 Discrete Cosine Transform (DCT) coefficients for embedding the watermark, to achieve the requirements of robustness and imperceptibility. They propose a watermark whose structure consists of k random numbers drawn from a distribution. The length of the watermark is variable and can be adjusted to suit the characteristics of the data. Each coefficient in the frequency domain has a perceptual capacity, that is, a quantity of additional information that can be added without any (or with minimal) impact on the perceptual fidelity of the data. The authors recommend that the watermark be placed in the perceptually most significant components of the image spectrum. Experimental results show that this algorithm can extract a reliable copy of the watermark from images degraded by several common geometric and signal processing procedures, including translation, rotation, scale change, and cropping. The algorithm also displays strong resilience to lossy operations such as aggressive scale changes, JPEG compression, dithering, and data conversion. The main defect is that the scheme of Cox et al. requires the original image in order to extract the watermark.

Wang et al. [5] proposed a wavelet-based watermarking scheme, which embeds the scrambled watermark into the middle frequencies of the wavelet domain. The watermarks used in Wang et al.'s algorithm are real-numbered images transferred from binary images. The steps of their watermark embedding algorithm are as follows: first, based on the size of the image and the watermark pattern, the number of decomposition levels L is determined; second, 2L + 2 sets of orthonormal filter banks are randomly generated, and the wavelet decomposition pyramid is also chosen; finally, the analysis filters are used to decompose the image, and the coefficients of the selected middle-frequency band are replaced by the watermark image. The scheme basically meets the requirements of security and blindness. Its main defect is that, under some serious attacks, the extracted watermark is ambiguous.

Various solutions have been proposed for inserting data into the pictures of a video stream. The authors in [6] proposed methods for embedding additive digital watermarks into uncompressed and compressed video streams. The first method inserts sequences of repeated bits (the redundant mark) in the spatial domain of the video image, after their amplification and modulation by means of pseudo-noise sequences. They also presented a marking procedure in the MPEG-2 bit stream domain, concluding that in that case the watermarking scheme is less robust than its counterpart in the pixel domain.

The authors in [7] proposed a spread spectrum watermarking scheme that embeds data in perceptually insignificant DCT coefficients of the cover image, and they presented a methodology that can be generalized to audio, video, and multimedia data. The basic idea is to spread a narrow-band watermark signal over many frequency bins of the host image by using pseudo-random spreading sequences, so that the watermark energy content in each bin becomes small and can hardly be detected. At the same time, any attempt to remove the watermark causes image impairments to an extent that fails to preserve an acceptable quality of the watermarked image.
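The spreading-and-correlation idea behind [7] can be illustrated with a small sketch. This is not the algorithm of [7] itself: the function names, the ±1 pseudo-noise sequence, and the embedding strength `alpha` are illustrative assumptions.

```python
import numpy as np

def embed_bit(coeffs, bit, pn, alpha=0.1):
    """Spread one watermark bit over many host coefficients.

    coeffs: host coefficients (e.g. perceptually insignificant DCT terms)
    pn:     +/-1 pseudo-noise spreading sequence (the shared secret)
    alpha:  embedding strength -- an illustrative value, not from [7]
    """
    sign = 1.0 if bit else -1.0
    return coeffs + alpha * sign * pn * np.abs(coeffs).mean()

def detect_bit(coeffs, pn):
    """Recover the bit from the sign of the correlation with the PN sequence."""
    return float(np.dot(coeffs, pn)) > 0.0
```

Detection needs no original image: the host's correlation with an independent pseudo-noise sequence is close to zero, while the embedded term contributes a bias of magnitude roughly alpha times the number of coefficients.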

2.2 Authentication

In this section we present an overview of various solutions to multicast video streaming authentication based on hash and signature amortization at the packet level. In [8], the authors proposed a solution to authenticate multicast communications through a shared-key mechanism where each member has a different set of keys. Each sender holds a set of l keys and attaches to each packet l MACs, each one computed with a different key. Each recipient holds a subset of the l keys and verifies the MACs according to the keys it holds. In this scheme, the communication overhead is significant, and security is only defined up to a coalition of w malicious recipients forging data for a chosen recipient.
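The l-MAC construction of [8] can be sketched as follows; the helper names and the dictionary mapping key indices to the keys a receiver holds are illustrative, not the notation of [8].

```python
import hashlib
import hmac

def tag_packet(packet, sender_keys):
    """Sender side: attach one MAC per key, all computed over the same packet."""
    return [hmac.new(k, packet, hashlib.sha256).digest() for k in sender_keys]

def check_packet(packet, tags, receiver_keys):
    """Receiver side: verify only the MACs whose key indices this receiver holds.

    receiver_keys maps an index into the sender's key set to the key itself.
    """
    return all(
        hmac.compare_digest(hmac.new(key, packet, hashlib.sha256).digest(), tags[i])
        for i, key in receiver_keys.items()
    )
```

The l tags attached to every packet are exactly the communication overhead the text refers to.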

A different approach to multicast authentication is an efficient and simple scheme that guarantees authenticity, integrity, and non-repudiation of a packet sequence by using a digital signature for the first packet and propagating authentication through a hash chain. In [9], the authors present a procedure in which the sender signs the first packet and then ties in subsequent blocks in a way that guarantees the packets' authenticity. For a finite and known sequence of packets, the sender inserts the hash of each block into the block that precedes it. This solution does not tolerate packet loss, because losses break the chain, thus preventing subsequent authentications; moreover, this solution applies only if both sender and receiver are able to buffer the entire packet sequence in advance. Various solutions have been proposed for tackling packet loss phenomena.
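The backward-linked chain of [9] can be sketched as follows; this is a simplified illustration in which SHA-256 stands in for the hash function and the digital signature on the first block is omitted.

```python
import hashlib

HASH_LEN = 32  # SHA-256 digest size

def build_chain(blocks):
    """Append to each block the hash of the next (already-augmented) block.

    Built backwards: the last block carries no hash, and only the first
    augmented block needs a digital signature (omitted here) to
    authenticate the whole finite, known-in-advance sequence.
    """
    augmented = [blocks[-1]]
    for block in reversed(blocks[:-1]):
        augmented.insert(0, block + hashlib.sha256(augmented[0]).digest())
    return augmented

def verify_chain(augmented):
    """Each block's trailing digest must match the hash of the next block."""
    return all(
        cur[-HASH_LEN:] == hashlib.sha256(nxt).digest()
        for cur, nxt in zip(augmented, augmented[1:])
    )
```

Losing any block makes every later block unverifiable, which is exactly the packet loss fragility the text points out.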

In [10], the authors introduce a packet stream authentication protocol that transmits authentication information along with data packets. The authentication information is encoded by means of erasure codes in order to ensure that such information remains available even in the presence of packet loss. This solution is designed to tolerate an extremely high packet loss (around 20%) at the cost of a very high bandwidth overhead (around 80%). Our approach differs from [10] since we design an

authentication protocol that is transparent to the lower communication layers, and we address applicative scenarios (such as satellite communications) in which such a high packet loss tolerance is not required (in satellite communications the expected packet loss is very low [11]). Other solutions are based on redundant chains, as in [12], [13], [14], or on implicit key authentication [15].

Recent papers have proposed the usage of hash chains to authenticate data flows in multicast environments. In [16], the authors proposed a novel signature model for multicast authentication combining one-way hash chains and one-time signatures. In [17], the authors introduced a new signature scheme for broadcast authentication, tailored towards peer-to-peer systems, to overcome limitations of traditional approaches based on signature schemes like RSA and DSA. None of these solutions leverages the capability of the video itself to carry security information to the end user in a transparent way. The result is a bandwidth overhead due to the authentication information appended to each packet. Moreover, end users need receiver hardware compliant with the specific authentication algorithms; otherwise it is not possible to decode and enjoy the authenticated packets.

2.3 Hybrid Solutions

All the previous solutions do not leverage the capability of the video stream to carry security information in a transparent way. The results are: (i) the necessity of an ad hoc hardware/software infrastructure to deal with the authentication protocol, and (ii) a bandwidth overhead due to the authentication information transmitted out of band.

In [18], the authors proposed a new approach that leverages the synergy among cryptographic signatures, forward error correction (FEC), and digital watermarking. They proposed to embed a redundant hash chain in the video stream by means of a watermarking technique. The hashes belonging to the chain are computed over features extracted from the video stream pictures. This approach guarantees authentication resilience to commonly used transcoding techniques. This weak (fragile) authentication scheme is suitable for their scenario, in which the video stream is transmitted by different sources, goes through different heterogeneous networks, and is finally received by different entities, such as handheld devices, laptops, decoders, etc.

A video multicast instant source authentication model based on digital video watermarking and TESLA is proposed in [19]. The authentication information associated with the current picture is directly embedded in the Modified Look-Up Table (MLUT) of the current I-frame, without coding/decoding the MPEG-2 video stream. This approach allows instant authentication of the current I-frame at the receiver side, but it cannot work if the video stream is transcoded between the sender and the receivers; in fact, the transcoding process wipes out all the data structures of the compression algorithm. Moreover, this approach has a strict dependency on the adopted compression algorithm: changing the compression algorithm means finding a new way to embed the authentication information in its data structures.
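TESLA's delayed key disclosure, which [19] builds on and which returns in chapter 9, rests on a one-way key chain. A minimal sketch, with hypothetical helper names:

```python
import hashlib
import hmac

def make_key_chain(seed, length):
    """One-way chain: each key is the hash of the next one.

    keys[0] is the commitment the sender signs once; later keys are used
    (and eventually disclosed) in order keys[1], keys[2], ...
    """
    keys = [seed]
    for _ in range(length):
        keys.append(hashlib.sha256(keys[-1]).digest())
    keys.reverse()  # keys[0] = commitment, keys[-1] = seed
    return keys

def mac_packet(packet, key):
    """Authenticate a packet with the (not yet disclosed) interval key."""
    return hmac.new(key, packet, hashlib.sha256).digest()

def verify_disclosed_key(commitment, disclosed, steps):
    """Hash a later-disclosed key back to the signed commitment."""
    k = disclosed
    for _ in range(steps):
        k = hashlib.sha256(k).digest()
    return k == commitment
```

A receiver buffers each packet together with its MAC and trusts the MAC only once the corresponding key is disclosed and hashes back to the commitment; this delayed verification is what makes TESLA loss-tolerant.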


In [20], [21], [22], the authors presented an algorithm for multicast video streaming authentication over satellite networks. They proposed to embed simple hash chains by means of a watermarking technique, and they tested mark robustness to a common coding algorithm like MPEG-2.


Chapter 3

Scenario and adversarial model

We consider a specific scenario of a broadcast satellite network, and then we introduce additional cryptographic constructions that extend the original scenario to a hybrid network consisting of a satellite and a terrestrial wireless network.

3.0.1 Satellite network

Multicast satellite networks present peculiar characteristics, such as high round trip time, no packet reordering, and low bit error rate (BER). For instance, geostationary satellites, placed in equatorial orbit at 36,000 km altitude, introduce a transmission delay between transmitter and receiver of about 250 ms, while a satellite link presents a low BER of about 10⁻⁸ in the majority of cases [11]. This low BER is due to design requirements and structural constraints. Taking into account the low BER of the satellite channel and the fact that a satellite link behaves as a FIFO pipe (no packet reordering), chapter 7 introduces a novel scheme for authenticating multicast video streaming that does not affect the available satellite bandwidth. Chapter 8, instead, introduces a different cryptographic construction specifically designed to deal with bursty error channels, which are typical in satellite networks, e.g. when weather conditions are bad.

In a typical multicast satellite network, both transmitter and receiver perform preprocessing operations, like MPEG-2 coding/decoding, before transmission and after reception, respectively. The use of the MPEG-2 algorithm [23] is justified if we assume that the satellite network uses the DVB-S standard [24].

In our model, the sender preprocesses the uncompressed video stream by embedding the authentication information by means of a watermarking technique, and then performs MPEG-2 compression (set-up phase). Later on, the preprocessed video stream is transmitted to the various receiver stations through the satellite network (broadcast phase). It is worth noticing that using watermarking to embed authentication information within the video stream avoids any bandwidth overhead; indeed, the authentication information introduces only transparent modifications to the video pictures (that is, modifications that do not affect the perceived quality of the video). Finally, each receiver station decompresses the video stream and then authenticates it (verification phase).

3.0.2 Hybrid network

The second scenario we take into account consists of a real-time stream server that broadcasts a real-time video stream to several end users belonging to different networks. In general, different networks are characterized by different QoS constraints in terms of bandwidth, link latency, and packet loss. To meet these constraints, the video stream undergoes a transcoding process; that is, the video stream characteristics have to be adapted to suit the network in which the video has to be broadcast. The authentication algorithm presented in chapter 9 runs on the real-time stream server and embeds the authentication information in the video stream by means of the watermarking technique, with a negligible computational overhead (only one digital signature computation) and with no bandwidth overhead (the authentication information is embedded in the video stream and is not perceptible to the human eye). The authentication procedure has to be robust to packet loss, survive transcoding processes, and, finally, guarantee the trustworthiness of the video for clients who want to watch it asynchronously.

3.0.3 Adversarial model

The adversary behaves in two ways:

• Change the video stream content: the content can be tampered with either by replacing one or more pictures, or by adding new malicious material to the transmitted video.

• Change the video stream ownership: the ownership (copyright) can be tampered with by removing the existing authentication information from the video stream and replacing it with new information. In this way, the video content is still authentic, but its ownership has been changed.

The adversary knows all the algorithms used both by the sender and by the receivers; he is able to detect and read the watermark, but he cannot remove or change it. As explained in the next chapters, reading and removing the mark requires knowledge of a shared secret (the pseudo-noise sequence used by the sender to embed the mark). Furthermore, the video stream content is always protected, but its ownership can be guaranteed only in the scenario in which the pseudo-noise sequence is shared as a secret. Finally, the adversary can also add one or more watermarks of his own to the pictures belonging to the video stream.


Chapter 4

The watermarking algorithm

In the following sections we present the algorithms for embedding and extracting the authentication information (the mark) from a video stream, W(◦) and E(◦) respectively. As previously discussed in Chapter 1, our use of the watermarking technique is not standard. Normally, it is acceptable that the extraction of the embedded logo be lossy, that is, the logo can be extracted with errors, provided that it remains recognizable. In our case, we use watermarking to embed the result of a hash computation. The extraction of the hash must not be affected by errors, otherwise the hash chain would break and the receiver could not authenticate the remaining video stream. For this reason, in this chapter we propose an embedding technique that provides error-free hash extraction.

Watermarking techniques exploit the implicit limitations of the human eye, which is an imperfect detector. In particular, the human eye is characterized by the masking phenomenon [25], that is, a component in a video signal may become imperceptible in the presence of another signal, called the masker. Spatial patterns can affect the identifiability of other features that are spatially close to them; for example, luminance edges and fine details reduce the identifiability of other signals around them. An interesting approach for separating edges and fine details from other image components is the Discrete Cosine Transform (DCT) decomposition of the source image. This method assures robustness with regard to various types of compression methods, like MPEG-2 or MPEG-4 [26]. Our use of the DCT differs from the MPEG-2 approach: watermarking algorithms like [27] apply the DCT exhaustively over all 8 × 8 blocks of the video picture and insert a few mark bits in each block. On the contrary, we extend concepts from still-image marking and apply the DCT to the whole video picture, embedding the replicated mark in its medium-frequency components [7].
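The whole-picture, medium-frequency embedding described above can be sketched as follows. This is only an illustration of the principle: the band selection, the strength `alpha`, and the DCT built from an explicit basis matrix are assumptions of this sketch, not the parameters derived in chapter 5.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (no external DCT routine needed)."""
    u = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * u / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def embed_in_mid_band(picture, mark_bits, pn, alpha=2.0):
    """Embed replicated mark bits in the mid-frequency DCT band of a picture.

    picture:   square luminance matrix
    mark_bits: bits to embed, replicated over the whole band
    pn:        +/-1 pseudo-noise sequence (the shared secret)
    alpha:     embedding strength (illustrative value)
    """
    n = picture.shape[0]
    C = dct_matrix(n)
    Y = C @ picture @ C.T                       # DCT of the whole picture
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    band = (u + v > n // 2) & (u + v < n)       # anti-diagonal mid-frequency band
    for j, (r, c) in enumerate(np.argwhere(band)):
        bit = mark_bits[j % len(mark_bits)]     # replicate the mark over the band
        Y[r, c] += alpha * (1.0 if bit else -1.0) * pn[j % len(pn)]
    return C.T @ Y @ C                          # inverse DCT, back to pixels
```

Extraction would, in a scheme of this kind, correlate the same band positions against the pseudo-noise sequence and decide each bit from the replicas, as detailed in section 4.3.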

Requirements

The most important properties of a watermarking technique are:


• Robustness describes whether the watermark can be reliably detected after common media operations; for example, geometrical distortions, superimposing, or embedding picture-in-picture images can all be considered robustness attacks. Robustness does not include attacks on the embedding scheme that are based on the knowledge of the embedding algorithm or on the availability of the detector function.

• Security describes whether the embedded watermarking information can resist removal beyond reliable detection by targeted attacks based on full knowledge of the embedding algorithm and of the detector (except the key), and on the knowledge of at least one piece of watermarked data.

• Transparency is based on the properties of the human visual system. A transparent watermark causes no visible artifacts or perceptible quality loss.

• Complexity describes the effort and time that the watermark embedding and retrieval algorithms need for encoding and decoding the JPEG images or MPEG streams. This parameter is essential for real-time applications. A related aspect is whether or not the original data is needed in the retrieval process.

• Capacity describes how many information bits can be embedded. It also addresses the possibility of embedding multiple watermarks in one document at the same time.

4.1 Preliminary Definitions

An uncompressed video stream can be considered as a three-dimensional signal, whose components are a two-dimensional matrix (the picture) and time. Each picture is formed by an [r × c] matrix of pixels; each pixel expresses a color that can be represented by three values: R (red), G (green), B (blue) or, equivalently, by the Y (luminance) and U, V (chrominance) components. In our algorithm, we consider the Y-U-V representation, and we embed a mark (a hash) on the luminance component (Y) of each picture of the video stream.

The raw video is encoded via MPEG-2 [23] to test authentication robustness against compression. According to the MPEG-2 coding algorithm, the raw video frames are compressed into three kinds of frames: Intra-coded frames (I-frames), Predictive-coded frames (P-frames), and Bi-directionally predictive-coded frames (B-frames). A video sequence is split into subsequences called GOPs (Groups Of Pictures). Every GOP has a fixed length and a definite structure including one I-frame and a set of P-frames and B-frames.

We consider an uncompressed video stream S as a sequence of blocks S = <B0, . . . , Bn−1>. Each block Bi, with i ∈ [0, n), is composed of s pictures: Bi = <p0, . . . , ps−1>. The watermarking algorithm W(◦) embeds the same mark h (for our purposes the mark is a hash) in all the pictures within the same block Bi, thus changing it into


X =
    x0 x1 x2 x0 x1
    x2 x0 x1 x2 x0
    x1 x2 x0 x1 x2
    x0 x1 x2 x0 x1
    x2 x0 x1 x2 0

Figure 4.1: Replica 5-matrix X of vector x = <x0, x1, x2> (the symbols are tiled row-major; the final, incomplete replica is padded with zeros).

Y =
    0 ... 0  0 ... 0
    :     :  :     :
    0 ... 0  0 ... 0
    0 ... 0
    :     :     X
    0 ... 0

Figure 4.2: Container r × c matrix Y defined over the replica d-matrix X: all entries are zero except the bottom-right d × d block, which holds X.

Bi = W(Bi, h). One or more blocks Bi are coded by the MPEG-2 compression algorithm, thus obtaining B̂i, which is the GOP in the MPEG-2 notation.

The receiver decodes B̂i and uses the algorithm E(◦) to extract the embedded hash h from the pictures belonging to the same watermarked block Bi.

Now we define some matrix operators that are used in the following to manage the matrices representing the pictures and the mark. Let x be a vector of L symbols <x0, . . . , xL−1>; we define the replica d-matrix of x as the d × d matrix X = [xij] (with i, j ∈ [0, d)), where:

    xij = 0                             if i = d − 1 and j ≥ d − (d² mod L)
    xij = xl, with l = (d · i + j) mod L   otherwise                        (4.1)

Figure 4.1 shows the replica 5-matrix of a vector of three symbols x = <x0, x1, x2>. Hereafter we denote by repk(X) the (k + 1)-th replica of x within X (with k ∈ [0, ⌊d²/L⌋)), and by repk,l(X) the l-th symbol of repk(X), i.e. repk,l(X) = xi,j, where i = ⌊(k·L + l)/d⌋ and j = (k·L + l) mod d, so that repk(X) = <repk,0(X), . . . , repk,L−1(X)>. We also define the container r × c matrix Y = [ypq] as:

    ypq = 0     if p < r − d or q < c − d
    ypq = xij,  with i = p − (r − d) and j = q − (c − d)   otherwise        (4.2)

Figure 4.2 shows a container matrix defined over a replica d-matrix X. Given an r × c matrix Y = [yi,j], we define Φd(Y) as its d × d sub-matrix of elements yi,j with i ∈ [r − d, r) and j ∈ [c − d, c). In other words, Y is the r × c container matrix of Φd(Y).
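To make the two operators concrete, the following Python sketch builds the replica d-matrix of Eq. (4.1) and the container matrix of Eq. (4.2); the function names are ours, not the thesis's.

```python
import numpy as np

def replica_matrix(x, d):
    """Replica d-matrix of Eq. (4.1): x is tiled row-major over a d x d
    grid; the cells of the incomplete final replica are set to zero."""
    L = len(x)
    X = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            if not (i == d - 1 and j >= d - (d * d) % L):
                X[i, j] = x[(d * i + j) % L]
    return X

def container_matrix(X, r, c):
    """Container r x c matrix of Eq. (4.2): all zeros except the
    bottom-right d x d corner, which holds the replica matrix X."""
    d = X.shape[0]
    Y = np.zeros((r, c))
    Y[r - d:, c - d:] = X
    return Y

X = replica_matrix([1, -1, 1], d=5)    # the example of Figure 4.1
Y = container_matrix(X, r=8, c=8)
```

With d = 5 and L = 3, d² mod L = 1, so only the last cell of the bottom row is zero-padded and ⌊25/3⌋ = 8 full replicas fit in X.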



4.2 Embedding the mark

Let us consider B as a block (sequence) of pictures, and h the hash value to be embedded in each picture of block B. Note that, due to specific features of the mark extraction (discussed in Chapter ??), the mark must be represented as a sequence of symbols in {−1, 1}. For this reason, the result h of the hash function is a binary sequence of {−1, 1} symbols.

Let pi(Y) be the luminance component of picture pi, DCT(◦) the Discrete Cosine Transform, H the r × c container matrix of the replica d-matrix of h (where r × c is the size of the picture), ni an r × c matrix of pseudo-noise values in {−1, 1}, and α a real value in the range [0, 1].

For each picture pi (with 0 ≤ i ≤ s − 1) in B, the mark h is embedded in pi by using the following equations:

    Pi = DCT(pi(Y))                       (4.3)

    Pi = Pi + |Pi| ∗ α ∗ ni ∗ H           (4.4)

Equation 4.4 embeds the container matrix H in Pi. However, since the components of H corresponding to the low/medium frequencies of Pi are null, and only the components corresponding to the medium/high frequencies of Pi are non-null, this equation has the effect of embedding the h replicas (that are included in H) in the medium/high frequencies of Pi.

The amount of redundancy Red in the mark embedded in the picture can be expressed as the fraction of the area of the matrix H occupied by the mark replicas, i.e. Red = d²/(r × c). This parameter will be used in the presentation of the simulation results instead of d, r, c.

Finally, Pi is transformed back into the luminance component of the picture pi as pi(Y) = DCT−1{Pi}. Thus pi is the picture embedding the mark in its luminance component. The embedding procedure is detailed in Algorithm 1.
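The embedding of Eqs. (4.3) and (4.4) can be sketched in Python; we use `scipy.fft.dctn`/`idctn` as stand-ins for the 2-D DCT pair, and a synthetic luminance plane in place of a real picture (all parameter values are illustrative only):

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
r = c = 64
d, alpha, L = 32, 0.5, 20

h = rng.choice([-1, 1], size=L)              # the mark: L symbols in {-1, 1}
m = (d * d) // L                             # number of full replicas

flat = np.zeros(d * d)
flat[:m * L] = np.tile(h, m)                 # replica d-matrix (Eq. 4.1)
H = np.zeros((r, c))
H[r - d:, c - d:] = flat.reshape(d, d)       # container matrix (Eq. 4.2)

n = rng.choice([-1, 1], size=(r, c))         # shared pseudo-noise matrix n_i
p = rng.uniform(0, 255, size=(r, c))         # toy luminance plane p_i(Y)

P = dctn(p, norm='ortho')                    # Eq. (4.3)
P_marked = P + np.abs(P) * alpha * n * H     # Eq. (4.4), all elementwise
p_marked = idctn(P_marked, norm='ortho')     # back to the pixel domain
```

Since H is zero outside its bottom-right d × d corner, the low-frequency DCT coefficients are left untouched, which is what keeps the mark transparent.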



Algorithm 1: Function W(◦) embedding the hash value into a block of video pictures
Input: video block Bi belonging to the sequence S = [B0, . . . , BN−1]
Data: let h be the hash value to be embedded in Bi;
      let H be the container matrix associated to h;
      let p0, . . . , ps−1 be the pictures belonging to Bi;
      let α be a real value in the range [0, 1];
      let ni be an r × c matrix of pseudo-random numbers;
 1: for i = 0 to s − 1 do                 /* for each picture of the block */
 2:     Pi = DCT{pi(Y)}                   /* compute the DCT coefficients */
 3:     Pi = Pi + |Pi| ∗ α ∗ ni ∗ H       /* mark the picture */
 4:     pi(Y) = DCT−1{Pi}                 /* set the marked luminance values */
 5: end for

Algorithm 2: Function E(◦) retrieving the hash value from a block of video pictures
Input: video block B̂i belonging to the sequence Ŝ = [B̂0, . . . , B̂N−1]
Data: let p0, . . . , ps−1 be the pictures belonging to block B̂i;
      let m be the number of h replicas inside each picture;
      let ni be the same pseudo-noise matrix used by the transmitter;
 1: h′ = <0 . . . 0>
 2: for i = 0 to s − 1 do                 /* for each picture of the block */
 3:     Pi = DCT{pi(Y)}                   /* compute the DCT coefficients */
 4:     h′ = h′ + Σ_{j=0}^{m−1} repj(Φd(Pi ∗ ni))   /* accumulate the replicas */
 5: end for
 6: h′′ = sign(h′)                        /* compute the mark via the sign function */



4.3 Retrieval of information

Given an incoming block B consisting of s pictures, let pi (with i ∈ [0, s)) be the decoded pictures, and Pi the DCT of the luminance component (Y) of pi, that is, Pi = DCT(pi(Y)). Let also L be the hash length, d the size of the replica matrix of h used by the transmitter to embed the hash h in the picture, s the number of pictures per block, and m = ⌊d²/L⌋ the number of hash replicas in each picture. The receiver recovers the hash h embedded in the block B by computing the sign of a vector of real values h′ that is extracted from the received (and decoded) block B. The computed hash h′′ may differ from h due to the noise introduced by the hosting picture. In particular, the hash h′′ is computed as:

    h′′ = sign(h′)                                                          (4.5)

In turn, h′ is extracted from the medium/high frequencies of Pi by using the following equation:

    h′ = Σ_{i=0}^{s−1} Σ_{j=0}^{m−1} repj(Φd(Pi ∗ ni))                      (4.6)

In this equation the inner sum accumulates the contributions of the hash replicas within a given picture, and the outer sum accumulates the contributions from all the pictures in the block (recall that each picture embeds a copy of the mark, and the receiver evaluates the mark by combining the marks extracted from all the pictures). Observing that Pi ∗ ni = Pi ∗ ni + |Pi| ∗ α ∗ H, we have:

    h′ = Σ_{i=0}^{s−1} Σ_{j=0}^{m−1} { repj(Φd(Pi)) ∗ repj(Φd(ni)) + repj(Φd(|Pi|)) ∗ α ∗ repj(Φd(H)) }   (4.7)

By definition, repj(Φd(H)) = h. Furthermore, the term repj(Φd(Pi)) ∗ repj(Φd(ni)) in Eq. 4.7 behaves as a noise term, since it contains values from the pictures multiplied by pseudo-random numbers in {−1, 1}; thus, its contribution approaches 0 for increasing values of s and m. Hence, for sufficiently high values of s and m, h′ converges to:

    lim_{s,m→∞} h′ = Σ_{i=0}^{s−1} Σ_{j=0}^{m−1} repj(Φd(|Pi|)) ∗ α ∗ h     (4.8)



Replacing Eq. 4.8 in Eq. 4.5 we obtain:

    lim_{s,m→∞} h′′ = sign( Σ_{i=0}^{s−1} Σ_{j=0}^{m−1} repj(Φd(|Pi|)) ∗ α ∗ h )

Finally, observing that repj(Φd(|Pi|)) ∗ α ≥ 0 for each j, i, yields:

    lim_{s,m→∞} h′′ = sign( Σ_{i=0}^{s−1} Σ_{j=0}^{m−1} h ) = h

Hence, for sufficiently high values of s and m, the receiver correctly extracts the mark h from the block.
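The whole embed-and-extract round trip of Eqs. (4.4)–(4.6) can be sketched as follows, on a toy block of s synthetic pictures; since each entry of n is in {−1, 1}, multiplying the marked coefficients by n and summing the replicas recovers the sign of each symbol:

```python
import numpy as np
from scipy.fft import dctn

rng = np.random.default_rng(1)
r = c = 64
d, alpha, L, s = 48, 0.5, 16, 4               # s pictures per block

h = rng.choice([-1, 1], size=L)               # the embedded mark
m = (d * d) // L                              # replicas per picture
flat = np.zeros(d * d)
flat[:m * L] = np.tile(h, m)
H = np.zeros((r, c))
H[r - d:, c - d:] = flat.reshape(d, d)        # container matrix
n = rng.choice([-1, 1], size=(r, c))          # shared pseudo-noise matrix

h_acc = np.zeros(L)
for _ in range(s):                            # one pass per picture in B
    P = dctn(rng.uniform(0, 255, size=(r, c)), norm='ortho')
    Pm = P + np.abs(P) * alpha * n * H        # sender side, Eq. (4.4)
    # receiver side, Eq. (4.6): sum the m replicas of Phi_d(Pm * n)
    corner = (Pm * n)[r - d:, c - d:].reshape(-1)
    h_acc += corner[:m * L].reshape(m, L).sum(axis=0)

h_ext = np.sign(h_acc)                        # Eq. (4.5)
```

With m = 144 replicas per picture and s = 4 pictures, the signal term dominates the picture noise by a wide margin and the mark is extracted without errors.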


Chapter 5

Theoretical framework for mark extraction probability

In this chapter we give a theoretical framework that can be used to evaluate the mark extraction probability. Without loss of generality, we assume that mark embedding and extraction are performed on one picture. Given a block B, and under the assumption that B contains a single picture p, the symbol h′′l (l ∈ [0, L)) of the extracted mark h′′ can be computed from equations (4.5), (4.6), and (4.7) as the sign of h′l, where:

    h′l = Σ_{j=0}^{m−1} { repj,l(Φd(P)) ∗ repj,l(Φd(n)) + repj,l(Φd(|P|)) ∗ α ∗ repj,l(Φd(H)) }   (5.1)

and P is the DCT of the luminance component of p. By definition, repj,l(Φd(H)) is the bit l of the embedded hash h, that is, repj,l(Φd(H)) = hl. Hence, letting Nl = Σ_{j=0}^{m−1} repj,l(Φd(P)) ∗ repj,l(Φd(n)) and Nl+ = Σ_{j=0}^{m−1} repj,l(Φd(|P|)), Eq. (5.1) can be rewritten as:

    h′l = Nl + α ∗ hl ∗ Nl+                                                 (5.2)

Let X1 be the random variable associated to Nl − α ∗ Nl+ and X2 the random variable associated to Nl + α ∗ Nl+. Recalling that hl is a binary value in the set {−1, 1}, the random variable X associated to h′l is such that:

    X = X1 with probability 0.5
        X2 with probability 0.5                                             (5.3)



Let ψ+ and ψ− be the Probability Mass Functions (PMFs) associated to X2 and X1, respectively, and let Ψ+ and Ψ− be the Cumulative Distribution Functions (CDFs) of ψ+ and ψ−, respectively. We define the probability of false positive Pe+ as the probability of receiving the symbol h′l = −1 in place of the transmitted symbol hl = +1, and the probability of false negative Pe− as the probability of receiving the symbol h′l = +1 in place of the transmitted symbol hl = −1. The probability that h′l ≠ hl (that is, that symbol l of the hash is not computed correctly by the receiver) can thus be calculated as the probability of false positive Pe+ when hl = +1, or of false negative Pe− when hl = −1:

    Pe+ = Pe(h′l = −1 | hl = +1) = Ψ+(0)
    Pe− = Pe(h′l = +1 | hl = −1) = 1 − Ψ−(0)                                (5.4)

The mark extraction probability can be evaluated according to Eq. (5.4), provided that the distributions ψ+ and ψ− are known. In the next subsection we consider three video samples and we show that, with high probability, X1 and X2, obtained by embedding the mark in these videos, are normally distributed. Under this assumption, we evaluate the mark extraction probability in Section 5.2.

5.1 Gaussianity test of random variables

We test the gaussianity of X1 and X2 by evaluating the distribution ψ of the random variable X as defined in Eq. (5.3). It is immediate that ψ = ½ψ+ + ½ψ−. We perform the gaussianity test by using high-order moments and the mean absolute value analysis. The following lemmas prove intermediate results.

Lemma 5.1.1 E[X1] = −E[X2].

Proof 5.1.2 We observe that:

    E[X1] + E[X2] = (1/L) Σ_{l=0}^{L−1} (Nl − α · Nl+) + (1/L) Σ_{l=0}^{L−1} (Nl + α · Nl+) = (2/L) Σ_{l=0}^{L−1} Nl = 2 · E[Nl]

Note that lim_{L→∞} E[Nl] = 0; in fact Nl = Σ_{j=0}^{m−1} repj,l(Φd(P)) · repj,l(Φd(n)), where the entries of n are pseudo-random values in {−1, 1}. Hence, the thesis follows.



Lemma 5.1.3 X1 and X2 have the same variance σ².

Proof 5.1.4 Let −µ and µ be the means of X1 and X2, respectively (see Lemma 5.1.1). From the definition of variance, we have:

    E[(X1 + µ)²] − E[(X2 − µ)²] = E[(Nl − α · Nl+ + µ)²] − E[(Nl + α · Nl+ − µ)²] = 4µE[Nl] − 4αE[Nl Nl+]

Note that lim_{L→∞} E[Nl] = 0 and lim_{L→∞} E[Nl Nl+] = 0; in fact Nl = Σ_{j=0}^{m−1} repj,l(Φd(P)) · repj,l(Φd(n)), where the entries of n are pseudo-random values in {−1, 1}. Hence, the thesis follows.

Lemma 5.1.5 If X1 and X2 are normally distributed, then the second-order moment of X is E[X²] = µ² + σ², where −µ and µ are the means of X1 and X2, respectively, and σ² is the variance of X1 and X2.

Proof 5.1.6 From the definition of the second-order moment we have:

    E[X²] = ∫ x² · ½(ψ+(x) + ψ−(x)) dx = ½ ∫ x² ψ+(x) dx + ½ ∫ x² ψ−(x) dx = ½E[X2²] + ½E[X1²]

Under the hypothesis that X1 and X2 are normally distributed, their second-order moments are E[X1²] = µ² + σ² and E[X2²] = µ² + σ². Hence, the thesis follows.

Lemma 5.1.7 If X1 and X2 are normally distributed, then the fourth-order moment of X is E[X⁴] = µ⁴ + 6σ²µ² + 3σ⁴, where −µ and µ are the means of X1 and X2, respectively, and σ² is the variance of X1 and X2.

Proof 5.1.8 From the definition of the fourth-order moment we have:

    E[X⁴] = ∫ x⁴ · ½(ψ+(x) + ψ−(x)) dx = ½ ∫ x⁴ ψ+(x) dx + ½ ∫ x⁴ ψ−(x) dx = ½E[X2⁴] + ½E[X1⁴]

Under the hypothesis that X1 and X2 are normally distributed, their fourth-order moments are E[X1⁴] = µ⁴ + 6σ²µ² + 3σ⁴ and E[X2⁴] = µ⁴ + 6σ²µ² + 3σ⁴. Hence, the thesis follows.

We test the gaussianity of X1 and X2 according to the following procedure. We first generate samples of the random variable X by simulating the mark embedding and extraction over a video sample, and we use these samples to estimate the second- and fourth-order moments of X. Assuming that X1 and X2 are normally distributed, their means −µ and µ and their variance σ² can be computed by exploiting Lemmas 5.1.5 and 5.1.7 as:

    µ² + σ² = E[X²]
    µ⁴ + 6σ²µ² + 3σ⁴ = E[X⁴]

Let us now consider two normally distributed random variables Y1 and Y2 with variance σ² and means −µ and µ, respectively. Let Y be defined as:

    Y = Y1 with probability 0.5
        Y2 with probability 0.5                                             (5.5)

We generate random samples of the random variable Y and we estimate the mean absolute value E[|Y|]. We repeat this procedure, computing a set of E[|Y|] values, and we define an admissibility interval AI over them, that is, AI = [Q5{E[|Y|]}, Q95{E[|Y|]}], where Q5{E[|Y|]} and Q95{E[|Y|]} are the 5th and 95th percentiles, respectively.

We test the gaussianity of X1 and X2 by comparing the mean absolute value E[|X|] of X with the admissibility interval AI. If the mean absolute value of X belongs to the interval AI, then X behaves as the mixture of the two normally distributed variables X1 ∈ N(−µ, σ²) and X2 ∈ N(µ, σ²). The error bars in Figs. 5.1(a), 5.1(b), 5.1(c) show the admissibility intervals for each configuration parameter of Table 6.1 for the three video samples, while the dotted lines fit the mean absolute values obtained after mark embedding. In all cases, the evaluated mean absolute value belongs to the admissibility interval. We conclude that the embedded mark coefficients h′l can be modelled as the outcomes of a random variable X that, in turn, is the mixture of the two normally distributed random variables X1 and X2. In the following subsection we show simulation results about the mark extraction probability by evaluating the mean and variance of the DCT coefficients of the three video samples.
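As a numerical sketch of the moment-matching step, the system of Lemmas 5.1.5 and 5.1.7 can be inverted in closed form: substituting σ² = E[X²] − µ² into the fourth-moment equation gives µ⁴ = (3E[X²]² − E[X⁴])/2. The parameters below are synthetic, not the thesis measurements:

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true, sigma_true = 3.0, 1.5

# sample the mixture X of Eq. (5.3): N(-mu, sigma^2) or N(mu, sigma^2),
# each with probability 0.5
signs = rng.choice([-1.0, 1.0], size=200_000)
X = rng.normal(signs * mu_true, sigma_true)

m2, m4 = np.mean(X ** 2), np.mean(X ** 4)

# invert the system  m2 = mu^2 + s^2,  m4 = mu^4 + 6 s^2 mu^2 + 3 s^4
mu_est = ((3 * m2 ** 2 - m4) / 2) ** 0.25
sigma_est = (m2 - mu_est ** 2) ** 0.5
```

The recovered (µ, σ) can then feed the mean-absolute-value admissibility test described above.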

5.2 Mark extraction probability

In this section we show how to compute the mark extraction probability, and we present simulation results that validate our procedure. The mark extraction probability can be computed from Eq. (5.4), observing that Ψ+ and Ψ− are the CDFs of two normally distributed random variables that have opposite means (see Lemma 5.1.1) and the same variance (see Lemma 5.1.3). The probability Pe that the extracted symbol is wrong can be expressed as Pe = Pe+ + Pe−, that is, Pe = 2 · Ψ+(0).

Figure 5.1: Theoretical intervals and measured Mean Absolute Value, using akiyo (a), foreman (b), and coastguard (c) video samples.

The probability Pt of correct mark extraction, that is, the probability that the receiver computes h′′ from h′ correctly, is thus expressed as:

    Pt = (1 − Pe)^L                                                         (5.6)

where L is the number of mark symbols. Let Pm be the mark extraction rate, defined as the number of pictures from which the mark has been correctly extracted divided by the number of pictures in the whole video stream. We compare the mark extraction rate Pm with Pt, obtained by evaluating Pe and Pt on the same video samples used in the simulations.

We draw a set of Pt values for each pair of configuration parameters α and Red (expressing the amount of redundancy in the mark replicas). Figs. 5.2(a), 5.2(b), 5.2(c) show, for each configuration, the values of Pm and the error bars representing the 5th and 95th percentiles of the calculated Pt values. The error bars associated to the Pt intervals are quite small in the cases where the mark extraction probability is high enough. For example, configurations 24, . . . , 28, 30, . . . , 34 have a good Pt estimation, and the mark extraction ratio is almost 1. We suggest using configuration parameters that guarantee a mark extraction ratio equal to 1; in that case, the mark extraction model is particularly precise.
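Under the Gaussian model, Eqs. (5.4) and (5.6) translate directly into code: Ψ+(0) is the CDF of N(µ, σ²) evaluated at zero. The µ, σ values below are illustrative, not measured from the thesis videos:

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def correct_extraction_prob(mu, sigma, L):
    """P_t of Eq. (5.6) from the per-symbol errors of Eq. (5.4)."""
    pe_pos = norm_cdf(0.0, mu, sigma)         # Pe+ = Psi+(0)
    pe_neg = 1.0 - norm_cdf(0.0, -mu, sigma)  # Pe- = 1 - Psi-(0)
    pe = pe_pos + pe_neg                      # Pe = Pe+ + Pe-
    return (1.0 - pe) ** L

pt = correct_extraction_prob(mu=400.0, sigma=100.0, L=160)
```

Higher redundancy (larger s and m) increases µ relative to σ, driving Pe toward 0 and Pt toward 1.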


Figure 5.2: Theoretical intervals and measured extraction frequencies, using akiyo (a), foreman (b), and coastguard (c) video samples.


Chapter 6

Simulation results on video quality degradation

In this chapter we present simulation results concerning the video quality degradation at the receiver side versus the mark extraction rate, both in the scenario where the video is compressed with MPEG-2 and in the scenario where the video is not compressed. In the simulations, we consider three widely used video sequences, namely akiyo, foreman and coastguard [28], and we assess the influence of the parameters α and Red in the unfavorable case where s = 1. Note that s > 1 implies a higher redundancy in the mark replicas and, thus, a higher probability of mark extraction.

To assess the quality of the achieved mark transparency, we have compared the watermarked and the original video using the standard PSNR (Peak Signal-to-Noise Ratio) metric [29], a well-known method for evaluating the differences between an original and a modified video source. The PSNR computed over the Y component of two pictures is expressed as:

    PSNR = 10 log10( (2^n − 1)² / MSE(Y) )

where n is the number of bits used in each pixel to represent the Y component, and MSE(Y) is:

    MSE(Y) = (1/(3wh)) Σ_{i=1}^{N} (p_i^o(Y) − p_i^w(Y))²

where [w, h] is the picture size, while p_i^o(Y) and p_i^w(Y) are the luminance components of the original and watermarked pictures, respectively. PSNR is a good metric to evaluate the video quality variations for a given compression algorithm and for a given video [30]. In particular, in [31, 32] the authors show that PSNR values over 30 dB are always proof of negligible degradation of the video quality. In the simulations we evaluate the video quality by analyzing the PSNR decay of the pictures after the mark embedding and the MPEG-2 encoding.
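The PSNR computation can be sketched as follows; here the MSE is averaged over the w·h pixels of a single luminance plane (a simplification of the per-stream formula above), and the pictures are synthetic:

```python
import numpy as np

def psnr_y(orig, marked, n_bits=8):
    """PSNR over the luminance (Y) planes of two pictures."""
    mse = np.mean((orig.astype(float) - marked.astype(float)) ** 2)
    return 10.0 * np.log10((2 ** n_bits - 1) ** 2 / mse)

orig = np.full((288, 352), 128, dtype=np.uint8)   # toy CIF-sized Y plane
marked = orig.copy()
marked[0, 0] += 9                                  # perturb a single pixel
quality = psnr_y(orig, marked)   # well above the 30 dB transparency threshold
```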


Table 6.1: Configuration Parameters (C.P.)

    C.P.  α    Red  |  C.P.  α    Red
    0     0.1  0.1  |  18    0.3  0.5
    1     0.1  0.2  |  19    0.3  0.6
    2     0.1  0.3  |  20    0.3  0.7
    3     0.1  0.4  |  21    0.4  0.1
    4     0.1  0.5  |  22    0.4  0.2
    5     0.1  0.6  |  23    0.4  0.3
    6     0.1  0.7  |  24    0.4  0.4
    7     0.2  0.1  |  25    0.4  0.5
    8     0.2  0.2  |  26    0.4  0.6
    9     0.2  0.3  |  27    0.4  0.7
    10    0.2  0.4  |  28    0.5  0.1
    11    0.2  0.5  |  29    0.5  0.2
    12    0.2  0.6  |  30    0.5  0.3
    13    0.2  0.7  |  31    0.5  0.4
    14    0.3  0.1  |  32    0.5  0.5
    15    0.3  0.2  |  33    0.5  0.6
    16    0.3  0.3  |  34    0.5  0.7
    17    0.3  0.4  |

We conduct simulations using different pairs of parameters α and Red (the amount of redundancy in the mark embedding). All these configurations are shown in Table 6.1, and each configuration is referred to with a progressive number that is used in the graphs to represent the outcomes of the simulations for that configuration. Finally, note that for some of the simulations herewith reported, we have also created a web page, showing the video with the watermarking information. A subset of the video samples is available at [33].

6.1 Raw video

In this section we show the simulation results on raw video. Figures 6.1(a), 6.1(c), 6.1(e) report the results for the videos akiyo, foreman, and coastguard, respectively. Each figure presents the PSNR (in the upper part) and the mark extraction rate (in the lower part) computed over the whole video as a function of the configuration parameters. The simulations on the three videos provide similar results; in particular, the values of α and Red have a different impact on the video quality, and high values of Red degrade the PSNR more than high values of α. Furthermore, for a given mark extraction rate, a strong amplification factor α appears to provide a better video quality than that obtained by incrementing the amount of redundancy Red in the mark replicas. For example, in Fig. 6.1(a) the mark extraction rate is almost 1 for configurations 23, . . . , 27, 29, . . . , 34. Among these configurations, the highest PSNR values are reached by configurations 23, 29, and 30, which correspond to (α = 0.4, Red = 0.3), (α = 0.5, Red = 0.2), and (α = 0.5, Red = 0.3), respectively; their PSNR values are about 55, 57, and 54 dB. Thus, in this case, configuration 29 offers the best tradeoff between video quality and mark extraction rate.

6.2 MPEG-2 video transcoding

In this section we show the simulation results on MPEG-2 compressed videos. We use the same video samples as in [28]. Figures 6.1(b), 6.1(d), 6.1(f) show the PSNR and the mark extraction rate obtained for the three videos by using the configuration parameters of Table 6.1. Also in this case, the results are quite independent of the actual video used. The figures show that, compared to the case where no compression is used, with MPEG-2 coding the mark extraction rate uniformly decreases for all the configuration parameters. This is due to the quantization of the compression phase, which reduces the mark strength. Moreover, the behavior of the PSNR and of the mark extraction rate as a function of Red and α is similar to that observed in Subsection 5.2. For example, in Fig. 6.1(b), configurations 27, 32, 33 and 34 show the highest mark extraction rate. Among these configurations, the highest PSNR values are reached by configurations 33 and 34, which correspond to (α = 0.5, Red = 0.6) and (α = 0.5, Red = 0.7), respectively; their PSNR values are about 46 dB and 43 dB. Thus, in this case, configuration 33 offers the best tradeoff between video quality and mark extraction rate.


Figure 6.1: Mark extraction ratio vs PSNR, using akiyo, foreman, and coastguard considered as raw video samples (a), (c), (e), and after MPEG-2 compression (b), (d), (f).


Chapter 7

Hard authentication: the baseline

The simplest approach exploits a single hash chain and a digital signature based on DSA, as illustrated in Fig. 7.1. The hash chain (which, for example, can be built over SHA-1 [34]) is hidden in the video stream pictures by means of the watermarking technique described in Chapter 4.

Algorithm 3 describes in detail how the sender builds up the hash chain. The transmitter is fed an uncompressed video stream S and interprets it as a sequence of blocks of pictures <B0, . . . , Bn−1>, so that each block is afterwards encoded as a GOP by the MPEG-2 algorithm. The last block of the video sequence is compressed first: B̂n−1 = Mpeg2(Bn−1), where Mpeg2(◦) denotes the MPEG-2 coding algorithm. The construction of the hash chain starts from the block of index n − 2. The hash value hi is calculated over the GOP B̂i through the function hi = hash(B̂i). The value hi is inserted into the block Bi−1 by means of the watermarking function, thus yielding Bi−1 = W(Bi−1, hi). The watermarked block Bi−1 is then encoded into the GOP B̂i−1 = Mpeg2(Bi−1), which is the argument for the next hash computation. The procedure is iterated over all the blocks down to B̂0. Finally, B̂0 is authenticated by means of a digital signature SIG = Signature(hash(B̂0)). The marked GOP sequence Ŝ = <B̂0, . . . , B̂n−1> is then transmitted after the digital signature SIG.

Figure 7.1: Simple hash chain constructed on three blocks.

Algorithm 3: Preprocessing (set-up phase) leveraging the simple hash chain, and broadcast phase
Input: video stream sequence S
Data: let S = <B0, . . . , Bn−1> be the uncompressed video stream block sequence
 1: B̂n−1 = Mpeg2(Bn−1)                    /* code the last block */
 2: for i = n − 2 to 0 do                 /* for each block in the video sequence */
 3:     hi+1 = hash(B̂i+1)                 /* hash the next block */
 4:     Bi = W(Bi, hi+1)                  /* mark the block */
 5:     B̂i = Mpeg2(Bi)                    /* code the block */
 6: end for
 7: SIG = Signature(hash(B̂0))
 8: Ŝ = <B̂0, . . . , B̂n−1>
 9: Transmit(SIG, Ŝ)                      /* transmit both the signature and the marked video sequence */
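The back-to-front construction of Algorithm 3 can be sketched with `hashlib`; `watermark`, `mpeg2` and `sign` are toy stand-ins of ours for W(◦), Mpeg2(◦) and the DSA signature (byte concatenation, identity, and a tag, respectively):

```python
import hashlib

def hash_gop(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()      # SHA-1, as suggested in the text

def sender_chain(blocks, watermark, mpeg2, sign):
    """Algorithm 3: build the hash chain from the last block backwards."""
    n = len(blocks)
    coded = [b''] * n
    coded[n - 1] = mpeg2(blocks[n - 1])     # the last block carries no mark
    for i in range(n - 2, -1, -1):
        h_next = hash_gop(coded[i + 1])     # hash of the next GOP
        coded[i] = mpeg2(watermark(blocks[i], h_next))
    sig = sign(hash_gop(coded[0]))          # one signature for the whole stream
    return sig, coded

sig, stream = sender_chain(
    [b'B0', b'B1', b'B2'],
    watermark=lambda b, h: b + h,           # toy W(.): append the hash
    mpeg2=lambda b: b,                      # toy Mpeg2(.): identity
    sign=lambda digest: b'SIG:' + digest,   # toy DSA signature
)
```

The single signature amortizes the expensive DSA operation over the whole stream; every later block is authenticated transitively through the chain.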

Once the receiver receives the sequence (SIG, Ŝ), it decodes the MPEG-2 stream and extracts a GOP sequence from the compressed video stream Ŝ via ExtGOP(◦), that is, <B̂0, . . . , B̂n−1> = ExtGOP(Ŝ); then it authenticates the GOPs. To this purpose it computes the hash of the first GOP, h0 = hash(B̂0), and it uses this value to verify the digital signature SIG with VerSignature(SIG, h0).

If the test succeeds, the receiver can subsequently authenticate the video stream by checking the authenticity of each block of the hash chain. The hash value hi inserted by the sender is extracted from the block Bi−1 through the watermark retrieval function E(Bi−1), where Bi−1 is obtained from each GOP of the video stream through Bi−1 = UnMpeg2(B̂i−1). The value hi is then compared with the hash calculated by the receiver over the GOP B̂i, that is, hash(B̂i). The execution proceeds iteratively only if the test is successful for all the blocks. If a picture is corrupted (for instance, forged or changed during the channel transmission), its hash changes and, consequently, the authentication test fails. Once a failure is triggered, all subsequent pictures cannot be authenticated. Algorithm 4 shows in detail the algorithm executed by the receiver.


Algorithm 4: On-the-fly verification of the sequence (SIG, Ŝ)
Input: signature and marked video sequence (SIG, Ŝ)
 1: Receive(SIG, B̂0)
 2: h = hash(B̂0)                          /* hash the first block */
 3: if NOT VerSignature(SIG, h) then      /* verify the first block's authenticity */
 4:     authentication failed; exit
 5: else
 6:     B0 = UnMpeg2(B̂0)
 7:     h1 = E(B0)
 8: end if
 9: for i = 1 to n − 1 do
10:     wait for B̂i arrival
11:     Bi = UnMpeg2(B̂i)                  /* decode the block */
12:     hi+1 = E(Bi)                       /* extract the hash */
13:     if hi ≠ hash(B̂i) then             /* trust the current block */
14:         authentication failed; exit
15:     end if
16: end for
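The receiver side of Algorithm 4 follows the same toy conventions as the sender sketch above; here E(◦) is mimicked by reading back the 20 bytes (the SHA-1 digest length) that the toy watermark appended, and all stand-ins are ours:

```python
import hashlib

def hash_gop(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def verify_stream(sig, coded, extract, unmpeg2, ver_sig):
    """Algorithm 4: on-the-fly verification of (SIG, S_hat)."""
    if not ver_sig(sig, hash_gop(coded[0])):
        return False                        # signature check on the first GOP
    h_next = extract(unmpeg2(coded[0]))     # h_1, hidden in the first GOP
    for i in range(1, len(coded)):
        if h_next != hash_gop(coded[i]):    # trust check for block i
            return False
        h_next = extract(unmpeg2(coded[i])) # hash for the next iteration
    return True

# build a 3-block chain with the same toy stand-ins as on the sender side
coded = [b'', b'', b'B2']
coded[1] = b'B1' + hash_gop(coded[2])
coded[0] = b'B0' + hash_gop(coded[1])
sig = b'SIG:' + hash_gop(coded[0])

ok = verify_stream(
    sig, coded,
    extract=lambda b: b[-20:],              # SHA-1 digests are 20 bytes
    unmpeg2=lambda b: b,
    ver_sig=lambda s, h: s == b'SIG:' + h,
)
```

As the text notes, once a single block fails the check, all subsequent blocks are unverifiable: the chain carries no redundancy.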


Chapter 8

Hard authentication algorithm

In this chapter we present a novel authentication scheme based on the Duplex Hash Chain (hereafter DxHC), which allows the authentication of video streams transmitted over low-noise channels, such as satellite links. The scheme is based on signature amortization with a hash chain: the DSA signature is computed only once at the sender side, while the authentication information is propagated over the whole video stream by means of the hash chain. Although satellite links are very reliable by design, the corruption of a few bits cannot be completely neglected. For this reason, we enhance the authentication procedure proposed in [21] to deal with a (low) bit error rate. The improvement is based on the observation that the hash extraction can be performed anyway, even on a corrupted video sample; we gave the proof of this in Chapter ??. Figure 8.1 shows the improved scheme with the DxHC for a 4-block video stream. Referring to Figure 8.1, let us consider blocks B̂1, B̂2, and B̂3, and let us assume that some bits of B̂2 are corrupted. It is immediate that the authentication of B̂2 by means of the hash h2 contained in B̂1 fails. However, due to the properties of mark extraction, it is still possible to correctly extract the hashes h3 and hh3 from B̂2 despite its bit errors. The effect of bit errors on mark extraction is discussed later in this chapter.

These hashes are authenticated by means of the hash hh2 contained in B̂1, and they can then be used to authenticate B̂3. This means that, with the DxHC, bit failures in B2 affect only the authentication of B2, but do not prevent the authentication of the other blocks.

Note that this new scheme is completely different from the double hash chain scheme: the DxHC scheme is not constituted by two simple hash chains, but is obtained as the combination of one simple hash chain, that authenticates the subsequent block and it is fragile to the packet loss, and a second hash chain that authenticates the simple hash chain and it is robust to packet loss. Table 5 describes in details how the sender


[Figure: DSA(B̂_0) signs the first block; the hashes h_1, hh_1, h_2, hh_2, h_3, hh_3, h_4 link the blocks B_0, ..., B_4 to their coded versions B̂_0, ..., B̂_4.]

Figure 8.1: The DxHC scheme for dealing with bit error rate.

builds the hash chain. Again, the transmitter is provided with an uncompressed video stream S and interprets it as a sequence of block pictures < B_0, ..., B_{n-1} >. The authentication procedure works as follows. The last block of the video sequence is first compressed into B̂_{n-1} = Mpeg2(B_{n-1}). The construction of the hash chain starts from the block of index n-1. The hash value h_{n-1} of B̂_{n-1} is inserted into block B_{n-2} by means of the watermarking function, yielding B_{n-2} = W(B_{n-2}, h_{n-1}). Block B_{n-2} is then coded into B̂_{n-2} = Mpeg2(B_{n-2}). If i < n-2, two hash values h_{i+1} and hh_{i+1} are inserted into block B_i, where h_{i+1} = hash(B̂_{i+1}) and hh_{i+1} is computed over the concatenation of h_{i+2} and hh_{i+2}, that is, hh_{i+1} = hash(h_{i+2} || hh_{i+2}). Each marked block is coded with the MPEG-2 algorithm and buffered as B̂_i. Finally, the first GOP (B̂_0) is authenticated through a digital signature SIG = Signature(hash(B̂_0)). The signature and the marked GOP sequence < B̂_0, ..., B̂_{n-1} > are then transmitted.
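The construction above can be sketched in Python. The watermarking function W, the MPEG-2 coder, and the signature below are hypothetical stand-ins (the toy W simply appends the mark, SHA-1 replaces the scheme's hash function, and the signature is a plain hash); the sketch only illustrates the order in which the chain values h_i and hh_i are computed.

```python
import hashlib

def H(data: bytes) -> bytes:
    """SHA-1 stand-in for the hash() function of the scheme."""
    return hashlib.sha1(data).digest()

# Hypothetical stand-ins: W() embeds a mark, Mpeg2() codes a block.
def W(block: bytes, mark: bytes) -> bytes:
    return block + mark          # toy embedding: append the mark

def Mpeg2(block: bytes) -> bytes:
    return block                 # identity "coding", for illustration only

def build_dxhc(blocks):
    """Sender-side DxHC construction over raw blocks B_0 .. B_{n-1}."""
    n = len(blocks)
    coded = [b""] * n
    coded[n - 1] = Mpeg2(blocks[n - 1])   # last block carries no marks
    h = H(coded[n - 1])                   # h_{n-1}
    hh = b""                              # hh_{n-1} = 0
    for i in range(n - 2, -1, -1):
        marked = W(blocks[i], h)          # embed h_{i+1}
        if i < n - 2:
            marked = W(marked, hh)        # embed hh_{i+1}
        coded[i] = Mpeg2(marked)
        # hh_i must use h_{i+1}, so update hh before overwriting h.
        hh = H(h + hh)                    # hh_i = hash(h_{i+1} || hh_{i+1})
        h = H(coded[i])                   # h_i = hash(B̂_i)
    sig = H(h)                            # placeholder for Signature(h_0)
    return sig, coded
```

With the toy embedding, B̂_{n-2} is simply B_{n-2} with h_{n-1} appended; a real implementation would embed those bits through the watermarking algorithm instead.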


Algorithm 5: Preprocessing (set-up phase) leveraging DxHC, and broadcast phase

Output: Signature and marked video sequence (SIG, Ŝ)
Data: let S = < B_0, ..., B_{n-1} > be the uncompressed video stream block sequence

hh_{n-1} = 0;
/* Coding the last block */
B̂_{n-1} = Mpeg2(B_{n-1});
/* Hashing the last block */
h_{n-1} = hash(B̂_{n-1});
for i = n-2 to 0 do
    /* Marking with the first hash */
    B_i = W(B_i, h_{i+1});
    if i < n-2 then
        /* Marking with the second hash */
        B_i = W(B_i, hh_{i+1});
    end
    /* Coding the block */
    B̂_i = Mpeg2(B_i);
    /* Hashing the block */
    h_i = hash(B̂_i);
    /* Hashing the hash values */
    hh_i = hash(h_{i+1} || hh_{i+1});
end
/* Signature generation */
SIG = Signature(h_0);
Ŝ = < B̂_0, ..., B̂_{n-1} >;
Transmit(SIG, Ŝ);

Algorithm 6 shows in detail the procedure executed by the receiver. The authentication procedure works as follows. The receiver first authenticates B̂_0 by using the signature SIG. If the test succeeds, the receiver proceeds with the authentication of the other GOPs. In particular, let B̂_i be the last authenticated GOP (i < n-1) and h_i the hash embedded by the transmitter in B̂_i and extracted by the receiver. GOP B̂_{i+1} is authenticated if h_i = hash(B̂_{i+1}).
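The basic receiver loop (signature check on B̂_0, then chained hash checks) can be sketched as follows. Here `extract` is a hypothetical stand-in for the watermark extraction function E(), SHA-1 replaces the scheme's hash, and the signature is modeled as a plain hash; the duplex recovery path for corrupted blocks is deliberately omitted.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def authenticate_stream(sig, coded_blocks, extract):
    """Receiver-side check of a simple hash chain (no bit errors assumed).

    sig          -- placeholder signature, here H(H(first block))
    coded_blocks -- received sequence < B-hat_0, ..., B-hat_{n-1} >
    extract      -- stand-in for E(): returns the hash embedded in a block
    """
    if sig != H(H(coded_blocks[0])):      # authenticate B-hat_0 via SIG
        return False
    h = extract(coded_blocks[0])          # trusted hash for the next block
    for block in coded_blocks[1:]:
        if h != H(block):                 # check h = hash(B-hat_{i+1})
            return False
        h = extract(block)                # hash for the block after this one
    return True
```

Any tampering with a block breaks the corresponding hash test, and the loop rejects the whole stream from that point on.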

Let us now assume that in the transmission of block B̂_{i+1} a packet is lost or corrupted, and let us consider the values of h_i, hh_i, h_{i+1}, and hh_{i+1}. Clearly, block B̂_{i+1} cannot be authenticated by means of h_i.


However, the receiver can anyway try to extract from B̂_{i+1} the pair h_{i+1} and hh_{i+1}. Let us assume that, despite the packet loss or corruption, the extraction of h_{i+1} and hh_{i+1} is successful. Since hh_i = hash(h_{i+1} || hh_{i+1}), both h_{i+1} and hh_{i+1} are authenticated, and they can be used to check the authenticity of the next block B̂_{i+2}. This means that the DxHC scheme is tolerant to packet losses or corruptions, provided that the hashes are correctly extracted from a block.
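The recovery step amounts to a small two-way check: if the direct hash test on the received block fails, the extracted pair (h_{i+1}, hh_{i+1}) is instead authenticated against the trusted hh_i. The function name and SHA-1 are illustrative choices of this sketch, not the thesis's implementation.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def verify_next_block(h_i, hh_i, received_block, extracted_h, extracted_hh):
    """DxHC check of B-hat_{i+1} given the trusted pair (h_i, hh_i).

    Returns (block_ok, chain_ok): the block itself may fail verification
    while the chain survives, because hh_i = hash(h_{i+1} || hh_{i+1})
    authenticates the extracted pair independently of the block's bits.
    """
    block_ok = (h_i == H(received_block))
    pair_ok = (hh_i == H(extracted_h + extracted_hh))
    return block_ok, block_ok or pair_ok
```

When `block_ok` is False but `chain_ok` is True, the receiver discards the corrupted block yet keeps authenticating the subsequent ones with the recovered pair.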

8.1 Dealing with noisy channels

The watermarking extraction robustness against packet loss has been evaluated by means of simulation, under the assumption that the loss is limited to a few bits (as expected in satellite communications). In the simulations we consider three video samples (Akiyo, Foreman, Coastguard [28]); for each video sample we consider four block sizes, namely 5, 10, 15, and 20 pictures (recall that each block is a sequence of pictures). We consider a Bernoulli random loss process (independent losses) [11] with four different mean bit error rates (0, 5·10⁻⁶, 10⁻⁵, 5·10⁻⁵). A subset of the corrupted video samples is available at [33]. The simulation results are shown in Figure 8.2: the mark extraction ratio increases with the number of pictures per block, and it is 1 in all cases when the block size is 20. This happens because the same mark is embedded in all the pictures belonging to the same block (see Chapter ??); hence the redundancy (mark repetitions) increases with the number of pictures per block, which in turn improves the resilience to bit corruptions. Poor mark extraction ratios are obtained for the Akiyo video only at a high bit error rate (5·10⁻⁵); however, even in this case, the mark extraction can be guaranteed for sufficiently large block lengths.

Note that the mark extraction probability could depend on the bit error distribution (channel model); however, we stress that the mark extraction ratio converges to 1 anyway, provided that a sufficiently large block is considered. On the other hand, if an on/off channel model is considered and the off period is sufficiently long, for example as long as the block length, the chain will be broken.
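The redundancy effect can be reproduced with a toy Monte Carlo model: the same mark is embedded once per picture, each mark bit flips independently (Bernoulli errors), and extraction takes a per-bit majority vote across the pictures of a block. The flip probability and the voting rule are assumptions of this sketch, not the thesis's actual extractor.

```python
import random

def corrupt(bits, p, rng):
    """Flip each bit independently with probability p (Bernoulli errors)."""
    return [b ^ (rng.random() < p) for b in bits]

def extract_mark(copies):
    """Per-bit majority vote over the mark copies (one per picture)."""
    n = len(copies)
    return [1 if 2 * sum(col) > n else 0 for col in zip(*copies)]

def extraction_ratio(mark, pics_per_block, p, trials=400, seed=1):
    """Fraction of trials in which the whole mark is recovered exactly."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        copies = [corrupt(mark, p, rng) for _ in range(pics_per_block)]
        ok += extract_mark(copies) == mark
    return ok / trials
```

Even with a deliberately high per-bit flip probability, the extraction ratio grows markedly when moving from 5 to 20 pictures per block, mirroring the trend of Figure 8.2.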

