
Design and development of a quantum circuit to solve the information set decoding problem

Academic year: 2021

Scuola di Ingegneria Industriale e dell’Informazione

Corso di Laurea Magistrale in Ingegneria Informatica

Dipartimento di Elettronica, Informazione e Bioingegneria

Design and development of a quantum circuit

to solve the Information Set Decoding problem

Advisor: Prof. Gerardo PELOSI

Co-advisor: Prof. Alessandro BARENGHI

Master's thesis by:


Cryptosystems based on linear codes are gaining momentum due to their stronger resistance to quantum attacks. They rely on the hardness of finding a minimum-weight codeword in a large linear code with an apparently random structure. In this work we designed and implemented several quantum circuits to specifically solve the Information Set Decoding problem, which is currently the most effective attack against code-based cryptoschemes. Relying on Grover's algorithm, the proposed algorithms were shown capable of effectively recovering the original error vector while simulating the computation of a quantum computer. Both an exhaustive search and a variant of Lee-Brickell's algorithm are proposed, with the former relying only on a quantum circuit and the latter using a hybrid classical-quantum approach. In both cases, two variants have been analyzed and compared, showing how a proper preparation of the initial state of the system can drastically reduce the number of iterations with respect to the uniform superposition of the classic Grover's algorithm. We provide, for the proposed algorithms, a quantitative evaluation of their computational complexity in terms of the number of involved quantum gates and the required storage in qubits.


In recent years, cryptosystems based on linear codes have been studied ever more thoroughly due to their greater resistance to attacks carried out with quantum computers. The security of this kind of cryptosystem rests on the difficulty of recovering a correct codeword from an error-affected one, given a linear code with an apparently random structure. In this work we designed and implemented several quantum circuits able to solve the problem known as Information Set Decoding, which is currently the most effective kind of attack against such cryptosystems. Based on Grover's algorithm, the proposed quantum algorithms proved able to identify the original error with a high degree of reliability during their validation on a quantum computer simulator. We explored two different kinds of attack: the first, based on a traditional exhaustive search, is purely quantum; the second, based on Lee-Brickell's algorithm, is a hybrid classical-quantum algorithm. In both cases, different execution modes were employed and compared, showing how a careful preparation of the initial state of the system can drastically reduce the number of iterations with respect to a basic version of Grover's algorithm. In this work we also provided a quantitative measure of the computational complexity of both proposed algorithms in terms of the number of quantum gates and the overall number of qubits.


Contents

Introduction

1 Error correcting codes
  1.1 Linear codes
    1.1.1 Preliminaries
    1.1.2 Hamming codes
  1.2 Decoding methods
    1.2.1 Syndrome decoding
    1.2.2 Information Set Decoding
  1.3 Post-Quantum Code Based Cryptosystems
    1.3.1 McEliece
    1.3.2 Niederreiter

2 Quantum Computing
  2.1 Motivations
  2.2 History
  2.3 Overview
  2.4 Quantum Information Theory
  2.5 Boolean circuits
    2.5.1 Reversible circuits
    2.5.2 Randomized circuits
    2.5.3 Quantum circuits
  2.6 Random Access Machine
    2.6.1 Classical RAM model
    2.6.2 Quantum RAM model
    2.6.4 Quantum programming environment

3 Quantum circuit design for ISD
  3.1 Introduction
  3.2 Grover's algorithm
    3.2.1 Optimal number of iterations
  3.3 Main support circuits
    3.3.1 Multi-controlled gates
    3.3.2 Fast population count
    3.3.3 Butterfly network
  3.4 Quantum ISD algorithms
    3.4.1 Bruteforce
    3.4.2 Lee-Brickell

4 Analysis
  4.1 Number of qubits
    4.1.1 Introduction
    4.1.2 Bruteforce circuit
    4.1.3 Lee-Brickell circuit
  4.2 Number of gates
    4.2.1 Basis gates
    4.2.2 Input parameters
    4.2.3 BN mode
    4.2.4 FPC mode
    4.2.5 Weight check for Lee-Brickell
    4.2.6 Flip phase of searched state
    4.2.7 Overall number of gates
  4.3 Number of Grover's iterations
  4.4 Results

Conclusion

A Quantum algebra
  A.1 Preface
    A.1.1 Dirac notation
  A.3 Qubits and qudits
    A.3.1 Single qubit/qudit
    A.3.2 Multiple qubits/qudits
  A.4 Operations on qubits
    A.4.1 Unitary transformations
    A.4.2 Measurements
  A.5 Qubit geometrical representation

B Tools and frameworks
  B.1 OpenQASM
  B.2 Qiskit
  B.3 Quirk
  B.4 Quantikz

List of Figures

3.4 Circuit to flip the phase of $|1\rangle^{\otimes n}$.
3.5 Overall circuit for Grover's algorithm.
3.11 The ripple carry adder for two 3-bit length words.
3.15 Example of Hamming weight check circuit.
3.16 Example of a butterfly network for an 8-bit input.
3.19 High level overview of our algorithm.
3.20 The input matrix transposed into the quantum circuit.
3.21 Column selectors.
3.22 The input syndrome transposed into our circuit.
3.23 Phase inversion for BN mode.
3.24 Phase inversion for FPC mode.
3.25 Overall circuit for the oracle of BN mode.
A.1 Complex plane representation of a qubit.

List of Tables

2.3 Common gates used in quantum computation with their circuit representation and matrix form.
4.1 Overview of the control qubits involved in the MCX circuits in the different modes and algorithms.
4.2 Number of qubits required by the bruteforce algorithm for the different modes.
4.3 Number of qubits required by Lee-Brickell's algorithm for the different modes.
4.4 Number of gates required by our algorithms, without taking into account the MCX gates.
4.5 Estimate of the number of gates required by basic and …

List of Algorithms

1.2.1 Syndrome decoding - exhaustive search
1.2.2 ISextract
1.2.3 Syndrome decoding - Prange's ISD
1.2.4 Syndrome decoding - Lee-Brickell's ISD
1.2.5 Syndrome decoding - Leon's ISD
1.3.1 McEliece Key Generation
1.3.2 McEliece Encryption
1.3.3 McEliece Decryption
1.3.4 Niederreiter Key Generation
1.3.5 Niederreiter Encryption
1.3.6 Niederreiter Decryption


Introduction

Nowadays, most of the widespread algorithms used in public-key cryptography, including RSA, are vastly dependent on only two computationally hard problems: integer factorization and discrete logarithm. Up to now, the most efficient algorithm for prime factorization, the general number field sieve, runs in sub-exponential time, making it practically unfeasible to break cryptosystems based on it. In 1994, however, Peter Shor first formulated a quantum algorithm for integer factorization running in polynomial time.

Given that it seems likely that we are at the dawn of the quantum era, the study and development of quantum-resistant algorithms has become crucial. Cryptosystems based on linear codes, which never gained much acceptance in the cryptographic community before then, are now being deeply scrutinized due to their apparent immunity to quantum attacks. They rely on the hardness of finding a minimum-weight codeword in a large linear code with an apparently random structure. Their analysis is therefore crucial to show if and how they can resist known quantum attacks.

The main focus of our work has been the design, development and testing of quantum circuits to attack linear codes. For this purpose we adapted the well-known Grover's algorithm to solve the Information Set Decoding (ISD) problem. Our proposed circuits have been developed using Qiskit, a framework provided by IBM to program their quantum hardware.

In Chapter 1 we introduce the topic of error correcting codes, with the formal definitions required to understand the subject. Special attention has been paid to the classical algorithms used for Information Set Decoding. We also briefly explain the two best-known cryptosystems based on linear codes: McEliece and Niederreiter.


In Chapter 2 we give an overview of quantum computing, explaining both the QRAM and the quantum circuit models, which are the most widely used models to describe quantum algorithms. In this chapter, we also present the most common basic gates used to design quantum circuits, together with some examples to allow a basic understanding of how to analyze the evolution of a quantum system.

In Chapter 3 we detail the design and implementation of our algorithms, after having introduced Grover's algorithm and the quantum circuits used to support them. An extensive analysis in terms of number of qubits, number of gates and number of Grover's rounds required can be found in Chapter 4.

We devoted Appendix A to a deeper treatment of the algebra required in quantum computation, while Appendix B contains a detailed list of the tools and frameworks used to build and test our algorithms.


Error correcting codes

1.1 Linear codes

In [40] Claude Shannon laid the mathematical foundations for the transmission of messages over noisy channels. The fundamental idea is to add redundant information to the transmitted message, so that the receiver can recover the original message even in the presence of a limited number of errors.

The main idea of linear codes is to encode an input message m of length k into a codeword c of length n to protect against channel errors. The encoding is done in such a way that error patterns (up to a certain amount) should not transform one codeword into another.

Before proceeding with the remaining sections, it is useful to introduce some formal definitions and notations, which largely follow those used by Hobach [22] and [21], but are similar to those found in most of the literature on linear codes.

1.1.1 Preliminaries

Notation

In this book we use $\mathbb{F}_q$ to denote a finite field of order $q$. Usually we use $q = 2$, i.e. we work with the binary field.

With $v \in \mathbb{F}_q^n$ we denote a column vector with $n$ elements over the field; $v^\top$ then denotes a row vector. A special vector is the all-zero vector, denoted by $\mathbf{0}$.


With $M \in \mathbb{F}_q^{x\times y}$ we denote an $x \times y$ matrix with entries over the field. A special matrix is the identity matrix $I_r$, with $r$ specifying its size (i.e. $r \times r$).

Given a matrix $A \in \mathbb{F}_q^{x\times y}$, we call $A_I$ the matrix consisting of the column vectors of $A$ that are indexed by a set $I \subseteq \{1, \dots, y\}$. In a similar fashion, we define $v_I$ to be the column vector consisting of the $I$-indexed entries of $v$.

Given $A \in \mathbb{F}_q^{x\times y}$, we employ the notation $A = [\,A_1 \mid A_2\,]$ to show that the concatenation of the column vectors of the two matrices $A_1$ and $A_2$ forms the original matrix $A$. Note that we must have $A_1 \in \mathbb{F}_q^{x\times z}$ and $A_2 \in \mathbb{F}_q^{x\times (y-z)}$.

Similarly, we employ the notation $a = [\,a_1 \mid a_2\,]$ to show the concatenation of the two column vectors forming the column vector $a$.

In the pseudocode of our algorithms, we use the notation $\langle a, b, c \rangle$ to define a tuple, which is often used when a function returns more than one value.

Definitions

Definition 1.1 (Linear code). Let $\mathbb{F}_q$ denote a finite field of $q$ elements and $\mathbb{F}_q^n$ a vector space of $n$-tuples over $\mathbb{F}_q$. An $[n, k]$ linear code $\mathcal{C}$ is a $k$-dimensional vector subspace of $\mathbb{F}_q^n$. The vectors $(c_1, c_2, \dots, c_n) \in \mathcal{C}$ are called codewords of $\mathcal{C}$.

If $q = 2$, the code is described as a binary linear code. We will mainly focus on binary linear codes in the rest of this book.

Remark 1. We call such codes linear because for all codewords $c_1, c_2 \in \mathcal{C}$ and each scalar $\alpha \in \mathbb{F}_q$ we have:

1. $c_1 + c_2 \in \mathcal{C}$;
2. $\alpha c_1 \in \mathcal{C}$.

The above property follows directly from the linearity of the matrix multiplication used in the codeword generation process. Indeed, being a linear subspace of $\mathbb{F}_q^n$, a linear code can be represented as the span of a minimal set of codewords, the basis. These basis codewords are often collected as the rows of a matrix known as the generator matrix of the code $\mathcal{C}$.


Definition 1.2 (Generator matrix). Denote by $m \in \mathbb{F}_q^k$ a message and by $G \in \mathbb{F}_q^{k\times n}$ a matrix of rank $k$. Then we can obtain the $[n, k]$ code as $\mathcal{C} := \{c \in \mathbb{F}_q^n \mid c = G^\top m\}$. $G$ is called the generator matrix of the code $\mathcal{C}$. From the definition we can easily see that a codeword $c$ is just a linear combination of the rows of $G$.

The generator matrix allows one to translate, or encode, a given message $m$ of length $k$ into a codeword $c$ of length $n$. Note that in general the generator matrix of a given code is not unique.

Definition 1.3 (Information and Redundancy set). An information set of $\mathcal{C}$ is a set of coordinates corresponding to any $k$ linearly independent columns of $G$, while the remaining $n - k$ columns form the redundancy set of $\mathcal{C}$.

Notation 1. For the rest of the book we define $r := n - k$.

In the context of linear codes, we also use the notion of code rate.

Definition 1.4 (Code rate). The code rate, also called information rate, is defined as $R := k/n$.

An important property of vectors which will be used extensively is the Hamming distance.

Definition 1.5 (Hamming distance). The Hamming distance $\mathrm{dist}(x, y)$ between two vectors $x, y \in \mathbb{F}_q^m$ is defined to be the number of positions $i$, $1 \le i \le m$, at which the corresponding symbols $x_i, y_i$ differ.

Roughly speaking, the Hamming distance measures the minimum number of substitutions needed to change one vector into the other, or the minimum number of errors that could have transformed one vector into the other.

Remark 2. If $q = 2$, the Hamming distance is equal to the number of ones in $x \oplus y$.

Definition 1.6 (Hamming weight). The Hamming weight $\mathrm{weight}(x) := \mathrm{dist}(x, \mathbf{0})$ of a vector $x \in \mathbb{F}_q^n$ is defined to be the Hamming distance between $x$ and the zero vector.

In other words, the Hamming weight is the number of nonzero components of $x$.
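To make the two definitions concrete, here is a minimal Python sketch (the helper names are ours, not taken from any library) computing both quantities for binary vectors; it also checks Remark 2:

```python
import numpy as np

def hamming_distance(x, y):
    """Number of positions where the vectors x and y differ."""
    return int(np.sum(np.asarray(x) != np.asarray(y)))

def hamming_weight(x):
    """Hamming weight = Hamming distance from the all-zero vector."""
    x = np.asarray(x)
    return hamming_distance(x, np.zeros_like(x))

# For q = 2 the distance equals the number of ones in x XOR y (Remark 2).
x = np.array([1, 0, 1, 1, 0])
y = np.array([0, 0, 1, 0, 0])
assert hamming_distance(x, y) == hamming_weight(x ^ y) == 2
```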


Definition 1.7 (Hamming distance of a code). The Hamming distance of a code is the minimum Hamming distance between any two codewords, i.e. $d := \min\{\mathrm{dist}(c_1, c_2) \mid c_1, c_2 \in \mathcal{C}, c_1 \ne c_2\}$. Equivalently, the Hamming distance of a code can be defined as the minimum Hamming weight of its nonzero codewords. This leads to the widespread notation $[n, k, d]$-code.

The concept of Hamming distance of a code is important from a theoretical viewpoint. The main idea of a linear code, in fact, is to encode the original message in such a way that error patterns should not transform one codeword into another. If $d$ is the Hamming distance of a certain code, two distinct codewords differ in at least $d$ bits. For this reason, we can detect up to $d - 1$ errors in a transmitted codeword, while correcting up to $\lfloor (d-1)/2 \rfloor$ of them in a unique way. So, for example, a one-bit correction code must have at least $d = 3$ to be able to correct all errors of weight at most 1.

Note that the previous statement says nothing in advance about the ability to correct an error of weight greater than $\lfloor (d-1)/2 \rfloor$. If such an error pattern occurs, there may be more than one codeword corresponding to it, and it remains a task for the receiver to decide which codeword was probably sent. Indeed, [47] and [16] independently proposed the notion of list decoding, which allows the decoder to output a list of codewords as answers. In this book, however, we focus our attention on algorithms with error-correction capability at most $\lfloor (d-1)/2 \rfloor$.

As pointed out in [43], the problem of finding the Hamming distance of a code is hard. For this reason, one often resorts to other techniques to guarantee that the Hamming distance of a code is at least equal to some fixed value.

Similarly to the code rate, we could also introduce the notion of error rate.

Definition 1.8 (Error rate). The error rate is defined as $T := t/n$, if the linear code allows for at most $t$ errors to be corrected.

It is also useful to introduce the concept of equivalence between linear codes.

Definition 1.9 (Code equivalence). Two linear $[n, k, d]$-codes $\mathcal{C}_A, \mathcal{C}_B$ are said to be equivalent if and only if there exists a bijective mapping $f : \mathbb{F}_q^n \to \mathbb{F}_q^n$ such that for all $c_1, c_2 \in \mathcal{C}_A$ we have $f(c_1), f(c_2) \in \mathcal{C}_B$ with $\mathrm{dist}(c_1, c_2) = \mathrm{dist}(f(c_1), f(c_2))$.

Remark 3. Equivalent codes with the same set of codewords, i.e. $\mathcal{C}_A = \mathcal{C}_B$, are also called identical codes. In this case the generator matrix of $\mathcal{C}_A$ can be transformed into the generator matrix of the code $\mathcal{C}_B$ using only elementary row operations.

Remark 4. It should be stressed that the mapping m → c from the message space to the code space is in general not the same for equivalent codes, even for identical codes. The notion of equivalent codes is in fact based on preserving the Hamming distance of the code.

Lemma 1.1.1. Given a linear code $\mathcal{C}_1$ with generator matrix $G_1 \in \mathbb{F}_q^{k\times n}$ and an equivalent linear code $\mathcal{C}_2$ with generator matrix $G_2$, the following statements hold:

• $\mathcal{C}_1$ is a linear $[n, k, d]$-code $\iff$ $\mathcal{C}_2$ is a linear $[n, k, d]$-code;

• $G_2$ can always be written as $G_2 = A G_1 M$, with $A \in \mathbb{F}_q^{k\times k}$ a matrix of rank $k$ and $M \in \mathbb{F}_q^{n\times n}$ a monomial matrix.

None of the operations above changes the Hamming distance between any two distinct codewords, and they allow one to find a new generator matrix for an equivalent code using basic linear algebra techniques, such as Gaussian elimination. In particular, they are used to find the so-called systematic form of a generator matrix.

Definition 1.10 (Systematic form of a generator matrix). Every linear code $\mathcal{C}_1$ with generator matrix $G_1 \in \mathbb{F}_q^{k\times n}$ has an equivalent (and even an identical) code $\mathcal{C}_2$ with a generator matrix $G_2 = [\,I_k \mid Q\,]$, with $I_k$ the $k \times k$ identity matrix and $Q \in \mathbb{F}_q^{k\times r}$. Generator matrices in such a form are called systematic. In this case the first $k$ columns of $G_2$ form an information set of $\mathcal{C}$.

A generator matrix in systematic form is convenient, as it allows the sender to just append $r := n - k$ symbols to the message $m$ of length $k$. For decoding, the receiver can directly read the message from the first $k$ symbols, after having corrected possible errors using the $r$ redundant symbols. Instead of a generator matrix $G$, a parity check matrix $H$ is often employed to define a linear code.


Definition 1.11 (Parity-check matrix). For any $[n, k]$-code $\mathcal{C}$ with generator matrix $G \in \mathbb{F}_q^{k\times n}$, any matrix $H \in \mathbb{F}_q^{r\times n}$ of rank $r$ with $HG^\top = 0_{r\times k}$ is called a parity check matrix for the code $\mathcal{C}$. $H$ is a matrix whose null space is $\mathcal{C}$. The code generated by $H$ is called the dual code of $\mathcal{C}$.

Remark 5. As for the generator matrix $G$, the parity check matrix is in general not unique.

If the generator matrix G is in systematic form, we can directly compute the parity check matrix H in systematic form.

Remark 6. For any linear code $\mathcal{C}$ with generator matrix $G \in \mathbb{F}_q^{k\times n}$ in standard systematic form $[\,I_k \mid Q\,]$, $Q \in \mathbb{F}_q^{k\times r}$, we have that $H = [\,-Q^\top \mid I_r\,]$ is a valid parity check matrix of $\mathcal{C}$, called systematic.
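As an illustration of Remark 6, the following Python sketch (binary case, where the sign of $Q$ is irrelevant since $-1 = 1$) builds the systematic parity check matrix from a systematic generator matrix and verifies the defining property $HG^\top = 0$; the particular $Q$ used here is just an arbitrary example:

```python
import numpy as np

k, r = 4, 3
Q = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])                         # an arbitrary k x r block
G = np.hstack([np.eye(k, dtype=int), Q])          # G = [I_k | Q], size k x n
H = np.hstack([Q.T, np.eye(r, dtype=int)])        # H = [Q^T | I_r], size r x n

assert not ((H @ G.T) % 2).any()                  # H G^T = 0 over F_2
```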

Remark 7. The minimum Hamming distance d of a linear code C is equal to the minimum number of linearly dependent columns of H, i.e. the minimum number of columns of H that can be combined together to give the zero vector.

Using the parity check matrix, we can restate the codeword generation process as a parity check or nullspace check. Indeed, from Definition 1.11, we have that if $c$ is a codeword of $\mathcal{C}$, then $Hc = 0$. This property also gives a fundamental tool to check the correctness of a received codeword, as will become clearer in Section 1.2.1.

1.1.2 Hamming codes

We close this section with a brief overview of Hamming codes, the first class of binary linear codes developed for error-correcting purposes, presented in 1950 by Hamming in [20].

In general, to be able to retrieve the original message $m \in \mathbb{F}_2^k$ from a codeword $c \in \mathbb{F}_2^n$, we need to satisfy the inequality $n \le 2^{n-k} - 1$. This formula can be used to obtain a lower bound on the number of parity bits $n - k$ needed. For example, for a $k = 4$-bit input message we need at least $r = 3$ redundancy bits, giving a total codeword length of $n = k + r = 7$. The best we can achieve for the code rate is to satisfy the previous bound with equality, that is $n = 2^{n-k} - 1$.


Hamming codes satisfy the previous equality and therefore have the minimum number of parity bits needed to correct a message, thus achieving the highest possible rate for a given $n$.

The $[n, k]$ parameters of a Hamming code are chosen to be $[2^r - 1,\; 2^r - r - 1]$ for each $r \ge 2$. It can be shown that all the linear codes with these parameters have a minimum Hamming distance of three and can thus be used to correct all one-bit errors in a message. So, for example, if $r = 3$ we have a $[7, 4, 3]$ linear code.

Another fundamental aspect of this class of codes is that generating the parity check matrix is trivial: it suffices to list all nonzero binary vectors of length $r$, i.e. the binary representations of $1$ to $2^r - 1$, and put them together in a matrix column-wise.
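A minimal Python sketch of this construction (the helper name hamming_parity_check is ours):

```python
import numpy as np

def hamming_parity_check(r):
    """Parity-check matrix of the [2^r - 1, 2^r - r - 1] Hamming code:
    its columns are the binary representations of 1, ..., 2^r - 1."""
    n = 2**r - 1
    cols = [[(j >> i) & 1 for i in range(r)] for j in range(1, n + 1)]
    return np.array(cols, dtype=int).T            # an r x n matrix

H = hamming_parity_check(3)                       # the [7, 4, 3] Hamming code
print(H.shape)                                    # (3, 7)
```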

Given the relative ease of matrix generation for this class of codes, we have used them extensively for testing all the proposed algorithms.

1.2 Decoding methods

Once we have an understanding of the basic concepts of linear codes, we can dive into the decoding strategies.

Recall that in the traditional usage of linear codes for error-correcting purposes, a decoding strategy tries to translate a (possibly erroneous) received message $y$ into a codeword $c \in \mathcal{C}$, and from there to the original message $m$.

Our interest in the different decoding strategies is motivated by the usage of linear codes in cryptography, as we will see in Section 1.3. In summary, an attacker, dealing with a random-looking code and an encoded message $y$ (the ciphertext), tries to find the original message $m$ (the plaintext).

It is quite natural, for this reason, to analyze the most interesting methods to decode a linear code and also see how they behave in case the code structure is not known in advance.

1.2.1 Syndrome decoding

Using Definition 1.11 of the parity check matrix, we can introduce the concept of the syndrome of a given message.


Definition 1.12 (Syndrome). Given a parity check matrix $H \in \mathbb{F}_q^{r\times n}$ of a code $\mathcal{C}$ and a (received) vector $y \in \mathbb{F}_q^n$, we call $s = Hy$, $s \in \mathbb{F}_q^r$, the syndrome of $y$.

If the received vector $y$ is error free (i.e., $y$ is a proper codeword of $\mathcal{C}$) we have that $s = Hy = H(G^\top m) = HG^\top m = 0$.

If, on the other hand, $y$ contains some error $e$, i.e. $y = c + e$, we have that $s = Hy = H(G^\top m) + He = He$. For this reason, the syndrome of a message is also referred to as the syndrome of the error.

The error syndrome can be used to correct errors on a received codeword. If we have a table $T$ mapping each syndrome $s$ to the corresponding error $e$, with $\mathrm{weight}(e) \le \lfloor (d-1)/2 \rfloor$, upon receiving a message $y$ we can retrieve the original codeword as $c = y - T(s) = y - e$. This process is known as syndrome decoding. Note that, if we use binary vectors, the precomputed table $T$ must have a size of $2^r - 1$.
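The following Python sketch illustrates table-based syndrome decoding on the $[7, 4, 3]$ Hamming code, for which $t = 1$; it reuses the hamming_parity_check helper sketched above:

```python
import numpy as np

H = hamming_parity_check(3)            # [7, 4, 3] Hamming code, t = 1
r, n = H.shape

# Precompute the table T: syndrome -> weight-1 error producing it.
table = {}
for i in range(n):
    e = np.zeros(n, dtype=int)
    e[i] = 1
    table[tuple(H @ e % 2)] = e

def syndrome_decode(y):
    """Correct up to one error in the received word y."""
    s = tuple(H @ y % 2)
    return y if not any(s) else (y + table[s]) % 2
```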

How does all of the above apply to a cryptosystem? Suppose an attacker receives a given ciphertext $y$, and that she knows the parity-check matrix $H$ used to generate it and the error weight $t$. How can she retrieve the original message $m$?

" # H ×                   y = " # s ≡ " # H ×                   e = " # s

This problem, identical to the problem of decoding a random linear code, is also called the Computational Syndrome Decoding (CSD) problem. The most naive approach is, as always, an exhaustive search. Knowing that $\mathrm{weight}(e) = t$, we must find a selection of $t$ columns of $H$ adding up to $s$. This problem is very close to a subset-sum problem and requires us to compute every sum of $t$ columns of $H$. Algorithm 1.2.1 gives the pseudocode of the proposed algorithm.


Algorithm 1.2.1: Syndrome decoding - exhaustive search
Input : s: an r-bit long syndrome (column vector)
        H: an r × n binary parity-check matrix
        t: the weight of the error vector to be recovered
Output: e: an n-bit binary column error vector s.t. He = s, with weight(e) = t

1  e ← [0_{1×n}]^⊤
2  for j ← 1 to $\binom{n}{t}$ do
       // J is a set of t distinct integers in {0, ..., n − 1}
3      J ← IntegerToCombination(j)
4      if $\sum_{i \in J} h_i = s$ then
5          foreach i ∈ J do
6              e ← e + [0_{1×i} | 1 | 0_{1×(r+k−1−i)}]^⊤
7          return e

Proposition 1.2.1 (Computational complexity of Algorithm 1.2.1). The time complexity of the algorithm, denoted by $C_{ES}(n, r, t)$, is

$$C_{ES}(n, r, t) = \binom{n}{t}\left(C_{IntToComb} + tr\right) \tag{1.1}$$

The spatial complexity is $S_{ES}(n, r) = O(rn)$.

Proof. The cost of the exhaustive search can be obtained starting from the complexity of a single iteration of the loop. First of all, at line 3, we have the cost of decoding an integer into its combinadics representation, i.e. finding the corresponding combination among all the $\binom{n}{t}$ ones. This is equal to

$$C_{IntToComb} = O\!\left((2n - t)\left(\log\binom{n}{t}\right)^{2}\right)$$

Note that, if the value of $t$ is fixed, it is possible to avoid $C_{IntToComb}$, specializing the algorithm with a $t$-deep loop nest to generate the combinations. To this, we have to add the cost of adding together $t + 1$ bit vectors of length $r$, i.e., $tr$, multiplied by the number of such additions. The whole process has to be repeated $\binom{n}{t}$ times.

Can we do better than this? We will see in the next section how we can use the notion of information set to speed up the decoding (and hence the decryption) of a received message.


1.2.2 Information Set Decoding

A relevant aspect of linear codes, which has already been briefly touched upon in Definition 1.3, is the so-called information set. Before taking a deeper dive into the Information Set Decoding problem, we shall introduce a few useful notations and definitions.

Definition 1.13 (Information set, generator matrix). Given an $[n, k]$ linear code with $G \in \mathbb{F}_q^{k\times n}$, a size-$k$ index set $I \subseteq \{1, \dots, n\}$ is an information set if and only if $\mathrm{rank}(G_I) = |I| = k$. Said differently, $I$ is an information set if and only if $G_I$ has full rank, i.e. all the $I$-indexed columns of $G$ are linearly independent.

The columns of $G_I$ span a $k$-dimensional vector space and can thus be used to represent any possible non-redundant linear transformation on a message $m \in \mathbb{F}_q^k$.

Equivalently, we could also define the information set for a parity check matrix.

Definition 1.14 (Information set, parity check matrix). Given a linear code $\mathcal{C}$ with parity check matrix $H \in \mathbb{F}_q^{r\times n}$, any size-$k$ index set $I \subseteq \{1, \dots, n\}$ is an information set if and only if $\mathrm{rank}(H_{I^*}) = r$, with $I^* := \{1, \dots, n\} \setminus I$.

Roughly speaking, we can either choose k linearly independent columns of G or r linearly independent columns of H to form an information set.

Information Set Decoding (ISD for short) can be used in error correcting codes to decode a possibly erroneous $n$-bit message $y$, given a description of the linear code $\mathcal{C}$ in the form of a $k \times n$ generator matrix $G$ and the weight of the error $t = \mathrm{weight}(e)$.

In particular, using the generator matrix $G$ and given a received vector $y = c + e$, we have that

$$y_I^\top G_I^{-1} = (c + e)_I^\top G_I^{-1} = c_I^\top G_I^{-1} + e_I^\top G_I^{-1}$$

If $e_I = 0$, i.e. the information set positions of $y$ are error free, we obtain

$$y_I^\top G_I^{-1} = c_I^\top G_I^{-1} = (m^\top G)_I\, G_I^{-1} = m^\top$$

So, upon receiving a message $y$ and after choosing a proper information set $I$, if we know that $e_I = 0$ we can easily retrieve the original message $m$


using the inverse of $G_I$, which surely exists given that $G_I$ has full rank.

If instead $e_I \ne 0$, by multiplying everything by the generator matrix $G$ we have

$$y_I^\top G_I^{-1} G = (c + e)_I^\top G_I^{-1} G = c_I^\top G_I^{-1} G + e_I^\top G_I^{-1} G = m^\top G_I G_I^{-1} G + e_I^\top G_I^{-1} G = c^\top + c_x^\top$$

with $c_x^\top = e_I^\top G_I^{-1} G$

Note that $c_x \in \mathcal{C}$ because $\hat{G} = G_I^{-1} G$ also generates a code $\hat{\mathcal{C}}$ which is identical to the original code $\mathcal{C}$ generated by $G$.

Hence, correcting errors in linear codes and retrieving the original message can be described as the problem of finding an information set $I$ which does not contain any error (or for which we know $e_I$).

It is possible to restate the aforementioned decoding problem in terms of the $r \times n$ parity check matrix $H$ and the $r$-bit syndrome $s$, in a similar fashion to what we have seen in Section 1.2.1.

All the information set decoding algorithms presented in this section use the matrix $H$ and the syndrome $s$ to retrieve the $n$-bit error $e$ of weight $t$; thus we will present a more extensive explanation based on this approach.

Upon receiving a parity check matrix $H$, and after randomly choosing an information set $I$:

1. We use a permutation matrix $P \in \mathbb{F}_q^{n\times n}$ to bring the columns of $H$ indexed by $I^*$ to the right side of $H$, obtaining $\hat{H} := HP$.

2. We apply elementary row operations to $\hat{H}$ to bring it into systematic form. We represent the whole sequence of elementary operations as a matrix $U$, thus obtaining $\tilde{H} = U\hat{H} = [\,V \mid I_r\,]$.

Note that $\tilde{H}$ generates a code which is equivalent, but not identical, to the one generated by the original $H$. We additionally have to define a new syndrome $\bar{s} := Us$ and a new error vector $\hat{e} = P^{-1}e = P^\top e$. This latter operation packs all the $I$-indexed entries of the (column) vector $e$ on the top. Basically, what we want to obtain at the end of the procedure is

$$\tilde{H}\,\hat{e} = [\,V \mid I_r\,]\begin{bmatrix} \hat{e}_I \\ \hat{e}_{I^*} \end{bmatrix} = \bar{s}$$

From a mathematical viewpoint, what we get from the steps above is

$$\tilde{H}\hat{e} = [\,V \mid I_r\,]\,[\,\hat{e}_I \mid \hat{e}_{I^*}\,] = V\hat{e}_I + I_r\,\hat{e}_{I^*} = V\hat{e}_I + \hat{e}_{I^*} := \bar{s} \tag{1.2}$$

from which we can also derive

$$\hat{e}_{I^*} = \bar{s} - V\hat{e}_I \tag{1.3}$$

In Algorithm 1.2.2 we present the pseudocode description arising from the operations presented above, in which RandomPermutationGen(n) denotes a procedure generating a random n × n permutation matrix, while RedRowEchelonForm(H) is a procedure bringing the input matrix H into reduced row echelon form.

Algorithm 1.2.2: ISextract
Input : s: an r-bit long syndrome (column vector)
        H: an r × n binary parity-check matrix
Output: P: an n × n permutation matrix
        s̄: an r-bit long binary column vector
        V: an r × k binary matrix V = [v_0, ..., v_{k−1}]

1  repeat
2      P ← RandomPermutationGen(n)           // random choice of I
3      Ĥ ← HP                                // the corresponding error vector is ê = P^⊤e
4      ⟨U, [V | W]⟩ ← RedRowEchelonForm(Ĥ)   // UĤ = [V | W_{r×r}] = H̃
5  until W = I_r
6  s̄ ← Us
7  return ⟨P, [V | W], s̄⟩


Proposition 1.2.2 (Computational complexity of Algorithm 1.2.2). Given the binary $r \times n$ parity-check matrix $H$ and the $r \times 1$ syndrome $s$, the time complexity $C_{IS}(n, r)$ is

$$C_{IS}(n, r) = \frac{1}{\prod_{i=1}^{r}(1 - 2^{-i})}\, C_{RREF}(n, r) + r^2 \tag{1.4}$$

with

$$C_{RREF}(n, r) = O\!\left(\frac{nr^2}{2} + \frac{nr}{2} - \frac{r^3}{6} + r^2 + \frac{r}{6} - 1\right)$$

The spatial complexity is $S_{IS}(n, r) = O(rn)$.

Proof. The loop body of Algorithm 1.2.2 is dominated by the cost of checking that the matrix $W$ is an identity, i.e. that the corresponding submatrix of $\hat{H}$ has indeed full rank.

Note that in an $r \times r$ binary matrix, the first row has a probability of $\frac{1}{2^r}$ of being linearly dependent on itself (i.e. zero); the second row has a probability of $\frac{2}{2^r}$ of being linearly dependent (i.e. zero or equal to the first). With an inductive argument we obtain that the $r$-th row has a probability of $\frac{2^{r-1}}{2^r}$ of being linearly dependent on the previous ones. We thus have that the probability of having all the rows independent from one another is $\prod_{i=1}^{r}\left(1 - \frac{2^{i-1}}{2^r}\right) = \prod_{i=1}^{r}\left(1 - \frac{1}{2^i}\right)$.

We thus have that the column permutation, with computational complexity $rn$, and the RREF transformation have to be repeated $\frac{1}{\prod_{i=1}^{r}(1 - 2^{-i})}$ times, yielding the first addend of the computational cost $C_{IS}(n, r)$. The cost $C_{RREF}(n, r)$ is derived considering the RREF as an iterative algorithm performing as many iterations as the rank of the identity matrix in the result (i.e., $r$ in this case). Each iteration $0 \le i \le r - 2$ finds a pivot, taking $O(r - i)$, swaps it with the $(r - i)$-th row in $O(n)$, and adds the pivot row to the remaining $r - i - 1$ rows which have a one in the $(n - i)$-th column. The total cost is

$$C_{RREF}(n, r) = O\!\left(\sum_{i=0}^{r-2}(r - i) + rn + \sum_{i=0}^{r-2}(r - 1 - i)(n - i)\right) = O\!\left(\frac{nr^2}{2} + \frac{nr}{2} - \frac{r^3}{6} + r^2 + \frac{r}{6} - 1\right)$$

The second addend of the cost $C_{IS}(n, r)$ is the computational complexity of computing $\bar{s}$, which is $r^2$.

A huge variety of ISD algorithms has been proposed over the years. Up to now, these are the best known decoding algorithms that do not rely on any specific structure in the code, which is precisely the setting of code-based cryptography.

Prange

In [37] Prange described the first ISD algorithm that laid the foundation for all the subsequent work on ISD.

The fundamental observation made by Prange is that, if $\hat{e}_I = 0$, i.e. if the $k$ entries of the error indexed by the information set $I$ are all zero, then $\bar{s}$ reveals all the entries of $\hat{e}_{I^*}$:

$$[\,V \mid I_r\,]\begin{bmatrix} \hat{e}_I = 0 \\ \hat{e}_{I^*} \end{bmatrix} = \bar{s}$$

Indeed, if this is the case, from (1.2) we have that $\hat{e}_{I^*} = \bar{s}$ and thus $\mathrm{weight}(\hat{e}) = \mathrm{weight}(\hat{e}_{I^*}) = \mathrm{weight}(\bar{s}) = t$.

A pseudocode description of Prange's ISD algorithm is provided in Algorithm 1.2.3.

Algorithm 1.2.3: Syndrome decoding - Prange's ISD
Input : s: an r-bit long syndrome (column vector)
        H: an r × n binary parity-check matrix
        t: the weight of the error vector to be recovered
Output: e: an n-bit binary column error vector s.t. He = s, with weight(e) = t
Data : P: an n × n permutation matrix
       s̄: an r-bit long binary column vector
       V: an r × k binary matrix V = [v_0, ..., v_{k−1}]

1  repeat
2      ⟨P, H̃, s̄⟩ ← ISextract(s, H)
3  until weight(s̄) = t
4  ê ← [0_{1×k} | s̄^⊤]^⊤
5  return Pê


Proposition 1.2.3 (Computational complexity of Algorithm 1.2.3). The time complexity of Algorithm 1.2.3 can be computed starting from the probability of success $\mathrm{Pr}_{succ}$ of a single iteration of the loop and the computational requirements $c_{iter}$ of executing the loop body. In particular, the time complexity is

$$C_{ISD}(n, r, t) = \frac{1}{\mathrm{Pr}_{succ}}\, c_{iter} = \frac{\binom{n}{t}}{\binom{r}{t}}\left(C_{IS}(n, r) + O(n)\right)$$

where $C_{IS}(n, r)$ is as in (1.4).

Proof. $\mathrm{Pr}_{succ}$ is obtained as the number of permuted error vectors whose error-affected positions fit the hypotheses made by the algorithm, divided by the number of all possible error vectors. This fact holds for all ISD algorithms. In the case of Prange's ISD, the permuted error vectors admissible by the hypotheses are $\binom{r}{t}$, as all the error-affected positions must be within the last $r$ bits of the permuted error vector, while the overall number of error vectors is $\binom{n}{t}$.

The additional $O(n)$ is the cost of building $\hat{e}$.
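Putting ISextract and the weight test together gives a compact Python sketch of Prange's ISD (again with an index permutation in place of the matrix $P$):

```python
import numpy as np

def prange_isd(H, s, t, rng=np.random.default_rng()):
    """Prange's ISD: retry until the random information set is error free,
    in which case the permuted error is simply [0 | s-bar]."""
    r, n = H.shape
    while True:
        perm, _, s_bar = is_extract(H, s, rng)   # sketched above
        if s_bar.sum() == t:                     # weight(s-bar) = t
            e_hat = np.concatenate([np.zeros(n - r, dtype=int), s_bar])
            e = np.empty(n, dtype=int)
            e[perm] = e_hat                      # undo the permutation
            return e
```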

Lee-Brickell

Lee and Brickell in [29] improved Prange's idea by allowing a known weight for the error vector entries indexed by the information set, i.e. $\mathrm{weight}(\hat{e}_I) = p$. In this sense, Prange's can be considered a special case of Lee-Brickell's algorithm with $p = 0$.

The main idea underlying the algorithm is that it is unlikely for $\hat{e}$ to concentrate all of its weight in the last $r$ positions. Instead, Lee and Brickell allow exactly $p$ nonzero entries inside the first $k$ coordinates of $\hat{e}$.

Indeed, from (1.3) and given that we must have $\mathrm{weight}(\hat{e}) = t$, we have that

$$\mathrm{weight}(\hat{e}_{I^*}) = \mathrm{weight}(\bar{s} - V\hat{e}_I) = t - p$$

Algorithm 1.2.4 presents the pseudocode for all the reasoning above for binary linear codes. We use $J \subseteq I$ to denote a set of $p$ distinct integers in $\{0, \dots, k-1\}$, which represent the indexes of $\hat{e}_I$ guessed to contain the error bits.


Algorithm 1.2.4: Syndrome decoding - Lee-Brickell's ISD
Input : s: an r-bit long syndrome (column vector)
        H: an r × n binary parity-check matrix
        t: the weight of the error vector to be recovered
Output: e: an n-bit binary column error vector s.t. He = s, with weight(e) = t
Data : P: an n × n permutation matrix
       ê = P^⊤e: the error vector row-permuted by P^⊤
       p: the weight of the first k bits of ê, 0 ≤ p ≤ t
       s̄: an r-bit long binary column vector; the syndrome of ê
       V: an r × k binary matrix V = [v_0, ..., v_{k−1}]

1  loop
2      ⟨P, [V | I_r], s̄⟩ ← ISextract(H, s)
3      for j ← 1 to $\binom{k}{p}$ do
4          J ← IntegerToCombination(j)     // J: p distinct integers in {0, ..., k − 1}
5          ê_{I*} ← s̄ + $\sum_{i \in J}$ v_i   // v_i is column i of V
6          if weight(ê_{I*}) = t − p then
7              ê ← [0_{1×k} | ê_{I*}^⊤]^⊤
8              foreach i ∈ J do
9                  ê ← ê + [0_{1×i} | 1 | 0_{1×(r+k−1−i)}]^⊤
10             return Pê

Proposition 1.2.4 (Computational complexity of Algorithm 1.2.4). As for Algorithm 1.2.3, the time complexity of Algorithm 1.2.4 can be computed starting from the probability of success $\mathrm{Pr}_{succ}$ of a single iteration of the loop and the computational requirements $c_{iter}$ of executing the loop body. In particular, the time complexity is

$$C_{ISD}(n, r, t, p) = \frac{1}{\mathrm{Pr}_{succ}}\, c_{iter} = \frac{\binom{n}{t}}{\binom{k}{p}\binom{r}{t-p}}\left(C_{IS}(n, r) + \binom{k}{p}\left(C_{IntToComb} + pr\right)\right) \tag{1.5}$$

where $C_{IS}(n, r)$ is as in (1.4). $C_{IntToComb}$ is the cost of decoding an integer into its combinadics representation, i.e. finding the corresponding combination among all the $\binom{k}{p}$ ones. Its cost is given by

$$C_{IntToComb} = O\!\left((2k - p)\left(\log\binom{k}{p}\right)^{2}\right)$$


Proof. The success probability is obtained following the same line of reasoning employed for Prange's algorithm, thus dividing the number of admissible permuted error vectors, $\binom{k}{p}\binom{r}{t-p}$, by the number of possible error vectors $\binom{n}{t}$.

The cost of an iteration of Lee and Brickell's algorithm can be obtained as the cost of adding together $p + 1$ bit vectors of length $r$, i.e., $pr$ (line 5), multiplied by the number of such additions, i.e., $\binom{k}{p}$, as they constitute the body of the loop at lines 4–9. Note that, if the value of $p$ is fixed, it is possible to avoid $C_{IntToComb}$, specializing the algorithm with a $p$-deep loop nest to generate the combinations.

Leon

Leon's algorithm, presented in [30], is a refinement of Lee-Brickell's in which we additionally require that $\hat{e}_{I^*}$ has weight 0 in $\ell$ consecutive positions. This idea lowers the success rate of a single iteration, but allows one to precompute the sum of only the first $\ell$ rows of the matrix composed by the $p$ selected columns of $V$, instead of requiring the sum of the entire columns.

The idea of allowing this $\ell$-long run of zeroes in the permuted vector $\hat{e}$ involves slicing the different vectors and matrices in two parts, indicated by up and down in the rest of the section:

$$\bar{s} = \begin{bmatrix} \bar{s}^{up} \\ \bar{s}^{down} \end{bmatrix} \qquad V = [\,v_0 \dots v_{k-1}\,] = \begin{bmatrix} V^{up} \\ V^{down} \end{bmatrix} = \begin{bmatrix} v_0^{up} & \dots & v_{k-1}^{up} \\ v_0^{down} & \dots & v_{k-1}^{down} \end{bmatrix}$$

The pseudocode of the algorithm is presented in Algorithm 1.2.5.

Proposition 1.2.5 (Computational complexity of Algorithm 1.2.5). The time complexity of Algorithm 1.2.5 can be computed, as for the previous algorithms, starting from the probability of success $\mathrm{Pr}_{succ}$ of a single iteration of the loop and the computational requirements $c_{iter}$ of executing the loop body. In particular, the time complexity is given by


Algorithm 1.2.5: Syndrome decoding - Leon's ISD
Input : s: an r-bit long syndrome (column vector)
        H: an r × n binary parity-check matrix
        t: the weight of the error vector to be recovered
Output: e: an n-bit binary column error vector s.t. He = s, with weight(e) = t
Data : P: an n × n permutation matrix
       ê = P^⊤e: the error vector row-permuted by P^⊤
       p: the weight of the first k bits of ê, 0 ≤ p ≤ t
       ℓ: length of the run of zeroes in ê_{I*} = [0_{1×ℓ} | ê_{I*,down}]
       s̄: an r-bit long binary column vector; the syndrome of ê
       V: an r × k binary matrix

1  loop
2      ⟨P, [V | I_r], s̄⟩ ← ISextract(H, s)
3      for j ← 1 to $\binom{k}{p}$ do
           // J is a set of p distinct integers in {0, ..., k − 1}
4          J ← IntegerToCombination(j)
5          ê_{I*,up} ← s̄^{up} + $\sum_{i \in J}$ v_i^{up}
6          if weight(ê_{I*,up}) = 0 then
7              ê_{I*,down} ← s̄^{down} + $\sum_{i \in J}$ v_i^{down}
8              if weight(ê_{I*,down}) = t − p then
9                  ê ← [0_{1×k} | ê_{I*,up}^⊤ | ê_{I*,down}^⊤]^⊤
10                 foreach i ∈ J do
11                     ê ← ê + [0_{1×i} | 1 | 0_{1×(r+k−1−i)}]^⊤
12                 return Pê

$$C_{ISD}(n, r, t, p, \ell) = \frac{1}{\mathrm{Pr}_{succ}}\, c_{iter} = \frac{\binom{n}{t}}{\binom{k}{p}\binom{r-\ell}{t-p}}\left(C_{IS}(n, r) + \binom{k}{p}\left(C_{IntToComb} + p\ell\right) + \frac{\binom{k}{p}}{2^{\ell}}\, p(r-\ell)\right)$$

where $C_{IS}(n, r)$ is as in (1.4) and $C_{IntToComb}$ is as in (1.5).

Proof. The success probability of an iteration of Leon's algorithm follows the same line of reasoning as Prange's and Lee-Brickell's, dividing the number of admissible permuted error vectors $\binom{k}{p}\binom{r-\ell}{t-p}$ by the total number $\binom{n}{t}$.

The complexity of a single iteration is obtained considering that the loop at lines 4–11 performs $\binom{k}{p}$ iterations, each of which adds $p + 1$ bit vectors of length $\ell$ together (complexity $p\ell$) and, if the result is zero, performs a further addition of $p + 1$ bit vectors, each of length $r - \ell$ (complexity $p(r - \ell)$). This further addition takes place with a probability of $\frac{\binom{k}{p}}{2^{\ell}}$, as all the possible values for $\bar{s}^{up}$ are $2^{\ell}$, and only $\binom{k}{p}$ attempts at hitting the correct one are made, thus yielding the stated complexity.

1.3 Post-Quantum Code Based Cryptosystems

Since public-key cryptography first appeared in the 1970s, the search for secure and efficient public-key cryptosystems has been one of the most active areas in the field of cryptography. Nowadays, most of the widespread trapdoors used in public-key cryptography, including RSA, are vastly dependent on only two computationally hard problems: integer factorization and discrete logarithm. Up to now, the most efficient algorithm for prime factorization, the general number field sieve, runs in sub-exponential time, making it practically unfeasible to break cryptosystems based on it.

In 1994, however, Peter Shor first formulated a quantum algorithm for integer factorization running in polynomial time [41]; that is, almost exponentially faster than any known classical algorithm.

This huge breakthrough was a powerful incentive to the study of new, quantum-resistant cryptographic algorithms. Although cryptosystems based on linear codes never gained much acceptance in the cryptographic community before then, they are now gaining momentum due to their apparent immunity to quantum attacks. They rely on the hardness of finding a minimum-weight codeword in a large linear code with an apparently random structure.

The most famous of these systems are the McEliece and Niederreiter ciphers, which are equivalent from the security point of view. They are at the moment among the few alternatives to the common public-key algorithms based on number theory. Studying their security is therefore essential in order to anticipate possible important progress in factoring methods.


1.3.1 McEliece

In [23] McEliece proposed an asymmetric encryption algorithm based on linear codes. Although encryption and decryption are extremely fast and the underlying problem of decoding a general linear code was proven to be NP-hard [5], it was largely ignored because of the huge key requirements and the relatively low code rate. Moreover, it has always been a problem to estimate the complexity of attacks on concrete McEliece parameter sets and thus to choose secure parameters in practice [22]. As already said, however, the fact that this system seems to withstand known quantum attacks [13] has led to an increased interest in its study.

Algorithm 1.3.1 presents the pseudocode used to derive the private and public keys of the cryptosystem; we call ChooseLinearCode the procedure which chooses a linear $[n, k, d]$-code $\mathcal{C}$ over the finite field $\mathbb{F}_q$ uniformly at random from a class of linear codes such that a decoding algorithm DecodeG() exists that can correct up to $t \le \lfloor (d-1)/2 \rfloor$ errors in polynomial time.

As we already know from Lemma 1.1.1, all the matrix operations defined in the algorithm produce an $[n, k, d]$ code $\mathcal{C}'$ which is equivalent to the original code $\mathcal{C}$. As already stated, even though the codes $\mathcal{C}$ and $\mathcal{C}'$ are equivalent, the mapping between the message space and the codewords is entirely different for the two codes, and thus it is non-trivial to recover the original message $m$ from $G'$.

To encrypt a message, the sender employs the algorithm shown in Algorithm 1.3.2. Basically, we are employing the encoding algorithm and simulating a random transmission error of weight $t$. If possible, it is a good idea to choose $t = \lfloor (d-1)/2 \rfloor$, i.e. an error with the highest correctable weight.

The decryption algorithm, instead, is depicted in Algorithm 1.3.3. We can also look more closely at the matrix operations necessary for decryption; in particular, the following equation shows how to retrieve $d$ from the ciphertext $y$:

$$d := Py^\top = P(mG')^\top + Pe^\top = P(mAGP)^\top + Pe^\top = PP^\top G^\top A^\top m^\top + Pe^\top = (AG)^\top m^\top + Pe^\top$$


Algorithm 1.3.1: McEliece Key Generation
Input : n, k, q, t ∈ ℤ
Output: pk: the public key
        sk: the secret key
Data : P: an n × n permutation matrix over F_q
       A: a k × k matrix of rank k over F_q

1  C ← ChooseLinearCode()
2  Find a generator matrix G ∈ F_q^{k×n} for the linear code C
3  Choose the permutation matrix P at random
4  Choose the matrix A at random
5  G' := AGP
6  pk := (G', n, k, t, q)
7  sk := (G, P, A)
8  return (pk, sk)

Algorithm 1.3.2: McEliece Encryption
Input : m: the plaintext, m ∈ F_q^k
        pk: the public key
Output: y: the ciphertext, y ∈ F_q^n
Data : e: a random error, e ∈ F_q^n

1  c := mG'
2  Choose e at random, with weight(e) = t
3  y := c + e
4  return y
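A toy Python sketch of the encryption step; the public generator matrix G_pub (a plain numpy array here) is assumed to come from a key generation step such as Algorithm 1.3.1:

```python
import numpy as np

def mceliece_encrypt(m, G_pub, t, rng=np.random.default_rng()):
    """Encode m with the public matrix and add a random weight-t error."""
    n = G_pub.shape[1]
    c = (m @ G_pub) % 2                          # c = m G'
    e = np.zeros(n, dtype=int)
    e[rng.choice(n, size=t, replace=False)] = 1  # random error, weight(e) = t
    return (c + e) % 2                           # y = c + e
```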

An important remark is that, being P a permutation matrix, its transpose is also its inverse, i.e. $P^\top = P^{-1}$. Also, the permutation matrix does not affect the weight of the error $e$.

Algorithm 1.3.3: McEliece Decryption
Input : y: the ciphertext, y ∈ F_q^n
        sk: the secret key
Output: m: the plaintext, m ∈ F_q^k

1  d := Py^⊤
2  a := DecodeG(d) = A^⊤m^⊤
3  m^⊤ := (A^⊤)^{-1}a
4  return m

In [23] McEliece suggested the use of binary (i.e. $q = 2$) Goppa codes with security parameter sizes of $n = 1024$, $k = 524$, $t = 50$. [38] contains an extensive summary of Goppa codes, originally presented by Goppa in [18]. However, this set of parameters was found to be insecure by several researchers, for instance in [10] and [6]. Many other, more secure parameters have been proposed. For example, the 2015 document [28] provides the PQCRYPTO project's initial recommendations for post-quantum cryptographic algorithms, including among others a suggestion of $n = 6960$, $k = 5413$, $t = 119$ for McEliece with binary Goppa codes. This choice produces a public key of size $5413 \times (6960 - 5413) \approx 8\,\mathrm{Mbit} \approx 1\,\mathrm{MiB}$ and a security level of 266 bits [6].

For the security of the McEliece cryptosystem, as for any other public-key cryptosystem, it should be unfeasible to recover the private key from the public key. In particular, given $G'$, it should be hard to retrieve $G$ (and thus $A$ and $P$). Also, in practice, because the linear code $\mathcal{C}$ (and thus the corresponding matrix $G$) must have a certain structure to allow an efficient decoding algorithm, an attacker could in principle exploit the structure of $\mathcal{C}$ to recover $G$. Thus, in general, given random input parameters used to create $\mathcal{C}$, it must be hard to distinguish the generator matrix $G$ from a random matrix $\hat{G} \in \mathbb{F}_q^{k\times n}$.

A strategy that does not require $G$ is based on the concept of information set decoding presented in Section 1.2.2.

Another crucial point about the security of a cryptosystem is that it must be hard to recover the plaintext $m$ from the ciphertext $y$. This problem was proved to be NP-hard in [5]. Note that, as already stated, even though $\mathcal{C}$ and $\mathcal{C}'$ are equivalent, we cannot use DecodeG in combination with $G'$ and $y$, as it is highly unlikely that $G'$ has the same structure as $G$.

As pointed out in [22], the original McEliece cryptosystem as presented here is not IND-CPA-secure: if we encrypt 0, we will always get a ciphertext $y$ with $\mathrm{weight}(y) = t$, whereas for almost all other plaintexts we have $\mathrm{weight}(y) \ne t$. So an adversary can simply choose the zero vector and some other message as challenge plaintexts and can then distinguish the ciphertexts with high probability.

More generally, we can distinguish any two ciphertexts $y_1$ and $y_2$ with probability 1 whenever $\mathrm{weight}(c_1) - \mathrm{weight}(c_2)$ is larger than $2t$. An attacker does not know the weight of the two codewords $c_1$ and $c_2$, but he knows that this event often occurs if he chooses the all-zero vector and some other vector as messages.

Quite recently, several suggestions have been made to achieve IND-CCA2 security, for example in [17], [26] and [7].

1.3.2 Niederreiter

This cryptosystem, initially proposed in [32], is a variation of the McEliece cryptosystem. It applies the same idea to the parity check matrix $H$ of the linear code $\mathcal{C}$: it uses a syndrome as the ciphertext, while the message is the error pattern. The encryption of Niederreiter is about ten times faster than the encryption of McEliece and can thus be used to construct a digital signature scheme.

As the security of this cryptoscheme is equivalent to the one of McEliece, we only show the algorithms required for key generation, encryption and decryption.

Algorithm 1.3.4: Niederreiter Key Generation
Input : n, k, q, t ∈ ℤ
Output: pk: the public key
        sk: the secret key
Data : P: an n × n permutation matrix over F_q
       A: an r × r matrix of rank r over F_q

1  Choose a linear [n, k, d]-code C over the finite field F_q uniformly at random from a class of linear codes such that a syndrome decoding algorithm SyndromeDecodeH() exists that can correct up to t ≤ ⌊(d−1)/2⌋ errors in polynomial time
2  Find a parity check matrix H ∈ F_q^{r×n} for the linear code C
3  Choose the permutation matrix P at random
4  Choose the matrix A at random
5  H' := AHP
6  pk := (H', n, k, t, q)
7  sk := (H, P, A)
8  return (pk, sk)


Algorithm 1.3.5: Niederreiter Encryption
Input : m: the plaintext, m ∈ F_q^n, with weight(m) = t
        pk: the public key
Output: y: the ciphertext, y ∈ F_q^r

1  y := H'm^⊤
2  return y

Algorithm 1.3.6: Niederreiter Decryption
Input : y: the ciphertext, y ∈ F_q^r
        sk: the secret key
Output: m: the plaintext, m ∈ F_q^n

1  d := A^{−1}y = A^{−1}(AHPm^⊤) = HPm^⊤
2  a := SyndromeDecodeH(d) = Pm^⊤
3  m^⊤ := P^{−1}a
4  return m


Quantum Computing

2.1 Motivations

Back in the 1960s, Intel co-founder Gordon Moore realized that the number of transistors on a chip doubles roughly every 18 months. This apparently unshakable trend, which became known as Moore's Law, persists even now.

As we continue to miniaturize computer components, we will eventually reach the point at which classical physical laws no longer apply. We are slowly approaching the physical limit, with computer parts nearing the size of an atom.

Since it appears that classical computation has natural limitations in terms of the size of computing devices, it is natural to investigate the behavior of objects at the micro- and nano-scale. At these sizes, quantum effects can no longer be neglected.

In the quantum realm, physics works quite differently from the predictable ways we are used to. One of the most interesting phenomena that happen when things are very small is that you cannot measure things the same way you do in the classical world: the measurement interacts with the observed entity.

Regarding the computer world, one fascinating phenomenon is so-called quantum tunneling. As transistors shrink to the size of only a few atoms, electrons may simply cross to the other side of a blocked passage. As a consequence, transistors become unreliable.


Apart from the inherent problems due to miniaturization, a huge factor that led to the study of quantum computing is the efficient simulation of quantum mechanical effects. If the prominent model of nature obeys the laws of quantum mechanics, why not use a quantum computer to simulate quantum mechanics itself?

This observation led scientists to think that perhaps computation in general, not only the simulation of the quantum world, could be much more efficient on a computer exploiting quantum effects than on a classical one. Building such a quantum computer, however, proved tricky, and the field developed slowly.

2.2 History

The first scientist to consistently speculate about quantum computers was Richard Feynman. During his lectures at Caltech, he discussed how to use quantum gates to build a quantum computer. At the time, however, he did not seem to recognize the full potential of quantum computers: he thought of the quantum computer only as a tiny computer which could compute everything a classical computer can.

When in 1994 Peter Shor surprised the world by describing a polynomial-time quantum algorithm for factoring integers [41] (while the best known classical algorithm runs in sub-exponential, yet super-polynomial, time with respect to the size of the input number), computer scientists began to recognize the full power of quantum computers and started to seriously think that quantum computation could do things that classical computation cannot. They also began to hypothesize that quantum machines could have the potential to solve known classical problems more efficiently.

Among the most spectacular achievements in quantum information theory to date, Shor's algorithm paved the way for the intense study of new quantum algorithms on one hand, and for the hard objective of building a stable and usable quantum computer on the other.

In the following years, quantum computation has been investigated as a means to create unconditionally secure communication links or to send information with an efficiency unachievable with classical computers. Also, quantum cryptographic protocols have been implemented in real-world systems.

On the other hand, we now know that the laws of quantum mechanics allow us to improve the solution of some problems and to construct games and random walks with new properties.

2.3 Overview

The relevant aspects of the mathematics of quantum systems, together with the adopted notation, are extensively discussed in Appendix A. In this section, instead, we gently introduce the topic of quantum computation.

In classical computers, bits are the smallest units of information. Quantum computers use qubits instead, which can also be set to one of the two values 0 and 1. A qubit can represent any two-level quantum system: the spin of an electron (up or down) or the polarization of a photon (horizontal or vertical) are just two relevant candidates for quantum bits.

The main difference with classical computing is that the qubit (and, physically, the underlying quantum system, such as the electron) does not have to be in just one of those two states: it can be in any proportion of both states at once. This is called superposition.

However, as soon as we take a measurement of the qubit, the physical system has to decide in which one of the two quantum states to be. In other words, measurements will give us just one single value, 0 or 1.

So, as long as it is unobserved, the qubit is in a superposition of probabilities for 0 and 1, and we cannot predict which one we will measure. While, for example, four classical bits can be in one of $2^4$ different configurations, four qubits can be in all $2^4$ combinations at once.

Another strange property that qubits can have is called entanglement: when multiple qubits have interacted in some way, each of the qubits reacts instantaneously to a change in another qubit's state, no matter how far apart they are. This counter-intuitive aspect makes it possible to deduce the state of a qubit indirectly, by observing only the state of its entangled partners rather than the qubit we are interested in.


While a classical logic gate takes a fixed number of input bits and produces a single output, a quantum gate manipulates an input superposition, rotates probabilities and produces another superposition as its output. So a quantum computer sets up some qubits, applies quantum gates to entangle them and manipulate probabilities, and finally measures the outcome, collapsing the superpositions to an actual sequence of 0's and 1's.

What this means is that we get all the calculations that are possible with our current setup done at the same time; at the end, we measure only one of the possible results, and this result is probably the one we want. By cleverly exploiting superposition and entanglement, this can be exponentially more efficient than a classical computer.

2.4 Quantum Information Theory

Quantum information theory is a fascinating new field of theoretical computer science. It is based on quantum physics and classical computer science, and its goal is to use the laws of quantum mechanics to develop more powerful algorithms and protocols. The main goal of this section is to present the various formal models used in quantum computer science. As we know, classical computation can be described using various models; among the most important ones we can count:

• Turing Machine
• Random Access Machine
• Boolean circuits
• Lambda calculus
• Universal programming languages

It has been shown that all these models are equivalent to each other; the model of a multi-tape Turing machine is regarded as the canonical one. This fact is captured by the following.

Hypothesis 2.4.1 (Church-Turing). Every function which would be naturally regarded as computable can be computed by a universal Turing machine.


Although stated as a hypothesis, this is one of the fundamental axioms of modern computer science. A universal Turing machine is able to simulate any other machine. Research in quantum information processing is motivated by an extended version of this hypothesis, formulated by Deutsch.

Hypothesis 2.4.2 (Church-Turing-Deutsch). Every physical process can be simulated by a universal computing device.

In other words, the principle states that a universal computing device can simulate every physical process. Thus, assuming the laws of quantum physics completely describe every physical process, a quantum Turing machine, being itself governed by those laws, should be able to simulate all quantum processes.

In quantum information theory, the model most commonly used to present algorithms is surely the Boolean circuit one. However, as we will see, sometimes the QRAM model can provide a conciseness that would be unachievable using only the circuit model. For this reason, we will review both of them in the following sections.

2.5 Boolean circuits

Boolean circuits are mathematical models mostly used for digital logic circuits. In the most general case, they are used to compute functions of the form f : {0,1}^m → {0,1}^n.

Proposition 2.5.1. Any Boolean function f : {0,1}^m → {0,1}^n is computable by a circuit composed of logic gates.

More formally, we can give the following definition of a Boolean circuit.

Definition 2.1 (Boolean circuit). A Boolean circuit is a finite directed acyclic graph with nodes labeled by input variables, output variables or logic gates.

In a Boolean circuit, an input variable node has no incoming arrows, while an output variable node has no outgoing arrows. A common choice for the logic gates is the set {∧, ∨, ¬}, which is indeed universal.

Definition 2.2. A set of gates is called universal if every Boolean function can be constructed using only gates from this set.


Because each of the previous gates can be replaced by a small circuit of NAND gates, the NAND gate alone is universal for classical computation.
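As a quick illustration of this universality claim, here is a small Python sketch (ours, purely illustrative) that rebuilds NOT, AND and OR out of NAND alone and checks them against their truth tables:

# NAND is universal: NOT, AND and OR can all be expressed with it.
def NAND(a, b):
    return 1 - (a & b)

def NOT(a):
    return NAND(a, a)

def AND(a, b):
    return NOT(NAND(a, b))

def OR(a, b):
    return NAND(NOT(a), NOT(b))  # De Morgan: a OR b = NOT(NOT a AND NOT b)

for a in (0, 1):
    for b in (0, 1):
        assert AND(a, b) == (a & b)
        assert OR(a, b) == (a | b)
    assert NOT(a) == 1 - a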

Figure 2.1: An example of a classical Boolean circuit computing the sum of the bits x1 and x2.

A logic gate is described by its number of input bits, its number of output bits and its input/output mapping (usually given through a truth table). For example, both AND and OR take 2 input bits and produce 1 output bit, while NOT takes 1 input bit and produces 1 output bit.

2.5.1 Reversible circuits

In physical reality, a theoretical bit, represented in Dirac notation as |0⟩ or |1⟩, is implemented by one or more particles. Similarly, a gate that manipulates the bit representation is implemented by a physical object that alters those particles in some way. While the laws of physics are reversible with respect to time, this is not true of most gates coming from the classical world.

Take for example an AND gate and suppose its output is |0⟩. We cannot infer what its inputs were: the process is not reversible. On the other hand, a NOT gate is trivially reversible: its input can be determined from its output. In broad terms, a reversible gate is a gate for which we can derive the inputs given the outputs.

In order to obtain this property, the number of inputs must be equal to the number of outputs, and the input/output function must be a bijection.

The issues related to reversible gates were deeply studied long before the advent of quantum computing. In 1961 Landauer [27] showed, from a thermodynamical point of view, that in any logically irreversible operation the entropy of a Boolean circuit increases and an associated amount of energy is dissipated as heat. In 1973 Bennett [3], at IBM Research, showed that a universal Turing machine could be made both logically and thermodynamically reversible.

The quest for Boolean gates that were both reversible and universal was motivated by these thermodynamical aspects: if such a set of gates exists, it could be possible, at least theoretically, to have circuits doing general computation without dissipating any energy.

The quest was indeed successful, even though, from a practical point of view, energy dissipation did not prove to be a major problem. These results became fundamental decades later for the quantum model, so we will briefly introduce the fundamental notions on the topic.

Definition 2.3. A Boolean gate G is said to be reversible if it has the same number of inputs as outputs and its mapping from input strings to output strings is a bijection.

The first example of a reversible gate is the CNOT (controlled-NOT) gate. It has 2 input bits and 2 output bits:

(x1, x2) ↦ (x1, x1 ⊕ x2)

Its associated truth table is shown in Table 2.1.

Input    Output
|00⟩     |00⟩
|01⟩     |01⟩
|10⟩     |11⟩
|11⟩     |10⟩

Table 2.1: CNOT truth table

In other words, x1 is always passed through and, if x1 = |1⟩, a NOT is applied to x2. x1 is said to be the control bit, while x2 is the target.
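The reversibility of the CNOT gate is easy to check mechanically. The following Python sketch (illustrative only) implements the mapping of Table 2.1 and verifies that it is a bijection on {0,1}^2 and its own inverse:

# CNOT on a pair of classical bits: control passed through,
# target flipped iff the control is 1.
def cnot(x1, x2):
    return (x1, x1 ^ x2)

inputs = [(a, b) for a in (0, 1) for b in (0, 1)]
outputs = [cnot(a, b) for a, b in inputs]
assert sorted(outputs) == sorted(inputs)  # bijection on {0,1}^2

# Applying CNOT twice undoes it: the gate is its own inverse.
for a, b in inputs:
    assert cnot(*cnot(a, b)) == (a, b)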

The CCNOT (controlled-controlled-NOT) gate, also known as the Toffoli gate, has 3 input bits and 3 output bits.


(x1, x2, x3) ↦ (x1, x2, (x1 ∧ x2) ⊕ x3)

Its associated truth table is shown in Table 2.2.

Input    Output
|000⟩    |000⟩
|001⟩    |001⟩
|010⟩    |010⟩
|011⟩    |011⟩
|100⟩    |100⟩
|101⟩    |101⟩
|110⟩    |111⟩
|111⟩    |110⟩

Table 2.2: CCNOT truth table

In other words, the first two bits are passed through directly and, if both x1 = |1⟩ and x2 = |1⟩, x3 is negated.
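Again, the behaviour is simple enough to verify exhaustively. The Python sketch below (illustrative only) implements the CCNOT mapping, reproduces Table 2.2 and checks that the gate is its own inverse:

# CCNOT (Toffoli): negate the third bit iff the first two are both 1.
def ccnot(x1, x2, x3):
    return (x1, x2, (x1 & x2) ^ x3)

for x1 in (0, 1):
    for x2 in (0, 1):
        for x3 in (0, 1):
            out = ccnot(x1, x2, x3)
            print(f'|{x1}{x2}{x3}> -> |{out[0]}{out[1]}{out[2]}>')
            assert ccnot(*out) == (x1, x2, x3)  # reversible: self-inverse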

One particularly important theorem is the following one.

Theorem 2.5.2. The CCNOT gate is universal: any standard AND, OR, NOT circuit for a function {0,1}^n → {0,1}^m may be efficiently transformed into a reversible one consisting only of CCNOT gates.

Let us see, for example, how we can simulate a NAND gate using a CCNOT:

(x1, x2, |1⟩) ↦ (x1, x2, NAND(x1, x2))

We can also use it to simulate a DUPE gate, which basically duplicates the state of a bit:

(|1⟩, x2, |0⟩) ↦ (|1⟩, x2, x2)

While a DUPE gate is useless in classical computation, where a single wire, and hence a single bit, can branch into separate wires, it will become very useful in quantum computation.
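A small Python check (illustrative only, reusing the ccnot function defined above) confirms both constructions: fixing the third wire to 1 yields NAND on the third output, while fixing the first wire to 1 and the third to 0 copies the second input onto the third output:

# Same CCNOT mapping as above, repeated for self-containment.
def ccnot(x1, x2, x3):
    return (x1, x2, (x1 & x2) ^ x3)

for a in (0, 1):
    for b in (0, 1):
        # NAND: third wire fixed to 1, result read on the third output wire.
        assert ccnot(a, b, 1)[2] == 1 - (a & b)
    # DUPE: first wire fixed to 1, third to 0; the third output wire
    # becomes a copy of the second input.
    assert ccnot(1, a, 0)[2] == a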


Another interesting reversible gate is the CSWAP gate, also called the Fredkin gate, which performs a controlled swap of two qubits:

(x1, x2, x3) ↦ (x1, x3, x2) if x1 = |1⟩, and (x1, x2, x3) otherwise

In other words, if x1 is active, x2 and x3 are swapped.

This gate is also universal, just like the CCNOT. For example, an AND gate may be built in this way:

(x1, x2, |0⟩) ↦ (x1, ·, AND(x1, x2))

where we are not interested in the state of the second wire.
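As before, the construction can be checked exhaustively with a few lines of Python (illustrative only):

# CSWAP (Fredkin): swap x2 and x3 iff the control x1 is 1.
def cswap(x1, x2, x3):
    return (x1, x2, x3) if x1 == 0 else (x1, x3, x2)

for a in (0, 1):
    for b in (0, 1):
        assert cswap(a, b, 0)[2] == (a & b)         # third wire = AND(x1, x2)
        assert cswap(*cswap(a, b, 0)) == (a, b, 0)  # self-inverse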

From the introduction of the previous gates, we can notice an interesting aspect. Reversible gates often use so-called ancilla bits, i.e. input bits with a fixed value which are there for the sole purpose of assisting the computation. This is the case, for example, of the |0⟩ on the third wire of the last circuit.

Also, a gate may output results we are not interested in, as highlighted by the second output wire of the last circuit. We call those useless bits garbage bits.

From all the previous discussion we can also derive an important theorem.

Theorem 2.5.3. Any classical circuit C computing some Boolean function f : {0,1}^n → {0,1}^m can be converted into a reversible circuit computing a function g : {0,1}^{n+a} → {0,1}^{m+g}, where a and g represent the numbers of ancilla and garbage bits respectively, and are such that n + a = m + g. For instance, in the AND construction above we have n = 2, a = 1 (the |0⟩ wire), m = 1 and g = 2 (the two discarded output wires), so that n + a = m + g = 3.

2.5.2 Randomized circuits

As with reversible computation, interest in randomized computation also began long before the advent of quantum computation. Indeed, the study of randomized algorithms was stimulated by the discovery of a randomized primality test in 1977, followed soon after by Rabin's proposal of a randomized version of Miller's primality test.

It is very easy to upgrade the circuit model of computation to a randomized model by introducing a single new gate, called COIN (or just $). It has 0 inputs and 1 output; the output is a fair coin flip, viz. |0⟩ with probability 1/2 and |1⟩ with probability 1/2. We will write this state as

1/2 |0⟩ + 1/2 |1⟩

A randomized circuit will look like the following one: a COIN gate produces a bit x1, which acts as the control of a CNOT gate whose target x2 is initialized to |0⟩.

The output of this circuit is |00⟩ with probability 1/2 (if the output of the coin flip is 0) and |11⟩ with probability 1/2 (if the output of the coin flip is 1). Let us analyze it step-by-step:

1. After the COIN, x1 = 1/2 |0⟩ + 1/2 |1⟩.

2. Before the CNOT, x2 = 1 |0⟩ + 0 |1⟩.

3. After the CNOT, we have to analyze the state of the two bits together: they are correlated. The analysis of the circuit should indeed be based on the joint state of x1 and x2, i.e. 1/2 |00⟩ + 1/2 |11⟩.

In reality, we should have kept track of the joint state of the bits all along. So, for example, at step 2 we should have denoted the state of the bits as 1/2 |00⟩ + 1/2 |10⟩.
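This bookkeeping is mechanical enough to be automated. The following Python sketch (illustrative only) tracks the joint distribution of (x1, x2) through the COIN and CNOT gates as a dictionary mapping basis states to probabilities:

from collections import defaultdict

# Joint state of (x1, x2) as {basis string: probability}; both bits start at 0.
state = {'00': 1.0}

# COIN on x1: each configuration splits into two equally likely branches.
new_state = defaultdict(float)
for s, p in state.items():
    new_state['0' + s[1]] += p / 2
    new_state['1' + s[1]] += p / 2
state = dict(new_state)                    # {'00': 0.5, '10': 0.5}

# CNOT with x1 as control and x2 as target.
new_state = defaultdict(float)
for s, p in state.items():
    x1, x2 = int(s[0]), int(s[1])
    new_state[f'{x1}{x1 ^ x2}'] += p
state = dict(new_state)

print(state)                               # {'00': 0.5, '11': 0.5}
assert abs(sum(state.values()) - 1.0) < 1e-12  # probabilities sum to 1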

One obvious but important remark to make is that, at any time, the probability coefficients of a state are non-negative and sum up to 1. Also, we should stress that when we observe the output we get rid of the probabilities: the state collapses into one of its possible outputs. For example, suppose we have a state like

|ψ⟩ = 5/8 |000⟩ + 1/8 |100⟩ + 1/8 |011⟩ + 1/8 |111⟩


If at this point we were to measure just the first bit, the probability that we would observe a |0⟩ is 5/8 + 1/8 = 3/4. Supposing that we indeed observed |0⟩, using conditional probability we can deduce that the state of the system collapses to

(5/8)/(3/4) |000⟩ + (1/8)/(3/4) |011⟩ = 5/6 |000⟩ + 1/6 |011⟩

2.5.3 Quantum circuits

In quantum computation, we put together the results coming from the two previously seen types of computation. Indeed, any reversible circuit is automatically a quantum circuit. However, quantum circuits offer much more diversity in terms of the allowed operations and resemble, in many aspects, randomized circuits. In fact, as we know from quantum mechanics, quantum gates are just unitary transformations applied to qubits; thus, they must be reversible. Also, just as we have done with randomized computation, we can write a qubit in the more general form

|ψ⟩ = α |0⟩ + β |1⟩    (2.1)

While in the random world the coefficients α, β are non-negative real numbers called probabilities, in the quantum world α, β ∈ ℂ are called amplitudes. An equation like (2.1) is called a superposition. Now, the restriction on the amplitudes becomes |α|² + |β|² = 1.

By oversimplifying, this is what we would have obtained by taking randomized computation and allowing probabilities to be complex numbers rather than just real, non-negative numbers. In practice, we can often treat α and β as real (positive and negative) numbers without losing much. More details on the mathematics involved in the quantum world can be found in Appendix A.
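The following numerical sketch (illustrative only; the Hadamard matrix H used here is a standard single-qubit unitary, not yet introduced in the text) shows a state with a negative real amplitude and checks that normalization is preserved:

import numpy as np

# Hadamard gate: a unitary with real, partly negative entries.
H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)

psi = np.array([0.0, 1.0])        # amplitudes (alpha, beta) of the state |1>
psi = H @ psi                      # now alpha = 1/sqrt(2), beta = -1/sqrt(2)

print(psi)                         # [ 0.70710678 -0.70710678]
print(np.sum(np.abs(psi) ** 2))    # 1.0: |alpha|^2 + |beta|^2 is preserved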

Just like in randomized computation, to analyze a quantum circuit we need to keep track of the joint state of all the input qubits as they proceed through the gates. The correlation between the qubits is now called entanglement. When we measure the output qubits, instead, the rule is probabilistic. Suppose for example that the final state is
