Introduction to Probabilistic and Quantum Programming
Part III
Ugo Dal Lago
BISS 2014, Bertinoro
Section 1
The λ-Calculus as a Functional Language
Minimal Syntax and Dynamics
- Terms:
      M ::= x | MN | λx.M
- Substitution:
      x{x/M} = M        y{x/M} = y (when y ≠ x)
      (NL){x/M} = (N{x/M})(L{x/M})        (λy.N){x/M} = λy.(N{x/M})
- CBN Operational Semantics:
      (λx.M)N → M{x/N}        if M → N, then ML → NL
- Values:
      V ::= λx.M
- CBV Operational Semantics:
      (λx.M)V → M{x/V}        if M → N, then ML → NL        if M → N, then VM → VN
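The CBN rules above can be sketched as a tiny Python interpreter (our own illustration, not part of the slides), with terms represented as nested tuples. Substitution here is naive (no capture check), which is safe only for the closed terms used below:

```python
# Terms: ('var', x), ('lam', x, body), ('app', f, a).

def subst(term, x, value):
    """M{x/V}: replace free occurrences of x in term by value (naive)."""
    tag = term[0]
    if tag == 'var':
        return value if term[1] == x else term
    if tag == 'lam':
        # stop if x is rebound; otherwise recurse (no capture check)
        return term if term[1] == x else ('lam', term[1], subst(term[2], x, value))
    return ('app', subst(term[1], x, value), subst(term[2], x, value))

def step_cbn(term):
    """One CBN step, or None if the term does not reduce."""
    if term[0] == 'app':
        f, a = term[1], term[2]
        if f[0] == 'lam':                  # (λx.M)N → M{x/N}
            return subst(f[2], f[1], a)
        r = step_cbn(f)                    # M → N implies ML → NL
        return None if r is None else ('app', r, a)
    return None

# (λx.x)(λy.y) reduces in one step to λy.y
I = ('lam', 'y', ('var', 'y'))
assert step_cbn(('app', ('lam', 'x', ('var', 'x')), I)) == I
```

CBV differs only in requiring the argument to be a value before firing the β rule, plus the extra rule reducing under an applied value.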
Observations
- Not all redexes are reduced.
- For every M there is at most one N such that M → N.
- It is easy to find a term M such that M = N0 → N1 → N2 → ...
- There is a problem with substitutions: naively, (λx.λy.x)(yz) → λy.yz, and the free y of the argument is captured by the inner binder; the intended result is λw.yz.
- The standard solution is to consider terms modulo so-called α-equivalence: renaming bound variables turns a term into an indistinguishable one, e.g.
      λx.λy.xxy ≡ λz.λy.zzy.
- α-equivalence is not necessary if we assume to work with closed terms.
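The capture problem and its fix can be made concrete with a capture-avoiding substitution in Python (our own sketch; the fresh-name scheme `w0`, `w1`, ... is an arbitrary choice):

```python
import itertools

def free_vars(t):
    if t[0] == 'var':
        return {t[1]}
    if t[0] == 'lam':
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def subst(t, x, m):
    """Capture-avoiding t{x/m}: rename a bound variable when it would
    capture a free variable of m."""
    if t[0] == 'var':
        return m if t[1] == x else t
    if t[0] == 'app':
        return ('app', subst(t[1], x, m), subst(t[2], x, m))
    y, body = t[1], t[2]
    if y == x:
        return t
    if y in free_vars(m):                    # rename y to a fresh w
        fresh = next(v for v in ('w%d' % i for i in itertools.count())
                     if v not in free_vars(m) | free_vars(body))
        body, y = subst(body, y, ('var', fresh)), fresh
    return ('lam', y, subst(body, x, m))

# (λy.x){x/(y z)}: the binder y is renamed, giving λw0.(y z), not λy.(y z)
inner = ('lam', 'y', ('var', 'x'))
arg = ('app', ('var', 'y'), ('var', 'z'))
assert subst(inner, 'x', arg) == ('lam', 'w0', ('app', ('var', 'y'), ('var', 'z')))
```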
Computing Simple Functions on N, in CBV
- Natural Numbers:
      ⌜0⌝ = λx.λy.x;
      ⌜n+1⌝ = λx.λy.y⌜n⌝.
- Finite Set A = {a1, ..., an}:
      ⌜ai⌝ = λx1...λxn.xi.
- Pairs:
      ⟨M, N⟩ = λx.xMN.
- Fixed-Point Combinator:
      Θ = λx.λy.y(λz.(xx)yz);
      H = ΘΘ.
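These encodings can be transcribed directly as Python lambdas (an illustration of ours; Python's strict evaluation stands in for CBV, and `num`/`unnum` are helper names we introduce):

```python
def num(n):
    """⌜n⌝: ⌜0⌝ = λx.λy.x, ⌜n+1⌝ = λx.λy.y⌜n⌝."""
    return (lambda x: lambda y: x) if n == 0 else (lambda x: lambda y: y(num(n - 1)))

def unnum(c):
    """Decode a numeral back to a Python int: a numeral applied to a
    base case and a predecessor-consuming function."""
    return c(0)(lambda pred: 1 + unnum(pred))

def pair(m, n):
    """⟨M, N⟩ = λx.xMN."""
    return lambda x: x(m)(n)

assert unnum(num(5)) == 5
assert pair(1, 2)(lambda a: lambda b: a) == 1   # first projection
```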
Computing Simple Functions on N, in CBV
- Successor:
      SUCC = λz.λx.λy.yz
  Indeed:
      SUCC ⌜n⌝ → λx.λy.y⌜n⌝ = ⌜n+1⌝
- Projections:
      PROJ^k_i = λx.x(λy1...λyk.yi)
  Indeed:
      PROJ^k_i ⟨⌜n1⌝, ..., ⌜nk⌝⟩ → ⟨⌜n1⌝, ..., ⌜nk⌝⟩(λy1...λyk.yi)
                                 → (λy1...λyk.yi)⌜n1⌝ ... ⌜nk⌝
                                 →* ⌜ni⌝.
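Both reductions can be replayed on the same Python-lambda encoding (our own sketch; `num`, `unnum`, and `pair` are helper names we introduce):

```python
def num(n):
    """⌜n⌝: ⌜0⌝ = λx.λy.x, ⌜n+1⌝ = λx.λy.y⌜n⌝."""
    return (lambda x: lambda y: x) if n == 0 else (lambda x: lambda y: y(num(n - 1)))

def unnum(c):
    """Decode a numeral back to a Python int."""
    return c(0)(lambda pred: 1 + unnum(pred))

def pair(m, n):
    """⟨M, N⟩ = λx.xMN."""
    return lambda x: x(m)(n)

SUCC = lambda z: lambda x: lambda y: y(z)          # λz.λx.λy.yz
PROJ21 = lambda x: x(lambda y1: lambda y2: y1)     # PROJ^2_1 = λx.x(λy1.λy2.y1)

assert unnum(SUCC(num(3))) == 4
assert unnum(PROJ21(pair(num(7), num(9)))) == 7
```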
A Simple Form of Recursion
- Suppose you have a term Mf which computes the function f : N → N, and you want to iterate it, i.e., you want to compute g : N × N → N such that
      g(0, n) = n;
      g(m+1, n) = f(g(m, n)).
- We need a form of recursion in the language. Observe that:
      HV = (ΘΘ)V → (λy.y(λz.(ΘΘ)yz))V → V(λz.HVz)
- The function g can be computed by way of
      Mg = H(λx.λy.y(λz.λw.zw(λq.Mf(x⟨q, w⟩))))
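The combinator H = ΘΘ can be written directly in Python (our sketch; the η-expansion λz.… in Θ is exactly what keeps strict evaluation from looping), and used to build the iterator g on plain ints rather than encoded numerals:

```python
# Θ = λx.λy.y(λz.(xx)yz), H = ΘΘ — a CBV-friendly fixed-point combinator.
THETA = lambda x: lambda y: y(lambda z: (x(x))(y)(z))
H = THETA(THETA)

def make_g(f):
    """g(0, n) = n; g(m+1, n) = f(g(m, n)), via the fixed point of H.
    `rec` is the recursive occurrence of g itself."""
    body = lambda rec: lambda mn: mn[1] if mn[0] == 0 else f(rec((mn[0] - 1, mn[1])))
    return H(body)

g = make_g(lambda k: k + 2)      # f(k) = k + 2, so g(m, n) = n + 2m
assert g((3, 5)) == 11
```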
A Simple Form of Recursion
      Mg = H(λx.λy.y(λz.λw.zw(λq.Mf(x⟨q, w⟩))))

      Mg⟨⌜m+1⌝, ⌜n⌝⟩ →* ⟨⌜m+1⌝, ⌜n⌝⟩(λz.λw.zw(λq.Mf((λx.Mgx)⟨q, w⟩)))
                      → (λz.λw.zw(λq.Mf((λx.Mgx)⟨q, w⟩)))⌜m+1⌝⌜n⌝
                     →* ⌜m+1⌝⌜n⌝(λq.Mf((λx.Mgx)⟨q, ⌜n⌝⟩))
                     →* (λq.Mf((λx.Mgx)⟨q, ⌜n⌝⟩))⌜m⌝
                     →* Mf((λx.Mgx)⟨⌜m⌝, ⌜n⌝⟩)
                     →* Mf(Mg⟨⌜m⌝, ⌜n⌝⟩)
Universality
- One could go on, and show that the following schemes can all be encoded:
  - Composition;
  - Primitive Recursion;
  - Minimization.
- This implies that all partial recursive functions can be computed in the λ-calculus.

Theorem
The λ-calculus is Turing-complete, both when CBV and CBN are considered.
Why is This a Faithful Model of Functional Computation?
- Data, conditionals, basic functions, and recursion can all be encoded into the λ-calculus. We can then give programs "informally", in our favourite functional language.
- Take, as an example, the usual recursive program computing the factorial of a natural number:
      let rec fact x = if (x==0) then 1 else x*(fact (x-1))
- This can indeed be turned into a λ-term (where M∗ and M−1 are terms for multiplication and the predecessor, respectively):
      let rec fact x = if (x==0) then 1 else x*(fact (x-1))
      H(λfact.λx.if (x==0) then 1 else x*(fact (x-1)))
      H(λfact.λx.x ⌜1⌝ (x*(fact (x-1))))
      H(λfact.λx.x ⌜1⌝ (M∗ x (fact (x-1))))
      H(λfact.λx.x ⌜1⌝ (M∗ x (fact (M−1 x))))
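The factorial term can be run in Python with the same CBV combinator H = ΘΘ (a sketch of ours, using plain ints instead of the encoded numerals):

```python
# Θ = λx.λy.y(λz.(xx)yz); the inner λz delays (xx) under strict evaluation.
THETA = lambda x: lambda y: y(lambda z: (x(x))(y)(z))
H = THETA(THETA)

# fact = H(λfact.λx. if x==0 then 1 else x * fact(x-1))
fact = H(lambda rec: lambda x: 1 if x == 0 else x * rec(x - 1))
assert fact(5) == 120
```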
More About Semantics
- The semantics we have adopted so far is said to be small-step, since it derives assertions of the form M → N, each corresponding to one atomic computation step.
- There is an equivalent way of formulating the same concept, big-step semantics, in which we "jump directly to the result":
  - CBN: V ⇓ V; and if M ⇓ λx.N and N{x/L} ⇓ V, then ML ⇓ V.
  - CBV: V ⇓ V; and if M ⇓ λx.N, L ⇓ V, and N{x/V} ⇓ W, then ML ⇓ W.

Theorem
In both CBN and CBV, M →* V if and only if M ⇓ V.

- If M ⇓ V for some V, then we write M ⇓; otherwise M ⇑.
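The CBV big-step rules read off directly as a recursive evaluator (our own sketch over the tuple terms used earlier; closed terms only, naive substitution, and a divergent term such as Ω simply exhausts Python's recursion limit):

```python
def subst(t, x, v):
    """Naive substitution t{x/v} (safe for closed arguments)."""
    if t[0] == 'var':
        return v if t[1] == x else t
    if t[0] == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, v))
    return ('app', subst(t[1], x, v), subst(t[2], x, v))

def eval_cbv(t):
    """M ⇓ V for CBV, following the three premises of the rule."""
    if t[0] == 'lam':                        # V ⇓ V
        return t
    f = eval_cbv(t[1])                       # M ⇓ λx.N
    v = eval_cbv(t[2])                       # L ⇓ V
    return eval_cbv(subst(f[2], f[1], v))    # N{x/V} ⇓ W

I = ('lam', 'x', ('var', 'x'))
assert eval_cbv(('app', ('app', I, I), I)) == I    # (II)I ⇓ I
```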
Program Equivalence
- A fundamental question in programming language theory is the following: when can two programs be considered equivalent?
- A satisfactory answer to this question would enable us:
  - To validate program optimizations: you could be sure that the transformations you are performing turn programs into equivalent ones.
  - To perform program verification: just compare the program at hand with another (abstract) program corresponding to the underlying specification.
- But what is the right notion of program equivalence?
  - M ≡ N iff M = N? Trivial!
  - M ≡ N iff either M ⇓ V and N ⇓ V for some V, or M ⇑ and N ⇑? Does not equate values which behave the same.
  - M ≡ N iff M and N "have the same semantics"? Nice, but we do not have time to talk about programs' semantics.
Context Equivalence
- What we need is a notion of equivalence which equates terms with the same behaviour.
- More specifically, we would like to equate those terms which have the same observable behaviour in every context.
- Observable Behaviour. Two terms M and N have the same observable behaviour whenever M ⇓ iff N ⇓.
- Contexts. They are defined as terms, but they contain a unique placeholder [·]:
      C ::= [·] | CM | MC | λx.C
  The term C[M] is the one obtained from C by substituting [·] with M.
- We can then stipulate that M ≡ N iff for every C, C[M] ⇓ iff C[N] ⇓.
Context Equivalence
- Examples:
      λx.x ≡ λx.(λy.y)x
      (λx.xx)(λx.xx) = Ω ≢ λx.Ω
- Exercise: is it true that for every closed M, M ≡ λx.Mx?
- Proving that two terms are not context equivalent seems to be easier than proving that they are.
- We need some new techniques to overcome this problem.
Applicative Bisimulation
[Diagram: on the left, the set of terms (M, N, L, ...); on the right, the set of values (V, W, Z, ...). A term M evaluates ("eval") to a value V = λx.N; feeding it an argument W produces the term N{x/W}, from which the evaluation game starts again.]
Applicative Bisimulation
- Simulation.
  - If M R N and M ⇓ λx.L, then N ⇓ λx.P and for every argument Q, (L{Q/x}) R (P{Q/x}).
- Bisimulation.
  - If M R N then:
    - If M ⇓ λx.L, then N ⇓ λx.P and for every argument Q, (L{Q/x}) R (P{Q/x}).
    - If N ⇓ λx.L, then M ⇓ λx.P and for every argument Q, (L{Q/x}) R (P{Q/x}).
- Similarity: the union of all simulations, denoted ≾.
- Bisimilarity: the union of all bisimulations, denoted ∼.

Theorem (Full Abstraction)
M ≡ N iff M ∼ N.
Applicative Bisimulation
- Suppose we want to prove that, indeed, M = λx.x ≡ λx.(λy.y)x = N.
- In principle, we should consider all possible contexts.
- But we can also exploit full abstraction, and just exhibit one applicative bisimulation containing (M, N).
- Here it is:
      R = {(M, N)}
        ∪ {(L, (λx.x)L) | L is a closed term}
        ∪ {((λx.x)L, L) | L is a closed term}
        ∪ {(L, L) | L is a closed term}
- Exercise: prove that R is indeed an applicative bisimulation.
Type Systems
- We haven't considered any form of static typing for our programs, so far.
- Some concrete programming languages are designed to do type-checking only at runtime, i.e., they do not have any static type system:
  - Scheme, Python, etc.
- Actually, endowing the λ-calculus with a type system is relatively simple.
  - Types: A, B ::= bool | nat | A → B | A × B | ···
  - Judgments: x1 : A1, ..., xn : An ⊢ M : B.
  - Rules (examples):
        Γ, x : A ⊢ x : A
        if Γ ⊢ M : A → B and Γ ⊢ N : A, then Γ ⊢ MN : B
        if Γ, x : A ⊢ M : B, then Γ ⊢ λx.M : A → B
        Γ ⊢ H : (A → A) → A
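Read syntax-directedly, the three structural rules above become a small type checker (a sketch of ours; we annotate the λ-binder with its type, an assumption the slides do not make, so that inference stays trivial):

```python
def infer(gamma, t):
    """Return the type of t under context gamma (a dict), or None.
    Types: base types as strings, arrows as ('->', A, B)."""
    if t[0] == 'var':                    # Γ, x:A ⊢ x : A
        return gamma.get(t[1])
    if t[0] == 'lam':                    # Γ, x:A ⊢ M : B gives Γ ⊢ λx:A.M : A → B
        x, a, body = t[1], t[2], t[3]    # annotated binder (our assumption)
        b = infer({**gamma, x: a}, body)
        return None if b is None else ('->', a, b)
    # application: Γ ⊢ M : A → B and Γ ⊢ N : A give Γ ⊢ MN : B
    f = infer(gamma, t[1])
    a = infer(gamma, t[2])
    return f[2] if f is not None and f[0] == '->' and f[1] == a else None

# λx:nat.x has type nat → nat; a free variable has no type
assert infer({}, ('lam', 'x', 'nat', ('var', 'x'))) == ('->', 'nat', 'nat')
assert infer({}, ('var', 'x')) is None
```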
Section 2
Probabilistic λ-Calculi
Probabilistic λ-Calculus: Syntax and Operational Semantics
- Terms: M, N ::= x | λx.M | MN | M ⊕ N;
- Values: V ::= λx.M;
- How should we define the semantics of a probabilistic term M?
- Informally, M ⊕ N rewrites to M or to N, each with probability 1/2: this is just like flipping a coin and choosing one of the two terms according to the result.
- A value distribution is a function D from the set V of values to R such that
      ∑_{V ∈ V} D(V) ≤ 1.
- The fact that the sum can be strictly smaller than 1 is a way to reflect (the probability of) divergence.
- The empty value distribution is denoted ∅.
- Useful notation for finite distributions: {V1^p1, ..., Vn^pn}.
- Value distributions can be ordered pointwise.
- S(D) is the support of D, i.e., the set of values to which D assigns nonnull probability.
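These notions have a direct finite representation (our own sketch): a distribution as a dictionary from values to probabilities summing to at most 1, the "missing" mass being the probability of divergence.

```python
def mass(d):
    """∑_V D(V): the total probability of convergence."""
    return sum(d.values())

def support(d):
    """S(D): the set of values with nonnull probability."""
    return {v for v, p in d.items() if p > 0}

def leq(d, e):
    """The pointwise order on distributions: D ≤ E."""
    return all(p <= e.get(v, 0) for v, p in d.items())

d = {'I': 0.5}           # {I^(1/2)}
e = {'I': 0.75}          # {I^(3/4)}
assert mass(d) == 0.5 and support(d) == {'I'} and leq(d, e)
```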
Probabilistic λ-Calculus: Syntax and Operational Semantics
- Approximation (Big-Step) Semantics, CBN:
      M ⇓ ∅        V ⇓ {V^1}
      if M ⇓ D and N ⇓ E, then M ⊕ N ⇓ ½D + ½E
      if M ⇓ D and P{x/N} ⇓ E_{λx.P} for every λx.P ∈ S(D), then
          MN ⇓ ∑_{λx.P ∈ S(D)} D(λx.P) · E_{λx.P}
- Example: consider M = I ⊕ ((II) ⊕ Ω), where I = λx.x and Ω = (λx.xx)(λx.xx).
      M ⇓ ∅        M ⇓ {I^(1/2)}        M ⇓ {I^(3/4)}
- Semantics: [[M]] = sup_{M ⇓ D} D;
- Variations: small-step semantics, CBV.
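The example's approximations can be reproduced mechanically with a fuel-bounded variant of the ⇓ rules (a sketch of ours, not the rules themselves: running out of fuel plays the role of the axiom M ⇓ ∅, and larger fuel yields larger distributions converging to [[M]]):

```python
from fractions import Fraction

def subst(t, x, m):
    if t[0] == 'var':
        return m if t[1] == x else t
    if t[0] == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, m))
    return (t[0], subst(t[1], x, m), subst(t[2], x, m))

def ev(t, fuel):
    """A value distribution (dict) approximating [[t]], CBN."""
    if fuel == 0:
        return {}                              # M ⇓ ∅
    if t[0] == 'lam':
        return {t: Fraction(1)}                # V ⇓ {V^1}
    if t[0] == 'plus':                         # M ⊕ N ⇓ ½D + ½E
        out = {}
        for branch in (t[1], t[2]):
            for v, p in ev(branch, fuel - 1).items():
                out[v] = out.get(v, Fraction(0)) + p / 2
        return out
    out = {}                                   # application, CBN
    for f, p in ev(t[1], fuel - 1).items():
        for v, q in ev(subst(f[2], f[1], t[2]), fuel - 1).items():
            out[v] = out.get(v, Fraction(0)) + p * q
    return out

I = ('lam', 'x', ('var', 'x'))
delta = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))
M = ('plus', I, ('plus', ('app', I, I), ('app', delta, delta)))

assert ev(M, 1) == {}                 # M ⇓ ∅
assert ev(M, 4)[I] == Fraction(3, 4)  # M ⇓ {I^(3/4)}; Ω contributes nothing
```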
Probabilistic λ-Calculus: Syntax and Operational Semantics
- As an example, consider the following recursive program, and call it M:
      let rec fancy x = x (+) (fancy (x+1))
- One can easily check that:
      M⌜0⌝ ⇓ ∅        M⌜0⌝ ⇓ {⌜0⌝^(1/2)}        M⌜0⌝ ⇓ {⌜0⌝^(1/2), ⌜1⌝^(1/4)}        ···
- As a consequence:
      [[M⌜0⌝]] = sup_{M⌜0⌝ ⇓ D} D = {⌜0⌝^(1/2), ⌜1⌝^(1/4), ⌜2⌝^(1/8), ...}
- As an exercise, find a term N such that [[N]] = {⌜1⌝^(1/2), ⌜4⌝^(1/4), ⌜9⌝^(1/8), ...}.
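The claimed approximations can be checked directly: with probability ½ fancy returns its argument, otherwise it recurses on x+1, so ⌜n⌝ receives probability 1/2^(n+1). A plain-Python sketch of the finite approximations (not the encoded term itself):

```python
from fractions import Fraction

def fancy_approx(x, fuel):
    """Distribution of fancy(x) when at most `fuel` coin flips resolve;
    the remaining mass is still 'undetermined', as in the ⇓ judgments."""
    if fuel == 0:
        return {}
    rest = fancy_approx(x + 1, fuel - 1)
    d = {x: Fraction(1, 2)}
    for v, p in rest.items():
        d[v] = d.get(v, Fraction(0)) + p / 2
    return d

assert fancy_approx(0, 3) == {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8)}
```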
Contextual Equivalence in a Probabilistic Setting
- In analogy to the deterministic setting, we observe convergence, which now becomes quantitative.
- The observable behaviour of M is given by ∑[[M]] = ∑_{V ∈ V} [[M]](V).
- But ∑[[M]] is nothing more than the probability of convergence of the term M.
- Contexts are extended to reflect the presence of probabilistic choice:
      C ::= [·] | CM | MC | λx.C | C ⊕ M | M ⊕ C
- Analogously to the deterministic case, M ≡ N iff for every C, the following equality holds:
      ∑[[C[M]]] = ∑[[C[N]]].
Motivating Example: Perfect Security
- We want to formalize the notion of perfect security for an encryption scheme.
- Take M and C as the random variables modeling messages and cryptograms, respectively.
- Of course, the two variables are not independent, at least in principle.
- If we want the underlying scheme to be secure, we need to be sure that they are as independent as possible, as expressed by any of the following conditions:
      Pr(M = m | C = c) = Pr(M = m).
      Pr(C = c | M = m) = Pr(C = c).
      Pr(C = c | M = m0) = Pr(C = c | M = m1).
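The third condition can be checked concretely on the simplest scheme satisfying it, the one-bit one-time pad (our own illustrative example, not from the slides): C = M XOR K with K a uniform key bit, so every cryptogram is equally likely whatever the message.

```python
from fractions import Fraction

def pr_c_given_m(c, m):
    """Pr(C = c | M = m) when C = m XOR K and K is a uniform bit."""
    return sum(Fraction(1, 2) for k in (0, 1) if m ^ k == c)

# Pr(C = c | M = m0) = Pr(C = c | M = m1) for all messages and cryptograms
for c in (0, 1):
    assert pr_c_given_m(c, 0) == pr_c_given_m(c, 1) == Fraction(1, 2)
```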