Arithmetic Progressions in a Set of Positive Density

(1)

UNIVERSITÁ DI PISA

DIPARTIMENTO DI MATEMATICA

CORSO DI LAUREA MAGISTRALE IN MATEMATICA

Arithmetic Progressions in a

Set of Positive Density

Relatore

Prof. Mauro Di Nasso Prof. Stefano GalatoloControrelatore Candidato

Luigi Marangio

(2)

(3)

iii

A Paolo per avermi insegnato a pregare, a Lucia per avermi insegnato ad avere fede.

(4)

(5)

Introduction

Abstract

An arithmetic progression is a sequence of numbers such that the dif-ference between the consecutive terms is constant; how many arithmetic progressions there are in a subset of the natural numbers? In 1927 van der Waerden proved a celebrated theorem (see [W]), which states that if N is partitioned into nitely many classes (the colours") then at least one of these classes contains arbitrarily long arithmetic progressions. Actually van der Waerden proved the nitary version of this statement

Theorem. Let k and r be positive integer. Then there exists a positive integer W = W (k, r) such that, however the set {1, . . . , W } is partitioned into r subsets, at least one of the subsets contains an AP of lenght k.

W (k, r)is called van der Waerden number of r and is natural to wonder how quickly the least such W grows as a function of k and r, but this has turned out to be a surprisingly dicult question. The original proof of van der Waerden was short and purely combinatorial, but gave poor bounds for the van der Waerden numbers (bounds W above an Ackermann-type function in k, even when r = 2). Shelah, in 1987, gave the rst primitive recursive upper bound with a beautiful transparent proof; although this was a huge improvement on the previous bound, it still left an enormous gap, as the best known lower bound was, and still is, exponential in k.

In the same period of van der Waerden, Ramsey proved that in any innite complet k-uniform hypergraph colored by a nite numbers of colours, there is a monochromatic innite complete k-uniform section hypergraph (see [Ra]). The van der Waerden and the Ramsey theorems, as they are now called, initiated the eld of Ramsey theory, that is (roughly speaking) the study of dierent notions of largeness for parts of N (or more general structure such as semigroup) and how these notions are linked to each other. A strengthening of a completely dierent kind was conjectured by Erdös and Turán in 1936. They realised that it ought to be possible to nd arith-metic progressions of length k in any suciently dense set of integers, which would show that the colouring in van der Waerden's theorem was, in a sense, a distraction. However is not possible to associate to every statement in

(8)

viii Contents Ramsey theory a so called density version" (a famous theorem of Shur states that is always possible to nd monochromatic triples x, y, z such that x + y = z; of course the density version of this theorem is false, just take a look to odd numbers).

In 1953, Roth solved the conjecture of Erdös and Turán in the case k = 3 (see [Ro]); Roth gave a beautiful argument using exponential-sum estimates, but his approach seemed not to generalize.

The case k = 4 was solved only in 1969 by Szemerédi (see [S1]) with a complex combinatorial argument. Finally the conjecture was proved by Szemerédi in 1974 (see [S2]). Szemerédi's theorem, which we now state precisely, is one of the milestones of combinatorics.

Theorem. Let k be a positive integer and let δ > 0. There exists a positive integer N = N(δ, k) such that every subset of the set {1, · · · , N} of size at least δN contains an arithmetic progression of lenght k.

A second proof of Szemerédi's theorem was given by Furstenberg in 1977 (see [F]), by reducing it, thanks to the so called Furstenberg correspondence principle, to a problem about measure preserving system.

This proof was remarkable, partly because of the diculty of Szemerédi's original proof, and partly because Furstenberg's techniques have since been extended to prove many natural generalizations of the theorem which do not seem to follow from Szemerédi's approach. These include a density version of the Hales- Jewett theorem (see [FK]) and a polynomial Szemerédi theorem" (see [BL]).

The ideas behind the dierent proofs of Szemerédi's have a common fea-ture, namely a dichotomy theorem": for dierent reasons both structure and randomness are source of arithmetic progressions. In the original proof this idea is hidden behind the Szemerédi regularity lemma for graphs, asserting, roughly speaking, that every large enough graph can be approximated by almost random graphs.

However the Furstenberg proof was totally non-constructive, and Sze-merédi's proof used the van der Waerden theorem and also used the regu-larity lemma just mentioned, which makes a tower-type contribution to the size of the bound from any argument that uses it; so both of them did not give bounds on the numbers W (r, k).

In this thesis we will show both Roth's and Furstenberg's argument, we will sketch the main ideas of Gowers in order to generalize Roth's argument for the case k = 4 (see [G1]), and we will give a non standard correspondence principle. More precisely

In the rst chapter we prove classically Roth's theorem; we will just de-ne things in such a way that would be easier to generalize them in chapter 3, when we will take care of the main ideas of the proof of Szemerédi's

(9)

the-Contents ix orem in the case k = 4.

In chapter 2 we will show how we can translate the notion of density from the natural numbers to a measure preserving system and then to prove the ergodic equivalent of Roth's theorem.

In chapter four we propose a non standard-Furstenberg correspondence principle, i.e. we show how to build a measure on βN that imitates the usual notion of density.

Finally in section ve we give another ergodic application to additive combinatorics, namely the Sarközy theorem.

(10)

x Contents

Prerequisites and Notation

Speaking about arithmetic progressions does not require at all a strong formal background in mathematics; however, as Szemerédi's proof proves, this does not make it an easier task. Even if one of the purpose of this thesis is to be most self-contained as possible (at least for the standard parts), we take for granted basic notions of mathematics (linear algebra, basic anal-ysis). We assume also that the reader has familiarity with basic facts as Cauchy-Schwartz inequality, triangle inequality, measures, what asymptotic density is, and so on.

Section one and three did not require other extra mathematical back-ground, however, in section two and ve basic notion of measure theory and probability are required and it may be useful an ergodic background. Section four may appear incomprehensible to a reader who does not have any logic or model theory background.

We now summarize some of the conventions, notations that will be ap-plied throughout this thesis unless explicitly stated otherwise.

We will use indistinctly two points : or a line | to say such that in a set expression

{A ⊂ N : A ∈ U} = {A ⊂ N|A ∈ U}. In section one and three

• we shall be considering subset of Z_N rather then subsets of {1, . . . , N} • and it will be convenient although not essential to take N to be a prime

number.

• D would be the closed unit disc in C.

• we shall write ω for the number exp(2πi/N).

• we shall use this notation (f + g)∧_(s)_{instead of}_{(f + g)(s)}ˆ _{, for obvious} reasons.

• we will write fractions as 1/2N _{instead of} 1 2N.

• we shall denote the characteristic function of a set A ⊂ ZN by the

same letter as the set itself.

• we shall write [−M, M) for the set {−M, −M + 1, . . . , M − 1, M}. For section two, four and ve

(11)

Contents xi • we will use a.e. instead of saying almost everywhere

• we will assume Cauchy-Schwartz and Hölder inequality as well as Caratheodory Extension Theorem

• we will always work with non principal ultralter unless explicitly stated otherwise

• c would be the cardinality of the continuum, i.e. c = 2ℵ0

• sometimes we will omit the∗ behind the expression, but will always be clear by the contest what we mean.

(12)

(13)

Chapter 1

Roth's theorem

A random subset of the set {1, 2, . . . , N} of cardinality δN contains, roughly speaking the expected number of arithmetic progressions of length k, that is, δk_{times the number of such progressions in the whole of {1, 2, ..., N}.}

Unfortunately random sets are not the worst since the value of δ needed to ensure an arithmetic progression of length 3 would be then of order of magnitude N2/3_{, but actually δ it is know to be at least exp(−c(log(N))}1/2_),

for some absolute constant c > 0 ( [Be]).

Roth's proof exploit the fact that random sets contain long arithmetic progressions, and with a density argument and a simple iteration proves Szemerédi's theorem; more precisely the proof is organized as follows

• Dene a notion of pseudorandomness

• prove that every pseudorandom subset of {1, . . . , N} contains the num-ber of arithmetic progressions of length k that you would expect "; • Prove that if A ⊂ {1, · · · , N} is not pseudorandom then then there

exists an arithmetic progression P ⊂ {1, 2, . . . , N} with length tending to innity with N, such that |A ∩ P | ≥ (δ + ε)|P |, for some ε > 0 that depends on δ (and k) only.

1.1 Fourier transform on nite groups

Given a function f : ZN → C and r ∈ ZN we dene the discrete Fourier

transform of f as

ˆ

f (r) = X

s∈ZN

f (s)ω−rs.

Since we will not use convolution in this thesis, let us write f ∗ g for the function

f ∗ g(s) = X

t∈ZN

f (t)g(t − s).

(14)

2 Chapter 1. Roth's theorem For the rest of section 1 all sums will be over ZN unless it is specied

otherwise.

We will use the following basic identities over and over again, let us states those properties in a remark and two lemmas.

Remark 1.1. If r 6= 0 then X s ω−rs= (1 − ω−rN)/(1 − ω−r) = (1 − ω0)/(1 − ω−r) = 0, thus we deduce X s ω−rs= ( N if r = 0, 0 if r 6= 0.

Lemma 1.2. Let f, g : ZN → C, then the following identities hold

1. (f ∗ g)∧_{(r) = ˆ}_{f (r) ˆ}_{f (g)}_, 2. P_r_{f (r) ˆ}ˆ _{f (g) = N}P sf (s)g(s)(Parseval's identity), 3. Pr| ˆf (r)|2= N P s|f (s)|2, 4. f(s) = N−1P rf (r)ωˆ rs. Proof. 1. (f ∗ g)∧(r) =X s (f ∗ g)(s)ω−rs =X s,t f (t)g(t − s)ω−rtωr(t−s) =X u,t f (t)ω−rtg(u)ω−ru = ˆf (r)ˆg(r),

where we used nothing more than −rs = −rt + r(t − s) and u = t − s. 2. By using (1) X r ˆ f (r) ˆf (g) =X(r)(f ∗ g)∧(r) =X r X s f ∗ g(s)ω−rs = N f ∗ g(0) = NX s f (s)g(s), where for the second equality we used 1.1.

(15)

1.1. Fourier transform on finite groups 3 3. Is just a special case of the case above.

4. N−1X r ˆ f (r)ωrs = N−1X r X t f (t)ωr(s−t) = N−1· N f (s)

were again we applying 1.1.

Or one can just deduce (4) applying (2) plus the fact that the function r → ω−rs is the Fourier transform of the characteristic function of a singleton {s}.

There is one more identity that is suciently important to be stated alone.

Lemma 1.3. Let f, g : ZN → C, then

X r | ˆf (r)|2|ˆg(r)|2= NX t X s f (s)g(s − t) 2 . Proof. By identities (1) and (2),

X r | ˆf (r)|2|ˆg(r)|2=X r |(f ∗ g)∧(r)|2 = NX t |f ∗ g(t)|2 = N X t,s,u f (s)g(s − t)f (u)g(u − t) = NX t X s f (s)g(s − t) 2 .

Setting f = g and expanding the right hand side of the expression in the lemma above, we obtain a nice interpretation of the sums of fourth powers of Fourier coecients Lemma 1.4. X r | ˆf (r)|4 = N X a−b=c−d f (a)f (b)f (c)f (d).

When we apply this identity to (the characteristic function of) a set A, tells us that the sum P_r| ˆA(r)|4 is N times the number of quadruples (a, b, c, d) ∈ A4 such that a − b = c − d. Notice that such quadruples are the same as quadruples of the form (x, x + s, x + t, x + s + t).

(16)

4 Chapter 1. Roth's theorem Remark 1.5. In the case of functions with modules at most 1, as will be in a while, one can think of lemma 1.3 as saying that if f has a large inner product with a large number of rotations of g, then f and g must have large Fourier coecients in common, where large means of size proportional to N. For technical reasons it is also useful to consider functions of mean zero. Denition 1.6. Given a set A ⊂ ZN of cardinality δN, we dene the

bal-anced function of A, fA : ZN → [−1, 1] as the characteristic function of A

minus the constant function δ1. More precisely fA(s) =

(

1 − δ if s ∈ A, −δ if s /∈ A.

(17)

1.2. A useful notion of pseudorandomness 5

1.2 A useful notion of pseudorandomness

With tools of previous section we are now able to dene a useful notion of pseudorandomness. In the next lemma we gives several equivalent denitions involving constants ci. When we say that one property involving ci implies

another involving cj , we mean that if the rst holds, then so does the second

for a constant cj that tends to zero as ci tends to zero. (Thus, if one moves

from one property to another and then back again, one does not necessarily recover the original constant.) From the point of view of the eventual bounds obtained, it is important that the dependence is no worse than a xed power. This is always true below.

Lemma 1.7. Let f be a function from ZN to D. The following are equivalent

(i) Pk P sf (s)f (s − k) 2 ≤ c₁N3; (ii) Pa−b=c−df (a)f (b)f (c)f (d) ≤ c1N3;

(iii) P | ˆf (r)|4 ≤ c₁N4; (iv) maxr| ˆf (r)| ≤ c2N ;

(v) for every functions g : ZN → C, Pk

P sf (s)g(s − k) 2 ≤ c3N2kgk22.

Remark 1.8. We dene for a function f : ZN → C the norm of f has

kf k2₂:=X

s

| ˆf (s)|2.

Proof. Equivalence between (i), (ii) and (iii) is actually the proof of lemma 1.4; we know from lemma 1.3 that

X | ˆf (r)|4= N X t,s,u f (s)f (s − t)f (u)f (u − t) = N X a−b=c−d f (a)f (b)f (c)f (d).

(iii)trivially implies (iv) when c2 ≥ c1/41 and since

X | ˆf (r)|4≤ max r | ˆf (r)| 2X r | ˆf (r)|2 ≤ N max r | ˆf (r)| 2_, we nd that if c2

2≤ c1 then (iv) implies (iii).

It is also trivial that (v) implies (i) if c1 > c3; let us conclude the proof

by showing that if c1/2

1 ≤ c3 then (iii) implies (v). By lemma 1.3 we have

X k X s f (s)g(s − k) 2 = N−1X r | ˆf (r)|2|ˆg(r)|2,

(18)

6 Chapter 1. Roth's theorem by the Cauchy-Schwartz inequality

≤ X| ˆf (r)|41/2 X|ˆg(r)|41/2. By using the additional inequality we reach the thesis since

X

|ˆg(r)|41/2≤X

r

|ˆg(r)|2.

Denition 1.9. A function f : ZN → C which satisfy condition (i) above,

with c1 = α, will be called α-uniform. We say that a set A ⊂ Z is α-uniform

if its balanced function fAis α-uniform.

Remark 1.10. This notion of α-uniform coincides with the denition of quasirandom subset of ZN due to Chung and Graham.

If A ⊂ ZN is an α-uniform set of cardinality δN, and f is its balanced

function then we have X

r

| ˆA(r)|4 = |A|4+X

r

| ˆf (r)|4 ≤ |A|4+ αN4.

We noted earlier that P | ˆA(r)|4 is N times the number of quadruples (a, b, c, d) ∈ A4 such that a − b = c − d. If A where a random subset of cardinality δN, then we would expect δ4_N3 _{= N}−1_|A|4 _{such quadruples.}

Therefore the number α measuring how close A is to being random, in this particular case.

(19)

1.3. The density increment argument 7

1.3 The density increment argument

Let us give a standard denition

Denition 1.11. Let us dene the diameter of a subset X ⊂ ZN to be the

smallest integer s such that X ⊂ {n, . . . , n + s} for some n ∈ ZN.

The next one is a very standard lemma, that will allow us to start density increment argument.

Lemma 1.12. Let r, s and N be positive integers with r, s ≤ N but rs ≥ N, and let ϕ : {0, 1, . . . , r − 1} → ZN be linear. Then the set {0, . . . , r − 1} can

be partitioned into arithmetic progressions P1, . . . , PM such that for every j

(i) the diameter of ϕ(Pj) is at most s;

(ii) the length of Pj lies between (_4Nrs)1/2≤ |Pj| ≤ (rs_N)1/2

Remark 1.13. With ϕ linear we mean of the form ϕ(x) = ax + b.

Proof. Let t = p(rN/4s)1/2_{q. By pigeon principle there are two elements, say}

ϕ(u) and ϕ(v), of {ϕ(0), . . . , ϕ(t)} such that |ϕ(u)|, |ϕ(v)| ≤ N/t. Choose the non-zero one between u and v, say u, and observe that by the linearity of ϕ we have

|ϕ(u) − ϕ(0)| ≤ |au| ≤ N · N/t = N/t,

where the last equality is justify since we are working in ZN. Now spilt

{1, . . . , r − 1} into congruence classes modulo u; each class is an arithmetic progression of cardinality either p(r/u)q or x(r/u)y.

Now we want to split again each class into arithmetic progressions (Pj),

each of those satisfy the thesis. First notice that if P is any set of at most st/N consecutive elements of a congruence class, then diamϕ(P ) = s. In fact since the classes are mod u, we can write two elements of P as w + uk1

and w + uk2, with k1, k2 ≤ (st/N ), then

|ϕ(w + uk1) − ϕ(w + uk2)| ≤ |ua(k1− k2)| ≤ (N/t)(sN/t) = s.

Now observe that st N ≤ 1 2 r rs N ≤ 2 3 r rs N = r 3t ≤ 1 3 r u ≤ 1 2 r u

Now take a congruence class, then it has at most p(r/u)q elements, by the inequality above we can thus split this class into sets of consecutive elements of cardinality at least p(st/N)q and at most x(2st/N)y.

(20)

8 Chapter 1. Roth's theorem Corollary 1.14. Let f : {0, . . . , r − 1} → D and ϕ : ZN → ZN be linear

and let α > (4π/r). If r−1 X x=0 f (x)ω−ϕ(s) ≥ αr,

then there is a partition of {0, . . . , r − 1} into m ≤ (16πr/α)1/2 _arithmetic

progressions P1, . . . , Pm such that: m X j=1 X x∈Pj f (x) ≥ (α/2)r

and such that the lengths of all Pj-s lie between (αr/π)1/2/4and (αr/π)1/2/2.

Proof. Let s ≤ αN/4π and let m = (16πr/α)1/2_{; for the previous lemma we}

can nd a partition of {0, . . . , r − 1} into arithmetic progressions P1, . . . , Pm

such that the diameter of ϕ(Pj) is at most s for every j and the length of

each Pj lies between r/m and 2r/m. We just need to prove the condition on

the sum.

By hypothesis and triangle inequality we have

m X j=1 X x∈Pj f (x)ω−ϕ(x) ≥ (α/r). Let xj ∈ Pj, the estimate on diamϕ(P ) implies that

|ωϕ(x)− ωϕ(xj)_{| ≤ α/2,} for every x ∈ P j, Therefore m X j=1 X x∈Pj f (x) = m X j=1 X x∈Pj f (x)ω−ϕ(xj) ≥ m X j=1 X x∈Pj f (x)ω−ϕ(x) − m X j=1 (α/2)|Pj| ≥ (αr/2)

Corollary 1.15. Let A ⊂ ZN be a set of size at least δN and suppose that

| ˆA(r)| ≥ αN for some r 6= 0. Then there exists an arithmetic progression P ⊂ {0, 1, . . . , N − 1}of length at least (α3N/128π)1/2 such that |A ∩ P | ≥ (δ + α/8)|P |.

(21)

1.3. The density increment argument 9 Proof. Let f be the balanced function of A (regarded as a function {0, . . . , N− 1} and dene ϕ(x) = rx. By the corollary above we can partition the set {0, . . . , N − 1}into m = (16πN/α)1/2 _{arithmetic progressions P}

1, . . . , PM of

lengths between N/m and 2N/m such that

m X j=1 X x∈Pj f (x) ≥ αN/2. Since Pm j=1 P

x∈Pjf (x) = 0if we dene J to be the set of j with Px∈Pjf (x) >

0, we have X j∈J X x∈Pj f (x) ≥ αN/4.

Now choose j such that Px∈Pjf (x) ≥ αN/4m, observe that |Pj| ≤ 2N/m,

so we can conclude that

X

x∈Pj

f (x) ≥ α|Pj| 8 , which implies the thesis.

Finally we can give the proof of Roth's theorem; we start with a su-ciently dense subset of {1, . . . , N} ⊂ Z; if we are not in the hypothesis of the corollary above, then we can conclude that we have arithmetic progressions. Otherwise we apply the corollary, and we consider A ∩ P (as subset of P ) instead of A, and then start again by checking the hypothesis of the corollary on A ∩ P .

Remark 1.16. Since we are passing to smaller progressions and iterating, we cannot simply assume that N is prime. We need to prove that we can assume it. Roughly speaking the primes are enough dense to ensure us that considering the closest prime relative to N does not eect too much the density (in positive or in negative) of the subset that we are considering.

One may think of using Bertrand's postulate which states that there is a prime number between n and 2n for any integer n > 1 but this would lead to a loss of 1/2 in the density of A. If this step is only used once this is harmless but in the proof of Roth's theorem, we have to use this step at each iteration and we cannot aord such a loss.

Proof. Let ε < min(1−α, α). By denition of the upper density, there exists an innite increasing sequence (Nk)of integers (thus tending to innity) such

that

|A_N_k| ≥ (α − ε/2)N_k, for every k ∈ N, where by As we mean A ∩ [1, . . . , s].

The sequence (|An|)is increasing and satises

(22)

10 Chapter 1. Roth's theorem thus for any

n ∈ " 1 − α + ε/2 1 − α + ε Nk, α − ε/2 α − ε Nk # we have |A_n| ≥ (α − ε)n.

Now if π(x) is the number of the prime numbers less than x, the prime number theorem gives

( p ∈ " 1 − α + ε/2 1 − α + ε Nk, α − ε/2 α − ε Nk #) ) = π α − ε/2 α − ε Nk ! − π 1 − α + ε/2 1 − α + ε Nk ! ∼ α − ε/2 α − ε − 1 − α + ε/2 1 − α + ε ! Nk logNk ∼ ε/2 (α − ε)(1 − α + ε) Nk logNk . The last quantity is larger then 1 for k (and thus Nk) larger enough thus

there are innitely many prime numbers p such that |A∩[1, . . . , p]| ≥ (α−ε)p. Taking ε = α − δ yields the lemma.

Theorem 1.17 (Roth). Let δ > 0, let N ≥ exp exp(Cδ−1₎ _{(where C is an}

absolute constant) and let A ⊂ {1, . . . , N} be a set of size at least δN. Then Acontains an arithmetic progression of length 3.

Proof. Let α = δ2_/10 _{and suppose that for every non zero r | ˆ}_{A(r)| ≤ αN}_,

i.e. A satisfy condition (iv) of lemma 1.7. Now let B = A ∩ [N/3, 2/3N]; the number of triples (x, y, z) ∈ A × B × B such that x + z = 2y is then

N−1X x∈A X y∈B X z∈B X r ωr(2y−x−z),

in fact if 2x − y − z = 0 (and thus we have an arithmetic progressions) we obtain a factor N divided by N; otherwise we know that the sum on all the r is null. Recalling the denition of ˆA(r), we thus obtain

N−1X r ˆ A(r) ˆB(−2r) ˆB(r) ≥ N−1|A||B|2− N−1max r6=0 | ˆA(r)| X r6=0 | ˆB(−2r)|2 1/2_X r6=0 | ˆB(r)|2 1/2 ≥ δ|B|2− α|B|N

where the rst inequality is justify since ˆA(0) = |A| and thus we are just applying the trivial formula P_rψ(r) = ψ(0)+P

r6=0ψ(r) ≥ ψ(0)−

P

r6=0ψ(r)

(23)

1.3. The density increment argument 11 If |B| ≥ δN/5 then the last quantity is minimized by |B| = δN/5 and the minimum value is δ3_N2_/5)_{, implying the existence of at least this number}

of triples in arithmetic progressions mod N. Since B lives in the middle third, these are genuine progressions, and since there are only N degenerate progression, we conclude that A contains an arithmetic progressions of length 3as long N ≤ 5δ3.

If |B| ≤ δN/5 then either A ∩ [0, N/3] or A ∩ [2N/3, N) has cardinality at least 2δN/5; thus we can apply the argument above to one of those set, we just need to pay attention to obtain genuine progression (i.e. if now we are working in A ∩ [0, N/3] we will choose x, z ∈ A ∩ [0, N/3] and y ∈ A).

So far from now we obtained that if for every non zero r | ˆA(r)| ≤ αN, then A contains arithmetic progressions of length 3. Suppose then the con-trary; by the corollary 1.15, we can nd a sub progression P of {1, . . . , N} of cardinality at least (α3_N/128π)1/2 _{such that}

|A ∩ P | ≥ δ(1 + δ 80)|P |.

This gives us the basis for an iteration argument. Rename A = A0 and

δ = δ0; if we are in the rst case we can nd arithmetic progression, otherwise

we can nd the P above, and we dene A1 = |A ∩ P | and δ1 the density of

this new set.

At each step, the size of the progression in which A lives is around the square root of what it was at the previous step, more precisesly if the density at the step m is δm

δm+1≥ δm+ δm2/80.

This implies that the iteration process end with a nite number of steps, since density cannot exceed the value one. To conclude we need just to prove the estimate on N.

Observe that |A1| ≥ (δ0+ δ02/80)|P | := δ1|P | and |P | ≥α 3_N 128π 1/2 ≥ c1δ2 √ N ;

now we iterate the argument on A1. We get that if N1 ≥ c1/δ2 then we have

a subset A2 of some progression P2 and

|P₂| ≥ (c₁δ2)1+1/2N1/4; after k iteration we get

|P_k| ≥ (c₁δ2)1+1/2+...+1/(2k−1)N1/(2k)= (c1δ2)2(1−2

−k₎

(24)

(25)

Chapter 2

Furstenberg's proof

In the chapter above we saw how to use linear Fourier analysis to proof Roth's theorem; however this method does not control arithmetic progres-sions of length more than 3 (see example 3.11).

In [F], Furstenberg proved Szemerédi's theorem using functional analysis and in particular ergodic theory, avoiding the hard combinatorial techniques of the original proof of Szemerédi.

The main idea is to reformulate the problem in terms of measure pre-serving system and then prove the theorem for simpler classes of measure preserving system, namely weakly mixig systems and periodic systems; then a dichotomy theorem between this two classes of measure preserving systems, allows one to conclude that the theorem needs to be true for every measure preserving system.

2.1 Ergodic Theory

2.1.1 Basic theory

Consider a probability space (X, B, µ) and a measurable transformation T : X → X which preserves the measure µ, i.e. ∀A ∈ B µ(A) = µ(T−1(A)). In this case we say that (X, B, µ, T ) is a measure preserving system.

Denition 2.1. T is said to be ergodic if one of the following statements is true:

• for every A ∈ B with T−1_{(A) = A} _{either µ(A) = 0 or µ(A) = 1;}

• for every A, B ∈ B of positive measure, there exists n>0 such that µ(T−n(A) ∩ B) > 0.

Denition 2.2. T is said to be weakly mixing if, for any A, B ∈ B, one has: lim n 1 n n−1 X i=0 |µ(T−n(A) ∩ B) − µ(A)µ(B)| = 0; 13

(26)

14 Chapter 2. Furstenberg's proof T is said to be mixing if, for any A, B ∈ B, one has:

lim

n µ(T

−n_{(A) ∩ B) = µ(A)µ(B).}

Trivially mixing implies weakly mixing that implies ergodic; respectively these three notions are the statistic equivalent of topological mixing, mini-mality and topological transitive.

Remark 2.3. Given a map T there may be many ergodic measures, i.e. many measures µi such that if T−1(A) = Athen µi(A) needs to be either 0

or 1, for every i.

A general measure-preserving system is not necessarily ergodic, but in the later sections we will sometimes assume the systems to be ergodic; thanks to the following theorem one can always decompose a generic system into ergodic components, i.e. one can always express any non-ergodic measure as an average of ergodic measures, and this would be enough to show that we can prove the Furstenberg's multiple recurrence theorem (i.e. Szemerédi's theorem) just on ergodic systems (see remark 2.18).

Theorem 2.4. Let (X, A, µ, T ) be a measure preserving system and let E(X) the set of all ergodic measures on X. Then there exists a map β : X → E(X) such that for every A ∈ A the map ΠA: x → βx(A) is measurable and

µ(A) = Z

X

ΠA(x)dµ(x).

Since ergodic theory is the study of the statistical properties of an ob-servable (a measurable function), we would like to have a shortly notation for the convergence in mean.

Denition 2.5. Let V a normed vector space and v1, . . . , vn∈ V. We dene

Cesaro convergence (or convergence in mean) as C-limn→∞vn= v ⇔ lim N →∞ 1 N N −1 X n=0 vn= v,

we also dene the Cesaro supremum as C-sup_n→∞vn= lim sup

N →∞ k1 N N −1 X n=0 vnk.

One of the main tools, used in almost all the proof in ergodic additive combinatorics, is the following lemma.

(27)

2.1. Ergodic Theory 15 Lemma 2.6 (van der Corput). If x0, x1, . . .is a bounded sequence of vectors

in a Hilbert space and If

C-limm→∞C-supn→∞hxn+m, xni = 0

then

C-lim x_n= 0 .

Proof. Normalizing, we can suppose, ∀m ∈ N, kxmk ≤ 1. we have, for all

N, h ∈ N, N 6= 0, 1 N N −1 X n=0 xn= 1 N N −1 X n=0 xn+h + rN,h

where krN,hk = O(h/N ).Now, if H ≥ 1, averaging on 1, 2, . . . , H we obtain

1 N N −1 X n=0 xn= 1 H H−1 X h=0 1 N N −1 X n=0 xn+h+ rN,h = 1 N N −1 X n=0 1 H H−1 X h=0 xn+h+ RN,h

where kRN,hk = O(H/N ). By the triangle inequality, we have

1 N N −1 X n=0 xn ≤ 1 N N −1 X n=0 1 H H−1 X h=0 xn+h + O H N ! . Using the convexity of the function x2 _{we obtain}

1 N N −1 X n=0 xn 2 ≤ O 1 N N −1 X n=0 1 H H−1 X h=0 xn+h 2! + O H 2 N2 ! = O 1 H2_N N −1 X n=0 X 0≤h,h0_<H hx_n+h, xn+h0i ! + O H 2 N2 ! = O 1 H2 X 0≤h,h0_<H 1 N N −1 X n=0 hx_n+h, xn+h0i ! + O H 2 N2 ! . By taking the lim sup for N → ∞ we obtain

lim sup N 1 N N −1 X n=0 xn 2 ≤ O 1 H2 X 0≤h,h0_<H C-sup_nhx_n+h, xn+h0i ! = O 1 H2 X 0≤h,h0_<H C-sup_nhx_n+|h−h0_|, x_ni ! ≤ O 1 H X 0≤h<H C-sup_nhxn+h, xni ! .

(28)

16 Chapter 2. Furstenberg's proof By taking the limit for H → ∞ we obtain, by the hypothesis,

lim sup N 1 N N −1 X n=0 xn 2 = 0

and thus C-limnxn= 0.

2.1.2 Hilbert-Schmidt operator

We need now a little bit of theory that easily lead to a useful characteri-zation of the almost periodic function. In order to dene the Hilbert-Schmidt operator we need to speak about tensor product and we start with a standard proposition

Proposition 2.7. Let H and H0 _{be two separable Hilbert spaces. Then}

there exists a separable Hilbert spaces, say H ⊗ H0_{, and a tensor product}

map (bilinear) such that:

⊗ : H × H0→ H ⊗ H0, con hv ⊗ v0

, w ⊗ w0iH⊗H0 = hv, wi_Hhv0, w0i_H0.

Moreover an orthonormal base for the tensor product is given by the tensor products of two orthonormal bases of each factor.

Denition 2.8. Every elements K ∈ H ⊗ H0 _{induce a bounded linear}

op-erator TK : H → H0, dened by the (dual) formula

hTKv, v0iH0 := hK, v ⊗ v0i.

We say that K is the Kernel of TK, and every operator of this form is said

to be Hilbert-Schmidt.

Lemma 2.9. Let H e H0 _{be two Hilbert spaces with orthonormal bases}

respectively (en)n∈A e (e0n)n∈A0, and let T : H → H0 be a bounded linear

operator. The following are equivalents: 1. T is Hilbert-Schmidt;

2. Pn∈A||T en||2_H0 < ∞;

3. Pn∈A

P

n0_∈A0|hT e_n, e0_n0i_H0| < ∞.

The lemma above allow us to dene an inner product and thus a norm related to the Hilbert-Schimdt operator

(29)

2.1. Ergodic Theory 17 Denition 2.10. Let T, S : H → H0 _{be two Hilbert-Schimdt operator, we}

dene hT, SiHS = X n∈A hT en, SeniH0, e ||T ||2_HS =X n∈A ||T e_n||2 H0 = X n∈A X n0_∈A0 |hT e_n, e0_n0i_H0|.

The following property is the one that allows us to characterize Hilbert-Schmidt operators

Lemma 2.11. If T : H → H0 _{is Hilbert-Schmidt, then is compact, i.e. the}

image of every bounded set is precompact.

Proof. Let ε > 0. By the characterization of the Hilbert-Schmidt operators and by monotone convergence theorem, we can nd an orthonormal nite set e1, · · · , eN such that

N

X

n=1

||T e_n||2

H0 ≥ ||T ||2_HS− ε2,

and in particular such that ||T en+1||H0 ≤ εfor every e_n+1that is orthogonal

to e1, · · · , en.

Thus the image of the unitary ball of H, say U, lies in an ε-neighbourhood of the image of

U ∩ span(e1, · · · , eN)

(30)

18 Chapter 2. Furstenberg's proof

2.2 Fursteberg's correspondence principle

The rst step, in order to transfer Szemerédi's theorem in the language of measure preserving systems is to introduce a weaker notion of density then asymptotic density.

Denition 2.12. Let E ⊂ Z, we dene the Banach density of E as the number d∗(E) = lim n→∞ max k∈Z |E ∩ {k, · · · , k + n − 1}| n .

We want to translate our problems in a measure theoretical contest; to accomplish this task we need to introduce a particular space and a specic measure that imitate density on integers.

Consider X = 2Z, the set of all the functions from Z to {0, 1}, and let us

dene for every k ∈ N and for every ω ∈ X, a cylinder as Ck(ω) = {z ∈ X : zi = ωi, ∀i = 0, · · · , k}.

We can now consider the σ-algebra generated by the cylinders, say C; observe that if C is an element of the σ-algebra C, then there are a1, · · · , as∈

Z such that

C = {ω : ωi1 = a1, · · · , ωis = as}.

We observe that (X) is a topological compact space by Hausdor's the-orem, since the topology is the product of the discrete topologies on each factor iN.

We want to dene an additive measure on C, that depends from a certain set E ⊂ E, then we want to use Caratheodory's theorem to extend it to a σ-additive measure on C

Let E ⊂ Z with d∗_{(E) > 0}_{, e let {I}

t} be a sequence of intervals of

increasing length, that reach the density, i.e lim t→∞ |E ∩ I_t| |It| = d∗(E); observe that n ∈ E ⇔ Tn(1E) ∈ A

where T is the shift operator dened on X by T (xn) = xn+1, e A = {x ∈

X : x(0) = 1} ∈ C. Thus we have d∗(E) = lim t→∞ 1 |It| X n∈It 1E(n) = lim t→∞ 1 |It| X n∈It 1A(Tn(1E)).

With a Cantor's diagonal argument, one can proof that ∀C ∈ C there exists a subsequence of {It}, say {Js} such that the following limit exists

p(C) = lim s→∞ 1 |Js| X n∈Js 1C(Tn(1E))

(31)

2.2. Fursteberg's correspondence principle 19 1. p(∅) = 0;

2. if (Ai)is a countable sequence of sets in C mutually disjoint, such that

the union is in C, then p(∪Ai) =P p(Ai);

In fact the rst assert is trivial, for the second one just observe that a count-able union of cylinders mutually disjoint is still a cylinder, and you can write it as a nite union of cylinders.

The last condition to verify in order to satisfy the hypothesis of Caratheodory's theorem is additivity:

let C1, · · · , Ck be mutually disjoint cylinders; by observing

1∪Ci(T n₍₁ E)) = k X 1 (1Ci(T n₍₁ E))), we can deduce k X 1 p(Ci) = lim s 1 |J_s| X n∈Js k X 1 (1Ci(T n₍₁ E))) = lim s 1 |Js| X n∈Js 1∪Ci(T n₍₁ E)) = p(∪k₁Ci).

Theorem 2.13 (Carathéodory). Let X be a set and A be an algebra on X, p a premeasure on A such that p(X) = 1; we dene ∀B ⊂ X

µ∗(B) = inf{Xp(Ai) : (Ai)I ⊂ A, B ⊂ ∪Ai},

and B = {B ⊂ X : µ ∗ (B) + µ∗_(Bc_{) = 1}.}

Then µ∗ _{is an outer measure on X such that µ}∗

|A = p, B is a σ-algebra

on X and µ∗

|B = µis a probability measure.

Notice that we started form a subset E of Z and we have obtained a measure µ such that

µ(A) = d∗(E).

Moreover if p is T -invariant then µ is T -invariant too Let B ∈ B, then ∀ε > 0∃(Ci) ⊂ C : B ⊂ ∪Ci such that

µ(B) ≥Xp(Ci) − ε =

X

p(T−1Ci) − ε ≥ µ(T−1B) − ε.

From the arbitrariness of ε we have an inequality that is an equality when we apply the same argument to T−1_(B)_.

The T -invariance of p follows from this argument: let C ∈ C, then p(C) = lim s 1 |J_s| X n∈Js 1C(Tn(1E))

(32)

20 Chapter 2. Furstenberg's proof but we know that

p(T (C)) = lim s 1 |J_s| X n∈Js 1_{T (C)}(Tn(1E)) by observing Tn(1E) ∈ T (C) ⇔ Tn−1(1E) ∈ C, we obtain p(T (C)) = lim s 1 |Js| X n∈Js 1C(Tn−1(1E)) = lim s 1 |Js| X n+1∈Js 1C(Tn(1E)) = p(C),

where in the last inequality we apply the fact that the intervals have increas-ing length.

With this set up it is not hard to prove one of the form of the Fursten-berg's correspondence principle

Theorem 2.14 (Furstenberg's correspondence principle). Let E ⊂ Z, with d∗(E) > 0. Then there exists a measure preserving system (X, A, µ, T ), and an element A ∈ A such that µ(A) = d∗_(E) _{and for any nite subset of Z,}

say F , the following inequality holds

d∗(∩n∈F(E − n)) ≥ µ(∩T−nA).

Proof. Of course we will choose as measure preserving system X = 2Z

en-dowed with the σ-algebra of the cylinders, T the shift operator, and µ the measure constructed above. We already know that

d∗(E) = µ(A) moreover if F = {n1, · · · , nk} then: n ∈ ∩ni∈F(E − ni) ⇐⇒ n + ni∈ E f orallni ∈ F ⇐⇒ Tn₍₁ E) ∈ ∩ni∈FT −ni_A; thus µ(T−n1_{A ∩ · · · ∩ T}−nk_{A) = lim} s 1 |Js| X n∈Js 1_T−n1_{A∩···∩T}−nkA(Tn(1E)) = lim s 1 |J_s| X n∈Js 1(E−n1)∩···∩(E−Nk)(n) ≤ d∗((E − n1) ∩ · · · ∩ (E − nk),

(33)

2.2. Fursteberg's correspondence principle 21 With this beautiful statement one can prove the correspondence between Szemerédi's theorem and the following theorem of Furstenberg.

Theorem 2.15 (Furstenberg's multiple recurrence theorem). Let (X, A, µ, T ) be a measure preserving system; then for any integer k and any measurable set with positive measure E, there exists some n > 0 such that

µ(E ∩ TnE ∩ . . . ∩ T(k−1)n(E)) > 0;

Actually the condition above is equivalent to the seemingly stronger con-dition in mean: lim inf N →∞ 1 N N −1 X n=0 µ(E ∩ TnE ∩ . . . ∩ T(k−1)n(E)) > 0. We also can rewrite this condition in terms of functions:

Denition 2.16. We say that a measure preserving system is SZ of level k if lim inf N →∞ 1 N N −1 X n=0 Z f Tnf . . . T(k−1)nf dµ > 0, for every f ∈ L∞_(X)_{, f positive and E(f) > 0}

and thus we obtain the following ergodic version of Szemerédi theorem. Theorem 2.17. Every measure preserving system is SZ of level k, for every k > 1.

At rst look can be weird, but actually the theorem above is equivalent to Szemerédi theorem; for instance let us show in the case k = 3 why this ergodic theorem implies Roth's theorem.

Let A ⊂ N with d(A) = δ > 0. Using Furstenberg's correspondence, we can nd a suitable measure preserving system (X, A, µ, T ) and a measurable set E such that for any n1. . . nk∈ N,

µ(E ∩ T−n1_{(E) ∩ . . . ∩ T}−nk_{(E)) ≤ d(A ∩ (A − n}

1) ∩ . . . ∩ (A − nk));

but every measure preserving system is SZ of level k, so there exists n ∈ N such that

µ(E ∩ TnE ∩ T2nE) > 0 hence

d(A ∩ (A − n) ∩ (A − 2n)) > 0

and in particular exists x ∈ (A ∩ (A − n) ∩ (A − 2n) and thus x, x + n, x + 2n are in A.

(34)

22 Chapter 2. Furstenberg's proof In the rest of this chapter we will prove that a system that is either compact or weakly mixing is SZ (of any level) and essentially we will prove that every function can be decomposed in a pseudo random component (w.mixing) and a structured" one (almost periodic). Finally we will prove that every measure preserving system is SZ of level 3 (and thus Roth's the-orem for the remark above).

Remark 2.18. Suppose Szemerédi theorem is true for ergodic systems, i.e. for every f ∈ L∞_(X)_{, f positive and E(f) > 0}

lim inf N →∞ 1 N N −1 X n=0 Z f Tnf . . . T(k−1)nf dµ > 0, holds for every ergodic systems.

In the general case we may write the left-hand side as lim inf N →∞ 1 N N −1 X n=0 Z Z f Tnf . . . T(k−1)nf dµxdµ

where µx came from the ergodic decomposition (theorem 2.4); by Fatou's

lemma this is bounded from below by Z lim inf N →∞ 1 N N −1 X n=0 Z f Tnf . . . T(k−1)nf dµxdµ.

Since f is not almost everywhere zero, it is not µx-a.e. zero for a positive

µ-measure set of x, so the quantity in the brackets is positive on a positive measure set by the ergodic case of the multiple recurrence theorem.

(35)

2.3. Weakly mixing systems 23

2.3 Weakly mixing systems

As said many times before, our strategy is to prove it on two important cases and then to show that this cases cover all the possibilities. In this section we analyse the rst case, namely weakly mixing, roughly speaking, the random part of the system.

Denition 2.19 (Weakly mixing functions). Let (X, A, µ, T ) be a measure preserving system. A function f ∈ L2_(X)_{is said to be weakly mixing if}

C-limn→∞|hTn(f ), f i| = 0;

we refer to hTn_{(f ), f i}_{as the n-th correlation coecient of f.}

The behaviour of the correlation coecients is explained by the Von Neumann mean ergodic theorem, but we need rst to recall what conditional expectation is.

Denition 2.20. Let (X, A, µ, T ) be a measure preserving system and let A0 _{⊂ A} _{be a subalgebra. The conditional expectation E(·|A}0₎ _{is the}

orthog-onal projection map

E(·|A0) : L2(X, A, µ) → L2(X, A0, µ).

Theorem 2.21 (Von Neumann). Let (X, A, µ, T ) be a measure preserving system and let AT _{= {E ∈ A : T (E) = E}}_{be the subalgebra of all invariant}

measurable sets. If f ∈ L2_(X)_then

1 N N −1 X n=0 Tnf →L2_(X)_{E(f |A}T);

in particular if the system is ergodic the limit is E(f).

In order to prove the main property of the weakly mixing functions (or-thogonality of the correlation coecients), we require the following useful lemma.

Lemma 2.22. Let (cn)N be a sequence of non negative numbers. Then are

equivalent:

• C-lim_n→∞cn= 0;

• C-limn→∞c2n= 0.

Proposition 2.23. Let f ∈ L2_(X) _{be a weakly mixing function. The for}

any g ∈ L2_(X)_{we have}

(36)

24 Chapter 2. Furstenberg's proof Proof. By T invariance, we see that hTn_{f, gi = hf, T}−n_gi_{, and by lemma}

2.22 we just need to prove that C-limn→∞hTnf, gi2 = 0.

We have 1 N N −1 X n=0 hTnf, gi2 = N −1 X n=0 hTnf, giTnf, g Applying Cauchy-Schwartz inequality we just need to prove that

C-lim_n→∞hTn_{f, giT}n_{f = 0.}

If we call vn = hTnf, giTnf, observing that (vn)N is a bounded sequence

of non negative numbers since f, g ∈ L2_(X)_{and kfk}

L2 = kTnf k_L2, we can

apply van der Corput lemma; so it suces to prove that

0 = C-limhC-supnhvn, vn+hi = C-limhC-supnhTnf, giTnf, hTn+hf, giTn+hf

= C-limhC-supnhTn, f gihTn+hf, gihTnf, Tn+hf i.

Again hTn_{, f i} _{and hT}n+h_{f, gi}_{are bounded and by T invariance}

hTnf, Tn+hf i = hf, Thf i thus we obtain

C-lim_hC-sup_nhTn_{, f gihT}n+h_{f, gihT}n_{f, T}n+h_{f i ≤ γ·C-lim}

h→∞|hf, Thf i| = 0,

since f is weakly mixing.

With the proposition above, we can characterize weakly mixing system in terms of weakly mixing functions.

Theorem 2.24. A system X is weakly mixing if and only if every function in L2_(X)_{with mean zero is weakly mixing.}

Proof. If (X, A, µ, T ) is weakly mixing, then for every f ∈ L2_(X)_{we have}

C − lim n→∞|hT n_{(f ), f i| ≤ C − lim} n→∞kT n f kkf k = E(f )2, and we conclude assuming that f has mean 0.

Vice versa, if A, B are two measurable sets we want to prove that C-lim_n→∞|µ(T−n(A) ∩ B) − µ(A)µ(B)| = 0;

now 1A− E(1A)is a mean 0 function, thus it is weakly mixing by assumption

and applying proposition 2.23 we have

C-limn→∞hT−n(1A− E(1A), 1Bi = 0

and thus

(37)

2.3. Weakly mixing systems 25 Proposition 2.25. Let (X, A, µ, T ) be a weakly mixing system and k ≥ 1 an integer. Let f1, . . . , fk ∈ L∞(X) and let a1, . . . , ak be a sequence of

distinct non-zero integers. Then

C-limn→∞Ta1nf1. . . Taknfk= E(f1) . . . E(fk).

Proof. We proceed by induction on k. If k = 1 then observe that if (X, A, µ, T ) is weakly mixing then (X, A, µ, Ta1 is weakly mixing too and hence ergodic;

then we can apply the von Neumann ergodic theorem to conclude.

For the inductive step, let us assume that we are working with zero mean functions and thus that our goal is to prove the following

C-lim_n→∞Ta1n_f

1. . . Taknfk = 0;

we can assume this easily, we just need to replace f1with f1−E(f1)and using

the inductive hypothesis on f2, . . . , fk. Moreover if vr = Ta1rf1. . . Takrfk

then we can apply van der Corput lemma on the sequence vr, and thus we

just need to prove that

C-limh→∞C-supn→∞hvn, vn+hi = 0. By denition we have hv_n, vn+hi =Ta1nf1. . . Taknfk, Ta1(n+h)f1. . . Tak(n+h)fk =T−akn_Ta1n_f 1. . . Taknfk , T−akn_Ta1(n+h)_f 1. . . Tak(n+h)fk =T(a1−ak)n_f 1. . . T(ak−1−ak)nfk−1fk, T(a1−ak)nTa1hf1. . . . . . T(ak−1−ak)n_Tak−1h_f k−1Takhfk = Z X T(a1−ak)n_f 1,h. . . T(ak−1−ak)nfk−1,hfk,hdµ,

where fj,h= fjTajhfj. Now f is bounded and so fj is bounded too, applying

Cauchy-Schwartz inequality implies that we can actually prove the following C-limhC-supnT(a1−ak)nf1,h. . . T(ak−1−ak)nfk−1,h= 0.

By induction hypothesis we have

C-limn→∞T(a1−ak)nf1,h. . . T(ak−1−ak)nfk−1,h = E(f1,h) . . . E(fk−1,h),

and since f1 is weakly mixing we obtain

C-lim_h→∞_E(f_1,h) = C-sup_h→∞E(f1Ta1hf1) = C-suph→∞hf1, Ta1hf1i = 0;

hence putting the last two equation together, we conclude that C-limh→∞C-supn→∞T(a1

−ak)n_f

1,h. . . T(ak−1−ak)nfk−1,h

(38)

26 Chapter 2. Furstenberg's proof Corollary 2.26. With the same assumption of proposition 2.25 we have

C-limn→∞

Z

X

Ta1n_f

1. . . Taknfkdµ = E(f1) . . . E(fk).

Theorem 2.27. A weakly mixing system is SZ.

Proof. Let (X, A, µ, T ) be a weakly mixing system and f ∈ L∞_(X) _a

non-negative function with positive mean. By the corollary above, we have lim inf 1 N N −1 X n=0 Z X

(39)

2.4. Compact systems 27

2.4 Compact systems

In this section we will take care of the structural part of a measure preserving system; the following denition is very natural and standard. Denition 2.28 (Almost periodic functions). Let (X, A, µ, T ) be a measure preserving system. We say f ∈ L2_(X)_{is almost periodic if its orbit {T}n_{f :}

n ∈ Z} is precompact in L2(X) with the norm topology. Equivalently, f is almost periodic if the set {n ∈ Z : kf − Tn_{f k < ε}} _{is syndetic (i.e. has}

bounded gaps) for every ε > 0.

Denition 2.29 (Compact systems). A measure preserving system is a compact system if every function in L2 _{is almost periodic.}

Theorem 2.30. Every compact system is SZ.

Proof. We need to prove multiple recurrence for compact systems, i.e. if f is a positive almost periodic function in L2_{(X) ∩ L}∞_(X) _{is almost periodic}

function with positive mean then lim inf N →∞ 1 N N −1 X n=0 Z X f Tnf . . . T(k−1)nf dµ >, 0 where 1 < k is a positive integer.

Without loss of generality, suppose f < 1; let ε > 0 and observe that since f is almost periodic the following set is synthetic

Nε= {n ∈ N : kf − Tnf k <

ε k2k.

Since we are in a measure preserving system, for n ∈ Nεwe have bounds on

the iterates of T :

kTin− TinTnf kL2 = kTinf − T(i+1)nf k_L2 <

ε k2k.

Hence for any j ∈ {1, . . . , k − 1} we have

kf − Tjnf k_L2 ≤ kf − Tnf k_L2 + kTnf − T2nf k_L2+ . . . + kT(j−1)nf − Tjnf k_L2

< j · ε k2k ≤

ε 2k

and we can decompose Tjn_{f = f + g}

j, with kgjk < ε/2k. Thus we deduce

that Z X f Tnf . . . T(k−1)nf dµ = Z X f (f + g1) . . . (f + gk−1)dµ = Z X fkdµ +X i Z X fsi Y j∈Ti gjdµ

(40)

28 Chapter 2. Furstenberg's proof But Pi R Xf s i Q j∈Tigjdµ = P

ihFi, Gii, where Fi is a product of f and some

gj and Gi is a product of gj; since kfkL∞ ≤ 1 and kG_ik_L2 ≤ ε/2k ≤ 1. we

have that |hFi, Gii| ≤ ε/2k; but actually we can have at most 2k products

and thus _Z X f Tnf . . . T(k−1)nf dµ ≥ Z X fkdµ − ε.

To conclude observe that Nε is syndetic, so let dε be the maximum gap of

Nε; since the equation above holds for every n ∈ Nε and there are at least

N/dε elements of Nε lying in [1, N] we have that

lim inf n→∞ 1 N N −1 X n=0 Z X f Tnf . . . T(k−1)nf dµ ≥ lim inf N →∞ 1 N · N dε Z X fkdµ − ε = 1 dε Z X fkdµ − ε> 0 we conclude for ε suciently small.

Since HS operators are compact we can use them to nd almost periodic functions in the following sense

Proposition 2.31. Let (X, A, µ, T ) be a measure preserving system and Ξ : L2(X) → L2(X) be a compact operator which commutes with T then for every f ∈ L2_{(X), Ξ(f) is an almost periodic function.}

Proof. We need to prove that the orbit {Tn_{(Ξf )}}

N is precompact; since Ξ

commutes with T we have {Tn_{(Ξf )}} N= {Ξ(T n_{f )}} N= Ξ({T n_{f }} N)

then we observe that the orbit of f, {Tn_{f }}

N, is bounded in L2(X)and Ξ is

compact.

2.5 Dicothomy theorem

We can now prove that L2_(X) _{admits a decomposition in a periodic}

component plus a random one. Let A(X) and W (X) be respectively the sets of the almost periodic functions and weakly mixing functions in L2_(X)_.

Theorem 2.32. L2_{(X) = A(X) ⊕ W (X).}

Proof. We divide the proof in three parts • W (X) ⊂ A(X)⊥_;

• A(X)⊥ ⊂ W (X);

(41)

2.5. Dicothomy theorem 29 1. Let f be a weakly mixing function, and g an almost periodic function,

we want to prove that hf, gi = 0.

By T -invariance, we have hTn_{f, T}n_{gi = hf, gi}_{, thus it suces to show}

that

C-lim_n→∞|hTnf, Tngi| = 0.

Let ε > 0; since g is almost periodic we may nd g1, · · · , gk ∈ L2(X)

such that for any n, we have kTn_{g − g}

ik < ε, for some 1 ≤ i ≤ k; thus

Since f is weakly mixing we can use proposition 2.23 C-limn→∞|hTnf, Tngi| ≤ εkf kL2,

and we conclude by the arbitrary of ε.

2. We will show that if f is not weakly mixing then there exists g ∈ A(X) such that hf, gi 6= 0.

Since f is in L2_(X)_{, the associated operator of L}2_(X)_{, ϕ}

f(g) = hf, gif is an HS operator kϕ_fk2_HS =X N kϕ_f(en)k2 = X N |hf, e_ni|2kf k2_L2 = kf k4_L2.

Let U be the unitary operator dened on the space H of the HS opera-tors, by setting U(S) = T−1_{◦ S◦T}_{. By denition we have U(ϕ}

f) = ϕT f

and using von Neumann ergodic theorem 1 N N −1 X n=0 Unϕf → Ψf,

for some U-invariant operator Ψf ∈ H. Actually Ψf commutes with

T since

Ψf = U Ψf = T ◦ Ψf ◦ T−1

(42)

30 Chapter 2. Furstenberg's proof Finally hΨ_f(f ), f i = lim N 1 N N −1 X n=0 hUnϕf(f ), f i = lim N 1 N N −1 X n=0 hϕ_Tn_f(f ), f i = lim N 1 N N −1 X n=0 hhTnf, f iTnf, f i = lim N 1 N N −1 X n=0 |hTnf, f i|2 = C-limN →∞|hTnf, f i|2 6= 0.

3. Since f and T f have the same orbit, A(X) is a T -invariant subspace of L2_(X)_.

Let (fn) ⊂ A(X) such that fn→L2 f; we prove that the orbit of f is

totally bounded (and so f ∈ A(X)).

Let ε > 0, and m ∈ N such that kfm− f k < ₂ε. Since fm is an almost

periodic function we may nd g1, · · · , gk ∈ L2(X)such that for any n,

we have kTn_f

m− gik < ε₂, for some 1 ≤ i ≤ k and thus we have

kTnf − gikL2 ≤ kTnf − Tnf_mk_L2 + kTnf_m− g_ik_L2 ≤ ε.

Corollary 2.33. A measure preserving system is weakly mixing if and only if the almost periodic functions are constant almost everywhere.

Proof. Let f ∈ A(X), then f −E(f) is an almost periodic function with zero mean; since X is weakly mixing, f − E(f) is also weakly mixing, hence using the decomposition above we obtain

f − E(f ) = 0 a.e.

Fact 2.34. A(X) is closed under max f, g and min f, g operators.

With the characterization above we can now give an explicit decomposi-tion of L2_(X)_.

Denition 2.35. A factor of a measure preserving system (X, A, µ, T ) is a T-invariant subalgebra of A. A factor is called trivial if its elements have all measure 0 or 1; a factor is called compact if the induced measure preserving system is a compact system.

(43)

2.5. Dicothomy theorem 31 Since A(X) is a closed T -invariant subspace of L2_{, we can consider the}

so called Kronecker factor

A_A= {A ∈ A : 1A∈ A(X)}.

Proposition 2.36. Let (X, A, µ, T ) a measure preserving system and f ∈ L2(X). Then

1. f ∈ A(X) i f is AA-measurable;

2. f ∈ W (X) i E(f|AA) = 0;

3. f = fA+ fW, where fA= E(f |AA) and fW = f − fA.

Proof. 1. If f is AA-measurable then f can be approximated by linear

combinations of {1i

A} ⊂ A(X)and thus f is in A(x) by closure.

Vice versa let f ∈ A(X), we prove that {x : f(x) < a} ∈ AA. We

observe that lim

n→∞min{1, max{n(f − a), 0}} →L2 1{f (x)<a};

since A(X) is closed under point wise min and max operations, we conclude that 1{f (x)<a}∈ A(X) and so {f(x) < a} ∈ AA.

2. f ∈ W (X) i f ∈ A(X)⊥ _{i E(f|A}

A) = 0.

3. Follows from L2_{(X) = A(X) ⊕ W (X)}_.

Corollary 2.37. AA is the maximal compact factor of (X, A, µ, T ) and is

non trivial if and only if there exists an almost periodic function which is non constant almost everywhere.

Finally we are ready to formulate (and prove) the dichotomy theorem. Proposition 2.38. Let (X, A, µ, T ) a measure preserving system. Then exactly one of the following holds:

1. X is weakly mixing;

2. X has a non trivial compact factor.

Proof. If X is not weakly mixing, by the corollary above there exists f ∈ A(X)which is not constant a.e., hence AAis a non trivial compact factor.

(44)

32 Chapter 2. Furstenberg's proof

2.6 The ergodic argument

We start estimating the distance between two functions and their almost periodic parts.

Proposition 2.39. Let (X, A, µ, T ) be an ergodic measure preserving sys-tem. Then for any f1, f2 ∈ L2(X)we have:

C-lim_n→∞(Tnf1T2nf2− TnE(f1|AA)T2nE(f2|AA)) = 0

Proof. Since we can always decompose a function in his weakly mixing and almost periodic parts, we just need to prove

C-limn→∞Tnf1T2nf2 = 0,

if either f1 or f2 is weakly mixing.

By using van der Corput lemma it suces to show that C-sup_hC-limnhvn, vn+hi = 0

where vn= Tnf1T2nf2.

Using linearity and µ-invariance of T we have hvn, vn+hi = Z X Tnf1T2nf2Tn+hf1T2(n+h)f2dµ = Z X f1Tnf2Thf1Tn+2hf2dµ = Z X f1Thf1Tn(f2T2hf2)dµ

by von Neumann's Ergodic theorem C-limn→∞ Z X f1Thf1Tn(f2T2hf2)dµ = Z X f1Thf1(C-limn→∞Tn(f2T2hf2))dµ = Z X f1Thf1E(f2T2hf2)dµ = hf1, Thf1ihf2, T2hf2i = 0

since either f1 or f2 is weakly mixing and thus at least one of the inner

products is zero.

Theorem 2.40. Let (X, A, µ, T ) be an measure preserving system. Then for any f ∈ L∞_(X)_{, f ≥ 0, E(f) > 0, we have}

lim inf N →∞ 1 N N −1 X n=0 Z X f Tnf T2nf dµ > 0 .

(45)

2.6. The ergodic argument 33 Proof. Via ergodic decomposition we can assume that the system is ergodic. By the proposition above we obtain

lim inf N →∞ 1 N N −1 X n=0 Z X f Tnf T2nf dµ = Z X f C-lim infN →∞Tnf T2nf dµ = Z X

f C-lim infN →∞TnE(f |AA)T2nE(f |AA)dµ

= C-lim infN →∞ Z X f TnE(f |AA)T2nE(f |AA)dµ = C-lim infN →∞ Z X

E(f |AA)TnE(f |AA)T2nE(f |AA)dµ > 0

(46)

(47)

Chapter 3

Higher-degree Uniformity

3.1 A new notion of pseudo-randomness

There seems to be no trivial way to obtain progressions of length greater than three by using α-uniformity (see 3.11).

The aim of this section is to dene a notion of pseudo-randomness which is more suitable for the purpose.

Let us resume the proof of Roth's theorem.

• The rst step was to dene an appropriate notion of randomness, α-uniformity;

• then we observe at the very start of the proof of Roth's theorem that if A is uniform then there is a genuine arithmetic progression lying in A. In fact if A is uniform, we have bounds on every Fourier coecient of A and thus we can bound the expression

N−1X

r

ˆ

A(r) ˆA(−2r) ˆA(r) = N−1 X

x,y,z,r

ωr(2y−x−z),

which is actually the number of arithmetic progressions in A.

• As nal step, we proved that if A ⊂ {1, . . . , N} of size δN is not uniform, then we can nd an arithmetic progression P (in [1, N]) such that

|A ∩ P | ≥ (δ + ε)|P |,

with ε depending on δ, or, roughly speaking, that if A is not uniform then there is an arithmetic progression P such that A has relative density respect to P bigger then δ. A simple iteration proves Roth's theorem.

Let us recall how we obtained such an arithmetic progression. 35

(48)

36 Chapter 3. Higher-degree Uniformity The idea is that if A exhibits some large Fourier coecient (i.e. A is not uniform), then we can nd this kind of arithmetic progression. A Fourier coecient is something of the form

ˆ

f (r) =X

s

f (s)ω−rs,

where we was interested in the special case in which f is the balanced function of A; hence our interest in studying the linear function ϕ(x) = rx is justify by the formula above. In fact 1.14 show us that if ϕ is linear and we have some large ϕ-Fourier coecient, i.e. an expression of the form X x f (x)ω−ϕ(x) ,

then we can nd a partition of the interval in which we are working, such that every element of the partition is an arithmetic progression (with bounds on the length) and the function f(x) is large on this partition, i.e. m X j=1 X x∈Pj f (x) ≥ αr 2 .

In corollary 1.15 we just take f(x) to be the balanced function of A and ϕ(x) = rx, where r is the large Fourier coecient of A.

Dealing with the case k = 4 will be harder then the previous one, but we can still organize the proof in three steps.

• In this section we shall see how to generalize α-uniformity to the notion of α-uniformity of degree d. With d = 1 we obtain the usual uniformity. • Then we will prove that if A is uniform of degree d − 2 then contains an arithmetic progression mod-N of length d. We shall then show how to obtain genuine progressions, which turns out to be a minor technicality, similar to the corresponding technicality in the proof of Roth's theorem.

• Now we want to generalize the density argument, i.e. we want to produce an arithmetic progression P on which we can increase the relative density of A. Let us say, informally, that a function f exhibits a polynomial bias if there is a polynomial ϕ : ZN → ZN such that

X

s

f (s)ω−ϕ(s)

is large. Well, in Roth's proof, we showed that a set which fails to be uniform then exhibits a linear bias; the idea is now to show that a set which fails to be uniform of degree d exhibits polynomial bias, rather

(49)

3.1. A new notion of pseudo-randomness 37 than linear bias. In the next two section we will develop this idea, generalizing corollary 1.14 and 1.15 from linear functions to general polynomials; then we will shortly show how to prove that a set A must nevertheless exhibit quadratic bias of some sort and that polynomial bias actually implies linear bias on smaller subprogressions. We will just sketch the proof of the last fact since what is needed is a variation on a classical theorem of Freiman (the modication is a result of Balog and Szemerédi), with a long and dicult combinatorial proof (expe-cially the quantitative part), that is actually out of the range of this thesis. Neverthless we refer to [G2] for further details and we will try to explain as best we can this step of the proof.

So, with these steps in mind, let us start generalizing the notion of uniformity. Denition 3.1. Given a function f : ZN → C, we dene, for any k, the

dierence function ∆(f; k) by

∆(f ; k)(s) = f (s)f (s − k).

The reason for the terminology is that often will be the case f(s) = ωϕ(s)_,

for some function ϕ : ZN → ZN, thus ∆(f; k) = ωϕ(s)−ϕ(s−k).

We can also dene iterated dierence functions by setting ∆(f ; a1, . . . , ad)(s) = ∆(∆(f ; a1, . . . , ad−1); ad)(s).

We can make explicit the result of the inductive process; Let C be the point-wise complex conjugate map from CN _{to C}N_.

Denition 3.2. Given a function f : Z → CN_{, we dene}

∆(f ; a1, · · · , ad)(s) = Y ε1,...,εd (Cε1,··· ,εd_{f )} s − d X i=1 aiεI ,

where the product is over all sequences ε1, . . . , εd, εi ∈ {0, 1}. When

d = 3, for example, we have

∆(f ; a, b, c)(s) = f (s)f (s − a)f (s − b)f (s − c)· ·f (s − a − b)f (s − a − c)f (s − b − c)f (s − a − b − c).

We now dene a function f from ZN to the closed unit disc D ⊂ C to be

α-uniform of degree d if X a1,...,ad X s ∆(f ; a1, . . . , ad)(s) 2 ≤ αNd+2.

(50)

38 Chapter 3. Higher-degree Uniformity When d = 1 we have the same denition of α-uniformity as in section 1, when d = 2, 3 we say that f is quadratically or cubically α-uniform re-spectively. Once again we have several useful reformulations (since they are easier generalization of the basic case, we just sketch the proof).

Lemma 3.3. Let f be a function from ZN to D. The following are

equiva-lent.

(i) f is c1 uniform of degree d.

(ii) Ps

P

a1,...,ad+1∆(f ; a1, . . . , ad+1)(s) ≤ c1N

d+1_.

(iii) There is a function α : Zd−1

N → [0, 1]such that Pa1,...,ad−1α(a1, . . . , ad−1) ≤

c1Nd−1 and ∆(f; a1, . . . , ad−1) is α(a1, . . . , ad−1)-uniform, for every

(a1, . . . , ad−1).

(iv) There is a function α : ZN → [0, 1] such that Prα(r) = c1N and

∆(f ; r)is α-uniform of degree d − 1 for every r. (v) Pa1,...,ad−1|

P

r∆(f ; a1, . . . , ad−1)∧(r)|4 ≤ c1Nd+3.

Proof. We will prove the following scheme

(i) (ii) (iii) (v) (iv)

(i) ⇔ (ii) Since the left-hand sides of the relevant expressions are equal, this is not an hard task, just observe that:

X s X a1,...,ad+1 ∆(f ; a1, . . . , ad+1)(s) = X s X a1,...,ad+1 ∆(∆(f ; a1, . . . , ad); ad+1)

(ii) ⇔ (iii) This step is trivial, just look at the statements (ii) ⇔ (iv) This step could be done with a simple induction

(i) ⇔ (v) The equivalence of (i) and (v) follows, as in the proof of the equivalence of (i) and (iii) in Lemma 1.7, by expanding the left-hand side of (v).

Thanks to (i) and (ii) we shall dene a function f : ZN → D to be

α-uniform of degree 0 if | Psf (s)|2 ≤ αN, and now property (iv) makes sense

when d = 1.

(51)

3.1. A new notion of pseudo-randomness 39 Theorem 3.4. Let k ≥ 2 and let f1, . . . , fkbe functions from ZN to D such

that fk is α-uniform of degree k − 2. Then

|X r X s f1(s)f2(s − r) . . . fk(s − (k − 1)r)| ≤ α1/2 k−1 N2. Proof. When k = 2, we know that

X r X s f1(s)f2(s − r) = X s f1(s) X t f2(t) ≤ α 1/2_N2_,

since | Psf1(s)| ≤ N and | Ptf2(t)| ≤ α1/2N. For k > 2 we proceed by

induction; assume the result for k − 1, let fk be α-uniform of degree k − 2

and let α : ZN → [0, 1] be a function with the property that ∆(fk; r) is

α(r)-uniform of degree k − 3 for every r ∈ ZN. Then

X r X s f1(s) . . . fk(s − (k − 1)r) 2 ≤ NX s X r f1(s)f2(s − r) . . . fk(s − (k − 1)r) 2 ≤ NX s X r f2(s − r)f3(s − 2r) . . . fk(s − (k − 1)r) 2 = NX s X r X t f2(s − r)f2(s − t) . . . fk(s − (k − 1)r))fk(s − (k − 1)t) = NX s X r X u f2(s)f2(s − u) . . . fk(s − (k − 2)r)fk(s − (k − 2)r − (k − 1)u) = NX s X r X u

∆(f2; u)(s)∆(f3; 2u)(s − r) . . . ∆(fk; (k − 1)u)(s − (k − 2)r).

Since ∆(fk; (k − 1)u)is α((k −1)u)-uniform of degree k −3, our inductive

hypothesis implies that this is at most N Puα((k − 1)u)1/2

k−2

N2, and since P

uα((k − 1)u) ≤ αN, this is at most α1/2

k−2

N4, which provides the result for k.

The interest in the theorem above is that the expression on the left-hand side can be used to count arithmetic progression, as we saw in the simple case of length 3.

Denition 3.5. A set A ⊂ ZN is said to be α-uniform of degree d if its

balanced function is.

The next result implies that if a set A is α-uniform of degree d − 2, for α small enough, then A contains about the number of arithmetic progressions of length d that a random set of the same cardinality would have.

(52)

40 Chapter 3. Higher-degree Uniformity Remark 3.6. For now we always means arithmetic progressions mod N. How to obtain genuine arithmetic progression would be a minor technicality, similar to the corresponding in the proof of Roth's theorem.

Corollary 3.7. Let A1, . . . , Ak be subsets of ZN, such that Ai has

cardi-nality δiN for every i, and is α2

i−1

-uniform of degree i − 2 for every i ≥ 3. Then X r |(A₁+ r) ∩ . . . ∩ (Ak+ kr)| − δ1, . . . , δkN2 ≤ 2 k_αN2_.

Proof. Let fi be the balanced function of Ai. Then

|(A1+ r) ∩ . . . ∩ (Ak+ kr)| =

X

s

(δ1+ f1(s − r)) . . . (δk+ fk(s − kr)),

so we can write |(A1+ r) ∩ . . . ∩ (Ak+ kr)| − δ1, . . . , δkN as

X ∅6=B⊂[k] Y i /∈B δi X s Y i∈B fi(s − ir).

Now if j = max B then Pr

P s Q i∈Bfi(s − r) is at most α2 j−1_/2j−1 N2_{, by}

the theorem above. It follows that X r |(A1+ r) ∩ . . . ∩ (Ak+ kr)| − δ1, . . . , δkN2 ≤ X ∅6=B⊂[k] Y i /∈B δi· αN2 = αN2 k Y i=1 (1 + δi) − 1 ,

which is at most 2k_αN2_{, as required.}

Let use prove two simple lemmas.

Lemma 3.8. Let d ≥ 1 and let f : ZN → D be α-uniform of degree d. Then

f is α2-uniform of degree d − 1. Proof. We know that

X a1,...,ad X s ∆(f ; a1, . . . , ad)(s) 2 ≤ αNd+2

by the Cauchy-Schwartz inequality follows that X a1,...,ad X s ∆(f ; a1, . . . , ad)(s) ≤ α 1/2_Nd+1