**3.3** **Recurrence**

One of the most important objections raised against the possibility of describing the approach to statistical equilibrium in time-reversible dynamical systems (such as the Hamiltonian ones) is the famous one due to Zermelo. He made use of a theorem due to Poincaré stating that almost every initial condition of a reversible dynamical system evolves in such a way that the representative point passes arbitrarily close to the initial one infinitely many times (invoking no hypothesis such as ergodicity or mixing). He thus challenged the idea of Boltzmann, who was trying to build up a (kinetic) theory of the approach to equilibrium. In particular, the paradoxical case treated by Zermelo is that of a gas initially contained in the left half of a vessel.

Due to the Poincaré recurrence theorem, the gas will sooner or later return close to such a strange configuration, and it will do so infinitely many times. Actually there is no paradox, and such recurrences of “strange” states may indeed take place in real systems. The point is that the probability of such states is usually extremely small, and quickly vanishes as the number of degrees of freedom of the system goes to infinity.

Moreover, by another theorem due to Kac, under the ergodicity hypothesis, the time needed to observe the recurrence of a small set $A$ of initial data is proportional to the inverse of its probability, so that if the latter is extremely small, the former is extremely large, so large that the most pathological recurrences are practically unobservable. In the Zermelo example, the a priori probability that the gas occupies the left half of the cubic vessel, independently of the velocities of its $N$ particles, is $p = 1/2^N$. By the Kac theorem, if the gas evolution is ergodic, the recurrence time of such a configuration is $\tau/p = \tau 2^N$, where $\tau$ is a physically relevant time-unit (the minimum time-scale on which something happens in the gas: e.g. the single binary collision time-scale). At the normal densities of gases, one has about $N = 10^{25}$ particles per cubic meter. Even with a time unit $\tau$ of the order of the picosecond ($10^{-12}$ seconds), the Kac recurrence time of such an event, in a cubic meter, turns out to be of the order of $10^{10^{24}}$ seconds, i.e. a time enormously larger than the present estimate of the age of the universe, which amounts to only about $10^{18}$ seconds.
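The orders of magnitude quoted above can be checked with a few lines of Python. This is only a sketch: the function name `log10_kac_recurrence_time` is introduced here for illustration, and the computation works with base-10 logarithms because $2^N$ with $N = 10^{25}$ cannot be represented as a floating-point number.

```python
import math


def log10_kac_recurrence_time(n_particles: float, tau_seconds: float) -> float:
    """Return log10 of tau * 2**N, the Kac estimate for the Zermelo example.

    Logarithms are used throughout, since 2**N itself overflows for large N.
    """
    return math.log10(tau_seconds) + n_particles * math.log10(2.0)


N_PARTICLES = 1e25            # particles in a cubic meter (value from the text)
TAU = 1e-12                   # time unit of one picosecond (value from the text)
LOG10_AGE_OF_UNIVERSE = 18.0  # log10 of about 10**18 seconds (value from the text)

log10_T = log10_kac_recurrence_time(N_PARTICLES, TAU)
print(f"log10(T_Kac / 1 s)      ~ {log10_T:.3e}")
print(f"log10 of that exponent  ~ {math.log10(log10_T):.2f}")
print(f"log10(age of universe)  = {LOG10_AGE_OF_UNIVERSE}")
```

The exponent of the recurrence time is itself of order $10^{24}$, to be compared with the exponent 18 of the age of the universe.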

**Poincaré theorem**

**Theorem 3.4 (Poincaré recurrence theorem).** Let $(\Omega, \Phi^t, \mu)$ be a dynamical system, and let $A$ be any measurable subset of $\Omega$ such that $\mu(A) > 0$. Then, almost every point $x \in A$ is such that $\Phi^t(x) \in A$ for arbitrarily large values of the time $t$ (i.e. $\forall T > 0\ \exists t > T : \Phi^t(x) \in A$).

◁ PROOF. We report the nontrivial proof due to Kac [21], which makes use of a construction that can later be used to estimate the recurrence times.

First, let us discretize the dynamics: we observe the system at times that are multiples of $\tau$, the latter being an arbitrary time-unit. The discrete flow then consists of the iterates of the map

$$\Psi_\tau := \Phi^\tau , \tag{3.25}$$

with the obvious relations $\Psi_\tau^n = \Phi^{n\tau}$, $\Psi_{\ell\tau} = \Psi_\tau^\ell = \Phi^{\ell\tau}$. Let us consider the sequence of sets $\{A_n\}_{n\ge1}$, where $A_1$ consists of the points of $A$ whose first iterate belongs to $A$, and $A_n$ ($n \ge 2$) consists of the points of $A$ which exit $A$ at the first iterate and re-enter $A$ for the first time at the $n$-th one, i.e.

$$A_1 := \{x \in A : \Psi_\tau(x) \in A\} = A \cap \Psi_\tau^{-1}(A) ; \tag{3.26}$$

$$\begin{aligned}
A_n &:= \{x \in A : \Psi_\tau^k(x) \notin A,\ k = 1,\dots,n-1;\ \Psi_\tau^n(x) \in A\} = \\
&= A \cap \mathsf{C}[\Psi_\tau^{-1}(A)] \cap \cdots \cap \mathsf{C}[\Psi_\tau^{-n+1}(A)] \cap \Psi_\tau^{-n}(A)
\end{aligned} \tag{3.27}$$

for $n \ge 2$. Of course, $A_n \cap A_m = \emptyset$ if $n \ne m$, since the same point of $A$ cannot recur to $A$ for the first time at two different times. A less obvious fact is that $\bigcup_{n\ge1} A_n$ differs from $A$ by a set of measure zero, at most, i.e. $\mu\big(\bigcup_{n\ge1} A_n\big) = \sum_{n\ge1}\mu(A_n) = \mu(A)$. Let us prove the latter fact. Bearing in mind that $\chi_{\mathsf{C}(A)} = 1 - \chi_A$, that $\chi_{A\cap B} = \chi_A\chi_B$ and that $\chi_{\Psi_\tau^{-1}(A)} = \chi_A\circ\Psi_\tau$, we can write the characteristic functions of the sets (3.26) and (3.27) as follows

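$$\chi_{A_1} = \chi_A\,(\chi_A\circ\Psi_\tau) ;$$

$$\chi_{A_n} = \chi_A\,[1 - \chi_A\circ\Psi_\tau]\cdots[1 - \chi_A\circ\Psi_\tau^{n-1}]\,(\chi_A\circ\Psi_\tau^{n}) \quad (n \ge 2) .$$

By adding $-1$ and $+1$ to the first and last factor, one can rewrite the above characteristic functions as follows:

$$\begin{aligned}
\chi_{A_1} &= [\chi_A - 1 + 1][\chi_A\circ\Psi_\tau - 1 + 1] = \\
&= [1 - \chi_A][1 - \chi_A\circ\Psi_\tau] + 1 - [1 - \chi_A] - [1 - \chi_A\circ\Psi_\tau] ; \\
\chi_{A_n} &= [\chi_A - 1 + 1][1 - \chi_A\circ\Psi_\tau]\cdots[1 - \chi_A\circ\Psi_\tau^{n-1}][\chi_A\circ\Psi_\tau^{n} - 1 + 1] = \\
&= [1 - \chi_A][1 - \chi_A\circ\Psi_\tau]\cdots[1 - \chi_A\circ\Psi_\tau^{n-1}][1 - \chi_A\circ\Psi_\tau^{n}] \,+ \\
&\quad + [1 - \chi_A\circ\Psi_\tau]\cdots[1 - \chi_A\circ\Psi_\tau^{n-1}] \,+ \\
&\quad - [1 - \chi_A][1 - \chi_A\circ\Psi_\tau]\cdots[1 - \chi_A\circ\Psi_\tau^{n-1}] \,+ \\
&\quad - [1 - \chi_A\circ\Psi_\tau]\cdots[1 - \chi_A\circ\Psi_\tau^{n-1}][1 - \chi_A\circ\Psi_\tau^{n}] \quad (n \ge 2) .
\end{aligned}$$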
One is thus naturally led to introduce the sets

$$B_0 := \mathsf{C}(A) ; \tag{3.28}$$

$$\begin{aligned}
B_n &:= \{x \notin A : \Psi_\tau(x) \notin A, \dots, \Psi_\tau^n(x) \notin A\} = \\
&= \mathsf{C}(A) \cap \mathsf{C}[\Psi_\tau^{-1}(A)] \cap \cdots \cap \mathsf{C}[\Psi_\tau^{-n}(A)] ,
\end{aligned} \tag{3.29}$$

for $n \ge 1$. $B_n$ is the set of points out of $A$ whose first $n$ iterates all stay out of $A$. In terms of the $B_n$’s the characteristic functions of the sets $A_1, A_2, \dots$ simplify to

$$\chi_{A_1} = \chi_{B_1} + 1 - \chi_{B_0} - (\chi_{B_0}\circ\Psi_\tau) ; \tag{3.30}$$

$$\chi_{A_n} = \chi_{B_n} + (\chi_{B_{n-2}}\circ\Psi_\tau) - \chi_{B_{n-1}} - (\chi_{B_{n-1}}\circ\Psi_\tau) \quad (n \ge 2) . \tag{3.31}$$
Upon integrating over the whole space $\Omega$ and exploiting the invariance of the measure, one obtains

$$\mu(A_1) = \mu(B_1) + 1 - 2\mu(B_0) ;$$

$$\mu(A_n) = \mu(B_n) + \mu(B_{n-2}) - 2\mu(B_{n-1}) \quad (n \ge 2) .$$

Just in order to simplify the notation, let us define the numerical sequence $\{w_n\}_{n\ge-1}$, where

$$w_{-1} := 1 ; \qquad w_n := \mu(B_n), \quad n \ge 0 . \tag{3.32}$$

One thus gets

$$\mu(A_n) = w_n + w_{n-2} - 2w_{n-1} , \tag{3.33}$$

which is valid for all $n \ge 1$. Notice that by the definition (3.29) $B_{n+1} \subseteq B_n$ for all $n \ge 0$, and $1 = w_{-1} > w_0$, so that the sequence of the $w_n$ is non-increasing and bounded from below (by 0). Thus $\lim_{n\to+\infty} w_n = \inf\{w_n\}$ exists and is finite. One has

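$$\begin{aligned}
\mu\Big(\bigcup_{n\ge1} A_n\Big) &= \sum_{n\ge1}\mu(A_n) = \sum_{n\ge1}(w_n + w_{n-2} - 2w_{n-1}) = \\
&= \lim_{N\to+\infty}\sum_{n=1}^{N}\big[(w_n - w_{n-1}) - (w_{n-1} - w_{n-2})\big] = \\
&= \lim_{N\to+\infty}(w_N - w_{N-1} - w_0 + 1) = 1 - w_0 = \\
&= 1 - \mu(\mathsf{C}(A)) = \mu(A) .
\end{aligned}$$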
Thus $A \setminus \bigcup_{n\ge1} A_n$ has measure zero. Since $\bigcup_{n\ge1} A_n$ is the set of points of $A$ that recur to $A$ at least once, we have proved up to now that almost every point of $A$ recurs to $A$ at least once, i.e. that for almost any $x \in A$, namely for any $x \in \bigcup_{n\ge1} A_n$, there exists $n \ge 1$ such that $\Psi_\tau^n(x) \in A$. By replacing $\Psi_\tau$ with $\Psi_{\ell\tau} = \Psi_\tau^\ell$ for an arbitrary positive integer $\ell \ge 1$, and repeating the whole reasoning made up to now, one also finds that, for almost any $x \in A$, namely for any $x \in \bigcup_{n\ge1} A_n$ (with the sets $A_n$ now built from $\Psi_{\ell\tau}$), there exists $n \ge 1$ such that $\Psi_{\ell\tau}^n(x) = \Psi_\tau^{n\ell}(x) \in A$, whatever be $\ell \ge 1$. As a consequence, the set $D_\ell$ consisting of points of $A$ whose iterates from the $\ell$-th one on are out of $A$, i.e. $D_\ell := \{x \in A : \Psi_\tau^n(x) \notin A\ \forall n \ge \ell\}$, has zero measure, whatever be $\ell \ge 1$. Then, the set of points of $A$ that recur to $A$ at most a finite number of times, which is the union of the sets $D_\ell$, has measure zero: $\mu\big(\bigcup_{\ell\ge1} D_\ell\big) \le \sum_{\ell\ge1}\mu(D_\ell) = 0$. In other words, almost every point of $A$ recurs to $A$ infinitely many times, and the theorem is thus proved. ▷

**Mean recurrence time: Kac theorem**

Let $A \subset \Omega$ be such that $\mu(A) > 0$. For any $x \in \bigcup_{n\ge1} A_n$ the integer valued function

$$n(x) := \min\{ j \ge 1 : \Psi_\tau^{j}(x) \in A\} \tag{3.34}$$

yields the first recurrence time of the point $x$ to $A$, i.e. $\tau n(x)$. The mean recurrence time of the set $A$ is defined as the normalized integral of the latter quantity over $A$, namely

$$T_\tau(A) := \frac{\tau}{\mu(A)}\int_A n(x)\,\mathrm{d}\mu . \tag{3.35}$$

**Theorem 3.5 (Kac theorem).** If the dynamical system $(\Omega, \Psi_\tau, \mu)$ is ergodic, then $\int_A n(x)\,\mathrm{d}\mu = 1$ for any set $A \subseteq \Omega$ of positive measure, and the mean recurrence time is $T_\tau(A) = \tau/\mu(A)$.

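◁ PROOF. With reference to the proof of the Poincaré theorem, since $\bigcup_{n\ge1} A_n$ differs from $A$ by a set of measure zero, and recalling that $A_n$ is the set of points of $A$ that recur for the first time to $A$ at the $n$-th iterate, so that $n(x) = k$ if $x \in A_k$, one has

$$\int_A n(x)\,\mathrm{d}\mu = \sum_{k\ge1} k\,\mu(A_k) = \lim_{N\to+\infty}\sum_{k=1}^{N} k\,\mu(A_k) .$$

Now, by means of (3.32) and (3.33) one can write

$$\sum_{k=1}^{N} k\,\mu(A_k) = \sum_{k=1}^{N} k\,\big[(w_k - w_{k-1}) - (w_{k-1} - w_{k-2})\big] = 1 - \big[N(w_{N-1} - w_N) + w_{N-1}\big] .$$

Now, from the latter identity it follows that, since the sequence of the partial sums $\sum_{k=1}^{N} k\,\mu(A_k)$ is non-decreasing, the sequence of the $N(w_{N-1} - w_N) + w_{N-1}$ is non-increasing, and is bounded from below (by 0). Thus $\lim_{N\to+\infty}[N(w_{N-1} - w_N) + w_{N-1}]$ exists, and since $\lim_{N\to+\infty} w_N$ exists, then the limit

$$L = \lim_{N\to+\infty} N(w_{N-1} - w_N)$$

exists and, since $\{w_N\}$ is non-increasing, $L \ge 0$. But one can only have $L = 0$, since $L > 0$ would imply the existence of a constant $c > 0$ and of an $M > 0$ such that $w_{N-1} - w_N > c/N$ for all $N > M$ (use the definition of limit); in such a case the series $\sum_N (w_{N-1} - w_N)$ would diverge, which is impossible, since its partial sums telescope and converge. Finally, one gets

$$\int_A n(x)\,\mathrm{d}\mu = 1 - \lim_{N\to+\infty} w_N = 1 - \mu(B) ,$$

where, since $B_{n+1} \subseteq B_n$ for any $n$, the limit set

$$B := \bigcap_{n\ge0} B_n$$

exists and $\mu(B) = \lim_{n\to+\infty} w_n$. This formula is general and requires no particular assumption. We are going to show that the ergodicity hypothesis (not used up to now) implies $\mu(B) = 0$. Indeed, let us consider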
the set $\Psi_\tau(B)$. If $x \in \Psi_\tau(B)$ then $\Psi_\tau^{-1}(x) \in B$, i.e. $\Psi_\tau^{n-1}(x) \notin A$ for any $n \ge 0$, i.e. $\Psi_\tau^{-1}(x) \notin A$ and $\Psi_\tau^{n-1}(x) \notin A$ for any $n \ge 1$, i.e. $\Psi_\tau^{-1}(x) \notin A$ and $x \in B$. In other words $\Psi_\tau(B) \subseteq B$, so that $\Psi_\tau^{n+1}(B) \subseteq \Psi_\tau^{n}(B)$ for any $n \ge 0$. As a consequence, the limit set

$$C := \bigcap_{n\ge0}\Psi_\tau^{n}(B) \tag{3.38}$$

exists. Now, due to the fact that $\Psi_\tau$ is injective (it is a bijection of $\Omega$), the image under $\Psi_\tau$ of the intersection of sets is the intersection of the images of those sets^{1}, which yields

$$\Psi_\tau(C) = \Psi_\tau\Big(\bigcap_{n\ge0}\Psi_\tau^{n}(B)\Big) = \bigcap_{n\ge0}\Psi_\tau^{n+1}(B) = C ,$$

i.e. the limit set $C$ defined in (3.38) is invariant. If the system is ergodic, the invariant set $C$ must have measure zero or one (by metric indecomposability). By the invariance of the measure, $\mu(\Psi_\tau^{n}(B)) = \mu(B)$ for any $n \ge 0$, which in turn implies

$$\mu(C) = \mu\Big(\bigcap_{n\ge0}\Psi_\tau^{n}(B)\Big) = \lim_{n\to+\infty}\mu(\Psi_\tau^{n}(B)) = \mu(B) .$$

Thus, either $\mu(B) = 1$ or $\mu(B) = 0$. The former possibility must be excluded, since otherwise $A \subseteq \mathsf{C}(B)$ would have measure zero, while $\mu(A) > 0$ by hypothesis. ▷
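As a numerical illustration of the Kac theorem (not part of the proof), one may take a map known to be ergodic and measure the mean first-return time directly. The sketch below assumes, as a stand-in for $\Psi_\tau$, the rotation $x \mapsto x + \alpha \pmod 1$ of the unit circle with the irrational $\alpha = (\sqrt5 - 1)/2$, which is ergodic with respect to the Lebesgue measure, and takes $A = [0, a)$, so that $\mu(A) = a$. The names `rotation`, `first_return_time` and `mean_return_steps` are hypothetical. The average of $n(x)$ over $A$ should then be close to $1/\mu(A)$, i.e. to $T_\tau(A)/\tau$.

```python
import math

ALPHA = (math.sqrt(5.0) - 1.0) / 2.0  # irrational rotation number (assumed example)


def rotation(x: float) -> float:
    """One step of the circle rotation x -> x + ALPHA (mod 1)."""
    return (x + ALPHA) % 1.0


def first_return_time(x: float, a: float, max_iter: int = 10_000) -> int:
    """Smallest j >= 1 such that the j-th iterate of x lies in A = [0, a)."""
    y = x
    for j in range(1, max_iter + 1):
        y = rotation(y)
        if y < a:
            return j
    raise RuntimeError("no return to A within max_iter iterations")


def mean_return_steps(a: float, samples: int = 20_000) -> float:
    """Average of n(x) over A, estimated on a uniform grid of midpoints of A."""
    total = 0
    for i in range(samples):
        x = a * (i + 0.5) / samples
        total += first_return_time(x, a)
    return total / samples


a = 0.05
mean_n = mean_return_steps(a)
print(f"measured mean return steps: {mean_n:.3f}")
print(f"Kac prediction 1/mu(A):     {1.0 / a:.3f}")
```

A uniform grid is used instead of random sampling so that the result is reproducible; for this map $n(x)$ is piecewise constant on $A$, so the grid average converges quickly.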

The Kac theorem yields a simple formula for the mean recurrence time, which explains why events with small probability recur rarely. However, the whole treatment, starting with formula (3.35), has a defect: for a continuous time dynamical system $(\Omega, \Phi^t, \mu)$, the time-step $\tau$ is completely arbitrary. As a consequence, the mean recurrence time $T_\tau$ has the obvious and useless limit $T_0 = 0$ as $\tau \to 0$, even if the system is not ergodic and the Kac theorem does not hold. Of course, for arbitrarily small values of $\tau$, most of the points $x \in A$ are such that $\Phi^t(x) \in A$ for all $0 \le t \le \tau$, i.e. the set $A_1$ is just a slight deformation of the set $A$, so that the fake recurrence is due to the fact that on the time-scale $\tau$ nothing happens in the system.

**Example 3.9.** In a gas, the single binary collision time-scale, below which almost nothing happens, is roughly given by

$$t_{\mathrm{coll}} = \frac{\rho^{-1/3} - 2r_0}{\sqrt{T/m}} ,$$

where $\rho = N/V$ is the density of the gas, $r_0$ is the effective interaction range of the molecular interaction (two molecules whose distance is larger than $2r_0$ do not see each other), $T$ is the temperature of the gas and $m$ is the mass of a single molecule.

Such a remark suggests a correction of the formula (3.35) for the mean recurrence time, namely

$$\hat{T}_\tau(A) := \frac{\tau}{\mu(A \setminus A_1)}\int_{A\setminus A_1} n(x)\,\mathrm{d}\mu , \tag{3.39}$$

due to Smoluchowski [21]. In this way, one computes the mean recurrence time by making use only of those points $x \in A$ that at time $\tau$ are out of $A$: $A \setminus A_1 = \{x \in A : \Psi_\tau(x) \notin A\}$. One has

$$\hat{T}_\tau(A) = \tau\,\frac{\sum_{k\ge2} k\,\mu(A_k)}{\mu(A) - \mu(A_1)} = \tau\,\frac{1 - \mu(A_1) - \mu(B)}{\mu(A) - \mu(A_1)} .$$

Now, if the system is ergodic, $\mu(B) = 0$ and, noting that $\lim_{\tau\to0}\mu(A_1) = \mu(A)$, one gets

$$\lim_{\tau\to0}\hat{T}_\tau(A) = \frac{1 - \mu(A)}{\lim_{\tau\to0}\dfrac{\mu(A) - \mu(A_1)}{\tau}} ,$$

which exists under the hypothesis that the limit in the denominator exists and is finite.

^{1} In general, for a generic function $f : X \to Y$, and any pair $A, B \subset X$, the inclusion $f(A \cap B) \subseteq f(A) \cap f(B)$ holds, the equality being true iff $f$ is injective.

**Chapter 4**

**Hamiltonian perturbation theory**

An introduction to the so-called “canonical perturbation theory” in the Hamiltonian framework is provided here. The main idea of perturbation theory is to characterize the properties of a system close to an integrable one in terms of the properties of the latter. For example, one would like to know whether the first integrals of the integrable system survive the perturbation, and for how long.

**4.1** **Quasi-integrable systems**

Loosely speaking, the dynamical system $\dot{x} = u(x)$ is said to be quasi-integrable, or close to integrable, in $D \subseteq \Gamma$ if, for any $x \in D$, one has

$$u(x) = v(x) + \delta v(x) , \tag{4.1}$$

where $v(x)$ is a vector field such that the system $\dot{x} = v(x)$ is integrable, i.e. solvable for generic initial conditions in some sense to be specified, whereas $\delta v(x)$ is a small vector field perturbation, i.e.

$$\|\delta v\| \ll \|v\| , \tag{4.2}$$

where $\|\cdot\|$ is a suitable norm.

**Remark 4.1.** In general, the concept of closeness to integrability is a local one, i.e. $D \subset \Gamma$. As a consequence, whether a system can be considered quasi-integrable or not depends on the initial conditions. Given a vector field $u(x)$ on $\Gamma$, its splitting into an integrable part $v$ plus a perturbation $\delta v$ may be different in different regions of $\Gamma$.

On the other hand, integrability is meaningful and useful if it is a global property of the system, i.e. if it holds all over the phase space of the system.

In restricting our attention to Hamiltonian systems, we recall here the following theorem, which completely characterizes integrability from a Hamiltonian point of view, and also constitutes the basis of most of Hamiltonian perturbation theory.

**Theorem 4.1 (Liouville-Arnol’d).** Let the Hamiltonian system defined by $H$ be integrable in $\Gamma \subseteq \mathbb{R}^{2n}$ in the sense of Liouville, and let $a \in \mathbb{R}^n$ be such that the level set

$$M_a := \{(q, p) \in \Gamma : F_1(q, p) = a_1, \dots, F_n(q, p) = a_n\}$$

is non empty; let also $M_a^0$ denote a (nonempty) connected component of $M_a$. Then, a function $S(q, a)$ exists, such that $p \cdot \mathrm{d}q|_{M_a^0} = \mathrm{d}_q S(q, a)$ and $S$ is a complete integral of the time-independent Hamilton-Jacobi equation, i.e. the generating function of a canonical transformation $\mathcal{C} : (q, p) \mapsto (b, a)$ such that $H(\mathcal{C}^{-1}(b, a)) = H(q, \partial S(q, a)/\partial q) = K(a)$.

Moreover, if $M_a^0$ is compact, then a small neighborhood $\mathcal{U}$ of $M_a^0$ is canonically diffeomorphic to $\mathbb{T}^n \times B$, where $B \subset \mathbb{R}^n_+$, i.e. there exists a canonical transformation to angle-action variables $\mathcal{F} : \mathcal{U} \to \mathbb{T}^n \times B : (q, p) \mapsto (\varphi, I)$ such that $H(\mathcal{F}^{-1}(\varphi, I)) = h(I)$ and $F_j(\mathcal{F}^{-1}(\varphi, I)) = f_j(I)$ for any $j = 1, \dots, n$.

Thus, for Liouville-integrable Hamiltonian systems displaying compact families of level sets, canonical action-angle coordinates $(\varphi, I)$ can be introduced, such that both the Hamiltonian and all the first integrals depend on the action variables $I$ only. In terms of the variables $(\varphi, I)$, the dynamics of the system becomes trivial: the Hamilton equations $\dot{\varphi} = \partial h/\partial I$, $\dot{I} = -\partial h/\partial\varphi$ imply that $I(t) = I(0) := I_0$ and $\varphi(t) = \varphi(0) + \omega_0 t$, where

$$\omega_0 := \omega(I_0) := \frac{\partial h(I_0)}{\partial I} . \tag{4.3}$$

The phase space of the system is thus locally “foliated” into invariant tori, on each of which the motion is a translation with a frequency vector (4.3) depending, in general, on the value of the action $I_0$ labeling the torus $\mathbb{T}^n$.

**Remark 4.2.** In the present lectures we mean $\mathbb{T}^1 = \mathbb{R}/(2\pi\mathbb{Z})$, i.e. the (group of) real numbers modulo $2\pi$. Of course, $\mathbb{T}^n = \underbrace{\mathbb{T}^1 \times \cdots \times \mathbb{T}^1}_{n\ \text{times}}$.

**Example 4.1.** Autonomous Hamiltonian systems with one degree of freedom ($n = 1$) are obviously integrable. Any connected and compact component of the level curve $H(q, p) = E$, not containing critical points of $H$, is a periodic orbit. In that case the action is $I = \frac{1}{2\pi}\oint p\,\mathrm{d}q$. The latter quantity is obviously a function of the energy level $E$: $I = f(E)$. In such a case one has $H(\mathcal{F}^{-1}(\varphi, I)) := h(I) = f^{-1}(I)$. In the simplest mechanical case where $H = p^2/2 + V(q)$, one has $I = \frac{1}{\pi}\int_{q_-}^{q_+}\sqrt{2(E - V(q))}\,\mathrm{d}q$, where $V(q_\pm) = E$ and $V'(q_\pm) \ne 0$. As an example, for the harmonic oscillator, with $V(q) = \omega^2 q^2/2$, one finds $I = E/\omega$.
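The formula for the action in the mechanical case lends itself to a direct numerical check. The following is a minimal sketch, assuming a plain midpoint rule and illustrative values of $\omega$ and $E$; the function name `action` and the constants `OMEGA`, `ENERGY` are introduced here only for the example. For the harmonic oscillator the result should reproduce $I = E/\omega$.

```python
import math
from typing import Callable


def action(potential: Callable[[float], float], energy: float,
           q_minus: float, q_plus: float, n_steps: int = 20_000) -> float:
    """Midpoint-rule estimate of I = (1/pi) * int_{q-}^{q+} sqrt(2 (E - V(q))) dq."""
    h = (q_plus - q_minus) / n_steps
    total = 0.0
    for i in range(n_steps):
        q = q_minus + (i + 0.5) * h
        # max(..., 0.0) guards against tiny negative round-off near the turning points
        total += math.sqrt(max(2.0 * (energy - potential(q)), 0.0))
    return total * h / math.pi


OMEGA = 2.0   # illustrative frequency (assumption)
ENERGY = 3.0  # illustrative energy level (assumption)


def harmonic_potential(q: float) -> float:
    """V(q) = omega^2 q^2 / 2."""
    return 0.5 * OMEGA**2 * q**2


# Turning points: V(q_pm) = E, i.e. q_pm = +/- sqrt(2E) / omega
q_turn = math.sqrt(2.0 * ENERGY) / OMEGA
I_numeric = action(harmonic_potential, ENERGY, -q_turn, q_turn)
I_exact = ENERGY / OMEGA

print(f"numerical action: {I_numeric:.6f}")
print(f"E / omega:        {I_exact:.6f}")
```

The same `action` function can be applied to any potential, provided the two turning points $q_\pm$ of the chosen energy level are supplied.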

**Definition 4.1.** A Hamiltonian system is said to be quasi-integrable in $D \subseteq \Gamma$ if, for any $x \in D$, its Hamiltonian $H$ can be written as

$$H(x) = h(x) + P(x) ,$$

where $h$ is the Hamiltonian of an integrable system and $P$ is a perturbation Hamiltonian such that $\|X_P\| \ll \|X_h\|$ with respect to a suitable norm $\|\cdot\|$.

Quite often in the literature, as a definition of quasi-integrability one finds the condition $\sup_D|P| \ll \sup_D|h|$; such a condition may turn out to be meaningless. As an example, consider the case $h(x) \equiv C$, $C$ being a positive constant. If $P(x)$ is bounded in $D$, by a suitable choice of $C$ one can always satisfy the inequality $|P(x)| \ll |h(x)| = C$ $\forall x \in D$, but the Hamilton equations read $\dot{x} = X_P(x)$, so that the dynamics of the system is completely determined by the perturbation (on the other hand, any constant added to the Hamiltonian is irrelevant to the dynamics of the system). Even ruling out the problem of the constants, the smallness (in the sense of the sup-norm) of the perturbation $P$ by itself may still be meaningless.

**Example 4.2.** Consider a perturbed pendulum, whose unperturbed, integrable Hamiltonian is $h(q, p) = p^2/2 - \cos(q)$. Let $P(q) = \varepsilon\sin(q/\varepsilon^2)$, where $\varepsilon$ is a small parameter. On any domain $D = \mathbb{T}^1 \times [-\hat{p}, \hat{p}]$, with $\varepsilon$ small enough one has $\sup_D|P| = \varepsilon \ll \sup_D|h|$. However, the dynamics of the system, described by the equation $\ddot{q} = -\sin(q) - \cos(q/\varepsilon^2)/\varepsilon$, is clearly influenced by the perturbation. More precisely, setting $\xi := q/\varepsilon^2$ and $\tau := t/\varepsilon^{3/2}$, one gets $\mathrm{d}^2\xi/\mathrm{d}\tau^2 = -\cos(\xi) - \varepsilon\sin(\varepsilon^2\xi)$, so that, on a suitable time-scale, the dynamics is strongly influenced by the perturbation.
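The mismatch between the size of $P$ and the size of the force it generates can be made concrete numerically. The sketch below assumes the illustrative value $\varepsilon = 0.1$ and estimates the suprema on a uniform grid over one period of the pendulum; the helper `sup_on_grid` and the other names are hypothetical.

```python
import math
from typing import Callable

EPS = 0.1  # illustrative value of the small parameter (assumption)


def sup_on_grid(f: Callable[[float], float], q_min: float, q_max: float,
                n_points: int) -> float:
    """Largest |f(q)| over a uniform grid of n_points points in [q_min, q_max]."""
    step = (q_max - q_min) / (n_points - 1)
    return max(abs(f(q_min + i * step)) for i in range(n_points))


def perturbation(q: float) -> float:
    """P(q) = eps * sin(q / eps^2)."""
    return EPS * math.sin(q / EPS**2)


def perturbation_force(q: float) -> float:
    """-dP/dq = -cos(q / eps^2) / eps."""
    return -math.cos(q / EPS**2) / EPS


def pendulum_force(q: float) -> float:
    """Unperturbed force: -d(-cos q)/dq = -sin(q)."""
    return -math.sin(q)


N_GRID = 200_001  # fine enough to resolve the period 2*pi*eps^2 of P
sup_P = sup_on_grid(perturbation, 0.0, 2.0 * math.pi, N_GRID)
sup_dP = sup_on_grid(perturbation_force, 0.0, 2.0 * math.pi, N_GRID)
sup_dh = sup_on_grid(pendulum_force, 0.0, 2.0 * math.pi, N_GRID)

print(f"sup |P|      ~ {sup_P:.4f}   (expected eps = {EPS})")
print(f"sup |dP/dq|  ~ {sup_dP:.4f}  (expected 1/eps = {1.0 / EPS})")
print(f"sup |sin q|  ~ {sup_dh:.4f}")
```

With these values the perturbation is small in sup-norm, yet the force it produces exceeds the unperturbed force by a factor of about $1/\varepsilon$.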

The smallness of the perturbing vector field $X_P$ is usually more restrictive than, and actually implies, the smallness of the perturbing Hamiltonian $P$ with respect to the integrable component $h$, up to constant terms.

**Remark 4.3.** Quasi-integrability does not mean, in general, closeness of the solution of the “unperturbed” problem $\dot{x} = v(x)$ to the solution of the perturbed one, equation (1.1): any kind of scenario is possible, depending on the class of problems treated, on the initial conditions, and so on.

**Example 4.3. Let us consider a planetary system with a large mass star at rest in the origin,**
described by the Hamiltonian

with corresponding equations (in second order form)

**¨r**_{i}= −Ms
of a n one-body Kepler Hamiltonians. The perturbation P takes into account the interaction
between pairs of planets. One can easily check that in the domain D := B × R^{3n}, where B is a

the quasi-integrability of system (4.5) is guaranteed. Inside $B$ the acceleration of each planet is due mainly to the planet-star interaction, with a small cumulative effect due to all the other planets. On the other hand, the condition $\sup_D|P| \ll \sup_D|h|$ might well include the case of a pair of planets so close to each other that the effect of the interaction with the star is, for both of them, of minor importance; in such a case the system is still quasi-integrable, but the reference integrable system is no longer the one defined by $h$. A simple example of this kind is the Sun-Earth-Moon system: the integrable reference problem consists of the center of mass of the Earth-Moon system moving under the influence of the Sun, plus the relative motion of Earth and Moon around their center of mass (try to show this explicitly).

Hamiltonian perturbation theory was developed by Poincaré to solve the following fundamental problem. Consider a quasi-integrable system with Hamiltonian of the form

$$H(\varphi, I) = h(I) + \varepsilon P_1(\varphi, I) + \varepsilon^2 P_2(\varphi, I) + \varepsilon^3 \cdots , \tag{4.6}$$

defined for $(\varphi, I) \in \mathbb{T}^n \times B$, $B \subseteq \mathbb{R}^n_+$; $\varepsilon$ is the small parameter ordering the various terms of the perturbation $P := \sum_{j\ge1}\varepsilon^j P_j$. One now looks for a canonical transformation

$$\mathcal{C}_\varepsilon : \mathbb{T}^n \times B \to \mathbb{T}^n \times B' : (\varphi, I) \mapsto (\theta, J) = \mathcal{C}_\varepsilon(\varphi, I)$$

such that the transformation is $\varepsilon$-close to the identity, i.e. $\mathcal{C}_0 = \mathrm{id}_{\mathbb{T}^n \times B}$, and in the transformed Hamiltonian the first order term of the perturbation, if not completely removed, is simplified as much as possible: for example, it is transformed into a term independent of the angles.

If one is able to find such a transformation, then one can iterate the procedure and try to remove/simplify the new second order term of the perturbation, and so on, with the aim of getting a Hamiltonian as close as possible to that of the unperturbed integrable system, $H_0$. Notice that the removal of all the angle variables to order $k$ ensures that every new, and thus every original, action variable changes just a bit on a time-scale $1/\varepsilon^k$. Indeed, suppose that the procedure described above works for the first $k$ steps. As a result, one is left with a transformed Hamiltonian of the form

$$H(\mathcal{C}_\varepsilon^{-1}(\theta, J)) = h(J) + \varepsilon Z_1(J) + \cdots + \varepsilon^k Z_k(J) + R_k(\theta, J; \varepsilon) .$$

Suppose now that the remainder at the $k$-th step satisfies $R_k = O(\varepsilon^{k+1})$ and $\partial R_k/\partial\theta_j = O(\varepsilon^{k+1})$ $\forall j = 1, \dots, n$. Then, $\dot{J}_j = \{J_j, H \circ \mathcal{C}_\varepsilon^{-1}\} = \{J_j, R_k\}$, i.e.

$$|J_j(t) - J_j(0)| \le \int_0^t \left|\frac{\partial R_k}{\partial\theta_j}(\theta(s), J(s); \varepsilon)\right| \mathrm{d}s \le c_j\,\varepsilon^{k+1} t = (c_j\varepsilon)(\varepsilon^k t) ,$$

which means $|J_j(t) - J_j(0)| = O(\varepsilon)$ over times $t \le 1/\varepsilon^k$ ($j = 1, \dots, n$). On the other hand, since the canonical transformation $(\varphi, I) \mapsto (\theta, J)$ is $\varepsilon$-close to the identity, one has

$$\begin{aligned}
|I_j(t) - I_j(0)| &= |I_j(t) - J_j(t) + J_j(t) - J_j(0) + J_j(0) - I_j(0)| \le \\
&\le |I_j(t) - J_j(t)| + |J_j(t) - J_j(0)| + |J_j(0) - I_j(0)| = \\
&= O(\varepsilon) + O(\varepsilon) + O(\varepsilon) = O(\varepsilon) ,
\end{aligned}$$

again over times $t \le 1/\varepsilon^k$. Thus the original action variables too undergo a slow change, so that, on a long time interval $0 \le t < 1/\varepsilon^k$, the dynamics of the perturbed system resembles that of the unperturbed, integrable one.

**Remark 4.4.** The $\varepsilon$-closeness to the identity of the canonical transformation $\mathcal{C}_\varepsilon$ prevents one from improving the estimate on the variation of the original action variables: this can be done with the new actions only, getting for example a reduced action variation $|J_j(t) - J_j(0)| = O(\varepsilon^2)$ over shorter times $|t| < 1/\varepsilon^{k-1}$. However, in going back to the original actions such a sharper control is lost and the best one can do is decided by the canonical transformation.