AND THE BELLMAN EQUATION NOTE MMII 2010-2011
PAOLA LORETI
Here we give some known results to introduce students to the subject.
1. Assumptions

We consider the system

(1)
\[
\begin{cases}
\dot X(t) = b(X(t), \alpha(t)), \\
X(s) = x,
\end{cases}
\]

where the control function $\alpha(\cdot)$ is measurable on $[0, +\infty)$ and takes its values in a compact set $A$.
In order to ensure the existence and uniqueness of the solution of (1) in the class $\mathrm{Lip}([0,T]; \mathbb{R}^n)$, we assume that the mapping
\[
b : \mathbb{R}^n \times A \to \mathbb{R}^n
\]
is continuous and satisfies the following standard assumptions: there exist two positive constants $C_1, C_2$ such that for every $x, x' \in \mathbb{R}^n$, $a \in A$, we have

(2)
\[
\begin{cases}
\|b(x,a)\| \le C_1, \\
\|b(x,a) - b(x',a)\| \le C_2 \|x - x'\|.
\end{cases}
\]
The solution $X_x(t)$, starting at $x$ at time $t = 0$, is given by
\[
X_x(t) = x + \int_0^t b(X_x(s), \alpha(s))\,ds.
\]
We assume that the mapping
\[
f : \mathbb{R}^n \times A \to \mathbb{R}
\]
is continuous and satisfies the following assumptions: there exist two positive constants $C_1, C_2$ such that for every $x, x' \in \mathbb{R}^n$, $a \in A$, we have

(3)
\[
\begin{cases}
|f(x,a)| \le C_1, \\
|f(x,a) - f(x',a)| \le C_2 \|x - x'\|.
\end{cases}
\]
The function $f$ is called the running cost. Take $\lambda > 0$. The cost functional is
\[
J(x, \alpha) = \int_0^{+\infty} f(X_x(s), \alpha(s)) e^{-\lambda s}\,ds.
\]
The value function is
\[
u(x) = \inf_{\alpha} J(x, \alpha).
\]
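Although not part of the original note, the objects just defined are easy to approximate numerically. The sketch below uses a forward-Euler discretization of (1) and a Riemann sum for the discounted cost; the particular $b$, $f$, $\lambda$ are borrowed from the example in Section 3.1.

```python
import numpy as np

# Forward-Euler approximation of a trajectory of (1) and of the discounted
# cost J(x, alpha) for one fixed control. The data b(x,a) = -x*a,
# f(x,a) = |x| + |a|, lambda = 2, A = [-1,1] come from the example in
# Section 3.1 of the note.
def simulate_cost(x0, alpha, b, f, lam, dt=1e-3, T=20.0):
    """Riemann-sum approximation of J(x0, alpha) truncated to [0, T]."""
    X, J = x0, 0.0
    for k in range(int(T / dt)):
        t = k * dt
        a = alpha(t)                       # control value in A at time t
        J += f(X, a) * np.exp(-lam * t) * dt
        X += b(X, a) * dt                  # Euler step for dX/dt = b(X, a)
    return J

# Constant control alpha = 1 from x = 2: then X(t) = 2 e^{-t} and
# J = int_0^infty (2 e^{-t} + 1) e^{-2t} dt = 2/3 + 1/2 = 7/6.
J = simulate_cost(2.0, lambda t: 1.0,
                  lambda x, a: -x * a,
                  lambda x, a: abs(x) + abs(a), lam=2.0)
```

The truncation horizon $T = 20$ is harmless because the integrand decays like $e^{-\lambda t}$; the overall error is of order $dt$.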
2. The Dynamic Programming Principle (DPP)

\[
u(x) = \inf_{\alpha} \left\{ \int_0^t f(X_x(s), \alpha(s)) e^{-\lambda s}\,ds + u(X_x(t)) e^{-\lambda t} \right\}
\]
for all $x \in \mathbb{R}^n$ and all $t > 0$.
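As a quick sanity check of the principle (an added remark, not in the original note), take the running cost constant, $f(x,a) \equiv c$. Then $J(x,\alpha) = c/\lambda$ for every control, so $u \equiv c/\lambda$, and the right-hand side is indeed independent of $t$ and $\alpha$:
\[
\int_0^t c\, e^{-\lambda s}\,ds + \frac{c}{\lambda} e^{-\lambda t}
= \frac{c}{\lambda}\left(1 - e^{-\lambda t}\right) + \frac{c}{\lambda} e^{-\lambda t}
= \frac{c}{\lambda} = u(x).
\]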
We first show that
\[
u(x) \ge \inf_{\alpha} \left\{ \int_0^t f(X_x(s), \alpha(s)) e^{-\lambda s}\,ds + u(X_x(t)) e^{-\lambda t} \right\}.
\]
Proof. Take any admissible control $\alpha$. Then
\[
\int_0^{+\infty} f(X_x(s), \alpha(s)) e^{-\lambda s}\,ds
= \int_0^{t} f(X_x(s), \alpha(s)) e^{-\lambda s}\,ds
+ \int_t^{+\infty} f(X_x(s), \alpha(s)) e^{-\lambda s}\,ds.
\]
With the change of variable $\sigma = s - t$,
\[
\int_t^{+\infty} f(X_x(s), \alpha(s)) e^{-\lambda s}\,ds
= \int_0^{+\infty} f(X_x(\sigma + t), \alpha(\sigma + t)) e^{-\lambda(\sigma + t)}\,d\sigma.
\]
We define
\[
\bar X(\sigma) = X_x(\sigma + t), \qquad \bar\alpha(\sigma) = \alpha(\sigma + t).
\]
Then

(4)
\[
\dot{\bar X}(\sigma) = \dot X_x(t + \sigma), \qquad \bar X(0) = X_x(t),
\]
and
\[
\bar X(\sigma) = \bar X(0) + \int_0^{\sigma} \dot{\bar X}(s)\,ds
= X_x(t) + \int_0^{\sigma} \dot X_x(t + s)\,ds
= X_x(t) + \int_0^{\sigma} b(X_x(t + s), \alpha(t + s))\,ds
= X_x(t) + \int_0^{\sigma} b(\bar X(s), \bar\alpha(s))\,ds,
\]
so the pair $(\bar X, \bar\alpha)$ verifies

(5)
\[
\dot{\bar X}(\sigma) = b(\bar X(\sigma), \bar\alpha(\sigma)), \qquad \bar X(0) = X_x(t).
\]
Hence
\[
\int_t^{+\infty} f(X_x(s), \alpha(s)) e^{-\lambda s}\,ds
= e^{-\lambda t} \int_0^{+\infty} f(\bar X(\sigma), \bar\alpha(\sigma)) e^{-\lambda\sigma}\,d\sigma
= e^{-\lambda t} J(X_x(t), \bar\alpha)
\ge e^{-\lambda t} u(X_x(t)),
\]
since $\bar X$ is the trajectory of (1) starting at $X_x(t)$ under the control $\bar\alpha$. Taking the infimum over $\alpha$,
\[
u(x) = \inf_{\alpha} J(x, \alpha)
\ge \inf_{\alpha} \left\{ \int_0^t f(X_x(s), \alpha(s)) e^{-\lambda s}\,ds + u(X_x(t)) e^{-\lambda t} \right\}.
\]
We now show the reverse inequality:
\[
u(x) \le \inf_{\alpha} \left\{ \int_0^t f(X_x(s), \alpha(s)) e^{-\lambda s}\,ds + u(X_x(t)) e^{-\lambda t} \right\}.
\]
Proof. Fix $\varepsilon > 0$ and take any admissible control $\alpha$. By definition of the value function there exists a control $\hat\alpha$ such that
\[
u(X_x(t)) \ge J(X_x(t), \hat\alpha) - \varepsilon.
\]
Set
\[
\tilde\alpha(s) =
\begin{cases}
\alpha(s), & 0 \le s \le t, \\
\hat\alpha(s - t), & s > t,
\end{cases}
\qquad
\tilde X(s) =
\begin{cases}
X_x(s), & 0 \le s \le t, \\
\hat X(s - t), & s > t,
\end{cases}
\]
where
\[
\dot{\hat X}(s) = b(\hat X(s), \hat\alpha(s)), \qquad \hat X(0) = X_x(t).
\]
Then $\tilde X$ is the trajectory of (1) starting at $x$ under the control $\tilde\alpha$, and
\[
\int_0^t f(X_x(s), \alpha(s)) e^{-\lambda s}\,ds + u(X_x(t)) e^{-\lambda t}
\ge \int_0^t f(X_x(s), \alpha(s)) e^{-\lambda s}\,ds + e^{-\lambda t} J(X_x(t), \hat\alpha) - \varepsilon e^{-\lambda t}
\]
\[
= \int_0^t f(X_x(s), \alpha(s)) e^{-\lambda s}\,ds
+ e^{-\lambda t} \int_0^{+\infty} f(\hat X(s), \hat\alpha(s)) e^{-\lambda s}\,ds - \varepsilon e^{-\lambda t}
= \int_0^{+\infty} f(\tilde X(s), \tilde\alpha(s)) e^{-\lambda s}\,ds - \varepsilon e^{-\lambda t}
\]
\[
\ge u(x) - \varepsilon e^{-\lambda t} \ge u(x) - \varepsilon.
\]
Taking the infimum over $\alpha$ and letting $\varepsilon \to 0$ gives the claim.
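The DPP also suggests a numerical scheme (not discussed in the note): freezing the control over a small step $\Delta t$ turns the principle into the fixed-point iteration $u_{k+1}(x) = \min_a \{ f(x,a)\,\Delta t + e^{-\lambda \Delta t} u_k(x + \Delta t\, b(x,a)) \}$, a contraction with factor $e^{-\lambda \Delta t}$. Below is a minimal sketch on a hypothetical one-dimensional problem, $b(x,a) = a$, $A = [-1,1]$, $f(x,a) = |x|$, $\lambda = 1$ (this $f$ is unbounded, which is harmless on a bounded grid); for it, steering toward the origin at full speed and then resting gives $u(x) = |x| - 1 + e^{-|x|}$.

```python
import numpy as np

# Semi-Lagrangian value iteration derived from the DPP (illustrative toy
# problem, not from the note): b(x,a) = a, A = [-1,1], f(x,a) = |x|, lam = 1.
#   u_{k+1}(x) = min_a { f(x,a) dt + exp(-lam*dt) u_k(x + dt*b(x,a)) }
lam, dt = 1.0, 0.01
grid = np.linspace(-3.0, 3.0, 601)        # state grid, spacing h = 0.01
ctrl = np.linspace(-1.0, 1.0, 21)         # discretization of A = [-1,1]
u = np.zeros_like(grid)
for _ in range(2000):                     # overall contraction exp(-20)
    cand = [np.abs(grid) * dt
            + np.exp(-lam * dt) * np.interp(grid + dt * a, grid, u)
            for a in ctrl]
    u = np.min(cand, axis=0)

# Hand-computed value function for this toy problem (drive toward the
# origin at speed 1, then stop): u(x) = |x| - 1 + exp(-|x|).
exact = np.abs(grid) - 1.0 + np.exp(-np.abs(grid))
err = np.max(np.abs(u - exact)[np.abs(grid) <= 1.5])
```

Here `np.interp` evaluates $u_k$ at the foot of the characteristic $x + \Delta t\, b(x,a)$; the error away from the boundary of the grid is of first order in $\Delta t$.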
3. The Dynamic Programming Equation

The Bellman equation is
\[
\lambda u(x) + \max_{a \in A} \{ -Du(x) \cdot b(x,a) - f(x,a) \} = 0.
\]
All the computations are done under the assumption $u \in C^1(\mathbb{R}^n)$. By the dynamic programming principle, taking the constant control $\alpha(s) \equiv a$, $a \in A$,
\[
u(x) \le \int_0^t f(X_x(s), a) e^{-\lambda s}\,ds + u(X_x(t)) e^{-\lambda t}.
\]
Hence
\[
u(x) - u(X_x(t)) \le \int_0^t f(X_x(s), a) e^{-\lambda s}\,ds + u(X_x(t)) (e^{-\lambda t} - 1)
\]
and
\[
\frac{u(x) - u(X_x(t))}{t} \le \frac{1}{t} \int_0^t f(X_x(s), a) e^{-\lambda s}\,ds + \frac{u(X_x(t)) (e^{-\lambda t} - 1)}{t}.
\]
As $t \to 0^+$ the left-hand side tends to $-Du(x) \cdot b(x,a)$ and the right-hand side to $f(x,a) - \lambda u(x)$, so
\[
\lambda u(x) - Du(x) \cdot b(x,a) - f(x,a) \le 0.
\]
Since $a$ was chosen in an arbitrary way,
\[
\lambda u(x) + \max_{a \in A} \{ -Du(x) \cdot b(x,a) - f(x,a) \} \le 0.
\]
On the other hand, suppose that the inequality
\[
\lambda u(x) + \max_{a \in A} \{ -Du(x) \cdot b(x,a) - f(x,a) \} \ge 0
\]
fails. Then there exist $x_1$ and a positive number $\theta$ such that
\[
\lambda u(x_1) + \max_{a \in A} \{ -Du(x_1) \cdot b(x_1,a) - f(x_1,a) \} < -\theta < 0.
\]
Hence for every $a \in A$,
\[
\lambda u(x_1) - Du(x_1) \cdot b(x_1,a) - f(x_1,a) < -\theta < 0.
\]
By continuity, for $t$ small enough we also have, for every $s \in [0,t]$ and every $a \in A$,
\[
\lambda u(X_{x_1}(s)) - Du(X_{x_1}(s)) \cdot b(X_{x_1}(s), a) - f(X_{x_1}(s), a) < -\theta < 0.
\]
By the dynamic programming principle there exists a control $\alpha$ such that
\[
u(x_1) \ge \int_0^t f(X_{x_1}(s), \alpha(s)) e^{-\lambda s}\,ds + u(X_{x_1}(t)) e^{-\lambda t} - \frac{\theta t}{2},
\]
hence
\[
u(x_1) - u(X_{x_1}(t)) \ge \int_0^t f(X_{x_1}(s), \alpha(s)) e^{-\lambda s}\,ds + u(X_{x_1}(t)) (e^{-\lambda t} - 1) - \frac{\theta t}{2}.
\]
On the other hand,
\[
u(x_1) - u(X_{x_1}(t)) = -\int_0^t \frac{d}{ds}\, u(X_{x_1}(s))\,ds
= -\int_0^t Du(X_{x_1}(s)) \cdot b(X_{x_1}(s), \alpha(s))\,ds.
\]
Hence
\[
\int_0^t -Du(X_{x_1}(s)) \cdot b(X_{x_1}(s), \alpha(s))\,ds
- \int_0^t f(X_{x_1}(s), \alpha(s)) e^{-\lambda s}\,ds
\ge u(X_{x_1}(t)) (e^{-\lambda t} - 1) - \frac{\theta t}{2}
\]
and
\[
\int_0^t \big[ -Du(X_{x_1}(s)) \cdot b(X_{x_1}(s), \alpha(s)) - f(X_{x_1}(s), \alpha(s)) \big]\,ds
- \int_0^t f(X_{x_1}(s), \alpha(s)) (e^{-\lambda s} - 1)\,ds
\ge u(X_{x_1}(t)) (e^{-\lambda t} - 1) - \frac{\theta t}{2}.
\]
Since
\[
\int_0^t \big[ -Du(X_{x_1}(s)) \cdot b(X_{x_1}(s), \alpha(s)) - f(X_{x_1}(s), \alpha(s)) \big]\,ds
\le -\int_0^t \lambda u(X_{x_1}(s))\,ds - \theta t,
\]
we have
\[
-\int_0^t \lambda u(X_{x_1}(s))\,ds - \theta t
\ge u(X_{x_1}(t)) (e^{-\lambda t} - 1) - \frac{\theta t}{2}
+ \int_0^t f(X_{x_1}(s), \alpha(s)) (e^{-\lambda s} - 1)\,ds.
\]
Rearranging, and observing that $\lambda \int_0^t u(X_{x_1}(s))\,ds + u(X_{x_1}(t))(e^{-\lambda t} - 1) = o(t)$ while $\int_0^t f (e^{-\lambda s} - 1)\,ds = O(t^2)$, we obtain
\[
0 \ge \frac{\theta t}{2} + \omega(t), \qquad \frac{\omega(t)}{t} \to 0 \text{ as } t \to 0^+,
\]
a contradiction for $t$ small.
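The equation just derived can be verified on a closed-form example (an illustration added here, not from the note): in dimension one, with $b(x,a) = a$, $A = [-1,1]$, $f(x,a) = |x|$ and $\lambda = 1$, steering toward the origin at full speed and then resting gives $u(x) = |x| - 1 + e^{-|x|}$, and indeed $u + |u'| - |x| = 0$ away from $x = 0$. A short finite-difference confirmation of this identity:

```python
import numpy as np

# Finite-difference check of the Bellman equation on a closed-form example
# (illustrative, not from the note): b(x,a) = a, A = [-1,1], f(x,a) = |x|,
# lambda = 1, with candidate value function u(x) = |x| - 1 + exp(-|x|).
# Since max_{|a|<=1} (-u'(x) a) = |u'(x)|, the Bellman residual is
#   u(x) + |u'(x)| - |x|,  which should vanish.
u = lambda x: np.abs(x) - 1.0 + np.exp(-np.abs(x))

h = 1e-5
xs = np.linspace(0.5, 2.5, 41)            # sample points away from x = 0
du = (u(xs + h) - u(xs - h)) / (2.0 * h)  # central-difference u'(x)
residual = u(xs) + np.abs(du) - np.abs(xs)
```

The central difference is second-order accurate, so the residual is zero up to roughly $h^2$.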
3.1. Applications. Take
\[
b(X, \alpha) = -X \cdot \alpha \quad \text{for } X, \alpha \in \mathbb{R},
\qquad
J(x, \alpha) = \int_0^{+\infty} \big( |X_x^{\alpha}(s)| + |\alpha(s)| \big) e^{-2s}\,ds,
\]
with controls taking values in $A = [-1, 1]$. In this case the principle of dynamic programming means that

(6)
\[
u(x) = \inf_{\alpha} \left\{ \int_0^t \big( |X_x^{\alpha}(s)| + |\alpha(s)| \big) e^{-2s}\,ds + u(X_x^{\alpha}(t)) e^{-2t} \right\}
\]
for every $t > 0$. Hence we deduce the following.

Proposition 3.1. The value function $u : \mathbb{R} \to \mathbb{R}$ satisfies the following conditions:
(a) $u$ is Lipschitz continuous;
(b) at every point $x \neq 0$ where $u$ is differentiable, we have
\[
2u(x) - |x| + \max_{|a| \le 1} \{ a x u'(x) - |a| \} = 0.
\]
Proof.
(a) For $x, y \in \mathbb{R}$ and $\varepsilon > 0$ fixed arbitrarily, there exists an admissible control $\alpha$ such that
\[
u(x) > \int_0^{+\infty} \big( |X_x^{\alpha}(s)| + |\alpha(s)| \big) e^{-2s}\,ds - \varepsilon.
\]
Since
\[
u(y) \le \int_0^{+\infty} \big( |X_y^{\alpha}(s)| + |\alpha(s)| \big) e^{-2s}\,ds,
\]
we have
\[
u(y) - u(x) < \int_0^{+\infty} \big( |X_y^{\alpha}(s)| - |X_x^{\alpha}(s)| \big) e^{-2s}\,ds + \varepsilon
\le \int_0^{+\infty} \big| X_y^{\alpha}(s) - X_x^{\alpha}(s) \big| e^{-2s}\,ds + \varepsilon
= |y - x| \int_0^{+\infty} e^{-\int_0^s \alpha(\tau)\,d\tau} e^{-2s}\,ds + \varepsilon,
\]
since the equation $\dot X = -X\alpha$ gives $X_y^{\alpha}(s) - X_x^{\alpha}(s) = (y - x)\, e^{-\int_0^s \alpha(\tau)\,d\tau}$. As $|\alpha| \le 1$, the last integral is at most $\int_0^{+\infty} e^{s} e^{-2s}\,ds = 1$, so $u(y) - u(x) < |y - x| + \varepsilon$. Exchanging the roles of $x$ and $y$ and letting $\varepsilon \to 0$, we conclude $|u(y) - u(x)| \le |y - x|$.
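The Lipschitz estimate of part (a) can also be observed numerically (an illustration, not part of the proof): for the dynamics $\dot X = -X\alpha$, two trajectories driven by the same control satisfy $X_y(s) - X_x(s) = (y - x)\, e^{-\int_0^s \alpha(\tau)\,d\tau}$, so the costs of any common control differ by at most $|y - x|$.

```python
import numpy as np

# Numerical illustration of the Lipschitz bound for the example of
# Section 3.1: b(x,a) = -x*a, f(x,a) = |x| + |a|, lambda = 2, A = [-1,1].
# For one fixed (rough) control alpha, |J(y, alpha) - J(x, alpha)| <= |y - x|.
rng = np.random.default_rng(0)
dt, T = 1e-3, 15.0
steps = int(T / dt)
alpha = rng.uniform(-1.0, 1.0, size=steps)   # piecewise-constant control

def cost(x0):
    """Euler/Riemann approximation of J(x0, alpha)."""
    X, J = x0, 0.0
    for k in range(steps):
        J += (abs(X) + abs(alpha[k])) * np.exp(-2.0 * k * dt) * dt
        X += -X * alpha[k] * dt              # Euler step for dX/dt = -X*alpha
    return J

x0, y0 = 1.0, 1.3
gap = abs(cost(y0) - cost(x0))               # should not exceed |y0 - x0|
```

Running the same comparison with other seeds or starting points leaves the bound intact, as the proof of (a) predicts.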