• Non ci sono risultati.

EXISTENCE OF PROPER REGULAR CONDITIONAL DISTRIBUTIONS

N/A
N/A
Protected

Academic year: 2021

Condividi "EXISTENCE OF PROPER REGULAR CONDITIONAL DISTRIBUTIONS"

Copied!
21
0
0

Testo completo

(1)

EXISTENCE OF PROPER REGULAR CONDITIONAL DISTRIBUTIONS

Pietro Rigo

University of Pavia

Bologna, may 16, 2018

(2)

Classical (Kolmogorovian) conditional probabilities Let (Ω, A, P ) be a probability space and G ⊂ A a sub-σ-field.

A regular conditional distribution (rcd) is a map Q on Ω × A such that

(i) Q(ω, ·) is a probability on A for ω ∈ Ω (ii) Q(·, A) is G-measurable for A ∈ A

(iii) P (A ∩ B) =

RB

Q(ω, A) P (dω) for A ∈ A and B ∈ G

An rcd can fail to exist. However, it exists under mild conditions and

is a.s. unique if A is countably generated.

(3)

In the standard framework, thus, conditioning is with respect to a σ-field G and not with respect to an event H.

What does it mean ?

According to the usual interpretation, it means: For each B ∈ G, we now whether B is true or false. This naive interpretation is very dangerous.

Example 1 Let X = {X

t

: t ≥ 0} be a process adapted to a filtration F = {F

t

: t ≥ 0}. Suppose P (X = x) = 0 for each path x and

{A ∈ A : P (A) = 0} ⊂ F

0

. In this case,

{X = x} ∈ F

0

for each path x.

But then we can stop. We already know the X-path at time 0 !

(4)

Example 2 (Borel-Kolmogorov paradox) Suppose {X = x} = {Y = y}

for some random variables X and Y . Let Q

X

and Q

Y

be rcd’s given σ(X) and σ(Y ). Then,

P (· |X = x) = Q

X

(ω, ·) and P (· |Y = y) = Q

Y

(ω, ·)

where ω ∈ Ω meets X(ω) = x and Y (ω) = y. Hence it may be that P (· |X = x) 6= P (· |Y = y) even if {X = x} = {Y = y}.

Example 3 For the naive interpretation to make sense, Q should be proper, i.e.

Q(ω, ·) = δ

ω

on G for almost all ω .

But Q needs not be proper. In fact, properness of Q essentially

(5)

Conditional 0-1 laws An rcd Q is 0-1 on G if

Q(ω, ·) ∈ {0, 1} on G for almost all ω Why to focus on such a 0-1 law ?

• It is a (natural) consequence of properness

• It is equivalent to

A independent G, under Q(ω, ·), for almost all ω

• It is basic for integral representation of invariant measures

• It is not granted. It typically fails if {A ∈ A : P (A) = 0} ⊂ G

(6)

Theorem 1

Let G

n

⊂ A be a sub-σ-field and Q

n

an rcd given G

n

. The rcd Q is 0-1 on G if

• The ”big” σ-field A is countably generated

• Q

n

is 0-1 on G

n

for each n and G ⊂ lim sup

n

G

n

• E(1

A

|G

n

) → E(1

A

|G) a.s. for each A ∈ A

Note that, by martingale convergence, the last condition is automat-

ically true if the sequence G

n

is monotonic

(7)

Examples

Let S be a Polish space and Ω = S

. Theorem 1 applies to Tail σ-field: G = ∩

n

σ(X

n

, X

n+1

, . . .)

where X

n

is a sequence of real random variables Symmetric σ-field:

G = {B ∈ A : B = f

−1

(B) for each finite permutation f } Thus,

Theorem 1 ⇒ de Finetti’s theorem

Open problem: Theorem 1 does not apply to the shift-invariant σ-field:

G = {B ∈ A : B = s

−1

(B)}

where s(x

1

, x

2

, . . .) = (x

2

, x

3

, . . .) is the shift

(8)

Disintegrability

Let Π ⊂ A be a partition of Ω. P is disintegrable on Π if P (A) =

RΠ

P (A|H) P

(dH)

for each A ∈ A, where

• P (·|H) is a probability on A such that P (H|H) = 1

• P

is a probability on a suitable σ-field of subsets of Π

(9)

Theorem 2 Given a partition Π of Ω, let

G = {(x, y) ∈ Ω × Ω : x ∼ y}.

Then, P is disintegrable on Π whenever

• (Ω, A) is nice (e.g. a standard space)

• G is a Borel subset of Ω × Ω

Remark: G is actually a Borel set if Π is the partition in the atoms of the tail, or the symmetric, or the shift invariant σ-fields

Remark: The condition on G can be relaxed (e.g., G coanalytic)

(10)

Coherent (de Finettian) conditional probabilities A different notion, introduced by de Finetti, is as follows.

Let

P (·|·) : A × G → R.

For all n ≥ 1, c

1

, . . . , c

n

∈ R, A

1

, . . . , A

n

∈ A and B

1

, . . . , B

n

∈ G \ ∅, define

G(ω) =

Pni=1

c

i

1

B

i

(ω) {1

A

i

(ω) − P (A

i

|B

i

)}.

Then, P (·|·) is coherent if

sup

ω∈B

G(ω) ≥ 0 where B = ∪

ni=1

B

i

.

(11)

Such a definition has both merits and drawbacks. In particular, con- trary to the classical case:

• The conditioning is now with respect to events,

• P (B|B) = 1,

• For fixed B, P (·|B) is ”only” a finitely additive probability,

• Disintegrability on Π is not granted, where Π is the partition of

Ω in the atoms of G

(12)

Bayesian inference (X , E) sample space, (Θ, F ) parameter space, {P

θ

: θ ∈ Θ} statistical model,

A prior is a probability π on F . A posterior for π is any collection Q = {Q

x

: x ∈ X } such that

• Q

x

is a probability on F for each x ∈ X

RA

Q

x

(B) m(dx) =

RB

P

θ

(A) π(dθ)

for all A ∈ E B ∈ F and for some (possibly finitely additive)

probability m on subsets of X

(13)

Theorem 3

Fix a measurable function T on X (a statistic) such that P

θ

(T = t) = 0 for all θ and t.

Under mild conditions, for any prior π, there is a posterior Q for π such that

T (x) = T (y) ⇒ Q

x

= Q

y

Interpretation:

The above condition means that T is sufficient for Q. Suppose you

start with a prior π, describing your feelings on θ, and a statistic T ,

describing how different samples affect your inference on θ. Theorem

3 states that, whatever π and T (with P

θ

(T = t) = 0) there is a

posterior Q for π which makes T sufficient.

(14)

Point estimation

The ideas underlying Theorem 3 yield further results. Suppose Θ ⊂ R and d : X → Θ is an estimate of θ.

Theorem 4

Under mild conditions, if the prior π is null on compacta, there is a posterior Q for π such that

R

θ

2

Q

x

(dθ) < ∞ and

E

Q

(θ|x) =

R

θ Q

x

(dθ) = d(x) Interpretation:

The above condition means that d is optimal under square error loss.

Suppose you start with a measurable map d : X → Θ, to be regarded

as your estimate of θ. Theorem 4 states that, if the prior π vanishes

(15)

Compatibility

Let X = (X

1

, . . . , X

k

) be a k-dimensional random vector and X

−i

= (X

1

, . . . , X

i−1

, X

i+1

, . . . , X

k

)

To assess the distribution of X, assign the kernels Q

1

, . . . , Q

k

, where each Q

i

is only requested to satisfy

Q

i

(x, ·) is a probability for fixed x and

the map x 7→ Q

i

(x, A) is measurable for fixed A

The kernels Q

1

, . . . , Q

k

are compatible if there is a Borel probability µ on R

k

such that

P

µ

(X

i

∈ · |X

−i

= x) = Q

i

(x, ·) for all i and µ-almost all x.

Such a µ, if exists, should be regarded as the distribution of X

(16)

Example: Let k = 2 and Q

1

(x, ·) = Q

2

(x, ·) = N (x, 1)

This looks reasonable in a number of problems. Nevertheless, Q

1

and Q

2

are not compatible, i.e., no Borel probability on R

2

admits Q

1

and Q

2

as conditional distributions

Compatibility issues arise in: spatial statistics, statistical me- chanics, Bayesian image analysis, multiple data imputation and Gibbs sampling

Another example are improper priors. Given the statistical model

{P

θ

: θ ∈ Θ}, let Q = {Q

x

: x ∈ X } be the ”formal posterior” of an

improper prior γ (i.e., γ(Θ) = ∞). Strictly speaking, Q makes sense

only if compatible with the statistical model. In that case, Q agrees

with the posterior of some (proper) prior

(17)

For x ∈ R

k

and f ∈ C

b

(R

k

), let

E(f |X

−i

= x

−i

) =

R

f (x

1

, . . . , x

i−1

, t, x

i+1

, . . . , x

k

) Q

i

(x

−i

, dt) Theorem 5

Suppose there is a compact set A

i

such that Q

i

(x, A

i

) = 1 for all x ∈ R

k−1

.

Letting A = A

1

× . . . × A

k

, suppose also that

x 7→ E(f |X

−i

= x

−i

) is continuous on A for each f ∈ C(A) Then, Q

1

, . . . , Q

k

are compatible if and only if

sup

x∈A Pk−1i=1

{E(f

i

|X

−i

= x

−i

) − E(f

i

|X

−k

= x

−k

)} ≥ 0

for all f

1

, . . . , f

k−1

∈ C(A)

(18)

For each i, fix a (σ-finite) measure λ

i

and suppose that Q

i

(x, dy) = f

i

(x, y) λ

i

(dy) for all x ∈ R

k−1

Let λ = λ

1

× . . . × λ

k

be the product measure Theorem 6

Suppose f

i

> 0 for all i. Then, Q

1

, . . . , Q

k

are compatible if and only if there are positive Borel functions u

1

, . . . , u

k

on R

k−1

such that

f

i

(x

i

|x

−i

) = f

k

(x

k

|x

−k

) u

i

(x

−i

) u

k

(x

−k

),

for all i < k and λ-almost all x ∈ R

k

, and

R

u

k

−k

= 1

Remark: The assumption f

i

> 0 can be dropped at the price of a

(19)

An asymptotic result

Let S be a Polish space, (X

n

) an exchangeable sequence of S-valued random variables, and

µ

n

= (1/n)

Pni=1

δ

X

i

empirical measure

a

n

( ·) = P (X

n+1

∈ · |X

1

, . . . , X

n

) predictive measure

Often, a

n

can not be evaluated in closed form and µ

n

is a reasonable

”estimate” of a

n

. Here, we focus on the error d(µ

n

, a

n

)

where d is a distance between probability measures. For instance, if d(µ

n

, a

n

) → 0 in some sense

then µ

n

is a consistent estimate of a

n

(20)

Fix a class D of Borel subsets of S and define d as d(α, β) = ||α − β|| = sup

A∈D

|α(A) − β(A)|

for all probabilities α and β on the Borel subsets of S Theorem 7

If D is a (countably determined) VC-class, lim sup

n q n

log log n

||µ

n

− a

n

|| ≤ 1/ √

2 a.s.

Hence, for any constants r

n

,

r

n

||µ

n

− a

n

|| → 0 a.s. provided r

n qlog log nn

→ 0

(21)

Remark: If S = R

k

,

D = {closed balls}, D = {half spaces}, and D = {(−∞, t] : t ∈ R

k

}

are (countably determined) VC-classes

Remark: It is possible to give conditions for

√ n ||µ

n

− a

n

|| → 0 in probability

or even for

n ||µ

n

− a

n

|| converges a.s. to a finite limit Example: Let S = {0, 1}. Then, √

n ||µ

n

− a

n

|| → 0 in probability if

the prior (i.e, the de Finetti’s measure) is absolutely continuous and

n ||µ

n

− a

n

|| converges a.s. if the prior is absolutely continuous with

an almost Lipschitz density

Riferimenti

Documenti correlati

Suppose you start with a prior π, describing your feelings on θ, and a statistic T , describing how different samples affect your inference on θ. In addition, Q can be taken to

Naval Academy of Livorno and University of Pavia Bologna, March 22, 2019..

Si consideri il sistema meccanico formato da un disco omogeneo D di massa m e raggio r vincolato a rotolare senza strisciare su una guida circolare di raggio R posta in un

You need to nourish and care of your skin; lack of relaxation, irregular skin care, chemical and synthetic products, alcohol, cigarettes, pollution, stress are great

THE …...EST OF THE FAMILY (complete with adjectives like sweet, kind, nice, cool, smart, strong, young, ). YOU ARE MY STAR YOU ARE MY SUN I LOVE YOU MUM, YOU ARE THE ONE

[r]

Appello in cui si intende sostenere la prova di teoria : II  III  VOTO.

The main difference is that function F (X) depends on the amplitude X, while G(jω) depends on the frequency ω.... Describimg functions of the