This session aims to combine a few talks concerning the asymptotics of E(·|G_n) when the conditioning σ-fields G_n move, in the spirit of the martingale convergence theorem. Among the various questions:
• Is there a σ-field G such that E(Z|G_n) → E(Z|G), in some sense, for sufficiently many random variables Z?
• What about E(Z_n|G_n) if the Z_n move as well? Example:
G_n = σ(X_1, . . . , X_n)
for some sequence X_n, and Z_n = 1{X_{n+1} ∈ B}, so that
E(Z_n|G_n) = P(X_{n+1} ∈ B | X_1, . . . , X_n)   (predictive measure)
• Assume a regular version of the conditional distribution given G_n exists and define
a_n(ω)(A) = P_ω(A|G_n)
for each ω and each A in some sub-σ-field. Let µ_n be a sequence of random probability measures and d a distance between probability measures. What about
d[a_n(ω), µ_n(ω)]?
Example: G_n = σ(X_1, . . . , X_n),
a_n(·) = P(X_{n+1} ∈ · | X_1, . . . , X_n)   (predictive measure)
µ_n = (1/n) Σ_{i=1}^n δ_{X_i}   (empirical measure)
In some frameworks, µ_n can be regarded as an "estimate" of a_n, and it is then natural to investigate the asymptotics of d(a_n, µ_n); the sketch below gives a concrete instance.
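As a minimal illustration (not from the talk), consider the Pólya urn / Dirichlet case on a finite space, where the predictive measure has the closed form a_n = (α + n µ_n)/(α(S) + n) for a finite base measure α. The Python sketch below simulates such a sequence and reports the total variation distance d(a_n, µ_n); the parameters k and c are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Polya urn / Dirichlet predictive on the finite space S = {0, ..., k-1}:
# a_n = (alpha + n*mu_n) / (alpha(S) + n), base measure alpha of total mass c.
k, c = 5, 2.0
alpha = np.full(k, c / k)          # alpha({x}) = c/k for each atom x
counts = np.zeros(k)               # counts[x] = n * mu_n({x})

for n in range(1, 2001):
    pred = (alpha + counts) / (c + n - 1)    # predictive a_{n-1}: law of X_n
    counts[rng.choice(k, p=pred)] += 1       # draw X_n and update the urn
    mu_n = counts / n                        # empirical measure mu_n
    a_n = (alpha + counts) / (c + n)         # predictive measure a_n
    if n % 500 == 0:
        # on a finite space, sup_B |a_n(B) - mu_n(B)| is the total variation
        print(f"n={n:4d}  d(a_n, mu_n) = {0.5 * np.abs(a_n - mu_n).sum():.5f}")
```

In this special case one can check directly that d(a_n, µ_n) ≤ c/(c + n), so the distance vanishes at rate 1/n.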
ASYMPTOTIC PREDICTIVE INFERENCE WITH EXCHANGEABLE DATA
Patrizia Berti, Luca Pratelli, Pietro Rigo
Università di Modena e Reggio-Emilia, Accademia Navale di Livorno, Università di Pavia
Torino, June 20, 2017
Framework
• (X_n : n ≥ 1) sequence of random variables with values in a (nice) measurable space (S, B)
• µ_n = (1/n) Σ_{i=1}^n δ_{X_i}   (empirical measure)
• a_n(·) = P(X_{n+1} ∈ · | X_1, . . . , X_n)   (predictive measure)
Under some conditions, µ_n(B) − a_n(B) → 0 a.s. for each B ∈ B. Thus, we fix D ⊂ B and investigate uniform convergence over D, i.e., we focus on
||µ_n − a_n|| = sup_{B∈D} |µ_n(B) − a_n(B)|
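As a worked special case (an illustrative assumption, not part of the talk's framework), take the Pólya/Dirichlet predictive a_n = (α + n µ_n)/(α(S) + n) with finite base measure α. Then, for every B,
µ_n(B) − a_n(B) = [α(S) µ_n(B) − α(B)] / (α(S) + n),
and since both α(S) µ_n(B) and α(B) lie in [0, α(S)], uniformly over D = B,
||µ_n − a_n|| ≤ α(S)/(α(S) + n) → 0.
Here uniform convergence holds over the whole Borel σ-field, at rate 1/n.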
Motivations
• Bayesian predictive inference
• Empirical processes for non-ergodic data
• Predictive distributions of exchangeable sequences
• Frequentist approximations of Bayesian procedures
Assumptions and possible generalizations
• From now on, (X_n) is exchangeable. However, most results are still valid if (X_n) is only conditionally identically distributed (c.i.d.), namely,
P(X_k ∈ · | X_1, . . . , X_n) = P(X_{n+1} ∈ · | X_1, . . . , X_n)   a.s. for each k > n ≥ 0
(every exchangeable sequence is c.i.d., but not conversely)
• Similarly, in most results, σ(X_1, . . . , X_n) can be replaced by any filtration G_n to which (X_n) is adapted
• To avoid measurability problems, D is countably determined, i.e.,
sup_{B∈D} |α(B) − β(B)| = sup_{B∈D_0} |α(B) − β(B)|
for some countable D_0 ⊂ D and all probabilities α and β on B
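For instance, the half-lines D = {(−∞, t] : t ∈ R} are countably determined (rational t suffice), and for an empirical measure the sup is attained at the jump points. A minimal Python sketch of this reduction (the function name and the choice of D are illustrative):

```python
import numpy as np

def sup_halflines(x, cdf):
    """sup over D = {(-inf, t]} of |F_n(t) - G(t)|, where F_n is the
    empirical c.d.f. of the sample x and G = cdf is a vectorized c.d.f.
    By right-continuity it suffices to check t just before and at each
    sorted data point: a countable (in fact finite) subclass of D."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    Fn = np.arange(1, n + 1) / n                 # F_n at the data points
    G = cdf(x)
    return max(np.abs(Fn - G).max(),             # compare at each jump...
               np.abs(Fn - 1.0 / n - G).max())   # ...and just before it
```

With x a sample and cdf a vectorized c.d.f. (e.g. scipy.stats.norm.cdf), this returns the classical Kolmogorov-Smirnov statistic.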
Goals
• To give conditions for
||µ_n − a_n|| → 0   a.s. or in probability
• To determine the rate of convergence, i.e., to fix the asymptotic behavior of
r_n ||µ_n − a_n||
for suitable constants r_n > 0, e.g.
r_n = √n   or   r_n = √(n / log log n)
• Another issue, not discussed in this talk, is the limiting distribution (if any) of r_n (µ_n − a_n) under some distance on the space l^∞(D)
Theorem 1
Recall that, because of exchangeability, there is a random probability measure µ on B such that
µ_n(B) → µ(B)   a.s. for each B ∈ B
If ||µ_n − µ|| → 0 a.s., then ||µ_n − a_n|| → 0 a.s.
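A heuristic for why this is plausible (a sketch, not the talk's proof): by exchangeability, a_n(B) = E{µ(B) | X_1, . . . , X_n} a.s., and µ_n is measurable with respect to X_1, . . . , X_n, so
||µ_n − a_n|| = sup_{B∈D} |E{µ_n(B) − µ(B) | X_1, . . . , X_n}| ≤ E{||µ_n − µ|| | X_1, . . . , X_n}
(countable determination of D makes the sup measurable). When ||µ_n − µ|| → 0 a.s., the right-hand side → 0 a.s. by a conditional, martingale-type form of dominated convergence.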
Examples: ||µ_n − a_n|| → 0 a.s. provided
• D is a universal Glivenko-Cantelli class
• D = B and µ is a.s. discrete (usual in Bayesian nonparametrics!)
• S = R^k, D = {closed convex sets} and µ << λ a.s., where λ is a (non-random) σ-finite product measure
Theorem 2
Fix the constants r_n > 0 and define M = sup_n r_n ||µ_n − µ||. Define also the "exchangeable empirical process"
W_n = √n (µ_n − µ)
If E(M) < ∞, then lim sup_n r_n ||µ_n − a_n|| < ∞ a.s.
Examples:
• √(n / log n) ||µ_n − a_n|| → 0 a.s. if sup_n E{||W_n||^p} < ∞ for some p > 2
• lim sup_n √(n / log log n) ||µ_n − a_n|| ≤ √12 a.s. if D is a VC-class
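In the Pólya/Dirichlet special case sketched earlier (an illustrative assumption, not a result of the talk), ||µ_n − a_n|| ≤ α(S)/(α(S) + n), so r_n ||µ_n − a_n|| → 0 a.s. for every r_n = o(n): the generic rates above can be quite conservative for particular exchangeable sequences.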
Theorem 3
If sup_n E{||W_n||^p} < ∞ for some p ≥ 1, then
r_n ||µ_n − a_n|| → 0 in probability whenever r_n/√n → 0
Remark: Thus, if sup_n E{||W_n||^p} < ∞, then
√(n / log log n) ||µ_n − a_n|| → 0 in probability,
but a.s. convergence cannot be obtained via the LIL
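A quick heuristic for the in-probability statement (a sketch under the exchangeability identity a_n(B) = E{µ(B) | X_1, . . . , X_n} used after Theorem 1, not the talk's proof): since ||µ_n − a_n|| ≤ E{||µ_n − µ|| | X_1, . . . , X_n}, taking expectations gives E||µ_n − a_n|| ≤ E||W_n|| / √n, and by Markov's inequality
P{r_n ||µ_n − a_n|| > ε} ≤ (r_n / (ε√n)) E||W_n|| → 0
whenever r_n/√n → 0, because sup_n E||W_n|| < ∞ follows from sup_n E{||W_n||^p} < ∞ for any p ≥ 1 by Jensen's inequality.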
Theorem 4
Define
h_n(B) = P{X_{n+1} ∈ B | µ_n(B)}
Suppose that µ(B) has an absolutely continuous distribution for each B ∈ D such that 0 < P(X_1 ∈ B) < 1. Then, under mild technical conditions, one obtains
√n ||µ_n − a_n|| → 0 in probability
if and only if
√n {a_n(B) − h_n(B)} → 0 in probability for each B ∈ D
Open problem: Characterize the above condition. More generally, in addition to µ_n(B), what further information is required to evaluate a_n(B)?
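One case where the condition holds trivially (an illustrative assumption, continuing the earlier Pólya/Dirichlet sketch): there a_n(B) = (α(B) + n µ_n(B))/(α(S) + n) is a function of µ_n(B) alone, so a_n(B) = h_n(B) for every n and B; moreover µ(B) has a Beta distribution, hence is absolutely continuous whenever 0 < α(B) < α(S). Consistently with Theorem 4, √n ||µ_n − a_n|| → 0 in probability (indeed ||µ_n − a_n|| = O(1/n), as noted above).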