This session aims to combine a few talks concerning the asymptotics of E(·|G_n) when the conditioning σ-fields G_n move, in the spirit of the martingale convergence theorem. Among the various questions:
• Is there a σ-field G such that E(Z|G_n) → E(Z|G), in some sense, for sufficiently many random variables Z?
• What about E(Z_n|G_n) if the Z_n move as well? Example:
G_n = σ(X_1, . . . , X_n)
for some sequence X_n, and Z_n = 1{X_{n+1} ∈ B}, so that
E(Z_n|G_n) = P(X_{n+1} ∈ B | X_1, . . . , X_n)   (predictive measure)
• Assume a regular version of the conditional distribution given G_n exists and define
a_n(ω)(A) = P_ω(A|G_n)
for each ω and each A in some sub-σ-field. Let µ_n be a sequence of random probability measures and d a distance between probability measures. What about
d[a_n(ω), µ_n(ω)]?
Example: G_n = σ(X_1, . . . , X_n),
a_n(·) = P(X_{n+1} ∈ · | X_1, . . . , X_n)   (predictive measure)
µ_n = (1/n) Σ_{i=1}^n δ_{X_i}   (empirical measure)
In some frameworks, µ_n can be regarded as an "estimate" of a_n, and it is then natural to investigate the asymptotics of d(a_n, µ_n); the sketch below gives a concrete instance.
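As a minimal illustration (not from the talk), consider the Pólya urn / Dirichlet case on a finite space, where the predictive measure has the closed form a_n = (α + n µ_n)/(α(S) + n) for a finite base measure α. The Python sketch below simulates such a sequence and reports the total variation distance d(a_n, µ_n); the parameters k and c are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Polya urn / Dirichlet predictive on the finite space S = {0, ..., k-1}:
# a_n = (alpha + n*mu_n) / (alpha(S) + n), base measure alpha of total mass c.
k, c = 5, 2.0
alpha = np.full(k, c / k)          # alpha({x}) = c/k for each atom x
counts = np.zeros(k)               # counts[x] = n * mu_n({x})

for n in range(1, 2001):
    pred = (alpha + counts) / (c + n - 1)    # predictive a_{n-1}: law of X_n
    counts[rng.choice(k, p=pred)] += 1       # draw X_n and update the urn
    mu_n = counts / n                        # empirical measure mu_n
    a_n = (alpha + counts) / (c + n)         # predictive measure a_n
    if n % 500 == 0:
        # on a finite space, sup_B |a_n(B) - mu_n(B)| is the total variation
        print(f"n={n:4d}  d(a_n, mu_n) = {0.5 * np.abs(a_n - mu_n).sum():.5f}")
```

In this special case one can check directly that d(a_n, µ_n) ≤ c/(c + n), so the distance vanishes at rate 1/n.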
ASYMPTOTIC PREDICTIVE INFERENCE WITH EXCHANGEABLE DATA
Patrizia Berti, Luca Pratelli, Pietro Rigo
Università di Modena e Reggio-Emilia, Accademia Navale di Livorno, Università di Pavia
Torino, June 20, 2017
Framework
• (X_n : n ≥ 1) sequence of random variables with values in a (nice) measurable space (S, B)
• µ_n = (1/n) Σ_{i=1}^n δ_{X_i}   (empirical measure)
• a_n(·) = P(X_{n+1} ∈ · | X_1, . . . , X_n)   (predictive measure)
Under some conditions, µ_n(B) − a_n(B) → 0 a.s. for each B ∈ B. Thus, we fix D ⊂ B and investigate uniform convergence over D, i.e., we focus on
||µ_n − a_n|| = sup_{B∈D} |µ_n(B) − a_n(B)|
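As a worked special case (an illustrative assumption, not part of the talk's framework), take the Pólya/Dirichlet predictive a_n = (α + n µ_n)/(α(S) + n) with finite base measure α. Then, for every B,
µ_n(B) − a_n(B) = [α(S) µ_n(B) − α(B)] / (α(S) + n),
and since both α(S) µ_n(B) and α(B) lie in [0, α(S)], uniformly over D = B,
||µ_n − a_n|| ≤ α(S)/(α(S) + n) → 0.
Here uniform convergence holds over the whole Borel σ-field, at rate 1/n.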
Motivations
• Bayesian predictive inference
• Empirical processes for non-ergodic data
• Predictive distributions of exchangeable sequences
• Frequentist approximations of Bayesian procedures
Assumptions and possible generalizations
• From now on, (X_n) is exchangeable. However, most results are still valid if (X_n) is only conditionally identically distributed (c.i.d.), namely,
P(X_k ∈ · | X_1, . . . , X_n) = P(X_{n+1} ∈ · | X_1, . . . , X_n)   a.s. for each k > n ≥ 0
(every exchangeable sequence is c.i.d., but not conversely)
• Similarly, in most results, σ(X_1, . . . , X_n) can be replaced by any filtration G_n to which (X_n) is adapted
• To avoid measurability problems, D is countably determined, i.e.,
sup_{B∈D} |α(B) − β(B)| = sup_{B∈D_0} |α(B) − β(B)|
for some countable D_0 ⊂ D and all probabilities α and β on B
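For instance, the half-lines D = {(−∞, t] : t ∈ R} are countably determined (rational t suffice), and for an empirical measure the sup is attained at the jump points. A minimal Python sketch of this reduction (the function name and the choice of D are illustrative):

```python
import numpy as np

def sup_halflines(x, cdf):
    """sup over D = {(-inf, t]} of |F_n(t) - G(t)|, where F_n is the
    empirical c.d.f. of the sample x and G = cdf is a vectorized c.d.f.
    By right-continuity it suffices to check t just before and at each
    sorted data point: a countable (in fact finite) subclass of D."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    Fn = np.arange(1, n + 1) / n                 # F_n at the data points
    G = cdf(x)
    return max(np.abs(Fn - G).max(),             # compare at each jump...
               np.abs(Fn - 1.0 / n - G).max())   # ...and just before it
```

With x a sample and cdf a vectorized c.d.f. (e.g. scipy.stats.norm.cdf), this returns the classical Kolmogorov-Smirnov statistic.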
Goals
• To give conditions for
||µ_n − a_n|| → 0   a.s. or in probability
• To determine the rate of convergence, i.e., to fix the asymptotic behavior of
r_n ||µ_n − a_n||
for suitable constants r_n > 0, e.g.
r_n = √n   or   r_n = √(n / log log n)
• Another issue, not discussed in this talk, is the limiting distribution (if any) of r_n (µ_n − a_n) under some distance on the space l^∞(D)
Theorem 1
Recall that, because of exchangeability, there is a random probability measure µ on B such that
µ_n(B) → µ(B)   a.s. for each B ∈ B
If ||µ_n − µ|| → 0 a.s., then ||µ_n − a_n|| → 0 a.s.
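A heuristic for why this is plausible (a sketch, not the talk's proof): by exchangeability, a_n(B) = E{µ(B) | X_1, . . . , X_n} a.s., and µ_n is measurable with respect to X_1, . . . , X_n, so
||µ_n − a_n|| = sup_{B∈D} |E{µ_n(B) − µ(B) | X_1, . . . , X_n}| ≤ E{||µ_n − µ|| | X_1, . . . , X_n}
(countable determination of D makes the sup measurable). When ||µ_n − µ|| → 0 a.s., the right-hand side → 0 a.s. by a conditional, martingale-type form of dominated convergence.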
Examples: ||µ_n − a_n|| → 0 a.s. provided
• D is a universal Glivenko-Cantelli class
• D = B and µ is a.s. discrete (usual in Bayesian nonparametrics!)
• S = R^k, D = {closed convex sets} and µ << λ a.s., where λ is a (non-random) σ-finite product measure
Theorem 2
Fix the constants r_n > 0 and define M = sup_n r_n ||µ_n − µ||. Define also the "exchangeable empirical process"
W_n = √n (µ_n − µ)
If E(M) < ∞, then lim sup_n r_n ||µ_n − a_n|| < ∞ a.s.
Examples:
• √(n / log n) ||µ_n − a_n|| → 0 a.s. if sup_n E{||W_n||^p} < ∞ for some p > 2
• lim sup_n √(n / log log n) ||µ_n − a_n|| ≤ √12 a.s. if D is a VC-class
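In the Pólya/Dirichlet special case sketched earlier (an illustrative assumption, not a result of the talk), ||µ_n − a_n|| ≤ α(S)/(α(S) + n), so r_n ||µ_n − a_n|| → 0 a.s. for every r_n = o(n): the generic rates above can be quite conservative for particular exchangeable sequences.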
Theorem 3
If sup_n E{||W_n||^p} < ∞ for some p ≥ 1, then
r_n ||µ_n − a_n|| → 0 in probability whenever r_n/√n → 0
Remark: Thus, if sup_n E{||W_n||^p} < ∞, then
√(n / log log n) ||µ_n − a_n|| → 0 in probability,
but a.s. convergence cannot be obtained via the LIL
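A quick heuristic for the in-probability statement (a sketch under the exchangeability identity a_n(B) = E{µ(B) | X_1, . . . , X_n} used after Theorem 1, not the talk's proof): since ||µ_n − a_n|| ≤ E{||µ_n − µ|| | X_1, . . . , X_n}, taking expectations gives E||µ_n − a_n|| ≤ E||W_n|| / √n, and by Markov's inequality
P{r_n ||µ_n − a_n|| > ε} ≤ (r_n / (ε√n)) E||W_n|| → 0
whenever r_n/√n → 0, because sup_n E||W_n|| < ∞ follows from sup_n E{||W_n||^p} < ∞ for any p ≥ 1 by Jensen's inequality.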
Theorem 4
Define
h_n(B) = P{X_{n+1} ∈ B | µ_n(B)}
Suppose that µ(B) has an absolutely continuous distribution for each B ∈ D such that 0 < P(X_1 ∈ B) < 1. Then, under mild technical conditions, one obtains
√n ||µ_n − a_n|| → 0 in probability
if and only if
√n {a_n(B) − h_n(B)} → 0 in probability for each B ∈ D
Open problem: Characterize the above condition. More generally, in addition to µ_n(B), what further information is required to evaluate a_n(B)?
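One case where the condition holds trivially (an illustrative assumption, continuing the earlier Pólya/Dirichlet sketch): there a_n(B) = (α(B) + n µ_n(B))/(α(S) + n) is a function of µ_n(B) alone, so a_n(B) = h_n(B) for every n and B; moreover µ(B) has a Beta distribution, hence is absolutely continuous whenever 0 < α(B) < α(S). Consistently with Theorem 4, √n ||µ_n − a_n|| → 0 in probability (indeed ||µ_n − a_n|| = O(1/n), as noted above).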