BASED ON DEPENDENT DATA
PATRIZIA BERTI, LUCA PRATELLI, AND PIETRO RIGO
Abstract. Let (X_n) be any sequence of random variables adapted to a filtration (G_n). Define a_n(·) = P(X_{n+1} ∈ · | G_n), b_n = (1/n) ∑_{i=0}^{n−1} a_i, and µ_n = (1/n) ∑_{i=1}^{n} δ_{X_i}. Convergence in distribution of the empirical processes B_n = √n (µ_n − b_n) and C_n = √n (µ_n − a_n) is investigated under uniform distance. If (X_n) is conditionally identically distributed (in the sense of [4]), convergence of B_n and C_n is studied according to Meyer-Zheng as well. Some CLTs, both uniform and non uniform, are proved. In addition, various examples and a characterization of conditionally identically distributed sequences are given.
1. Introduction
Almost all work on empirical processes, so far, has concerned ergodic sequences (X_n) of random variables. Slightly abusing terminology, here (X_n) is called ergodic if the underlying probability measure P is 0-1 valued on the sub-σ-field
σ( lim sup_n (1/n) ∑_{i=1}^{n} I_B(X_i) : B a measurable set ).
In real problems, however, (X_n) is often non ergodic in the previous sense. Most stationary sequences, for instance, are non ergodic. Likewise, an exchangeable sequence is ergodic if and only if it is i.i.d..
This paper deals with convergence in distribution of empirical processes based on non ergodic data. Special attention is paid to conditionally identically distributed (c.i.d.) sequences of random variables (see Section 3). This type of dependence, introduced in [4] and [16], includes exchangeability as a particular case and plays a role in Bayesian inference.
For convergence in distribution of empirical processes, the usual (and natural) distances are the uniform and the Skorohod ones. While such distances have various merits, they are often too strong to deal with non ergodic data. Thus, in case of c.i.d. data, empirical processes are also investigated under a weaker distance; see Meyer-Zheng's paper [19] and Subsection 5.2.
The paper is organized as follows. Sections 2 and 4 include preliminary material (with the only exception of Example 8). The results are in Sections 3 and 5. Section 3 includes a characterization of c.i.d. sequences and a couple of examples. Section 5 contains some uniform and non uniform CLTs.
2000 Mathematics Subject Classification. 60B10, 60F05, 60G09, 60G57.
Key words and phrases. Conditional identity in distribution, Empirical process, Exchangeability, Predictive measure, Stable convergence.
Suppose (X_n) is adapted to a filtration (G_n). Define the predictive measure a_n(·) = P(X_{n+1} ∈ · | G_n), the empirical measure µ_n = (1/n) ∑_{i=1}^{n} δ_{X_i}, and the empirical processes
B_n = √n (µ_n − b_n) and C_n = √n (µ_n − a_n), where b_n = (1/n) ∑_{i=0}^{n−1} a_i.
Our main results provide conditions for B_n and C_n to converge in distribution, under uniform distance as well as in Meyer-Zheng's sense; see Theorems 10-14.
2. Notation and basic definitions
Throughout, (Ω, A, P) is a probability space, X a Polish space and B the Borel σ-field on X. The "data" are meant as a sequence (X_n : n ≥ 1) of X-valued random variables on (Ω, A, P). The sequence (X_n) is adapted to the filtration G = (G_n : n ≥ 0). Apart from the final Section 5, (X_n) is assumed to be identically distributed.
Let S be a metric space. A random probability measure on S is a map γ on Ω such that: (i) γ(ω) is a Borel probability measure on S for each ω ∈ Ω; (ii) ω 7→ γ(ω)(B) is A-measurable for each Borel set B ⊂ S. If S = R, we call F a random distribution function if F (t, ω) = γ(ω)(−∞, t], (t, ω) ∈ R × Ω, for some random probability measure γ on R.
A map Y : Ω → S is measurable, or a random variable, if Y −1 (B) ∈ A for all Borel sets B ⊂ S, and it is tight provided it is measurable and has a tight probability distribution. The outer expectation of a bounded function V : Ω → R is E ∗ V = inf EU , where inf ranges over those bounded measurable U : Ω → R satisfying U ≥ V .
Let (Ω_n, A_n, P_n) be a sequence of probability spaces and Y_n : Ω_n → S. The maps Y_n are not required to be measurable. Denote by C_b(S) the set of real bounded continuous functions on S. Given a Borel probability ν on S, say that Y_n converges in distribution to ν if
E*f(Y_n) −→ ∫ f dν for all f ∈ C_b(S).
In such case, we write Y_n −→^d Y for any S-valued random variable Y, defined on some probability space, with distribution ν. We refer to [11] and [21] for more on convergence in distribution of non measurable random elements.
Finally, we turn to stable convergence. Fix a random probability measure γ on S and suppose (Ω n , A n , P n ) = (Ω, A, P ) for all n. Say that Y n converges stably to γ whenever
E*(f(Y_n) | H) −→ E(γ(f) | H) for all f ∈ C_b(S) and H ∈ A with P(H) > 0.
Here, γ(f) denotes the real random variable ω ↦ ∫ f(x) γ(ω)(dx). Stable convergence clearly implies convergence in distribution. Indeed, Y_n converges in distribution to the probability measure E(γ(·) | H), under P(· | H), for each H ∈ A with P(H) > 0. Stable convergence was introduced by Rényi and subsequently investigated by various authors (in case the Y_n are measurable). We refer to [8], [15] and references therein for details.
3. Conditionally identically distributed random variables
3.1. Basics. Let (X_n : n ≥ 1) be adapted to the filtration G = (G_n : n ≥ 0). Then, (X_n) is conditionally identically distributed with respect to G, or G-c.i.d., if
(1) P(X_k ∈ · | G_n) = P(X_{n+1} ∈ · | G_n) a.s. for all k > n ≥ 0.
Roughly speaking, (1) means that, at each time n ≥ 0, the future observations (X k : k > n) are identically distributed given the past G n . Condition (1) is equivalent to
X_{T+1} ∼ X_1 for each finite G-stopping time T.
(For any random variables U and V , we write U ∼ V to mean that U and V are identically distributed). When G 0 = {∅, Ω} and G n = σ(X 1 , . . . , X n ), the filtration is not mentioned at all and (X n ) is just called c.i.d.. Clearly, if (X n ) is G-c.i.d.
then it is c.i.d. and identically distributed. Moreover, (X_n) is c.i.d. if and only if
(2) (X_1, . . . , X_n, X_{n+2}) ∼ (X_1, . . . , X_n, X_{n+1}) for all n ≥ 0.
Exchangeable sequences are c.i.d., for they meet (2), while the converse is not true. In fact, by a result in [16], (X_n) is exchangeable if and only if it is stationary and c.i.d.. Forthcoming Examples 3, 4 and 8 exhibit non exchangeable c.i.d. sequences. We refer to [4] for more on c.i.d. sequences. Here, it suffices to mention the Strong Law of Large Numbers (SLLN) and some of its consequences.
Let (X_n) be G-c.i.d. and µ_n = (1/n) ∑_{i=1}^{n} δ_{X_i} the empirical measure. Then, there is a random probability measure γ on X satisfying
µ_n(B) −→ γ(B) a.s. for every fixed B ∈ B.
As a consequence, given n ≥ 0 and B ∈ B, one obtains
E(γ(B) | G_n) = lim_k E(µ_k(B) | G_n) = lim_k (1/k) ∑_{i=n+1}^{k} P(X_i ∈ B | G_n) = P(X_{n+1} ∈ B | G_n) a.s..
Suppose next X = R. Up to enlarging the underlying probability space (Ω, A, P ), there is an i.i.d. sequence (U n ) with U 1 uniformly distributed on (0, 1) and (U n ) independent of (X n ). Define the random distribution function F (t) = γ(−∞, t], t ∈ R, and
Z n = inf{t ∈ R : F (t) ≥ U n }.
Then, (Z_n) is exchangeable and (1/n) ∑_{i=1}^{n} I_{Z_i ∈ B} −→ γ(B) a.s. for each B ∈ B. The exchangeable sequence (Z_n) plays a role in forthcoming Theorems 13 and 14.
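As an illustration of the quantile transform Z_n = inf{t : F(t) ≥ U_n}, here is a minimal sketch (ours, with an assumed discrete F, not the paper's construction). Conditionally on F, the draws are i.i.d. from F, which is why (Z_n) is exchangeable.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantile_transform(F_grid, t_grid, U):
    """Return inf{t in t_grid : F(t) >= U} for each U, where F is a
    distribution function evaluated on the sorted grid t_grid."""
    idx = np.searchsorted(F_grid, U, side="left")
    return t_grid[idx]

# A (random) distribution function F could be, e.g., the limit of the
# empirical measures mu_n.  Here we just use an assumed discrete example:
t_grid = np.array([-1.0, 0.0, 2.0, 5.0])
F_grid = np.array([0.2, 0.5, 0.9, 1.0])    # F(t) at the grid points

U = rng.random(10)                          # i.i.d. uniforms, independent of the data
Z = quantile_transform(F_grid, t_grid, U)   # conditionally i.i.d. draws from F
print(Z)
```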
3.2. Characterizations. Following [10], let us call strategy any collection
σ = {σ(q) : q = ∅ or q ∈ X^n for some n = 1, 2, . . .}
where each σ(q) is a probability on B and (x_1, . . . , x_n) ↦ σ(x_1, . . . , x_n)(B) is Borel measurable for all n ≥ 1 and B ∈ B. Here, ∅ denotes "the empty sequence". Let π_n be the n-th coordinate projection on X^∞, i.e.,
π n (x 1 , . . . , x n , . . .) = x n for all n ≥ 1 and (x 1 , . . . , x n , . . .) ∈ X ∞ .
By the Ionescu-Tulcea theorem, each strategy σ induces a unique probability ν on (X^∞, B^∞). By "σ induces ν" we mean that, under ν,
(3) π_1 ∼ σ(∅), and {σ(q) : q ∈ X^n} is a version of the conditional distribution of π_{n+1} given (π_1, . . . , π_n) for all n ≥ 1.
Conversely, since X is Polish, each probability ν on (X^∞, B^∞) is induced by an (essentially unique) strategy σ.
Let α_0 and {α(x) : x ∈ X} be probabilities on B such that the map x ↦ α(x)(B) is Borel measurable for B ∈ B. Say that {α(x) : x ∈ X} is a (Markov) kernel with stationary distribution α_0 in case α_0(B) = ∫ α(x)(B) α_0(dx) for B ∈ B.
If q = (x_1, . . . , x_n) ∈ X^n and x ∈ X, we write (q, x) = (x_1, . . . , x_n, x) and (∅, x) = x. In this notation, the following result is available.
Theorem 1. Let ν be the probability distribution of the sequence (X_n). Then, (X_n) is c.i.d. if and only if ν is induced by a strategy σ satisfying
(a) the kernel {σ(q, x) : x ∈ X} has stationary distribution σ(q) for q = ∅ and for almost all q ∈ X^n, n = 1, 2, . . ..
Proof. Fix a strategy σ which induces ν. By (2) and (3), (X_n) is c.i.d. if and only if X_2 ∼ X_1 and, under ν,
(4) {σ(q) : q ∈ X^n} is a version of the conditional distribution of π_{n+2} given (π_1, . . . , π_n) for all n ≥ 1.
In view of (3), the condition X_2 ∼ X_1 amounts to
∫ σ(x)(B) σ(∅)(dx) = P(X_2 ∈ B) = P(X_1 ∈ B) = σ(∅)(B), B ∈ B,
which just means that the kernel {σ(x) : x ∈ X} has stationary distribution σ(∅).
Likewise, condition (4) is equivalent to: for all n ≥ 1, there is H_n ∈ B^n such that P((X_1, . . . , X_n) ∈ H_n) = 1 and
∫ σ(q, x)(B) σ(q)(dx) = σ(q)(B) for all q ∈ H_n and B ∈ B.
Therefore, (X_n) is c.i.d. if and only if σ can be taken to meet condition (a). □
Practically, Theorem 1 suggests how to assess a c.i.d. sequence (X_n) stepwise.
First, select a law σ(∅) on B, the marginal distribution of X 1 . Then, choose a kernel {σ(x) : x ∈ X } with stationary distribution σ(∅), where σ(x) should be viewed as the conditional distribution of X 2 given X 1 = x. Next, for each x ∈ X , select a kernel {σ(x, y) : y ∈ X } with stationary distribution σ(x), where σ(x, y) should be viewed as the conditional distribution of X 3 given X 1 = x and X 2 = y. And so on.
In other words, to get a c.i.d. sequence, it is enough to assign at each step a kernel with a given stationary distribution. Indeed, plenty of methods for doing this are available: see the statistical literature on MCMC, e.g. [18] and [20].
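For instance, a Metropolis-Hastings step yields a kernel with any prescribed stationary distribution. The sketch below (a generic MCMC illustration under our own assumptions, not the paper's construction) builds such a kernel on a finite state space and checks invariance.

```python
import numpy as np

def metropolis_kernel(target, proposal):
    """Metropolis-Hastings transition matrix on a finite state space.
    `target` is the desired stationary probability vector and `proposal`
    a symmetric proposal matrix; the output kernel K satisfies
    target @ K == target (in fact, detailed balance holds)."""
    k = len(target)
    K = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i != j:
                accept = min(1.0, target[j] / target[i])
                K[i, j] = proposal[i, j] * accept
        K[i, i] = 1.0 - K[i].sum()          # rejection mass stays at i
    return K

target = np.array([0.5, 0.3, 0.2])          # hypothetical sigma(q)
proposal = np.full((3, 3), 1.0 / 3.0)       # symmetric proposal
K = metropolis_kernel(target, proposal)     # candidate {sigma(q, x) : x}
print(np.allclose(target @ K, target))      # True: target is stationary
```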
Finally, we recall that exchangeable sequences admit an analogous characterization. Say that {α(x) : x ∈ X} is a reversible kernel with respect to α_0 in case
∫_A α(x)(B) α_0(dx) = ∫_B α(x)(A) α_0(dx) for all A, B ∈ B.
If a kernel is reversible with respect to a probability law, it admits that law as a stationary distribution. The following result, first proved by de Finetti for X = {0, 1}, is in [14].
Theorem 2. The sequence (X_n) is exchangeable if and only if its probability distribution is induced by a strategy σ such that
(b) {σ(q, x) : x ∈ X} is a reversible kernel with respect to σ(q),
(c) σ(q̃) = σ(q) whenever q̃ is a permutation of q,
for q = ∅ and for almost all q ∈ X^n, n = 1, 2, . . . (with q̃ = q if q = ∅).
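In the finite-state case, α_0 becomes a probability vector, the kernel a row-stochastic matrix K, and reversibility reduces to detailed balance: α_0(i) K(i, j) = α_0(j) K(j, i). The following sketch (our own numerical illustration, with an assumed birth-death kernel) checks detailed balance and confirms that it implies stationarity, as remarked before Theorem 2.

```python
import numpy as np

def is_reversible(K, alpha0):
    """Check detailed balance: alpha0[i] * K[i, j] == alpha0[j] * K[j, i]
    for all i, j -- the finite-state analogue of the displayed condition."""
    flux = alpha0[:, None] * K
    return np.allclose(flux, flux.T)

# An assumed reversible example: a birth-death chain on {0, 1, 2}.
K = np.array([[0.7, 0.3, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.6, 0.4]])
# Its reversible law solves alpha0[i] K[i, i+1] = alpha0[i+1] K[i+1, i].
alpha0 = np.array([1.0, 1.5, 0.75])
alpha0 /= alpha0.sum()

print(is_reversible(K, alpha0))             # True: detailed balance holds
print(np.allclose(alpha0 @ K, alpha0))      # True: hence alpha0 is stationary
```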
3.3. Examples. It is not hard to see that condition (b) reduces to (a) whenever X = {0, 1}. Thus, for a sequence (X_n) of indicators, (X_n) is exchangeable if and only if it is c.i.d. and its conditional distributions σ(q) are invariant under permutations of q. It is tempting to conjecture that (b) can be weakened into (a) in general, even if the X_n are not indicators. As shown by the next example, however, this is not true. It may be that (X_n) fails to be exchangeable, and yet it is c.i.d. and its conditional distributions meet condition (c).
Example 3. Let X = Y × (0, ∞), where Y is a Polish space. Fix a constant r > 0 and Borel probabilities µ_1 on Y and µ_2 on (0, ∞). Define σ(∅) = µ_1 × µ_2 and
σ(x_1, . . . , x_n)(A × B) = σ[(y_1, z_1), . . . , (y_n, z_n)](A × B) = [ r µ_1(A) + ∑_{i=1}^{n} z_i I_A(y_i) ] / [ r + ∑_{i=1}^{n} z_i ] · µ_2(B),
where n ≥ 1, x_i = (y_i, z_i) ∈ Y × (0, ∞) for all i, and A ⊂ Y, B ⊂ (0, ∞) are Borel sets. By construction, σ satisfies condition (c). By Lemma 6 of [6], (π_n) is c.i.d. under ν, where ν is the probability on (X^∞, B^∞) induced by σ. However, (π_1, π_2) is not distributed as (π_2, π_1) for various choices of µ_1, µ_2 (take for instance Y = {0, 1}, µ_1 = (δ_0 + δ_1)/2 and µ_2 = (δ_1 + δ_2)/2). Hence, (π_n) may fail to be exchangeable under ν.
The strategy σ of Example 3 makes sense in some real problems. In general, the z n should be viewed as weights while the y n describe the phenomenon of interest.
As an example, consider an urn containing white and black balls. At each time n ≥ 1, a ball is drawn and then replaced together with z_n more balls of the same color. Let y_n be the indicator of the event {white ball at time n} and suppose z_n is chosen according to a fixed distribution on the positive integers, independently of (y_1, z_1, . . . , y_{n−1}, z_{n−1}, y_n). This situation is modelled by σ in Example 3; a simulation sketch is given below. Note also that σ is of Ferguson-Dirichlet type if z_n = 1 for all n; see [13].
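Here is the simulation sketch of this randomly reinforced urn (our own illustration; the initial weight r, the initial composition, and the law of z_n are hypothetical choices). The predictive probability of drawing white after n steps matches the strategy σ of Example 3 with A = {white}.

```python
import numpy as np

rng = np.random.default_rng(2)

# Randomly reinforced urn matching Example 3: y_n indicates a white draw,
# z_n is the (random) number of balls added, drawn independently of the past.
r = 2.0                       # initial "weight" of the urn (hypothetical)
p_white = 0.5                 # mu_1{white}: initial fraction of white balls
N = 1000

weight_white, weight_total = r * p_white, r
ys, zs = [], []
for n in range(N):
    y = rng.random() < weight_white / weight_total   # draw a ball
    z = rng.integers(1, 4)    # z_n uniform on {1, 2, 3}: a choice of mu_2
    weight_white += z * y     # replace it together with z more of the same color
    weight_total += z
    ys.append(int(y)); zs.append(int(z))

# The predictive probability of white after n draws is
# (r*p_white + sum z_i y_i) / (r + sum z_i), as in the strategy sigma.
print(weight_white / weight_total, np.mean(ys))
```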
Finally, suppose (X_n) is 2-exchangeable, that is, (X_i, X_j) ∼ (X_1, X_2) for all i ≠ j.
Suggested by de Finetti’s representation theorem, another conjecture is that the probability distribution of (X n ) is a mixture of 2-independent identically distributed laws. More precisely, this means that
(5) P((X_1, X_2, . . .) ∈ B) = ∫ ν(B) Q(dν), B ∈ B^∞,
where Q is some mixing measure supported by those probability laws ν on (X ∞ , B ∞ ) such that (π n ) is 2-independent and identically distributed under ν. Once again, the conjecture turns out to be false. As shown by the following example, it may be that (X n ) is c.i.d. and 2-exchangeable and yet its probability distribution does not admit representation (5).
Example 4. Let m be Lebesgue measure and f : [0, 1] → [0, 1] a Borel function satisfying
(6) ∫_0^1 f(u) du = 1/2, ∫_0^1 u f(u) du = 1/3, m{u ∈ [0, 1] : f(u) ≠ u} > 0.
Let (U_n : n ≥ 0) be i.i.d. with U_0 uniformly distributed on [0, 1]. Define X = {0, 1} and X_n = I_{H_n}, where
H_1 = {U_1 ≤ f(U_0)}, H_n = {U_n ≤ U_0} for n > 1.
Conditionally on U_0, the sequence (X_n) is independent with
P(X_1 = 1 | U_0) = f(U_0) and P(X_n = 1 | U_0) = U_0 a.s. for all n > 1.
Based on this fact and (6), it is straightforward to check that (X_n) is c.i.d. and 2-exchangeable. Moreover, (1/n) ∑_{i=1}^{n} X_i −→ U_0 a.s. By Etemadi's SLLN, if (π_n) is 2-independent and identically distributed under ν, then
(1/n) ∑_{i=1}^{n} π_i −→ E_ν(π_1) ν-a.s.
Letting π* = lim sup_n (1/n) ∑_{i=1}^{n} π_i, it follows that
ν(π* ∈ I, π_1 = 1) = ν(π* ∈ I, π_2 = 1) for all Borel sets I ⊂ [0, 1].
Hence, if representation (5) holds, one obtains
∫_I f(u) du = ∫_{U_0 ∈ I} P(X_1 = 1 | U_0) dP = P(U_0 ∈ I, X_1 = 1)
= ∫ ν(π* ∈ I, π_1 = 1) Q(dν) = ∫ ν(π* ∈ I, π_2 = 1) Q(dν)
= P(U_0 ∈ I, X_2 = 1) = ∫_I u du for all Borel sets I ⊂ [0, 1].
This implies f (u) = u, for m-almost all u, contrary to (6). Thus, the probability distribution of (X n ) cannot be written as in (5).
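To see that condition (6) is non-vacuous, one concrete choice (our own construction, not given in the paper) is f(u) = u + u(1 − u)(u² − u + 1/5): the correction term u(1 − u)(u² − u + 1/5) integrates to zero against both 1 and u on [0, 1], and it is small enough that f maps [0, 1] into [0, 1]. The sketch below verifies (6) numerically.

```python
import numpy as np
from scipy.integrate import quad

# A concrete f satisfying (6): perturb the identity by a polynomial that is
# orthogonal to both 1 and u on [0, 1] and keeps values inside [0, 1].
def f(u):
    return u + u * (1 - u) * (u**2 - u + 0.2)

i0, _ = quad(f, 0.0, 1.0)                        # should be 1/2
i1, _ = quad(lambda u: u * f(u), 0.0, 1.0)       # should be 1/3

u = np.linspace(0.0, 1.0, 10_001)
print(np.isclose(i0, 0.5), np.isclose(i1, 1/3))  # True True
print((f(u) >= 0).all() and (f(u) <= 1).all())   # True: f maps [0,1] into [0,1]
print(np.max(np.abs(f(u) - u)) > 0)              # True: f differs from the identity
```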
4. Empirical processes
This section includes preliminary material. Apart from Example 8, which is new, all other results are from [4]. First, the empirical processes B n and C n are introduced for an arbitrary G-adapted sequence (X n ). Then, in case (X n ) is G-c.i.d., some known facts on B n and C n are reviewed.
4.1. The general case. Fix a subclass F ⊂ B. Also, for any set T, let l^∞(T) denote the space of real bounded functions on T, equipped with the sup-norm
‖φ‖ = sup_{t∈T} |φ(t)|, φ ∈ l^∞(T).
In the particular case where (X_n) is i.i.d., the empirical process is G_n = √n (µ_n − µ), where µ_n = (1/n) ∑_{i=1}^{n} δ_{X_i} is the empirical measure and µ = P ∘ X_1^{−1} is the probability distribution common to the X_n. From the point of view of convergence in distribution, G_n is regarded as a (non measurable) map G_n : Ω → l^∞(F).
If (X_n) is not i.i.d., G_n need not be the "right" empirical process to deal with. A first reason is that µ only partially characterizes the probability distribution of the sequence (X_n) (and usually not in the most significant way). So, in the dependent case, G_n is often of little interest for applications. A second reason is the following. If G_n converges in distribution, as a map G_n : Ω → l^∞(F), then
‖µ_n − µ‖ = (1/√n) ‖G_n‖ −→ 0 in probability.
But kµ n − µk typically fails to converge to 0 when (X n ) is non ergodic. In this case, G n is definitively ruled out as far as convergence in distribution is concerned.
Hence, when (X n ) is non ergodic, empirical processes should be defined in some different way. One option is
G̃_n = r_n (µ_n − γ_n),
where the r_n are constants such that r_n → ∞ and the γ_n are random probability measures on X satisfying ‖µ_n − γ_n‖ −→ 0 in probability.
As an example, if there is a random probability measure γ on X satisfying µ n (B) −→ γ(B) a.s. for each fixed B ∈ B,
it is tempting to let γ_n = γ for all n. Such a γ is actually available when (X_n) is c.i.d.. Further, if (X_n) is exchangeable, (X_n) is conditionally i.i.d. given γ. In the latter case, it is rather natural to take r_n = √n. The corresponding empirical process W_n = √n (µ_n − γ) is examined in [4] and [5].
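As a sanity check on W_n (our own simulation sketch, with hypothetical parameters), one can generate exchangeable Bernoulli data via de Finetti's representation: draw Θ ∼ Beta(a, b), then X_i i.i.d. Bernoulli(Θ) given Θ. Here γ({1}) = Θ exactly, so W_n({1}) = √n (µ_n({1}) − Θ) is directly computable, and its law should resemble a scale mixture of centered normals with variance Θ(1 − Θ).

```python
import numpy as np

rng = np.random.default_rng(3)

# Exchangeable data via de Finetti: Theta ~ Beta(a, b), then X_i i.i.d.
# Bernoulli(Theta) given Theta.  Here gamma({1}) = Theta is known exactly.
a, b, n, reps = 2.0, 3.0, 5_000, 2_000

Theta = rng.beta(a, b, size=reps)
S = rng.binomial(n, Theta)                 # S = n * mu_n({1}) given Theta
W = np.sqrt(n) * (S / n - Theta)           # W_n evaluated at the set B = {1}

# W should look like a scale mixture of normals N(0, Theta(1 - Theta)):
print(W.mean(), W.var(), (Theta * (1 - Theta)).mean())
```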
For another example, define the predictive measure a_n(·) = P(X_{n+1} ∈ · | G_n).
In Bayesian inference and discrete time filtering, evaluating a n is a major goal.
When a_n cannot be calculated in closed form, one option is to estimate it from the data, and a possible estimate is the empirical measure µ_n. For instance, µ_n is a sound estimate of a_n if (X_n) is exchangeable and G_n = σ(X_1, . . . , X_n). In such cases, it is important to evaluate the limiting distribution of the error, that is, to investigate convergence in distribution of G̃_n = r_n (µ_n − a_n) for suitable constants r_n → ∞. Among other things, µ_n is a "consistent estimate" of a_n if G̃_n converges in distribution; in this case, in fact, ‖µ_n − a_n‖ = (1/r_n) ‖G̃_n‖ −→ 0 in probability.