
ASYMPTOTICS OF CERTAIN CONDITIONALLY IDENTICALLY DISTRIBUTED SEQUENCES

PATRIZIA BERTI, EMANUELA DREASSI, LUCA PRATELLI, AND PIETRO RIGO

Abstract. Let S be a Borel subset of a Polish space and (X_n : n ≥ 1) a sequence of S-valued random variables. Fix a Borel probability measure σ_0 on S, a constant q_0 ∈ [0, 1] and a measurable function q_i : S^i → [0, 1] for each i ≥ 1. Suppose X_1 ∼ σ_0 and

P(X_{n+1} ∈ · | X_1, …, X_n) = σ_0 ∏_{i=1}^n Q_i + δ_{X_n} (1 − Q_n) + ∑_{i=1}^{n−1} δ_{X_i} (1 − Q_i) ∏_{j=i+1}^n Q_j

where Q_i = q_{i−1}(X_1, …, X_{i−1}). Sequences of this type, introduced in [10], are conditionally identically distributed and play a role in Bayesian predictive inference. This paper deals with the asymptotics of (X_n). As expected, (X_n) exhibits different behaviors depending on the Q_i. For instance, (X_n) converges a.s. if α ≤ Q_i ≤ β a.s. for all i, where 0 < α ≤ β < 1 are constants, while (X_n) does not converge even in probability if σ_0 is nondegenerate, Q_i > 0 for all i and ∑_i (1 − Q_i) < ∞ a.s. A stable CLT for (X_n) is proved as well.

1. Introduction

Throughout, S is a Borel subset of a Polish space, B the Borel σ-field on S, and P the collection of all probability measures on B. Moreover, X_n is the n-th coordinate projection on S^∞, i.e.

X_n(s_1, …, s_n, …) = s_n

for each n ≥ 1 and each (s_1, …, s_n, …) ∈ S^∞.

Following Dubins and Savage [14], a strategy is a sequence σ = (σ_0, σ_1, …) such that

• σ_0 ∈ P and σ_n = {σ_n(x) : x ∈ S^n} is a collection of elements of P;

• the map x ↦ σ_n(x)(B) is B^n-measurable for fixed n ≥ 1 and B ∈ B.

Here, σ_0 should be regarded as the marginal distribution of X_1 and σ_n(x) as the conditional distribution of X_{n+1} given that (X_1, …, X_n) = x. The probabilities σ_0 and σ_n(x) are also called the predictive distributions of the sequence (X_n).

According to the Ionescu-Tulcea theorem, for any strategy σ, there is a unique probability measure P on (S^∞, B^∞) satisfying

P(X_1 ∈ ·) = σ_0 and P(X_{n+1} ∈ · | (X_1, …, X_n) = x) = σ_n(x)

for all n ≥ 1 and P-almost all x ∈ S^n. Such a P is denoted P^σ in the sequel.

2010 Mathematics Subject Classification. 62F15, 62M20, 60G25, 60G09.

Key words and phrases. Bayesian nonparametrics, Conditional identity in distribution, Exchangeability, Limit theorem, Predictive distribution, Predictive inference.


The Ionescu-Tulcea theorem plays a role in Bayesian predictive inference. In fact, in a Bayesian framework, to make predictions on the sequence (X_n), the inferrer needs to select a strategy σ. At each time n ≥ 1, having observed (X_1, …, X_n) = x, the next observation X_{n+1} is predicted through the predictive distribution σ_n(x).

This procedure makes sense, for any strategy σ, because of the Ionescu-Tulcea theorem.

1.1. Standard and non-standard approach for exchangeable data. Usually, (X_n) is requested to be exchangeable. Under this assumption, the standard approach to obtain σ is quite involved. Indeed, to get σ, the inferrer should:

(i) Select a prior π, namely, a probability measure on P;

(ii) Calculate the posterior of π given that (X_1, …, X_n) = x, say π_n(x);

(iii) Evaluate σ as

σ_0(B) = ∫_P p(B) π(dp) and σ_n(x)(B) = ∫_P p(B) π_n(x)(dp) for all B ∈ B.

Steps (i)-(ii) are troublesome. To assess a prior π is clearly hard. But even when π is selected, evaluating the posterior π_n may not be straightforward. Frequently, π_n cannot be written in closed form but only approximated numerically.

A non-standard approach (henceforth, NSA) is to assign σ_n directly, without passing through π and π_n. Simply put, instead of choosing π and then evaluating π_n and σ_n, the inferrer just selects his/her predictive distribution σ_n. As noted above, this procedure makes sense because of the Ionescu-Tulcea theorem. See [3], [6], [10], [12], [13], [14], [15], [18], [19]; see also [16], [21], [22], [23] and references therein.

NSA is in line with de Finetti, Dubins and Savage, among others. Recently, NSA has been adopted in [18] to obtain fast online Bayesian prediction via copulas. In addition, NSA is quite implicit in most of the machine learning literature. From our point of view, NSA has essentially two merits. Firstly, it requires placing probabilities on observable facts only. The value of the next observation X_{n+1} is actually observable, while π and π_n (being probabilities on P) do not deal with observable facts. Secondly, NSA is much more direct than the standard approach.

In fact, if the main goal is to predict future observations, why select the prior π explicitly? Rather than wondering about π, it seems reasonable to reflect on how the next observation X_{n+1} is affected by (X_1, …, X_n).

However, if (X_n) is requested to be exchangeable, NSA has a gap. Given an arbitrary strategy σ, the Ionescu-Tulcea theorem does not grant exchangeability of (X_n) under P^σ. Therefore, for NSA to apply, one should first characterize those strategies σ which make (X_n) exchangeable under P^σ. A nice characterization is [15, Theorem 3.1]. However, the conditions on σ making (X_n) exchangeable are quite hard to check in real problems. This is the main reason why NSA has not been developed so far.

1.2. Predictive inference with conditionally identically distributed data. To bypass the gap mentioned in the above paragraph, the exchangeability assumption could be weakened. One option is to request (X_n) to be conditionally identically distributed (c.i.d.), namely

(1) P(X_k ∈ · | F_n) = P(X_{n+1} ∈ · | F_n) a.s. for all k > n ≥ 0

where F_n = σ(X_1, …, X_n) and F_0 is the trivial σ-field.

Roughly speaking, condition (1) means that, at each time n ≥ 0, the future observations (X_k : k > n) are identically distributed given the past F_n. Condition (1) is actually weaker than exchangeability. Indeed, (X_n) is exchangeable if and only if it is stationary and c.i.d.

We refer to Subsection 2.1 for the essentials of c.i.d. sequences. Here, we just mention three reasons for taking c.i.d. data into account.

(j) It is not hard to characterize the strategies σ which make (X_n) c.i.d. under P^σ; see Theorem 1. Therefore, unlike the exchangeable case, NSA can be easily implemented.

(jj) C.i.d. sequences behave asymptotically much in the same way as exchangeable ones; see Subsection 2.1.

(jjj) A number of meaningful strategies cannot be used if (X_n) is requested to be exchangeable, but are available if (X_n) is only asked to be c.i.d. A trivial example is the strategy (3) reported below. Various other examples are in [1], [2] and [10].

Motivated by (j)-(jjj), a few strategies σ which make (X_n) c.i.d. are introduced in [10]. One such strategy is the following.

Fix σ_0 ∈ P, a constant q_0 ∈ [0, 1] and measurable functions q_n : S^n → [0, 1]. For all n ≥ 1 and x = (x_1, …, x_n) ∈ S^n, define

(2) σ_n(x) = σ_0 ∏_{i=0}^{n−1} q_i + δ_{x_n} (1 − q_{n−1}) + ∑_{i=1}^{n−1} δ_{x_i} (1 − q_{i−1}) ∏_{j=i}^{n−1} q_j

where δ_{x_i} is the unit mass at x_i and q_i is a shorthand notation for q_i = q_i(x_1, …, x_i).

Then, (X_n) is c.i.d. under P^σ. Further, σ satisfies the recursive equation

σ_{n+1}(x, y) = q_n(x) σ_n(x) + (1 − q_n(x)) δ_y

for all n ≥ 0, x ∈ S^n and y ∈ S. Thus, when a new observation y becomes available, σ_{n+1}(x, y) can be obtained by a simple recursive update of σ_n(x).
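To make the update concrete, here is a minimal sketch (in Python; the names PredictiveState and update are ours, not the paper's) that stores σ_n(x) as a weight on σ_0 plus weighted point masses at the past observations, and applies the recursion above when a new observation arrives:

    # Minimal sketch of the recursive update: the predictive is kept as a
    # weight w0 on sigma_0 plus weighted point masses delta_{x_i}.
    class PredictiveState:
        def __init__(self):
            self.w0 = 1.0      # current weight of sigma_0 (n = 0)
            self.atoms = []    # pairs (x_i, weight of delta_{x_i})

        def update(self, y, q):
            # sigma_{n+1}(x, y) = q * sigma_n(x) + (1 - q) * delta_y
            self.w0 *= q
            self.atoms = [(x, w * q) for (x, w) in self.atoms]
            self.atoms.append((y, 1.0 - q))

After n updates with a constant q, w0 equals q^n and the weight of the i-th atom is (1 − q) q^{n−i}, in agreement with formula (3) below.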

The strategy (2) is connected to Beta-GOS processes, as meant in [1], and is analogous to formula (10) of [18]. Note also that, if σ_0 vanishes on singletons, the q_i have the following interpretation. Let x = (x_1, …, x_n). Since σ_0({x_1, …, x_n}) = 0 and δ_{x_i}({x_1, …, x_n}) = 1 for i ≤ n, it follows that

P^σ(X_{n+1} = X_i for some i ≤ n | (X_1, …, X_n) = x) = σ_n(x)({x_1, …, x_n})

= (1 − q_{n−1}) + ∑_{i=1}^{n−1} (1 − q_{i−1}) ∏_{j=i}^{n−1} q_j = 1 − ∏_{i=0}^{n−1} q_i.

More importantly, choosing the q_i suitably, various real situations can be modeled by σ. As an example, if q ∈ (0, 1) is a constant and q_i = q for each i ≥ 0, one obtains

(3) σ_n(x) = q^n σ_0 + (1 − q) ∑_{i=1}^n q^{n−i} δ_{x_i};

see also [1] and [2]. Roughly speaking, this choice of σ makes sense when the inferrer has only vague opinions on the dependence structure of the data, and yet he/she feels that the weight of the i-th observation x_i should be a decreasing function of n − i. In this case, σ_n(x) is not invariant under permutations of x, so that (X_n) fails to be exchangeable under P^σ.

As another example, take a constant c > 0 and define q_i = (i + c)/(i + 1 + c). Then, formula (2) yields the predictive distributions of a Dirichlet sequence, i.e.

σ_n(x) = (c σ_0 + ∑_{i=1}^n δ_{x_i}) / (n + c).

In the above two examples, q_i does not depend on (x_1, …, x_i). Clearly, much more elaborate strategies can be obtained if q_i actually depends on (x_1, …, x_i). We refer to [10] for examples of this type, including generalized Polya urns and species sampling sequences.
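As a sanity check, the following sketch (reusing the PredictiveState sketch above; the data values are hypothetical) verifies numerically that q_i = (i + c)/(i + 1 + c) turns the recursion into the Dirichlet predictive: after n observations, σ_0 carries weight c/(n + c) and each observed point carries weight 1/(n + c).

    # Sketch: with q_i = (i + c)/(i + 1 + c), the recursive update yields
    # the Dirichlet predictive (c * sigma_0 + sum_i delta_{x_i}) / (n + c).
    c = 2.0
    state = PredictiveState()
    data = [0.3, 0.7, 0.3, 0.9]              # hypothetical observations
    for i, y in enumerate(data):             # the (i+1)-th update uses q_i
        state.update(y, (i + c) / (i + 1 + c))
    n = len(data)
    assert abs(state.w0 - c / (n + c)) < 1e-12
    assert all(abs(w - 1.0 / (n + c)) < 1e-12 for _, w in state.atoms)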

1.3. Main results. If a strategy σ is used to make predictions, a meaningful piece of information is the asymptotic behavior of the data sequence (X_n) under P^σ. This paper investigates the asymptotics of (X_n) under P^σ when σ is given by (2). Our main results, formally stated in Section 3, are a strong limit theorem and a stable CLT. Here, we briefly sketch these results.

Consider the probability space (S^∞, B^∞, P^σ), where σ is given by (2), and define Q_n = q_{n−1}(X_1, …, X_{n−1}).

The strong limit theorem states:

• X_n converges a.s. whenever α ≤ Q_n ≤ β a.s. for all n, where 0 < α ≤ β < 1 are constants;

• X_n does not converge even in probability whenever σ_0 is nondegenerate, Q_n > 0 for all n and ∑_n (1 − Q_n) < ∞ a.s.

Thus, it may be that X_n is non-trivial and yet converges a.s. This is a big difference with respect to the exchangeable case. In fact, an exchangeable sequence Y_n converges in probability if and only if Y_n = Y_1 a.s. for each n.

Let us turn to the stable CLT. We first recall that stable convergence is a strong form of convergence in distribution; see Subsection 2.2. In particular, stable convergence implies convergence in distribution.

For definiteness, suppose S = [a, b] is a bounded interval of the real line. (Otherwise, as in Section 3, it suffices to replace X_n with f(X_n), where f : S → R is a bounded measurable function.) Since (X_n) and (X_n^2) are both c.i.d. under P^σ,

(1/n) ∑_{i=1}^n X_i → V a.s. and (1/n) ∑_{i=1}^n X_i^2 → V̄ a.s.

for some random variables V and V̄; see Subsection 2.1. Our CLT is:

• If ∑_n (1 − E(Q_n)) < ∞ and Q_n ≤ Q_{n+1} a.s. for all n, then

√n (X̄_n − V) → N(0, L) stably,

where X̄_n = (1/n) ∑_{i=1}^n X_i and L = V̄ − V^2 = lim_n (1/n) ∑_{i=1}^n (X_i − X̄_n)^2 a.s.

In a Bayesian framework, the limit V of the sample means can be seen as a random parameter, and the above CLT is useful to make inference on V. In particular, it allows one to build (approximate) credible intervals for V.
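For instance, estimating L by the empirical variance of the observed path, an approximate credible interval for V takes the form X̄_n ± z √(L̂_n/n). A heuristic sketch (the function name and the choice z = 1.96 for roughly 95% coverage are our assumptions, not the paper's):

    # Sketch: approximate credible interval for V from the stable CLT,
    # with L estimated by (1/n) * sum_i (x_i - xbar)^2.
    import numpy as np

    def credible_interval(x, z=1.96):
        x = np.asarray(x, dtype=float)
        n = len(x)
        xbar = x.mean()
        L_hat = x.var()                  # empirical variance of the path
        half = z * np.sqrt(L_hat / n)
        return xbar - half, xbar + half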

Finally, under the same assumptions as in the previous CLT, it is also shown that

√n { X̄_n − E(X_{n+1} | X_1, …, X_n) } → N(0, L) stably.

2. Preliminaries

From now on, (Ω, A, P) is a probability space, (Y_n : n ≥ 1) a sequence of S-valued random variables on (Ω, A, P), and

F_0 = {∅, Ω}, F_n = σ(Y_1, …, Y_n).

2.1. Conditionally identically distributed random variables. C.i.d. sequences were introduced in [4] and [20] and then investigated in various papers; see e.g. [1], [2], [6], [8], [9], [10], [11], [17]. Here, we just recall a few basic facts.

Let (G_n : n ≥ 0) be a filtration on (Ω, A, P). Then, (Y_n) is c.i.d. with respect to (G_n) if it is adapted to (G_n) and

P(Y_k ∈ · | G_n) = P(Y_{n+1} ∈ · | G_n) a.s. for all k > n ≥ 0.

If G_n = F_n, the filtration is not mentioned at all and (Y_n) is just called c.i.d. In this case, by a result in [20], (Y_n) is exchangeable if and only if it is stationary and c.i.d.

Asymptotically, a c.i.d. sequence (Y_n) looks like an exchangeable one. We support this claim with three facts.

First, (Y_n) is asymptotically exchangeable, in the sense that (Y_n, Y_{n+1}, …) → (Z_1, Z_2, …) in distribution, as n → ∞, where (Z_1, Z_2, …) is an exchangeable sequence.

Second, for each bounded measurable function f : S → R, one obtains

(1/n) ∑_{i=1}^n f(Y_i) → V a.s. and E(f(Y_{n+1}) | F_n) → V a.s.

for some real random variable V.

To state the third fact, let μ_n = (1/n) ∑_{i=1}^n δ_{Y_i} be the empirical measure. Then, there is a random probability measure μ on (S, B) satisfying

μ_n(B) → μ(B) a.s. as n → ∞ for every fixed B ∈ B.

As a consequence, for fixed n ≥ 0 and B ∈ B, one obtains

E(μ(B) | F_n) = lim_m E(μ_m(B) | F_n) = lim_m (1/m) ∑_{i=n+1}^m P(Y_i ∈ B | F_n) = P(Y_{n+1} ∈ B | F_n) a.s.

Thus, as in the exchangeable case, the predictive distribution P(Y_{n+1} ∈ · | F_n) can be written as E(μ(·) | F_n), where μ is the a.s. weak limit of the empirical measures μ_n.

Finally, to complete claim (j) of Subsection 1.2, we report a characterization of c.i.d. sequences in terms of strategies.

Theorem 1. ([8, Theorem 3.1]). Let σ be a strategy. Then, (X_n) is c.i.d. under P^σ if and only if

σ_0(B) = ∫ σ_1(y)(B) σ_0(dy) and σ_n(x)(B) = ∫ σ_{n+1}(x, y)(B) σ_n(x)(dy)

for all B ∈ B, all n ≥ 1 and P^σ-almost all x ∈ S^n.

2.2. Stable convergence. Stable convergence is a strong form of convergence in distribution. In a sense, it is intermediate between the latter and convergence in probability.

A kernel on S (or a random probability measure on S) is a map K : Ω → P such that ω ↦ K(ω)(B) is A-measurable for fixed B ∈ B. Say that Y_n converges stably to K, where K is a kernel on S, if

P(Y_n ∈ · | H) → E(K(·) | H) weakly, for all H ∈ A with P(H) > 0.

In particular, if Y_n → K stably, then Y_n converges in distribution to the probability measure E(K(·)) (just let H = Ω). Further, given any random variable Y : Ω → S, it is not hard to see that Y_n → Y in probability if and only if Y_n converges stably to the kernel K = δ_Y.

Let N(0, b) denote the one-dimensional Gaussian law with mean 0 and variance b ≥ 0 (where N(0, 0) = δ_0). Then, N(0, L) is a kernel on R provided L is a real nonnegative random variable on (Ω, A, P). The next corollary provides conditions for stable convergence toward a kernel of this type. It is a straightforward consequence of [7, Theorem 1].

Corollary 2. Fix a bounded measurable function f : S → R and define

M_n = (1/n) ∑_{i=1}^n f(Y_i) and Z_n = E(f(Y_{n+1}) | F_n).

Suppose (Y_n) is c.i.d. and denote by V the a.s. limit of M_n (or, equivalently, the a.s. limit of Z_n). Suppose also that

(a) (1/n) E(max_{1≤k≤n} k |Z_{k−1} − Z_k|) → 0,

(b) (1/n) ∑_{k=1}^n (f(Y_k) − Z_{k−1} + k (Z_{k−1} − Z_k))^2 → F in probability,

(c) √n E(sup_{k≥n} |Z_{k−1} − Z_k|) → 0,

(d) n ∑_{k≥n} (Z_{k−1} − Z_k)^2 → G in probability,

where F and G are real nonnegative random variables. Then,

√n (M_n − Z_n) → N(0, F) stably and √n (M_n − V) → N(0, F + G) stably.

Proof. Just note that the sequence (f(Y_n)^2) is uniformly integrable (for f is bounded) and

E(Z_{n+1} | F_n) = E(f(Y_{n+2}) | F_n) = E(f(Y_{n+1}) | F_n) = Z_n a.s.

since (Y_n) is c.i.d. Hence, it suffices to apply [7, Theorem 1]. □

3. Results

We begin by introducing a sequence (Y_n : n ≥ 1) of S-valued random variables whose predictive distributions agree with (2).

Fix σ_0 ∈ P, a constant q_0 ∈ [0, 1] and measurable functions q_n : S^n → [0, 1], n ≥ 1. Moreover, on some probability space (Ω, A, P), take random variables (T_n : n ≥ 1) and (U_{i,j} : j ∈ N, 1 ≤ i ≤ j) such that

• (T_n) is an i.i.d. sequence of S-valued random variables with T_1 ∼ σ_0;

• (U_{i,j}) is an i.i.d. array of [0, 1]-valued random variables with U_{1,1} uniformly distributed on [0, 1];

• (T_n) is independent of (U_{i,j}).

Next, define (Y_n) as follows. Let Y_1 = T_1. At step 2, let Q_1 = q_0 and define Y_2 = T_2 or Y_2 = Y_1 according to whether U_{1,1} ≤ Q_1 or U_{1,1} > Q_1. At step n + 1, after Y_1, …, Y_n have been defined, let

Q_{i+1} = q_i(Y_1, …, Y_i) for i = 0, …, n − 1,

and then define

Y_{n+1} = T_{n+1} if U_{i,n} ≤ Q_i for all i,

Y_{n+1} = Y_i if U_{i,n} > Q_i and U_{j,n} ≤ Q_j for all j > i.
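For concreteness, here is a minimal simulation sketch of this construction with S = R and σ_0 = N(0, 1) (both choices, as well as the names simulate and q_fn, are assumptions of the sketch, not part of the paper):

    # Sketch of the construction above: T_n i.i.d. ~ sigma_0 = N(0,1),
    # U_{i,n} i.i.d. uniform on [0,1], and q_fn(i, y_1..y_i) playing q_i.
    import numpy as np

    def simulate(n_steps, q_fn, seed=0):
        rng = np.random.default_rng(seed)
        Y = [rng.standard_normal()]                  # Y_1 = T_1 ~ sigma_0
        for n in range(1, n_steps):
            Q = [q_fn(i, Y[:i]) for i in range(n)]   # Q_{i+1} = q_i(Y_1,...,Y_i)
            U = rng.uniform(size=n)                  # U_{1,n}, ..., U_{n,n}
            exceed = [i for i in range(n) if U[i] > Q[i]]
            if not exceed:                           # U_{i,n} <= Q_i for all i
                Y.append(rng.standard_normal())      # Y_{n+1} = T_{n+1}
            else:                                    # largest i with U_{i,n} > Q_i
                Y.append(Y[exceed[-1]])              # Y_{n+1} = Y_i
        return Y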

The predictive distributions of (Y_n) are actually given by (2). Recall that F_0 = {∅, Ω} and F_n = σ(Y_1, …, Y_n).

Lemma 3. Y_1 ∼ σ_0 and

P(Y_{n+1} ∈ · | F_n) = σ_0 ∏_{i=1}^n Q_i + δ_{Y_n} (1 − Q_n) + ∑_{i=1}^{n−1} δ_{Y_i} (1 − Q_i) ∏_{j=i+1}^n Q_j a.s. for each n ≥ 1.

Proof. It is clear that Y_1 = T_1 ∼ σ_0. Fix n ≥ 1, B ∈ B, and let

G_n = σ(Y_1, …, Y_n, U_{1,n}, …, U_{n,n}), A_n = {U_{i,n} ≤ Q_i for all i}.

Since A_n ∈ G_n and T_{n+1} is independent of G_n,

P(A_n ∩ {Y_{n+1} ∈ B} | F_n) = E(1_{A_n} P(T_{n+1} ∈ B | G_n) | F_n) = σ_0(B) P(A_n | F_n) = σ_0(B) ∏_{i=1}^n Q_i a.s.

Similarly,

P(U_{n,n} > Q_n, Y_{n+1} ∈ B | F_n) = 1_B(Y_n) P(U_{n,n} > Q_n | F_n) = δ_{Y_n}(B) (1 − Q_n) a.s.

Finally, if i < n and A_{i,n} = {U_{i,n} > Q_i and U_{j,n} ≤ Q_j for j = i + 1, …, n}, one obtains

P(A_{i,n} ∩ {Y_{n+1} ∈ B} | F_n) = 1_B(Y_i) P(A_{i,n} | F_n) = δ_{Y_i}(B) (1 − Q_i) ∏_{j=i+1}^n Q_j a.s. □

One consequence of Lemma 3 is that

P((Y_1, Y_2, …) ∈ ·) = P^σ((X_1, X_2, …) ∈ ·)

where the strategy σ is given by (2). Since (X_n) is c.i.d. under P^σ (by [10]), it follows that (Y_n) is c.i.d. as well. More importantly, to determine the asymptotic behavior of (X_n) under P^σ, we may work with (Y_n).

Our first result is the following.

Theorem 4. If α ≤ Q_n ≤ β a.s. for each n, where 0 < α ≤ β < 1 are constants, then Y_n converges a.s.

Proof. Since S is a Borel subset of a Polish space, each probability measure on B is tight. Hence, by [5, Theorem 2.2], it suffices to show that f(Y_n) converges a.s. for each bounded continuous function f : S → R.

Fix a bounded continuous f : S → R and define Δ_m = E(f(Y_{m+1}) | F_m) − f(Y_m). Then,

Δ_{m+1}/Q_{m+1} = (E(f(Y_{m+2}) | F_{m+1}) − f(Y_{m+1}))/Q_{m+1}

= ∫ f dσ_0 ∏_{i=1}^m Q_i + f(Y_m)(1 − Q_m) + ∑_{i=1}^{m−1} f(Y_i)(1 − Q_i) ∏_{j=i+1}^m Q_j − f(Y_{m+1})

= E(f(Y_{m+1}) | F_m) − f(Y_{m+1}) = Δ_m + f(Y_m) − f(Y_{m+1}).

Summing over m = 1, …, n,

Δ_{n+1}/Q_{n+1} + ∑_{m=1}^{n−1} Δ_{m+1}/Q_{m+1} = ∑_{m=1}^n Δ_m + f(Y_1) − f(Y_{n+1}),

or equivalently

∑_{m=2}^n Δ_m (1/Q_m − 1) = −Δ_{n+1}/Q_{n+1} + Δ_1 + f(Y_1) − f(Y_{n+1}).

Next, since (Y_n) is c.i.d. and Q_j is F_{j−1}-measurable, then

E{Δ_i (1/Q_i − 1) Δ_j (1/Q_j − 1)} = E{Δ_i (1/Q_i − 1)(1/Q_j − 1) E(Δ_j | F_{j−1})} = 0 for all i < j.

Therefore,

E{(∑_{m=2}^n Δ_m (1/Q_m − 1))^2} = ∑_{m=2}^n E{Δ_m^2 (1/Q_m − 1)^2}.

Further, since α ≤ Q_m ≤ β a.s., one obtains

((1 − β)^2/β^2) ∑_{m=2}^n E(Δ_m^2) ≤ ∑_{m=2}^n E{Δ_m^2 (1/Q_m − 1)^2} = E{(∑_{m=2}^n Δ_m (1/Q_m − 1))^2}

= E{(−Δ_{n+1}/Q_{n+1} + Δ_1 + f(Y_1) − f(Y_{n+1}))^2} ≤ (2 sup|f|/α + 4 sup|f|)^2.

Hence, E(∑_{n=2}^∞ Δ_n^2) = ∑_{n=2}^∞ E(Δ_n^2) < ∞, so that Δ_n → 0 a.s. To conclude the proof, just recall that E(f(Y_{n+1}) | F_n) → V a.s. for some real random variable V; see Subsection 2.1. Therefore, f(Y_n) → V a.s. □

Incidentally, we note that, as apparent from the previous proof, the assumption Q_n ≥ α a.s. for all n can be weakened into lim inf_n E(Q_n^{−2}) < ∞. We also note that, when the strategy σ is given by (3) (i.e., when Q_n = q for all n and some constant 0 < q < 1), Theorem 4 implies that Y_n converges a.s.

In Theorem 4, the Q_n are separated from 0 and 1, and Y_n converges a.s. Things change drastically if Q_n approaches 1 quickly enough.
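The contrast can be glimpsed by simulation (a rough illustration only, using the simulate sketch above; the specific choices of q_i are our own): with constant q_i the path converges, while with q_i → 1 fast enough, fresh draws from σ_0 keep occurring and the path never settles.

    # Sketch: the two regimes. With q_i = 0.5 the path converges (Theorem 4);
    # with q_i = 1 - 2**-(i + 2), so that sum_n (1 - Q_n) < infinity, fresh
    # draws from sigma_0 occur infinitely often and there is no limit in
    # probability (Theorem 5 below).
    converging = simulate(2000, lambda i, past: 0.5)
    jumping = simulate(2000, lambda i, past: 1.0 - 2.0 ** -(i + 2))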

Theorem 5. Y_n does not converge in probability provided σ_0 is nondegenerate, Q_n > 0 for all n and ∑_n (1 − Q_n) < ∞ a.s.

Proof. Let d be the distance on S. It suffices to show that d(Y_n, Y_{n+1}) does not converge to 0 in probability. Since σ_0 is nondegenerate, there is ε > 0 such that P(d(T_1, T_2) > ε) is strictly positive. Define

H_n = {U_{i,n} ≤ Q_i for each i ≤ n and U_{i,n−1} ≤ Q_i for each i < n}.

Since (Q_1, …, Q_n) is a function of (Y_1, …, Y_{n−1}), then (T_n, T_{n+1}) is independent of H_n. Hence,

P(d(Y_n, Y_{n+1}) > ε) ≥ P(H_n ∩ {d(T_n, T_{n+1}) > ε}) = P(d(T_1, T_2) > ε) P(H_n)

= P(d(T_1, T_2) > ε) E{∏_{i=1}^n Q_i ∏_{i=1}^{n−1} Q_i}.

Finally, Q_n > 0 for all n and ∑_n (1 − Q_n) < ∞ a.s. imply that ∏_{i=1}^n Q_i → Q a.s., where Q is a random variable such that Q > 0 a.s. Therefore,

lim inf_n P(d(Y_n, Y_{n+1}) > ε) ≥ P(d(T_1, T_2) > ε) E(Q^2) > 0. □

We finally turn to the CLT. Fix a bounded measurable function f : S → R and define

M_n = (1/n) ∑_{i=1}^n f(Y_i) and Z_n = E(f(Y_{n+1}) | F_n).

Since (Y_n) is c.i.d., there is a real random variable V such that

M_n → V a.s. and Z_n → V a.s.

Our last result deals with

C_n = √n (M_n − Z_n) and W_n = √n (M_n − V).

Indeed, both C_n and W_n are often involved in the CLT for dependent data; see e.g. [4], [6], [7], [8]. Note also that, in the special case where (Y_n) is i.i.d. (namely, when Q_n = 1 for all n), one obtains C_n = W_n = √n (M_n − E(f(Y_1))).

Theorem 6. Suppose ∑_n (1 − E(Q_n)) < ∞ and Q_n ≤ Q_{n+1} a.s. for all n. Then, for each bounded measurable function f : S → R, one obtains

C_n → N(0, L) stably and W_n → N(0, L) stably,

where L = V̄ − V^2 with V̄ = lim_n (1/n) ∑_{i=1}^n f^2(Y_i) a.s.

Proof. By Corollary 2, it suffices to prove conditions (a)-(d) with F = V̄ − V^2 and G = 0.

First note that

Q_n Z_{n−1} = Z_n − f(Y_n)(1 − Q_n) a.s.

Letting c = 2 sup|f|, it follows that

|Z_n − Z_{n−1}| = (1 − Q_n) |f(Y_n) − Z_{n−1}| ≤ c (1 − Q_n) a.s.

Since Q_n ≤ Q_{n+1} a.s. for all n, one also obtains

n (1 − E(Q_n)) = j (1 − E(Q_n)) + (n − j)(1 − E(Q_n)) ≤ j (1 − E(Q_n)) + ∑_{i=j+1}^n (1 − E(Q_i)) for each j < n.

Hence, ∑_n (1 − E(Q_n)) < ∞ implies lim sup_n n (1 − E(Q_n)) = 0.

We next prove conditions (a) and (c). As to (c),

√n E(sup_{k≥n} |Z_{k−1} − Z_k|) ≤ c √n E(sup_{k≥n} (1 − Q_k)) = c √n E(1 − Q_n) → 0.

As to (a), since

n |Z_{n−1} − Z_n| ≤ c n (1 − Q_n) ≤ c ∑_{i=1}^n (1 − Q_i) a.s.,

one obtains

(1/√n) E(max_{1≤k≤n} k |Z_{k−1} − Z_k|) ≤ (c/√n) E{∑_{k=1}^n (1 − Q_k)} ≤ (c/√n) ∑_{k=1}^∞ (1 − E(Q_k)) → 0.

It remains to prove conditions (b) and (d) with F = V̄ − V^2 and G = 0. On the other hand, since

(1/n) ∑_{k=1}^n (f(Y_k) − Z_{k−1})^2 → V̄ − V^2 a.s.,

conditions (b) and (d) are actually true with F = V̄ − V^2 and G = 0 provided n (Z_n − Z_{n−1}) → 0 a.s.

Since

E{∑_n (1 − Q_n)} = ∑_n (1 − E(Q_n)) < ∞,

then ∑_n (1 − Q_n) < ∞ a.s. Hence, arguing as above,

n (1 − Q_n) ≤ j (1 − Q_n) + ∑_{i=j+1}^n (1 − Q_i) a.s. for each j < n.

Therefore,

lim sup_n n |Z_n − Z_{n−1}| ≤ c lim sup_n n (1 − Q_n) = 0 a.s.

and this concludes the proof. □

References

[1] Airoldi E.M., Costa T., Bassetti F., Leisen F., Guindani M. (2014) Generalized species sampling priors with latent beta reinforcements, J.A.S.A., 109, 1466-1480.

[2] Bassetti F., Crimaldi I., Leisen F. (2010) Conditionally identically distributed species sampling sequences, Adv. in Appl. Probab., 42, 433-459.

[3] Berti P., Regazzini E., Rigo P. (1997) Well-calibrated, coherent forecasting systems, Theory Probab. Appl., 42, 82-102.

[4] Berti P., Pratelli L., Rigo P. (2004) Limit theorems for a class of identically distributed random variables, Ann. Probab., 32, 2029-2052.

[5] Berti P., Pratelli L., Rigo P. (2006) Almost sure weak convergence of random probability measures, Stochastics (formerly: Stochastics and Stochastic Reports), 78, 91-97.

[6] Berti P., Crimaldi I., Pratelli L., Rigo P. (2009) Rate of convergence of predictive distributions for dependent data, Bernoulli, 15, 1351-1367.

[7] Berti P., Crimaldi I., Pratelli L., Rigo P. (2011) A central limit theorem and its applications to multicolor randomly reinforced urns, J. Appl. Probab., 48, 527-546.

[8] Berti P., Pratelli L., Rigo P. (2012) Limit theorems for empirical processes based on dependent data, Electronic J. Probab., 17, 1-18.

[9] Berti P., Pratelli L., Rigo P. (2013) Exchangeable sequences driven by an absolutely continuous random measure, Ann. Probab., 41, 2090-2102.

[10] Berti P., Dreassi E., Pratelli L., Rigo P. (2019) A predictive approach to Bayesian nonparametrics, submitted, currently available at: http://local.disia.unifi.it/dreassi/mg.pdf

[11] Cassese A., Zhu W., Guindani M., Vannucci M. (2019) A Bayesian nonparametric spiked process prior for dynamic model selection, Bayesian Analysis, 14, 553-572.

[12] Cifarelli D.M., Regazzini E. (1996) De Finetti’s contribution to probability and statistics, Statist. Science, 11, 253-282.

[13] de Finetti B. (1937) La prévision: ses lois logiques, ses sources subjectives, Ann. Inst. H. Poincaré, 7, 1-68.

[14] Dubins L.E., Savage L.J. (1965) How to gamble if you must: Inequalities for stochastic processes, McGraw Hill.

[15] Fortini S., Ladelli L., Regazzini E. (2000) Exchangeability, predictive distributions and parametric models, Sankhya A, 62, 86-109.

[16] Fortini S., Petrone S. (2012) Predictive construction of priors in Bayesian nonparametrics, Brazilian J. Probab. Statist., 26, 423-449.

[17] Fortini S., Petrone S., Sporysheva P. (2018) On a notion of partially conditionally identically distributed sequences, Stoch. Proc. Appl., 128, 819-846.

[18] Hahn P.R., Martin R., Walker S.G. (2018) On recursive Bayesian predictive distributions, J.A.S.A., 113, 1085-1093.

[19] Hill B.M. (1993) Parametric models for A_n: splitting processes and mixtures, J. Royal Stat. Soc. B, 55, 423-433.

[20] Kallenberg O. (1988) Spreading and predictable sampling in exchangeable sequences and processes, Ann. Probab., 16, 508-534.

[21] Martin R., Tokdar S.T. (2011) Semiparametric inference in mixture models with predictive recursion marginal likelihood, Biometrika, 98, 567-582.

[22] Newton M.A., Zhang Y. (1999) A recursive algorithm for nonparametric analysis with missing data, Biometrika, 86, 15-26.

[23] Tokdar S.T., Martin R., Ghosh J.K. (2009) Consistency of a recursive estimate of mixing distributions, Ann. Statist., 37, 2502-2522.

Patrizia Berti, Dipartimento di Matematica Pura ed Applicata "G. Vitali", Università di Modena e Reggio-Emilia, via Campi 213/B, 41100 Modena, Italy

E-mail address: patrizia.berti@unimore.it

Emanuela Dreassi, Dipartimento di Statistica, Informatica, Applicazioni "G. Parenti", Università di Firenze, viale Morgagni 59, 50134 Firenze, Italy

E-mail address: emanuela.dreassi@unifi.it

Luca Pratelli, Accademia Navale, viale Italia 72, 57100 Livorno, Italy

E-mail address: pratel@mail.dm.unipi.it

Pietro Rigo (corresponding author), Dipartimento di Matematica "F. Casorati", Università di Pavia, via Ferrata 1, 27100 Pavia, Italy

E-mail address: pietro.rigo@unipv.it
