MODEL WITH RANDOM WEIGHTS
PATRIZIA BERTI, IRENE CRIMALDI, LUCA PRATELLI, AND PIETRO RIGO
Abstract. The three-parameter Indian buffet process is generalized. The possibly different role played by the customers is taken into account by suitable (random) weights. Various limit theorems are also proved for such a generalized Indian buffet process. Let $L_n$ be the number of dishes experimented by the first $n$ customers, and let $\overline{K}_n = (1/n)\sum_{i=1}^n K_i$, where $K_i$ is the number of dishes tried by customer $i$. The asymptotic distributions of $L_n$ and $\overline{K}_n$, suitably centered and scaled, are obtained. The convergence turns out to be stable (and not only in distribution). As a particular case, the results apply to the standard (i.e., non-generalized) Indian buffet process.
1. Introduction
Let (X, B) be a measurable space. Think of X as a collection of features potentially shared by an object. Such an object is assumed to have a finite number of features only and is identified with the features it possesses. To investigate the object, thus, we focus on the finite subsets of X.
Each finite subset B ⊂ X can be associated to the measure $\mu_B = \sum_{x\in B}\delta_x$, where $\mu_\emptyset = 0$ and $\delta_x$ denotes the point mass at $x$. If B is random, $\mu_B$ is random as well. In fact, letting $F = \{\mu_B : B \text{ finite}\}$, there is a growing literature focusing on those random measures $M$ satisfying $M \in F$ a.s. See [9] and most references quoted below in this section.
A remarkable example is the Indian Buffet Process (IBP) introduced by Griffiths and Ghahramani and developed by Thibaux and Jordan; see [17], [18], [32]. The objects are the customers which sequentially enter an infinite buffet X and the features are the dishes tasted by each customer. In this framework, each customer is modeled by a (completely) random measure M such that M ∈ F a.s. The atoms of M represent the dishes experimented by the customer.
Our starting point is a three-parameter extension of IBP, referred to as standard IBP in the sequel, introduced in [9] and [31] to obtain power-law behavior. Fix α > 0, β ∈ [0, 1) and c > −β. Here, α is the mass parameter, β the discount parameter (or stability exponent) and c the concentration parameter. Also, let Poi(λ) denote the Poisson distribution with mean λ ≥ 0, where $\mathrm{Poi}(0) = \delta_0$. The dynamics of the standard IBP is as follows. Customer 1 tries Poi(α) dishes. For each n ≥ 1, let $S_n$ be the collection of dishes experimented by the first $n$ customers.
Then:
1991 Mathematics Subject Classification. 60B10, 60F05, 60G09, 60G57, 62F15.
Key words and phrases. Bayesian nonparametrics, Central limit theorem, Conditional iden- tity in distribution, Indian buffet process, Random measure, Random reinforcement, Stable convergence.
− Customer n+1 selects a subset $S_n^* \subset S_n$. Each $x \in S_n$ is included or not in $S_n^*$ independently of the other members of $S_n$. The inclusion probability is
$$\frac{\sum_{i=1}^n M_i\{x\} - \beta}{c + n},$$
where $M_i\{x\}$ is the indicator of the event {customer $i$ selects dish $x$}.

− In addition to $S_n^*$, customer n+1 also tries $\mathrm{Poi}(\lambda_n)$ new dishes, where
$$\lambda_n = \alpha\,\frac{\Gamma(c+1)\,\Gamma(c+\beta+n)}{\Gamma(c+\beta)\,\Gamma(c+1+n)}.$$
For β = 0, such a model reduces to the original IBP of [17], [18], [32].
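The dynamics above is easy to simulate directly. The sketch below is an illustration, not the authors' code: the function names `standard_ibp` and `poisson` are ours, dishes are labeled by integers in order of first appearance, and $\lambda_n$ is computed via log-gamma for numerical stability.

```python
import math
import random

def poisson(lam, rng):
    """Draw from Poi(lam) by Knuth's inversion; adequate for moderate lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def standard_ibp(n_customers, alpha, beta=0.0, c=1.0, seed=0):
    """Simulate the three-parameter IBP described above.

    Returns tastes, where tastes[i] is the set of dishes tried by
    customer i + 1.
    """
    rng = random.Random(seed)
    counts = []   # counts[x] = number of previous customers who tried dish x
    tastes = []
    for n in range(n_customers):
        chosen = set()
        # each known dish x is kept with probability (m_x - beta) / (c + n)
        for x, m_x in enumerate(counts):
            if rng.random() < (m_x - beta) / (c + n):
                chosen.add(x)
        # Poi(lambda_n) brand-new dishes;
        # lambda_n = alpha * G(c+1)G(c+beta+n) / (G(c+beta)G(c+1+n)), lambda_0 = alpha
        lam = alpha * math.exp(math.lgamma(c + 1) + math.lgamma(c + beta + n)
                               - math.lgamma(c + beta) - math.lgamma(c + 1 + n))
        for _ in range(poisson(lam, rng)):
            chosen.add(len(counts))
            counts.append(0)
        for x in chosen:
            counts[x] += 1
        tastes.append(chosen)
    return tastes
```

With `beta=0.0` this reduces to the original IBP; positive `beta` produces the power-law growth of the number of dishes discussed below.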
IBP is a flexible tool, able to capture the dynamics of various real problems. In addition, IBP is a basic model in Bayesian nonparametrics; see [14] and [20]. In factor analysis, for instance, IBP works as an infinite-capacity prior over the space of latent factors; see [20]. In this way, the number of factors is not specified in advance but is inferred from the data. Such a number is also allowed to grow as new data points are observed. Among the other possible applications of IBP, we mention causal inference [34], modeling of choices [16], similarity judgements [25]
and dyadic data [22].
Despite its prominent role, however, the asymptotics of IBP is largely neglected.
To the best of our knowledge, the only known fact is the a.s. behavior of L n (defined below) and some other related quantities for large n; see [9] and [31]. Nothing is known as regards limiting distributions.
This paper aims to do two things.
First, to generalize the standard IBP. Indeed, the discount parameter β is allowed to take values in (−∞, 1) rather than in [0, 1). More importantly, the possibly different relevance of customers is taken into account by random weights. Let $R_n > 0$ be the weight attached to customer $n$. Then, for each $x \in S_n$, the inclusion probability becomes
$$\frac{\sum_{i=1}^n R_i M_i\{x\} - \beta}{c + \sum_{i=1}^n R_i}.$$
Similarly, the new dishes tried by customer n+1 are now $\mathrm{Poi}(\Lambda_n)$ rather than $\mathrm{Poi}(\lambda_n)$, where
$$\Lambda_n = \alpha\,\frac{\Gamma(c+1)\,\Gamma\bigl(c+\beta+\sum_{i=1}^n R_i\bigr)}{\Gamma(c+\beta)\,\Gamma\bigl(c+1+\sum_{i=1}^n R_i\bigr)}.$$
If β ∈ [0, 1) and $R_n = 1$ for all $n$, the model reduces to the standard IBP.
Second, to investigate the asymptotics of the previous generalized IBP model.
We focus on
$$L_n = \text{number of dishes experimented by the first } n \text{ customers},$$
and
$$\overline{K}_n = \frac{1}{n}\sum_{i=1}^n K_i, \quad\text{where } K_i = \text{number of dishes tried by customer } i.$$
Three results are obtained. Define $a_n(\beta) = \log n$ if β = 0 and $a_n(\beta) = n^\beta$ if β ∈ (0, 1). Then, under some conditions on the weights $R_n$ (see Theorems 4, 5, 8) it is shown that:

(i) If β ∈ [0, 1), then $\frac{L_n}{a_n(\beta)}\xrightarrow{a.s.}\lambda$, where λ > 0 is a certain constant;

(ii) If β ∈ [0, 1), then $\sqrt{a_n(\beta)}\,\bigl\{\frac{L_n}{a_n(\beta)} - \lambda\bigr\}\longrightarrow N(0,\lambda)$ stably;

(iii) If β < 1/2, then $\overline{K}_n\xrightarrow{a.s.}Z$ and
$$\sqrt{n}\,\bigl(\overline{K}_n - Z\bigr)\longrightarrow N(0,\sigma^2)\ \text{ stably}, \qquad \sqrt{n}\,\bigl(\overline{K}_n - E(K_{n+1}\mid\mathcal{F}_n)\bigr)\longrightarrow N(0,\tau^2)\ \text{ stably},$$

where $Z$, $\sigma^2$, $\tau^2$ are suitable random variables and $\mathcal{F}_n$ is the sub-σ-field induced by the available information at time $n$.
Stable convergence is a strong form of convergence in distribution. The basic definition is recalled in Subsection 2.3. Further, $N(0, a)$ denotes the Gaussian law with mean 0 and variance a ≥ 0, where $N(0, 0) = \delta_0$.
Among other things, the above results can be useful to make (asymptotic) inference on the model. As an example, suppose β ∈ [0, 1). In view of (i),
$$\widehat{\beta}_n = \frac{\log L_n}{\log n}$$
is a strongly consistent estimator of β for each β ∈ [0, 1). In turn, (ii) provides the limiting distribution of $\widehat{\beta}_n$, so that simple tests on β can be manufactured. Similarly, if β < 1/2, asymptotic confidence bounds for the random limit $Z$ of $\overline{K}_n$ can be obtained by (iii); see Subsection 5.1.
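The estimator can be illustrated numerically. Since, conditionally on the past, the new-dish counts $N_i$ are Poisson with mean $\Lambda_{i-1}$ (see Section 4), $L_n$ can be simulated without tracking individual dishes. The sketch below is ours (function names `simulate_L` and `beta_hat` are illustrative), assuming a given sequence of weights.

```python
import math
import random

def poisson(lam, rng):
    """Knuth inversion sampler for Poi(lam)."""
    t, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= t:
            return k
        k += 1

def simulate_L(n, alpha, beta, c, weights, rng):
    """Simulate L_n through the new-dish counts: N_{i+1} | past ~ Poi(Lambda_i),
    with Lambda_i = alpha*G(c+1)G(c+beta+S_i)/(G(c+beta)G(c+1+S_i)),
    S_i = R_1 + ... + R_i (Lambda_0 = alpha).  L_n = N_1 + ... + N_n."""
    L, S = 0, 0.0
    for i in range(n):
        lam = alpha * math.exp(math.lgamma(c + 1) + math.lgamma(c + beta + S)
                               - math.lgamma(c + beta) - math.lgamma(c + 1 + S))
        L += poisson(lam, rng)
        S += weights[i]
    return L

def beta_hat(L_n, n):
    """The strongly consistent estimator log(L_n) / log(n) suggested by (i)."""
    return math.log(L_n) / math.log(n)
```

The convergence of $\widehat\beta_n$ is slow (the bias is of order $\log\lambda/\log n$), but already at moderate $n$ the estimate for β = 0.5 clearly exceeds the one for β = 0.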
Note also that, because of (iii), the convergence rate of $\overline{K}_n - E(K_{n+1}\mid\mathcal{F}_n)$ is at least $n^{-1/2}$. Therefore, $\overline{K}_n$ is a good predictor of $K_{n+1}$ for large $n$ and β < 1/2; see Subsection 5.1 again.
The results in (i)-(ii)-(iii) hold in particular if $R_n = 1$ for all $n$. Thus, (ii) and (iii) provide the limiting distributions of $L_n$ and $\overline{K}_n$ in the standard IBP model. Furthermore, in this case, (iii) holds for all β < 1 and not only for β < 1/2.
We close this section with some remarks on β and the R n .
The discount parameter β. Roughly speaking, if β < 0, the inclusion probabilities are larger and the chances of tasting new dishes vanish very quickly; see Lemma 2. Define in fact
$$L = \sup_n L_n = \mathrm{card}\{x \in X : x \text{ is tried by some customer}\}.$$
Because of (i), $L_n$ increases logarithmically if β = 0, while it exhibits a power-law behavior if β ∈ (0, 1). Accordingly, $L = \infty$ a.s. if β ∈ [0, 1). On the contrary,
$$E\bigl(e^L\bigr) < \infty \quad\text{if } \beta < 0;$$
see Lemma 3. In particular, β < 0 implies $L < \infty$ a.s., and this fact can be helpful to describe some real situations.
Formally, the model studied in this paper makes sense whenever $R_n > \max(\beta, 0)$ for all $n$. Hence, one could also admit β ≥ 1. However, β = 1 leads to trivialities. Instead, β > 1 could be potentially interesting, but it is hard to treat that case and β < 1 in a unified way. Thus, β > 1 will be investigated in a forthcoming paper.
Unless $R_n = 1$ for all $n$, the results in (iii) are available for β < 1/2 only. Certainly, (iii) can fail if β ∈ [1/2, 1). Perhaps, some form of (iii) holds even if β ∈ [1/2, 1), up to replacing $\sqrt{n}$ with some other norming constant and $N(0,\sigma^2)$ and $N(0,\tau^2)$ with some other limit kernels. But we did not investigate this issue.
A last note is that β plays a role analogous to that of the discount parameter in the two-parameter Poisson-Dirichlet process; see e.g. [3], [27] and [28]. Indeed, such a parameter regulates the asymptotic behavior of the number of distinct observed values, in the same way as β does for $L_n$.
The weights $R_n$. The standard IBP has been generalized in various ways, mainly focusing on computational issues; see e.g. [13], [15], [23], [33]. In this paper, we deal with the possible need of distinguishing objects according to some associated random factor. To this end, a random weight $R_n$ is attached to customer $n$. Indeed, different customers may have different importance, due to some random cause that does not affect their own choices but is relevant to the choices of future customers. Analogous models occur in different settings, for instance in connection with Pólya urns and species sampling sequences; see [2], [3], [4], [5], [6], [26].
The model investigated in this paper, referred to as the "weighted" IBP in the sequel, generally applies to evolutionary phenomena. In a biological framework, for instance, a newborn exhibits some features in common with the existing units, with a probability depending on the latter's weights (reproductive power, ability to adapt to new environmental conditions or to compete for finite resources, and so on). The newborn also presents some new features that, in turn, will be transmitted to future generations with a probability depending on his/her weight. See e.g. [8] and [29].
Similar examples arise in connection with the evolution of language; see e.g.
[12]. A neologism (i.e., a newly coined term, word, phrase or concept) is often directly attributable to a specific source (a person, journal, period, event, and so on), and its diffusion depends on the importance of that source. For instance, suppose we are given a sample of journals of the same type (customers) over several years. Each journal uses words (dishes), some of which have been previously used while some others are new. A word appearing for the first time in a journal has a probability of being reused which depends on the importance of the journal at issue.
Other applications of the weighted IBP could be found in Bayesian nonparametrics. The standard IBP is widely used as a prior on binary matrices with a fixed finite number of rows and infinitely many columns (rows correspond to objects and columns to features). The weighted IBP can be useful in all those settings where customers arrive sequentially. As an example, some dynamic networks present a competitive aspect, and not all nodes are equally successful in acquiring links. Suppose the network evolves in time, a node (customer) is added at every time step, and some links are created with some of the existing nodes. The different ability of competing for links is modeled by a weight attached to each node; see e.g. [7].
Following [24] and [30], each node could be described by a set of binary features (dishes) and the probability of a link is a function of the features of the involved nodes. A nonparametric latent feature model could be assessed at every time step, with the weighted IBP as a prior on the feature matrix.
A last remark concerns the probability distribution of the sequence $(M_n)$, where $M_n$ is the random measure corresponding to customer $n$. Because of the weights, unlike the standard IBP, $(M_n)$ can fail to be exchangeable. Thus, the usual machinery of Bayesian nonparametrics cannot be automatically implemented, due to the lack of exchangeability, and this can create some technical drawbacks. On the other hand, the exchangeability assumption is often untenable in applications.
In such cases, the weighted IBP is a realistic alternative to the standard IBP. We
finally note that, when β = 0, (M n ) satisfies a weak form of exchangeability, known as conditional identity in distribution; see Subsection 2.4 and Lemma 1.
2. Preliminaries
2.1. Basic notation. Throughout, X is a separable metric space and B the Borel σ-field on X . We let
M = {µ : µ is a finite positive measure on B}
and we say that µ ∈ M is diffuse in case µ{x} = 0 for all x ∈ X .
All random variables appearing in this paper, unless otherwise stated, are defined on a fixed probability space (Ω, A, P ). If G ⊂ A is a sub-σ-field and X and Y are random variables with values in the same measurable space, we write
X | G ∼ Y | G
to mean that P (X ∈ A | G) = P (Y ∈ A | G) a.s. for each measurable set A.
2.2. Random measures. A random measure (r.m.) is a map M : Ω → M such that $\omega\mapsto M(\omega)(B)$ is A-measurable for each B ∈ B. In the sequel, we write M(B) to denote the real random variable $\omega\mapsto M(\omega)(B)$. Similarly, if $f : X\to\mathbb{R}$ is a bounded measurable function, M(f) stands for
$$M(\omega)(f) = \int f(x)\,M(\omega)(dx).$$
A completely r.m. is a r.m. M such that $M(B_1),\dots,M(B_k)$ are independent random variables whenever $B_1,\dots,B_k\in B$ are pairwise disjoint.

Let ν ∈ M. A Poisson r.m. with intensity ν is a completely r.m. M such that $M(B)\sim\mathrm{Poi}(\nu(B))$ for all B ∈ B. Note that M(B) = 0 a.s. in case ν(B) = 0. Note also that the intensity ν has been requested to be a finite measure (and not a σ-finite measure, as is usually the case).
We refer to Chapter VI of [10] for Poisson r.m.'s. We just note that a Poisson r.m. with intensity ν is easily obtained. It suffices to let M = 0 if ν(X) = 0, and otherwise
$$M = I_{\{N>0\}}\sum_{j=1}^N\delta_{X_j},$$
where $(X_j)$ is an i.i.d. sequence of X-valued random variables with $X_1\sim\nu/\nu(X)$, N is independent of $(X_j)$ and $N\sim\mathrm{Poi}(\nu(X))$.
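This construction translates directly into code. The sketch below is an illustration (the representation of the realized measure as an atom-to-mass `Counter` is our choice): draw $N\sim\mathrm{Poi}(\nu(X))$ and, if N > 0, return the sum of N point masses at i.i.d. draws from the normalized intensity.

```python
import math
import random
from collections import Counter

def sample_poisson_rm(nu_total, sample_point, rng):
    """Sample a Poisson r.m. with finite intensity nu, as in the text:
    M = 0 if nu(X) = 0; otherwise M = 1_{N>0} * sum_{j=1}^N delta_{X_j},
    with N ~ Poi(nu(X)) and X_j i.i.d. from nu / nu(X).

    nu_total is nu(X); sample_point draws one point from nu / nu(X).
    The realized measure is returned as a Counter: atom -> mass.
    """
    if nu_total == 0:
        return Counter()
    # N ~ Poi(nu_total), Knuth inversion
    t, N, p = math.exp(-nu_total), 0, 1.0
    while True:
        p *= rng.random()
        if p <= t:
            break
        N += 1
    return Counter(sample_point() for _ in range(N))
```

When the intensity is diffuse (e.g., a multiple of the uniform law on (0, 1)), the atoms are a.s. distinct, so every realized mass equals 1 and $M\in F$, in line with the remark below.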
As in Section 1, let $F = \{\mu_B : B \text{ finite}\}$, where $\mu_\emptyset = 0$ and $\mu_B = \sum_{x\in B}\delta_x$. Since X is a separable metric space and B the Borel σ-field, the set {M ∈ F} belongs to A for every r.m. M. In this paper, we focus on those r.m.'s M satisfying M ∈ F a.s. If M is a Poisson r.m. with intensity ν, then M ∈ F a.s. if and only if ν is diffuse. Therefore, another class of r.m.'s is to be introduced.
Each ν ∈ M can be uniquely written as $\nu = \nu_c + \nu_d$, where $\nu_c$ is diffuse and
$$\nu_d = \sum_j\gamma_j\,\delta_{x_j}$$
for some $\gamma_j\ge 0$ and $x_j\in X$. (The case $\nu_d = 0$ corresponds to $\gamma_j = 0$ for all $j$.) Say that M is a Bernoulli r.m. with hazard measure ν, where ν ∈ M, if

• $M = M_1 + M_2$, with $M_1$ and $M_2$ independent r.m.'s;
• $M_1$ is a Poisson r.m. with intensity $\nu_c$;
• $M_2 = \sum_j V_j\,\delta_{x_j}$, where the $V_j$ are independent indicators satisfying $P(V_j = 1) = \gamma_j$.
Some (obvious) consequences of the definition are the following.
− For each B ∈ B, $E\{M(B)\} = \nu(B)$ and $E\{M(B)^2\} = \nu(B) + \nu(B)^2 - \sum_{x\in B}\nu\{x\}^2$;
− $M = M_1$ a.s. if $\nu = \nu_c$, and $M = M_2$ a.s. if $\nu = \nu_d$;
− M is a completely r.m.;
− M ∈ F a.s.
We will write $M\sim\mathrm{BeP}(\nu)$ to mean that M is a Bernoulli r.m. with hazard measure ν.
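The decomposition $M = M_1 + M_2$ can be sampled exactly as defined. A minimal sketch (function name ours), assuming each atom mass $\gamma_j$ lies in [0, 1] so that $P(V_j = 1) = \gamma_j$ makes sense:

```python
import math
import random
from collections import Counter

def sample_bep(nu_c_total, sample_diffuse, atoms, rng):
    """Sample M ~ BeP(nu), nu = nu_c + sum_j gamma_j delta_{x_j}:
    M = M1 + M2, with M1 a Poisson r.m. of intensity nu_c and
    M2 = sum_j V_j delta_{x_j}, the V_j independent Bernoulli(gamma_j).

    nu_c_total is nu_c(X); sample_diffuse draws from nu_c / nu_c(X);
    atoms maps x_j -> gamma_j, each gamma_j in [0, 1].
    """
    M = Counter()
    # M1: Poisson part (skipped when the diffuse mass is zero)
    if nu_c_total > 0:
        t, N, p = math.exp(-nu_c_total), 0, 1.0
        while True:
            p *= rng.random()
            if p <= t:
                break
            N += 1
        M.update(sample_diffuse() for _ in range(N))
    # M2: independent indicators on the fixed atoms
    for x, gamma in atoms.items():
        if rng.random() < gamma:   # V_j ~ Bernoulli(gamma_j)
            M[x] += 1
    return M
```

Note that an atom with $\gamma_j = 1$ is always included and one with $\gamma_j = 0$ never is, matching $M = M_2$ a.s. when $\nu = \nu_d$.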
2.3. Stable convergence. Stable convergence is a strong form of convergence in distribution. We just recall the basic definition and we refer to [11], [19] and references therein for more information.
A r.m. K such that $K(\omega)(X) = 1$, for all ω ∈ Ω, is said to be a kernel or a random probability measure. Let K be a kernel and $(X_n)$ a sequence of X-valued random variables. Say that $X_n$ converges stably to K if
$$E\bigl\{K(f)\mid H\bigr\} = \lim_n E\bigl\{f(X_n)\mid H\bigr\}$$
for all H ∈ A with P(H) > 0 and all bounded continuous $f : X\to\mathbb{R}$. (Recall that A denotes the basic σ-field on Ω.) For H = Ω, stable convergence trivially implies convergence in distribution.
2.4. Conditionally identically distributed sequences. Let $(X_n : n\ge 1)$ be a sequence of random variables (with values in any measurable space) adapted to a filtration $(\mathcal{U}_n : n\ge 0)$. Say that $(X_n)$ is conditionally identically distributed (c.i.d.) with respect to $(\mathcal{U}_n)$ in case
$$X_k\mid\mathcal{U}_n \sim X_{n+1}\mid\mathcal{U}_n \quad\text{for all } k > n\ge 0.$$
Roughly speaking, this means that, at each time n ≥ 0, the future observations $(X_k : k > n)$ are identically distributed given the past $\mathcal{U}_n$. If $\mathcal{U}_0 = \{\emptyset,\Omega\}$ and $\mathcal{U}_n = \sigma(X_1,\dots,X_n)$, the filtration $(\mathcal{U}_n)$ is not mentioned at all and $(X_n)$ is just called c.i.d. Note that $X_k\sim X_1$ for all k ≥ 1 whenever $(X_n)$ is c.i.d.

The c.i.d. property is connected to exchangeability. Indeed, $(X_n)$ is exchangeable if and only if it is stationary and c.i.d., and the asymptotic behavior of c.i.d. sequences is quite close to that of exchangeable ones. We refer to [4] for details.
3. The model
Let (M n : n ≥ 1) be a sequence of r.m.’s and (R n : n ≥ 1) a sequence of real random variables. The probability distribution of ((M n , R n ) : n ≥ 1) is identified by the parameters m, α, β and c as follows.
• m is a diffuse probability measure on B;
• α, β, c are real numbers such that α > 0, β < 1 and c > −β;
• $R_n$ is independent of $(M_1,\dots,M_n,R_1,\dots,R_{n-1})$ and $R_n\ge u > \max(\beta, 0)$, for some constant $u$ and each n ≥ 1;
• $M_{n+1}\mid\mathcal{F}_n\sim\mathrm{BeP}(\nu_n)$ for all n ≥ 0, where $\mathcal{F}_0 = \{\emptyset,\Omega\}$, $\nu_0 = \alpha m$, $\mathcal{F}_n = \sigma(M_1,\dots,M_n,R_1,\dots,R_n)$,
$$\nu_n = \sum_{x\in S_n}\frac{\sum_{i=1}^n R_iM_i\{x\} - \beta}{\sum_{i=1}^n R_i + c}\,\delta_x + \frac{\Gamma(c+1)\,\Gamma\bigl(c+\beta+\sum_{i=1}^n R_i\bigr)}{\Gamma(c+\beta)\,\Gamma\bigl(c+1+\sum_{i=1}^n R_i\bigr)}\,\alpha m,$$
and $S_n = \{x\in X : M_i\{x\} = 1 \text{ for some } i = 1,\dots,n\}$.
Our model is the sequence $((M_n, R_n) : n\ge 1)$. It reduces to the standard IBP in case β ∈ [0, 1) and $R_n = 1$ for all $n$. Note that $M_1$ is a Poisson r.m. with intensity $\alpha m$. Note also that $M_n\in F$ a.s. for all n ≥ 1, so that
$$S_n = \bigcup_{i=1}^n\mathrm{Support}(M_i) \quad\text{a.s.}$$
Formally, for such a model to make sense, β can be taken to be any real number satisfying $R_n > \max(\beta, 0)$ for all $n$. For the reasons explained in Section 1, however, in this paper we focus on β < 1. We also assume $R_n\ge u$, for all $n$ and some constant $u > \max(\beta, 0)$, as a mere technical assumption. In the sequel, we let
$$\Lambda_0 = \alpha \quad\text{and}\quad \Lambda_n = \alpha\,\frac{\Gamma(c+1)\,\Gamma\bigl(c+\beta+\sum_{i=1}^n R_i\bigr)}{\Gamma(c+\beta)\,\Gamma\bigl(c+1+\sum_{i=1}^n R_i\bigr)}.$$
In this notation, the diffuse part of $\nu_n$ can be written as $\Lambda_n m$.
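A generative sketch of the weighted model may help fix ideas: each step draws $R_{n+1}$, keeps the old dishes with the inclusion probabilities of $\nu_n$, and adds $\mathrm{Poi}(\Lambda_n)$ new dishes. The integer dish labels and the function names below are illustrative choices of ours, not part of the model.

```python
import math
import random

def poisson(lam, rng):
    """Knuth inversion sampler for Poi(lam)."""
    t, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= t:
            return k
        k += 1

def weighted_ibp(n_customers, alpha, beta, c, draw_weight, seed=0):
    """Sample a path of ((M_n, R_n)).  draw_weight(rng) returns R_n and must
    satisfy R_n >= u > max(beta, 0).  tastes[n] is the support of M_{n+1}."""
    rng = random.Random(seed)
    weighted_counts = {}   # x -> sum_i R_i * M_i{x}
    S = 0.0                # R_1 + ... + R_n
    tastes, weights = [], []
    for _ in range(n_customers):
        R = draw_weight(rng)
        chosen = set()
        # old dish x is kept with probability
        # (sum_i R_i M_i{x} - beta) / (sum_i R_i + c)
        for x, w in weighted_counts.items():
            if rng.random() < (w - beta) / (S + c):
                chosen.add(x)
        # Poi(Lambda_n) new dishes (the diffuse part Lambda_n * m of nu_n)
        lam = alpha * math.exp(math.lgamma(c + 1) + math.lgamma(c + beta + S)
                               - math.lgamma(c + beta) - math.lgamma(c + 1 + S))
        for _ in range(poisson(lam, rng)):
            x = len(weighted_counts)
            weighted_counts[x] = 0.0
            chosen.add(x)
        for x in chosen:
            weighted_counts[x] += R
        S += R
        tastes.append(chosen)
        weights.append(R)
    return tastes, weights
```

With `draw_weight=lambda rng: 1.0` and β ∈ [0, 1), the path has the same law as in the standard IBP.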
As remarked in Section 1, R n should be regarded as the weight of customer n.
Thus, the possibly different role played by each customer can be taken into account.
Apart from the possible negative values of β, the parameters m, α, β and c have essentially the same meaning as in the standard IBP. The probability measure m allows one to draw, at each step n ≥ 1, an i.i.d. sample of new dishes. In fact, $m(X\setminus S_n) = 1$ a.s., since m is diffuse and $S_n$ is finite a.s. The mass parameter α controls the total number of tried dishes. The concentration parameter c tunes the number of customers which try each dish. The discount parameter β has been discussed in Section 1.
A r.m. can be seen as a random variable with values in (M, Σ), where Σ is the σ-field on M generated by the maps µ 7→ µ(B) for all B ∈ B. In the standard IBP case, (M n ) is an exchangeable sequence of random variables. Now, because of the R n , exchangeability is generally lost. In fact, the same phenomenon (loss of exchangeability) occurs in various other extensions of IBP; see [13], [15], [23], [33].
However, under some conditions, (M n ) is c.i.d. with respect to the filtration G 0 = {∅, Ω}, G n = F n ∨ σ(R n+1 ) = σ(M 1 , . . . , M n , R 1 , . . . , R n , R n+1 ).
We next prove this fact. The c.i.d. property has been recalled in Subsection 2.4.
Lemma 1. $(M_n)$ is c.i.d. with respect to $(\mathcal{G}_n)$ if and only if
$$\Lambda_{n+1} = \Lambda_n\,\Bigl(1 - \frac{R_{n+1} - \beta}{c + \sum_{i=1}^{n+1}R_i}\Bigr) \quad\text{a.s. for all } n\ge 0. \tag{1}$$
In particular, $(M_n)$ is c.i.d. with respect to $(\mathcal{G}_n)$ if β = 0 or if $R_n = 1$ for all n ≥ 1. (In these cases, in fact, condition (1) is trivially true.)
Proof. We just give a sketch of the proof. Suppose
$$M_{n+2}(B)\mid\mathcal{G}_n \sim M_{n+1}(B)\mid\mathcal{G}_n \quad\text{for each } n\ge 0 \text{ and } B\in\mathcal{B}. \tag{2}$$
Conditionally on $\mathcal{G}_n$, the r.m.'s $M_{n+1}$ and $M_{n+2}$ are both completely r.m.'s. Hence, condition (2) implies
$$M_{n+2}\mid\mathcal{G}_n \sim M_{n+1}\mid\mathcal{G}_n \quad\text{for each } n\ge 0.$$
In turn, given n ≥ 0 and A ∈ Σ, the previous condition yields
$$P(M_{n+3}\in A\mid\mathcal{G}_n) = E\bigl\{P(M_{n+3}\in A\mid\mathcal{G}_{n+1})\mid\mathcal{G}_n\bigr\} = E\bigl\{P(M_{n+2}\in A\mid\mathcal{G}_{n+1})\mid\mathcal{G}_n\bigr\} = P(M_{n+2}\in A\mid\mathcal{G}_n) = P(M_{n+1}\in A\mid\mathcal{G}_n) \quad\text{a.s.}$$
Hence, $M_{n+3}\mid\mathcal{G}_n\sim M_{n+1}\mid\mathcal{G}_n$ for each n ≥ 0. Iterating this argument, one obtains $M_k\mid\mathcal{G}_n\sim M_{n+1}\mid\mathcal{G}_n$ for all k > n ≥ 0. Therefore, condition (2) is equivalent to $(M_n)$ being c.i.d. with respect to $(\mathcal{G}_n)$. We next prove that (1) ⇔ (2).
Fix n ≥ 0 and B ∈ B. It can be assumed m(B) > 0. Since $R_{n+1}$ is independent of $(M_1,\dots,M_n,M_{n+1},R_1,\dots,R_n)$, then
$$P(M_{n+1}\in A\mid\mathcal{G}_n) = P(M_{n+1}\in A\mid\mathcal{F}_n) \quad\text{a.s. for all } A\in\Sigma.$$
Thus, for each t ∈ R,
$$E\bigl(e^{t\,M_{n+1}(B)}\mid\mathcal{G}_n\bigr) = E\bigl(e^{t\,M_{n+1}(B)}\mid\mathcal{F}_n\bigr) = \exp\bigl\{m(B)(e^t-1)\Lambda_n\bigr\}\prod_{x\in S_n\cap B}\Bigl\{1 + (e^t-1)\,\frac{-\beta+\sum_{i=1}^n R_iM_i\{x\}}{c+\sum_{i=1}^n R_i}\Bigr\} \quad\text{a.s.},$$
where the second equality is because $M_{n+1}\mid\mathcal{F}_n\sim\mathrm{BeP}(\nu_n)$. Similarly,
$$E\bigl(e^{t\,M_{n+2}(B)}\mid\mathcal{G}_n\bigr) = E\bigl\{E\bigl(e^{t\,M_{n+2}(B)}\mid\mathcal{G}_{n+1}\bigr)\mid\mathcal{G}_n\bigr\} = \exp\bigl\{m(B)(e^t-1)\Lambda_{n+1}\bigr\}\,E\Bigl\{\prod_{x\in S_{n+1}\cap B}\Bigl(1 + (e^t-1)\,\frac{-\beta+\sum_{i=1}^{n+1}R_iM_i\{x\}}{c+\sum_{i=1}^{n+1}R_i}\Bigr)\Bigm|\mathcal{G}_n\Bigr\} \quad\text{a.s.}$$
Finally, after some computations, one obtains
$$E\Bigl\{\prod_{x\in S_{n+1}\cap B}\Bigl(1 + (e^t-1)\,\frac{-\beta+\sum_{i=1}^{n+1}R_iM_i\{x\}}{c+\sum_{i=1}^{n+1}R_i}\Bigr)\Bigm|\mathcal{G}_n\Bigr\} = \exp\Bigl\{m(B)(e^t-1)\,\Lambda_n\,\frac{R_{n+1}-\beta}{c+\sum_{i=1}^{n+1}R_i}\Bigr\}\prod_{x\in S_n\cap B}\Bigl\{1 + (e^t-1)\,\frac{-\beta+\sum_{i=1}^n R_iM_i\{x\}}{c+\sum_{i=1}^n R_i}\Bigr\} \quad\text{a.s.}$$
Thus, condition (1) amounts to $E\bigl(e^{t\,M_{n+2}(B)}\mid\mathcal{G}_n\bigr) = E\bigl(e^{t\,M_{n+1}(B)}\mid\mathcal{G}_n\bigr)$ a.s. for each t ∈ R, that is, conditions (1) and (2) are equivalent.
4. Asymptotic behavior of $L_n$

Let $N_i$ be the number of new dishes tried by customer $i$, i.e., $N_i = \mathrm{card}(S_i\setminus S_{i-1})$ with $S_0 = \emptyset$. Note that $N_i$ is $\mathcal{F}_i$-measurable and $N_i\mid\mathcal{F}_{i-1}\sim\mathrm{Poi}(\Lambda_{i-1})$.

This section is devoted to
$$L_n = \mathrm{card}(S_n) = \sum_{i=1}^n N_i,$$
the number of dishes experimented by the first $n$ customers. Our main tool is the following technical lemma.
Lemma 2. There is a function $h : (0,\infty)\to\mathbb{R}$ such that $\sup_{x\ge c+u}|x\,h(x)| < \infty$ and
$$\Lambda_n = \alpha\,\frac{\Gamma(c+1)}{\Gamma(c+\beta)}\;\frac{1 + h\bigl(c+\sum_{i=1}^n R_i\bigr)}{\bigl(c+\sum_{i=1}^n R_i\bigr)^{1-\beta}} \quad\text{for all } n\ge 1.$$
In particular,
$$\Lambda_n\le\frac{D}{n^{1-\beta}} \quad\text{and}\quad |\Lambda_{n+1} - \Lambda_n|\le\frac{D}{n^{2-\beta}} \quad\text{for all } n\ge 1, \tag{3}$$
where D is a suitable constant (non-random and not depending on n).

Proof. Just note that $\frac{\Gamma(x+\beta)}{\Gamma(x+1)} = x^{\beta-1}\bigl(1 + h(x)\bigr)$, with h as required, for all $x > \max(0, -\beta)$; see e.g. formula (6.1.47) of [1]. To prove (3), let $v = \min(u, c+u)$. Since c + u > c + β > 0, then v > 0. Hence, (3) follows from
$$c + \sum_{i=1}^n R_i \ge c + n\,u = c + u + (n-1)\,u \ge n\,v.$$
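The expansion used in the proof can be checked numerically. The sketch below (function names ours) computes the gamma ratio via log-gamma; by Stirling's series, $x\,h(x)\to\beta(\beta-1)/2$ as $x\to\infty$, which the check uses for β = 0.4.

```python
import math

def gamma_ratio(x, beta):
    """Gamma(x + beta) / Gamma(x + 1), computed stably via log-gamma."""
    return math.exp(math.lgamma(x + beta) - math.lgamma(x + 1))

def h(x, beta):
    """The correction term of Lemma 2:
    Gamma(x+beta)/Gamma(x+1) = x**(beta-1) * (1 + h(x))."""
    return gamma_ratio(x, beta) / x ** (beta - 1) - 1.0
```

In particular, $x\,h(x)$ stays bounded on $[c+u,\infty)$, which is exactly what the bounds (3) require.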
Let $L = \sup_n L_n$ be the number of dishes tried by some customer. A first consequence of Lemma 2 is that β < 0 implies L < ∞ a.s.
Lemma 3. $P\bigl(N_i > \frac{1}{1-\beta}\ \text{infinitely often}\bigr) = 0$. Moreover, $E(e^L) < \infty$ if β < 0.

Proof. Fix an integer k ≥ 1. Since $N_{i+1}\mid\mathcal{F}_i\sim\mathrm{Poi}(\Lambda_i)$,
$$P(N_{i+1}\ge k) = E\bigl\{P(N_{i+1}\ge k\mid\mathcal{F}_i)\bigr\} = E\Bigl\{e^{-\Lambda_i}\sum_{j\ge k}\frac{\Lambda_i^j}{j!}\Bigr\}\le\frac{E(\Lambda_i^k)}{k!}.$$
By Lemma 2, $E(\Lambda_i^k) = O\bigl(i^{-(1-\beta)k}\bigr)$. Let
$$k = 1 + \max\{j\in\mathbb{Z} : j\le 1/(1-\beta)\}.$$
Since k(1 − β) > 1, one obtains $\sum_i P\bigl(N_i > 1/(1-\beta)\bigr) = \sum_i P(N_i\ge k) < \infty$.

Next, suppose β < 0. By Lemma 2, $\Lambda_n\le D\,n^{\beta-1}$ for some constant D. Letting H = (e − 1)D and noting that $E\bigl(e^{N_{n+1}}\mid\mathcal{F}_n\bigr) = e^{\Lambda_n(e-1)}$ a.s., one obtains
$$E\bigl(e^{L_{n+1}}\bigr) = E\bigl\{e^{L_n}\,E\bigl(e^{N_{n+1}}\mid\mathcal{F}_n\bigr)\bigr\} = E\bigl\{e^{L_n}\,e^{\Lambda_n(e-1)}\bigr\}\le E\bigl(e^{L_n}\bigr)\,e^{H\,n^{\beta-1}}\le E\bigl(e^{L_{n-1}}\bigr)\,e^{H\,(n-1)^{\beta-1}}\,e^{H\,n^{\beta-1}}\le\dots\le E\bigl(e^{L_1}\bigr)\,e^{H\sum_{j=1}^n j^{\beta-1}}.$$
Thus, β < 0 and $E(e^{L_1}) = E(e^{N_1}) < \infty$ yield
$$E\bigl(e^L\bigr) = \sup_n E\bigl(e^{L_n}\bigr)\le E\bigl(e^{L_1}\bigr)\,e^{H\sum_{j=1}^\infty j^{\beta-1}} < \infty.$$
In view of Lemma 3, if β < 0 there is a random index N such that $L_n = L_N$ a.s. for all n ≥ N. The situation is quite different if β ∈ [0, 1). In this case, the a.s. behavior of $L_n$ for large $n$ can be determined by a simple martingale argument.

In the rest of this section, we let β ∈ [0, 1). Define
$$\overline{R}_n = \frac{1}{n}\sum_{i=1}^n R_i$$
and suppose that
$$\overline{R}_n\xrightarrow{a.s.} r \quad\text{for some constant } r. \tag{4}$$
Since $R_i\ge u$ for all $i$, then r ≥ u > 0. Define also
$$\lambda(\beta) = \frac{\alpha\,c}{r}\ \text{ if }\beta = 0, \qquad \lambda(\beta) = \alpha\,\frac{\Gamma(c+1)}{\Gamma(c+\beta)}\,\frac{1}{\beta\,r^{1-\beta}}\ \text{ if }\beta\in(0,1),$$
$$a_n(\beta) = \log n\ \text{ if }\beta = 0, \qquad a_n(\beta) = n^\beta\ \text{ if }\beta\in(0,1).$$

Theorem 4. If β ∈ [0, 1) and condition (4) holds, then $\frac{L_n}{a_n(\beta)}\xrightarrow{a.s.}\lambda(\beta)$.
Proof. By Lemma 2,
$$\Lambda_j = \alpha\,\frac{\Gamma(c+1)}{\Gamma(c+\beta)}\,\Bigl(c+\sum_{i=1}^j R_i\Bigr)^{\beta-1}\Bigl\{1 + h\Bigl(c+\sum_{i=1}^j R_i\Bigr)\Bigr\},$$
where the function h satisfies $|h(x)|\le k/x$ for all x ≥ c + u and some constant k. Write
$$\frac{\sum_{j=1}^{n-1}\Lambda_j}{a_n(\beta)} = \alpha\,\frac{\Gamma(c+1)}{\Gamma(c+\beta)}\;\frac{\sum_{j=1}^{n-1}j^{\beta-1}\bigl(\frac{c}{j}+\overline{R}_j\bigr)^{\beta-1}}{a_n(\beta)} + D_n, \quad\text{where}\quad D_n = \alpha\,\frac{\Gamma(c+1)}{\Gamma(c+\beta)}\;\frac{\sum_{j=1}^{n-1}\bigl(c+\sum_{i=1}^j R_i\bigr)^{\beta-1}\,h\bigl(c+\sum_{i=1}^j R_i\bigr)}{a_n(\beta)}.$$
In view of (4), one obtains $D_n\xrightarrow{a.s.}0$ and $\frac{\sum_{j=1}^{n-1}\Lambda_j}{a_n(\beta)}\xrightarrow{a.s.}\lambda(\beta)$. Next, define
$$T_0 = 0 \quad\text{and}\quad T_n = \sum_{j=1}^n\frac{N_j - E(N_j\mid\mathcal{F}_{j-1})}{a_j(\beta)} = \sum_{j=1}^n\frac{N_j - \Lambda_{j-1}}{a_j(\beta)}.$$
Then, $(T_n)$ is a martingale with respect to $(\mathcal{F}_n)$ and
$$E(T_n^2) = \sum_{j=1}^n\frac{E(N_j - \Lambda_{j-1})^2}{a_j(\beta)^2} = \sum_{j=1}^n\frac{E\bigl\{E\bigl((N_j-\Lambda_{j-1})^2\mid\mathcal{F}_{j-1}\bigr)\bigr\}}{a_j(\beta)^2} = \sum_{j=1}^n\frac{E(\Lambda_{j-1})}{a_j(\beta)^2}.$$
Since $E(\Lambda_j) = O\bigl(j^{-(1-\beta)}\bigr)$, then $\sup_n E(T_n^2) = \sum_{j=1}^\infty\frac{E(\Lambda_{j-1})}{a_j(\beta)^2} < \infty$. Thus, $T_n$ converges a.s., and the Kronecker lemma implies
$$\lim_n\frac{L_n}{a_n(\beta)} = \lim_n\frac{\sum_{j=1}^n N_j}{a_n(\beta)} = \lim_n\frac{\sum_{j=1}^n\Lambda_{j-1}}{a_n(\beta)} = \lim_n\frac{\Lambda_0 + \sum_{j=1}^{n-1}\Lambda_j}{a_n(\beta)} = \lambda(\beta) \quad\text{a.s.}$$
In view of Theorem 4, as far as β ∈ [0, 1) and the weights $R_n$ meet the SLLN, $L_n$ essentially behaves for large $n$ as in the standard IBP model. The only difference is that the limit constant λ(β) depends on r as well. (In the standard IBP one has r = 1.) Note also that, the $R_n$ being independent, a sufficient condition for (4) is
$$\sup_n E(R_n^2) < \infty \quad\text{and}\quad \frac{\sum_{i=1}^n E(R_i)}{n}\longrightarrow r.$$
We next turn to the limiting distribution of $L_n$. To get something, stronger conditions on the $R_n$ are to be requested.

Theorem 5. If β ∈ [0, 1) and
$$\overline{R}_n\xrightarrow{a.s.}r \quad\text{and}\quad \frac{\sum_{j=1}^n j^{\beta-1}\,E|\overline{R}_j - r|}{\sqrt{a_n(\beta)}}\longrightarrow 0 \tag{5}$$
for some constant r, then
$$\sqrt{a_n(\beta)}\,\Bigl\{\frac{L_n}{a_n(\beta)} - \lambda(\beta)\Bigr\}\longrightarrow N\bigl(0,\lambda(\beta)\bigr) \quad\text{stably}.$$
Proof. We first prove that
$$\sqrt{a_n(\beta)}\,\Bigl\{\frac{\sum_{j=1}^n\Lambda_{j-1}}{a_n(\beta)} - \lambda(\beta)\Bigr\}\xrightarrow{P}0. \tag{6}$$
By Lemma 2 and some calculations, condition (6) is equivalent to
$$Y_n := \frac{\sum_{j=1}^{n-1}\Bigl\{\bigl(c+\sum_{i=1}^j R_i\bigr)^{\beta-1} - (r\,j)^{\beta-1}\Bigr\}}{\sqrt{a_n(\beta)}}\xrightarrow{P}0.$$
Let $v = \min(u, c+u)$. Then, v > 0, r ≥ u ≥ v and $c+\sum_{i=1}^j R_i\ge v\,j$; see the proof of Lemma 2. Hence, one can estimate as follows:
$$E\Bigl|(r\,j)^{\beta-1} - \Bigl(c+\sum_{i=1}^j R_i\Bigr)^{\beta-1}\Bigr| \le \frac{E\bigl|\bigl(c+\sum_{i=1}^j R_i\bigr)^{1-\beta} - (r\,j)^{1-\beta}\bigr|}{(v\,j)^{2(1-\beta)}} \le \frac{1-\beta}{(v\,j)^{2(1-\beta)}\,(v\,j)^{\beta}}\;E\Bigl|c+\sum_{i=1}^j R_i - r\,j\Bigr| \le \frac{1-\beta}{v^{2-\beta}}\Bigl\{\frac{|c|}{j^{2-\beta}} + \frac{E|\overline{R}_j - r|}{j^{1-\beta}}\Bigr\}.$$
Thus, condition (5) implies $E|Y_n|\to 0$. This proves condition (6).
Next, define
$$U_n = \sqrt{a_n(\beta)}\,\Bigl\{\frac{L_n}{a_n(\beta)} - \frac{\sum_{j=1}^n\Lambda_{j-1}}{a_n(\beta)}\Bigr\} = \frac{\sum_{j=1}^n(N_j - \Lambda_{j-1})}{\sqrt{a_n(\beta)}}.$$
In view of (6), it suffices to show that $U_n\longrightarrow N\bigl(0,\lambda(\beta)\bigr)$ stably. To this end, for n ≥ 1 and j = 1, …, n, define
$$U_{n,j} = \frac{N_j - \Lambda_{j-1}}{\sqrt{a_n(\beta)}}, \qquad \mathcal{R}_{n,0} = \mathcal{F}_0, \qquad \mathcal{R}_{n,j} = \mathcal{F}_j.$$
Then, $E(U_{n,j}\mid\mathcal{R}_{n,j-1}) = 0$ a.s., $\mathcal{R}_{n,j}\subset\mathcal{R}_{n+1,j}$ and $U_n = \sum_j U_{n,j}$. Thus, by the martingale CLT, $U_n\longrightarrow N\bigl(0,\lambda(\beta)\bigr)$ stably provided
$$\text{(i)}\ \sum_{j=1}^n U_{n,j}^2\xrightarrow{P}\lambda(\beta), \qquad \text{(ii)}\ \max_{1\le j\le n}|U_{n,j}|\xrightarrow{P}0, \qquad \text{(iii)}\ \sup_n E\Bigl(\max_{1\le j\le n}U_{n,j}^2\Bigr) < \infty;$$
see e.g. Theorem 3.2, page 58, of [19]. Let
$$H_j = (N_j - \Lambda_{j-1})^2 \quad\text{and}\quad D_n = \frac{\sum_{j=1}^n\bigl\{H_j - E(H_j\mid\mathcal{F}_{j-1})\bigr\}}{a_n(\beta)}.$$
By the Kronecker lemma and the same martingale argument used in the proof of Theorem 4, $D_n\xrightarrow{a.s.}0$. Since $\overline{R}_j\xrightarrow{a.s.}r$ and $E(H_j\mid\mathcal{F}_{j-1}) = \Lambda_{j-1}$ a.s., then
$$\sum_{j=1}^n U_{n,j}^2 = \frac{\sum_{j=1}^n H_j}{a_n(\beta)} = D_n + \frac{\sum_{j=1}^n\Lambda_{j-1}}{a_n(\beta)}\xrightarrow{a.s.}\lambda(\beta).$$
This proves condition (i). As to (ii), fix k ≥ 1 and note that, for n > k,
$$\max_{1\le j\le n}U_{n,j}^2 \le \frac{\max_{1\le j\le k}H_j}{a_n(\beta)} + \max_{k<j\le n}\frac{H_j}{a_j(\beta)} \le \frac{\max_{1\le j\le k}H_j}{a_n(\beta)} + \sup_{j>k}\frac{H_j}{a_j(\beta)}.$$
Hence, $\limsup_n\max_{1\le j\le n}U_{n,j}^2 \le \limsup_n\frac{H_n}{a_n(\beta)}$ and condition (ii) follows from
$$\frac{H_n}{a_n(\beta)} = \frac{\sum_{j=1}^n H_j}{a_n(\beta)} - \frac{\sum_{j=1}^{n-1}H_j}{a_n(\beta)}\xrightarrow{a.s.}0.$$
Finally, condition (iii) is an immediate consequence of Lemma 2 and
$$E\Bigl(\max_{1\le j\le n}U_{n,j}^2\Bigr) \le \frac{\sum_{j=1}^n E(H_j)}{a_n(\beta)} = \frac{\sum_{j=1}^n E(\Lambda_{j-1})}{a_n(\beta)}.$$
Note that, letting $R_n = 1$ for all $n$, Theorem 5 provides the limiting distribution of $L_n$ in the standard IBP model.

For Theorem 5 to apply, condition (5) is to be checked. We now give conditions implying (5). In particular, (5) is automatically true whenever $\sup_n E(R_n^2) < \infty$ and $E(R_n) = r$ for all $n$.
Lemma 6. Condition (5) holds provided β ∈ [0, 1) and
$$\sup_n E(R_n^2) < \infty \quad\text{and}\quad \sqrt{n^\beta\log n}\;\bigl|E(R_n) - r\bigr|\longrightarrow 0. \tag{7}$$

Proof. Let $a = \sup_n E(R_n^2)$. Because of (7), $E(R_n)\to r$. Thus, $\overline{R}_n\xrightarrow{a.s.}r$, since a < ∞ and $(R_n)$ is independent. Moreover,
$$E|\overline{R}_j - r| \le E\bigl|\overline{R}_j - E(\overline{R}_j)\bigr| + \bigl|E(\overline{R}_j) - r\bigr| \le \sqrt{\mathrm{var}(\overline{R}_j)} + \bigl|E(\overline{R}_j) - r\bigr| \le \sqrt{a/j} + \bigl|E(\overline{R}_j) - r\bigr|.$$
Hence, the second part of condition (5) follows from the above inequality and condition (7).
A last remark is in order. Fix a set B ∈ B and define $L_n(B) = \mathrm{card}(B\cap S_n)$, the number of dishes, belonging to B, tried by the first $n$ customers. The same arguments used for $L_n = L_n(X)$ apply to $L_n(B)$ and allow to extend Theorems 4-5 as follows.

Theorem 7. Let β ∈ [0, 1) and B ∈ B. If condition (4) holds, then
$$\frac{L_n(B)}{a_n(\beta)}\xrightarrow{a.s.}m(B)\,\lambda(\beta).$$
Moreover, under condition (5), one obtains
$$\sqrt{a_n(\beta)}\,\Bigl\{\frac{L_n(B)}{a_n(\beta)} - m(B)\,\lambda(\beta)\Bigr\}\longrightarrow N\bigl(0, m(B)\,\lambda(\beta)\bigr) \quad\text{stably}.$$

Proof. Let $N_i(B)$ denote the number of new dishes, belonging to B, tried by customer $i$. Then, $L_n(B) = \sum_{i=1}^n N_i(B)$ and $N_{i+1}(B)\mid\mathcal{F}_i\sim\mathrm{Poi}(m(B)\,\Lambda_i)$. Therefore, it suffices to repeat the proofs of Theorems 4-5 with $N_i(B)$ in the place of $N_i$ and $m(B)\,\Lambda_i$ in the place of $\Lambda_i$.
5. Asymptotic behavior of $\overline{K}_n$

5.1. The result. Let $K_i = M_i(X)$ be the number of dishes experimented by customer $i$ and
$$\overline{K}_n = \frac{1}{n}\sum_{i=1}^n K_i$$
the mean number of dishes tried by each of the first $n$ customers.

In IBP-type models, $\overline{K}_n$ is a meaningful quantity. One reason is the following. If the parameters m, α, β and c are unknown, $E(K_{n+1}\mid\mathcal{F}_n)$ cannot be evaluated in closed form. Then, $\overline{K}_n$ could be used as an empirical predictor of the next random variable $K_{n+1}$. Such a prediction is consistent whenever
$$V_n := \overline{K}_n - E\bigl(K_{n+1}\mid\mathcal{F}_n\bigr)\xrightarrow{P}0.$$
But this is usually true. For instance, $V_n\xrightarrow{a.s.}0$ if the sequence $(K_n)$ is c.i.d. with respect to $(\mathcal{F}_n)$; see [4] and [5]. In general, the higher the convergence rate of $V_n$, the better $\overline{K}_n$ as a predictor of $K_{n+1}$.
Under some conditions, $\overline{K}_n\xrightarrow{a.s.}Z$ for some real random variable Z. Thus, two random centerings for $\overline{K}_n$ should be considered. One (and the more natural) is Z, while the other is $E(K_{n+1}\mid\mathcal{F}_n)$, to evaluate the performances of $\overline{K}_n$ as a predictor of $K_{n+1}$. Taking $\sqrt{n}$ as a norming factor, this leads to investigate
$$\sqrt{n}\,\bigl(\overline{K}_n - Z\bigr) \quad\text{and}\quad \sqrt{n}\,V_n.$$
The limiting distributions of these quantities are provided by the next result.

Theorem 8. Suppose β < 1/2 and
$$\sup_n R_n\le b, \qquad E(R_n)\longrightarrow r, \qquad E(R_n^2)\longrightarrow q,$$
for some constants b, r, q. Then,
$$\overline{K}_n\xrightarrow{a.s.}Z \quad\text{and}\quad \frac{1}{n}\sum_{i=1}^n K_i^2\xrightarrow{a.s.}Q,$$
where Z and Q are real random variables such that $Z^2 < Q$ a.s. Moreover,
$$\sqrt{n}\,\bigl(\overline{K}_n - Z\bigr)\longrightarrow N(0,\sigma^2)\ \text{ stably and}\quad \sqrt{n}\,\bigl(\overline{K}_n - E(K_{n+1}\mid\mathcal{F}_n)\bigr)\longrightarrow N(0,\tau^2)\ \text{ stably},$$
where
$$\sigma^2 = \frac{2q - r^2}{r^2}\,(Q - Z^2), \qquad \tau^2 = \frac{q - r^2}{r^2}\,(Q - Z^2).$$
If $R_n = 1$ for all $n$, the previous results hold for β < 1 (and not only for β < 1/2).
Theorem 8 is a consequence of Theorem 1 of [6]. The proof, even if conceptually simple, is technically rather hard.
Theorem 8 fails, as it stands, for β ∈ [1/2, 1). Let $\mu_n$ denote the probability distribution of the random variable $\sqrt{n}\,(\overline{K}_n - Z)$. The sequence $(\mu_n)$ might fail to be tight if β ∈ (1/2, 1). For instance, $(\mu_n)$ is not tight if β ∈ (1/2, 1) and $R_n = r$ for all $n$, where r is any constant such that r ≠ 1. If β = 1/2, instead, $(\mu_n)$ is tight but the possible limit laws are not mixtures of centered Gaussian distributions. Thus, even if $\sqrt{n}\,(\overline{K}_n - Z)$ converges stably, the limit kernel is not $N(0,\sigma^2)$.

Since q ≥ r² and Q > Z² a.s., then σ² > 0 a.s. Hence, $N(0,\sigma^2)$ is a non-degenerate kernel. Instead, $N(0,\tau^2)$ may be degenerate. In fact, if q = r², then $N(0,\tau^2) = N(0,0) = \delta_0$. Thus, for q = r², Theorem 8 yields $\sqrt{n}\,V_n\xrightarrow{P}0$.

The convergence rate of $V_n$ is $n^{-1/2}$ when q > r². Such a rate is even higher if q = r², since $\sqrt{n}\,V_n\xrightarrow{P}0$. Overall, $\overline{K}_n$ seems to be a good predictor of $K_{n+1}$ for large $n$.
Among other things, Theorem 8 can be useful to get asymptotic confidence bounds for Z. Define in fact
$$\widehat{\sigma}_n^2 = \Bigl\{\frac{(2/n)\sum_{i=1}^n R_i^2}{\overline{R}_n^2} - 1\Bigr\}\,\Bigl\{\frac{1}{n}\sum_{i=1}^n K_i^2 - \overline{K}_n^2\Bigr\}.$$
Since $\widehat{\sigma}_n^2\xrightarrow{a.s.}\sigma^2$ and σ² > 0 a.s., one obtains
$$I_{\{\widehat{\sigma}_n>0\}}\;\frac{\sqrt{n}\,\bigl(\overline{K}_n - Z\bigr)}{\widehat{\sigma}_n}\longrightarrow N(0,1) \quad\text{stably}.$$
Thus, $\overline{K}_n\pm\frac{u_a}{\sqrt{n}}\,\widehat{\sigma}_n$ provides an asymptotic confidence interval for Z with (approximate) level 1 − a, where $u_a$ is such that $N(0,1)(u_a, +\infty) = a/2$.
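The interval is straightforward to compute from observed $K_i$ and $R_i$. A sketch (function names ours):

```python
import math

def sigma2_hat(K, R):
    """Plug-in estimate of sigma^2 defined above:
    ((2/n) sum R_i^2 / Rbar_n^2 - 1) * ((1/n) sum K_i^2 - Kbar_n^2)."""
    n = len(K)
    Rbar = sum(R) / n
    Kbar = sum(K) / n
    weight_factor = (2.0 / n) * sum(r * r for r in R) / Rbar ** 2 - 1.0
    spread = sum(k * k for k in K) / n - Kbar ** 2
    return weight_factor * spread

def confidence_interval(K, R, u_a=1.96):
    """Asymptotic interval Kbar_n +/- (u_a / sqrt(n)) * sigma_hat_n for the
    random limit Z of Kbar_n (u_a = 1.96 gives approximate level 0.95)."""
    n = len(K)
    Kbar = sum(K) / n
    half = (u_a / math.sqrt(n)) * math.sqrt(max(sigma2_hat(K, R), 0.0))
    return (Kbar - half, Kbar + half)
```

When all $R_i = 1$ (the standard IBP), the weight factor reduces to 1, and $\widehat{\sigma}_n^2$ is just the empirical variance of the $K_i$.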
Theorem 8 works if β ∈ [0, 1) and $R_n = 1$ for all $n$, that is, it applies to the standard IBP model. Also, in this case, the convergence rate of $V_n$ is greater than $n^{-1/2}$ (since q = r² = 1). Hence, $\overline{K}_n$ is a good (asymptotic) predictor of $K_{n+1}$.

5.2. The proof. We begin with a couple of results from [6]. Let $(X_n)$ be a sequence of real integrable random variables, adapted to a filtration $(\mathcal{U}_n)$, and let
$$\overline{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \quad\text{and}\quad Z_n = E\bigl(X_{n+1}\mid\mathcal{U}_n\bigr).$$
Lemma 9. If $\sum_n n^{-2}E(X_n^2) < \infty$ and $Z_n\xrightarrow{a.s.}Z$, for some real random variable Z, then
$$\overline{X}_n\xrightarrow{a.s.}Z \quad\text{and}\quad n\sum_{k\ge n}\frac{X_k}{k^2}\xrightarrow{a.s.}Z.$$

Proof. This is exactly Lemma 2 of [6].
Theorem 10. Suppose $(X_n^2)$ is uniformly integrable and

(j) $n^3\,E\bigl\{E(Z_{n+1}\mid\mathcal{U}_n) - Z_n\bigr\}^2\longrightarrow 0$.

Then, $Z_n\xrightarrow{a.s.}Z$ and $\overline{X}_n\xrightarrow{a.s.}Z$ for some real random variable Z. Moreover,
$$\sqrt{n}\,\bigl(\overline{X}_n - Z_n\bigr)\longrightarrow N(0, U)\ \text{ stably and}\quad \sqrt{n}\,\bigl(\overline{X}_n - Z\bigr)\longrightarrow N(0, U+V)\ \text{ stably}$$
for some real random variables U and V, provided

(jj) $E\bigl\{\sup_{k\ge 1}\sqrt{k}\,|Z_{k-1} - Z_k|\bigr\} < \infty$,

(jjj) $\frac{1}{n}\sum_{k=1}^n\bigl\{X_k - Z_{k-1} + k\,(Z_{k-1} - Z_k)\bigr\}^2\xrightarrow{P}U$,

(jv) $n\sum_{k\ge n}(Z_{k-1} - Z_k)^2\xrightarrow{a.s.}V$.
Proof. First note that $(Z_n)$ is a quasi-martingale because of (j), and it is uniformly integrable since $(X_n^2)$ is uniformly integrable. Hence, $Z_n\xrightarrow{a.s.}Z$. By Lemma 9, one also obtains $\overline{X}_n\xrightarrow{a.s.}Z$. Next, assume conditions (jj)-(jjj)-(jv). By Theorem 1 of [6] (and the subsequent remarks) it is enough to show that
$$\sqrt{n}\;E\Bigl\{\sup_{k\ge n}|Z_{k-1}-Z_k|\Bigr\}\longrightarrow 0 \quad\text{and}\quad \frac{1}{\sqrt{n}}\,E\Bigl\{\max_{1\le k\le n}k\,|Z_{k-1}-Z_k|\Bigr\}\longrightarrow 0.$$
Let $D_k = |Z_{k-1} - Z_k|$. Because of (jv),
$$n\,D_n^2 = n\sum_{k\ge n}D_k^2 - \frac{n}{n+1}\,(n+1)\sum_{k\ge n+1}D_k^2\xrightarrow{a.s.}0.$$
Thus $\sup_{k\ge n}\sqrt{k}\,D_k\xrightarrow{a.s.}0$, and condition (jj) implies
$$\sqrt{n}\;E\Bigl\{\sup_{k\ge n}D_k\Bigr\}\le E\Bigl\{\sup_{k\ge n}\sqrt{k}\,D_k\Bigr\}\longrightarrow 0.$$
Further, for 1 ≤ h < n, one obtains
$$E\Bigl\{\max_{1\le k\le n}k\,D_k\Bigr\}\le E\Bigl\{\max_{1\le k\le h}k\,D_k\Bigr\} + \sqrt{n}\,E\Bigl\{\max_{h<k\le n}\sqrt{k}\,D_k\Bigr\}\le E\Bigl\{\max_{1\le k\le h}k\,D_k\Bigr\} + \sqrt{n}\,E\Bigl\{\sup_{k>h}\sqrt{k}\,D_k\Bigr\}.$$
Hence, it suffices to note that
$$\limsup_n\frac{1}{\sqrt{n}}\,E\Bigl\{\max_{1\le k\le n}k\,D_k\Bigr\}\le E\Bigl\{\sup_{k>h}\sqrt{k}\,D_k\Bigr\}\ \text{ for all } h, \quad\text{and}\quad \lim_h E\Bigl\{\sup_{k>h}\sqrt{k}\,D_k\Bigr\} = 0.$$
Note that condition (j) is automatically true in case (X n ) is c.i.d. with respect to the filtration (U n ). We are now able to prove Theorem 8.
Proof of Theorem 8. We apply Theorem 10 with $X_n = K_n$ and $\mathcal{U}_n = \mathcal{F}_n$. Let
$$J_n(x) = \frac{\sum_{i=1}^n R_iM_i\{x\} - \beta}{\sum_{i=1}^n R_i + c} \quad\text{for } x\in X.$$
Note that
$$\sum_{x\in S_n}J_n(x) = \frac{\sum_{i=1}^n R_i\sum_{x\in S_n}M_i\{x\} - \beta\,L_n}{\sum_{i=1}^n R_i + c} = \frac{\sum_{i=1}^n R_iK_i - \beta\,L_n}{\sum_{i=1}^n R_i + c},$$
and recall the notation
$$\mathcal{G}_n = \mathcal{F}_n\vee\sigma(R_{n+1}) = \sigma(M_1,\dots,M_n,R_1,\dots,R_n,R_{n+1}).$$
Uniform integrability of $(K_n^2)$. It suffices to show that $\sup_n E(e^{tK_n}) < \infty$ for some t > 0. In particular, $(K_n^2)$ is uniformly integrable for β < 0, since Lemma 3 yields
$$\sup_n E\bigl(e^{K_n}\bigr)\le\sup_n E\bigl(e^{L_n}\bigr) = E\bigl(e^L\bigr) < \infty \quad\text{if } \beta < 0.$$
Suppose β ∈ [0, 1/2). Define
$$g(t) = e^t - 1 \quad\text{and}\quad W_n = \frac{\sum_{i=1}^n R_iK_i}{\sum_{i=1}^n R_i + c}.$$
Arguing as in Lemma 1 and since $\Lambda_n\le D\,n^{\beta-1}$ for some constant D, one obtains
$$E\bigl(e^{tK_{n+1}}\mid\mathcal{G}_n\bigr) = e^{g(t)\Lambda_n}\prod_{x\in S_n}\bigl\{1 + g(t)\,J_n(x)\bigr\}\le\exp\Bigl\{g(t)\,\Lambda_n + g(t)\sum_{x\in S_n}J_n(x)\Bigr\}$$
$$\le\exp\Bigl\{\frac{D\,g(t)}{n^{1-\beta}} + g(t)\,\frac{\sum_{i=1}^n R_iK_i - \beta\,L_n}{\sum_{i=1}^n R_i + c}\Bigr\}\le\exp\Bigl\{\frac{D\,g(t)}{n^{1-\beta}} + g(t)\,W_n\Bigr\} \quad\text{a.s.}$$
Hence, it is enough to show that $\sup_n E(e^{tW_n}) < \infty$ for some t > 0. We first prove $E(e^{tW_n}) < \infty$ for all n ≥ 1 and t > 0, and subsequently $\sup_n E(e^{tW_n}) < \infty$ for a suitable t > 0. Define
$$U_n = \frac{R_{n+1}}{\sum_{i=1}^{n+1}R_i + c}.$$
Since $U_n$ is $\mathcal{G}_n$-measurable,
$$E\bigl(e^{tW_{n+1}}\bigr) = E\Bigl\{\exp\bigl\{t\,W_n(1-U_n)\bigr\}\,E\bigl(e^{t\,U_nK_{n+1}}\mid\mathcal{G}_n\bigr)\Bigr\}\le E\Bigl\{\exp\Bigl\{\frac{D\,g(tU_n)}{n^{1-\beta}}\Bigr\}\exp\bigl\{t\,W_n + \bigl(g(tU_n) - t\,U_n\bigr)W_n\bigr\}\Bigr\}.$$
On noting that $U_n\le b/(nu)$,
$$E\bigl(e^{tW_{n+1}}\bigr)\le\exp\Bigl\{\frac{D\,g\bigl(\frac{tb}{nu}\bigr)}{n^{1-\beta}}\Bigr\}\,E\Bigl\{e^{\{t + g(\frac{tb}{nu})\}\,W_n}\Bigr\}.$$
Iterating this procedure, one obtains
$$E\bigl(e^{tW_{n+1}}\bigr)\le a_n(t)\,E\bigl(e^{b_n(t)\,W_1}\bigr)$$
for suitable constants $a_n(t)$ and $b_n(t)$. Since $K_1\sim\mathrm{Poi}(\alpha)$ and $W_1 = \frac{R_1K_1}{R_1 + c}$