Clustering Species With Residual Covariance Matrix in Joint Species Distribution Models

Academic year: 2021

S.1

DATA PREPROCESSING

For the environmental covariates, we considered precipitation of the coldest quarter, annual precipitation, annual mean temperature, mean temperature of the wettest quarter, and topographic predictors (i.e. slope) at 100 m resolution. For the analysis in this paper, we summarized the environmental covariates with the two main PCA axes (see Fig. S1), which account for 80% of the variation. These first two axes are included as covariates with a quadratic term (using orthogonal polynomials). After retaining the plots with at least 60% of non-missing entries, we obtained 1 139 plots. For the analysis we used the 111 most abundant species (out of 136 dominant species) that were present in at least 2% of the plots. The remaining species were pooled into the 'other' category using the built-in gjamTrim function in the GJAM package; however, we discarded these rare species from further analysis.
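The preprocessing above (a PCA summary of the covariates followed by orthogonal quadratic terms) was done in R; as a rough illustration of the same idea, here is a minimal Python sketch, with synthetic data standing in for the real environmental covariates:

```python
import numpy as np

def pca_axes(X, n_axes=2):
    """First principal-component scores of the column-centred matrix X (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_axes].T
    explained = s[:n_axes] ** 2 / np.sum(s ** 2)   # fraction of variance per axis
    return scores, explained

def orthogonal_quadratic(x):
    """Orthogonal linear + quadratic columns, analogous to R's poly(x, 2)."""
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    Q, _ = np.linalg.qr(X)      # Gram-Schmidt orthogonalisation
    return Q[:, 1:]             # drop the constant column

rng = np.random.default_rng(0)
env = rng.normal(size=(200, 5))                 # 5 synthetic environmental covariates
scores, expl = pca_axes(env, n_axes=2)
covariates = np.hstack([orthogonal_quadratic(scores[:, 0]),
                        orthogonal_quadratic(scores[:, 1])])
```

The QR step mirrors what R's `poly` does: the linear and quadratic columns are orthogonal to each other and to the intercept, which stabilises the regression.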

S.2

SELECTION OF THE NUMBER OF LATENT FACTORS

We use the low-rank matrix Λ with rank r to approximate the residual covariance matrix. The rank of Λ determines the resolution with which the JSDM models the residual covariance: the higher the rank, the closer we are to the full covariance matrix, at the cost of estimating more parameters. In the GJAM model, this parameter must be specified before fitting. We chose r = 5 using the Deviance information criterion (DIC) implemented in the model

(Figure S2). This value is coherent with the values used in Chen et al. (2018); Taylor-Rodriguez et al. (2017); Warton et al. (2015), where low-rank matrices were able to approximate well the full-rank

residual covariance matrices. For identifiability, the matrix Λ has to be of full column rank (Taylor-Rodriguez et al., 2017; Geweke and Singleton, 1980). In our case, rank(Λ) = min{r, K}, where r is the number of factors and K is the number of distinct (repeated) rows of Λ, i.e. the number of clusters, so r is upper bounded by K. We have a prior guess on K, and r satisfies the inequality for this guess. However, as we do not know whether the prior guess is correct, we need to inspect the posterior distribution of the number of clusters. From this analysis, we can confirm that the number of factors is smaller than the posterior estimate K̂ (the smallest estimate is 18).
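The rank bound rank(Λ) = min{r, K} for a loading matrix whose rows repeat within clusters can be checked numerically; a small sketch with the dimensions of this paper (S = 111, r = 5) and an illustrative K = 18:

```python
import numpy as np

S, r, K = 111, 5, 18
rng = np.random.default_rng(1)
V = rng.normal(size=(K, r))          # one distinct row per cluster, drawn from H
labels = np.arange(S) % K            # cluster labels; every cluster is occupied
Lam = V[labels]                      # Lambda: rows repeated within clusters
rank = int(np.linalg.matrix_rank(Lam))
```

Repeating rows does not enlarge the row space, so the rank is capped by both the number of factors r and the number of distinct rows K.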

S.3

DISTRIBUTION OF SPECIES TRAIT RATIO FOR DIFFERENT TRAITS

We computed the distribution of species trait ratios for different traits (Landolt nutrient indicator, Landolt light indicator, height (on the logarithmic scale), specific leaf area (SLA), leaf dry matter content (LDMC), leaf carbon concentration (LCC), and leaf nitrogen concentration (LNC)) for all models (DP, DPc, PYc) (Figure S3).

S.4

TECHNICAL BACKGROUND ON DIRICHLET AND PITMAN–YOR PROCESSES

S.4.1 Dirichlet process

The main purpose of this section is to give a brief description of the Dirichlet process (DP) and the Pitman–Yor process; for a complete treatment we refer to classical literature in Bayesian nonparametrics such as Hjort et al. (2010); Ghosal and Van der Vaart (2017). The distributions of both processes are often used as Bayesian nonparametric priors. The Dirichlet

process can be defined through the constructive representation proposed by Sethuraman (1994), the so-called stick-breaking construction. Fix a parameter α > 0 and a probability measure H, and consider two independent families of random variables:

$$V_k \stackrel{iid}{\sim} \mathrm{Beta}(1, \alpha) \quad \text{and} \quad \eta_k^{\ast} \stackrel{iid}{\sim} H, \qquad k = 1, 2, \ldots \tag{S1}$$

Define the random weights as:

$$p_1 = V_1, \qquad p_k = V_k \prod_{j=1}^{k-1} (1 - V_j), \qquad k = 2, 3, \ldots \tag{S2}$$

If

$$G := \sum_{k=1}^{\infty} p_k \, \delta_{\eta_k^{\ast}}, \tag{S3}$$

then G ∼ DP(α, H), i.e. G is a Dirichlet process with parameters α and H. The construction of p = {p_k} can be understood metaphorically as follows. Starting with a stick of length 1, we break it at V_1 and assign p_1 the length of the piece we just broke off. We then recursively break the remaining portion to obtain p_2, p_3, and so forth. Notice that $\sum_{k=1}^{\infty} p_k = 1$ a.s.

In the stick-breaking representation (S3) of the Dirichlet process, the concentration parameter tunes the distribution of the random weights. When α is small, the Beta-distributed variables V_k tend to be closer to 1 than to 0, so the random weights decrease rapidly (in expectation). Conversely, when α is large, the random weights decrease slowly in expectation. Therefore, realisations of G will be concentrated in a few clusters when α is small, and the number of clusters increases with α.
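A minimal sketch of the truncated stick-breaking construction (S1)-(S3) in Python (the truncation level and the values of α are illustrative only):

```python
import numpy as np

def dp_stick_weights(alpha, n_trunc, rng):
    """Truncated stick-breaking draw of DP weights (Sethuraman, 1994):
    V_k ~ Beta(1, alpha); p_k = V_k * prod_{j<k} (1 - V_j)."""
    V = rng.beta(1.0, alpha, size=n_trunc)
    V[-1] = 1.0                        # close the last stick so the weights sum to 1
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))
    return V * leftover

rng = np.random.default_rng(2)
p_small_alpha = dp_stick_weights(0.5, 50, rng)    # mass concentrated on few atoms
p_large_alpha = dp_stick_weights(20.0, 50, rng)   # mass spread over many atoms
```

Setting the last stick-breaking variable to 1 is the usual device that turns the infinite series into an exact finite mixture at the truncation level.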

The Dirichlet process is an infinite-dimensional process, and there are two main approaches for sampling from it: one can use so-called marginal methods, or sample from a finite-dimensional approximation of the process. Marginal methods exhibit slow mixing and are not considered for Gibbs sampling in the GJAM model. An approximation based on truncating the stick-breaking representation, explored by Ishwaran and James (2001), is a common choice, where the truncation number N has to be defined based on the desired approximation error. Another possibility is the finite-dimensional representation by the Dirichlet multinomial process, which approximates the Dirichlet process in the limit (Muliere and Secchi, 2003) and can also be used as a finite-dimensional prior for some finite N (Müller et al., 2015). The DP model uses the latter representation, which allows tractable computation with Gibbs sampling. The sampling scheme for the DP is described in Section S.6.
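The Dirichlet-multinomial representation amounts to drawing a symmetric Dirichlet weight vector; a one-line sketch (the values of N and α are illustrative):

```python
import numpy as np

def dirichlet_multinomial_weights(alpha, N, rng):
    """Weights of the finite-dimensional Dirichlet-multinomial representation:
    p ~ Dir(alpha/N, ..., alpha/N), approximating DP(alpha, H) as N grows."""
    return rng.dirichlet(np.full(N, alpha / N))

rng = np.random.default_rng(3)
p = dirichlet_multinomial_weights(alpha=5.0, N=150, rng=rng)
```

The atoms attached to these weights are then drawn i.i.d. from the base measure H, exactly as in the stick-breaking construction.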

S.4.2 Pitman–Yor process

The Pitman–Yor process, introduced by Pitman and Yor (1997), is a generalization of the Dirichlet process and is also a special case of a larger class of priors known as Gibbs-type priors, introduced in the seminal works of Pitman (2003) and Gnedin and Pitman (2006); see also De Blasi et al. (2015). Similarly to the Dirichlet process, the Pitman–Yor process can be defined using a stick-breaking construction. Fix scalar parameters α > 0 and σ ∈ (0, 1) and a probability measure H, and consider two independent families of random variables:

$$V_k \sim \mathrm{Beta}(1 - \sigma, \alpha + k\sigma) \quad \text{and} \quad \eta_k^{\ast} \stackrel{iid}{\sim} H, \qquad k = 1, 2, \ldots \tag{S4}$$

Define the random weights as:

$$p_1 := V_1, \qquad p_k := V_k \prod_{j=1}^{k-1} (1 - V_j). \tag{S5}$$

If

$$G = \sum_{k=1}^{\infty} p_k \, \delta_{\eta_k^{\ast}}, \tag{S6}$$

then G ∼ PY(α, σ, H), i.e. G is a Pitman–Yor process with concentration parameter α, discount (or diversity) parameter σ and base measure H. Notice that for σ = 0 we recover the Dirichlet process: PY(α, 0, H) = DP(α, H). In terms of the stick-breaking representation, the difference lies in the sampling of the Beta-distributed variables V_k defining the stick-breaking weights: they now depend on two parameters. These variables are no longer identically distributed and, as k increases, the second Beta parameter increases, decreasing the mean of V_k. Since V_k decreases (in mean) as k grows, the weights p_k decrease (in mean) more slowly than in the Dirichlet process. This implies that the Pitman–Yor process yields a greater number of distinct clusters. Indeed, when data of sample size S are modelled with the Dirichlet process, the number of distinct clusters K_S grows logarithmically with S, while for the Pitman–Yor process K_S grows as a power law in S (specifically, S^σ).
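These contrasting growth rates can be checked against the exact prior means of K_n, which are standard results for the DP and the two-parameter Poisson–Dirichlet family; a sketch (parameter values illustrative):

```python
import numpy as np

def expected_clusters_dp(alpha, n):
    """E[K_n] under DP(alpha): sum_{i=1}^n alpha/(alpha+i-1); ~ alpha*log(n) growth."""
    i = np.arange(1, n + 1)
    return float(np.sum(alpha / (alpha + i - 1)))

def expected_clusters_py(alpha, sigma, n):
    """E[K_n] under PY(alpha, sigma):
    (alpha/sigma) * ( prod_{i=1}^n (alpha+sigma+i-1)/(alpha+i-1) - 1 ); ~ n**sigma growth."""
    i = np.arange(1, n + 1)
    ratio = float(np.prod((alpha + sigma + i - 1) / (alpha + i - 1)))
    return (alpha / sigma) * (ratio - 1.0)
```

For small σ the PY mean collapses onto the DP mean, while for moderate σ it dominates it at large n, reproducing the log-versus-power-law contrast described above.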

Similarly to the Dirichlet process, there are two main approaches for sampling the Pitman–Yor process: marginal and conditional. However, the flexibility of this process comes at a cost: sampling the Pitman–Yor process based on truncation is computationally expensive, especially for large values of σ. Lijoi et al. (2020) define the finite-dimensional Pitman–Yor multinomial process, which approximates the Pitman–Yor process and allows tractable computation. The Pitman–Yor multinomial process is a finite-parameter prior, and we use it for our model PYc.

S.5

SPECIFICATION OF THE HYPERPARAMETERS FOR THE PRIOR

DISTRIBUTION

For all the models (DP, DPc, PYc), we set the number of terms N in the finite-dimensional representations of the prior processes equal to the number of species S. This is a natural choice because N defines an upper bound on the possible number of clusters, which by our definition is the number of species. However, to further reduce the dimension when the number of species S is large, N is bounded by 150 for DP and DPc, so that the finite-dimensional representations still approximate the infinite process well. For the PYc model, the value of N can be chosen for computational tractability. Then, at each step of the Gibbs sampler, we sample n = N times (in our case n = S) from the given process to obtain the rows of matrix A.

S.5.1 Dirichlet process model (DPc)

For each value of the parameter α and number of terms N in the Dirichlet multinomial process, we can use formula (4) with n = S to compute the expected number of clusters E[K_S](α, N). As N and S are fixed for all our models and α is a random variable with distribution Ga(ν₁, ν₂), E[K_S](α, N) is a function f(ν₁, ν₂) of the hyperparameters. To match a prior guess for the expected number of clusters, we fix the variance of the Gamma distribution at 20, solve the equation for the mean of this distribution (ν₁/ν₂), and then recover ν₁, ν₂ from the variance relation ν₁ = 20ν₂². As there is no closed-form solution, we used a simulation-based approach for approximating the function f. This is done by Monte Carlo as follows:

$$\alpha_i \stackrel{iid}{\sim} \mathrm{Ga}(\nu_1, \nu_2), \qquad i = 1, \ldots, m \tag{S7}$$

$$\mathbb{E}[K_S] = \frac{1}{m} \sum_{i=1}^{m} \mathbb{E}[K_S](\alpha_i). \tag{S8}$$

For each pair (ν₁, ν₂) we compute the prior expected number of clusters E[K_S] as follows: (1) sample m values αᵢ from Ga(ν₁, ν₂); (2) compute E[K_S](αᵢ); (3) average E[K_S](αᵢ) over the m values. We then numerically solve the approximate equation E[K_S] = f*(ν₁, ν₂) together with the constraint ν₁ = 20ν₂² and obtain the pair (ν₁, ν₂).

S.5.2 Pitman–Yor process model (PYc)

For the Pitman–Yor multinomial prior, the parameters (α, σ) are fixed. For this process, the prior distribution of the number of clusters is given in Theorem 3 of Lijoi et al. (2020) and is equal to:

$$P(K_{n,N} = k) = \frac{N!}{(N-k)! \, \sigma \, (\alpha+1)_{n-1}} \sum_{l=k}^{n} \frac{1}{N^{l}} \frac{\Gamma(\alpha/\sigma + l)}{\Gamma(\alpha/\sigma + 1)} \, S_{l,k} \, \mathcal{C}_{n,l} \tag{S9}$$

for any k ≤ min{N, n}, where N is the number of terms in the finite representation, n is the number of samples, (x)_n denotes the ascending factorial, S_{l,k} is the Stirling number of the second kind, and \mathcal{C}_{n,k} is the following generalized factorial coefficient (see Charalambides, 2005):

$$\mathcal{C}_{n,k} = \frac{1}{k!} \sum_{i=0}^{k} (-1)^{i} \binom{k}{i} (-i\sigma)_{n} \tag{S10}$$

Using Equation (S9) with n = S for the distribution of K_S, we can compute the expectation E[K_S] and the variance V[K_S]. By specifying E[K_S] = K* and some value for the variance V[K_S], we could find a pair (α, σ) by numerically solving the resulting system of equations. In our implementation, however, we first fix the desired value of σ and then find a suitable α by numerically solving the equation E[K_S](α) = K*.
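The coefficients in (S10) can be computed either from the explicit sum or from the standard triangular recursion C(n+1, k) = (n − σk) C(n, k) + σ C(n, k−1) with C(0, 0) = 1 (Charalambides, 2005); a sketch that cross-checks the two:

```python
import math

def gfc_sum(n, k, sigma):
    """Generalized factorial coefficient via the explicit sum (S10):
    C(n,k) = (1/k!) * sum_i (-1)^i binom(k,i) rising(-i*sigma, n)."""
    def rising(x, n):
        out = 1.0
        for j in range(n):
            out *= x + j            # ascending factorial x(x+1)...(x+n-1)
        return out
    return sum((-1) ** i * math.comb(k, i) * rising(-i * sigma, n)
               for i in range(k + 1)) / math.factorial(k)

def gfc_rec(n, k, sigma):
    """Same coefficients via the triangular recursion
    C(n+1,k) = (n - sigma*k) C(n,k) + sigma C(n,k-1), C(0,0) = 1."""
    C = [[0.0] * (n + 1) for _ in range(n + 1)]
    C[0][0] = 1.0
    for m in range(n):
        for j in range(m + 2):
            C[m + 1][j] = (m - sigma * j) * C[m][j] + (sigma * C[m][j - 1] if j > 0 else 0.0)
    return C[n][k]
```

The recursion is numerically safer than the alternating sum for large n, which matters when evaluating (S9) at n = S.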

S.5.3 Hyperparameters for the priors

The hyperparameters used in the model fitting are reported in Table S1.

S.5.4 Prior distribution of the number of clusters

The prior distribution of the number of clusters in the DP model can be calibrated by varying N while keeping the parameter α fixed (and equal to the number of species, as done by Taylor-Rodriguez et al. (2017)). Alternatively, the DP model could be calibrated by selecting a suitable parameter α. While this second approach is not used in the DP model, we provide the resulting prior distribution on the number of clusters for comparison. The difference between the two approaches is that they imply two different prior distributions for the number of clusters. Indeed, for the parameters of our case study, the prior induced by the Dirichlet process in the DP model when suitably fixing N (DPM1 in Figure S4) is extremely peaked compared to the one induced when suitably fixing α instead (DPM2 in Figure S4). The prior induced by the additional hierarchical level on α, as in the DPc model, is also shown in Figure S4.

S.6

GIBBS SAMPLING

In this section, we first describe the Gibbs sampling scheme for the original GJAM model and then describe our modified version of the Gibbs sampler. For identifiability reasons, we need to reparametrize Σ*: we work instead with the correlation matrix R = D^{-1/2} Σ* D^{-1/2}, where Σ* = ΛΛᵀ + σ² I_S with Λ as in eq. (1), and D is the diagonal matrix containing diag(Σ*). The reparametrization of Σ* leads to a reparametrization of the latent variables (see details in Taylor-Rodriguez et al., 2017). Here y_i = (y_{i1}, ..., y_{iS}) is the vector of species observations at plot i, and Γ(y_{ij}) is (−∞, 0] if y_{ij} = 0 and (0, ∞) if y_{ij} = 1.

The matrix A is represented as A = Q(k)V, where V is the N × r matrix whose rows v_j ∼ H, j = 1, ..., N, are all the potential atoms (H is the base distribution of the Bayesian nonparametric prior). Here k = (k_1, ..., k_S) is the vector of cluster labels (1 ≤ k_l ≤ N) and a_l = v_{k_l}. Q(k) is the S × N matrix whose l-th row is e_{k_l}, the N-dimensional vector with 1 in position k_l and 0 elsewhere.
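The representation A = Q(k)V is just row selection; a small numpy sketch with illustrative dimensions (0-based labels here, unlike the 1-based labels in the text):

```python
import numpy as np

rng = np.random.default_rng(5)
S, N, r = 8, 5, 3
V = rng.normal(size=(N, r))                  # all potential atoms, rows v_j ~ H
k = np.array([0, 2, 2, 1, 0, 4, 2, 1])       # cluster labels (0-based, illustrative)
Q = np.eye(N)[k]                             # S x N selection matrix Q(k)
A = Q @ V                                    # row l of A equals v_{k_l}
```

In practice one never materialises Q(k): fancy indexing `V[k]` gives the same matrix directly.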

An outline of the pseudo-code for the Gibbs sampling algorithm in the original GJAM model is provided in Algorithm 1.

In the above scheme, we modified the sampling from the posterior distribution [p | k], where k is the vector of cluster labels. In the original model (DP), the Dirichlet multinomial process is used as a finite approximation of the Dirichlet process with concentration parameter α and base measure N(0, D_z); the posterior distribution of the weights is then defined by the stick-breaking representation.

S.6.1 Dirichlet process with gamma prior on α (DPc)

In our DPc model, we employed the same representation of the Dirichlet multinomial process as in the original DP model, and added a hyperprior distribution for the parameter α. The hyperprior is Ga(ν₁, ν₂), with hyperparameters ν₁, ν₂ defined in Section S.5.1. For the Dirichlet multinomial process there is no conjugacy with the Gamma distribution, so we used a Metropolis–Hastings step with adaptation within the Gibbs sampler. The conditional density is

$$\pi(\alpha \mid p) \propto \frac{\Gamma(\alpha)}{\Gamma(\alpha/N)^{N}} \; p_1^{\alpha/N - 1} \cdots p_N^{\alpha/N - 1} \; \alpha^{\nu_1 - 1} e^{-\nu_2 \alpha}. \tag{S11}$$

To sample from this distribution, we use a Metropolis–Hastings step in our Gibbs sampler. We implemented a random-walk Metropolis–Hastings with a truncated normal proposal, since the target distribution has support on ℝ₊.
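A self-contained sketch of this Metropolis–Hastings step for the density (S11); the adaptation rule, the toy weight vector p, and the hyperparameter values are illustrative and not the exact GJAM implementation:

```python
import math
import numpy as np

def log_target(alpha, p, nu1, nu2):
    """Unnormalised log of (S11): Dirichlet(alpha/N,...,alpha/N) density of the
    weights p times the Ga(nu1, nu2) prior density of alpha."""
    N = len(p)
    return (math.lgamma(alpha) - N * math.lgamma(alpha / N)
            + (alpha / N - 1.0) * float(np.sum(np.log(p)))
            + (nu1 - 1.0) * math.log(alpha) - nu2 * alpha)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rw_mh_alpha(p, nu1, nu2, n_iter=1000, seed=6):
    """Random-walk MH on (0, inf); the truncated-normal proposal is asymmetric,
    hence the Phi(alpha/sp)/Phi(cand/sp) correction in the acceptance ratio."""
    rng = np.random.default_rng(seed)
    alpha, chain = 1.0, []
    for t in range(n_iter):
        sp = 0.5 if t < 100 else 2.38 * (float(np.std(chain)) + 1e-3)  # crude adaptation
        cand = rng.normal(alpha, sp)
        while cand <= 0:                   # rejection yields a trN(alpha, sp^2) draw
            cand = rng.normal(alpha, sp)
        log_ratio = (log_target(cand, p, nu1, nu2) - log_target(alpha, p, nu1, nu2)
                     + math.log(norm_cdf(alpha / sp)) - math.log(norm_cdf(cand / sp)))
        if rng.uniform() < math.exp(min(0.0, log_ratio)):
            alpha = cand
        chain.append(alpha)
    return np.array(chain)

rng0 = np.random.default_rng(0)
p = rng0.dirichlet(np.full(20, 2.0))       # toy weight vector playing the role of p | k
chain = rw_mh_alpha(p, nu1=2.0, nu2=0.5)
```

The Φ-terms correct for the fact that a normal truncated at zero centred on the current state is not a symmetric proposal.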

S.6.2 Pitman–Yor process model (PYc)

For the PYc model, we used the Pitman–Yor multinomial process introduced by Lijoi et al. (2020) as the finite-dimensional approximation of the Pitman–Yor process. In their Theorem 4, the authors derive the posterior distribution of the Pitman–Yor multinomial process as a linear combination of Dirichlet and ratio-stable distributions. We followed the proposed sampling scheme for the posterior distribution, which is described in detail in that paper. Following the notation of Lijoi et al. (2020), we denote samples from the Pitman–Yor multinomial process as Z_1, ..., Z_n | p_N iid∼ p_N, with p_N ∼ PYM(σ, α, N, H), where n is the number of samples, N is the number of terms in the Pitman–Yor multinomial process, and H is the base measure (in our case n = N = S). Then Z^{(N)} = (Z_1, ..., Z_N) has a posterior distribution

Algorithm 1: Gibbs sampling GJAM (Taylor-Rodriguez et al., 2017)

Input: training data {x_i, y_i}_{i=1,...,n}; hyperparameters; truncation number N.
Initialization of parameters B, Z, w, p, k, A.
for t ≤ T do
    sample z*_i ∼ trN_S(B* x_i + A w_i, σ² I_S; Γ(y_i))
    for j = 1, ..., N:
        if j ∉ k then
            sample v_j ∼ N_r(0, D_v)
        else
            let S_j = {l = 1, ..., S : k_l = j} and sample v_j ∼ N_r(μ_v, Σ_{v_j}),
            where Σ_{v_j} = ((|S_j|/σ²) WᵀW + D_v⁻¹)⁻¹ and μ_v = Σ_{v_j} Wᵀ (1/σ²) Σ_{l∈S_j} (z^{(l)} − Xβ_l)
    sample w_i ∼ N_r(Σ_W Aᵀ (1/σ²)(z_i − B x_i), Σ_W), where Σ_W = ((1/σ²) AᵀA + I_r)⁻¹
    sample k ∼ ∏_{l=1}^{S} Σ_{j=1}^{N} p_{lj} δ_j(k_l), where p_{lj} ∝ p_j exp{−(1/(2σ²)) ‖z^{(l)} − Xβ_l − W v_j‖²}
    sample p: for j = 1, ..., N−1 sample
        v_j ∼ Beta(α/N + n_j, (N − j)α/N + Σ_{s>j} n_s), where n_j = Σ_{l=1}^{S} 1{k_l = j},
        and set p_1 = v_1, p_k = v_k ∏_{l=1}^{k−1} (1 − v_l), p_N = 1 − Σ_{j=1}^{N−1} p_j
    sample σ² ∼ IG((nS + ν)/2 + 1, Σ_{i=1}^{n} ‖z_i − B x_i − A w_i‖²/2 + νG/2)
    sample D_v ∼ IW(2 + r + N − 1, VᵀV + 4 diag{1/η_1, ..., 1/η_r})
    sample B* ∼ N_{S×p}((Z* − W Aᵀ)ᵀ X (XᵀX)⁻¹, σ²(XᵀX)⁻¹, I_S)
    compute the variables on the correlation scale:
        z_i = D^{−1/2} z*_i, B = D^{−1/2} B*, R = D^{−1/2}(AAᵀ + σ² I) D^{−1/2}
end

defined by the cluster labels k and the latent variables ℓ^{(k)} = (ℓ_1, ..., ℓ_k):

$$p_N \mid Z^{(N)}, \ell^{(k)} = \sum_{j=1}^{k} (w_j + w_{k+1} r_j) \, \delta_{Z_j^{\ast}} + w_{k+1} \sum_{j=k+1}^{N} r_j \, \delta_{Z_j},$$

$$(w_1, \ldots, w_{k+1}) \mid Z^{(N)}, \ell^{(k)} \sim \mathrm{Dir}(n_1 - \ell_1 \sigma, \ldots, n_k - \ell_k \sigma, \; \alpha + |\ell^{(k)}| \sigma),$$

$$(r_1, \ldots, r_{N-1}) \mid Z^{(N)}, \ell^{(k)} \sim \mathrm{RS}(\sigma, \; \alpha + |\ell^{(k)}| \sigma, \; 1/N, \ldots, 1/N),$$

Algorithm 2: Metropolis–Hastings algorithm for α in DPc

Initialization of the parameter α^{(0)}.
for t ≤ T do
    propose α_c ∼ trN(α^{(t−1)}, σ_p²), where σ_p² = (2.38)² var(α^{(0)}, ..., α^{(t−1)})
    compute p = min{1, [trN(α^{(t−1)}; α_c) π(α_c)] / [trN(α_c; α^{(t−1)}) π(α^{(t−1)})]}
    sample u ∼ Uniform(0, 1)
    if u < p then α^{(t)} ← α_c else α^{(t)} ← α^{(t−1)}
end

where ℓ^{(k)} = (ℓ_1, ..., ℓ_k), ℓ_j ∈ {1, ..., n_j}, |ℓ^{(k)}| = ℓ_1 + ... + ℓ_k, and ℓ^{(k)} follows the distribution:

$$\Pr(\ell_1 = l_1, \ldots, \ell_k = l_k \mid \theta^{(n)}) \propto \Gamma(\alpha/\sigma + |l^{(k)}|) \prod_{j=1}^{k} \frac{\mathcal{C}_{n_j, l_j}}{N^{l_j}}.$$

Sampling ℓ^{(k)} is then simplified by introducing a latent variable V, such that:

$$\Pr(\ell_1 = l_1, \ldots, \ell_k = l_k \mid Z^{(N)}, V) \propto \prod_{j=1}^{k} \left(\frac{V}{N}\right)^{l_j} \mathcal{C}_{n_j, l_j} \tag{S12}$$

$$f(V \mid Z^{(N)}) \propto e^{-V} V^{\alpha/\sigma - 1} \prod_{j=1}^{k} \sum_{l_j=1}^{n_j} \left(\frac{V}{N}\right)^{l_j} \mathcal{C}_{n_j, l_j} \tag{S13}$$

The pseudo-code for the Gibbs sampling algorithm used in the PYc model is provided in Algorithm 3.

Algorithm 3: Algorithm for sampling p in PYc

Hyperparameters: truncation number N. Initialization of parameters ℓ, p, v, k.
for t ≤ T do
    sample V ∼ f(v) as in (S13)
    sample ℓ ∼ ∏_{j=1}^{k} (V/N)^{l_j} C_{n_j, l_j}
    sample W ∼ Dir(n_1 − ℓ_1 σ, ..., n_k − ℓ_k σ, α + |ℓ^{(k)}| σ)
    sample R ∼ RS(σ, α + |ℓ^{(k)}| σ, 1/N, ..., 1/N)
    set p = (w_1 + w_{k+1} r_1, ..., w_k + w_{k+1} r_k, w_{k+1} r_{k+1}, ..., w_{k+1} r_N)
end

Remark


1. To sample V | k, we used the ratio-of-uniforms algorithm for the density f(v). We used the rust package (Northrop, 2019), which implements the generalized ratio-of-uniforms algorithm based on the work of Wakefield et al. (1991).

2. To sample the ratio-stable distribution, we used the approach proposed in Remark 1 of Lijoi et al. (2020). To sample the tempered-stable distribution, we used the function rlaptrans from Ridout (2009).

S.7

CLUSTER ESTIMATE

The Bayesian nonparametric models described in this paper induce a clustering of the rows of matrix Λ. In Bayesian modeling, we obtain the posterior distribution of the random partition of the rows of Λ. The challenging question is how to summarize this posterior distribution and obtain a cluster estimate. We follow the Bayesian methods, based on decision- and information-theoretic approaches, proposed in Wade et al. (2018) and Rastelli and Friel (2018). These approaches are based on specifying a loss function L(c, ĉ) that measures the loss of estimating the true clustering c with ĉ. The optimal cluster estimate is then defined through the minimization of the expected posterior loss, given the data Y_{1:n}:

$$c^{\ast} = \arg\min_{\hat{c}} \; \mathbb{E}[L(c, \hat{c}) \mid Y_{1:n}] = \arg\min_{\hat{c}} \; \sum_{c} L(c, \hat{c}) \, p(c \mid Y_{1:n}), \tag{S14}$$

where p(c | Y_{1:n}) is the posterior distribution of the partition c. To define the loss function, we need to specify a distance on the partition space. Two distances are commonly used: Binder's loss and the Variation of Information (VI). In both articles, the authors propose to use the VI distance for defining the posterior clustering; one property shown in their work is that Binder's loss overestimates the number of clusters. For this reason, in this paper we discuss the results obtained with the VI distance. The two approaches differ in the algorithm used for finding the optimal solution of (S14). For computational efficiency, we used the approach of Rastelli and Friel (2018) for estimating the optimal clustering.

It is important to mention that finding the solution of (S14) in principle requires exploring all possible partitions c, and the partitions obtained through MCMC sampling may not contain the optimal c that minimizes (S14). To explore the partition space beyond the obtained MCMC samples, a 'greedy search' algorithm is employed. Exploring the whole partition space is computationally too expensive, so the algorithm uses an approximation scheme. In addition, the algorithm should be run several times from different starting points.

In our analysis, for each model, we used three restarts with different starting points: every item in a separate cluster, a random partition with sixteen clusters, and the PFG partition. Among all restarts for each model, we chose the optimal clustering with the smallest value of the posterior expected loss.
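The VI distance between two partitions is cheap to compute from the contingency counts; a sketch, with partitions encoded as label lists:

```python
import math
from collections import Counter

def variation_of_information(c1, c2):
    """VI(c1, c2) = H(c1) + H(c2) - 2 I(c1, c2); a metric on the partition space."""
    n = len(c1)
    p1, p2 = Counter(c1), Counter(c2)
    joint = Counter(zip(c1, c2))              # contingency counts
    h1 = -sum(v / n * math.log(v / n) for v in p1.values())
    h2 = -sum(v / n * math.log(v / n) for v in p2.values())
    mi = sum(v / n * math.log((v / n) / ((p1[a] / n) * (p2[b] / n)))
             for (a, b), v in joint.items())
    return h1 + h2 - 2.0 * mi
```

Because labels enter only through the counts, the distance is invariant to relabelling the clusters, as a partition distance must be.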

S.8

SUPPLEMENTARY MATERIALS FOR RESULTS SECTION

S.8.1 Prediction

We evaluated model fit at the species level. Table S2 provides the predictive performances measured by the area under the receiver operating characteristic curve (AUC) on both the training (AUC_in) and testing (AUC_out) datasets.

S.8.2 Comparison clustering results with random partition

To complete the comparison between the clusters estimated by our models and the PFGs, we compared the clusters estimated by our models with two different random partitions. The first random partition assigns species to 16 groups at random (RU). The second assigns species randomly to 16 groups whose sizes match those of the PFGs (RW). In terms of adjusted Rand indices (ARI), the clusters estimated by DP, DPc, PYc are slightly closer to the PFGs than to the random partitions (Figure S5).
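The ARI used for this comparison can likewise be computed from pair counts; a sketch, with the same label-list encoding as above:

```python
import math
from collections import Counter

def adjusted_rand_index(c1, c2):
    """ARI = (RI - E[RI]) / (max RI - E[RI]), from the pair counts of the
    contingency table; 1 for identical partitions, ~0 for random agreement."""
    n = len(c1)
    sum_joint = sum(math.comb(v, 2) for v in Counter(zip(c1, c2)).values())
    sum_a = sum(math.comb(v, 2) for v in Counter(c1).values())
    sum_b = sum(math.comb(v, 2) for v in Counter(c2).values())
    expected = sum_a * sum_b / math.comb(n, 2)
    max_index = (sum_a + sum_b) / 2.0
    return (sum_joint - expected) / (max_index - expected)
```

Like the VI distance, the ARI depends only on co-membership of pairs, so it is invariant to cluster relabelling.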

S.8.3 Sensitivity to prior specification

In this section, we provide the parameters of the DPc and PYc models that were used for studying the sensitivity to the prior calibration (see Table S.10).

Moreover, we can consider the posterior distribution of the parameter α in the DPc model when the prior expected number of clusters is set to 8 and 56, respectively (Fig. S6). We can see that the posterior distributions are close when the prior expected number of clusters is set to 8 and 16, while for a large prior number of clusters (56) the posterior distribution of α is concentrated at higher values.

S.9

RESIDUAL CORRELATION MATRICES

In Figures S7–S9 we provide the residual correlation matrices for the models DP, DPc, PYc.

S.10

CONVERGENCE ASSESSMENT

For each model, we ran two MCMC chains of 80 000 iterations, with 30 000 burn-in iterations. Convergence was assessed by computing the Gelman–Rubin diagnostic and by visual inspection of the traceplots. We provide the effective sample size (Figure S10) and the potential scale reduction factor (Figure S11) for the regression coefficients and the coefficients of the covariance matrix. In Figure S11 we show only coefficients with a potential scale reduction factor below 1.1; the convergence of the few coefficients with values above 1.1 was confirmed through visual inspection of their traceplots. As the potential scale reduction factor is below 1.1 and the effective sample size is relatively high, we can confirm satisfactory convergence. Convergence for the regression coefficients is better than for the coefficients of the residual covariance matrix (lower potential scale reduction factor and larger effective sample size).
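The Gelman–Rubin diagnostic reported here is, in its classic form, a simple ratio of pooled to within-chain variance; a sketch for one scalar parameter, with synthetic chains standing in for real MCMC output:

```python
import numpy as np

def gelman_rubin(chains):
    """Classic potential scale reduction factor R-hat for a scalar parameter,
    given a list of equal-length chains."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()       # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    var_plus = (n - 1) / n * W + B / n          # pooled variance estimate
    return float(np.sqrt(var_plus / W))

rng = np.random.default_rng(7)
mixed = [rng.normal(size=2000), rng.normal(size=2000)]              # same target: R-hat ~ 1
stuck = [rng.normal(0.0, 1.0, 2000), rng.normal(10.0, 1.0, 2000)]   # disjoint: R-hat >> 1
```

Values near 1 indicate that the between-chain spread adds nothing beyond the within-chain variability, which is the criterion (R̂ < 1.1) applied in Figure S11.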

REFERENCES

Chen D, Xue Y, Gomes C. End-to-end learning for the deep multivariate probit model. Proceedings of Machine Learning Research, vol. 80 (Stockholmsmässan, Stockholm, Sweden: PMLR) (2018), 932–941.

Taylor-Rodriguez D, Kaufeld K, Schliep EM, Clark JS, Gelfand AE. Joint species distribution modeling: dimension reduction using Dirichlet processes. Bayesian Analysis 12 (2017) 939–967.

Warton DI, Blanchet FG, O'Hara RB, Ovaskainen O, Taskinen S, Walker SC, et al. So many variables: joint modeling in community ecology. Trends in Ecology & Evolution 30 (2015) 766–779.

Geweke JF, Singleton KJ. Interpreting the likelihood ratio statistic in factor models when sample size is small. Journal of the American Statistical Association 75 (1980) 133–137.

Hjort NL, Holmes C, Müller P, Walker SG. Bayesian nonparametrics (Cambridge University Press) (2010).

Ghosal S, Van der Vaart A. Fundamentals of nonparametric Bayesian inference (Cambridge University Press) (2017).

Sethuraman J. A constructive definition of Dirichlet priors. Statistica Sinica 4 (1994) 639–650.

Ishwaran H, James L. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association 96 (2001) 161–173.

Muliere P, Secchi P. Weak convergence of a Dirichlet-multinomial process. Georgian Mathematical Journal 10 (2003).

Müller P, Quintana FA, Jara A, Hanson T. Bayesian nonparametric data analysis (Springer) (2015).

Pitman J. Poisson–Kingman partitions. Statistics and science: a Festschrift for Terry Speed. IMS Lecture Notes Monogr. Ser. 40 (2003).

Gnedin A, Pitman J. Exchangeable Gibbs partitions and Stirling triangles. Journal of Mathematical Sciences 138 (2006) 5674–5685.

De Blasi P, Favaro S, Lijoi A, Mena RH, Prünster I, Ruggiero M. Are Gibbs-type priors the most natural generalization of the Dirichlet process? IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (2015) 212–229.

Pitman J, Yor M. The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator. The Annals of Probability 25 (1997) 855–900.

Lijoi A, Prünster I, Rigon T. The Pitman–Yor multinomial process for mixture modeling. Biometrika, forthcoming (2020).

Charalambides CA. Combinatorial methods in discrete distributions, vol. 600 (John Wiley & Sons) (2005).

Northrop PJ. rust: Ratio-of-Uniforms Simulation with Transformation (2019). R package version 1.3.8.

Wakefield J, Gelfand A, Smith A. Efficient generation of random variates via the ratio-of-uniforms method. Statistics and Computing 1 (1991) 129–133.

Ridout MS. Generating random numbers from a distribution specified by its Laplace transform. Statistics and Computing 19 (2009) 439.

Wade S, Ghahramani Z, et al. Bayesian cluster analysis: point estimation and credible balls (with discussion). Bayesian Analysis 13 (2018) 559–626.

Rastelli R, Friel N. Optimal Bayesian estimators for latent variable cluster models. Statistics and Computing 28 (2018) 1169–1186.

LIST OF FIGURES

S1 Biplot representation of the Principal Component Analysis (PCA) performed on the environmental variables: precipitation of coldest quarter (bio 8), annual precipitation (bio 12), annual mean temperature (bio 1), mean temperature of the wettest quarter (bio 19) and topographic predictors (i.e. slope) at 100 m resolution.

S2 Deviance information criterion (DIC) values for different values of r, r ∈ {5, 10, 15, 20}, for all models DP, DPc, PYc.

S3 Distribution of species trait ratio for different traits and for all clustering methods (DP, DPc, PYc). The reference curve is the distribution of species trait ratio of the PFGs.

S4 Prior distribution of the number of clusters K_S for the DP model with parameters N = 16 and α = S (DPM1), the DP model with parameters N = S and α = 6.23 (DPM2), and the DPc model specified such that E[K_S] = 16, with n = S for all the distributions.

S5 Pairwise adjusted Rand indices (ARI) between the clusters estimated by the models (DP, DPc, PYc), the PFGs, and the two random partitions of species RU and RW.

S6 Prior (violet) and posterior (green) distribution of the parameter α for the model DPc under different prior specifications of the number of clusters. The prior expected number of clusters is E[K_S] = K, where K takes values in {8, 16, 56}.

S7 Residual correlation matrix for the DP model.

S8 Residual correlation matrix for the DPc model.

S9 Residual correlation matrix for the PYc model.

S10 Effective sample size for the coefficients of the covariance matrix Σ (left) and the regression coefficients (elements of the B matrix) (right) for all models (DP, DPc, PYc).

S11 Potential scale reduction factor for the coefficients of the covariance matrix Σ (left) and the regression coefficients (elements of the B matrix) (right) for all models (DP, DPc, PYc).

Figure S1. Biplot representation of the Principal Component Analysis (PCA) performed on the environmental variables: precipitation of coldest quarter (bio 8), annual precipitation (bio 12), annual mean temperature (bio 1), mean temperature of the wettest quarter (bio 19) and topographic predictors (i.e. slope) at 100 m resolution.

Figure S2. Deviance information criterion (DIC) values for different values of r, r ∈ {5, 10, 15, 20}, for all models DP, DPc, PYc.

Figure S3. Distribution of species trait ratio for different traits and for all clustering methods (DP, DPc, PYc). The reference curve is the distribution of species trait ratio of the PFGs.

Figure S4. Prior distribution of the number of clusters K_S for the DP model with parameters N = 16 and α = S (DPM1), the DP model with parameters N = S and α = 6.23 (DPM2), and the DPc model specified such that E[K_S] = 16, with n = S for all the distributions.

Figure S5. Pairwise adjusted Rand indices (ARI) between the clusters estimated by the models (DP, DPc, PYc), the PFGs, and the two random partitions of species RU and RW.

Figure S6. Prior (violet) and posterior (green) distribution of the parameter α for the model DPc under different prior specifications of the number of clusters. The prior expected number of clusters is E[K_S] = K, where K takes values in {8, 16, 56}.

Figure S7. Residual correlation matrix for the DP model; species are labelled by their estimated cluster.
Group:4:Rumex arifolius All. Group:4:Silene dioica (L.) C Group:5:Carex flacca Schrebe Group:5:Laserpitium latifoli Group:6:Corylus avellana L. Group:6:Hedera helix L. Group:6:Lonicera xylosteum L Group:7:Arnica montana L. Group:8:Centaurea uniflora T Group:9:Carex panicea L. Group:9:Cirsium palustre (L. Group:9:Holcus lanatus L. Group:9:Lysimachia vulgaris Group:9:Phragmites australis Group:9:Plantago lanceolata Group:9:Potentilla erecta (L Group:10:Hieracium murorum L. Group:10:Sorbus aria (L.) Cra Group:11:Deschampsia flexuosa Group:11:Homogyne alpina (L.) Group:11:Rhododendron ferrugi Group:11:Sorbus chamaemespilu Group:11:Vaccinium vitis−idae Group:11:Valeriana montana L. Group:12:Leontodon hispidus L Group:13:Anthyllis vulneraria Group:13:Arctostaphylos uva−u Group:13:Brachypodium rupestr Group:13:Briza media L. Group:13:Bromus erectus Hudso Group:13:Campanula rotundifol Group:13:Carduus defloratus L Group:13:Carex sempervirens V Group:13:Euphorbia cyparissia Group:13:Globularia nudicauli Group:13:Gymnadenia conopsea Group:13:Helianthemum nummula Group:13:Hippocrepis comosa L Group:13:Hypericum richeri Vi Group:13:Juniperus communis L Group:13:Leucanthemum adustum Group:13:Linum catharticum L. Group:13:Lotus corniculatus L Group:13:Onobrychis montana D Group:13:Origanum vulgare L. Group:13:Phyteuma orbiculare Group:13:Plantago atrata Hopp Group:13:Polygala chamaebuxus Group:13:Rhinanthus alectorol Group:13:Sanguisorba minor Sc Group:13:Sesleria caerulea (L Group:13:Teucrium chamaedrys Group:14:Picea abies (L.) Kar Group:14:Prenanthes purpurea Group:14:Sorbus aucuparia L. Group:15:Serratula tinctoria Group:16:Alnus alnobetula (Eh

Group:17:Aruncus dioicus (Wal

Group:17:Cardamine pentaphyll Group:17:Carex sylvatica Huds Group:17:Geranium robertianum Group:18:Phyteuma spicatum L. Group:19:Acer campestre L. Group:19:Cornus sanguinea L. Group:19:Crataegus monogyna J Group:19:Ligustrum vulgare L. Group:19:Viburnum lantana L. Group:20:Knautia dipsacifolia Group:20:Ranunculus tuberosus Group:21:Vaccinium myrtillus Group:22:Mercurialis perennis Group:23:Trifolium pratense L Group:24:Abies alba Miller Group:24:Fagus sylvatica L. Group:25:Ajuga reptans L. Group:26:Galium odoratum (L.) Group:26:Lamium galeobdolon ( Group:27:Linum alpinum Jacq. Group:28:Epilobium angustifol Group:29:Fraxinus excelsior L Group:30:Polygonum viviparum Group:31:Clematis vitalba L. Group:32:Pulsatilla alpina (L Group:33:Trollius europaeus L Group:34:Filipendula ulmaria Group:34:Urtica dioica L. Group:35:Fragaria vesca L. Group:36:Saxifraga rotundifol Group:37:Carex ferruginea Sco Group:38:Luzula sieberi Tausc Group:39:Listera ovata (L.) R Group:40:Traunsteinera globos Group:41:Anthoxanthum odoratu Group:42:Festuca rubra L. Group:43:Hordelymus europaeus Group:44:Melica uniflora Retz Group:45:Nardus stricta L. Group:46:Athyrium filix−femin Group:46:Dryopteris filix−mas Group:47:Rubus fruticosus gro


[Figure: estimated residual correlation matrix, colour scale from −1 to 1; axis labels list the species ordered by their estimated cluster (clusters 1–18).]


[Figure: estimated residual correlation matrix, colour scale from −1 to 1; axis labels list the species ordered by their estimated cluster (clusters 1–20).]


Figure S10. Effective sample size for the coefficients of the covariance matrix Σ (left) and the regression coefficients (right), for the DP, DPc and PYc models.
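The effective sample size summarized in Figure S10 can be estimated from a chain's autocorrelations as ESS = N / (1 + 2 Σ ρ_k). A minimal sketch of that formula, truncating the sum at the first non-positive autocorrelation (illustrative only; the exact estimator used by the coda/GJAM tooling differs in details):

```python
import numpy as np

def effective_sample_size(chain):
    """ESS = N / (1 + 2 * sum of positive-lag autocorrelations),
    truncated at the first non-positive autocorrelation."""
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x = x - x.mean()
    # autocovariance at each lag, normalized to autocorrelation
    acov = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(n)])
    rho = acov / acov[0]
    s = 0.0
    for k in range(1, n):
        if rho[k] <= 0:
            break
        s += rho[k]
    return n / (1 + 2 * s)

# a strongly autocorrelated AR(1) chain carries far fewer effective
# samples than its nominal length
rng = np.random.default_rng(0)
ar = np.zeros(5000)
for t in range(1, 5000):
    ar[t] = 0.95 * ar[t - 1] + rng.normal()
```

For the AR(1) chain above the estimate is on the order of N(1−φ)/(1+φ) ≈ 130, which is why low effective sample sizes in Figure S10 indicate slowly mixing coefficients.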


Figure S11. Potential scale reduction factor for the coefficients of the covariance matrix Σ (left) and the regression coefficients (right), for the DP, DPc and PYc models.
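The potential scale reduction factor in Figure S11 (Gelman–Rubin R̂) compares between- and within-chain variance; values near 1 indicate convergence. A minimal multi-chain sketch of the basic statistic, assuming the split-chain and rank-normalization refinements are omitted:

```python
import numpy as np

def gelman_rubin(chains):
    """Basic potential scale reduction factor for m chains of length n:
    R_hat = sqrt(((n-1)/n * W + B/n) / W), with W the mean within-chain
    variance and B = n * variance of the chain means."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()
    B = n * chain_means.var(ddof=1)
    var_plus = (n - 1) / n * W + B / n
    return float(np.sqrt(var_plus / W))

# two chains exploring the same region -> R_hat ~ 1
a = np.tile([0.0, 1.0], 50)            # alternating 0/1, length 100
r_same = gelman_rubin([a, a])          # sqrt((n-1)/n) ≈ 0.995
# chains stuck in different regions -> R_hat >> 1
r_split = gelman_rubin([a, a + 10.0])
```

Chains stuck in disjoint regions inflate B relative to W, which is the failure mode R̂ is designed to flag.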


LIST OF TABLES

S1 Hyperparameters used for model fitting in models DP, DPc, PYc (Section S.5).

S2 Prediction performance and model fit for all the models. AUCout corresponds to the model out-of-sample prediction; AUCin is prediction on the train data.

S3 Parameters for DPc and PYc models, where the expected number of clusters E[KS] takes values in {8, 56} (Section S.8.3).


Table S1. Hyperparameters used for model fitting in models DP, DPc, PYc (Section S.5).

Parameter   DP     DPc    PYc
α (mean)    112    6.23   0.47
ν1          −      1.93   −
ν2          −      0.31   −
σ           0      0      0.5
E[KS]       56.2   16     16
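For a Dirichlet process with fixed concentration α over S species, the prior expected number of clusters has the standard form E[KS] = Σ_{i=0}^{S−1} α/(α+i). A quick sketch (illustrative only; the DPc model places a hyperprior on α via ν1, ν2, so this fixed-α formula does not reproduce the table values exactly):

```python
def dp_expected_clusters(alpha, n):
    """E[K_n] = sum_{i=0}^{n-1} alpha / (alpha + i) under a DP(alpha) prior."""
    return sum(alpha / (alpha + i) for i in range(n))

# small sanity check: n = 3, alpha = 1 gives 1 + 1/2 + 1/3 ≈ 1.833
e3 = dp_expected_clusters(1.0, 3)
```

Larger α spreads mass over more clusters, so E[KS] grows monotonically in α for fixed S.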


Table S2. Prediction performance and model fit for all the models. AUCout corresponds to the model out-of-sample prediction. AUCin is prediction on the train data.

Parameter   DP      DPc     PYc
AUCout      0.745   0.747   0.746
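The AUC values in Table S2 compare predicted presence probabilities against observed presences per species. A minimal rank-based (Mann–Whitney) sketch of the statistic, not the exact routine used in the paper:

```python
def auc(y_true, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    outranks a randomly chosen negative (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# 3 of the 4 positive/negative pairs are correctly ordered
example = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

An AUC of 0.5 corresponds to random ranking, so the ≈0.75 values in Table S2 indicate moderate out-of-sample discrimination for all three priors.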


Table S3. Parameters for DPc and PYc models, where the expected number of clusters E[KS] takes values in {8, 56} (Section S.8.3).

Parameter   DPc (E[KS]=8)   DPc (E[KS]=56)   PYc (E[KS]=8)   PYc (E[KS]=56)
α (mean)    2.7             109.5            0.64            7.7
ν1          0.34            600              −               −
ν2          0.13            5.4              −               −
σ           0               0                0.25            0.8
E[KS]       8               56               8               56
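For the Pitman–Yor prior with discount σ > 0 (the PYc columns of Table S3), the expected number of clusters has the closed form E[KS] = (α/σ)((α+σ)_S / (α)_S − 1), where (x)_S = Γ(x+S)/Γ(x) is the rising factorial. A sketch using log-gamma for numerical stability (illustrative; parameter tuning in the paper may differ):

```python
from math import exp, lgamma

def py_expected_clusters(alpha, sigma, n):
    """E[K_n] under a Pitman-Yor(sigma, alpha) prior, 0 < sigma < 1:
    E[K_n] = (alpha/sigma) * ((alpha+sigma)_n / (alpha)_n - 1),
    with (x)_n = Gamma(x+n)/Gamma(x) the rising factorial."""
    log_ratio = (lgamma(alpha + sigma + n) - lgamma(alpha + sigma)
                 - lgamma(alpha + n) + lgamma(alpha))
    return (alpha / sigma) * (exp(log_ratio) - 1.0)

# the first observation always opens a new cluster, so E[K_1] = 1
e1 = py_expected_clusters(0.64, 0.25, 1)
```

Unlike the Dirichlet process, E[KS] here grows like S^σ, so the discount σ controls the power-law growth of the cluster count.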
