January 2015, Volume 63, Issue 9. http://www.jstatsoft.org/
Analysis of Random Fields Using CompRandFld
Simone A. Padoan
Bocconi University
Moreno Bevilacqua
University of Valparaíso
Abstract
Statistical analysis based on random fields has become a widely used approach for understanding real processes in many fields, such as engineering and the environmental sciences. Inference for random field models can nevertheless be problematic to carry out, for example when dealing with large datasets, count or binary responses, or extreme-value data. This article explains how to perform, with the R package CompRandFld, challenging statistical analyses based on Gaussian, binary and max-stable random fields. The software provides tools for performing statistical inference based on the composite likelihood in complex problems where standard likelihood methods are difficult to apply. The principal features are illustrated by means of simulation examples and an application to Irish daily wind speeds.
Keywords: binary data, composite likelihood, covariance tapering, environmental data analysis, extremal coefficient, Gaussian random field, geostatistics, kriging, large dataset, max-stable random field, wind speed.
1. Introduction
Today, statistical analyses of one or more variables observed in space and time are useful in different areas, such as the environmental sciences and engineering (Cressie and Wikle 2011). Random fields (RFs) are the mathematical foundation for statistical spatial analyses; basic introductions to this topic can be found in Wackernagel (1998), Diggle and Ribeiro (2007) and Cressie and Wikle (2011), to name a few. One of the primary aims of spatio-temporal analyses, together with marginal information, is to assess the dependence structure of the underlying random field. In spatial or spatio-temporal applications, fitting models by the maximum likelihood (ML) method may be troublesome for several reasons. We discuss two causes: when the analysis concerns large datasets and when the analysis concerns datasets that are not normally distributed.
In the first case, for a dataset of size m, the computational cost of each likelihood evaluation depends on the dimension of the variance-covariance matrix of the random field. For most of the well-known covariance models, the expression of the ML estimator is unknown and hence the parameter estimation is obtained by numerically maximizing the likelihood. Each likelihood evaluation requires the computation of the determinant and the inverse of the variance-covariance matrix, tasks that are generally performed with algorithms based on the Cholesky factorization, demanding a computational burden of order O(m^3), see Bevilacqua, Gaetan, Mateu, and Porcu (2012) and the references therein. This can be very time consuming.
In the second case, two interesting types of data are widely discussed in the literature: categorical data and extreme values. One way to model spatial binary or categorical data is to dichotomize or categorize the outcome of a Gaussian RF into several classes depending on some threshold values, see for instance Heagerty and Lele (1998) and de Leon (2005). By doing this, the joint probability model depends on the underlying latent process. For a dataset of size m, the expression of the joint distribution involves a sum of q^m m-dimensional normal integrals, where q is the number of categories. The closed form of the ML estimator is unknown and hence, as an alternative, the parameter estimation can be done numerically, but with a high computational cost.
In order to model spatial extremes, one way is to work with max-stable RFs (see e.g., de Haan and Ferreira 2006, Chapter 9). These were introduced by de Haan (1984) as an extension of the block maxima approach from finite-dimensional to infinite-dimensional spaces. Also in this case likelihood-based inference is challenging. For most of the known models (e.g., Davison, Padoan, and Ribatet 2012), although the closed form of the joint distribution is known, with a dataset of size m the derivation of the likelihood function requires computing a number of derivatives equal to the mth Bell number (Davison and Gholamrezaee 2012). Thus, for most spatial applications this calculation is not feasible in practice.
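To get a feel for how quickly this becomes infeasible, the Bell numbers can be computed with the classical Bell-triangle recurrence; the following sketch (in Python, purely illustrative and not part of the package) shows their growth with m:

```python
# Bell numbers via the Bell triangle: bell(m) counts the set partitions of
# {1, ..., m}, i.e., the number of terms arising in the likelihood of a
# max-stable process observed at m sites.
def bell(m):
    row = [1]
    for _ in range(m - 1):
        new = [row[-1]]          # each row starts with the previous row's end
        for x in row:
            new.append(new[-1] + x)
        row = new
    return row[-1]               # last entry of row m is B_m

for m in (5, 10, 15, 20):
    print(m, bell(m))            # already at m = 20 the count is astronomical
```

Even a modest network of 20 stations would require on the order of 5 × 10^13 derivative terms, which is why the full likelihood is abandoned in favor of composite likelihoods below.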
Alternative approaches have been proposed in order to perform parametric statistical inference with spatial and spatio-temporal datasets. For instance, in the last decade the proposed strategies are based on simplifications of the covariance structure (e.g., Cressie and Johannesson 2008) or likelihood approximations (e.g., Stein, Chi, and Welty 2004; Fuentes 2007; Kaufman, Schervish, and Nychka 2008). Inference based on the composite likelihood function (Lindsay 1988) has been shown to be a valid competitor, for the following reasons. Firstly, the composite likelihood generalizes the ordinary likelihood and contains many interesting alternatives. Secondly, a number of theoretical results exist about the consistency and the asymptotic distribution of the related estimator and statistical tests. Thirdly, the computational cost is substantially lower than that of the likelihood. Finally, the approach is applicable to different types of data such as Gaussian, binary and extreme values. Examples of statistical analyses based on the composite likelihood are: Nott and Rydén (1999) and Heagerty and Lele (1998) for binary data, Padoan, Ribatet, and Sisson (2010) and Davison and Blanchet (2011) for extreme values, and Bevilacqua et al. (2012) for Gaussian responses.
For Gaussian, binary and max-stable RFs, several routines for performing the statistical inference based on the composite likelihood have been put together and implemented in an R (R Core Team 2014) package named CompRandFld (Padoan and Bevilacqua 2015). This article describes how to use this package to perform statistical analysis of random fields.
The article is organized as follows: Section 2 summarizes basic notions on Gaussian, binary and max-stable RFs, and Section 3 summarizes the principal concepts of the composite likelihood approach with some simulation examples that explain the main capabilities of CompRandFld.
2. Random fields
2.1. Gaussian random fields
Gaussian RFs are useful for describing the random behavior of the mean levels of a real process. Here, a few basic notions are recalled.
Let {Y(s, t), (s, t) ∈ I} be a real-valued random process defined on I, where I = S × T, with S ⊆ IR^d and T ⊆ IR. When T = {t_0} then I ≡ S and Y(s) ≡ Y(s, t_0) is a purely spatial random field. When S = {s_0} then I ≡ T and Y(t) ≡ Y(s_0, t) is a purely temporal random field. Let I := {1, ..., l} and J := {1, ..., k} be sets of indices and I* = I × J be the Cartesian product with cardinality m = k·l. It is assumed for simplicity that Y(s, t) is a second-order stationary spatio-temporal Gaussian RF, that is, IE{Y(s, t)} = µ and var{Y(s, t)} = ω^2 are finite constants for all (s, t) ∈ I and the covariance cov{Y(s, t), Y(s′, t′)} = IE{Y(s, t) Y(s′, t′)} − µ^2, (s, t), (s′, t′) ∈ I, can be expressed by c(h, u); namely, c(·, ·) is a positive definite function that depends only on the separations between the points, h = s′ − s and u = t′ − t (Cressie and Wikle 2011, Chapter 6). The covariance is then said to be translation-invariant. Since c(0, 0) = var{Y(s, t)}, ρ(h, u) = c(h, u)/c(0, 0) is the correlation function of Y. In order to take into account the local variation of real phenomena, the overall process is defined by Y(s, t) = Y_0(s, t) + ε(s, t), where ε(s, t) ∼ N(0, τ^2) is a white noise independent from Y_0.
The covariance takes the following functional form

c(h, u) = τ^2 1I_{(0,0)}((h, u)) + σ^2 ρ_0(h, u),   (h, u) ∈ IR^d × IR,   (1)

where 1I(·) is the indicator function, σ^2 > 0 and ρ_0 are the common variance and correlation of the RF, whereas τ^2 > 0 is the local variance that describes the so-called nugget effect (e.g., Diggle and Ribeiro 2007; Cressie and Wikle 2011, Chapter 4). The variance of Y(s, t) is equal to ω^2 = τ^2 + σ^2 and the covariance function is σ^2 ρ_0(h, u). Since Y(s, t) is also intrinsically stationary (Cressie and Wikle 2011, Chapter 6), its variability can equivalently be represented as 2γ(h, u) = var{Y(s + h, t + u) − Y(s, t)}. The function γ(·, ·) is called the semivariogram, the quantity 2γ(·, ·) is known as the variogram and, under second-order stationarity, γ(h, u) = c(0, 0) − c(h, u). A stationary space-time covariance is called spatially isotropic if c(h, u_0) depends only on (‖h‖, u_0) for any fixed temporal lag u_0, see Porcu, Montero, and Schlather (2012).
There is an extensive literature on parametric spatial covariance models, see Cressie and Wikle (2011, Chapter 4) and the references therein, and in the past years many parametric models for space-time covariances have been proposed. A possible covariance model is obtained as the product of any valid spatial and temporal covariance. This kind of covariance model, called a separable model, has been criticized for its lack of flexibility. For this reason, different classes of non-separable covariance models have been proposed recently in order to capture possible space-time interactions (e.g., Porcu, Mateu, and Christakos 2009; Cressie and Wikle 2011, Chapter 6). Some examples of the covariance models implemented in CompRandFld and used later in Sections 3 and 4 are found in Table 1. Observe that the first model is a separable space-time stable or power-exponential covariance, the second is a special case of the class proposed in Gneiting (2002), and the third and fourth are special cases of the classes proposed in De Iaco, Myers, and Posa (2002) and Porcu et al. (2009). Finally, the fifth and sixth models are purely spatial: the Matérn and the generalized Cauchy models.
Model           Functional form                                              Parameters' range
Double stable   ρ_st(h, u) = exp{−d_s(h) − d_t(u)}
Gneiting        ρ_st(h, u) = g_t(u)^{−1} exp{−d_s(h) g_t(u)^{−η κ_s/2}}      0 ≤ η ≤ 1
Iacocesare      ρ_st(h, u) = {g_t(u) + d_s(h)}^{−η}                          η > 0
Porcu           ρ_st(h, u) = {0.5 g_s(h)^η + 0.5 g_t(u)^η}^{−1/η}            0 ≤ η ≤ 1
Matern          ρ_s(h) = 2^{1−α} Γ(α)^{−1} d_s(h)^{α/κ_s} K_α(‖h‖/ψ_s)       α > 0
Gen. Cauchy     ρ_s(h) = g_s(h)^{−α/κ_s}                                     α > 0

with d_s(h) = (‖h‖/ψ_s)^{κ_s}, g_s(h) = 1 + d_s(h), d_t(u) = (|u|/ψ_t)^{κ_t}, g_t(u) = 1 + d_t(u), ψ_s, ψ_t > 0 and 0 < κ_s, κ_t ≤ 2.

Table 1: Parametric families of isotropic correlation models. The first four models are spatio-temporal and the last two are spatial; K_α is the modified Bessel function of order α, Γ is the gamma function, η is a separability parameter, α, κ_s, κ_t are smoothing parameters and ψ_s, ψ_t are scale parameters. The subscripts s and t stand for space and time.
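Two of the models in Table 1 can be coded directly from their functional forms; the sketch below (in Python, with arbitrary illustrative parameter values, not package defaults) checks the basic properties of a valid correlation:

```python
import math

# Building blocks of Table 1: d(lag) = (|lag|/scale)^kappa.
def d(lag, scale, kappa):
    return (abs(lag) / scale) ** kappa

# Double stable (separable): rho_st(h, u) = exp{-d_s(h) - d_t(u)}.
def double_stable(h, u, psi_s=0.2, psi_t=1.0, kappa_s=1.0, kappa_t=1.0):
    return math.exp(-d(h, psi_s, kappa_s) - d(u, psi_t, kappa_t))

# Gneiting (non-separable):
# rho_st(h, u) = g_t(u)^{-1} exp{-d_s(h) g_t(u)^{-eta kappa_s / 2}}.
def gneiting(h, u, psi_s=0.2, psi_t=1.0, kappa_s=1.0, kappa_t=1.0, eta=0.5):
    gt = 1.0 + d(u, psi_t, kappa_t)
    return math.exp(-d(h, psi_s, kappa_s) * gt ** (-eta * kappa_s / 2)) / gt

# A correlation equals 1 at the origin and decays with the lags.
assert double_stable(0.0, 0.0) == 1.0 and gneiting(0.0, 0.0) == 1.0
assert double_stable(0.4, 2.0) < double_stable(0.1, 0.5)
```

Here h stands for the spatial distance ‖h‖, since all models in Table 1 are isotropic.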
A complete list of the correlation functions implemented in CompRandFld is available in the online help of the routine CovMatrix.
2.2. Binary random fields
In some applications the focus is on modeling the presence or absence of a certain phenomenon, observable across space and time. To describe the stochastic behavior of such a process a binary RF is needed.
An approach for defining a spatio-temporal binary random field is the following (e.g., Heagerty and Lele 1998). Let {Y(s, t), (s, t) ∈ I} be a second-order stationary spatio-temporal Gaussian RF with correlation function ρ(h, u) that we assume known up to a vector of parameters ϑ. Then, a binary spatio-temporal RF is defined by
Z(s, t) = 1I_{(b,+∞)}(Y(s, t) + ε(s, t)),   (2)
where for all (s, t) ∈ I, ε(s, t) ∼ N(0, τ^2) is a white noise and b ∈ IR is a threshold. We assume that ε is independent from Y. With this definition, for all (s, t) ∈ I, the univariate marginal probability of Z is

p_1 ≡ pr(Z(s, t) = 1) = Φ((µ − b)/ω),

where µ is the mean of the latent process Y and Φ is the univariate standard Gaussian distribution function. In addition, for any (s, t), (s′, t′) ∈ I with (s, t) ≠ (s′, t′), the bivariate joint probabilities are equal to

p_11(h, u) ≡ pr{Z(s, t) = 1, Z(s′, t′) = 1} = Φ_2((µ − b)/ω, (µ − b)/ω; σ^2 ρ(h, u; ϑ)/ω^2),
p_00(h, u) = 1 − 2 p_1 + p_11(h, u),   (3)
p_10(h, u) ≡ p_01(h, u) = p_1 − p_11(h, u),
where Φ_2 is the bivariate standard Gaussian distribution function with correlation σ^2 ρ(h, u; ϑ)/ω^2, computed numerically in our package. The marginal probability of success p_1 and the joint probability p_11(h, u) completely characterize the bivariate distribution. Higher-order joint probabilities are similarly derived using the inclusion-exclusion principle. Thus, given m spatio-temporal points, the computation of the probabilities for all the related success and failure events requires the evaluation of high-dimensional normal integrals, up to order m.
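To make (3) concrete, the following sketch (in Python, with assumed toy values for µ, b, σ^2, τ^2 and the latent correlation ρ at some lag; Φ_2 computed here with SciPy) derives the four bivariate probabilities and checks that they form a valid distribution:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Illustrative values (not package defaults): latent mean, threshold,
# sill, nugget and latent correlation rho(h, u; theta) at a given lag.
mu, b = 0.0, 0.0
sigma2, tau2 = 0.6, 0.4              # omega^2 = sigma2 + tau2 = 1
omega = np.sqrt(sigma2 + tau2)
rho = 0.5

p1 = norm.cdf((mu - b) / omega)      # marginal success probability
r = sigma2 * rho / omega ** 2        # correlation entering Phi_2 in (3)
p11 = multivariate_normal(mean=[0, 0], cov=[[1, r], [r, 1]]).cdf(
    [(mu - b) / omega, (mu - b) / omega])
p00 = 1 - 2 * p1 + p11
p10 = p01 = p1 - p11

# The four cell probabilities of the 2 x 2 table must sum to one.
assert abs(p11 + p10 + p01 + p00 - 1.0) < 1e-8
assert 0 < p11 < p1 < 1
```

The identity p_11 + p_10 + p_01 + p_00 = 1 holds algebraically from (3), whatever correlation model is plugged in.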
2.3. Max-stable random fields
In some applications the focus is on modeling exceptionally high (low) levels of a real process. One way for describing the random behavior of spatial or spatio-temporal extremes is provided
by max-stable RFs. For simplicity let us focus on the spatial case. Let {Y(s)}_{s∈S} be a stationary RF with continuous sample paths and Y_1, ..., Y_n be n independent copies of it. Consider the pointwise maximum process M_n(s) := max_{i=1,...,n} Y_i(s), s ∈ S. If there exist continuous functions a_n(s) > 0 and b_n(s) ∈ IR, for all s ∈ S, such that a_n^{−1}(s){M_n(s) − b_n(s)} converges weakly to a process Z(s) with non-degenerate marginal distributions for all s ∈ S, then Z is a max-stable RF (e.g., de Haan and Ferreira 2006, Chapter 9). For any s ∈ S, max-stable random fields have univariate marginal generalized extreme value distributions and, for a finite set of locations, the joint distribution is
pr{Z(s_j) ≤ z_j, j ∈ J} = exp{−V(z)},   V(z) = k ∫_{Δ_{k−1}} max_{j∈J} (w_j/z_j) υ(dw),   z ∈ IR_+^k,   (4)

with unit Fréchet marginal distributions, where υ is a probability distribution defined on the (k − 1)-dimensional unit simplex Δ_{k−1} and V is the so-called exponent dependence function (e.g., de Haan and Ferreira 2006, Chapters 1 and 6). Examples of max-stable RFs are the
Brown-Resnick (Kabluchko, Schlather, and de Haan 2009; Kabluchko 2010), the extremal-Gaussian (Schlather 2002) and the extremal-t (Davison et al. 2012) processes. In the first case the exponent dependence function of the bivariate distribution is
V_1(z_1, z_2; h) = (1/z_1) Φ(λ(h)/2 + log(z_2/z_1)/λ(h)) + (1/z_2) Φ(λ(h)/2 + log(z_1/z_2)/λ(h)),   (5)

where λ(h) = √(2γ(h)), with γ denoting the semivariogram of the underlying Gaussian RF.
Equation (5) was initially derived by Hüsler and Reiss (1989) and is also the exponent function of the max-stable RF described by Smith (1990), but with a different form for λ(h). For these processes, extensions to the spatio-temporal case with dependence function λ(h, u) exist, see Kabluchko (2009) and Davis, Klüppelberg, and Steinkohl (2013). In the second case the bivariate
distribution depends on the exponent function

V_2(z_1, z_2; h) = (1/2)(1/z_1 + 1/z_2)[1 + {1 − 2 z_1 z_2 (ρ(h) + 1)/(z_1 + z_2)^2}^{1/2}],   (6)
where ρ(h) is the correlation function of the underlying Gaussian RF. Finally, in the third case, the bivariate distribution depends on the exponent function

V_3(z_1, z_2; h) = (1/z_1) T_{ν+1}[{(z_2/z_1)^{1/ν} − ρ(h)}/√({1 − ρ(h)^2}/(ν + 1))] + (1/z_2) T_{ν+1}[{(z_1/z_2)^{1/ν} − ρ(h)}/√({1 − ρ(h)^2}/(ν + 1))],   (7)

where T_{ν+1} is the Student-t distribution function with ν + 1 degrees of freedom and ρ(h) is the scaling (correlation) function of the underlying Gaussian RF.
A summary for describing the extremal dependence is ξ(h) = V(1, 1; h), named the pairwise extremal coefficient function (Smith 1990). It holds that 1 ≤ ξ(h) ≤ 2, where the lower bound represents the complete dependence case and the upper bound the independence case. For the above-mentioned processes the extremal coefficients are equal to

ξ_1(h) = 2Φ(√(γ(h)/2)),   ξ_2(h) = 1 + √({1 − ρ(h)}/2),   ξ_3(h) = 2T_{ν+1}(√((ν + 1){1 − ρ(h)}/{1 + ρ(h)})),   (8)

respectively.
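The three extremal coefficients in (8) are easy to evaluate; a short sketch (in Python, with assumed values for γ(h), ρ(h) and ν) confirms the 1-to-2 bounds stated above:

```python
import numpy as np
from scipy.stats import norm, t

def xi1(gamma_h):                       # Brown-Resnick, from (8)
    return 2 * norm.cdf(np.sqrt(gamma_h / 2))

def xi2(rho_h):                         # extremal-Gaussian, from (8)
    return 1 + np.sqrt((1 - rho_h) / 2)

def xi3(rho_h, nu):                     # extremal-t, from (8)
    return 2 * t.cdf(np.sqrt((nu + 1) * (1 - rho_h) / (1 + rho_h)), df=nu + 1)

# Complete dependence at one end of the parameter range...
assert abs(xi1(0.0) - 1.0) < 1e-8       # gamma(h) = 0
assert abs(xi3(1.0, 1.0) - 1.0) < 1e-8  # rho(h) = 1
# ...and values always inside [1, 2].
assert 1.0 <= xi2(0.9) <= xi2(-0.9) <= 2.0
```

Note that ξ_2 cannot reach the independence bound 2 for ρ(h) > −1, a known limitation of the extremal-Gaussian model.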
3. Composite likelihood approach
Composite likelihood (CL) is a general class of estimating functions based on the likelihood of marginal or conditional events (e.g., Varin, Reid, and Firth 2011). Let Y = (Y_1, ..., Y_m)^T be an m-dimensional random vector with joint distribution f(y; θ), θ ∈ Θ ⊆ IR^d. Let A_1, ..., A_R be marginal or conditional sets of Y; then the CL is defined as

ℓ_c(θ) = Σ_{r=1}^R ℓ(θ; A_r) w_r,   (9)

where ℓ(θ; A_r) is the log-likelihood related to the density f(y ∈ A_r; θ) obtained by considering only the random variables in A_r, and the w_r are suitable non-negative weights that do not depend on θ. The maximum composite likelihood (MCL) estimator is given by θ̂ = argmax_θ ℓ_c(θ).
The CL estimation method has nice properties (e.g., Varin et al. 2011), among which: if the sample size n is such that n ≫ m and n goes to infinity, then θ̂ is weakly consistent and asymptotically normally distributed with variance-covariance matrix G(θ)^{−1}, where

G(θ) = H(θ) J(θ)^{−1} H(θ)^T   (10)

is the Godambe information matrix and

H(θ) = −IE[∇^2 ℓ_c(θ)],   J(θ) = IE[∇ℓ_c(θ) ∇ℓ_c(θ)^T],

where ∇ denotes the vector differential operator. On the other hand, if n ≪ m or dependent data are available, such as a single spatial sequence or a univariate time series, then the above-mentioned asymptotic results do not necessarily hold, see Cox and Reid (2004). Then,
consistency and asymptotic normality of the estimator, as m goes to infinity, must be shown case by case. Likelihood-based statistics such as the Wald, score and likelihood ratio statistics and their asymptotic distributions are also available in the CL setting (Chandler and Bate 2007; Varin et al. 2011; Pace, Salvan, and Sartori 2011). In addition, the Akaike information criterion used for model selection in the CL context assumes the following form:

CLIC = −2 ℓ_c(θ̂) + 2 tr{H(θ̂) G(θ̂)^{−1}},   (11)

where tr(A) is the trace of the matrix A, see Varin and Vidoni (2005). The CL is a wide class of functions that also includes the standard likelihood and other alternatives as special cases. For example, the former is obtained letting A_r = Y, R = 1 and w_r = 1. Depending on the choice of the sets A_r, different types of
Figure 1: Hierarchical structure involved with the composite likelihood approach: the full likelihood (variants: restricted, tapered), the marginal composite likelihood (variants: independence, pairwise, difference, triwise) and the conditional composite likelihood (variants: neighbours, pairwise, triwise, full).
CL can be obtained. One of the most explored cases is the pairwise likelihood. Considering the marginal pairs A_r = (Y_i, Y_j), we obtain the pairwise marginal log-likelihood. If we let A_r = (Y_i | Y_j), we obtain the pairwise conditional log-likelihood and, finally, setting A_r = (Y_i − Y_j), we obtain the pairwise difference log-likelihood. In these special cases, the log-likelihoods for a single data contribution are respectively:
ℓ_pm(θ) = Σ_{i=1}^m Σ_{j>i} log f(y_i, y_j; θ) w_ij,   (12)

ℓ_pc(θ) = Σ_{i=1}^m Σ_{j≠i} log f(y_i | y_j; θ) w_ij,   (13)

ℓ_pd(θ) = Σ_{i=1}^m Σ_{j>i} log f(y_i − y_j; θ) w_ij.   (14)
For these likelihoods the computational burden is O(m^2). In principle, the role of the weights is to improve the statistical efficiency and save computational time, see for example Padoan et al. (2010), Bevilacqua et al. (2012), Bevilacqua and Gaetan (2014) and Sang and Genton (2014). The diagram in Figure 1 summarizes the types of likelihoods belonging to the CL class that have been implemented in the package. For comparison purposes we have also implemented other variants of the ordinary likelihood, such as the restricted likelihood of Harville (1977) and the tapered likelihood of Kaufman et al. (2008).
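As an illustration of (12) with binary distance weights, the following sketch (in Python, with toy data, an exponential correlation and an arbitrary cutoff, none of which are package defaults) evaluates the pairwise marginal log-likelihood of a zero-mean spatial Gaussian RF:

```python
import numpy as np

def pairwise_marginal(y, coords, sill, scale, cutoff):
    """Pairwise marginal log-likelihood (12): sum of bivariate Gaussian
    log-densities over pairs, with binary weights w_ij = 1I(h <= cutoff)."""
    ll = 0.0
    m = len(y)
    for i in range(m):
        for j in range(i + 1, m):
            h = np.linalg.norm(coords[i] - coords[j])
            if h > cutoff:                    # weight w_ij = 0: pair skipped
                continue
            rho = np.exp(-h / scale)          # exponential correlation
            cov = sill * np.array([[1, rho], [rho, 1]])
            pair = np.array([y[i], y[j]])
            sign, logdet = np.linalg.slogdet(cov)
            ll += -0.5 * (logdet + pair @ np.linalg.solve(cov, pair)
                          + 2 * np.log(2 * np.pi))
    return ll

rng = np.random.default_rng(1)
coords = rng.uniform(0, 1, size=(30, 2))
y = rng.normal(size=30)
# Each term is a valid 2-dimensional log-density, hence the O(m^2) cost.
assert pairwise_marginal(y, coords, 1.0, 0.2, 0.5) < 0.0
```

The double loop makes the O(m^2) cost explicit; the cutoff plays the role of maxdist in FitComposite.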
3.1. The Gaussian case
Assume that y = {y(s_1, t_1), ..., y(s_k, t_l)}^T is a realisation from a spatio-temporal Gaussian RF (see Section 2.1): y has an m-dimensional normal density with parameter vector θ = (µ, τ^2, σ^2, ϑ)^T, where ϑ is a vector of correlation parameters (see for example the models in Table 1). In the Gaussian case, there are three estimation methods available. Standard ML
Model             Functional form
Wendland1         ρ_s(h; c_s) = {1 − d_s(h)}_+^2 {1 + d_s(h)/2}
Wendland2         ρ_s(h; c_s) = {1 − d_s(h)}_+^4 {1 + 4 d_s(h)}
Double Wendland2  ρ_st(h, u; c_s, c_t) = {1 − d_s(h)}_+^4 {1 + 4 d_s(h)} × {1 − d_t(u)}_+^4 {1 + 4 d_t(u)}

with d_s(h) = ‖h‖/c_s, d_t(u) = |u|/c_t and c_s, c_t > 0.

Table 2: Parametric families of isotropic taper functions.
estimation is performed setting likelihood = "Full" and type = "Standard" in the FitComposite routine. This method is recommended for small datasets. θ is estimated by maximizing the full likelihood
ℓ(θ) = −(1/2) log|Σ| − (1/2)(y − µ)^T Σ^{−1} (y − µ),   (15)

where µ = µ1_m and Σ = {τ^2 1I_{(r,q)}((j, i)) + σ^2 ρ_st(‖s_j − s_r‖, |t_i − t_q|); (i, j), (q, r) ∈ I*} is the m × m covariance matrix defined by (1). The restricted version is also available and can be used setting type = "Restricted". Under increasing domain, the asymptotic variance-covariance matrix of the ML estimator is given by the inverse of the Fisher information matrix I(θ), see Mardia and Marshall (1984).
The second estimation method is based on the tapered likelihood (Kaufman et al. 2008), recommended for large datasets. This method is performed setting the parameter type = "Tapering". The tapered likelihood is given by

ℓ_tap(θ, c) = −(1/2) log|Σ̃| − (1/2)(y − µ)^T {Σ̃^{−1} ∘ Ω_c}(y − µ),   (16)

where Ω_c = {ρ_st(‖s_j − s_r‖, |t_i − t_q|; c); (i, j), (q, r) ∈ I*} is the m × m taper matrix, Σ̃ = Σ ∘ Ω_c is the m × m tapered covariance matrix and ∘ is the Schur product. Specifically, ρ_st(·, ·; c) is a correlation function (the taper function) depending on a parameter vector c, which defines the compact support in space, or in time, or both (Wendland 1995). Hence, tapering in likelihood (15) both the covariance matrix Σ and its inverse, which is replaced by Σ̃^{−1} ∘ Ω_c, yields the tapered likelihood (16). The parameter taper of FitComposite allows us to set different taper
functions. A list of the taper models implemented in the package is reported in Table 2. The first two are spatial tapers, while the last is a spatio-temporal taper. The compact support parameters for space and time can be set providing two positive values c_s and c_t at maxdist and maxtime (see Table 2). Likelihood (16) is implemented in CompRandFld and in the optimization procedure it calls specific routines of the package spam (Furrer and Sain 2010; Furrer 2014) for handling the sparse matrix Σ̃. Under increasing domain, the maximizer of (16) is
a consistent estimator and has an asymptotic Gaussian distribution, with variance given by the inverse of the Godambe matrix, which is given as in (10) with

H(θ) = −IE[∇^2 ℓ_tap(θ, c)],   J(θ) = IE[∇ℓ_tap(θ, c) ∇ℓ_tap(θ, c)^T],

see Shaby and Ruppert (2012). The tapering can have an effect, depending on the choice of the
compact support, when assessing a correlation structure. For example, the top-left panel of Figure 2 reports a simulation from a zero-mean, unit-variance spatial Gaussian random field
Figure 2: A simulation from a zero-mean, unit-variance spatial Gaussian random field with exponential correlation and scale parameter c_s = 0.2 is reported in the top-left panel. The same simulation is shown from the top-right to the bottom-right panels, tapering the correlation using a Wendland function with compact supports 0.6, 0.4 and 0.2.
with exponential correlation and scale parameter c_s = 0.2 on the unit square. The simulations are repeated by tapering the covariance matrix with the first Wendland function of Table 2, using the compact supports 0.6, 0.4 and 0.2. When the compact support is greater than or equal to the practical range (0.6), the simulation in the top-right panel shows no difference from the original simulation, while for decreasing values of the compact support, the simulations in the two bottom panels show a decreasing range of the correlation structure.
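The Schur-product construction behind (16) can be sketched as follows (in Python, with an exponential covariance, the Wendland1 taper of Table 2 and an arbitrary compact support of 0.2; none of this is package code):

```python
import numpy as np

def wendland1(h, cs):
    """Wendland1 taper of Table 2: {1 - h/cs}_+^2 {1 + (h/cs)/2}."""
    d = h / cs
    return np.where(d < 1, (1 - d) ** 2 * (1 + d / 2), 0.0)

rng = np.random.default_rng(3)
coords = rng.uniform(0, 1, size=(100, 2))
h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

sigma = np.exp(-h / 0.2)                 # exponential covariance, unit sill
sigma_tap = sigma * wendland1(h, 0.2)    # tapered covariance (Schur product)

# Tapering zeroes all covariances beyond the compact support, making the
# matrix sparse, while the diagonal (the variances) is left unchanged.
assert np.all(np.diag(sigma_tap) == 1.0)
assert np.mean(sigma_tap == 0.0) > np.mean(sigma == 0.0)
```

It is this sparsity that lets packages such as spam apply fast sparse-matrix algorithms to Σ̃.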
The third estimation method is based on the pairwise likelihoods (12)–(14), also recommended for large datasets. Bevilacqua and Gaetan (2014) have shown, by means of a simulation study under increasing-domain asymptotics and for a fixed compact support, that the pairwise likelihood is the fastest of the estimation methods compared. For instance, Figure 3 shows that the ratio between the computational time required for a single numerical evaluation of the full and the pairwise, and of the tapered and the pairwise likelihoods, is greater than one, stressing that the pairwise likelihood is the fastest estimation method. The ratios have been computed for different sample sizes: for increasing sample size, the computational advantage of the pairwise likelihood becomes increasingly prominent. CL estimation is performed setting likelihood = "Marginal" and type = "Difference" in FitComposite. This is based on the difference pairwise likelihood
ℓ_pd(θ) = −Σ_{(i,j,q,r)∈D} [log γ_ijqr(σ^2, τ^2, ϑ)/2 + {y(s_j, t_i) − y(s_r, t_q)}^2/{4 γ_ijqr(σ^2, τ^2, ϑ)}] w_ijqr,
Figure 3: Points depict the ratio between the times required to numerically evaluate the full and the pairwise likelihood. The crosses report the comparison between the tapered and the pairwise likelihood.

where D = {(i, j, q, r) : q > i, i ∈ I, if r = j, j ∈ J; i, q ∈ I, if r > j, j ∈ J}   (17)
and γ_ijqr(σ^2, τ^2, ϑ) = τ^2 1I_{(r,q)}((j, i)) + σ^2 {1 − ρ_st(‖s_j − s_r‖, |t_i − t_q|; ϑ)} is the variogram. The pairwise marginal or conditional likelihoods (12) and (13) can be used setting likelihood = "Marginal" and type = "Pairwise" or likelihood = "Conditional". Binary weights
w_ijqr = 1 if ‖s_j − s_r‖ ≤ c_s and |t_i − t_q| ≤ c_t, and w_ijqr = 0 otherwise,
can be set by providing two positive values c_s and c_t at maxdist and maxtime. Under increasing domain the maximizer of the CL is consistent and asymptotically normal with variance-covariance matrix as in (10) (Bevilacqua et al. 2012). Standard errors can be obtained setting varest = TRUE. These are obtained, for the three likelihood methods, as the square roots of the diagonal elements of the inverses of the Fisher and Godambe information matrices. In the Gaussian case the analytical expressions of these matrices are known.
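The sandwich construction (10) behind these standard errors is just a few matrix operations; a numerical sketch (in Python, with arbitrary 2 × 2 stand-in matrices for H and J, not values produced by the package):

```python
import numpy as np

# Illustrative stand-ins for H = -E[grad^2 l_c] and J = E[grad l_c grad l_c^T].
H = np.array([[8.0, 1.0], [1.0, 5.0]])
J = np.array([[10.0, 2.0], [2.0, 6.0]])

G = H @ np.linalg.solve(J, H.T)          # Godambe matrix G = H J^{-1} H^T
std_err = np.sqrt(np.diag(np.linalg.inv(G)))

# Sanity checks: with the full likelihood H = J, so G collapses to the
# Fisher information, and the standard errors are strictly positive.
assert np.allclose(H @ np.linalg.solve(H, H.T), H)
assert np.all(std_err > 0)
```

When H ≠ J, as is typical for composite likelihoods, G^{−1} differs from H^{−1} and the naive "inverse Hessian" standard errors would be wrong; this is why FitComposite estimates both matrices.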
In the composite likelihood case, however, since the expression of J is unwieldy, its numerical evaluation can be computationally demanding. To obtain standard errors quickly, two options are implemented. If a single data replication is available, then the matrix is computed with a sub-sampling technique, obtained by specifying vartype = "subsamp". If data replications are available, then the Godambe matrix is computed using its sampling version (Varin et al. 2011), obtained by specifying vartype = "sampling". Here is a practical example.
R> library("CompRandFld")
R> set.seed(13)
R> coords <- cbind(runif(25, 0, 1), runif(25, 0, 1))
R> times <- 1:20
R> param <- list(nugget = 0, mean = 1, scale_s = 2, scale_t = 3, sill = 1)
R> data <- RFsim(coordx = coords, coordt = times, corrmodel = "exp_exp",
+    param = param)$data
This code shows how to use the routine RFsim to simulate a spatio-temporal random field at 25 locations and for 20 times. The correlation is specified with corrmodel and the parameters with param. The model used is the double exponential, the first of Table 1 with κ_s = κ_t = 1. Note that simulations from Gaussian random fields are performed using the Cholesky decomposition method. The next code shows the use of the FitComposite routine to perform MCL estimation of the parameters.
R> pml <- FitComposite(data, coordx = coords, coordt = times,
+ corrmodel = "exp_exp", likelihood = "Marginal", type = "Pairwise",
+ start = list(scale_s = 2, scale_t = 3, sill = 1),
+ fixed = list(mean = 1, nugget = 0), varest = TRUE, maxdist = 0.5,
+ maxtime = 3)
R> pml$param
scale_s scale_t sill
2.244840 5.739743 1.392082
With start, users can specify a starting value for any parameter that needs to be estimated, while with fixed, users can specify a value for a parameter that does not need to be estimated. Starting values are admitted only for parameters that are not fixed. Note that the routine CorrelationParam returns the list of parameters for any correlation model. Estimates are obtained by fitting the marginal pairwise likelihood. Binary weights are used so that only pairs of observations with spatial distances and temporal intervals smaller than or equal to 0.5 and 3 are retained. print(pml) provides the summary of the fitting, omitted here for brevity. The collected results are: the parameter estimates with related standard errors and variance-covariance matrix, the value that maximizes the log pairwise likelihood, the value of the CLIC (11) and other information. The fitting results can be used for estimating the correlation or variogram function, computing the practical range (Diggle and Ribeiro 2007) with the Covariogram routine, or, as the following code shows, for predicting the process at ungauged locations for future times.
R> times_to_pred <- 21:23
R> x <- seq(0, 1, 0.02)
R> loc_to_pred <- as.matrix(expand.grid(x, x))
R> param <- as.list(c(pml$param, pml$fixed))
R> pr <- Kri(data, coords, coordt = times, loc = loc_to_pred,
+ time = times_to_pred, corrmodel = "exp_exp", param = param)
The Kri routine performs simple kriging (or ordinary kriging, Diggle and Ribeiro 2007) at the new set of locations loc_to_pred and times times_to_pred. A dataset, spatio-temporal coordinates and a correlation model with the estimated parameters need to be specified. The
Figure 4: Simple kriging on a 51×51 grid of points on [0,1] and for 3 times (from left to right). The top panels report the prediction and the bottom the standard error of the prediction. The points are the locations where the data has been generated.
routine returns two matrices (together with other information), one with the predicted values and one with their associated standard errors. With standard R code, the results can be easily
visualized. Figure 4 shows, in the top and bottom panels, the predicted values and their standard errors, on a 51 × 51 grid of points in [0, 1] and for three consecutive times (left to right). Due to the strong dependence structure, we see that a noticeable dependence remains over time (although decreasing) and that the lower variabilities are recorded at the points where the data have been simulated. In CompRandFld simple and tapered kriging are implemented; the default option is simple kriging. This is performed working with the full covariance matrix, which is computationally intensive for large spatial or spatio-temporal data. A remedy is to use the tapered kriging proposed by Furrer, Genton, and Nychka (2006). Tapered kriging is an approximation of classical kriging, faster to compute and hence useful when dealing with large datasets. With tapered kriging, the tapered covariance matrix is used in the simple kriging systems instead of the full covariance matrix, so that the solution is obtained using fast algorithms for sparse matrices. The routine Kri calls specific routines of the package spam (Furrer and Sain 2010) in order to handle sparse matrices. Tapered kriging can be done setting type = "Tapering" and specifying values for the parameters taper, maxdist and maxtime. For performing tapered kriging see also Nychka, Furrer, and Sain (2014).
3.2. Binary case
Assume that z(s_1, t_1), ..., z(s_k, t_l) is a sample from a spatio-temporal binary random field (see Section 2.2), where z(s_j, t_i) = 1 (z(s_j, t_i) = 0) represents the success (failure) event. As explained in Section 2.2, the evaluation of the full likelihood is too computationally demanding, even for moderate sample sizes. For this reason, Heagerty and Lele (1998) proposed performing the parameter estimation of the probit model (2) using the pairwise marginal likelihood
ℓ_pm(θ) = Σ_{(i,j,q,r)∈D} [z(s_j, t_i) z(s_r, t_q) log p_11(‖s_j − s_r‖, |t_i − t_q|)
    + {z(s_j, t_i) + z(s_r, t_q) − 2 z(s_j, t_i) z(s_r, t_q)} log p_01(‖s_j − s_r‖, |t_i − t_q|)
    + {1 − z(s_j, t_i)}{1 − z(s_r, t_q)} log p_00(‖s_j − s_r‖, |t_i − t_q|)],   (18)
where D is as in (17) and the probabilities pxy(·, ·), x, y = 0, 1, are described in (3). MCL
fit-ting can be obtained with the routine FitComposite specifying the values model = "BinaryGauss", likelihood = "Marginal" and type = "Pairwise". The conditional like-lihood (12) is also available and it can be used by setting likelike-lihood = "Conditional".
Observe that, for simplicity and identifiability, we follow the parametrization of Heagerty and Lele (1998), where a constant mean is assumed, with τ2 + σ2 = 1, 0 < σ2 < 1, τ2 = 1 − σ2 and b = 0. Here is a practical example.

R> set.seed(11)
R> x <- seq(0, 10, 1)
R> y <- seq(0, 10, 1)
R> times <- seq(1, 15, 1)
R> param <- list(nugget = 0, mean = 0, scale_t = 1.5, scale_s = 2,
+ sill = 0.6)
R> data <- RFsim(x, y, times, corrmodel = "exp_exp", model = "BinaryGauss",
+ param = param, grid = TRUE)$data
This code shows how to simulate a binary random field with the routine RFsim. Since the process is based on a latent Gaussian RF, a correlation function and its parameters need to be specified. Figure 5 shows the simulation for the first four time instants (top-left to bottom-right), performed on an 11 × 11 grid on [0, 10]^2 and for 15 times. The dependence
can be estimated using the lorelogram function (Heagerty and Zeger 1998). The lorelogram estimator is defined as the logarithm of the pairwise odds ratio,
π(‖s_j − s_r‖, |t_i − t_q|) = log [ p_11(‖s_j − s_r‖, |t_i − t_q|) p_00(‖s_j − s_r‖, |t_i − t_q|) / { p_10(‖s_j − s_r‖, |t_i − t_q|) p_01(‖s_j − s_r‖, |t_i − t_q|) } ].   (19)
In the binary case, it provides a better measure of dependence than the covariance function: for instance, it is easier to interpret and its upper and lower bounds do not depend on the mean. The following code shows how to compute the empirical estimator of (19) using the routine EVariogram.
R> lor <- EVariogram(data, x, y, times, type = "lorelogram", grid = TRUE,
+ maxdist = 10, maxtime = 10)
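Conceptually, the empirical lorelogram at a given spatial distance and time lag reduces to a log odds ratio of pair counts. A minimal sketch of that computation (a hypothetical helper, not EVariogram's actual code):

```r
# Log odds ratio (19) from the 2x2 table of pair counts at a given spatial
# distance and time lag; n_xy = number of pairs taking values (x, y).
lorelogram_hat <- function(n11, n10, n01, n00) {
  log((n11 * n00) / (n10 * n01))
}
lorelogram_hat(n11 = 4, n10 = 2, n01 = 2, n00 = 4)   # log(4): positive association
```

A value of zero indicates no association at that distance/lag, while large positive values indicate strong positive dependence.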
R> pml <- FitComposite(data, x, y, times, corrmodel = "exp_exp",
+ grid = TRUE, maxdist = 2, maxtime = 4, model = "BinaryGauss",
+ fixed = list(nugget = 0), start = list(mean = 0, scale_s = 2,
+ scale_t = 1.5, sill = 0.5))
Figure 5: Simulations of spatio-temporal binary random fields on an 11 × 11 grid. From the left to the right panel, the simulations of 4 consecutive times are reported. The green and gray colors represent the cases Y(s, t) = 1 and Y(s, t) = 0, respectively.
     mean   scale_s   scale_t      sill
0.0162018 2.1896897 1.8202424 0.9560474

R> library("scatterplot3d")
R> Covariogram(pml, vario = lor, show.vario = TRUE)
The above code shows how to compute a parametric estimate of the lorelogram. First, the FitComposite routine computes the parameter estimates of the underlying latent Gaussian random field. With print(pml), the summary of the fitting results is returned (omitted here for brevity), and with pml$param only the parameter estimates are displayed. Second, to compute a smooth estimate of the lorelogram based on the output obtained from FitComposite, the routine Covariogram is used. This also displays the estimated function
(using scatterplot3d, Ligges and Mächler 2003). Figure 6 shows the results. The left and right
panels display the non-parametric and parametric estimates of the spatio-temporal lorelogram versus spatial distances and temporal intervals.
3.3. Max-stable case
Assume that independent copies, z1, . . . , zm, of the vector of spatial componentwise maxima
z = {z(s1), . . . , z(sk)}^⊤, are available. Examples of real data analyses using this type of extreme data are Padoan et al. (2010), Davison and Blanchet (2011) and Davison et al. (2012).
Suppose for simplicity that the random behavior of the spatial maxima is well described by one
of the classes of spatial max-stable RFs described in Section 2.3. For those families, although
Figure 6: Spatio-temporal lorelogram versus spatial distances and time intervals. The left and right panels report the empirical and parametric estimates of the lorelogram, respectively.
the computation of their density functions is unfeasible in practice. Indeed, given the particular form of a max-stable distribution, see (4), deriving the k-dimensional density requires a number of terms equal to the kth Bell number (Davison and Gholamrezaee 2012). Nevertheless, the bivariate density is still feasible to compute in practice and it has the form
exp{−V (z1, z2)}[{∂V (z1, z2)/∂z1}{∂V (z1, z2)/∂z2} − ∂2V (z1, z2)/∂z1∂z2].
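This form is easy to check numerically. The sketch below (an illustration, not the package's implementation) evaluates the bivariate density from a user-supplied exponent function V via finite differences; the complete-independence exponent V(z1, z2) = 1/z1 + 1/z2 serves as a test case, since there the density factorizes into a product of unit Fréchet densities.

```r
# Bivariate max-stable density exp(-V) * (dV/dz1 * dV/dz2 - d2V/dz1dz2),
# with derivatives of a supplied exponent function V taken numerically.
biv_density <- function(V, z1, z2, eps = 1e-4) {
  V1  <- (V(z1 + eps, z2) - V(z1 - eps, z2)) / (2 * eps)
  V2  <- (V(z1, z2 + eps) - V(z1, z2 - eps)) / (2 * eps)
  V12 <- (V(z1 + eps, z2 + eps) - V(z1 + eps, z2 - eps) -
          V(z1 - eps, z2 + eps) + V(z1 - eps, z2 - eps)) / (4 * eps^2)
  exp(-V(z1, z2)) * (V1 * V2 - V12)
}
V_indep <- function(z1, z2) 1 / z1 + 1 / z2   # complete independence
# Under independence the result matches z1^-2 exp(-1/z1) * z2^-2 exp(-1/z2):
biv_density(V_indep, 1, 2)
```

The same check can be repeated with the analytic V of any of the families in (5)–(7).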
Taking the first and second order derivatives of (5)–(7) allows us to obtain the Brown–Resnick, extremal-Gaussian and extremal-t bivariate densities. Therefore, for inference the full likelihood can be replaced by the pairwise marginal likelihood (12). Details on CL inference for max-stable random fields are provided by Padoan et al. (2010) and Davison et al. (2012). Here, practical examples are illustrated.

R> library("RandomFields")
R> set.seed(3132)
R> coords <- cbind(runif(100, 0, 10), runif(100, 0, 10))
R> param <- list(mean = 0, sill = 1, nugget = 0, scale = 1.5, power = 1.2)
R> extg <- RFsim(coords, corrmodel = "stable", model = "ExtGauss",
+ replicates = 30, param = param)$data
This code shows how to simulate an extremal-Gaussian RF with unit Fréchet margins with the routine RFsim. Internally, the simulation is performed using the routine MaxStableRF of the R package RandomFields (Schlather, Malinowski, Menck, Oesting, and Strokorb 2015). The correlation model and the parameters of the extremal-Gaussian model (Schlather 2002) can be specified with corrmodel and param. The next code shows how to perform the composite likelihood fitting of the extremal-Gaussian model, using the routine FitComposite.
R> fextg <- FitComposite(extg, coords, corrmodel = "stable",
+ model = "ExtGauss", replicates = 30, varest = TRUE,
+ vartype = "Sampling", margins = "Frechet", fixed = list(sill = 1),
+ start = list(scale = 1.5, power = 1.2))
    power     scale
0.9156696 1.7135199
With print(fextg) the summary of the fitting results is returned (omitted here for brevity) and with fextg$param only the parameter estimates are displayed. Setting varest = TRUE, the standard errors of the estimates are computed; with vartype = "Sampling", the matrices H and J in (10) are estimated by their sample versions. Note that other correlation models, such as
the Whittle-Matérn or the generalized Cauchy (see Table 1), can also be fitted. The next code shows how to simulate and fit the Brown–Resnick and extremal-t models.

R> bres <- RFsim(coords, corrmodel = "stable", model = "BrowResn",
+ param = param, numblock = 1000, replicates = 30)$data
R> extt <- RFsim(coords, corrmodel = "stable", model = "ExtT",
+ param = c(param, df = 3), numblock = 1000, replicates = 30)$data
R> fbres <- FitComposite(bres, coords, corrmodel = "stable",
+ model = "BrowResn", replicates = 30, varest = TRUE,
+ vartype = "Sampling", margins = "Frechet",
+ start = list(scale = 1.5, power = 1.2))
R> fextt <- FitComposite(extt, coords, corrmodel = "stable",
+ model = "ExtT", replicates = 30, varest = TRUE,
+ vartype = "Sampling", margins = "Frechet",
+ fixed = list(sill = 1), start = list(df = 3, scale = 1.5, power = 1.2))
The first two commands perform simulations from the two respective random fields. Simulations are obtained by computing the pointwise maximum (appropriately rescaled) over a number of replications of the underlying Gaussian or Student-t random field. The number of replications is specified with numblock.
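The block-maxima construction behind numblock can be sketched in a few lines. The following is a schematic illustration with independent sites, not RFsim's actual code: each replicate of the latent Gaussian field is transformed to unit Fréchet margins and the rescaled pointwise maximum is taken.

```r
# Approximate unit Frechet max-stable margins by block maxima: transform n
# iid Gaussian replicates to unit Frechet and take the rescaled pointwise max.
set.seed(1)
n <- 1000                         # plays the role of numblock
k <- 5                            # number of sites (independent here, for brevity)
g <- matrix(rnorm(n * k), n, k)   # n replicates of the latent field at k sites
frech <- -1 / log(pnorm(g))       # probability integral transform to unit Frechet
z <- apply(frech, 2, max) / n     # rescaled pointwise maxima
```

In the actual simulation the n replicates are spatially dependent Gaussian (or Student-t) fields, so the resulting maxima inherit the spatial dependence of the max-stable model; larger numblock improves the approximation.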
Max-stable RFs can also be fitted using the least squares method proposed by Smith (1990). The following code shows an example for the Brown–Resnick case.

R> wlbr <- WLeastSquare(bres, coords, corrmodel = "stable",
+ model = "BrowResn", replicates = 30, fixed = list(sill = 1))
The extremal coefficient functions in (8) can be computed on the basis of the fitting results. These are useful for summarizing the extremal dependence. The next code shows how to use the routine Covariogram for computing the extremal coefficients for the case of the Brown– Resnick RF.
R> extcbr <- Covariogram(fbres, answer.extc = TRUE)
R> extcbrw <- Covariogram(wlbr, answer.extc = TRUE)
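For the extremal-Gaussian model, the extremal coefficient function has the well-known closed form θ(h) = 1 + {(1 − ρ(h))/2}^(1/2) (Schlather 2002), which can be coded directly. The sketch below (hypothetical helper names, not package routines) uses the powered exponential ("stable") correlation of the examples above:

```r
# Extremal coefficient of the extremal-Gaussian (Schlather) model with a
# powered exponential correlation rho(h) = exp(-(h / scale)^power).
rho_stable <- function(h, scale, power) exp(-(h / scale)^power)
theta_extgauss <- function(h, scale = 1.5, power = 1.2) {
  1 + sqrt((1 - rho_stable(h, scale, power)) / 2)
}
theta_extgauss(0)      # 1: complete dependence at distance zero
theta_extgauss(100)    # approaches 1 + sqrt(1/2), the model's upper bound
```

Note that this model cannot attain θ = 2 (complete independence): its extremal coefficient is bounded above by 1 + (1/2)^(1/2) ≈ 1.707.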
Non-parametric estimators of the extremal coefficient are also available. A useful estimator is obtained by means of the madogram and the F-madogram introduced by Cooley, Naveau, and Poncet (2006). For instance, the empirical version of the F-madogram is
ν̂(h) = 1 / (2 m |P_h|) Σ_{{z(s_j), z(s_r)} ∈ P_h} Σ_{i=1}^{m} |F{z^(i)(s_j)} − F{z^(i)(s_r)}|,   j, r ∈ J, h > 0,
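For a single pair of sites, this estimator can be sketched directly, with the unknown margin F replaced by the empirical distribution function of the m componentwise maxima at each site (a hypothetical helper, shown only as a hedged illustration):

```r
# One pair's contribution to the empirical F-madogram: F is replaced by the
# empirical CDF (ranks) of the m componentwise maxima at each site.
fmadogram_pair <- function(z_j, z_r) {
  m  <- length(z_j)
  Fj <- rank(z_j) / (m + 1)
  Fr <- rank(z_r) / (m + 1)
  mean(abs(Fj - Fr)) / 2
}
fmadogram_pair(1:10, 1:10)   # 0: identical series, complete dependence
```

Averaging such contributions over all pairs in P_h gives ν̂(h), from which the extremal coefficient can be recovered.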