January 2015, Volume 63, Issue 9. http://www.jstatsoft.org/
Analysis of Random Fields Using CompRandFld
Simone A. Padoan
Bocconi University
Moreno Bevilacqua
University of Valparaíso
Abstract
Statistical analysis based on random fields has become a widely used approach for understanding real processes in many fields, such as engineering and the environmental sciences. Inference for random field models can nevertheless be problematic to carry out, for example when dealing with large datasets, count or binary responses, or extreme-value data. This article explains how to perform, with the R package CompRandFld, challenging statistical analyses based on Gaussian, binary and max-stable random fields. The software provides tools for performing statistical inference based on the composite likelihood in complex problems where standard likelihood methods are difficult to apply. The principal features are illustrated by means of simulation examples and an application to Irish daily wind speeds.
Keywords: binary data, composite likelihood, covariance tapering, environmental data analysis, extremal coefficient, Gaussian random field, geostatistics, kriging, large dataset, max-stable random field, wind speed.
1. Introduction
Today, statistical analyses of one or more variables observed in space and time are useful in different areas, such as the environmental sciences and engineering (Cressie and Wikle 2011). Random fields (RFs) are the mathematical foundation for statistical spatial analyses; basic introductions to this topic can be found in Wackernagel (1998), Diggle and Ribeiro (2007) and Cressie and Wikle (2011), to name a few. One of the primary aims of spatio-temporal analyses, together with marginal information, is to assess the dependence structure of the underlying random field. In spatial or spatio-temporal applications, fitting models by the maximum likelihood (ML) method may be troublesome for several reasons. We discuss two causes: when the analysis concerns large datasets and when the analysis concerns datasets that are not normally distributed.
In the first case, for a dataset of size m, the computational cost of each likelihood evaluation depends on the dimension of the variance-covariance matrix of the random field. For most of the well-known covariance models, the expression of the ML estimator is unknown and hence the parameter estimation is obtained by numerically maximizing the likelihood. Each likelihood evaluation requires the computation of the determinant and the inverse of the variance-covariance matrix, tasks that are generally performed with algorithms based on the Cholesky factorization, demanding a computational burden of order O(m^3), see Bevilacqua, Gaetan, Mateu, and Porcu (2012) and the references therein. This can be very time consuming.
In the second case, two interesting types of data are widely discussed in the literature: categorical data and extreme values. One way to model spatial binary or categorical data is to dichotomize or categorize the outcome of a Gaussian RF into several classes depending on some threshold values, see for instance Heagerty and Lele (1998) and de Leon (2005). By doing this, the joint probability model depends on the underlying latent process. For a dataset of size m, the expression of the joint distribution involves a sum of q^m m-dimensional normal integrals, where q is the number of categories. The closed form of the ML estimator is unknown and hence, as an alternative, the parameter estimation can be done numerically, but with a high computational cost.
In order to model spatial extremes, one way is to work with max-stable RFs (see e.g., de Haan and Ferreira 2006, Chapter 9). These were introduced by de Haan (1984) as an extension of the block maxima approach from finite-dimensional to infinite-dimensional spaces. Also in this case likelihood-based inference is challenging. For most of the known models (e.g., Davison, Padoan, and Ribatet 2012), although the closed form of the joint distribution is known, with a dataset of size m the derivation of the likelihood function requires computing a number of derivatives equal to the mth Bell number (Davison and Gholamrezaee 2012). Thus, for most spatial applications this calculation is not feasible in practice.
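To get a feel for how quickly this becomes infeasible, the Bell numbers can be computed with the classical Bell-triangle recurrence; the following sketch (in Python, purely illustrative and not part of the package) shows their growth with m:

```python
# Bell numbers via the Bell triangle: bell(m) counts the set partitions of
# {1, ..., m}, i.e., the number of terms arising in the likelihood of a
# max-stable process observed at m sites.
def bell(m):
    row = [1]
    for _ in range(m - 1):
        new = [row[-1]]          # each row starts with the previous row's end
        for x in row:
            new.append(new[-1] + x)
        row = new
    return row[-1]               # last entry of row m is B_m

for m in (5, 10, 15, 20):
    print(m, bell(m))            # already at m = 20 the count is astronomical
```

Even a modest network of 20 stations would require on the order of 5 × 10^13 derivative terms, which is why the full likelihood is abandoned in favor of composite likelihoods below.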
Alternative approaches have been proposed in order to perform parametric statistical inference with spatial and spatio-temporal datasets. For instance, in the last decade the proposed strategies are based on simplifications of the covariance structure (e.g., Cressie and Johannesson 2008) or likelihood approximations (e.g., Stein, Chi, and Welty 2004; Fuentes 2007; Kaufman, Schervish, and Nychka 2008). Inference based on the composite likelihood function (Lindsay 1988) has been shown to be a valid competitor, for the following reasons. Firstly, the composite likelihood generalizes the ordinary likelihood and contains many interesting alternatives. Secondly, a number of theoretical results exist about the consistency and the asymptotic distribution of the related estimator and statistical tests. Thirdly, the computational cost is substantially lower than that of the likelihood. Finally, the approach is applicable to different types of data such as Gaussian, binary and extreme values. Examples of statistical analyses based on the composite likelihood are: Nott and Rydén (1999) and Heagerty and Lele (1998) for binary data, Padoan, Ribatet, and Sisson (2010) and Davison and Blanchet (2011) for extreme values, and Bevilacqua et al. (2012) for Gaussian responses.
For Gaussian, binary and max-stable RFs, several routines for performing the statistical inference based on the composite likelihood have been put together and implemented in an R (R Core Team 2014) package named CompRandFld (Padoan and Bevilacqua 2015). This article describes how to use this package to perform statistical analysis of random fields.
The article is organized as follows: Section 2 summarizes basic notions on Gaussian, binary and max-stable RFs, and Section 3 summarizes the principal concepts of the composite likelihood approach with some simulation examples that explain the main capabilities of CompRandFld.
2. Random fields
2.1. Gaussian random fields
Gaussian RFs are useful for describing the random behavior of the mean levels of a real process. Here, a few basic notions are recalled.
Let {Y(s, t), (s, t) ∈ I} be a real-valued random process defined on I, where I = S × T, with S ⊆ IR^d and T ⊆ IR. When T = {t_0} then I ≡ S and Y(s) ≡ Y(s, t_0) is a purely spatial random field. When S = {s_0} then I ≡ T and Y(t) ≡ Y(s_0, t) is a purely temporal random field. Let I := {1, ..., l} and J := {1, ..., k} be sets of indices and I* = I × J be the Cartesian product with cardinality m = k·l. It is assumed for simplicity that Y(s, t) is a second-order stationary spatio-temporal Gaussian RF, that is, IE{Y(s, t)} = µ and var{Y(s, t)} = ω^2 are finite constants for all (s, t) ∈ I and the covariance cov{Y(s, t), Y(s′, t′)} = IE{Y(s, t) Y(s′, t′)} − µ^2, (s, t), (s′, t′) ∈ I, can be expressed by c(h, u); namely, c(·, ·) is a positive definite function that depends only on the separations between the points, h = s′ − s and u = t′ − t (Cressie and Wikle 2011, Chapter 6). The covariance is then said to be translation-invariant. Since c(0, 0) = var{Y(s, t)}, ρ(h, u) = c(h, u)/c(0, 0) is the correlation function of Y. In order to take into account the local variation of real phenomena, the overall process is defined by Y(s, t) = Y_0(s, t) + ε(s, t), where ε(s, t) ∼ N(0, τ^2) is a white noise independent from Y_0.
The covariance takes the following functional form

c(h, u) = τ^2 1I_{(0,0)}((h, u)) + σ^2 ρ_0(h, u),   (h, u) ∈ IR^d × IR,   (1)

where 1I(·) is the indicator function, σ^2 > 0 and ρ_0 are the common variance and correlation of the RF, whereas τ^2 > 0 is the local variance that describes the so-called nugget effect (e.g., Diggle and Ribeiro 2007; Cressie and Wikle 2011, Chapter 4). The variance of Y(s, t) is equal to ω^2 = τ^2 + σ^2 and the covariance function is σ^2 ρ_0(h, u). Since Y(s, t) is also intrinsically stationary (Cressie and Wikle 2011, Chapter 6), its variability can equivalently be represented as 2γ(h, u) = var{Y(s + h, t + u) − Y(s, t)}. The function γ(·, ·) is called the semivariogram, the quantity 2γ(·, ·) is known as the variogram and, under second-order stationarity, γ(h, u) = c(0, 0) − c(h, u). A stationary space-time covariance is called spatially isotropic if c(h, u_0) depends only on (‖h‖, u_0) for any fixed temporal lag u_0, see Porcu, Montero, and Schlather (2012).
There is an extensive literature on parametric spatial covariance models, see Cressie and Wikle (2011, Chapter 4) and the references therein, and in the past years many parametric models for space-time covariances have been proposed. A possible covariance model is obtained as the product of any valid spatial and temporal covariance. This kind of covariance model, called a separable model, has been criticized for its lack of flexibility. For this reason, different classes of non-separable covariance models have been proposed recently in order to capture possible space-time interactions (e.g., Porcu, Mateu, and Christakos 2009; Cressie and Wikle 2011, Chapter 6). Some examples of the covariance models implemented in CompRandFld and used later in Sections 3 and 4 are found in Table 1. Observe that the first model is a separable space-time stable or power-exponential covariance, the second is a special case of the class proposed in Gneiting (2002), and the third and fourth are special cases of the classes proposed in De Iaco, Myers, and Posa (2002) and Porcu et al. (2009). Finally, the fifth and sixth models are purely spatial: the Matérn and the generalized Cauchy models.
Model           Functional form                                              Parameters' range
Double stable   ρ_st(h, u) = exp{−d_s(h) − d_t(u)}
Gneiting        ρ_st(h, u) = g_t(u)^{−1} exp{−d_s(h) g_t(u)^{−η κ_s/2}}      0 ≤ η ≤ 1
Iacocesare      ρ_st(h, u) = {g_t(u) + d_s(h)}^{−η}                          η > 0
Porcu           ρ_st(h, u) = {0.5 g_s(h)^η + 0.5 g_t(u)^η}^{−1/η}            0 ≤ η ≤ 1
Matern          ρ_s(h) = 2^{1−α} Γ(α)^{−1} d_s(h)^{α/κ_s} K_α(‖h‖/ψ_s)       α > 0
Gen. Cauchy     ρ_s(h) = g_s(h)^{−α/κ_s}                                     α > 0

with d_s(h) = (‖h‖/ψ_s)^{κ_s}, g_s(h) = 1 + d_s(h), d_t(u) = (|u|/ψ_t)^{κ_t}, g_t(u) = 1 + d_t(u), ψ_s, ψ_t > 0 and 0 < κ_s, κ_t ≤ 2.

Table 1: Parametric families of isotropic correlation models. The first four models are spatio-temporal and the last two are spatial; K_α is the modified Bessel function of order α, Γ is the gamma function, η is a separability parameter, α, κ_s, κ_t are smoothing parameters and ψ_s, ψ_t are scale parameters. The subscripts s and t stand for space and time.
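Two of the models in Table 1 can be coded directly from their functional forms; the sketch below (in Python, with arbitrary illustrative parameter values, not package defaults) checks the basic properties of a valid correlation:

```python
import math

# Building blocks of Table 1: d(lag) = (|lag|/scale)^kappa.
def d(lag, scale, kappa):
    return (abs(lag) / scale) ** kappa

# Double stable (separable): rho_st(h, u) = exp{-d_s(h) - d_t(u)}.
def double_stable(h, u, psi_s=0.2, psi_t=1.0, kappa_s=1.0, kappa_t=1.0):
    return math.exp(-d(h, psi_s, kappa_s) - d(u, psi_t, kappa_t))

# Gneiting (non-separable):
# rho_st(h, u) = g_t(u)^{-1} exp{-d_s(h) g_t(u)^{-eta kappa_s / 2}}.
def gneiting(h, u, psi_s=0.2, psi_t=1.0, kappa_s=1.0, kappa_t=1.0, eta=0.5):
    gt = 1.0 + d(u, psi_t, kappa_t)
    return math.exp(-d(h, psi_s, kappa_s) * gt ** (-eta * kappa_s / 2)) / gt

# A correlation equals 1 at the origin and decays with the lags.
assert double_stable(0.0, 0.0) == 1.0 and gneiting(0.0, 0.0) == 1.0
assert double_stable(0.4, 2.0) < double_stable(0.1, 0.5)
```

Here h stands for the spatial distance ‖h‖, since all models in Table 1 are isotropic.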
A complete list of the correlation functions implemented in CompRandFld is available in the online help of the routine CovMatrix.
2.2. Binary random fields
In some applications the focus is on modeling the presence or absence of a certain phenomenon, observable across space and time. To describe the stochastic behavior of such a process a binary RF is needed.
An approach for defining a spatio-temporal binary random field is the following (e.g., Heagerty and Lele 1998). Let {Y(s, t), (s, t) ∈ I} be a second-order stationary spatio-temporal Gaussian RF with correlation function ρ(h, u) that we assume known up to a vector of parameters ϑ. Then, a binary spatio-temporal RF is defined by
Z(s, t) = 1I_{(b,+∞)}(Y(s, t) + ε(s, t)),   (2)
where for all (s, t) ∈ I, ε(s, t) ∼ N(0, τ^2) is a white noise and b ∈ IR is a threshold. We assume that ε is independent from Y. With this definition, for all (s, t) ∈ I, the univariate marginal probability of Z is

p_1 ≡ pr(Z(s, t) = 1) = Φ((µ − b)/ω),

where µ is the mean of the latent process Y and Φ is the univariate standard Gaussian distribution function. In addition, for any (s, t), (s′, t′) ∈ I with (s, t) ≠ (s′, t′), the bivariate joint probabilities are equal to

p_11(h, u) ≡ pr{Z(s, t) = 1, Z(s′, t′) = 1} = Φ_2((µ − b)/ω, (µ − b)/ω; σ^2 ρ(h, u; ϑ)/ω^2),
p_00(h, u) = 1 − 2 p_1 + p_11(h, u),   (3)
p_10(h, u) ≡ p_01(h, u) = p_1 − p_11(h, u),
where Φ_2 is the bivariate standard Gaussian distribution function with correlation σ^2 ρ(h, u; ϑ)/ω^2, computed numerically in our package. The marginal probability of success p_1 and the joint probability p_11(h, u) completely characterize the bivariate distribution. Higher-order joint probabilities are similarly derived using the inclusion-exclusion principle. Thus, given m spatio-temporal points, the computation of the probabilities for all the related success and failure events requires the evaluation of high-dimensional normal integrals, up to order m.
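To make (3) concrete, the following sketch (in Python, with assumed toy values for µ, b, σ^2, τ^2 and the latent correlation ρ at some lag; Φ_2 computed here with SciPy) derives the four bivariate probabilities and checks that they form a valid distribution:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Illustrative values (not package defaults): latent mean, threshold,
# sill, nugget and latent correlation rho(h, u; theta) at a given lag.
mu, b = 0.0, 0.0
sigma2, tau2 = 0.6, 0.4              # omega^2 = sigma2 + tau2 = 1
omega = np.sqrt(sigma2 + tau2)
rho = 0.5

p1 = norm.cdf((mu - b) / omega)      # marginal success probability
r = sigma2 * rho / omega ** 2        # correlation entering Phi_2 in (3)
p11 = multivariate_normal(mean=[0, 0], cov=[[1, r], [r, 1]]).cdf(
    [(mu - b) / omega, (mu - b) / omega])
p00 = 1 - 2 * p1 + p11
p10 = p01 = p1 - p11

# The four cell probabilities of the 2 x 2 table must sum to one.
assert abs(p11 + p10 + p01 + p00 - 1.0) < 1e-8
assert 0 < p11 < p1 < 1
```

The identity p_11 + p_10 + p_01 + p_00 = 1 holds algebraically from (3), whatever correlation model is plugged in.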
2.3. Max-stable random fields
In some applications the focus is on modeling exceptionally high (low) levels of a real process. One way for describing the random behavior of spatial or spatio-temporal extremes is provided
by max-stable RFs. For simplicity let us focus on the spatial case. Let {Y(s)}_{s∈S} be a stationary RF with continuous sample paths and Y_1, ..., Y_n be n independent copies of it. Consider the pointwise maximum process M_n(s) := max_{i=1,...,n} Y_i(s), s ∈ S. If there exist continuous functions a_n(s) > 0 and b_n(s) ∈ IR, for all s ∈ S, such that a_n^{−1}(s){M_n(s) − b_n(s)} converges weakly to a process Z(s) with non-degenerate marginal distributions for all s ∈ S, then Z is a max-stable RF (e.g., de Haan and Ferreira 2006, Chapter 9). For any s ∈ S, max-stable random fields have univariate marginal generalized extreme value distributions and, for a finite set of locations, the joint distribution is
pr{Z(s_j) ≤ z_j, j ∈ J} = exp{−V(z)},   V(z) = k ∫_{Δ_{k−1}} max_{j∈J} (w_j/z_j) υ(dw),   z ∈ IR_+^k,   (4)

with unit Fréchet marginal distributions, where υ is a probability distribution defined on the (k − 1)-dimensional unit simplex Δ_{k−1} and V is the so-called exponent dependence function (e.g., de Haan and Ferreira 2006, Chapters 1 and 6). Examples of max-stable RFs are the
Brown-Resnick (Kabluchko, Schlather, and de Haan 2009; Kabluchko 2010), the extremal-Gaussian (Schlather 2002) and the extremal-t (Davison et al. 2012) processes. In the first case the exponent dependence function of the bivariate distribution is
V_1(z_1, z_2; h) = (1/z_1) Φ(λ(h)/2 + log(z_2/z_1)/λ(h)) + (1/z_2) Φ(λ(h)/2 + log(z_1/z_2)/λ(h)),   (5)

where λ(h) = √(2γ(h)), with γ denoting the semivariogram of the underlying Gaussian RF.
Equation (5) was initially derived by Hüsler and Reiss (1989) and is also the exponent function of the max-stable RF described by Smith (1990), but with a different form for λ(h). For these processes, extensions to the spatio-temporal case with dependence function λ(h, u) exist, see Kabluchko (2009) and Davis, Klüppelberg, and Steinkohl (2013). In the second case the bivariate
distribution depends on the exponent function

V_2(z_1, z_2; h) = (1/2)(1/z_1 + 1/z_2)[1 + {1 − 2 z_1 z_2 (ρ(h) + 1)/(z_1 + z_2)^2}^{1/2}],   (6)
where ρ(h) is the correlation function of the underlying Gaussian RF. Finally, in the third case, the bivariate distribution depends on the exponent function

V_3(z_1, z_2; h) = (1/z_1) T_{ν+1}[{(z_2/z_1)^{1/ν} − ρ(h)}/√({1 − ρ(h)^2}/(ν + 1))] + (1/z_2) T_{ν+1}[{(z_1/z_2)^{1/ν} − ρ(h)}/√({1 − ρ(h)^2}/(ν + 1))],   (7)

where T_{ν+1} is the Student-t distribution function with ν + 1 degrees of freedom and ρ(h) is the scaling (correlation) function of the underlying Gaussian RF.
A summary for describing the extremal dependence is ξ(h) = V(1, 1; h), named the pairwise extremal coefficient function (Smith 1990). It holds that 1 ≤ ξ(h) ≤ 2, where the lower bound represents the complete dependence case and the upper bound the independence case. For the above-mentioned processes the extremal coefficients are equal to

ξ_1(h) = 2Φ(√(γ(h)/2)),   ξ_2(h) = 1 + √({1 − ρ(h)}/2),   ξ_3(h) = 2T_{ν+1}(√((ν + 1){1 − ρ(h)}/{1 + ρ(h)})),   (8)

respectively.
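The three extremal coefficients in (8) are easy to evaluate; a short sketch (in Python, with assumed values for γ(h), ρ(h) and ν) confirms the 1-to-2 bounds stated above:

```python
import numpy as np
from scipy.stats import norm, t

def xi1(gamma_h):                       # Brown-Resnick, from (8)
    return 2 * norm.cdf(np.sqrt(gamma_h / 2))

def xi2(rho_h):                         # extremal-Gaussian, from (8)
    return 1 + np.sqrt((1 - rho_h) / 2)

def xi3(rho_h, nu):                     # extremal-t, from (8)
    return 2 * t.cdf(np.sqrt((nu + 1) * (1 - rho_h) / (1 + rho_h)), df=nu + 1)

# Complete dependence at one end of the parameter range...
assert abs(xi1(0.0) - 1.0) < 1e-8       # gamma(h) = 0
assert abs(xi3(1.0, 1.0) - 1.0) < 1e-8  # rho(h) = 1
# ...and values always inside [1, 2].
assert 1.0 <= xi2(0.9) <= xi2(-0.9) <= 2.0
```

Note that ξ_2 cannot reach the independence bound 2 for ρ(h) > −1, a known limitation of the extremal-Gaussian model.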
3. Composite likelihood approach
Composite likelihood (CL) is a general class of estimating functions based on the likelihood of marginal or conditional events (e.g., Varin, Reid, and Firth 2011). Let Y = (Y_1, ..., Y_m)^T be an m-dimensional random vector with joint distribution f(y; θ), θ ∈ Θ ⊆ IR^d. Let A_1, ..., A_R be marginal or conditional sets of Y; then the CL is defined as

ℓ_c(θ) = Σ_{r=1}^R ℓ(θ; A_r) w_r,   (9)

where ℓ(θ; A_r) is the log-likelihood related to the density f(y ∈ A_r; θ) obtained by considering only the random variables in A_r, and the w_r are suitable non-negative weights that do not depend on θ. The maximum composite likelihood (MCL) estimator is given by θ̂ = argmax_θ ℓ_c(θ).
The CL estimation method has nice properties (e.g., Varin et al. 2011), among which: if the sample size n is such that n ≫ m and n goes to infinity, then θ̂ is weakly consistent and asymptotically normally distributed with variance-covariance matrix G(θ)^{−1}, where

G(θ) = H(θ) J(θ)^{−1} H(θ)^T   (10)

is the Godambe information matrix and

H(θ) = −IE[∇^2 ℓ_c(θ)],   J(θ) = IE[∇ℓ_c(θ) ∇ℓ_c(θ)^T],

where ∇ denotes the vector differential operator. On the other hand, if n ≪ m or dependent data are available, such as a single spatial sequence or a univariate time series, then the above-mentioned asymptotic results do not necessarily hold, see Cox and Reid (2004). Then,
consistency and asymptotic normality of the estimator, as m goes to infinity, must be shown case by case. Likelihood-based statistics such as the Wald, score and likelihood ratio statistics and their asymptotic distributions are also available in the CL setting (Chandler and Bate 2007; Varin et al. 2011; Pace, Salvan, and Sartori 2011). In addition, the Akaike information criterion used for model selection in the CL context assumes the following form:

CLIC = −2 ℓ_c(θ̂) + 2 tr{H(θ̂) G(θ̂)^{−1}},   (11)

where tr(A) is the trace of the matrix A, see Varin and Vidoni (2005). The CL is a wide class of functions that also includes the standard likelihood and other alternatives as special cases. For example, the former is obtained letting A_r = Y, R = 1 and w_r = 1. Depending on the choice of the sets A_r, different types of
Figure 1: Hierarchical structure involved with the composite likelihood approach: the full likelihood (variants: restricted, tapered), the marginal composite likelihood (variants: independence, pairwise, difference, triwise) and the conditional composite likelihood (variants: neighbours, pairwise, triwise, full).
CL can be obtained. One of the most explored cases is the pairwise likelihood. Considering the marginal pairs A_r = (Y_i, Y_j), we obtain the pairwise marginal log-likelihood. If we let A_r = (Y_i | Y_j), we obtain the pairwise conditional log-likelihood and, finally, setting A_r = (Y_i − Y_j), we obtain the pairwise difference log-likelihood. In these special cases, the log-likelihoods for a single data contribution are respectively:
ℓ_pm(θ) = Σ_{i=1}^m Σ_{j>i} log f(y_i, y_j; θ) w_ij,   (12)

ℓ_pc(θ) = Σ_{i=1}^m Σ_{j≠i} log f(y_i | y_j; θ) w_ij,   (13)

ℓ_pd(θ) = Σ_{i=1}^m Σ_{j>i} log f(y_i − y_j; θ) w_ij.   (14)
For these likelihoods the computational burden is O(m^2). In principle, the role of the weights is to improve the statistical efficiency and save computational time, see for example Padoan et al. (2010), Bevilacqua et al. (2012), Bevilacqua and Gaetan (2014) and Sang and Genton (2014). The diagram in Figure 1 summarizes the types of likelihoods belonging to the CL class that have been implemented in the package. For comparison purposes we have also implemented other variants of the ordinary likelihood, such as the restricted likelihood of Harville (1977) and the tapered likelihood of Kaufman et al. (2008).
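As an illustration of (12) with binary distance weights, the following sketch (in Python, with toy data, an exponential correlation and an arbitrary cutoff, none of which are package defaults) evaluates the pairwise marginal log-likelihood of a zero-mean spatial Gaussian RF:

```python
import numpy as np

def pairwise_marginal(y, coords, sill, scale, cutoff):
    """Pairwise marginal log-likelihood (12): sum of bivariate Gaussian
    log-densities over pairs, with binary weights w_ij = 1I(h <= cutoff)."""
    ll = 0.0
    m = len(y)
    for i in range(m):
        for j in range(i + 1, m):
            h = np.linalg.norm(coords[i] - coords[j])
            if h > cutoff:                    # weight w_ij = 0: pair skipped
                continue
            rho = np.exp(-h / scale)          # exponential correlation
            cov = sill * np.array([[1, rho], [rho, 1]])
            pair = np.array([y[i], y[j]])
            sign, logdet = np.linalg.slogdet(cov)
            ll += -0.5 * (logdet + pair @ np.linalg.solve(cov, pair)
                          + 2 * np.log(2 * np.pi))
    return ll

rng = np.random.default_rng(1)
coords = rng.uniform(0, 1, size=(30, 2))
y = rng.normal(size=30)
# Each term is a valid 2-dimensional log-density, hence the O(m^2) cost.
assert pairwise_marginal(y, coords, 1.0, 0.2, 0.5) < 0.0
```

The double loop makes the O(m^2) cost explicit; the cutoff plays the role of maxdist in FitComposite.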
3.1. The Gaussian case
Assume that y = {y(s_1, t_1), ..., y(s_k, t_l)}^T is a realisation from a spatio-temporal Gaussian RF (see Section 2.1): y has an m-dimensional normal density with parameter vector θ = (µ, τ^2, σ^2, ϑ)^T, where ϑ is a vector of correlation parameters (see for example the models in Table 1). In the Gaussian case, there are three estimation methods available. Standard ML
Model             Functional form
Wendland1         ρ_s(h; c_s) = {1 − d_s(h)}_+^2 {1 + d_s(h)/2}
Wendland2         ρ_s(h; c_s) = {1 − d_s(h)}_+^4 {1 + 4 d_s(h)}
Double Wendland2  ρ_st(h, u; c_s, c_t) = {1 − d_s(h)}_+^4 {1 + 4 d_s(h)} × {1 − d_t(u)}_+^4 {1 + 4 d_t(u)}

with d_s(h) = ‖h‖/c_s, d_t(u) = |u|/c_t and c_s, c_t > 0.

Table 2: Parametric families of isotropic taper functions.
estimation is performed setting likelihood = "Full" and type = "Standard" in the FitComposite routine. This method is recommended for small datasets. θ is estimated by maximizing the full likelihood
ℓ(θ) = −(1/2) log|Σ| − (1/2)(y − µ)^T Σ^{−1} (y − µ),   (15)

where µ = µ1_m and Σ = {τ^2 1I_{(r,q)}((j, i)) + σ^2 ρ_st(‖s_j − s_r‖, |t_i − t_q|); (i, j), (q, r) ∈ I*} is the m × m covariance matrix defined by (1). The restricted version is also available and can be used setting type = "Restricted". Under increasing domain, the asymptotic variance-covariance matrix of the ML estimator is given by the inverse of the Fisher information matrix I(θ), see Mardia and Marshall (1984).
The second estimation method is based on the tapered likelihood (Kaufman et al. 2008), recommended for large datasets. This method is performed setting the parameter type = "Tapering". The tapered likelihood is given by

ℓ_tap(θ, c) = −(1/2) log|Σ̃| − (1/2)(y − µ)^T {Σ̃^{−1} ∘ Ω_c}(y − µ),   (16)

where Ω_c = {ρ_st(‖s_j − s_r‖, |t_i − t_q|; c); (i, j), (q, r) ∈ I*} is the m × m taper matrix, Σ̃ = Σ ∘ Ω_c is the m × m tapered covariance matrix and ∘ is the Schur product. Specifically, ρ_st(·, ·; c) is a correlation function (the taper function) depending on a parameter vector c, which defines the compact support in space, or in time, or both (Wendland 1995). Hence, tapering in likelihood (15) both the covariance matrix Σ and its inverse, which is replaced by Σ̃^{−1} ∘ Ω_c, yields the tapered likelihood (16). The parameter taper of FitComposite allows us to set different taper
functions. A list of the taper models implemented in the package is reported in Table 2. The first two are spatial tapers, while the last is a spatio-temporal taper. The compact support parameters for space and time can be set providing two positive values c_s and c_t at maxdist and maxtime (see Table 2). Likelihood (16) is implemented in CompRandFld and in the optimization procedure it calls specific routines of the package spam (Furrer and Sain 2010; Furrer 2014) for handling the sparse matrix Σ̃. Under increasing domain, the maximizer of (16) is
a consistent estimator and has an asymptotic Gaussian distribution, with variance given by the inverse of the Godambe matrix, which is given as in (10) with

H(θ) = −IE[∇^2 ℓ_tap(θ, c)],   J(θ) = IE[∇ℓ_tap(θ, c) ∇ℓ_tap(θ, c)^T],

see Shaby and Ruppert (2012). The tapering can have an effect, depending on the choice of the
compact support, when assessing a correlation structure. For example, the top-left panel of Figure 2 reports a simulation from a zero-mean, unit-variance spatial Gaussian random field
Figure 2: A simulation from a zero-mean, unit-variance spatial Gaussian random field with exponential correlation and scale parameter c_s = 0.2 is reported in the top-left panel. The same simulation is shown from the top-right to the bottom-right panels, tapering the correlation using a Wendland function with compact supports 0.6, 0.4 and 0.2.
with exponential correlation and scale parameter c_s = 0.2 on the unit square. The simulations are repeated by tapering the covariance matrix with the first Wendland function of Table 2, using the compact supports 0.6, 0.4 and 0.2. When the compact support is greater than or equal to the practical range (0.6), the simulation in the top-right panel shows no difference from the original simulation, while for decreasing values of the compact support, the simulations in the two bottom panels show a decreasing range of the correlation structure.
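The Schur-product construction behind (16) can be sketched as follows (in Python, with an exponential covariance, the Wendland1 taper of Table 2 and an arbitrary compact support of 0.2; none of this is package code):

```python
import numpy as np

def wendland1(h, cs):
    """Wendland1 taper of Table 2: {1 - h/cs}_+^2 {1 + (h/cs)/2}."""
    d = h / cs
    return np.where(d < 1, (1 - d) ** 2 * (1 + d / 2), 0.0)

rng = np.random.default_rng(3)
coords = rng.uniform(0, 1, size=(100, 2))
h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

sigma = np.exp(-h / 0.2)                 # exponential covariance, unit sill
sigma_tap = sigma * wendland1(h, 0.2)    # tapered covariance (Schur product)

# Tapering zeroes all covariances beyond the compact support, making the
# matrix sparse, while the diagonal (the variances) is left unchanged.
assert np.all(np.diag(sigma_tap) == 1.0)
assert np.mean(sigma_tap == 0.0) > np.mean(sigma == 0.0)
```

It is this sparsity that lets packages such as spam apply fast sparse-matrix algorithms to Σ̃.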
The third estimation method is based on the pairwise likelihoods (12)–(14), also recommended for large datasets. Bevilacqua and Gaetan (2014) have shown, by means of a simulation study under increasing-domain asymptotics and for a fixed compact support, that the pairwise likelihood is the fastest of the estimation methods compared. For instance, Figure 3 shows that the ratio between the computational time required for a single numerical evaluation of the full and the pairwise, and of the tapered and the pairwise likelihoods, is greater than one, stressing that the pairwise likelihood is the fastest estimation method. The ratios have been computed for different sample sizes: for increasing sample size, the computational advantage of the pairwise likelihood becomes increasingly prominent. CL estimation is performed setting likelihood = "Marginal" and type = "Difference" in FitComposite. This is based on the difference pairwise likelihood
ℓ_pd(θ) = −Σ_{(i,j,q,r)∈D} [log γ_ijqr(σ^2, τ^2, ϑ)/2 + {y(s_j, t_i) − y(s_r, t_q)}^2/{4 γ_ijqr(σ^2, τ^2, ϑ)}] w_ijqr,
Figure 3: Points depict the ratio between the times required to numerically evaluate the full and the pairwise likelihood. The crosses report the comparison between the tapered and the pairwise likelihood.

where D = {(i, j, q, r) : q > i, i ∈ I, if r = j, j ∈ J; i, q ∈ I, if r > j, j ∈ J}   (17)
and γ_ijqr(σ^2, τ^2, ϑ) = τ^2 1I_{(r,q)}((j, i)) + σ^2 {1 − ρ_st(‖s_j − s_r‖, |t_i − t_q|; ϑ)} is the variogram. The pairwise marginal or conditional likelihoods (12) and (13) can be used setting likelihood = "Marginal" and type = "Pairwise" or likelihood = "Conditional". Binary weights
w_ijqr = 1 if ‖s_j − s_r‖ ≤ c_s and |t_i − t_q| ≤ c_t, and w_ijqr = 0 otherwise,
can be set by providing two positive values c_s and c_t at maxdist and maxtime. Under increasing domain the maximizer of the CL is consistent and asymptotically normal with variance-covariance matrix as in (10) (Bevilacqua et al. 2012). Standard errors can be obtained setting varest = TRUE. These are obtained, for the three likelihood methods, as the square roots of the diagonal elements of the inverses of the Fisher and Godambe information matrices. In the Gaussian case the analytical expressions of these matrices are known.
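The sandwich construction (10) behind these standard errors is just a few matrix operations; a numerical sketch (in Python, with arbitrary 2 × 2 stand-in matrices for H and J, not values produced by the package):

```python
import numpy as np

# Illustrative stand-ins for H = -E[grad^2 l_c] and J = E[grad l_c grad l_c^T].
H = np.array([[8.0, 1.0], [1.0, 5.0]])
J = np.array([[10.0, 2.0], [2.0, 6.0]])

G = H @ np.linalg.solve(J, H.T)          # Godambe matrix G = H J^{-1} H^T
std_err = np.sqrt(np.diag(np.linalg.inv(G)))

# Sanity checks: with the full likelihood H = J, so G collapses to the
# Fisher information, and the standard errors are strictly positive.
assert np.allclose(H @ np.linalg.solve(H, H.T), H)
assert np.all(std_err > 0)
```

When H ≠ J, as is typical for composite likelihoods, G^{−1} differs from H^{−1} and the naive "inverse Hessian" standard errors would be wrong; this is why FitComposite estimates both matrices.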
In the composite likelihood case, however, since the expression of J is unwieldy, its numerical evaluation can be computationally demanding. To obtain standard errors quickly, two options are implemented. If a single data replication is available, then the matrix is computed with a sub-sampling technique, obtained by specifying vartype = "subsamp". If data replications are available, then the Godambe matrix is computed using its sampling version (Varin et al. 2011), obtained by specifying vartype = "sampling". Here is a practical example.
R> library("CompRandFld")
R> set.seed(13)
R> coords <- cbind(runif(25, 0, 1), runif(25, 0, 1))
R> times <- 1:20
R> param <- list(nugget = 0, mean = 1, scale_s = 2, scale_t = 3, sill = 1)
R> data <- RFsim(coordx = coords, coordt = times, corrmodel = "exp_exp",
+    param = param)$data
This code shows how to use the routine RFsim to simulate a spatio-temporal random field at 25 locations and for 20 times. The correlation is specified with corrmodel and the parameters with param. The model used is the double exponential, the first of Table 1 with κ_s = κ_t = 1. Note that simulations from Gaussian random fields are performed using the Cholesky decomposition method. The next code shows the use of the FitComposite routine to perform MCL estimation of the parameters.
R> pml <- FitComposite(data, coordx = coords, coordt = times,
+ corrmodel = "exp_exp", likelihood = "Marginal", type = "Pairwise",
+ start = list(scale_s = 2, scale_t = 3, sill = 1),
+ fixed = list(mean = 1, nugget = 0), varest = TRUE, maxdist = 0.5,
+ maxtime = 3)
R> pml$param
scale_s scale_t sill
2.244840 5.739743 1.392082
With start, users can specify a starting value for any parameter that needs to be estimated, while with fixed, users can specify a value for a parameter that does not need to be estimated. Starting values are admitted only for parameters that are not fixed. Note that the routine CorrelationParam returns the list of parameters for any correlation model. Estimates are obtained by fitting the marginal pairwise likelihood. Binary weights are used so that only pairs of observations with spatial distances and temporal intervals smaller than or equal to 0.5 and 3 are retained. print(pml) provides the summary of the fitting, omitted here for brevity. The collected results are: the parameter estimates with related standard errors and variance-covariance matrix, the value that maximizes the log pairwise likelihood, the value of the CLIC (11) and other information. The fitting results can be used for estimating the correlation or variogram function, computing the practical range (Diggle and Ribeiro 2007) with the Covariogram routine, or, as the following code shows, for predicting the process at ungauged locations for future times.
R> times_to_pred <- 21:23
R> x <- seq(0, 1, 0.02)
R> loc_to_pred <- as.matrix(expand.grid(x, x))
R> param <- as.list(c(pml$param, pml$fixed))
R> pr <- Kri(data, coords, coordt = times, loc = loc_to_pred,
+ time = times_to_pred, corrmodel = "exp_exp", param = param)
The Kri routine performs simple kriging (or ordinary kriging, Diggle and Ribeiro 2007) at the new set of locations loc_to_pred and times times_to_pred. A dataset, spatio-temporal coordinates and a correlation model with the estimated parameters need to be specified. The
Figure 4: Simple kriging on a 51×51 grid of points on [0,1] and for 3 times (from left to right). The top panels report the prediction and the bottom the standard error of the prediction. The points are the locations where the data has been generated.
routine returns two matrices (together with other information), one with the predicted values and one with their associated standard errors. With standard R code, the results can be easily
visualized. Figure 4 shows, in the top and bottom panels, the predicted values and their standard errors, on a 51 × 51 grid of points in [0, 1] and for three consecutive times (left to right). Due to the strong dependence structure, we see that a noticeable dependence remains over time (although decreasing) and that the lower variabilities are recorded at the points where the data have been simulated. In CompRandFld simple and tapered kriging are implemented; the default option is simple kriging. This is performed working with the full covariance matrix, which is computationally intensive for large spatial or spatio-temporal data. A remedy is to use the tapered kriging proposed by Furrer, Genton, and Nychka (2006). Tapered kriging is an approximation of classical kriging, faster to compute and hence useful when dealing with large datasets. With tapered kriging, the tapered covariance matrix is used in the simple kriging systems instead of the full covariance matrix, so that the solution is obtained using fast algorithms for sparse matrices. The routine Kri calls specific routines of the package spam (Furrer and Sain 2010) in order to handle sparse matrices. Tapered kriging can be done setting type = "Tapering" and specifying values for the parameters taper, maxdist and maxtime. For performing tapered kriging see also Nychka, Furrer, and Sain (2014).
3.2. Binary case
Assume that z(s_1, t_1), ..., z(s_k, t_l) is a sample from a spatio-temporal binary random field (see Section 2.2), where z(s_j, t_i) = 1 (z(s_j, t_i) = 0) represents the success (failure) event. As explained in Section 2.2, the evaluation of the full likelihood is too computationally demanding, even for moderate sample sizes. For this reason, Heagerty and Lele (1998) proposed performing the parameter estimation of the probit model (2) using the pairwise marginal likelihood
ℓ_pm(θ) = Σ_{(i,j,q,r)∈D} [z(s_j, t_i) z(s_r, t_q) log p_11(‖s_j − s_r‖, |t_i − t_q|)
    + {z(s_j, t_i) + z(s_r, t_q) − 2 z(s_j, t_i) z(s_r, t_q)} log p_01(‖s_j − s_r‖, |t_i − t_q|)
    + {1 − z(s_j, t_i)}{1 − z(s_r, t_q)} log p_00(‖s_j − s_r‖, |t_i − t_q|)],   (18)
where D is as in (17) and the probabilities pxy(·, ·), x, y = 0, 1, are described in (3). MCL
fit-ting can be obtained with the routine FitComposite specifying the values model = "BinaryGauss", likelihood = "Marginal" and type = "Pairwise". The conditional like-lihood (12) is also available and it can be used by setting likelike-lihood = "Conditional".
Observe that, for simplicity and identifiability, we follow the parametrization of Heagerty and Lele (1998), where a constant mean is assumed, with τ2 + σ2 = 1, 0 < σ2 < 1, τ2 = 1 − σ2 and b = 0. Here is a practical example.

R> set.seed(11)
R> x <- seq(0, 10, 1)
R> y <- seq(0, 10, 1)
R> times <- seq(1, 15, 1)
R> param <- list(nugget = 0, mean = 0, scale_t = 1.5, scale_s = 2,
+ sill = 0.6)
R> data <- RFsim(x, y, times, corrmodel = "exp_exp", model = "BinaryGauss",
+ param = param, grid = TRUE)$data
This code shows how to simulate a binary random field with the routine RFsim. Since the process is based on a latent Gaussian RF, a correlation function and its parameters need to be specified. Figure 5 shows the simulation for the first four time instants (top-left to bottom-right), performed on an 11 × 11 grid on [0, 10]^2 and for 15 times. The dependence
can be estimated using the lorelogram function (Heagerty and Zeger 1998). The lorelogram estimator is defined as the logarithm of the pairwise odds ratio,
π(‖s_j − s_r‖, |t_i − t_q|) = log [ p_11(‖s_j − s_r‖, |t_i − t_q|) p_00(‖s_j − s_r‖, |t_i − t_q|) / { p_10(‖s_j − s_r‖, |t_i − t_q|) p_01(‖s_j − s_r‖, |t_i − t_q|) } ].   (19)
In the binary case, it provides a better measure of dependence than the covariance function: for instance, it is easier to interpret and its upper and lower bounds do not depend on the mean. The following code shows how to compute the empirical estimator of (19) using the routine EVariogram.
R> lor <- EVariogram(data, x, y, times, type = "lorelogram", grid = TRUE,
+ maxdist = 10, maxtime = 10)
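Conceptually, the empirical lorelogram at a given spatial distance and time lag reduces to a log odds ratio of pair counts. A minimal sketch of that computation (a hypothetical helper, not EVariogram's actual code):

```r
# Log odds ratio (19) from the 2x2 table of pair counts at a given spatial
# distance and time lag; n_xy = number of pairs taking values (x, y).
lorelogram_hat <- function(n11, n10, n01, n00) {
  log((n11 * n00) / (n10 * n01))
}
lorelogram_hat(n11 = 4, n10 = 2, n01 = 2, n00 = 4)   # log(4): positive association
```

A value of zero indicates no association at that distance/lag, while large positive values indicate strong positive dependence.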
R> pml <- FitComposite(data, x, y, times, corrmodel = "exp_exp",
+ grid = TRUE, maxdist = 2, maxtime = 4, model = "BinaryGauss",
+ fixed = list(nugget = 0), start = list(mean = 0, scale_s = 2,
+ scale_t = 1.5, sill = 0.5))
Figure 5: Simulations of spatio-temporal binary random fields on an 11 × 11 grid. From the left to the right panel, the simulations of 4 consecutive times are reported. The green and gray colors represent the cases Y(s, t) = 1 and Y(s, t) = 0, respectively.
     mean   scale_s   scale_t      sill
0.0162018 2.1896897 1.8202424 0.9560474

R> library("scatterplot3d")
R> Covariogram(pml, vario = lor, show.vario = TRUE)
The above code shows how to compute a parametric estimate of the lorelogram. First, the FitComposite routine computes the parameter estimates of the underlying latent Gaussian random field. With print(pml), the summary of the fitting results is returned (omitted here for brevity), and with pml$param only the parameter estimates are displayed. Second, to compute a smooth estimate of the lorelogram based on the output obtained from FitComposite, the routine Covariogram is used. This also displays the estimated function
(using scatterplot3d, Ligges and Mächler 2003). Figure 6 shows the results. The left and right
panels display the non-parametric and parametric estimates of the spatio-temporal lorelogram versus spatial distances and temporal intervals.
3.3. Max-stable case
Assume that independent copies, z1, . . . , zm, of the vector of spatial componentwise maxima
z = {z(s1), . . . , z(sk)}^⊤, are available. Examples of real data analyses using this type of extreme data are Padoan et al. (2010), Davison and Blanchet (2011) and Davison et al. (2012).
Suppose for simplicity that the random behavior of the spatial maxima is well described by one
of the classes of spatial max-stable RFs described in Section 2.3. For those families, although
Figure 6: Spatio-temporal lorelogram versus spatial distances and time intervals. The left and right panels report the empirical and parametric estimates of the lorelogram, respectively.
the computation of their density functions is unfeasible in practice. Indeed, given the particular form of a max-stable distribution, see (4), deriving the k-dimensional density requires a number of terms equal to the kth Bell number (Davison and Gholamrezaee 2012). Nevertheless, the bivariate density is still feasible to compute in practice and it has the form
exp{−V (z1, z2)}[{∂V (z1, z2)/∂z1}{∂V (z1, z2)/∂z2} − ∂2V (z1, z2)/∂z1∂z2].
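This form is easy to check numerically. The sketch below (an illustration, not the package's implementation) evaluates the bivariate density from a user-supplied exponent function V via finite differences; the complete-independence exponent V(z1, z2) = 1/z1 + 1/z2 serves as a test case, since there the density factorizes into a product of unit Fréchet densities.

```r
# Bivariate max-stable density exp(-V) * (dV/dz1 * dV/dz2 - d2V/dz1dz2),
# with derivatives of a supplied exponent function V taken numerically.
biv_density <- function(V, z1, z2, eps = 1e-4) {
  V1  <- (V(z1 + eps, z2) - V(z1 - eps, z2)) / (2 * eps)
  V2  <- (V(z1, z2 + eps) - V(z1, z2 - eps)) / (2 * eps)
  V12 <- (V(z1 + eps, z2 + eps) - V(z1 + eps, z2 - eps) -
          V(z1 - eps, z2 + eps) + V(z1 - eps, z2 - eps)) / (4 * eps^2)
  exp(-V(z1, z2)) * (V1 * V2 - V12)
}
V_indep <- function(z1, z2) 1 / z1 + 1 / z2   # complete independence
# Under independence the result matches z1^-2 exp(-1/z1) * z2^-2 exp(-1/z2):
biv_density(V_indep, 1, 2)
```

The same check can be repeated with the analytic V of any of the families in (5)–(7).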
Taking the first and second order derivatives of (5)–(7) allows us to obtain the Brown–Resnick, extremal-Gaussian and extremal-t bivariate densities. Therefore, for inference the full likelihood can be replaced by the pairwise marginal likelihood (12). Details on CL inference for max-stable random fields are provided by Padoan et al. (2010) and Davison et al. (2012). Here, practical examples are illustrated.

R> library("RandomFields")
R> set.seed(3132)
R> coords <- cbind(runif(100, 0, 10), runif(100, 0, 10))
R> param <- list(mean = 0, sill = 1, nugget = 0, scale = 1.5, power = 1.2)
R> extg <- RFsim(coords, corrmodel = "stable", model = "ExtGauss",
+ replicates = 30, param = param)$data
This code shows how to simulate an extremal-Gaussian RF with unit Fréchet margins with the routine RFsim. Internally, the simulation is performed using the routine MaxStableRF of the R package RandomFields (Schlather, Malinowski, Menck, Oesting, and Strokorb 2015). The correlation model and the parameters of the extremal-Gaussian model (Schlather 2002) can be specified with corrmodel and param. The next code shows how to perform the composite likelihood fitting of the extremal-Gaussian model, using the routine FitComposite.
R> fextg <- FitComposite(extg, coords, corrmodel = "stable",
+ model = "ExtGauss", replicates = 30, varest = TRUE,
+ vartype = "Sampling", margins = "Frechet", fixed = list(sill = 1),
+ start = list(scale = 1.5, power = 1.2))
    power     scale
0.9156696 1.7135199
With print(fextg) the summary of the fitting results is returned (omitted here for brevity) and with fextg$param only the parameter estimates are displayed. Setting varest = TRUE, the standard errors of the estimates are computed; with vartype = "Sampling", the matrices H and J in (10) are estimated by their sample versions. Note that other correlation models, such as
the Whittle-Matérn or the generalized Cauchy (see Table 1), can also be fitted. The next code shows how to simulate and fit the Brown–Resnick and extremal-t models.

R> bres <- RFsim(coords, corrmodel = "stable", model = "BrowResn",
+ param = param, numblock = 1000, replicates = 30)$data
R> extt <- RFsim(coords, corrmodel = "stable", model = "ExtT",
+ param = c(param, df = 3), numblock = 1000, replicates = 30)$data
R> fbres <- FitComposite(bres, coords, corrmodel = "stable",
+ model = "BrowResn", replicates = 30, varest = TRUE,
+ vartype = "Sampling", margins = "Frechet",
+ start = list(scale = 1.5, power = 1.2))
R> fextt <- FitComposite(extt, coords, corrmodel = "stable",
+ model = "ExtT", replicates = 30, varest = TRUE,
+ vartype = "Sampling", margins = "Frechet",
+ fixed = list(sill = 1), start = list(df = 3, scale = 1.5, power = 1.2))
The first two commands perform simulations from the two respective random fields. Simulations are obtained by computing the pointwise maximum (appropriately rescaled) over a number of replications of the underlying Gaussian or Student-t random field. The number of replications is specified with numblock.
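The block-maxima construction behind numblock can be sketched in a few lines. The following is a schematic illustration with independent sites, not RFsim's actual code: each replicate of the latent Gaussian field is transformed to unit Fréchet margins and the rescaled pointwise maximum is taken.

```r
# Approximate unit Frechet max-stable margins by block maxima: transform n
# iid Gaussian replicates to unit Frechet and take the rescaled pointwise max.
set.seed(1)
n <- 1000                         # plays the role of numblock
k <- 5                            # number of sites (independent here, for brevity)
g <- matrix(rnorm(n * k), n, k)   # n replicates of the latent field at k sites
frech <- -1 / log(pnorm(g))       # probability integral transform to unit Frechet
z <- apply(frech, 2, max) / n     # rescaled pointwise maxima
```

In the actual simulation the n replicates are spatially dependent Gaussian (or Student-t) fields, so the resulting maxima inherit the spatial dependence of the max-stable model; larger numblock improves the approximation.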
Max-stable RFs can also be fitted using the least squares method proposed by Smith (1990). The following code shows an example for the Brown–Resnick case.

R> wlbr <- WLeastSquare(bres, coords, corrmodel = "stable",
+ model = "BrowResn", replicates = 30, fixed = list(sill = 1))
The extremal coefficient functions in (8) can be computed on the basis of the fitting results. These are useful for summarizing the extremal dependence. The next code shows how to use the routine Covariogram for computing the extremal coefficients for the case of the Brown– Resnick RF.
R> extcbr <- Covariogram(fbres, answer.extc = TRUE)
R> extcbrw <- Covariogram(wlbr, answer.extc = TRUE)
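For the extremal-Gaussian model, the extremal coefficient function has the well-known closed form θ(h) = 1 + {(1 − ρ(h))/2}^(1/2) (Schlather 2002), which can be coded directly. The sketch below (hypothetical helper names, not package routines) uses the powered exponential ("stable") correlation of the examples above:

```r
# Extremal coefficient of the extremal-Gaussian (Schlather) model with a
# powered exponential correlation rho(h) = exp(-(h / scale)^power).
rho_stable <- function(h, scale, power) exp(-(h / scale)^power)
theta_extgauss <- function(h, scale = 1.5, power = 1.2) {
  1 + sqrt((1 - rho_stable(h, scale, power)) / 2)
}
theta_extgauss(0)      # 1: complete dependence at distance zero
theta_extgauss(100)    # approaches 1 + sqrt(1/2), the model's upper bound
```

Note that this model cannot attain θ = 2 (complete independence): its extremal coefficient is bounded above by 1 + (1/2)^(1/2) ≈ 1.707.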
Non-parametric estimators of the extremal coefficient are also available. A useful estimator is obtained by means of the madogram and the F-madogram introduced by Cooley, Naveau, and Poncet (2006). For instance, the empirical version of the F-madogram is
ν̂(h) = 1 / (2 m |P_h|) Σ_{{z(s_j), z(s_r)} ∈ P_h} Σ_{i=1}^{m} |F{z^(i)(s_j)} − F{z^(i)(s_r)}|,   j, r ∈ J, h > 0,
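For a single pair of sites, this estimator can be sketched directly, with the unknown margin F replaced by the empirical distribution function of the m componentwise maxima at each site (a hypothetical helper, shown only as a hedged illustration):

```r
# One pair's contribution to the empirical F-madogram: F is replaced by the
# empirical CDF (ranks) of the m componentwise maxima at each site.
fmadogram_pair <- function(z_j, z_r) {
  m  <- length(z_j)
  Fj <- rank(z_j) / (m + 1)
  Fr <- rank(z_r) / (m + 1)
  mean(abs(Fj - Fr)) / 2
}
fmadogram_pair(1:10, 1:10)   # 0: identical series, complete dependence
```

Averaging such contributions over all pairs in P_h gives ν̂(h), from which the extremal coefficient can be recovered.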