
Extreme value prediction: a computational approach



Master's Degree Programme in Computer Science

Master's Thesis

Extreme value prediction:
A computational approach

Supervisor
Ch. Prof. Federica Giummolè

Candidate
Simone Lorenzon
Student ID 822388

Academic Year 2012/2013


Contents

List of Tables
List of Figures
Introduction

1 Extreme Value Theory
1.1 Classic Theory
1.1.1 Generalized extreme value distribution
1.1.2 Parameter Estimation
1.1.3 Estimation for Minima
1.1.4 Model Checking
1.2 New Developments
1.2.1 Point Process Characterization
1.2.2 Threshold Models
1.2.2.1 Threshold selection
1.2.2.2 Parameter Estimation
1.2.2.3 Return Levels
1.2.2.4 Model Checking
1.3 Extremes of Dependent Processes
1.3.1 Modelling Stationary Series
1.4 Non-stationary Processes
1.4.1 Parameter Estimation
1.4.2 Model Checking

2 Calibrated Prediction Regions
2.1 Methods
2.1.1 Corrected Quantiles
2.1.2 Calibrated Quantiles
2.2 Algorithm
2.3 Results
2.4 Multivariate Case
2.4.1 Bivariate extreme value distribution
2.4.2 Prediction regions in the bivariate domain

3 Application to rainfall data in the UK
3.1 MIDAS Database
3.1.1 Data preprocessing
3.1.2 Data Analysis

4 Conclusions

A Extremal Index analysis
A.1 Method of analysis
A.2 Results

List of Tables

2.1 Coverage probability (Standard Error) - n = 10 - ξ = 0 - ξ Free
2.2 Coverage probability (Standard Error) - n = 10 - ξ = 0 - ξ Lock
2.3 Coverage probability (Standard Error) - n = 20 - ξ = 0 - ξ Free
2.4 Coverage probability (Standard Error) - n = 20 - ξ = 0 - ξ Lock
2.5 Coverage probability (Standard Error) - n = 50 - ξ = 0 - ξ Free
2.6 Coverage probability (Standard Error) - n = 50 - ξ = 0 - ξ Lock
2.7 Coverage probability (Standard Error) - n = 100 - ξ = 0 - ξ Free
2.8 Coverage probability (Standard Error) - n = 100 - ξ = 0 - ξ Lock
2.9 Coverage probability (Standard Error) - n = 10 - ξ = 0.5 - ξ Free
2.10 Coverage probability (Standard Error) - n = 10 - ξ = 0.5 - ξ Lock
2.11 Coverage probability (Standard Error) - n = 20 - ξ = 0.5 - ξ Free
2.12 Coverage probability (Standard Error) - n = 20 - ξ = 0.5 - ξ Lock
2.13 Coverage probability (Standard Error) - n = 50 - ξ = 0.5 - ξ Free
2.14 Coverage probability (Standard Error) - n = 50 - ξ = 0.5 - ξ Lock
2.15 Coverage probability (Standard Error) - n = 100 - ξ = 0.5 - ξ Free
2.16 Coverage probability (Standard Error) - n = 100 - ξ = 0.5 - ξ Lock
2.17 Coverage probability (Standard Error) - n = 10 - ξ = −0.25 - ξ Free
2.18 Coverage probability (Standard Error) - n = 10 - ξ = −0.25 - ξ Lock
2.19 Coverage probability (Standard Error) - n = 20 - ξ = −0.25 - ξ Free
2.20 Coverage probability (Standard Error) - n = 20 - ξ = −0.25 - ξ Lock
2.21 Coverage probability (Standard Error) - n = 50 - ξ = −0.25 - ξ Free
2.22 Coverage probability (Standard Error) - n = 50 - ξ = −0.25 - ξ Lock
2.23 Coverage probability (Standard Error) - n = 100 - ξ = −0.25 - ξ Free
2.24 Coverage probability (Standard Error) - n = 100 - ξ = −0.25 - ξ Lock
2.25 Coverage probability (Standard Error) - n = 10 - ξ = −0.75 - ξ Free
2.26 Coverage probability (Standard Error) - n = 10 - ξ = −0.75 - ξ Lock
2.27 Coverage probability (Standard Error) - n = 20 - ξ = −0.75 - ξ Free
2.28 Coverage probability (Standard Error) - n = 20 - ξ = −0.75 - ξ Lock
2.29 Coverage probability (Standard Error) - n = 50 - ξ = −0.75 - ξ Free
2.30 Coverage probability (Standard Error) - n = 50 - ξ = −0.75 - ξ Lock
2.31 Coverage probability (Standard Error) - n = 100 - ξ = −0.75 - ξ Free
2.32 Coverage probability (Standard Error) - n = 100 - ξ = −0.75 - ξ Lock
2.33 Coverage probability comparison (Standard Error) - n = 50
2.34 Coverage probability comparison (Standard Error) - n = 100
3.1 Type of data collected in the MIDAS database
3.2 Met Domain Name
3.3 Record Field
3.4 Station Summary
3.5 ξ values - GEV Distribution
3.6 ξ values - GP Distribution
3.7 ξ values - GP Distribution on Cluster Maxima
3.8 Station 000032
3.9 Station 000113
3.10 Station 000235

List of Figures

1.1 Plot of return levels for some values of ξ
1.2 Representation of the point process and its threshold
3.1 Map of the stations
A.1 Threshold Comparison - Daily Based - North-South
A.2 Threshold Comparison - Daily Based - Coast-Inland
A.3 Threshold Comparison - Hourly Based - North-South
A.4 Threshold Comparison - Hourly Based - Coast-Inland
A.5 Shape Comparison - North-South
A.6 Shape Comparison - Coast-Inland
A.7 EXI Comparison
A.8 EXI Comparison - Daily Based - North-South
A.9 EXI Comparison - Daily Based - Coast-Inland
A.10 EXI Comparison - Hourly Based - North-South
A.11 EXI Comparison - Hourly Based - Coast-Inland


For many years now, the study of meteorological phenomena, especially rainfall, has been particularly relevant for all those involved in the management of water resources, in the field of hydraulic engineering, and in the prevention of meteorological emergencies. An accurate analysis of the phenomenon makes it possible to take informed decisions about sizing, and thus about investments in infrastructure and public works; consider for example the construction of a hydroelectric plant or the management of outflows in urban areas.

Rainfall, like other meteorological, environmental or economic-financial phenomena, falls into a category of events for which statistical interest lies not only in the distribution of the data as a whole, but also, and especially, in the rarer or particularly intense events, that is, in the tails of the distribution.

Extreme value theory was born to meet the need to describe and predict the behaviour of the extreme values of a given phenomenon. In the case of precipitation, for example, it allows us to determine the return period of a certain value and thus to understand how much time elapses between the occurrence of a given level of rainfall and its recurrence. It is thus possible to determine the amount of rain that will fall, on average, only once every 10, 100 or 1000 years.

The purpose of this thesis is to apply recent results obtained in the field of calibrated prediction regions to extreme value distributions. Through the use of simulations, and by applying the methods to real data sets relating to rainfall in the UK (MIDAS), we try to understand how these methods can improve the forecasting methods previously used, observing their possibilities and limits.

In Chapter 1 we recall the basic elements of extreme value theory, presenting the generalized extreme value distribution, parameter estimation methodology and diagnostic tools. The point process characterization is then introduced as a prerequisite for the presentation of threshold-based modelling. In this context, some methods for threshold selection and for the evaluation of return values are described. To conclude, we present some key concepts that later allow us to model dependent sequences and non-stationary processes.

In Chapter 2 we analyze the latest developments in terms of calibrated prediction regions. Through the comparison of two different forecasting methods and the use of simulations, we assess the improvements these developments have brought to more traditional forecasting techniques, analyzing their limits and the possibilities for further growth. A brief excursion into the case of forecasting for bivariate phenomena is also presented.

In Chapter 3 we apply what we have seen in the previous chapters to series of real data. After a brief description of the data sources, we examine the instruments used for measurement, the detection rates and the distribution of the missing data. The algorithms used to run the simulations are then applied to the data series, making it possible to see how the benefits and limitations described above influence the study of real phenomena. To conclude, an analysis is conducted on the same data to verify the behaviour when the data are treated as dependent series.

In Chapter 4 we briefly close the discussion, summarizing the results and drawing some conclusions.

1 Extreme Value Theory

Extreme value theory is a branch of statistics that studies extreme deviations from the central portion of a probability distribution. Its results are of considerable importance in assessing the risk that characterizes rare events, such as stock market crashes or earthquakes of exceptional intensity.

The main issues in the treatment of these phenomena are:

• the small number of observations in the tails of the distribution;

• the inadequacy of standard techniques for modelling the tails of distributions.

The role of the theory of extremes is to develop scientifically rational procedures for estimating the limit behaviour of processes.

1.1 Classic Theory

Given a set of IID random variables $X_1, X_2, \dots, X_n$ with distribution function $F(x)$, consider
$$M_n = \max\{X_1, X_2, \dots, X_n\}.$$
Its distribution function is given by
$$\Pr\{M_n \le x\} = \Pr\{X_1 \le x, \dots, X_n \le x\} = \Pr\{X_1 \le x\} \cdots \Pr\{X_n \le x\} = \{F(x)\}^n.$$
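As a quick illustration, this identity can be checked by simulation; the following sketch uses standard normal variables and arbitrary values of n and x (both hypothetical choices).

```r
# Empirical check of Pr{M_n <= x} = {F(x)}^n for IID standard normal data
set.seed(1)
n <- 10
x <- 1.5
maxima <- replicate(10000, max(rnorm(n)))  # simulated values of M_n
mean(maxima <= x)                          # empirical Pr{M_n <= x}
pnorm(x)^n                                 # theoretical {F(x)}^n
```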

The problem is that the distribution $F(x)$ is unknown. It is therefore necessary to identify a family of possible limit distributions of $M_n$ as $n \to \infty$ and use it as an approximation of the distribution of $M_n$ for finite n.

To identify this distribution, which we will call $G(x)$, we need to make some assumptions. First, note that, with probability equal to 1, the distribution of $M_n$ converges to the upper end point of $F(x)$.

Let us now briefly recall the central limit theorem:
$$Y_n = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{D} N(0, 1).$$

As we know, the theorem allows us to approximate the distribution of the sample mean to a standard normal distribution.

With the same principle we can bring the distribution of $M_n$ to a known distribution. Define $a_n > 0$ and $b_n$ as coefficients representing scale and location parameters respectively, such that
$$M_n^* = \frac{M_n - b_n}{a_n}.$$
A correct choice of the coefficients makes it possible to stabilize the variations of position and scale of $M_n^*$ as $n \to \infty$, and to resolve the difficulties that arise when considering the variable $M_n$.

At this point, we can draw the first conclusion.

Theorem 1 (Extremal types theorem). If there exist sequences of constants $a_n > 0$ and $b_n$ such that, as $n \to \infty$,
$$\Pr\{(M_n - b_n)/a_n \le x\} \to G(x),$$
where G is a non-degenerate distribution function, then G belongs to one of the following families:
$$\text{I}: \quad G(x) = \exp\{-\exp(-x)\}, \qquad -\infty < x < \infty;$$
$$\text{II}: \quad G(x) = \begin{cases} 0 & x \le 0, \\ \exp(-x^{-\alpha}) & x > 0, \end{cases} \qquad \alpha > 0;$$
$$\text{III}: \quad G(x) = \begin{cases} \exp\{-(-x)^{\alpha}\} & x < 0, \\ 1 & x \ge 0, \end{cases} \qquad \alpha > 0.$$

Conversely, each of these distributions may appear as the limit of the distribution of $(M_n - b_n)/a_n$; this occurs when $G(x)$ is itself the distribution of the $X_i$. The classes of distribution indicated by I, II and III are known respectively as the Gumbel, Fréchet and Weibull distributions.

It is important to note that Theorem 1 does not guarantee the existence of a limit for $M_n$; it only specifies the distribution to use if this limit exists. Having clarified this aspect, by analogy with the central limit theorem, we find that $M_n$ approximately follows one of the distributions shown in Theorem 1, whatever the distribution $F(x)$.

1.1.1 Generalized extreme value distribution

For statistical purposes, however, it is not convenient to work with three different classes of distributions. Thus von Mises (1954) and later Jenkinson (1955) introduced the Generalized Extreme Value (GEV) distribution $G(\mu, \sigma, \xi)$, whose distribution function is
$$G(x) = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\} \tag{1.1}$$
defined for $\{x : 1 + \xi(x - \mu)/\sigma > 0\}$ and $\sigma > 0$.

The distribution has three parameters:

• µ: the location parameter, which determines the position of the distribution;

• σ: the scale parameter, which determines the variability of the phenomenon;

• ξ: the shape parameter (in reference to the classes of Theorem 1):

– ξ → 0 corresponds to the Gumbel distribution (I);

– ξ > 0 corresponds to the Fréchet distribution (II);

– ξ < 0 corresponds to the Weibull distribution (III).

In the cases ξ > 0 and ξ → 0 the distribution is unbounded above. Only in the case ξ < 0 is the distribution bounded above, so that its maximum can be computed. It is now possible to restate Theorem 1 with the help of the GEV distribution:

Theorem 2. If there exist sequences of constants $a_n > 0$ and $b_n$ such that, as $n \to \infty$,
$$\Pr\{(M_n - b_n)/a_n \le x\} \to G(x),$$
where G is a non-degenerate distribution function, then G is a member of the GEV family
$$G(x) = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\},$$
defined on $\{x : 1 + \xi(x - \mu)/\sigma > 0\}$, with $\sigma > 0$.

Read otherwise, Theorem 2 says that, for n large enough,
$$\Pr\left\{\frac{M_n - b}{a} \le x\right\} \approx G(x), \tag{1.2}$$
which, for $a > 0$, is equivalent to saying that
$$\Pr\{M_n \le x\} \approx G\left(\frac{x - b}{a}\right) = G^*(x), \tag{1.3}$$
where $G^*$ is of the same type as G. We can therefore conclude that the family of extreme value distributions can be estimated directly from a series of observations of $M_n$.

Inverting (1.1) we can obtain an expression for the quantiles of the distribution:
$$x_p = \begin{cases} \mu - \dfrac{\sigma}{\xi}\left[1 - \{-\log(1 - p)\}^{-\xi}\right] & \text{if } \xi \ne 0, \\ \mu - \sigma \log\{-\log(1 - p)\} & \text{if } \xi = 0, \end{cases} \tag{1.4}$$
with $G(x_p) = 1 - p$. The value $x_p$ is called the return level and indicates, in the case of time series, the value that is expected to be exceeded on average once every $1/p$ years. In other words, $x_p$ is the return level associated with the return period $1/p$.

Plotting the values $x_p$ against $\log y_p$, where
$$y_p = -\log(1 - p),$$
we can see that the graph is linear for ξ = 0, has no finite limit for ξ > 0, and approaches the finite limit µ − σ/ξ as p → 0 in the case ξ < 0 (Figure 1.1).

Figure 1.1: Plot of return levels for some values of ξ
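As an illustration, the following sketch evaluates (1.4) directly and, assuming the evd package used later in the thesis is available, checks it against the corresponding evd quantile function; the parameter values are hypothetical.

```r
# Return level x_p for return period 1/p, from (1.4)
gev_return_level <- function(p, mu, sigma, xi) {
  yp <- -log(1 - p)
  if (xi == 0) mu - sigma * log(yp) else mu - (sigma / xi) * (1 - yp^(-xi))
}

library(evd)
p <- 0.01                                      # 100-year return period
gev_return_level(p, mu = 10, sigma = 2, xi = 0.2)
qgev(1 - p, loc = 10, scale = 2, shape = 0.2)  # same value, since G(x_p) = 1 - p
```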

1.1.2 Parameter Estimation

Suppose now that we have a sequence of maxima $M_1, M_2, \dots, M_k$ which, for simplicity, we assume independent, and whose common distribution we approximate with a GEV distribution. Our task now is to estimate the parameters (µ, σ, ξ) via maximum likelihood. First we must consider the asymptotic properties to which this method is subject, particularly in the GEV model, where the support of the distribution is a function of the parameters, with end point µ − σ/ξ. This violation of the general regularity conditions prevents us from applying the standard theory automatically.

The problem has been studied by Smith (1985), who obtained the following results:

• if ξ > −0.5, the maximum likelihood estimator exists and is regular;

• if −1 < ξ < −0.5, the maximum likelihood estimator exists but is not regular;

• if ξ < −1, the maximum likelihood estimator does not exist.

Fortunately, ξ < −0.5 corresponds to a distribution with a very short upper tail and is rarely encountered. We can therefore conclude that maximum likelihood estimation is a valuable method.

Starting from (1.1) we can then calculate the log-likelihood as
$$\ell(\mu, \sigma, \xi) = \log\prod_{i=1}^{k} g(x_i) = \sum_{i=1}^{k}\left\{-\log\sigma - \left(1 + \frac{1}{\xi}\right)\log\left[1 + \xi\left(\frac{x_i - \mu}{\sigma}\right)\right] - \left[1 + \xi\left(\frac{x_i - \mu}{\sigma}\right)\right]^{-1/\xi}\right\}, \tag{1.5}$$
where $g(x) = dG(x)/dx$ is the GEV density, valid for
$$1 + \xi\left(\frac{x_i - \mu}{\sigma}\right) > 0, \qquad i = 1, \dots, k.$$
Particular attention must be paid to the case ξ → 0, in which the log-likelihood function takes the form
$$\ell(\mu, \sigma) = \sum_{i=1}^{k}\left\{-\log\sigma - \left(\frac{x_i - \mu}{\sigma}\right) - \exp\left[-\left(\frac{x_i - \mu}{\sigma}\right)\right]\right\}. \tag{1.6}$$

The maximization of (1.5) and (1.6) has no analytical solution, but for each dataset the estimates can be found using numerical optimization algorithms implemented in statistical software.
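A minimal sketch of such a numerical fit, using the fgev function of the evd package on simulated annual maxima (all parameter values hypothetical):

```r
library(evd)
set.seed(2)
x <- rgev(100, loc = 10, scale = 2, shape = 0.2)  # simulated block maxima
fit <- fgev(x)   # numerical maximization of (1.5)
fit$estimate     # MLEs of (loc, scale, shape)
fit$std.err      # standard errors from the observed information matrix
```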

The variance of the parameter estimates can easily be obtained by inverting the observed information matrix. Confidence intervals take the form
$$\text{Estimate} \pm q_{\alpha/2} \cdot \text{Std.Error}, \tag{1.7}$$
where $q_{\alpha/2}$ indicates a quantile of the standard normal or Student's t distribution.

1.1.3 Estimation for Minima

Extreme value theory can also be applied to describe the behaviour of the lower tail of a distribution. It is sufficient to repeat all the considerations made in the previous sections, substituting minima for maxima. In other words, if we are interested in studying $\tilde{M}_n = \min\{X_1, X_2, \dots, X_n\}$, we simply consider $M_n = \max\{-X_1, -X_2, \dots, -X_n\}$ and proceed as follows:
$$\Pr\{\tilde{M}_n \le x\} = \Pr\{-M_n \le x\} = \Pr\{M_n \ge -x\} = 1 - \Pr\{M_n \le -x\} \approx 1 - \exp\left\{-\left[1 + \xi\left(\frac{-x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\} = 1 - \exp\left\{-\left[1 - \xi\left(\frac{x + \mu}{\sigma}\right)\right]^{-1/\xi}\right\}. \tag{1.8}$$

This property becomes particularly useful when applying the weakest-link principle. In systems composed of many components, the functioning of the whole depends on the integrity of the individual elements. It is therefore necessary to study the lower tail of the distribution of the entire system in order to understand the conditions under which the weakest members may fail.

1.1.4 Model Checking

Once the model has been estimated, we can verify its adequacy using the following representations. In what follows we denote by $x_{(1)}, x_{(2)}, \dots, x_{(m)}$ the data sorted in ascending order.

• Probability Plot: compares the empirical distribution function,
$$\tilde{G}(x) = \frac{i}{m + 1}, \qquad x_{(i-1)} < x \le x_{(i)}, \tag{1.9}$$
with the parametrically estimated one,
$$\hat{G}(x_{(i)}) = \exp\left\{-\left[1 + \hat{\xi}\left(\frac{x_{(i)} - \hat{\mu}}{\hat{\sigma}}\right)\right]^{-1/\hat{\xi}}\right\}. \tag{1.10}$$
If the model is correct,
$$\tilde{G}(x_{(i)}) \approx \hat{G}(x_{(i)}). \tag{1.11}$$
The graph represents the points
$$\left\{\left(\tilde{G}(x_{(i)}), \hat{G}(x_{(i)})\right), \; i = 1, \dots, m\right\}. \tag{1.12}$$

• Quantile Plot: compares the quantiles of the estimated distribution,
$$\hat{G}^{-1}\left(\frac{i}{m + 1}\right) = \hat{\mu} - \frac{\hat{\sigma}}{\hat{\xi}}\left\{1 - \left[-\log\left(\frac{i}{m + 1}\right)\right]^{-\hat{\xi}}\right\}, \tag{1.13}$$
with those of empirical derivation. Briefly, the quantile plot represents
$$\left\{\left(\hat{G}^{-1}(i/(m + 1)), x_{(i)}\right), \; i = 1, \dots, m\right\}. \tag{1.14}$$
Compared with the probability plot, this representation allows a more precise study of the accuracy of the model in the right tail of the distribution.

• Return Level Plot: compares the estimated quantiles $\hat{x}_p$, obtained by substituting the estimates of (µ, σ, ξ) in (1.4), with the logarithmic transformation $y_p = -\log(1 - p)$. The representation includes a 95% confidence interval calculated using the delta method. The graph represents
$$\{(\log y_p, \hat{x}_p), \; 0 < p < 1\}, \tag{1.15}$$
where $\hat{x}_p$ is the maximum likelihood estimate of $x_p$.

• Density Plot: overlays the estimated GEV density function on a histogram of the sample data.

1.2 New Developments

As we have seen, given a set of data with a certain temporal frequency (yearly, monthly, daily), the classical analysis of extreme values allows us to include in the model only a single value for each period. This limitation makes the analysis based on the GEV distribution highly inefficient.

In the following discussion we analyse some methods that allow us to consider a larger number of observations. We focus in particular on the threshold method.

1.2.1 Point Process Characterization

The basic idea is to construct a two-dimensional representation of the values $N_n = \{(i, X_i) : i = 1, \dots, n\}$ and to describe its behaviour on a region of the form $[t_1, t_2] \times (u, \infty)$.

Consider a set of IID random variables $X_1, X_2, \dots, X_n$ with distribution function $F(x)$. Also define the constants $a_n$ and $b_n$ such that $F(x)$ is in the domain of attraction of $G(x)$. Represent the behaviour of the $x_i$ by the points
$$N_n = \left\{\left(\frac{i}{n + 1}, \frac{x_i - b_n}{a_n}\right) : i = 1, \dots, n\right\}. \tag{1.16}$$
The change of scale on the axis of abscissas ensures that those values are always in the range [0, 1], while the one on the ordinate axis stabilizes the value of the extremes as $n \to \infty$.

Consider now a region $A = [0, 1] \times (u, \infty)$; for large values of u, each of the n points of $N_n$ has probability $p_n$ of falling in A, where, it can be shown,
$$p_n = \Pr\left\{\frac{X_i - b_n}{a_n} > u\right\} \approx \frac{1}{n}\left[1 + \xi\left(\frac{u - \mu}{\sigma}\right)\right]^{-1/\xi}.$$

Since the measurements are mutually independent, $N_n(A) \sim \text{Bin}(n, p_n)$.

Figure 1.2: Representation of the point process and its threshold (axes: $i/(n+1)$ versus $(X_i - b)/a$)

From the convergence of the binomial distribution to the Poisson distribution it follows that the limit distribution of $N_n(A)$, as $n \to \infty$, is a Poisson distribution with parameter λ equal to
$$\lambda = \Lambda(A) = \left[1 + \xi\left(\frac{u - \mu}{\sigma}\right)\right]^{-1/\xi},$$
where Λ(A) indicates the intensity measure of the region A or, in other words, Λ(A) = E[number of points in A] = E[N(A)].

Notice that, taking $A = [a_1, x_1] \times \dots \times [a_k, x_k] \subset \mathbb{R}^k$, the function
$$\lambda(x) = \frac{\partial^k \Lambda(A)}{\partial x_1 \cdots \partial x_k}$$
corresponds to the density function of the process.

Generalizing, we can therefore say that, beyond a certain threshold u, $N_n$ closely approximates a Poisson process with intensity function given by
$$\Lambda\{[t_1, t_2] \times (u, \infty)\} = (t_2 - t_1)\left[1 + \xi\left(\frac{u - \mu}{\sigma}\right)\right]^{-1/\xi}. \tag{1.18}$$
Often, for statistical purposes, it is useful to represent the distribution of the annual maximum data. If we want to extend the study to a number of periods $n_y$, we have to rewrite the intensity function as
$$\Lambda\{(t_1, t_2) \times (u, \infty)\} = n_y(t_2 - t_1)\left[1 + \xi\left(\frac{u - \mu}{\sigma}\right)\right]^{-1/\xi}, \tag{1.19}$$
thus making the estimates of the parameters (µ, σ, ξ) correspond to the values obtained from the GEV.

To summarize what we have seen so far: once a sufficiently high threshold u has been chosen and $A = (0, 1) \times [u, \infty)$ defined, we can identify the N(A) points included in the region A and relabel them $(t_1, x_1), \dots, (t_{N(A)}, x_{N(A)})$. Assuming that the points are distributed within A as a Poisson process, we can obtain the parameters (µ, σ, ξ) by maximizing the likelihood function. Given a region $A_v = [0, 1] \times (v, \infty)$ with $v > u$, the likelihood function is given by
$$L(A_v; \mu, \sigma, \xi) = \exp\{-\Lambda(A_v)\}\prod_{i=1}^{N(A)} \lambda(t_i, x_i) = \exp\left\{-n_y\left[1 + \frac{\xi(v - \mu)}{\sigma}\right]^{-1/\xi}\right\}\prod_{i=1}^{N(A)} \sigma^{-1}\left[1 + \frac{\xi(x_i - \mu)}{\sigma}\right]^{-1/\xi - 1}. \tag{1.20}$$
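Assuming daily data and the evd package, a point-process fit maximizing a likelihood of the form (1.20) can be sketched as follows (fpot with model = "pp"; npp sets the number of observations per period, so the estimates are on the annual-maximum scale; all values hypothetical):

```r
library(evd)
set.seed(3)
x <- rgev(3650, loc = 10, scale = 2, shape = 0.1)  # ten "years" of daily data
u <- quantile(x, 0.95)
fit <- fpot(x, threshold = u, model = "pp", npp = 365)
fit$estimate   # (loc, scale, shape), comparable with a GEV fit to annual maxima
```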

1.2.2 Threshold Models

The idea at the basis of this model is to take into account all the values that exceed a certain threshold u.

Let $X_1, X_2, \dots, X_n$ be a sequence of IID random variables with distribution function $F(x)$; consider all the $X_i$ such that
$$X_i > u,$$
which for convenience we label $x_{(1)}, x_{(2)}, \dots, x_{(k)}$. Our analysis is based on the excesses
$$y_j = x_{(j)} - u, \qquad j = 1, \dots, k.$$
We want to obtain an approximation to the conditional probability
$$\Pr(X > u + y \mid X > u) = \frac{1 - F(u + y)}{1 - F(u)}. \tag{1.21}$$

Before proceeding, define $\Lambda(A_z) = \Lambda_1([t_1, t_2]) \times \Lambda_2([x, \infty))$, where
$$\Lambda_1([t_1, t_2]) = (t_2 - t_1), \qquad \Lambda_2([x, \infty)) = \left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}.$$
For a sufficiently high threshold u, we can say that
$$\Pr\{(X_i - b_n)/a_n > x \mid (X_i - b_n)/a_n > u\} = \frac{\Lambda_2([x, \infty))}{\Lambda_2([u, \infty))} = \frac{n^{-1}[1 + \xi(x - \mu)/\sigma]^{-1/\xi}}{n^{-1}[1 + \xi(u - \mu)/\sigma]^{-1/\xi}} = \left[\frac{1 + \xi(x - \mu)/\sigma}{1 + \xi(u - \mu)/\sigma}\right]^{-1/\xi} = \left[1 + \xi\left(\frac{x - u}{\tilde{\sigma}}\right)\right]^{-1/\xi}, \tag{1.22}$$
where
$$\tilde{\sigma} = \sigma + \xi(u - \mu). \tag{1.23}$$
As indicated earlier, we can absorb the coefficients $a_n$ and $b_n$ into the distribution, obtaining
$$\Pr\{X > u + y \mid X > u\} = \left(1 + \frac{\xi y}{\tilde{\sigma}}\right)^{-1/\xi}. \tag{1.24}$$

Theorem 3 (Generalized Pareto Distribution). Let $X_1, X_2, \dots, X_n$ be a sequence of IID random variables with distribution function $F(x)$, and let $M_n = \max\{X_1, \dots, X_n\}$. Suppose F satisfies Theorem 2, so that for n large enough
$$\Pr\{M_n < x\} \approx G(x), \qquad G(x) = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\},$$
for some µ and σ > 0. Then, for large enough u, the distribution function of (X − u), conditional on X > u, is approximately
$$H(y) = 1 - \left(1 + \frac{\xi y}{\tilde{\sigma}}\right)^{-1/\xi}, \tag{1.25}$$
defined on $\{y : y > 0 \text{ and } (1 + \xi y/\tilde{\sigma}) > 0\}$, where
$$\tilde{\sigma} = \sigma + \xi(u - \mu).$$
Distribution (1.25) is called the Generalized Pareto Distribution (GPD).

As for the GEV, also in this case the parameter ξ determines the shape of the distribution:
$$H(y) = \begin{cases} 1 - \left(1 + \dfrac{\xi y}{\tilde{\sigma}}\right)^{-1/\xi}, & y \in (0, \infty), & \text{if } \xi > 0, \\ 1 - \exp\left(-\dfrac{y}{\tilde{\sigma}}\right), & y \in (0, \infty), & \text{if } \xi = 0, \\ 1 - \left(1 + \dfrac{\xi y}{\tilde{\sigma}}\right)^{-1/\xi}, & y \in \left(0, -\dfrac{\tilde{\sigma}}{\xi}\right), & \text{if } \xi < 0. \end{cases} \tag{1.26}$$
(1.26) shows that only in the case ξ < 0 does the distribution admit an upper limit, and that in the particular case ξ = 0 the GPD takes the form of an exponential distribution with mean $\tilde{\sigma}$.

1.2.2.1 Threshold selection

In the choice of the threshold it is necessary to balance the competing needs of an adequate model estimation: on one side, the threshold must be high enough for the asymptotic approximation to be considered valid; on the other, the value of u should be small enough to guarantee a reasonable number of exceedances and hence a contained variance. Two methods are available to address this problem.

• Exploratory technique

Let Y follow a GPD; given ξ < 1, it can be shown that
$$E(Y) = \frac{\sigma}{1 - \xi}.$$
Suppose also that $u_0$ is chosen as the threshold value; then
$$E(X - u_0 \mid X > u_0) = \frac{\sigma_{u_0}}{1 - \xi}, \tag{1.27}$$
where $\sigma_{u_0}$ identifies the scale parameter corresponding to the excesses over the threshold. If the GPD is valid for the excesses over $u_0$, it is equally valid for all thresholds $u > u_0$. So, for a generic $u > u_0$, we have
$$E(X - u \mid X > u) = \frac{\sigma_u}{1 - \xi} = \frac{\sigma_{u_0} + \xi u}{1 - \xi}. \tag{1.28}$$
We can therefore conclude that, for $u > u_0$, $E(X - u \mid X > u)$ is nothing but the mean excess over the threshold u and, as such, a linear function of u itself. It is now possible to plot the estimate of this mean as the threshold varies, obtaining the points
$$\left\{\left(u, \frac{1}{n_u}\sum_{i=1}^{n_u}(x_{(i)} - u)\right) : u < x_{\max}\right\}.$$
This representation is called the Mean Residual Life Plot; the optimal value $u_0$ can be read on the axis of abscissas as the point above which the curve assumes an approximately linear behaviour.

• Parameter stability

The validity of the model may be assessed by estimating the parameters over a range of threshold values until stability is reached. Varying $u > u_0$:

– the estimate of ξ should remain approximately constant;

– the estimate of σ is a linear function of u; from (1.23) it follows that
$$\sigma_u = \sigma_{u_0} + \xi(u - u_0),$$
unless ξ = 0. In this case it is convenient to reparameterize the GPD scale as
$$\sigma^* = \sigma_u - \xi u,$$
so that both $\sigma^*$ and ξ are constant with respect to $u_0$.

Proceeding to the verification of these hypotheses through their graphical representation, one simply selects the lowest value of u for which the estimates show an almost constant trend. Both diagnostics are illustrated in the sketch below.
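A minimal sketch of the two diagnostics, assuming the evd package and simulated data (threshold limits are illustrative choices):

```r
library(evd)
set.seed(4)
x <- rgpd(2000, loc = 0, scale = 1, shape = 0.2)

# Mean residual life plot: pick the lowest u above which the curve is linear
mrlplot(x, tlim = c(0, quantile(x, 0.99)))

# Parameter stability plot: modified scale and shape across thresholds
tcplot(x, tlim = c(0, quantile(x, 0.95)))
```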

1.2.2.2 Parameter Estimation

Having identified the value $u_0$, we can determine the exceedances $y_1, \dots, y_k$ over the threshold and estimate the parameters by maximum likelihood, distinguishing two cases.

• For ξ ≠ 0 the log-likelihood corresponds to
$$\ell(\sigma, \xi) = -k\log\sigma - \left(1 + \frac{1}{\xi}\right)\sum_{i=1}^{k}\log\left(1 + \frac{\xi y_i}{\sigma}\right); \tag{1.29}$$

• for ξ = 0,
$$\ell(\sigma) = -k\log\sigma - \frac{\sum_{i=1}^{k} y_i}{\sigma}. \tag{1.30}$$

Even in this case there is no analytical solution for the maximization of (1.29) and (1.30), and numerical methods are used to identify the estimates.
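A sketch of the corresponding numerical fit with evd's fpot, the threshold being chosen via the diagnostics of the previous section (data and quantile level hypothetical):

```r
library(evd)
set.seed(5)
x <- rgpd(1000, loc = 0, scale = 1, shape = 0.2)
u <- quantile(x, 0.90)
fit <- fpot(x, threshold = u)  # maximizes (1.29), or (1.30) when xi = 0
fit$estimate                   # (scale, shape) MLEs for the excesses over u
```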


1.2.2.3 Return Levels

Suppose the exceedances of a threshold u by a variable X are described by a GPD with parameters σ and ξ. For x > u, we have
$$\Pr\{X > x \mid X > u\} = \left[1 + \xi\left(\frac{x - u}{\sigma}\right)\right]^{-1/\xi}. \tag{1.31}$$
It follows that
$$\Pr\{X > x\} = \zeta_u\left[1 + \xi\left(\frac{x - u}{\sigma}\right)\right]^{-1/\xi}, \tag{1.32}$$
where $\zeta_u = \Pr\{X > u\}$. We can therefore say that
$$x_m = \begin{cases} u + \dfrac{\sigma}{\xi}\left[(m\zeta_u)^{\xi} - 1\right] & \text{if } \xi \ne 0, \\ u + \sigma\log(m\zeta_u) & \text{if } \xi = 0, \end{cases} \tag{1.33}$$
where $x_m$ is the value that is exceeded once every m observations. More interesting is the return value on the annual scale, that is, the value that is expected to be exceeded once every N years:
$$x_N = \begin{cases} u + \dfrac{\sigma}{\xi}\left[(N n_y \zeta_u)^{\xi} - 1\right] & \text{if } \xi \ne 0, \\ u + \sigma\log(N n_y \zeta_u) & \text{if } \xi = 0, \end{cases} \tag{1.34}$$
where $n_y$ represents the number of annual observations available.

We already know how to find the estimates of σ and ξ. In order to estimate $\zeta_u$ we can use the proportion of points that exceed the threshold u:
$$\hat{\zeta}_u = \frac{k}{n}.$$
Since the number of exceedances follows a Bin(n, $\zeta_u$) distribution, $\hat{\zeta}_u$ corresponds to the maximum likelihood estimator of $\zeta_u$.

1.2.2.4 Model Checking

The fitted model can be checked with representations analogous to those of Section 1.1.4.

• Probability Plot: consists of the pairs of points
$$\left\{\left(i/(k + 1), \hat{H}(y_{(i)})\right), \; i = 1, \dots, k\right\},$$
where
$$\hat{H}(y) = 1 - \left(1 + \frac{\hat{\xi}y}{\hat{\sigma}}\right)^{-1/\hat{\xi}}.$$

• Quantile Plot: represents
$$\left\{\left(\hat{H}^{-1}(i/(k + 1)), y_{(i)}\right), \; i = 1, \dots, k\right\},$$
with
$$\hat{H}^{-1}(p) = u + \frac{\hat{\sigma}}{\hat{\xi}}\left[(1 - p)^{-\hat{\xi}} - 1\right].$$

• Return Level Plot: describes the behaviour of the pairs of points $\{(m, \hat{x}_m)\}$, where $\hat{x}_m$ is estimated from (1.33). The graph takes a linear shape for ξ = 0, is convex for ξ > 0 and concave for ξ < 0.

• Density Plot: overlays the density function of the estimated GPD on a histogram of the sample data exceeding the threshold.

If the model fits the data correctly, both the probability plot and the quantile plot should be approximately linear.

1.3 Extremes of Dependent Processes

In the topics covered so far we have always assumed that the process generating the analyzed data was a sequence of independent and identically distributed random variables. This assumption, although convenient for our purposes, is ill-suited to the most common practical cases, especially if we think of environmental phenomena. The first generalization that we present is to assume that our data are dependent but stationary, i.e. with stochastic properties homogeneous over time. In particular, we assume that two events $X_i > u$ and $X_j > u$ are independent for a threshold u and a sufficiently long time interval between i and j. This quite plausible assumption allows us to eliminate long-term time dependencies and to focus on the short term. We say that a stationary series $X_1, X_2, \dots$ is nearly independent (satisfies the $D(u_n)$ condition) if observations sufficiently distant in time, i.e. for $i_1 < \dots < i_p < j_1 < \dots < j_q$ with $j_1 - i_p > l$, satisfy
$$\begin{aligned} |\Pr\{X_{i_1} \le u_n, \dots, X_{i_p} \le u_n, X_{j_1} \le u_n, \dots, X_{j_q} \le u_n\} \\ - \Pr\{X_{i_1} \le u_n, \dots, X_{i_p} \le u_n\}\Pr\{X_{j_1} \le u_n, \dots, X_{j_q} \le u_n\}| \le \alpha(n, l), \end{aligned} \tag{1.35}$$
with $\alpha(n, l_n) \to 0$ for some sequence $l_n$ such that $l_n/n \to 0$ as $n \to \infty$.

For sequences of independent variables, the difference of probabilities expressed in (1.35) is exactly zero for any sequence $u_n$. More generally, we will require that the $D(u_n)$ condition holds only for a specific sequence of thresholds $u_n$ that increases with n. For such a sequence, the $D(u_n)$ condition ensures that, for sets of variables that are far enough apart, the difference of probabilities expressed in (1.35), while not zero, is sufficiently close to zero to have no effect on the limit laws for extremes. From this definition we can derive the following theorem:

Theorem 4. Let $X_1, X_2, \dots$ be a stationary process and define $M_n = \max\{X_1, \dots, X_n\}$. If $\{a_n > 0\}$ and $\{b_n\}$ are sequences of constants such that
$$\Pr\{(M_n - b_n)/a_n \le x\} \to G(x),$$
where G is a non-degenerate distribution function, and (1.35) is satisfied with $u_n = a_n x + b_n$ for every real x, then G is a member of the generalized extreme value family of distributions.

This result allows us to state that, given a stationary series for which condition (1.35) is satisfied, the maximum follows the same distribution as that of an independent series, even though the estimated parameter values are affected by the dependence. We can also say that:

Theorem 5. Let $X_1, X_2, \dots$ be a stationary process and $X_1^*, X_2^*, \dots$ a sequence of independent variables with the same marginal distribution. Define $M_n = \max\{X_1, \dots, X_n\}$ and $M_n^* = \max\{X_1^*, \dots, X_n^*\}$. Under suitable regularity conditions,
$$\Pr\{(M_n^* - b_n)/a_n \le x\} \to G_1(x)$$
as $n \to \infty$, for normalizing sequences $\{a_n > 0\}$ and $\{b_n\}$, where $G_1$ is a non-degenerate distribution function, if and only if
$$\Pr\{(M_n - b_n)/a_n \le x\} \to G_2(x),$$
where
$$G_2(x) = G_1^{\theta}(x)$$
for a constant θ such that $0 < \theta \le 1$.

We can also say that, if $G_1$ corresponds, for example, to a GEV(µ, σ, ξ), then
$$G_1^{\theta}(x) = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\}^{\theta} = \exp\left\{-\theta\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\} = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu^*}{\sigma^*}\right)\right]^{-1/\xi}\right\},$$
where
$$\mu^* = \mu - \frac{\sigma}{\xi}(1 - \theta^{-\xi}), \qquad \sigma^* = \sigma\theta^{\xi},$$
in the case of ξ different from zero, and an analogous result, with the location shifted to $\mu^* = \mu + \sigma\log\theta$, otherwise. Note that in both cases the shape parameter remains unchanged. The value θ ∈ (0, 1] is named the extremal index. It can be defined through the propensity of a series to form clusters above a certain threshold [13]:
$$\theta = (\text{limiting mean cluster size})^{-1}.$$

In the case of an independent series the index is equal to 1. The converse, however, does not hold: it is possible to find dependent series for which the index is nevertheless equal to 1. This is due to the fact that a value equal to 1 means that the dependence is negligible at asymptotically high levels, but not necessarily at extreme levels. It is therefore necessary to be careful when using this tool, studying individual applications case by case.

1.3.1 Modelling Stationary Series

In the following we see how time dependence changes the way in which data are analyzed with the procedures seen so far.

Regarding the models based on block maxima, we have seen that $M_n$ and $M_n^*$ have similar properties. To estimate the model we can then use the methods designed for independent processes, bearing in mind that the parameters will vary slightly. The only problem concerns the lack of accuracy with which the GEV fits the distribution of the maximum when the dependence within the series increases. This is due to the fact that, while an independent series yields a maximum every n observations, in the case of a stationary series the value of n is influenced by θ (maxima are effectively detected every nθ observations).

Things are different for the modelling of values exceeding a threshold. Consider a meteorological phenomenon like rain: particularly high values at a given time are likely to be accompanied by similarly high values in the surveys immediately before or after, a clear indication of dependence between the extremes. Observing the values above the threshold, it is possible to identify clusters in which the exceedances are grouped. A popular tool used to render a series of exceedances independent is declustering.

The technique identifies the clusters within the series by an empirical method. Once this is done, a GP distribution is estimated using only the maximum of each cluster, assuming independence between these values. Although the method is straightforward to apply, in most cases the choice that determines the clusters is arbitrary, and one of the main advantages of the GP distribution over the GEV, namely the number of observations available to estimate the model, is lost.

This choice also makes it necessary to change the way in which the return values are calculated, taking into account the number of clusters identified. The N-year return level becomes
$$x_N = \begin{cases} u + \dfrac{\sigma}{\xi}\left[(N n_y \zeta_u \theta)^{\xi} - 1\right] & \text{if } \xi \ne 0, \\ u + \sigma\log(N n_y \zeta_u \theta) & \text{if } \xi = 0. \end{cases} \tag{1.36}$$
If we denote the total number of observations by n, the number of observations exceeding the threshold by $n_u$ and the number of clusters obtained by $n_c$, we have
$$\hat{\zeta}_u = \frac{n_u}{n}, \qquad \hat{\theta} = \frac{n_c}{n_u}.$$
Thus $\hat{\zeta}_u\hat{\theta}$ can be estimated as $n_c/n$.

One of the most used methods for the determination of the clusters is to initialize a new cluster whenever more than r consecutive observations fall below the threshold. Clearly this method requires that r be chosen on the basis of subjective criteria adapted to the phenomenon under examination; a minimal sketch of the rule is given below.
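The following sketch implements the runs rule directly (the function name and the inputs are illustrative); the cluster maxima it returns are then used to fit the GP distribution.

```r
# Runs declustering: a new cluster starts when the gap between successive
# exceedances of u is larger than r observations
decluster_runs <- function(x, u, r) {
  exc <- which(x > u)       # exceedance times
  gap <- c(Inf, diff(exc))  # spacing between successive exceedances
  id <- cumsum(gap > r)     # cluster labels
  tapply(x[exc], id, max)   # one maximum per cluster
}
```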

Ferro and Segers [4] have developed a technique, based on the study of the times between exceedances of the threshold, which makes it possible to fully automate both the selection of the clusters and the estimation of the extremal index.

Starting from a sample $\xi_1, \dots, \xi_n$ and a threshold u, if
$$N = N_n(u) = \sum_{i=1}^{n} I(\xi_i > u)$$
is the number of observations that exceed the threshold, and $1 \le S_1 < \dots < S_N \le n$ are the exceedance times, then the observed inter-exceedance times are $T_i = S_{i+1} - S_i$, for $i = 1, \dots, N - 1$.

Ferro and Segers demonstrate that a consistent interval estimator for the extremal index is
$$\tilde{\theta}_n(u) = \begin{cases} 1 \wedge \hat{\theta}_n(u) & \text{if } \max\{T_i : 1 \le i \le N - 1\} \le 2, \\ 1 \wedge \hat{\theta}_n^*(u) & \text{if } \max\{T_i : 1 \le i \le N - 1\} > 2, \end{cases} \tag{1.37}$$
where
$$\hat{\theta}_n(u) = \frac{2\left(\sum_{i=1}^{N-1} T_i\right)^2}{(N - 1)\sum_{i=1}^{N-1} T_i^2} \qquad \text{and} \qquad \hat{\theta}_n^*(u) = \frac{2\left\{\sum_{i=1}^{N-1}(T_i - 1)\right\}^2}{(N - 1)\sum_{i=1}^{N-1}(T_i - 1)(T_i - 2)}.$$
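The estimator (1.37) is simple to implement directly; a sketch follows (the function name is illustrative, and edge cases such as fewer than two exceedances are ignored).

```r
# Ferro-Segers interval estimator of the extremal index, as in (1.37)
extremal_index <- function(x, u) {
  s <- which(x > u)  # exceedance times S_1 < ... < S_N
  t <- diff(s)       # inter-exceedance times T_i, i = 1, ..., N - 1
  if (max(t) <= 2) {
    th <- 2 * sum(t)^2 / (length(t) * sum(t^2))
  } else {
    th <- 2 * sum(t - 1)^2 / (length(t) * sum((t - 1) * (t - 2)))
  }
  min(1, th)         # 1 ^ theta-hat
}
```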

Given the extremal index estimator, they develop a completely non-arbitrary procedure to determine the clusters. Dividing the inter-exceedance times into independent inter-cluster times (between clusters) and independent sets of intra-cluster times (within clusters), the extremal index can be interpreted as the proportion of inter-exceedance times that may be counted among the inter-cluster times. Given N exceedance times $S_1 < \dots < S_N$, with inter-exceedance times $T_i = S_{i+1} - S_i$ for $i = 1, \dots, N - 1$, the largest $C - 1 = [\theta N]$ inter-exceedance times can be regarded as independent inter-cluster times that divide the remainder into independent groups of intra-cluster times. Consider $T_{(C)}$, the C-th largest inter-exceedance time, and let $T_{i_j}$ be the j-th inter-exceedance time exceeding $T_{(C)}$; then $\{T_{i_j}\}_{j=1}^{C-1}$ can be regarded as a set of independent inter-cluster times. Let $T_j$ denote the set of remaining inter-exceedance times falling between consecutive inter-cluster times. Then $\{T_j\}_{j=1}^{C}$ is a collection of independent sets of intra-cluster times, and to each set $T_j$ is bound a collection of threshold exceedances $C_j = \{\xi_k : k \in S_j\}$, where $S_j = \{S_{i_{j-1}+1}, \dots, S_{i_j}\}$.

This procedure allows us to subdivide the series of exceedances into C clusters, where the j-th corresponds to the exceedances $C_j$. If we replace the unknown value θ with the estimator seen in (1.37), we obtain a non-arbitrary procedure for the detection of clusters.

The article then goes on to demonstrate the potential of the new method through a comparison with the runs estimator and examples on real series.

1.4 Non-stationary Processes

In some cases, especially in the study of environmental phenomena, the characteristics that describe the process tend to change systematically over time; these changes take the form of seasonality or trend, due respectively to the changing of the seasons and to long-term climate change.

Unfortunately it is not possible to fully generalize the study of extreme values to non-stationary processes, but in the following we present some of the most common cases.

Consider a phenomenon exhibiting a linear change in the location parameter. In order to describe the behaviour of the maximum values in year t we may use the expression
$$X_t \sim \text{GEV}(\mu(t), \sigma, \xi),$$
where
$$\mu(t) = \beta_0 + \beta_1 t.$$
As we can see, the location parameter takes the form of a linear trend, where $\beta_0$ is the intercept and $\beta_1$ the rate of annual change. Other cases can be represented by more complex functions of t.

What was seen for the location parameter can be applied to the scale parameter, taking
$$X_t \sim \text{GEV}(\mu, \sigma(t), \xi).$$
This is useful to model changes in variability within the period.

When using a threshold model to describe markedly seasonal phenomena, it is possible to take different values of u in different periods. Denoting by s(t) the season in which the observations are collected, and by $u_{s(t)}$ the threshold corresponding to each period, the GPD model can be expressed as
$$X_t - u_{s(t)} \mid X_t > u_{s(t)} \sim \text{GPD}\left(\sigma_{s(t)}, \xi_{s(t)}\right), \tag{1.38}$$
where $\sigma_{s(t)}$ and $\xi_{s(t)}$ respectively indicate the scale and shape parameters during the period s(t).

In the cases that we have seen so far, the parameters can be expressed by means of the function
$$\theta(t) = h\left(\boldsymbol{X}^{T}\boldsymbol{\beta}\right), \tag{1.39}$$
where θ indicates the parameter (µ, σ or ξ) to be modelled, β is a vector of parameters and h represents a specified function. In the case of a linear trend, h is the identity function:
$$\mu(t) = [1, t]\begin{bmatrix}\beta_0 \\ \beta_1\end{bmatrix}.$$
With a seasonal model of periods $s_1, \dots, s_k$, µ takes the form
$$\mu(t) = [I_1(t), I_2(t), \dots, I_k(t)]\begin{bmatrix}\beta_1 \\ \beta_2 \\ \vdots \\ \beta_k\end{bmatrix},$$
where $I_j(t)$ corresponds to the dummy variable
$$I_j(t) = \begin{cases} 1 & \text{if } s(t) = s_j, \\ 0 & \text{otherwise.} \end{cases}$$
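For the linear-trend case, evd's fgev allows the location parameter to be modelled as a linear function of covariates through its nsloc argument; a minimal sketch with hypothetical values:

```r
library(evd)
set.seed(6)
t <- 1:50
x <- rgev(50, loc = 10 + 0.05 * t, scale = 2, shape = 0.1)  # maxima with trend
fit <- fgev(x, nsloc = data.frame(trend = t))
fit$estimate  # intercept and trend coefficient of mu(t), plus scale and shape
```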


As may be noted, there are evident similarities between (1.39) and generalized linear models; in fact all the instruments used for the study of GLMs are directly usable in the context of extreme values. We must, however, take into account one important difference: the use of generalized linear models is restricted to the study of distributions belonging to the exponential family, and the GEV and GPD models often do not belong to this family. Nevertheless, (1.39) is a convenient way to represent non-stationary distributions of extreme values.

1.4.1 Parameter Estimation

Consider a non-stationary GEV model that describes the distribution of $X_t$ for $t = 1, \dots, m$:
$$X_t \sim \text{GEV}(\mu(t), \sigma(t), \xi(t)),$$
where µ(t), σ(t), ξ(t) can be expressed in the form (1.39). Indicating with β the vector of parameters, the likelihood can be expressed as
$$L(\beta) = \prod_{t=1}^{m} g(x_t; \mu(t), \sigma(t), \xi(t)),$$
where $g(x_t; \mu(t), \sigma(t), \xi(t))$ indicates the GEV density evaluated at $x_t$. By analogy with (1.5), if ξ(t) ≠ 0 for all t, the log-likelihood can be written as
$$\ell(\mu, \sigma, \xi) = \sum_{t=1}^{m}\left\{-\log\sigma(t) - \left(1 + \frac{1}{\xi(t)}\right)\log\left[1 + \xi(t)\left(\frac{x_t - \mu(t)}{\sigma(t)}\right)\right] - \left[1 + \xi(t)\left(\frac{x_t - \mu(t)}{\sigma(t)}\right)\right]^{-1/\xi(t)}\right\}, \tag{1.40}$$
valid for
$$1 + \xi(t)\left(\frac{x_t - \mu(t)}{\sigma(t)}\right) > 0, \qquad t = 1, \dots, m.$$
As already seen, particular attention must be paid to the case ξ → 0, in which the log-likelihood function takes the form
$$\ell(\mu, \sigma) = \sum_{t=1}^{m}\left\{-\log\sigma(t) - \left(\frac{x_t - \mu(t)}{\sigma(t)}\right) - \exp\left[-\left(\frac{x_t - \mu(t)}{\sigma(t)}\right)\right]\right\}. \tag{1.41}$$

To maximize (1.40) and (1.41) and identify the estimates of β it is necessary to use numerical methods.

1.4.2 Model Checking

We have already seen how to verify the model's ability to represent the data when these are identically distributed. In the case of non-stationary processes, the loss of homogeneity in the distribution forces us to make some changes. In most cases we cannot do anything other than apply the tools already seen to a standardized version of the data, conditional on the estimated values of the parameters.

Take for example the model $X_t \sim \text{GEV}(\hat{\mu}(t), \hat{\sigma}(t), \hat{\xi}(t))$; the variables $\tilde{X}_t$ defined by
$$\tilde{X}_t = \frac{1}{\hat{\xi}(t)}\log\left\{1 + \hat{\xi}(t)\left(\frac{X_t - \hat{\mu}(t)}{\hat{\sigma}(t)}\right)\right\}$$
follow the standard Gumbel distribution described in Theorem 1, with distribution function
$$\Pr\{\tilde{X}_t \le x\} = \exp\{-e^{-x}\}.$$
Ordering the values $\tilde{x}_t$ and labelling them $\tilde{x}_{(1)}, \dots, \tilde{x}_{(m)}$, the probability plot represents the pairs of points
$$\left\{\left(i/(m + 1), \exp\left(-\exp(-\tilde{x}_{(i)})\right)\right); \; i = 1, \dots, m\right\},$$
while the quantile plot represents
$$\left\{\left(\tilde{x}_{(i)}, -\log(-\log(i/(m + 1)))\right); \; i = 1, \dots, m\right\}.$$
It is important to note that, while in the case of the probability plot the choice of the reference distribution does not involve any change to the graph, with the quantile plot the representation varies depending on the distribution that we decide to use.
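A short sketch of the standardization and of the probability-plot coordinates (the fitted values passed in are assumed to come from a previous non-stationary fit; function names are illustrative):

```r
# Gumbel residuals for a non-stationary GEV fit
gumbel_residuals <- function(x, mu_t, sig_t, xi) {
  (1 / xi) * log(1 + xi * (x - mu_t) / sig_t)
}

# Probability plot coordinates for a vector of residuals z
pp_points <- function(z) {
  z <- sort(z)
  m <- length(z)
  cbind(empirical = (1:m) / (m + 1), model = exp(-exp(-z)))
}
```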


We can transfer what we have seen so far to the case of the generalized Pareto distribution. Given a set of thresholds u(t) varying in time and a set of values exceeding the thresholds, $y_{t_1}, \dots, y_{t_k}$, the estimated model can be denoted as $\text{GPD}(\hat{\sigma}(t), \hat{\xi}(t))$. Consider the case ξ ≠ 0 and define
$$\tilde{Y}_{t_k} = \frac{1}{\hat{\xi}(t)}\log\left\{1 + \hat{\xi}(t)\left(\frac{Y_{t_k} - u(t)}{\hat{\sigma}(t)}\right)\right\},$$
which follows a standard exponential distribution. Again we can sort the values $\tilde{y}_{t_j}$ and label them $\tilde{y}_{(1)}, \dots, \tilde{y}_{(k)}$; it follows that the probability plot is formed by the pairs
$$\left\{\left(i/(k + 1), 1 - \exp(-\tilde{y}_{(i)})\right); \; i = 1, \dots, k\right\}$$
and the quantile plot by the values
$$\left\{\left(\tilde{y}_{(i)}, -\log(1 - i/(k + 1))\right); \; i = 1, \dots, k\right\}.$$

2 Calibrated Prediction Regions and Distributions

In this chapter we review the theory behind the construction of prediction regions and predictive distribution functions, relying primarily on the articles [5] and [6]. First we define a procedure that allows predictive distribution functions and improved prediction limits to be computed. Then we study a method that generates improved prediction regions, reducing the coverage error of estimative prediction regions up to the third order. All methods take advantage of Monte Carlo and bootstrap simulation techniques, which makes it possible to avoid the complications of analytical procedures.

After explaining the theory and the algorithms for these methods, we present the results of the simulations.

2.1 Methods

Consider an observable random vector $Y = (Y_1, \dots, Y_n)$, $n \ge 1$, and define the problem of prediction of a further random variable Z as the estimation of a future observation z given a sample y of Y. Assume that the joint distribution of Z and Y is known and can be described by a vector of parameters $\omega = (\omega_1, \dots, \omega_k) \in \Omega \subseteq \mathbb{R}^k$, $k \ge 1$. Define the maximum likelihood estimator of ω based on the sample Y as $\hat{\omega} = \hat{\omega}(Y)$. If Z is a one-dimensional random variable, the coverage probability with respect to the joint distribution of Z and Y can be defined as
$$P_{Y,Z}\{Z \le h_\alpha(Y)\} = \alpha,$$
for all $\alpha \in (0, 1)$, where $h_\alpha(Y)$ is called a prediction limit. The most direct way to compute a prediction limit is to replace the unknown parameter ω with $\hat{\omega}$ in the α-quantile of the conditional distribution, giving rise to the estimative prediction limit. Unfortunately, this solution leads to rather high prediction errors, of order $O(n^{-1})$. Recently, many solutions have been proposed to reduce this error to order $o(n^{-1})$; among these we mention [1], [17] and [18]. In this section we focus on the work of Fonseca et al. [5], which explicitly defines a predictive distribution function whose quantiles are prediction limits with improved coverage. We also consider the work of Hall et al. [6], which uses a bootstrap calibration technique.

Subsequently, the results obtained are extended to the case where Z is a multidimensional random variable, defining improved prediction regions as a modification of the estimative ones.

2.1.1 Corrected Quantiles

Let Z be a future scalar random variable, independent of Y, with marginal density $f(z; \omega)$ and distribution function $F(z; \omega)$. Define $F^{-1}(\cdot; \omega)$ as the inverse of the function $F(\cdot; \omega)$ and $z_\alpha(\omega) = F^{-1}(\alpha; \omega)$ as the corresponding α-quantile. Let further $z_\alpha(\hat{\omega})$ be the estimative prediction limit for Z.

Denoting by $c(\alpha, \omega)$ the $O(n^{-1})$ coverage error associated with $z_\alpha(\hat{\omega})$, we have that
$$\hat{\alpha}(\omega) = P_{Y,Z}\{Z \le z_\alpha(\hat{\omega})\} = E_Y[F\{z_\alpha(\hat{\omega}); \omega\}] = \alpha + c(\alpha, \omega) + o(n^{-1}).$$

To decrease the order of the error, [1] and [18] propose to modify the estimative prediction limit into
$$z_\alpha(\hat{\omega}) + d(\alpha, \hat{\omega}),$$
where $d(\alpha, \hat{\omega}) = -c(\alpha, \hat{\omega})/f(z_\alpha(\hat{\omega}); \hat{\omega})$. Unfortunately, the identification of this corrective term can lead to very complex calculations from the analytical point of view. An alternative solution has been proposed in [17]: through the use of bootstrap simulations, one can obtain a term equivalent to the previous one, saving laborious calculations,
$$d(\alpha, \omega) = -c(\alpha, \omega)/f(z_\alpha(\omega); \omega) = z_\alpha(\omega) - z_{\hat{\alpha}(\omega)}(\omega) + o(n^{-1}).$$
As a result, an improved limit can be obtained as
$$\tilde{z}_\alpha(\hat{\omega}) = z_\alpha(\hat{\omega}) + z_\alpha(\hat{\omega}) - z_{\hat{\alpha}(\hat{\omega})}(\hat{\omega}) = 2z_\alpha(\hat{\omega}) - z_{\hat{\alpha}(\hat{\omega})}(\hat{\omega}).$$
Starting from this formula, $\tilde{z}_\alpha(\hat{\omega})$ can be derived by computing the estimative coverage probability $\hat{\alpha}(\omega)$ through a bootstrap simulation procedure. In the following we refer to quantiles obtained with this procedure as corrected quantiles.

The problem with the procedure described so far is that a solution is found only for a fixed α in (0, 1). By defining a predictive distribution we can obtain a more general result and compute modified prediction limits as quantiles, for any value of α. Following [1], we can define a predictive distribution function as
$$F(z; \hat{\omega}) + c(\alpha, \hat{\omega})\big|_{\alpha = F(z; \hat{\omega})}, \tag{2.1}$$
which is, however, difficult to compute because of the presence of the error term. Deferring to the original papers for the solutions proposed by [2] and [12], we focus on the solution that takes advantage of the improved prediction limit proposed by Ueki and Fueda. Approximating the previous error term as above, (2.1) can be approximated as
$$\tilde{F}(z; Y) = F(z; \hat{\omega}) + f(z; \hat{\omega})\left[F^{-1}\{\hat{\alpha}(\omega); \hat{\omega}\}\big|_{\alpha = F(z; \hat{\omega})} - z\right].$$
This important result allows improved prediction limits to be calculated as the α-quantile of $\tilde{F}(z; Y)$ for all $\alpha \in (0, 1)$.

2.1.2 Calibrated Quantiles

In this section we describe the work done by Hall et al. [6], which identifies improved prediction regions through an empirical adjustment: among different values of α, one looks for the value whose estimative quantile gives the coverage probability closest to the required one.

Once $\hat{\omega}$ has been computed, we generate a sample $Y^* = (Y_1^*, \dots, Y_n^*)$ and a random variable $Z^*$ from the distribution $f(\cdot; \hat{\omega})$. From this sample we estimate $\hat{\omega}^*$, compute the prediction region $D(r, \hat{\omega}^*)$ and then, by Monte Carlo simulation or by numerical integration, compute the coverage probability
$$\hat{p}(\beta) = P_{Y,Z}\{R(Z^*, \hat{\omega}^*) \le r_\beta(\hat{\omega}^*)\}$$
for different values of β close to the fixed target value α. Solving the equation $\hat{p}(\beta) = \alpha$, we identify the value of β for which a coverage probability equal to α is obtained. We can now find the improved prediction region as the one corresponding to the value of β that generates the coverage probability nearest to α (we denote this value by $\hat{\beta}_\alpha$). In fact, as shown in [6],
$$P_{Y,Z}\{R(Z, \hat{\omega}) \le r_{\hat{\beta}_\alpha}(\hat{\omega})\} = \alpha + O(n^{-2}).$$
In the following we refer to quantiles obtained with this procedure as calibrated quantiles.

2.2 Algorithm

To implement the procedure we have used a bootstrap simulation (which serves to compute the estimative coverage probability) nested inside a Monte Carlo replication used to compute the estimative and improved prediction limits. The scripts contain all the necessary information but have become quite complex because of the techniques used to minimize the execution time. In the following we therefore explain as clearly as possible what is done during the execution of the program, leaving aside the more technical aspects related to parallel execution.

Once the parameters of the execution have been defined, we proceed with the first simulation, in which we generate r replications of $Y_n$ and Z from the distribution $f(\cdot; \omega)$:
$$\begin{matrix} Y_{11}, Y_{12}, \dots, Y_{1n} & Z_1 \\ Y_{21}, Y_{22}, \dots, Y_{2n} & Z_2 \\ \vdots & \vdots \\ Y_{r1}, Y_{r2}, \dots, Y_{rn} & Z_r \end{matrix}$$
From each replication we then estimate the parameters, obtaining the maximum likelihood estimates $\hat{\omega}_1, \hat{\omega}_2, \dots, \hat{\omega}_r$.

Now, for each estimate $\hat{\omega}_i$, a bootstrap simulation is performed, generating b samples $Y_j^* = (Y_{j1}^*, \dots, Y_{jn}^*)$ and values $Z_j^*$ from the distribution $f(\cdot; \hat{\omega}_i)$; from each vector $Y_j^*$ the parameters $\hat{\omega}_j^*$ are then estimated, $j = 1, \dots, b$.

At this point the procedures differ, depending on whether the correction or the calibration has been adopted.

- Corrected quantiles: we compute the estimative α-quantile $z_\alpha(\hat{\omega}_j^*)$ for each distribution $f(\cdot; \hat{\omega}_j^*)$. The estimative coverage probability $\hat{\alpha}(\hat{\omega})$ is then obtained by comparing the quantiles just identified with the respective $Z^*$ values previously generated:
$$\hat{\alpha}(\hat{\omega}) = \frac{1}{b}\sum_{i=1}^{b} I\left(Z_i^* \le z_\alpha(\hat{\omega}_i^*)\right).$$

- Calibrated quantiles: in this case, through a binary search, we identify a value β near α that yields a coverage probability as close as possible to α. To do this, the quantile of level β is computed for all values of $\hat{\omega}^*$ and then compared with the $Z^*$ values previously generated, giving the estimative coverage probability
$$\hat{\beta}(\hat{\omega}) = \frac{1}{b}\sum_{i=1}^{b} I\left(Z_i^* \le z_\beta(\hat{\omega}_i^*)\right)$$
for different values of β. The value of β that gives the result closest to α is chosen: $\hat{\alpha}(\hat{\omega}) = \{\beta \mid \hat{\beta}(\hat{\omega}) = \alpha\}$.

Now, for each distribution $f(\cdot; \hat{\omega}_i)$, the estimative α-quantile $z_\alpha(\hat{\omega}_i)$ and the improved one, $\tilde{z}_\alpha(\hat{\omega}_i) = 2z_\alpha(\hat{\omega}_i) - z_{\hat{\alpha}(\hat{\omega})}(\hat{\omega}_i)$, can be computed. Comparing these with the respective Z values generated at the beginning identifies the estimative and the improved coverage probabilities,
$$\alpha(\hat{\omega}) = \frac{1}{r}\sum_{i=1}^{r} I\left(Z_i \le z_\alpha(\hat{\omega}_i)\right), \qquad \hat{\alpha}(\hat{\omega}) = \frac{1}{r}\sum_{i=1}^{r} I\left(Z_i \le \tilde{z}_\alpha(\hat{\omega}_i)\right),$$
which constitute the results of the simulation.
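A compact, single-machine sketch of the corrected-quantile procedure for a GEV model may help fix ideas. The actual scripts use 10000 replications and parallel execution; here r and b are kept small, convergence failures of fgev are not handled, and all parameter values are hypothetical.

```r
library(evd)
set.seed(7)
r <- 200; b <- 200; n <- 20; alpha <- 0.95
om <- c(10, 2, 0.2)  # true (loc, scale, shape)

qgev3 <- function(p, w) qgev(p, loc = w[1], scale = w[2], shape = w[3])
rgev3 <- function(n, w) rgev(n, loc = w[1], scale = w[2], shape = w[3])
fit3  <- function(y) fgev(y, std.err = FALSE)$estimate

est <- imp <- logical(r)
for (i in 1:r) {
  y <- rgev3(n, om)
  z <- rgev3(1, om)
  w <- fit3(y)                          # omega-hat
  hit <- logical(b)                     # bootstrap step
  for (j in 1:b) {
    ys <- rgev3(n, w)
    zs <- rgev3(1, w)
    hit[j] <- zs <= qgev3(alpha, fit3(ys))
  }
  ahat <- mean(hit)                     # estimative coverage alpha-hat
  z_est <- qgev3(alpha, w)              # estimative quantile
  z_imp <- 2 * z_est - qgev3(ahat, w)   # corrected quantile
  est[i] <- z <= z_est
  imp[i] <- z <= z_imp
}
c(estimative = mean(est), corrected = mean(imp))  # coverage probabilities
```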

2.3 Results

In this section we present the results of simulations done using the algorithms described above.

The results are obtained using the statistical software R, with the evd package [15] for the study of extreme values and the snowfall package [9] for the parallel execution of the simulation algorithms, otherwise prohibitive from the point of view of time complexity. Simulations are performed with 10000 Monte Carlo replications and an equal number of bootstrap replications. Different simulations are made for different combinations of the fixed α value and the observed sample size. Different shape parameter values are used to vary the distribution between Weibull, Fréchet and Gumbel. First, the shape parameter is fixed and imposed during the algorithm; then we let the program estimate it, like the other parameters.

Each table (2.1 - 2.32) shows the value of the coverage probability for the method and the quantile used, together with the relative standard error. The caption indicates the sample size used, the value of the shape parameter, and the way in which the latter is treated.

Looking at the results, we can see that the expected behaviour broadly occurs: the methods for improving the coverage probability do seem to improve on the results obtained using the estimative quantile. Looking more closely, we can see that, as the observed sample size increases, an improvement in the estimative coverage probability is observed. The same holds for the higher α values of the improving methods, which show a significant improvement as n increases. Regarding the lower quantiles of the improving methods, the results are satisfactory even for the lowest values of n.

In the simulations in which the shape parameter was fixed, the results obtained are generally better than those obtained by letting the algorithm estimate the parameter. The difference can be seen especially with lower sample size and higher
