Research Article
Markov Switching Model Analysis of Implied Volatility for
Market Indexes with Applications to S&P 500 and DAX
Luca Di Persio^1 and Samuele Vettori^2
^1 Department of Computer Science, University of Verona, Strada le Grazie 15, 37134 Verona, Italy
^2 Department of Mathematics, University of Trento, Via Sommarive 14, 38123 Trento, Italy
Correspondence should be addressed to Luca Di Persio; dipersioluca@gmail.com
Received 31 May 2014; Revised 26 November 2014; Accepted 26 November 2014; Published 18 December 2014
Academic Editor: Niansheng Tang
Volume 2014, Article ID 753852, 17 pages; http://dx.doi.org/10.1155/2014/753852
Copyright © 2014 L. Di Persio and S. Vettori. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We adopt a regime switching approach to study concrete financial time series with particular emphasis on their volatility characteristics considered in a space-time setting. In particular, the volatility parameter is treated as an unobserved state variable whose value in time is given as the outcome of an unobserved, discrete-time and discrete-state, stochastic process represented by a suitable Markov chain. We take into account two different approaches for inference on Markov switching models, namely, the classical approach based on maximum likelihood techniques and the Bayesian inference method realized through a Gibbs sampling procedure. The classical approach is then tested on data taken from the Standard & Poor's 500 and the Deutsche Aktien Index series of returns over different time periods. Computations are given for a four-state switching model, and the obtained numerical results are accompanied by explanatory graphs which report the outcomes obtained exploiting both smoothing and filtering algorithms used in the estimation/calibration procedures we propose to infer on the switching model parameters.
1. Introduction
Many financial time series are characterized by abrupt changes in their behaviour, a phenomenon that can be caused by a number of both endogenous and exogenous factors, often far from being predictable. Examples of such changing factors are, for example, large financial crises, government policy and political instabilities, natural disasters, and speculative initiatives.
Such phenomena have been frequently observed during the last decade, especially because of the worldwide financial crisis which originated during the first years of the 2000s and is still running. Indeed such a crisis, often referred to as the Global Financial Crisis, has caused a severe lack of liquidity for banks in the USA, in Europe, and in many countries all over the world, resulting, for example, in the collapse of many financial institutions, a generalized downfall in stock markets, a decline in consumer wealth, and an impressive growth of the European sovereign debt. The reasons behind these phenomena are rather complicated, particularly because of the high number of interconnected points of interest, each of which is driven by specific influences often linked to each other. Nevertheless, there are mathematical techniques which can be used to point out some general characteristics able to synthesize relevant information and to give effective help in forecasting the future behaviour of certain macroquantities of particular interest.
Although linear time series techniques, for example, the autoregressive (AR) model, the moving average (MA) model, and their combination (ARMA), have been successfully applied in a large number of financial applications, by their own nature they are unable to describe nonlinear dynamic patterns such as, for example, asymmetry and volatility clustering. In order to overcome the latter issue various approaches have been developed. Among them, and with financial applications in mind, we recall the autoregressive conditional heteroskedasticity (ARCH) model of Engle, together with its generalised version (GARCH), and the regime switching (RS) model, which involves multiple equations, each characterizing the behavior of the quantities of interest in a different regime, and a mechanism allowing the switch between them. Concerning the switching methods
that can be considered in the RS framework, we would like to cite the threshold autoregressive (TAR) model proposed by Tong in [1], in which regime switching is controlled by a fixed threshold, the autoregressive conditional root (ACR) model of Bec et al. (see [2]), where the regime switching between a stationary and a nonstationary state is controlled by a binary random variable, and its extension, namely, the functional coefficient autoregressive conditional root (FCACR) model, considered by Zhou and Chen in [3]. In particular, in this work we aim at using the RS approach to model the aforementioned types of unexpected changes through their dependence on an unobserved variable, typically defined as the regime or state.
state. A customary way to formalize such an approach is given
through the following state-space representation:
๐ฆ๐ก= ๐ (๐๐ก, ๐๐) , ๐ก = 1, . . . , ๐, ๐ โ N+, ๐๐กโ ฮฉ, (1) where๐ is the vector of problem parameters, ๐ is the infor-mation set, the state setฮฉ, with |ฮฉ| = ๐ โ N+, is the (finite) set of possible values which can be taken by the the state process๐ at time ๐ก, that is, ๐๐ก โ ฮฉ, and ๐ is a suitable func-tion determining the value of the dependent variable๐ฆ at any given time๐ก โ {1, . . . , ๐}.
The simplest type of structure considers two regimes, that is, $\Omega = \{1, 2\}$, and at most one switch in the time series: in other words, the first $m$ (unknown) observations relate to regime 1, while the remaining $T - m$ data concern regime 2. Such an approach can be generalized by allowing the system to switch back and forth between the two regimes with a certain probability. The latter is the case considered by Quandt in his paper published in 1962, where it is assumed to be theoretically possible for the system to switch between regimes every time that a new observation is generated. Note that the previous hypothesis is not realistic in an economic context, since it contradicts the volatility clustering property typical of financial time series.
The best way to represent the volatility clustering phenomenon consists in assuming that the state variable follows a Markov chain and claiming that the probability of having a switch at the next time is much lower than the probability of remaining in the same economic regime. The Markovian switching mechanism was first considered by Goldfeld and Quandt in [4] and then extended by Hamilton to the case of structural changes in the parameters of an autoregressive process (see [5]). When the unobserved state variable that controls the switching mechanism follows a first-order Markov chain, the RS model is called a Markov Switching Model (MSM). In particular, the Markovian property of such a model implies that, given $t \in \{2, \dots, T\}$, the value $S_t$ of the state variable depends only on $S_{t-1}$, a property that turns out to be useful to obtain a good representation of financial data, where abrupt changes occur occasionally.
After Hamilton's papers, Markov switching models have been widely applied, together with a number of alternative versions, to analyze different types of both economic and financial time series, for example, stock option behaviors, energy market trends, and interest rate series. In what follows we shall often refer to the following, actually rather general, form for an MSM:
$$
\begin{aligned}
& y_t = f(S_t, \theta),\\
& S_t \in \{1, 2, \dots, M\} \ \text{with probabilities}\\
& p_{ij} := P(S_t = j \mid S_{t-1} = i), \quad i, j = 1, 2, \dots, M, \ t = 2, \dots, T,
\end{aligned} \qquad (2)
$$
where the terms $p_{ij}$ are nothing but the transition probabilities, from state $i$ at time $t-1$ to state $j$ at time $t$, which determine the stochastic dynamics of the process $S = \{S_t\}_{t=1,\dots,T}$.
In Section 2 we recall the classical approach to MSM following [5], with respect to both serially uncorrelated and correlated data. Then, in Section 3, we first introduce basic facts related to Bayesian inference; then we recall the Gibbs sampling technique and the related Monte Carlo approximation method which is later used to infer on the MSM parameters. Section 4 is devoted to a goodness of fit (of obtained estimates for parameters of interest) analysis, while in Section 5 we describe our forecasting MSM-based procedure, which is then used in Section 6 to analyse both the Standard & Poor's 500 and the Deutsche Aktien indexes.
2. The Classical Approach
In this section we shall use the classical approach (see, for example, [5]) to develop procedures which allow us to make inference on the unobserved variables and the parameters characterizing an MSM. The main idea behind such an approach is split into two steps: first we estimate the model's unknown parameters by a maximum likelihood method; secondly we infer the unobserved switching variable values conditional on the parameter estimates. Along these lines, we shall analyze two different MSM settings, namely, the case in which data are serially uncorrelated and the case when they are autocorrelated.
2.1. Serially Uncorrelated Data. Let us suppose that $S = \{S_t\}_{t \in \{1,\dots,T\}}$, $T \in \mathbb{N}^+$, is a discrete-time stochastic process represented by a first-order Markov chain taking values in some set $\Omega = \{1, \dots, M\}$, with transition probabilities:
$$p_{ij} := P(S_{t+1} = j \mid S_t = i, S_{t-1} = s_{t-1}, \dots, S_0 = s_0) = P(S_{t+1} = j \mid S_t = i), \quad \forall t \in \{2, \dots, T\}. \qquad (3)$$
Then the state-space model we want to study is given by
$$
\begin{aligned}
& y_t = \mu_{S_t} + \epsilon_t, \quad t = 1, 2, \dots, T,\\
& \{\epsilon_t\}_{t \in \{1,\dots,T\}} \ \text{are i.i.d.} \ \mathcal{N}(0, \sigma^2_{S_t}),\\
& \mu_{S_t} = \mu_1 S_{1,t} + \dots + \mu_M S_{M,t}, \qquad \sigma_{S_t} = \sigma_1 S_{1,t} + \dots + \sigma_M S_{M,t},\\
& S_t \in \{1, \dots, M\} \ \text{with probabilities} \ p_{ij},
\end{aligned} \qquad (4)
$$
where the variables $S_{j,t}$, $(j,t) \in \{1,\dots,M\} \times \{1,\dots,T\}$, are introduced in order to have a slightly more compact equation for $\mu_{S_t}$ and $\sigma_{S_t}$: in particular, $S_{j,t} = 1$ if $S_t = j$, otherwise $S_{j,t} = 0$, which implies that under regime $j$, for $j \in \{1,\dots,M\}$, the mean and variance parameters are given by $\mu_j$ and $\sigma_j$.
Let us underline that in (4) the $y_t$ are our observed data, for example, historical returns of a stock or of some index time series, and we suppose that they are locally distributed as Gaussian random variables, in the sense that occasional jumps may occur in both the mean $\mu_{S_t}$ and the variance $\sigma^2_{S_t}$. In particular, we assume that $\sigma_1 < \sigma_2 < \dots < \sigma_M$ and we want to estimate these $M$ unobserved values for the standard deviation, as well as the $M$ values for the switching mean. Note that we could also take $\mu$ as a constant, obtaining the so-called switching variance problem, or $\sigma$ as a constant, obtaining a switching mean problem. The first is in general more interesting in the analysis of financial time series, since variance is usually interpreted as an indicator of the market's volatility.
Given the model described by (4), the conditional density of $y_t$ given $S_t$ is Gaussian with mean $\mu_{S_t}$ and standard deviation $\sigma_{S_t}$; namely, its probability density function reads as follows:
$$f_{y_t \mid S_t}(x) = \frac{1}{\sqrt{2\pi\sigma^2_{S_t}}}\, e^{-(x - \mu_{S_t})^2 / 2\sigma^2_{S_t}}, \quad x \in \mathbb{R}, \qquad (5)$$
and we are left with the problem of estimating both the expectations $\mu_1, \dots, \mu_M$ and the standard deviations $\sigma_1, \dots, \sigma_M$; a task that is standard to solve by maximizing the associated log-likelihood function $\ln \mathcal{L}$ defined as follows:
$$\ln \mathcal{L} := \sum_{t=1}^{T} \ln\left(f_{y_t \mid S_t}(y_t)\right). \qquad (6)$$
A different and more realistic scenario is the one characterized by unobserved values for $S_t$. In such a case, it is possible to consider the MSM-inference problem as a two-step procedure consisting in
(i) estimating the parameters of the model by maximizing the log-likelihood function,
(ii) making inference on the state variable $S_t$, $t = 1, \dots, T$, conditional on the estimates obtained at the previous point.
Depending on the amount of information we can use to make inference on $S_t$, we have
(i) filtered probabilities, which refer to inferences about $S_t$ conditional on the information up to time $t$, namely, with respect to $\psi_t$,
(ii) smoothed probabilities, which refer to inferences about $S_t$ conditional on the whole sample (history) $\psi_T$.
In what follows we describe a step-by-step algorithm which allows us to solve the filtering problem for a sample of serially uncorrelated data. In particular, we slightly generalize the approach given in [6], assuming that the state variable $S_t$ belongs to a 4-state space set $\Omega := \{1, 2, 3, 4\}$ at every time $t = 1, \dots, T$. Despite 2-state (expansion/contraction) and 3-state (low/medium/high volatility regime) models being the usual choices, we decided to consider a 4-state MSM in order to refine the analysis with respect to volatility levels around the mean. A finer analysis can also be performed, even if one has to take into account the related nonlinear computational growth. We first define the log-likelihood function at time $t$ as
$$l(\theta, \psi_t) = l(\mu_1, \dots, \mu_4, \sigma_1, \dots, \sigma_4, \psi_t), \qquad (7)$$
where $\theta := (\mu_1, \dots, \mu_4, \sigma_1, \dots, \sigma_4)$ is the vector of parameters that we want to estimate. Let us note that $\theta$ is updated at every iteration, since we maximize the function $l(\theta, \psi_t)$ with respect to $\theta$ at every stage of our step-by-step procedure. In particular, the calibration procedure reads as follows.
Inputs.
(i) Put $l(\theta) := l(\theta, \psi_0) = 0$.
(ii) Compute the transition probabilities of the homogeneous Markov chain underlying the state variable; that is,
$$p_{ij} = P(S_t = j \mid S_{t-1} = i), \quad i, j \in \Omega. \qquad (8)$$
Since in the applications we can only count on return time series, we first have to calibrate the transition probabilities $p_{ij}$:
(1) choose 4 values $\sigma_1^2 < \sigma_2^2 < \sigma_3^2 < \sigma_4^2$ (e.g., $\sigma_1 = 0.1$, $\sigma_2 = 0.2$, $\sigma_3 = 0.3$, and $\sigma_4 = 0.4$) and a positive, arbitrarily small, constant $\delta > 0$, for example, $\delta = 0.01$,
(2) compute for every $j = 1, 2, 3, 4$ and for every $t = 1, \dots, T$ the values
$$N_j := \Phi(y_t + \delta) - \Phi(y_t - \delta) = \int_{y_t - \delta}^{y_t + \delta} \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left[-\frac{(x - \mu_j)^2}{2\sigma_j^2}\right] dx, \qquad (9)$$
(3) simulate a value in $\{1, 2, 3, 4\}$ for $S_t$ at each time from the discrete probability vector
$$\left(\frac{N_1}{\sum_{j=1}^{4} N_j}, \frac{N_2}{\sum_{j=1}^{4} N_j}, \frac{N_3}{\sum_{j=1}^{4} N_j}, \frac{N_4}{\sum_{j=1}^{4} N_j}\right), \qquad (10)$$
(4) set the transition probabilities $p_{ij}$ just by counting the number of transitions from state $i$ to state $j$, for $i, j = 1, \dots, 4$, in order to obtain the following transition matrix:
$$P^* = \begin{pmatrix} p_{11} & p_{12} & p_{13} & p_{14}\\ p_{21} & p_{22} & p_{23} & p_{24}\\ p_{31} & p_{32} & p_{33} & p_{34}\\ p_{41} & p_{42} & p_{43} & p_{44} \end{pmatrix}. \qquad (11)$$
(iii) Compute the steady-state probabilities
$$\pi(0) = \left(P(S_0 = 1 \mid \psi_0), \dots, P(S_0 = 4 \mid \psi_0)\right). \qquad (12)$$
Let us note that, by definition, if $\pi(t)$ is a $4 \times 1$ vector of steady-state probabilities, then $\pi(t+1) = \pi(t)$ for every $t = 1, \dots, T$; moreover $\pi(t+1) = (P^*)'\pi(t)$, and (see, for example, [6, pag. 71]) we also have that $\pi(t) = (A'A)^{-1}A'\left[\begin{smallmatrix} 0_4 \\ 1 \end{smallmatrix}\right]$, where $0_4$ is a $4 \times 1$ vector of zeros and $A = \left[\begin{smallmatrix} \mathrm{Id}_4 - (P^*)' \\ 1_4' \end{smallmatrix}\right]$, $\mathrm{Id}_4$ is the four-dimensional identity matrix, while $1_4 = (1, 1, 1, 1)'$; that is, the vector of steady-state probabilities is the last column of the matrix $(A'A)^{-1}A'$.
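As an illustration of the steady-state computation in (12), the following minimal sketch (not taken from the paper; it assumes a row-stochastic 4 x 4 transition matrix) recovers the ergodic probabilities through the least-squares formula recalled above.

```python
# Ergodic (steady-state) probabilities of a Markov chain: solve pi = P*' pi together
# with sum(pi) = 1 in the least-squares sense, i.e. pi = (A'A)^{-1} A' [0_4; 1]
# with A = [I_4 - P*'; 1_4'], cf. (12).
import numpy as np

def steady_state(P_star: np.ndarray) -> np.ndarray:
    m = P_star.shape[0]
    A = np.vstack([np.eye(m) - P_star.T, np.ones((1, m))])
    b = np.append(np.zeros(m), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Example with an arbitrary, purely illustrative transition matrix:
P_star = np.array([[0.90, 0.05, 0.03, 0.02],
                   [0.10, 0.80, 0.07, 0.03],
                   [0.05, 0.10, 0.80, 0.05],
                   [0.02, 0.08, 0.10, 0.80]])
print(steady_state(P_star))
```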
Next we perform the following steps, for $t = 1, \dots, T$.
Step 1. The probability of $S_t$ conditional on the information set at time $t-1$ is given by
$$P(S_t = j \mid \psi_{t-1}) = \sum_{i=1}^{4} p_{ij}\, P(S_{t-1} = i \mid \psi_{t-1}), \quad j = 1, \dots, 4. \qquad (13)$$
Step 2. Compute the joint density of $y_t$ and $S_t$ conditional on the information set $\psi_{t-1}$:
$$f(y_t, S_t = j \mid \psi_{t-1}) = f(y_t \mid S_t = j, \psi_{t-1})\, P(S_t = j \mid \psi_{t-1}), \quad j = 1, \dots, 4. \qquad (14)$$
The marginal density of $y_t$ is given by the sum of the joint density over all values of $S_t$:
$$f(y_t \mid \psi_{t-1}) = \sum_{j=1}^{4} f(y_t \mid S_t = j, \psi_{t-1})\, P(S_t = j \mid \psi_{t-1}). \qquad (15)$$
Step 3. Update the log-likelihood function at time $t$ in the following way:
$$l(\theta, \psi_t) = l(\theta, \psi_{t-1}) + \ln\left(f(y_t \mid \psi_{t-1})\right), \qquad (16)$$
and maximize $l(\theta, \psi_t)$ with respect to $\theta = (\mu_1, \dots, \mu_4, \sigma_1, \dots, \sigma_4)$, under the condition $\sigma_1 < \sigma_2 < \sigma_3 < \sigma_4$, to find the maximum likelihood estimator $\hat{\theta}$ for the next time period.
Step 4. Once $y_t$ is observed at the end of the $t$-th iteration, we can update the probability term:
$$P(S_t = j \mid \psi_t) = \frac{f(y_t \mid S_t = j, \psi_{t-1})\, P(S_t = j \mid \psi_{t-1})}{\sum_{j=1}^{4} f(y_t \mid S_t = j, \psi_{t-1})\, P(S_t = j \mid \psi_{t-1})}, \quad j = 1, \dots, 4, \qquad (17)$$
where both $\psi_t = \{\psi_{t-1}, y_t\}$ and $f(y_t \mid S_t = j, \psi_{t-1})$ are computed with respect to the estimator $\hat{\theta} = (\hat{\mu}_1, \dots, \hat{\mu}_4, \hat{\sigma}_1, \dots, \hat{\sigma}_4)$.
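The filtering recursion (13)-(17) can be sketched in a few lines; the fragment below is a simplified illustration (not the paper's code) in which the parameters $\theta$ and the transition matrix are held fixed, whereas in the full procedure $\hat{\theta}$ is re-estimated by maximizing (16) at each step.

```python
# Hamilton-type filter for the serially uncorrelated model (4):
# returns the filtered probabilities P(S_t = j | psi_t) and the log-likelihood.
import numpy as np
from scipy.stats import norm

def hamilton_filter(y, P_star, mu, sigma, pi0):
    T, M = len(y), len(mu)
    filt = np.zeros((T, M))
    prob = np.asarray(pi0, dtype=float)     # P(S_0 = j | psi_0), e.g. steady_state(P_star)
    loglik = 0.0
    for t in range(T):
        pred = P_star.T @ prob              # (13): P(S_t = j | psi_{t-1})
        dens = norm.pdf(y[t], mu, sigma)    # f(y_t | S_t = j, psi_{t-1})
        joint = dens * pred                 # (14)
        marg = joint.sum()                  # (15)
        loglik += np.log(marg)              # (16)
        prob = joint / marg                 # (17): P(S_t = j | psi_t)
        filt[t] = prob
    return filt, loglik
```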
2.2. Serially Correlated Data. In some cases it is possible to argue, and to test mathematically by, for example, the Durbin-Watson statistic or the Breusch-Godfrey test, for the presence of serial correlation (or autocorrelation) between data belonging to a certain time series of interest. Such a characteristic is often analyzed in the signal processing scenario, but examples can also be found in economic, meteorological, or sociological data sets, especially in connection with the autocorrelation of errors in the related forecasting procedures. In particular, if we suppose that the observed variable $y_t$ linearly depends on its previous value, then we obtain a first-order autoregressive pattern and the following state-space model applies:
$$
\begin{aligned}
& y_t - \mu_{S_t} = \phi\,(y_{t-1} - \mu_{S_{t-1}}) + \epsilon_t, \quad t = 1, 2, \dots, T,\\
& \{\epsilon_t\}_{t \in \{1,\dots,T\}} \ \text{are i.i.d.} \ \mathcal{N}(0, \sigma^2_{S_t}),\\
& \mu_{S_t} = \mu_1 S_{1,t} + \dots + \mu_M S_{M,t}, \qquad \sigma_{S_t} = \sigma_1 S_{1,t} + \dots + \sigma_M S_{M,t},\\
& S_t \in \{1, \dots, M\} \ \text{with probabilities} \ p_{ij},
\end{aligned} \qquad (18)
$$
where $p_{ij} := P(S_t = j \mid S_{t-1} = i)$, $i, j = 1, \dots, M$, and $S_{j,t}$, $(j,t) \in \{1,\dots,M\} \times \{1,\dots,T\}$, are the same variables introduced in the previous section; that is, $S_{j,t} = 1$ if $S_t = j$, otherwise $S_{j,t} = 0$.
In this situation, if the state $S_t$ is known for every $t = 1, \dots, T$, we need both $S_t$ and $S_{t-1}$ to compute the density of $y_t$ conditional on the past information $\psi_{t-1}$; indeed we have
$$\ln \mathcal{L} := \sum_{t=1}^{T} \ln\left(f_{y_t \mid \psi_{t-1}, S_t, S_{t-1}}\right), \qquad (19)$$
where
$$f_{y_t \mid \psi_{t-1}, S_t, S_{t-1}}(x) = \frac{1}{\sqrt{2\pi\sigma^2_{S_t}}}\, e^{-(x - \mu_{S_t} - \phi(y_{t-1} - \mu_{S_{t-1}}))^2 / 2\sigma^2_{S_t}}, \quad x \in \mathbb{R}. \qquad (20)$$
If the $S_t$ are unobserved (and as before we assume that the state variable can take the four values $\{1, 2, 3, 4\}$), we apply the following algorithm in order to solve the filtering problem for a sample of serially correlated data.
Inputs.
(i) Put $l(\theta) := l(\theta, \psi_0) = 0$.
(ii) Compute the transition probabilities
$$p_{ij} = P(S_t = j \mid S_{t-1} = i), \quad i, j = 1, 2, 3, 4. \qquad (21)$$
We apply the same trick as before, but first we have to estimate the parameter $\phi$: in order to obtain this value we can use the least squares method (see, for example, [7]); that is,
$$\hat{\phi} := \frac{\sum_{t=1}^{T} y_t y_{t+1}}{\sum_{t=1}^{T} y_t^2}. \qquad (22)$$
Then we compute $z_t := y_t - \hat{\phi}\, y_{t-1}$ for every $t = 1, \dots, T$ and consider the values $N_j := \Phi(z_t + \delta) - \Phi(z_t - \delta)$ (we apply the Normal distribution function to $z_t + \delta$ instead of $y_t + \delta$, as done before).
(iii) Compute the steady-state probabilities
$$\pi(0) = \left(P(S_0 = 1 \mid \psi_0), \dots, P(S_0 = 4 \mid \psi_0)\right), \qquad (23)$$
taking the last column of the matrix $(A'A)^{-1}A'$ (see the procedure in Section 2.1 for details).
Next perform the following steps for $t = 1, \dots, T$.
Step 1. Compute the probabilities of $S_t$ conditional on the information set at time $t-1$, for $j = 1, \dots, 4$:
$$P(S_t = j \mid \psi_{t-1}) = \sum_{i=1}^{4} P(S_t = j, S_{t-1} = i \mid \psi_{t-1}) = \sum_{i=1}^{4} p_{ij}\, P(S_{t-1} = i \mid \psi_{t-1}). \qquad (24)$$
Step 2. Compute the joint density of $y_t$, $S_t$, and $S_{t-1}$ given $\psi_{t-1}$:
$$f(y_t, S_t = j, S_{t-1} = i \mid \psi_{t-1}) = f(y_t \mid S_t = j, S_{t-1} = i, \psi_{t-1})\, P(S_t = j, S_{t-1} = i \mid \psi_{t-1}), \qquad (25)$$
where $f(y_t \mid S_t = j, S_{t-1} = i, \psi_{t-1})$ is given by (20) and $P(S_t = j, S_{t-1} = i \mid \psi_{t-1})$ is computed in Step 1. The marginal density of $y_t$ conditional on $\psi_{t-1}$ is obtained by summing over all values of $S_t$ and $S_{t-1}$:
$$f(y_t \mid \psi_{t-1}) = \sum_{j=1}^{4} \sum_{i=1}^{4} f(y_t \mid S_t = j, S_{t-1} = i, \psi_{t-1})\, P(S_t = j, S_{t-1} = i \mid \psi_{t-1}). \qquad (26)$$
Step 3. The log-likelihood function at time $t$ is again
$$l(\theta, \psi_t) = l(\theta, \psi_{t-1}) + \ln\left(f(y_t \mid \psi_{t-1})\right), \qquad (27)$$
and it can be maximized with respect to $\theta = (\mu_1, \dots, \mu_4, \sigma_1, \dots, \sigma_4)$, under the condition $\sigma_1 < \sigma_2 < \sigma_3 < \sigma_4$, giving the maximum likelihood estimator $\hat{\theta}$ for the next time period.
Step 4. Update the joint probabilities of $S_t$ and $S_{t-1}$ conditional on the new information set $\psi_t$, using the estimator $\hat{\theta}$ computed in Step 3 by maximizing the log-likelihood function $l(\theta, \psi_t)$:
$$P(S_t = j, S_{t-1} = i \mid \psi_t) = \frac{f(y_t, S_t = j, S_{t-1} = i \mid \psi_{t-1})}{f(y_t \mid \psi_{t-1})}, \quad i, j = 1, \dots, 4. \qquad (28)$$
Then compute the updated probabilities of $S_t$ given $\psi_t$ by summing the joint probabilities over $S_{t-1}$ as follows:
$$P(S_t = j \mid \psi_t) = \sum_{i=1}^{4} P(S_t = j, S_{t-1} = i \mid \psi_t), \quad \forall j = 1, \dots, 4. \qquad (29)$$
The Smoothing Algorithm. Once we have run this procedure, we are provided with the filtered probabilities, that is, the values $P(S_t = j \mid \psi_t)$ for $j = 1, \dots, 4$ and for each $t = 1, \dots, T$ (in addition to the estimator $\hat{\theta}$).
Sometimes it is required to estimate the probabilities of $S_t$ given the whole sample information; that is,
$$P(S_t = j \mid \psi_T) = P(S_t = j \mid y_1, \dots, y_T), \quad \forall j = 1, \dots, 4, \qquad (30)$$
which are called smoothed probabilities. We are going to show how these new probabilities can be computed from the previous procedure (the same algorithm, with some obvious changes, can also be used starting from the procedure in Section 2.1).
Since the last iteration of the algorithm gives us the probabilities $P(S_T = j \mid \psi_T)$ for $j = 1, \dots, 4$, we can start from these values and apply the following two steps for every $t = T-1, T-2, \dots, 2, 1$.
Step 1. For $i, j = 1, \dots, 4$, compute
$$
\begin{aligned}
P(S_t = i, S_{t+1} = j \mid \psi_T) &= P(S_{t+1} = j \mid \psi_T)\, P(S_t = i \mid S_{t+1} = j, \psi_T)\\
&= P(S_{t+1} = j \mid \psi_T)\, P(S_t = i \mid S_{t+1} = j, \psi_t)\\
&= P(S_{t+1} = j \mid \psi_T)\, \frac{P(S_t = i, S_{t+1} = j \mid \psi_t)}{P(S_{t+1} = j \mid \psi_t)}.
\end{aligned} \qquad (\star)
$$
Remark 1. Note that the equality $(\star)$, that is,
$$P(S_t = i \mid S_{t+1} = j, \psi_T) = P(S_t = i \mid S_{t+1} = j, \psi_t), \qquad (31)$$
holds only under a particular condition; namely,
$$f(h_{t+1,T} \mid S_{t+1} = j, S_t = i, \psi_t) = f(h_{t+1,T} \mid S_{t+1} = j, \psi_t), \qquad (32)$$
where $h_{t+1,T} := (y_{t+1}, \dots, y_T)'$ (see [6] for the proof). Equation (32) says that if $S_{t+1}$ were known, then $y_{t+1}$ would contain no information about $S_t$ beyond that contained in $S_{t+1}$ and $\psi_t$; this condition does not hold for every state-space model with regime switching (see, for example, [6, Ch. 5]), in which case the smoothing algorithm involves an approximation.
Step 2. For $i = 1, \dots, 4$ compute
$$P(S_t = i \mid \psi_T) = \sum_{j=1}^{4} P(S_t = i, S_{t+1} = j \mid \psi_T). \qquad (33)$$
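The smoothing recursion $(\star)$-(33) can be sketched as follows; as for the filtering fragment above, this is a simplified illustration in which the transition matrix is held fixed.

```python
# Kim-type smoother: starting from the filtered probabilities P(S_t = j | psi_t)
# (array filt of shape T x M) and the transition matrix P_star, it returns the
# smoothed probabilities P(S_t = j | psi_T), cf. (30)-(33).
import numpy as np

def kim_smoother(filt, P_star):
    T, M = filt.shape
    smooth = np.zeros_like(filt)
    smooth[-1] = filt[-1]                      # P(S_T = j | psi_T)
    for t in range(T - 2, -1, -1):
        pred = P_star.T @ filt[t]              # P(S_{t+1} = j | psi_t)
        # joint[i, j] = P(S_t = i, S_{t+1} = j | psi_T), cf. (*) in Step 1
        joint = (filt[t][:, None] * P_star) * (smooth[t + 1] / pred)[None, :]
        smooth[t] = joint.sum(axis=1)          # (33)
    return smooth
```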
3. The Gibbs Sampling Approach
3.1. An Introduction to Bayesian Inference. Under the general title of Bayesian inference we can collect a large number of different concrete procedures; nevertheless, they are all based on a smart use of Bayes' rule, which is used to update the probability estimate for a hypothesis as additional evidence is learned (see, for example, [8, 9]). In particular, within the Bayesian framework, the parameters which characterize a certain statistical model, let us collect them in a vector called $\theta$, are treated as random variables with their own probability distribution, let us say $\pi(\theta)$, which plays the role of a prior distribution since it is defined before taking into account the sample data $\tilde{y}$. Therefore, exploiting Bayes' theorem and denoting by $f(\tilde{y} \mid \theta)$ the likelihood of $\tilde{y}$ under the statistical model of interest, we have that
$$\pi(\theta \mid \tilde{y}) = \frac{f(\tilde{y} \mid \theta)\, \pi(\theta)}{f(\tilde{y})}, \qquad (34)$$
where $\pi(\theta \mid \tilde{y})$ is the joint posterior distribution of the parameters. The denominator $f(\tilde{y})$ defines the marginal likelihood of $\tilde{y}$ and can be taken as a constant, obtaining the proportionality
$$\pi(\theta \mid \tilde{y}) \propto f(\tilde{y} \mid \theta)\, \pi(\theta). \qquad (35)$$
It is straightforward to note that the most critical part of the Bayesian inference procedure lies in the choice of a suitable prior distribution, since it has to agree with the parameter constraints. An effective answer to the latter issue is given by the so-called conjugate prior, namely a prior such that the posterior distribution $\pi(\theta \mid \tilde{y})$, obtained when the prior is combined with the likelihood function, belongs to the same family as the prior distribution itself.
As an example, if the likelihood function is Gaussian, it can be shown that the conjugate prior for the mean $\mu$ is the Gaussian distribution, whereas the conjugate prior for the variance is the inverted Gamma distribution (see, for example, [9, 10]).
3.2. Gibbs Sampling. A general problem in statistics concerns the question of how a sequence of observations which cannot be directly sampled can be simulated, for example, by means of some multivariate probability distribution, with a prefixed degree of accuracy. Such kinds of problems can be successfully attacked by Markov Chain Monte Carlo (MCMC) simulation methods, see, for example, [11-13], and in particular by using the so-called Gibbs sampling technique, which allows one to approximate joint and marginal distributions by sampling from conditional distributions, see, for example, [14-16].
Let us suppose that we have the joint density of $k$ random variables, for example, $f = f(z_1, z_2, \dots, z_k)$, that we fix $t \in \{1, \dots, k\}$, and that we are interested in obtaining characteristics of the $z_t$-marginal, namely
$$f(z_t) = \int \dots \int f(z_1, z_2, \dots, z_k)\, dz_1 \dots dz_{t-1}\, dz_{t+1} \dots dz_k, \qquad (36)$$
such as the related mean and/or variance. In those cases when the joint density is not given, or the above integral turns out to be difficult to treat, for example, when an explicit solution does not exist, but we know the complete set of conditional densities, denoted by $f(z_t \mid z_{j \neq t})$, $t = 1, 2, \dots, k$, with $z_{j \neq t} = \{z_1, \dots, z_{t-1}, z_{t+1}, \dots, z_k\}$, the Gibbs sampling method allows us to generate a sample $z_1^j, z_2^j, \dots, z_k^j$ from the joint density $f(z_1, z_2, \dots, z_k)$ without requiring that we know either the joint density or the marginal densities. With the following procedure we recall the basic ideas on which the Gibbs sampling approach is based, given an arbitrary starting set of values $(z_2^0, \dots, z_k^0)$.
Step 1. Draw $z_1^1$ from $f(z_1 \mid z_2^0, \dots, z_k^0)$.
Step 2. Draw $z_2^1$ from $f(z_2 \mid z_1^1, z_3^0, \dots, z_k^0)$.
Step 3. Draw $z_3^1$ from $f(z_3 \mid z_1^1, z_2^1, z_4^0, \dots, z_k^0)$.
...
Step k. Finally draw $z_k^1$ from $f(z_k \mid z_1^1, \dots, z_{k-1}^1)$ to complete the first iteration.
The steps from 1 through $k$ can be iterated $J$ times to get $(z_1^j, z_2^j, \dots, z_k^j)$, $j = 1, 2, \dots, J$.
In [17] S. Geman and D. Geman showed that both the joint and the marginal distributions of the generated $(z_1^j, z_2^j, \dots, z_k^j)$ converge, at an exponential rate, to the joint and marginal distributions of $z_1, z_2, \dots, z_k$ as $J \to \infty$. Thus the joint and marginal distributions of $z_1, z_2, \dots, z_k$ can be approximated by the empirical distributions of $M$ simulated values $(z_1^j, z_2^j, \dots, z_k^j)$, $j = L+1, \dots, L+M$, where $L$ is large enough to ensure the convergence of the Gibbs sampler. Moreover, $M$ can be chosen to reach the required precision with respect to the empirical distribution of interest.
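As a toy illustration of the scheme above (not related to the models of this paper), the following fragment samples from a bivariate standard normal with correlation $\rho$ by alternately drawing each coordinate from its conditional distribution.

```python
# Elementary Gibbs sampler: z1 | z2 ~ N(rho*z2, 1-rho^2) and z2 | z1 ~ N(rho*z1, 1-rho^2).
import numpy as np

rng = np.random.default_rng(0)
rho, J, L = 0.8, 20000, 1000            # correlation, total iterations, burn-in
z1, z2 = 0.0, 0.0                        # arbitrary starting values (z1^0, z2^0)
draws = []
for j in range(J):
    z1 = rng.normal(rho * z2, np.sqrt(1 - rho**2))
    z2 = rng.normal(rho * z1, np.sqrt(1 - rho**2))
    if j >= L:                           # keep only post-burn-in draws
        draws.append((z1, z2))
draws = np.array(draws)
print(draws.mean(axis=0), np.corrcoef(draws.T)[0, 1])   # close to (0, 0) and 0.8
```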
In the MSM framework we do not have the conditional distributions $f(z_t \mid z_{j \neq t})$, $t = 1, 2, \dots, k$, and we are left with the problem of estimating the parameters $z_j$, $j = 1, \dots, k$. The latter problem can be solved exploiting Bayesian inference results, as we shall state in the next section.
3.3. Gibbs Sampling for Markov Switching Models. A major problem when dealing with inference for Markov switching models lies in the fact that some parameters of the model depend on an unobserved variable, let us say $S_t$. We saw that in the classical framework, inference on Markov switching models consists first in estimating the model's unknown parameters via maximum likelihood; then inference on the unobserved Markov switching variable $\tilde{S}_T = (S_1, S_2, \dots, S_T)$, conditional on the parameter estimates, has to be performed.
In the Bayesian analysis, both the parameters of the model and the switching variables $S_t$, $t = 1, \dots, T$, are treated as random variables. Thus, inference on $\tilde{S}_T$ is based on a joint distribution, no longer on a conditional one. By employing Gibbs sampling techniques, Albert and Chib (see [14]) provided an easy-to-implement algorithm for the Bayesian analysis of Markov switching models. In particular, in their work the parameters of the model and $S_t$, $t = 1, \dots, T$, are treated as missing data, and they are generated from appropriate conditional distributions using the Gibbs sampling method. As an example, let us consider the following simple model with two-state Markov switching mean and variance:
$$
\begin{aligned}
& y_t = \mu_{S_t} + \epsilon_t,\\
& \{\epsilon_t\}_{t \in \{1,\dots,T\}} \ \text{are i.i.d.} \ \mathcal{N}(0, \sigma^2_{S_t}),\\
& \mu_{S_t} = \mu_0 + \mu_1 S_t,\\
& \sigma^2_{S_t} = \sigma_0^2(1 - S_t) + \sigma_1^2 S_t = \sigma_0^2(1 + h_1 S_t), \quad h_1 > 0,
\end{aligned} \qquad (37)
$$
where $S_t \in \{0, 1\}$ with transition probabilities $p := P(S_t = 0 \mid S_{t-1} = 0)$, $q := P(S_t = 1 \mid S_{t-1} = 1)$. The Bayesian method considers both $S_t$, $t = 1, \dots, T$, and the model's unknown parameters $\mu_0$, $\mu_1$, $\sigma_0$, $\sigma_1$, $p$, and $q$ as random variables. In order to make inference about these $T + 6$ variables, we need to derive the joint posterior density $f(\tilde{S}_T, \mu_0, \mu_1, \sigma_0^2, \sigma_1^2, p, q \mid \psi_T)$, where $\psi_T := (y_1, y_2, \dots, y_T)$ and $\tilde{S}_T = (S_1, S_2, \dots, S_T)$. Namely, the realization of the Gibbs sampling relies on the derivation of the distributions of each of the above $T + 6$ variables, conditional on all the other variables. Therefore, we can approximate the joint posterior density written above by running the following procedure $L + M$ times, where $L$ is an integer large enough to guarantee the desired convergence. Hence we have the following scheme.
Step 1. We can derive the distribution of $S_t$, $t = 1, \dots, T$, conditional on the other parameters in two different ways.
(1) Single-move Gibbs sampling: generate each $S_t$ from $f(S_t \mid \tilde{S}_{\neq t}, \mu_0, \mu_1, \sigma_0^2, \sigma_1^2, p, q, \psi_T)$, $t = 1, \dots, T$, where $\tilde{S}_{\neq t} = (S_1, \dots, S_{t-1}, S_{t+1}, \dots, S_T)$.
(2) Multi-move Gibbs sampling: generate the whole block $\tilde{S}_T$ from $f(\tilde{S}_T \mid \mu_0, \mu_1, \sigma_0^2, \sigma_1^2, p, q, \psi_T)$.
Step 2. Generate the transition probabilities $p$ and $q$ from $f(p, q \mid \tilde{S}_T)$. Note that this distribution is conditioned only on $\tilde{S}_T$ because we assume that $p$ and $q$ are independent of both the other parameters of the model and the data $\psi_T$.
If we choose the Beta distribution as prior for both $p$ and $q$, we have that the posterior distribution $f(p, q \mid \tilde{S}_T) \propto f(p, q)\, L(p, q \mid \tilde{S}_T)$ is again a Beta distribution; that is, the Beta distribution is a conjugate prior for the likelihood of the transition probabilities.
Step 3. Generate $\mu_0$ and $\mu_1$ from $f(\mu_0, \mu_1 \mid \tilde{S}_T, \sigma_0^2, \sigma_1^2, p, q, \psi_T)$. In this case the conjugate prior is the Normal distribution.
Step 4. Generate $\sigma_0^2$ and $\sigma_1^2$ from $f(\sigma_0^2, \sigma_1^2 \mid \tilde{S}_T, \mu_0, \mu_1, p, q, \psi_T)$. From the definition of the model we have that $\sigma_1^2 = \sigma_0^2(1 + h_1)$: we can first generate $\sigma_0^2$ conditional on $h_1$, and then generate $\bar{h}_1 = 1 + h_1$ conditional on $\sigma_0^2$. We use in both cases the inverted Gamma distribution as conjugate prior for the parameters.
For a more detailed description of these steps see [6, pp. 211-218]. Here we examine only the so-called multi-move Gibbs sampling, originally motivated by Carter and Kohn (see [15]) in the context of state-space models and then implemented in [6] for an MSM. For the sake of simplicity, let us suppress the conditioning on the model's parameters and denote
$$f(\tilde{S}_T \mid \psi_T) := f(\tilde{S}_T \mid \mu_0, \mu_1, \sigma_0^2, \sigma_1^2, p, q, \psi_T). \qquad (38)$$
Using the Markov property of $\{S_t\}_{t \in \{1,\dots,T\}}$ it can be seen that
$$f(\tilde{S}_T \mid \psi_T) = f(S_T \mid \psi_T) \prod_{t=1}^{T-1} f(S_t \mid S_{t+1}, \psi_t), \qquad (39)$$
where $f(S_T \mid \psi_T) = P(S_T \mid \psi_T)$ is provided by the last iteration of the filtering algorithm (see Sections 2.1 and 2.2). Note that (39) suggests that we can first generate $S_T$ conditional on $\psi_T$ and then, for $t = T-1, T-2, \dots, 1$, we can generate $S_t$ conditional on $\psi_t$ and $S_{t+1}$; namely, we can run the following steps.
Step 1. Run the basic filter procedure to get $f(S_t \mid \psi_t)$, $t = 1, 2, \dots, T$, and save them; the last iteration of the filter gives us the probability distribution $f(S_T \mid \psi_T)$, from which $S_T$ is generated.
Step 2. Note that
$$
\begin{aligned}
f(S_t \mid S_{t+1}, \psi_t) &= \frac{f(S_t, S_{t+1} \mid \psi_t)}{f(S_{t+1} \mid \psi_t)} = \frac{f(S_{t+1} \mid S_t, \psi_t)\, f(S_t \mid \psi_t)}{f(S_{t+1} \mid \psi_t)}\\
&= \frac{f(S_{t+1} \mid S_t)\, f(S_t \mid \psi_t)}{f(S_{t+1} \mid \psi_t)} \propto f(S_{t+1} \mid S_t)\, f(S_t \mid \psi_t),
\end{aligned} \qquad (40)
$$
where $f(S_{t+1} \mid S_t)$ is the transition probability and $f(S_t \mid \psi_t)$ has been saved from Step 1. So we can generate $S_t$ in the following way: first calculate
$$P(S_t = 1 \mid S_{t+1}, \psi_t) = \frac{f(S_{t+1} \mid S_t = 1)\, f(S_t = 1 \mid \psi_t)}{\sum_{j=0}^{1} f(S_{t+1} \mid S_t = j)\, f(S_t = j \mid \psi_t)}, \qquad (41)$$
and then generate $S_t$ using a uniform distribution. For example, we generate a random number from a uniform distribution between 0 and 1; if this number is less than or equal to the calculated value of $P(S_t = 1 \mid S_{t+1}, \psi_t)$, we set $S_t = 1$; otherwise, $S_t$ is set equal to 0.
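A compact sketch of this backward draw of the whole state path (the multi-move step) is reported below; it assumes that the filtered probabilities have already been computed, for example, with the filtering fragment of Section 2.1, and it covers the general $M$-state case used in the following.

```python
# Multi-move draw of the state path: S_T is sampled from the last filtered
# distribution, then S_t is sampled backwards with probabilities proportional to
# f(S_{t+1} | S_t) f(S_t | psi_t), cf. (40)-(41).
import numpy as np

def draw_state_path(filt, P_star, rng=np.random.default_rng()):
    T, M = filt.shape
    S = np.zeros(T, dtype=int)
    S[-1] = rng.choice(M, p=filt[-1])                 # draw S_T | psi_T
    for t in range(T - 2, -1, -1):
        w = P_star[:, S[t + 1]] * filt[t]             # f(S_{t+1} | S_t = i) f(S_t = i | psi_t)
        S[t] = rng.choice(M, p=w / w.sum())           # draw S_t | S_{t+1}, psi_t
    return S
```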
In view of applications, let us now consider the following four-state MSM:
$$
\begin{aligned}
& y_t \sim \mathcal{N}(0, \sigma^2_{S_t}), \quad t = 1, 2, \dots, T,\\
& \sigma_{S_t} = \sigma_1 S_{1,t} + \sigma_2 S_{2,t} + \sigma_3 S_{3,t} + \sigma_4 S_{4,t},\\
& S_t \in \{1, 2, 3, 4\} \ \text{with transition probabilities} \ p_{ij} := P(S_t = j \mid S_{t-1} = i), \quad i, j = 1, 2, 3, 4,
\end{aligned} \qquad (42)
$$
where $S_{j,t} = 1$ if $S_t = j$, otherwise $S_{j,t} = 0$. Note that this is a particular case of the model analysed in Section 2.1, with $\mu_{S_t} = 0$ for every $t$; hence we can perform the procedure for serially uncorrelated data taking $\mu_{S_t} = \mu = 0$ to start the Gibbs sampling algorithm. Therefore we have the following steps.
Step 1. Generate $\tilde{S}_T := (S_1, S_2, \dots, S_T)$ conditional on
$$
\begin{aligned}
& \tilde{\sigma}^2 := (\sigma_1^2, \sigma_2^2, \sigma_3^2, \sigma_4^2),\\
& \tilde{p} := (p_{11}, p_{12}, p_{13}, p_{21}, p_{22}, p_{23}, p_{31}, p_{32}, p_{33}, p_{41}, p_{42}, p_{43}),\\
& \psi_T := (y_1, y_2, \dots, y_T).
\end{aligned} \qquad (43)
$$
For this purpose, we employ the multi-move Gibbs sampling algorithm:
(1) run the procedure in Section 2.1 with $\mu_{S_t} = 0$ in order to get, from the last iteration, $f(S_T \mid \psi_T) = P(S_T \mid \psi_T)$,
(2) recalling that $f(S_t \mid S_{t+1}, \psi_t) \propto f(S_{t+1} \mid S_t)\, f(S_t \mid \psi_t)$ for $t = T-1, \dots, 1$, we can generate $S_t$ from the vector of probabilities
$$\left(P(S_t = 1 \mid S_{t+1}, \psi_t), P(S_t = 2 \mid S_{t+1}, \psi_t), P(S_t = 3 \mid S_{t+1}, \psi_t), P(S_t = 4 \mid S_{t+1}, \psi_t)\right), \qquad (44)$$
where, for $j = 1, \dots, 4$,
$$P(S_t = j \mid S_{t+1}, \psi_t) = \frac{f(S_{t+1} \mid S_t = j)\, f(S_t = j \mid \psi_t)}{\sum_{i=1}^{4} f(S_{t+1} \mid S_t = i)\, f(S_t = i \mid \psi_t)}. \qquad (45)$$
Step 2. Generate $\tilde{\sigma}^2$ conditional on $\tilde{S}_T$ and the data $\psi_T$.
We want to impose the constraint $\sigma_1^2 < \sigma_2^2 < \sigma_3^2 < \sigma_4^2$, so we redefine $\sigma^2_{S_t}$ in this way:
$$\sigma^2_{S_t} = \sigma_1^2 (1 + S_{2,t} h_2)(1 + S_{3,t} h_2)(1 + S_{3,t} h_3)(1 + S_{4,t} h_2)(1 + S_{4,t} h_3)(1 + S_{4,t} h_4), \qquad (46)$$
where $h_j > 0$ for $j = 2, 3, 4$, so that $\sigma_2^2 := \sigma_1^2(1 + h_2)$, $\sigma_3^2 := \sigma_1^2(1 + h_2)(1 + h_3)$ and $\sigma_4^2 := \sigma_1^2(1 + h_2)(1 + h_3)(1 + h_4)$. With this specification, we first generate $\sigma_1^2$, then generate $\bar{h}_2 = 1 + h_2$, $\bar{h}_3 = 1 + h_3$ and $\bar{h}_4 = 1 + h_4$ to obtain $\sigma_2^2$, $\sigma_3^2$ and $\sigma_4^2$ indirectly.
Generating $\sigma_1^2$ Conditional on $h_2$, $h_3$ and $h_4$. Define, for $t = 1, \dots, T$,
$$Y_t^1 := \frac{y_t}{\sqrt{(1 + S_{2,t} h_2)(1 + S_{3,t} h_2)(1 + S_{3,t} h_3)(1 + S_{4,t} h_2)(1 + S_{4,t} h_3)(1 + S_{4,t} h_4)}}, \qquad (47)$$
so that, in (42), $Y_t^1 \sim \mathcal{N}(0, \sigma_1^2)$. By choosing an inverted Gamma prior distribution, that is, $f(\sigma_1^2 \mid h_2, h_3, h_4) \sim \mathrm{IG}(\nu_1/2, \delta_1/2)$, where $\nu_1, \delta_1$ are the known prior hyperparameters, it can be shown that the conditional posterior distribution from which we generate $\sigma_1^2$ is given by
$$\sigma_1^2 \mid \psi_T, \tilde{S}_T, h_2, h_3, h_4 \sim \mathrm{IG}\left(\frac{\nu_1 + T}{2}, \frac{\delta_1 + \sum_{t=1}^{T} (Y_t^1)^2}{2}\right). \qquad (48)$$
Generating $\bar{h}_2$ Conditional on $\sigma_1^2$, $h_3$ and $h_4$. Note that the likelihood function of $h_2$ depends only on the values of $y_t$ for which $S_t \in \{2, 3, 4\}$. Therefore, take $y_t^{(1)} := \{y_t \mid S_t \in \{2, 3, 4\},\ t = 1, \dots, T\}$ and denote by $T_2$ the size of this sample. Then define
$$Y_t^2 := \frac{y_t^{(1)}}{\sqrt{\sigma_1^2 (1 + S_{3,t} h_3)(1 + S_{4,t} h_3)(1 + S_{4,t} h_4)}}, \qquad (49)$$
hence, for the observations in which $S_t = 2, 3$ or $4$, we have $Y_t^2 \sim \mathcal{N}(0, \bar{h}_2)$. If we choose an inverted Gamma distribution with parameters $\nu_2, \delta_2$ for the prior, we obtain $\bar{h}_2 = 1 + h_2$ from the following posterior distribution:
$$\bar{h}_2 \mid \psi_T, \tilde{S}_T, \sigma_1^2, h_3, h_4 \sim \mathrm{IG}\left(\frac{\nu_2 + T_2}{2}, \frac{\delta_2 + \sum_{t=1}^{T_2} (Y_t^2)^2}{2}\right). \qquad (50)$$
In case $\bar{h}_2 > 1$ put $h_2 = \bar{h}_2 - 1$ and $\sigma_2^2 := \sigma_1^2(1 + h_2)$. Otherwise reiterate this step.
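For concreteness, the draws in (48) and (50) can be sketched as follows; the fragment assumes the usual parameterization in which $X \sim \mathrm{IG}(a, b)$ if and only if $1/X \sim \mathrm{Gamma}(\text{shape} = a, \text{scale} = 1/b)$, and the hyperparameter values are illustrative.

```python
# Draws of sigma_1^2 and hbar_2 = 1 + h_2 from their inverse-Gamma posteriors.
import numpy as np

def draw_inverse_gamma(shape, scale, rng):
    return 1.0 / rng.gamma(shape, 1.0 / scale)

def draw_sigma1_sq(Y1, nu1, delta1, rng):
    # posterior (48): IG((nu_1 + T)/2, (delta_1 + sum_t (Y_t^1)^2)/2)
    return draw_inverse_gamma((nu1 + len(Y1)) / 2.0,
                              (delta1 + np.sum(np.asarray(Y1)**2)) / 2.0, rng)

def draw_hbar2(Y2, nu2, delta2, rng):
    # posterior (50): IG((nu_2 + T_2)/2, (delta_2 + sum_t (Y_t^2)^2)/2);
    # the draw is repeated until hbar_2 > 1, so that sigma_2^2 > sigma_1^2 (Step 2)
    while True:
        hbar2 = draw_inverse_gamma((nu2 + len(Y2)) / 2.0,
                                   (delta2 + np.sum(np.asarray(Y2)**2)) / 2.0, rng)
        if hbar2 > 1.0:
            return hbar2
```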
Generating $\bar{h}_3$ Conditional on $\sigma_1^2$, $h_2$ and $h_4$. Operate in a similar way as above. In particular, if we define $y_t^{(2)} := \{y_t \mid S_t \in \{3, 4\},\ t = 1, \dots, T\}$ we will obtain
$$Y_t^3 := \frac{y_t^{(2)}}{\sqrt{\sigma_1^2 (1 + S_{3,t} h_2)(1 + S_{4,t} h_2)(1 + S_{4,t} h_4)}} \sim \mathcal{N}(0, \bar{h}_3). \qquad (51)$$
Generating $\bar{h}_4$ Conditional on $\sigma_1^2$, $h_2$ and $h_3$. Operate in a similar way as above. In particular, if we define $y_t^{(3)} := \{y_t \mid S_t = 4,\ t = 1, \dots, T\}$ we will have
$$Y_t^4 := \frac{y_t^{(3)}}{\sqrt{\sigma_1^2 (1 + S_{4,t} h_2)(1 + S_{4,t} h_3)}} \sim \mathcal{N}(0, \bar{h}_4). \qquad (52)$$
Step 3. Generate $\tilde{p}$ conditional on $\tilde{S}_T$. In order to generate the transition probabilities we exploit the properties of the prior Beta distribution. Let us first define
$$q_{ii} := P(S_t \neq i \mid S_{t-1} = i) = 1 - p_{ii}, \quad i = 1, 2, 3, 4; \qquad q_{ij} := P(S_t = j \mid S_{t-1} = i, S_t \neq i), \quad j \neq i. \qquad (53)$$
Hence we have that
$$p_{ij} := P(S_t = j \mid S_{t-1} = i) = P(S_t = j \mid S_{t-1} = i, S_t \neq i)\, P(S_t \neq i \mid S_{t-1} = i) = q_{ij}(1 - p_{ii}), \quad \forall j \neq i. \qquad (54)$$
Given $\tilde{S}_T$, let $n_{ij}$, $i, j = 1, 2, 3, 4$, be the total number of transitions from state $S_{t-1} = i$ to $S_t = j$, $t = 2, 3, \dots, T$, and $n_{i\bar{j}}$ the number of transitions from state $S_{t-1} = i$ to $S_t \neq j$.
Begin with the generation of the probabilities $p_{ii}$, $i = 1, 2, 3, 4$, by taking the Beta distribution as conjugate prior: if we take $p_{ii} \sim \mathrm{Beta}(u_{ii}, \bar{u}_{ii})$, where $u_{ii}$ and $\bar{u}_{ii}$ are the known hyperparameters of the priors, the posterior distribution of $p_{ii}$ given $\tilde{S}_T$ still belongs to the Beta family of distributions; that is,
$$p_{ii} \mid \tilde{S}_T \sim \mathrm{Beta}(u_{ii} + n_{ii}, \bar{u}_{ii} + n_{i\bar{i}}), \quad i = 1, 2, 3, 4. \qquad (55)$$
The other parameters, that is, $p_{ij}$ for $j \neq i$, can be computed from the above equation $p_{ij} = q_{ij}(1 - p_{ii})$, where the $q_{ij}$ are generated from the following posterior Beta distribution:
$$q_{ij} \mid \tilde{S}_T \sim \mathrm{Beta}(u_{ij} + n_{ij}, \bar{u}_{ij} + n_{i\bar{j}}). \qquad (56)$$
For example, given that $p_{11}$ is generated, we can obtain $p_{12}$ and $p_{13}$ by considering
$$q_{12} \mid \tilde{S}_T \sim \mathrm{Beta}(u_{12} + n_{12}, \bar{u}_{12} + n_{1\bar{2}}), \qquad q_{13} \mid \tilde{S}_T \sim \mathrm{Beta}(u_{13} + n_{13}, \bar{u}_{13} + n_{1\bar{3}}), \qquad (57)$$
where $n_{1\bar{2}} = n_{13} + n_{14}$ and $n_{1\bar{3}} = n_{12} + n_{14}$. Finally, given $p_{11}$, $p_{12}$ and $p_{13}$ generated in this way, we have $p_{14} = 1 - p_{11} - p_{12} - p_{13}$.
Remark 2. When we do not have any information about the prior distributions we employ the hyperparameters $u_{ij} = 0.5$, $i, j = 1, 2, 3, 4$. Usually we know that the elements on the diagonal of the transition matrix are bigger than the off-diagonal ones, because in a financial framework regime switching happens only occasionally: in this case, since we want $p_{ii}$ close to 1 and $p_{ij}$, $j \neq i$, close to 0, we will choose $u_{ii}$ bigger than $u_{ij}$.
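A sketch of Step 3 is reported below; the hyperparameters $u = \bar{u} = 0.5$ are the illustrative choice of Remark 2, the state path is encoded with values $0, \dots, M-1$, and, as in the text, the last off-diagonal entry of each row is obtained by difference (in rare cases this difference can become negative, in which case the row can simply be re-drawn).

```python
# Draw the transition matrix given a sampled state path, cf. (55)-(57).
import numpy as np

def draw_transition_matrix(S, M=4, u=0.5, ubar=0.5, rng=np.random.default_rng()):
    n = np.zeros((M, M))
    for a, b in zip(S[:-1], S[1:]):
        n[a, b] += 1.0                               # n_ij = transitions from i to j
    P = np.zeros((M, M))
    for i in range(M):
        off = [j for j in range(M) if j != i]
        n_out = n[i, off].sum()                      # transitions leaving state i
        p_ii = rng.beta(u + n[i, i], ubar + n_out)   # (55)
        P[i, i] = p_ii
        for j in off[:-1]:
            # (56): q_ij ~ Beta(u + n_ij, ubar + transitions from i to other states != i, j)
            q_ij = rng.beta(u + n[i, j], ubar + (n_out - n[i, j]))
            P[i, j] = q_ij * (1.0 - p_ii)            # (54): p_ij = q_ij (1 - p_ii)
        P[i, off[-1]] = 1.0 - P[i].sum()             # last entry by difference, cf. (57)
    return P
```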
4. Goodness of Fit
Since financial time series are characterized by complex and rather unpredictable behavior, it is difficult to find a possible pattern, if there is any. A typical set of techniques which allow one to measure the goodness of forecasts obtained by using a certain model is given by residual analysis. Let us suppose that we are provided with a time series of return observations $\{y_t\}_{t=1,\dots,T}$, $T \in \mathbb{N}^+$, for which we choose, for example, the model described in (4) with $M = 4$. By running the procedure of Section 2.1 we obtain the filtered probabilities
$$P(S_t = j \mid \psi_t), \quad j = 1, 2, 3, 4, \ t = 1, \dots, T, \qquad (58)$$
and, by maximization of the log-likelihood function, we compute the parameters $\hat{\mu}_1, \dots, \hat{\mu}_4, \hat{\sigma}_1, \dots, \hat{\sigma}_4$; therefore we can estimate both the mean and the variance of the process at time $t$, for any $t = 1, \dots, T$, given the information set $\psi_t$, as a weighted average of four values:
$$
\begin{aligned}
\hat{\mu}_t &:= E(\mu_t \mid \psi_t) := \hat{\mu}_1 P(S_t = 1 \mid \psi_t) + \dots + \hat{\mu}_4 P(S_t = 4 \mid \psi_t),\\
\hat{\sigma}_t^2 &:= E(\sigma_t^2 \mid \psi_t) := \hat{\sigma}_1^2 P(S_t = 1 \mid \psi_t) + \dots + \hat{\sigma}_4^2 P(S_t = 4 \mid \psi_t).
\end{aligned} \qquad (59)
$$
If the chosen model fits the data well, then the standardized residuals will have the following form:
$$\hat{\epsilon}_t = \frac{y_t - \hat{\mu}_t}{\hat{\sigma}_t} \sim \mathcal{N}(0, 1), \quad t = 1, \dots, T, \qquad (60)$$
therefore it is natural to apply a normality test such as, for example, the Jarque-Bera test (see [18] for details). We briefly recall that the Jarque-Bera statistic is defined as
$$\mathrm{JB} := \frac{T}{6}\left(S^2 + \frac{1}{4}(K - 3)^2\right), \qquad (61)$$
where the parameters $S$ and $K$ indicate the skewness, respectively the kurtosis, of $\hat{\epsilon}_t$. If the $\hat{\epsilon}_t$ come from a Normal distribution, the Jarque-Bera statistic converges asymptotically to a chi-squared distribution with two degrees of freedom and can be used to test the null hypothesis of normality: this is a joint hypothesis of the skewness being zero and the excess kurtosis $(K - 3)$ being also zero.
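A direct computation of (60)-(61), kept as a small sketch under the same notation, could read as follows.

```python
# Standardized residuals and Jarque-Bera statistic; under normality JB is
# asymptotically chi-squared with two degrees of freedom.
import numpy as np

def standardized_residuals(y, mu_hat, sigma_hat):
    return (np.asarray(y) - np.asarray(mu_hat)) / np.asarray(sigma_hat)   # (60)

def jarque_bera(residuals):
    e = np.asarray(residuals)
    T = len(e)
    s = (e - e.mean()) / e.std()
    S = np.mean(s**3)                                # sample skewness
    K = np.mean(s**4)                                # sample kurtosis
    return T / 6.0 * (S**2 + 0.25 * (K - 3.0)**2)    # (61)
```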
Remark 3. Note that the Jarque-Bera test is very sensitive and often rejects the null hypothesis only because of a few abnormal observations; this is the reason why one has to point out these outliers, which have to be removed before applying the test to the obtained smoothed data.
5. Prediction
The forecasting task is the most difficult step in the whole MSM approach. Let us suppose that our time series ends at time $T \in \mathbb{N}^+$, without further observations; then we have to start the prediction with the following quantities:
(i) the transition probability matrix $P^* := \{p_{ij}\}_{i,j=1,2,3,4}$;
(ii) the vector $\pi_T := P(S_T \mid \psi_T) = (P(S_T = 1 \mid \psi_T), \dots, P(S_T = 4 \mid \psi_T))$ obtained from the last iteration of the filter algorithm, for example, the procedure in Section 2.1.
It follows that we have to proceed with the first step of the filter procedure, obtaining the one-step-ahead probability of the state $S_{T+1}$ given the sample of observations $\psi_T$; that is,
$$P(S_{T+1} = j \mid \psi_T) = \sum_{i=1}^{4} p_{ij}\, P(S_T = i \mid \psi_T), \quad j = 1, 2, 3, 4. \qquad (62)$$
Equation (62) can be seen as a prediction for the regime at time $T + 1$, knowing observations up to time $T$. At this point, the best way to make a prediction about the unobserved variable is the simulation of further observations. Indeed, with the new probability $P(S_{T+1} \mid \psi_T)$ and the vector of parameter estimates $\hat{\theta} := (\hat{\mu}_1, \dots, \hat{\mu}_4, \hat{\sigma}_1, \dots, \hat{\sigma}_4)$ we can estimate the one-step-ahead mean and variance as follows:
$$
\begin{aligned}
\hat{\mu}_{T+1} &:= E(\mu_{T+1} \mid \psi_T) := \hat{\mu}_1 P(S_{T+1} = 1 \mid \psi_T) + \dots + \hat{\mu}_4 P(S_{T+1} = 4 \mid \psi_T),\\
\hat{\sigma}_{T+1}^2 &:= E(\sigma_{T+1}^2 \mid \psi_T) := \hat{\sigma}_1^2 P(S_{T+1} = 1 \mid \psi_T) + \dots + \hat{\sigma}_4^2 P(S_{T+1} = 4 \mid \psi_T).
\end{aligned} \qquad (63)
$$
Then we simulate $\tilde{y}_{T+1}$ from the Gaussian distribution $\mathcal{N}(\hat{\mu}_{T+1}, \hat{\sigma}_{T+1})$ and, once $\tilde{y}_{T+1}$ has been simulated, we define $\psi_{T+1} := \{y_1, \dots, y_T, \tilde{y}_{T+1}\}$. Then we first apply again the filter procedure of Section 2.1 for $t = T + 1$ in order to obtain $P(S_{T+1} \mid \psi_{T+1})$; then we compute $P(S_{T+2} \mid \psi_{T+1})$, $\hat{\mu}_{T+2}$ and $\hat{\sigma}_{T+2}^2$, and we simulate $\tilde{y}_{T+2}$ from the Gaussian distribution $\mathcal{N}(\hat{\mu}_{T+2}, \hat{\sigma}_{T+2})$. The latter procedure runs in the same way for all the other time steps $T + 3, \dots, T + m$, where $m \in \mathbb{N}^+$ is the time horizon of our forecast.
Remark 4. We would like to underline that the method just described is not reliable with few simulations, since each $y_{T+k}$, for $k = 1, \dots, m$, may assume a wide range of values and a single draw describes only one of the many possible paths. So we can reiterate the previous strategy many times in order to compute the mean behavior of $P(S_{T+k} \mid \psi_{T+k})$, $\hat{\mu}_{T+k}$ and $\hat{\sigma}_{T+k}$. After having obtained a satisfactory number of data, we can construct a confidence interval within which the state probability will more likely take its value. Obviously, a high number of iterations of the latter procedure rapidly increases the computational complexity of the whole algorithm because of the MLE-related computational cost; therefore we adopt a rather different strategy, which consists in simulating $y_{T+k}$ $n$ times at each step (e.g., $n = 10000$) and then taking the mean over those values. However, we must pay attention because computing the mean could cancel the possible regime switching: for example, if we draw $y_t$ many times from $\mathcal{N}(0, \sigma_{S_t})$ and we take the mean, by the law of large numbers we will obtain zero at any time. To overcome this problem we can take the mean of the absolute values and then multiply this mean by a number $x$, which is a random variable taking values 1 or $-1$ with equal probability, hence deciding the sign of $y_t$ at every simulation step.
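The strategy of Remark 4 can be summarized in the following rough sketch; as in the earlier fragments, the parameters are held fixed instead of being re-estimated at every simulated step, which is the main simplification with respect to the procedure described above.

```python
# m-step-ahead simulation: at each horizon the next return is simulated n times,
# the absolute values are averaged and given a random sign (Remark 4), and the
# filter of Section 2.1 is advanced on the simulated observation.
import numpy as np
from scipy.stats import norm

def forecast(prob_T, P_star, mu, sigma, m=8, n=10000, rng=np.random.default_rng()):
    # prob_T: filtered P(S_T = j | psi_T); mu, sigma: arrays of estimated means/std devs
    prob = np.asarray(prob_T, dtype=float)
    y_sim, vol_sim = [], []
    for _ in range(m):
        pred = P_star.T @ prob                 # (62): one-step-ahead state probabilities
        mu_next = pred @ mu                    # (63): one-step-ahead mean
        sig_next = np.sqrt(pred @ sigma**2)    # (63): one-step-ahead standard deviation
        draws = rng.normal(mu_next, sig_next, size=n)
        y_next = rng.choice([-1.0, 1.0]) * np.abs(draws).mean()
        dens = norm.pdf(y_next, mu, sigma)
        prob = dens * pred / (dens * pred).sum()   # filter update on the simulated value
        y_sim.append(y_next)
        vol_sim.append(sig_next)
    return np.array(y_sim), np.array(vol_sim)
```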
6. Applications
In this section we apply the classical inference approach for an MSM to analyse real financial time series. In particular, we first examine data coming from the Standard & Poor's 500 (S&P 500) equity index, which, being based on the 500 most important companies in the United States, is considered one of the best representations of the U.S. stock market. Secondly, we shall consider the DAX (Deutsche Aktien Index), which follows the quotations of the 30 major companies in Germany. Our choice is motivated by a twofold goal: first, we want to test the proposed 4-state MSM on two particularly significant indexes which have shown to incorporate abrupt changes and oscillations; secondly, we aim at comparing the behaviour of the two indexes with each other.
Computations have been performed following the MSM approach described in the previous sections, namely exploiting the procedures illustrated in Section 2. Let us underline that, instead of a standard 3-state MSM, we use a 4-state MSM approach for both the S&P 500 and the DAX returns. Moreover, the analysis has been realized over different time intervals, focusing mainly on the period of the Global Financial Crisis.
6.1. The S&P 500 Case. Figure 1 reports the graph of the Standard & Poor's 500 index from 1st June 1994 to 27th May 2014, and it points out the dramatic collapse of index prices in the years 2008-2009, when the crisis blew up, causing the index to reach, on the 6th of March 2009 with 683.38 points, its lowest value since September 1996.
Because of the latter fact we decided to focus our analysis on recent years. In particular, we take into account data starting from the 1st of June 2007 and until the 27th of May 2014; therefore, denoting by $\Lambda$ the set of observations and by $X_t$, $t \in \Lambda$, the price data of the S&P 500, returns are calculated as usual by $y_t := (X_t - X_{t-1})/X_{t-1}$, $t \in \Lambda$, where $\{y_t\}_{t \in \Lambda}$ are the values for which we want to choose the best MSM. Note that in our implementation we grouped the daily data into weekly parcels in order to make the filter procedures less time-consuming and to have a clearer output; therefore we obtain a vector of 365 values, still denoted by $y_t$, as shown in Figure 2.
The next step consists in understanding whether the returns are serially correlated or serially uncorrelated, a task which can be accomplished by running some suitable test, for example, the Durbin-Watson test (see, for example, [19, 20] or [7]), or by computing directly the value of the autoregressive parameter $\phi$ by the least squares method, namely $\phi := (\sum_{t \in \Lambda} y_t y_{t+1}) / (\sum_{t \in \Lambda} y_t^2)$, which gives us a rather low value, that is, $-0.0697$, so that we can neglect the autoregressive pattern and start the analysis by considering the S&P 500 returns to be generated by a Gaussian distribution with switching mean and variance; that is,
$$
\begin{aligned}
& y_t = \mu_{S_t} + \epsilon_t, \quad t \in \Lambda,\\
& \{\epsilon_t\}_{t \in \Lambda} \ \text{are i.i.d.} \ \mathcal{N}(0, \sigma^2_{S_t}),\\
& \mu_{S_t} = \mu_1 S_{1,t} + \dots + \mu_4 S_{4,t}, \qquad \sigma_{S_t} = \sigma_1 S_{1,t} + \dots + \sigma_4 S_{4,t},\\
& S_t \in \{1, \dots, 4\} \ \text{with probabilities} \ p_{ij},
\end{aligned} \qquad (64)
$$
where for $(j, t) \in \{1, \dots, 4\} \times \Lambda$ we have $S_{j,t} = 1$ if $S_t = j$, otherwise $S_{j,t} = 0$. Therefore, we suppose that the state variable $S_t$, $t \in \Lambda$, takes its values in the set $\Omega := \{1, 2, 3, 4\}$, and we expect that the probabilities of being in the third and fourth states increase as a financial crisis occurs. Exploiting the procedure provided in Section 2.1 with respect to the returns $y_t$, $t \in \Lambda = \{1, \dots, 365\}$, we get the results shown in Figures 3 and 4.
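The preliminary computations of this subsection (weekly percentage returns and the least-squares autoregressive coefficient used to choose between the models of Sections 2.1 and 2.2) can be sketched as follows, where the array of weekly closing prices is an assumed input.

```python
# Returns y_t = (X_t - X_{t-1}) / X_{t-1} and least-squares AR(1) coefficient.
import numpy as np

def returns_and_ar_coefficient(prices):
    y = np.diff(prices) / prices[:-1]
    phi_hat = np.sum(y[:-1] * y[1:]) / np.sum(y**2)
    return y, phi_hat
```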
Figure 1: Daily observations of S&P 500 from 1994 to 2014.
Figure 2: Weekly returns of S&P 500 from 2007 to 2014.
Let us now consider the estimated standard deviation
$$\hat{\sigma}_t := E(\sigma_t \mid \psi_t) := \hat{\sigma}_1 P(S_t = 1 \mid \psi_t) + \dots + \hat{\sigma}_4 P(S_t = 4 \mid \psi_t), \quad t \in \Lambda, \qquad (65)$$
which we want to compare with the VIX index, also known as the Chicago Board Options Exchange (CBOE) market volatility index, namely one of the most relevant measures for the implied volatility of the S&P 500 index, whose values used in our analysis are reported in Figure 5.
What we obtain by plotting both the estimated volatility and the VIX values in the same graph can be seen in Figure 6, where the VIX trend is plotted in red, while we have used the blue color for the conditional standard deviation values. Note that, in order to have values of the same order, each $\hat{\sigma}_t$, $t \in \Lambda$, has been multiplied by a scale factor equal to 1000. We would like to point out how the estimated standard deviation accurately approximates the VIX behaviour, hence allowing us to count on an effective substitute for the volatility of the S&P 500, at least during a relatively nonchaotic period. In fact, we also underline that the greatest discrepancies between real and simulated values appear during the maximum intensity period of the recent financial crisis. In particular, the widest gaps are realized in correspondence with the recession experienced at the end of 2008.
In what follows we study how the latter evidence influences the global goodness of the realized analysis. In particular, we performed a goodness of fit analysis, computing the standardized residuals of the proposed MSM by $\hat{\epsilon}_t := (y_t - \hat{\mu}_t)/\hat{\sigma}_t$, $t \in \Lambda$, where $y_t$ is the observation of the S&P 500 return at time $t$, $\hat{\mu}_t$ is the estimated conditional mean, and $\hat{\sigma}_t$ is the estimated standard deviation. If the model is a good fit for the S&P 500 returns, the standardized residuals will be generated by a standard Gaussian distribution. In Figures 7 and 8 we report the histogram with its related graph and the so-called normal probability plot (NPP) of the standardized residuals.
Let us recall that the purpose of the NPP is to graphically assess whether the residuals could come from a normal distribution. Indeed, if such a hypothesis holds, then the NPP has to be linear; namely, the large majority of the computed values, that is, the blue points in Figure 8, should stay close to a particular line, which is the red dotted one in Figure 8. This is the case in our analysis, apart from the three points in the left-hand corner of the graph, which correspond to the minimal values of the vector of standardized residuals.
Applying two normality tests to $\{\hat{\epsilon}_t\}_{t \in \Lambda}$, that is, the Jarque-Bera test and the Lilliefors test (see, for example, [21, pag. 443]), we have that the null hypothesis of normality for the standardized residuals can be rejected at the 5% level, unless the previously pointed out outliers are removed. Indeed, if the two minimal standardized residuals, corresponding to $\hat{\epsilon}_{71} = -3.8441$ and $\hat{\epsilon}_{153} = -3.6469$, are cancelled out from the vector $\{\hat{\epsilon}_t\}_{t \in \Lambda}$, the previously cited tests indicate that the hypothesis of normality, at the same significance level of 5%, cannot be rejected.
Figure 3: Filtered probabilities $P(S_t = j \mid \psi_t)$ for $t \in \Lambda$, $j \in \{1, 2, 3, 4\}$.
Figure 5: CBOE volatility index (VIX), daily data from 1994 to 2014.
Figure 6: VIX index (red) versus estimated volatility (blue).
Figure 7: Plot and histogram of standardized residuals.
In particular, the Jarque-Bera statistic value is $\mathrm{JB} = 2.7858$, with corresponding $p$-value $p_{\mathrm{JB}} = 0.2153$, and the critical value for this test, that is, the maximum value of the JB statistic for which the null hypothesis cannot be rejected at the chosen significance level, is equal to $c_{\mathrm{JB}} = 5.8085$. Similarly, with regard to the Lilliefors test, the numerical value of the Lilliefors statistic, the $p$-value and the critical value are respectively given by $L = 0.0424$, $p_L = 0.1181$ and $c_L = 0.0472$.
In what follows we develop the forecast procedure shown in Section 5. Since we are dealing with weekly data, let us suppose we want to predict the volatility $\hat{\sigma}_t$, $t \in \Lambda$, over a time horizon of two months, hence 8 steps ahead; simulations have then been performed according to Remark 4, with $n = 15000$, $T = 365$, $m \in \{1, 2, \dots, 8\}$, and $x$ uniformly distributed in $\{-1, 1\}$. The obtained forecasting results are shown in Figure 9, where the plots refer to the observations from the 300th to the 373rd, with the last 8 simulated values within red rectangles.
6.2. The DAX Case. In what follows the proposed 4-state MSM is applied to analyse the Deutsche Aktien Index (DAX) stock index during a shorter time interval compared to the study made in Section 6.1: indeed, we take into account data from the 1st of January 2008 until the 1st of January 2011.
Figure 8: Normal probability plot of standardized residuals.
Figure 9: Forecast for returns (a) and conditional standard deviation (b).
Figure 10: Weekly returns of DAX from 1st January 2008 to 1st January 2011.
Such a choice is motivated by the will to concentrate our attention on the most unstable period, which for the DAX index starts with the abrupt fall experienced by the European markets at the beginning of 2008. We underline that, with respect to the considered time period, the estimated autoregressive coefficient for the DAX index is a bit higher, in absolute value, than the one obtained for the S&P 500 index, indeed $\hat{\phi}_{\mathrm{DAX}} = -0.1477$ versus $\hat{\phi}_{\mathrm{S\&P500}} = -0.0697$, so that we decided to fit the weekly DAX returns with an autoregressive pattern, using the procedure in Section 2.2 and according to the following MSM:
$$
\begin{aligned}
& y_t = \phi\, y_{t-1} + \epsilon_t, \quad t \in \Lambda,\\
& \{\epsilon_t\}_{t \in \Lambda} \ \text{are i.i.d.} \ \mathcal{N}(0, \sigma^2_{S_t}),\\
& \sigma_{S_t} = \sigma_1 S_{1,t} + \dots + \sigma_4 S_{4,t},\\
& S_t \in \{1, \dots, 4\} \ \text{with probabilities} \ p_{ij},
\end{aligned} \qquad (66)
$$
Figure 11: Filtered probabilities $P(S_t = j \mid \psi_t)$ for $t \in \Lambda$, $j \in \{1, 2, 3, 4\}$.