
Università di Pisa
Scuola Superiore di Studi Universitari e di Perfezionamento Sant'Anna

Master's Degree Programme in Economics

Master's Thesis

Assessing Identification Restrictions in Structural Vector Autoregressive Models: a Generalized, Data-Driven Approach

Supervisor
Prof. Alessio Moneta

Candidate
Corrado Calemma


Abstract

Shock identification in Vector Autoregressive (VAR) models has often forced researchers, in order to obtain a structural representation of the economic mechanisms they try to capture, to rely on assumptions derived mostly from economic theory, or built on an oversimplified description of the macroeconomy for the sheer sake of reducing the number of unknowns in underidentified systems of equations. Many of these assumptions cannot be easily tested jointly with the specification of the model, leaving a great deal of space to the discretion of researchers in the search for the perfect shock identification strategy. Recent developments in the VAR literature, drawing on the strong but extremely generic assumption of independent (and, in many cases, non-Gaussian) structural shocks, have demonstrated that structural shocks can be identified using only the distribution of the reduced-form errors, exploiting the information provided by its moments even beyond the variance-covariance matrix. This makes shock identification possible without assumptions of the traditional kind and, at the same time, offers a new way to evaluate – or even test – previous identification strategies. The primary question driving this work is how we can assess the plausibility of a priori shock identification assumptions (depending on the category they belong to) in the light of the results obtained with these new models. After a thorough review of the most appreciated techniques in the SVAR literature, we evaluate the possibilities (if any) of devising a plausibility test for each of them, using the simulated posterior distribution of model parameters obtained under the shock-independence assumption.

Keywords: Bayesian VAR, time-series analysis, shock identification. JEL Codes: C32, C52.


Contents

1 Introduction
2 The Vector Autoregressive Model
3 Identification Strategies
3.1 The recursive approach
3.2 Non-recursive identification strategies
3.3 Long-Run Restrictions
3.4 Identification by Sign Restrictions
4 Independent and non-Gaussian Shocks
4.1 Time-invariant parameter Structural BVAR
4.2 The TVP-SVAR model
5 A Testing Ground for Identification Restrictions
5.1 Recursive identifications
5.2 Fiscal Shocks VARs
5.3 Monetary Policy VARs
5.4 Long-run Restrictions
5.5 Sign Restrictions
6 Results
6.1 Oil Market SVAR
6.2 Fiscal Shocks and the Macroeconomy
6.3 Macroeconomic VECM models
6.4 Monetary policy models
6.5 Sign restrictions
7 Conclusions
A Time Series

List of Figures

1 Kilian (2009) time series
2 Blanchard and Perotti (2002) time series
3 King et al. (1991) time series
4 Bernanke and Mihov (1998) time series
5 Gambetti and Musso (2017) time series

1 Introduction

The problematic nature of shock identification in Vector Autoregressive models seems to have afflicted every implementation of this technique since its introduction in the macroeconomic literature (Sims 1980, pp. 15, 21) as a viable and more reliable alternative to large-scale simultaneous equations models (Kilian 2013). Estimating the reduced-form parameters of a system in which all endogenous variables are treated simultaneously leaves the door open to infinitely many possible decompositions of the error terms (called innovations precisely because they are new, in the sense that they cannot be predicted from past values of the variables in the system) into uncorrelated structural shocks.

Selecting the right structural form of the model always requires a set of assumptions that constrain the behavior of structural and reduced-form shocks alike. These can be derived from (and thus justified by) prior knowledge of the institutional structure of the economy, or just as well from economic theory. Other strategies revolve around the idea of selecting the right error variance decomposition on the basis of the model responses it produces.

Since the very first years of this strand of empirical research, it was already clear that a general overreliance on spurious assumptions to identify economic shocks made the acceptability of these econometric exercises quite sensitive to changes in the dominant framework through which scholars view the macroeconomy. Cooley and Leroy (1985), for instance, recognized these vulnerabilities but rebuked all attempts to identify VAR shocks without relying on theory, since the restrictions those attempts ended up using to restrict the structural parameter space were unjustified or had little to no economic sense, invalidating any causal interpretation of the representations they put forward.

In the light of these premises, restrictions derived from a priori knowledge still seem to be the only way a structural representation of VAR models can be attained. Of course, there is a trade-off between the effectiveness with which a given identification strategy selects a set of fitting vectors of structural parameters and the lightness (or vagueness) of the assumptions on which that same strategy relies: the more specific, strong and articulate the assumptions, the smaller the set of complying parameter vectors. This compromise would capture quite efficiently the state of the VAR literature, were it not for some recent developments that promise unique identification of the structural shocks while relying on relatively innocuous assumptions.

While previous specifications did not put any constraint on the behavior of structural shocks apart from orthogonality – and, in some cases, normality, as in Primiceri (2005) and Del Negro and Primiceri (2015) – the pioneering work of Lanne, Meitz, and Saikkonen (2017) demonstrated that assuming structural shocks to be independent and non-Gaussian allows for unique identification (up to permutation and scaling of the shocks) by exploiting statistical properties of the error terms, without the help of any other theory-driven assumption. This intuition has been elaborated in a number of ways: it has served as the basis for efficient maximum likelihood SVAR estimation and identification – as in the aforementioned Lanne, Meitz, and Saikkonen (ibid.) – but also as the foundation of Bayesian approaches realized by means of Gibbs sampling algorithms (Lanne and Luoto 2016). Some variants (which deserve praise for their flexibility) are even able to relax the non-Gaussianity assumption on structural shocks (Kocięcki 2018).

These methods remain, however, unable to offer by themselves a structural representation of the causal links among aggregate economic variables: although the problem of error-variance decomposition appears to be solved, the model does not reveal anything concerning the economic nature of the orthogonal shocks into which the reduced-form residuals have been decomposed. As we look at estimation results, those shocks bear no self-evident economic meaning unless their behavior matches some external description. Nevertheless, we still see a reversal of roles between theory-driven a priori assumptions and structural shock identification. While the former are still needed to put labels on the structural shocks, the aforementioned developments in the latter offer us precious grounds against which we can appraise those same assumptions' plausibility. Where previous approaches depended on those constraints to generate structural impulse responses, these new methods offer us the possibility to evaluate just how close those assumptions are to the results provided by the estimation and identification algorithms.

This work starts with the ambition to investigate thoroughly the ways in which the plausibility of classic shock-identifying assumptions can be appraised, or even tested, in the light of the posterior parameter distribution obtained without them.

The first part of the work introduces the Vector Autoregressive model, with a special emphasis on the different meanings that can be given to shocks in this context. Then, after a general discussion of the theme of shock identification, the remaining space is devoted to an exhaustive overview of the most popular identification strategies.

The second part is entirely dedicated to recent developments in the statistical identification of structural shocks in VAR models, from the basic assumptions that undergird the approach of Lanne, Meitz, and Saikkonen (2017) to the more sophisticated Bayesian ramifications of this strand of research. Since these variants of the model will serve as the basis for our attempt to test the constraints around which all the identification approaches described in the first section revolve, we dedicate a copious amount of space to the description of the Gibbs sampling algorithms that allow us to simulate the posterior distribution of the structural model's parameters.

On these premises, we discuss the possibility of developing, for each of the conventional shock identification tactics, a testing framework allowing us to accept or reject its founding hypotheses on the grounds of the posterior distribution of structural parameters obtained through the approaches described in the second part. Every testing approach is applied hands-on to actual assumptions found in works of the applied literature.

The last section is dedicated to the conclusions that can be drawn about the potential of statistical identification beyond simple error variance decomposition, and to the further developments our investigation points to.

2 The Vector Autoregressive Model

The deep reasons for the general success that the VAR specification has enjoyed over the last thirty years in the business cycle literature can be traced back to Sims (1980), which synthesized the critiques surrounding what was then the set of common identification strategies of large-scale models of the likes of the FRB-MIT model, described in Rasche and Shapiro (1968). It is useful, for the following parts of this work as well, to provide some more context about what terms like identification and structure meant and still mean.

The former, as a concept, stems from the awareness that the observable patterns of behavior of a given set of economic variables may well leave a considerable amount of space to indeterminacy concerning the parameters of the model that is supposed to produce the economic dynamics captured by those same variables. In this context, we can define a model parametrization as identified if and only if "different points in the model's parameter space imply observationally different patterns of behavior for the model's variables" (Sims 1980). This requires us to explain a (not so) secondary concept, normalization. We normalize a parametrization when we transform the parameter space so as to map all points that imply equivalent behavior into the same point of the new parameter space.

The latter is often deemed crucial in forecasting and policy analysis. The term structure is used by Sims (ibid., p. 12) to indicate, within a model, everything that is robust to policy shocks. More precisely, his understanding is derived from Hurwicz (1966, p. 38) and Koopmans and Bausch (1959): in order for the causal properties of a given model to be meaningful, we need the set of admissible vectors of model parameters to be reduced (in light of our data, along with what could be called, although somewhat misleadingly, a set of a priori assumptions) to a singleton.

The critique formalized by Sims turns on the methodological soundness of the identification strategies that were then used to establish a connection between large-scale models and macroeconomic reality. Firstly, these strategies relied heavily on implausible (even from an empirical point of view) exogeneity assumptions concerning variables determined in systems of equations that are supposed to be simultaneous. Secondly, the reliance on serial uncorrelation of error terms in the dynamic specification of the model bore no economic sense (Kilian 2013).

The alternative to these traditional large-scale models, as proposed in Sims (1980), has slowly become one of the most popular econometric analysis tools when it comes to dealing with time series data (Kilian and Lütkepohl 2017, p. xvii). Since there was a pressing need to do away with specifications and identification techniques that dealt with one equation at a time, in favor of models that treat all variables as endogenous, the Vector Autoregression became the model of choice for many scholars engaged in forecasting.

The basic approach was a springboard for a wealth of later developments, which were not incorporated into the literature in an orderly manner. By virtue of its extraordinary versatility (in terms of the variety of identification techniques it is able to work with), the model has been the object of extensive exploration over the years.

Everything revolves around a basic model. Let us imagine an m-dimensional time series. Each variable is allowed to be influenced by its own lagged values, by the lagged values of all other variables (up to lag p), and – obviously – by an error term.

We can summarize these assumptions into the following equations:
$$\begin{aligned}
y_{1,t} &= a_{11,1}\, y_{1,t-1} + a_{12,1}\, y_{2,t-1} + \cdots + a_{1m,1}\, y_{m,t-1} + \cdots + a_{1m,p}\, y_{m,t-p} + u_{1,t} \\
y_{2,t} &= a_{21,1}\, y_{1,t-1} + a_{22,1}\, y_{2,t-1} + \cdots + a_{2m,1}\, y_{m,t-1} + \cdots + a_{2m,p}\, y_{m,t-p} + u_{2,t} \\
&\;\;\vdots \\
y_{m,t} &= a_{m1,1}\, y_{1,t-1} + a_{m2,1}\, y_{2,t-1} + \cdots + a_{mm,1}\, y_{m,t-1} + \cdots + a_{mm,p}\, y_{m,t-p} + u_{m,t}.
\end{aligned} \tag{2.0.1}$$

These equations can be efficiently condensed into a much more compact matrix form:
$$y_t = A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + u_t. \tag{2.0.2}$$

The linearly unpredictable components, $u_t$, are allowed to be correlated. In the jargon of simultaneous equation models, this last equation is the so-called reduced form, presenting the vector of variables at a given instant in time as a function of its own lagged values only. The applications of this specific variant are numerous. However, for policy research purposes, we must focus on the so-called structural form. We then move from the last equation to

$$B_0 y_t = B_1 y_{t-1} + B_2 y_{t-2} + \cdots + B_p y_{t-p} + \varepsilon_t. \tag{2.0.3}$$

We still see the vector of observed time series data, along with a collection of p matrices of regression slope coefficients. There is, however, one more matrix, $B_0$, capturing all instantaneous relations among the variables the model is built upon. The vector $\varepsilon_t$ is made up of m structural shocks; they are serially uncorrelated and have a non-singular, diagonal variance-covariance matrix. The inverse of $B_0$ is supposed to capture all the effects, on impact, of the structural shocks on the model's variables. Owing to the diagonality of the variance-covariance matrix of $\varepsilon_t$, the structural shocks are postulated to be mutually uncorrelated and to have a precise economic interpretation. Identification in structural-form vector autoregressions essentially consists in recovering $\{B_i\}_{i=0}^{p}$ from the reduced-form representation.
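Before any identification question arises, the reduced-form coefficients and the residual covariance matrix have to be estimated. The following is a minimal sketch of OLS estimation of equation 2.0.2 in Python; the function names and the toy data-generating process are illustrative, not taken from the works discussed here:

```python
import numpy as np

def estimate_var(y, p):
    """OLS estimation of a reduced-form VAR(p) without intercept.

    y : (T, m) array of observations.
    Returns the stacked slope matrices A (m, m*p) and the residual
    covariance Sigma_u (m, m)."""
    T, m = y.shape
    # Regressor matrix [y_{t-1}, ..., y_{t-p}] aligned with Y = y_t
    X = np.hstack([y[p - i - 1:T - i - 1] for i in range(p)])
    Y = y[p:]
    A = np.linalg.lstsq(X, Y, rcond=None)[0].T        # (m, m*p)
    U = Y - X @ A.T                                   # reduced-form residuals
    Sigma_u = U.T @ U / (U.shape[0] - m * p)          # df-corrected covariance
    return A, Sigma_u

# Toy usage with a stable bivariate VAR(1)
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.1], [0.0, 0.4]])
y = np.zeros((500, 2))
for t in range(1, 500):
    y[t] = A_true @ y[t - 1] + rng.standard_normal(2)
A_hat, Sigma_hat = estimate_var(y, p=1)
```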

The use of Vector Autoregressions can be justified in a number of ways. Following Canova (2007), we make use of the Wold representation theorem. This line of reasoning will then be retraced backwards in order to provide a smoother elaboration on the subject.

According to the Wold theorem, every vector stochastic process can be broken down into two components, one linearly predictable and one linearly unpredictable. Defining $\mathcal{F}_t$ as the information available at time t, we assume that
$$\mathcal{F}_t = \mathcal{F}_{t-1} \oplus \mathcal{E}_t, \tag{2.0.4}$$
with $\mathcal{E}_t$ as the news that becomes available at time t. Consequently, we have that
$$\mathcal{F}_t = \left\{ y_{t-1}^* + e_t : \; y_{t-1}^* \in \mathcal{F}_{t-1}, \; e_t \in \mathcal{E}_t \right\}. \tag{2.0.5}$$
We have that $\mathcal{E}_t \perp \mathcal{F}_{t-1}$. By iterated substitution, we have that
$$\mathcal{F}_t = \mathcal{F}_{-\infty} \oplus \bigoplus_{j=0}^{\infty} \mathcal{E}_{t-j}. \tag{2.0.6}$$
Orthogonality allows us to write
$$y_t^* = E\left[ y_t^* \mid \mathcal{F}_t \right] = E\left[ y_t^* \,\Big|\, \mathcal{F}_{-\infty} \oplus \bigoplus_{j=0}^{\infty} \mathcal{E}_{t-j} \right] = E\left[ y_t^* \mid \mathcal{F}_{-\infty} \right] + \sum_{j=0}^{\infty} E\left[ y_t^* \mid \mathcal{E}_{t-j} \right]. \tag{2.0.7}$$

We now assume the linearity of each conditional expectation as a projection, and the consistency through time of the projection coefficients. We thus have:
$$y_t^* = a\, y_{-\infty}^* + \sum_{j=0}^{\infty} D_j e_{t-j}. \tag{2.0.8}$$

For the theorem to hold it is sufficient for every new piece of information to be orthogonal to the existing ones. After demeaning, with $y_t = y_t^* - a\, y_{-\infty}^*$, we can express $y_t$ under (one of) its (many) MA representation(s):
$$y_t = \sum_{j=0}^{\infty} D_j e_{t-j} = D(L)\, e_t. \tag{2.0.9}$$

The multiplicity of MA representations stems from the fact that, taking an arbitrary matrix $H(L)$ such that $H(L)\, H(L^{-1})' = I$ and $H(z)$ has no roots inside the unit circle, the representation
$$y_t = \tilde{D}(L)\, \tilde{e}_t = \left[ D(L)\, H(L) \right] \left[ H(L^{-1})'\, e_t \right] \tag{2.0.10}$$
is equally valid. Among these possible alternatives, it is common practice to choose the one where $D(z)$ has roots outside the unit circle (the so-called fundamental MA representation). If this condition is met, we can invert the polynomial and obtain

$$A_0 - A(L) = D(L)^{-1}, \tag{2.0.11}$$
and, setting $A_0 = I$, we obtain
$$y_t = A(L)\, y_{t-1} + e_t. \tag{2.0.12}$$

On the other hand, since the $e_t$'s are allowed to be mutually correlated, we cannot attach an economic meaning to these innovation terms. In order to obtain mutually uncorrelated terms, we need the shocks to be orthogonal. Defining $\Sigma_e$ as the variance-covariance matrix of the shocks, we need to decompose it into
$$\Sigma_e = P V P', \tag{2.0.13}$$

where $V$ is a diagonal matrix. The representation we want is then obtained as
$$y_t = \tilde{D}(L)\, \tilde{e}_t, \tag{2.0.14}$$
where $\tilde{D}(L) = D(L)\, P V^{1/2}$ and $\tilde{e}_t = V^{-1/2} P^{-1} e_t$. Proceeding backwards, this moving average representation is the same one we obtain when we recursively substitute the observable terms in equation 2.0.3. The terms $\varepsilon_t$ are exactly the terms $\tilde{e}_t$ that we want to isolate.

Once we have obtained them, we are able to explore the structural properties of the model. As already noted in Kilian and Lütkepohl (2017, p. xvii), orthogonal, economically meaningful shocks do not necessarily have to be identified if our goals are not extremely ambitious – for instance, evaluating the speed with which the model variables revert back to equilibrium once a random disturbance has been introduced. These shocks are not strictly necessary even if we approach a VAR model for forecasting purposes. They are needed only when we want to dig deeper into the model, for example when we want to assess its response to a given shock with a known origin. Since we assume the model coefficients to stay constant through time, equation 2.0.14 allows us to obtain every slope coefficient connecting $y_t$ with any past random disturbance. It is useful to introduce

the so-called companion form, revolving around the $mp \times mp$ matrix
$$F = \begin{bmatrix}
A_1 & A_2 & \cdots & A_{p-1} & A_p \\
I_m & 0 & \cdots & 0 & 0 \\
0 & I_m & \ddots & \vdots & \vdots \\
\vdots & & \ddots & 0 & 0 \\
0 & \cdots & 0 & I_m & 0
\end{bmatrix}, \tag{2.0.15}$$

where the coefficients in the first m rows are exactly the slope coefficients of equation 2.0.2. We have that
$$\frac{\partial y_{i,t+h}}{\partial \varepsilon_{j,t}} = \left[ \tilde{D}_h \right]_{i,j} = \left[ S_{m \times m}\!\left( F^h \right) B_0^{-1} \right]_{i,j}, \tag{2.0.16}$$
where $S_{m \times m}$ selects the first m rows and columns of its argument.
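A minimal sketch of how equation 2.0.16 translates into code: powers of the companion matrix F deliver the reduced-form moving-average coefficients, which are then multiplied by a candidate impact matrix. The function names are ours and the impact matrix is taken as given:

```python
import numpy as np

def companion(A_list):
    """Stack VAR slope matrices A_1..A_p into the (mp x mp) companion matrix F."""
    p = len(A_list)
    m = A_list[0].shape[0]
    F = np.zeros((m * p, m * p))
    F[:m, :] = np.hstack(A_list)
    F[m:, :-m] = np.eye(m * (p - 1))   # identity blocks below the first row
    return F

def structural_irf(A_list, B0inv, horizon):
    """IRFs d y_{t+h} / d eps_t = S(F^h) B0inv for h = 0..horizon."""
    m = A_list[0].shape[0]
    F = companion(A_list)
    Fh = np.eye(F.shape[0])
    irfs = []
    for _ in range(horizon + 1):
        irfs.append(Fh[:m, :m] @ B0inv)  # select the first m rows and columns
        Fh = Fh @ F
    return np.array(irfs)                # shape (horizon + 1, m, m)
```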

The only information our data provide about $B_0^{-1}$ is the variance-covariance (VCV) matrix of the innovation terms, which can be estimated as the sample VCV matrix of the regression residuals. As already stated in 2.0.13, we need to decompose the VCV matrix in order to obtain an impact matrix. What could be considered a problem for applied researchers could also be considered one of the approach's main strengths: the decompositions of $\Sigma_\varepsilon$ are potentially infinite. In order to obtain a vector of shocks to which we can attach some economic meaning, the assumptions driving the decomposition must be derived from prior or external knowledge concerning the system of equations upon which the model is built.

The following pages are devoted to a quick review of the most successful identification strategies employed in the Structural VAR literature to obtain structural shocks from which it is possible to derive the models' impulse response functions.

3 Identification Strategies

We have already described in the previous pages the primary assumptions and properties that underlie vector autoregressive model specifications. However, for this econometric approach to be complete, a set of sound shock identification strategies is a must-have.

We start by imagining a structural VAR data generating process:
$$B_0 y_t = B_1 y_{t-1} + \cdots + B_p y_{t-p} + \varepsilon_t, \tag{3.0.1}$$
where $y_t$ is the usual $m \times 1$ vector of variables and $B_i$ (with $i = 1, \cdots, p$) is the equally usual $m \times m$ matrix of coefficients. Just as usual, the term $\varepsilon_t$ is a vector of uncorrelated, unconditionally homoskedastic disturbances or innovations. By virtue of these assumptions we can state
$$E\left( \varepsilon_t\, \varepsilon_{t-i}' \right) = 0 \quad \forall\, i \geq 1, \tag{3.0.2}$$
$$E\left( \varepsilon_t\, \varepsilon_t' \right) = \Sigma_\varepsilon, \tag{3.0.3}$$
with $\Sigma_\varepsilon$ diagonal. Without any loss of generality – as long as no constraints are imposed on the diagonal elements of $B_0$ – we can force the variance of every single component of $\varepsilon_t$ to be unitary, so that we have
$$E\left( \varepsilon_t\, \varepsilon_t' \right) = \Sigma_\varepsilon = I_m. \tag{3.0.4}$$

As Kilian and Lütkepohl (2017, p. 217) remark, these shocks are not associated with particular variables within the model, so it is (quite understandably) impossible to give them a unit of measurement. Nevertheless, for the model to be legitimately considered a Structural VAR, these same shocks need an economic interpretation.

From equation 3.0.1 we can easily obtain the reduced-form representation of the data generating process: multiplying each side by $B_0^{-1}$ we get
$$y_t = B_0^{-1} B_1 y_{t-1} + \cdots + B_0^{-1} B_p y_{t-p} + B_0^{-1} \varepsilon_t = A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t, \tag{3.0.5}$$
with $\{A_i\}_{i=1}^{p} = \{B_0^{-1} B_i\}_{i=1}^{p}$ and $u_t = B_0^{-1} \varepsilon_t$ (that is, the reduced-form error terms are nothing but a linear combination of the structural innovations).

The main question that arises here is: how can we find the components of $B_0$ (or, equivalently, its inverse) – thus obtaining a sound structural representation of a given data-generating process – when all we have is a consistent estimate of the reduced-form specification? The main source of information is usually the estimate of the variance-covariance (VCV) matrix of the reduced-form error term, $\Sigma_u$. Since $\varepsilon_t$ has, by construction, an identity matrix as its VCV matrix, we can easily show that
$$\Sigma_u = B_0^{-1} \left( B_0^{-1} \right)'. \tag{3.0.6}$$
We could imagine this as a system of nonlinear equations with $m^2$ unknowns. However, since $\Sigma_u$, like any VCV matrix, is symmetric, it provides us with only $m(m+1)/2$ independent equations. For $m = 3$, for instance, we have six independent equations against nine unknowns, so at least three restrictions are needed. This system can be solved numerically, provided the number of unknowns does not exceed the number of independent equations. This precise passage calls for the introduction of a number of restrictions to be imposed on specific elements of $B_0^{-1}$. Restrictions on $B_0$ or its inverse take the name of short-term restrictions from the fact that they are imposed on the matrix that governs the contemporaneous relations among the different variables of the model, but also because its inverse, $B_0^{-1}$ (the impact matrix), is the matrix that determines the immediate responses of the model to a given structural shock. They can take different forms.

Among the typologies of restrictions that have enjoyed the largest amount of consideration in the Structural VAR literature since Sims (1980), one that surely distinguishes itself for its simplicity and easy interpretability is the so-called exclusion restriction, i.e. the type of restriction that forces a certain element of $B_0^{-1}$ to be equal to zero (thereby excluding a given structural disturbance from the linear combination that generates a given reduced-form error).

Beyond that, another extremely functional kind of restriction is the proportionality restriction: this time, the number of unknowns is reduced not by forcing some of them to take on a known value, but by linking two or more of them by means of a linear equation.

To complete the list of the simplest and most popular restrictions, there is one that can be viewed as a generalization of the first kind: while exclusion restrictions force some elements of the impact matrix to be equal to 0, equality restrictions generalize this constraint by forcing a given element of $B_0^{-1}$ to take on a precise value (not necessarily 0).
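To make the counting argument concrete, the following sketch treats 3.0.6 as a nonlinear system and solves it numerically for a hypothetical three-variable model with three exclusion restrictions (a lower-triangular pattern, chosen purely for illustration):

```python
import numpy as np
from scipy.optimize import least_squares

m = 3
# Hypothetical restriction pattern: True marks a free element of B0^{-1};
# the three False entries are exclusion restrictions (forced to zero).
free = np.array([[True,  False, False],
                 [True,  True,  False],
                 [True,  True,  True]])

def residuals(theta, Sigma_u):
    B0inv = np.zeros((m, m))
    B0inv[free] = theta
    M = B0inv @ B0inv.T - Sigma_u
    return M[np.triu_indices(m)]        # m(m+1)/2 independent equations

# Toy target built from a known lower-triangular impact matrix
B_true = np.array([[1.0, 0.0, 0.0],
                   [0.5, 0.8, 0.0],
                   [-0.3, 0.2, 0.6]])
Sigma_u = B_true @ B_true.T
sol = least_squares(residuals, x0=np.ones(free.sum()), args=(Sigma_u,))
# Six free parameters against six independent equations: just-identified.
```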

It is extremely important to bear in mind that the only criterion guiding the choice among different types of restrictions is the level of theoretical justifiability of the shock identification they generate. We can formulate the concept in other terms by stating that the identification strategy needs to be a function of the shocks the researcher has in mind and, of course, of the way their effects propagate through time within the model.

One of the most straightforward identification strategies (so straightforward that it enjoys much more consideration within econometrics textbooks, for didactic purposes, than in actual research literature) is built on a recursive causal architecture that deserves some explanation in the following lines.

3.1 The recursive approach

As Kilian and Lütkepohl (2017, p. 220) remark, this approach is not built on insights that can be learned from the data about the nature of the causal relationships that need to be identified. The identification is rather built upon the a priori imposition of a whole causal chain with a rigid, recursive causation order. This extremely cumbersome assumption justifies the use of the Cholesky decomposition technique to recover $B_0^{-1}$, which is then, by construction, a lower triangular matrix with only positive elements on its diagonal.
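In code, the recursive strategy is essentially a one-liner: under a recursive ordering, the Cholesky factor of $\Sigma_u$ is exactly the lower-triangular impact matrix with positive diagonal. A minimal sketch with a toy covariance matrix:

```python
import numpy as np

# Sigma_u estimated from reduced-form residuals (here a toy value)
Sigma_u = np.array([[1.0, 0.3, 0.2],
                    [0.3, 1.5, 0.4],
                    [0.2, 0.4, 2.0]])

B0inv = np.linalg.cholesky(Sigma_u)     # lower triangular, positive diagonal
assert np.allclose(B0inv @ B0inv.T, Sigma_u)

# Column j of B0inv is the impact response to the j-th recursive shock;
# reordering the variables changes which shocks are identified.
```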

The incredibly strict nature of the assumptions that underpin this identification strategy inspired a strand of critical contributions to the literature that take explicit aim at an approach that seems (because of its simplicity) to be built on a (quite blind, or even quite misled) confidence in the data's ability to speak for themselves, but in reality relies on a set of assumptions that are extremely difficult to justify within real-world applications (Cooley and Leroy 1985).

This approach is problematic for a number of reasons. Kilian and Lütkepohl (2017, p. 220) identify three of them. Firstly, the credibility of an approach that imposes a recursive causal architecture without any clear order of variables in mind is compromised from the start. Secondly, the number of possible orderings grows with the factorial of the number of variables (namely, the number of all the permutations the model's variables can be arranged in). And thirdly, even if the uncertainty of the second point can be overcome (e.g. all variable permutations offer the same impulse responses), the result we obtain does not amount to proving that every identification strategy is bound to bring the same results. What it does prove is merely that all recursive identifications provide the same answers. There is no factual evidence suggesting that the model should even be recursive in the first place. That being said, while we have to admit that the debate among researchers was not kind at all towards recursive approaches, we can also point at some works showing that this kind of strategy can actually be used in some very special cases. For instance, the aforementioned Cooley and Leroy (1985), along with Bernanke and Blinder (1992), showed that Cholesky decompositions can (partially) identify even models whose structure departs from pure causal recursion. The former proved that structural responses to a given variable can be easily obtained by placing this variable first, while the latter proved that if one shock contemporaneously affects one and only one variable, structural responses to this shock can be obtained by placing said variable last in the ordering for the Cholesky decomposition (Keating 1996).

Moreover, recent literature shows that Cholesky approaches still enjoy some appreciation. For instance, this error variance decomposition technique was used by Kilian (2009). That work is built on a three-variable, 24-lag VAR model that is supposed to offer a hyper-simplified description of the global crude oil market by using (in this precise order) monthly time series of the percentage change in the world's crude oil output (∆prod), the level of real global economic activity as measured by a business cycle index (rea), and the level of the real crude oil price (rpo).

The structural representation (using the author's original notation) is
$$A_0 y_t = c + \sum_{i=1}^{24} A_i y_{t-i} + \varepsilon_t, \tag{3.1.1}$$
where $\varepsilon_t$, again, is a vector of mutually uncorrelated disturbances. Shock-identifying assumptions are derived, as in many other works that do not necessarily rely on this kind of shock identification framework, from institutional knowledge, i.e. information about the institutional structure of the economy. For the purposes of clearer explanation, it comes in handy to write down the impact matrix $A_0^{-1}$ to better frame the assumptions that are imposed on it: given the reduced-form representation
$$y_t = A_0^{-1} c + \sum_{i=1}^{24} A_0^{-1} A_i y_{t-i} + A_0^{-1} \varepsilon_t \equiv \mu + \sum_{i=1}^{24} \Theta_i y_{t-i} + \upsilon_t, \tag{3.1.2}$$
we impose
$$\begin{bmatrix} \upsilon_t^{\Delta prod} \\ \upsilon_t^{rea} \\ \upsilon_t^{rpo} \end{bmatrix} \equiv \upsilon_t = A_0^{-1} \varepsilon_t \equiv \begin{bmatrix} a_{11} & 0 & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} \varepsilon_t^{os} \\ \varepsilon_t^{ad} \\ \varepsilon_t^{osd} \end{bmatrix}, \tag{3.1.3}$$

meaning that reduced-form shocks to each of the three variables in the model are decomposed into three structural shocks: to oil supply ($\varepsilon_t^{os}$), to aggregate demand ($\varepsilon_t^{ad}$), and to oil-specific demand ($\varepsilon_t^{osd}$), respectively.

In this case, the institutional knowledge inspiring this restriction concerns primarily the ability (or better yet, the inability) of suppliers to respond to demand shocks in the short run (or at any rate not within the same period in which these shocks are unleashed). Disturbances to oil production ($\upsilon_t^{\Delta prod}$), on impact, cannot come from demand shocks, thus excluding $\varepsilon_t^{ad}$ and $\varepsilon_t^{osd}$ from the process that generates them (by imposing the two zeroes on the first row of $A_0^{-1}$). This can be formalized in the shape of a vertical or near-vertical short-run oil supply curve and, as a consequence, links innovations in oil production to one and only one shock ($\varepsilon_t^{os}$), which is consequently interpretable as an oil supply shock. As for disturbances to real economic activity, everything that cannot be explained by oil supply shocks is linked to a shock to the global demand for industrial commodities, or aggregate demand shock ($\varepsilon_t^{ad}$), excluding any other possible shock from the disturbance-generating process (with the zero on the last element of the second row of $A_0^{-1}$). Looking at disturbances to real oil prices, no exclusion restrictions are imposed, giving the last structural shock the interpretation of a shock to oil-specific demand (namely, everything in $\upsilon_t^{rpo}$ that cannot be explained by structural shocks to oil supply or aggregate demand).

The recursive structure imposed on structural shocks, in spite of its simplicity, is powerful enough to decompose reduced-form shocks in an economically justifiable way. The reduced number of variables in the model, however, points again to the fact that the higher m, the harder it is to find a justifiable recursive identification framework. For this and many other reasons, the huge majority of recent works in the SVAR literature has ended up finding other kinds of shock-identification assumptions much more appealing. The following pages are dedicated to providing a couple of examples of the identification strategies behind some of the most cited works in the field.

3.2 Non-recursive identification strategies

As Kilian and Lütkepohl (2017) remark, the sources of information (or better yet, the kinds of knowledge) from which identification restrictions can be derived are numerous. Beyond simple knowledge about the institutional structure of given markets (as shown earlier), one can impose constraints on the basis of assumptions about market structures, by assuming for instance that certain feedback mechanisms are negligible in their impact size or even nonexistent altogether. In addition to that, there are information delays or even physical constraints in production chains (rendering any immediate feedback to a given shock impossible). However, one additional approach (which could be regarded as one of the boldest) is to build the impact matrix in such a way that its elements can be viewed as elasticities, and then resort to external data in order to estimate enough of them to bring the model to just-identification, or even beyond. In addition to constraining the impact matrix elements on the basis of institutional knowledge, this is exactly the approach at the basis of the strategy adopted in Blanchard and Perotti (2002), to which we intend to dedicate some space.

The Blanchard & Perotti approach. With their paper, Blanchard and Perotti (ibid.) intended to contribute to the debate on the effects of fiscal policy shocks by specifying a three-variable, four-lag VAR model. The variables included are, in that order, the logarithms of real quarterly US taxes per capita, real quarterly public spending per capita, and real gross domestic product per capita. The quarterly frequency of the data, by the authors' own admission, is optimal for the purposes of shock identification, which is achieved by recovering the structure of fiscal policy's automatic response to economic activity (ibid., p. 1330). This is done by taking advantage of information about the timing of tax collection, and of institutional knowledge of the tax and transfer system derived from other data.

Specifying the model as
$$y_t = A(L, q)\, y_{t-1} + u_t, \tag{3.2.1}$$
with $y_t$ as the vector of the three aforementioned variables and $A(L, q)$ as a four-quarter ($q = 4$) distributed lag polynomial, we need to find the linear relationship linking the reduced-form residuals to the three structural shocks (called $\varepsilon_t$ from here on). While in the previous pages we discussed only the possibility of imposing restrictions on individual elements of the impact matrix, the authors generalize the approach by posing
$$C u_t = B \varepsilon_t, \tag{3.2.2}$$
with C and B (partially) unknown. This time, we do not force the structural disturbances to have unit variance.

Restrictions are introduced in an equation-by-equation fashion. In partial contrast with what is generally frowned upon by Kilian and Lütkepohl (2017, p. 217), each of the structural shocks is indeed associated with one of the variables. The first reduced-form residual, affecting taxes (i.e. unexpected movements of real taxation per capita within a given quarter), is assumed to depend on three causes: unexpected movements in GDP per capita (the third reduced-form residual), structural shocks to spending, and (with unit coefficient) structural shocks to taxes, according to the equation
$$u_t^{t} = \tau_1 u_t^{x} + \tau_2 \varepsilon_t^{g} + \varepsilon_t^{t}. \tag{3.2.3}$$

The second component of $u_t$ (spending per capita) is assumed to depend, again, on unexpected movements in GDP per capita and, similarly to $u_t^{t}$, on structural shocks to taxes and (with unit coefficient) structural shocks to spending:
$$u_t^{g} = \gamma_1 u_t^{x} + \gamma_2 \varepsilon_t^{t} + \varepsilon_t^{g}. \tag{3.2.4}$$

The third and last component (GDP residuals) is this time dependent on unexpected movements of both taxes and spending. The only structural shock in the equation is the only one left out of 3.2.3 and 3.2.4, and is assumed to have a unit coefficient, thus gaining the interpretation of a structural shock to real GDP per capita:
$$u_t^{x} = \xi_1 u_t^{t} + \xi_2 u_t^{g} + \varepsilon_t^{x}. \tag{3.2.5}$$
These assumptions can be translated into a matrix representation, following 3.2.2:
$$\begin{bmatrix} 1 & 0 & -\tau_1 \\ 0 & 1 & -\gamma_1 \\ -\xi_1 & -\xi_2 & 1 \end{bmatrix} \begin{bmatrix} u_t^{t} \\ u_t^{g} \\ u_t^{x} \end{bmatrix} = \begin{bmatrix} 1 & \tau_2 & 0 \\ \gamma_2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \varepsilon_t^{t} \\ \varepsilon_t^{g} \\ \varepsilon_t^{x} \end{bmatrix}. \tag{3.2.6}$$

The authors turn first to $\tau_1$ and $\gamma_1$. These denote, respectively, the automatic effects of output on government revenues and the discretionary measures that policymakers can take in response to a given output shock within the time window delimited by the data frequency. With quarterly data, this second phenomenon is virtually nonexistent, so it is assumed that $\gamma_1 = 0$. The coefficient $\tau_1$ is instead viewed as the elasticity of net taxes to output, and is then calculated using external data. The average value for the sample spanning from 1947Q1 to 1997Q4 is 2.08. Next, we can build
$$u_t^{t\prime} = u_t^{t} - \tau_1 u_t^{x}, \tag{3.2.7}$$
$$u_t^{g\prime} = u_t^{g} - \gamma_1 u_t^{x} = u_t^{g}. \tag{3.2.8}$$
Since they are no longer correlated with $\varepsilon_t^{x}$, they are used as instruments to estimate $\xi_1$ and $\xi_2$ as regression coefficients of $u_t^{x}$ on $u_t^{t}$ and $u_t^{g}$.

Since the authors find no credible identification of the hierarchy between shocks to taxes and shocks to spending, they present two alternative frameworks: in the first, $\tau_2$ is assumed to be equal to zero, after which $\gamma_2$ can be easily estimated by regressing the second element of $C u_t$ on the first, using the notation of 3.2.2. In the other setting, $\gamma_2$ is assumed to be equal to zero and $\tau_2$ is then estimated by regressing the first element of $C u_t$ on the second, thus completing the identification of structural shocks. It is useful to remark that this approach does not pose any constraint on the variance of structural shocks, so impulse response functions will need to be scaled to take into account the standard deviations of the innovations they may stem from.

Moreover, this approach is performed on more than one specification of the model. The first is defined in levels, and within it every reduced-form equation allows for a linear and quadratic trend. The second one is defined by taking first differences and then subtracting from them a weighted average of past first differences, with a decay parameter equal to 2.5% per quarter. These specifications, referred to in the work as DT (deterministic trend) and ST (stochastic trend), both also allow for a series of dummies in order to account for large tax shocks that cannot be credibly attributed to the same stochastic process that generates structural disturbances within the models we have just described: consequently, the models' equations accommodate present values and four lags of a total of 18 dummies. The first thirteen are associated with the quarters from 1950Q1 to 1953Q1 respectively (to account for large fiscal shocks during the Fifties), while the fourteenth fires only during the huge temporary tax cut of 1975Q2. The last four are the first to fourth lags of this dummy. It is useful to emphasize that this identification is designed by the authors to fit both model specifications, but it needs to rely on external data in order to work. That is not the case for another approach, championed by Bernanke and Mihov (1998), which deserves to be included in this quick review of the literature.

The Bernanke and Mihov approach. The characterizing feature of this work (which surely differentiates it from the previously described methods) is that it is not driven by the intention of identifying a number of shocks equal to the number of variables. Since the authors are interested in recovering only the structural shocks coming from monetary policy institutions, they limit themselves to identifying only three structural innovations (within a model of six variables). The meaning of this needs some quick explanation.

Given a standard structural VAR model (like the one described in 3.0.5), its responses to a given structural shock are recovered by the standard method of Structural Impulse Response Functions:
$$\frac{\partial y_t}{\partial \varepsilon_{t-k}} = \Psi_k = \Phi_k B_0^{-1}, \quad \text{with } \Phi_0 = I_m \text{ and } \Phi_k = A_1 \Phi_{k-1} + \cdots + A_{\min(k,p)} \Phi_{k-\min(k,p)}. \tag{3.2.9}$$
Once a vector of structural disturbances has been identified, each column j of $\Psi_k$ contains the responses of the model's variables (after k lags) to a unit shock corresponding to the j-th element of $\varepsilon_t$. Once all the shocks are identified, we can easily recover all the SIRFs we need; but if we are not interested in determining all the elements of $B_0^{-1}$ (or, equivalently, the model's responses to shocks that we deem undeserving of our attention), we can limit ourselves to identifying the impact matrix only up to the columns corresponding to the shocks we are really interested in. This is the basic approach followed by Bernanke and Mihov (ibid.).

The authors build a semi-SVAR model with two categories of monthly data: the policy variables group comprises nonborrowed reserves, total bank reserves and the federal funds rate; the nonpolicy variables are real GDP, the GDP deflator – both interpolated from quarterly data by means of a method described in Bernanke et al. (1997) – and the Dow Jones index of spot commodity prices. The model is defined in levels, where GDP, deflator and price index are log-transformed, while nonborrowed and total reserves are normalized by a 36-month moving average of total reserves. The relationship between policy ($p_t$) and nonpolicy ($y_t$) variables is formalized in the following equations:
$$y_t = \sum_{i=0}^{k} B_i y_{t-i} + \sum_{i=1}^{k} C_i p_{t-i} + A^y v_t^y, \tag{3.2.10}$$
$$p_t = \sum_{i=0}^{k} D_i y_{t-i} + \sum_{i=0}^{k} G_i p_{t-i} + A^p v_t^p, \tag{3.2.11}$$

with k = 13. We can rewrite these two equations in VAR form:
$$\begin{bmatrix} y_t \\ p_t \end{bmatrix} = \sum_{i=0}^{k} \begin{bmatrix} B_i & C_i \\ D_i & G_i \end{bmatrix} \begin{bmatrix} y_{t-i} \\ p_{t-i} \end{bmatrix} + \begin{bmatrix} A^y & 0 \\ 0 & A^p \end{bmatrix} \begin{bmatrix} v_t^y \\ v_t^p \end{bmatrix}. \tag{3.2.12}$$
The reduced-form representation is:
$$\begin{bmatrix} y_t \\ p_t \end{bmatrix} = \sum_{i=1}^{k} \begin{bmatrix} I_3 - B_0 & 0 \\ -D_0 & I_3 - G_0 \end{bmatrix}^{-1} \begin{bmatrix} B_i & C_i \\ D_i & G_i \end{bmatrix} \begin{bmatrix} y_{t-i} \\ p_{t-i} \end{bmatrix} + \begin{bmatrix} I_3 - B_0 & 0 \\ -D_0 & I_3 - G_0 \end{bmatrix}^{-1} \begin{bmatrix} A^y & 0 \\ 0 & A^p \end{bmatrix} \begin{bmatrix} v_t^y \\ v_t^p \end{bmatrix} = \sum_{i=1}^{k} \Gamma_i \begin{bmatrix} y_{t-i} \\ p_{t-i} \end{bmatrix} + \Theta \begin{bmatrix} v_t^y \\ v_t^p \end{bmatrix}. \tag{3.2.13}$$

Now, we have
$$\Theta = \begin{bmatrix} I_3 - B_0 & 0 \\ -D_0 & I_3 - G_0 \end{bmatrix}^{-1} \begin{bmatrix} A^y & 0 \\ 0 & A^p \end{bmatrix} = \begin{bmatrix} (I_3 - B_0)^{-1} A^y & 0 \\ (I_3 - G_0)^{-1} D_0 (I_3 - B_0)^{-1} A^y & (I_3 - G_0)^{-1} A^p \end{bmatrix}, \tag{3.2.14}$$
so that the reduced-form residuals are
$$u_t = \begin{bmatrix} (I_3 - B_0)^{-1} A^y v_t^y \\ (I_3 - G_0)^{-1} D_0 (I_3 - B_0)^{-1} A^y v_t^y + (I_3 - G_0)^{-1} A^p v_t^p \end{bmatrix}. \tag{3.2.15}$$
Since we assumed orthogonality between $v_t^p$ and $v_t^y$, the portion $\hat{u}_t^p$ of $u_t^p$ which is orthogonal to $u_t^y$ is nothing else than
$$\hat{u}_t^p = (I_3 - G_0)^{-1} A^p v_t^p. \tag{3.2.16}$$
Now, we know that
$$E\left( v_t^p v_t^{p\prime} \right) = \Sigma^p \equiv \begin{bmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \end{bmatrix}, \tag{3.2.17}$$
and consequently
$$E\left( \hat{u}_t^p \hat{u}_t^{p\prime} \right) = \Xi = (I_3 - G_0)^{-1} A^p\, \Sigma^p A^{p\prime} (I_3 - G_0')^{-1}, \tag{3.2.18}$$
with $\Sigma^p$ a diagonal matrix.

Here the point restrictions are introduced, concentrated in a triplet of equations describing the market for bank reserves (in innovation terms). Decomposing $v_t^p$ into a demand shock, a supply shock and a borrowing-function shock, we follow the notation of 3.2.2 and obtain
$$\begin{bmatrix} 1 & 0 & 0 & \alpha \\ 0 & 1 & 0 & -\beta \\ 0 & 0 & 1 & 0 \\ 1 & -1 & -1 & 0 \end{bmatrix} \begin{bmatrix} u_t^{TR} \\ u_t^{BR} \\ u_t^{NBR} \\ u_t^{FFR} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ \phi^d & 1 & \phi^b \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} v_t^d \\ v_t^s \\ v_t^b \end{bmatrix}, \tag{3.2.19}$$
where $u_t^{BR} = u_t^{TR} - u_t^{NBR}$ indicates borrowed reserves. Solving the model for $u_t^{TR}$, $u_t^{NBR}$ and $u_t^{FFR}$ brings
$$\hat{u}_t^p = (I_3 - G_0)^{-1} A^p v_t^p = \begin{bmatrix} 1 - \frac{\alpha}{\alpha+\beta}\left(1 - \phi^d\right) & \frac{\alpha}{\alpha+\beta} & \frac{\alpha}{\alpha+\beta}\left(1 + \phi^b\right) \\ \phi^d & 1 & \phi^b \\ \frac{1}{\alpha+\beta}\left(1 - \phi^d\right) & -\frac{1}{\alpha+\beta} & -\frac{1}{\alpha+\beta}\left(1 + \phi^b\right) \end{bmatrix} v_t^p. \tag{3.2.20}$$

The model has seven unknowns (when including $\{\sigma_i^2\}_{i=1}^{3}$) but only six covariances that can be obtained by estimating $\Xi$. Thus, a series of five identification strategies is proposed (four over-identified and one, the last, just-identified). The first, the so-called FFR identification, based on Bernanke and Blinder (1992), assumes $\phi^d = -\phi^b = 1$. The second, the NBR model, based on Christiano and Eichenbaum (1992), poses $\phi^d = \phi^b = 0$. The third (NBR/TR), inspired by Strongin (1995), poses $\alpha = \phi^b = 0$; the fourth (BR) imposes $\phi^d = 1$ and $\phi^b = \alpha/\beta$; and the fifth, the just-identified (JI) model, imposes just $\alpha = 0$.

For the JI model, matching $\hat{\Xi} = (I_3 - G_0)^{-1} A^p\, \Sigma^p A^{p\prime} (I_3 - G_0')^{-1}$ allows us to solve exactly for all six unknowns. For the other, over-identified models, estimation of the parameters can be performed by GMM. Once this is done, we can easily obtain the last three columns of the impact matrix of the general six-variable VAR model:
$$[\Theta]_{\cdot,4:6} = \begin{bmatrix} 0_{3\times 3} \\ (I_3 - G_0)^{-1} A^p \end{bmatrix}, \tag{3.2.21}$$

allowing us to recover structural impulse response functions exclusively for the three identified policy shocks:
$$\frac{\partial \begin{bmatrix} y_t \\ p_t \end{bmatrix}}{\partial \varepsilon_{4:6,t-j}} = \Psi_j = \Phi_j [\Theta]_{\cdot,4:6}, \quad \text{with } \Phi_0 = I_m \text{ and } \Phi_j = \Gamma_1 \Phi_{j-1} + \cdots + \Gamma_{\min(j,p)} \Phi_{j-\min(j,p)}. \tag{3.2.22}$$
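As an illustration of how the just-identified (JI) scheme can be taken to the data, the sketch below maps the six free parameters (with α = 0) into the implied covariance of the orthogonalized policy residuals through the matrix of 3.2.20, and matches it to an estimate of Ξ by least squares – a simple stand-in for the moment-matching step described above; all numerical values are hypothetical:

```python
import numpy as np
from scipy.optimize import least_squares

def xi_model(theta):
    """Implied covariance of the orthogonalised policy residuals under the
    JI restrictions (alpha = 0), following the matrix of eq. 3.2.20."""
    beta, phi_d, phi_b, s1, s2, s3 = theta
    M = np.array([[1.0,                 0.0,        0.0],
                  [phi_d,               1.0,        phi_b],
                  [(1 - phi_d) / beta, -1.0 / beta, -(1 + phi_b) / beta]])
    return M @ np.diag([s1, s2, s3]) @ M.T

def fit_ji(Xi_hat, x0):
    # Six free moments of the symmetric 3x3 matrix against six parameters
    resid = lambda th: (xi_model(th) - Xi_hat)[np.triu_indices(3)]
    return least_squares(resid, x0=x0).x   # beta should stay away from zero

# Toy check: recover hypothetical parameters from their own implied moments
theta_true = np.array([0.05, 1.0, -0.5, 0.04, 0.09, 0.01])
theta_hat = fit_ji(xi_model(theta_true), x0=theta_true * 1.1)
```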

3.3 Long-Run Restrictions

These pages use the same notation as Kilian and Lütkepohl (2017, pp. 269–271). We have dedicated enough space to the problematic nature of finding sufficiently justifiable assumptions on which to build an identification strategy made exclusively of short-term restrictions. This makes the use of other sources of information much more appealing. In this specific case, we point to knowledge concerning the long-run response of a given model to a given structural shock.

Imagining an m-variable, p-lag VAR model, we can manipulate 3.0.5 to obtain
$$A(L)\, y_t = u_t, \tag{3.3.1}$$
or, equivalently, the structural representation
$$B(L)\, y_t = B_0 A(L)\, y_t = \varepsilon_t, \tag{3.3.2}$$
where $A(L) = I_m - A_1 L - \cdots - A_p L^p$ is the distributed lag polynomial. Naturally, assuming stationarity of $y_t$, further manipulation brings the common moving average representation
$$y_t = B(L)^{-1} \varepsilon_t = \Upsilon(L)\, B_0^{-1} \varepsilon_t = \Theta(L)\, \varepsilon_t = \sum_{i=0}^{\infty} \Theta_i \varepsilon_{t-i}. \tag{3.3.3}$$

Since we are dealing with a stationary series, it is safe to assume that impulse responses will tend to zero as we increase the horizon, but we can introduce some assumptions concerning the cumulative structural impulse response function (its limit, to be more specific):
$$\sum_{i=0}^{\infty} \Theta_i = \Theta(1) = B(1)^{-1}. \tag{3.3.4}$$
This is extremely useful when using, for example, first-differenced time series. Now, recalling again equation 3.0.5, we have
$$B(1)^{-1} = A(1)^{-1} B_0^{-1}, \tag{3.3.5}$$
and consequently
$$B_0^{-1} = A(1)\, B(1)^{-1}. \tag{3.3.6}$$
From this we obtain (normalizing the structural error variance to $I_m$) that
$$\Sigma_u \equiv B_0^{-1} \left( B_0^{-1} \right)' = A(1)\, B(1)^{-1} B(1)^{-1\prime} A(1)'. \tag{3.3.7}$$
Pre- and post-multiplying $\Sigma_u$ we get
$$A(1)^{-1}\, \Sigma_u\, A(1)^{-1\prime} = B(1)^{-1} B(1)^{-1\prime} = \Theta(1)\, \Theta(1)'. \tag{3.3.8}$$

From this point on, we can decompose $A(1)^{-1}\, \Sigma_u\, A(1)^{-1\prime}$ by using the same kinds of restrictions we have described earlier. Once $\Theta(1)$ is found, we use equation 3.3.6 to recover $B_0^{-1}$ and then calculate all the structural impulse response functions we need. This is the technique used to identify structural shocks, for instance, in Blanchard and Quah (1989).

A semi-structural variation on this approach is found in the second model of King et al. (1991). It is a six-variable Vector Error Correction Model (VECM) with the logarithms of three real aggregate flow variables, namely per capita real consumption expenditures $c_t$, per capita gross private domestic fixed investment $inv_t$, and per capita real private gross national product $gnp_t$ (obtained as the difference between real per capita GNP and real per capita government expenditures); the logarithm of the implicit price deflator $p_t$ of per capita real private GNP is included as a price index. To complete the framework, the per capita real M2 aggregate ($m_t - p_t$) and the three-month treasury bill rate $i_t$ are added.

Starting from a common RBC setting, three cointegrating relationships among the variables are assumed:
$$\begin{aligned}
m_t - p_t - \beta_y y_t + \beta_R R_t &\sim I(0), \\
c_t - y_t - \phi_1 (R_t - \Delta p_t) &\sim I(0), \\
i_t - y_t - \phi_2 (R_t - \Delta p_t) &\sim I(0).
\end{aligned} \tag{3.3.9}$$

The parameters are then estimated by dynamic OLS as in Stock and Watson (1993). These equations are used to identify three shocks that are assumed to be permanent (in contrast with the other three shocks, which are supposed to be merely transitory): a balanced-growth shock – with long-run effects on output, investment and consumption, but not on the ratios between consumption and output and between investment and output; a neutral inflation shock, influencing inflation and nominal interest rates but not real flow variables; and finally a real interest rate shock, which affects the two aforementioned ratios along with nominal interest rates.

On these bases, defining $x_t = (y_t, c_t, i_t, m_t - p_t, R_t, \Delta p_t)'$, it is assumed that
$$\Theta(1) = \begin{bmatrix} A & 0_{6\times 3} \end{bmatrix}, \quad A = \tilde{\Gamma}\Pi, \quad \Pi = \begin{bmatrix} \pi_{11} & 0 & 0 \\ \pi_{21} & \pi_{22} & 0 \\ \pi_{31} & \pi_{32} & \pi_{33} \end{bmatrix}, \quad \tilde{\Gamma} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & \phi_1 \\ 1 & 0 & \phi_2 \\ \beta_y & -\beta_R & -\beta_R \\ 0 & 1 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \tag{3.3.10}$$
where all coefficients of $\tilde{\Gamma}$ are known. Forcing $\Sigma_{1:3,1:3} = I_3$ (where $\Sigma$ is the VCV matrix of the structural shocks and $\Sigma_{1:3,1:3} = S_{3\times 3}(\Sigma)$), we can recover the first three columns of $B_0^{-1}$ (linked to the three permanent shocks) as follows: we start from the structural moving average representation
$$\Delta x_t = \mu + \Upsilon(L)\, B_0^{-1} \varepsilon_t = \Theta(L)\, \varepsilon_t. \tag{3.3.11}$$
Let D be any solution to $\Upsilon(1) = \tilde{\Gamma} D$ – for instance, as suggested by the authors themselves, $D = (\tilde{\Gamma}'\tilde{\Gamma})^{-1} \tilde{\Gamma}'\, \Upsilon(1)$. It is shown that
$$\tilde{\Gamma} D u_t = \tilde{\Gamma} \Pi\, \varepsilon_{1:3,t}, \tag{3.3.12}$$
and since
$$D \Sigma_u D' = \Pi\, \Sigma_{1:3,1:3}\, \Pi' = \Pi \Pi', \tag{3.3.13}$$

the square matrix $\Pi$ is obtained as the lower triangular Cholesky factor of $D \Sigma_u D'$. From this, we obtain that
$$[B_0]_{1:3,\cdot} = \Pi^{-1} D. \tag{3.3.14}$$
The first three columns of $B_0^{-1}$ are obtained by:
$$\left[ B_0^{-1} \right]_{\cdot,1:3} = \Upsilon(L)\, \Sigma_u D' \Pi^{-1\prime}, \tag{3.3.15}$$
thus giving us the impulse responses to the three corresponding structural shocks.

3.4 Identification by Sign Restrictions

The identification strategies we have described so far lead to structural models that are point-identified: there should be one and only one acceptable solution given the restrictions imposed on the model's responses. A different, increasingly popular approach leads instead to set identification – i.e. there are infinitely many solutions compatible with the restrictions imposed.

We discussed the possibility of putting equality (and especially exclusion) restrictions on the impact matrix for the sake of reducing the number of unknowns; but even if all the information we have concerns only the sign of certain responses at certain horizons, the results that can be obtained are still meaningful.

We can imagine a structural VAR model as in equation 2.0.3. Using the notation of equation 3.2.9, we can impose restrictions on the sign of any element of any $\Psi_i$, not only $\Psi_0 = B_0^{-1}$. The only caveat is to use strict rather than weak inequalities, so as to avoid candidate matrices with zero responses trivially satisfying the restrictions.

The interesting part comes with the techniques that can be used to draw from the distribution of admissible impact matrices. We start with the usual assumption of equation 3.0.4. Estimating the reduced-form model gives us the error VCV matrix $\Sigma_u$. Without any further assumption, the set $\mathcal{S}$ of acceptable impact matrices is
$$\mathcal{S} = \left\{ B_0^{-1} : B_0^{-1} B_0^{-1\prime} = \Sigma_u \right\}. \tag{3.4.1}$$
The Cholesky factor $\Sigma_u^{1/2}$ of $\Sigma_u$ automatically belongs to $\mathcal{S}$. The errors $w_t = \Sigma_u^{-1/2} u_t$ are easily shown to be uncorrelated with unit variance. Now let us imagine an orthogonal matrix Q (such that, by definition, $Q Q' = I_m$). The errors $v_t = Q w_t$ satisfy the identity VCV condition as well. Given
$$u_t = \Sigma_u^{1/2} w_t = \Sigma_u^{1/2} Q Q' w_t = \Sigma_u^{1/2} Q v_t, \tag{3.4.2}$$
and since
$$\Sigma_u^{1/2} Q \left( \Sigma_u^{1/2} Q \right)' = \Sigma_u^{1/2} Q Q' \Sigma_u^{1/2\prime} = \Sigma_u, \tag{3.4.3}$$
we have that
$$\Sigma_u^{1/2} Q \in \mathcal{S}, \tag{3.4.4}$$
so any matrix $Q \in O(m)$ – the set of all $m \times m$ orthogonal matrices – generates an admissible impact matrix. Generating a large number of Q's is consequently essential to obtain a large sample from the distribution of all the possible $B_0^{-1}$'s.

A recent approach, devised by Rubio-Ramírez, Waggoner, and Zha (2010), goes as follows: we start by generating an $m \times m$ matrix W by randomly drawing each column from $N(0, I_m)$. We factorize W by QR decomposition, with the diagonal elements of the upper triangular matrix R normalized to be positive. This, as shown in that work, amounts to sampling from a uniform distribution over $O(m)$. Moreover, as Fry and Pagan (2011) show, this technique is equivalent to the Givens rotation matrices approach analyzed in Canova and De Nicoló (2002) – an approach that becomes more and more computationally cumbersome as m increases and has consequently become extremely rare in recent literature.
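The Rubio-Ramírez, Waggoner, and Zha recipe is straightforward to implement. The sketch below draws orthogonal matrices via the QR factorization of a Gaussian matrix and keeps the candidate impact matrices whose impact responses satisfy a user-supplied sign check (here a hypothetical restriction on the first column):

```python
import numpy as np

def draw_q(m, rng):
    """Draw Q uniformly from O(m) via QR of a Gaussian matrix
    (Rubio-Ramirez, Waggoner and Zha 2010)."""
    W = rng.standard_normal((m, m))
    Q, R = np.linalg.qr(W)
    # Flip signs so that the diagonal of R is positive
    return Q @ np.diag(np.sign(np.diag(R)))

def sign_restricted_draws(Sigma_u, check, n_draws, seed=0):
    """Keep the candidate impact matrices P @ Q whose IRFs pass `check`."""
    rng = np.random.default_rng(seed)
    P = np.linalg.cholesky(Sigma_u)
    kept = []
    for _ in range(n_draws):
        B0inv = P @ draw_q(Sigma_u.shape[0], rng)
        if check(B0inv):                 # user-supplied sign restrictions
            kept.append(B0inv)
    return kept

# Hypothetical restriction: the first shock raises both variables on impact
Sigma_u = np.array([[1.0, 0.4], [0.4, 2.0]])
draws = sign_restricted_draws(Sigma_u, check=lambda B: np.all(B[:, 0] > 0),
                              n_draws=1000)
```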

4 Independent and non-Gaussian Shocks

The model specifications and the identification strategies discussed so far limit themselves to assuming that structural shocks are mutually uncorrelated, but make no assumptions about the probability distribution that determines their stochastic behavior. Starting from these constraints, shock identification is impossible without the help of some external source of knowledge about the relationships among the variables involved or about how the model should respond to certain structural disturbances. Even if we turn the knob up a notch – by imposing that there is absolutely no connection whatsoever among structural shocks (i.e. by assuming that they are mutually independent) – we obtain no improvement: since the cardinality of $\mathcal{S}$ from equation 3.4.1 is infinite, the variance decomposition that recovers the independent shocks we want is buried among an infinite number of other impact matrices that generate disturbances that are merely mutually uncorrelated. This predicament can by no means be altered by specifying the distribution that these disturbances are supposed to follow, if that distribution is the standard Normal. However, the framework changes radically when the assumption of non-Gaussianity is introduced (Moneta et al. 2013; Gouriéroux and Monfort 2014). The consequence of these assumptions is that the impact matrix is identified up to permutation and sign-flipping of its columns (the problem of scaling is solved if we fix a priori the variance of the structural shocks). The theoretical justification of this result is rooted in Independent Component Analysis (ICA), an analogue of PCA in which components are supposed to be independent instead of merely uncorrelated. On a practical level, the impact matrix can be obtained "[. . . ] by considering cross-moment conditions, or tail properties" (Gouriéroux and Monfort 2014, p. 18).

This has given rise to a new strand of literature (Lanne, Meitz, and Saikkonen 2017, for instance), with serious ramifications in the discourse around Bayesian VAR analysis. While the conventional approach takes advantage of likelihood-maximization techniques to estimate the various parameters of interest, among the most interesting applications of these insights to BVARs there is certainly Lanne and Luoto (2016), who make use of a Gibbs sampling algorithm to draw parameters recursively from their full conditional posterior distributions, with the help of Metropolis-Hastings steps where those posteriors cannot be easily integrated.

Another extremely interesting example, however, extends statistical shock identification to the context of Time-Varying Parameter Structural Vector Autoregressions: it is the case of Kocięcki (2018). Since these models will be at the center of the experimentations this work turns around, they need an in-depth analysis, which is provided to the reader in the following pages.

4.1 Time-invariant parameter Structural BVAR

The procedure devised by Lanne and Luoto (2016) is built upon the standard reduced-form VAR model:
$$y_t = c + A_1 y_{t-1} + \cdots + A_p y_{t-p} + B w_t, \tag{4.1.1}$$
where $y_t$ is a vector of m variables and depends on its own past values up to p lags, coherently with the notation we have used so far. It is assumed that the i-th component of the structural errors $w_t$ is independent and identically distributed for $t = 1, \cdots, T$.

We chose not to use the symbol $\varepsilon_t$, since the latter has so far indicated vectors of uncorrelated, unit-variance disturbances; here, instead, each of the mutually independent components of the structural disturbance vector is assumed to follow a Student's t distribution with $\lambda_i$ degrees of freedom – thus with variance equal to $\lambda_i / (\lambda_i - 2)$.

Estimation is performed by means of a Metropolis-within-Gibbs sampling algorithm: every variable is cyclically drawn by conditioning its posterior distribution on the previously generated values of the other ones.

Sampling from the full conditional posterior of B For computational convenience, we can follow the definition of Student’s t distribution and reparametrize the error terms as

\[ w_{it} = \eta_{it}\, h_{it}^{-1/2}, \tag{4.1.2} \]
where
\[ \eta_{it} \sim N(0, 1) \tag{4.1.3} \]
and
\[ h_{it}\lambda_i \sim \chi^2_{\lambda_i}. \tag{4.1.4} \]
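To make the reparametrization concrete, the following minimal Python sketch (our own illustration, not the authors' code) checks numerically that $w_{it} = \eta_{it} h_{it}^{-1/2}$ indeed behaves as a Student's $t$ variable with variance $\lambda_i/(\lambda_i - 2)$:

```python
# Numerical check of (4.1.2)-(4.1.4): if eta ~ N(0,1) and h*lam ~ chi2(lam),
# then w = eta * h**(-1/2) is Student's t with lam degrees of freedom,
# whose variance is lam/(lam - 2). Illustrative values only.
import numpy as np

rng = np.random.default_rng(0)
lam, n = 5.0, 1_000_000
eta = rng.standard_normal(n)
h = rng.chisquare(lam, n) / lam        # so that h*lam ~ chi2(lam)
w = eta * h ** (-0.5)
print(w.var(), lam / (lam - 2))        # both close to 1.667
```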

Using the notation by which we define the matrix

\[ H_t = \operatorname{Diagonal}(h_{1t}, h_{2t}, \ldots, h_{mt}), \tag{4.1.5} \]

we can use the distribution of $\eta_{it}$ to build our conditional likelihood function: defining the reduced-form residuals
\[ u_t = y_t - c - A_1 y_{t-1} - \cdots - A_p y_{t-p} = B w_t, \tag{4.1.6} \]
we can infer that
\[ \eta_t = H_t^{1/2} B^{-1} u_t \sim N(0, I_m), \tag{4.1.7} \]

and consequently, that
\[ u_t \sim N\!\left(0,\; B H_t^{-1} B'\right). \tag{4.1.8} \]
Multiplying all the density functions, we finally obtain

\[ p(y \mid A, B, H) \propto \left|\det B^{-1}\right|^{T} \prod_{t=1}^{T} |H_t|^{1/2} \exp\left[ -\frac{1}{2} \sum_{t=1}^{T} u_t' B^{-1\prime} H_t B^{-1} u_t \right]. \tag{4.1.9} \]
To avoid any ambiguity, we define

\[ H = \operatorname{Diagonal}(h_{11}, \ldots, h_{m1}, \ldots, h_{1T}, \ldots, h_{mT}) \tag{4.1.10} \]

and

\[ A = [c \;\, A_1 \cdots A_p]'. \tag{4.1.11} \]
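Before moving to the identifying normalisation, it may help to see the likelihood (4.1.9) written out computationally. The following Python sketch (a hedged illustration under our own naming conventions, not the authors' code) evaluates its logarithm, storing the diagonals of the $H_t$'s as the rows of a $T \times m$ array:

```python
import numpy as np

def log_lik(u, B_inv, H):
    """Log of the conditional likelihood (4.1.9), up to a constant.
    u: (T, m) array of reduced-form residuals; B_inv: (m, m) matrix;
    H: (T, m) array whose row t holds the diagonal of H_t."""
    T = u.shape[0]
    v = u @ B_inv.T                      # row t equals (B^{-1} u_t)'
    return (T * np.log(np.abs(np.linalg.det(B_inv)))
            + 0.5 * np.log(H).sum()      # sum_t log |H_t|^{1/2}
            - 0.5 * (H * v ** 2).sum())  # quadratic form in (4.1.9)
```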

Since independence and non-Gaussianity of the error terms allow us to identify the model only up to scaling, sign and permutation of the impact matrix's columns, we arbitrarily choose a single admissible configuration of $B$'s columns (or, equivalently, of $B^{-1}$'s rows). More specifically, we take the parameter space of $B^{-1} = (c_{ij})$ and restrict it so that $|c_{jj}| > |c_{ij}|$ for all $i > j$. Moreover, we impose that all elements of $B^{-1}$'s diagonal are positive. This is done by defining
\[ p^*(y \mid A, B, H) = p(y \mid A, B, H) \cdot I_{id}\!\left(B^{-1}\right), \tag{4.1.12} \]
where $I_{id}(\cdot)$ is an indicator function whose value is 1 or 0 depending on whether its argument satisfies the aforementioned conditions.
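A minimal sketch of such an indicator function, under the two conditions just stated (column dominance of the diagonal and a positive diagonal), could look as follows; the function name is ours:

```python
import numpy as np

def i_id(c):
    """Indicator in (4.1.12): 1 if |c_jj| > |c_ij| for all i > j
    and every diagonal element of B^{-1} = (c_ij) is positive, else 0."""
    m = c.shape[0]
    dominant = all(abs(c[j, j]) > abs(c[i, j])
                   for j in range(m) for i in range(j + 1, m))
    return 1.0 if dominant and np.all(np.diag(c) > 0) else 0.0
```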

The original paper discusses the choice of a prior for $b = \operatorname{vec}(B^{-1})$, but provides results obtained using an uninformative prior, where $p(b) \propto 1$. Following that spirit, we made the same choice, making our conditional posterior for $b$ directly proportional to our likelihood function.

Since the density function cannot be easily integrated, we resort to an acceptance-rejection Metropolis-Hastings (ARMH) algorithm, as defined in Chib and Greenberg (1995). The proposal distribution is constructed as follows. Since the logarithm of our posterior density function is
\[ \log p(y \mid A, B, H) = T \log\left|\det B^{-1}\right| - \frac{1}{2} \operatorname{vec}\!\left(B^{-1}\right)' \left( \sum_{t=1}^{T} u_t u_t' \otimes H_t \right) \operatorname{vec}\!\left(B^{-1}\right) + K, \tag{4.1.13} \]
we call $f$ and $G$ respectively the gradient and the negative Hessian of $\log p(y \mid A, B, H)$ with respect to $b$, both evaluated at the density function's mode $\tilde{b}$:

\[ f = \left. \frac{\partial \log p(y \mid A, B, H)}{\partial \operatorname{vec}(B^{-1})} \right|_{b = \tilde{b}} = T \operatorname{vec}(B') - \left( \sum_{t=1}^{T} u_t u_t' \otimes H_t \right) \operatorname{vec}\!\left(B^{-1}\right), \tag{4.1.14} \]
\[ G = - \left. \frac{\partial^2 \log p(y \mid A, B, H)}{\partial \operatorname{vec}(B^{-1})^2} \right|_{b = \tilde{b}} = T K_{mm} (B' \otimes B) + \sum_{t=1}^{T} u_t u_t' \otimes H_t, \tag{4.1.15} \]
where $K_{mm}$ is an $m^2 \times m^2$ commutation matrix, defined, for instance, as in Lütkepohl (1996, p. 115).

The density's mode can be easily found by means of a Newton-Raphson algorithm, using the explicit formulas for the gradient and the Hessian. In order to facilitate extraction from the posterior distribution, we build the proposal distribution for the Metropolis-Hastings algorithm as a Gaussian mixture with two equally weighted components. The first component has $\tilde{b}$ and $G$ respectively as mean and precision parameter (so its variance is $G^{-1}$). To account for the possibility that $\tilde{b}$ does not satisfy the conditions imposed on $B^{-1}$'s rows (which could make the extraction of a vector that actually satisfies them extremely difficult), we define $\bar{b}$ and $\bar{G}$, where

\[ \bar{b} = (I_m \otimes DP)\, \tilde{b}, \tag{4.1.16} \]
\[ \bar{G} = \left( I_m \otimes (DP)^{-1} \right)' G \left( I_m \otimes (DP)^{-1} \right), \tag{4.1.17} \]
where $P$ is the permutation matrix making $P\tilde{B}^{-1} = (\tilde{c}_{ij})$ satisfy the condition by which $|\tilde{c}_{jj}| > |\tilde{c}_{ij}|$ for all $i > j$, while $D$ is the diagonal matrix (with only $1$'s or $-1$'s on its diagonal) making each diagonal element of $DP\tilde{B}^{-1}$ positive. Thus, our proposal distribution is defined as follows:
\[ q(b) = \frac{1}{2} \phi\!\left( b \mid \tilde{b},\, G^{-1} \right) + \frac{1}{2} \phi\!\left( b \mid \bar{b},\, \bar{G}^{-1} \right). \tag{4.1.18} \]
A conventional acceptance-rejection algorithm would require a constant $c$ such that $c\,q(x) \geq f(x)$ for every point $x$ belonging to the support of $f(\cdot)$ (the kernel of the density function we cannot directly sample from, namely our posterior density). The choice of an ARMH algorithm is justified precisely by the fact that no such global constant needs to be found: the procedure admits the existence of a subset of $f(\cdot)$'s support within which $c\,q(x) < f(x)$. To generate an acceptable value of $c$ for each iteration, we generate 50 draws $z_i$ from $q(b)$ satisfying the identifying conditions imposed

on $B^{-1}$, and then calculate, for each of them,
\[ c_i = \frac{f(z_i)}{q(z_i)}. \tag{4.1.19} \]

The median of this collection is then chosen as input for the ARMH algorithm, which is described in the following lines.


First, we generate candidate values by means of a simple Acceptance-Rejection (AR) mechanism: each time, a single value $y$ is drawn from $q(\cdot)$. This point is then accepted following a Bernoulli distribution with probability parameter equal to $\min\left\{ \frac{f(y)}{c\,q(y)},\, 1 \right\}$. Then we have another selection mechanism governing the transition from a previously accepted value $x$ to a candidate $y$ provided by the AR step: if $f(x) < c\,q(x)$, the candidate is always accepted. If this is not the case, two possibilities arise: if $f(y) < c\,q(y)$, then $y$ is accepted with probability equal to $\frac{c\,q(x)}{f(x)}$; otherwise, $y$ is accepted with probability equal to $\min\left\{ \frac{f(y)\,q(x)}{f(x)\,q(y)},\, 1 \right\}$.

Sampling from the full conditional posterior of A. Once a value of $b$ is drawn, we can proceed with the extraction of the reduced-form coefficients. Defining $a = \operatorname{vec}(A)$, and constructing $X$ by stacking $X_t$ from $t = 1$ to $T$, where
\[ X_t = I_m \otimes \left( 1,\, y_{t-1}', \ldots, y_{t-p}' \right), \tag{4.1.20} \]
we can rewrite the likelihood function (once $B^{-1}$ is fixed) as
\[ p(y \mid A, B, H) \propto \exp\left[ -\frac{1}{2} (y - Xa)' \Omega (y - Xa) \right], \tag{4.1.21} \]
where
\[ \Omega = \left( I_T \otimes B^{-1} \right)' H \left( I_T \otimes B^{-1} \right). \tag{4.1.22} \]


We impose a Normal, nearly uninformative prior, with $\underline{a} = 0$ and $V_{\underline{a}} = 10\,000^2\, I_{m^2 p}$. Multiplication brings:
\[ p(A \mid B, H, y) \propto \exp\left[ -\frac{1}{2} (y - Xa)' \Omega (y - Xa) \right] \exp\left[ -\frac{1}{2} (a - \underline{a})' V_{\underline{a}}^{-1} (a - \underline{a}) \right] \]
\[ \propto \exp\left[ -\frac{1}{2} \left( y'\Omega y - a'X'\Omega y - y'\Omega X a + a'X'\Omega X a + a' V_{\underline{a}}^{-1} a - a' V_{\underline{a}}^{-1} \underline{a} - \underline{a}' V_{\underline{a}}^{-1} a \right) \right]. \tag{4.1.23} \]
Imposing
\[ \bar{V}_a^{-1} = V_{\underline{a}}^{-1} + X'\Omega X, \tag{4.1.24} \]
we obtain
\[ p(A \mid B, H, y) \propto \exp\left[ -\frac{1}{2} \left( a' \bar{V}_a^{-1} a - a' \bar{V}_a^{-1} \bar{V}_a X'\Omega y - y'\Omega X \bar{V}_a \bar{V}_a^{-1} a - a' \bar{V}_a^{-1} \bar{V}_a V_{\underline{a}}^{-1} \underline{a} - \underline{a}' V_{\underline{a}}^{-1} \bar{V}_a \bar{V}_a^{-1} a \right) \right]. \tag{4.1.25} \]
Now, imposing
\[ \bar{a} = \bar{V}_a \left( V_{\underline{a}}^{-1} \underline{a} + X'\Omega y \right), \tag{4.1.26} \]
we get
\[ p(A \mid B, H, y) \propto \exp\left[ -\frac{1}{2} (a - \bar{a})' \bar{V}_a^{-1} (a - \bar{a}) \right], \tag{4.1.27} \]
which is exactly the kernel of a multivariate Normal distribution with mean $\bar{a}$ and variance $\bar{V}_a$. From there, all that is needed is a simple draw.

Sampling from the full conditional posterior of H. We now move to the extraction of the model's latent variables. Starting with $H$, we rewrite the log-likelihood function (with $A$ and $B^{-1}$ fixed) as
\[ \log p(y \mid A, B, H) = \sum_{t=1}^{T} \log |H_t|^{1/2} - \frac{1}{2} \sum_{t=1}^{T} u_t' B^{-1\prime} H_t B^{-1} u_t + K = \sum_{t=1}^{T} \sum_{i=1}^{m} \left( \log h_{it}^{1/2} - \frac{1}{2} h_{it} w_{it}^2 \right) + K. \tag{4.1.28} \]
Equivalently, we have
\[ p(y \mid A, B, H) \propto \prod_{t=1}^{T} \prod_{i=1}^{m} h_{it}^{1/2} \exp\left( -\frac{1}{2} h_{it} w_{it}^2 \right). \tag{4.1.29} \]
We can use one of the fundamental assumptions of our model, $h_{it}\lambda_i \sim \chi^2_{\lambda_i}$, to obtain a

prior. By virtue of the definition of the chi-square distribution,
\[ h_{it}\lambda_i \sim \Gamma\!\left( \frac{\lambda_i}{2}, \frac{1}{2} \right) \tag{4.1.30} \]
(using, from here on, the shape-rate parametrization for Gamma-distributed variables).

Moreover, by a property of said Gamma distribution,
\[ h_{it} \sim \Gamma\!\left( \frac{\lambda_i}{2}, \frac{\lambda_i}{2} \right) \propto h_{it}^{\lambda_i/2 - 1} \exp\left( -\frac{\lambda_i}{2} h_{it} \right). \tag{4.1.31} \]
Multiplication with the likelihood function (taking advantage of the fact that all other $h_{it}$'s are considered as fixed) brings
\[ p(y \mid A, B, H)\, p(h_{it}) \propto h_{it}^{\lambda_i/2 - 1} \exp\left( -\frac{\lambda_i}{2} h_{it} \right) h_{it}^{1/2} \exp\left( -\frac{1}{2} h_{it} w_{it}^2 \right) \propto h_{it}^{\frac{\lambda_i - 1}{2}} \exp\left[ -\frac{\left(\lambda_i + w_{it}^2\right) h_{it}}{2} \right], \tag{4.1.32} \]
which is, luckily, the kernel of a Gamma distribution. Extraction is performed as follows:

\[ h_{it} \sim \Gamma\!\left( \frac{\lambda_i + 1}{2},\; \frac{\lambda_i + w_{it}^2}{2} \right). \tag{4.1.33} \]
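In code, the draw is a one-liner; note that NumPy's Gamma sampler is parametrized by shape and scale, so the rate in (4.1.33) must be inverted (a sketch with illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
lam_i, w_it = 5.0, 0.8                  # illustrative values, not estimates
# (4.1.33) in shape-rate form; numpy wants scale = 1/rate
h_it = rng.gamma((lam_i + 1) / 2, 2.0 / (lam_i + w_it ** 2))
```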


Sampling from the full conditional posterior of $\{\lambda_i\}_{i=1}^m$. We can now move on to the last set of variables, the $\lambda_i$'s. Within this specification, the $\lambda_i$'s influence the model only through $H$, so once the $h_{it}$'s are drawn, they are to be considered as constants, as far as the likelihood function with which we build the conditional posterior is concerned. One thing we can take advantage of is the hierarchical prior structure that links the $\lambda_i$'s with $H$. Using an exponential prior for $\lambda_i$,

\[ p(\lambda_i) = \frac{1}{\underline{\lambda}_i} \exp\left( -\frac{\lambda_i}{\underline{\lambda}_i} \right) \quad \text{with } \underline{\lambda}_i = 5, \tag{4.1.34} \]
we obtain
\[ p\left( \lambda_i \mid \{h_{it}\}_{t=1}^{T}, y \right) \propto p(y \mid A, B, H)\, p\left( \{h_{it}\}_{t=1}^{T} \mid \lambda_i \right) p(\lambda_i) \propto p\left( \{h_{it}\}_{t=1}^{T} \mid \lambda_i \right) p(\lambda_i) \]
\[ \propto \left[ \frac{(\lambda_i/2)^{\lambda_i/2}}{\Gamma(\lambda_i/2)} \right]^{T} \left\{ \prod_{t=1}^{T} h_{it}^{\lambda_i/2 - 1} \exp\left( -\frac{\lambda_i}{2} h_{it} \right) \right\} \exp\left( -\frac{\lambda_i}{\underline{\lambda}_i} \right) \]
\[ \propto \left[ 2^{\lambda_i/2}\, \Gamma\!\left( \frac{\lambda_i}{2} \right) \right]^{-T} \lambda_i^{\frac{\lambda_i T}{2}} \left( \prod_{t=1}^{T} h_{it}^{\lambda_i/2 - 1} \right) \exp\left[ -\left( \frac{1}{\underline{\lambda}_i} + \frac{1}{2} \sum_{t=1}^{T} h_{it} \right) \lambda_i \right]. \tag{4.1.35} \]
Extraction from the non-integrated PDF $f(\cdot)$ is performed by means of an independence-chain Metropolis-Hastings (ICMH) algorithm. As a proposal distribution $q(\cdot)$, we use a Normal with the mode of the posterior density as its mean, and the negative of the second derivative of the log-posterior density (evaluated at that mode) as its precision parameter. Following ICMH rules, the transition from a previously accepted value $x$ to a candidate $y$ drawn from $q(\cdot)$ is simply governed by a Bernoulli-distributed variable with probability equal to $\min\left\{ \frac{f(y)\, q(x)}{q(y)\, f(x)},\, 1 \right\}$.
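For completeness, the log of the kernel in (4.1.35), on which the ICMH proposal is centred, can be coded directly; the sketch below is our own and assumes SciPy's `gammaln` for the log-Gamma function:

```python
import numpy as np
from scipy.special import gammaln

def log_post_lambda(lam, h_i, lam_prior=5.0):
    """Log kernel of (4.1.35) for one lambda_i, given h_i = (h_i1, ..., h_iT)."""
    T = len(h_i)
    return (T * (0.5 * lam * np.log(0.5 * lam) - gammaln(0.5 * lam))
            + (0.5 * lam - 1.0) * np.log(h_i).sum()
            - (1.0 / lam_prior + 0.5 * h_i.sum()) * lam)
```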


4.2 The TVP-SVAR model

The strategy developed by Kocięcki (2018) follows a radically different approach from that proposed by Lanne and Luoto (2016). Its specification is built upon a model in which everything is allowed to vary over time, following a normally distributed random walk. We will not be using the author's notation (which confusingly twists many of the most well-established conventions of the SVAR literature), hoping to convey the power of this approach more clearly by extending (where possible) the notation of Lanne and Luoto (ibid.) to the Time-Varying Parameter (TVP) realm.

Whereas the time-invariant parameter reduced-form VAR model was specified as
\[ y_t = c + A_1 y_{t-1} + \cdots + A_p y_{t-p} + B \varepsilon_t, \tag{4.2.1} \]
now we allow all parameters to vary over time by adding a time tag to each of them:
\[ y_t = c_t + A_{1,t} y_{t-1} + \cdots + A_{p,t} y_{t-p} + B_t \varepsilon_t. \tag{4.2.2} \]

In this case, each random disturbance term $\varepsilon_{it}$ is iid $N(0, 1)$ for $i = 1, \ldots, m$ and $t = 1, \ldots, T$. Coherently with the previously described notation, we define $a_t = \operatorname{vec}(A_t)$, although the estimation technique we describe in the following pages requires us to excise $c_t$ from the definition of $A_t = [A_{1,t} \cdots A_{p,t}]'$. Finally, we can comfortably define $b_t = \operatorname{vec}(B_t)$.

We assume a normally-distributed random walk for all parameters:

\[ b_t = b_{t-1} + \xi_t, \tag{4.2.3} \]
\[ c_t = c_{t-1} + \omega_t^c, \tag{4.2.4} \]
\[ a_t = a_{t-1} + \omega_t^a. \tag{4.2.5} \]
Error terms are normally distributed:
\[ \begin{pmatrix} \omega_t^c \\ \omega_t^a \end{pmatrix} \sim \text{iid } N(0, \Omega) \quad \text{with } \Omega = \begin{pmatrix} \Omega_c & \Omega_{ca} \\ \Omega_{ca}' & \Omega_a \end{pmatrix}, \tag{4.2.6} \]
\[ \xi_t \sim \text{iid } N(0, \Sigma). \tag{4.2.7} \]

Naturally, all disturbance terms are independent of one another, for all $i = 1, \ldots, m$ and $t = 1, \ldots, T$. As shown in the original work, identification can be reached by assuming that $\Omega_{ca} = 0$ (meaning that the $\omega_t^c$'s are assumed not to be correlated with the $\omega_t^a$'s), that $\Omega_a$ is equation-wise block-diagonal (meaning that the disturbance terms guiding the random walk of the reduced-form parameters within a given equation of the model are not correlated with the errors concerning another equation; this implies $m$ blocks of size $mp \times mp$), and lastly that $\Sigma$ is column-wise block-diagonal (meaning that the disturbance terms guiding the trajectories of different columns of $B_t$ are uncorrelated by assumption; in this case we have $m$ blocks of size $m \times m$). The priors for $b_0$, $c_0$ and $a_0$ are normally distributed:

\[ b_0 \sim N\!\left( \underline{b}_0, V_{b_0} \right), \tag{4.2.8} \]
\[ c_0 \sim N\!\left( \underline{c}_0, V_{c_0} \right), \tag{4.2.9} \]
\[ a_0 \sim N\!\left( \underline{a}_0, V_{a_0} \right). \tag{4.2.10} \]

On the other hand, the priors for $\Sigma$'s blocks (from here on, $\Sigma_{ii}$) and $\Omega$'s blocks ($\Omega_j$ for $j = 1, \ldots, m+1$, for the sake of conciseness) are Inverse-Wishart-distributed:
\[ \Sigma_{ii} \sim W^{-1}\!\left( \underline{\Sigma}, m^2 \right), \tag{4.2.11} \]
\[ \Omega_j \sim W^{-1}\!\left( \underline{\Omega}_j, m^2 p \right). \tag{4.2.12} \]
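To fix ideas about the state dynamics, the following Python sketch (an illustration under arbitrary sizes and variances, not Kocięcki's code) builds a column-wise block-diagonal $\Sigma$ and simulates the random-walk path (4.2.3) of $b_t$:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
m, T = 3, 200                                    # illustrative dimensions
# Column-wise block-diagonal Sigma: m blocks of size m x m, as assumed above
Sigma = block_diag(*[0.01 * np.eye(m) for _ in range(m)])
# Random walk b_t = b_{t-1} + xi_t, equation (4.2.3)
b = np.zeros((T, m * m))
b[0] = np.eye(m).ravel()                         # arbitrary starting point b_0
for t in range(1, T):
    b[t] = b[t - 1] + rng.multivariate_normal(np.zeros(m * m), Sigma)
```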
