
University of Pisa

Sant’Anna School of Advanced Studies

Master’s Degree Thesis in

Economics

Novel Tools for Causal Inference

A critical application to Spanish innovation studies

Supervisor: Prof. Alessio Moneta

Candidate: Sebastiano Cattaruzzo


Abstract

The debate between predictive and causal models has been at the core of economic science for a long time. Starting from here, this dissertation investigates the roots of causal reasoning in economics and the paths that led to the current set of tools for this type of modeling.

The emergence of graphical causal models opened a new possibility of obtaining causal estimates that can later be exploited in several different contexts, ranging from policy making to further economic modeling. At the same time, the approach poses several difficulties to researchers, entailing a complicated multi-disciplinary apparatus that makes use of different assumptions and increasingly complex estimation methods. This feature increases the potential of the approach, as it can accommodate different systems, but it also requires in-depth knowledge of the implied techniques.

The first part of the dissertation contains an introduction that summarizes the relevance of the chosen topic in economic research and contextualizes the problem of causal inference in complex systems. The introduction is followed by a comprehensive chapter that builds the framework for causal inference, both in the literature and in our specific case. Another chapter comprises a critical review of the above. The final chapter contains a two-fold empirical exercise: on the one hand, a simple replication of the Janzing study using the same specifications; on the other hand, an attempt to amend the above thanks to the features that the PITEC dataset offers.


Contents

1 Introduction
2 Causal Inference in Complex Systems
   2.1 The Concept of Causation
   2.2 The Quest for Causal Inference
   2.3 Causal Inference in Economics (of Innovation)
3 Literature Review on Causal Inference in Economics
   3.1 Graphical Causal Modeling and the PC Algorithm
       3.1.1 Formal preliminaries
   3.2 Vector Auto-Regression and the Structural Form
   3.3 Additive Noise Models
4 Janzing's Approach
   4.1 Graphic Causal Discovery
   4.2 Additive Noise Model
   4.3 Non-algorithmic Inference By Hand
   4.4 Results on the CIS database
5 Empirical Estimation
   5.1 Descriptive Statistics
   5.2 Replication of Janzing's analysis
   5.3 Personal Refinements
       5.3.1 Using continuous variables for subsidies
       5.3.2 Using continuous, lagged variables for subsidies
6 Conclusions
   6.1 Future Developments
A Resulting Graphs
   A.1 Janzing's variables
   A.2 Continuous variables
   A.3 Continuous, lagged variables
       A.3.1 One-year lag on fundings


Chapter 1

Introduction

The aim of the dissertation is to critically analyze the new approach developed by Dominik Janzing, whose roots can be found in Peters, Janzing and Schölkopf (2011). Rather than being built from scratch, the novelty of this approach consists in combining and amending three already existing methodologies with the final aim of accommodating more functional forms. Namely, the three employed techniques are the conditional independence approach, additive noise-based causal discovery, and non-algorithmic inference by hand.

Causal inference in experimental settings has become a quite useful tool for policy making. Despite their high costs of design and implementation, randomized controlled trials offer estimates with extremely high internal validity and often acceptable external validity. Nevertheless, there are still several fields of study where experimental settings cannot be implemented for moral or practical reasons. In this framework, causal inference and econometric estimations face the challenge of controlling for other causes, a challenge that often remains only partially answered. The development of statistical techniques in other sciences may prove helpful to economists working in non-experimental settings.

Despite the exposure of this methodology to several fallacies and critiques, if correctly applied it has the potential to shed some light on interesting topics in science generally, and in innovation particularly. The authors already applied their method to several distinct waves of the CIS database relative to different countries. Nevertheless, due to limited sample sizes, they were forced to merge data coming from different contexts in the attempt to gain more observations. Rather than such a general and wide application, a more longitudinal and focused use can be made. On the one hand, this could better test the robustness of the method across time. On the other hand, a larger sample of observations reduces the role of the curse of dimensionality, making the results of conditional testing sounder.

Moreover, the techniques combined in this approach have rarely been applied to economics. For this reason, understanding their behavior in different contexts becomes fundamental to fine-tuning their characteristics. In particular, the conditional independence approach proposed by Janzing employs kernel estimation to obtain estimates of the distributions of interest and then tests their independence. Further, the author proposes a model that, similarly to independent component analysis, tries to recover the causal direction of pairs of variables from the distribution of the residuals. These techniques show interesting features, but they require a deep understanding of the estimation mechanisms in order to be reliable when dealing with economic problems.

The first part of the dissertation contains an introduction that summarizes the relevance of the chosen topic in economic research and contextualizes the problem of causal inference in complex systems. The introduction is followed by a comprehensive chapter that builds the framework for causal inference, both in the literature and in our specific case. Another chapter comprises a critical review of the above. The final chapter contains a two-fold empirical exercise: on the one hand, a simple replication of the Janzing study using the same specifications; on the other hand, an attempt to amend the above thanks to the features that the PITEC dataset offers.


Chapter 2

Causal Inference in Complex Systems

2.1 The Concept of Causation

Causality has been the subject of an extremely long debate in philosophy and epistemology, one that has lasted thousands of years (see for example Aristotle and his claims on the nature of scientific knowledge). Despite this long tradition, the debate is still vividly open, and it has spread to many of the scientific fields that emerged in parallel to human (and economic) development. Tentatively, the debate has touched a variety of fundamental questions regarding: 1) the actual possibility of making causal inference, 2) the (pluralistic) approaches to and definitions of causality, 3) the appropriate methods for studying it, 4) the applicability to the social sciences, and 5) the interpretation of the estimated causal statements.

On the possibilistic account, there have been numerous arguments both for and against the possibility of inferring causal statements in science. If on practical grounds we can divide the answers to this question between inference in experimental settings and inference in observational ones, on philosophical grounds the debate has focused on the ontological side (e.g. what is the nature of a causal relation?). The former will be fully covered in the next chapters of this work, while the latter has seen as its main contributors philosophers like Hume and Russell, who were particularly skeptical about the possibility of carrying out causal inference, particularly in social contexts where the role of unobservability (of both variables and underlying mechanisms) is fundamental and ubiquitous.

Following this debate, a stream of different views of causality has emerged, trying to link causation to different factors such as regularities, laws, probability, counter-factuals, etc. The most influential ones are Suppes' probabilistic theory (1970), the counter-factual approach developed by Lewis (1973), the Salmon-Dowe process theory (Dowe, 2000, Salmon, 1998), and the manipulability approaches developed in Pearl (2000) and Woodward (2003). Each approach considers a particular aspect of causal influence, which is clearly a multi-faceted concept.

The paper by Moneta & Russo (2014) can stand in for a conclusion on this topic. First, they note that there exists a clear-cut difference between early methodologists and present-day ones in the way they address causal discovery and language. If in the past the use of explicit causal language was widespread and credited, now researchers adopt much more caution in the formulation of causal claims. This may reflect the advancement in both statistical modeling and the comprehension of economic phenomena. Following this observation, the two authors build a framework for distinguishing associational and causal models by comparing them on three grounds: 1) background knowledge, 2) assumptions, and 3) methodology. Eventually, they propose the idea of causal models as the "augmentation" of associational ones, obtained by combining extensive knowledge from different, not necessarily statistical, sources.

This abundance of views, which are in most cases not mutually exclusive, has led to an interesting pluralism in causal inference. Thus, a thorough comprehension of each view, and an equally careful choice according to the field of study, becomes fundamental in causal inference. Reflecting this, each scientific field picked the approach that best fitted its needs, and scientists developed on top of it a variety of statistical and mathematical methods aimed at the production of causal claims.

In parallel to the possibilistic debate in epistemology, the social sciences (and economics particularly) fed another, similar debate on the possibility of, and the extent to which it is possible to, infer causation among variables. The political economist John Stuart Mill (1843, 1848) was one of the researchers most skeptical about the possibility of practicing induction in economics, since its phenomena were considered too complex. More than 150 years later, the social science community has at its disposal a wide array of techniques and uncertainty-reducing tools ranging from probability to statistics, merging the concepts that survived the debates of the last decades.

To conclude on the concept of causation in economics, it is interesting to consider Dullstein's account of Hall's dualistic view of causality (Dullstein, 2007). In Hall's view, cause can have two different meanings: 1) one that relates to the concept of production, in the sense that causes are physically linked to and produce their effects, and 2) one that relates to difference making, in the sense that causes are responsible for differences, either probabilistic or counter-factual, in the presence of their effects. A similar argument is posed by Russo and Williamson (2007), who believe that reducing the concept of causation to either one of the two above concepts is a mistake, and that only a combination of the two perspectives can be useful for causal inference (see the Russo-Williamson Thesis for more details).

2.2 The Quest for Causal Inference

The decision to continue this document with a historical perspective has the purpose of highlighting the depth and complexity of the debate over causal inference, which started in physics and philosophy thousands of years ago and then spread to numerous other scientific fields. As reported by Hoover (2006), after occasional appearances in older writings, causality became a subject of formal study thanks to David Hume. On the one hand, he believed that economics should be approached as a largely causal science. On the other hand, he was particularly doubtful about the possibility of discovering the nature of causation. This closure toward the possibility of practicing inductive inference by such a well-known philosopher made the scientific community very cautious toward the subject.

An additional voice in this chorus belonged to John Stuart Mill, who was actually more open regarding inductive reasoning, but considered it inapplicable to the social sciences (Mill, 1843). Indeed, as the basis of his methodology was controlled experiments, he acknowledged the limitations of experimenting on society due to its inner complexity, particularly given the tools available at that time. By acknowledging economics' complexity and considering it an "inexact and separate science" (Mill, 1848), Mill put a lot of emphasis on apriorism and ceteris paribus types of analysis. Despite being the proposer of a strongly deductivist approach, Mill was probably one of the first economists to understand the obstacle that complexity poses to fully understanding economic phenomena.

Although there were nineteenth-century economists who believed in causal reasoning on economic data, it was not until the twentieth century that causality and identification made their comeback in economics. Thanks to the development of innovative statistical techniques, such as tests of significance and regressions, researchers started to associate some of these new techniques with causal inference. At the end of the nineteenth century, the publication of an important book shifted the attention of scientists back toward causality and probability: The Grammar of Science (Pearson, 1892). Pearson first noticed that a probabilistic dependence between two variables does not necessarily imply a causal connection. He then argued that science should devote less attention to cause-and-effect relationships in favor of the study of probabilistic dependence, in the form of contingency tables.

Between the late nineteenth and early twentieth centuries, economists such as Marshall and Walras put forward the concepts of partial equilibrium analysis (Marshall, 1930) and general equilibrium analysis (Walras, 1954), in which prices and quantities are simultaneously determined. This not only introduced the well-known econometric problem of identification, but also added complexity to the problem of causal inference.

Despite these obstacles, the development of structural econometric models in the 1930s put forward the conceptual basis of probabilistic econometrics, later formalized by Haavelmo (1944) and the Cowles Commission (1953, 1950). In his well-known publication, resorting to techniques of statistical inference, Haavelmo tried to reconcile economic theory with empirical data, two fields that were quite distinct at that time. His seminal work not only contains the formalization of the difference between correlation and causation, but also introduced the foundations of counter-factual analysis. He believed that nature can run experiments for researchers, and that causality could be grasped through a model governing the observed data and the independent manipulation of inputs.

Less than ten years later, the Cowles Commission began to publish the results of its econometric research program, inducing two different approaches. On the one hand, the statistician Herman Wold was a proposer of so-called process analysis. The emphasis of this approach was on the asymmetry of causal relations, and it relied upon these asymmetries and criteria of temporal precedence to infer predictions. Process analysis later became the conceptual foundation of time-series studies, leading eventually to Granger causality and vector auto-regression (VAR). On the other hand, the Commission connected the concept of causality with the invariance properties of structural econometric modeling, whose foundation can be found in Wright (1921). This method is based upon the distinction between endogenous and exogenous variables, and the identification and estimation of structural parameters is fundamental to it.

Given these premises, it is easy to understand this methodology's strong reliance on a priori knowledge. For this reason, the structural approach has been the subject of the well-known Lucas critique (1976), which highlighted how structural models fail to take into account factors that are likely to influence the structure of the economy itself (e.g. the motivations and expectations of agents).

Finally, contributing to the work of the Commission, Simon (1953) put forward two beliefs: 1) that causality could be established not only when relating exogenous and endogenous variables, but also among the endogenous variables themselves, and 2) that the conditions for identifying a causal order were the same as those already known for the identification of parameters. Simon's conceptual framework is quite aligned with the one developed by the Cowles Commission, with one exception: he believed that the identification problem could be solved by recourse to experiments, either natural or controlled.

Despite the efforts made by statisticians and econometricians and the ascent of structural modeling, as noted by Hoover (2004), evidence of explicit causal language (in the sense of true causation that we will see later) in economics declined between 1950 and 1990, with a couple of exceptions. First, the philosopher of science Hans Reichenbach formulated his well-known common cause principle (1956), according to which a statistical dependence between two events A and B

indicates either that A causes B, or that B causes A, or that A and B have a common cause. It also seems that causes always occur before their effects and, thus, that common causes always occur before the correlated events.

The principle implies that causal relations among random variables can be inferred from their associated statistical independence relationships.

Two decades later, Hausman (1983) opened up the possibility of obtaining a causal ordering of variables in a model that reflects the world's causal order. Then, Hoover (1990) recovered Simon's concept and developed a framework of conditional analysis. This approach combines non-statistical information (e.g. historical or institutional facts) with information derived from statistical testing, such as structural breaks, in order to make inferences.

In the same vein, economics saw the appearance of the literature on instrumental variables applied to natural experiments (see for example Angrist and Krueger, 2001). The approach entails the use of observational studies aimed at assessing the impact of a given policy intervention, and such studies are possible whenever there is a clear change in the jurisdiction of the geographic area of interest. Using the appropriate econometric tools, the experiment can help to identify the underlying dynamics and assess the causally relevant parameters.

If on one side economists tried to base inferences upon structural models, there was another vein of study that believed in process analysis. The main contribution to this field is the approach developed by Clive W. J. Granger (1969). The author developed a data-driven approach which is purely inferential, in the sense that it does not imply direct reference to prior economic knowledge. Further, it belongs to the process-analysis category due to its applicability to dynamic time-series models.

The underlying idea is that it is possible to test causality by measuring the power of prediction of the future values of a time series given prior values of another time series. The next chapter contains a critical and technical review of this approach, which has been widely used in economics. A famous example of these applications is the one by Sims (1972), who demonstrated the causal priority of money over nominal income thanks to this technique. Later, by generalizing Granger causality to the multivariate case, he proposed the use of time-series regressions with lagged values over structural econometric models in macroeconomics (Sims, 1980).

The grounds for the critique of structural models paralleled those advanced by Lucas (1976), which concerned the failure to take endogenous variables into account when attempting to solve the identification problem. If on the one hand Lucas was more focused on the fact that structural modeling is unstable under policy intervention, on the other hand Sims believed that the formal restrictions used by Cowles Commission researchers were not empirically acceptable. Additionally, the macroeconomic chaos of the 1970s, which peaked with the oil crisis of 1973, pushed the economics community to seek more trustworthy, alternative approaches.

In the same decade, another scholar, Arnold Zellner, developed his own view on causality (Zellner, 1979). He believed in "predictability according to law(s)" as a fundamental feature of causation, but also advocated the use of an underlying structure known a priori. The latter can be used as the distinguishing feature between laws and accidental regularities, a possible issue that could arise when following Granger's approach.

Among the methodologies that emerged, Sims' application received considerable attention, and critiques. In particular, some researchers (Cooley and LeRoy, 1985, Leamer, 1985) highlighted that the results of the vector auto-regressions were strongly dependent on a particular decomposition used by Sims to restrict the parametric space. The economist acknowledged the critique and responded by proposing structural vector auto-regressions (SVARs), which solve the identification problem thanks to assumptions on the estimated parameter values. As pointed out by Hoover (2006), it is worth noticing that the VAR approach was appreciated because it avoided tenuous assumptions, while the SVAR introduces assumptions that must be reconciled with economic theory.

A more recent development in causal inference in economics regards the graph-theoretic approach, which was first developed by Wright (1934), then recovered in Pearl (2000) and Spirtes, Glymour, and Scheines (2001), and finally applied to economics (see for example Demiralp and Hoover (2003), or Bessler and Lee (2002)). Graph theory has the advantage of providing directly readable graphs in which arrows indicate the causal order among variables, as estimated from their mutual dependence or independence. By doing so, the graphs become synthetic representations of the joint probability distributions under study. These graphic outputs can be obtained through the execution of causal search algorithms that scholars have developed over recent decades.

The typical algorithm uses information on statistical dependences (e.g. correlations or other tests) in order to gain insight into the possible structure. Then, it tests for independence among possible variables until exhaustion; combining this with other conditions, it is possible to obtain a graphical representation of the underlying relations. The most common of these algorithms is the PC algorithm, named after its developers Peter Spirtes and Clark Glymour (1991), and it will be analyzed in depth in the next chapters.

Finally, quite recently, Dominik Janzing (2016) proposed the use of a combination of tools for causal inference on innovation survey data. The approach is new in the sense that it brings together a variety of methodologies developed both in economics and in computer science. First, he applies the usual statistical techniques (see for example Glymour and Spirtes (1991)) to test for conditional and unconditional independences among the variables; by doing so, it is possible to obtain a first picture of the underlying processes. Then, he applies additive noise-based causal discovery in order to deal with one of the main pitfalls of discovery algorithms. Further, Janzing proposes an interesting starting point for inferring causal relations when one variable is discrete and the other continuous, using non-algorithmic inference by hand.

The overall procedure is interesting as it permits researchers to strike a balance between an excessively empirical approach and a purely a priori one, by integrating statistical information with background knowledge to obtain credible causal claims in economics. Interestingly enough, the first intuition of this need in causal analysis goes back to 1960, when the anthropologist Hubert Blalock critically reviewed Simon's approach (1953). In his review, the author not only wondered why the social science community had almost ignored the approach, but also highlighted the quality of its conceptual foundations and its pitfalls, finally asking readers to embrace the fact that "causality can never be established empirically since one can never be assured that all relevant variables have been controlled" (Blalock, 1960). It is easy to see how this statement is particularly true for the social sciences, where the role of information, complexity and irrational behavior is fundamental (Nelson and Winter, 1982).

Indeed, at least two stylized facts should have emerged from this historical outline: 1) the existence and alternation of different approaches in causal inference (deductivism v. inductivism; see Table 2.1 for a more clear-cut representation) and 2) the general lack of a robust definition of causation, which depends strongly on the field of application, the data available, the processes under study and many other possible determinants.

Table 2.1: Classification of Causal Approaches in Economics by Author, from Hoover (2006)

                  Structural                           Process
   A priori       Cowles Commission (1950s)            Zellner (1979)
   Inferential    Simon (1953), Hoover (1990, 2001)    Granger (1969), Sims (1980)

A more epistemological question that could arise is whether true causal inference is possible at all, but this lies outside the scope of this dissertation, as it is the subject of a vivid philosophical debate dating back to the Ancient Greeks.

2.3 Causal Inference in Economics (of Innovation)

After this comprehensive historical outline, it should be clear that pluralism of approaches is a fact of causal inference both in science generally and in economics. The battlefield of this debate has long been political economy, where researchers have always tried to answer questions regarding monetary policy, trade and growth, or labor costs and unemployment, for instance. However, the practice of causal inference is extremely relevant in almost any economic field, for several reasons.

First, it helps to obtain a deep understanding of economic phenomena. Second, it allows for the impact evaluation of policy interventions such as subsidies to innovation or new labor laws. Finally, causal inference can be helpful when trying to predict the future trends of given variables (e.g. GDP, trade, manufacturing production). Differently from political economy, inferring causal statements in the economics of innovation has rarely been explicitly addressed as such, though it has been deeply and widely studied in micro-econometrics. Indeed, studies in the economics of innovation present several obstacles to estimation because they rarely allow for the implementation of experimental settings.

Adding to the complexity of estimation from observational data, many of the variables usually studied are endogenous and subject to selection mechanisms, which are often obscure. More generally, economists still lack a complete understanding of the many mechanisms underlying economic innovation.

Trying to estimate causal relations in this context, econometricians have proposed several methods, each with its associated fallacies. Where randomized controlled trials are not possible, regression discontinuity designs come in handy. However, like randomized trials, the method offers high internal validity but low external validity.

Researchers amended the early regression frameworks with approaches based on instrumental variables, which, if appropriately chosen, substitute for the endogenous regressors and "clean" the results. In more detail, the typical Ordinary Least Squares regression framework fails on many accounts to provide causal insights, partly because of the above-mentioned obstacles, which rarely allow the assumptions needed by the approach to be satisfied. According to Angrist and Pischke (2008), a regression can approximate the difference account of causation if the conditional independence assumption holds. The latter is also known as the selection-on-observables assumption, since it implies the knowledge and observation of all covariates to be kept fixed. However, as control and observability of confounding factors is not always possible, methods that rely upon instrumental variables can be helpful. Potentially, this approach can be applied widely, to address measurement error problems or to estimate simultaneous equation models; still, Angrist and Pischke (2008) believe that its most important current use is to solve the problem of omitted variable bias.

Unfortunately, innovation datasets and the economic set-up do not offer many possibilities in this regard, as current innovation surveys do not provide good, usable instruments¹ for the most studied aspects of innovation (see Arvanitis (2002) and Cerulli (2010) for two detailed surveys on the estimation of the effect of public policies on business research and development). Even the use of lagged terms is restricted by the nature of typical innovation surveys, which tend to be cross-sectional.

Finally, matching and re-weighting methods are able to get rid of the bias due to selection on observables and offer good estimates. Nevertheless, it is rare that selection depends only on observables, and in that case matching could actually increase the potential bias.

A more recent development on this topic is the method by Janzing (2016), who applied a combination of already-existing tools to the analysis of the 2008 wave of the CIS dataset, from which he pooled a set of countries to create a pan-European database. The approach is exposed to a series of limitations that will be analyzed in the next chapters, but it also has the potential to shed some light on innovation processes, if cautiously applied.

¹ For a variable to qualify as an instrument, it needs to be correlated with the causal variable of interest and uncorrelated with the error term, i.e. it must affect the outcome only through that variable.


Chapter 3

Literature Review on Causal Inference in Economics

3.1 Graphical Causal Modeling and the PC Algorithm

The characteristic that should have emerged most clearly from the historical outline and the variety of debates contained in it is the importance of combining statistical knowledge with background facts and theory when inferring causation. This type of approach is gaining importance and visibility, thus becoming the foundation of a few approaches to causation. Among these, a possible route is graphical causal modeling, whose roots were developed by Pearl (2000) and Spirtes, Glymour and Scheines (2001).

The purpose of this approach is to represent statistical dependencies through a collection of vertices and edges (e.g. arrows in the directed case) in order to gain insight into the structure and mechanisms involving a set of variables. Each vertex represents a variable and each edge an estimated causal relationship. A relation can be directed if the data reveal direct causation, bi-directed if causation is believed to run both ways, or undirected if there is uncertainty over the direction.

Before starting, it is important to realize that the field is still very fragmented and far from being a coherent body of standardly accepted procedures and definitions. Although one of the aims of this dissertation is to complement different sources to tackle individual weaknesses, it is worth noticing that the major reference on this subject remains the book Causation, Prediction, and Search by Spirtes, Glymour and Scheines.

3.1.1 Formal preliminaries

The graphical approach requires an apparatus of concepts that demand simple, but specific, interpretation. The central element of the approach is the graph, as in Figure 3.1. A graph is an ordered triplet ⟨V, M, E⟩, corresponding to:

• a non-empty set V of vertices (or nodes) to represent variables,

• a non-empty set M of marks, where '>' stands for direct influence, '−' for the empty mark (also EM), and 'o' for partial orientation. The latter is out of the scope of this work; for more references see Chapter 6 of Spirtes, Glymour and Scheines (2001),

• a set E of edges, which are ordered pairs of the form {[V1, M1], [V2, M2]} that represent causal relations.

[Figure 3.1: A directed acyclic graph showing possible causal relations among the random variables X1, X2, X3 and X4]

Using the language introduced above, the graph in Figure 3.1 can be encoded in the triplet

G = ⟨V, M, E⟩, with
V = {X1, X2, X3, X4},
M = {−, >},
E = {[(X1, −), (X2, −)], [(X2, −), (X3, >)], [(X2, −), (X4, >)]}

A graph is undirected if the set of marks is M = {−}, and directed if M = {−, >} and for each edge in E the marks are always (−, >). In Figure 3.1, the vertices X2 and X3, and X2 and X4, form directed edges, where X2 is said to be the parent and X3 and X4 its descendants. A vertex V such that A −→ V ←− B is said to be a collider, which is unshielded if and only if A and B are not connected by any edge (i.e. are not adjacent).

Finally, cyclicity can be defined as the existence of paths that contain a vertex more than once; otherwise the graph is said to be acyclic. Although many combinations of these characteristics exist, the focus of this work is mainly on directed acyclic graphs (sometimes referred to as DAGs).
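To make these definitions concrete, here is a minimal Python sketch (illustrative code, not part of the thesis; all names are hypothetical) that stores the directed part of the graph in Figure 3.1 and checks acyclicity with a depth-first search:

# Directed edges of Figure 3.1: X2 -> X3 and X2 -> X4
# (the X1 - X2 edge is undirected and omitted from the directed part).
children = {"X1": [], "X2": ["X3", "X4"], "X3": [], "X4": []}

def is_acyclic(children):
    """Depth-first search: an edge back to a vertex on the current path is a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / done
    color = {v: WHITE for v in children}

    def visit(v):
        color[v] = GREY
        for w in children[v]:
            if color[w] == GREY:           # back edge: a cycle exists
                return False
            if color[w] == WHITE and not visit(w):
                return False
        color[v] = BLACK
        return True

    return all(visit(v) for v in children if color[v] == WHITE)

print(is_acyclic(children))                # True: the directed part is a DAG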


A graph has the purpose of representing the causal relationships existing among a given set of random variables; the structure of these relations must be consistent with the corresponding conditional independence relations, and it can be recovered through the application of specific algorithms. Formally,

Conditional Independence. Given three random variables X, Y, and Z, X is said to be conditionally independent of Y given Z (written X ⊥ Y | Z) if

• P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z) in the discrete case,
• fXY|Z(x, y | z) = fX|Z(x | z) fY|Z(y | z) in the continuous case.

Further properties of the conditional independence relation are symmetry and, trivially, unconditional independence if the conditioning variable Z is empty. The main implication of this relation is that once Z is known, discovering the value of Y does not provide additional information about X. In parallel, once Z is observed, observing realizations of Y becomes irrelevant for predicting the realization of X.¹
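As a concrete illustration (a simulated sketch, not drawn from the thesis data), consider a chain X −→ Z −→ Y: X and Y are marginally dependent, but once Z is fixed they become independent. In the linear Gaussian case this is visible in the partial correlation:

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)                     # X
z = 2.0 * x + rng.normal(size=n)           # Z is caused by X
y = -1.5 * z + rng.normal(size=n)          # Y is caused by Z only

print(np.corrcoef(x, y)[0, 1])             # far from zero: X and Y dependent

# Partial correlation of X and Y given Z: regress Z out, correlate residuals.
rx = x - np.polyval(np.polyfit(z, x, 1), z)
ry = y - np.polyval(np.polyfit(z, y, 1), z)
print(np.corrcoef(rx, ry)[0, 1])           # close to zero: X ⊥ Y | Z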

The key assumption that turns graphs based on probability distributions into possible representations of causal structures is known as the causal Markov condition. In the work by Glymour, Spirtes and Scheines (2001), the condition is stated as follows:

Causal Markov Condition. Let G be a causal graph with vertex set V and P a probability distribution over the vertices in V generated by the causal structure represented by G. Then G and P satisfy the Causal Markov Condition if and only if for every X in V, X is independent of V \ (Descendants(X) ∪ Parents(X)) given Parents(X).

Equivalently, Scheines (2005) proposes a more formal and synthetic definition:

∀X ∈ V, X ⊥ Non-effects(X) | Direct causes(X)   (3.1)

For instance, the above condition holds on the causal graph in Figure 3.1 if and only if:

X3 ∧ X4 ⊥ X1 | X2   (3.2)

Then, by applying some well-known properties of conditional independence and using the Markov condition as a starting point, it is possible to compute many other implied relations. As pointed out by Hausman and Woodward (1999), it must be made explicit that for the condition to hold, the variables must be clearly distinct from one another. Indeed, the parallel between causal relations and probability distributions may not hold correctly if variables bear conceptual or logical connections to one another, or if the values they take partly overlap.

¹ The most eager readers can find a direct application of this concept, showing how conditional independence, coupled with a few other assumptions presented below, can help to recover the causal structure of a model, in Bryant et al. (2009).

Completing the toolkit needed to apply graphical causal analysis, there are two other important definitions to introduce: the d-separation criterion and the so-called faithfulness condition. The former has the purpose of capturing the set of conditional independence relations generated by a causal graph, while the latter can be seen as the complement of the Markov condition.

In practice, d-separation helps to complete the implications put forward by the Markov condition. More specifically, it was designed as an algorithm that computes, for any directed graph representing a linear statistical model, all and only those conditional independence relations that hold for all values of the parameters (Pearl, 1988). The definition is not really intuitive, as it makes use of several terms from the graph-theoretic vocabulary.

D-separation Criterion. Given a directed acyclic graph G, with X and Y as vertices for which X ≠ Y holds, consider W a set of vertices in G not containing X or Y. Then X and Y are d-separated given W if and only if there exists no undirected path U between X and Y such that (i) every collider on U has a descendant in W and (ii) no other vertex on U is in W.

Thus, if two variables are d-separated relative to a set of variables W in a directed graph, then they are independent conditional on W in all the possible probability distributions that the graph can represent. Intuitively, if a path is considered active whenever it moves information (i.e. statistical dependence), then two variables are d-separated if there exists no active path between them. The opposite, in terms of d-connectedness, holds as well.
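The criterion lends itself to a direct algorithmic treatment. The sketch below (illustrative code following the standard "reachable" formulation of d-separation from the machine-learning literature, not a procedure from the thesis) decides whether X and Y are d-separated by W in a DAG given as a map from each vertex to its parents:

from collections import deque

def d_separated(X, Y, W, parents):
    """True if X and Y are d-separated given the set W in the DAG
    encoded by `parents`; search over (vertex, direction) states."""
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)

    # All vertices in W together with their ancestors (activated colliders).
    anc, stack = set(), list(W)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])

    # "up" = reached from a child, "down" = reached from a parent.
    queue, visited = deque([(X, "up")]), set()
    while queue:
        v, d = queue.popleft()
        if (v, d) in visited:
            continue
        visited.add((v, d))
        if v == Y and v not in W:
            return False                   # an active path reaches Y
        if d == "up" and v not in W:
            for p in parents[v]:
                queue.append((p, "up"))
            for c in children[v]:
                queue.append((c, "down"))
        elif d == "down":
            if v not in W:
                for c in children[v]:
                    queue.append((c, "down"))
            if v in anc:                   # collider with an observed descendant
                for p in parents[v]:
                    queue.append((p, "up"))
    return True

# Collider X1 -> X3 <- X2: blocked when nothing is observed, active given X3.
parents = {"X1": [], "X2": [], "X3": ["X1", "X2"]}
print(d_separated("X1", "X2", set(), parents))    # True
print(d_separated("X1", "X2", {"X3"}, parents))   # False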

A number of implications of this concept have been proved by scholars in the field. First, d-separation was shown to correctly capture the conditional independence relations contained in cyclic directed graphs when these are interpreted as statistical models (Spirtes, 1996a). Then, Pearl (1995) demonstrated that this property extends also to a special class of discrete causal models. Richardson (1994) showed that the criterion can be used as a starting point to determine both cyclic and acyclic graphs. Finally, Spirtes et al. (1996b) showed that d-separation is also applicable to linear statistical models with correlated errors.

Please note that all of this holds under the more general assumption that the causal structure of the real world is entirely captured by the directed graph that acts as the statistical model. As mentioned before, there is still a missing condition that forms the basis of graphical causal discovery, namely the faithfulness condition.

Faithfulness Condition. Let G be a causal graph and P the probability distribution associated with the vertices of G. Then ⟨G, P⟩ satisfies the faithfulness condition if and only if every conditional independence relation true in P is entailed by the causal Markov condition applied to G.

The above statement is equivalent to:

if X ⊥ Y | Z, then X is d-separated from Y by Z   (3.3)

It should then be straightforward that the condition in Equation 3.3, combined with the Markov condition, implies the following relation:

X ⊥ Y | Z  ⟺  X is d-separated from Y by Z   (3.4)

The idea underlying the faithfulness condition is to tie probability distributions to the associated causal structure. While in the linear case this could be done by invoking normality and using the simple correlation coefficient, the two conditions above allow one to do the same without forcing any assumption of Gaussian distributions.

To conclude: on the one hand, the causal Markov condition implies that a causal structure generates some conditional independence relations; on the other hand, the faithfulness condition ensures that all the conditional independence relations are generated by the causal structure.

Constraint-based causal search

Using the conditions and criteria stated above, researchers both at UCLA and at Carnegie Mellon University started to develop algorithms to infer, from observational data, a set of causal graphs compatible with the corresponding conditional independence relations. An extremely accessible and straightforward application of the above concepts can be found in Bryant, Bessler & Haigh (2009).

Given a set of variables for which it is possible to observe a set of realizations, there exist mathematical algorithms that are able to reproduce the estimated underlying causal structure. Naturally, scholars have come up with different algorithms that fit particular settings, where a setting can be characterized in terms of the structure the algorithm is designed to detect and estimate.

This becomes particularly relevant for conditional independence testing. As many testing procedures exist, the search algorithm shall incorporate the one that best suits the nature of the data and the properties of the statistical model. In a classic, parametric, Gaussian setting, the search for conditional independences is carried out through simple testing of vanishing partial correlations; technically, Spirtes et al. (2001) suggest resorting to Fisher's z transformation.
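A minimal sketch of this test (illustrative code under the stated linearity and Gaussianity assumptions): the partial correlation is obtained by regressing the conditioning set out of both variables, and Fisher's z transformation gives an approximately standard normal statistic under the null of a vanishing partial correlation.

import numpy as np
from scipy import stats

def partial_corr(x, y, Z):
    """Correlation of x and y after regressing out the columns of Z."""
    if Z.shape[1] > 0:
        A = np.column_stack([np.ones(len(x)), Z])
        x = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
        y = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return np.corrcoef(x, y)[0, 1]

def fisher_z_test(x, y, Z):
    """p-value for H0: rho(x, y | Z) = 0 via Fisher's z transformation."""
    n, k = len(x), Z.shape[1]
    r = partial_corr(x, y, Z)
    z = 0.5 * np.log((1 + r) / (1 - r))
    stat = np.sqrt(n - k - 3) * abs(z)
    return 2 * (1 - stats.norm.cdf(stat))

# Example: an unconditional test uses an empty conditioning set,
# e.g. fisher_z_test(x, y, np.empty((len(x), 0))).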


More difficult, but not conceptually impossible, is testing conditional independence in non-parametric settings. Recently, scholars have started to use kernel density estimates and then check whether the distance between the estimates is close enough to zero. However, the procedure involves a high degree of complexity and eventually runs into the curse of dimensionality.
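One widely used kernel statistic in this family is the Hilbert-Schmidt Independence Criterion (HSIC) of Gretton and co-authors. The sketch below computes its biased estimator for the unconditional case only, as an illustration of the kernel route (it is not the specific conditional procedure referred to above, and in practice a null distribution is approximated by permutation):

import numpy as np

def rbf_gram(v, bandwidth=1.0):
    """Gaussian-kernel Gram matrix for a one-dimensional sample."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def hsic(x, y, bandwidth=1.0):
    """Biased HSIC estimate trace(KHLH)/(n-1)^2; near zero under independence."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    K, L = rbf_gram(x, bandwidth), rbf_gram(y, bandwidth)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# A permutation p-value compares hsic(x, y) with hsic(x, y[perm])
# over many random permutations perm of the sample indices.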

To better understand what is meant by a setting, the simplest instance consists of a setting with causal sufficiency (i.e. no latent variables) and acyclicity, implying the absence of feedback loops. The latter condition is always satisfied when using directed acyclic graph structures. More generally, there is a set of assumptions that is entirely or partially shared by every existing algorithm of this class, and it entails the following (Spirtes et al., 2001):

• The set of observed variables is causally sufficient
• Every unit in the population has the same causal relations among the variables
• The distribution of the observed variables is faithful to a directed acyclic graph
• The statistical decisions required by the algorithms are correct for the population

The above requirements, particularly the fourth, may appear binding and rarely satisfied in the real world, but they are less demanding than the usual assumptions that would be required by other models with a causal interpretation. Further, the last requirement is known not to be always necessary for the work carried out by the algorithm.

However, it should be noted that these are the assumptions that pertain directly to the execution of the algorithm, while others will be needed when interpreting the results of the statistical analysis. Typically, algorithms take the covariance matrix as input, and the procedure involves the estimation of d-separation relations for the model, starting from conditional independence tests in the discrete case and vanishing partial correlations in the linear continuous case.

The typical procedure of constraint-based causal discovery entails at least two phases: 1) an adjacency phase, in which the adjacencies are determined, and 2) an orientation phase, in which as many edges as possible are directed.

Probably the best known and most used algorithm for causal discovery is the PC algorithm (Spirtes and Glymour, 1991), developed by Spirtes, Glymour and Scheines by amending an earlier version. The starting point consists of drawing the complete undirected graph G connecting the variables of interest in all possible ways. The first step implies the recursive elimination of edges on the basis of conditional independence tests, starting from zero-order independencies and moving to higher-order ones. The second step entails the identification of unshielded colliders in the intermediate output, which helps to orient the first edges.

Orientation is then carried out by the identification of chains and cycles, leading to the final output: a set of directed acyclic graphs. The fact that the output is not a single graph poses the problem of observational equivalence. Indeed, given four variables, it is possible to imagine up to 543 directed acyclic graphs. Among these, it is also possible to identify subsets of observationally equivalent graphs: classes of graphs sharing very similar characteristics, but containing ambiguities in the direction of some edges.
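The two phases just described can be condensed into a compact sketch (illustrative code: ci_test is a placeholder for whichever conditional independence test suits the data, e.g. the Fisher-z test sketched above; production implementations add many refinements):

from itertools import combinations

def pc_adjacency(data, ci_test, alpha=0.05):
    """Adjacency phase of the PC algorithm (sketch). data is an (n, p) array;
    ci_test(i, j, S, data) returns a p-value; the edge i - j is removed as
    soon as independence given some subset S cannot be rejected."""
    p = data.shape[1]
    adj = {i: set(range(p)) - {i} for i in range(p)}
    sepset, level = {}, 0
    while any(len(adj[i] - {j}) >= level for i in adj for j in adj[i]):
        for i in range(p):
            for j in list(adj[i]):
                for S in combinations(adj[i] - {j}, level):
                    if ci_test(i, j, list(S), data) > alpha:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepset[frozenset((i, j))] = set(S)
                        break
        level += 1
    return adj, sepset

def orient_colliders(adj, sepset):
    """First orientation step: for every unshielded triple i - k - j,
    orient i -> k <- j whenever k is not in the separating set of (i, j)."""
    arrows = set()
    for k in adj:
        for i in adj[k]:
            for j in adj[k]:
                if i < j and j not in adj[i] \
                        and k not in sepset.get(frozenset((i, j)), set()):
                    arrows.update({(i, k), (j, k)})
    return arrows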

The problem was first faced by Verma and Pearl (1990), who proposed the observational equivalence theorem and the use of "patterns" to represent a DAG conditional independence equivalence class. On the one hand, the theorem states that:

Observational Equivalence Theorem. Two directed acyclic graphs are conditional independence equivalent if and only if they contain the same vertices, the same adjacencies, and the same unshielded colliders.

This allows for the application of consistent weak Bayes estimators of the effects of manipulations on the model. On the other hand, the concept of patterns uses the above theorem to establish which predicted effects of a manipulation are the same in every member of a conditional equivalence class and which are not.

Finally, possible extensions of constraint-based causal discovery algorithms try to deal with non-typical settings involving features such as latent variables and loops. For the latter, some modifications must be taken into account, since in this setting the causal Markov condition no longer holds and directed acyclic graphs must be substituted with directed cyclic graphs; however, the d-separation criterion is still valid and helpful in detecting conditional independences.

An example of such an algorithm is the CCD algorithm found in Richardson (1996). Moreover, when dealing with latent variables, both the causal Markov condition and the faithfulness condition can be exploited for causal search, and some scholars (Spirtes et al., 2001) introduced the use of a special arrow in causal graphs to denote a common latent variable between two variables, amending their PC algorithm for this specific setting and leading to the FCI algorithm.

3.2 Vector Auto-Regression and the Structural Form

As seen in the historical outline, vector auto-regression is an approach originally proposed by Christopher Sims in 1980 as a response to the untrustworthiness of the structural equation models of that time. In his article Macroeconomics and Reality, he harshly criticized the previous econometric work and its strong reliance on a priori restrictions. He believed that macroeconomic models could work only in the idealized world of market clearing and economic equilibria. In contrast, he proposed the idea of a "disequilibrium economics", where prices themselves may adjust sluggishly, business behavior is not invariant to changes in the public's tastes (Lucas, 1976), and markets do not always clear.

Unsatisfied with the unrealistic assumptions and results of the works by the Cowles Commission, Sims proposed a linear model in n equations and n variables, in which each variable is in turn explained by its own lagged values, plus current and past values of the remaining n − 1 variables. Despite its simplicity, the model provided a systematic approach to detect and study the dynamics of multiple time series. The final purpose was to propose a robust and credible tool for understanding economic fluctuations.

As reported by Stock and Watson (2001), a vector auto-regression can be expressed in three forms. First, a reduced form, where each variable is a linear function of its own past values and the past values of all other regressors, and which has a serially uncorrelated error term. Second, a recursive form that constructs the error terms in each regression so that they are uncorrelated with the errors of the preceding equations; this is done by including the contemporaneous values of other variables as regressors, in one of the n! possible orderings of the model. And finally, a structural form that draws out the contemporaneous relations among the variables. This form, together with the independence among error terms, is a fundamental feature for policy analysis.

Beyond its simplicity, the strength of the approach developed by Sims lies in the interpretation of each of these forms, which are rich in information. The reduced and recursive forms summarize the co-movements of the data series. However, due to the complicated dynamics often involved in this type of modeling, instead of focusing on the estimated regression coefficients or on the R-squared statistics, it is standard practice to report results from Granger-causality tests, forecast error-variance decompositions and impulse responses.

Impulse responses correspond to the impact on current and future values of each of the regressors of a one-unit increase in the current value of one of the error terms, assuming this error returns to zero in the subsequent period and all other errors are zero. This is usually computed on either the recursive or the structural form, because the thought experiment clearly makes more sense when errors are uncorrelated across equations. Additionally, the structural form is enriched with the contemporaneous structure of the model.

The forecast error-variance decomposition simply represents the share of the variance of the error made in forecasting a given variable that is due to a specific shock at a given time horizon. Finally, Granger-causality statistics indicate whether the lagged values of a given variable are helpful in predicting another variable. The idea is that a variable X Granger-causes Y if Y can be better predicted using the histories of both X and Y than using the history of Y alone. This information can easily be retrieved from the reduced-form equations by looking at the lagged coefficients of interest and the associated F-statistics and p-values.
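A minimal sketch of the corresponding F-test (illustrative code: a restricted regression of y on its own lags is compared with an unrestricted regression that adds lags of x):

import numpy as np
from scipy import stats

def lagmat(v, p):
    """Columns are v lagged 1..p, aligned with v[p:]."""
    return np.column_stack([v[p - k:-k] for k in range(1, p + 1)])

def granger_f(y, x, p=2):
    """F statistic and p-value for H0: lags of x do not help predict y."""
    Y = y[p:]
    X_r = np.column_stack([np.ones(len(Y)), lagmat(y, p)])   # restricted
    X_u = np.column_stack([X_r, lagmat(x, p)])               # unrestricted
    rss = lambda X: np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
    df1, df2 = p, len(Y) - X_u.shape[1]
    F = (rss(X_r) - rss(X_u)) / df1 / (rss(X_u) / df2)
    return F, 1 - stats.f.cdf(F, df1, df2)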

Florens and Mouchart (1982) noted that hypothesis testing of Granger causality is equivalent to testing conditional independence among variables. Nota bene that the concept of Granger causality is, by its very definition, related to the concept of predictive causality, and this does not always coincide with true or structural causality. Indeed, as pointed out by Hoover (2001), predictive causality does not necessarily imply the existence of an underlying economic mechanism (structural causality) by which manipulating one variable can affect another, and the opposite is also true.

The above conceptual limitation is not the only one that pertains to the application of vector auto-regressions; the other pitfalls of the approach, however, have a methodological nature and arise only when considering the structural form, which is widely used for policy analysis.

A vector auto-regression can be written as

B(L) Yt = Ut,  where B(0) = I   (3.5)

The subscript t is the time index; Yt is an n × 1 column vector of the contemporaneous values of Yit, i = 1, 2, ..., n; B(L) is a conformable square matrix whose elements are polynomials in the lag operator; and Ut is a column vector of residuals with elements uit.
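Since B(0) = I, the reduced form of Equation 3.5 can be estimated equation by equation with ordinary least squares. A minimal sketch (illustrative code with no small-sample refinements):

import numpy as np

def fit_var(Y, p):
    """OLS estimation of a reduced-form VAR(p). Y is a (T, n) array.
    Returns the intercepts, the lag coefficient matrices, and the residuals."""
    T, n = Y.shape
    rows = [np.concatenate([Y[t - k] for k in range(1, p + 1)])
            for t in range(p, T)]
    X = np.column_stack([np.ones(T - p), np.array(rows)])
    B, *_ = np.linalg.lstsq(X, Y[p:], rcond=None)      # shape (1 + n*p, n)
    A = [B[1 + k * n: 1 + (k + 1) * n, :].T for k in range(p)]  # lag k+1 matrix
    U = Y[p:] - X @ B                                  # reduced-form residuals
    return B[0], A, U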

While the reduced-form model can be estimated directly from the data, the estimation of both the recursive and the structural forms entails the solution of an identification problem, in order to recover the corresponding form starting from the vector auto-regression in Equation 3.5. This happens because there are more unobserved parameters than parameters that can be estimated without imposing restrictions on the system.

For example, a linear structural auto-regression corresponds to the equation above "enriched" by the contemporaneous structure; an operation that can easily be carried out by pre-multiplying each side of the vector auto-regression by a matrix that orthogonalizes the error terms. The original solution to the identification problem proposed by Sims (1980) resorted to the Cholesky decomposition. However, it did not take long to notice that, since the transformation is unique, the result was dependent on the ordering of the variables in the vector Yt, which then also impacts the ordering of the residuals (Cooley and LeRoy, 1985).
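A short sketch of the point (illustrative code): orthogonalizing the reduced-form residuals with the Cholesky factor of their covariance produces mutually uncorrelated shocks, but the triangular factor, and hence the implied contemporaneous ordering, changes when the variables are permuted.

import numpy as np

def cholesky_orthogonalize(U):
    """Sigma = P P' with P lower triangular; e_t = P^{-1} u_t are
    mutually uncorrelated shocks with unit variance."""
    Sigma = np.cov(U, rowvar=False)
    P = np.linalg.cholesky(Sigma)
    E = U @ np.linalg.inv(P).T
    return P, E

# Ordering dependence (Cooley and LeRoy, 1985): permuting the columns of U,
# e.g. U[:, [1, 0, 2]], yields a different triangular factor and therefore
# a different implied contemporaneous structure.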

The novelty of the structural vector auto-regression approach is to deal with the identification problem by imposing restrictions on the contemporaneous variables. In the literature, there exist two popular approaches to designing these restrictions: 1) resorting to economic theory and other a priori knowledge to impose plausible constraints on the interactions between variables (Blanchard and Watson, 1986), and 2) assuming that, in the long run, certain economic shocks affect some variables but have no level effects on others (Shapiro and Watson, 1988).

As reported by Demiralp and Hoover (2003), there is wide consensus that there exist no empirical or statistical grounds for the choice of the contemporaneous causal ordering; thus, economists must rely on a priori knowledge. The structural implications of this approach are only as robust as their identification schemes, which do not always receive the attention they deserve. Nevertheless, there are economic fields where natural experiments have been successfully applied to deal with this issue (Stock and Watson, 2001).

An alternative and more recent approach to the identification scheme is found in Moneta et al. (2011, 2013). The authors argue for the application of graphical causal models (Pearl, 2000, Spirtes et al., 2001), independent component analysis (Comon, 1994), and tests for conditional independence (Demiralp and Hoover, 2003, Swanson and Granger, 1997) to deal with structural identification in vector auto-regressive models.

Specifically, they argue for the use of graphical causal modeling to recover the complete (or partial) contemporaneous causal structure through the analysis of conditional independencies among the estimated residuals in Gaussian settings, while resorting to independent component analysis in non-Gaussian ones. On the one hand, the approach makes it possible to minimize the role of a priori economic theory in the structural identification of the model, thus distancing itself from the approach proposed by the Cowles Commission in the 1950s. On the other hand, the authors emphasize the fundamental need to develop an accurate, appropriate and rich statistical model of the real data.

The wide application of structural vector auto-regressions in economics has led to the development of a large variety of econometric tools that must be applied in response to the specific statistical structure of each real-world dataset. This has improved the reliability of the method, but it requires caution and deep understanding in its application. The use of graphical causal modeling emerges as a good alternative to excessive reliance on economic theory. However, this does not imply disregarding economic knowledge, which must keep a fundamental role in causal discovery; rather, it means combining the knowledge coming from the data with that derived from background theory.

Clearly, focusing on the structural model is a big step forward when compared to the reduced form. Indeed, the latter is a simple associational model, while the former tries to represent, to a certain extent, the true data-generating process. Nevertheless, the structural auto-regression approach still shows some weaknesses that shall be carefully investigated when using it.


In particular, there are instances of possibly high sensitivity to the measured variables, the controls and the rescaling operations that may be included. The approach still does not allow for heterogeneity across individuals and over time, thus hindering the potential for causal analysis using panel data. It is susceptible to biases due to unobserved feedbacks among variables or latent ones. Finally, it shares its main drawback with its parent: the unbreakable relation between the number of possible shocks and the number of equations.

3.3 Additive Noise Models

Another family of causal discovery methods is the one made up by additive noise models, whose roots can be found in Hoyer et al. (2009). So far, only models that draw information from conditional and unconditional independences have been presented. An alternative source of information for causal inference focuses on the joint distributions of the variables and, particularly, on their properties.

The additive noise model class belongs to the latter stream of research and is seen as an intuitive approach to distinguish cause from effect. Considering the two possible decompositions of the joint probability distribution P(X, Y), namely P(X)P(Y|X) and P(Y)P(X|Y), the underlying idea is to check whether one decomposition looks "simpler" than the other. The conceptual foundation of this obviously relates to, and is a special case of, Occam's razor. Despite the simplicity of the core idea, the actual implementation immediately became a challenging research program in terms of defining simplicity and complexity, estimating them from finite data, and justifying the causal inference potential.

More formally, additive noise models belong to the class of functional models, to which the structural equation models belong as well. Nevertheless, instead of assuming that effects are linear functions of their causes plus independent, Gaussian noises, additive noise models typically use non-linear functions of causes and latent noise variables. It is key to clarify the terminology, which comes from other scientific fields: by additivity of the noise, the authors refer to a property of the residual, namely that it has a bounded and roughly constant range around the estimated function.

Although the models have been extended to cover different settings, the simplest one consists of a bivariate case for X and Y, and it assumes that the two variables are dependent and that there is no confounding, no selection bias and no feedback loop. Thus, by application of Reichenbach's principle, inferring the causal graph reduces to the decision whether X −→ Y or Y −→ X, with the possibility of latent variables between the two. In this setting, if Y is believed to be the effect of a cause X and of m other latent causes contained in the vector U,


then it is possible to formulate the relationship as

Y = f(X, U_1, . . . , U_m),    X ⊥ U,    X ∼ p_X(x),    U ∼ p_U(u_1, . . . , u_m)    (3.6)

where f : IR × IR^m → IR is a possibly non-linear function, while p_X(x) and p_U(u_1, . . . , u_m) are the densities of the observed cause X and of the latent causes U. The independence between the two follows from the assumptions of no confounding, no measurement noise, no selection bias and no feedback loops. As the latent causes are unobserved, their influences in Equation 3.6 can be summarized by a single effective noise variable E_Y ∈ IR (also referred to as disturbance term):

Y = f_Y(X, E_Y),    X ⊥ E_Y,    X ∼ p_X(x),    E_Y ∼ p_{E_Y}(e_Y)    (3.7)

Still, Hoyer and his colleagues point out that the same equation could be rewritten switching X and Y, and the result would be two observationally equivalent models. The ideal way out of this identification problem would be to access the interventional distributions that break this symmetry and help to infer the causal direction.

However, as this is typically not possible, researchers proposed alternative methods. For instance, Kano and Shimizu (2003) showed that it is possible to exploit the non-Gaussianity of the input and noise distributions to establish the directionality of such functional models. As mentioned by the authors, the technique is closely related to the already mentioned independent component analysis, which likewise relies on assumptions such as non-normality and independence between the errors and the explanatory variables. This paved the way to other compelling ways to exploit non-linearity and other related properties with the final purpose of "breaking the symmetry" and establishing the direction of causality.

The key contribution of this class of models has been to show that non-linearity of the functional relationships also helps to identify the causal direction, as long as the influence of the noise is additive. Basically, the approach relies on the fundamental assumption that the relationship between two correlated variables is not symmetrical in every aspect. In particular, since any measurement set contains some noise from various sources, the authors argue that the pattern of noise contained in the causal variable must differ from the one in the effect.

The aim of the additive noise model is to detect this asymmetry in noise patterns and to carry out the associated statistical tests to determine a causal direction. Assuming it is possible to fit a non-linear model in one direction such that the noise turns out to be additive and independent of the regressor, then, by construction, this direction is more likely to be the causal one. The underlying rationale for unraveling this asymmetry is that additivity, due to boundedness and rough constancy, implies more independence for the noise than any other form. Thus, the residual resulting from the regression in the other direction would be less independent of the regressor.

A practical estimation on two observed scalar variables x and y would work as follows. First, it involves testing whether the two variables are statistically independent. In the negative case, y is non-linearly regressed on x (i.e. y := f(x) + n) and the corresponding residuals n̂ = y − f̂(x) are computed. Then, if the model is consistent with the data, i.e. the associated residuals are independent of x, it is accepted, otherwise rejected. The same procedure is repeated for the reverse model.
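A minimal sketch of this bivariate procedure is given below; the kernel ridge regressor and the hand-rolled HSIC-style dependence score are illustrative choices rather than the exact tools of Hoyer et al.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def hsic(x, y, sigma=1.0):
    """Biased HSIC-style dependence score with RBF kernels (larger = more dependent)."""
    n = len(x)
    def gram(v):
        return np.exp(-(v[:, None] - v[None, :]) ** 2 / (2 * sigma ** 2))
    K, L = gram(x), gram(y)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def anm_score(x, y):
    """Regress y on x non-linearly; score how dependent the residuals are on x."""
    fit = KernelRidge(kernel="rbf", alpha=0.1).fit(x[:, None], y)
    return hsic(x, y - fit.predict(x[:, None]))

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 300)
y = np.tanh(x) + 0.2 * rng.normal(size=300)      # true model: X -> Y with additive noise

print("X -> Y:", anm_score(x, y))                # lower score: residuals ~ independent of x
print("Y -> X:", anm_score(y, x))                # higher score: reverse model fits worse
```

The direction with the clearly lower residual dependence is retained; in a full implementation the scores would be turned into proper independence tests.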

The scenarios that might arise after this estimation are the following. First, if the two variables turn out to be mutually independent, then it is unlikely that a causal relationship between the two exists. Conversely, if they are deemed dependent but both models fit well, it is impossible to infer directionality from the data. A more positive result occurs when the accepted model is unique and the associated direction can be inferred. A final scenario takes place when neither model is consistent with the data, which leads to the conclusion that the underlying data-generating mechanisms are more complex and cannot be captured by the additive noise model.

Obviously, it is possible to generalize the procedure to the multi-variate case; however, this is feasible only for very small networks (an upper bound of seven variables was set by the authors), since it implies the estimation and testing of all directed acyclic graphs compatible with the observational structure, together with the associated multiple hypothesis testing.

The two problems above are clearly the main limitations of additive noise modeling in its application to datasets. However, it suffers from two other weaknesses that pertain more to the underlying concept of the model. First, although trivial, it must be pointed out that the approach is purely data-driven and deductivist, which can be appropriate for the so-called "hard sciences", but may perform poorly or differently in social sciences such as economics. Indeed, the capability of solving the causal identification problem from purely observational data can be at the same time a strength and a weakness, according to the specific context of application.

Second, the approach relies on the fundamental, structural assumption of noise additivity, which is often the case in telecommunications and other fields; but again, economics may not involve a similar scenario. Nonetheless, the authors performed thorough testing of the procedure and used information criteria to evaluate its performance compared to other approaches or to different versions of the same approach, and the results were quite positive. Further, they tested the method on specific datasets for which a "ground truth" was already established (i.e. altitude −→ temperature), and the outcome was correct in up to 80% of the simulations. It shall be pointed out that the approach seems to perform well even in the presence of weak confounding factors or latent variables. Again, this can be seen as a perk, because it makes the method fit for more settings, but also as a disadvantage, because it does not guarantee in any way the identification of the transmission mechanism of causation, and thus it may uselessly narrow down and simplify an intrinsically complex setting.

Finally, Table 3.1 concludes the chapter by placing each of the reviewed approaches within the clear-cut classification originally proposed by Hoover (2006) and contained, in its original form (sorted by author), at the end of the first chapter.

Table 3.1: Classification of Causal Approaches in Economics by Method

              Structural                            Process

A priori      Structural Equation Modeling          Zellner (1979)

Inferential   Structural Vector Auto-Regression,    Granger's causality,
              Instrumental Variable,                Vector Auto-Regression,
              Conditional Analysis                  Additive Noise Models,
                                                    Graph Theory

Quite interestingly, among the methods that have been reviewed in this chapter, there is one that strongly eschews this classification and places itself at its very center: Moneta et al. (2011). Indeed, the approach proposed by the authors consists of an application of graph theory for causal search to structural vector auto-regressive models. By doing so, they do not deny the role of a priori economic theory in the identification of a structural model, but they successfully minimize it, shifting the emphasis onto the appropriateness and richness of the statistical model of the data.


Chapter 4

Janzing’s Approach

As mentioned in Chapter 2, the method developed in Janzing (2016) is new to economics and extremely multi-faceted. It is a novelty in economic science because it merges a variety of methodologies and borrows concepts from other scientific fields such as machine learning, physics and data analysis.

Starting from the stylized fact that all current causal models have their weaknesses, the author tried to minimize their role by combining three different, existing approaches to infer causality, namely: graphic causal discovery associated with (conditional) independence testing, additive noise modeling, and inference by hand.

The formal language of the study's results is that of graphical causal modeling, presented in Section 3.1, as the purpose is to produce directed acyclic graphs representing the network of causal relations interplaying among the variables of interest. As pointed out by the author, the analysis of highly heterogeneous datasets, such as innovation surveys, implies taking into consideration firms from a variety of countries, regions, sectors, sizes, etc. This can clearly hinder the potential for causal discovery and, if the sample size permits, it calls for a breakdown at a higher level of detail.

However, potential refinements shall be performed keeping in mind that conditioning sub-groups on given variables (e.g. size of firms) may induce statistical dependences that do not exist before this operation; thus, possible results shall be interpreted accordingly. This type of selection bias may induce the so-called Berkson's paradox: two independent events become conditionally, negatively dependent given that at least one of them occurs.
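A quick simulation (purely illustrative, with standard normal variables) makes Berkson's paradox tangible: conditioning on the occurrence of at least one of two independent events induces a negative dependence between them.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=100_000)
b = rng.normal(size=100_000)

print(np.corrcoef(a, b)[0, 1])            # ~0: a and b are independent

sel = (a > 0) | (b > 0)                   # condition on "at least one event occurs"
print(np.corrcoef(a[sel], b[sel])[0, 1])  # clearly negative: Berkson's paradox
```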

To conclude this introduction, it is obvious that only a correct and cautious design and estimation of the statistical model can yield sound results. Indeed, as the modeling process is made up of several stages (i.e. assumptions, statistical and causal information, background knowledge, etc.), it should be clear that the results of empirical studies are primarily valid within the model under consideration, and it is each separate step of the modeling procedure that contributes to the overall validity of the model.

4.1 Graphic Causal Discovery

The first step in Janzing’s procedure consists of applying some of the concepts developed in these decades by both UCLA and Carnegie-Mellon University scho-lars to identify a first network of causal interactions. It is noteworthy that the author restricted the use of constraint-based search algorithm to the possibility of directed acyclic graphs only. This implies that neither feedback between variables and cycles are admitted as possible discoveries.

Indeed, the approach is able to detect only one of the alternative scenarios that arise from the Reichenbach principle: 1) X −→ Y, 2) Y −→ X, and 3) there is a common cause Z influencing both variables. Additionally, the author assumes that no more than one of the above cases can occur within the same network. This is a further assumption that may rarely be satisfied in real-life situations, where reverse causation and cyclic processes are typical; however, there are two reasons that explain this choice.

The first is rather technical and concerns the applicability and procedures of the additive noise model that will be employed in the second part of the analysis. In fact, the latter works only in situations representable by directed acyclic graphs, and it has the big limitation of not being able to pick one estimated model when more than one fits the data well. The second is less a reason than a justification, which relies upon the fact that it may be more fruitful, for practical investigation purposes, to focus on the main causal relations. This could raise several philosophical questions on the soundness of this choice, but it is also true that statements like "all variables influence each other" have little potential in terms of policy advising.

As mentioned before, causal graphs require the application of the Markov condition, which was presented earlier. To put the definition into practice, it is possible to consider the following example of a directed acyclic graph:

X −→ Y −→ Z (4.1)

By the causal Markov condition, it is possible to factorize the joint density corresponding to (4.1):

p(x, y, z) = p(x)p(y|x)p(z|x, y), but note that p(z|x, y) = p(z|y)    (4.2)

The Markov condition is a key feature of graphic causal models, but it is also subject to some limitations and pitfalls. The first to notice a weakness was Udny Yule.
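A small simulation (a linear Gaussian chain, chosen only for illustration) shows the screening-off property numerically: X and Z are marginally dependent, yet their partial correlation given Y is approximately zero, in line with Equation 4.2.

```python
import numpy as np

# A minimal sketch, assuming the chain X -> Y -> Z with linear Gaussian links.
rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)
z = 0.8 * y + rng.normal(size=n)

print(np.corrcoef(x, z)[0, 1])    # clearly non-zero: X and Z are dependent

# Partial correlation of X and Z given Y: correlate the residuals
# of x and z after regressing each on y.
rx = x - np.polyval(np.polyfit(y, x, 1), y)
rz = z - np.polyval(np.polyfit(y, z, 1), y)
print(np.corrcoef(rx, rz)[0, 1])  # ~0: Y screens X off from Z
```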
