
Hierarchical structures in ownership networks



Facoltà di Scienze Matematiche, Fisiche e Naturali

Dipartimento di Fisica

Corso di Laurea Magistrale in Fisica

Hierarchical structures in ownership networks

Advisors:

Prof. Fabrizio Lillo

Prof. Salvatore Ruggieri

Prof. Riccardo Mannella

Candidate:

Giulio Bartali

Autumn session

Academic Year 2018/2019


Contents

1 Network science
  1.1 General definitions
  1.2 Degree of a node
  1.3 Random model: properties and distributions
  1.4 The scale-free property
  1.5 Connected components
    1.5.1 The birth of the giant component in the Erdős–Rényi model
  1.6 Clustering coefficient
  1.7 Assortativity
  1.8 Centrality measures
  1.9 Communities
  1.10 Motifs

2 Ownership networks and hierarchy detection: state of the art
  2.1 Ownership network
  2.2 The integrated ownership graph
    2.2.1 Solution of the integrated ownership problem
  2.3 Control in networks and hierarchy identification
    2.3.1 The bow-tie structure
    2.3.2 Network control via non-topological values
    2.3.3 Hierarchy: the topological layers
    2.3.4 Hierarchy: the agony layers
  2.4 Shareholders agreements

3 Dataset, software tools and methods
  3.1 Dataset
  3.2 Software tools
    3.2.1 Python packages
    3.2.2 FANMOD
    3.2.3 Other tools
  3.3 Methods
    3.3.1 Hierarchical clustering
    3.3.2 A greedy algorithm for community extraction

4 Exploratory analysis
  4.1 Distributions analysis
    4.1.1 In-degree distribution
    4.1.2 Out-degree distribution
    4.1.3 WCC size distribution
    4.1.4 Communities
  4.2 Centrality measures
  4.3 Network topology
    4.3.1 Clustering coefficient
    4.3.2 Assortativity
    4.3.3 Motifs

5 Comparing methods of hierarchy identification
  5.1 Agony and topological algorithm: application
    5.1.1 Topological and agony layers
    5.1.2 Jaccard index of layers
  5.2 Agony algorithm on the topological recursive layer
  5.3 Weighted agony algorithm
  5.4 Layers in the reversed graph
  5.5 Further analysis of the differences

6 Structures in ownership networks: shareholders hidden agreements
  6.1 Identification method of shareholders agreements
  6.2 Search for the shareholders hidden agreements
    6.2.1 Ranking the shareholders hidden agreements
  6.3 Decomposition of the shareholders agreements
    6.3.1 First clustering
    6.3.2 Second clustering


Introduction

The study of networks finds applications in different disciplines, from physics to medicine, from biology to economics, and constitutes a branch of complex systems that has had a remarkable development in recent years, in particular since the beginning of the 21st century. Basically, a network is a set of elements (for example particles, people, companies, etc.) represented by nodes that can interact with each other; these interactions are represented by links between nodes. Networks can be directed or undirected: in the first case the links have a direction, while in the second case they do not. If the links have a weight (typically a numerical attribute) the network is called a weighted network.

In this thesis work we study a directed weighted economic network, called an ownership network [8], in which the nodes represent shareholders and companies, the links represent ownership relations and the weights represent the percentages of ownership. In particular we investigate the empirical network of Italian companies, shareholders and shareholdings, taken from a 2012 snapshot of the Italian business register.

Initially we carry out an exploratory analysis of the network and then move on to the first main problem we want to deal with in this thesis, namely the study and identification of a hierarchical structure [11]: a perfect hierarchical structure in a network requires that the set of nodes can be partitioned into an ordered collection of disjoint subsets, which we will call layers, such that links exist only from a node of a lower-rank layer to a node of a higher-rank layer.

In particular, in an ownership network this hierarchical arrangement of nodes means that the layers with a lower rank contain those shareholders which own, and can therefore control, the companies placed in the layers with a higher rank. A parent shareholder controls a firm if it holds an ownership fraction greater than 0.51 (more than 51% of the shares), since typically voting rights are proportional to ownership rights. The ownership or control of a certain company A can be undirected, i.e. the shareholder owns, for example, a company B which in turn owns company A, or directed, i.e. the ownership does not involve intermediate companies [16].

There are several studies in the literature that deal with the problem of network control and hierarchical structure, [3], [8], [10], [19], [20]. There are several ways in which the hierarchical structure can be identified, and therefore the hierarchy we find also depends on the algorithm used. We provide a review of the most interesting and most widely used ones.

In our work we compare two different methods for identifying the hierarchical structure. We want to understand whether, when these two algorithms are applied to our empirical network, the two hierarchical structures obtained are the same or present some differences and, if so, try to understand what kind of node structures generate them. To do this we compare the two results and then design targeted analyses and examples so as to disentangle the results obtained. The first of the two algorithms [9], which we will call the topological algorithm, is based on a pyramidal model: at the top level there is a parent shareholder (a shareholder that owns firms but is not owned by anyone), which owns some companies, which in turn own other companies in a layer with a higher rank, and so on. The second algorithm [13], [11], [22], which we will call the agony algorithm, looks for the partition into layers that minimizes a quantity called agony: this quantity is calculated through a function that penalizes the backward links, i.e. those links that start from a node with a higher rank and arrive at a node with a lower rank.

The second problem we deal with in this thesis is the search for and identification of shareholders hidden agreements. In a shareholders agreement a set of the shareholders of a company own and control the company together, yet none of them has full control when considered individually. We refer to these shareholders as parent shareholders (sometimes they are members of the same family [14]). In particular we focus on the identification of shareholders hidden agreements, in which the parent shareholders hide ownerships through a chain of shareholdings. These agreements are interesting from the point of view of hierarchical structures and firm control. Typically the ownerships of parent shareholders follow a clear hierarchical (or pyramidal) structure which allows them to control a company through small percentages of ownership: for example, if a parent shareholder A controls a firm B with an ownership percentage of 51%, which in turn controls firm C with an ownership percentage of 51%, then A controls C with an undirected ownership percentage of 26%. The proposed identification method makes use of the results obtained from the partition into layers and of the concept of similarity [36]: two nodes are similar if they have common properties. Clearly two shareholders may not have common ownerships at the directed level (directed similarity) but may have them at the undirected level (as in the case of a shareholders hidden agreement). For this reason the similarity we consider is calculated taking into account both directed and undirected ownerships, and we will refer to it as undirected similarity.

Once all the shareholders agreements are identified, we study a ranking based on the directed and undirected similarity of the parent shareholders to highlight the shareholders hidden agreements.

We propose two different clustering algorithms, based on the average percentage of ownership of the firms by the parent shareholders and on the distance between them [21], to break down particularly complex subnetworks (for example those with a large number of parent shareholders) in which it is not immediately clear who controls the commonly owned companies.

This thesis is composed of six chapters. The first three summarize the material found in the state-of-the-art literature, while the other three contain original material. We summarize their content:

• 1 - Network science: in this chapter we give an overview of networks. Starting from a formal definition of a network, we continue with the fundamental tools used in the study of networks and the mathematical methods used to analyze network measures and statistics. Furthermore, an example of a random network model, the Erdős–Rényi model, is introduced, useful for a comparison with the real network that we analyze.

• 2 - Ownership networks and hierarchy detection: state of the art: in this chapter we cover the relevant literature on the problems faced in this thesis. In particular we focus on the description of ownership networks, on the concept of hierarchical structure and on that of control. Furthermore, we discuss in detail the two extraction algorithms that we are going to compare. Finally, we introduce and describe the concept of shareholders agreement.

• 3 - Dataset, software tools and methods: in this chapter we give a brief description of our dataset and of the different types of networks that can be derived from it. Subsequently we review the widely used Python packages and the software tool FANMOD, useful for the search for motifs, i.e. structures formed by a fixed number of nodes. Then we introduce similarity and distance measures between nodes, and the hierarchical clustering algorithm, all tools useful for the analysis of shareholders agreements. Finally we describe the greedy algorithm through which it is possible to search for communities, i.e. groups of nodes with similar features that tend to connect with each other rather than with nodes of other communities.

• 4 - Exploratory analysis: in this chapter we carry out an exploratory analysis of our network to highlight its nature and its structural characteristics. In particular, we compare the results obtained with those known in the literature or with those expected from a random network, and try to give them an interpretation in light of the fact that we are analyzing an economic network.

• 5 - Comparing methods of hierarchy identification: in this chapter we compare the results obtained using the topological algorithm and the agony algorithm for the identification of the hierarchical structure and the related layer partition. Starting from a first direct comparison between the two results, we investigate the origin of the differences found through targeted studies and examples.

• 6 - Structures in ownership networks: shareholders hidden agreements: in this concluding chapter we propose a method for the identification of shareholders agreements and a ranking based on the similarity of parent shareholders to highlight the shareholders hidden agreements. Subsequently we propose two different types of clustering for the breakdown of particularly complex shareholders agreements.

Finally, we present the conclusions and some possible directions for extending the approach of this thesis.


Chapter 1

Network science

In this chapter we introduce the concept of network and describe the fundamental network measures and statistics that characterize it. All these quantities and concepts can be found in more detail in many textbooks and articles.

We also review one of the best-known random network models, the Erdős–Rényi model, in order to compare it with real networks.

1.1 General definitions

Network science is a branch of complex systems that analyzes networks (or graphs): a graph G is a pair (N, L), where N is a finite set of size N and L is a binary relation on N. The set N is called the vertex set of G and its elements are called vertices, or nodes. The set L is called the edge set of G and its elements are called edges, or links; we denote the total number of links by L. In a graph, self-loops, i.e. edges from a vertex to itself, are possible.

A network can be undirected or directed (see [12]):

• In an undirected network G = (N, L), the edge set L consists of unordered pairs of vertices: an edge is a set {u, v} where u, v ∈ N. We use the notation (u, v) for an edge, so in an undirected network (u, v) and (v, u) are the same edge (see Fig. 1.1b). We say that u is a neighbor of v and vice versa. Facebook connections are an example of an undirected network;

• In a directed network G = (N, L), the edge set L consists of ordered pairs of vertices, so we say that the edge (u, v) starts from vertex u and ends in vertex v (see Fig. 1.1a). In this case only v is a neighbor of u. The citation network, in which articles are the nodes and citations are the links, is an example of a directed network.

A network can also be weighted, i.e. the links have a weight¹, or unweighted, i.e. the links do not have a weight and every link is equivalent to any other.

From a directed or undirected graph we can extract a subgraph: we say that a graph G′ = (N′, L′) is a subgraph of G = (N, L) if N′ ⊆ N and L′ ⊆ L. Given a set N′ ⊆ N, the subgraph of G induced by N′ is the graph G′ = (N′, L′), where L′ = {(u, v) ∈ L : u, v ∈ N′}. An example is shown in Fig. 1.1c.

¹For weighted networks we use the notation (u, v, w), where w is the weight of the link.

Figure 1.1: Directed and undirected graphs. (a) A directed graph G = (N, L), where N = {1, 2, 3, 4, 5, 6} and L = {(1, 2), (2, 2), (2, 4), (2, 5), (4, 1), (4, 5), (5, 4), (6, 3)}; the edge (2, 2) is a self-loop. (b) An undirected graph G = (N, L), where N = {1, 2, 3, 4, 5, 6} and L = {(1, 2), (1, 5), (2, 5), (3, 6)}; vertex 4 is isolated. (c) The subgraph of the graph in (a) induced by the node set {1, 2, 3, 6}. Taken from [12].

Computationally speaking, an efficient way to handle a network is to represent it as a matrix. There is a one-to-one correspondence between the graph representation and the matrix representation; in fact we can define the N × N adjacency matrix in the following way [1]:

• In the undirected case the matrix A is symmetric and defined as:

A_{ij} = \begin{cases} 1 & \text{if there is an edge between nodes } i \text{ and } j, \\ 0 & \text{otherwise.} \end{cases}   (1.1)

• In the directed case the matrix A is usually not symmetric and is defined as:

A_{ij} = \begin{cases} 1 & \text{if there is an edge from node } i \text{ to node } j, \\ 0 & \text{otherwise.} \end{cases}   (1.2)

Clearly, instead of 1 we can store the weight of the edge; in this case the adjacency matrix is often indicated with W_ij instead of A_ij.
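As a small illustration of the matrix representation, the following minimal Python sketch (assuming the networkx and numpy packages; the edge list is the toy graph of Fig. 1.1a, nothing here comes from the thesis dataset) builds a directed graph and extracts its adjacency matrix.

```python
import networkx as nx
import numpy as np

# Toy directed graph of Fig. 1.1a; the edge (2, 2) is a self-loop.
edges = [(1, 2), (2, 2), (2, 4), (2, 5), (4, 1), (4, 5), (5, 4), (6, 3)]
G = nx.DiGraph()
G.add_nodes_from(range(1, 7))
G.add_edges_from(edges)

# N x N adjacency matrix of Eq. 1.2: A[i, j] = 1 if there is an edge i -> j.
A = nx.to_numpy_array(G, nodelist=sorted(G.nodes()))
print(A.astype(int))

# For a weighted graph the entries store the weights instead of 1 (matrix W).
Gw = nx.DiGraph()
Gw.add_weighted_edges_from([(1, 2, 0.6), (2, 3, 0.25)])
print(nx.to_numpy_array(Gw, nodelist=sorted(Gw.nodes()), weight="weight"))
```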

1.2 Degree of a node

For an undirected network we can define the degree of a node as the number of links connected to it (see [2]):

k_i = \sum_{j=1}^{N} A_{ij}   (1.3)

We can express the total number of links in the network as a function of the node degrees:

L = \frac{1}{2} \sum_{i=1}^{N} k_i = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} A_{ij}   (1.4)

and using this equation we can express the average degree as a function of L and N:

\langle k \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i = \frac{2L}{N}   (1.5)

For a directed network we can define two quantities as follows:

• out-degree:

k_i^{out} = \sum_{j=1}^{N} A_{ij}   (1.6)

the total number of links that start from the selected node. If we have a weighted network we can also compute the weighted out-degree, i.e. the sum of the weights of the links that start from the selected node:

k_i^{w} = \sum_{j=1}^{N} W_{ij}   (1.7)

this quantity is often called the out-strength [3].

• in-degree:

k_i^{in} = \sum_{j=1}^{N} A_{ji}   (1.8)

the total number of links that end in the selected node. Also in this case, if we have a weighted network, we can compute the weighted in-degree, called the in-strength.

The in-degree and the out-degree are related to the total degree of a node by:

k_i = k_i^{in} + k_i^{out}   (1.9)

and so for the total number of links and the average in/out-degree we have:

L = \frac{1}{2} \sum_{i=1}^{N} k_i = \sum_{i=1}^{N} k_i^{out} = \sum_{i=1}^{N} k_i^{in}   (1.10)

\langle k^{in} \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i^{in} = \langle k^{out} \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i^{out} = \frac{L}{N}   (1.11)
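A minimal numpy sketch of these degree and strength definitions (the matrix values are made up for illustration):

```python
import numpy as np

# Toy weighted adjacency matrix W: W[i, j] is the weight of the link i -> j, 0 if absent.
W = np.array([
    [0.0, 0.6, 0.0],
    [0.0, 0.0, 0.25],
    [0.0, 0.0, 0.0],
])
A = (W > 0).astype(int)      # unweighted adjacency matrix

k_out = A.sum(axis=1)        # Eq. (1.6): out-degree of each node
k_in  = A.sum(axis=0)        # Eq. (1.8): in-degree of each node
s_out = W.sum(axis=1)        # Eq. (1.7): out-strength
s_in  = W.sum(axis=0)        # in-strength

L = A.sum()                  # Eq. (1.10): total number of links
assert L == k_out.sum() == k_in.sum()
print(k_out, k_in, s_out, s_in, L / A.shape[0])   # last value: <k_in> = <k_out> = L/N
```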

1.3 Random model: properties and distributions

In this brief section we study a random network model known as the Erdős–Rényi model. We use a random network to predict some interesting properties and to have a benchmark for our results. In a random network a few properties are fixed, and these give rise to a random structure.

In the Erdős–Rényi model we fix the number of nodes N and the connection probability p, and we compute the mean number of links and the average degree; in this model the links are independent. Usually this kind of random network is denoted G(N, p) [2]. We can compute the probability of having exactly L links in the network as the product of three terms (the following discussion concerns the undirected case) [2]:

• The probability that L of the N(N − 1)/2 possible node pairs are connected: p^L;

• The probability that the remaining N(N − 1)/2 − L pairs are not connected: (1 − p)^{N(N−1)/2 − L};

• A combinatorial factor, \binom{N(N-1)/2}{L}, counting the number of different ways the L links can be placed among the N(N − 1)/2 possible pairs.

So the probability is:

p_L = \binom{N(N-1)/2}{L} (1 - p)^{N(N-1)/2 - L}\, p^L   (1.12)

which is a binomial distribution. From it we can find the average value of L and then the average degree:

\langle L \rangle = \sum_{L=0}^{N(N-1)/2} L\, p_L = p\, \frac{N(N-1)}{2}   (1.13)

\langle k \rangle = \frac{2 \langle L \rangle}{N} = p(N - 1)   (1.14)

To find the degree distribution of this random model we can repeat the same reasoning: the probability that a given node i has degree k is the product of three terms (remember that the maximum degree is N − 1) [2]:

• The probability that i has k edges: p^k;

• The probability that the remaining N − 1 − k possible links are missing: (1 − p)^{N−1−k};

• A combinatorial factor, \binom{N-1}{k}, counting the number of different ways the k links can be chosen among the N − 1 possible links.

So the probability is:

p_k = \binom{N-1}{k} (1 - p)^{N-1-k}\, p^k   (1.15)

which is still a binomial distribution.

Often the average degree ⟨k⟩ is much smaller than the number of nodes N of the network. In this case we can use the relation ⟨k⟩ ≪ N to approximate the binomial distribution with a Poisson distribution:

p_k = \frac{\langle k \rangle^k}{k!}\, e^{-\langle k \rangle}   (1.16)

This is the Poisson approximation of the degree distribution of the G(N, p) random model.
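The Poisson approximation is easy to check numerically; a minimal sketch (assuming networkx and numpy, with illustrative parameters):

```python
import math
import networkx as nx
import numpy as np

N, p = 2000, 0.005                      # <k> = p (N - 1) ≈ 10
G = nx.gnp_random_graph(N, p, seed=42)  # one Erdős–Rényi G(N, p) sample

degrees = np.array([d for _, d in G.degree()])
k_mean = degrees.mean()
print(f"empirical <k> = {k_mean:.2f}, expected p(N-1) = {p * (N - 1):.2f}")

# Compare the empirical degree histogram with the Poisson approximation (Eq. 1.16).
for k in range(5, 16):
    empirical = np.mean(degrees == k)
    poisson = k_mean**k * math.exp(-k_mean) / math.factorial(k)
    print(f"k = {k:2d}  empirical p_k = {empirical:.4f}  Poisson p_k = {poisson:.4f}")
```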

1.4 The scale-free property

If a network were an Erdős–Rényi network, we would expect its degree distribution to follow the Poisson distribution found above. Instead, in many real networks the degree distribution is well approximated by a power-law distribution [35]:

p_k \sim k^{-\gamma} \qquad \text{or} \qquad p_{k_{in}} \sim k_{in}^{-\gamma_{in}}, \quad p_{k_{out}} \sim k_{out}^{-\gamma_{out}}   (1.17)

where γ is called the degree exponent [2].

We define as a scale-free network a network whose degree distribution asymptotically follows a power law with 2 < γ < 3 (see the end of the section). In the power-law case there are nodes with a high degree, called hubs, which do not exist in the random case (see Fig. 1.2).

Generally the distribution is not a pure power law but a truncated power law (see [34]):

p_k \sim k^{-\gamma} e^{-\alpha k}   (1.18)

where α is the cut-off coefficient.

We can normalize the probability distribution in the discrete formalism in this way [2]:

p_k = C k^{-\gamma} \quad \text{with} \quad C = \frac{1}{\sum_{k=1}^{\infty} k^{-\gamma}}   (1.19)

For large k values it is convenient to assume that k is a continuous variable instead of a discrete variable and write the power law as:

p(k) = C k^{-\gamma} \quad \text{with} \quad C = \frac{1}{\int_{k_{min}}^{\infty} k^{-\gamma}\, dk} = (\gamma - 1)\, k_{min}^{\gamma - 1}   (1.20)

where k_min is the smallest degree for which the power law of Eq. (1.19) holds.

In this case p(k)dk is the probability that a given node has a degree between k and k + dk. From the continuous case we can find a relation between k_max (the node with degree k_max is called the largest hub) and the total number of nodes N, assuming that in a network of N nodes we expect at most one node in the (k_max, ∞) regime:

\int_{k_{max}}^{\infty} p(k)\, dk = \frac{1}{N}   (1.21)

and then, using Eq. 1.20, we find:

\int_{k_{max}}^{\infty} C k^{-\gamma}\, dk = \left. \frac{C}{-\gamma + 1}\, k^{-\gamma + 1} \right|_{k_{max}}^{\infty} = \frac{C}{\gamma - 1}\, k_{max}^{-\gamma + 1} = k_{min}^{\gamma - 1}\, k_{max}^{-(\gamma - 1)} = \frac{1}{N} \quad \Rightarrow \quad k_{max} = k_{min}\, N^{\frac{1}{\gamma - 1}}   (1.22)

Figure 1.2: Poisson vs power-law network: (a) comparison between the Poisson distribution and a power-law distribution with γ = 2.1, both with ⟨k⟩ = 11; (b) the same comparison in log-log scale, where the difference between the two distributions in the large-k regime is more evident. In (c) and (d) the size of the nodes is proportional to their degree: (c) a random network with ⟨k⟩ = 3 and N = 50, where most nodes have k ≃ ⟨k⟩; (d) a power-law network with ⟨k⟩ = 3 and N = 50, with numerous small-degree nodes and a few hubs (nodes with a large degree). Taken from [2].

We can now compute the n-th moment of the degree distribution as a function of k_min and k_max:

\langle k^n \rangle = \int_{k_{min}}^{k_{max}} k^n p(k)\, dk = C \int_{k_{min}}^{k_{max}} k^{n - \gamma}\, dk = C\, \frac{k_{max}^{\,n - \gamma + 1} - k_{min}^{\,n - \gamma + 1}}{n - \gamma + 1}   (1.23)

In the limit of a large network, N → ∞ and hence k_max → ∞, we can distinguish two cases [2]:

• If n − γ + 1 ≤ 0, the first term in Eq. 1.23 goes to zero as k_max increases, so all moments with n ≤ γ − 1 are finite.

• If n − γ + 1 > 0, the first term in Eq. 1.23 goes to infinity as k_max increases, so all moments with n > γ − 1 diverge.

Often the degree exponent is between 2 and 3, which means that in the N → ∞ limit the first moment ⟨k⟩ is finite while the second and higher moments diverge. This is why we call this type of network scale-free: such networks do not have a meaningful internal scale.

So in a real network we expect a degree distribution given by Eq. 1.17 or by Eq. 1.18: for example the Internet, citation networks and protein interaction networks are scale-free.
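As a small numerical illustration of Eq. 1.22 (arbitrary toy parameters, not values from the thesis), one can check how the expected largest hub grows with the network size:

```python
# Expected largest degree k_max = k_min * N^(1/(gamma-1)) for a scale-free network (Eq. 1.22).
k_min, gamma = 1, 2.5

for N in (10**3, 10**6, 10**9):
    k_max = k_min * N ** (1.0 / (gamma - 1.0))
    print(f"N = {N:>10d}  expected largest hub k_max ≈ {k_max:,.0f}")

# With 2 < gamma < 3 the largest hub grows polynomially with N, while in a
# random (Poisson) network it grows only very slowly (roughly logarithmically).
```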

1.5 Connected components

A network can consist of one or more connected sets of nodes, i.e. connected components. There is no path between any pair of nodes belonging to different components. A network with more than one component is called disconnected, otherwise it is called connected.

For an undirected network the definition of a connected component is the following: "a component is the subset of the nodes of a network such that there exists at least one path from each member of that subset to each other member, and such that no other node in the network can be added to the subset while preserving this property" [1].

For a directed network we can define two types of connected components:

• Weakly connected component (WCC): in this case we do not consider the direction of the edges, so the result we find is the same as in the undirected case.

• Strongly connected component (SCC): in this case we consider the direction of the edges and still apply the definition above: "a strongly connected component is a maximal subset of nodes such that there is a directed path in both directions between every pair in the subset" [1].

Typically in an undirected network (or in a directed one, if we look at the weakly connected components) we find a large component that includes most of the nodes. In this case we call this component the giant component (usually it covers more than 50% of the network). An example of connected components and of a giant component is reported in Fig. 1.3.

Figure 1.3: Components in an undirected network: the component formed by the red nodes is the giant component; the blue nodes form other, smaller connected components. Taken from the Web (https://www.sci.unich.it/francesc/teaching/network/components.html).

Another interesting aspect of the connected components is their size distribution. In the literature there are several works in which the connected component size distribution is studied. In [26] Newman found the expected distribution² starting from a network in which the degrees follow a power-law distribution: he found that this distribution, in the asymptotic limit, is a truncated power law:

p_s \sim s^{-\gamma} e^{-\alpha s}   (1.24)

The distribution develops the exponential cut-off when a giant component is present.

1.5.1 The birth of the giant component in the Erdős–Rényi model

The existence of a giant component is physically interesting because the size of the largest component undergoes a phase transition from constant size to extensive size at one particular value of p, and hence of ⟨k⟩.

We consider again an Erdős–Rényi random graph and we call u the average fraction of nodes not included in the giant component. The probability that a given node i does not belong to the giant component is given by the sum of two terms [1]:

• i is not connected to j through a link: 1 − p;

• i is connected to j but j does not belong to the giant component: pu; where j is any node in the network except i.

So the probability that i is not connected to the giant component through node j is 1 − p + pu while the complete probability u of not being connected to the giant component through any of the N − 1 other nodes is [1]:

u = (1 − p + pu)^{N−1}   (1.25)

Using Eq. 1.14 and the ⟨k⟩ ≪ N limit we can rearrange the previous equation:

u = \left( 1 - \frac{\langle k \rangle}{N-1}(1-u) \right)^{N-1}

Taking the logarithm of both sides:

\ln u = (N-1) \ln\left( 1 - \frac{\langle k \rangle}{N-1}(1-u) \right) \simeq -(N-1)\,\frac{\langle k \rangle}{N-1}(1-u) = -\langle k \rangle (1-u)

and taking the exponential of both sides:

u = e^{-\langle k \rangle (1-u)}   (1.26)

Since u is the fraction of nodes not in the giant component, we can define S = 1 − u as the fraction of nodes in the giant component; rearranging Eq. 1.26 we find [1]

S = 1 - e^{-\langle k \rangle S}   (1.27)

This equation has no simple closed-form solution for S, but we can still solve it graphically.

Figure 1.4: Graphical solution for the size of the giant component: (a) the three curves show y = 1 − e^{−⟨k⟩S} for different values of ⟨k⟩ (0.5, 1, 1.5), while the diagonal dashed line shows y = S; their intersection gives the solution. (b) The blue line represents the resulting size of the giant component as a function of ⟨k⟩, while the red broken line represents the same quantity obtained from simulations, varying ⟨k⟩ and generating random graphs G(N, p) with N = 100.

In Fig. 1.4a the dashed red line represents the curve y = S while the continuous blue lines represent the curve y = 1 − e^{−⟨k⟩S} for three different values of ⟨k⟩. The intersection of the dashed line with a continuous line gives the graphical solution of Eq. 1.27: we clearly see that for certain values of ⟨k⟩ there is only one solution, corresponding to S = 0, while for other values we have two distinct solutions (one of which is still S = 0). The transition between one and two solutions takes place when:

\left. \frac{d}{dS}\left( 1 - e^{-\langle k \rangle S} \right) \right|_{S=0} = 1   (1.28)

\langle k \rangle = 1   (1.29)

Until now we have only proved that for ⟨k⟩ < 1 a giant component cannot exist while for ⟨k⟩ > 1 it can exist, but not that it has to exist. To prove that, we can follow these steps (see [1]):

• Find a small connected set of nodes in the network with s nodes;

• Divide the set into its core (nodes with connections only within the set) and its periphery (nodes with at least one neighbor outside the set);

• Enlarge the set by adding to it the immediate neighbors of the periphery nodes, so creating a new periphery;

• To find how big the new set is, we can reason as follows: the nodes outside the set are N − s, and the average number of connections that a node in the periphery has to nodes outside the set is

p(N - s) = \langle k \rangle\, \frac{N - s}{N - 1} \simeq \langle k \rangle   (1.30)

where the equality becomes exact in the limit N → ∞. So the average number of immediate neighbors of the set is about ⟨k⟩ times the size of the old periphery³;

• Repeating this argument as many times as we want, the average size of the periphery changes by a factor of ⟨k⟩ at each step, and we can distinguish two cases:

– if ⟨k⟩ < 1 the average size of the periphery shrinks at every step and eventually goes to zero: this is the case in which a giant component cannot exist;

– if ⟨k⟩ > 1 the average size of the periphery grows exponentially and can eventually form a giant component: this is the case in which a giant component can exist.

With these steps we see that indeed we expect a giant component if ⟨k⟩ > 1. The size of the giant component is given by the larger solution of Eq. 1.27.

³This calculation overcounts the number of neighbors if two nodes in the set have the same neighbor outside the set, but the probability of this happening is very small when N is large.

In Fig. 1.4b the blue curve represents the graphical solution of Eq. 1.27 for different values of the average degree ⟨k⟩, while the red curve represents a simulation obtained by varying ⟨k⟩ and generating a random graph G(N, p) with N = 100 and p = ⟨k⟩/(N − 1); for every fixed value of ⟨k⟩ we computed the size of the giant component 100 times and then took the mean value to reduce the effect of statistical noise. Clearly in the simulation there is no sharp transition near ⟨k⟩ = 1: the behavior is smoother and continuous, as shown in the figure.
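The self-consistency equation (1.27) is easy to solve numerically by fixed-point iteration and to compare against simulated G(N, p) graphs; the sketch below (assuming networkx and numpy; the parameters are illustrative) mirrors the kind of comparison shown in Fig. 1.4b.

```python
import networkx as nx
import numpy as np

def giant_component_fraction(k_mean, n_iter=1000):
    """Solve S = 1 - exp(-<k> S) (Eq. 1.27) by fixed-point iteration."""
    S = 1.0                       # start from the non-trivial branch
    for _ in range(n_iter):
        S = 1.0 - np.exp(-k_mean * S)
    return S

N = 100
for k_mean in (0.5, 1.0, 1.5, 2.0):
    p = k_mean / (N - 1)
    # Average the simulated giant-component size over many realizations.
    sizes = []
    for seed in range(100):
        G = nx.gnp_random_graph(N, p, seed=seed)
        sizes.append(len(max(nx.connected_components(G), key=len)) / N)
    # For <k> <= 1 the theoretical S is 0; the finite size N makes the simulated value larger.
    print(f"<k> = {k_mean:.1f}  theory S = {giant_component_fraction(k_mean):.3f}"
          f"  simulation S = {np.mean(sizes):.3f}")
```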

1.6 Clustering coefficient

An interesting notion in complex networks is the clustering coefficient.

Figure 1.5: Example of a walk: two different paths connecting node 1 to node 7 are represented, one in orange and the other in grey; both have length 3 and, since they are the shortest paths connecting node 1 and node 7, 3 is also the distance between these two nodes. Taken from [2].

To give a simple definition of the clustering coefficient we first define the notion of path length [1] (see Fig. 1.5):

• A walk in a network is any route that runs from node to node along the edges;

• A path is a walk that does not intersect itself;

• The length of a path is the number of edges that compose it;

• The distance between two nodes is defined as the length of the shortest path that connects them.

Now we can define the clustering coefficient as the fraction of paths of length two that are closed, i.e. that form a loop of length three (a triangle) in the network:

C = \frac{\text{number of closed paths of length two}}{\text{number of paths of length two}}   (1.31)

"C = 1 implies perfect transitivity (if node u is connected to node v and node v is connected to node w, then u is connected to w too), i.e. a network whose components are all cliques (all nodes in a component are connected to all others)" [1].

The clustering coefficient is a property of each node. The previous definition considers all the nodes in the network and returns a single value of C, but we can give an alternative definition of the clustering coefficient valid individually for each node (called the local clustering coefficient). For the undirected case we have [5]:

c_u = \frac{2\, T(u)}{k_u (k_u - 1)}   (1.32)

where T(u) is the number of triangles having node u as a vertex and k_u is the degree of node u. For the weighted case we can define the clustering coefficient as the geometric average of the subgraph edge weights [5]:

c_u^{w} = \frac{1}{k_u (k_u - 1)} \sum_{v,w} \left( \hat{w}_{uv}\, \hat{w}_{uw}\, \hat{w}_{vw} \right)^{1/3}   (1.33)

where the \hat{w}_{uv} are the weights normalized by the maximum weight in the network, \hat{w}_{uv} = w_{uv} / \max(w).

For the directed case the clustering coefficient is defined similarly: as the fraction of all the possible directed triangles (see Fig. 1.6) for an unweighted graph, or through the geometric average of the subgraph edge weights for a weighted graph [6].

Figure 1.6: Directed triangles: all eight directed triangles with node u as one vertex.

We report the unweighted version; for the weighted version one simply replaces the number of triangles with the geometric average of the weights, as in the undirected case:

c_u = \frac{T(u)}{k_u (k_u - 1) - 2 k_u^{\leftrightarrow}}   (1.34)

where T(u) is the number of directed triangles having node u as a vertex (see Fig. 1.6), k_u is the total degree of node u and k_u^{\leftrightarrow} is the reciprocal degree⁴ of u (see also [4]).

Another equivalent way to define the local clustering coefficient of Eq. 1.32 is [2]:

c_i = \frac{2 L_i}{k_i (k_i - 1)}   (1.35)

where L_i is the number of links between the k_i neighbors of node i (i.e. between the nodes to which node i is connected).

In a random undirected network the average number of links between the k_i neighbors of node i is [2]:

\langle L_i \rangle = p\, \frac{k_i (k_i - 1)}{2}   (1.36)

so using Eq. 1.35 we have, for a node in a random undirected network:

c_i = \frac{2 \langle L_i \rangle}{k_i (k_i - 1)} = p = \frac{\langle k \rangle}{N - 1} \simeq \frac{\langle k \rangle}{N} \quad (\text{in the } \langle k \rangle \ll N \text{ limit})   (1.37)

We obtain a local clustering coefficient which does not depend on the selected node, so the average local clustering coefficient of a random undirected network is still:

c_r = \frac{\langle k \rangle}{N}   (1.38)
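A minimal sketch of how these quantities can be computed in practice (assuming networkx; the graph is a toy example):

```python
import networkx as nx

G = nx.erdos_renyi_graph(200, 0.05, seed=1)

# Local clustering coefficient of each node (Eq. 1.32) and its average.
local_c = nx.clustering(G)
avg_c = nx.average_clustering(G)

# Random-network benchmark of Eq. 1.38: c_r = <k> / N.
N = G.number_of_nodes()
k_mean = sum(dict(G.degree()).values()) / N
print(f"average clustering = {avg_c:.3f}, benchmark <k>/N = {k_mean / N:.3f}")

# For weighted or directed graphs networkx implements the generalizations of
# Eqs. 1.33-1.34, e.g. nx.clustering(D, weight="weight") for a DiGraph D.
```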

1.7 Assortativity

Assortativity is a measure that quantifies the tendency of nodes to connect to other nodes with similar characteristics (assortative mixing) or with different characteristics (disassortative mixing).

Generally the "similar characteristic" used is the degree: in this way we measure the tendency of nodes to connect to other nodes with the same (or a different) degree. For both the undirected and the directed case we can define a general quantity, called the average degree connectivity, computed for every single node [4]:

k_{nn,i} = \frac{1}{|N(i)|} \sum_{j=1}^{N} A_{ij} k_j   (1.39)

where N(i) is the set of neighbors of node i.

For a directed network we can take as target degree k_j the in- or out-degree of the node, and similarly for the source degree (by source we mean the node for which we are computing the k_nn quantity). For the weighted case we can consider the weighted degree.

Instead of having a k_nn value for every node of the network, it is useful to have a single quantity that tells us the general tendency of the network as regards assortativity. One of the most used measures for this purpose is the Pearson correlation coefficient for assortativity.

To define it, we first define the mean µ of the value x_j at the end of an edge (x_j being some quantity of node j we are interested in) [1]:

\mu = \frac{\sum_{ij} A_{ij} x_j}{\sum_{ij} A_{ij}} = \frac{1}{2L} \sum_{j} k_j x_j   (1.40)

where L is the total number of edges in the network and we are considering an undirected network (the final results can be extended to the directed case). So the covariance of x_i and x_j is:

where L is the total number of edges in the network and we are considering an undirected network (the final results can be extended to the directed case). So the covariance of xi and xj is:

cov(xi, xj) = P ijAij(xi− µ)(xj − µ) P ijAij = 1 2L X ij Aij(xixj− xiµ − xjµ + µ2) = = 1 2L X ij Aijxixj − µ2 = 1 2L X ij Aijxixj − 1 (2L)2 X ij kikjxixj = = 1 2L X ij  Aij− kikj 2L  xixj (1.41) and the Pearson correlation coefficient for assortativity (r) is defined as the covariance normalized by its maximum value. So when the quantity xi is the degree

of the node i we have 5:

r = P ij Aij − kikj/2Lkikj P ij kiδij − kikj/2Lkikj (1.42) The value of r is between -1 and 1: a value of r near 1 means a strong correlation between the degrees so an assortative mixing; a value of r near - 1 means a strong anti-correlation between the degrees so a disassortative mixing.

5For a directed network the quantities x
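In practice the degree assortativity coefficient of Eq. 1.42 can be computed directly; a sketch assuming networkx (the graph is illustrative):

```python
import networkx as nx

G = nx.barabasi_albert_graph(1000, 3, seed=7)    # a scale-free toy graph

# Pearson degree-degree correlation coefficient r of Eq. 1.42.
r = nx.degree_assortativity_coefficient(G)
print(f"degree assortativity r = {r:.3f}")

# Average degree connectivity k_nn (Eq. 1.39), averaged over nodes of equal degree.
knn = nx.average_degree_connectivity(G)
print(sorted(knn.items())[:5])

# For a directed graph one can choose which degrees to correlate, e.g.
# nx.degree_assortativity_coefficient(D, x="out", y="in") on a DiGraph D.
```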

1.8 Centrality measures

A centrality measure is a ranking that orders the nodes from the most to the least important according to some specific property or quantity of the node. There are many definitions of centrality measures; we report below the ones we are going to use in this thesis:

• degree centrality: the importance of a node is proportional to its degree, so the node with the largest degree is the most important and so on;

• in-degree centrality: the same as degree centrality but using the in-degree (directed networks only);

• out-degree centrality: the same as degree centrality but using the out-degree (directed networks only);

• Katz centrality [1], [30]: if we call x_i the centrality of node i we have

x_i = \alpha \sum_{j=1}^{N} A_{ij} x_j + \beta, \qquad \mathbf{x} = \beta (I - \alpha A)^{-1} \mathbf{1}   (1.43)

so the centrality of node i is proportional to the centrality of the adjacent nodes plus a small term β given 'for free' to every node. This term is included because of a problem with the eigenvector centrality (which is the same as Katz centrality without the β term), in which only nodes that belong to a strongly connected component of size two or more can have a non-zero eigenvector centrality [1]. To solve the system we can set β = 1, because we are not interested in the absolute values of the centrality but only in the relative ones. Moreover, we have to choose an appropriate value of α, because the term (I − αA)^{-1} diverges when det(I − αA) passes through zero, i.e. when α = 1/λ_max, where λ_max is the largest eigenvalue of A. So we have to choose a value of α smaller than this.

• PageRank centrality [1], [31]: the Katz centrality has the undesirable feature that if a node with high Katz centrality has links to many others, then all of those other nodes also get a high Katz centrality. To correct this we can define the PageRank centrality, in which the centrality a node derives from its neighbors is proportional to their centrality divided by their out-degree:

x_i = \alpha \sum_{j=1}^{N} A_{ij} \frac{x_j}{k_j^{out}} + \beta

In this case, too, α must be small enough for the series to converge: the matrix with entries A_{ij}/k_j^{out} has largest eigenvalue 1 when every node has at least one outgoing link, so in practice one chooses α < 1 (a common choice is α = 0.85).
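A short sketch of how these centralities can be computed (assuming networkx; the graph and parameters are illustrative, not the thesis settings):

```python
import networkx as nx
import numpy as np

G = nx.gnp_random_graph(300, 0.02, directed=True, seed=3)

# Katz centrality (Eq. 1.43): alpha must be below 1/lambda_max of A.
A = nx.to_numpy_array(G)
lambda_max = max(abs(np.linalg.eigvals(A)))
katz = nx.katz_centrality(G, alpha=0.85 / lambda_max, beta=1.0)

# PageRank centrality: here alpha is the damping factor, kept below 1.
pagerank = nx.pagerank(G, alpha=0.85)

# Degree-based centralities.
in_deg = nx.in_degree_centrality(G)
out_deg = nx.out_degree_centrality(G)

top = sorted(pagerank, key=pagerank.get, reverse=True)[:5]
print("top 5 nodes by PageRank:", top)
```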

1.9 Communities

By community we mean a group of nodes with similar characteristics that tend to connect with each other rather than with nodes of other communities.

To better explain what communities mean in a network we can list four fundamental hypotheses (see [2]):

• Fundamental Hypothesis: the community structure of a network is uniquely encoded in the network itself and can be uncovered by analyzing the adjacency matrix A_ij;

• Connectedness and Density Hypothesis: a community is a dense and connected subgraph. By connectedness we mean that starting from any node we can reach every other node in the community. By density we mean that nodes that belong to a community tend to connect to the other members of that community more than to nodes not included in the same community.

• Random Hypothesis: random networks lack an inherent community structure. So if we compare the link density of a community with the link density obtained from a randomly rewired network with the same nodes, we can decide whether the original community corresponds to a dense subgraph or whether its structure emerged by chance.

• Maximal Modularity Hypothesis: the optimal community structure is given by the partition with maximum modularity. The modularity measures the quality of each partition by focusing on its deviation from the random model (see Eq. 1.44).

We consider an undirected network with N nodes, L links and n_c communities. Each community has N_c nodes and L_c links, where c = 1, ..., n_c. The modularity of a community is defined through the difference between the real wiring diagram of the network (A_ij) and the expected number of links between i and j if the network were randomly wired (p_ij) [2]:

M_c = \frac{1}{2L} \sum_{(i,j) \in C_c} \left( A_{ij} - p_{ij} \right)   (1.44)

where p_ij is obtained by randomizing the original network while keeping the expected degree of each node unchanged, so:

p_{ij} = \frac{k_i k_j}{2L}   (1.45)

Replacing p_ij in Eq. 1.44 using Eq. 1.45 we find:

M_c = \frac{1}{2L} \sum_{(i,j) \in C_c} \left( A_{ij} - \frac{k_i k_j}{2L} \right) = \frac{1}{2L} \left( 2L_c - \frac{k_c^2}{2L} \right) = \frac{L_c}{L} - \frac{k_c^2}{4L^2}   (1.46)

where k_c is the total degree of the nodes in the c-th community.

If we sum over all the communities we find the modularity of the complete network:

M = \sum_{c=1}^{n_c} \left[ \frac{L_c}{L} - \left( \frac{k_c}{2L} \right)^2 \right]   (1.47)

Using this definition of modularity, and in light of the fourth hypothesis, there are several ways in which we can find the optimal community partition of the network. In Sec. 3.3.2 we describe the one we are going to use in this thesis.
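A minimal sketch of modularity-based community detection (assuming networkx; this uses the generic greedy modularity maximization shipped with networkx, which may differ in detail from the greedy algorithm described in Sec. 3.3.2):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# A toy graph with a known community structure.
G = nx.karate_club_graph()

communities = greedy_modularity_communities(G)
M = modularity(G, communities)   # Eq. 1.47 evaluated on the found partition

print(f"{len(communities)} communities, modularity M = {M:.3f}")
for c, nodes in enumerate(communities):
    print(f"community {c}: {sorted(nodes)}")
```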

1.10 Motifs

Another typical analysis of a network is the study and search of network motifs (see for example [23]). Network motifs are patterns of interconnections between a fixed number of nodes that occur, in complex networks, much more (or much less) frequently than in randomized networks [7]. There are different types of motifs, since they depend on the number of nodes we fix, called the "size of the motif": for a fixed size there can be several ways to form the pattern, more for a directed network than for an undirected one.

Figure 1.7: Motifs: some examples of motifs of size 4 in a directed graph. Taken from the Web (http://evelinag.com/blog/2014/06-09-comparing-dependency-networks/).

To detect motifs we compare their occurrences in the original network and in randomized networks: a randomized network is obtained by randomizing the original network while keeping the degree of each node unchanged (in the case of a directed network both the in-degree and the out-degree are kept unchanged). The randomization process is repeated several times to obtain different randomized networks, from which we compute the average number of occurrences of each motif and its standard deviation.

Generally the over- or under-expression of a motif with respect to the random case is quantified through the z-score, defined as follows:

z\text{-score} = \frac{N_{\text{motif original}} - N_{\text{motif random}}}{\sigma}   (1.48)

where N_{motif original} is the number of occurrences of the selected motif in the original network, N_{motif random} is the average number of occurrences of the selected motif in the random networks, and σ is its standard deviation.

So the z-score tells us how different the occurrence number of a certain motif is compared to that computed with the randomized networks, in standard deviation units.
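The z-score of Eq. 1.48 is straightforward to compute once the motif counts are available (for example from FANMOD); a minimal sketch with made-up counts:

```python
import numpy as np

# Hypothetical counts: motif occurrences in the original network and in an
# ensemble of degree-preserving randomized networks (toy numbers).
n_original = 1840
n_random = np.array([1210, 1315, 1198, 1275, 1240, 1302, 1288, 1224, 1269, 1251])

mean_random = n_random.mean()
sigma = n_random.std(ddof=1)        # standard deviation over the random ensemble

z_score = (n_original - mean_random) / sigma   # Eq. 1.48
print(f"z-score = {z_score:.2f}")   # large positive value => over-expressed motif
```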


Chapter 2

Ownership networks and hierarchy detection: state of the art

In this chapter we introduce ownership networks.

We start by giving some fundamental characteristics of ownership networks and by explaining how to compute a new graph, called the "integrated ownership" graph, that gives us all the information about directed and undirected ownerships. Then we discuss typical ownership structures and the notion of control. We also define hierarchical structures and study how to identify them in the network with two different algorithms. Finally we discuss the notion of shareholders agreement and why it is interesting from the point of view of hierarchical structure and control.

This chapter covers the relevant literature on the problems of control in networks and of hierarchy identification.

2.1 Ownership network

Ownership networks are a special type of economic network, in which shareholders and companies are represented by nodes and their relations by links. The directed links between nodes represent possession, and the weight of a link (when present) represents the percentage of ownership a shareholder has in a firm. Shareholders can be entities that cannot themselves be owned, such as natural persons, families, cooperative societies, registered associations, foundations, public authorities or other legal entities, or they can be national or transnational corporations [8].

Shareholders invest in a company to generate returns: they own the company together and receive annual dividends¹ in proportion to the shares owned. Moreover, business decisions are made by the board of directors of the company, which is nominated during shareholder meetings with a voting mechanism that generally follows the one-share-one-vote rule: voting rights are proportional to ownership rights (i.e., to the shares owned). Thus if a shareholder owns 51% of the shares, he controls the business decisions [9].

¹That part of the profit that is distributed by a company to its shareholders.

As we said above, the element W_ij of the weighted adjacency matrix is the percentage of ownership that shareholder i has in company j, so we expect that:

\sum_{i=1}^{N} W_{ij} = 1   (2.1)

for every owned node j, i.e. for every node with non-zero in-degree. In practice the weighted in-degree rarely sums exactly to 1, because some ownership data can be missing (percentage values that are very small and therefore negligible, or companies and firms that are external to the network and do not appear in it).

We can formalize the ownership graph using the following definition [9]. We define N = {1, ..., N} as the set of IDs of shareholders and companies. An ownership graph is a weighted directed graph G = (N, L) where a directed edge l = (i, j, w) is in L ⊆ N × N × (0, 1] if node i owns a fraction w > 0 of the shares of node j. The set of incoming neighbors of node j is N_in(j) = {i ∈ N | A_ij = 1}. Analogously, N_out(i) = {j ∈ N | A_ij = 1} is the set of outgoing neighbors of i.

Sometimes we have other information attached to the nodes: they can be characterized by a quantity v_i, called non-topological, that generally represents the market capitalization, the total assets or the operating revenue (see Fig. 2.1).

Figure 2.1: Example of an ownership graph: the nodes represent the shareholders and the firms, while the directed links represent the percentage of ownership; v_j is a non-topological state variable acting as a proxy for their value or size. Taken from [8].

In an ownership network we can find several types of interesting structures. To introduce some terminology and concepts, four examples are represented in Fig. 2.2 [9].

Figure 2.2: Examples of ownership graphs: four typical structures present in an ownership graph, (a) non-canonical, (b) canonical, (c) partnership, (d) pyramidal (chain); see the text for a description. Taken from [9].

• The network in Fig. 2.2a is called non-canonical because of the self-link of node 3. A self-link in an ownership graph means that the issuing company owns some treasury shares, but "treasury shares do not pay a dividend, have no voting rights, and cannot exceed a maximum percentage of total capitalization" [9]. So the percentage of ownership that other shareholders have in that company has to be renormalized through a re-weighting procedure (see the next point).

• The network in Fig. 2.2b is called canonical because the self-link has been removed and the links arriving at node 3 have been re-weighted following this procedure: if w_jj is the weight of a self-link of node j, the re-weighted links l = (i, j, w), for i ≠ j, are l′ = (i, j, w′) with w′ = w/(1 − w_jj) (a small sketch of this re-weighting is given after this list).

• The network in Fig. 2.2c is called a partnership. A partnership is the simplest organization of companies and their shareholders, in which a company is owned by shareholders that own nothing else.

• The network in Fig. 2.2d is called pyramidal and represents a clear example of a chain of ownership.
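A minimal sketch of the canonicalization step just described (plain Python with an edge list; the graph is a toy example and the function name is ours, not from [9]):

```python
def canonicalize(edges):
    """Remove self-links and re-weight incoming links by 1 / (1 - w_jj)."""
    # Weight of the self-link of each node, if any.
    self_w = {i: w for (i, j, w) in edges if i == j}

    canonical = []
    for (i, j, w) in edges:
        if i == j:
            continue                      # drop the self-link (treasury shares)
        canonical.append((i, j, w / (1.0 - self_w.get(j, 0.0))))
    return canonical

# Toy example in the spirit of Fig. 2.2a: node 3 holds 10% of its own shares.
edges = [(1, 3, 0.45), (2, 3, 0.45), (3, 3, 0.10)]
print(canonicalize(edges))   # [(1, 3, 0.5), (2, 3, 0.5)]
```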

2.2 The integrated ownership graph

The original ownership graph does not directly give us all the ownership information. In fact, if we want to compute the total amount of shares O_ij that shareholder i owns of company j, we have to consider both the directed and the undirected ownership relations. For example, in Fig. 2.2d node 2 directly owns 20% of the shares of node 4 and indirectly owns additional shares through the path ⟨2, 3, 4⟩, namely 0.6 · 0.25 = 15%. The matrix O with elements O_ij is called the integrated ownership graph, and the element O_ij is called the integrated ownership of j by i (see [9]). O_ij is defined recursively as:

O_{ij} = W_{ij} + \sum_{k \neq i} O_{ik} W_{kj}   (2.2)

namely, O_ij is the sum of two distinct terms: the directed ownership, given by W_ij, and the undirected ownership through a node k ≠ i, given by O_ik W_kj ².

In matrix notation Eq. 2.2 becomes:

O = (I − \mathrm{diag}(O))\, W + O W   (2.3)

Clearly the adjacency matrix of the integrated ownership graph is not equal to that of the original ownership graph, indicated with A. We will call this new adjacency matrix B.

²"Undirected ownership through i itself is not considered, since this amounts to cycles of self-ownership. Cycles through other companies are instead possible because national legislations admit cross-ownership: contrary to treasury shares, cross-owned shares pay a dividend and have voting rights" [9].

2.2.1 Solution of the integrated ownership problem

We now report the solution of the integrated ownership problem, which consists in the computation of the O matrix. The solution and the algorithm are taken from [9] and are the ones we are going to use in this thesis.

A solution of Eq. 2.3 is (see [16]):

O = \mathrm{diag}(V)^{-1} (V - I) \quad \text{with} \quad V = (I - W)^{-1}   (2.4)

Using the Neumann series we find:

V - I = (I - W)^{-1} - I = \sum_{n \ge 0} W^n - I = \sum_{n \ge 1} W^n   (2.5)

i.e., V − I is the transitive closure of W, so "V_ij, for i ≠ j, is the sum of the weights of all paths from i to j, and V_ii is the sum of the weights of all paths from i to i plus 1" [9]. The matrix I − W is invertible under a condition that is generally satisfied by ownership graphs³.

The final solution is:

O_{ij} = \begin{cases} V_{ij}/V_{ii} & \text{if } i \neq j \\ (V_{ii} - 1)/V_{ii} & \text{if } i = j \end{cases}   (2.6)

We can justify this result in the following way: for i ≠ j, the sum of the weights of all paths from i to j can be divided into the sum over those paths that do not cycle through i plus those that first cycle on i and then do not, so V_ij = O_ij + (V_ii − 1) O_ij, and solving this we obtain O_ij = V_ij / V_ii; for i = j, the sum of all paths is directly V_ii − 1 = O_ii + (V_ii − 1) O_ii, and solving this we obtain O_ii = (V_ii − 1)/V_ii (this quantity is the amount of shares self-owned by shareholder i) [9].
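A minimal numerical sketch of Eqs. 2.4-2.6 (plain numpy; the W matrix is a toy chain inspired by Fig. 2.2d, with node 2 owning 20% of node 4 directly and 60% of node 3, which owns 25% of node 4; the link from node 1 to node 2 is an arbitrary illustrative value):

```python
import numpy as np

# Toy ownership matrix W: W[i, j] = fraction of j owned by i
# (0-indexed nodes 0..3 stand in for nodes 1..4 of Fig. 2.2d).
W = np.zeros((4, 4))
W[0, 1] = 1.0    # node 1 owns 100% of node 2 (illustrative)
W[1, 2] = 0.6    # node 2 owns 60% of node 3
W[1, 3] = 0.2    # node 2 owns 20% of node 4 directly
W[2, 3] = 0.25   # node 3 owns 25% of node 4

V = np.linalg.inv(np.eye(4) - W)      # V = (I - W)^(-1), Eq. 2.4

# Integrated ownership O, Eq. 2.6.
O = V / np.diag(V)[:, None]           # O_ij = V_ij / V_ii for i != j
np.fill_diagonal(O, (np.diag(V) - 1) / np.diag(V))

print(O[1, 3])   # 0.35 = 0.20 direct + 0.60 * 0.25 indirect ownership of node 4 by node 2
```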

2.3 Control in networks and hierarchy identification

Regarding ownership networks, we are particularly interested in the concept of control. By control we mean which shareholder, company, partnership or other business structure owns the majority stake of a firm and has a relevant position in the network, so that it can control the firm and influence strategic decisions during official voting.

From this point of view we are interested in what we can call the hierarchical structure and in the ways it can be identified: who stays at the top of the 'chain' and has the greatest power and control in terms of possession? In this thesis we do not focus on the control computed through non-topological state variables like the v_i cited above, but on a topological notion of control, or a control based on the percentage of ownership, i.e. the quantity stored in the W_ij matrix.

We now review four articles that present typical and advanced studies of this topic. Not all the techniques and studies we are going to discuss are dealt with in this thesis, but some of them help us to give a clear framework of the problem and of the results reached in other works.

In this review of the articles we try to unify the formalism and the language, to simplify the comparison between different types of analysis.

2.3.1 The bow-tie structure

The bow-tie structure (see Fig. 2.3) is one of the most widely used partitions of a network. We do not explicitly study it in this thesis, but the layer partition, which we are going to discuss and use heavily, can be compared with it (see Fig. 2.5). In [10] the authors apply the bow-tie partition to the control network of transnational corporations (an example of an ownership graph). They investigate the architecture of the international ownership network, along with the computation of the control held by each global player.

Figure 2.3: A bow-tie structure: an example of a bow-tie structure in which the five parts of which it is composed are clearly visible. The flow of control goes in the direction opposite to the edges, i.e. from the OUT set to the IN set; see the text for a detailed description. Taken from [10].

Generally the bow-tie partition is applied to the giant weakly connected component. This partition consists of five structures [19] (see Fig. 2.3):

• The SCC, the strongly connected component present in the GCC (see Sec. 1.5 for the definition). "The SCC has explanations such as anti-takeover⁴ strategies, reduction of transaction costs, risk sharing, increasing trust and groups of interest" [19]. This kind of structure weakens market competition.

• The IN set contains nodes that can reach the SCC but cannot be reached by any node in the SCC;

• The OUT set contains nodes that cannot reach the SCC but can be reached by any node in the SCC;

• The Tendrils hang off the IN set and the OUT set, and contain nodes that are reachable from some nodes of the IN set or that can reach nodes of the OUT set, without passing through the SCC;

• The Tubes contain nodes on paths from the IN set to the OUT set that do not touch the SCC.

⁴A takeover is a form of acquisition where a company makes an offer to purchase a certain amount of the equity of another company in order to exercise complete control over it. Anti-takeover measures are tactics to resist a takeover.

In the cited article ([10]), for example, the original network includes 600,508 nodes and 1,006,987 ownership links, and the authors found a giant connected component that contains 3/4 of all nodes: this giant component contains only one dominant strongly connected component (with 1,347 nodes); this core is very small compared to the size of the other bow-tie structures, while the largest structure is the OUT set (with 324,561 nodes).
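A sketch of how a bow-tie decomposition can be computed on a directed graph (assuming networkx; the SCC/IN/OUT sets follow the definitions in the list above, while tendrils and tubes are lumped into a single "other" set for brevity):

```python
import networkx as nx

def bow_tie(G):
    """Split a directed graph into SCC, IN, OUT and the remaining nodes."""
    scc = max(nx.strongly_connected_components(G), key=len)
    seed = next(iter(scc))

    reachable_from_scc = nx.descendants(G, seed) | scc
    reaching_scc = nx.ancestors(G, seed) | scc

    out_set = reachable_from_scc - scc          # reached by the SCC
    in_set = reaching_scc - scc                 # can reach the SCC
    other = set(G) - scc - in_set - out_set     # tendrils, tubes, disconnected nodes
    return scc, in_set, out_set, other

G = nx.gnp_random_graph(500, 0.004, directed=True, seed=11)
scc, in_set, out_set, other = bow_tie(G)
print(len(scc), len(in_set), len(out_set), len(other))
```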

2.3.2 Network control via non-topological values

As said above, we can define control through non-topological values such as, for example, the market capitalization of firms. There are several, often similar, definitions of network control, and it is a concept used in three of the articles we are discussing, [9], [10], [3]. The definition we illustrate below is reported and studied in [3].

We define the portfolio of shareholder i as:

p_i = \sum_{j=1}^{N} W_{ij} v_j   (2.7)

where v_j is the market capitalization of firm j.

To define the integrated portfolio⁵ we should use the integrated ownership matrix O_ij defined in Eq. 2.6:

\tilde{p}_i = \sum_{j=1}^{N} O_{ij} v_j   (2.8)

⁵Namely, the portfolio computed taking into account directed and undirected ownerships. Often this quantity is called the inflow of node i and indicated with φ_i.

To define the control c_i of a certain shareholder, the authors introduce a quantity called the fraction of control:

H_{ij} = \frac{W_{ij}^2}{\sum_{l=1}^{N} W_{lj}^2}   (2.9)

This quantity takes values in (0, 1], so, for example, H_ij ≃ 1 means that i is the most important source node for j. We can now define the control as:

c_i = \sum_{j=1}^{N} H_{ij} v_j   (2.10)

A high c_i value means that there is the possibility of controlling a portfolio with, in this case, a large market capitalization. We can generalize this definition to the integrated ownership graph, so that we have the integrated control through directed and undirected ownership. The integrated version of H_ij can be found by solving the analogue of Eq. 2.3, with H in place of W ⁶:

\tilde{H} = (I − \mathrm{diag}(\tilde{H}))\, H + \tilde{H} H   (2.11)

and for the integrated control we have:

\tilde{c}_i = \sum_{j=1}^{N} \tilde{H}_{ij} v_j   (2.12)
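A small numpy sketch of Eqs. 2.7, 2.9 and 2.10 (toy W and v values, not from the thesis dataset):

```python
import numpy as np

# Toy data: W[i, j] = fraction of company j owned by shareholder i,
# v[j] = market capitalization of company j (arbitrary units).
W = np.array([
    [0.6, 0.1, 0.0],
    [0.3, 0.7, 0.2],
    [0.1, 0.2, 0.8],
])
v = np.array([100.0, 50.0, 10.0])

# Fraction of control H_ij = W_ij^2 / sum_l W_lj^2 (Eq. 2.9): column-wise normalization.
H = W**2 / (W**2).sum(axis=0, keepdims=True)

p = W @ v      # portfolio of each shareholder (Eq. 2.7)
c = H @ v      # control of each shareholder (Eq. 2.10)
print("portfolio:", p)
print("control:  ", c)
```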

Once we have defined the integrated control, we can explain the procedure that the authors use in [3] to extract the backbone of corporate control. The steps are the following:

1. Compute the integrated control value ˜ci for every shareholder and fix a

thresh-old value for the ownership percentage, for example 50%, so that any share-holder with Wij > 0.5 controls by default;

2. Identify the shareholder having the highest ˜civalue and extract from its

portfo-lio the stocks that are owned at more than the said 50%; then the shareholder with the second highest value of ˜ci is selected and we extract, in addition to

the stocks owned by this shareholder at more than the 50%, the stocks which are cumulatively owned by the top two shareholders at more than the fixed threshold value.

3. At every step we update both U_{in}(n), i.e. the set of indices of the stocks that are individually held above the threshold value by the n selected top shareholders, and U_{cu}(n), i.e. the set of indices of the cumulatively controlled companies; moreover we compute the portfolio value:

v_{cu} = \sum_{j \in U_{in}(n)} v_j + \sum_{j \in U_{cu}(n)} v_j    (2.13)

4. Calling n_{tot} the total number of shareholders and v_{tot} the total market value, we normalize n and v_{cu} in this way:

\eta(n) = \frac{n}{n_{tot}}, \qquad \theta(n) = \frac{v_{cu}}{v_{tot}}, \qquad \eta, \theta \in (0, 1]    (2.14)

5. Construct a Lorenz-like curve (see footnote 7) plotting on the x-axis the quantity η and on the y-axis the quantity θ: so, for example, a coordinate pair with value (10^{-2}, 0.5) reveals that the top 1% of shareholders cumulatively control 50% of the total market value.

6. The last step consists in setting a threshold for the percentage of jointly controlled market value \hat{\theta}, e.g. \hat{\theta} = 0.8, so that we can identify the percentage \hat{\eta} of shareholders that theoretically hold the power to control this market value if they were to coordinate their activities and their voting decisions. The subnetwork of these power holders and their portfolios is called the backbone.

6 An alternative way to define \tilde{H}, which may give different results, is: \tilde{H}_{ij} = O_{ij}^2 / \sum_{l=1}^{k_j^{in}} O_{lj}^2

7 In economics, the Lorenz curve is a graphical representation of the distribution of income or of wealth (Wikipedia).

Figure 2.4: Cumulative control: (color online) fraction of shareholders η, sorted by descending (integrated) control value \tilde{c}_i, cumulatively controlling θ percent of the total market value; the horizontal line denotes a market value of 80%; the diagram is in semilogarithmic scale. Taken from [3].

The backbone extraction could be an interesting extension of our thesis work and a way to enrich the concept of hierarchy (see next section) with non-topological values.
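As a rough illustration of how a curve like the one in Fig. 2.4 can be built, the following simplified sketch sorts shareholders by a control vector and accumulates the market value of the stocks jointly held above the threshold by the top n of them; it merges the sets U_{in}(n) and U_{cu}(n) and uses the direct ownership matrix W instead of the integrated quantities, so it only approximates the procedure of [3]:

```python
# Simplified cumulative control curve (see the caveats in the text above).
import numpy as np

def cumulative_control_curve(W: np.ndarray, v: np.ndarray, c: np.ndarray,
                             threshold: float = 0.5):
    n_tot, v_tot = len(c), v.sum()
    order = np.argsort(-c)                      # shareholders by descending control
    eta, theta = [], []
    for n in range(1, n_tot + 1):
        cum_own = W[order[:n], :].sum(axis=0)   # joint ownership of the top n shareholders
        controlled = cum_own > threshold        # stocks held above the threshold
        eta.append(n / n_tot)
        theta.append(v[controlled].sum() / v_tot)
    return eta, theta
```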

2.3.3 Hierarchy: the topological layers

We now show a partition of the network different from the bow-tie decomposition presented in the previous section. With this partition we want to identify the hierarchical structure of the network by ranking the nodes.

Given a network G(N, L), a rank function r: N → {1, ..., R} associates to each node an integer indicating the position in the hierarchy of the subset, or layer, containing the node. R corresponds to the number of subsets found by the ranking algorithm and is not fixed a priori.

The nodes with the smallest value of the ranking variable r are considered the most important ones because they own or control the underlying nodes, i.e. the nodes with a larger ranking value.

In this thesis we use two types of ranking to construct the layer structure, and one of the main problems we want to address is understanding the differences between these two types of ranking.

The one reported in this subsection is called Topological (see Fig. 2.5) and it is studied in [9].

Computationally speaking, the procedure of ranking and layer extraction follows these two steps:

• Step 1: start from the nodes with zero in-degree. These nodes are assigned r = 1, i.e. they are put in the first layer. Then they are removed from the graph, and r = 2 is assigned to the nodes that now have zero in-degree. This procedure is repeated until the algorithm no longer finds nodes with this property. The subgraph V of the nodes not yet assigned to layers contains the bottom layers and possibly the recursive layer (see below);


• Step 2: start from the set V; the nodes with zero out-degree are assigned to the bottom layer, then these nodes are removed from the graph and the algorithm proceeds as in Step 1 but considering the out-degree instead of the in-degree. The nodes not ranked by Step 1 or Step 2 end up in the recursive layer; a sketch of this two-step procedure is given below.
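A minimal sketch of this two-step procedure, assuming the graph is stored as a networkx DiGraph (this is our reading of the algorithm of [9], not the authors' code), could look as follows:

```python
# Two-step topological layering: peel sources from the top, then sinks from
# the bottom; whatever remains forms the recursive layer.
import networkx as nx

def topological_layers(G: nx.DiGraph):
    H = G.copy()
    top_layers = []
    while True:                                   # Step 1: peel zero in-degree nodes
        sources = [n for n in H if H.in_degree(n) == 0]
        if not sources:
            break
        top_layers.append(set(sources))
        H.remove_nodes_from(sources)
    bottom_layers = []
    while True:                                   # Step 2: peel zero out-degree nodes
        sinks = [n for n in H if H.out_degree(n) == 0]
        if not sinks:
            break
        bottom_layers.append(set(sinks))          # collected from the bottom up
        H.remove_nodes_from(sinks)
    recursive_layer = set(H)                      # nodes never peeled by either step
    # Final ranking: top layers, then the recursive layer (if present),
    # then the bottom layers reversed so that r grows downwards.
    return top_layers + ([recursive_layer] if recursive_layer else []) + bottom_layers[::-1]
```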

Formally, a partition of the node set N into R sets L_1, ..., L_R, called layers, is a layered partition if, for every r ∈ [1, R], all nodes in layer L_r have all their incoming edges from nodes in lower layers L_1, ..., L_{r−1}, except for at most one \bar{r} ∈ [1, R] for which all nodes in L_{\bar{r}} have all their incoming edges from nodes in layers L_1, ..., L_{\bar{r}}. Layer \bar{r} is called the recursive layer (it is not always present).

Figure 2.5: Layers in an ownership graph: an example of layer decomposition in an ownership graph; on the right the nodes are enclosed in a kind of bow-tie structure. Taken from [9].

The layer decomposition is closely related to the bow-tie decomposition: the layers L_1, ..., L_{\bar{r}−1} look like the IN set of the bow-tie structure, the recursive layer L_{\bar{r}} looks like the SCC, and the layers L_{\bar{r}+1}, ..., L_R look like the OUT set. Obviously there can be some differences: L_{\bar{r}} is not necessarily strongly connected; some of the nodes with r > \bar{r} can be placed in the IN-tendrils while some of the nodes with r < \bar{r} can be placed in the OUT-tendrils; finally, if a chain does not have nodes in the recursive layer, it is spread along the IN-set/Tubes/OUT-set path.

This kind of partition and ranking helps us uncover the hierarchical structure hidden in the network. As we said above, this is not the only way in which we can rank the nodes, and in the next subsection we present another method.

2.3.4 Hierarchy: the agony layers

We now show another kind of layer partitioning, based on a different ranking algorithm studied in [11] (see also [22]).

The purpose of this partition is again the identification of the hierarchical structure. The algorithm tries to minimize a penalty function, A_f(G, r), whose value, for a given ranking, is called agony (see [11] and [13]); this kind of ranking is often used in many applications such as social network analysis, biology, economics, and finance. With this ranking we want to find an optimal hierarchical structure in which the presence of backward links is suitably penalized: by backward link we mean a link that starts from a node with a certain ranking value and ends in a node with a smaller or equal ranking value, i.e. in a more important node.

This penalization depends on the number of backward links and on the rank distance between the nodes connected through them.

Given a graph G = (N, L) and a ranking r, the value of agony with respect to r is given by [11]:

A_f(G, r) = \sum_{(u,v) \in L} f(r(u) - r(v))    (2.15)

where f is a penalty function.

If the network is weighted, we can define a weighted version of agony as [20]:

A_f(G, r) = \sum_{(u,v) \in L} W_{uv} f(r(u) - r(v))    (2.16)

We can define the penalty function as:

f_d(r(u) - r(v)) = \begin{cases} (r(u) - r(v) + 1)^d & r(u) - r(v) \geq 0 \\ 0 & r(u) - r(v) < 0 \end{cases}, \qquad d \geq 0    (2.17)

In our work we use the penalty function of Eq. 2.17 with d = 1, as used by Nikolaj Tatti in [13]:

f(r(u) - r(v)) = \max(r(u) - r(v) + 1, 0)    (2.18)

As we see from the definition of f, there is a penalization even for links between nodes with the same ranking value: otherwise, the optimal partition would always be the trivial one with all the nodes in the same layer. The exponent d in Eq. 2.17 acts as a tuning parameter: as d increases, only rankings with a strong hierarchical structure are privileged over the trivial one.

The agony of the network is the minimum value of agony with respect to all possible rankings of the nodes (since we fixed the penalty function, from now on we omit the subscript f for the agony), i.e. [11]:

A^*(G) = \min_{r \in \mathcal{R}} A(G, r)    (2.19)

where \mathcal{R} denotes the set of all rankings.

Thanks to the minimization (see Eq. 2.19), the agony of the graph is bounded by:

0 \leq A^*(G) \leq L    (2.20)

where L is still the total number of links; the maximal value is reached with the trivial partition.

We can define a quantity, called hierarchy, for the directed graph as:

h^*(G) = 1 - \frac{A^*(G)}{L}    (2.21)

So the hierarchy is between 0 ≤ h∗(G) ≤ 1 and h∗(G) = 1 indicates a perfect hierarchy.
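For a given ranking r (a dictionary mapping nodes to layer indices), agony and hierarchy are straightforward to evaluate; the minimal sketch below implements Eqs. 2.15, 2.18 and 2.21 but does not perform the minimization of Eq. 2.19, for which exact algorithms exist in the d = 1 case [13]:

```python
# Agony and hierarchy of a ranking with the d = 1 penalty of Eq. 2.18.
import networkx as nx

def agony(G: nx.DiGraph, r: dict) -> float:
    # Eq. 2.15 with f(x) = max(x + 1, 0), x = r(u) - r(v)
    return sum(max(r[u] - r[v] + 1, 0) for u, v in G.edges())

def hierarchy(G: nx.DiGraph, r: dict) -> float:
    # Eq. 2.21 evaluated at the given ranking; for the optimal ranking this is h*(G)
    return 1.0 - agony(G, r) / G.number_of_edges()

# Example: a three-node chain ranked 1 -> 2 -> 3 has no backward links,
# hence agony 0 and hierarchy 1.
G = nx.DiGraph([("a", "b"), ("b", "c")])
rank = {"a": 1, "b": 2, "c": 3}
print(agony(G, rank), hierarchy(G, rank))   # 0 1.0
```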



[Figure 2.6 panel titles: (a), (b) random representation; (c), (d) minimal agony]

Figure 2.6: Optimal rank and agony (d = 1) for simple graphs: panels (a) and (b) show two graphs without any ordering; panels (c) and (d) show, respectively, the same graphs of (a) and (b) with the nodes arranged according to their ranks. The red links are the backward ones, the black links the forward ones. Taken from [11].

In Fig. 2.6 we report two examples of graphs (left and right column), represented both with the nodes organized according to their ranks (bottom figures) and without any ordering (upper figures). In Fig. 2.6c we have A∗ = 0 and therefore h∗ = 1: indeed there are no backward links. In Fig. 2.6d there are 8 backward links, 5 of which connect nodes with the same ranking value (penalty 1 each) and the remaining 3 connect nodes whose ranking values differ by one (penalty 2 each), so A∗ = 11 and h∗ = 0.5.

Finding the optimal ranking is quite complex, especially for d > 1. In the case d = 1, our case, a few exact algorithms to identify the optimal partition that minimizes agony are known [13]. Moreover, this case is proven to be solvable by an algorithm of polynomial complexity.

Once we have found the optimal ranking, we have r ∈ [1, R] and the sets L_1, ..., L_R: as for the topological partition, in the first layer L_1 we put all the nodes with r = 1, in the second all the nodes with r = 2, and so on.

There are two clear differences between the agony ranking and the topological ranking:

• A single recursive layer does not exist in the agony ranking, since all the layers can be recursive, while in the topological ranking, if a recursive layer exists, it is unique;

• The optimal agony ranking may not be unique: there can be degenerate cases in which the minimized agony value A∗ is the same but the rankings are different. In the topological ranking, instead, there is only one possible result.


We deal with these differences in Chapter 5, in which we make a comparison between the topological layers and the agony layers.

2.4 Shareholders agreements

Discussing hierarchy, an interesting structure is the one formed by shareholders agreements: in these structures firms are usually owned by a small number of shareholders, called parent shareholders, which own and control them together. Sometimes those shareholders are family members who inherited their shares from relatives (see [14]), but they can also be companies in partnership with each other [9]. We deal with these shareholders agreements because they are interesting from the point of view of control and hierarchical structure. In particular, we are interested in the shareholders hidden agreements, in which the parent shareholders hide the common ownership through a chain of shareholdings. The parent shareholders often organize the ownership of the firms in a pyramidal structure, i.e. they achieve control of the firms by a chain of ownership relations (for an example see Fig. 2.2d). A typical justification for the existence of these structures argues that pyramids are formed to allow the parent shareholders to achieve control of a firm through a small ownership percentage. For instance, if a parent shareholder directly owns 51% of a firm that in turn owns 51% of a different firm, it also controls the latter firm with an indirect ownership percentage of about 26% (0.51 × 0.51 ≈ 0.26). Securing control through such pyramidal structures can be advantageous for the parent shareholders, especially when the private benefits of control are large [14].

Since we do not know a priori how these shareholders agreements are structured, we have to proceed with a rather general approach to find them. This kind of investigation often starts by running a clustering algorithm over the ownership network, because we expect that shareholders with similar share ownership should fall in the same cluster [9]. In our work we follow this approach, and we deal with this problem in Chapter 6. The authors of [9] also started this detection by running a clustering algorithm, the Markov Cluster Algorithm (MCL), over the ownership network; once they found the clusters, they ranked the importance of each cluster by the number of parent shareholders in the cluster: if there is only one parent shareholder, then the cluster is a corporate group or a part of it.

Fig. 2.7 shows a family business group, i.e. a shareholders agreement in which the parent shareholders (yellow nodes) are family members (see [9]). The nodes are divided according to their topological ranking value. Clearly, even shareholders agreements in which the parent shareholders are not natural persons can look like this one.

Figure 2.7: A family business group (shareholders agreement): the nodes are represented in their topological layers. The three yellow nodes (layer 1) are family members who each own about 1/3 of the shares of each company at layer 2. The two companies at layer 3 are owned directly by two family members and by the company shown in red at layer 2. Taken from [9].
