N ETWORK S CIENCE

(1)

Prof. Marcello Pelillo

Ca’ Foscari University of Venice

a.y. 2016/17

N ^ETWORK S ^CIENCE

Scale-free Networks

(2)

The power law distribution: Discrete vs. Continuous formalism

Discrete Formalism

As node degrees are always positive integers, the discrete formalism captures the

probability that a node has exactly k links:

p _k = Ck ^−γ .

∞

∑ k=1

p _k = 1.

C ∑ ^∞

k=1

k ^−γ = 1 C = 1 _∞

∑ k=1

k ^−γ = 1 ζ(γ) ,

p _k = k ^−γ ζ(γ)

Continuous Formalism

In analytical calculations it is often convenient to assume that the degrees can take up any

positive real value:

p(k) = Ck ^−γ .

C = 1

∞ k ∫

_min

k ^−γ dk

= (γ − 1)k _min ^γ−1

p(k) = (γ − 1)k _min ^γ−1 k ^−γ .

p _k

INTERPRETATION:

∞ k ∫

_min

p(k)dk = 1

Riemann-Zeta function

(3)

a)  Numbers of occurrences of words in the novel Moby Dick by Hermann Melville.

b)  Numbers of citations to scientic papers published in 1981, from time of publication until June 1997.

c)  Numbers of hits on web sites by 60000 users of the America Online Internet service for the day of 1 December 1997.

d)  Numbers of copies of bestselling books sold in the US between 1895 and 1965.

e)  Number of calls received by AT&T telephone customers in the US for a single day.

f)  Magnitude of earthquakes in California between

January 1910 and May 1992. Magnitude is proportional to the logarithm of the maximum amplitude of the earthquake, and hence the distribution obeys a power law even though the horizontal axis is linear.

g)  Diameter of craters on the moon. Vertical axis is measured per square kilometre.

h)  Peak gamma-ray intensity of solar ares in counts per second, measured from Earth orbit between February 1980 and November 1989.

i)  Intensity of wars from 1816 to 1980, measured as battle deaths per 10 000 of the population of the participating countries.

j)  Aggregate net worth in dollars of the richest individuals in the US in October 2003.

k)  Frequency of occurrence of family names in the US in the year 1990.

l)  Populations of US cities in the year 2000.

From: Newman 2006

Power law

(4)

Vilfredo Pareto (1848 – 1923) , Italian economist, political scientist and philosopher, who had

important contributions to our understanding of income distribution and to the analysis of individuals choices.

A number of fundamental principles are named after him, like Pareto efficiency, Pareto distribution (another name for a power-law distribution), the Pareto principle (or 80/20 law).

“80% of the wealthis in the hands of the richest 20% of people.”

The 80/20 rule

Other examples

•  80% of problems can be attributed to 20% of causes.

•  80% of a company's profits come from 20% of its customers

•  80% of a company's complaints come from 20% of its customers

•  80% of a company's profits come from 20% of the time its staff spent

•  80% of a company's revenue comes from 20% of its products

•  80% of a company's sales are made by 20% of its sales staff

•  …

(5)

WORLD WIDE WEB

Snapshots of the World Wide Web sample mapped out by Hawoong Jeong in 1998 [1]. The sequence of images show an increasingly magnified local region of the network. The first panel displays all 325,729 nodes, offering a global view of the full dataset. Nodes with more than 50 links are shown in red and nodes with more than 500 links in purple. The closeups reveal the presence of a few highly connected nodes, called hubs, that accompany scale-free networks.

(6)

Nodes: WWW documents Links: URL links

Over 3 billion documents ROBOT: collects all URL’s found in a document and follows them recursively

Ex p e c te d

R. Albert, H. Jeong, A-L Barabasi, Nature, 401 130 (1999).

WORLD WIDE WEB

Network Science: Scale-Free Property

(7)

Hubs

Section 3

(8)

THE SCALE-FREE PROPERTY 10

Poisson vs. Power-law Distributions Figure 4.4

(d) (

b)

(

a)

(

c)

(a) Comparing a Poisson function with a power-law function ( = 2.1) on a linear plot.

Both distributions have k = 10.

(b) The same curves as in (a), but shown on a log-log plot, allowing us to inspect the difference between the two functions in the high-

k

regime.

(c) A random network with k = 3 and

N

= 50, illustrating that most nodes have comparable degree k k .

(d) A scale-free network with =2.1 and k = 3, illustrating that numerous small-degree nodes coexist with a few highly connected hubs.

The Largest Hub

All real networks are finite. The size of the WWW is estimated to be N 10

¹²

nodes; the size of the social network is the Earth’s population, about N 7 × 10

⁹

. These numbers are huge, but finite. Other networks pale in com- parison: The genetic network in a human cell has approximately 20,000 genes while the metabolic network of the E. Coli bacteria has only about a thousand metabolites. This prompts us to ask: How does the network size affect the size of its hubs? To answer this we calculate the expected maxi- mum degree, k

_max

, called the natural cutoff of the degree distribution p

_k

. It represents the expected size of the largest hub in a network.

It is instructive to perform the calculation first for the exponential dis- tribution

For a network with minimum degree k

_min

, the normalization condition

provides C = e

^k^min

. To calculate k

_max

we assume that in a network of N nodes we expect at most one node in the (k

_max

, ∞) regime (ADVANCED TOPICS 3.B). In other words the probability to observe a node whose degree exceeds k

_max

is 1/N:

(4.16) (4.15)

∫

k^∞_min

^{p k dk} ^{( )} ⁼ ¹

∫

k^∞_max

^{p k dk} ^{( )} ⁼ _N ^{1 .}

10⁰

0 10 20 30 40 50

0.05 0.1 0.15

10^-6 10⁰ 10^-1 10^-2 10^-3 10^-4 10^-5

10¹ 10² 10³ POISSON

k k

p_k p_k

p_k ~ k^-2.1

POISSON p_k~ k^-2.1

10⁰

0 10 20 30 40 50

0.05 0.1 0.15

10^-6 10⁰ 10^-1 10^-2 10^-3 10^-4 10^-5

10¹ 10² 10³ POISSON

k k

p_k p_k

p_k ~ k^-2.1

POISSON p_k~ k^-2.1

10⁰

0 10 20 30 40 50

0.05 0.1 0.15

10^-6 10⁰ 10^-1 10^-2 10^-3 10^-4 10^-5

10¹ 10² 10³ POISSON

k k

p_k p_k

p_k ~ k^-2.1

POISSON p_k~ k^-2.1

10⁰

0 10 20 30 40 50

0.05 0.1 0.15

10^-6 10⁰ 10^-1 10^-2 10^-3 10^-4 10^-5

10¹ 10² 10³ POISSON

k k

p_k p_k

p_k ~ k^-2.1

POISSON p_k~ k^-2.1

p(k) = Ce

^k

.

The difference between a power law and an exponential distribution

(9)

The difference between a power law and an exponential distribution

Let us use the WWW to illustrate the properties of the high-k regime.

The probability to have a node with k~100 is

•  About if p

_k

follows a Poisson distribution

•  About if p

_k

follows a power law.

Consequently, if the WWW were to be a random network, according to the Poisson prediction we would expect 10

^-18

k>100 degree nodes, or none.

For a power law degree distribution, we expect about k>100 degree nodes

p ₁₀₀ ≃ 10 ⁻³⁰ p ₁₀₀ ≃ 10 ⁻⁴

N _k>100 = 10 ⁹

(10)

THE SCALE FREE PROPERTY 12 HUBS (a) The degrees of a random network follow a

Poisson distribution, rather similar to the Bell curve. Therefore most nodes have comparable degrees and nodes with a large number of links are absent.

(b) A random network looks a bit like the na- tional highway network in which nodes are cities and links are the major highways connecting them. There are no cities with hundreds of highways and no city is disconnected from the highway system.

(c) In a network with a power-law degree dis- tribution most nodes have only a few links.

These numerous small nodes are held togeth- er by a few highly connected hubs.

(d) A scale-free network looks like the air-traf- fic network, whose nodes are airports and links are the direct flights between them. Most airports are tiny, with only a few flights. Yet, we have a few very large airports, like Chicago or Los Angeles, that act as major hubs, connecting many smaller airports to each other.

Once hubs are present, they change the way we navigate the network. For example, if we travel from Boston to Los Angeles by car, we must drive through many cities. On the air- plane network, however, we can reach most destinations via a single hub, like Chicago.

After [4].

Figure 4.6

Random vs. Scale-free Networks

No highly connected nodes

A few hubs with large number of links Number of nodes with k linksNumber of nodes with k links

Number of links (k)

Number of links (k) Many nodes with only a few links Most nodes have the same number of links POISSON

POWER LAW

No highly connected nodes

Number of links (k)

Number of links (k) Many nodes with only a few links Most nodes have the same number of links POISSON

POWER LAW

(b)

(d) No highly

connected nodes

Number of links (k)

Number of links (k) Many nodes with only a few links Most nodes have the same number of links

POISSON

POWER LAW (a)

(c)

Boston Boston Chicago

Chicago Los Angeles

Los Angeles

The difference between a power law and an exponential distribution

(11)

All real networks are finite à let us explore its consequences.

à We have an expected maximum degree, k

_max

Estimating k

_max

P(k)dk

k_max

∞

∫ ^≈ _N ¹

k _max = k _min N

1 γ −1

Why: we expect at most one node with degree > k

_max

(natural upper cutoff)

P(k)dk

k_max

∞

∫ ^{= (} ^γ ^−1)k

^min

^γ

⁻¹

^k

⁻

^γ ^dk

k_max

∞

∫ ⁼ ₍ ⁽ ₋ ^γ _γ ⁻¹⁾ ₊₁₎ ^k

^min

^γ

⁻¹

^⎡⎣ ^k

⁻

^γ

⁺¹

^⎤⎦

^kmax

∞

= k

_min

^γ

⁻¹

k

_max

^γ

⁻¹

≈ 1

N

The size of the largest hub

(12)

k _max = k _min N

1 γ −1 The size of the largest hub

To illustrate the difference in the maximum degree of an exponential and a scale-free network let us return to the WWW sample, consisting of N ≈ 3 × 10

⁵

nodes.

As k

_min

= 1, if the degree distribution were to follow an exponential, (4.17) predicts that the maximum degree should be k

_max

≈ 14 for λ=1. In a scale-free network of similar size and γ = 2.1, (4.18) predicts k

_max

≈ 95,000, a remarkable difference.

Note that the largest in-degree of the WWW map of Image 4.1 is 10,721, which is comparable to kmax predicted by a scale-free network.

This reinforces our conclusion that in a random network hubs are effectivelly

forbidden, while in scale-free networks they are naturally present.

(13)

Expected maximum degree, k

_max

k

_max

= k

_min

N

1

γ

−1

k

_max

increases with the size of the network

the larger a system is, the larger its biggest hub γ > 2: k

_max

increases slower than N

The largest hub will contain a decreasing fraction of links as N increases.

γ = 2: k

_max

~ N

The size of the biggest hub is O(N)

γ < 2: k

_max

increases faster than N: condensation phenomena

The largest hub will grab an increasing fraction of links. Anomaly!

The size of the largest hub

(14)

k

_max

= k

_min

N

1

γ

−1

THE SCALE FREE PROPERTY 11 HUBS

The expected degree of the largest node (natu- ral cutoff) in scale-free and random networks with the same average degree k = 3. For the scale-free network we chose = 2.5. For com- parison, we also show the linear behavior, k _max N − 1, expected for a complete network.

Overall, hubs in a scale-free network are sev- eral orders of magnitude larger than the big- gest node in a random network with the same N and k .

Figure 4.5

Hubs are Large in Scale-free Networks

(4.18)

k _max ~ k N _min γ − ¹ ¹ .

Equation (4.16) yields

As lnN is a slow function of the system size, (4.17) tells us that the max- imum degree will not be significantly different from k _min . For a Poisson degree distribution the calculation is a bit more involved, but the obtained dependence of k _max on N is even slower than the logarithmic dependence predicted by (4.17) ( ADVANCED TOPICS 3.B ).

For a scale-free network, according to (4.12) and (4.16) , the natural cutoff follows

Hence the larger a network, the larger is the degree of its biggest hub.

The polynomial dependence of k _max on N implies that in a large scale-free network there can be orders of magnitude differences in size between the smallest node, k _min , and the biggest hub, k _max ( Figure 4.5 ).

To illustrate the difference in the maximum degree of an exponential and a scale-free network let us return to the WWW sample of Figure 4.1 , consisting of N 3 × 10 ⁵ nodes. As k _min = 1, if the degree distribution were to follow an exponential, (4.17) predicts that the maximum degree should be k _max 13. In a scale-free network of similar size and = 2.1, (4.18) predicts k _max 85,000, a remarkable difference. Note that the largest in-degree of the WWW map of Figure 4.1 is 10,721, which is comparable to k _max predicted by a scale-free network. This reinforces our conclusion that in a random network hubs are effectivelly forbidden, while in scale-free networks they are naturally present.

In summary the key difference between a random and a scale-free net- work is rooted in the different shape of the Poisson and of the power-law function: In a random network most nodes have comparable degrees and hence hubs are forbidden. Hubs are not only tolerated, but are expected in scale-free networks ( Figure 4.6 ). Furthermore, the more nodes a scale- free network has, the larger are its hubs. Indeed, the size of the hubs grows polynomially with the network size, hence they can grow quite large in scale-free networks. In contrast in a random network the size of the larg- est node grows logarithmically or slower with N, implying that hubs will be tiny even in a very large random network.

k _max

N

10

⁰

10

²

10

⁴

10

⁶

10

⁸

10

¹⁰

10

¹²

10

¹

10

²

10

³

10

⁴

10

⁵

10

⁷

10

⁸

10

⁹

10

¹⁰

RANDOM NETWORK SCALE-FREE

(N - 1)

k _max ~ InN

k _max ~ N

⁽¹⁾

k _max = k _min + ln N

. _(4.17)

The size of the largest hub

The estimated degree of the largest node in scale-free and random networks with the

same average degree

〈 k〉= 3.

For the scale-free network we chose γ = 2.5. For comparison, we also show the linear behavior, k

_max

~ N − 1, expected for a complete network.

Overall, hubs in a

scale-free network are several orders of

magnitude larger than

the biggest node in a

random network with

the same N and 〈k〉

(15)

The meaning of scale-free

For many scale-free networks the degree exponent is between 2 and 3 (Table 4.1). Hence for these in the N ∞ limit the first moment k is finite, but the second and higher moments, k

²

, k

³

, go to infinity. This diver- gence helps us understand the origin of the “scale-free” term. Indeed, if the degrees follow a normal distribution, then the degree of a randomly chosen node is

Yet, the average degree <k> and the fluctuations

_k

have rather different magnitude in random and in scale-free networks:

• Random Networks Have a Scale

For a random network with a Poisson degree distribution

_k

= <k>

^1/2

which is always smaller than k . Hence the network’s nodes have de- grees in the range k = k ± k

^1/2

. In other words nodes in a random network have comparable degrees. Therefore the average degree k serves as the “scale” of a random network.

• Scale-free Networks Lack a Scale

For a network with a power-law degree distribution with < 3 the first moment is finite but the second moment is infinite. The divergence of k

²

(and of

_k

) for large N indicates that the fluctuations around the average can be arbitrary large. This means that when we random- ly choose a node, we do not know what to expect: the selected node’s degree could be tiny or arbitrarily large. Hence networks with < 3 do not have a meaningful internal scale. They are “scale-free” (Figure 4.7).

For example the average degree of the WWW sample is k = 4.60 (Ta- ble 4.1). Given that 2.1, the second moment diverges, which means that our expectation for the in-degree of a randomly chosen WWW document is k =4.60 ± ∞ in the N ∞ limit. That is, a randomly chosen web document could easily yield a document of degree one or two, as 74.02% of nodes have in-degree less than k . Yet, it could also yield a node with hundreds of millions of links, like google.com or facebook.com.

Strictly speaking k

²

diverges only in the N ∞ limit. Yet, the diver- gence is relevant for finite networks as well. To illustrate this, Table 4.1 lists

k

²

and Figure 4.8 shows the standard deviation for ten real networks.

For most of these networks is significantly larger than k , documenting large variations in node degrees. For example, the degree of a randomly chosen node in the WWW sample is k

_in

= 4.60 ± 1545, indicating once again that the average is not informative.

In summary, the scale-free name captures the lack of an internal scale, a consequence of the fact that nodes with widely different degrees coexist in the same network. This feature distinguishes scale-free networks from lattices, in which all nodes have exactly the same degree ( = 0), or from random networks, whose degrees vary in a narrow range ( = k

^1/2

). As we will see in the coming chapters, this divergence is the origin of some of the (4.21)

THE SCALE FREE PROPERTY 14 THE MEANING OF SCALE-FREE

For any bounded distribution, like a Poisson or a Gaussian, the degree of a randomly cho- sen node is in the vicinity of k ^{. Hence} k

serves as the network’s scale. For a power law distribution the second moment can diverge, and the degree of a randomly chosen node can be significantly different from k ^{. Hence} k does not serve as an intrinsic scale . ^{As a}

network with a power law degree distribution lacks an intrinsic scale, we call it scale-free.

Figure 4.7

Lack of an Internal Scale

k = k ± σ

_k

k

²

k

²

σ = −

Random Network

Randomly chosen node:

Scale: k

Scale-Free Network

Randomly chosen node:

Scale: none

= ±

k k k

^1/2

= ± ∞ k k p

_k

k

(16)

The meaning of scale-free

For many scale-free networks the degree exponent is between 2 and 3 ( Table 4.1 ). Hence for these in the N ∞ limit the first moment k is finite, but the second and higher moments, k ² , k ³ , go to infinity. This diver- gence helps us understand the origin of the “scale-free” term. Indeed, if the degrees follow a normal distribution, then the degree of a randomly chosen node is

Yet, the average degree <k> and the fluctuations _k have rather different magnitude in random and in scale-free networks:

• Random Networks Have a Scale

For a random network with a Poisson degree distribution _k = <k> ^1/2 which is always smaller than k . Hence the network’s nodes have de- grees in the range k = k ± k ^1/2 . In other words nodes in a random network have comparable degrees. Therefore the average degree k serves as the “scale” of a random network.

• Scale-free Networks Lack a Scale

For a network with a power-law degree distribution with < 3 the first moment is finite but the second moment is infinite. The divergence of k ² (and of _k ) for large N indicates that the fluctuations around the average can be arbitrary large. This means that when we random- ly choose a node, we do not know what to expect: the selected node’s degree could be tiny or arbitrarily large. Hence networks with < 3 do not have a meaningful internal scale. They are “scale-free” ( Figure 4.7 ).

For example the average degree of the WWW sample is k = 4.60 ( Ta- ble 4.1 ). Given that 2.1, the second moment diverges, which means that our expectation for the in-degree of a randomly chosen WWW document is k =4.60 ± ∞ in the N ∞ limit. That is, a randomly chosen web document could easily yield a document of degree one or two, as 74.02% of nodes have in-degree less than k . Yet, it could also yield a node with hundreds of millions of links, like google.com or facebook.com.

Strictly speaking k ² diverges only in the N ∞ limit. Yet, the diver- gence is relevant for finite networks as well. To illustrate this, Table 4.1 lists k ² and Figure 4.8 shows the standard deviation for ten real networks.

For most of these networks is significantly larger than k , documenting large variations in node degrees. For example, the degree of a randomly chosen node in the WWW sample is k _in = 4.60 ± 1545, indicating once again that the average is not informative.

In summary, the scale-free name captures the lack of an internal scale, a consequence of the fact that nodes with widely different degrees coexist in the same network. This feature distinguishes scale-free networks from lattices, in which all nodes have exactly the same degree ( = 0), or from random networks, whose degrees vary in a narrow range ( = k ^1/2 ). As we will see in the coming chapters, this divergence is the origin of some of the (4.21)

THE SCALE FREE PROPERTY 14 THE MEANING OF SCALE-FREE

For any bounded distribution, like a Poisson or a Gaussian, the degree of a randomly cho- sen node is in the vicinity of k ^{. Hence} k

serves as the network’s scale. For a power law distribution the second moment can diverge, and the degree of a randomly chosen node can be significantly different from k . Hence

k does not serve as an intrinsic scale . ^{As a}

network with a power law degree distribution lacks an intrinsic scale, we call it scale-free.

Figure 4.7

Lack of an Internal Scale

k = k ± σ _k

k ² k ²

σ = −

Random Network

Randomly chosen node: Scale: k

Scale-Free Network

Randomly chosen node: Scale: none

= ±

k k k ^1/2

= ± ∞

k k

p _k

k k

For a random network the standard deviation follows σ = ‹k›

^1/2

shown as a green dashed line on the figure. The symbols show σ for nine of the ten

reference networks, calculated using the values shown in Table 4.1. The actor network has a very large 〈k〉 and σ, hence it omitted for clarity. For each network σ is larger than the value

expected for a random network with the same 〈k〉. The only exception is the power grid, which is not scale-free.

While the phone call network is scale-

free, it has a large γ, hence it is well

approximated by a random network.

(17)

universality

Section 5

(18)

(Faloutsos, Faloutsos and Faloutsos, 1999)

Nodes: computers, routers Links: physical lines

INTERNET BACKBONE

(19)

(20)

( γ = 3)

(S. Redner, 1998)

P(k) ~k ^-γ

1736 PRL papers (1988)

SCIENCE CITATION INDEX

Nodes: papers Links: citations

578...

25 H.E. Stanley,...

(21)

SCIENCE COAUTHORSHIP

M: math

NS: neuroscience

Nodes: scientist (authors) Links: joint publication

(Newman, 2000, Barabasi et al 2001)

(22)

Nodes: online user Links: email contact

Ebel, Mielsch, Bornholdtz, PRE 2002.

Kiel University log files 112 days, N=59,912 nodes

Pussokram.com online community;

512 days, 25,000 users.

Holme, Edling, Liljeros, 2002.

ONLINE COMMUNITIES

(23)

ONLINE COMMUNITIES

(24)

Not all networks are scale-free

Networks appearing in material science, like the network describing the bonds between the atoms in crystalline or amorphous materials, where each node has exactly the same degree.

The neural network of the C.elegans worm.

The power grid, consisting of generators

and switches connected by transmission

lines

(25)

Ultra-small property

Section 6

(26)

DISTANCES IN RANDOM GRAPHS

Random graphs tend to have a tree-like topology with almost constant node degrees.

•  nr. of first neighbors:

•  nr. of second neighbors:

• nr. of neighbours at distance d:

•  estimate maximum distance:

k log

N l _max = log

∑ =

= +

l

max

1 l

i N

k 1

k N ₁ ≅

2 2 k

N ≅

€

N _d ≅ k ^d

(27)

Distances in scale-free networks

Size of the biggest hub is of order O(N). Most nodes can be connected within two layers of it, thus the average path length will be independent of the system size.

The average path length increases slower than logarithmically. In a random network all nodes have comparable degree, thus most paths will have comparable length. In a scale-free network the vast majority of the path go through the few high degree hubs, reducing the distances between nodes.

Some key models produce γ=3, so the result is of particular importance for them. This was first derived by Bollobas and collaborators for the network diameter in the context of a dynamical model, but it holds for the average path length as well.

The second moment of the distribution is finite, thus in many ways the network behaves as a random network. Hence the average path length follows the result that we derived for the random network model earlier.

Cohen, Havlin Phys. Rev. Lett. 90, 58701(2003); Cohen, Havlin and ben-Avraham, in Handbook of Graphs and Networks, Eds. Bornholdt and Shuster (Willy-VCH, NY, 2002) Chap. 4; Confirmed also by: Dorogovtsev et al (2002), Chung and Lu (2002); (Bollobas, Riordan, 2002; Bollobas, 1985; Newman, 2001

Ultra Small World

Small World

SMALL WORLD BEHAVIOR IN SCALE-FREE NETWORKS

(28)

Distances in scale-free networks

SMALL WORLD BEHAVIOR IN SCALE-FREE NETWORKS

THE SCALE-FREE PROPERTY 21

ULTRA-SMALL WORLD PROPERTY

SECTION 4.6

The presence of hubs in scale-free networks raises an interesting ques- tion: Do hubs affect the small world property? Figure 4.4 suggests that they do: Airlines build hubs precisely to decrease the number of hops between two airports. The calculations support this expectation, finding that dis- tances in a scale-free network are smaller than the distances observed in an equivalent random network.

The dependence of the average distance d on the system size N and the degree exponent are captured by the formula [29, 30]

Next we discuss the behavior of d in the four regimes predicted by (4.22), as summarized in Figure 4.11:

According to (4.18) for = 2 the degree of the biggest hub grows linearly with the system size, i.e. k

_max

N. This forces the network into a hub and spoke configuration in which all nodes are close to each other because they all connect to the same central hub. In this regime the average path length does not depend on N.

Equation (4.22) predicts that in this regime the average distance increas- es as lnlnN, a significantly slower growth than the lnN derived for ran- dom networks. We call networks in this regime ultra-small, as the hubs radically reduce the path length [29]. They do so by linking to a large number of small-degree nodes, creating short distances between them.

(4.22)

d

N

N N N

~

const. =2, lnln

ln( 1) 2 3,

ln

lnln =3,

ln 3.

γ

γ γ

〈 〉 − < <

>

⎧

⎨

⎪ ⎪

⎪⎪

⎩

⎪ ⎪

The scaling of the average path length in the four scaling regimes characterizing a cale-free network: constant (γ = 2), lnlnN (2 ‹ γ ‹ 3), lnN/

lnlnN (γ = 3), lnN (γ › 3 and random networks).

The dotted lines mark the approximate size of several real networks. Given their modest size, in biological networks, like the human protein- protein interaction network (PPI), the

differences in the node-to-node distances are relatively small in the four regimes. The differences in 〈d〉 is quite significant for networks of the size of the social network or the WWW. For these the small-world formula significantly underestimates the real 〈d〉.

(b)-(d) Distance distribution for networks of size N = 102, 104, 106, illustrating that while for small networks (N = 102) the distance

distributions are not too sensitive to γ, for large networks (N = 106) pd and 〈d〉 change visibly with γ.

(29)

The role of the degree exponent

THE SCALE-FREE PROPERTY 25 THE ROLE OF THE DEGREE EXPONENT

ANOMALOUS REGIME

DIVERGES

GROWS FASTER THAN

1 2 3

A B

γ

SCALE-FREE REGIME

ULTRA-SMALL WORLD

SMALL WORLD

RANDOM REGIME

No large network can exist here

Indistinguishable from a random network

WWW (OUT)EMAIL (OUT)ACT OR

WWW (IN)MET AB. (IN)

MET AB. (OUT) PRO

TEIN (IN)

COLLABORA TION

INTERNETEMAIL (IN) CITATION (IN)

FINITE

DIVERGES

CRITICAL POINT

k

²

k k

k

²

FINITE

k k

²

d const

2

3 k_max N

k

_max N

d lnlnN d lnN

ln k

ln N lnln N

k

_max

N

^-1

1

FIG. 4.13

THE DEPENDENT PROPERTIES OF SCALE-FREE NETWORKS

(30)

Distances in scale-free networks

Graphicality: No large networks for γ<2

k _max = k _min N

1 In scale-free networks: γ −1 For γ<2: 1/(γ-2)>1

Networks With γ ‹ 2 are Not Graphical

Degree distributions and the corresponding degree sequences for two small networks. The difference between them is in the degree of a single node. While we can build a simple network using the degree distribution (a), it is impossible to build one using (b), as one stub always remains unmatched. Hence (a) is graphical, while (b) is not.

Fraction of networks, g, for a given γ that are graphical. A large number of degree sequences with degree

exponent γ and N = 105 were generated, testing the graphicality of each network. The figure indicates that while

virtually all networks with γ › 2 are graphical, it is impossible to find graphical networks in the 0 ‹ γ ‹ 2 range

(31)

Generating networks with predefined degree seq: Configuration model

(1)  Degree sequence: Assign a degree to each node, represented as stubs or half-links. The degree sequence is either generated

analytically from a preselected distribution, or it is extracted from the adjacency matrix of a real network. We must start from an even number of stubs, otherwise we will be left with unpaired stubs.

(2)  Network assembly: Randomly select a stub pair and connect them. Then randomly

choose another pair from the remaining stubs and connect them. This procedure is repeated until all stubs are paired up.

Depending on the order in which the stubs were chosen, we obtain different networks.

Some networks include cycles (2b), others self- loops (2c) or multi-edges (2d).

Yet, the expected number of self- and multi-edges goes to zero in the limit.

p _ij = k _i k _j 2L − 1

THE SCALE-FREE PROPERTY 28

GENERATING NETWORKS WITH ARBITRARY

DEGREE DISTRIBUTION

SECTION 4.8

The networks generated by the Erd s-Rényi model have a Poisson de- gree distribution. The empirical results discussed in this chapter indicate, however, that the degree distribution of real networks significantly devi- ates from a Poisson form, raising an important question: How do we gen- erate networks with an arbitrary p

_k

? In the following we discuss three fre- quently used algorithms designed for this purpose.

Configuration Model

The configuration model, described in Figure 4.15, helps us build a network with a pre-defined degree sequence. In the network generated by the model each node has a pre-defined degree k

_i

, but otherwise the network is wired randomly. Consequently the obtained network is often called a random network with a pre-defined degree sequence. By repeatedly apply- ing this procedure to the same degree sequence we can generate different networks with the same p

_k

( Figure 4.15b-d). A couple of caveats to consider:

• The probability to have a link between nodes of degree k

_i

and k

_j

is

.

Indeed, a stub starting from node i can connect to 2L - 1 other stubs. Of these, k

_j

are attached to node j. So the probability that a particular stub is connected to a stub of node j is k

_j

/(2L - 1). As node i has k

_i

stubs, it has k

_j

attempts to link to j, resulting in (4.24).

• The obtained network contains self-loops and multi-links, as there is nothing in the algorithm to forbid a node connecting to itself, or to generate multiple links between two nodes. We can choose to reject stub pairs that lead to these, but if we do so, we may not be able to complete the network. Rejecting self-loops or multi-links also means that not all possible matchings appear with equal probability. Hence (4.24) will not be valid, making analytical calculations difficult. Yet, the number of self-loops and multi-links goes to zero for large networks, as the number of choices to connect to increases with N, so typically we do not need to exclude them [42].

Figure 4.15

The Configuration Model

(4.24)

The configuration model builds a network whose nodes have pre-defined degrees [40, 41].

The algorithm consists of the following steps:

(a) Degree Sequence

Assign a degree to each node, represented as stubs or half-links. The degree sequence is either generated analytically from a preselected p

_k

distribution (BOX 4.5), or it is extracted from the adjacency matrix of a real network. We must start from an even number of stubs, otherwise we are left with unpaired stubs.

(b, c, d) Network Assembly

Randomly select a stub pair and connect them. Then randomly choose another pair from the remaining 2L - 2 stubs and con- nect them. This procedure is repeated until all stubs are paired up. Depending on the order in which the stubs were chosen, we obtain different networks. Some networks include cycles (b), others self-loops (c) or multi-links (d). Yet, the expected number of self-loops and multi-links goes to zero in the N ∞ limit.

(a) (b)

(c) (d)

k

₁

=3 k

₂

=2 k

₃

=2 k

₄

=1

= −

p k k

L

ij

2 1

i j

(32)

summary

Section 9

(33)

The Barabási-Albert model

Section 3

(34)

Section 1 Hubs represent the most striking difference between a random and a scale- free network. Their emergence in many real systems raises several

fundamental questions:

•  Why does the random network model of Erd ő s and Rényi fail to reproduce the hubs and the power laws observed in many real networks?

•  Why do so different systems as the WWW or the cell converge to a similar

scale-free architecture?

(35)

networks expand through the addition of new nodes

Barabási & Albert, Science 286, 509 (1999)

BA MODEL: Growth

BA model: Growth

ER model:

the number of nodes, N, is fixed (static models)

THE BARABÁSI-ALBERT MODEL 6

Nodes Prefer to Link to the More Connected Nodes

The random network model assumes that we randomly choose the interaction partners of a node. Yet, most real networks new nodes prefer to link to the more connected nodes, a process called preferential attach- ment (Figure 5.2).

Consider a few examples:

• We are familiar with only a tiny fraction of the trillion or more documents available on the WWW. The nodes we know are not entirely random: We all heard about Google and Facebook, but we rarely encoun- ter the billions of less-prominent nodes that populate the Web. As our knowledge is biased towards the more popular Web documents, we are more likely to link to a high-degree node than to a node with only few links.

• No scientist can attempt to read the more than a million scientific papers published each year. Yet, the more cited is a paper, the more likely that we hear about it and eventually read it. As we cite what we read, our citations are biased towards the more cited publications, repre- senting the high-degree nodes of the citation network.

• The more movies an actor has played in, the more familiar is a casting director with her skills. Hence, the higher the degree of an actor in the actor network, the higher are the chances that she will be considered for a new role.

In summary, the random network model differs from real networks in two important characteristics:

(A) Growth

Real networks are the result of a growth process that continuously increases N. In contrast the random network model assumes that the number of nodes, N, is fixed.

(B) Preferential Attachment

In real networks new nodes tend to link to the more connected nodes.

In contrast nodes in random networks randomly choose their interaction partners.

There are many other differences between real and random networks, some of which will be discussed in the coming chapters. Yet, as we show next, these two, growth and preferential attachment, play a particularly important role in shaping a network’s degree distribution.

Networks are not static, but grow via the addition of new nodes:

(a) The evolution of the number of WWW hosts, documenting the Web’s rapid growth. After http://www.isc.org/solu- tions/survey/history.

(b) The number of scientific papers published in Physical Review since the journal’s funding. The increasing number of papers drives the growth of both the science col- laboration network as well as of the citation network shown in the figure.

(c) Number of movies listed in IMDB.com, driving the growth of the actor network.

Figure 5.1

The Growth of Networks (a)

(b)

(c)

WORLD WIDE WEB

ACTOR NETWORK CITATION NETWORK

GROWTH AND PREFERENTIAL ATTACHMENT YEARS

1880 50000 100000 150000 200000 250000

0

1900 1920 1940 1960 1980 2000 2020

NUMBER OF MOVIES

YEARS

1880 1900 1920 1940 1960 1980 2000 2020 1982

0•10⁰ 1•10⁸ 2•10⁸ 3•10⁸ 4•10⁸ 5•10⁸ 6•10⁸ 8•10⁸ 9•10⁸ 1•10⁹

7•10⁸

1987 1992 1997 2002 2007 2012

NUMBER OF HOSTS

0 50000 100000 150000 200000 250000 300000 400000 450000 350000

NUMBER OF PAPERS

(36)

New nodes prefer to connect to the more connected nodes

Barabási & Albert, Science 286, 509 (1999) Network Science: Evolving Network Models

BA MODEL: Preferential attachment

BA model: Growth

ER model: links are added randomly to the network

(37)

Barabási & Albert, Science 286, 509 (1999) Network Science: Evolving Network Models

Section 2: Growth and Preferential Sttachment

BA model: Growth

The random network model differs from real networks in two important characteristics:

Growth: While the random network model assumes that the number of nodes is fixed (time invariant), real networks are the result of a growth process that continuously increases.

Preferential Attachment: While nodes in random networks randomly choose

their interaction partner, in real networks new nodes prefer to link to the more

connected nodes.

(38)

Barabási & Albert, Science 286, 509 (1999)

P(k) ~k

^-3

(1) Networks continuously expand by the addition of new nodes

WWW : addition of new documents

GROWTH:

add a new node with m links

PREFERENTIAL ATTACHMENT:

the probability that a node connects to a node with k links is proportional to k.

(2) New nodes prefer to link to highly connected nodes.

WWW : linking to well known sites

Network Science: Evolving Network Models

Origin of SF networks: Growth and preferential attachment

j j

i

i k

k k

= Σ

Π ( )

(39)

Section 4 The degree distribution of a network generated by the Barabási-Albert model.

The figure shows p

_k

for a single network of size N=100,000 and m=3. It shows both the linearly-binned

(purple) and the log-binned version (green) of p

_k

. The straight line is added to guide the eye and has slope

γ=3, corresponding to the network’s predicted degree exponent.

(40)

γ = 3

Network Science: Evolving Network Models

Degree distribution

k _i (t) = m t t _i

⎛

⎝ ⎜ ⎞

⎠ ⎟

β

β = 1 2

(i) The degree exponent is independent of m.

(ii) As the power-law describes systems of rather different ages and sizes, it is expected that a correct model should provide a time-independent degree

N ETWORK S CIENCE

Prof. Marcello Pelillo

Ca’ Foscari University of Venice

a.y. 2016/17

N ETWORK S CIENCE

Scale-free Networks

The power law distribution: Discrete vs. Continuous formalism

Discrete Formalism

As node degrees are always positive integers, the discrete formalism captures the

probability that a node has exactly k links:

p k = Ck −γ .

∞

∑ k=1

p k = 1.

C ∑ ∞

k=1

k −γ = 1 C = 1 ∞

∑ k=1

k −γ = 1 ζ(γ) ,

p k = k −γ ζ(γ)

Continuous Formalism

In analytical calculations it is often convenient to assume that the degrees can take up any

positive real value:

p(k) = Ck −γ .

C = 1

∞ k ∫

k −γ dk

= (γ − 1)k min γ−1

p(k) = (γ − 1)k min γ−1 k −γ .

p k

INTERPRETATION:

∞ k ∫

p(k)dk = 1

Power law

Vilfredo Pareto (1848 – 1923) , Italian economist, political scientist and philosopher, who had

important contributions to our understanding of income distribution and to the analysis of individuals choices.

A number of fundamental principles are named after him, like Pareto efficiency, Pareto distribution (another name for a power-law distribution), the Pareto principle (or 80/20 law).

“80% of the wealthis in the hands of the richest 20% of people.”

The 80/20 rule

WORLD WIDE WEB

Nodes: WWW documents Links: URL links

Over 3 billion documents ROBOT: collects all URL’s found in a document and follows them recursively

Ex p e c te d

WORLD WIDE WEB

Hubs

Section 3

(d) (

(

(

k

N

The Largest Hub

All real networks are finite. The size of the WWW is estimated to be N 10

nodes; the size of the social network is the Earth’s population, about N 7 × 10

, called the natural cutoff of the degree distribution p

. It represents the expected size of the largest hub in a network.

It is instructive to perform the calculation first for the exponential dis- tribution

For a network with minimum degree k

, the normalization condition

provides C = e

. To calculate k

we assume that in a network of N nodes we expect at most one node in the (k

, ∞) regime (ADVANCED TOPICS 3.B). In other words the probability to observe a node whose degree exceeds k

is 1/N:

(4.16) (4.15)

∫

p k dk ( ) = 1

∫

p k dk ( ) = N 1 .

p(k) = Ce

.

The difference between a power law and an exponential distribution

The difference between a power law and an exponential distribution

Let us use the WWW to illustrate the properties of the high-k regime.

The probability to have a node with k~100 is

• About if p

follows a Poisson distribution

• About if p

follows a power law.

Consequently, if the WWW were to be a random network, according to the Poisson prediction we would expect 10

N ^ETWORK S ^CIENCE

p _k = Ck ^−γ .

p _k = 1.

C ∑ ^∞

k ^−γ = 1 C = 1 _∞

k ^−γ = 1 ζ(γ) ,

p _k = k ^−γ ζ(γ)

p(k) = Ck ^−γ .

k ^−γ dk

= (γ − 1)k _min ^γ−1

p(k) = (γ − 1)k _min ^γ−1 k ^−γ .

p _k

^{p k dk} ^{( )} ⁼ ¹

^{p k dk} ^{( )} ⁼ _N ^{1 .}

•  About if p

•  About if p

p ₁₀₀ ≃ 10 ⁻³⁰ p ₁₀₀ ≃ 10 ⁻⁴

N _k>100 = 10 ⁹

∫ ^≈ _N ¹

k _max = k _min N

∫ ^{= (} ^γ ^−1)k

^γ

^k

^γ ^dk

∫ ⁼ ₍ ⁽ ₋ ^γ _γ ⁻¹⁾ ₊₁₎ ^k

^γ

^⎡⎣ ^k

^γ

^⎤⎦

^γ

^γ

k _max = k _min N

The expected degree of the largest node (natu- ral cutoff) in scale-free and random networks with the same average degree k = 3. For the scale-free network we chose = 2.5. For com- parison, we also show the linear behavior, k _max N − 1, expected for a complete network.