Prof. Marcello Pelillo
Ca’ Foscari University of Venice
a.y. 2016/17
N ETWORK S CIENCE
Scale-free Networks
The power law distribution: Discrete vs. Continuous formalism
Discrete Formalism
As node degrees are always positive integers, the discrete formalism captures the
probability that a node has exactly k links:
p k = Ck −γ .
∞
∑ k=1
p k = 1.
C ∑ ∞
k=1
k −γ = 1 C = 1 ∞
∑ k=1
k −γ = 1 ζ(γ) ,
p k = k −γ ζ(γ)
Continuous Formalism
In analytical calculations it is often convenient to assume that the degrees can take up any
positive real value:
p(k) = Ck −γ .
C = 1
∞ k ∫
mink −γ dk
= (γ − 1)k min γ−1
p(k) = (γ − 1)k min γ−1 k −γ .
p k
INTERPRETATION:
∞ k ∫
minp(k)dk = 1
Riemann-Zeta function
a) Numbers of occurrences of words in the novel Moby Dick by Hermann Melville.
b) Numbers of citations to scientic papers published in 1981, from time of publication until June 1997.
c) Numbers of hits on web sites by 60000 users of the America Online Internet service for the day of 1 December 1997.
d) Numbers of copies of bestselling books sold in the US between 1895 and 1965.
e) Number of calls received by AT&T telephone customers in the US for a single day.
f) Magnitude of earthquakes in California between
January 1910 and May 1992. Magnitude is proportional to the logarithm of the maximum amplitude of the earthquake, and hence the distribution obeys a power law even though the horizontal axis is linear.
g) Diameter of craters on the moon. Vertical axis is measured per square kilometre.
h) Peak gamma-ray intensity of solar ares in counts per second, measured from Earth orbit between February 1980 and November 1989.
i) Intensity of wars from 1816 to 1980, measured as battle deaths per 10 000 of the population of the participating countries.
j) Aggregate net worth in dollars of the richest individuals in the US in October 2003.
k) Frequency of occurrence of family names in the US in the year 1990.
l) Populations of US cities in the year 2000.
From: Newman 2006
Power law
Vilfredo Pareto (1848 – 1923) , Italian economist, political scientist and philosopher, who had
important contributions to our understanding of income distribution and to the analysis of individuals choices.
A number of fundamental principles are named after him, like Pareto efficiency, Pareto distribution (another name for a power-law distribution), the Pareto principle (or 80/20 law).
“80% of the wealthis in the hands of the richest 20% of people.”
The 80/20 rule
Other examples
• 80% of problems can be attributed to 20% of causes.
• 80% of a company's profits come from 20% of its customers
• 80% of a company's complaints come from 20% of its customers
• 80% of a company's profits come from 20% of the time its staff spent
• 80% of a company's revenue comes from 20% of its products
• 80% of a company's sales are made by 20% of its sales staff
• …
WORLD WIDE WEB
Snapshots of the World Wide Web sample mapped out by Hawoong Jeong in 1998 [1]. The sequence of images show an increasingly magnified local region of the network. The first panel displays all 325,729 nodes, offering a global view of the full dataset. Nodes with more than 50 links are shown in red and nodes with more than 500 links in purple. The closeups reveal the presence of a few highly connected nodes, called hubs, that accompany scale-free networks.
Nodes: WWW documents Links: URL links
Over 3 billion documents ROBOT: collects all URL’s found in a document and follows them recursively
Ex p e c te d
R. Albert, H. Jeong, A-L Barabasi, Nature, 401 130 (1999).
WORLD WIDE WEB
Network Science: Scale-Free Property
Hubs
Section 3
THE SCALE-FREE PROPERTY 10
Poisson vs. Power-law Distributions Figure 4.4
(d) (
b)(
a)(
c)(a) Comparing a Poisson function with a power-law function ( = 2.1) on a linear plot.
Both distributions have k = 10.
(b) The same curves as in (a), but shown on a log-log plot, allowing us to inspect the dif- ference between the two functions in the high-
k
regime.(c) A random network with k = 3 and
N
= 50, illustrating that most nodes have compara- ble degree k k .(d) A scale-free network with =2.1 and k = 3, illustrating that numerous small-degree nodes coexist with a few highly connected hubs.
The Largest Hub
All real networks are finite. The size of the WWW is estimated to be N 10
12nodes; the size of the social network is the Earth’s population, about N 7 × 10
9. These numbers are huge, but finite. Other networks pale in com- parison: The genetic network in a human cell has approximately 20,000 genes while the metabolic network of the E. Coli bacteria has only about a thousand metabolites. This prompts us to ask: How does the network size affect the size of its hubs? To answer this we calculate the expected maxi- mum degree, k
max, called the natural cutoff of the degree distribution p
k. It represents the expected size of the largest hub in a network.
It is instructive to perform the calculation first for the exponential dis- tribution
For a network with minimum degree k
min, the normalization condition
provides C = e
kmin. To calculate k
maxwe assume that in a network of N nodes we expect at most one node in the (k
max, ∞) regime (ADVANCED TOPICS 3.B). In other words the probability to observe a node whose degree exceeds k
maxis 1/N:
(4.16) (4.15)
∫
k∞minp k dk ( ) = 1
∫
k∞maxp k dk ( ) = N 1 .
100
0 10 20 30 40 50
0.05 0.1 0.15
10-6 100 10-1 10-2 10-3 10-4 10-5
101 102 103 POISSON
k k
pk pk
pk ~ k-2.1
POISSON pk ~ k-2.1
100
0 10 20 30 40 50
0.05 0.1 0.15
10-6 100 10-1 10-2 10-3 10-4 10-5
101 102 103 POISSON
k k
pk pk
pk ~ k-2.1
POISSON pk ~ k-2.1
100
0 10 20 30 40 50
0.05 0.1 0.15
10-6 100 10-1 10-2 10-3 10-4 10-5
101 102 103 POISSON
k k
pk pk
pk ~ k-2.1
POISSON pk ~ k-2.1
100
0 10 20 30 40 50
0.05 0.1 0.15
10-6 100 10-1 10-2 10-3 10-4 10-5
101 102 103 POISSON
k k
pk pk
pk ~ k-2.1
POISSON pk ~ k-2.1
p(k) = Ce
k.
The difference between a power law and an exponential distribution
The difference between a power law and an exponential distribution
Let us use the WWW to illustrate the properties of the high-k regime.
The probability to have a node with k~100 is
• About if p
kfollows a Poisson distribution
• About if p
kfollows a power law.
Consequently, if the WWW were to be a random network, according to the Poisson prediction we would expect 10
-18k>100 degree nodes, or none.
For a power law degree distribution, we expect about k>100 degree nodes
p 100 ≃ 10 −30 p 100 ≃ 10 −4
N k>100 = 10 9
THE SCALE FREE PROPERTY 12 HUBS (a) The degrees of a random network follow a
Poisson distribution, rather similar to the Bell curve. Therefore most nodes have comparable degrees and nodes with a large number of links are absent.
(b) A random network looks a bit like the na- tional highway network in which nodes are cit- ies and links are the major highways connect- ing them. There are no cities with hundreds of highways and no city is disconnected from the highway system.
(c) In a network with a power-law degree dis- tribution most nodes have only a few links.
These numerous small nodes are held togeth- er by a few highly connected hubs.
(d) A scale-free network looks like the air-traf- fic network, whose nodes are airports and links are the direct flights between them. Most airports are tiny, with only a few flights. Yet, we have a few very large airports, like Chicago or Los Angeles, that act as major hubs, con- necting many smaller airports to each other.
Once hubs are present, they change the way we navigate the network. For example, if we travel from Boston to Los Angeles by car, we must drive through many cities. On the air- plane network, however, we can reach most destinations via a single hub, like Chicago.
After [4].
Figure 4.6
Random vs. Scale-free Networks
No highly connected nodes
A few hubs with large number of links Number of nodes with k linksNumber of nodes with k links
Number of links (k)
Number of links (k) Many nodes with only a few links Most nodes have the same number of links POISSON
POWER LAW
No highly connected nodes
A few hubs with large number of links Number of nodes with k linksNumber of nodes with k links
Number of links (k)
Number of links (k) Many nodes with only a few links Most nodes have the same number of links POISSON
POWER LAW
(b)
(d) No highly
connected nodes
A few hubs with large number of links Number of nodes with k linksNumber of nodes with k links
Number of links (k)
Number of links (k) Many nodes with only a few links Most nodes have the same number of links
POISSON
POWER LAW (a)
(c)
Boston Boston Chicago
Chicago Los Angeles
Los Angeles
The difference between a power law and an exponential distribution
All real networks are finite à let us explore its consequences.
à We have an expected maximum degree, k
maxEstimating k
maxP(k)dk
kmax
∞
∫ ≈ N 1
k max = k min N
1 γ −1
Why: we expect at most one node with degree > k
max(natural upper cutoff)
P(k)dk
kmax
∞
∫ = ( γ −1)k
minγ
−1k
−γ dk
kmax
∞
∫ = ( ( − γ γ −1) +1) k
minγ
−1⎡⎣ k
−γ
+1⎤⎦
kmax∞
= k
minγ
−1k
maxγ
−1≈ 1
N
The size of the largest hub
k max = k min N
1 γ −1 The size of the largest hub
To illustrate the difference in the maximum degree of an exponential and a scale-free network let us return to the WWW sample, consisting of N ≈ 3 × 10
5nodes.
As k
min= 1, if the degree distribution were to follow an exponential, (4.17) predicts that the maximum degree should be k
max≈ 14 for λ=1. In a scale-free network of similar size and γ = 2.1, (4.18) predicts k
max≈ 95,000, a remarkable difference.
Note that the largest in-degree of the WWW map of Image 4.1 is 10,721, which is comparable to kmax predicted by a scale-free network.
This reinforces our conclusion that in a random network hubs are effectivelly
forbidden, while in scale-free networks they are naturally present.
Expected maximum degree, k
maxk
max= k
minN
1
γ
−1k
maxincreases with the size of the network
the larger a system is, the larger its biggest hub γ > 2: k
maxincreases slower than N
The largest hub will contain a decreasing fraction of links as N increases.
γ = 2: k
max~ N
The size of the biggest hub is O(N)
γ < 2: k
maxincreases faster than N: condensation phenomena
The largest hub will grab an increasing fraction of links. Anomaly!
The size of the largest hub
k
max= k
minN
1
γ
−1THE SCALE FREE PROPERTY 11 HUBS
The expected degree of the largest node (natu- ral cutoff) in scale-free and random networks with the same average degree k = 3. For the scale-free network we chose = 2.5. For com- parison, we also show the linear behavior, k max N − 1, expected for a complete network.
Overall, hubs in a scale-free network are sev- eral orders of magnitude larger than the big- gest node in a random network with the same N and k .
Figure 4.5
Hubs are Large in Scale-free Networks
(4.18)
k max ~ k N min γ − 1 1 .
Equation (4.16) yields
As lnN is a slow function of the system size, (4.17) tells us that the max- imum degree will not be significantly different from k min . For a Poisson degree distribution the calculation is a bit more involved, but the obtained dependence of k max on N is even slower than the logarithmic dependence predicted by (4.17) ( ADVANCED TOPICS 3.B ).
For a scale-free network, according to (4.12) and (4.16) , the natural cutoff follows
Hence the larger a network, the larger is the degree of its biggest hub.
The polynomial dependence of k max on N implies that in a large scale-free network there can be orders of magnitude differences in size between the smallest node, k min , and the biggest hub, k max ( Figure 4.5 ).
To illustrate the difference in the maximum degree of an exponential and a scale-free network let us return to the WWW sample of Figure 4.1 , consisting of N 3 × 10 5 nodes. As k min = 1, if the degree distribution were to follow an exponential, (4.17) predicts that the maximum degree should be k max 13. In a scale-free network of similar size and = 2.1, (4.18) predicts k max 85,000, a remarkable difference. Note that the largest in-degree of the WWW map of Figure 4.1 is 10,721, which is comparable to k max predicted by a scale-free network. This reinforces our conclusion that in a random network hubs are effectivelly forbidden, while in scale-free networks they are naturally present.
In summary the key difference between a random and a scale-free net- work is rooted in the different shape of the Poisson and of the power-law function: In a random network most nodes have comparable degrees and hence hubs are forbidden. Hubs are not only tolerated, but are expected in scale-free networks ( Figure 4.6 ). Furthermore, the more nodes a scale- free network has, the larger are its hubs. Indeed, the size of the hubs grows polynomially with the network size, hence they can grow quite large in scale-free networks. In contrast in a random network the size of the larg- est node grows logarithmically or slower with N, implying that hubs will be tiny even in a very large random network.
k max
N
10
010
210
410
610
810
1010
1210
110
210
310
410
510
710
810
910
10RANDOM NETWORK SCALE-FREE
(N - 1)
k max ~ InN
k max ~ N
(1)k max = k min + ln N
. (4.17)
The size of the largest hub
The estimated degree of the largest node in scale-free and random networks with the
same average degree
〈 k〉= 3.
For the scale-free network we chose γ = 2.5. For comparison, we also show the linear behavior, k
max~ N − 1, expected for a complete network.
Overall, hubs in a
scale-free network are several orders of
magnitude larger than
the biggest node in a
random network with
the same N and 〈k〉
The meaning of scale-free
For many scale-free networks the degree exponent is between 2 and 3 (Table 4.1). Hence for these in the N ∞ limit the first moment k is finite, but the second and higher moments, k
2, k
3, go to infinity. This diver- gence helps us understand the origin of the “scale-free” term. Indeed, if the degrees follow a normal distribution, then the degree of a randomly chosen node is
Yet, the average degree <k> and the fluctuations
khave rather different magnitude in random and in scale-free networks:
• Random Networks Have a Scale
For a random network with a Poisson degree distribution
k= <k>
1/2which is always smaller than k . Hence the network’s nodes have de- grees in the range k = k ± k
1/2. In other words nodes in a random network have comparable degrees. Therefore the average degree k serves as the “scale” of a random network.
• Scale-free Networks Lack a Scale
For a network with a power-law degree distribution with < 3 the first moment is finite but the second moment is infinite. The divergence of k
2(and of
k) for large N indicates that the fluctuations around the average can be arbitrary large. This means that when we random- ly choose a node, we do not know what to expect: the selected node’s degree could be tiny or arbitrarily large. Hence networks with < 3 do not have a meaningful internal scale. They are “scale-free” (Figure 4.7).
For example the average degree of the WWW sample is k = 4.60 (Ta- ble 4.1). Given that 2.1, the second moment diverges, which means that our expectation for the in-degree of a randomly chosen WWW document is k =4.60 ± ∞ in the N ∞ limit. That is, a randomly chosen web document could easily yield a document of degree one or two, as 74.02% of nodes have in-degree less than k . Yet, it could also yield a node with hundreds of millions of links, like google.com or facebook.com.
Strictly speaking k
2diverges only in the N ∞ limit. Yet, the diver- gence is relevant for finite networks as well. To illustrate this, Table 4.1 lists
k
2and Figure 4.8 shows the standard deviation for ten real networks.
For most of these networks is significantly larger than k , documenting large variations in node degrees. For example, the degree of a randomly chosen node in the WWW sample is k
in= 4.60 ± 1545, indicating once again that the average is not informative.
In summary, the scale-free name captures the lack of an internal scale, a consequence of the fact that nodes with widely different degrees coexist in the same network. This feature distinguishes scale-free networks from lattices, in which all nodes have exactly the same degree ( = 0), or from random networks, whose degrees vary in a narrow range ( = k
1/2). As we will see in the coming chapters, this divergence is the origin of some of the (4.21)
THE SCALE FREE PROPERTY 14 THE MEANING OF SCALE-FREE
For any bounded distribution, like a Poisson or a Gaussian, the degree of a randomly cho- sen node is in the vicinity of k . Hence k
serves as the network’s scale. For a power law distribution the second moment can diverge, and the degree of a randomly chosen node can be significantly different from k . Hence k does not serve as an intrinsic scale . As a
network with a power law degree distribution lacks an intrinsic scale, we call it scale-free.
Figure 4.7
Lack of an Internal Scale
k = k ± σ
kk
2k
2σ = −
Random Network
Randomly chosen node:
Scale: k
Scale-Free Network
Randomly chosen node:
Scale: none
= ±
k k k
1/2= ± ∞ k k p
kk
k
The meaning of scale-free
For many scale-free networks the degree exponent is between 2 and 3 ( Table 4.1 ). Hence for these in the N ∞ limit the first moment k is finite, but the second and higher moments, k 2 , k 3 , go to infinity. This diver- gence helps us understand the origin of the “scale-free” term. Indeed, if the degrees follow a normal distribution, then the degree of a randomly chosen node is
Yet, the average degree <k> and the fluctuations k have rather different magnitude in random and in scale-free networks:
• Random Networks Have a Scale
For a random network with a Poisson degree distribution k = <k> 1/2 which is always smaller than k . Hence the network’s nodes have de- grees in the range k = k ± k 1/2 . In other words nodes in a random network have comparable degrees. Therefore the average degree k serves as the “scale” of a random network.
• Scale-free Networks Lack a Scale
For a network with a power-law degree distribution with < 3 the first moment is finite but the second moment is infinite. The divergence of k 2 (and of k ) for large N indicates that the fluctuations around the average can be arbitrary large. This means that when we random- ly choose a node, we do not know what to expect: the selected node’s degree could be tiny or arbitrarily large. Hence networks with < 3 do not have a meaningful internal scale. They are “scale-free” ( Figure 4.7 ).
For example the average degree of the WWW sample is k = 4.60 ( Ta- ble 4.1 ). Given that 2.1, the second moment diverges, which means that our expectation for the in-degree of a randomly chosen WWW document is k =4.60 ± ∞ in the N ∞ limit. That is, a randomly chosen web document could easily yield a document of degree one or two, as 74.02% of nodes have in-degree less than k . Yet, it could also yield a node with hundreds of millions of links, like google.com or facebook.com.
Strictly speaking k 2 diverges only in the N ∞ limit. Yet, the diver- gence is relevant for finite networks as well. To illustrate this, Table 4.1 lists k 2 and Figure 4.8 shows the standard deviation for ten real networks.
For most of these networks is significantly larger than k , documenting large variations in node degrees. For example, the degree of a randomly chosen node in the WWW sample is k in = 4.60 ± 1545, indicating once again that the average is not informative.
In summary, the scale-free name captures the lack of an internal scale, a consequence of the fact that nodes with widely different degrees coexist in the same network. This feature distinguishes scale-free networks from lattices, in which all nodes have exactly the same degree ( = 0), or from random networks, whose degrees vary in a narrow range ( = k 1/2 ). As we will see in the coming chapters, this divergence is the origin of some of the (4.21)
THE SCALE FREE PROPERTY 14 THE MEANING OF SCALE-FREE
For any bounded distribution, like a Poisson or a Gaussian, the degree of a randomly cho- sen node is in the vicinity of k . Hence k
serves as the network’s scale. For a power law distribution the second moment can diverge, and the degree of a randomly chosen node can be significantly different from k . Hence
k does not serve as an intrinsic scale . As a
network with a power law degree distribution lacks an intrinsic scale, we call it scale-free.
Figure 4.7
Lack of an Internal Scale
k = k ± σ k
k 2 k 2
σ = −
Random Network
Randomly chosen node: Scale: k
Scale-Free Network
Randomly chosen node: Scale: none
= ±
k k k 1/2
= ± ∞
k k
p k
k k
For a random network the standard deviation follows σ = ‹k›
1/2shown as a green dashed line on the figure. The symbols show σ for nine of the ten
reference networks, calculated using the values shown in Table 4.1. The actor network has a very large 〈k〉 and σ, hence it omitted for clarity. For each network σ is larger than the value
expected for a random network with the same 〈k〉. The only exception is the power grid, which is not scale-free.
While the phone call network is scale-
free, it has a large γ, hence it is well
approximated by a random network.
universality
Section 5
(Faloutsos, Faloutsos and Faloutsos, 1999)
Nodes: computers, routers Links: physical lines
INTERNET BACKBONE
Network Science: Scale-Free Property
Network Science: Scale-Free Property
( γ = 3)
(S. Redner, 1998)
P(k) ~k -γ
1736 PRL papers (1988)
SCIENCE CITATION INDEX
Nodes: papers Links: citations
578...
25
H.E. Stanley,...
Network Science: Scale-Free Property
SCIENCE COAUTHORSHIP
M: math
NS: neuroscience
Nodes: scientist (authors) Links: joint publication
(Newman, 2000, Barabasi et al 2001)
Network Science: Scale-Free Property
Nodes: online user Links: email contact
Ebel, Mielsch, Bornholdtz, PRE 2002.
Kiel University log files 112 days, N=59,912 nodes
Pussokram.com online community;
512 days, 25,000 users.
Holme, Edling, Liljeros, 2002.
ONLINE COMMUNITIES
ONLINE COMMUNITIES
Not all networks are scale-free
Networks appearing in material science, like the network describing the bonds between the atoms in crystalline or amorphous materials, where each node has exactly the same degree.
The neural network of the C.elegans worm.
The power grid, consisting of generators
and switches connected by transmission
lines
Ultra-small property
Section 6
DISTANCES IN RANDOM GRAPHS
Random graphs tend to have a tree-like topology with almost constant node degrees.
• nr. of first neighbors:
• nr. of second neighbors:
• nr. of neighbours at distance d:
• estimate maximum distance:
k log
N l max = log
∑ =
= +
l
max1 l
i N
k 1
k N 1 ≅
2
2 k
N ≅
€
N d ≅ k d
Network Science: Scale-Free Property
Distances in scale-free networks
Size of the biggest hub is of order O(N). Most nodes can be connected within two layers of it, thus the average path length will be independent of the system size.
The average path length increases slower than logarithmically. In a random network all nodes have comparable degree, thus most paths will have comparable length. In a scale-free network the vast majority of the path go through the few high degree hubs, reducing the distances between nodes.
Some key models produce γ=3, so the result is of particular importance for them. This was first derived by Bollobas and collaborators for the network diameter in the context of a dynamical model, but it holds for the average path length as well.
The second moment of the distribution is finite, thus in many ways the network behaves as a random network. Hence the average path length follows the result that we derived for the random network model earlier.
Cohen, Havlin Phys. Rev. Lett. 90, 58701(2003); Cohen, Havlin and ben-Avraham, in Handbook of Graphs and Networks, Eds. Bornholdt and Shuster (Willy-VCH, NY, 2002) Chap. 4; Confirmed also by: Dorogovtsev et al (2002), Chung and Lu (2002); (Bollobas, Riordan, 2002; Bollobas, 1985; Newman, 2001
Ultra Small World
Small World
SMALL WORLD BEHAVIOR IN SCALE-FREE NETWORKS
Distances in scale-free networks
SMALL WORLD BEHAVIOR IN SCALE-FREE NETWORKS
THE SCALE-FREE PROPERTY 21
ULTRA-SMALL WORLD PROPERTY
SECTION 4.6
The presence of hubs in scale-free networks raises an interesting ques- tion: Do hubs affect the small world property? Figure 4.4 suggests that they do: Airlines build hubs precisely to decrease the number of hops between two airports. The calculations support this expectation, finding that dis- tances in a scale-free network are smaller than the distances observed in an equivalent random network.
The dependence of the average distance d on the system size N and the degree exponent are captured by the formula [29, 30]
Next we discuss the behavior of d in the four regimes predicted by (4.22), as summarized in Figure 4.11:
According to (4.18) for = 2 the degree of the biggest hub grows linearly with the system size, i.e. k
maxN. This forces the network into a hub and spoke configuration in which all nodes are close to each other because they all connect to the same central hub. In this regime the average path length does not depend on N.
Equation (4.22) predicts that in this regime the average distance increas- es as lnlnN, a significantly slower growth than the lnN derived for ran- dom networks. We call networks in this regime ultra-small, as the hubs radically reduce the path length [29]. They do so by linking to a large number of small-degree nodes, creating short distances between them.
(4.22)
d
N
N N N
~
const. =2, lnln
ln( 1) 2 3,
ln
lnln =3,
ln 3.
γ
γ γ
γ γ
〈 〉 − < <
>
⎧
⎨
⎪ ⎪
⎪⎪
⎩
⎪ ⎪
⎪ ⎪
The scaling of the average path length in the four scaling regimes characterizing a cale-free network: constant (γ = 2), lnlnN (2 ‹ γ ‹ 3), lnN/
lnlnN (γ = 3), lnN (γ › 3 and random networks).
The dotted lines mark the approximate size of several real networks. Given their modest size, in biological networks, like the human protein- protein interaction network (PPI), the
differences in the node-to-node distances are relatively small in the four regimes. The differences in 〈d〉 is quite significant for networks of the size of the social network or the WWW. For these the small-world formula significantly underestimates the real 〈d〉.
(b)-(d) Distance distribution for networks of size N = 102, 104, 106, illustrating that while for small networks (N = 102) the distance
distributions are not too sensitive to γ, for large networks (N = 106) pd and 〈d〉 change visibly with γ.
The role of the degree exponent
THE SCALE-FREE PROPERTY 25 THE ROLE OF THE DEGREE EXPONENT
ANOMALOUS REGIME
DIVERGES
DIVERGES
GROWS FASTER THAN
1 2 3
A B
γ
SCALE-FREE REGIME
ULTRA-SMALL WORLD
SMALL WORLD
RANDOM REGIME
No large network can exist here
Indistinguishable from a random network
WWW (OUT)EMAIL (OUT)ACT OR
WWW (IN)MET AB. (IN)
MET AB. (OUT) PRO
TEIN (IN)
COLLABORA TION
INTERNETEMAIL (IN) CITATION (IN)
FINITE
DIVERGES
CRITICAL POINT
k
k
2k k
k
2FINITE
FINITE
k k
2d const
2
3 kmax N
k
max Nd lnlnN d lnN
ln k
ln N lnln N
k
maxN
-11
FIG. 4.13
THE DEPENDENT PROPERTIES OF SCALE-FREE NETWORKS
Distances in scale-free networks
Graphicality: No large networks for γ<2
k max = k min N
1
In scale-free networks: γ −1 For γ<2: 1/(γ-2)>1
Networks With γ ‹ 2 are Not Graphical
Degree distributions and the corresponding degree sequences for two small networks. The difference between them is in the degree of a single node. While we can build a simple network using the degree distribution (a), it is impossible to build one using (b), as one stub always remains unmatched. Hence (a) is graphical, while (b) is not.
Fraction of networks, g, for a given γ that are graphical. A large number of degree sequences with degree
exponent γ and N = 105 were generated, testing the graphicality of each network. The figure indicates that while
virtually all networks with γ › 2 are graphical, it is impossible to find graphical networks in the 0 ‹ γ ‹ 2 range
Generating networks with predefined degree seq: Configuration model
(1) Degree sequence: Assign a degree to each node, represented as stubs or half-links. The degree sequence is either generated
analytically from a preselected distribution, or it is extracted from the adjacency matrix of a real network. We must start from an even number of stubs, otherwise we will be left with unpaired stubs.
(2) Network assembly: Randomly select a stub pair and connect them. Then randomly
choose another pair from the remaining stubs and connect them. This procedure is repeated until all stubs are paired up.
Depending on the order in which the stubs were chosen, we obtain different networks.
Some networks include cycles (2b), others self- loops (2c) or multi-edges (2d).
Yet, the expected number of self- and multi-edges goes to zero in the limit.
p ij = k i k j 2L − 1
THE SCALE-FREE PROPERTY 28
GENERATING NETWORKS WITH ARBITRARY
DEGREE DISTRIBUTION
SECTION 4.8
The networks generated by the Erd s-Rényi model have a Poisson de- gree distribution. The empirical results discussed in this chapter indicate, however, that the degree distribution of real networks significantly devi- ates from a Poisson form, raising an important question: How do we gen- erate networks with an arbitrary p
k? In the following we discuss three fre- quently used algorithms designed for this purpose.
Configuration Model
The configuration model, described in Figure 4.15, helps us build a network with a pre-defined degree sequence. In the network generated by the model each node has a pre-defined degree k
i, but otherwise the network is wired randomly. Consequently the obtained network is often called a random network with a pre-defined degree sequence. By repeatedly apply- ing this procedure to the same degree sequence we can generate different networks with the same p
k( Figure 4.15b-d). A couple of caveats to consider:
• The probability to have a link between nodes of degree k
iand k
jis
.
Indeed, a stub starting from node i can connect to 2L - 1 other stubs. Of these, k
jare attached to node j. So the probability that a particular stub is connected to a stub of node j is k
j/(2L - 1). As node i has k
istubs, it has k
jattempts to link to j, resulting in (4.24).
• The obtained network contains self-loops and multi-links, as there is nothing in the algorithm to forbid a node connecting to itself, or to generate multiple links between two nodes. We can choose to reject stub pairs that lead to these, but if we do so, we may not be able to complete the network. Rejecting self-loops or multi-links also means that not all possible matchings appear with equal probability. Hence (4.24) will not be valid, making analytical calculations difficult. Yet, the number of self-loops and multi-links goes to zero for large networks, as the number of choices to connect to increases with N, so typically we do not need to exclude them [42].
Figure 4.15
The Configuration Model
(4.24)
The configuration model builds a network whose nodes have pre-defined degrees [40, 41].
The algorithm consists of the following steps:
(a) Degree Sequence
Assign a degree to each node, represented as stubs or half-links. The degree sequence is either generated analytically from a preselected p
kdistribution (BOX 4.5), or it is extracted from the adjacency matrix of a real network. We must start from an even number of stubs, otherwise we are left with unpaired stubs.
(b, c, d) Network Assembly
Randomly select a stub pair and connect them. Then randomly choose another pair from the remaining 2L - 2 stubs and con- nect them. This procedure is repeated until all stubs are paired up. Depending on the order in which the stubs were chosen, we obtain different networks. Some networks include cycles (b), others self-loops (c) or multi-links (d). Yet, the expected number of self-loops and multi-links goes to zero in the N ∞ limit.
(a) (b)
(c) (d)
k
1=3 k
2=2 k
3=2 k
4=1
= −
p k k
L
ij
2 1
i jsummary
Section 9
The Barabási-Albert model
Section 3
Section 1
Hubs represent the most striking difference between a random and a scale- free network. Their emergence in many real systems raises several
fundamental questions:
• Why does the random network model of Erd ő s and Rényi fail to reproduce the hubs and the power laws observed in many real networks?
• Why do so different systems as the WWW or the cell converge to a similar
scale-free architecture?
networks expand through the addition of new nodes
Barabási & Albert, Science 286, 509 (1999)
BA MODEL: Growth
BA model: Growth
ER model:
the number of nodes, N, is fixed (static models)
THE BARABÁSI-ALBERT MODEL 6
Nodes Prefer to Link to the More Connected Nodes
The random network model assumes that we randomly choose the in- teraction partners of a node. Yet, most real networks new nodes prefer to link to the more connected nodes, a process called preferential attach- ment (Figure 5.2).
Consider a few examples:
• We are familiar with only a tiny fraction of the trillion or more docu- ments available on the WWW. The nodes we know are not entirely ran- dom: We all heard about Google and Facebook, but we rarely encoun- ter the billions of less-prominent nodes that populate the Web. As our knowledge is biased towards the more popular Web documents, we are more likely to link to a high-degree node than to a node with only few links.
• No scientist can attempt to read the more than a million scientific pa- pers published each year. Yet, the more cited is a paper, the more likely that we hear about it and eventually read it. As we cite what we read, our citations are biased towards the more cited publications, repre- senting the high-degree nodes of the citation network.
• The more movies an actor has played in, the more familiar is a casting director with her skills. Hence, the higher the degree of an actor in the actor network, the higher are the chances that she will be considered for a new role.
In summary, the random network model differs from real networks in two important characteristics:
(A) Growth
Real networks are the result of a growth process that continuously increases N. In contrast the random network model assumes that the number of nodes, N, is fixed.
(B) Preferential Attachment
In real networks new nodes tend to link to the more connected nodes.
In contrast nodes in random networks randomly choose their inter- action partners.
There are many other differences between real and random networks, some of which will be discussed in the coming chapters. Yet, as we show next, these two, growth and preferential attachment, play a particularly im- portant role in shaping a network’s degree distribution.
Networks are not static, but grow via the addition of new nodes:
(a) The evolution of the number of WWW hosts, documenting the Web’s rapid growth. After http://www.isc.org/solu- tions/survey/history.
(b) The number of scientific papers published in Physical Review since the journal’s funding. The increasing number of papers drives the growth of both the science col- laboration network as well as of the cita- tion network shown in the figure.
(c) Number of movies listed in IMDB.com, driving the growth of the actor network.
Figure 5.1
The Growth of Networks (a)
(b)
(c)
WORLD WIDE WEB
ACTOR NETWORK CITATION NETWORK
GROWTH AND PREFERENTIAL ATTACHMENT YEARS
1880 50000 100000 150000 200000 250000
0
1900 1920 1940 1960 1980 2000 2020
NUMBER OF MOVIES
YEARS
YEARS
1880 1900 1920 1940 1960 1980 2000 2020 1982
0•100 1•108 2•108 3•108 4•108 5•108 6•108 8•108 9•108 1•109
7•108
1987 1992 1997 2002 2007 2012
NUMBER OF HOSTS
0 50000 100000 150000 200000 250000 300000 400000 450000 350000
NUMBER OF PAPERS
New nodes prefer to connect to the more connected nodes
Barabási & Albert, Science 286, 509 (1999) Network Science: Evolving Network Models
BA MODEL: Preferential attachment
BA model: Growth
ER model: links are added randomly to the network
Barabási & Albert, Science 286, 509 (1999) Network Science: Evolving Network Models
Section 2: Growth and Preferential Sttachment
BA model: Growth
The random network model differs from real networks in two important characteristics:
Growth: While the random network model assumes that the number of nodes is fixed (time invariant), real networks are the result of a growth process that continuously increases.
Preferential Attachment: While nodes in random networks randomly choose
their interaction partner, in real networks new nodes prefer to link to the more
connected nodes.
Barabási & Albert, Science 286, 509 (1999)
P(k) ~k
-3(1) Networks continuously expand by the addition of new nodes
WWW : addition of new documents
GROWTH:
add a new node with m links
PREFERENTIAL ATTACHMENT:
the probability that a node connects to a node with k links is proportional to k.
(2) New nodes prefer to link to highly connected nodes.
WWW : linking to well known sites
Network Science: Evolving Network Models
Origin of SF networks: Growth and preferential attachment
j j
i
i k
k k
= Σ
Π ( )
Section 4
The degree distribution of a network generated by the Barabási-Albert model.
The figure shows p
kfor a single network of size N=100,000 and m=3. It shows both the linearly-binned
(purple) and the log-binned version (green) of p
k. The straight line is added to guide the eye and has slope
γ=3, corresponding to the network’s predicted degree exponent.
γ = 3
Network Science: Evolving Network Models