A network approach to investigate the aggregation phenomena in sports

(1)

22 July 2021

AperTO - Archivio Istituzionale Open Access dell'Università di Torino

Original Citation:

A network approach to investigate the aggregation phenomena in sports

Terms of use:

Open Access

(Article begins on next page)

Anyone can freely access the full text of works made available as "Open Access". Works made available under a Creative Commons license can be used according to the terms and conditions of said license. Use of all other works requires consent of the right holder (author or publisher) if not exempted from copyright protection by the applicable law.

Availability:

This is the author's manuscript

(2)

A network approach to investigate the aggregation

phenomena in sports

Luca Ferreri1,2 Fabio Daolio4 Marco Ivaldi3 Mario Giacobini1,2 Marco Tomassini4

1_{Complex System Unit, Molecular Biotechnology Center}

2_{Department of Animal Production, Epidemiology and Ecology, Faculty of Veterinary Medicine} 3_{Motor Science Research Center, S.U.I.S.M}

University of Torino, Italy

4_{Faculty of Business and Economics, Department of Information Systems} University of Lausanne, Switzerland

CS-SPORTS Paris 12th August 2011

(3)

Network: definition

A network is given by a set ofnodes

and of interactions among nodes callededges

(4)

Network: definition

A network is given by a set ofnodes and of interactions among nodes callededges

(5)

Weighted Networks

285 π 0.1 24 3444 1 2 55 81

(6)

(7)

Complex Networks: Nodes Centralities

• degree-centrality: the importance of a node grows proportionally

with its degree

• betweenness-centrality: the importance of a node given by the number of paths of minimum lenght that cross the node

• _{eigenvector-centrality: the importance of a node is proportional to} the sum of the importance of all vertices that point to it (Newman 2003):

(8)

Complex Networks: Nodes Centralities

• degree-centrality: the importance of a node grows proportionally with its degree

• _{betweenness-centrality}_{: the importance of a node given by the}

number of paths of minimum lenght that cross the node

• _{eigenvector-centrality: the importance of a node is proportional to} the sum of the importance of all vertices that point to it (Newman 2003):

(9)

Complex Networks: Nodes Centralities

• degree-centrality: the importance of a node grows proportionally with its degree

• _{betweenness-centrality: the importance of a node given by the} number of paths of minimum lenght that cross the node

• _{eigenvector-centrality}_{: the importance of a node is proportional to}

the sum of the importance of all vertices that point to it (Newman 2003):

(10)

Complex Networks: Communities Detection

A community is defined as a subnet having few number of edges departing from it

(11)

Complex Networks: Distance among Nodes

Distance among nodes is defined as the minimum number of edges necessary to connect two nodes.

The shortest path in network is called theradiusof the network while the longest is thediameter. In real world network it has been

observed thesmall-world phenomena: a small diameter compared with the number of nodes.

(12)

Complex Networks: Assortativity

We try to answer the question whether nodes prefer to connect with their similar (assortative behaviour) or not (dissasortative). In particular for node similarity we intend degree similarity.

Two approaches are known to detect the assortativity:

• the study of the Pearson assortative coefficient, that detect the correlation among nodes;

(13)

Complex Networks: Assortativity

We try to answer the question whether nodes prefer to connect with their similar (assortative behaviour) or not (dissasortative). In particular for node similarity we intend degree similarity. Two approaches are known to detect the assortativity:

(14)

Complex Networks: Assortativity

(15)

Complex Networks: Assortativity

• the study of the average degree of the nearest neighbors

hknn

i

(16)

Complex Networks: Assortativity

• the study of the average degree of the nearest neighbors

hknn

i

(17)

Complex Networks: Clustering Coefficient

The clustering coefficient of a node is a measure of how its neighbors are connected.

(18)

Complex Networks: Heterogenity

The degree distribution of a network is the probability for a node to have a number of edges departing from it.

Complex networks could be distinct in:

regular having homogeneous degree distribution such as fixed, binomial, Poisson, exponential or normal

;

scale-free having fat tailed degree distribution well described by power law distribution (at least asymptotically);

(19)

Complex Networks: Heterogenity

regular having homogeneous degree distribution such as fixed, binomial, Poisson, exponential or normal;

scale-free having fat tailed degree distribution well described by power law distribution (at least asymptotically)

(20)

Complex Networks: Heterogenity

scale-free having fat tailed degree distribution well described by power law distribution (at least asymptotically); a measure of theheterogeneityis given by:

hki hk2_i

(21)

Complex Networks: Heterogenity

scale-free having fat tailed degree distribution well described by power law distribution (at least asymptotically);

P (x = h ) h 1 ln P (x = h ) ln(h)

(22)

Bipartite Networks

A A A A B B B B B Many example: • _{co-authorship network;} • diseasome;

• heterosexual contact network; • vector-borne disease network;

(23)

Bipartite Networks

A A A A

B B B B B

(24)

Bipartite Networks

A A A A

B B B B B

1

(25)

Bipartite Networks

A A A A B B B B B 1 3 1

(26)

Bipartite Networks

The A-projection A A A A 1 3 1 the B-projection B B 2 B B B 1 1 2 2

(27)

Bipartite Projection is less informative

B B B

A

B B B

A A A

are both projected in

B 1 B B

1 1

(28)

Bipartite Projection is less informative

B B B

A

B B B

A A A

B 1 B B

1 1

(29)

Bipartite Projection is less informative

B B B

A

B B B

A A A

B 1 B B

1 1

(30)

We-Sport: a sparse network

We consider a snapshot of the entire network: • 1680 athletes

• 240 sports • 6107 interactions

we define the density for bipartite network as:

δ = edges

sports · athletes ' 0.014 We observe only two large connected components:

• _{the first have 1679 athletes and 239 sports} • the second

A man

(31)

We-Sport: a sparse network

• _{240 sports}

• 6107 interactions

δ = edges

A man

(32)

We-Sport: a sparse network

• _{240 sports} • 6107 interactions

δ = edges

A man

(33)

We-Sport: a sparse network

δ = edges

sports · athletes ' 0.014

We observe only two large connected components: • _{the first have 1679 athletes and 239 sports} • the second

A man

(34)

We-Sport: a sparse network

δ = edges

• _{the first have 1679 athletes and 239 sports}

• the second

A man

(35)

We-Sport: a sparse network

δ = edges

A man

(36)

(37)

(38)

The 70 most played Sports

100 200 300 400 500 600

joggingcalcioa5nuototennis_biciclettafitnesspallavolo pallacanestro scialpino sno wb oard calcioa11 mountainbik e calcioa7 beachvol ley escursionismo_{tennisdatavolo}ma rciaatletica corsasustrada baseballcalcioa8 arram pic ata(sp ecialsp ort) culturismo vela immersionesubacquea pattinaggioa rotelle scino rdico alpinismo moto ciclismo golf acquagym arrampicata canoa equitazione squash mountainbik e(sp ecialsp ort) ma ratona_ciasp ole bowling ciclismosustradahock eysughiaccio bilia rdo karate windsurf e-crosscountry(X C) danzalatinoamericani rugb y calciobalillakickb oxing sno wb oard(sp ecialsp ort)surf scacchidanza tiro conl’a rco apnea beachvolley(sp ecialsp ort) freerunning nordicw alking spinningpugilatoaikidoka

rting

pescasp ortivapilatesstep

triathlonfreesb y rafting ecialsp ort)yoga canottaggio corsaca mp estre female male 14 of 1

(39)

The 70 most played Sports: gender frequencies

20% 40%

jogging_calcioa5nuototennis bicicletta fitness pallavolo pallacanestro scialpino sno wb oard calcioa11 mountainbik e calcioa7 beachvolley escursionismotennisdatavoloma rciaatletica corsasustrada baseballcalcioa8 arrampicata(sp ecialsp ort) culturismo vela immersionesubacquea pattinaggioa rotelle scino rdic o alpinismo moto ciclismo golf acquagym arrampicata canoa equitazione squash mountainbik e(sp ecialsp ort) ma ratona_ciasp ole bowling ciclismosustrada_hock eysughiaccio bilia rdo karate windsurf e-cro sscountry(X C) danzalatinoamericani rugb y calciobalillakickb oxing sno wb oard(sp ecialsp ort)surf scacchidanza tiro conl’a rco apnea beachvoll ey(sp ecialsp ort) freerunning nord icw alking

spinningpugilatoaikidoka rting

pescasp ortivapilatesstep

triathlonfreesb y rafting otballamericano(sp ecialsp ort)yoga canottaggio corsa camp estre female male

(40)

A complex network

partition mode median hki hk2_i hki

hk2_i

sport 1 5 25.54 6.183 · 103 0.0041

(41)

Degree distribution: sport nodes

P (x = h ) h 100 200 300 400 500 600 1

(42)

Degree distribution: sport nodes

P (x ≥ h ) h 100 200 300 400 500 600 1

(43)

Degree distribution: sport nodes logarithmic scale

P (x ≥ h ) h 100 101 102 103 10−3 10−2 10−1 100 α = 1.96 xmin= 15 p-value=0.5670

bin.neg. Poisson exponential Weibull log-normal Yule power law + cutoff

LR p LR p LR p LR p LR p LR p LR p

(44)

Degree distribution: sport nodes logarithmic scale

(45)

Degree distribution: sport nodes logarithmic scale

(46)

Degree distribution: athletes nodes

P (x = h ) h 5 10 15 20 25 30 35 1

(47)

Degree distribution: athletes nodes

P (x ≥ h ) h 5 10 15 20 25 30 35 1

(48)

Degree distribution: athletes nodes logarithmic scale

p (x ≥ h ) h 100 101 102 10−4 10−3 10−2 10−1 100 α = 3.49 xmin= 6 p-value= 0.0730

(49)

Degree distribution: athletes nodes logarithmic scale

(50)

Degree distribution: athletes nodes logarithmic scale

(51)

Nodes distance

The maximum distance between every pairs of nodes in a graph is defined as thediameter of the graph. We observe a diameter of 8 but on average the shortest path between nodes is 3.33.

(52)

The distance distribution

the athletes-athletes distance

1 3 2 3 1 1 2 3 4 5 6 7 8

(53)

The distance distribution

the sport-sport distance

1 3 2 3 1 1 2 3 4 5 6 7 8

(54)

The distance distribution

the sport-athletes distance

1 3 2 3 1 1 2 3 4 5 6 7 8

(55)

Assortativity in bipartite

We-sport network shows a disassortativebehaviour: the Pearson coefficient is −0.2425.

Moreover if we calculate the nearest neighbor degree:

4 8 12 200 400 600 k-sport knn -athletes

(56)

Assortativity in bipartite

We-sport network shows a disassortativebehaviour: the Pearson coefficient is −0.2425.

Moreover if we calculate the nearest neighbor degree:

100 200 300 10 20 30 k-athletes knn -sp o rt

(57)

(58)

2-length assortativity in bipartite

We want to try to answer the question:

do people choose sports that connect them with similar people or not?

Therefore we analyze the 2-length assortativity: we observe a weak assortative behavior for athletes-nodes (0.0326) and stronger for sport-nodes (0.2620)

(59)

2-length assortativity in bipartite

100 200 300 200 400 600 k-sport knn 2 l-sp o rt

(60)

2-length assortativity in bipartite

5.5 6 10 20 30 k-athletes knn 2 l-sp o rt

(61)

The Clustering Coefficient for Biparite Networks

Again in order to understand the aggregation behavior of athletes we try to understand if people prefer to connect with other sharing the same sport’s preference. Hence we define a similarity matrix cc which counts for each couple of athletes the number of sports they share:

|N(v ) ∩ N(u)|

then we can normalize that matrix. Le Blond et al., Latapy et al., and Borgatti suggest the three following denominator:

• min (|N(v )|, |N(u)|) for cc•(u, v )

• max (|N(v )|, |N(u)|) for cc•(u, v )

(62)

The Clustering Coefficient for Biparite Networks

|N(v ) ∩ N(u)|

• min (|N(v )|, |N(u)|) for cc•(u, v )

• max (|N(v )|, |N(u)|) for cc•(u, v )

(63)

The Clustering Coefficient for Biparite Networks

|N(v ) ∩ N(u)|

• min (|N(v )|, |N(u)|) for cc•(u, v )

• max (|N(v )|, |N(u)|) for cc•(u, v )

(64)

The clustering coefficient II

From the similarity matrix we can calculate the clustering coefficient of each nodes.

cc(v ) = P

u∈N(N(v ))cc(v , u)

|N(N(v ))|

and from that the clustering coefficient of A-partition: cc = 1 |A| X v ∈A cc(v ) graph cc• cc• cc• athletes 0.6628 0.2672 0.2315 sport 0.4126 0.0615 0.0536

(65)

The clustering coefficient II

The athletes case

0.5 1 3 6 9 12 15 18 21 24 27 30 33 k hcc i cc• cc• cc•

(66)

The clustering coefficient II

The sport case

0.5 1 2 5 10 20 50 100 200 500 k hcc i cc• cc• cc•

(67)

The centrality

2−mode Key Sports Analysis

Betweenness Centrality Eige v ector Centr ality 0.2 0.4 0.6 0.8 1.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●●● ● ● ●●● ●●● ● ● ● ● ●●●● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●●● ● ● ● ● ●● ●● ● ● ● ●● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ●●●● ● ● ● ●● ●● ● ● ● ●● ●● ●● ● ● ●●●● ●●●● ● mountain_bike jogging pallavolo calcio_a_11 fitness tennis calcio_a_5 snowboard_(special_sport) beach_volleysnowboard motocross sci_alpino surf nuoto pallacanestro golf _kickboxing bicicletta taekwondo escursionismo acquagym badminton_(special_sport) 0.00 0.05 0.10 0.15 0.20 dc ● 0.01 ● 0.04 ●0.09 ●0.16 ●0.25 cc ● 0.25 ● 0.30 ● 0.35 ● 0.40 ● 0.45

(68)

(69)

Contacts

for further informations:

www.we-sport.com or contact us:

[email protected] [email protected]