• Non ci sono risultati.

A network approach to investigate the aggregation phenomena in sports

N/A
N/A
Protected

Academic year: 2021

Condividi "A network approach to investigate the aggregation phenomena in sports"

Copied!
69
0
0

Testo completo

(1)

22 July 2021

AperTO - Archivio Istituzionale Open Access dell'Università di Torino

Original Citation:

A network approach to investigate the aggregation phenomena in sports

Terms of use:

Open Access

(Article begins on next page)

Anyone can freely access the full text of works made available as "Open Access". Works made available under a Creative Commons license can be used according to the terms and conditions of said license. Use of all other works requires consent of the right holder (author or publisher) if not exempted from copyright protection by the applicable law.

Availability:

This is the author's manuscript

(2)

A network approach to investigate the aggregation

phenomena in sports

Luca Ferreri1,2 Fabio Daolio4 Marco Ivaldi3 Mario Giacobini1,2 Marco Tomassini4

1Complex System Unit, Molecular Biotechnology Center

2Department of Animal Production, Epidemiology and Ecology, Faculty of Veterinary Medicine 3Motor Science Research Center, S.U.I.S.M

University of Torino, Italy

4Faculty of Business and Economics, Department of Information Systems University of Lausanne, Switzerland

CS-SPORTS Paris 12th August 2011

(3)

Network: definition

A network is given by a set ofnodes

and of interactions among nodes callededges

(4)

Network: definition

A network is given by a set ofnodes and of interactions among nodes callededges

(5)

Weighted Networks

285 π 0.1 24 3444 1 2 55 81

(6)
(7)

Complex Networks: Nodes Centralities

• degree-centrality: the importance of a node grows proportionally

with its degree

• betweenness-centrality: the importance of a node given by the number of paths of minimum lenght that cross the node

eigenvector-centrality: the importance of a node is proportional to the sum of the importance of all vertices that point to it (Newman 2003):

(8)

Complex Networks: Nodes Centralities

• degree-centrality: the importance of a node grows proportionally with its degree

betweenness-centrality: the importance of a node given by the

number of paths of minimum lenght that cross the node

eigenvector-centrality: the importance of a node is proportional to the sum of the importance of all vertices that point to it (Newman 2003):

(9)

Complex Networks: Nodes Centralities

• degree-centrality: the importance of a node grows proportionally with its degree

betweenness-centrality: the importance of a node given by the number of paths of minimum lenght that cross the node

eigenvector-centrality: the importance of a node is proportional to

the sum of the importance of all vertices that point to it (Newman 2003):

(10)

Complex Networks: Communities Detection

A community is defined as a subnet having few number of edges departing from it

(11)

Complex Networks: Distance among Nodes

Distance among nodes is defined as the minimum number of edges necessary to connect two nodes.

The shortest path in network is called theradiusof the network while the longest is thediameter. In real world network it has been

observed thesmall-world phenomena: a small diameter compared with the number of nodes.

(12)

Complex Networks: Assortativity

We try to answer the question whether nodes prefer to connect with their similar (assortative behaviour) or not (dissasortative). In particular for node similarity we intend degree similarity.

Two approaches are known to detect the assortativity:

• the study of the Pearson assortative coefficient, that detect the correlation among nodes;

(13)

Complex Networks: Assortativity

We try to answer the question whether nodes prefer to connect with their similar (assortative behaviour) or not (dissasortative). In particular for node similarity we intend degree similarity. Two approaches are known to detect the assortativity:

• the study of the Pearson assortative coefficient, that detect the correlation among nodes;

(14)

Complex Networks: Assortativity

We try to answer the question whether nodes prefer to connect with their similar (assortative behaviour) or not (dissasortative). In particular for node similarity we intend degree similarity. Two approaches are known to detect the assortativity:

• the study of the Pearson assortative coefficient, that detect the correlation among nodes;

(15)

Complex Networks: Assortativity

We try to answer the question whether nodes prefer to connect with their similar (assortative behaviour) or not (dissasortative). In particular for node similarity we intend degree similarity. Two approaches are known to detect the assortativity:

• the study of the Pearson assortative coefficient, that detect the correlation among nodes;

• the study of the average degree of the nearest neighbors

hknn

i

(16)

Complex Networks: Assortativity

We try to answer the question whether nodes prefer to connect with their similar (assortative behaviour) or not (dissasortative). In particular for node similarity we intend degree similarity. Two approaches are known to detect the assortativity:

• the study of the Pearson assortative coefficient, that detect the correlation among nodes;

• the study of the average degree of the nearest neighbors

hknn

i

(17)

Complex Networks: Clustering Coefficient

The clustering coefficient of a node is a measure of how its neighbors are connected.

(18)

Complex Networks: Heterogenity

The degree distribution of a network is the probability for a node to have a number of edges departing from it.

Complex networks could be distinct in:

regular having homogeneous degree distribution such as fixed, binomial, Poisson, exponential or normal

;

scale-free having fat tailed degree distribution well described by power law distribution (at least asymptotically);

(19)

Complex Networks: Heterogenity

The degree distribution of a network is the probability for a node to have a number of edges departing from it.

Complex networks could be distinct in:

regular having homogeneous degree distribution such as fixed, binomial, Poisson, exponential or normal;

scale-free having fat tailed degree distribution well described by power law distribution (at least asymptotically)

(20)

Complex Networks: Heterogenity

The degree distribution of a network is the probability for a node to have a number of edges departing from it.

Complex networks could be distinct in:

regular having homogeneous degree distribution such as fixed, binomial, Poisson, exponential or normal;

scale-free having fat tailed degree distribution well described by power law distribution (at least asymptotically); a measure of theheterogeneityis given by:

hki hk2i

(21)

Complex Networks: Heterogenity

The degree distribution of a network is the probability for a node to have a number of edges departing from it.

Complex networks could be distinct in:

regular having homogeneous degree distribution such as fixed, binomial, Poisson, exponential or normal;

scale-free having fat tailed degree distribution well described by power law distribution (at least asymptotically);

P (x = h ) h 1 ln P (x = h ) ln(h)

(22)

Bipartite Networks

A A A A B B B B B Many example: • co-authorship network; • diseasome;

• heterosexual contact network; • vector-borne disease network;

(23)

Bipartite Networks

A A A A

B B B B B

(24)

Bipartite Networks

A A A A

B B B B B

1

(25)

Bipartite Networks

A A A A B B B B B 1 3 1

(26)

Bipartite Networks

The A-projection A A A A 1 3 1 the B-projection B B 2 B B B 1 1 2 2

(27)

Bipartite Projection is less informative

B B B

A

B B B

A A A

are both projected in

B 1 B B

1 1

(28)

Bipartite Projection is less informative

B B B

A

B B B

A A A

are both projected in

B 1 B B

1 1

(29)

Bipartite Projection is less informative

B B B

A

B B B

A A A

are both projected in

B 1 B B

1 1

(30)

We-Sport: a sparse network

We consider a snapshot of the entire network: • 1680 athletes

• 240 sports • 6107 interactions

we define the density for bipartite network as:

δ = edges

sports · athletes ' 0.014 We observe only two large connected components:

the first have 1679 athletes and 239 sports • the second

A man

(31)

We-Sport: a sparse network

We consider a snapshot of the entire network: • 1680 athletes

240 sports

• 6107 interactions

we define the density for bipartite network as:

δ = edges

sports · athletes ' 0.014 We observe only two large connected components:

the first have 1679 athletes and 239 sports • the second

A man

(32)

We-Sport: a sparse network

We consider a snapshot of the entire network: • 1680 athletes

240 sports • 6107 interactions

we define the density for bipartite network as:

δ = edges

sports · athletes ' 0.014 We observe only two large connected components:

the first have 1679 athletes and 239 sports • the second

A man

(33)

We-Sport: a sparse network

We consider a snapshot of the entire network: • 1680 athletes

240 sports • 6107 interactions

we define the density for bipartite network as:

δ = edges

sports · athletes ' 0.014

We observe only two large connected components: • the first have 1679 athletes and 239 sports • the second

A man

(34)

We-Sport: a sparse network

We consider a snapshot of the entire network: • 1680 athletes

240 sports • 6107 interactions

we define the density for bipartite network as:

δ = edges

sports · athletes ' 0.014 We observe only two large connected components:

the first have 1679 athletes and 239 sports

• the second

A man

(35)

We-Sport: a sparse network

We consider a snapshot of the entire network: • 1680 athletes

240 sports • 6107 interactions

we define the density for bipartite network as:

δ = edges

sports · athletes ' 0.014 We observe only two large connected components:

the first have 1679 athletes and 239 sports • the second

A man

(36)
(37)
(38)

The 70 most played Sports

100 200 300 400 500 600

joggingcalcioa5nuototennisbiciclettafitnesspallavolo pallacanestro scialpino sno wb oard calcioa11 mountainbik e calcioa7 beachvol ley escursionismotennisdatavoloma rciaatletica corsasustrada baseballcalcioa8 arram pic ata(sp ecialsp ort) culturismo vela immersionesubacquea pattinaggioa rotelle scino rdico alpinismo moto ciclismo golf acquagym arrampicata canoa equitazione squash mountainbik e(sp ecialsp ort) ma ratonaciasp ole bowling ciclismosustradahock eysughiaccio bilia rdo karate windsurf e-crosscountry(X C) danzalatinoamericani rugb y calciobalillakickb oxing sno wb oard(sp ecialsp ort)surf scacchidanza tiro conl’a rco apnea beachvolley(sp ecialsp ort) freerunning nordicw alking spinningpugilatoaikidoka

rting

pescasp ortivapilatesstep

triathlonfreesb y rafting ecialsp ort)yoga canottaggio corsaca mp estre female male 14 of 1

(39)

The 70 most played Sports: gender frequencies

20% 40%

joggingcalcioa5nuototennis bicicletta fitness pallavolo pallacanestro scialpino sno wb oard calcioa11 mountainbik e calcioa7 beachvolley escursionismotennisdatavoloma rciaatletica corsasustrada baseballcalcioa8 arrampicata(sp ecialsp ort) culturismo vela immersionesubacquea pattinaggioa rotelle scino rdic o alpinismo moto ciclismo golf acquagym arrampicata canoa equitazione squash mountainbik e(sp ecialsp ort) ma ratonaciasp ole bowling ciclismosustradahock eysughiaccio bilia rdo karate windsurf e-cro sscountry(X C) danzalatinoamericani rugb y calciobalillakickb oxing sno wb oard(sp ecialsp ort)surf scacchidanza tiro conl’a rco apnea beachvoll ey(sp ecialsp ort) freerunning nord icw alking

spinningpugilatoaikidoka rting

pescasp ortivapilatesstep

triathlonfreesb y rafting otballamericano(sp ecialsp ort)yoga canottaggio corsa camp estre female male

(40)

A complex network

partition mode median hki hk2i hki

hk2i

sport 1 5 25.54 6.183 · 103 0.0041

(41)

Degree distribution: sport nodes

P (x = h ) h 100 200 300 400 500 600 1

(42)

Degree distribution: sport nodes

P (x ≥ h ) h 100 200 300 400 500 600 1

(43)

Degree distribution: sport nodes logarithmic scale

P (x ≥ h ) h 100 101 102 103 10−3 10−2 10−1 100 α = 1.96 xmin= 15 p-value=0.5670

bin.neg. Poisson exponential Weibull log-normal Yule power law + cutoff

LR p LR p LR p LR p LR p LR p LR p

(44)

Degree distribution: sport nodes logarithmic scale

P (x ≥ h ) h 100 101 102 103 10−3 10−2 10−1 100 α = 1.96 xmin= 15 p-value=0.5670

bin.neg. Poisson exponential Weibull log-normal Yule power law + cutoff

LR p LR p LR p LR p LR p LR p LR p

(45)

Degree distribution: sport nodes logarithmic scale

P (x ≥ h ) h 100 101 102 103 10−3 10−2 10−1 100 α = 1.96 xmin= 15 p-value=0.5670

bin.neg. Poisson exponential Weibull log-normal Yule power law + cutoff

LR p LR p LR p LR p LR p LR p LR p

(46)

Degree distribution: athletes nodes

P (x = h ) h 5 10 15 20 25 30 35 1

(47)

Degree distribution: athletes nodes

P (x ≥ h ) h 5 10 15 20 25 30 35 1

(48)

Degree distribution: athletes nodes logarithmic scale

p (x ≥ h ) h 100 101 102 10−4 10−3 10−2 10−1 100 α = 3.49 xmin= 6 p-value= 0.0730

bin.neg. Poisson exponential Weibull log-normal Yule power law + cutoff

LR p LR p LR p LR p LR p LR p LR p

(49)

Degree distribution: athletes nodes logarithmic scale

p (x ≥ h ) h 100 101 102 10−4 10−3 10−2 10−1 100 α = 3.49 xmin= 6 p-value= 0.0730

bin.neg. Poisson exponential Weibull log-normal Yule power law + cutoff

LR p LR p LR p LR p LR p LR p LR p

(50)

Degree distribution: athletes nodes logarithmic scale

p (x ≥ h ) h 100 101 102 10−4 10−3 10−2 10−1 100 α = 3.49 xmin= 6 p-value= 0.0730

bin.neg. Poisson exponential Weibull log-normal Yule power law + cutoff

LR p LR p LR p LR p LR p LR p LR p

(51)

Nodes distance

The maximum distance between every pairs of nodes in a graph is defined as thediameter of the graph. We observe a diameter of 8 but on average the shortest path between nodes is 3.33.

(52)

The distance distribution

the athletes-athletes distance

1 3 2 3 1 1 2 3 4 5 6 7 8

(53)

The distance distribution

the sport-sport distance

1 3 2 3 1 1 2 3 4 5 6 7 8

(54)

The distance distribution

the sport-athletes distance

1 3 2 3 1 1 2 3 4 5 6 7 8

(55)

Assortativity in bipartite

We-sport network shows a disassortativebehaviour: the Pearson coefficient is −0.2425.

Moreover if we calculate the nearest neighbor degree:

4 8 12 200 400 600 k-sport knn -athletes

(56)

Assortativity in bipartite

We-sport network shows a disassortativebehaviour: the Pearson coefficient is −0.2425.

Moreover if we calculate the nearest neighbor degree:

100 200 300 10 20 30 k-athletes knn -sp o rt

(57)
(58)

2-length assortativity in bipartite

We want to try to answer the question:

do people choose sports that connect them with similar people or not?

Therefore we analyze the 2-length assortativity: we observe a weak assortative behavior for athletes-nodes (0.0326) and stronger for sport-nodes (0.2620)

(59)

2-length assortativity in bipartite

100 200 300 200 400 600 k-sport knn 2 l-sp o rt

(60)

2-length assortativity in bipartite

5.5 6 10 20 30 k-athletes knn 2 l-sp o rt

(61)

The Clustering Coefficient for Biparite Networks

Again in order to understand the aggregation behavior of athletes we try to understand if people prefer to connect with other sharing the same sport’s preference. Hence we define a similarity matrix cc which counts for each couple of athletes the number of sports they share:

|N(v ) ∩ N(u)|

then we can normalize that matrix. Le Blond et al., Latapy et al., and Borgatti suggest the three following denominator:

• min (|N(v )|, |N(u)|) for cc•(u, v )

• max (|N(v )|, |N(u)|) for cc•(u, v )

(62)

The Clustering Coefficient for Biparite Networks

Again in order to understand the aggregation behavior of athletes we try to understand if people prefer to connect with other sharing the same sport’s preference. Hence we define a similarity matrix cc which counts for each couple of athletes the number of sports they share:

|N(v ) ∩ N(u)|

then we can normalize that matrix. Le Blond et al., Latapy et al., and Borgatti suggest the three following denominator:

• min (|N(v )|, |N(u)|) for cc•(u, v )

• max (|N(v )|, |N(u)|) for cc•(u, v )

(63)

The Clustering Coefficient for Biparite Networks

Again in order to understand the aggregation behavior of athletes we try to understand if people prefer to connect with other sharing the same sport’s preference. Hence we define a similarity matrix cc which counts for each couple of athletes the number of sports they share:

|N(v ) ∩ N(u)|

then we can normalize that matrix. Le Blond et al., Latapy et al., and Borgatti suggest the three following denominator:

• min (|N(v )|, |N(u)|) for cc•(u, v )

• max (|N(v )|, |N(u)|) for cc•(u, v )

(64)

The clustering coefficient II

From the similarity matrix we can calculate the clustering coefficient of each nodes.

cc(v ) = P

u∈N(N(v ))cc(v , u)

|N(N(v ))|

and from that the clustering coefficient of A-partition: cc = 1 |A| X v ∈A cc(v ) graph cc• cc• cc• athletes 0.6628 0.2672 0.2315 sport 0.4126 0.0615 0.0536

(65)

The clustering coefficient II

The athletes case

0.5 1 3 6 9 12 15 18 21 24 27 30 33 k hcc i cc• cc• cc•

(66)

The clustering coefficient II

The sport case

0.5 1 2 5 10 20 50 100 200 500 k hcc i cc• cc• cc•

(67)

The centrality

2−mode Key Sports Analysis

Betweenness Centrality Eige v ector Centr ality 0.2 0.4 0.6 0.8 1.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●●● ● ● ●●● ●●● ● ● ● ● ●●●● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●●● ● ● ● ● ●● ●● ● ● ● ●● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ●●●● ● ● ● ●● ●● ● ● ● ●● ●● ●● ● ● ●●●● ●●●● ● mountain_bike jogging pallavolo calcio_a_11 fitness tennis calcio_a_5 snowboard_(special_sport) beach_volleysnowboard motocross sci_alpino surf nuoto pallacanestro golf kickboxing bicicletta taekwondo escursionismo acquagym badminton_(special_sport) 0.00 0.05 0.10 0.15 0.20 dc ● 0.01 ● 0.04 ●0.09 ●0.16 ●0.25 cc ● 0.25 ● 0.30 ● 0.35 ● 0.40 ● 0.45

(68)
(69)

Contacts

for further informations:

www.we-sport.com or contact us:

luca.ferreri@unito.it fabio.daolio@unil.ch

Riferimenti

Documenti correlati

Now, if we move these processes of reading, interpreting and representing relationships generated by chromatic masses in textile design to a more complex level, such as

In this report, we described a case study of a patient admitted to our Internal Medicine Department for a bone marrow biopsy and myelogram due to a monoclonal peak observed by

d’archivio, l’autore ripercorre le tappe principali della vita di questa tipografa: Charlotte, originaria della regione del Maine, fece parte di una famiglia della media

In recent years, several investigators have used laser-assisted microdissection to isolate small areas of tissue or single cells out of histological sections to

Tutti i pazienti sono stati sottoposti ad angiografia coronarica quantitativa (QCA) con una nuova coronarografia di follow-up a 9-12 mesi dalla precedura indice: ciò ha

As reported in the questionnaire explanation part, classification involves the assignment of classes (e.g. high/medium/low or acceptable/quite acceptable/not acceptable)

Alcuni recenti episodi di aggressione da parte di cani, in particolare da parte di ‘pit bull’, segnalati con grande risalto dagli organi di informazione[1], hanno posto

Quindi con una corretta formazione sul tema le aziende nel contesto attuale potrebbero trarre molteplici vantaggi dalla disciplina del Business Process