• Non ci sono risultati.

The textual file edges.csv

N/A
N/A
Protected

Academic year: 2021

Condividi "The textual file edges.csv "

Copied!
31
0
0

Testo completo

(1)

GraphFrames

(2)

GraphFrame

Input:

The textual file vertexes.csv

▪ It contains the vertexes of a graph

Each vertex is characterized by

▪ id (string): user identifier

▪ name (string): user name

▪ age (integer): user age

(3)

The textual file edges.csv

▪ It contains the edges of a graph

Each edge is characterized by

▪ src (string): source vertex

▪ dst (string): destination vertex

▪ linktype (string): “follow”or “friend”

(4)

Output:

For each user with at least one follower, store in the output folder the number of followers

▪ One user per line

▪ Format: user id, number of followers

Use the CSV format to store the result

(5)

Input graph example

u1

Alice 34

u5 u6 u4

u3

John 30

U2

Bob 36

u7

friend

friend friend

follow

follow follow friend

friend friend

friend

(6)

Result

id numFollowers

u3 2

u6 2

u2 1

(7)

GraphFrame

Input:

The textual file vertexes.csv

▪ It contains the vertexes of a graph

Each vertex is characterized by

▪ id (string): user identifier

▪ name (string): user name

▪ age (integer): user age

(8)

The textual file edges.csv

▪ It contains the edges of a graph

Each edge is characterized by

▪ src (string): source vertex

▪ dst (string): destination vertex

▪ linktype (string): “follow”or “friend”

(9)

Output:

Consider only the users with at least one follower

Store in the output folder the user(s) with the maximum number of followers

▪ One user per line

▪ Format: user id, number of followers

Use the CSV format to store the result

(10)

Input graph example

u1

Alice 34

u6

Adel 36

u5

Paul 32

u4

David 29

u3

John 30

U2

Bob 36

u7

Eddy 60 friend

friend

friend friend

follow

follow follow

follow friend

friend friend

friend

follow

(11)

Result

id numFollowers

u3 2

u6 2

(12)

GraphFrame

Input:

The textual file vertexes.csv

▪ It contains the vertexes of a graph

Each vertex is characterized by

▪ id (string): user identifier

▪ name (string): user name

▪ age (integer): user age

(13)

The textual file edges.csv

▪ It contains the edges of a graph

Each edge is characterized by

▪ src (string): source vertex

▪ dst (string): destination vertex

▪ linktype (string): “follow”or “friend”

(14)

Output:

The pairs of users Ux, Uy such that

▪ Ux is friend of Uy (link “friend” from Ux to Uy)

▪ Uy is not friend of Uy (no link “friend” from Uy to Ux)

One pair Ux,Uy per line

Format: idUx, idUy

Use the CSV format to store the result

(15)

Input graph example

u1

Alice 34

u5 u6 u4

u3

John 30

U2

Bob 36

u7

friend

friend friend

follow

follow follow

friend friend

(16)

Result

IdFriend IdNotFriend

u4 u1

u1 u2

(17)

GraphFrame

Input:

The textual file vertexes.csv

▪ It contains the vertexes of a graph

Each vertex is characterized by

▪ id (string): vertex identifier

▪ entityType (string): “user” or “topic”

▪ name (string): name of the entity

(18)

The textual file edges.csv

▪ It contains the edges of a graph

Each edge is characterized by

▪ src (string): source vertex

▪ dst (string): destination vertex

▪ linktype (string): “expertOf” or “follow” or “correlated”

(19)

Output:

The followed topics for each user

One pair (user name, followed topic) per line

Format: username, followed topic

Use the CSV format to store the result

(20)

Input graph example

like

follow follow

expertOf

V1

“user”

“Paolo”

V4

“topic”

“Big Data”

V3

“user”

“David”

V2

“topic”

“SQL”

V5

“user”

“John”

correlated correlated follow

follow

(21)

Result

username topic Paolo Big Data

David SQL

David Big Data

(22)

GraphFrame

Input:

The textual file vertexes.csv

▪ It contains the vertexes of a graph

Each vertex is characterized by

▪ id (string): vertex identifier

▪ entityType (string): “user” or “topic”

▪ name (string): name of the entity

(23)

The textual file edges.csv

▪ It contains the edges of a graph

Each edge is characterized by

▪ src (string): source vertex

▪ dst (string): destination vertex

▪ linktype (string): “expertOf” or “follow” or “correlated”

(24)

Output:

The names of the users who follow a topic correlated with the “Big Data” topic

One user name per line

Format: username

Use the CSV format to store the result

(25)

Input graph example

like

follow

expertOf

V1

“user”

“Paolo”

V3 V4

V2

“topic”

“SQL”

V5

“user”

“John”

correlated correlated follow

follow

(26)

Result

username David

(27)

GraphFrame

Input:

The textual file vertexes.csv

▪ It contains the vertexes of a graph

Each vertex is characterized by

▪ id (string): user identifier

▪ name (string): user name

▪ age (integer): user age

(28)

The textual file edges.csv

▪ It contains the edges of a graph

Each edge is characterized by

▪ src (string): source vertex

▪ dst (string): destination vertex

▪ linktype (string): “follow”or “friend”

(29)

Output:

Select the users who can reach user u1 in less than 3 hops (i.e., at most two edges)

▪ Do not consider u1 itself

For each of the selected users, store in the output folder his/her name and the minimum number of hops to reach user u1

▪ One user per line

▪ Format: user name, #hops to user u1

(30)

Input graph example

u1

Alice 34

u6

Adel 36

u5

Paul 32

u4

David 29

u3

John 30

U2

Bob 36

u7

Eddy 60 friend

friend

friend friend

follow

follow follow

follow friend

friend friend

friend

(31)

Result

name numHops

Bob 1

John 2

David 1

Paul 1

Riferimenti

Documenti correlati

ACE: Angiotensin converting enzyme; COVID‑19: Coronavirus 19 disease; MERS: Middle East Respiratory Syndrome; SARS: Severe acute respiratory syn‑ drome; SARS‑Cov‑2: Severe

Epidemiology and outcome of invasive fungal infection in adult hematopoietic stem cell transplant recipients: analysis of Multicenter Prospective Antifungal Therapy

This neglect of time is expressed by the emphasis on actor preferences instead of substantive policy change (i.e. European University Institute. Available Open Access on

The comparison between the measured differential cross section for the associated production of a Z boson and exactly one b jet and the theoretical predictions provided both by

The definition of the Dirac operator perturbed by a concentrated nonlinearity is given and it is shown how to split the nonlinear flow into the sum of the free flow plus a

Solution proposed by Roberto Tauraso, Dipartimento di Matematica, Universit`a di Roma “Tor Vergata”, via della Ricerca Scientifica, 00133 Roma, Italy. If at least one of the functions

● ClassicAnalyzer: splits according to a set of rules (basically, non-letters) + lowercase + stopword removal + normalization..

We describe the development process of the mobile app “Naturblick” as an example of a user-centred design in citizen science and discuss digital user feedback with regard to the