Sistemi di raccomandazione

(1)

Sistemi di raccomandazione

Francesco Barile

francesco.barile@unina.it

Università degli Studi di Napoli “Federico II”

(2)

Sistemi di raccomandazione

•

Software tools e tecniche che forniscono suggerimenti su oggetti item che devono essere utilizzati dall'utente

•

Suggerimenti relativi a diversi processi di decision-

making (prodotti da acquistare, musica da ascoltare,

ecc..)

(3)

Sistemi di raccomandazione

•

Vantaggi sia per il fornitore che per l'acquirente:

– Incrementare il numero di oggetti venduti

– Vendere oggetti che l'utente diffcilmente troverebbe

– Incrementare la soddisfazione e la fdelizzazione della clientela

– Capire meglio i desideri dei clienti

(4)

Sistemi di raccomandazione

•

Composto da

– Items: oggetti da raccomandare

– Users: utenti del sistema

– Transactions: interazioni tra l'utente ed il sistema

● Spesso rating, ma non necessariamente

•

In generale, il sistema deve stimare l'utilità R(u, i) che un utente u riceve da un item i

– Spesso corrisponde al prevedere il rating che l'utente darebbe all'item i

(5)

Sistemi di raccomandazione

•

l sistema dati alcuni R(u, i) che conosce deve ottenere R(u, i) che è una stima per quelli che non conosce

•

una volta ottenute le stime di utilità il sistema

raccomanda i K oggetti che massimizano l'utilità per il soggetto u

•

non necessariamente calcola tutte le utilità, potrebbe

usare un'euristica

(6)

Tecniche possibili

•

Content-based

– Il sistema stima l'utilità di un item in base a quelle di item simili già valutati dall'utente

•

Collaborative-fltering

– La stima viene fatta sulla base alle valutazioni sull'item fatte da utenti simili

•

Demographic

– Utenti suddivisi in categorie e le raccomandazioni sono fssate sulla base delle categorie

(7)

Tecniche possibili

•

Knowledge-based

– Raccomandazioni basate su uno specifco dominio di conoscenza su quanto determinati attributi degli item possono essere utili per determianti utenti

•

Community-based

– Raccomandazioni basate sulle preferenze di amici dell'utente

•

Hybrid

– Combinazione delle precedenti

(8)

Problematiche

•

Cold start problem

– All'inizio dell'attività il sistema non ha informazioni su cui basare le raccomandazioni

•

Sparsity

– Gli oggetti valutati dagli utenti possono essere molto pochi rispetto al totale

(9)

A Web Portal for Smart Tourism

(10)

User Profling

This kind of interface introduces the typical information overload problem:

too much information to show and to manage

A possible solution:

an automatic filtering based on the user profile

The user can save all the POI she/he prefers in a favourites list

Simultaneously, by applying a filtering approach, all the information is automatically ordered and filtered according to the user profile

(11)

User Profling

Given a user uⁱ and a set of m POI

the recommendation system, for each user i, aims at building a Preference Profile or a ranking Rⁱ of the user i over P.

Such preference profile is the set represents a partial order over P.

Our goal is not to guess the exact value of r^{i; j} the user i would assign to the item j, but to properly select the k -best items in the preference profile (the ones with the highest rating).

(12)

Collaborative Filtering

The most common approach used in RSs to generate a user preference profile is based on Collaborative Filtering techniques.

This approach suggests items to the user (or defines a rating for an item) by taking into account the preferences of similar users

This similarity is evaluated by considering the common items that they rated

This kind of technique suffers from two problems:

cold start and sparsity

(13)

Collaborative Filtering

The cold-start problem concerns the issue that the system, at the

beginning, has not yet sufficient information about a user, because she/he rated too few items:

so, it cannot properly evaluate the similarity between users.

The sparsity problem regards especially systems where the set of items is extremely large:

most of the users only rated a small subset of the overall.

(14)

A Possible Approach

To use social networks as external sources to obtain users’ information and to overcome the above– mentioned problems.

In detail, we use the most popular social networks: Facebook:com.

The aim of this approach is to examine all cross-domain information, from a user profile, to obtain, then, a recommendation in a specific domain (e.g., touristic preferences).

(15)

User Profling of Facebook

In detail, our method, analyzing user’s likes, tags, check-in and photos on Facebook.com, collects data from users’ profile in every possible domains (age, education level, music, movies, check-in places, etc.)

and uses them to evaluate the similarity between the current user and other users of the system

we do not consider only the specific items that are liked by user (e.g. the Rolling Stones’ page or a check-in at Colosseum), but we evaluate their category (e.g. musician, rock band, history museum, Chinese restaurant, etc.)

(16)

User Profling of Facebook

First calculate a score on specific nodes: a score of a node can represents both a like on a page and a check-in in a specific place.

(17)

User Profling of Facebook

The rate of a like on an item is propagated to its parent category and then to all its hierarchy

(18)

User Profling of Facebook

Finally, the prediction of the preference of items in our specific domain (cultural sites and other POI of the city) is obtained using the explicit ratings or saved itineraries produced on the Web Portal by the most similar users

Problema: il portale deve fornire un supporto anche all'organizzazione di viaggi per gruppi di persone

Gli utenti possono selezionare un gruppo di amici

Come fornire una raccomandazione che soddisfi tutti i membri del gruppo?

(19)

Group Recommendation

•

How to aggregate single group's members preferences to fnd the best for the whole group?

•

Two main techniques:

1) Aggregate single members profles into a group profle

(20)

Group Recommendation

•

How to aggregate single group's members preferences to fnd the best for the whole group?

•

Two main techniques:

2) Provide recommendations for each user and then aggregate them

(21)

Group Recommendation

•

We focuses on the second approach:

– More fexibility in group formation process

– The identity of group's members must be dynamically determined

– Single user’s profle and recommendations build independently from the other group’s members

•

Multi Agent Systems (MAS) techniques can be applied

to aggregate these recommendations

(22)

Problem Defnition

set of Friends

preference profle evaluated by single user recommendation agent

group's preference profle set of POI

F ={1 ,... , n}

P ={1 , ... , m}

Our goal:

∀i∈F , r

_i

={r

_i

(1) ,... , r

_i

(m)}

r

_F

={r

_F

(1) ,... , r

_F

(m)}

•

(23)

MAS techniques for

Group Recommendations

•

Social choice analysis

•

Negotiation

•

Coalitions

•

Non-cooperative Games

•

Weighted Utilities

(24)

Social Choice Analysis

•

Majority based

– try to determine the most popular choices

● Examples: Borda Count, Plurality Voting

•

Consensus based

– try to average among all the possible choices

● Examples: Average, Fairness

•

Role based

– explicitly takes into account possible roles and hierarchical relationships among members

● Examples: Least misery, Most Pleasure

(25)

Majority Based

•

Borda Count

– users' rankings replaced with progressive scores

● 0 for the lowest rated

● 1 for the next one, and so on

– on this scores an average strategy may be applied

•

Plurality Voting

– a sorting among users is necessary

– starting from the frst user, his/her k top rated activities are considered

– select the more voted activity among these and assign the greater score

(26)

Consensus Based

•

Average Strategy

– consider the average of all group members’ ratings

– commonly used as a benchmark for comparison or as base to defne more complex approaches

•

Fairness Strategy

– needs of an ordering among the users

– frst a user is selected and his k -best rated activities are taken into account

– among them chose activity that guarantees the less misery to the other users

(27)

Role Based

•

Least Misery

– can be used when one or more users give a rating particularly low for some activities

– in small groups, it is reasonable to assume that the satisfaction of the group could decrease if one or more components really dislike the activity

– assign to each activity the minimum of its ratings

•

Most Pleasure

– assign to each activity the greatest given rating

(28)

Negotiation

•

A set of agents act on behalf of human group's

members, participating in a cooperative negotiation for generating the group recommendations

•

Two level user profling

– Agents must reproduce preferences and behaviours

•

Protocols used

– Merging ranks with mediator, alternating offers, bilateral alternating offers protocol

(29)

Coalition

•

Reorganization of group in small and cohesive subgroups

•

Problem modelled as coalitional game

– People grouped into disjoint coalitions to maximize the social welfare function of the group.

– The payoff function considers the similarity between coalition members’ ratings, and a weighting factor for the coalition size

•

Not applicable in some cases

(30)

Non-cooperative Games

•

Members viewed as self-interested agents

•

Problem modelled as non-cooperative game:

– Group's members are the players

– Items to recommend are game actions

– Payoff matrix consider user's rankings for each item

•

Problem is modelled as a problem of fnding the Nash

Equilibrium for the game

(31)

Weighted Utilities

•

Results presented in the literature showed that there is no strategy that can be defned as the “best”

– different approaches are better suited in different scenarios

– depending from the characteristics of the specifc group

•

Traditional MAS techniques do not seem to capture all the features of real-world scenarios

•

Information like: Expertise, Infuence on other

members, Dissimilarity

(32)

Weighted Utilities

•

Necessary to integrate information from the social

relationships among group's members with the classical MAS techniques

– and so to derive new strategies more applicable to real-life situations

•

Proposed techniques

– Social Value Function

– Empathetic utility

(33)

Social Value Function

•

Computed on the basis of social interactions between group members [Gartrell et al. 2010, GROUP]

– Used to discriminate the strategy to apply for the group

•

Other characteristics used as weights for single users' preferences:

– Expertise

– Dissimilarity between users

•

All these information are derived from questionnaires

(34)

Emphatetic Utility

•

The satisfaction of an individual depends from both his intrinsic utility and his empathetic utility deriving from the happiness of his neighbours in the social network [Salehi-Abari and C. Boutilier 2014, AAMAS]

– individual preferences are aggregated in a weighted social choice function

– that takes into account local relationships with neighbourhoods in the network

•

They do not specify how to evaluate such numerical

relationships

(35)

Our Proposed Approach

•

Both these approaches do not provide a way to automatically retrieve social information

•

Information are derived from questionnaires or they are not specifed

•

An idea to address this problem is to retrieve social

information from Online Social Networks (OSNs)

(36)

A Leadership Weighted Model

•

To retrieve info about relationship from OSN (facebook.com)

•

To use these information to calculate a leadership value for each user w.r.t. the other group's members

•

Leadership values are used to weight single user

preference (single recommendation) in classic social choice analysis techniques

•

Leadership based on the centrality of the user in the

friends' network

(37)

Ranking System

The set of agents is the same as the set of outcomes (N = O)

Agents are asked to vote to express their opinions about each other, with the goal of determining a social ranking.

Each agent divides the other agents into a set that he likes equally, and a set that he dislikes equally (or, equivalently, has no opinion about).

(38)

Ranking System

Formally, for each i ∈ N the outcome set is partitioned into two sets O_i,1 and O_i,2, with ∀ o₁ ∈ O_i,1, ∀ o₂ ∈ O_i,2,

o1 ≻i o2, and with ∀o, o′ ∈ Oi,k for k ∈ {1, 2}, o ∼ i o′.

A ranking rule must simply return some total preordering on the agents

Arrow’s impossibility system does not hold in the ranking systems

(39)

Properties

(40)

Strong transitivity

Consider a preference profle in which outcome o2 receives at least as many votes as o1, and it is possible to pair up all the voters for o1

with voters from o2 so that each voter for o2 is weakly preferred by the ranking rule to the corresponding voter for o1.

Further assume that o2 receives more votes than o1 and/or that there is at least one pair of voters where the ranking rule strictly prefers the voter for o2 to the voter for o1.

Then the ranking rule satisfes strong transitivity if it always strictly prefers o2 to o1.

(41)

(42)

Weak transitivity

Consider a preference profile in which outcome o₂ receives at least as many votes as o₁, and it is possible to pair up all the voters for o₁ with voters for o₂ who have both voted for exactly the same number of

outcomes so that each voter for o₂ is weakly preferred by the ranking rule to the corresponding voter for o₁.

Further assume that o₂ receives more votes than o₁ and/or that there is at least one pair of voters where the ranking rule strictly prefers the voter for o₂ to the voter for o₁.

Then the ranking rule satisfies weak transitivity if it always strictly prefers o₂ to o₁.

(43)

RIIA, informal

A ranking rule satisfies ranked independence of irrelevant alternatives (RIIA) if the relative rank between pairs of outcomes is always

determined according to the same rule, and this rule depends only on:

1. the number of votes each outcome received; and 2. the relative ranks of these voters

(44)

RIIA, informal

There is no ranking system that always satisfies both weak transitivity and RIIA.

For example:

PageRank algorithm (Google)

• Ranking system that satisfies weak transitivity but not RIIA

(45)

What is a Key User ?

• Large community

• Affects a large number of persons

• Unlikely to live OSN

• Pay for Premium services

3/19

(46)

Users’ Connectivity in OSN

• Structural characteristics of the network

• Well-connected users

• Social Graph

• Centrality measures

– Degree

– Closeness

– Betweenness

4/19

(47)

Users’ Communication Activity

• Exchange of information

• User interaction

• Activity Graph

• Strong/Weak connection

5/19

(48)

Misure di centralità

•

Problema: determinare I nodi più “importanti” di una rete

•

Differenti modi per defnire l'importanza di un nodo, basati sull'analisi dei link tra i nodi:

– Degree: grado entrante del nodo

– Closeness: vicinanza di un nodo agli altri nel grafo

● Per ogni nodo si calcola la distanza media con tutti gli altri

● Minore è il valore ottenuto più il nodo è importante

– Popularity: non conta solo quanti nodi ti raggiungono, ma anche l'importanza di essi

(49)

PageRank Algorithm

•

Il valore di ogni nodo dipende dal valore dei nodi che puntano verso di esso

•

Inizialmente ogni nodo ha una percentuale equivalente di “importanza”

– Ad ogni nodo è assegnata popolarità uguale a 1/n

•

Iterativamente sono ricalcolate le popolarità di tutti i nodi, sulla base delle popolarità dei nodi adiacenti

all'iterazione precedente

(50)

PageRank Algorithm

•

Il processo è ripetuto fnché i valori di ogni nodo non cambiano

•

Si può dimostrare che l'algoritmo converge se sono verifcate alcune condizioni sul grafo

•

Il valore per il nodo x coincide con la probabilità di

arrivare al nodo x partendo da un nodo a caso nel grafo

e seguendo un percorso random

(51)

PageRank Algorithm

•

Criticità che possono compromettere la convergenza dell'algoritmo:

– Nodi senza archi uscenti

● Aggiungere un arco dal nodo ad ogni altro nodo

– Loop tra due nodi

● Salti casuali

(52)

PageRank

• An algorithm used by Google

• PageRank is a link analysis algorithm

• Outputs a probability distribution

• Apply to any graph or network

• Personalized PageRank is used by Twitter

6/19

(53)

Novel PageRank

• Identify key users

• First step

– Derive a weighted activity graph

• Second step

– Determine users’ centrality scores

7/19

(54)

Weighted Activity Graph

• Users who actually communicate

• Graph Links

• Informational and Normative influence

8/19

(55)

Weighted Activity Graph

• Graph representation

– Symmetric adjacency matrix

• Weight of an undirected activity link

C^ij – number of communication activities (i  j) C^ji – number of communication activities (j  i)

• Activity Graph

n – Number of users

9/19

(56)

Users’ Centrality Scores

• PageRank used by Google

N – Total number of webpages

Oj – Number of outgoing links from page j Bi – Set of web pages pointing to web page i d – dampening factor (usually set to 0.85)

• Novel PageRank

• Fi – Set of users connected to i

10/19

(57)

Demonstration and Evaluation

• Facebook dataset – New Orleans

– Set of users (63,731)

– Set of social links (817,090) – Communication activity

– 832,277 wall posts – BFS Crawler

11/19

(58)

Demonstration and Evaluation

12/19

(59)

Pros and Cons

• Great results

• Complexity O(n²)

• Social and Activity Graph

• Offline contacts

• Direction of posts/messages

• Privacy risks

13/19

(60)

Relaxations of transitivity (Dropping)

Add the requirements that an agent’s rank can improve only when he receives more votes (“positive response”), and

that the agents’ identities are ignored by the ranking function (“anonymity”).

Approval voting is the only ranking rule that satisfies RIIA, positive response, and anonymity.

(61)

Strong quasi-transitivity

Consider a preference profile in which outcome o2 receives at least as many votes as o1, and it is possible to pair up all the voters for o1 with voters from o2 so that each voter for o2 is weakly preferred by the ranking rule to the

corresponding voter for o1.

Then the ranking rule satisfies strong quasi-transitivity if it weakly prefers o2 to o1, and strictly strong prefers o2 to o1

if either o1 received no votes or each paired voter for o2 is strictly preferred by the ranking rule to the corresponding voter for o1.

No anonimity

(62)

Page-rank

forall i ∈ N do rank(i) ← 0 repeat

forall i ∈ N do

if |voters_for(i)| > 0 then

rank(i) ← 1/n+1 [|voters_for(i)| + maxj∈voters_for(i) rank(j)]

else

rank(i) ← 0 until rank converges

O(n³)

(63)

Hits Algorithm

•

Ogni nodo ha una doppia “identità” (Hub e Autority)

(64)

Hits Algorithm

•

Due tipi di ranking per ogni nodo:

– Hub weight: somma degli autority weight dei nodi puntati dall'hub

– Autiority weight: somma degli hub weight che puntano l'autority

•

Si inizializzano tutti i pesi a 1

•

Ad ogni iterazione si ricalcolano gli hubs weights e poi

gli autority weights, che poi vengono normalizzati

(65)

Hits Algorithm

•

Alla fne otterremo per ogni nodo due pesi

•

I nodi con alto Hub Weight puntano nodi con alto Autority Weight

– Gli hubs sono i nodi che

“indirizzano” meglio la ricerca

– Gli autority sono i nodi che rappresentano I risultati migliori

(66)

A Leadership Weighted Model

•

Leadership value

– extension of google PageRank algorithm

– centrality measure that takes into account the degree of activity of a person and the directionality of specifc

communication activities between pairs of users

– use a combination of data collected from the OSN

– as in the classic PageRank, each user inherits a portion of popularity from other users

(67)

A Leadership Weighted Model

set of Friends, and

R (x )= 1 −d

∣F∣ +d ∑

i∈ F

w (i , x )

w (i) R (i)

•

We defne:

F ={1 , ... , n}

Where:

x , i ∈F

d is a dampening factor set to 0.85

••

w(i, x) estimate i-th friend’s popularity and the weight of the communication activity of the i-th friend

towards the friend x

••

i-th froens's global activity on the whole group (used to normalize)

••

w (i)= ∑

j∈F

w (i , j)

(68)

A Leadership Weighted Model

set of Friends, and

•

We defne:

Where:

d is a dampening factor set to 0.85

••

w(i, x) estimate i-th friend’s popularity and the weight of the communication activity of the i-th friend

towards the friend x

••

i-th friend's global activity on the whole group (used to normalize)

••

R (x )= 1 −d

∣F∣ +d ∑

i∈ F

w (i , x )

w (i) R (i)

x , i ∈F

F ={1 , ... , n}

w (i)= ∑

j∈F

w (i , j)

(69)

A Leadership Weighted Model

(70)

A Leadership Weighted Model

set of Friends

F ={1 , ... , n}

P ={1 , ... , m} ^{set of POI}

•

∀i∈F , R(i) leadership value of Friend i

∀i∈F , r

_i

={r

_i

(1) ,... , r

_i

preference profle (m)}

evaluated by single user recommendation agent

•

We defne: ∀ x∈P , r

_avg

(x)= 1

n ∑

i=1 n

(R(i)⋅r

_i

(x)) r

_avg

={r

_avg

(1), ... , r

_avg

(m)}

=>

(71)

A Leadership Weighted Model

set of Friends

set of POI

•

leadership value of Friend i

preference profle evaluated by single user recommendation agent

•

F ={1 ,... , n}

P ={1 , ... , m}

∀i∈F , R(i)

∀i∈F , r

_i

={r

_i

(1) ,... , r

_i

(m)}

(72)

Fairness Strategy

The proposed fairness strategy (rfair) tries to accommodate everyone in the group, and requires an ordering of the users

The frst user’s top K choices are selected. Among them, the one that causes the least misery to the others is selected (in case of items with the same rating a non deterministic choice is made)

and the process is repeated with the successive user in the rank

The group POI ranking values are assigned in a descending order from m to 1.

Finally, the group recommendation will correspond to the K-POI with the highest rfair values.

(73)

Fairness Strategy

One of the main issues with the use of this strategy is that, by

changing the users’ ordering, the selection process will produce a different result in the outcome

We propose to use the D(i) values to provide a ranking/to sort the users

(74)

Weighted Average Strategy

(75)

Valutazione del sistema

•

Problematiche nella valutazione di sistemi di raccomandazione di gruppo:

– Mancanza di dataset adatti

– Servirebbe un dataset contenente rating indivduali, rating di gruppo, e interazioni tra gli utenti

•

Necessario effettuare “pilot study”

•

Una valutazione più accurata è possibile solo quando il

sistema sarà in funzione

(76)

Valutazione del sistema

•

Due Esperimenti

•

Nel primo caso valutiamo i benefci dell'introduzione della leadership confrontando i risultati con quelli

ottenuti con tecniche standard non pesate

•

Nel secondo valutiamo la soddisfazione degli utenti

sulle raccomandazioni fornite

(77)

Practical Example 1

•

Task: planning a trip in a city

– We evaluated only the users’ behaviors in the decision making process and the impact of our consensus function on a simple case of binary selection of POI

•

14 groups of friends

– composed in the average of 3,36 friends

•

46 users (26 men and 20 women)

– average age was 27,3

– graduate education

(78)

Practical Example 1

•

Phase 1

– each user register on a specifc web site using the credentials of facebook.com

● he/she was asked to imagine to plan a one-day visit in a specifc city and to select:

● three activities for the day, from a checklist of ten items

● two restaurants, from a check list of eight

(79)

Practical Example 1

•

Phase 2

– The group was, asked to discuss, face-to-face, in order to obtain a shared and unique decision for the group (r_GT).

(80)

Practical Example 1

(81)

Results

•

Valutazione: similarità con la scelta fnale del gruppo

– Le strategie “pesate” hanno risultato migliori

– In particolare, ravg ha risultati migliori per quasi tutti i gruppi valutati

(82)

Practical Example 2

•

Task: planning a trip in a city

– We evaluated the user's acceptance and satisfaction respect to the recommendations provided

•

17 groups of friends

– composed in the average of 3,1 friends

•

53 users (26 men and 27 women)

– average age was 26,8

– graduate education

(83)

Practical Example 2

•

Phase 1

– each user register on a specifc web site using the credentials of facebook.com

● he/she was asked to imagine to plan a one-day visit in a specifc city and to evaluate, providing ratings, ten activities and eight restaurants

(84)

Practical Example 2

•

Phase 2

– Users were presented with both the recommendation provided by using the rfair and ravg functions (as in the previous case the top fve activities were recommended).

– The associated ratings for the proposed activities provided by all the membersof the group in the frst phase were shown.

(85)

Practical Example 2

•

Phase 2

– Finally, each user was requested to answer the following questions:

1) Which of the two proposed itineraries do you prefer?

[None/First/Second/Both]

2) How do you rate the frst itinerary? [1 to 5]

3) How do you rate the second itinerary? [1 to 5]

4) How much have you been infuenced in the evaluations by the knowledge of your friends’ ratings? [1 to 5].

(86)

Results

– Mediamente 2 membri su 3 accettano le raccomandazioni proposte

– Considerando gli utenti singoli, 49 su 53 hanno accettato almeno una soluzione proposta

– Le valutazioni sulle soluzioni proposte sono buone (4 su 5)

(87)

Conclusions

•

Semplici weighted social function possono aiutare nella raccomandazione di gruppo

•

La strategia AVG ottiene risultati migliori in caso di selezioni binarie, mentre la FAIR sembra essere più performante in caso di rating espliciti

•

Necessarie valutazioni più approfondite in fase di

sperimentazione del portale

(88)