Sistemi di raccomandazione
Francesco Barile
francesco.barile@unina.it
Università degli Studi di Napoli “Federico II”
Sistemi di raccomandazione
•
Software tools e tecniche che forniscono suggerimenti su oggetti item che devono essere utilizzati dall'utente
•
Suggerimenti relativi a diversi processi di decision-
making (prodotti da acquistare, musica da ascoltare,
ecc..)
Sistemi di raccomandazione
•
Vantaggi sia per il fornitore che per l'acquirente:
– Incrementare il numero di oggetti venduti
– Vendere oggetti che l'utente diffcilmente troverebbe
– Incrementare la soddisfazione e la fdelizzazione della clientela
– Capire meglio i desideri dei clienti
Sistemi di raccomandazione
•
Composto da
– Items: oggetti da raccomandare
– Users: utenti del sistema
– Transactions: interazioni tra l'utente ed il sistema
● Spesso rating, ma non necessariamente
•
In generale, il sistema deve stimare l'utilità R(u, i) che un utente u riceve da un item i
– Spesso corrisponde al prevedere il rating che l'utente darebbe all'item i
Sistemi di raccomandazione
•
l sistema dati alcuni R(u, i) che conosce deve ottenere R(u, i) che è una stima per quelli che non conosce
•
una volta ottenute le stime di utilità il sistema
raccomanda i K oggetti che massimizano l'utilità per il soggetto u
•
non necessariamente calcola tutte le utilità, potrebbe
usare un'euristica
Tecniche possibili
•
Content-based
– Il sistema stima l'utilità di un item in base a quelle di item simili già valutati dall'utente
•
Collaborative-fltering
– La stima viene fatta sulla base alle valutazioni sull'item fatte da utenti simili
•
Demographic
– Utenti suddivisi in categorie e le raccomandazioni sono fssate sulla base delle categorie
Tecniche possibili
•
Knowledge-based
– Raccomandazioni basate su uno specifco dominio di conoscenza su quanto determinati attributi degli item possono essere utili per determianti utenti
•
Community-based
– Raccomandazioni basate sulle preferenze di amici dell'utente
•
Hybrid
– Combinazione delle precedenti
Problematiche
•
Cold start problem
– All'inizio dell'attività il sistema non ha informazioni su cui basare le raccomandazioni
•
Sparsity
– Gli oggetti valutati dagli utenti possono essere molto pochi rispetto al totale
A Web Portal for Smart Tourism
User Profling
This kind of interface introduces the typical information overload problem:
too much information to show and to manage
A possible solution:
an automatic filtering based on the user profile
The user can save all the POI she/he prefers in a favourites list
Simultaneously, by applying a filtering approach, all the information is automatically ordered and filtered according to the user profile
User Profling
Given a user ui and a set of m POI
the recommendation system, for each user i, aims at building a Preference Profile or a ranking Ri of the user i over P.
Such preference profile is the set represents a partial order over P.
Our goal is not to guess the exact value of ri; j the user i would assign to the item j, but to properly select the k -best items in the preference profile (the ones with the highest rating).
Collaborative Filtering
The most common approach used in RSs to generate a user preference profile is based on Collaborative Filtering techniques.
This approach suggests items to the user (or defines a rating for an item) by taking into account the preferences of similar users
This similarity is evaluated by considering the common items that they rated
This kind of technique suffers from two problems:
cold start and sparsity
Collaborative Filtering
The cold-start problem concerns the issue that the system, at the
beginning, has not yet sufficient information about a user, because she/he rated too few items:
so, it cannot properly evaluate the similarity between users.
The sparsity problem regards especially systems where the set of items is extremely large:
most of the users only rated a small subset of the overall.
A Possible Approach
To use social networks as external sources to obtain users’ information and to overcome the above– mentioned problems.
In detail, we use the most popular social networks: Facebook:com.
The aim of this approach is to examine all cross-domain information, from a user profile, to obtain, then, a recommendation in a specific domain (e.g., touristic preferences).
User Profling of Facebook
In detail, our method, analyzing user’s likes, tags, check-in and photos on Facebook.com, collects data from users’ profile in every possible domains (age, education level, music, movies, check-in places, etc.)
and uses them to evaluate the similarity between the current user and other users of the system
we do not consider only the specific items that are liked by user (e.g. the Rolling Stones’ page or a check-in at Colosseum), but we evaluate their category (e.g. musician, rock band, history museum, Chinese restaurant, etc.)
User Profling of Facebook
First calculate a score on specific nodes: a score of a node can represents both a like on a page and a check-in in a specific place.
User Profling of Facebook
The rate of a like on an item is propagated to its parent category and then to all its hierarchy
User Profling of Facebook
Finally, the prediction of the preference of items in our specific domain (cultural sites and other POI of the city) is obtained using the explicit ratings or saved itineraries produced on the Web Portal by the most similar users
Problema: il portale deve fornire un supporto anche all'organizzazione di viaggi per gruppi di persone
Gli utenti possono selezionare un gruppo di amici
Come fornire una raccomandazione che soddisfi tutti i membri del gruppo?
Group Recommendation
•
How to aggregate single group's members preferences to fnd the best for the whole group?
•
Two main techniques:
1) Aggregate single members profles into a group profle
Group Recommendation
•
How to aggregate single group's members preferences to fnd the best for the whole group?
•
Two main techniques:
2) Provide recommendations for each user and then aggregate them
Group Recommendation
•
We focuses on the second approach:
– More fexibility in group formation process
– The identity of group's members must be dynamically determined
– Single user’s profle and recommendations build independently from the other group’s members
•
Multi Agent Systems (MAS) techniques can be applied
to aggregate these recommendations
Problem Defnition
set of Friends
preference profle evaluated by single user recommendation agent
group's preference profle set of POI
F ={1 ,... , n}
P ={1 , ... , m}
Our goal:
∀i∈F , r
i={r
i(1) ,... , r
i(m)}
r
F={r
F(1) ,... , r
F(m)}
•
•
•
MAS techniques for
Group Recommendations
•
Social choice analysis
•
Negotiation
•
Coalitions
•
Non-cooperative Games
•
Weighted Utilities
Social Choice Analysis
•
Majority based
– try to determine the most popular choices
● Examples: Borda Count, Plurality Voting
•
Consensus based
– try to average among all the possible choices
● Examples: Average, Fairness
•
Role based
– explicitly takes into account possible roles and hierarchical relationships among members
● Examples: Least misery, Most Pleasure
Majority Based
•
Borda Count
– users' rankings replaced with progressive scores
● 0 for the lowest rated
● 1 for the next one, and so on
– on this scores an average strategy may be applied
•
Plurality Voting
– a sorting among users is necessary
– starting from the frst user, his/her k top rated activities are considered
– select the more voted activity among these and assign the greater score
Consensus Based
•
Average Strategy
– consider the average of all group members’ ratings
– commonly used as a benchmark for comparison or as base to defne more complex approaches
•
Fairness Strategy
– needs of an ordering among the users
– frst a user is selected and his k -best rated activities are taken into account
– among them chose activity that guarantees the less misery to the other users
Role Based
•
Least Misery
– can be used when one or more users give a rating particularly low for some activities
– in small groups, it is reasonable to assume that the satisfaction of the group could decrease if one or more components really dislike the activity
– assign to each activity the minimum of its ratings
•
Most Pleasure
– assign to each activity the greatest given rating
Negotiation
•
A set of agents act on behalf of human group's
members, participating in a cooperative negotiation for generating the group recommendations
•
Two level user profling
– Agents must reproduce preferences and behaviours
•
Protocols used
– Merging ranks with mediator, alternating offers, bilateral alternating offers protocol
Coalition
•
Reorganization of group in small and cohesive subgroups
•
Problem modelled as coalitional game
– People grouped into disjoint coalitions to maximize the social welfare function of the group.
– The payoff function considers the similarity between coalition members’ ratings, and a weighting factor for the coalition size
•
Not applicable in some cases
Non-cooperative Games
•
Members viewed as self-interested agents
•
Problem modelled as non-cooperative game:
– Group's members are the players
– Items to recommend are game actions
– Payoff matrix consider user's rankings for each item
•
Problem is modelled as a problem of fnding the Nash
Equilibrium for the game
Weighted Utilities
•
Results presented in the literature showed that there is no strategy that can be defned as the “best”
– different approaches are better suited in different scenarios
– depending from the characteristics of the specifc group
•
Traditional MAS techniques do not seem to capture all the features of real-world scenarios
•
Information like: Expertise, Infuence on other
members, Dissimilarity
Weighted Utilities
•
Necessary to integrate information from the social
relationships among group's members with the classical MAS techniques
– and so to derive new strategies more applicable to real-life situations
•
Proposed techniques
– Social Value Function
– Empathetic utility
Social Value Function
•
Computed on the basis of social interactions between group members [Gartrell et al. 2010, GROUP]
– Used to discriminate the strategy to apply for the group
•
Other characteristics used as weights for single users' preferences:
– Expertise
– Dissimilarity between users
•
All these information are derived from questionnaires
Emphatetic Utility
•
The satisfaction of an individual depends from both his intrinsic utility and his empathetic utility deriving from the happiness of his neighbours in the social network [Salehi-Abari and C. Boutilier 2014, AAMAS]
– individual preferences are aggregated in a weighted social choice function
– that takes into account local relationships with neighbourhoods in the network
•
They do not specify how to evaluate such numerical
relationships
Our Proposed Approach
•
Both these approaches do not provide a way to automatically retrieve social information
•
Information are derived from questionnaires or they are not specifed
•
An idea to address this problem is to retrieve social
information from Online Social Networks (OSNs)
A Leadership Weighted Model
•
To retrieve info about relationship from OSN (facebook.com)
•
To use these information to calculate a leadership value for each user w.r.t. the other group's members
•
Leadership values are used to weight single user
preference (single recommendation) in classic social choice analysis techniques
•
Leadership based on the centrality of the user in the
friends' network
Ranking System
The set of agents is the same as the set of outcomes (N = O)
Agents are asked to vote to express their opinions about each other, with the goal of determining a social ranking.
Each agent divides the other agents into a set that he likes equally, and a set that he dislikes equally (or, equivalently, has no opinion about).
Ranking System
Formally, for each i ∈ N the outcome set is partitioned into two sets Oi,1 and Oi,2, with ∀ o1 ∈ Oi,1, ∀ o2 ∈ Oi,2,
o1 ≻i o2, and with ∀o, o′ ∈ Oi,k for k ∈ {1, 2}, o ∼ i o′.
A ranking rule must simply return some total preordering on the agents
Arrow’s impossibility system does not hold in the ranking systems
Properties
Strong transitivity
Consider a preference profle in which outcome o2 receives at least as many votes as o1, and it is possible to pair up all the voters for o1
with voters from o2 so that each voter for o2 is weakly preferred by the ranking rule to the corresponding voter for o1.
Further assume that o2 receives more votes than o1 and/or that there is at least one pair of voters where the ranking rule strictly prefers the voter for o2 to the voter for o1.
Then the ranking rule satisfes strong transitivity if it always strictly prefers o2 to o1.
Weak transitivity
Consider a preference profile in which outcome o2 receives at least as many votes as o1, and it is possible to pair up all the voters for o1 with voters for o2 who have both voted for exactly the same number of
outcomes so that each voter for o2 is weakly preferred by the ranking rule to the corresponding voter for o1.
Further assume that o2 receives more votes than o1 and/or that there is at least one pair of voters where the ranking rule strictly prefers the voter for o2 to the voter for o1.
Then the ranking rule satisfies weak transitivity if it always strictly prefers o2 to o1.
RIIA, informal
A ranking rule satisfies ranked independence of irrelevant alternatives (RIIA) if the relative rank between pairs of outcomes is always
determined according to the same rule, and this rule depends only on:
1. the number of votes each outcome received; and 2. the relative ranks of these voters
RIIA, informal
There is no ranking system that always satisfies both weak transitivity and RIIA.
For example:
PageRank algorithm (Google)
• Ranking system that satisfies weak transitivity but not RIIA
What is a Key User ?
• Large community
• Affects a large number of persons
• Unlikely to live OSN
• Pay for Premium services
3/19
Users’ Connectivity in OSN
• Structural characteristics of the network
• Well-connected users
• Social Graph
• Centrality measures
– Degree
– Closeness
– Betweenness
4/19
Users’ Communication Activity
• Exchange of information
• User interaction
• Activity Graph
• Strong/Weak connection
5/19
Misure di centralità
•
Problema: determinare I nodi più “importanti” di una rete
•
Differenti modi per defnire l'importanza di un nodo, basati sull'analisi dei link tra i nodi:
– Degree: grado entrante del nodo
– Closeness: vicinanza di un nodo agli altri nel grafo
● Per ogni nodo si calcola la distanza media con tutti gli altri
● Minore è il valore ottenuto più il nodo è importante
– Popularity: non conta solo quanti nodi ti raggiungono, ma anche l'importanza di essi
PageRank Algorithm
•
Il valore di ogni nodo dipende dal valore dei nodi che puntano verso di esso
•
Inizialmente ogni nodo ha una percentuale equivalente di “importanza”
– Ad ogni nodo è assegnata popolarità uguale a 1/n
•
Iterativamente sono ricalcolate le popolarità di tutti i nodi, sulla base delle popolarità dei nodi adiacenti
all'iterazione precedente
PageRank Algorithm
•
Il processo è ripetuto fnché i valori di ogni nodo non cambiano
•
Si può dimostrare che l'algoritmo converge se sono verifcate alcune condizioni sul grafo
•
Il valore per il nodo x coincide con la probabilità di
arrivare al nodo x partendo da un nodo a caso nel grafo
e seguendo un percorso random
PageRank Algorithm
•
Criticità che possono compromettere la convergenza dell'algoritmo:
– Nodi senza archi uscenti
● Aggiungere un arco dal nodo ad ogni altro nodo
– Loop tra due nodi
● Salti casuali
PageRank
• An algorithm used by Google
• PageRank is a link analysis algorithm
• Outputs a probability distribution
• Apply to any graph or network
• Personalized PageRank is used by Twitter
6/19
Novel PageRank
• Identify key users
• First step
– Derive a weighted activity graph
• Second step
– Determine users’ centrality scores
7/19
Weighted Activity Graph
• Users who actually communicate
• Graph Links
• Informational and Normative influence
8/19
Weighted Activity Graph
• Graph representation
– Symmetric adjacency matrix
• Weight of an undirected activity link
Cij – number of communication activities (i j) Cji – number of communication activities (j i)
• Activity Graph
n – Number of users
9/19
Users’ Centrality Scores
• PageRank used by Google
N – Total number of webpages
Oj – Number of outgoing links from page j Bi – Set of web pages pointing to web page i d – dampening factor (usually set to 0.85)
• Novel PageRank
• Fi – Set of users connected to i
10/19
Demonstration and Evaluation
• Facebook dataset – New Orleans
– Set of users (63,731)
– Set of social links (817,090) – Communication activity
– 832,277 wall posts – BFS Crawler
11/19
Demonstration and Evaluation
12/19
Pros and Cons
• Great results
• Complexity O(n²)
• Social and Activity Graph
• Offline contacts
• Direction of posts/messages
• Privacy risks
13/19
Relaxations of transitivity (Dropping)
Add the requirements that an agent’s rank can improve only when he receives more votes (“positive response”), and
that the agents’ identities are ignored by the ranking function (“anonymity”).
Approval voting is the only ranking rule that satisfies RIIA, positive response, and anonymity.
Strong quasi-transitivity
Consider a preference profile in which outcome o2 receives at least as many votes as o1, and it is possible to pair up all the voters for o1 with voters from o2 so that each voter for o2 is weakly preferred by the ranking rule to the
corresponding voter for o1.
Then the ranking rule satisfies strong quasi-transitivity if it weakly prefers o2 to o1, and strictly strong prefers o2 to o1
if either o1 received no votes or each paired voter for o2 is strictly preferred by the ranking rule to the corresponding voter for o1.
No anonimity
Page-rank
forall i ∈ N do rank(i) ← 0 repeat
forall i ∈ N do
if |voters_for(i)| > 0 then
rank(i) ← 1/n+1 [|voters_for(i)| + maxj∈voters_for(i) rank(j)]
else
rank(i) ← 0 until rank converges
O(n3)
Hits Algorithm
•
Ogni nodo ha una doppia “identità” (Hub e Autority)
Hits Algorithm
•
Due tipi di ranking per ogni nodo:
– Hub weight: somma degli autority weight dei nodi puntati dall'hub
– Autiority weight: somma degli hub weight che puntano l'autority
•
Si inizializzano tutti i pesi a 1
•
Ad ogni iterazione si ricalcolano gli hubs weights e poi
gli autority weights, che poi vengono normalizzati
Hits Algorithm
•
Alla fne otterremo per ogni nodo due pesi
•
I nodi con alto Hub Weight puntano nodi con alto Autority Weight
– Gli hubs sono i nodi che
“indirizzano” meglio la ricerca
– Gli autority sono i nodi che rappresentano I risultati migliori
A Leadership Weighted Model
•
Leadership value
– extension of google PageRank algorithm
– centrality measure that takes into account the degree of activity of a person and the directionality of specifc
communication activities between pairs of users
– use a combination of data collected from the OSN
– as in the classic PageRank, each user inherits a portion of popularity from other users
A Leadership Weighted Model
set of Friends, and
R (x )= 1 −d
∣F∣ +d ∑
i∈ F
w (i , x )
w (i) R (i)
•
We defne:
F ={1 , ... , n}
Where:
x , i ∈F
d is a dampening factor set to 0.85
••
w(i, x) estimate i-th friend’s popularity and the weight of the communication activity of the i-th friend
towards the friend x
••
i-th froens's global activity on the whole group (used to normalize)
••
w (i)= ∑
j∈F
w (i , j)
A Leadership Weighted Model
set of Friends, and
•
We defne:
Where:
d is a dampening factor set to 0.85
••
w(i, x) estimate i-th friend’s popularity and the weight of the communication activity of the i-th friend
towards the friend x
••
i-th friend's global activity on the whole group (used to normalize)
••
R (x )= 1 −d
∣F∣ +d ∑
i∈ F
w (i , x )
w (i) R (i)
x , i ∈F
F ={1 , ... , n}
w (i)= ∑
j∈F
w (i , j)
A Leadership Weighted Model
A Leadership Weighted Model
set of Friends
F ={1 , ... , n}
P ={1 , ... , m} set of POI
•
∀i∈F , R(i) leadership value of Friend i
∀i∈F , r
i={r
i(1) ,... , r
ipreference profle (m)}
evaluated by single user recommendation agent
•
•
•
We defne: ∀ x∈P , r
avg(x)= 1
n ∑
i=1 n
(R(i)⋅r
i(x)) r
avg={r
avg(1), ... , r
avg(m)}
=>
A Leadership Weighted Model
set of Friends
set of POI
•
leadership value of Friend i
preference profle evaluated by single user recommendation agent
•
•
•
F ={1 ,... , n}
P ={1 , ... , m}
∀i∈F , R(i)
∀i∈F , r
i={r
i(1) ,... , r
i(m)}
Fairness Strategy
The proposed fairness strategy (rfair) tries to accommodate everyone in the group, and requires an ordering of the users
The frst user’s top K choices are selected. Among them, the one that causes the least misery to the others is selected (in case of items with the same rating a non deterministic choice is made)
and the process is repeated with the successive user in the rank
The group POI ranking values are assigned in a descending order from m to 1.
Finally, the group recommendation will correspond to the K-POI with the highest rfair values.
Fairness Strategy
One of the main issues with the use of this strategy is that, by
changing the users’ ordering, the selection process will produce a different result in the outcome
We propose to use the D(i) values to provide a ranking/to sort the users
Weighted Average Strategy
Valutazione del sistema
•
Problematiche nella valutazione di sistemi di raccomandazione di gruppo:
– Mancanza di dataset adatti
– Servirebbe un dataset contenente rating indivduali, rating di gruppo, e interazioni tra gli utenti
•
Necessario effettuare “pilot study”
•
Una valutazione più accurata è possibile solo quando il
sistema sarà in funzione
Valutazione del sistema
•
Due Esperimenti
•
Nel primo caso valutiamo i benefci dell'introduzione della leadership confrontando i risultati con quelli
ottenuti con tecniche standard non pesate
•
Nel secondo valutiamo la soddisfazione degli utenti
sulle raccomandazioni fornite
Practical Example 1
•
Task: planning a trip in a city
– We evaluated only the users’ behaviors in the decision making process and the impact of our consensus function on a simple case of binary selection of POI
•
14 groups of friends
– composed in the average of 3,36 friends
•
46 users (26 men and 20 women)
– average age was 27,3
– graduate education
Practical Example 1
•
Phase 1
– each user register on a specifc web site using the credentials of facebook.com
● he/she was asked to imagine to plan a one-day visit in a specifc city and to select:
● three activities for the day, from a checklist of ten items
● two restaurants, from a check list of eight
Practical Example 1
•
Phase 2
– The group was, asked to discuss, face-to-face, in order to obtain a shared and unique decision for the group (rGT).
Practical Example 1
Results
•
Valutazione: similarità con la scelta fnale del gruppo
– Le strategie “pesate” hanno risultato migliori
– In particolare, ravg ha risultati migliori per quasi tutti i gruppi valutati
Practical Example 2
•
Task: planning a trip in a city
– We evaluated the user's acceptance and satisfaction respect to the recommendations provided
•
17 groups of friends
– composed in the average of 3,1 friends
•
53 users (26 men and 27 women)
– average age was 26,8
– graduate education
Practical Example 2
•
Phase 1
– each user register on a specifc web site using the credentials of facebook.com
● he/she was asked to imagine to plan a one-day visit in a specifc city and to evaluate, providing ratings, ten activities and eight restaurants
Practical Example 2
•
Phase 2
– Users were presented with both the recommendation provided by using the rfair and ravg functions (as in the previous case the top fve activities were recommended).
– The associated ratings for the proposed activities provided by all the membersof the group in the frst phase were shown.
Practical Example 2
•
Phase 2
– Finally, each user was requested to answer the following questions:
1) Which of the two proposed itineraries do you prefer?
[None/First/Second/Both]
2) How do you rate the frst itinerary? [1 to 5]
3) How do you rate the second itinerary? [1 to 5]
4) How much have you been infuenced in the evaluations by the knowledge of your friends’ ratings? [1 to 5].
Results
– Mediamente 2 membri su 3 accettano le raccomandazioni proposte
– Considerando gli utenti singoli, 49 su 53 hanno accettato almeno una soluzione proposta
– Le valutazioni sulle soluzioni proposte sono buone (4 su 5)
Conclusions
•
Semplici weighted social function possono aiutare nella raccomandazione di gruppo
•
La strategia AVG ottiene risultati migliori in caso di selezioni binarie, mentre la FAIR sembra essere più performante in caso di rating espliciti
•