• Non ci sono risultati.

Sistemi di raccomandazione

N/A
N/A
Protected

Academic year: 2021

Condividi "Sistemi di raccomandazione"

Copied!
88
0
0

Testo completo

(1)

Sistemi di raccomandazione

Francesco Barile

francesco.barile@unina.it

Università degli Studi di Napoli “Federico II”

(2)

Sistemi di raccomandazione

Software tools e tecniche che forniscono suggerimenti su oggetti item che devono essere utilizzati dall'utente

Suggerimenti relativi a diversi processi di decision-

making (prodotti da acquistare, musica da ascoltare,

ecc..)

(3)

Sistemi di raccomandazione

Vantaggi sia per il fornitore che per l'acquirente:

Incrementare il numero di oggetti venduti

Vendere oggetti che l'utente diffcilmente troverebbe

Incrementare la soddisfazione e la fdelizzazione della clientela

Capire meglio i desideri dei clienti

(4)

Sistemi di raccomandazione

Composto da

Items: oggetti da raccomandare

Users: utenti del sistema

Transactions: interazioni tra l'utente ed il sistema

Spesso rating, ma non necessariamente

In generale, il sistema deve stimare l'utilità R(u, i) che un utente u riceve da un item i

Spesso corrisponde al prevedere il rating che l'utente darebbe all'item i

(5)

Sistemi di raccomandazione

l sistema dati alcuni R(u, i) che conosce deve ottenere R(u, i) che è una stima per quelli che non conosce

una volta ottenute le stime di utilità il sistema

raccomanda i K oggetti che massimizano l'utilità per il soggetto u

non necessariamente calcola tutte le utilità, potrebbe

usare un'euristica

(6)

Tecniche possibili

Content-based

Il sistema stima l'utilità di un item in base a quelle di item simili già valutati dall'utente

Collaborative-fltering

La stima viene fatta sulla base alle valutazioni sull'item fatte da utenti simili

Demographic

Utenti suddivisi in categorie e le raccomandazioni sono fssate sulla base delle categorie

(7)

Tecniche possibili

Knowledge-based

Raccomandazioni basate su uno specifco dominio di conoscenza su quanto determinati attributi degli item possono essere utili per determianti utenti

Community-based

Raccomandazioni basate sulle preferenze di amici dell'utente

Hybrid

Combinazione delle precedenti

(8)

Problematiche

Cold start problem

All'inizio dell'attività il sistema non ha informazioni su cui basare le raccomandazioni

Sparsity

Gli oggetti valutati dagli utenti possono essere molto pochi rispetto al totale

(9)

A Web Portal for Smart Tourism

(10)

User Profling

This kind of interface introduces the typical information overload problem:

too much information to show and to manage

A possible solution:

an automatic filtering based on the user profile

The user can save all the POI she/he prefers in a favourites list

Simultaneously, by applying a filtering approach, all the information is automatically ordered and filtered according to the user profile

(11)

User Profling

Given a user ui and a set of m POI

the recommendation system, for each user i, aims at building a Preference Profile or a ranking Ri of the user i over P.

Such preference profile is the set represents a partial order over P.

Our goal is not to guess the exact value of ri; j the user i would assign to the item j, but to properly select the k -best items in the preference profile (the ones with the highest rating).

(12)

Collaborative Filtering

The most common approach used in RSs to generate a user preference profile is based on Collaborative Filtering techniques.

This approach suggests items to the user (or defines a rating for an item) by taking into account the preferences of similar users

This similarity is evaluated by considering the common items that they rated

This kind of technique suffers from two problems:

cold start and sparsity

(13)

Collaborative Filtering

The cold-start problem concerns the issue that the system, at the

beginning, has not yet sufficient information about a user, because she/he rated too few items:

so, it cannot properly evaluate the similarity between users.

The sparsity problem regards especially systems where the set of items is extremely large:

most of the users only rated a small subset of the overall.

(14)

A Possible Approach

To use social networks as external sources to obtain users’ information and to overcome the above– mentioned problems.

In detail, we use the most popular social networks: Facebook:com.

The aim of this approach is to examine all cross-domain information, from a user profile, to obtain, then, a recommendation in a specific domain (e.g., touristic preferences).

(15)

User Profling of Facebook

In detail, our method, analyzing user’s likes, tags, check-in and photos on Facebook.com, collects data from users’ profile in every possible domains (age, education level, music, movies, check-in places, etc.)

and uses them to evaluate the similarity between the current user and other users of the system

we do not consider only the specific items that are liked by user (e.g. the Rolling Stones’ page or a check-in at Colosseum), but we evaluate their category (e.g. musician, rock band, history museum, Chinese restaurant, etc.)

(16)

User Profling of Facebook

First calculate a score on specific nodes: a score of a node can represents both a like on a page and a check-in in a specific place.

(17)

User Profling of Facebook

The rate of a like on an item is propagated to its parent category and then to all its hierarchy

(18)

User Profling of Facebook

Finally, the prediction of the preference of items in our specific domain (cultural sites and other POI of the city) is obtained using the explicit ratings or saved itineraries produced on the Web Portal by the most similar users

Problema: il portale deve fornire un supporto anche all'organizzazione di viaggi per gruppi di persone

Gli utenti possono selezionare un gruppo di amici

Come fornire una raccomandazione che soddisfi tutti i membri del gruppo?

(19)

Group Recommendation

How to aggregate single group's members preferences to fnd the best for the whole group?

Two main techniques:

1) Aggregate single members profles into a group profle

(20)

Group Recommendation

How to aggregate single group's members preferences to fnd the best for the whole group?

Two main techniques:

2) Provide recommendations for each user and then aggregate them

(21)

Group Recommendation

We focuses on the second approach:

More fexibility in group formation process

The identity of group's members must be dynamically determined

Single user’s profle and recommendations build independently from the other group’s members

Multi Agent Systems (MAS) techniques can be applied

to aggregate these recommendations

(22)

Problem Defnition

set of Friends

preference profle evaluated by single user recommendation agent

group's preference profle set of POI

F ={1 ,... , n}

P ={1 , ... , m}

Our goal:

∀i∈F , r

i

={r

i

(1) ,... , r

i

(m)}

r

F

={r

F

(1) ,... , r

F

(m)}

(23)

MAS techniques for

Group Recommendations

Social choice analysis

Negotiation

Coalitions

Non-cooperative Games

Weighted Utilities

(24)

Social Choice Analysis

Majority based

try to determine the most popular choices

Examples: Borda Count, Plurality Voting

Consensus based

try to average among all the possible choices

Examples: Average, Fairness

Role based

explicitly takes into account possible roles and hierarchical relationships among members

Examples: Least misery, Most Pleasure

(25)

Majority Based

Borda Count

users' rankings replaced with progressive scores

0 for the lowest rated

1 for the next one, and so on

on this scores an average strategy may be applied

Plurality Voting

a sorting among users is necessary

starting from the frst user, his/her k top rated activities are considered

select the more voted activity among these and assign the greater score

(26)

Consensus Based

Average Strategy

consider the average of all group members’ ratings

commonly used as a benchmark for comparison or as base to defne more complex approaches

Fairness Strategy

needs of an ordering among the users

frst a user is selected and his k -best rated activities are taken into account

among them chose activity that guarantees the less misery to the other users

(27)

Role Based

Least Misery

can be used when one or more users give a rating particularly low for some activities

in small groups, it is reasonable to assume that the satisfaction of the group could decrease if one or more components really dislike the activity

assign to each activity the minimum of its ratings

Most Pleasure

assign to each activity the greatest given rating

(28)

Negotiation

A set of agents act on behalf of human group's

members, participating in a cooperative negotiation for generating the group recommendations

Two level user profling

Agents must reproduce preferences and behaviours

Protocols used

Merging ranks with mediator, alternating offers, bilateral alternating offers protocol

(29)

Coalition

Reorganization of group in small and cohesive subgroups

Problem modelled as coalitional game

People grouped into disjoint coalitions to maximize the social welfare function of the group.

The payoff function considers the similarity between coalition members’ ratings, and a weighting factor for the coalition size

Not applicable in some cases

(30)

Non-cooperative Games

Members viewed as self-interested agents

Problem modelled as non-cooperative game:

Group's members are the players

Items to recommend are game actions

Payoff matrix consider user's rankings for each item

Problem is modelled as a problem of fnding the Nash

Equilibrium for the game

(31)

Weighted Utilities

Results presented in the literature showed that there is no strategy that can be defned as the “best”

different approaches are better suited in different scenarios

depending from the characteristics of the specifc group

Traditional MAS techniques do not seem to capture all the features of real-world scenarios

Information like: Expertise, Infuence on other

members, Dissimilarity

(32)

Weighted Utilities

Necessary to integrate information from the social

relationships among group's members with the classical MAS techniques

and so to derive new strategies more applicable to real-life situations

Proposed techniques

Social Value Function

Empathetic utility

(33)

Social Value Function

Computed on the basis of social interactions between group members [Gartrell et al. 2010, GROUP]

Used to discriminate the strategy to apply for the group

Other characteristics used as weights for single users' preferences:

Expertise

Dissimilarity between users

All these information are derived from questionnaires

(34)

Emphatetic Utility

The satisfaction of an individual depends from both his intrinsic utility and his empathetic utility deriving from the happiness of his neighbours in the social network [Salehi-Abari and C. Boutilier 2014, AAMAS]

individual preferences are aggregated in a weighted social choice function

that takes into account local relationships with neighbourhoods in the network

They do not specify how to evaluate such numerical

relationships

(35)

Our Proposed Approach

Both these approaches do not provide a way to automatically retrieve social information

Information are derived from questionnaires or they are not specifed

An idea to address this problem is to retrieve social

information from Online Social Networks (OSNs)

(36)

A Leadership Weighted Model

To retrieve info about relationship from OSN (facebook.com)

To use these information to calculate a leadership value for each user w.r.t. the other group's members

Leadership values are used to weight single user

preference (single recommendation) in classic social choice analysis techniques

Leadership based on the centrality of the user in the

friends' network

(37)

Ranking System

The set of agents is the same as the set of outcomes (N = O)

Agents are asked to vote to express their opinions about each other, with the goal of determining a social ranking.

Each agent divides the other agents into a set that he likes equally, and a set that he dislikes equally (or, equivalently, has no opinion about).

(38)

Ranking System

Formally, for each i ∈ N the outcome set is partitioned into two sets Oi,1 and Oi,2, with ∀ o1 ∈ Oi,1, ∀ o2 ∈ Oi,2,

o1i o2, and with ∀o, o′ ∈ Oi,k for k ∈ {1, 2}, o ∼ i o′.

A ranking rule must simply return some total preordering on the agents

Arrow’s impossibility system does not hold in the ranking systems

(39)

Properties

(40)

Strong transitivity

Consider a preference profle in which outcome o2 receives at least as many votes as o1, and it is possible to pair up all the voters for o1

with voters from o2 so that each voter for o2 is weakly preferred by the ranking rule to the corresponding voter for o1.

Further assume that o2 receives more votes than o1 and/or that there is at least one pair of voters where the ranking rule strictly prefers the voter for o2 to the voter for o1.

Then the ranking rule satisfes strong transitivity if it always strictly prefers o2 to o1.

(41)
(42)

Weak transitivity

Consider a preference profile in which outcome o2 receives at least as many votes as o1, and it is possible to pair up all the voters for o1 with voters for o2 who have both voted for exactly the same number of

outcomes so that each voter for o2 is weakly preferred by the ranking rule to the corresponding voter for o1.

Further assume that o2 receives more votes than o1 and/or that there is at least one pair of voters where the ranking rule strictly prefers the voter for o2 to the voter for o1.

Then the ranking rule satisfies weak transitivity if it always strictly prefers o2 to o1.

(43)

RIIA, informal

A ranking rule satisfies ranked independence of irrelevant alternatives (RIIA) if the relative rank between pairs of outcomes is always

determined according to the same rule, and this rule depends only on:

1. the number of votes each outcome received; and 2. the relative ranks of these voters

(44)

RIIA, informal

There is no ranking system that always satisfies both weak transitivity and RIIA.

For example:

PageRank algorithm (Google)

Ranking system that satisfies weak transitivity but not RIIA

(45)

What is a Key User ?

• Large community

• Affects a large number of persons

• Unlikely to live OSN

• Pay for Premium services

3/19

(46)

Users’ Connectivity in OSN

• Structural characteristics of the network

• Well-connected users

• Social Graph

• Centrality measures

– Degree

– Closeness

– Betweenness

4/19

(47)

Users’ Communication Activity

• Exchange of information

• User interaction

• Activity Graph

• Strong/Weak connection

5/19

(48)

Misure di centralità

Problema: determinare I nodi più “importanti” di una rete

Differenti modi per defnire l'importanza di un nodo, basati sull'analisi dei link tra i nodi:

Degree: grado entrante del nodo

Closeness: vicinanza di un nodo agli altri nel grafo

Per ogni nodo si calcola la distanza media con tutti gli altri

Minore è il valore ottenuto più il nodo è importante

Popularity: non conta solo quanti nodi ti raggiungono, ma anche l'importanza di essi

(49)

PageRank Algorithm

Il valore di ogni nodo dipende dal valore dei nodi che puntano verso di esso

Inizialmente ogni nodo ha una percentuale equivalente di “importanza”

Ad ogni nodo è assegnata popolarità uguale a 1/n

Iterativamente sono ricalcolate le popolarità di tutti i nodi, sulla base delle popolarità dei nodi adiacenti

all'iterazione precedente

(50)

PageRank Algorithm

Il processo è ripetuto fnché i valori di ogni nodo non cambiano

Si può dimostrare che l'algoritmo converge se sono verifcate alcune condizioni sul grafo

Il valore per il nodo x coincide con la probabilità di

arrivare al nodo x partendo da un nodo a caso nel grafo

e seguendo un percorso random

(51)

PageRank Algorithm

Criticità che possono compromettere la convergenza dell'algoritmo:

Nodi senza archi uscenti

Aggiungere un arco dal nodo ad ogni altro nodo

Loop tra due nodi

Salti casuali

(52)

PageRank

• An algorithm used by Google

• PageRank is a link analysis algorithm

• Outputs a probability distribution

• Apply to any graph or network

• Personalized PageRank is used by Twitter

6/19

(53)

Novel PageRank

• Identify key users

• First step

– Derive a weighted activity graph

• Second step

– Determine users’ centrality scores

7/19

(54)

Weighted Activity Graph

• Users who actually communicate

• Graph Links

• Informational and Normative influence

8/19

(55)

Weighted Activity Graph

• Graph representation

– Symmetric adjacency matrix

• Weight of an undirected activity link

Cij – number of communication activities (i  j) Cji – number of communication activities (j  i)

• Activity Graph

n – Number of users

9/19

(56)

Users’ Centrality Scores

• PageRank used by Google

N – Total number of webpages

Oj – Number of outgoing links from page j Bi – Set of web pages pointing to web page i d – dampening factor (usually set to 0.85)

• Novel PageRank

• Fi – Set of users connected to i

10/19

(57)

Demonstration and Evaluation

• Facebook dataset – New Orleans

– Set of users (63,731)

– Set of social links (817,090) – Communication activity

– 832,277 wall posts – BFS Crawler

11/19

(58)

Demonstration and Evaluation

12/19

(59)

Pros and Cons

• Great results

• Complexity O(n²)

• Social and Activity Graph

• Offline contacts

• Direction of posts/messages

• Privacy risks

13/19

(60)

Relaxations of transitivity (Dropping)

Add the requirements that an agent’s rank can improve only when he receives more votes (“positive response”), and

that the agents’ identities are ignored by the ranking function (“anonymity”).

Approval voting is the only ranking rule that satisfies RIIA, positive response, and anonymity.

(61)

Strong quasi-transitivity

Consider a preference profile in which outcome o2 receives at least as many votes as o1, and it is possible to pair up all the voters for o1 with voters from o2 so that each voter for o2 is weakly preferred by the ranking rule to the

corresponding voter for o1.

Then the ranking rule satisfies strong quasi-transitivity if it weakly prefers o2 to o1, and strictly strong prefers o2 to o1

if either o1 received no votes or each paired voter for o2 is strictly preferred by the ranking rule to the corresponding voter for o1.

No anonimity

(62)

Page-rank

forall i ∈ N do rank(i) ← 0 repeat

forall i ∈ N do

if |voters_for(i)| > 0 then

rank(i) ← 1/n+1 [|voters_for(i)| + maxj∈voters_for(i) rank(j)]

else

rank(i) ← 0 until rank converges

O(n3)

(63)

Hits Algorithm

Ogni nodo ha una doppia “identità” (Hub e Autority)

(64)

Hits Algorithm

Due tipi di ranking per ogni nodo:

Hub weight: somma degli autority weight dei nodi puntati dall'hub

Autiority weight: somma degli hub weight che puntano l'autority

Si inizializzano tutti i pesi a 1

Ad ogni iterazione si ricalcolano gli hubs weights e poi

gli autority weights, che poi vengono normalizzati

(65)

Hits Algorithm

Alla fne otterremo per ogni nodo due pesi

I nodi con alto Hub Weight puntano nodi con alto Autority Weight

Gli hubs sono i nodi che

“indirizzano” meglio la ricerca

Gli autority sono i nodi che rappresentano I risultati migliori

(66)

A Leadership Weighted Model

Leadership value

extension of google PageRank algorithm

centrality measure that takes into account the degree of activity of a person and the directionality of specifc

communication activities between pairs of users

use a combination of data collected from the OSN

as in the classic PageRank, each user inherits a portion of popularity from other users

(67)

A Leadership Weighted Model

set of Friends, and

R (x )= 1 −d

∣F∣ +d

i∈ F

w (i , x )

w (i) R (i)

We defne:

F ={1 , ... , n}

Where:

x , i ∈F

d is a dampening factor set to 0.85

••

w(i, x) estimate i-th friend’s popularity and the weight of the communication activity of the i-th friend

towards the friend x

••

i-th froens's global activity on the whole group (used to normalize)

••

w (i)=

j∈F

w (i , j)

(68)

A Leadership Weighted Model

set of Friends, and

We defne:

Where:

d is a dampening factor set to 0.85

••

w(i, x) estimate i-th friend’s popularity and the weight of the communication activity of the i-th friend

towards the friend x

••

i-th friend's global activity on the whole group (used to normalize)

••

R (x )= 1 −d

∣F∣ +d

i∈ F

w (i , x )

w (i) R (i)

x , i ∈F

F ={1 , ... , n}

w (i)=

j∈F

w (i , j)

(69)

A Leadership Weighted Model

(70)

A Leadership Weighted Model

set of Friends

F ={1 , ... , n}

P ={1 , ... , m} set of POI

∀i∈F , R(i) leadership value of Friend i

∀i∈F , r

i

={r

i

(1) ,... , r

i

preference profle (m)}

evaluated by single user recommendation agent

We defne: ∀ x∈P , r

avg

(x)= 1

n

i=1 n

(R(i)⋅r

i

(x)) r

avg

={r

avg

(1), ... , r

avg

(m)}

=>

(71)

A Leadership Weighted Model

set of Friends

set of POI

leadership value of Friend i

preference profle evaluated by single user recommendation agent

F ={1 ,... , n}

P ={1 , ... , m}

∀i∈F , R(i)

∀i∈F , r

i

={r

i

(1) ,... , r

i

(m)}

(72)

Fairness Strategy

The proposed fairness strategy (rfair) tries to accommodate everyone in the group, and requires an ordering of the users

The frst user’s top K choices are selected. Among them, the one that causes the least misery to the others is selected (in case of items with the same rating a non deterministic choice is made)

and the process is repeated with the successive user in the rank

The group POI ranking values are assigned in a descending order from m to 1.

Finally, the group recommendation will correspond to the K-POI with the highest rfair values.

(73)

Fairness Strategy

One of the main issues with the use of this strategy is that, by

changing the users’ ordering, the selection process will produce a different result in the outcome

We propose to use the D(i) values to provide a ranking/to sort the users

(74)

Weighted Average Strategy

(75)

Valutazione del sistema

Problematiche nella valutazione di sistemi di raccomandazione di gruppo:

Mancanza di dataset adatti

Servirebbe un dataset contenente rating indivduali, rating di gruppo, e interazioni tra gli utenti

Necessario effettuare “pilot study”

Una valutazione più accurata è possibile solo quando il

sistema sarà in funzione

(76)

Valutazione del sistema

Due Esperimenti

Nel primo caso valutiamo i benefci dell'introduzione della leadership confrontando i risultati con quelli

ottenuti con tecniche standard non pesate

Nel secondo valutiamo la soddisfazione degli utenti

sulle raccomandazioni fornite

(77)

Practical Example 1

Task: planning a trip in a city

We evaluated only the users’ behaviors in the decision making process and the impact of our consensus function on a simple case of binary selection of POI

14 groups of friends

composed in the average of 3,36 friends

46 users (26 men and 20 women)

average age was 27,3

graduate education

(78)

Practical Example 1

Phase 1

each user register on a specifc web site using the credentials of facebook.com

he/she was asked to imagine to plan a one-day visit in a specifc city and to select:

three activities for the day, from a checklist of ten items

two restaurants, from a check list of eight

(79)

Practical Example 1

Phase 2

The group was, asked to discuss, face-to-face, in order to obtain a shared and unique decision for the group (rGT).

(80)

Practical Example 1

(81)

Results

Valutazione: similarità con la scelta fnale del gruppo

Le strategie “pesate” hanno risultato migliori

In particolare, ravg ha risultati migliori per quasi tutti i gruppi valutati

(82)

Practical Example 2

Task: planning a trip in a city

We evaluated the user's acceptance and satisfaction respect to the recommendations provided

17 groups of friends

composed in the average of 3,1 friends

53 users (26 men and 27 women)

average age was 26,8

graduate education

(83)

Practical Example 2

Phase 1

each user register on a specifc web site using the credentials of facebook.com

he/she was asked to imagine to plan a one-day visit in a specifc city and to evaluate, providing ratings, ten activities and eight restaurants

(84)

Practical Example 2

Phase 2

Users were presented with both the recommendation provided by using the rfair and ravg functions (as in the previous case the top fve activities were recommended).

The associated ratings for the proposed activities provided by all the membersof the group in the frst phase were shown.

(85)

Practical Example 2

Phase 2

Finally, each user was requested to answer the following questions:

1) Which of the two proposed itineraries do you prefer?

[None/First/Second/Both]

2) How do you rate the frst itinerary? [1 to 5]

3) How do you rate the second itinerary? [1 to 5]

4) How much have you been infuenced in the evaluations by the knowledge of your friends’ ratings? [1 to 5].

(86)

Results

Mediamente 2 membri su 3 accettano le raccomandazioni proposte

Considerando gli utenti singoli, 49 su 53 hanno accettato almeno una soluzione proposta

Le valutazioni sulle soluzioni proposte sono buone (4 su 5)

(87)

Conclusions

Semplici weighted social function possono aiutare nella raccomandazione di gruppo

La strategia AVG ottiene risultati migliori in caso di selezioni binarie, mentre la FAIR sembra essere più performante in caso di rating espliciti

Necessarie valutazioni più approfondite in fase di

sperimentazione del portale

(88)

Riferimenti

Documenti correlati

Current address: Dipartimento di Matematica, Universit` a degli Studi di Roma Tor Vergata, Via della Ricerca Scientifica - 00133 Roma, Italy. E-mail

Observed (O) and expected (E) cases, and standardized incidence ratios (SIR) of selected" subsequent cancer sites after an initial diagnosis of testicular cancer, and

In that case, we could say that the external control, and in particular the accounting control entrusted to the external auditor, should not be mandatory, but it should be quite

In a list of distinct positive integers, say that an entry a is left-full if the entries to the left of a include 1,.. Show that the number of arrangements of n elements from

With motivation by the inverse scattering transform and help from the state-space method, an explicit formula is obtained to express such exact solutions in a compact form in terms of

The 3 rd i-TREC 2018 was proudly organized by the Tropical Renewable Energy Center – Faculty of Engineering Universitas Indonesia.. The 3 rd i-TREC 2018, which was held in

Going back to the Roman Zoo history, poor records and lack of scientific publications have in many cases prevented the full scientific use of the mammalian

It entails a modification of perceptions and expectations of economic agents which end up dealing with an institutional arrangement whose degree of transparency (all prices