Preserving privacy of contents in Decentralized Online Social Networks

(1)

PhD in Computer Science - XXIX cycle

Ph.D. Thesis

Preserving privacy of contents in

Decentralized Online Social Networks

Andrea De Salve email: [email protected]

Supervisor: Prof. Laura Ricci University of Pisa

Supervisor: Dr. Paolo Mori

IIT - National Research Council

(2)

(3)

Traditional Online Social Networks (OSNs), which are based on a single service provider, suffer several drawbacks, first of all the privacy issues arising from the delegation of user data to a single entity, the OSN provider. In the last years, Dis-tributed Online Social Networks (DOSNs) have been proposed to shift the control over user data from the OSN provider to the users of the DOSN themselves. Indeed, in contrast to centralized OSNs (such as Facebook), DOSNs are not based on cen-tralized storage services which decide the term of service and the contents shared by the users are stored on the devices of the users themselves. However, the lack of a centralized entity introduces new interesting challenges, like that of guaranteeing the availability of user’s contents when the user disconnects from the DOSN and the privacy of the user’s contents.

In this dissertation we investigate the problem of preserving the privacy of the con-tents shared by users of DOSNs by focusing on two different aspects: i) the need of defining proper privacy policies for content access and ii) the storage (allocation and replication) of these contents on the nodes which build up the DOSN system. When efficiency has to be taken into account, new solutions have to be devised that minimize as much as possible the overhead introduced by the enforcement of the privacy policy and enable a higher contents availability through distribution and replication. Current solutions fall short in meeting the above criteria, while in this dissertation we proposed two approaches which adopt two very different solutions to guarantee the protection of the contents according to the privacy preferences of users and efficiently reduce the overhead introduced by privacy enforcement mechanisms. The proposed approaches are validated by an experimental campaign based on data obtained from a real OSN which have enabled the definition of a set of simulations taking into account realistic scenarios. The experimental results obtained from a set of simulations performed on real traces show the effectiveness of our approaches.

(4)

(5)

engaging in social engineering.” — Jaron Lanier

(6)

(7)

Introduction 17

I.1 Thesis Contribution . . . 20

I.2 Outline . . . 23

I

Basic concepts

27

1 Background 29 1.1 Online Social Networks . . . 29

1.2 P2P Systems . . . 31

1.3 The P2P look-up service: DHT . . . 33

1.3.1 Pastry . . . 34

1.3.2 Chord . . . 36

1.4 Decentralized Online Social Networks . . . 37

1.5 Conclusion . . . 39

2 Related Works 41 2.1 Decentralizing the Online Social Networks . . . 41

2.2 Privacy requirements in DOSNs . . . 45

2.3 Privacy model . . . 47

2.3.1 Advanced privacy policy mechanisms . . . 51

2.4 Privacy policy management . . . 52

2.4.1 Initialization . . . 53

2.4.2 Updating privacy policy . . . 57

2.5 Evaluation and Discussion . . . 62

2.5.1 Evaluation of the privacy models . . . 64

2.5.2 Evaluation of the privacy policy management . . . 66

II

Modelling Online Social Networks

73

(8)

3.1 Modelling social profiles . . . 75

3.1.1 The Profile Tree . . . 76

3.1.2 Actors and actions on the contents . . . 79

3.2 Privacy policy definition . . . 79

3.2.1 Attributes of the OSNs . . . 82

3.3 A general DOSN reference architecture . . . 84

4 Characterizing users behaviour and groups in OSNs 87 4.1 The Facebook Ego Network Dataset . . . 87

4.1.1 Characterizing user groups in Facebook Ego Network dataset . 89 4.1.2 Characterizing user temporal properties in Facebook Ego Net-work dataset . . . 90

4.2 The Facebook groups dataset . . . 92

4.2.1 Groups leave and join in Facebook group dataset . . . 94

III

Cryptography-based model for privacy protection of

97

5 Logical Key Hierarchy-based privacy control 99 5.1 Motivation and Contribution . . . 99

5.2 Group-based communication in OSNs: Use Cases . . . 101

5.3 The Logical Key Hierarchy (LKH) model . . . 102

5.4 Privacy-preserving group management . . . 104

5.4.1 Group creation . . . 106

5.4.2 Join . . . 107

5.4.3 Eviction . . . 110

5.4.4 Publishing and retrieving contents . . . 111

6 LKH-based approach: Experimental evaluation 113 6.1 Experimental Framework . . . 113 6.2 Group Creation . . . 114 6.3 Group Join . . . 115 6.3.1 Single user . . . 115 6.3.2 Multiple users . . . 116 6.4 Group Leave . . . 122 6.4.1 Single user . . . 122 6.4.2 Multiple users . . . 123 6.5 Performance evaluation . . . 123

(9)

IV

Policy-based model for privacy protection of contents

131

7 Policy-based privacy control 133 7.1 Motivation and Contribution . . . 133

7.2 The privacy support . . . 135

7.3 The authorization system . . . 136

7.4 Privacy preserving content replication . . . 137

7.4.1 The replication framework . . . 138

7.4.2 The Algorithms . . . 139

7.5 Privacy and temporal aware content replication . . . 142

7.5.1 Using linear predictor to estimate user availability . . . 143

7.5.2 Extending the linear predictor with intervals . . . 149

7.5.3 Replica selection algorithm . . . 151

8 Policy-based approach: Experimental evaluation 153 8.1 Evaluation of the authorization framework . . . 153

8.1.1 Privacy policy reference examples . . . 154

8.1.2 Experimental results . . . 154

8.2 Evaluation of the linear predictor . . . 158

8.2.1 User’s selection strategy . . . 159

8.2.2 Simulation settings and Performance measures . . . 159

8.2.3 Experimental results . . . 160

8.3 Evaluation of the privacy preserving approach . . . 166

8.3.1 Privacy policy reference examples . . . 166

8.3.2 Experimental results: the replication model . . . 167

8.3.3 Experimental results: the privacy and temporal aware repli-cation . . . 171

V

Comparison and Discussion

177

9 Comparisons of the different approaches 179 9.1 Experimental Framework used for comparison . . . 179

9.1.1 Policy-based access control . . . 181

9.1.2 LKH-based access control . . . 183

9.1.3 ACL-based access control . . . 186

(10)

9.2.1 Content publish . . . 189

9.2.2 Join . . . 192

9.2.3 Leave . . . 197

9.3 Discussion . . . 202

VI

Conclusion

205

10 Conclusion and Future Work 207 10.1 Concluding remarks . . . 207

10.2 Future work . . . 209

Appendices 211 A The P2P simulators 213 A.1 The Peersim simulator . . . 213

A.2 The PeerfactSim.KOM simulator . . . 216

B The XACML architecture 221 B.1 The XACML Language . . . 221

(11)

Basic concepts

29

Background 29

1.1 P2P systems and applications . . . 31

1.2 General P2P look-up service . . . 33

1.3 Pastry routing information . . . 35

1.4 Pastry routing algorithm . . . 36

1.5 Routing information Chord . . . 36

1.6 Challenges in DOSNs . . . 38

Related Works 41 2.1 Components of the privacy policy . . . 46

2.2 Comparison of the security mechanisms in DOSNs . . . 48

2.3 A taxonomy for privacy models . . . 65

Modelling Online Social Networks

75

Social data and Privacy Policies 75 3.1 General profile tree of a user . . . 77

3.2 Tree representing a profile of a user . . . 80

3.3 Attributes of the OSNs . . . 82

3.4 A reference architecture of a general-purpose DOSN platform . . . . 84

Characterizing users behaviour and groups in OSNs 87 4.1 Temporal properties of the Facebook users . . . 92

4.2 Properties of groups based on Work & Education . . . 95

4.3 Properties of groups based on News & Info . . . 95

(12)

Cryptography-based model for privacy protection of

con-tents

99

Logical Key Hierarchy-based privacy control 99

5.1 The general structure of a Key Tree . . . 104

5.2 Overview of the sytem model . . . 105

5.3 Overview of the data structures . . . 106

5.4 Addition (and removal) of a user . . . 107

5.5 Addition of users . . . 109

LKH-based approach: Experimental evaluation 113 6.1 Join operation on groups . . . 116

6.2 Number of messages created by the group owner . . . 118

6.3 Overhead for the join of users (group owner) . . . 119

6.4 Overhead for the join of users (joining user) . . . 120

6.5 Overhead for the join of users (other members) . . . 121

6.6 Leave operation of group . . . 122

Policy-based model for privacy protection of contents

133

Policy-based privacy control 133 7.1 The overlay networks of the system . . . 135

7.2 The privacy framework architecture . . . 136

7.3 General profile tree data structure . . . 139

7.4 Weights coefficients vectors . . . 147

7.5 Computation of the prediction . . . 148

Policy-based approach: Experimental evaluation 153 8.1 Number of profile migrations and requests . . . 155

8.2 Evaluation time of the privacy policies . . . 156

8.3 Total evaluation time . . . 157

8.4 Comparison of different coefficients vectors . . . 161

8.5 Precision of the linear predictor . . . 162

8.6 Precision at different instants of a day . . . 163

8.7 Precision for different number of predicted units . . . 165

8.8 Number distinct users and transitions . . . 166

8.9 Distribution of the transitions . . . 167

8.10 Number of contents and trusted replicas . . . 168

8.11 Assessment of contents availability and load . . . 169

(13)

8.13 Statistics on the contents created . . . 173

8.14 Statistics on the availability of the contents . . . 174

8.15 Statistics on the traffic . . . 175

Comparison and Discussion

179

Comparisons of the different approaches 179 9.1 Computation time for the publication of a content . . . 190

9.2 Computation time and message size of the Policy-based . . . 191

9.3 Computation time and messages size of the LKH-based and ACL-based192 9.4 Evaluation of the join with backward secrecy . . . 195

9.5 Evaluation of the join without backward secrecy . . . 197

9.6 Evaluation of the leave with backward secrecy . . . 200

9.7 Evaluation of the leave without backward secrecy . . . 203

Appendices

213

The P2P simulators 213 A.1 Interface of the simulator . . . 214

A.2 PeerSim simulation models . . . 215

A.3 Peersim: Life-cycle of the cycle-based simulation . . . 216

A.4 The architecture of the PeerFactSim simulator. . . 217

A.5 Simulation time in discrete event-based simulations. . . 218

The XACML architecture 221 B.1 An XACML policies structure. . . 222

(14)

(15)

Basic concepts

29

Background 29

1.1 Privacy options and centralized OSNs . . . 30

Related Works 41 2.1 Table of notations and symbols . . . 53

2.2 Architectural style of DOSNs . . . 63

2.3 Classification of the privacy model . . . 66

2.4 Overhead for privacy policy definition . . . 67

2.5 Overhead for user addition . . . 69

2.6 Overhead for user removal . . . 70

Modelling Online Social Networks

75

Characterizing users behaviour and groups in OSNs 87 4.1 Descriptive characteristics of the dataset . . . 89

4.2 Size and number of groups in Facebook . . . 90

4.3 Characteristics of the monitored groups . . . 93

Cryptography-based model for privacy protection of

con-tents

99

Logical Key Hierarchy-based privacy control 99 5.1 Type of groups in OSNs . . . 102

5.2 Table of notation . . . 103

LKH-based approach: Experimental evaluation 113 6.1 Overhead for the creation of a group . . . 114

(16)

6.2 Overhead for joining one user . . . 115

6.3 Overhead for the join of multiple users . . . 117

6.4 Overhead for the leave of a user . . . 122

6.5 Overhead for the leave of multiple users . . . 123

6.6 Cost of encryptions algorithms . . . 124

6.7 Computation time and Messages size . . . 125

6.8 Overhead of the security mechanisms in DOSNs . . . 126

6.9 Computation time taken by DOSN security mechanisms . . . 129

Policy-based model for privacy protection of contents

133

Policy-based privacy control 133 7.1 Notations and symbols used in our model . . . 145

Policy-based approach: Experimental evaluation 153 8.1 Parameters of the experiments . . . 159

Comparison and Discussion

179

Comparisons of the different approaches 179 9.1 Types of groups available in OSNs . . . 181

9.2 The parameters of the simulation . . . 188

9.3 Cost of the encryption algorithms . . . 189

9.4 Cost for the join with backward secrecy . . . 193

9.5 Cost for the join without backward secrecy . . . 196

9.6 Cost for the leave with backward secrecy . . . 198

9.7 Cost for the leave without backward secrecy . . . 201

(17)

In international journals

[DS+15] Andrea De Salve, Barbara Guidi, Tobias Amft, Kalman Graffi, and Laura Ricci. “DiDuSoNet: A P2P architecture for distributed Dunbar-based social networks”. In: Peer-to-Peer Networking and Applications (2015), pp. 1–18.

[DS+16d] Andrea De Salve, Marco Dondio, Barbara Guidi, and Laura Ricci. “The impact of user’s availability on On-line Ego Networks: a Face-book analysis”. In: Computer Communications 73 (2016), pp. 211– 218.

[DS+17a] Andrea De Salve, Roberto Di Pietro, Paolo Mori, and Laura Ricci. “A Logical Key Hierarchy Based Approach to preserve content privacy in Decentralized Online Social Networks”. In: IEEE Transactions on Dependable and Secure Computing (2017).

[DSGR17b] Andrea De Salve, Barbara Guidi, and Laura Ricci. “Evaluation of Structural and Temporal Properties of Ego Networks for Data Avail-ability in DOSNs”. In: Mobile Networks and Applications (2017), pp. 1– 12.

In international conferences

[DS+14b] Andrea De Salve, Marco Conti, Barbara Guidi, Francesco Pitto, and Laura Ricci. “Trusted Dynamic Storage for Dunbar-Based P2P Online Social Networks.” In: OTM Conferences. Springer. 2014, pp. 400–417. [DS+16a] Andrea De Salve, Barbara Guidi, Paolo Mori, and Laura Ricci. “Dis-tributed Coverage of Ego Networks in F2F Online Social Networks”. In: Ubiquitous Intelligence & Computing, Advanced and Trusted Com-puting, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress, 2016 Int. IEEE Conferences. IEEE. 2016, pp. 423–431.

(18)

[DS+16b] Andrea De Salve, Roberto Di Pietro, Paolo Mori, and Laura Ricci. “Logical key hierarchy for groups management in Distributed Online Social Network”. In: Computers and Communication (ISCC), 2016 IEEE Symposium on. IEEE. 2016, pp. 710–717.

[DS+16c] Andrea De Salve, Paolo Mori, Laura Ricci, Raed Al-Aaridhi, and Kalman Graffi. “Privacy-Preserving Data Allocation in Decentralized Online Social Networks”. In: Distributed Applications and Interopera-ble Systems. Springer. 2016, pp. 47–60.

[DS+17b] Andrea De Salve, Barbara Guidi, Paolo Mori, Laura Ricci, and Vin-cenzo Ambriola. “Privacy and temporal aware allocation of data in De-centralized Online Social Networks”. In: Green, Pervasive and Cloud Computing, Amalfi Coast, Italy, The 12th International Conference on. Springer. 2017.

[DSGR16] Andrea De Salve, Barbara Guidi, and Laura Ricci. “An analysis of ego network communities and temporal affinity for online social networks”. In: Smart Objects and Technologies for Social Good, Venice November - 2016, 2nd EAI International Conference on. Springer. 2016.

[DSGR17a] Andrea De Salve, Barbara Guidi, and Laura Ricci. “A data aggrega-tion strategy based on Wavelet for the Internet of Things”. In: 19th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing. 2017.

[DSMR15] Andrea De Salve, Paolo Mori, and Laura Ricci. “A Privacy-Aware Framework for Decentralized Online Social Networks”. In: Database and Expert Systems Applications. Springer. 2015, pp. 479–490.

In international workshop

[DS+14a] Andrea De Salve, Marco Conti, Barbara Guidi, and Laura Ricci. “Epi-demic diffusion of social updates in dunbar-based dosn”. In: European Conference on Parallel Processing. Springer International Publishing. 2014, pp. 311–322.

[DSMR17c] Andrea De Salve, Paolo Mori, and Laura Ricci. “Evaluating the impact of friends in predicting user’s availability in Online Social Networks”. In: 1st International Workshop on Personal Analytics and Privacy (In conjunction with ECML PKDD 2017). 2017.

(19)

Technical report

[DSMR17b] Andrea De Salve, Paolo Mori, and Laura Ricci. “Comparison of the Privacy Controls for Group Communication in Decentralized Online Social Networks”. In: Technical Report - University of Pisa - Depart-ment of Computer Science (2017).

Submitted papers

[DSGM17] Andrea De Salve, Barbara Guidi, and Paolo Mori. “Predicting the availability of users devices in Decentralized Online Social Networks”. In: Concurrency and Computation: Practice and Experience (2017). Under second round of revisions.

[DSMR17a] Andrea De Salve, Paolo Mori, and Laura Ricci. “A Survey on Pri-vacy in Decentralized Online Social Networks”. In: Computer Science Review Journal (2017). Under review.

(20)

(21)

Recent years have seen unprecedented growth in the Online Social Network ser-vices, with about 300 OSNs collecting information on more than half a billion reg-istered users1. The OSNs have changed the way of how people interact with each other by enabling their users to define their own profiles, a virtual representation of themselves, containing personal information, photos, posts and the relationships established with the profile of the other users. Regardless of their purpose, the main service provided by OSNs is the possibility to easily share information with a set of selected contacts [Ell+07]. Users can publish information of very hetero-geneous types, ranging from personal information, private messages, wall posts, photos, videos, etc.

The most popular OSNs are based on a centralized architecture where the service provider (e.g. Facebook, Twitter and Google+) acts as central authority and takes control over users’ information, by storing a huge amount of private and possibly sen-sitive information on users and their interactions (such as the personal information and lifestyle behaviours). Users of the OSNs generate a high-volume of valuable in-formation which introduces privacy risks. For instance, Facebook users share more than 30 billion pieces of content and interact with over 900 million objects (e.g. posts, comments, etc.) each month [Fac]. When using centralized OSNs, users have to accept the policies of the service which give to the providers the right to use the users’ data for several purposes, even though these data were intended for users’ friends only. For instance, the terms of service (TOS) established by Facebook allow users to delete their account from the network at any time, but Facebook still has the right to access and exploit their old contents [Tos]. In particular, the information generated by users of the centralized OSNs are typically exploited by the central-ized service provider for targeted advertising, product recommendations and any kind of forecasts. In contrast to the centralized architecture, information generated by users of the OSNs are usually intended for a specific audience only (the friends of the users or a subset of them). Due to their centralized infrastructures, users of the current OSNs are forced to share the information intended for their friends by means of the service providers, increasing the risk of censorship, surveillance and information revelation. Indeed, recent events have shown that centralized OSNs introduce several privacy issues that expose users to a number of risks. If not

(22)

erly protected, data of the OSNs can be used by malicious users to infer personal information [Lin+09; Pon+12] or to perform other harmful activities [AG09]. For instance, in several recent cases, providers of centralized OSNs have been blamed for giving to end users a false sense of control over their data privacy [SSW10; SSP09; Zha+10], which introduces several privacy issues exposing users to a number of risks [GA05; Hog07; Str07].

Recent events have shown that, in addition to malicious users (internal or exter-nal to the OSN), also the centralized service provider [O’C10; GM13] and third-party applications (e.g. event organizer, photo editor, etc.) introduce new security and privacy risks [SF10]. As for example, the NSA documents clearly illustrate how the agencies collected users information by exploiting the weaknesses of the Facebook’s security platform [GM13]. Moreover, this centralized OSN infrastructure has sev-eral drawbacks, such as the dependency on a single service provider, no re-use of data coming from other OSNs [Yeu+09], service availability or scalability [Dat+10]. However, the lack of privacy with respect the centralized provider is one of the main concerns that has driven both scientists and the open source community towards the development of alternative OSN platforms able to shift the control over data to the end-users.

A current trend for developing OSNs is towards the decentralization of the OSN service. A Decentralized Online Social Network (DOSN) [Dat+10] is a OSN imple-mented in a distributed and decentralized way, such as a network of trusted servers or a P2P system. Instead of being based on a single service provider which manages the whole system by storing all the users’ data, a DOSN consists of a (dynamic) set of peers, which collaborate to implement the services needed to provide users with a seamless social network experience. Therefore, DOSNs shift the control over users’ profiles from the single OSN provider to the peers that build up the DOSN (i.e., to the users these peers belong to), thus solving some usual social network manage-ment issues, such as scalability, operating costs, and user privacy with respect to the single OSN provider [Dat+10]. As matter of fact, Diaspora2 is one major active DOSN currently available and it is attracting more than 669,000 users.

However, the decentralization of data management introduces some new issues, such as the need for guaranteeing the privacy of the contents published by DOSNs users with respect to the other users and the strategy for the allocation and repli-cation of contents on the peers. The privacy of the users’ profile data in DOSNs presents new interesting challenges which involve two different aspects: i) the ac-cess control on the contents of the user’s profile, and ii) the storage (allocation and replication) of the profile data on the peers which builds up the DOSN system. In-deed, the user data must be kept private according to the preferences expressed by its owner but, at the same time, they should be available to authorized users even when the owner is not online. As the number of users’ contacts as well as the num-ber and the type of contents shared on OSNs are constantly increasing, memnum-bers

(23)

of DOSNs need efficient ways to define authorizations that protect their contents and to properly enforce them in order to disclose these contents only to authorized people.

Regardless of their architectures, one of the main features provided by current OSNs is the capability of the users to define privacy preferences on the contents of their profiles, i.e., to define which other users are allowed to see their contents. Most existing DOSNs (and OSNs in general) enable users to organize their connections with friends in several groups, for instance according to the type of relationship (e.g., colleagues, school mates, family) or to their preferences, such as shared features (e.g., users’ interests or hobbies) [CF10]. Moreover, users have the capability to create discussion groups aimed to link together users of the network who share similar interest on specific topics (such as sport or photography) [ML14; GT09]. In both cases, the user privacy preferences are based on a group communication model which requires content delivery from one user (sender) to one of these (possibly large) groups of friends (receivers). Furthermore, these groups are highly dynamic: new members can be added to the group and existing members can be removed at any time by the group owner.

In general, the peers of a DOSN are organized according to an overlay network (such as Distributed Hash Table or DHT, [Bal+03]) where the contents of a user may be stored on any peer of the overlay. Therefore, peers hosting the contents published for a group could not be allowed to access them according to the privacy preferences specified by the group owner. This implies that contents can be stored on untrusted peers and that proper security mechanisms must be adopted in order to prevent undesired disclosure of the users contents to the owners of the peers storing such profiles. Usually, existing DOSNs protect the privacy of the contents allocated on users’ peers adopting encryption techniques, having only authorized parties being able to decrypt it.

As for example, in [Gra+11] to achieve fine-grained access control, every time a user wants to share a content with a group G of users (according to the privacy policy), the content is encrypted with a new symmetric key before being stored and, in turn, that symmetric key is encrypted separately with the public keys of the users belonging to G. Other DOSNs that have been proposed, try to reduce as much as possible this overhead by exploiting a shared symmetric key for contents publication. The typical solution is to have each group paired with its symmetric encryption key, and data representing the content is encrypted by the publisher for the users of the group with this symmetric group key, before being stored on the peers of the overlay network. In turn, the symmetric group key is securely distributed to each user of the group by using his public key. In this way, each authorized user is able to decrypt the symmetric group key with his private key and can access the content published in the group. Every time a new user is added to the group, the symmetric group key needs to be changed and securely distributed to the users of the group. Similarly, every time an existing member is removed from a group, the corresponding symmetric group key must be changed as well, in order to

(24)

prevent the leaving member from accessing the new contents that will be posted to the same group (i.e., enforcing forward secrecy). As a result, when the composition of the group changes (because of either a join or a leave) the symmetric group key needs to be changed and securely distributed to the current members of the group. However, the previous solution is affected by a serious drawback: it is not scalable because the number of asymmetric encryption operations to be executed to remove a user from a group is linear on the number of users belonging to that group, and this could cause a relevant overhead in case of large groups.

As a matter of fact, users tend to have a significantly large number of friends in their networks (e.g. 27 % of 18–29 year old Facebook users have more than 500

3_{). Authors of [BB13; Gra+09; Nil+12] investigated the overhead introduced by}

encryption schemes used in current DOSNs, by highlighting the impact they have on performance and user experience. In particular, they showed that the majority of DOSNs have a cost per user removal from a group proportional to the size of the group.

I.1 Thesis Contribution

DOSNs present new interesting issues concerning the protection of the users’ in-formation. Several studies on DOSNs underestimate the impact of the proposed privacy preserving techniques, while focusing primary on decentralization of the system. We believe that the design of current DOSNs may be further improved in terms of performance by defining proper privacy preserving techniques which take into account the users needs concerning contents sharing. For this purpose, in this thesis, we focus on the challenge of preserving the privacy of the contents shared to groups of users in DOSN. In order to deal with the complexity demanded by the OSN scenario new efficient solutions have to be devised that have to: i) enable users to define privacy policies on their contents ii) minimize as much as possible the overhead introduced by security mechanisms due to the enforcement of privacy policies and iii) increase the availability of the users’ contents without diverging from their privacy preferences. While current solutions fall short in meeting all the above criteria, this thesis proposes two very different approaches for preserving the privacy of the users’ contents meeting the above requirements.

The first approach enhances the current encryption based techniques for guar-anteeing the privacy of contents in DOSNs by exploiting the strength of the Logical Key Hierarchy model (LKH) [WGL00] for managing the key required for contents encryption. The LKH model exploits the hierarchical properties of tree data struc-tures in order to reduce the number of encryption operations needed for group management. In this approach the underling overlay network is used in order to guarantee the replication and higher availability of contents. For this reason, con-tents are protected against users who have not the permission to access them by

(25)

using encryption and they can be stored on any peers of the overlay network. Current solutions are affected by a serious drawback because require O(n) num-ber of encryption operations to be executed to remove a user from a group of n users, and this could cause a relevant overhead in case of large groups. Indeed, we monitored a number of real Facebook groups (see Chapter 4), and we found out that these groups tend to have a significantly large number of members (about 9000 members on average). In addition, analysis of actual data revealed that, in some cases, there is a high number of users who request to leave or join the group simul-taneously (at most 350 leave operations and 600 join operations per day). For these reasons, we propose an approach which enables to efficiently share contents with groups of users and manage the dynamism of DOSN user groups: i) by minimizing the re-encryption of the contents published in a group when the composition of the group changes and ii) by enabling a fast distribution of the cryptographic keys to all the members (n) of a group, each time a set of users is removed from or added to the group by the group owner. Indeed, our approach requires only O(d · logd(n))

encryption operations when a user is removed from a group (where d is an input parameter of the system), and O(2 · logd(n)) when a user joins the group.

In order to manage the volume of such operations that our group management ap-proach has to perform in order to accommodate the corresponding load, we provide as several further contribution the definition of a strategy for adding a bunch of users, all at the same time, to a user group. The proposed strategy performs proper insertion of the joining users to reduce as much as possible the number of encryption operations. This improvement allows to strongly affect the system performance by further reducing the number of encryption/decryption operations performed by the users.

The study conducted on the groups of real OSNs about the dynamics of such groups, as well as on the size of the social groups that users can define on the basis of common features and interests are exploited to set up the parameters of a realistic simulation environment. As a matter of fact, a further contribution of this thesis is an extensive set of experiments conducted to evaluate the performance of the proposed approach. The proposed approach allows to efficiently reduce the overhead introduced by security mechanisms, but it provides users with limited flexibility when specifying such privacy policies because users. Indeed, privacy policies provided by current DOSNs are limited to few basic privileges (such as friends, friends of friends, etc) and the adopted solutions to support users when defining privacy preferences lack of expressiveness and flexibility.

The second approach proposed for preserving the privacy of users’ contents is not based on cryptography but on an different solution that supports the user in the definition of advanced privacy policies while reducing the overhead introduced by enforcement of such policies. The approach consists in allowing users to define privacy policies to regulate the accesses to their contents by means of a proper Pri-vacy Policy Language, and the enforcement of such policies is guarantee by a proper allocation strategy which exploits them for making decisions about the allocation

(26)

and replication of these contents on the peers, in order to preserve the expected users’ privacy.

In particular, we provided a privacy support with an advanced language to express privacy policies, which allow DOSNs users to exploit social network-related infor-mation (such as inforinfor-mation related to the users’ profiles and relationships among members) to define the access rights to their contents. Indeed, the policies defined by users will be able to identify the group of authorized users by exploiting different aspects (or attributes) derived from the OSN knowledge.

The proposed replication strategy allocates a copy of each content on another peer that is currently online, and whose user is allowed to access the content. The replica peers where the contents are stored are distributed by exploiting the overlay network of the DOSN. In this way, there is no need of employing encryption mechanisms to protect the confidentiality of the contents stored on the peer, because the owner of the peer is entitled to access these data according to the privacy policy of the data owner. In fact, users of the DOSNs cannot collect additional information about another user by directly inspecting the contents of allocated on their devices (e.g., by browsing the files stored in their file systems), because such users would find only the contents they are already allowed to access. To the best of our knowledge, none of existing DOSNs take into account users’ privacy policies to drive the tasks of the underlying support, e.g. to perform a smart allocation of the profile data to the users’ peers in order to avoid encryption when possible. Instead, we believe that making the underlying infrastructure aware of the privacy preference of the users may further improve the privacy of the data owner.

While the previous proposed approach dramatically reduces the overhead introduced by the encryption, it can suffer of data availability issues because of the few users authorized to access a content. For this reason, we improve the privacy preserving replication strategy by exploiting the availability patterns of the users to define the content allocation. For this purpose, we use a linear predictor which predicts the availability status of the users (i.e., online or offline) in a future time interval on the basis of their past behaviour. We conducted an extensive evaluation of the pro-posed approach by defining a number of simulations based on the real behaviour of users extracted from a Facebook dataset, and we demonstrated that the proposed approach can effectively improve the performance of current DOSNs and ensure higher data availability.

To help the reader in the assessment of the work, the proposed solutions have been extensively evaluated by assessing their ability to implement several types of groups having different features. The types of groups identified reflect the nature and properties of possible social groups that users can define in the OSN’s scenario. We have implemented several simulations in order to measure the performance achieved by the two proposed approaches for implementing the different operations available on a group, i.e. content publish, join, and leave. The evaluation provides better understanding of the limits and benefits of each approach. In addition, we compare the computational and space cost introduced by our approaches against the overhead

(27)

introduced by popular, state of art, solution used by our competitors. Finally, the suitability of each approach to support the different types of groups is also evaluated. The comparison of the two different approaches against typical solution adopted by current DOSNs provides important insights and advice concerning the practical utilization of such approaches.

The proposed approaches are logically implemented above a general DOSN ref-erence architecture, which abstracts from the specific platform features by focusing manly on the core services provided by DOSNs system (such as the storage or the content sharing service component). Since we focus on a general DOSN reference architecture, the solution proposed in this thesis does not depend on the specific DOSN architecture since it only assumes only the presence of the allocation mecha-nisms of the DOSN. As results, they can be easily adapted to any existing platform or allocation mechanisms for handling data availability in current DOSNs.

Overall, the comprehensive assessment provided by our work clearly indicates that our approaches outperforms current DOSNs mechanisms for guaranteeing pri-vacy of contents by reducing the overhead introduced by the encryption operations, in the first approach and providing higher availability of data, in the second one. To the best of our knowledge, our study uncovered a number of important find-ings related to both the proposed approaches and the specific privacy requirements demanded by users of DOSNs.

I.2 Outline

The contributions previously defined are presented in the rest of the thesis. In the following, we describe the structure of the thesis and the entries list of the publications related to each part.

Part I: Basic concepts We introduce the reader to the fundamental information related to both the Online Social Networks and the Decentralized Online Social Networks, which give to the reader the basis for understanding the scenario considered in the rest of the thesis.

Chapter 1 reviews the features of current Online Social Networks, as well as the approach used to decentralize them. We focus on the challenges raised by the decentralization of the Online Social Network by using the P2P approach.

Chapter 2 presents a comprehensive comparison to the privacy mechanisms used by the current DOSNs, allowing the reader to better understand the main privacy requirements of DOSNs, and how they impact the perfor-mances of the current DOSN implementations.

(28)

• [DSMR17a] Andrea De Salve, Paolo Mori, and Laura Ricci. “A Survey on Privacy in Decentralized Online Social Networks”. In: Computer Science Review Journal (2017). Under review

Part II: Modelling Online Social Networks We provide a detailed model of the reference scenario by identifying the properties of privacy policies defined by users and the resources/actors involved during the management of the privacy policies.

Chapter 3 describes the types of contents that need to be protected and how they are organized, as well as the actors involved during the protection and/or governance of the shared contents.

Chapter 4 characterizes the properties of users in OSNs by investigating how the users behave and the temporal properties of social groups of users. The results obtained from the analysis of real OSNs are exploited in the rest of the thesis to better evaluate the performance of the proposed solutions.

Contributions to the chapter

• [DS+16d] Andrea De Salve et al. “The impact of user’s availability on On-line Ego Networks: a Facebook analysis”. In: Computer Communications 73 (2016), pp. 211–218

• [DSGR16] Andrea De Salve, Barbara Guidi, and Laura Ricci. “An anal-ysis of ego network communities and temporal affinity for online social networks”. In: Smart Objects and Technologies for Social Good, Venice November - 2016, 2nd EAI International Conference on. Springer. 2016 • [DSGR17b] Andrea De Salve, Barbara Guidi, and Laura Ricci.

“Evalu-ation of Structural and Temporal Properties of Ego Networks for Data Availability in DOSNs”. In: Mobile Networks and Applications (2017), pp. 1–12

Part III: Cryptography-based model for privacy protection of contents We present the first main contribution of the thesis which consists in adopting a new encryption based approach for guaranteeing the privacy of contents in DOSNs. The proposed approach exploiting the strength of the Logical Key Hierarchy model (LKH) for managing the security requirements for contents encryption and to efficiently manage the dynamism of DOSN user groups. Chapter 5 discusses in more detail the architecture of the proposed system

and the data structures used by our approach in order to implement the privacy preserving group management. In addition, it provides a solution in the case of a higher number of users must be added simultaneously to a group.

(29)

Chapter 6 evaluates the quality and viability of our approach by presenting results we obtained by an experimental campaign run over a real data set extracted from Facebook.

• [DS+16b] Andrea De Salve et al. “Logical key hierarchy for groups man-agement in Distributed Online Social Network”. In: Computers and Com-munication (ISCC), 2016 IEEE Symposium on. IEEE. 2016, pp. 710–717 • [DS+17a] Andrea De Salve et al. “A Logical Key Hierarchy Based Ap-proach to preserve content privacy in Decentralized Online Social Net-works”. In: IEEE Transactions on Dependable and Secure Computing (2017)

Part IV: Policy-based model for privacy protection of contents We discuss the second main contribution of the thesis, proposing a new privacy preserving approach for DOSNs which exploits the privacy policies of the users to increase the availability of the users’ contents without diverging from their privacy pref-erences. In addition, it presents also the experimental results obtained from the simulations on traces taken from a real OSN.

Chapter 7 defines the privacy framework we proposed in order to allow users to define flexible privacy policies to regulate the accesses to the content they have shared by means of a proper Privacy Policy Language. The pro-posed approach has been designed to exploit both the temporal behaviour of users and the users’ privacy policies, for making decisions about the allocation and replication of the users’ profile data on the peers.

Chapter 8 presents the evaluation of our the different components of our proposal by using privacy policies which can be derived from a real OSN. In particular, we evaluated the performance of the following components: the privacy framework, the temporal properties, and the replica selection strategy.

• [DSGM17] Andrea De Salve, Barbara Guidi, and Paolo Mori. “Predicting the availability of users devices in Decentralized Online Social Networks”. In: Concurrency and Computation: Practice and Experience (2017). Un-der second round of revisions

• [DS+16c] Andrea De Salve et al. “Privacy-Preserving Data Allocation in Decentralized Online Social Networks”. In: Distributed Applications and Interoperable Systems. Springer. 2016, pp. 47–60

(30)

• [DSMR15] Andrea De Salve, Paolo Mori, and Laura Ricci. “A Privacy-Aware Framework for Decentralized Online Social Networks”. In: Database and Expert Systems Applications. Springer. 2015, pp. 479–490

• [DS+17b] Andrea De Salve et al. “Privacy and temporal aware allocation of data in Decentralized Online Social Networks”. In: Green, Pervasive and Cloud Computing, Amalfi Coast, Italy, The 12th International Con-ference on. Springer. 2017

Part V: Comparison and Discussion We compare the approaches we proposed against the popular solutions adopted by the current DOSNs and we discuss the advantages and limits of the proposed approach.

Chapter 9 presents the types of possible social groups that users of the DOSNs are able to define and measures the performance that each ap-proach achieves for different operations on a group. Finally, it assesses the suitability of each approach to support the different types of groups. Contributions to the chapter

• [DSMR17b] Andrea De Salve, Paolo Mori, and Laura Ricci. “Compari-son of the Privacy Controls for Group Communication in Decentralized Online Social Networks”. In: Technical Report - University of Pisa - De-partment of Computer Science (2017)

Conclusion and Appendices We terminate by drawing the conclusions and fu-ture works. In addition, we provide the supplemental materials related to the thesis.

Chapter 10 discusses the conclusions of the thesis and the possible improve-ments we plan to design as future works.

Appendices A,B presents an overview of the different P2P simulators used to assess the performance of the approaches, as well as the detailed XACML architecture used by our approach in order to define and evalu-ate the privacy policies.

(31)

(32)

(33)

Background

Abstract

In this chapter we introduce the reader to the main background notions related to the research field of DOSN (and OSN in general), by considering only the aspects on which this thesis is focused. For these reasons, we intro-duce some characteristics of the most popular OSN services, the fundamental aspects of P2P systems and the main challenges for the development of a DOSN.

1.1 Online Social Networks

Popular centralized OSN services (such as that provided by Facebook) allow users to create a profile and define connections (or friendships) with the other profiles in order to share contents. In order to help users to protect their personal con-tent, current centralized OSNs adopt a simple user-centric policy management ap-proach, where a privacy aware user is able to specify a policy that defines ac-cess to their personal contents. Most of the current centralized OSNs implement very basic access control mechanisms, by simply making a user able to decide which personal information are accessible by other members by marking a given content as public, private, or accessible to their direct contacts, or by providing simple variants to this basic setting. Table 1.1 reviews from [CFP09] the privacy options and purpose of several current popular centralized OSN. Besides the ba-sic setting, each OSN offers only a limited set of privacy options that typically ranges from a completely “private” to fully “public” option. Some of these services, such as Bebo (http://www.bebo.com/), Multiply (http://multiply.com/) and Xing (http://www.xing.com/), Facebook (http://www.facebook.com), Google+ (https://plus.google.com/) and LinkedIn (http://www.linkedin.com/) make users able to define privacy rules which give access rights to specific users or a users list, namely groups of contacts in Facebook and LinkedIn or circles of contacts in

(34)

OSN Purpose Protection Options

Bebo general public, private, 1st-degree contacts, selected contacts Google+ general public, private, 1st-degree contacts, selected contacts

Facebook general public, private, 1st and 2nd degree contacts, selected contacts Friendster general private, 1st-degree contacts, users from selected continents

MySpace general public, private, 1st-degree contacts, members > 18 years old Multiply general public, private, 1st and n degree contacts, selected contacts

Flickr photos public, private, 1st degree contacts Last.fm music public, private, 1st degree contacts

Xing business public, private, selected contacts LinkedIn business public, private, selected members

Table 1.1: Privacy options and purpose of the most current popular centralized OSNs.

Google+. Other centralized OSNs, such as MySpace (http://myspace.com/) and Last.fm (http://www.last.fm/), do not allow user to define subset of their con-tacts, but they simply provide the ability to share information with all the user’s contacts (1st-degree contacts), with all the contacts of the user’s contacts (2st-degree contacts) or all contacts achieved starting from the user’s contacts (from 1st-degree to n-degree contacts).

Most OSNs (such as Friendster and Xing) allow users to create only one type of relationship, a general friendship. Recently, some OSNs (such as Facebook and Google+) have included the ability to create specific types of relationships by simply allowing users to mark them as family, work, education, etc. In contrast, some other OSNs provide a wider range of relationships’ types, such as LinkedIn and Multiply, which enable users to choose among “classmate”, “business partner”, “colleague” and other types of relations. Then, these relationship types are exploited by the user to share contents to all his contacts having a specific type of relationship.

These simple access control paradigms have the advantage of being straight-forward and easy to be implemented, but they suffer from several drawbacks. The available protection settings do not allow users to easily specify their access control requirements because they are too restrictive or too loose. As a result, they may either limit too much information sharing or grant access to unauthorized users. Typically, such simple privacy policies are platform-specific and have the advantage of being easy to be implemented for the OSN provider. On the other hand, these access control paradigms lack of flexibility because are hard to be extended by users and hard to be implemented for various OSNs. Consequently, it is not possible for a user to state specific privacy policies based on heterogeneous users requirements. Besides relationships, some other information available in the OSN can be used for access control purposes.

(35)

Figure 1.1: P2P system and applications.

In recent years several proposals (such as [AVM07; Kru+06; Cho+06; Car+11; Too+09; SSP09; Cra+02; MPP10] have been presented to address these limitations. The main aims of these works are extending the access control mechanisms to im-prove their flexibility or limit undesired information disclosure.

1.2 P2P Systems

A P2P network [She+10] is a distributed network composed of many distributed, heterogeneous, autonomous, and highly dynamic peers in which participants share a part of their own resources such as processing power, storage capacity, software, and files contents. P2P systems have been widely used for the development of several decentralized file sharing protocols (such as BitTorrent [Bit] and Kademlia [MM02]) where users can upload/retrieve contents of different types. In addition, P2P sys-tems have been also used to protect the privacy of users (such as Tor [GRS99]) and to escape government censorship by implementing fully decentralized web search engines (such as YaCy 1). The P2P system can successfully support the develop-ment of peercasting applications, such as P2P TV for streaming media contents to consumers or contents delivery networks (such as Freecast 2 _{and Spotify} 3_).

Real-time communication applications which provide VoIP, such as Skype4 can also take advantage of the P2P systems in order to reduce the costs of the infrastructure. Recently, P2P systems have been used to increase the control of the OSNs users over their data by decentralizing the OSN (such as Diaspora [DGZ] and Friendica

5_).

The participants of the P2P network act as both server and client at the same time. Despite this, some P2P architectures rely also on the presence of centralized components in the system design (such as Napster). Indeed, partly centralized P2P systems include a dedicated controller node that maintains the set of participating nodes and controls some information on the whole system. Instead, in pure

decen-1_{http://yacy.net} 2_{http://freecast.com/} 3_{https://www.spotify.com/} 4_{https://www.skype.com} 5_{http://friendi.ca/}

(36)

tralized P2P systems, there are no dedicated nodes that are critical for the operation of the system and there is no inherent bottleneck, this makes P2P system poten-tially resilient to failure, attack, and legal challenge [Kra+]. In some decentralized P2P systems, nodes with higher number of resources and high availability act as super-nodes. These super-nodes have additional responsibilities, such as acting as a rendezvous point for nodes behind firewalls, storing state or keeping an index of available content.

P2P systems maintain an overlay network, which is a logical organization of network nodes, utilizing specific topologies (rings, hypercubes, trees ), where a pair of nodes connected by a logical link is aware of each other and communicates directly via the Internet. We distinguish between systems that maintain an unstructured or a structured overlay network. In an unstructured P2P system, there are no constraints on the links between different nodes, and therefore the overlay graph does not have any particular structure, so nodes can occupy any position within the overlay network. A new node can join the network and define its initial links randomly starting at the bootstrap node. In a structured overlay, each node has a unique identifier in a large numeric key space and identifiers are chosen in a way that makes them uniformly distributed in that space. The overlay graph has a specific structure and the identifier determine its position within the structure and constrains its set of overlay links.

A P2P system must be designed to ensure the following requirements [Kra+]: • High Degree of Decentralization, the peers implement both client and server

functionality and most of the system’s state and tasks are dynamically dis-tributed among the peers.

• Self-organization, this requirement is essential because of the churn of the network, i.e. nodes join and leave the network in a distributed manner and they can fail continually and rapidly. When a node joins the network, it builds connections with other nodes to become an active participant. When a node leaves the system, other participants have to remove existing connections to the node and adapt their local views to the current status of the network. • Scalability. This concept refers to the ability of a system to increase their

per-formance as long as the amount of available resources in the system increase. • Reliability, denotes the ability of the network to deliver its services even when

one or several nodes fail.

• Security must ensure that any information transmitted can be verified and protected against unauthorized access and modification. As there is no central data provider, security is difficult to manage and it relies on cryptography or hashing function.

(37)

Figure 1.2: General P2P look-up service.

1.3 The P2P look-up service: DHT

A Distributed Hash Table [Bal+03] (or DHT) provides implementation of a dis-tributed index across a number of nodes and defines a routing scheme which allows efficient retrieval/storage of a specific data item which is paired to a unique key identifier. Nodes and data stored in a DHT have an identifier, a unique value from the address space which is typically computed from the data itself via a collision-resistant consistent hash function, such as SHA-1. This kind of hashing avoid to remapping all the keys of the DHT whenever it is resized. Each data item is rep-resented by a pair (k, v) pair, where k is called key while v is the actual data item. Figure 1.2 summarizes the general operations performed by the P2P look-up ser-vice.

The nodes that join the DHT become responsible for a particular range of key identifiers and store a partial view of the whole distributed system. Based on this information, the routing scheme allows message to traverse several nodes, getting closer to the destination key at each hop, until the node responsible for the key is reached.

In particular, DHTs have the following characteristics:

• Each node stores only a partial view of the whole system by managing a small set of links to other nodes. Generally, these are O(logN ) references, where N depicts the number of nodes in the system.

• The nodes are responsible for providing all the data items that fall on a specific range of the identifiers space.

• A data item can be located by routing via O(logN ) hops.

• The load on different nodes can be balanced by distributing equally the iden-tifiers space.

• There is no centralized point of failure and bottlenecks can be avoided. • It is possible to introduce repair mechanisms which manage the departure or

(38)

The general API interface implemented by each node of the DHT consists of the following operation: i) the put function, which store the pair (k, v) on the node responsible for the key k, and ii) the get function which retrieves the data item paired with a specified identifier key k. As a result, DHTs provide a fast and fault-tolerant way of accessing data while reducing costs to maintain the service. For these reasons, DHTs are used by complex applications, such as content distribution systems and file sharing application (such as BitTorrent[Bit]) to implement distributed file systems or storage service. During the years, several DHT variants have been proposed. The main differences among these DHTs concern the organization of the identifiers space and the overlay topology. In the following, we are going to introduce Pastry [RD01a] and Chord [Sto+01] because, in addition to being used by our approaches, they are two of the look-up services most used by existing DOSNs.

1.3.1 Pastry

Pastry [RD01a] is a structured P2P overlay network which is based on a prefix routing schema and the nodes are organized according to a circular space. Each node in the Pastry network has a unique identifier nodeId which is used to give a position on a circular identifiers space (which ranges from 0 to 2128_{− 1). Usually, a}

nodeId can be assigned by applying a hash function to the IP address of the node along with other identifiers. The nodeId and keys are considered a sequence of digits with base 2b_{, where b is a configuration parameter with typical value is 4. For the}

purpose of the routing, each node maintains the routing information shown by the Fig. 1.3, which consists the following components [Kra+]:

Leaf set contains the set l nearest nodeIds which are either l/2 numerically larger or l/2 numerically smaller with respect to the current node’s nodeId (see Fig. 1.3b).

Routing table contains references to the nodes having the longest prefix shared with the current nodeId. The routing table is organized in log₂bN rows with

2b − 1 entries each. The 2b_{− 1 entries at row n of the routing table contains}

a reference to a peer whose nodeId share the same prefix in n digits with the current nodeId and whose (n + 1)-th digit has one of the 2b_{− 1 possible values}

different to the (n + 1)-th digit of the current nodeId. If no node is known with a suitable nodeId, then the routing table entry is left empty (see Fig. 1.3a). Neighborhood set As shown by the Fig 1.3b, the neighbourhood set maintains

information about the nodes that are very close to the current node in terms of network locality (or geographic distance).

Given a specific key k, the Pastry routing protocol routes a request to the peer numerically closest to k as represented by the Fig. 1.4. Initially, a node checks if the key k is managed by peers of the Leaf Set. If the case, the node is forwarded

(39)

(a) The routing table of Pastry

(b) The leaf set and the neighbour-hood set of Pastry

Figure 1.3: The routing information related to a Pastry node (Source: [Lua+05]).

directly to node in the leaf set whose nodeId is closest to k, otherwise the routing table is considered. In particular, the current node forwards the request message to a peer whose nodeId has at least one common share prefix with respect the nodeId of the current peer. If such a node is not known, the message is forwarded to a node whose nodeId with the same common prefix of the current nodeId and the key k. The routing procedure allows message to converge to the node that either i) shares a longer common prefix with the key than the current node, or ii) shares the same common prefix of the current node, but it is numerically closer to the key [RD01a] .

The choice of N and b determines the size of the routing table. Larger b increases the routing table size but reduces the number of hops. If identifiers are uniformly distributed in the 128-bit space, then the size of the routing table consists of about log₂bN · (2b− 1) entries and the expected number of hops required to route between

any pair of peers is O(logN ).

Since nodes may join or leave the system at any time, routing table must be peri-odically updated by considering the current state of the nodes. In particular, the Neighbourhood set is used to recover the routing state in case of nodes’ failures. The nodes exchange keep alive messages among neighbouring nodes and nodes failure is automatically detected when messages are missing. If the case, the current node has to adjust the leaf sets by asking for the leaf sets to their neighbourhood.

(40)

Figure 1.4: The routing information related to a Pastry node (Source: [Lua+05]).

(a) The routing table of Chord (b) The routing of Chord

Figure 1.5: The routing information and the routing protocol of a Chord node (Source: http://wwarodomfr.blogspot.it/2008/09/chord-routing.html).

1.3.2 Chord

Chord [Sto+01] organizes the nodes in a circular space and it introduces the re-quirement of clockwise routing over ring (see Fig. 1.5). The routing protocol of Chord is based on proximity routing approach, where each node maintains a set of references to the IP addresses and identifiers of its successors and predecessors in the circle. The nodes of the system, as well as the data items, are paired to indi-vidual identifiers of m bits randomly selected from the identifiers space [0, 2m− 1]. A data item with key k is assigned to the successor node of key k (also denoted by successor(k)): the first node whose identifier is equal to or follows (the identifier of) k in the identifier space. Indeed, the node responsible for a key k (successor(k)) is the first node clockwise from k.

(41)

table (see Fig. 1.5a). The i-th entry in the routing table of the node n contains the identity of the node responsible for key n + 2i_{, where 1 ≤ i ≤ 160. The first nodes}

in the routing table of n is its immediate successor on the circle. All the operations are considered modulo the size of the identifier space because the logical space is circular.

The routing protocol of Chord proceeds in multiple hops around the identifier ring (see Fig. 1.5b). When a node n does not know the successor of a key k, it requests the key k to a node in the table whose nodeId is closer to k. As a result, in each hop, the node finds the nodeId responsible for a key k by searching its finger table for the closest node f preceding k, sends the request to it, and waits for the result. Each hop eliminates at least half of the remaining distance to the desired successor [Sto+01]. The number of hops to route a message between any pair of nodes is about O(logN ) where N is the number of participating nodes to the network.

In order to initialize the routing table, a new node sends a query to an existing node for the successors of the nodeIds in the table. In such a way, the Chord protocol initializes the routing table.

Chord needs to deal with nodes joining the system and with nodes that fail or leave voluntarily. Each node has to update the information in its routing table in order to reflect the change in the network topology caused by the join or leave of node. For this reason, each chord peer periodically executes a stabilization protocol in background. The stabilization procedure updates the finger tables of the nodes and the successor pointers. In order to maintain consistent the mapping between keys and peers responsible for these keys, a node n that joins the network becomes responsible for a certain set of keys previously assigned to n’s successor. Symmetri-cally, when node n leaves the network, the keys managed by n are reassigned to the n’s successor.

1.4 Decentralized Online Social Networks

A Decentralized Online Social Network (DOSN) [Dat+10] is a OSN implemented in a distributed and decentralized way. Due to the similar network-based structure of P2P systems and social networks, DOSNs are very often implemented using a P2P approach where the nodes participating in the network are heterogeneous in terms of resources, demands, online behaviour, service capability, provided security, allocated resources, etc. Instead, DOSNs are based on a set of peers that take on and share the tasks needed to keep on the system. Besides deciding who can see the content they share, in DOSNs users are enabled to express their preference to decide where their information should be stored (on trusted servers, on the local computer, etc). Therefore, DOSNs shift the control over data to the end user and hence enhance security and privacy.

(42)

Figure 1.6: Challenges in Decentralized Online Social Networks.

require efficient, secure and privacy-aware solutions. In [Dat+10], Datta et al. iden-tified several challenges associated with the decentralization of the OSN focusing on the architectural view of the system. An overview of the challenges arising from DOSNs is shown in Fig. 1.6. Decentralizing the existing functionalities of OSNs requires finding ways for dealing with the following issues:

Social Dynamism The social dynamism is present in social relationships due to the variation of relations between users (such as friendship revocation). Infrastructural Dynamism The infrastructural dynamism arise when users’

de-vices join or leave the system and nodes of the DOSN can be online or offline. Contents diffusion The content produced by users must the distributed among the authorized members by applying personalized content dissemination (such as gossip and epidemic protocol) on specific or popular topics of interest. Performance The DOSNs performances should be monitored in order to ensure a

higher level of the service. Scalability of the service must be ensured in the case of number of nodes grows exponentially.

Security and Privacy Privacy is related to the capability provided to the user to decide which users can access the contents shared with other users, while security concerns the mechanisms (or tools) which are used to ensure that those decisions are properly enforced (for instance, using encryption).

(43)

Search and Addressing These issues are related to the problem of how a user can find its friends in the DOSN, and how can they discover new friend nodes registered on the service even they change their physical address over multiple sessions.

Third-party Application Information must be protected from third untrusted parties and only user can decide the private profile information that can be accessed by third-party applications.

Data availability Data should be stored at some nodes to be always available. As a matter of fact, users may join or leave the network at any time, but their data must remain available on some online nodes, such as at nodes run by friends or at random nodes.

Topology management Nodes should be connected according to a topology which ensures protocols for routing messages between nodes of the system.

Motivated by the increasing awareness about privacy of users [Mag12] several researchers have been proposed several DOSNs architectures. In addition, a lot of works put forward novel solutions for particular issues in DOSNs. We review them in more detail in chapter 2.

1.5 Conclusion

In this chapter we presented the main features which characterizes the OSNs services, showing that privacy preserving mechanisms of the most popular OSNs provide limited capabilities.

In addition, we review the scheme of the most popular P2P system, which will also be exploited by the approaches presented in this thesis. Finally, we provided an overview of the efforts and challenges needed to decentralize the OSN service.

(44)

(45)

Related Works

Abstract

This chapter allows the reader to better understand the main privacy requirements of DOSNs, and how they impact the performances of the current

DOSNs implementations. We present a comprehensive comparison of the

privacy mechanisms used by the current DOSNs. We identify a large set of DOSNs which have been proposed in the literature and for each of the selected DOSN, we study the architectural style used to provide independence from a centralized provider and the approaches implemented to enable users in defining their privacy preferences (i.e., the privacy model and the privacy policy management). To help the reader in understanding the characteristics of the different privacy models, we evaluate the overhead introduced by the privacy policy management in terms of number of cryptographic keys created, and number of encryption operations required.

2.1 Decentralizing the Online Social Networks

A current trend for developing OSNs that do not rely on a centralized service provider is moving towards the decentralization of the OSN service. A Decentralized Online Social Network (DOSN) [Dat+10] is a OSN implemented in a distributed and decentralized way. The approaches exploited by current DOSNs to provide indepen-dence from a centralized provider are typically based on Peer to Peer (P2P) archi-tectures (such as a Distributed Hash Table [Bal+03] or network of interconnected trusted servers). Indeed, DOSNs are free from centralized authorities’ control, since there is no designed main server; every participating user can act both as a server and as a client, depending on the context [Kwo11]. The approaches used by cur-rent DOSNs to provide independence from a centralized authority combine multiple architectural levels, each with its own features. In order to allow efficient lookup of identifiers and routing to the corresponding nodes, many of the systems recently proposed rely on two different P2P architectural styles: