Interest clustering techniques for efﬁcient event routing in large-scale settings∗

(1)

Interest clustering techniques for efficient event routing in

large-scale settings

^∗

Leonardo Querzoni

Dipartimento di Informatica e Sistemistica, Sapienza, Universitá di Roma Via Ariosto 25, Rome, Italy

[email protected]

ABSTRACT

The publish/subscribe interaction paradigm is today becom- ing mainstream in a large number of very large scale applications like news syndication (with RSS) or massive multi- player games. These applications are often still implemented by means of centralized services that will hardly scale with the user growth expected in the next years. Modern publish/subscribe systems are striving to address these scalability needs to play a dominant role in this future market. A very important contribution, on the road to reach this goal, is given by the interest clustering techniques adopted by these systems. Interest clustering aims at putting in close applicative relationship groups of users sharing similar interests in order to reduce the effort needed to dispatch a message to group. This technique can be applied to event dissemination mechanisms based on filtering to reduce the total amount of messages generated during event routing and, consequently, improve the overall system performance.

In this paper we explore this topic to discover the poten- tialities of interest clustering, to understand how it can be implemented in a publish/subscribe system, and to study, through a small focussed survey, the central role played by this technique in modern systems.

Categories and Subject Descriptors

H.4 [Information Systems Applications]: Miscellaneous

1. INTRODUCTION

The widespread adoption of the clients/server interaction paradigm has led in the past to the development of distributed applications with a rigid structure, constrained by the lack of flexibility of point-to-point and synchronous interactions. The evolution of the Internet, pushed in the last years by the huge growth of large-scale systems in the form of peer-to-peer and social applications, is clearly marking the limits of this approach to communication, and raising the

∗This work was partially supported by the ReSIST Euro- pean network of excellence

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

DEBS’08, July 1-4, 2008, Rome, Italy

demand for more flexible interactions schemes. The publish/subscribe interaction paradigm has been introduced in the past as an alternative to the clients/server sibling, with the aim of providing a form of communication where inter- acting parties are decoupled.

In a publish/subscribe interaction participants to the communication can act both as producers (publishers) or con- sumers (subscribers) of information that takes the form of events. Subscribers can express which events they want to receive issuing subscriptions. Subscriptions express conditions on the content of events (content-based subscription model) or just on a category they belong to (topic-based subscription model). The paradigm states that once an event is published, for each subscription whose conditions are satisfied by the event (i.e., the event matches the subscription), the corresponding subscriber must be notified.

The basic building block of systems implementing the publish/subscribe paradigm is a distributed event notification service whose goal is to diffuse any published event from the publisher to the set of matched subscribers.

The complete decoupling offered by this form of interaction makes it very appealing for modern distributed applications characterized by very large scale and an inherently unsta- ble population constituted by users that can join or leave the system at any moment or fail due to their unreliability. To support the publish/subscribe interaction in these adverse settings, modern publish/subscribe systems use a a peer-to-peer event notification service, built on top of an overlay network connecting all user nodes (both publishers and subscribers). The main problem with such systems is the adoption of an event diffusion mechanism able to scale with both the size of the system (number of users) and the load (number of events and subscriptions).

From this point of view, the scale of these systems severely limits the applicability of simplistic approaches for event diffusion like event or subscription flooding. The complex form of information selection defined by the subscription models can be leveraged to greatly reduce the amount of resources (either network bandwidth and node computational power) used by the event diffusion mechanism to notify subscribers of incoming events. This is usually accomplished through some form of filtering that, exploiting information extracted from issued subscriptions, limits the diffusion of each event only to those parts of the overlay network where target subscribers are located [5].

(2)

Event filtering has been shown to be effective only for those application characterized by subscription regionalism, i.e., where subscriptions matched by same events are hosted on nodes localized in the same region of the overlay network. In scenarios where no regionalism is present the improvement obtained with event filtering with respect to simple flooding is often negligible [13] and does not compensate the overhead generated to spread subscription information.

In these latter cases it is possible to adopt a technique called interest clustering whose purpose is to analyze and adapt the system at run-time in order to mimic the presence of subscription regionalism. With interest clustering, nodes hosting similar subscriptions are moved to close logical dis- tance inside the overlay network with the aim of reducing the number of independent paths each event has to travel to reach all its target subscribers.

In this tutorial paper we will investigate how interest clustering works, how it can be integrated in an existing event diffusion mechanism and how state-of-the-art publish/subscribe systems leverage it.

The paper is structured as follows: Section 1 introduces the paper; Section 2 gives the reader a necessary background on publish/subscribe systems; Section 3 explains the details of the interest clustering technique; Section 4 contains a small survey of existing systems that explores how interest clustering can be differently implemented and how these differences impact system performance. Finally, Section 5 concludes the paper.

2. BACKGROUND

The publish/subscribe interaction paradigm provides users of a system with an alternative communication model with respect to the classical client/server model. In a publish/subscribe system users interact playing one of two roles: publishers that produce information and inject (publish) it into the system in the form of events, and subscribers that consume events received from the system. A third component, the event notification service, has the role of receiving the events injected by the publishers and notify all the subscribers that can be interested in those events. The event notification service plays in the system a role of a mediator between publishers and subscribers, decoupling the interactions among them in time, space and flow [9].

Subscribers can define the events they are interested in by issuing subscriptions. Each subscription works as a filter on the set of events injected in the system: the subscriber will be notified about events that satisfy (match) the conditions expressed by its subscriptions. Subscriptions can be expressed in various ways depending on the subscription model adopted by the system. Currently only two models have been widely accepted by the community working on publish/subscribe: the topic-based and the content-based models.

The topic-based model is the simplest one. it requires each event to be tagged by a single identifier called topic. The topic completely summarizes the event content, while its details remain completely concealed to the publish/subscribe system. With this model subscribers can simply declare

their interest in one or more topics. The event notification service will then notify subscribers about all the events tagged with the topic they subscribed. While this selection method can appear simplistic it actually matches the requirements of a large number of real-world applications where events divide naturally into groups that correspond to users interests. A typical example is the increasingly popular RSS news syndication system [1]. An RSS system is a simple topic-based publish/subscribe system. Publishers distribute their news by publishing them into a RSS feed and providing the URL of this feed on a website. Users subscribe to this feed by specifying its URL to their client applications.

Note that almost every existing RSS application relies on a rather primitive implementation where RSS readers periodically poll the feed. With todays continuous dramatic increase in the number of RSS users and feeds, it is highly probable that, to attain scalability, future implementations will move to more complex push-based architectures.

The content-based model increases user selection flexibility by means of more expressive subscriptions. This model requires the definition of an event space common to all the users. The event space is a collection of attributes each characterized by a name and a type (integer, floating-point, strings, etc.). Given a specific event space, an event is a collection of values, one for each attribute. The greater flexibility of this model comes from the possibility of defining each subscription as a complex expression constituted by constraints expressed on the attributes defined by the event space. A subscriber will by notified by the event notification service about an event only if it satisfies the expression contained in one of the node’s subscriptions. The finer event selection granularity allowed by this model typically comes at the cost of more sophisticated protocols with higher runtime overhead, as well as more sophisticated user interaction.

In order to support the complexity induced by the subscription model in an environment characterized by a possibly very large set of dynamic users, most modern publish/subscribe systems are based on a peer-to-peer architecture.

These systems usually rely on the services offered by state- of-the-art overlay management protocols [16, 14, 10] that can provide the system with both strong connectivity and, in some cases, with efficient message routing primitives. In these publish/subscribe systems the event notification service is realized by a distributed protocol running on the user nodes. An event published on a node must then be routed trough the overlay network connecting the whole event notification service toward all the nodes representing target subscribers. This problem is usually referred to as the event routing problem.

Event routing in a distributed event notification service can be implemented using various techniques. A very basic solution is represented by the event flooding approach: each time an event is published it is flooded all over the event notification service, reaching all the nodes constituting the overlay network, and, as a consequence, all the possible target subscribers. While this approach correctly solves the event routing problem, it is clearly non scalable with the total number of users: with event flooding each node in the system receives every published event, regardless of the real interests expressed by its subscriptions.

(3)

P S

S

S S S

S

S S

(a)

P

S

S S S

S

(b)

Figure 1: The effect of subscription regionalism on event diffusion. Figures (a) shows diffusion of an event when subscriptions are spread over the event notification service (no regionalism). Figure (b) shows the diffusion of the same event when interests are clustered by subscription regionalism.

This lack of scalability come from the fact that usually in a publish/subscribe system the set of subscribers that must be notiﬁed about an event is a small percentage of the user population¹. Ideally, in order to lower the cost of event routing, each node should only receive and process events that match its subscriptions. An event reaching a node whose subscriptions are not matched can be considered as a sort of spam message, i.e., an unsolicited, useless message. Low- ering the cost of event routing is a relevant objective [5, 17]

because it allows to reduce the overall processing load at each node, due to event matching against subscriptions and event forwarding, thus enhancing the scalability of the whole system. In order to reach this goal, spam messages should be avoided as long as it is possible.

The event flooding approach produces a huge amount of spam message for every published event. Its performance can be usually improved by introducing some form of event filtering [5]. Systems employing event filtering maintain routing tables whose content is based on the selection cri- teria defined by subscriptions. These routing tables can be exploited at event diffusion time to isolate from the diffusion parts of the overlay network where no interested subscriber is located. By reducing the total number of nodes interested by the diffusion of an event, systems based on this filtering technique can reduce the number of produced spam messages. Note, however, that this reduction is not always as large as desired: if the subscribers target for an event are dispersed in various parts of the overlay network, the event will probably travel through a lot of different independent paths producing, again, a large amount of spam messages.

3. INTEREST CLUSTERING

Spam messages are generated by the the event diﬀusion mechanism whenever an event is forwarded inside the event notiﬁcation service from the publisher node to the set of target subscriber nodes. This production of spam messages is

1Applications where every event must be notiﬁed to a large part of the user population are usually better served by broadcast-like services.

more evident if target subscribers are dispersed in the overlay network.

Applications characterized by subscription regionalism be- have diﬀerently. In these applications users in close geo- graphic relationship usually share similar subscriptions that are matched with high probability by the same events. Ge- ographic vicinity often maps to corresponding node vicinity in the system overlay network and this means that users sharing similar subscriptions are usually not dispersed in the overlay, but rather are all localized in a region of it.

When an event matching several subscriptions is published in a system enjoying subscription regionalism there is a high probability that it will travel through a single path from the publisher node toward the region of the overlay hosting all the target subscribers, thus generating a small number of spam messages.

Figure 1 shows an example that clarifies the effect of subscription regionalism on the performance of an event diffusion mechanism employing the filtering technique. Figure 1(a) depicts the event notification service of an imaginary application that do not enjoy subscription regionalism. Dot- ted circles marked with S represent subscribers interested in a same event published by the white node P . Black arrows represent messages generated by the event diffusion mechanism after the publishing of an event in order to reach all the target subscribers. Even if filtering avoids the diffusion of the event through useless paths, nevertheless, due to the lack of subscription regionalism a lot of spam messages are generated (11 on a total of 19 messages). Figure 1(b) shows the same example, but with subscription regionalism. In this case the event follows a single path from the publisher P toward the region of the overlay network hosting the tar- get subscribers, considerably reducing the amount of spam messages (3 on a total of 11 messages)

However, subscription regionalism is a characteristic shared only by few speciﬁc applications (e.g. the diﬀusion of news related to some topics like politics or sports are often of

(4)

interest only for a specific country). For this reason various projects in the publish/subscribe area introduced during the last years new systems integrating some mechanisms aimed at inducing a form of artificial regionalism in the system also for those applications that do not naturally enjoy it.

All these mechanisms implement, with different solutions, the same technique called interest clustering.

The concept behind interest clustering is simple: nodes hosting subscriptions that have a large probability to be matched by the same set of events should be positioned close (in term of application-level hops) inside the overlay network connecting the whole system. This technique, in practice, imitates the behaviour of subscription regionalism adapting at run-time the topology of the overlay, with the aim of generating the ideal conditions for the efficient delivery of events.

3.1 Subscription similarity

The first step needed in the implementation of the interest clustering technique is a method for estimating the similarity between two users interests. This estimation will then represent the criterion used to decide if the two nodes must be clustered together or not. Obviously, this method must necessarily be based on the specific subscription model adopted by the publish/subscribe system.

Defining similarity between the interests of two nodes in a topic-based publish/subscribe system is intuitive: two subscriptions are either identical, if they refer to the same topic, or distinct, if they refer to different topics. This easy approach is a direct consequence of the non intersection of interests among distinct topics and has a strong positive impact on the complexity of the similarity estimation method.

Things get more complicated when we consider a content- based selection model. In this case various methods can be adopted; two subscriptions can be analyzed to determine the area of the event space their intersection occupies; in this case the similarity among them can be defined as the ratio between the size of this area and the size of the total area occupied by the two subscriptions: two disjoint subscriptions (i.e., no similarity) will lead to an estimation equal to zero, while two identical subscriptions (i.e., maximum similarity) will lead to an estimation equal to 1. Note, however, that this approach requires a method for calculating the area of the intersection between n-dimension subspaces defined by different subscriptions (where n is the number of attributes defined in the event space), and the complexity of this method is usually exponential with n. Moreover, the definition of intersection becomes non trivial when we consider non numeric attributes, like strings. Finally, this approach only takes into account similarity between subscriptions without evaluating how many events will actually match those subscriptions. This last detail can sometimes be important: paying some costs to cluster two nodes containing similar subscriptions can be worthless if that subscriptions will not be matched by any event.

A possible alternative that solves these limitations makes use of histories of matched events [4]. Each node is required to maintain a history containing the most recent events that matched subscriptions hosted locally. Similarity between

two nodes is then evaluated comparing their histories by counting the number of events in common. By focussing on matching events rather than subscriptions this approach overcome all the limitations previously listed.

3.2 Event diffusion in a clustered system

Given that a method for measuring similarity between the subscriptions hosted on two nodes is available, clustering can be realized simply adding a logical link between them. This link can be added either to the overlay network connecting all the nodes in the system (general overlay), or to a separate overlay network embodying the specific topic cluster (cluster overlay). These two different clustering methods mainly affect the flexibly and cost of cluster management. Using a single general overlay network for clustering offers the possibility for large cost savings especially if we consider the ability of the system to scale with respect to the number of subscriptions per node. Consider, for example, a topic based system where two nodes have two distinct subscriptions in common: with this method it is possible to share between the two nodes just a single link in the general overlay, while in the other case (where two distinct overlays are generated for the clusters representing the two subscriptions), two distinct links must be created and managed, thus doubling the cost. However, managing clusters with distinct overlays gives system administrators greater flexibility for choosing different strategies to event diffusion. Considering again the previous example, if the two topics subscribed by the two nodes require different guarantees on the diffusion of events (e.g. one topic requires an all-or-none diffusion semantic, while the other topic requires a simple best-effort semantic), their corresponding clusters can be realized employing different overlay network management protocols able to deliver the required service level.

Ideally each cluster should contain all and only the subscribers interested in a given event. In this way, once the event reaches one member of the cluster, its diffusion can be limited to the cluster itself, thus confining as much as possible the network traffic generated for its diffusion. This form of interest clustering can be defined as perfect. How- ever, perfect interest clustering cannot be always attained depending on the chosen strategies for similarity estimation and clustering method. When interest clustering is not perfect nodes containing subscriptions matched by a same event are not necessarily clustered together: they can be grouped in several distinct smaller clusters, or they can simply be dispersed in the general overlay.

Event diffusion in a publish/subscribe system characterized by interest clustering requires, ideally, two consecutive steps:

1. Outer-Cluster Routing. An event can be generally published on any node of the system, therefore it must be routed from there, to at least one node belonging to the target cluster. Note that traffic confinement is fully attained when non-interested subscribers do not receive the event. Then, the goal of outer-clustering routing is to reach the target cluster involving a number of nodes (usually not interested in the event) as small as possible, thus generating the minimum number of spam messages.

(5)

2. Inner-Cluster Diﬀusion. Once the event reaches one member of the interested cluster, as long as clustering is perfect, the dissemination inside the cluster can fol- low a simple ﬂooding-like scheme. Alternatively more sophisticated routing techniques can be used to save network resources [18].

In order to improve system performance during event diffusion outer-cluster routing should strive to reach represen- tatives of the target cluster using the minimum number of distinct paths originating from the publisher node. if clustering is perfect outer cluster routing can send a single copy of the event to only one of the target subscribers. Once this subscriber receives the event, its diffusion can continue confined inside the cluster (inner-cluster diffusion) without involving any other non-interested node, but guaranteeing at the same time that all the target subscribers will correctly receive the event (because the cluster is perfect).

4. A SURVEY OF THE CURRENT STATE

OF THE ART

In this section we will give the reader an overview on various publish/subscribe systems that implement, with different choices, interest clustering. This section is not meant to be an exhaustive survey of publish/subscribe systems, but only a way to show how interest clustering can take different forms and provide different performance improvements in different systems.

4.1 SCRIBE

SCRIBE [6] is an application-level multicast infrastructure for large scale settings. Even if SCRIBE has not been proposed as a publish/subscribe system, its behaviour actually matches the one of a topic-based system, with a few minor diﬀerences in the software API.

SCRIBE is built on top of Pastry [16], an overlay management protocol that can build and maintain a distributed hash table (DHT) software abstraction. The DHT can be used to store and retrieve data, or simply to route messages among nodes in the system. The advantages of employing this abstraction lies in the fact that every operation on the DHT refers to abstract objects (keys) deﬁned over a virtual space (key space) and not directly to nodes. The mapping between keys and nodes is realized and continuously maintained by Pastry, despite nodes joining or leaving the system, and despite unexpected failures. This decoupling between nodes and keys greatly helps the development of large scale applications that can rely on it for building more robust services on top on an intrinsically unreliable environment.

SCRIBE leverages the powerful storage and lookup primitives oﬀered by Pastry to implement topic-based event dif- fusion based on the rendez-vous approach. This approach is based on the idea that each topic is mapped to a node that is the sole responsible in the system for managing subscriptions for it and diﬀusing the matching events to the target subscribers.

SCRIBE does not implement explicitly what we deﬁned as interest clustering. The rendez-vous node is responsible for

P

S₁

S₂

S₃

S₄ S5

RV_T

Cluster_T

Figure 2: The SCRIBE system.

managing the group of nodes that subscribed the topic assigned to it. From this point of view the cluster management is completely demanded to this node. In order to efficiently diffuse events while reducing maintenance costs, the rendez- vous node arranges all the subscribers in a multicast tree rooted on itself. Each multicast tree represents the cluster associated to a specific topic.

Outer-cluster routing is realized simply leveraging the Pas- try lookup procedure: the event is sent from the publisher node toward the rendez-vous node, that is treated as a node belonging to the cluster. The rendez-vous node then real- izes the inner-cluster diﬀusion by ﬂooding the multicast tree rooted on itself.

Multicast trees built from SCRIBE do not contain only subscribers for a same topic. Each subscriber, in fact, leverages Pastry’s lookup procedure to convey its subscription to the correct rendez-vous node; this procedure makes the sub- scription travel on a path that traverses a certain number of other nodes that only act as messages routers. This same path is then used, in the opposite direction, by the rendez- vous node to diﬀuse events toward the subscriber. All the intermediate nodes involved in this path will then route every event targeted at the subscriber without being actually interested in it (as they are not necessarily subscriber for the same topic), thus generating spam messages during the inner-cluster diﬀusion phase. From this point of view we can consider SCRIBE as an example of a system implementing interest clustering in a unique overlay network: the one provide by the single Pastry instance.

Another important disadvantage of SCRIBE’s architecture is the single access point assigned to each topic cluster, i.e., the topic rendez-vous node. This node is required to handle every event and every subscription² related to that speciﬁc

2SCRIBE employs a smart tree management procedure that tries to reduce the amount of subscribe/unsubscribe re- quests that reach the rendez-vous node. Therefore, for top- ics with a large number of subscribers, the number of subscribe/unsubscribe messages that reach the root of the multicast tree is small.

(6)

Cluster_T

General overlay

P

RV_T RV_T

S₁ S₂

S₃

S₄ S₁

S₂

S₃ S₄

Figure 3: A publish/subscribe system based on a 2-dimensions CAN.

topic, even if it is not interested in that topic.

Figure 2 shows an example of a system using the SCRIBE architecture. Nodes contained in the circled area constitute the multicast tree associated to a topic T . The rendez-vous node RVT is the root of this multicast tree. When a pub- lisher P publishes an event for T this is routed through the Pastry overlay toward node RVT. This node then ﬂoods the event in the multicast tree. Note how the multicast tree contains the rendez-vous node, the topic subscribers (S1, . . . , S5) and a certain number of other nodes (black circles) not interested in the event.

4.2 CAN

An approach to topic-based publish/subscribe similar to the one employed in SCRIBE was also adopted by [15]. This publish/subscribe system adopts CAN [14] as a substrate for routing data and managing node connectivity. Similarly to Pastry, CAN also implements a distributed hash table abstraction, delivering eﬃcient routing of messages in a vir- tual key space, with the only lying in the d-dimensional key space adopted by CAN.

The innovation introduced by [15] was the adoption of to- tally separated overlay networks for distinct topic clusters.

In [15] each topic is assigned with a rendez-vous node that acts as an access point to the topic cluster. However, differently from SCRIBE, each topic cluster is a different instance of the CAN protocol providing full isolation between cluster participants and other non interested nodes. When a node wants to subscribe a topic, it starts a join procedure sending a message toward the rendez-vous node responsible for the specific topic, and asking it to join the corresponding cluster. From this point of view, the subscription to a topic is treated by [15] as a simple join procedure to a specific CAN instance. The bootstrap node used for joining this instance is obviously the topic rendez-vous node that will for sure be part of the topic cluster.

Exactly as in SCRIBE, outer-cluster routing is realized simply by exploiting the CAN routing mechanism: the event is forwarded through the general CAN instance toward the topic rendez-vous node. This node will then initiate the inner-cluster diffusion phase. The authors of [15] advocate the use of any flooding-like algorithm for inner-cluster diffusion (but also provide a possible efficient implementation), in order to reach all the target subscribers. The interesting aspect is that in this case, only nodes subscribed to the target topic are part of the cluster, and all and only them will receive every matching event. The only exception to this rule is, again, represented by the rendez-vous node: given that this role is deterministically decided by the mapping be- tween node identifiers and the d-dimensional key space, the rendez-vous node not necessarily is interested in the topic it is asked to manage. Nevertheless, this single node will suffer the burden of all events published in the system that match the topic it manages.

Figure 3 shows an example of a system built using [15]. As the figure shows, differently from the SCRIBE architecture, the cluster associated to topic T is realized with a com- pletely independent overlay network (top rectangle in the figure). Nodes belonging to this cluster are, obviously, also part of the general overlay constituting the event notification service. In this case spam messages can be generated only in the outer-cluster routing phase as this is realized in the general overlay. As soon as the event enters the cluster through the rendez-vous node RVT, only nodes that must be notified about it will be interested by its diffusion.

4.3 Data Aware Multicast

As we saw in the previous sections, both SCRIBE and [15]

suﬀers from the presence of rendez-vous nodes that are re- sponsible for topics they are not actually interested in, and for which they suﬀer a noticeable load. This common drawback is a direct consequence of the fact that both systems rely on the underlying DHT mapping function to assign deterministically a single access point to each topic cluster.

Data Aware Multicast [2] (daMulticast) avoids the adoption of any structured DHT-like overlay management protocol to embrace a fully gossip-based approach.

DaMulticast is a topic-based publish/subscribe system where topics are assumed to be arranged in a hierarchical structure. This hierarchy represents a containment relationship between topics: if an event is published that matches a child topic, it will also match the father topic. DaMulticast re- lies on the gossiping technique of [10] to build and maintain groups of nodes. Each group is mapped to a speciﬁc topic, and each group is constituted by all and only the nodes subscribed to that topic. In this way daMulticast realizes a perfect form of interest clustering. Nodes in each group maintain through [10] a partial view of the group population large enough to probabilistically guarantee group connectivity beside nodes joining or leaving it. Groups representing two topics in a father-child relationship are connected through a set of inter-group links maintained probabilistically. In this way daMulticast builds a hierarchical overlay topology where each vertex of the hierarchy represents a group of nodes subscribed to a topic, and where the hierarchy itself perfectly matches the topic hierarchy.

(7)

Cluster

A

Cluster_B Cluster_C

Figure 4: Data Aware Multicast.

Event diffusion is realized assuming that the publisher node has previously joined the group representing the matched topic; this means that in daMulticast the outer-cluster routing phase is completely avoided. The publisher node will then (i) diffuse the event inside the group it belongs to and then (ii) probabilistically propagate the event to the upper level group using inter-group links. In this way the inner- cluster diffusion proceeds from the lower level groups, up till it reaches the root level group where event diffusion ends.

All the groups representing topics matched by the event will be interested by the diﬀusion process.

Figure 4 represents a possible daMulticast system for a hierarchy constituted by three topics, A, B and C, where A is the father of topics B and C. As the figure shows, daMulti- cast builds three separate groups containing the nodes subscribed to the corresponding topics, and then connects the groups though a number of links (dotted grey lines between groups). An event published inside the group associated to topic B by the white node (publisher) will be diffused inside the corresponding cluster and probabilistically forwarded to the upper cluster where, again, it will be diffused. Note that events diffused in a cluster can only be forwarded to clusters at higher level in the hierarchy, i.e., the event is not forwarded from the top cluster to the cluster associated to topic C.

Clearly the main drawback of the solution provided by da- Multicast is the fact that publishers are forced to join the topics where they are interested to publish events. This aspect obviously forces these nodes to join distinct groups for each topic they want to publish in. As a consequence all these nodes will receive every event diﬀused in these groups, regardless of the fact that they not necessarily subscribed the corresponding topics.

4.4 Sub-2-Sub

Sub-2-Sub [18] noticeably departs from the systems analyzed so far, as it implements a content-based selection model. In Sub-2-Sub each subscription is a conjunction of conditions expressed on the attributes of the event space. Each con- dition can either be a discrete value or an allowed range.

As we previously showed (Section 3) the content-based selection model poses serious problems in the deﬁnition of a

similarity metric between subscriptions. Sub-2-Sub assumes the event space to be constituted only by floating point attributes (therefore any attribute that can be mapped on a floating point value is allowed). Each subscriber in Sub-2- Sub is identified by its subscription.

Sub-2-Sub employs an epidemic distributed algorithm to cluster together nodes whose subscriptions have non empty intersection [8]. Each subscription is partitioned, by means of this algorithm, in the minimal number of sub-ranges such that a distinct cluster is associated to each sub-range and each event matching a sub-range will match all and only the subscriptions identifying all the nodes in the corresponding cluster.

Nodes in a cluster are organized in a ring-like structure whose main purpose is to guarantee that inner-cluster diffusion, initiated by one of the nodes in the ring, will reach all the nodes belonging to the cluster. In order to improve event diﬀusion performance each ring is also traversed by a number of random shortcut links.

The approach proposed by Sub-2-Sub is able to effectively cluster interests beside the difficulties posed by the complex content-based model. However, the way clusters are assigned to subscription sub-ranges makes impossible to limit the number of clusters a node will belong to, and, therefore, limit the number of ring links it will manage. This number, in fact, will depend at run-time from the number of decomposition needed for each range constraint defined in its subscription, i.e., from all the possible intersections with other subscriptions active in the system. This number can quickly grow to very large values in a system with even a moderate number of subscriptions, raising some doubts on the scalability of this approach to interest clustering.

Finally, another problem involved with the method used by Sub-2-Sub to build and maintain clusters, is that a publisher, before publishing an event, must join the cluster whose sub- range will be matched by that event. The publisher node wil then start the inner-cluster diffusion. Similarly to daMul- ticast, in this way Sub-2-Sub completely avoid the outer- cluster routing phase of event diffusion. However, this approach seems difficultly applicable to systems where publishers generate events that are not concentrated on a single limited area of the event space. A publisher generating events that are dispersed in the event space will, in fact necessarily change continuously the cluster it belongs to.

4.5 TERA

Many of the limitations that aﬀect Sub-2-Sub are a consequence of the complex content-based model it adopts. Many of these problems have been addressed in TERA [3], a sim- pler topic-based publish/subscribe system.

TERA inherit the same structure previously adopted by [15]:

each cluster is realized by a dedicated overlay network connecting only nodes subscribed to the same topic. However, diﬀerently from [15], TERA leverages a gossip based overlay management protocol to maintain all the nodes connected in a single general overlay, avoiding the restrictions imposed by a structured DHT-like protocol.

(8)

General Overlay Cluster_A Cluster_B

P S₁

S₂

S₃ S₄

S₅

S₆ S₁

S₂

S₃ S₄ S₅

S₆

Figure 5: The TERA system.

The main innovation introduced by TERA is the outer- cluster routing mechanism used for both subscribing to topics and diﬀusing events. Each node participating to TERA periodically advertises to other nodes chosen at random in the system its subscriptions. The random choice is realized through a peer sampling service [11] that probabilistically guarantees the uniformity of the samples it provides. Ad- vertised subscriptions are used by receiving nodes to update a local data structure (a table) that contains information about nodes that can act as access point for clusters associated to some of the topics active in the system. The content of this table is continuously updated such that its content satisﬁes the following properties: (i) the topics listed in each table are a uniform random sample of all the topics active in the system (i.e., all the topics subscribed by at least one node) and (ii) each node listed in the table and associated to a topic is a uniform random sample among all the nodes subscribed to that topic.

The tables maintained on each node participating to the general overlay are then exploited during the outer-cluster routing phase. Routing toward a topic cluster is realized by means of a simple random walk that, at each step, looks inside the table stored on the reached node. The random walk stops when it finds an access point for the target topic or if it lasts a predefined maximum number of hops. The developer, by carefully tuning the maximum random walk length and the size of the tables, can configure TERA to deliver events to topic cluster with a probability of success as large as needed. The same outer-cluster routing mechanism is also used by nodes subscribing to a topic, in order to find a bootstrap node needed to join the desired topic cluster.

Inner cluster dissemination can then be realized using any protocol able to ﬂood a connected group of nodes.

Figure 5 represents a possible TERA system. The figure shows how all the system participants are connected by a general overlay. Participants are then clustered in separate overlay networks depending on their interests. Arrows in the figure represent the path followed by an event published by node P in topic A for its diffusion. The event is first routed in the general overlay following a random path until an access point to the cluster associated to topic A is found.

Afterwards, the access point will start the inner-cluster diffusion phase inside the appropriate cluster.

Cluster_T

S₁

S₂

S3

S₄ S₅

S₆

Figure 6: The Spidercast overlay network.

The clear advantage that this solution oﬀers with respect to [15], is the absence of a single, ﬁxed, access point dedicated to each topic cluster. In TERA each node subscribed to a topic can act at any moment as an access point for the corresponding cluster, thus evenly distributing the load among the participants. Moreover, all the access points in TERA are actually subscribers of the same topic; from this point of view TERA is capable of attaining perfect interest clustering.

The interesting characteristics provided by TERA come at a cost. Each node is, in fact, required to manage a number of links that is directly proportional to the number of topic it subscribes. The management of these links can , in prin- ciple, become expensive, especially in environments where nodes often change their subscriptions. Moreover, in order to limit the length of random walks used for outer-cluster routing, the tables maintained at each node must have a size proportional to the total number of active topics, mak- ing this system non scalable with respect to the number of topics.

4.6 Spidercast

Spidercast [7] has been designed with the same goals of TERA, but using an opposite approach to interest clustering. Spidercast designers started from the idea that systems that maintain a separate overlay for each cluster (like TERA does) suﬀer from being non scalable with the number of subscriptions each node handles. This is due to the fact that in these systems the number of links managed by each node grows linearly with the number of subscribed topics.

To overcome this problem Spidercast relies on a single overlay approach where interest clustering is realized adding or removing links in a single general overlay network that maintains the whole node population connected. The Spidercast overlay management protocol, inspired by [12], aims at (i) maintaining an average node degree scalable with respect to both the system size (total number of participating nodes) and the system load (number of topics, subscriptions and events), (ii) probabilistically guarantee topics cluster connectivity to reduce event diffusion costs, (iii) maintaining a scalable topic cluster diameter to improve latency during event diffusion and, finally, (iv) building a churn-resilient overlay network (i.e., an overlay able to withstand the continuous join or leave of nodes).

(9)

Clusters are built and maintained by a decentralized protocol that acts autonomously on each node by selecting its neighbors. The rationale behind the local neighbor selection criterion is to guarantee k-coverage to every subscribed topic. K-coverage (where k is a system-wide parameter) for a topic means that the node maintains at least k links to other nodes subscribed to the same topic. The trick is then to guarantee k-coverage with a small number of links.

By adopting a weighted mix between greedy and randomized strategies Spidercast is able to build and maintain an overlay network satisfying all the previously listed properties. More specifically the authors of [7] show how overlay networks built by Spidercast have an average node degree that grows sub-linearly with the number of subscribed topics and that decreases slightly when the number of nodes in the system increases. An example of an overlay network built with the Spidercast protocol is represented in Figure 6.

Note that, despite the interesting properties offered by Spi- dercast, the authors do not propose a specific method for event diffusion. For this reason, it is difficult to directly compare Spidercast’s performance to the performance offered by other systems listed in this section. The only drawback that can be accounted to Spidercast’s event clustering approach is the adoption of a single protocol for the overlay management. This choice does not leave system developers enough flexibility to differentiate the service level guarantees required by distinct topics: given that the Spidercast overlay management protocol is common to all the clusters, all the topics will be necessarily limited to a common service level.

5. CONCLUSIONS

Modern publish/subscribe systems have been designed to deal with applications characterized by very large scale, huge loads, unreliability, etc. All these characteristics poses serious problems in the development of an efficient event diffusion mechanism. In this paper we analyzed in details a technique that can help in attaining the desired level of scalability: interest clustering. All the most recent proposals in the area of publish/subscribe systems implement interest clustering in some form to deliver high performance in the above adverse settings. We firmly believe that also future proposals in this area will not be able to depart from this choice.

6. REFERENCES

[1] Rss 2.0 specification.

[2] S. Baehni, P. T. Eugster, and R. Guerraoui.

Data-aware multicast. In Proceedings of the

International Conference on Dependable Systems and Networks (DSN), pages 233–242, 2004.

[3] R. Baldoni, R. Beraldi, V. Quema, L. Querzoni, and S. Tucci-Piergiovanni. Tera: topic-based event routing for peer-to-peer architectures. In DEBS ’07:

Proceedings of the 2007 inaugural international conference on Distributed event-based systems, pages 2–13, New York, NY, USA, 2007. ACM.

[4] R. Baldoni, R. Beraldi, L. Querzoni, and A. Virgillito.

Efficient publish/subscribe through a self-organizing broker overlay and its application to SIENA. The

Computer Journal, 50(4):444–459, July 2007.

[5] A. Carzaniga, D. S. Rosenblum, and A. L. Wolf.

Design and evaluation of a wide-area notification service. ACM Transactions on Computer Systems, 3(19):332–383, August 2001.

[6] M. Castro, P. Druschel, A. Kermarrec, and

A. Rowston. Scribe: A large-scale and decentralized application-level multicast infrastructure. IEEE Journal on Selected Areas in Communications, 20(8), October 2002.

[7] G. Chockler, R. Melamed, Y. Tock, and R. Vitenberg.

Spidercast: a scalable interest-aware overlay for topic-based pub/sub communication. In DEBS ’07:

Proceedings of the 2007 inaugural international conference on Distributed event-based systems, pages 14–25, New York, NY, USA, 2007. ACM.

[8] ´Etienne Rivi`ere, R. Baldoni, H. Li, and J. Pereira.

Compositional gossip: a conceptual architecture for designing gossip-based applications. SIGOPS Oper.

Syst. Rev., 41(5):43–50, 2007.

[9] P. Eugster, P. Felber, R. Guerraoui, and A.-M.

Kermarrec. The many faces of publish/subscribe.

ACM Computing Surveys, 35(2):114–131, 2003.

[10] P. T. Eugster, R. Guerraoui, S. B. Handurukande, P. Kouznetsov, and A.-M. Kermarrec. Lightweight Probabilistic Broadcast. ACM Transanctions on Computer Systems, 21(4):341–374, 2003.

[11] M. Jelasity, R. Guerraoui, A. Kermarrec, and M. van Steen. The Peer Sampling Service: Experimental Evaluation of Unstructured Gossip-based Implementations. In Proceedings of the 5th ACM/IFIP/USENIX International Conference on Middleware, pages 79–98, 2004.

[12] R. Melamed and I. Keidar. Araneola: A scalable reliable multicast system for dynamic environments.

In Proceedings of the 3rd IEEE International Symposium on Network Computing and Applications (NCA), pages 5–14, 2004.

[13] L. Opyrchal, M. Astley, J. Auerbach, G. Banavar, R.Strom, and D. Sturman. Exploiting IP multicast in content-based publish-subscribe systems. In

Proceedings of Middleware 2000, IFIP/ACM International Conference on Distributed Systems Platforms, pages 185–207, 4-7 April 2000.

[14] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content-addressable network. In Proceedings of SIGCOMM, 2001.

[15] S. Ratnasamy, M. Handley, R. Karp, and S. Shenker.

Application-level multicast using content-addressable networks. Lecture Notes in Computer Science, 2233:14–34, 2001.

[16] A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In Proceedings of IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), pages 329–350, 12-16 November 2001.

[17] P. Triantafillou and A. Economides. Subscription summarization: A new paradigm for efficient

publish/subscribe systems. In Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS), pages 562–571, 24-26 March 2004.

(10)

[18] S. Voulgaris, E. Rivi`ere, A.-M. Kermarrec, and M. van Steen. Sub-2-sub: Self-organizing content-based publish and subscribe for dynamic and large scale collaborative networks. Research Report RR5772, INRIA, Rennes, France, December 2005.