CISCyber Intelligence andinformation Security

(1)

DIPARTIMENTO DI INGEGNERIA

INFORMATICA AUTOMATICA E

GESTIONALE ANTONIO RUBERTI

Cyber Intelligence and information Security

CIS Sapienza

CIS

THE OVERLAY SCAN ATTACK

INFERRING TOPOLOGIES OF DISTRIBUTED  

PUB/SUB SYSTEMS THROUGH BROKER SATURATION

Leonardo Aniello, Roberto Baldoni, Claudio Ciccotelli,  

Giuseppe Di Luna, Fabrizio Frontali and Leonardo Querzoni

[email protected]

DEBS 2014 

Mumbai - 28/5/2014

(2)

I NTRODUCTION

▪Mainstream middleware solutions for many-to-many asynchronous

communications

▪The rise of large-scale cloud platforms further boosted the adoption

of such solutions

◆Fundamental building block for most of such platforms

• Yahoo! Message Broker

• Facebook’s Wormhole pub/sub

• Spotify’s

▪The resiliency of such systems against possible attacks is therefore of

utmost importance

(3)

A ^NATOMY ^OF ^AN A ^TTACK

▪A typical strategy employed in many cyber attacks includes

◆Reconnaissance phase

• silently study the target

• identify weak points

• plan the most effective strategy to deliver the attack

◆Decoy phase

• confuse the defendants

• hide the real target

◆Attack phase

(4)

R ECONNAISSANCE P ^HASE

▪Often performed using a set of tools and techniques whose main

purpose is to collect detailed information about the target systems:

▪

characteristics of the software/hardware platforms

▪

internal architectural organization

▪

defensive strategies

▪

etc.

▪Typical example: port scanning

(5)

R ECONNAISSANCE P ^HASE

▪A kind of “preliminary” attacks that is often underestimated

▪

sometimes considered only slightly more annoying than email spam.

!

▪The more the attacker knows about the target:

▪

the faster the attack will be

▪

the largest will be its success probability

▪

the less effective target defenses will be

Conversely, the amount of non-publicly available information that can

be obtained with these attacks is striking and greatly help the attacker

in carefully planning potentially devastating future attacks.

(6)

V ÂLUE ÔF T ÔPOLOGICAL I ^NFORMATION

▪Knowledge about the internal topology of a target system is extremely

valuable to plan a cyber attack

▪Coremelt Attack [1]

◆Aims at congesting network core

◆N malicious clients collaborate to saturate a specific link in the system

▪Crossfire attack [2]

◆traffic is directed towards a large number of publicly accessible destinations

◆target gets isolated and has no way of identifying that it is under attack

[1] A. Studer and A. Perrig. “The Coremelt Attack”. Proc. 14th European Conf. on Research in Computer Security, 2009 [2] M. S. Kang, S. B. Lee, and V. D. Gligor. “The Crossfire Attack”. Proc. of the 34th IEEE Symp. on Security and Privacy, 2013

(7)

T ^HE O ^VERLAY S ^CAN A ^TTACK

▪We show here that a class of existing distributed publish/subscribe

middleware may be susceptible to such a kind of attack

◆We named this attack the “overlay scan attack”

!

▪In the next slides:

◆Definition of the system/attack model

◆Topology inference algorithm

◆Evaluation of its effectiveness

The attacker goal is to obtain information about the pub/sub

system topological structure

(8)

S ^YSTEM M ^ODEL

▪We consider a managed publish/subscribe systems

◆Multiple brokers interconnected to form an overlay network

◆Routing happens through application level multicast trees

• Each broker with connected publishers is at the root of a spanning tree

◆Satisfies the all-pairs-path-simmetry rule [3]

[3] A. Carzaniga, M. J. Rutherford, and A. L. Wolf. A Routing Scheme for Content-Based Networking. In Proceedings of the 23rd IEEE International Conference on Computer Communications (INFOCOM), 2004.

B₁

B₂

(9)

S ^YSTEM M ^ODEL

▪We consider a managed publish/subscribe systems

B₁

B₂

(10)

S ^YSTEM M ^ODEL

▪We consider a managed publish/subscribe systems

B₁

B₂

(11)

S ^YSTEM M ^ODEL

▪We consider a managed publish/subscribe systems

B₁

B₂

(12)

S ^YSTEM M ^ODEL

▪The pub/sub system is modeled as a tuple P=⟨G(V,E),d,r⟩

◆V is the set of brokers

◆E is the set of application level links interconnecting them in the topology

◆d : E → R+

• Each link is characterized by a latency d(e)

• Links are symmetric and FIFO

◆r : V → {rmin,rmax}

• Represents the computational rate of the broker

• r^min ≪ rmax

(13)

S ^YSTEM M ^ODEL

▪The computation time needed by a broker v that routes less than r(v)

events per second is negligible with respect to network latency d

▪If v handles more than r(v) events per second then it gets saturated

◆the computation time needed to route an event in this case is in the same order of magnitude as network latencies

▪Available bandwidth is enough to avoid it becoming a bottleneck

◆We can saturate brokers, but we can’t saturate network links

▪The set of brokers is partitioned V=V

E

∪ V

I

◆VE are external nodes (access points to the service)

◆VI are internal brokers (pure event routers)

(14)

S ^YSTEM M ^ODEL

▪The adversary

◆knows VE and interacts with the system through the standard API

◆controls at least three clients

◆these clients can be connected to any node in VE

◆clients are synchronized

▪Goal: infer G(V,E)

(15)

T ^OPOLOGY I ^NFERENCE A ^LGORITHMS

▪Basic idea

◆by injecting appropriately shaped traffic in the system is it possible to selectively saturate some brokers.

• Broker saturation induces latencies on event routing

◆by measuring latency variations it is possible to pinpoint the position of saturated brokers in the system topology.

◆information about the position of several brokers can be correlated to iteratively infer the full system topology

▪Traffic is injected by the attacker using controlled clients

(16)

T ^OPOLOGY I ^NFERENCE A ^LGORITHMS

▪We will first introduce a simplified version of the algorithm called “Full

mapper” based on the following naive assumptions

◆All links have a same latency d

◆d is known to the adversary

◆there is no traffic in the system, but the   one produced by the attacker

▪Then, we will remove these assumptions  

and introduce the “Real mapper” version  

of the algorithm

CAUTION

STRONG 

ASSUMPTIONS AHEAD

(17)

F ^ULL M ^APPER

▪The algorithm works in three phases:

1. Setup and distance estimation 2. Path discovery

3. Path merging

(18)

F ÛLL M ÂPPER - S ÊTUP ÂND D ÎSTANCE E ^ST .

1. Initializes several clients: a client c

i

for each edge broker B

i

∈ V

E

2. Repeats for each possible pair of clients (c

x

, c

y

)

2.1. configures cx as a publisher and cy as a subscriber

▪cy's subscription is such that it will only be notified about events published by cx

2.2. starts publishing from cx (at a low rate) 2.3. calculate average notification latency on cy

2.4. The notification latency, normalized for d, is the hop count for the path connecting the two edge brokers Bx and By

3. Uses these values to build a (symmetric) delay matrix D with  

size |V

E

|

²

(19)

F ÛLL M ÂPPER - P ÂTH D ÎSCOVERY

▪Repeats for each triple (c

x

,c

y

,c

z

)

1. configures c

x

and

^c^y as both publishers and subscribers such that cx (resp.

cy) will be notified of the events published by cy (resp. cx)

2. both cx and cy start to publish events at rate R/3 tagging each event with a sequence number

3. configures cz as publisher and cy as subscriber such that cy will be notified of the events published by cz

4. cz starts to publish event

s at a rate slightly larger than R/3

▪R is an estimation of the maximum computational rate r

min

▪Can be probed on edge brokers ➔ rmin ≤ R ≤ rmax

▪This means some brokers will be invisible

(20)

F ÛLL M ÂPPER - P ÂTH D ÎSCOVERY

B₃

B₆

B₄ B₂

B₁

B₇

B₅ C_X

C_Y

C_Z events from

C_X to C_Y

events from C_Y to C_X

events from C_Z to C_Y

(21)

F ÛLL M ÂPPER - P ÂTH D ÎSCOVERY

B₃

B₆

B₄

B₅ B₁

B₇

C_X

C_Y

C_Z 1 6 2

5 3

4 4 5 3

2 6

1

(22)

F ÛLL M ÂPPER - P ÂTH D ÎSCOVERY

B₃

B₆

B₄

B₅ B₁

B₇

late

notification C_X

C_Y

C_Z

… 9 4

8 5

7,6 7

8 5,6 4 9

3

S-node

(23)

F ÛLL M ÂPPER - P ÂTH D ÎSCOVERY

▪[continue…]

5. cx and cy track the sequence numbers of the first events that they see with a late notification

6. the difference between the two is equal to the length difference between path [Bx,S] and [S,By]

▪For each triple the attacker thus obtains:

◆information about the existence of a broker S (that may be internal)

◆the length of two subpaths in the topology

▪All triples with the inferred information constitute a set T

▪Cost is O(|V

E

|

³

), but performance is not our goal!

(24)

F ÛLL M ÂPPER - P ÂTH M ÊRGING

▪T contains with high probability a large number of duplicates

◆brokers that are listed as separate S-nodes in T, but that actually represent the same internal broker identified in the analysis of two distinct triples.

▪The path merging procedure builds the final topology:

◆Considers couples of elements from T

◆Builds partial paths from them

◆Merges common subpaths starting from S-nodes

◆Remove duplicates

(25)

F ÛLL M ÂPPER - P ÂTH M ÊRGING

▪Example

B₁ a b B₇

(26)

F ÛLL M ÂPPER - P ÂTH M ÊRGING

▪Example

B₁ a b B₇

B₅

c

d

e

(27)

F ÛLL M ÂPPER - P ÂTH M ÊRGING

▪Example

B₁ a b B₇

B₅

c

d

e S

d[B₁,S]

(28)

F ÛLL M ÂPPER - P ÂTH M ÊRGING

▪Example

B₁ a b B₇

B₅

c

d

e

S d[S,B ⁷

]

(29)

F ÛLL M ÂPPER - P ÂTH M ÊRGING

▪Example

S

b B₁

B₅

c

B₇

(30)

R ^EAL M ^APPER

▪Application-level traffic

◆It affects latency measurements like background noise

◆We can filter it out by increasing the number of measurements

▪Unknown and different network latencies

◆Considers a latency unit u smaller than any link latency

◆otherwise some brokers may “disappear” form the inferred topology

◆Creates fake brokers that are then discarded in the path merge phase

◆As a consequence brokers with degree 2 cannot be detected

(31)

E XPERIMENTAL E ^VALUATION

▪We studied how much precise topology inference can be

◆

Considering different topologies

◆

Varying the network size

▪Both by simulations and on a really functioning prototype

◆

The latter has been implemented to work with SIENA prototype

▪Measured metrics:

◆

broker/link detection rate

◆

graph distance

[A. Papadopoulos and Y. Manolopoulos. Structure-based Similarity Search with Graph Histograms. In Proceedings of the 10th International Workshop on Database and Expert Systems Applications, pages 174–178, 1999]

(32)

E XPERIMENTAL E ^VALUATION

“A picture is worth a thousand words”

4-cluster topology with mesh-connected core

(33)

E XPERIMENTAL E ^VALUATION

4-cluster topology with mesh-connected core

(34)

E XPERIMENTAL E ^VALUATION

3-cluster topology with ring-connected core

(35)

E XPERIMENTAL E ^VALUATION

4-cluster topology with bus-connected core

(36)

THE OVERLAY SCAN ATTACK - LEONARDO QUERZONI - DEBS 2014 - 28/5/2014

E XPERIMENTAL E ^VALUATION

▪Tests on random topology (3-15 internal broker + up to 20 edge b.)

If invisible brokers are present in the system, it is neces- sary to introduce an additional metric:

• graph distance ( ): the minimum number of edit op- erations (addition/removal of brokers and edges) re- quired to transform the inferred topology so that it becomes isomorphic to the original one [17]; if the inferred topology is isomorphic to the original one, then

= 0.

Beyond the quality of the obtained outcomes, we evalu- ated performances of our algorithms measuring its message complexity and execution time.

5.1.1 Simulations on Random Topologies

A random topology is composed by a variable number of internal brokers (from 3 to 15) and edge brokers (up to 20 nodes). We ran simulations to assess the accuracy of Real Mapper, to compare the performance of Full and Real Mapper and to evaluate the impact of invisible brokers on the accuracy of both Full and Real Mapper.

0"

0,2"

0,4"

0,6"

0,8"

1"

8" 10" 12" 14" 16" 18" 20" 22"

detec%on(rate(

number(of(brokers(

broker"detec1on"rate"

link"detec1on"rate"

N7degree"broker"detec1on"rate"

Figure 2: Broker, link and N-degree broker detection rates for the Real Mapper (Scenario C) when executed on a random topology with a fixed number of edge brokers and an increasing number of internal brokers.

Detection Accuracy.

In order to assess the accuracy of the algorithms, we used a random topology with a fixed number of edge brokers and a variable number of internal brokers. While the Full Map- per showed a perfect accuracy in detecting both brokers and links (Scenario A), the Real Mapper exhibited a decreasing accuracy as the number of internal brokers increased (Sce- nario C, see Figure 2).

This bad performance level was mainly caused by the in- ability of the algorithm to correctly detect internal brokers with degree 2. The same graph also show how the Real Map- per always correctly detected brokers with degree larger than 2 (⌘ = 1).

Performance Comparison.

We first compared the performances of Full Mapper and Real Mapper in terms of execution time and message complexity for a random topology with a fixed number of edge brokers as the number of internal brokers increases. As Fig- ure 3 shows, both metrics increase linearly with the number of internal brokers. However, as could be expected, the Real Mapper algorithm sports worse performance on both aspects with respect to the Full Mapper. This performance di↵erence is due to the large number of iterations that the

0" 2" 4" 6" 8" 10" 12"

0"

1"

2"

3"

4"

5"

6"

8" 10" 12" 14" 16" 18" 20" 22"

execu%on(%me((103 (seconds)( messages((106 (msg)(

FM"msgs" RM"msgs" FM"0me" RM"0me"

Figure 3: Performance comparison between Full Mapper (FM, Scenario A) and Real Mapper (RM, Scenario C) in execution time and number of mes- sages on a random topology with a fixed number of edge brokers and an increasing number of internal brokers.

0"

0,5"

1"

1,5"

2"

2,5"

4" 8" 12" 16" 20"

execu%on(%me((105 (second)(

edge(brokers( Full"Mapper"

Real"Mapper"

Figure 4: Execution time comparison between Full Mapper (Scenario A) and Real Mapper (Scenario C) on a random topology with a fixed number of internal brokers and an increasing number of edge brokers.

We then investigated how performance are a↵ected by the number of edge brokers, by simulating the execution of the algorithms on a random topology with a fixed number of internal brokers. Figure 4 shows that execution time increases as more edge brokers are included in the topology, and that again Real Mapper performs worse than Full Mapper.

Impact of Invisible Brokers.

We also analyzed how the presence of invisible brokers a↵ects the accuracy of the inferred topology. We built a random topology and started setting brokers with degree larger than 2 in invisible brokers by setting their dispatching rate to rmax (see Table 1).

0"

10"

20"

30"

40"

50"

60"

0" 0,5" 1" 1,5" 2" 2,5" 3" 3,5" 4" 4,5" 5"

graph&distance&

invisible&brokers&

Full"Mapper" Real"Mapper"

Figure 5: Accuracy comparison between Full Map- per (Scenario B) and Real Mapper (Scenario D) on a random topology with a fixed number of brokers and an increasing number of invisible brokers.

If invisible brokers are present in the system, it is neces- sary to introduce an additional metric:

• graph distance ( ): the minimum number of edit op- erations (addition/removal of brokers and edges) re- quired to transform the inferred topology so that it becomes isomorphic to the original one [17]; if the inferred topology is isomorphic to the original one, then

= 0.

Beyond the quality of the obtained outcomes, we evalu- ated performances of our algorithms measuring its message complexity and execution time.

5.1.1 Simulations on Random Topologies

A random topology is composed by a variable number of internal brokers (from 3 to 15) and edge brokers (up to 20 nodes). We ran simulations to assess the accuracy of Real Mapper, to compare the performance of Full and Real Mapper and to evaluate the impact of invisible brokers on the accuracy of both Full and Real Mapper.

0"

0,2"

0,4"

0,6"

0,8"

1"

8" 10" 12" 14" 16" 18" 20" 22"

detec%on(rate(

link"detec1on"rate"

Figure 2: Broker, link and N-degree broker detection rates for the Real Mapper (Scenario C) when executed on a random topology with a fixed number of edge brokers and an increasing number of internal brokers.

Detection Accuracy.

In order to assess the accuracy of the algorithms, we used a random topology with a fixed number of edge brokers and a variable number of internal brokers. While the Full Map- per showed a perfect accuracy in detecting both brokers and links (Scenario A), the Real Mapper exhibited a decreasing accuracy as the number of internal brokers increased (Sce- nario C, see Figure 2).

This bad performance level was mainly caused by the in- ability of the algorithm to correctly detect internal brokers with degree 2. The same graph also show how the Real Map- per always correctly detected brokers with degree larger than 2 (⌘ = 1).

Performance Comparison.

We first compared the performances of Full Mapper and Real Mapper in terms of execution time and message complexity for a random topology with a fixed number of edge brokers as the number of internal brokers increases. As Fig- ure 3 shows, both metrics increase linearly with the number of internal brokers. However, as could be expected, the Real Mapper algorithm sports worse performance on both aspects with respect to the Full Mapper. This performance di↵erence is due to the large number of iterations that the

0"

2"

4"

6"

8"

10"

12"

0"

1"

2"

3"

4"

5"

6"

8" 10" 12" 14" 16" 18" 20" 22"

execu%on(%me((103(seconds)(

messages((106(msg)(

FM"msgs" RM"msgs" FM"0me" RM"0me"

Figure 3: Performance comparison between Full Mapper (FM, Scenario A) and Real Mapper (RM, Scenario C) in execution time and number of mes- sages on a random topology with a fixed number of edge brokers and an increasing number of internal brokers.

0"

0,5"

1"

1,5"

2"

2,5"

4" 8" 12" 16" 20"

execu%on(%me((105(second)(

edge(brokers(

Full"Mapper"

Real"Mapper"

Figure 4: Execution time comparison between Full Mapper (Scenario A) and Real Mapper (Scenario C) on a random topology with a fixed number of internal brokers and an increasing number of edge brokers.

We then investigated how performance are a↵ected by the number of edge brokers, by simulating the execution of the algorithms on a random topology with a fixed number of internal brokers. Figure 4 shows that execution time increases as more edge brokers are included in the topology, and that again Real Mapper performs worse than Full Mapper.

Impact of Invisible Brokers.

We also analyzed how the presence of invisible brokers a↵ects the accuracy of the inferred topology. We built a random topology and started setting brokers with degree larger than 2 in invisible brokers by setting their dispatching rate to rmax (see Table 1).

0"

10"

20"

30"

40"

50"

60"

0" 0,5" 1" 1,5" 2" 2,5" 3" 3,5" 4" 4,5" 5"

graph&distance&

Full"Mapper"

Real"Mapper"

Figure 5: Accuracy comparison between Full Map- per (Scenario B) and Real Mapper (Scenario D) on a random topology with a fixed number of brokers and an increasing number of invisible brokers.

(37)

THE OVERLAY SCAN ATTACK - LEONARDO QUERZONI - DEBS 2014 - 28/5/2014

E XPERIMENTAL E ^VALUATION

▪Tests on cluster-based topologies

and the inferred one. As expected, the graph distance increases as more brokers become invisible because they cannot be saturated. An interesting result standing out in that figure is that the Real Mapper begins to perform better than the Full Mapper after the insertion of the third invisible broker.

0"

0,2"

0,4"

0,6"

0,8"

1"

0" 1" 2" 3" 4" 5"

detec%on(rate(

invisible(brokers(

Figure 6: Comparison between broker detection rate and N-degree broker detection rate of the Real Map- per (Scenario D) on a random topology with a fixed number of brokers and an increasing number of invisible brokers.

The reason why this happens can be explained by looking at Figure 6, where it is shown how ↵ and ⌘ vary for the Real Mapper when invisible brokers are added. For each new invisible broker, ⌘ decreases because an additional N-degree broker cannot be detected. On the contrary, ↵ increases (this happened three times out of five in our tests) because another broker gets saturated in place of the invisible one, and if such broker is 2-degree (that couldn’t be detected before) then the Real Mapper can spot it now.

Figure 7: Example of the e↵ect of an invisible broker (the central broker of the original topology) on the topologies inferred by Full Mapper (Scenario B) and Real Mapper (Scenario D). Wrongly inferred brokers and links are dotted.

A practical example where this situation occurs is drawn in Figure 7. In the original topology (the leftmost), the 4- degree broker is invisible and the other 2-degree brokers are visible. Both Full and Real Mapper always saturate one of the 2-degree brokers instead of the 4-degree one. The Full Mapper knows the link delay, so it is aware that another broker lies in between any of the 6 pairs of original 2-degree brokers, but it cannot recognize that such broker is always the same, which leads to the addition of several wrong brokers and links in the inferred topology (the central one in Figure 7). The Real Mapper instead doesn’t know the link delay, and adds a certain number of virtual brokers in all the 6 paths between any pair of the original 2-degree brokers. All these virtual brokers are removed because they are

2-degree, and only the links remain (rightmost topology in Figure 7).

Figure 8: 4-tree-cluster topology with mesh- connected core. Edge brokers are white-filled, internal brokers are red-filled.

5.1.2 Simulations on Cluster Topologies

Cluster topologies are composed by tree-structured clusters of brokers. Their structure is meant to mimic many real-world overlay networks where local subnets (clusters) are connected through gateways linked in a core structure.

We ran tests using 4-cluster topologies that di↵er in the way the clusters are connected together in the core of the network, by using either a bus, a ring or a mesh (see Figure 8).

Each single cluster is a tree with 4 internal brokers and 9 edge brokers.

0"

50"

100"

150"

200"

0" 1" 2" 3" 4"

graph&distance&

bus"

ring"

mesh"

Figure 9: Accuracy comparison of the Real Map- per (Scenario D) on distinct cluster topologies by varying the number of invisible brokers.

We measured how the accuracy of the Real Mapper is a↵ected by the presence of invisible brokers for the three cluster topologies. Results are plotted in Figure 9. As expected, for all the topologies the graph distance increases as more brokers become invisible. Another result emerg- ing from this simulation concerns the di↵erence in detection accuracy between mesh-connected, ring-connected and bus- connected topologies: the more complex the connection is in the number of links, the less accurate the detection is.

and the inferred one. As expected, the graph distance increases as more brokers become invisible because they cannot be saturated. An interesting result standing out in that figure is that the Real Mapper begins to perform better than the Full Mapper after the insertion of the third invisible broker.

0"

0,2"

0,4"

0,6"

0,8"

1"

0" 1" 2" 3" 4" 5"

detec%on(rate(

invisible(brokers(

Figure 6: Comparison between broker detection rate and N-degree broker detection rate of the Real Map- per (Scenario D) on a random topology with a fixed number of brokers and an increasing number of invisible brokers.

The reason why this happens can be explained by looking at Figure 6, where it is shown how ↵ and ⌘ vary for the Real Mapper when invisible brokers are added. For each new invisible broker, ⌘ decreases because an additional N-degree broker cannot be detected. On the contrary, ↵ increases (this happened three times out of five in our tests) because another broker gets saturated in place of the invisible one, and if such broker is 2-degree (that couldn’t be detected before) then the Real Mapper can spot it now.

Figure 7: Example of the e↵ect of an invisible broker (the central broker of the original topology) on the topologies inferred by Full Mapper (Scenario B) and Real Mapper (Scenario D). Wrongly inferred brokers and links are dotted.

A practical example where this situation occurs is drawn in Figure 7. In the original topology (the leftmost), the 4- degree broker is invisible and the other 2-degree brokers are visible. Both Full and Real Mapper always saturate one of the 2-degree brokers instead of the 4-degree one. The Full Mapper knows the link delay, so it is aware that another broker lies in between any of the 6 pairs of original 2-degree brokers, but it cannot recognize that such broker is always the same, which leads to the addition of several wrong brokers and links in the inferred topology (the central one in Figure 7). The Real Mapper instead doesn’t know the link delay, and adds a certain number of virtual brokers in all

2-degree, and only the links remain (rightmost topology in Figure 7).

Figure 8: 4-tree-cluster topology with mesh- connected core. Edge brokers are white-filled, internal brokers are red-filled.

5.1.2 Simulations on Cluster Topologies

Cluster topologies are composed by tree-structured clusters of brokers. Their structure is meant to mimic many real-world overlay networks where local subnets (clusters) are connected through gateways linked in a core structure.

We ran tests using 4-cluster topologies that di↵er in the way the clusters are connected together in the core of the network, by using either a bus, a ring or a mesh (see Figure 8).

Each single cluster is a tree with 4 internal brokers and 9 edge brokers.

0"

50"

100"

150"

200"

0" 1" 2" 3" 4"

graph&distance&

bus"

ring"

mesh"

Figure 9: Accuracy comparison of the Real Map- per (Scenario D) on distinct cluster topologies by varying the number of invisible brokers.

We measured how the accuracy of the Real Mapper is a↵ected by the presence of invisible brokers for the three cluster topologies. Results are plotted in Figure 9. As expected, for all the topologies the graph distance increases as more brokers become invisible. Another result emerg- ing from this simulation concerns the di↵erence in detection accuracy between mesh-connected, ring-connected and bus- connected topologies: the more complex the connection is in the number of links, the less accurate the detection is.

(38)

E XPERIMENTAL E ^VALUATION

▪Sensibility to the time unit choice

0"

5"

10"

15"

20"

25"

30"

35"

40"

85" 90" 95" 100" 105" 110" 115" 120" 125" 130" 135"

graph&distance&

.me&unit&(ms)&

topology"T1"

topology"T2"

topology"T3"

topology"T4"

(39)

C ÔNCLUSIONS ÂND F ÛTURE D ÎRECTIONS

▪The “overlay scan attack” exposes the internal topological

characteristics of some pub/sub systems to malicious users

▪

This information can later be used to plan more effective attacks

▪We presented an algorithm (Real Mapper) to infer the network

topology os a SIENA-like pub/sub system

▪Open questions:

▪

How much is it relevant? Are backend systems at risk?

▪

How can systems be protected?

▪

Is it possible to identify the attack sources?

(40)

T ^HANK ^YOU !

!

Q&A

(41)

C OMPUTATIONAL R ^ATE ( ^EXTENDED )

▪Rather than constraining all the brokers with small computational rate

to have the same r

min

, we can derive a range [r

^lmin

,r

^hmin

] of values for

r

min

◆r^lmin can be easily found by searching for the maximum publishing rate for each pair of edge brokers, and then taking the minimum value.

◆for our algorithm to work, we have to ensure that no broker gets saturated by two streams, each flowing at R/3 events per second

◆This implies that R has to be set such that 2/3*R < r^lmin

◆On the other hand, we need to saturate any non-invisible broker, i.e. R > r^hmin

◆Therefore the range for computational rate is [r^lmin , 3/2 r^lmin]

CISCyber Intelligence andinformation Security

CIS

THE OVERLAY SCAN ATTACK

INFERRING TOPOLOGIES OF DISTRIBUTED

PUB/SUB SYSTEMS THROUGH BROKER SATURATION

Leonardo Aniello, Roberto Baldoni, Claudio Ciccotelli,

Giuseppe Di Luna, Fabrizio Frontali and Leonardo Querzoni

I NTRODUCTION

▪Mainstream middleware solutions for many-to-many asynchronous

communications

▪The rise of large-scale cloud platforms further boosted the adoption

of such solutions

▪The resiliency of such systems against possible attacks is therefore of

utmost importance

A NATOMY OF AN A TTACK

▪A typical strategy employed in many cyber attacks includes

R ECONNAISSANCE P HASE

▪Often performed using a set of tools and techniques whose main

purpose is to collect detailed information about the target systems:

characteristics of the software/hardware platforms

internal architectural organization

defensive strategies

etc.

▪Typical example: port scanning

R ECONNAISSANCE P HASE

▪A kind of “preliminary” attacks that is often underestimated

sometimes considered only slightly more annoying than email spam.

▪The more the attacker knows about the target:

the faster the attack will be

the largest will be its success probability

the less effective target defenses will be

Conversely, the amount of non-publicly available information that can

be obtained with these attacks is striking and greatly help the attacker

in carefully planning potentially devastating future attacks.

V ALUE OF T OPOLOGICAL I NFORMATION

▪Knowledge about the internal topology of a target system is extremely

valuable to plan a cyber attack

▪Coremelt Attack [1]

▪Crossfire attack [2]

T HE O VERLAY S CAN A TTACK

▪We show here that a class of existing distributed publish/subscribe

middleware may be susceptible to such a kind of attack

▪In the next slides:

The attacker goal is to obtain information about the pub/sub

system topological structure

S YSTEM M ODEL

▪We consider a managed publish/subscribe systems

S YSTEM M ODEL

▪We consider a managed publish/subscribe systems

S YSTEM M ODEL

▪We consider a managed publish/subscribe systems

S YSTEM M ODEL

▪We consider a managed publish/subscribe systems

S YSTEM M ODEL

▪The pub/sub system is modeled as a tuple P=⟨G(V,E),d,r⟩

S YSTEM M ODEL

▪The computation time needed by a broker v that routes less than r(v)

events per second is negligible with respect to network latency d

▪If v handles more than r(v) events per second then it gets saturated

▪Available bandwidth is enough to avoid it becoming a bottleneck

▪The set of brokers is partitioned V=V

∪ V

S YSTEM M ODEL

▪The adversary

▪Goal: infer G(V,E)

T OPOLOGY I NFERENCE A LGORITHMS

▪Basic idea

▪Traffic is injected by the attacker using controlled clients

T OPOLOGY I NFERENCE A LGORITHMS

▪We will first introduce a simplified version of the algorithm called “Full

mapper” based on the following naive assumptions

▪Then, we will remove these assumptions

and introduce the “Real mapper” version

of the algorithm

CAUTION

F ULL M APPER

▪The algorithm works in three phases:

F ULL M APPER - S ETUP AND D ISTANCE E ST .

1. Initializes several clients: a client c

for each edge broker B

INFERRING TOPOLOGIES OF DISTRIBUTED  

Leonardo Aniello, Roberto Baldoni, Claudio Ciccotelli,  

A ^NATOMY ^OF ^AN A ^TTACK

R ECONNAISSANCE P ^HASE

R ECONNAISSANCE P ^HASE

V ÂLUE ÔF T ÔPOLOGICAL I ^NFORMATION

T ^HE O ^VERLAY S ^CAN A ^TTACK

S ^YSTEM M ^ODEL

S ^YSTEM M ^ODEL

S ^YSTEM M ^ODEL

S ^YSTEM M ^ODEL

S ^YSTEM M ^ODEL

S ^YSTEM M ^ODEL

S ^YSTEM M ^ODEL

T ^OPOLOGY I ^NFERENCE A ^LGORITHMS

T ^OPOLOGY I ^NFERENCE A ^LGORITHMS

▪Then, we will remove these assumptions  

and introduce the “Real mapper” version  

F ^ULL M ^APPER

F ÛLL M ÂPPER - S ÊTUP ÂND D ÎSTANCE E ^ST .

3. Uses these values to build a (symmetric) delay matrix D with  

F ÛLL M ÂPPER - P ÂTH D ÎSCOVERY

F ÛLL M ÂPPER - P ÂTH D ÎSCOVERY

F ÛLL M ÂPPER - P ÂTH D ÎSCOVERY

F ÛLL M ÂPPER - P ÂTH D ÎSCOVERY

F ÛLL M ÂPPER - P ÂTH D ÎSCOVERY

F ÛLL M ÂPPER - P ÂTH M ÊRGING

F ÛLL M ÂPPER - P ÂTH M ÊRGING

F ÛLL M ÂPPER - P ÂTH M ÊRGING

F ÛLL M ÂPPER - P ÂTH M ÊRGING

F ÛLL M ÂPPER - P ÂTH M ÊRGING

F ÛLL M ÂPPER - P ÂTH M ÊRGING

R ^EAL M ^APPER

E XPERIMENTAL E ^VALUATION

E XPERIMENTAL E ^VALUATION

E XPERIMENTAL E ^VALUATION

E XPERIMENTAL E ^VALUATION

E XPERIMENTAL E ^VALUATION

E XPERIMENTAL E ^VALUATION

E XPERIMENTAL E ^VALUATION

E XPERIMENTAL E ^VALUATION