Inter-Domain Stealthy Port Scan Detection through Complex Event Processing


Leonardo Aniello, Giorgia Lodi and Roberto Baldoni

Dipartimento di Informatica e Sistemistica "Antonio Ruberti"

Università degli Studi di Roma "La Sapienza", Via Ariosto 25, 00185 Roma, Italy

{aniello,lodi,baldoni}@dis.uniroma1.it

ABSTRACT

Large enterprises are nowadays complex interconnected software systems spanning several domains. This new dimension makes it difficult for enterprises to deploy efficient security defenses. This paper addresses the problem of detecting inter-domain stealthy port scans and proposes an architecture for an Intrusion Detection System that uses, for this purpose, an open source Complex Event Processing engine named Esper. Esper provides low cost of ownership and high flexibility. The architecture consists of software sensors deployed at different enterprise domains. Each sensor sends events to the Esper event processor for correlation.

We implemented an algorithm for the detection of inter-domain SYN port scans, named the Rank-based SYN (R-SYN) port scan detection algorithm. It combines and adapts three detection techniques in order to obtain a single global statement about the malicious behavior of host activities. We evaluated the accuracy of our approach using several traces, some consisting of original traffic dumps and others altered by injecting packets that simulate port scan activities. The results show that our algorithm is able to produce a list of scanners characterized by high detection and low false positive rates.

Categories and Subject Descriptors

C.2.0 [Computer-Communication Networks]: General - Security and protection; I.5.2 [Pattern Recognition]: Design Methodology - Pattern analysis

General Terms

Algorithms, Measurement, Security

Keywords

Intrusion detection systems, Complex Event Processing, Port scan

1. INTRODUCTION

Port scan is one of the most widespread mechanisms used by attackers for obtaining information on possible vulnerabilities of any target. Port scan is a preparatory action performed in several security threats such as worm spreading, botnet formation and DDoS attacks. Among the various existing port scan variants, a very common one is the SYN port scan, also named Half Open (HO) port scan [16]. HO port scan is a form of stealthy port scan that aims at uncovering the status of certain TCP ports without being traced by application level loggers.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

EWDC '11, May 11-12, 2011, Pisa, Italy

When an attacker targets an organization or enterprise, its first steps usually consist of reconnaissance activities, done through port scanning, that discover whether any vulnerability exists which could be exploited at a later time for various purposes (e.g., capturing sensitive information, disrupting service operation). The attack consists of sending probing packets to specific enterprise sites in order to find those that exhibit some weakness. An enterprise defends itself from such activity by means of Intrusion Detection Systems (IDSs). IDSs monitor the traffic coming into the enterprise network, looking for suspicious patterns and signatures. Attackers are usually aware of the presence of such systems; they thus attempt to perform their activities in a stealthy fashion in order to elude the IDSs.

IDSs trace the number and pattern of connections issued by source hosts and check behavioral profiles against configured rules based on time windows and thresholds. From an attacker's point of view, one of the most effective ways of circumventing these security checks consists of distributing the port scans both in space, probing a few ports of interest at different sites to avoid exceeding configured thresholds, and in time, delaying the single probes to bypass time window controls.

If the enterprise is deployed on top of a single administrative domain and is connected to the Internet through a single physical network, IDSs can be effectively tuned, and most of the early reconnaissance activities can be detected by correlating the data received by the enterprise's hosts. For large scale enterprises (e.g., corporate banks, power grid companies) spanning multiple domains, each including many sites, the defense offered by these IDSs may not be sufficient, since probing packets could be sent to distinct enterprise domains whose data cannot be correlated. Without a global correlation over the traffic observed at every domain, no suspicion would arise about the behavior of an attacker who is carrying out an inter-domain port scan targeting the domains of a corporate company.
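The effect of spreading probes across domains can be illustrated with a small sketch. The threshold value, the domain names, the source address and the probed ports below are all hypothetical and chosen only for illustration; the point is that each domain on its own stays under its local threshold while the merged view does not.

```python
from collections import Counter

LOCAL_THRESHOLD = 5  # hypothetical per-source probe limit, not a value from the paper

# Hypothetical probes seen at three domains: (source IP, destination port)
domain_logs = {
    "domain_a": [("10.0.0.9", p) for p in (21, 22, 23, 25)],
    "domain_b": [("10.0.0.9", p) for p in (80, 110, 143, 443)],
    "domain_c": [("10.0.0.9", p) for p in (445, 3306, 3389, 8080)],
}


def flagged(probes, threshold):
    """Return the set of sources whose probe count exceeds the threshold."""
    counts = Counter(src for src, _ in probes)
    return {src for src, n in counts.items() if n > threshold}


# Each domain checks only its own traffic.
local_alerts = {name: flagged(probes, LOCAL_THRESHOLD)
                for name, probes in domain_logs.items()}

# A global correlator sees the union of all domains' traffic.
all_probes = [probe for probes in domain_logs.values() for probe in probes]
global_alerts = flagged(all_probes, LOCAL_THRESHOLD)
```

With four probes per domain no local check fires, whereas the twelve probes seen globally exceed the same threshold.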

In this paper we address the general problem of detecting inter-domain port scanning activities originating from single sources. The detection is carried out in a cooperative fashion by correlating network traffic data coming from geographically distributed enterprise nodes. To this end, we have designed an IDS architecture that can (i) easily deal with the evolution of the monitored system: in a large scale enterprise network additional domains may be added, and the architecture is able to extend its deployment in order to monitor these new parts of the system; and (ii) offer an easy way of updating the detection logic, with low cost of ownership and high flexibility: as new security mechanisms are put into practice, malicious attackers find new ways of circumventing them, and to cope with this evolving scenario the architecture must be able to promptly deploy new techniques for facing these new threats.

The proposed solution employs so-called Gateway components, i.e., software sensors located at each geographically dispersed enterprise domain that is to be monitored. Gateways send captured network traffic data to a Complex Event Processing (CEP) engine. The engine is responsible for correlating the data and thus discovering spatial and/or temporal relationships among apparently uncorrelated data that would have gone undetected by in-house IDSs. We use Esper [6] as the CEP engine. Esper carries out its computation on the basis of a set of SQL-like queries that can be configured at run time. This feature allows us to dynamically adapt the detection logic of the IDS, integrating queries for facing new threats that may arise. To validate the effectiveness of the proposed architecture, we implemented a new SYN port scan detection algorithm, i.e., the Rank-based SYN (R-SYN) algorithm, which adapts and combines three port scan detection techniques: half-open connections detection, horizontal and vertical scans detection, and entropy-based failed connections detection. The algorithm is implemented through a set of queries hosted at the CEP engine; the queries take as input the packets of the TCP three-way handshake protocol sent by the Gateways.

We carried out an experimental evaluation in order to assess the detection accuracy of our architecture. We used real traffic traces both in their original version and after having injected packets simulating SYN port scan activities. In our evaluation we point out the importance of using all the three techniques, including the entropy on failed connections, by showing the differences in accuracy when using our detection algorithm with and without one of the techniques. The results we obtained are based on a formal evaluation model and showed high detection and low false positive rates.

The rest of the paper is organized as follows. Section 2 introduces related works in the field of IDSs and CEPs to motivate the choice of Esper. Section 3 describes the R-SYN port scan detection algorithm. Section 4 presents the solution we designed and implemented. Section 5 outlines the experimental evaluation we carried out. Section 6 discusses the main conclusions of this work.

2. RELATED WORK

Many free IDSs exist that are deployed in enterprise settings. Snort [10] is an open source Network Intrusion Prevention/Detection System. It performs real-time traffic analysis and packet logging on IP networks to detect probes or attacks. Bro [7] is an open source Network IDS that passively monitors network traffic and searches for suspicious activity. Its analysis includes detection of specific attacks, using both defined signatures and event patterns, and of unusual activities. Even though some works addressing distributed Snort-based or Bro-based IDSs exist (e.g., [5] [15] [2]), to the best of our knowledge no concrete application has been developed that addresses the online gathering of raw events from geographically dispersed nodes and the correlation of those events in real time. Moreover, [5], [15] and [2] propose the correlation of alerts produced by peripheral sensors. Such alerts are generated using local data only. The information that is cut away by the peripheral computations could cause IDSs to miss crucial details necessary for gaining the global knowledge required to detect inter-domain malicious behaviors. Our solution addresses precisely this issue through the usage of a general-purpose CEP engine that offers great flexibility in the management of the detection logic. The need for ease of deployment also drove us to choose among Java-based CEP systems.

CEP and Stream Processing (SP) systems play an important role in IT. IBM System S [11] has been used by market makers for processing high-volume market data and obtaining low latency results, as reported in [19]. System S (like other CEP/SP systems, e.g. [12] [18]) is based on event detection across distributed event sources. However, all these systems exhibit a high cost of ownership.

Our solution employs open source CEP systems such as JBoss Drools [8] and Esper [6]. JBoss Drools is a Business Logic Integration Platform which provides a unified and integrated platform for Rules, Workflow and Event Processing. Esper is a CEP engine technology that processes events and discovers complex patterns among multiple streams of event data. As our solution did not require an entire Business Logic Platform as provided by JBoss Drools, we chose the lightweight Esper engine for our research.

3. RANK-BASED SYN PORT SCAN DETECTION ALGORITHM

Let us consider a scanner S, a target T and a port P to scan. A TCP SYN (half-open) port scan is characterized as follows: S sends a SYN packet to T : P and waits for a response. If a SYN-ACK packet is received, S can conclude that P is open and optionally reply with an RST packet to reset the connection. In contrast, if an RST-ACK packet is received, S can consider P as closed. If no packet is received at all and S has some knowledge that T is reachable, then S can conclude that P is filtered. Otherwise, if S does not have any clue on the reachability status of T , it cannot assume anything about the state of P .
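The scanner's inference described above can be written as a small decision function. This is only a sketch of the reasoning in the paragraph: the names PortState and classify_probe, and the encoding of responses as strings, are introduced here for illustration.

```python
from enum import Enum
from typing import Optional


class PortState(Enum):
    OPEN = "open"
    CLOSED = "closed"
    FILTERED = "filtered"
    UNKNOWN = "unknown"


def classify_probe(response: Optional[str], target_reachable: bool) -> PortState:
    """Infer the state of port P on target T from the reply to a SYN probe.

    response is "SYN-ACK", "RST-ACK", or None when no packet came back;
    target_reachable says whether S has some knowledge that T is reachable.
    """
    if response == "SYN-ACK":
        return PortState.OPEN      # S may optionally reply with an RST
    if response == "RST-ACK":
        return PortState.CLOSED
    if response is None and target_reachable:
        return PortState.FILTERED
    return PortState.UNKNOWN       # nothing can be assumed about P
```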

Not all port scans can be considered malicious. For instance, there exist search engines that carry out port scanning activities in order to discover Web servers to index [14]. It therefore becomes crucial to distinguish accurately between actual malicious port scanning activities and benign ones. To this end, we have designed the so-called Rank-based SYN (R-SYN) port scan detection algorithm, which adapts and combines three port scan detection techniques, namely (i) Half Open connections detection, (ii) Horizontal and Vertical port scans detection, and (iii) Entropy-based failed connections detection.

Half open connections detection. This technique analyzes the sequence of SYN, ACK and RST packets in the three-way TCP handshake. Specifically, in normal activities the following sequence is observed: (i) SYN, (ii) SYN-ACK, (iii) ACK. In the presence of a SYN port scan, the connection looks like the following: (i) SYN, (ii) SYN-ACK, (iii) RST (or nothing), and we refer to it as an incomplete connection. For a given IP address, if the number of incomplete connections is higher than a certain threshold T_HO (see below), we can conclude that the IP address is likely carrying out malicious port scanning activities.

DEF: for a given IP address x, let count_HO(x) be the number of incomplete connections issued by x; we define HO(x) as follows:

$$HO(x) = \begin{cases} 1 & \text{if } count_{HO}(x) > T_{HO} \\ 0 & \text{otherwise} \end{cases}$$
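A minimal sketch of this indicator, assuming each connection has already been reduced to the ordered list of handshake packet types observed for it. That input representation and the function names are assumptions made here; the threshold value 2 is the one the paper reports for its implementation.

```python
from collections import Counter

T_HO = 2  # threshold on incomplete connections, as reported for the implementation


def is_incomplete(flags):
    """True for SYN, SYN-ACK followed by RST or by nothing at all."""
    flags = list(flags)
    return flags[:2] == ["SYN", "SYN-ACK"] and (len(flags) == 2 or flags[2] == "RST")


def half_open_indicator(connections):
    """Map each source IP to HO(x), given (source_ip, flags) pairs."""
    counts = Counter()
    sources = set()
    for src, flags in connections:
        sources.add(src)
        if is_incomplete(flags):
            counts[src] += 1
    # HO(x) = 1 only when the count is strictly greater than the threshold
    return {src: int(counts[src] > T_HO) for src in sources}
```

Note that the comparison is strict, so a source with exactly T_HO incomplete connections is not flagged.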

Horizontal and vertical port scans detection. In a horizontal port scan the attackers are interested in a port across all IP addresses within a range. In a vertical port scan, attackers scan some or all the ports of single destination hosts [17].

In our algorithm we adapt the Threshold Random Walk (TRW) technique described in [14]. TRW classifies a host as malicious by observing the sequence of its requests. Looking at the pattern of successful and failed requests of a certain source IP, it attempts to infer whether the host is behaving as a scanner. While the original technique considers as a failure a connection attempt to either an unreachable host or a closed port on a reachable host, we adapt the TRW technique in order to distinguish between connection attempts to unreachable hosts and attempts to closed ports on reachable hosts, since the former concern horizontal port scans whereas the latter concern vertical port scans. We have therefore designed two modified versions of the original TRW algorithm that take these aspects into account.

Specifically, in order to detect horizontal port scans, we identify connections to both unreachable and reachable hosts. A host is considered unreachable if a sender, after a time interval from the sending of a SYN packet, receives neither a SYN-ACK nor an RST-ACK packet, or if it receives an ICMP packet of type 3 indicating that the host is unreachable. In contrast, a host is reachable if a sender receives a SYN-ACK or RST-ACK packet. For each source IP address we then count the number of connections to unreachable and reachable hosts and apply the TRW algorithm. Let TRW_HS(x) be the boolean output of our TRW algorithm version for horizontal port scans computed for a certain IP address x. A "true" output indicates that x is considered a scanner; otherwise x is considered an honest host.

DEF: for a given IP address x, we define HS(x) as follows:

$$HS(x) = \begin{cases} 1 & \text{if } TRW_{HS}(x) = \text{true} \\ 0 & \text{otherwise} \end{cases}$$

In order to detect vertical port scans, we first identify connections to open and closed ports. We then count such connections for each source IP address. Let TRW_VS(x) be the boolean output of our TRW algorithm version for vertical port scans computed for a certain IP address x.

DEF: for a given IP address x, we define VS(x) as follows:

$$VS(x) = \begin{cases} 1 & \text{if } TRW_{VS}(x) = \text{true} \\ 0 & \text{otherwise} \end{cases}$$
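The paper does not list its modified TRW variants, so the following is only a sketch of the underlying sequential hypothesis test from [14], reused for both cases by changing what counts as a success: a connection to a reachable host for TRW_HS, or to an open port for TRW_VS. The parameter values (theta0, theta1, alpha, beta) are assumptions for illustration, not values stated in this paper.

```python
def trw_is_scanner(outcomes, theta0=0.8, theta1=0.2, alpha=0.01, beta=0.99):
    """Sequential hypothesis test over a source's connection outcomes.

    outcomes: iterable of booleans, True for a successful attempt.
    theta0 / theta1: assumed success probability for benign hosts / scanners.
    alpha / beta: assumed false positive bound / detection probability bound.
    Returns True only if the "scanner" threshold is crossed first.
    """
    upper = beta / alpha              # decide "scanner" at or above this ratio
    lower = (1 - beta) / (1 - alpha)  # decide "benign" at or below this ratio
    ratio = 1.0
    for success in outcomes:
        if success:
            ratio *= theta1 / theta0
        else:
            ratio *= (1 - theta1) / (1 - theta0)
        if ratio >= upper:
            return True
        if ratio <= lower:
            return False
    return False  # undecided sources are treated as honest in this sketch
```

With these illustrative parameters each failure multiplies the likelihood ratio by about 4, so a handful of consecutive failures is enough to cross the upper threshold, while alternating successes and failures keep the walk undecided.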

Entropy-based failed connections detection. Not all suspicious activities are actually performed by scanners. There exist cases in which the connections are simply failures and not deliberate malicious port scans [14]. As in [13], in order to discriminate failures from malicious port scans, we use an entropy-based approach. In our case, the entropy is evaluated considering two elements of a TCP connection, namely destination IP and destination port. The entropy assumes a value in the range [0, 1]: if some source IP address is issuing failed connections towards the same destination IP and port, its entropy is close to 0; otherwise, if the IP address attempts to connect without success to different destination IPs and ports, its entropy is close to 1. This evaluation originates from the observation that a scanner IP address does not repeatedly perform the same operation towards specific hosts or ports: if an attempt fails, a scanner likely moves on to probe different targets.

Given a source IP address x, a destination IP address y and a destination port p, we define failures(x, y, p) as the number of failed connection attempts of x towards y:p. For a given IP address x, we define N(x) as follows:

$$N(x) = \sum_{y,p} \mathit{failures}(x, y, p)$$

In addition, we introduce a statistic about the ratio of failed connection attempts towards a specific destination IP and port. We define stat(x, y, p) as follows:

$$\mathit{stat}(x, y, p) = \frac{\mathit{failures}(x, y, p)}{N(x)}$$

The normalized entropy can then be evaluated applying the following formula:

DEF: for a given IP address x,

$$EN(x) = -\frac{\sum_{y,p} \mathit{stat}(x, y, p) \cdot \log_2 \mathit{stat}(x, y, p)}{\log_2 N(x)}$$
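The three definitions can be checked with a direct transcription. The function name and the input shape (a list of failed (destination IP, destination port) pairs for one source) are assumptions made here; the handling of N(x) <= 1, where the denominator is zero or undefined, is also an assumption since the paper does not discuss that case.

```python
import math
from collections import Counter


def normalized_entropy(failed_attempts):
    """EN(x) for one source, given its failed (dest_ip, dest_port) attempts."""
    failures = Counter(failed_attempts)   # failures(x, y, p)
    n = sum(failures.values())            # N(x)
    if n <= 1:
        return 0.0  # assumption: log2(N(x)) is 0 or undefined, report no spread
    entropy = 0.0
    for count in failures.values():
        stat = count / n                  # stat(x, y, p)
        entropy -= stat * math.log2(stat)
    return entropy / math.log2(n)
```

Repeated failures against a single destination give 0, while failures that are all towards distinct destinations give 1, matching the two extremes described in the text.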

Combining the techniques. We have designed a ranking mechanism which allows us to minimize the probability that a scanner cheats by apparently behaving well: even if scanners try to circumvent one technique, their malicious activities are likely to be recognized by one of the other two, so that they can still be identified.

Our mechanism sums up the three values related to half opens, horizontal and vertical port scans and weights the total result using the entropy-based failed connections.

DEF: for a given IP address x, we define rank(x) as follows:

$$rank(x) = (HO(x) + HS(x) + VS(x)) \cdot EN(x)$$

Such ranking is compared (≥) to a fixed threshold in order to mark an IP address as a scanner.
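As a quick numerical check of how the entropy weight acts on the three indicators, here is a small sketch. The function names are introduced here; the default threshold 0.52 is the value the paper reports for its experiments.

```python
def rank(ho, hs, vs, en):
    """rank(x) = (HO(x) + HS(x) + VS(x)) * EN(x), with ho, hs, vs in {0, 1}."""
    return (ho + hs + vs) * en


def is_scanner(ho, hs, vs, en, threshold=0.52):
    """Mark x as a scanner when rank(x) reaches the threshold."""
    return rank(ho, hs, vs, en) >= threshold
```

A source flagged by all three techniques but with very low entropy (failures concentrated on one destination) stays below the threshold, whereas a source flagged by a single technique with high entropy is marked.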

4. INTER-DOMAIN IDS ARCHITECTURE

The architecture of the inter-domain IDS consists of a Gateway component installed in each domain and a single Esper CEP engine instance deployed in any of the available domains. The architecture is shown in Figure 1.

Gateway. Traffic data are captured from the internal domains. In order to be analyzed by the engine, the data have to be normalized and transformed into Plain Old Java Objects (POJOs). To this end, the Gateway component has been designed and implemented to (i) take as input the flows of network data (TCP data in Figure 1), (ii) filter them to keep only packets related to the TCP three-way handshake, and finally (iii) wrap each packet in a proper POJO to be sent to Esper. We implemented TCPPojo for TCP packets and ICMPPojo for ICMP packets. Each POJO maps every field in the header of the related protocol. POJOs are serialized and sent through Java sockets to Esper. When sending the POJOs our implementation maintains the order of the captured packets, which is crucial when evaluating sequence operators in the EPL queries of the Esper engine.
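The Gateway's filter-and-wrap step might look roughly like the sketch below, written in Python rather than Java for brevity. The TcpEvent class is a hypothetical stand-in for TCPPojo with only a few header fields, and the filtering rule (keep SYN or RST packets and bare ACKs without payload) is an assumption, since the paper does not state the exact criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TcpEvent:
    """Hypothetical, reduced stand-in for the TCPPojo described in the text."""
    source_ip: str
    source_port: int
    dest_ip: str
    dest_port: int
    syn: bool
    ack: bool
    rst: bool
    payload_len: int = 0


def is_handshake_packet(pkt):
    """Assumed rule: SYN or RST set, or an ACK that carries no payload."""
    return pkt.syn or pkt.rst or (pkt.ack and pkt.payload_len == 0)


def gateway_filter(packets):
    """Keep handshake-related packets, preserving the capture order."""
    return [pkt for pkt in packets if is_handshake_packet(pkt)]
```

Returning a list built by a single pass keeps the events in capture order, which is the property the text singles out as necessary for sequence operators.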

Figure 1: Inter-domain IDS architecture

Complex Event Processing (CEP). The Esper CEP engine [6] receives POJOs that represent the events it has to analyze (input streams). The processing logic is specified in a high level language similar to SQL, namely the Event Processing Language (EPL). In order to detect malicious port scanning activities a number of EPL queries are defined and executed by the engine, as shown in Figure 1. EPL queries run over a continuous stream of POJOs and produce output streams. When an EPL query finds a match against its clauses in its input stream, it generates a new tuple that is added to its output stream. A subscriber is a Java object that can be subscribed to a particular output stream so that whenever the query outputs a new tuple, the update() method of the subscriber is invoked using the tuple as argument.
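The query-plus-subscriber mechanism can be pictured with a toy stand-in. This is not the Esper API: OutputStream, CollectingSubscriber and the dictionary event layout are hypothetical names introduced here, and only the update() callback name and the SYNFlag / ACKFlag field names come from the paper.

```python
class OutputStream:
    """Toy stand-in for a query: forwards matching events to its subscribers."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.subscribers = []

    def subscribe(self, subscriber):
        self.subscribers.append(subscriber)

    def push(self, event):
        # A match produces a new tuple and triggers update() on each subscriber.
        if self.predicate(event):
            for subscriber in self.subscribers:
                subscriber.update(event)


class CollectingSubscriber:
    """Keeps every tuple it is notified about, like a global data structure."""

    def __init__(self):
        self.seen = []

    def update(self, event):
        self.seen.append(event)


# Usage: a filter equivalent to "SYNFlag=true and ACKFlag=false".
syn_only = OutputStream(lambda e: e["SYNFlag"] and not e["ACKFlag"])
collector = CollectingSubscriber()
syn_only.subscribe(collector)
for event in (
    {"sourceIP": "10.0.0.9", "SYNFlag": True, "ACKFlag": False},
    {"sourceIP": "10.0.0.7", "SYNFlag": True, "ACKFlag": True},
):
    syn_only.push(event)
```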

Each detection technique of the R-SYN port scan detection algorithm is implemented using a combination of EPL queries and subscribers. The EPL queries are in charge of recognizing any packet pattern that reveals an anomalous behavior according to the detection metric. Subscribers are invoked whenever such matches occur; they update global data structures that we created to maintain the behavioral profile of each IP address that may later be suspected. The data structures are the following:

• IPs suspected by Half Open connections technique (listHO);

• IPs suspected by Horizontal port scan technique (listHS);

• IPs suspected by Vertical port scan technique (listVS);

• stat values for computing the entropy (entropy).

General queries. We implemented general queries for all the three techniques. For instance, the following one (syn_stream) filters SYN packets:

    insert into syn_stream
    select sourceIP, destIP, ...
    from TCPPojo
    where SYNFlag=true and ACKFlag=false

We use a further query (syn_ack_stream) for filtering SYN + ACK packets instead.

Half Open connections detection. Incomplete connections are identified using the following query:

    insert into halfopen_connection
    select ...
    from pattern [
        every a = syn_stream
        -> ( ( b = syn_ack_stream(...)
        -> ( <c> or <d> )
        ) where timer:within(10 sec) ) ) ]

Note that in this case we exploit the pattern construct of Esper to detect patterns of incomplete connections. In particular, a is the stream of SYN packets, b is the stream of SYN+ACK packets, <c> is a filter for RST packets and <d> is the stream of ACK packets that would correctly complete the three-way handshake. The pattern matches if every packet from a is followed by a packet from b, followed in turn by (<c> or <d>), within a time window of 10 seconds.

Additional queries are then used and bound to Subscribers for filtering IP addresses that made more than T_HO (equal to 2) incomplete connections, and updating listHO.

Horizontal port scans detection. Connection attempts to both reachable and unreachable hosts are recognized. The following query shows how connections to reachable hosts can be detected:

    insert into host_reach_unreach
    select ..., -1 as value
    from pattern [
        every a = syn_stream
        -> ( ( (<b> or <c>) and not <d>
        ) where timer:within(10 sec) ) ]

Tuples of the related output stream have a value field that is -1 for reachable hosts and 1 for unreachable hosts. The pattern for distinguishing reachable hosts consists of a SYN packet (a), followed by a packet that can be a SYN + ACK (<b>) or RST + ACK (<c>) but not an ICMP packet (<d>). The pattern matches if the packets involved fall within a time window of 10 seconds. The query we use for unreachable hosts can be expressed as follows:

    insert into host_reach_unreach
    select ..., 1 as value
    from pattern [
        every a = syn_stream
        -> timer:interval(10 sec) and
           not ( (<b> or <c>) and not <d> ) ]

In this case, we search a data pattern in which a SYN packet is not followed by any packet matching the expression ((<b> or <c>) and not <d>) within a time interval of 10 seconds. The meaning of the symbols in the expression is the same as for the query that detects reachable hosts. The output stream is then manipulated by another query that creates a further stream as follows:

    select sourceIP, count(*) as SUM, sum(value) as DIFF
    from host_reach_unreach
    group by sourceIP

A Subscriber is registered for this query and executes the TRW calculations. In particular, the numbers of connections to reachable and unreachable hosts, required by the TRW computation, are obtained from the SUM and DIFF fields that result from the above query. If an IP address is suspected of carrying out a horizontal port scan, listHS is updated accordingly.
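The two counts can be recovered from the aggregate fields because each reachable connection contributes -1 and each unreachable one contributes 1: SUM is the total number of connections and DIFF is the number of unreachable minus the number of reachable ones. The helper name below is introduced here for illustration.

```python
def reach_counts(sum_field, diff_field):
    """Recover (reachable, unreachable) from SUM = r + u and DIFF = u - r."""
    unreachable = (sum_field + diff_field) // 2
    reachable = (sum_field - diff_field) // 2
    return reachable, unreachable


# Example: value fields of five tuples for one source IP
values = [-1, -1, 1, 1, 1]
SUM, DIFF = len(values), sum(values)
reachable, unreachable = reach_counts(SUM, DIFF)
```

SUM and DIFF always have the same parity, so the integer divisions are exact.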

Vertical port scans detection. In order to detect vertical port scans, connection attempts to both open and closed ports are discovered so as to produce the port_open_closed stream. This stream has a value field equal to 1 for closed ports and -1 for open ports, analogous to the host_reach_unreach stream described earlier.

The main difference from horizontal port scans is the pattern used to single out connections to closed ports and connections to open ports.

Similarly to horizontal port scan detection, another query is used to compute the SUM and DIFF fields, used in turn by a Subscriber for vertical scans that executes TRW calculations and updates listVS.

Entropy-based failed connections detection. We implemented three queries for detecting connections to unreachable hosts, closed ports and incomplete connections, respectively, and updating the failures stream, which has sourceIP, sourcePort, destIP and destPort fields. The following query creates a stream containing the information required for computing stat values:

    insert into entropy_syn_stream
    select sourceIP, destIP, destPort, count(*) as stat
    from failures
    group by sourceIP, destIP, destPort

The Subscriber registered for such stream updates the entropy data structure we used in our entropy-based computation.

Computing the entropy requires a large amount of storage because sourceIP can be any possible IP address, while destIP is restricted to an IP address of the enterprise domains. A rough estimate of the storage necessary to compute the entropy is in the order of 256 TBytes when considering large organizations owning 2^16 IP addresses. If this storage is not available, a windowing system can be introduced to cut storage.
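The paper does not spell out how the figure is obtained. One reading that reproduces the same order of magnitude, assuming IPv4 source addresses and a single byte of state per (source IP, destination IP) pair (both assumptions made here, and ignoring the destination port dimension), is:

```latex
\underbrace{2^{32}}_{\text{source IPs}} \times \underbrace{2^{16}}_{\text{destination IPs}} \times 1\ \text{byte}
  = 2^{48}\ \text{bytes}
  = 256 \times 2^{40}\ \text{bytes}
  = 256\ \text{TBytes}
```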

Ranking. Once global data structures are updated, the ranking of suspicious hosts is evaluated. Given an IP address x, HO(x), HS(x) and VS(x) are equal to 1 if and only if x belongs to listHO, listHS and listVS, respectively, and 0 otherwise. EN(x) is computed from the stat values contained in the entropy data structure. If the computed rank(x) exceeds a certain threshold, x is marked as a scanner. In the experiments shown in the next section the threshold is set to 0.52.

5. EXPERIMENTAL EVALUATION

Testbed. We deployed our IDS prototype on a small cluster of 4 Windows Virtual Machines (VMs), each equipped with 2 GB of RAM and 63 GB of disk space. The 4 VMs were hosted on a cluster of 4 quad core 2.8 GHz dual processor physical machines equipped with 24 GB of RAM. The physical machines are connected to a 10 Gbit LAN.

One VM was dedicated to hosting the Esper CEP engine. Each of the remaining 3 VMs represented the resources made available by a simulated domain and hosted a Gateway component.

Traces. To test the effectiveness of our IDS, we used 4 real traces obtained from the ITOC research web site [3], the LBNL/ICSI Enterprise Tracing Project [4] and the MIT DARPA Intrusion Detection project [1]. To test the detection accuracy of our IDS architecture we synthetically added scanners to all the traces with a "trace infector" program that infects a trace by injecting TCP packets which simulate all known patterns of SYN port scanning. Table 1 provides a high level summary of the content of the traces. We manually assessed the content of the traces offline in order to understand the attack scenarios included in those data, and compared those scenarios with the results obtained from the R-SYN algorithm.

trace     size    sources  connections  scanners
trace1    3MB     10       1165         0
trace1*   3MB     10       1429         7
trace2    5MB     15       223          1
trace2*   5MB     15       487          8
trace3    80MB    36       9559         0
trace3*   80MB    36       9823         7
trace4    160MB   39       413772       3
trace4*   160MB   39       414036       10

* infected trace

Table 1: Content of the traces

Aggregate Input Streams Bandwidth. Our centralized IDS may suffer from scalability issues when new domains to monitor are added. To estimate how the IDS network load increases with the number of domains, we evaluated the aggregate bandwidth used by Gateways for sending POJOs to Esper. Each of the 8 traces was partitioned over the three simulated domains, and the required aggregate bandwidth was always less than 80 kbit/s. When attacks are carried out, we could expect an increase of this bandwidth; however, it is expected to remain within a reasonable range with respect to the available bandwidth of the enterprise connection. The aggregate bandwidth also suggests that our IDS can scale to tens of domains.

Results. In order to assess the accuracy of the IDS, we partitioned the traces, both in their original and infected versions, and injected the resulting sub-traces into the available Gateways to observe what we were able to detect. In particular, in order to show the effectiveness of our R-SYN algorithm we ran a number of tests using the traces and considering the following accuracy metrics (following the assessment described in [20]): (i) TP (True Positive), the number of suspicious activities we detect as attacks that are true attack activities; (ii) FP (False Positive), the number of normal activities erroneously considered as attacks; (iii) TN (True Negative), the number of normal activities that we correctly detect as normal; (iv) FN (False Negative), the number of real attacks that we do not detect. With these values we computed DR (Detection Rate) as DR = TP/(TP + FN) and FPR (False Positive Rate) as FPR = FP/(FP + TN). The results of these tests are reported in Table 2.
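The two rates can be computed directly from the four counts. The sketch below returns None when a rate is undefined (no attacks, or no normal activities, in the trace); that convention and the function names are choices made here.

```python
def detection_rate(tp, fn):
    """DR = TP / (TP + FN); None when the trace contains no attacks."""
    total = tp + fn
    return tp / total if total else None


def false_positive_rate(fp, tn):
    """FPR = FP / (FP + TN); None when there are no normal activities."""
    total = fp + tn
    return fp / total if total else None
```

For instance, TP = 2 and FN = 1 give a detection rate of 2/3, and FP = 2 with TN = 34 gives a false positive rate of 2/36, roughly 5.6%.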

In the case of trace1 and trace3, there is no DR as no scanner activities are included in those traces (see Table 1). The results show that the R-SYN algorithm is able to accurately detect attackers' activities, with the exception of trace4 and trace4*, where one scanner is not recognized and two source IPs (trace4) or one source IP (trace4*) are erroneously considered malicious.

          TP  FP  TN  FN  DR    FPR
trace1    0   0   10  0   –     0%
trace1*   7   0   3   0   100%  0%
trace2    1   0   14  0   100%  0%
trace2*   8   0   7   0   100%  0%
trace3    0   0   36  0   –     0%
trace3*   7   0   29  0   100%  0%
trace4    2   2   34  1   66%   6%
trace4*   9   1   28  1   90%   3%

* infected trace

Table 2: Accuracy metrics for R-SYN Algorithm

In order to show the effect of the entropy on the R-SYN algorithm we report the results of tests executed with a ranking function that does not exploit the entropy computation. In this case rank(x) is an integer in the interval [0, 3] and the threshold used to detect the scanners is set to 1. Table 3 reports only the values of the traces that change with respect to Table 2. Comparing the two tables, it clearly emerges that the entropy correction is highly effective in reducing FPR. In all 4 traces reported in Table 3, at least 1/3 of the non-malicious source IPs are recognized as scanners. The improvement obtained with the entropy correction is due to the ability of the entropy-based technique to properly discriminate suspicious activities from honest TCP failures.

          TP  FP  TN  FN  DR    FPR
trace1    0   4   6   0   –     40%
trace1*   7   1   2   0   100%  33%
trace4    2   17  19  1   66%   47%
trace4*   9   10  19  1   90%   34%

* infected trace

Table 3: Accuracy metrics for R-SYN Algorithm without entropy correction

In conclusion, the ranking system used in R-SYN algorithm shows an increased detection accuracy when implemented considering all the three proposed detection techniques, including the entropy-based technique.

6. CONCLUDING REMARKS

More and more cyber attacks are carried out against large organizations (e.g., corporate banks, power grid companies). Currently 1 out of 5 of such attacks comes together with an extortion [9]. Port scanning is a preparatory action to many major attacks such as worm spreading and DDoS perpetrated against large organizations.

In this paper we described an IDS architecture based on CEP which specifically targets the detection of reconnaissance activities against organizations spanning multiple administration domains. The architecture is based on Gateways deployed at the different domains of the organizations that send only the necessary data (i.e., packets related to the three-way handshake protocol) to a central CEP engine for correlation purposes. We presented the R-SYN port scan algorithm and deployed it on the inter-domain IDS architecture. Results on real traces show the effectiveness of our approach with respect to the detection accuracy. In particular we obtained high detection rates and low false positive rates.

Even though the storage and bandwidth requirements of the inter-domain IDS can be met, in order to increase scalability we are studying techniques to bound the storage needed at the CEP side and to reduce the bandwidth necessary to transfer the data from the Gateways to the engine. We are currently conducting a number of experiments aimed at evaluating both the scalability of the entire system and its performance in terms of detection latency in WAN settings.

7. ACKNOWLEDGEMENTS

This work has been partially supported by the EU project CoMiFin on the protection of the Financial Infrastructure from Cyber Attacks.

8. REFERENCES

[1] 2000 DARPA Intrusion Detection Scenario Specific Data Sets. http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/2000data.html.

[2] Broccoli, the Bro client communications library. http://www.icir.org/christian/broccoli/index.html.

[3] ITOC Research: CDX Datasets. http://www.itoc.usma.edu/research/dataset/index.html.

[4] LBNL/ICSI Enterprise Tracing Project. http://www.icir.org/enterprise-tracing/.

[5] Complete Snort-based IDS Architecture. http://cybervlad.net/ids/index.html, 2002.

[6] Where Complex Event Processing meets Open Source: Esper and NEsper. http://esper.codehaus.org/, 2009.

[7] Bro: an open source Unix based Network intrusion detection system (NIDS). http://www.bro-ids.org/, 2010.

[8] JBoss Drools Fusion. http://www.jboss.org/drools/drools-fusion.html, 2010.

[9] McAfee report In the crossfire: critical infrastructures in the age of cyber war, 2010.

[10] Snort: an open source network intrusion prevention and detection system (IDS/IPS). http://www.snort.org/, 2010.

[11] System S. http://domino.research.ibm.com/comm/research_projects.nsf/pages/esps.index.html, 2010.

[12] M. Akdere, U. Çetintemel, and N. Tatbul. Plan-based complex event detection across distributed sources. PVLDB, 1(1):66–77, 2008.

[13] W. G. Hai Zhang, Xuyang Zhu. TCP portscan detection based on single packet flows and entropy. In ICIS '09: Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, 2009.

[14] J. Jung, V. Paxson, A. W. Berger, and H. Balakrishnan. Fast portscan detection using sequential hypothesis testing. In Proceedings of the IEEE Symposium on Security and Privacy, 2004.

[15] C. Kreibich and R. Sommer. Policy-controlled event management for distributed intrusion detection. In Distributed Computing Systems Workshops, 2005. 25th IEEE International Conference on, 2005.

[16] S. Staniford, J. A. Hoagland, and J. M. McAlerney. Practical automated detection of stealthy portscans. In Proceedings of the 7th ACM Conference on Computer and Communications Security, 2000.

[17] H. Singh and R. Chun. Distributed Port Scan. Handbook of Information and Communication Security - Part B, pages 221–234, 2010.

[18] C. Tang, M. Steinder, M. Spreitzer, and G. Pacifici. A Scalable Application Placement Controller for Enterprise Data Centers. In 16th international Conference on World Wide Web, 2007.

[19] X. J. Zhang, H. Andrade, B. Gedik, R. King, J. Morar, S. Nathan, Y. Park, R. Pavuluri, E. Pring, R. Schnier, P. Selo, M. Spicer, V. Uhlig, and C. Venkatramani. Implementing a high-volume, low-latency market data processing system on commodity hardware using IBM middleware. In WHPCF '09: Proceedings of the 2nd Workshop on High Performance Computational Finance, pages 1–8, New York, NY, USA, 2009. ACM.

[20] C. Zhou, S. Karunasekera, and C. Leckie. Evaluation of a Decentralized Architecture for Large Scale Collaborative Intrusion Detection. In Proceedings of the 10th IFIP/IEEE International Symposium on Integrated Network Management, 2007.