
Università degli Studi di Roma "La Sapienza"

Facoltà di Ingegneria dell'Informazione, Informatica e Statistica

Master's Thesis in Computer Engineering (Tesi di Laurea Magistrale in Ingegneria Informatica)

Autumn Session, October 2010-2011

A Collaborative Processing System for Cyber Attacks Detection and Crime Monitoring

Giuseppe Antonio Di Luna

Advisor: Prof. Roberto Baldoni


Abstract

This Master's Thesis analyses the use of Complex Event Processing (CEP) and collaboration among different organizations for detecting and monitoring different kinds of threats: port scan detection and the monitoring of frauds with electronic payments and counterfeit money.

The use of a collaborative environment allows the detection of malicious activities carried out across the boundaries of different organizations, as well as the monitoring of loosely related events in order to obtain a big picture of the current fraud activities. The first part of this thesis addresses the problem of detecting inter-domain stealthy port scans and proposes the architecture of an Intrusion Detection System which uses, for such purpose, an open-source Complex Event Processing engine named Esper [29]. Two algorithms for the detection of inter-domain SYN port scans have been implemented. The first, named Rank-based SYN (R-SYN) port scan detection algorithm, combines and adapts three detection techniques in order to obtain a unique global statement about the malicious behaviour of host activities. The second, LineFitting, uses line interpolation to detect the classic pattern of port scanning. In the second part a particularly vicious kind of port scan, the coordinated one, is investigated, and techniques for its detection are proposed and tested. This kind of attack is very difficult to detect since the evidence of the malicious behaviour is spread among different sources, and both intra- and inter-domain analyses are needed in order to detect it. My aim is to find a method to detect a specific subset of this kind of attack that is likely to go undetected by existing approaches. In the third part the flexibility of CEP is shown by building a fully functional prototype: a correlation engine and a web interface for the geographical monitoring of counterfeit currency and frauds with electronic payments. The data used in this case belong to real datasets owned by the Ministry of Economy and Finance.


Contents

1 Introduction
   1.1 Scenario
   1.2 Motivation
   1.3 Thesis Contribution
   1.4 Thesis Outline

2 Collaborative Processing System
   2.1 Comifin and Semantic Room
   2.2 Focus of this work
       2.2.1 CEP and Esper

3 Portscan Detection
   3.1 Overview
       3.1.1 Portscan: A Taxonomy
             Internet and Services
             Vertical, Horizontal, Block, Open and Half-Open
   3.2 State of the art
       3.2.1 Threshold Random Walk
       3.2.2 Entropy-Based
   3.3 R-SYN
   3.4 LineFitting
   3.5 Prototype
       Gateway
       Esper
   3.6 Evaluation
       3.6.1 Tests Scenario
             Testbed
             Traces
       3.6.2 Test Results
             Detection Accuracy
             Detection Latency

4 Coordinated Portscan Detection
   4.1 Introduction and Related work
       Collaboration
   4.2 Model
       4.2.1 Footprint Model
       4.2.2 Time model
       4.2.3 Modelling the sources
             Coordinated horizontal portscan
             Coordinated vertical portscan
             Coordinated strobe portscan
       4.2.4 Errors
       4.2.5 Observer model
   4.3 The network of coordinated portscan
       4.3.1 The modularity is the way
       4.3.2 Louvain Algorithm
       4.3.3 Putting all together
   4.4 Timely Detection
       4.4.1 Three interesting types of attackers
   4.5 Connecting the Vertices
   4.6 Architecture and Prototype
       4.6.1 Local Architecture
             Failures Filtering
       4.6.2 Global Correlator
             Cluster Collapsing
             Maximize the union, Minimize the intersections
             Separation Algorithm and Global Algorithm
             PRGRASP
       4.6.3 Prototype
             Event-Correlation
             Stable Storage
             Off-line Analysis
       4.6.4 Combined Algorithm: Ianus
   4.7 Evaluation
       4.7.1 Attacking Tools
             DIS-scan
             NRDIS-scan
             TOR-scan
       4.7.2 Test-setting
             sfportscan
       4.7.3 Tests
             Trace 2
             Trace 3
             Trace 4
             Tricky Tests
             Combined Test
             Consideration on Local Architecture
             Collaborative Environment
             Detection of overlapping Attacks
             Detection of big attack communities
   4.8 Gaming the System, Limitation and Consideration

GeoFraud Monitoring
   4.9 Data
       4.9.1 Sirfe
       4.9.2 Sipaf
   4.10 Possible Analysis and preparation phase
        4.10.1 The geocoding table
        4.10.2 GeoAggregation
        4.10.3 Entropy
        4.10.4 Counterfeit Currency
   4.11 Implementation and Architecture
        4.11.1 Intro
        4.11.2 Gateway
        4.11.3 Correlation Engine
        4.11.4 DBMS
        4.11.5 GUI
              Banconote Layer
              Rank Layer
              Entropy Layer
              Serial Number Filtering
              Ground Monitoring

Conclusion
   4.12 Future Works

Bibliography


Chapter 1

Introduction

1.1 Scenario

Nowadays, security threats are more and more complex. Attackers distribute their actions in time and space. Using botnets, they are able to launch coordinated attacks involving a large number of sources geographically dispersed over different administrative boundaries.

Sophisticated attacks are composed of many phases distributed over a long timespan and a large number of targets [6, 7, 9]. Moreover, the targets may be dispersed over many different organization domains, and each domain may have its own NIDS. A preliminary information gathering phase is a prelude to many attacks, and an early detection of those attempts can make the difference between a successful attack and a successful countermeasure. There are other scenarios, not strictly related to cyber-attacks, where the information useful to reach some goal is spread among different participants or among different datasets. One example is the monitoring of frauds: in this case the information may be divided by the nature of the crimes. For example, the Italian Ministry of Economy and Finance (MEF) has two applications, SIRFE and SIPAF, that monitor counterfeit money and frauds with electronic payments; these applications do not share any information with each other.

1.2 Motivation

Referring to the scenario above, it is clear that attack evidence may be dispersed among different domains. If these domains do not embrace a cooperation paradigm, the attack goes undetected. Sophisticated tools are available that allow unskilled attackers to launch coordinated attacks from many sources against many targets. Skilled attackers now use their knowledge for profit: it is well documented that hackers offer botnets for sale for purposes of cyber-sabotage [18]. Complex attacks are often split into two phases: an information gathering phase and an active attack phase. The information gathering phase aims to discover possible vulnerabilities (0-day exploits) usable in the attack phase. One of the goals of the attacker is to remain undetected during both phases. The portscan is a widespread technique for gathering information from targets, so detecting that a portscan is happening can provide early warning of an attack in fieri. Portscans are becoming more complex in order to evade Intrusion Detection Systems. The original one-host-to-many pattern is masked: attackers are migrating to inter-domain stealthy scans that involve few hosts on many domains, or to coordinated inter-domain stealthy scans that involve many sources distributing their probes over different domains. A professional attacker is well aware that local Intrusion Detection Systems are common in all medium to large organizations; it is naive to think that he would expose himself when he can avoid it. A set of weak evidences, too few to trigger a meaningful local alert, can lead to strong proof that a scan is going on when correlated with similar alerts at the global level.

The main questions are:

• Is it possible to use collaboration to detect complex portscans?

• Is collaboration mandatory to discover the complex version of such attacks?

Some detections are simply impossible without a collaborative approach. Consider, for example, an attacker who owns a number of bots, used for portscanning, larger than the size of the attacked networks: since no single network can see all the zombies, the vision of each network is, in the best case, partial. Moreover, collaboration is useful even in other fields not strictly related to cyber-attacks.

The monitoring of crime events is one example: in this case events are spread over a huge geographical territory and involve many different actors, and these actors send the events to two different datasets according to their nature. Is it possible to correlate these events in order to obtain useful aggregate information? Is it possible to aggregate the events in a fast way, using continuous queries?

1.3 Thesis Contribution

This Thesis proposes two algorithms for single-source portscan detection, tested in a collaborative environment. The prototype is built on a CEP engine that allows fast correlation of data even in the presence of high-rate streams. The tests performed on this prototype show that the algorithms effectively address the detection problem, achieving a high detection rate with respectable latency and an extremely low false-positive rate; these preliminary results are also presented in [43, 44]. The coordinated portscan attack is examined and modelled. Two algorithms for the detection of this attack are developed and studied. Moreover, a collaborative hierarchical system that uses these two algorithms is presented and tested. The prototype, built in Java, detects this attack with good accuracy in complex situations where a real NIDS (Snort) does not. The last contribution is the use of the collaborative paradigm across different datasets. These datasets belong to two different applications of UCAMP [48]; the data are correlated using continuous queries in order to obtain a "big picture" that is not visible by studying the datasets separately. Part of this work has been submitted to [32].

1.4 Thesis Outline

The contents of the Thesis are organized as follows:

Collaborative Processing System (2) This chapter presents a brief overview of collaborative processing systems and of complex event processing.

Portscan Detection (3) The chapter dedicated to portscan detection starts with an overview of the importance of portscan detection, followed by a taxonomy of portscan techniques (3.1.1). The current state of the art is then explored, focusing on the most employed algorithms (3.2). Afterwards, the two developed algorithms are explained: R-SYN (3.3) and LineFitting (3.4). At the end of the chapter (3.6), implementation details, testbed configuration and test results are presented.

Coordinated Portscan Detection (4) The chapter starts with an introduction and an overview of related work (4.1). The coordinated portscan is analysed, and techniques for its detection are presented and tested. In Section (4.2) a model of portscan activities is presented that includes normal and coordinated portscans; specifically, in Subsection (4.2.5) the goals of the work are defined, proposing a definition of detection for such coordinated activities. In Section (4.3) the main idea behind the proposed algorithm is discussed. In (4.4) the impact of time on the detection is examined, analyzing how it can be turned into an advantage. A second detection algorithm is presented in (4.5), and in (4.6) a prototype that implements the ideas discussed in the previous sections is briefly described. Finally, the tests on the prototype are presented and discussed in (4.7), and (4.8) offers some final considerations.

GeoFraud Monitoring (4.9) This chapter explains how the collaborative paradigm can be adapted to work with different datasets instead of different organizations. The whole project is explained starting from the available data, through the correlations realized, and ending with the implementation details.


Chapter 2

Collaborative Processing System

Detecting event correlations and reacting to them is at the core of Complex Event Processing (CEP) and Stream Processing (SP), both of which play an important role in the IT technologies employed by the financial sector, as overviewed in [14]. The CEP solution employed in this work is Esper [29]; more about Esper and CEP can be found in the appropriate section (2.2.1). The issue of using massive complex event processing among heterogeneous organizations for detecting network anomalies and failures has been suggested and evaluated in [35]. Recently, collaborative approaches addressing the specific problem of Intrusion Detection Systems (IDSs) have been proposed in a variety of works [5, 46, 68]. Differently from singleton IDSs, collaborative IDSs significantly improve the timeliness and efficiency of misuse detection by sharing information on attacks among distributed IDSs belonging to one or more organizations [69]. The main principle of these approaches is that local IDSs detect suspects by analyzing their own data; these suspects are then disseminated, possibly over peer-to-peer links. This approach has two main limitations: it relies on data that can be freely exchanged among the peers, and it does not fully exploit the information seen at every site. The former constraint can be very strong due to the data confidentiality requirements that must be met. Commercial examples of such systems are distributed Snort-based [4] or Bro-based [42] [8] IDSs. These systems propose the correlation of alerts produced by peripheral sensors; such alerts are generated using local data only. The information that is cut away by the peripheral computations could cause IDSs to miss crucial details necessary for gaining the global knowledge required to detect inter-domain malicious behaviors.

2.1 Comifin and Semantic Room

The Comifin [23] project aims to bring collaboration among different financial infrastructures in order to reach a common goal: a middleware for the protection of financial infrastructures from inter-domain malicious attacks such as portscans, man-in-the-middle and denial of service. The abstraction used for collaboration is called Semantic Room (SR): the SR enables the construction of collaborative and contractually regulated environments where aggregation and correlation of data provided by the organizations participating in the Semantic Room can be carried out, with the aim of monitoring large-scale infrastructures and providing early detection of attacks, frauds and threats. Each Semantic Room has a specific strategic objective to meet. This abstraction is useful because it enforces a way to collaborate in a contract-regulated environment. This is crucial when the computation involves data coming from many organizations that may be reluctant to participate if no contract rules are present. Those contract rules specify practical aspects, like the resources that each participant needs to share, as well as other aspects, like data privacy.

Figure 2.1: Semantic Room. The SR members (Organization_1 … Organization_K) inject data through Edge Pre-processors into the contract-regulated Data Dissemination and Complex Event Processing and Applications components; processed data are made available for external consumption over the Internet.

The SR abstraction supports the deployment of two principal components, termed Complex Event Processing and Applications, and Data Dissemination. These components can vary from SR to SR depending on the software technologies used to implement the SR processing and sharing logic. In addition, a set of management components is employed for SR management purposes (e.g., management of the membership, monitoring of the adherence to the SR contract).

A Semantic Room is defined by the following three elements:

• objective: each SR has a specific strategic objective to meet. For instance, there can exist SRs created for large-scale stealthy scan detection, or SRs created for detecting web-based attacks such as SQL injection or cross-site scripting;

• contract: each SR is regulated by a contract that defines the set of processing and data sharing services provided by the SR, along with the data protection, privacy, isolation, trust, security, dependability, and performance requirements. The contract also contains the hardware and software requirements a member has to provision in order to be admitted into the SR.

• deployments: the SR abstraction is highly flexible in accommodating different technologies for the implementation of the SR logic or functionality. In particular, the SR abstraction can support different types of system approaches to the processing and sharing; namely, a centralized approach that employs a central server (e.g., Esper [29]), a decentralized approach where the processing load is spread over all the SR members (e.g., MapReduce-based processing [27]), or a hierarchical approach where pre-processing is carried out by the SR members and selected processed information is then passed to the next layer of the hierarchy for further computations.

SR members can inject raw data into the SR by means of SR Gateways deployed at the administrative boundaries of each SR member. Raw data may include real-time data, inputs from human beings, stored data (e.g., historical data), queries, and other types of dynamic and/or static content that are processed in order to produce complex processed data. Raw data are properly pre-processed by the SR Gateways in order to normalize them and satisfy privacy requirements as prescribed by the SR contract.
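As an illustration of this pre-processing step, the sketch below shows one way a Gateway could anonymize internal addresses before injection. All names are hypothetical, and the salted-hash policy is just one possible way to satisfy a contract's privacy clause, not one mandated by any specific SR:

```java
import java.security.MessageDigest;

/** Hypothetical SR Gateway pre-processing step: applies a privacy
 *  transformation to raw events before they enter the Semantic Room. */
public class GatewayPreprocessor {
    private final String orgSalt; // per-organization secret agreed in the SR contract

    public GatewayPreprocessor(String orgSalt) {
        this.orgSalt = orgSalt;
    }

    /** Replaces an internal IP with a salted hash: SR members cannot recover
     *  the address, but equal addresses still correlate across events. */
    public String anonymize(String internalIp) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest((orgSalt + internalIp).getBytes("UTF-8"));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return hex.toString();
        } catch (Exception e) {
            throw new IllegalStateException("hashing unavailable", e);
        }
    }
}
```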

Processed data can be used for internal consumption within the SR: in this case, derived events, models, profiles, blacklists, alerts and query results can be fed back into the SR so that the members can take advantage of the intelligence provided by the processing.

The processed data are made available to the SR members via their SR Gateways; SR members can then use these data to properly instruct their local security protection mechanisms, triggering informed and timely reactions independently of the SR management. In addition, a (possibly post-processed) subset of data can be offered for external consumption: SRs may be willing to make their processed data available for external use. In this case, in addition to the SR members, there can exist clients of the SR that cannot contribute raw data directly to the SR but simply consume the SR processed data.

SR members have full access both to the raw data that members agreed by contract to contribute and to the data processed and output by the SR. Data processing and results dissemination are carried out by SR members based on obligations and restrictions specified in the above-mentioned contract.


2.2 Focus of this work

Taking the Semantic Room (2.1) as a reference, this work examines the Complex Event Processing and Applications part; the contract regulation and the data dissemination modules are not taken into account. Since three different problems are presented and studied, each of them requires a separate analysis:

Single source detection (3) The main difference between the proposed processing system and those presented in the literature is that the proposed one does not discard any local information that could improve the detection. This is done because the purpose of the system is to detect inter-domain attacks that cannot be detected by local detection alone.

Coordinated scan detection (4) The system uses local processing whose results are correlated globally in order to isolate attack communities. To the best of my knowledge, the literature presents no collaborative systems that detect coordinated portscans under the assumptions I have made.

Fraud Monitoring (4.9) In this case the collaboration is among different datasets. In the literature I have not found a similar system that uses an event-driven architecture to monitor fraud and counterfeit money.

2.2.1 CEP and Esper

The developed applications need some timeliness guarantees for the aggregation: packets coming from the network must be analysed and aggregated at a reasonable speed in order to detect attack patterns in a timely manner. This is necessary because early detection is the first stage of a counteraction. A simple yet effective counteraction mechanism is the on-the-fly installation of a firewall drop rule for all packets coming from the suspect host; the article [22] clearly shows that the detection-reaction paradigm is an effective defence. In order to gain such timeliness we need a flexible way to express algorithms on network data. The usual paradigm of building an RDBMS and running queries on it inevitably adds latency, because a duty cycle of interrogations must be enforced; for example, the threshold crossing for some type of packets must be checked every 10 s.

CEP overcomes this by installing continuous queries, written in an SQL-like language, that are evaluated each time an interesting event comes across. Here an event is defined following Etzion: "An event is an occurrence within a particular system or domain; it is something that has happened, or is contemplated as having happened in that domain" [30]. Using a CEP system it is possible to perform real-time aggregation of the events of interest, which suits well the need for real-time pattern detection and monitoring. For this reason CEP and Stream Processing (SP) systems play an important role in IT technologies. The solution employed in this work is Esper, an open-source CEP engine that processes events and discovers complex patterns among multiple streams of event data. The extensive documentation, the free availability of the software and the straightforward integration with Java have led to this choice; moreover, EsperTech claims in some benchmarks that Esper is capable of handling 500,000 evt/s, reaching an input information flow of 70 Mbit/s. This performance makes Esper compliant with the scenarios investigated in this Thesis. The interested reader can find more details about Esper and its use in the misuse detection scenario or in fraud monitoring in Sections (3.5, 4.6.3, 4.11.3). Thanks to the flexibility of Esper, the developed platform can be easily adapted to many uses.
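To make the contrast with the polling paradigm concrete, the following is a minimal sketch of how a continuous query is registered with Esper, using the classic EPServiceProvider API of the Esper 3.x/4.x line. PacketEvent is a hypothetical POJO, and the threshold of 15 distinct destinations in 10 seconds merely echoes the Network Security Monitor rule cited in Section 3.2:

```java
import com.espertech.esper.client.*;

public class ContinuousQueryExample {
    // Hypothetical POJO carrying the header fields extracted by a Gateway.
    public static class PacketEvent {
        private final String srcIp, dstIp;
        public PacketEvent(String srcIp, String dstIp) { this.srcIp = srcIp; this.dstIp = dstIp; }
        public String getSrcIp() { return srcIp; }
        public String getDstIp() { return dstIp; }
    }

    public static void main(String[] args) {
        Configuration config = new Configuration();
        config.addEventType("PacketEvent", PacketEvent.class);
        EPServiceProvider engine = EPServiceProviderManager.getDefaultProvider(config);

        // Continuous query: re-evaluated on every incoming event, no polling cycle.
        // Flags any source contacting more than 15 distinct destinations
        // within a sliding 10-second window.
        EPStatement stmt = engine.getEPAdministrator().createEPL(
            "select srcIp, count(distinct dstIp) as targets " +
            "from PacketEvent.win:time(10 sec) " +
            "group by srcIp having count(distinct dstIp) > 15");
        stmt.addListener(new UpdateListener() {
            public void update(EventBean[] newEvents, EventBean[] oldEvents) {
                if (newEvents != null) {
                    System.out.println("suspect source: " + newEvents[0].get("srcIp"));
                }
            }
        });

        // Events are pushed into the engine as they are sniffed.
        engine.getEPRuntime().sendEvent(new PacketEvent("10.0.0.1", "10.0.0.2"));
    }
}
```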


Chapter 3

Portscan Detection

3.1 Overview

The portscan attack aims to find out the active services on one or more hosts. This activity seems, at first glance, quite inoffensive: how dangerous can the discovery of local services by outsiders be? Actually, it can be quite dangerous, since the discovery phase is often a prelude to an attack. Panjwani et al. [52] built a monitored environment to attract attackers (a honeynet) and found that 46% of scanning activity is followed by active attacks. In this perspective, the ability to detect such activities becomes important in order to apply early counteraction mechanisms: blacklisting, drop rules and others. Moreover, tools exist in the wild that allow attackers to chain the scanning activity with an automated attack step. This step takes advantage of the information gained in the first phase and starts trying all the possible vulnerabilities. An example is the autopwn tool released with Metasploit [56], a famous framework for penetration testing: autopwn takes as input the output of a portscan tool like NMap.

3.1.1 Portscan: A Taxonomy

Internet and Services

Each service running on a host needs an interface to communicate with other internet applications. This interface is usually a socket listening on some port, where a port is a logical address within the machine. For that reason, a service can be identified by the pair IP address and port. Referring to the IPv4/TCP stack, an IP address is 32 bits long and unequivocally identifies one machine in a network, while the port is a 16-bit field that unambiguously identifies a listening TCP service on a host. By convention, the port range is divided as follows:


1. Well Known (0-1023): used by processes that provide widely-used types of network services. The binding between the offered service and the port number is known, e.g. port 80 for the web service.

2. Registered (1024,49151): reserved port range for future binding or binding with services that are not widely-used.

3. Ephemeral (49152,65535): usable for any kind of application that needs to bind some port for listening or establish an outgoing connection.

Most of the time, if we know on which port a program is listening, it is possible to know what kind of service it is offering. This is true for all the ports in the well-known range.

This knowledge is the basis of every portscan: the trick is to send probes to a port range in order to find out whether some port is in the open state. A port is open if a service is bound to it, and closed otherwise. The behaviour of an open or closed port follows the RFC [55] that specifies the TCP protocol. The act of establishing a connection in TCP is called the three-way handshake. A host b that wants to establish a connection with another host c on port p1 sends a TCP packet with the SYN flag and sequence number x. If the port is open, c answers with a SYN/ACK packet with acknowledgement number x + 1 and sequence number y; otherwise it replies with a RST/ACK packet. The three-way handshake is exploited by the portscanner to find out the state of ports on remote hosts.

Figure 3.1: Three-way handshake

Vertical, Horizontal, Block, Open and Half-Open

We have seen that the main purpose of a scan action is to discover the active services on one or more remote hosts. The set of scanned ⟨ip, port⟩ pairs is called the footprint; in [63] a classification of portscans into Vertical, Horizontal and Block was given.

Vertical Portscan The attacker may be interested in knowing all the services active on one host c. In that case the attacker sends his probes to a set of destination addresses $D \subseteq IP \times Port$ with $D \subseteq \{\langle c, p_j \rangle : p_j \in Port\}$; in this way he enumerates all the possible services present on c.

Horizontal Portscan Another possibility is that the attacker wants to know whether the targeted hosts have a specific service s0 active, perhaps because a 0-day exploit targeting s0 is available: $D \subseteq \{\langle ip_j, p(s_0) \rangle : ip_j \in IP\}$. This kind of scan is called horizontal.

Block Portscan A block portscan, also called strobe scan, is a mixture of these two types: the attacker is interested in the status of some services on a set of targets.
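Stated operationally, the footprint alone tells the three types apart: one destination IP and many ports is vertical, many IPs on one port is horizontal, and a spread over both dimensions is a block (strobe) scan. A minimal sketch follows (hypothetical types; a real detector would of course work on thresholds rather than exact counts):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Classifies a scan footprint D ⊆ IP × Port by the spread of its two dimensions. */
public class FootprintClassifier {
    public enum Kind { VERTICAL, HORIZONTAL, BLOCK, SINGLE }

    /** Each probe is a pair {destination ip, destination port}. */
    public static Kind classify(List<String[]> probes) {
        Set<String> ips = new HashSet<String>();
        Set<String> ports = new HashSet<String>();
        for (String[] p : probes) {
            ips.add(p[0]);
            ports.add(p[1]);
        }
        if (ips.size() == 1 && ports.size() > 1) return Kind.VERTICAL;   // one host, many ports
        if (ports.size() == 1 && ips.size() > 1) return Kind.HORIZONTAL; // one service, many hosts
        if (ips.size() > 1 && ports.size() > 1)  return Kind.BLOCK;      // strobe: both spread
        return Kind.SINGLE;                                              // a single probe
    }
}
```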

Citing Staniford et al., "The most common type of portscan footprint at present is a horizontal scan". This may be justified by the fact that an attacker may be content with finding any vulnerable target, without paying attention to one specific target. On the other hand, the vertical portscan is carried out when the attacker is interested in one specific target and tries to discover all its possible vulnerabilities. This classification is related to the footprint of the scanning and, in some way, the footprint is more related to the motivation behind the scanning than to a methodology for performing the scanning activities. Another classification can be made over the methodology. The straightforward way to check whether some ports are open is to establish connections to those ports: checking the state by opening and then closing the connection leads to the "Open" portscan. The advantage of this method is that by opening a full connection we can gain some information from the service listening behind the port. For example, port 22 is normally bound to the ssh service, but many kinds of servers can offer this service, and the initial response to the connection can be a factor in discriminating the version, because many service daemons state their identity in their first messages. For example, querying my router by opening a connection on port 22 results in the acknowledgement message "SSH-2.07-dropbear 0.46". Knowing the name and version of the listening daemon is important for an attacker because it allows him to search a vulnerability database for some specific vulnerability targeting the running server. The drawback of the open portscan is that the connection may end up in the log file of the application server. To avoid this trace, one possibility is to check the state of the port without opening a valid connection for the upper layer. This can be done by exploiting the structure of the three-way handshake: the trick is to send a SYN packet towards the port of interest p1 and then wait for the response. If the response is a RST packet then the port is closed; a SYN-ACK packet means that the port is open; a missing response may mean that the port is filtered by a firewall. If a SYN-ACK packet is received, the scanner may send a RST packet or stop answering; in both cases the connection is interrupted before being opened, and for that reason this scan is called Half-Open. The advantage of the half-open connection is that it reveals the state of the port without establishing any contact with the upper-layer service, because the kernel will not notify the attempt to the application waiting on accept(). Another kind of portscan is the XMAS scan, available using NMap [47]: this technique sends packets with an impossible combination of flags, like SYN, FIN, URG, hoping to deceive a badly implemented filter mechanism. However, this kind of packet can easily be detected by an IDS with a simple check on bad flag combinations. For this reason we will focus on the detection of Open and Half-Open portscans.

Figure 3.2: Half-Open portscan

3.2 State of the art

The systems responsible for detecting attacks are called Intrusion Detection Systems (IDSs). They can be categorized in many ways. A first division is between host-based and network-based systems. Host-based IDSs check the local logs in order to find intrusion patterns; this kind of IDS is not suited to checking attacks that involve a whole network or different networks. Network-based IDSs check the network dump, in batch or online fashion, disclosing attack patterns in the network packets. One of the ideas presented in this thesis is to develop a collaborative system in which different organizations are involved, and the easiest (and maybe the only) way to do that is to share network logs. This is why this work is oriented towards a NIDS approach, and a survey of these systems has been done. Two real and widely used network-based IDSs are Snort ([10], [59]) and Bro [8]. Snort is an open-source Network Intrusion Prevention/Detection System that performs real-time traffic analysis and packet logging on IP networks to detect probes or attacks. Bro is an open-source network IDS that passively monitors network traffic and searches for suspicious activity; its analysis includes the detection of specific attacks, using both defined signatures and event patterns, and of unusual activities. These two NIDSs rely on portscan detection algorithms similar to that used by the Network Security Monitor [34], which flags any source IP address issuing more than 15 distinct connections in a given time window. The Snort module for the detection of portscans is called sfPortscan [26] and is based on a threshold algorithm: it checks whether a given source IP address contacted more than some number of ports or IP addresses within a time window of some seconds. Moreover, a check on the structure of the packet is done: if the structure is anomalous, a portscan is reported without waiting for any threshold crossing. For that reason, exploiting impossible flag combinations (3.1.1) in order to find possible buggy devices is not so clever. Bro [53] also uses thresholding to detect scans: a single source attempting to contact many destination IP addresses is considered a scanner if the number of destinations exceeds a threshold; the main difference with the Snort algorithm is the use of failed attempts as the main indicator for connections directed towards specific services. To reduce the number of false positives, Bro uses packet and payload information for application-level analysis. Another detection system that specifically addresses the issue of portscan detection is SPADE; SPADE uses complex Bayesian hypothesis testing and simulated annealing to perform the detection, and the nature of this algorithm is off-line.

3.2.1 Threshold Random Walk

Threshold Random Walk (TRW) is the algorithm presented by Jung et al. in "Fast Portscan Detection Using Sequential Hypothesis Testing" [41]. This algorithm tries to find attackers by observing the outcomes of their connection attempts. The main idea behind the algorithm is that a normal host should generate, during its normal activity, a large number of successful connections and a small number of failures. A scanner lacks knowledge of which hosts and ports on the target network are active, and for this reason it should issue a significant number of failed attempts. For each source r, TRW evaluates two possible hypotheses: that the source is a scanner (H1) or that the source is a normal user (H0). This evaluation is carried out by observing the indicator Yj, the outcome of the first connection attempt made by the host r towards j:

$$Y_j = \begin{cases} 1 & \text{if established} \\ 0 & \text{otherwise} \end{cases}$$

Now we have the probabilities:

$$P[Y_j = 0 \mid H_0] = \theta_0, \qquad P[Y_j = 1 \mid H_0] = 1 - \theta_0$$

$$P[Y_j = 0 \mid H_1] = \theta_1, \qquad P[Y_j = 1 \mid H_1] = 1 - \theta_1$$

Since $\theta_0 \neq \theta_1$, the two hypotheses can be discriminated; a further assumption is that the variables Y|H are independent and identically distributed. This allows the use of the Bernoulli distribution, and the probability that a set of observations $Y = \{Y_1, ..., Y_n\}$ is generated by H0 is defined as $P[Y \mid H_0] = \prod_{j=1}^{n} P[Y_j \mid H_0]$. The likelihood ratio can then be computed as:

$$\Lambda(Y) = \frac{P[Y \mid H_1]}{P[Y \mid H_0]}$$

Any new connection attempt coming from the host r is used to update the value of Λ(Y), which is compared with two thresholds η0 and η1: if Λ(Y) > η1 the host is flagged as a scanner, if Λ(Y) < η0 the host is considered a normal user, and otherwise the test keeps going. Intuitively, when the host is a scanner the mass function P[Y|H1] assumes a value larger than P[Y|H0], driving the likelihood ratio above η1.
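A minimal sketch of the TRW update loop follows. The parameter values are placeholders, not those of [41]; the ratio multiplied in at each step is the single-observation likelihood ratio P[Yj|H1]/P[Yj|H0] implied by the definitions above:

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of Threshold Random Walk: sequential hypothesis testing on the
 *  outcomes of first-contact connection attempts. Parameters are placeholders. */
public class TrwDetector {
    private final double theta0 = 0.2;  // P[failure | benign host]
    private final double theta1 = 0.8;  // P[failure | scanner]
    private final double eta0  = 0.01;  // below this: declare benign
    private final double eta1  = 100.0; // above this: declare scanner
    private final Map<String, Double> lambda = new HashMap<String, Double>();

    /** Feeds the outcome of a first connection attempt by source r;
     *  returns the current verdict for r. */
    public String observe(String r, boolean established) {
        double ratio = established
                ? (1 - theta1) / (1 - theta0) // success: evidence for H0 (benign)
                : theta1 / theta0;            // failure: evidence for H1 (scanner)
        Double current = lambda.get(r);
        double l = (current == null ? 1.0 : current) * ratio; // i.i.d.: likelihoods multiply
        lambda.put(r, l);
        if (l > eta1) return "SCANNER";
        if (l < eta0) return "BENIGN";
        return "PENDING";
    }
}
```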

3.2.2 Entropy-Based

The concept of entropy in information theory was introduced by Shannon in 1948 [60]. The entropy measures the disorder of a dataset: it is maximal when the disorder of the analyzed data is maximal, which means that all the symbols present are sampled from a uniform random distribution. If the structure of the dataset presents some kind of order, with symbols sampled from a biased distribution, the value of the entropy is lower; the entropy is zero if the dataset is composed of only one symbol. Given a set of data D, the entropy H(D) is computed as:

$$H(D) = \sum_{s_i \in S} P(s_i) \log\frac{1}{P(s_i)}$$

where S is the alphabet of the dataset and P(si) is the probability that the generating source emits si. If this probability is not known, it is possible to compute the entropy with the same definition using the frequency of si. The literature contains works that use the entropy of some fields of network packets to detect malicious anomalies. In [51] an empirical evaluation is presented of how certain kinds of attacks affect the entropy computed on fields of TCP/IP packets. The experiments show that attacks like DDoS and bandwidth floods influence the entropy calculated on some specific fields of the packets, like the destination port or the source port. The sudden increase of the entropy of these fields is due to the nature of these attacks, which are made to exhaust the computational/bandwidth resources of the attacked hosts; their nature destroys the physiological statistical properties of the whole network stream. More subtle attacks like network scans are not detectable because they do not alter the distribution statistics considered in that work. In [64] the entropy is applied to study the outbreak of two worms, Blaster [17] and Witty [61]. The worms, during their spread phase, search for vulnerable hosts using horizontal portscans. During the outbreak of the infection a consistent number of worms are trying to replicate themselves, so the effect of a single worm (a normal scan) is summed with the others, affecting the statistical properties examined; specifically, the entropy value of the destination IP field starts increasing as soon as the outbreak starts. In [67] the entropy is computed per source and used as a modification of the TPAS algorithm [62], an adaptation of TRW in which a failure is the condition that the ratio between the destination IP count and the destination port count for some host is above a threshold k in a time bin t.
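For illustration, the empirical-frequency version of H(D) is a few lines of code (a sketch: the multiset is represented as a map from symbol to count; a base-2 logarithm is used here, since the text leaves the base unspecified):

```java
import java.util.Map;

public class Entropy {
    /** Computes H(D) = sum over symbols of p(s) * log2(1/p(s)), using the
     *  observed frequencies of the symbols as probabilities. */
    public static double of(Map<String, Integer> counts) {
        long total = 0;
        for (int c : counts.values()) total += c;
        double h = 0.0;
        for (int c : counts.values()) {
            double p = c / (double) total;                // empirical frequency
            h += p * (Math.log(1.0 / p) / Math.log(2.0)); // contribution in bits
        }
        return h;
    }
}
```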

3.3 R-SYN

The R-SYN algorithm adapts and combines some of the previous techniques in order to discover scanning activities. The half-open pattern is malicious in itself, so a per-source count is kept: if for a source s this count $HOpen_s$ is above a threshold t, the variable $A_{open}^s$ is set to one, indicating that the source s is probably a half-open scanner. Another variable, $A_v^s$, flags a source that behaves like a vertical scanner. In order to detect this behaviour, the counts of failed and successful connections are kept grouped by destination host, and the expression

$$\Lambda_{A_v^s}(S) = \frac{P[F_h^s \cup S_h^s \mid H_1]}{P[F_h^s \cup S_h^s \mid H_0]} \geq v_t$$

is evaluated, where $F_h^s$ is the set of failures issued by s towards the host h and $S_h^s$ is the set of successful connections established by s with h. It is also possible that the scanner queries both unreachable and reachable hosts: this happens when a horizontal portscan is performed, querying a consistent set of non-existent or firewalled hosts. So, similarly to the previous case, for each source s the variable $A_h^s$ is evaluated, taking into account the set of connections issued by s, this time grouped by destination port (in fact, the common value in a horizontal portscan is the destination port):

$$\Lambda_{A_h^s}(S) = \frac{P[F_p^s \cup S_p^s \mid H_1]}{P[F_p^s \cup S_p^s \mid H_0]} \geq h_t$$

These three variables $A_h^s$, $A_v^s$ and $A_{open}^s$ can assume the two values 0 and 1. The set of failures $F_s$ is then examined, where a failure is a request to a closed port or to an unreachable host. Each $f_j \in F_s$ is reported with its multiplicity, so it is possible to compute the entropy $H(F_s)$; this entropy value is computed on the fields ⟨dest ip, dest port⟩ of each failed connection. The entropy is evaluated when one of the three variables is set to 1, and the total rank for the source s is computed:

$$Rank_s = (A_h^s + A_v^s + A_{open}^s)\,H(F_s)$$

If this value is above a threshold $r_t$, the source is signalled. This algorithm has several advantages with respect to the single algorithms it adapts and combines: it is capable of detecting vertical portscans, which plain TRW is not, and the entropy check takes into account the multiplicity of each failure, which TRW ignores, sharpening the detection capabilities.
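Once the three flags are computed upstream, the final R-SYN decision is a one-liner. The sketch below (reusing the Entropy helper sketched above; names hypothetical) shows the rank test, where failureCounts is the multiset F_s of failed ⟨dest ip, dest port⟩ pairs with their multiplicities:

```java
import java.util.Map;

public class RSynRank {
    /** Rank_s = (A_h + A_v + A_open) * H(F_s); the source is signalled as a
     *  scanner when the rank crosses the threshold r_t. */
    public static boolean isScanner(int aH, int aV, int aOpen,
                                    Map<String, Integer> failureCounts, double rt) {
        double rank = (aH + aV + aOpen) * Entropy.of(failureCounts);
        return rank > rt;
    }
}
```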

3.4 LineFitting

The idea behind LineFitting is to exploit the geometric structure of a portscan attack.

The standard behaviour of a portscanner is to make few requests for each element in S, where S is the set of target ports. The set F = A \ U is the set of inactive ports in S, and every request to an element of this set leads to a failure. It is not unrealistic to suppose that the set F is not only different from ∅ but that |F| ≥ |A| holds; this assumption is also made in [41]. So I will focus on the study of the set F, or more precisely on the set Fh, the multiset of failures generated by the host h. I use a multiset because the multiplicity of each failure is important: in case of a natural failure, due to a misconfiguration, a DNS crash or similar, we have an Fh with few elements of high multiplicity, while in case of a portscan we have an Fh with many elements of low multiplicity. The plot of the Fh of an ideal portscan is depicted in Figure 3.3.

The fitting curve of this kind of behaviour is a line y = q; for a non-ideal portscan the line should be y = bx + q with b near zero. The main idea behind this technique is to find patterns similar to the ideal one. Such patterns can be found by applying a linear fitting algorithm to the elements of Fh and then checking the similarity between the obtained fitting line and the ideal one. The algorithm used in our tests is the following. The check that x is an inlier is carried out using the mean m and the standard deviation d of the series m(Fh) (m(Fh) is the list of multiplicities of all elements in Fh): if m(x) is in the interval [m − kd, m + kd] it is considered an inlier and can be counted for the linear fitting in the next step. For the line fitting we use least squares. So, naming $y_i^h$ the element i of m(Fh), b and q are given by the following:

$$b = \frac{\sum_{i=1}^{n} i\,y_i^h - \frac{n(n+1)}{2}\,\bar{y}^h}{\sum_{i=1}^{n} i^2 - n\left(\frac{n+1}{2}\right)^2}$$

$$q = \frac{\bar{y}^h \sum_{i=1}^{n} i^2 - \frac{n+1}{2} \sum_{i=1}^{n} i\,y_i^h}{\sum_{i=1}^{n} i^2 - n\left(\frac{n+1}{2}\right)^2}$$

Figure 3.3: Ideal portscan and corresponding fitting line (multiplicity of failures vs. ⟨ip, port⟩ pairs).

The matching phase is a simple threshold check: if b and q are within a given range, the check succeeds.
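The whole LineFitting decision (least-squares fit over the failure multiplicities, then a threshold match against the ideal flat line) fits in a short sketch; the acceptance bounds bMax and qMax are free parameters, and the inlier filtering step described above is omitted for brevity:

```java
public class LineFitting {
    /** Fits y = b*x + q by least squares over the multiplicities y[0..n-1]
     *  (with x_i = i) and checks similarity with the ideal portscan line
     *  (b near zero, low q). */
    public static boolean looksLikePortscan(double[] y, double bMax, double qMax) {
        int n = y.length;
        if (n < 2) return false; // a line needs at least two points
        double sumY = 0, sumIY = 0, sumI2 = 0;
        for (int i = 1; i <= n; i++) {
            sumY  += y[i - 1];
            sumIY += i * y[i - 1];
            sumI2 += (double) i * i;
        }
        double meanY = sumY / n;
        double denom = sumI2 - n * Math.pow((n + 1) / 2.0, 2); // Σi² − n((n+1)/2)²
        double b = (sumIY - n * (n + 1) / 2.0 * meanY) / denom;
        double q = (meanY * sumI2 - (n + 1) / 2.0 * sumIY) / denom;
        return Math.abs(b) < bMax && q < qMax;
    }
}
```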

3.5 Prototype

The algorithms are implemented using Esper and Java. The architecture is centralized: the main correlation engine is Esper, and each participant contributes to the dataset through a Gateway installed in its domain.

Gateway The Gateway sniffs the network using low-level libraries (libpcap [45]). The Gateway is written in Java, so in order to use a library like libpcap a third-party bridge is needed; the library used is jnetpcap [38]. This software is capable of real-time sniffing from a network interface, or of sniffing from a dump trace. Only the headers of the packets are important, and specifically only the headers of the three-way handshake packets. The Gateway therefore filters out this information, wraps it in POJO objects and sends these POJOs to the central engine. All the POJOs are sent to the engine, since any of them could contribute to the detection.

Esper The Esper engine does all the correlation work. First, patterns are used in order to obtain, from the three-way handshake packets, the status of each connection, which can be of three types: to a closed port, to an open port, or to an unreachable host.

Figure 3.4: Architecture. Organizations 1…N run Gateways (sniffer, adapter, I/O socket) that extract TCP data and send POJOs over the Internet to the central Esper CEP engine, whose EPL queries and subscribers transform the input streams into output streams carrying the list of suspected scanner IPs.

A pattern is a common construct in CEP environments: it allows finding time correlations between streams of events. For example, an open connection can be identified with the following pattern:

$$SYN_{seq=x} \;\rightarrow\; SYN\text{-}ACK_{seq=y,\,ack=x+1} \;\rightarrow\; ACK_{seq=x+1,\,ack=y+1}$$

taking into account the source and destination address/port. In a similar way, patterns are used for the other cases. The CEP engine keeps trying to find a match for a pending pattern until it succeeds. This could lead very quickly to memory exhaustion; the issue is solved by setting a time-out for the pattern matching: if the pattern is not closed within some time t, the engine is told to drop the matching procedure. This stage is common to the two algorithms. Then things become different. For R-SYN there is a rank subsystem made of EPL queries; this rank subsystem collects, for each source, the three variables $A_h^s$, $A_v^s$, $A_{open}^s$ and is fed by the half-open subsystem, constituted by a pattern that discovers half-open connections and a per-source threshold mechanism.

For the other two variables, counting and grouping are used, connected with a subscriber, in order to evaluate when and whether the thresholds discussed in 3.3 are crossed. At the same time the system collects the failures generated by each source in a multiset that will be used when the entropy needs to be computed; this multiset is indexed by source IP. When the rank subsystem generates an event, the multiset is interrogated with an on-demand snapshot query and the entropy is computed; if the product of entropy and rank crosses the threshold, the host is signalled. For LineFitting only the multiset of failures needs to be kept: if a host reaches the threshold of more than five failures directed towards distinct destinations, the computation of the fitting line starts; if the line falls within the predetermined thresholds, the host is signalled.
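As an example of the pattern construct, a half-open sequence (SYN, matching SYN-ACK, then RST instead of the final ACK) could be expressed along the following lines. This is a sketch: the TcpPacket event type and its field names are hypothetical, and the timer:within guard implements the memory-bounding time-out discussed above.

```java
import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPStatement;

public class HalfOpenPattern {
    /** Registers an EPL pattern flagging half-open handshakes
     *  (hypothetical TcpPacket event type and field names). */
    public static EPStatement register(EPServiceProvider engine) {
        String epl =
            "select syn.srcIp as scanner, syn.dstIp as target, syn.dstPort as port " +
            "from pattern [ every syn=TcpPacket(isSyn=true, isAck=false) " +
            "  -> ( sa=TcpPacket(isSyn=true, isAck=true, ackNum=syn.seqNum+1) " +
            "       -> TcpPacket(isRst=true, srcIp=syn.srcIp, dstIp=syn.dstIp) " +
            "     ) where timer:within(30 sec) ]";
        // If the sequence does not complete within 30 s, the pending match is
        // dropped, which bounds the engine's memory as described above.
        return engine.getEPAdministrator().createEPL(epl);
    }
}
```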

3.6 Evaluation

The proposed algorithms were tested using a prototype and a set of real traces. The evaluated metrics are the detection accuracy, which measures how accurately the algorithms identify scanners, and the detection latency, which measures the reactivity of the algorithms during the detection.

3.6.1 Tests Scenario

Testbed For the evaluation, a testbed consisting of a cluster of 10 Linux Virtual Machines (VMs) was used, each equipped with 2 GB of RAM and 40 GB of disk space. The 10 VMs were hosted in a cluster of 4 quad-core 2.8 GHz dual-processor physical machines equipped with 24 GB of RAM; the physical machines are connected to a 10 Gbit LAN. The layout of the components on the cluster consisted of one VM dedicated to hosting the Esper CEP engine; each of the remaining 9 VMs represented the resources made available by 9 simulated organizations participating in the collaborative processing system, each hosting the Gateway component. In order to emulate a large-scale deployment environment, all the VMs were connected with each other through an open-source WAN emulator called WANem [12], which allows setting specific physical link bandwidths in the communications among the VMs, and the minimum and maximum delay of each packet. In this way the testbed is capable of emulating a WAN environment.

Traces Five intrusion traces were used. The first four were used to test the effectiveness of our algorithms in detecting malicious port scan activities, whereas the last one was used for computing the detection latency. All traces include real network traffic of a monitored network. The traces were obtained from the ITOC research web site [2], the LBNL/ICSI Enterprise Tracing Project [3] and the MIT DARPA Intrusion Detection project [1]. The content of the traces is described in Table 3.1. In each trace, the first TCP packet of a scanner always corresponded to the first TCP packet of a real port scan activity. Each trace contains port scan attacks found in the original dump as well as port scan attacks added to the trace.

                                       trace1   trace2   trace3    trace4    trace5
    size (MB)                               3        5       85       156       287
    number of source IPs                   10       15       36        39        23
    number of connections                1429      487     9749    413962   1126949
    number of scanners                      7        8        7        10         8
    number of packets                   18108   849816   394496   1128729   3462827
    3-way handshake packets              5060    13484   136086    883500   3393087
    length of the trace (sec)            5302      601    11760     81577       600
    3-way handshake packet rate (p/s)    0.95    22.44    11.57     10.83      5655

Table 3.1: Content of the traces

3.6.2 Test Results

Detection Accuracy In order to assess the accuracy of R-SYN and Line Fitting, the traces were partitioned simulating the presence of 9 organizations participating in the collaborative processing system; the resulting sub-traces were injected into the available Gateways of each participant in order to observe what the two algorithms were able to detect. To this end, I ran a number of tests considering four accuracy metrics (following the assessment described in [70]): (i) TP (True Positives), the number of suspicious hosts that are detected as scanners and are true scanners; (ii) FP (False Positives), representing detection errors, i.e. the number of honest source IP addresses considered as scanners; (iii) TN (True Negatives), the number of honest hosts that are not detected as scanners; (iv) FN (False Negatives), the number of hosts that are real scanners but that the system does not detect. With these values, the Detection Rate DR and the False Positive Rate FPR were computed as follows:

$$DR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}$$

The false positives are restricted to only two traces. In trace 4 (156 MB), R-SYN exhibited an FPR of 3.4% against an FPR of 0% for Line Fitting; that is, R-SYN introduces 1 false positive scanner. The same happens in trace 2, where Line Fitting introduces 1 false positive scanner, bringing the FPR in that trace to 14%.
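As a cross-check of these figures against the definitions above: trace 4 contains 39 source IPs, of which 10 are scanners, leaving 29 honest sources, so one falsely flagged host gives FPR = 1/29 ≈ 3.4%; likewise trace 2 has 15 − 8 = 7 honest sources, and 1/7 ≈ 14%.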

The detection rate comparison is shown in Figure 3.5. The tests show that collaboration can be beneficial for sharpening the detection of port scanners: for both algorithms, augmenting the number of participants in the collaborative processing system (i.e., augmenting the volume of data to be correlated) leads to an increase of the detection rate as computed above. However, the behavior of the two algorithms is different: most of the time Line Fitting (light grey bars in Figure 3.5) converges more quickly to the highest detection rate compared to R-SYN (black bars in Figure 3.5); that is, with Line Fitting a smaller number of participants in the collaborative processing system, and thus a lower volume of data, is required in order to achieve a 100% detection rate.

This is principally due to the higher number of processing steps R-SYN executes and to R-SYN's subscribers, which have to accumulate packets in order to carry out their TRW computation. In addition, R-SYN examines both good and malicious behaviors, assigning a positive score to good ones. This implies that in some traces R-SYN has to wait for more packets in order to effectively mark IP addresses as scanners.

[Four panels: Line Fitting vs R-SYN detection rate (0-100%) for Traces 1-4, plotted against the number of organizations (1-9).]

Figure 3.5: Port scan DR vs number of organizations in the collaborative processing system for R-SYN and Line Fitting algorithms. Each organization contributes to the processing with a number of network packets that is on average 1/9 of the size of the trace.

Detection Latency In the port scan attack scenario, the detection latency should be computed as the time elapsed between the moment the first TCP packet of the port scan activity is sent by a certain IP address and the moment the collaborative processing system marks that IP address as a scanner (i.e., when it includes the address in the blacklist). Note that it is not possible to know precisely which TCP packet should be considered the first of a port scan, since that depends on the true aims of whoever sends the packet. As already said, in the test traces the first TCP packet of a scanner corresponds to the first TCP packet of a real port scan activity, so that it is possible to compute the detection latency for a certain IP address x as the time elapsed between the sending of the first TCP packet by x and the detection of x as a scanner.

In doing so, the timestamps of the packets must be obtained. For this purpose a simple Java application named TimerDumping was developed. This application (i) takes a trace as input; (ii) sends the packets contained in the trace (according to the original packet rate) to the Gateway using a simple pipe; and (iii) maintains the timestamp of the first packet sent by each source IP address in the trace.

An instance of TimerDumping was deployed on each VM hosting the Gateway component. Each TimerDumping produces a list of pairs ⟨ip address, ts⟩, where ts is the timestamp of the first TCP packet sent by ip address. The timestamps are then used as beginning events for the detection latency computation. Since there are multiple TimerDumping instances, pairs with the same IP address but different timestamps may exist; in those cases the oldest timestamp is considered.
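A hypothetical reconstruction of TimerDumping's core bookkeeping, based only on the description above (packet parsing and rate control omitted):

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of TimerDumping's bookkeeping: remembers the timestamp of the
 *  first packet seen from each source IP while the trace is replayed
 *  towards the Gateway. */
public class FirstPacketTimer {
    private final Map<String, Long> firstSeen = new HashMap<String, Long>();

    /** Called for every packet piped to the Gateway. */
    public void onPacket(String srcIp) {
        if (!firstSeen.containsKey(srcIp)) {
            firstSeen.put(srcIp, System.currentTimeMillis());
        }
    }

    /** The <ip address, ts> pairs used as start events for latency computation. */
    public Map<String, Long> pairs() {
        return firstSeen;
    }
}
```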

Timestamps are generated using the local clocks of the hosts of the cluster. In order to ensure an acceptable degree of synchronization, all the clustered machines are configured to use the same NTP server, installed on a host located in the same LAN. The offset between local clocks is in the order of 10 milliseconds, which is accurate enough for the test, as the latency measures are in the order of seconds.

For the detection latency the test uses the 287 MB trace and varies the physical link bandwidth towards the Esper engine, in order to show in which settings one of the two algorithms is preferable. The link bandwidth was controlled through the WANem emulator, with values ranging from 1 Mbit/s up to 6.5 Mbit/s. Figure 3.6 shows the average detection latency in seconds obtained in different runs of the two algorithms.

As illustrated in this figure, for reasonable link bandwidths of a large-scale deployment scenario (between 3 Mbit/s and 6.5 Mbit/s) both algorithms show a similar behavior, with acceptable detection latencies for the inter-domain port scan application (latencies vary between 0.6 and 35 seconds). However, Line Fitting outperforms R-SYN in the presence of relatively low link bandwidths: looking at the left-hand side of the curves, Line Fitting exhibits a detection latency of approximately 150 seconds with 9 participants, against 250 seconds for R-SYN. In addition, in the case of R-SYN only, the results show that when the collaborative system is formed by a higher number of participants (e.g., 9), detection latencies are better than those obtained with smaller collaborative systems. This is principally caused by the larger amount of data available when the number of participants increases: more data allow the algorithm to detect the scanners more quickly. In contrast, when 3 or 6 participants are available, a longer wait is needed in order to achieve the final result of the computation. This behavior does not show up for Line Fitting, for which the increased amount of information is not sufficient to overcome the drawback of congestion on low link bandwidths (e.g., 1 Mbit/s).

[Two panels: R-SYN and Line Fitting detection latency (sec) vs link bandwidth (1-6.5 Mbit/s), with curves for 3, 6 and 9 participants.]

Figure 3.6: R-SYN and Line Fitting detection latencies in the presence of 3, 6, and 9 participants in the collaborative processing system.


Chapter 4

Coordinated Portscan Detection

4.1 Introduction and Related work

A special category of portscan attack is the coordinated, also called distributed, scan.

This kind of attack is done by a single adversary that use a set of replicas to coordinate the attack issuing only few requests from each of them. This is done in order to game the normal IDS. In the conventional literature this kind of attack was defined for the first time in [63], but in the real life this attack was define, two years before, on Phrack Magazine by hybrid in the 1999 [36]. The system used by Staniford et al. can be divided into two apparatus: the network anomaly detector (Spade) and the correlation engine (Spice). Spade computes the probability distribution of the usual network traffic and assigns an anomaly score to an incoming packet. Unusual source and destination port/IP combinations are flagged as anomalous and sent to the correlation engine. Spice, maintains a graph in which the nodes represent packets and the connections between nodes contain weights indicating the similarity between the two packets. The weights are based on a combination of metrics, like source-ip similarity , destination similarity, time similarity or interval similarity (packet seen at fixed interval). The packet are clustered using simulated annealing and the remaining subgraphs represent network events, such as port scans, distributed port scans, denial-of-service attacks, and server misconfigurations.

However, the system works in an offline manner and the malicious agglomerates need to be categorized by a human operator. Moreover, no tests have been presented regarding the detection of coordinated attacks, and the proposed heuristics have not been studied with respect to coordinated attacks.
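To illustrate the kind of metric combination Spice relies on, the following is a minimal sketch of a weighted packet-similarity function. The field names, the weights, the prefix-overlap heuristic, and the 60-second time window are assumptions of this example, not Spice's actual implementation.

import java.util.concurrent.TimeUnit;

final class PacketEvent {
    final long srcIp, dstIp;      // IPv4 addresses as unsigned 32-bit values
    final int dstPort;
    final long timestampMillis;

    PacketEvent(long srcIp, long dstIp, int dstPort, long timestampMillis) {
        this.srcIp = srcIp; this.dstIp = dstIp;
        this.dstPort = dstPort; this.timestampMillis = timestampMillis;
    }
}

final class PacketSimilarity {
    // Illustrative weights: how much each metric contributes to the edge weight.
    private static final double W_SRC = 0.4, W_DST = 0.3, W_PORT = 0.1, W_TIME = 0.2;

    /** Fraction of leading address bits two IPs share (1.0 = identical). */
    private static double ipSimilarity(long a, long b) {
        long diff = (a ^ b) & 0xFFFFFFFFL;
        int commonPrefix = diff == 0 ? 32 : Long.numberOfLeadingZeros(diff) - 32;
        return commonPrefix / 32.0;
    }

    /** Time proximity decays linearly to 0 over a 60-second window. */
    private static double timeSimilarity(long t1, long t2) {
        long window = TimeUnit.SECONDS.toMillis(60);
        long delta = Math.abs(t1 - t2);
        return delta >= window ? 0.0 : 1.0 - (double) delta / window;
    }

    /** Combined similarity in [0,1]; usable as an edge weight in a packet graph. */
    static double similarity(PacketEvent p, PacketEvent q) {
        return W_SRC  * ipSimilarity(p.srcIp, q.srcIp)
             + W_DST  * ipSimilarity(p.dstIp, q.dstIp)
             + W_PORT * (p.dstPort == q.dstPort ? 1.0 : 0.0)
             + W_TIME * timeSimilarity(p.timestampMillis, q.timestampMillis);
    }
}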

Robertson et al. [58] proposed a method for detecting coordinated portscans under the assumption that the attack sources lie on the same subnet; this is reasonable if one assumes that attackers more easily compromise groups of machines belonging to the same domain by exploiting common vulnerabilities. They define the group distance as the maximum distance two IP addresses may be from each other in order to be considered part of the same scan: probes from sources within the same group distance are accumulated together, and an alarm is raised when a threshold is crossed. The authors note that "The graphs show that relatively small group sizes very quickly include almost all probers [...] almost all probers get indiscriminately grouped. Therefore, only small group sizes can be effective", so there is a clear trade-off between the considered group sizes and the false positive rate. Moreover, the subnet assumption itself is a limitation whenever the attacker has compromised machines on different subnets.
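A minimal sketch of the group-distance bookkeeping follows, assuming IPv4 addresses represented as unsigned 32-bit integers; Robertson et al. do not publish code, so the class name and the values of the group distance and the threshold are illustrative assumptions.

import java.util.TreeMap;

final class GroupDistanceDetector {
    private static final long GROUP_DISTANCE = 8;   // illustrative value
    private static final int THRESHOLD = 20;        // illustrative value

    // Probe counts per source IP (IPv4 as unsigned 32-bit long).
    private final TreeMap<Long, Integer> probes = new TreeMap<>();

    void recordProbe(long srcIp) {
        probes.merge(srcIp, 1, Integer::sum);
    }

    /** True if the sources within GROUP_DISTANCE of srcIp collectively
     *  exceed the probe threshold. */
    boolean groupIsScanning(long srcIp) {
        int total = 0;
        // A TreeMap lets us walk only the IPs inside the distance window.
        for (int c : probes.subMap(srcIp - GROUP_DISTANCE, true,
                                   srcIp + GROUP_DISTANCE, true).values()) {
            total += c;
        }
        return total > THRESHOLD;
    }
}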

Conti and Abdullah in [24] proposed a method that transforms portscan activity, including coordinated scans, into visual patterns recognisable by a human operator. The authors use a technique called Parallel Coordinate Plot, which allows different pieces of information to be plotted on parallel axes. For example, an external-IP/internal-IP plot has two sides: on the left, the y-axis lists all the external source IPs; on the right, all the internal IPs. A horizontal portscan from a single source is thus drawn as a sheaf of lines originating from one point on the left side (one IP), with each line directed to a different point on the right side (the different targets). A vertical portscan, instead, can be plotted with the policy <external-ip, internal-ip, internal-port>: the plot is divided into two parts; in the first, a single line goes from the external source IP (the attacker) to the internal IP address (the target), while in the second a sheaf of lines centered on the target spreads over the different ports probed by the attacker (Figure 4.1). The authors claim that these patterns are easy to spot even in the presence of normal network activity, but they do not study how the patterns could be detected automatically, nor how they could be contaminated by persistent network noise or mutual interactions.
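A minimal sketch of the mapping behind such a plot, assuming normalized vertical axes and the <external-ip, internal-ip, internal-port> policy described above (the normalization scheme is an assumption of this example):

final class ParallelCoordinates {
    /** y position (0..1) of a 32-bit value on its axis. */
    static double normalize(long value) {
        return (value & 0xFFFFFFFFL) / (double) 0xFFFFFFFFL;
    }

    /** Polyline for one connection across the three axes
     *  <external-ip, internal-ip, internal-port>: returns {y0, y1, y2}. */
    static double[] polyline(long externalIp, long internalIp, int port) {
        return new double[] {
            normalize(externalIp),
            normalize(internalIp),
            port / 65535.0
        };
    }
    // A horizontal scan (one external IP, many internal IPs) yields polylines
    // sharing y0 and fanning out at y1; a vertical scan shares y0 and y1 and
    // fans out at y2: these are the "sheaf" patterns of Figure 4.1.
}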

Gates in [31] defines a coordinated portscan as "multiple sets (scans) fit together in such a manner that some large portion of the entire space is covered while minimizing the amount of overlap between each of the sets." She points out a similarity between this problem and set cover, and on top of this similarity she develops an Altgreedy algorithm for the detection of coordinated portscans. The main limitation of this work is that it can correlate only sources that have already been recognised as scanners; this limitation shows up when the attacker sizes its set of replicas in such a way that a single-source detection algorithm is not capable of flagging even one of the individual sources as malicious.

In this chapter a methodology similar to that adopted by Staniford is used: sources are correlated through clustering over an error graph in order to uncover malicious scan activities built in such a way as to go undetected by single-source portscan detection algorithms. Despite this similarity, there are several differences.

Staniford's work does not specifically target the discovery of coordinated portscans, nor does it use the same clustering techniques or rules. Moreover, Staniford performs his analysis on packets, whereas the approach proposed in this chapter works on failed connections and sources. Finally, the effectiveness of Staniford's approach has not been tested against coordinated portscans.
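To make the idea concrete, the following is a minimal sketch of clustering over an error graph: sources whose failed connections target overlapping sets of <ip:port> destinations are joined by an edge, and each connected component is reported as a candidate coordinated scan. The use of Jaccard similarity and the choice of an edge threshold are assumptions of this example; this is an indication of the general shape of the technique, not the algorithm developed later in this chapter.

import java.util.*;

final class ErrorGraphClustering {
    // Failed-connection targets ("ip:port") observed per source.
    private final Map<String, Set<String>> failedTargets = new HashMap<>();

    void recordFailedConnection(String srcIp, String target) {
        failedTargets.computeIfAbsent(srcIp, k -> new HashSet<>()).add(target);
    }

    private static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        if (inter.isEmpty()) return 0.0;
        return (double) inter.size() / (a.size() + b.size() - inter.size());
    }

    /** Connected components of the error graph = candidate coordinated scans. */
    List<Set<String>> clusters(double edgeThreshold) {
        List<String> srcs = new ArrayList<>(failedTargets.keySet());
        Map<String, Set<String>> adj = new HashMap<>();
        for (int i = 0; i < srcs.size(); i++)
            for (int j = i + 1; j < srcs.size(); j++)
                if (jaccard(failedTargets.get(srcs.get(i)),
                            failedTargets.get(srcs.get(j))) >= edgeThreshold) {
                    adj.computeIfAbsent(srcs.get(i), k -> new HashSet<>()).add(srcs.get(j));
                    adj.computeIfAbsent(srcs.get(j), k -> new HashSet<>()).add(srcs.get(i));
                }
        List<Set<String>> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String s : srcs) {
            if (!seen.add(s)) continue;            // already in a component
            Set<String> comp = new HashSet<>();
            Deque<String> stack = new ArrayDeque<>(List.of(s));
            while (!stack.isEmpty()) {
                String u = stack.pop();
                comp.add(u);
                for (String v : adj.getOrDefault(u, Set.of()))
                    if (seen.add(v)) stack.push(v);
            }
            if (comp.size() > 1) out.add(comp);    // singletons are not coordinated
        }
        return out;
    }
}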

[Figure: left panel, a parallel coordinate plot from External-IP to Internal-IP showing the line sheaf of a horizontal portscan; right panel, a plot over External-IP, Internal-IP and Internal-Port showing a vertical portscan.]

Figure 4.1: Example visualization of two different attacks using the framework of Conti et al.

As this review of the literature shows, the coordinated portscan is an attack that is not well studied; as a matter of fact, the survey on coordinated attacks by Zhou et al. [68] does not report any trace of it. That survey analyses the case of large scale stealthy scans in which one source targets many organizations, which is the case studied in the previous chapter.

Collaboration This work uses a collaboration step in order to improve the detection of coordinated portscans; this is achieved through clustering and alert similarity, a technique used by Collaborative Intrusion Detection Systems. The idea is to compute a function that calculates the similarity between pairs of alerts; similar events are then evaluated with another function and clustered. Several other works in the literature follow this approach, as discussed below.
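As a concrete illustration, the following is a minimal sketch of such a pairwise alert-similarity function; the alert attributes, the weights, and the five-minute time window are assumptions of this example and are not taken from any of the cited systems.

final class Alert {
    final String srcIp, dstIp, attackClass;
    final long timeMillis;

    Alert(String srcIp, String dstIp, String attackClass, long timeMillis) {
        this.srcIp = srcIp; this.dstIp = dstIp;
        this.attackClass = attackClass; this.timeMillis = timeMillis;
    }
}

final class AlertSimilarity {
    /** Similarity in [0,1]; alerts above a threshold go into the same cluster. */
    static double similarity(Alert a, Alert b) {
        double s = 0.0;
        if (a.srcIp.equals(b.srcIp))             s += 0.35;  // same attacker
        if (a.dstIp.equals(b.dstIp))             s += 0.35;  // same target
        if (a.attackClass.equals(b.attackClass)) s += 0.15;  // same attack type
        long dt = Math.abs(a.timeMillis - b.timeMillis);
        s += 0.15 * Math.max(0.0, 1.0 - dt / 300_000.0);     // 5-minute decay
        return s;
    }
}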

Cuppens [25] clusters alerts using expert similarity rules (for example, source/target similarity). For each cluster a likelihood value is computed, using a function that involves the number of IDSs that contributed to detecting the cluster and the number of IDSs that did not generate any alert for it. If the cluster likelihood is above a threshold, the cluster is merged to obtain a general alarm. This idea is quite interesting and useful for coordinated attack detection, since the merged alert should indicate precisely the sets of attackers and targets. The framework proposed in that work is, however, quite general and does not address in depth the methodology for single attack detection, so it is not known whether the method is useful against coordinated attacks; moreover, the author says: "In this experiment, the result of the merging function are quite straightforward. This is because
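Returning to the cluster likelihood mentioned above, one plausible shape for such a function, an assumption made here for illustration and not the exact function from [25], is

L(C) = m / (m + n)

where m is the number of IDSs that generated alerts contributing to cluster C and n is the number of IDSs that observed the relevant traffic but raised no alert; the cluster C is promoted to a general alarm when L(C) exceeds a chosen threshold.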
