
A Collaborative Environment for Customizable Complex Event Processing in Financial Information Systems

[Regular Paper (it can also be considered as an Industry Paper)]

ABSTRACT

We present a framework to enable financial sector customers (such as banks, credit card companies, etc.) to build collaborative protection systems to guard against coordinated Internet-based attacks. The essential element of this framework is a new programming abstraction, called Semantic Room (SR), through which interested parties can process data and share information and computing resources in a trusted and controlled fashion. To this end, each SR is associated with a contract which, among other things, specifies its functionality (e.g., botnet and stealthy scan detection), QoS, and a set of rules governing its membership. We present the design of two event processing systems that we implemented to support the SR functionality in large distributed settings, and show how they can be used for detecting stealthy scans and man-in-the-middle attacks.

Keywords

Collaborative architectures for event processing; distributed event processing; financial infrastructures; federated event-based systems; domain-specific deployments of event-based systems; MapReduce

1. INTRODUCTION

The growing exposure to the Internet has made financial institutions increasingly vulnerable to a variety of security-related risks, such as increasingly sophisticated cyber attacks aiming at capturing high-value (or otherwise sensitive) information, or at disrupting service operation for various purposes. Such security attacks result in both short- and long-term economic losses, due to the lack of service availability and infrastructural resilience, and to the decreased level of trust of customers. This is why such attacks are categorized as an operational risk by the Basel Committee on Banking Supervision in the first pillar of the Basel II accord [3].

To date, these attacks have been faced in isolation by each single financial institution, using several tools that reinforce its defense perimeter (e.g., intrusion detection systems, firewalls, etc.). These tools detect possible attacks by exploiting the information available from the logs maintained at the financial institution, for example by carefully looking at whether some host performs suspicious activities within certain time windows. Nowadays, attacks are more sophisticated, making this kind of defense inadequate.

Specifically, attacks are distributed in space and time. Distributed in space means that attacks are coordinated on a large scale and originate from multiple geographically dispersed locations. They are also distributed in time, often consisting of a preparation phase spanning several days or weeks and involving multiple preparatory steps aimed at identifying vulnerabilities and attack vectors (such as accidentally open ports) [7, 8, 10]. Detecting these attacks requires a much larger view of what is happening on the Internet, which could be obtained by sharing and combining the information available at several financial institutions. This information must then be processed and correlated on-the-fly in order to detect threats and frauds. Even though this sharing can result in a great advantage for financial institutions, it should be carried out only on a clear contractual basis and in a trusted and secure environment capable of ensuring the privacy and strict confidentiality requirements of financial institutions. Therefore, there is an urgent need to establish contractually regulated collaborative environments that can be created in a structured and controlled manner, where participants can belong to different organizational, administrative and geographical domains.

In this paper we introduce a novel programming abstraction, called Semantic Room (SR), that enables the construction of such collaborative, contractually regulated environments.

These environments perform distributed event aggregation and correlation on the data provided by the organizations participating in the SR, with the aim of monitoring widely distributed infrastructures and providing early detection of attacks, frauds and threats. Each SR has a specific strategic objective to meet (e.g., botnet detection, stealthy scan detection, Man-In-The-Middle detection) and is associated with both a contract, which specifies the set of rights and obligations governing the SR membership, and different software technologies that are used to carry out the data processing and sharing within the SR. SRs can communicate with each other; that is, the output of an SR can be used as input of another SR, thus creating the conditions for the deployment of a modular environment.


The paper also describes a framework that can be used to support the SR programming abstraction; a number of design alternatives for the framework are discussed. One of its key design features is the ability to clearly separate the SR management from the complex event processing and sharing. The former is provided using basic SR software that guarantees the construction of the trusted and contractually regulated environment (i.e., the SR) necessary for the execution of a secure event processing computation. The latter is performed within the SR and can be developed in a custom way according to the SR objective, the available computational resources, and the different event processing technologies that can be deployed. In other words, the SR is agnostic with respect to the event processing technology (e.g., [17], [9]) and paradigm (i.e., centralized vs distributed event processing) being used. It is the responsibility of the entity that instantiates the SR to customize the processing so as to meet the SR objective and the contractual obligations.

In order to validate the aforementioned framework, we present two different instances of SRs; namely, an SR for stealthy scan detection that uses MapReduce technologies [23] for processing purposes, and an SR for Man-In-The-Middle attack detection that uses DHT-based processing.

The rest of this paper is structured as follows. Section 2 introduces related work. Section 3 describes the Semantic Room abstraction, highlighting the principal elements that are used for defining it. Section 4 presents the framework we have designed to support the construction, deployment and execution of SRs. Section 5 describes two different SRs that can be instantiated using the framework previously introduced, and finally Section 6 discusses the main conclusions of this work and future work.

2. RELATED WORK

Detecting event patterns, sometimes referred to as situations, and reacting to them are at the core of Complex Event Processing (CEP) and Stream Processing (SP), both of which play an important role in the IT technologies employed by the financial sector, as overviewed in [15]. In particular, IBM System S [17] has been used by market makers for processing high-volume market data and obtaining low-latency results, as reported in [35]. System S, like other CEP/SP systems (e.g., [16, 29]), is based on event detection across distributed event sources.

The use of massive complex event processing among heterogeneous organizations for detecting network anomalies and failures has been suggested and evaluated in [22]. The usefulness of collaboration and information sharing among telco operators for discovering specific network attacks has also been pointed out in [32, 34]. These works clearly highlight that the main limitation of the collaborative approach concerns confidentiality requirements. These requirements may be specified by the organizations that share data and can make the collaboration itself hardly feasible, as organizations are typically not willing to disclose any private and sensitive information. In our framework, the notion of SR can be effectively used to build a secure and trusted environment, which can be enriched with the degree of privacy needed by the participants, in which a massive and collaborative event processing computation can occur.

Recently, collaborative approaches addressing the specific problem of Intrusion Detection Systems (IDSs) have been proposed in a variety of works [4, 25, 36, 27]. In contrast to standalone IDSs, collaborative IDSs significantly improve the time and efficiency of misuse detection by sharing information on attacks among distributed IDSs belonging to one or more organizations [37]. The main principle of these approaches is that local IDSs detect suspects by analyzing their own data. These suspects are then disseminated, possibly using peer-to-peer links. This approach has two main limitations: it relies on data that can be freely exchanged among the peers, and it does not fully exploit the information seen at every site. The former constraint can be very strong, especially in the financial context we are considering, due to the data confidentiality requirements that have to be met. Our framework has the merit of addressing this specific issue, as it is designed to provide different levels of anonymization of the data the organizations inject into the SR. Moreover, our framework aims at processing the data injected into the SR so that suspects are identified by exploiting all the data made available by every participating organization. The main advantage of this approach is that the space and time window used to detect complex attacks can be significantly enlarged, thus improving detection accuracy.

3. THE SEMANTIC ROOM ABSTRACTION

A Semantic Room (SR) is a federation of financial institutions formed for the sake of information processing and sharing. The financial institutions participating in a specific SR are referred to as the members of the SR.

The SR abstraction is defined by the following three principal elements:

• contract: each SR is regulated by a contract that defines the set of processing and data sharing services provided by the SR, along with the data protection, privacy, isolation, trust, security, dependability, and performance requirements. The contract also contains the hardware and software requirements a member has to provision in order to be admitted into the SR. For the sake of brevity we do not include SR contract details in this paper; the interested reader can refer to [1] for further information (a simplified sketch of a contract's possible shape is given after this list);

• objective: each SR has a specific strategic objective to meet. For instance, there can exist SRs created for implementing large-scale stealthy scan detection, or SRs created for detecting Man-In-The-Middle attacks;

• deployments: each SR can be associated with different software technologies, thus enabling a variety of SR deployments. The SR abstraction is in fact highly flexible, accommodating the use of different technologies for the implementation of the processing and sharing within the SR (i.e., the implementation of the SR logic or functionality). In particular, the SR abstraction can support different types of system approaches to the processing and sharing; namely, a centralized approach that employs a central server (e.g., Esper [9]), a decentralized approach in which the processing load is spread over all the SR members (e.g., DHT-based processing), or a hierarchical approach in which pre-processing is carried out by the SR members and selected processed information is then passed to the next layer of the hierarchy for further computation.
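As an illustration of the contract element, the following minimal sketch shows one possible shape of an SR contract as a data structure. It is purely hypothetical: all field names are invented for this example, and the actual contract model of [1] is considerably richer.

import java.time.Duration;
import java.util.List;

/** A hypothetical, simplified representation of an SR contract.
 *  The real contract model (see [1]) covers many more aspects. */
public record SrContract(
    String objective,              // e.g., "stealthy-scan-detection"
    List<String> members,          // institutions admitted into the SR
    List<String> eventSchemas,     // event formats the gateways must emit
    Duration maxDetectionLatency,  // an example QoS requirement
    String anonymizationPolicy,    // privacy requirement on injected data
    int minDedicatedCores,         // hardware a member has to provision
    String processingTechnology) { // e.g., "MapReduce", "DHT", "Esper"
}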

Figure 1: The Semantic Room abstraction

Figure 1 depicts the SR abstraction. As shown in this figure, the SR abstraction supports the deployment of two components, termed Complex Event Processing and Applications, and Communication, which can vary from SR to SR depending on the software technologies employed to implement the SR processing and sharing logic, and a set of management components, which together form the marble rectangle in Figure 1 and are exploited for SR management purposes (e.g., management of the membership, monitoring of the SR operations). Section 4 describes all these components in detail.

SR members can inject raw data into the SR. Raw data may include real-time data, inputs from human beings, stored data (e.g., historical data), queries, and other types of dynamic and/or static content that are processed in order to produce complex processed data. Raw data are properly managed in order to satisfy privacy requirements that can be prescribed by the SR contract and that are crucial in order to effectively enable a collaborative environment among different, and potentially competing, financial institutions.

Processed data can be used for internal consumption within the SR: in this case, derived events, models, profiles, blacklists, alerts and query results can be fed back into the SR so that the members can take advantage of the intelligence provided by the processing (Figure 1). SR members can, for instance, use these data to properly instruct their local security protection mechanisms in order to trigger informed and timely reactions independently of the SR management. In addition, a (possibly post-processed) subset of the data can be offered for external consumption, as SRs can be willing to make their processed data available for external use. In this case, in addition to the SR members, there can exist clients of the SR that cannot contribute raw data directly to the SR but can simply consume the SR processed data (Figure 1). SR members have full access to both the raw data that members agreed by contract to contribute, and the data being processed and thus output by the SR. Data processing and result dissemination are carried out by SR members based on the obligations and restrictions specified in the above-mentioned contract.

4. THE FRAMEWORK

The framework that supports the Semantic Room abstraction over a pool of (locally and geographically) distributed computational, storage, and network resources is shown in Figure 2.

In the framework we can clearly identify two principal layers: the SR management layer, which is responsible for the management of the SR, and the Complex Event Processing and Applications layer, which realizes the SR processing and sharing logic. In addition, all the architectural components of both layers can utilize various commodity services for (i) exchanging control and monitoring information among them (such as load and availability), (ii) managing resource allocation to the complex event processing and applications, both off-line and at run time, through the use of controllers and schedulers, and (iii) storing processing state and data. In the following subsections we describe the layers of our framework in more detail.

4.1 Semantic Room Management

This layer is responsible for supporting the SR abstraction on top of the individual processing and data sharing applications provided by the Complex Event Processing and Applications layer. It embodies a number of components fulfilling various SR management functions. Such functions include the management of the entire SR lifecycle (i.e., creation of an SR, instantiation of an SR, disbanding of an SR, management of the SR membership), the registration and discovery of SRs and SR contracts, the configuration and planning of SRs, the management of the communications among different SRs, and the management of trust and reputation within SRs [2]. In addition, each SR member interfaces with an SR through a component of this layer termed the SR gateway. This component transforms raw data into events in the format specified by the SR contract. In general, this transformation is necessary as it depends on the specific SR objective to be met, and it comprises three distinct pre-processing steps; namely, filtering, aggregation, and anonymization. The latter consists of applying different anonymization techniques in case privacy and confidentiality requirements are prescribed by the SR contract.
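The following is a minimal sketch of two of the three gateway pre-processing steps, assuming an invented record format and a salted-hash anonymization scheme; it is illustrative only, as the actual gateway logic depends on the SR objective and contract.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Illustrative SR gateway pre-processing: filtering and anonymization.
 *  Aggregation (e.g., batching records per time window) is omitted. */
public class SrGateway {
  private final byte[] salt; // if shared by all SR members, the same raw value
                             // hashes to the same token, so events remain
                             // correlatable across institutions

  public SrGateway(byte[] salt) { this.salt = salt; }

  /** Filtering: keep only the records the SR contract asks for. */
  boolean keep(String rawRecord) {
    return rawRecord.contains("SYN"); // e.g., only connection attempts
  }

  /** Anonymization: replace an identifier with a salted one-way hash. */
  String anonymize(String identifier) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      md.update(salt);
      byte[] digest = md.digest(identifier.getBytes(StandardCharsets.UTF_8));
      return Base64.getEncoder().encodeToString(digest);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e); // SHA-256 is always available
    }
  }
}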

4.2 Commodity Services

A number of services can be used by both layers previously mentioned in order to provide functionalities such as communication, storage, resource and contract management, and monitoring. These services are transversal to the other layers and are described in isolation in the following, highlighting several state-of-the-art design alternatives that can be used for their implementation.

Storage Services. The Storage Service layer of the architecture consists of a collection of components providing various kinds of storage services to the other layers. This layer can embody services for long-term storage of large data sets, such as monitoring logs and historical data (HDFS [5] is an option we currently use), and for low-latency storage of limited amounts of real-time data (an option we use in this case is WebSphere XS [6]).


Figure 2: The framework for building collaborative protection environments

Communication. The Communication layer consists of a collection of components providing various kinds of communication services to the other layers. This layer can include large-scale group communication services [19], [18], reliable low-latency, high-throughput message streaming services that are useful for supporting real-time event streaming from external components into an SR, and publish/subscribe services [21].

Resource and contract management. The Resource and contract management layer is responsible for allocating physical resources (such as computational, storage, and communication resources) to the SRs so as to satisfy the business objectives (such as performance goals, data and resource sharing constraints, etc.) prescribed by the SR contracts. A capacity planning study might be completed in the preliminary phase of an SR startup. In such a way, the Resource Management layer "can be aware" of the maximum capacity that each SR has to provide in terms of computational power, throughput, memory and storage. To this end, this layer can include a scheduler and placement controller for the initial allocation of services to physical resources, and a runtime load balancer for possible dynamic re-allocations [30, 26].

As each SR can count on a set of (locally and geographically) distributed data and computational resources, we find it convenient to consider the following three alternatives for the SR deployment, all of which are supported by our framework:

• SR-owned platform: the computational resources of each SR are owned by its members, although one member is appointed as the SR administrator. The computational platform of an SR is fully dedicated to the complex event processing and applications of that SR.

• Third party-owned platform: the computational resources are owned by a third party. The computational resources could be shared among the complex event processing and applications of the SRs. The collocation and data flows can be subject to restrictions specified by the SR contract. This alternative corresponds to the so-called "SR as a service".

• Mixed platform: the computational resources of each SR are owned by the members of that SR. This platform runs the logic of its SR, but it can occasionally offer hosting services for running the logic of other SRs. This case is allowed only upon an explicit request from another SR whose complex event processing and applications exceed the capacity of its own computational platform, and it implies some business (and trust) relationship among the involved SRs.

Metrics Monitoring. The Metrics Monitoring layer is responsible for monitoring the architecture in order to assess whether the requirements specified by the SR contracts are effectively met. As the Metrics Monitoring is a transversal layer of the framework, it operates at both the SR Management and the Complex Event Processing and Applications layers. In particular, at the SR Management layer it is in charge of periodically collecting monitoring information related to the management of SRs (e.g., SR membership information) in order to detect whether that management violates the requirements included in SR contracts. The Metrics Monitoring keeps track of the dynamic behavior of the SRs and checks whether or not SRs and SR members themselves are honoring their respective contracts. In case the Monitoring detects that SR contracts are close to being violated, it interacts with the SR Management components in order to trigger proper reconfiguration activities.

At the Complex Event Processing and Applications layer (see below), the Metrics Monitoring is in charge of periodically evaluating whether or not the resource management required by this layer is effectively able to support the execution carried out within it. In addition, it is responsible for detecting whether or not the processing execution violates the requirements specified in the SR contracts.

The Metrics Monitoring uses "sensors", possibly located at the physical resource and container levels, in order to obtain the set of information required for enforcing the metrics of interest (in our current implementation we favored the use of the Nagios monitoring technology [28] for metrics monitoring purposes).


4.3 Semantic Room Complex Event Processing and Applications

This layer consists of applications implementing the data processing and sharing logic required to support the SR functionality. A typical application hosted in an SR will need to fuse and analyze large volumes of incoming raw data produced by numerous heterogeneous and possibly widely distributed sources, such as sensors, intrusion and anomaly detection systems, firewalls, monitoring systems, etc. The incoming data will be either analyzed in real time, possibly with the assistance of analytical models, or stored for subsequent off-line analysis and intelligence extraction. This suggests a characterization of the applications supported by an SR, whose runtime instances can be hosted within various runtime container components. In particular, the application containers, which can be either standalone or clustered, are the following:

• Event Processing Container: this container is responsible for supporting event-processing applications in a distributed environment. The applications manipulate and/or extract patterns from streams of event data arriving in real time from possibly widely distributed sources, and need to be able to support stringent guarantees in terms of response time and/or throughput.

• Analytics Container: this container is responsible for supporting the parallel processing and querying of massive data sets on a cluster of machines. It will be used for supporting the analytics and data warehousing applications hosted in the SR.

• Web Container: this container will provide basic web capabilities to support the runtime needs of the web applications hosted within an SR. These applications support the logic enabling the interaction between the client-side presentation-level artifacts (such as web browser based consoles, dashboards, widgets, rich web clients, etc.) and the processing applications.

Different implementations of the Complex Event Processing and Applications layer can be supported by our framework. In particular, it is possible to deploy an SR that uses a central server for the implementation of both the event processing and analytics containers. In this case, the financial institutions of the SR send their own data to the central server (an example of a centralized event correlation engine that can be used in this case is Esper [9]). The central engine performs the correlation and analysis of the data and sends the generated processed data back to the financial institutions, letting each financial institution adopt its own countermeasures in a timely fashion.
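As a sketch of this centralized option, the fragment below shows how a port-scan style rule could be expressed with Esper. It assumes the legacy Esper 5.x client API (newer Esper versions use a compiler-based API) and an invented LogEvent type; the threshold is arbitrary.

import com.espertech.esper.client.Configuration;
import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;

public class CentralCorrelator {
  /** Hypothetical event type sent by each institution to the central server. */
  public static class LogEvent {
    private final String sourceIp;
    private final int destPort;
    public LogEvent(String sourceIp, int destPort) {
      this.sourceIp = sourceIp; this.destPort = destPort;
    }
    public String getSourceIp() { return sourceIp; }
    public int getDestPort() { return destPort; }
  }

  public static void main(String[] args) {
    Configuration config = new Configuration();
    config.addEventType("LogEvent", LogEvent.class);
    EPServiceProvider engine = EPServiceProviderManager.getDefaultProvider(config);

    // Flag source IPs that touch more than 200 distinct ports in 10 minutes.
    EPStatement stmt = engine.getEPAdministrator().createEPL(
        "select sourceIp, count(distinct destPort) as ports "
      + "from LogEvent.win:time(10 minutes) "
      + "group by sourceIp having count(distinct destPort) > 200");
    stmt.addListener((newEvents, oldEvents) -> {
      if (newEvents != null)
        System.out.println("ALERT: suspicious source " + newEvents[0].get("sourceIp"));
    });

    engine.getEPRuntime().sendEvent(new LogEvent("203.0.113.7", 80));
  }
}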

However, although this solution is fully supported by our framework, it suffers from the inherent drawbacks of a centralized system. The central server may become a single point of failure or a security vulnerability: if the server crashes or is compromised by a security attack, the complex event processing computation it carries out can become unavailable or be jeopardized. In addition, the volume of events the central server can process per time unit is limited by the server's processing and bandwidth capacities, thus limiting system scalability. Therefore, in our current implementation of the architecture we favored the use of technologies that allow us to realize decentralized complex event processing. In particular, we used both MapReduce [23] and DHT-based technologies for implementing the specific SR logic, as described in the next section.

5. SEMANTIC ROOM INSTANCES

In order to show the flexibility of the proposed SR abstraction, we describe in this section two SRs that differ from one another in both the objective to fulfill and the software technologies deployed to implement the processing and sharing logic and thus meet that objective. Specifically, we introduce in the next subsections an SR for collaborative intrusion detection that deploys a MapReduce-based [23] technology, and an SR for collaborative Man-In-The-Middle detection that distributes the load among processing elements through a DHT. The two SRs are at different levels of maturity: the SR for intrusion detection has been successfully deployed and tested, whereas the one for collaborative Man-In-The-Middle detection is currently under implementation and its complete development and testing is foreseen by June 2010.

5.1 Semantic Room for Collaborative Intrusion Detection

In this section, we discuss a specific instantiation of an SR, which we refer to as ID-SR, whose objective is to prevent potential intrusion attempts by detecting stealthy port scanning activity. The subjects of the attack are the web servers handling the external web connectivity of the participating financial institutions. Those web servers typically run outside the corporate firewall (in a DMZ) and are therefore frequently targeted by attackers. The goal of the attack is to identify TCP ports that might have been left open at the attacked subjects. The attack is carried out by initiating a series of TCP connections to ranges of ports at each of the targeted DMZ servers. The ports that are detected as open can be used as intrusion vectors at a later time.

The attack detection is based on identifying patterns of an unusually high number of TCP SYN requests, possibly targeting an unusually high number of ports, and originating from the same external IP address. The statistics are collected and analyzed across the entire set of the ID-SR participants, thus improving the chances of identifying low-volume activities that would have gone undetected if the individual participants were exclusively relying on their local protection systems. In addition, to minimize the number of false positives, the real-time suspicions are periodically calibrated through a reputation system which maintains a site ranking based on the past history of malicious activities originating from those sites.

To support data analysis, we implemented a distributed event processing system, called Agilis, which is described in the next section. The analysis logic employed by the ID-SR is detailed in Section 5.1.2, and our initial experience with using the Agilis-based ID-SR prototype is presented in Section 5.1.3.


Figure 3: MapReduce-based Semantic Room

5.1.1 The Agilis System

Agilis (see Figure 3) consists of a distributed network of processing and storage elements hosted on a cluster of machines allocated from the ID-SR hardware pool (as prescribed by its contract). The processing is based on Hadoop's MapReduce framework [23]. The processing logic is specified in a high-level language, called Jaql [12], which compiles into a series of MapReduce jobs. To improve detection latency, the mappers and reducers communicate through buffers stored in a main-memory storage system, called IBM WebSphere eXtreme Scale (WXS) [6]. The individual components of the Agilis framework are illustrated in Figures 3 and 4, and described in detail below.

WebSphere eXtreme Scale (WXS).

WebSphere eXtreme Scale (WXS) is a distributed main memory-based storage system implemented in Java. It allows the user data to be organized into a collection of maps consisting of either relational records or key-value pairs. At runtime, the data are stored in Data Servers, or containers, hosted on a cluster of machines. The clients can query the stored data using either a simple get/set API or full-blown SQL queries. The queries can be executed either on the client, or within a container using an embedded SQL engine.

For scalability, a map's data can be broken into a fixed number of partitions, which are then evenly distributed among the WXS containers by the WXS runtime. In addition, for fault tolerance, each map partition can be replicated on a configured number of containers. The information about the operational containers, as well as the layout of hosted map partitions and their replicas, is maintained at runtime in the WXS Catalog service, which is typically replicated for high availability.
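The following fragment illustrates the idea of a fixed-partition layout with replicas. It is not the WXS API: the real placement is computed internally by the WXS runtime and published through the Catalog service, and the assignment policy shown here is invented for the example.

import java.util.List;

/** Illustrative partition layout: a key always maps to the same partition,
 *  and each partition has a primary container plus replicas on other hosts. */
public class PartitionLayout {
  private final int numPartitions;
  private final List<String> containers;

  public PartitionLayout(int numPartitions, List<String> containers) {
    this.numPartitions = numPartitions;
    this.containers = containers;
  }

  /** Deterministic key-to-partition mapping. */
  public int partitionOf(Object key) {
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }

  /** Primary container of a partition (simple modulo placement). */
  public String primaryOf(int partition) {
    return containers.get(partition % containers.size());
  }

  /** The i-th replica lives on one of the following containers. */
  public String replicaOf(int partition, int i) {
    return containers.get((partition + 1 + i) % containers.size());
  }
}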

Processing framework. The processing is carried out on the machines within the ID-SR cluster, and orchestrated through the optimized Hadoop scheduling framework. The latter consists of a centralized Job Tracker (JT), which coordinates the local execution of mappers and reducers on each of the ID-SR nodes through a collection of Task Trackers (TT) (one per machine).

Most of our scheduling optimizations were targeted at improving the locality of processing by scheduling the map tasks close to the WXS partitions holding their respective input splits. To match the input splits with the WXS partitions, we provided a new implementation of Hadoop's InputFormat interface, which was packaged with every Agilis MapReduce job submitted to JT. Subsequently, the getSplits method of this interface was used by JT to determine the split locations at runtime (obtained by interrogating the WXS Catalog service), and the createRecordReader method to create an instance of RecordReader to read the data from the corresponding WXS partition. To further improve locality, our implementation of RecordReader recognized the SQL select, project, and aggregate queries (by interacting with the Jaql interpreter), and delegated their execution to the SQL engine embedded into the WXS container.
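To make the mechanism concrete, here is a minimal sketch of such an InputFormat against the standard org.apache.hadoop.mapreduce API. The catalog lookup is stubbed out with a static layout, and the record reader is an empty shell; only the structure (one split per partition, with the partition's host as the split location) reflects what the text describes.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class PartitionInputFormat extends InputFormat<Text, Text> {

  /** One split per in-memory partition; getLocations() steers the
   *  scheduler toward the host that holds the partition. */
  public static class PartitionSplit extends InputSplit implements Writable {
    private int partitionId;
    private String host;

    public PartitionSplit() { }  // needed for reflective instantiation
    public PartitionSplit(int partitionId, String host) {
      this.partitionId = partitionId; this.host = host;
    }
    @Override public long getLength() { return 0; }  // size unknown up front
    @Override public String[] getLocations() { return new String[] { host }; }
    @Override public void write(DataOutput out) throws IOException {
      out.writeInt(partitionId); out.writeUTF(host);
    }
    @Override public void readFields(DataInput in) throws IOException {
      partitionId = in.readInt(); host = in.readUTF();
    }
  }

  @Override
  public List<InputSplit> getSplits(JobContext context) {
    // In Agilis this layout would be obtained by interrogating the WXS
    // Catalog service; here we fake four partitions on four hosts.
    List<InputSplit> splits = new ArrayList<>();
    for (int p = 0; p < 4; p++) {
      splits.add(new PartitionSplit(p, "node" + p + ".example.org"));
    }
    return splits;
  }

  @Override
  public RecordReader<Text, Text> createRecordReader(InputSplit split,
                                                     TaskAttemptContext ctx) {
    return new RecordReader<Text, Text>() {
      // A real reader would iterate over the WXS partition of the split,
      // pushing select/project/aggregate queries down to the embedded SQL
      // engine as described above; this stub returns no records.
      @Override public void initialize(InputSplit s, TaskAttemptContext c) { }
      @Override public boolean nextKeyValue() { return false; }
      @Override public Text getCurrentKey() { return null; }
      @Override public Text getCurrentValue() { return null; }
      @Override public float getProgress() { return 1.0f; }
      @Override public void close() { }
    };
  }
}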

In many cases, this approach resulted in a substantial reduction in the volumes of intermediate data reaching the reducers, thus improving latency and bandwidth utilization, and reducing processing costs. It also allowed us to further enforce the privacy of the input data submitted by the individual ID-SR members by scheduling the initial map processing on the machines residing within their administrative boundaries.

Long-term data storage. The Hadoop File System (HDFS) [5] is used to provide storage services for the massive amounts of data that should be preserved over time (such as, e.g., historical data keeping track of past attacks). The data stored in HDFS can be injected into Hadoop through the provided


HDFS InputFormat implementation, and combined with the WXS data using the Jaql I/O constructs. HDFS is managed and kept consistent by Hadoop's Chunk Manager (CM) and ZooKeeper (ZK) services.

Figure 4: The Components of the Agilis Runtime.

The Jaql language.

The processing logic is expressed in a high-level language, called Jaql. Jaql supports SQL-like query constructs that can be combined into flows. It can also interoperate with a large variety of data sources due to its use of the standardized JSON data model. As shown in Figure 4, in Agilis the locally compiled Jaql flows are first augmented with the input and output formats to interoperate with WXS, and then submitted to the modified Hadoop scheduler, which orchestrates their execution on the ID-SR machines as explained above.

5.1.2 The ID-SR Processing Steps

The processing steps followed by the ID-SR implementation are depicted in Figure 5(a). In the first step, the raw data capturing the current networking activity at each of the participating machines is collected using the tcpdump utility and forwarded to the local ID-SR gateway. Each gateway then normalizes the incoming raw data, producing a stream of LogEvent records of the form ⟨sourceIP, destinationIP, sourcePort, destinationPort, bytesSent, bytesReceived, returnStatus⟩. The LogEvent records are stored in a WXS partition hosted on a locally deployed WXS container.
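A minimal sketch of the normalization step follows. It assumes, purely for illustration, that the gateway receives pre-tokenized comma-separated fields in the order of the LogEvent record; parsing actual tcpdump output would of course depend on the tcpdump options used.

/** Hypothetical gateway-side normalization into LogEvent records. */
public class LogEventNormalizer {
  record LogEvent(String sourceIp, String destinationIp, int sourcePort,
                  int destinationPort, long bytesSent, long bytesReceived,
                  String returnStatus) { }

  /** Assumed input: seven comma-separated fields in record order. */
  static LogEvent normalize(String line) {
    String[] f = line.split(",");
    return new LogEvent(f[0].trim(), f[1].trim(),
        Integer.parseInt(f[2].trim()), Integer.parseInt(f[3].trim()),
        Long.parseLong(f[4].trim()), Long.parseLong(f[5].trim()),
        f[6].trim());
  }
}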

The incoming LogEvent records are then processed by a collection of MapReduce jobs handled by Agilis. The processing logic consists of the following steps. First, the input records are subjected to the Summarization flow (see Figure 5(b)), which consists of two processing steps surrounded by two I/O steps (for reading the input and writing the results). The outcome of the two processing steps is a collection of summary records of the form ⟨sourceIP, portsNum, reqNum⟩ representing, for each source IP address (sourceIP), the number of distinct ports (portsNum) accessed from sourceIP along with the total number of requests (reqNum) originating from sourceIP.

The summary records are then fed into the Blacklisting flow (see Figure 5(b)), which blacklists a source IP address if the number of requests and distinct ports accessed from that IP address exceed fixed limits. In addition, the summary records are also joined with the historical records of the form ⟨sourceIP, rank⟩, using sourceIP as a key, to adjust the long-term rank representing the IP address threat level.

The historical records are used to periodically calibrate the blacklist by excluding the IP addresses whose ranks fall below a fixed threshold.
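For readers unfamiliar with Jaql, the following is a plain-Java rendering of the Summarization and Blacklisting flows of Figure 5(b), written with in-memory collections instead of MapReduce jobs; the thresholds match the query fragments shown in the figure (reqNum > 10000 and portsNum > 200).

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ScanDetectionFlows {
  record LogEvent(String sourceIp, String destIp, int destPort, String returnStatus) { }
  record Summary(String sourceIp, int portsNum, long reqNum) { }

  /** Summarization: per source IP, count distinct (destIp, destPort)
   *  pairs and total SYN requests. */
  static List<Summary> summarize(List<LogEvent> events) {
    Map<String, List<LogEvent>> bySource = events.stream()
        .filter(e -> "SYN".equals(e.returnStatus()))         // returnStatus = 'SYN'
        .collect(Collectors.groupingBy(LogEvent::sourceIp)); // group by sourceIp
    return bySource.entrySet().stream()
        .map(e -> {
          Set<String> ports = e.getValue().stream()
              .map(ev -> ev.destIp() + ":" + ev.destPort())  // distinct targets
              .collect(Collectors.toSet());
          return new Summary(e.getKey(), ports.size(), e.getValue().size());
        })
        .toList();
  }

  /** Blacklisting: flag sources exceeding both fixed limits. */
  static List<String> blacklist(List<Summary> summaries) {
    return summaries.stream()
        .filter(s -> s.reqNum() > 10000 && s.portsNum() > 200)
        .map(Summary::sourceIp)
        .toList();
  }
}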

5.1.3 The ID-SR Prototype

In this section, we report on our initial experience with deploying and testing the ID-SR prototype on a small cluster of 8 Linux Virtual Machines (VMs), each equipped with 2GB of RAM and 20GB of disk space. A large-scale experimental study using PlanetLab [14] is currently in progress.

The layout of the Agilis components on the cluster was as follows: one of the VMs was dedicated to hosting all of the Agilis management components: the Hadoop Name Node, the Job Tracker, and the WXS Catalog Server. Each of the remaining 7 VMs represented a single ID-SR participant, and hosted a Data Node, a Task Tracker, WXS Data Servers (containers), and an external web server (which was the attack subject).

To assess the accuracy and timeliness of the port scan attack detection, we measured the Agilis performance in 3 experimental scenarios. All scenarios involved a single intruding host that generated a series of TCP/SYN requests targeting a fixed set of 300 unique ports on each of the 7 attacked servers. In each scenario, the requests were injected at a constant rate, which was set to 10, 20, and 30 requests/server/second for the first, second, and third scenario respectively. The ratio of attack to legitimate traffic per server was 1:5, resulting in a total (malicious and legitimate) incoming event traffic at each server of 60, 120, and 180 requests/second for the three scenarios. The blacklisting threshold was set to 20,000 requests and 1000 unique ports. The Jaql flow in Figure 5 was compiled into 3 MapReduce jobs whose total running time was 110 sec on average. The jobs were executed periodically, once every 4 minutes.

Figure 5: Data Flow and Jaql Query Fragments Used for Port Scan Detection in ID-SR. Panel (a) shows the data flow: tcpdump output is normalized by the SR gateways into LogEvent records, stored in WXS partitions, and fed through the Summarization, Blacklisting, Ranking, and Calibration flows, which maintain the Historical Safety Ranking and the Black List. Panel (b) lists the Jaql query fragments:

Summarization:

read($ogPojoInput("LogEvents", "dataObj.LogEvent", "returnStatus = 'SYN'"))
→ group by $ip_port = {$.sourceIp, $.destPort, $.destIp} as $rs
  into {$ip_port.sourceIp, numReq: count($rs)}
→ group by $ip = {$.sourceIp}
  into {ip: $ip.sourceIp, portsNum: count($), reqNum: sum($[*].numReq)}
→ write($ogPojoOutput("LogEventsSum", "dataObj.LogEventSum", "ip"));

Blacklisting:

read($ogPojoInput("LogEventsSum", "dataObj.LogEventSum",
  "reqNum > 10000 AND portsNum > 200"))
→ transform $.ip
→ write($ogPojoOutput("BlackList", "dataObj.HistoryData", "ip"));

In all of our experimental runs, Agilis was able to absorb the entire volume of the incoming event traffic without experiencing overload, and correctly blacklisted the single intruding host (and none of the others). The detection times (i.e., the time between the beginning of the scan and the intruder blacklisting) for the three scenarios were on average 700, 430, and 330 seconds respectively. This indicates that Agilis was able to take advantage of extra data points to improve the detection latency. Also, note that since the total number of unique ports accessed at each of the 7 attacked subjects in all three scenarios never exceeded 1000 (the detection threshold), the ID-SR participants would never have been able to correctly identify the intruding machine unless they cooperated and shared data through Agilis.

5.2 Semantic Room for Collaborative Man-In-The-Middle Detection

A Man-In-The-Middle (MitM) attack makes a legitimate user start a connection with a malicious server that mimics the legitimate server's behavior. Different techniques can be used for obtaining such behavior: DNS cache poisoning [24], compromise of an intermediate routing node [20], phishing [31], or exploiting vulnerabilities of authentication mechanisms [33]. In general, the malicious server stores the user credentials, relays them to the legitimate server, and forwards the response to the user on behalf of the legitimate server. The MitM node eavesdrops on all data flowing between the client and the server, thus accessing a great amount of sensitive information. MitM attacks at a single node can usually be detected by looking at anomalies in the access statistics of Web and Application servers, identifying an IP address that contacts "too many" services with credentials belonging to different users. The SR programming abstraction can then be effectively used: it enables an aggregation and correlation of data logs coming from different SR members that can significantly reduce the detection time of such attacks, as the same IP address could also attack services of other members of the SR.

The event processing carried out in such an SR can be summarized as follows. There exist Processing Elements (PEs) that are arranged in a DHT, as shown in Figure 6. Each PE implements the event processing container described in the previous section. The container receives events from different SR gateways. A single SR gateway pre-processes events produced by the raw sources of information that reside at a financial institution, e.g., logs of Web or Application Server technologies such as Tomcat or JBoss, deployed by an institution in order to provide its customers with financial services such as e-banking, e-payment and e-trading. The raw data injected into the SR gateway are filtered, aggregated and anonymized (if privacy requirements are mandated by the SR contract). The output of the SR gateway is an event which is injected into the SR. The format of the events is ⟨source_ip, session_id⟩, where source_ip is the address originating a connection to a financial institution server, and session_id is a concatenation of a unique identifier of the financial institution offering a financial service, the requested service URL, and the user id that authenticated to the service. This information is sufficient for detecting MitM attacks if there exists a suspicious number of connections originating from the same source_ip towards financial services with different user credentials, i.e., different session_ids.

Figure 6: The DHT-based Semantic Room and the Processing Element Architecture.

Each PE is constructed out of three main subsystems (Figure 6); namely, the event manager, the overlay manager, and the complex event processing engine. These subsystems perform the following functions:

• the Event Manager receives events from the SR gateways and submits them to the Overlay Manager (see below). Moreover, it receives the results of the complex event processing from a CEP Engine implemented by the PE, and disseminates the results (i.e., alarms) to the SR gateways of the financial institutions that are members of the SR.

• the Complex Event Processing (CEP) Engine processes events received from the Overlay Manager and submits the analysis results to the Event Manager. The processing required to identify MitM attacks is based on computing statistical anomalies on received events. These anomalies can be abstracted by the following high-level attack pattern: given a time window with m events, find at least n events with the same source_ip and different session_ids (see the sketch after this list). Currently, this subsystem is implemented without using the general-purpose CEP engines available on the market; however, we are evaluating its implementation with either JBoss Drools Fusion [13] or Esper [9].

• the Overlay Manager (OM) receives new events from the Event Manager and forwards them on the DHT, where the DHT key is obtained by hashing the source_ip address field of the event. The DHT we have used in our preliminary implementation is FreePastry [11], since its proximity-based routing improves performance by exploiting the locality of nodes.
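A minimal sketch of the per-PE detection logic, together with the hash-based routing of events to PEs, is given below. The window semantics (last m events), the SHA-1-based key derivation, and all names are illustrative assumptions; the actual PE implementation and FreePastry routing are richer.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class MitmDetector {
  record Event(String sourceIp, String sessionId) { }

  private final int m; // window size, in events
  private final int n; // distinct-session threshold
  private final Deque<Event> window = new ArrayDeque<>();

  public MitmDetector(int m, int n) { this.m = m; this.n = n; }

  /** Sketch of routing: the Overlay Manager sends an event to the PE that
   *  owns hash(source_ip), so all events of one source land on one PE. */
  static int responsiblePe(String sourceIp, int ringSize) {
    try {
      byte[] d = MessageDigest.getInstance("SHA-1")
          .digest(sourceIp.getBytes(StandardCharsets.UTF_8));
      return (((d[0] & 0xff) << 8) | (d[1] & 0xff)) % ringSize;
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Attack pattern: within the last m events, at least n events with the
   *  same source_ip and different session_ids. Returns true on a match. */
  public boolean onEvent(Event e) {
    window.addLast(e);
    if (window.size() > m) window.removeFirst();
    Set<String> sessions = new HashSet<>();
    for (Event w : window) {
      if (w.sourceIp().equals(e.sourceIp())) sessions.add(w.sessionId());
    }
    return sessions.size() >= n;
  }
}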

Once the addresses of malicious servers have been identified, they can be blacklisted and shared among SR members in order to block them, audit internal servers, and identify customers that may have suffered the attack, thus granting better protection and security to clients' transactions.

6. CONCLUDING REMARKS

In this paper we have described a framework that enables the construction of collaborative security environments whose aim is to defend financial institutions (e.g., banks, regulatory agencies) from coordinated Internet-based attacks. In doing so, the framework effectively supports a new programming abstraction named Semantic Room (SR). The SR allows financial institutions to process data and share information and computing resources in a controlled, trusted and secure manner. SRs are characterized by (i) an objective (i.e., the SR functionality), (ii) a contract that specifies the set of rules for governing the SR membership and QoS requirements (e.g., security, performance requirements), and (iii) different software technologies that implement the specific SR processing and sharing logic. Owing to this latter property, SRs are customizable: they can be instantiated in different ways, deploying various processing and sharing schemes in order to match the needs of financial institutions.

To this end, we have shown the design of two distributed event processing schemes deployed within two different SRs; namely, an SR that can be used for detecting stealthy scans and an SR that can be used for detecting MitM attacks. Specifically, the former relies on MapReduce technologies for parallelizing the complex event processing computation executed at geographically distributed financial sites, whereas the latter uses Distributed Hash Table technologies and general-purpose CEP engines for carrying out the processing. Both SRs enjoy large windows, in space and time, for detecting stealthy scans and MitM attacks by correlating raw data coming from SR members.

Future work includes deploying and testing the MitM SR and assessing the performance of both SRs. To this aim, we are working in several directions: developing an Esper-based SR in order to also compare performance against a centralized correlation engine, carrying out a quantitative experimental evaluation of both presented SRs using large-scale platforms such as PlanetLab [14], and building a testbed on the Internet formed by three sites, each equipped with a cluster of machines, on which to deploy the large-scale processing environment.

7. ACKNOWLEDGMENTS

We are indebted to Thomas Kohler (UBS), Finn Otto Hansen (SWIFT) and Guido Pagani (Bank of Italy), who greatly helped us better understand the strategies, constraints and needs of financial players.

8. REFERENCES

[1] This reference is obscured in order to meet the double blind requirement for the review process.

[2] This reference is obscured in order to meet the double blind requirement of the review process.

[3] Basel II Accord. http://www.bis.org/bcbs/bcbscp3.htm, 2009.

[4] DShield: Cooperative Network Security Community - Internet Security. http://www.dshield.org/indexd.html/, 2009.

[5] Hadoop-HDFS Architecture. http://hadoop.apache.org/common/docs/current/hdfs_design.html, 2009.

[6] IBM WebSphere eXtreme Scale. http://www-01.ibm.com/software/webservers/appserv/extremescale/, 2009.

[7] National Australia Bank hit by DDoS attack. http://www.zdnet.com.au/news/security/soa/National-Australia-Bank-hit-by-DDoS-attack/0,130061744,339271790,00.htm, 2009.

[8] Update: Credit card firm hit by DDoS attack. http://www.computerworld.com/securitytopics/security/story/0,10801,96099,00.html, 2009.

[9] Where Complex Event Processing meets Open Source: Esper and NEsper. http://esper.codehaus.org/, 2009.

[10] FBI investigates 9 Million ATM scam. http://www.myfoxny.com/dpp/news/090202_FBI_Investigates_9_Million_ATM_Scam, 2010.

[11] FreePastry. http://www.freepastry.org/FreePastry/, 2010.

[12] Jaql. http://www.jaql.org/, 2010.

[13] JBoss Drools Fusion. http://www.jboss.org/drools/drools-fusion.html, 2010.

[14] PlanetLab. http://www.planet-lab.org/, 2010.

[15] Asaf Adi and Opher Etzion. Amit - the situation manager. VLDB J., 13(2):177-203, 2004.

[16] Mert Akdere, Uğur Çetintemel, and Nesime Tatbul. Plan-based complex event detection across distributed sources. PVLDB, 1(1):66-77, 2008.

[17] Lisa Amini, Navendu Jain, Anshul Sehgal, Jeremy Silber, and Olivier Verscheure. Adaptive control of extreme-scale stream processing systems. In ICDCS '06: Proceedings of the 26th IEEE International Conference on Distributed Computing Systems, page 71, Washington, DC, USA, 2006. IEEE Computer Society.

[18] V. Bortnikov, G. V. Chockler, A. Roytman, and M. Spreitzer. Bulletin Board: A Scalable and Robust Eventually Consistent Shared Memory over a Peer-to-Peer Overlay. In ACM LADIS 2009, 2009.

[19] G. V. Chockler, I. Keidar, and R. Vitenberg. Group communication specifications: a comprehensive study. ACM Computing Surveys, 33(4):427-469, 2001.

[20] T. Espiner. Symantec warns of router compromise. http://www.zdnetasia.com/news/security/0,39044215,62036991,00.htm, 2010.

[21] P. T. Eugster, P. A. Felber, R. Guerraoui, and A. Kermarrec. The many faces of publish/subscribe. ACM Computing Surveys, 35(2):114-131, 2003.

[22] Y. Huang, N. Feamster, A. Lakhina, and J. J. Xu. Diagnosing network disruptions with network-wide analysis. In SIGMETRICS '07, San Diego, California, USA, 12-16 June 2007.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107-113, 2008.

[24] A. Klein. BIND 9 DNS cache poisoning. http://www.trusteer.com/files/BIND_9_DNS_Cache_Poisoning.pdf, 2010.

[25] M. E. Locasto, J. J. Parekh, A. D. Keromytis, and S. J. Stolfo. Towards collaborative security and p2p intrusion detection. In IEEE Workshop on Information Assurance and Security, United States Military Academy, West Point, NY, 15-17 June 2005.

[26] G. Lodi, F. Panzieri, D. Rossi, and E. Turrini. SLA-Driven Clustering of QoS-aware Application Servers. IEEE Transactions on Software Engineering, 33(3), 2007.

[27] N. Verma, F. Trousset, P. Poncelet, and F. Masseglia. Intrusion Detections in Collaborative Organizations by Preserving Privacy. In Advances in Knowledge Discovery and Management, December 2009.

[28] Nagios. http://www.nagios.org, 2010.

[29] P. R. Pietzuch. Hermes: A Scalable Event-Based Middleware. Ph.D. thesis, University of Cambridge.

[30] C. Tang, M. Steinder, M. Spreitzer, and G. Pacifici. A Scalable Application Placement Controller for Enterprise Data Centers. In 16th International Conference on World Wide Web, 2007.

[31] TriCipher. The perfect storm: Man in the middle attacks, weak authentication and organized online criminals. http://www.tricipher.com/landing_pages/spotlight_offer.html, 2010.

[32] Y. Xie, V. Sekar, M. K. Reiter, and H. Zhang. Forensic analysis for epidemic attacks in federated networks. In ICNP, pages 143-153, 2006.

[33] Y. Espelid, L. Netland, A. N. Klingsheim, and K. J. Hole. Robbing banks with their own software - an exploit against Norwegian online banks. In IFIP 23rd International Information Security Conference, September 2008.

[34] G. Zhang and M. Parashar. Cooperative detection and protection against network attacks using decentralized information sharing. Cluster Computing, 13(1):67-86, 2010.

[35] Xiaolan J. Zhang, Henrique Andrade, Buğra Gedik, Richard King, John Morar, Senthil Nathan, Yoonho Park, Raju Pavuluri, Edward Pring, Randall Schnier, Philippe Selo, Michael Spicer, Volkmar Uhlig, and Chitra Venkatramani. Implementing a high-volume, low-latency market data processing system on commodity hardware using IBM middleware. In WHPCF '09: Proceedings of the 2nd Workshop on High Performance Computational Finance, pages 1-8, New York, NY, USA, 2009. ACM.

[36] C. V. Zhou, S. Karunasekera, and C. Leckie. A peer-to-peer collaborative intrusion detection system. In 13th IEEE International Conference on Networks, Kuala Lumpur, Malaysia, November 2005.

[37] C. V. Zhou, C. Leckie, and S. Karunasekera. A survey of coordinated attacks and collaborative intrusion detection. Computers & Security, 29(1):124-140, 2010.
