
A Collaborative Environment for Customizable Complex Event Processing in Financial Information Systems

[Regular Paper (it can also be considered as an Industry Paper)]

ABSTRACT

We present a framework to enable financial sector customers (such as banks, credit card companies, etc.) to build collaborative protection systems to guard against coordinated Internet-based attacks. The essential element of this framework is a new programming abstraction, called Semantic Room (SR), through which interested parties can process data and share information and computing resources in a trusted and controlled fashion. To this end, each SR is associated with a contract which, among other things, specifies its functionality (e.g., botnet and stealthy scan detection), QoS, and a set of rules governing its membership. We present the design of two event processing systems that we implemented to support the SR functionality in large distributed settings, and show how they can be used for detecting stealthy scans and man-in-the-middle attacks.

Keywords

Collaborative architectures for event processing; distributed event processing; financial infrastructures; federated event-based systems; domain-specific deployments of event-based systems; MapReduce

1. INTRODUCTION

The growing exposure to the Internet has made financial institutions increasingly vulnerable to a variety of security-related risks, such as increasingly sophisticated cyber attacks aiming at capturing high-value (or otherwise sensitive) information, or at disrupting service operation for various purposes. Such security attacks result in both short- and long-term economic losses, due to the lack of service availability and infrastructural resilience, and to the decreased level of trust of customers. This is why such attacks are categorized as an operational risk by the Basel Committee on Banking Supervision in the first pillar of the Basel II accord [3].

To date, these attacks have been faced in isolation by each single financial institution, using several tools that reinforce its defense perimeter (e.g., intrusion detection systems, firewalls, etc.). These tools detect possible attacks by exploiting the information available from the logs maintained at the financial institution, for example by carefully looking at whether some host performs suspicious activities within certain time windows. Nowadays, attacks are more sophisticated, making this kind of defense inadequate.

Specifically, attacks are distributed in space and time. Distributed in space means that attacks are coordinated on a large scale and originate from multiple geographically dispersed locations. They are also distributed in time, often consisting of a preparation phase spanning several days or weeks and involving multiple preparatory steps aimed at identifying vulnerabilities and attack vectors (such as accidentally open ports) [7, 8, 10]. Detecting these attacks requires a much larger view of what is happening on the Internet, which could be obtained by sharing and combining the information available at several financial institutions. This information must then be processed and correlated on-the-fly in order to detect threats and frauds. Even though this sharing can result in a great advantage for financial institutions, it should be carried out only on a clear contractual basis and in a trusted and secure environment capable of ensuring the privacy and strict confidentiality requirements of financial institutions. Therefore, there is an urgent need to establish contractually regulated collaborative environments that can be created in a structured and controlled manner, where participants can belong to different organizational, administrative and geographical domains.

In this paper we introduce a novel programming abstraction, called Semantic Room (SR), that enables the construction of such collaborative, contractually regulated environments.

These environments perform distributed event aggregation and correlation on the data provided by the organizations participating in the SR, with the aim of monitoring widely distributed infrastructures and providing early detection of attacks, frauds and threats. Each SR has a specific strategic objective to meet (e.g., botnet detection, stealthy scan detection, Man-In-The-Middle detection) and is associated with both a contract, which specifies the set of rights and obligations governing the SR membership, and different software technologies that are used to carry out the data processing and sharing within the SR. SRs can communicate with each other; that is, the output of an SR can be used as input of another SR, thus creating the conditions for the deployment of a modular environment.


The paper also describes a framework that can be used to support the SR programming abstraction; a number of design alternatives for the framework are discussed. One of its key design features is the ability to clearly separate the SR management from the complex event processing and sharing. The former is provided using basic SR software that guarantees the construction of the trusted and contractually regulated environment (i.e., the SR) necessary for the execution of a secure event processing computation. The latter is performed within the SR and can be developed in a custom way according to the SR objective, the available computational resources, and the different event processing technologies that can be deployed. In other words, the SR is agnostic with respect to the event processing technology (e.g., [17], [9]) and paradigm (i.e., centralized vs distributed event processing) being used. It is the responsibility of the entity that instantiates the SR to customize the processing so as to meet the SR objective and the contractual obligations.

In order to validate the aforementioned framework, we present two different instances of SRs; namely, an SR for stealthy scan detection that uses MapReduce technologies [23] for processing purposes, and an SR for Man-In-The-Middle attack detection that uses DHT-based processing.

The rest of this paper is structured as follows. Section 2 introduces related work. Section 3 describes the Semantic Room abstraction, highlighting the principal elements that are used for defining it. Section 4 presents the framework we have designed to support the construction, deployment and execution of SRs. Section 5 describes two different SRs that can be instantiated using the framework previously introduced, and finally Section 6 discusses the main conclusions of this work and future work.

2. RELATED WORK

Detecting event patterns, sometimes referred to as situations, and reacting to them are at the core of Complex Event Processing (CEP) and Stream Processing (SP), both of which play an important role in the IT technologies employed by the financial sector, as overviewed in [15]. In particular, IBM System S [17] has been used by market makers for processing high-volume market data and obtaining low-latency results, as reported in [35]. System S, like other CEP/SP systems (e.g., [16, 29]), is based on event detection across distributed event sources.

The use of massive complex event processing among heterogeneous organizations for detecting network anomalies and failures has been suggested and evaluated in [22]. The usefulness of collaboration and information sharing among telco operators for discovering specific network attacks has also been pointed out in [32, 34]. These works clearly highlight that the main limitation of the collaborative approach concerns confidentiality requirements. These requirements may be specified by the organizations that share data and can make the collaboration itself hardly feasible, as organizations are typically not willing to disclose any private and sensitive information. In our framework, the notion of SR can be effectively used to build a secure and trusted environment, which can be enriched with the degree of privacy needed by the participants, in which a massive and collaborative event processing computation can occur.

Recently, collaborative approaches addressing the specific problem of Intrusion Detection Systems (IDSs) have been proposed in a variety of works [4, 25, 36, 27]. In contrast to standalone IDSs, collaborative IDSs significantly improve the time and efficiency of misuse detection by sharing information on attacks among distributed IDSs belonging to one or more organizations [37]. The main principle of these approaches is that local IDSs detect suspects by analyzing their own data. These suspects are then disseminated, possibly using peer-to-peer links. This approach has two main limitations: it relies on data that can be freely exchanged among the peers, and it does not fully exploit the information seen at every site. The former constraint can be very strong, especially in the financial context we are considering, due to the data confidentiality requirements that have to be met. Our framework has the merit of addressing this specific issue, as it is designed to provide different levels of anonymization of the data the organizations inject into the SR. Moreover, our framework aims at processing the data injected into the SR so that suspects are identified by exploiting all the data made available by every participating organization. The main advantage of this approach is that the space and time window used to detect complex attacks can be significantly enlarged, thus improving detection accuracy.

3. THE SEMANTIC ROOM ABSTRACTION

A Semantic Room (SR) is a federation of financial institutions formed for the sake of information processing and sharing. The financial institutions participating in a specific SR are referred to as the members of the SR.

The SR abstraction is defined by the following three principal elements:

• contract: each SR is regulated by a contract that defines the set of processing and data sharing services provided by the SR, along with the data protection, privacy, isolation, trust, security, dependability, and performance requirements. The contract also contains the hardware and software requirements a member has to provision in order to be admitted into the SR. For the sake of brevity we do not include SR contract details in this paper; the interested reader can refer to [1] for further information (a simplified sketch of a contract's possible shape is given after this list);

• objective: each SR has a specific strategic objective to meet. For instance, there can exist SRs created for implementing large-scale stealthy scan detection, or SRs created for detecting Man-In-The-Middle attacks;

• deployments: each SR can be associated with different software technologies, thus enabling a variety of SR deployments. The SR abstraction is in fact highly flexible, accommodating the use of different technologies for the implementation of the processing and sharing within the SR (i.e., the implementation of the SR logic or functionality). In particular, the SR abstraction can support different types of system approaches to the processing and sharing; namely, a centralized approach that employs a central server (e.g., Esper [9]), a decentralized approach in which the processing load is spread over all the SR members (e.g., DHT-based processing), or a hierarchical approach in which pre-processing is carried out by the SR members and selected processed information is then passed to the next layer of the hierarchy for further computation.
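As an illustration of the contract element, the following minimal sketch shows one possible shape of an SR contract as a data structure. It is purely hypothetical: all field names are invented for this example, and the actual contract model of [1] is considerably richer.

import java.time.Duration;
import java.util.List;

/** A hypothetical, simplified representation of an SR contract.
 *  The real contract model (see [1]) covers many more aspects. */
public record SrContract(
    String objective,              // e.g., "stealthy-scan-detection"
    List<String> members,          // institutions admitted into the SR
    List<String> eventSchemas,     // event formats the gateways must emit
    Duration maxDetectionLatency,  // an example QoS requirement
    String anonymizationPolicy,    // privacy requirement on injected data
    int minDedicatedCores,         // hardware a member has to provision
    String processingTechnology) { // e.g., "MapReduce", "DHT", "Esper"
}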

Figure 1: The Semantic Room abstraction

Figure 1 depicts the SR abstraction. As shown in this figure, the SR abstraction supports the deployment of two components, termed Complex Event Processing and Applications, and Communication, which can vary from SR to SR depending on the software technologies employed to implement the SR processing and sharing logic, and a set of management components, which together form the marble rectangle in Figure 1 and are exploited for SR management purposes (e.g., management of the membership, monitoring of the SR operations). Section 4 describes all these components in detail.

SR members can inject raw data into the SR. Raw data may include real-time data, inputs from human beings, stored data (e.g., historical data), queries, and other types of dynamic and/or static content that are processed in order to produce complex processed data. Raw data are properly managed in order to satisfy privacy requirements that can be prescribed by the SR contract and that are crucial in order to effectively enable a collaborative environment among different, and potentially competing, financial institutions.

Processed data can be used for internal consumption within the SR: in this case, derived events, models, profiles, blacklists, alerts and query results can be fed back into the SR so that the members can take advantage of the intelligence provided by the processing (Figure 1). SR members can, for instance, use these data to properly instruct their local security protection mechanisms in order to trigger informed and timely reactions independently of the SR management. In addition, a (possibly post-processed) subset of the data can be offered for external consumption, as SRs can be willing to make their processed data available for external use. In this case, in addition to the SR members, there can exist clients of the SR that cannot contribute raw data directly to the SR but can simply consume the SR processed data (Figure 1). SR members have full access to both the raw data that members agreed by contract to contribute, and the data being processed and thus output by the SR. Data processing and result dissemination are carried out by SR members based on the obligations and restrictions specified in the above-mentioned contract.

4. THE FRAMEWORK

The framework that supports the Semantic Room abstraction over a pool of (locally and geographically) distributed computational, storage, and network resources is shown in Figure 2.

In the framework we can clearly identify two principal layers: the SR management layer, which is responsible for the management of the SR, and the Complex Event Processing and Applications layer, which realizes the SR processing and sharing logic. In addition, all the architectural components of both layers can utilize various commodity services for (i) exchanging control and monitoring information among them (such as load and availability), (ii) managing resource allocation to the complex event processing and applications, both off-line and at run time, through the use of controllers and schedulers, and (iii) storing processing state and data. In the following subsections we describe the layers of our framework in more detail.

4.1 Semantic Room Management

This layer is responsible for supporting the SR abstraction on top of the individual processing and data sharing applications provided by the Complex Event Processing and Applications layer. It embodies a number of components fulfilling various SR management functions. Such functions include the management of the entire SR lifecycle (i.e., creation of an SR, instantiation of an SR, disbanding of an SR, management of the SR membership), the registration and discovery of SRs and SR contracts, the configuration and planning of SRs, the management of the communications among different SRs, and the management of trust and reputation within SRs [2]. In addition, each SR member interfaces with an SR through a component of this layer termed the SR gateway. This component transforms raw data into events in the format specified by the SR contract. In general, this transformation is necessary as it depends on the specific SR objective to be met, and it comprises three distinct pre-processing steps; namely, filtering, aggregation, and anonymization. The latter consists of applying different anonymization techniques in case privacy and confidentiality requirements are prescribed by the SR contract.
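The following is a minimal sketch of two of the three gateway pre-processing steps, assuming an invented record format and a salted-hash anonymization scheme; it is illustrative only, as the actual gateway logic depends on the SR objective and contract.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Illustrative SR gateway pre-processing: filtering and anonymization.
 *  Aggregation (e.g., batching records per time window) is omitted. */
public class SrGateway {
  private final byte[] salt; // if shared by all SR members, the same raw value
                             // hashes to the same token, so events remain
                             // correlatable across institutions

  public SrGateway(byte[] salt) { this.salt = salt; }

  /** Filtering: keep only the records the SR contract asks for. */
  boolean keep(String rawRecord) {
    return rawRecord.contains("SYN"); // e.g., only connection attempts
  }

  /** Anonymization: replace an identifier with a salted one-way hash. */
  String anonymize(String identifier) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      md.update(salt);
      byte[] digest = md.digest(identifier.getBytes(StandardCharsets.UTF_8));
      return Base64.getEncoder().encodeToString(digest);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e); // SHA-256 is always available
    }
  }
}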

4.2 Commodity Services

A number of services can be used by both layers previously mentioned in order to provide functionalities such as communication, storage, resource and contract management, and monitoring. These services are transversal to the other layers and are described in isolation in the following, highlighting several state-of-the-art design alternatives that can be used for their implementation.

Storage Services. The Storage Service layer of the architecture consists of a collection of components providing various kinds of storage services to the other layers. This layer can embody services for long-term storage of large data sets, such as monitoring logs and historical data (HDFS [5] is an option we currently use), and for low-latency storage of limited amounts of real-time data (an option we use in this case is WebSphere XS [6]).


Figure 2: The framework for building collaborative protection environments

Communication. The Communication layer consists of a collection of components providing various kinds of communication services to the other layers. This layer can include large-scale group communication services [19], [18], reliable low-latency, high-throughput message streaming services that are useful for supporting real-time event streaming from external components into an SR, and publish/subscribe services [21].

Resource and contract management. The Resource and contract management layer is responsible for allocating physical resources (such as computational, storage, and communication resources) to the SRs so as to satisfy the business objectives (such as performance goals, data and resource sharing constraints, etc.) prescribed by the SR contracts. A capacity planning study might be completed in the preliminary phase of an SR startup. In such a way, the Resource Management layer "can be aware" of the maximum capacity that each SR has to provide in terms of computational power, throughput, memory and storage. To this end, this layer can include a scheduler and placement controller for the initial allocation of services to physical resources, and a runtime load balancer for possible dynamic re-allocations [30, 26].

As each SR can count on a set of (locally and geographically) distributed data and computational resources, we find it convenient to consider the following three alternatives for the SR deployment, all of which are supported by our framework:

• SR-owned platform: the computational resources of each SR are owned by its members, although one member is appointed as the SR administrator. The computational platform of an SR is fully dedicated to the complex event processing and applications of that SR.

• Third party-owned platform: the computational resources are owned by a third party. The computational resources could be shared among the complex event processing and applications of the SRs. The collocation and data flows can be subject to restrictions specified by the SR contract. This alternative corresponds to the so-called "SR as a service".

• Mixed platform: the computational resources of each SR are owned by the members of that SR. This platform runs the logic of its SR, but it can occasionally offer hosting services for running the logic of other SRs. This case is allowed only upon an explicit request from another SR whose complex event processing and applications exceed the capacity of its own computational platform, and it implies some business (and trust) relationship among the involved SRs.

Metrics Monitoring. The Metrics Monitoring layer is responsible for monitoring the architecture in order to assess whether the requirements specified by the SR contracts are effectively met. As the Metrics Monitoring is a transversal layer of the framework, it operates at both the SR Management and the Complex Event Processing and Applications layers. In particular, at the SR Management layer it is in charge of periodically collecting monitoring information related to the management of SRs (e.g., SR membership information) in order to detect whether that management violates the requirements included in SR contracts. The Metrics Monitoring keeps track of the dynamic behavior of the SRs and checks whether or not SRs and SR members themselves are honoring their respective contracts. In case the Monitoring detects that SR contracts are close to being violated, it interacts with the SR Management components in order to trigger proper reconfiguration activities.

At the Complex Event Processing and Applications layer (see below), the Metrics Monitoring is in charge of periodically evaluating whether or not the resource management required by this layer is effectively able to support the execution carried out within it. In addition, it is responsible for detecting whether or not the processing execution violates the requirements specified in the SR contracts.

The Metrics Monitoring uses "sensors", possibly located at the physical resource and container levels, in order to obtain the set of information required for enforcing the metrics of interest (in our current implementation we favored the use of the Nagios monitoring technology [28] for metrics monitoring purposes).


4.3 Semantic Room Complex Event Processing and Applications

This layer consists of applications implementing the data processing and sharing logic required to support the SR functionality. A typical application hosted in an SR will need to fuse and analyze large volumes of incoming raw data produced by numerous heterogeneous and possibly widely distributed sources, such as sensors, intrusion and anomaly detection systems, firewalls, monitoring systems, etc. The incoming data will be either analyzed in real time, possibly with the assistance of analytical models, or stored for subsequent off-line analysis and intelligence extraction. This suggests a characterization of the applications supported by an SR, whose runtime instances can be hosted within various runtime container components. In particular, the application containers, which can be either standalone or clustered, are the following:

• Event Processing Container: this container is responsible for supporting event-processing applications in a distributed environment. The applications manipulate and/or extract patterns from streams of event data arriving in real time from possibly widely distributed sources, and need to be able to support stringent guarantees in terms of response time and/or throughput.

• Analytics Container: this container is responsible for supporting the parallel processing and querying of massive data sets on a cluster of machines. It will be used for supporting the analytics and data warehousing applications hosted in the SR.

• Web Container: this container will provide basic web capabilities to support the runtime needs of the web applications hosted within an SR. These applications support the logic enabling the interaction between the client-side presentation-level artifacts (such as web browser based consoles, dashboards, widgets, rich web clients, etc.) and the processing applications.

Different implementations of the Complex Event Processing and Applications layer can be supported by our framework. In particular, it is possible to deploy an SR that uses a central server for the implementation of both the event processing and analytics containers. In this case, the financial institutions of the SR send their own data to the central server (an example of a centralized event correlation engine that can be used in this case is Esper [9]). The central engine performs the correlation and analysis of the data and sends the generated processed data back to the financial institutions, letting each financial institution adopt its own countermeasures in a timely fashion.
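As a sketch of this centralized option, the fragment below shows how a port-scan style rule could be expressed with Esper. It assumes the legacy Esper 5.x client API (newer Esper versions use a compiler-based API) and an invented LogEvent type; the threshold is arbitrary.

import com.espertech.esper.client.Configuration;
import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;

public class CentralCorrelator {
  /** Hypothetical event type sent by each institution to the central server. */
  public static class LogEvent {
    private final String sourceIp;
    private final int destPort;
    public LogEvent(String sourceIp, int destPort) {
      this.sourceIp = sourceIp; this.destPort = destPort;
    }
    public String getSourceIp() { return sourceIp; }
    public int getDestPort() { return destPort; }
  }

  public static void main(String[] args) {
    Configuration config = new Configuration();
    config.addEventType("LogEvent", LogEvent.class);
    EPServiceProvider engine = EPServiceProviderManager.getDefaultProvider(config);

    // Flag source IPs that touch more than 200 distinct ports in 10 minutes.
    EPStatement stmt = engine.getEPAdministrator().createEPL(
        "select sourceIp, count(distinct destPort) as ports "
      + "from LogEvent.win:time(10 minutes) "
      + "group by sourceIp having count(distinct destPort) > 200");
    stmt.addListener((newEvents, oldEvents) -> {
      if (newEvents != null)
        System.out.println("ALERT: suspicious source " + newEvents[0].get("sourceIp"));
    });

    engine.getEPRuntime().sendEvent(new LogEvent("203.0.113.7", 80));
  }
}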

However, although this solution is fully supported by our framework, it suffers from the inherent drawbacks of a centralized system. The central server may become a single point of failure or a security vulnerability: if the server crashes or is compromised by a security attack, the complex event processing computation it carries out can become unavailable or be jeopardized. In addition, the volume of events the central server can process per time unit is limited by the server's processing and bandwidth capacities, thus limiting system scalability. Therefore, in our current implementation of the architecture we favored the use of technologies that allow us to realize decentralized complex event processing. In particular, we used both MapReduce [23] and DHT-based technologies for implementing the specific SR logic, as described in the next section.

5. SEMANTIC ROOM INSTANCES

In order to show the flexibility of the proposed SR abstraction, we describe in this section two SRs that differ from one another in both the objective to fulfill and the software technologies deployed to implement the processing and sharing logic and thus meet that objective. Specifically, we introduce in the next subsections an SR for collaborative intrusion detection that deploys a MapReduce-based [23] technology, and an SR for collaborative Man-In-The-Middle detection that distributes the load among processing elements through a DHT. The two SRs are at different levels of maturity: the SR for intrusion detection has been successfully deployed and tested, whereas the one for collaborative Man-In-The-Middle detection is currently under implementation and its complete development and testing is foreseen by June 2010.

5.1 Semantic Room for Collaborative Intrusion Detection

In this section, we discuss a specific instantiation of an SR, which we refer to as ID-SR, whose objective is to prevent potential intrusion attempts by detecting stealthy port scanning activity. The subjects of the attack are the web servers handling the external web connectivity of the participating financial institutions. Those web servers typically run outside the corporate firewall (in a DMZ) and are therefore frequently targeted by attackers. The goal of the attack is to identify TCP ports that might have been left open at the attacked subjects. The attack is carried out by initiating a series of TCP connections to ranges of ports at each of the targeted DMZ servers. The ports that are detected as open can be used as intrusion vectors at a later time.

The attack detection is based on identifying patterns of an unusually high number of TCP SYN requests, possibly targeting an unusually high number of ports, and originating from the same external IP address. The statistics are collected and analyzed across the entire set of the ID-SR participants, thus improving the chances of identifying low-volume activities that would have gone undetected if the individual participants were exclusively relying on their local protection systems. In addition, to minimize the number of false positives, the real-time suspicions are periodically calibrated through a reputation system which maintains a site ranking based on the past history of malicious activities originating from those sites.

To support data analysis, we implemented a distributed event processing system, called Agilis, which is described in the next section. The analysis logic employed by the ID-SR is detailed in Section 5.1.2, and our initial experience with using the Agilis-based ID-SR prototype is presented in Section 5.1.3.


Figure 3: MapReduce-based Semantic Room

5.1.1 The Agilis System

Agilis (see Figure 3) consists of a distributed network of processing and storage elements hosted on a cluster of machines allocated from the ID-SR hardware pool (as prescribed by its contract). The processing is based on Hadoop's MapReduce framework [23]. The processing logic is specified in a high-level language, called Jaql [12], which compiles into a series of MapReduce jobs. To improve detection latency, the mappers and reducers communicate through buffers stored in a main-memory storage system, called IBM WebSphere eXtreme Scale (WXS) [6]. The individual components of the Agilis framework are illustrated in Figures 3 and 4, and described in detail below.

WebSphere eXtreme Scale (WXS).

WebSphere eXtreme Scale (WXS) is a distributed main memory-based storage system implemented in Java. It allows the user data to be organized into a collection of maps consisting of either relational records or key-value pairs. At runtime, the data are stored in Data Servers, or containers, hosted on a cluster of machines. The clients can query the stored data using either a simple get/set API or full-blown SQL queries. The queries can be executed either on the client, or within a container using an embedded SQL engine.

For scalability, a map's data can be broken into a fixed number of partitions, which are then evenly distributed among the WXS containers by the WXS runtime. In addition, for fault tolerance, each map partition can be replicated on a configured number of containers. The information about the operational containers, as well as the layout of hosted map partitions and their replicas, is maintained at runtime in the WXS Catalog service, which is typically replicated for high availability.
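The following fragment illustrates the idea of a fixed-partition layout with replicas. It is not the WXS API: the real placement is computed internally by the WXS runtime and published through the Catalog service, and the assignment policy shown here is invented for the example.

import java.util.List;

/** Illustrative partition layout: a key always maps to the same partition,
 *  and each partition has a primary container plus replicas on other hosts. */
public class PartitionLayout {
  private final int numPartitions;
  private final List<String> containers;

  public PartitionLayout(int numPartitions, List<String> containers) {
    this.numPartitions = numPartitions;
    this.containers = containers;
  }

  /** Deterministic key-to-partition mapping. */
  public int partitionOf(Object key) {
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }

  /** Primary container of a partition (simple modulo placement). */
  public String primaryOf(int partition) {
    return containers.get(partition % containers.size());
  }

  /** The i-th replica lives on one of the following containers. */
  public String replicaOf(int partition, int i) {
    return containers.get((partition + 1 + i) % containers.size());
  }
}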

Processing framework. The processing is carried out on the machines within the ID-SR cluster, and orchestrated through the optimized Hadoop scheduling framework. The latter consists of a centralized Job Tracker (JT), which coordinates the local execution of mappers and reducers on each of the ID-SR nodes through a collection of Task Trackers (TT) (one per machine).

Most of our scheduling optimizations were targeted at improving the locality of processing by scheduling the map tasks close to the WXS partitions holding their respective input splits. To match the input splits with the WXS partitions, we provided a new implementation of Hadoop's InputFormat interface, which was packaged with every Agilis MapReduce job submitted to JT. Subsequently, the getSplits method of this interface was used by JT to determine the split locations at runtime (obtained by interrogating the WXS Catalog service), and the createRecordReader method to create an instance of RecordReader to read the data from the corresponding WXS partition. To further improve locality, our implementation of RecordReader recognized the SQL select, project, and aggregate queries (by interacting with the Jaql interpreter), and delegated their execution to the SQL engine embedded into the WXS container.
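To make the mechanism concrete, here is a minimal sketch of such an InputFormat against the standard org.apache.hadoop.mapreduce API. The catalog lookup is stubbed out with a static layout, and the record reader is an empty shell; only the structure (one split per partition, with the partition's host as the split location) reflects what the text describes.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class PartitionInputFormat extends InputFormat<Text, Text> {

  /** One split per in-memory partition; getLocations() steers the
   *  scheduler toward the host that holds the partition. */
  public static class PartitionSplit extends InputSplit implements Writable {
    private int partitionId;
    private String host;

    public PartitionSplit() { }  // needed for reflective instantiation
    public PartitionSplit(int partitionId, String host) {
      this.partitionId = partitionId; this.host = host;
    }
    @Override public long getLength() { return 0; }  // size unknown up front
    @Override public String[] getLocations() { return new String[] { host }; }
    @Override public void write(DataOutput out) throws IOException {
      out.writeInt(partitionId); out.writeUTF(host);
    }
    @Override public void readFields(DataInput in) throws IOException {
      partitionId = in.readInt(); host = in.readUTF();
    }
  }

  @Override
  public List<InputSplit> getSplits(JobContext context) {
    // In Agilis this layout would be obtained by interrogating the WXS
    // Catalog service; here we fake four partitions on four hosts.
    List<InputSplit> splits = new ArrayList<>();
    for (int p = 0; p < 4; p++) {
      splits.add(new PartitionSplit(p, "node" + p + ".example.org"));
    }
    return splits;
  }

  @Override
  public RecordReader<Text, Text> createRecordReader(InputSplit split,
                                                     TaskAttemptContext ctx) {
    return new RecordReader<Text, Text>() {
      // A real reader would iterate over the WXS partition of the split,
      // pushing select/project/aggregate queries down to the embedded SQL
      // engine as described above; this stub returns no records.
      @Override public void initialize(InputSplit s, TaskAttemptContext c) { }
      @Override public boolean nextKeyValue() { return false; }
      @Override public Text getCurrentKey() { return null; }
      @Override public Text getCurrentValue() { return null; }
      @Override public float getProgress() { return 1.0f; }
      @Override public void close() { }
    };
  }
}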

In many cases, this approach resulted in a substantial reduction in the volumes of intermediate data reaching the reducers, thus improving latency and bandwidth utilization, and reducing processing costs. It also allowed us to further enforce the privacy of the input data submitted by the individual ID-SR members by scheduling the initial map processing on the machines residing within their administrative boundaries.

Long-term data storage. The Hadoop File System (HDFS) [5] is used to provide storage services for the massive amounts of data that should be preserved over time (such as, e.g., historical data keeping track of past attacks). The data stored in HDFS can be injected into Hadoop through the provided


HDFS InputFormat implementation, and combined with the WXS data using the Jaql I/O constructs. HDFS is managed and kept consistent by Hadoop's Chunk Manager (CM) and ZooKeeper (ZK) services.

Figure 4: The Components of the Agilis Runtime.

The Jaql language.

The processing logic is expressed in a high-level language, called Jaql. Jaql supports SQL-like query constructs that can be combined into flows. It can also interoperate with a large variety of data sources due to its use of the standardized JSON data model. As shown in Figure 4, in Agilis the locally compiled Jaql flows are first augmented with the input and output formats to interoperate with WXS, and then submitted to the modified Hadoop scheduler, which orchestrates their execution on the ID-SR machines as explained above.

5.1.2 The ID-SR Processing Steps

The processing steps followed by the ID-SR implementation are depicted in Figure 5(a). In the first step, the raw data capturing the current networking activity at each of the participating machines is collected using the tcpdump utility and forwarded to the local ID-SR gateway. Each gateway then normalizes the incoming raw data, producing a stream of LogEvent records of the form ⟨sourceIP, destinationIP, sourcePort, destinationPort, bytesSent, bytesReceived, returnStatus⟩. The LogEvent records are stored in a WXS partition hosted on a locally deployed WXS container.
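A minimal sketch of the normalization step follows. It assumes, purely for illustration, that the gateway receives pre-tokenized comma-separated fields in the order of the LogEvent record; parsing actual tcpdump output would of course depend on the tcpdump options used.

/** Hypothetical gateway-side normalization into LogEvent records. */
public class LogEventNormalizer {
  record LogEvent(String sourceIp, String destinationIp, int sourcePort,
                  int destinationPort, long bytesSent, long bytesReceived,
                  String returnStatus) { }

  /** Assumed input: seven comma-separated fields in record order. */
  static LogEvent normalize(String line) {
    String[] f = line.split(",");
    return new LogEvent(f[0].trim(), f[1].trim(),
        Integer.parseInt(f[2].trim()), Integer.parseInt(f[3].trim()),
        Long.parseLong(f[4].trim()), Long.parseLong(f[5].trim()),
        f[6].trim());
  }
}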

The incoming LogEvent records are then processed by a collection of MapReduce jobs handled by Agilis. The processing logic consists of the following steps. First, the input records are subjected to the Summarization flow (see Figure 5(b)), which consists of two processing steps surrounded by two I/O steps (for reading the input and writing the results). The outcome of the two processing steps is a collection of summary records of the form ⟨sourceIP, portsNum, reqNum⟩ representing, for each source IP address (sourceIP), the number of distinct ports (portsNum) accessed from sourceIP along with the total number of requests (reqNum) originating from sourceIP.

The summary records are then fed into the Blacklisting flow (see Figure 5(b)), which blacklists a source IP address if the number of requests and distinct ports accessed from that IP address exceed fixed limits. In addition, the summary records are also joined with the historical records of the form ⟨sourceIP, rank⟩, using sourceIP as a key, to adjust the long-term rank representing the IP address threat level.

The historical records are used to periodically calibrate the blacklist by excluding the IP addresses whose ranks fall below a fixed threshold.
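For readers unfamiliar with Jaql, the following is a plain-Java rendering of the Summarization and Blacklisting flows of Figure 5(b), written with in-memory collections instead of MapReduce jobs; the thresholds match the query fragments shown in the figure (reqNum > 10000 and portsNum > 200).

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ScanDetectionFlows {
  record LogEvent(String sourceIp, String destIp, int destPort, String returnStatus) { }
  record Summary(String sourceIp, int portsNum, long reqNum) { }

  /** Summarization: per source IP, count distinct (destIp, destPort)
   *  pairs and total SYN requests. */
  static List<Summary> summarize(List<LogEvent> events) {
    Map<String, List<LogEvent>> bySource = events.stream()
        .filter(e -> "SYN".equals(e.returnStatus()))         // returnStatus = 'SYN'
        .collect(Collectors.groupingBy(LogEvent::sourceIp)); // group by sourceIp
    return bySource.entrySet().stream()
        .map(e -> {
          Set<String> ports = e.getValue().stream()
              .map(ev -> ev.destIp() + ":" + ev.destPort())  // distinct targets
              .collect(Collectors.toSet());
          return new Summary(e.getKey(), ports.size(), e.getValue().size());
        })
        .toList();
  }

  /** Blacklisting: flag sources exceeding both fixed limits. */
  static List<String> blacklist(List<Summary> summaries) {
    return summaries.stream()
        .filter(s -> s.reqNum() > 10000 && s.portsNum() > 200)
        .map(Summary::sourceIp)
        .toList();
  }
}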

5.1.3 The ID-SR Prototype

In this section, we report on our initial experience with deploying and testing the ID-SR prototype on a small cluster of 8 Linux Virtual Machines (VMs), each equipped with 2GB of RAM and 20GB of disk space. A large-scale experimental study using PlanetLab [14] is currently in progress.

The layout of the Agilis components on the cluster was as follows: one of the VMs was dedicated to hosting all of the Agilis management components: the Hadoop Name Node, the Job Tracker, and the WXS Catalog Server. Each of the remaining 7 VMs represented a single ID-SR participant, and hosted a Data Node, a Task Tracker, WXS Data Servers (containers), and an external web server (which was the attack subject).

To assess the accuracy and timeliness of the port scan attack detection, we measured the Agilis performance in 3 experimental scenarios. All scenarios involved a single intruding host that generated a series of TCP/SYN requests targeting a fixed set of 300 unique ports on each of the 7 attacked servers. In each scenario, the requests were injected at a constant rate, which was set to 10, 20, and 30 requests/server/second for the first, second, and third scenario respectively. The ratio of attack to legitimate traffic per server was 1:5, resulting in a total (malicious and legitimate) incoming event traffic at each server of 60, 120, and 180 requests/second for the three scenarios. The blacklisting threshold was set to 20,000 requests and 1000 unique ports. The Jaql flow in Figure 5 was compiled into 3 MapReduce jobs whose total running time was 110 sec on average. The jobs were executed periodically, once every 4 minutes.

Figure 5: Data Flow and Jaql Query Fragments Used for Port Scan Detection in ID-SR. Panel (a) shows the data flow: tcpdump output is normalized by the SR gateways into LogEvent records, stored in WXS partitions, and fed through the Summarization, Blacklisting, Ranking, and Calibration flows, which maintain the Historical Safety Ranking and the Black List. Panel (b) lists the Jaql query fragments:

Summarization:

read($ogPojoInput("LogEvents", "dataObj.LogEvent", "returnStatus = 'SYN'"))
→ group by $ip_port = {$.sourceIp, $.destPort, $.destIp} as $rs
  into {$ip_port.sourceIp, numReq: count($rs)}
→ group by $ip = {$.sourceIp}
  into {ip: $ip.sourceIp, portsNum: count($), reqNum: sum($[*].numReq)}
→ write($ogPojoOutput("LogEventsSum", "dataObj.LogEventSum", "ip"));

Blacklisting:

read($ogPojoInput("LogEventsSum", "dataObj.LogEventSum",
  "reqNum > 10000 AND portsNum > 200"))
→ transform $.ip
→ write($ogPojoOutput("BlackList", "dataObj.HistoryData", "ip"));

In all of our experimental runs, Agilis was able to absorb the entire volume of the incoming event traffic without experiencing overload, and correctly blacklisted the single intruding host (and none of the others). The detection times (i.e., the time between the beginning of the scan and the intruder blacklisting) for the three scenarios were on average 700, 430, and 330 seconds respectively. This indicates that Agilis was able to take advantage of extra data points to improve the detection latency. Also, note that since the total number of unique ports accessed at each of the 7 attacked subjects in all three scenarios never exceeded 1000 (the detection threshold), the ID-SR participants would never have been able to correctly identify the intruding machine unless they cooperated and shared data through Agilis.

5.2 Semantic Room for Collaborative Man-In-The-Middle Detection

A Man-In-The-Middle (MitM) attack makes a legitimate user start a connection with a malicious server that mimics the legitimate server's behavior. Different techniques can be used for obtaining such behavior: DNS cache poisoning [24], compromise of an intermediate routing node [20], phishing [31], or exploiting vulnerabilities of authentication mechanisms [33]. In general, the malicious server stores the user credentials, relays them to the legitimate server, and forwards the response to the user on behalf of the legitimate server. The MitM node eavesdrops on all data flowing between the client and the server, thus accessing a great amount of sensitive information. MitM attacks at a single node can usually be detected by looking at anomalies in the access statistics of Web and Application servers, identifying an IP address that contacts "too many" services with credentials belonging to different users. The SR programming abstraction can then be effectively used: it enables an aggregation and correlation of data logs coming from different SR members that can significantly reduce the detection time of such attacks, as the same IP address could also attack services of other members of the SR.

The event processing carried out in such an SR can be summarized as follows. There exist Processing Elements (PEs) that are arranged in a DHT, as shown in Figure 6. Each PE implements the event processing container described in the previous section. The container receives events from different SR gateways. A single SR gateway pre-processes events produced by the raw sources of information that reside at a financial institution, e.g., logs of Web or Application Server technologies such as Tomcat or JBoss, deployed by an institution in order to provide its customers with financial services such as e-banking, e-payment and e-trading. The raw data injected into the SR gateway are filtered, aggregated and anonymized (if privacy requirements are mandated by the SR contract). The output of the SR gateway is an event which is injected into the SR. The format of the events is ⟨source_ip, session_id⟩, where source_ip is the address originating a connection to a financial institution server, and session_id is a concatenation of a unique identifier of the financial institution offering a financial service, the requested service URL, and the user id that authenticated to the service. This information is sufficient for detecting MitM attacks if there exists a suspicious number of connections originating from the same source_ip towards financial services with different user credentials, i.e., different session_ids.

Figure 6: The DHT-based Semantic Room and the Processing Element Architecture.

Each PE is constructed out of three main subsystems (Figure 6); namely, the event manager, the overlay manager, and the complex event processing engine. These subsystems perform the following functions:

• the Event Manager receives events from the SR gateways and submits them to the Overlay Manager (see below). Moreover, it receives the results of the complex event processing from a CEP Engine implemented by the PE, and disseminates the results (i.e., alarms) to the SR gateways of the financial institutions that are members of the SR.

• the Complex Event Processing (CEP) Engine processes events received from the Overlay Manager and submits the analysis results to the Event Manager. The processing required to identify MitM attacks is based on computing statistical anomalies on received events. These anomalies can be abstracted by the following high-level attack pattern: given a time window with m events, find at least n events with the same source_ip and different session_ids (see the sketch after this list). Currently, this subsystem is implemented without using the general-purpose CEP engines available on the market; however, we are evaluating its implementation with either JBoss Drools Fusion [13] or Esper [9].

• the Overlay Manager (OM) receives new events from the Event Manager and forwards them on the DHT, where the DHT key is obtained by hashing the source_ip address field of the event. The DHT we have used in our preliminary implementation is FreePastry [11], since its proximity-based routing improves performance by exploiting the locality of nodes.
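A minimal sketch of the per-PE detection logic, together with the hash-based routing of events to PEs, is given below. The window semantics (last m events), the SHA-1-based key derivation, and all names are illustrative assumptions; the actual PE implementation and FreePastry routing are richer.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class MitmDetector {
  record Event(String sourceIp, String sessionId) { }

  private final int m; // window size, in events
  private final int n; // distinct-session threshold
  private final Deque<Event> window = new ArrayDeque<>();

  public MitmDetector(int m, int n) { this.m = m; this.n = n; }

  /** Sketch of routing: the Overlay Manager sends an event to the PE that
   *  owns hash(source_ip), so all events of one source land on one PE. */
  static int responsiblePe(String sourceIp, int ringSize) {
    try {
      byte[] d = MessageDigest.getInstance("SHA-1")
          .digest(sourceIp.getBytes(StandardCharsets.UTF_8));
      return (((d[0] & 0xff) << 8) | (d[1] & 0xff)) % ringSize;
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Attack pattern: within the last m events, at least n events with the
   *  same source_ip and different session_ids. Returns true on a match. */
  public boolean onEvent(Event e) {
    window.addLast(e);
    if (window.size() > m) window.removeFirst();
    Set<String> sessions = new HashSet<>();
    for (Event w : window) {
      if (w.sourceIp().equals(e.sourceIp())) sessions.add(w.sessionId());
    }
    return sessions.size() >= n;
  }
}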

Once the addresses of malicious servers have been identified, they can be blacklisted and shared among SR members in order to block them, audit internal servers, and identify customers that may have suffered the attack, thus granting better protection and security to clients' transactions.

6. CONCLUDING REMARKS

In this paper we have described a framework that enables the construction of collaborative security environments whose aim is to defend financial institutions (e.g., banks, regulatory agencies) from coordinated Internet-based attacks. In doing so, the framework effectively supports a new programming abstraction named Semantic Room (SR). The SR allows financial institutions to process data and share information and computing resources in a controlled, trusted and secure manner. SRs are characterized by (i) an objective (i.e., the SR functionality), (ii) a contract that specifies the set of rules for governing the SR membership and QoS requirements (e.g., security, performance requirements), and (iii) different software technologies that implement the specific SR processing and sharing logic. Owing to this latter property, SRs are customizable: they can be instantiated in different ways, deploying various processing and sharing schemes in order to match the needs of financial institutions.

To this end, we have shown the design of two distributed event processing schemes deployed within two different SRs; namely, an SR that can be used for detecting stealthy scans and an SR that can be used for detecting MitM attacks. Specifically, the former relies on MapReduce technologies for parallelizing the complex event processing computation executed at geographically distributed financial sites, whereas the latter uses Distributed Hash Table technologies and general-purpose CEP engines for carrying out the processing. Both SRs enjoy large windows, in space and time, for detecting stealthy scans and MitM attacks by correlating raw data coming from SR members.

Future work includes deploying and testing the MitM SR and assessing the performance of both SRs. To this aim, we are working in several directions: developing an Esper-based SR in order to also compare performance against a centralized correlation engine, carrying out a quantitative experimental evaluation of both presented SRs using large-scale platforms such as PlanetLab [14], and building a testbed on the Internet formed by three sites, each equipped with a cluster of machines, on which to deploy the large-scale processing environment.

7. ACKNOWLEDGMENTS

We are indebted to Thomas Kohler (UBS), Finn Otto Hansen (SWIFT) and Guido Pagani (Bank of Italy), who greatly helped us better understand the strategies, constraints and needs of financial players.

8. REFERENCES

[1] This reference is obscured in order to meet the double blind requirement for the review process.

[2] This reference is obscured in order to meet the double blind requirement of the review process.

[3] Basel II Accord. http://www.bis.org/bcbs/bcbscp3.htm, 2009.

[4] DShield: Cooperative Network Security Community - Internet Security. http://www.dshield.org/indexd.html/, 2009.

[5] Hadoop-HDFS Architecture. http://hadoop.apache.org/common/docs/current/hdfs_design.html, 2009.

[6] IBM WebSphere eXtreme Scale. http://www-01.ibm.com/software/webservers/appserv/extremescale/, 2009.

[7] National Australia Bank hit by DDoS attack. http://www.zdnet.com.au/news/security/soa/National-Australia-Bank-hit-by-DDoS-attack/0,130061744,339271790,00.htm, 2009.

[8] Update: Credit card firm hit by DDoS attack. http://www.computerworld.com/securitytopics/security/story/0,10801,96099,00.html, 2009.

[9] Where Complex Event Processing meets Open Source: Esper and NEsper. http://esper.codehaus.org/, 2009.

[10] FBI investigates 9 Million ATM scam. http://www.myfoxny.com/dpp/news/090202_FBI_Investigates_9_Million_ATM_Scam, 2010.

[11] FreePastry. http://www.freepastry.org/FreePastry/, 2010.

[12] Jaql. http://www.jaql.org/, 2010.

[13] JBoss Drools Fusion. http://www.jboss.org/drools/drools-fusion.html, 2010.

[14] PlanetLab. http://www.planet-lab.org/, 2010.

[15] Asaf Adi and Opher Etzion. Amit - the situation manager. VLDB J., 13(2):177-203, 2004.

[16] Mert Akdere, Uğur Çetintemel, and Nesime Tatbul. Plan-based complex event detection across distributed sources. PVLDB, 1(1):66-77, 2008.

[17] Lisa Amini, Navendu Jain, Anshul Sehgal, Jeremy Silber, and Olivier Verscheure. Adaptive control of extreme-scale stream processing systems. In ICDCS '06: Proceedings of the 26th IEEE International Conference on Distributed Computing Systems, page 71, Washington, DC, USA, 2006. IEEE Computer Society.

[18] V. Bortnikov, G. V. Chockler, A. Roytman, and M. Spreitzer. Bulletin Board: A Scalable and Robust Eventually Consistent Shared Memory over a Peer-to-Peer Overlay. In ACM LADIS 2009, 2009.

[19] G. V. Chockler, I. Keidar, and R. Vitenberg. Group communication specifications: a comprehensive study. ACM Computing Surveys, 33(4):427-469, 2001.

[20] T. Espiner. Symantec warns of router compromise. http://www.zdnetasia.com/news/security/0,39044215,62036991,00.htm, 2010.

[21] P. T. Eugster, P. A. Felber, R. Guerraoui, and A. Kermarrec. The many faces of publish/subscribe. ACM Computing Surveys, 35(2):114-131, 2003.

[22] Y. Huang, N. Feamster, A. Lakhina, and J. J. Xu. Diagnosing network disruptions with network-wide analysis. In SIGMETRICS '07, San Diego, California, USA, 12-16 June 2007.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107-113, 2008.

[24] A. Klein. BIND 9 DNS cache poisoning. http://www.trusteer.com/files/BIND_9_DNS_Cache_Poisoning.pdf, 2010.

[25] M. E. Locasto, J. J. Parekh, A. D. Keromytis, and S. J. Stolfo. Towards collaborative security and p2p intrusion detection. In IEEE Workshop on Information Assurance and Security, United States Military Academy, West Point, NY, 15-17 June 2005.

[26] G. Lodi, F. Panzieri, D. Rossi, and E. Turrini. SLA-Driven Clustering of QoS-aware Application Servers. IEEE Transactions on Software Engineering, 33(3), 2007.

[27] N. Verma, F. Trousset, P. Poncelet, and F. Masseglia. Intrusion Detections in Collaborative Organizations by Preserving Privacy. In Advances in Knowledge Discovery and Management, December 2009.

[28] Nagios. http://www.nagios.org, 2010.

[29] P. R. Pietzuch. Hermes: A Scalable Event-Based Middleware. Ph.D. thesis, University of Cambridge.

[30] C. Tang, M. Steinder, M. Spreitzer, and G. Pacifici. A Scalable Application Placement Controller for Enterprise Data Centers. In 16th International Conference on World Wide Web, 2007.

[31] TriCipher. The perfect storm: Man in the middle attacks, weak authentication and organized online criminals. http://www.tricipher.com/landing_pages/spotlight_offer.html, 2010.

[32] Y. Xie, V. Sekar, M. K. Reiter, and H. Zhang. Forensic analysis for epidemic attacks in federated networks. In ICNP, pages 143-153, 2006.

[33] Y. Espelid, L. Netland, A. N. Klingsheim, and K. J. Hole. Robbing banks with their own software - an exploit against Norwegian online banks. In IFIP 23rd International Information Security Conference, September 2008.

[34] G. Zhang and M. Parashar. Cooperative detection and protection against network attacks using decentralized information sharing. Cluster Computing, 13(1):67-86, 2010.

[35] Xiaolan J. Zhang, Henrique Andrade, Buğra Gedik, Richard King, John Morar, Senthil Nathan, Yoonho Park, Raju Pavuluri, Edward Pring, Randall Schnier, Philippe Selo, Michael Spicer, Volkmar Uhlig, and Chitra Venkatramani. Implementing a high-volume, low-latency market data processing system on commodity hardware using IBM middleware. In WHPCF '09: Proceedings of the 2nd Workshop on High Performance Computational Finance, pages 1-8, New York, NY, USA, 2009. ACM.

[36] C. V. Zhou, S. Karunasekera, and C. Leckie. A peer-to-peer collaborative intrusion detection system. In 13th IEEE International Conference on Networks, Kuala Lumpur, Malaysia, November 2005.

[37] C. V. Zhou, C. Leckie, and S. Karunasekera. A survey of coordinated attacks and collaborative intrusion detection. Computers & Security, 29(1):124-140, 2010.
