
Academic year: 2021

Chapter 1

INTRODUCTION

The vision of Grid computing was introduced by Ian Foster and Carl Kesselman with the publication of their book “The Grid: Blueprint for a New Computing Infrastructure” in July 1998 [1]. The idea is to virtualize computing, with the goal of creating a utility computing model over a distributed set of resources. Figure 1.1 illustrates the concept.

Figure 1.1. Virtualization of computing and storage resources

A single computer combines standard elements: the processor, storage, the operating system, and I/O. The concept of Grid computing is to create a similar environment over a distributed area, made of heterogeneous elements including servers, storage devices, and networks, and thus to obtain a scalable, wide-area computing platform.

The Grid Middleware is the software that handles the coordination of the participating elements. It is analogous to the operating system of a computer.

A Grid Service is a special service that contributes to making the Grid infrastructure available to users. It is analogous to an operating system component such as the filesystem or the memory manager.

The Grid middleware is organized as a layered stack of Grid services that perform the various operations. The Grid services provide users with standard and uniform interfaces across all the sites participating in the Grid.

A Grid Resource is a component of the system that provides or hosts services and may enforce access to these services based on a set of rules and policies defined by entities that are authoritative for the particular resource.

A typical resource in a Grid environment might be a computer providing compute cycles or data storage through the set of services it offers. Access control may be enforced by the resource itself or by some entity (a policy enforcement point or gateway) located between the resource and the requestor, which protects the resource from unauthorized access.
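
The second arrangement, with an entity placed between the requestor and the resource, can be sketched as follows. This is a minimal illustration only: the class names `Resource` and `PolicyEnforcementPoint`, the user names, and the policy format are assumptions made for the example and do not correspond to any specific Grid middleware.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical Grid resource exposing one service: reading a file."""
    name: str
    files: dict = field(default_factory=dict)

    def read(self, path: str) -> bytes:
        return self.files[path]


@dataclass
class PolicyEnforcementPoint:
    """Hypothetical gateway placed between requestors and a resource."""
    resource: Resource
    # Policy defined by the entity authoritative for the resource:
    # maps a user identity to the set of operations it may perform.
    policy: dict = field(default_factory=dict)

    def read(self, user: str, path: str) -> bytes:
        # The check happens before the request ever reaches the resource.
        if "read" not in self.policy.get(user, set()):
            raise PermissionError(f"{user} may not read from {self.resource.name}")
        return self.resource.read(path)


storage = Resource("disk-server-01", {"/data/run1.raw": b"event data"})
gateway = PolicyEnforcementPoint(storage, {"alice": {"read"}})

# An authorized request is forwarded to the resource.
print(gateway.read("alice", "/data/run1.raw"))

# An unauthorized request is rejected by the gateway.
try:
    gateway.read("bob", "/data/run1.raw")
except PermissionError as err:
    print("denied:", err)
```

In this sketch the resource itself contains no access logic, so the policy can be changed without touching the resource.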

Whatever hardware and software solutions are used to set up a participating Grid site, users can always access the available resources through a transparent and standard interface. Several different Grid services may be available for performing the same task. These Grid services publish their interfaces, which can therefore be used by other Grid services as well as by higher-level user applications.

1.1 Problem Statement

As on a local computer, application-specific persistent data and metadata in a Grid are stored on disk or tape systems. Whether running on a LAN cluster or over a Grid infrastructure, a computational user task often needs to access data in order to calculate the expected result. The output of such a task might be of interest to other communities of users or can be used for further processing. Whatever the underlying storage technology, a certain number of functionalities need to be guaranteed to an application. First of all, the availability of input data must be guaranteed to the application. Such a data set must stay on the device accessed by the application for the entire life of the job accessing it.

• Mechanisms of file pinning must guarantee that a storage garbage collector does not remove files used by the application.

The application must be able to access the data through the data access protocol that it implements.

• The storage device offering the data has to support such access protocols.

• Authorization and security policies (local or global) need to be enforced during data access.

Before moving jobs that generate large output files to Grid computing systems with available CPU capacity, a Grid scheduler should check for availability of the required space and then allocate it.

• Space reservation is another important requirement, especially when using the Grid.

• The possibility to manage disk-space via the Grid becomes essential for running data-intensive applications on the Grid.

• Data movement and replication are also important functions to guarantee optimized access to files.

• Finally, services that guarantee consistency of modifiable files need storage services offering features such as file locking.
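
Two of the requirements above, file pinning and space reservation, can be illustrated with a toy in-memory model. This is a sketch under stated assumptions: the `StorageElement` class and all its method names are hypothetical and are not part of SRM or of any real storage system.

```python
class StorageElement:
    """Toy in-memory storage device; every name here is hypothetical."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.files = {}     # path -> size in bytes
        self.pins = {}      # path -> number of active pins
        self.reserved = 0   # bytes promised to jobs but not yet written

    def free_space(self) -> int:
        return self.capacity - sum(self.files.values()) - self.reserved

    def reserve(self, size: int) -> bool:
        """Reserve space for future output; refuse if it is not available."""
        if size > self.free_space():
            return False
        self.reserved += size
        return True

    def write(self, path: str, size: int) -> None:
        """Store a new file outside of any reservation."""
        if size > self.free_space():
            raise OSError("not enough free space")
        self.files[path] = size

    def pin(self, path: str) -> None:
        """Mark a file as in use by one more job."""
        self.pins[path] = self.pins.get(path, 0) + 1

    def unpin(self, path: str) -> None:
        """Release one pin; the file is unprotected once no pins remain."""
        self.pins[path] -= 1
        if self.pins[path] == 0:
            del self.pins[path]

    def garbage_collect(self) -> list:
        """Remove every file that no job has pinned; return the removed paths."""
        removed = [p for p in self.files if p not in self.pins]
        for p in removed:
            del self.files[p]
        return removed


se = StorageElement(capacity=100)
se.write("/in/a", 30)
se.write("/in/b", 20)
se.pin("/in/a")                      # a running job needs /in/a

ok_first = se.reserve(40)            # fits in the 50 free bytes
ok_second = se.reserve(20)           # only 10 bytes remain unreserved
collected = se.garbage_collect()     # the pinned file must survive
print(ok_first, ok_second, collected, sorted(se.files))
```

A scheduler following the text would call `reserve` before dispatching a job that produces large output, and would send the job elsewhere when the call fails.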

1.2 Contribution of this Thesis

This thesis has been carried out at CERN within the Worldwide LHC Computing Grid (WLCG) Project. The aim of the project is to provide the four LHC experiments at CERN with a production-ready Grid infrastructure, distributed all over the world, that allows for the storage, processing, and analysis of the data produced by the four particle collision detectors positioned on the Large Hadron Collider (LHC) accelerator running at CERN.

Each of the four high-energy physics (HEP) experiments records collision events generated by the particles accelerated by the LHC. The raw data generated can amount to several petabytes for each experiment. Each member of the worldwide-distributed collaboration of physicists needs efficient, easy, and transparent access to the data in order to analyze them and contribute to physics discoveries. Therefore, data management and storage access are important factors in the WLCG Grid infrastructure.

Among the original contributions of this thesis work we list the following:

1. A comprehensive overview of the state of the art in storage management in the Grid, including open issues and current developments.

2. A study of the Storage Resource Manager (SRM) interface, an attempt to define and propose a common interface to the diverse storage solutions adopted by the distributed computing centers.

3. A formal model of the SRM protocol as defined by the interface as the set of ordered interactions between client and server.

4. The proposal of a model for a schema to be used to publish the information related to SRM-based storage services.

5. The application of the black-box testing methodology to the SRM in order to uncover incompleteness and incoherence in the specification.

6. A study on how to reduce the number of tests to be designed to validate the protocol implementations.

7. A presentation and analysis of the results obtained during the test phase.

8. A discussion of the limitations and open issues of the currently available version of SRM.

9. A proposal for a new version of the SRM protocol that addresses the need for quota management and lock functions, besides solving the intrinsic problems posed by the current versions.

1.3 Outline

Chapter 2 gives a comprehensive introduction to Grid computing, pointing out its key aspects, general components, and existing implementations. In Chapter 3 we introduce the storage problem in the Grid and outline the requirements of the LHC experiments. In Chapter 4 we examine some of the existing storage solutions proposed by different vendors, from both hardware and software perspectives, and the characteristics that make them interesting for implementing Grid solutions. In particular, we give an overview of existing distributed and parallel file systems and describe StoRM, a disk-based storage system built on parallel file systems such as GPFS. StoRM was used for an initial verification of the feasibility of the proposed SRM protocol. In Chapter 5 we give an overview of the file access and transfer protocols used by HEP applications running on the WLCG infrastructure. We also introduce GridFTP, one of the very first Grid-enabled transfer protocols, which is in wide use today in the Grid infrastructure for e-Science. In Chapter 6 we analyze the Storage Resource Manager (SRM) version 2.2, an attempt to standardize storage management and access in the Grid. In particular, we define a formal model for the protocol behind the interface. In Chapter 7 we propose a model and a schema for publishing in the Grid the information related to an SRM-based storage service. We introduce new concepts, such as Storage Areas and Components, that will be exposed only in version 3 of the SRM protocol. In Chapter 8 we present the application of the black-box testing methodology to SRM in order to check the consistency and coherence of the specification and to validate existing implementations. The results are presented and analyzed together with some notes on the practical lessons learned. Finally, in Chapter 9 we give some concluding remarks. We discuss the limitations and open issues of version 2.2 of SRM and introduce version 3, currently under definition. We also give an overview of possible future work in this area, aimed at achieving a completely functional storage solution for the Grid.
