
Data Flow Quality Monitoring in Data Infrastructures


Academic year: 2021

i i “output” — 2017/1/11 — 16:39 — page I — #7 i i i i i i

Summary of Ph.D. Activities

Research activity

During the three-year Ph.D. programme, the candidate’s activities focused mainly on research and development in the field of (research) data infrastructures in support of science. In particular, the candidate participated in the EU projects OpenAIREplus (Call: FP7-INFRASTRUCTURES-2011-2, grant agreement: 283595), OpenAIRE2020¹ (Call: EINFRA-2014-1, grant agreement: 643410), SoBigData (Call: H2020-INFRAIA-2014-2015, grant agreement: 654024) and EAGLE (Call: CIP-ICT-PSP-2012-6, grant agreement: 325122).

Data Infrastructures (DIs) are large ICT (eco)systems made up of data storage and processing services and components, possibly combined into data flows so as to enable arbitrarily complex data manipulation actions that serve the consumption needs of DI customers, be they humans or machines. The data resulting from the execution of data flows are an important asset both for the DI users, who typically seek the information they need, and for the organization (or community) operating the DI, whose existence and cost sustainability depend on the adoption and usefulness of the DI. Hence, it is vital to provide guarantees on the “correctness” of the behaviour of the DI data flows over time, quantified in terms of “data quality” and “processing quality”. However, the lack of assurances from aggregated data sources, the occurrence of unexpected errors at any level, and the ever-changing nature of such infrastructures (in terms of algorithms and data flows) make this “validation process” far from trivial.

For this purpose, the candidate’s Ph.D. research focused mainly on studying the open monitoring issues of the DIs developed by the candidate’s research group (e.g. OpenAIRE) and on designing solutions for monitoring data flow quality in DIs. The research led to the design and realization of MoniQ, a system that provides a flexible way to outline monitoring scenarios, define the semantics and time ordering of the metrics to be sampled from the data flow, gather observations of these metrics from instances of the DI’s data flows, and promptly detect anomalies that could harm or disrupt the DI or its stakeholders. Since 2015, MoniQ has been actively employed to monitor the production environment of the OpenAIRE infrastructure.

¹ OpenAIRE project, https://www.openaire.eu
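As a purely illustrative sketch of the monitoring idea described above (metrics sampled from successive runs of a data flow and checked for anomalies), the following Python fragment may help. It is not MoniQ's actual API or data model: the names `Metric`, `max_relative_drop` and `detect_anomalies`, the `harvested_records` metric and the 10% threshold are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Metric:
    """Hypothetical model: a named quantity sampled once per data flow run."""
    name: str
    observations: List[Tuple[int, float]] = field(default_factory=list)  # (run_id, value)

    def observe(self, run_id: int, value: float) -> None:
        self.observations.append((run_id, value))


def max_relative_drop(threshold: float) -> Callable[[float, float], bool]:
    """Build a control that passes unless the value drops by more than
    `threshold` (a fraction) relative to the previous run."""
    def control(previous: float, current: float) -> bool:
        if previous == 0:
            return True  # no meaningful relative drop from zero
        return (previous - current) / previous <= threshold
    return control


def detect_anomalies(metric: Metric, control: Callable[[float, float], bool]) -> List[int]:
    """Return the run ids whose observation violates the control
    when compared with the immediately preceding run."""
    ordered = sorted(metric.observations)  # time ordering by run id
    return [run for (_, prev), (run, cur) in zip(ordered, ordered[1:])
            if not control(prev, cur)]


# Assumed example data: record counts observed over four runs.
records = Metric("harvested_records")
for run_id, count in [(1, 1000), (2, 1020), (3, 600), (4, 610)]:
    records.observe(run_id, count)

anomalous_runs = detect_anomalies(records, max_relative_drop(0.10))
print(anomalous_runs)
```

In this toy setting only the third run is flagged, because the count falls by roughly 41% with respect to the second run; a real monitoring scenario would combine many such metrics and controls.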

The candidate spent a visiting period (May – June 2016) at the Knowledge Media Institute (KMi) of the Open University, based in Milton Keynes, UK. The collaboration proposal was selected for funding under the Grant for Young Mobility (GYM) scheme provided by the CNR, and dealt with the study of the CORE infrastructure² in order to understand its internals, dynamics and data flows, and to analyze its still-open concerns in data flow quality monitoring. The joint work helped extract new use cases for testing MoniQ and assess its theoretical foundation and practical implications. Furthermore, the collaboration helped evaluate the extent to which the proposed approach and framework are flexible and generic in capturing common aspects of data flow quality monitoring. The activities carried out jointly covered the following points:

• study of the CORE infrastructure;

• study of CORE data flows and identification of key monitoring concerns and requirements of the infrastructure;

• validation and enhancement of MoniQ w.r.t. the identified requirements.

The objective pursued by this collaboration is threefold:

• assessment and validation of the approach to data flow quality monitoring and assessment of its genericity and flexibility;

• production of new use cases to be described in this thesis;

• test of MoniQ in a real-world context other than OpenAIRE.

The experimentation led to the realization of another experimental monitoring instance based on MoniQ, which monitors the production environment of the CORE infrastructure.

Publications

International Journals

1. Artini, M., Atzori, C., Bardi, A., La Bruzzo, S., Manghi, P. and Mannocci, A. (2015). The OpenAIRE Literature Broker Service for Institutional Repositories. In D-Lib Magazine, 21(11), 3.

2. Manghi, P., Artini, M., Atzori, C., Bardi, A., Mannocci, A., La Bruzzo, S., Candela, L., Castelli, D. and Pagano, P. (2014). The D-NET software toolkit: A framework for the realization, maintenance, and operation of aggregative infrastructures. In Program, 48(4), 322-354.

International Conferences/Workshops with Peer Review

1. Mannocci, A. and Manghi, P. (2016, September). DataQ: A Data Flow Quality Monitoring System for Aggregative Data Infrastructures. In 20th International Conference on Theory and Practice of Digital Libraries (TPDL) (pp. 357-369). Springer International Publishing.

2. Mannocci, A., Casarosa, V., Manghi, P. and Zoppi, F. (2016, January). The EAGLE data aggregator: data quality monitoring. In 7th EAGLE International Conference.

3. Mannocci, A., Casarosa, V., Manghi, P. and Zoppi, F. (2015, January). The EAGLE Europeana network of Ancient Greek and Latin Epigraphy: a technical perspective. In 11th Italian Research Conference on Digital Libraries (IRCDL).

4. Mannocci, A., Casarosa, V., Manghi, P. and Zoppi, F. (2014, November). The Europeana Network of Ancient Greek and Latin Epigraphy Data Infrastructure. In 8th Research Conference on Metadata and Semantics Research (MTSR) (pp. 286-300). Springer International Publishing.

5. Casarosa, V., Manghi, P., Mannocci, A., Rivero Ruiz, E. and Zoppi, F. (2014). A Conceptual Model for Inscriptions: Harmonizing Digital Epigraphy Data Sources. In 1st EAGLE International Conference on Information Technologies for Epigraphy and Digital Cultural Heritage in the Ancient World (pp. 29-30).

Others

1. Mannocci, A. (2015). Data Flow Monitoring: system requirements and design. OpenAIRE2020. Deliverable D8.4.

2. Mannocci, A. (2014, November). Monitoring data quality in research data infrastructures. In 4th RDA plenary meeting. Poster session.
