i i “output” — 2017/1/11 — 16:39 — page I — #7 i i i i i i
Summary of Ph.D. Activities
Research activity
During the three-year Ph.D. programme, the candidate’s activities mainly focused on research and development within the field of (research) data infrastructures in support of science. In particular, the candidate participated in the EU projects OpenAIREplus (Call: FP7-INFRASTRUCTURES-2011-2, grant agreement: 283595), OpenAIRE2020¹ (Call: EINFRA-2014-1, grant agreement: 643410), SoBigData (Call: H2020-INFRAIA-2014-2015, grant agreement: 654024) and EAGLE (Call: CIP-ICT-PSP-2012-6, grant agreement: 325122).
Data Infrastructures (DIs) are large ICT (eco)systems made up of data storage and processing services and components, possibly combined into data flows so as to enable arbitrarily complex data manipulation actions serving the consumption needs of DI customers, be they humans or machines. The data resulting from the execution of data flows represent an important asset both for the DI users, who typically seek the information they need, and for the organization (or community) operating the DI, whose existence and cost sustainability depend on the adoption and usefulness of the DI. Hence, it is vital to provide guarantees on the “correctness” of the behaviour of DI data flows over time, to be quantified in terms of “data quality” and “processing quality”. However, the lack of assurances from aggregated data sources, the occurrence of unexpected errors of any sort at any level, and the ever-changing nature of such infrastructures (in terms of algorithms and data flows) make this “validation process” far from trivial.
For this purpose, the candidate’s Ph.D. research mainly focused on the study of the open monitoring issues of the DIs developed by the candidate’s research group (e.g. OpenAIRE), and on the design of solutions for monitoring data flow quality in DIs. The research efforts led to the design and realization of MoniQ, a system providing a flexible way to outline monitoring scenarios, define the semantics and time ordering of metrics to be sampled from the data flow, gather observations about these metrics from the DI’s data flow instances, and promptly detect anomalies potentially harmful or disruptive for the DI or for its stakeholders. Since 2015, MoniQ has been actively employed to monitor the production environment of the OpenAIRE infrastructure.
¹ OpenAIRE project, https://www.openaire.eu
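As a rough illustration of the monitoring idea summarized above (metrics sampled from successive data flow runs, observations gathered over time, anomalies flagged), a minimal sketch might look as follows. All names here (`Observation`, `sample`, `detect_drops`, the `records` metric and the 20% threshold) are assumptions made for this example and do not reflect MoniQ's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical names throughout: this is not MoniQ's actual API.

@dataclass(frozen=True)
class Observation:
    metric: str   # name of the sampled metric
    run: int      # ordinal of the data flow run (gives the time ordering)
    value: float  # value observed in that run

# A metric maps a snapshot of one data flow run to a number.
Metric = Callable[[Dict[str, float]], float]

def sample(metrics: Dict[str, Metric], run: int,
           snapshot: Dict[str, float]) -> List[Observation]:
    """Gather one observation per metric from a single data flow run."""
    return [Observation(name, run, fn(snapshot)) for name, fn in metrics.items()]

def detect_drops(history: List[Observation], metric: str,
                 max_drop: float) -> List[int]:
    """Return the runs where `metric` fell by more than `max_drop`
    (a fraction) relative to the previous run."""
    series = sorted((o for o in history if o.metric == metric),
                    key=lambda o: o.run)
    anomalies = []
    for prev, curr in zip(series, series[1:]):
        if prev.value > 0 and (prev.value - curr.value) / prev.value > max_drop:
            anomalies.append(curr.run)
    return anomalies

# Made-up snapshots of three consecutive runs of an aggregation data flow.
metrics = {"records": lambda snap: snap["records"]}
history: List[Observation] = []
for run, snap in enumerate([{"records": 1000}, {"records": 1020}, {"records": 600}]):
    history.extend(sample(metrics, run, snap))

flagged = detect_drops(history, "records", max_drop=0.2)
print(flagged)
```

In this sketch the third run is flagged because the number of records falls by more than 20% with respect to the previous run; a real deployment would presumably combine several metrics and richer rules.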
The candidate spent a visiting period (May – June 2016) at the Knowledge Media Institute (KMi) of the Open University, based in Milton Keynes, UK. The collaboration proposal was selected for funding within the Grant for Young Mobility (GYM) funding scheme provided by the CNR, and dealt with the study of the CORE infrastructure² in order to understand its internals, dynamics and data flows, and to analyze its still-open concerns in data flow quality monitoring. The joint work helped to extract new use cases for testing MoniQ and to assess its theoretical foundation and practical implications. Furthermore, the collaboration helped to evaluate to what extent the proposed approach and framework are flexible and generic in capturing common aspects of data flow quality monitoring. The activities carried out jointly covered the following points:
• study of the CORE infrastructure;
• study of CORE data flows and identification of key monitoring concerns and requirements of the infrastructure;
• validation and enhancement of MoniQ w.r.t. the identified requirements.
The objective pursued by this collaboration is threefold:
• assessment and validation of the approach to data flow quality monitoring and assessment of its genericity and flexibility;
• production of new use cases to be described in this thesis;
• test of MoniQ in a real-world context other than OpenAIRE.
The experimentation led to the realization of another experimental monitoring instance based on MoniQ, monitoring the production environment of the CORE infrastructure.
Publications
International Journals
1. Artini, M., Atzori, C., Bardi, A., La Bruzzo, S., Manghi, P. and Mannocci, A. (2015). The OpenAIRE Literature Broker Service for Institutional Repositories. In D-Lib Magazine, 21(11), 3.
2. Manghi, P., Artini, M., Atzori, C., Bardi, A., Mannocci, A., La Bruzzo, S., Candela, L., Castelli, D. and Pagano, P. (2014). The D-NET software toolkit: A framework for the realization, maintenance, and operation of aggregative infrastructures. In Program, 48(4), 322-354.
International Conferences/Workshops with Peer Review
1. Mannocci, A. and Manghi, P. (2016, September). DataQ: A Data Flow Quality Monitoring System for Aggregative Data Infrastructures. In 20th International Conference on Theory and Practice of Digital Libraries (TPDL) (pp. 357-369). Springer International Publishing.
2. Mannocci, A., Casarosa, V., Manghi, P. and Zoppi, F. (2016, January). The EAGLE data aggregator: data quality monitoring. In 7th EAGLE International Conference.
3. Mannocci, A., Casarosa, V., Manghi, P. and Zoppi, F. (2015, January). The EAGLE Europeana network of Ancient Greek and Latin Epigraphy: a technical perspective. In 11th Italian Research Conference on Digital Libraries (IRCDL).
4. Mannocci, A., Casarosa, V., Manghi, P. and Zoppi, F. (2014, November). The Europeana Network of Ancient Greek and Latin Epigraphy Data Infrastructure. In 8th Research Conference on Metadata and Semantics Research (MTSR) (pp. 286-300). Springer International Publishing.
5. Casarosa, V., Manghi, P., Mannocci, A., Rivero Ruiz, E. and Zoppi, F. (2014). A Conceptual Model for Inscriptions: Harmonizing Digital Epigraphy Data Sources. In 1st EAGLE International Conference on Information Technologies for Epigraphy and Digital Cultural Heritage in the Ancient World (pp. 29-30).
Others
1. Mannocci, A. (2015). Data Flow Monitoring: system requirements and design. OpenAIRE2020. Deliverable D8.4.
2. Mannocci, A. (2014, November). Monitoring data quality in research data infrastructures. In 4th RDA plenary meeting. Poster session.