
Algorithms for Knowledge and Information Extraction in Text with Wikipedia


Academic year: 2021


Report on the PhD Activities

Marco Ponza

February 19, 2019

Research Activities

Marco Ponza’s PhD thesis focuses on the design of algorithms for extracting knowledge (in terms of entities belonging to a knowledge graph) and information (in terms of open facts) from text, using Wikipedia as the main repository of world knowledge.

The first part of the dissertation addresses research problems in the domain of knowledge and information extraction. In this context, Ponza contributes to the scientific literature with three achievements. First, he studies the problem of computing the relatedness between Wikipedia entities: he introduces a new dataset of human judgements, complements it with a study of all entity relatedness measures proposed in recent literature, and proposes a new computationally lightweight two-stage framework for relatedness computation. Second, he studies the problem of entity salience through the design and implementation of a new system that identifies the salient Wikipedia entities occurring in an input text and improves the state of the art on different datasets. Third, he introduces a new research problem called fact salience, which addresses the task of detecting salient open facts extracted from an input text, and he proposes, designs and implements the first system that effectively solves it.

In the second part of the dissertation, Ponza studies an application of knowledge extraction tools in the domain of expert finding. He proposes a new system that hinges upon a novel profiling technique, which models people (i.e., experts) through a small labeled graph drawn from Wikipedia. This profiling technique is then used to design a novel suite of ranking algorithms for matching the user query, whose effectiveness is shown by improvements over state-of-the-art solutions.

Training Activities

Schools:

Bertinoro International Spring School 2016 (BISS 2016). The school was held at the University Residential Center of Bertinoro (FC), 6-11 March. The candidate attended the following three courses and passed the related exams:

1. Algorithmic Methods for Mining Large Graphs

Lecturer: Prof. Aristides Gionis (Aalto University, Finland)

2. Advanced Topics in Programming Languages

Lecturer: Prof. Giuseppe Castagna (Université Paris Diderot - Paris 7, France)

3. Models and Languages for Service-Oriented and Cloud Computing

Lecturer: Prof. Gianluigi Zavattaro (University of Bologna, Italy)


Courses:

• Course “Machine Learning Techniques and Selected Applications for Big Data”

Lecturer: Prof. Stan Matwin (Dalhousie University, Canada)

• Course “Searching by Similarity on a Very Large Scale”

Lecturer: Prof. Giuseppe Amato (CNR Pisa, Italy)

Seminar Cycles:

• Seminar at GATE Summer School (2016)

• PhD+ 2016

• Research, Innovation and Future of ICT (2018)

Period Abroad

• Max Planck Institute for Informatics, Saarbrücken (Germany)

From August 2017 to October 2017 and from November 2017 to February 2018

Publications

M. Ponza, F. Piccinno and P. Ferragina. Document Aboutness via Sophisticated Syntactic and Semantic Features. In Proceedings of the 2017 International Conference on Natural Language and Information Systems, NLDB 2017, pages 441–453, Lecture Notes in Computer Science, Springer.

M. Ponza, P. Ferragina and S. Chakrabarti. A Two-Stage Framework for Computing Entity Relatedness in Wikipedia. In Proceedings of the 2017 International Conference on Information and Knowledge Management, CIKM 2017, pages 1867–1876, ACM.

M. Ponza, L. Del Corro and G. Weikum. Facts That Matter. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018, pages 1043–1048, ACL.

P. Cifariello, P. Ferragina and M. Ponza. WISER: A Semantic Approach for Expert Finding in Academia based on Entity Linking. Information Systems 2019, pages 1–16, Elsevier.

M. Ponza, F. Piccinno and P. Ferragina. SWAT: A System for Detecting Salient Wikipedia Entities in Texts. Under review at Computational Intelligence, Wiley.
