Distributional Models of Emotions for Sentiment Analysis in Social Media

(1)

i i “output” — 2017/1/12 — 11:30 — page IX — #11 i i i i i i

Summary of PhD Achievements

Research Activity

During the PhD program, research activity has been mainly focused on two research areas: Text Affective Computing and Information Extraction.

Text Affective Computing

In order to implement algorithms for the recognition of sentiment and emotions in texts, we created two main lexical resources, namely the Italian EMotive Lexicon (ItEM) and FB-NEWS15, that have been described in this thesis.

The construction of ItEM started in 2014/2015 with the collection of a small set of seed words using an online feature elicitation paradigm. The output of this phase was a small lexicon of highly emotive lemmas. In a second phase (period 2015/2016), ItEM was expanded using distributional methods [154].

During the last year we refined the process of seed expansion using context selection strategies [155]. Moreover, we collected the corpus FB-NEWS15 [150], composed of Facebook Italian posts (time span: Jan 2015- Dec 2015), created by crawling the Facebook pages of the most important Italian newspapers. The corpus includes a high number of comments and contains useful material to study the social media language.

Information Extraction

The main project we were involved in during the PhD was SEMPLICE - SEMantic instruments for PubLIC admnistrators and CitizEns (POR CReO 2007 – 2013). It is a 2-year project funded by Regione Toscana aimed at the development of NLP-based tools for knowledge management, information extraction and opinion mining for local public administrations. This project involved University of Pisa and Tuscany Com-panies like 01S, BNova, Seacom. The main applications developed for this project concern (i) Named Entity Recognition; (ii) Term Extraction; (iii) Topic modeling and labeling; (iv) Rule-based information extraction.

(2)

i i “output” — 2017/1/12 — 11:30 — page X — #12 i i i i i i Other

Other domains we were involved in include Digital Humanities (DH) and Computa-tional Lexicography (CL). In DH, we collaborated in the project Memorie di guerra, in which we applied state-of-the-art NLP techniques to war memories and historical docu-ments [152]. In CL we were involved in the project Combinet to develop computational methods for extracting distributional information from corpora [31, 136]

International Conferences/Workshops with Peer Review

[C1] Boschetti, F., Cimino, A., Dell’Orletta, F., Lebani, G. E., Passaro, L. C., Picchi, P., Venturi G., Montemagni S. and Lenci, A. (2014). Computational Analysis of Historical Documents: An Application to Italian War Bulletins in World War I and II. In Proceedings of the LREC 2014 Workshop LRT4HDA. (pp. 70-76). Reykjavík (Iceland), May 2014.

[C2] Passaro, L. C. and Lenci, A. (2014). “Il Piave mormorava. . . ”: Recognizing Lo-cations and other Named Entities in Italian Texts on the Great War. In Proceedings of CLiC-it 2014. (pp. 286–290). Pisa (Italy), December 2014.

[C3] Passaro, L. C. and Lenci, A. (2015). Extracting terms with EXTra. In Proceed-ings of EUROPHRAS 2015.(pp. 188–196). Málaga (Spain), July 2015.

[C4] Castagnoli, S., Lebani, G. E., Lenci, A., Masini, F., Nissim, M. and Passaro, L. C. (2015). POS-patterns or Syntax? Comparing methods for extracting Word Combinations. In Proceedings of EUROPHRAS 2015. (pp. 116-128). Málaga (Spain), July 2015.

[C5] Passaro L. C., Pollacci, L. and Lenci, A. (2015). ItEM: A Vector Space Model to Bootstrap an Italian Emotive Lexicon. In Proceedings of CLiC-it 2015. (pp. 215–220). Trento (Italy), December 2015.

[C6] Nissim, M., Castagnoli, S., Masini, F., Lebani, G. E., Passaro, L. C. and Lenci, A. (2015). Automatic extraction of Word Combinations from corpora: evaluating methods and benchmarks. In Proceedings of CLiC-it 2015. (pp. 204-208). Trento (Italy), December 2015.

[C7] Passaro, L. C. and Lenci, A. (2016). Evaluating Context Selection Strategies to Build Emotive Vector Space Models. In Proceedings of LREC 2016. (pp. 2185-2191). Portorož (Slovenia), May 2016.

[C8] Passaro, L. C., Bondielli, A. and Lenci, A. (2016). FB-NEWS15: A Topic-Annotated Facebook Corpus for Emotion Detection and Sentiment Analysis. In Proceedings of CLiC-it 2016. (pp. 228-232). Napoli (Italy), December 2016.

Others

[O1] Passaro, L. C., Lebani, G. E., Pollaci, L., Chersoni, E., and Lenci, A. (2014). The CoLing Lab system for Sentiment Polarity Classification of tweets. In Pro-ceedings of EVALITA 2014. (pp. 87-92). Pisa (Italy), December 2014.

(3)

i i “output” — 2017/1/12 — 11:30 — page XI — #13 i i i i i i

[O2] Passaro, L. C., Bondielli, A. and Lenci, A. (2016). Exploiting Emotive Features for the Sentiment Polarity Classification of tweets. In Proceedings of EVALITA 2016. (pp. 205-210). Napoli (Italy), December 2016.