UNIVERSITÀ DI PISA
DIPARTIMENTO DI INGEGNERIA DELL’INFORMAZIONE
Dottorato di Ricerca in Ingegneria dell’Informazione
Activity Report by the Student Fabio Del Vigna - cycle XXX PhD program
Tutor(s): Prof. Marco Avvenuti, Dr. Maurizio Tesconi
1. Research Activity
Over the last few years, the pervasiveness of Social Media within the Web landscape has been overwhelming. The most important social networking sites have achieved significant global coverage, with Facebook hosting over 2 billion registered users, followed by YouTube, Twitter and Chinese Social Media such as Sina Weibo. Instant messaging tools such as FB Messenger, WhatsApp, Snapchat and WeChat have also grown considerably.
These platforms are connected to services and features ranging from live video to news sharing and product reviews, often in real time. Thanks to their multimedia nature, the speed at which information spreads, and their coverage in terms of users, Social Media platforms have become a source of rich and fresh information that is very interesting to study.
The exploitation of Social Media content enables data scientists to achieve situational awareness. Since the first year of my Ph.D. I have pursued the detection and monitoring of events using Social Media. In particular, such content can be used to detect and monitor mass emergencies, often caused by natural disasters such as earthquakes and floods, but also by widespread incidents or terrorist attacks.
During my Ph.D. I applied data analysis techniques to messages exchanged in the wake of mass emergencies, enabling prompt detection of the events [C2]. Furthermore, textual content can be used to estimate the damage suffered by different areas and to generate crisis maps in the aftermath of natural disasters [J1]. I carried out a case study on the Amatrice earthquake (August 2016) [J3], highlighting that, one hour after the event, the messages shared by users on Twitter were sufficient to identify the most affected areas and the towns that required urgent intervention by emergency responders, such as the civil protection.
To compete with state-of-the-art text classifiers, I made use of an NLP text classifier [J1] for Italian texts and a text classifier based on Word Embeddings [J3], which is not limited to a specific language. This makes the system versatile, since it can handle any written language, provided adequate training.
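As a rough illustration of why an embedding-based classifier is not tied to one language, the sketch below averages word vectors and assigns the label of the closest class centroid. It is not the classifier described in [J3]: the two-dimensional vectors, the vocabulary, the centroids and the function names are all hypothetical, chosen only to keep the example self-contained.

```python
import math

# Hypothetical toy embeddings: words from different languages with similar
# meaning sit close together. Real embeddings would be learned from a corpus.
EMBEDDINGS = {
    "crollata": (0.9, 0.1), "collapsed": (0.9, 0.1), "damage": (0.8, 0.2),
    "salvi": (0.1, 0.9), "safe": (0.1, 0.9), "fine": (0.2, 0.8),
}

# Hypothetical class centroids in the same vector space.
CENTROIDS = {"damage": (1.0, 0.0), "no_damage": (0.0, 1.0)}


def embed(text):
    """Average the vectors of the known tokens; return None if none is known."""
    vectors = [EMBEDDINGS[tok] for tok in text.lower().split() if tok in EMBEDDINGS]
    if not vectors:
        return None
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def classify(text):
    """Return the label whose centroid is most similar to the text vector."""
    vector = embed(text)
    if vector is None:
        return None
    return max(CENTROIDS, key=lambda label: cosine(vector, CENTROIDS[label]))


english = classify("the house collapsed")
italian = classify("la casa è crollata")
unknown = classify("hello world")
```

The same `classify` function handles the English and the Italian message because it only depends on the vectors, not on language-specific rules; a message with no known token is left unclassified.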
To best support event detection and mapping, I developed a crawling process oriented to data analysis and designed for change. The architecture I present in [J3] is based on Big Data technologies and is able to deal with real-time data streams using incremental updates. In particular, I took care of the scalability of the system, making it able to analyze messages with a robust, multiprocess approach. I introduced Elasticsearch as the storage backend of the system and designed interactive Kibana visualizations that draw choropleth maps [J1, J3]. With the use of Word Embeddings and the integration with commercial tools, I have made the code more portable and easier to maintain.
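The idea of incremental updates over a message stream can be sketched as follows. This is only a minimal in-memory illustration, not the architecture of [J3]: the message fields `area` and `damage` and the function name are assumptions, and the real system stores its data in Elasticsearch rather than in a Python counter.

```python
from collections import Counter


def update_damage_counts(counts, batch):
    """Fold a new batch of classified messages into the running per-area counts."""
    for message in batch:
        # Only messages tagged as damage reports (hypothetical "damage" flag)
        # and resolved to a known place (hypothetical "area" field) contribute.
        if message.get("damage") and message.get("area"):
            counts[message["area"]] += 1
    return counts


# Each batch updates the existing counts instead of recomputing them from scratch.
counts = Counter()
update_damage_counts(counts, [
    {"area": "Amatrice", "damage": True},
    {"area": "Accumoli", "damage": True},
    {"area": "Amatrice", "damage": False},
])
update_damage_counts(counts, [
    {"area": "Amatrice", "damage": True},
    {"area": None, "damage": True},
])
```

Per-area counts of this kind are what a choropleth map would color; keeping them as a running aggregate means each new batch costs time proportional to its own size only.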
I also addressed hate speech, an important phenomenon affecting Social Media today. It disrupts communication and may also have serious real-life implications, but it can be tackled using techniques close to those described in [C5]. In particular, using a Machine Learning approach, I built a tool for monitoring hate speech on Facebook pages, which can report anomalies when the hate level exceeds a warning threshold [C5]. I trained the application on Italian Social Media messages that were manually annotated. The system uses two different algorithms to classify content (SVM and LSTM), and its performance is comparable to that of the most advanced hate speech classifiers for the English language. Hate speech detection is particularly needed where minors and young people are involved, due to the sadly well-known phenomenon of cyberbullying, and for acts of hatred and violence in the streets (e.g., black blocs) that originate on Social Networks as topics or threads containing flame discussions.
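The warning-threshold monitoring mentioned above could look roughly like the sketch below. It assumes that a classifier (SVM or LSTM in the original work) has already labeled each comment as hateful or not; the sliding window, the definition of "hate level" as the fraction of hateful comments in that window, and the class name are assumptions made for illustration, not details taken from [C5].

```python
from collections import deque


class HateLevelMonitor:
    """Track the share of hateful comments among the most recent ones."""

    def __init__(self, window_size=100, threshold=0.3):
        # deque with maxlen drops the oldest label once the window is full.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def hate_level(self):
        """Fraction of hateful comments in the current window (0.0 if empty)."""
        return sum(self.window) / len(self.window) if self.window else 0.0

    def add(self, is_hate):
        """Record one classified comment; return True if the level exceeds the threshold."""
        self.window.append(1 if is_hate else 0)
        return self.hate_level() > self.threshold


monitor = HateLevelMonitor(window_size=4, threshold=0.5)
labels = [False, True, True, True, False]  # hypothetical classifier output
alerts = [monitor.add(label) for label in labels]
```

With a window of four comments and a threshold of 0.5, the monitor starts raising a warning at the third comment, when two of the three comments seen so far are hateful.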
Another important work I carried out on Social Media concerns the possibility of spotting anonymized identities in Social Media documents. This result is significant when dealing with messages that have been purposely obfuscated for political or military reasons (e.g., censorship). In my work [J4], I demonstrated that, when censorship is applied to Facebook messages, it is possible to identify a set of candidates that fit an anonymized context. The experiments, conducted on Social Media messages related to American newspapers, showed a good detection rate, using only users' comments on posts to guess the censored identities. The approach could be used to circumvent the censorship applied by some countries or institutions to news, or to identify criminals/enemies in a cyber war scenario.
My recent work also includes an insight into the impact of Social Media at the social and political level [O6] and into how Social Media is tied to Big Data. Very often, in fact, these platforms are used in electoral campaigns or to spread political messages. The study [J2] conducted during my Ph.D. discusses the implications of opening Social Media based tools to the public and the benefits that the population can derive, while also tackling the problem of their misuse and the consequences that could result from it [C1].
Projects
European project JUST / 2013 CASSANDRA: I took part in the CASSANDRA European research project, through which I investigated and monitored the diffusion and life cycle of New Psychoactive Substances (NPS).
I contributed to the project by creating a set of tools designed to draw interactive visualizations [C3], with the aim of supporting analysts in the data exploration process while offering researchers the possibility to spot new trends and anomalies.
Furthermore, I supported the project by developing a system able to identify new substances and their effects [C4]. Throughout the project, I focused on analyzing and tracking the behavior of drug consumers. My recent studies highlighted that a significant subset of drug forum users take part in threads regarding more than one substance. Notably, I found that consumers discuss different NPS as they are declared illegal, but some are attracted by different categories of substances, sometimes with opposite effects.
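The observation about users taking part in threads on more than one substance can be checked with a simple grouping step, sketched below. The input format (pairs of user and substance, one per forum post) and the sample data are hypothetical; the original analysis pipeline is not described in this report.

```python
from collections import defaultdict


def multi_substance_users(posts):
    """Return the users who posted in threads about more than one substance."""
    substances_by_user = defaultdict(set)
    for user, substance in posts:
        # A set keeps each substance once per user, however many posts there are.
        substances_by_user[user].add(substance)
    return {user for user, subs in substances_by_user.items() if len(subs) > 1}


# Hypothetical (user, substance) pairs extracted from forum threads.
posts = [
    ("u1", "substance_a"),
    ("u1", "substance_b"),
    ("u2", "substance_a"),
    ("u2", "substance_a"),
    ("u3", "substance_c"),
]
result = multi_substance_users(posts)
```

Here only `u1` is returned: `u2` posts twice but always about the same substance, so repeated activity alone does not count.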
Centro di Ricerca e Analisi per le Informazioni Multimediali (CRAIM) (CNR-PS Convenzione operativa): as a member of the CRAIM laboratory, I contributed to the project by developing a crawler for the Instagram platform and integrating it with a face detection application used by the police to spot wanted criminals and to monitor important events in Italian metropolitan areas (for security reasons). Furthermore, I worked on adapting the crisis management tool [J3] to meet the requirements of Italian law enforcement, for crime prevention and critical situation management.
#toscana15: The #toscana15 project originated from the collaboration between the IIT-CNR, the Department of Political Science of the University of Pisa, and the newspaper "Il Tirreno", with the aim of monitoring the election campaign for the administrative elections of the Tuscany region. My role in the project was to develop a set of metrics to monitor politicians and influencers on Twitter and Facebook and, by comparing them, to measure the impact of Social Networks on the population [O6].
Publications
International Journals
[J1] Avvenuti, M., Cresci, S., Del Vigna, F., & Tesconi, M. (2016). Impromptu crisis mapping to prioritize emergency response. Computer, 49(5), 28-37.
[J2] Avvenuti, M., Cresci, S., Del Vigna, F., & Tesconi, M. (2017). On the need of opening up crowdsourced emergency management systems. AI & SOCIETY, 1-6.
[J3] Avvenuti, M., Cresci, S., Del Vigna, F., Fagni, T., & Tesconi, M. (2018). CrisMap: a Big Data Crisis Mapping System Based on Damage Detection and Geoparsing. Information Systems Frontiers, 1-19.
[J4] Del Vigna, F., Petrocchi, M., Tommasi, A., Zavattari, C., & Tesconi, M. (2017). Who framed Roger Reindeer? De-censorship of Facebook posts by snippet classification. Online Social Networks and Media, Elsevier (Accepted).
International Conferences/Workshops with Peer Review
[C1] Del Vigna, F., & Cresci, S. (2015). Social Media for the Common Good: the case of EARS.
[C2] Avvenuti, M., Del Vigna, F., Cresci, S., Marchetti, A., & Tesconi, M. (2015, November). Pulling information from social media in the aftermath of unpredictable disasters. In Information and Communication Technologies for Disaster Management (ICT-DM), 2015 2nd International Conference on (pp. 258-264). IEEE.
[C3] Del Vigna, F., Avvenuti, M., Bacciu, C., Deluca, P., Petrocchi, M., Marchetti, A., & Tesconi, M. (2016, October). Spotting the diffusion of New Psychoactive Substances over the Internet. In International symposium on intelligent data analysis (pp. 86-97). Springer International Publishing.
[C4] Del Vigna, F., Petrocchi, M., Tommasi, A., Zavattari, C., & Tesconi, M. (2016, November). Semi-supervised knowledge extraction for detection of drugs and their effects. In International conference on social informatics (pp. 494-509). Springer International Publishing.
[C5] Del Vigna, F., Cimino, A., Dell’Orletta, F., Petrocchi, M., & Tesconi, M. (2017). Hate me, hate me not: Hate speech detection on Facebook. Proceedings of the First Italian Conference on Cybersecurity (pp. 86-95).
[O1] Bellomo, S., Cresci, S., Del Vigna, F., La Polla, M. N., & Tesconi, M. (2015). A platform for gathering eyewitness reports from social media users in the aftermath of emergencies. Technical Report IIT-CNR
[O2] Bacciu, C., Del Vigna, F., Marchetti, A., Tesconi, M. & Deluca, P. (2016). Towards an Automated Analysis of the Online Supply Chain of Novel Psychoactive Substances. In OCommWEBIST 2016 abstracts book (poster session).
[O3] Cresci, S., La Polla, M. N., Mazza, M., Tesconi, M. & Del Vigna, F. (2016). #selfie: mapping the phenomenon. IIT TR-08/2016
[O4] Del Vigna, F., Petrocchi, M. & Tesconi, M. (2016). Technical report on the methodology used for the analysis of websites. IIT TR-10/2016
[O5] Del Vigna, F., Petrocchi, M., Deluca, P. & Tesconi, M. (2016). Main social media analysis outcome of the CASSANDRA project. IIT TR-12/2016
[O6] Cresci, S., Del Vigna, F. & Tesconi, M. (2017). I Big Data nella ricerca politica e sociale, in Andretta, M., and Bracciale, R., (eds.), Social Media Campaigning: Le elezioni regionali in #Toscana2015, Pisa: Pisa University Press, 113-140
Pisa, 14/11/2017
The Student