Harnessing the Social Sensing Revolution: Challenges and Opportunities

Academic year: 2021

UNIVERSITÀ DI PISA

DIPARTIMENTO DI INGEGNERIA DELL’INFORMAZIONE

Dottorato di Ricerca in Ingegneria dell’Informazione

Activity Report by the Student Stefano CRESCI - cycle XXX PhD program

Tutor(s): Prof. Marco AVVENUTI, Prof. Maurizio TESCONI

1. Research Activity

The research activity I carried out during my PhD concerned a crucial aspect of Social Media Analysis: social sensing. In particular, I investigated the two sides of the social sensing coin: (i) exploring the opportunities opened up by social sensing in a practically relevant scenario, namely emergency management; and (ii) investigating the credibility and reliability of social sensors.

Regarding the first line of research, emergency management via social sensing, I designed and developed a Big Data crisis mapping system called CrisMap. CrisMap builds on my previous work in this field, in particular an algorithm for geoparsing microtexts and a machine learning classifier for detecting damage mentions in microtexts. During the third year of my PhD, I integrated these two sub-systems, which I had developed in previous years, as software components of CrisMap. CrisMap is based on a number of Big Data technologies, such as Apache Spark, Elasticsearch, Apache Kafka, and Kibana. By leveraging these state-of-the-art technologies, the system can efficiently process large volumes of social media messages in real time, while guaranteeing scalability and reliability.

Also within this line of research are my studies on crowdsensing. Specifically, I proposed a novel crowdsensing paradigm called hybrid crowdsensing, which goes beyond traditional participatory and opportunistic crowdsensing by combining the strengths of the two approaches. I developed a proof-of-concept system implementing hybrid crowdsensing and benchmarked it in the emergency management field. The results of this experimentation demonstrated the usefulness of hybrid crowdsensing with regard to the amount and quality of collected information, as well as the overall responsiveness of the system. Other activities that I carried out during my PhD and that relate to this line of research concern the development of a Natural Language Processing (NLP) machine learning classifier for witness detection in social media, and a discussion of the need to open up social media-based emergency management systems.

My research activities on the characterization and detection of deceptive social sensors focused on characterizing and fighting fake accounts and social spambots on Twitter. Regarding the characterization and detection of fake Twitter accounts, I built a baseline dataset of accounts where humans and fakes are known a priori. Then, I tested known methodologies for bot and spam detection on the baseline dataset. The results of the analysis suggested that fake follower detection deserves specialized mechanisms. Specifically, algorithms based on classification rules do not succeed in detecting the fake followers in the baseline dataset. Instead, classifiers based on feature sets for spambot detection also work quite well for fake follower detection. Finally, building on a crawling-cost analysis, I designed and implemented a set of lightweight classifiers that use the less costly features while still correctly classifying more than 95% of the accounts.

Regarding spambots, a few recent works anecdotally highlighted the emergence of a new wave of malicious and deceptive accounts with human-like characteristics. However, these works neither provided quantitative evidence of the extent of the problem nor proposed solutions for detecting such accounts. Thus, I first carried out the first quantitative assessment of the extent of the social spambot problem on Twitter. I did so by testing Twitter’s capacity to detect and remove social spambots. Then, I carried out a large-scale crowdsourcing experiment to evaluate the capacity of Twitter users to recognize social spambots in the wild. Finally, I benchmarked a number of state-of-the-art spambot detection techniques and tools to evaluate their effectiveness against social spambots. Given the unsatisfactory results measured in all these experiments, I then proposed a novel spambot detection technique that leverages the digital DNA behavioural modelling technique that I developed during the second year of my PhD. I benchmarked the proposed detection technique, demonstrating its excellent results, also with respect to efficiency and scalability, and its robustness against evasion techniques. In addition, I exploited the DNA-inspired behavioural modelling technique to study the characteristics of human behaviour in online social networks.

Finally, given my expertise in the characterization and detection of deception in social media, I co-organized a Special Session on “Data Science in Societal Debates” at the 2017 International Conference on Data Science and Advanced Analytics (IEEE). I personally chaired the session during the conference.

2. Formation Activity

During my PhD I attended and successfully completed the following courses:
• Dmitry G. Korzun, Petrozavodsk State University (Russia), “Smart spaces”;
• Franco Flandoli, University of Pisa, “Corso di Dottorato di Probabilità, Statistica e Processi Stocastici”;
• M. Luise, L. Sanguinetti, University of Pisa, “Game Theory and Optimization in communications and Networking”;
• PHD PLUS 2016, University of Pisa, “Valorizzazione della ricerca, Innovazione, Spirito imprenditoriale”;
• Lorenzo Natale, IIT, “Middleware and robotic software programming”.

I also obtained a Master’s degree in “Big Data Analytics and Social Mining” from the University of Pisa in 2016, gaining expertise in:
• Data Management for Business Intelligence;
• Data Mining and Machine Learning;
• Web Search Engines and Information Retrieval;
• Analytical Crawling, Text Annotation;
• Big data sources, crowdsourcing, crowdsensing;
• Data Visualization & Visual analytics;
• High Performance & Scalable Analytics, NO-SQL Big Data Platforms;
• Social Network Analysis;
• Mobility Data Analysis;
• Sentiment Analysis & Opinion Mining;

• Big Data for Business;
• Big Data Ethics;
• Data Journalism & Story Telling.

Finally, I attended and successfully completed two PhD summer schools:
• 2016 summer school on “Computational Complex and Social Systems”, University of Catania;
• 2016 summer school on “Social Networks Security, Privacy, and Trust”, University of Padua.

3. Research Periods at Qualified Research Institutions

From September to December 2015, I joined Nokia Bell Labs (formerly Alcatel-Lucent Bell Labs) for a research internship under the supervision of Prof. Roberto Di Pietro. During my internship, I studied behavioural modelling techniques and their application to modelling the online behavioural patterns of social network users. Later, I exploited these techniques to improve the performance of state-of-the-art spambot detection techniques.

4. Projects

My research activities related to emergency management via social sensing were carried out as part of the FAR-FAS Regional project “SmartNews” (Social Sensing for Breaking News) and the Registro .it funded project SoS – Social Sensing. Additionally, all the activities related to the characterization and detection of deceptive social sensors were carried out as part of the “SoBigData Research Infrastructure”, a Horizon2020 project in which I am involved.

5. Publications

Journals

J1. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., and Tesconi, M. (2017). Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling. IEEE Transactions on Dependable and Secure Computing. IEEE. (in press)

J2. Cresci, S., Avvenuti, M., La Polla, M., Meletti, C., and Tesconi, M. (2017). Nowcasting of Earthquake Consequences using Big Social Data. IEEE Internet Computing. IEEE.

J3. Avvenuti, M., Cresci, S., Del Vigna, F., and Tesconi, M. (2017). On the need of opening up crowdsourced emergency management systems. AI & SOCIETY, 1-6. Springer.

J4. Avvenuti, M., Cresci, S., Marchetti, A., Meletti, C., and Tesconi, M. (2016). Predictability or early warning: using social media in modern emergency response. IEEE Internet Computing, 20(6), 4-6. IEEE.

J5. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., and Tesconi, M. (2016). DNA-inspired online behavioral modeling and its application to spambot detection. IEEE Intelligent Systems, 31(5), 58-64. IEEE.

J6. Avvenuti, M., Cresci, S., Del Vigna, F., and Tesconi, M. (2016). Impromptu crisis mapping to prioritize emergency response. Computer, 49(5), 28-37. IEEE.

J7. Avvenuti, M., Cimino, M. G., Cresci, S., Marchetti, A., and Tesconi, M. (2016). A framework for detecting unfolding emergencies using humans as sensors. SpringerPlus, 5(1), 43. Springer.

J8. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., and Tesconi, M. (2015). Fame for sale: efficient detection of fake Twitter followers. Decision Support Systems, 80, 56-71. Elsevier.

Conferences/workshops

C1. Vadicamo, L., Carrara, F., Falchi, F., Cresci, S., Tesconi, M., Cimino, A., and Dell’Orletta, F. (2017, October). Cross-Media Learning for Image Sentiment Analysis in the Wild. In Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops. IEEE.

C2. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., and Tesconi, M. (2017, October). Exploiting digital DNA for the analysis of similarities in Twitter behaviours. In Proceedings of the 4th IEEE International Conference on Data Science and Advanced Analytics. IEEE.

C3. Avvenuti, M., Bellomo, S., Cresci, S., La Polla, M. N., and Tesconi, M. (2017, April). Hybrid crowdsensing: A novel paradigm to combine the strengths of opportunistic and participatory crowdsensing. In Proceedings of the 26th International Conference on World Wide Web Companion, 1413-1421. ACM.

C4. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., and Tesconi, M. (2017, April). The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race. In Proceedings of the 26th International Conference on World Wide Web Companion, 963-972. ACM.

C5. Cresci, S., Gazzè, D., Lo Duca, A., Marchetti, A., and Tesconi, M. (2015, May). Geo data annotator: a web framework for collaborative annotation of geographical datasets. In Proceedings of the 24th International Conference on World Wide Web Companion, 23-24. ACM.

C6. Cresci, S., Tesconi, M., Cimino, A., and Dell’Orletta, F. (2015, May). A linguistically-driven approach to cross-event damage assessment of natural disasters from social media messages. In Proceedings of the 24th International Conference on World Wide Web Companion, 1195-1200. ACM.

C7. Avvenuti, M., Del Vigna, F., Cresci, S., Marchetti, A., and Tesconi, M. (2015, November). Pulling information from social media in the aftermath of unpredictable disasters. In Proceedings of the 2nd International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), 258-264. IEEE.

C8. Cresci, S., Cimino, A., Dell’Orletta, F., and Tesconi, M. (2015, November). Crisis mapping during natural disasters via text analysis of social media messages. In Proceedings of the 16th International Conference on Web Information Systems Engineering, 250-258. Springer LNCS.

C9. Del Vigna, F., and Cresci, S. (2015). Social Media for the Common Good: the case of EARS. In Proceedings of the 1st International Workshop on Community Intelligence for the Common Good.

C10. Cimino, A., Cresci, S., Dell’Orletta, F., and Tesconi, M. (2014). Linguistically-motivated and lexicon features for sentiment analysis of Italian tweets. In Proceedings of the 4th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2014), 81-86.

Others

O1. Cresci, S., Del Vigna, F., and Tesconi, M. (2017). I Big Data nella ricerca politica e sociale. In Andretta, M., and Bracciale, R., (eds.), Social Media Campaigning. Le elezioni regionali in #Toscana2015, 113-140. Pisa University Press.

O2. Cresci, S., La Polla, M., and Tesconi, M. (2017). Il fenomeno dei Fake Follower in Twitter. In Andretta, M., and Bracciale, R., (eds.), Social Media Campaigning. Le elezioni regionali in #Toscana2015, 141-162. Pisa University Press.

O3. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., and Tesconi, M. (2016). Social fingerprinting - or the truth about you. ERCIM News, 106, 26-27. ERCIM.

O4. Meletti, C., Cresci, S., and Tesconi, M. (2016). Rapid estimation of earthquake intensity from Twitter’s social sensors. In Proceedings of the 35th General Assembly of the European Seismological Commission. ESC.

O5. Meletti, C., Cresci, S., La Polla, M., Marchetti, A., and Tesconi, M. (2014). Social Media as Seismic Networks for the Earthquake Damage Assessment. AGU Fall Meeting Abstracts, 1, 4338. AGU.

Pisa, 30/11/2017

The Student
Stefano CRESCI

The Tutor(s)
Prof. Marco AVVENUTI
Prof. Maurizio TESCONI
