
Analyzing bot activity on social media over political discourse. The case of Brexit

Ivan Spagnuolo

Student Id: 899286

Advisor: Prof. Marco Brambilla

Co-advisor: Dr. Emre Calisir

Dipartimento di Elettronica, Informazione e Bioingegneria

Politecnico di Milano

This thesis is submitted for the degree of

Master of Science in Computer Science and Engineering

October 2019


I would like to dedicate this thesis to my family and friends, especially my grandfather who wanted to be here.


Acknowledgements

First of all, I would like to thank Professor Marco Brambilla of the Dipartimento di Elettronica, Informazione e Bioingegneria at Politecnico di Milano, for always being available to answer my questions and for steering my studies in the right direction.

I thank Dr. Emre Calisir, who always helped me by providing the right technical support for the various obstacles I encountered, as well as the data needed to carry out this research project.

I would like to express my deep gratitude to my ever-present parents, my brother and my sister-in-law with their wonderful children, for their unceasing support during the research and writing of this thesis, and to my girlfriend and her fantastic family for always making me feel part of a second family.

Finally, I would like to thank my fellow students for the many moments spent together over the books, and with whom I have had a wonderful relationship both inside and outside the university. This result would not have been possible without them. Thank you.


Abstract

Social media plays an essential role in information sharing during political events. The heavy use of social media in politics provides a vast source of information for understanding the relationship between human behaviour and political events.

While social media platforms are designed for people, it is well known that a large percentage of accounts automatically generate posts and other activity on social networks in order to amplify certain topics or share many kinds of content. These accounts are often called "bots".

These bots are problematic because they can manipulate information and promote unverified information, which can adversely affect public opinion on various topics, from product sales to political campaigns. In this context, detecting bot activity is becoming more complex because many bots actively try to avoid detection. For this reason, social networks employ ever more sophisticated algorithms for detecting suspicious activity, and this forces adversaries to develop new techniques aimed at evading those detection algorithms.

Thanks to its open nature and fully featured API support, Twitter is an ideal platform for research into suspicious social network activity. Starting from this scenario, we decided to analyze in detail how bots influenced the Brexit referendum through their activity on Twitter, from both a temporal and a geospatial point of view.



Contents

List of Figures viii

List of Tables ix

1 Introduction 1

1.1 Context and problem statement . . . 1

1.2 Proposed solution . . . 2

1.3 Structure of the thesis . . . 3

2 Background 4

2.1 Botometer . . . 4

2.1.1 Data mining . . . 4

2.1.2 Accuracy Threshold Trade-off . . . 6

2.1.3 False Positive and False Negative cases . . . 7

2.2 Brexit Stance Classification . . . 7

2.2.1 Rule-based Classification . . . 8

2.2.2 Machine Learning (ML) Based Classification . . . 10

3 Related Work 11

3.1 Bot detection methods . . . 11

3.2 Activity and impact of social bots . . . 12

3.3 Stance classification . . . 13

4 Methodology 14

4.1 Core idea . . . 14

4.2 Objectives and research goals . . . 14

4.3 Glossary . . . 15

4.4 Solution proposed . . . 16

4.4.1 Data preparation . . . 17

4.4.2 Data Annotation . . . 18

4.4.3 Data Analysis . . . 19

5 Implementation 20

5.1 Data collection . . . 20

5.2 Data exploration . . . 21

5.3 Preprocessing . . . 21

5.4 Botometer . . . 23

5.5 Stance classification . . . 24

5.6 Validation . . . 25

5.7 Data Analysis . . . 27

5.8 Libraries . . . 27

5.8.1 Scipy . . . 27

5.8.2 Scikit-learn . . . 27

5.8.3 Mapbox . . . 27

5.8.4 Plotly . . . 28

6 Experiments and results 29

6.1 Dataset . . . 29

6.1.1 Tweet . . . 29

6.1.2 User . . . 31

6.1.3 Dataset composition . . . 32

6.2 Temporal domain . . . 36

6.2.1 Activity of the Twitter users . . . 36

6.2.2 Frequency of user’s creation . . . 38

6.2.3 User’s stance timeline . . . 40

6.3 Geospatial domain . . . 41

7 Conclusion 46

7.1 Summary of the results . . . 46

7.2 Contributions . . . 47

7.3 Future work . . . 47


List of Figures

4.1 Workflow of the proposed solution . . . 16

4.2 Data Preparation . . . 17

4.3 Data Annotation . . . 18

4.4 Data Analysis . . . 19

5.1 Supervised Learning Pipeline for Stance Classification . . . 24

6.1 Example of a JSON Tweet . . . 30

6.2 Example of a JSON Twitter User . . . 31

6.3 Pie charts of Bots distribution . . . 32

6.4 Confusion matrix about Botometer evaluation . . . 33

6.5 Accuracy Threshold Trade-off . . . 34

6.6 Pie chart of Bots distribution with the most accurate threshold . . . 35

6.7 Confusion matrix of the most accurate threshold . . . 35

6.8 Weekly mean of the retweets made by the users . . . 36

6.9 Weekly mean of the favourites received by the users . . . 37

6.10 Subdivision of users based on the number of tweets posted about Brexit . . 37

6.11 Box plot of the tweets weekly posted . . . 38

6.12 Registration timeline of Twitter Users examined . . . 39

6.13 Changing of the Users’ stance during the time . . . 40

6.14 Relation between bots or non-bots and their stance . . . 41

6.15 Choropleth map of Twitter ’casual’ users . . . 42

6.16 Choropleth map of Twitter users interested in the Brexit topic . . . 42

6.17 Number of people interested or not divided by the main countries . . . 43

6.18 Voters distribution . . . 44


List of Tables

2.1 Table of Stance Indicative Hashtags . . . 8

2.2 Table of the results given by Stance Classification methods . . . 9

4.1 Active Users Dataframe Head . . . 18


Chapter 1

Introduction

1.1 Context and problem statement

Social networks are accused of being unable to prevent the manipulation of news and information by potentially malicious actors. These activities can expose users to a variety of threats. A matter of particular interest is the coordination of actions across multiple accounts in order to amplify specific content in users' news feeds, recommendations, and searches.

Participants in these campaigns can include fully automated accounts ("bots"), cyborgs (accounts that combine manual and automated actions), full-time human operators, and users who inadvertently amplify content because of their beliefs or political affiliations [1].

The role of so-called social media "bots" has received much attention in recent years. They can play a valuable part in the social media ecosystem by commenting on posts about a variety of topics in real time or providing automated updates about news stories and events. At the same time, they can also be used to try to alter the perception of political discourse on social media or to spread misinformation. As social media has gained an ever more prominent position in general news and information, bots have been swept up in the broader debate over Americans' changing news habits, the tenor of online discourse and the prevalence of "fake news" online.

Identifying suspicious activity in social networks is becoming increasingly difficult. Adversaries have learned from past experience: they now use better tactics, build better automation, and create much more human-like sock puppets. In this scenario, we use the bot detection service Botometer, which checks the activity of a Twitter account and gives it a score in the range [0, 1] based on how likely the account is to be a bot, with 1 being the maximum probability [2].


After a validation phase of the Botometer results, we focused our analysis on comparing the activity of bots and non-bots on topics related to Brexit.

Brexit (Britain's exit) deliberations on Twitter revolved around the following questions: should the UK Leave or Remain in the EU? And if the UK leaves, what would be the consequences? The date of the referendum was announced in February 2016, and the official campaign started in April 2016. Political leaders from both camps participated in debates and interviews to convey their arguments through traditional and social media platforms during June 2016. More than 30 million people voted on June 23, 2016, and the Leave camp won.

In this context, three years later, we were able to observe how the activity of bots and non-bots has changed; going into more detail through a stance classification algorithm, we analyzed whether pro-Remain or pro-Leave accounts are more active.

1.2 Proposed solution

The purpose of this thesis is to analyze the activity of bots and non-bots in order to understand how bots are able to influence people through social media.

Over the years, various analyses have been carried out on this topic, mainly focused on political and socio-economic events. Our solution focuses on Twitter users, who are first analyzed and cleaned and then classified as bot or non-bot based on what they post on Twitter.

Our approach then collects the tweets containing the keyword Brexit posted between January 2016 and December 2018, in order to evaluate the stance in the Brexit debate. In this way we can determine whether a user is pro-Leave or pro-Remain regarding the UK's exit from the European Union. Once we get the results from the Botometer framework and the stance classification algorithm, we construct a dataset containing all the Twitter information of each user considered, the Botometer scores and the stance result. We thus have all the data necessary to conduct temporal and geospatial analyses of the social activity of bots and non-bots. In particular, we study bot activity in terms of tweet frequency, first on generic topics, and compare it with that of non-bots. Then, combining user stance classification and bot detection, we understand the position of social media users with respect to the different topics in the debate. Our comparative, temporal and spatial analysis of political accounts can detect the critical periods of the Brexit process, the impact they have on the debate, and which areas are more active and influenced by this theme.


1.3 Structure of the thesis

The structure of the thesis is as follows:

• Chapter 2 defines and explains the background knowledge and concepts related to the work performed for this thesis.

• Chapter 3 presents past works related to this thesis, in terms of both the problems they address and the solutions they propose.

• Chapter 4 contains a high-level description of the methods employed in this thesis.

• Chapter 5 describes the source code and implementation of the methods used.

• Chapter 6 presents the results of the experiments and discusses these outcomes.

• Chapter 7 concludes this report by summarizing the work, critically discussing the results, and outlining future work.


Chapter 2

Background

This chapter introduces the theoretical background on Botometer and stance classification. The first is a service that we used to classify users as bots or non-bots, while the second is a machine learning algorithm to classify the stance of the collected tweets. In Chapter 4, we will explain how we introduced them into our solution.

2.1 Botometer

Botometer is a bot detection framework that extracts a large collection of features from data and meta-data about social media users, including tweet content and sentiment, network patterns, and activity time series. These features are used to train highly-accurate models to identify bots. For a generic user, a score between 0 and 1 is produced to represent the likelihood that the user is a bot.

2.1.1 Data mining

1. Data collection: data collection is an important process for almost every kind of research. Poor data collection can affect the validity of a study and inevitably lead to invalid results. The data in this study were collected through the Twitter API, using both direct downloading and web scraping with Python.

2. Feature extraction: data collected using the Twitter API are distilled into 1,150 features in six different classes, discussed next.

• User-based features. Features extracted from user metadata have been used before to classify users and patterns. We extract user-based features from the metadata available through the Twitter API. Such features include the number of friends and followers, the number of tweets produced by the user, and the profile description and settings.

• Friends features. Twitter allows interconnectivity: users are linked by follower-friend relations, content travels from person to person via retweets, and tweets can be addressed to specific users via mentions. We consider four types of links: retweeting, mentioning, being retweeted, and being mentioned.

• Network features. The usage of network features significantly helps in tasks like political astroturf detection. This framework reconstructs three types of networks: retweet, mention, and hashtag co-occurrence networks. Retweet and mention networks have users as nodes, with a directed link between a pair of users that follows the direction of information spreading: toward the user retweeting or being mentioned. Hashtag co-occurrence networks have undirected links between hashtag nodes when two hashtags occur together in a tweet.

• Temporal features. Other important features useful for classifying a user as bot or non-bot are temporal features. These relate to user activity, including average rates of tweet production over various time periods and distributions of time intervals between events.

• Content and language features. A recent study demonstrated the importance of content and language features in revealing the nature of social media conversations [3]. The system does not employ features capturing the quality of tweets, but collects statistics about the length and entropy of tweet text.

• Sentiment features. Sentiment analysis is a powerful tool to describe the emotions transmitted by a piece of text, and more broadly the attitude or mood of an entire conversation. Sentiment extracted from social media conversations has been used to predict offline events, including financial market fluctuations. This framework leverages several sentiment extraction techniques to generate various sentiment features, including arousal, valence and dominance scores, happiness score, polarization, strength and emoticon score.

3. Model evaluation: model evaluation helps to find the model that best represents our data and to estimate how well the chosen model will work in the future. In the Botometer context, given a list of screen names as input, the framework produces the following JSON object as output for each screen_name processed.

{
  "categories": {
    "friend": 0.45,
    "sentiment": 0.34,
    "temporal": 0.55,
    "user": 0.29,
    "network": 0.43,
    "content": 0.41
  },
  "user": {
    "screen_name": "Botometer",
    "id": "2451308594"
  },
  "scores": {
    "english": 0.34,
    "universal": 0.36
  }
}

4. Model validation: model validation is defined within regulatory guidance as "the set of processes and activities intended to verify that models are performing as expected, in line with their design objectives, and business uses." It also identifies "potential limitations and assumptions, and assesses their possible impact" and, generally, validation activities are performed by individuals independent of model development or use.

2.1.2 Accuracy Threshold Trade-off

As in past research, we needed to decide whether an account could reasonably be considered a "bot." We calculated a threshold that would minimize two different types of error. Setting the threshold too high would have meant incorrectly classifying many bots as human accounts, in other words producing false negatives. On the other hand, setting the threshold too low would have incorrectly labelled many human accounts as bots, producing false positives.


Which type of error is "worse"? It is a complicated question, and the answer depends on what we want to accomplish. We wanted the most accurate results, so we set the threshold in a way that maximized accuracy.

So, starting from our validation dataframe, we conducted a human analysis to determine which Botometer threshold would minimize the share of false positives and false negatives. Based on our validation dataframe consisting of 1000 users, we analyzed how the accuracy varies with the threshold. In particular, we varied the threshold dynamically from 0 to 1 in steps of 0.1. To go into more detail and capture every single variation, we decreased the step size to 0.01 in the range 0.7-0.96, so as to obtain a finer-grained threshold sweep and consequently the possibility of finding a more accurate threshold.
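As an illustration, a minimal NumPy sketch of such a variable-resolution threshold grid and accuracy sweep might look as follows (the names scores and truth are ours, standing for the numeric Botometer scores and the binary manual annotations; this is a sketch, not the thesis code):

import numpy as np

# Coarse grid in steps of 0.1 over [0, 1], refined to steps of 0.01
# in the 0.7-0.96 band, as described above.
coarse = np.arange(0.0, 1.01, 0.1)
fine = np.arange(0.70, 0.96, 0.01)
thresholds = np.unique(np.concatenate([coarse, fine]).round(2))

def best_threshold(scores, truth):
    # Accuracy of the rule "bot if score >= t" for each candidate threshold t.
    accuracies = [((scores >= t).astype(int) == truth).mean() for t in thresholds]
    return thresholds[int(np.argmax(accuracies))]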

2.1.3 False Positive and False Negative cases

Neither human annotators nor machine-learning models perform flawlessly. Humans are able to generalize and learn new features from the observed data. Machines go beyond human annotators in processing a large number of relationships and in searching for complex patterns. We analyzed our annotated accounts and their bot scores to highlight when disagreement occurs between annotators and classification models. In these experiments, human annotation is considered the ground truth. We identified the cases in which the classifier's score disagrees with the annotations, and we manually reviewed a sample of these accounts to investigate the errors. Accounts annotated as human can be classified as bots when they post tweets created by applications linked from other platforms. Some unusually active users are also classified as bots; these users tend to have more retweets in general, which is somewhat intuitive since retweeting costs less than creating new content. We also found examples of incorrect classification for organizational and promotional accounts. Such accounts are often managed by multiple individuals, or by combinations of users and automatic tools, generating misleading signals for classifiers. Finally, even the language of the content can cause errors: Botometer tends to assign high bot scores to users who tweet in multiple languages. To mitigate this problem, the public version of the system now includes a classifier that ignores language-dependent features.

2.2 Brexit Stance Classification

Identifying the users who are in favor of, against, or neutral towards a target is known as stance classification. The target of the stance analysis may be a person, an organization, a government policy, a movement, a product, and so on. Stance classification is often confused with sentiment detection. According to [4], while in sentiment analysis the goal is to extract the sentiment from a piece of text, in stance classification the purpose is to determine favorability toward a given (pre-chosen) target of interest. In our context, stance classification aims to find users with a pro-Remain or pro-Leave stance and to analyze their participation in the Brexit discussions. Some studies [5], [6] considered the presence of stance-indicative (SI) hashtags as an effective way to discover polarized tweets and users. The disadvantage of this method is that it cannot evaluate tweets that do not contain SI hashtags, which typically includes a substantial share of tweets. The solution we propose is to divide our dataset into two subsets: tweets that contain SI hashtags and tweets that do not. We then classify the tweets with SI hashtags using a rule-based method, and the remaining tweets using machine learning methods. Note that in our context only 8% of the tweets contain SI hashtags; thanks to our approach, we can also analyze the remaining 92%. After classifying each tweet as pro-Remain, pro-Leave or non-polarized, we determine each user's stance by looking at the number of tweets in each class.

2.2.1 Rule-based Classification

Hashtags are commonly used by Twitter users to express their stance on a political phenomenon. According to our analysis, between January 2016 and December 2018, more than 600 thousand unique hashtags were used together with the Brexit hashtag. As shown in Table 2.1, we created a list of stance-indicative (SI) and stance-ambiguous hashtags by finding the most commonly used hashtags and considering the findings of other Brexit-related studies.

Stance      Characterizing Hashtags

Remain      #strongerin, #voteremain, #intogether, #labourinforbritain, #moreincommon, #greenerin, #catsagainstbrexit, #bremain, #betteroffin, #leadnotleave, #remain, #stay, #ukineu, #votein, #voteyes, #yes2eu, #yestoeu, #sayyes2europe, #fbpe, #stopbrexit, #stopbrexitsavebritain

Leave       #leaveeuofficial, #leaveeu, #leave, #labourleave, #votetoleave, #voteleave, #takebackcontrol, #ivotedleave, #beleave, #betteroffout, #britainout, #nottip, #takecontrol, #voteno, #voteout, #voteleaveeu, #leavers, #vote leave, #leavetheeu, #voteleavetakecontrol, #votedleave

Ambiguous   #euref, #eureferendum, #eu, #uk

Table 2.1 Table of Stance Indicative Hashtags

In our approach, the stance of a tweet is:

• Pro-Remain (PRT), if it contains at least one Remain-related hashtag, but no Leave-related hashtag,

• Pro-Leave (PRL), if it contains at least one Leave-related hashtag, but no Remain-related hashtag,

• Non-polarized in all other cases.

Then, to calculate the user stance, we applied the following formula considering all tweets of the user in our database.

$$Score = \frac{\sum PRT}{\sum PRT + \sum PRL}$$

$$UserStance = \begin{cases} \text{Pro-Leave}, & \text{if } Score < 0.4 \\ \text{Pro-Remain}, & \text{if } Score > 0.6 \\ \text{Non-Polarized}, & \text{otherwise} \end{cases}$$
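A direct transcription of this rule into Python could look as follows (a sketch under our own naming: tweet_labels is the list of per-tweet stance labels of a single user, and the guard for users with no polarized tweets is our assumption, since the formula is undefined when both sums are zero):

def user_stance(tweet_labels):
    # tweet_labels: e.g. ["PRT", "PRL", "non-polarized", ...]
    prt = tweet_labels.count("PRT")   # number of pro-Remain tweets
    prl = tweet_labels.count("PRL")   # number of pro-Leave tweets
    if prt + prl == 0:
        return "Non-Polarized"        # assumption: no polarized tweets at all
    score = prt / (prt + prl)
    if score < 0.4:
        return "Pro-Leave"
    if score > 0.6:
        return "Pro-Remain"
    return "Non-Polarized"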

In our comparative approach, we only take into account pro-Leave and pro-Remain users, and we get the ratio of a class by dividing its value by the sum of the two classes. As a result, we found that the number of pro-Remain users is relatively higher than the number of pro-Leave users, as we can see in these results:

Method                         Type     Remain   Leave
Rule-based (RB)                Tweets   462K     254K
                               Users    62K      38K
Machine Learning based (MLB)   Tweets   2.1M     1.8M
                               Users    408K     296K
Merged (RB + MLB)              Tweets   2.56M    2.05M
                               Users    432K     309K

Table 2.2 Table of the results given by Stance Classification methods

However, this method classified 92% of the tweets as non-polarized because they do not contain SI hashtags. To the best of our knowledge, Twitter has become the primary place for online social discussions on the Brexit referendum, and there should be a higher number of active polarized users on Twitter. Therefore, we developed the following complementary method, which uses machine learning techniques for stance classification of the tweets not featuring SI hashtags.


2.2.2 Machine Learning (ML) Based Classification

In this task, we only focused on the tweets labeled as non-polarized by the previous method. To prepare the training and development sets for our learning-based classifier, a subject expert involved in our study prepared three sets of 1000 tweets, one for each class: pro-Remain, pro-Leave and non-polarized. In terms of feature engineering, we normalized the tweets with a Twitter-specific tokenizer and then transformed them into n-gram features (unigrams, bigrams and trigrams). For the implementation of the classification algorithm, we tested various algorithms, and we obtained the best results with Support Vector Machines with a linear kernel. In a recent shared task on stance classification [4], the highest score was obtained with a machine learning model similar to ours. As a result of 10-fold cross-validation, the weighted average F1 score and AUC reached 0.71 and 0.80, respectively. By predicting the tweets using this model, we obtained 2.1 million tweets for the pro-Remain and 1.8 million tweets for the pro-Leave class. Then, to validate the classification task, a subject expert evaluated the predicted labels on a randomly selected subset of the data. As a result, we found that the model's variance is less than 5% for both classes. This method allowed us to detect a significant amount of polarized tweets. In the final step, we obtained a complete tweet set of 2.55 million pro-Remain and 1.8 million pro-Leave tweets by combining the results of the rule-based and machine learning-based methods. Over this dataset, we applied the user stance evaluation, and we found that 432,000 users are pro-Remain and 309,000 users are pro-Leave.
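For concreteness, a compact scikit-learn sketch of such a pipeline, word n-grams up to trigrams feeding a linear-kernel SVM evaluated with 10-fold cross-validation, is shown below (texts and labels are placeholders for the annotated tweet texts and their classes; the Twitter-specific tokenizer used in the thesis is not reproduced here):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# texts: list of normalized tweet texts; labels: their stance classes.
clf = Pipeline([
    ("ngrams", CountVectorizer(ngram_range=(1, 3))),  # uni-, bi- and trigrams
    ("svm", LinearSVC()),                             # linear-kernel SVM
])
f1 = cross_val_score(clf, texts, labels, cv=10, scoring="f1_weighted")
print(f1.mean())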


Chapter 3

Related Work

In this section, we review recent studies on the diverse characteristics of social bots, their activities and impact, and the various detection methods developed over the years. We then review the literature on stance classification algorithms and how they are applied to political topics.

3.1 Bot detection methods

Significant research from the computer science community has been directed at the automatic detection of bots, as algorithms become more complex and adept at simulating the behavior of a human controller. On Twitter, information can be gained about a user from their personal account information, tweets, likes, retweets, and direct messages. To identify bots, three fundamental areas of analysis can be set up: the profile, account activity, and text mining [7]. The most common approaches are based on supervised machine learning algorithms. Supervised methods depend on relevant features used to describe the entities to be classified, and this is a critical step for all machine learning classifiers. In bot detection algorithms, several choices have been considered, but in general six major categories of features have been identified as relevant for discriminating between human and bot accounts: user metadata, friend metadata, retweet/mention network structure, content and language, sentiment, and temporal characteristics. In the case of supervised learning, after extraction and preprocessing, the features are fed into supervised machine learning models for training, and the trained models are then used to evaluate previously unseen accounts. Most techniques attempt to detect bots at the account level by processing many social media posts to extract the features listed above. In [8], a deep neural network approach, based on a contextual long short-term memory (LSTM) architecture, is proposed which exploits content and metadata to predict whether a given tweet was published by a human or a bot. However, supervised methods do not perform well at detecting coordinated social bots that post human-generated content: a user considered individually from a group of coordinated bots is not usually suspicious. Their detection requires information about their coordination, which becomes available only once the activity of multiple bots is considered. Unsupervised learning methods have been proposed to address this issue. One of these methods is DeBot [9], which is based on the key observation that humans cannot be highly synchronous for a long duration; thus, highly synchronous user accounts are most likely bots. The main differences from supervised methods are that DeBot does not require a large amount of labeled training data, and that it considers cross-user features, which allow user accounts to be clustered into correlated sets in near real time. Finally, the research in [10] studied a semi-supervised machine learning approach that leverages both unsupervised and supervised ML techniques in two phases: the first prunes presumably benign hosts through clustering algorithms such as k-Means, Density-Based Spatial Clustering (DBSCAN) and Self-Organizing Maps (SOM), while the second achieves high-precision bot detection through classification algorithms such as support vector machines (SVM), logistic regression or feed-forward neural networks (FNN).

3.2 Activity and impact of social bots

Bot activity has been reported in several domains, with the potential to affect behaviors, opinions, and choices. One of the domains most influenced is health, where social bots have been observed influencing debates about vaccination policies [11] and smoking [12]. Politics is another key domain. During the 2016 U.S. presidential election, social bots were found to generate a large amount of content, possibly distorting online conversations. As demonstrated in [13], retweets were a vulnerable mode of information spreading: there was no significant difference in the number of retweets that humans generated by resharing content produced by other humans or by bots. In fact, humans retweeted bots and other humans at substantially the same rate, which suggests that bots were very effective at getting messages reshared in human communication channels. Another important aspect studied is how bots share content about a specific candidate in order to alter the perception of the individuals exposed to it, suggesting that there exists organic support for a given candidate while in reality it is all artificially generated. Similar cases of political manipulation were found in other countries, such as France during the 2017 elections. In [14], an analysis of tweets related to the candidates Marine Le Pen and Emmanuel Macron in the days leading up to the election highlighted that 18K active bots pushed the so-called MacronLeaks disinformation campaign.


3.3 Stance classification

In recent years, stance classification for online discussions has received increasing research interest. Given a post belonging to a two-sided debate on an issue, the task is to classify the post as in favor of, against, or neutral towards the issue. Many studies relate stance classification to sentiment analysis, with the key difference that the target of interest may not be explicitly mentioned in the text and may not be the target of the opinion expressed in it. Previous works have focused on Congressional debates [15], in which the authors aim to determine from the transcripts of U.S. Congressional floor debates whether the speeches support or oppose the proposed legislation, on debates in online forums, and on debates about political topics such as abortion rights, political referendums, gay rights and gun rights. In the Twitter context, stance classification becomes more challenging, since a tweet has a particular text structure based on hashtags, mentions, places and other metadata such as retweet/favourite counts. Some studies [16] [17] based their algorithms on the presence of so-called stance-indicative (SI) hashtags as an effective way to discover polarized tweets and users. For the data annotation part of the supervised learning task, manual [18] [16] [19] or automatic [17] methods have been used. Besides, some studies present richer datasets in order to define a gold standard [20]. Specifically for Twitter, various feature engineering techniques have been implemented, such as lexical (n-grams), word-embedding [21], syntactic (sentiment, grammatical) [22], metadata (retweet count, follower count, mentions), network-specific (retweet-based propagation) [23] and argumentative analysis (argumentativeness, source type) [18]. As machine learning algorithms, authors have achieved successful results with Naive Bayes, Support Vector Machines (SVM), Decision Trees and Recurrent Neural Networks (RNN), as well as a combination of an RNN with long short-term memory (LSTM) and a target-specific attention extractor [24].


Chapter 4

Methodology

This chapter presents the main objective behind this work and the various phases involved during the course of this thesis. Once we have explained the idea behind this work and the related research questions, we explain the proposed solution at a high level, and then define the key terms of this topic in a detailed glossary.

4.1 Core idea

The core idea relies on the assumption that it is possible to obtain useful insights about political phenomena, in our case Brexit, from social media data. We are able to detect which locations are more influenced by a particular topic, such as the demand for a new referendum, and to understand the position of social media users with respect to the different topics in the debate, classifying them as bot or non-bot, or understanding their stance about a political movement. Our temporal and spatial analysis of political accounts can detect the critical periods of the Brexit process and the impact they have on the debate.

4.2 Objectives and research goals

The research starts with the purpose of answering the following questions:

• What is the impact of automated bot accounts on online discussions, and with which side are they most aligned?

• Can we determine the political stance of Twitter users with respect to Brexit based on the content they share and differentiating them between Bots and Non-Bots?

• Which are the most active places in spreading ideas on the Brexit theme through Twitter?

The main goal of this thesis is to analyze how the Twitter activity of bots and non-bots varies over the years in the face of political events such as the Brexit referendum. At the same time, it is interesting to analyze this context from the spatial point of view, so that we can study which areas are the most 'social' regarding the Brexit theme.

4.3 Glossary

In order to ease the reading, it is helpful to define some of the terms that will be used throughout the rest of this work.

User: a Twitter user.

Tweet: a post on Twitter made by a user.

Botometer: service that checks the activity of a Twitter account and gives it a score based on how likely the account is to be a bot.

Stance Classification: a machine learning method that aims at detecting the stance expressed in a text towards a specific target; an emerging problem in sentiment analysis [25].

Feature: a characteristic of the observed phenomenon that is used to describe the phenomenon itself.

Brexit: a portmanteau of "British" and "exit"; the withdrawal of the United Kingdom (UK) from the European Union (EU).

Threshold: the limit, between 0 and 1, that determines whether a user is classified as a bot or not.

Bot: a program used to produce automated posts on the Twitter microblogging service.

Accuracy: a description of systematic errors, a measure of statistical bias; low accuracy causes a difference between a result and the true value.


4.4 Solution proposed

In this section we illustrate the high-level workflow of the solution that we propose. The whole procedure is shown in Figure 4.1 with its different phases, and here we give an overall description of each of them.


4.4.1 Data preparation

The data preparation phase, Figure 4.2, takes as input the raw data, as they come from Twitter, and transforms them into a valid format. The outputs are the screen_name of each active Twitter user, for the Botometer framework, and the cleaned tweets, which will be processed by the stance classifier. Moreover, in this phase we check that a user is not suspended, blocked or protected, so that we can extract their information and tweets.

In detail, this phase is in turn composed of three different steps:

Figure 4.2 Data Preparation

• Data collection: the primary step, in which data are collected from Twitter without any kind of manipulation.

• Data exploration: data is studied to find characteristics and potential problems; once again no alteration is performed.

• Preprocessing: our data collection focused on 900k users; to restrict our sample to recently active users, we developed a Python script which, by accessing the Twitter APIs, checks whether an account is active and, if so, acquires the information related to the user. This data cleaning phase is necessary to avoid taking into account suspended, deleted or protected accounts, which correspond to 100k users in our dataset. For each of the remaining accounts, we collected the main information useful for the following analyses. To retrieve this information we used Tweepy, an easy-to-use Python library for accessing the Twitter API, and we saved it in a CSV file ready to be read in our Colab Notebook analysis.


user_id             screen_name      created_at                      friends  followers  favourites  statuses
2954167529          addicted2newz    Thu Jan 01 03:04:17 +0000 2015  2486     3154       48117       42766
243541051           karencolemanIRL  Thu Jan 27 08:09:53 +0000 2011  1140     1757       63          2188
726372601           MollyMEP         Mon Jul 30 16:21:34 +0000 2012  1458     42437      15413       47040
852537570202439680  RichardClarkTLT  Thu Apr 13 15:02:50 +0000 2017  61       80         103         1793
1169142734          BDAE_GRUPPE      Mon Feb 11 15:01:20 +0000 2013  409      259        369         6851

Table 4.1 Active Users Dataframe Head

Once the data have been cleaned and processed, they are organized into two CSV documents, one with the Twitter information extracted for all the active users and the other with all the collected tweets. From the first, we will consider only the screen_name field to feed the Botometer service, while from the dataset containing the tweets we will consider the text of each tweet.

4.4.2 Data Annotation

The Data Annotation phase, in Figure 4.3, is composed of two different techniques: one is the Botometer service, which receives as input the processed screen_names and produces a score based on how likely each account is to be a bot. The other is the stance classifier, which takes as input the tweets collected during the data collection phase and produces first the stance of each tweet and then, through a mathematical rule, the stance of each user.

Figure 4.3 Data Annotation

Once the scores are assigned by the framework and the classifier, we conduct two different validation phases, one for each of the previous data annotation techniques. To classify all the users in the most accurate way, we first need to create a ground-truth set that contains known samples of non-bot and bot accounts. So we randomly sampled 1000 accounts and manually annotated them by inspecting their public Twitter profiles. Some accounts have obvious flags, such as using a stock profile image or retweeting every message of another account within seconds. In general, however, there is no simple set of rules to assess whether an account is a bot or non-bot, so we analyzed the profile appearance, the content produced and retweeted, and the interactions with other users in terms of retweets and mentions. The final decisions are restricted to: non-bot, bot, or undecided. Accounts labeled as undecided were eliminated from further analysis. We annotated all 1000 accounts and extracted these users, with all their information from the active users dataframe, into a separate validation dataframe in which the manual score is added as a column. This phase is very important for finding the most accurate threshold, which determines whether an account is a bot or non-bot, and in this way we validated the Botometer results. For the tweets dataset, instead, we manually checked the stance given by the classifier by looking at the content of the tweets posted by each user.

4.4.3 Data Analysis

Figure 4.4 Data Analysis

In our solution, we decided to divide the final analyses into two different domains: temporal and geospatial. Regarding the temporal analyses, we focused on comparing how bot and non-bot accounts are active on social media. Having taken the information from each user, we analyzed, for example, the average number of tweets posted every day by bots and non-bots, and how the frequency of bot creation changes over the years, highlighting the periods containing significant referendum-related events. Concerning the geospatial analysis, instead, more work was needed to find each user's geodata, which is required to view, for example, the areas most active on Twitter about Brexit; we then compared the results obtained from the stance classifier with the actual results of the referendum. In this scenario, we only took into consideration the geodata related to the United Kingdom, displaying the percentage of 'Remain' votes in the various districts.


Chapter 5

Implementation

In this chapter, we illustrate in detail the entire development and implementation, going through all the stages:

1. Data collection

2. Data exploration

3. Preprocessing

4. Botometer

5. Stance classification

6. Validation

7. Data analysis

In the end, we also briefly present the libraries used in the implementation.

5.1 Data collection

The process starts with the collection of data from Twitter. We chose Twitter as our social media platform because, thanks to its open nature and the different APIs made available, we were able to extract all the information and data needed to conduct our research.

So, we used the Tweepy library, an easy-to-use Python library that allows us to access the Twitter API. In fact, Tweepy maps every exposed API to a specific method with the relative documentation. When we invoke an API method, the data returned are in JSON format (JavaScript Object Notation), which uses key-value pairs to describe the properties of an object. However, access to the APIs is limited by Twitter's restrictions, which, for example, impose rate limits on non-premium APIs.

To overcome this performance problem, we registered two different accounts on the Twitter Developer platform with the aim of creating 4 different API access tokens. In this way, by creating a single Python script, we were able to run it in 4 parallel instances.

In the first phase, we collected the Twitter information of each user through the Tweepy method get_user(), which at the same time allows us to verify whether a user is active.

However, the User JSON object has more information fields than required for our purpose, such as the user's url, name, profile_banner_url and many others; the useless fields are removed in the following steps. Regarding the tweet collection, we relied on a scraping algorithm so that we could avoid the limits imposed by Twitter and manage the high number of tweets posted by the users under consideration.
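A minimal sketch of this activity check with the Tweepy 3.x API available at the time might look as follows (the keys are placeholders; treating an API error as "inactive" and skipping protected profiles reflects the behaviour described above, not the exact thesis script):

import tweepy

auth = tweepy.OAuthHandler("consumer_key", "consumer_secret")
auth.set_access_token("access_token", "access_token_secret")
api = tweepy.API(auth, wait_on_rate_limit=True)

def fetch_if_active(screen_name):
    # Returns the user's main fields, or None for suspended/deleted/protected accounts.
    try:
        user = api.get_user(screen_name)
    except tweepy.TweepError:  # raised e.g. for suspended or deleted accounts
        return None
    if user.protected:
        return None
    return {
        "user_id": user.id,
        "screen_name": user.screen_name,
        "created_at": user.created_at,
        "friends": user.friends_count,
        "followers": user.followers_count,
        "favourites": user.favourites_count,
        "statuses": user.statuses_count,
    }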

5.2 Data exploration

It is good practice to explore the data before starting to manipulate it. Data exploration's goal is to find characteristics of the data and drive the next operations accordingly. In our context, we studied the objects returned by the data collection: the User1 and Tweet2 objects. To study them, we relied on the official Twitter developer documentation3, in which the structure of each response object is shown and all its properties are described.

5.3 Preprocessing

The objective is to go from a list of screen_names to a dataset containing all the necessary user information and all the tweets posted by each active user.

First of all, the data downloaded from the Twitter API contain a lot of information that we are not going to use. For this reason the two datasets are scanned and, for each user and tweet, only the information necessary for the analysis is kept.

In fact, for the users we took information such as the account creation date, the screen_name, the id, the number of followers, friends and statuses, the description and the location. This last field is important for the geospatial analysis. Unfortunately, another field that would have been very useful is 'derived', an Enterprise-API-only collection of enrichment metadata derived for the user.

1 https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/user-object.html
2 https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object
3 https://developer.twitter.com/en/docs

It provides the Profile Geo Enrichment information needed to localize, for example, which areas are more influenced and active on Twitter in the Brexit debate. To get this type of metadata, starting from the location field we applied the Mapbox Geocoding API4 in order to obtain the geo metadata of each user ourselves. The Mapbox Geocoding API does two things: forward geocoding and reverse geocoding. In our case we used forward geocoding, which converts location text into geographic coordinates and context information such as the country name, district and region, as we can see in the following example response object:

1 { "features": [

2 { "context": [

3 { "id": "district.9655698107976620",

4 "text": "Greater London"

5 }, 6 { "id": "region.11773787231453920", 7 "short_code": "GB-ENG", 8 "text": "England" 9 }, 10 { "id": "country.8605848117814600", 11 "short_code": "gb",

12 "text": "United Kingdom"

13 } 14 ], 15 "geometry": { 16 "coordinates": [ 17 -0.1275, 18 51.50722 19 ], 20 "type": "Point" 21 },

22 "place_name": "London, Greater London, England,

23 United Kingdom",

24 "text": "London", }

25 ]

26 }


Once we obtained the GeoJSON information of each user, we ran a Python script that takes all these GeoJSON objects and extracts, for each user, the context information such as district, region, country and coordinates, saving them in different columns of our dataset. We performed this procedure in order to easily access this information, which is useful for analyses at different levels of localization.
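For illustration, a sketch of this forward-geocoding step, calling the Mapbox Geocoding v5 endpoint directly with the requests library (the access token is a placeholder, and the parsing of the context array follows the example response shown above):

import requests

MAPBOX_TOKEN = "pk.xxxxxxxx"  # placeholder access token

def geocode(location_text):
    url = ("https://api.mapbox.com/geocoding/v5/mapbox.places/"
           + requests.utils.quote(location_text) + ".json")
    resp = requests.get(url, params={"access_token": MAPBOX_TOKEN, "limit": 1})
    features = resp.json().get("features", [])
    if not features:
        return None  # location text could not be geocoded
    feature = features[0]
    info = {"coordinates": feature["geometry"]["coordinates"]}
    # Each context entry's id prefix encodes its level (district, region, country).
    for ctx in feature.get("context", []):
        info[ctx["id"].split(".")[0]] = ctx["text"]
    return info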

5.4 Botometer

In our solution, we used the Botometer APIs provided by OSoMe through the Botometer library installed in our local environment. In this way, through simple Python instructions, we were able to verify whether our account dataset contained bots. Once we finished retrieving tweets for the time interval of interest, chosen so that we could analyze the various changes of position and opinion regarding the Brexit movement, we extracted the screen_names of all the users under consideration. From this list of screen_names, we were able both to retrieve the Twitter information related to each user and to assign them a score describing how likely they are to be a bot.

In the following lines, we show how we used the Python Botometer library, in particular the check_accounts_in() method, and its result.

import botometer
import pandas as pd

rapidapi_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
twitter_app_auth = {
    'consumer_key': 'xxxxxxxx',
    'consumer_secret': 'xxxxxxxxxx',
    'access_token': 'xxxxxxxxx',
    'access_token_secret': 'xxxxxxxxxxx',
}
bom = botometer.Botometer(wait_on_ratelimit=True,
                          rapidapi_key=rapidapi_key,
                          **twitter_app_auth)

# Check a list of accounts
users_df = pd.read_csv('./ivanspagnuolo/thesis/users.csv')
accounts = users_df['screen_name'].values
results = []
for screen_name, result in bom.check_accounts_in(accounts):
    print('Screen_name: ' + screen_name + ' checked!')
    results.append(result)

In the code above, after initializing the RapidAPI key (RapidAPI is an API marketplace for developers) and the Twitter API keys, obtained upon registration on the Twitter developer platform, we retrieve the list of screen_names and, through a for loop, verify the Botometer scores of each user.

5.5 Stance classification

Figure 5.1 Supervised Learning Pipeline for Stance Classification

The practical approach for Twitter-based political stance classification is to use the polarized hashtags, since they allow us to group the tweets very easily. In fact, from a hashtag like #LeaveEU we can clearly see that it is a Leave tweet, and we can easily assign that stance to it. However, in most tweets people do not use polarized hashtags, and we do not want to eliminate these tweets. That is why it is more valuable to exploit the textual content of a tweet instead of just its hashtags.

This, however, brings a difficulty, since a large number of tweets have to be read manually to annotate them and create the training/test datasets.

In every learning task, we should pay attention to the class balance problem: the classes should be balanced, and if they are not, additional methods are needed to compensate. In our case, we created balanced training/test data.

In our implementation there are two classifiers: the first predicts remain and non-remain tweets, and the second predicts leave and non-leave tweets.

Note that non-remain tweets cover the leave and neutral tweets together; similarly, non-leave tweets cover remain and neutral tweets. So, in this case, we train and test the Remain classifier with 1000 tweets from the remain side, 500 tweets from the leave side and 500 neutral tweets (1000 remain, 1000 non-remain tweets in total). Similarly, we train and test the Leave classifier with 1000 tweets from the leave side, 500 tweets from the remain side and 500 neutral tweets (1000 leave, 1000 non-leave tweets in total).

For the feature engineering, the common approach is to generate n-gram features from the words and then apply a classifier to these numerical word-ID values. In our project we also tried several other tweet features, but we did not observe any increase in accuracy, so n-grams were kept as the features.
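The exact rule for merging the two binary predictions into a single label is not spelled out above; one plausible resolution (our assumption, not necessarily the one implemented in the thesis) is:

def resolve_stance(remain_pred, leave_pred):
    # remain_pred / leave_pred: booleans from the Remain and Leave classifiers.
    if remain_pred and not leave_pred:
        return "remain"
    if leave_pred and not remain_pred:
        return "leave"
    return "neutral"  # neither fires, or both fire (conflicting evidence)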

5.6 Validation

Since we used two different algorithms with different purposes, one to classify users as bots or non-bots and the other to determine their stance on Brexit, we performed validation on two different sets of results. In both cases, we decided to adopt manual validation to obtain a ground-truth validation set, obtainable in our context only through human intervention. In the first case, we extracted a total of 1000 accounts in three different ways: completely random, random stratified and random leveled. While the first consists of selecting users in a purely random way, the other two modes select users in a more targeted way, so as to carry out a more reasoned validation. In the random stratified mode, we randomly select the same number of users with a score, assigned by Botometer, greater than and less than the threshold (initially set to 0.8); in this way, we are sure to validate the same number of possible bots and non-bots. With the random leveled validation, the aim is to consider users in every score range, except for users with scores that are too low (less than 0.2) or too high (above 0.8); in this way, the validation set contains users whose scores genuinely need validating, and we can analyze how false positives and false negatives are distributed within the score ranges.
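As a sketch, the three sampling modes could be implemented on the users dataframe as follows (the column names follow Table 5.1; using english_score as the numeric Botometer score, the file name and the per-mode sample sizes are our assumptions):

import pandas as pd

df = pd.read_csv("users_with_scores.csv")  # hypothetical file with Botometer scores

# Completely random sample.
random_sample = df.sample(n=400, random_state=0)

# Random stratified: equal numbers above and below the 0.8 threshold.
above = df[df["english_score"] >= 0.8].sample(n=150, random_state=0)
below = df[df["english_score"] < 0.8].sample(n=150, random_state=0)
stratified_sample = pd.concat([above, below])

# Random leveled: only users in the uncertain 0.2-0.8 score band.
leveled_sample = df[df["english_score"].between(0.2, 0.8)].sample(n=300, random_state=0)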

Once we collected 1000 users through the methods explained above, we searched for them directly on Twitter through their screen_name and labeled them as Bot or Non-Bot based on the content posted, and on the description, name and image of the profile. At this point, we have the dataset shown in Table 5.1, which includes the columns 'Personal_Score' and 'Botometer_Score'.

user_id     screen_name    friends  followers  favourites  statuses  english_score  universal_score  botometer_score  personal_score
14195941    press4change   7222     4317       4202        41145     0.1568         0.1195           Non-Bot          Non-Bot
2795733150  ScoobyDrew2    346      187        3322        5012      0.0794         0.1285           Non-Bot          Bot
173053804   guidorubio     1727     1884       15733       18986     0.0417         0.0656           Non-Bot          Non-Bot
142564943   melcocoa       1179     602        4865        5052      0.0276         0.0607           Non-Bot          Non-Bot
722416316   BSI_CompNav    2053     684        16          2652      0.8882         0.7713           Bot              Bot

Table 5.1 Dataframe Head of validated users

Now, based on this set, we needed to find the best threshold in terms of classification correctness. To do this, we varied the threshold from 0 to 1 and, at each millesimal variation, calculated the correctness indices, accuracy, precision and recall, together with the corresponding confusion matrix.

$$Accuracy = \frac{TruePositive + TrueNegative}{TruePositive + FalsePositive + FalseNegative + TrueNegative}$$

$$Precision = \frac{TruePositive}{TruePositive + FalsePositive}$$

$$Recall = \frac{TruePositive}{TruePositive + FalseNegative}$$
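A sketch of this millesimal sweep with scikit-learn metrics, using the validation dataframe columns from Table 5.1 (mapping 'Bot' to the positive class and using english_score as the numeric score are our conventions, not necessarily the thesis code):

import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# validation_df: the manually annotated dataframe of Table 5.1.
y_true = (validation_df["personal_score"] == "Bot").astype(int)
scores = validation_df["english_score"].values

results = []
for t in np.arange(0.0, 1.001, 0.001):        # millesimal threshold sweep
    y_pred = (scores >= t).astype(int)
    results.append((t,
                    accuracy_score(y_true, y_pred),
                    precision_score(y_true, y_pred, zero_division=0),
                    recall_score(y_true, y_pred, zero_division=0)))
best_t = max(results, key=lambda r: r[1])[0]  # threshold with the highest accuracy
print(confusion_matrix(y_true, (scores >= best_t).astype(int)))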

On the other hand, to validate the stance of the tweets posted by the users, we extracted roughly 3000 tweets and manually checked the stance of the posted content on Twitter. Based on this dataset we performed cross-validation, a statistical method used to estimate the skill of machine learning models. It is commonly used in applied machine learning to compare and select a model for a given predictive modeling problem, because it is easy to understand and implement, and it yields skill estimates that generally have lower bias than other methods.


5.7 Data Analysis

The analyses carried out on the collected data focused on answering the research questions reported in the objectives and research goals section.

From the implementation point of view, the analyses were all written in Python notebooks, which allowed us to load, study and visualize the data conveniently. We created a notebook for each study domain; once the dataset is loaded into a pandas dataframe, we can access the data relevant to each analysis.

For plotting the data we used Plotly, an interesting library that allowed us to visualize the data in interactive graphs that are easy to study.

5.8 Libraries

5.8.1 Scipy

Scipy is a Python-based ecosystem of open-source software for mathematics, science, and engineering. Among its core packages we used:

• NumPy5, the fundamental package for scientific computing with Python.

• Matplotlib6 [26], a Python 2D plotting library.

• Pandas7, which provides easy-to-use and widely used data structures such as the pandas dataframe.

5.8.2 Scikit-learn

Scikit-learn8 [27] is an open-source library that provides useful tools for data mining and machine learning tasks.

5.8.3 Mapbox

Mapbox provides wrapper libraries that help integrate the Mapbox APIs into an existing application. In our case, we used the Mapbox Python SDK9, because it allows us to access the Mapbox APIs through methods written in Python, which can therefore be integrated directly with our application.

5 http://www.numpy.org
6 https://matplotlib.org/index.html
7 http://pandas.pydata.org/index.html
8 https://scikit-learn.org/stable/


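As a minimal example of the kind of call the SDK enables, the sketch below performs a forward geocoding request; the access token is a placeholder and error handling is omitted.

from mapbox import Geocoder

# Placeholder token; a real token comes from a Mapbox account.
geocoder = Geocoder(access_token="MAPBOX_ACCESS_TOKEN")
response = geocoder.forward("London, United Kingdom")
feature = response.geojson()["features"][0]
lon, lat = feature["geometry"]["coordinates"]
print(feature["place_name"], lon, lat)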

5.8.4 Plotly

The plotly Python library10 is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and 3-dimensional use-cases. Built on top of the Plotly JavaScript library, plotly.py enables Python users to create beautiful interactive web-based visualizations that can be displayed in Jupyter notebooks, saved to standalone HTML files, or served as part of pure Python-built web applications.
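A minimal sketch of the style of chart used in the next chapter; the weekly counts here are dummy values, not our data.

import plotly.graph_objects as go

weeks = ["2016-05-30", "2016-06-06", "2016-06-13", "2016-06-20"]
bot_counts = [1200, 1500, 2100, 3400]      # dummy values
nonbot_counts = [1300, 1350, 1280, 1320]   # dummy values

fig = go.Figure()
fig.add_trace(go.Scatter(x=weeks, y=bot_counts, name="Bots"))
fig.add_trace(go.Scatter(x=weeks, y=nonbot_counts, name="Non-Bots"))
fig.update_layout(title="Weekly tweet activity", xaxis_title="Week",
                  yaxis_title="Mean tweets per user")
fig.show()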


Chapter 6

Experiments and results

The goal of this chapter is to show how the methodology and the implementation, discussed in the previous sections, have been put to the test in different experiments. First, the dataset used is presented and analyzed. Then, we focus on the results of the different analyses.

6.1 Dataset

Our dataset is made up of JSON tweets retrieved through the Twitter API. For every considered user, many different tweets have been gathered. The objective is, after the appropriate operations, to build for each user a document made up of their tweets. In this section we first illustrate how the single tweets are organized and then the overall structure of the whole dataset.

6.1.1 Tweet

The term "tweet" is commonly used to indicate a user's post on Twitter, but when we retrieve tweets through the Twitter API what we get is much more than the simple post. In addition to the text content, a tweet can have up to 150 attributes. Figure 6.1 shows an example of a JSON tweet, taken from the dataset, as it was downloaded from the Twitter API.



{
  "created_at": "Thu Apr 06 15:24:15 +0000 2017",
  "id_str": "850006245121695744",
  "text": "1\/ Today we\u2019re sharing our vision for the future of the Twitter API platform!\nhttps:\/\/t.co\/XweGngmxlP",
  "user": {
    "id": 2244994945,
    "name": "Twitter Dev",
    "screen_name": "TwitterDev",
    "location": "Internet",
    "url": "https:\/\/dev.twitter.com\/",
    "description": "Your official source for Twitter Platform news, updates & events. Need technical help? Visit https:\/\/twittercommunity.com\/ \u2328\ufe0f #TapIntoTwitter"
  },
  "place": {},
  "entities": {
    "hashtags": [],
    "urls": [
      {
        "url": "https:\/\/t.co\/XweGngmxlP"
      }
    ],
    "user_mentions": []
  }
}

Figure 6.1 Example of a JSON Tweet

It is clear that there is more information than we need and use. We keep only the username and text fields; the remaining ones are discarded.


The text contents are then processed, as already explained in section 5.3, so that in the end we have pairs of username and pre-processed texts.

Furthermore, these data are aggregated by username, in order to create a single document for each username, i.e. user. From now on, there is therefore a one-to-one correspondence between a user and their document.
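A sketch of this reduction and aggregation step is shown below; the input file name is hypothetical, and the pre-processing of section 5.3 is omitted for brevity.

import json
import pandas as pd

# 'tweets.jsonl' is a hypothetical file holding one JSON tweet per line.
with open("tweets.jsonl") as f:
    raw_tweets = [json.loads(line) for line in f]

# Keep only the username and the text of each tweet.
pairs = pd.DataFrame(
    [{"username": t["user"]["screen_name"], "text": t["text"]} for t in raw_tweets]
)
# One document per user: the concatenation of all of their texts.
documents = pairs.groupby("username")["text"].apply(" ".join)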

6.1.2 User

The User1 object contains Twitter User account metadata that describes the referenced Twitter User. Users can author Tweets, Retweet, quote other Users' Tweets, reply to Tweets, follow Users, be @mentioned in Tweets, and can be grouped into lists.

{
  "id": 6253282,
  "id_str": "6253282",
  "name": "Twitter API",
  "screen_name": "TwitterAPI",
  "location": "San Francisco, CA",
  "profile_location": null,
  "description": "The Real Twitter API. Tweets about API changes, service issues and our Developer Platform. Don't get an answer? It's on my website.",
  "url": "https://t.co/8IkCzCDr19",
  "protected": false,
  "followers_count": 6133636,
  "friends_count": 12,
  "listed_count": 12936,
  "created_at": "Wed May 23 06:01:13 +0000 2007",
  "verified": true,
  "statuses_count": 3656
}

Figure 6.2 Example of a JSON Twitter User



6.1.3 Dataset composition

Initially, our dataset was made up of all the tweets collected through scraping. Since we needed more detailed data to conduct our analyses, we then used the Twitter API to enrich the collected data. The initial scraping allowed us to gather a large number of tweets in the shortest time possible, so as to start from a solid base of data. After that, through the id of each tweet and the screen_name of the user who posted it, we were able to collect the additional information.

In particular, we first implemented a Python script capable of collecting all the useful tweet data, and we saved the result in a CSV file ready to be loaded and read in our Python notebooks. We then selected the screen_name of every user who posted at least one tweet with the hashtag Brexit and collected their most important attributes, such as the number of tweets posted, likes received, location and so on, saving them in a second CSV file. Keeping the two dataframes separate within the notebooks makes them lighter and more efficient.
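The enrichment script is not reproduced here, but a sketch of the hydration step is shown below, assuming the tweepy 3.x client; the credentials are placeholders.

import tweepy

# Placeholder credentials; the real ones come from a Twitter developer account.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def hydrate(tweet_ids):
    """Fetch full tweet objects for a list of scraped ids, 100 per request."""
    tweets = []
    for i in range(0, len(tweet_ids), 100):
        tweets.extend(api.statuses_lookup(tweet_ids[i:i + 100]))
    return tweets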

Once the user dataset was complete, we classified users as bots or non-bots based on their Botometer score. During the validation phase, we initially considered three threshold levels, and for each of them we calculated the percentage of bots and non-bots.


Subsequently, we analyzed the confusion matrix to calculate the various correctness indices relative to each threshold.


Since our goal is to have a classification as accurate as possible, we analyzed how the accuracy changed as the threshold varied on a millesimal scale inside our area of interest (60-90%).


As we can see in the graph above, we obtain the maximum accuracy with a threshold equal to 0.73. For this reason, in our subsequent analyses we adopted this threshold, for which we recalculated the distribution of bots within the dataset and the relative confusion matrix:

Figure 6.6 Pie chart of Bots distribution with the most accurate threshold



6.2 Temporal domain

6.2.1 Activity of the Twitter users

In the temporal domain, we started by analyzing the activity of the users, divided into bots and non-bots, to compare their differences in behavior. To this end, we first considered properties belonging to the Twitter User object, such as the number of retweets made, the number of tweets posted and the number of favourites received. In order to exploit the whole user dataset while keeping the result balanced, since the number of non-bots is considerably higher, we decided to average these values.

Figure 6.8 Weekly mean of the retweets made by the users

As we can see in figure 6.8, the average number of retweets made by bot and non-bot accounts is roughly the same for a few months. However, we note that the non-bot trend deviates little from about 1300 retweets made weekly, while the bot trend is more irregular. This chart shows how non-bots use Twitter more actively and continuously, while bots concentrate their activity in shorter, irregular time intervals, for instance around political events, debates and referendums, staying within the political theme.
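A minimal pandas sketch of how such weekly averages can be computed, assuming a tweet DataFrame with a datetime created_at column, a numeric retweets column and the is_bot flag produced by the classification; the column names are illustrative.

import pandas as pd

def weekly_mean(tweets: pd.DataFrame, column: str = "retweets") -> pd.DataFrame:
    """Weekly mean of 'column', computed separately for bots and non-bots.

    Expects a datetime 'created_at' column and a boolean 'is_bot' flag.
    """
    return (
        tweets.set_index("created_at")
              .groupby("is_bot")[column]
              .resample("W")
              .mean()
              .unstack(level=0)   # one column per class
    )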



Figure 6.9 Weekly mean of the favourites received by the users

Regarding the favourites received, we first note that the numbers are much lower overall, and that there is a clear difference between those received by bots and those received by non-bots.

Figure 6.10 Subdivision of users based on the number of tweets posted about Brexit

In fact, the graph in figure 6.9 shows how the tweets posted by non-bots manage to arouse more interest and attention than the content shared by bots. Remaining in the 'interest' context, another significant graph is the one shown in figure 6.10, which shows the percentage of accounts that have had more or less interest in the Brexit theme.

Significantly, 86% of the accounts tweeted Brexit-related content no more than five times, while the users genuinely interested in the topic are fewer than 10%.


Finally, the last experiment analyzing the activity of the users in the temporal domain consists of describing, through the box plots in figure 6.11, the distribution of the tweets posted weekly by bots and non-bots. We can see that the distribution of tweets faithfully reflects the fact that non-bots are much more active than bots.

Figure 6.11 Box plot of the tweets weekly posted

A box plot gives us information on the variability or dispersion of the data. In this scenario, we observe that the boxes and the relative medians of bots and non-bots are not very different, showing that the bulk of the weekly tweet output of the two categories is around the same numbers. However, we notice a big difference in the so-called whiskers of the box plots, particularly the upper ones, since those of the non-bot box plots are much longer than those of the bots.

In a box plot, the whiskers are generally defined as extending 1.5 times the interquartile range beyond the box, and anything outside them is called an outlier. In our case, this means that we have many more non-bot users who publish a very large number of tweets weekly compared to the bulk of the distribution.

6.2.2 Frequency of user's creation

Another factor taken into consideration is the frequency with which new users register on Twitter. Thanks to the created_at field of the Twitter User object, we were able to plot a trend based on the number of registrations made monthly, categorizing them into the two classes of study: bots and non-bots.



The first graph in figure 6.12 describes the growth of bot account creations, an increase that remains stable between 200 and 400 monthly registrations until approaching 2016, the year of the referendum, when we observe an evident growth. Interesting is the peak recorded exactly in the month of the referendum, June 2016, in which more than 1000 creations were reached. On the other hand, the second graph shows that the number of new Twitter registrations is steadily decreasing. This is likely driven by competition from other social networks such as Instagram, Facebook and Snapchat, which gain ever greater visibility and usage through the development of new features.
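A sketch of how the monthly registration counts behind these graphs can be derived from the created_at field; the file and column names are illustrative.

import pandas as pd

users = pd.read_csv("users.csv")  # hypothetical file produced by the enrichment step
# Parse the raw 'created_at' string of the User object,
# e.g. "Wed May 23 06:01:13 +0000 2007".
users["created_at"] = pd.to_datetime(users["created_at"],
                                     format="%a %b %d %H:%M:%S %z %Y")
monthly = (
    users.set_index("created_at")
         .groupby("is_bot")      # flag assigned by the Botometer threshold
         .resample("M")
         .size()                 # number of accounts created per month
         .unstack(level=0)
)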



6.2.3 User's stance timeline

The last experiment carried out in the temporal domain takes into consideration the results obtained from the stance classification. In particular, the time interval studied is between January 2016 and September 2018, so as to have an accurate view of how the stance of the users changed before and after the referendum. By calculating a single stance value per user from their monthly tweets, we visualized, in figure 6.13, the increases and decreases in participation in the debate from each side. In this way we analyzed monthly changes in user stance, and the results obtained are consistent with the referendum outcome, with 51% pro-Leave and 49% pro-Remain users.

Furthermore, our results show that after the day of the referendum the percentage of pro-Remain users varied between 60% and 70%, denoting a clear shift of part of the pro-Leave population towards the pro-Remain side.
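A pandas sketch of how the monthly stance shares can be aggregated, assuming one classified tweet per row with created_at, user_id and stance columns; the majority-stance rule here is an illustrative simplification of our per-user stance computation.

import pandas as pd

def monthly_stance_share(tweets: pd.DataFrame) -> pd.DataFrame:
    """Share of pro-Leave vs pro-Remain users per month.

    Expects one classified tweet per row, with a datetime 'created_at'
    column, a 'user_id' and a 'stance' label ('Leave' or 'Remain').
    """
    # One stance value per user per month: their majority stance.
    per_user = (
        tweets.groupby([pd.Grouper(key="created_at", freq="M"), "user_id"])["stance"]
              .agg(lambda s: s.mode().iloc[0])
    )
    # Normalized counts give the monthly pro-Leave / pro-Remain shares.
    return per_user.groupby(level=0).value_counts(normalize=True).unstack()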


By extending our findings one step further, we combined the bot scores with the results of the user stance classification. In figure 6.14, the x-axis shows the scores assigned by Botometer to the users; the left y-axis refers to a sample of 200,000 users taken from our dataset, while the right y-axis shows the pro-Leave to pro-Remain ratio. Interestingly, our results show that the higher the bot score, the more likely the account is in a pro-Leave position. This suggests that most bots were created with the intention of amplifying interest in the Leave side of the Brexit political movement.

Figure 6.14 Relation between bots or non-bots and their stance

6.3 Geospatial domain

In the geospatial domain, our goal is to analyze how Twitter users are distributed around the world, differentiating between users who are genuinely interested in the topic and those who are not. Regarding user interest, we relied on the chart shown in figure 6.10: we considered as interested those users who posted or shared Brexit content at least ten times, while we treated the others as 'casual' users.

First, we plotted a choropleth map to see how densely users are distributed across countries. A choropleth map is a map that uses differences in shading, colouring or placing of symbols within predefined areas to indicate the average values of a particular quantity in those areas.
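A minimal Plotly Express sketch of such a map, with dummy per-country counts standing in for our aggregated data.

import pandas as pd
import plotly.express as px

# Dummy sample: ISO-3 country codes with user counts.
country_counts = pd.DataFrame({
    "iso_alpha": ["GBR", "USA", "IRL", "FRA"],
    "users": [52000, 18000, 4000, 2500],
})
fig = px.choropleth(country_counts, locations="iso_alpha", color="users",
                    color_continuous_scale="Blues",
                    title="Twitter users discussing Brexit, by country")
fig.show()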
