
Stance Detection System

86 Learning from Twitter stream: stance detection from tweets

Figure 6.4: Modules of the proposed system (grey blocks require information from the preliminary supervised learning stage). Figure from (D’Andrea et al., 2019).

As per the text representation and classification steps, several methods have been proposed in the specialized literature, as discussed in Section 6.2. To identify the most suitable scheme in our specific case, we experimented with three categories of methods: (i) BOW text representation followed by classical machine learning algorithms for classification (D’Andrea et al., 2015); (ii) a combination of BOW and word embeddings for text representation, followed by classical machine learning algorithms for classification (Mohammad et al., 2017); (iii) deep learning-based approaches for text elaboration and classification (Cliche, 2017). In Section 6.4, we discuss in detail the results achieved by eleven methods selected from the three categories. In our experimental analysis, BOW text representation followed by an SVM classification model achieves the best results. Thus, we adopt this scheme for the text representation and classification in our system.

The following sub-sections describe in detail the steps performed in the three modules and the supervised learning stage with reference to the adopted learning scheme. Although the focus of this chapter is on Italian vaccine-related tweets, the proposed system is general and easily adaptable to any other topic or language.

Collection of Tweets

The first module of the system consists of two main steps, i.e. fetch and cleaning, and preprocessing. In the fetch and cleaning step, tweets are fetched according to some search criteria (e.g. keywords, time and date of posting, location of posting, hashtags). This task can be accomplished by resorting to the official Twitter APIs, to third-party tools (such as Twint³ or GetOldTweets⁴), or to customizable tools designed for this purpose (Bechini et al., 2016). The downloaded set of raw tweets is reduced with the aim of discarding: (i) duplicate tweets, i.e. tweets having the same tweet id, possibly fetched in different searches; (ii) tweets written in languages other than Italian: this may occur because of the presence of keywords/hashtags with the same spelling in different languages. We opted to keep retweets in the dataset, as we think that the retweeting action, in this context, is a way of sharing the same opinion expressed in the original message.

In the preprocessing step, the text of the message is extracted from the tweet object and all useless meta-information is removed. In fact, each tweet object contains, in addition to the textual content, the status id, the user id, the location (if provided) and a number of further attributes. Some of the attributes (timestamp, user id, location) are temporarily discarded for the purposes of text mining elaboration, but they will be reconsidered for the trend analysis presented in this and the following chapters. From the tweet’s textual content we discard links, mentions, numbers and special characters. Hashtags are not completely discarded: they are instead reduced to words (by eliminating the hash (#) symbol), so as not to lose relevant information. Finally, all characters are converted to lower case.
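The cleaning rules above can be sketched with a few regular expressions. This is a minimal illustration in our own words, not the paper's actual code; the function name and the exact regexes are our assumptions.

```python
import re

def preprocess(text: str) -> str:
    """Clean a tweet's text as described above: drop links, mentions,
    numbers and special characters, keep hashtag words, lowercase."""
    text = re.sub(r"https?://\S+", " ", text)   # links
    text = re.sub(r"@\w+", " ", text)           # mentions
    text = text.replace("#", " ")               # keep the hashtag word, drop '#'
    text = re.sub(r"\d+", " ", text)            # numbers
    text = re.sub(r"[^\w\s]", " ", text)        # special characters (\w keeps accented letters)
    return re.sub(r"\s+", " ", text).strip().lower()

print(preprocess("I #vaccini salvano vite! https://t.co/xyz @user 2017"))
# → "i vaccini salvano vite"
```

Note that the hashtag #vaccini survives as the plain word "vaccini", as the pipeline requires.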

Text representation

The main aim of the text representation module is to transform the set of strings (i.e. the stream of tweets) into a set of numeric vectors, by eliminating noise and extracting useful information. In the following, we briefly recall the sequence of steps applied to the tweets. Fig. 6.5 shows how a sample tweet, extracted from our vaccination dataset, is transformed as it undergoes the different text elaboration steps, namely tokenization, stop-word filtering, stemming, stem filtering and feature representation.

Tokenization consists in transforming a stream of characters into a stream of processing units, called tokens, e.g. words or phrases. Thus, during this step, by choosing n-grams as tokens (with n up to 2), each tweet is converted into a set of tokens, according to the BOW representation. Stop-word filtering consists in removing stop-words, i.e. words that provide little or no useful information to the text analysis and can therefore be considered noise. For the purpose of our analysis, we opted to retain all the verbal forms and the words "non" (not) and "contro" (against), since they proved to be relevant for the stance detection task. Stemming is the process of reducing each token (i.e. word) to its stem or root form, so as to map to the same stem

3https://github.com/twintproject/twint

4https://github.com/Jefferson-Henrique/GetOldTweets-java/


Figure 6.5: Steps of the text elaboration (second module) applied to a sample tweet. Figure from (D’Andrea et al., 2019).

words that have closely related semantics. Stem filtering consists in filtering out the stems that are not considered relevant in the training dataset for the supervised learning stage. Feature representation consists in building, for each tweet, the corresponding vector of numeric features, in order to represent all the tweets in the same F-dimensional feature space. To this aim we have resorted to the well-known TF-IDF index.
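The tokenization, stop-word filtering, stemming and TF-IDF steps above can be sketched with scikit-learn's TfidfVectorizer and a custom analyzer. This is a toy sketch under our own assumptions: the stop-word list is a tiny sample that deliberately omits "non" and "contro", and the suffix-stripping function is a crude stand-in for a real Italian stemmer (e.g. Snowball).

```python
from sklearn.feature_extraction.text import TfidfVectorizer

STOPWORDS = {"il", "lo", "la", "i", "gli", "le", "di", "a", "da", "che", "e"}
# "non" and "contro" are deliberately NOT in the stop-word list.

def toy_stem(token):
    # crude placeholder: strip some common Italian inflection endings
    for suffix in ("zione", "zioni", "are", "ato", "ati", "ata", "ate", "i", "e", "o", "a"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def analyzer_tokens(text):
    tokens = [toy_stem(t) for t in text.split() if t not in STOPWORDS]
    # emit unigrams and bigrams (BOW with n-grams, n up to 2)
    return tokens + [" ".join(p) for p in zip(tokens, tokens[1:])]

tweets = ["i vaccini non causano autismo", "contro la vaccinazione obbligatoria"]
vectorizer = TfidfVectorizer(analyzer=analyzer_tokens)
X = vectorizer.fit_transform(tweets)
print(X.shape)  # (2, 12) with this toy data: each row is a TF-IDF vector
```

Each tweet is thus mapped to a numeric TF-IDF vector over the shared stem vocabulary, as required by the feature representation step.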

The supervised learning stage: text classification

The classification model assigns to each tweet, now represented as a numeric vector, a possible class label in the set {in favor of vaccination, not in favor of vaccination, neutral}.

The training of the classification model, as well as the stem filtering and the feature representation steps, encompasses a supervised learning stage. To this aim, we resort to a collection of Ntr labelled tweets as a training set. The training tweets were fetched using a set of context-related keywords as search criteria, and were preprocessed as described in the previous section. Then, each tweet of the training set went through the following text mining steps: tokenization, stop-word filtering, stemming and feature representation in R^Q, where Q is the number of stems contained in the training tweets. Finally, a feature selection algorithm was applied in order to select the set of relevant stems: stems are ranked in descending order of the Information Gain (IG) value (Patil and Atique, 2013) between the corresponding features and the possible class labels,


and the first F are selected, with F ≤ Q. We experimented with different values of F. Consequently, each feature vector is reduced to a representation in R^F.
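A minimal sketch of this selection step follows, using scikit-learn's mutual_info_classif (mutual information coincides with the Information Gain between a feature and the class) to rank the Q features and SelectKBest to keep the top F. The data below is synthetic, not from the paper.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.random((100, 50))          # 100 tweets, Q = 50 stem features
y = (X[:, 3] > 0.5).astype(int)    # class depends only on feature 3

F = 10
score_func = lambda X, y: mutual_info_classif(X, y, random_state=0)
selector = SelectKBest(score_func, k=F).fit(X, y)
X_reduced = selector.transform(X)  # representation in R^F
print(X_reduced.shape)             # (100, 10)
assert selector.get_support()[3]   # the informative stem survives the cut
```

The same fitted selector is then applied to every new tweet, so that training and test vectors live in the same F-dimensional space.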

Lastly, the supervised classification model is trained. In our system, we adopted an SVM classification model (Keerthi et al., 2001), which has been used successfully for text classification in the literature (D’Andrea et al., 2015; Mohammad et al., 2017; Tellez et al., 2017a,b).

6.4 The vaccination topic case study: Experimental campaign

Dataset extraction and annotation

The adoption of a supervised learning approach required the collection and labeling of a set of vaccine-related tweets. For this purpose we specified, as search criteria, the language, the period of interest, and a set of vaccine-related keywords, chosen based on a preliminary analysis consisting of: (i) the reading of newspaper articles about vaccines and vaccine-related events, and (ii) interviews with medical experts. As date of posting, we considered a time span of five months, from September 1st, 2016, to January 31st, 2017. This time span was chosen as a period of intense discussion about vaccination in Italy. The keywords employed refer to different sub-topics: (i) the vaccination topic itself; (ii) diseases possibly caused by negative effects attributed to vaccines; and (iii) vaccine-preventable diseases. Further, we also queried for three widely used hashtags, namely, #libertadiscelta (hashtag for

"freedom of choice"), #iovaccino (hashtag for "I vaccinate"), and #novaccino (hashtag for "no vaccine"). Based on these criteria, we took into account 38 keywords (includ-ing synonyms, or s(includ-ingular/plural variations of the keywords). The set of keywords employed is listed in Table 6.1. It is worth underlining that we are interested in capturing the stance of users towards the vaccination topic in general: by adopting a supervised learning approach, the distinctive patterns of the three stance classes are automatically learned from the training set, regardless of the specific sub-topic.

The training set was finally built by randomly selecting and manually labelling Ntr = 693 tweets, including 219 tweets of class not in favor of vaccination, 255 tweets of class in favor of vaccination, and 219 tweets of class neutral. Table 6.2 summarizes the most relevant annotation rules.

Classification model selection: Experimental comparison

With the aim of selecting the most suitable learning scheme, an extensive experimental comparison was performed. To assess the generalization capability of each scheme we performed a 10-fold stratified cross validation (CV) procedure. Since


Context: Vaccination topic
    "complotto vaccini" (vaccines conspiracy); "copertura vaccinale" (vaccination coverage); "vaccini", "vaccino" (vaccine(s)); "big pharma"; "rischio vaccinale", "rischi vaccinali" (vaccine risk(s)); "vaxxed"; "trivalente" (trivalent); "esavalente" (hexavalent); "vaccinati", "vaccinata", "vaccinato", "vaccinate" (vaxxed); "quadrivalente" (quadrivalent); "vaccinazione", "vaccinazioni" (vaccination(s)); "libertà vaccinale" (vaccination freedom); "obiezione vaccinale" (vaccination objection); "età vaccinale" (vaccination age); "cocktail vaccinale" (vaccination cocktail); "controindicazioni vaccinali" (vaccine contraindications)

Context: Negative effects attributed to vaccines
    "paralisi flaccida" (flaccid paralysis); "autismo" (autism); "malattie autoimmuni" (autoimmune diseases); "evento avverso", "eventi avversi" (adverse event(s))

Context: Vaccine-preventable diseases
    "meningite" (meningitis); "morbillo" (measles); "rosolia" (rubella); "parotite" (mumps); "pertosse" (whooping cough); "poliomelite" (polio); "varicella" (varicella); "MPR" (Italian acronym for measles, mumps, rubella)

Context: Hashtags
    #novaccino (hashtag for "no vaccine"); #iovaccino (hashtag for "I vaccinate"); #libertadiscelta (hashtag for "freedom of choice")

Table 6.1: Set of keywords (with corresponding English translation) used to fetch tweets.

Not in favor class

• A negative opinion about vaccination is expressed

• An exhortation is made not to vaccinate oneself or one's children

• Diseases and adverse effects are attributed to vaccines

• A connection between autism and vaccines is claimed

• An economic interest of pharmaceutical companies is claimed to be the driving force behind vaccination policies

• Opposition to mandatory vaccination is expressed

• Freedom of choice on vaccines is advocated

In favor class

• A positive opinion about vaccination is expressed

• An exhortation is made to vaccinate oneself or one's children

• A positive opinion about mandatory vaccination is expressed

• Opposition to the arguments advanced by the not in favor class is expressed

Neutral class

• A piece of news related to the vaccination topic is reported

• A neutral opinion about vaccination is expressed

Table 6.2: Annotation rules for the stance classes.


the partition of the training set into folds is a random process, we repeated the CV procedure twice using two different seed values. We recall that at each iteration of the procedure we learned the parameters for both text representation and classification on a specific training set, made up of the union of nine out of the ten folds.
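The evaluation protocol above can be sketched as follows: 10-fold stratified CV repeated with two seeds, where text representation and classifier are re-fitted inside each fold because they are wrapped in one pipeline. The data is synthetic; the exact models and seeds are our assumptions.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

tweets = [f"tweet di esempio numero {i}" for i in range(60)]
labels = ["favor", "against", "neutral"] * 20   # balanced toy labels

model = make_pipeline(TfidfVectorizer(), LinearSVC())
accuracies = []
for seed in (0, 1):  # two repetitions of CV with different seeds
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    accuracies.extend(cross_val_score(model, tweets, labels, cv=cv))
print(len(accuracies))  # 20 accuracy values, the distribution later used in Table 6.4
```

Fitting the whole pipeline inside cross_val_score guarantees that vocabulary, IDF weights and stem selection never leak from test folds into training.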

As anticipated in Section 6.3, we tested three families of approaches, which proved to be effective in the literature for stance detection and kindred tasks: (i) BOW text representation followed by classical machine learning algorithms for classification; (ii) a combination of BOW and word embeddings for text representation followed by classical machine learning algorithms for classification; (iii) deep learning-based approaches for text elaboration and classification.

In the following we describe the setup for the three categories of approaches.

BOW + traditional machine learning As regards the first family, we tried different text representation schemes and several classification models. For text representation, we also tested different tokenization methods and pipelines without the stemming stage. The following classification models have been tested: C4.5 decision tree (Quinlan, 2014), Naïve Bayesian (NB) (John, 1995), Multinomial NB (MNB) (McCallum et al., 1998), Random Forest (RF) (Breiman, 2001), Simple Logistic (SL) (Landwehr et al., 2005), and SVM (Platt, 1999). The best results were obtained with the scheme described in Section 6.3, encompassing BOW for text representation and support vector machine for classification. In the general comparison we considered the models both with and without the feature selection step: in the following they are denoted as BOW+SVM_ALL (9529 features on average) and BOW+SVM_2000 (2000 features), respectively. For the sake of brevity, the results for the other classification models are not reported in the following.

BOW + word embedding + traditional machine learning The second approach is inspired by that adopted in (Mohammad et al., 2017), where the authors compared a number of state-of-the-art schemes for stance detection and achieved the best absolute results in a stance detection competition. The numeric representation obtained through the BOW scheme was extended by using word embeddings as extra features. In (Mohammad et al., 2017) the word embedding model was trained on a domain-related corpus of English tweets. However, according to (Yang et al., 2018), millions of tweets may be required to effectively train a new word embedding model: the lack of such a huge amount of domain-related tweets in Italian prompted us to adopt three publicly available pre-trained word embeddings (namely Fast-Text, Glove and Word2Vec), in which word vectors are defined in R^300. We verified that some recent works have successfully adopted pre-trained word embeddings for text classification (Wang et al., 2016; Uysal and Murphey, 2017). Thus,


we experimented with three schemes, denoted as BOW+FAST-TEXT+SVM, BOW+GLOVE+SVM and BOW+W2V+SVM, respectively.
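One common way to build such a hybrid representation is to concatenate the TF-IDF (BOW) vector with the average of the pre-trained word vectors of the tweet's tokens; the paper does not spell out the exact combination, so the sketch below is our assumption, with a small random lookup table standing in for Fast-Text/Glove/Word2Vec in R^300.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

DIM = 300
rng = np.random.default_rng(0)
embeddings = {}  # stand-in for a pre-trained model: word -> vector in R^300

def embed(tweet):
    # average the word vectors of the tweet's tokens (unknown words get a
    # random vector here, only so that the toy example is self-contained)
    vecs = [embeddings.setdefault(w, rng.standard_normal(DIM)) for w in tweet.split()]
    return np.mean(vecs, axis=0)

tweets = ["i vaccini salvano vite", "no al vaccino obbligatorio"]
vectorizer = TfidfVectorizer()
bow = vectorizer.fit_transform(tweets).toarray()
X = np.hstack([bow, np.vstack([embed(t) for t in tweets])])
print(X.shape)  # (2, number of BOW features + 300)
```

The resulting matrix X is then fed to the same SVM used for the plain BOW scheme.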

Deep learning Finally, we also experimented with two popular schemes for text representation and classification based on deep learning. Both schemes adopt word embeddings for text representation. Convolutional Neural Networks (CNN) and Long Short-Term Memory Networks (LSTM) are employed for the classification stage. The adopted models are inspired by the network architectures presented in (Cliche, 2017).

Albeit with different parametrizations, similar solutions have been exploited in recent works (Wang et al., 2016; Uysal and Murphey, 2017; Yang et al., 2018; Xiong et al., 2018). The models were implemented using the Python Keras library⁵. The preprocessed tweets were converted into sequences of tokens by using the Keras tokenizer and padded to a fixed length of 80 with a special pad token. Also in this case, we adopted the pre-trained word embeddings Fast-Text, Glove and Word2Vec, which cover roughly 2500 of our words each. The dimension of the word embedding space is equal to 300 for the three word embedding models. Since in our dataset we found 2776 different tokens, new words were initialized with random samples from a uniform distribution over [0, 1), according to the outcomes presented in (Yang et al., 2018). As regards the classification models, we carried out an extensive experimental campaign to identify the most suitable architectures and the best-performing training parameters.
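The input preparation just described (what the Keras Tokenizer and pad_sequences do) can be reproduced in plain numpy, which we do here to keep the sketch dependency-free: tokens are mapped to integer ids, sequences are padded to length 80, and the embedding matrix gives out-of-vocabulary words uniform [0, 1) vectors. The tiny vocabulary and the stand-in pre-trained table are our own toy data.

```python
import numpy as np

MAX_LEN, DIM = 80, 300
pretrained = {"vaccini": np.zeros(DIM)}  # stand-in for a pre-trained embedding

tweets = ["i vaccini salvano vite", "no al vaccino obbligatorio"]
# integer ids starting at 1; id 0 is reserved for the pad token
vocab = {w: i + 1 for i, w in enumerate(dict.fromkeys(" ".join(tweets).split()))}

rng = np.random.default_rng(0)
emb_matrix = np.zeros((len(vocab) + 1, DIM))  # row 0 is the pad token
for word, idx in vocab.items():
    # known words get their pre-trained vector, new words random uniform [0, 1)
    emb_matrix[idx] = pretrained.get(word, rng.uniform(0.0, 1.0, DIM))

def to_padded_ids(tweet):
    ids = [vocab[w] for w in tweet.split()]
    return ids + [0] * (MAX_LEN - len(ids))  # pad with id 0 up to length 80

X = np.array([to_padded_ids(t) for t in tweets])
print(X.shape, emb_matrix.shape)  # (2, 80) (9, 300)
```

The matrix emb_matrix is what would be loaded into the (non-trainable or fine-tuned) Keras Embedding layer.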

As regards the CNN, we defined three convolutional layers (Conv1D with ReLU activation) characterized by three different filter sizes in {3, 4, 5} and 100 filtering matrices each. The number of hidden neurons was set to 30. A dropout layer was added after the pooling layer and the hidden fully connected layer with the aim of reducing overfitting. As regards the LSTM, the bidirectional layer consisted of two 100-cell LSTM layers. The 200 final hidden states were concatenated and fed into a fully connected layer of 30 units. Dropout was added in the LSTM layers and after the fully connected hidden layer.
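To make the CNN branch concrete, here is a shape-level numpy stand-in (random, untrained weights; not the actual Keras model): three Conv1D blocks with filter sizes 3, 4, 5 and 100 filters each, global max pooling, a 30-unit hidden layer, and a 3-unit softmax output.

```python
import numpy as np

rng = np.random.default_rng(0)
seq = rng.random((80, 300))  # one padded tweet: 80 tokens x 300-dim embeddings

def conv_branch(x, size, n_filters=100):
    w = rng.standard_normal((size, x.shape[1], n_filters))
    # slide a window of `size` tokens over the sequence
    windows = np.stack([x[i:i + size] for i in range(x.shape[0] - size + 1)])
    conv = np.maximum(np.einsum("wsd,sdf->wf", windows, w), 0.0)  # Conv1D + ReLU
    return conv.max(axis=0)  # global max pooling -> 100 values per branch

pooled = np.concatenate([conv_branch(seq, s) for s in (3, 4, 5)])   # 300 values
hidden = np.maximum(pooled @ rng.standard_normal((300, 30)), 0.0)   # 30 hidden units
logits = hidden @ rng.standard_normal((30, 3))
logits -= logits.max()                                              # numeric stability
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the 3 stance classes
print(pooled.shape, probs.shape)  # (300,) (3,)
```

The dropout layers of the real model are omitted here, since they only matter during training.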

Both models were trained by minimizing the categorical cross-entropy loss at the output softmax layer, which consists of 3 neural units. We adopted the Adam optimizer with a learning rate of 0.001 and a batch size of 128. In order to find the best hyperparameter configuration for the LSTM and CNN models separately, we performed a grid search over different values of the dropout rate, different numbers of epochs and the different pre-trained word embeddings. We selected the models that achieved the highest accuracy on our dataset, using a 10-fold cross validation. The best configuration for the CNN was the one with the Fast-Text word embedding, a dropout rate equal to 0.2 and 40 epochs of training. The best configuration for the LSTM was the one with the Word2Vec word embedding, a dropout rate equal to 0.4 and 60 epochs of training. In the following, we denote these two schemes as Fast-Text+CNN and W2V+LSTM. However, we will also show the best results achieved by adopting the three word embedding schemes combined with the CNN and the LSTM. Thus, we will also consider the following schemes: GLOVE+CNN, W2V+CNN, Fast-Text+LSTM and GLOVE+LSTM.

5https://keras.io/

Results and discussion

The models described in the previous section have been evaluated in terms of widely-used metrics (Forman, 2003), namely, accuracy, precision, recall, F-measure, and Area Under the Curve (AUC).

Table 6.3 shows the average results achieved by the different methods discussed above. It is worth noting that the methods based on deep learning achieve the worst results; the remaining methods achieve similar results, even though BOW+SVM_ALL shows the highest accuracy. The relatively poor performance of the deep learning methods may be explained by the low data regime of our experimental setting. According to previous studies (Uysal and Murphey, 2017), this outcome is not completely unexpected: deep architectures have proven to be a successful approach in many areas, but they typically require large datasets for learning. In our application, also because of the limited number of tweets available in Italian and the known issues related to the short length of each data item, the generation of a large training set suitable for deep architectures would be very tedious and almost unfeasible, and would make the application itself not very appealing.

Based on the accuracy values obtained with 10-fold CV by the eleven classification models, we performed a statistical comparison of the results, aimed at verifying whether there exists any statistically significant difference among their performances. As suggested in (Derrac et al., 2011), we applied a non-parametric statistical test, namely the Wilcoxon signed-rank test (Wilcoxon, 1945), which detects significant differences between two distributions: BOW+SVM_ALL is selected as control model and compared with each of the remaining models. Each distribution consists of 20 accuracy values obtained with the two repetitions of CV. In all the tests, we used α = 0.05 as level of significance. Results are reported in Table 6.4: R+ and R- denote, respectively, the sum of ranks for the folds in which the first model outperformed the second, and the sum of ranks for the opposite condition. The statistical hypothesis of equivalence is rejected whenever the p-value is lower than α; otherwise, we cannot reject the null hypothesis. Thus, BOW+SVM_ALL statistically outperforms the other models, except BOW+SVM_2000, BOW+FASTTEXT+SVM and BOW+W2V+SVM. We can conclude that these four schemes are the most suitable for our case study, the classification of stance towards the vaccination topic expressed by Twitter users in Italy. For the rest of our analysis we employ the BOW+SVM_2000 model, since it is the simplest one (it does not require additional features) and


Classifier          Class          F-measure  Precision  Recall  AUC   Accuracy
BOW+SVM_ALL         not in favor   0.60       62.6%      56.6%   0.73  65.4%
                    in favor       0.65       64.5%      65.5%   0.74
                    neutral        0.71       68.6%      74.0%   0.80
BOW+SVM_2000        not in favor   0.59       61.5%      56.2%   0.73  64.8%
                    in favor       0.64       63.2%      63.9%   0.74
                    neutral        0.72       69.4%      74.4%   0.81
BOW+FASTTEXT+SVM    not in favor   0.59       57.9%      60.3%   0.75  64.2%
                    in favor       0.73       73.3%      72.6%   0.82
                    neutral        0.61       62.1%      60.4%   0.72
BOW+GLOVE+SVM       not in favor   0.56       59.5%      53.0%   0.74  62.2%
                    in favor       0.70       66.9%      72.1%   0.79
                    neutral        0.61       59.9%      61.6%   0.71
BOW+W2V+SVM         not in favor   0.59       61.1%      56.6%   0.73  63.7%
                    in favor       0.72       68.9%      74.9%   0.81
                    neutral        0.60       60.7%      60.0%   0.72
FASTTEXT+CNN        not in favor   0.57       57.8%      57.9%   0.69  62.9%
                    in favor       0.63       64.3%      62.7%   0.70
                    neutral        0.68       69.6%      68.0%   0.77
GLOVE+CNN           not in favor   0.55       54.8%      56.6%   0.67  60.5%
                    in favor       0.63       64.5%      62.4%   0.71
                    neutral        0.63       65.1%      62.2%   0.73
W2V+CNN             not in favor   0.57       57.2%      58.0%   0.69  62.5%
                    in favor       0.62       62.9%      61.6%   0.70
                    neutral        0.69       70.4%      68.1%   0.77
FASTTEXT+LSTM       not in favor   0.55       54.6%      58.4%   0.67  61.2%
                    in favor       0.63       61.5%      63.6%   0.70
                    neutral        0.66       72.7%      61.1%   0.75
GLOVE+LSTM          not in favor   0.56       55.5%      58.5%   0.68  61.8%
                    in favor       0.62       62.2%      63.2%   0.70
                    neutral        0.67       73.3%      63.4%   0.76
W2V+LSTM            not in favor   0.57       56.6%      59.8%   0.68  61.9%
                    in favor       0.59       59.3%      62.0%   0.69
                    neutral        0.69       76.2%      63.9%   0.77

Table 6.3: Results obtained by the approaches discussed in the text using 10-fold cross validation.


achieves results that are comparable with (even slightly better than) the ones achieved by the recent state-of-the-art method introduced in (Mohammad et al., 2017).

Comparison                                R+    R-    p-value   Hypothesis
BOW+SVM_ALL vs. BOW+SVM_2000              27    18    0.528926  Not rejected
BOW+SVM_ALL vs. BOW+FASTTEXT+SVM          35.5  19.5  0.386271  Not rejected
BOW+SVM_ALL vs. BOW+GLOVE+SVM             55    0     0.003842  Rejected
BOW+SVM_ALL vs. BOW+W2V+SVM               30    15    0.343253  Not rejected
BOW+SVM_ALL vs. FASTTEXT+CNN              42    3     0.016172  Rejected
BOW+SVM_ALL vs. GLOVE+CNN                 53    2     0.007267  Rejected
BOW+SVM_ALL vs. W2V+CNN                   43    2     0.012851  Rejected
BOW+SVM_ALL vs. FASTTEXT+LSTM             55    0     0.003842  Rejected
BOW+SVM_ALL vs. GLOVE+LSTM                41    4     0.022327  Rejected
BOW+SVM_ALL vs. W2V+LSTM                  53    2     0.007267  Rejected

Table 6.4: Results of the Wilcoxon signed-rank test on the accuracies obtained on the test set.
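The pairwise comparison behind Table 6.4 can be sketched with SciPy's implementation of the Wilcoxon signed-rank test. The accuracy values below are synthetic stand-ins for the 20 paired CV accuracies of two schemes.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
# 20 paired accuracy values: a control scheme and a slightly weaker competitor
acc_control = 0.654 + rng.normal(0.0, 0.02, 20)   # e.g. BOW+SVM_ALL
acc_other = acc_control - 0.03 + rng.normal(0.0, 0.01, 20)

stat, p_value = wilcoxon(acc_control, acc_other)
print(p_value < 0.05)  # reject the equivalence hypothesis at alpha = 0.05 when True
```

With a consistent 3-point gap over 20 folds, the test rejects the hypothesis of equivalence, mirroring the "Rejected" rows of Table 6.4.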

Online Monitoring

The final goal of our stance detection system is to uncover the trend of the public opinion over time. In this regard, we performed a preliminary monitoring analysis by considering a time span of 10 months from September 1st, 2016 to June 30th, 2017.

Notably, we used the same set of keywords reported in Table 6.1. Overall, we fetched N = 112,397 tweets, from which we removed those used as training samples. The remaining tweets were preprocessed according to the pipeline described in Section 6.3, and classified with the model trained on the whole training set. Overall, considering the whole 10-month monitoring campaign, the opinion is slightly biased towards in favor of vaccination: about 19% of the tweets are in favor, about 64% are neutral, and about 17% are not in favor of vaccination. However, such an aggregated result cannot grasp the dynamics of opinion over time. Indeed, we have reduced the granularity of the analysis: Fig. 6.6 illustrates an overview of the daily distribution of tweets by class as a stacked histogram. As a first outcome, we can notice that the graph presents some spikes in the overall volume of tweets: by analyzing the news and press reviews of those days, we have discovered the presence of specific socio-political events related to vaccination. Event details are summarized in Table 6.5, and the progressive IDs are also reported near the peaks in Fig. 6.6.
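The daily aggregation behind a figure like Fig. 6.6 is a simple group-by over date and predicted class; a minimal pandas sketch on synthetic classified tweets follows (column names and data are our own).

```python
import pandas as pd

df = pd.DataFrame({  # synthetic stand-in for the classified tweet stream
    "date": pd.to_datetime(["2016-09-01", "2016-09-01", "2016-09-02", "2016-09-02"]),
    "stance": ["in favor", "neutral", "not in favor", "neutral"],
})
# count tweets per day and per class; one row per day, one column per class
daily = df.groupby([df["date"].dt.date, "stance"]).size().unstack(fill_value=0)
print(daily)
# daily.plot(kind="bar", stacked=True) would then draw the stacked daily histogram
```

The same table, resampled by month instead of day, backs the monthly view of Fig. 6.8.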


ID  Date        Event description

1   2016-09-28  Cancellation of the screening of the documentary film "Vaxxed: from cover-up to catastrophe" in the Italian Senate

2   2016-10-04  Scheduled screening of the documentary film "Vaxxed: from cover-up to catastrophe" in the Italian Senate

3   2016-10-24  Speech by the President of the Italian Republic about vaccines

4   2016-11-22  Approval of the law establishing vaccination requirements for school children in the Emilia-Romagna Region

5   2016-12-28  Death of a school teacher from meningitis in Rome

6   2017-01-26  Agreement between the Italian Health Minister and the Italian Regions about vaccination requirements

7   2017-02-07  Cancellation of the screening of the documentary film "Vaxxed: from cover-up to catastrophe" at the European Parliament

8   2017-03-16  230% increase in measles cases in Italy

9   2017-04-17  Italian TV show Report episode on vaccines causes controversy

10  2017-04-19  Fake vaccinations in the Italian city of Treviso

11  2017-05-03  Fake vaccinations in the Friuli Region

12  2017-05-04  NY Times article against an Italian political party opposing vaccines

13  2017-05-04  Five-fold increase in measles cases in Italy in April 2017

14  2017-05-19  Approval of the decree on vaccination requirements (12 vaccines) in Italian kindergartens

15  2017-06-07  The President of the Italian Republic signs the decree on the 12-vaccination requirement in Italian schools

16  2017-06-22  Child with leukemia died of measles in Monza

Table 6.5: Real-world context-related events.

Figure 6.6: Daily number of tweets by class (from September 1st, 2016 to June 30th, 2017). Figure from (D’Andrea et al., 2019).


The effect of a triggering event on the volume of tweets may be more or less pronounced depending on the scope of the event and on its perception by Twitter users: for instance, event 4, events 9 and 10, and event 16 stirred up the discussion so that more than 2000 daily tweets were collected. Thus, we deemed it appropriate to deepen the analysis in the days of these events: specifically, we selected the tweets in the first two or three days after the occurrence of an event, based on a visual inspection of Fig. 6.6, and evaluated their distribution over the three classes.

Due to the proximity of the dates of events 9 and 10, they were merged into a single aggregated event. Fig. 6.7 reports the pie plots of the distribution of the stance for the considered events. During event 4, 3566 tweets were collected: most tweets do not contain an opinionated message (about 59%), whereas the tweets in favor of vaccination (about 23%) slightly exceed those not in favor (about 18%). The overall stance is neutral during the aggregated events 9 and 10 (22% in favor, 22% not in favor) and considerably biased towards in favor of vaccination during event 16 (36% in favor, 25% not in favor).

Figure 6.7: Distribution of opinion polarity over the classes during events 4, 9-10, and 16. Figure from (D’Andrea et al., 2019).

Fig. 6.8 shows the distribution of tweets over the three classes per month: the number of opinionated tweets is rather low at the beginning of the monitoring campaign (about 70% are neutral) and grows considerably in Spring 2017. A visual analysis of Fig. 6.6 confirms that not only the overall volume, but also the number of tweets expressing a subjective opinion grew during this period, due to the occurrence of many context-related events.

6.5 Outcomes and Perspectives

The design, development and deployment of a system for stance classification from tweets have been presented in this chapter. As a first outcome, the most suitable scheme for text processing, representation and classification has been selected through an extensive experimental analysis for model selection: tweets are classified


Figure 6.8: Distribution of opinion polarity over the classes by month. Figure from (D’Andrea et al., 2019).

adopting the BOW representation of the texts, using n-grams as tokens, followed by an SVM model. We experimentally showed that, for the specific context of stance detection on Twitter, the adopted text classification scheme outperforms recent state-of-the-art approaches, including text classification models based on deep learning.

Furthermore, the devised classification model achieves results that are comparable with the ones reported in the stance detection literature. As a second outcome, the intelligent system has been deployed on the Twitter stream to monitor and track shifts in Italian public opinion about vaccinations over a time span of 10 months.

We have found that the volume of tweets and the polarity of the opinion often vary according to context-related socio-political events, which may influence the public opinion itself. Indeed, early detection of an opinion shift may be of the utmost importance for Public Healthcare Organizations in order to promote actions aimed at avoiding outbreaks of eradicated diseases.

Nevertheless, the presented case study deserves further investigation. On the one hand, whenever the goal is to probe the public opinion on a certain topic, simply relying on plain text analysis may lead to partial or inaccurate outcomes: rather, a more realistic, broad and accurate estimation of the public opinion can be obtained by exploiting user-based data aggregations. On the other hand, the issue of concept drift should be taken into account, since we are dealing with continuous classification of a particular form of data stream (i.e. the stream of tweets). The initial classification model, trained using data extracted in a specific time interval, may turn
