The previous chapter analysed the relationship between users and hashtags; this one summarises a study, carried out by a group of researchers, of the relationships among the hashtags themselves.
The main goal of the work by Wang12 is to develop and test new tools for sentiment analysis on Twitter. Analysing opinions on a given topic is one of the objectives, and above all one of the hopes, of anyone who has to communicate on the web or wants to monitor the public perception of a subject.
Consider Apple wanting to know how its product is judged by the Twitter crowd, or President Obama wanting to build his next electoral campaign on the web. In cases like these it would be extremely useful for the interested parties to understand how the "product" is perceived on the network, so as to make the right communication choices.
From reading articles on a given subject one usually expects to grasp, more or less, the opinions about it, or at least the opinions that the gatekeepers, the people who decide what is important, want to let through.
In the cases described above, what is needed is an analysis of opinion trends over a period of time.
Wang's work sets out to exploit the distinctive feature of the tweet: the hashtag.
On Twitter, a hashtag is a community-driven convention for adding context and metadata to a message. Hashtags are created by users as a way to highlight topics and categorise messages.
12 Topic Sentiment Analysis in Twitter: A Graph-based Hashtag Sentiment Classification Approach
This feature makes Twitter feel much more expressive than other social networks. Wang's work uses a corpus of 600,000 tweets, of which only 14.6% contain at least one hashtag.
Wang classifies hashtags into three categories:
• Topic (#iphone)
• Sentiment (#love)
• Sentiment-topic (#iloveobama)
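A naive way to approximate this three-way split is a keyword lookup. In the sketch below, the topic and sentiment word lists and the function name are illustrative assumptions, not the lexicon or method used in the paper.

```python
# Hypothetical sketch: assign a hashtag to one of the three categories by
# checking its body against small topic and sentiment word lists.
# Both word lists are illustrative assumptions, not the paper's lexicon.

TOPIC_WORDS = {"iphone", "ipad", "obama"}
SENTIMENT_WORDS = {"love", "hate", "suck"}

def categorize(hashtag):
    body = hashtag.lstrip("#").lower()
    has_topic = any(w in body for w in TOPIC_WORDS)
    has_sentiment = any(w in body for w in SENTIMENT_WORDS)
    if has_topic and has_sentiment:
        return "sentiment-topic"   # e.g. #iloveobama
    if has_sentiment:
        return "sentiment"         # e.g. #love
    return "topic"                 # treat the rest as topic tags

print(categorize("#iphone"))      # topic
print(categorize("#love"))        # sentiment
print(categorize("#iloveobama"))  # sentiment-topic
```

A real system would of course need far larger lexicons; the substring check is only meant to show why #iloveobama lands in the third, most expressive category.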
Wang considers the third group the most expressive, since it comprises the hashtags that fuse the topic with the judgement on it.
Wang's key idea is:
«Aggregate hashtags according to the polarity of the already-classified messages in which they appear»
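As a minimal sketch of this aggregation (the data layout and function name are my own assumptions), each hashtag simply inherits the majority polarity of the classified tweets it occurs in:

```python
from collections import Counter

# Minimal sketch of the voting baseline: a hashtag inherits the majority
# polarity of the (already classified) tweets in which it occurs.
# The input layout is an illustrative assumption.

def vote_polarity(labels_by_hashtag):
    """labels_by_hashtag maps a hashtag to a list of 'pos'/'neg' labels,
    one per classified tweet containing it."""
    polarity = {}
    for tag, labels in labels_by_hashtag.items():
        counts = Counter(labels)
        polarity[tag] = "pos" if counts["pos"] >= counts["neg"] else "neg"
    return polarity

print(vote_polarity({"#ipad": ["pos", "pos", "neg"],
                     "#isuck": ["neg", "neg"]}))
# {'#ipad': 'pos', '#isuck': 'neg'}
```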
Unfortunately this approach does not give striking results, since it rests on automatic tweet classification, which today is still far from accurate. Indeed, the main aim of such work should be to improve on current methods so as to provide new information; if the new model is trained to imitate the old one, we are going round in circles.
A second method, different from the original one, was therefore used to extract information from the hashtags and the correlations between them.
Wang observes that, in their dataset, when two hashtags co-occur in the same message the probability that they share the same polarity is over 80%. Another aspect Wang analyses is the literal meaning of a hashtag.
The model Wang proposes is therefore based on a graph of hashtag co-occurrences.
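Building such a co-occurrence graph is straightforward; in the sketch below (the sample tweets are invented for illustration), two hashtags are linked whenever they appear together in at least one message.

```python
import re
from itertools import combinations

# Sketch: build an undirected hashtag co-occurrence graph. Two hashtags
# are linked if they occur together in at least one tweet.
# The sample tweets are invented for illustration.

def hashtag_graph(tweets):
    edges = set()
    for tweet in tweets:
        tags = sorted(set(re.findall(r"#\w+", tweet.lower())))
        edges.update(combinations(tags, 2))  # every co-occurring pair
    return edges

tweets = ["Queueing for the #ipad #apple",
          "#ipad #love it already",
          "#obama on #healthcare today"]
print(hashtag_graph(tweets))
# {('#apple', '#ipad'), ('#ipad', '#love'), ('#healthcare', '#obama')}
```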
Wang then also builds an SVM (Support Vector Machine) based on the literal meaning of the hashtag, but that part will not be analysed here.
Given the set of hashtags H = {h1, h2, …, hn}, where each hashtag is associated with messages in the set T = {t1, t2, …, tn}, the task is to classify the polarity of each hashtag, Y = {y1, y2, …, yn}, assigning each one a value from the set {pos, neg}.
Given this objective and the graph, Wang tries to obtain the polarity of each node (hashtag) from diffusion over the network; this concept will be explored in the dedicated chapter.
The polarity of a hashtag is therefore determined not only by the tweets in which it appears, but also by its neighbours in the network.
Taking #ipad as an example: it has five neighbouring nodes with different polarities, negative in green and positive in red.
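Wang formulates this collective decision as a pairwise Markov network over the graph. As a much simpler stand-in, the sketch below iteratively mixes each hashtag's own tweet-based positive score with the average score of its neighbours; the 0.5 mixing weight and the toy graph around #ipad are my own assumptions, not the paper's model.

```python
# Simplified stand-in for collective hashtag classification: each
# hashtag's P(pos) is repeatedly mixed with the average score of its
# neighbours in the co-occurrence graph. The mixing weight alpha and
# the toy graph around #ipad are illustrative assumptions.

def propagate(base_score, neighbors, alpha=0.5, iterations=20):
    score = dict(base_score)  # tweet-based P(pos) per hashtag
    for _ in range(iterations):
        new = {}
        for tag, nbrs in neighbors.items():
            nbr_avg = sum(score[n] for n in nbrs) / len(nbrs)
            new[tag] = alpha * base_score[tag] + (1 - alpha) * nbr_avg
        score = new
    return {tag: "pos" if s >= 0.5 else "neg" for tag, s in score.items()}

# #ipad is ambiguous on its own tweets (0.5), but its neighbourhood is
# mostly positive, so propagation tips it to positive.
base = {"#ipad": 0.5, "#love": 0.9, "#cool": 0.8, "#isuck": 0.1,
        "#fail": 0.2, "#apple": 0.7}
nbrs = {"#ipad": ["#love", "#cool", "#isuck", "#fail", "#apple"],
        "#love": ["#ipad"], "#cool": ["#ipad"], "#isuck": ["#ipad"],
        "#fail": ["#ipad"], "#apple": ["#ipad"]}
print(propagate(base, nbrs)["#ipad"])  # pos
```

This captures the qualitative behaviour (a node's label is pulled by its neighbours) without the potential functions and inference machinery of the actual model.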
Figure 1: An example of a Hashtag Graph Model
Figure 2: An example of the enhanced boosting classification setting in which strong sentiment hashtags only provide polar- ity influence to neighbors. Hashtags in red are positive label- fixed nodes and green are negative.
The data needed for the analysis were collected with a keyword-based approach. First, tweets about the topics of interest were retrieved, using the topic names themselves as keywords. From these messages further keywords relevant to the various topics were then extracted, in a subjective manner supported by lexical evidence (co-occurrences, etc.).
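This seed-and-expand collection step can be sketched as follows (the topic list and sample tweets are placeholders): hashtags containing a topic word are taken as seeds, and the set is then expanded with every hashtag that co-occurs with a seed.

```python
import re

# Sketch of the keyword-based collection: hashtags containing a topic
# word are used as seeds, then the set is expanded with every hashtag
# co-occurring with a seed in some tweet. Topics and sample tweets are
# placeholder assumptions.

def collect_hashtags(tweets, topics):
    tagged = [set(re.findall(r"#\w+", t.lower())) for t in tweets]
    all_tags = set().union(*tagged)
    seeds = {h for h in all_tags if any(tp in h for tp in topics)}
    expanded = set(seeds)
    for tags in tagged:
        if tags & seeds:           # tweet mentions a seed hashtag...
            expanded |= tags       # ...keep all its co-occurring hashtags
    return seeds, expanded

tweets = ["#ipad launch day #excited", "just #coffee and rain"]
seeds, expanded = collect_hashtags(tweets, ["ipad", "obama"])
print(seeds)     # {'#ipad'}
print(expanded)  # {'#ipad', '#excited'}
```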