Nel precedente capitolo è stata analizzata la relazione tra utente e hashtag, in questo invece si riassume un analisi portata avanti da alcuni studiosi sulle relazioni tra gli stessi hashtag .
L’obiettivo principale del lavoro di Wang12 è quello di sviluppare e testare
nuovi strumenti per il sentiment analysis su Twitter. L’analisi delle opinioni, su di un determinato topic (argomento), è uno degli obiettivi, ma soprattutto è una delle speranze, di chi deve comunicare sul web o di chi voglia monitorare la percezione comune di un determinato argomento.
Poniamo il caso della Apple che vuole sapere qual è il giudizio sul proprio prodotto, da parte del popolo di Twitter. Pensiamo al presidente Obama che vuole iniziare la sua futura campagna elettorale basandosi sulla rete. Sarebbe utilissimo in questi casi, per i diretti interessanti, comprendere la percezione del “prodotto” sulla rete per effettuare le giuste scelte di comunicazione.
Dalla lettura degli articoli su di un dato argomento solitamente ci si aspetta di comprendere più o meno le opinioni su di esso o per lo meno quelle che i GateKeeper, coloro che decidono cosa è importante, vogliono far passare.
Nei casi esposti in precedenza si rende necessaria un’analisi delle “Tendenze” di opinione su di un periodo.
Nel lavoro di Wang si prende come obiettivo lo sfruttamento della caratteristica unica del tweet, l’ hashtag.
Su Twitter un hashtag è una convenzione spinta dalla comunità per aggiungere contesto e metadati al messaggio.
Gli hashtag sono creati dagli utenti come un modo per mettere in luce argomenti e classificare i messaggi.
12 Topic Sentiment Analysis in Twitter: A Graph-‐based Hashtag Sentiment
Questa caratteristica fa percepire Twitter come molto più espressivo degli atri social network. Nel lavoro di Wang si usano 600’000 tweet. Solo il 14,6% di questi utilizza almeno un hashtag.
Wang classifica gli hashtag in tre categorie:
• Topic(#iphone) • Sentiment(#love)
• Sentiment topic(#iloveobama)
Wang ritiene che il terzo gruppo sia il più espressivo in quanto comprende gli hashtag che fondono insieme l’argomento con il giudizio.
L’idea principale di Wang è:
«Aggregare gli hashtag in base alla polarità dei messaggi già classificati in cui compaiono»
Purtroppo questo sistema non da risultati strabilianti in quanto si basa sulla classificazione automatica dei tweet che al giorno d’oggi non è eccezionale. Anzi il principale obiettivo di questo lavoro dovrebbe essere di migliorare le metodologie attuali per fornire nuova informazione, ma se il nuovo modello viene addestrato ad imitare il vecchio ci troviamo nella situazione del cane che si morde la coda.
E’ quindi stato utilizzato un secondo metodo, differente da quello originale, per estrarre informazione dagli hashtag e le correlazioni tra questi.
Wang osserva che nel loro dataset la probabilità che due hashtag della solita polarità ricorrano assieme nel medesimo messaggio è di oltre l’80%. Un altro aspetto che Wang analizza è il significato letterale di un hashtag.
Il modello da Wang proposto quindi si basa su un grafico delle co occorrenze di hashtag.
Successivamente Wang costruisce un SVM (Support Vector Machine) basato sul significato letterale dello hashtag, ma questo non verrà qui analizzato.
66 Dato l’insieme di hashtag H={h1,h2………hn} dove ognuno di questi è collegato con un messaggio nell’insieme T={t1,t2,t3…….tn}, classificare la polarità di ogni hashtag Y={y1,y2,…..yn} attribuendogli un valore contenuto nell’insieme {pos,neg}.
Dato l’obiettivo e dato il grafo Wang ha cercato di ottenere la polarità di ogni nodo(hashtag) basandosi sulla diffusione nella rete; questo concetto verrà approfondito nell’apposito capitolo.
Quindi la polarità di un hashtag non è solo figlia dei tweet in cui compare ma anche dai suoi vicini nella rete.
Prendendo come esempio #ipad, questo ha come vicini 5 nodi i quali hanno differenti polarità, in verde i negativi e in rosso i positivi.
timent classification for each tweet but also incorporate the link in- formation among hashtags and the literal meaning of them to solve the hashtag sentiment classification problem, which is expected to be more robust and reliable.
Recently, the opinion mining research has begun to pay more and more attention to social networks such as Twitter because they give rise to the massive user-generated publishing activities. In Twitter, a huge amount of tweets contain sentiment information. Barbosa and Feng [2] first investigat a two-stage SVM (subjectivity and po- larity) classifier which seems to be more robust regarding biased and noisy data. In this paper, we adopt this two-stage classifica- tion framework to build our tweet-level classifier. In Twitter, some unique characteristics can also be utilized for sentiment classifica- tion. Davidov et al. [5] employ hashtags and smileys as sentiment labels for classification to allow diverse sentiment types for short texts. In their another paper [6], they analyze the use of “#sarcasm” hashtags and addressed the problem of sarcastic tweets recognition. Jiang et al. [9] propose to take the target of sentiment into consid- eration in Twitter sentiment analysis, where the hashtags were also utilized as unigram features. Although the hashtag has become a key feature in many micro-blog services, to our best knowledge, our paper is the first to address the task of hashtag-level sentiment classification.
3. HASHTAG-LEVEL SENTIMENT CLAS- SIFICATION
We start this section with a formal definition for the task of hashtag-level sentiment classification2. Given a set of hashtags
H = {h1, h2, . . . , hm} where each hashtag hiis associated with
a set of tweets Ti = {τi1, τi2, . . . , τin}, we aim to collectively
infer the sentiment polarities, y = {y1, y2, . . . , ym} where yi ∈
{pos, neg}3, for H. We assume the hashtags in H are with senti- ments. The reason lies in that we are particularly interested in the hot hashtags (i.e. topics) which are usually accompanied with sen- timent since people tend to express rich sentiment information in their tweets towards these hot topics. The hashtag-level sentiment classification inherently bases upon the tweet-level sentiment anal- ysis results. Let CT be a tweet-level classifier where each tweet τ
can be assigned with positive or negative probability Prpos(τ ) and
Prneg(τ ), ensuring that Prpos(τ )+Prneg(τ ) = 1 to form a binary
probability distribution. Here, neutral tweets are ignored since they are not useful for the polarity prediction of hashtags. We develop CT using the state-of-the-art sentiment analysis method, which is
presented in details in Section. 4.2.
We can obviously induce the sentiment polarity yifor the hash-
tag hithrough aggregating the results from CT by a simple voting
strategy. This approach, as stated in Section. 4.3, takes the clas- sification for each hashtag independently. As seen in our experi- ments, the result is not promising. We have shown that hashtags co-occurring in tweets have much higher probability to share the same sentiment polarity than that if they are randomly selected. This observation clearly motivates us to conduct the hashtag-level sentiment classification collectively, which has been proven to be effective in link-based text classification [24, 20]. In the rest of this section, we will first introduce the hashtag graph model and then present the classification framework and the approximate al- gorithms for inference.
2For the sake of simplicity, we restrict our scenario within the con-
text of Twitter, although applying this framework to other micro- blogs where hashtags also exist is straightforward.
3Hereafter, we use pos and neg to represent positive and negative
label, respectively.
3.1 The Hashtag Graph Model
We define a hashtag graph HG = {H, E}, in which the edge set E consists of links between hashtags and each edge eijrepresents
an undirected link between hashtags hiand hj, which co-occur in
at least one tweet. Figure. 1 illustrates an example of the hashtag graph, in which hashtags are linked if and only if they co-occur at least once in tweets. Here we take the hastag “#obama” as an exam- ple. The surrounding hashtags are generally of three categories: (1) topics which is closely connected to Obama (e.g. “#president” and “#healthcare”, etc.); (2) sentiment hashtags which expresses sub- jective opinions towards Obama, like “#ideal”, “#leader” and (3) sentiment-topic hashtags which indicate the target and the senti- ment polarity simultaneously, such as “#iloveobama”. From this figure, as we can see, the neighbor hashtags more or less lend some sentiment tendency to “#obama”. Consequently, It would be unwise if we independently determine the sentiment polarity of each hashtag. Our graph model is aimed at incorporating the co-occurrence relationship and deciding sentiment polarity collec- tively.
Figure 1: An example of a Hashtag Graph Model
Given the hashtag graph, our ultimate goal is to assign each hash- tag hiwith a proper sentiment label yi ∈ {pos, neg}. We make
the Markov assumption that the determination of sentiment polar- ity yican only be influenced by either the content of correspond-
ing tweets τ ∈ Ti or sentiment assignments of neighbor hashtag
hjs.t.(hi, hj) ∈ E, which results in our HG a pairwise Markov
Network[21]. This leads us to the following factorized distribution: log (Pr (y|HG)) = ! hi∈H log (φi(yi|hi)) + ! (hj,hk)∈E log (ψj,k(yj, yk|hj, hk))− log Z (1)
where the first and second sums correspond to the potential func- tions of a tweet-based factor and a hashtag-hashtag factor. Z is the regularization factor. The potential function of tweet-based factor can be directly obtained through calculation of the polarity proba- bility for each corresponding tweet; while the hashtag-hashtag fac- tor potential function should incorporate the link information to
1033
Figure 2: An example of the enhanced boosting classification setting in which strong sentiment hashtags only provide polar- ity influence to neighbors. Hashtags in red are positive label- fixed nodes and green are negative.
To illustrate this better, we present an example in Figure. 2, where the hashtag “#ipad” has several strong sentiment neighbors such as “#love” and “#isuck”. In our enhanced boosting setting, these colored neighbors will not get involved in dynamic updating them- selves but only send polarity influence to surrounding neighbors. The propagation from “#ipad” to these colored neighbors will be neglected and blocked.
4. EXPERIMENTAL STUDY 4.1 Data Collection and Evaluation
The evaluation of the hashtag-level sentiment classification is challenging because it is difficult to collect the “golden standard” data set. Although human annotation is possible, we maintain that the workload is rather demanding for large scale evaluation data. What makes it more unreliable is that the satisfactory inter-annotator agreement cannot be achieved, with two contributing factors being that hashtags are often used in tweets with different sentiments, and the sentiment polarity of tweets cannot always be determined with confidence. Instead, in our experiments, to evaluate the per- formance of the hashtag sentiment classification and to collect the training data for enhanced boosting classification, we use a self- annotation manner to label the dataset.
The data collection process is described as follows. We first ran a coarse-grained selection to find hashtags that we are interested in. We picked 10 topics including “Obama”, “Bush”, “Lady Gaga”, “Justin Bieber”, “Islam”, “Lakers”, “Youtube”, “iPad”, “Android” and “Microsoft”. Then we searched from the tweets pool for hash- tags containing the topic words as our seeds. This seed set was hence expanded into our hashtag set H by retrieving all hashtags that has co-occurred with at least one of the seed hashtags. Finally, for the selected hashtags in H, we labeled hashtags containing sen- timent words4with appropriate sentiment polarity labels (pos, neg).
This subset of H, denoted by ˜H, is used as our label-fixed set for enhanced boosting classification and test set for evaluation to mea- sure the accuracy, precision, recall and F1 metrics. In addition, we
4In our experiments, we selected 50 strong positive and 50 strong
negative words as our sentiment lexicon
conduct a case study to illustrate some interesting results in Sec- tion. 4.6.
In our experiments, our tweets pool has about 0.6 million tweets which were collected in one week period from Twitter. After the seeds selection and data enrichment process, we obtain H consist- ing of 2,181 hashtags which occur in 29,195 tweets. The size of edge set E is 27,430. Selecting hashtags containing strong senti- ment words results in a subset ˜H containing 947 examples, which has 595 positive samples and 352 negative samples. The remaining hashtags in H do not have a automatic annotated groundtruth, but the classification of them can be evaluated through the case study. This dataset is used for measuring the performance of hashtag sen- timent classification algorithms. For enhanced boosting classifica- tion approaches, this dataset will be spilled into the training set and test set to evaluate the classification result with cross validation.
4.2 Tweet Level Sentiment Classifier
In this paper, we build the hashtag-level sentiment classification on top of the tweet-level sentiment analysis results. Basically, we adopted the state-of-the-art tweet-level sentiment classification ap- proach [2], which uses a two-stage SVM classifier to determine the sentiment polarity of a tweet. The first (i.e. subjectivity) classi- fier determines whether a tweet is neutral or subjective while the second one (i.e. polarity classifier) assigns a subjective tweet with positive or negative polarity. The SVMlightpackage5 is used in
our experiments. The two SVM classifiers take the same features as input, which are divided into two categories:
• Content features: including unigram words, punctuation and emoticons. We treat the presence of a token (unigram word, punctuation, or emoticon) as a binary feature which is 1 if the corresponding token occurs in tweet and 0 otherwise. • Sentiment lexicon features: we employ the lexicon from the
General Inquirer6and count the number of positive or nega-
tive words in tweets as features. There are two dimensions in the feature vector which denote the number of positive and negative words in the tweet.
Classifier Accuracy Precision Recall F1 subjectivity(1) 83.13% 59.45% 36.59% 45.27%
polarity(2) 88.96% 90.49% 94.82% 92.60%
(1)+(2) 84.13% ! ! !
Table 1: Performance of the tweet-level classifier
We use the subjectivity classifier to filter out the neutral tweets. The output of the polarity classifier for a subjective tweet is a real value score s which is positive when predicting the tweet t as pos- itive and negative when predicting as negative. Since we need to convert this value into the polarity probability, we use an empirical threshold ξ = 2 and the following formula is adopted, which is similar to the manner introduced in [16] :
Prpos(t) = 1 s >= ξ 0.5 + s/(2ξ) s∈ (−ξ, ξ) 0 s <=−ξ (10) Prneg(t) = 1− Prpos(t) (11) 5http://svmlight.joachims.org/ 6http://www.wjh.harvard.edu/ inquirer/
Per raccogliere i dati necessari all’analisi è stato usato un sistema basato sulle Keyword. Prima è stata fatta una ricerca sui topic d’interesse utilizzando come parole chiavi gli argomenti stessi. Da questi messaggi poi sono state estratte altre parole chiave, in maniera soggettiva supportata da dati lessicali(co occorrenze, ecc..), rilevanti per i vari argomenti.