Contents lists available at ScienceDirect

Information Processing and Management

journal homepage: www.elsevier.com/locate/ipm

Expressive signals in social media languages to improve polarity detection

E. Fersini, E. Messina, F.A. Pozzi

DISCo, University of Milano-Bicocca, Viale Sarca, 336 – 20126 Milano, Italy

Article info

Article history: Received 21 May 2014; Revised 7 April 2015; Accepted 11 April 2015; Available online 12 June 2015

Keywords: Sentiment analysis; Polarity detection; Expressive signals

Abstract

Social media represents an emerging and challenging sector where the natural language expressions of people can be easily reported through blogs and short text messages. This is rapidly creating unique contents of massive dimensions that need to be efficiently and effectively analyzed to create actionable knowledge for decision making processes. A key piece of information that can be grasped from social environments relates to the polarity of text messages. To better capture the sentiment orientation of the messages, several valuable expressive forms could be taken into account. In this paper, three expressive signals – typically used in microblogs – have been explored: (1) adjectives, (2) emoticons, emphatic and onomatopoeic expressions and (3) expressive lengthening. Once a text message has been normalized to better conform social media posts to a canonical language, the considered expressive signals have been used to enrich the feature space and train several baseline and ensemble classifiers aimed at polarity classification. The experimental results show that adjectives are more discriminative and impacting than the other considered expressive signals.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

The goal of sentiment analysis is to define automatic tools able to extract subjective information, such as opinions and sentiments, from natural language texts, in order to create structured and actionable knowledge to be used by either a decision support system or a decision maker. This issue is usually addressed at the document level (Yessenalina et al., 2010), under the naive assumption that each document expresses an overall sentiment. When dealing with social media contents coming from microblogs (like Facebook and Twitter), a lower granularity level could be more useful and informative (Jagtap and Pawar, 2013; Zhang et al., 2011). This new kind of virtual communication has led to new types of contents and diffusion models that need to be modeled explicitly, starting from the language. The characteristics that distinguish well-formed contents (e.g. reviews) from microblog messages relate to the use of canonical, coherent and at least paragraph-length pieces of text. However, sentiment analysis on social media leads towards new and more complex scenarios: the sentiment is conveyed in passages of at most two sentences, often with an informal linguistic register and non-standard spelling (Eisenstein, 2013). These novel scenarios lead researchers to move from a traditional approach, which solves the sentiment analysis task by using machine learning models (Pang and Lee, 2008), to a communication-oriented paradigm.

The first expressive signals that have been considered in the literature to aid the detection of sentiment in a given message concern lexical elements (e.g., adjectives, verbs, adverbs).

Corresponding author.

E-mail addresses: fersini@disco.unimib.it (E. Fersini), messina@disco.unimib.it (E. Messina), federico.pozzi@disco.unimib.it (F.A. Pozzi).
http://dx.doi.org/10.1016/j.ipm.2015.04.004


Pak and Paroubek (2010) investigated the relationships of several part-of-speech (POS) tags with respect to message subjectivity/objectivity. For instance, interjections and adjectives are relevant indicators of subjective texts, while objective messages contain more common and proper nouns. Once positive and negative texts have been annotated with their part-of-speech tags, the resulting corpus is used to train a sentiment classifier. A further approach that exploits part-of-speech characteristics is presented in Kouloumpis et al. (2011), where the combination of n-grams and POS tags shows a significant improvement in detecting the sentiment orientation of messages. In the context of social language processing, the use of emoticons has attracted machine learning researchers for the sentiment classification task (Hogenboom et al., 2013; Liu et al., 2012; Zhao et al., 2012). Emoticons are considered to be handy and reliable indicators of sentiment, and hence could be used either to automatically generate a training corpus or to act as evidence features to enhance sentiment classification. With regard to expressive lengthening (e.g., "I loooooove you"), little work has evaluated its contribution to polarity classification. An exception is Brody and Diakopoulos (2011), where the lengthening phenomenon in microblogs has been shown to be strongly associated with subjectivity and sentiment.

Inspired by the wide availability of emotional signals in social media and the promising results obtained by our previous contribution (Pozzi, Fersini, Messina and Blanc, 2013), in this paper we investigate the contribution of the most used expressive signals. To the best of our knowledge, no studies consider the combination of adjectives, initialisms for emphatic and onomatopoeic expressions, emoticons and word lengthening as possible additional features able to drive the detection of polarity in online social media. This paper makes several contributions: (1) the analysis of three main characteristics of social media language, (2) a text normalization procedure to better conform social media messages to a canonical language, (3) a feature expansion approach to improve polarity detection, (4) an analysis of the impact of the expressive signals studied both independently of each other and jointly in traditional learning models and (5) an analysis of the impact of the expressive signals in ensemble models, which has not yet been investigated in the state of the art. To the best of our knowledge, two main research papers (Fersini et al., 2014; Wang et al., 2014) deal with ensemble learning for sentiment analysis, but their focus is on the classification model rather than on the impact of the expressive signals. Experimental results highlight not only the link between the characteristics of social media language and the polarity of the messages, but also their beneficial effects on sentiment classification accuracy when jointly considered.

The paper is organized as follows. In Section 2 the main expressive forms in online social media are outlined. In Section 3 the text normalization procedure together with the proposed feature expansion approach are detailed. In Section 4 the baseline classifiers and ensemble methods, used to evaluate the impact of the proposed approach, are presented. In Section 5 the experimental investigation is detailed, while in Section 6 a detailed analysis of the behavior of the classifiers and the role of the considered expressive signals is reported. Finally, in Section 7 conclusions are derived.

2. Expressive forms in online social media

To better capture the sentiment orientation of the messages, several valuable expressive forms should be taken into account when tackling polarity detection in online social environments. Although microblogs make several expressive signals available, most of them are platform-dependent. For example, Twitter has 'hashtags' (words prefixed with the symbol '#') which allow users to easily specify topics and summarize the overall sentiment. Differently from Twitter, where posts are plain text, messages on Google+ and Tumblr can be characterized by formatted text. For instance, the bold style can be used to emphasize the sentiment (e.g., 'The iPhone is so beautiful!') and the strikethrough (a typographical presentation of words with a horizontal line through their center) can be used to convey humor (i.e. the sentiment orientation can be reversed). People use the strikethrough to look like an edit, as if they were crossing something out on paper, but in a way that is still readable (e.g., 'That was kinda strikethrough …ehmmm.. hilarious :)').

In order to investigate expressive signals that are independent of the platform, this paper focuses on: (1) adjectives, (2) pragmatic particles, such as emoticons, emphatic and onomatopoeic expressions and (3) expressive lengthening. In our investigation, Twitter has been exploited thanks to the availability of data that are public by default: the percentage of public profiles available on Twitter is much higher than on other social media. For example, in 2012, just over 11% of Twitter users had private profiles, compared with over 53% of Facebook users (Dey et al., 2012). This turns Twitter into a gold mine of free data.

Adjectives. Adjectives are lexical components that operate on the substructure of a sentence to either describe or modify a given element. In this work, we argue that adjectives are strictly related to positive and negative opinions and therefore could contribute to better detecting the sentiment of a given text message. Starting from the idea proposed in Benamara et al. (2007), our paper aims at evaluating the spread of adjectives in online social media and their role in polarity prediction. To this purpose, Part-Of-Speech tagging has been applied in order to tag each term of a message with respect to its grammatical form. Canonical (JJ), comparative (JJR) and superlative (JJS) adjectives [1] have been detected and considered as positive (or negative) according to one of the most widely used lexicons (Hu and Liu, 2004), known as DictHuLiu. The lexicon is composed of 4783 negative and 2006 positive words. Since online conversational text differs markedly from traditional written genres like newswire, we used a supervised POS tagger proposed by Owoputi et al. (2013) and trained on manually-annotated social media contents.

[1] The used tag set represents a standard for Part-Of-Speech tagging and has been defined in the Penn Treebank Project (released through the Linguistic Data Consortium). See https://www.cis.upenn.edu/treebank/.


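To make the adjective signal concrete, the sketch below pairs an off-the-shelf POS tagger with a tiny inline opinion lexicon. It is illustrative only: NLTK's general-purpose tagger stands in for the Twitter-specific tagger of Owoputi et al. (2013), and the two word sets are placeholders for the full DictHuLiu lexicon.

```python
# Illustrative sketch: count positive/negative adjectives in a message.
# Requires: pip install nltk, plus nltk.download('punkt') and
# nltk.download('averaged_perceptron_tagger').
import nltk

POSITIVE_ADJ = {"good", "great", "beautiful", "amazing"}     # placeholder lexicon
NEGATIVE_ADJ = {"bad", "sluggish", "disappointing", "awful"}

def count_polar_adjectives(text):
    tokens = nltk.word_tokenize(text.lower())
    tagged = nltk.pos_tag(tokens)
    # JJ = adjective, JJR = comparative, JJS = superlative (Penn Treebank tags)
    adjectives = [w for w, tag in tagged if tag in ("JJ", "JJR", "JJS")]
    pos = sum(1 for a in adjectives if a in POSITIVE_ADJ)
    neg = sum(1 for a in adjectives if a in NEGATIVE_ADJ)
    return pos, neg

print(count_polar_adjectives("The new Oreo cookies are really good!"))  # (1, 0)
```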

Pragmatic particles. Pragmatic particles, such as emoticons, emphatic and onomatopoeic expressions, represent those linguistic elements typically used on social media to elicit a given message. Emoticons are introduced as expressive, non-verbal components into the written language, mirroring the role played by facial expressions in speech (Walther and D'Addario, 2001). Their role is mainly pragmatic: emoticons give a positive or negative sense to written sentences through a visual expression. According to this consideration, we formulate the hypothesis that there exists a relationship between the sentiment orientation of emoticons and that of messages. In order to corroborate this hypothesis (a descriptive analysis will be subsequently conducted), emoticons have been distinguished into two main categories, i.e. positive and negative. Instances of positive emoticons are ':-)', ':)', '=)', ':D', while examples of negative ones are ':-(', ': (', '=(', '; ('.

Initialisms for emphatic expressions represent a further pragmatic element used in non-verbal communication in online social media. Although they act as constituents, these emphatic abbreviations play a role similar to that of emoticons: expressions such as 'ROFL' (Rolling On Floor Laughing) clearly represent positive expressions, while abbreviations such as 'BM' (Bad Manner) denote negative statements.

Onomatopoeic expressions in online social media can help to convey emotions: some expressions such as 'bleh' and 'wow' are clear indicators of negative and positive emotional states respectively, and therefore can help to distinguish the polarity of a text message. In order to deal with onomatopoeic forms, a regular expression has been defined to map these text elements to the corresponding sentiment orientation dictionaries (positive and negative).

The complete list of pragmatic particles is available as supplementary material.

Expressive lengthening. In text-based social media, word styling (such as bold, italic and underlining) is not always available and is often replaced by linguistic conventions. Moreover, the informal nature of expressions leads social media users to adopt orthographic styles that are actually close to the spoken language. In this paper, we claim that the commonly observed phenomenon of expressive lengthening (usually known as word lengthening or word stretching) is an indication of emphasis that is strongly associated with subjectivity and sentiment (Brody and Diakopoulos, 2011). These expressive forms, which are specific to informal social communication, are usually denoted by orthographic conventions that mark important expressions and can help polarity detection.

Example 1 [negative]: One. More. Source. C'mon google, just one more #PLEAAASSEEEEE

To better capture the positive or negative orientation of a message, an expressive lengthening should also be considered depending on the sentiment. However, in order to identify its polarity, the corresponding canonical (condensed) form needs to be extracted. The main problem when addressing word lengthening is the selection of the correct root. For instance, consider the term "gooood", which appears in the following two messages:

Example 2 [positive]: Thanks to gooood it's Friday!!!!
Example 3 [positive]: The new Oreo cookies are really gooood!

When dealing with sentiment analysis, it is fundamental to detect the correct root of the lengthened word, in order to subsequently identify the corresponding polarity. Although some approaches in the literature tackle expressive lengthening, they are usually based on strict assumptions that make the association of a polarity to the original word difficult (uncertain). For instance, repeated characters may be replaced with a single instance of that letter. With respect to the cases reported above, both occurrences of "gooood" would be replaced with "god", thus originating an error in the subsequent polarity association. For this reason, we decided to consider only the presence of a lengthening, without its potential sentiment orientation.
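Consistent with this design choice, the mere presence of lengthening can be flagged with a single regular expression. A minimal sketch follows; the three-repetition threshold is our own assumption, chosen so that legitimate double letters ('good', 'cool') are not flagged.

```python
import re

# A character repeated three or more times is treated as expressive
# lengthening ("gooood", "PLEAAASSEEEEE"); doubles ("good") are ignored.
LENGTHENING = re.compile(r"(\w)\1{2,}")

def has_lengthening(token):
    """Return True if the token shows expressive lengthening."""
    return bool(LENGTHENING.search(token))

print(has_lengthening("gooood"))  # True
print(has_lengthening("good"))    # False
```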

The hypothesis underlying this paper is that the main expressive signals mentioned above are strictly related to positive and negative opinions, and therefore could contribute to better detecting the polarity of a given text message. In order to corroborate this hypothesis, a preliminary Bayesian analysis has been conducted to clarify the relationships between these expressive forms and the sentiment orientation of messages. In particular, to support the claim that the sentiment orientation of a given expressive signal agrees with the sentiment of the message, conditional probabilities have been computed.

In particular, given an expressive form e occurring in a message m and the polarity label set S = {+, −} (where + stands for positive and − for negative), the conditional probability can be estimated as:

$$P\big(pol(e \in m) = s_e \mid pol(m) = s_m\big) = \frac{\sum_{m} I\big(pol(m) = s_m \wedge pol(e) = s_e\big)}{\sum_{m} I\big(pol(m) = s_m\big)}, \qquad s_e, s_m \in \mathcal{S} \tag{1}$$

where pol(·) denotes the polarity label and I(·) is the indicator function. The discussion about the descriptive analysis is based on two benchmarks presented in Section 5.

3. Text normalization and feature expansion

Unlike well-formed documents (e.g., reviews), the writing style and the lexicon of microblogging messages vary widely. Moreover, messages are often highly ungrammatical and filled with spelling errors. As reported in Eisenstein (2013), the non-standard spelling in social media is mainly due to the fast writing of users, the length limits of messages in online microblogs and, finally, the spread of common illiteracies until they become "the norm".

3.1. Text normalization

One approach to deal with the language of social media consists in conforming the texts to a canonical language. For this purpose, we captured a set of patterns using a priori defined dictionaries and regular expressions (REGEX). The text normalization approach includes:

URL removal: URLs do not provide valuable information for the sentiment analysis task. To this purpose, all the tokens matching the REGEX (https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|] have been removed;

Sharing symbols elimination: most social media provide specific tools that allow users to share their messages as widely as possible (e.g., hashtags, mentions and retweets on Twitter). Analogously to URLs, all the symbols related to the sharing tools have been removed because they are ineffective with respect to polarity detection;

Spell-Checker: messages in online social media are often highly ungrammatical and filled with spelling errors. In order to overcome this issue, misspelled tokens have been corrected using Google's Spell Checker API (https://code.google.com/p/google-api-spelling-java/). Since Google's algorithm takes the neighborhood (context) of a misspelled token into account in suggesting the correction, the whole previously filtered tweet is considered as a query rather than the single token;

Slang correction: in order to aggregate terms with the same meaning but represented with different slangs, an a priori defined dictionary of slang expressions with their meaning, such as 'btw' (by the way), 'thx' (thanks), 'any1' (anyone) and 'u' (you), has been used (Balahur et al., 2014). A sketch of the overall pipeline follows.
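The pipeline below sketches the four normalization steps under simplifying assumptions: the URL and sharing-symbol patterns are condensed versions of the ones above, the spell-checking step is left as a stub (the paper queries Google's Spell Checker API with the whole filtered tweet), and the slang dictionary holds only the examples quoted in the text.

```python
import re

URL_RE = re.compile(r"(https?|ftp|file)://\S+")       # simplified URL pattern
SHARING_RE = re.compile(r"(^|\s)(#\w+|@\w+|RT\b)")    # hashtags, mentions, RT
SLANG = {"btw": "by the way", "thx": "thanks", "any1": "anyone", "u": "you"}

def normalize(tweet):
    tweet = URL_RE.sub(" ", tweet)       # 1) URL removal
    tweet = SHARING_RE.sub(" ", tweet)   # 2) sharing-symbol elimination
    # 3) spell checking omitted here: plug in any context-aware checker
    tokens = [SLANG.get(t.lower(), t) for t in tweet.split()]
    return " ".join(tokens)              # 4) slang correction

print(normalize("thx @user btw see http://t.co/xyz #movie"))
# -> "thanks by the way see"
```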

3.2. Feature expansion

Most of the works on sentiment analysis have relied on machine learning approaches: using a bag-of-words representation, the main goal is to learn a positive/negative classifier based on given weights associated with the words in the text. According to these strategies, the traditional feature vector representing a message m (used to train a given classifier) only includes terms that belong to a common vocabulary V of terms derived from a message collection:

$$m = \big(w_{t_1}, w_{t_2}, \ldots, w_{t_{|V|}}, class\big) \tag{2}$$

where $w_{t_i}$ denotes the weight of term i belonging to the message m.

Less effort has been devoted to enriching the feature space used by the learning machines: to the best of our knowledge, no studies consider the combination of adjectives, initialisms for emphatic and onomatopoeic expressions, emoticons and word lengthening as a set of possible additional features used to improve polarity detection.

To this purpose, we propose to enhance the traditional feature vector by including indications about the expressive signals previously introduced. The novel feature vector of a message m is defined as:

$$m_{new} = \big(w_{t_1}, w_{t_2}, \ldots, w_{t_{|V|}}, p^+, p^-, a^+, a^-, l, class\big) \tag{3}$$

where $p^s$ represents the pragmatic elements (emoticons, initialisms for emphatic and onomatopoeic expressions) with polarity $s \in \{+, -\}$, $a^s$ denotes the adjectives with polarity s, l denotes the expressive lengthening and class is the ground-truth polarity.
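A minimal sketch of the expansion in Eq. (3): five extra dimensions are appended to the bag-of-words weights. Whether counts or binary indicators are used for the signals is not detailed above, so counts are assumed here; all names are illustrative.

```python
def expand_features(bow_weights, signals):
    """Append the expressive-signal features of Eq. (3) to a
    bag-of-words vector: (w_t1, ..., w_t|V|, p+, p-, a+, a-, l)."""
    return bow_weights + [
        signals["pos_particles"],   # p+: positive emoticons/initialisms/onomatopoeia
        signals["neg_particles"],   # p-
        signals["pos_adjectives"],  # a+
        signals["neg_adjectives"],  # a-
        signals["lengthening"],     # l: presence of word lengthening
    ]

bow = [0.0, 1.0, 0.0, 2.0]  # toy term weights over a 4-word vocabulary
sig = {"pos_particles": 1, "neg_particles": 0,
       "pos_adjectives": 2, "neg_adjectives": 0, "lengthening": 1}
print(expand_features(bow, sig))  # [0.0, 1.0, 0.0, 2.0, 1, 0, 2, 0, 1]
```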

4. Classifiers for polarity detection

4.1. Baseline classifiers

In this section a brief overview of the traditional machine learning approaches used for polarity detection is given. In the following, Multinomial Naïve Bayes, Decision Tree, Support Vector Machines and Bayesian Networks are presented.

Multinomial Naïve Bayes. Multinomial Naïve Bayes (MNB) is a classifier often used for text categorization. Let $x_k$, $k = 1, 2, \ldots, K$, be the k-th training vector and $y_k$ the corresponding label such that $y_k \in \{1, 2, \ldots, Y\}$; the main goal is to compute the model probability as:

$$P\big(y_k \mid x_k^1, \ldots, x_k^n\big) = P(y_k)\, P\Big(x_k^1, \ldots, x_k^n \;\Big|\; y_k, \sum_j x_k^j\Big) \tag{4}$$


This probability model is based on the assumption that the sample length and the class hypothesis are marginally independent.

Regarding polarity classification, the approach has been investigated in Pang et al. (2002), Pang and Lee (2004) and Go et al. (2009).
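The following sketch trains an MNB polarity classifier on a toy corpus; scikit-learn is used here purely as a stand-in for the WEKA implementation adopted in the paper.

```python
# Illustrative MNB polarity classifier (requires scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["really good movie :D", "sluggish and disappointing",
               "I love it", "bad manner, awful"]
train_labels = ["+", "-", "+", "-"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_texts)   # bag-of-words counts
clf = MultinomialNB().fit(X, train_labels)

test = vectorizer.transform(["good but sluggish"])
print(clf.predict(test), clf.predict_proba(test))  # label and class marginals
```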

Support Vector Machines. Support Vector Machines (SVMs) are learning machines that try to find the optimal hyperplane discriminating samples of different classes (Cortes and Vapnik, 1995). A good separation is achieved by the hyperplane that has the largest distance to the nearest training data point of any class (so-called functional margin), since in general the larger the margin the lower the generalization error of the classifier. In this paper, we exploited the probabilistic extension of the original SVMs (Hastie and Tibshirani, 1998).

The main goal is to estimate

$$p_y = P(y_k = y \mid x_k) \tag{5}$$

We first estimate pairwise class probabilities following the setting of the one-against-one approach for multi-class classification:

$$\tau_{yy'} \approx P\big(y_k = y \mid y_k = y \text{ or } y', \, x_k\big), \qquad y' = 1, 2, \ldots, Y, \; y' \neq y \tag{6}$$

After collecting all pairwise $\tau_{yy'}$ values, the following optimization problem is solved:

$$\min_p \; \frac{1}{2} \sum_{y=1}^{Y} \sum_{y' : y' \neq y} \big(\tau_{y'y}\, p_y - \tau_{yy'}\, p_{y'}\big)^2 \tag{7}$$

subject to

$$p_y \geq 0 \;\; \forall y, \qquad \sum_{y=1}^{Y} p_y = 1 \tag{8}$$

Regarding polarity classification, Support Vector Machines have been investigated in (Go et al., 2009; Pang and Lee, 2004; Pang et al., 2002).
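A minimal sketch of a probabilistic SVM baseline: scikit-learn's SVC with probability=True performs a calibrated probability estimate (pairwise coupling in the multi-class case, in the spirit of Hastie and Tibshirani, 1998) and mirrors the paper's configuration of a linear kernel with cost parameter 1.0; the toy vectors are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

# Toy count vectors and binary polarity labels (1 = positive)
X = np.array([[2, 0, 1], [0, 2, 0], [1, 0, 1],
              [0, 1, 0], [3, 0, 2], [0, 3, 1]])
y = np.array([1, 0, 1, 0, 1, 0])

# probability=True enables calibrated class probabilities
clf = SVC(kernel="linear", C=1.0, probability=True).fit(X, y)
print(clf.predict_proba(np.array([[1, 1, 1]])))  # [P(neg), P(pos)]
```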

Decision Trees. Decision Trees (DT) are classifiers presented as a binary tree-like structure, where each node corresponds to a variable and edges represent possible realizations of that variable. Given a sample $x_k = (x_k^1, \ldots, x_k^n)$, leaf nodes correspond to the possible class hypotheses $y_k$. The main goal of Decision Trees is to build a model of the class hypotheses based on the observed attributes of training data. Since this classifier outputs a dichotomic decision tree, it can be used to determine the class label of an unclassified sample by considering its descriptive attribute realizations. Building a decision tree model from a training dataset involves two phases. In the first phase, a splitting attribute and a split index are chosen. The second phase involves splitting the records among the child nodes based on the decision made in the first phase. For evaluating whether a node should be split or not, the Entropy Deviance measure (Aha et al., 1991) has been used, as sketched below.
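The sketch shows the standard entropy computation behind such a split criterion; the exact Entropy Deviance formulation of Aha et al. (1991) may differ in its details, so this is an assumption-laden illustration rather than the paper's implementation.

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting `parent` into two children;
    the attribute/split-index pair maximizing this gain is selected."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children

parent = ["+", "+", "-", "-"]
print(information_gain(parent, ["+", "+"], ["-", "-"]))  # 1.0: a perfect split
```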

Regarding polarity classification, Decision Trees have been investigated in Bifet and Frank (2010) and Jia et al. (2009).

Bayesian Networks. Bayesian Networks (BN) are probabilistic graphical models that compactly represent the joint probability distribution of n random variables. The main assumption, captured graphically by a dependency structure, is that each variable is directly influenced by only a few others. A probability distribution is represented as a directed acyclic graph (DAG) whose nodes represent random variables and whose edges denote direct dependencies between a node $h_k = \{x_k \cup y_k\}$ and its set of parents $Pa(h_k)$. Formally, a Bayesian Network asserts that each node is conditionally independent of its non-descendants given its parents. Given n features, the joint probability distribution can be decomposed as:

$$P\big(h_k^1, \ldots, h_k^n\big) = \prod_{j=1}^{n} P\big(h_k^j \mid h_k^1, \ldots, h_k^{j-1}\big) = \prod_{j=1}^{n} P\big(h_k^j \mid Pa(h_k^j)\big) \tag{9}$$

where $P(h_k^j \mid Pa(h_k^j))$ is described by a conditional probability distribution (CPD).

Regarding polarity classification, Bayesian Networks have been investigated in Bai (2011) and Airoldi et al. (2006).

4.2. Ensemble approaches

Given a space of possible models, classical statistical inference selects the single model with the highest likelihood given the training data and uses it to make predictions. This may lead to over-confident inferences and decisions that do not take into account the inherent uncertainty of natural language in broad contexts such as social media. Instead, the idea behind an ensemble mechanism is to exploit the characteristics of several independent classifiers by combining them in order to achieve higher performance than the best single classifier. In order to understand whether the typical social media expressive signals have an impact on several learning schemes, the main ensemble approaches have also been considered.

Regarding polarity classification, various ensemble approaches have been investigated in Whitehead and Yaeger (2010), Xia et al. (2011), Hassan et al. (2013) and Wang et al. (2014).

Majority Voting. The most intuitive ensemble approach is Majority Voting (MV), which determines the final polarity by selecting the most popular label prediction (Dietterich, 2002). Let C be a set of independent classifiers and $pol_i(m)$ the label assigned to a message m by classifier $i \in C$. Then, the optimal label $pol_{MV}(m)$ is assigned as follows:

$$pol_{MV}(m) = \begin{cases} \arg\max_{s_m} \sum_{i \in C} I\big(pol_i(m) = s_m\big) & \text{if } \sum_{i \in C} I\big(pol_i(m) = s_m\big) > \sum_{i \in C} I\big(pol_i(m) = s'_m\big), \; \forall s'_m \neq s_m \in \mathcal{S} \\ pol^*(m) & \text{otherwise} \end{cases} \tag{10}$$

where I(·) is the indicator function, $\mathcal{S}$ is the set of labels and $pol^*(m)$ is the label assigned to m by the "most expert" classifier, i.e. the classifier that is able to ensure the highest accuracy.
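A direct transcription of Eq. (10) into code: labels are tallied and, in case of a tie, the prediction of the most accurate classifier is returned. The list-based interface is our own choice.

```python
from collections import Counter

def majority_vote(predictions, accuracies):
    """Eq. (10): return the most popular label; break ties with the
    label of the 'most expert' (most accurate) classifier."""
    counts = Counter(predictions)
    ranked = counts.most_common()
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]
    expert = max(range(len(predictions)), key=lambda i: accuracies[i])
    return predictions[expert]

print(majority_vote(["+", "+", "-"], [0.7, 0.8, 0.9]))  # '+'
print(majority_vote(["+", "-"], [0.7, 0.9]))            # '-' (tie -> expert)
```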

Bayesian Model Averaging. The most important limit introduced by Majority Voting is that the models included in the ensemble have uniformly distributed weights regardless of their reliability. However, the uncertainty left by data and models can be filtered by considering the Bayesian paradigm. In particular, through Bayesian Model Averaging (BMA) all possible models in the hypothesis space could be used when making predictions, considering their marginal prediction capabilities and their reliabilities:

$$P\big(pol(m) \mid C, D\big) = \sum_{i \in C} P\big(pol(m) \mid i, D\big)\, P(i \mid D) \tag{11}$$

where $P(pol(m) \mid i, D)$ is the marginal distribution of the label predicted by classifier i and $P(i \mid D)$ denotes the posterior probability of model i. The posterior $P(i \mid D)$ can be computed as:

$$P(i \mid D) = \frac{P(D \mid i)\, P(i)}{\sum_{j \in C} P(D \mid j)\, P(j)} \tag{12}$$

where P(i) is the prior probability of i and $P(D \mid i)$ is the model likelihood. In Eq. (12), $\sum_{j \in C} P(D \mid j)\, P(j)$ is assumed to be a constant and therefore can be omitted. Therefore, BMA assigns the label $pol_{BMA}(m)$ to m according to the following decision rule:

$$pol_{BMA}(m) = \arg\max_{pol(m)} P\big(pol(m) \mid C, D\big) = \arg\max_{pol(m)} \sum_{i \in C} P\big(pol(m) \mid i, D\big)\, P(i \mid D) = \arg\max_{pol(m)} \sum_{i \in C} P\big(pol(m) \mid i, D\big)\, P(D \mid i)\, P(i) \tag{13}$$

The implicit measure $P(D \mid i)$ can be easily replaced by an explicit estimate, known as the F1-measure, obtained during a preliminary evaluation of classifier i. In particular, by performing a cross validation, each classifier can produce an averaged measure stating how well a learning machine generalizes to unseen data. Considering $\varphi$ folds for cross validating a classifier i, the measure $P(D \mid i)$ can be approximated as:

$$P(D \mid i) \approx \frac{1}{\varphi} \sum_{\iota=1}^{\varphi} \frac{2 \times P_{i\iota}(D) \times R_{i\iota}(D)}{P_{i\iota}(D) + R_{i\iota}(D)} \tag{14}$$

where $P_{i\iota}(D)$ and $R_{i\iota}(D)$ denote the precision and recall obtained by classifier i at fold $\iota$. The measure $P(D \mid i)$ can be estimated both for positive and negative polarities. In this way, $P(pol(m) \mid i, D)$ in Eq. (13) is tuned according to the ability of the classifier to fit the training data. This approach allows the uncertainty of each classifier to be taken into account, avoiding over-confident inferences. For more details on BMA for polarity detection, please refer to (Pozzi, Fersini and Messina, 2013; Fersini et al., 2014).
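The decision rule of Eq. (13), with P(D|i) approximated by the averaged F1-measure of Eq. (14), reduces to a weighted sum of the classifier marginals. A minimal sketch, assuming uniform model priors and precomputed F1 scores:

```python
def bma_predict(marginals, f1_scores, priors=None):
    """Eq. (13): argmax over labels of sum_i P(label|i,D) * P(D|i) * P(i),
    with P(D|i) approximated by each classifier's averaged F1 (Eq. (14))."""
    n = len(marginals)
    priors = priors or [1.0 / n] * n   # uniform prior over models
    labels = marginals[0].keys()
    score = {s: sum(marginals[i][s] * f1_scores[i] * priors[i]
                    for i in range(n)) for s in labels}
    return max(score, key=score.get)

# Two weakly positive classifiers vs one strongly negative, more
# reliable one: the weighted marginals recover the negative label.
marginals = [{"+": 0.51, "-": 0.49},
             {"+": 0.51, "-": 0.49},
             {"+": 0.05, "-": 0.95}]
print(bma_predict(marginals, f1_scores=[0.80, 0.78, 0.85]))  # '-'
```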

5. Experimental investigation

5.1. Experimental settings

In order to verify whether the proposed normalization techniques and feature expansion improve polarity detection, three benchmark datasets have been considered: Gold Standard Movie, Gold Standard Person (Chen et al., 2012) and SemEval 2013 - Task 2. Gold Standard Movie and Person each contain 1500 manually labeled tweets. Although the original datasets comprise 3 different polarities (POS, NEG and NEU), a reduction of instances has been performed in order to deal only with positive and negative opinions. The resulting datasets are therefore unbalanced: Person is composed of 105 (26.44%) negative and 292 (73.56%) positive opinions, while Movie includes 96 (18.6%) negative and 420 (81.4%) positive orientations. The SemEval benchmark is composed of 4922 manually labeled tweets, 3474 (70.58%) positive and 1448 (29.41%) negative.


Table 1
Adjective distribution.

                     Dataset    Tweets
With adjectives      Movie      77.90% (402)
                     Person     52.39% (208)
                     SemEval    45.67% (1900)
Without adjectives   Movie      22.10% (114)
                     Person     47.61% (189)
                     SemEval    54.32% (2260)

Concerning the traditional baseline classifiers (also enclosed in the ensembles), the WEKA toolkit has been used, while BMA and MV have been developed from scratch. Regarding the classifier configurations, probabilistic SVMs have been trained with a linear kernel (with cost parameter equal to 1.0 and tolerance to misclassification equal to 0.0010). The K2 search algorithm has been exploited to learn the structure of the Bayesian Network. For Decision Trees, C4.5 (J48 in WEKA) has been adopted, while for Multinomial Naive Bayes no particular setting is required. In order to evaluate the performance achieved by the investigated approaches, 10-fold cross validation has been adopted.

For performance evaluation, we employed the classical state-of-the-art measures for classification known as Precision (P), Recall (R) and F1-measure (Zaki and Meira, 2014). In particular, in order to directly compare the ensemble learning techniques with the baseline classifiers, we employed accuracy:

$$Acc = \frac{\#\text{ of messages successfully predicted}}{\text{total } \#\text{ of messages}} \tag{15}$$

5.2. Experimental results

In this section, the computational results achieved on all the studied datasets are presented. First, results regarding the investigation of the individual contribution of the studied features are shown. Then, the features have been combined and the classification results are presented. In the experimental results, baseline classifiers, Majority Voting and BMA have been considered.

5.2.1. Adjectives

The first evaluation is mainly aimed at assessing the spread of adjectives in social media messages. To this purpose, the distribution of tweets with and without adjectives has been estimated and reported in Table 1. Results highlight that this expressive signal is largely widespread on all the datasets.

Given the observation that adjectives are frequently used in social media contents, the subsequent analysis focuses on highlighting the relationship that exists between them and the sentiment orientation of the messages. In particular, the conditional probability distributions presented in Eq. (1) have been computed and reported in Table 2. If we focus on the first two rows of Table 2(a), we can highlight the relationship that exists between the presence/absence of an adjective and the sentiment orientation of the message (both for positive and negative messages). By analyzing the other two datasets (Tables 2(b) and (c)), we can note that while for positive tweets there is a good correspondence with the occurrence of adjectives, for negative tweets it is more probable not to observe their presence. If we consider the sentiment orientation of an adjective conditioned on the overall message polarity (rows three and four), results show that positive messages contain positive adjectives with high probability, and likewise for negative adjectives in negative messages. A possible explanation of this polarity agreement is related to the characteristics of Twitter messages: due to the 140-character limit, adjectives are usually used as powerful instruments to mark the essence of the opinion. A user does not have the possibility to express an articulated opinion, where potentially discordant adjectives (related to different aspects) can be used to express an overall polarity (as in reviews).

Example 4 [positive]: @user You’ll like Shutter Island it was really good :D

Example 5 [positive]: […] star trek generations ========== trek fans may be more forgiving , but for the rest of us, the sluggish star trek generations is a mixed bag at best. the story is interesting , but each scene goes too long . the cast is earnest , but the direction lacks. and so on. (the best example of the latter is a klingon comeuppance that delivers none of the impact of a similar scene in star trek ii .) original enterprise captain james t. kirk appears on both ends of the story, though they cut the scene where shatner turns to the screen to plead "get a life". remarkably unremarkable . […]

For instance, consider the examples reported above, where two opinions on movies are expressed both by a tweet (Example 4, from Gold Standard Movie) and by a review (Example 5, from the v2.0 polarity dataset (Pang and Lee, 2004)). Although both examples denote a positive sentiment orientation, in the first opinion the adjectives are clearly coherent with the overall message polarity, while in the second one the polarity of adjectives is characterized by a strong variability (3 positive adjectives vs 4 negative ones).

(8)

Table 2
Conditional probabilities for adjectives. The highest probabilities are marked in bold.

(a) Movie
Positive tweets                     Negative tweets
P(a ∈ m | s_m = +)    0.714         P(a ∈ m | s_m = −)    0.604
P(a ∉ m | s_m = +)    0.286         P(a ∉ m | s_m = −)    0.396
P(a+ ∈ m | s_m = +)   0.642         P(a+ ∈ m | s_m = −)   0.250
P(a− ∈ m | s_m = +)   0.154         P(a− ∈ m | s_m = −)   0.447

(b) Person
P(a ∈ m | s_m = +)    0.544         P(a ∈ m | s_m = −)    0.342
P(a ∉ m | s_m = +)    0.456         P(a ∉ m | s_m = −)    0.658
P(a+ ∈ m | s_m = +)   0.523         P(a+ ∈ m | s_m = −)   0.123
P(a− ∈ m | s_m = +)   0.058         P(a− ∈ m | s_m = −)   0.238

(c) SemEval
P(a ∈ m | s_m = +)    0.519         P(a ∈ m | s_m = −)    0.296
P(a ∉ m | s_m = +)    0.480         P(a ∉ m | s_m = −)    0.703
P(a+ ∈ m | s_m = +)   0.491         P(a+ ∈ m | s_m = −)   0.140
P(a− ∈ m | s_m = +)   0.046         P(a− ∈ m | s_m = −)   0.178

Fig. 1. Comparison between accuracy achieved by classifiers with no feature expansion and considering only adjectives.

Regarding the classification results, Fig. 1 shows that including adjectives as additional features leads the most performing baseline classifier (SVM for all the datasets) to an accuracy improvement of 2.53% for Movie, 2.48% for Person and 0.49% for SemEval. Similar improvements can be noted for the ensemble methods (MV and BMA), highlighting the effective contribution of adjectives independently of the learning scheme. If we focus on a comparison between the best baseline classifier with no additional features and BMA with the inclusion of the expressive signal, we can see that a greater accuracy improvement of 4.09% for Movie, 3.73% for Person and 3.35% for SemEval has been achieved.

5.2.2. Pragmatic particles

Unlike adjectives, which are frequent lexical elements in natural language, pragmatic particles are expected to be less frequent. This characteristic is confirmed by the probabilities reported in Table 3, where their distribution on the studied datasets is shown.

However, in order to verify that pragmatic particles could be an important source of information for polarity classification, a detailed analysis has been conducted by conditioning the presence/absence of a particle, as well as its polarity, on the sentiment orientation of the message (Table 4).

Results for Movie (Table 4(a)) show that the polarity of pragmatic particles in the messages generally agrees with the message polarity: with high probability, a positive message contains positive particles and a negative message contains negative particles. Regarding the Person dataset (Table 4(b)), we can assert that while the occurrence of a positive particle is conditioned by the positive nature of the message, for the negative ones no conclusion can be drawn. In fact, a zero-probability event is not an event that never happens: the Person dataset simply does not enclose negative messages that contain negative pragmatic particles.

(9)

Table 3
Pragmatic particles distribution.

                              Dataset    Tweets
With pragmatic particles      Movie      15.69% (81)
                              Person     8.75% (28)
                              SemEval    19.80% (824)
Without pragmatic particles   Movie      84.31% (435)
                              Person     91.25% (292)
                              SemEval    80.19% (3336)

Table 4
Conditional probabilities for pragmatic particles. The highest probabilities are marked in bold.

(a) Movie
Positive tweets                     Negative tweets
P(p ∈ m | s_m = +)    0.161         P(p ∈ m | s_m = −)    0.104
P(p ∉ m | s_m = +)    0.839         P(p ∉ m | s_m = −)    0.896
P(p+ ∈ m | s_m = +)   0.161         P(p+ ∈ m | s_m = −)   0.052
P(p− ∈ m | s_m = +)   0.002         P(p− ∈ m | s_m = −)   0.072

(b) Person
P(p ∈ m | s_m = +)    0.082         P(p ∈ m | s_m = −)    0.038
P(p ∉ m | s_m = +)    0.918         P(p ∉ m | s_m = −)    0.962
P(p+ ∈ m | s_m = +)   0.082         P(p+ ∈ m | s_m = −)   0.038
P(p− ∈ m | s_m = +)   0             P(p− ∈ m | s_m = −)   0

(c) SemEval
P(p ∈ m | s_m = +)    0.222         P(p ∈ m | s_m = −)    0.135
P(p ∉ m | s_m = +)    0.777         P(p ∉ m | s_m = −)    0.864
P(p+ ∈ m | s_m = +)   0.221         P(p+ ∈ m | s_m = −)   0.093
P(p− ∈ m | s_m = +)   0.001         P(p− ∈ m | s_m = −)   0.045

Results for SemEval (Table 4(c)) show that while the occurrence of a positive particle is conditioned by the positive nature of the message, for the negative ones there is a very small difference that makes any conclusion not statistically significant.

Example 6 [negative]: Disappointing to see the UK Govt issueing reassuring messages to Tehran http://t.co/FEePWqPh Maybe they were influenced by @Nigel_Farage :)

A possible explanation for P(p+ ∈ m | s_m = −) > P(p− ∈ m | s_m = −) in Tables 4(b) and (c) is related to the use of mockery, humor and irony. Consider for instance Example 6, where a negative opinion about politics has been extracted from the SemEval benchmark. The judgment of the UK Government is clearly negative, although the positive emoticon ':)' is present, introducing irony.

As far as sentiment classification is concerned, the prediction accuracies of the considered learning machines are reported in Fig. 2. Considering pragmatic particle features with the most performing baseline classifier leads to an accuracy improvement of 1.77% for Movie (SVM), 0.72% for Person (MNB) and 0.24% for SemEval (SVM). A more significant improvement can be noted considering feature expansion with BMA against the best baseline with no additional features: BMA is able to provide an increment of 2.4% for Movie, 0.83% for Person and 3.18% for SemEval. Even though MV is significantly worse than the other approaches, the use of pragmatic particles leads to an accuracy improvement.

5.2.3. Expressive lengthening

Since we argue that word lengthening could be a relevant sentiment indicator (although it occurs rarely), its presence has been conditioned on the message polarity. Table 5 shows that word lengthening has a greater correspondence with positive messages. This can be motivated by strongly positive emotional states related to joy. An instance is reported in Example 7, where a positive sentiment orientation has been extracted from the SemEval benchmark.

Example 7 [positive]: 7:23 a young savage named suadonte wright was born, I love you baby may your soul rest <3333 I miss you donnnn!

Although the probability of expressing a stretched word is very low, this expressive form is able to provide a bit of information. In particular, if we focus on the classification results depicted in Fig. 3, we can note that the performance of most classifiers improves when the expressive lengthening is considered. The most performing baseline classifier achieves an […]


Fig. 2. Classifiers with and without pragmatic particles.

Table 5
Conditional probabilities for expressive lengthening. The highest probabilities are marked in bold.

                      Movie    Person   SemEval
P(l ∈ m)              0.056    0.035    0.030
P(l ∉ m)              0.944    0.965    0.969
P(l ∈ m | s_m = +)    0.054    0.041    0.030
P(l ∈ m | s_m = −)    0.062    0.019    0.028

Fig. 3. Comparison between accuracy achieved by classifiers with no feature expansion and considering only word lengthening.


Fig. 4. Accuracy comparison between classifiers with no feature expansion and classifiers with all expressive signals.

5.2.4. Combination of all expressive signals

In the previous sections, the contributions of adjectives, pragmatic particles and expressive lengthening have been studied independently. In this section, the impact of all the expressive signals together is analyzed. Fig. 4 shows that including the considered additional features leads to a significant performance improvement for the most performing baseline classifier, the BMA ensemble and MV. In particular, the feature expansion leads the best baseline classifier (SVM for all the datasets) to an accuracy improvement of 3.51% for Movie, 2.98% for Person and 1% for SemEval. The combination of BMA and all the expressive signals leads to an improvement of 4.32% for Movie (DT+BN), 5.02% for Person (DT+SVM+MNB) and 4.69% for SemEval (SVM+MNB+BN). An outstanding improvement can be noted when training an MV approach. In particular, all the expressive forms provide an improvement on Movie and Person of 14.80% and 5.94% respectively. These encouraging results suggest not only that words play an important role in sentiment classification on social media, but also that a mixture of expressive signals can significantly contribute to better discriminating between positive and negative opinions.

6. Discussion

The results reported in the previous section call for a deeper discussion about the behavior of the classifiers and the role of the considered expressive signals. The first consideration relates to the performance achieved by MV. If we focus on Figs. 1–4, we can easily note that MV has broadly reduced generalization abilities. For instance, on the Movie and Person datasets, MV performs 12–13% worse than the baselines in terms of accuracy. The main motivations of this behavior are related to two main aspects:

1. Decision rule. The traditional MV approach exploits a democratic voting rule to assign a polarity to a given message, without taking advantage of the marginal distributions. For example, consider a message m with negative polarity and an MV ensemble composed of three classifiers A, B and C. While A and B provide the same marginal distribution <0.51; 0.49> for the positive and negative labels respectively, C has <0.05; 0.95>. This leads to a misclassification when considering the traditional voting rule: disregarding the marginal distributions, the majority of classifiers originates a positive label prediction.

Therefore, the traditional voting rule is not able to take into account the indecision of the classifiers due to small gaps between the positive and negative probabilities of the marginals. Alternative voting rules that could overcome this limitation (experimentally investigated in the following) are:

Maximum rule. It selects the maximum a posteriori probability among the classifiers in the ensemble according to:

$$P\big(pol_{MV}(m)\big) = \max_{i} P\big(pol(m) \mid i\big), \qquad i \in C \tag{16}$$

Average rule. The decision is determined according to the mean of the a posteriori probabilities given by the classifiers:

$$P\big(pol_{MV}(m) = s_m\big) = \frac{1}{|C|} \sum_{i \in C} P\big(pol(m) = s_m \mid i\big) \tag{17}$$

Product rule. The decision is determined by the product of the posterior probabilities:

$$P\big(pol_{MV}(m) = s_m\big) = \prod_{i \in C} P\big(pol(m) = s_m \mid i\big) \tag{18}$$
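A compact sketch of the three marginal-aware rules (Eqs. (16)-(18)); applied to the A/B/C example above, all of them recover the negative label that the democratic vote misses. The function interface is our own choice (requires Python 3.8+ for math.prod).

```python
import math

def combine(marginals, rule):
    """Fuse per-classifier label marginals with the max, average or
    product rule (Eqs. (16)-(18)) and return the winning label."""
    labels = marginals[0].keys()
    if rule == "max":
        score = {s: max(m[s] for m in marginals) for s in labels}
    elif rule == "average":
        score = {s: sum(m[s] for m in marginals) / len(marginals) for s in labels}
    elif rule == "product":
        score = {s: math.prod(m[s] for m in marginals) for s in labels}
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(score, key=score.get)

marginals = [{"+": 0.51, "-": 0.49}, {"+": 0.51, "-": 0.49}, {"+": 0.05, "-": 0.95}]
for rule in ("max", "average", "product"):
    print(rule, combine(marginals, rule))   # all three output '-'
```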


Table 6
Accuracy comparison (%) of democratic (D), max (M), average (A) and product (P) voting rules on the Movie dataset with no feature expansion. The highest average accuracies are marked in bold.

Ensemble            D       M       A       P
DT, SVM             36.45   80.24   80.24   80.24
DT, MNB             62.50   80.44   80.06   80.06
DT, BN              27.75   80.44   80.44   80.44
SVM, MNB            56.68   82.56   82.58   82.58
SVM, BN             32.00   82.38   82.38   82.38
MNB, BN             55.42   81.22   81.22   81.22
DT, SVM, MNB        69.33   80.83   83.35   81.23
DT, SVM, BN         50.00   80.63   82.37   81.40
DT, MNB, BN         31.67   80.63   81.60   80.05
SVM, MNB, BN        57.50   82.18   83.52   82.77
DT, SVM, MNB, BN    56.00   80.63   81.99   81.02
Average accuracy    48.66   81.11   81.79   81.22

Table 7
Accuracy comparison (%) of democratic (D), max (M), average (A) and product (P) voting rules on the Movie dataset with feature expansion. The highest average accuracies are marked in bold.

                    Adjectives                      Pragmatic particles             Lengthening
Ensemble            D       M       A       P       D       M       A       P       D       M       A       P
DT, SVM             59.98   86.07   86.06   86.06   19.36   78.11   78.50   78.50   32.80   80.63   80.82   80.82
DT, MNB             49.09   85.87   84.52   84.52   71.45   79.66   79.66   79.66   57.87   80.44   79.86   79.86
DT, BN              75.12   86.07   86.07   86.07   15.05   78.89   78.89   78.89   24.32   80.83   80.83   80.83
SVM, MNB            51.02   84.13   82.97   82.97   52.38   84.51   83.36   83.36   52.39   83.73   82.19   82.19
SVM, BN             48.66   84.31   85.28   85.28   23.36   82.38   82.38   82.38   24.83   82.38   82.38   82.38
MNB, BN             54.23   82.39   82.59   82.59   58.96   81.80   82.00   82.00   55.42   81.22   81.22   81.22
DT, SVM, MNB        86.33   86.07   86.26   84.53   64.00   80.05   82.37   80.43   72.50   80.63   83.36   81.61
DT, SVM, BN         76.00   86.07   85.87   85.68   33.33   78.89   81.61   79.46   40.00   80.83   82.18   81.21
DT, MNB, BN         85.67   86.07   86.25   84.33   40.00   80.25   81.80   79.86   16.67   80.83   81.41   79.86
SVM, MNB, BN        84.21   83.73   84.14   83.17   70.00   82.38   83.74   83.16   70.00   82.38   83.16   82.19
DT, SVM, MNB, BN    66.81   86.07   86.84   85.30   69.33   80.25   82.38   80.82   62.50   80.83   81.98   81.59
Average accuracy    67.01   85.17   85.17   84.59   47.02   80.65   81.52   80.77   46.30   81.34   81.76   81.25

Table 8
Accuracy gain (%) across domains when considering all features.

                Movie   Person   SemEval
Best baseline   3.51    2.98     1.00
Best MV         3.13    3.25     1.44
Best BMA        3.33    4.40     2.48


In order to validate the hypothesis that the traditional democratic voting rule negatively affects the final prediction of MV, we report in Tables 6 and 7 a comparison with the above-mentioned alternative decision rules on the Movie dataset. Independently of the feature expansion (Table 6), all the decision rules that take the marginal distributions into account outperform the democratic rule. When feature expansion is adopted (Table 7), the accuracy performances are generally higher and the gap between the democratic and the other decision rules tends to decrease. In addition, we can assert that the Average decision rule is able to ensure better performance both when feature expansion is considered and when it is not. All these conclusions remain valid when considering the other two datasets.

2. Size of the dataset. The performance of MV on the Movie and Person datasets is lower than on SemEval due to the small number of available training instances (see Fig. 4). In fact, the large volume of training examples of the SemEval dataset contributes to improving the learning abilities of the baseline classifiers and therefore the generalization capacities of MV.


Table 9
Evaluation of features (% of accuracy gain) for the best model configuration across datasets. The highest average accuracy gains are marked in bold.

                 Adjectives                    Pragmatic particles           Lengthening
                 Movie   Person   SemEval     Movie   Person   SemEval     Movie    Person   SemEval
Best baseline    2.53    2.48     0.49        1.77    0.72     0.49        1.18     0.21     0.05
Best MV          3.32    3.00     0.71        0.22    0.75     0.47        −0.16    −0.26    −0.03
Best BMA         3.10    3.11     1.13        1.42    0.21     0.96        0.43     0.20     0.43
Average          2.21                         0.78                         0.23

Table 10
Evaluation of features (% of accuracy gain) for the worst model configuration across datasets. The highest average accuracy gains are marked in bold.

                  Adjectives                    Pragmatic particles            Lengthening
                  Movie   Person   SemEval     Movie    Person   SemEval     Movie    Person   SemEval
Worst baseline    4.63    7.31     2.74        −2.34    0.00     1.13        1.74     0.00     0.00
Worst MV          20.91   15.71    3.52        −12.70   2.86     1.71        −11.08   3.98     −0.05
Worst BMA         0.19    6.29     1.69        0.19     6.29     0.58        0.00     1.51     0.00
Average           7.0                          −0.25                         −0.43

Fig. 5. Learning curves on Movie Dataset.

Obviously, any ensemble method has to deal with these two issues. However, BMA is less subject to these concerns thanks to its paradigm (see Eq. (13)): the marginal distribution of the classifier predictions contributes to improving the decision rule, while the reliability of the models reduces the effect of the small volume of training data.


Fig. 6. Learning curve on Person Dataset.


Fig. 8. Polarity error with non-literal meaning.

[…] highlighting that the whole process is independent of the dataset. In particular, this gain ranges between a minimum of 1% and a maximum of 4.40%.

However, the entire process (feature expansion + learning models) is clearly dependent on the language used in the dataset. The proposed investigation focuses on social media messages characterized by an informal language style, where adjectives, pragmatic particles and expressive lengthening are commonly used. If the entire process were focused on social media messages characterized by a formal language style, the expected contribution would be driven only by adjectives.

In order to understand whether there is an expressive signal that is more discriminative than the others, for each dataset we estimated the accuracy gain obtained when considering each feature in the optimal and worst model configurations with respect to no feature expansion (see Tables 9 and 10). When considering the best model configuration (Table 9), all the features ensure a positive gain on average. As expected, adjectives are more discriminative than pragmatic particles and lengthening. Focusing on the worst model configuration (Table 10), it emerges that while adjectives are still discriminative, pragmatic particles and expressive lengthening negatively affect the classifiers.

To better grasp the negative role of pragmatic particles and expressive lengthening, we report in Figs. 5–7 some learning curves related to the considered baseline classifiers. By analyzing the results, it emerges not only that the bag-of-words model without feature expansion is not able to perfectly encode the information provided by the adjectives (as well as the other features), but also that the error line when considering all the features is almost asymptotic to the error line related to them. The general conclusion is that only adjectives play a fundamental role as an expressive signal, while pragmatic particles and expressive lengthening lead to erratic behaviors.

A final interesting issue relates to non-literal meaning, such as irony and sarcasm, when dealing with polarity classification tasks. In order to understand whether the considered expressive signals are discriminative when non-literal meaning is present, a preliminary experimental investigation has been conducted on the Evalita Dataset (Basile et al., 2014). The dataset is composed of Twitter messages (4513 training samples and 1935 test instances), for which subjectivity, polarity and irony annotations are available for each message. We report in Fig. 8 some experimental results on baseline classifiers. As expected, it emerges that adjectives contribute to reducing the misclassification error in non-ironic tweets (see Fig. 8(a)). On the contrary, we can easily point out that when dealing with ironic expressions, the investigated expressive signals are not able to provide any improvement with respect to the polarity classification error (see Fig. 8(b)).

7. Conclusions


Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ipm.2015.04.004.

References

Aha, D. W., Kibler, D., & Albert, M. K. (1991). Instance-based learning algorithms. Machine Learning, 6(1), 37–66.

Airoldi, E., Bai, X., & Padman, R. (2006). Markov blankets and meta-heuristics search: Sentiment extraction from unstructured texts. Lecture Notes in Computer Science: Vol. 3932. Advances in web mining and web usage analysis (pp. 167–187). Berlin, Heidelberg: Springer.

Bai, X. (2011). Predicting consumer sentiments from online text. Decision Support Systems, 50(4), 732–742.

Balahur, A., Turchi, M., Steinberger, R., Ortega, J. M. P., Jacquet, G., Küçük, D., et al. (2014). Resource creation and evaluation for multilingual sentiment analysis in social media texts. In Proceedings of the ninth international conference on language resources and evaluation. LREC (pp. 4265–4269).

Basile, V., Bolioli, A., Nissim, M., Patti, V., & Rosso, P. (2014). Overview of the Evalita 2014 SENTIment POLarity classification task. In Proceedings of the 4th evaluation campaign of natural language processing and speech tools for Italian (EVALITA'14). Pisa, Italy.

Benamara, F., Cesarano, C., Picariello, A., Recupero, D. R., & Subrahmanian, V. S. (2007). Sentiment analysis: Adjectives and adverbs are better than adjectives alone. In Proceedings of the international conference on weblogs and social media. ICWSM (pp. 1–4).

Bifet, A., & Frank, E. (2010). Sentiment knowledge discovery in twitter streaming data. In Proceedings of the 13th international conference on discovery science. DS'10 (pp. 1–15). Springer-Verlag.

Brody, S., & Diakopoulos, N. (2011). Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using word lengthening to detect sentiment in microblogs. In Proceedings of the conference on empirical methods in natural language processing. EMNLP '11 (pp. 561–570).

Chen, L., Wang, W., Nagarajan, M., Wang, S., & Sheth, A. P. (2012). Extracting diverse sentiment expressions with target-dependent polarity from twitter. In 6th international AAAI conference on weblogs and social media. ICWSM (pp. 50–57).

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.

Dey, R., Jelveh, Z., & Ross, K. (2012). Facebook users have become much more private: A large-scale study. In 2012 IEEE international conference on pervasive computing and communications workshops (PERCOM workshops) (pp. 346–352). IEEE.

Dietterich, T. G. (2002). Ensemble learning. In The handbook of brain theory and neural networks (pp. 405–508). MIT Press.

Eisenstein, J. (2013). What to do about bad language on the internet. In Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: Human language technologies. NAACL-HLT (pp. 359–369).

Fersini, E., Messina, E., & Pozzi, F. (2014). Sentiment analysis: Bayesian ensemble learning. Decision Support Systems, 68, 26–38.

Go, A., Bhayani, R., & Huang, L. (2009). Twitter sentiment classification using distant supervision. Technical report, Stanford.

Hassan, A., Abbasi, A., & Zeng, D. (2013). Twitter sentiment analysis: A bootstrap ensemble framework. In Proceedings of the international conference on social computing (pp. 357–364).

Hastie, T., & Tibshirani, R. (1998). Classification by pairwise coupling. In Proceedings of the 1997 conference on advances in neural information processing systems 10. NIPS '97 (pp. 507–513). MIT Press.

Hogenboom, A., Bal, D., Frasincar, F., Bal, M., de Jong, F., & Kaymak, U. (2013). Exploiting emoticons in sentiment analysis. In Proceedings of the 28th annual ACM symposium on applied computing (pp. 703–710). ACM.

Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 168–177). ACM.

Jagtap, V. S., & Pawar, K. (2013). Analysis of different approaches to sentence-level sentiment classification. International Journal of Scientific Engineering and Technology, 2(3), 164–170.

Jia, L., Yu, C., & Meng, W. (2009). The effect of negation on sentiment analysis and retrieval effectiveness. In Proceedings of the 18th ACM conference on information and knowledge management. CIKM '09 (pp. 1827–1830). ACM.

Kouloumpis, E., Wilson, T., & Moore, J. (2011). Twitter sentiment analysis: The good the bad and the omg!. ICWSM, 11, 538–541.

Liu, K.-L., Li, W.-J., & Guo, M. (2012). Emoticon smoothed language models for twitter sentiment analysis. In Proceedings of the twenty-sixth AAAI conference on artificial intelligence (Vol. 2, pp. 1678–1684).

Maynard, D., & Greenwood, M. A. (2014). Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis. In Proceedings of LREC.

Nakov, P., Rosenthal, S., Kozareva, Z., Stoyanov, V., Ritter, A., & Wilson, T. (2013). Semeval-2013 task 2: Sentiment analysis in twitter. In Proceedings of the 7th international workshop on semantic evaluation (Vol. 13, pp. 312–320). Association for Computational Linguistics.

Owoputi, O., O'Connor, B., Dyer, C., Gimpel, K., Schneider, N., & Smith, N. A. (2013). Improved part-of-speech tagging for online conversational text with word clusters. In Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: Human language technologies (pp. 380–390).

Pak, A., & Paroubek, P. (2010). Twitter as a corpus for sentiment analysis and opinion mining. In Proceedings of the international conference on language resources and evaluation. LREC (pp. 1320–1326).

Pang, B., & Lee, L. (2004). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the ACL (pp. 271–278).

Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on empirical methods in natural language processing. EMNLP '02 (Vol. 10, pp. 79–86). Association for Computational Linguistics.

Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2, 1–135.

Pozzi, F. A., Fersini, E., Messina, E., & Blanc, D. (2013). Enhance polarity classification on social media through sentiment-based feature expansion. In WOA@AIIA (pp. 78–84).

Pozzi, F. A., Fersini, E., & Messina, E. (2013). Bayesian model averaging and model selection for polarity classification. In NLDB (pp. 189–200).

Reyes, A., Rosso, P., & Veale, T. (2013). A multidimensional approach for detecting irony in twitter. Language Resources and Evaluation, 47(1), 239–268.

Suttles, J., & Ide, N. (2013). Distant supervision for emotion classification with discrete binary values. In Computational linguistics and intelligent text processing (pp. 121–136). Springer.

Walther, J. B., & D'Addario, K. P. (2001). The impacts of emoticons on message interpretation in computer-mediated communication. Social Science Computer Review, 19, 324–347.

Wang, G., Sun, J., Ma, J., Xu, K., & Gu, J. (2014). Sentiment classification: The contribution of ensemble learning. Decision Support Systems, 57(0), 77–93.

Whitehead, M., & Yaeger, L. (2010). Sentiment mining using ensemble classification models. In Innovations and advances in computer sciences and engineering (pp. 509–514). Netherlands: Springer.

Xia, R., Zong, C., & Li, S. (2011). Ensemble of feature sets and classification algorithms for sentiment classification. Information Sciences, 181(6), 1138–1152.

Yessenalina, A., Yue, Y., & Cardie, C. (2010). Multi-level structured models for document-level sentiment classification. In Proceedings of the 2010 conference on empirical methods in natural language processing. EMNLP '10 (pp. 1046–1056).

Zaki, M. J., & Meira, W., Jr. (2014). Data mining and analysis: Fundamental concepts and algorithms. Cambridge University Press.

Zhang, H., Yu, Z., Xu, M., & Shi, Y. (2011). Feature-level sentiment analysis for chinese product reviews. In 3rd international conference on computer research and development (ICCRD) (Vol. 2, pp. 135–140).

Zhao, J., Dong, L., Wu, J., & Xu, K. (2012). Moodlens: An emoticon-based sentiment analysis system for chinese tweets. In Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining.
