Modeling and Representing Negation in Data-driven Machine Learning-based Sentiment Analysis
Robert Remus
Natural Language Processing Group Department of Computer Science
University of Leipzig, Germany
ESSEM-2013 — December 3rd, 2013
Negation Modeling — Introduction I
In sentiment analysis (SA), negation plays a special role [Wiegand et al., 2010]:
(1) They are ⟨comfortable to wear⟩+.
(2) They are ⟨not ⟨comfortable to wear⟩+⟩−.
Negation Modeling — Introduction II
Negations ...
are expressed via negation words/signals, e.g.
“don’t x”
“no findings of x”
“rules out x”
...
and via morphology, e.g.
“un-x”
“x-free”
“x-less”
...
have a negation scope, i.e. the words that are negated, e.g.
(1) They are not [comfortable to wear].
Negation Modeling — Introduction III
In compositional semantic approaches to SA, negations are usually captured via some ad hoc rule(s), e.g.
“Polarity(not [arg1]) = ¬Polarity(arg1)”
[Choi & Cardie, 2008]
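A toy rendering of such a flip rule (a minimal sketch with a hypothetical polarity lexicon, not Choi & Cardie’s actual implementation) makes the limitation in the examples below easy to see:

```python
# Sketch of the ad hoc flip rule: polarity simply inverts under negation.
# The lexicon entries are hypothetical, for illustration only.
def polarity(expression):
    lexicon = {"work": +1, "work well": +1, "comfortable to wear": +1}
    return lexicon.get(expression, 0)

def negated_polarity(expression):
    # Polarity(not [arg1]) = -Polarity(arg1)
    return -polarity(expression)

print(negated_polarity("work"))       # -1: "doesn't work" comes out negative
print(negated_polarity("work well"))  # also -1, although "doesn't work well"
                                      # expresses a weaker negative judgment
```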
But what about
(1) The stand doesn’t work.
(2) The stand doesn’t work well.
?
How to model and represent negation in a data-driven, machine learning-based approach to SA
... based solely on word n-grams and
... w/o lexical resources such as SentiWordNet [Esuli & Sebastiani, 2006]
?
Negation Modeling — Implicitly
Implicit negation modeling via higher-order word n-grams:
bigrams (“*n’t return”)
trigrams (“lack of padding”)
tetragrams (“denied sending wrong size”)
...
So we don’t need to incorporate extra knowledge of negation into our model; that’s convenient!
But what about long negation scopes (length ≥ 4), as in
(1) The leather straps have never worn out or broken.
?
Long negation scopes are the rule, not the exception! (>70%)
Word n-grams (n < 5) don’t capture such long negation scopes
Learning models using word n-grams (n ≥ 3) is usually backed up by almost no findings in the training data
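Both points in one picture, assuming scikit-learn’s CountVectorizer as the n-gram extractor (the two example sentences are made up):

```python
# Implicit negation modeling: negation is only captured if the negation
# signal and the negated words fall inside the same n-gram.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the straps never wore out",                          # short negation scope
    "the leather straps have never worn out or broken",   # long negation scope
]

vectorizer = CountVectorizer(ngram_range=(1, 3), binary=True)
vectorizer.fit(docs)

# "never worn out" is captured as a trigram, but no n-gram with n < 5
# connects "never" to "broken" in the second sentence, and such high-order
# n-grams occur too rarely in training data to be learned reliably.
print([g for g in vectorizer.get_feature_names_out() if "never" in g])
```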
Negation Modeling — Explicitly I
Let’s incorporate some knowledge of negation into our model and model negation explicitly!
Vital: negation scope detection (NSD)
(1) They don’t [stand up to laundering very well], in that they shrink up quite a bit.
e.g. via
NegEx¹ — regular expression-based = “baseline”
LingScope² — CRF-based = “state-of-the-art”
¹ http://code.google.com/p/negex/
² http://sourceforge.net/projects/lingscope/
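For illustration, a deliberately naive regex-based scope detector in the spirit of NegEx; the signal and terminator lists are assumptions, and this is not NegEx’s actual API or rule set:

```python
# Everything between a negation signal and the next scope terminator
# is treated as negated -- a crude approximation of a trigger-based NSD.
import re

SIGNALS = r"(?:\bnot\b|n't\b|\bno\b|\bnever\b|\bwithout\b)"   # assumed signal list
TERMINATORS = r"(?:\bbut\b|\bbecause\b|,|\.)"                 # assumed terminators

def detect_scope(sentence):
    """Return the first detected negation scope, or None."""
    pattern = SIGNALS + r"\s+(.*?)\s*(?:" + TERMINATORS + r"|$)"
    match = re.search(pattern, sentence, flags=re.IGNORECASE)
    return match.group(1) if match else None

print(detect_scope("They don't stand up to laundering very well, in that they shrink up quite a bit."))
# -> "stand up to laundering very well"
```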
Negation Modeling — Explicitly II
Once negation scopes are detected, negated and non-negated word n-grams need to be explicitly represented in feature space:
W = {w_i}, i = 1, ..., d: word n-grams
X = {0, 1}^d: feature space of size d, where for x_j ∈ X
x_jk = 1 denotes the presence of w_k
x_jk = 0 denotes the absence of w_k
For each feature x_jk: an additional feature x̆_jk
x̆_jk = 1 encodes that w_k appears negated
x̆_jk = 0 encodes that w_k appears non-negated
Result: augmented feature space X̆ = {0, 1}^(2d)
In X̆ we are now able to represent whether a word n-gram
w is present and non-negated ([1, 0]),
w is absent ([0, 0]),
w is present and negated ([0, 1]), or
w is present both negated and non-negated ([1, 1]).
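A minimal sketch of this augmented representation, assuming the tokens have already been split into occurrences outside and inside detected negation scopes (e.g. by NegEx or LingScope):

```python
# Augmenting X = {0,1}^d into X̆ = {0,1}^(2d): for each vocabulary entry
# w_k we emit the pair [x_k, x̆_k].
def augment(outside, inside, vocabulary):
    """Return the 2d-dimensional feature vector [x_1, x̆_1, ..., x_d, x̆_d]."""
    vector = []
    for w in vocabulary:
        vector.append(1 if w in outside else 0)  # w_k occurs non-negated
        vector.append(1 if w in inside else 0)   # w_k occurs negated
    return vector
```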
Negation Modeling — Explicitly III
Example: explicit negation modeling for word unigrams in
(1) They don’t stand up to laundering very well, in that they shrink up quite a bit.
Naïve tokenization that splits at whitespace
Ignore punctuation characters
Vocabulary W_uni = {“bit”, “don’t”, “down”, “laundering”, “quite”, “shrink”, “stand”, “up”, “very”, “well”}
Scheme  bit   don’t  down  laundering  quite  shrink  stand  up    very  well
w/      1, 0  1, 0   0, 0  0, 1        1, 0   1, 0    0, 1   1, 1  0, 1  0, 1
w/o     1     1      0     1           1      1       1      1     1     1

Table: Stylized feature vectors of example (1), with (w/) and without (w/o) explicit negation modeling.
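Continuing the augment() sketch from the previous slide: applied to example (1), it reproduces the w/ row of the table. The token lists are assumed to come from the naive whitespace tokenizer and an NSD system that returned the scope “stand up to laundering very well”:

```python
# Same augmentation as before, written compactly for the worked example.
def augment(outside, inside, vocabulary):
    return [bit for w in vocabulary
            for bit in (1 if w in outside else 0, 1 if w in inside else 0)]

outside = ["they", "don't", "in", "that", "they", "shrink", "up", "quite", "a", "bit"]
inside = ["stand", "up", "to", "laundering", "very", "well"]
W_uni = ["bit", "don't", "down", "laundering", "quite",
         "shrink", "stand", "up", "very", "well"]

print(augment(outside, inside, W_uni))
# -> [1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1]
```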
Negation Modeling — Evaluation I
3 SA subtasks:
1. In-domain document-level polarity classification on 10 domains from [Blitzer et al., 2007]’s Multi-Domain Sentiment Dataset v2.0
2. Cross-domain document-level polarity classification on 90 source domain–target domain pairs from the same data set
3. Sentence-level polarity classification on [Pang & Lee, 2005]’s sentence polarity dataset v1.0
Negation Modeling — Evaluation II
Standard setup:
SVMs, linear kernel, fixed C = 2.0
Implicit negation modeling/features: word {uni,bi,tri}-grams
Explicit negation modeling
word {uni,bi,tri}-grams
NSD: NegEx & LingScope
Evaluation measure: accuracy averaged over 10-fold cross-validation
For cross-domain experiments: 3 domain adaptation methods
= lots & lots & lots of combinations ...³
³ Summarized evaluation results can be found in the paper corresponding to this talk. Additionally, full evaluation results are available at
http://asv.informatik.uni-leipzig.de/staff/Robert_Remus
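For reference, a minimal sketch of this setup in scikit-learn terms; X and y are random stand-ins for the (augmented) feature vectors and polarity labels, not the actual corpora:

```python
# Linear-kernel SVM with fixed C = 2.0, accuracy averaged over 10 folds.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 50)).astype(float)  # stand-in 0/1 features
y = rng.integers(0, 2, size=200)                      # stand-in polarity labels

clf = SVC(kernel="linear", C=2.0)
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(f"accuracy averaged over 10 folds: {scores.mean():.3f}")
```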
Negation Modeling — Results “in a nutshell”
Explicitly modeling negation always yields statistically significantly better results than modeling it only implicitly
Explicitly modeling negation not only of word unigrams, but of higher order word n-grams is beneficial
Discriminative data-driven word n-gram models + explicit negation modeling = competitive: outperforms several state-of-the-art models
LingScope performs better than NegEx
Negation Modeling — Future Work
Given appropriate scope detection methods, our approach is easily extensible to model
other valence shifters [Polanyi & Zaenen, 2006], e.g. intensifiers like
“very” or “many”
hedges [Lakoff, 1973], e.g. “may” or “might”.
Accounting for negation scopes in the scope of other negations:
(1) I ⟨don’t care that they are ⟨not really leather⟩⟩.
Thanks!
Any questions or suggestions?
Appendix — Literature I
Blitzer, J., Dredze, M., & Pereira, F. C. (2007).
Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification.
In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL) (pp. 440–447).
Choi, Y. & Cardie, C. (2008).
Learning with compositional semantics as structural inference for subsentential sentiment analysis.
In Proceedings of the 13th Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 793–801).
Appendix — Literature II
Esuli, A. & Sebastiani, F. (2006).
SentiWordNet: A publicly available lexical resource for opinion mining.
In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC) (pp. 417–422).
Lakoff, G. (1973).
Hedges: A study in meaning criteria and the logic of fuzzy concepts.
Journal of Philosophical Logic, 2, 458–508.
Pang, B. & Lee, L. (2005).
Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales.
In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL) (pp. 115–124).
Appendix — Literature III
Polanyi, L. & Zaenen, A. (2006).
Contextual valence shifters.
In J. G. Shanahan, Y. Qu, & J. Wiebe (editors), Computing Attitude and Affect in Text: Theory and Applications, volume 20 of The Information Retrieval Series (pp. 1–9). Dordrecht: Springer.
Wiegand, M., Balahur, A., Roth, B., Klakow, D., & Montoyo, A. (2010).
A survey on the role of negation in sentiment analysis.
In Proceedings of the 2010 Workshop on Negation and Speculation in Natural Language Processing (NeSp-NLP) (pp. 60–68).