Modeling and Representing Negation in Data-driven Machine Learning-based Sentiment Analysis
Robert Remus
Natural Language Processing Group Department of Computer Science
University of Leipzig, Germany
ESSEM-2013 — December 3rd, 2013
Negation Modeling — Introduction I
In sentiment analysis (SA), negation plays a special role [Wiegand et al., 2010]:
(1) They are ⟨comfortable to wear⟩+.
(2) They are ⟨not ⟨comfortable to wear⟩+⟩−.
Negation Modeling — Introduction II
Negations ...
are expressed via negation words/signals, e.g.
“don’t x”
“no findings of x”
“rules out x”
...
and via morphology, e.g.
“un-x”
“x-free”
“x-less”
...
have a negation scope, i.e. the words that are negated, e.g.
(1) They are not [comfortable to wear].
Negation Modeling — Introduction III
In compositional semantic approaches to SA, negations are usually captured via some ad hoc rule(s), e.g.
“Polarity(not [arg1]) = ¬Polarity(arg1)”
[Choi & Cardie, 2008]
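A toy rendering of such a flip rule (a minimal sketch with a hypothetical polarity lexicon, not Choi & Cardie’s actual implementation) makes the limitation in the examples below easy to see:

```python
# Sketch of the ad hoc flip rule: polarity simply inverts under negation.
# The lexicon entries are hypothetical, for illustration only.
def polarity(expression):
    lexicon = {"work": +1, "work well": +1, "comfortable to wear": +1}
    return lexicon.get(expression, 0)

def negated_polarity(expression):
    # Polarity(not [arg1]) = -Polarity(arg1)
    return -polarity(expression)

print(negated_polarity("work"))       # -1: "doesn't work" comes out negative
print(negated_polarity("work well"))  # also -1, although "doesn't work well"
                                      # expresses a weaker negative judgment
```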
But what about
(1) The stand doesn’t work.
(2) The stand doesn’t work well.
?
How to model and represent negation in a data-driven, machine learning-based approach to SA
... based solely on word n-grams and
... w/o lexical resources such as SentiWordNet [Esuli & Sebastiani, 2006]
?
Negation Modeling — Implicitly
Implicit negation modeling via higher-order word n-grams:
bigrams (“*n’t return”)
trigrams (“lack of padding”)
tetragrams (“denied sending wrong size”)
...
So we don’t need to incorporate extra knowledge of negation into our model; that’s convenient!
But what about long negation scopes (length ≥ 4), as in
(1) The leather straps have never worn out or broken.
?
Long negation scopes are the rule, not the exception! (>70%)
Word n-grams (n < 5) don’t capture such long negation scopes
Learning models using word n-grams (n ≥ 3) is usually backed up by almost no findings in the training data
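Both points in one picture, assuming scikit-learn’s CountVectorizer as the n-gram extractor (the two example sentences are made up):

```python
# Implicit negation modeling: negation is only captured if the negation
# signal and the negated words fall inside the same n-gram.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the straps never wore out",                          # short negation scope
    "the leather straps have never worn out or broken",   # long negation scope
]

vectorizer = CountVectorizer(ngram_range=(1, 3), binary=True)
vectorizer.fit(docs)

# "never worn out" is captured as a trigram, but no n-gram with n < 5
# connects "never" to "broken" in the second sentence, and such high-order
# n-grams occur too rarely in training data to be learned reliably.
print([g for g in vectorizer.get_feature_names_out() if "never" in g])
```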
Negation Modeling — Explicitly I
Let’s incorporate some knowledge of negation into our model and model negation explicitly!
Vital: negation scope detection (NSD)
(1) They don’t [stand up to laundering very well], in that they shrink up quite a bit.
e.g. via
NegEx¹ — regular expression-based = “baseline”
LingScope² — CRF-based = “state-of-the-art”
¹ http://code.google.com/p/negex/
² http://sourceforge.net/projects/lingscope/
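For illustration, a deliberately naive regex-based scope detector in the spirit of NegEx; the signal and terminator lists are assumptions, and this is not NegEx’s actual API or rule set:

```python
# Everything between a negation signal and the next scope terminator
# is treated as negated -- a crude approximation of a trigger-based NSD.
import re

SIGNALS = r"(?:\bnot\b|n't\b|\bno\b|\bnever\b|\bwithout\b)"   # assumed signal list
TERMINATORS = r"(?:\bbut\b|\bbecause\b|,|\.)"                 # assumed terminators

def detect_scope(sentence):
    """Return the first detected negation scope, or None."""
    pattern = SIGNALS + r"\s+(.*?)\s*(?:" + TERMINATORS + r"|$)"
    match = re.search(pattern, sentence, flags=re.IGNORECASE)
    return match.group(1) if match else None

print(detect_scope("They don't stand up to laundering very well, in that they shrink up quite a bit."))
# -> "stand up to laundering very well"
```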
Negation Modeling — Explicitly II
Once negation scopes are detected, negated and non-negated word n-grams need to be explicitly represented in feature space:
W = {w_i}, i = 1, ..., d: word n-grams
X = {0, 1}^d: feature space of size d, where for x_j ∈ X
x_jk = 1 denotes the presence of w_k
x_jk = 0 denotes the absence of w_k
For each feature x_jk: an additional feature x̆_jk
x̆_jk = 1 encodes that w_k appears negated
x̆_jk = 0 encodes that w_k appears non-negated
Result: augmented feature space X̆ = {0, 1}^(2d)
In X̆ we are now able to represent whether a word n-gram
w is present and non-negated ([1, 0]),
w is absent ([0, 0]),
w is present and negated ([0, 1]), or
w is present both negated and non-negated ([1, 1]).
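A minimal sketch of this augmented representation, assuming the tokens have already been split into occurrences outside and inside detected negation scopes (e.g. by NegEx or LingScope):

```python
# Augmenting X = {0,1}^d into X̆ = {0,1}^(2d): for each vocabulary entry
# w_k we emit the pair [x_k, x̆_k].
def augment(outside, inside, vocabulary):
    """Return the 2d-dimensional feature vector [x_1, x̆_1, ..., x_d, x̆_d]."""
    vector = []
    for w in vocabulary:
        vector.append(1 if w in outside else 0)  # w_k occurs non-negated
        vector.append(1 if w in inside else 0)   # w_k occurs negated
    return vector
```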
Negation Modeling — Explicitly III
Example: explicit negation modeling for word unigrams in
(1) They don’t stand up to laundering very well, in that they shrink up quite a bit.
Naïve tokenization that splits at whitespace
Ignore punctuation characters
Vocabulary W_uni = {“bit”, “don’t”, “down”, “laundering”, “quite”, “shrink”, “stand”, “up”, “very”, “well”}
Scheme  bit   don’t  down  laundering  quite  shrink  stand  up    very  well
w/      1, 0  1, 0   0, 0  0, 1        1, 0   1, 0    0, 1   1, 1  0, 1  0, 1
w/o     1     1      0     1           1      1       1      1     1     1

Table: Stylized feature vectors of example (1), with (w/) and without (w/o) explicit negation modeling.
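Continuing the augment() sketch from the previous slide: applied to example (1), it reproduces the w/ row of the table. The token lists are assumed to come from the naive whitespace tokenizer and an NSD system that returned the scope “stand up to laundering very well”:

```python
# Same augmentation as before, written compactly for the worked example.
def augment(outside, inside, vocabulary):
    return [bit for w in vocabulary
            for bit in (1 if w in outside else 0, 1 if w in inside else 0)]

outside = ["they", "don't", "in", "that", "they", "shrink", "up", "quite", "a", "bit"]
inside = ["stand", "up", "to", "laundering", "very", "well"]
W_uni = ["bit", "don't", "down", "laundering", "quite",
         "shrink", "stand", "up", "very", "well"]

print(augment(outside, inside, W_uni))
# -> [1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1]
```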
Negation Modeling — Evaluation I
3 SA subtasks:
1. In-domain document-level polarity classification on 10 domains from [Blitzer et al., 2007]’s Multi-Domain Sentiment Dataset v2.0
2. Cross-domain document-level polarity classification on 90 source domain–target domain pairs from the same data set
3. Sentence-level polarity classification on [Pang & Lee, 2005]’s sentence polarity dataset v1.0
Negation Modeling — Evaluation II
Standard setup:
SVMs, linear kernel, fixed C = 2.0
Implicit negation modeling/features: word {uni,bi,tri}-grams
Explicit negation modeling
word {uni,bi,tri}-grams
NSD: NegEx & LingScope
Evaluation measure: accuracy averaged over 10-fold cross-validation
For cross-domain experiments: 3 domain adaptation methods
= lots & lots & lots of combinations ...³
³ Summarized evaluation results can be found in the paper corresponding to this talk. Additionally, full evaluation results are available at
http://asv.informatik.uni-leipzig.de/staff/Robert_Remus
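For reference, a minimal sketch of this setup in scikit-learn terms; X and y are random stand-ins for the (augmented) feature vectors and polarity labels, not the actual corpora:

```python
# Linear-kernel SVM with fixed C = 2.0, accuracy averaged over 10 folds.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 50)).astype(float)  # stand-in 0/1 features
y = rng.integers(0, 2, size=200)                      # stand-in polarity labels

clf = SVC(kernel="linear", C=2.0)
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(f"accuracy averaged over 10 folds: {scores.mean():.3f}")
```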
Negation Modeling — Results “in a nutshell”
Explicitly modeling negation always yields statistically significantly better results than modeling it only implicitly
Explicitly modeling negation not only of word unigrams, but of higher order word n-grams is beneficial
Discriminative data-driven word n-gram models + explicit negation modeling = competitive: outperforms several state-of-the-art models
LingScope performs better than NegEx
Negation Modeling — Future Work
Given appropriate scope detection methods, our approach is easily extensible to model
other valence shifters [Polanyi & Zaenen, 2006], e.g. intensifiers like
“very” or “many”
hedges [Lakoff, 1973], e.g. “may” or “might”.
Accounting for negation scopes in the scope of other negations:
(1) I ⟨don’t care that they are ⟨not really leather⟩⟩.
Thanks!
Any questions or suggestions?
Appendix — Literature I
Blitzer, J., Dredze, M., & Pereira, F. C. (2007).
Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification.
In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL) (pp. 440–447).
Choi, Y. & Cardie, C. (2008).
Learning with compositional semantics as structural inference for subsentential sentiment analysis.
In Proceedings of the 13th Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 793–801).
Appendix — Literature II
Esuli, A. & Sebastiani, F. (2006).
SentiWordNet: A publicly available lexical resource for opinion mining.
In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC) (pp. 417–422).
Lakoff, G. (1973).
Hedges: A study in meaning criteria and the logic of fuzzy concepts.
Journal of Philosophical Logic, 2, 458–508.
Pang, B. & Lee, L. (2005).
Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales.
In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL) (pp. 115–124).
Appendix — Literature III
Polanyi, L. & Zaenen, A. (2006).
Contextual valence shifters.
In J. G. Shanahan, Y. Qu, & J. Wiebe (editors), Computing Attitude and Affect in Text: Theory and Applications, volume 20 of The Information Retrieval Series (pp. 1–9). Dordrecht: Springer.
Wiegand, M., Balahur, A., Roth, B., Klakow, D., & Montoyo, A. (2010).
A survey on the role of negation in sentiment analysis.
In Proceedings of the 2010 Workshop on Negation and Speculation in Natural Language Processing (NeSp-NLP) (pp. 60–68).