(1)

Modeling and Representing Negation in Data-driven Machine Learning-based Sentiment Analysis

Robert Remus

[email protected]

Natural Language Processing Group Department of Computer Science

University of Leipzig, Germany

ESSEM-2013 — December 3rd, 2013

(2)

Negation Modeling — Introduction I

• In sentiment analysis (SA), negation plays a special role [Wiegand et al., 2010]:

(1) They are ⟨comfortable to wear⟩+.

(2) They are ⟨not ⟨comfortable to wear⟩+⟩.


(3)

Negation Modeling — Introduction II

• Negations . . .

  • are expressed via negation words/signals, e.g.

    • “don’t x”
    • “no findings of x”
    • “rules out x”
    • . . .

  • and via morphology, e.g.

    • “un-x”
    • “x-free”
    • “x-less”
    • . . .

  • have a negation scope, i.e. the words that are negated, e.g.

(1) They are not [comfortable to wear].

(4)

Negation Modeling — Introduction III

• In compositional semantic approaches to SA, negations are usually captured via some ad hoc rule(s), e.g.

  • “Polarity(not [arg1]) = ¬Polarity(arg1)” [Choi & Cardie, 2008] (a toy sketch follows after this slide)

• But what about

(1) The stand doesn’t work.

(2) The stand doesn’t work well.

?

• How to model and represent negation in a data-driven machine learning-based approach to SA

  • . . . based solely on word n-grams and

  • . . . w/o lexical resources, such as SentiWordNet [Esuli & Sebastiani, 2006]

?
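To make the issue concrete, here is a toy sketch of such an ad hoc flip rule (my illustration with a made-up three-entry lexicon, not [Choi & Cardie, 2008]'s compositional inference model). It shows why (1) and (2) are awkward for rules alone: the flipped polarity depends entirely on the prior polarity the lexicon assigns to arg1, i.e. exactly the kind of lexical knowledge a purely word-n-gram-based, resource-free approach wants to avoid.

```python
# Toy sketch (my illustration, not [Choi & Cardie, 2008]'s actual model):
# the ad hoc rule Polarity(not [arg1]) = ¬Polarity(arg1), applied over a
# tiny, made-up polarity lexicon.
LEXICON = {"work": 0.0, "work well": 1.0, "comfortable to wear": 1.0}  # hypothetical prior polarities

def polarity(phrase: str) -> float:
    """Look up the phrase, flipping its polarity if it is negated."""
    for signal in ("not ", "doesn't ", "don't "):
        if phrase.startswith(signal):
            return -polarity(phrase[len(signal):])
    return LEXICON.get(phrase, 0.0)

print(polarity("doesn't work"))       # -0.0: flipping a neutral arg1 tells us nothing
print(polarity("doesn't work well"))  # -1.0: the outcome hinges entirely on the lexicon entry
```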


(5)

Negation Modeling — Implicitly

• Implicit negation modeling via higher-order word n-grams (a minimal extraction sketch follows after this slide):

  • bigrams (“*n’t return”)

  • trigrams (“lack of padding”)

  • tetragrams (“denied sending wrong size”)

  • . . .

• So we don’t need to incorporate extra knowledge of negation into our model; that’s convenient!

• But what about long negation scopes (length ≥ 4), as in

(1) The leather straps have never worn out or broken.

?

• Long negation scopes are the rule, not the exception! (>70%)

• Word n-grams (n < 5) don’t capture such long negation scopes

• Learning models using word n-grams (n ≥ 3) is usually backed by almost no occurrences in the training data
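As a minimal sketch of what these implicit n-gram features look like (plain whitespace tokenization; the function name and the n_max parameter are mine, not from the paper):

```python
# Minimal sketch: extract higher-order word n-grams that implicitly capture
# short-range negation (e.g. the bigram "*n't return"), and show why long
# scopes escape them.
from typing import List, Tuple

def word_ngrams(tokens: List[str], n_max: int = 4) -> List[Tuple[str, ...]]:
    """Return all word n-grams with 1 <= n <= n_max."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(tuple(tokens[i:i + n]))
    return grams

tokens = "the leather straps have never worn out or broken".split()
# Even tetragrams reach only "never worn out or"; joining "never" with
# "broken" would require a 5-gram, i.e. this long scope is not captured.
print([g for g in word_ngrams(tokens) if "never" in g])
```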

(6)

Negation Modeling — Explicitly I

• Let’s incorporate some knowledge of negation into our model and model negation explicitly!

• Vital: negation scope detection (NSD)

(1) They don’t [stand up to laundering very well], in that they shrink up quite a bit.

e.g. via

  • NegEx¹ — regular expression-based = “baseline”

  • LingScope² — CRF-based = “state-of-the-art”

(A toy regex-based scope heuristic in the spirit of the baseline is sketched after this slide.)

¹ http://code.google.com/p/negex/

² http://sourceforge.net/projects/lingscope/
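To make the NSD interface concrete, here is a toy regex-style scope heuristic. It is my simplification, not NegEx itself, which relies on curated trigger phrases; the signal list and token window below are assumptions for illustration.

```python
# Toy sketch of a regex-based negation scope heuristic: mark the tokens
# following a negation signal as negated (a crude stand-in for NSD).
import re
from typing import List

NEGATION_SIGNAL = re.compile(r"(?i)^(not|no|never|n't|don't|doesn't|didn't|won't|cannot)$")

def negated_mask(tokens: List[str], window: int = 5) -> List[bool]:
    """Return True for tokens that fall inside a (heuristic) negation scope."""
    mask = [False] * len(tokens)
    for i, token in enumerate(tokens):
        if NEGATION_SIGNAL.match(token):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                mask[j] = True
    return mask

tokens = "they don't stand up to laundering very well".split()
print(list(zip(tokens, negated_mask(tokens, window=6))))  # window chosen to cover example (1)
```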


(7)

Negation Modeling — Explicitly II

• Once negation scopes are detected, negated and non-negated word n-grams need to be explicitly represented in feature space (a small sketch follows after this slide):

  • W = {w_1, . . . , w_d}: the word n-grams

  • X = {0, 1}^d: feature space of size d, where for x_j ∈ X

    • x_jk = 1 denotes the presence of w_k
    • x_jk = 0 denotes the absence of w_k

  • For each feature x_jk there is an additional feature ˘x_jk:

    • ˘x_jk = 1 encodes that w_k appears negated
    • ˘x_jk = 0 encodes that w_k appears non-negated

  • Result: augmented feature space ˘X = {0, 1}^(2d)

• In ˘X we are now able to represent whether a word n-gram w_k

  • is present and non-negated ([1, 0]),

  • is absent ([0, 0]),

  • is present and negated ([0, 1]), or

  • is present both negated and non-negated ([1, 1]).
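A minimal sketch of this augmented representation for word unigrams (the function and its negation-mask input are my own framing; the paper specifies only the 0/1 encoding itself):

```python
# Minimal sketch: each vocabulary entry w_k gets a presence bit x_k plus a
# twin bit that fires when w_k occurs inside a detected negation scope,
# yielding one vector in the augmented space {0, 1}^(2d).
from typing import Dict, List

def augmented_unigram_features(tokens: List[str], negated: List[bool],
                               vocab: Dict[str, int]) -> List[int]:
    """Return the concatenation [presence bits | negated-presence bits]."""
    d = len(vocab)
    x = [0] * (2 * d)
    for token, in_scope in zip(tokens, negated):
        k = vocab.get(token)
        if k is None:
            continue           # out-of-vocabulary token
        if in_scope:
            x[d + k] = 1       # w_k appears negated
        else:
            x[k] = 1           # w_k appears non-negated
    return x
```

Reading off the pair (x_k, ˘x_k) for each vocabulary entry gives exactly the four cases above; the per-word pairs in the table on the next slide are this vector regrouped by vocabulary entry.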

(8)

Negation Modeling — Explicitly III

• Example: explicit negation modeling for word unigrams in

(1) They don’t stand up to laundering very well, in that they shrink up quite a bit.

  • Naïve tokenization that splits at whitespace

  • Ignore punctuation characters

  • Vocabulary W_uni = {“bit”, “don’t”, “down”, “laundering”, “quite”, “shrink”, “stand”, “up”, “very”, “well”}

Scheme   bit    don’t   down   laundering   quite   shrink   stand   up     very   well
w/       1, 0   1, 0    0, 0   0, 1         1, 0    1, 0     0, 1    1, 1   0, 1   0, 1
w/o      1      1       0      1            1       1        1       1      1      1

Table: Stylized feature vectors of example (1); w/ = with explicit negation modeling (pairs [x_k, ˘x_k]), w/o = without.


(9)

Negation Modeling — Evaluation I

• 3 SA subtasks:

1. In-domain document-level polarity classification on

   • 10 domains from [Blitzer et al., 2007]’s Multi-Domain Sentiment Dataset v2.0

2. Cross-domain document-level polarity classification on

   • 90 source domain–target domain pairs from the same data set

3. Sentence-level polarity classification on

   • [Pang & Lee, 2005]’s sentence polarity dataset v1.0

(10)

Negation Modeling — Evaluation II

• Standard setup:

  • SVMs, linear kernel, fixed C = 2.0

  • Implicit negation modeling/features: word {uni,bi,tri}-grams

  • Explicit negation modeling:

    • word {uni,bi,tri}-grams
    • NSD: NegEx & LingScope

  • Evaluation measure: accuracy, averaged over 10-fold cross-validation

  • For cross-domain experiments: 3 domain adaptation methods

  • = lots & lots & lots of combinations . . . ³

(A minimal sketch of this setup follows after this slide.)

³ Summarized evaluation results can be found in the paper corresponding to this talk; full evaluation results are available at http://asv.informatik.uni-leipzig.de/staff/Robert_Remus
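A minimal sketch of such a setup using scikit-learn (my stand-in, assuming LinearSVC as the linear-kernel SVM and a tiny placeholder corpus; the original experiments use the datasets from the previous slide and additionally the explicit negation features):

```python
# Minimal sketch (not the original experiment code): binary word
# {uni,bi,tri}-gram features, linear SVM with fixed C = 2.0, accuracy
# averaged over 10-fold cross-validation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder corpus and polarity labels; substitute real reviews here.
docs = [f"great product, works well ({i})" for i in range(10)] + \
       [f"poor product, does not work ({i})" for i in range(10)]
labels = [1] * 10 + [0] * 10

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 3), binary=True),  # word {uni,bi,tri}-grams
    LinearSVC(C=2.0),                                   # linear SVM, fixed C
)
scores = cross_val_score(model, docs, labels, cv=10, scoring="accuracy")
print(scores.mean())
```

For the explicit variant, one would append the negated-n-gram block from the earlier sketch to the vectorizer's output before training.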


(11)

Negation Modeling — Results “in a nutshell”

• Explicitly modeling negation always yields statistically significantly better results than modeling it only implicitly

• Explicitly modeling negation not only for word unigrams but also for higher-order word n-grams is beneficial

• Discriminative data-driven word n-gram models + explicit negation modeling are competitive: they outperform several state-of-the-art models

• LingScope performs better than NegEx

(12)

Negation Modeling — Future Work

• Given appropriate scope detection methods, our approach is easily extensible to model

  • other valence shifters [Polanyi & Zaenen, 2006], e.g. intensifiers like “very” or “many”

  • hedges [Lakoff, 1973], e.g. “may” or “might”.

• Accounting for negation scopes within the scope of other negations:

(1) I ⟨don’t care that they are ⟨not really leather⟩⟩.


(13)

Thanks!

Any questions or suggestions?

(14)

Appendix — Literature I

Blitzer, J., Dredze, M., & Pereira, F. C. (2007).

Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification.

In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL) (pp. 440–447).

Choi, Y. & Cardie, C. (2008).

Learning with compositional semantics as structural inference for subsentential sentiment analysis.

In Proceedings of the 13th Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 793–801).

(15)

Appendix — Literature II

Esuli, A. & Sebastiani, F. (2006).

SentiWordNet: A publicly available lexical resource for opinion mining.

In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC) (pp. 417–422).

Lakoff, G. (1973).

Hedges: A study in meaning criteria and the logic of fuzzy concepts.

Journal of Philosophical Logic, 2, 458–508.

Pang, B. & Lee, L. (2005).

Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales.

In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL) (pp. 115–124).

(16)

Appendix — Literature III

Polanyi, L. & Zaenen, A. (2006).

Contextual valence shifters.

In J. G. Shanahan, Y. Qu, & J. Wiebe (editors), Computing Attitude and Affect in Text: Theory and Applications, volume 20 of The Information Retrieval Series (pp. 1–9). Dordrecht: Springer.

Wiegand, M., Balahur, A., Roth, B., Klakow, D., & Montoyo, A. (2010).

A survey on the role of negation in sentiment analysis.

In Proceedings of the 2010 Workshop on Negation and Speculation in Natural Language Processing (NeSp-NLP) (pp. 60–68).
