A distributional account of metonymy

Academic year: 2021
DEPARTMENT OF PHILOLOGY, LITERATURE AND LINGUISTICS

DEGREE PROGRAMME IN LINGUISTICS AND TRANSLATION

THESIS

A Distributional Account of Metonymy

Candidate: Paolo Pedinotti
Supervisor: Prof. Alessandro Lenci
Co-supervisor: Prof.ssa Giovanna Marotta


Contents

1 Introduction
2 Metonymy and linguistic theories
2.1 Definition of metonymy and some important distinctions
2.2 Variability in the lexicon: Pustejovsky's Generative Lexicon
2.3 Metonymy behind the lexicon: coercion
2.4 Metonymy and conceptual knowledge: a cognitive linguistic perspective
2.5 Metonymy prediction in an expectation-based framework
2.6 Psycholinguistic studies of metonymy comprehension
3 Metonymy and distributional semantics
3.1 Distributional representations and their properties
3.2 Distributional representations of word meaning in context and metonymy
4 A dataset for metonymy resolution
4.1 Goals and structure of the dataset
4.2 Description of the dataset
4.3 Pre-test
5 A distributional model for metonymy resolution
6 Evaluation
6.1 Task
6.2 Results
6.3 Discussion
7 Conclusion
8 Appendix 1. Precision and AP scores for each item in the dataset
9 Appendix 2. Thematic fit for verb-argument combinations in the dataset


List of figures

Figure 2.1 A basic taxonomy of semantic types (from Pustejovsky and Batiukova 2019)
Figure 2.2 Entity subtypes (from Pustejovsky and Batiukova 2019)
Figure 2.3 Lexical priming relations (from McRae and Matsuki 2009)
Figure 3.1 Word vectors in a three-dimensional vector space (from Lenci 2018)
Figure 3.2 Performance of distributional models (from Hill et al. 2015)
Figure 3.3 Representation for the vector of "charge" and the contextualized vector in context "charge a tax" (from Thater et al. 2011)
Figure 3.4 A toy example of DEG (from Chersoni et al. 2019)
Figure 4.1 Dataset for metonymy resolution
Figure 4.2 For each item, number of subjects who selected the metonymic and literal reading respectively
Figure 5.1 Most expected direct objects associated with "commemorate"
Figure 5.2 Visual representation of the model
Figure 6.1 Precision values at different cutoffs
Figure 6.2 Precision and MAP values (together with standard deviation)
Figure 6.3 Cases from the dataset in which the expectations activated by the verb differ from those activated by the verb and the second argument of the sentence
Figure 6.4 Association values of the context vector "nmod:from=Starbucks" with the embeddings of 10 typical subjects of "located"


1 Introduction

Even outside linguistics, it is widely accepted that linguistic expressions maintain a stable association with their referents. Speakers of a language use their shared knowledge of these relations in order to communicate. For example, a word allows us to refer to one thing (or to more than one, in the case of homonymy or polysemy), and we keep using the same word (or a synonym of it) for as long as we need to refer to that thing.

The concern of semantic theories has been to give an account of this relation. In particular, the approach to meaning known as formal semantics (Montague 1974, Cann 1993) emphasized its role in language and aimed to provide a way of representing linguistic elements that eliminates any ambiguity. This is accomplished by translating the expressions of natural languages into the logical categories of a formal language (Frege 1879).

There are other aspects of linguistic meaning that we still have to consider, however, in our efforts to give an adequate account of it. Many of these aspects involve its relation with the context in which sentences are uttered. Among them are conversational conventions, physical aspects of the situation of utterance, and social and cultural constraints on interaction. The distinction between sentence meaning (the aspect of meaning that does not depend on context) and speaker meaning implies a division of labor between semantics and pragmatics, where semantics is concerned with an "abstract" sense of meaning, while all aspects of language use are a matter of pragmatics. Thus, basic formal models are built on a simplistic model of communication whereby the shared knowledge of the interlocutors is purely linguistic (i.e. it concerns the relation of signs to the things they are used to refer to) and the process of communication is not affected by external factors.

However, a more complex picture emerges when considering how people actually communicate. More specifically, we realize that the principle of correspondence between a sign and its referent can be overridden, with speakers using the name of one entity to refer to another entity. Such referential shifts may take place for various reasons. For example, they are a useful tool for enabling speakers who are under time pressure to achieve efficient communication, since they allow us to communicate complex ideas through simpler and more concrete concepts. An example of this is:

(1) Put the kettle on. I'll be home by five o'clock. (Littlemore 2015)

Here, the expression Put the kettle on refers to the whole process of preparing a cup of tea. By using this communicative shorthand, the speaker is able to avoid listing all the actions involved in the process. It has been shown that meaning shifts such as this serve a wide range of communicative functions, among which are creative functions such as euphemism, humour and irony (Littlemore 2015).

Although the meaning of the expression put the kettle on and its intended referent in (1) are distinct, they are nonetheless related (the action occurs at the beginning of the process of preparing a cup of tea). That is, the example involves the process of metonymic thinking (Lakoff and Johnson 1980). As we will see in this work, metonymic thinking is very widespread and affects language. Section 2.1 is devoted to the definition of the concept of metonymy.

The question I want to pursue here is this: can we include metonymy within the scope of our semantic theory without giving up the strengths that formal approaches bring? Can we obtain a representation of meaning that (a) retains the advantages of formal accounts by including directly interpretable formal structures and (b) accounts for language variability, and thus is capable of predicting metonymic meaning? We can see that the phenomenon of metonymy is not completely unpredictable. This claim is supported by the fact that people are capable of understanding the metonymic meaning, even when they are faced with non-conventionalized metonymies. Moreover, there are limits on metonymic operations, and people appear to have knowledge of them.

Therefore, it is possible to identify factors that drive the interpretation of metonymy in language. In this work, I will focus on the role of world knowledge (knowledge about how we view the world). I will present two different characterizations of world knowledge (namely frames and Generalized event knowledge, cf. sections 2.4 and 2.5).

Formal approaches presuppose a distinction between linguistic knowledge and world knowledge, and hence exclude the latter from the semantic representation. This assumption is shared by Pustejovsky’s Generative Lexicon Theory, which will be illustrated in section 2.2. The theory provides a model for the treatment of productive metonymic relationships (such as the Container-for-content metonymy), but is not flexible enough to capture the generative nature of metonymy.

A model for metonymy resolution based on world knowledge requires this kind of knowledge to be represented and accessible for interpretation. Distributional models of word meaning (which will be presented in section 3.1) provide a way of extracting it from co-occurrence statistics of words in large text corpora. This information can, in turn, be modeled with data structures and integrated in the semantic representation (Chersoni et al. 2019, see section 3.2). In this work, I will present a distributional model for metonymy resolution. The model is empirically adequate (it is consistent with the results of previous psycholinguistic studies of metonymy comprehension, see section 2.6) and able to generate predictions about the interpretation of words used metonymically. The model is evaluated against a dataset including a wide range of metonymy types (the dataset will be presented in chapter 4).
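The basic mechanics of extracting distributional information from co-occurrence statistics can be sketched in a few lines. The toy corpus, window size, and raw-count vectors below are illustrative assumptions only, not the setup used in this work (which relies on large corpora and the models described in chapter 3):

```python
# A toy count-based distributional model: word vectors are Counters of
# context words within a small window, compared by cosine similarity.
from collections import Counter
from math import sqrt

corpus = [
    "the bottle of wine was delicious",
    "she drank the wine slowly",
    "he drank the whole bottle",
    "the glass of wine broke",
]

def cooccurrences(sentences, window=2):
    """Map each word to a Counter over the words in its context window."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(context)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v)

vecs = cooccurrences(corpus)
print(cosine(vecs["bottle"], vecs["wine"]))
```

On a corpus this small the similarities are of course meaningless; the point is only that the representation is induced from distributional evidence rather than hand-coded.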


2 Metonymy and linguistic theories

2.1 Definition of metonymy and some important distinctions

Metonymy is a cognitive and linguistic process that occurs when we use one thing to refer to another thing that is linked to it by a relationship of some kind.

Metonymy is a feature of our thought processes: if we ask a group of people to think of a complex entity like "France" and picture it, some will picture an iconic representation of France like the Eiffel Tower, some a place in France they have visited, and some another easily perceived aspect of France. No one will picture the whole of France, because this information cannot be held in working memory (Littlemore, 2015).

Language is affected by metonymic thinking. Metonymy in language serves (among other functions) a referential purpose, as it is employed by language users as a communicative shortcut that allows them to communicate using fewer words than they would otherwise need. This operation can be related to a type of Gricean maxim of Quantity whereby the speaker makes his contribution as informative as is required while communicating less than is inferred from the message (Ziegeler, 2007). The following sentences are examples of metonymy:

(2) a. You can always hire good brains.

b. This bottle is delicious.

c. The green shirt behind you is making me nervous. (Pustejovsky, 2019, p. 334)

d. Can you pass the salt?

e. She was able to pass the driver’s test. (Kövecses and Radden, 1998, p. 73)

The sentences in (2a,b,c) are examples of referential metonymy (according to the terminology used by Warren 2006), as the meaning shift takes place between lexical items. In (2a) brains stands for "people with brains": the sentence is an example of Part-for-whole metonymy. In (2b) bottle is used metonymically to mean "the liquid contained in the bottle": in this case the entities in relation to each other are the container and its content (Container-for-content metonymy). In (2c) the meaning of green shirt is "the person with the green shirt": the garment stands for the person who wears it. In the sentences in (2d,e) metonymy does not relate two lexical items but entire propositions. In (2d) a question about a person's ability to pass the salt stands for a request for the salt to be passed. In (2e) the assertion of the ability to pass the test signifies the actual passing of the test (Potentiality-for-actuality metonymy, cf. Panther and Thornburg 1999). Although these types of metonymy involve metonymic processes of a non-lexical nature, in this work I will examine cases of referential metonymy only.

Referential metonymies can be distinguished by two criteria: (a) The relation (according to a cognitive approach to meaning) between the concept literally denoted by the metonymic expression and that communicated; and (b) the degree of conventionalization, that is, the strength of association between the literal and the metonymic meaning:

(a) Metonymies vs. Facets: Croft and Cruse (2004) observed that in certain cases of meaning shift the concepts associated with metonymic meanings are components of a whole (a Gestalt in their analysis). It seems that these cases slightly differ from classical metonymies, in which a basic literal meaning can be identified (these instances of metonymies have been analyzed as dot objects, see section 2.2):

(3) a. Paris is a beautiful city. (location)

b. Paris closed the Boulevard st. Michel. (government)

c. Paris elected the Green candidate as mayor. (population)

(4) They played lots of Mozart. (Croft and Cruse 2004).

The nature of the concept expressed by the word Paris is different from that associated with Mozart, in that the former can be defined only as a whole. That is to say, the word Paris can be analyzed as having different readings. Such words are to be distinguished from cases of polysemy such as bank (see below), since the senses are intuitively related to each other. In cognitive linguistic terms, they do not show attentional autonomy (if one is at the focus of attention, the other is not excluded, cf. Croft and Cruse 2004). Furthermore, the meanings in (3) are part of the global concept, while in (4) there is a shift between the literal and the metonymic meaning (even if the two meanings are somehow associated).

(b) Systematic vs. Circumstantial metonymy: some metonymies seem to be part of our language knowledge because they are frequently used in everyday language. Metonymies such as these are called “systematic metonymies” (Nunberg 1995, Piñango et al. 2016). Examples of systematic metonymies are Container-for-content metonymy (2b) and Producer-for-product metonymy:

(5) Reina Sofia has lots of Picasso, Dali and Miro. (Paintings of Picasso, Dali and Miro).

These patterns of metonymy are productive: if we assume that a word has a given meaning, we can predict that the meaning associated with it in the pattern will also be available.

On the other hand, there exist metonymies in which the relation between the literal and the new meaning is established in real time. These metonymies have been given labels like “circumstantial metonymies” (Piñango 2016) or “reference transfers” (Nunberg 1994). In these cases, the context seems to play an important role in licensing the metonymy. (6) is a very debated example of circumstantial metonymy:

(6) That french fries is getting impatient. (Nunberg 1995)

Assuming that the sentence in (6) is uttered in a place where french fries are served, we can take the referent of the expression french fries to be the customer that ordered french fries. Of course, we don’t normally make use of expressions denoting things that are eaten to refer to people that are eating them, but still this is licensed by contextual conditions. In this case, the assumptions concern the utterance context (i.e. information pertaining to the circumstances in which the sentence is uttered, Kamp 2016).


It should also be expected that this distinction will have consequences for the processing of metonymy, as the interpretation would require the recovery of two different types of information: stored in memory for systematic metonymy, to be inferred from context for circumstantial metonymy. I will report the findings from psycholinguistic studies concerning this issue in section 2.6.

The question I will be concerned with for the rest of this chapter is whether models of linguistic meaning are capable of representing and predicting the additional meaning of metonymic expressions. In the present chapter, I will take into account formal and cognitive approaches. In the next chapter, I will discuss the opportunities opened up by the use of distributional representations.

Metonymy (along with various other cases of figurative language use and other phenomena such as ellipsis and idioms) presents a challenge for theories of sentence meaning based on the Fregean principle of compositionality, which claims that each element of the meaning of a complex expression derives from either the lexical items that compose it or its rules of syntactic composition. This is because a consequence of this principle is that lexical items always make the same contribution to the interpretation of linguistic expressions, which is incompatible with the dynamic behavior of lexical meaning observed in the examples above. In Montague Grammar, a very well-known theory of the meaning of natural language, only one denotation is assigned to each lexical item (the denotation cannot change as long as the model in which the interpretation takes place remains the same) (Montague 1974, Cann 1993).

Despite the apparent incompatibility between the phenomenon of metonymy and a theory of compositionality based on the Fregean principle, some linguists have tried to integrate an account of it into this framework (Pustejovsky 1995, Asher 2011). They followed the principle, and thereby recognized the additional meaning as originating from either the lexical items (Pustejovsky 1995) or the syntactic rules that guide the composition of metonymic expressions (Pustejovsky 1995, Asher 2011), thus postulating the existence of a complex structure for lexical entries and of ancillary processes of meaning composition accompanying those posited by formal semantics.


2.2 Variability in the lexicon: Pustejovsky’s Generative Lexicon

A very influential proposal for lexical entries with a complex structure is James Pustejovsky’s Generative Lexicon (Pustejovsky 1995, Pustejovsky and Batiukova, 2019). The fundamental claim by Pustejovsky is that in order to account for variation in lexical meaning we must make use of lexical structures that encode more information than the traditional ones.

The simplest model of the structure of a lexical entry is the so-called sense enumeration model. This is the one dictionaries make use of in order to specify the meanings associated with a word as well as the one people have the most familiarity with. It is based on the assumption that each of the meanings of a word must be listed in the lexical entry of the word as a distinct sense.

Some of the additional meanings of metonymic words are represented in the lexical entry as a list. The example in (7) is an extract from the lexical entry of the word bottle in Oxford's free online English dictionary (https://www.lexico.com/en):

(7) bottle

1. A glass or plastic container with a narrow neck, used for storing drinks or other liquids (‘he opened the bottle of beer’)

1.1 The contents of a bottle (‘she managed to get through a bottle of wine’)

1.2 (the bottle) informal Used in reference to the heavy drinking of alcohol (‘more women are taking to the bottle’)

The example shows that the lexical entry of bottle specifies not only the literal meaning and the well-known metonymic content meaning, but also an event meaning (the drinking of alcohol). By using such lexical entries, we can potentially represent all the metonymic meanings associated with words.

Despite being popular and very intuitive, this model has two important shortcomings. The first is that it has nothing to say about the way in which the additional meanings are inferred, since it is not a theory of metonymy, and thus it is not capable of making predictions about it. The sense enumeration model also misses another important point: the literal and the metonymic meanings are not truly distinct senses of a word; rather, there is some sort of association between them. Polysemous words whose meanings present associations like these are called logically polysemous, in order to distinguish them from homonyms (accidental polysemy). The classic example of accidental polysemy is the word bank: there is no relationship between the financial institution sense (as in the expression The bank went bankrupt) and the riverside sense (We moored the boat to the bank). Under this approach, the only way to capture these associations is by organizing the senses hierarchically (making use of subindexes, as in (7)).
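The sense enumeration model amounts to little more than a lookup table. A minimal sketch, following the dictionary extract in (7) (the glosses are abbreviated, and the data structure is an illustrative assumption, not a proposal from the literature):

```python
# A sense-enumeration lexical entry: every reading is a separately listed
# sense, with subindexes (1.1, 1.2) as the only record of relatedness.
lexicon = {
    "bottle": {
        "1":   "a container with a narrow neck, used for storing liquids",
        "1.1": "the contents of a bottle",
        "1.2": "the heavy drinking of alcohol",
    }
}

def senses(word):
    # The model can only enumerate stored senses: a metonymic reading
    # missing from the list is simply unreachable, and nothing predicts
    # which additional readings a word should have.
    return list(lexicon.get(word, {}).values())

print(senses("bottle"))
```

The two shortcomings discussed above fall out directly: `senses` cannot generate a reading that was never listed, and the flat value list does not represent the association between sense 1 and senses 1.1 and 1.2 beyond the subindex labels.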

In order to capture this association between meanings, Pustejovsky makes use in his theory of complex symbolic representations called "dot objects". He argues that words such as bottle encode relations between types, or aspects, rather than different meanings. Semantic types are ontological categories associated with linguistic expressions, and it has been acknowledged that many of them are reflected in the grammar of a language (Asher 2011). A basic ontology (from Pustejovsky and Batiukova, 2019) is shown in Figure 2.1.

The relation between types is symbolized by the dot operator (•), which stands for a specific semantic relation that must in turn be specified.


Which metonymic patterns of association can be captured by the dot object formalism? The answer depends on the level of granularity of the taxonomy of types as well as on the semantic relations that the dot operator can symbolize. In his most recent exposition of Generative Lexicon Theory (Pustejovsky and Batiukova, 2019), Pustejovsky claims that dot objects embrace a wide range of metonymic patterns. Among them are Container-for-content metonymy (symbolized as CONTAINER • CONTENT), metonymies where an object stands for the information contained in it (e.g. John bought a book / This is a good book, PHYSICAL_OBJECT • INFORMATION), and Producer-for-product metonymy (PRODUCER • PRODUCT). The formalism can also represent more complex types such as facets, by positing two dot operators in the same lexical entry (words such as New York are typed as LOCATION • HUMAN GROUP • ORGANIZATION).
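A dot object can be rendered as a small data structure: a sequence of simple types joined by the dot operator, plus the semantic relation the dot stands for. The encoding below is a hypothetical sketch for illustration, not Generative Lexicon's actual formalism:

```python
# A complex (dot) type: simple types joined by "•", plus the semantic
# relation that the dot operator symbolizes for this entry.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class DotType:
    components: Tuple[str, ...]
    relation: str  # the relation the dot stands for (must be specified)

    def __str__(self) -> str:
        return " • ".join(self.components)

bottle = DotType(("CONTAINER", "CONTENT"), relation="contain")
book = DotType(("PHYSICAL_OBJECT", "INFORMATION"), relation="hold")
# Facets: more than one dot operator in the same lexical entry.
new_york = DotType(("LOCATION", "HUMAN_GROUP", "ORGANIZATION"), relation="facet")

print(bottle)  # CONTAINER • CONTENT
```

Unlike the sense-enumeration list, the dot type records the relation between aspects explicitly, which is what licenses predication over either component.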

[Figure 2.1: A basic taxonomy of semantic types (from Pustejovsky and Batiukova 2019): Top branches into Entity (Physical, Abstract), Property (Intrinsic, Extrinsic), and Event (Static, Dynamic).]

Given that metonymy is part of our linguistic knowledge, the question arises: to what extent is this knowledge due to lexical knowledge, and to what extent to other generative mechanisms? There may be metonymic patterns of association that are unlikely to be stored in lexical long-term memory; models of lexical meaning such as the sense enumeration model and the dot objects have nothing to say about these. Circumstantial metonymies are good candidates for non-lexicalized metonymies, but they are not the only ones: Pustejovsky argues that some meaning shifts, although widespread in the language, may arise due to the influence of context. An example of this, in (8), is logical metonymy:

(8) a. Mary bought a book. (The physical object)

b. Mary finished a book. (Some event on the book, i.e. reading, writing)

It would be difficult to suppose that all the potential event readings assumed by the word book in context can be recovered from the lexical entry. Nevertheless, sentences such as (8b) are very common in language.

It is not clear how much "metonymic knowledge" the lexicon can contain. A test to distinguish lexical ambiguities from other meaning extensions is copredication: if two predicates that select for different senses can felicitously be applied to the same argument, then the argument is logically polysemous with respect to those senses. For example, the noun lunch supports copredication in which one predicate requires the event sense while the other requires the meal sense, as can be seen in the well-formedness of the following example:

(9) Lunch was delicious but took forever. (Asher 2011)

Assuming that every type of metonymy has its own representation in the lexicon would place a heavy burden on lexical semantics, since the taxonomies of metonymy proposed in the literature (Lakoff and Johnson, 1980, Radden and Kövecses, 1999) contain a wide range of metonymy types (the dataset built for the present work contains 17 different types). Furthermore, an account based only on lexical information cannot capture an important aspect of the nature of metonymy, namely that we generate additional meanings by means of our linguistic knowledge. This generative process necessarily invokes the way we compose meaning, which will be the subject of the next section.


2.3 Metonymy behind the lexicon: coercion

Pustejovsky (Pustejovsky 1995) proposes that a semantic operation involved in compositionality can account for the different manifestations of the meaning of some expressions. He calls this operation “type coercion”.

The notion of coercion was a debated topic in the linguistic literature even before Pustejovsky's Generative Lexicon (Lauwers and Willems, 2011). Pustejovsky's type coercion is only one facet of a broader class: more generally, the term coercion has been used to refer to the resolution of a mismatch between the semantic requirements of a selector and the inherent semantic properties of a selected element, resulting in an accommodation of the meaning of the selected element. The selector can be a word class, a temporal or aspectual marker, or a more complex linguistic expression such as a construction (Lauwers and Willems, 2011). Examples of coercion are the following:

(10) a. I began a book.

b. He is remaining stable.

c. He sneezed the napkin off the table.

As can be seen from the examples, the concept of coercion extends to various levels of syntactic complexity. (10a) is an example of complement coercion, because the coerced element is the verb's argument. The selecting predicate begin takes as its argument an expression whose semantic type (i.e. the ontological category of the entity denoted by the lexical item) does not correspond to the type conventionally selected by the predicate (begin selects the type EVENT, while book denotes an entity of type ENTITY). The mismatch results in a type shift of the argument (from "book" to "reading, writing a book"). In (10b) the coercing element is the grammatical morpheme changing the aspectual meaning conventionally associated with the verbal root (from non-progressive to progressive aspect). This type of coercion is called aspectual coercion. (10c) can be read from a constructionist perspective. Constructionist approaches (Goldberg 2006, Hoffmann and Trousdale 2013) emphasize the role of grammatical constructions, conventional associations between forms and functions. From this perspective, the additional meaning of the verb sneeze (to cause the napkin to move off the table by sneezing) comes from the abstract meaning associated with the caused-motion argument construction X CAUSES Y TO MOVE Z, which is the coercing element of the sentence.

Pustejovsky’s type coercion differs from the other types of coercion in that the semantic requirements imposed by the selector element concern semantic types. According to Pustejovsky, these requirements are explicitly encoded in the argument structure, one of the four levels of representation of a lexical entry in Generative Lexicon. The idea that the properties of arguments are predictable from the meaning of their verbs is widespread in generative frameworks (Levin and Rappaport Hovav, 2005). However, while many theories characterize the semantic properties of the argument by using labels identifying the role played by the argument in the event denoted by the predicate (semantic roles), in Generative Lexicon Argument Structure can reference semantic types such as those in figure 2.1 too. An example of Argument Structure (from Pustejovsky and Batiukova, 2019) is shown in (11):

(11) a. Mary put her watch on the table.

put(arg1[cat=DP], arg2[cat=DP], arg3[cat=PP])

put(arg1[sem=animate], arg2[sem=phys_object], arg3[sem=location])

(11) shows that Argument Structure provides information about the syntactic categories and semantic types of arguments. Pustejovsky builds a formal description of Argument Structure by adopting feature structures, that is, sets of attribute-value pairs (a formalism used in some theories of grammar, such as Lexical-Functional Grammar and Head-Driven Phrase Structure Grammar, to represent linguistic objects, cf. Carnie 2002).
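The Argument Structure in (11) can be rendered as feature structures, i.e. sets of attribute-value pairs. This is a rough sketch under simplifying assumptions (flat dictionaries, exact-match checking), not Generative Lexicon's full formalism:

```python
# The Argument Structure of "put" in (11) as attribute-value pairs.
put = {
    "pred": "put",
    "args": (
        {"cat": "DP", "sem": "animate"},      # arg1: Mary
        {"cat": "DP", "sem": "phys_object"},  # arg2: her watch
        {"cat": "PP", "sem": "location"},     # arg3: on the table
    ),
}

def satisfies(arg, requirements):
    # An argument satisfies a feature structure if it matches every
    # attribute-value pair the structure specifies.
    return all(arg.get(attr) == value for attr, value in requirements.items())

watch = {"cat": "DP", "sem": "phys_object"}
print(satisfies(watch, put["args"][1]))  # True
```

Real feature-structure formalisms check compatibility by unification (partial structures can combine) rather than by exact match, but the attribute-value organization is the same.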

Pustejovsky claims that the interpretation of some linguistic expressions requires simple compositional operations supplemented by type coercion. In Pustejovsky 1995, he gives a formulation of the function application rule (that is, the rule corresponding to predication in the logical language of Montague Grammar) incorporating type coercion (Function Application with Coercion): according to this formulation, a shifting operator applies to the argument only if there is a mismatch between the type of the argument and the type required by the predicate. If no such operator is found, a type error is given.
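The control flow of Function Application with Coercion can be sketched as follows. The shifting-operator table and the string glosses are invented for illustration; Pustejovsky's rule is stated over typed lambda terms, not strings:

```python
# Sketch of Function Application with Coercion: apply the predicate if
# the types match; on a mismatch, look up a shifting operator; if none
# exists, the derivation fails with a type error.
SHIFTS = {
    # hypothetical shifting operators: (from_type, to_type) -> shift
    ("ENTITY", "EVENT"): lambda arg: f"some event involving the {arg}",
}

def apply_with_coercion(required_type, arg_type, arg):
    if arg_type == required_type:
        return arg  # ordinary function application, no coercion needed
    shift = SHIFTS.get((arg_type, required_type))
    if shift is None:
        raise TypeError(f"type error: cannot shift {arg_type} to {required_type}")
    return shift(arg)  # coerced reading

# "begin" selects an EVENT; "book" denotes an ENTITY, so it is coerced:
print(apply_with_coercion("EVENT", "ENTITY", "book"))
# → some event involving the book
```

The key design point mirrors the rule: coercion is a fallback triggered only by a type mismatch, and an absent operator yields a hard type error rather than a degraded reading.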

The processes involved in type coercion are the same as those at work in referential metonymy. In the metonymic sentences in (2a,c) the meaning change involves a shift to the semantic type conventionally selected by the predicate. Brains in (2a) passes from the ENTITY subtype INANIMATE to the subtype ANIMATE, and the same happens in (2c) with green shirt (here I follow the classification of ENTITY subtypes of Pustejovsky and Batiukova, 2019, figure 2.2). A wide range of metonymic patterns involve type shifting. Examples include (among others) shifting from COUNT to MASS in some Container-for-content metonymies, from ANIMATE or ORGANIZATION to INANIMATE in Producer-for-product metonymies, from EVENT to ENTITY in Process/Result alternations, and from ENTITY to EVENT in logical metonymies (cf. section 4.2).

[Figure 2.2: Entity subtypes (from Pustejovsky and Batiukova 2019); the subtypes include MASS (SUBSTANCE, AGGREGATE), COUNT (GROUP, INDIVIDUAL), HUMAN GROUP, ANIMATE (HUMAN, ANIMAL), INANIMATE, and ORGANIZATION.]

The approach to coercion developed by Asher (2011) shares some features with Pustejovsky's type coercion, namely that coercion is triggered by a type mismatch, and that the predicate imposes selectional restrictions on the argument. At the same time, Asher observes that there is an analogy between the typed version of the lambda calculus (implicit in Pustejovsky's Function Application with Coercion rule) and the linguistic phenomenon of presupposition (he focuses on logical presuppositions, where presupposition is generated by linguistic elements of the sentence). This includes, among others, presuppositions conveyed by definite noun phrases (e.g., in the sentence The dog is hungry the determiner phrase The dog generates a presupposition that there is a dog in the discourse context).

If such presuppositions are not satisfied in the discourse context (i.e., they cannot be bound), the supposition is made that the discourse context contains the referent of the DP. This process is called presupposition accommodation. The intuition is that predications too can be seen as type presuppositions, in that the selectional restrictions of predicates provide a presuppositional content. Moreover, just like logical presuppositions, in coercions type presuppositions are accommodated. While there is a widespread view (implicit in the Pustejovsky's idea of a "generative" lexicon) that the coerced element changes its meaning when coercion takes place, Asher claims that accommodation is a problem about compositionality that involves complicating the logical form of predications.

Asher's approach attempts to account for many facts about coercion, among which are the following:

(12) a. John has finished the kitchen.

b. John has finished the apple.

c. The exterminator has begun (with) the bedroom.

d. Mary stopped the apple.


First, we notice that the specification of the content of the predication depends on the direct object of the coercing element. The preferred reading of (12b) is that John finished doing something with the apple (presumably eating it); of course, one would not think that John finished doing the same thing with the kitchen. Moreover, the event interpretation is a function of the other arguments of the predicate. The resulting meaning of begin in (12c), for instance, is different from that of finish in (12a), even though the predicates and the direct objects have clearly related meanings: we expect (12c) to mean something like "The exterminator has begun spraying the bedroom". Finally, the eventuality involved may vary depending on the coercing element: the verbs finish and stop have related meanings but behave very differently, in that (12d) is predicted to have a reading on which Mary stops some physical motion of the apple. Indeed, unlike in (12a), the agent of the eventuality involved in the coercion is not necessarily the same as the agent of stop (i.e., the verbs have different control properties).

Asher captures these observations by extending the set of types standardly used in formal semantics with complex types. In addition to the • type (see the section above), a second sort of complex type, the polymorphic (or dependent) type, is used specifically to model coercion. Polymorphic types behave differently depending on the type of the argument to which they are applied. Dependent types, in turn, are types whose definition depends on the value of the argument. Polymorphic types make it possible to handle certain adjectival modifications that pose problems for compositional theories of meaning. For example, the adjective red involves a polymorphic type, because the property the adjective contributes depends on the type of its argument (as in red meat, red apple, red pen).
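The behavior of a polymorphic type can be loosely mimicked with ad-hoc polymorphism in a programming language: the same predicate yields a different specification depending on the type of its argument. The sketch below is only a programming analogy to the red example, not Asher's formalism; the classes and glosses are invented for illustration.

```python
from functools import singledispatch

# Hypothetical noun types, standing in for semantic types in the lexicon.
class Meat: pass
class Apple: pass
class Pen: pass

@singledispatch
def red(noun):
    """Default interpretation: 'red' predicates the object's surface color."""
    return "red-colored surface"

# The interpretation of 'red' depends on the type of its argument,
# mimicking the behavior of a polymorphic type.
@red.register
def _(noun: Meat):
    return "reddish flesh"

@red.register
def _(noun: Pen):
    return "writes in red ink"

print(red(Meat()))   # type-specific interpretation
print(red(Apple()))  # falls back to the default (surface color)
print(red(Pen()))
```

The dispatch table plays the role of the map from argument types to interpretations: adding a new noun type does not change the predicate itself, only its registered behaviors.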

Asher argues that all coercions involve polymorphic types (although only a few require dependent types). More specifically, a type presupposition invokes a polymorphic type, since in order to be satisfied it licenses a map from the type of the argument to some subtype of the demanded type. The specification of the content of the predication is therefore a subtype of the type required by the predicate and depends on the type of the argument (as well as on other arguments in the sentence, given that the polymorphic type can have other arguments). This can be seen at work in the following example of coercion:

(13) We hear the piano two floors down. (Asher 2011)

Asher assumes that "noise verbs" like hear license a form of event coercion (that is, in (13) piano stands for "some eventuality involving the piano that makes noise"). On his analysis, all of these verbs introduce a dependent type that makes the internal argument an eventuality that makes a noise. In this case, the verb introduces a unary polymorphic type (i.e., a type with one argument), in that the eventuality type is specified by the object of the predicate.

Whichever way it is accounted for, the idea of type coercion reflects an intuition about the way we derive additional meaning from metonymic expressions, namely by drawing on information about predicate arguments that is directly encoded in lexical representations. In order to get the metonymic interpretation, the reader needs to combine this information with that of the argument. However, it is unclear how this combination takes place. Pustejovsky argues that coercion does not assign a new meaning to the argument; rather, it recovers the needed components from the lexical entry (he calls this process "coercion by exploitation"). This information can come from the argument's semantic typing or from another part of the structure of the lexical entry that Pustejovsky calls "qualia structure".

Qualia Structure contains four types of information (called roles or qualia) about the entity denoted by the lexical item: constitutive information (information about the constitutive parts of the entity), formal information (the taxonomic category to which the entity belongs), telic information (information about the typical function of the object), and agentive information (information about the events involved in bringing the entity about). Qualia Structure is expressed as a feature structure, just like Argument Structure; the values are functors taking a sequence of arguments. The Qualia Structure for the noun violin (from Pustejovsky and Batiukova, 2019) is illustrated in (14):


(14) F = musical_instrument(x)

A = build(x,y)

T = produce_music_on(x,y,z)

C = strings_of(w,x)
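A feature-structure entry like (14) can be mirrored as a plain mapping from qualia roles to predicates. The sketch below is only an illustrative data-structure rendering; the Python names and the `select_role` helper are my own, not part of Generative Lexicon machinery.

```python
# Qualia structure for "violin", rendered as a plain mapping.
# Each role pairs with a predicate over arguments (kept here as strings).
violin_qualia = {
    "FORMAL":       "musical_instrument(x)",     # taxonomic category
    "AGENTIVE":     "build(x, y)",               # how the entity comes about
    "TELIC":        "produce_music_on(x, y, z)", # typical function
    "CONSTITUTIVE": "strings_of(w, x)",          # constitutive parts
}

def select_role(qualia, role):
    """Simulate a predicate selecting one quale of its argument,
    as 'finish' is claimed to select the TELIC role."""
    return qualia[role]

print(select_role(violin_qualia, "TELIC"))  # the event-denoting component
```

In this rendering, coercion by exploitation amounts to a lookup: the coercing predicate names a role, and the argument's entry supplies the corresponding predicate.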

Pustejovsky exploits the information encoded in Qualia Structure to handle logical metonymies and prenominal adjectival modifications where the adjective yields a different meaning depending on its argument, as in the case of fast car and fast motorway (see above). To explain logical metonymies like finish the book, he argues that the predicate selects for a role of the NP: in this case the verb finish selects for the TELIC role of the noun book, thus giving an event interpretation. Similarly, the interpretation of the adjective fast depends on the value of the argument's TELIC role. The problems with Qualia Structure emerge when it is invoked to account for the full range of metonymic patterns, since many metonymic relations do not fall under any quale. Zarcone et al. (2010) showed this for logical metonymies by asking the participants in their experiment to provide instances of covert events. They found that more than half (51.8%) of the 542 covert events elicited fell in neither the agentive nor the telic quale category (the two roles relevant to logical metonymy interpretation). The problem concerns not just logical metonymies but also "standard" metonymies.

The need to account for apparent violations of compositionality has led Pustejovsky to propose that the process of meaning composition draws on the internal structure of lexical items. This approach is not congruent with the standard assumption in formal semantics that syntactic structure alone determines how the meaning of expressions larger than words is derived (an assumption that follows from treating lexical items as undecomposable entities).

Ray Jackendoff developed these ideas into a theory of semantic composition that he calls "enriched composition" (Jackendoff 1997). The theory represents a major shift from syntax-driven theories of composition, since it acknowledges that (a) the internal structure of lexical items determines the way they are combined in the semantic structure of the sentence, and (b) the principles for constructing phrase and sentence meaning may also be pragmatically motivated (this approach is also called Autonomous Semantics; see Culicover and Jackendoff, 2006). Composition must therefore be thought of as a complex process of unification that integrates various types of information, including pragmatic information (which here covers information pertaining to the discourse, i.e. the discourse context, as well as to the extralinguistic context, i.e. the utterance context).

Examples of circumstantial metonymy like (6) are regarded by Jackendoff as a case of enriched composition where the reference transfer is determined by utterance context. He proposes that the interpretation of (6) involves a principle of enriched composition, that is, an interpretation rule included in the speaker's knowledge that has the form in (15):

(15) Interpret an NP as [PERSON CONTEXTUALLY ASSOCIATED WITH NP].

From a formal point of view, this rule translates as a function (the coercing function). The result of applying the coercing function to the "shifted" noun fills the argument position of the predicate (the verb get), as in function composition.

Jackendoff argues that the pragmatic principles that determine reference transfer are integrated into the process of composition, that is, they immediately contribute to the interpretation of the sentence. He is led to this conclusion by linguistic evidence (Jackendoff, 1997: p. 54). Psycholinguistic studies focusing on the processing of logical metonymies have provided behavioral evidence for enriched composition (McElree et al. 2001; Traxler, Pickering and McElree 2002; McElree et al. 2006). Moreover, experimental evidence has been adduced in favor of a model in which multiple levels of information (including not just context but also world knowledge and multimodal information) conspire to determine the interpretation of sentences (the one-step model of language interpretation, cf. Hagoort and Van Berkum 2007). As for "standard" metonymies, recent findings (Piñango et al. 2016) suggest that reference transfers (which are treated as being derived by pragmatic processes) share an underlying processing mechanism with more conventionalized metonymies, in that the two types show overlapping patterns of cortical activity (see section 2.6).

Observations like those discussed above imply that a theory of how people interpret metonymic sentences must employ a broader form of knowledge than the traditional notion of lexical knowledge. Generative Lexicon Theory is built on the assumption that only a small part of our knowledge of a concept needs to be represented as linguistic meaning, since only some aspects of conceptual information are linguistically relevant. This is also one of the main principles behind truth-conditional semantics.

Cognitive linguistic approaches (Croft and Cruse, 2004), on the other hand, assume that linguistic knowledge and extralinguistic knowledge are not distinct. It might be useful to adopt a cognitive linguistics perspective, since it gives us insights into the conceptual knowledge involved in the understanding of words. A cognitive linguistic theory of metonymy has been proposed by Langacker (Langacker 1993). Moreover, such an approach is the basis for alternative theories of compositionality focusing on the cognitive processes involved in language interpretation.

2.4 Metonymy and conceptual knowledge: a cognitive linguistic perspective

The claim made by cognitive linguists that there is no real distinction between linguistic and extralinguistic knowledge follows from a basic tenet of cognitive linguistics, namely that language is not an autonomous cognitive faculty. On this view, the abilities that we apply to language tasks are instances of more general cognitive abilities.

Another important consequence of this view of language is that linguistic knowledge has the same representation as other conceptual knowledge. This has implications for the description of meaning as it suggests that meanings are concepts, and hence are interconnected in our minds. Thus, cognitive linguists stress the conceptual structures associated with words.


The most influential theory of linguistic meaning developed within cognitive linguistics has been the model of frame semantics proposed by Fillmore (Fillmore 1985). In this theory words are associated with systems of concepts called frames. Frames can be defined as structured conceptual representations of our experience, made up of interrelated concepts.

In FrameNet, a lexical database based on Frame Semantics (Baker et al., 1998, https://framenet.icsi.berkeley.edu/fndrupal/) frames are viewed as containing information about: (a) the concepts involved in the frame (Frame Elements), which can be core (essential to the frame meaning) or non-core, and their semantic type; (b) the relationships among frames (frame-frame relations); and (c) a list of lexical items that evoke the frame (Lexical Units). (16) reports the description of the frame Ingestion:

(16) Definition: an Ingestor consumes food or drink (Ingestibles), which entails putting the Ingestibles in the mouth for delivery to the digestive system. This may include the use of an Instrument (The wolves DEVOURED the carcass completely).

Core FEs:

Ingestible: The Ingestibles are the entities that are being consumed by the Ingestor.

Ingestor: The Ingestor is the person eating or drinking (Semantic type: Sentient).

Non-core FEs:

Degree, Duration, Instrument, Manner, Means, Place, Purpose, Source, Time.

Frame-frame relations:

Inherits from: Ingest_substance, Manipulation. Uses: Cause_motion. Is used by: Food.

Lexical units (among others): consume.v, drink.v, eat.v, sip.v
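The structure of such an entry can be mirrored in code. The sketch below renders the Ingestion description as a small Python structure; the class and field names are my own invention for illustration (programmatic access to the real database goes through, for instance, the NLTK FrameNet interface or the downloadable database itself).

```python
from dataclasses import dataclass

@dataclass
class Frame:
    name: str
    core_fes: dict       # core Frame Element name -> description
    non_core_fes: list   # non-core Frame Element names
    relations: dict      # relation type -> related frame names
    lexical_units: list  # words that evoke the frame

ingestion = Frame(
    name="Ingestion",
    core_fes={
        "Ingestor": "the person eating or drinking (Sentient)",
        "Ingestibles": "the entities being consumed by the Ingestor",
    },
    non_core_fes=["Degree", "Duration", "Instrument", "Manner", "Means",
                  "Place", "Purpose", "Source", "Time"],
    relations={
        "Inherits from": ["Ingest_substance", "Manipulation"],
        "Uses": ["Cause_motion"],
        "Is used by": ["Food"],
    },
    lexical_units=["consume.v", "drink.v", "eat.v", "sip.v"],
)

# A word "evokes" the frame: looking up a lexical unit returns the frame,
# mirroring how searching a word in FrameNet returns its frames.
lexicon = {lu: ingestion for lu in ingestion.lexical_units}
print(lexicon["eat.v"].name)  # Ingestion
```

The lookup table at the end makes the evocation relation concrete: the word is an access point, and the frame carries the structured background knowledge.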

The database architecture reflects the idea that words evoke conceptual structure. When we search for a word, FrameNet returns a list of frames, just as concepts activate frames in our minds. In fact, it is assumed that the meaning of a lexical item can be defined only in relation to a background frame (it is "profiled against" the frame). Frames are evoked not only by verbs, but also by other parts of speech (nouns as well as adjectives and adverbs). There are also frames that are not evoked by lexical items; rather, words can activate a "perspectivalized" version of the frame.

This view abandons the assumption that each word is associated with a small piece of information, whether it is a real entity in the world or a concept in our minds; rather, the meaning of a word can be thought of as a node that gives access to our knowledge of the world, just as conceptual categories in cognitive psychology are viewed as a means of accessing information about an entity. This model of word meaning was proposed by Langacker (Langacker 1987), and it has implications for the understanding of sentences, since it requires the hearer to invoke this knowledge upon hearing an utterance in order to understand it.

Langacker claims that people usually employ this knowledge during comprehension in order to get a full specification of the meaning intended by the speaker. He also proposed a theory of metonymy based on these assumptions (Langacker 1993). He argues that when a word is used in a different context, we attend to those parts of its referent that are relevant, while other aspects are treated as irrelevant and therefore ignored. This approach can account for the highlighting of different facets as well as metonymies:

(17) a. Don’t throw the newspaper away!

b. Have you read the newspaper today?

(18) They watch Netflix while traveling.

In (17a,b), the word newspaper has different profiled facets: (17a) profiles the physical object, while (17b) profiles the semantic content of the newspaper. Likewise, (18) selects a different profile than the one usually expressed by the word Netflix, namely the movies streamed on Netflix (such profiles are referred to as "active zones"). As we will see in section 2.6, empirical support for Langacker’s theory of metonymy is provided by studies on the processing of metonymy (Frisson and Pickering, 1999; McElree, Frisson and Pickering, 2006; Bott, Rees and Frisson, 2016) indicating that metonymic senses can be accessed just as quickly as literal senses, in line with the group of psycholinguistic models of figurative language comprehension classified as direct access models.

Langacker’s notion of active zones is not limited to metonymy. He sees virtually every instance of interpretation as involving the activation of different facets of meaning, since he claims that there is always a certain amount of indeterminacy in language. This view represents a major shift, in that metonymy is seen as central to grammar. Following Langacker’s view and direct access models, I do not treat metonymy processing as a special case. Rather, I assume that it involves the same processes as "simple" composition, arguing that the additional meaning comes from the knowledge evoked by linguistic items. I will use the notion of Generalized Event Knowledge to characterize this knowledge.

2.5 Metonymy prediction in an expectation-based framework

Generalized Event Knowledge (GEK, McRae and Matsuki, 2009) is knowledge about prototypical features of real-world events such as their usual time course, the event participants and other entities typically involved. For example, people have knowledge about the event eat concerning typical things that are eaten (meat, pasta, pizza), eating instruments (fork, spoon), eating locations (dining room, restaurant, fast food) and so on. This knowledge is “generalized” because it is a schematization, hence abstracted away from one’s experience with events (which can come through first-hand participation as well as second-hand experience).

This notion is very similar to that of frame outlined in the previous section. The two notions are related by virtue of their linguistic manifestation: linguistic expressions have been shown to determine the activation of GEK, in the same way as words are assumed to evoke frames in Frame Semantics (see McRae and Matsuki, 2009, for a review).

Psycholinguistic studies (Ferretti et al., 2001; McRae et al., 2005; Hare et al., 2009) found priming between words that co-occur in the same generalized event in lexical decision and animacy decision tasks. That is, they observed that event-based relations influence the response to the subsequent stimulus, suggesting that even individual words activate knowledge about generalized events. Figure 2.3 shows the lexical priming relations observed in the experiments cited above. As can be seen, with a few exceptions, priming was obtained from event verbs to nouns referring to related entities (and vice versa) and from nouns to other nouns. All these findings point to a complex network of relations between concepts, where the links are defined by co-occurrence in generalized events.

It was shown that the activation of Generalized Event Knowledge by linguistic expressions gives rise to expectations about incoming input during processing.


Altmann and Kamide (1999) observed that these predictions drive sentence processing, as they are used to restrict the domain within which subsequent reference will be made. Most importantly, they showed that expectations are generated early during processing. For example, the verb drink in a sentence like The woman will drink the wine immediately triggers expectations for typical objects (wine, water, cognac, …), restricting the possible identity of the object even before it is encountered. Expectations are triggered not only by individual words, but also by combinations of concepts: Matsuki et al. (2011) found that reading times for sentences such as Donna used the shampoo to wash her filthy hair (where the object is typical with respect to the instrument-verb combination) are characteristically shorter than for Donna used the shampoo to wash her filthy car (atypical condition).

These findings, among many others (Elman 2014, for a review), suggest that people’s knowledge of the selectional restrictions of words (the restrictions they impose on the contexts in which they occur) goes beyond categorical information such as semantic types and semantic roles. Instead, this knowledge contains detailed information about the preferred contexts of words. A model of lexical knowledge based on Generalized Event Knowledge characterizes this as a set of words organized into syntactic roles. For example, the verb arrest activates expectations about typical objects (e.g. criminal, burglar, robber) as well as about typical subjects (e.g. policeman, detective, officer). This is in keeping with the observation by Traxler et al. (2000) that expectations concern the occurrence of a word in a certain syntactic role (to be regarded as an approximation to the event role).
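On this model, a word's selectional knowledge can be approximated as sets of typical fillers organized by syntactic role, each with a graded association strength. The sketch below is a toy rendering of the arrest example; the word lists and weights are invented for illustration, not taken from the norming studies cited above.

```python
# Toy Generalized Event Knowledge for "arrest":
# each syntactic role maps to typical fillers with association strengths.
gek_arrest = {
    "subject": {"policeman": 0.9, "detective": 0.7, "officer": 0.6},
    "object":  {"criminal": 0.9, "burglar": 0.8, "robber": 0.7},
}

def expectation(verb_gek, role, word):
    """Strength of the expectation that `word` fills `role`;
    unlisted words get zero (no expectation)."""
    return verb_gek.get(role, {}).get(word, 0.0)

# A typical filler in its typical role yields a strong expectation...
print(expectation(gek_arrest, "object", "criminal"))
# ...while the same word in the wrong role does not, reflecting the
# role-specificity of expectations noted by Traxler et al. (2000).
print(expectation(gek_arrest, "subject", "criminal"))
```

Graded weights, rather than categorical type labels, are what distinguish this view of selectional restrictions from the type-based one.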

Drawing on findings that covert events in logical metonymy are retrieved from Generalized Event Knowledge (Zarcone, Padó and Lenci 2012, 2014, 2017), I hypothesize that such a view of selectional restrictions helps address the problems faced by a theory of metonymy. It eliminates the need for dedicated representations of the additional meaning, such as dot objects, since this meaning is assumed to follow directly from standard interpretation processes. In addition, this approach makes predictions about the nature of the shift process. In fact, much of the metonymic meaning seems to come from expectations about incoming input:


(19) Let’s drink a glass and get moving.

We can think of the metonymic interpretation of glass in (19) as having the typical objects of the verb drink as its source. The idea, then, is that glass in (19) refers to something that is typically drunk and is associated with the concept expressed by glass (a proposal for how to model this relation will be presented in chapter 5).
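As a preview of the model developed in chapter 5, the intuition can be given a toy operationalization: rank the typical objects of the verb by their association with the metonymic noun. All scores below are invented for illustration; in the full model they would come from corpus statistics.

```python
# Typical objects of "drink" (toy expectation strengths).
typical_objects = {"wine": 0.8, "water": 0.9, "beer": 0.7, "soup": 0.2}

# Toy association between "glass" and candidate referents,
# standing in for distributional similarity scores.
assoc_with_glass = {"wine": 0.9, "water": 0.6, "beer": 0.5, "soup": 0.1}

def metonymic_candidates(objects, assoc):
    """Rank candidate referents by combining the verb's expectations
    with the noun's association scores (here, by multiplication)."""
    scores = {w: objects[w] * assoc.get(w, 0.0) for w in objects}
    return sorted(scores, key=scores.get, reverse=True)

# The best candidate is something typically drunk AND associated
# with glasses: here, wine (0.8 * 0.9 beats water's 0.9 * 0.6).
print(metonymic_candidates(typical_objects, assoc_with_glass)[0])
```

Multiplication is just one way to combine the two sources of evidence; the point is that the metonymic referent falls out of the interaction between verb expectations and noun associations.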

This approach shares with Pustejovsky’s type coercion the assumption that selectional restrictions play an important role in metonymy interpretation, but differs from it in assuming that (a) the metonymic meaning does not result from an adjustment mechanism triggered by a type mismatch and subject to the availability of a shifting operator, but is a regular consequence of processing operations, and (b) this mechanism does not retrieve the meaning components from the arguments’ lexical entries; rather, it exploits an underspecified representation (Frisson and Pickering, 1999; Frisson 2009), with which the knowledge activated by other words is combined.

This approach carries with it two further advantages. First, it makes the idea of selectional restrictions more concrete because, instead of representing them by abstract symbols, it sees them as words whose referent can be more easily identified. Second, it accounts for cases where the type shift is more subtle (that is, the shift does not involve basic semantic classes), without appealing to further subtypes. Examples of this include the following sentence pairs:

(20) a1. John is drawing a fish.

a2. John caught a fish.

b1. Suzie is painting a landscape.

b2. Suzie is admiring a landscape.

As observed by Asher (2011), the verbs in (20a1, b1) make their argument some sort of representation of the given object. This shift does not introduce an eventuality but another kind of entity, one that can be characterized via the typical objects of the verbs.


To represent GEK, I will make use of semantic representations built from textual co-occurrence statistics, also referred to as distributional representations (Lenci 2018). Before presenting this account of word meaning, I will briefly explore work on metonymy that has been carried out in psycholinguistics.

2.6 Psycholinguistic studies of metonymy comprehension

The aim of psycholinguistic studies on the processing of metonymy has been to establish whether a metonymic interpretation for a noun can be accessed just as quickly as a literal interpretation. Three broad classes of models of figurative language processing (Frisson and Pickering, 1999) have been identified, based on whether the figurative sense is retrieved before or after the literal sense (Figurative-First and Literal-First models respectively), or whether people access both interpretations simultaneously (Parallel models).

Frisson and Pickering (1999) conducted an eye-tracking experiment to test the predictions of these models. They recorded participants’ eye movements during the processing of metonymic expressions (Place for Institution metonymies) in comparison with literal ones. The experimental evidence indicates that people may access a metonymic interpretation rapidly under certain circumstances, namely when the critical expression has a familiar metonymic sense. One of the items used was as follows:

(21) a. The minister had an argument with the embassy, but not much more could be done.

b. The minister had an argument with the cottage, but not much more could be done. (Frisson and Pickering, 1999)

The authors argued that the word embassy in (21a), unlike the word cottage in (21b), has a metonymic interpretation which is already known to the reader. Hence, the interpretation of a sentence like (21a) does not require the creation of a new sense, but rather a sense selection operation similar to that necessary for interpreting homonyms. This account of metonymy processing is very much in line with Pustejovsky’s approach to metonymy and his notion of coercion by exploitation. The authors thus concluded that this result supported the parallel model of metonymy processing. They favored the underspecified version of the parallel model, that is, an account assuming that a single underdetermined representation of the word’s meaning is activated initially.

Frisson and Pickering’s findings were replicated in other eye-tracking studies (McElree, Frisson and Pickering 2006; Frisson and Pickering 2007). McElree, Frisson and Pickering (2006) contrasted the processing of standard and logical metonymies with the processing of expressions with a conventional interpretation, such as The student welcomed Sartre while living in the south of France, where the expression Sartre is taken to refer to the writer. As in the previous work, they used familiar metonyms (Producer-for-product metonymies) in the standard metonym condition. They found that standard familiar metonyms were not more costly to interpret than conventional expressions, while logical metonymies required extra processing effort. These results form part of a larger literature providing evidence that logical metonymies are associated with a greater processing demand (McElree et al. 2001; Traxler, Pickering and McElree 2002; McElree et al. 2006).

Frisson and Pickering (2007) compared the processing of familiar metonyms (e.g. Chopin in performed Chopin) versus unfamiliar metonyms (e.g. Henley in performed Henley). In contrast to previous work, participants were asked to read short texts, such as those in the following examples:

(22) a. Mr. Frost, one of my father’s friends, is a descendant of Chopin | Henley, I believe. He claimed that he first performed Chopin | Henley in public when he was not yet ten.

b. Mr. Frost won a major prize with a piano piece written by Chopin | Henley, I believe. He claimed that he first performed Chopin | Henley in public when he was not yet ten. (Frisson and Pickering, 2007)


Both texts contain a sentence where the word is used literally, followed by a sentence where it is used metonymically. However, they differ from one another in that in (22b) the context makes it clear that Henley is a composer, and thus supports the metonymic interpretation. Interestingly, the authors found that unfamiliar metonyms caused processing difficulty only in the Unsupported condition. Their findings therefore suggest that, in order for the reader to interpret a word metonymically, the familiarity of a metonymic sense does not necessarily have to be an intrinsic feature of the word. Still, the authors argued that contextual information alone cannot license a meaning extension. Rather, the novel interpretation is subject to the availability of a prolific metonymic rule (like the Producer-for-product rule).

Although their results reveal the possibility for a novel interpretation of a word to be processed online without any extra effort, the authors assume a strict distinction between linguistic and extra-linguistic knowledge, so that a novel sense is either part of the lexical entry or the result of a lexical rule operating in the lexicon. Quite a different scenario emerges from the work of Piñango et al. (2016). In order to understand the respective contributions of lexical expressions and contextual information to establishing metonymic interpretations, they compared systematic metonymy (in the form of Producer-for-product metonymy) to circumstantial metonymy (in the form of “dish for customer” metonymy, see (5)) by examining their psycholinguistic behavior (via self-paced reading) and neurolinguistic behavior (via event-related potentials and magnetic resonance imaging). Event-related potentials are brain responses time-locked to cognitive events; psycholinguistic studies on language processing typically measure them after each stimulus word.

All stimuli consisted of short texts (as in Frisson and Pickering, 2007), with the context always supporting the metonymy. They observed that (a) circumstantial metonymies elicit extra reading cost some time after the critical word; (b) both metonymy types show an N400 effect, while a late positivity is observed only in the circumstantial comparison; and (c) overlapping cortical regions are activated. As in previous work (Schumacher 2013; Weiland et al. 2014), late positivities have been taken to reflect a “reconceptualization” operation (i.e., an adjustment demanded by the context).


The authors concluded that the results as a whole do not support a model in which two different processing mechanisms (lexico-semantic on the one hand and pragmatic on the other) underlie the two types of metonymy (the two-mechanism view); rather, systematic metonymy and circumstantial metonymy share the same mechanism, involving an interaction between lexical meaning and contextual content, while differences in reading times and event-related potentials reflect the degree of conventionalization.

These findings lead to a model (such as those illustrated in the previous section) in which on-line comprehension takes a great deal of the burden of lexical variation off lexical knowledge, and additional meanings follow directly from the combination of lexical meaning and contextual constraints. I argue that a distributional model of sentence processing implementing this idea may become a useful tool for predicting metonymic meaning. Consistent with the conclusions in Piñango et al. (2016), such a model may be able to account for both systematic and circumstantial metonymy, since it stresses the dependence of metonymic meaning on context. Moreover, it does not deny the importance of notions such as familiarity and conventionalization in metonymy comprehension, but rather models them as a measure of the strength of association between a lexical meaning and a potential metonymic sense. That is, I interpret the difference between Chopin and Henley in (22) as a difference in their degree of association with the metonymic sense (pieces by Chopin are more commonly talked about than pieces by Henley). I will present this model in chapter 5.


3 Metonymy and distributional semantics

3.1 Distributional representations and their properties

A theory of metonymy, as we have seen in chapter 2, must deal with some problems that arise from the use of formal representations of meaning.

First, since formal representations are symbols, each is associated with a fixed meaning (whether an entity or a concept), and they therefore have trouble accounting for meaning variation. This is an important limitation, since meaning variation is pervasive in language. Such representations are also categorical, and thus they imply a strict distinction between linguistic and extralinguistic knowledge, given that they are taken to reflect the organization of our lexical knowledge. This distinction must then be reflected in the systematic vs. circumstantial metonymy distinction, on the assumption that the two types of metonymy are determined by two different processing mechanisms (see section 2.6). The difficulty with this approach lies in marking the boundary between “lexical” and pragmatically determined metonymies, which is problematic because the range of possible metonymies in language is vast (see chapter 4). Furthermore, the “two-mechanism view” is inconsistent with recent evidence suggesting that all metonymic meanings arise from the interaction between lexical meaning and context.

Second, symbolic representations do not provide a simple way to integrate lexical items with general world knowledge. Results in psycholinguistic research (McRae and Matsuki 2009; Elman 2014) have demonstrated that our lexical knowledge is not atomized: concepts are embedded in a network of relations, and these findings constitute an important piece of evidence for the psychological reality of conceptual structures such as frames. The relations have been defined in terms of expectations about entities typically associated in an event representation. Individuals have very extensive and detailed knowledge about such relations, and this knowledge is dynamic and context-specific.

These aspects of formal semantic representations also pose problems for NLP applications, whose tasks require meaning representations that are unambiguous and not subjective. Moreover, it would be even more useful if we could easily and efficiently extract these representations from texts with no need for human labor.

Distributional representations (Lenci 2018) lend themselves to addressing these problems by representing word meanings as vectors embedded in a multi-dimensional vector space. Vectors are ordered arrays of real numbers with a variable number of dimensions, where each number (called a component) can be identified by an index. Vectors are typically given lowercase names, while components are identified by writing the name of the vector with the dimension number as a subscript, as in the following example (n is the number of dimensions):

(23) x = (x1, x2, x3, …, xn)

A vector can be thought of as a geometric object (a point or an arrow), with each component giving the coordinate along a different dimension of the Euclidean space. Figure 3.1 shows some word vectors embedded in a three-dimensional vector space. Each word in the example is represented with an arrow from the origin to the point in space.
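The definitions above can be illustrated directly in code: a vector is simply an ordered array of real numbers whose components are accessed by index. The sketch below uses invented 3-dimensional values (real distributional vectors typically have hundreds of dimensions):

```python
import numpy as np

# A word vector as an ordered array of real numbers.
# The values are invented for illustration.
dog = np.array([0.8, 0.3, 0.1])

# Each component is identified by an index, as in x = (x1, x2, ..., xn)
# (NumPy indices start at 0).
print(dog[0])        # first component: 0.8
print(dog.shape[0])  # number of dimensions n: 3

# Geometrically, the vector is an arrow from the origin to the point
# with these coordinates; its length is the Euclidean norm.
print(np.linalg.norm(dog))
```

Treating the vector as a geometric object in this way is what allows similarity between words to be computed from distances and angles in the vector space.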

In distributional semantics, the values of vector components depend on the distributional properties of the word, that is, its co-occurrences with linguistic contexts. Distributional models of word meaning therefore assume that meaning is intimately related to the distribution of words in context. This is the well-known distributional hypothesis, which has its roots in the work of American structuralists and in the corpus linguistics tradition (Lenci 2008). Experiments (see below) have demonstrated the validity of the distributional hypothesis by successfully modeling human judgments of relatedness between words and the interpretation of linguistic expressions with distributional representations. Distributional semantics can therefore be seen not only as a useful tool for representing word meaning, but also as a theory of lexical meaning.
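How relatedness between words is modeled can be sketched with the cosine of the angle between vectors, a standard similarity measure in distributional semantics. The toy vectors below are invented values, not real corpus counts:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: the cosine of the angle between two vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Invented toy vectors: words with similar distributions are assigned
# similar vectors, so related words end up close in the space.
dog = np.array([5.0, 1.0, 0.0])
cat = np.array([4.0, 2.0, 0.0])
car = np.array([0.0, 1.0, 6.0])

# "dog" should be judged more related to "cat" than to "car".
print(cosine(dog, cat) > cosine(dog, car))  # True
```

The cosine ranges from -1 to 1 and ignores vector length, so it measures similarity of distributional profiles rather than of raw frequencies.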


Distributional representations of words can be derived from large corpora of text by using distributional models, that is, computational methods for building word vectors from statistical co-occurrences of words. Distributional models may be broadly grouped into two classes:

(a) Classical distributional semantic models (e.g. Latent Semantic Analysis, Hyperspace Analogue to Language): these models use the basic method to capture distributional facts, which consists of extracting co-occurrence counts from a corpus and representing them in a co-occurrence matrix. A matrix is a two-dimensional array of numbers, where each element is identified by two indices. In co-occurrence matrices, the elements are co-occurrence frequencies, while row and column indices correspond to target words and contexts respectively. Each element of the matrix is thus a measure of the frequency of occurrence of a word in a given context. Classical models estimate word similarity by using measures of similarity between vectors (the
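The count-based procedure described in (a) can be sketched as follows. The toy corpus and the sentence-level context window are illustrative assumptions, not the settings of any particular model:

```python
from collections import Counter

# Toy corpus (illustrative; real models use corpora of millions of words).
corpus = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "the car needs new tires",
]

# Count how often each target word co-occurs with each context word,
# here taking the whole sentence as the co-occurrence context.
vocab = sorted({w for sentence in corpus for w in sentence.split()})
counts = Counter()
for sentence in corpus:
    words = sentence.split()
    for i, target in enumerate(words):
        for j, context in enumerate(words):
            if i != j:
                counts[(target, context)] += 1

# Arrange the counts in a matrix: rows are target words, columns are
# contexts, and each cell is a co-occurrence frequency.
matrix = [[counts[(t, c)] for c in vocab] for t in vocab]

print(vocab)
print(matrix[vocab.index("cat")])  # the row vector for "cat"
```

Each row of this matrix is a word vector, and the vectors can then be compared with a similarity measure such as the cosine.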
