ISSN 2282-6483

Learning and dynamic choices under uncertainty: from weighted regret and rejoice to expected utility

Fabio Zagonari


Dipartimento di Scienze Economiche, Università di Bologna, via Angherà 22, 47900 Rimini (Italy) Phone: 0039 0541 434135 Fax: 0039 0541 434120 E-mail: fabio.zagonari@unibo.it

Abstract

This paper identifies the globally stable conditions under which an individual facing the same choice at many subsequent times learns to behave as prescribed by the expected-utility model. To do so, the analysis starts from the relevant behavioural models suggested by psychology (i.e., weighted probabilities applied to regret and rejoice theory) and updates probability estimations and outcome preferences according to the learning models suggested by neuroscience (i.e., adaptive learning aimed at reducing surprises), which are analogous to Bayesian updating. The search context is derived from experimental economics, whereas the learning framework is borrowed from theoretical economics. Analytical results show that obstinate and lucky individuals are better off in the short-run (i.e., a low density of events in the reference period), but they do not learn, and this is true to a greater extent in a simple context; in contrast, reactive and unlucky individuals are worse off in the short-run, but they learn and are better off in the long-run (i.e., when all individuals are equally lucky or unlucky), and this is true to a greater extent in a complex context. The expected-utility model explains real behaviours in the long-run whenever unlucky events are more likely than lucky events.

Keywords

Learning; weighted probabilities; regret; rejoice; repeated choices; expected utility

JEL classification


1. Introduction

Three main econometric analyses of dynamic choices under uncertainty in real or experimental contexts have been performed. Zagonari (1995) applied ordinary least-squares estimators to panel data (104 individuals) from a real rural daily-rated labour market in India, and showed that the expected-utility model performed about as well with linear and non-linear subjective probabilities for representing the decision-making process of women who were repeatedly dealing with the same kind of uncertainty in the same kind of framework. Moreover, an econometric analysis based on experimental data from two samples of subjects sequentially facing 90 similar pair-wise choice problems showed that the random preference model (which assumes uncertainty about preferences) is more successful than the Fechner model (which depicts uncertainty at the calculation stage) and the constant error model (which assumes uncertainty about actions), and the estimated random preference model tends towards the same results as in the expected-utility model as the number of choices tends to infinity (Loomes et al., 2002). Finally, Ben-Elia et al. (2013) applied a mixed multinomial logit model to experimental data (49 participants) from simple route choices involving different levels of time variability, and showed that experienced regret explains travel behaviour better than anticipated regret, and that learning seems to mitigate the amplitude of regret emotions, although to a smaller extent in more risky contexts.

Note that many experimental studies can be found that deal with repeated dynamic decision-making under uncertainty. For recent examples, see Hopfensitz & Van Winden (2008) in an investing context; Norman et al. (2012), Huang & Hutchinson (2013) in a shopping context; and Di Cagno et al. (2014) in a bidding context.

The purpose of the present study was to identify circumstances, in terms of observable conditions, in which the explanatory power of the expected-utility model for observed behaviours increases over time in repeated decision-making. Section 2 provides the theoretical economic background for such settings. To identify these circumstances, Section 3 borrows the search context from experimental economics. In particular, offers are independently drawn from a continuous distribution, which is either a uniform or an exponential distribution in [0,n]; the searcher knows these probability distributions and has a non-binding budget constraint; one offer can be presented each time and the cost of one search action is constant. Note that in this context, the searcher cannot accept a previously rejected offer.

Section 4 refers to psychological studies that attempted to identify the initial biases with respect to standard decision-making (i.e., unbiased expected-utility theory) in estimating probabilities and realisations. In particular, Loomes and Sugden (1982) provide a source for representing regret and rejoice, and Prelec (1998) depicts under-estimation of small probabilities and over-estimation of large probabilities. Note that these formulas were chosen for their simplicity (alternatively, the analysis could have used the two-parameter formulation by Lattimore et al., 1992) and their consistency with a dynamic context (alternatively, the analysis could have used the rank-dependent expected-utility theory of Quiggin, 1982 or the cumulative prospect theory of Tversky and Kahneman, 1992).

Section 5 borrows the learning strategy that has been described in neurological studies. In particular, at each point in time, individuals adapt their probability estimations to reduce their biases in estimating the probabilities of experienced events, and satisfaction evaluations to reduce their regret and rejoice perceived from experienced outcomes. The Appendix shows how this learning mechanism is related to the Bayesian updating rule. Note that the present search context differs from one-armed-bandit models, because individuals face a single stochastic machine with a known probability distribution, so no exploration is needed, whereas it is similar to reinforced learning, because individuals refer to optimal decision-making, although no long-run optimal behaviour is assumed, and thus, no long-term convergence is ensured.

Section 6 considers decision-makers who apply the reservation-value rule (i.e., they stop searching at a given time if the marginal cost of an additional search action exceeds its expected marginal benefit), consider the expected regret and rejoice, and potentially over-weight or under-weight probabilities of the outcomes (i.e., an optimist over-estimates and a pessimist under-estimates the probability of lucky events).

Section 7 presents two main results of this analysis. Procedurally rational individuals (i.e., those who attempt to maximise their utility with imperfect information) could be better off in the short-run (i.e., few choices per reference time, or a low density of events in the reference period) by adopting ex ante a weighted regret and rejoice model (as opposed to the expected-utility model) if they turn out to be lucky ex post; this is true to a larger extent if they are obstinate (as opposed to reactive), although they will not learn, and to a greater extent in a simple world (here represented by a uniform distribution of events). In contrast, all individuals will learn to behave fully rationally (i.e., to maximise utility based on perfect information) according to the expected-utility model in the long-run (i.e., when all individuals are equally lucky or unlucky, or the law of large numbers applies) in a complex world, and they will be better off than those who adopt a weighted regret and rejoice model, provided that unlucky events are more likely than lucky events. In other words, individuals learn over time to behave more rationally and optimally, on average, although this behaviour might not be optimal at a given subsequent time. Thus, the model presented in this paper accounts for the econometric evidence obtained by Zagonari (1995), Loomes et al. (2002), and Ben-Elia et al. (2013).

2. Background

Three main economic analyses on convergence to the expected-utility model in a context of dynamic choices under uncertainty have been performed. The present study relates to these analyses as follows. First, like Agastya & Slinko (2015):

• Individuals are considered to be procedurally rational even if not fully rational.

• They do not care about the chronological order in which realisations have occurred (i.e., exchangeability).

• Realisations are not necessarily continuous.

• Preferences for all actions do not change if, in the current period, all actions receive the same (and expected) payoff (i.e., consistency).

• Path-dependence in preferences for actions across time is allowed (i.e., the history of decision-makers, or their past experiences, affect their current choices as they mould their preferences).

• Monetary rewards represent an arbitrary (but finite) set.

• The set of actions is finite.

• Preferences are based on rewards rather than on lotteries, as in Mengel & Rivas (2012).

• Preferences for actions are revised to account for the individual's history of outcomes.

• A preference relationship for any set of actions is a complete, transitive, and reflexive ordering of elements.

• The decision-maker observes the rewards of all actions, including those not chosen by the decision-maker, as in Easley & Rustichini (1999); this assumption is reasonable in a social context such as portfolio choice or labour supply, but not always reasonable in individual contexts such as route choice and consumption demand.

However, the decision-maker prefers one action over another if the former shows a larger average utility with respect to the empirical distribution of the historical rewards up to a given time; that is, the decision-maker remembers the historical frequencies of all rewards linked to all actions, and chooses actions that represent the best response to these frequencies. This assumption is reasonable in an analytical context such as portfolio choice, but not in intuitive contexts such as labour supply, route choice, or consumption demand. In addition, results by Agastya & Slinko (2015) suggest that a backward-looking (fictitious player) decision-maker will behave as a forward-looking (maximiser of expected utility) decision-maker if the stochastic process of events (rewards) is a (restrictively) exchangeable sequence (i.e., any permutation of events has the same probability distribution as the original sequence).


As in Oyarzun & Sarin (2013), individuals update their behaviour according to what has been observed. However, individuals show an increasing probability to choose the action suggested by the expected-utility model: this assumption is not suitable in intuitive contexts, in which an action is either chosen or not chosen. In addition, this research suggests that the learning rule converges with high probability on the set of actions that dominate all others according to a second-order stochastic criterion only if individuals are sensitive to second-order stochastic dominance in all states (specific conditions) and if an individual's learning is slow in response to their experience (unobservable conditions).

As in Gilboa & Schmeidler (1996), individuals learn by starting from a specific decision model other than expected utility. However, the set of conceivable cases is necessarily assumed to be infinite: this assumption is not suitable in intuitive contexts, in which infinity is an implausible concept to be applied. In addition, results by Gilboa & Schmeidler (1996) suggest that the decision-maker will choose actions that maximise their expected utility only if the aspiration level is (bizarrely) adjusted over time, both realistically (i.e., it relates to the average of its previous level and the best average performance so far) and ambitiously (i.e., it is higher than the maximum average performance sufficiently often to support this ambition).

In summary, this paper will combine in a single framework the same determinants identified by the economic literature (e.g., realised outcomes, learning rates), disregarding the most implausible assumptions (i.e., that decision-makers remember the frequencies of all historical rewards linked to all actions, that individuals choose probabilities of taking an action rather than actions, and that the set of conceivable cases is infinite) and adding observable insights, based on explicable assumptions, in realistic contexts (see Section 8 for details).

3. The search context

Many search contexts have been suggested by experimental economics; for recent examples, see Duffy and Puzzello (2014) and Kloosterman (2016). However, these contexts differ based on three main features. First, the situation that is referred to differs. For example, sometimes subjects are provided with a random offer, and choose to take that offer (Hey, 1987); sometimes searchers draw prices from a known distribution, choose the price at which they are willing to buy a fictitious commodity, and resell the commodity to the experimenter at a predetermined price (Kogut, 1990); and sometimes subjects receive an article, ask for one or more bids, and decide whether to sell the article to the experimenter for the highest bid so far (Sonnemans, 1998). Second, the order of bids that is given differs. Sometimes searchers face random sequences of offers, as in the case of Hey (1987) and Kogut (1990). Sometimes subjects face the same pre-selected sequences of offers; this is the case in the study by Sonnemans (1998), in both his experiment 1 (in which any search strategy was allowed) and his experiment 2 (in which the reservation-value rule was enforced). Third, the information set that is provided differs. In most experimental studies, subjects are sequentially presented with random offers from a known distribution. This was a normal distribution in Hey (1987), but a uniform distribution in Kogut (1990) and Sonnemans (1998).

In particular, the present study will refer to the following search context. Offers are independently drawn from a known continuous distribution, in which two alternative scenarios are considered: a uniform distribution in [0,n] (i.e., p = 1/n) and an exponential distribution in [0,n] (i.e., p = λ exp[−λx]). I define those individuals who correctly estimate probabilities as realists, and those who over-estimate and under-estimate the likelihood of bids with lower probability and higher pay-off as optimists and pessimists, respectively. The searcher has a non-binding budget constraint so that an unlimited number of searches can be assumed. One offer is presented at each point in time (t) and the cost of one search action is constant (c), so that at each time t, individuals must decide whether to accept the last bid r(t) or to reject it; in the former case, they receive r(t) – c, but in the latter case, they pay c. The searcher cannot accept a previously rejected offer; thus, a time horizon of one period ahead is assumed.


4. Initial biases

Many biases with respect to the expected-utility decision-making have been suggested by psychological studies. For recent examples, see Glaser et al. (2012), Reyna (2012), Sherstyuk et al. (2013), Romm (2014), and Ben-Elia and Avineri (2015). However, the most relevant studies in our context refer to the biases in estimating probabilities of outcomes (section 4.1) and in predicting satisfaction from outcomes (section 4.2).

4.1. Weighted probabilities

Several behavioural probability-weighting functions have been suggested to account for observed choices that are inconsistent with the subjective expected-utility model. For recent examples, see Koster & Verhoef (2012), Riddel (2012), Capra et al. (2013), Charupat et al. (2013), Hensher et al. (2013), Petrova et al. (2014), Huang et al. (2015), Blavatskyy (2016), Boos et al. (2016), Keskin (2016), and Traczyk & Fulawka (2016). For the sake of simplicity, I will apply the probability-weighting function derived axiomatically by Prelec (1998), with weights (w) normalized over probabilities so that w(0) = 0 and w(1) = 1:

$$w_u[p(y)] = \exp\!\left\{-\left[-\ln(1/n)\right]^{\beta_u(t)}\right\}$$

$$w_e[p(y)] = \exp\!\left\{-\left[-\ln\!\left(\lambda e^{-\lambda y}\right)\right]^{\beta_e(t)}\right\}$$

where wu represents the weight for a uniform distribution, we represents the weight for an exponential distribution, y represents a given outcome, n represents the largest outcome, λ represents the parameter characterising the exponential distribution, and βu(t) and βe(t) represent the bias coefficients for the uniform and exponential distributions (respectively) at time t, such that β(t) > 1 represents the degree of optimism, and β(t) < 1 represents the degree of pessimism. In other words, I will assume that probabilities are processed non-linearly and that the attention given to an outcome does not depend on its ranking with respect to other outcomes but only on its probability. Note that the assumption of the absence of two distinct domains, together with evidence by Cavagnaro et al. (2013) and Chechile & Barch (2013), which show that the (second) elevation parameter is domain-dependent, will justify the use of a one-parameter variant of the probability-weighting function in the case of multi-outcome (e.g., monetary) lotteries. Without loss of generality, I will fix n at 10 and approximate λ as 1, where 1/λ represents the mean of an exponential distribution, which is 0.9995 if n is 10. This lets me compare the uniform and exponential contexts by focusing on other crucial parameters.
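The two weighting functions can be transcribed directly; the following short Python sketch (with n = 10 and λ = 1 as in the text, and function names of my own choosing) verifies that β = 1 recovers the objective probabilities.

```python
import numpy as np

# Direct transcription of the two probability-weighting functions above; a
# bias beta > 1 is read as optimism and beta < 1 as pessimism.

def w_uniform(beta, n=10.0):
    """w_u[p(y)] = exp(-(-ln(1/n))**beta); it does not depend on y."""
    return np.exp(-(-np.log(1.0 / n)) ** beta)

def w_exponential(y, beta, lam=1.0):
    """w_e[p(y)] = exp(-(-ln(lam*exp(-lam*y)))**beta)."""
    return np.exp(-(-np.log(lam * np.exp(-lam * y))) ** beta)

# beta = 1 recovers the objective densities:
print(w_uniform(1.0), 1.0 / 10.0)               # both 0.1
print(w_exponential(2.0, 1.0), np.exp(-2.0))    # both exp(-2)
```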

4.2. Regret and rejoice

Several behavioural utility models expressed as a function of outcomes have been applied in the literature to explain observed choices that are inconsistent with the subjective expected-utility model. For recent examples, see Bracha & Brown (2012), Jouini & Napp (2012), Wolpert & Leslie (2012), Chorus & Bierlaire (2013), Krähmer & Stone (2013), Dekker (2014), Leong & Hensher (2014), Riella & Teper (2014), Qu (2015), and Charles-Cadogan (2016). For the sake of simplicity, I will assume a constant absolute risk aversion utility model as a function of outcomes:

$$u(y,t) = 1 - e^{-\alpha(t)\,y} - \int_{y}^{n} w[p(x)]\left[1 - e^{-\gamma(t)\,x}\right]dx + \int_{0}^{y} w[p(x)]\left[1 - e^{-\delta(t)\,x}\right]dx$$

where α(t), γ(t), and δ(t) represent the degrees of risk aversion, regret, and rejoice at time t, respectively, and where w[p(x)] = wu[p(x)] and w[p(x)] = we[p(x)] in the cases of uniform and exponential distributions, respectively. In other words, I will assume that individuals do not perceive outcomes as gains and losses, but perceive the regret from having taken a decision that ex post turns out to be the wrong decision and the rejoice from having taken a decision that ex post turns out to be the right decision. Note that the perceived regret for missing a better bid and the perceived rejoice for avoiding worse bids are weighted according to the (possibly biased) estimated probabilities of these events. For the sake of simplicity, I will assume that γ(t) = δ(t) for each t. This lets me de-emphasize the learning impacts of unlucky events, without focusing on bad or good events.
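As a numerical illustration of this utility specification, the following Python sketch evaluates u(y, t) for the uniform case by quadrature, using the simplifications adopted in the paper (δ = γ, n = 10); the parameter values and function names are illustrative assumptions, and U(y) is the expected-utility benchmark.

```python
import numpy as np
from scipy.integrate import quad

# Numerical evaluation of the regret-and-rejoice utility u(y, t) for the
# uniform case, with delta = gamma and n = 10 as in the text.

N = 10.0

def w_uniform(beta, n=N):
    return np.exp(-(-np.log(1.0 / n)) ** beta)

def u_regret_rejoice(y, alpha=1.0, gamma=0.5, beta=1.0, n=N):
    w = w_uniform(beta, n)                      # weight is constant under a uniform density
    regret, _ = quad(lambda x: w * (1.0 - np.exp(-gamma * x)), y, n)
    rejoice, _ = quad(lambda x: w * (1.0 - np.exp(-gamma * x)), 0.0, y)
    return 1.0 - np.exp(-alpha * y) - regret + rejoice

def U(y, A=1.0):
    return 1.0 - np.exp(-A * y)

for y in (2.0, 5.0, 8.0):
    print(y, round(u_regret_rejoice(y), 3), round(U(y), 3))
```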

5. The learning mechanism

Many alternative learning mechanisms have been suggested for estimating probabilities. For recent examples, see Fennell & Baddeley (2012), Hayashi (2012), Di Caprio et al. (2014), Balcombe & Frazer (2015), and Bisière at al. (2015). However, neuroscience (e.g., Schwartenbeck et al., 2013) suggests that individuals aim to reduce surprises. Here, individuals are assumed to compare the probability of what happened according to the perceived distribution of outcomes with the probability according to the known distribution, and to update the bias parameter β to reduce the difference between these two probabilities, in both the uniform and exponential contexts:

$$\beta_u(t+1) = \beta_u(t) - \varepsilon_u\left\{1 - \frac{w_u[p(y)]}{1/n}\right\}$$

$$\beta_e(t+1) = \beta_e(t) - \varepsilon_e\left\{1 - \frac{w_e[p(y)]}{\lambda e^{-\lambda y}}\right\}$$

where εu and εe represent the learning rates (i.e., the inverse of the reliabilities attached to the perceived distributions of outcomes) in the uniform and exponential contexts, respectively, and y represents the experienced outcome. Note that the learning rule suggested here is formally analogous to Bayesian updating; that is, p[Hi|E] = p[E|Hi] p[Hi]/p[E], where p[Hi|E] is the posterior probability distribution over the space of hypotheses Hi for event E, p[E|Hi] is the likelihood of observing event E according to generative distribution Hi, p[Hi] is the prior probability distribution over the space of hypotheses Hi, and p[E] is the a priori probability of the event E (Eberhardt and Danks, 2011). Indeed, the learning rule suggested here boils down to the Bayesian updating if

$$\varepsilon_u = \frac{\ln(1/n)\,(2\beta_{u0}-1) - \ln p[H_i]}{\ln(1/n)\left[1-(1/n)^{\beta_{u0}-1}\right]} \qquad \text{and} \qquad \varepsilon_e = \frac{y\,(2\beta_{e0}-1) - \ln p[H_i]}{y\left[1-\left(e^{-y}\right)^{\beta_{e0}-1}\right]},$$

where βu0 = βu(0) and βe0 = βe(0) in the uniform and exponential contexts, respectively (see the Appendix for details). In other words, individuals are told how the world works in the long run (i.e., probability distributions on average), but do not think that the law of large numbers (i.e., the tendency of observed frequencies to approach the statistical expectation with large sample sizes) should be applied in calculating probabilities in the short-run (i.e., few choices) (Li et al., 2013). Actually, with few experiences, individuals could have events that differ from the average and thereby reinforce their bias in the short-run (Lu et al., 2016). See Van De Kuilen (2009) for findings based on experimental choices consistent with convergence towards linearity of the subjective probability-weighting function.
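A minimal Python sketch of these two updating rules is given below; the Prelec weights defined in Section 4.1 are used as the perceived probabilities, and the starting value and learning rates are illustrative assumptions.

```python
import numpy as np

# Surprise-reducing updates of the probability-bias parameter beta: the
# perceived probability of the experienced event is compared with its
# objective probability, and beta is moved by a fraction (the learning
# rate) of the relative gap.

def update_beta_uniform(beta, eps, n=10.0):
    w = np.exp(-(-np.log(1.0 / n)) ** beta)     # perceived probability
    return beta - eps * (1.0 - w / (1.0 / n))   # objective probability is 1/n

def update_beta_exponential(beta, y, eps, lam=1.0):
    p = lam * np.exp(-lam * y)                  # objective density at the experienced outcome
    w = np.exp(-(-np.log(p)) ** beta)           # perceived probability
    return beta - eps * (1.0 - w / p)

beta = 1.5
for t in range(20):
    beta = update_beta_uniform(beta, eps=0.3)
print(round(beta, 4))                           # drifts towards the unbiased value beta = 1
print(round(update_beta_exponential(1.5, y=2.0, eps=0.3), 4))
```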

Many alternative learning mechanisms have been suggested for discovering preferences. For recent examples, see Friston at al. (2013), Hinvest et al. (2014), Shen et al. (2014), and Turi et al. (2015). However, neuroscience (e.g., Rutledge et al., 2015) suggests that individuals aim to reduce surprises. Individuals are here assumed to compare the ex post satisfaction obtained from experiencing an accepted bid, regret from missing better bids, and rejoice from avoiding worse bids, with the utility without regret and rejoice feelings, and to update the bias parameter γ to reduce this difference:

$$\gamma(t+1) = \gamma(t) - \zeta\left[1 - \frac{u(y,t)}{U(y)}\right] \qquad\text{with}\qquad U(y) = 1 - e^{-A\,y}$$

where ζ represents the learning rate (i.e., the inverse of the reliability attached to the assumed utility) at each time t and A represents the actual degree of risk aversion. Note that u(t) and γ will be replaced by uu(t) and γu, ue(t) and γe, uub(t) and γub, and ueb(t) and γeb in the following sections to depict unbiased uniform, unbiased exponential, biased uniform, and biased exponential contexts, respectively. For the sake of simplicity, I will assume α(t) = α for each t, as in Buckert et al.'s (2014) discussion of learning risk aversion. This lets me consider a potential gap between the perceived (α) and actual (A) degree of risk aversion by focusing on the dynamics of regret and rejoice. Note that individuals do not have meta-preferences (i.e., I would like to be an expected-utility decision-maker, but I am not, and it takes me effort and time to become this type of decision-maker). Otherwise, people would decide as an expected-utility decision-maker from the beginning.


Here, decision-makers are told how to behave in the long run (i.e., the expected-utility model, on average), but do not think that the same rules should be applied in the short run (i.e., they anticipate regret and rejoice from their choices; see Horvath and Sinha, 2015). Actually, if decision-makers are ex ante optimistic and ex post lucky, they could be better off by following a weighted regret and rejoice model so as to reinforce their bias in the short-run (Chorus, 2014). See Matsumoto & Spence (2016) for findings based on reported expectations for real online textbook prices from a survey of 1224 college students, which are consistent with learning in beliefs about the price distribution. Without loss of generality, I will assume A = 1, but see Analytis et al. (2014) for a discussion of multi-attribute alternatives. This amounts to a normalisation with respect to the utility obtained by an expected-utility decision-maker. Note that replacing these learning mechanisms based on comparing ratios (i.e., u(y,t)/U(y)) with learning mechanisms based on minimising squares (i.e., [u(y,t) − U(y)]²) does not alter the results qualitatively.
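The preference update can be sketched in the same way; the following Python fragment instantiates it for the unbiased uniform case (α = A = 1, n = 10), with an illustrative repeatedly experienced outcome and a learning rate chosen small enough for the iteration to settle.

```python
import numpy as np
from scipy.integrate import quad

# After an experienced outcome y, the regret-and-rejoice satisfaction u_u(y, t)
# is compared with the benchmark U(y) = 1 - exp(-A*y), and gamma is moved by
# zeta times the relative gap.

N, A = 10.0, 1.0

def u_u(y, gamma, alpha=1.0, n=N):
    regret, _ = quad(lambda x: (1.0 / n) * (1.0 - np.exp(-gamma * x)), y, n)
    rejoice, _ = quad(lambda x: (1.0 / n) * (1.0 - np.exp(-gamma * x)), 0.0, y)
    return 1.0 - np.exp(-alpha * y) - regret + rejoice

def update_gamma(gamma, y, zeta):
    return gamma - zeta * (1.0 - u_u(y, gamma) / (1.0 - np.exp(-A * y)))

gamma = 1.0
for t in range(30):
    gamma = update_gamma(gamma, y=3.0, zeta=0.3)   # the same moderately low bid each period
print(gamma)                                       # approaches 0, the expected-utility benchmark
```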

6. The decision rule

Consider an individual who makes decisions at each time t by accounting for the level of satisfaction derived from experiencing a given outcome (y), together with the regret due to missing better bids (from y to n) and the rejoice due to avoiding worse bids (from 0 to y) (Bault et al., 2016). Individuals are procedurally rational (see Bordley and Uberti, 2015), by myopically applying the reservation rule at each time t (Shoji & Kanehiro, 2012). That is, they stop searching at a given time if the marginal cost of an additional search action exceeds its expected marginal benefit. In particular, individuals will accept a bid at time t if and only if it is larger than r(t), where r*(t) solves the following implicit equation:

$$u[r(t),t] = \int_{0}^{n} w(p)\left[1 - e^{-\alpha\,[r(t)-c]}\right]dr - \int_{r(t)}^{n} w(p)\left[1 - e^{-\gamma(t)\,x}\right]dx + \int_{0}^{r(t)} w(p)\left[1 - e^{-\gamma(t)\,x}\right]dx$$

where w(p) = wu(p) and w(p) = we(p) in the cases of uniform and exponential distributions, respectively.

Individuals will then draw a bid y by accepting the bid if it is larger than or equal to the reservation bid [i.e., y ≥ r*(t)] and thereby perceiving u(y, t), and by rejecting the bid if it is smaller than the reservation bid [i.e., y < r*(t)] and thereby perceiving u(–c, t).

Having experienced these outcomes, individuals then update their preference bias parameter (γ) in order to reduce the perceived regret or rejoice. Contextually, having observed an event y whose likelihood differs according to the estimated probability distribution and the known probability distribution, individuals update their probability bias parameter (β) in order to reduce this difference. Note that individuals are not fully rational in the short-run (i.e., they obtain something different from what they expected), unless u(y,t) = U(y) at time t, although they could choose optimally in the short-run if they get more than they would obtain as suggested by the expected-utility model in the long-run.

Finally, individuals will apply the updated reservation rule at time t+1.
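The following Python sketch assembles one period of this cycle for the unbiased uniform case. The implicit equation for r*(t) is not solved here (the reservation value is taken as given), and, as a simplification of my own, the preference update is applied only after an accepted bid; all names and numeric values are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad

# One period of the decision cycle: draw a bid, apply the reservation rule,
# perceive the outcome, and update the bias parameters gamma and beta.

rng = np.random.default_rng(1)
N, A, C = 10.0, 1.0, 0.5

def u_u(y, gamma, alpha=1.0, n=N):
    regret, _ = quad(lambda x: (1.0 / n) * (1.0 - np.exp(-gamma * x)), y, n)
    rejoice, _ = quad(lambda x: (1.0 / n) * (1.0 - np.exp(-gamma * x)), 0.0, y)
    return 1.0 - np.exp(-alpha * y) - regret + rejoice

def one_period(r, beta, gamma, zeta=0.3, eps=0.3):
    y = rng.uniform(0.0, N)                     # one bid per period
    accepted = y >= r                           # reservation rule
    if accepted:                                # preference update after an accepted bid
        gamma -= zeta * (1.0 - u_u(y, gamma) / (1.0 - np.exp(-A * y)))
    w = np.exp(-(-np.log(1.0 / N)) ** beta)     # perceived probability of the event
    beta -= eps * (1.0 - w / (1.0 / N))         # probability update towards the objective 1/n
    payoff = (y - C) if accepted else -C        # accept and pay c, or reject and pay c
    return y, payoff, beta, gamma

beta, gamma = 1.5, 1.0
for t in range(5):
    y, payoff, beta, gamma = one_period(r=5.0, beta=beta, gamma=gamma)
    print(round(y, 2), round(payoff, 2), round(beta, 3), round(gamma, 3))
```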

Note that the assumption that the searcher can never accept a previously rejected offer implies that the reservation bid for a risk-neutral individual (α = 0) who is not affected by regret or rejoice (γ = 0) is the expected value, if there is no cost of search (c = 0): it will be 5 in the case of a uniform distribution and 1 in the case of an exponential distribution. Moreover, the reservation bid depends on how events affected the updating of the preferences. Cubitt et al. (2012) discuss the implications of a lack of evidence on the separability principle, in which behaviour is independent of history and of unreachable eventualities (i.e., events that are expected to be impossible to occur). Finally, uncertainty in the bid context might be (implicitly or explicitly) reversed with respect to labour or route contexts; this inversion is irrelevant, since individuals are procedurally rational, and they do not change their reservation bid once the bid has been drawn.


7. Results

In this section, I will look for the existence of globally attractive fixed points for β(t) and γ(t) in alternative contexts. Individuals reach the equilibria β* = 1 and γ* = 0 through small steps from below if the initial values satisfy β0 < 1 and γ0 < 0, and are thus smaller than the long-run optimal values, and if they are characterized by a large inertia (i.e., small learning rates ε and ζ); individuals learn through progressively smaller jumps above and below the long-run optimal value if they do not attach a sufficiently large reliability to the currently assumed degree of realism or of regret and rejoice (i.e., large learning rates ε and ζ); and individuals reach the equilibrium through small steps from above if the initial values satisfy β0 > 1 and γ0 > 0, and are thus larger than the long-run optimal values, and if they are characterized by a large inertia (i.e., small learning rates ε and ζ). Specifically, subsections 7.1 and 7.2 will show when individuals learn unbiased probabilities in the case of uniform and exponential probabilities, respectively, and then subsections 7.3 to 7.6 will compare learning conditions in four alternative scenarios (i.e., uniform probabilities without bias, exponential probabilities without bias, uniform probabilities with bias, and exponential probabilities with bias, respectively). This will reveal the most relevant variables in each learning process.

7.1. Learning unbiased probabilities with a uniform distribution

Consider an individual who over-estimates or under-estimates probabilities of outcomes (i.e., βu ≠ 1), but who updates their bias at separate time intervals (i.e., in a discrete time t) according to the following rule:

$$\beta_u(t+1) = \beta_u(t) - \varepsilon_u\left[1 - n\,(1/n)^{\beta_u(t)}\right]$$

where εu is the learning rate.

Result 1

In the case of a uniform distribution of bids, individuals learn unbiased probabilities if their learning rate is small enough: εu ≤ 0.868.

Proof:

At βu* = 1, -1 < ∂βu(t+1)/∂βu(t) = 1 + εu ln(1/10) < 1.

Figure 1. Phase diagram for βu(t) with εu = 0.5. The increasing linear line represents y = x.

This result is illustrated graphically in Figure 1. Thus, βu* = 1 is globally stable, and the most commonly observed biases (i.e., βu > 1) favour the learning process. If the previous learning rule is analysed in a continuous time t (i.e., individuals uninterruptedly update their bias), we obtain the following dynamics of βu(t):

$$\beta_u(t) = -\,\ln\!\left[10 + (1/10)^{\,t\,\varepsilon_u - \ln\left(-10 + 10^{\beta_{u0}}\right)/\ln(10)}\right]\Big/\ln(1/10)$$

Note that time (t) is multiplied by the learning rate εu: in other words, the learning period (i.e., the learning rate multiplied by the time period), rather than the learning rate (εu) or time (t) taken separately, turns out to be crucial.


The following dynamics of βu(t), which depend on the initial value βu0 and the learning period ηu (i.e., ηu = εu × t), will be used:

$$\beta_u(t) = 1 + (\beta_{u0} - 1)\,e^{-\eta_u}$$

where βu0 depends on the individual's previous experiences.
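The following Python sketch iterates the discrete rule of this subsection numerically; with n = 10 the stability bound is 2/ln(10) ≈ 0.868, and learning rates below the bound settle at βu = 1 while a larger one keeps oscillating. The starting value is illustrative.

```python
import numpy as np

# Numerical check of Result 1: iterate
# beta_u(t+1) = beta_u(t) - eps*[1 - n*(1/n)**beta_u(t)] with n = 10.

def iterate_beta_u(beta0, eps, n=10.0, steps=200):
    beta = beta0
    for _ in range(steps):
        beta = beta - eps * (1.0 - n * (1.0 / n) ** beta)
    return beta

for eps in (0.3, 0.8, 0.95):
    print(eps, round(iterate_beta_u(beta0=2.0, eps=eps), 4))
# eps = 0.3 and eps = 0.8 settle at 1.0; eps = 0.95 (above the bound) keeps
# oscillating around 1 without settling.
```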

7.2. Learning unbiased probabilities with an exponential distribution

Consider an individual who over-estimates or under-estimates probabilities of outcomes (i.e., βe ≠ 1), but who updates their bias according to the following rule in a discrete time t:

$$\beta_e(t+1) = \beta_e(t) - \varepsilon_e\left\{1 - \frac{\exp\!\left[-\left[-\ln\!\left(\lambda e^{-\lambda y}\right)\right]^{\beta_e(t)}\right]}{\lambda e^{-\lambda y}}\right\}$$

where εe is the learning rate.

Result 2

In the case of an exponential distribution of bids, an individual learns unbiased probabilities if their learning rate is small enough: εe ≤ 2/y.

Proof

At βe* = 1, –1 < ∂βe(t+1)/∂βe(t) = 1–y εe < 1.

Figure 2. Phase diagram for βe(t) with εe= 0.5 and y = 5. The increasing linear line represents y = x.

This result is illustrated graphically in Figure 2. Comparing Result 1 for a uniform distribution with Result 2 for an exponential distribution suggests that stability conditions for the exponential distribution are more demanding than those for the uniform distribution whenever y > 2.30. Thus, βe* = 1 is globally stable, and the most commonly observed biases (i.e., βe > 1) again favour the learning process. If the previous learning process is analysed in a continuous time t, we obtain the following dynamics of βe(t):

$$\beta_e(t) = \ln\!\left[e^{y} + \left(e^{-y}\right)^{t\,\varepsilon_e + \beta_{e0}}\right]\Big/\,y$$

Note that time t is again multiplied by the learning rate εe. The following dynamics of βe(t), which depend on the initial value βe0 and the learning period ηe (i.e., ηe = εe × t), will be used below:

$$\beta_e(t) = 1 + (\beta_{e0} - 1)\,e^{-\eta_e}$$

where βe0 depends on the individual's previous experiences.

7.3. Learning expected utility without bias in estimating a uniform probability distribution

Consider an individual who perceives regret and rejoice (i.e., γu ≠ 0) in accepting or rejecting bids that are uniformly distributed, but who updates their bias according to the following rule in a discrete time t (e.g., a daily labour supply without personal experiences, so that βu = 1):

$$\gamma_u(t+1) = \gamma_u(t) - \zeta_u\left[1 - \frac{u_u(y,t)}{U(y)}\right]$$

where

$$u_u(y,t) = 1 - e^{-\alpha\,y} - \int_{y}^{n}\frac{1}{n}\left[1 - e^{-\gamma_u(t)\,x}\right]dx + \int_{0}^{y}\frac{1}{n}\left[1 - e^{-\gamma_u(t)\,x}\right]dx$$

$$U(y) = 1 - e^{-A\,y}$$

and ζu is the learning rate.

Result 3

Provided that an individual knows their absolute risk aversion (i.e., α = A), they learn the expected utility (i.e., γu = 0) if their learning rate and experienced outcomes are small enough:

$$\zeta_u \le \frac{20\left(1-e^{-y}\right)}{50-y^{2}}$$

$$y \le 5\sqrt{2}$$

Proof

If and only if α = 1, γu* = 0 is an equilibrium for ζu ≠ 0. At γu* = 0,

$$\frac{\partial\gamma_u(t+1)}{\partial\gamma_u(t)} = \frac{-10 + e^{y}\left(10 + \zeta_u\left(-50+y^{2}\right)\right)}{10\left(-1+e^{y}\right)}$$

Figure 3. Phase diagram for γu with ζu = 1 and y = 5. The increasing linear line represents y = x.

This result is illustrated graphically in Figure 3. Thus, even if an individual is not affected by biases in probability assessment, and probability distributions are simple (i.e., uniform), they learn the expected utility provided that a single aspect of their psychological behaviour must be uncovered. Thus, γu* = 0 is globally stable, and the most commonly observed biases (i.e., γu > 0) again favour the learning process.

Figure 4. Contour plot for γu = 0 stability conditions as a function of ζu and y. Numerical values on each line represent ∂γu(t+1)/∂γu(t) in [-1,1].


Figure 4 provides stability conditions as a function of the learning rate ζu and the observed outcome y, where these conditions must be met for each time t during the whole learning process. Thus, everybody must be moderately unlucky (say, y < 6/10), but an individual who is less reactive (say, ζu < 1) must be more unlucky. If the previous learning process is analysed in a continuous time t, we obtain the following dynamics of γu(t):

$$\gamma_u(t) = \mathrm{IF}\!\left[\int_{1}\frac{e^{(10+y)z}\,z}{2e^{2y+10z}\left[2e^{10z} - e^{yz} + 2e^{(y+10)z}(2y-11)\right]}\,dz\right](\#)$$

where IF means the inverse function, and

$$\# = \exp\!\left[\frac{-y\,t\,\zeta_u}{10\left(-1+e^{y}\right)}\right] + \gamma_{u0}$$

where γu0 = γu(0). Thus, the learning period (here θu = ζu × t) is again crucial. In the case of convergence:

$$\gamma_u(t) = \gamma_{u0}\,e^{-\theta_u}$$

where γu0 depends on the individual's previous experiences. In particular, in the case of convergence, the difference between perceived regret and rejoice and perceived utility (i.e., uu – U), which depends on the experienced outcome y and the learning time θu, is depicted in Figure 5.

Thus, in the short-run (i.e., few experiences), individuals who are pretty lucky (say, y > 6/10) are better off if they adopt an expected regret and rejoice approach than if they adopt an expected-utility approach, and the improvement will be greater for individuals who are less reactive. Note that θu may increase either because individuals are more reactive (i.e., a larger ζu) or because they have additional experiences (i.e., a larger t). However, the mean of the difference between uu and U over all y values (Du) is given by:

$$D_u = \int_{0}^{n}\frac{1}{n}\,(u_u - U)\,dy = \frac{1}{50}\,e^{-10e^{-\theta_u}+\theta_u}\left(-5 - 5e^{10e^{-\theta_u}} - e^{\theta_u} + e^{10e^{-\theta_u}+\theta_u}\right) < 0$$

for each θu ≥ 0, where this difference tends to 0 as θu tends to infinity; the minimum of Du is −0.14 at θu = 1.25, and its intercept is −0.08 at θu = 0. Thus, in the long-run (i.e., many experiences), where there are no lucky or unlucky individuals (i.e., the law of large numbers applies), individuals are better off if they are reactive.
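The average gain Du can also be checked by direct numerical integration, without relying on the closed form above: with γu(θu) = exp(−θu) (i.e., γu0 = 1) and α = A = 1, the following Python sketch reproduces the values quoted in the text (about −0.08 at θu = 0 and a minimum of about −0.14 near θu = 1.25).

```python
import numpy as np
from scipy.integrate import quad

# Check of D_u by numerical integration over the uniform outcome distribution.

N = 10.0

def uu_minus_U(y, gamma):
    # with alpha = A the terms 1 - exp(-y) cancel, leaving rejoice minus regret
    regret, _ = quad(lambda x: (1.0 / N) * (1.0 - np.exp(-gamma * x)), y, N)
    rejoice, _ = quad(lambda x: (1.0 / N) * (1.0 - np.exp(-gamma * x)), 0.0, y)
    return rejoice - regret

def D_u(theta):
    gamma = np.exp(-theta)
    value, _ = quad(lambda y: (1.0 / N) * uu_minus_U(y, gamma), 0.0, N)
    return value

thetas = np.linspace(0.0, 5.0, 101)
values = np.array([D_u(t) for t in thetas])
print("D_u(0) =", round(values[0], 3))
i = int(values.argmin())
print("min D_u =", round(values[i], 3), "at theta_u =", round(float(thetas[i]), 2))
```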

Figure 5. Gains from uu (i.e., uu – U) as a function of θu and y if γu0 = 1. Numerical values on each line represent uu – U ≥ 0.


Therefore, if individuals are reactive (say, ζu > 1), they gain less in the short-run if they are lucky (say, y > 6/10), although they learn, and they lose less in the long-run once they have learned. In contrast, individuals who are obstinate (say, ζu < 1) gain more in the short-run if they are lucky (say, y > 6/10), although they do not learn, and they lose more in the long-run. In other words, in the case of repeated unlucky events, reactive individuals will be better off than obstinate individuals, because the former will learn more quickly to behave according to the expected-utility approach.
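Result 3 can also be illustrated numerically: for a given outcome y, iterating the discrete rule of this subsection with a learning rate just below the stated bound settles at γu = 0, whereas a rate above the bound keeps oscillating. The following Python sketch, with illustrative parameter values, shows this.

```python
import numpy as np
from scipy.integrate import quad

# For y = 3 the bound of Result 3 is zeta_u <= 20*(1 - exp(-y))/(50 - y**2) ~= 0.46.

N, A = 10.0, 1.0

def u_u(y, gamma, alpha=1.0, n=N):
    regret, _ = quad(lambda x: (1.0 / n) * (1.0 - np.exp(-gamma * x)), y, n)
    rejoice, _ = quad(lambda x: (1.0 / n) * (1.0 - np.exp(-gamma * x)), 0.0, y)
    return 1.0 - np.exp(-alpha * y) - regret + rejoice

def iterate_gamma_u(zeta, y, gamma0=1.0, steps=100):
    gamma, benchmark = gamma0, 1.0 - np.exp(-A * y)
    for _ in range(steps):
        gamma = gamma - zeta * (1.0 - u_u(y, gamma) / benchmark)
    return gamma

y = 3.0
bound = 20.0 * (1.0 - np.exp(-y)) / (50.0 - y ** 2)
print("bound:", round(bound, 3))
print("zeta below bound:", round(iterate_gamma_u(0.40, y), 6))   # settles at ~0
print("zeta above bound:", round(iterate_gamma_u(0.55, y), 6))   # keeps oscillating around 0
```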

7.4. Learning expected utility without bias in estimating an exponential probability distribution

Consider an individual who perceives regret and rejoice (i.e., γe ≠ 0) in accepting or rejecting bids in the case with an exponential distribution, but who updates their bias according to the following rule in a discrete time t (e.g., a daily portfolio choice without personal experiences, so that βe = 1):

$$\gamma_e(t+1) = \gamma_e(t) - \zeta_e\left[1 - \frac{u_e(y,t)}{U(y)}\right]$$

where

$$u_e(y,t) = 1 - e^{-\alpha\,y} - \int_{y}^{n}\lambda e^{-\lambda x}\left[1 - e^{-\gamma_e(t)\,x}\right]dx + \int_{0}^{y}\lambda e^{-\lambda x}\left[1 - e^{-\gamma_e(t)\,x}\right]dx$$

$$U(y) = 1 - e^{-A\,y}$$

and ζe is the learning rate.

Result 4

Provided that the individual knows their absolute risk aversion (i.e., α = A), they learn the expected utility (i.e., γe = 0) if their learning rate and experienced outcomes are small enough:

$$\zeta_e \le \frac{-2e^{10}\left(-1+e^{y}\right)}{11e^{y} + e^{10+y} - 2e^{10}(1+y)}$$

$$y \le -1 - \mathrm{ProductLog}\!\left[-\frac{11+e^{10}}{2\,e^{11}}\right]$$

Proof

If and only if α = 1, γe* = 0 is an equilibrium for ζe ≠ 0. At γe* = 0,

$$\frac{\partial\gamma_e(t+1)}{\partial\gamma_e(t)} = \frac{11e^{y}\zeta_e + e^{10+y}\left(1+\zeta_e\right) - e^{10}\left(1 + 2(1+y)\zeta_e\right)}{e^{10}\left(-1+e^{y}\right)}$$

Figure 6. Phase diagram for γe with ζe = 1 and y = 5. The increasing linear line represents y = x.

This result is illustrated graphically in Figure 6. Comparing Result 3 for a uniform distribution with Result 4 for an exponential distribution suggests that convergence conditions are more demanding in a more complex context: all individuals must be pretty unlucky (say, y < 2/10). Thus, γe* = 0 is globally stable, and the most commonly observed biases (i.e., γe > 0) again favour the learning process.


Figure 7. Contour plot for γe = 0 stability conditions as a function of ζe and y. Numerical values on each line represent ∂γe(t+1)/∂γe(t) in [-1,1].

Figure 7 provides stability conditions as a function of the learning rate ζe and the observed outcome y, where these conditions must be met for each time t during the whole learning process. Comparing Figure 4 and Figure 7 suggests that in a more complex world, there is less difference between reactive and obstinate people (say, ζe below and above 1, respectively). If the previous learning process is analysed in a continuous time t, we obtain the following dynamics of γe(t):

$$\gamma_e(t) = \mathrm{IF}\!\left[\int_{1}\frac{e^{10z}\,(1+z)}{-e^{y} + 2e^{10+y+10z+y(-1-z)} + e^{10+y+10z}\,z - 2e^{10(1+z)}(1+z) + e^{y+10z}(1+z)}\,dz\right](\#)$$

where IF means the inverse function, and

$$\# = \frac{t\,\zeta_e}{e^{10}\left(-1+e^{y}\right)} + \gamma_{e0}$$

where γe0 = γe(0). Thus, the learning period (here θe = ζe × t) is again crucial. In the case of convergence,

$$\gamma_e(t) = \gamma_{e0}\,e^{-\theta_e}$$

where γe0 depends on the individual's previous experiences. In particular, in the case of convergence, the difference between perceived regret and rejoice and perceived utility (i.e., ue – U), as a function of the experienced outcome y and the learning time θe, is depicted in Figure 8.

Comparing Figure 5 with Figure 8 suggests that obstinate people are better off in a simpler context. Thus, in the short-run (i.e., few experiences), individuals who are moderately lucky (say, y > 1.6/10) are again better off if they adopt an expected regret and rejoice approach than if they adopt an expected-utility approach, and the improvement is larger if they are less reactive. Note that θe may increase either because they are more reactive (i.e., a larger ζe) or because they have additional experiences (i.e., a larger t). However, the mean of the difference between ue and U (De) over all values of y is given by:

$$D_e = \int_{0}^{n}\left(\lambda\,y\,e^{-\lambda y}\right)(u_e - U)\,dy = \frac{e^{-20-10e^{-\theta_e}+\theta_e}\left(2e^{10+\theta_e}\left(-1+e^{10e^{-\theta_e}}\right) - \left(-1+e^{10}\right)\left(1+e^{10\left(1+e^{-\theta_e}\right)}\right)\right)}{\left(1+e^{\theta_e}\right)\left(1+2e^{\theta_e}\right)}$$

for each θe ≥ 0, where this difference tends to 0 as θe tends to infinity, with its minimum and its intercept (−0.15) at θe = 0. Comparing Du and De (i.e., Du < De for each θ) suggests that, in the long-run (i.e., many experiences), obstinate people are worse off than reactive people in a complex world, and that the magnitude of the difference is greater than in a simple world.

Figure 8. Gains from ue (i.e., ue – U) as a function of ζe and y. Numerical values on each line represent ue – U ≥ 0. The white area means ue < U.

Therefore, in a more complex world, everybody must be more unlucky (i.e., y < 16% rather than y < 66%) in order to learn in the long-run, whereas the difference between reactive and obstinate people is smaller in the short-run, both in terms of the relative unluckiness required to learn and in terms of gains with respect to the expected-utility choices. However, the opposite difference applies in the long-run, with reactive people being better off than obstinate people, and to a greater extent in a more complex world.

7.5. Learning expected utility with bias in estimating a uniform probability distribution

Consider an individual who misperceives uniform probabilities (i.e., βu ≠ 1) and who perceives regret and rejoice in accepting or rejecting bids in the case with a uniform distribution (i.e., γub ≠ 0), but who updates their bias according to the following rule in a discrete time t (e.g., a daily route choice with personal experiences, so βu ≠ 1):

$$\gamma_{ub}(t+1) = \gamma_{ub}(t) - \zeta_{ub}\left[1 - \frac{u_{ub}(y,t)}{U(y)}\right]$$

where

$$u_{ub}(y,t) = 1 - e^{-\alpha\,y} - \int_{y}^{n}\exp\!\left\{-\left[-\ln(1/n)\right]^{\beta_u(t)}\right\}\left[1 - e^{-\gamma_{ub}(t)\,x}\right]dx + \int_{0}^{y}\exp\!\left\{-\left[-\ln(1/n)\right]^{\beta_u(t)}\right\}\left[1 - e^{-\gamma_{ub}(t)\,x}\right]dx$$

$$U(y) = 1 - e^{-A\,y}$$

Result 5

In the case of a biased uniform distribution of bids, an individual learns the expected utility (i.e., γub = 0) if their learning rate and experienced outcomes are small enough:

$$\zeta_{ub} \le \frac{-2\exp\!\left[-y+(-1)^{(\beta_{u0}-1)e^{-\eta_u}}\,\ln[10]^{\,1-(\beta_{u0}-1)e^{-\eta_u}}\right]\left(-1+e^{y}\right)}{-50+y^{2}}$$

$$y \le 5\sqrt{2}$$

Proof

At γub* = 0,

$$\frac{\partial\gamma_{ub}(t+1)}{\partial\gamma_{ub}(t)} = \frac{-1 + e^{y} + e^{\,y-(-1)^{(-1+\beta_{u0})e^{-\eta_u}}\,\ln[10]^{\,e^{-\eta_u}\left(-1+\beta_{u0}+e^{\eta_u}\right)}}\,\zeta_{ub}\left(-50+y^{2}\right)}{-1+e^{y}}$$

Figure 9. Phase diagram for γub with ζub = 0.001, ηu = 0, y = 5, and βu0 = 2. The increasing linear line represents y = x.

This result is illustrated graphically in Figure 9. Comparing Result 5 for a biased uniform distribution with Result 3 for an unbiased uniform distribution suggests that the convergence conditions are far more demanding for Result 5 in terms of reactivity and equally demanding in terms of unluckiness: all individuals must be moderately unlucky (say, y < 6/10) and extremely obstinate (say, ζub < 1/1000).

Figure 10. Contour plot for γub stability conditions as a function of ζub and y with ηu = 1 and βu(0) = 2. Numerical values on each line represent ∂γub(t+1)/∂γub(t) in [-1,1].

Figure 10 illustrates the stability conditions as a function of the learning rate ζub and observed outcome y, where these conditions must be met for each time t during the whole learning process. Note that if ηu is large enough (e.g., ηu > 50) to depict previous learning of βu, we obtain the same result obtained in section 7.3.

Therefore, in an unrealistic context in which people do not learn (uniform) probability distributions before learning to reduce regret and rejoice (although reducing biases in assessing the probabilities might require a larger number of observations), only very obstinate individuals will learn to behave according to the expected-utility model, in which luck is not relevant.

7.6. Learning expected utility with bias in estimating an exponential probability distribution

Consider an individual who misperceives exponential probabilities (i.e., βe ≠ 1) and who perceives expected regret and rejoice in accepting or rejecting bids in the case with an exponential distribution (i.e., γeb ≠ 0), but who updates their bias according to the following rule in a discrete time t (e.g., a daily consumption demand based on personal experiences, so βe ≠ 1):

$$\gamma_{eb}(t+1) = \gamma_{eb}(t) - \zeta_{eb}\left[1 - \frac{u_{eb}(y,t)}{U(y)}\right]$$

where

$$u_{eb}(y,t) = 1 - e^{-\alpha\,y} - \int_{y}^{n}\exp\!\left\{-\left[-\ln\!\left(\lambda e^{-\lambda x}\right)\right]^{\beta_e(t)}\right\}\left[1 - e^{-\gamma_{eb}(t)\,x}\right]dx + \int_{0}^{y}\exp\!\left\{-\left[-\ln\!\left(\lambda e^{-\lambda x}\right)\right]^{\beta_e(t)}\right\}\left[1 - e^{-\gamma_{eb}(t)\,x}\right]dx$$

$$U(y) = 1 - e^{-A\,y}$$

Result 6

In the case of a biased exponential distribution of bids, an individual learns the expected utility (i.e., γeb = 0) if their learning rate and experienced outcomes are small enough:

$$\zeta_{eb} \le \frac{-2\,e^{-\eta_e-y}\,(\#)^{2}\left(e^{y}-1\right)}{e^{\eta_e} + e^{-10\,\#\,e^{-\eta_e}}\left(10(-1+\beta_{e0}) + 11e^{\eta_e}\right) - 2\left(e^{-y}\right)^{\#\,e^{-\eta_e}}\left((-1+\beta_{e0})\,y + e^{\eta_e}(1+y)\right)}$$

$$1 + (1+10\beta_{e0})\,e^{-10\beta_{e0}} - 2\left(e^{-y}\right)^{\beta_{e0}}\left(1 + y + (-1+\beta_{e0})\,y\right) \le 0$$

$$e^{\eta_e} + e^{-10\,e^{-\eta_e}\left(-1+e^{\eta_e}\right)}\left(-10 + 11e^{\eta_e}\right) - 2\left(e^{-y}\right)^{e^{-\eta_e}\left(-1+e^{\eta_e}\right)}\left(-y + e^{\eta_e}(1+y)\right) \le 0$$

where

$$\# = -1 + \beta_{e0} + e^{\eta_e}$$

Proof

At γeb* = 0,

$$\frac{\partial\gamma_{eb}(t+1)}{\partial\gamma_{eb}(t)} = 1 + \frac{e^{\eta_e+y}\,\zeta_{eb}\left[e^{\eta_e} + e^{-10\,e^{-\eta_e}\left(-1+\beta_{e0}+e^{\eta_e}\right)}\left(-10 + 10\beta_{e0} + 11e^{\eta_e}\right) - 2\left(e^{-y}\right)^{e^{-\eta_e}\left(-1+\beta_{e0}+e^{\eta_e}\right)}\left((-1+\beta_{e0})\,y + e^{\eta_e}(1+y)\right)\right]}{\left(-1+\beta_{e0}+e^{\eta_e}\right)^{2}\left(-1+e^{y}\right)}$$

Figure 11. Phase diagram for γeb with ζeb = 1, ηe = 1, y = 0.5, and βe0 = 2. The increasing linear line represents y = x.

This result is illustrated graphically in Figure 11. Comparing Result 6 for a biased exponential distribution with Result 4 for an unbiased exponential distribution suggests that convergence conditions are less demanding in terms of reactivity and more demanding in terms of unluckiness for Result 6: all individuals must be definitely unlucky (say, y < 1/12), regardless of their reactivity.

Figure 12. Contour plot for γeb = 0 stability conditions as a function of ζeb and y with ηe = 1 and βe0 = 2. Numerical values on each line represent ∂γeb(t+1)/∂γeb(t) in [-1,1].

Figure 12 illustrates the stability conditions as a function of the learning rate ζeb and observed outcome y, where these conditions must be met for each time t during the whole learning process. Note that if ηe is large enough (e.g., ηe > 50) to depict previous learning of βe, we obtain the same result as in section 7.4.

Therefore, in a realistic context where people do not learn (exponential) probability distributions before learning to reduce regret and rejoice (since reducing biases in assessing probabilities requires a larger number of observations), only very unlucky individuals will learn to behave according to the expected-utility model, in which their reactivity is slightly relevant.

8. Discussion

Generally speaking, by referring to keywords such as short-run vs. long-run, lucky vs. unlucky, learning vs. non-learning, reactive vs. obstinate, simple vs. complex world, and better off vs. worse off, the insights obtained by this study can be summarised as follows.

In order to learn (in the long-run), individuals must be obstinate and unlucky in the short-run, although in a complex world, being unlucky is the most significant condition. However, individuals are better off if they are obstinate and lucky in the short-run, and the benefit is greater in a simple world, although they will be worse off in the long-run, and the drawbacks will be greater in a complex world. Note that obstinate individuals must be more unlucky than reactive individuals to learn in the long-run, and the magnitude of the effect is greater in a simple world.

Obstinate people are better off than reactive individuals in the short-run if they are lucky, although they do not learn and are worse off in the long-run, and the magnitude of the effect is greater in a complex world. However, unless individuals are repeatedly lucky, they must and will converge on the expected-utility model, and obstinate individuals must be even more unlucky than reactive people in a simple world.

In order to be better off, individuals must learn in the long-run and must be lucky in the short-run. However, if individuals are lucky (in the short-run), they do not need to learn and do not learn. Note that reactive people will learn more quickly (in the long-run) than obstinate people, and the magnitude of the effect will be greater in a complex world, whereas obstinate people will be better off than reactive people, if lucky (in the short-run), and the magnitude of the effect will be greater in a simple world.

Lucky people (in the short-run) will not learn, although they will be better off in the short-run, and the magnitude of the effect will be greater in a simple world. However, if individuals are repeatedly lucky, they do not need to converge on the expected-utility model, and are even better off if they are obstinate.

The insights obtained by this paper can be summarised with respect to previous economic studies as follows. Like Gilboa & Schmeidler (1996), I show that individuals learn to behave according to the expected-utility model, but my findings do not depend on aspiration levels changing continuously. Like Oyarzun & Sarin (2013), I show that optimistic individuals choose a risky strategy, but my results do not depend on strict preferences for second-order stochastic dominance. Like Agastya & Slinko (2015), I show that individuals will learn provided that certain events happen, but my insights do not depend on the stochastic process being an exchangeable sequence.

In summary, I combined the two main determinants of learning (i.e., reactions by individuals and the type of distribution of outcomes), but I obtained more realistic insights by starting from more realistic biases in more realistic contexts. Moreover, my results do not depend on individual characteristics that are difficult to observe (e.g., small learning rates) but on observable contextual characteristics (e.g., unlucky events that are more likely than lucky events). Finally, I develop insights by explaining how the learning period is more relevant than the learning rate or the elapsed time taken separately, where the former refers to an individual characteristic (i.e., a large ε or ζ means high reactivity), whereas the latter refers to a contextual characteristic (i.e., a small t means a relatively low level of repetition).

The main strengths of this approach can be summarised as follows:

1. The context is suitable to perform experiments on essential variables that lead to the expected-utility model.

2. The starting point is based on psychological studies.

3. The reason and approach to learning are based on neurological studies.

4. The endpoint is based on statistical studies.

5. The results are suitable for suggesting policies on the information that should be provided to lead to the expected-utility model.

The main weaknesses can be summarised as follows:

1. There is no exploration. However, exploitation is more likely to explain behaviours by the majority of individuals in real life, and exploration is meaningless here because individuals are assumed to know the probability distribution in their specific situation.

2. Specific utility functions are used. However, these are common to the literature and let me achieve analytical solutions, which in turn permitted a comparison of the results in alternative scenarios. This is not possible in the general learning framework used in the theoretical economics literature.

3. Regret and rejoice are weighted equally (i.e., δ = γ). However, more realistic alternative assumptions (e.g., δ < γ), which would differentiate a larger regret from being unlucky from a smaller rejoice from lucky events, would imply that individuals are expected to uncover one additional parameter, and individuals must be more unlucky.

4. A single-parameter, inverse-S-shaped probability-weighting function is used. However, this is well supported by empirical findings.

5. Individuals are assumed to know the probability distribution that exists for their specific situation. However, the comparison of scenarios in terms of their complexity suggests that in more complex worlds, where individuals are expected to learn additional parameters, convergence on the expected-utility model will require more unlucky individuals.

Note that the adopted simple search context allowed me to obtain the optimal decision at each time t in formal terms (i.e., the reservation bid). This let me explain the psychological determinants of the observed increasing adoption of the reservation rule over time obtained by experimental economics (e.g., Di Cagno et al., 2014) without relying on the context characteristics (e.g., recall vs. no-recall), and let me explain the adoption of intuitive decisions over optimal decisions at a given time, as has been obtained by numerical economics (e.g., Sahm & von Weiszäcker, 2016), without relying on unobservable searching costs (e.g., the opportunity cost of time, the cognitive effort, and the information acquisition).

9. Conclusions

For economists who are accustomed to working with models of rational agents, it is easy to overlook how strong the assumptions of von Neumann-Morgenstern expected utility are. Experimental and empirical evidence show that agents typically act in some, but not all, circumstances according to what is predicted by this theory. The question of when the assumptions of von Neumann-Morgenstern expected utility may have greater empirical success must be addressed; in other words, the question of what leads to easy representation of the observed behaviour by those assumptions must be asked (see, for example, Klüppelberg et al., 2014).

This paper identified the contexts in which the expected-utility model can be taken as a behavioural model, together with the circumstances in which learning is more likely to happen. In particular, learning is more probable in more realistic contexts, where a smaller number of observations is needed to reduce biases in predicted preferences than in estimated probabilities. Moreover, some individuals learn in the short-run (i.e., only unlucky individuals will learn), but all individuals learn (on average) in the long-run, whenever the probabilities of unlucky events are larger than those of lucky events. In other words, this study provides insights about the utility of misfortune. Finally, learning is more probable in more lifelike contexts, in which individuals are affected by the most frequently observed biases (i.e., anticipated or perceived regret and rejoice, over-assessment of small probabilities and under-assessment of large probabilities).

Note that the expected-utility model turns out to explain or predict individual behaviour, on average, when individuals know their degree of risk aversion. However, this feature is only weakly challenged by the economic and psychological literature.

Therefore, by applying a learning mechanism conceptually borrowed from neuroscience, although formally related to Bayesian updating, to the degree of regret or rejoice and to the degree of realism, this paper demonstrates that individuals who repeatedly make the same decision will behave procedurally rationally according to a non-expected-utility model at the beginning, and will behave fully rationally according to the expected-utility model at the end of the learning process. However, behaving fully optimally (on average) over time does not imply fully optimal behaviour at each time. Thus, non-standard decision-making theories seem to be most relevant only when the learning process has not been properly developed due to little experience with decisions (Birnbaum and Schmidt, 2015; Bonnefon and Sloman, 2013).

Note that the theoretical assumptions presented in this paper may appear to be daunting. Indeed, the fact that individuals do not know their degree of regret or rejoice and their degree of realism implies that individuals fail to predict the satisfaction that will arise from their choices: the standard assumption that preferences are exogenous is undermined, and the alternative assumption that preferences are constructed is supported. However, the theoretical findings presented in this paper appear encouraging. Indeed, the fact that each individual learns their degree of regret or rejoice and becomes steadily more realistic suggests that individuals refer to a finely tuned and fully assembled kit of preferences, although they can and will apply it only to already experienced decision problems (Ortner, 2016).

Needless to say, the findings presented in this paper require further investigation, both within the same contexts (e.g., by allowing the degree of risk aversion to be updated according to experience, or the degree of regret to differ from the degree of rejoice) and in alternative contexts. For example, research should consider unknown probability distributions (Palley and Kremer, 2014), unobserved outcomes of foregone alternatives, the additional possibility of choosing actions for exploration (Gonzalez, 2013; Cox, 2015), and learning from other people’s experience.


Appendix

In the case of a uniform distribution, the Bayes updating is given by:

$$\left(\tfrac{1}{n}\right)^{\beta_{u1}} = \frac{(1/n)\,p[H_i]}{(1/n)^{\beta_{u0}}} = \frac{p[E|H_i]\,p[H_i]}{p[E]}$$

By applying logarithms to both sides, we obtain:

$$\beta_{u1}\,\ln(1/n) = (1-\beta_{u0})\,\ln(1/n) + \ln p[H_i]$$

By dividing both sides by $\ln(1/n)$ and subtracting $\beta_{u0}$, we obtain:

$$\Delta\beta_u = (1-2\beta_{u0}) + \frac{\ln p[H_i]}{\ln(1/n)}$$

In the case of a uniform distribution, the learning rule adopted in the present study is given by:

$$\Delta\beta_u = -\varepsilon_u\left[1-\left(\tfrac{1}{n}\right)^{\beta_{u0}-1}\right]$$

By equating the right-hand sides of the last two equations, we obtain:

$$\varepsilon_u = \frac{\ln(1/n)\,(2\beta_{u0}-1) - \ln p[H_i]}{\ln(1/n)\left[1-\left(\tfrac{1}{n}\right)^{\beta_{u0}-1}\right]}$$
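The closed form for εu can be checked numerically. The following minimal Python sketch is illustrative only (it is not part of the original analysis, and the values chosen for n, βu0 and p[Hi] are arbitrary): it computes Δβu once from the Bayes updating step and once from the learning rule evaluated at the closed-form εu, and verifies that the two coincide.

import math

# Illustrative values (hypothetical; not taken from the paper)
n, beta_u0, p_Hi = 10, 0.7, 0.2

log_inv_n = math.log(1.0 / n)

# Change in the weighting parameter implied by the Bayes updating step
delta_beta_bayes = (1 - 2 * beta_u0) + math.log(p_Hi) / log_inv_n

# Closed-form learning rate derived above
eps_u = (log_inv_n * (2 * beta_u0 - 1) - math.log(p_Hi)) / (
    log_inv_n * (1 - (1.0 / n) ** (beta_u0 - 1)))

# Change implied by the adaptive learning rule with that learning rate
delta_beta_rule = -eps_u * (1 - (1.0 / n) ** (beta_u0 - 1))

assert math.isclose(delta_beta_bayes, delta_beta_rule)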

In the case of an exponential distribution, the Bayes updating is given by:

$$\exp(-y)^{\beta_{e1}} = \frac{\exp(-y)\,p[H_i]}{\exp(-y)^{\beta_{e0}}} = \frac{p[E|H_i]\,p[H_i]}{p[E]}$$

By applying logarithms to both sides, we obtain:

$$-y\,\beta_{e1} = -y\,(1-\beta_{e0}) + \ln p[H_i]$$

By dividing both sides by $-y$ and subtracting $\beta_{e0}$, we obtain:

$$\Delta\beta_e = (1-2\beta_{e0}) - \frac{\ln p[H_i]}{y}$$

In the case of an exponential distribution, the learning rule adopted in the present study is given by:

$$\Delta\beta_e = -\varepsilon_e\left[1-\exp(-y)^{\beta_{e0}-1}\right]$$

By equating the right-hand sides of the last two equations, we obtain:

$$\varepsilon_e = \frac{y\,(2\beta_{e0}-1) + \ln p[H_i]}{y\left[1-\exp(-y)^{\beta_{e0}-1}\right]}$$
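An analogous numerical check can be run for the exponential case. Again, this is only an illustrative sketch, with arbitrary values for y, βe0 and p[Hi]; it verifies that the learning rule with the closed-form εe reproduces the Bayes-derived change in the weighting parameter.

import math

# Illustrative values (hypothetical; not taken from the paper)
y, beta_e0, p_Hi = 1.5, 0.7, 0.2

# Change in the weighting parameter implied by the Bayes updating step
delta_beta_bayes = (1 - 2 * beta_e0) - math.log(p_Hi) / y

# Closed-form learning rate derived above
eps_e = (y * (2 * beta_e0 - 1) + math.log(p_Hi)) / (
    y * (1 - math.exp(-y) ** (beta_e0 - 1)))

# Change implied by the adaptive learning rule with that learning rate
delta_beta_rule = -eps_e * (1 - math.exp(-y) ** (beta_e0 - 1))

assert math.isclose(delta_beta_bayes, delta_beta_rule)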

Note that both εu and εe are decreasing in p[Hi] (i.e., individuals learn to a greater extent if they trust their a priori estimation to a smaller extent), with both εu and εe tending to infinity if p[Hi] tends to 0 (i.e., individuals learn to the greatest extent if they trust their a priori estimation to the smallest possible extent), and increasing in βu if βu > 1 and in βe if βe > 1, respectively (i.e., individuals learn to a greater extent if they are affected by the most frequently observed over-assessment of small probabilities and under-assessment of large probabilities).

References

Agastya, M., Slinko, A. (2015) Dynamic choice in a complex world, Journal of Economic Theory 158: 232-258

Analytis, P.P., Kothiyal, A., Katsikopoulos, K. (2014) Multi-attribute utility models as cognitive search engines, Judgment and Decision Making 9: 403-419

Balcombe, K., Fraser, I. (2015) Parametric preference functionals under risk in the gain domain: a Bayesian analysis, Journal of Risk and Uncertainty 50: 161-187

Bault, N., Wydoodt, P., Coricelli, G. (2016) Different attentional patterns for regret and disappointment: an eye-tracking study, Journal of Behavioral Decision Making 29: 194-205

Ben-Elia, E., Avineri, E. (2015) Response to travel information: a behavioural review, Transport Reviews 35: 352-377

Ben-Elia, E., Ishaq, R., Shiftan, Y. (2013) "If only I had taken the other road...": regret, risk and reinforced learning in informed route-choice, Transportation 40: 269-293

Birnbaum, M.H., Schmidt, U. (2015) The impact of learning by thought on violations of independence and coalescing, Decision Analysis 12: 144-152


Bisière, C., Décamps, J.-P., Lovo, S. (2015) Risk attitude, beliefs updating, and the information content of trades: an experiment, Management Science 61: 1378-1397

Blavatskyy, P. (2016) Probability weighting and L-moments, European Journal of Operational Research 255: 103-109

Bonnefon, J.-F., Sloman, S.A. (2013) The causal structure of utility conditionals, Cognitive Science 37: 193-209

Boos, M., Seer, C., Lange, F., Kopp, B. (2016) Probabilistic inference: task dependency and individual differences of probability weighting revealed by hierarchical Bayesian modeling, Frontiers in Psychology 7: art. no. 755

Bordley, R., Uberti, M. (2015) A target-oriented approach: a "one-size" model to suit humans and econs [sic] behaviors, Applied Mathematical Sciences 9: 4971-4978

Bracha, A., Brown, D.J. (2012) Affective decision making: a theory of optimism bias, Games and Economic Behavior 75: 67-80

Buckert, M., Schwieren, C., Kudielka, B.M., Fiebach, C.J. (2014) Acute stress affects risk taking but not ambiguity aversion, Frontiers in Neuroscience 8: art. no. 82

Capra, C.M., Jiang, B., Engelmann, J.B., Berns, G.S. (2013) Can personality type explain heterogeneity in probability distortions? Journal of Neuroscience, Psychology, and Economics 6: 151-166

Cavagnaro, D.R., Pitt, M.A., Gonzalez, R., Myung, J.I. (2013) Discriminating among probability weighting functions using adaptive design optimization, Journal of Risk and Uncertainty 47: 255-289

Charles-Cadogan, G. (2016) Expected utility theory and inner and outer measures of loss aversion, Journal of Mathematical Economics 63: 10-20

Charupat, N., Deaves, R., Derouin, T., Klotzle, M., Miu, P. (2013) Emotional balance and probability weighting, Theory and Decision 75: 17-41

Chechile, R.A., Barch, D.H. (2013) Using logarithmic derivative functions for assessing the risky weighting function for binary gambles, Journal of Mathematical Psychology 57: 15-28

Chorus, C.G. (2014) Risk aversion, regret aversion and travel choice inertia: an experimental study, Transportation Planning and Technology 37: 321-332

Chorus, C.G., Bierlaire, M. (2013) An empirical comparison of travel choice models that capture preferences for compromise alternatives, Transportation 40: 549-562

Cox, L.A.T., Jr. (2015) Overcoming learning aversion in evaluating and managing uncertain risks, Risk Analysis 35: 1892-1910

Cubitt, R., Ruiz-Martos, M., Starmer, C. (2012) Are bygones bygones? Theory and Decision 73: 185-202

Dekker, T. (2014) Indifference based value of time measures for random regret minimisation models, Journal of Choice Modelling 12: 10-20

Di Cagno, D., Neugebauer, T., Rodriguez-Palmero, C., Sadrieh, A. (2014) Recall searching with and without recall, Theory and Decision 77: 297-311

Di Caprio, D., Santos-Arteaga, F.J., Tavana, M. (2014) The optimal sequential information acquisition structure: a rational utility-maximizing perspective, Applied Mathematical Modelling 38: 3419-3435

Duffy, J., Puzzello, D. (2014) Experimental evidence on the essentiality and neutrality of money in a search model, Research in Experimental Economics 17: 259-311

Easley, D., Rustichini, A. (1999) Choice without beliefs, Econometrica 67: 1157-1184

Eberhardt, F., Danks, D. (2011) Confirmation in the cognitive sciences: the problematic case of Bayesian models, Minds and Machines 21: 389-410
