Adaptive Learning and Emergent Coordination in Minority Games [1]

G. Bottazzi**, G. Devetag***, G. Dosi**

** S. Anna School of Advanced Studies, via G. Carducci 40, Pisa, Italy
*** Department of Computer and Management Sciences, University of Trento, Via Inama 5, 38100 Trento, Italy
Abstract

The work studies the properties of a coordination game in which agents repeatedly compete to be in the population minority. The game reflects some essential features of those economic situations in which positive rewards are assigned to individuals who behave in opposition to the modal behavior in a population. Here we model a group of heterogeneous agents who adaptively learn, and we investigate the transient and long-run aggregate properties of the system in terms of both allocative and informational efficiency. Our results show that, first, the system's long-run properties strongly depend on the behavioral learning rules adopted and, second, adding noise at the individual decision level, and hence increasing heterogeneity in the population, substantially improves aggregate welfare, although at the expense of a longer adjustment phase. In fact, the system achieves in this way a higher level of efficiency than that attainable by perfectly rational and completely informed agents.
JEL codes: G1, D8, C6
Keywords: minority game, coordination, speculation, adaptive learning, market efficiency, emergent properties.
[1] Comments on a previous draft by Cars Hommes and Nicolas Jonard are gratefully acknowledged.
1 Introduction

This work investigates the dynamic properties of minority games under different forms of heterogeneity amongst agents and different learning procedures.
In the game, first introduced by Challet and Zhang (1997, 1998), a population of N agents (where N is an odd number) must each simultaneously and independently choose between two sides, say 0 and 1. The side chosen by the minority of the agents is the winning one, each of its members gaining a fixed positive reward, while all members on the majority side obtain a null payoff.
Formally, the model is an N-person coordination game with multiple asymmetric Nash equilibria in pure strategies, and a unique symmetric mixed strategy equilibrium. In the following, we assume a large population, and we investigate the conditions under which repeated interaction amongst players causes some form of aggregate self-organization to emerge.
The game, notwithstanding its extreme simplicity, does capture some basic features of quite a few processes of social and economic interaction whereby orderly coordination rests upon the existence or the emergence of some behavioral heterogeneity in the relevant population.
Some coordination problems of this kind have long been discussed in Schelling's seminal work on those patterns of "sorting and mixing" where order at an aggregate level might find its roots in persistent micro-adjustments within heterogeneous populations (Schelling, 1978). They also bear a close resemblance to the "El Farol" problem devised by Brian Arthur (1994). In this game, a number of agents must independently decide each night whether or not to attend a bar called "El Farol". Each agent receives a positive utility from attending the bar as long as it is not too crowded; otherwise he prefers to stay home. Somewhat similar coordination problems are involved in experimental market entry games [e.g., Ochs (1990), Meyer et al. (1992), Ochs (1995) and references therein], where a group of N players must decide at each stage whether or not to enter one or more markets, each having a fixed capacity k, with k < N.
Finally, the model captures some basic features of speculation on financial markets, which has indeed been the primary concern of minority-game modeling so far. A classic reference in this respect is Keynes' "Beauty Contest" metaphor, where the payoff to an individual player does not stem from the accuracy of the appreciation of the intrinsic beauty of the various contest candidates but rather from guessing the guesses of the ensemble of the other evaluators.
Strictly speaking, the beauty contest metaphor crucially involves positive feedback investment strategies, in the language of technical trading in finance, since the payoff is based on a majority rule, while minority games (MG, hereafter) fundamentally address those aspects of speculation dynamics involving activities of arbitrage against average market behaviors. These negative feedback strategies characterize agents trying to infer some, actual or imagined, structure in the history of collective interactions and, through that, trying to "beat the market", that is, "beat the majority view", by arbitraging against it. In the literature on chartist rules, this roughly corresponds to contrarian strategies [e.g., in the behavioral finance literature, De Bondt and Thaler (1995)].
In this work we shall precisely address the collective outcomes of such interactions in the MG setup. Are orderly aggregate coordination and absence of arbitrage opportunities generic properties of minority-type market processes, independently of any further specification of microeconomic behaviors? Or, conversely, do finer details of the ecology of agent populations (such as their sheer size) matter?
Previous results on the MG based on adaptive agents with deterministic decision rules [e.g., Challet and Zhang (1997, 1998); Savit et al. (1998)] do suggest that diverse degrees of market efficiency fundamentally depend upon some underlying heterogeneity in the population. Here we explore the more general case whereby heterogeneous agents, endowed with memories of variable length, are allowed to adaptively learn. Interestingly, first, efficient coordination turns out robustly to be an emergent property resting on an ecology of diverse agents who do not play Nash equilibrium strategies. Second, we show that collective efficiency is monotonic neither in the rational sophistication of the agents nor in the information they are able to access. Rather, again, it crucially depends on the ecology of behaviors over the population.
In Section 2 we briefly illustrate the basic features of the game, study the properties of its equilibria and introduce a notion of adaptive strategy. Section 3 discusses the role of heterogeneity for collective dynamics under the assumption of a random distribution of strategies over the population and adaptive deterministic behaviors. In Section 4 we introduce a probabilistic learning model and study both the transient and limit properties of the dynamics. In particular, we analyze the allocative and informational efficiencies, which we shall define below, of the system under different degrees of computational complexity of the purported agents and different parameterizations of the learning rule. In general, our results suggest that it is heterogeneity in the population, rather than individual computational abilities, which mainly accounts for the observed aggregate properties.
2 The Minority Game
2.1 The Baseline Game-theoretic Framework
The minority game is played by a group of N players, where N must be an odd number. In each period of the stage game, each player must choose privately and independently between two actions or sides, say 0 and 1. The payoff $\pi_i$ for $i \in \{0,1\}$ is the same for all N players and is equal to

$$\pi_i = \begin{cases} 1 & \text{if } n_i \leq (N-1)/2 \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

where $n_i$ is the number of players choosing side i. Each player is rewarded with a unitary payoff whenever the side he chooses happens to be chosen by the minority of the players, while players on the majority side get nil.
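As an illustrative sketch (ours, not part of the original paper; the function name is our own), the payoff rule of Eq. 1 can be written in a few lines of Python:

```python
def minority_payoffs(choices):
    """Given a list of 0/1 choices (odd length N), return the payoffs
    (pi_0, pi_1) of the two sides: the minority side wins (Eq. 1)."""
    N = len(choices)
    assert N % 2 == 1, "N must be odd"
    n1 = sum(choices)  # number of players choosing side 1
    n0 = N - n1        # number of players choosing side 0
    # A side pays 1 iff it was chosen by at most (N-1)/2 players
    pi0 = 1 if n0 <= (N - 1) // 2 else 0
    pi1 = 1 if n1 <= (N - 1) // 2 else 0
    return pi0, pi1

print(minority_payoffs([0, 0, 1]))  # side 1 is the minority: (0, 1)
```

Since N is odd, exactly one side is in the minority in every round.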
It is easy to see that the game has $\binom{N}{(N-1)/2}$ asymmetric Nash equilibria in pure strategies, in which exactly (N-1)/2 players choose either one of the two sides. Clearly, under pure-strategy equilibrium play, the payoffs for the two "parties" are quite different. Players belonging to the minority side are rewarded a fixed positive payoff, while those on the majority side earn nothing. The pure strategy equilibria, hence, are not strict, because players on the majority side are just indifferent between sticking to equilibrium play and deviating. The game also presents a unique symmetric mixed-strategy Nash equilibrium, in which each player selects the two sides with equal probability 1/2.
The payoff asymmetry implied by the pure strategy Nash equilibria is likely to rule out simple precedent-based solutions of the underlying coordination problem. The goodness of the achieved coordination can be easily measured by the average size of the winning minority. The further this size is from (N-1)/2, the higher is the amount of money which is, so to speak, left on the table, hence the lower the resulting aggregate welfare. Note that under mixed strategy equilibrium play, individual expected payoff is equal to .5 in each period of the stage game, and the group payoff follows a binomial distribution with mean equal to N/2 and a variance of N/4. The measure of variance is indeed an equivalent measure of the degree of efficiency achieved in a population: the higher the variance, the higher the magnitude of fluctuations around the mixed strategy Nash equilibrium and the corresponding aggregate welfare loss.
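A quick sanity check of this benchmark (our own sketch, with illustrative parameter values): simulating N symmetric random players, the attendance on one side should have mean close to N/2 and variance close to N/4.

```python
import random

random.seed(0)
N, T = 101, 20000  # population size and number of rounds (illustrative)

# Under the symmetric mixed equilibrium each player flips a fair coin,
# so attendance on side 1 is Binomial(N, 1/2): mean N/2, variance N/4.
counts = [sum(random.random() < 0.5 for _ in range(N)) for _ in range(T)]
mean = sum(counts) / T
var = sum((c - mean) ** 2 for c in counts) / T
print(mean, var)  # roughly 50.5 and 25.25
```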
2.2 The Behavioral Foundations of the Minority Game: Adding Beliefs
Let us start by discussing the nature of the game equilibria when players choose their actions on the basis of idiosyncratic (i.e., agent-specific) beliefs about other players' aggregate behavior, and try to exploit the latter to their advantage. This perspective, which amounts to considering agents endowed with inductive rather than deductive rationality, has been adopted in the minority literature including the El Farol model (Arthur, 1994), and it is common to multi-agent accounts of financial markets [e.g., Arthur et al. (1997); Brock and Hommes (1998); LeBaron (2000) and references therein].

One way to model beliefs in this setting, which has traditionally been used in previous simulation studies on the MG, is to endow players with sets of adaptive strategies, where a strategy may be defined as a function mapping a particular sequence of observed past outcomes up to a certain period to an action to choose in the next period. In the following we show indeed that an explicit account of this form of strategic behavior does not change the set of mixed strategy Nash equilibria of the game.
Consider a group of N agents who play the MG with a payoff function as from Eq. 1 [2]. The only information made available after each round is the winning side (0 or 1). We define the market information as the history H of play, i.e., a binary string specifying which side has won in each period of the stage game. We also define a parameter m as the portion of the past history H that players retain in memory. So, if m = 3, players' strategies will be based only on the outcomes observed in the last 3 rounds of the game.

An adaptive strategy may be defined as a prescription on the action to take in the next round of play (i.e., to choose 0 or 1) provided that a particular history (i.e., a particular sequence of m bits) has been observed up to that point. For example, in the case in which m = 3, an example of strategy is shown in Table 1.

The history columns specify all the possible sequences of aggregate outcomes (i.e., of winning sides) in the last m periods; the action columns specify which action to choose in the next round in correspondence to each particular sequence observed [3]. Given a certain value of the parameter m, the total number of strategies that can be generated is equal to $2^{2^m}$, corresponding to the $2^{2^m}$ ways of assigning either action to all the $2^m$ possible binary strings of length m.

[2] Various modifications of this payoff function have been proposed by Challet and Zhang and by De Cara et al. (1999). We stick to the original one for its simplicity and because the essential features of the model are highly insensitive to the proposed variations.
[3] Note again that this definition of strategy differs considerably from the notion of strategy as a complete plan of action; cf. Marengo and Tordjman (1996).
Given the foregoing definition of strategies, we study, as a first important benchmark, the equilibria associated with the "homogeneous" game where all the players have access to the full set of strategies. Moreover, rather than considering the dynamics of the history string h generated by the successive players' choices, we begin by focusing on the static case, requiring that all the strings h can appear with equal probability [4]. Under this hypothesis, despite the high number of strategies, we show that a unique mixed strategy Nash equilibrium exists, by which players assign equal probability to choosing either side, 1 and 0, regardless of past history. However, given the above definition of strategy, there are actually infinitely many ways in which players may end up choosing sides with equal probability.
In order to clarify the analysis that follows, let us introduce some notation. A pure strategy $s \in S_l$ is a mapping $H_l \rightarrow \{0,1\}$ from the set of binary strings of length l to the set of actions. Consequently, a mixed strategy $m = \{x_\sigma, \sigma \in S_l \mid \sum_{\sigma \in S_l} x_\sigma = 1\}$ can be thought of as a map $H_l \rightarrow [0,1]$ which associates to each binary string $h \in H_l$ the probability $m(h) = \sum_{\sigma \in S_l} x_\sigma\, s_\sigma(h)$ that the action following it be 1. Let $\Sigma_l$ be the $2^{2^l}$-dimensional mixed strategy simplex. For what follows, it is important to notice that the actual mapping between the mixed strategy space and the space $[0,1]^{2^l}$ of maps from $H_l$ to $[0,1]$ is many-to-one and surjective, i.e., there are many ways of building up the same mixed strategy starting from the pure strategy set, and each mixed strategy can be realized in at least one way.
Consider a population of N (odd) agents. If $\mathbf{m} \in \Sigma_l^N$ is a mixed strategy profile (the vector composed by the players' mixed strategies), we denote the mixed strategy of player i as $m_i$ and the vector of mixed strategies of all players except i as $m_{-i}$. The mixed strategy profile defines, for each binary string, the population probability of playing 1 (the expected frequency of players playing action 1) associated to a given binary string:

$$\bar{m}(h) = \frac{1}{N} \sum_{n=1}^{N} m_n(h) = \sum_{j \in S_l} P_j(\mathbf{m})\, s_j(h) \qquad (2)$$

where $P_j(\mathbf{m})$ is the expected probability of playing strategy j (the frequency with which this strategy will be played).
The payoff function in Eq. 1 defines the expected payoff of player i given the past history h and the complete mixed strategy profile $\mathbf{m}$:

$$U(m_i; \mathbf{m})(h) = m_i(h)\,(1 - \bar{m}(h)) + \bar{m}(h)\,(1 - m_i(h)) \qquad (3)$$

and we say that a mixed strategy $m_1$ is superior to $m_2$ if, given the mixed strategy profile $\mathbf{m}$, it is $U(m_1; \mathbf{m})(h) \geq U(m_2; \mathbf{m})(h)$, $\forall h \in H_l$. This definition allows one to discuss equilibria in a way similar to what is done in more canonical population games.
Theorem. The strategy profile $\mathbf{m}$ is a Nash equilibrium iff $m_i(h) = 1/2$, $\forall m_i \in \mathbf{m}$, $\forall h \in H_l$.

In other words, the profile is an equilibrium if and only if the players' mixed strategies assign probability one half to picking either side, 1 and 0, irrespective of the past history, i.e., when all the players are perfectly symmetric random players. Notice, however, that the ways of realizing such a mixed strategy are infinite. Hence, the actual number of mixed strategy Nash equilibria is also infinite.

[4] This prescription is introduced for purposes of analytical tractability and stems from a symmetry consideration.
Proof. In order to demonstrate the assertion, we proceed by identifying the i-th player's best replies to a given mixed strategy profile $m_{-i}$ for the rest of the population. For this purpose, it is convenient to express the payoff in Eq. 3 isolating the contribution of the i-th player's strategy from that of the rest of the population:

$$U(m_i; m_{-i})(h) = \frac{N+1}{N}\, m_i(h) + \frac{N-1}{N}\, m_{-i}(h) - \frac{2}{N}\, m_i(h)^2 - \frac{2(N-1)}{N}\, m_i(h)\, m_{-i}(h) \qquad (4)$$

and $m_i(h)$ has to be chosen in order to maximize the foregoing expression. The solution depends on the value of $m_{-i}(h)$, and the quadratic form of Eq. 4 allows one to find it immediately. Defining

$$p^*(h) = \frac{N+1 - 2(N-1)\, m_{-i}(h)}{4} \qquad (5)$$

the solution reads

$$\hat{m}_i(h) = \begin{cases} 0 & p^*(h) \leq 0 \\ 1 & p^*(h) \geq 1 \\ p^*(h) & \text{otherwise} \end{cases} \qquad (6)$$

The same procedure can be repeated for each string h to obtain the best reply mixed strategy to $m_{-i}(h)$. In order to obtain the set of Nash equilibria, we have to find strategies that belong to the set of mutual best replies, i.e., $\hat{m}_i(h) = m_{-i}(h)$, $\forall h \in H_l$. Inspecting Eq. 6, it turns out that this can happen only when $0 < p^*(h) < 1$ and the following condition is satisfied:

$$m_{-i}(h) = \frac{N+1 - 2(N-1)\, m_{-i}(h)}{4} \qquad (7)$$

The solution is then $m_{-i}(h) = 1/2$, implying $\hat{m}_i(h) = 1/2$.

Q.E.D.
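The fixed point of Eq. 7 can be checked numerically; a minimal sketch of ours (function names are hypothetical), solving the interior condition directly rather than iterating, since the raw best-reply map is steep and oscillates:

```python
def best_reply(m_other, N):
    """Best reply of Eqs. 5-6: p*(h) clipped to [0, 1]."""
    p_star = (N + 1 - 2 * (N - 1) * m_other) / 4
    return min(1.0, max(0.0, p_star))

def fixed_point(N):
    """Solve m = (N + 1 - 2(N-1) m) / 4 for the mutual best reply."""
    return (N + 1) / (2 * N + 2)

N = 101
m_star = fixed_point(N)
print(m_star)                 # 0.5 for any odd N
print(best_reply(m_star, N))  # the best reply to 1/2 is again 1/2
```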
3 The Minority Game with Heterogeneous Agents
One of the interesting questions one may explore in the MG framework is whether and how the presence of heterogeneity among players' beliefs may impact on the system's degree of self-organization around the Nash equilibria of the game.

In line with previous simulation studies by Challet and Zhang, one may introduce heterogeneity in the population via a diversity of strategies. Given a certain value of m, each player is initially endowed with a number s of strategies which are randomly drawn from a common pool. Hence, although both m and s are identical over the population, heterogeneity arises from the random initial strategy assignments.
In the course of play, each active strategy i is characterized by a value $q_i(t)$, which indicates the number of points the strategy has accumulated up to time t: at the end of each round, all the strategies that correctly predicted the outcome (i.e., prescribed the choice of the side which ex post resulted the winning side) are assigned one point each.

Given the initial strategy assignment and the updating rule just described, players' behavior at each round of the stage game is completely deterministic, as each agent picks, among the strategies he possesses, the one with the highest number of accumulated points.

Note that in such a framework, however, agents' heterogeneity is only introduced via the initial assignment of strategies to players and does not stem from different "personal histories". If, for example, the same strategy i is initially assigned to two different players, the value of $q_i(t)$, which determines the strength of that strategy at time t, will necessarily be the same for both players, as it will only depend on the number of times that strategy has made a correct prediction, whether it was actually played or not.
In order to evaluate the population's performance, it is necessary to introduce a measure of allocative efficiency. A natural candidate is provided by the average number of players belonging to the winning party, i.e., the average number of points earned by the whole population [5]. In accordance with the previous literature, as a measure of allocative efficiency we compute a quantity associated to the foregoing one, namely the mean squared deviation from the half population, $\sigma$. Let N be the number of agents, and $N_0(t)$ the number of agents choosing side 0 at time t; then in a repeated game of duration T the mean squared deviation is computed as

$$\sigma = \frac{1}{T} \sum_{\tau=0}^{T} \left( N_0(\tau) - \frac{N}{2} \right)^2 \qquad (8)$$

Clearly, the lower the value of $\sigma$, the higher the system efficiency.
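The measure in Eq. 8 is straightforward to compute from a simulated attendance series; a minimal Python sketch of ours (the function name is our own):

```python
def volatility(n0_series, N):
    """Mean squared deviation of side-0 attendance from N/2 (Eq. 8)."""
    T = len(n0_series)
    return sum((n0 - N / 2) ** 2 for n0 in n0_series) / T

# Example: a perfectly coordinated population alternating 50/51 around N/2
N = 101
series = [50, 51] * 10
print(volatility(series, N))  # 0.25, the minimum attainable with odd N
```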
The dynamic process can be expected to depend to some degree on the initial strategy distribution and on the initial game history, which are both generated randomly. Therefore, in order to eliminate any dependence on initial conditions from our results and to focus only on asymptotically stable states, in all the simulations presented here we applied an averaging procedure over 50 independent sample paths with randomly generated initial histories and strategy distributions.

In addition, at the beginning of each simulation the system was left to evolve for an initial "adjustment phase" of length $T_0$ in order to wash away any possible transient effects on the subsequent averaging procedure. The quantities so obtained can thus be considered asymptotic properties of the system as long as $T_0$ and T are chosen high enough as to provide a good approximation of the limit $T \rightarrow \infty$.
The dependence of the volatility measure $\sigma$ on N, m and s for the original minority game has been thoroughly investigated in the previous simulation studies and is summarized in Fig. 1 for the case s = 2.

As noticed by Savit et al. (1998), the type of market regime is determined, at least in first approximation, by the ratio $z = 2^m / N$: hence the curves for various N collapse if plotted against this variable. In this respect, notice that even if the actual number of possible strategies is $2^{2^m}$, their relative strengths are completely defined in terms of the frequency $P(0 \mid h_m)$ with which, over a history, a 0 follows a given m-length string $h_m$. And there are $2^m$ such variables. So, z can be interpreted as the density of agents in the strategy space degrees of freedom.

[5] Recall from the previous Section that this quantity measures the degree to which the population is close to a Nash equilibrium.
regime" occurs when z is large (the agent are sparse in the strategy space). Here the system can
hardly organize and its behavior can be described as a collection of random agents choosing their
side with a coin toss. In fact suppose the past history to be a given h
m
and suppose there are
N
d (h
m
) agents whose strategies prescribe dierent actions based on that history while there are
N 0 (h m ) and N 1 (h m
) agents whose strategies prescribe the same party (we restrict ourselvesto the
s = 2 case), respectively 0 and 1. If the agent in N
d
choose randomly, the variance is (h
m ) = N d (h m )=4+(N 0 (h m ) N 1 (h m )) 2
=4. The average over the possible h
m
will then give = N=4.
Notice that is shaped bytwo factors, namely a uctuation inthechoices of agentsable to choose
and a uctuation in the initial distribution of strategies. Note also that the allocative eÆciency in
thiscase (asmeasuredby) equalsthatof apopulation ofplayers playingthemixed strategyNash
equilibriumdened over thetwo actions0 and 1(cf. Section2.1).
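The decomposition $\sigma(h_m) = N_d/4 + (N_0 - N_1)^2/4$ can be verified exactly by summing over the binomial distribution of the $N_d$ undecided agents; a sketch of ours, not the paper's code:

```python
from math import comb

def sigma_exact(N0, N1, Nd):
    """E[(N_0(t) - N/2)^2] when N0 agents always play 0, N1 always
    play 1, and Nd agents flip fair coins (X of them join side 0)."""
    N = N0 + N1 + Nd
    return sum(
        comb(Nd, x) * 0.5 ** Nd * (N0 + x - N / 2) ** 2
        for x in range(Nd + 1)
    )

N0, N1, Nd = 30, 20, 51
print(sigma_exact(N0, N1, Nd))      # equals Nd/4 + (N0 - N1)^2 / 4
print(Nd / 4 + (N0 - N1) ** 2 / 4)  # = 12.75 + 25.0 = 37.75
```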
The second regime, for z << 1, could be called the "inefficient regime". Here the agents densely populate the strategy space, and their actions are strongly correlated. This correlation leads to a worsening of the overall performance due to a "crowd" effect (Johnson et al. 1998): the agents in fact are too similar to each other, and they all tend to choose the same party on the basis of the available information.

The third regime, for z ≈ 1, is where coordination produces a better-than-random performance. Here the agents are differentiated enough not to yield "crowd" effects, but at the same time sufficiently distributed over the strategy space so as not to produce random-like behavior.
The literature on the MG, in fact, has mostly focused on the criticality of the value $z_c$ where $\sigma$ is minimum, suggesting that a major change in the system behavior happens when this point is crossed (Challet and Marsili, 1999). As we will see in the following sections, this criticality survives the introduction of the probabilistic learning rules.
4 Adaptive Learning
4.1 The Model
What happens to the system properties when additional heterogeneity is introduced in the population? In particular, what happens when agents are endowed with a probabilistic decision rule? In order to tackle the problem, in the following we investigate changes in the dynamics and asymptotic properties of a population of agents playing the MG as a function of changes in the nature of the agents' learning models. Hence, we leave unaltered the setup previously described, and modify only the way in which agents update their strategies' relative strength. In particular, we adopt the following probabilistic updating rule.

Recall the definition of $q_i(t)$ as the total number of points strategy i would have won if played until time t. Then each agent chooses among her strategies following the probability distribution

$$p_i(t) = \frac{e^{\Gamma q_i(t)}}{\sum_j e^{\Gamma q_j(t)}} \qquad (9)$$

where the sum on j is over all the strategies possessed by the player. Note that, in general, different strategy endowments now translate into different choice probabilities: besides adding noise at the individual decision level, the rule also increases heterogeneity in the population.
The choice of a stochastic learning model is well supported by the available experimental evidence on adaptive behavior in games, as well as by evidence from psychology experiments. In fact, most descriptive models of adaptive learning, whether belief-based or reinforcement-based, imply a probabilistic choice and updating of available actions [cf. Erev and Roth (1998), Camerer and Ho (1999)].

The parameter $\Gamma$ can be considered as a sort of sensitivity of choice to marginal information: when it is high, the agents are sensitive even to small differences in the notional scores of their strategies. In the limit $\Gamma \rightarrow \infty$ the usual minority game rule is recovered. On the contrary, for low values of $\Gamma$ a great difference in the strategies' strengths is required in order to obtain significant differences in probabilities.
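The rule in Eq. 9 is a standard logit (softmax) choice; a short Python sketch of ours shows how $\Gamma$ governs the sensitivity to score differences:

```python
from math import exp

def choice_probabilities(scores, gamma):
    """Logit rule of Eq. 9: p_i proportional to exp(gamma * q_i)."""
    weights = [exp(gamma * q) for q in scores]
    total = sum(weights)
    return [w / total for w in weights]

scores = [10.0, 9.0]  # strategy 1 leads by one point
print(choice_probabilities(scores, 5.0))   # high gamma: nearly deterministic
print(choice_probabilities(scores, 0.01))  # low gamma: close to fifty-fifty
```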
The model indeed bears close similarities with a discrete time replicator dynamics (see Weibull (1995)). The connection is straightforward if one looks at the probability updating equation associated with Eq. 9:

$$p_i(t+1) = p_i(t)\, \frac{e^{\Gamma\, \delta q_i(t)}}{\sum_j p_j(t)\, e^{\Gamma\, \delta q_j(t)}} \qquad (10)$$

where $\delta q_i(t) = q_i(t+1) - q_i(t)$ are the points won by strategy i at time t. If one thinks of a continuous process $\delta q_i(t) = \dot{q}_i(t)\, \delta t$, where $\dot{q}_i(t)$ is the instantaneous "fitness" of strategy i, then the continuous time replicator dynamics equation is recovered keeping only the first terms in the $\delta t$ expansion:

$$\frac{\dot{p}_i(t)}{p_i(t)} = \Gamma \left( \dot{q}_i(t) - \sum_j p_j(t)\, \dot{q}_j(t) \right) \qquad (11)$$

4.2 Transient length
In everything that follows we will restrict our analysis to the case N = 101 and s = 2, and we will speak of the optimal value for memory length $m_o$ with reference to the value of m which minimizes $\sigma$ under this parameter choice [6].
Let us consider the problem of defining the correct values for $T_0$ and T in Eq. 8 above. The central question is: how long must the system be left evolving before it reaches the asymptotically stable dynamics? Fig. 2 plots the average $\sigma$ value based on the "deterministic" (i.e., $\Gamma = +\infty$) MG as a function of the time length T over which this average is taken, with a transient $T_0 = T$. As can be seen from the graph, the values used in the literature on the MG are generally sufficient to obtain a prediction correct to a few percentage points. However, two properties are worth noticing:

- The system approaches the asymptotic value from above, intuitively suggesting that the system "learns" over time to self-organize.
- For low values of m, in the "inefficient regime", and for high values of m, in the "random regime", the system reaches a stable dynamic quite fast. On the contrary, for values of m near the optimal value $m_o$, the system takes a longer time to self-organize.
6
Thechoicetosets=2isjustiedbythefactthatthesystemexhibitsthesamequalitativepropertiesforanys2,
highvaluesof thislearningruleapproachesthestandardone,andaccordingly,thetransientlength
issimilartotheonefoundinsuchcases. However,as decreases,thelengthgenerallyincreases. The
increaseismostdramaticforvaluesof mneartheoptimalvaluem
o
,andit progressivelydisappears
for higher values of m. The interpretation of such a result stems from the meaning of in terms
of the learningrule. Supposingnon trivialdynamicsfor m near m
o
, theparameter sets the time
scale onwhichsuch dynamicsare attained.
As an illustration, consider the following example. Let $r(t) = p_1(t)/p_2(t)$ be the ratio of the probabilities that an agent associates to his two strategies, and $\Delta q(t) = q_1(t) - q_2(t)$ the difference in their respective strengths. From Eq. 9 it follows that $r(t) = e^{\Gamma \Delta q(t)}$. Assuming that the difference in the two strategies' performance holds constant over time (an assumption which is generally true in the initial transient regime, where agents' behavior is basically random), we obtain $\Delta q(t) \propto t$: hence, from the equality above, a given difference in probability is obtained at a time which is inversely proportional to $\Gamma$.
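Under the constant-drift assumption $\Delta q(t) = c\,t$, the time at which the probability ratio reaches a given target $r^*$ is $t^* = \ln(r^*)/(\Gamma c)$, so halving $\Gamma$ doubles the waiting time. A tiny sketch of ours, with an illustrative drift c:

```python
from math import log

def time_to_ratio(r_target, gamma, c=1.0):
    """Time at which r(t) = exp(gamma * c * t) reaches r_target."""
    return log(r_target) / (gamma * c)

print(time_to_ratio(10.0, 1.0))  # baseline time scale
print(time_to_ratio(10.0, 0.5))  # halving gamma doubles the time
```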
In order to estimate the time scale over which the system's long-run properties are attained, we use the following procedure. Holding all the parameters and the initial conditions constant, the system volatility can be expressed as a function of both the transient phase duration and the time length over which it is averaged, i.e., $\sigma = \sigma(T, T_0)$. Starting from a reference time $T_r$ [7], we compute the mean volatility progressively doubling T and $T_0$, thus obtaining a series of values $\sigma_n = \sigma(2^n T_r, 2^n T_r)$. When the relative variation $|\sigma_n - \sigma_{n-1}| / \sigma_n$ falls below a fixed threshold, we stop and take the last computed value of $\sigma$ as an estimate of its asymptotic value. The corresponding time length $\hat{T}(\Gamma)$ will be an estimate of the time implied by the system to reach this asymptotic state. As can be seen in Fig. 6, the increase in $\hat{T}$ when $\Gamma$ is lowered is mainly concentrated around $m_o$, with shapes that suggest the presence of a regime discontinuity.
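The stopping rule just described can be sketched in a few lines; here we apply it to a synthetic volatility curve $\sigma(T) = \sigma_\infty + a/T$ (our own illustration, not the paper's data):

```python
def estimate_asymptote(sigma_of_T, T_ref, threshold=0.01, max_doublings=60):
    """Double the observation window until the relative change in the
    measured volatility falls below the threshold."""
    T = T_ref
    prev = sigma_of_T(T)
    for _ in range(max_doublings):
        T *= 2
        cur = sigma_of_T(T)
        if abs(cur - prev) / cur < threshold:
            return cur, T
        prev = cur
    return prev, T

# Synthetic curve: approaches sigma_inf = 25.0 from above as T grows.
sigma = lambda T: 25.0 + 1000.0 / T
value, T_hat = estimate_asymptote(sigma, T_ref=100)
print(value, T_hat)  # value close to 25, T_hat the estimated time scale
```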
4.3 Allocative efficiency

In order to analyze the asymptotic properties of $\sigma(m)$ for different $\Gamma$'s, we use the procedure just described regarding the calculation of $\hat{T}$, i.e., we let the system evolve until "stability" is reached [8]. The simulation results are plotted in Fig. 5. Interestingly, when $\Gamma$ decreases, the performance level of the system generally increases. Moreover, such increase is larger the lower the value of m, and it becomes negligible for $m \geq m_o$. The observed behavior is consistent with the idea that for high values of m, the system dynamics tends to be determined by the initial distribution of strategies among players, while players themselves have no opportunity to attain a higher performance by adjusting their behavior. Recall that an increasing m means an increasing number of possible strategies over which players may initially draw. For a fixed N, the "ecology" of drawn strategies becomes thinner as compared to the notionally available ones. That phenomenon, it turns out, prevents the system from self-organizing. Note that this property is quite robust and largely independent from the particular learning rule adopted. Our results, in fact, are perfectly in line with previous simulation studies in the high-m region.
[7] Note that the chosen value for $T_0$ is irrelevant as long as it is small compared to the typical time scale.
[8] Let us emphasize that the stable state does not imply point convergence to any state but simply long-run stability.
The picture is different in the low-m region in terms of aggregate efficiency. In fact, when m is small, the original learning rule ($\Gamma = \infty$) produces a "crowd" effect (corresponding to large groups of agents choosing the same side) which, due to homogeneity in the initial strategy endowments, prevents the system from attaining a high degree of efficiency [9]. On the contrary, the introduction of a probabilistic learning rule for the strategy choice acts like a brake that damps the amplitude of such correlated fluctuations. At the individual level, this corresponds to lower $\Gamma$'s, i.e., to higher degrees of "inertia", as agents update their probabilities more slowly. In other words, as $\Gamma$ decreases each agent behaves as if he were applying a sort of stochastic fictitious play approximation [e.g., Fudenberg and Levine (1998)] [10], with an implicit assumption of stationarity on the distribution of other agents' choices. If the whole population shares the same $\Gamma$, as in the present model, the assumption is, in a way, self-fulfilling: a decrease in $\Gamma$ makes the behavior of the population as a whole change at a slower pace. A slower probability updating at the individual level and the resulting more stable collective behavior, together, imply that $\sigma$ is a non-increasing function of the learning time scale $1/\Gamma$. In fact, if the system reaches a dynamical stability via an averaging procedure over past outcomes, increasing the time scale over which averaging occurs cannot rule out previously attainable equilibria.
However, note that in order to capture the long-run efficiency properties of the system for low values of $\Gamma$ it is necessary to let the population play until stability is reached, according to the procedure described above. Our results, in fact, were not captured by previous simulation studies (Cavagna et al., 1999). Since the latter were all performed with a fixed time length, their conclusion was that when $\Gamma$ is small enough, the system behavior resembles the behavior of a random system. This finding was in fact due to both the increase in the transient length and the purely random initial dynamics which occur when $\Gamma$ is decreased. Then, by fixing a time length, for small values of $\Gamma$ the simulations sample the system throughout the adjustment phase, and the system behavior perfectly mirrors that of a collection of agents who choose at random (indeed our results of a fixed-time simulation are plotted in Fig. 6 and are perfectly in line with the existing literature [11]).
As can be seen in Fig. 5, the performance attainable in the MG via a collective organization of agents with limited information and limited ability to choose is, in general, surprisingly high. In the following we show that the level of efficiency achieved in the double limit $\Gamma \rightarrow 0$ and $m \rightarrow 0$ actually equals that attainable with hypothetical perfectly informed and perfectly rational agents endowed with a greater flexibility in choice.
Consider for instance a collection of agents with the following characteristics: each agent is
assigned S = 2 strategies, and a vector of length 2 m
containing the probability p(h
m
) of playing
according to its rst strategy after theoccurrence of the historystring h
m
. Moreover, for each h
m ,
each agent knows the values of N
0 (h m ), N 1 (h m ) and N d (h m
) indicating respectively the number
of agents for which their strategies both prescribe to play 0, both to play 1, or to play dierently.
Assuming that the game structure and the amount of information available to agents is common
knowledgeandthattheagentsareperfectlyrational,theproblemcompletelyfactorizesand,foreach
9
Insomesense,onecaninterpretthe\crowd"eectasacollectiveformofoverreaction (Thaler,1993).
10
Notethatctitiousplayimpliesthataplayerbestrespondstotheobservedfrequencyoftheopponent'splay.
11
In otherwords, to discoverthe asymptoticproperties ofthe system, the limitsT ! 1 and ! 0have to be
history h_m, the optimal mixed behavior balances the two sides:

N_d(h_m) + N_1(h_m) − N_0(h_m) = 2 p(h_m) N_d(h_m)   (12)

i.e. making the average fraction of the population choosing a given side as near to N/2 as possible. This choice will produce a volatility σ² = N_d/4 ≃ N/8, which is roughly similar to what is obtained in the simulations shown in Fig. 5 in the low m, low β region^12. However, note that, as from Fig. 5, these fully rational, fully informed players are (on average) "beaten" by a set of "self-organizing" agents with memory m ≃ m_o, reaching a nearly double efficiency.
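The balancing logic of eq. (12), and the resulting volatility N_d/4 ≃ N/8, can be checked numerically. The following sketch is our own illustration (not the paper's code): it draws hypothetical random strategy tables, counts N_0, N_1 and N_d for each history, and applies the balance condition.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 101          # population size (odd)
m = 3            # memory length
n_hist = 2 ** m  # number of possible history strings

# Hypothetical draw: each agent gets two random lookup tables, so for each
# history we can count N0, N1 and Nd as defined in the text.
s1 = rng.integers(0, 2, size=(N, n_hist))
s2 = rng.integers(0, 2, size=(N, n_hist))

N0 = ((s1 == 0) & (s2 == 0)).sum(axis=0)  # both strategies say 0
N1 = ((s1 == 1) & (s2 == 1)).sum(axis=0)  # both strategies say 1
Nd = (s1 != s2).sum(axis=0)               # strategies disagree

# Eq. (12): the Nd "flexible" agents play 0 with probability p(h_m)
# chosen so that the expected number of 0-players equals N/2.
p = (Nd + N1 - N0) / (2 * Nd)
p = np.clip(p, 0.0, 1.0)

# Conditional variance of the attendance: Nd p (1-p), which is close to
# Nd/4 and hence, since Nd is about N/2, close to N/8.
var = Nd * p * (1 - p)
print(var.mean(), N / 8)  # the two numbers should be close
```

With random strategy tables N_d ≃ N/2 on average, so the conditional variance comes out near the N/8 benchmark quoted in the text.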
A final remark concerns the variance of the distribution of σ as a function of β for various m's, as shown in Fig. 7: when β decreases, the variance of σ decreases for any m. However, it remains three times greater for m = m_o, suggesting a stronger dependence of the asymptotic performance on the initial strategy assignment, which the system is not able to wash out. That is, significant path-dependence effects are present.
4.4 Informational efficiency
What we have been calling allocative efficiency basically highlights the collective ability of capturing the payoffs which the game notionally allows. A complementary issue regards the informational efficiency of the market process, i.e., the extent to which the future system outcomes are unpredictable or, in other words, the absence of any arbitrage opportunities. Thus, let us analyze the informational content of the binary string H of successive winning sides. Let p(0|h_l) be the probability that a 0 follows a given string h_l out of all the possible 2^l strings of length l (as depicted in Fig. 8).
For the original "deterministic" MG, the analysis performed in Savit et al. leads to the identification of two regimes: a "partially efficient regime" for m < m_o, in which p(0|h_l) ≃ .5 for all h_l as long as l ≤ m, and in which no informational content is left for strategies with memory less than or equal to the one used by the agents; and an "inefficient regime" for m > m_o, in which the distribution of p(0|h_l) is not flat, even for l ≤ m, meaning there are good strategies that might exploit the market signal to obtain differential profits. For l > m both regions show a non-trivial distribution p(0|h_l), with an increasing degree of "ruggedness" as l increases. Introducing some degree of behavioral randomness through the parameter β has the obvious effect of reducing the "ruggedness" of the distribution of p(0|h_l) (see Fig. 8).
In order to study the behavior of the system as β changes, we introduce two related quantities which can be used to characterize the informational content of the time series. The first is the conditional entropy S(l), defined as:

S(l) = − Σ_{h_l} p(h_l) Σ_{i∈{0,1}} p(i|h_l) log_2 p(i|h_l)   (13)

where the summation is intended over all the possible strings of length l, and p(h_l) is the frequency of a given string in the system history H. The maximum value S(l) = 1 is reached for a flat distribution p(0|h_l) = .5, and it corresponds to the impossibility of forecasting (in probability) the next outcome
^12 We are assuming ΔN = N_1(h_m) − N_0(h_m) < N_d(h_m). Notice that for random agents ΔN ∼ √N and N_d ≃ N/2.
[...] market" leads to the definition of a second quantity A(l):

A(l) = Σ_{h_l} p(h_l) max{ p(0|h_l), p(1|h_l) }   (14)

which is the average fraction of points won by the best strategy of memory l. This is a measure of the reward obtained by the best arbitrageur with memory l (whereas if no arbitrage opportunities are present, A(l) is equal to .5).
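Both quantities are straightforward to estimate from a simulated history. The following is our own minimal sketch (assuming base-2 logarithms in eq. (13), so that S(l) ≤ 1), which estimates S(l) and A(l) from a 0/1 sequence by counting which outcome follows each string of length l:

```python
import numpy as np
from collections import Counter

def info_measures(history, l):
    """Conditional entropy S(l) (eq. 13) and best-arbitrageur reward A(l)
    (eq. 14), estimated from a 0/1 sequence as described in the text."""
    pairs = Counter()
    for t in range(len(history) - l):
        h = tuple(history[t:t + l])          # the string h_l
        pairs[(h, history[t + l])] += 1      # outcome following h_l
    total = sum(pairs.values())
    strings = {}
    for (h, i), n in pairs.items():
        strings.setdefault(h, [0, 0])[i] = n
    S, A = 0.0, 0.0
    for h, (n0, n1) in strings.items():
        p_h = (n0 + n1) / total              # frequency of h_l in H
        for n in (n0, n1):
            if n:
                p = n / (n0 + n1)            # p(i | h_l)
                S -= p_h * p * np.log2(p)
        A += p_h * max(n0, n1) / (n0 + n1)   # best side after h_l
    return S, A

# A purely random history should give S(l) near 1 and A(l) near .5.
rng = np.random.default_rng(1)
H = rng.integers(0, 2, size=200000).tolist()
print(info_measures(H, 3))
```

On a perfectly predictable alternating history, by contrast, the same estimator gives S(1) = 0 and A(1) = 1, the fully "inefficient" extreme.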
Before analyzing the behavior of these quantities when β is varied, let us briefly consider, as a sort of ideal benchmark, the properties of a population composed of perfectly rational, perfectly informed agents with common knowledge of the strategy distributions. Not surprisingly, in these circumstances the problem factorizes for each past history and the dependence on m disappears. The history produced by such a system is a random series of 0s and 1s. Indeed, the number of agents choosing one side is distributed according to a binomial around N/2, with different widths for different h_m. In particular, this means that in such an extreme case the "memory" loses any predicting power and no arbitrage opportunity is left for agents with longer memory, i.e. no residual information is left in the time series, and the behavior of agents makes the market perfectly efficient from an informational point of view. In the last resort, there is nothing to be learned from any history, because agents know everything from the start and coordinate their mixed strategies accordingly. Under this assumption we expect S ≃ 1 and A ≃ .5.
Short of such an ideal case, where the market loses its coordinating role because agents ex ante generate the coordination "in their heads", let us consider, for example, a population of "random agents". Here, due to the unbalance in the initial strategy endowments, we expect a non-trivial structure to appear for every l; thus S < 1 and A > .5.
In Fig. 9 we plot S(l) and A(l) for histories generated with a value of m < m_o, in the "partially efficient regime". The effect of decreasing β shows up when l > m, but the information content for high l is never completely eliminated. The market becomes less efficient the larger the time scale l at which it is observed. In fact, it can be shown under very general assumptions that certain strings in the history are more abundant than others (Savit et al., 1998), and the long-range correlation that was responsible for the "crowd" effect at high β survives as a non-trivial structure in p(0|h_l) for high l. All this applies despite the fact that to an agent with memory l ≤ m the market appears perfectly efficient regardless of the β value.
For values of m > m_0, in the "inefficient phase", the effect is in some sense reversed. As can be seen in Fig. 10, the effect of decreasing β is again negligible for l ≤ m, but in the limit β → 0 the curve becomes flat for l > m. This last result deserves some comments: the flatness in l ≥ m means that no gain is achieved from inspecting the time series with a very long memory l ≫ m, because no more arbitrage opportunities are open to a longer-memory agent than to the best possible agent of memory m. The market can be considered to be, again, partially efficient in the sense that it generates an upper bound on the maximal attainable arbitrage capability which does not depend on the arbitrageur's memory.
The particular form of the conditional entropy in Fig. 10 suggests that in the limit β → 0 the system can be described as a Markov chain of memory m.^13 The result can be explained [...] relevant, and in the limit β → 0 only infinite differences become relevant. Putting it another way, the frequency of victories of the various strategies becomes constant, implying the formation of a static hierarchical structure in the strategy space which, in the end, is responsible for the Markov character of the resulting history. The appearance of "best strategies" in the m > m_o region is well revealed by the plot of the average score of the best strategy versus the average points scored by the player (see Fig. 11): a correlation in fact appears between the performance of a player and the performance of its best strategy. Moreover, in the m ≃ m_o region, a sub-population showing the same kind of high correlation coexists with another group that presents no correlation, composed of agents possessing two equally performing strategies.
Conversely, only in the low m region does no strategy end up being preferable to others, and no player is bound to lose due only to his initial strategy endowment.
Notice, however, that equivalence between strategies does not necessarily imply equivalence in the agents' performances. This is highlighted by the plot of the variances and the supports of the points distribution for different values of β and m in Fig. 12. Interestingly, only for low m and low β does equivalence in strategy performance imply a relatively uniform distribution of points over the population. In the other parameter regions, learning does not eliminate performance heterogeneity over the population. Loosely speaking, the market self-organizes over an ecology of "mental models" and players, entailing the long-run coexistence of relative "winners" and "suckers".
5 Conclusions and Outlook
One of the central questions we addressed in this work is the extent to which market dynamics generated by arbitraging activities, as represented in Minority Games, display generic properties, independent from particular behavioral assumptions. Our answer is largely negative: simple variations in the agents' learning algorithms, we show, yield important modifications in the asymptotic properties of the system.
More specifically, we show that less sensitivity to marginal information, i.e., more inertia in the learning algorithm, entails improved long-run collective performances, although at the expense of longer adjustment phases. Together, performance asymmetries across agents, as measured by the variance (or, analogously, the support) of the earnings distribution over the population, fall as inertia in the agents' behavior increases.
In general, some degree of randomness helps in improving the allocative and informational efficiencies of the market, as defined above. The major effect of randomness is that it acts like a brake on the system dynamics, thus preventing groups of players who densely populate the strategy space from acting in a strongly correlated way, and hence from producing "crowd" effects which worsen the system performance.^14
^13 [...] their strategies on the basis of all the game outcomes starting from the beginning of the simulation (however, see the Appendix for an analysis of the system properties when a time-decaying factor is introduced).
^14 The introduction of randomness in individual behavior is indeed only one of the possible ways to maintain behavioral heterogeneity in the population. For instance, the same effect has been obtained in De Cara et al. (2000) by substituting the "global" evaluation of strategies on the system history H with a "personal" evaluation in which each agent uses [...]
[...] on different ecologies of strategies (as captured by the parameters z = 2^m/N and β respectively).
Indeed, one of our major conclusions, which refines over previous results in the Minority Game literature, is that market efficiency, in the complementary definitions proposed above, is only achieved in correspondence with an "optimal degree" of heterogeneity, whether in the agents' decision behavior or in their underlying sets of beliefs. Moreover, our results suggest that more rationality, as approximated here by a greater ability of the agents to track novel environmental information, may well lower average performances (an analogous result is obtained in a different setup by Brock and Hommes (1998)).
The general sensitivity of system dynamics to particular learning algorithms also indicates a natural way forward, beyond the exercises presented in this work: experimenting with cognitively less demanding learning rules. So, for example, it would be interesting to explore the properties of pure reinforcement learning, whereby agents update only the strategies they play. That would also set a "zero-level" model in terms of degrees of required information and cognitive abilities, somewhat at the extreme opposite to the models studied so far in the Minority Game literature. And, somewhere in between, one might explore more refined learning models such as that in Easley and Rustichini (1999).
Moreover, beyond the strict setup of Minority Games so far, the impact of phenomena of social imitation still awaits to be studied.^15 And, more generally, the robustness of the foregoing conclusions ought to be checked in less "reduced form", institutionally richer models, such as artificial markets of the type outlined in Marengo and Tordjman (1996), Arthur et al. (1997) and Chiaromonte and Dosi (1998) [e.g., Kirman (1999) for a review].
Finally, a complementary line of inquiry regards the analysis of the behaviors and learning of human subjects under experimental settings isomorphic to the market interactions formalized above.
In the last resort, all these latter exercises, together with the results presented here, ought to be considered as adding some pieces of evidence to the much broader effort aimed at identifying the variables which determine a "universality class", if any, of market processes involving coordination cum heterogeneity, as distinguished from those characteristics which strictly depend upon specific distributions of beliefs and learning rules. Were our results confirmed in further studies, they would add strength to the conjecture that efficient coordination might not stem from the adherence of populations of agents to Nash equilibrium behaviors, but rather emerge out of persistent forms of heterogeneity in beliefs and behaviors within the population.
6 Appendix
Many authors, especially in the experimental literature [e.g., Erev and Roth (1998)], add to the description of the learning process one more parameter, connected with the idea that agents weigh more heavily the information received in the recent history as compared to the one coming from far back in the past. This parameter typically takes the form of a decay factor. If π_i(t) are the points scored by strategy i at time t and α (0 < α < 1) the information decay factor, the updating rule for the strengths becomes:

q_i(t+1) = α q_i(t) + π_i(t)   (15)

and the associated updating rule for the probabilities, with Δq_i(t) = q_i(t+1) − q_i(t):

p_i(t+1) = p_i(t) e^{β Δq_i(t)} / Σ_j p_j(t) e^{β Δq_j(t)}   (16)

^15 [...] correlation among agents.
The effect of introducing such a "memory leakage" is twofold. First, it puts an upper limit on the maximal strength any strategy can reach, namely 1/(1 − α). Second, in the presence of no information flux, the equiprobability between strategies is steadily restored. This effect implies that if one takes the limit α → 0 keeping the value of β constant, the system will converge to a collection of random agents. In turn, this implies that agents, loosely speaking, have to collect a larger amount of information before they start behaving as a self-organized system.
The effect of introducing "forgetting" in the learning rule is easily understood: if the agents forget more rapidly than they learn, they are always bound to less efficient behavior. Indeed, as can be seen from Fig. 13, if the value of α is decreased, the efficiency of the system is proportionally reduced.
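The restoration of equiprobability under zero information flux can be verified directly. The following sketch is our own (it assumes, as in our reading of eq. (16), that the exponent uses the strength increments Δq_i(t)); it iterates the two rules with zero payoffs from probabilities consistent with the accumulated strengths:

```python
import math

def update(q, probs, payoffs, alpha, beta):
    """One step of the appendix learning rule: eq. (15) decays the strategy
    strengths q_i; eq. (16) reweights the choice probabilities p_i using
    the strength increments."""
    new_q = [alpha * qi + pi for qi, pi in zip(q, payoffs)]   # eq. (15)
    w = [p * math.exp(beta * (nq - qi))                       # eq. (16)
         for p, nq, qi in zip(probs, new_q, q)]
    z = sum(w)
    return new_q, [wi / z for wi in w]

beta, alpha = 1.0, 0.9
q = [3.0, 0.0]
# Start from probabilities consistent with the accumulated strengths.
z = sum(math.exp(beta * qi) for qi in q)
p = [math.exp(beta * qi) / z for qi in q]

# With no information flux (zero payoffs) the strengths decay geometrically
# and equiprobability between the strategies is steadily restored.
for _ in range(500):
    q, p = update(q, p, [0.0, 0.0], alpha, beta)
print(q, p)   # q near [0, 0], p near [0.5, 0.5]
```

Note also that with a constant unit payoff, the recursion q_i(t+1) = α q_i(t) + 1 converges to 1/(1 − α), the upper bound on the strength mentioned in the text.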
References
[1] Arthur, W.B. (1994), Inductive reasoning and bounded rationality, Am. Econ. Review Papers and Proc., vol. 84, 406-412.
[2] Arthur, W.B., J.H. Holland, B. LeBaron, R. Palmer and P. Taylor (1997), Asset Pricing under Endogenous Expectations in an Artificial Stock Market, in Arthur, W.B., S.N. Durlauf, and D. Lane (Eds.), The Economy as an Evolving Complex System II, Reading, Mass., Addison-Wesley.
[3] Brock, W.A. and C.H. Hommes (1998), Heterogeneous beliefs and routes to chaos in a simple asset pricing model, Journal of Economic Dynamics and Control, vol. 22, 1235-1274.
[4] Camerer, C. and T. Ho (1999), Experience Weighted Attraction Learning in Games: A Unifying Approach, Econometrica, 67, 4, p. 827.
[5] Cavagna, A., J.P. Garrahan, I. Giardina and D. Sherrington (1999), A thermal model for adaptive competition in a market, Phys. Rev. Lett., vol. 83.
[6] Challet, D. and M. Marsili (1999), Phase Transition and Symmetry Breaking in the Minority Game, Phys. Rev. E, vol. 60.
[7] Challet, D. and Y.-C. Zhang (1997), Emergence of cooperation and organization in an evolutionary game, Physica A, 246, 407.
Challet, D. and Y.-C. Zhang (1998), On the Minority Game: Analytical and Numerical Studies, Physica A, 256, 514.
[8] Chiaromonte, F. and G. Dosi (1998), Modeling a Decentralized Asset Market: An Introduction to the Financial "Toy Room", IIASA, Laxenburg, Austria, Interim Report. (Available also as [...])
[9] [...], A Behavioral Perspective, in: R. Jarrow et al. (eds.), Handbooks in OR & MS, Amsterdam, Elsevier Science.
[10] de Cara, M.A.R., O. Pla and F. Guinea (1999), Competition, efficiency and collective behaviour in the "El Farol" bar model, The European Physical Journal B 10.
[11] de Cara, M.A.R., O. Pla and F. Guinea (2000), Learning, competition and cooperation in simple games, The European Physical Journal B 13.
[12] Easley, D. and A. Rustichini (1999), Choice without beliefs, Econometrica, 67, pp. 1157-1184.
[13] Erev, I. and A. Roth (1998), Predicting How People Play Games: Reinforcement Learning in Experimental Games with Unique, Mixed Strategy Equilibria, Am. Econ. Review, 88, 848-881.
[14] Fudenberg, D. and D.K. Levine (1998), The Theory of Learning in Games, Cambridge, Mass., The MIT Press.
[15] Johnson, N.F., M. Hart and P.M. Hui (1999), Crowd effects and volatility in a competitive market, Physica A, vol. 269.
[16] Kirman, A. (1993), Ants, Rationality and Recruitment, Quarterly Journal of Economics, 108, 137-156.
[17] Kirman, A. (1999), Aggregate Activity and Economic Organization, Revue Europeenne des Sciences Sociales, 38, 189-230.
[18] LeBaron, B. (2000), Agent-based computational finance: Suggested readings and early research, Journal of Economic Dynamics and Control, 24, pp. 679-702.
[19] Marengo, L. and H. Tordjman (1996), Speculation, Heterogeneity and Learning: A Simulation Model of Exchange Rate Dynamics, Kyklos, vol. 49, 407-437.
[20] Meyer, D.J., Van Huyck, J., Battalio, R. and Saving, T. (1992), History's role in coordinating decentralized allocation decisions: laboratory evidence on repeated binary allocation games, Journal of Political Economy, 100, pp. 292-316.
[21] Ochs, J. (1990), The coordination problem in decentralized markets: An experiment, Quarterly Journal of Economics, 105, pp. 545-559.
[22] Ochs, J. (1995), Coordination problems, in J.H. Kagel and A.E. Roth (Eds.), Handbook of Experimental Economics, Princeton: Princeton University Press, pp. 195-251.
[23] Savit, R., R. Manuca and R. Riolo (1998), Adaptive Competition, Market Efficiency and Phase Transition, Phys. Rev. Lett., vol. 82.
[24] Schelling, T.C. (1978), Micromotives and Macrobehavior, New York, W.W. Norton and Company.
history  action    history  action
000      1         100      1
001      0         101      0
010      0         110      1
011      1         111      0

Table 1: An example of a strategy with m = 3.
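A strategy such as the one in Table 1 is just a lookup table from the 2^m histories to an action. The following is a deliberately stripped-down sketch of one round of the game, of our own making (one random strategy per agent rather than the model's S = 2, and no learning), to show how the table and the minority rule interact:

```python
import random

random.seed(2)
N, m = 101, 3   # odd population, memory 3 as in Table 1

# A strategy maps each of the 2^m histories to an action in {0, 1}.
strategies = [[random.randint(0, 1) for _ in range(2 ** m)]
              for _ in range(N)]

def play_round(history_bits, strategies):
    """history_bits: the last m winning sides, most recent last."""
    idx = int("".join(map(str, history_bits)), 2)   # e.g. [0,1,1] -> row 011
    actions = [s[idx] for s in strategies]
    n1 = sum(actions)
    winner = 1 if n1 < len(strategies) - n1 else 0  # minority side wins
    payoffs = [1 if a == winner else 0 for a in actions]
    return winner, payoffs

history = [random.randint(0, 1) for _ in range(m)]
winner, payoffs = play_round(history, strategies)
print(winner, sum(payoffs))   # the winning (minority) side and its size
```

Since N is odd no tie can occur, and the number of rewarded agents is always strictly below N/2, which is precisely what makes full allocative efficiency unattainable.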
           low z                   high z
low β      high A.E., high I.E.    partial A.E., partial I.E.
high β     low A.E., high I.E.     partial A.E., low I.E.

Table 2: System properties: a summary (A.E. and I.E. stand for Allocative and Informational Efficiency).
Figure 1: The volatility σ(z), with z = 2^m/N, for s = 2 and different values of N (101, 201, 301, 401, 801) and m.
Figure 2: The mean σ along the run length T for different m's (m = 2, ..., 10). The points are averages over a sample of independent runs.
Figure 3: σ along the run length T for different β's (panels: m = 2, 6, 12). The points are averages over a sample of 30 independent runs with a transient time T_0 = T.
Figure 4: A (rough) estimate of the time T̂ required to reach the stable asymptotic value with an [...]. (Axes: T̂ versus m; curves for β = 1, .2, .04, .01, .0025.)
Figure 5: Volatility σ for s = 2, N = 101 and various m's and β's (β = 5, 1, .2, .04, .01). The runs are performed by doubling the time length T until the last two values obtained differ by less than 1%.
Figure 6: The volatility σ for s = 2, N = 101 and various m's and β's (β = .0001, .001, .01, .1, 1). The runs are performed with a fixed time length T = T_0 = 10000. When β → 0 the system approaches a collection of randomly playing agents.
Figure 7: Variance of the distribution of σ over a sample of 50 independent runs (m = 2, 3, 4, 5, 6, 8, 10, plotted against 1/β). As β becomes small, the point m ≃ m_0 maintains a significantly larger variance.
Figure 8: Probability p(0|h_l) of obtaining a 0 following a given binary string h_l in the system history, for m = 3 and l = m + 1 = 4, at β = 10 and β = .04. When β is reduced the distribution "flattens" and any structure tends to disappear.
Figure 9: Conditional entropy S(l) (left) and arbitrage opportunity A(l) (right) as a function of the time depth l, for m = 3 < m_o and β = 5, 1, .2, .04, .02, .005, .0025.
Figure 10: Conditional entropy S(l) (left) and arbitrage opportunity A(l) (right) as a function of the time depth l, for m = 6 > m_o and β = 5, 1, .2, .04, .02, .005, .0025.
Figure 11: Plot, for each player, of the scoring rate of his best strategy against his own winning rate, for a population of 101 players over 30 independent runs with β = .04 (panels: m = 3, 5, 6, 12).
Figure 12: Variance (rectangle) and support (straight line) of the scored-points distribution in a population of N = 101 with s = 2, over 30 independent runs, plotted against 1/β (panels: m = 3, 5, 6, 12). Notice that while in the high β simulations the distributions are similar in width for any m, when β is reduced the distributions in the low m region become markedly narrower.
Figure 13: σ as a function of the run length T for different values of β and α (β = 1, .1, .01; α = .9, .99, .999, .9999, .99999). The simulations are performed with m = 6, where a greater sensitivity of the transient time length to the learning parameters is observed.