Adaptive Learning and Emergent Coordination in Minority Games [1]

G. Bottazzi**, G. Devetag***, G. Dosi**

** S. Anna School of Advanced Studies, via G. Carducci 40, Pisa, Italy
*** Department of Computer and Management Sciences, University of Trento, Via Inama 5, 38100 Trento, Italy
Abstract

The work studies the properties of a coordination game in which agents repeatedly compete to be in the population minority. The game reflects some essential features of those economic situations in which positive rewards are assigned to individuals who behave in opposition to the modal behavior in a population. Here we model a group of heterogeneous agents who adaptively learn, and we investigate the transient and long-run aggregate properties of the system in terms of both allocative and informational efficiency. Our results show that, first, the system's long-run properties strongly depend on the behavioral learning rules adopted and, second, adding noise at the individual decision level, and hence increasing heterogeneity in the population, substantially improves aggregate welfare, although at the expense of a longer adjustment phase. In fact, the system achieves in this way a higher level of efficiency than that attainable by perfectly rational and completely informed agents.
JEL codes: G1, D8, C6
Keywords: minority game, coordination, speculation, adaptive learning, market efficiency, emergent properties.
[1] Comments on a previous draft by Cars Hommes and Nicolas Jonard are gratefully acknowledged.
1 Introduction

This work investigates the dynamic properties of minority games under different forms of heterogeneity amongst agents and different learning procedures.
In the game, first introduced by Challet and Zhang (1997, 1998), a population of N agents (where N is an odd number) must each simultaneously and independently choose between two sides, say 0 and 1. The side chosen by the minority of the agents is the winning one, each of its members gaining a fixed positive reward, while all members on the majority side obtain a null payoff.
Formally, the model is an N-person coordination game with multiple asymmetric Nash equilibria in pure strategies, and a unique symmetric mixed strategy equilibrium. In the following, we assume a large population, and we investigate the conditions under which repeated interaction amongst players causes some form of aggregate self-organization to emerge.
The game, notwithstanding its extreme simplicity, does capture some basic features of quite a few processes of social and economic interaction whereby orderly coordination rests upon the existence or the emergence of some behavioral heterogeneity in the relevant population.
Some coordination problems of this kind have long been discussed in Schelling's seminal work on those patterns of "sorting and mixing" where order at an aggregate level might find its roots in persistent micro-adjustments within heterogeneous populations (Schelling, 1978). They also bear a close resemblance to the "El Farol" problem devised by Brian Arthur (1994). In this game, a number of agents must independently decide each night whether or not to attend a bar called "El Farol". Each agent receives a positive utility from attending the bar as long as it is not too crowded; otherwise he prefers to stay home. Somewhat similar coordination problems are involved in experimental market entry games [e.g., Ochs (1990), Meyer et al. (1992), Ochs (1995) and references therein], where a group of N players must decide at each stage whether or not to enter one or more markets, each having a fixed capacity k, with k < N.
Finally, the model captures some basic features of speculation on financial markets, which has indeed been the primary concern of minority-game modeling so far. A classic reference in this respect is Keynes' "Beauty Contest" metaphor, where the payoff to an individual player does not stem from the accuracy of the appreciation of the intrinsic beauty of the various contest candidates but rather from guessing the guesses of the ensemble of the other evaluators.
Strictly speaking, the beauty contest metaphor crucially involves positive feedback investment strategies, in the language of technical trading in finance, since the payoff is based on a majority rule, while minority games (MG, hereafter) fundamentally address those aspects of speculation dynamics involving activities of arbitrage against average market behaviors. These negative feedback strategies characterize agents trying to infer some, actual or imagined, structure in the history of collective interactions and, through that, trying to "beat the market", that is, "beat the majority view", by arbitraging against it. In the literature on chartist rules, this roughly corresponds to contrarian strategies [e.g., in the behavioral finance literature, De Bondt and Thaler (1995)].
In this work we shall precisely address the collective outcomes of such interactions in the MG setup. Are orderly aggregate coordination and absence of arbitrage opportunities generic properties of minority-type market processes, independently of any further specification of microeconomic behaviors? Or, conversely, do finer details of the ecology of agent populations (such as their sheer size) matter?
Previous results on the MG based on adaptive agents with deterministic decision rules [e.g., Challet and Zhang (1997, 1998); Savit et al. (1998)] do suggest that diverse degrees of market efficiency fundamentally depend upon some underlying heterogeneity in the population. Here we explore the more general case whereby heterogeneous agents, endowed with memories of variable length, are allowed to adaptively learn. Interestingly, first, efficient coordination turns out robustly to be an emergent property resting on an ecology of diverse agents who do not play Nash equilibrium strategies. Second, we show that collective efficiency is monotonic neither in the rational sophistication of the agents nor in the information they are able to access. Rather, again, it crucially depends on the ecology of behaviors over the population.
In Section 2 we briefly illustrate the basic features of the game, study the properties of its equilibria and introduce a notion of adaptive strategy. Section 3 discusses the role of heterogeneity for collective dynamics under the assumption of a random distribution of strategies over the population and adaptive deterministic behaviors. In Section 4 we introduce a probabilistic learning model and study both the transient and limit properties of the dynamics. In particular, we analyze the allocative and informational efficiencies, which we shall define below, of the system under different degrees of computational complexity of the purported agents and different parameterizations of the learning rule. In general, our results suggest that it is heterogeneity in the population, rather than individual computational abilities, which mainly accounts for the observed aggregate properties.
2 The Minority Game
2.1 The Baseline Game-theoretic Framework
The minority game is played by a group of N players, where N must be an odd number. In each period of the stage game, each player must choose privately and independently between two actions or sides, say 0 and 1. The payoff $\pi_i$ for $i \in \{0,1\}$ is the same for all N players and is equal to

$$\pi_i = \begin{cases} 1 & \text{if } n_i \leq (N-1)/2 \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

where $n_i$ is the number of players choosing side i. Each player is rewarded with a unitary payoff whenever the side he chooses happens to be chosen by the minority of the players, while players on the majority side get nil.
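As an illustrative sketch (ours, not part of the original paper; the function name is our own), the payoff rule of Eq. 1 can be written in a few lines of Python:

```python
def minority_payoffs(choices):
    """Given a list of 0/1 choices (odd length N), return the payoffs
    (pi_0, pi_1) of the two sides: the minority side wins (Eq. 1)."""
    N = len(choices)
    assert N % 2 == 1, "N must be odd"
    n1 = sum(choices)  # number of players choosing side 1
    n0 = N - n1        # number of players choosing side 0
    # A side pays 1 iff it was chosen by at most (N-1)/2 players
    pi0 = 1 if n0 <= (N - 1) // 2 else 0
    pi1 = 1 if n1 <= (N - 1) // 2 else 0
    return pi0, pi1

print(minority_payoffs([0, 0, 1]))  # side 1 is the minority: (0, 1)
```

Since N is odd, exactly one side is in the minority in every round.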
It is easy to see that the game has $\binom{N}{(N-1)/2}$ asymmetric Nash equilibria in pure strategies, in which exactly (N-1)/2 players choose either one of the two sides. Clearly, under pure-strategy equilibrium play, the payoffs for the two "parties" are quite different. Players belonging to the minority side are rewarded a fixed positive payoff, while those on the majority side earn nothing. The pure strategy equilibria, hence, are not strict, because players on the majority side are just indifferent between sticking to equilibrium play and deviating. The game also presents a unique symmetric mixed-strategy Nash equilibrium, in which each player selects the two sides with equal probability 1/2.
The payoff asymmetry implied by the pure strategy Nash equilibria is likely to rule out simple precedent-based solutions of the underlying coordination problem. The goodness of the achieved coordination can be easily measured by the average size of the winning minority. The further this size is from (N-1)/2, the higher is the amount of money which is, so to speak, left on the table, hence the lower the resulting aggregate welfare. Note that under mixed strategy equilibrium play, individual expected payoff is equal to .5 in each period of the stage game, and the group payoff follows a binomial distribution with mean equal to N/2 and a variance of N/4. The measure of variance is indeed an equivalent measure of the degree of efficiency achieved in a population: the higher the variance, the higher the magnitude of fluctuations around the mixed strategy Nash equilibrium and the corresponding aggregate welfare loss.
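A quick sanity check of this benchmark (our own sketch, with illustrative parameter values): simulating N symmetric random players, the attendance on one side should have mean close to N/2 and variance close to N/4.

```python
import random

random.seed(0)
N, T = 101, 20000  # population size and number of rounds (illustrative)

# Under the symmetric mixed equilibrium each player flips a fair coin,
# so attendance on side 1 is Binomial(N, 1/2): mean N/2, variance N/4.
counts = [sum(random.random() < 0.5 for _ in range(N)) for _ in range(T)]
mean = sum(counts) / T
var = sum((c - mean) ** 2 for c in counts) / T
print(mean, var)  # roughly 50.5 and 25.25
```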
2.2 The Behavioral Foundations of the Minority Game: Adding Beliefs
Let us start by discussing the nature of the game equilibria when players choose their actions on the basis of idiosyncratic (i.e., agent-specific) beliefs about other players' aggregate behavior, and try to exploit the latter to their advantage. This perspective, which amounts to considering agents endowed with inductive rather than deductive rationality, has been adopted in the minority literature including the El Farol model (Arthur, 1994), and it is common to multi-agent accounts of financial markets [e.g., Arthur et al. (1997); Brock and Hommes (1998); LeBaron (2000) and references therein].

One way to model beliefs in this setting, which has traditionally been used in previous simulation studies on the MG, is to endow players with sets of adaptive strategies, where a strategy may be defined as a function mapping a particular sequence of observed past outcomes up to a certain period to an action to choose in the next period. In the following we show indeed that an explicit account of this form of strategic behavior does not change the set of mixed strategy Nash equilibria of the game.
Consider a group of N agents who play the MG with a payoff function as from Eq. 1 [2]. The only information made available after each round is the winning side (0 or 1). We define the market information as the history H of play, i.e., a binary string specifying which side has won in each period of the stage game. We also define a parameter m as the portion of the past history H that players retain in memory. So, if m = 3, players' strategies will be based only on the outcomes observed in the last 3 rounds of the game.

An adaptive strategy may be defined as a prescription on the action to take in the next round of play (i.e., to choose 0 or 1) provided that a particular history (i.e., a particular sequence of m bits) has been observed up to that point. For example, in the case in which m = 3, an example of strategy is shown in Table 1.

The history columns specify all the possible sequences of aggregate outcomes (i.e., of winning sides) in the last m periods; the action columns specify which action to choose in the next round in correspondence to each particular sequence observed [3]. Given a certain value of the parameter m, the total number of strategies that can be generated is equal to $2^{2^m}$, corresponding to the $2^{2^m}$ ways of assigning either action to all the $2^m$ possible binary strings of length m.

[2] Various modifications of this payoff function have been proposed by Challet and Zhang and by De Cara et al. (1999). We stick to the original one for its simplicity and because the essential features of the model are highly insensitive to the proposed variations.
[3] Note again that this definition of strategy differs considerably from the notion of strategy as a complete plan of action; cf. Marengo and Tordjman (1996).
Given the foregoing definition of strategies, we study, as a first important benchmark, the equilibria associated with the "homogeneous" game where all the players have access to the full set of strategies. Moreover, rather than considering the dynamics of the history string h generated by the successive players' choices, we begin by focusing on the static case, requiring that all the strings h can appear with equal probability [4]. Under this hypothesis, despite the high number of strategies, we show that a unique mixed strategy Nash equilibrium exists, by which players assign equal probability to choosing either side, 1 and 0, regardless of past history. However, given the above definition of strategy, there are actually infinitely many ways in which players may end up choosing sides with equal probability.
In order to clarify the analysis that follows, let us introduce some notation. A pure strategy $s \in S_l$ is a mapping $H_l \rightarrow \{0,1\}$ from the set of binary strings of length l to the set of actions. Consequently, a mixed strategy $m = \{x_\sigma, \sigma \in S_l \mid \sum_{\sigma \in S_l} x_\sigma = 1\}$ can be thought of as a map $H_l \rightarrow [0,1]$ which associates to each binary string $h \in H_l$ the probability $m(h) = \sum_{\sigma \in S_l} x_\sigma\, s_\sigma(h)$ that the action following it be 1. Let $\Sigma_l$ be the $2^{2^l}$-dimensional mixed strategy simplex. For what follows, it is important to notice that the actual mapping between the mixed strategy space and the space $[0,1]^{2^l}$ of maps from $H_l$ to $[0,1]$ is many-to-one and surjective, i.e., there are many ways of building up the same mixed strategy starting from the pure strategy set, and each mixed strategy can be realized in at least one way.
Consider a population of N (odd) agents. If $\mathbf{m} \in \Sigma_l^N$ is a mixed strategy profile (the vector composed by the players' mixed strategies), we denote the mixed strategy of player i as $m_i$ and the vector of mixed strategies of all players except i as $m_{-i}$. The mixed strategy profile defines, for each binary string, the population probability of playing 1 (the expected frequency of players playing action 1) associated to a given binary string:

$$\bar{m}(h) = \frac{1}{N} \sum_{n=1}^{N} m_n(h) = \sum_{j \in S_l} P_j(\mathbf{m})\, s_j(h) \qquad (2)$$

where $P_j(\mathbf{m})$ is the expected probability of playing strategy j (the frequency with which this strategy will be played).
The payoff function in Eq. 1 defines the expected payoff of player i given the past history h and the complete mixed strategy profile $\mathbf{m}$:

$$U(m_i; \mathbf{m})(h) = m_i(h)\,(1 - \bar{m}(h)) + \bar{m}(h)\,(1 - m_i(h)) \qquad (3)$$

and we say that a mixed strategy $m_1$ is superior to $m_2$ if, given the mixed strategy profile $\mathbf{m}$, it is $U(m_1; \mathbf{m})(h) \geq U(m_2; \mathbf{m})(h)$, $\forall h \in H_l$. This definition allows one to discuss equilibria in a way similar to what is done in more canonical population games.
Theorem. The strategy profile $\mathbf{m}$ is a Nash equilibrium iff $m_i(h) = 1/2$, $\forall m_i \in \mathbf{m}$, $\forall h \in H_l$.

In other words, the profile is an equilibrium if and only if the players' mixed strategies assign probability one half to picking either side, 1 and 0, irrespective of the past history, i.e., when all the players are perfectly symmetric random players. Notice, however, that the ways of realizing such a mixed strategy are infinite. Hence, the actual number of mixed strategy Nash equilibria is also infinite.

[4] This prescription is introduced for purposes of analytical tractability and stems from a symmetry consideration.
Proof. In order to demonstrate the assertion, we proceed by identifying the i-th player's best replies to a given mixed strategy profile $m_{-i}$ for the rest of the population. For this purpose, it is convenient to express the payoff in Eq. 3 isolating the contribution of the i-th player's strategy from that of the rest of the population:

$$U(m_i; m_{-i})(h) = \frac{N+1}{N}\, m_i(h) + \frac{N-1}{N}\, m_{-i}(h) - \frac{2}{N}\, m_i(h)^2 - \frac{2(N-1)}{N}\, m_i(h)\, m_{-i}(h) \qquad (4)$$

and $m_i(h)$ has to be chosen in order to maximize the foregoing expression. The solution depends on the value of $m_{-i}(h)$, and the quadratic form of Eq. 4 allows one to find it immediately. Defining

$$p^*(h) = \frac{N+1 - 2(N-1)\, m_{-i}(h)}{4} \qquad (5)$$

the solution reads

$$\hat{m}_i(h) = \begin{cases} 0 & p^*(h) \leq 0 \\ 1 & p^*(h) \geq 1 \\ p^*(h) & \text{otherwise} \end{cases} \qquad (6)$$

The same procedure can be repeated for each string h to obtain the best reply mixed strategy to $m_{-i}(h)$. In order to obtain the set of Nash equilibria, we have to find strategies that belong to the set of mutual best replies, i.e., $\hat{m}_i(h) = m_{-i}(h)$, $\forall h \in H_l$. Inspecting Eq. 6, it turns out that this can happen only when $0 < p^*(h) < 1$ and the following condition is satisfied:

$$m_{-i}(h) = \frac{N+1 - 2(N-1)\, m_{-i}(h)}{4} \qquad (7)$$

The solution is then $m_{-i}(h) = 1/2$, implying $\hat{m}_i(h) = 1/2$.

Q.E.D.
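The fixed point of Eq. 7 can be checked numerically; a minimal sketch of ours (function names are hypothetical), solving the interior condition directly rather than iterating, since the raw best-reply map is steep and oscillates:

```python
def best_reply(m_other, N):
    """Best reply of Eqs. 5-6: p*(h) clipped to [0, 1]."""
    p_star = (N + 1 - 2 * (N - 1) * m_other) / 4
    return min(1.0, max(0.0, p_star))

def fixed_point(N):
    """Solve m = (N + 1 - 2(N-1) m) / 4 for the mutual best reply."""
    return (N + 1) / (2 * N + 2)

N = 101
m_star = fixed_point(N)
print(m_star)                 # 0.5 for any odd N
print(best_reply(m_star, N))  # the best reply to 1/2 is again 1/2
```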
3 The Minority Game with Heterogeneous Agents
One of the interesting questions one may explore in the MG framework is whether and how the presence of heterogeneity among players' beliefs may impact on the system's degree of self-organization around the Nash equilibria of the game.

In line with previous simulation studies by Challet and Zhang, one may introduce heterogeneity in the population via a diversity of strategies. Given a certain value of m, each player is initially endowed with a number s of strategies which are randomly drawn from a common pool. Hence, although both m and s are identical over the population, heterogeneity arises from the random initial strategy assignments.
In the course of play, each active strategy i is characterized by a value $q_i(t)$, which indicates the number of points the strategy has accumulated up to time t: at the end of each round, all the strategies that correctly predicted the outcome (i.e., prescribed the choice of the side which ex post resulted the winning side) are assigned one point each.

Given the initial strategy assignment and the updating rule just described, players' behavior at each round of the stage game is completely deterministic, as each agent picks, among the strategies he possesses, the one with the highest number of accumulated points.

Note that in such a framework, however, agents' heterogeneity is only introduced via the initial assignment of strategies to players and does not stem from different "personal histories". If, for example, the same strategy i is initially assigned to two different players, the value of $q_i(t)$, which determines the strength of that strategy at time t, will necessarily be the same for both players, as it will only depend on the number of times that strategy has made a correct prediction, whether it was actually played or not.
In order to evaluate the population's performance, it is necessary to introduce a measure of allocative efficiency. A natural candidate is provided by the average number of players belonging to the winning party, i.e., the average number of points earned by the whole population [5]. In accordance with the previous literature, as a measure of allocative efficiency we compute a quantity associated to the foregoing one, namely the mean squared deviation from the half population, $\sigma$. Let N be the number of agents, and $N_0(t)$ the number of agents choosing side 0 at time t; then in a repeated game of duration T the mean squared deviation is computed as

$$\sigma = \frac{1}{T} \sum_{\tau=0}^{T} \left( N_0(\tau) - \frac{N}{2} \right)^2 \qquad (8)$$

Clearly, the lower the value of $\sigma$, the higher the system efficiency.
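The measure in Eq. 8 is straightforward to compute from a simulated attendance series; a minimal Python sketch of ours (the function name is our own):

```python
def volatility(n0_series, N):
    """Mean squared deviation of side-0 attendance from N/2 (Eq. 8)."""
    T = len(n0_series)
    return sum((n0 - N / 2) ** 2 for n0 in n0_series) / T

# Example: a perfectly coordinated population alternating 50/51 around N/2
N = 101
series = [50, 51] * 10
print(volatility(series, N))  # 0.25, the minimum attainable with odd N
```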
The dynamic process can be expected to depend to some degree on the initial strategy distribution and on the initial game history, which are both generated randomly. Therefore, in order to eliminate any dependence on initial conditions from our results and to focus only on asymptotically stable states, in all the simulations presented here we applied an averaging procedure over 50 independent sample paths with randomly generated initial histories and strategy distributions.

In addition, at the beginning of each simulation the system was left to evolve for an initial "adjustment phase" of length $T_0$ in order to wash away any possible transient effects on the subsequent averaging procedure. The quantities so obtained can thus be considered asymptotic properties of the system as long as $T_0$ and T are chosen high enough as to provide a good approximation of the limit $T \rightarrow \infty$.
The dependence of the volatility measure $\sigma$ on N, m and s for the original minority game has been thoroughly investigated in the previous simulation studies and is summarized in Fig. 1 for the case s = 2.

As noticed by Savit et al. (1998), the type of market regime is determined, at least in first approximation, by the ratio $z = 2^m / N$: hence the curves for various N collapse if plotted against this variable. In this respect, notice that even if the actual number of possible strategies is $2^{2^m}$, their relative strengths are completely defined in terms of the frequency $P(0 \mid h_m)$ with which, over a history, a 0 follows a given m-length string $h_m$. And there are $2^m$ such variables. So, z can be interpreted as the density of agents in the strategy space degrees of freedom.

[5] Recall from the previous Section that this quantity measures the degree to which the population is close to a Nash equilibrium.
regime" occurs when z is large (the agent are sparse in the strategy space). Here the system can
hardly organize and its behavior can be described as a collection of random agents choosing their
side with a coin toss. In fact suppose the past history to be a given h
m
and suppose there are
N
d (h
m
) agents whose strategies prescribe dierent actions based on that history while there are
N 0 (h m ) and N 1 (h m
) agents whose strategies prescribe the same party (we restrict ourselvesto the
s = 2 case), respectively 0 and 1. If the agent in N
d
choose randomly, the variance is (h
m ) = N d (h m )=4+(N 0 (h m ) N 1 (h m )) 2
=4. The average over the possible h
m
will then give = N=4.
Notice that is shaped bytwo factors, namely a uctuation inthechoices of agentsable to choose
and a uctuation in the initial distribution of strategies. Note also that the allocative eÆciency in
thiscase (asmeasuredby) equalsthatof apopulation ofplayers playingthemixed strategyNash
equilibriumdened over thetwo actions0 and 1(cf. Section2.1).
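The decomposition $\sigma(h_m) = N_d/4 + (N_0 - N_1)^2/4$ can be verified exactly by summing over the binomial distribution of the $N_d$ undecided agents; a sketch of ours, not the paper's code:

```python
from math import comb

def sigma_exact(N0, N1, Nd):
    """E[(N_0(t) - N/2)^2] when N0 agents always play 0, N1 always
    play 1, and Nd agents flip fair coins (X of them join side 0)."""
    N = N0 + N1 + Nd
    return sum(
        comb(Nd, x) * 0.5 ** Nd * (N0 + x - N / 2) ** 2
        for x in range(Nd + 1)
    )

N0, N1, Nd = 30, 20, 51
print(sigma_exact(N0, N1, Nd))      # equals Nd/4 + (N0 - N1)^2 / 4
print(Nd / 4 + (N0 - N1) ** 2 / 4)  # = 12.75 + 25.0 = 37.75
```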
The second regime, for z << 1, could be called the "inefficient regime". Here the agents densely populate the strategy space, and their actions are strongly correlated. This correlation leads to a worsening of the overall performance due to a "crowd" effect (Johnson et al. 1998): the agents in fact are too similar to each other, and they all tend to choose the same party on the basis of the available information.

The third regime, for z ≈ 1, is where coordination produces a better-than-random performance. Here the agents are differentiated enough not to yield "crowd" effects, but at the same time sufficiently distributed over the strategy space so as not to produce random-like behavior.
The literature on the MG, in fact, has mostly focused on the criticality of the value $z_c$ where $\sigma$ is minimum, suggesting that a major change in the system behavior happens when this point is crossed (Challet and Marsili, 1999). As we will see in the following sections, this criticality survives the introduction of the probabilistic learning rules.
4 Adaptive Learning
4.1 The Model
What happens to the system properties when additional heterogeneity is introduced in the population? In particular, what happens when agents are endowed with a probabilistic decision rule? In order to tackle the problem, in the following we investigate changes in the dynamics and asymptotic properties of a population of agents playing the MG as a function of changes in the nature of the agents' learning models. Hence, we leave unaltered the setup previously described, and modify only the way in which agents update their strategies' relative strength. In particular, we adopt the following probabilistic updating rule.

Recall the definition of $q_i(t)$ as the total number of points strategy i would have won if played until time t. Then each agent chooses among her strategies following the probability distribution

$$p_i(t) = \frac{e^{\Gamma q_i(t)}}{\sum_j e^{\Gamma q_j(t)}} \qquad (9)$$

where the sum on j is over all the strategies possessed by the player. Note that, in general, different strategy endowments now translate into different choice probabilities: besides adding noise at the individual decision level, the rule also increases heterogeneity in the population.
The choice of a stochastic learning model is well supported by the available experimental evidence on adaptive behavior in games, as well as by evidence from psychology experiments. In fact, most descriptive models of adaptive learning, whether belief-based or reinforcement-based, imply a probabilistic choice and updating of available actions [cf. Erev and Roth (1998), Camerer and Ho (1999)].

The parameter $\Gamma$ can be considered as a sort of sensitivity of choice to marginal information: when it is high, the agents are sensitive even to small differences in the notional scores of their strategies. In the limit $\Gamma \rightarrow \infty$ the usual minority game rule is recovered. On the contrary, for low values of $\Gamma$ a great difference in the strategies' strengths is required in order to obtain significant differences in probabilities.
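The rule in Eq. 9 is a standard logit (softmax) choice; a short Python sketch of ours shows how $\Gamma$ governs the sensitivity to score differences:

```python
from math import exp

def choice_probabilities(scores, gamma):
    """Logit rule of Eq. 9: p_i proportional to exp(gamma * q_i)."""
    weights = [exp(gamma * q) for q in scores]
    total = sum(weights)
    return [w / total for w in weights]

scores = [10.0, 9.0]  # strategy 1 leads by one point
print(choice_probabilities(scores, 5.0))   # high gamma: nearly deterministic
print(choice_probabilities(scores, 0.01))  # low gamma: close to fifty-fifty
```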
The model indeed bears close similarities with a discrete time replicator dynamics (see Weibull (1995)). The connection is straightforward if one looks at the probability updating equation associated with Eq. 9:

$$p_i(t+1) = p_i(t)\, \frac{e^{\Gamma\, \delta q_i(t)}}{\sum_j p_j(t)\, e^{\Gamma\, \delta q_j(t)}} \qquad (10)$$

where $\delta q_i(t) = q_i(t+1) - q_i(t)$ are the points won by strategy i at time t. If one thinks of a continuous process $\delta q_i(t) = \dot{q}_i(t)\, \delta t$, where $\dot{q}_i(t)$ is the instantaneous "fitness" of strategy i, then the continuous time replicator dynamics equation is recovered keeping only the first terms in the $\delta t$ expansion:

$$\frac{\dot{p}_i(t)}{p_i(t)} = \Gamma \left( \dot{q}_i(t) - \sum_j p_j(t)\, \dot{q}_j(t) \right) \qquad (11)$$

4.2 Transient length
In everything that follows we will restrict our analysis to the case N = 101 and s = 2, and we will speak of the optimal value for memory length $m_o$ with reference to the value of m which minimizes $\sigma$ under this parameter choice [6].
Let us consider the problem of defining the correct values for $T_0$ and T in Eq. 8 above. The central question is: how long must the system be left evolving before it reaches the asymptotically stable dynamics? Fig. 2 plots the average $\sigma$ value based on the "deterministic" (i.e., $\Gamma = +\infty$) MG as a function of the time length T over which this average is taken, with a transient $T_0 = T$. As can be seen from the graph, the values used in the literature on the MG are generally sufficient to obtain a prediction correct to a few percentage points. However, two properties are worth noticing:

- The system approaches the asymptotic value from above, intuitively suggesting that the system "learns" over time to self-organize.
- For low values of m, in the "inefficient regime", and for high values of m, in the "random regime", the system reaches a stable dynamic quite fast. On the contrary, for values of m near the optimal value $m_o$, the system takes a longer time to self-organize.
6
Thechoicetosets=2isjustiedbythefactthatthesystemexhibitsthesamequalitativepropertiesforanys2,
highvaluesof thislearningruleapproachesthestandardone,andaccordingly,thetransientlength
issimilartotheonefoundinsuchcases. However,as decreases,thelengthgenerallyincreases. The
increaseismostdramaticforvaluesof mneartheoptimalvaluem
o
,andit progressivelydisappears
for higher values of m. The interpretation of such a result stems from the meaning of in terms
of the learningrule. Supposingnon trivialdynamicsfor m near m
o
, theparameter sets the time
scale onwhichsuch dynamicsare attained.
As an illustration, consider the following example. Let $r(t) = p_1(t)/p_2(t)$ be the ratio of the probabilities that an agent associates to his two strategies, and $\Delta q(t) = q_1(t) - q_2(t)$ the difference in their respective strengths. From Eq. 9 it follows that $r(t) = e^{\Gamma \Delta q(t)}$. Assuming that the difference in the two strategies' performance holds constant over time (an assumption which is generally true in the initial transient regime, where agents' behavior is basically random), we obtain $\Delta q(t) \propto t$: hence, from the equality above, a given difference in probability is obtained at a time which is inversely proportional to $\Gamma$.
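Under the constant-drift assumption $\Delta q(t) = c\,t$, the time at which the probability ratio reaches a given target $r^*$ is $t^* = \ln(r^*)/(\Gamma c)$, so halving $\Gamma$ doubles the waiting time. A tiny sketch of ours, with an illustrative drift c:

```python
from math import log

def time_to_ratio(r_target, gamma, c=1.0):
    """Time at which r(t) = exp(gamma * c * t) reaches r_target."""
    return log(r_target) / (gamma * c)

print(time_to_ratio(10.0, 1.0))  # baseline time scale
print(time_to_ratio(10.0, 0.5))  # halving gamma doubles the time
```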
In order to estimate the time scale over which the system's long-run properties are attained, we use the following procedure. Holding all the parameters and the initial conditions constant, the system volatility can be expressed as a function of both the transient phase duration and the time length over which it is averaged, i.e., $\sigma = \sigma(T, T_0)$. Starting from a reference time $T_r$ [7], we compute the mean volatility progressively doubling T and $T_0$, thus obtaining a series of values $\sigma_n = \sigma(2^n T_r, 2^n T_r)$. When the relative variation $|\sigma_n - \sigma_{n-1}| / \sigma_n$ falls below a fixed threshold, we stop and take the last computed value of $\sigma$ as an estimate of its asymptotic value. The corresponding time length $\hat{T}(\Gamma)$ will be an estimate of the time implied by the system to reach this asymptotic state. As can be seen in Fig. 6, the increase in $\hat{T}$ when $\Gamma$ is lowered is mainly concentrated around $m_o$, with shapes that suggest the presence of a regime discontinuity.
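The stopping rule just described can be sketched in a few lines; here we apply it to a synthetic volatility curve $\sigma(T) = \sigma_\infty + a/T$ (our own illustration, not the paper's data):

```python
def estimate_asymptote(sigma_of_T, T_ref, threshold=0.01, max_doublings=60):
    """Double the observation window until the relative change in the
    measured volatility falls below the threshold."""
    T = T_ref
    prev = sigma_of_T(T)
    for _ in range(max_doublings):
        T *= 2
        cur = sigma_of_T(T)
        if abs(cur - prev) / cur < threshold:
            return cur, T
        prev = cur
    return prev, T

# Synthetic curve: approaches sigma_inf = 25.0 from above as T grows.
sigma = lambda T: 25.0 + 1000.0 / T
value, T_hat = estimate_asymptote(sigma, T_ref=100)
print(value, T_hat)  # value close to 25, T_hat the estimated time scale
```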
4.3 Allocative efficiency

In order to analyze the asymptotic properties of $\sigma(m)$ for different $\Gamma$'s, we use the procedure just described regarding the calculation of $\hat{T}$, i.e., we let the system evolve until "stability" is reached [8]. The simulation results are plotted in Fig. 5. Interestingly, when $\Gamma$ decreases, the performance level of the system generally increases. Moreover, such increase is larger the lower the value of m, and it becomes negligible for $m \geq m_o$. The observed behavior is consistent with the idea that for high values of m, the system dynamics tends to be determined by the initial distribution of strategies among players, while players themselves have no opportunity to attain a higher performance by adjusting their behavior. Recall that an increasing m means an increasing number of possible strategies over which players may initially draw. For a fixed N, the "ecology" of drawn strategies becomes thinner as compared to the notionally available ones. That phenomenon, it turns out, prevents the system from self-organizing. Note that this property is quite robust and largely independent from the particular learning rule adopted. Our results, in fact, are perfectly in line with previous simulation studies in the high-m region.
[7] Note that the chosen value for $T_0$ is irrelevant as long as it is small compared to the typical time scale.
[8] Let us emphasize that the stable state does not imply point convergence to any state but simply long-run stability.
The picture is different in the low-m region in terms of aggregate efficiency. In fact, when m is small, the original learning rule ($\Gamma = \infty$) produces a "crowd" effect (corresponding to large groups of agents choosing the same side) which, due to homogeneity in the initial strategy endowments, prevents the system from attaining a high degree of efficiency [9]. On the contrary, the introduction of a probabilistic learning rule for the strategy choice acts like a brake that damps the amplitude of such correlated fluctuations. At the individual level, this corresponds to lower $\Gamma$'s, i.e., to higher degrees of "inertia", as agents update their probabilities more slowly. In other words, as $\Gamma$ decreases each agent behaves as if he were applying a sort of stochastic fictitious play approximation [e.g., Fudenberg and Levine (1998)] [10], with an implicit assumption of stationarity on the distribution of other agents' choices. If the whole population shares the same $\Gamma$, as in the present model, the assumption is, in a way, self-fulfilling: a decrease in $\Gamma$ makes the behavior of the population as a whole change at a slower pace. A slower probability updating at the individual level and the resulting more stable collective behavior, together, imply that $\sigma$ is a non-increasing function of the learning time scale $1/\Gamma$. In fact, if the system reaches a dynamical stability via an averaging procedure over past outcomes, increasing the time scale over which averaging occurs cannot rule out previously attainable equilibria.
However, note that in order to capture the long-run efficiency properties of the system for low values of $\Gamma$ it is necessary to let the population play until stability is reached, according to the procedure described above. Our results, in fact, were not captured by previous simulation studies (Cavagna et al., 1999). Since the latter were all performed with a fixed time length, their conclusion was that when $\Gamma$ is small enough, the system behavior resembles the behavior of a random system. This finding was in fact due to both the increase in the transient length and the purely random initial dynamics which occur when $\Gamma$ is decreased. Then, by fixing a time length, for small values of $\Gamma$ the simulations sample the system throughout the adjustment phase, and the system behavior perfectly mirrors that of a collection of agents who choose at random (indeed our results of a fixed-time simulation are plotted in Fig. 6 and are perfectly in line with the existing literature [11]).
As can be seen in Fig. 5, the performance attainable in the MG via a collective organization of agents with limited information and limited ability to choose is, in general, surprisingly high. In the following we show that the level of efficiency achieved in the double limit $\Gamma \rightarrow 0$ and $m \rightarrow 0$ actually equals that attainable with hypothetical perfectly informed and perfectly rational agents endowed with a greater flexibility in choice.
Consider for instance a collection of agents with the following characteristics: each agent is
assigned S = 2 strategies, and a vector of length 2 m
containing the probability p(h
m
) of playing
according to its rst strategy after theoccurrence of the historystring h
m
. Moreover, for each h
m ,
each agent knows the values of N
0 (h m ), N 1 (h m ) and N d (h m
) indicating respectively the number
of agents for which their strategies both prescribe to play 0, both to play 1, or to play dierently.
Assuming that the game structure and the amount of information available to agents is common
knowledgeandthattheagentsareperfectlyrational,theproblemcompletelyfactorizesand,foreach
9
Insomesense,onecaninterpretthe\crowd"eectasacollectiveformofoverreaction (Thaler,1993).
10
Notethatctitiousplayimpliesthataplayerbestrespondstotheobservedfrequencyoftheopponent'splay.
11
In otherwords, to discoverthe asymptoticproperties ofthe system, the limitsT ! 1 and ! 0have to be
history h_m, the optimal mixed behavior balances the two sides:

N_d(h_m) + N_1(h_m) − N_0(h_m) = 2 p(h_m) N_d(h_m)   (12)

i.e. making the average fraction of the population choosing a given side as near to N/2 as possible. This choice will produce a volatility σ² = N_d/4 ≃ N/8, which is roughly similar to what is obtained in the simulations shown in Fig. 5 in the low m, low β region^12. However, note that, as from Fig. 5, these fully rational, fully informed players are (on average) "beaten" by a set of "self-organizing" agents with memory m ≃ m_o, reaching a nearly double efficiency.
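The balancing logic of eq. (12), and the resulting volatility N_d/4 ≃ N/8, can be checked numerically. The following sketch is our own illustration (not the paper's code): it draws hypothetical random strategy tables, counts N_0, N_1 and N_d for each history, and applies the balance condition.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 101          # population size (odd)
m = 3            # memory length
n_hist = 2 ** m  # number of possible history strings

# Hypothetical draw: each agent gets two random lookup tables, so for each
# history we can count N0, N1 and Nd as defined in the text.
s1 = rng.integers(0, 2, size=(N, n_hist))
s2 = rng.integers(0, 2, size=(N, n_hist))

N0 = ((s1 == 0) & (s2 == 0)).sum(axis=0)  # both strategies say 0
N1 = ((s1 == 1) & (s2 == 1)).sum(axis=0)  # both strategies say 1
Nd = (s1 != s2).sum(axis=0)               # strategies disagree

# Eq. (12): the Nd "flexible" agents play 0 with probability p(h_m)
# chosen so that the expected number of 0-players equals N/2.
p = (Nd + N1 - N0) / (2 * Nd)
p = np.clip(p, 0.0, 1.0)

# Conditional variance of the attendance: Nd p (1-p), which is close to
# Nd/4 and hence, since Nd is about N/2, close to N/8.
var = Nd * p * (1 - p)
print(var.mean(), N / 8)  # the two numbers should be close
```

With random strategy tables N_d ≃ N/2 on average, so the conditional variance comes out near the N/8 benchmark quoted in the text.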
A final remark concerns the variance of the distribution of σ as a function of β for various m's, as shown in Fig. 7: when β decreases, the variance of σ decreases for any m. However, it remains three times greater for m = m_o, suggesting a stronger dependence of the asymptotic performance on the initial strategy assignment, which the system is not able to wash out. That is, significant path-dependence effects are present.
4.4 Informational efficiency
What we have been calling allocative efficiency basically highlights the collective ability of capturing the payoffs which the game notionally allows. A complementary issue regards the informational efficiency of the market process, i.e., the extent to which the future system outcomes are unpredictable or, in other words, the absence of any arbitrage opportunities. Thus, let us analyze the informational content of the binary string H of successive winning sides. Let p(0|h_l) be the probability that a 0 follows a given string h_l out of all the possible 2^l strings of length l (as depicted in Fig. 8).
For the original "deterministic" MG, the analysis performed in Savit et al. leads to the identification of two regimes: a "partially efficient regime" for m < m_o, in which p(0|h_l) ≃ .5 for all h_l as long as l ≤ m, and in which no informational content is left for strategies with memory less than or equal to the one used by the agents; and an "inefficient regime" for m > m_o, in which the distribution of p(0|h_l) is not flat, even for l ≤ m, meaning there are good strategies that might exploit the market signal to obtain differential profits. For l > m both regions show a non-trivial distribution p(0|h_l), with an increasing degree of "ruggedness" as l increases. Introducing some degree of behavioral randomness through the parameter β has the obvious effect of reducing the "ruggedness" of the distribution of p(0|h_l) (see Fig. 8).
In order to study the behavior of the system as β changes, we introduce two related quantities which can be used to characterize the informational content of the time series. The first is the conditional entropy S(l), defined as:

S(l) = − Σ_{h_l} p(h_l) Σ_{i∈{0,1}} p(i|h_l) log_2 p(i|h_l)   (13)

where the summation is intended over all the possible strings of length l, and p(h_l) is the frequency of a given string in the system history H. The maximum value S(l) = 1 is reached for a flat distribution p(0|h_l) = .5, and it corresponds to the impossibility of forecasting (in probability) the next outcome
^12 We are assuming ΔN = N_1(h_m) − N_0(h_m) < N_d(h_m). Notice that for random agents ΔN ∼ √N and N_d ≃ N/2.
[...] market" leads to the definition of a second quantity A(l):

A(l) = Σ_{h_l} p(h_l) max{ p(0|h_l), p(1|h_l) }   (14)

which is the average fraction of points won by the best strategy of memory l. This is a measure of the reward obtained by the best arbitrageur with memory l (whereas if no arbitrage opportunities are present, A(l) is equal to .5).
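Both quantities are straightforward to estimate from a simulated history. The following is our own minimal sketch (assuming base-2 logarithms in eq. (13), so that S(l) ≤ 1), which estimates S(l) and A(l) from a 0/1 sequence by counting which outcome follows each string of length l:

```python
import numpy as np
from collections import Counter

def info_measures(history, l):
    """Conditional entropy S(l) (eq. 13) and best-arbitrageur reward A(l)
    (eq. 14), estimated from a 0/1 sequence as described in the text."""
    pairs = Counter()
    for t in range(len(history) - l):
        h = tuple(history[t:t + l])          # the string h_l
        pairs[(h, history[t + l])] += 1      # outcome following h_l
    total = sum(pairs.values())
    strings = {}
    for (h, i), n in pairs.items():
        strings.setdefault(h, [0, 0])[i] = n
    S, A = 0.0, 0.0
    for h, (n0, n1) in strings.items():
        p_h = (n0 + n1) / total              # frequency of h_l in H
        for n in (n0, n1):
            if n:
                p = n / (n0 + n1)            # p(i | h_l)
                S -= p_h * p * np.log2(p)
        A += p_h * max(n0, n1) / (n0 + n1)   # best side after h_l
    return S, A

# A purely random history should give S(l) near 1 and A(l) near .5.
rng = np.random.default_rng(1)
H = rng.integers(0, 2, size=200000).tolist()
print(info_measures(H, 3))
```

On a perfectly predictable alternating history, by contrast, the same estimator gives S(1) = 0 and A(1) = 1, the fully "inefficient" extreme.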
Before analyzing the behavior of these quantities when β is varied, let us briefly consider, as a sort of ideal benchmark, the properties of a population composed of perfectly rational, perfectly informed agents with common knowledge of the strategy distributions. Not surprisingly, in these circumstances the problem factorizes for each past history and the dependence on m disappears. The history produced by such a system is a random series of 0s and 1s. Indeed, the number of agents choosing one side is distributed according to a binomial around N/2, with different widths for different h_m. In particular, this means that in such an extreme case the "memory" loses any predicting power and no arbitrage opportunity is left for agents with longer memory, i.e. no residual information is left in the time series, and the behavior of agents makes the market perfectly efficient from an informational point of view. In the last resort, there is nothing to be learned from any history, because agents know everything from the start and coordinate their mixed strategies accordingly. Under this assumption we expect S ≃ 1 and A ≃ .5.
Short of such an ideal case, where the market loses its coordinating role because agents ex ante generate the coordination "in their heads", let us consider, for example, a population of "random agents". Here, due to the unbalance in the initial strategy endowments, we expect a non-trivial structure to appear for every l; thus S < 1 and A > .5.
In Fig. 9 we plot S(l) and A(l) for histories generated with a value of m < m_o, in the "partially efficient regime". The effect of decreasing β shows up when l > m, but the information content for high l is never completely eliminated. The market becomes less efficient the larger the time scale l at which it is observed. In fact, it can be shown under very general assumptions that certain strings in the history are more abundant than others (Savit et al., 1998), and the long-range correlation that was responsible for the "crowd" effect at high β survives as a non-trivial structure in p(0|h_l) for high l. All this applies despite the fact that to an agent with memory l ≤ m the market appears perfectly efficient regardless of the β value.
For values of m > m_0, in the "inefficient phase", the effect is in some sense reversed. As can be seen in Fig. 10, the effect of decreasing β is again negligible for l ≤ m, but in the limit β → 0 the curve becomes flat for l > m. This last result deserves some comments: the flatness in l ≥ m means that no gain is achieved from inspecting the time series with a very long memory l ≫ m, because no more arbitrage opportunities are open to a longer-memory agent than to the best possible agent of memory m. The market can be considered to be, again, partially efficient in the sense that it generates an upper bound on the maximal attainable arbitrage capability which does not depend on the arbitrageur's memory.
The particular form of the conditional entropy in Fig. 10 suggests that in the limit β → 0 the system can be described as a Markov chain of memory m.^13 The result can be explained [...] relevant, and in the limit β → 0 only infinite differences become relevant. Putting it another way, the frequency of victories of the various strategies becomes constant, implying the formation of a static hierarchical structure in the strategy space which, in the end, is responsible for the Markov character of the resulting history. The appearance of "best strategies" in the m > m_o region is well revealed by the plot of the average score of the best strategy versus the average points scored by the player (see Fig. 11): a correlation in fact appears between the performance of a player and the performance of its best strategy. Moreover, in the m ≃ m_o region, a sub-population showing the same kind of high correlation coexists with another group that presents no correlation, composed of agents possessing two equally performing strategies.
Conversely, only in the low m region does no strategy end up being preferable to others, and no player is bound to lose due only to his initial strategy endowment.
Notice, however, that equivalence between strategies does not necessarily imply equivalence in the agents' performances. This is highlighted by the plot of the variances and the supports of the points distribution for different values of β and m in Fig. 12. Interestingly, only for low m and low β does equivalence in strategy performance imply a relatively uniform distribution of points over the population. In the other parameter regions, learning does not eliminate performance heterogeneity over the population. Loosely speaking, the market self-organizes over an ecology of "mental models" and players, entailing the long-run coexistence of relative "winners" and "suckers".
5 Conclusions and Outlook
One of the central questions we addressed in this work is the extent to which market dynamics generated by arbitraging activities, as represented in Minority Games, display generic properties, independent from particular behavioral assumptions. Our answer is largely negative: simple variations in the agents' learning algorithms, we show, yield important modifications in the asymptotic properties of the system.
More specifically, we show that less sensitivity to marginal information, i.e., more inertia in the learning algorithm, entails improved long-run collective performances, although at the expense of longer adjustment phases. Together, performance asymmetries across agents, as measured by the variance (or, analogously, the support) of the earnings distribution over the population, fall as inertia in the agents' behavior increases.
In general, some degree of randomness helps in improving the allocative and informational efficiencies of the market, as defined above. The major effect of randomness is that it acts like a brake on the system dynamics, thus preventing groups of players who densely populate the strategy space from acting in a strongly correlated way, and hence from producing "crowd" effects which worsen the system performance.^14
^13 [...] their strategies on the basis of all the game outcomes starting from the beginning of the simulation (however, see the Appendix for an analysis of the system properties when a time-decaying factor is introduced).
^14 The introduction of randomness in individual behavior is indeed only one of the possible ways to maintain behavioral heterogeneity in the population. For instance, the same effect has been obtained in De Cara et al. (2000) by substituting the "global" evaluation of strategies on the system history H with a "personal" evaluation in which each agent uses [...]
[...] on different ecologies of strategies (as captured by the parameters z = 2^m/N and β respectively).
Indeed, one of our major conclusions, which refines over previous results in the Minority Game literature, is that market efficiency, in the complementary definitions proposed above, is only achieved in correspondence with an "optimal degree" of heterogeneity, whether in the agents' decision behavior or in their underlying sets of beliefs. Moreover, our results suggest that more rationality, as approximated here by a greater ability of the agents to track novel environmental information, may well lower average performances (an analogous result is obtained in a different setup by Brock and Hommes (1998)).
The general sensitivity of system dynamics to particular learning algorithms also indicates a natural way forward, beyond the exercises presented in this work: experimenting with cognitively less demanding learning rules. So, for example, it would be interesting to explore the properties of pure reinforcement learning, whereby agents update only the strategies they play. That would also set a "zero-level" model in terms of degrees of required information and cognitive abilities, somewhat at the extreme opposite to the models studied so far in the Minority Game literature. And, somewhere in between, one might explore more refined learning models such as that in Easley and Rustichini (1999).
Moreover, beyond the strict setup of Minority Games so far, the impact of phenomena of social imitation still awaits to be studied.^15 And, more generally, the robustness of the foregoing conclusions ought to be checked in less "reduced form", institutionally richer models, such as artificial markets of the type outlined in Marengo and Tordjman (1996), Arthur et al. (1997) and Chiaromonte and Dosi (1998) [e.g., Kirman (1999) for a review].
Finally, a complementary line of inquiry regards the analysis of the behaviors and learning of human subjects under experimental settings isomorphic to the market interactions formalized above.
In the last resort, all these latter exercises, together with the results presented here, ought to be considered as adding some pieces of evidence to the much broader effort aimed at identifying the variables which determine a "universality class", if any, of market processes involving coordination cum heterogeneity, as distinguished from those characteristics which strictly depend upon specific distributions of beliefs and learning rules. Were our results confirmed in further studies, they would add strength to the conjecture that efficient coordination might not stem from the adherence of populations of agents to Nash equilibrium behaviors, but rather emerge out of persistent forms of heterogeneity in beliefs and behaviors within the population.
6 Appendix
Many authors, especially in the experimental literature [e.g., Erev and Roth (1998)], add to the description of the learning process one more parameter, connected with the idea that agents weigh more heavily the information received in the recent history as compared to the one coming from far back in the past. This parameter typically takes the form of a decay factor. If π_i(t) are the points scored by strategy i at time t and α (0 < α < 1) the information decay factor, the updating rule for the strengths becomes:

q_i(t+1) = α q_i(t) + π_i(t)   (15)

and the associated updating rule for the probabilities, with Δq_i(t) = q_i(t+1) − q_i(t):

p_i(t+1) = p_i(t) e^{β Δq_i(t)} / Σ_j p_j(t) e^{β Δq_j(t)}   (16)

^15 [...] correlation among agents.
The effect of introducing such a "memory leakage" is twofold. First, it puts an upper limit on the maximal strength any strategy can reach, namely 1/(1 − α). Second, in the presence of no information flux, the equiprobability between strategies is steadily restored. This effect implies that if one takes the limit α → 0 keeping the value of β constant, the system will converge to a collection of random agents. In turn, this implies that agents, loosely speaking, have to collect a larger amount of information before they start behaving as a self-organized system.
The effect of introducing "forgetting" in the learning rule is easily understood: if the agents forget more rapidly than they learn, they are always bound to less efficient behavior. Indeed, as can be seen from Fig. 13, if the value of α is decreased, the efficiency of the system is proportionally reduced.
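The restoration of equiprobability under zero information flux can be verified directly. The following sketch is our own (it assumes, as in our reading of eq. (16), that the exponent uses the strength increments Δq_i(t)); it iterates the two rules with zero payoffs from probabilities consistent with the accumulated strengths:

```python
import math

def update(q, probs, payoffs, alpha, beta):
    """One step of the appendix learning rule: eq. (15) decays the strategy
    strengths q_i; eq. (16) reweights the choice probabilities p_i using
    the strength increments."""
    new_q = [alpha * qi + pi for qi, pi in zip(q, payoffs)]   # eq. (15)
    w = [p * math.exp(beta * (nq - qi))                       # eq. (16)
         for p, nq, qi in zip(probs, new_q, q)]
    z = sum(w)
    return new_q, [wi / z for wi in w]

beta, alpha = 1.0, 0.9
q = [3.0, 0.0]
# Start from probabilities consistent with the accumulated strengths.
z = sum(math.exp(beta * qi) for qi in q)
p = [math.exp(beta * qi) / z for qi in q]

# With no information flux (zero payoffs) the strengths decay geometrically
# and equiprobability between the strategies is steadily restored.
for _ in range(500):
    q, p = update(q, p, [0.0, 0.0], alpha, beta)
print(q, p)   # q near [0, 0], p near [0.5, 0.5]
```

Note also that with a constant unit payoff, the recursion q_i(t+1) = α q_i(t) + 1 converges to 1/(1 − α), the upper bound on the strength mentioned in the text.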
References
[1] Arthur, W.B. (1994), Inductive reasoning and bounded rationality, Am. Econ. Review Papers and Proc., vol. 84, 406-412.
[2] Arthur, W.B., J.H. Holland, B. LeBaron, R. Palmer and P. Taylor (1997), Asset Pricing under Endogenous Expectations in an Artificial Stock Market, in Arthur, W.B., S.N. Durlauf, and D. Lane (Eds.), The Economy as an Evolving Complex System II, Reading, Mass., Addison-Wesley.
[3] Brock, W.A. and C.H. Hommes (1998), Heterogeneous beliefs and routes to chaos in a simple asset pricing model, Journal of Economic Dynamics and Control, vol. 22, 1235-1274.
[4] Camerer, C. and T. Ho (1999), Experience Weighted Attraction Learning in Games: A Unifying Approach, Econometrica, 67, 4, p. 827.
[5] Cavagna, A., J.P. Garrahan, I. Giardina and D. Sherrington (1999), A thermal model for adaptive competition in a market, Phys. Rev. Lett., vol. 83.
[6] Challet, D. and M. Marsili (1999), Phase Transition and Symmetry Breaking in the Minority Game, Phys. Rev. E, vol. 60.
[7] Challet, D. and Y.-C. Zhang (1997), Emergence of cooperation and organization in an evolutionary game, Physica A, 246, 407.
Challet, D. and Y.-C. Zhang (1998), On the Minority Game: Analytical and Numerical Studies, Physica A, 256, 514.
[8] Chiaromonte, F. and G. Dosi (1998), Modeling a Decentralized Asset Market: An Introduction to the Financial "Toy Room", IIASA, Laxenburg, Austria, Interim Report. (Available also as [...])
[9] [...], A Behavioral Perspective, in: R. Jarrow et al. (eds.), Handbooks in OR & MS, Amsterdam, Elsevier Science.
[10] de Cara, M.A.R., O. Pla and F. Guinea (1999), Competition, efficiency and collective behaviour in the "El Farol" bar model, The European Physical Journal B 10.
[11] de Cara, M.A.R., O. Pla and F. Guinea (2000), Learning, competition and cooperation in simple games, The European Physical Journal B 13.
[12] Easley, D. and A. Rustichini (1999), Choice without beliefs, Econometrica, 67, pp. 1157-1184.
[13] Erev, I. and A. Roth (1998), Predicting How People Play Games: Reinforcement Learning in Experimental Games with Unique, Mixed Strategy Equilibria, Am. Econ. Review, 88, 848-881.
[14] Fudenberg, D. and D.K. Levine (1998), The Theory of Learning in Games, Cambridge, Mass., The MIT Press.
[15] Johnson, N.F., M. Hart and P.M. Hui (1999), Crowd effects and volatility in a competitive market, Physica A, vol. 269.
[16] Kirman, A. (1993), Ants, Rationality and Recruitment, Quarterly Journal of Economics, 108, 137-156.
[17] Kirman, A. (1999), Aggregate Activity and Economic Organization, Revue Europeenne des Sciences Sociales, 38, 189-230.
[18] LeBaron, B. (2000), Agent-based computational finance: Suggested readings and early research, Journal of Economic Dynamics and Control, 24, pp. 679-702.
[19] Marengo, L. and H. Tordjman (1996), Speculation, Heterogeneity and Learning: A Simulation Model of Exchange Rate Dynamics, Kyklos, vol. 49, 407-437.
[20] Meyer, D.J., Van Huyck, J., Battalio, R. and Saving, T. (1992), History's role in coordinating decentralized allocation decisions: laboratory evidence on repeated binary allocation games, Journal of Political Economy, 100, pp. 292-316.
[21] Ochs, J. (1990), The coordination problem in decentralized markets: An experiment, Quarterly Journal of Economics, 105, pp. 545-559.
[22] Ochs, J. (1995), Coordination problems, in J.H. Kagel and A.E. Roth (Eds.), Handbook of Experimental Economics, Princeton: Princeton University Press, pp. 195-251.
[23] Savit, R., R. Manuca and R. Riolo (1998), Adaptive Competition, Market Efficiency and Phase Transition, Phys. Rev. Lett., vol. 82.
[24] Schelling, T.C. (1978), Micromotives and Macrobehavior, New York, W.W. Norton and Company.
history  action    history  action
000      1         100      1
001      0         101      0
010      0         110      1
011      1         111      0

Table 1: An example of a strategy with m = 3.
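A strategy such as the one in Table 1 is just a lookup table from the 2^m histories to an action. The following is a deliberately stripped-down sketch of one round of the game, of our own making (one random strategy per agent rather than the model's S = 2, and no learning), to show how the table and the minority rule interact:

```python
import random

random.seed(2)
N, m = 101, 3   # odd population, memory 3 as in Table 1

# A strategy maps each of the 2^m histories to an action in {0, 1}.
strategies = [[random.randint(0, 1) for _ in range(2 ** m)]
              for _ in range(N)]

def play_round(history_bits, strategies):
    """history_bits: the last m winning sides, most recent last."""
    idx = int("".join(map(str, history_bits)), 2)   # e.g. [0,1,1] -> row 011
    actions = [s[idx] for s in strategies]
    n1 = sum(actions)
    winner = 1 if n1 < len(strategies) - n1 else 0  # minority side wins
    payoffs = [1 if a == winner else 0 for a in actions]
    return winner, payoffs

history = [random.randint(0, 1) for _ in range(m)]
winner, payoffs = play_round(history, strategies)
print(winner, sum(payoffs))   # the winning (minority) side and its size
```

Since N is odd no tie can occur, and the number of rewarded agents is always strictly below N/2, which is precisely what makes full allocative efficiency unattainable.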
           low z                   high z
low β      high A.E., high I.E.    partial A.E., partial I.E.
high β     low A.E., high I.E.     partial A.E., low I.E.

Table 2: System properties: a summary (A.E. and I.E. stand for Allocative and Informational Efficiency).
Figure 1: The volatility σ(z), with z = 2^m/N, for s = 2 and different values of N (101, 201, 301, 401, 801) and m.
Figure 2: The mean σ along the run length T for different m's (m = 2, ..., 10). The points are averages over a sample of independent runs.
Figure 3: σ along the run length T for different β's (panels: m = 2, 6, 12). The points are averages over a sample of 30 independent runs with a transient time T_0 = T.
Figure 4: A (rough) estimate of the time T̂ required to reach the stable asymptotic value with an [...]. (Axes: T̂ versus m; curves for β = 1, .2, .04, .01, .0025.)
Figure 5: Volatility σ for s = 2, N = 101 and various m's and β's (β = 5, 1, .2, .04, .01). The runs are performed by doubling the time length T until the last two values obtained differ by less than 1%.
Figure 6: The volatility σ for s = 2, N = 101 and various m's and β's (β = .0001, .001, .01, .1, 1). The runs are performed with a fixed time length T = T_0 = 10000. When β → 0 the system approaches a collection of randomly playing agents.
Figure 7: Variance of the distribution of σ over a sample of 50 independent runs (m = 2, 3, 4, 5, 6, 8, 10, plotted against 1/β). As β becomes small, the point m ≃ m_0 maintains a significantly larger variance.
Figure 8: Probability p(0|h_l) of obtaining a 0 following a given binary string h_l in the system history, for m = 3 and l = m + 1 = 4, at β = 10 and β = .04. When β is reduced the distribution "flattens" and any structure tends to disappear.
Figure 9: Conditional entropy S(l) (left) and arbitrage opportunity A(l) (right) as a function of the time depth l, for m = 3 < m_o and β = 5, 1, .2, .04, .02, .005, .0025.
Figure 10: Conditional entropy S(l) (left) and arbitrage opportunity A(l) (right) as a function of the time depth l, for m = 6 > m_o and β = 5, 1, .2, .04, .02, .005, .0025.
Figure 11: Plot, for each player, of the scoring rate of his best strategy against his own winning rate, for a population of 101 players over 30 independent runs with β = .04 (panels: m = 3, 5, 6, 12).
Figure 12: Variance (rectangle) and support (straight line) of the scored-points distribution in a population of N = 101 with s = 2, over 30 independent runs, plotted against 1/β (panels: m = 3, 5, 6, 12). Notice that while in the high β simulations the distributions are similar in width for any m, when β is reduced the distributions in the low m region become markedly narrower.
Figure 13: σ as a function of the run length T for different values of β and α (β = 1, .1, .01; α = .9, .99, .999, .9999, .99999). The simulations are performed with m = 6, where a greater sensitivity of the transient time length to the learning parameters is observed.