The principle of least cognitive action


Contents lists available at ScienceDirect

Theoretical Computer Science

www.elsevier.com/locate/tcs

The principle of least cognitive action

Alessandro Betti a, Marco Gori b

a Department of Physics, University of Pisa, Italy
b Department of Information Engineering and Mathematics, University of Siena, Italy

Article history: Received 3 January 2015; Received in revised form 17 June 2015; Accepted 19 June 2015; Available online 2 July 2015.

Keywords: BG-brackets; Lifelong learning; Natural learning theory; On-line learning; Least cognitive action

Abstract

By and large, the interpretation of learning as a computational process taking place in both humans and machines is primarily provided in the framework of statistics. In this paper, we propose a radically different perspective, in which the emergence of learning is regarded as the outcome of laws of nature that govern the interactions of intelligent agents with their own environment. We introduce a natural learning theory based on the principle of least cognitive action, which is inspired by the related mechanical principle and by the Hamiltonian framework for modeling the motion of particles. The introduction of the kinetic and of the potential energy leads to a surprisingly natural interpretation of learning as a dissipative process. The kinetic energy reflects the temporal variation of the synaptic connections, while the potential energy is a penalty that describes the degree of satisfaction of the environmental constraints. The theory gives a picture of learning in terms of energy balancing mechanisms, where the novel notions of boundary and bartering energies are introduced. Finally, as an example of application of the theory, we show that the supervised machine learning scheme can be framed in the proposed theory and, in particular, that the Euler–Lagrange differential equations of learning collapse to the classic gradient algorithm on the supervised pairs.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Machine learning is at a lively crossroad of disciplines, where the exploration of neuroscience and cognitive psychology meets computational models mostly based on statistical foundations. For these models to be realistic, one is typically concerned with the acquisition of learning skills on a sample of training data that is sufficiently large to be consistent with a statistically relevant test set. However, in most challenging and interesting learning tasks taking place in humans, the underlying computational processes do not seem to offer such a neat identification of the training set. As time goes by, humans react surprisingly well to new stimuli while keeping past acquired skills, which seems to be hard to reach with nowadays' intelligent agents. This suggests looking for alternative foundations of learning, which are not necessarily based on statistical models of the whole agent's life.

In this paper, we investigate the emergence of learning as the outcome of laws of nature that govern the interactions of intelligent agents with their own environment, regardless of their nature. The underlying principle is that the acquisition of cognitive skills by learning obeys information-based laws on these interactions, which hold regardless of biology. In this new perspective, in particular, we introduce a natural learning theory aimed at discovering the fundamental temporally-embedded

* Corresponding author.
E-mail address: marco@dii.unisi.it (M. Gori).
http://dx.doi.org/10.1016/j.tcs.2015.06.042
0304-3975/© 2015 Elsevier B.V. All rights reserved.


mechanisms of environmental interaction. While the role of time has been quite relevant in machine learning (see e.g. the Hidden Markov Models), in some challenging problems, like computer vision, the temporal dimension seems to play quite a minor role. What would human vision have been in a world of visual information with shuffled frames? Any cognitive process aimed at extracting symbolic information from images that are not frames of a temporally coherent visual stream would have been extremely harder than in our visual experience [1]. More than looking for smart algorithmic solutions to deal with temporal coherence, in this paper we rely on the idea of regarding learning as a process which is deeply interwoven with time, just like most laws of nature.

In order to derive the laws of learning, we introduce the principle of least cognitive action, which is inspired by the related mechanical principle and by the Hamiltonian framework for modeling the motion of particles. Unlike mechanics, however, the cognitive action that we define is in fact the objective to be minimized, more than a functional for which to discover a stationary point. In our learning framework, this duality is based on a proper introduction of the "kinetic" and of the "potential energy," which leads to a surprisingly natural interpretation of learning as a dissipative process. The kinetic energy reflects the temporal variation of the synaptic connections, while the potential energy is a penalty that describes the degree of satisfaction of the environmental constraints. The theory gives a picture of learning in terms of energy balancing mechanisms, where the novel notions of boundary and bartering energies are introduced. These new energies arise mostly to model complex environmental interactions of the agent, which are nicely captured by means of time-variant high-order differential operators. When pairing them with the novel notion of BG-bracket, we show how intelligent processes can be understood in terms of energy balance; learning turns out to be a dissipative process which reduces the potential energy by moving the connection weights, thus producing a kinetic energy. However, the energy balancing also involves the boundary and bartering energies. The former arises because of the energy exchange at the beginning and at the end of the temporal horizon, while the latter is deeply connected with the energy exchange during the agent's life. It is worth mentioning that this energy balance, which has a nice qualitative interpretation, is in fact the formal outcome of the main theoretical results given in the paper. In particular, the proposed framework incorporates classic mechanics as a special case, while it opens the doors to in-depth analyses in any context in which there is an explicit temporal dependence in the Lagrangian of the system.

While the paper focuses on laws of learning that are independent of the nature of the intelligent agent, it is shown that these laws also offer a general framework to derive classic on-line machine learning algorithms. In particular, we show that the supervised machine learning scheme can be framed in the proposed theory, and that the Euler–Lagrange differential equations of learning derived in the paper do collapse to the classic gradient algorithm. Interestingly, the theory prescribes the time-variant learning rate, which arises from the given variational formulation.
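As a plain illustration of the kind of algorithm such a collapse produces, consider on-line gradient descent on the quadratic loss with a decaying learning-rate schedule. The data, the linear model, and the specific schedule $\eta(t) = \eta_0/(1 + \lambda t)$ below are illustrative assumptions, not the rate derived from the variational formulation:

```python
# On-line gradient descent with a time-variant learning rate (toy sketch; the
# pairs, the linear model and the schedule eta(t) are illustrative assumptions).
pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # (u_k, s_k), consistent with y = 2u + 1

w, b = 0.0, 0.0
eta0, lam = 0.1, 0.001
step = 0
for epoch in range(3000):
    for u, s in pairs:
        eta = eta0 / (1.0 + lam * step)        # time-variant learning rate
        g = (w * u + b) - s                    # dL/dy for L = 0.5*(y - s)^2
        w -= eta * g * u                       # gradient step on the supervised pair
        b -= eta * g
        step += 1
print(round(w, 2), round(b, 2))
```

With consistent data, the iterates settle on the interpolating parameters despite the shrinking rate, since the schedule decays slowly enough for the cumulative step sizes to diverge.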

The paper is organized as follows. In the next section, we introduce natural principles of learning and, in particular, the notion of cognitive action. Its minimization, which leads to the laws of learning, is described in Section 3. In Section 4 we propose a dissipative Hamiltonian framework of learning, which, in Section 5, is completed by the analysis of the BG-brackets. Section 6 relies on the previous results to provide an interpretation of learning dynamics by means of energy balance. In Section 7, we show how the laws of learning can be converted to the gradient algorithm. Finally, some conclusions are drawn in Section 8.

2. Natural principles of learning

The notion of time is ubiquitous in laws of nature. Surprisingly enough, most studies on machine learning have relegated time to the related notion of iteration step.¹ On one side the connection is sound, since it involves the computational effort, which is also observed in nature. On the other side, while time has a truly continuous nature, the time discretization typical in fields like computer vision seems to give up the challenge of constructing a truly continuous theory of learning. This paper presents an approach to learning that incorporates time in its truly continuous structure, so that the evolution of the weights of the neural synapses follows equations that resemble laws of physics. We consider environmental interactions that are modeled under the recently proposed framework of learning from constraints [2]. In particular, we focus attention on the case in which each constraint originates a loss function, which is expected to take on small values in case of soft-satisfaction. Interestingly, thanks to the adoption of the t-norm theory [3], loss functions can be constructed that can also represent the satisfaction of logic constraints.

A lot of emphasis in machine learning has been on supervised learning, where we are given the collection $\mathcal{L} = \{(t_\kappa, u(t_\kappa), s_\kappa)\}_{\kappa \in \mathbb{N}}$ of supervised pairs $(u(t_\kappa), s_\kappa)$ over the temporal sequence $\{t_\kappa\}_{\kappa \in \mathbb{N}}$. At a given $t$, the pairs $(u(t_\kappa), s_\kappa)$ that have become available are denoted by $\mathcal{L}_t := \{\kappa \in \mathbb{N} : t_\kappa \le t\}$. We assume that those data are learned by a feedforward neural network defined by the function $f(\cdot,\cdot): \mathbb{R}^n \times \mathbb{R}^d \to \mathbb{R}^D$, so that the input $u(t)$ is mapped to $y(t) = f(w(t), u(t))$. Classic supervised learning can be naturally modeled by introducing a loss function $L(\cdot,\cdot): \mathbb{R}^2 \to \mathbb{R}$ along with the given training set $\mathcal{L}$. Examples of different loss functions can be found


Table 1
Links between the natural learning theory and classical mechanics.

  Natural Learning Theory | Mechanics | Remarks
  $w_i$ | $q_i$ | Weights are interpreted as generalized coordinates.
  $\dot w_i$ | $\dot q_i$ | Weight variations are interpreted as generalized velocities.
  $\upsilon_i$ | $p_i$ | The conjugate momentum to the weights is defined by using the machinery of Legendre transforms.
  $A[w]$ | $S[q]$ | The cognitive action is the dual of the action in mechanics.
  $F(t,w,\dot w)$ | $L(t,q,\dot q)$ | The Lagrangian $F$ is associated with the classic Lagrangian $L$ in mechanics.
  $H(t,w,\upsilon)$ | $H(t,q,p)$ | When using $w$ and $\upsilon$, we can define the Hamiltonian, just like in mechanics.

in [4] (Ch. 3). A classic case is that of the quadratic loss, that is $L(y,s) = \frac{1}{2}(y-s)^2$. Because of the variational formulation of learning that is given in this paper, it is convenient to associate $L$ with the overall distributional loss

$$V(t,w) = \frac{1}{2}\,\psi(t) \sum_{\kappa \in \mathcal{L}_t} \big(f(w(t),u(t)) - s_\kappa\big)^2\,\delta(t - t_\kappa).$$

Here, $\psi(\cdot)$ is referred to as the dissipation function, and it is assumed to be a positive, monotonically increasing function on its domain. A fundamental principle behind the emergence of learning, which will be fully gained in the remainder of the paper, is that it does require energy dissipation. In the Hamiltonian framework, the introduction of dissipation processes can be done in different ways. The idea of factorizing the potential, as well as the whole Lagrangian, by a temporally-growing exponential term is related to studies on dissipative Hamiltonian systems [5]. We regard $w \in \mathbb{R}^n$ as the Lagrangian coordinates of a virtual mechanical system. Throughout this paper, $V(\cdot,\cdot)$ is referred to as the potential energy of the system defined by the Lagrangian coordinates $w$. In machine learning, we look for trajectories $w(t)$ that possibly lead to configurations with small potential energy. Following the duality with mechanics, we also introduce the notion of kinetic energy. For reasons that will become clear in the remainder of the paper, we generalize the notion of velocity to $T_i w_i$, where, for each fixed $i$, $T_i$ is a time-dependent linear differential operator of the form

$$T_i(t) = \sum_{j=0}^{\ell} \alpha_{i,j}(t)\,\frac{d^j}{dt^j}. \tag{1}$$

Let $m_i > 0$ be the mass of each particle (dual of the connection weight) defined by the position $w_i(t)$ and the velocity $\dot w_i$. Then we define the kinetic energy as

$$K(t, Tw) = \frac{1}{2}\,\psi \sum_{i=1}^{n} m_i\,(T_i w_i)^2. \tag{2}$$

Clearly, if $T_i = T = d/dt$ and $\psi(t) \equiv 1$, then we have $K = \frac{1}{2}\sum_{i=1}^{n} m_i \dot w_i^2$, which returns the classic case of analytic mechanics. We notice in passing that one could always absorb the dissipation factor by introducing $T_i^\psi := \sqrt{\psi}\,T_i$, so that $K(t, Tw) = \frac{1}{2}\sum_{i=1}^{n} m_i (T_i^\psi w_i)^2$. Now, let us consider a Lagrangian simply composed of the weighed sum of the potential $V(\cdot,\cdot)$ and the kinetic energy:

$$F(t, w, Tw) = \sum_{i=1}^{n} F(t, w_i, T_i w_i) = \sum_{i=1}^{n} K(t, T_i w_i) + \gamma\,V(t,w). \tag{3}$$

Here we mostly assume $\gamma \in \mathbb{R}$, so that the cognitive action gets a clear meaning in terms of regularization theory. However, it is important to notice that if $\gamma = -1$ the cognitive action strictly relates to the classic action in mechanics. The consequences of different choices of $\gamma$ will become clearer in the remainder of the paper. The classic problem of supervised learning is that of discovering $w^o = \arg\min A[w]$, where

$$A[w] = \int_{t_0}^{t_1} F(t, w, Tw)\,dt \tag{4}$$

is the cognitive action of the system. While there is an intriguing analogy with analytic mechanics (see Table 1), we emphasize that, while in mechanics we are only interested in stationary points of the mechanical action, in learning we would also like to determine the minimum of the cognitive action defined by equation (4). Interestingly, as we will see later, the strict connection with analytic mechanics also makes sense when considering strong dissipation (see also [6]).
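For concreteness, the cognitive action (4) can be evaluated numerically for a toy one-dimensional system. The trial trajectory, the dissipation $\psi(t) = e^{\theta t}$, the quadratic potential, and all the constants below are illustrative assumptions:

```python
import math

# Numerical evaluation of the cognitive action A[w] of equation (4) for a toy
# scalar trajectory (all concrete choices here are illustrative assumptions).
theta, gamma, m, s = 1.0, 1.0, 1.0, 1.0   # dissipation rate, weighting, mass, target
t0, t1, N = 0.0, 2.0, 2000
h = (t1 - t0) / N

def w(t):                  # trial trajectory approaching the target s
    return s * (1.0 - math.exp(-2.0 * t))

def dw(t):                 # its velocity Dw
    return 2.0 * s * math.exp(-2.0 * t)

def F(t):                  # Lagrangian: kinetic energy + gamma * potential, eq. (3)
    psi = math.exp(theta * t)
    K = 0.5 * psi * m * dw(t) ** 2
    V = 0.5 * psi * (w(t) - s) ** 2
    return K + gamma * V

# trapezoidal rule for A[w] over [t0, t1]
A = h * (0.5 * F(t0) + sum(F(t0 + i * h) for i in range(1, N)) + 0.5 * F(t1))
print(A)
```

With these choices the integrand is $\frac{5}{2}e^{-3t}$, so the computed value can be cross-checked against the closed form $\frac{5}{6}(1 - e^{-6})$.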

Finally, a comment on the minimization of the cognitive action (4) is in order. In general, the problem makes sense whenever the Lagrangian is given all over $[t_0, t_1]$. The very nature of the problem results in the explicit time dependence of (4): as a consequence, if there is no restriction on the nature of this interaction, then the optimization problem does require all the information coming in the $[t_0, t_1]$ interval. However, the environmental interactions that are of interest in this paper are those that are typically associated with learning processes, where, after the agent has inspected a certain amount of information, it makes sense to involve it in predictions on the future. Hence, whenever we deal with a truly learning environment, in which the agent is expected to capture regularities in the inspected data, as it will be shown, the minimization problem becomes well-posed and nicely fits in the context of on-line learning.

3. Euler–Lagrange laws of learning

Let us consider the minimization of the functional $A[\cdot]$ defined by equation (4). Throughout the paper, we assume that the linear operators $T$ admit the adjoint $T^\star$. As stated in the following proposition, the coefficients of $T^\star$ come directly from those of $T$.

Proposition 3.1. Given the linear differential operator $T$ of degree $\ell$, defined by (1), its adjoint operator is also a linear differential operator of the same degree, with coefficients

$$\beta_\kappa(t) = \sum_{j \ge \kappa} (-1)^j \binom{j}{\kappa}\, \alpha_j^{(j-\kappa)}(t). \tag{5}$$

Proof. If we use the Leibniz formula, we have

$$\frac{d^j}{dt^j}\big(\alpha_j(t)\,\chi(t)\big) = \sum_{\kappa=0}^{j} \binom{j}{\kappa}\,\alpha_j^{(j-\kappa)}(t)\,\frac{d^\kappa \chi}{dt^\kappa}(t).$$

From the definition of adjoint operator, we can easily see that, if we pose $D^j := d^j/dt^j$, then its adjoint is $(D^j)^\star = (-1)^j D^j$. Then we get

$$T^\star(t) = \sum_{j=0}^{\ell} (-1)^j \sum_{\kappa=0}^{j} \binom{j}{\kappa}\,\alpha_j^{(j-\kappa)}(t)\,\frac{d^\kappa}{dt^\kappa} = \sum_{\kappa=0}^{\ell} \sum_{j \ge \kappa} (-1)^j \binom{j}{\kappa}\,\alpha_j^{(j-\kappa)}(t)\,\frac{d^\kappa}{dt^\kappa} \equiv \sum_{k=0}^{\ell} \beta_k(t)\,\frac{d^k}{dt^k},$$

from which the thesis follows. □
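Formula (5) can be checked numerically for a first-order operator $T = \alpha_0(t) + \alpha_1(t) D$, for which it gives $T^\star = (\alpha_0 - \dot\alpha_1) - \alpha_1 D$, so that $\langle Tf, g\rangle - \langle f, T^\star g\rangle$ reduces to the boundary term $[\alpha_1 f g]$. The grid, the coefficients, and the test functions below are illustrative assumptions:

```python
import math

# Check of equation (5) for T = a0(t) + a1(t)*D on [0, 1]: the adjoint is
# T* = (a0 - Da1) - a1*D, and <Tf, g> - <f, T*g> equals a1*f*g on the borders.
# The coefficients and test functions are illustrative choices.
N = 20001
h = 1.0 / (N - 1)
t = [i * h for i in range(N)]
a0 = [math.sin(x) for x in t]
a1 = [1.0 + x * x for x in t]
f = [math.exp(-x) for x in t]
g = [math.cos(2 * x) for x in t]

def D(u):
    r = [0.0] * N
    for i in range(1, N - 1):
        r[i] = (u[i + 1] - u[i - 1]) / (2 * h)          # central differences
    r[0] = (-3 * u[0] + 4 * u[1] - u[2]) / (2 * h)      # one-sided at the borders
    r[-1] = (3 * u[-1] - 4 * u[-2] + u[-3]) / (2 * h)
    return r

def inner(x, y):                                         # trapezoidal <x, y>
    s = sum(p * q for p, q in zip(x, y))
    return h * (s - 0.5 * (x[0] * y[0] + x[-1] * y[-1]))

Df, Dg, Da1 = D(f), D(g), D(a1)
Tf = [a0[i] * f[i] + a1[i] * Df[i] for i in range(N)]
Tsg = [(a0[i] - Da1[i]) * g[i] - a1[i] * Dg[i] for i in range(N)]

rest = inner(Tf, g) - inner(f, Tsg)
boundary = a1[-1] * f[-1] * g[-1] - a1[0] * f[0] * g[0]
print(abs(rest - boundary))
```

The residual is of the order of the discretization error, confirming that the coefficients of $T^\star$ come only from the $\alpha_j$ and their derivatives.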

Let us calculate the variation of the functional (4), in the case $n = 1$, over the interval $[t_0, t_1]$, by assuming that² the stationary point $w(\cdot)$ has end-points $(t_0, w^{(\kappa)}(t_0))$ and $(t_1, w^{(\kappa)}(t_1))$ with $\kappa = 0, \ldots, \ell - 1$. Let us consider $\check w(t) = w(t) + \epsilon\,\xi(t)$, where $\xi(\cdot)$ is a variation and $\epsilon \in \mathbb{R}$. Clearly, the conditions on the boundaries on $w^{(\kappa)}$ yield corresponding conditions on the variation:

$$\xi^{(\kappa)}(t_0) = \xi^{(\kappa)}(t_1) = 0, \qquad \kappa = 0, \ldots, \ell - 1. \tag{6}$$

Then we have

$$\delta A = A[\check w] - A[w] = \int_{t_0}^{t_1} \big[F(t, w + \epsilon\xi, Tw + \epsilon T\xi) - F(t, w, Tw)\big]\,dt = \epsilon \cdot \int_{t_0}^{t_1} \big[F_w \cdot \xi + F_{Tw} \cdot T\xi\big]\,dt + O(\epsilon^2).$$

Now, because of conditions (6), the second term can be expressed by

$$\int_{t_0}^{t_1} F_{Tw} \cdot T\xi\,dt = \int_{t_0}^{t_1} (T^\star F_{Tw}) \cdot \xi\,dt.$$

Hence, we get

$$\delta A = \epsilon \cdot \int_{t_0}^{t_1} \big[F_w + T^\star F_{Tw}\big] \cdot \xi\,dt.$$

When imposing the stationarity condition $\delta A = 0$, because of the fundamental lemma of variational calculus we get

$$F_w + T^\star F_{Tw} = 0. \tag{7}$$

Now we can easily see that the above result can be generalized to the set of variables $w_i$, $i = 1, \ldots, n$. Let us consider the general case in which we use an operator $T_i$ for each variable $w_i$. In this case the variation turns out to be

$$\delta A = \epsilon \cdot \int_{t_0}^{t_1} \sum_{i=1}^{n} \big(F_{w_i} + T_i^\star F_{T_i w_i}\big) \cdot \xi_i\,dt.$$

Finally, this analysis leads to a necessary condition for the stationarity of the functional $A$, which is given in the following theorem.

Theorem 3.1. Let us assume that we are given the values of $w^{(\kappa)}(t_0)$ and $w^{(\kappa)}(t_1)$ with $\kappa = 0, \ldots, \ell - 1$. Then functional (4) admits a stationary point $w_i$, $i = 1, \ldots, n$, provided that

$$F_{w_i} + T_i^\star F_{T_i w_i} = 0, \qquad i = 1, \ldots, n. \tag{8}$$

Remark 3.1. The proof of the theorem follows the classic principles of variational calculus to determine stationary points (see e.g. [7,8]). The only specific issue that arises in the given result concerns the presence of the operators $T_i$ instead of the single derivatives, which plays an important role in the remainder of the paper.

Remark 3.2. In the formulation of learning, one cannot typically rely on the boundary conditions that are assumed in the theorem, whereas it makes sense to make assumptions on the initial conditions $w^{(\kappa)}(t_0)$, $\kappa = 0, \ldots, 2\ell - 1$ (Cauchy conditions). Interestingly, in case the Euler–Lagrange equations (8) are asymptotically stable, the effect of the initial conditions vanishes as $t \to \infty$. In this paper we are mostly interested in exploring the environmental interactions of the agent under asymptotic stability.

Remark 3.3. Equations (8) have been derived under the assumption that we are given the end-points $(t_0, w^{(\kappa)}(t_0))$ and $(t_1, w^{(\kappa)}(t_1))$ with $\kappa = 0, \ldots, \ell - 1$. While they represent a necessary condition for stationarity, their integration requires the knowledge of the boundary conditions, which leads to a batch-mode formulation of learning. This is related to the comments at the end of Section 2 concerning the on-line formulation of learning. However, if the learning environment is periodic with period $\tau$ and $n_\tau$ pairs in each period, that is if $(u(t_i), s_i) = (u(t_i + \tau), s_{i+n_\tau})$, there is experimental evidence to claim that the causality issue raised at the end of Section 2 disappears [9]. Basically, on-line learning on a periodic learning environment returns a solution that very well approximates the one corresponding to the batch mode. A more interesting case, which is worth investigating, is that of almost periodic [10] environments, in which, roughly speaking, $\tau$ can depend on $t$. The restriction to truly on-line learning environments, in which the regularities can be captured because of underlying statistical assumptions, might be better captured by boundary conditions different from those expressed by equations (6). For example, in the simplest case of $T = D$, one can impose that the weights converge over large intervals, which corresponds with the transversality condition [8] on the right border, $F_{Tw_i}(t, w_i, Tw_i)\big|_{t=t_1} = 0$. However, in this paper, we do not address this issue; rather, we explore the behavior of equations (8) by means of an analysis based on "energy balance".

Finally, it is also worth mentioning that the given Euler–Lagrange equations get to stationary points and, therefore, depending on the learning task, the actual minimization of the cognitive action might not be achieved. This is somewhat related to the classic framework of supervised learning, where the discovery of optimal solutions, also in finite-dimensional spaces, may lead to suboptimal solutions [11].

In the case of the cognitive action defined by the Lagrangian of equation (3), we have the following special cases.

Reduction to mechanics.

Let us assume that $\forall i = 1, \ldots, n$: $T_i = T = D$, and let $\gamma = -1$. Then we consider the Lagrangian

$$F(t, w, Tw) = \frac{1}{2}\,\psi \sum_{i=1}^{n} m_i (D w_i)^2 - \psi\,V(w),$$

where $\psi = e^{\theta t}$. Then the Euler–Lagrange equations (see Theorem 3.1) become

$$D^2 w_i + \theta\, D w_i + V_{w_i} = 0, \qquad i = 1, \ldots, n, \tag{10}$$

which corresponds with damped oscillators in classic mechanics. Here we can promptly see the role of the function $\psi(\cdot)$ in the birth of dissipation. Moreover, the role of $\gamma = -1$ becomes clear in order to attach the classic meaning of potential to the function $V$. In addition, this is fully coherent with the definition of action, where the Lagrangian is $L = K - V$.
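Equation (10) can be integrated numerically to watch the dissipative convergence at work. The potential $V(w) = \frac{1}{2}(w - s)^2$ (a single supervised target $s$), the damping $\theta$, and the integrator below are illustrative assumptions:

```python
# Numerical integration of equation (10), D^2 w + theta*D w + V_w = 0, with the
# illustrative potential V(w) = 0.5*(w - s)^2: the weight dissipates its energy
# and settles on the minimum of V (semi-implicit Euler integrator).
theta, s = 4.0, 1.5
w, v = 0.0, 0.0            # initial weight and velocity (Cauchy conditions)
dt = 1e-3
for _ in range(40000):     # integrate up to t = 40
    a = -theta * v - (w - s)
    v += dt * a
    w += dt * v
print(round(w, 3))
```

The damping term $\theta\,Dw$ removes kinetic energy until the trajectory rests at the minimum of the potential, which is the dissipative picture of learning described above.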

Even-order γ sign flip.

Let us consider the case $T = \alpha_0 + \alpha_1 D + \alpha_2 D^2$ and $\psi(t) = e^{\theta t}$. From Proposition 3.1, we have $T^\star = \alpha_0 - \alpha_1 D + \alpha_2 D^2$, and we can easily see that the EL-equations become

$$D^4 w_i + \beta_3 D^3 w_i + \beta_2 D^2 w_i + \beta_1 D w_i + \beta_0 w_i + \frac{\gamma}{\alpha_2^2\, m_i}\, V_{w_i} = 0, \tag{11}$$

where

$$\beta_0 := \frac{\alpha_0\alpha_2\theta^2 - \alpha_0\alpha_1\theta + \alpha_0^2}{\alpha_2^2}, \qquad \beta_1 := \frac{\alpha_1\alpha_2\theta^2 + (2\alpha_0\alpha_2 - \alpha_1^2)\,\theta}{\alpha_2^2}, \qquad \beta_2 := \frac{\alpha_2^2\theta^2 + \alpha_1\alpha_2\theta + 2\alpha_0\alpha_2 - \alpha_1^2}{\alpha_2^2}, \qquad \beta_3 := 2\theta.$$

The comparison with equation (10) reveals an interesting difference, which involves the role of the sign of $\gamma$ in stability. While in mechanics we have $\gamma = -1$, when involving the second-order differential operator of this example, in order to get stability, from the Routh–Hurwitz stability criterion we immediately conclude that $\gamma > 0$ is a necessary condition. This can be straightforwardly extended to any even order of $T$. In this paper, this property is referred to as the $\gamma$ sign flip, and it motivates the study of high-order operators.

4. Dissipative Hamiltonian formulation of learning

Now we use the Legendre transform in order to obtain the Hamiltonian formulation of the problem. We introduce the notations $\dot f_i := T_i f_i$ and $\mathring f_i := T_i^\star f_i$, so that equations (8) can be re-written as

$$F_{w_i} + \mathring F_{\dot w_i} = 0, \qquad i = 1, 2, \ldots, n. \tag{12}$$

Now, let us introduce the conjugate momentum $\upsilon_i$,

$$\upsilon_i := F_{\dot w_i}. \tag{13}$$

Likewise, we define the Hamiltonian as

$$H(t, w, \upsilon) := \sum_{i=1}^{n} \upsilon_i\, \dot w_i - F(t, w, \dot w), \tag{14}$$

where the $\dot w_i$ must be thought of as functions of $w_i$ and $\upsilon_i$.

Theorem 4.1. The Euler–Lagrange equations (8) are equivalent to the $2n$ Hamilton equations

$$\dot w_i = \frac{\partial H}{\partial \upsilon_i}, \tag{15}$$

$$\mathring \upsilon_i = \frac{\partial H}{\partial w_i}. \tag{16}$$

Moreover, we have

$$\frac{\partial H}{\partial t} = -\frac{\partial F}{\partial t}. \tag{17}$$

Proof. As usual, we introduce the canonical variables $w_i$ and $\upsilon_i$. Then the canonical equations arise from the EL-equations (8) and from definition (13), as follows:

$$\frac{\partial H}{\partial \upsilon_i} = \frac{\partial}{\partial \upsilon_i}\Big(\sum_{j=1}^{n} \upsilon_j \dot w_j - F(t,w,\dot w)\Big) = \dot w_i + \upsilon_i \frac{\partial \dot w_i}{\partial \upsilon_i} - F_{w_i}\frac{\partial w_i}{\partial \upsilon_i} - F_{\dot w_i}\frac{\partial \dot w_i}{\partial \upsilon_i} = \dot w_i,$$

$$\frac{\partial H}{\partial w_i} = \frac{\partial}{\partial w_i}\Big(\sum_{j=1}^{n} \upsilon_j \dot w_j - F(t,w,\dot w)\Big) = \dot w_i \frac{\partial \upsilon_i}{\partial w_i} + \upsilon_i \frac{\partial \dot w_i}{\partial w_i} - F_{w_i} - F_{\dot w_i}\frac{\partial \dot w_i}{\partial w_i} = -F_{w_i} = T^\star F_{\dot w_i} = \mathring \upsilon_i.$$

Finally, equation (17) arises from (14) as follows:

$$\frac{\partial H}{\partial t} = \frac{\partial}{\partial t}\Big(\sum_{i=1}^{n} \upsilon_i \dot w_i - F(t,w,\dot w)\Big) = \sum_{i=1}^{n} \upsilon_i \frac{\partial \dot w_i}{\partial t} - \sum_{i=1}^{n} F_{\dot w_i}\frac{\partial \dot w_i}{\partial t} - \frac{\partial F}{\partial t} = -\frac{\partial F}{\partial t}. \qquad \Box$$
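In the time-invariant, non-dissipative case ($\psi \equiv 1$, $T = D$, so $T^\star = -D$ and (16) reads $D\upsilon_i = -\partial H/\partial w_i$), equation (17) implies that $H$ is constant along trajectories. A quick numerical check on a unit-mass quadratic system (an illustrative choice) with a symplectic Euler integrator:

```python
# Hamilton equations (15)-(16) for the illustrative H = 0.5*v**2 + 0.5*w**2:
# Dw = dH/dv = v and Dv = -dH/dw = -w.  Symplectic Euler keeps H nearly constant.
w, v = 1.0, 0.0
dt = 1e-3
H0 = 0.5 * v * v + 0.5 * w * w
for _ in range(100000):    # integrate up to t = 100
    v -= dt * w            # momentum update first
    w += dt * v            # then position update
H1 = 0.5 * v * v + 0.5 * w * w
print(abs(H1 - H0))
```

The energy drift stays bounded over long horizons, in contrast with the dissipative case $\psi = e^{\theta t}$, where $H$ decreases along the trajectory.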

The integration of the Hamilton equations (15) and (16) can be written formally as

$$w_i = T_i^{-1} H_{\upsilon_i}, \qquad \upsilon_i = T_i^{\star-1} H_{w_i}, \tag{18}$$

where $T_i^{-1}$ and $T_i^{\star-1}$ denote the inverse operators of $T_i$ and $T_i^\star$, respectively. Notice that whenever we use the inverse operator, we need to solve an ordinary differential equation of order $\ell$ and, therefore, we must always impose $\ell$ initial conditions to get a unique solution. While equations (15) and (16) can be used for determining the evolution of $w$, as it will be shown in Section 6, equation (15) can offer a picture of learning in terms of energy balance. The following definition plays a fundamental role in the analysis of energy balance. The definition involves the set of variables $\mathcal{N} := \{(w_i, \upsilon_i),\ i \in \mathbb{N}_n\}$ paired with the associated differential operator $T$.

Definition 4.1. Given the system $\Sigma \sim (\mathcal{N}, T)$, along with any pair of functions $\varphi_1(t, w, \upsilon)$ and $\varphi_2(t, w, \upsilon)$ with continuous derivatives up to the $\ell$-th order, the term

$$\{\varphi_1, \varphi_2\}^{D}_{T} := \sum_{i=1}^{n}\left(\frac{\partial \varphi_1}{\partial w_i}\, D\,T^{-1} \frac{\partial \varphi_2}{\partial \upsilon_i} + \frac{\partial \varphi_1}{\partial \upsilon_i}\, D\,T^{\star-1} \frac{\partial \varphi_2}{\partial w_i}\right) \tag{19}$$

is referred to as the bracket of $\varphi_1$ and $\varphi_2$, with respect to the operators $D$ and $T$.

For the sake of simplicity, in the remainder of the paper, we will omit the subscript and the superscript, so that we will simply use the notation $\{\varphi_1, \varphi_2\}$. We also notice that this can be regarded as a formal definition, unless we know (as it will be in what follows) the initial conditions for $T^{-1}\,\partial\varphi_2/\partial\upsilon_i$ and $T^{\star-1}\,\partial\varphi_2/\partial w_i$.

We can easily see that in general $\{\varphi_1, \varphi_2\} \neq \{\varphi_2, \varphi_1\}$ and that $\{\varphi_1, \varphi_2\}$ is linear only in the first argument.

Theorem 4.2. Given the system defined by $w: [t_0, t_1] \to \mathbb{R}^n$ and any $\varphi(t, w, \upsilon)$ with continuous derivatives up to the $\ell$-th order, we have

$$\frac{d\varphi}{dt} = \frac{\partial \varphi}{\partial t} + \{\varphi, H\}. \tag{20}$$

Proof. We have

$$\frac{d\varphi}{dt} = \frac{\partial \varphi}{\partial t} + \sum_{i=1}^{n}\big(\varphi_{w_i}\, D w_i + \varphi_{\upsilon_i}\, D \upsilon_i\big). \tag{21}$$

If we plug $w_i$ and $\upsilon_i$ given by equation (18) into equation (21), we get

$$\frac{d\varphi}{dt} = \frac{\partial \varphi}{\partial t} + \sum_{i=1}^{n}\left(\varphi_{w_i}\, D\,T^{-1} H_{\upsilon_i} + \varphi_{\upsilon_i}\, D\,T^{\star-1} H_{w_i}\right) = \frac{\partial \varphi}{\partial t} + \{\varphi, H\}. \qquad \Box$$

5. BG-brackets

In this section we describe some properties of the brackets $\{\cdot,\cdot\}$ introduced by Definition 4.1. Their analysis leads to an in-depth understanding of energy exchange processes. We start by noticing that if $\forall i = 1, \ldots, n$: $T_i = D$, then $\{\cdot,\cdot\}$ reduces to the classic Poisson brackets. This is formally stated by the following proposition.

Proposition 5.1. The brackets $\{\cdot,\cdot\}$ are a generalization of Poisson brackets, which are recovered in the case in which $\forall i = 1, \ldots, n$: $T_i = D$, that is $\{\varphi_1, \varphi_2\} = \{\varphi_1, \varphi_2\}_{P.B.}$.

Proof. From Proposition 3.1, under the assumption that $\forall i = 1, \ldots, n$: $T_i = D$, we derive that $T_i^\star = -D$. Hence, we have

$$\{\varphi_1, \varphi_2\} = \sum_{i=1}^{n}\left(\frac{\partial \varphi_1}{\partial w_i}\, D D^{-1} \frac{\partial \varphi_2}{\partial \upsilon_i} - \frac{\partial \varphi_1}{\partial \upsilon_i}\, D D^{-1} \frac{\partial \varphi_2}{\partial w_i}\right) = \sum_{i=1}^{n}\left(\frac{\partial \varphi_1}{\partial w_i} \frac{\partial \varphi_2}{\partial \upsilon_i} - \frac{\partial \varphi_1}{\partial \upsilon_i} \frac{\partial \varphi_2}{\partial w_i}\right) = \{\varphi_1, \varphi_2\}_{P.B.} \qquad \Box$$

Rest of the adjoint

Now, we focus on the case³ $T \neq D$. Interestingly, we will show that the enrichment of the differential operator $T$ with respect to $D$ has important consequences on the energy balance equation (20). While from Proposition 5.1 we have $\{H, H\} = 0$, it will be shown that $T \neq D$ in general leads to $\{H, H\} \neq 0$. Let

$$\langle w, \upsilon \rangle := \int_{t_0}^{t_1} w(\tau)\,\upsilon(\tau)\,d\tau \tag{22}$$

be. We introduce a couple of concepts that turn out to be useful in the following. First, we notice that any differential operator $T$ can be paired with a real number, which comes out when considering the behavior on the borders of $\langle Tw, \upsilon \rangle$.

Definition 5.1. Let $T$ be a differential operator and let $\langle \cdot, \cdot \rangle$ be the inner product given by (22). Then for any given pair of functions $f, g \in W^{\ell,2}$,

$$(T \,|\, f, g) := \langle Tf, g \rangle - \langle f, T^\star g \rangle \tag{23}$$

is referred to as the rest of the adjoint of $T$ with respect to $f$ and $g$.

Clearly, $(\cdot\,|\,\cdot,\cdot)$ is linear in each of its arguments, and it is distributive in both the operator and the two function slots. Moreover, we have that $(c \cdot I \,|\, f, g) \equiv 0$ for any constant $c$. Notice that $(T \,|\, f, g) = 0$ whenever we are given the same boundary conditions on $f$ and $g$.

Proposition 5.2. Let $T_m = \sum_{i=0}^{m} \alpha_i D^i$, where the $\alpha_i$ are real constants. Then the rest of the adjoint $(T_m \,|\, w, \upsilon)$ w.r.t. $w$ and $\upsilon$ can be computed by the recurrent equations

i. $(T_m \,|\, w, \upsilon) = (T_{m-1} \,|\, w, \upsilon) + \alpha_m (D^m \,|\, w, \upsilon)$,
ii. $(D^m \,|\, w, \upsilon) = [\upsilon\, D^{m-1} w] - (D^{m-1} \,|\, w, D\upsilon)$, (24)

where $[u] := u(t_1) - u(t_0)$ is the variation on the borders of $u$.

³ For the sake of simplicity, in the remainder of the paper, we assume $\forall i = 1, \ldots, n$: $T_i = T$.

Proof. i. We have

$$(T_m \,|\, w, \upsilon) = \langle T_{m-1} w + \alpha_m D^m w,\ \upsilon \rangle - \langle w,\ T_{m-1}^\star \upsilon + \alpha_m (D^m)^\star \upsilon \rangle = (T_{m-1} \,|\, w, \upsilon) + \alpha_m\big(\langle D^m w, \upsilon\rangle - \langle w, (D^m)^\star \upsilon\rangle\big) = (T_{m-1} \,|\, w, \upsilon) + \alpha_m (D^m \,|\, w, \upsilon).$$

ii. In order to get the recurrent equation ii., we start using integration by parts, as follows:

$$(D^m \,|\, w, \upsilon) = \langle D^m w, \upsilon \rangle - \langle w, (D^m)^\star \upsilon \rangle = \langle D^m w, \upsilon \rangle - (-1)^m \langle w, D^m \upsilon \rangle = [\upsilon\, D^{m-1} w] - \langle D^{m-1} w, D\upsilon \rangle - (-1)^m \langle w, D^m \upsilon \rangle.$$

Now we have

$$(D^{m-1} \,|\, w, D\upsilon) = \langle D^{m-1} w, D\upsilon \rangle - (-1)^{m-1} \langle w, D^{m-1} D\upsilon \rangle = \langle D^{m-1} w, D\upsilon \rangle + (-1)^m \langle w, D^m \upsilon \rangle,$$

and, therefore, we get

$$(D^m \,|\, w, \upsilon) = [\upsilon\, D^{m-1} w] - (D^{m-1} \,|\, w, D\upsilon). \qquad \Box$$

Example 5.1. Let $T = \alpha_0 + \alpha_1 D + \alpha_2 D^2$. To calculate the rest of the adjoint we use Proposition 5.2. From equation (24)-ii we can calculate $(D^m \,|\, w, \upsilon)$ for $m = 1, 2$. First, notice that $(D^0 \,|\, w, \upsilon) = 0$. Then, for $D$ we have

$$(D \,|\, w, \upsilon) = [\upsilon w] - (D^0 \,|\, w, D\upsilon) = [\upsilon w].$$

Now we can calculate $(D^2 \,|\, w, \upsilon)$:

$$(D^2 \,|\, w, \upsilon) = [\upsilon\, Dw] - (D \,|\, w, D\upsilon) = [\upsilon\, Dw] - [w\, D\upsilon] = [\upsilon\, Dw - w\, D\upsilon].$$

Finally, from equation (24)-i, we get

$$(T \,|\, w, \upsilon) = [\alpha_1 w \upsilon + \alpha_2 (\upsilon\, Dw - w\, D\upsilon)]. \tag{25}$$
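Equation (25) can be verified numerically: for constant coefficients, $\langle Tw, \upsilon\rangle - \langle w, T^\star \upsilon\rangle$ must equal the boundary term on the right-hand side. The grid and the test functions below are illustrative assumptions:

```python
import math

# Numerical check of equation (25) for T = a0 + a1*D + a2*D^2 with constant
# coefficients: <Tw, v> - <w, T*v> = [a1*w*v + a2*(v*Dw - w*Dv)] between the
# borders of [0, 1].  Coefficients and test functions are illustrative choices.
a0, a1, a2 = 0.7, -1.3, 0.4
N = 20001
h = 1.0 / (N - 1)
t = [i * h for i in range(N)]
w = [math.sin(2 * x) for x in t]
v = [math.exp(-x) for x in t]

def D(u):
    r = [0.0] * N
    for i in range(1, N - 1):
        r[i] = (u[i + 1] - u[i - 1]) / (2 * h)
    r[0] = (-3 * u[0] + 4 * u[1] - u[2]) / (2 * h)
    r[-1] = (3 * u[-1] - 4 * u[-2] + u[-3]) / (2 * h)
    return r

def inner(x, y):
    s = sum(p * q for p, q in zip(x, y))
    return h * (s - 0.5 * (x[0] * y[0] + x[-1] * y[-1]))

Dw, Dv = D(w), D(v)
D2w, D2v = D(Dw), D(Dv)
Tw = [a0 * w[i] + a1 * Dw[i] + a2 * D2w[i] for i in range(N)]
Tsv = [a0 * v[i] - a1 * Dv[i] + a2 * D2v[i] for i in range(N)]  # T* = a0 - a1*D + a2*D^2

rest = inner(Tw, v) - inner(w, Tsv)
bnd = lambda i: a1 * w[i] * v[i] + a2 * (v[i] * Dw[i] - w[i] * Dv[i])
print(abs(rest - (bnd(N - 1) - bnd(0))))
```

Note that the even-order term $\alpha_0$ contributes nothing to the rest, while the odd-order term $\alpha_1$ contributes a pure boundary product, as predicted by the recurrence.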

The notion of rest of the adjoint plays an important role in the remainder of the paper, where we are interested in the following extended definition, which involves the learning system $\Sigma \sim (\mathcal{N}, T)$.

Definition 5.2. Given $\Sigma \sim (\mathcal{N}, T)$, we define

$$M(\Sigma, [t_0, t_1]) := \sum_{i=1}^{n} (T \,|\, w_i, D\upsilon_i) + (D \,|\, w_i, T^\star \upsilon_i). \tag{26}$$

$M(\Sigma, [t_0, t_1])$ is referred to as the rest of $T$ with respect to $\mathcal{N}$.


Lemma 5.1.

$$M(\Sigma, [t_0, t_1]) = \sum_{i=1}^{n} \big\langle (T^\star + D)\, w_i,\ (T^\star + D)\, \upsilon_i \big\rangle. \tag{27}$$

Proof. The proof is a straightforward consequence of the definition. □

First, we notice that if $T = D$ then $T^\star + D = 0$ and, consequently, $M(\Sigma, [t_0, t_1]) = 0$. Clearly, the lemma enlightens the way $M(\Sigma, [t_0, t_1])$ emerges from breaking the adjoint property $T^\star + D = 0$, which holds for $T = D$.

In the case of Example 5.1, using equation (25) we get

$$M(\Sigma, [t_0, t_1]) = \sum_{i=1}^{n} (T \,|\, w_i, D\upsilon_i) + (D \,|\, w_i, T^\star \upsilon_i) = \sum_{i=1}^{n} [\alpha_1 w_i\, D\upsilon_i + \alpha_2 (D\upsilon_i\, Dw_i - w_i\, D^2 \upsilon_i)] + [w_i(\alpha_0 \upsilon_i - \alpha_1 D\upsilon_i + \alpha_2 D^2 \upsilon_i)] = \sum_{i=1}^{n} [\alpha_0\, w_i \upsilon_i + \alpha_2\, Dw_i\, D\upsilon_i].$$

This example suggests that $M(\Sigma, [t_0, t_1])$ depends on the value assumed by

$$B(t, w, Dw, D\upsilon) = \sum_{i=1}^{n} \alpha_0\, w_i \upsilon_i + \alpha_2\, Dw_i\, D\upsilon_i$$

on the boundaries of $[t_0, t_1]$. We have

$$M(\Sigma, [t_0, t_1]) = B(t_1, w, Dw, D\upsilon) - B(t_0, w, Dw, D\upsilon).$$

Clearly, this holds for $\Sigma$ with any differential operator $T$. As a consequence, whenever we know that $\Sigma$ exhibits a periodic behavior on $[t_0, t_1]$, then $M(\Sigma, [t_0, t_1]) = 0$.

Now we show a property of $M$ which leads to a clear interpretation in terms of the energy exchanges of the agent with the environment.

Theorem 5.1. Let $T$ be a positive self-adjoint operator with constant coefficients, and let us consider a system $\Sigma$ such that $\upsilon_i = T w_i$. Then $M(\Sigma, [t_0, t_1]) \ge 0$.

Proof. From the hypothesis, $T^\star = T$. Then, from the definition of $M$ we get

$$M(\Sigma, [t_0, t_1]) = \sum_{i=1}^{n} (T \,|\, w_i, D\upsilon_i) + (D \,|\, w_i, T\upsilon_i) = \sum_{i=1}^{n} \big(\langle T w_i, D\upsilon_i \rangle - \langle w_i, T D \upsilon_i \rangle + \langle D w_i, T \upsilon_i \rangle + \langle w_i, D T \upsilon_i \rangle\big) = \sum_{i=1}^{n} \big(\langle T w_i, D\upsilon_i \rangle + \langle D w_i, T \upsilon_i \rangle\big) = \sum_{i=1}^{n} \big\langle (T + D)\, w_i,\ (T + D)\, \upsilon_i \big\rangle.$$

Since $\upsilon_i = T w_i$ and $T \ge 0$, we get

$$M(\Sigma, [t_0, t_1]) = \sum_{i=1}^{n} \big\langle (T + D)\, w_i,\ T (T + D)\, w_i \big\rangle = \sum_{i=1}^{n} \langle u_i, T u_i \rangle \ge 0,$$

where $u_i := (T + D)\, w_i$. □


T–D commutator

Another important asymmetry arises, which plays an important role in the case of time-dependent differential operators. The following definition turns out to be useful.

Definition 5.3. Given any two differential operators $T_1$ and $T_2$, we define

$$[T_1, T_2] := T_1 T_2 - T_2 T_1. \tag{28}$$

The differential operator $[T_1, T_2]$ is referred to as the commutator of $T_1$ and $T_2$.

Of course, we have $[T_1, T_2] + [T_2, T_1] = 0$. In particular, in the following, we are interested in the case in which $T_1 = T$ and $T_2 = D$. In that case

$$T D y - D T y = \sum_{i=0}^{\ell} \alpha_i D^i D y - D \sum_{i=0}^{\ell} \alpha_i D^i y = \sum_{i=0}^{\ell} \alpha_i D^{i+1} y - \sum_{i=0}^{\ell} \big(\alpha_i D^{i+1} y + (D\alpha_i)\, D^i y\big) = -\sum_{i=0}^{\ell} (D\alpha_i)\, D^i y \equiv -\sum_{i=0}^{\ell} \beta_i D^i y. \tag{29}$$
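The identity (29) can be checked numerically at an interior point for the first-order case $T = \alpha(t) D$, where it reduces to $[T, D]\, y = -(D\alpha)\, Dy$. The coefficient $\alpha$ and the function $y$ below are illustrative assumptions:

```python
import math

# Numerical check of equation (29) for T = alpha(t)*D: at interior points,
# (T D - D T) y = -(D alpha) * D y.  The choices of alpha and y are illustrative.
N = 20001
h = 1.0 / (N - 1)
t = [i * h for i in range(N)]
alpha = [1.0 + x * x for x in t]
y = [math.sin(3 * x) for x in t]

def D(u):
    r = [0.0] * N
    for i in range(1, N - 1):
        r[i] = (u[i + 1] - u[i - 1]) / (2 * h)
    r[0], r[-1] = r[1], r[-2]   # borders are not used in the interior check
    return r

Dy = D(y)
DDy = D(Dy)
Ty = [alpha[i] * Dy[i] for i in range(N)]
DTy = D(Ty)
Dalpha = D(alpha)

mid = N // 2
lhs = alpha[mid] * DDy[mid] - DTy[mid]   # ([T, D] y)(t_mid)
rhs = -Dalpha[mid] * Dy[mid]
print(abs(lhs - rhs))
```

For a constant coefficient the right-hand side vanishes identically, which is the symmetry case discussed next.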

Clearly, $[T, D] = 0$ for constant $\alpha_i$ coefficients, whereas, in general, $[T, D]$ is a differential operator defined by the coefficients $\beta_i = D\alpha_i$. Like for the rest of the adjoint, it turns out to be useful to extend the notion of commutator to the case of a set of conjugate variables $\mathcal{N}$ for the differential operators $T$ and $D$.

Definition 5.4. Let us consider the system $\Sigma = \{\mathcal{N}, T\}$ over $[t_0, t_1]$. Then

$$\Upsilon(\Sigma, [t_0, t_1]) := \sum_{i=1}^{n} \big\langle w_i,\ [T^\star, D]\, \upsilon_i \big\rangle \tag{30}$$

is referred to as the T–D commutator of $\mathcal{N}$.

Again, the T–D commutator expresses the degree of asymmetry between $T$ and $D$. We have symmetry, $[T^\star, D] = T^\star D - D T^\star = 0$, whenever $T$ is a time-invariant differential operator, whereas the asymmetry arises just because of the temporal evolution of the coefficients $\alpha_i$. In order to give an interpretation of $\Upsilon(\Sigma, [t_0, t_1])$, let us consider the case in which $T$ is replaced with $\varphi T$, where $\varphi$ is a positive increasing monotonic function. This case has already been mentioned in Section 2, with $\varphi(t) = \sqrt{\psi(t)}$. We discuss the basic idea in the simplest case, in which $T = \varphi D$.

Theorem 5.2. Let us assume that $M(\Sigma,[t_0,t_1]) = 0$. If $T = \varphi D$ then
$$\Upsilon(\Sigma,[t_0,t_1]) = -\sum_{i=1}^{n} m_i \int_{t_0}^{t_1} (D w_i)^{2}\,\varphi\, D\varphi \; dt. \tag{31}$$

Proof. First, we can easily see that
$$T^{\dagger} = -D\varphi \cdot D^{0} - \varphi D,$$
where $D^{0} = I$ is the identity operator. Now we can calculate
$$[T^{\dagger}, D] = T^{\dagger} D - D T^{\dagger} = (-D\varphi \cdot I - \varphi D)\, D - D\,(-D\varphi \cdot I - \varphi D) = D^{2}\varphi \cdot I + D\varphi \cdot D.$$
Now, since $M(\Sigma,[t_0,t_1]) = 0$ and $\upsilon_i = m_i T w_i = m_i \varphi D w_i$, we have
$$
\Upsilon(\Sigma,[t_0,t_1]) = \sum_{i=1}^{n}\int_{t_0}^{t_1} w_i\,\big(\upsilon_i D^{2}\varphi + D\varphi\, D\upsilon_i\big)\, dt
= \sum_{i=1}^{n}\int_{t_0}^{t_1} w_i\, D(\upsilon_i D\varphi)\, dt
= -\sum_{i=1}^{n}\int_{t_0}^{t_1} \upsilon_i\, D w_i\, D\varphi\; dt
= -\sum_{i=1}^{n} m_i \int_{t_0}^{t_1} D\varphi\,(D w_i)^{2}\,\varphi\; dt. \;\square
$$
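The two operator identities used in this proof can be checked mechanically. The following sketch is our own (it uses `sympy`; the names `phi`, `u`, `y` are arbitrary): it verifies that $T^{\dagger} = -(D\varphi)\cdot I - \varphi D$ is adjoint to $T = \varphi D$ up to a total-derivative (boundary) term, and that $[T^{\dagger}, D] = D^{2}\varphi\cdot I + D\varphi\cdot D$:

```python
# Symbolic check of the proof's two computations for T = phi*D:
#   (1) u*T(y) - Td(u)*y is a total derivative (adjointness up to boundary),
#   (2) [Td, D] y = (phi'' )*y + (phi')*y'.
import sympy as sp

t = sp.symbols('t')
y, phi, u = (sp.Function(n)(t) for n in ('y', 'phi', 'u'))
D = lambda f: sp.diff(f, t)

T  = lambda f: phi * D(f)                 # T = phi*D
Td = lambda f: -D(phi) * f - phi * D(f)   # T-dagger = -(D phi)*I - phi*D

# (1) the integrand u*T(y) - Td(u)*y equals d/dt (phi*u*y):
integrand = u * T(y) - Td(u) * y
assert sp.simplify(integrand - D(phi * u * y)) == 0

# (2) the commutator [Td, D] applied to y:
comm = sp.simplify(Td(D(y)) - D(Td(y)))
assert sp.simplify(comm - (D(D(phi)) * y + D(phi) * D(y))) == 0
print("adjoint and commutator identities verified")
```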

Let us consider the case in which $\psi(t) = \varphi^{2}(t) = e^{\theta t}$. We get
$$-\Upsilon(\Sigma,[t_0,t_1]) = \sum_{i=1}^{n} m_i \int_{t_0}^{t_1} D\varphi\,(D w_i)^{2}\,\varphi\; dt = \frac{1}{2}\sum_{i=1}^{n} m_i\,\theta \int_{t_0}^{t_1} (D w_i)^{2}\,\psi\; dt > 0.$$
This offers a clear interpretation of $-\Upsilon(\Sigma,[t_0,t_1])$, which turns out to be the dissipated energy of $\Sigma$ over $[t_0,t_1]$. Notice, however, that this interpretation is limited to the case $D\varphi > 0$.

Example 5.2. Let $T = \sin(\omega t)\cdot D$. We have
$$T^{\dagger} = -\omega\cos(\omega t)\cdot I - \sin(\omega t)\, D$$
and
$$[T^{\dagger}, D] = -\omega^{2}\sin(\omega t)\cdot I + \omega\cos(\omega t)\, D.$$
Under the assumption that $M(\Sigma,[t_0,t_1]) = 0$ and $\upsilon_i = m_i T w_i = m_i \sin(\omega t)\cdot D w_i$, we have
$$
\Upsilon(\Sigma,[t_0,t_1]) = \sum_{i=1}^{n}\int_{t_0}^{t_1} w_i\, D\big(\omega\cos(\omega t)\,\upsilon_i\big)\, dt
= -\sum_{i=1}^{n}\int_{t_0}^{t_1} \omega\cos(\omega t)\,\upsilon_i\, D w_i\; dt
= -\frac{1}{2}\sum_{i=1}^{n} m_i\,\omega \int_{t_0}^{t_1} \sin(2\omega t)\,(D w_i)^{2}\, dt.
$$
This example clearly shows that the sign of $\Upsilon(\Sigma,[t_0,t_1])$ can flip with functions $\psi(\cdot)$ that are not monotonic. Hence, $\Upsilon(\Sigma,[t_0,t_1])$ models interactions of the agent with the environment in which the energy flows in both directions.
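The sign flip in Example 5.2 is easy to see numerically. The sketch below is our own illustration (the choices of a single component with unit mass, $\omega = 1$, and $w(t) = t$, so that $Dw \equiv 1$, are arbitrary); it evaluates the closed-form expression $-\tfrac{\omega}{2}\, m \int \sin(2\omega t)\,(Dw)^{2}\, dt$ over two adjacent quarter-periods:

```python
# Numerical illustration of the sign flip of Upsilon in Example 5.2,
# with one unit-mass component, omega = 1, and w(t) = t (so Dw == 1).
import numpy as np

omega, m = 1.0, 1.0

def upsilon(t0, t1, n=200_001):
    """-(omega/2) * m * integral of sin(2*omega*t) * (Dw)^2 over [t0, t1],
    computed with the composite trapezoidal rule."""
    t = np.linspace(t0, t1, n)
    f = np.sin(2.0 * omega * t) * 1.0**2   # (Dw)^2 == 1
    dt = t[1] - t[0]
    integral = (0.5 * (f[0] + f[-1]) + f[1:-1].sum()) * dt
    return -0.5 * m * omega * integral

print(upsilon(0.0, np.pi / 2))    # negative: energy is dissipated
print(upsilon(np.pi / 2, np.pi))  # positive: energy flows back to the agent
```

Over $[0, \pi/2]$ the integrand $\sin(2t)$ is positive and $\Upsilon < 0$; over $[\pi/2, \pi]$ it is negative and $\Upsilon > 0$, matching the two directions of energy flow described above.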

The $\{\cdot,\cdot\}$ brackets assume a special meaning when involving the Hamiltonian of a given system.⁴

Theorem 5.3. Given the system $\Sigma$ over $[t_0,t_1]$, then
$$\{H, H\} = \frac{d}{dt}\,(M + \Upsilon). \tag{32}$$

Proof. Because of the Hamilton equations, we have
$$
\{H,H\} = \sum_{i=1}^{n}\Big(H_{w_i}\, D T^{-1} H_{\upsilon_i} + H_{\upsilon_i}\, D T^{\dagger\,-1} H_{w_i}\Big)
= \sum_{i=1}^{n}\Big(T^{\dagger}\upsilon_i\, D T^{-1}\, T w_i + T w_i\, D T^{\dagger\,-1}\, T^{\dagger}\upsilon_i\Big)
= \sum_{i=1}^{n}\Big(T^{\dagger}\upsilon_i \cdot D w_i + T w_i \cdot D\upsilon_i\Big).
$$
Now, if we integrate over $[t_0, t]$ and use the definition of the rest of the adjoint and of the commutator, we get
$$
\int_{t_0}^{t}\{H,H\}\, d\tau = \sum_{i=1}^{n}\int_{t_0}^{t}\big(T^{\dagger}\upsilon_i \cdot D w_i + T w_i \cdot D\upsilon_i\big)\, d\tau
= \sum_{i=1}^{n}\Big[\langle T^{\dagger}\upsilon_i, D w_i\rangle + \langle T w_i, D\upsilon_i\rangle\Big]
$$
$$
= \sum_{i=1}^{n}\Big[(D^{\dagger}|\,w_i,\, T^{\dagger}\upsilon_i) - \langle D T^{\dagger}\upsilon_i, w_i\rangle + (T^{\dagger}|\,w_i,\, D\upsilon_i) + \langle w_i, T^{\dagger} D\upsilon_i\rangle\Big]
$$
$$
= \sum_{i=1}^{n}\Big[(D^{\dagger}|\,w_i,\, T^{\dagger}\upsilon_i) + (T^{\dagger}|\,w_i,\, D\upsilon_i)\Big] + \sum_{i=1}^{n}\big\langle w_i,\,(T^{\dagger} D - D T^{\dagger})\,\upsilon_i\big\rangle
= M + \Upsilon.
$$
Finally, we get the thesis by differentiating both sides with respect to $t$. $\square$

⁴ Whenever there is no risk of ambiguity, in the following we drop the dependence on $(\Sigma, [t_0, t_1])$.

This result shows that the $\{\cdot,\cdot\}$ brackets assume a special meaning in the case in which both operands are the Hamiltonian function. In particular, in that case, there is a Boundary Generation, expressed by $M(\Sigma,[t_0,t_1])$ and due to the rest of the adjoint, and a Bartering Generation, expressed by $\Upsilon(\Sigma,[t_0,t_1])$, which is due to the exchange of the operators $T$ and $D$. For this reason, $\{\cdot,\cdot\}$ are referred to as the BG-brackets.

6. Energy balance in learning processes

In this section we discuss the energy balance connected with the general form of the cognitive action, whose Lagrangian is given by equation (3). While equations (15) and (16) can be used to determine the learning dynamics, equation (17) offers an interesting view of learning in terms of energy balance. Notice that the equation involves the canonical variables $w$ and $\upsilon$ in $H$, as well as $\dot{w}$ in $F$. An alternative way of constructing the energy balance based only on the canonical variables is to express $\{H, H\}$ according to Theorem 5.3. We have

$$\frac{dH}{dt} = \frac{dK}{dt} - \gamma\frac{dV}{dt} = \frac{\partial H}{\partial t} + \{H, H\} = -\gamma\frac{\partial V}{\partial t} + \frac{\partial K}{\partial t} + \frac{d}{dt}\,(M + \Upsilon).$$

We also have
$$\frac{dV}{dt} = \frac{\partial V}{\partial t} + \{V, H\}, \qquad \frac{dK}{dt} = \frac{\partial K}{\partial t} + \{K, H\},$$
which suggests the introduction of the functions $\mathcal{V}$ and $\mathcal{K}$ defined by
$$\frac{d\mathcal{V}}{dt} = \{V, H\} = \frac{dV}{dt} - \frac{\partial V}{\partial t}, \tag{33}$$
$$\frac{d\mathcal{K}}{dt} = \{K, H\} = \frac{dK}{dt} - \frac{\partial K}{\partial t}. \tag{34}$$
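The invariant stated in Theorem 6.1 follows in two lines from the balance above; a sketch of the derivation (our own restatement, combining the $dH/dt$ balance with definitions (33) and (34)):

```latex
% Sketch: deriving the invariant (35) from the energy balance and (33)-(34).
\frac{dK}{dt}-\gamma\frac{dV}{dt}
  = \frac{\partial K}{\partial t}-\gamma\frac{\partial V}{\partial t}
    +\frac{d}{dt}(M+\Upsilon)
\;\Longrightarrow\;
\underbrace{\Big(\frac{dK}{dt}-\frac{\partial K}{\partial t}\Big)}_{d\mathcal{K}/dt}
 -\gamma\underbrace{\Big(\frac{dV}{dt}-\frac{\partial V}{\partial t}\Big)}_{d\mathcal{V}/dt}
 -\frac{d}{dt}(M+\Upsilon)=0
\;\Longrightarrow\;
\frac{d}{dt}\big(\mathcal{K}-\gamma\mathcal{V}-M-\Upsilon\big)=0.
```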

Finally, the analysis is summarized by the following theorem, which establishes the energy balance of any learning system. Unlike equation (17), the given energy balance is fully expressed in terms of the canonical variables $w$ and $\upsilon$.

Theorem 6.1. The system $(\Sigma, T)$ evolves according to the invariant condition
$$\frac{d}{dt}\,\big(\mathcal{K} - \gamma\mathcal{V} - M - \Upsilon\big) = 0. \tag{35}$$

An important source of interaction is represented by the environmental potential energy
$$E = E(\Sigma,[t_0,t]) := \int_{t_0}^{t} \frac{\partial V}{\partial \tau}\; d\tau,$$
and by the bartering energy $\Upsilon$, which appears whenever $T$ does not commute with $D$. While $\mathcal{K}$ reflects the velocity of the connection dynamics, the boundary energy $M$ takes into account the exchange of energy at the beginning and at the current time of the learning horizon.

Damping oscillators

In order to provide an interpretation of analytic mechanics as a special case, let us consider $T = D$ and $\gamma = -1$. Basically, we are considering the Lagrangian which originated the damping oscillator of equation (10). From the definition (30) of $\Upsilon$ we have $[T^{\dagger}, D] = [-D, D] = 0$ and, therefore, $\Upsilon = 0$. Likewise, from Lemma 5.1, since $D + T^{\dagger} = 0$, we conclude that $M = 0$. As a consequence, the invariant of Theorem 6.1, expressed in terms of canonical variables, yields
$$\frac{d}{dt}\,(\mathcal{K} + \mathcal{V}) = 0. \tag{36}$$

This is in fact the classic invariant condition of mechanics, which establishes that the sum of the kinetic and potential energies is invariant. Interestingly, this also holds in cases in which the kinetic energy $K$ and the potential $V$ are
