Theoretical Computer Science
Contents lists available at ScienceDirect
www.elsevier.com/locate/tcs
The principle of least cognitive action

Alessandro Betti a, Marco Gori b,∗

a Department of Physics, University of Pisa, Italy
b Department of Information Engineering and Mathematics, University of Siena, Italy
Article info

Article history: Received 3 January 2015; Received in revised form 17 June 2015; Accepted 19 June 2015; Available online 2 July 2015.

Keywords: BG-brackets; Lifelong learning; Natural learning theory; On-line learning; Least cognitive action

Abstract
By and large, the interpretation of learning as a computational process taking place in both humans and machines is primarily provided in the framework of statistics. In this paper, we propose a radically different perspective in which the emergence of learning is regarded as the outcome of laws of nature that govern the interactions of intelligent agents with their own environment. We introduce a natural learning theory based on the principle of least cognitive action, which is inspired by the related mechanical principle and by the Hamiltonian framework for modeling the motion of particles. The introduction of the kinetic and of the potential energy leads to a surprisingly natural interpretation of learning as a dissipative process. The kinetic energy reflects the temporal variation of the synaptic connections, while the potential energy is a penalty that describes the degree of satisfaction of the environmental constraints. The theory gives a picture of learning in terms of energy balancing mechanisms, where the novel notions of boundary and bartering energies are introduced. Finally, as an example of application of the theory, we show that the supervised machine learning scheme can be framed in the proposed theory and, in particular, that the Euler–Lagrange differential equations of learning collapse to the classic gradient algorithm on the supervised pairs.
© 2015 Elsevier B.V. All rights reserved.
1. Introduction
Machine learning is at a lively crossroad of disciplines, where the exploration of neuroscience and cognitive psychology meets computational models mostly based on statistical foundations. For these models to be realistic, one is typically concerned with the acquisition of learning skills on a sample of training data that is sufficiently large to be consistent with a statistically relevant test set. However, in the most challenging and interesting learning tasks taking place in humans, the underlying computational processes do not seem to offer such a neat identification of the training set. As time goes by, humans react surprisingly well to new stimuli while keeping past acquired skills, a behavior that seems hard to reach with nowadays' intelligent agents. This suggests looking for alternative foundations of learning, which are not necessarily based on statistical models of the whole agent's life.
In this paper, we investigate the emergence of learning as the outcome of laws of nature that govern the interactions of intelligent agents with their own environment, regardless of their nature. The underlying principle is that the acquisition of cognitive skills by learning obeys information-based laws on these interactions, which hold regardless of biology. In this new perspective, in particular, we introduce a natural learning theory aimed at discovering the fundamental temporally-embedded mechanisms of environmental interaction. While the role of time has been quite relevant in machine learning (see e.g. Hidden Markov Models), in some challenging problems, like computer vision, the temporal dimension seems to play quite a minor role. What would human vision have been in a world of visual information with shuffled frames? Any cognitive process aimed at extracting symbolic information from images that are not frames of a temporally coherent visual stream would have been extremely harder than in our visual experience [1]. More than looking for smart algorithmic solutions to deal with temporal coherence, in this paper we rely on the idea of regarding learning as a process which is deeply interwoven with time, just like most laws of nature.

∗ Corresponding author. E-mail address: marco@dii.unisi.it (M. Gori).
http://dx.doi.org/10.1016/j.tcs.2015.06.042
0304-3975/© 2015 Elsevier B.V. All rights reserved.
In order to derive the laws of learning, we introduce the principle of least cognitive action, which is inspired by the related mechanical principle and by the Hamiltonian framework for modeling the motion of particles. Unlike mechanics, however, the cognitive action that we define is in fact the objective to be minimized, more than a functional for which to discover a stationary point. In our learning framework, this duality is based on a proper introduction of the "kinetic" and of the "potential energy," which leads to a surprisingly natural interpretation of learning as a dissipative process. The kinetic energy reflects the temporal variation of the synaptic connections, while the potential energy is a penalty that describes the degree of satisfaction of the environmental constraints. The theory gives a picture of learning in terms of energy balancing mechanisms, where the novel notions of boundary and bartering energies are introduced. These new energies arise mostly to model complex environmental interactions of the agent, which are nicely captured by means of time-variant high-order differential operators. When pairing them with the novel notion of BG-bracket, we show how intelligent processes can be understood in terms of energy balance; learning turns out to be a dissipative process which reduces the potential energy by moving the connection weights, thus producing a kinetic energy. However, the energy balance also involves the boundary and bartering energies. The former arises because of the energy exchange at the beginning and at the end of the temporal horizon, while the latter is deeply connected with the energy exchange during the agent's life. It is worth mentioning that this energy balance, which has a nice qualitative interpretation, is in fact the formal outcome of the main theoretical results given in the paper. In particular, the proposed framework incorporates classic mechanics as a special case, while it opens the doors to in-depth analyses in any context in which there is an explicit temporal dependence in the Lagrangian of the system.
While the paper focuses on laws of learning that are independent of the nature of the intelligent agent, it is shown that these laws also offer a general framework to derive classic on-line machine learning algorithms. In particular, we show that the supervised machine learning scheme can be framed in the proposed theory, and that the Euler–Lagrange differential equations of learning derived in the paper do collapse to the classic gradient algorithm. Interestingly, the theory prescribes a time-variant learning rate, which arises from the given variational formulation.
The paper is organized as follows. In the next section, we introduce natural principles of learning and, in particular, the notion of cognitive action. Its minimization, which leads to the laws of learning, is described in Section 3. In Section 4 we propose a dissipative Hamiltonian framework of learning, which, in Section 5, is completed by the analysis of the BG-brackets. Section 6 relies on the previous results to provide an interpretation of learning dynamics by means of energy balance. In Section 7, we show how the laws of learning can be converted to the gradient algorithm. Finally, some conclusions are drawn in Section 8.
2. Natural principles of learning
The notion of time is ubiquitous in laws of nature. Surprisingly enough, most studies on machine learning have relegated time to the related notion of iteration step.¹ On one side the connection is sound, since it involves the computational effort, which is also observed in nature. On the other side, while time has a truly continuous nature, the time discretization typical in fields like computer vision seems to give up the challenge of constructing a truly continuous theory of learning. This paper presents an approach to learning that incorporates time in its truly continuous structure, so that the evolution of the weights of the neural synapses follows equations that resemble laws of physics. We consider environmental interactions that are modeled under the recently proposed framework of learning from constraints [2]. In particular, we focus attention on the case in which each constraint originates a loss function, which is expected to take on small values in case of soft-satisfaction. Interestingly, thanks to the adoption of t-norm theory [3], loss functions can be constructed that can also represent the satisfaction of logic constraints.
A lot of emphasis in machine learning has been on supervised learning, where we are given the collection L = {((tκ, u(tκ)), sκ)}_{κ∈N} of supervised pairs (u(tκ), sκ) over the temporal sequence {tκ}_{κ∈N}. At a given t, the pairs (u(tκ), sκ) that have become available are denoted by L_t := {κ ∈ N : tκ ≤ t}. We assume that those data are learned by a feedforward neural network defined by the function f(·, ·) : R^n × R^d → R^D, so that the input u(t) is mapped to y(t) = f(w(t), u(t)). Classic supervised learning can be naturally modeled by introducing a loss function L(·, ·) : R² → R along with the given training set L. Examples of different loss functions can be found in [4] (Ch. 3). A classic case is the one of the quadratic loss, that is L(y, s) = ½(y − s)². Because of the variational formulation of learning that is given in this paper, it is convenient to associate L with the overall distributional loss

V(t, w) = ½ ψ(t) Σ_{κ∈L_t} (f(w(t), u(t)) − sκ)² · δ(t − tκ).

Table 1
Links between the natural learning theory and classical mechanics.

Natural Learning Theory | Mechanics | Remarks
w_i | q_i | Weights are interpreted as generalized coordinates.
ẇ_i | q̇_i | Weight variations are interpreted as generalized velocities.
υ_i | p_i | The conjugate momentum to the weights is defined by using the machinery of Legendre transforms.
A[w] | S[q] | The cognitive action is the dual of the action in mechanics.
F(t, w, ẇ) | L(t, q, q̇) | The Lagrangian F is associated with the classic Lagrangian L in mechanics.
H(t, w, υ) | H(t, q, p) | When using w and υ, we can define the Hamiltonian, just like in mechanics.
Here, ψ(·) is referred to as the dissipation function, and it is assumed to be a positive and monotonically increasing function on its domain. A fundamental principle behind the emergence of learning, which will be fully gained in the remainder of the paper, is that it does require energy dissipation. In the Hamiltonian framework, the introduction of dissipation processes can be done in different ways. The idea of factorizing the potential, as well as the whole Lagrangian, by a temporally-growing exponential term is related to studies on dissipative Hamiltonian systems [5]. We regard w ∈ R^n as the Lagrangian coordinates of a virtual mechanical system. Throughout this paper, V(·, ·) is referred to as the potential energy of the system defined by the Lagrangian coordinates w. In machine learning, we look for trajectories w(t) that possibly lead to configurations with small potential energy. Following the duality with mechanics, we also introduce the notion of kinetic energy. For reasons that will become clear in the remainder of the paper, we generalize the notion of velocity to T_i w_i, where, for each fixed i, T_i is a time-dependent linear differential operator of the form

T_i = T_i(t) = Σ_{j=0}^{ℓ} α_{i,j}(t) d^j/dt^j.    (1)

Let m_i > 0 be the mass of each particle (dual of the connection weight) defined by the position w_i(t) and the velocity ẇ_i. Then, we define the kinetic energy as

K(t, Tw) = ½ ψ Σ_{i=1}^{n} m_i (T_i w_i)².    (2)

Clearly, if T_i = T = d/dt and ψ(t) ≡ 1, then we have K = ½ Σ_{i=1}^{n} m_i ẇ_i², which returns the classic case of analytic mechanics. We notice in passing that one could always absorb the dissipation-function factor by introducing T_i^ψ := √ψ · T_i, so that K(t, Tw) = ½ Σ_{i=1}^{n} m_i (T_i^ψ w_i)². Now, let us consider a Lagrangian simply composed of the weighed sum of the potential V(·, ·) and of the kinetic energy

F(t, w, Tw) = Σ_{i=1}^{n} F(t, w_i, T_i w_i) = Σ_{i=1}^{n} K(t, T_i w_i) + γ V(t, w).    (3)

Here we mostly assume γ ∈ R, so that the cognitive action gets a clear meaning in terms of regularization theory. However, it is important to notice that if γ = −1 the cognitive action strictly relates to the classic action in mechanics. The consequences of different choices of γ will become clearer in the remainder of the paper. The classic problem of supervised learning is that of discovering w° = arg min A[w], where

A[w] = ∫_{t0}^{t1} F(t, w, Tw) dt    (4)

is the cognitive action of the system. While there is an intriguing analogy with analytic mechanics (see Table 1), we emphasize that, while in mechanics we are only interested in the stationary points of the mechanical action, in learning we would also like to determine the minimum of the cognitive action defined by equation (4). Interestingly, as we will see later, the strict connection with analytic mechanics also makes sense when considering strong dissipation (see also [6]). Finally, a comment on the minimization of the cognitive action (4) is in order. In general, the problem makes sense whenever the Lagrangian is given all over [t0, t1]. The very nature of the problem results in an explicit time dependence of the Lagrangian; as a consequence, if there is no restriction on the nature of the environmental interaction, then the optimization problem does require all the information coming in the [t0, t1] interval. However, the environmental interactions that are of interest in this paper are those that are typically associated with learning processes, where, after the agent has inspected a certain amount of information, it makes sense to involve it in predictions on the future. Hence, whenever we deal with a truly learning environment, in which the agent is expected to capture regularities in the inspected data, the minimization problem, as it will be shown, becomes well-posed and nicely fits in the context of on-line learning.
3. Euler–Lagrange laws of learning
Let us consider the minimization of the functional A[·] defined by equation (4). Throughout the paper, we assume that the linear operators T admit the adjoint T*. As stated in the following proposition, the coefficients of T* come directly from those of T.

Proposition 3.1. Given the linear differential operator T of degree ℓ, defined by (1), its adjoint operator is also a linear differential operator of the same degree, with coefficients

β_κ(t) = Σ_{j≥κ} (−1)^j C(j, κ) α_j^{(j−κ)}(t),    (5)

where C(j, κ) denotes the binomial coefficient.

Proof. If we use the Leibniz formula, we have
d^j/dt^j (α_j(t) · χ(t)) = Σ_{κ=0}^{j} C(j, κ) α_j^{(j−κ)}(t) d^κχ/dt^κ(t).

From the definition of the adjoint operator, we can easily see that, if we pose D^j := d^j/dt^j, then its adjoint is (D^j)* = (−1)^j D^j. Then we get

T*(t) = Σ_{j=0}^{ℓ} (−1)^j Σ_{κ=0}^{j} C(j, κ) α_j^{(j−κ)}(t) d^κ/dt^κ = Σ_{κ=0}^{ℓ} Σ_{j≥κ} (−1)^j C(j, κ) α_j^{(j−κ)}(t) d^κ/dt^κ ≡ Σ_{κ=0}^{ℓ} β_κ(t) d^κ/dt^κ,

from which the thesis follows. □
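The coefficient formula (5) can be checked mechanically. The sketch below (illustrative, not the authors' code) represents functions and coefficients as exact rational polynomials, builds T* from the α_j via (5), and verifies ⟨Tf, g⟩ = ⟨f, T*g⟩ on [0, 1] for test functions whose values and first derivatives vanish at the borders, so that no boundary terms survive:

```python
from fractions import Fraction
from math import comb

# Polynomials as ascending lists of rational coefficients on [0, 1].
def deriv(p, n=1):
    for _ in range(n):
        p = [Fraction(k) * c for k, c in enumerate(p)][1:] or [Fraction(0)]
    return p

def pmul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def padd(a, b):
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def int01(p):
    # exact integral of the polynomial over [0, 1]
    return sum(c / (k + 1) for k, c in enumerate(p))

def apply_op(coeffs, f):
    # (sum_j alpha_j(t) D^j) f, each alpha_j given as a polynomial
    out = [Fraction(0)]
    for j, a in enumerate(coeffs):
        out = padd(out, pmul(a, deriv(f, j)))
    return out

def adjoint_coeffs(alphas):
    # beta_k(t) = sum_{j>=k} (-1)^j C(j,k) alpha_j^{(j-k)}(t), eq. (5)
    ell = len(alphas) - 1
    betas = []
    for k in range(ell + 1):
        b = [Fraction(0)]
        for j in range(k, ell + 1):
            b = padd(b, [(-1) ** j * comb(j, k) * c
                         for c in deriv(alphas[j], j - k)])
        betas.append(b)
    return betas
```

With constant coefficients the formula reduces to the familiar sign alternation β_κ = (−1)^κ α_κ of the adjoint.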
Let us calculate the variation of the functional (4), in the case n = 1, over the interval [t0, t1], by assuming that² the stationary point w(·) has end-points (t0, w^{(κ)}(t0)) and (t1, w^{(κ)}(t1)), with κ = 0, …, ℓ−1. Let us consider w̌(t) = w(t) + εξ(t), where ξ(·) is a variation and ε ∈ R. Clearly, the conditions on the boundaries on w^{(κ)} yield corresponding conditions on the variation:

ξ^{(κ)}(t0) = ξ^{(κ)}(t1) = 0,  κ = 0, …, ℓ−1.    (6)

Then we have

δA = A[w̌] − A[w] = ∫_{t0}^{t1} [F(t, w + εξ, Tw + εTξ) − F(t, w, Tw)] dt = ε · ∫_{t0}^{t1} [F_w · ξ + F_{Tw} · Tξ] dt + O(ε²).

Now, because of conditions (6), the second term can be expressed by

∫_{t0}^{t1} F_{Tw} · Tξ dt = ∫_{t0}^{t1} (T* F_{Tw}) · ξ dt.

Hence, we get

δA = ε · ∫_{t0}^{t1} [F_w + T* F_{Tw}] · ξ dt.

When imposing the stationarity condition δA = 0, because of the fundamental lemma of variational calculus we get

F_w + T* F_{Tw} = 0.    (7)

Now we can easily see that the above result can be generalized to the set of variables w_i, i = 1, …, n. Let us consider the general case in which we use an operator T_i for each variable w_i. In this case the variation turns out to be

δA = ε · ∫_{t0}^{t1} Σ_{i=1}^{n} (F_{w_i} + T_i* F_{T_i w_i}) · ξ_i dt.

Finally, this analysis leads to a necessary condition for the stationarity of the functional A, which is given in the following theorem.
Theorem 3.1. Let us assume that we are given the values of w^{(κ)}(t0) and w^{(κ)}(t1), with κ = 0, …, ℓ−1. Then the functional (4) admits a stationary point w_i, i = 1, …, n, provided that

F_{w_i} + T_i* F_{T_i w_i} = 0,  i = 1, …, n.    (8)

Remark 3.1. The proof of the theorem follows the classic principles of variational calculus to determine stationary points (see e.g. [7,8]). The only specific issue that arises in the given result concerns the presence of the operators T_i instead of the single derivatives, which plays an important role in the remainder of the paper.
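A quick numerical sanity check of the stationarity condition: for the classic mechanical Lagrangian F = ½ẇ² − ½w² on [0, π], w(t) = sin t satisfies the Euler–Lagrange equation ẅ + w = 0, so perturbing it by an admissible variation should change the action only at second order in ε. The sketch below (illustrative; the specific trajectory and variation are assumptions) verifies that δA = O(ε²):

```python
import math

def action(eps, n=2000):
    """Midpoint-rule approximation of A[w] = integral over [0, pi] of
    (0.5*wdot**2 - 0.5*w**2) dt, for w(t) = sin t + eps*sin 2t.
    sin t solves the EL equation, and sin 2t is an admissible variation
    since it vanishes at both end-points."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        w = math.sin(t) + eps * math.sin(2 * t)
        wdot = math.cos(t) + 2 * eps * math.cos(2 * t)
        total += (0.5 * wdot * wdot - 0.5 * w * w) * h
    return total
```

The first-order term of the variation cancels, and doubling ε roughly quadruples the residual, exactly the O(ε²) behavior used in the derivation of (7).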
Remark 3.2. In the formulation of learning, one cannot typically rely on the boundary conditions that are assumed in the theorem, whereas it makes sense to make assumptions on the initial conditions w^{(κ)}(t0), κ = 0, …, 2ℓ−1 (Cauchy conditions). Interestingly, in case the Euler–Lagrange equations (8) are asymptotically stable, the effect of the initial conditions vanishes as t → ∞. In this paper we are mostly interested in exploring the environmental interactions of the agent under asymptotic stability.

Remark 3.3. Equations (8) have been derived under the assumption that we are given the end-points (t0, w^{(κ)}(t0)) and (t1, w^{(κ)}(t1)), with κ = 0, …, ℓ−1. While they represent a necessary condition for stationarity, their integration requires the knowledge of the boundary conditions, which leads to a batch-mode formulation of learning. This is related to the comments at the end of Section 2 concerning the on-line formulation of learning. However, if the learning environment is periodic with period τ and nτ pairs in each period, that is if (u(ti), si) = (u(ti + τ), si+nτ), there is experimental evidence to claim that the causality issue raised at the end of Section 2 disappears [9]. Basically, on-line learning in a periodic learning environment returns a solution that very well approximates the one corresponding to batch-mode. A more interesting case, which is worth investigating, is the one of almost periodic [10] environments in which, roughly speaking, τ can depend on t. The restriction to truly on-line learning environments, in which the regularities can be captured because of underlying statistical assumptions, might be better captured by boundary conditions different from those expressed by equations (6). For example, in the simplest case of T = D, one can impose that the weights converge over large intervals, which corresponds with the transversality condition [8] on the right border, F_{Tw_i}(t, w_i, Tw_i)|_{t=t1} = 0. However, in this paper, we do not address this issue, whereas we explore the behavior of equations (8) by means of an analysis based on "energy balance".
Finally, it is also worth mentioning that the given Euler–Lagrange equations get to stationary points and, therefore, depending on the learning task, the actual minimization of the cognitive action might not be achieved. This is somewhat related to the classic framework of supervised learning, where the discovery of optimal solutions also in finite-dimensional spaces may lead to suboptimal solutions [11].
In the case of the cognitive action defined by the Lagrangian of equation (3), we have the following special cases.

Reduction to mechanics. Let us assume that ∀i = 1, …, n: T_i = T = D, and let γ = −1. Then we consider the Lagrangian

F(t, w, Tw) = ½ ψ Σ_{i=1}^{n} m_i (Dw_i)² − ψ V(w),

where ψ = e^{θt}. Then the Euler–Lagrange equations (see Theorem 3.1) become

D²w_i + θ Dw_i + V_{w_i} = 0,  i = 1, …, n,    (10)

which corresponds with damping oscillators in classic mechanics. Here we can promptly see the role of the function ψ(·) in the birth of dissipation. Moreover, the role of γ = −1 becomes clear in order to attach the classic meaning of potential to the function V. In addition, this is fully coherent with the definition of action, where the Lagrangian is L = K − V.

Even-order γ sign flip. Let us consider the case T
= α₀ + α₁D + α₂D² and ψ(t) = e^{θt}. From Proposition 3.1, we have T* = α₀ − α₁D + α₂D², and we can easily see that the EL-equations become

D⁴w_i + β₃ D³w_i + β₂ D²w_i + β₁ Dw_i + β₀ w_i + (γ/(α₂² m_i)) V_{w_i} = 0,    (11)

where

β₀ := (α₀α₂θ² − α₀α₁θ + α₀²)/α₂²,
β₁ := (α₁α₂θ² + (2α₀α₂ − α₁²)θ)/α₂²,
β₂ := (α₂²θ² + α₁α₂θ + 2α₀α₂ − α₁²)/α₂²,
β₃ := 2θ.
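The role of the sign of γ in the stability of (11) can be probed directly with the Routh–Hurwitz criterion. The sketch below is illustrative: the quadratic potential V = ½kw², the particular numeric choice of α's and θ, and the β_κ expressions (transcribed from equation (11)) are assumptions of this check, not part of the original development:

```python
def hurwitz_stable_quartic(a3, a2, a1, a0):
    """Routh-Hurwitz test for s**4 + a3*s**3 + a2*s**2 + a1*s + a0:
    all roots lie in the open left half-plane iff every condition holds."""
    return (a3 > 0 and a0 > 0 and a3 * a2 - a1 > 0
            and a1 * (a3 * a2 - a1) - a3 * a3 * a0 > 0)

def char_coeffs(alpha0, alpha1, alpha2, theta, gamma, k_over_m=1.0):
    """Characteristic-polynomial coefficients of (11) for the quadratic
    potential V = 0.5*k*w**2 (so V_w = k*w); k_over_m stands for k/m_i."""
    a2sq = alpha2 * alpha2
    b0 = (alpha0 * alpha2 * theta ** 2 - alpha0 * alpha1 * theta
          + alpha0 ** 2) / a2sq
    b1 = (alpha1 * alpha2 * theta ** 2
          + (2 * alpha0 * alpha2 - alpha1 ** 2) * theta) / a2sq
    b2 = (a2sq * theta ** 2 + alpha1 * alpha2 * theta
          + 2 * alpha0 * alpha2 - alpha1 ** 2) / a2sq
    b3 = 2 * theta
    return b3, b2, b1, b0 + gamma * k_over_m / a2sq
```

For α₀ = 0, α₁ = α₂ = 1, θ = 2 the polynomial is s⁴ + 4s³ + 5s² + 2s + γ: the constant term inherits the sign of γ, so γ < 0 immediately violates the Hurwitz conditions, in line with the γ sign flip.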
The comparison with equation (10) reveals an interesting difference, which involves the role of the sign of γ in stability. While in mechanics we have γ = −1, when involving the second-order differential operator of this example, in order to get stability, from the Routh–Hurwitz stability criterion we immediately conclude that γ > 0 is a necessary condition. This can be straightforwardly extended to any even order of T. In this paper, this property is referred to as the γ sign flip, and it motivates the study of high-order operators.

4. Dissipative Hamiltonian formulation of learning
Now we use the Legendre transform in order to obtain the Hamiltonian formulation of the problem. We introduce the notations ḟ_i := T_i f_i and f̊_i := T_i* f_i, so that equations (8) can be re-written as

F_{w_i} + F̊_{ẇ_i} = 0,  i = 1, 2, …, n.    (12)

Now, let us introduce the conjugate momentum υ_i,

υ_i := ∂F/∂ẇ_i.    (13)

Likewise, we define the Hamiltonian as

H(t, w, υ) := Σ_{i=1}^{n} υ_i ẇ_i − F(t, w, ẇ),    (14)

where the ẇ_i must be thought of as functions of w_i and υ_i.

Theorem 4.1. The Euler–Lagrange equations (8) are equivalent to the 2n Hamilton equations

ẇ_i = ∂H/∂υ_i,    (15)

υ̊_i = ∂H/∂w_i.    (16)

Moreover, we have

∂H/∂t = −∂F/∂t.
(17)

Proof. As usual, we introduce the canonical variables w_i and υ_i. Then the canonical equations arise from the EL-equations (8) and from definition (13), as follows:

∂H/∂υ_i = ∂/∂υ_i (Σ_{j=1}^{n} υ_j ẇ_j − F(t, w, ẇ)) = ẇ_i + υ_i ∂ẇ_i/∂υ_i − (∂F/∂w_i)(∂w_i/∂υ_i) − (∂F/∂ẇ_i)(∂ẇ_i/∂υ_i) = ẇ_i,

∂H/∂w_i = ∂/∂w_i (Σ_{j=1}^{n} υ_j ẇ_j − F(t, w, ẇ)) = ẇ_i ∂υ_i/∂w_i + υ_i ∂ẇ_i/∂w_i − ∂F/∂w_i − (∂F/∂ẇ_i)(∂ẇ_i/∂w_i) = −∂F/∂w_i = T* ∂F/∂ẇ_i = υ̊_i.

Finally, equation (17) arises from (16) as follows:

∂H/∂t = ∂/∂t (Σ_{i=1}^{n} υ_i ẇ_i − F(t, w, ẇ)) = Σ_{i=1}^{n} υ_i ∂ẇ_i/∂t − Σ_{i=1}^{n} F_{ẇ_i} ∂ẇ_i/∂t − ∂F/∂t = −∂F/∂t. □
The integration of the Hamilton equations (15) and (16) can be written formally as

w_i = T_i⁻¹ H_{υ_i},    (18)
υ_i = (T_i*)⁻¹ H_{w_i},

where T_i⁻¹ and (T_i*)⁻¹ denote the inverse operators of T_i and T_i*, respectively. Notice that whenever we use the inverse operator, we need to solve an ordinary differential equation of order ℓ and, therefore, we must always impose ℓ initial conditions to get a unique solution. While equations (15) and (16) can be used for determining the evolution of w, as it will be shown in Section 6, equation (15) can offer a picture of learning in terms of energy balance. The following definition plays a fundamental role in the analysis of energy balance. The definition involves the set of variables N := {(w_i, υ_i), i ∈ N_n} paired with the associated differential operator T.

Definition 4.1. Given the system S ∼ (N, T), along with any pair of functions φ₁(t, w, υ) and φ₂(t, w, υ) with continuous derivatives up to the ℓ-th order, the term

{φ₁, φ₂}_D^T := Σ_{i=1}^{n} (∂φ₁/∂w_i) D T⁻¹ (∂φ₂/∂υ_i) + (∂φ₁/∂υ_i) D (T*)⁻¹ (∂φ₂/∂w_i)    (19)

is referred to as the bracket of φ₁ and φ₂, with respect to the operators D and T.

For the sake of simplicity, in the remainder of the paper, we will omit the subscript and the superscript, so that we will simply use the notation {φ₁, φ₂}. We also notice that this can be regarded as a formal definition, unless we know (as it will be in what follows) the initial conditions for T⁻¹(∂φ₂/∂υ_i) and (T*)⁻¹(∂φ₂/∂w_i).
We can easily see that, in general, {φ₁, φ₂} ≠ {φ₂, φ₁}, and that {φ₁, φ₂} is linear only in the first argument.

Theorem 4.2. Given the system defined by w : [t0, t1] → R^n and any φ(t, w, υ) with continuous derivatives up to the ℓ-th order, we have

dφ/dt = ∂φ/∂t + {φ, H}.    (20)

Proof. We have

dφ/dt = ∂φ/∂t + Σ_{i=1}^{n} (φ_{w_i} D w_i + φ_{υ_i} D υ_i).    (21)

If we plug the w_i and υ_i given by equation (18) into equation (21), we get

dφ/dt = ∂φ/∂t + Σ_{i=1}^{n} (φ_{w_i} D T⁻¹ H_{υ_i} + φ_{υ_i} D (T*)⁻¹ H_{w_i}) = ∂φ/∂t + {φ, H}. □
5. BG-brackets

In this section we describe some properties of the brackets {·, ·} introduced by Definition 4.1. Their analysis leads to an in-depth understanding of energy exchange processes. We start by noticing that if ∀i = 1, …, n: T_i = D, then {·, ·} reduces to the classic Poisson brackets. This is formally stated by the following proposition.

Proposition 5.1. The brackets {·, ·} are a generalization of the Poisson brackets to the case in which ∀i = 1, …, n: T_i = D, that is, {φ₁, φ₂} = {φ₁, φ₂}_{P.B.}.

Proof. From Proposition 3.1, under the assumption that ∀i = 1, …, n: T_i = D, we derive that T_i* = −D. Hence, we have

{φ₁, φ₂} = Σ_{i=1}^{n} (∂φ₁/∂w_i) D D⁻¹ (∂φ₂/∂υ_i) − (∂φ₁/∂υ_i) D D⁻¹ (∂φ₂/∂w_i) = Σ_{i=1}^{n} (∂φ₁/∂w_i)(∂φ₂/∂υ_i) − (∂φ₁/∂υ_i)(∂φ₂/∂w_i) = {φ₁, φ₂}_{P.B.}. □
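Since for T_i = D the BG-brackets collapse to the Poisson brackets, the classic identities can be checked numerically. The following sketch (illustrative, not the authors' code) evaluates {·, ·}_{P.B.} by central differences for a harmonic-oscillator Hamiltonian, confirming {H, H}_{P.B.} = 0 and the canonical relations {q, H}_{P.B.} = p, {p, H}_{P.B.} = −q:

```python
def poisson_bracket(f, g, q, p, h=1e-6):
    """Central-difference Poisson bracket {f, g} = f_q*g_p - f_p*g_q
    for scalar functions of a single (q, p) pair."""
    fq = (f(q + h, p) - f(q - h, p)) / (2 * h)
    fp = (f(q, p + h) - f(q, p - h)) / (2 * h)
    gq = (g(q + h, p) - g(q - h, p)) / (2 * h)
    gp = (g(q, p + h) - g(q, p - h)) / (2 * h)
    return fq * gp - fp * gq

def H(q, p):
    # harmonic-oscillator Hamiltonian (mass and stiffness set to 1)
    return 0.5 * (p * p + q * q)
```

The antisymmetry that forces {H, H}_{P.B.} = 0 is exactly what fails for T ≠ D, which is the subject of the rest of this section.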
Rest of the adjoint. Now, we focus on the case³ T ≠ D. Interestingly, we will show that the enrichment of the differential operator T with respect to D has important consequences on the energy balance equation (20). While from Proposition 5.1 we have {H, H} = 0, it will be shown that T ≠ D in general leads to {H, H} ≠ 0. Let

⟨w, υ⟩ := ∫_{t0}^{t} w(τ) υ(τ) dτ    (22)

be the inner product. We introduce a couple of concepts that turn out to be useful in the following. First, we notice that any differential operator T can be paired with a real number, which comes out when considering the behavior on the borders of ⟨Tw, υ⟩.

Definition 5.1. Let T be a differential operator and let ⟨·, ·⟩ be the inner product given by (22). Then, for any given pair of functions f, g ∈ W^{ℓ,2},

(T | f, g) := ⟨Tf, g⟩ − ⟨f, T*g⟩    (23)

is referred to as the rest of the adjoint of T with respect to f and g.

Clearly, (· | ·, ·) is linear in each of its arguments and it is distributive in both the operator and the two function slots. Moreover, we have that (c·I | f, g) ≡ 0 for any constant c. Notice that (T | f, g) = 0 whenever we are given the same boundary conditions on f and g.

Proposition 5.2. Let T_m
= Σ_{i=0}^{m} α_i D^i, where the α_i are real constants. Then the rest of the adjoint (T | w, υ) w.r.t. w and υ can be computed by the recurrent equations

i. (T_m | w, υ) = (T_{m−1} | w, υ) + α_m (D^m | w, υ),
ii. (D^m | w, υ) = [υ D^{m−1} w] − (D^{m−1} | w, Dυ),    (24)

where [u] := u(t1) − u(t0) is the variation on the borders of u.

³ For the sake of simplicity, in the remainder of the paper, we assume ∀i = 1, …, n: T_i = T.

Proof. i. We have

(T_m | w, υ) = ⟨T_{m−1} w + α_m D^m w, υ⟩ − ⟨w, (T_{m−1})* υ + α_m (D^m)* υ⟩ = (T_{m−1} | w, υ) + α_m (⟨D^m w, υ⟩ − ⟨w, (D^m)* υ⟩) = (T_{m−1} | w, υ) + α_m (D^m | w, υ).
).
ii. Inordertogettherecurrentequationii.,westartusingintegrationbypartsasfollows:
(
Dmw|
υ
)
=
Dmw,
υ
−
w, (
Dm)
υ
=
Dmw,
υ
− (−
1)
mw, (
Dm)
υ
= [
v Dm−1w] −
Dυ
,
Dm−1w+ (−
1)
m−1[
w Dm−1υ
] −
D w,
Dm−1υ
= [
v Dm−1w] + [
w(
Dm−1)
υ
] −
D
υ
,
Dm−1w+
D w, (
Dm−1)
υ
.
Nowwehave(
Dm−1|
w,
Dυ
)
=
Dm−1w,
Dυ
−
w, (
Dm−1)
Dυ
,
and,therefore,weget
(
Dm|
w,
υ
)
= [
v Dm−1w] + [
w(
Dm−1)
υ
]
−
w
, (
Dm−1)
Dυ
+ (
Dm−1|
w,
Dυ
)
+
D w, (
Dm−1)
υ
= [
v Dm−1w] + [
w(
Dm−1)
υ
] − [
w(
Dm−1)
υ
] − (
Dm−1|
w,
Dυ
)
= [
v Dm−1w] − (
Dm−1|
w,
Dυ
).
2
Example 5.1. Let T = α₀ + α₁D + α₂D². To calculate the rest of the adjoint we use Proposition 5.2. From equation (24)-ii we can calculate (D^m | w, υ) for m = 0, 1, 2. First, notice that (D⁰ | w, υ) = 0. Then, for D we have

(D | w, υ) = [υw] − (D⁰ | w, Dυ) = [υw].

Now we can calculate (D² | w, υ):

(D² | w, υ) = [υ Dw] − (D | w, Dυ) = [υ Dw] − [w Dυ] = [υ Dw − w Dυ].

Finally, from equation (24)-i, we get

(T | w, υ) = [α₁ wυ + α₂ (υ Dw − w Dυ)].    (25)

The notion of the rest of the adjoint plays an important role in the remainder of the paper, where we are interested in the following extended definition, which involves the learning system
S ∼ (N, T).

Definition 5.2. Given S ∼ (N, T), we define

M(S, [t0, t1]) := Σ_{i=1}^{n} (T | w_i, Dυ_i) + (D | w_i, T*υ_i).    (26)

M(S, [t0, t1]) is referred to as the rest of T with respect to N.

Lemma 5.1.

M(S, [t0, t1]) = Σ_{i=1}^{n} ⟨(T* + D) w_i, (T* + D) υ_i⟩.    (27)

Proof. The proof is a straightforward consequence of the definition. □

First, we notice that if T = D then T* + D = 0 and, consequently, M(S, [t0, t1]) = 0. Clearly, the lemma enlightens the way M(S, [t0, t1]) emerges from breaking the adjoint property T* + D = 0, which holds for T = D.
Example 5.1
,usingequation(25)wegetM
(,
[
t0,
t1]) =
n i=1(
T|
wi,
Dυ
i)
+ (
D|
wi,
Tυ
i)
=
n i=1[
α
1wiDυ
i+
α
2(
Dυ
iD wi−
wiD2υ
i)
] + [
wi(
α
0−
α
1Dυ
i+
α
2D2υ
i)
]
=
n i=1[
α
0wi+
α
2D wiDυ
i]
Thisexamplesuggeststhat M
(,
[
t0,
t1])
dependsonthevalueassumedbyB
(
t,
w,
D w,
Dυ
)
=
n
i=1
α
0wi+
α
2D wiDυ
i ontheboundariesof[
t0,
t1]
.WehaveM
(,
[
t0,
t1]) =
B(
t1,
w,
D w,
Dυ
)
−
B(
t0,
w,
D w,
Dυ
).
Clearly,thisholdsfor
withanydifferentialoperator T .Asaconsequence,wheneverweknowthat
exhibitsaperiodic behavioron
[
t0,
t1]
thenM
(,
[
t0,
t1]) =
0.
Now we show a property of M which leads to a clear interpretation in terms of the energy exchanges of the agent with the environment.

Theorem 5.1. Let T be a positive self-adjoint operator with constant coefficients, and let us consider a system S such that υ_i = T w_i. Then M ≥ 0.

Proof. From the hypothesis, T* = T. Then, from the definition of M we get

M(S, [t0, t1]) = Σ_{i=1}^{n} (T | w_i, Dυ_i) + (D | w_i, Tυ_i)
= Σ_{i=1}^{n} ⟨T w_i, Dυ_i⟩ − ⟨w_i, T Dυ_i⟩ + ⟨D w_i, Tυ_i⟩ + ⟨w_i, D Tυ_i⟩
= Σ_{i=1}^{n} ⟨T w_i, Dυ_i⟩ + ⟨D w_i, Tυ_i⟩
= Σ_{i=1}^{n} ⟨(T + D) w_i, (T + D) υ_i⟩.

Since υ_i = T w_i and T ≥ 0, we get

M(S, [t0, t1]) = Σ_{i=1}^{n} ⟨(T + D) w_i, (T + D) T w_i⟩ = Σ_{i=1}^{n} ⟨u_i, T u_i⟩ ≥ 0,

where u_i := (T + D) w_i. □
T–D commutator. Another important asymmetry arises, which plays an important role in the case of time-dependent differential operators. The following definition turns out to be useful.

Definition 5.3. Given any two differential operators T₁ and T₂, we define

[T₁, T₂] := T₁T₂ − T₂T₁.    (28)

The differential operator [T₁, T₂] is referred to as the commutator of T₁ and T₂. Of course, we have [T₁, T₂] + [T₂, T₁] = 0. In particular, in the following, we are interested in the case in which T₁ = T and T₂ = D. In that case

T D y − D T y = Σ_{i=0}^{ℓ} α_i D^i D y − D Σ_{i=0}^{ℓ} α_i D^i y = Σ_{i=0}^{ℓ} α_i D^{i+1} y − Σ_{i=0}^{ℓ} (Dα_i) D^i y − Σ_{i=0}^{ℓ} α_i D^{i+1} y = −Σ_{i=0}^{ℓ} (Dα_i) D^i y ≡ −Σ_{i=0}^{ℓ} β_i D^i y.    (29)

Clearly, [T, D] = 0 for constant α_i coefficients, whereas, in general, [T, D] is a differential operator defined by the coefficients β_i := Dα_i.
i. Likefortherestoftheadjoint,itturnsouttobeusefultoextendthenotionofcommutatortothecaseofasetconjugate variablesN
fordifferentialoperators T and D.Definition5.4.Letusconsiderthesystem
= {
N ,
T}
over[
t0,
t1]
.Thenϒ(,
[
t0,
t1]) :=
ni=1
wi,
[
T,
D]
υ
i (30)isreferredtoastheT
−
D commutatorofN
.Again,the T –D commutatorexpressesthedegreeofasymmetry betweenT and D.Wehavesymmetry
[
T,
D]
=
T D−
D T=
0 whenever T isatime-invariantdifferential operator, whereasthe asymmetryjustarisesbecause ofthetemporal evolution of coefficientsα
i. In order to give an interpretation ofϒ(,
[
t0,
t1])
, let us consider the case in which T isreplacedwith
φ
T ,whereφ
isapositiveincreasingmonotonicfunction.ThiscasehasbeenalreadymentionedinSection2with
φ (
t)
=
√
ψ(
t)
.Wediscussthebasicideainthesimplestcaseinwhich T= φ
D.Theorem5.2.LetusassumethatM
(,
[
t0,
t1])
=
0.IfT= φ
D thenϒ(,
[
t0,
t1]) = −
n i=1 mi t1 t0(
D wi)
2φ
Dφ
dt.
(31)Proof. First,wecaneasilyseethat
T* = −Dφ · D⁰ − φ D,

where D⁰ = I is the identity operator. Now we can calculate

[T*, D] = T*D − DT* = (−Dφ · I − φD) D − D (−Dφ · I − φD) = D²φ · I + Dφ · D.

Now, since M(S, [t0, t1]) = 0 and υ_i = m_i T w_i = m_i φ Dw_i, we have

ϒ(S, [t0, t1]) = Σ_{i=1}^{n} ∫_{t0}^{t1} w_i (υ_i D²φ + Dφ Dυ_i) dt = Σ_{i=1}^{n} ∫_{t0}^{t1} w_i D(υ_i Dφ) dt = −Σ_{i=1}^{n} ∫_{t0}^{t1} υ_i Dw_i Dφ dt = −Σ_{i=1}^{n} m_i ∫_{t0}^{t1} Dφ (Dw_i)² φ dt. □
Let us consider the case in which ψ(t) = φ²(t) = e^{θt}. We get

−ϒ(S, [t0, t1]) = Σ_{i=1}^{n} m_i ∫_{t0}^{t1} Dφ (Dw_i)² φ dt = ½ Σ_{i=1}^{n} m_i θ ∫_{t0}^{t1} (Dw_i)² ψ dt > 0.

This offers a clear interpretation of −ϒ(S, [t0, t1]), which turns out to be the dissipated energy of S over [t0, t1]. However, notice that this interpretation is limited to the case Dφ > 0.

Example 5.2. Let T
0.Example5.2.LetT
=
sin(
ω
t)
·
D be.WehaveT
= −
ω
cos(
ω
t)
·
I−
sin(
ω
t)
Dand
[
T,
D] = −
ω
2sin(
ω
t)
·
I+
ω
cos(
ω
t)
D.
UndertheassumptionthatM
(,
[
t0,
t1])
=
0 andυ
i=
miT wi=
misinω
t·
D wehaveϒ(,
[
t0,
t1]) =
n i=1 t1 t0 wiD(
ω
cos(
ω
t)
υ
i)
dt= −
n i=1 t1 t0 cos(
ω
t)
υ
iD widt= −
1 2 n i=1 mi t1 t0 sin 2ω
t(
D wi)
2dt.
This exampleclearly showsthatthe signof
ϒ(,
[
t0,
t1])
can flipwithfunctionsψ(
·)
that arenot monotonic.Hence,ϒ(,
[
t0,
t1])
modelsinteractionsoftheagentwiththeenvironmentwheretheenergyflowsinbothdirections.The
{·, ·} brackets assume a special meaning when involving the Hamiltonian of a given system.⁴

Theorem 5.3. Given the system S over [t0, t1], then

{H, H} = d/dt (M + ϒ).    (32)

Proof. Because of the Hamilton equations, we have
{H, H} = Σ_{i=1}^{n} H_{w_i} D T⁻¹ H_{υ_i} + H_{υ_i} D (T*)⁻¹ H_{w_i} = Σ_{i=1}^{n} T*υ_i · D T⁻¹ T w_i + T w_i · D (T*)⁻¹ T*υ_i = Σ_{i=1}^{n} T*υ_i · Dw_i + T w_i · Dυ_i.

Now, if we integrate over [t0, t] and use the definitions of the rest of the adjoint and of the commutator, we get

∫_{t0}^{t} {H, H} dτ = Σ_{i=1}^{n} ∫_{t0}^{t} (T*υ_i · Dw_i + T w_i · Dυ_i) dτ = Σ_{i=1}^{n} ⟨T*υ_i, Dw_i⟩ + ⟨T w_i, Dυ_i⟩
= Σ_{i=1}^{n} (D | w_i, T*υ_i) − ⟨D T*υ_i, w_i⟩ + (T | w_i, Dυ_i) + ⟨w_i, T* Dυ_i⟩
= Σ_{i=1}^{n} (D | w_i, T*υ_i) + (T | w_i, Dυ_i) + Σ_{i=1}^{n} ⟨w_i, (T*D − DT*) υ_i⟩ = M + ϒ.

Finally, we get the thesis by differentiating both sides with respect to t. □

⁴ Whenever there is no risk of ambiguity, in the following we drop the dependence on (S, [t0, t1]).
This result shows that the {·, ·} brackets assume a special meaning in the case in which both operands are the Hamiltonian function. In particular, in that case, there is a Boundary Generation, expressed by M(S, [t0, t1]) and due to the rest of the adjoint, and a Bartering Generation, expressed by ϒ(S, [t0, t1]) and due to the exchange of the operators T and D. For this reason, {·, ·} is referred to as the BG-brackets.

6. Energy balance in learning processes
In this section we discuss the energy balancing connected with the general form of the cognitive action whose Lagrangian is given by equation (3). While equations (15) and (16) can be used to determine the learning dynamics, equation (17) offers an interesting view of learning in terms of energy balancing. Notice that the equation involves the canonical variables w and υ in H, as well as ẇ in F. An alternative way of constructing the energy balance, based only on canonical variables, is that of expressing {H, H} according to Theorem 5.3. We have

dH/dt = dK/dt − γ dV/dt = ∂H/∂t + {H, H} = −γ ∂V/∂t + ∂K/∂t + d/dt (M + ϒ).

We also have

dV/dt = ∂V/∂t + {V, H},  dK/dt = ∂K/∂t + {K, H},

which suggests the introduction of the functions 𝒱 and 𝒦 defined by

d𝒱/dt = {V, H} = dV/dt − ∂V/∂t,    (33)

d𝒦/dt = {K, H} = dK/dt − ∂K/∂t.    (34)

Finally, the analysis is summarized by the following theorem, which establishes the energy balance of any learning system. Unlike equation (17), the given energy balance is fully expressed in terms of the canonical variables w and
.Theorem6.1.Thesystem
(,
T)
evolvesaccordingtotheinvariantconditiond
dt
(
K
−
γ
V
−
M− ϒ) =
0.
(35)Animportantsourceofinteractionisrepresentedbytheenvironmentalpotentialenergy
E
=
E
(,
[
t0,
t]) :=
t t0∂
V∂
τ
dτ
,
andby thebarteringenergy
ϒ
,whichappearswhenever T doesnotcommutatewithD.WhileK
reflectsthevelocityof theconnectionsdynamics,theboundaryenergyM takesintoaccounttheexchangeofenergyatthebeginningandatthe currenttimeofthelearninghorizon.dampingoscillators
In order to provide an interpretation of analytic mechanics as a special case, let us consider T = D and γ = −1. Basically, we are considering the Lagrangian which has originated the damping oscillator of equation (10). From the definition (30) of ϒ, we have [T*, D] = [−D, D] = 0 and, therefore, ϒ = 0. Likewise, from Lemma 5.1, since D + T* = 0, we conclude that M = 0. As a consequence, the invariant of Theorem 6.1, expressed in terms of canonical variables, yields

d/dt (𝒦 + 𝒱) = 0.    (36)

This is in fact the classic invariant condition of classic mechanics, which establishes that the sum of the kinetic and potential energies is invariant. Interestingly, this holds also in cases in which the kinetic energy K and the potential V are