Theoretical Computer Science
Contents lists available at ScienceDirect
www.elsevier.com/locate/tcs
The principle of least cognitive action

Alessandro Betti a, Marco Gori b,∗

a Department of Physics, University of Pisa, Italy
b Department of Information Engineering and Mathematics, University of Siena, Italy
Article info

Article history: Received 3 January 2015; Received in revised form 17 June 2015; Accepted 19 June 2015; Available online 2 July 2015.

Keywords: BG-brackets; Lifelong learning; Natural learning theory; On-line learning; Least cognitive action

Abstract
By and large, the interpretation of learning as a computational process taking place in both humans and machines is primarily provided in the framework of statistics. In this paper, we propose a radically different perspective in which the emergence of learning is regarded as the outcome of laws of nature that govern the interactions of intelligent agents with their own environment. We introduce a natural learning theory based on the principle of least cognitive action, which is inspired by the related mechanical principle and by the Hamiltonian framework for modeling the motion of particles. The introduction of the kinetic and of the potential energy leads to a surprisingly natural interpretation of learning as a dissipative process. The kinetic energy reflects the temporal variation of the synaptic connections, while the potential energy is a penalty that describes the degree of satisfaction of the environmental constraints. The theory gives a picture of learning in terms of energy balancing mechanisms, where the novel notions of boundary and bartering energies are introduced. Finally, as an example of application of the theory, we show that the supervised machine learning scheme can be framed in the proposed theory and, in particular, that the Euler–Lagrange differential equations of learning collapse to the classic gradient algorithm on the supervised pairs.
© 2015 Elsevier B.V. All rights reserved.
1. Introduction
Machine learning is at a lively crossroad of disciplines, where the exploration of neuroscience and cognitive psychology meets computational models mostly based on statistical foundations. For these models to be realistic, one is typically concerned with the acquisition of learning skills on a sample of training data that is sufficiently large to be consistent with a statistically relevant test set. However, in the most challenging and interesting learning tasks taking place in humans, the underlying computational processes do not seem to offer such a neat identification of the training set. As time goes by, humans react surprisingly well to new stimuli while keeping past acquired skills, a behavior that seems hard to reach with nowadays' intelligent agents. This suggests looking for alternative foundations of learning, which are not necessarily based on statistical models of the whole agent's life.
In this paper, we investigate the emergence of learning as the outcome of laws of nature that govern the interactions of intelligent agents with their own environment, regardless of their nature. The underlying principle is that the acquisition of cognitive skills by learning obeys information-based laws on these interactions, which hold regardless of biology. In this new perspective, in particular, we introduce a natural learning theory aimed at discovering the fundamental temporally-embedded mechanisms of environmental interaction. While the role of time has been quite relevant in machine learning (see e.g. Hidden Markov Models), in some challenging problems, like computer vision, the temporal dimension seems to play quite a minor role. What would human vision have been in a world of visual information with shuffled frames? Any cognitive process aimed at extracting symbolic information from images that are not frames of a temporally coherent visual stream would have been extremely harder than in our visual experience [1]. More than looking for smart algorithmic solutions to deal with temporal coherence, in this paper we rely on the idea of regarding learning as a process which is deeply interwoven with time, just like most laws of nature.

∗ Corresponding author. E-mail address: marco@dii.unisi.it (M. Gori).
http://dx.doi.org/10.1016/j.tcs.2015.06.042
0304-3975/© 2015 Elsevier B.V. All rights reserved.
In order to derive the laws of learning, we introduce the principle of least cognitive action, which is inspired by the related mechanical principle and by the Hamiltonian framework for modeling the motion of particles. Unlike mechanics, however, the cognitive action that we define is in fact the objective to be minimized, more than a functional for which to discover a stationary point. In our learning framework, this duality is based on a proper introduction of the "kinetic" and of the "potential energy," which leads to a surprisingly natural interpretation of learning as a dissipative process. The kinetic energy reflects the temporal variation of the synaptic connections, while the potential energy is a penalty that describes the degree of satisfaction of the environmental constraints. The theory gives a picture of learning in terms of energy balancing mechanisms, where the novel notions of boundary and bartering energies are introduced. These new energies arise mostly to model complex environmental interactions of the agent, which are nicely captured by means of time-variant high-order differential operators. When pairing them with the novel notion of BG-bracket, we show how intelligent processes can be understood in terms of energy balance; learning turns out to be a dissipative process which reduces the potential energy by moving the connection weights, thus producing a kinetic energy. However, the energy balance also involves the boundary and bartering energies. The former arises because of the energy exchange at the beginning and at the end of the temporal horizon, while the latter is deeply connected with the energy exchange during the agent's life. It is worth mentioning that this energy balance, which has a nice qualitative interpretation, is in fact the formal outcome of the main theoretical results given in the paper. In particular, the proposed framework incorporates classic mechanics as a special case, while it opens the doors to in-depth analyses in any context in which there is an explicit temporal dependence in the Lagrangian of the system.
While the paper focuses on laws of learning that are independent of the nature of the intelligent agent, it is shown that these laws also offer a general framework to derive classic on-line machine learning algorithms. In particular, we show that the supervised machine learning scheme can be framed in the proposed theory, and that the Euler–Lagrange differential equations of learning derived in the paper do collapse to the classic gradient algorithm. Interestingly, the theory prescribes a time-variant learning rate, which arises from the given variational formulation.
The paper is organized as follows. In the next section, we introduce natural principles of learning and, in particular, the notion of cognitive action. Its minimization, which leads to the laws of learning, is described in Section 3. In Section 4 we propose a dissipative Hamiltonian framework of learning, which, in Section 5, is completed by the analysis of the BG-brackets. Section 6 relies on the previous results to provide an interpretation of learning dynamics by means of energy balance. In Section 7, we show how the laws of learning can be converted to the gradient algorithm. Finally, some conclusions are drawn in Section 8.
2. Natural principles of learning
The notion of time is ubiquitous in laws of nature. Surprisingly enough, most studies on machine learning have relegated time to the related notion of iteration step.¹ On one side the connection is sound, since it involves the computational effort, which is also observed in nature. On the other side, while time has a truly continuous nature, the time discretization typical in fields like computer vision seems to give up the challenge of constructing a truly continuous theory of learning. This paper presents an approach to learning that incorporates time in its truly continuous structure, so that the evolution of the weights of the neural synapses follows equations that resemble laws of physics. We consider environmental interactions that are modeled under the recently proposed framework of learning from constraints [2]. In particular, we focus attention on the case in which each constraint originates a loss function, which is expected to take on small values in case of soft-satisfaction. Interestingly, thanks to the adoption of t-norm theory [3], loss functions can be constructed that can also represent the satisfaction of logic constraints.
A lot of emphasis in machine learning has been on supervised learning, where we are given the collection L = {((tκ, u(tκ)), sκ)}_{κ∈N} of supervised pairs (u(tκ), sκ) over the temporal sequence {tκ}_{κ∈N}. At a given t, the pairs (u(tκ), sκ) that have become available are denoted by L_t := {κ ∈ N : tκ ≤ t}. We assume that those data are learned by a feedforward neural network defined by the function f(·, ·) : R^n × R^d → R^D, so that the input u(t) is mapped to y(t) = f(w(t), u(t)). Classic supervised learning can be naturally modeled by introducing a loss function L(·, ·) : R² → R along with the given training set L. Examples of different loss functions can be found in [4] (Ch. 3). A classic case is the one of the quadratic loss, that is L(y, s) = ½(y − s)². Because of the variational formulation of learning that is given in this paper, it is convenient to associate L with the overall distributional loss

V(t, w) = ½ ψ(t) Σ_{κ∈L_t} (f(w(t), u(t)) − sκ)² · δ(t − tκ).

Table 1
Links between the natural learning theory and classical mechanics.

Natural Learning Theory | Mechanics | Remarks
w_i | q_i | Weights are interpreted as generalized coordinates.
ẇ_i | q̇_i | Weight variations are interpreted as generalized velocities.
υ_i | p_i | The conjugate momentum to the weights is defined by using the machinery of Legendre transforms.
A[w] | S[q] | The cognitive action is the dual of the action in mechanics.
F(t, w, ẇ) | L(t, q, q̇) | The Lagrangian F is associated with the classic Lagrangian L in mechanics.
H(t, w, υ) | H(t, q, p) | When using w and υ, we can define the Hamiltonian, just like in mechanics.
Here, ψ(·) is referred to as the dissipation function, and it is assumed to be a positive and monotonically increasing function on its domain. A fundamental principle behind the emergence of learning, which will be fully gained in the remainder of the paper, is that it does require energy dissipation. In the Hamiltonian framework, the introduction of dissipation processes can be done in different ways. The idea of factorizing the potential, as well as the whole Lagrangian, by a temporally-growing exponential term is related to studies on dissipative Hamiltonian systems [5]. We regard w ∈ R^n as the Lagrangian coordinates of a virtual mechanical system. Throughout this paper, V(·, ·) is referred to as the potential energy of the system defined by the Lagrangian coordinates w. In machine learning, we look for trajectories w(t) that possibly lead to configurations with small potential energy. Following the duality with mechanics, we also introduce the notion of kinetic energy. For reasons that will become clear in the remainder of the paper, we generalize the notion of velocity to T_i w_i, where, for each fixed i, T_i is a time-dependent linear differential operator of the form

T_i = T_i(t) = Σ_{j=0}^{ℓ} α_{i,j}(t) d^j/dt^j.    (1)

Let m_i > 0 be the mass of each particle (dual of the connection weight) defined by the position w_i(t) and the velocity ẇ_i. Then, we define the kinetic energy as

K(t, Tw) = ½ ψ Σ_{i=1}^{n} m_i (T_i w_i)².    (2)

Clearly, if T_i = T = d/dt and ψ(t) ≡ 1, then we have K = ½ Σ_{i=1}^{n} m_i ẇ_i², which returns the classic case of analytic mechanics. We notice in passing that one could always absorb the dissipation-function factor by introducing T_i^ψ := √ψ · T_i, so that K(t, Tw) = ½ Σ_{i=1}^{n} m_i (T_i^ψ w_i)². Now, let us consider a Lagrangian simply composed of the weighed sum of the potential V(·, ·) and of the kinetic energy

F(t, w, Tw) = Σ_{i=1}^{n} F(t, w_i, T_i w_i) = Σ_{i=1}^{n} K(t, T_i w_i) + γ V(t, w).    (3)

Here we mostly assume γ ∈ R, so that the cognitive action gets a clear meaning in terms of regularization theory. However, it is important to notice that if γ = −1 the cognitive action strictly relates to the classic action in mechanics. The consequences of different choices of γ will become clearer in the remainder of the paper. The classic problem of supervised learning is that of discovering w° = arg min A[w], where

A[w] = ∫_{t0}^{t1} F(t, w, Tw) dt    (4)

is the cognitive action of the system. While there is an intriguing analogy with analytic mechanics (see Table 1), we emphasize that, while in mechanics we are only interested in the stationary points of the mechanical action, in learning we would also like to determine the minimum of the cognitive action defined by equation (4). Interestingly, as we will see later, the strict connection with analytic mechanics also makes sense when considering strong dissipation (see also [6]). Finally, a comment on the minimization of the cognitive action (4) is in order. In general, the problem makes sense whenever the Lagrangian is given all over [t0, t1]. The very nature of the problem results in an explicit time dependence of the Lagrangian; as a consequence, if there is no restriction on the nature of the environmental interaction, then the optimization problem does require all the information coming in the [t0, t1] interval. However, the environmental interactions that are of interest in this paper are those that are typically associated with learning processes, where, after the agent has inspected a certain amount of information, it makes sense to involve it in predictions on the future. Hence, whenever we deal with a truly learning environment, in which the agent is expected to capture regularities in the inspected data, the minimization problem, as it will be shown, becomes well-posed and nicely fits in the context of on-line learning.
3. Euler–Lagrange laws of learning
Let us consider the minimization of the functional A[·] defined by equation (4). Throughout the paper, we assume that the linear operators T admit the adjoint T*. As stated in the following proposition, the coefficients of T* come directly from those of T.

Proposition 3.1. Given the linear differential operator T of degree ℓ, defined by (1), its adjoint operator is also a linear differential operator of the same degree, with coefficients

β_κ(t) = Σ_{j≥κ} (−1)^j C(j, κ) α_j^{(j−κ)}(t),    (5)

where C(j, κ) denotes the binomial coefficient.

Proof. If we use the Leibniz formula, we have
d^j/dt^j (α_j(t) · χ(t)) = Σ_{κ=0}^{j} C(j, κ) α_j^{(j−κ)}(t) d^κχ/dt^κ(t).

From the definition of the adjoint operator, we can easily see that, if we pose D^j := d^j/dt^j, then its adjoint is (D^j)* = (−1)^j D^j. Then we get

T*(t) = Σ_{j=0}^{ℓ} (−1)^j Σ_{κ=0}^{j} C(j, κ) α_j^{(j−κ)}(t) d^κ/dt^κ = Σ_{κ=0}^{ℓ} Σ_{j≥κ} (−1)^j C(j, κ) α_j^{(j−κ)}(t) d^κ/dt^κ ≡ Σ_{κ=0}^{ℓ} β_κ(t) d^κ/dt^κ,

from which the thesis follows. □
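The coefficient formula (5) can be checked mechanically. The sketch below (illustrative, not the authors' code) represents functions and coefficients as exact rational polynomials, builds T* from the α_j via (5), and verifies ⟨Tf, g⟩ = ⟨f, T*g⟩ on [0, 1] for test functions whose values and first derivatives vanish at the borders, so that no boundary terms survive:

```python
from fractions import Fraction
from math import comb

# Polynomials as ascending lists of rational coefficients on [0, 1].
def deriv(p, n=1):
    for _ in range(n):
        p = [Fraction(k) * c for k, c in enumerate(p)][1:] or [Fraction(0)]
    return p

def pmul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def padd(a, b):
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def int01(p):
    # exact integral of the polynomial over [0, 1]
    return sum(c / (k + 1) for k, c in enumerate(p))

def apply_op(coeffs, f):
    # (sum_j alpha_j(t) D^j) f, each alpha_j given as a polynomial
    out = [Fraction(0)]
    for j, a in enumerate(coeffs):
        out = padd(out, pmul(a, deriv(f, j)))
    return out

def adjoint_coeffs(alphas):
    # beta_k(t) = sum_{j>=k} (-1)^j C(j,k) alpha_j^{(j-k)}(t), eq. (5)
    ell = len(alphas) - 1
    betas = []
    for k in range(ell + 1):
        b = [Fraction(0)]
        for j in range(k, ell + 1):
            b = padd(b, [(-1) ** j * comb(j, k) * c
                         for c in deriv(alphas[j], j - k)])
        betas.append(b)
    return betas
```

With constant coefficients the formula reduces to the familiar sign alternation β_κ = (−1)^κ α_κ of the adjoint.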
Let us calculate the variation of the functional (4), in the case n = 1, over the interval [t0, t1], by assuming that² the stationary point w(·) has end-points (t0, w^{(κ)}(t0)) and (t1, w^{(κ)}(t1)), with κ = 0, …, ℓ−1. Let us consider w̌(t) = w(t) + εξ(t), where ξ(·) is a variation and ε ∈ R. Clearly, the conditions on the boundaries on w^{(κ)} yield corresponding conditions on the variation:

ξ^{(κ)}(t0) = ξ^{(κ)}(t1) = 0,  κ = 0, …, ℓ−1.    (6)

Then we have

δA = A[w̌] − A[w] = ∫_{t0}^{t1} [F(t, w + εξ, Tw + εTξ) − F(t, w, Tw)] dt = ε · ∫_{t0}^{t1} [F_w · ξ + F_{Tw} · Tξ] dt + O(ε²).

Now, because of conditions (6), the second term can be expressed by

∫_{t0}^{t1} F_{Tw} · Tξ dt = ∫_{t0}^{t1} (T* F_{Tw}) · ξ dt.

Hence, we get

δA = ε · ∫_{t0}^{t1} [F_w + T* F_{Tw}] · ξ dt.

When imposing the stationarity condition δA = 0, because of the fundamental lemma of variational calculus we get

F_w + T* F_{Tw} = 0.    (7)

Now we can easily see that the above result can be generalized to the set of variables w_i, i = 1, …, n. Let us consider the general case in which we use an operator T_i for each variable w_i. In this case the variation turns out to be

δA = ε · ∫_{t0}^{t1} Σ_{i=1}^{n} (F_{w_i} + T_i* F_{T_i w_i}) · ξ_i dt.

Finally, this analysis leads to a necessary condition for the stationarity of the functional A, which is given in the following theorem.
Theorem 3.1. Let us assume that we are given the values of w^{(κ)}(t0) and w^{(κ)}(t1), with κ = 0, …, ℓ−1. Then the functional (4) admits a stationary point w_i, i = 1, …, n, provided that

F_{w_i} + T_i* F_{T_i w_i} = 0,  i = 1, …, n.    (8)

Remark 3.1. The proof of the theorem follows the classic principles of variational calculus to determine stationary points (see e.g. [7,8]). The only specific issue that arises in the given result concerns the presence of the operators T_i instead of the single derivatives, which plays an important role in the remainder of the paper.
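A quick numerical sanity check of the stationarity condition: for the classic mechanical Lagrangian F = ½ẇ² − ½w² on [0, π], w(t) = sin t satisfies the Euler–Lagrange equation ẅ + w = 0, so perturbing it by an admissible variation should change the action only at second order in ε. The sketch below (illustrative; the specific trajectory and variation are assumptions) verifies that δA = O(ε²):

```python
import math

def action(eps, n=2000):
    """Midpoint-rule approximation of A[w] = integral over [0, pi] of
    (0.5*wdot**2 - 0.5*w**2) dt, for w(t) = sin t + eps*sin 2t.
    sin t solves the EL equation, and sin 2t is an admissible variation
    since it vanishes at both end-points."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        w = math.sin(t) + eps * math.sin(2 * t)
        wdot = math.cos(t) + 2 * eps * math.cos(2 * t)
        total += (0.5 * wdot * wdot - 0.5 * w * w) * h
    return total
```

The first-order term of the variation cancels, and doubling ε roughly quadruples the residual, exactly the O(ε²) behavior used in the derivation of (7).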
Remark 3.2. In the formulation of learning, one cannot typically rely on the boundary conditions that are assumed in the theorem, whereas it makes sense to make assumptions on the initial conditions w^{(κ)}(t0), κ = 0, …, 2ℓ−1 (Cauchy conditions). Interestingly, in case the Euler–Lagrange equations (8) are asymptotically stable, the effect of the initial conditions vanishes as t → ∞. In this paper we are mostly interested in exploring the environmental interactions of the agent under asymptotic stability.

Remark 3.3. Equations (8) have been derived under the assumption that we are given the end-points (t0, w^{(κ)}(t0)) and (t1, w^{(κ)}(t1)), with κ = 0, …, ℓ−1. While they represent a necessary condition for stationarity, their integration requires the knowledge of the boundary conditions, which leads to a batch-mode formulation of learning. This is related to the comments at the end of Section 2 concerning the on-line formulation of learning. However, if the learning environment is periodic with period τ and nτ pairs in each period, that is if (u(ti), si) = (u(ti + τ), si+nτ), there is experimental evidence to claim that the causality issue raised at the end of Section 2 disappears [9]. Basically, on-line learning in a periodic learning environment returns a solution that very well approximates the one corresponding to batch-mode. A more interesting case, which is worth investigating, is the one of almost periodic [10] environments in which, roughly speaking, τ can depend on t. The restriction to truly on-line learning environments, in which the regularities can be captured because of underlying statistical assumptions, might be better captured by boundary conditions different from those expressed by equations (6). For example, in the simplest case of T = D, one can impose that the weights converge over large intervals, which corresponds with the transversality condition [8] on the right border, F_{Tw_i}(t, w_i, Tw_i)|_{t=t1} = 0. However, in this paper, we do not address this issue, whereas we explore the behavior of equations (8) by means of an analysis based on "energy balance".
Finally, it is also worth mentioning that the given Euler–Lagrange equations get to stationary points and, therefore, depending on the learning task, the actual minimization of the cognitive action might not be achieved. This is somewhat related to the classic framework of supervised learning, where the discovery of optimal solutions also in finite-dimensional spaces may lead to suboptimal solutions [11].
In the case of the cognitive action defined by the Lagrangian of equation (3), we have the following special cases.

Reduction to mechanics. Let us assume that ∀i = 1, …, n: T_i = T = D, and let γ = −1. Then we consider the Lagrangian

F(t, w, Tw) = ½ ψ Σ_{i=1}^{n} m_i (Dw_i)² − ψ V(w),

where ψ = e^{θt}. Then the Euler–Lagrange equations (see Theorem 3.1) become

D²w_i + θ Dw_i + V_{w_i} = 0,  i = 1, …, n,    (10)

which corresponds with damping oscillators in classic mechanics. Here we can promptly see the role of the function ψ(·) in the birth of dissipation. Moreover, the role of γ = −1 becomes clear in order to attach the classic meaning of potential to the function V. In addition, this is fully coherent with the definition of action, where the Lagrangian is L = K − V.

Even-order γ sign flip. Let us consider the case T
= α₀ + α₁D + α₂D² and ψ(t) = e^{θt}. From Proposition 3.1, we have T* = α₀ − α₁D + α₂D², and we can easily see that the EL-equations become

D⁴w_i + β₃ D³w_i + β₂ D²w_i + β₁ Dw_i + β₀ w_i + (γ/(α₂² m_i)) V_{w_i} = 0,    (11)

where

β₀ := (α₀α₂θ² − α₀α₁θ + α₀²)/α₂²,
β₁ := (α₁α₂θ² + (2α₀α₂ − α₁²)θ)/α₂²,
β₂ := (α₂²θ² + α₁α₂θ + 2α₀α₂ − α₁²)/α₂²,
β₃ := 2θ.
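The role of the sign of γ in the stability of (11) can be probed directly with the Routh–Hurwitz criterion. The sketch below is illustrative: the quadratic potential V = ½kw², the particular numeric choice of α's and θ, and the β_κ expressions (transcribed from equation (11)) are assumptions of this check, not part of the original development:

```python
def hurwitz_stable_quartic(a3, a2, a1, a0):
    """Routh-Hurwitz test for s**4 + a3*s**3 + a2*s**2 + a1*s + a0:
    all roots lie in the open left half-plane iff every condition holds."""
    return (a3 > 0 and a0 > 0 and a3 * a2 - a1 > 0
            and a1 * (a3 * a2 - a1) - a3 * a3 * a0 > 0)

def char_coeffs(alpha0, alpha1, alpha2, theta, gamma, k_over_m=1.0):
    """Characteristic-polynomial coefficients of (11) for the quadratic
    potential V = 0.5*k*w**2 (so V_w = k*w); k_over_m stands for k/m_i."""
    a2sq = alpha2 * alpha2
    b0 = (alpha0 * alpha2 * theta ** 2 - alpha0 * alpha1 * theta
          + alpha0 ** 2) / a2sq
    b1 = (alpha1 * alpha2 * theta ** 2
          + (2 * alpha0 * alpha2 - alpha1 ** 2) * theta) / a2sq
    b2 = (a2sq * theta ** 2 + alpha1 * alpha2 * theta
          + 2 * alpha0 * alpha2 - alpha1 ** 2) / a2sq
    b3 = 2 * theta
    return b3, b2, b1, b0 + gamma * k_over_m / a2sq
```

For α₀ = 0, α₁ = α₂ = 1, θ = 2 the polynomial is s⁴ + 4s³ + 5s² + 2s + γ: the constant term inherits the sign of γ, so γ < 0 immediately violates the Hurwitz conditions, in line with the γ sign flip.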
The comparison with equation (10) reveals an interesting difference, which involves the role of the sign of γ in stability. While in mechanics we have γ = −1, when involving the second-order differential operator of this example, in order to get stability, from the Routh–Hurwitz stability criterion we immediately conclude that γ > 0 is a necessary condition. This can be straightforwardly extended to any even order of T. In this paper, this property is referred to as the γ sign flip, and it motivates the study of high-order operators.

4. Dissipative Hamiltonian formulation of learning
Now we use the Legendre transform in order to obtain the Hamiltonian formulation of the problem. We introduce the notations ḟ_i := T_i f_i and f̊_i := T_i* f_i, so that equations (8) can be re-written as

F_{w_i} + F̊_{ẇ_i} = 0,  i = 1, 2, …, n.    (12)

Now, let us introduce the conjugate momentum υ_i,

υ_i := ∂F/∂ẇ_i.    (13)

Likewise, we define the Hamiltonian as

H(t, w, υ) := Σ_{i=1}^{n} υ_i ẇ_i − F(t, w, ẇ),    (14)

where the ẇ_i must be thought of as functions of w_i and υ_i.

Theorem 4.1. The Euler–Lagrange equations (8) are equivalent to the 2n Hamilton equations

ẇ_i = ∂H/∂υ_i,    (15)

υ̊_i = ∂H/∂w_i.    (16)

Moreover, we have

∂H/∂t = −∂F/∂t.
(17)

Proof. As usual, we introduce the canonical variables w_i and υ_i. Then the canonical equations arise from the EL-equations (8) and from definition (13), as follows:

∂H/∂υ_i = ∂/∂υ_i (Σ_{j=1}^{n} υ_j ẇ_j − F(t, w, ẇ)) = ẇ_i + υ_i ∂ẇ_i/∂υ_i − (∂F/∂w_i)(∂w_i/∂υ_i) − (∂F/∂ẇ_i)(∂ẇ_i/∂υ_i) = ẇ_i,

∂H/∂w_i = ∂/∂w_i (Σ_{j=1}^{n} υ_j ẇ_j − F(t, w, ẇ)) = ẇ_i ∂υ_i/∂w_i + υ_i ∂ẇ_i/∂w_i − ∂F/∂w_i − (∂F/∂ẇ_i)(∂ẇ_i/∂w_i) = −∂F/∂w_i = T* ∂F/∂ẇ_i = υ̊_i.

Finally, equation (17) arises from (16) as follows:

∂H/∂t = ∂/∂t (Σ_{i=1}^{n} υ_i ẇ_i − F(t, w, ẇ)) = Σ_{i=1}^{n} υ_i ∂ẇ_i/∂t − Σ_{i=1}^{n} F_{ẇ_i} ∂ẇ_i/∂t − ∂F/∂t = −∂F/∂t. □
The integration of the Hamilton equations (15) and (16) can be written formally as

w_i = T_i⁻¹ H_{υ_i},    (18)
υ_i = (T_i*)⁻¹ H_{w_i},

where T_i⁻¹ and (T_i*)⁻¹ denote the inverse operators of T_i and T_i*, respectively. Notice that whenever we use the inverse operator, we need to solve an ordinary differential equation of order ℓ and, therefore, we must always impose ℓ initial conditions to get a unique solution. While equations (15) and (16) can be used for determining the evolution of w, as it will be shown in Section 6, equation (15) can offer a picture of learning in terms of energy balance. The following definition plays a fundamental role in the analysis of energy balance. The definition involves the set of variables N := {(w_i, υ_i), i ∈ N_n} paired with the associated differential operator T.

Definition 4.1. Given the system S ∼ (N, T), along with any pair of functions φ₁(t, w, υ) and φ₂(t, w, υ) with continuous derivatives up to the ℓ-th order, the term

{φ₁, φ₂}_D^T := Σ_{i=1}^{n} (∂φ₁/∂w_i) D T⁻¹ (∂φ₂/∂υ_i) + (∂φ₁/∂υ_i) D (T*)⁻¹ (∂φ₂/∂w_i)    (19)

is referred to as the bracket of φ₁ and φ₂, with respect to the operators D and T.

For the sake of simplicity, in the remainder of the paper, we will omit the subscript and the superscript, so that we will simply use the notation {φ₁, φ₂}. We also notice that this can be regarded as a formal definition, unless we know (as it will be in what follows) the initial conditions for T⁻¹(∂φ₂/∂υ_i) and (T*)⁻¹(∂φ₂/∂w_i).
We can easily see that, in general, {φ₁, φ₂} ≠ {φ₂, φ₁}, and that {φ₁, φ₂} is linear only in the first argument.

Theorem 4.2. Given the system defined by w : [t0, t1] → R^n and any φ(t, w, υ) with continuous derivatives up to the ℓ-th order, we have

dφ/dt = ∂φ/∂t + {φ, H}.    (20)

Proof. We have

dφ/dt = ∂φ/∂t + Σ_{i=1}^{n} (φ_{w_i} D w_i + φ_{υ_i} D υ_i).    (21)

If we plug the w_i and υ_i given by equation (18) into equation (21), we get

dφ/dt = ∂φ/∂t + Σ_{i=1}^{n} (φ_{w_i} D T⁻¹ H_{υ_i} + φ_{υ_i} D (T*)⁻¹ H_{w_i}) = ∂φ/∂t + {φ, H}. □
5. BG-brackets

In this section we describe some properties of the brackets {·, ·} introduced by Definition 4.1. Their analysis leads to an in-depth understanding of energy exchange processes. We start by noticing that if ∀i = 1, …, n: T_i = D, then {·, ·} reduces to the classic Poisson brackets. This is formally stated by the following proposition.

Proposition 5.1. The brackets {·, ·} are a generalization of the Poisson brackets to the case in which ∀i = 1, …, n: T_i = D, that is, {φ₁, φ₂} = {φ₁, φ₂}_{P.B.}.

Proof. From Proposition 3.1, under the assumption that ∀i = 1, …, n: T_i = D, we derive that T_i* = −D. Hence, we have

{φ₁, φ₂} = Σ_{i=1}^{n} (∂φ₁/∂w_i) D D⁻¹ (∂φ₂/∂υ_i) − (∂φ₁/∂υ_i) D D⁻¹ (∂φ₂/∂w_i) = Σ_{i=1}^{n} (∂φ₁/∂w_i)(∂φ₂/∂υ_i) − (∂φ₁/∂υ_i)(∂φ₂/∂w_i) = {φ₁, φ₂}_{P.B.}. □
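Since for T_i = D the BG-brackets collapse to the Poisson brackets, the classic identities can be checked numerically. The following sketch (illustrative, not the authors' code) evaluates {·, ·}_{P.B.} by central differences for a harmonic-oscillator Hamiltonian, confirming {H, H}_{P.B.} = 0 and the canonical relations {q, H}_{P.B.} = p, {p, H}_{P.B.} = −q:

```python
def poisson_bracket(f, g, q, p, h=1e-6):
    """Central-difference Poisson bracket {f, g} = f_q*g_p - f_p*g_q
    for scalar functions of a single (q, p) pair."""
    fq = (f(q + h, p) - f(q - h, p)) / (2 * h)
    fp = (f(q, p + h) - f(q, p - h)) / (2 * h)
    gq = (g(q + h, p) - g(q - h, p)) / (2 * h)
    gp = (g(q, p + h) - g(q, p - h)) / (2 * h)
    return fq * gp - fp * gq

def H(q, p):
    # harmonic-oscillator Hamiltonian (mass and stiffness set to 1)
    return 0.5 * (p * p + q * q)
```

The antisymmetry that forces {H, H}_{P.B.} = 0 is exactly what fails for T ≠ D, which is the subject of the rest of this section.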
Rest of the adjoint. Now, we focus on the case³ T ≠ D. Interestingly, we will show that the enrichment of the differential operator T with respect to D has important consequences on the energy balance equation (20). While from Proposition 5.1 we have {H, H} = 0, it will be shown that T ≠ D in general leads to {H, H} ≠ 0. Let

⟨w, υ⟩ := ∫_{t0}^{t} w(τ) υ(τ) dτ    (22)

be the inner product. We introduce a couple of concepts that turn out to be useful in the following. First, we notice that any differential operator T can be paired with a real number, which comes out when considering the behavior on the borders of ⟨Tw, υ⟩.

Definition 5.1. Let T be a differential operator and let ⟨·, ·⟩ be the inner product given by (22). Then, for any given pair of functions f, g ∈ W^{ℓ,2},

(T | f, g) := ⟨Tf, g⟩ − ⟨f, T*g⟩    (23)

is referred to as the rest of the adjoint of T with respect to f and g.

Clearly, (· | ·, ·) is linear in each of its arguments and it is distributive in both the operator and the two function slots. Moreover, we have that (c·I | f, g) ≡ 0 for any constant c. Notice that (T | f, g) = 0 whenever we are given the same boundary conditions on f and g.

Proposition 5.2. Let T_m
= Σ_{i=0}^{m} α_i D^i, where the α_i are real constants. Then the rest of the adjoint (T | w, υ) w.r.t. w and υ can be computed by the recurrent equations

i. (T_m | w, υ) = (T_{m−1} | w, υ) + α_m (D^m | w, υ),
ii. (D^m | w, υ) = [υ D^{m−1} w] − (D^{m−1} | w, Dυ),    (24)

where [u] := u(t1) − u(t0) is the variation on the borders of u.

³ For the sake of simplicity, in the remainder of the paper, we assume ∀i = 1, …, n: T_i = T.

Proof. i. We have

(T_m | w, υ) = ⟨T_{m−1} w + α_m D^m w, υ⟩ − ⟨w, (T_{m−1})* υ + α_m (D^m)* υ⟩ = (T_{m−1} | w, υ) + α_m (⟨D^m w, υ⟩ − ⟨w, (D^m)* υ⟩) = (T_{m−1} | w, υ) + α_m (D^m | w, υ).
).
ii. Inordertogettherecurrentequationii.,westartusingintegrationbypartsasfollows:
(
Dmw|
υ
)
=
Dmw,
υ
−
w, (
Dm)
υ
=
Dmw,
υ
− (−
1)
mw, (
Dm)
υ
= [
v Dm−1w] −
Dυ
,
Dm−1w+ (−
1)
m−1[
w Dm−1υ
] −
D w,
Dm−1υ
= [
v Dm−1w] + [
w(
Dm−1)
υ
] −
D
υ
,
Dm−1w+
D w, (
Dm−1)
υ
.
Nowwehave(
Dm−1|
w,
Dυ
)
=
Dm−1w,
Dυ
−
w, (
Dm−1)
Dυ
,
and,therefore,weget
(
Dm|
w,
υ
)
= [
v Dm−1w] + [
w(
Dm−1)
υ
]
−
w
, (
Dm−1)
Dυ
+ (
Dm−1|
w,
Dυ
)
+
D w, (
Dm−1)
υ
= [
v Dm−1w] + [
w(
Dm−1)
υ
] − [
w(
Dm−1)
υ
] − (
Dm−1|
w,
Dυ
)
= [
v Dm−1w] − (
Dm−1|
w,
Dυ
).
2
Example 5.1. Let T = α₀ + α₁D + α₂D². To calculate the rest of the adjoint we use Proposition 5.2. From equation (24)-ii we can calculate (D^m | w, υ) for m = 0, 1, 2. First, notice that (D⁰ | w, υ) = 0. Then, for D we have

(D | w, υ) = [υw] − (D⁰ | w, Dυ) = [υw].

Now we can calculate (D² | w, υ):

(D² | w, υ) = [υ Dw] − (D | w, Dυ) = [υ Dw] − [w Dυ] = [υ Dw − w Dυ].

Finally, from equation (24)-i, we get

(T | w, υ) = [α₁ wυ + α₂ (υ Dw − w Dυ)].    (25)

The notion of the rest of the adjoint plays an important role in the remainder of the paper, where we are interested in the following extended definition, which involves the learning system
S ∼ (N, T).

Definition 5.2. Given S ∼ (N, T), we define

M(S, [t0, t1]) := Σ_{i=1}^{n} (T | w_i, Dυ_i) + (D | w_i, T*υ_i).    (26)

M(S, [t0, t1]) is referred to as the rest of T with respect to N.

Lemma 5.1.

M(S, [t0, t1]) = Σ_{i=1}^{n} ⟨(T* + D) w_i, (T* + D) υ_i⟩.    (27)

Proof. The proof is a straightforward consequence of the definition. □

First, we notice that if T = D then T* + D = 0 and, consequently, M(S, [t0, t1]) = 0. Clearly, the lemma enlightens the way M(S, [t0, t1]) emerges from breaking the adjoint property T* + D = 0, which holds for T = D.
Example 5.1
,usingequation(25)wegetM
(,
[
t0,
t1]) =
n i=1(
T|
wi,
Dυ
i)
+ (
D|
wi,
Tυ
i)
=
n i=1[
α
1wiDυ
i+
α
2(
Dυ
iD wi−
wiD2υ
i)
] + [
wi(
α
0−
α
1Dυ
i+
α
2D2υ
i)
]
=
n i=1[
α
0wi+
α
2D wiDυ
i]
Thisexamplesuggeststhat M
(,
[
t0,
t1])
dependsonthevalueassumedbyB
(
t,
w,
D w,
Dυ
)
=
n
i=1
α
0wi+
α
2D wiDυ
i ontheboundariesof[
t0,
t1]
.WehaveM
(,
[
t0,
t1]) =
B(
t1,
w,
D w,
Dυ
)
−
B(
t0,
w,
D w,
Dυ
).
Clearly,thisholdsfor
withanydifferentialoperator T .Asaconsequence,wheneverweknowthat
exhibitsaperiodic behavioron
[
t0,
t1]
thenM
(,
[
t0,
t1]) =
0.
Now we show a property of M which leads to a clear interpretation in terms of the energy exchanges of the agent with the environment.

Theorem 5.1. Let T be a positive self-adjoint operator with constant coefficients, and let us consider a system S such that υ_i = T w_i. Then M ≥ 0.

Proof. From the hypothesis, T* = T. Then, from the definition of M we get

M(S, [t0, t1]) = Σ_{i=1}^{n} (T | w_i, Dυ_i) + (D | w_i, Tυ_i)
= Σ_{i=1}^{n} ⟨T w_i, Dυ_i⟩ − ⟨w_i, T Dυ_i⟩ + ⟨D w_i, Tυ_i⟩ + ⟨w_i, D Tυ_i⟩
= Σ_{i=1}^{n} ⟨T w_i, Dυ_i⟩ + ⟨D w_i, Tυ_i⟩
= Σ_{i=1}^{n} ⟨(T + D) w_i, (T + D) υ_i⟩.

Since υ_i = T w_i and T ≥ 0, we get

M(S, [t0, t1]) = Σ_{i=1}^{n} ⟨(T + D) w_i, (T + D) T w_i⟩ = Σ_{i=1}^{n} ⟨u_i, T u_i⟩ ≥ 0,

where u_i := (T + D) w_i. □
T–D commutator. Another important asymmetry arises, which plays an important role in the case of time-dependent differential operators. The following definition turns out to be useful.

Definition 5.3. Given any two differential operators T₁ and T₂, we define

[T₁, T₂] := T₁T₂ − T₂T₁.    (28)

The differential operator [T₁, T₂] is referred to as the commutator of T₁ and T₂. Of course, we have [T₁, T₂] + [T₂, T₁] = 0. In particular, in the following, we are interested in the case in which T₁ = T and T₂ = D. In that case

T D y − D T y = Σ_{i=0}^{ℓ} α_i D^i D y − D Σ_{i=0}^{ℓ} α_i D^i y = Σ_{i=0}^{ℓ} α_i D^{i+1} y − Σ_{i=0}^{ℓ} (Dα_i) D^i y − Σ_{i=0}^{ℓ} α_i D^{i+1} y = −Σ_{i=0}^{ℓ} (Dα_i) D^i y ≡ −Σ_{i=0}^{ℓ} β_i D^i y.    (29)

Clearly, [T, D] = 0 for constant α_i coefficients, whereas, in general, [T, D] is a differential operator defined by the coefficients β_i := Dα_i.
i. Likefortherestoftheadjoint,itturnsouttobeusefultoextendthenotionofcommutatortothecaseofasetconjugate variablesN
fordifferentialoperators T and D.Definition5.4.Letusconsiderthesystem
= {
N ,
T}
over[
t0,
t1]
.Thenϒ(,
[
t0,
t1]) :=
ni=1
wi,
[
T,
D]
υ
i (30)isreferredtoastheT
−
D commutatorofN
.Again,the T –D commutatorexpressesthedegreeofasymmetry betweenT and D.Wehavesymmetry
[
T,
D]
=
T D−
D T=
0 whenever T isatime-invariantdifferential operator, whereasthe asymmetryjustarisesbecause ofthetemporal evolution of coefficientsα
i. In order to give an interpretation ofϒ(,
[
t0,
t1])
, let us consider the case in which T isreplacedwith
φ
T ,whereφ
isapositiveincreasingmonotonicfunction.ThiscasehasbeenalreadymentionedinSection2with
φ (
t)
=
√
ψ(
t)
.Wediscussthebasicideainthesimplestcaseinwhich T= φ
D.Theorem5.2.LetusassumethatM
(,
[
t0,
t1])
=
0.IfT= φ
D thenϒ(,
[
t0,
t1]) = −
n i=1 mi t1 t0(
D wi)
2φ
Dφ
dt.
(31)Proof. First,wecaneasilyseethat
T* = −Dφ · D⁰ − φ D,

where D⁰ = I is the identity operator. Now we can calculate

[T*, D] = T*D − DT* = (−Dφ · I − φD) D − D (−Dφ · I − φD) = D²φ · I + Dφ · D.

Now, since M(S, [t0, t1]) = 0 and υ_i = m_i T w_i = m_i φ Dw_i, we have

ϒ(S, [t0, t1]) = Σ_{i=1}^{n} ∫_{t0}^{t1} w_i (υ_i D²φ + Dφ Dυ_i) dt = Σ_{i=1}^{n} ∫_{t0}^{t1} w_i D(υ_i Dφ) dt = −Σ_{i=1}^{n} ∫_{t0}^{t1} υ_i Dw_i Dφ dt = −Σ_{i=1}^{n} m_i ∫_{t0}^{t1} Dφ (Dw_i)² φ dt. □
Let us consider the case in which ψ(t) = φ²(t) = e^{θt}. We get

−ϒ(S, [t0, t1]) = Σ_{i=1}^{n} m_i ∫_{t0}^{t1} Dφ (Dw_i)² φ dt = ½ Σ_{i=1}^{n} m_i θ ∫_{t0}^{t1} (Dw_i)² ψ dt > 0.

This offers a clear interpretation of −ϒ(S, [t0, t1]), which turns out to be the dissipated energy of S over [t0, t1]. However, notice that this interpretation is limited to the case Dφ > 0.

Example 5.2. Let T
0.Example5.2.LetT
=
sin(
ω
t)
·
D be.WehaveT
= −
ω
cos(
ω
t)
·
I−
sin(
ω
t)
Dand
[
T,
D] = −
ω
2sin(
ω
t)
·
I+
ω
cos(
ω
t)
D.
UndertheassumptionthatM
(,
[
t0,
t1])
=
0 andυ
i=
miT wi=
misinω
t·
D wehaveϒ(,
[
t0,
t1]) =
n i=1 t1 t0 wiD(
ω
cos(
ω
t)
υ
i)
dt= −
n i=1 t1 t0 cos(
ω
t)
υ
iD widt= −
1 2 n i=1 mi t1 t0 sin 2ω
t(
D wi)
2dt.
This exampleclearly showsthatthe signof
ϒ(,
[
t0,
t1])
can flipwithfunctionsψ(
·)
that arenot monotonic.Hence,ϒ(,
[
t0,
t1])
modelsinteractionsoftheagentwiththeenvironmentwheretheenergyflowsinbothdirections.The
{·, ·} brackets assume a special meaning when involving the Hamiltonian of a given system.⁴

Theorem 5.3. Given the system S over [t0, t1], then

{H, H} = d/dt (M + ϒ).    (32)

Proof. Because of the Hamilton equations, we have
{H, H} = Σ_{i=1}^{n} H_{w_i} D T⁻¹ H_{υ_i} + H_{υ_i} D (T*)⁻¹ H_{w_i} = Σ_{i=1}^{n} T*υ_i · D T⁻¹ T w_i + T w_i · D (T*)⁻¹ T*υ_i = Σ_{i=1}^{n} T*υ_i · Dw_i + T w_i · Dυ_i.

Now, if we integrate over [t0, t] and use the definitions of the rest of the adjoint and of the commutator, we get

∫_{t0}^{t} {H, H} dτ = Σ_{i=1}^{n} ∫_{t0}^{t} (T*υ_i · Dw_i + T w_i · Dυ_i) dτ = Σ_{i=1}^{n} ⟨T*υ_i, Dw_i⟩ + ⟨T w_i, Dυ_i⟩
= Σ_{i=1}^{n} (D | w_i, T*υ_i) − ⟨D T*υ_i, w_i⟩ + (T | w_i, Dυ_i) + ⟨w_i, T* Dυ_i⟩
= Σ_{i=1}^{n} (D | w_i, T*υ_i) + (T | w_i, Dυ_i) + Σ_{i=1}^{n} ⟨w_i, (T*D − DT*) υ_i⟩ = M + ϒ.

Finally, we get the thesis by differentiating both sides with respect to t. □

⁴ Whenever there is no risk of ambiguity, in the following we drop the dependence on (S, [t0, t1]).
This result shows that the {·, ·} brackets assume a special meaning in the case in which both operands are the Hamiltonian function. In particular, in that case, there is a Boundary Generation, expressed by M(S, [t0, t1]) and due to the rest of the adjoint, and a Bartering Generation, expressed by ϒ(S, [t0, t1]) and due to the exchange of the operators T and D. For this reason, {·, ·} is referred to as the BG-brackets.

6. Energy balance in learning processes
In this section we discuss the energy balancing connected with the general form of the cognitive action whose Lagrangian is given by equation (3). While equations (15) and (16) can be used to determine the learning dynamics, equation (17) offers an interesting view of learning in terms of energy balancing. Notice that the equation involves the canonical variables w and υ in H, as well as ẇ in F. An alternative way of constructing the energy balance, based only on canonical variables, is that of expressing {H, H} according to Theorem 5.3. We have

dH/dt = dK/dt − γ dV/dt = ∂H/∂t + {H, H} = −γ ∂V/∂t + ∂K/∂t + d/dt (M + ϒ).

We also have

dV/dt = ∂V/∂t + {V, H},  dK/dt = ∂K/∂t + {K, H},

which suggests the introduction of the functions 𝒱 and 𝒦 defined by

d𝒱/dt = {V, H} = dV/dt − ∂V/∂t,    (33)

d𝒦/dt = {K, H} = dK/dt − ∂K/∂t.    (34)

Finally, the analysis is summarized by the following theorem, which establishes the energy balance of any learning system. Unlike equation (17), the given energy balance is fully expressed in terms of the canonical variables w and
.Theorem6.1.Thesystem
(,
T)
evolvesaccordingtotheinvariantconditiond
dt
(
K
−
γ
V
−
M− ϒ) =
0.
(35)Animportantsourceofinteractionisrepresentedbytheenvironmentalpotentialenergy
E
=
E
(,
[
t0,
t]) :=
t t0∂
V∂
τ
dτ
,
andby thebarteringenergy
ϒ
,whichappearswhenever T doesnotcommutatewithD.WhileK
reflectsthevelocityof theconnectionsdynamics,theboundaryenergyM takesintoaccounttheexchangeofenergyatthebeginningandatthe currenttimeofthelearninghorizon.dampingoscillators
In order to provide an interpretation of analytic mechanics as a special case, let us consider T = D and γ = −1. Basically, we are considering the Lagrangian which has originated the damping oscillator of equation (10). From the definition (30) of ϒ, we have [T*, D] = [−D, D] = 0 and, therefore, ϒ = 0. Likewise, from Lemma 5.1, since D + T* = 0, we conclude that M = 0. As a consequence, the invariant of Theorem 6.1, expressed in terms of canonical variables, yields

d/dt (𝒦 + 𝒱) = 0.    (36)

This is in fact the classic invariant condition of classic mechanics, which establishes that the sum of the kinetic and potential energies is invariant. Interestingly, this holds also in cases in which the kinetic energy K and the potential V are