Preconditioned Nonlinear Conjugate Gradient methods

Giovanni Fasano and Massimo Roma

Contents
1. Introduction
2. Preconditioned Nonlinear Conjugate Gradient algorithm
3. A new Symmetric Rank-2 update
4. A preconditioner using a BFGS-like low-rank quasi-Newton update
5. Preliminary numerical experiences
6. Conclusions and future works
1. Introduction

We deal with the large scale unconstrained optimization problem

(1.1)    min_{x ∈ IR^n} f(x),

where f : IR^n → IR is a twice continuously differentiable function and n is large. We assume that, for a given x_0 ∈ IR^n, the level set Ω_0 = {x ∈ IR^n | f(x) ≤ f(x_0)} is compact. The huge number of real world applications which can be modelled as large scale optimization problems strongly motivates the growing interest in the solution of such problems.
Among the iterative methods for large scale unconstrained optimization, when the Hessian matrix is possibly dense, limited memory quasi-Newton methods are often the methods of choice. As well known (see any textbook, e.g. [11]), they generate a sequence {x_k} according to the following scheme:

(1.2)    x_{k+1} = x_k + α_k p_k,    k = 0, 1, . . . ,    with    p_k = −H_k ∇f(x_k),

where H_k is an approximation of the inverse of the Hessian matrix ∇²f(x_k) and α_k is a steplength. In particular, instead of computing H_k from scratch at each iteration k, these methods update H_k in a simple manner, in order to obtain the new approximation H_{k+1} to be used in the next iteration. Moreover, instead of storing full dense n × n approximations, they only save a few vectors of length n, which allow the approximations to be represented implicitly.

Among the quasi-Newton schemes, the L-BFGS method is usually considered one of the most efficient. It is well suited for large scale problems because the amount of storage is limited and controlled by the user. This method is based on the construction of an approximation of the inverse of the Hessian matrix, exploiting curvature information gained only from the most recent iterations. The inverse of the Hessian matrix is updated at the k-th iteration by

(1.3)    H_{k+1} = V_k^T H_k V_k + ρ_k s_k s_k^T,
where

ρ_k = 1 / (y_k^T s_k),    V_k = I − ρ_k y_k s_k^T,

and

(1.4)    s_k = x_{k+1} − x_k = α_k p_k,    y_k = ∇f(x_{k+1}) − ∇f(x_k).

Observe that H_k also satisfies the relation

H_k = (V_{k−1}^T · · · V_{k−m}^T) H_k^0 (V_{k−m} · · · V_{k−1})
    + ρ_{k−m} (V_{k−1}^T · · · V_{k−m+1}^T) s_{k−m} s_{k−m}^T (V_{k−m+1} · · · V_{k−1})
    + ρ_{k−m+1} (V_{k−1}^T · · · V_{k−m+2}^T) s_{k−m+1} s_{k−m+1}^T (V_{k−m+2} · · · V_{k−1})
    + · · ·
    + ρ_{k−1} s_{k−1} s_{k−1}^T,

where m is the memory of the method and H_k^0 is an initial approximation of the inverse of the Hessian matrix.

The well known reasons for the success of the L-BFGS method can be summarized in the following two points: firstly, even when m is small, H_{k+1} is an effective approximation of the inverse of the Hessian matrix; secondly, H_{k+1} is the unique (positive definite) matrix which solves the problem

min_H ∥H − H_k∥_F    s.t.    H = H^T,    s_k = H y_k,
where ∥ · ∥_F is the Frobenius norm. Namely, H_{k+1} is the positive definite matrix closest to the current approximation H_k which satisfies the secant equation s_k = H y_k. However, the L-BFGS method presents some drawbacks, including possible inefficiency when the eigenvalues of the Hessian matrix are very spread. Moreover, on some applications, the performances of the L-BFGS method and the Nonlinear Conjugate Gradient method are comparable.
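In practice the L-BFGS approximation H_k is never formed explicitly; the product H_k ∇f(x_k) is computed matrix-free by the classical two-loop recursion (see e.g. [11]). A minimal numpy sketch, assuming H_k^0 = I and hypothetical pair lists `s_list`, `y_list` holding the m most recent (s_j, y_j):

```python
import numpy as np

def lbfgs_two_loop(grad, s_list, y_list):
    """Compute H_k @ grad implicitly from the stored (s_j, y_j) pairs,
    using the classical L-BFGS two-loop recursion with H_k^0 = I.
    This is equivalent to building H_k from the recursion based on (1.3)."""
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: traverse the pairs from the most recent to the oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    r = q  # apply the initial approximation H_k^0 = I
    # Second loop: traverse the pairs from the oldest to the most recent.
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r += (a - b) * s
    return r
```

For a single stored pair the result coincides with applying one explicit update (1.3) starting from H_0 = I, which gives a quick correctness check.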
In this paper we focus on the latter method: the Nonlinear Conjugate Gradient method (NCG). As well known (see any textbook, e.g. [11]), it is a natural extension to general functions of the linear Conjugate Gradient (CG) method for quadratic functions. It generates a sequence {x_k} according to scheme (1.2), with

p_k = −∇f(x_k) + β_k p_{k−1},

where β_k is a suitable scalar. Different values of β_k give rise to different algorithms (see [8] for a survey). The most common are the Fletcher and Reeves (FR), the Polak and Ribière (PR) and the Hestenes and Stiefel (HS) algorithms.
Although the NCG methods have been widely studied and are often very efficient when solving large scale problems, a key point for increasing their efficiency is the use of a preconditioning strategy, especially when solving difficult ill-conditioned problems. Defining good preconditioners for NCG methods is currently still considered a challenging research topic. On this guideline, this work is devoted to investigating the use of quasi-Newton updates as preconditioners. In particular, we want to propose preconditioners which possibly inherit the effectiveness of the L-BFGS update. Indeed, here we build preconditioners iteratively defined and based on quasi-Newton updates of the inverse of the Hessian matrix. This represents an attempt to improve the efficiency of the NCG method by conveying information collected from a quasi-Newton method into a Preconditioned Nonlinear Conjugate Gradient method (PNCG). In particular, we study new symmetric low-rank updates of the inverse of the Hessian matrix, in order to iteratively define preconditioners for PNCG.

It is worth noting that there exists a close connection between BFGS and NCG [10]; on the other hand, NCG algorithms can be viewed as memoryless quasi-Newton methods (see e.g. [13], [12], [11]).
The idea of using a quasi-Newton update as a preconditioner within NCG algorithms is not new. In [2], when storage is available, a preconditioner defined by m quasi-Newton updates is used within an NCG algorithm. In [1] a [...] of NCG. Moreover, an automatic preconditioning strategy based on a limited memory quasi-Newton update for the linear CG is proposed in [9], within Hessian-free Newton methods, and is extended to the solution of a sequence of linear systems.
In this paper, we propose two classes of parameter dependent preconditioners. In particular, in the next section we briefly recall a scheme of a general PNCG method. In Section 3 a new symmetric rank-2 update is introduced and its theoretical properties are studied. Section 4 is devoted to describing a new BFGS-like quasi-Newton update. Finally, in Section 5 the results of a preliminary numerical experience are reported, showing a comparison between one of our proposals and an L-BFGS based preconditioner for PNCG.
2. Preconditioned Nonlinear Conjugate Gradient algorithm
In this section we report the scheme of a general Preconditioned Nonlinear Conjugate Gradient (PNCG) algorithm (see e.g. [12]). In the PNCG scheme, M_k denotes the preconditioner at the iteration k.

Preconditioned Nonlinear Conjugate Gradient (PNCG) algorithm

Step 1: Data x_1 ∈ IR^n. Set p_1 = −M_1 ∇f(x_1) and k = 1.

Step 2: Compute the steplength α_k by using a linesearch procedure which guarantees the Wolfe conditions to be satisfied, and set

x_{k+1} = x_k + α_k p_k.
Step 3: If ∥∇f(x_{k+1})∥ = 0 then stop, else compute β_{k+1} and

(2.1)    p_{k+1} = −M_{k+1} ∇f(x_{k+1}) + β_{k+1} p_k,

set k = k + 1 and go to Step 2.

By setting M_k = I for any k, the popular (unpreconditioned) Nonlinear Conjugate Gradient (NCG) method is obtained. The parameter β_{k+1} can be chosen in a variety of ways. For the PNCG algorithm the most recurrent choices are the following:
(2.2)    β_{k+1}^{FR} = ∇f(x_{k+1})^T M_k ∇f(x_{k+1}) / (∇f(x_k)^T ∇f(x_k)),

(2.3)    β_{k+1}^{PR} = [∇f(x_{k+1}) − ∇f(x_k)]^T M_k ∇f(x_{k+1}) / (∇f(x_k)^T M_k ∇f(x_k)),

(2.4)    β_{k+1}^{HS} = [∇f(x_{k+1}) − ∇f(x_k)]^T M_k ∇f(x_{k+1}) / ([∇f(x_{k+1}) − ∇f(x_k)]^T p_k).
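As a quick numeric illustration of (2.2)-(2.4), the following sketch evaluates the three coefficients on hypothetical gradients g0, g1, direction p and preconditioner M = I (all the data are made up for the example):

```python
import numpy as np

def beta_fr(g_new, g_old, M):
    # (2.2): Fletcher-Reeves choice, with the preconditioned numerator.
    return (g_new @ M @ g_new) / (g_old @ g_old)

def beta_pr(g_new, g_old, M):
    # (2.3): Polak-Ribiere choice.
    return ((g_new - g_old) @ M @ g_new) / (g_old @ M @ g_old)

def beta_hs(g_new, g_old, p, M):
    # (2.4): Hestenes-Stiefel choice.
    return ((g_new - g_old) @ M @ g_new) / ((g_new - g_old) @ p)

M = np.eye(2)               # unpreconditioned case M_k = I
g0 = np.array([1.0, 0.0])   # hypothetical gradient at x_k
g1 = np.array([0.0, 2.0])   # hypothetical gradient at x_{k+1}
p = np.array([-1.0, 0.5])   # hypothetical direction p_k
```

On these vectors the three formulas give β^{FR} = 4, β^{PR} = 4 and β^{HS} = 2, which can be checked by hand.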
We recall that, with respect to other gradient methods, a more accurate linesearch procedure is required to determine the steplength α_k in a PNCG algorithm. This is due to the presence of the term β_{k+1} p_k in (2.1). The latter fact motivates the use of the (strong) Wolfe conditions to compute the steplength α_k, which also guarantee that s_k^T y_k > 0 for any k.
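The PNCG scheme above can be sketched in code. In this demo the objective is a hypothetical convex quadratic, so the exact steplength −g^T p / (p^T A p) is available in closed form and stands in for the Wolfe linesearch required in general; moreover M_k is held fixed at the identity, so the sketch reduces to the unpreconditioned NCG (and indeed to the linear CG):

```python
import numpy as np

def pncg(grad, hess_vec, x, M=None, tol=1e-10, max_iter=50):
    """Sketch of the PNCG scheme (Steps 1-3) for a convex quadratic, with the
    exact steplength alpha_k = -g^T p / (p^T A p) replacing a Wolfe linesearch
    and the FR choice (2.2) for beta_{k+1}."""
    n = x.size
    if M is None:
        M = np.eye(n)                  # M_k = I recovers the unpreconditioned NCG
    g = grad(x)
    p = -M @ g                         # Step 1
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:   # Step 3 stopping test
            break
        Ap = hess_vec(p)
        alpha = -(g @ p) / (p @ Ap)    # Step 2 (exact for a quadratic)
        x = x + alpha * p
        g_new = grad(x)
        beta = (g_new @ M @ g_new) / (g @ g)   # (2.2)
        p = -M @ g_new + beta * p              # (2.1)
        g = g_new
    return x, g

# Hypothetical test problem: f(x) = 1/2 x^T A x + b^T x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_star, g_star = pncg(lambda x: A @ x + b, lambda p: A @ p, np.zeros(2))
```

On this 2-dimensional quadratic the iteration terminates in at most two steps, at the stationary point of f.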
As already said, preconditioning is applied to increase the efficiency of the NCG method. In this regard, we remark a noticeable difference between linear CG and NCG. Whenever the linear CG is applied, the Hessian matrix does not change during the iterations of the algorithm. On the contrary, when NCG is applied to a nonlinear function, the Hessian matrix (possibly indefinite) changes at each iteration.
3. A new Symmetric Rank-2 update
In this section we study a new quasi-Newton updating formula, by considering the properties of a parameter dependent symmetric rank-2 (SR2) update of the inverse of the Hessian matrix. Suppose that after k iterations we have generated the sequence of iterates {x_1, . . . , x_{k+1}}. Then our quasi-Newton update H_{k+1}, which approximates [∇²f(x)]^{−1}, satisfies the secant equation along all previous directions; namely, it results

H_{k+1} y_j = s_j,    for all j ≤ k.

Observe that the latter appealing property is satisfied by all the updates of the Broyden class, provided that the linesearch adopted is exact (see e.g. [11]). We would like to recover the motivation underlying the latter class of updates and, by using rank-2 updates, to define a preconditioner for PNCG.
On this guideline, in order to build an approximate inverse of the Hessian matrix, we consider the update

(3.1)    H(γ_{k+1}, ω_{k+1}) = H(γ_k, ω_k) + Δ_k,    Δ_k ∈ IR^{n×n}, symmetric,

where the sequence {H(γ_k, ω_k)} depends on the parameters γ_k, ω_k and provides our quasi-Newton updates of [∇²f(x)]^{−1}. It is first our purpose to propose the new update H(γ_{k+1}, ω_{k+1}) such that:

(0) H(γ_{k+1}, ω_{k+1}) is well-defined and nonsingular;
(1) H(γ_{k+1}, ω_{k+1}) can be iteratively updated;
(2) H(γ_{k+1}, ω_{k+1}) collects the information from the iterations 1, 2, . . . , k of a NCG method;
(3) H(γ_{k+1}, ω_{k+1}) satisfies the secant equation at iterations j = 1, 2, . . . , k;
(4) H(γ_{k+1}, ω_{k+1}) either tends to preserve the inertia of the inverse of ∇²f(x_{k+1}), in case f(x) is a general quadratic function, or, by suitably setting the two parameters, it can be used as a preconditioner for PNCG, i.e. M_k = H(γ_k, ω_k).
Observe that the Symmetric Rank-1 (SR1) quasi-Newton update (see Section 6.2 in [11]) satisfies properties (1)-(4) but not property (0), i.e. it might possibly be not well defined for a general nonlinear function. The latter result follows from the fact that the SR1 update provides only a rank-1 quasi-Newton update, unlike BFGS and DFP. On the other hand, while the BFGS and DFP quasi-Newton formulae provide only positive definite updates, the SR1 formula is able to recover the inertia of the Hessian matrix, by generating possibly indefinite updates. Thus, we now want to study an SR2 quasi-Newton update which is provided by information from the NCG method. To this aim, assuming that H_k = H(γ_k, ω_k) is given, we consider the relation (3.1) where we set (see (1.4))

Δ_k = γ_k v_k v_k^T + ω_k (p_k p_k^T) / (y_k^T p_k),    γ_k, ω_k ∈ IR,  v_k ∈ IR^n,

and p_k is generated at the k-th iteration of the (unpreconditioned) NCG method. Thus, we will have the new update

(3.2)    H_{k+1} = H_k + γ_k v_k v_k^T + ω_k (p_k p_k^T) / (y_k^T p_k),    γ_k, ω_k ∈ IR,  v_k ∈ IR^n,
and in order to satisfy the secant equation H_{k+1} y_k = s_k the following equality must hold:

H_k y_k + γ_k (v_k^T y_k) v_k + ω_k (p_k p_k^T) / (y_k^T p_k) y_k = s_k,

that is

(3.3)    γ_k (v_k^T y_k) v_k = s_k − H_k y_k − ω_k p_k.

Therefore it results

(3.4)    v_k = σ_k (s_k − H_k y_k − ω_k p_k)

for some scalar σ_k ∈ IR. By substituting the expression (3.4) of v_k in (3.3) we have

γ_k σ_k² [y_k^T (s_k − H_k y_k − ω_k p_k)] (s_k − H_k y_k − ω_k p_k) = s_k − H_k y_k − ω_k p_k.

Thus, the following relation among the parameters γ_k, σ_k and ω_k must hold:

(3.5)    γ_k σ_k² = 1 / (s_k^T y_k − y_k^T H_k y_k − ω_k p_k^T y_k).

Note that, from the arbitrariness of γ_k, without loss of generality we can set σ_k ∈ {−1, 1}.
Now, in the next proposition we first consider the case of quadratic functions, and prove that the update (3.2) satisfies the secant equation along all previous directions.

Proposition 3.1. Assume that f is the quadratic function f(x) = ½ x^T A x + b^T x, where A ∈ IR^{n×n} is symmetric and b ∈ IR^n. Suppose that k steps of the (unpreconditioned) CG are performed, in order to detect the stationary point (if any) of the function f, and that the vectors p_1, . . . , p_k are generated. Then, the matrix H_{k+1} in (3.2) satisfies the secant equations

(3.6)    H_{k+1} y_j = s_j,    j = 1, . . . , k,

provided that the coefficients γ_j, ω_j, j = 1, . . . , k, are computed such that

(3.7)    γ_j = 1 / (s_j^T y_j − y_j^T H_j y_j − ω_j p_j^T y_j),    j = 1, . . . , k,

         ω_j ≠ (s_j^T y_j − y_j^T H_j y_j) / (p_j^T y_j),    j = 1, . . . , k.
Proof. The proof proceeds by induction. Equations (3.6) hold for k = 1, that is H_2 y_1 = s_1, as long as

s_1 = [H_1 + γ_1 σ_1² (s_1 − H_1 y_1 − ω_1 p_1)(s_1 − H_1 y_1 − ω_1 p_1)^T + ω_1 (p_1 p_1^T) / (y_1^T p_1)] y_1,

or equivalently

s_1 − H_1 y_1 − ω_1 p_1 = γ_1 (s_1^T y_1 − y_1^T H_1 y_1 − ω_1 p_1^T y_1) [s_1 − H_1 y_1 − ω_1 p_1],

which is satisfied selecting γ_1 and ω_1 according with (3.7).

Now, suppose that the relations (3.6) hold for the index k − 1. To complete the induction we need to prove that the relations (3.6) hold for the index k. Firstly, note that H_{k+1} y_k = s_k holds. In fact

s_k = [H_k + γ_k σ_k² (s_k − H_k y_k − ω_k p_k)(s_k − H_k y_k − ω_k p_k)^T + ω_k (p_k p_k^T) / (y_k^T p_k)] y_k

holds if and only if

s_k − H_k y_k − ω_k p_k = γ_k (s_k^T y_k − y_k^T H_k y_k − ω_k p_k^T y_k)(s_k − H_k y_k − ω_k p_k),

and the latter holds from (3.7) with j = k. Now, we have to prove that (3.6) hold for any j < k. For j < k we have

H_{k+1} y_j = H_k y_j + γ_k σ_k² (s_k − H_k y_k − ω_k p_k)(s_k − H_k y_k − ω_k p_k)^T y_j + ω_k (p_k^T y_j) / (y_k^T p_k) p_k,

where H_k y_j = s_j by the inductive hypothesis. Moreover,

(s_k − H_k y_k)^T y_j = s_k^T y_j − y_k^T H_k y_j = s_k^T y_j − y_k^T s_j = s_k^T A s_j − (A s_k)^T s_j = 0,

where the third equality holds since y_j = A s_j, for any j, for the quadratic function f. Finally,

ω_k p_k^T y_j = ω_k p_k^T A s_j = ω_k α_j p_k^T A p_j = 0,

which follows from the conjugacy of the directions {p_1, . . . , p_k} generated by the CG. Thus, (3.6) hold for any j ≤ k and the induction is complete.
andtheinductioniscomplete.As an immediate consequence of the previousproposition, we prove now
thenite terminationproperty for aquadratic function, i.e. after at most
𝑛
steps,
𝐻 𝑛+1
istheinverseoftheHessianofthequadraticfunction.Corollary3.1. Assumethat
𝑓
isthequadraticfunction𝑓 (𝑥) = 1 2 𝑥 𝑇 𝐴𝑥 + 𝑏 𝑇 𝑥
,where
𝐴 ∈ IR 𝑛×𝑛
is symmetric and𝑏 ∈ IR 𝑛
. Suppose that𝑛
steps of the(unpreconditioned)CGareperformed,in orderto detectthestationarypoint
ofthefunction
𝑓
,andthatthevectors𝑝 1 , . . . , 𝑝 𝑛
aregenerated. If (3.7)holds,wehave
𝐻 𝑛+1 = 𝐴 −1
.Proof Byapplying Proposition 3.1,wehavethat(3.6)hold for
𝑘 = 𝑛
,i.e.
𝐻 𝑛+1 𝑦 𝑗 = 𝑠 𝑗 , 𝑗 = 1, . . . , 𝑛.
Since
𝑓
isquadraticthen𝑦 𝑗 = 𝐴𝑠 𝑗
,forany𝑗
,i.e.𝐻 𝑛+1 𝐴𝑠 𝑗 = 𝑠 𝑗 , 𝑗 = 1, . . . , 𝑛.
Now, since
𝑠 𝑗 = 𝛼 𝑗 𝑝 𝑗
,𝑗 = 1, . . . , 𝑛
, theconjugacy ofthevectors{𝑝 1 , . . . , 𝑝 𝑛 }
impliesthat
𝐻 𝑛+1 = 𝐴 −1
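Corollary 3.1 lends itself to a small numerical check. Here we take ω_j = 0 for all j (an admissible choice for (3.7) as long as s_j^T y_j ≠ y_j^T H_j y_j, which holds for the data below, so the SR2 update reduces to a rank-1 correction) and run the CG on a small hypothetical SPD quadratic; after n steps H equals A^{−1}:

```python
import numpy as np

# Corollary 3.1 checked numerically with omega_j = 0 in (3.2), (3.5):
# after n CG steps on a quadratic, H_{n+1} = A^{-1}.
A = np.diag([2.0, 3.0])            # hypothetical SPD Hessian
b = np.ones(2)
n = b.size
x = np.zeros(n)
g = A @ x + b                      # gradient of f(x) = 1/2 x^T A x + b^T x
p = -g
H = np.eye(n)                      # initial approximation H_1
for _ in range(n):
    alpha = -(g @ p) / (p @ A @ p) # exact CG steplength
    s = alpha * p
    y = A @ s                      # y_k = A s_k for a quadratic
    v = s - H @ y                  # (3.4) with omega_k = 0, sigma_k = 1
    H = H + np.outer(v, v) / (v @ y)   # gamma_k from (3.5) with omega_k = 0
    x = x + s
    g_new = A @ x + b
    beta = (g_new @ g_new) / (g @ g)   # FR coefficient (exact CG)
    g = g_new
    p = -g + beta * p
```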
We highlight that, whenever k = n, Corollary 3.1 justifies the first part of statement (4) in the list above. Moreover, later on in the paper we show that for k < n, the update matrix in (3.2) can be suitably modified to provide a [...].

After analyzing the case of f(x) quadratic, we turn now to the general case of a nonlinear twice continuously differentiable function. In particular, since we are interested in using the matrix H_{k+1} in (3.2) as a preconditioner, we need to investigate if there exists a suitable setting of the parameters such that H_{k+1} is positive definite, provided that (3.7) are satisfied. In the next proposition we prove that if the parameter ω_k is below a threshold value, then the matrix H_{k+1} is almost always positive definite.

Proposition 3.2. Let
f be a nonlinear twice continuously differentiable function. Suppose that the (unpreconditioned) NCG method is used to minimize the function f. Suppose that (3.7) is satisfied and

(3.8)    0 ≤ ω_k < (s_k^T y_k − y_k^T H_k y_k) / (p_k^T y_k),

with

(3.9)    y_k^T s_k + y_k^T H_k y_k ≤ 0    or    y_k^T s_k − y_k^T H_k y_k ≥ 0,

where s_j = α_j p_j. Then the matrix H_{k+1} in (3.2) is positive definite.

Proof. By substituting (3.4) in (3.2), and recalling that σ_k² = 1, we obtain

H_{k+1} = γ_k [ (α_k − ω_k)² p_k p_k^T + (α_k − ω_k) ( (H_k y_k) p_k^T + p_k (H_k y_k)^T ) + (H_k y_k)(H_k y_k)^T ] + ω_k (p_k p_k^T) / (y_k^T p_k).
Hence H_{k+1} can be rewritten in the form

H_{k+1} = ( p_k ⋮ H_k y_k ) ⎛ γ_k (α_k − ω_k)² + ω_k / (y_k^T p_k)    γ_k (α_k − ω_k) ⎞ ⎛    p_k^T    ⎞
                            ⎝ γ_k (α_k − ω_k)                          γ_k            ⎠ ⎝ (H_k y_k)^T ⎠ .
Therefore H_{k+1} is positive definite if and only if the following inequalities hold:

(3.10)    γ_k (α_k − ω_k)² + ω_k / (y_k^T p_k) > 0,

          γ_k ( γ_k (α_k − ω_k)² + ω_k / (y_k^T p_k) ) − γ_k² (α_k − ω_k)² > 0.

Using the expression of γ_k in (3.5), and recalling that y_k^T s_k > 0 (as a consequence of the Wolfe conditions), (3.10) are equivalent to

(α_k − ω_k)² y_k^T p_k / [ (α_k − ω_k) p_k^T y_k − y_k^T H_k y_k ] + ω_k > 0,

ω_k / [ (α_k − ω_k) p_k^T y_k − y_k^T H_k y_k ] > 0.

After some computation we obtain that there exist values of the parameter ω_k for which the latter inequalities admit solutions, with only one exception. In fact, they are satisfied for any value of ω_k such that

0 ≤ ω_k < (α_k p_k^T y_k − y_k^T H_k y_k) / (p_k^T y_k),

but they do not admit solutions in case

α_k y_k^T p_k + y_k^T H_k y_k > 0    and    α_k y_k^T p_k − y_k^T H_k y_k < 0,

i.e. when (3.9) does not hold.
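Proposition 3.2 can be illustrated numerically. The data below are hypothetical, chosen so that s_k = α_k p_k holds and conditions (3.8)-(3.9) are satisfied (second branch of (3.9)); the resulting H_{k+1} from (3.2) is then positive definite and satisfies the secant equation:

```python
import numpy as np

# Hypothetical data with s_k = alpha_k p_k, satisfying (3.8) and (3.9).
p = np.array([1.0, 2.0]); alpha = 1.0; s = alpha * p
y = np.array([1.0, 1.0])
H = np.eye(2)                                 # current update H_k
thr = (s @ y - y @ H @ y) / (p @ y)           # threshold in (3.8): here 1/3
assert s @ y - y @ H @ y >= 0                 # second branch of (3.9)
omega = 0.1                                   # any value in [0, thr)
gamma = 1.0 / (s @ y - y @ H @ y - omega * (p @ y))   # (3.5) with sigma_k = 1
v = s - H @ y - omega * p                     # (3.4)
H_next = H + gamma * np.outer(v, v) + omega * np.outer(p, p) / (y @ p)
```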
From Proposition 3.1 and Corollary 3.1, we could use the matrix H_{k+1} as an approximate inverse of ∇²f(x). However, Proposition 3.2 evidences that conditions (3.7) and (3.8) do not suffice to ensure that H_{k+1} is positive definite. In fact, whenever (3.9) does not hold, an additional safeguard is needed, since H_{k+1} is possibly indefinite. Thus, the definition of H_{k+1} should possibly be modified in order to obtain positive definite updates.
4. A preconditioner using a BFGS-like low-rank quasi-Newton update

In this section we partially address the final remark of Section 3. Indeed, we introduce a new class of preconditioners which are still iteratively constructed by using information from the NCG iterations and, as in the case of BFGS updates, are always positive definite. To this purpose, the price we pay with respect to (3.2) is that the secant equation is satisfied only at the current iteration.

We draw our inspiration from [4], where a new preconditioner for Newton-Krylov methods is described. In particular, in [4] the set of directions generated by the Krylov subspace method is used to provide an approximate inverse preconditioner, for the solution of Newton's systems. On this guideline, observe that if f(x) = ½ x^T A x + b^T x, where A is positive definite and b ∈ IR^n, then it is well known (see e.g. [6]) that the CG method may generate n conjugate directions {p_j} such that

(4.1)    A^{−1} = Σ_{j=1}^{n} (p_j p_j^T) / (p_j^T A p_j).
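The identity (4.1) is easy to check numerically, by running the CG on a small hypothetical SPD quadratic and accumulating the dyads along the way:

```python
import numpy as np

# Numerical check of (4.1): n conjugate CG directions reconstruct A^{-1}.
A = np.diag([2.0, 3.0])            # hypothetical SPD matrix
b = np.ones(2)
n = b.size
x = np.zeros(n)
g = A @ x + b
p = -g
Ainv_sum = np.zeros((n, n))
for _ in range(n):
    Ainv_sum += np.outer(p, p) / (p @ A @ p)   # dyad p_j p_j^T / (p_j^T A p_j)
    alpha = -(g @ p) / (p @ A @ p)             # exact CG steplength
    x = x + alpha * p
    g_new = A @ x + b
    beta = (g_new @ g_new) / (g @ g)           # FR coefficient (exact CG)
    g = g_new
    p = -g + beta * p
```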
Now, in order to introduce a class of preconditioners for the NCG, in case of a general twice continuously differentiable function f, suppose we have performed k iterations of the (unpreconditioned) NCG, so that the directions p_1, . . . , p_k are generated. Let us consider the matrix M_{k+1} defined by

(4.2)    M_{k+1} = τ_k C_k + γ_k v_k v_k^T + ω_k Σ_{j=k−m}^{k} (p_j p_j^T) / (p_j^T ∇²f(x_j) p_j),

where 0 ≤ m ≤ k, γ_k, ω_k ≥ 0, τ_k > 0, C_k ∈ IR^{n×n} is symmetric positive definite and v_k ∈ IR^n. In order to use M_{k+1} as a preconditioner and to update its expression iteratively, we set τ_k = 1, C_k = H(τ_k, γ_k, ω_k) (with H(τ_0, γ_0, ω_0) given) and rewrite (4.2) in the form

(4.3)    H(τ_{k+1}, γ_{k+1}, ω_{k+1}) = H(τ_k, γ_k, ω_k) + γ_k v_k v_k^T + ω_k Σ_{j=k−m}^{k} (p_j p_j^T) / (p_j^T ∇²f(x_j) p_j).

H(τ_{k+1}, γ_{k+1}, ω_{k+1}) may be treated as a symmetric quasi-Newton update. However, for simplicity, in the sequel we prefer to use the more general form given by (4.2).
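A preconditioner of the form (4.2) would typically be applied to a vector matrix-free, storing only the pairs (p_j, p_j^T ∇²f(x_j) p_j). The sketch below is a hypothetical illustration: to isolate the dyad sum it zeroes out the τ_k C_k and γ_k v_k v_k^T terms (strictly speaking outside the requirement τ_k > 0 in (4.2)), so that with full memory on a quadratic the operator reproduces A^{−1} g by (4.1):

```python
import numpy as np

def apply_precond(g, C_apply, tau, gamma, v, omega, dyads):
    """Apply M_{k+1} of (4.2) to a vector g without forming the matrix:
    M g = tau*C g + gamma (v^T g) v + omega * sum_j (p_j^T g / curv_j) p_j,
    where dyads stores pairs (p_j, curv_j) with curv_j = p_j^T Hess(x_j) p_j."""
    out = tau * C_apply(g) + gamma * (v @ g) * v
    for p_j, curv_j in dyads:
        out += omega * (p_j @ g) / curv_j * p_j
    return out

# Collect the dyads from a CG run on a hypothetical quadratic (Hessian = A).
A = np.diag([2.0, 3.0]); b = np.ones(2)
x = np.zeros(2); g = A @ x + b; p = -g
dyads = []
for _ in range(2):
    curv = p @ A @ p
    dyads.append((p.copy(), curv))
    alpha = -(g @ p) / curv
    x = x + alpha * p
    g_new = A @ x + b
    p = -g_new + (g_new @ g_new) / (g @ g) * p
    g = g_new

# With the C and v terms zeroed out and full memory, M acts as A^{-1} by (4.1).
Mg = apply_precond(np.array([1.0, 2.0]), lambda z: np.zeros_like(z),
                   tau=1.0, gamma=0.0, v=np.zeros(2), omega=1.0, dyads=dyads)
```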
Observe that in the expression of M_{k+1}, v_k v_k^T represents a rank-1 update and, from (4.1), the dyads p_j p_j^T / (p_j^T ∇²f(x_j) p_j) are aimed at building an approximate inverse. The integer m can be viewed as a limited memory parameter, similarly to the L-BFGS method. Moreover, we can set the vector v_k and the parameters