Preconditioned Nonlinear Conjugate Gradient methods

Giovanni Fasano and Massimo Roma

Contents
1. Introduction
2. Preconditioned Nonlinear Conjugate Gradient algorithm
3. A new Symmetric Rank-2 update
4. A preconditioner using a BFGS-like low-rank quasi-Newton update
5. Preliminary numerical experiences
6. Conclusions and future works
1. Introduction

We deal with the large scale unconstrained optimization problem

(1.1)    min_{x ∈ IR^n} f(x),

where f : IR^n → IR is a twice continuously differentiable function and n is large. We assume that, for a given x_0 ∈ IR^n, the level set Ω_0 = {x ∈ IR^n | f(x) ≤ f(x_0)} is compact. The huge number of real world applications which can be modelled as large scale optimization problems strongly motivates the growing interest in the solution of such problems.
Among the iterative methods for large scale unconstrained optimization, when the Hessian matrix is possibly dense, limited memory quasi-Newton methods are often the methods of choice. As well known (see any textbook, e.g. [11]), they generate a sequence {x_k} according to the following scheme:

(1.2)    x_{k+1} = x_k + α_k p_k,    k = 0, 1, . . . ,    with    p_k = −H_k ∇f(x_k),

where H_k is an approximation of the inverse of the Hessian matrix ∇²f(x_k) and α_k is a steplength. In particular, instead of computing H_k from scratch at each iteration k, these methods update H_k in a simple manner, in order to obtain the new approximation H_{k+1} to be used in the next iteration. Moreover, instead of storing full dense n × n approximations, they only save a few vectors of length n, which allow the approximations to be represented implicitly.

Among the quasi-Newton schemes, the L-BFGS method is usually considered one of the most efficient. It is well suited for large scale problems because the amount of storage is limited and controlled by the user. This method is based on the construction of an approximation of the inverse of the Hessian matrix, exploiting curvature information gained only from the most recent iterations. The inverse of the Hessian matrix is updated at the k-th iteration by

(1.3)    H_{k+1} = V_k^T H_k V_k + ρ_k s_k s_k^T,
where

ρ_k = 1 / (y_k^T s_k),    V_k = I − ρ_k y_k s_k^T,

and

(1.4)    s_k = x_{k+1} − x_k = α_k p_k,    y_k = ∇f(x_{k+1}) − ∇f(x_k).

Observe that H_k also satisfies the relation

H_k = (V_{k−1}^T · · · V_{k−m}^T) H_k^0 (V_{k−m} · · · V_{k−1})
    + ρ_{k−m} (V_{k−1}^T · · · V_{k−m+1}^T) s_{k−m} s_{k−m}^T (V_{k−m+1} · · · V_{k−1})
    + ρ_{k−m+1} (V_{k−1}^T · · · V_{k−m+2}^T) s_{k−m+1} s_{k−m+1}^T (V_{k−m+2} · · · V_{k−1})
    + · · ·
    + ρ_{k−1} s_{k−1} s_{k−1}^T,

where m is the memory of the method and H_k^0 is an initial approximation of the inverse of the Hessian matrix.

The well known reasons for the success of the L-BFGS method can be summarized in the following two points: firstly, even when m is small, H_{k+1} is an effective approximation of the inverse of the Hessian matrix; secondly, H_{k+1} is the unique (positive definite) matrix which solves the problem

min_H ∥H − H_k∥_F    s.t.    H = H^T,    s_k = H y_k,
where ∥ · ∥_F is the Frobenius norm. Namely, H_{k+1} is the positive definite matrix closest to the current approximation H_k which satisfies the secant equation s_k = H y_k. However, the L-BFGS method presents some drawbacks, including possible inefficiency when the eigenvalues of the Hessian matrix are very spread. Moreover, on some applications, the performances of the L-BFGS method and the Nonlinear Conjugate Gradient method are comparable.
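In practice the L-BFGS approximation H_k is never formed explicitly; the product H_k ∇f(x_k) is computed matrix-free by the classical two-loop recursion (see e.g. [11]). A minimal numpy sketch, assuming H_k^0 = I and hypothetical pair lists `s_list`, `y_list` holding the m most recent (s_j, y_j):

```python
import numpy as np

def lbfgs_two_loop(grad, s_list, y_list):
    """Compute H_k @ grad implicitly from the stored (s_j, y_j) pairs,
    using the classical L-BFGS two-loop recursion with H_k^0 = I.
    This is equivalent to building H_k from the recursion based on (1.3)."""
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: traverse the pairs from the most recent to the oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    r = q  # apply the initial approximation H_k^0 = I
    # Second loop: traverse the pairs from the oldest to the most recent.
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r += (a - b) * s
    return r
```

For a single stored pair the result coincides with applying one explicit update (1.3) starting from H_0 = I, which gives a quick correctness check.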
In this paper we focus on the latter method: the Nonlinear Conjugate Gradient method (NCG). As well known (see any textbook, e.g. [11]), it is a natural extension to general functions of the linear Conjugate Gradient (CG) method for quadratic functions. It generates a sequence {x_k} according to scheme (1.2), with

p_k = −∇f(x_k) + β_k p_{k−1},

where β_k is a suitable scalar. Different values of β_k give rise to different algorithms (see [8] for a survey). The most common are the Fletcher and Reeves (FR), the Polak and Ribière (PR) and the Hestenes and Stiefel (HS) algorithms.
Although the NCG methods have been widely studied and are often very efficient when solving large scale problems, a key point for increasing their efficiency is the use of a preconditioning strategy, especially when solving difficult ill-conditioned problems. Defining good preconditioners for NCG methods is currently still considered a challenging research topic. On this guideline, this work is devoted to investigating the use of quasi-Newton updates as preconditioners. In particular, we want to propose preconditioners which possibly inherit the effectiveness of the L-BFGS update. Indeed, here we build preconditioners iteratively defined and based on quasi-Newton updates of the inverse of the Hessian matrix. This represents an attempt to improve the efficiency of the NCG method by conveying information collected from a quasi-Newton method into a Preconditioned Nonlinear Conjugate Gradient method (PNCG). In particular, we study new symmetric low-rank updates of the inverse of the Hessian matrix, in order to iteratively define preconditioners for PNCG.

It is worth noting that there exists a close connection between BFGS and NCG [10]; on the other hand, NCG algorithms can be viewed as memoryless quasi-Newton methods (see e.g. [13], [12], [11]).
The idea of using a quasi-Newton update as a preconditioner within NCG algorithms is not new. In [2], when storage is available, a preconditioner defined by m quasi-Newton updates is used within an NCG algorithm. In [1] a [...] of NCG. Moreover, an automatic preconditioning strategy based on a limited memory quasi-Newton update for the linear CG is proposed in [9], within Hessian-free Newton methods, and is extended to the solution of a sequence of linear systems.
In this paper, we propose two classes of parameter dependent preconditioners. In particular, in the next section we briefly recall a scheme of a general PNCG method. In Section 3 a new symmetric rank-2 update is introduced and its theoretical properties are studied. Section 4 is devoted to describing a new BFGS-like quasi-Newton update. Finally, in Section 5 the results of a preliminary numerical experience are reported, showing a comparison between one of our proposals and an L-BFGS based preconditioner for PNCG.
2. Preconditioned Nonlinear Conjugate Gradient algorithm
In this section we report the scheme of a general Preconditioned Nonlinear Conjugate Gradient (PNCG) algorithm (see e.g. [12]). In the PNCG scheme, M_k denotes the preconditioner at the iteration k.

Preconditioned Nonlinear Conjugate Gradient (PNCG) algorithm

Step 1: Data x_1 ∈ IR^n. Set p_1 = −M_1 ∇f(x_1) and k = 1.

Step 2: Compute the steplength α_k by using a linesearch procedure which guarantees the Wolfe conditions to be satisfied, and set

x_{k+1} = x_k + α_k p_k.
Step 3: If ∥∇f(x_{k+1})∥ = 0 then stop, else compute β_{k+1} and

(2.1)    p_{k+1} = −M_{k+1} ∇f(x_{k+1}) + β_{k+1} p_k,

set k = k + 1 and go to Step 2.

By setting M_k = I for any k, the popular (unpreconditioned) Nonlinear Conjugate Gradient (NCG) method is obtained. The parameter β_{k+1} can be chosen in a variety of ways. For the PNCG algorithm the most recurrent choices are the following:
(2.2)    β_{k+1}^{FR} = ∇f(x_{k+1})^T M_k ∇f(x_{k+1}) / (∇f(x_k)^T ∇f(x_k)),

(2.3)    β_{k+1}^{PR} = [∇f(x_{k+1}) − ∇f(x_k)]^T M_k ∇f(x_{k+1}) / (∇f(x_k)^T M_k ∇f(x_k)),

(2.4)    β_{k+1}^{HS} = [∇f(x_{k+1}) − ∇f(x_k)]^T M_k ∇f(x_{k+1}) / ([∇f(x_{k+1}) − ∇f(x_k)]^T p_k).
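As a quick numeric illustration of (2.2)-(2.4), the following sketch evaluates the three coefficients on hypothetical gradients g0, g1, direction p and preconditioner M = I (all the data are made up for the example):

```python
import numpy as np

def beta_fr(g_new, g_old, M):
    # (2.2): Fletcher-Reeves choice, with the preconditioned numerator.
    return (g_new @ M @ g_new) / (g_old @ g_old)

def beta_pr(g_new, g_old, M):
    # (2.3): Polak-Ribiere choice.
    return ((g_new - g_old) @ M @ g_new) / (g_old @ M @ g_old)

def beta_hs(g_new, g_old, p, M):
    # (2.4): Hestenes-Stiefel choice.
    return ((g_new - g_old) @ M @ g_new) / ((g_new - g_old) @ p)

M = np.eye(2)               # unpreconditioned case M_k = I
g0 = np.array([1.0, 0.0])   # hypothetical gradient at x_k
g1 = np.array([0.0, 2.0])   # hypothetical gradient at x_{k+1}
p = np.array([-1.0, 0.5])   # hypothetical direction p_k
```

On these vectors the three formulas give β^{FR} = 4, β^{PR} = 4 and β^{HS} = 2, which can be checked by hand.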
We recall that, with respect to other gradient methods, a more accurate linesearch procedure is required to determine the steplength α_k in a PNCG algorithm. This is due to the presence of the term β_{k+1} p_k in (2.1). The latter fact motivates the use of the (strong) Wolfe conditions to compute the steplength α_k, which also guarantee that s_k^T y_k > 0 for any k.
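The PNCG scheme above can be sketched in code. In this demo the objective is a hypothetical convex quadratic, so the exact steplength −g^T p / (p^T A p) is available in closed form and stands in for the Wolfe linesearch required in general; moreover M_k is held fixed at the identity, so the sketch reduces to the unpreconditioned NCG (and indeed to the linear CG):

```python
import numpy as np

def pncg(grad, hess_vec, x, M=None, tol=1e-10, max_iter=50):
    """Sketch of the PNCG scheme (Steps 1-3) for a convex quadratic, with the
    exact steplength alpha_k = -g^T p / (p^T A p) replacing a Wolfe linesearch
    and the FR choice (2.2) for beta_{k+1}."""
    n = x.size
    if M is None:
        M = np.eye(n)                  # M_k = I recovers the unpreconditioned NCG
    g = grad(x)
    p = -M @ g                         # Step 1
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:   # Step 3 stopping test
            break
        Ap = hess_vec(p)
        alpha = -(g @ p) / (p @ Ap)    # Step 2 (exact for a quadratic)
        x = x + alpha * p
        g_new = grad(x)
        beta = (g_new @ M @ g_new) / (g @ g)   # (2.2)
        p = -M @ g_new + beta * p              # (2.1)
        g = g_new
    return x, g

# Hypothetical test problem: f(x) = 1/2 x^T A x + b^T x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_star, g_star = pncg(lambda x: A @ x + b, lambda p: A @ p, np.zeros(2))
```

On this 2-dimensional quadratic the iteration terminates in at most two steps, at the stationary point of f.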
As already said, preconditioning is applied to increase the efficiency of the NCG method. In this regard, we remark a noticeable difference between linear CG and NCG. Whenever the linear CG is applied, the Hessian matrix does not change during the iterations of the algorithm. On the contrary, when NCG is applied to a nonlinear function, the Hessian matrix (possibly indefinite) changes at each iteration.
3. A new Symmetric Rank-2 update
In this section we study a new quasi-Newton updating formula, by considering the properties of a parameter dependent symmetric rank-2 (SR2) update of the inverse of the Hessian matrix. Suppose that after k iterations we have generated the sequence of iterates {x_1, . . . , x_{k+1}}. Then our quasi-Newton update H_{k+1}, which approximates [∇²f(x)]^{−1}, satisfies the secant equation along all previous directions; namely, it results

H_{k+1} y_j = s_j,    for all j ≤ k.

Observe that the latter appealing property is satisfied by all the updates of the Broyden class, provided that the linesearch adopted is exact (see e.g. [11]). We would like to recover the motivation underlying the latter class of updates and, by using rank-2 updates, to define a preconditioner for PNCG.
On this guideline, in order to build an approximate inverse of the Hessian matrix, we consider the update

(3.1)    H(γ_{k+1}, ω_{k+1}) = H(γ_k, ω_k) + Δ_k,    Δ_k ∈ IR^{n×n}, symmetric,

where the sequence {H(γ_k, ω_k)} depends on the parameters γ_k, ω_k and provides our quasi-Newton updates of [∇²f(x)]^{−1}. It is first our purpose to propose the new update H(γ_{k+1}, ω_{k+1}) such that:

(0) H(γ_{k+1}, ω_{k+1}) is well-defined and nonsingular;
(1) H(γ_{k+1}, ω_{k+1}) can be iteratively updated;
(2) H(γ_{k+1}, ω_{k+1}) collects the information from the iterations 1, 2, . . . , k of a NCG method;
(3) H(γ_{k+1}, ω_{k+1}) satisfies the secant equation at iterations j = 1, 2, . . . , k;
(4) H(γ_{k+1}, ω_{k+1}) either tends to preserve the inertia of the inverse of ∇²f(x_{k+1}), in case f(x) is a general quadratic function, or, by suitably setting the two parameters, it can be used as a preconditioner for PNCG, i.e. M_k = H(γ_k, ω_k).
Observe that the Symmetric Rank-1 (SR1) quasi-Newton update (see Section 6.2 in [11]) satisfies properties (1)-(4) but not property (0), i.e. it might possibly be not well defined for a general nonlinear function. The latter result follows from the fact that the SR1 update provides only a rank-1 quasi-Newton update, unlike BFGS and DFP. On the other hand, while the BFGS and DFP quasi-Newton formulae provide only positive definite updates, the SR1 formula is able to recover the inertia of the Hessian matrix, by generating possibly indefinite updates. Thus, we now want to study an SR2 quasi-Newton update which is provided by information from the NCG method. To this aim, assuming that H_k = H(γ_k, ω_k) is given, we consider the relation (3.1) where we set (see (1.4))

Δ_k = γ_k v_k v_k^T + ω_k (p_k p_k^T) / (y_k^T p_k),    γ_k, ω_k ∈ IR,  v_k ∈ IR^n,

and p_k is generated at the k-th iteration of the (unpreconditioned) NCG method. Thus, we will have the new update

(3.2)    H_{k+1} = H_k + γ_k v_k v_k^T + ω_k (p_k p_k^T) / (y_k^T p_k),    γ_k, ω_k ∈ IR,  v_k ∈ IR^n,
and in order to satisfy the secant equation H_{k+1} y_k = s_k the following equality must hold:

H_k y_k + γ_k (v_k^T y_k) v_k + ω_k (p_k p_k^T) / (y_k^T p_k) y_k = s_k,

that is

(3.3)    γ_k (v_k^T y_k) v_k = s_k − H_k y_k − ω_k p_k.

Therefore it results

(3.4)    v_k = σ_k (s_k − H_k y_k − ω_k p_k)

for some scalar σ_k ∈ IR. By substituting the expression (3.4) of v_k in (3.3) we have

γ_k σ_k² [y_k^T (s_k − H_k y_k − ω_k p_k)] (s_k − H_k y_k − ω_k p_k) = s_k − H_k y_k − ω_k p_k.

Thus, the following relation among the parameters γ_k, σ_k and ω_k must hold:

(3.5)    γ_k σ_k² = 1 / (s_k^T y_k − y_k^T H_k y_k − ω_k p_k^T y_k).

Note that, from the arbitrariness of γ_k, without loss of generality we can set σ_k ∈ {−1, 1}.
Now, in the next proposition we first consider the case of quadratic functions, and prove that the update (3.2) satisfies the secant equation along all previous directions.

Proposition 3.1. Assume that f is the quadratic function f(x) = ½ x^T A x + b^T x, where A ∈ IR^{n×n} is symmetric and b ∈ IR^n. Suppose that k steps of the (unpreconditioned) CG are performed, in order to detect the stationary point (if any) of the function f, and that the vectors p_1, . . . , p_k are generated. Then, the matrix H_{k+1} in (3.2) satisfies the secant equations

(3.6)    H_{k+1} y_j = s_j,    j = 1, . . . , k,

provided that the coefficients γ_j, ω_j, j = 1, . . . , k, are computed such that

(3.7)    γ_j = 1 / (s_j^T y_j − y_j^T H_j y_j − ω_j p_j^T y_j),    j = 1, . . . , k,

         ω_j ≠ (s_j^T y_j − y_j^T H_j y_j) / (p_j^T y_j),    j = 1, . . . , k.
Proof. The proof proceeds by induction. Equations (3.6) hold for k = 1, that is H_2 y_1 = s_1, as long as

s_1 = [H_1 + γ_1 σ_1² (s_1 − H_1 y_1 − ω_1 p_1)(s_1 − H_1 y_1 − ω_1 p_1)^T + ω_1 (p_1 p_1^T) / (y_1^T p_1)] y_1,

or equivalently

s_1 − H_1 y_1 − ω_1 p_1 = γ_1 (s_1^T y_1 − y_1^T H_1 y_1 − ω_1 p_1^T y_1) [s_1 − H_1 y_1 − ω_1 p_1],

which is satisfied selecting γ_1 and ω_1 according with (3.7).

Now, suppose that the relations (3.6) hold for the index k − 1. To complete the induction we need to prove that the relations (3.6) hold for the index k. Firstly, note that H_{k+1} y_k = s_k holds. In fact

s_k = [H_k + γ_k σ_k² (s_k − H_k y_k − ω_k p_k)(s_k − H_k y_k − ω_k p_k)^T + ω_k (p_k p_k^T) / (y_k^T p_k)] y_k

holds if and only if

s_k − H_k y_k − ω_k p_k = γ_k (s_k^T y_k − y_k^T H_k y_k − ω_k p_k^T y_k)(s_k − H_k y_k − ω_k p_k),

and the latter holds from (3.7) with j = k. Now, we have to prove that (3.6) hold for any j < k. For j < k we have

H_{k+1} y_j = H_k y_j + γ_k σ_k² (s_k − H_k y_k − ω_k p_k)(s_k − H_k y_k − ω_k p_k)^T y_j + ω_k (p_k^T y_j) / (y_k^T p_k) p_k,

where H_k y_j = s_j by the inductive hypothesis. Moreover,

(s_k − H_k y_k)^T y_j = s_k^T y_j − y_k^T H_k y_j = s_k^T y_j − y_k^T s_j = s_k^T A s_j − (A s_k)^T s_j = 0,

where the third equality holds since y_j = A s_j, for any j, for the quadratic function f. Finally,

ω_k p_k^T y_j = ω_k p_k^T A s_j = ω_k α_j p_k^T A p_j = 0,

which follows from the conjugacy of the directions {p_1, . . . , p_k} generated by the CG. Thus, (3.6) hold for any j ≤ k and the induction is complete.
andtheinductioniscomplete.As an immediate consequence of the previousproposition, we prove now
thenite terminationproperty for aquadratic function, i.e. after at most
𝑛
steps,
𝐻 𝑛+1
istheinverseoftheHessianofthequadraticfunction.Corollary3.1. Assumethat
𝑓
isthequadraticfunction𝑓 (𝑥) = 1 2 𝑥 𝑇 𝐴𝑥 + 𝑏 𝑇 𝑥
,where
𝐴 ∈ IR 𝑛×𝑛
is symmetric and𝑏 ∈ IR 𝑛
. Suppose that𝑛
steps of the(unpreconditioned)CGareperformed,in orderto detectthestationarypoint
ofthefunction
𝑓
,andthatthevectors𝑝 1 , . . . , 𝑝 𝑛
aregenerated. If (3.7)holds,wehave
𝐻 𝑛+1 = 𝐴 −1
.Proof Byapplying Proposition 3.1,wehavethat(3.6)hold for
𝑘 = 𝑛
,i.e.
𝐻 𝑛+1 𝑦 𝑗 = 𝑠 𝑗 , 𝑗 = 1, . . . , 𝑛.
Since
𝑓
isquadraticthen𝑦 𝑗 = 𝐴𝑠 𝑗
,forany𝑗
,i.e.𝐻 𝑛+1 𝐴𝑠 𝑗 = 𝑠 𝑗 , 𝑗 = 1, . . . , 𝑛.
Now, since
𝑠 𝑗 = 𝛼 𝑗 𝑝 𝑗
,𝑗 = 1, . . . , 𝑛
, theconjugacy ofthevectors{𝑝 1 , . . . , 𝑝 𝑛 }
impliesthat
𝐻 𝑛+1 = 𝐴 −1
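Corollary 3.1 lends itself to a small numerical check. Here we take ω_j = 0 for all j (an admissible choice for (3.7) as long as s_j^T y_j ≠ y_j^T H_j y_j, which holds for the data below, so the SR2 update reduces to a rank-1 correction) and run the CG on a small hypothetical SPD quadratic; after n steps H equals A^{−1}:

```python
import numpy as np

# Corollary 3.1 checked numerically with omega_j = 0 in (3.2), (3.5):
# after n CG steps on a quadratic, H_{n+1} = A^{-1}.
A = np.diag([2.0, 3.0])            # hypothetical SPD Hessian
b = np.ones(2)
n = b.size
x = np.zeros(n)
g = A @ x + b                      # gradient of f(x) = 1/2 x^T A x + b^T x
p = -g
H = np.eye(n)                      # initial approximation H_1
for _ in range(n):
    alpha = -(g @ p) / (p @ A @ p) # exact CG steplength
    s = alpha * p
    y = A @ s                      # y_k = A s_k for a quadratic
    v = s - H @ y                  # (3.4) with omega_k = 0, sigma_k = 1
    H = H + np.outer(v, v) / (v @ y)   # gamma_k from (3.5) with omega_k = 0
    x = x + s
    g_new = A @ x + b
    beta = (g_new @ g_new) / (g @ g)   # FR coefficient (exact CG)
    g = g_new
    p = -g + beta * p
```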
We highlight that, whenever k = n, Corollary 3.1 justifies the first part of statement (4) in the list above. Moreover, later on in the paper we show that for k < n, the update matrix in (3.2) can be suitably modified to provide a [...].

After analyzing the case of f(x) quadratic, we turn now to the general case of a nonlinear twice continuously differentiable function. In particular, since we are interested in using the matrix H_{k+1} in (3.2) as a preconditioner, we need to investigate if there exists a suitable setting of the parameters such that H_{k+1} is positive definite, provided that (3.7) are satisfied. In the next proposition we prove that if the parameter ω_k is below a threshold value, then the matrix H_{k+1} is almost always positive definite.

Proposition 3.2. Let
f be a nonlinear twice continuously differentiable function. Suppose that the (unpreconditioned) NCG method is used to minimize the function f. Suppose that (3.7) is satisfied and

(3.8)    0 ≤ ω_k < (s_k^T y_k − y_k^T H_k y_k) / (p_k^T y_k),

with

(3.9)    y_k^T s_k + y_k^T H_k y_k ≤ 0    or    y_k^T s_k − y_k^T H_k y_k ≥ 0,

where s_j = α_j p_j. Then the matrix H_{k+1} in (3.2) is positive definite.

Proof. By substituting (3.4) in (3.2), and recalling that σ_k² = 1, we obtain

H_{k+1} = γ_k [ (α_k − ω_k)² p_k p_k^T + (α_k − ω_k) ( (H_k y_k) p_k^T + p_k (H_k y_k)^T ) + (H_k y_k)(H_k y_k)^T ] + ω_k (p_k p_k^T) / (y_k^T p_k).
Hence H_{k+1} can be rewritten in the form

H_{k+1} = ( p_k ⋮ H_k y_k ) ⎛ γ_k (α_k − ω_k)² + ω_k / (y_k^T p_k)    γ_k (α_k − ω_k) ⎞ ⎛    p_k^T    ⎞
                            ⎝ γ_k (α_k − ω_k)                          γ_k            ⎠ ⎝ (H_k y_k)^T ⎠ .
Therefore H_{k+1} is positive definite if and only if the following inequalities hold:

(3.10)    γ_k (α_k − ω_k)² + ω_k / (y_k^T p_k) > 0,

          γ_k ( γ_k (α_k − ω_k)² + ω_k / (y_k^T p_k) ) − γ_k² (α_k − ω_k)² > 0.

Using the expression of γ_k in (3.5), and recalling that y_k^T s_k > 0 (as a consequence of the Wolfe conditions), (3.10) are equivalent to

(α_k − ω_k)² y_k^T p_k / [ (α_k − ω_k) p_k^T y_k − y_k^T H_k y_k ] + ω_k > 0,

ω_k / [ (α_k − ω_k) p_k^T y_k − y_k^T H_k y_k ] > 0.

After some computation we obtain that there exist values of the parameter ω_k for which the latter inequalities admit solutions, with only one exception. In fact, they are satisfied for any value of ω_k such that

0 ≤ ω_k < (α_k p_k^T y_k − y_k^T H_k y_k) / (p_k^T y_k),

but they do not admit solutions in case

α_k y_k^T p_k + y_k^T H_k y_k > 0    and    α_k y_k^T p_k − y_k^T H_k y_k < 0,

i.e. when (3.9) does not hold.
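Proposition 3.2 can be illustrated numerically. The data below are hypothetical, chosen so that s_k = α_k p_k holds and conditions (3.8)-(3.9) are satisfied (second branch of (3.9)); the resulting H_{k+1} from (3.2) is then positive definite and satisfies the secant equation:

```python
import numpy as np

# Hypothetical data with s_k = alpha_k p_k, satisfying (3.8) and (3.9).
p = np.array([1.0, 2.0]); alpha = 1.0; s = alpha * p
y = np.array([1.0, 1.0])
H = np.eye(2)                                 # current update H_k
thr = (s @ y - y @ H @ y) / (p @ y)           # threshold in (3.8): here 1/3
assert s @ y - y @ H @ y >= 0                 # second branch of (3.9)
omega = 0.1                                   # any value in [0, thr)
gamma = 1.0 / (s @ y - y @ H @ y - omega * (p @ y))   # (3.5) with sigma_k = 1
v = s - H @ y - omega * p                     # (3.4)
H_next = H + gamma * np.outer(v, v) + omega * np.outer(p, p) / (y @ p)
```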
From Proposition 3.1 and Corollary 3.1, we could use the matrix H_{k+1} as an approximate inverse of ∇²f(x). However, Proposition 3.2 evidences that conditions (3.7) and (3.8) do not suffice to ensure that H_{k+1} is positive definite. In fact, whenever (3.9) does not hold, an additional safeguard is needed, since H_{k+1} is possibly indefinite. Thus, the definition of H_{k+1} should possibly be modified in order to obtain positive definite updates.
4. A preconditioner using a BFGS-like low-rank quasi-Newton update

In this section we partially address the final remark of Section 3. Indeed, we introduce a new class of preconditioners which are still iteratively constructed by using information from the NCG iterations and, as in the case of BFGS updates, are always positive definite. To this purpose, the price we pay with respect to (3.2) is that the secant equation is satisfied only at the current iteration.

We draw our inspiration from [4], where a new preconditioner for Newton-Krylov methods is described. In particular, in [4] the set of directions generated by the Krylov subspace method is used to provide an approximate inverse preconditioner, for the solution of Newton's systems. On this guideline, observe that if f(x) = ½ x^T A x + b^T x, where A is positive definite and b ∈ IR^n, then it is well known (see e.g. [6]) that the CG method may generate n conjugate directions {p_j} such that

(4.1)    A^{−1} = Σ_{j=1}^{n} (p_j p_j^T) / (p_j^T A p_j).
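The identity (4.1) is easy to check numerically, by running the CG on a small hypothetical SPD quadratic and accumulating the dyads along the way:

```python
import numpy as np

# Numerical check of (4.1): n conjugate CG directions reconstruct A^{-1}.
A = np.diag([2.0, 3.0])            # hypothetical SPD matrix
b = np.ones(2)
n = b.size
x = np.zeros(n)
g = A @ x + b
p = -g
Ainv_sum = np.zeros((n, n))
for _ in range(n):
    Ainv_sum += np.outer(p, p) / (p @ A @ p)   # dyad p_j p_j^T / (p_j^T A p_j)
    alpha = -(g @ p) / (p @ A @ p)             # exact CG steplength
    x = x + alpha * p
    g_new = A @ x + b
    beta = (g_new @ g_new) / (g @ g)           # FR coefficient (exact CG)
    g = g_new
    p = -g + beta * p
```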
Now, in order to introduce a class of preconditioners for the NCG, in case of a general twice continuously differentiable function f, suppose we have performed k iterations of the (unpreconditioned) NCG, so that the directions p_1, . . . , p_k are generated. Let us consider the matrix M_{k+1} defined by

(4.2)    M_{k+1} = τ_k C_k + γ_k v_k v_k^T + ω_k Σ_{j=k−m}^{k} (p_j p_j^T) / (p_j^T ∇²f(x_j) p_j),

where 0 ≤ m ≤ k, γ_k, ω_k ≥ 0, τ_k > 0, C_k ∈ IR^{n×n} is symmetric positive definite and v_k ∈ IR^n. In order to use M_{k+1} as a preconditioner and to update its expression iteratively, we set τ_k = 1, C_k = H(τ_k, γ_k, ω_k) (with H(τ_0, γ_0, ω_0) given) and rewrite (4.2) in the form

(4.3)    H(τ_{k+1}, γ_{k+1}, ω_{k+1}) = H(τ_k, γ_k, ω_k) + γ_k v_k v_k^T + ω_k Σ_{j=k−m}^{k} (p_j p_j^T) / (p_j^T ∇²f(x_j) p_j).

H(τ_{k+1}, γ_{k+1}, ω_{k+1}) may be treated as a symmetric quasi-Newton update. However, for simplicity, in the sequel we prefer to use the more general form given by (4.2).
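A preconditioner of the form (4.2) would typically be applied to a vector matrix-free, storing only the pairs (p_j, p_j^T ∇²f(x_j) p_j). The sketch below is a hypothetical illustration: to isolate the dyad sum it zeroes out the τ_k C_k and γ_k v_k v_k^T terms (strictly speaking outside the requirement τ_k > 0 in (4.2)), so that with full memory on a quadratic the operator reproduces A^{−1} g by (4.1):

```python
import numpy as np

def apply_precond(g, C_apply, tau, gamma, v, omega, dyads):
    """Apply M_{k+1} of (4.2) to a vector g without forming the matrix:
    M g = tau*C g + gamma (v^T g) v + omega * sum_j (p_j^T g / curv_j) p_j,
    where dyads stores pairs (p_j, curv_j) with curv_j = p_j^T Hess(x_j) p_j."""
    out = tau * C_apply(g) + gamma * (v @ g) * v
    for p_j, curv_j in dyads:
        out += omega * (p_j @ g) / curv_j * p_j
    return out

# Collect the dyads from a CG run on a hypothetical quadratic (Hessian = A).
A = np.diag([2.0, 3.0]); b = np.ones(2)
x = np.zeros(2); g = A @ x + b; p = -g
dyads = []
for _ in range(2):
    curv = p @ A @ p
    dyads.append((p.copy(), curv))
    alpha = -(g @ p) / curv
    x = x + alpha * p
    g_new = A @ x + b
    p = -g_new + (g_new @ g_new) / (g @ g) * p
    g = g_new

# With the C and v terms zeroed out and full memory, M acts as A^{-1} by (4.1).
Mg = apply_precond(np.array([1.0, 2.0]), lambda z: np.zeros_like(z),
                   tau=1.0, gamma=0.0, v=np.zeros(2), omega=1.0, dyads=dyads)
```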
Observe that in the expression of M_{k+1}, v_k v_k^T represents a rank-1 update and, from (4.1), the dyads p_j p_j^T / (p_j^T ∇²f(x_j) p_j) are aimed at building an approximate inverse. The integer m can be viewed as a limited memory parameter, similarly to the L-BFGS method. Moreover, we can set the vector v_k and the parameters