
UNIVERSITY OF PISA - SANT’ANNA SCHOOL OF ADVANCED STUDIES

MASTER THESIS

Numerical Methods for Economics:
theories and applications

Author:
Federico NUTARELLI

Supervisor:
Prof. Mauro SODINI

A thesis submitted in fulfillment of the requirements for the degree of Economics
in the Department of Economics and Management


Declaration of Authorship

I, Federico NUTARELLI, declare that this thesis titled, “Numerical Methods for Economics: theories and applications” and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at this University.

• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.

• Where I have consulted the published work of others, this is always clearly attributed.

• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

• I have acknowledged all main sources of help.

• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed: Date:


“Thanks to my solid academic training, today I can write hundreds of words on virtually any topic without possessing a shred of information, which is how I got a good job in journalism.” Dave Barry


University of Pisa - Sant’Anna School of Advanced Studies

Abstract

Faculty Name

Department of Economics and Management

Economics

Numerical Methods for Economics: theories and applications

by Federico NUTARELLI

The Thesis Abstract is written here (and usually kept to just this page). The page is kept centered vertically so it can expand into the blank space above the title too. . .


Acknowledgements

The acknowledgments and the people to thank go here, don’t forget to include your project advisor. . .


Contents

Declaration of Authorship iii

Abstract vii

Acknowledgements ix

1 Theoretical Framework 1

1.1 Introduction to Differential Equations . . . 1

1.2 Another Approach for First Order Differential Equations . . . 5

1.3 Second order Differential Equations . . . 7

1.4 Stability. . . 9

2 Numerical methods 13
2.1 Basic Notions of Numerical Analysis . . . 13

2.2 Finding the zeros of scalar equations . . . 18

2.3 Solving systems of linear equations . . . 24

2.4 Numerical Integration . . . 38

2.5 Function approximation . . . 71

3 ODE Solvers 75
3.1 Taylor’s, Euler’s and Trapezoid methods . . . 75

3.2 Higher order Runge-Kutta methods . . . 78

4 Applications: Ramsey model, Uzawa-Lucas 97
4.1 Ramsey model . . . 97

4.2 Uzawa-Lucas model . . . 105

4.3 Time-Elimination: Ramsey model and Uzawa-Lucas. . . 108

4.3.1 General introduction . . . 108

4.3.2 Application to Ramsey-Cass-Koopmans model. . . 110

4.3.3 Other algorithms applied to Ramsey-Cass-Koopmans . . . 113

4.3.4 Time-Elimination applied to Uzawa-Lucas model . . . 122

A Matrix norms 133
A.1 Properties and definitions . . . 133

B Maximum Principle general model 135
B.1 Hamiltonian and first order conditions . . . 135


List of Figures

2.1 Algorithmic error for x₃(x₁ + x₂) . . . 16

2.2 bisection to discontinuous function . . . 20

2.3 . . . 40

2.4 Lagrange polynomials approximation . . . 73

3.1 . . . 80
3.2 . . . 84
3.3 . . . 85
3.4 . . . 86
3.5 . . . 90
4.1 . . . 103
4.2 . . . 103
4.3 . . . 104

4.4 Policy function Ramsey-Cass-Koopmans . . . 113

4.5 Ramsey model with forward shooting . . . 117

4.6 Criterion convergence . . . 117

4.7 Backward shooting Judd case . . . 122

4.8 Backward shooting Ramsey model . . . 122

4.9 . . . 131

4.10 . . . 131


List of Abbreviations

LAH List Abbreviations Here


Physical Constants


List of Symbols

a distance m

P power W (J s⁻¹)


Chapter 1

Theoretical Framework

1.1 Introduction to Differential Equations

To begin, a general definition of what is called a "differential equation" is needed.

In particular, following Kirshan, "any relation involving a dependent variable, independent variable (or variables), and the differential coefficient (or coefficients) of the dependent variable with respect to the independent variable (variables), is known as differential equation".

In other words, a differential equation is a mathematical equation that relates a function with its derivatives.

An example could simply be

∂y/∂x = cot x

Other standard definitions are needed:

Order of a differential equation: the order of a differential equation refers to the order of the highest derivative involved. For example

∂²y/∂x² = sin x

is of order 2.

Degree of the differential equation: not to be confused with the order, the degree refers to the highest power to which the highest-order derivative is raised. For example, changing our notation and defining ∂y/∂x as y′ (y″ will then be the differential coefficient of order 2, y‴ of order 3, and so on; from now on the two notations are used interchangeably), it can easily be seen that y″ + y′ = y has order 2 and degree 1, while (y′)² = y has order 1 and degree 2.

Solution of a differential equation: a solution of a differential equation on an interval α < t < β, where t is the time variable which often takes the place of x, is any function y(t) which satisfies the differential equation in question on that interval. Nevertheless, it is not always possible to find an explicit form for the solution; it is thus necessary to resort to numerical algorithms, as explained later in the discussion.

However, a distinction must be made between the "general solution" and what is called the "particular solution".

By particular solution it is usually meant a specific solution which, for instance, combined with the homogeneous solution in linear differential equations, gives the general solution.


The homogeneous solution is simply the solution of

y^{(k)}(t) + Σ_{j=0}^{k−1} a_j(t) y^{(j)}(t) = g(t)

when g(t) = 0.

Two other main definitions are required in this brief theoretical introduction.

Initial condition: initial conditions are given values of the solution and/or its derivatives at specific points. They are of the form y(t₀) = y₀ and/or y^{(k)}(t₀) = y_k. Initial conditions are fundamental to solve an initial value problem.

Initial value Problem: An Initial Value Problem (or IVP) is a differential equation along with an appropriate number of initial conditions.

Yet, those basic concepts alone are not sufficient for what follows. Some distinctions are, in fact, important.

First of all, we point out the difference between the "implicit form" and the "explicit form". The latter indicates a way of expressing the differential equation in which the highest order derivative can be written explicitly as a function of the other elements. As a generic example:

y^{(n)}(x) = f(x, y(x), y′(x), . . . , y^{(n−1)}(x))

The implicit form, instead, occurs when the differential equation can only be expressed as

F(x, y(x), y′(x), . . . , y^{(n)}(x)) = 0

that is, y^{(n)} cannot be written as a function of the other elements of the equation.

The last relevant distinction is between linear differential equations and non-linear differential equations.

In its most general meaning, the first class includes all the differential equations whose solutions can be added together in particular linear combinations to form further solutions. Linear differential equations can be ordinary (ODEs) or partial (PDEs). The term "linear" refers to linearity in the unknown function and its derivatives, so that, for example,

y′(x) = y(x) + x²

is linear. For those types of relations, an explicit solution can always be found. The other category, instead, involves non-linear interactions among the derivatives and the other terms. Here it is not always possible to find an explicit solution. As an example:

y″(x) = y(x) y′(x)

First order linear differential equations


• General form (a, b constants): y′(x) = ay(x) + b

• Homogeneous case (b = 0): notice that y(x) = 0 solves the equation; moreover, if z(x) is a solution, then αz(x) is also a solution, and finally, if z(x) and l(x) are solutions, z(x) + l(x) will also be a solution. Recalling the definition of vector subspace, W is a vector subspace of the vector space V over the field K if and only if 0 ∈ W, ∀x, y ∈ W, x + y ∈ W and ∀α ∈ K, ∀x ∈ W, αx ∈ W.

As verified above, the three conditions are all fulfilled, so that the space of solutions of the homogeneous case is a vector subspace. It can be shown that the dimension (dim V = number of vectors of a basis of V) of this subspace is 1, so that it is sufficient to find a single generator to describe it.

We take as generator y(x) = e^{ax}, which solves the equation in the homogeneous case (this is simple to show for degree one: the homogeneous equation y′(x) = ay(x) is the exponential growth, or decay for a < 0, equation, whose solutions are multiples of e^{ax}).

• Non-homogeneous case (b ≠ 0): note that here the space of solutions does not constitute a vector subspace; in fact, 0 ∉ space of solutions. However, a Lemma can be introduced.

Lemma 1: if f₀(t), f(t) are two solutions of the differential equation, then the difference y = f(t) − f₀(t) is a solution of the associated homogeneous equation y′(x) = ay(x). This Lemma implies that the solutions of the differential equation are given by: f(t) = f₀(t) + (all solutions of the homogeneous equation).

Indeed, "all solutions of the homogeneous case" have already been found above. To prove that y(x) = e^{ax} represents them, it is necessary to recall that every vector space has a basis f₁(t), f₂(t), . . . , f_d(t). So, every solution of the homogeneous equation is a linear combination of the basis elements: a₁f₁ + a₂f₂ + · · · + a_d f_d. This means that the solutions of the original differential equation are given by f(t) = f₀(t) + a₁f₁(t) + a₂f₂(t) + · · · + a_d f_d(t), where a₁, . . . , a_d are arbitrary scalars. Thus, we need d linearly independent solutions of the homogeneous differential equation, where d is the order of the differential equation. In this specific case the order is one, meaning that the basis can be fully described by a single element, which is y(x) = e^{ax} (the proof that the span of e^{ax} is equal to the space of solutions of the homogeneous case, and that the set {e^{ax}} is composed of linearly independent vectors, is left to the reader). Every possible solution of the homogeneous equation is then a multiple of that element, for instance y(x) = y₀e^{ax}.

The strategy to be implemented is thus: find a solution of the homogeneous case, which has already been found and from now on will be denoted as y(x) = y₀e^{ax}; find a particular solution of the non-homogeneous case; sum them to obtain the general solution.

The simplest candidate for the non-homogeneous case is the constant solution ȳ. Substituting into y′(x) = ay(x) + b, one recovers ȳ = −b/a if a ≠ 0.

Note that the case a = 0 is easily solved by integrating both sides: ∫ y′(x) dx = ∫ b dx.

Thus, the general solution of a first order linear differential equation with a ≠ 0 is given by ke^{ax} − b/a, where k is a constant (playing the same role as y₀ above). To recover k, it is necessary to set an "initial value problem" of the form

y′(x) = ay(x) + b
y(t₀) = y₀
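Although the thesis does not include code at this point, the closed form just derived is easy to check against MATLAB's numerical solver. The following minimal sketch (with arbitrary illustrative values of a, b and y₀, assuming only the built-in ode45) compares the two:

a = -2; b = 3; y0 = 1;                 % arbitrary illustrative constants
f = @(x, y) a*y + b;                   % right-hand side of y' = a*y + b
[xs, ynum] = ode45(f, [0 5], y0);      % numerical solution on [0, 5]
yexact = (y0 + b/a)*exp(a*xs) - b/a;   % closed form k*e^{ax} - b/a with k = y0 + b/a
max(abs(ynum - yexact))                % of the order of the solver tolerance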


Non autonomous first order linear differential equations

• General form: y′(x) = a(x)y(x) + b(x)

• Homogeneous case: as in the previous case, the space of solutions of the homogeneous equation is a vector space. Moreover, the general solution will be a combination of the homogeneous solution and a particular solution.

Starting from the homogeneous formula y′(x) = a(x)y(x), it follows that y′(x) − a(x)y(x) = 0. Thus, calling A(x) = ∫ a(x) dx and multiplying both sides by e^{−A(x)}, one obtains the relation

e^{−A(x)}y′(x) − e^{−A(x)}a(x)y(x) = 0

Noting, then, that the left hand side corresponds to D(e^{−A(x)}y(x)), where D(·) is the derivative of the argument, one can integrate both sides, obtaining

∫ D(e^{−A(x)}y(x)) dx = ∫ 0 dx

At the end of the process, after some simplifications, a solution of the homogeneous case is easily recovered, that is

y(x) = ke^{A(x)}

• Non-homogeneous case (b(x) ≠ 0): the starting point is now the equation y′(x) = a(x)y(x) + b(x). By adopting the same strategy as for the homogeneous case (multiplication of both sides by e^{−A(x)}), one recovers the general solution

y(x) = e^{A(x)} [ ∫ b(x)e^{−A(x)} dx + k ]

Bernoulli differential equations. Having completed the linear cases, from this section onwards a series of non-linear first order differential equations is presented. One of the most common representatives is the so-called "Bernoulli equation".

As usual, the general form is presented first and then the resolution method.

• General form: y′(x) = a(x) b(y(x))

• Resolution method: of course, it is not useful in this situation to analyze the homogeneous case, because if b = 0 the whole equation reduces to 0. Moreover, in the non-linear case the space of solutions is usually not a vector space.

Other steps are thus required to solve the equation. In particular, it is possible to separate the variables, by collecting all the terms depending on y on one side and all those which refer to x on the other.

This is simply done if one remembers that writing y′(x) is equivalent to writing δy/δx. At this point, all the terms can be rearranged as suggested above to obtain the solution:

∫ dy / b(y) = ∫ a(x) dx



First order non-linear differential equation with exponential term. This paragraph presents one of the most general types of first order non-linear differential equation.

• General form: y′(x) = a(x)y(x) + b(x)y(x)^α

• Resolution method: not having a vector space of solutions, an ad hoc strategy must be adopted. A simple check is to ask whether, given two solutions z(x) and y(x), their sum is still a solution; the fact that z(x)^α + y(x)^α ≠ (z(x) + y(x))^α will convince the reader that z(x) + y(x) does not belong to the space of solutions.

Notice, first of all, that if b = 0 or α = 1 we come back to the previous linear cases. So the analysis will focus on the situations in which α ≠ 1, α ≠ 0 and b(x) ≠ 0.

A distinction must be made between the cases y(x) = 0 and y(x) ≠ 0.

If y(x) = 0 and the initial condition is y(0) = 0, then the solution will be y(x) = 0 ∀x, y(x) = 0 being a stable critical point.

If y(x) ≠ 0, everything can be divided by y(x), obtaining:

y′(x)/y(x) = a(x) + b(x) y(x)^α / y(x) = a(x) + b(x) y(x)^{α−1}

Call z(x) = y(x)^{1−α}, so that y(x)^{α−1} = 1/z(x). Thus y′(x)/y(x) = a(x) + b(x)/z(x), and z′(x) = (1 − α)y(x)^{−α}y′(x); multiplying and dividing by y(x), one obtains z′(x) = (1 − α)y(x)^{1−α} y′(x)/y(x) = (1 − α)z(x) y′(x)/y(x). So, introducing the expression of y′(x)/y(x) into the equation developed for z′(x), it follows that:

z′(x) = (1 − α)z(x)(a(x) + b(x)/z(x)) = (1 − α)(a(x)z(x) + b(x))

which is entirely in terms of z(x) and, in fact, linear. The latter expression can be solved for z(x), and then y(x) can easily be recovered from the relation between the two variables.
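As an illustration of the substitution just described (this worked example is not in the original text), one can take constant coefficients a = 1, b = −1 and α = 2, i.e. the logistic equation, solve the linear equation for z in closed form and compare the back-transformed y with a direct ode45 run:

a = 1; b = -1; y0 = 0.5;                       % y' = a*y + b*y^2 with y(0) = 0.5
z0 = 1/y0;                                     % z = y^(1-alpha) = 1/y
zfun = @(t) (z0 + b/a).*exp(-a*t) - b/a;       % solution of the linear equation z' = -(a*z + b)
yfun = @(t) 1./zfun(t);                        % back-transform to y
[t, ynum] = ode45(@(t, y) a*y + b*y.^2, [0 5], y0);
max(abs(ynum - yfun(t)))                       % the two routes agree up to solver tolerance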

1.2 Another Approach for First Order Differential Equations

This section is mainly useful for dealing with the tools provided by MATLAB and some of the numerical methods presented.

It is important to underline, in particular, that a first order linear differential equation can be viewed as an input-response relation with a source term.

In some sense, this simply means that the first order differential equations falling into the linear categories (some of which were analyzed in the previous section) can be collected in a unique class, which in turn resembles the "non-autonomous first order linear differential equations".

The variable x is now specified as time t.

In general, if a source term q(t) is defined, one can write the equation in this way:

δy/δt = ay + q(t)

The general solution will hence depend on the source itself:

y(t) = y(0)e^{at} + ∫_{s=0}^{s=t} e^{a(t−s)} q(s) ds

where y(0)e^{at} is the solution of the homogeneous case and ∫_{s=0}^{s=t} e^{a(t−s)} q(s) ds is the particular solution.

The source term q(t) can be either a constant c, or an exponential e^{st}, cos(ωt), a polynomial, and so on.

Complex exponential as source. Let us assume an oscillating input is given:

• General form: δy/δt = ay + cos(ωt)

• Resolution method: assume an initial condition y(0) = y₀. As mentioned above, having a vector space in the linear cases, one can sum a particular solution to the homogeneous one to obtain the general solution.

Specifically, it is important to focus on a "particular solution", the homogeneous categories being usually easy to solve and a general form having already been provided for them.

In this specific context, one can try y(t) = M cos(ωt) + N sin(ωt). To check the effectiveness of the guess and also find M and N, it is sufficient to substitute it into the equation.

Recalling that the derivative of cos α is −sin α, it is clear that δy/δt = −ωM sin(ωt) + ωN cos(ωt), and so −ωM sin(ωt) + ωN cos(ωt) = aM cos(ωt) + aN sin(ωt) + cos(ωt), whence cos(ωt)(ωN − aM − 1) = (ωM + aN) sin(ωt).

It is possible to recover M and N by imposing the system:

ωN cos(ωt) = aM cos(ωt) + cos(ωt)  ⇒  ωN − aM = 1
−ωM sin(ωt) = aN sin(ωt)           ⇒  −ωM − aN = 0

whose solution is

M = −a/(ω² + a²),    N = ω/(ω² + a²)

The latter solution can be hugely simplified with the form y(t) = G cos(ωt − α), which, remembering the formula for the cosine of a difference, corresponds to y(t) = G(cos(ωt)cos(α) + sin(ωt)sin(α)); by equating this particular solution with the previous one, one discovers that M = G cos(α) and N = G sin(α). Moreover, since sin²(α) + cos²(α) = 1, it can be shown that M² + N² = G²(cos²(α) + sin²(α)) = G², that is G = √(M² + N²).

More complex forms: a more complex form involves the presence of imaginary units in the equation.

The general form could be summarized as follows:

δy/δt = ay + cos(ωt) + i sin(ωt)

However, Euler's formula states that cos(ωt) + i sin(ωt) = e^{iωt}, so that everything can be rewritten as:

δy/δt = ay + e^{iωt}

which simplifies the needed calculations a lot.

In fact, it is now easy to see that a particular solution could be provided by y(t) = Y e^{iωt}. To find Y, it suffices to substitute Y e^{iωt} into the equation; the substitution results in

Y = 1 / (iω − a)


Because it is more convenient to work in terms of real numbers, the aim becomes to find two real solutions starting from the complex one.

To do that, it is necessary to write the denominator of the particular solution in polar form, iω − a = re^{iα}, which provides a representation of the left hand side in the "Real-Imaginary" plane; r is the magnitude and α the angle formed with the real axis. The issue is to recover r: r is easily found by applying Pythagoras' theorem to the triangle formed by the coordinates −a, iω and r itself. r will then be √(a² + ω²).

Having r and α, one discovers that iω − a = re^{iα} = √(a² + ω²) e^{iα}. Hence,

y(t) = Y e^{iωt} = 1/(iω − a) e^{iωt} = (1/√(a² + ω²)) e^{−iα} e^{iωt} = (1/√(a² + ω²)) e^{i(ωt−α)}

It is now easy to separate the real part, which can be taken as a particular solution:

y(t) = (1/√(a² + ω²)) cos(ωt − α)

Note that 1/√(a² + ω²) is what was previously called G, so that this particular solution is equivalent to the G cos(ωt − α) provided before.

Following the same reasoning as before, because the "twin solution" of G cos(ωt − α) was G sin(ωt − α), the latter (the imaginary part) can also be presented as a solution for the more general case δy/δt = ay + cos(ωt) + i sin(ωt).
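As a quick numerical sanity check (added here; the parameter values are arbitrary and not from the thesis), one can verify in MATLAB that G cos(ωt − α) indeed solves δy/δt = ay + cos(ωt), by evaluating the residual with the analytic derivative:

a = -1.5; w = 2.0;                 % arbitrary illustrative parameters
G = 1/sqrt(a^2 + w^2);             % amplitude derived in the text
alpha = atan2(w, -a);              % angle of i*w - a in the complex plane
t = linspace(0, 10, 1001);
yp  = G*cos(w*t - alpha);          % candidate particular solution
dyp = -G*w*sin(w*t - alpha);       % its analytic derivative
max(abs(dyp - a*yp - cos(w*t)))    % residual: of the order of 1e-15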

1.3 Second order Differential Equations

A brief sketch is given here of second order differential equations (just the linear ones). Of course, second order differential equations are mainly characterized by the presence of second derivatives.

Their general form is:

ky″(x) + ay′(x) + by(x) = c(x)

where k is assumed to be 1 for the sake of simplicity.

Every time c(x) = 0, we have a "homogeneous case":

y″(x) + ay′(x) + by(x) = 0

The space of solutions of the homogeneous case is a vector space of dimension two. Thus, it is sufficient to find two linearly independent solutions to describe the entire space.

The most natural guess is y(x) = e^{λx}. Then y′(x) = λe^{λx} and y″(x) = λ²e^{λx}. Substituting into the second order equation,

λ²e^{λx} + aλe^{λx} + be^{λx} = 0
e^{λx}(λ² + aλ + b) = 0

from which

e^{λx} = 0   (never)
λ² + aλ + b = 0   (to be solved)

where λ² + aλ + b = 0 is called the characteristic equation.

Applying the usual formula for solving quadratic equations, three cases can be sketched:

• ∆ = a² − 4b > 0, so that we end up with two distinct solutions λ₁ ≠ λ₂. Thus, the outcomes will be of the type y₁(x) = e^{λ₁x} and y₂(x) = e^{λ₂x}. In particular, those are linearly independent; hence they form a basis of the space of solutions of the homogeneous case. The description of the mentioned space is:

y(x) = c₁e^{λ₁x} + c₂e^{λ₂x}

• ∆ = 0, so that we end up with two coincident solutions λ₁ = λ₂. The basis is so far composed of just one element: calling λ₁ = λ₂ = λ, the unique member found so far is e^{λx}. Actually, the space of solutions of the homogeneous case has dimension two. We therefore need another element for the basis to be complete. The most natural choice is y₂(x) = xe^{λx}. Notice, in fact, that y₂(x) is linearly independent from y₁(x)¹.

• ∆ < 0, so that no real solution is found and we need to look at complex roots. In particular, λ = −a/2 ± i√(−∆)/2, where α = −a/2 and β = √(−∆)/2. Thus

y(x) = e^{αx}[c₁cos(βx) + c₂sin(βx)]

Having a solution of the homogeneous case, as in first order differential equations a solution is needed for the non-homogeneous case:

y″(x) + ay′(x) + by(x) = c(x)   with c(x) = w

with w ∈ ℝ. A possibility is the constant solution y(x) = ȳ, so that y′ = 0, y″ = 0. Hence, substituting, 0 + 0 + bȳ = w, so that

ȳ = w/b   if b ≠ 0

The total solution in the three cases will be given by the sum of the homogeneous solution and the non-homogeneous one:

• if λ₁ ≠ λ₂: y(x) = c₁e^{λ₁x} + c₂e^{λ₂x} + w/b

• if λ₁ = λ₂ = λ: y(x) = c₁e^{λx} + c₂xe^{λx} + w/b

• if complex solutions are present: y(x) = e^{αx}[c₁cos(βx) + c₂sin(βx)] + w/b
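A small MATLAB sketch (added for illustration, with arbitrary coefficients) shows how the three cases can be told apart in practice: the characteristic roots are computed with roots and the discriminant decides which formula for the general solution applies.

a = 2; b = 5;                     % arbitrary illustrative coefficients (here Delta < 0)
lam = roots([1 a b]);             % roots of lambda^2 + a*lambda + b = 0
Delta = a^2 - 4*b;
if Delta > 0
    fprintf('Two distinct real roots: %g and %g\n', lam(1), lam(2));
elseif Delta == 0
    fprintf('Repeated real root: %g\n', lam(1));
else
    alpha = real(lam(1)); beta = abs(imag(lam(1)));
    fprintf('Complex roots: alpha = %g, beta = %g\n', alpha, beta);
end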

If b = 0, the strategy implemented above is no longer feasible and a new variable needs to be introduced: z(x) = y′(x). So the second order differential equation (with b = 0) becomes:

z′(x) + az(x) = w

The latter is a first order differential equation in z(x) that can be easily solved. Once it is solved for z(x), y(x) is easily recovered by taking the integral of z(x): y(x) = ∫ z(x) dx.

¹ α₁xe^{λx} + α₂e^{λx} = 0 if e^{λx} = 0, which never happens, or if α₁x + α₂ = 0, that is, if and only if α₁ = α₂ = 0.


1.4 Stability

For the purposes of this work, this topic is not fully developed. In particular, we will focus just on the key points needed.

First of all, stability mainly refers to so-called steady states. Taking a generic differential equation (linear or not)

∂y/∂t = f(y)

we call Y a steady state, that is, a point (or points) such that f(Y) = 0.

For instance, if we consider:

a) ∂y/∂t = ay
b) ∂y/∂t = y − y²
c) ∂y/∂t = y − y³

and set them equal to 0, we discover that in the first case the unique steady state is Y = 0, in the second case the two steady states are Y = 0 and Y = 1, and finally in the last case Y = 0, Y = 1, Y = −1.

The question now is: when are those steady states stable? In other terms, if the initial condition is near the steady state, will the solution converge to it or diverge from it as time proceeds?

Starting with the simplest case, a), we notice that its generic solution is y = ke^{at}. Thus, in order for y to reach the steady state Y = 0 it is necessary that a < 0.

The latter result, however, is the outcome of a more general result (not demonstrated), according to which

Theorem 1. In order to check the stability of a steady state Y of a generic first order differential equation, one must consider the derivative

∂f/∂y evaluated at y = Y

Then, if ∂f/∂y < 0 at Y, the steady state is stable; otherwise it is not.

For example, in case b), ∂f/∂y = 1 − 2y, which, evaluated at the steady state Y = 0, turns out to be 1, while at Y = 1 it is −1. Hence, we conclude that Y = 0 is unstable and, on the other hand, Y = 1 is stable.

To understand why the above result represents a valid test for stability, it is necessary to look at the difference between y and Y. Indeed, if that difference goes toward 0, Y is stable; otherwise it is not. To check it, the derivative with respect to time of y − Y can be computed:

∂/∂t (y − Y) = ∂y/∂t − ∂Y/∂t = f(y) − f(Y)

By calculus, the difference of a function evaluated at two nearby points is approximately equal to (∂f/∂y)(y − Y).

So everything is determined by (∂f/∂y): (y − Y) will go to zero (that is, Y is stable) whenever (∂f/∂y) < 0.
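The test of Theorem 1 is easy to automate. The following MATLAB sketch (not in the original text) applies it to case b), with the steady states and the derivative entered by hand:

f  = @(y) y - y.^2;               % right-hand side of case b)
df = @(y) 1 - 2*y;                % its derivative, computed by hand
steady = [0 1];                   % steady states found from f(Y) = 0
for Y = steady
    if df(Y) < 0
        fprintf('Y = %g is stable (df/dy = %g)\n', Y, df(Y));
    else
        fprintf('Y = %g is unstable (df/dy = %g)\n', Y, df(Y));
    end
end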


Stability can also be studied for systems of equations and for second order differential equations.

Starting from the latter, for instance

y″ + By′ + Cy = 0

we know how to convert it into two first order equations (which was done through the definition of the variable z(x) above). Specifically:

∂/∂t [ y ; y′ ] = [ 0 1 ; −C −B ] [ y ; y′ ]

where the matrix

[ 0  1
 −C −B ]

is called the companion matrix; λ₁ and λ₂, the roots of the characteristic equation described above, are the eigenvalues of the companion matrix.

For the stability of a second order differential equation it is required that λ₁ and λ₂ are negative or have negative real part.

Finally, the last instance of stability concerns systems of equations. We will focus our attention on systems of first order differential equations.

A generic linear dynamical system takes the form:

ẋ = ax(t) + by(t) + e
ẏ = cx(t) + dy(t) + f

The general solution of the system takes the form:

[x ; y] = c₁v₁e^{λ₁t} + c₂v₂e^{λ₂t} + A⁻¹[−e ; −f]

with v₁ and v₂ the eigenvectors associated with the eigenvalues λ₁ and λ₂ of the diagonalizable (not demonstrated) matrix A = [a b ; c d]. To study its stability we must apply the following theorem:

Theorem 2. Let A = [a b ; c d] with det(A) ≠ 0 (that is, λ₁ ≠ 0 and λ₂ ≠ 0), and let (x∗, y∗) be the stationary solution of the dynamical system:

1. if λ₁ < 0 and λ₂ < 0, then (x∗, y∗) is globally asymptotically stable.
2. if λ₁,₂ = α ± iβ and α < 0, then (x∗, y∗) is globally asymptotically stable.
3. if λ₁ > 0 and λ₂ > 0, or λ₁,₂ = α ± iβ and α > 0, then (x∗, y∗) is unstable.
4. finally, if λ₁ and λ₂ have opposite signs, the point is a saddle.

Notice that if λ₁ = λ₂ = 0 (or det(A) = 0) it is not possible to classify the solution.

Until now just linear dynamical systems have been explored. However, the most interesting cases for the purposes of this work are represented by non-linear systems of the form:

ẋ = f(x, y)
ẏ = g(x, y)

Assuming that at least one stationary point (x∗, y∗) exists, the following theorem holds:


Theorem 3. Let J(x∗, y∗) be the Jacobian matrix evaluated at the stationary solution:

J = [ ∂f/∂x(x∗, y∗)   ∂f/∂y(x∗, y∗)
      ∂g/∂x(x∗, y∗)   ∂g/∂y(x∗, y∗) ]

and λ₁, λ₂ the eigenvalues of J. Then:

1. if λ₁ < 0 and λ₂ < 0, then (x∗, y∗) is locally asymptotically stable.
2. if λ₁,₂ = α ± iβ and α < 0, then (x∗, y∗) is locally asymptotically stable.
3. if λ₁ > 0 and λ₂ > 0, or λ₁,₂ = α ± iβ and α > 0, then (x∗, y∗) is unstable.
4. finally, if λ₁ and λ₂ have opposite signs, the point is a saddle.

Notice that if λ₁ = λ₂ = 0 (or det(J) = 0) it is not possible to classify the solution, just as in the linear case.
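The conditions of Theorem 3 translate into a few MATLAB lines. The sketch below (an illustration on a made-up nonlinear system, not taken from the thesis) approximates the Jacobian at a stationary point by central finite differences and classifies it through its eigenvalues:

f = @(x, y) x.*(1 - x) - x.*y;           % illustrative system: xdot = f(x,y)
g = @(x, y) y.*(x - 0.5);                %                       ydot = g(x,y)
xs = 0.5; ys = 0.5;                      % stationary point: f(xs,ys) = g(xs,ys) = 0
h = 1e-6;                                % step for central finite differences
J = [ (f(xs+h,ys)-f(xs-h,ys))/(2*h), (f(xs,ys+h)-f(xs,ys-h))/(2*h);
      (g(xs+h,ys)-g(xs-h,ys))/(2*h), (g(xs,ys+h)-g(xs,ys-h))/(2*h) ];
lam = eig(J);
if all(real(lam) < 0)
    disp('locally asymptotically stable');
elseif isreal(lam) && prod(lam) < 0
    disp('saddle point');
else
    disp('unstable (or not classifiable)');
end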

An empirical trick to quickly determine the stability of a system is provided by the "Cartesian rule" (Descartes' rule of signs). In particular, by studying the signs of the coefficients of the characteristic equation of a generic matrix A = [a b ; c d], that is

λ² + λ[−a − d] + ad − cb = λ² − tr(A)λ + det(A),

it is possible to guess the signs of its solutions. In particular, if no change of sign occurs between two consecutive coefficients, the corresponding root is negative; otherwise it is positive.

For instance, if tr(A) < 0 and det(A) < 0, the signs of the coefficients of λ², −tr(A)λ and det(A) are, respectively, +, + and −. Hence the sequence of signs will be + + (no change of sign) followed by + − (change of sign). Thus we would expect a saddle (the sign of the first eigenvalue will be negative and opposite to the sign of the second one).


Chapter 2

Numerical methods

2.1 Basic Notions of Numerical Analysis

Dealing with real numbers exactly on a computer is impossible, because each number is represented by a finite number of bits, taking values 0 or 1. Even though huge progress has been made in hardware to improve the quality of the representation, some error is unavoidable.

The main sources of error are truncation, rounding and number representation.

This is due to the fact that, the hardware being based on a binary logic, one cannot perform the usual decimal representation. It is thus important to have a basic understanding of the machine representation of numbers.

First of all, let us deal with the representation issue.

For instance, to go from a decimal base representation to a binary one, a clear general formula is provided:

N = (b_k 2^k) + (b_{k−1} 2^{k−1}) + · · · + (b₁ 2¹) + (b₀ 2⁰)   (2.1)

The fractional part of a number (the part after the comma) is represented by

Σ_{k=1}^{∞} d_k 2^{−k}   (2.2)

An example is the backward transposition into decimal of (10101.1)₂, starting from 21.5: (21.5)₁₀ = 2⁴ + 2² + 2⁰ + 2⁻¹ = (10101.1)₂. One can verify its validity by applying formula 2.1 to the part before the comma of the binary representation, that is (10101)₂, and formula 2.2 to the residual (.1)₂.

By following this procedure and starting with the fractional part (the 1 after the comma), one ends up with 1 × 2⁻¹ + 1 × 2⁰ + 0 × 2¹ + 1 × 2² + 0 × 2³ + 1 × 2⁴ = (21.5)₁₀.
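The back-transposition above can be verified with a couple of MATLAB lines (this small check is not part of the original text; the built-in bin2dec handles only the integer part, so the fractional bit is summed by hand following formula 2.2):

int_bits  = [1 0 1 0 1];                      % b4 ... b0 of (10101)_2
frac_bits = 1;                                % d1 of (.1)_2
k = numel(int_bits)-1:-1:0;
value = sum(int_bits .* 2.^k) + sum(frac_bits .* 2.^(-(1:numel(frac_bits))))
% value = 21.5; the integer part alone can be cross-checked with bin2dec('10101')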

In practice, one may need to represent both quite large and quite small numbers. To do this, it is useful to resort to the floating point representation of a number:

x ≈ q × 2ⁿ

where q is the mantissa and n is the exponent (for example, to write a large number such as 78244000 one can use 7.8244 × 10⁷ or 0.78244 × 10⁸, and so on). Both mantissa and exponent must be written in binary terms, and because a finite amount of memory is available to store the mantissa, a roundoff error is often committed.

The last source of error is truncation. This happens when we substitute an infinite sum with a finite sum.

A more accurate analysis of numerical errors, however, is needed. The errors described so far concern just the initial representation of the number, but many


other errors, due to the representation of the results of successive operations as machine numbers, may occur. The approximation of the result to the nearest machine number should be done at the end of every operation.

This situation leads to the so-called phenomenon of error propagation. Some examples can easily be experienced in MATLAB: indeed, the simple operation 1 − 3 ∗ (4/3 − 1) results in 2.2204 ∗ 10⁻¹⁶ instead of 0.
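The effect is immediate to reproduce at the MATLAB prompt (a quick illustration, not part of the original text):

1 - 3*(4/3 - 1)       % returns 2.2204e-16 rather than 0
eps                   % machine precision of IEEE doubles: 2.2204e-16
(0.1 + 0.2) == 0.3    % returns logical 0: another classic rounding artefact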

Generally, in the literature, the arithmetic of calculators is called finite arithmetic; in fact, the following algebraic properties are not valid for calculators:

• Associative property for addition: (x + y) + z = x + (y + z);
• Associative property for multiplication: (xy)z = x(yz);
• If x + y = y + z, then x = z;
• If xy = yz and y ≠ 0, then x = z;
• Distributive property: x(y + z) = xy + xz;
• Simplification: x(y/x) = y.

For the reasons explained, the computation of a simple function f : ℝⁿ → ℝ can be affected by various errors:

• Inherent Error: it is generated by the approximation of a vector x with machine numbers x̃, so that f(x̃) is computed instead of f(x). It depends just on the function f and not on the way it is computed. It is not due to an approximation made by the algorithm that implements a representation of f; instead, it is intrinsic in f itself, depending on the approximation of x with a machine number. It can be represented as follows:

ε_in = (f(x̃) − f(x)) / f(x)

The Inherent error is unavoidable.

• Algorithmic Error: it is due to the fact that computations are done in finite arithmetic. It concerns the fact that φ(x̃) is calculated rather than f(x̃); an example is the previous 1 − x ∗ (4/3 − 1) when x = 3. A formal expression is given by:

ε_alg = (φ(x̃) − f(x̃)) / f(x̃)

• Analytical Error: it is generated by the approximation of the function φ with a function ψ. This is due to the fact that the calculator is able to efficiently and effectively compute only a restricted class of functions, namely those involving a finite number of sums, products, subtractions and divisions. The non-rational functions (logarithms, trigonometric functions and so on) are instead approximated. The analytical error thus depends just on the algorithms used to approximate those functions, and it is represented by:

ε_an = (ψ(x̃) − φ(x̃)) / φ(x̃)


Obviously, those definitions are valid if and only if f(x̃), f(x), φ(x̃) are all ≠ 0. In conclusion, in evaluating a function, a calculator can commit the following total error:

ε_tot = (ψ(x̃) − f(x)) / f(x)

A theorem is stated without proof to underline the relation among the mentioned errors:

Theorem 2.1. Assuming f(x) ≠ 0, f(x̃) ≠ 0, φ(x̃) ≠ 0, the following holds:

ε_tot = ε_in + ε_alg + ε_an + ε_alg ε_an + ε_in ε_alg + ε_in ε_an + ε_alg ε_an ε_in ≈ ε_alg + ε_an + ε_in

But how can we estimate the Inherent and Algorithmic errors?

Estimate of Inherent Error: supposing that the function is differentiable, one can take its first order approximation:

f(x̃) − f(x) ≈ Σ_{i=1}^{n} (δf/δxᵢ)(x) (x̃ᵢ − xᵢ)

Hence, the Inherent error turns out to be:

ε_in = (f(x̃) − f(x))/f(x) ≈ Σ_{i=1}^{n} (δf/δxᵢ)(x) · (xᵢ/f(x)) · (x̃ᵢ − xᵢ)/xᵢ

from which

ε_in ≈ Σ_{i=1}^{n} cᵢεᵢ   with   cᵢ = (xᵢ/f(x)) (δf/δxᵢ)(x)   and   εᵢ = (x̃ᵢ − xᵢ)/xᵢ

where the cᵢ are the so-called amplification coefficients and the εᵢ the representation errors of the components of the vector x. Whenever the coefficients cᵢ are large in modulus, the computation of f(x) is badly conditioned.

The cases of sum and difference presented below are particularly representative of the difficulty, for the calculator, of computing these types of operations:

• f(x₁, x₂) = x₁ + x₂ results in ε_in = c₁ε₁ + c₂ε₂ where c₁ = x₁/(x₁ + x₂) and c₂ = x₂/(x₁ + x₂)

• f(x₁, x₂) = x₁ − x₂ results in ε_in = c₁ε₁ + c₂ε₂ where c₁ = x₁/(x₁ − x₂) and c₂ = −x₂/(x₁ − x₂)

It is clear that if x₁ ≈ x₂ (for the difference) or x₁ ≈ −x₂ (for the sum), the coefficients tend to infinity, leading to very bad conditioning and high Inherent errors.
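A two-line MATLAB experiment (not in the original text) makes the cancellation effect visible for the difference case x₁ ≈ x₂:

x1 = 1 + 1e-12;  x2 = 1;
exact    = 1e-12;                                  % what x1 - x2 should be
computed = x1 - x2;                                % suffers from cancellation in double precision
relative_error = abs(computed - exact)/abs(exact)  % around 1e-4 instead of the usual 1e-16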

Estimate of Algorithmic Error: as seen above, this type of error depends on the particular sequence of operations performed. Therefore, a graph can be useful for the explanation. The nodes are the intermediate results of every operation; to every node is associated the error made in approximating it with the nearest machine number. The weights of every directed arc are the amplification coefficients cᵢ.

Iteratively, it is possible to calculate the Algorithmic error, considering that the initial data are taken to be machine numbers with zero approximation error (the latter is eventually accounted for in the Inherent error once at least one operation is performed). Figure 2.1 shows the algorithmic error for the function f(x₁, x₂, x₃) = x₃(x₁ + x₂).


Specifically, it is displayed how, due to the finite arithmetic, the two mathematically equivalent expressions A) x₃(x₁ + x₂) and B) x₃x₁ + x₂x₃ produce two different algorithmic errors. The algorithmic error is expressed by the sum of the inherent errors committed in every operation, weighted by the relative amplification coefficients.

FIGURE 2.1: Algorithmic error for x₃(x₁ + x₂)

Thus, after a series of simple computations one can find out that

ε_algA = η₂ + 1 · 0 + 1 · (η₁ + x₁/(x₁ + x₂) · 0 + x₂/(x₁ + x₂) · 0) = η₁ + η₂

Similarly:

ε_algB = η₃ + x₁/(x₁ + x₂) · (η₁ + 1 · 0 + 1 · 0) + x₂/(x₁ + x₂) · (η₂ + 1 · 0 + 1 · 0) = η₃ + 1/(x₁ + x₂) · (η₁x₁ + η₂x₂)

To conclude this overview on errors, it is worth hinting at the stability of the algorithmic error through the above example. In particular, algorithm A) appears to be more stable than B), which can be affected by huge numerical errors in the case of numerical cancellation phenomena, that is x₁ ≈ −x₂.


Conditioning and instability: the problems with the mentioned errors occur when they propagate across the steps of an algorithm; this issue is referred to as the stability of the algorithm. Consider a numerical problem as a mapping y = f(x) which transforms the input data x into the output y. As seen above, some roundoff errors might be introduced in representing the input. Stability occurs when this little error in the input does not propagate hugely (or relatively so) into the output. In other words, we should check the effects on the output of a perturbation δx of the input. Note that there is a trade-off between stability and computational effort (time to compute the algorithm): if the researcher aims at a stable algorithm, computing time is the price to pay.

Let x̄ = x + δx denote the actual input, where δx is the roundoff error. The output related to x̄ should be f(x̄), whereas the algorithm yields an answer, say y∗.

The algorithm is called "stable" if the relative error is below or equal to the magnitude of the machine precision (that is, a very low value):

‖f(x̄) − y∗‖ / ‖f(x̄)‖ ≤ ε

where ε is the machine precision.

There is still another issue, called conditioning, which, differently from stability (which is directly related to the algorithm), is related to the difficulty of solving the problem per se. This means that, sometimes, a perturbation in the input which is hugely propagated to the output could lead to the misleading conclusion that the algorithm has some stability issue, while instead the trouble is related to the problem that we are trying to solve.

Indeed, we analyze the latter issue by comparing f(x̄) with f(x).

Ideally, one would like that when the error in the input is small, the error in the output is also small. In other terms, the optimal condition is to have a bounded relation like the following:

‖f(x) − f(x̄)‖ / ‖f(x)‖ ≤ K ‖x − x̄‖ / ‖x‖   (2.3)

where K is called the condition number. Of course, the higher K, the more ill-conditioned the problem.

Putting the concepts together, one will find a good answer to a problem when the problem itself is well conditioned and the algorithm is stable.

A solution to a numerical problem can be reached directly or using an iterative algorithm. In the latter case, one requires that the sequence generated by the iterations converges to the correct solution x∗. Furthermore, the speed of convergence, quantified by a rate, should be fast.

This means that, if we define the rate of convergence α through

‖x_{n+1} − x∗‖ ≤ C‖x_n − x∗‖^α

the larger α, the better (it is similar to saying that bigger steps toward the solution x∗ can be taken).

When using an iterative algorithm, we may have no precise idea of the number of iterations needed to get a reasonable solution; in other cases, instead, the desired solution is reached after a known number of steps (direct method). For direct methods, it is possible to compute the number of operations needed to reach the solution: this is the "algorithm complexity". The number of computations will be a function of the size


and of the specific characteristics of the problem: this is the "problem complexity". Usually, it is much easier to compute the "worst case complexity" of a problem. For example, in an optimization problem there might be up to 2^N solutions. However, it could be the case that some of the solutions are not feasible, meaning that 2^N is a measure of the largest possible size of the problem (the worst case); technically, the complexity is O(2^N).

This is quite undesirable, because it implies an exponentially growing size. Efficient algorithms have a polynomial complexity of order O(N^p), where p is constant (the growth in size and complexity is much slower). So algorithm complexity is a function of problem complexity, and the optimal situation is when both are low.

2.2 Finding the zeros of scalar equations

A common task in Finance as well as in Economics is to solve non-linear equations. In particular, one might want to find a solution of an equation in a single variable, such as f(x) = 0, or of a system of equations in several variables, such as F(X) = 0. MATLAB offers approaches for both cases. In the following paragraphs, the Bisection method and Newton's method are analyzed.

Bisection Method: it is the simplest method for solving the scalar equation f(x) = 0, in that it does not require the existence of an explicit or analytical expression for f, but just the ability to evaluate it. This feature is important for cases in which, for instance, Newton's Method is not applicable because it is impossible to find the derivative of the function.

The intuition behind the Bisection method is that, if we have two points a and b such that f(a) < 0 and f(b) > 0, then, if the function is continuous, it must cross the axis somewhere in the interval [a, b]. So one can adopt a recursive strategy, which consists in reducing the interval and checking the sign at the midpoint c = (a + b)/2. Three cases are now possible: if f(c) = 0 the solution is found; if f(c) < 0, the zero must be somewhere in the interval [c, b]; otherwise, the interval to check is [a, c]. The same route is followed again, taking the midpoint of the chosen subinterval and so on, constructing smaller and smaller intervals which bracket the zero.

Formally, a non-decreasing sequence a_n and a non-increasing sequence b_n are generated, such that:

|r − c_n| ≤ (b₀ − a₀) / 2^{n+1},

where r is the unknown root and c_n is the sequence of midpoints of the subintervals, that is c_n = (b_n + a_n)/2. The above bound implies that there will be convergence; specifically, the linearity of the convergence can also be proved. The approach will give just approximate solutions, not exact ones.

Furthermore, it is necessary to define some termination criteria. Usually the following are adopted:

• b_n − a_n < δ, that is, when the subinterval is small enough;
• |f(c_n)| < ε.


All of them can be used jointly.

The built-in MATLAB function to use is fzero. However, a simple bisection function can also be constructed by hand. The following are examples of application of the method: in the first example we get a "false solution"; in the second, no solution arises; and finally a case is provided in which bisection works properly.

For the sake of completeness, the hand-made function is reported:

function p = bisection(f, a, b)    % f = @(x) function defined on x

if f(a) * f(b) > 0
    fprintf('wrong choice\n');
else
    p = (a + b)/2;
    err = abs(f(p));
    while err > 1e-7               % stop criterion = tolerance level
        if f(a) * f(p) < 0
            b = p;
        else
            a = p;
        end
        p = (a + b)/2;
        err = abs(f(p));
    end
end

It basically does literally what is stated in the theory, using a tolerance level (the second stopping criterion above) as stopping rule.

As a first example, the non-linear equation 1/x = 0 is considered. Trying to solve it with the MATLAB function fzero generates a false solution:

fzero(@(x) 1/x, 3)

ans =

  -2.777551328724977e-16

We get a very small number, which is virtually a zero. This is actually not a solution; indeed, we are applying the method to a discontinuous function: bisection is here deceived by the fact that at each iteration the intervals are actually shrinking because of the change in sign. Thus, near zero, at least one stop criterion, namely b_n − a_n < δ, is fulfilled. To get a better understanding of the issue, the discontinuous function is displayed below:


FIGURE 2.2: Bisection applied to a discontinuous function

The second example shows a case in which no solution is reached. The equation is x² = 0. Whatever interval [a, b] is taken on the regular parabola, f(a) and f(b) will never have opposite signs, so that the bisection idea cannot help. This is the result:

fzero(@(x) x^2, 3)

Exiting fzero: aborting search for an interval containing a sign change
    because NaN or Inf function value encountered during search.
(Function value at -1.8203e+154 is Inf.)
Check function or try again with a different starting value.

ans =

   NaN

Finally, a "good behaviour" example is provided using the function sin(x) on the interval [−1, 1]; the hand-made bisection function is used:

f = @(x) sin(x);
bisection(f, -1, 1)

ans =

     0

Newton's method: unlike the Bisection method, Newton's Method exploits some knowledge about f, in particular the first derivative of the function. Thus, to solve the scalar equation f(x) = 0, one needs to assume that f ∈ C², i.e., that it is sufficiently continuous and differentiable.

The differentiability is used through a truncated Taylor expansion. Assuming that x^{(0)} is not a solution of the equation, since f(x^{(0)}) ≠ 0, one would like to discover the step ∆x such that f(x^{(0)} + ∆x) = 0.


To do that, we may consider the truncated Taylor expansion f(x^{(0)} + ∆x) ≈ f(x^{(0)}) + f′(x^{(0)})∆x = 0, which gives

∆x = −f(x^{(0)}) / f′(x^{(0)}).

The expansion is truncated, so that a root is not found in one step; the idea, hence, is to use a sequence of points:

x^{(k+1)} = x^{(k)} − f(x^{(k)}) / f′(x^{(k)})

In Newton's Method, convergence is quadratic and not linear as in the Bisection method. The bad news is that convergence is only local, meaning that unless the sequence is started near the root, Newton's method may fail.

The described approach is useful in the case of a single equation. An example is provided below. Specifically, the evaluation is done for the function f(x) = x² − 1 and the root 1 is found. The code used for Newton's Method is partly taken from the MathWorks forum:

f  = @(x) x^2 - 1;
df = @(x) 2*x;            % we need to define the function's derivative as well

x = .1;                   % initial guess
keep_trying = true;
k = 0;                    % counter

while keep_trying

    x = x - f(x)/df(x);
    k = k + 1;

    if abs(f(x)) < 0.001  % change tolerance as needed
        keep_trying = false;
    end

    if k > 15
        keep_trying = false;   % give up after 15 iterations!
    end

end

x =

    1.0000

A huge advantage of Newton's Method is that it is immediately generalized to systems (differently from Bisection), to solve

F(X) = 0

where F = [f₁, f₂, . . . , f_n]′. A focus on the underlying theory is due. Suppose we have an approximation x^{(k)} = [x₁^{(k)}, x₂^{(k)}, . . . , x_n^{(k)}]′ of the root x∗ = [x₁∗, x₂∗, . . . , x_n∗]′; remembering that, in the single equation case, f(x^{(0)}) + f′(x^{(0)})∆x ≈


0, one may write

f₁(x^{(k)}) + (x₁∗ − x₁^{(k)}) (δf₁/δx₁)|_{x=x^{(k)}} + · · · + (x_n∗ − x_n^{(k)}) (δf₁/δx_n)|_{x=x^{(k)}} ≈ 0
f₂(x^{(k)}) + (x₁∗ − x₁^{(k)}) (δf₂/δx₁)|_{x=x^{(k)}} + · · · + (x_n∗ − x_n^{(k)}) (δf₂/δx_n)|_{x=x^{(k)}} ≈ 0
. . .
f_n(x^{(k)}) + (x₁∗ − x₁^{(k)}) (δf_n/δx₁)|_{x=x^{(k)}} + · · · + (x_n∗ − x_n^{(k)}) (δf_n/δx_n)|_{x=x^{(k)}} ≈ 0

which is a system of linear equations whose coefficients form the Jacobian matrix:

J(x^{(k)}) = [ (δfᵢ/δxⱼ)|_{x=x^{(k)}} ]_{i,j=1,...,n}

(the element in row i and column j is the partial derivative of fᵢ with respect to xⱼ, evaluated at x^{(k)}).

A solution of the system is reached by solving

J(x^{(k)}) ∆x^{(k)} = −F(x^{(k)})

and setting

x^{(k+1)} = x^{(k)} + ∆x^{(k)}.

Examples of this method are provided using the following MATLAB function:

function x = multinewton(f, x, NumIters)

[y, dy] = f(x);
for j = 1:NumIters
    s = dy\y;        % Newton step: solve J*s = F(x)
    x = x - s;
    [y, dy] = f(x);
end

Hence, one should provide a multivariable function (which is our system of equations):

function [y, dy] = myfunction(x)
% Example function to try out Newton's Method
%
n = length(x);
y  = zeros(size(x));    % not necessary for a small vector
dy = zeros(n, n);       % not necessary for a small matrix
y(1) = -x(1)^3 + x(2);
y(2) = x(1)^2 + x(2)^2 - 1;
dy(1,1) = -3*x(1)^2;  dy(1,2) = 1;
dy(2,1) = 2*x(1);     dy(2,2) = 2*x(2);   % second row of the Jacobian


In this specific case the system is

y₁ = −x₁³ + x₂
y₂ = x₁² + x₂² − 1

And the solution is:

>> x = [1; 2];
>> y = multinewton(@myfunction, x, 7);
>> y

y =

    0.8260
    0.5636

That is, starting from an initial approximation x = [x₁, x₂] = [1, 2] and applying Newton's method to the system, we obtain that the zero is x∗ = [0.8260, 0.5636].


2.3 Solving systems of linear equations

Methods for solving linear equations can be classified as direct and iterative.

Because many issues may arise in solving linear systems (it might be difficult with certain matrices), it is vital to measure problem conditioning, which, in this case, amounts to considering the condition number of the matrix.

Take the linear system Ax = b and suppose we perturb b by adding a term δb (imagine that the perturbation is due to rounding off, see above). The solution will be perturbed too: A(x + δx) = b + δb, which implies that δx = A⁻¹δb. Remembering that a problem is ill-conditioned if a little perturbation in the input degenerates in the output, one would like to assess the error in the solution, δx, as a function of the input error δb.

By adopting compatible matrix and vector norms (see Appendix A), one can write:

• ‖δx‖ = ‖A⁻¹δb‖ ≤ ‖A⁻¹‖‖δb‖
• ‖b‖ = ‖Ax‖ ≤ ‖A‖‖x‖

Thus, combining the two inequalities (and recalling that A(x + δx) = b + δb with Ax = b, so that Aδx = δb), one ends up with:

‖δx‖ / (‖A‖‖x‖) ≤ ‖A⁻¹‖‖δb‖ / ‖b‖   ⇒   ‖δx‖/‖x‖ ≤ ‖A‖‖A⁻¹‖ ‖δb‖/‖b‖

which resembles (2.3), where now the condition number is K(A) ≡ ‖A‖‖A⁻¹‖ and gives an upper bound on the relative error in the solution due to the perturbation. Obviously, the higher the condition number, the more difficult it will be to solve the linear system.
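The bound just derived is easy to visualize in MATLAB (an illustrative sketch, not from the thesis, using the notoriously ill-conditioned Hilbert matrix):

A  = hilb(8);                    % 8x8 Hilbert matrix
x  = ones(8, 1);                 % "true" solution chosen by construction
b  = A*x;
db = 1e-10*randn(8, 1);          % tiny perturbation of the right-hand side
xp = A\(b + db);                 % solution of the perturbed system
rel_input  = norm(db)/norm(b);
rel_output = norm(xp - x)/norm(x);
[cond(A), rel_output/rel_input]  % the amplification ratio is bounded above by cond(A)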

In particular, a theorem relates the results of linear algebra and the findings of numerical analysis in terms of "difficulty in solving linear systems".

Theorem 2.2. Let A be a non-singular matrix of order n. Then for any subordinate matrix norm we have

1/cond(A) = min { ‖B‖/‖A‖ : A + B is a singular matrix }

The theorem basically states that, when the condition number is large, the matrix can be well approximated by a singular matrix, that is, the linear system is more difficult to solve.

Anyway, it is important to notice that ill-conditioning is not necessarily related to singularity. The numerical troubles are in fact mostly related to the matrix A itself, which may lead to degenerate perturbations in the output given a small change in the input.

Direct methods: the PA = LU approach. The basic idea behind direct methods is to transform the matrix A, given a linear system Ax = b, into a suitable form, for example an upper or a lower triangular form. Triangular forms are, in fact, easy to solve.


Let us consider a system Ax = b where A is upper triangular, so that:

a₁₁x₁ + a₁₂x₂ + · · · + a₁ₙxₙ = b₁
        a₂₂x₂ + · · · + a₂ₙxₙ = b₂
                           ⋮
                      aₙₙxₙ = bₙ      (2.4)

It is now easy to see that, by back substitution starting from the last variable, the generic element is

xₖ = ( bₖ − Σ_{j=k+1}^{n} aₖⱼxⱼ ) / aₖₖ.
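Back substitution takes only a few lines of MATLAB (the sketch below is illustrative and not from the thesis; in practice the backslash operator recognizes triangular matrices and does exactly this):

n = 4;
U = triu(rand(n) + n*eye(n));    % random, well-conditioned upper triangular matrix
b = rand(n, 1);
x = zeros(n, 1);
for k = n:-1:1
    x(k) = (b(k) - U(k, k+1:n)*x(k+1:n)) / U(k, k);
end
max(abs(U*x - b))                % residual close to machine precision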

Now that it is clear how simple it is to solve linear systems with A in triangular form, it is useful to find a method to transform a linear system into an equivalent triangular form.

Gaussian elimination is such a procedure.

To show it, it is necessary to start from a linear system in the form

(E₁)  a₁₁x₁ + a₁₂x₂ + · · · + a₁ₙxₙ = b₁
(E₂)  a₂₁x₁ + a₂₂x₂ + · · · + a₂ₙxₙ = b₂
  ⋮
(Eₙ)  aₙ₁x₁ + aₙ₂x₂ + · · · + aₙₙxₙ = bₙ

For each equation k = 2, . . . , n, apply the transformation

(Eₖ) ← (Eₖ) − (aₖ₁/a₁₁)(E₁),

which leads to the equivalent system

a₁₁x₁ + a₁₂x₂ + · · · + a₁ₙxₙ = b₁
        a₂₂^(1)x₂ + · · · + a₂ₙ^(1)xₙ = b₂^(1)
                           ⋮
        aₙ₂^(1)x₂ + · · · + aₙₙ^(1)xₙ = bₙ^(1)

where the coefficients with superscript (1) are the new equivalent coefficients obtained after the transformation.

Repeat this transformation on the coefficients aₖ₂ (k ≥ 3), then aₖ₃, and so on, until the desired triangular form is obtained.

The glaring difficulty and limit of Gaussian elimination is that, in order to perform it, it is necessary that a₁₁ ≠ 0 at the first step, a₂₂ ≠ 0 at the second, and so on. If the system is non-singular (the matrix A is not close to singular), a permutation of columns and/or rows can be performed to solve the mentioned issue.

It can be useful at this point to show an example from the book by Prof. Paolo Brandimarte, "Numerical Methods in Finance and Economics".

Consider the matrix

A = [ 5 1 4
      0 0 3
      0 1 2 ]


Since the element a₂₂ of our matrix A is 0, Gaussian elimination would fail. However, one can simply swap the second and the third equation, so that the element a₂₂ after the permutation becomes 1 and Gaussian elimination can be implemented safely. The permutation matrix should have the following characteristics:

• all elements are either 0 or 1;
• in each row and in each column, exactly one element is equal to 1.

Namely, in the example the permutation matrix will be

P = [ 1 0 0
      0 0 1
      0 1 0 ]

which implies a swap of the second and third equations (notice, in fact, that in the second and third rows the ones are exchanged with respect to the identity) when P multiplies A.

Hence, PA will result in:

PA = [ 1 0 0 ; 0 0 1 ; 0 1 0 ] [ 5 1 4 ; 0 0 3 ; 0 1 2 ] = [ 5 1 4 ; 0 1 2 ; 0 0 3 ]

As anticipated, the second and third rows of the resulting matrix are swapped. The result of Gaussian elimination (the system in triangular form, that is, a transformation of the matrix A into triangular form) can be used in the particular circumstances analyzed below.

On many occasions, indeed, it is convenient to factorize the matrix PA into the product of a lower and an upper triangular matrix. Note that the matrix P is not always necessary.

More precisely, we have

PA = LU

where U (upper triangular matrix) is exactly the end result of Gaussian elimination and L (lower triangular matrix) encodes the transformations performed in order to obtain the equivalent system in upper triangular form. The latter are namely the linear combinations of rows deriving from (Eₖ) ← (Eₖ) − (aₖ₁/a₁₁)(E₁); it is sufficient to know that L is the matrix which, multiplied by U, gives A (or PA). In fact, a specific function in MATLAB is dedicated to finding the LU factorization.

With such a factorization, solving the system Ax = b is equivalent to solving

Ly = Pb
Ux = y

Basically, the advantage of using the LU decomposition is in terms of computational time and numerical difficulties.


Suppose that the system one would like to solve is of the form Ax = b, where

A = [  1   5  −4
       8  −9   7
      10 −32   9 ]

and b = [2; 5; 6]; a comparison between the classical solution A⁻¹b and the PA = LU solution is proposed. The two are identical; the fundamental difference is that, while the "classical solution" certainly has troubles whenever A is singular or close to singular, PA = LU does not, since it does not specifically require that A be invertible.

Moreover, PA = LU takes less time than the "classical approach".

Notice also the usage of the MATLAB function lu(A), which provides at the same time the permutation matrix P, and the lower and upper triangular matrices for A:

%%%%%%%%%%%%%%%%%%% PA-LU METHOD %%%%%%%%%%%%%%%%%%%%%%%%%

A = [ 1 5 -4; 8 -9 7; 10 -32 9];

[L,U,P] = lu(A)     % this command implements the LU decomposition of a matrix

L =

    1.0000         0         0
    0.8000    1.0000         0
    0.1000    0.4940    1.0000


U =

   10.0000  -32.0000    9.0000
         0   16.6000   -0.2000
         0         0   -4.8012


P =

     0     0     1
     0     1     0
     1     0     0

b = [2; 5; 6];      % define a vector b

x_classical = A\b   % classic solution

x_classical =

    0.8720
    0.0088
   -0.2710

x_LU = U\(L\(P*b))  % solution implied by the PA-LU decomposition is the same

x_LU =

    0.8720
    0.0088
   -0.2710

Solving linear systems through iterative methods: it is convenient to begin our discussion by considering that iterative schemes are one possible approach when the fixed point of a generic operator G(x) (not necessarily a function) is needed. Later on, the reader will understand why.

In particular, the aim is to find a fixed point of G, i.e., to satisfy the equation

x = G(x)

One can try the iteration scheme called fixed point iteration:

x^{(k+1)} = G(x^{(k)})   (2.5)

starting from an initial approximation x^{(0)}. Note that the method can be applied to both linear and non-linear equations. The basic idea originates from the observation that finding f(x) = 0 is equivalent to finding where f(x) + x = x (just add x to both sides), which is a fixed point problem for G(x) = f(x) + x. Thus, what one can do is start from a generic x^{(0)} and progressively take G(x^{(k)}) = f(x^{(k)}) + x^{(k)} as x^{(k+1)}; graphically, this corresponds to reading the point x^{(k)} on the graph of f(x) + x and transposing G(x^{(k)}) onto the line y = x. If there is convergence, by recursively repeating the suggested operations one will find an x such that going further makes no sense, because G(x) = x at this x, so the iteration stays there (until one searches for another fixed point with a new starting point). Hence, the x found, from which one cannot depart, will be a fixed point with respect to G(x) and a root with respect to f(x).
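A tiny MATLAB sketch (added here; the example function is arbitrary) makes the mechanics concrete for a case that does converge:

% Solve f(x) = cos(x) - x = 0 by iterating x_{k+1} = G(x_k) with G(x) = f(x) + x = cos(x).
G = @(x) cos(x);
x = 1;                            % initial approximation x^(0)
for k = 1:100
    x_new = G(x);
    if abs(x_new - x) < 1e-10     % stop when successive iterates are close
        break
    end
    x = x_new;
end
x                                 % about 0.7391, the fixed point of cos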

The questions are whether and when the approach converges to a fixed point of G. The answer lies in the so-called contraction mapping.

For the sake of simplicity, the investigation is conducted on a system of linear equations Ax = b, which can be rewritten as $x = (A + I)x - b = \hat{A}x - b$. Call $G(x) = \hat{A}x - b$ the operator whose fixed point is searched for, and apply the above method to see whether it converges.

The iteration scheme will, in general, diverge as $k \to \infty$. In fact, we have that:
\[
x^{(1)} = \hat{A}x^{(0)} - b, \qquad
x^{(2)} = \hat{A}x^{(1)} - b = \hat{A}^2 x^{(0)} - \hat{A}b - b, \qquad
x^{(3)} = \hat{A}x^{(2)} - b = \hat{A}^3 x^{(0)} - \hat{A}^2 b - \hat{A}b - b, \;\dots
\]

so intuition suggests that the scheme will diverge as $k \to \infty$ whenever the powers of $\hat{A}$ grow without bound.

It can be shown that convergence happens only when the eigenvalues of $\hat{A}$ all have absolute value less than one. Thus the method $x^{(k+1)} = G(x^{(k)})$, though simple, cannot be exploited in general for arbitrary systems of equations, because that condition does not always hold.
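A quick numerical illustration of this limitation (the matrix is the same A used in the PA = LU example above; the number of iterations is arbitrary):

A    = [ 1 5 -4; 8 -9 7; 10 -32 9];
b    = [2; 5; 6];
Ahat = A + eye(3);            % rewrite Ax = b as x = (A+I)x - b = Ahat*x - b
max(abs(eig(Ahat)))           % much larger than 1, so divergence is expected
x = zeros(3,1);
for k = 1:20
    x = Ahat*x - b;           % naive fixed point iteration
end
norm(x)                       % enormous: the iterates have blown up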

Another approach must be taken. The advantage, as we will see, is that with the methods considered below convergence will not depend on something uncontrollable, such as $\hat{A}$ in the previous method, but on something that we are able to choose and manipulate: in particular, how we split the matrix A (which is our arbitrary choice).

The approach consists, firstly, in splitting the matrix A arbitrarily (methods were also developed to choose the split optimally in terms of the computations involved, as we will see later on):

\[
A = D + C
\]
which yields the equivalent system
\[
Dx = -Cx + b.
\]

To solve it, one can improve convergence by exploiting the flexibility in choosing D. Namely, one can take advantage of the fact that D can be chosen arbitrarily, selecting an ad hoc splitting such that $Dx^{(k+1)} = d^{(k)}$ is easy to solve. The iteration scheme will then be:

\[
d^{(k)} = -Cx^{(k)} + b \qquad (2.6)
\]

from which $x^{(k+1)}$ is recovered by solving $Dx^{(k+1)} = d^{(k)}$, generating a sequence of approximations.

But what are the claimed advantages of using this method instead of the one which exploits scheme (2.5) directly?

The convenience is mostly in terms of convergence.

To investigate the issue of convergence it is convenient to write $d^{(k)}$ as $Dx^{(k+1)}$, which simply involves solving part of the previous system. Recovering $x^{(k+1)}$ we end up with

\[
x^{(k+1)} = -D^{-1}Cx^{(k)} + D^{-1}b
\]

Let $B = -D^{-1}C = I - D^{-1}A$; one can then track the evolution of the absolute error (which measures the divergence: the larger the error, the farther we are from the correct solution) $e^{(k)} = x^* - x^{(k)}$, denoting by $x^*$ the correct solution. Substituting the expressions for $x^{(k+1)}$ and $x^*$ (which, being a fixed point, satisfies $x^* = -D^{-1}Cx^* + D^{-1}b$), the following is recovered:

\[
e^{(k+1)} = x^* - x^{(k+1)} = (-D^{-1}Cx^* + D^{-1}b) - (-D^{-1}Cx^{(k)} + D^{-1}b) = (Bx^* + D^{-1}b) - (Bx^{(k)} + D^{-1}b) = B(x^* - x^{(k)}) = Be^{(k)}.
\]
By recursive substitution, it is easy to see that
\[
e^{(k)} = B^k e^{(0)}.
\]

The error, then, will not grow if $B^k$ tends to 0 as $k \to \infty$.

Theorem 2.3. $\lim_{k \to \infty} B^k = 0$ if and only if the spectral radius of B is strictly less than 1.

Remembering that the spectral radius is just the supremum of the absolute values of the elements of the spectrum (the set of eigenvalues) of a matrix, the theorem simply states that convergence occurs if and only if all the eigenvalues of B have absolute value less than one. Denoting by $\rho(B)$ the spectral radius of B, we can conclude that the approach converges if and only if
\[
\rho(B) < 1.
\]

The issue is that verifying this condition may be time consuming, because it requires checking all the eigenvalues of a possibly large matrix such as B. To avoid this trouble, since the matrix norm of B can be taken compatible with a vector norm (this can be seen by starting from $x^{(k+1)} = Bx^{(k)} + D^{-1}b$), one can recall that

\[
\rho(B) \leq \| B \|
\]

so that if $\|B\|$ is less than 1, then for sure $\rho(B)$ is too, with the advantage that computing the former is simpler, since one can choose the kind of norm with which to calculate it (easily computable norms such as $\|\cdot\|_\infty$ or $\|\cdot\|_1$ can be chosen). The reader is invited to see the Appendix for further information.
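As a quick numerical check of the inequality (the matrix and the splitting, D equal to the diagonal of A, are the illustrative ones used again in the Jacobi example below):

A = [ 5 -1 2; 3 8 -2; 1 1 4];
D = diag(diag(A));
B = eye(3) - D\A;                % B = I - D^(-1)A
rho  = max(abs(eig(B)))          % spectral radius (requires the eigenvalues)
nInf = norm(B, inf)              % row-sum norm, cheap to compute
nOne = norm(B, 1)                % column-sum norm, cheap to compute
% rho is bounded above by both norms; since nInf < 1, convergence is guaranteed.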

From a practical point of view the whole method makes sense if it is easy to compute (2.6). By a proper choice of D, such that the norm of B is less than one and (2.6) is simple to compute, various approaches are recovered.

The Jacobi and Gauss-Seidel approaches are reported here.

Jacobi method: this approach is recovered when D is a diagonal matrix of the type
\[
D = \begin{pmatrix}
a_{11} & 0 & 0 & \dots & 0 \\
0 & a_{22} & 0 & \dots & 0 \\
0 & 0 & a_{33} & \dots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \dots & a_{nn}
\end{pmatrix}
\]

which can be easily inverted if $a_{ii} \neq 0$; if the $L^\infty$ norm is chosen, a sufficient (but not necessary) condition is obtained as follows:

\[
\| I - D^{-1}A \|_\infty = \max_{1 \le i \le n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \left| \frac{a_{ij}}{a_{ii}} \right| < 1
\]

where $a_{ij}$ is a generic element of the matrix A, while $1/a_{ii}$ is a generic element of the matrix $D^{-1}$. With some algebra, requiring the above maximum to be less than one yields the so-called diagonal dominance condition:

\[
|a_{ii}| > \sum_{\substack{j=1 \\ j \neq i}}^{n} |a_{ij}| \qquad \forall i
\]

The latter is a sufficient condition for convergence.
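In practice, strict diagonal dominance can be checked directly on A before starting the iterations; a minimal sketch (the matrix is the illustrative, diagonally dominant one used in the first example below):

A = [ 5 -1 2; 3 8 -2; 1 1 4];
offDiag = sum(abs(A), 2) - abs(diag(A));   % row sums of |a_ij| without the diagonal
isDominant = all(abs(diag(A)) > offDiag)   % 1 (true): the sufficient condition holds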

To implement the method it is sufficient to recover x as seen above, $x = -D^{-1}Cx + D^{-1}b$, which in Jacobi's case reads, componentwise:

\[
x_i = \frac{1}{a_{ii}} \left( b_i - \sum_{\substack{j=1 \\ j \neq i}}^{n} a_{ij} x_j \right), \qquad i = 1, \dots, n
\]
where $\sum_{\substack{j=1 \\ j \neq i}}^{n} a_{ij} x_j$ collects the elements of the matrix C, such that D + C = A (indeed, the elements of C are all the elements of A except the diagonal, which is D).


This leads immediately to the iteration scheme:

\[
x_i^{(k+1)} = \frac{1}{a_{ii}} \left( b_i - \sum_{\substack{j=1 \\ j \neq i}}^{n} a_{ij} x_j^{(k)} \right), \qquad i = 1, \dots, n
\]

The iterations are usually stopped when "symptoms of convergence" appear (which will happen because convergence has been imposed), that is, when the difference between $x^{(k+1)}$ and $x^{(k)}$ is so tiny that it can be considered negligible, meaning that the true solution has almost been reached.

Having fixed a tolerance parameter $\epsilon$, the algorithm can be stopped when
\[
\| x^{(k+1)} - x^{(k)} \| < \epsilon \, \| x^{(k)} \|.
\]

It is important to underline that the absence of diagonal dominance does not necessarily imply divergence. Furthermore, the spectral radius plays a fundamental role in this approach: empirically, it is easy to see that the speed of convergence to the true solution is higher the lower the spectral radius, and hence the norm, of the matrix B (which we choose).
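Since the error evolves as $e^{(k)} = B^k e^{(0)}$, its norm is expected to shrink roughly like $\rho(B)^k$. The following sketch (illustrative data; the decay is only approximately geometric, since the dominant eigenvalues may be complex) compares the observed error with this prediction for the Jacobi splitting:

A = [ 5 -1 2; 3 8 -2; 1 1 4];
b = [12; -25; 6];
D = diag(diag(A));  C = A - D;
B = -(D\C);  b1 = D\b;            % iteration matrix and constant term
xstar = A\b;                      % reference solution
rho = max(abs(eig(B)));
x = zeros(3,1);
for k = 1:10
    x = B*x + b1;
    fprintf('k=%2d   error=%.2e   rho^k=%.2e\n', k, norm(xstar - x), rho^k);
end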

Indeed, there exist methods aimed at shifting the eigenvalues of the matrix B to speed up convergence.

Hence, at this point, it is convenient to show how Jacobi's method applies in practice.

The MATLAB function derives directly from the general scheme $x^{(k+1)} = -D^{-1}Cx^{(k)} + D^{-1}b$, in which D is set equal to the diagonal of A. The algorithm stops when $\|x^{(k+1)} - x^{(k)}\| < \epsilon \|x^{(k)}\|$; $\epsilon$ is chosen arbitrarily.

Applications are provided for both cases: in the first one, A is characterized by diagonal dominance and the exact solution, simply recovered as $A^{-1}b$, coincides with Jacobi's one. In the second example, instead, no diagonal dominance is present and divergence occurs. A careful reader, however, should keep in mind that, in general, the lack of diagonal dominance does not necessarily imply divergence (as said above, it is a sufficient but not necessary condition).

The MATLAB code implemented is the following:

function [x,i] = Jacobi(A,b,x0,eps,MaxIter)
dA   = diag(A);           % diagonal entries of A
C    = A - diag(dA);      % off-diagonal part, so that A = D + C
Dinv = diag(1./dA);       % inverse of the diagonal matrix D
B    = -Dinv * C;         % iteration matrix B = -D^(-1)C
b1   = Dinv * b;          % constant term D^(-1)b
oldx = x0;
for i = 1 : MaxIter
    x = B * oldx + b1;
    if norm(x-oldx) < eps*norm(oldx)   % stopping criterion
        break;
    end
    oldx = x;
end

%%%%%%%%%%%%%FIRST EXAMPLE%%%%%%%%%%%%%%%%%
%A = [ 5 -1 2; 3 8 -2; 1 1 4]
%b = [ 12 -25 6]'
%eps = 1e-08;
%MaxIter = 1000;
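A possible way to call the function above on the first example and compare it with the direct solution (assuming the function is saved as Jacobi.m on the MATLAB path):

A  = [ 5 -1 2; 3 8 -2; 1 1 4];
b  = [ 12 -25 6]';
x0 = zeros(3,1);
[x_jacobi, iters] = Jacobi(A, b, x0, 1e-08, 1000);   % iterative solution
x_direct = A\b;                                      % direct solution, for comparison
disp([x_jacobi x_direct])                            % the two columns coincide
iters                                                % number of iterations performed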
