
and then, solving the system of two equations in two unknowns:

\[
\begin{cases}
\beta_1 = \dfrac{\sum_{i=1}^{m} x_i y_i - \bar{y} \sum_{i=1}^{m} x_i}{\sum_{i=1}^{m} x_i^2 - \bar{x} \sum_{i=1}^{m} x_i} \\[2ex]
\beta_2 = \bar{y} - \beta_1 \bar{x}
\end{cases}
\tag{D.9}
\]

where $\bar{x}$ and $\bar{y}$ represent the mean values of $x$ and $y$ respectively.
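As a minimal Octave sketch of Equation (D.9), the two parameters can be computed directly from the data vectors (the vectors x and y below are hypothetical examples, not taken from the text):

x = (1:10)';                            % example x coordinates
y = 2 .* x + 1 + 0.1 .* randn(10, 1);   % example y data scattered around a line
beta1 = (sum(x .* y) - mean(y) * sum(x)) / (sum(x .^ 2) - mean(x) * sum(x));
beta2 = mean(y) - beta1 * mean(x);      % intercept, second line of Equation (D.9)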

D.4 Bilinear regression

Bilinear regression stands for the regression analysis approach applied to a bilinear function, a function where a dependent variable z depends on two different independent parameters (e.g. x and y) through a non-linear equation. The function is linear along the x axis and along the y axis, but in general it is not linear otherwise. The equation is as follows:

\[
z(x, y) = \beta_1 x + \beta_2 y + \beta_3 x y + \beta_4 \tag{D.10}
\]

In order to define the four parameters $\beta_j$, $j \in [1, 4]$, there are two major ways. One is the use of Gauss-Jordan elimination (see Appendix E.2.1) and the other is the use of QR factorization (see Appendix E.3). Considering a solution based on Gauss-Jordan elimination, it is again necessary to calculate the partial derivatives of $\chi^2$ with respect to the four parameters, considering all $m$ data points:

\[
\begin{cases}
\dfrac{\partial \chi^2}{\partial \beta_1} = 2 \cdot \sum_{i=1}^{m} \left( z_i - \beta_1 x_i - \beta_2 y_i - \beta_3 x_i y_i - \beta_4 \right) \left( -x_i \right) = 0 \\[2ex]
\dfrac{\partial \chi^2}{\partial \beta_2} = 2 \cdot \sum_{i=1}^{m} \left( z_i - \beta_1 x_i - \beta_2 y_i - \beta_3 x_i y_i - \beta_4 \right) \left( -y_i \right) = 0 \\[2ex]
\dfrac{\partial \chi^2}{\partial \beta_3} = 2 \cdot \sum_{i=1}^{m} \left( z_i - \beta_1 x_i - \beta_2 y_i - \beta_3 x_i y_i - \beta_4 \right) \left( -x_i y_i \right) = 0 \\[2ex]
\dfrac{\partial \chi^2}{\partial \beta_4} = 2 \cdot \sum_{i=1}^{m} \left( z_i - \beta_1 x_i - \beta_2 y_i - \beta_3 x_i y_i - \beta_4 \right) \left( -1 \right) = 0
\end{cases}
\tag{D.11}
\]

In this situation the algebraic solution of the system is quite complicated and so practically inconvenient to obtain. Thus it is better to express the system of Equation (D.11) in matrix form, namely:

\[
A \cdot \beta = a \tag{D.12}
\]

where $\beta \in \mathbb{R}^{4 \times 1}$ is the vector of the four unknown parameters, while $A \in \mathbb{R}^{4 \times 4}$ and $a \in \mathbb{R}^{4 \times 1}$


are defined as follows:

\[
\begin{bmatrix}
\sum_{i=1}^{m} x_i^2 & \sum_{i=1}^{m} x_i y_i & \sum_{i=1}^{m} x_i^2 y_i & \sum_{i=1}^{m} x_i \\
 & \sum_{i=1}^{m} y_i^2 & \sum_{i=1}^{m} x_i y_i^2 & \sum_{i=1}^{m} y_i \\
 & & \sum_{i=1}^{m} x_i^2 y_i^2 & \sum_{i=1}^{m} x_i y_i \\
\text{sym.} & & & m
\end{bmatrix}
\cdot
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{bmatrix}
=
\begin{bmatrix} \sum_{i=1}^{m} x_i z_i \\ \sum_{i=1}^{m} y_i z_i \\ \sum_{i=1}^{m} x_i y_i z_i \\ \sum_{i=1}^{m} z_i \end{bmatrix}
\tag{D.13}
\]

Now an approach as described in Appendix E.2.1 will lead to the calculation of the unknowns.
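As a sketch of how this could look in practice, the following Octave fragment assembles the matrix and the vector of Equation (D.13) from hypothetical data column vectors x, y and z of m entries each, and solves the system with Octave's built-in backslash operator in place of the Gauss-Jordan elimination of Appendix E.2.1:

A = [sum(x.^2)       sum(x.*y)       sum(x.^2 .* y)     sum(x);
     sum(x.*y)       sum(y.^2)       sum(x .* y.^2)     sum(y);
     sum(x.^2 .* y)  sum(x .* y.^2)  sum(x.^2 .* y.^2)  sum(x.*y);
     sum(x)          sum(y)          sum(x.*y)          numel(x)];
a = [sum(x .* z); sum(y .* z); sum(x .* y .* z); sum(z)];
beta = A \ a;   % beta = [beta1; beta2; beta3; beta4]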

On the other hand, for a solution based on QR factorization, a system of equations is needed in which each row expresses the constraint defined by one of the points to be considered:

\[
\begin{bmatrix}
x_1 & y_1 & x_1 y_1 & 1 \\
x_2 & y_2 & x_2 y_2 & 1 \\
\vdots & \vdots & \vdots & \vdots \\
x_i & y_i & x_i y_i & 1 \\
\vdots & \vdots & \vdots & \vdots \\
x_m & y_m & x_m y_m & 1
\end{bmatrix}
\cdot
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{bmatrix}
=
\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_i \\ \vdots \\ z_m \end{bmatrix}
\tag{D.14}
\]

In Equation (D.14) the first matrix (i.e. $B$) is $m \times 4$ and the vector of constant terms (i.e. $Z$) has $m$ entries, where usually $m > 4$. Thus in this situation the matrix of coefficients is not a square matrix, and in particular it has more rows than columns: the equivalent system is overdetermined, as it has more constraints than degrees of freedom (unknowns).

In general an exact solution does not exist, but a "best solution" can be found in a statistical sense, namely the best set of values for $\beta_j$ that brings the bilinear function closest to the cloud of points $(x_i, y_i, z_i)$, $i \in [1, m]$.

To perform this task it is necessary to invert the matrix $B$, but as aforementioned this cannot be done except through a generalized concept of inverse matrix, and in particular through some factorization approaches, which are more numerically stable than a plain pseudoinverse matrix (see Appendix D.2). After the factorization of $B$ it will be possible to easily find the solution of Equation (D.14).
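As a sketch of this route, assuming the matrix B and the vector z of Equation (D.14) have already been assembled, the least-squares solution can be obtained in Octave through QR factorization:

[Q, R] = qr(B, 0);     % economy-size QR factorization, B = Q * R
beta = R \ (Q' * z);   % least-squares solution of B * beta = z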

An example of bilinear regression can be seen in Figure D.2, where 49 data points are fitted by a bilinear function. To perform that regression a simple Octave function has been written in order to find the best fit of the points. The function creates the matrix $B$ and the vector $Z$, filling them with randomly generated numbers. At the end the pseudoinverse³ of the matrix $B$ was used to calculate the solution of the regression:

³This approach has been used for simplicity and because it is easy to find a stable matrix, but it cannot normally be used.

Figure D.2: Example of bilinear regression of data points


function b = bilinearRegression(numberPoints)
  B = ones(numberPoints, 4);
  B(:, 1) = rand(numberPoints, 1);   % x coordinates generation
  B(:, 2) = rand(numberPoints, 1);   % y coordinates generation
  B(:, 3) = B(:, 1) .* B(:, 2);      % x * y
  z = rand(numberPoints, 1);         % z coordinates generation
  b = pinv(B) * z;                   % solves B * b = z in the least-squares sense
endfunction
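The function can then be called, for instance, with the 49 points of Figure D.2:

b = bilinearRegression(49);   % b holds the fitted parameters [beta1; beta2; beta3; beta4]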

Appendix E

Matrix Algebra

E.1 Introduction

In mathematics a matrix is a rectangular array of numbers [117] organized in rows and columns; thus each entry of the array is denoted by a double index subscript: $a_{i,j}$ is the entry of the matrix $A$ at the $i$th row and the $j$th column. A matrix where one of the dimensions (rows or columns) is equal to unity is normally called a vector. Generally a vector in an $m \times 1$ space is called a column vector and a vector in a $1 \times n$ space is called a row vector. Generally speaking, a scalar can be thought of as a $1 \times 1$ matrix. A matrix where the number of rows equals the number of columns is called a square matrix. In this book matrices will be represented by an upper-case (or sometimes lower-case) letter and a double underline, while vectors will be represented by lower-case letters and a single underline. A single entry of a matrix or vector will always be represented by a lower-case letter and no underline.

Matrices can be subjected to different operations (called matrix operations) like matrix addition, scalar multiplication and transposition.

Definition 4. The operation of addition over two matrices $A, B \in \mathbb{R}^{m \times n}$ gives a matrix $C \in \mathbb{R}^{m \times n}$ where:

\[
c_{i,j} = a_{i,j} + b_{i,j}, \quad i \in [1, m],\ j \in [1, n]
\]

Definition 5. The operation of scalar multiplication of a matrix $A \in \mathbb{R}^{m \times n}$ and a scalar value $k$ gives a matrix $C \in \mathbb{R}^{m \times n}$ where:

\[
c_{i,j} = k \cdot a_{i,j}, \quad i \in [1, m],\ j \in [1, n]
\]

Definition 6. The operation of transposition of a matrix $A \in \mathbb{R}^{m \times n}$ (often denoted by $A^T$) gives a matrix $B \in \mathbb{R}^{n \times m}$ where:

\[
b_{j,i} = a_{i,j}, \quad i \in [1, m],\ j \in [1, n]
\]
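A small Octave illustration of Definitions 4, 5 and 6, using two hypothetical 2 × 3 matrices:

A = [1 2 3; 4 5 6];   % A in R^(2x3)
B = [6 5 4; 3 2 1];
C1 = A + B;           % matrix addition (Definition 4)
C2 = 2 * A;           % scalar multiplication (Definition 5) with k = 2
C3 = A';              % transposition (Definition 6), a 3 x 2 matrix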

It is also possible to define other operations with matrices, like the matrix multiplication.


Definition 7. The operation of multiplication of a matrix $A \in \mathbb{R}^{m \times n}$ with a matrix $B \in \mathbb{R}^{n \times s}$ gives a matrix $C \in \mathbb{R}^{m \times s}$ where:

\[
c_{i,j} = \sum_{k=1}^{n} a_{i,k} \cdot b_{k,j}, \quad i \in [1, m],\ j \in [1, s]
\]

Matrix multiplication has associativity and distributivity but is not commutative (i.e. in general $A \cdot B \neq B \cdot A$), except in special cases, for example when one of the two matrices is the identity matrix.
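This can be verified quickly in Octave with two hypothetical 2 × 2 matrices:

A = [1 2; 3 4];
B = [0 1; 1 0];
A * B   % gives [2 1; 4 3]
B * A   % gives [3 4; 1 2], different from A * B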

A matrix is called symmetric if $A^T = A$, and so it follows that only square matrices can be symmetric. A symmetric matrix $A \in \mathbb{R}^{m \times m}$ is called positive definite if for all non-zero vectors $v \in \mathbb{R}^m$ the following expression is true:

\[
p = v^T \cdot A \cdot v > 0 \tag{E.1}
\]

For a square matrix $A \in \mathbb{R}^{m \times m}$ there may exist a square matrix $B \in \mathbb{R}^{m \times m}$ that, multiplied by the previous one, gives the identity matrix, namely:

\[
A \cdot B = I \tag{E.2}
\]

This special matrix $B$ is commonly indicated simply by $A^{-1}$. The existence of this matrix is not assured and depends on the values of the entries of $A$: in particular, a matrix has an inverse if and only if it is not singular, i.e. if its determinant is not zero. To calculate the determinant it is possible to use the Leibniz formula, defined as follows:

\[
\det\left( A \right) = \sum_{\sigma \in S_m} \operatorname{sgn}(\sigma) \cdot \prod_{i=1}^{m} a_{i,\sigma(i)} \tag{E.3}
\]

where $\sigma$ is one of the $m!$ permutations of the numbers in the set $[1, \ldots, m]$, which are all grouped in the set of permutations $S_m$. The sign of $\sigma$ depends on the number of switches between two numbers that must be applied to the permutation to recover the original ordered list of numbers: if this number of switches is odd the sign is negative, if it is even the sign is positive. For example, the permutation $(2, 1, 3)$ requires a single switch, so its sign is negative.

It is also possible to define the determinant by the expansion of a row (or column) through the Laplace formula. Selecting the $i$th row, the formula is:

\[
\det\left( A \right) = \sum_{j=1}^{m} (-1)^{i+j} \cdot a_{i,j} \cdot d_{i,j} \tag{E.4}
\]

where $d_{i,j}$ is the determinant of the matrix obtained from $A$ by removing the $i$th row and the $j$th column. This formula is very simple and so it is often used in simple applications or where the matrices are relatively small. It is anyway proved that the time needed for its calculation grows very quickly with the dimension of the matrix.
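As an illustration only, a possible recursive Octave implementation of the Laplace expansion of Equation (E.4) along the first row could look as follows (the function name is hypothetical):

function d = laplaceDet(A)
  m = size(A, 1);
  if m == 1
    d = A(1, 1);                       % base case: 1 x 1 matrix
    return;
  end
  d = 0;
  for j = 1:m
    minor = A(2:m, [1:j-1, j+1:m]);    % remove the 1st row and the jth column
    d = d + (-1)^(1 + j) * A(1, j) * laplaceDet(minor);
  end
endfunction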

For matrices bigger than 3 × 3 it is faster to express (if possible) the matrix as the multiplication of a lower triangular matrix (L) and an upper triangular matrix (U) through LU factorization (see Appendix E.4). In fact, in that case the determinant is:

\[
\det\left( A \right) = \det\left( L \right) \cdot \det\left( U \right) = \det\left( U \right) \tag{E.5}
\]

as $\det\left( L \right) = 1$. Computing the determinant of $U$ is fast, as the determinant of a triangular matrix is equal to the product of the entries on its main diagonal.
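A corresponding Octave sketch of the LU-based calculation of Equation (E.5); note that Octave's lu() returns a permutation matrix P such that P * A = L * U, so the sign of det(P) must also be taken into account:

[L, U, P] = lu(A);            % P * A = L * U, with L unit lower triangular
d = det(P) * prod(diag(U));   % det(P) is +1 or -1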

For small matrices the direct expressions of the determinant are presented here:

\[
\det \begin{pmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{pmatrix} = a_{1,1} \cdot a_{2,2} - a_{1,2} \cdot a_{2,1} \tag{E.6}
\]

\[
\begin{aligned}
\det \begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{pmatrix} = \; & a_{1,1} \cdot a_{2,2} \cdot a_{3,3} + a_{1,2} \cdot a_{2,3} \cdot a_{3,1} + a_{1,3} \cdot a_{3,2} \cdot a_{2,1} \; + && \text{(E.7)} \\
& - a_{1,1} \cdot a_{2,3} \cdot a_{3,2} - a_{1,2} \cdot a_{2,1} \cdot a_{3,3} - a_{1,3} \cdot a_{2,2} \cdot a_{3,1} && \text{(E.8)}
\end{aligned}
\]

It is obvious from Equation (E.3) that the number of monomials that must be added (and subtracted) to get the determinant is exactly $m!$, and so it grows very fast with the dimension of the matrix.

For example, a few tests performed on a computer with an AMD Athlon™ 64 3000+ processor running at 1.81 GHz and 2 GByte of RAM gave the results shown in Table E.1.

Table E.1: Speed test on two different algorithms to calculate the determinant of a square matrix

matrix dimension    Laplace time [s]    LU factorization time [s]
        1           0.0000022           0.0000042
        2           0.0000017           0.0000064
        3           0.000010            0.000011
        4           0.000023            0.000017
        5           0.00011             0.000025
        6           0.00069             0.000035
        7           0.0052              0.000048
        8           0.038               0.000064
        9           0.35                0.00011
       10           3.5                 0.00013
       11           38.7                0.00018
       12           465.0               0.00022

The results are shown only to present the big difference between the two approaches: the Laplace algorithm obviously gets slower much faster than the calculation of the determinant by LU factorization as the matrix dimension grows.
