
Least Squares Estimation


Systems and Control Theory Lecture Notes

Laura Giarré


Lesson 2: Least Squares Estimation

• Least Squares Estimation

• Computing the LS

• Projection Theorem


Least Squares Estimation

• Consider a system of m equations in n unknowns, with m > n, of the form

y = Ax.

• Assume that the system is inconsistent: there are more equations than unknowns, and the equations are not linear combinations of one another.

• Under these conditions, there is no x such that y − Ax = 0.

However, one can define the error e = y − Ax and look for the x that minimizes ‖e‖.

• This is a least squares problem:

min_x ‖e‖₂² = min_x ‖y − Ax‖₂².

The optimal x is the least squares estimate.
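As a quick numerical illustration (a minimal sketch of my own, not part of the original notes; the matrix A and vector y below are made-up example data), the following Python/NumPy snippet sets up an inconsistent system with m = 4 > n = 2 and computes the minimizer of ‖y − Ax‖₂:

import numpy as np

# Made-up overdetermined system: m = 4 equations, n = 2 unknowns
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([0.1, 0.9, 2.2, 2.8])   # y is not in R(A), so y = Ax has no exact solution

# Least squares estimate: argmin_x ||y - A x||_2
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print("x_hat =", x_hat)
print("||y - A x_hat||^2 =", np.sum((y - A @ x_hat) ** 2))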


Computing the Least Squares Estimate (LS)

• The set M := {z ∈ ℝᵐ : z = Ax, x ∈ ℝⁿ} is a subspace of ℝᵐ, called the range of A, R(A), i.e., the set of all vectors obtained as linear combinations of the columns of A.

• Now we are looking for the element of M that is closest to y in the 2-norm. The solution x̂ is characterized by the orthogonality condition

e = (y − Ax̂) ⊥ R(A).

• In particular, if aᵢ is the i-th column of A, then

(y − Ax̂) ⊥ R(A)  ⇔  aᵢᵀ(y − Ax̂) = 0 for i = 1, ..., n  ⇔  Aᵀ(y − Ax̂) = 0  ⇔  AᵀAx̂ = Aᵀy.

• The LS solution is therefore

x̂ = (AᵀA)⁻¹Aᵀy.
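A minimal sketch (not from the notes, reusing the made-up A and y from the previous example) of solving the normal equations AᵀAx̂ = Aᵀy directly, and cross-checking against NumPy's least squares routine:

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([0.1, 0.9, 2.2, 2.8])

# Normal equations: A^T A x = A^T y (solve the system instead of forming the inverse explicitly)
x_hat = np.linalg.solve(A.T @ A, A.T @ y)

# Cross-check with the library routine
x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x_hat, x_ref)

# The residual is orthogonal to the columns of A: A^T (y - A x_hat) ≈ 0
print(A.T @ (y - A @ x_hat))

Solving the linear system with np.linalg.solve avoids forming (AᵀA)⁻¹ explicitly, which is the usual numerical practice.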


The Gram product

• Let us take a more abstract look at this problem, e.g., to address the case in which the data vector y is infinite-dimensional.

• Given an array of n_A vectors A = [a₁ | ... | a_{n_A}] and an array of n_B vectors B = [b₁ | ... | b_{n_B}], both from an inner product space V, define the Gram product ≺A, B≻ as the n_A × n_B matrix whose (i, j) entry is ⟨aᵢ, bⱼ⟩.

• For the usual Euclidean inner product in an m-dimensional space,

≺A, B≻ = AᵀB.

• Symmetry and linearity of the inner product imply symmetry and linearity of the Gram product.
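A small sketch (my own illustration, with random vectors) of the Gram product for the Euclidean inner product, building the matrix entry by entry from ⟨aᵢ, bⱼ⟩ and checking that it coincides with AᵀB:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))   # n_A = 3 vectors of dimension m = 5, stored as columns
B = rng.normal(size=(5, 2))   # n_B = 2 vectors of dimension m = 5

# Gram product: (i, j) entry is the inner product <a_i, b_j>
gram = np.array([[A[:, i] @ B[:, j] for j in range(B.shape[1])]
                 for i in range(A.shape[1])])

# For the Euclidean inner product this is exactly A^T B
assert np.allclose(gram, A.T @ B)
print(gram)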


The Least Squares Estimation Problem

• Consider again the problem of computing

min_{x ∈ ℝⁿ} ‖y − Ax‖₂ = min_{ŷ ∈ R(A)} ‖y − ŷ‖₂.

• y can be an infinite-dimensional vector, as long as n is finite.

• We assume that the columns of A = [a₁, a₂, ..., aₙ] are independent.

Lemma (Gram matrix). The columns of a matrix A are independent ⇔ ≺A, A≻ is invertible.


Proof of Lemma

• If the columns are dependent, then there is η ≠ 0 such that Aη = Σⱼ aⱼηⱼ = 0. But then Σⱼ ⟨aᵢ, aⱼ⟩ηⱼ = 0 for every i, by the linearity of the inner product.

That is, ≺A, A≻η = 0, and hence ≺A, A≻ is not invertible.

• Conversely, if ≺A, A≻ is not invertible, then ≺A, A≻η = 0 for some η ≠ 0.

In other words, ηᵀ≺A, A≻η = ⟨Aη, Aη⟩ = ‖Aη‖² = 0, and hence Aη = 0, i.e., the columns are dependent.
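A quick numerical check of the lemma (a sketch with made-up matrices): when the columns are dependent the Gram matrix ≺A, A≻ = AᵀA is singular, and a null vector of A is also a null vector of AᵀA:

import numpy as np

# Dependent columns: the third column is the sum of the first two
A_dep = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 1.0, 2.0],
                  [2.0, 0.0, 2.0]])
A_ind = A_dep[:, :2]          # independent columns

print(np.linalg.matrix_rank(A_dep.T @ A_dep))   # 2 < 3: Gram matrix not invertible
print(np.linalg.matrix_rank(A_ind.T @ A_ind))   # 2 = 2: Gram matrix invertible

# eta = (1, 1, -1) satisfies A eta = 0 and therefore (A^T A) eta = 0
eta = np.array([1.0, 1.0, -1.0])
print(A_dep @ eta, (A_dep.T @ A_dep) @ eta)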


The Projection Theorem and Least Squares Estimation 1

• y has a unique decomposition y = y₁ + y₂, where y₁ ∈ R(A) and y₂ ∈ R(A)⊥.

• To find this decomposition, let y₁ = Aα for some α ∈ ℝⁿ. Then ensure that y₂ = y − y₁ ∈ R(A)⊥. For this to be true,

⟨aᵢ, y − Aα⟩ = 0, i = 1, ..., n, i.e.,

≺A, y − Aα≻ = 0.

• Rearranging, we get ≺A, A≻α = ≺A, y≻.

• If the columns of A are independent,

α = ≺A, A≻⁻¹ ≺A, y≻.
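A sketch (illustrative only, again with the made-up A and y from above) of this decomposition: compute α = ≺A, A≻⁻¹≺A, y≻ for the Euclidean inner product and verify that y₂ = y − Aα is orthogonal to every column of A:

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([0.1, 0.9, 2.2, 2.8])

# alpha solves <<A, A>> alpha = <<A, y>>; for the Euclidean inner product <<A, B>> = A^T B
alpha = np.linalg.solve(A.T @ A, A.T @ y)

y1 = A @ alpha        # component of y in R(A)
y2 = y - y1           # component of y in R(A)-perp
print(A.T @ y2)       # ≈ 0: y2 is orthogonal to the columns of A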


The Projection Theorem and Least Squares Estimation 2

• Decompose e = e₁ + e₂ (e₁ ∈ R(A), e₂ ∈ R(A)⊥).

• Note that ‖e‖² = ‖e₁‖² + ‖e₂‖².

• Rewrite e = y − Ax as

e₁ + e₂ = y₁ + y₂ − Ax, i.e.,

e₂ − y₂ = y₁ − e₁ − Ax.

• Each side must be 0, since the two sides lie in orthogonal subspaces!

• e₂ = y₂ – we can't do anything about it.

• e₁ = y₁ − Ax = A(α − x) – minimized by choosing x = α, which gives

x̂ = ≺A, A≻⁻¹ ≺A, y≻.
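To illustrate (a hypothetical check, not part of the notes) that the residual can never be smaller than ‖y₂‖ and that x = α attains this minimum, one can compare the cost at x̂ = α with the cost at randomly perturbed points:

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([0.1, 0.9, 2.2, 2.8])

alpha = np.linalg.solve(A.T @ A, A.T @ y)
y2 = y - A @ alpha                        # component of y orthogonal to R(A)

cost = lambda x: np.sum((y - A @ x) ** 2)

# At x = alpha the error is exactly y2; any other x can only increase the cost
assert np.allclose(y - A @ alpha, y2)
rng = np.random.default_rng(1)
for _ in range(5):
    x_other = alpha + rng.normal(scale=0.5, size=alpha.shape)
    assert cost(x_other) >= cost(alpha)
print("minimum cost ||y2||^2 =", cost(alpha))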


Examples

• If y, e ∈ ℝᵐ and it is desired to minimize

‖e‖² = eᵀe = Σᵢ₌₁ᵐ |eᵢ|²,

then x̂ = (AᵀA)⁻¹Aᵀy.

(If the columns of A are mutually orthogonal, AᵀA is diagonal, and inversion is easy.)

• If y, e ∈ ℝᵐ and it is desired to minimize eᵀSe, where S is a Hermitian, positive-definite matrix, then x̂ = (AᵀSA)⁻¹AᵀSy.

• Note that if S is diagonal, then eᵀSe = Σᵢ₌₁ᵐ sᵢᵢ|eᵢ|², i.e., we are minimizing a weighted least squares criterion. A large sᵢᵢ penalizes the i-th component of the error more.

• In a stochastic setting, the weight matrix S should be related to the noise covariance, i.e.,

S = (E[eeᵀ])⁻¹.
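A sketch of the weighted case (illustrative; the weights below are made up), computing x̂ = (AᵀSA)⁻¹AᵀSy for a diagonal S and comparing with the unweighted estimate:

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([0.1, 0.9, 2.2, 2.8])

# Diagonal weights: the fourth measurement is trusted ten times more than the others
S = np.diag([1.0, 1.0, 1.0, 10.0])

# Weighted LS: x_hat = (A^T S A)^{-1} A^T S y
x_wls = np.linalg.solve(A.T @ S @ A, A.T @ S @ y)

# Ordinary LS (S = I) for comparison
x_ls = np.linalg.solve(A.T @ A, A.T @ y)
print("weighted:", x_wls, "  unweighted:", x_ls)

The large weight pulls the fit toward the fourth data point; in a stochastic setting S would instead be taken as the inverse of the noise covariance, as noted above.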


Thanks

DIEF – laura.giarre@unimore.it
