Systems and Control Theory Lecture Notes
Laura Giarré
Lesson 22: Identification in Practice, Model Validation
Identification in Practice
Model Validation
Identification in Practice
What do we know?
We know methods for identifying models inside a priori given model structures.
How can we use this knowledge to provide a model for the plant and the process noise with reasonable accuracy?
Pretreatment of data: remove the bias (which may not be due to the inputs); filter the high-frequency noise.
Introduce filtered errors to emphasize a certain frequency range.
The Identification Experiment
[Flowchart, labels recovered from the original figure: a priori information feeds the choice of the class of models M, of the selection criteria V(z, θ), and of the design of the experiment; data z are collected; an algorithm estimates the model μ(θ); the model is then validated: NO sends the procedure back through the loop, YES accepts the model.]
Parametric Estimate
Constrained minimum:
\[
\min_{\theta \in \Theta} V(\theta) = \min_{\theta \in \Theta} \frac{1}{N} \sum_{k=1}^{N} \varepsilon^2(k|\theta),
\qquad
\hat{\theta} = \arg\min_{\theta \in \Theta} V(\theta)
\]
ARX: \( \varepsilon(k|\theta) = y(k) - \varphi^T(k)\theta \)
\[
\frac{\partial V}{\partial \theta} = -\frac{2}{N} \sum_{k=1}^{N} \varphi(k)\bigl(y(k) - \varphi^T(k)\theta\bigr)
\]
\[
\frac{\partial^2 V}{\partial \theta^2} = \frac{2}{N} \sum_{k=1}^{N} \varphi(k)\varphi^T(k) \ge 0
\]
LS algorithm
Solution (normal equations):
\[
\Bigl(\sum_{k=1}^{N} \varphi(k)\varphi^T(k)\Bigr)\theta = \sum_{k=1}^{N} \varphi(k)\,y(k)
\]
Then
\[
\hat{\theta} = \Bigl(\sum_{k=1}^{N} \varphi(k)\varphi^T(k)\Bigr)^{-1} \sum_{k=1}^{N} \varphi(k)\,y(k)
\]
Notice that if we multiply both members by 1/N, the sums become the sample estimates of the expected values:
\[
\hat{\theta} = \bigl(\hat{E}[\varphi\varphi^T]\bigr)^{-1} \hat{E}[\varphi y]
\]
Moreover,
\[
V(\theta) = \frac{1}{N} \sum_{k=1}^{N} \varepsilon^2(k|\theta)
\]
is a sample mean and converges (for N → ∞) to E[ε²(k|θ)].
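As a concrete sketch of the LS algorithm above, here is a minimal NumPy version (the first-order simulated system, its coefficients, and all numerical values are illustrative assumptions, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an assumed first-order ARX system:
#   y(k) = 0.7*y(k-1) + 1.5*u(k-1) + e(k)
N = 2000
u = rng.standard_normal(N)           # white input: sufficiently exciting
e = 0.1 * rng.standard_normal(N)     # white measurement noise
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.7 * y[k - 1] + 1.5 * u[k - 1] + e[k]

# Regressor matrix: rows phi(k)^T = [y(k-1), u(k-1)]
# (sign convention chosen so that theta = [0.7, 1.5] directly)
Phi = np.column_stack([y[:-1], u[:-1]])
Y = y[1:]

# LS solution of the normal equations (sum phi phi^T) theta = sum phi y
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)
print(theta_hat)   # close to the true parameters [0.7, 1.5]
```

With a white input and N large, the sample estimate converges to the true parameter vector, as the convergence remark above anticipates.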
Identifiability
Structural Identifiability: a class of models M = {μ(θ) : θ ∈ Θ ⊂ R^n} is structurally identifiable at θ_0 if
\[
\{\forall \theta \in \Theta : G(z, \theta) = G(z, \theta_0),\ H(z, \theta) = H(z, \theta_0)\} \ \Rightarrow\ \theta = \theta_0
\]
Theorem: a class of models M = {μ(θ) : θ ∈ Θ ⊂ R^n} is structurally identifiable at θ_0 if G(z, θ_0) and H(z, θ_0) present no pole/zero cancellation.
Identifiability
Experimental identifiability is guaranteed when R = E[φ(k)φ^T(k)] is invertible (positive definite).
Theorem: for a generic regressor, a white process as input guarantees experimental identifiability.
Persistent Excitation
Definition:
A quasi-stationary input, u, is persistently exciting (p.e.) of order n if the matrix
\[
\bar{R}_n =
\begin{bmatrix}
R_u(0) & \cdots & R_u(n-1) \\
\vdots & \ddots & \vdots \\
R_u(n-1) & \cdots & R_u(0)
\end{bmatrix}
\]
is positive definite.
Theorem:
Let u be a quasi-stationary input of dimension n u , with spectrum Φ u (ω). Assume that Φ u (ω) > 0 for at least n distinct frequencies. Then u is p.e. of order n.
Theorem (Scalar):
If u is p.e. of order n, then Φ_u(e^{jω}) ≠ 0 in at least n points.
Lemma
For an ARX system, with a regressor of past inputs and outputs
\[
\varphi(k) = [\,-y(k-1)\ \ldots\ -y(k-n_a)\ \ u(k-d)\ \ldots\ u(k-d-n_b+1)\,]^T,
\]
then R > 0 if
1) A, B do not have common roots (no hidden modes);
2) u(k) is p.e. of order n_a + n_b = dim(φ) = dim(θ).
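The p.e. definition above can be checked numerically by building the Toeplitz matrix from sample autocovariances and testing positive definiteness. This is a sketch (the helper names, the tolerance, and the test signals are illustrative assumptions):

```python
import numpy as np

def sample_autocov(u, n):
    """Biased sample autocovariance R_u(tau), for tau = 0..n-1."""
    u = u - u.mean()
    N = len(u)
    return np.array([np.dot(u[: N - t], u[t:]) / N for t in range(n)])

def pe_order_at_least(u, n, tol=1e-2):
    """Test whether u is p.e. of order n: the n x n Toeplitz matrix
    built from R_u must be positive definite (smallest eigenvalue
    above a tolerance relative to R_u(0), to absorb sampling error)."""
    r = sample_autocov(u, n)
    Rbar = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])
    return np.linalg.eigvalsh(Rbar).min() > tol * r[0]

rng = np.random.default_rng(1)
white = rng.standard_normal(5000)
sine = np.sin(0.5 * np.arange(5000))

print(pe_order_at_least(white, 10))  # True: white noise is p.e. of any order
print(pe_order_at_least(sine, 2))    # True: one sinusoid has two spectral lines
print(pe_order_at_least(sine, 3))    # False: not enough spectral content
```

This matches the spectral theorem above: a single sinusoid contributes two points where Φ_u > 0, so it is p.e. of order 2 but not 3, while white noise has a flat spectrum and is p.e. of any order.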
Identification toolbox (LJUNG)
ARX(na, nb, nk): na is the number of parameters in A(z), nb the number of parameters in B(z), and nk the I/O delay (usually equal to 1).
ARMAX(na, nb, nc, nk): nc is the number of parameters in C(z).
OE(nb, nf, nk): nf is the number of parameters in F(z).
BJ(nb, nc, nd, nf, nk): nd is the number of parameters in D(z).
Input Signals
Commonly used signals: step function; pseudorandom binary sequence (PRBS); periodic signals (sums of sinusoids).
Keep the sufficient-excitation conditions in mind!
A pseudorandom binary sequence is a periodic signal that switches between two symmetric levels (±) in a certain fashion, with period M.
The spectrum of a PRBS approximates that of white noise.
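A maximal-length PRBS is typically generated with a linear feedback shift register; the following is a minimal sketch (the `prbs` helper, the tap choice, and the ±1 levels are illustrative assumptions):

```python
import numpy as np

def prbs(n_bits, length, taps, levels=(-1.0, 1.0)):
    """PRBS from a Fibonacci linear feedback shift register.
    With a primitive feedback polynomial the period is 2**n_bits - 1."""
    state = [1] * n_bits                  # any nonzero seed works
    out = []
    for _ in range(length):
        out.append(levels[state[-1]])     # output the last register bit
        fb = state[taps[0]] ^ state[taps[1]]
        state = [fb] + state[:-1]         # shift and feed back
    return np.array(out)

# taps (3, 2) realize the primitive polynomial x^4 + x^3 + 1: period 15
u = prbs(4, 60, taps=(3, 2))
```

Over one period a maximal-length PRBS takes one level 2^{n_bits−1} times and the other 2^{n_bits−1}−1 times, which is why its autocovariance, and hence its spectrum, approximates that of white noise.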
Model structure selection
Pick a model structure (or model structures):
Which one is better?
How can you decide which one reflects the real system?
Is there any advantage from picking a model with a large number of parameters, if the input is exciting only a smaller number of frequency points?
What are the important quantities that can be computed
directly from the data (inputs & outputs), that are important
to identification?
Model class selection problem
Find the model class whose optimal model is the Best one for our purpose.
Three steps:
1. A priori knowledge and data analysis
2. Comparing different model structures
3. Residual analysis
A priori knowledge
Exploit physical knowledge on the system (Grey-Box modeling)
Analyze the sample correlations \( \hat{R}^N_y(\tau) \) and \( \hat{R}^N_{yu}(\tau) \) to understand whether the terms y(k − τ) or u(k − τ) need to be inserted in the model.
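The correlation analysis above can be sketched numerically (the simulated system, its input delay, and the helper name are illustrative assumptions): a peak of the cross-covariance at lag τ = 2 reveals that u(k − 2) should enter the model.

```python
import numpy as np

def sample_cov(a, b, tau):
    """Biased sample cross-covariance (1/N) * sum_k a(k) b(k - tau)."""
    a = a - a.mean()
    b = b - b.mean()
    N = len(a)
    return np.dot(a[tau:], b[: N - tau]) / N

rng = np.random.default_rng(2)
N = 4000
u = rng.standard_normal(N)
y = np.zeros(N)
for k in range(2, N):
    # assumed system: y depends on y(k-1) and on u with a delay of 2
    y[k] = 0.6 * y[k - 1] + 1.0 * u[k - 2] + 0.05 * rng.standard_normal()

Ry = [sample_cov(y, y, t) for t in range(5)]
Ryu = [sample_cov(y, u, t) for t in range(5)]
# Ryu peaks at tau = 2, suggesting the term u(k-2); Ry decays
# geometrically, suggesting autoregressive terms y(k - tau)
```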
Comparing different model structures 1
Simplest idea: use the BEST FIT as a measure of the quality of the model:
\[
V(\hat{\theta}, Z^N) = \frac{1}{N} \sum_{k=1}^{N} \bigl(y(k) - \hat{y}(k|k-1)\bigr)^2
\]
Since V(θ̂, Z^N), with θ̂ ∈ R^d, is monotonically non-increasing in d (a richer model structure can only fit the identification data better), the knee of the curve of V vs. d suggests a good trade-off between fit of the data (= predictive ability of the model) and model complexity (= number of parameters).
Recall that parsimony is always part of the overall design specification.
Comparing different model structures 3
A more quantitative approach: account for the bias/variance trade-off by optimizing a functional
\[
J(\hat{\theta}) = V(\hat{\theta}, Z^N)\,(1 + U(d))
\]
where U(d) is a monotonically increasing function of d. Possible choices of U(d):
1. Akaike Information Criterion (AIC): U(d) = 2d/N
2. Minimum Description Length (MDL): U(d) = d log(N)/N
Alternatively, use a longer prediction horizon and compute
\[
J_l = \frac{1}{N} \sum_{k=1}^{N} \bigl(y(k) - \hat{y}(k|k-l)\bigr)^2
\]
or compute the simulation error: if \( \hat{y}_s(k) = G(z, \hat{\theta})\,u(k) \), then
\[
J_s = \frac{1}{N} \sum_{k=1}^{N} \bigl(y(k) - \hat{y}_s(k)\bigr)^2
\]
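The AIC/MDL order-selection recipe above can be sketched as follows (the simulated second-order system and the nested ARX(n, n, 1) model family are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1000
u = rng.standard_normal(N)
y = np.zeros(N)
for k in range(2, N):   # assumed true system: ARX with na = nb = 2
    y[k] = 1.2*y[k-1] - 0.5*y[k-2] + u[k-1] + 0.4*u[k-2] \
           + 0.2*rng.standard_normal()

def arx_loss(y, u, n):
    """LS fit of an ARX(n, n, 1) model; returns V(theta_hat, Z^N)."""
    Nd = len(y)
    Phi = np.array([np.concatenate([-y[k-n:k][::-1], u[k-n:k][::-1]])
                    for k in range(n, Nd)])
    Y = y[n:]
    theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return np.mean((Y - Phi @ theta) ** 2)

orders = list(range(1, 7))
best = {}
for name, U in [("AIC", lambda d: 2 * d / N),
                ("MDL", lambda d: d * np.log(N) / N)]:
    J = [arx_loss(y, u, n) * (1 + U(2 * n)) for n in orders]  # d = 2n
    best[name] = orders[int(np.argmin(J))]
print(best)   # both criteria should point near the true order n = 2
```

Note how V alone keeps shrinking as n grows, while the penalized functional J turns back up past the true order; MDL, with its stronger log(N)/N penalty, is the more conservative of the two.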
Comparing different model structures 4
Compare different model classes according to the following measure of FIT:
\[
\mathrm{FIT} = 100 \left( 1 - \frac{J}{\frac{1}{N} \sum_{k=1}^{N} \bigl(y(k) - \bar{y}\bigr)^2} \right)
\]
where J is the functional used for the selection test (such as V, J_s or J_l) and
\( \bar{y} = \frac{1}{N} \sum_{k=1}^{N} y(k) \) is the average value.
Notice that for OE models J_s = V.
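A direct transcription of this FIT measure (following the notes' definition with the mean-squared functional J; toolbox implementations often use a root-mean-square variant instead, so the numbers need not coincide):

```python
import numpy as np

def fit_percent(y, y_hat):
    """FIT = 100 * (1 - J / ((1/N) * sum_k (y(k) - ybar)^2)),
    with J the mean squared prediction (or simulation) error."""
    J = np.mean((y - y_hat) ** 2)
    denom = np.mean((y - y.mean()) ** 2)
    return 100.0 * (1.0 - J / denom)

y = np.array([1.0, 2.0, 3.0, 4.0])
print(fit_percent(y, y))                       # 100.0: perfect prediction
print(fit_percent(y, np.full(4, y.mean())))    # 0.0: no better than the mean
```

FIT = 100 means a perfect fit, FIT = 0 means the model predicts no better than the sample average, and negative values are possible for models worse than the average.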
Validation data different from identification data
The data set used for validation must be different from the set used for identification.
Suppose two data sets are available (if not, divide the data set into two parts):
\[
Z_1^N = \{u_1(1), y_1(1), \ldots, u_1(N), y_1(N)\}
\]
\[
Z_2^N = \{u_2(1), y_2(1), \ldots, u_2(N), y_2(N)\}
\]
IDEA: use Z_1^N for identification and Z_2^N for validation.
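A sketch of this idea, splitting one simulated record into an identification half and a validation half (the first-order system and all values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1000
u = rng.standard_normal(2 * N)
y = np.zeros(2 * N)
for k in range(1, 2 * N):   # assumed system, used only to generate data
    y[k] = 0.8 * y[k - 1] + u[k - 1] + 0.1 * rng.standard_normal()

# Z1 (first half) for identification, Z2 (second half) for validation
u1, y1 = u[:N], y[:N]
u2, y2 = u[N:], y[N:]

# Identify an ARX(1, 1, 1) model on Z1 by least squares
Phi1 = np.column_stack([y1[:-1], u1[:-1]])
theta = np.linalg.solve(Phi1.T @ Phi1, Phi1.T @ y1[1:])

# Validate: one-step prediction error on the fresh data Z2
Phi2 = np.column_stack([y2[:-1], u2[:-1]])
V_val = np.mean((y2[1:] - Phi2 @ theta) ** 2)
print(theta, V_val)   # parameters near [0.8, 1.0]; V_val near the noise variance
```

Evaluating the loss on fresh data guards against overfitting: an over-parameterized model can drive V down on the identification set while its validation loss grows.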