Systems and Control Theory Lecture Notes
Laura Giarré
Lesson 22: Identification in Practice, Model Validation
Identification in Practice
Model Validation
Identification in Practice
What do we know?
We know methods for identifying models inside a priori given model structures.
How can we use this knowledge to provide a model for the plant and the process noise with reasonable accuracy?
Pretreatment of data: remove the bias (which may not be due to the inputs); filter the high-frequency noise.
Introduce filtered errors to emphasize a certain frequency range.
The Identification Experiment
[Flowchart, labels recovered from the original figure: a priori information feeds the choice of the class of models M, of the selection criteria V(z, θ), and of the design of the experiment; data z are collected; an algorithm estimates the model μ(θ); the model is then validated: NO sends the procedure back through the loop, YES accepts the model.]
Parametric Estimate
Constrained minimum:
\[
\min_{\theta \in \Theta} V(\theta) = \min_{\theta \in \Theta} \frac{1}{N} \sum_{k=1}^{N} \varepsilon^2(k|\theta),
\qquad
\hat{\theta} = \arg\min_{\theta \in \Theta} V(\theta)
\]
ARX: \( \varepsilon(k|\theta) = y(k) - \varphi^T(k)\theta \)
\[
\frac{\partial V}{\partial \theta} = -\frac{2}{N} \sum_{k=1}^{N} \varphi(k)\bigl(y(k) - \varphi^T(k)\theta\bigr)
\]
\[
\frac{\partial^2 V}{\partial \theta^2} = \frac{2}{N} \sum_{k=1}^{N} \varphi(k)\varphi^T(k) \ge 0
\]
LS algorithm
Solution (normal equations):
\[
\Bigl(\sum_{k=1}^{N} \varphi(k)\varphi^T(k)\Bigr)\theta = \sum_{k=1}^{N} \varphi(k)\,y(k)
\]
Then
\[
\hat{\theta} = \Bigl(\sum_{k=1}^{N} \varphi(k)\varphi^T(k)\Bigr)^{-1} \sum_{k=1}^{N} \varphi(k)\,y(k)
\]
Notice that if we multiply both members by 1/N, the sums become the sample estimates of the expected values:
\[
\hat{\theta} = \bigl(\hat{E}[\varphi\varphi^T]\bigr)^{-1} \hat{E}[\varphi y]
\]
Moreover,
\[
V(\theta) = \frac{1}{N} \sum_{k=1}^{N} \varepsilon^2(k|\theta)
\]
is a sample mean and converges (for N → ∞) to E[ε²(k|θ)].
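As a concrete sketch of the LS algorithm above, here is a minimal NumPy version (the first-order simulated system, its coefficients, and all numerical values are illustrative assumptions, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an assumed first-order ARX system:
#   y(k) = 0.7*y(k-1) + 1.5*u(k-1) + e(k)
N = 2000
u = rng.standard_normal(N)           # white input: sufficiently exciting
e = 0.1 * rng.standard_normal(N)     # white measurement noise
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.7 * y[k - 1] + 1.5 * u[k - 1] + e[k]

# Regressor matrix: rows phi(k)^T = [y(k-1), u(k-1)]
# (sign convention chosen so that theta = [0.7, 1.5] directly)
Phi = np.column_stack([y[:-1], u[:-1]])
Y = y[1:]

# LS solution of the normal equations (sum phi phi^T) theta = sum phi y
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)
print(theta_hat)   # close to the true parameters [0.7, 1.5]
```

With a white input and N large, the sample estimate converges to the true parameter vector, as the convergence remark above anticipates.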
Identifiability
Structural Identifiability: a class of models M = {μ(θ) : θ ∈ Θ ⊂ R^n} is structurally identifiable at θ_0 if
\[
\{\forall \theta \in \Theta : G(z, \theta) = G(z, \theta_0),\ H(z, \theta) = H(z, \theta_0)\} \ \Rightarrow\ \theta = \theta_0
\]
Theorem: a class of models M = {μ(θ) : θ ∈ Θ ⊂ R^n} is structurally identifiable at θ_0 if G(z, θ_0) and H(z, θ_0) present no pole/zero cancellation.
Identifiability
Experimental identifiability is guaranteed when R = E[φ(k)φ^T(k)] is invertible (positive definite).
Theorem: for a generic regressor, a white process as input guarantees experimental identifiability.
Persistent Excitation
Definition:
A quasi-stationary input, u, is persistently exciting (p.e.) of order n if the matrix
\[
\bar{R}_n =
\begin{bmatrix}
R_u(0) & \cdots & R_u(n-1) \\
\vdots & \ddots & \vdots \\
R_u(n-1) & \cdots & R_u(0)
\end{bmatrix}
\]
is positive definite.
Theorem:
Let u be a quasi-stationary input of dimension n u , with spectrum Φ u (ω). Assume that Φ u (ω) > 0 for at least n distinct frequencies. Then u is p.e. of order n.
Theorem (Scalar):
If u is p.e. of order n, then Φ_u(e^{jω}) ≠ 0 in at least n points.
Lemma
For an ARX system, with a regressor of past inputs and outputs
\[
\varphi(k) = [\,-y(k-1)\ \ldots\ -y(k-n_a)\ \ u(k-d)\ \ldots\ u(k-d-n_b+1)\,]^T,
\]
then R > 0 if
1) A, B do not have common roots (no hidden modes);
2) u(k) is p.e. of order n_a + n_b = dim(φ) = dim(θ).
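The p.e. definition above can be checked numerically by building the Toeplitz matrix from sample autocovariances and testing positive definiteness. This is a sketch (the helper names, the tolerance, and the test signals are illustrative assumptions):

```python
import numpy as np

def sample_autocov(u, n):
    """Biased sample autocovariance R_u(tau), for tau = 0..n-1."""
    u = u - u.mean()
    N = len(u)
    return np.array([np.dot(u[: N - t], u[t:]) / N for t in range(n)])

def pe_order_at_least(u, n, tol=1e-2):
    """Test whether u is p.e. of order n: the n x n Toeplitz matrix
    built from R_u must be positive definite (smallest eigenvalue
    above a tolerance relative to R_u(0), to absorb sampling error)."""
    r = sample_autocov(u, n)
    Rbar = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])
    return np.linalg.eigvalsh(Rbar).min() > tol * r[0]

rng = np.random.default_rng(1)
white = rng.standard_normal(5000)
sine = np.sin(0.5 * np.arange(5000))

print(pe_order_at_least(white, 10))  # True: white noise is p.e. of any order
print(pe_order_at_least(sine, 2))    # True: one sinusoid has two spectral lines
print(pe_order_at_least(sine, 3))    # False: not enough spectral content
```

This matches the spectral theorem above: a single sinusoid contributes two points where Φ_u > 0, so it is p.e. of order 2 but not 3, while white noise has a flat spectrum and is p.e. of any order.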
Identification toolbox (LJUNG)
ARX(na, nb, nk): na is the number of parameters in A(z), nb the number of parameters in B(z), and nk the I/O delay (usually equal to 1).
ARMAX(na, nb, nc, nk): nc is the number of parameters in C(z).
OE(nb, nf, nk): nf is the number of parameters in F(z).
BJ(nb, nc, nd, nf, nk): nd is the number of parameters in D(z).
Input Signals
Commonly used signals: step function; pseudorandom binary sequence (PRBS); periodic signals (sums of sinusoids).
Keep the sufficient-excitation conditions in mind!
A pseudorandom binary sequence is a periodic signal that switches between two symmetric levels (±) in a certain fashion, with period M.
The spectrum of a PRBS approximates that of white noise.
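A maximal-length PRBS is typically generated with a linear feedback shift register; the following is a minimal sketch (the `prbs` helper, the tap choice, and the ±1 levels are illustrative assumptions):

```python
import numpy as np

def prbs(n_bits, length, taps, levels=(-1.0, 1.0)):
    """PRBS from a Fibonacci linear feedback shift register.
    With a primitive feedback polynomial the period is 2**n_bits - 1."""
    state = [1] * n_bits                  # any nonzero seed works
    out = []
    for _ in range(length):
        out.append(levels[state[-1]])     # output the last register bit
        fb = state[taps[0]] ^ state[taps[1]]
        state = [fb] + state[:-1]         # shift and feed back
    return np.array(out)

# taps (3, 2) realize the primitive polynomial x^4 + x^3 + 1: period 15
u = prbs(4, 60, taps=(3, 2))
```

Over one period a maximal-length PRBS takes one level 2^{n_bits−1} times and the other 2^{n_bits−1}−1 times, which is why its autocovariance, and hence its spectrum, approximates that of white noise.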
Model structure selection
Pick a model structure (or model structures):
Which one is better?
How can you decide which one reflects the real system?
Is there any advantage from picking a model with a large number of parameters, if the input is exciting only a smaller number of frequency points?
What are the important quantities that can be computed
directly from the data (inputs & outputs), that are important
to identification?
Model class selection problem
Find the model class whose optimal model is the Best one for our purpose.
Three steps:
1. A priori knowledge and data analysis
2. Comparing different model structures
3. Residual analysis
A priori knowledge
Exploit physical knowledge on the system (Grey-Box modeling)
Analyze the sample correlations \( \hat{R}^N_y(\tau) \) and \( \hat{R}^N_{yu}(\tau) \) to understand whether the terms y(k − τ) or u(k − τ) need to be inserted in the model.
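The correlation analysis above can be sketched numerically (the simulated system, its input delay, and the helper name are illustrative assumptions): a peak of the cross-covariance at lag τ = 2 reveals that u(k − 2) should enter the model.

```python
import numpy as np

def sample_cov(a, b, tau):
    """Biased sample cross-covariance (1/N) * sum_k a(k) b(k - tau)."""
    a = a - a.mean()
    b = b - b.mean()
    N = len(a)
    return np.dot(a[tau:], b[: N - tau]) / N

rng = np.random.default_rng(2)
N = 4000
u = rng.standard_normal(N)
y = np.zeros(N)
for k in range(2, N):
    # assumed system: y depends on y(k-1) and on u with a delay of 2
    y[k] = 0.6 * y[k - 1] + 1.0 * u[k - 2] + 0.05 * rng.standard_normal()

Ry = [sample_cov(y, y, t) for t in range(5)]
Ryu = [sample_cov(y, u, t) for t in range(5)]
# Ryu peaks at tau = 2, suggesting the term u(k-2); Ry decays
# geometrically, suggesting autoregressive terms y(k - tau)
```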
Comparing different model structures 1
Simplest idea: use the BEST FIT as a measure of the quality of the model:
\[
V(\hat{\theta}, Z^N) = \frac{1}{N} \sum_{k=1}^{N} \bigl(y(k) - \hat{y}(k|k-1)\bigr)^2
\]
Since V(θ̂, Z^N), with θ̂ ∈ R^d, is monotonically non-increasing in d (a richer model structure can only fit the identification data better), the knee of the curve of V vs. d suggests a good trade-off between fit of the data (= predictive ability of the model) and model complexity (= number of parameters).
Recall that parsimony is always part of the overall design specification.
Comparing different model structures 3
A more quantitative approach: account for the bias/variance trade-off by optimizing a functional
\[
J(\hat{\theta}) = V(\hat{\theta}, Z^N)\,(1 + U(d))
\]
where U(d) is a monotonically increasing function of d. Possible choices of U(d):
1. Akaike Information Criterion (AIC): U(d) = 2d/N
2. Minimum Description Length (MDL): U(d) = d log(N)/N
Alternatively, use a longer prediction horizon and compute
\[
J_l = \frac{1}{N} \sum_{k=1}^{N} \bigl(y(k) - \hat{y}(k|k-l)\bigr)^2
\]
or compute the simulation error: if \( \hat{y}_s(k) = G(z, \hat{\theta})\,u(k) \), then
\[
J_s = \frac{1}{N} \sum_{k=1}^{N} \bigl(y(k) - \hat{y}_s(k)\bigr)^2
\]
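The AIC/MDL order-selection recipe above can be sketched as follows (the simulated second-order system and the nested ARX(n, n, 1) model family are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1000
u = rng.standard_normal(N)
y = np.zeros(N)
for k in range(2, N):   # assumed true system: ARX with na = nb = 2
    y[k] = 1.2*y[k-1] - 0.5*y[k-2] + u[k-1] + 0.4*u[k-2] \
           + 0.2*rng.standard_normal()

def arx_loss(y, u, n):
    """LS fit of an ARX(n, n, 1) model; returns V(theta_hat, Z^N)."""
    Nd = len(y)
    Phi = np.array([np.concatenate([-y[k-n:k][::-1], u[k-n:k][::-1]])
                    for k in range(n, Nd)])
    Y = y[n:]
    theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return np.mean((Y - Phi @ theta) ** 2)

orders = list(range(1, 7))
best = {}
for name, U in [("AIC", lambda d: 2 * d / N),
                ("MDL", lambda d: d * np.log(N) / N)]:
    J = [arx_loss(y, u, n) * (1 + U(2 * n)) for n in orders]  # d = 2n
    best[name] = orders[int(np.argmin(J))]
print(best)   # both criteria should point near the true order n = 2
```

Note how V alone keeps shrinking as n grows, while the penalized functional J turns back up past the true order; MDL, with its stronger log(N)/N penalty, is the more conservative of the two.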
Comparing different model structures 4
Compare different model classes according to the following measure of FIT:
\[
\mathrm{FIT} = 100 \left( 1 - \frac{J}{\frac{1}{N} \sum_{k=1}^{N} \bigl(y(k) - \bar{y}\bigr)^2} \right)
\]
where J is the functional used for the selection test (such as V, J_s or J_l) and
\( \bar{y} = \frac{1}{N} \sum_{k=1}^{N} y(k) \) is the average value.
Notice that for OE models J_s = V.
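A direct transcription of this FIT measure (following the notes' definition with the mean-squared functional J; toolbox implementations often use a root-mean-square variant instead, so the numbers need not coincide):

```python
import numpy as np

def fit_percent(y, y_hat):
    """FIT = 100 * (1 - J / ((1/N) * sum_k (y(k) - ybar)^2)),
    with J the mean squared prediction (or simulation) error."""
    J = np.mean((y - y_hat) ** 2)
    denom = np.mean((y - y.mean()) ** 2)
    return 100.0 * (1.0 - J / denom)

y = np.array([1.0, 2.0, 3.0, 4.0])
print(fit_percent(y, y))                       # 100.0: perfect prediction
print(fit_percent(y, np.full(4, y.mean())))    # 0.0: no better than the mean
```

FIT = 100 means a perfect fit, FIT = 0 means the model predicts no better than the sample average, and negative values are possible for models worse than the average.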
Validation data different from identification data
The data set used for validation must be different from the set used for identification.
Suppose two data sets are available (if not, divide the data set into two parts):
\[
Z_1^N = \{u_1(1), y_1(1), \ldots, u_1(N), y_1(N)\}
\]
\[
Z_2^N = \{u_2(1), y_2(1), \ldots, u_2(N), y_2(N)\}
\]
IDEA: use Z_1^N for identification and Z_2^N for validation.
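A sketch of this idea, splitting one simulated record into an identification half and a validation half (the first-order system and all values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1000
u = rng.standard_normal(2 * N)
y = np.zeros(2 * N)
for k in range(1, 2 * N):   # assumed system, used only to generate data
    y[k] = 0.8 * y[k - 1] + u[k - 1] + 0.1 * rng.standard_normal()

# Z1 (first half) for identification, Z2 (second half) for validation
u1, y1 = u[:N], y[:N]
u2, y2 = u[N:], y[N:]

# Identify an ARX(1, 1, 1) model on Z1 by least squares
Phi1 = np.column_stack([y1[:-1], u1[:-1]])
theta = np.linalg.solve(Phi1.T @ Phi1, Phi1.T @ y1[1:])

# Validate: one-step prediction error on the fresh data Z2
Phi2 = np.column_stack([y2[:-1], u2[:-1]])
V_val = np.mean((y2[1:] - Phi2 @ theta) ** 2)
print(theta, V_val)   # parameters near [0.8, 1.0]; V_val near the noise variance
```

Evaluating the loss on fresh data guards against overfitting: an over-parameterized model can drive V down on the identification set while its validation loss grows.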