IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 36, NO. 3, MARCH 1991, p. 259
Paradigms and Puzzles in the Theory of Dynamical Systems

Jan C. Willems, Fellow, IEEE

"What I have written here makes, in its details, no claim whatsoever to novelty." (Wittgenstein, Tractatus)
Abstract: We will outline the main features of a framework for a theory of dynamical systems. The basic ingredients form a triptych, with the behavior of a system in the center, and behavioral equations and latent variables as side panels. We will discuss a variety of representation and parametrization problems, in particular, questions related to input/output and state models. It is shown how the interconnection of systems fits into this framework, in particular, problems of feedback control. The final issue to which we pay attention is that of system identification. It is argued that exact system identification leads to the question of computing the most powerful unfalsified model.
I. INTRODUCTION

The purpose of this article is to give a self-contained exposition of an approach to mathematical models, in particular, to a theory of dynamical systems. This approach leads to an appealing triptych, with the behavior of a system playing the lead role in center stage, and with behavioral equations and latent variables as the important supporting characters. The framework presented incorporates problems of representation, questions of parametrization, and procedures for identification: algorithms for obtaining models from observed data. Our concept of a dynamical system leads to a new view of the notions of controllability and observability, and of the interconnection of systems, in particular, to what constitutes a feedback control law.
We will focus on dynamical systems because of their obvious importance in control problems. However, we will introduce the concepts in their exquisite simplicity and, therefore, full generality. Also, we will consciously avoid problems of a technical mathematical nature and concentrate instead on mathematical language, on issues related to system structure. This has as one consequence that the emphasis will be on discrete-time dynamical systems. Also, because we want to demonstrate that all this can lead to more than idle generality, we will present many specific results for discrete-time linear time-invariant systems. Our ambition, our hope, our conviction is that the approach and the framework presented here will prove useful and find acceptance on a very elementary pedagogical level. Indeed, the vantage point taken and the circle of ideas revealed here show, more than has been realized before, the unity which exists between elementary mathematical concepts and the first principles underlying mathematical models and dynamical systems.
General Models
Mathematical models as we encounter them in practice may be expressed by ordinary or partial differential equations, they may involve the language of graphs or lattice diagrams, or require the notion of a transfer function or a formal language. It may seem hard to find a common denominator in all this. A perceptive observation, one which can be attributed to control theory, is to look at systems, at devices, as black boxes. Thus, instead of trying to understand, in the tradition of physics, how a device is put together and the detail of how its components work, we are told to concentrate on how it behaves, on the way in which it interacts with its environment. It is this black box point of view which we will formalize in its ultimate generality. However, we will back off from the usual input/output setting, from the processor point of view, in which systems are seen as being influenced by inputs, acting as causes, and producing outputs through these inputs, the internal initial conditions, and the system dynamics. We will retreat from this position by considering all variables a priori on an equal footing and develop the input/output framework as a special case which in many situations can actually be deduced from the original model.

Manuscript received October 10, 1989; revised July 31, 1990. Paper recommended by Past Associate Editor at Large, J. Baillieul. This work was supported by the EEC under Contract SC10433C(A).
The author is with the Mathematics Institute, University of Groningen, 9700 AV Groningen, The Netherlands.
IEEE Log Number 9042134.

We proceed as follows. Assume that we have a phenomenon which we want to model. To start with, we cast the situation in the language of mathematics by assuming that the phenomenon
produces elements in a set U which we will call the universum. Elements of U will be called the outcomes of the phenomenon. Now, a (deterministic) mathematical model for the phenomenon (viewed purely from the behavioral, the black box point of view) claims that certain outcomes are possible, while others are not. Hence a model recognizes a certain subset B of U. This subset will be called the behavior (of the model). Formally:

Definition I.1: A mathematical model is a pair (U, B) with U the universum (its elements are called outcomes) and B ⊆ U the behavior.
Example: The fact that the absolute temperature is nonnegative may be viewed as modeling the phenomenon temperature. Thus U = ℝ and B = [−273, ∞) (with the temperature expressed in °C).
Example: During the ice age, shortly after Prometheus stole fire from the Gods, man realized that H₂O could appear, depending on the temperature, as water, steam, or ice. It took a while longer before this situation was captured in a mathematical model. The generally accepted model is as follows: U = {ice, water, steam} × [−273, ∞) and B = ({ice} × [−273, 0]) ∪ ({water} × [0, 100]) ∪ ({steam} × [100, ∞)).
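In code, a model in the sense of Definition I.1 is nothing more than a membership predicate on the universum. The following sketch (our own illustration, not from the paper) encodes the H₂O model this way; all function names are ours:

```python
# A mathematical model (U, B): the behavior B is a subset of the
# universum U, encoded here as a membership predicate.

def in_universum(outcome):
    phase, temp = outcome
    return phase in {"ice", "water", "steam"} and temp >= -273.0

def in_behavior(outcome):
    """The accepted model of H2O: the phase must match the temperature."""
    phase, temp = outcome
    if not in_universum(outcome):
        return False
    if phase == "ice":
        return -273.0 <= temp <= 0.0
    if phase == "water":
        return 0.0 <= temp <= 100.0
    return temp >= 100.0  # steam

print(in_behavior(("water", 25.0)))   # a possible outcome
print(in_behavior(("ice", 40.0)))     # excluded by the model
```

The model claims nothing beyond this partition of outcomes into possible and impossible.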
Example: Economists believe that there exists a relation between the production P of a particular economic resource, the capital K invested in the necessary infrastructure, and the labor L expended towards its production. A typical model will look like: U = ℝ₊³ and B = {(P, K, L) ∈ ℝ₊³ | P = F(K, L)}, where F: ℝ₊² → ℝ₊ is the production function. Typically, F: (K, L) ↦ αK^β L^γ with α, β, γ ∈ ℝ₊, 0 ≤ β ≤ 1, 0 ≤ γ ≤ 1, all constant parameters depending on the technology.
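This production model is again just a subset of the universum, now cut out by an equation. A sketch, with illustrative parameter values of our own choosing (the paper leaves α, β, γ unspecified):

```python
# Sketch of the production model as a behavior. The parameter values
# below are ours, chosen only for illustration.
alpha, beta, gamma = 2.0, 0.5, 0.5

def F(K, L):
    """Cobb-Douglas-type production function F(K, L) = alpha * K**beta * L**gamma."""
    return alpha * (K ** beta) * (L ** gamma)

def in_behavior(P, K, L, tol=1e-9):
    """(P, K, L) is a possible outcome iff P = F(K, L) (up to rounding)."""
    return P >= 0 and K >= 0 and L >= 0 and abs(P - F(K, L)) <= tol

print(in_behavior(8.0, 4.0, 4.0))   # 2 * 2 * 2 = 8: in the behavior
print(in_behavior(5.0, 4.0, 4.0))   # excluded
```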
Behavioral Equations
In applications, models are more often than not described by equations.
Definition I.2: Let U be a universum, E an abstract set, called the equating space, and f₁, f₂: U → E. The mathematical model (U, B) with B = {u ∈ U | f₁(u) = f₂(u)} is said to be described by (a) behavioral equation(s), and (U, E, f₁, f₂) will be called a behavioral equation representation of (U, B).

The best way of looking at the behavioral equations f₁(u) = f₂(u) is as equilibrium conditions: the behavior B consists of those attributes for which two (sets of) quantities are in balance.
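The equilibrium reading of f₁(u) = f₂(u) can be made concrete with a small sketch of our own (the resistor example and the parameter R are illustrative assumptions, not taken from the paper):

```python
# A behavioral equation representation (U, E, f1, f2): the behavior is
# the set of outcomes u with f1(u) = f2(u). Illustrative example: a
# resistor, with outcome u = (V, I) and Ohm's law V = R*I as the
# balance (equilibrium) condition.
R = 100.0  # our example parameter, in ohms

def f1(u):
    V, I = u
    return V        # one side of the balance

def f2(u):
    V, I = u
    return R * I    # the other side

def in_behavior(u, tol=1e-9):
    return abs(f1(u) - f2(u)) <= tol

print(in_behavior((5.0, 0.05)))  # V = R*I holds: in the behavior
print(in_behavior((5.0, 0.2)))   # out of balance: not in the behavior
```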
A few remarks are in order. First, in many applications models will be described by behavioral inequalities: simply take in the aforementioned definition E to be an ordered space and consider the behavioral inequality f₁(u) ≤ f₂(u). Many models in operations research (e.g., in linear programming) and in economics are of this nature. Second, every model can trivially be considered as coming from behavioral equations. Simply take E to be any set containing 0 and let f be any map from U to E such that B = {u ∈ U | f(u) = 0}. Interesting representation questions occur when we want (f₁, f₂) to have a concrete and appealing form or when we want the behavioral equations to be a set of equations, each allowing a natural and convincing interpretation, bringing into evidence such things as balance equations, conservation laws, simple constitutive laws, field equations, etc. Third, and most importantly, whereas equations uniquely specify the behavior, the converse is obviously not true. Since we have a tendency to think of mathematical models in terms of equations, most models being presented in this form, it is important to emphasize their ancillary role: it is the behavior, the solution set of the behavioral equations, not the behavioral equations themselves, which is the essential result of a modeling procedure.

Input-Output Models
In most cases, the universum U will be a product set U = ∏ U_α (with α ranging over a suitably chosen index set A formalizing the different attributes of elements of the universum) and hence the behavior B ⊆ ∏ U_α becomes a relation between the variables u_α ∈ U_α, α ∈ A. All further mathematical structure on product sets, as maps, functions, graphs, resulting in a relation, is thus in principle applicable in order to further structure mathematical models. An important special case is recovered by assuming that the universum is a product U × Y and that B is the graph of a map. Formally:

Definition I.3: A mathematical model (U × Y, B) with B the graph of a map F: U → Y is called an I/O map, denoted as (U, Y, F); U is called the input space, Y is called the output space, and F is called the system map.

In an I/O map it is possible to interpret the attributes in U as causing the attributes in Y. Note that an I/O map can be viewed as being described by the behavioral equation y = F(u).
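An I/O map is thus the special behavior that happens to be a graph. A sketch of our own, with an illustrative (assumed) system map F:

```python
# An I/O map (U, Y, F): the behavior is the graph of F, a special case
# of a model on the product universum U x Y.

def F(u):
    """Example system map (our illustrative choice): a static saturation."""
    return max(-1.0, min(1.0, u))

def in_behavior(u, y):
    """(u, y) is in the behavior iff y = F(u), i.e., iff it lies on the graph."""
    return y == F(u)

print(in_behavior(0.3, 0.3))   # on the graph of F
print(in_behavior(2.0, 1.0))   # saturated: also in the behavior
print(in_behavior(2.0, 2.0))   # not in the behavior
```

Note that the causal reading (u causes y) is an interpretation added on top of the set B, not part of it.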
Many other properties of mathematical models can be nicely cast in the framework presented, for example, linearity, linearization, symmetry, variational principles, etc. In particular, a mathematical model (U, B) is said to be linear if U is a vector space and B a linear subspace of U.
Parametrization
In our framework, a mathematical model is a very abstract object. When we represent it by behavioral equations it becomes a bit more concrete. Often, however, it is possible to view a model as being induced by some concrete parameters.
Let M be a set, for example a family of mathematical models. In this case, each element M ∈ M denotes a mathematical model (U, B). A parametrization (P, π) of M consists of a set P and a surjective map π: P → M. The set P is called the parameter space. We think of P as a concrete space and of an element p ∈ P as a parameter, for example a set of real or complex numbers. Typically, p determines a behavioral equation and in this way induces a mathematical model. If P is a set of matrices we speak of (P, π) as a matrix parametrization of M. A similar nomenclature is used for polynomial parametrizations, polynomial matrix parametrizations, etc. When P is an abstract space related to M then we will also call (P, π) a representation of M. Hence, representation (abstract) is essentially synonymous to parametrization (concrete). For representations, think of a behavioral equation representation of a mathematical model, or of a representation as an I/O map, etc.

Let (P, π) be a parametrization of M. Note that π is surjective but not necessarily injective. If π is a bijection, we will call the parametrization trim. In any case, the map π: P → M induces a natural equivalence relation ≈ on P defined by {p₁ ≈ p₂} :⇔ {π(p₁) = π(p₂)}. This equivalence relation leads to canonical forms and to invariants. A subset P_c ⊆ P will be called a canonical form¹ for the parametrization (P, π) if P_c ∩ p(mod ≈) is nonempty for all p ∈ P, i.e., if π(P_c) = M, equivalently, if (P_c, π) is itself a parametrization of M. It is called a trim canonical form if P_c ∩ p(mod ≈) consists of exactly one point for all p ∈ P, i.e., if π restricted to P_c is a bijection onto M, equivalently, if (P_c, π) is a trim parametrization of M. A map i: P → S (with S some set) is called an invariant if {p₁ ≈ p₂} ⇒ {i(p₁) = i(p₂)} and a complete invariant if {p₁ ≈ p₂} ⇔ {i(p₁) = i(p₂)}.

In addition to studying problems of canonical forms and invariants related to parametrization, we will also be interested in specifying how one can obtain all elements of a parameter space equivalent to one given element. Usually, this will be achieved by a transformation group G = (S_g, g ∈ G) on the parameter space P. Thus G is a group and each S_g defines a bijection on P with S_{g₁g₂} = S_{g₁} ∘ S_{g₂}. The set P' = {p' ∈ P | there exists g ∈ G such that p' = S_g(p)} is called the orbit of p under G. It is easy to see that a transformation group defines an equivalence relation on P by defining p₁ and p₂ to be equivalent if they are on the same orbit. If this equivalence relation induced by the transformation group G coincides with the equivalence relation ≈ induced by the parametrization (P, π), then we see that by computing the orbit of p under G we obtain a concrete way of constructing all elements which define (under π) the same mathematical model as p ∈ P. Note that invariants of behavioral equations hence refer to functions of the parameters appearing in the equations which will give the same value for all those behavioral equations which lead to the same behavior. Of course, when (P, π) is a parametrization and P_c is a canonical form, then (P_c, π) is also a parametrization. As such, a map i: P → S can be an invariant when restricted to the canonical form P_c, even though it is not an invariant on P.

¹Sometimes the term normal form is used for what we call a canonical form and the term canonical form is reserved for what we call a trim canonical form. However, our nomenclature is consistent with calling the Jordan form a canonical form.
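A toy example of our own may help fix these notions. Parametrize the lines through the origin of ℝ² by p = (r₁, r₂) ≠ (0, 0), inducing the model B(p) = {(w₁, w₂) | r₁w₁ + r₂w₂ = 0}; two parameters are equivalent exactly when one is a nonzero scalar multiple of the other (the transformation group of scalings), and normalizing the first nonzero entry gives a trim canonical form:

```python
# Toy parametrization sketch (our illustration): p = (r1, r2) != (0, 0)
# parametrizes B(p) = { (w1, w2) in R^2 | r1*w1 + r2*w2 = 0 }.

def equivalent(p1, p2, tol=1e-12):
    """p1 ~ p2 iff they induce the same behavior, i.e., are parallel."""
    (a1, b1), (a2, b2) = p1, p2
    return abs(a1 * b2 - a2 * b1) <= tol

def canonical(p):
    """A trim canonical form: scale so the first nonzero entry equals 1."""
    a, b = p
    s = a if a != 0 else b
    return (a / s, b / s)

p1, p2 = (1.0, -2.0), (-3.0, 6.0)
print(equivalent(p1, p2))              # same line, hence same model
print(canonical(p1) == canonical(p2))  # the canonical form removes the redundancy
```

Here the map p ↦ canonical(p) is a complete invariant: it takes equal values precisely on equivalent parameters.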
II. DYNAMICAL SYSTEMS

We will now apply this view of mathematical models in order to set up a language for dynamical systems. There have been many attempts to come up with a suitable axiomatic framework for the study of dynamical systems. Mathematics has chosen for a notion which centers around autonomous state space systems (ẋ = f(x) and its generalizations). Notwithstanding the fact that many areas of physics (as Hamiltonian mechanics and quantum mechanics) fit this framework very nicely, and notwithstanding the impressive phenomenological achievements of this theory (as chaos and strange attractors), this line of thought suffers from two serious drawbacks. First, it considers a system as isolated from its environment and, second, it assumes that a model is already in minimal state form (that is, very roughly speaking, that the differential equations describing the system are first order and explicit in ẋ). Both conditions are rarely satisfied in real applications. In engineering, particularly in control and signal processing, there has always been a tendency to view systems as processors, producing output signals from input signals. In many applications in control engineering and signal processing it will, indeed, be eminently clear what the inputs and the outputs are. However, there are also many applications where this input/output structure is not at all evident (a case in point is the terminal behavior of an electrical circuit). We will view a dynamical system in the logical context of Definition I.1. The only distinguishing feature is that now the phenomenon produces outcomes that are functions of time: the universum is a function space. This leads to the following definition.

Definition II.1: A dynamical system Σ is a triple Σ = (T, W, B) with T ⊆ ℝ the time axis, W the signal space, and B ⊆ W^T the behavior.

Thus, a dynamical system Σ is defined by T, the time instants of interest, W, the space in which the time signals which the system produces take on their values, and B, a family of W-valued time trajectories. The sets T and W define the setting, the mathematization of the problem, while B formalizes the laws which govern the system. According to the model Σ, time signals in B can in principle occur, are compatible with the laws governing Σ, while those outside B cannot occur, are prohibited.

We will use the terms "system" and "dynamical system" interchangeably. Our notion of a dynamical system hardly does justice to the word "system": it only captures the idea of a dynamical model. In particular, the word system as it is used in Definition II.1 does not formalize structure as interconnection, or algorithmic features involved in our intuitive idea of a "system."²

Example (Kepler's Laws): If a definition is to show proper respect and do justice to history, Kepler's laws should provide the very first example of a dynamical system. They do. Take T = ℝ, W = ℝ³, and B = {w: ℝ → ℝ³ | Kepler's laws are satisfied}. Thus, the behavior B in this example consists of the planetary motions which according to Kepler are possible: all trajectories mapping the time axis ℝ into ℝ³ (the position space for the planets) which satisfy his three famous laws. Since for a given map w: ℝ → ℝ³ one can unambiguously decide whether or not it satisfies Kepler's laws, B is indeed well defined. Kepler's laws provide a beautiful example of a dynamical system in the sense of the aforementioned definition, since it is one of the few instances in which the behavior B is described explicitly, and not indirectly in terms of behavioral equations. It took no lesser man than Newton to think up appropriate behavioral equations for this dynamical system.

Most dynamical systems are indeed described by behavioral equations. These are often differential or difference equations, sometimes integral equations. We will postpone the introduction of such descriptions until later. First, we want to introduce some general properties of dynamical systems in our framework. These qualitative properties can be of various types: involving structure (as linearity), symmetry (as time invariance), related to the memory (as the state), involving the interaction with the environment (as inputs and outputs), etc.

Definition II.2: A dynamical system Σ = (T, W, B) is said to be linear if W is a vector space (over a field F; for the purposes of this paper, think of it as ℝ) and B is a linear subspace of W^T (which is a vector space in the obvious way by pointwise addition and multiplication by a scalar).

Thus, linear systems obey the superposition principle in its simplest form: {w₁(·), w₂(·) ∈ B; α, β ∈ F} ⇒ {αw₁(·) + βw₂(·) ∈ B}.

We can view time invariance as a symmetry; we will therefore first introduce symmetries. Let S be a family of dynamical systems, and G = (S_g, g ∈ G) a transformation group on S. Thus G is a group, and S_g defines for each g ∈ G a bijection on S satisfying S_{g₁g₂} = S_{g₁} ∘ S_{g₂}. We will call Σ ∈ S symmetric (with respect to G) if S_g Σ = Σ for all g ∈ G, i.e., if the orbit through Σ induced by the transformation group consists of the one point Σ.

Example: Let S denote all dynamical systems with time axis T = ℤ. Take G = ℤ, viewed as a group under addition. Let σ^t: W^ℤ → W^ℤ denote the backward t-shift: (σ^t f)(t') := f(t' + t). Now define, for t ∈ ℤ, S_t: S → S by S_t((ℤ, W, B)) := (ℤ, W, σ^t B), and G := (S_t, t ∈ ℤ). The dynamical systems which are symmetric in this sense will be called (discrete time) time invariant. This definition can be extended in a trivial way to continuous time (T = ℝ) and, in a bit less trivial way, to arbitrary time sets.

Example: Let S denote all time-invariant dynamical systems with time axis T = ℝ. Consider the (trivial) group consisting of two points: G = {1, g}, g·g = 1. Take S_g((ℝ, W, B)) := (ℝ, W, rev B) with rev: W^ℝ → W^ℝ the time-reversal operator: (rev f)(t) := f(−t). The dynamical systems which are symmetric in this sense are called time reversible. This definition is easily extended to discrete time.

Since time invariance will play an important role in the sequel, we single it out in a definition.

Definition II.3: A dynamical system Σ = (T, W, B) with T = ℤ or ℝ is said to be time invariant if σ^t B = B for all t ∈ T (σ^t denotes the backward t-shift: (σ^t f)(t') := f(t' + t)). If T = ℤ, then this condition is equivalent to σB = B. The analog of this definition when the time axis is ℤ₊ or ℝ₊ requires σ^t B ⊆ B for all t ∈ T.

Important! We will assume, essentially throughout this paper, that the time axis is ℤ (usually) or ℝ (sometimes) (T = ℤ, not ℤ₊; T = ℝ, not ℝ₊) and that our dynamical systems are time invariant. This assumption is made in part for reasons of exposition and notation and in part because it features as an essential assumption in many of the definitions (e.g., time invariance).

²Our definition of a dynamical system as a family of trajectories, without reference to input-output maps or relations, without reference to state variables, and without reference to behavioral equations, has been developed successively in [10]-[14]. We believe that it presents a very succinct, simple, and general starting point for the study of all sorts of questions in the theory of dynamical systems.
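The condition σ^t B = B can be spot-checked computationally once a behavior is encoded as a predicate on trajectories. A sketch of our own (the example law w(t+1) = 2w(t) is an illustrative choice, and the finite window is only a sample of the infinite time axis):

```python
# Sketch: checking time invariance sigma^t B = B on sample trajectories.
# A behavior is encoded as a predicate on functions w: Z -> R; the
# backward t-shift is (sigma^t w)(t') = w(t' + t). Example behavior
# (ours): B = { w | w(t+1) = 2*w(t) for all t }, which is time invariant.

def sigma(w, t):
    """Backward t-shift of the trajectory w."""
    return lambda tp: w(tp + t)

def in_B(w, window=range(-20, 20)):
    """Spot-check the law w(t+1) = 2*w(t) on a finite window."""
    return all(w(t + 1) == 2 * w(t) for t in window)

w = lambda t: 3 * 2 ** t      # a trajectory in B
print(in_B(w))                # w satisfies the law
print(in_B(sigma(w, 5)))      # so does every shift of w
print(in_B(lambda t: t))      # the ramp t -> t violates the law
```

This is, of course, only a finite-window test, not a proof: time invariance is a statement about all t ∈ ℤ.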
Dynamical systems acquire their importance, their idiosyncrasies, their relevance, from the fact that they exhibit memory, from their potential to model phenomena with an aftereffect, from the possibility to describe devices in which the past influences the future. We will, in this paper, introduce a number of themes related to the memory of a system, the most central being the state of a dynamical system. This will be studied later. In this section, we will consider first whether the infinite past or the infinite future are relevant for deciding whether or not a trajectory belongs to the behavior.
Definition II.4: Let Σ = (T, W, B) be a dynamical system (as always, henceforth, with T = ℤ or ℝ and Σ time invariant); Σ is said to be complete if

{w ∈ B} ⇔ {w|[t1, t2] ∈ B|[t1, t2] for all t1, t2 ∈ T, t1 ≤ t2}.

Σ is said to be Δ-complete (Δ ∈ T, Δ ≥ 0) if

{w ∈ B} ⇔ {(σ^t w)|[0, Δ] ∈ B|[0, Δ] for all t ∈ T};

if this holds for all Δ > 0, we call Σ locally specified; if it holds for Δ = 0, we call Σ static.

Second, we will consider how long a trajectory has to be observed before its past and future become independent.
Definition II.5: Let Σ = (T, W, B) be a dynamical system; Σ is said to have memory span Δ (Δ ∈ T, Δ ≥ 0) if {w₁, w₂ ∈ B, w₁(t) = w₂(t) for 0 ≤ t < Δ} ⇒ {w₁ ∧ w₂ ∈ B}. Here ∧ denotes concatenation (at 0), defined as

(w₁ ∧ w₂)(t) := w₁(t) for t < 0, and w₂(t) for t ≥ 0.

Σ is said to have finite memory span if the aforementioned holds for some Δ; it is said to be Markovian if {w₁, w₂ ∈ B, w₁(0) = w₂(0)} ⇒ {w₁ ∧ w₂ ∈ B} (that is, in discrete time, if the memory span is 1); and memoryless if {w₁, w₂ ∈ B} ⇒ {w₁ ∧ w₂ ∈ B} (this may be viewed as memory span 0).

Note that in a system with memory span Δ we have called the past (t < 0) independent of the future (t ≥ Δ), given the system's trajectory for 0 ≤ t < Δ. Indeed, any possible past can then be concatenated with any possible future. We view this as "independence" of the past and the future.³
The notion of completeness will play an important role in the sequel. Intuitively, completeness means that the way a signal looks at t = ±∞ should, per se, be of no consequence as to whether or not the signal obeys the laws of the system. As we shall see, systems described by difference equations are automatically complete (since such laws extend over a finite time interval only). Noncomplete systems occur, for example, when signals are required to be square integrable. For instance, (ℤ, ℝ, ℓ₂(ℤ; ℝ)) is a (trivial) noncomplete system. For further remarks on ℓ₂-systems, see Section XI.3.

Difference and Differential Equations
The notion of behavioral equations is immediately applicable (take U = W^T) to dynamical systems. We will, in this paper, be particularly concerned with difference (and differential) equations.

A behavioral difference equation representation of a discrete-time dynamical system with time axis T = ℤ₊ and signal space W is defined by a nonnegative integer L (called the lag, or the order of the difference equation), an abstract set E (called the equating space), and two maps f₁, f₂: W^{L+1} → E. They define the behavior by

B = {w: ℤ₊ → W | f₁ ∘ (σ^L w, σ^{L−1} w, ···, σw, w) = f₂ ∘ (σ^L w, σ^{L−1} w, ···, σw, w)}.

This is easily generalized to the case T = ℤ. However, in this case, it is logical to consider the equation to have both positive and negative lags, yielding the following behavioral equations:

f₁ ∘ (σ^L w, σ^{L−1} w, ···, σ^{l+1} w, σ^l w) = f₂ ∘ (σ^L w, σ^{L−1} w, ···, σ^{l+1} w, σ^l w).

We will call L − l the lag of this difference equation. It is clear that the system (ℤ, W, B) obtained this way defines a time-invariant dynamical system.

The continuous-time analog of this is a dynamical system represented by a behavioral equation which is a differential equation. The time axis is ℝ, the signal space W a differentiable manifold, L a nonnegative integer (called the order of the differential equation), E the equating space, and f₁, f₂ maps from J^L W into E (J^L W denotes the Lth order jet bundle of W; if W is a nonempty open subset of ℝ^q, then J^L W = W × (ℝ^q)^L). This defines the behavior

B = {w: ℝ → W | f₁ ∘ (w, dw/dt, ···, d^L w/dt^L) = f₂ ∘ (w, dw/dt, ···, d^L w/dt^L)}.

It is not easy to be precise regarding what is meant by a solution of the aforementioned differential equation. We will not belabor this point now but return to it in Section XI.2.

³The terms Markovian and independence are borrowed from stochastics. We use them here in a purely deterministic setting.
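The discrete-time representation above can be sketched in code. Our own illustration, with lag L = 2 and the Fibonacci-type law w(t+2) = w(t+1) + w(t) as the pair (f₁, f₂) (checked, necessarily, on a finite horizon of ℤ₊):

```python
# Sketch of a behavioral difference equation with lag L = 2 on T = Z+:
# f1(sigma^2 w, sigma w, w) = f2(sigma^2 w, sigma w, w), here the
# illustrative Fibonacci-type law w(t+2) = w(t+1) + w(t).

def f1(w2, w1, w0):
    return w2            # the value of sigma^2 w at t, i.e., w(t+2)

def f2(w2, w1, w0):
    return w1 + w0       # w(t+1) + w(t)

def in_B(w, horizon=50):
    """Check the law at every t in a finite initial segment of Z+."""
    return all(f1(w[t + 2], w[t + 1], w[t]) == f2(w[t + 2], w[t + 1], w[t])
               for t in range(horizon - 2))

fib = [0, 1]
for _ in range(48):
    fib.append(fib[-1] + fib[-2])
print(in_B(fib))              # the Fibonacci sequence obeys the law
print(in_B(list(range(50))))  # w(t) = t does not
```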
Let us recapitulate the situation. We have defined the dynamical system in terms of its behavior. Our claim is that this is the natural level of generality in which to think about models. We will consequently always try to define as much as possible qualitative properties of systems in terms of their behavior. We have seen how linearity, time invariance, time reversibility, completeness, the memory span, etc., can be defined in this vein. We have observed how behavioral equations, in particular, difference and differential equations, lead to representations of dynamical systems. Now, given a difference or a differential equation with a certain structure (as, for example, linearity), it is usually obvious that the dynamical system represented enjoys an analogous property (as linearity). The question which we find of much interest is the converse, that is: What properties of the behavior allow the system to be represented by a difference (or a differential) equation of a particular type? The following proposition gives a small taste of the type of result which we have in mind.
Proposition II.6: Let Σ = (ℤ, W, B) be a (discrete-time) dynamical system. The following conditions are equivalent:

i) Σ is time invariant, complete, and has memory span L;
ii) Σ is time invariant and L-complete;
iii) Σ can be described by a behavioral difference equation with lag L.

Proofs of this and other results are collected in Appendix P. The previous proposition clarifies what the essential conditions are for a system to be represented by means of a difference equation: it has to be complete (absence of initialization and termination conditions on the behavior at t = ±∞), and it has to have a finite memory span (observation of a trajectory on a finite interval lets us conclude that what could have happened in the past is independent of what will happen in the future).

III. AR SYSTEMS

Let us now consider the following behavioral difference equations

R_L w(t + L) + R_{L−1} w(t + L − 1) + ··· + R_{l+1} w(t + l + 1) + R_l w(t + l) = 0

where R_L, R_{L−1}, ···, R_{l+1}, R_l ∈ ℝ^{g×q}. Note that this will be the system of difference equations obtained by assuming W = ℝ^q, E = ℝ^g, and f₁, f₂ linear. Any finite set of linear equations relating the outcome time series w₁, w₂, ···, w_q to a finite set of their time lags will lead to a set of equations which in matrix notation can be written in the above form. We will assume that L, l ∈ ℤ, L ≥ l (we find it convenient not to assume that L = 0 or l = 0, even though at this point this would constitute no real loss of generality). Now introduce the polynomial matrix⁴

R(s, s⁻¹) = R_L s^L + R_{L−1} s^{L−1} + ··· + R_{l+1} s^{l+1} + R_l s^l ∈ ℝ^{g×q}[s, s⁻¹].

The above system of difference equations may be written in shorthand in terms of this polynomial matrix as

R(σ, σ⁻¹)w = 0.    (AR)
We will call this an autoregressive system. Note that every polynomial matrix defines such a system, with the number of columns equal to the dimension of the signal space and the number of rows equal to the number of (scalar) equations. It is logical to consider the signal space (ℝ^q) and hence its dimension (q) as fixed by the system. However, the number of equations (g) in (AR) will be a variable: it will depend on the system and on its representation. We will therefore assume that in R(s, s⁻¹) the number of columns is fixed but that the number of rows is in principle free. Hence R(s, s⁻¹) will be considered to be an element of ℝ^{•×q}[s, s⁻¹].

The behavioral difference equation (AR) describes a dynamical system Σ = (T, W, B) with time axis T = ℤ, signal space W = ℝ^q, and behavior B = {w: ℤ → ℝ^q | R(σ, σ⁻¹)w = 0}. Note that if we consider R(s, s⁻¹) ∈ ℝ^{g×q}[s, s⁻¹] as inducing the linear map R(σ, σ⁻¹): (ℝ^q)^ℤ → (ℝ^g)^ℤ, then B = ker R(σ, σ⁻¹). It is easy to see that the system Σ defined above is linear, time invariant, and complete. In fact:
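Membership in ker R(σ, σ⁻¹) can be spot-checked on a finite window. A sketch of our own, in the simplest scalar case q = g = 1 (the encoding of R by its coefficients and all names are ours):

```python
# Sketch: membership in ker R(sigma, sigma^-1) checked on a finite window.
# Illustrative scalar example (q = 1, g = 1): R(s, s^-1) = s - 2, i.e.,
# the AR law w(t+1) - 2*w(t) = 0, whose kernel consists of w(t) = c * 2^t.

R = {1: 1.0, 0: -2.0}        # R(s, s^-1) = 1*s^1 + (-2)*s^0, as {power: coefficient}

def apply_R(R, w, t):
    """( R(sigma, sigma^-1) w )(t) = sum_k R_k * w(t + k)."""
    return sum(coef * w(t + k) for k, coef in R.items())

def in_kernel(R, w, window=range(-15, 15)):
    return all(apply_R(R, w, t) == 0 for t in window)

print(in_kernel(R, lambda t: 7 * 2 ** t))   # in ker R(sigma, sigma^-1)
print(in_kernel(R, lambda t: t))            # not in the kernel
```

For matrix-valued R the dictionary values become g×q arrays and w becomes ℝ^q-valued; the structure of the computation is unchanged.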
Theorem III.1: Let Σ = (ℤ, ℝ^q, B) be a dynamical system. The following are equivalent:

i) Σ is linear, time invariant, and complete;
ii) Σ is linear, time invariant, complete, and has finite memory span;
iii) there exists R(s, s⁻¹) ∈ ℝ^{•×q}[s, s⁻¹] such that B = ker R(σ, σ⁻¹);
iv) B ∈ 𝕃^q (𝕃^q denotes the family of all linear shift-invariant closed subspaces of (ℝ^q)^ℤ, equipped with the topology of pointwise convergence).

AR-equations are pervasive in signal processing, discrete-time control, econometrics, etc. Against this background, it should be of considerable interest to identify, on the level of the behavior, what is really behind the assumption that a system can be described by AR-equations. The aforementioned theorem shows precisely what these conditions are: linearity, time invariance, and completeness. It also provides a succinct topological classification of the kernels of polynomial operators in the shift: they correspond precisely to shift-invariant closed linear subspaces. We will refer to the previous theorem as the AR-representation theorem.

Motivated by the aforementioned theorem, and with a slight abuse of notation, we will use 𝕃^q to denote the set of linear time-invariant complete dynamical systems (ℤ, ℝ^q, B).

⁴The notation at this point becomes very dependent on polynomials and polynomial matrices. A glance at Appendix N may be helpful. The terms autoregressive, moving average, and autoregressive-moving average are borrowed from the theory of stochastic processes and time-series analysis. We use them here in a purely deterministic context.
Parametrization
We will use the nomenclature introduced in Section I.
4
parametrization(P,
a) of a family of dynamical systemsX
consists of a y a c e P, the parameter space, and a surjective map a:D
+X .
We think of P as a space of concrete objects(matrices, polynomials, polynomial matrices). More abstract ways of obtaining
X
will be referred to as representations(e.g., the latent variable, state space, and inputoutput represen tations, which will be studied later on). Theorem 111.1 implies the following proposition:
Proposition 111.2: Let
P
denote the set of linear timein variant complete systems(T,
W, 8)
withT
= 2 andW
=W q .
Then ( R o X q [ s , s'], ker) with ker: R ( s , s')

(0,
W , ker R ( o , uI)) defines a (polynomial matrix) parametri zation ofP
4'.
Since in a parametrization (𝒫, π) it is usually clear from the context what the map π is, we will often refer to 𝒫 itself as the parametrization. Thus, the aforementioned proposition states that ℝ^{•×q}[s, s⁻¹] is a parametrization of 𝔏^q. This makes it evident why polynomial matrices play such an overwhelming role in linear system theory. This parametrization induces an equivalence relation on ℝ^{•×q}[s, s⁻¹], which we will denote by ≈_AR. Thus, for R₁(s, s⁻¹), R₂(s, s⁻¹) ∈ ℝ^{•×q}[s, s⁻¹],

    {R₁ ≈_AR R₂} :⇔ {ker R₁(σ, σ⁻¹) = ker R₂(σ, σ⁻¹)}.
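As an illustrative aside (not part of the original text), the kernel description can be made concrete in a few lines of Python: a trajectory lies in the behavior ker R(σ, σ⁻¹) exactly when the polynomial operator in the shift annihilates it at every instant where it can be evaluated. The helper names below are ours; the sketch treats a scalar AR equation on a finite data window.

```python
def ar_residual(coeffs, w, t):
    """Evaluate sum_k R_k * w(t+k) for a scalar AR equation whose
    polynomial in the shift is given as a dict {shift k: coefficient R_k}."""
    return sum(r * w[t + k] for k, r in coeffs.items())

def in_kernel(coeffs, w):
    """Check that the finite window w satisfies R(sigma)w = 0 at every
    time instant where all the needed samples are available."""
    kmin, kmax = min(coeffs), max(coeffs)
    start = -kmin if kmin < 0 else 0
    return all(ar_residual(coeffs, w, t) == 0 for t in range(start, len(w) - kmax))

# R(s) = s^2 - s - 1, i.e. w(t+2) - w(t+1) - w(t) = 0 (the Fibonacci law)
R = {2: 1, 1: -1, 0: -1}
fib = [1, 1, 2, 3, 5, 8, 13, 21]
print(in_kernel(R, fib))        # a trajectory in the behavior -> True
print(in_kernel(R, [1, 2, 4]))  # a trajectory outside it -> False
```

The same dictionary-of-shifts representation extends to the matrix case, with R_k ∈ ℝ^{g×q} and matrix-vector products in place of the scalar multiplications.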
It is clear how the notions of canonical forms, invariants, and transformation groups become useful in the context of dynamical systems. As already mentioned, behavioral equations specify the behavior uniquely, but the converse is not true: two distinct AR equations can represent the same behavior, the same dynamical system. A canonical form weeds out some of this redundancy; a trim canonical form does this weeding until no redundancy is left. An invariant is a function of the parameters of the behavioral difference equations which actually depends on the induced behavior only, while a complete invariant offers a unique specification of the behavior in terms of the coefficients of the behavioral equations. Recognizing a suitable transformation group will allow us to construct, from one AR system, all AR systems which model the same dynamical system.
We will study efficient canonical forms for AR systems later. In this section we will only show how to reduce the number of equations. At several points in this paper we will study minimal representations. Minimality will always refer to keeping the number of equations as small as possible, and to keeping the number of latent variables (cf. Section IV) as small as possible (within the class of representations of a particular type of a given dynamical system). For AR representations this requires having a minimal number of equations (since there are no latent variables), as expressed in the following definition.
Definition III.3: Let R(s, s⁻¹) ∈ ℝ^{g×q}[s, s⁻¹]. We will call the system of AR equations R(σ, σ⁻¹)w = 0 minimal (for simplicity we also call R minimal) if {R'(s, s⁻¹) ∈ ℝ^{g'×q}[s, s⁻¹], R' ≈_AR R} ⇒ {g' ≥ g}.

Proposition III.4: Every Σ ∈ 𝔏^q admits a minimal AR representation R(σ, σ⁻¹)w = 0. Moreover:
i) {R is minimal} ⇔ {R(s, s⁻¹) ∈ ℝ^{g×q}[s, s⁻¹] is of full row rank};
ii) if R is minimal, then {R'(s, s⁻¹) ∈ ℝ^{•×q}[s, s⁻¹], R' ≈_AR R, R' minimal} ⇔ {R and R' are unimodularly left equivalent (that is, there exists a unimodular U(s, s⁻¹) such that R = UR')}.
This proposition shows how all minimal AR representations may be deduced from one: as the orbit of the transformation group R → UR, where U ranges over the unimodular polynomial matrices of a suitable dimension. This group is the unimodular group acting on the left. Note that the aforementioned proposition implies in particular that the full row rank elements of ℝ^{•×q}[s, s⁻¹] form a canonical form, since each element of 𝔏^q admits a minimal AR representation. We emphasize that when R(σ, σ⁻¹)w = 0 is not minimal, this does not mean that there are necessarily redundant equations in this system of equations. In order to obtain a minimal system it will in general be necessary first to recombine some of the equations, and then to cancel those which have become redundant.
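As a computational aside (ours, not the paper's): full row rank over the Laurent polynomials can be tested by evaluating the polynomial matrix at a generic point, since the rank at any point never exceeds the rank over the rational functions and equals it away from finitely many points. The sketch below flags a non-minimal two-equation AR system whose second row is s times the first; an unlucky evaluation point could understate the rank, so in practice one would retry at a few random points.

```python
from fractions import Fraction

def eval_poly(p, x):
    """Evaluate a polynomial given as a coefficient list p[k] of x**k."""
    return sum(Fraction(c) * x**k for k, c in enumerate(p))

def rank(mat):
    """Row rank of a rational matrix via exact Gaussian elimination."""
    m = [row[:] for row in mat]
    r = 0
    for col in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# R(s) with rows [1, s] and [s, s^2] = s * [1, s]: row rank 1, so the
# two-equation AR system is NOT minimal; one unimodular row operation
# (subtract s times row 1 from row 2) leaves the single equation [1, s].
R = [[[1], [0, 1]],
     [[0, 1], [0, 0, 1]]]   # entries as coefficient lists
s0 = Fraction(7)            # a generic evaluation point
numeric = [[eval_poly(p, s0) for p in row] for row in R]
print(rank(numeric))        # 1: the system is not minimal
```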
IV. LATENT VARIABLES

We will now push dynamical systems into the background for a few moments and discuss again some aspects of general mathematical models. Our view of a mathematical model as expressed in Section I is as follows: identify the variables of the phenomenon which we want to model (specify the universum U) and identify the possible outcomes, the behavior (specify ℬ ⊆ U). However, in most modeling problems, we will need to work with other variables in addition to those which we try to model. We will call these other variables latent variables. They are auxiliary variables, introduced, if for no other reason, because they make it possible to express conveniently the behavior, the outcomes of the variables in U. In order to distinguish them from latent variables, we will call the elements of U manifest variables.⁵ In a bit, we will give a series of instances where latent variables appear. Let us, however, start with two typical concrete examples.

Example: Consider a resistive electrical circuit. It consists of a graph, with nodes and branches. Some branches contain resistors, some are external ports. We want to model the port behavior: the relation between the voltage drop across and the current through the external branches. In order to fix the signs of these, we make the graph into a directed one. Now introduce as auxiliary variables the voltages across and the currents through all the branches. The following relations must be satisfied.
i) Kirchhoff's current law (KCL): the sum of the currents entering each node must be zero.
ii) Kirchhoff's voltage law (KVL): the sum of the voltage drops across the branches of any loop must be zero.
iii) The constitutive laws of the resistors in the resistor branches.
An example is shown in Fig. 1. The following equations must be satisfied (in the notation indicated in Fig. 1):
Constitutive laws:    KCL:               KVL:
R₁I₁ = V₁             I = I₁ + I₂        V₁ + V₄ = V
R₂I₂ = V₂             I₁ = I₃ + I₄       V₂ + V₅ = V
R₃I₃ = V₃             I₂ + I₃ = I₅       V₁ + V₃ = V₂
R₄I₄ = V₄             I = I₄ + I₅        V₃ + V₅ = V₄
R₅I₅ = V₅
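To illustrate the elimination concretely, here is a sketch (our own, assuming the bridge topology of Fig. 1 with five resistor branches; all resistances are set to 1 and the port voltage to V = 1): exact Gauss-Jordan elimination over the rationals removes the ten latent variables and leaves the manifest port relation. For this balanced bridge the effective resistance is 1, so the computed port current equals V.

```python
from fractions import Fraction as F

# Unknowns: V1..V5, I1..I5, I (indices 0..10); port voltage V = 1, all R_k = 1.
V1, V2, V3, V4, V5, I1, I2, I3, I4, I5, I = range(11)
rows = []

def eq(pairs, rhs=0):
    """Append one linear equation sum(c * x_idx) = rhs to the system."""
    row = [F(0)] * 11 + [F(rhs)]
    for idx, c in pairs:
        row[idx] += F(c)
    rows.append(row)

for v, i in [(V1, I1), (V2, I2), (V3, I3), (V4, I4), (V5, I5)]:
    eq([(v, 1), (i, -1)])              # constitutive laws: V_k = R_k * I_k, R_k = 1
eq([(I, 1), (I1, -1), (I2, -1)])       # KCL at the nodes
eq([(I1, 1), (I3, -1), (I4, -1)])
eq([(I2, 1), (I3, 1), (I5, -1)])
eq([(I, 1), (I4, -1), (I5, -1)])
eq([(V1, 1), (V4, 1)], 1)              # KVL around the loops (V = 1)
eq([(V2, 1), (V5, 1)], 1)
eq([(V1, 1), (V3, 1), (V2, -1)])
eq([(V3, 1), (V5, 1), (V4, -1)])

# Gauss-Jordan elimination: eliminate the latent variables exactly.
r = 0
for c in range(11):
    piv = next((k for k in range(r, len(rows)) if rows[k][c] != 0), None)
    if piv is None:
        continue
    rows[r], rows[piv] = rows[piv], rows[r]
    rows[r] = [x / rows[r][c] for x in rows[r]]
    for k in range(len(rows)):
        if k != r and rows[k][c] != 0:
            rows[k] = [a - rows[k][c] * b for a, b in zip(rows[k], rows[r])]
    r += 1

solution = {}
for row in rows[:r]:
    lead = next(c for c in range(11) if row[c] != 0)
    solution[lead] = row[11]
print("port current I =", solution[I])  # I = 1 for V = 1: effective resistance 1
```

The redundant KCL and KVL equations simply reduce to zero rows during the elimination; the remaining pivots determine all eleven variables uniquely.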
⁵The distinction between manifest (signal) variables and latent (auxiliary) variables as used here is quite appropriate, in analogy with the use of these terms in psychology. Latent variables are reminiscent of Rosenbrock's partial state [8] and of the internal variables used off and on in electrical circuit theory.

Fig. 1. A resistive electrical circuit.

Our basic purpose is to express the relation between the voltages and currents at the external branches. In the aforementioned example, this will be a relation of the form
V = RI (where R can be calculated from R₁, R₂, R₃, R₄, and R₅), obtained by eliminating (V₁, …, V₅, I₁, …, I₅) from the previous equations. However, the basic model, the one obtained from first principles, involves the variables (V₁, …, V₅, I₁, …, I₅) in addition to those which we are trying to describe, (V, I). The port voltage V and the port current I are the manifest variables, while the voltages across and the currents through the internal branches (the variables V₁, …, V₅, I₁, …, I₅ in the aforementioned example) are latent variables.
Example: An economist is trying to figure out how much of a package of n economic goods will be produced. As a firm believer in equilibrium theory, our economist assumes that the possible volumes will consist of those points where, product for product, the supply equals the demand. This equilibrium set is a subset of ℝ₊ⁿ. It is the behavior which we are looking for. In order to specify this set, we can proceed as follows. Introduce as latent variables the prices, the supplies, and the demands of the n products. Now determine the supply and demand functions Sᵢ: ℝ₊ⁿ → ℝ₊ and Dᵢ: ℝ₊ⁿ → ℝ₊. Thus Sᵢ(p₁, p₂, …, pₙ) and Dᵢ(p₁, p₂, …, pₙ) equal the amount of product i which will be produced and bought, respectively, when the going prices are p₁, p₂, …, pₙ. This yields the following behavioral equations (i = 1, 2, …, n):

    sᵢ = Sᵢ(p₁, p₂, …, pₙ),  dᵢ = Dᵢ(p₁, p₂, …, pₙ),  Pᵢ = sᵢ = dᵢ.

These behavioral equations describe the relation between the prices pᵢ, the supplies sᵢ, the demands dᵢ, and the production volumes Pᵢ.
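As a toy numerical instance (our own invented numbers, with hypothetical linear supply and demand functions for n = 2 goods): eliminating the latent prices pᵢ from the equilibrium equations yields the manifest production volumes Pᵢ.

```python
from fractions import Fraction as F

# Hypothetical linear laws (all numbers invented for illustration):
#   supply  S_i(p) = a_i * p_i
#   demand  D_i(p) = b_i - c_i * p_i - e * p_j   (j = the other good)
a, b, c, e = [F(2), F(3)], [F(10), F(12)], [F(1), F(1)], F(1)

# Equilibrium s_i = d_i = P_i gives (a_i + c_i) p_i + e p_j = b_i,
# a 2x2 linear system for the latent prices; solve it by Cramer's rule.
A = [[a[0] + c[0], e], [e, a[1] + c[1]]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
p = [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
     (A[0][0] * b[1] - b[0] * A[1][0]) / det]

P = [a[i] * p[i] for i in range(2)]   # manifest production volumes
print(p)  # equilibrium prices 28/11 and 26/11
print(P)  # production volumes 56/11 and 78/11
```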
The Pᵢ's for which these equations are solvable yield the desired behavior. Clearly, this behavior is most conveniently specified in terms of the behavior of the variables pᵢ, sᵢ, dᵢ, and Pᵢ (i = 1, 2, …, n) jointly. These examples illustrate the following definition.
Definition IV.1: A mathematical model with latent variables Σ_L is defined as a triple Σ_L = (U, L, ℬ_f) with U the universum of manifest variables, L the set of latent variables, and ℬ_f ⊆ U × L the full behavior. It defines the manifest mathematical model (U, ℬ) with ℬ := {u ∈ U | ∃ l ∈ L such that (u, l) ∈ ℬ_f}; ℬ is called the manifest behavior. We call (U, L, ℬ_f) a latent variable representation of (U, ℬ).

Notions such as behavioral equations, linearity, symmetry, parametrization, etc., can be generalized in an obvious way to models with latent variables. Thus in the economics example we have U = ℝ₊ⁿ, L = ℝ₊³ⁿ, and ℬ_f described by the behavioral equations given in the example.

The above definition is easily generalized to dynamical systems:
Definition IV.2: A dynamical system with latent variables is defined as Σ_L = (T, W, L, ℬ_f) with T the time axis, W the space of manifest variables, L the space of latent variables, and ℬ_f ⊆ (W × L)^T the full behavior. It defines a latent variable representation of the manifest dynamical system Σ = (T, W, ℬ) with manifest behavior ℬ := {w: T → W | ∃ l: T → L such that (w, l) ∈ ℬ_f}.
Note that in our framework, we view the signal variables as those variables which the model aims at describing. We think of these variables as manifest, as external, as directly observable if you like (even though that suggests much more than we have in mind). We think of the latent variables as auxiliary variables, as internal, as unobservable or, better, as only indirectly observable through the signal variables. Sometimes we will refer to the full behavior as the internal behavior and to the manifest behavior as the external behavior.
Situations where basic models use latent variables, either for mathematical reasons or in order to express the basic laws governing the manifest variables, are manifold. Let us mention a few: internal voltages and currents in electrical circuits in order to express the external port behavior; the momentum in Hamiltonian mechanics in order to describe the evolution of the position; the internal energy and entropy in thermodynamics in order to formulate the first and second laws restricting the temperature and the exchange of heat and mechanical work; prices in economics in order to explain the production and exchange of economic goods; state variables in system theory in order to formalize the relation between inputs and outputs; the wave function in quantum mechanics underlying observables; and, finally, the basic probability space Ω in probability theory: the latent variable space par excellence.

Latent variables will invariably appear whenever we model a system by the method of tearing. The system is then viewed as an interconnection (see Section XII) of subsystems, and the modeling process is carried out by zooming in on the individual subsystems. The overall model is then obtained by combining these models of the subsystems with the interconnection constraints. The ultimate model will invariably contain latent variables: the auxiliary variables introduced in order to express the interconnections will play this role.
ARMAModels
All the notions related to dynamical systems introduced so far
W
= W q , andL
=W d ,
can be described by an ARMAsystem of behavioral equations.6An ARMAsystem induces a manifest dynamical system
( E ,
W4,
B )
with 8 = {w
13 I such that ARMA is satisfied), equivalently,B
= ( R ( a , u'))'im M ( a , u  ' ) with ( )  ' the pointtoset inverse, R ( u , u  ' ) : (R4)" + (Rig)", and M ( o ,6 ' ) :
Wd)"
+ (Fig)". Clearly,( H ,
W , 8 ) is linear and time invariant. The question arises if it is also complete. The answer is in the affirmative.Theorem W . 3 ; Let the dynamical system with latent vari ables
Xf
=(4,
F i 4 , I d , '23,) be linear timeinvariant and complete. Then the manifest dynamical system which it represents, C = ( 8 , Wq,
B ) ,
is also linear timeinvariant and complete.In terms of an ARMAsystem of behavioral equations
R ( o , o  ' ) w = M ( u , u  ' ) Z , the previous theorem states that the latent variables I can be completely eliminated from these equations, resulting in an ARsystem of equations. That is, there will exist a set of polynomial matrix ~ ' ( 5 , s') E [ a o X q [ s , s  ' ]
such that the ARsystem of equations R'(o, u
')w
= o repre sents the manifest behavior. In other words, this last set of equations captures all of the restrictions imposed on w by the ARMA equation. This elimination may result in an increase in the lag of the resulting ARsystem as compared to the ARMA system. However, the number of equations in the ARsystem need never be larger than the number of equations of the original ARMAsystem. We will refer to the previous theorem and to the resulting consequence for ARMA and ARequations as theARMAelimination theorem
.'
The results obtained imply that pairs of polynomial matrices
( R ( s , s'), M ( s , s')) with booth
R
and M having the same number of rows and withR
having q columns, provide an other parametrization ofL!
with .x: ( R , M )
(a,
Rq,
( R ( a , u  '))'
im M ( u , u ')), This leads to the equivalence ( R l , M,),,,,(R,,Md.
MAModels
A specially important class of ARMAsystems are those in which R ( s , s
')
= I , yielding the moving average systemw = M ( u , u  l j z (MA) (linearity, time invariance, completeness, etc.) are trivially gen The intrinsic behavior of an MAsystem equals eralized to systems with latent variables. Sometimes it is obvious
im M ( a . 6 ' ) . Of course, by Theorem IV.3,
B
EF q ,
and the variable model (for example, linearity and time invariance)sometimes it is less obvious (for example, completeness). these subspaces of
(W)"
which are images of polynomial operators in the shift. We have seen that every 8 E L! allows an and consider the system of behavioral difference equations Let R(s''
I ) E R g x q ' s ''
'"
M ( s ' s  E R g x d [ S ' s'I
ARrepresentation, that is, that it is the kernel of a polynomialoperator in the shift. Does it also allow an MArepresentation?
that the intrinsic inherits a property from the latent elements of
9
4 which can be represented this way are preciselyIn other words, is it also the image of a polynomial operator in
I?
('
'
')
= M ( '''
11'
(ARMAj
the shift? If not, does the restriction of having an MArepresen relating the timeseriesw :
I
+ R4 (expressing the evolution of tation imply some interesting system theoretic property? It may as a surprise that these abstract questions lead us to the manifest variables) to the latent variable time series I : concept of controllability!Z
+W4.
We will call this an autoregressivemovingaveragesystem (we mean nothing stochastic by this The 6 ~use hof ~(the continuoust,me counterpan
oo
AR and ,ZRMA.mode]sterm R(', ul)w in ARMA is the autoregressive part, _{is implicitly } _{very }_{much present in the inspiring book }_{of }_{Belevich }_{[I]. }_{The } while M ( u , u  ' ) I is called the movingaverage part. Clearly view of modeling implicit in the aork of Belevich is very akin to our own.
these equations represent a dynamical system with latent vari 'Let PEW'^^*. Theorem IV.3 implies that P w i l l map linear shiftin
(z
,w
q ,w
d , 8 f ) with 8,
= ker [ R ( ~, u  1 )1
variant closed subspaces of L** into linear shiftinvariant closed subspaces M ( a .
u I)], From Theorem III.l, we how that precisely of L q ' . Our proof of this fact depends on the Smith form for polynomialevery such system with