IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 36, NO. 3, MARCH 1991, p. 259
Paradigms and Puzzles in the Theory of Dynamical Systems

Jan C. Willems, Fellow, IEEE

"What I have written here makes, in its details, no claim whatsoever to novelty." (Wittgenstein, Tractatus)
Abstract: We will outline the main features of a framework for a theory of dynamical systems. The basic ingredients form a triptych, with the behavior of a system in the center, and behavioral equations and latent variables as side panels. We will discuss a variety of representation and parametrization problems, in particular, questions related to input/output and state models. It is shown how the interconnection of systems fits into this framework, in particular, problems of feedback control. The final issue to which we pay attention is that of system identification. It is argued that exact system identification leads to the question of computing the most powerful unfalsified model.
I. INTRODUCTION

The purpose of this article is to give a self-contained exposition of an approach to mathematical models, in particular, to a theory of dynamical systems. This approach leads to an appealing triptych, with the behavior of a system playing the lead role in center stage, and with behavioral equations and latent variables as the important supporting characters. The framework presented incorporates problems of representation, questions of parametrization, and procedures for identification: algorithms for obtaining models from observed data. Our concept of a dynamical system leads to a new view of the notions of controllability and observability, and of the interconnection of systems, in particular, to what constitutes a feedback control law.
We will focus on dynamical systems because of their obvious importance in control problems. However, we will introduce the concepts in their exquisite simplicity and, therefore, full generality. Also, we will consciously avoid problems of a technical mathematical nature and concentrate instead on mathematical language, on issues related to system structure. This has as one consequence that the emphasis will be on discrete-time dynamical systems. Also, because we want to demonstrate that all this can lead to more than idle generality, we will present many specific results for discrete-time linear time-invariant systems. Our ambition, our hope, our conviction is that the approach and the framework presented here will prove useful and find acceptance on a very elementary pedagogical level. Indeed, the vantage point taken and the circle of ideas revealed here show, more than has been realized before, the unity which exists between elementary mathematical concepts and the first principles underlying mathematical models and dynamical systems.
General Models
Mathematical models as we encounter them in practice may be expressed by ordinary or partial differential equations, they may involve the language of graphs or lattice diagrams, or require the notion of a transfer function or a formal language. It may seem hard to find a common denominator in all this. A perceptive observation, one which can be attributed to control theory, is to look at systems, at devices, as black boxes. Thus, instead of trying to understand, in the tradition of physics, how a device is put together and the detail of how its components work, we are told to concentrate on how it behaves, on the way in which it interacts with its environment. It is this black box point of view which we will formalize in its ultimate generality. However, we will back off from the usual input/output setting, from the processor point of view, in which systems are seen as being influenced by inputs, acting as causes, and producing outputs through these inputs, the internal initial conditions, and the system dynamics. We will retreat from this position by considering all variables a priori on an equal footing and develop the input/output framework as a special case which in many situations can actually be deduced from the original model.

Manuscript received October 10, 1989; revised July 31, 1990. Paper recommended by Past Associate Editor at Large, J. Baillieul. This work was supported by the EEC under Contract SC10433C(A).
The author is with the Mathematics Institute, University of Groningen, 9700 AV Groningen, The Netherlands.
IEEE Log Number 9042134.

We proceed as follows. Assume that we have a phenomenon which we want to model. To start with, we cast the situation in the language of mathematics by assuming that the phenomenon
produces elements in a set U which we will call the universum. Elements of U will be called the outcomes of the phenomenon. Now, a (deterministic) mathematical model for the phenomenon (viewed purely from the behavioral, the black box point of view) claims that certain outcomes are possible, while others are not. Hence a model recognizes a certain subset B of U. This subset will be called the behavior (of the model). Formally:

Definition I.1: A mathematical model is a pair (U, B) with U the universum (its elements are called outcomes) and B ⊆ U the behavior.
Example: The fact that the absolute temperature is nonnegative may be viewed as modeling the phenomenon temperature. Thus U = ℝ and B = [−273, ∞) (with the temperature expressed in °C).
Example: During the ice age, shortly after Prometheus stole fire from the Gods, man realized that H₂O could appear, depending on the temperature, as water, steam, or ice. It took a while longer before this situation was captured in a mathematical model. The generally accepted model is as follows: U = {ice, water, steam} × [−273, ∞) and B = ({ice} × [−273, 0]) ∪ ({water} × [0, 100]) ∪ ({steam} × [100, ∞)).
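In code, a model in the sense of Definition I.1 is nothing more than a membership predicate on the universum. The following sketch (our own illustration, not from the paper) encodes the H₂O model this way; all function names are ours:

```python
# A mathematical model (U, B): the behavior B is a subset of the
# universum U, encoded here as a membership predicate.

def in_universum(outcome):
    phase, temp = outcome
    return phase in {"ice", "water", "steam"} and temp >= -273.0

def in_behavior(outcome):
    """The accepted model of H2O: the phase must match the temperature."""
    phase, temp = outcome
    if not in_universum(outcome):
        return False
    if phase == "ice":
        return -273.0 <= temp <= 0.0
    if phase == "water":
        return 0.0 <= temp <= 100.0
    return temp >= 100.0  # steam

print(in_behavior(("water", 25.0)))   # a possible outcome
print(in_behavior(("ice", 40.0)))     # excluded by the model
```

The model claims nothing beyond this partition of outcomes into possible and impossible.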
Example: Economists believe that there exists a relation between the production P of a particular economic resource, the capital K invested in the necessary infrastructure, and the labor L expended towards its production. A typical model will look like: U = ℝ₊³ and B = {(P, K, L) ∈ ℝ₊³ | P = F(K, L)}, where F: ℝ₊² → ℝ₊ is the production function. Typically, F: (K, L) ↦ αK^β L^γ with α, β, γ ∈ ℝ₊, 0 ≤ β ≤ 1, 0 ≤ γ ≤ 1, all constant parameters depending on the technology.
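This production model is again just a subset of the universum, now cut out by an equation. A sketch, with illustrative parameter values of our own choosing (the paper leaves α, β, γ unspecified):

```python
# Sketch of the production model as a behavior. The parameter values
# below are ours, chosen only for illustration.
alpha, beta, gamma = 2.0, 0.5, 0.5

def F(K, L):
    """Cobb-Douglas-type production function F(K, L) = alpha * K**beta * L**gamma."""
    return alpha * (K ** beta) * (L ** gamma)

def in_behavior(P, K, L, tol=1e-9):
    """(P, K, L) is a possible outcome iff P = F(K, L) (up to rounding)."""
    return P >= 0 and K >= 0 and L >= 0 and abs(P - F(K, L)) <= tol

print(in_behavior(8.0, 4.0, 4.0))   # 2 * 2 * 2 = 8: in the behavior
print(in_behavior(5.0, 4.0, 4.0))   # excluded
```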
Behavioral Equations
In applications, models are more often than not described by equations.
Definition I.2: Let U be a universum, E an abstract set, called the equating space, and f₁, f₂: U → E. The mathematical model (U, B) with B = {u ∈ U | f₁(u) = f₂(u)} is said to be described by (a) behavioral equation(s), and (U, E, f₁, f₂) will be called a behavioral equation representation of (U, B).

The best way of looking at the behavioral equations f₁(u) = f₂(u) is as equilibrium conditions: the behavior B consists of those attributes for which two (sets of) quantities are in balance.
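The equilibrium reading of f₁(u) = f₂(u) can be made concrete with a small sketch of our own (the resistor example and the parameter R are illustrative assumptions, not taken from the paper):

```python
# A behavioral equation representation (U, E, f1, f2): the behavior is
# the set of outcomes u with f1(u) = f2(u). Illustrative example: a
# resistor, with outcome u = (V, I) and Ohm's law V = R*I as the
# balance (equilibrium) condition.
R = 100.0  # our example parameter, in ohms

def f1(u):
    V, I = u
    return V        # one side of the balance

def f2(u):
    V, I = u
    return R * I    # the other side

def in_behavior(u, tol=1e-9):
    return abs(f1(u) - f2(u)) <= tol

print(in_behavior((5.0, 0.05)))  # V = R*I holds: in the behavior
print(in_behavior((5.0, 0.2)))   # out of balance: not in the behavior
```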
A few remarks are in order. First, in many applications models will be described by behavioral inequalities: simply take in the aforementioned definition E to be an ordered space and consider the behavioral inequality f₁(u) ≤ f₂(u). Many models in operations research (e.g., in linear programming) and in economics are of this nature. Second, every model can trivially be considered as coming from behavioral equations. Simply take E to be any set containing 0 and let f be any map from U to E such that B = {u ∈ U | f(u) = 0}. Interesting representation questions occur when we want (f₁, f₂) to have a concrete and appealing form or when we want the behavioral equations to be a set of equations, each allowing a natural and convincing interpretation, bringing into evidence such things as balance equations, conservation laws, simple constitutive laws, field equations, etc. Third, and most importantly, whereas equations uniquely specify the behavior, the converse is obviously not true. Since we have a tendency to think of mathematical models in terms of equations, most models being presented in this form, it is important to emphasize their ancillary role: it is the behavior, the solution set of the behavioral equations, not the behavioral equations themselves, which is the essential result of a modeling procedure.

Input-Output Models
In most cases, the universum U will be a product set U = ∏ U_α (with α ranging over a suitably chosen index set A formalizing the different attributes of elements of the universum) and hence the behavior B ⊆ ∏ U_α becomes a relation between the variables u_α ∈ U_α, α ∈ A. All further mathematical structure on product sets, as maps, functions, graphs, resulting in a relation, is thus in principle applicable in order to further structure mathematical models. An important special case is recovered by assuming that the universum is a product U × Y and that B is the graph of a map. Formally:

Definition I.3: A mathematical model (U × Y, B) with B the graph of a map F: U → Y is called an I/O map, denoted as (U, Y, F); U is called the input space, Y is called the output space, and F is called the system map.

In an I/O map it is possible to interpret the attributes in U as causing the attributes in Y. Note that an I/O map can be viewed as being described by the behavioral equation y = F(u).
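An I/O map is thus the special behavior that happens to be a graph. A sketch of our own, with an illustrative (assumed) system map F:

```python
# An I/O map (U, Y, F): the behavior is the graph of F, a special case
# of a model on the product universum U x Y.

def F(u):
    """Example system map (our illustrative choice): a static saturation."""
    return max(-1.0, min(1.0, u))

def in_behavior(u, y):
    """(u, y) is in the behavior iff y = F(u), i.e., iff it lies on the graph."""
    return y == F(u)

print(in_behavior(0.3, 0.3))   # on the graph of F
print(in_behavior(2.0, 1.0))   # saturated: also in the behavior
print(in_behavior(2.0, 2.0))   # not in the behavior
```

Note that the causal reading (u causes y) is an interpretation added on top of the set B, not part of it.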
Many other properties of mathematical models can be nicely cast in the framework presented, for example, linearity, linearization, symmetry, variational principles, etc. In particular, a mathematical model (U, B) is said to be linear if U is a vector space and B a linear subspace of U.
Parametrization
In our framework, a mathematical model is a very abstract object. When we represent it by behavioral equations it becomes a bit more concrete. Often, however, it is possible to view a model as being induced by some concrete parameters.
Let M be a set, for example a family of mathematical models. In this case, each element M ∈ M denotes a mathematical model (U, B). A parametrization (P, π) of M consists of a set P and a surjective map π: P → M. The set P is called the parameter space. We think of P as a concrete space and of an element p ∈ P as a parameter, for example a set of real or complex numbers. Typically, p determines a behavioral equation and in this way induces a mathematical model. If P is a set of matrices we speak of (P, π) as a matrix parametrization of M. A similar nomenclature is used for polynomial parametrizations, polynomial matrix parametrizations, etc. When P is an abstract space related to M then we will also call (P, π) a representation of M. Hence, representation (abstract) is essentially synonymous to parametrization (concrete). For representations, think of a behavioral equation representation of a mathematical model, or of a representation as an I/O map, etc.

Let (P, π) be a parametrization of M. Note that π is surjective but not necessarily injective. If π is a bijection, we will call the parametrization trim. In any case, the map π: P → M induces a natural equivalence relation ≈ on P defined by {p₁ ≈ p₂} :⇔ {π(p₁) = π(p₂)}. This equivalence relation leads to canonical forms and to invariants. A subset P_c ⊆ P will be called a canonical form¹ for the parametrization (P, π) if P_c ∩ p(mod ≈) is nonempty for all p ∈ P, i.e., if π(P_c) = M, equivalently, if (P_c, π) is itself a parametrization of M. It is called a trim canonical form if P_c ∩ p(mod ≈) consists of exactly one point for all p ∈ P, i.e., if π restricted to P_c is a bijection onto M, equivalently, if (P_c, π) is a trim parametrization of M. A map i: P → S (with S some set) is called an invariant if {p₁ ≈ p₂} ⇒ {i(p₁) = i(p₂)} and a complete invariant if {p₁ ≈ p₂} ⇔ {i(p₁) = i(p₂)}.

In addition to studying problems of canonical forms and invariants related to parametrization, we will also be interested in specifying how one can obtain all elements of a parameter space equivalent to one given element. Usually, this will be achieved by a transformation group G = (S_g, g ∈ G) on the parameter space P. Thus G is a group and each S_g defines a bijection on P with S_{g₁g₂} = S_{g₁} ∘ S_{g₂}. The set P' = {p' ∈ P | there exists g ∈ G such that p' = S_g(p)} is called the orbit of p under G. It is easy to see that a transformation group defines an equivalence relation on P by defining p₁ and p₂ to be equivalent if they are on the same orbit. If this equivalence relation induced by the transformation group G coincides with the equivalence relation ≈ induced by the parametrization (P, π), then we see that by computing the orbit of p under G we obtain a concrete way of constructing all elements which define (under π) the same mathematical model as p ∈ P. Note that invariants of behavioral equations hence refer to functions of the parameters appearing in the equations which will give the same value for all those behavioral equations which lead to the same behavior. Of course, when (P, π) is a parametrization and P_c is a canonical form, then (P_c, π) is also a parametrization. As such, a map i: P → S can be an invariant when restricted to the canonical form P_c, even though it is not an invariant on P.

¹Sometimes the term normal form is used for what we call a canonical form and the term canonical form is reserved for what we call a trim canonical form. However, our nomenclature is consistent with calling the Jordan form a canonical form.
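A toy example of our own may help fix these notions. Parametrize the lines through the origin of ℝ² by p = (r₁, r₂) ≠ (0, 0), inducing the model B(p) = {(w₁, w₂) | r₁w₁ + r₂w₂ = 0}; two parameters are equivalent exactly when one is a nonzero scalar multiple of the other (the transformation group of scalings), and normalizing the first nonzero entry gives a trim canonical form:

```python
# Toy parametrization sketch (our illustration): p = (r1, r2) != (0, 0)
# parametrizes B(p) = { (w1, w2) in R^2 | r1*w1 + r2*w2 = 0 }.

def equivalent(p1, p2, tol=1e-12):
    """p1 ~ p2 iff they induce the same behavior, i.e., are parallel."""
    (a1, b1), (a2, b2) = p1, p2
    return abs(a1 * b2 - a2 * b1) <= tol

def canonical(p):
    """A trim canonical form: scale so the first nonzero entry equals 1."""
    a, b = p
    s = a if a != 0 else b
    return (a / s, b / s)

p1, p2 = (1.0, -2.0), (-3.0, 6.0)
print(equivalent(p1, p2))              # same line, hence same model
print(canonical(p1) == canonical(p2))  # the canonical form removes the redundancy
```

Here the map p ↦ canonical(p) is a complete invariant: it takes equal values precisely on equivalent parameters.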
II. DYNAMICAL SYSTEMS

We will now apply this view of mathematical models in order to set up a language for dynamical systems. There have been many attempts to come up with a suitable axiomatic framework for the study of dynamical systems. Mathematics has chosen for a notion which centers around autonomous state space systems (ẋ = f(x) and its generalizations). Notwithstanding the fact that many areas of physics (as Hamiltonian mechanics and quantum mechanics) fit this framework very nicely, and notwithstanding the impressive phenomenological achievements of this theory (as chaos and strange attractors), this line of thought suffers from two serious drawbacks. First, it considers a system as isolated from its environment and, second, it assumes that a model is already in minimal state form (that is, very roughly speaking, that the differential equations describing the system are first order and explicit in ẋ). Both conditions are rarely satisfied in real applications. In engineering, particularly in control and signal processing, there has always been a tendency to view systems as processors, producing output signals from input signals. In many applications in control engineering and signal processing it will, indeed, be eminently clear what the inputs and the outputs are. However, there are also many applications where this input/output structure is not at all evident (a case in point is the terminal behavior of an electrical circuit). We will view a dynamical system in the logical context of Definition I.1. The only distinguishing feature is that now the phenomenon produces outcomes that are functions of time: the universum is a function space. This leads to the following definition.

Definition II.1: A dynamical system Σ is a triple Σ = (T, W, B) with T ⊆ ℝ the time axis, W the signal space, and B ⊆ W^T the behavior.

Thus, a dynamical system Σ is defined by T, the time instants of interest, W, the space in which the time signals which the system produces take on their values, and B, a family of W-valued time trajectories. The sets T and W define the setting, the mathematization of the problem, while B formalizes the laws which govern the system. According to the model Σ, time signals in B can in principle occur, are compatible with the laws governing Σ, while those outside B cannot occur, are prohibited.

We will use the terms "system" and "dynamical system" interchangeably. Our notion of a dynamical system hardly does justice to the word "system": it only captures the idea of a dynamical model. In particular, the word system as it is used in Definition II.1 does not formalize structure as interconnection, or algorithmic features involved in our intuitive idea of a "system."²

Example (Kepler's Laws): If a definition is to show proper respect and do justice to history, Kepler's laws should provide the very first example of a dynamical system. They do. Take T = ℝ, W = ℝ³, and B = {w: ℝ → ℝ³ | Kepler's laws are satisfied}. Thus, the behavior B in this example consists of the planetary motions which according to Kepler are possible: all trajectories mapping the time axis ℝ into ℝ³ (the position space for the planets) which satisfy his three famous laws. Since for a given map w: ℝ → ℝ³ one can unambiguously decide whether or not it satisfies Kepler's laws, B is indeed well defined. Kepler's laws provide a beautiful example of a dynamical system in the sense of the aforementioned definition, since it is one of the few instances in which the behavior B is described explicitly, and not indirectly in terms of behavioral equations. It took no lesser man than Newton to think up appropriate behavioral equations for this dynamical system.

Most dynamical systems are indeed described by behavioral equations. These are often differential or difference equations, sometimes integral equations. We will postpone the introduction of such descriptions until later. First, we want to introduce some general properties of dynamical systems in our framework. These qualitative properties can be of various types: involving structure (as linearity), symmetry (as time invariance), related to the memory (as the state), involving the interaction with the environment (as inputs and outputs), etc.

Definition II.2: A dynamical system Σ = (T, W, B) is said to be linear if W is a vector space (over a field F; for the purposes of this paper, think of it as ℝ) and B is a linear subspace of W^T (which is a vector space in the obvious way by pointwise addition and multiplication by a scalar).

Thus, linear systems obey the superposition principle in its simplest form: {w₁(·), w₂(·) ∈ B; α, β ∈ F} ⇒ {αw₁(·) + βw₂(·) ∈ B}.

We can view time invariance as a symmetry; we will therefore first introduce symmetries. Let S be a family of dynamical systems, and G = (S_g, g ∈ G) a transformation group on S. Thus G is a group, and S_g defines for each g ∈ G a bijection on S satisfying S_{g₁g₂} = S_{g₁} ∘ S_{g₂}. We will call Σ ∈ S symmetric (with respect to G) if S_g Σ = Σ for all g ∈ G, i.e., if the orbit through Σ induced by the transformation group consists of the one point Σ.

Example: Let S denote all dynamical systems with time axis T = ℤ. Take G = ℤ, viewed as a group under addition. Let σ^t: W^ℤ → W^ℤ denote the backward t-shift: (σ^t f)(t') := f(t' + t). Now define, for t ∈ ℤ, S_t: S → S by S_t((ℤ, W, B)) := (ℤ, W, σ^t B), and G := (S_t, t ∈ ℤ). The dynamical systems which are symmetric in this sense will be called (discrete time) time invariant. This definition can be extended in a trivial way to continuous time (T = ℝ) and, in a bit less trivial way, to arbitrary time sets.

Example: Let S denote all time-invariant dynamical systems with time axis T = ℝ. Consider the (trivial) group consisting of two points: G = {1, g}, g·g = 1. Take S_g((ℝ, W, B)) := (ℝ, W, rev B) with rev: W^ℝ → W^ℝ the time-reversal operator: (rev f)(t) := f(−t). The dynamical systems which are symmetric in this sense are called time reversible. This definition is easily extended to discrete time.

Since time invariance will play an important role in the sequel, we single it out in a definition.

Definition II.3: A dynamical system Σ = (T, W, B) with T = ℤ or ℝ is said to be time invariant if σ^t B = B for all t ∈ T (σ^t denotes the backward t-shift: (σ^t f)(t') := f(t' + t)). If T = ℤ, then this condition is equivalent to σB = B. The analog of this definition when the time axis is ℤ₊ or ℝ₊ requires σ^t B ⊆ B for all t ∈ T.

Important! We will assume, essentially throughout this paper, that the time axis is ℤ (usually) or ℝ (sometimes) (T = ℤ, not ℤ₊; T = ℝ, not ℝ₊) and that our dynamical systems are time invariant. This assumption is made in part for reasons of exposition and notation and in part because it features as an essential assumption in many of the definitions (e.g., time invariance).

²Our definition of a dynamical system as a family of trajectories, without reference to input-output maps or relations, without reference to state variables, and without reference to behavioral equations, has been developed successively in [10]-[14]. We believe that it presents a very succinct, simple, and general starting point for the study of all sorts of questions in the theory of dynamical systems.
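The condition σ^t B = B can be spot-checked computationally once a behavior is encoded as a predicate on trajectories. A sketch of our own (the example law w(t+1) = 2w(t) is an illustrative choice, and the finite window is only a sample of the infinite time axis):

```python
# Sketch: checking time invariance sigma^t B = B on sample trajectories.
# A behavior is encoded as a predicate on functions w: Z -> R; the
# backward t-shift is (sigma^t w)(t') = w(t' + t). Example behavior
# (ours): B = { w | w(t+1) = 2*w(t) for all t }, which is time invariant.

def sigma(w, t):
    """Backward t-shift of the trajectory w."""
    return lambda tp: w(tp + t)

def in_B(w, window=range(-20, 20)):
    """Spot-check the law w(t+1) = 2*w(t) on a finite window."""
    return all(w(t + 1) == 2 * w(t) for t in window)

w = lambda t: 3 * 2 ** t      # a trajectory in B
print(in_B(w))                # w satisfies the law
print(in_B(sigma(w, 5)))      # so does every shift of w
print(in_B(lambda t: t))      # the ramp t -> t violates the law
```

This is, of course, only a finite-window test, not a proof: time invariance is a statement about all t ∈ ℤ.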
Dynamical systems acquire their importance, their idiosyncrasies, their relevance, from the fact that they exhibit memory, from their potential to model phenomena with an aftereffect, from the possibility to describe devices in which the past influences the future. We will, in this paper, introduce a number of themes related to the memory of a system, the most central being the state of a dynamical system. This will be studied later. In this section, we will consider first whether the infinite past or the infinite future are relevant for deciding whether or not a trajectory belongs to the behavior.
Definition II.4: Let Σ = (T, W, B) be a dynamical system (as always, henceforth, with T = ℤ or ℝ and Σ time invariant); Σ is said to be complete if

{w ∈ B} ⇔ {w|[t1, t2] ∈ B|[t1, t2] for all t1, t2 ∈ T, t1 ≤ t2}.

Σ is said to be Δ-complete (Δ ∈ T, Δ ≥ 0) if

{w ∈ B} ⇔ {(σ^t w)|[0, Δ] ∈ B|[0, Δ] for all t ∈ T};

if this holds for all Δ > 0, we call Σ locally specified; if it holds for Δ = 0, we call Σ static.

Second, we will consider how long a trajectory has to be observed before its past and future become independent.
Definition II.5: Let Σ = (T, W, B) be a dynamical system; Σ is said to have memory span Δ (Δ ∈ T, Δ ≥ 0) if {w₁, w₂ ∈ B, w₁(t) = w₂(t) for 0 ≤ t < Δ} ⇒ {w₁ ∧ w₂ ∈ B}. Here ∧ denotes concatenation (at 0), defined as

(w₁ ∧ w₂)(t) := w₁(t) for t < 0, and w₂(t) for t ≥ 0.

Σ is said to have finite memory span if the aforementioned holds for some Δ; it is said to be Markovian if {w₁, w₂ ∈ B, w₁(0) = w₂(0)} ⇒ {w₁ ∧ w₂ ∈ B} (that is, in discrete time, if the memory span is 1); and memoryless if {w₁, w₂ ∈ B} ⇒ {w₁ ∧ w₂ ∈ B} (this may be viewed as memory span 0).

Note that in a system with memory span Δ we have called the past (t < 0) independent of the future (t ≥ Δ), given the system's trajectory for 0 ≤ t < Δ. Indeed, any possible past can then be concatenated with any possible future. We view this as "independence" of the past and the future.³
The notion of completeness will play an important role in the sequel. Intuitively, completeness means that the way a signal looks at t = ±∞ should, per se, be of no consequence as to whether or not the signal obeys the laws of the system. As we shall see, systems described by difference equations are automatically complete (since such laws extend over a finite time interval only). Noncomplete systems occur, for example, when signals are required to be square integrable. For instance, (ℤ, ℝ, ℓ₂(ℤ; ℝ)) is a (trivial) noncomplete system. For further remarks on ℓ₂-systems, see Section XI.3.

Difference and Differential Equations
The notion of behavioral equations is immediately applicable (take U = W^T) to dynamical systems. We will, in this paper, be particularly concerned with difference (and differential) equations.

A behavioral difference equation representation of a discrete-time dynamical system with time axis T = ℤ₊ and signal space W is defined by a nonnegative integer L (called the lag, or the order of the difference equation), an abstract set E (called the equating space), and two maps f₁, f₂: W^{L+1} → E. They define the behavior by

B = {w: ℤ₊ → W | f₁ ∘ (σ^L w, σ^{L−1} w, ···, σw, w) = f₂ ∘ (σ^L w, σ^{L−1} w, ···, σw, w)}.

This is easily generalized to the case T = ℤ. However, in this case, it is logical to consider the equation to have both positive and negative lags, yielding the following behavioral equations:

f₁ ∘ (σ^L w, σ^{L−1} w, ···, σ^{l+1} w, σ^l w) = f₂ ∘ (σ^L w, σ^{L−1} w, ···, σ^{l+1} w, σ^l w).

We will call L − l the lag of this difference equation. It is clear that the system (ℤ, W, B) obtained this way defines a time-invariant dynamical system.

The continuous-time analog of this is a dynamical system represented by a behavioral equation which is a differential equation. The time axis is ℝ, the signal space W a differentiable manifold, L a nonnegative integer (called the order of the differential equation), E the equating space, and f₁, f₂ maps from J^L W into E (J^L W denotes the Lth order jet bundle of W; if W is a nonempty open subset of ℝ^q, then J^L W = W × (ℝ^q)^L). This defines the behavior

B = {w: ℝ → W | f₁ ∘ (w, dw/dt, ···, d^L w/dt^L) = f₂ ∘ (w, dw/dt, ···, d^L w/dt^L)}.

It is not easy to be precise regarding what is meant by a solution of the aforementioned differential equation. We will not belabor this point now but return to it in Section XI.2.

³The terms Markovian and independence are borrowed from stochastics. We use them here in a purely deterministic setting.
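The discrete-time representation above can be sketched in code. Our own illustration, with lag L = 2 and the Fibonacci-type law w(t+2) = w(t+1) + w(t) as the pair (f₁, f₂) (checked, necessarily, on a finite horizon of ℤ₊):

```python
# Sketch of a behavioral difference equation with lag L = 2 on T = Z+:
# f1(sigma^2 w, sigma w, w) = f2(sigma^2 w, sigma w, w), here the
# illustrative Fibonacci-type law w(t+2) = w(t+1) + w(t).

def f1(w2, w1, w0):
    return w2            # the value of sigma^2 w at t, i.e., w(t+2)

def f2(w2, w1, w0):
    return w1 + w0       # w(t+1) + w(t)

def in_B(w, horizon=50):
    """Check the law at every t in a finite initial segment of Z+."""
    return all(f1(w[t + 2], w[t + 1], w[t]) == f2(w[t + 2], w[t + 1], w[t])
               for t in range(horizon - 2))

fib = [0, 1]
for _ in range(48):
    fib.append(fib[-1] + fib[-2])
print(in_B(fib))              # the Fibonacci sequence obeys the law
print(in_B(list(range(50))))  # w(t) = t does not
```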
Let us recapitulate the situation. We have defined the dynamical system in terms of its behavior. Our claim is that this is the natural level of generality in which to think about models. We will consequently always try to define as much as possible qualitative properties of systems in terms of their behavior. We have seen how linearity, time invariance, time reversibility, completeness, the memory span, etc., can be defined in this vein. We have observed how behavioral equations, in particular, difference and differential equations, lead to representations of dynamical systems. Now, given a difference or a differential equation with a certain structure (as, for example, linearity), it is usually obvious that the dynamical system represented enjoys an analogous property (as linearity). The question which we find of much interest is the converse, that is: What properties of the behavior allow the system to be represented by a difference (or a differential) equation of a particular type? The following proposition gives a small taste of the type of result which we have in mind.
Proposition II.6: Let Σ = (ℤ, W, B) be a (discrete-time) dynamical system. The following conditions are equivalent:

i) Σ is time invariant, complete, and has memory span L;
ii) Σ is time invariant and L-complete;
iii) Σ can be described by a behavioral difference equation with lag L.

Proofs of this and other results are collected in Appendix P. The previous proposition clarifies what the essential conditions are for a system to be represented by means of a difference equation: it has to be complete (absence of initialization and termination conditions on the behavior at t = ±∞), and it has to have a finite memory span (observation of a trajectory on a finite interval lets us conclude that what could have happened in the past is independent of what will happen in the future).

III. AR SYSTEMS

Let us now consider the following behavioral difference equations

R_L w(t + L) + R_{L−1} w(t + L − 1) + ··· + R_{l+1} w(t + l + 1) + R_l w(t + l) = 0

where R_L, R_{L−1}, ···, R_{l+1}, R_l ∈ ℝ^{g×q}. Note that this will be the system of difference equations obtained by assuming W = ℝ^q, E = ℝ^g, and f₁, f₂ linear. Any finite set of linear equations relating the outcome time series w₁, w₂, ···, w_q to a finite set of their time lags will lead to a set of equations which in matrix notation can be written in the above form. We will assume that L, l ∈ ℤ, L ≥ l (we find it convenient not to assume that L = 0 or l = 0, even though at this point this would constitute no real loss of generality). Now introduce the polynomial matrix⁴

R(s, s⁻¹) = R_L s^L + R_{L−1} s^{L−1} + ··· + R_{l+1} s^{l+1} + R_l s^l ∈ ℝ^{g×q}[s, s⁻¹].

The above system of difference equations may be written in shorthand in terms of this polynomial matrix as

R(σ, σ⁻¹)w = 0.    (AR)
We will call this an autoregressive system. Note that every polynomial matrix defines such a system, with the number of columns equal to the dimension of the signal space and the number of rows equal to the number of (scalar) equations. It is logical to consider the signal space (ℝ^q) and hence its dimension (q) as fixed by the system. However, the number of equations (g) in (AR) will be a variable: it will depend on the system and on its representation. We will therefore assume that in R(s, s⁻¹) the number of columns is fixed but that the number of rows is in principle free. Hence R(s, s⁻¹) will be considered to be an element of ℝ^{•×q}[s, s⁻¹].

The behavioral difference equation (AR) describes a dynamical system Σ = (T, W, B) with time axis T = ℤ, signal space W = ℝ^q, and behavior B = {w: ℤ → ℝ^q | R(σ, σ⁻¹)w = 0}. Note that if we consider R(s, s⁻¹) ∈ ℝ^{g×q}[s, s⁻¹] as inducing the linear map R(σ, σ⁻¹): (ℝ^q)^ℤ → (ℝ^g)^ℤ, then B = ker R(σ, σ⁻¹). It is easy to see that the system Σ defined above is linear, time invariant, and complete. In fact:
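Membership in ker R(σ, σ⁻¹) can be spot-checked on a finite window. A sketch of our own, in the simplest scalar case q = g = 1 (the encoding of R by its coefficients and all names are ours):

```python
# Sketch: membership in ker R(sigma, sigma^-1) checked on a finite window.
# Illustrative scalar example (q = 1, g = 1): R(s, s^-1) = s - 2, i.e.,
# the AR law w(t+1) - 2*w(t) = 0, whose kernel consists of w(t) = c * 2^t.

R = {1: 1.0, 0: -2.0}        # R(s, s^-1) = 1*s^1 + (-2)*s^0, as {power: coefficient}

def apply_R(R, w, t):
    """( R(sigma, sigma^-1) w )(t) = sum_k R_k * w(t + k)."""
    return sum(coef * w(t + k) for k, coef in R.items())

def in_kernel(R, w, window=range(-15, 15)):
    return all(apply_R(R, w, t) == 0 for t in window)

print(in_kernel(R, lambda t: 7 * 2 ** t))   # in ker R(sigma, sigma^-1)
print(in_kernel(R, lambda t: t))            # not in the kernel
```

For matrix-valued R the dictionary values become g×q arrays and w becomes ℝ^q-valued; the structure of the computation is unchanged.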
Theorem III.1: Let Σ = (ℤ, ℝ^q, B) be a dynamical system. The following are equivalent:

i) Σ is linear, time invariant, and complete;
ii) Σ is linear, time invariant, complete, and has finite memory span;
iii) there exists R(s, s⁻¹) ∈ ℝ^{•×q}[s, s⁻¹] such that B = ker R(σ, σ⁻¹);
iv) B ∈ 𝕃^q (𝕃^q denotes the family of all linear shift-invariant closed subspaces of (ℝ^q)^ℤ, equipped with the topology of pointwise convergence).

AR-equations are pervasive in signal processing, discrete-time control, econometrics, etc. Against this background, it should be of considerable interest to identify, on the level of the behavior, what is really behind the assumption that a system can be described by AR-equations. The aforementioned theorem shows precisely what these conditions are: linearity, time invariance, and completeness. It also provides a succinct topological classification of the kernels of polynomial operators in the shift: they correspond precisely to shift-invariant closed linear subspaces. We will refer to the previous theorem as the AR-representation theorem.

Motivated by the aforementioned theorem, and with a slight abuse of notation, we will use 𝕃^q to denote the set of linear time-invariant complete dynamical systems (ℤ, ℝ^q, B).

⁴The notation at this point becomes very dependent on polynomials and polynomial matrices. A glance at Appendix N may be helpful. The terms autoregressive, moving average, and autoregressive-moving average are borrowed from the theory of stochastic processes and time-series analysis. We use them here in a purely deterministic context.
Parametrization
We will use the nomenclature introduced in Section I.
4
parametrization(P,
a) of a family of dynamical systemsX
consists of a y a c e P, the parameter space, and a surjective map a:D
+X .
We think of P as a space of concrete objects(matrices, polynomials, polynomial matrices). More abstract ways of obtaining
X
will be referred to as representations(e.g., the latent variable, state space, and inputoutput represen tations, which will be studied later on). Theorem 111.1 implies the following proposition:
Proposition 111.2: Let
P
denote the set of linear timein variant complete systems(T,
W, 8)
withT
= 2 andW
=W q .
Then ( R o X q [ s , s'], ker) with ker: R ( s , s')

(0,
W , ker R ( o , uI)) defines a (polynomial matrix) parametri zation ofP
4'.
Since in a parametrization (𝒫, π) it is usually clear from the context what the map π is, we will often refer to 𝒫 itself as the parametrization. Thus, the aforementioned proposition states that ℝ^{•×q}[s, s⁻¹] is a parametrization of 𝔏^q. This makes it evident why polynomial matrices play such an overwhelming role in linear system theory. This parametrization induces an equivalence relation on ℝ^{•×q}[s, s⁻¹], which we will denote by ≈_AR. Thus, for R₁(s, s⁻¹), R₂(s, s⁻¹) ∈ ℝ^{•×q}[s, s⁻¹],

    {R₁ ≈_AR R₂} :⇔ {ker R₁(σ, σ⁻¹) = ker R₂(σ, σ⁻¹)}.
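As an illustrative aside (not part of the original text), the kernel description can be made concrete in a few lines of Python: a trajectory lies in the behavior ker R(σ, σ⁻¹) exactly when the polynomial operator in the shift annihilates it at every instant where it can be evaluated. The helper names below are ours; the sketch treats a scalar AR equation on a finite data window.

```python
def ar_residual(coeffs, w, t):
    """Evaluate sum_k R_k * w(t+k) for a scalar AR equation whose
    polynomial in the shift is given as a dict {shift k: coefficient R_k}."""
    return sum(r * w[t + k] for k, r in coeffs.items())

def in_kernel(coeffs, w):
    """Check that the finite window w satisfies R(sigma)w = 0 at every
    time instant where all the needed samples are available."""
    kmin, kmax = min(coeffs), max(coeffs)
    start = -kmin if kmin < 0 else 0
    return all(ar_residual(coeffs, w, t) == 0 for t in range(start, len(w) - kmax))

# R(s) = s^2 - s - 1, i.e. w(t+2) - w(t+1) - w(t) = 0 (the Fibonacci law)
R = {2: 1, 1: -1, 0: -1}
fib = [1, 1, 2, 3, 5, 8, 13, 21]
print(in_kernel(R, fib))        # a trajectory in the behavior -> True
print(in_kernel(R, [1, 2, 4]))  # a trajectory outside it -> False
```

The same dictionary-of-shifts representation extends to the matrix case, with R_k ∈ ℝ^{g×q} and matrix-vector products in place of the scalar multiplications.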
It is clear how the notions of canonical forms, invariants, and transformation groups become useful in the context of dynamical systems. As already mentioned, behavioral equations specify the behavior uniquely, but the converse is not true: two distinct AR equations can represent the same behavior, the same dynamical system. A canonical form weeds out some of this redundancy; a trim canonical form does this weeding until no redundancy is left. An invariant is a function of the parameters of the behavioral difference equations which actually depends on the induced behavior only, while a complete invariant offers a unique specification of the behavior in terms of the coefficients of the behavioral equations. Recognizing a suitable transformation group will allow us to construct, from one AR system, all AR systems which model the same dynamical system.
We will study efficient canonical forms for AR systems later. In this section we will only show how to reduce the number of equations. At several points in this paper we will study minimal representations. Minimality will always refer to keeping the number of equations as small as possible, and to keeping the number of latent variables (cf. Section IV) as small as possible (within the class of representations of a particular type of a given dynamical system). For AR representations this requires having a minimal number of equations (since there are no latent variables), as expressed in the following definition.
Definition III.3: Let R(s, s⁻¹) ∈ ℝ^{g×q}[s, s⁻¹]. We will call the system of AR equations R(σ, σ⁻¹)w = 0 minimal (for simplicity we also call R minimal) if {R'(s, s⁻¹) ∈ ℝ^{g'×q}[s, s⁻¹], R' ≈_AR R} ⇒ {g' ≥ g}.

Proposition III.4: Every Σ ∈ 𝔏^q admits a minimal AR representation R(σ, σ⁻¹)w = 0. Moreover:
i) {R is minimal} ⇔ {R(s, s⁻¹) ∈ ℝ^{g×q}[s, s⁻¹] is of full row rank};
ii) if R is minimal, then {R'(s, s⁻¹) ∈ ℝ^{•×q}[s, s⁻¹], R' ≈_AR R, R' minimal} ⇔ {R and R' are unimodularly left equivalent (that is, there exists a unimodular U(s, s⁻¹) such that R = UR')}.
This proposition shows how all minimal AR representations may be deduced from one: as the orbit of the transformation group R → UR, where U ranges over the unimodular polynomial matrices of a suitable dimension. This group is the unimodular group acting on the left. Note that the aforementioned proposition implies in particular that the full row rank elements of ℝ^{•×q}[s, s⁻¹] form a canonical form, since each element of 𝔏^q admits a minimal AR representation. We emphasize that when R(σ, σ⁻¹)w = 0 is not minimal, this does not mean that there are necessarily redundant equations in this system of equations. In order to obtain a minimal system it will in general be necessary first to recombine some of the equations, and then to cancel those which have become redundant.
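As a computational aside (ours, not the paper's): full row rank over the Laurent polynomials can be tested by evaluating the polynomial matrix at a generic point, since the rank at any point never exceeds the rank over the rational functions and equals it away from finitely many points. The sketch below flags a non-minimal two-equation AR system whose second row is s times the first; an unlucky evaluation point could understate the rank, so in practice one would retry at a few random points.

```python
from fractions import Fraction

def eval_poly(p, x):
    """Evaluate a polynomial given as a coefficient list p[k] of x**k."""
    return sum(Fraction(c) * x**k for k, c in enumerate(p))

def rank(mat):
    """Row rank of a rational matrix via exact Gaussian elimination."""
    m = [row[:] for row in mat]
    r = 0
    for col in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# R(s) with rows [1, s] and [s, s^2] = s * [1, s]: row rank 1, so the
# two-equation AR system is NOT minimal; one unimodular row operation
# (subtract s times row 1 from row 2) leaves the single equation [1, s].
R = [[[1], [0, 1]],
     [[0, 1], [0, 0, 1]]]   # entries as coefficient lists
s0 = Fraction(7)            # a generic evaluation point
numeric = [[eval_poly(p, s0) for p in row] for row in R]
print(rank(numeric))        # 1: the system is not minimal
```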
IV. LATENT VARIABLES

We will now push dynamical systems into the background for a few moments and discuss again some aspects of general mathematical models. Our view of a mathematical model as expressed in Section I is as follows: identify the variables of the phenomenon which we want to model (specify the universum U) and identify the possible outcomes, the behavior (specify ℬ ⊆ U). However, in most modeling problems, we will need to work with other variables in addition to those which we try to model. We will call these other variables latent variables. They are auxiliary variables, introduced, if for no other reason, because they make it possible to express conveniently the behavior, the outcomes of the variables in U. In order to distinguish them from latent variables, we will call the elements of U manifest variables.⁵ In a bit, we will give a series of instances where latent variables appear. Let us, however, start with two typical concrete examples.

Example: Consider a resistive electrical circuit. It consists of a graph, with nodes and branches. Some branches contain resistors, some are external ports. We want to model the port behavior: the relation between the voltage drop across and the current through the external branches. In order to fix the signs of these, we make the graph into a directed one. Now introduce as auxiliary variables the voltages across and the currents through all the branches. The following relations must be satisfied.
i) Kirchhoff's current law (KCL): the sum of the currents entering each node must be zero.
ii) Kirchhoff's voltage law (KVL): the sum of the voltage drops across the branches of any loop must be zero.
iii) The constitutive laws of the resistors in the resistor branches.
An example is shown in Fig. 1. The following equations must be satisfied (in the notation indicated in Fig. 1):
Constitutive laws:    KCL:               KVL:
R₁I₁ = V₁             I = I₁ + I₂        V₁ + V₄ = V
R₂I₂ = V₂             I₁ = I₃ + I₄       V₂ + V₅ = V
R₃I₃ = V₃             I₂ + I₃ = I₅       V₁ + V₃ = V₂
R₄I₄ = V₄             I = I₄ + I₅        V₃ + V₅ = V₄
R₅I₅ = V₅
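To illustrate the elimination concretely, here is a sketch (our own, assuming the bridge topology of Fig. 1 with five resistor branches; all resistances are set to 1 and the port voltage to V = 1): exact Gauss-Jordan elimination over the rationals removes the ten latent variables and leaves the manifest port relation. For this balanced bridge the effective resistance is 1, so the computed port current equals V.

```python
from fractions import Fraction as F

# Unknowns: V1..V5, I1..I5, I (indices 0..10); port voltage V = 1, all R_k = 1.
V1, V2, V3, V4, V5, I1, I2, I3, I4, I5, I = range(11)
rows = []

def eq(pairs, rhs=0):
    """Append one linear equation sum(c * x_idx) = rhs to the system."""
    row = [F(0)] * 11 + [F(rhs)]
    for idx, c in pairs:
        row[idx] += F(c)
    rows.append(row)

for v, i in [(V1, I1), (V2, I2), (V3, I3), (V4, I4), (V5, I5)]:
    eq([(v, 1), (i, -1)])              # constitutive laws: V_k = R_k * I_k, R_k = 1
eq([(I, 1), (I1, -1), (I2, -1)])       # KCL at the nodes
eq([(I1, 1), (I3, -1), (I4, -1)])
eq([(I2, 1), (I3, 1), (I5, -1)])
eq([(I, 1), (I4, -1), (I5, -1)])
eq([(V1, 1), (V4, 1)], 1)              # KVL around the loops (V = 1)
eq([(V2, 1), (V5, 1)], 1)
eq([(V1, 1), (V3, 1), (V2, -1)])
eq([(V3, 1), (V5, 1), (V4, -1)])

# Gauss-Jordan elimination: eliminate the latent variables exactly.
r = 0
for c in range(11):
    piv = next((k for k in range(r, len(rows)) if rows[k][c] != 0), None)
    if piv is None:
        continue
    rows[r], rows[piv] = rows[piv], rows[r]
    rows[r] = [x / rows[r][c] for x in rows[r]]
    for k in range(len(rows)):
        if k != r and rows[k][c] != 0:
            rows[k] = [a - rows[k][c] * b for a, b in zip(rows[k], rows[r])]
    r += 1

solution = {}
for row in rows[:r]:
    lead = next(c for c in range(11) if row[c] != 0)
    solution[lead] = row[11]
print("port current I =", solution[I])  # I = 1 for V = 1: effective resistance 1
```

The redundant KCL and KVL equations simply reduce to zero rows during the elimination; the remaining pivots determine all eleven variables uniquely.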
⁵The distinction between manifest (signal) variables and latent (auxiliary) variables as used here is quite appropriate, in analogy with the use of these terms in psychology. Latent variables are reminiscent of Rosenbrock's partial state [8] and of the internal variables used off and on in electrical circuit theory.

Fig. 1. A resistive electrical circuit.

Our basic purpose is to express the relation between the voltages and currents at the external branches. In the aforementioned example, this will be a relation of the form
V = RI (where R can be calculated from R₁, R₂, R₃, R₄, and R₅), obtained by eliminating (V₁, …, V₅, I₁, …, I₅) from the previous equations. However, the basic model, the one obtained from first principles, involves the variables (V₁, …, V₅, I₁, …, I₅) in addition to those which we are trying to describe, (V, I). The port voltage V and the port current I are the manifest variables, while the voltages across and the currents through the internal branches (the variables V₁, …, V₅, I₁, …, I₅ in the aforementioned example) are latent variables.
Example: An economist is trying to figure out how much of a package of n economic goods will be produced. As a firm believer in equilibrium theory, our economist assumes that the possible volumes will consist of those points where, product for product, the supply equals the demand. This equilibrium set is a subset of ℝ₊ⁿ. It is the behavior which we are looking for. In order to specify this set, we can proceed as follows. Introduce as latent variables the prices, the supplies, and the demands of the n products. Now determine the supply and demand functions Sᵢ: ℝ₊ⁿ → ℝ₊ and Dᵢ: ℝ₊ⁿ → ℝ₊. Thus Sᵢ(p₁, p₂, …, pₙ) and Dᵢ(p₁, p₂, …, pₙ) equal the amount of product i which will be produced and bought, respectively, when the going prices are p₁, p₂, …, pₙ. This yields the following behavioral equations (i = 1, 2, …, n):

    sᵢ = Sᵢ(p₁, p₂, …, pₙ),  dᵢ = Dᵢ(p₁, p₂, …, pₙ),  Pᵢ = sᵢ = dᵢ.

These behavioral equations describe the relation between the prices pᵢ, the supplies sᵢ, the demands dᵢ, and the production volumes Pᵢ.
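As a toy numerical instance (our own invented numbers, with hypothetical linear supply and demand functions for n = 2 goods): eliminating the latent prices pᵢ from the equilibrium equations yields the manifest production volumes Pᵢ.

```python
from fractions import Fraction as F

# Hypothetical linear laws (all numbers invented for illustration):
#   supply  S_i(p) = a_i * p_i
#   demand  D_i(p) = b_i - c_i * p_i - e * p_j   (j = the other good)
a, b, c, e = [F(2), F(3)], [F(10), F(12)], [F(1), F(1)], F(1)

# Equilibrium s_i = d_i = P_i gives (a_i + c_i) p_i + e p_j = b_i,
# a 2x2 linear system for the latent prices; solve it by Cramer's rule.
A = [[a[0] + c[0], e], [e, a[1] + c[1]]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
p = [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
     (A[0][0] * b[1] - b[0] * A[1][0]) / det]

P = [a[i] * p[i] for i in range(2)]   # manifest production volumes
print(p)  # equilibrium prices 28/11 and 26/11
print(P)  # production volumes 56/11 and 78/11
```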
The Pᵢ's for which these equations are solvable yield the desired behavior. Clearly, this behavior is most conveniently specified in terms of the behavior of the variables pᵢ, sᵢ, dᵢ, and Pᵢ (i = 1, 2, …, n) jointly. These examples illustrate the following definition.
Definition IV.1: A mathematical model with latent variables Σ_L is defined as a triple Σ_L = (U, L, ℬ_f) with U the universum of manifest variables, L the set of latent variables, and ℬ_f ⊆ U × L the full behavior. It defines the manifest mathematical model (U, ℬ) with ℬ := {u ∈ U | ∃ l ∈ L such that (u, l) ∈ ℬ_f}; ℬ is called the manifest behavior. We call (U, L, ℬ_f) a latent variable representation of (U, ℬ).

Notions such as behavioral equations, linearity, symmetry, parametrization, etc., can be generalized in an obvious way to models with latent variables. Thus in the economics example we have U = ℝ₊ⁿ, L = ℝ₊³ⁿ, and ℬ_f described by the behavioral equations given in the example.

The above definition is easily generalized to dynamical systems:
Definition IV.2: A dynamical system with latent variables is defined as Σ_L = (T, W, L, ℬ_f) with T the time axis, W the space of manifest variables, L the space of latent variables, and ℬ_f ⊆ (W × L)^T the full behavior. It defines a latent variable representation of the manifest dynamical system Σ = (T, W, ℬ) with manifest behavior ℬ := {w: T → W | ∃ l: T → L such that (w, l) ∈ ℬ_f}.
Note that in our framework, we view the signal variables as those variables which the model aims at describing. We think of these variables as manifest, as external, as directly observable if you like (even though that suggests much more than we have in mind). We think of the latent variables as auxiliary variables, as internal, as unobservable or, better, as only indirectly observable through the signal variables. Sometimes we will refer to the full behavior as the internal behavior and to the manifest behavior as the external behavior.
Situations where basic models use latent variables, either for mathematical reasons or in order to express the basic laws governing the manifest variables, are manifold. Let us mention a few: internal voltages and currents in electrical circuits in order to express the external port behavior; the momentum in Hamiltonian mechanics in order to describe the evolution of the position; the internal energy and entropy in thermodynamics in order to formulate the first and second laws restricting the temperature and the exchange of heat and mechanical work; prices in economics in order to explain the production and exchange of economic goods; state variables in system theory in order to formalize the relation between inputs and outputs; the wave function in quantum mechanics underlying observables; and, finally, the basic probability space Ω in probability theory: the latent variable space par excellence.

Latent variables will invariably appear whenever we model a system by the method of tearing. The system is then viewed as an interconnection (see Section XII) of subsystems, and the modeling process is carried out by zooming in on the individual subsystems. The overall model is then obtained by combining these models of the subsystems with the interconnection constraints. The ultimate model will invariably contain latent variables: the auxiliary variables introduced in order to express the interconnections will play this role.
ARMAModels
All the notions related to dynamical systems introduced so far
W
= W q , andL
=W d ,
can be described by an ARMAsystem of behavioral equations.6An ARMAsystem induces a manifest dynamical system
( E ,
W4,
B )
with 8 = {w
13 I such that ARMA is satisfied), equivalently,B
= ( R ( a , u'))'im M ( a , u  ' ) with ( )  ' the pointtoset inverse, R ( u , u  ' ) : (R4)" + (Rig)", and M ( o ,6 ' ) :
Wd)"
+ (Fig)". Clearly,( H ,
W , 8 ) is linear and time invariant. The question arises if it is also complete. The answer is in the affirmative.Theorem W . 3 ; Let the dynamical system with latent vari ables
Xf
=(4,
F i 4 , I d , '23,) be linear timeinvariant and complete. Then the manifest dynamical system which it represents, C = ( 8 , Wq,
B ) ,
is also linear timeinvariant and complete.In terms of an ARMAsystem of behavioral equations
R ( o , o  ' ) w = M ( u , u  ' ) Z , the previous theorem states that the latent variables I can be completely eliminated from these equations, resulting in an ARsystem of equations. That is, there will exist a set of polynomial matrix ~ ' ( 5 , s') E [ a o X q [ s , s  ' ]
such that the ARsystem of equations R'(o, u
')w
= o repre sents the manifest behavior. In other words, this last set of equations captures all of the restrictions imposed on w by the ARMA equation. This elimination may result in an increase in the lag of the resulting ARsystem as compared to the ARMA system. However, the number of equations in the ARsystem need never be larger than the number of equations of the original ARMAsystem. We will refer to the previous theorem and to the resulting consequence for ARMA and ARequations as theARMAelimination theorem
.'
The results obtained imply that pairs of polynomial matrices
( R ( s , s'), M ( s , s')) with booth
R
and M having the same number of rows and withR
having q columns, provide an other parametrization ofL!
with .x: ( R , M )
(a,
Rq,
( R ( a , u  '))'
im M ( u , u ')), This leads to the equivalence ( R l , M,),,,,(R,,Md.
MAModels
A specially important class of ARMAsystems are those in which R ( s , s
')
= I , yielding the moving average systemw = M ( u , u  l j z (MA) (linearity, time invariance, completeness, etc.) are trivially gen The intrinsic behavior of an MAsystem equals eralized to systems with latent variables. Sometimes it is obvious
im M ( a . 6 ' ) . Of course, by Theorem IV.3,
B
EF q ,
and the variable model (for example, linearity and time invariance)sometimes it is less obvious (for example, completeness). these subspaces of
(W)"
which are images of polynomial operators in the shift. We have seen that every 8 E L! allows an and consider the system of behavioral difference equations Let R(s''
I ) E R g x q ' s ''
'"
M ( s ' s  E R g x d [ S ' s'I
ARrepresentation, that is, that it is the kernel of a polynomialoperator in the shift. Does it also allow an MArepresentation?
that the intrinsic inherits a property from the latent elements of
9
4 which can be represented this way are preciselyIn other words, is it also the image of a polynomial operator in
I?
('
'
')
= M ( '''
11'
(ARMAj
the shift? If not, does the restriction of having an MArepresen relating the timeseriesw :
I
+ R4 (expressing the evolution of tation imply some interesting system theoretic property? It may as a surprise that these abstract questions lead us to the manifest variables) to the latent variable time series I : concept of controllability!Z
+W4.
We will call this an autoregressivemovingaveragesystem (we mean nothing stochastic by this The 6 ~use hof ~(the continuoust,me counterpan
oo
AR and ,ZRMA.mode]sterm R(', ul)w in ARMA is the autoregressive part, _{is implicitly } _{very }_{much present in the inspiring book }_{of }_{Belevich }_{[I]. }_{The } while M ( u , u  ' ) I is called the movingaverage part. Clearly view of modeling implicit in the aork of Belevich is very akin to our own.
these equations represent a dynamical system with latent vari 'Let PEW'^^*. Theorem IV.3 implies that P w i l l map linear shiftin
(z
,w
q ,w
d , 8 f ) with 8,
= ker [ R ( ~, u  1 )1
variant closed subspaces of L** into linear shiftinvariant closed subspaces M ( a .
u I)], From Theorem III.l, we how that precisely of L q ' . Our proof of this fact depends on the Smith form for polynomialevery such system with