Multilinear models for nonlinear time series

EUI WORKING PAPERS IN ECONOMICS

EUI Working Paper ECO No. 93/2

Multilinear Models for Nonlinear Time Series

Carlo Grillenzoni

© The Author(s). European University Institute. Digitised version produced by the EUI Library in 2020. Available Open Access on Cadmus, European University Institute Research Repository.

Please note

As from January 1990 the EUI Working Paper Series is divided into six sub-series; each sub-series is numbered individually (e.g. EUI Working Paper LAW No. 90/1).

EUROPEAN UNIVERSITY INSTITUTE, FLORENCE

ECONOMICS DEPARTMENT

EUI Working Paper ECO No. 93/2

Multilinear Models for Nonlinear Time Series

Carlo Grillenzoni

BADIA FIESOLANA, SAN DOMENICO (FI)

No part of this paper may be reproduced in any form without permission of the author.

European University Institute, Badia Fiesolana, I-50016 San Domenico (FI), Italy

MULTILINEAR MODELS FOR NONLINEAR TIME SERIES

Carlo Grillenzoni

Dept. of Economics, Univ. of Modena & European University Institute

Abstract - A class of multilinear models for nonlinear time series is introduced. It extends the bilinear ARMA representation of Granger-Andersen by including general monomials of lagged input and output. For this class, algorithms of structure identification and parameter estimation are provided, suitable for dealing with subset models and time varying coefficients. An extended application on real economic data illustrates the framework and makes comparisons with other nonlinear models. Contents: 1. Introduction; 2. Testing; 3. Representation; 4. Identification; 5. Estimation; 6. Conclusions, Appendices.

Key-words - Bilinear processes, Multilinear models, Multicorrelation functions, Recursive pseudolinear regression, Neural networks.

1. Introduction

There is no compelling reason to expect social and environmental time series to conform to dynamic models which are linear in the variables. Such schemes, usually having a finite number of nonlinear parameters, are used for ease of statistical analysis, just as the assumption that variates are normally distributed is made for the convenience of mathematical treatment and interpretation.

Time series analysts have begun to turn their attention to the study of nonlinear stochastic processes. Granger & Andersen (1978) and Subba-Rao & Gabr (1984) have developed a class of models, called Bilinear ARMA, that extends the ARMA representation in the same way as the dynamic bilinear systems (see Rugh, 1981). In mathematical terms, the rationale of the approach is given by taking a second order Volterra expansion of the unknown stochastic function that generates the data.

Other models for nonlinear time series exist, e.g. exponential and threshold autoregressions (see Priestley 1988, Tong 1990, for surveys and comparisons) and, recently, neural networks (see White, 1989). However, the advantage of the bilinear approach is that the resulting equations retain a regression structure so that many algorithms designed for linear models can be applied. On the other hand, its fundamental limitation is that it excludes from the representation the nonlinear terms produced by Volterra series expansions of higher order.

An attempt to fill this gap has been provided by Hinich & Patterson (1985 a,b) with a class of quadratic innovation models having a nonzero bicovariance (third order cumulant) function. Another extension will be proposed in this paper with a multilinear representation that includes all possible monomial combinations of lagged input and output. This approach is suitable for covering, in a parsimonious manner, the higher order moment information contained in a nonlinear time series. At a theoretical level, its derivation has the same starting point as the state dependent models of Priestley (1988).

The central purpose of the paper is to provide a model-building framework for multilinear ARMA models. Special attention will be devoted to the problems of identification of the dynamic structure (order selection) and estimation of the parameters from sample data. Since empirical models often have an irregular (subset) structure and their coefficients are time-varying, suitable technical solutions are given by partial multicorrelation functions (in identification) and recursive pseudolinear regression (in estimation; see Solo, 1978). Loosely speaking, these are generalized moment methods which possess suboptimal properties and are easily implementable.

The paper is organized as follows: Section 2 introduces the case study, focusing on presentation of the data and tests for linearity; an extension of tests presently available is proposed. Section 3 derives a multilinear representation and discusses problems related to its stationarity and stability. In Section 4 techniques of identification mentioned above are developed and applied to the set of data introduced in Section 2. Finally, Section 5 deals with methodological problems related to nonlinear estimation in the case of time-varying parameters; it also concludes the numerical application.

2. Testing

The preliminary step in modeling a nonlinear time series is testing for linearity. This means checking for the existence of nonlinear relationships in general, and then tentatively identifying their typology in terms of classes of models.

Over the last ten years many linearity tests have been proposed in the statistical literature; their general features can be summarized as follows:

1) Tests are carried out utilizing only statistics of residuals generated by linear AR models, which represent the null $H_0$. In fact, it seems reasonable to start the model building by applying tests that do not require the direct estimation of the nonlinear alternative $H_1$.

2) Tests can be classified into two groups, according to whether or not they assume a specific class of nonlinear models (e.g. bilinear, exponential, threshold, etc.) under the alternative hypothesis.

In practical terms, however, many of the tests that do not assume a specific model under $H_1$ implicitly refer to quadratic systems, i.e. to systems that admit at most a second order Volterra expansion (see Keenan, 1985). On the other hand, tests that are designed for a particular $H_1$ may locally have good power in testing for other kinds of alternatives (see Lukkonen et al., 1988). Given this situation of indeterminacy, a reasonable strategy for testing and specifying nonlinear time series consists in comparing several test statistics, with the sole constraint represented by point (1) above. To make these ideas more precise and to motivate further methodological developments, we introduce in advance the numerical application on which the paper focuses.

The data - The application concerns the index of wholesale prices in the period Jan. 1973 - Dec. 1985 (N=156), which was crucial for price inflation in Europe. The original series $\{Z_t\}$ is not very interesting since it exhibits a marked linear trend representable by a random walk plus drift: $Z_t = \mu + Z_{t-1} + z_t$, where $\{z_t\}$ is a zero mean process.

The transformed series $\{z_t\}$ has already been modelled by Grillenzoni (1990) in terms of linear models with time varying parameters, and by Grillenzoni (1991) in the context of multivariable transfer functions. These studies have shown that $\{z_t\}$ can be significantly explained by the series of exchange rates £/$, while the reverse causality (in the Wiener-Granger sense) is completely absent. This evidence supports the viewpoint of Keynesian economists, as opposed to monetarist theories, about the real determinants of inflation. The implications for economic policy are obvious: exchange rates must be controlled by central banks in order to avoid volatility of prices.

In this paper we refer to the series $\{z_t\}$ for detecting and modeling nonlinearity in the variables. The first step is the evaluation of descriptive statistics, such as skewness $\hat s = .679$ and kurtosis $\hat k = 3.124$; the combination of these in the formula $N[\hat s^2 + (\hat k - 3)^2/4]/6 = 11.95$ is asymptotically distributed as a $\chi^2(2)$ under the hypothesis of gaussianity (see Jarque & Bera, 1980). Detailed evidence of non-normality is also provided by the non-parametric density estimate $\hat f(z) = (Nh)^{-1} \sum_{t=1}^{N} K[(z - z_t)/h]$, obtained with a window width $h = .2$ and the kernel $K = N(0,1)$ (see Figure 1b).
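The two descriptive tools above can be sketched in a few lines of Python. This is only a minimal illustration under our own assumptions (population-type sample moments, a Gaussian kernel), not the computation used for the paper; the function names `jarque_bera` and `gaussian_kde` are hypothetical.

```python
import math

def jarque_bera(x):
    """Return (skewness, kurtosis, JB) with JB = N[s^2 + (k - 3)^2 / 4] / 6."""
    n = len(x)
    mean = sum(x) / n
    # Central sample moments of order 2, 3 and 4 (divisor N, an assumption).
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0) / 6.0
    return skew, kurt, jb

def gaussian_kde(z, x, h):
    """Kernel density estimate (N h)^-1 * sum K[(z - x_t) / h] with K = N(0,1)."""
    n = len(x)
    total = sum(math.exp(-0.5 * ((z - v) / h) ** 2) for v in x)
    return total / (n * h * math.sqrt(2.0 * math.pi))
```

Under gaussianity the returned JB value would be compared with the 5% critical value of a $\chi^2(2)$, which is about 6.0.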

Figure 1 (a,b) - Plot of the series $z_t = (1 - B) Z_t - \mu$ and non-parametric density estimate $\hat f(z)$.

Plot of series $\{z_t\}$ in Figure 1a shows a number of "bursts" of variability which are typical of bilinear time series models (Priestley, 1988). Figure 1b confirms the existence of bilinearity in terms of a marked asymmetry of $\hat f(z)$; however, there is also a significant bimodality at $z = 0$ which is typical of threshold AR processes (see Tong, 1990 p.157). Tentative nonlinear models for $\{z_t\}$ are then given by

$$\text{BAR:} \quad z_t = \phi z_{t-1} + \beta z_{t-1} a_{t-1} + \sigma a_t, \qquad a_t \sim IID(0,1)$$

In order to refine these guesses we go back to the introduction of the section and apply the linearity tests described in the literature. As stated at point (1), these simply require the estimation of the model specified under $H_0$; in particular, its residuals

$$H_0: \quad z_t = .01 + .62\, z_{t-1} + \hat a_t, \qquad \hat\sigma_a^2 = .93, \quad R^2 = .38 \qquad (2.1)$$

Table 1 - Sample autocorrelations and bicorrelations of residuals $\{\hat a_t\}$.

Lag                              1     2     3     4     5     6     7     8     9    10    11    12
$r_{\hat a \hat a}(k)$          -.03   .00   .00   .02   .07   .02   .01   .10  -.06  -.07  -.05   .13   Q(24)=16
$r_{\hat a \hat a^2}(k)$         .06   .13  -.04   .12   .04  -.01  -.07  -.08  -.07   .09  -.09   .08
$r_{\hat a^2 \hat a^2}(k)$      -.13   .07  -.14   .01  -.09  -.10  -.01  -.03  -.06  -.01  -.06   .12   Q(24)=21

Some Tests - Table 1 reports autocorrelations and cross-correlations of the residuals and of their squares. The utility of these coefficients in detecting bilinearity and heteroscedasticity (of ARCH type) has been discussed by Maravall (1983), Li (1984), Kumar (1986) and Gabr (1988). A portmanteau test of Ljung-Box type was proposed by McLeod & Li (1983) and refers to the statistic $Q(k) = N \sum_{i=1}^{k} r^2_{\hat a^2 \hat a^2}(i)$. It can be shown that asymptotically $Q(k) = N R^2$, where $R^2$ is the coefficient of determination from the auxiliary regression $\hat a_t^2 = \sigma^2 + \sum_{i=1}^{k} \alpha_i \hat a_{t-i}^2 + e_t$; therefore its meaning as a test for ARCH is apparent.
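A portmanteau statistic on squared residuals can be sketched as follows. This is a minimal illustration assuming the unweighted form $N \sum_i r^2(i)$ and the usual sample autocorrelation with a common mean; the function names `autocorr` and `mcleod_li_q` are hypothetical and the code is not the routine used for Tables 1 and 2.

```python
def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (mean-corrected)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    ck = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return ck / c0

def mcleod_li_q(residuals, k):
    """Portmanteau statistic N * sum_{i=1..k} r(i)^2 on squared residuals."""
    squared = [a * a for a in residuals]
    n = len(squared)
    return n * sum(autocorr(squared, i) ** 2 for i in range(1, k + 1))
```

Under the null the value would be referred to a $\chi^2(k)$ distribution; a weighted (Ljung-Box) variant would rescale each squared autocorrelation.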

Table 2 - Results of typical linearity tests applied to the series $\{z_t\}$.

Type                 Alternative      Value   Critical 5%            Authors
Portmanteau          Bilin.-ARCH      11.8    $\chi^2(12) = 21.0$    McLeod & Li (1983)
Lagr. Multipliers    Bilinear(2,2)     6.3    $\chi^2(4) = 9.5$      Weiss (1986)
Lagr. Multipliers    Exponent.(2)      5.1    $\chi^2(2) = 6.0$      Lukkonen et al. (1988)
Tukey 1 d.f.         Quadratic         7.8    $F(1,150) = 3.9$       Keenan (1985)
Tukey-CUSUM          "Threshold"       5.1    $F(1,140) = 3.9$       Tsay (1989)
Likelihood Ratio     Threshold         7.2    $\chi^2(2) = 6.0$      Tong (1990)

Note: Critical values refer to asymptotic distributions under $H_0$.

Lagrange multiplier tests have been discussed by Weiss (1986) and Lukkonen, Saikkonen & Terasvirta (1988), and refer to the quadratic statistic

$$LM = \hat\sigma_a^{-2} \Big( \sum_t \hat a_t z_{2t} \Big)' \big( M_{22} - M_{21} M_{11}^{-1} M_{12} \big)^{-1} \Big( \sum_t \hat a_t z_{2t} \Big), \qquad M_{ij} = \sum_t z_{it} z_{jt}'$$

where $z_{1t} = [1, z_{t-1} \ldots z_{t-p}]'$ are the regressors of the linear model. The structure of $z_{2t}$ depends on the alternative hypothesis; for example, in a bilinear model of order (2,2) it becomes $z_{2t} = [z_{t-1} \hat a_{t-1}, \ldots, z_{t-2} \hat a_{t-2}]'$, while for a second order exponential AR we have $z_{2t} = [z_{t-1}^3, z_{t-1}^2 z_{t-2}]'$. The choice of the orders under $H_1$ must be carefully made since they determine the power of the test. As before, it can be shown that the LM statistic is asymptotically equivalent to $N R^2$, concerning the regression of $\hat a_t$ on $z_{2t}$.

The test proposed by Keenan (1985) assumes under $H_1$ only a nonlinear model which admits a second order Volterra expansion. The procedure is similar to Tukey's one-degree-of-freedom test for nonadditivity and involves the auxiliary regressions of $\hat z_t^2 = (z_t - \hat a_t)^2$ on $z_{1t}$ (with residuals $\hat e_t$) and of $\hat a_t$ on $\hat e_t$. Tsay (1989) has developed a method which combines Keenan's test with the CUSUM test by Petruccelli & Davies (1986) for threshold autoregressions. The basic steps consist of getting standardized recursive innovations $\hat a_{(t)}$ from the linear model with arranged (sorted) observations $z_{(t)}$, and next in regressing $\hat a_{(t)}$ on $z_{1(t)}$.

The empirical results reported in Tables 1 and 2 show that the sole statistic significant at 1% is the Keenan test; Tsay's test is significant at 5% and so is the LR test by Tong. The latter needs separate consideration since it requires the direct estimation of the model under $H_1$; moreover, since the likelihood function is not differentiable with respect to the threshold parameter (in the example the threshold was set at $z_{t-1} = 0$), standard asymptotic theory does not hold. The value of $-2\log(LR)$ reported in Table 2 is therefore purely indicative.

By interpreting Tsay's test as a test on the nonstationarity of a nonlinear process (recall that it combines Keenan and CUSUM methods), a tentative representation for the series $\{z_t\}$ is then given by the evolving quadratic model $(z_t - \phi_t z_{t-1}) = \alpha z_{t-i} z_{t-j} + a_t$. This conclusion is supported by the work of Grillenzoni (1990), in which it was shown that $\phi_t$ follows a cubic polynomial of time, and by the fact that the diagonal cumulant function $\hat\mu_3(i+k, j+k) = N^{-1} \sum_t (z_t z_{t-i-k} z_{t-j-k})$ is nonzero for $(i,j) = (1,3)$ and $k = 0 \ldots 4$. This feature suggests an extension of the test procedures discussed so far.

Figure 2 (a,b) - Non-parametric estimate of the regression function $E[z_t \,|\, z_{t-1}]$ with $h = 0.2$.

A general test - A fairly general starting point for testing linearity stems from the relationship between non-gaussianity and non-linearity. Linear non-gaussian processes simply arise by passing an input $a_t \sim IID(0,\sigma^2)$ through a linear filter $\psi(B) = \sum_{i=0}^{\infty} \psi_i B^i$. Now assuming $\mu_3 = E(a_t^3) \neq 0$, the typical situation is that $z_t = \psi(B) a_t$ has a third order cumulant (bicovariance) function which is uniformly non-zero, namely

$$\mu_3^z(i,j) = E(z_t z_{t+i} z_{t+j}) = \mu_3 \sum_{k=0}^{\infty} \psi_k \psi_{k+i} \psi_{k+j}$$

To test for the non-gaussianity of $\{z_t\}$ it is then sufficient to check if $\mu_3^z(i,j) \neq 0$ for any $(i,j)$; on the other hand a test for linearity may be developed on the linear innovations $a_t = [z_t - E(z_t \,|\, z_{t-1}, z_{t-2}, \ldots)]$, by checking if $\mu_3^a(i,j) = 0$ for all $(i,j) \neq 0$.

More generally, a global linearity test may refer to the multicorrelation function

$$\rho_{k+1}(i_1 \ldots i_k) = \frac{\gamma_{k+1}(i_1 \ldots i_k)}{\Delta_{k+1}}$$

for any $(i_1 \ldots i_k) \neq (0 \ldots 0)$, where $\Delta_{k+1}$ is a maximal value for the $(k+1)$-th order cumulant function. Various choices are available for $\Delta$, such as, in increasing order

$$\gamma_{k+1}(0 \ldots 0) = E(a_t^{k+1}), \qquad \Delta_{k+1} = \{ E(a_t^2)\, E[A_t - E(A_t)]^2 \}^{1/2}, \qquad [E(a_t^2)]^{(k+1)/2}$$

with $A_t = \prod_{j=1}^{k} a_{t-i_j}$; however, the most suitable in terms of cross-correlation interpretation is the second. The meaning of the above framework is that of reducing tests for linearity to tests for the independence of linear innovations. This involves the estimation of the filter $\psi(B)$, the generation of residuals $\hat a_t = \hat\psi(B)^{-1} z_t$, the computation of the statistics $\hat A_t = \prod_{j=1}^{k} \hat a_{t-i_j}$, $\bar A_N = N^{-1} \sum_{t=1}^{N} \hat A_t$, and finally

$$r_{k+1}(i_1 \ldots i_k) = \frac{\sum_t \hat a_t (\hat A_t - \bar A_N)}{\big[ \sum_t \hat a_t^2 \sum_t (\hat A_t - \bar A_N)^2 \big]^{1/2}} \sim N\Big( 0, \frac{1}{N - \max(i_j)} \Big)$$

The approximate distribution of $r_{k+1}(\cdot)$ holds under the null $H_0: \hat a_t = a_t \sim IID$, with $E(a_t^{k+1}) < \infty$, and for $N$ sufficiently large; the argument of the proof is similar to those of standard time series analysis (see McLeod & Li (1983) and Section 4).
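The sample multicorrelation can be sketched as a plain cross-correlation between the residual and the product of its lagged values. The Python fragment below is a minimal illustration, assuming that sums and the mean of the products run over the usable range $t > \max(i_j)$; the function name `multicorrelation` is hypothetical.

```python
import math

def multicorrelation(a, lags):
    """Return (r, se): cross-correlation of a_t with A_t = prod_j a_{t - i_j},
    and the approximate standard error 1 / sqrt(N - max(i_j)) under the null."""
    n = len(a)
    start = max(lags)
    a_t = a[start:]
    # Products of lagged residuals, one value per usable time index.
    prod_t = []
    for t in range(start, n):
        p = 1.0
        for i in lags:
            p *= a[t - i]
        prod_t.append(p)
    mean_prod = sum(prod_t) / len(prod_t)
    num = sum(x * (p - mean_prod) for x, p in zip(a_t, prod_t))
    den = math.sqrt(sum(x * x for x in a_t)
                    * sum((p - mean_prod) ** 2 for p in prod_t))
    return num / den, 1.0 / math.sqrt(n - start)
```

With `lags=(6, 8)` the function would return a bicorrelation of the type $r_3(6,8)$, and with three lags a tricorrelation $r_4(\cdot)$; a coefficient larger than about two standard errors in absolute value would be flagged at the 5% level.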

Application of the above test to the residuals of model (2.1) has provided several indications of dependence that contradict the results of Table 1. Referring to bi- and tricorrelation functions, a 5% significance level and k=20 as the maximum lag allowed, the number of significant coefficients turned out to be greater than 100. Non-trivial examples were $r_3(6,8) = .25$, $r_3(4,6) = -.22$, $r_3(3,3) = -.23$ and $r_4(6,6,8) = .31$, $r_4(14,14,20) = .23$, $r_4(5,5,12) = .25$, $r_4(12,12,16) = -.28$, $r_4(1,3,5) = .26$, amongst many others. This complexity requires that more general representations for non-linear time series be sought.

3. Representation

A natural extension of the autoregressive moving average (ARMA) model $z_t = (\phi_1 z_{t-1} + \ldots + \phi_p z_{t-p} + \theta_1 a_{t-1} + \ldots + \theta_q a_{t-q}) + a_t$, $a_t \sim IN(0,\sigma^2)$, toward a representation nonlinear in the variables, can be realized by considering a general nonlinear function of the vector of "regressors" $x_t' = [z_{t-1} \ldots z_{t-p}, a_{t-1} \ldots a_{t-q}] = \{x_{it}\}$ (see Priestley, 1988 p.92)

$$\text{NARMA:} \quad z_t = f(z_{t-1} \ldots z_{t-p},\, a_{t-1} \ldots a_{t-q}) + a_t, \qquad a_t \sim IID(0,\sigma^2) \qquad (3.1)$$

With this formulation $\{a_t\}$ play the role of innovations of the nonlinear process $\{z_t\}$ and $f(x_t)$ that of projection on the past information: $E[z_t \,|\, \Im_{t-1} = (z_{t-1}, z_{t-2}, \ldots)]$.

Instead of proceeding as in the derivation of the state dependent model (SDM, Priestley, 1988 p.93), i.e. by expanding $f(\cdot)$ in a first order Taylor series about any fixed point $x_0$, we now consider a general expansion about the origin in terms of Volterra series. By assuming $f(\cdot)$ analytic (i.e. differentiable of every order) around $x_t = 0$, we may get the expansion

$$f(x_t) = \beta_0 + \sum_{i=1}^{m} \beta_i x_{it} + \sum_{i=1}^{m} \sum_{j=1}^{m} \beta_{ij} x_{it} x_{jt} + \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{k=1}^{m} \beta_{ijk} x_{it} x_{jt} x_{kt} + \ldots$$

where the various sums define, respectively, linear, bilinear, trilinear, ..., forms in the pseudo regressors $x_{it}$, $i = 1, 2 \ldots (p+q) = m$. The explicit constant term $\beta_0 = f(0)$ may have a significant role in the dynamic behaviour of the resulting multilinear framework, especially when a finite series has to be used in practice. Here, the Multilinear ARMA model simply arises by truncating the above expansion and taking any subset structure:

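$$z_t = \beta_0 + \sum_{j=1}^{n} \; \sum_{k_1 + \ldots + k_m = j} \beta_{k_1 \ldots k_m} \prod_{i=1}^{m} x_{it}^{k_i} + a_t, \qquad a_t \sim IID(0,\sigma^2) \qquad (3.2)$$

$$x_{it} = \begin{cases} z_{t-i}, & i = 1, 2 \ldots p \\ a_{t-(i-p)}, & i = p+1 \ldots m \end{cases} \qquad \beta_{k_1 \ldots k_m} \propto \frac{\partial^j f(0)}{\partial^{k_1} x_{1t} \ldots \partial^{k_m} x_{mt}}$$

The degree $n$ depends on the shape of $f(\cdot)$ and requirements of accuracy; the coefficients $\{\beta_{k_1 \ldots k_m}\}$ are subsequences of the Volterra kernels. In certain cases, the justification of (3.2) also stems from ordinary series expansions, as the following example shows.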

Example 1 - Consider the model $z_t = z_{t-1} \exp(-a_{t-1}) + a_t$; hence from the expansion $\exp(-a) = \sum_{k=0}^{\infty} (-1)^k a^k / k!$ we have, by truncation, $z_t = z_{t-1} - z_{t-1} a_{t-1} + z_{t-1} a_{t-1}^2 / 2 - z_{t-1} a_{t-1}^3 / 6 + a_t$, which is a multilinear model.
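The quality of the truncation in Example 1 can be checked numerically. The following Python sketch compares one recursion of the exponential model with its multilinear approximation; the function names `step_exact` and `step_multilinear` and the numerical values used when calling them are our own illustrative choices.

```python
import math

def step_exact(z_prev, a_prev, a_now):
    """One recursion of z_t = z_{t-1} * exp(-a_{t-1}) + a_t."""
    return z_prev * math.exp(-a_prev) + a_now

def step_multilinear(z_prev, a_prev, a_now, order=3):
    """Same recursion with exp(-a) replaced by its Taylor polynomial
    sum_{k=0..order} (-a)^k / k!, which gives a multilinear model."""
    poly = sum((-a_prev) ** k / math.factorial(k) for k in range(order + 1))
    return z_prev * poly + a_now
```

For small innovations the two recursions are close; for $0 < a_{t-1} < 1$ the error of the third order truncation is bounded by $|z_{t-1}| \, a_{t-1}^4 / 24$, the first neglected term of the alternating series.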

Model (3.2) can also be viewed as a direct generalization of the Bilinear ARMA system, e.g. taking $n = 2$ we may get the quadratic model

$$z_t = \sum_{i=1}^{p} \phi_i z_{t-i} + \sum_{j=1}^{q} \theta_j a_{t-j} + \sum_{i=1}^{r} \sum_{j=1}^{s} \beta_{ij} z_{t-i} a_{t-j} + \sum_{i=1}^{P} \sum_{j=1}^{Q} \alpha_{ij} z_{t-i} z_{t-j} + \sum_{i=1}^{R} \sum_{j=1}^{S} \delta_{ij} a_{t-i} a_{t-j} + a_t \qquad (3.3)$$

where $(r,P,R) \leq p$, $(s,Q,S) \leq q$ and $[\phi_i, \theta_j; \alpha_{ij}, \delta_{ij}]$ are subsets of the coefficients $[\beta_i; \beta_{ij}]$. Quadratic terms like $(z_{t-i} z_{t-j})$, $(a_{t-i} a_{t-j})$ were excluded from the bilinear representation by Granger & Andersen (1978) on the basis of the fact that they may raise difficult problems of stationarity and invertibility. However, Hinich & Patterson (1985 a,b) have shown, on real economic data, the effectiveness of quadratic innovation models of the type $z_t = \sum_i \sum_j \delta_{ij} a_{t-i} a_{t-j} + a_t$. Unlike the bilinear ARMA (see Kumar, 1986; Gabr, 1988), this class always generates nonzero third order cumulants and thus it is a candidate to represent more complex processes.

Stability and Stationarity - Stability properties are desirable features for dynamic models, since they determine the reliability of the forecasting and control rules as well as the performance of the parameter estimators. As a general definition of stochastic stability we adopt the principle that to inputs $\{a_t\}$ bounded in probability, there must correspond outputs $\{z_t\}$ uniformly bounded in probability. While this condition may allow for the existence (finiteness) of some moments, asymptotic stationarity is a stronger concept since it enables the same moments to have constant asymptotic expressions. On the other hand, strict stationarity does not presume the existence of any moment, since the densities associated with $\{z_t\}$ may be Cauchy with constant parameters.

In past years, most of the theoretical research in nonlinear time series has been concerned with finding parametric conditions for the existence of convergent solutions to the various models. Specifically, given $a_t \sim IID$, if there exists a unique measurable function $g: \mathbb{R}^\infty \to \mathbb{R}$ such that $z_t = g(a_t, a_{t-1}, a_{t-2}, \ldots)$ almost surely for all $t = 0, 1, 2 \ldots$ then the process $\{z_t\}$ is strictly stationary and ergodic (see Stout, 1974 p.182). Note, by contrast, that stochastic stability simply requires that for any input bounded in probability the solution $g_t(a_t \ldots)$ does not diverge. Recently, a general technique of analysis of the ergodicity has been exploited by Tong (1990, Chap.4), assuming that the nonlinear models be representable in terms of a vector Markov chain $x_t = f(x_{t-1}) + e_t$ with $f(\cdot)$ analytic and $e_t \sim IID(0, \Sigma < \infty)$. Unfortunately, this feature does not hold for multilinear models, and therefore Tong's method cannot be applied to equations (3.2)-(3.3).

Even restricting the treatment to bilinear ARMA models (i.e. $P = Q = R = S = 0$ in (3.3)), compact parametric conditions of stationarity have been established only for particular orders $(p,q,r,s)$. For the superdiagonal model $z_t = \sum_{i=1}^{p} \phi_i z_{t-i} + \sum_{j=1}^{r} \beta_{j1} z_{t-j} a_{t-1} + a_t$ with $E(a_t^2) = \sigma^2 < \infty$, which in vector notation becomes $z_t = \phi' \underline{z}_{t-1} + \beta' \underline{z}_{t-1} a_{t-1} + a_t$, Bhaskara-Rao, Subba-Rao & Walker (1983) have obtained the following conditions

$$\text{stationarity:} \quad \lambda_{\max}(\Phi \otimes \Phi + B \otimes B\, \sigma^2) < 1 \qquad (3.4a)$$

$$\text{invertibility:} \quad \beta' E(\underline{z}_{t-1} \underline{z}_{t-1}') \beta < 1 \qquad (3.4b)$$

where $\Phi' = [\phi : I_{p-1}]$, $B' = [\beta : 0_{p-1}]$ and $\lambda_{\max}(\cdot)$ denotes the spectral radius. If (3.4a) is satisfied, then the equivalent markovian representation $\underline{z}_t = \Phi \underline{z}_{t-1} + B \underline{z}_{t-1} a_{t-1} + c\, a_t$, with $c' = (1 : 0_{p-1})$, admits the multilinear MA solution $\underline{z}_t = \sum_{i=1}^{\infty} \prod_{j=1}^{i} (\Phi + B a_{t-j}) c\, a_{t-i} + c\, a_t$ that converges in mean square for all $t$. Analogously, under (3.4b) there exists a unique function $h: \mathbb{R}^\infty \to \mathbb{R}$ such that $a_t = h(z_t, z_{t-1}, z_{t-2}, \ldots)$ converges with probability one for all $t$.

Following this approach, Pham (1986) and Liu & Brockwell (1988) have derived stationarity constraints for a general BARMA model, which are very complicated and difficult to apply. In any case, even though conditions (3.4) seem more transparent, they remain of limited practical value in view of the following remarks:

i) It is not clear what they actually mean in terms of the system parameters $\{\phi_i, \beta_{j1}\}$. For example, it can be shown that stationarity in mean is ensured by $\lambda_{\max}(\Phi) < 1$, and this is clearly equivalent to the stability of the AR polynomial $\phi(B)$, but what are the parametric consequences of (3.4b)?

ii) They are not concerned with cumulants and higher order moments involved by nonlinear algorithms of estimation. Conditions for the $k$-th order stationarity of bilinear models could lead to severe requirements on their parameters, difficult to fulfill in practice. Two examples better illustrate these points.

Example 2 - Let $z_t = \beta z_{t-k} a_{t-h} + a_t$ with $k > h > 0$; in this case condition (3.4b) means $\beta^2 E(z_{t-k}^2) < 1$. Squaring $z_t$, we get the difference equation $E(z_t^2) = \beta^2 \sigma^2 E(z_{t-k}^2) + \sigma^2$; if the condition of stationarity (3.4a) holds, i.e. $\beta^2 \sigma^2 < 1$, the asymptotic solution of this equation leads to the invertibility requirement $\beta^2 \sigma^2 / (1 - \beta^2 \sigma^2) < 1$, i.e. $\beta^2 \sigma^2 < .5$.

Example 3 - Consider the above example with $a_t \sim IN(0,\sigma^2)$ gaussian; having $E(z_{t-k} a_{t-h}) = 0$, taking fourth moments we get $E(z_t^4) = (\beta^4 3 \sigma^4) E(z_{t-k}^4) + 6 \beta^2 \sigma^4 E(z_{t-k}^2) + 3 \sigma^4$. Solving this difference equation, a necessary condition for the 4-th order stationarity of $\{z_t\}$ becomes $\beta^2 \sigma^2 < 1/\sqrt{3} \approx .6$. As for invertibility, this condition is stronger than (3.4a).
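The three parametric regions of Examples 2 and 3 can be compared with a small Python sketch. It only encodes the inequalities derived above for the model $z_t = \beta z_{t-k} a_{t-h} + a_t$ with gaussian input; the function names are hypothetical.

```python
def second_moment(beta, sigma2):
    """Stationary E(z^2) = sigma^2 / (1 - beta^2 sigma^2), or None if
    the second order stationarity condition beta^2 sigma^2 < 1 fails."""
    b = beta * beta * sigma2
    if b >= 1.0:
        return None
    return sigma2 / (1.0 - b)

def is_invertible(beta, sigma2):
    """Invertibility requirement beta^2 E(z^2) < 1, i.e. beta^2 sigma^2 < .5."""
    m2 = second_moment(beta, sigma2)
    return m2 is not None and beta * beta * m2 < 1.0

def has_fourth_moment(beta, sigma2):
    """Necessary condition 3 beta^4 sigma^4 < 1 for 4-th order stationarity."""
    return 3.0 * (beta * beta * sigma2) ** 2 < 1.0
```

With $\sigma^2 = 1$, for instance, $\beta = .75$ gives a process that is second order stationary and admits fourth moments under the necessary condition, but violates the invertibility requirement.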

When autoregressive components are present, the constraints tend to become even more severe. For example, in the model $z_t = \phi z_{t-1} + \beta z_{t-1} a_{t-1} + a_t$ the existence of 4-th order moments requires $[\phi^4 + 6(\phi \beta \sigma)^2 + 3(\beta \sigma)^4] < 1$ (see Sesay & Subba-Rao, 1988). It is then clear that conditions of stationarity for complex nonlinear models not only are difficult to establish but may not exist at all; that is, the region of stationarity in the parameter space might be empty.

These comments tend to discourage the analysis of the stability properties of multilinear models (3.2)-(3.3). Granger & Andersen (1978) have heuristically shown the virtual nonstationarity and non-invertibility of the schemes $z_t = \alpha z_{t-1}^2 + a_t$, $z_t = \delta a_{t-1}^2 + a_t$. However, unless the contrary is proved, one cannot exclude that suitable properties may hold, even though locally and for particular realizations, for the quadratic model (3.3).

With respect to the nonlinear system (3.1), if $|f(x_t)| < |x_t|$, i.e. $f(\cdot)$ is a contraction mapping, the process $\{z_t\}$ is stochastically stable. Examples of this kind are given by systems that can be reduced in the form of rational transfer functions, e.g.

$$(1 + \delta z_{t-1} a_{t-1}) z_t = (\alpha - \beta z_{t-1} a_{t-1}) z_{t-1} + a_t (1 + \delta z_{t-1} a_{t-1}), \qquad \left| \frac{\alpha - \beta z_{t-1} a_{t-1}}{1 + \delta z_{t-1} a_{t-1}} \right| < 1$$

However, these models are nonlinear in the parameters and thus require complicated algorithms of identification, estimation and forecasting. The peculiar feature of representation (3.2) is given by its regression structure which allows for direct application of many recursive procedures of standard time series analysis.

Strong realistic considerations that may mitigate the picture of uncertainty outlined so far, are given by the following observations:

a) Stationarity properties are certainly suitable features, but they are often concerned with abstract asymptotic behaviour of the output of the models. In modeling real data, users typically deal with finite sampling intervals; moreover, many observational time series (mostly in economics) are nearly divergent in nature.

b) The parameters of the models may be time-varying (deterministically or stochastically). In this case, issues of convergence and stationarity do not arise by definition; moreover, the change of the "regression" coefficients may have a stabilizing effect on the behaviour of the output $\{z_t\}$. Specifically, even if the region of stationarity of the constant parameter model is empty, or nearly so, there may exist sequences of time-varying coefficients $\{\beta_t\}$ which force $\{z_t\}$ to be bounded in probability.

Example 4 - Situations of this kind can be illustrated by simulations. In a number of experiments, we have assessed, for example, that the quadratic process $z_t = \alpha z_{t-1}^2 + a_t$, with $z_0 = 0$ and $a_t$ uniformly distributed in the interval $[-1.5, +1.5]$, tends to overflow within 1000 recursions for $|\alpha| > .30$. However, if $\{\alpha_t\}$ oscillates in the ring $.30 < |\alpha_t| < .40$ (e.g. $\alpha_1 = +.35$, $\alpha_2 = -.35$, $\alpha_3 = +.35$ ... etc.) the process $\{z_t\}$ may not diverge. In practice, there are infinitely many trajectories of $\{\alpha_t\}$, lying outside the region of stability, which confine $\{z_t\}$ within finite bounds.
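An experiment of this kind can be sketched in Python as follows. The code is a minimal illustration, not the program used for the experiments above: the function names, the overflow bound `1e6` and the seed are our own assumptions, and the random draws will of course differ from those of the original simulations.

```python
import random

def first_exit(alphas, noise, bound=1e6):
    """Run z_t = alpha_t * z_{t-1}^2 + a_t from z_0 = 0 and return the first
    t at which |z_t| exceeds the bound, or None if it never does."""
    z = 0.0
    for t, (alpha, a) in enumerate(zip(alphas, noise), start=1):
        z = alpha * z * z + a
        if abs(z) > bound:
            return t
    return None

def uniform_noise(n, seed=0):
    """n draws from the uniform distribution on [-1.5, +1.5]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.5, 1.5) for _ in range(n)]

n = 1000
constant = [0.35] * n                                           # fixed coefficient
alternating = [0.35 if t % 2 == 0 else -0.35 for t in range(n)]  # +.35, -.35, ...
noise = uniform_noise(n)
exit_constant = first_exit(constant, noise)
exit_alternating = first_exit(alternating, noise)
```

A fully reproducible check is obtained by feeding the input held constant at the upper end of its interval, `[1.5] * n`: with the fixed coefficient the recursion then explodes after a few steps, whereas with the alternating coefficient it settles into a bounded two-point cycle.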

In the sequel we shall use the additional degree of freedom represented by the variability of the coefficients for enlarging the region of stability of the multilinear models and for weakening the parametric conditions of existence of their moments. This means, in practice, making statistical inference under the next working hypotheses :

Assumptions A - Consider the class of processes (3.2) with deterministically varying coefficients $\{\beta_t\}$ and distribution functions $F_t(\cdot)$. Then, for every input $\{a_t\}$ bounded in probability [i.e. $\sup_t P(|a_t| = \infty) = 0$], there exist trajectories of $\{\beta_t\}$ such that:

(A1) $\{z_t\}$ is asymptotically independent: $\phi(m) = \sup_t |F_t(z_t \,|\, z_{t-m}) - F_t(z_t)| \to 0$ as $m \to \infty$;

(A2) $\{z_t\}$ is strictly bounded: $\inf_t P(|z_t| < M) = 1$, $M < \infty$.

Condition (A1) is a particular version of the so-called $\phi$-mixing property (see Stout, 1974); condition (A2) enables the existence of moments of every order (the so-called moment property), that is (A2) $\Rightarrow \sup_t E|z_t|^k < \infty$ for all $k < \infty$. This result follows immediately from the fact that, given any real random variable $Z$, the existence of $E|Z|^k$, $k < \infty$, implies that $z^{k+\delta} P(|Z| > z) \to 0$ as $|z| \to \infty$ with $\delta \leq 0$; on the other hand, the converse holds for any value $\delta > 0$ (see Laha & Rohatgi, 1979 p.38). Hence, there must be a precise relationship between extreme values that $\{z_t\}$ may assume and the mass of probability of the corresponding events.

With this in mind we can state that if only moments up to order $K < \infty$ are concerned in the analysis of the process $\{z_t\}$, then (A2) may be weakened as:

(A2)' $\sup_t P(|z_t| > z) = o(1/z^{K+\delta})$ as $|z| \to \infty$ with $\delta > 0$.

Instead, a requirement stronger than (A1) is represented by

(A1)' $\phi(m) = \sup_t \sup_n |F(z_{t_1} \ldots z_{t_n} \,|\, z_{t-m}, z_{t-m-1}, \ldots) - F(z_{t_1} \ldots z_{t_n})| \to 0$ as $m \to \infty$.

For identification purposes we also need the following regularity conditions.

Assumptions B - For any model that satisfies Conditions (A), the sequence of coefficients $\{\beta_t\}$ is bounded and has a well defined time-average behaviour (B1). Moreover, it enables the process $\{z_t\}$ to be quasi-stationary of order $K$ (B2).

It is important to stress the structure of the asymptotic average operator $\bar E(\cdot)$ since it has a fundamental role in defining suitable parameters for off-line inference.

Forecasting - A useful practical consequence of the analysis of stationarity is that the asymptotic solutions $z_t = g(a_t, a_{t-1}, a_{t-2} \ldots)$ to the various nonlinear representations have a direct utilization by forecasting algorithms. By contrast, the computation of the optimal multistep predictor from the models in their original form of difference equations is strongly affected by the presence of nonlinear regressors, such as $(z_{t-h} a_{t-k})$, $h < k$. These difficulties clearly increase in the case of multilinear systems, so that suboptimal and pragmatic solutions must be sought.

Given a model and the set of information $\Im_t$ up to time $t$, the task is to find the expression of $\hat z_t(l)$, the predictor of $z_{t+l}$ optimal in the mean square sense. It is well known that

$$\hat z_t(l) = \arg\min E\{ [z_{t+l} - \hat z_t(l) \,|\, \Im_t]^2 \} = E[z_{t+l} \,|\, \Im_t]$$

but this conditional expectation is in general linear only for $\{z_t\}$ gaussian. In the general nonlinear process (3.1) we easily find $\hat z_t(1) = f(x_{t+1})$, therefore a simplified multistep predictor can be obtained by extrapolating the identified function in the form of a deterministic difference equation.

The application of this approach to the multilinear model (3.2) involves approximations of the kind $E[z_{t+l} z_{t+h} z_{t+k} \,|\, \Im_t] \approx \hat z_t(l)\, \hat z_t(h)\, \hat z_t(k)$, $(l,h,k) \geq 1$, such as in the one step ahead forecast. However, when lagged values of $\{a_t\}$ are present in the "regressors", bad results may be generated and other solutions must be attempted. In practice, the above strategy corresponds to combining forecasts generated by sub-models.

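Example 5 - Consider the model $z_t = \alpha z_{t-1}^2 + \beta z_{t-1} z_{t-2} + a_t$. The optimal one step ahead predictor is $\hat z_t(1) = (z_{t+1} - a_{t+1})$; similarly $\hat z_t(2) = \alpha E[z_{t+1}^2 \,|\, \Im_t] + \beta E[z_{t+1} \,|\, \Im_t]\, z_t$, where $E[z_{t+1}^2 \,|\, \Im_t] = E[(\alpha z_t^2 + \beta z_t z_{t-1})^2 + 2(\alpha z_t^2 + \beta z_t z_{t-1}) a_{t+1} + a_{t+1}^2 \,|\, \Im_t] = (\alpha z_t^2 + \beta z_t z_{t-1})^2 + \sigma^2$, hence $\hat z_t(2) = \alpha \hat z_t(1)^2 + \alpha \sigma^2 + \beta \hat z_t(1) \cdot z_t$. Prediction based on sub-models gives $\tilde z_t(2) = \hat z_t(2) - \alpha \sigma^2$.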
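The two-step predictors of this example can be written down directly. The Python sketch below is only illustrative; the function names and the numerical values in the usage note are our own.

```python
def one_step(z_t, z_tm1, alpha, beta):
    """Optimal one step ahead predictor: alpha z_t^2 + beta z_t z_{t-1}."""
    return alpha * z_t ** 2 + beta * z_t * z_tm1

def two_step_optimal(z_t, z_tm1, alpha, beta, sigma2):
    """Optimal two step predictor: alpha (z1^2 + sigma^2) + beta z1 z_t,
    where z1 is the one step ahead predictor."""
    z1 = one_step(z_t, z_tm1, alpha, beta)
    return alpha * (z1 ** 2 + sigma2) + beta * z1 * z_t

def two_step_submodel(z_t, z_tm1, alpha, beta):
    """Simplified predictor: the difference equation is extrapolated
    deterministically, so the term alpha sigma^2 is dropped."""
    z1 = one_step(z_t, z_tm1, alpha, beta)
    return alpha * z1 ** 2 + beta * z1 * z_t
```

For instance, with $\alpha = .2$, $\beta = .1$, $z_t = 1$, $z_{t-1} = 2$ and $\sigma^2 = 1$ the two predictors differ exactly by $\alpha \sigma^2 = .2$, which is the bias of the simplified solution at horizon 2.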

A pragmatic solution must also be adopted for the variances of prediction errors $\sigma^2(l) = E\{ [z_{t+l} - \hat z_t(l)]^2 \,|\, \Im_t \}$. For $l = 1$ we clearly have $\sigma^2(1) = \sigma^2$, but for general prediction horizons we must resort to empirical estimators based on past forecasts $\hat z_t(l)$, namely

$$\hat\sigma^2(l) = \sum_{\tau = l+1}^{t} [z_\tau - \hat z_{\tau - l}(l)]^2 / (t - l), \qquad l > 1$$

4. Identification

Once the class of models for representing a nonlinear time series has been chosen, a crucial phase in the modeling is given by the selection of its orders. With respect to the class (3.2), the task is difficult since it requires the definition of the structure of the monomials $\{ y_{jt} = \prod_i z_{t-i}^{k_{ij}} \prod_i a_{t-i}^{h_{ij}} \}$, i.e. of the powers $k_{ij}, h_{ij}$. For the sub-class (3.3), which is more regular, two techniques developed for linear and bilinear models may be used.

1) Parametric - By assuming $a_t \sim IN(0,\sigma^2)$, independent normal, the orders are selected by minimizing some information criterion IC = -2 log(likelihood) + f(N) dim(model):

$$\hat n = \arg\min_n \Big\{ \log(\hat\sigma_n^2) + n\, \frac{f(N)}{(N-d)} \Big\}, \qquad n = (p + q + r \cdot s + P \cdot Q + R \cdot S) \qquad (4.1)$$

where $(N-d)$, $d = \max(p,r,P,R)$, is the effective number of observations used for calculating the maximum of the log-likelihood $-(N-d) \log \hat\sigma^2 / 2$ and for normalizing the IC. The function $f(N)$ is what characterizes the kind of IC used in practice; Akaike, Schwarz and Hannan & Quinn have suggested, respectively, $f(N) = 2$, $\log(N)$, $\log(\log(N))$.

2) Nonparametric - This approach simply assumes $a_t \sim IID(0,\sigma^2)$, and it selects models by comparing the sample behaviour of some higher order moments with those theoretically generated by a class of low order models.

Example 6 - Let $z_t = \delta\, a_{t-h}\, a_{t-k} + a_t$, $h < k$; simple calculations show that $\{z_t\}$ is white noise. However, the third order moments $\mu_3(i,j) = E[z_t z_{t-i} z_{t-j}]$ have six nonzero values, namely $\mu_3(h,k) = \mu_3(-k,\, h-k) = \mu_3(k-h,\, -h) = \delta\sigma^4$ plus their permutation symmetries.
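Example 6 lends itself to a quick simulation check. The following sketch assumes a Gaussian input and hypothetical values $\delta = 0.8$, $h = 1$, $k = 3$; the function names are illustrative. It estimates the lag-one autocorrelation, which should be near zero, and the third order moment $E[z_t z_{t-h} z_{t-k}]$, which should be near $\delta\sigma^4$.

```python
import random

def simulate_example6(delta, h, k, n, sigma=1.0, seed=0):
    """Generate z_t = delta * a_{t-h} * a_{t-k} + a_t with h < k (Gaussian input assumed)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, sigma) for _ in range(n + k)]   # first k draws act as presample
    return [delta * a[t - h] * a[t - k] + a[t] for t in range(k, n + k)]

def autocorr(z, lag):
    """Sample autocorrelation of z at the given positive lag."""
    n = len(z)
    m = sum(z) / n
    c0 = sum((v - m) ** 2 for v in z) / n
    cl = sum((z[t] - m) * (z[t - lag] - m) for t in range(lag, n)) / n
    return cl / c0

def third_moment(z, i, j):
    """Sample estimate of E[z_t z_{t-i} z_{t-j}] for i, j >= 0 (zero-mean series)."""
    s = max(i, j)
    return sum(z[t] * z[t - i] * z[t - j] for t in range(s, len(z))) / (len(z) - s)

if __name__ == "__main__":
    z = simulate_example6(delta=0.8, h=1, k=3, n=200_000)
    print(autocorr(z, 1), third_moment(z, 1, 3), third_moment(z, 1, 2))
```

The series looks uncorrelated at second order, yet its dependence shows up clearly in the bi-covariance at lags $(h,k)$.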

Both these approaches are of limited practical value since they rely heavily on the assumption that a true (regular) multilinear system exists. By contrast, data are often generated by irregular (subset) models, having sparse coefficients at unusual lags. The main consequences are, first, that the estimation of information criteria may fail owing to the presence of many insignificant and collinear terms, which make the Hessian matrix associated with the nonlinear estimator ill-conditioned. Secondly, analysis of the theoretical multicovariance functions, related to all the subset alternatives of (3.2), is practically impossible and some patterns are shared by different model structures.

The identification procedure that we now propose stems from viewing the multilinear system as an ARMAX model whose inputs are given by the monomials $y_{jt} = \prod_i z_{t-i}^{k_{ij}} \prod_i a_{t-i}^{h_{ij}}$, namely $\phi(B) z_t = \beta_0 + \sum_{j=1}^{n} \beta_j y_{jt} + \theta(B) a_t$. Selection of significant regressors may then be developed with "second order" moments $E[z_t y_{j,t-k}] = \gamma_{zy_j}(k)$; in practice we assume that, with multiple products in (3.2), the series $\{y_{jt}\}_{j=1}^{n}$ acquire an autonomous nature with respect to the output $\{z_t\}$. Non-parametric techniques based on "cross covariances" can tentatively be used to find coefficients $\{\beta_j\}_{j=1}^{n}$ that have a chance of turning out significant in efficient estimations.

Example 7 - Consider the model of Hinich-Patterson $z_t = \sum_{i=1}^{R} \sum_{j=1}^{S} \delta_{ij}\, a_{t-i}\, a_{t-j} + a_t$; by defining $y_{t-h,k} = a_{t-h} a_{t-k}$ we clearly have $E[z_t y_{t-h,k}] = \delta_{hk}\sigma^4$. Analogously, for more complex models (3.3) a sufficient condition in order that $\alpha_{ij} \neq 0$ is given by the partial bicovariance $E[z_t y_{t-i,j} \mid y_{t-i-k,\,j+k},\ k > 0] \neq 0$ where $y_{t-i-k,\,j+k} = z_{t-i-k}\, z_{t-j-k}$.

The strategy of putting coefficients in correspondence with every significant multicovariance $\gamma_{zy_j}(k)$ is certainly approximate and leads to overparametrization. However, it drastically reduces the number of terms to be considered in the estimation of information criteria. There are also a number of heuristic facts which make such a strategy reasonable in the earlier phase of identification:

• although $\gamma_{zy_j}(0) \neq 0$ is not sufficient to ensure that $\beta_j \neq 0$, it always represents a necessary condition, that is $\beta_j \neq 0 \Rightarrow E(z_t y_{jt}) \neq 0$;

• even if $\beta_j \neq 0$ implies $\gamma_{zy_j}(k) \neq 0$ for some $k \neq 0$, under stability conditions $\gamma_{zy_j}(0)$ provides the greatest value: $|E(z_t y_{jt})| > |E(z_t y_{j,t-k})|$ for $k > 0$. This means that spurious regressors can be identified before estimation;

• there exists the possibility of strengthening the covariance-coefficient relationship by referring to partial multicovariances $E[z_t y_{jt} \mid y_{1t} \dots y_{j-1,t}]$;

• finally, as shown by Example 7, in the case of multilinear MA models (where $p = 0$) the relationship is one-to-one, that is $E[z_t (y_{jt} = \prod_i a_{t-i})] \neq 0 \Rightarrow \beta_j \neq 0$.

An explanation of the second remark is given by the following example.

Example 9 - Consider the model $(1-\phi B) z_t = \alpha_{ij}\, z_{t-i}\, z_{t-j} + a_t$ with $|\phi| < 1$ and stationary up to moments of 4-th order. Now expanding $z_t = \sum_{k=0}^{\infty} \nu_{ijk}\, z_{t-i-k}\, z_{t-j-k} + n_t$, with $\nu_{ijk} = \alpha_{ij}\phi^k$ and $n_t = a_t/\phi(B)$, it is clear that the diagonal cumulant function $\gamma_{zy}(k) = \mu_3(i+k,\, j+k)$, where $y_t = z_{t-i} z_{t-j}$, is decreasing and has a maximum at $k = 0$.

An efficient identification method is that of stepwise regression, in which the intermediate information provided by the partial multicorrelations is used to select the most appropriate pseudolinear regressors to be included in the model. If a coefficient that was significant at an earlier stage later becomes insignificant (after some other inclusions), then the corresponding pseudolinear regressor is deleted. Hence, by adding and deleting the appropriate coefficients the best model should tentatively be determined.

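Partial multicorrelations $\rho_{zj|1 \dots j-1} \propto E(z_t y_{jt} \mid y_{1t} \dots y_{j-1,t})$ can easily be estimated by utilizing the recurrence relationships which connect these coefficients

$$\rho_{zj|1 \dots j-1} = \frac{\rho_{zj|1 \dots j-2} - \rho_{z,j-1|1 \dots j-2}\; \rho_{j,j-1|1 \dots j-2}}{\sqrt{(1 - \rho^2_{z,j-1|1 \dots j-2})\,(1 - \rho^2_{j,j-1|1 \dots j-2})}}$$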
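One step of the recurrence connecting partial correlations of successive orders can be written in a few lines. This is a sketch of the standard partial correlation recursion and is assumed to match the relationship referred to above; the argument names are illustrative.

```python
import math

def partial_corr_step(r_zj, r_zk, r_jk):
    """Remove the effect of one further conditioning variable k.

    r_zj, r_zk, r_jk are correlations of the previous order (all conditional on the
    same earlier regressors); the result is the correlation between z and j once
    k is also held fixed.
    """
    return (r_zj - r_zk * r_jk) / math.sqrt((1.0 - r_zk ** 2) * (1.0 - r_jk ** 2))

if __name__ == "__main__":
    print(partial_corr_step(0.5, 0.5, 0.5))
```

Applying the step repeatedly, one conditioning regressor at a time, yields the higher order coefficients from the simple correlations.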

The candidate to enter the model is the variable $\{y_{jt}\}$ which has the highest sample partial multicorrelation. The approximate significance of these estimates can be evaluated by adapting the F-statistic described by Jenkins & Watts (1968, p.481), namely

$$\frac{r^2_{zj|1 \dots k}}{1 - r^2_{zj|1 \dots k}}\, (N - p - k) \sim F(1,\, N - p - k)$$

where $j = k+1, \dots n$ and $k$ is the number of terms already included in the model. This approach is typical of standard regression models in which the regressors are observable. In our context, the nonlinear system $z_t = \beta_0 + \sum_{i=1}^{j-1} \beta_i y_{it} + a_{jt}$, $j = 2,3 \dots n$, must be estimated at each step $j$, both in order to check the significance of the included coefficients and to generate the next candidate "regressors" $\hat y_{jt} = (\prod_i z_{t-i} \prod_i \hat a_{j,t-i})$. A reasonable approach is thus to estimate $r_{zj|1 \dots j-1}$ as simple correlations between $\hat a_{jt}$ and $\hat y_{jt}$; in summary, the two indicators for the selection of the regressors $y_{jt}$ are given by $r_{z1} = \mathrm{Cor}(z_t, y_{1t})$ and $r_{zj|1 \dots j-1} = \mathrm{Cor}(\hat a_{jt}, \hat y_{jt})$ for $j = 2,3 \dots n$.

To simplify the method further, in particular to reduce the number of intermediate estimations, we suggest a procedure for the system (3.3) which refers to the bi-correlation functions $\rho_{zy}(i,j) \propto E(z_t y_{t-i,j})$ where $y_{t-i,j} = (z_{t-i} z_{t-j}),\ (z_{t-i} a_{t-j}),\ (a_{t-i} a_{t-j})$. First of all it is necessary to derive the sample distribution of their estimators.

Proposition 1 - Let $\{z_t\}$ be a non-gaussian process, asymptotically independent, stationary up to moments of order 6, and $\{y_t\}$ defined as above. Then, under the null hypothesis $H_0: z_t \equiv a_t \sim IID(0, \sigma^2)$ and for $N$ sufficiently high, we have

$$r_{zy}(i,j) = \frac{\sum_{t=\max(i,j)+1}^{N} (z_t - \bar z)(y_{t-i,j} - \bar y)}{[N - \max(i,j)]\; \hat\sigma_z\, \hat\sigma_y} \sim IN\!\left(0,\ \frac{1}{N - \max(i,j)}\right) \qquad (4.2)$$

Proof - A heuristic demonstration of the statement is given in Appendix A1. We now present the various steps of the identification algorithm of the system (3.3).

Step 1 - Identify $(p,q)$ (the ARMA part of the model) with standard methods such as analysis of sample autocorrelations $r_z(k)$ and partial autocorrelations.

Step 2 - Identify $(P,Q)$ by setting coefficients $\alpha_{ij}$ in the same position $(i,j)$ as every significant correlation (4.2) (i.e. $|r_{zy}(i,j)| > 2/\sqrt{N - \max(i,j)}$), in which $y_{t-i,j} = (z_{t-i} z_{t-j})$.


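Step 3 - Fit the partial model $z_t = \sum_{i}^{p} \phi_i z_{t-i} + \dots + \sum_{i}^{P} \sum_{j}^{Q} \alpha_{ij}\, z_{t-i}\, z_{t-j} + \hat a_t$ and generate the corresponding residuals $\{\hat a_t\}$; then identify $(r,s)$ (i.e. the significant $\beta_{ij}$ coefficients) as in Step 2 by setting $\hat y_{t-i,j} = (z_{t-i}\, \hat a_{t-j})$.

Step 4 - Fit the partial model $z_t = \sum_{i}^{p} \phi_i z_{t-i} + \dots + \sum_{i}^{r} \sum_{j}^{s} \beta_{ij}\, z_{t-i}\, \hat a_{t-j} + \hat a_t$ and generate the residuals $\{\hat a_t\}$; then identify $(R,S)$ as in Steps 2,3 by setting $\hat y_{t-i,j} = (\hat a_{t-i}\, \hat a_{t-j})$.

Step 5 - Fit the global model using as initial values the estimates of Step 3 and, for $\delta_{ij}$, the correlations $r_{zy}(i,j)$ of Step 4. Then drop all the insignificant coefficients.

Step 6 - Estimate the final model and check its adequacy with the diagonal residual correlations $r_{\hat a, zz}(k,k)$, $r_{\hat a, za}(k,k)$, $r_{\hat a, aa}(k,k)$. The corresponding portmanteau test is given by

$$Q(3K) = \sum_{k=1}^{K} (N - k)\, [\, r^2_{\hat a, zz}(k,k) + r^2_{\hat a, za}(k,k) + r^2_{\hat a, aa}(k,k)\, ] \sim \chi^2(3K - n) \qquad (4.3)$$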
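The screening rule used in Steps 2-4 can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the bi-correlation is computed as an ordinary sample correlation over the overlapping observations, the function names are hypothetical, and the demonstration series is a bounded quadratic autoregression chosen only because it cannot explode.

```python
import math
import random

def bicorr(z, u, v, i, j):
    """Sample correlation between z_t and the product u_{t-i} * v_{t-j}."""
    s = max(i, j)
    x = z[s:]
    y = [u[t - i] * v[t - j] for t in range(s, len(z))]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def significant_lags(z, u, v, max_lag):
    """Return (i, j, r) for every bi-correlation beyond the 2/sqrt(N - max(i,j)) band."""
    out = []
    for i in range(1, max_lag + 1):
        for j in range(1, max_lag + 1):
            r = bicorr(z, u, v, i, j)
            if abs(r) > 2.0 / math.sqrt(len(z) - max(i, j)):
                out.append((i, j, r))
    return out

def simulate_quadratic_ar(n, coef=0.2, seed=1):
    """Hypothetical test series z_t = coef * z_{t-1} * z_{t-3} + a_t with uniform(-1, 1) input."""
    rng = random.Random(seed)
    z = [0.0, 0.0, 0.0]
    for t in range(3, n):
        z.append(coef * z[t - 1] * z[t - 3] + rng.uniform(-1.0, 1.0))
    return z

if __name__ == "__main__":
    z = simulate_quadratic_ar(5000)
    print(significant_lags(z, z, z, max_lag=4))
```

Passing residuals in place of `u` or `v` gives the bilinear and quadratic-residual variants of Steps 3 and 4.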

As we have stated previously, the rationale of the algorithm is based on the heuristics that the variables $\{y_{jt}\}$ tend to have a statistical behaviour of their own with respect to $\{z_t\}$, and that a necessary condition for $(\alpha_{ij}, \beta_{ij}, \delta_{ij}) \neq 0$ is given by $E[z_t y_{t-i,j}] \neq 0$. These features are particularly true for models which have an irregular (subset) structure. Under these assumptions Steps 1-4 probably lead to a moderate overparameterization; however, in Step 5 all the unnecessary coefficients are identified and deleted. There are some substantive points to be discussed in detail:

• If $(p,q) = 0$, i.e. $\phi(B) = \theta(B) = 1$, and $r_{zy}(i+k,\, j+k) \neq 0$ persistently for $k > 0$ with $i,j$ fixed, then a parsimonious representation is provided by the impulse response function $z_t = [\alpha_{ij0}/\alpha_{ij}(B)]\, z_{t-i}\, z_{t-j}$ where $\alpha_{ij}(B)$ is a monic polynomial of, at most, second order. Estimation of this model requires the definition of the auxiliary variable $w_{ijt} = \alpha_{ij1} w_{ij,t-1} + \dots + \alpha_{ij0}\, z_{t-i}\, z_{t-j}$ which can be generated adaptively; extension of this procedure to Steps 3 and 4 may be obtained by referring to the series $(z_t - w_{ijt})$.

• Since the role of the monomials $(\prod_j z_{t-j})$, $(\prod_j a_{t-j})$ may be competitive, the selection of coefficients $\alpha_{ij}$ before $\delta_{ij}$ implicitly accords priority to the autoregressive terms $(z_{t-i}\, z_{t-j})$. The identification algorithm then leads to a saving of nonlinear terms $(a_{t-i}\, a_{t-j})$ which complicate the estimation problem and have a limited forecasting horizon.

• The rationale of the residual test at Step 6 is similar to that of the test for nonlinear ARMA models discussed by McLeod & Li (1983). While the reason for considering only diagonal coefficients $r(i = j)$ is merely practical, the utilization of the $Q$-statistic is relatively new. The approximate distribution of (4.3) is a direct consequence of the distribution of the multicorrelation coefficients (4.2). As shown in Appendix A1, optimal statistical properties can be derived under a mild assumption of asymptotic independence.

The application - We now conclude the section by returning to the numerical application commenced in Section 2. The principal goal is to check the model building strategy discussed above, but also to compare the statistical performance of various multilinear models. Step 1 of the identification algorithm of the model (3.3) has already been carried out in (2.1); we therefore proceed directly to the second step.


Step 2 - In order to identify the significant quadratic components $y_{t-i,j} = (z_{t-i}\, z_{t-j})$, the bi-autocorrelations $r_{zy}(i,j)$ of Table 3 were considered. Setting coefficients $\alpha_{ij}$ in the same position $(i,j)$ as every significant correlation, we obtained the intermediate model (3.3) with $(\beta_0, \phi_1, \alpha_{13}, \dots, \alpha_{14}, \alpha_{17}, \alpha_{18})$. However, in the subsequent OLS estimation many parameters turned out insignificant, so that the resulting partial model was

$z_t = .14 + .62\, z_{t-1} - .21\, z_{t-1} z_{t-3} + \hat a_t$, $\quad \hat\sigma^2 = .87$, $R^2 = .43$

(1.6) (10.) (-3.6)

Notice that the above equation includes the impulse response function $z_t = [-\alpha_{13}/(1 - \phi_1 B)]\, z_{t-1}\, z_{t-3}$ which covers the sequence of significant correlations $r_{zy}(1+k,\, 3+k)$, $k = 0,1 \dots 4$ of Table 3.

Table 3 - Sample bi-autocorrelations $r[z_t (z_{t-i}\, z_{t-j})]$

Step 3 - Identification of the bilinear terms $(z_{t-i}\, a_{t-j})$ requires the inspection of the partial bi-correlations $r_{zy}(i,j)$, with $\hat y_{t-i,j} = (z_{t-i}\, \hat a_{t-j})$, of Table 4. Following the procedure of Step 2 we selected the parameters $(\beta_{11}, \beta_{4,11}, \beta_{10,5})$ and the intermediate estimation gave

$z_t = .25 + .70\, z_{t-1} - .22\, z_{t-1} z_{t-3} - .12\, z_{t-1} \hat a_{t-1} - .14\, z_{t-4} \hat a_{t-11} + .10\, z_{t-10} \hat a_{t-5} + \hat a_t$, $\quad \hat\sigma^2 = .82$, $R^2 = .47$

(3.2) (10.7) (-4.7) (-2.3) (-2 X ) (1.9)

Table 4 - Partial bi-autocorrelations $r[z_t (z_{t-i}\, \hat a_{t-j})]$

 i \ j     1     2     3     4     5     6     7     8     9    10    11    12
   1     .21   .11  -.04  -.06  -.09  -.09  -.10  -.18  -.07  -.04   .00  -.03
   2    -.05   .14  -.04  -.10  -.12  -.04  -.07  -.15  -.13  -.04  -.03  -.04
   3    -.01  -.15   .03   .00  -.11  -.12  -.09  -.01  -.04  -.16  -.11   .07
   4     .07  -.08  -.04   .00  -.08  -.12  -.13  -.02  -.01  -.08  -.22   .00
   5    -.02   .04  -.02  -.10  -.05  -.06  -.11   .00  -.08  -.02  -.10  -.11
   6    -.01   .03   .01  -.07   .01  -.07  -.06   .04  -.02  -.13  -.05  -.13
   7    -.07  -.05   .04   .00  -.02   .04  -.07  -.09  -.02  -.03  -.03  -.03
   8    -.08  -.15   .03   .07   .09   .12  -.07  -.05  -.08  -.11  -.01   .09
   9     .01  -.07  -.03   .07   .10  -.02   .05  -.02  -.11  -.10  -.08   .06
  10    -.02   .03  -.04  -.03   .21  -.05  -.01   .00  -.10  -.03  -.08  -.02
  11    -.05   .02   .08  -.04   .05   .03  -.05   .05  -.01   .02  -.09  -.07
  12    -.13  -.05   .11   .11   .14  -.04  -.11   .04   .04   .09  -.06   .02

Note: $\{\hat a_t\}$ are the residuals estimated at Step 2.


Steps 4,5 - For the identification of the quadratic terms $(a_{t-i}\, a_{t-j})$ the partial bicorrelations $r_{zy}(i,j)$ with $\hat y_{t-i,j} = (\hat a_{t-i}\, \hat a_{t-j})$, of Table 5, were considered. Tentatively, the sole significant coefficients are $(\delta_{11}, \delta_{22})$; however, the term $\delta_{11}\, a_{t-1}^2$ is clearly competitive with $\beta_{11}\, z_{t-1}\, a_{t-1}$, included at Step 3, and $\delta_{22}\, a_{t-2}^2$ is implied by the impulse response function $z_t = [\delta_{11}/(1 - \phi_1 B)]\, a_{t-1}^2$. Estimation of the global model with $(\beta_0, \phi_1, \alpha_{13}, \beta_{11}, \beta_{4,11}, \beta_{10,5}, \delta_{11}, \delta_{22})$ confirmed the insignificance of $(\beta_{11}, \delta_{22})$.


Step 6 - The nonlinear estimation of the final multilinear model provided

$z_t = .30 + .67\, z_{t-1} - .24\, z_{t-1} z_{t-3} - .16\, z_{t-4} \hat a_{t-11} + .12\, z_{t-10} \hat a_{t-5} - .15\, \hat a_{t-1}^2 + \hat a_t$, $\quad \hat\sigma^2 = .80$, $R^2 = .49$

(3.7) (13.4) (-5.4) (-2.4) (1.2) (-2.6)

with a value of the portmanteau statistic (4.3) $Q(3 \cdot 12) = 25.3 < 43.8 = \chi^2_{.95}(30)$.

Evaluation based on the indicators $(\hat\sigma^2, R^2)$ and on the significance of the regression coefficients does not seem sufficient to check the statistical robustness of the above procedure. It is necessary, in fact, to refer to indicators which take into account the tradeoff between statistical fitting and parametric efficiency. Table 6 summarises the values of three information criteria corresponding to the models of the previous steps; in two cases the model that is indicated as the best is the final one. The results for BIC are partially due to its sensitivity to the number of observations $d = \max(p,P,r,R)$ lost in the estimation.

Table 6 - Information Criteria evaluated at Steps 1-6

Step, (N-d)                  1, (154)   2, (152)   3, (145)   6, (145)
AIC: f(N) = 2                 -.047      -.100      -.116      -.140
BIC: f(N) = log(N)            -.007      -.040      +.007      -.017
HIC: f(N) = log(log(N))       -.031      -.076      -.066      -.091
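The normalized criterion (4.1) is simple to evaluate once the residual variance, the number of coefficients and the effective sample size are known. The sketch below reproduces the AIC and BIC entries of Table 6 for Steps 2 and 6; the parameter counts (3 and 6, counting the constant) are read off the fitted equations and are an assumption of this illustration, as are the function names.

```python
import math

def information_criterion(sigma2, n_params, n_eff, penalty):
    """Normalized IC of (4.1): log(sigma^2) + n * f(N - d) / (N - d)."""
    return math.log(sigma2) + n_params * penalty(n_eff) / n_eff

def aic_penalty(n_eff):
    """Akaike: f(N) = 2."""
    return 2.0

def bic_penalty(n_eff):
    """Schwarz: f(N) = log(N)."""
    return math.log(n_eff)

if __name__ == "__main__":
    # (residual variance, number of coefficients, N - d) for Steps 2 and 6
    for label, s2, n, n_eff in [("Step 2", 0.87, 3, 152), ("Step 6", 0.80, 6, 145)]:
        print(label,
              round(information_criterion(s2, n, n_eff, aic_penalty), 3),
              round(information_criterion(s2, n, n_eff, bic_penalty), 3))
```

Because the reported variances are rounded to two digits, agreement with the table can only be expected to about the third decimal place.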

This empirical example continues in the next section, which is concerned with methodological aspects of nonlinear estimation; in particular in conditions of time-varying parameters. In that context we will evaluate the forecasting performance of the model identified at Step 6 and other classes of nonlinear models will be considered.

Table 5 - Partial bi-autocorrelations $r[z_t (\hat a_{t-i}\, \hat a_{t-j})]$ (rows $j$, columns $i = 1 \dots 12$; values listed in the order in which they appear)

.21 -.02 .20 -.03 -.08 .11 .10 -.10 -.01 .06 -.03 -.02 -.03 -.07 -.06 .01 .05 -.01 -.04 -.03 -.12 -.05 .05 .00 -.05 -.07 -.02 -.05 -.08 -.13 .03 -.02 .02 .14 -.07 -.06 .04 -.11 .02 .09 -.07 .00 .08 -.07 -.08 .07 .08 -.12 -.03 .11 -.12 .05 -.03 -.10 -.02 .04 .11 .06 -.14 -.09 -.01 .01 .04 .00 -.03 -.10 .00 -.06 .12 .12 .02 -.16 -.08 .07 .05 -.01 -.13

Note: $\{\hat a_t\}$ are the residuals estimated at Step 3.


5. Estimation

The identification procedure outlined in the previous section has implicitly assumed that an efficient estimator for multilinear models is available. If the distribution of the input process $\{a_t\}$ is known a priori, a natural candidate is the maximum likelihood method; however this condition contradicts the general formulation (3.1) where $a_t \sim IID$. There is also another feature that makes this approach unsuitable: an explicit expression of the ML estimator is not always available and optimization of the likelihood function must proceed by strictly numerical methods. For the MARMA model this situation involves problems in finding the unconditional algorithm and in analysing its statistical behaviour. In this section we adopt and extend the least-squares approach followed by Subba-Rao & Gabr (1984) for bilinear models.

In order to simplify the treatment, we rewrite model (3.2) in regression form and assume a simplified structure for its monomials, such as

MARMA: $\quad z_t = \beta_0 + \sum_{i=1}^{n} \beta_i\, y_{it} + a_t, \qquad y_{it} = \prod_{j=1}^{p_i} z_{t-j} \prod_{j=1}^{q_i} a_{t-j} \qquad (5.1)$

with the general constraint $(p_i + q_i) \le (p_k + q_k)$ for $i < k$. Resorting to nonlinear least squares (NLS), i.e. letting $\beta' = [\beta_0, \beta_1 \dots \beta_n]$ and

$$\hat\beta_N = \arg\min \Big[ J_N(\beta) = \sum_{t=1}^{N} a_t^2(\beta) \Big], \qquad a_t \sim IID(0, \sigma^2)$$

improves the situation with respect to the ML-approach; but the analytical expression of the gradient of the Gauss-Newton algorithm still remains difficult to implement

NLS: $\quad \hat\beta_N(k+1) = \hat\beta_N(k) + \Big[ \sum_{t=1}^{N} \xi_t(k)\, \xi_t(k)' \Big]^{-1} \sum_{t=1}^{N} \xi_t(k)\, \hat a_t(k) \qquad (5.2a)$

$$\xi_t(\beta) = -\frac{\partial a_t}{\partial \beta} = y_t - \sum_{j=1}^{q} \Big[ \frac{\partial y_t'}{\partial a_{t-j}}\, \beta \Big]\, \xi_{t-j} \qquad (5.2b)$$

where $y_t(\beta)' = [1, y_{1t} \dots y_{nt}]$ and $q = \max(q_i)$ (see Appendix A2 for the derivation). From (5.2b) we note that calculation of the gradient consists of a filtering on the pseudolinear regressors $y_t$, but with a filter that depends on random variables. This makes explicit the dependence of the algorithm (5.2a) on higher order moments of $\{z_t\}$.

In the linear ARMA context we simply have $\xi_t = x_t/\theta(B)$, i.e. $\xi_t = x_t - \sum_{j=1}^{q} \theta_j\, \xi_{t-j}$. By assuming $x_t' = [z_{t-1} \dots a_{t-q}]$ stationary gaussian, we also have $\{\xi_t\}$ covariance stationary and therefore $\{\xi_t \xi_t'\}$ is stationary in mean. Since $\hat\beta_N(k)$ is a minimizer, its consistency can be proved by applying the ergodic theorem to $t^{-1} \sum_{\tau=1}^{t} \xi(\tau)\, \hat a(\tau) \to 0$ where $t = (k = N)$. In (5.2b) $\{\xi_t\}$ is not a linear transformation of $\{y_t\}$, and $\{y_t\}$ in turn may fail to be stationary in covariance, since for this $\{z_t\}$ should be $2p$-th order stationary, with $p = \max(p_i)$. The divergence of the estimator (5.2a) may then follow simply because the process (5.2b) does not have second order moments. For a multilinear model the problem of invertibility is even more urgent


observation is that, unless the range of $\{z_t\}$ is restricted within certain bounds, it is impossible to identify terms like $\delta_i \prod_j a_{t-j}$ using algorithms that involve the calculation of residuals. Also in this case, however, we may resort to the stabilizing effect induced by time-varying coefficients, and assume the existence of trajectories of $\{\beta_t\}$ which allow the stability of the filtering.

Off-line Inference - Referring to time-varying parameter models, we now address the important problem of finding a convergent off-line estimator for the mean value $\beta = \bar E(\beta_t)$. This is, indeed, the sole question that can be posed, in off-line inference, with evolving models. As we shall show, even restricting the analysis to multilinear AR models, consistent estimators may only exist under Assumptions (A)-(B) and other conditions concerning the behaviour of the parameter function. The system of reference for the analysis is given below, in which $\sum_j k_{ij} \le \sum_j k_{hj}$ for $i < h$ and $k_{ij} \ge 0$

MAR: $\quad z_t = y_t' \beta_t + a_t, \qquad (a_t \mid \Im_{t-1}) \sim IID(0, \sigma^2) \qquad (5.3)$

$\qquad\quad y_t' = \big[ \prod_j z_{t-j}^{k_{ij}} \big], \qquad \{ \|\beta_t\| \} < \infty$

Since it is linear in the parameters, we may consider, as an estimator for $\beta$, the ordinary least squares method

$$\hat\beta_N = \Big[ \sum_{t=1}^{N} y_t\, y_t' \Big]^{-1} \sum_{t=1}^{N} y_t\, z_t \qquad (5.4)$$

Assumptions C - In order to derive this estimator and to establish its consistency with respect to the parameter $\beta$, some form of orthogonality between the coefficient function $\{\beta_t\}$ and the variance function of $\{y_t\}$ is needed, namely

(C1) $\bar E[E(y_t y_t')\, \beta_t] = \bar E(y_t y_t')\, \bar E(\beta_t)$

(C2) $E(a_t^2) < \infty$, $\quad E(y_t a_t) = 0$, $\quad E(y_t y_t') > 0$

Indeed, multiplying (5.3) by $y_t$ and taking expectation we have $E(y_t z_t) = E(y_t y_t')\, \beta_t + E(y_t a_t)$; next, applying the operator $\bar E[\cdot]$, under the above conditions we may get

$$\beta = \bar E(y_t y_t')^{-1}\, \bar E(y_t z_t) \qquad (5.5)$$

Notice that, in general, given two deterministic bounded functions $f(t), g(t)$, the mean value of their product $h(t)$ does not coincide with the product of their means. An immediate example is given by taking $f(t) = \sin(t)$, $g(t) = -\sin(t)$, in which $\bar f = \bar g = 0$, whereas $h(t) = f(t) \cdot g(t)$ is negative nearly everywhere. However, the admissibility of the orthogonality assumption (C1) stems from the fact that (i) $\{\beta_t\}$ is deterministic and has a stabilizing effect on $\{z_t\}$ (therefore it tends to move rapidly); (ii) $\bar E(\cdot)$ is an asymptotic operator, so that for finite intervals $t \in [1, N]$ we may really assume

(C1)' $\quad \bar E_N(y_t y_t' \beta_t) = \bar E_N(y_t y_t')\, \beta + o(1)$
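The remark on the mean of a product can be verified directly with the sine example. The sketch below uses a discrete time average over $t = 1 \dots N$ as a stand-in for the asymptotic mean operator; the helper name and the choice of $N$ are assumptions of this illustration.

```python
import math

def time_average(func, n):
    """Average of func(t) over t = 1, ..., n."""
    return sum(func(t) for t in range(1, n + 1)) / n

def f(t):
    return math.sin(t)

def g(t):
    return -math.sin(t)

def h(t):
    return f(t) * g(t)

if __name__ == "__main__":
    n = 100_000
    print(time_average(f, n), time_average(g, n), time_average(h, n))
```

Both individual means are close to zero, while the mean of the product settles near $-1/2$, so the product of the means is a poor guide in this case.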

Clearly, (C1)-(C1)' are always satisfied if $\{\beta_t\}$ is constant, periodic or monotonic. On the basis of the assumptions stated so far we have the following formal result:

Proposition 2 - Consider the time-varying multilinear AR model (5.3) in which:

i) assumptions (A1),(A2) are satisfied, in particular: (A1)' with $\phi(m) = O(m^{-h})$, $h > r/(2r-1)$, $r \ge 1$ and (A2)' with $\kappa = (2K)(r + \delta)$, $K = \max_i \sum_j k_{ij}$, $0 < \delta \le r$;

ii) assumptions (B1),(B2) hold for every order $k \le K$;

iii) conditions (C1),(C2) are satisfied, or alternatively (C1)';

iv) the sequence $\bar E_N(y_t y_t') = \{ N^{-1} \sum_{t=1}^{N} E(y_t y_t') \}$ is uniformly positive definite.

Then for $N$ sufficiently large the OLS estimator (5.4) exists with probability one and is consistent for the average trajectory $\beta = \bar E(\beta_t)$.

Proof - The proof is not short and requires some auxiliary results concerning the law of large numbers and the transformations of mixing sequences; it is given in Appendix A3. In simulation experiments we have checked that OLS is an accurate estimator for $\beta$ if the sequence $\{\beta_t\}$ is not near the border of the (extended) stability region. In any case, with time-varying parameters off-line estimators are not the proper ones.

On-line Inference - Returning to the system (5.1), we note that its regression structure also enables the application of pseudo-linear regression (PLR) methods in the estimation. These methods simply come from approximating the gradient (5.2b) as $\xi_t(\beta) \approx y_t(\beta)$ and inserting the corresponding iterative expression $\hat y_t(k)$, together with $\hat a_t(k) = z_t - \hat y_t(k)' \hat\beta(k)$, in (5.2a); the final algorithm has the same structure as the OLS (5.4), but is iterative. In the context of nonlinear models, this approach significantly reduces the order of the moments that need to exist; on the other hand it does not provide a minimization method. In practice, as shown in the case of linear models (see Hannan & McDougall, 1988 and Grillenzoni, 1990), the approximation of the gradient makes the resulting estimators not always consistent and generally inefficient. Utilization of the PLR approach should then be limited to recursive (on-line) methods, applied for tracking the sequence of parameters $\{\beta_t\}$ in time-varying models. In this context questions of stationarity and convergence do not matter (by definition) and the adaptive properties of PLR, allowed by the greater computational speed, are preferable to the accuracy of NLS.

Proceeding as in Solo (1978) or Grillenzoni (1990), by equating $(k = N) = t$ in (5.2a) and with $\xi(t)$ replaced by $\hat y(t)$, the Recursive PLR estimator of (5.1) becomes

R-PLR:
$\hat a(t) = z_t - \hat\beta(t-1)'\, \hat y(t) \qquad (5.6a)$
$R(t) = \lambda R(t-1) + \hat y(t)\, \hat y(t)' \qquad (5.6b)$
$\hat\beta(t) = \hat\beta(t-1) + R(t)^{-1}\, \hat y(t)\, \hat a(t) \qquad (5.6c)$
$\tilde a(t) = z_t - \hat\beta(t)'\, \hat y(t) \qquad (5.6d)$
$J(t) = \lambda J(t-1) + \tilde a(t)^2 \qquad (5.6e)$
$\hat y(t+1) = \{ \prod_j z_{t+1-j} \prod_j \tilde a(t+1-j) \} \qquad (5.6f)$

The terms $\hat a_t$, $\tilde a_t$ are respectively the prediction error and the recursive residual; the factor $0 < \lambda < 1$, by preventing the gain $R(t)^{-1}$ from vanishing, enables parameter changes $(\beta_t - \beta_{t-1})$ to be tracked. Since $R(t)$ is an approximation of the Hessian matrix, $\lambda$ should be designed to provide a suitable compromise between unbiasedness (fast tracking) and estimation accuracy. Finally, $\hat\sigma(t)^2 = (1 - \lambda) J(t)$ provides an on-line estimator for $\sigma_t^2$.
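A compact sketch of the recursion (5.6a)-(5.6f) is given below for a small bilinear model. Several choices here are assumptions made for illustration and do not come from the paper: the inverse $P(t) = R(t)^{-1}$ is propagated with the matrix inversion lemma instead of inverting $R(t)$, the test model is $z_t = \beta_0 + \phi_1 z_{t-1} + \beta_{11} z_{t-1} a_{t-1} + a_t$ with hypothetical coefficients, and all function names are invented.

```python
import random

def rplr_step(beta, P, y, z, lam):
    """One pass through (5.6a)-(5.6d), carrying P = R^{-1} (assumed symmetric)."""
    n = len(beta)
    a_pred = z - sum(b * v for b, v in zip(beta, y))               # (5.6a) prediction error
    Py = [sum(P[r][c] * y[c] for c in range(n)) for r in range(n)]
    denom = lam + sum(y[r] * Py[r] for r in range(n))
    # inverse of R(t) = lam * R(t-1) + y y'  (5.6b), via the matrix inversion lemma
    P_new = [[(P[r][c] - Py[r] * Py[c] / denom) / lam for c in range(n)] for r in range(n)]
    gain = [sum(P_new[r][c] * y[c] for c in range(n)) for r in range(n)]
    beta_new = [b + g * a_pred for b, g in zip(beta, gain)]        # (5.6c)
    a_res = z - sum(b * v for b, v in zip(beta_new, y))            # (5.6d) recursive residual
    return beta_new, P_new, a_pred, a_res

def run_rplr(z, lam=0.99, rho=1.0):
    """Track [beta0, phi1, beta11]; returns the parameter path and sigma^2 estimates."""
    beta = [0.0, 0.0, 0.0]
    P = [[rho if r == c else 0.0 for c in range(3)] for r in range(3)]   # R(0)^{-1} = rho * I
    res_prev, J = 0.0, 0.0
    path, sigma2 = [], []
    for t in range(1, len(z)):
        y = [1.0, z[t - 1], z[t - 1] * res_prev]                   # (5.6f) pseudo-linear regressors
        beta, P, _, res_prev = rplr_step(beta, P, y, z[t], lam)
        J = lam * J + res_prev ** 2                                # (5.6e)
        path.append(beta)
        sigma2.append((1.0 - lam) * J)
    return path, sigma2

def simulate_bilinear(n, beta0, phi1, beta11, sigma, seed=0):
    """Hypothetical data generator with constant parameters and Gaussian input."""
    rng = random.Random(seed)
    z, a_prev = [0.0], 0.0
    for _ in range(n):
        a = rng.gauss(0.0, sigma)
        z.append(beta0 + phi1 * z[-1] + beta11 * z[-1] * a_prev + a)
        a_prev = a
    return z

if __name__ == "__main__":
    z = simulate_bilinear(6000, 0.3, 0.6, 0.2, 0.5)
    path, sigma2 = run_rplr(z, lam=0.999)
    print(path[-1], sigma2[-1])
```

With constant parameters and $\lambda$ close to one the estimates should settle near the generating values; smaller values of $\lambda$ shorten the memory and make the path more reactive but noisier.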

Returning to the efficient NLS estimation, the exact recursive expression of (5.2) can be obtained from (5.6) by replacing $\hat y(t)$ with $\hat\xi(t)$ in the equations of $R(t)$, $\hat\beta(t)$ and inserting the filter

$$\hat\xi(t) = \hat y(t) - \sum_{j=1}^{q} \Big[ \frac{\partial \hat y(t)'}{\partial \tilde a(t-j)}\, \hat\beta(t-1) \Big]\, \hat\xi(t-j) \qquad (5.6g)$$

Under the assumption of constant parameters the resulting algorithm tends to minimize the weighted functional $J_t = \sum_{\tau=1}^{t} \lambda^{t-\tau} a_\tau^2(\beta)$; however, in the context of evolving systems, it is not clear what the improvement may be in terms of the MSE $E\| \hat\beta(t) - \beta_t \|^2$. Given the complexity and multilinearity of the filtering (5.6g), a worsening of the tracking capability with respect to (5.6c) cannot be ruled out.

The parameters of (5.1) may vary with time depending on the goodness with which the multilinear model (3.2) approximates the "true" nonlinear function (3.1). This situation can be illustrated with a simple example.

Example 5 - Consider the bilinear system $z_t = \beta\, z_{t-1}\, a_{t-2} + z_{t-1}\, a_{t-1} + a_t$; this can easily be decomposed into a time varying AR(1) model $z_t = \phi_t\, z_{t-1} + a_t$ whose parameter behaves like an MA(1) process $\phi_t = \beta\, a_{t-2} + a_{t-1}$ with the same input. Hence, whenever a nonlinear model is approximated by a linear ARMA, stochastic variability of parameters occurs.
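The decomposition in Example 5 is an algebraic identity, which the following sketch confirms on a simulated path. The value of $\beta$, the Gaussian input with small variance and the function name are assumptions made only to keep the illustration stable.

```python
import random

def simulate_both(beta, n, sigma=0.3, seed=0):
    """Generate the bilinear form and its time-varying AR(1) form from the same input."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, sigma) for _ in range(n)]
    z_bil, z_tv, phi = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for t in range(2, n):
        # bilinear form: z_t = beta*z_{t-1}*a_{t-2} + z_{t-1}*a_{t-1} + a_t
        z_bil.append(beta * z_bil[t - 1] * a[t - 2] + z_bil[t - 1] * a[t - 1] + a[t])
        # time-varying AR(1) form with MA(1) coefficient phi_t = beta*a_{t-2} + a_{t-1}
        phi.append(beta * a[t - 2] + a[t - 1])
        z_tv.append(phi[t] * z_tv[t - 1] + a[t])
    return z_bil, z_tv, phi

if __name__ == "__main__":
    z_bil, z_tv, phi = simulate_both(beta=0.5, n=2000)
    print(max(abs(u - v) for u, v in zip(z_bil, z_tv)))
```

The two paths agree up to floating-point rounding, while the implied AR coefficient `phi` fluctuates from one period to the next.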

In certain circumstances the lack of a nonlinear representation may then be rectified by admitting that the model is time varying and by estimating its coefficients on-line. With respect to the Kalman Filter approach, used in state dependent systems (see Priestley, 1988 p.99), algorithm (5.6) is much easier to implement since it only requires as priors $0 < \lambda < 1$, $R(0)^{-1} = \rho \cdot I_n$ (and usually $.95 < \lambda < .99$, $.01 < \rho < 2.0$); moreover, it only assumes that parameters do not change suddenly, in a deterministic or stochastic fashion. To be more specific, Kalman Filter implementation requires that parameters follow a linear, or a linearizable, process, whereas recursive algorithms, by making estimates smooth functions of the past observations, implicitly assume $\beta_t = f(\Im_{t-1})$. The weighting sequence $\{\lambda(i,t)\}$ should then be designed according to the path of the conditional probabilities $P_t(\beta_t \mid z_{t-i})$ or to that of the cross correlations $\mathrm{Cor}(\beta_t, z_{t-i})$. This information, however, is not generally available a priori and other, more pragmatic, criteria must be followed.

A way of avoiding altogether the problem of priors in the recursive estimation consists of reducing algorithm (5.6) to a stochastic approximation scheme. This may be approached by setting $\lambda \equiv 1$ and replacing $R(t)$ with $\bar R(t) = R(t)/t$, so as to retain the tracking capability. In this case the updating rule equivalent to (5.6b) becomes

$$\bar R(t) = \bar R(t-1) + \frac{1}{t}\, [\, \hat y(t)\, \hat y(t)' - \bar R(t-1)\, ]$$

and, $k$-iterating the recursions, one may initialize $\bar R_{k+1}(0) = \bar R_k(N)$. A stochastic approximation type solution is achieved if $\bar R(t)$ converges to a matrix $0 < \bar R < \infty$ as $t \to \infty$; however, for the problems explained above this may be guaranteed only if $\{\delta_i\} = 0$.

The Application - The parameters of the model identified at Step 6 in the previous section were successively estimated with the recursive algorithm (5.6) in order to check their stability over time. Initial values were set to $\hat\beta(0) = \hat\beta_N$, $R(0) = R_N$, obtained from the iterative estimation. The optimal forgetting factor $\hat\lambda = .974$ was derived with a search procedure by minimizing the global objective function $J_N(\lambda) = \sum_{t=1}^{N} [\, \tilde a(t)^2 + \| \hat\beta(t) - \hat\beta_N \|^2\, ]$, which establishes a tradeoff between tracking and accuracy. Trajectories of $\hat\beta_i(t)$ are shown in Figure 3; notice that in spite of the mild value taken by $\hat\lambda$ a significant variability of parameters occurs. This is a clear indication that the process $\{z_t\}$ is nonstationary in higher order moments. Other important features are that the on-line estimates move around their off-line value $\hat\beta_N$ and that their fluctuations are asymmetric, i.e. compensate each other. This confirms, in a certain sense, the role played by the variability of the parameters in stabilizing the behaviour of the output. Finally, the gain in statistical fitting allowed by recursive estimation is pointed out by the value of the residual variance:

$(N - d)^{-1} \sum_{t=d+1}^{N} \tilde a(t)^2 = .63$.

Figure 3 (a,b,c,d) - Recursive estimates of the parameters of the model at Step 6


Results of Table 6 and Figure 3 then confirm the effectiveness of the modeling procedure discussed so far; however, they essentially concern in-sample behaviour. Another necessary test concerns the operative, out-of-sample performance; specifically, does multilinear modeling enable an improvement over the forecasting ability of standard linear models? To check this point, we took 24 post-sample observations and we evaluated the mean absolute prediction error $MApE_t(l \mid h) = h^{-1} \sum_{\tau=1}^{h} | z_{t+\tau+l} - \hat z_{t+\tau}(l) |$. In this statistic $h = 12$ is the sample size of the mean, $t+\tau = 1985.12+\tau$ are the forecast origins and $l = 1,2 \dots 12$ are the prediction steps. Results corresponding to the models at Step 1 (2.1) and Step 6 (with parameters estimated off-line and on-line) are displayed in Figure 4. The better forecasting ability of the nonlinear models, in particular with time varying parameters, can be appreciated. This outcome is particularly important in view of the fact that the forecasting algorithm utilized in practice was the sub-optimal one described in Section 3 and that, in the case of the time varying model, extrapolation of the parameter function was not made, i.e. $\hat\beta_t(l) = \hat\beta(t)$.
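The MApE statistic can be computed with a short helper. In this sketch the forecasting rule is passed in as a function of the origin and the horizon; that interface, the naive forecast and the toy series used in the example are assumptions of the illustration, not the predictors evaluated in the paper.

```python
def mape(z, forecast, t, l, h):
    """Mean absolute prediction error over origins t+1, ..., t+h at horizon l.

    z is the observed series (indexable by time), and forecast(origin, l)
    returns the l-step-ahead prediction made at the given origin.
    """
    errors = [abs(z[t + tau + l] - forecast(t + tau, l)) for tau in range(1, h + 1)]
    return sum(errors) / h

if __name__ == "__main__":
    z = [float(v) for v in range(40)]          # toy trending series

    def naive(origin, l):
        """Hypothetical no-change forecast: repeat the last observed value."""
        return z[origin]

    print([mape(z, naive, t=10, l=l, h=12) for l in (1, 2, 3)])
```

With $h = 12$ origins and horizons up to $l = 12$ the helper needs 24 observations after the first origin, which matches the 24 post-sample observations mentioned above.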

Figure 4 - MApE statistics for models at Steps 1 (— ) and 6 ( — , .. .)

To complete this section we now evaluate the statistical performance of neural network and exponential autoregression models on the same set of data. The framework of neural networks (see White, 1989) is based on a sequence of hidden units having common input variables $\{x_t\}$ and the same structure $\psi(\cdot)$. The responses of these units interact at an intermediate layer, before reaching the output $\{y_t\}$. Accordingly, the shape of $\psi(\cdot)$ is represented by a threshold rule or a sigmoid function (e.g. a probability distribution function)

$$y_t = \Psi\Big[ \sum_j \beta_j\, \psi(x_t' \alpha_j) \Big] + a_t$$

Example 10 - Let $y = \Psi(x'\beta)$ and $\Psi(\cdot)$ be the normal distribution; $y$ is then the conditional expectation of a Bernoulli random variable generated by a probit model.

This general framework has important connections with the projection pursuit regression designed for non-parametric curve fittings (see Friedman & Stuetzle, 1981), in which $\psi_j(\cdot)$ are smooth functions with different structure and $(\Psi = \beta_j) \equiv 1$. Similarities