Indicator function and different codings for fractional factorial designs ¹

Giovanni Pistone

Department of Mathematics - Politecnico di Torino, Italy

Maria-Piera Rogantin ∗

Department of Mathematics - Università di Genova, Italy

Abstract

First, we discuss the relationship between ideal theory and indicator function in fractional factorial designs. Secondly, we deal with the algebraic theory of Bayley (1983) level codings in the framework of ideal theory.

Key words: Algebraic Statistics, Complex coding, Indicator function, Mixed designs, Regular fractions, Orthogonal Arrays

1 Introduction

We discuss here the properties of a fractional factorial design with m factors X1, . . . , Xm, where the set of treatments is described by the solutions of a system of polynomial equations with rational coefficients. The solutions themselves are not restricted to be rational. In fact they could be real or complex algebraic numbers. We discuss both qualitative and quantitative factors. The approach has been introduced in Pistone and Wynn (1996) and Pistone et al. (2001). Other references to current research are given.

A design is usually a list or array of treatments. In our approach, the design is coded into the coefficients of the polynomials whose solutions are the treatment levels, then the levels are coded by numeric values. This indirect approach may seem baroque, but it has some advantages. In fact, the algebraic complexity of the coefficients of a polynomial is smaller than the algebraic complexity of its solutions.

∗ Corresponding author.

Email addresses: [email protected] (Giovanni Pistone), [email protected] (Maria-Piera Rogantin).

¹ Partially supported by Italian grant PRIN03 coordinated by G. Consonni (University of Pavia, Italy)

A classical example of such a representation is the definition of regular fractions using the generating words. The translation from generating words into the language of polynomial equations is discussed in Pistone and Rogantin (2005).

Our aim is to push the analysis as far as possible without actually doing any computation on the treatment codings. As most of the computations are done on polynomials, we make extensive use of symbolic algebra software, especially CoCoA. This and similar software systems can perform exact computations on computable number fields, i.e. fields whose elements can be stored in a finite or potentially infinite memory. Examples are the fields Z_p of integers mod p, the Galois fields GF(p^s), and the rational numbers Q.

A second feature of the approach we use in this paper is the systematic embedding of a generic design into a full factorial design containing it. For this reason, we call fraction a generic design, while the term design is usually reserved to the full factorial, possibly mixed, case. Each function, or response, defined on the design has a polynomial representation. Two different polynomials represent the same function if their difference is zero on the design points.

Indicator polynomials are one way to give defining equations. The main results of the paper are Theorems 1 to 3, which relate the indicator function to Gröbner bases.

In the case of qualitative factors with numerical coding of levels, the monomial terms are not directly meaningful, but they can supply a handy linear basis for the interaction spaces, see Galetto et al. (2003). In the quantitative case, the levels of factors are given as real values and most of the relevant properties are of geometric or algebraic type. Designs and their fractions can be properly specified as solutions of systems of polynomial equations, while linear models can be specified as hierarchical polynomial models. It is interesting to remark that the orthogonalization of monomial terms in a hierarchical polynomial model can be shown to produce a set of orthogonal polynomials whose leading terms coincide with the monomials in the model, giving rise to a notion of interaction adapted to the hierarchy, see Giglio and Wynn (2000).


Acknowledgment

Section 3 of this paper is a revised version of the talk presented to ICODOE 2005 conference, Memphis, May 13-15, 2005. The authors wish to thank the Organizers for the opportunity to present these ideas to such a highly qualified audience.

Eva Riccomagno made useful comments on a previous version of the paper, pointing out in particular some similarities of the second part with Caboara and Riccomagno (1998).

2 Designs, Gröbner basis and indicator functions

In this Section we review basic facts about the algebraic study of DOE, while discussing some new material not published elsewhere. A general reference to the polynomial algebra we use is Cox et al. (1997) or Kreuzer and Robbiano (2000). The informed reader can go directly to the main results in Section 2.3.

2.1 Full factorial designs

Let k[x1, . . . , xm] be a polynomial ring containing Q, let d1(x1), . . . , dm(xm) be univariate polynomials. We denote by aij, i = 1, . . . , nj, the solutions of each equation dj(xj) = 0. We assume that these solutions are all distinct and they belong to some extension K of the number field k. Let D be the full factorial design consisting of the solutions of the system d1(x1) = 0, . . . , dm(xm) = 0.

As each equation contains one single indeterminate, the solution set of the system is the Cartesian product of each set of solutions. We usually drop the sub-j when discussing a single factor.

The ideal generated by the previous polynomials,

$$I(D) = \langle d_1(x_1), \dots, d_m(x_m) \rangle,$$

is, by definition, the set of all the polynomials which vanish on D. It is called the ideal of the full factorial design. Each polynomial $f \in I(D)$ is of the form $f = \sum_{j=1}^m h_j d_j$ with $h_j \in k[x_1, \dots, x_m]$, j = 1, . . . , m. Two polynomials whose difference belongs to I(D) represent models confounded on D.

Example. Our main examples are the 5^m factorial designs, with different codings of levels. However, most of the results we discuss are true for non-prime number of levels and for mixed designs.

(C1) If the levels are 0, 1, 2, 3, 4, then

$$d(x) = x(x-1)(x-2)(x-3)(x-4) = x^5 - 10x^4 + 35x^3 - 50x^2 + 24x .$$

Here, k = K = Q.

(C2) If the levels are −2, −1, 0, 1, 2, then

$$d(x) = x(x^2-1)(x^2-4) = x^5 - 5x^3 + 4x .$$

Here, k = K = Q.

(C3) If the levels are the 5-th roots of the unity as in Pistone and Rogantin (2005) and other references therein, then

$$d(x) = x^5 - 1 .$$

Here, k = Q, but K is an extension of Q containing the field $\mathbb{Q}[x]/\langle x^5 - 1 \rangle$. This extended field is computable, but most computational algebra systems do not implement it. If symbolic computations are not of interest, we can take K = C.

(C4) If the levels are the sines of trigonometric angles, $\sin\left(\frac{2\pi}{5}k\right) = \Im\left(e^{i\frac{2\pi}{5}k}\right)$, k = 0, 1, 2, 3, 4, as in Bayley (1983), then a direct computation shows that

$$d(x) = 16x^5 - 20x^3 + 5x .$$

Here k = Q and the extended field is $K = \mathbb{Q}\left[\sqrt{10 - 2\sqrt{5}}, \sqrt{10 + 2\sqrt{5}}\right]$, see below in Section 3.2.

•
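The expansions in (C1), (C2) and (C4) can be checked mechanically. The following is a minimal Python sketch, not part of the original computations (which use CoCoA); the helper name `poly_from_roots` is an assumption introduced here for illustration. It expands the products exactly with rational arithmetic and checks numerically that the polynomial of (C4) vanishes at the five sine levels.

```python
from fractions import Fraction
import math

def poly_from_roots(roots):
    """Coefficients (lowest degree first) of prod_j (x - a_j)."""
    coeffs = [Fraction(1)]
    for a in roots:
        shifted = [Fraction(0)] + coeffs                    # x * p(x)
        scaled = [-a * c for c in coeffs] + [Fraction(0)]   # -a * p(x)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs

# (C1): levels 0..4 and (C2): levels -2..2, expanded exactly.
d_c1 = poly_from_roots([Fraction(k) for k in range(5)])
d_c2 = poly_from_roots([Fraction(k) for k in range(-2, 3)])

def d_c4(x):
    """Polynomial of coding (C4), evaluated in floating point."""
    return 16 * x**5 - 20 * x**3 + 5 * x

# Residuals of d_c4 at the levels sin(2*pi*k/5); they should be numerically zero.
residuals_c4 = [d_c4(math.sin(2 * math.pi * k / 5)) for k in range(5)]
```

Only the coefficients are handled exactly; the irrational levels of (C4) enter the check in floating point, which is in line with the remark that the coefficients are algebraically simpler than the solutions.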

The ring of real valued responses on D is denoted by R(D). It is identified with the quotient ring R[x₁, . . . , x_m]/I(D), which contains k[x1, . . . , x_m]/I(D) as a sub-ring.

In the following we write $x^\alpha = x_1^{\alpha_1} \cdots x_m^{\alpha_m}$ for a monomial of indeterminates and $X^\alpha = X_1^{\alpha_1} \cdots X_m^{\alpha_m}$ for a monomial function on D. As a finite dimensional K-vector space, K(D) is generated by the monomials $X^\alpha$ with $\alpha \in L$, where

$$L = \{0, \dots, n_1 - 1\} \times \cdots \times \{0, \dots, n_m - 1\} .$$

We denote such unique hierarchical monomial basis by $Est_D$. In a very special case, when $d(x) = x^n - 1$, this basis is actually orthonormal. All the identifiable polynomial models are linear combinations of the monomials in $Est_D$. Given a polynomial f, it is equal on D to a unique linear combination of elements of $Est_D$, called the normal form of f, $NF_D(f)$. In particular, if $d_j(x_j)$ is monic of degree $n_j$, then

$$d_j(x_j) = x_j^{n_j} - NF_D\left(x_j^{n_j}\right) .$$

The normal form of each f can be computed by repeated applications of the re-writing rules $x_j^{n_j} = NF_D\left(x_j^{n_j}\right)$.

The product $X^\alpha X^\beta$ is computed in the quotient ring K(D) through the linear basis as

$$X^\alpha X^\beta = X^{\alpha+\beta} = \sum_{\gamma \in L} c_{\alpha+\beta,\gamma} X^\gamma$$

where the last expression is the normal form of $X^{\alpha+\beta}$ on D and the sum of the exponents is computed mod $n_j$. Note that the actual solutions of the equations $d_j = 0$ are not required to compute the array $[c_{\alpha,\beta}]$, where $\alpha \in L + L$, $\beta \in L$ and $c_{\alpha,\beta} \in k$.

There exists a remarkable special case, namely when each of the $d_j$'s is a binomial of the form $x^n - x^h$, with h = 0, 1, that is of the form $x^n - 1$ or $x(x^{n-1} - 1)$. In this case the normal form of a monomial is itself a monomial. In particular, in the case $x^n - 1$ all the monomials of the basis are invertible in C(D) and the mapping $L \ni \alpha \mapsto NF(X^\alpha) \in Est_D$ is a homomorphism from the additive group $Z_{n_1} \times \cdots \times Z_{n_m}$ to $Est_D$, a sub-group of the multiplicative group of the invertible elements of the ring C(D).

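Example. In the case (C4), where the levels are coded by the sines of trigonometric angles, the re-writing formulæ up to degree 8 (4+4) are

$$\begin{cases}
x^5 = \frac{5}{4}x^3 - \frac{5}{16}x; \\
x^6 = \frac{5}{4}x^4 - \frac{5}{16}x^2; \\
x^7 = \frac{5}{4}x^5 - \frac{5}{16}x^3 = \frac{5}{4}\left(\frac{5}{4}x^3 - \frac{5}{16}x\right) - \frac{5}{16}x^3 = \frac{5}{4}x^3 - \frac{25}{64}x; \\
x^8 = \frac{5}{4}x^4 - \frac{25}{64}x^2.
\end{cases} \tag{1}$$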

•
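The re-writing rules can be applied mechanically. The sketch below, in Python with exact rational arithmetic, reduces a univariate polynomial by repeated use of the rule x⁵ = (5/4)x³ − (5/16)x of coding (C4). The names `RULE`, `DEGREE` and `normal_form` are assumptions introduced for this illustration; the paper itself relies on CoCoA for normal forms.

```python
from fractions import Fraction

# Re-writing rule for coding (C4): x^5 = 5/4 x^3 - 5/16 x.
RULE = {3: Fraction(5, 4), 1: Fraction(-5, 16)}
DEGREE = 5

def normal_form(poly):
    """Reduce {exponent: coefficient} until every exponent is below DEGREE."""
    poly = dict(poly)
    while True:
        top = max(poly, default=0)
        if top < DEGREE:
            return {e: c for e, c in poly.items() if c != 0}
        c = poly.pop(top)
        # Replace c * x^top by c * x^(top - DEGREE) * (right-hand side of the rule).
        for e, r in RULE.items():
            new_e = e + top - DEGREE
            poly[new_e] = poly.get(new_e, Fraction(0)) + c * r

nf7 = normal_form({7: Fraction(1)})
nf8 = normal_form({8: Fraction(1)})
```

The two results agree with the last two lines of Equation (1), and no level value is ever computed.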

The statistical properties of a design are mostly computed through the moments $\mu_\alpha = E_D(X^\alpha)$. We show how to compute moments from the coefficients of the $d_j$'s.

The coefficients of each $d_j$ are symmetric functions of the level values. For a single factor with m levels $a_1, \dots, a_m$,

$$d(x) = \prod_{j=1}^{m} (x - a_j) = \sum_{k=0}^{m} (-1)^{m-k} \sigma_{m-k} x^k$$

where $\sigma_h$ denotes the h-th elementary symmetric polynomial of the $a_j$'s, with $\sigma_0 = 1$, see (van der Waerden, 1970, Sec. 5.7). E.g. the coefficient of the term of order m − 1 gives, up to sign, the sum of the level values. If we denote by $s_h$, h = 1, . . . , m − 1, the h-moments of the factor with respect to the counting measure on $D_j$, $s_h = \sum_{j=1}^{m} a_j^h$, then the Newton-Girard formulæ are

$$s_h - s_{h-1}\sigma_1 + s_{h-2}\sigma_2 - \cdots + (-1)^{h-1} s_1 \sigma_{h-1} + (-1)^h h \sigma_h = 0$$

with h = 1, . . . , m. This allows the computation of the moments as functions of the coefficients of the polynomial d(x).

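Example. With $n_j = 5$ the Newton-Girard formulæ are solved as

$$\begin{cases}
s_1 = \sigma_1 \\
s_2 = \sigma_1^2 - 2\sigma_2 \\
s_3 = \sigma_1^3 - 3\sigma_1\sigma_2 + 3\sigma_3 \\
s_4 = \sigma_1^4 - 4\sigma_1^2\sigma_2 + 2\sigma_2^2 + 4\sigma_1\sigma_3 - 4\sigma_4
\end{cases} \tag{2}$$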

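and in the case (C4) with d-polynomial $x^5 - \frac{5}{4}x^3 + \frac{5}{16}x$, we get from Equations (2) and (1):

$$\begin{cases}
s_1 = 0 \\
s_2 = -2\left(-\frac{5}{4}\right) = \frac{5}{2} \\
s_3 = 0 \\
s_4 = 2\left(\frac{5}{4}\right)^2 - 4 \cdot \frac{5}{16} = \frac{15}{8}
\end{cases}
\qquad
\begin{cases}
s_5 = \frac{5}{4}s_3 - \frac{5}{16}s_1 = 0 \\
s_6 = \frac{5}{4}s_4 - \frac{5}{16}s_2 = \frac{25}{16} \\
s_7 = \frac{5}{4}s_3 - \frac{25}{64}s_1 = 0 \\
s_8 = \frac{5}{4}s_4 - \frac{25}{64}s_2 = \frac{175}{128} .
\end{cases} \tag{3}$$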

•
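The recursion behind Equations (2) and (3) is short enough to be written out. The following Python sketch, with hypothetical names `power_sums` and `sigma_c4`, computes the power sums from the elementary symmetric functions with the Newton-Girard formulæ; for orders above the number of levels it uses the same recursion without the term in σ_h, which vanishes there.

```python
from fractions import Fraction

def power_sums(sigma, count):
    """Power sums s_0..s_count from sigma = [1, sigma_1, ..., sigma_n]."""
    n = len(sigma) - 1
    s = [Fraction(n)]                      # s_0 = number of levels
    for h in range(1, count + 1):
        total = Fraction(0)
        for i in range(1, h):
            if i <= n:
                total += (-1) ** (i - 1) * sigma[i] * s[h - i]
        if h <= n:
            total += (-1) ** (h - 1) * h * sigma[h]
        s.append(total)
    return s

# Coding (C4): d(x) = x^5 - 5/4 x^3 + 5/16 x, so sigma_2 = -5/4 and sigma_4 = 5/16.
sigma_c4 = [Fraction(1), Fraction(0), Fraction(-5, 4), Fraction(0), Fraction(5, 16), Fraction(0)]
s = power_sums(sigma_c4, 8)
```

The list `s` reproduces the values of Equation (3), for instance `s[4] == 15/8` and `s[8] == 175/128`, using only the coefficients of d(x).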

Let $\mu_\alpha = E(X^\alpha)$, $\alpha = (\alpha_1, \dots, \alpha_m) \in L$, be the moments on D with respect to the uniform probability. Then

$$\mu_\alpha = \frac{1}{n} \prod_{j=1}^{m} s_{j,\alpha_j} \quad \text{with } n = n_1 \cdots n_m .$$

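Example. (continued) In the case (C4) with $n_j = 5$ and m = 3, the non-zero moments $\mu_\alpha$ with all exponents $\alpha_j \geq 1$ are $\mu_{222} = \frac{1}{8}$, $\mu_{224} = \mu_{242} = \mu_{422} = \frac{3}{32}$, $\mu_{244} = \mu_{424} = \mu_{442} = \frac{9}{128}$ and $\mu_{444} = \frac{27}{512}$. •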
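As a cross-check of the product formula for the moments, the sketch below computes a few moments of the 5³ design in coding (C4) in two ways: exactly, from the power sums of Equation (3), and by brute force, averaging over the 125 points with the sine levels in floating point. The names `S`, `moment` and `moment_brute` are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product
from math import prod, sin, pi

# Power sums of one factor in coding (C4); s_0 is the number of levels.
S = {0: Fraction(5), 1: Fraction(0), 2: Fraction(5, 2), 3: Fraction(0), 4: Fraction(15, 8)}

def moment(alpha, n_levels=5):
    """Exact moment mu_alpha = (1/n) * prod_j s_{alpha_j} on the full design."""
    n = n_levels ** len(alpha)
    return prod(S[a] for a in alpha) / n

LEVELS = [sin(2 * pi * k / 5) for k in range(5)]

def moment_brute(alpha):
    """Same moment obtained by averaging X^alpha over all design points."""
    points = product(LEVELS, repeat=len(alpha))
    return sum(prod(x ** a for x, a in zip(p, alpha)) for p in points) / 5 ** len(alpha)
```

For example `moment((2, 4, 4))` returns 9/128, and `moment_brute((2, 4, 4))` agrees with it up to rounding error.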

2.2 Fractional factorial design

Given a set of generating equations of the fraction, $g_1, \dots, g_k$, the fraction $\mathcal{F}$ consists of all the points of D on which all the generating polynomials are zero. The ideal of the fraction, $I(\mathcal{F})$, is generated by $d_1, \dots, d_m, g_1, \dots, g_k$ and we can assume that $g_1, \dots, g_k$ are reduced in normal form on I(D). Given the generating equations, it is not obvious how to compute the number of points in the fraction and how to find a hierarchical monomial basis $x^\alpha$, $\alpha \in M$. Note that $\#M = \#\mathcal{F}$.

One possible solution consists of the three following steps, see Pistone et al. (2001) for details.

(1) Order. The first step is to introduce a monomial order, that is a total order $\prec$ on monomials such that $1 \prec x^\alpha$ and $x^\alpha \prec x^\beta$ implies $x^{\alpha+\gamma} \prec x^{\beta+\gamma}$.

(2) G-basis. The basis $d_1, \dots, d_m, g_1, \dots, g_k$ of $I(\mathcal{F})$ is transformed into a special equivalent basis, called Gröbner basis, by the application of an algorithm which is implemented in computer algebra software.

(3) Est. A hierarchical basis consists of all the monomials which are not divisible by the leading term of any polynomial in the Gröbner basis.

In this paper we rely marginally on this technology. Note that all the previous points are trivial in the case of a factorial design. In such a case, the hierarchical monomial basis is unique. Note also that some hierarchical bases can not be obtained via the Gr¨obner Basis method, as the example in the next section shows.

2.3 Indicator function

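We call indicator function of a fraction $\mathcal{F} \subset D$ a polynomial F reduced in normal form on I(D), $F = \sum_{\alpha \in L} b_\alpha X^\alpha$, such that

$$F(a) = \begin{cases} 1 & \text{if } a \in \mathcal{F} \\ 0 & \text{if } a \in D \setminus \mathcal{F} . \end{cases} \tag{4}$$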

Note that a reduced F is an indicator polynomial if and only if $F^2 - F = 0$ on D. Indicator polynomials for fractional factorial designs were introduced first in Fontana et al. (1997), Fontana et al. (2000) and used in Cheng et al. (2004). The notion was extended to fractions with replicates in Ye (2003). Tang and Deng (1999) present a basically equivalent methodology. The indicator function could be obtained as the unique reduced interpolator from Equation (4).

When a list of treatment values is available, F can be computed using some form of interpolation formula, as done in the aforementioned papers. Here we follow a new approach based on generating equations.

Theorem 1 Let the ideals of the design and of the fraction be respectively:

$$I(D) = \langle d_1, \dots, d_m \rangle \quad \text{and} \quad I(\mathcal{F}) = \langle d_1, \dots, d_m, g_1, \dots, g_k \rangle .$$

A D-reduced polynomial F is the indicator of $\mathcal{F}$ if (1) and (2) below are both satisfied:

(1) there exist $h_j \in k[x_1, \dots, x_m]$, j = 1, . . . , k, such that

$$1 - F - \sum_j h_j g_j \in I(D) ;$$

(2) for all $g_i$,

$$F g_i \in I(D) .$$

Moreover, given (1), statement (2) is equivalent to

(3) for all $g_i$, there exists $k_i \in k[x_1, \dots, x_m]$ such that

$$g_i - k_i (1 - F) \in I(D) .$$

Proof. Condition (1) is equivalent to F(a) = 1 for all $a \in \mathcal{F}$. Condition (2) is equivalent to F(a) = 0 for all $a \in D \setminus \mathcal{F}$ because for such an a there exists a $g_j$ such that $g_j(a) \neq 0$.

Moreover, condition (1) is equivalent to $1 - F \in I(\mathcal{F})$; in such a case (2) is equivalent to $\langle d_1, \dots, d_m, 1 - F \rangle = I(\mathcal{F})$. □

Remark. The computation of the indicator polynomial from the generating equations could be reduced to the case of a single generating equation. In fact, if $\mathcal{F}_i$ is the fraction whose generating equation is $g_i$, with indicator polynomial $F_i$, then $\mathcal{F} = \cap_{i=1}^{k} \mathcal{F}_i$ and $F = NF_D(F_1 \cdots F_k)$.

Example. Let us consider the 3² factorial design with coding −1, 0, +1, as in (C2). Then the design equations are $d_1(x) = x^3 - x$ and $d_2(y) = y^3 - y$. We consider the “cross” fraction with generating polynomial g(x, y) = xy. The system corresponding to statements (1) and (2) of Theorem 1 is

$$\begin{cases} x^3 - x = 0 \\ y^3 - y = 0 \\ 1 - f - hxy = 0 \\ fxy = 0 . \end{cases}$$

From the third equation multiplied by $x^2y^2$ we get $x^2y^2 - fx^2y^2 - hx^3y^3 = 0$, and, using the other equations, we get $x^2y^2 - 1 + f = 0$. The indicator polynomial is $F(x, y) = 1 - x^2y^2$. The equivalent system of equations

$$\begin{cases} x^3 - x = 0 \\ y^3 - y = 0 \\ f + x^2y^2 - 1 = 0 \\ hxy + f - 1 = 0 \end{cases}$$

is in lower-triangular form with respect to the lexicographic order of monomials with $x \prec y \prec f \prec h$ and has smallest leading term. In other words, it is a Gröbner basis for the lexicographic order. •
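The cross example is small enough to be verified by enumeration. The Python sketch below lists the nine points of the 3² design, selects the points where xy = 0, and checks both that 1 − x²y² is the indicator and that condition (1) of Theorem 1 holds pointwise with the multiplier h = xy. The choice h = xy is an assumption of this sketch; any h satisfying the condition would do.

```python
from itertools import product

LEVELS = (-1, 0, 1)
DESIGN = list(product(LEVELS, repeat=2))
FRACTION = [(x, y) for x, y in DESIGN if x * y == 0]   # the "cross"

def indicator(x, y):
    """Candidate indicator polynomial F(x, y) = 1 - x^2 y^2."""
    return 1 - x**2 * y**2

# F takes the value 1 on the fraction and 0 on the rest of the design.
values = {pt: indicator(*pt) for pt in DESIGN}

# Condition (1) of Theorem 1, checked on every design point with h = xy:
# 1 - F - h*g vanishes on D, i.e. it belongs to I(D).
certificate_ok = all(1 - indicator(x, y) - (x * y) * (x * y) == 0 for x, y in DESIGN)

# Condition (2): F*g vanishes on D.
product_ok = all(indicator(x, y) * x * y == 0 for x, y in DESIGN)
```

Checking on the points is legitimate here because I(D) is exactly the set of polynomials vanishing on D.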

Theorem 2 We consider the ring $k[h_1, \dots, h_k, f, x_1, \dots, x_m]$. Then the lexicographic Gröbner basis of the elimination ideal

$$\Big\langle d_1, \dots, d_m, (1 - f) - \sum_j h_j g_j, f g_1, \dots, f g_k \Big\rangle \cap k[f, x_1, \dots, x_m]$$

contains a unique polynomial of the form $f - \sum_{\alpha \in L} b_\alpha x^\alpha$. Then the indicator function F is $\sum_{\alpha \in L} b_\alpha X^\alpha$.

Proof. The polynomial $f - \sum_{\alpha \in L} b_\alpha x^\alpha$ belongs to the elimination ideal because of Theorem 1. It has minimum leading term among the polynomials containing the indeterminates f and $x_1, \dots, x_m$. □

If g is a response on the design D, the mapping

$$L(D) \ni g \mapsto NF_D(gF) \in L(D)$$

sets g to zero outside $\mathcal{F}$, and provides a distinguished representative of the equivalence class of responses which are confounded on the fraction. The computation of $NF_D(X^\alpha F)$, $\alpha \in L$, is a nice way to study the confounding structure of the fraction. Again, the computation is done without actually computing the treatments.

Theorem 3 We fix a term order on the set of exponents L. We compute the system of the Normal Forms on D of the polynomials $X^\alpha F$, $\alpha \in L$, with the previous order. We denote by R the matrix of the coefficients

$$NF_D(X^\alpha F) = \sum_{\beta} R_{\alpha\beta} X^\beta .$$

Any vector k in the left kernel of R, $k^t R = 0$, is a confounding relation among the monomials $X^\alpha$:

$$\sum_{\alpha \in L} k_\alpha X^\alpha = 0 \quad \text{on } \mathcal{F} .$$

A hierarchical monomial basis of the fraction can be found from the ker-matrix K. If $K_{L \setminus M}$ is a non singular sub-matrix of K, then $X^\alpha$, with $\alpha \in M$, is a monomial basis of $\mathcal{F}$.

Proof. We write the ker-matrix in two vertical blocks as $K = [K_{L \setminus M} \mid K_M]$ and the matrix R in the two corresponding horizontal blocks $R = \begin{bmatrix} R_{L \setminus M} \\ R_M \end{bmatrix}$. If $K_{L \setminus M}$ is non singular, then $K_{L \setminus M}^{-1} K R = 0$ implies $R_{L \setminus M} + K_{L \setminus M}^{-1} K_M R_M = 0$ and $X^\alpha$, with $\alpha \in M$, is a basis of the fraction. □

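Example. Composite fraction with 2 factors.

In this section we apply all the preceding developments to the fraction with the following 9 points, which are the common zeros on D of the generating equations given below: the central point (0, 0), the four points (±1, ±1) and the four axial points (±2, 0), (0, ±2).

The equations of the full factorial design in the coding (C2) are:

$$d_1 = x_1(x_1^2 - 1)(x_1^2 - 4) \quad \text{and} \quad d_2 = x_2(x_2^2 - 1)(x_2^2 - 4)$$

and the generating equations of the composite design can be taken as

$$g_1 = x_1^3 x_2 - x_1 x_2, \quad g_2 = x_1 x_2^3 - x_1 x_2, \quad g_3 = x_1^3 + 3x_1x_2^2 - 4x_1, \quad g_4 = 3x_1^2x_2 + x_2^3 - 4x_2$$

where $g_3$ and $g_4$ can be derived respectively from $x_1(x_1^2 - 4)(x_2^2 - 1)$ and $x_2(x_2^2 - 4)(x_1^2 - 1)$, using $g_1$ and $g_2$.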

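As in the previous example (the cross), the application of Theorem 1, its remark and Theorem 2 and the use of the Buchberger algorithm for computing Gröbner bases with the lexicographic order provide the indicator function for each generating equation as

$$\begin{aligned}
F_1 &= \tfrac{1}{48}x_1^4x_2^4 - \tfrac{5}{48}x_1^4x_2^2 - \tfrac{1}{48}x_1^2x_2^4 + \tfrac{5}{48}x_1^2x_2^2 + 1 \\
F_2 &= \tfrac{1}{48}x_1^4x_2^4 - \tfrac{1}{48}x_1^4x_2^2 - \tfrac{5}{48}x_1^2x_2^4 + \tfrac{5}{48}x_1^2x_2^2 + 1 \\
F_3 &= \tfrac{19}{144}x_1^4x_2^4 - \tfrac{79}{144}x_1^4x_2^2 + \tfrac{1}{3}x_1^4 - \tfrac{67}{144}x_1^2x_2^4 + \tfrac{271}{144}x_1^2x_2^2 - \tfrac{4}{3}x_1^2 + 1 \\
F_4 &= \tfrac{19}{144}x_1^4x_2^4 - \tfrac{67}{144}x_1^4x_2^2 - \tfrac{79}{144}x_1^2x_2^4 + \tfrac{271}{144}x_1^2x_2^2 + \tfrac{1}{3}x_2^4 - \tfrac{4}{3}x_2^2 + 1
\end{aligned}$$

and

$$F = \tfrac{31}{144}x_1^4x_2^4 - \tfrac{127}{144}x_1^4x_2^2 - \tfrac{127}{144}x_1^2x_2^4 + \tfrac{1}{3}x_1^4 + \tfrac{511}{144}x_1^2x_2^2 + \tfrac{1}{3}x_2^4 - \tfrac{4}{3}x_1^2 - \tfrac{4}{3}x_2^2 + 1$$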
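The indicator polynomials of the composite fraction can be checked by direct evaluation on the 25 points of the 5² design in coding (C2). The Python sketch below does this with exact rational arithmetic; the helper `poly` and the constant names are assumptions introduced for the illustration, and the coefficients are copied from the displayed polynomials. It also checks the statement of the Remark after Theorem 1, namely that F agrees on D with the product of the four single indicators.

```python
from fractions import Fraction as Fr
from itertools import product

LEVELS = (-2, -1, 0, 1, 2)                     # coding (C2)
DESIGN = list(product(LEVELS, repeat=2))

def poly(terms):
    """Polynomial function from {(a, b): (num, den)} for the term x1^a x2^b."""
    return lambda x1, x2: sum(Fr(n, d) * x1**a * x2**b for (a, b), (n, d) in terms.items())

F1 = poly({(4, 4): (1, 48), (4, 2): (-5, 48), (2, 4): (-1, 48), (2, 2): (5, 48), (0, 0): (1, 1)})
F2 = poly({(4, 4): (1, 48), (4, 2): (-1, 48), (2, 4): (-5, 48), (2, 2): (5, 48), (0, 0): (1, 1)})
F3 = poly({(4, 4): (19, 144), (4, 2): (-79, 144), (4, 0): (1, 3), (2, 4): (-67, 144),
           (2, 2): (271, 144), (2, 0): (-4, 3), (0, 0): (1, 1)})
F4 = poly({(4, 4): (19, 144), (4, 2): (-67, 144), (2, 4): (-79, 144), (2, 2): (271, 144),
           (0, 4): (1, 3), (0, 2): (-4, 3), (0, 0): (1, 1)})
F = poly({(4, 4): (31, 144), (4, 2): (-127, 144), (2, 4): (-127, 144), (4, 0): (1, 3),
          (2, 2): (511, 144), (0, 4): (1, 3), (2, 0): (-4, 3), (0, 2): (-4, 3), (0, 0): (1, 1)})

GENERATORS = [
    lambda x1, x2: x1**3 * x2 - x1 * x2,
    lambda x1, x2: x1 * x2**3 - x1 * x2,
    lambda x1, x2: x1**3 + 3 * x1 * x2**2 - 4 * x1,
    lambda x1, x2: 3 * x1**2 * x2 + x2**3 - 4 * x2,
]
INDICATORS = [F1, F2, F3, F4]

# Points of D where all generating polynomials vanish.
FRACTION = [p for p in DESIGN if all(g(*p) == 0 for g in GENERATORS)]
```

Evaluating `F` on `DESIGN` returns 1 on the nine points of `FRACTION` and 0 on the other sixteen.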

We can compute the Normal Forms of $X^\alpha F$, $\alpha \in L$, with the DegRevLex order.

A left kernel K of the matrix R, computed by CoCoA, gives a good picture of the confounding patterns. The columns are labelled by the exponents $\alpha_1\alpha_2$:

 00 01 02 03 04   10 11 12 13 14   20 21 22 23 24   30 31 32 33 34   40 41 42 43 44

  0 -4  0  1  0    0  0  0  0  0    0  0  0  0  0    0  0  0  0  0    0  3  0  0  0
  0  0 -4  0  1    0  0  0  0  0    0  0  0  0  0    0  0  0  0  0    0  0  3  0  0
  0  0  0  0  0    0  1  0  0  0    0  0  0  0  0    0 -1  0  0  0    0  0  0  0  0
  0  0  0  0  0    0  0  1  0 -1    0  0  0  0  0    0  0  0  0  0    0  0  0  0  0
  0  0  0  0  0    0  0  0  1  0    0  0  0  0  0    0 -1  0  0  0    0  0  0  0  0
  0  0  0  0  0    0  0  0  0  0    0  1  0  0  0    0  0  0  0  0    0 -1  0  0  0
  0  0  0  0  0    0  0  0  0  0    0  0  1  0  0    0  0  0  0  0    0  0 -1  0  0
  0  0  0  0  0    0  0  0  0  0    0  0  0  1  0    0  0  0  0  0    0 -1  0  0  0
  0  0  0  0  0    0  0  0  0  0    0  0  0  0  1    0  0  0  0  0    0  0 -1  0  0
  0  0  0  0  0   -4  0  0  0  3    0  0  0  0  0    1  0  0  0  0    0  0  0  0  0
  0  0  0  0  0    0  0  0  0 -1    0  0  0  0  0    0  0  1  0  0    0  0  0  0  0
  0  0  0  0  0    0  0  0  0  0    0  0  0  0  0    0 -1  0  1  0    0  0  0  0  0
  0  0  0  0  0    0  0  0  0 -1    0  0  0  0  0    0  0  0  0  1    0  0  0  0  0
  0  0  0  0  0    0  0  0  0  0   -4  0  0  0  0    0  0  0  0  0    1  0  3  0  0
  0  0  0  0  0    0  0  0  0  0    0  0  0  0  0    0  0  0  0  0    0 -1  0  1  0
  0  0  0  0  0    0  0  0  0  0    0  0  0  0  0    0  0  0  0  0    0  0 -1  0  1

We observe that there are 12 rows with 2 non-zero values (−1 and 1) and 23 zeros: we have the full confounding of the corresponding monomials. The other 4 rows are cases of partial confounding. The 12 cases of total confounding split into four cycles.

A non singular sub-matrix of the previous kernel of the R matrix corresponds to $M = \{(\alpha_1, \alpha_2) \mid 0 \leq \alpha_i \leq 2\}$; the determinant of the matrix $K_{L \setminus M}$ is 1.

In particular, the response surface with terms with total degree up to 2, is estimable.

Notice that the previous discussion about the confounding, based on the matrix K depends on the very special form of the kernel matrix produced by CoCoA. Actually CoCoA produces a matrix with integer entries such that the number of non zero entries of each row is low. A generic numerical software would have produced some type of orthonormalized matrix unsuitable for a statistical interpretation. •
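The confounding relations read from the kernel matrix can also be checked on the points of the fraction, without any Gröbner basis computation. The Python sketch below is an independent check, not the CoCoA computation of the paper: it assumes the nine points of the composite fraction in coding (C2), verifies a few relations (two of total and two of partial confounding), and verifies with exact Gaussian elimination that the nine monomials with exponents in M = {0, 1, 2}² are linearly independent on the fraction. The names `relation_holds` and `rank` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Composite fraction: centre, points (+-1, +-1), axial points (+-2, 0), (0, +-2).
POINTS = [(0, 0)] + list(product((-1, 1), repeat=2)) + [(2, 0), (-2, 0), (0, 2), (0, -2)]

def relation_holds(coeffs):
    """True if sum_alpha k_alpha X^alpha vanishes on every point of the fraction."""
    return all(sum(c * x1**a * x2**b for (a, b), c in coeffs.items()) == 0
               for x1, x2 in POINTS)

def rank(rows):
    """Rank of a matrix by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][col] / rows[rk][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

# Monomials X^alpha with alpha in M = {0,1,2}^2, evaluated on the fraction.
M = list(product(range(3), repeat=2))
design_matrix = [[x1**a * x2**b for (a, b) in M] for x1, x2 in POINTS]
rank_M = rank(design_matrix)
```

A full rank of 9 for `design_matrix` is equivalent to the statement that the monomials with exponents in M form a basis of the responses on the fraction.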

3 Real linear bases and real levels

While all codings of levels specified as polynomial ideals are compatible with the algebraic theory, special cases have special features.

In particular, the use of the n-th roots of the unity produces an orthogonal basis of monomials, see Pistone and Rogantin (2005). Thus, not only identifiability can be treated, but also the estimation of effects (i.e. least squares) is easily managed.

However, in the applications we are interested in real effects. Bayley (1983) suggests a way to define a real coding and interpretable real effects, while keeping some of the features of the complex coding. Other relevant references are Caboara and Riccomagno (1998), Edmondson (1994), Kobilinsky (1990), Kobilinsky and Monod (1991).

We shall discuss in detail an example of a fraction of a 5³ design.

3.1 Bases for the responses on the design

Coding the levels as in (C3) of the previous section and denoting by L the set of the exponents of the monomial responses, the set of functions

$$\{X^\alpha , \ \alpha \in L\}$$

is an orthonormal basis of the complex responses on the design D. Then, each response f can be represented as a unique C-linear combination of constant, simple and interaction terms:

$$f = \sum_{\alpha \in L} \theta_\alpha X^\alpha, \quad \theta_\alpha \in C$$

where the coefficients are uniquely defined by $\theta_\alpha = E_D\left(f \, \overline{X^\alpha}\right)$. A response f is real valued if and only if

$$\overline{\theta_\alpha} = \theta_{[-\alpha]} , \quad \alpha \in L .$$

In the applications, we are interested in the real responses on the design. Note that both the real vector space R(D) and the complex vector space C(D) of the responses on the design D have a real basis, see (Kobilinsky, 1990, Prop. 3.1). In particular, a common real basis of both spaces is described in the following proposition.

Proposition 1 For each factor j with $n_j$ levels, an orthogonal basis of the real functions $R(D_j)$ defined on the design restricted to the j-th factor $D_j$ is:

$$\begin{cases}
1 & \text{constant term} \\
\Re(X_j^k) \ \text{for } 1 \leq k \leq n_j/2 & \Re(X_j^k) = \dfrac{X_j^k + \overline{X_j^k}}{2} \\
\Im(X_j^k) \ \text{for } 1 \leq k < n_j/2 & \Im(X_j^k) = -i \, \dfrac{X_j^k - \overline{X_j^k}}{2} .
\end{cases}$$

The basis of R(D) is obtained via the Kronecker product:

$$R(D) = R(D_1) \otimes \cdots \otimes R(D_j) \otimes \cdots \otimes R(D_m) .$$

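Proof. For each real function defined on the factor, we have $f = \overline{f}$, that is:

$$\sum_{k=0}^{n_j-1} \theta_k X_j^k = \sum_{k=0}^{n_j-1} \overline{\theta_k} \, \overline{X_j^k} .$$

If $\theta_k = a_k + i\, b_k$ then $\overline{\theta_k} = a_k - i\, b_k$ and each real function f can be written as

$$f = \theta_0 + \sum_{k=1}^{n_j-1} a_k \frac{X_j^k + \overline{X_j^k}}{2} + i \sum_{k=1}^{n_j-1} b_k \frac{X_j^k - \overline{X_j^k}}{2} = \theta_0 + \sum_{1 \leq k \leq n_j/2} \alpha_k \Re(X_j^k) + \sum_{1 \leq k < n_j/2} \beta_k \Im(X_j^k) ,$$

with $\alpha_k = 2\Re(\theta_k)$ for $k < n_j/2$, $\alpha_k = \Re(\theta_k)$ for $k = n_j/2$ (even case), $\beta_k = -2\Im(\theta_k)$ for $k < n_j/2$.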

Now we check the orthogonality of the elements of the basis. First, all the vectors are orthogonal to the constant.

Then, let h and k be distinct integer numbers such that $1 \leq k < h < n_j/2$. Writing $n = n_j$,

$$\begin{aligned}
\Im(X^h)\,\Im(X^k) &= i\left(\frac{X_j^h - X_j^{n-h}}{2}\right) i\left(\frac{X_j^k - X_j^{n-k}}{2}\right) \\
&= -\frac{1}{2}\left(\frac{X_j^{h+k} + X_j^{n-h-k}}{2} - \frac{X_j^{h-k} + X_j^{n-h+k}}{2}\right) \\
&= -\frac{1}{2}\left(\Re(X_j^{h+k}) - \Re(X_j^{h-k})\right) .
\end{aligned}$$

The mean value of $\Im(X^h)\,\Im(X^k)$ is 0 because neither h + k nor h − k is 0 mod $n_j$.

The cases $\Im(X^h)$–$\Re(X^k)$ and $\Re(X^h)$–$\Re(X^k)$ are similar. □
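The orthogonality stated in Proposition 1 is easy to check numerically for a single factor. The Python sketch below evaluates the real basis on the n-th roots of the unity and returns the largest absolute mean of a product of two distinct basis elements; the function names `real_basis` and `max_cross_mean` are assumptions made for this illustration, and the check is in floating point, not symbolic.

```python
import cmath
from itertools import combinations

def real_basis(n):
    """Real basis of Proposition 1, evaluated on the n-th roots of the unity."""
    omega = [cmath.exp(2j * cmath.pi * t / n) for t in range(n)]
    basis = {"1": [1.0] * n}
    for k in range(1, n // 2 + 1):            # 1 <= k <= n/2
        basis[f"Re{k}"] = [(w ** k).real for w in omega]
    for k in range(1, (n + 1) // 2):          # 1 <= k < n/2
        basis[f"Im{k}"] = [(w ** k).imag for w in omega]
    return basis

def max_cross_mean(n):
    """Largest |E(u v)| over pairs of distinct basis elements u, v."""
    basis = real_basis(n)
    return max(abs(sum(a * b for a, b in zip(u, v)) / n)
               for u, v in combinations(basis.values(), 2))
```

For n = 5 the basis has the five elements 1, Re1, Re2, Im1, Im2, and `max_cross_mean(5)` is zero up to rounding error; the even case n = 6 behaves in the same way.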

Remark. The previous bases, as a function of the integer lattice coding 0, 1, 2, . . . , nj− 1 consist of the usual trigonometric functions, see Riccomagno et al. (1997) and Bates et al. (1998). In fact we are now discussing the algebraic properties of the Fourier regression.

3.2 Real recoding for factors with a prime number of levels

Now we consider the application of the basis of Proposition 1 to ordered and quantitative factors.

If $n_j$ is a prime integer greater than 2, then the map

$$\omega^k \longleftrightarrow \Im(\omega^k)$$

is one-to-one and $\Im(X_j)$ can be considered the linear term of a real basis of the responses defined on the re-coded $D_j$. We denote by $u_{1j}$ this term. In this case the coding is

$$\left\{0, \Im(\omega^1), \dots, \Im(\omega^{n_j-1})\right\} = \left\{0, \sin\left(\frac{2\pi}{n_j}\right), \dots, \sin\left(\frac{2\pi}{n_j}(n_j - 1)\right)\right\}$$

Notice that the new real levels are not equally spaced and their increasing order differs from the numbering $k = 0, 1, \dots, n_j - 1$, see example below. In fact, we are constructing polynomial models and the trigonometric representation is not appropriate.

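Example. $n_j = 5$

The roots of the unity $\omega^k$, k = 0, 1, 2, 3, 4, and the real coding, listed in increasing order of the real levels, are:

$$\begin{array}{ll}
-\dfrac{\sqrt{10 + 2\sqrt{5}}}{4} & \sin\left(\dfrac{2\pi}{5}\, 4\right) \\[2mm]
-\dfrac{\sqrt{10 - 2\sqrt{5}}}{4} & \sin\left(\dfrac{2\pi}{5}\, 3\right) \\[2mm]
0 & \sin\left(\dfrac{2\pi}{5}\, 0\right) \\[2mm]
\dfrac{\sqrt{10 - 2\sqrt{5}}}{4} & \sin\left(\dfrac{2\pi}{5}\, 2\right) \\[2mm]
\dfrac{\sqrt{10 + 2\sqrt{5}}}{4} & \sin\left(\dfrac{2\pi}{5}\, 1\right)
\end{array} \tag{5}$$

•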

We shall discuss below a better algebraic presentation of such values.

Now we shall show that the other elements of the basis are representable as polynomials of degrees from 0 up to $n_j - 1$. The degrees shall induce a special ordering on the elements of the basis. We consider the re-ordered basis

$$u_{0j}, u_{1j}, \dots, u_{kj}, \dots, u_{(n_j-1)j}$$

defined as follows:

$$\begin{cases}
u_{0j} = 1 \\
u_{kj} = \Re(X_j^k) & \text{for } k \neq 0, \ k \text{ even} \\
u_{kj} = \Im(X_j^k) & \text{for } k \text{ odd} .
\end{cases} \tag{6}$$

In fact, if $n_j$ is a prime number greater than 2, it is odd and the conjugate of an even power of $X_j$ is an odd power and vice-versa. Then

$$\left\{\Re(X_j^k), \ k \neq 0, \ k \text{ even}, \ k \leq n_j - 1\right\} = \left\{\Re(X_j^k), \ 1 \leq k \leq \frac{n_j}{2}\right\}$$

$$\left\{\Im(X_j^k), \ k \text{ odd}, \ k \leq n_j - 1\right\} = \left\{(-1)^{k-1}\Im(X_j^k), \ 1 \leq k < \frac{n_j}{2}\right\} .$$

The interest of this new numbering is given by the following proposition.

Proposition 2 Let the number of levels be a prime number greater than 2 and let

$$u_{0j}, u_{1j}, \dots, u_{(n_j-1)j}$$

be the basis defined above in (6).

Then, each element of such a basis is a polynomial in $u_{1j}$ as implied by the following triangular system of equations:

$$u_{1j}^k = \begin{cases}
\sum_{s=0}^{k/2} \gamma_{sk} \, u_{(2s)j} & \text{if } k \text{ is even} \\
\sum_{s=0}^{[k/2]} \gamma_{sk} \, u_{(2s+1)j} & \text{if } k \text{ is odd}
\end{cases} \tag{7}$$

where the $\gamma_{sk}$'s are rational coefficients.

Consequently, the basis consists of the orthogonal polynomials on the real coding.

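Proof. Since $u_{1j} = \Im(X_j) = -i\,(X_j - \overline{X_j})/2$ and $\overline{X_j} = X_j^{-1}$, we have:

$$(2u_{1j})^k = (-i)^k \left(X_j - \overline{X_j}\right)^k = (-i)^k \sum_{r=0}^{k} \binom{k}{r} (-1)^{r} X_j^{k-2r} .$$

Pairing the terms of indices r and k − r we get, if k is odd,

$$(2u_{1j})^k = (-i)^k \sum_{r=0}^{(k-1)/2} (-1)^r \binom{k}{r} \left(X_j^{k-2r} - \overline{X_j}^{\,k-2r}\right) = 2 \sum_{s=0}^{(k-1)/2} (-1)^s \binom{k}{\frac{k+1+2s}{2}} \Im\left(X_j^{2s+1}\right)$$

and, if k is even,

$$(2u_{1j})^k = (-i)^k \left[ (-1)^{k/2} \binom{k}{k/2} + \sum_{r=0}^{k/2-1} (-1)^r \binom{k}{r} \left(X_j^{k-2r} + \overline{X_j}^{\,k-2r}\right) \right] = \binom{k}{k/2} + 2 \sum_{s=1}^{k/2} (-1)^s \binom{k}{\frac{k+2s}{2}} \Re\left(X_j^{2s}\right) ,$$

where the second equalities follow from the substitutions $k - 2r = 2s + 1$ and $k - 2r = 2s$, respectively, together with $(-i)^k\, i = (-1)^{(k-1)/2}$ for k odd and $(-i)^k = (-1)^{k/2}$ for k even. For $k \leq n_j - 1$ all the exponents on the right-hand sides are at most $n_j - 1$, so that $\Im(X_j^{2s+1}) = u_{(2s+1)j}$ and $\Re(X_j^{2s}) = u_{(2s)j}$; dividing by $2^k$ gives the system (7) with rational coefficients. □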
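The coefficients of the triangular system (7) can be tabulated from the binomial expansion of a power of the sine. The Python sketch below uses the closed form obtained by pairing the terms of the expansion, namely γ = (−1)^s C(k, (k+1)/2 + s) / 2^(k−1) for k odd, and γ_0 = C(k, k/2) / 2^k, γ_s = (−1)^s C(k, k/2 + s) / 2^(k−1) for k even. This closed form and the names `power_in_basis` and `check_numerically` are assumptions of the sketch; they are checked against Equation (8) and numerically, but they are not taken from the original text.

```python
from fractions import Fraction
from math import comb, sin, cos, pi

def power_in_basis(k):
    """Coefficients {index: gamma} such that u_1^k = sum_index gamma * u_index."""
    if k % 2 == 1:
        return {2 * s + 1: Fraction((-1) ** s * comb(k, (k + 1) // 2 + s), 2 ** (k - 1))
                for s in range((k - 1) // 2 + 1)}
    coeffs = {0: Fraction(comb(k, k // 2), 2 ** k)}
    for s in range(1, k // 2 + 1):
        coeffs[2 * s] = Fraction((-1) ** s * comb(k, k // 2 + s), 2 ** (k - 1))
    return coeffs

def check_numerically(k, n=5):
    """Largest error of the expansion of sin(theta)^k on the angles 2*pi*t/n."""
    worst = 0.0
    for t in range(n):
        theta = 2 * pi * t / n
        # u_index is cos(index * theta) for even index and sin(index * theta) for odd index.
        value = sum(float(g) * (cos(i * theta) if i % 2 == 0 else sin(i * theta))
                    for i, g in power_in_basis(k).items())
        worst = max(worst, abs(value - sin(theta) ** k))
    return worst
```

For k = 2, 3, 4 the function `power_in_basis` returns the coefficients of the triangular system for five levels: 1/2 − (1/2)u₂, (3/4)u₁ − (1/4)u₃ and 3/8 − (1/2)u₂ + (1/8)u₄.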

We notice that the last element of the basis $u_{(n_j-1)j}$ equals $\Re(X_j)$; in fact $u_{(n_j-1)j} = \frac{1}{2}\left(X_j^{n_j-1} + X_j^{n_j-n_j+1}\right) = \Re(X_j)$. From the previous proposition, we can write $u_{(n_j-1)j}$ as an $(n_j - 1)$-degree polynomial of $u_{1j}$. Then, the real part of a complex coding level ω, denoted by c, can be written as a polynomial of the corresponding imaginary part, denoted by s. The mapping between the real and the complex codings is

$$\begin{cases} \omega = c + i\, s \\ c = u_{n_j-1}(s) \end{cases}$$

Remark. Having identified a natural degree associated to each term of the basis, we have a consistent definition of a parsimonious hierarchical model.

For example we could define a response surface model with constant, linear terms and interactions of order two and minimum degree.

A better description of the values of the real coding is available through the notion of minimal polynomial. For example, the irrational values appearing in (5) are solutions of the equation of (C4).

The general form of the minimal polynomial for $\sin(2\pi k/p)$, k = 0, . . . , p − 1, with p a prime, is given by (see Beslin and de Angelis (2004)):

$$S_p(s) = \sum_{k=0}^{(p-1)/2} (-1)^k \binom{p}{2k+1} (1 - s^2)^{(p-1)/2-k} s^{2k+1} .$$

Note that the algebraic complexity has been reduced, because the coefficients of the minimal polynomial are integers.
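The polynomial S_p can be expanded with integer arithmetic only. The following Python sketch, with the hypothetical name `sine_polynomial`, expands each power of (1 − s²) with the binomial theorem and collects the coefficients; it is meant as an illustration of the formula, not as the computation used in the paper.

```python
from math import comb

def sine_polynomial(p):
    """Integer coefficients (lowest degree first) of S_p(s) for an odd p."""
    half = (p - 1) // 2
    coeffs = [0] * (p + 1)
    for k in range(half + 1):
        m = half - k
        # (-1)^k * C(p, 2k+1) * (1 - s^2)^m * s^(2k+1), with (1 - s^2)^m expanded.
        for j in range(m + 1):
            coeffs[2 * k + 1 + 2 * j] += (-1) ** k * comb(p, 2 * k + 1) * comb(m, j) * (-1) ** j
    return coeffs
```

For p = 5 the result is the list [0, 5, 0, -20, 0, 16], that is 16s⁵ − 20s³ + 5s, the polynomial of coding (C4).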

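Example. $n_j = 5$ (cont.)

For each factor, the re-ordered real basis $u_0, u_1, u_2, u_3, u_4$ is:

$$\begin{cases}
u_0 = 1 \\
u_1 = \Im(X) = -i \left(\dfrac{X - \overline{X}}{2}\right) \\
u_2 = \Re(X^2) = \left(\dfrac{X^2 + \overline{X}^2}{2}\right) \\
u_3 = \Im(X^3) = -i \left(\dfrac{X^3 - \overline{X}^3}{2}\right) = i \left(\dfrac{X^2 - \overline{X}^2}{2}\right) = -\Im(X^2) \\
u_4 = \Re(X^4) = \left(\dfrac{X^4 + \overline{X}^4}{2}\right) = \left(\dfrac{X + \overline{X}}{2}\right) = \Re(X) .
\end{cases}$$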

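The triangular system in Equation (7) is:

$$\begin{cases}
u_1^2 = -\frac{1}{2}u_2 + \frac{1}{2} \\
u_1^3 = -\frac{1}{4}u_3 + \frac{3}{4}u_1 \\
u_1^4 = \frac{1}{8}u_4 - \frac{1}{2}u_2 + \frac{3}{8}
\end{cases} \tag{8}$$

and the orthogonalized system of the monomials $1, u_1, u_1^2, u_1^3, u_1^4$ on the given points is:

$$\begin{cases}
1 \\
u_1 \\
u_2 = -2u_1^2 + 1 \\
u_3 = -4u_1^3 + 3u_1 \\
u_4 = 8u_1^4 - 8u_1^2 + 1 .
\end{cases} \tag{9}$$

In this case the mapping between the real and the complex coding is:

$$\begin{cases} \omega = c + i\, s \\ c = 8s^4 - 8s^2 + 1 . \end{cases}$$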

The mean values of the elements $1, u_1, u_1^2, u_1^3, u_1^4$ are rational numbers. In fact, from the relations (8), the mean values of the odd powers are zero and the mean values of the even powers are (cfr. (3)):

$$E_D(1) = 1 , \quad E_D(u_1^2) = \frac{1}{2} , \quad E_D(u_1^4) = \frac{3}{8} .$$

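The space of the responses is linear with basis $u_0, \dots, u_4$. As a ring, it has the following multiplication table.

$$\begin{array}{c|ccccc}
 & u_0 & u_1 & u_2 & u_3 & u_4 \\ \hline
u_0 & 2u_0 & 2u_1 & 2u_2 & 2u_3 & 2u_4 \\
u_1 & 2u_1 & -u_2 + 1 & -u_1 + u_3 & u_2 - u_4 & -u_3 \\
u_2 & 2u_2 & -u_1 + u_3 & u_4 + 1 & u_1 & u_2 + u_4 \\
u_3 & 2u_3 & u_2 - u_4 & u_1 & -u_4 + 1 & -u_1 - u_3 \\
u_4 & 2u_4 & -u_3 & u_2 + u_4 & -u_1 - u_3 & u_2 + 1
\end{array} \times \frac{1}{2}$$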

•
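The multiplication table can be verified numerically on the five levels. In the Python sketch below each entry of the table is stored as a dictionary {index: coefficient}, where index 0 stands for the constant u₀ = 1 and the common factor 1/2 is applied at the end; the names `TABLE`, `basis_values` and `table_error` are assumptions introduced for this check.

```python
from math import sin, cos, pi

# Entry [a][b] encodes 2 * u_a * u_b as {index: coefficient}; index 0 is u_0 = 1.
TABLE = [
    [{0: 2}, {1: 2}, {2: 2}, {3: 2}, {4: 2}],
    [{1: 2}, {2: -1, 0: 1}, {1: -1, 3: 1}, {2: 1, 4: -1}, {3: -1}],
    [{2: 2}, {1: -1, 3: 1}, {4: 1, 0: 1}, {1: 1}, {2: 1, 4: 1}],
    [{3: 2}, {2: 1, 4: -1}, {1: 1}, {4: -1, 0: 1}, {1: -1, 3: -1}],
    [{4: 2}, {3: -1}, {2: 1, 4: 1}, {1: -1, 3: -1}, {2: 1, 0: 1}],
]

def basis_values(t):
    """Values of u_0..u_4 at the level with angle 2*pi*t/5."""
    theta = 2 * pi * t / 5
    return [1.0, sin(theta), cos(2 * theta), sin(3 * theta), cos(4 * theta)]

def table_error():
    """Largest deviation between u_a * u_b and the tabulated combination."""
    worst = 0.0
    for t in range(5):
        u = basis_values(t)
        for a in range(5):
            for b in range(5):
                tabulated = 0.5 * sum(c * u[i] for i, c in TABLE[a][b].items())
                worst = max(worst, abs(u[a] * u[b] - tabulated))
    return worst
```

The identities hold only on the five design levels, since several entries use the reduction of the exponents mod 5.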

3.3 Recoding of regular fractions

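The regular fractions in complex coding are defined by equations of the form

$$X^\alpha = \text{constant}$$

The imaginary part of the product of k roots of the unity corresponds to the sine of the sum of k angles, which can be computed using the following formulas. If $c^{(k)}$, $s^{(k)}$ denote the cos and sin of the sum of k angles, and $c_i$, $s_i$ the cos and sin of the i-th angle, then

$$\begin{cases}
c^{(k)} = \displaystyle\sum_{\substack{h = k, k-2, \dots \\ h \geq 0}} (-1)^{\frac{k-h}{2}} \sum_{\substack{I \subseteq \{1, \dots, k\} \\ \#I = h}} \prod_{i \in I} c_i \prod_{i \notin I} s_i \\[4mm]
s^{(k)} = \displaystyle\sum_{\substack{h = k-1, k-3, \dots \\ h \geq 0}} (-1)^{\frac{k-h-1}{2}} \sum_{\substack{I \subseteq \{1, \dots, k\} \\ \#I = h}} \prod_{i \in I} c_i \prod_{i \notin I} s_i .
\end{cases} \tag{10}$$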

By the use of the polynomial formulas of c as a function of s and of Equations (10), every set of generating equations of a regular fraction translates into a set of generating equations in the real coding.
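The addition formulas for the cosine and the sine of a sum of k angles can be implemented directly as a sum over subsets. The Python sketch below follows that structure, with the set I holding the indices that contribute a cosine factor; the function name `cos_sin_of_sum` is an assumption of this illustration. The result is compared with the trigonometric functions of the summed angle.

```python
import math
from itertools import combinations

def cos_sin_of_sum(cs, sn):
    """Cosine and sine of a sum of angles from the cosines cs and sines sn."""
    k = len(cs)
    c_total = 0.0
    s_total = 0.0
    for h in range(k + 1):
        for subset in combinations(range(k), h):
            # Cosines on the indices in the subset, sines on the others.
            term = math.prod(cs[i] if i in subset else sn[i] for i in range(k))
            if (k - h) % 2 == 0:
                c_total += (-1) ** ((k - h) // 2) * term
            else:
                s_total += (-1) ** ((k - h - 1) // 2) * term
    return c_total, s_total

angles = [0.3, 1.1, 2.5]
c3, s3 = cos_sin_of_sum([math.cos(a) for a in angles], [math.sin(a) for a in angles])
```

For three angles the sine part reduces to s₁c₂c₃ + c₁s₂c₃ + c₁c₂s₃ − s₁s₂s₃, which is the form used for the recoding of a generating equation with three factors.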

Example. nj = 5 (cont.)

The following generating equation for a $5^{3-1}$ fraction, XYZ = 1, translates into:

$$\begin{cases}
c_i = 8s_i^4 - 8s_i^2 + 1 \quad i = 1, 2, 3 \\
0 = s_1c_2c_3 + c_1s_2c_3 + c_1c_2s_3 - s_1s_2s_3 .
\end{cases}$$

The defining equations for the full factorial design are:

$$16s_i^5 - 20s_i^3 + 5s_i = 0 \quad i = 1, 2, 3$$

A monomial basis of the fraction, obtained using CoCoA, see Appendix, is:

1, z, y, x, z², y², x², z³, y³, z⁴, y⁴,

yz, xz, xy, yz², xz², y²z, x²z, xy², yz³, xz³, y²z², yz⁴, xyz, xyz² .

The monomial terms in the list are linearly independent on the fraction. A possible choice of a symmetric hierarchical model based on this list is

1, z, y, x, z², y², x², yz, xz, xy, xyz .

We consider the same regular fraction, using the Fourier coding as in the system (9). The basis is square free. A monomial basis of the fraction, obtained using CoCoA is:

1, z4, z3, z2, z1, y4, y3, y2, y1, x4, x3, x2, x1,

y4z4, y3z4, y2z4, y1z4, x4z4, x3z4, x2z4, x1z4, y4z3, y3z3, y2z3, y1z3 .

It is remarkable that interactions of order 3 appear in the former case and not in the last one. •
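The translation of the generating equation XYZ = 1 into the real coding can be checked by enumeration. The Python sketch below evaluates the real generating equation on the 125 points of the 5³ design, coded by the sines, and selects the points where it vanishes up to a numerical tolerance; the tolerance 1e-9 and the names used are assumptions of this sketch. The selected points are compared with the points whose exponents add up to 0 mod 5, which is the condition XYZ = 1 in complex coding.

```python
import math
from itertools import product

def cosine_from_sine(s):
    """Real part c as a polynomial of the imaginary part s (five levels)."""
    return 8 * s**4 - 8 * s**2 + 1

def generating_value(s1, s2, s3):
    """Left-hand side of the generating equation in the real coding."""
    c1, c2, c3 = (cosine_from_sine(s) for s in (s1, s2, s3))
    return s1 * c2 * c3 + c1 * s2 * c3 + c1 * c2 * s3 - s1 * s2 * s3

LEVELS = [math.sin(2 * math.pi * k / 5) for k in range(5)]

# Exponent triples (k1, k2, k3) whose real-coded point satisfies the equation.
selected = [e for e in product(range(5), repeat=3)
            if abs(generating_value(*(LEVELS[k] for k in e))) < 1e-9]
```

The list `selected` contains 25 triples, in agreement with the 25 monomials of each basis listed above.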


4 Conclusions

In this paper we have discussed how fractions defined by polynomial equations in the complex coding translate to polynomial equations in the real coding.

Moreover, Gröbner basis software can produce a list of estimable monomial terms deduced from any type of defining equations, in particular indicator functions.

The computations of interest are exact but, unfortunately, not all the hierarchical bases are produced this way. We expect this to be relevant because inverse problems could be considered, as in Robbiano and Rogantin (1998) and Fontana et al. (2000).

5 Appendix: CoCoA script

Use R::=Q[x,y,z]; -- rational coefficients and 3 indeterminate ring
Dx:=16*x^5-20*x^3+5*x; -- defining equations of the full design
Dy:=16*y^5-20*y^3+5*y;
Dz:=16*z^5-20*z^3+5*z;
Cx:=8*x^4-8*x^2+1; -- equations for cosines
Cy:=8*y^4-8*y^2+1;
Cz:=8*z^4-8*z^2+1;
F:=x*Cy*Cz + Cx*y*Cz + Cx*Cy*z - x*y*z; -- addition formulas and generating equation of the fraction
I:=Ideal(F,Dx,Dy,Dz); -- ideal generated by equations
Est:=QuotientBasis(I); Sort(Est); Est; -- set of estimable terms

Use R::=Q[x[1..4],y[1..4],z[1..4]];
PowX:=[x[1]^2+1/2*x[2]-1/2, x[1]^3+1/4*x[3]-3/4*x[1],
       x[1]^4-1/8*x[4]+1/2*x[2]-3/8, x[1]^5-20/16*x[1]^3+5/16*x[1]];
PowY:=[y[1]^2+1/2*y[2]-1/2, y[1]^3+1/4*y[3]-3/4*y[1],
       y[1]^4-1/8*y[4]+1/2*y[2]-3/8, y[1]^5-20/16*y[1]^3+5/16*y[1]];
PowZ:=[z[1]^2+1/2*z[2]-1/2, z[1]^3+1/4*z[3]-3/4*z[1],
       z[1]^4-1/8*z[4]+1/2*z[2]-3/8, z[1]^5-20/16*z[1]^3+5/16*z[1]];
Pow:=Concat(PowX,PowY,PowZ);
F:=[x[1]*y[4]*z[4]+x[4]*y[1]*z[4]+x[4]*y[4]*z[1]-x[1]*y[1]*z[1]];
Frac:=Concat(Pow,F);
I:=Ideal(Frac);
Est:=QuotientBasis(I); Sort(Est); Est;


References

Bates, R. A., Riccomagno, E., Schwabe, R., Wynn, H. P., 1998. Lattices and dual lattices in optimal experimental design for Fourier models. Computational Statistics & Data Analysis 28, 283–296.

Bayley, R. A., 1983. The decomposition of treatment degrees of freedom in quantitative factorial experiments. J. R. Statist. Soc. B 44 (1), 63–70.

Beslin, S., de Angelis, V., 2004. The minimal polynomials of sin(2π/p) and cos(2π/p). Mathematics Magazine 77, 146–149.

Caboara, M., Riccomagno, E., 1998. An algebraic computational approach to the identifiability of Fourier models. Journal of Symbolic Computation 26, 245–260.

Cheng, S.-W., Li, W., Ye, K. Q., 2004. Blocked nonregular two-level factorial designs. Technometrics 46 (3), 269–279.

Cox, D. A., Little, J. B., O’Shea, D., 1997. Ideals, Varieties, and Algorithms, 2nd Edition. Springer-Verlag, New York, 1st ed. 1992.

Edmondson, R. N., 1994. Fractional factorial designs for factors with a prime number of quantitative levels. J. R. Statist. Soc., B 56 (4), 611–622.

Fontana, R., Pistone, G., Rogantin, M.-P., 1997. Algebraic analysis and generation of two-levels designs. Statistica Applicata 9 (1), 15–29.

Fontana, R., Pistone, G., Rogantin, M. P., 2000. Classification of two-level factorial fractions. J. Statist. Plann. Inference 87 (1), 149–172.

Galetto, F., Pistone, G., Rogantin, M. P., 2003. Confounding revisited with commutative computational algebra. J. Statist. Plann. Inference 117 (2), 345–363.

Giglio, B., Riccomagno, E., Wynn, H. P., 2000. Gröbner basis strategies in regression. Journal of Applied Statistics 27 (7), 923–938.

Kobilinsky, A., 1990. Complex linear model and cyclic designs. Linear Algebra and its Applications 127, 227–282.

Kobilinsky, A., Monod, H., 1991. Experimental design generated by group morphism: An introduction. Scand. J. Statist. 18, 119–134.

Kreuzer, M., Robbiano, L., 2000. Computational Commutative Algebra 1. Springer, Berlin-Heidelberg.

Pistone, G., Riccomagno, E., Wynn, H. P., 2001. Algebraic Statistics: Computational Commutative Algebra in Statistics. Chapman&Hall, Boca Raton.

Pistone, G., Rogantin, M., 2005. Indicator function and complex coding for mixed fractional factorial designs. Tech. rep., Dipartimento di Matematica, Politecnico di Torino, submitted to JSPI.

Pistone, G., Wynn, H. P., Mar. 1996. Generalised confounding with Gröbner bases. Biometrika 83 (3), 653–666.

Riccomagno, E., Schwabe, R., Wynn, H. P., 1997. Lattice–based optimum design for Fourier regression. The Annals of Statistics 25 (6), 2313–2327.

Robbiano, L., Rogantin, M.-P., 1998. Full factorial designs and distracted fractions. In: Buchberger, B., Winkler, F. (Eds.), Gröbner Bases and Applications (Proc. of the Conf. 33 Years of Gröbner Bases). Vol. 251 of London Mathematical Society Lecture Notes Series. Cambridge University Press, pp. 473–482.

Tang, B., Deng, L. Y., 1999. Minimum G2-aberration for nonregular fractional factorial designs. The Annals of Statistics 27 (6), 1914–1926.

van der Waerden, B. L., 1970. Algebra. Vol 1. Translated by Fred Blum and John R. Schulenberger. Frederick Ungar Publishing Co., New York.

Ye, K. Q., 2003. Indicator function and its application in two-level factorial designs. The Annals of Statistics 31 (3), 984–994.