
Functional analysis for parametric families of functional data


Academic year: 2021


© World Scientific Publishing Company

DOI:10.1142/S0218127412502264


ANGELA DE SANCTIS and TONIO DI BATTISTA

DMQTE Department, “G. d’Annunzio” University of Chieti-Pescara, Pescara, Italy

a.desanctis@unich.it, dibattis@unich.it

Received May 14, 2011

Assuming a parametric family of functional data, we investigate the problem of computing summary statistics that have the same functional form as the data. The central idea is to compute the statistics on the parameters instead of on the functions themselves. Under the hypothesis that the functions depend monotonically on the parameters, we highlight the special features of these statistics.

Keywords: Functional data analysis; Lp-norm; convergence pointwise almost everywhere; monotonic dependence.

1. Introduction

Recently, Functional Data Analysis (FDA) has become an active research topic for statisticians; see e.g. [Ferraty & Vieu, 2006; Ramsay & Silverman, 2007] and references therein. In many different fields, such as medicine, physics and economics, the statistical units are described not by vectors but by curves or functions. Examples include electrocardiograms in cardiology and galvanic skin responses (GSR signals) in psychophysiology. We wish to deal with cases where the data are functions of a model, or where they are reduced from time series to functions known in closed form by some technique, such as basis-function expansions or orthogonal fitting of curves [Di Battista et al., 2007; Sung, 2005]. In particular, we assume a parametric family of functional data. As an example, we mention the family of Cobb–Douglas production functions, which are frequently used in economics to study the relationship between input factors and the level of production. These functions take the form $y = f(K, L) = L^\alpha K^\beta$, where $L$ is one factor of production (often labor) and $K$ is a second factor of production (often capital). The parameters $\alpha$ and $\beta$ are positive with $\alpha + \beta = 1$. In biology, growth functions are used to describe growth processes [Vieira & Hoffmann, 1977]; examples are the logistic growth functions $y = a/[1 + \exp\{-(b + ct)\}]$, where $a$, $b$ and $c$ are parameters with $a > 0$ and $c > 0$, and the Gompertz growth functions $y = \exp(a - bc^t)$, where $a$, $b$ and $c$ are parameters with $b > 0$ and $0 < c < 1$.

The aims of FDA are fundamentally the same as those of any area of statistics, i.e. to investigate essential aspects such as the mean and the variability of the data. We note that the mean is a representative element of the data, so it should belong to the same set as the data; the variability, on the contrary, measures an error, so it need not belong to that set.
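To make the idea of a parametric family concrete, the three families above can be written as functions that map a parameter vector to a function $f_\theta$. This is a minimal sketch: the function names and the parameter values in the usage lines are illustrative choices, not values taken from the paper.

```python
import math

def cobb_douglas(alpha, beta):
    """Return (K, L) -> L**alpha * K**beta; requires alpha, beta > 0 and alpha + beta = 1."""
    if not (alpha > 0 and beta > 0 and math.isclose(alpha + beta, 1.0)):
        raise ValueError("need alpha > 0, beta > 0 and alpha + beta = 1")
    return lambda K, L: L ** alpha * K ** beta

def logistic(a, b, c):
    """Return t -> a / (1 + exp(-(b + c*t))); requires a > 0 and c > 0."""
    if not (a > 0 and c > 0):
        raise ValueError("need a > 0 and c > 0")
    return lambda t: a / (1 + math.exp(-(b + c * t)))

def gompertz(a, b, c):
    """Return t -> exp(a - b * c**t); requires b > 0 and 0 < c < 1."""
    if not (b > 0 and 0 < c < 1):
        raise ValueError("need b > 0 and 0 < c < 1")
    return lambda t: math.exp(a - b * c ** t)

# Each call selects one member f_theta of its family (illustrative parameters).
f_cd = cobb_douglas(0.3, 0.7)
f_log = logistic(10.0, -2.0, 0.5)
f_gom = gompertz(1.0, 2.0, 0.5)

# Since alpha + beta = 1, doubling both inputs doubles the output.
print(f_cd(2.0, 4.0), 2 * f_cd(1.0, 2.0))
# For large t the growth curves approach their upper asymptotes a and exp(a).
print(f_log(50.0), f_gom(50.0), math.exp(1.0))
```

Validating the parameter constraints at construction time mirrors the requirement that every $f_\theta$ be determined by an admissible $\theta$.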

Functional analysis is the mathematical field that studies spaces of functions, so we can consider these data to lie in some $L^p$ space, $p > 0$, with the usual norm [Rudin, 1986]. In general, the functional data constitute a subset that is not a vector subspace of $L^p$. For example, let $y_1 = A_1 L^{\alpha_1}$ and $y_2 = A_2 L^{\alpha_2}$ be two Cobb–Douglas production functions in which, for simplicity, the production factors $A_1$ and $A_2$ are assumed constant. By analogy with the vector case, if we define the mean as

$$y = \frac{A_1 L^{\alpha_1} + A_2 L^{\alpha_2}}{2},$$

we do not obtain a Cobb–Douglas function. In general, this approach may yield a function that does not have the same functional form as the data, which can lead to erroneous interpretations of the final functional statistic.
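A quick numerical check of this claim, as a sketch: a function of the form $A L^\alpha$ is a straight line in log-log coordinates, so its log-log slope is the same on every interval. The constants below ($A_1 = 1$, $\alpha_1 = 0.2$, $A_2 = 2$, $\alpha_2 = 0.8$) are arbitrary illustrative choices, not values from the paper.

```python
import math

# Illustrative constants (assumptions, not taken from the paper).
A1, ALPHA1 = 1.0, 0.2
A2, ALPHA2 = 2.0, 0.8

def cobb_douglas(a, alpha):
    """Return L -> a * L**alpha, with the other production factor folded into a."""
    return lambda labor: a * labor ** alpha

def loglog_slope(f, l1, l2):
    """Slope of log f(L) against log L between L = l1 and L = l2."""
    return (math.log(f(l2)) - math.log(f(l1))) / (math.log(l2) - math.log(l1))

y1 = cobb_douglas(A1, ALPHA1)
y2 = cobb_douglas(A2, ALPHA2)
naive_mean = lambda labor: (y1(labor) + y2(labor)) / 2  # pointwise average

# A genuine Cobb-Douglas curve has the same slope (its exponent) on every interval.
s_low = loglog_slope(naive_mean, 0.5, 1.0)
s_high = loglog_slope(naive_mean, 4.0, 8.0)
print(f"slope on [0.5, 1]: {s_low:.3f}, slope on [4, 8]: {s_high:.3f}")
```

The two slopes differ, so the pointwise average cannot be written as $A L^\alpha$ for any constants $A$ and $\alpha$.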

In this communication, we propose a new approach focused on the functional form that generates the data. For a parametric family of functional data, we use the parameter space to transport the mean of the parameters to the functional space. We then define a variability function pointwise. Assuming that the functions depend monotonically on the parameters, we prove properties of the functional mean and variability that are similar to those of the vector case. Finally, for illustrative purposes, we present two simulation studies.

2. The Functional Mean

Let $S$ be a family of functions with $n$ real parameters, that is, $S = \{f_\theta\}$ with $\theta = (\theta_1, \theta_2, \ldots, \theta_n) \in \Theta$. We suppose that there is a one-to-one correspondence between the family $S$ and a convex parameter space $\Theta$, that is, every functional datum $f_\theta$ of $S$ is uniquely determined by its parameter vector $\theta$. In an economic setting, $S$ could be the family of Cobb–Douglas production functions, i.e. $f_{\alpha,\beta}(K, L) = K^\alpha L^\beta$ with $\alpha > 0$, $\beta > 0$ and $\alpha + \beta = 1$. Starting from $m$ functional data belonging to $S$, $f_{\theta_1}, f_{\theta_2}, \ldots, f_{\theta_m}$, the objective is to find an element of $S$, which we call the functional mean and denote by $f_{\hat\theta} = H(f_{\theta_1}, f_{\theta_2}, \ldots, f_{\theta_m})$.

In the following, we assume that the functional data constitute a subset $S$ of some $L^p$ space, $p > 0$, with the usual norm

$$\|f\|_p = \left\{ \int_X |f|^p \, d\mu \right\}^{1/p}$$

for every $f \in L^p$, where $(X, \mu)$ is a measure space [Rudin, 1986]. When $S$ is a vector subspace, we can express the functional mean as the sample mean $f_{\hat\theta} = (f_{\theta_1} + f_{\theta_2} + \cdots + f_{\theta_m})/m$. Because $S$ is closed under linear combinations, we have $f_{\hat\theta} \in S$. It is easy to see that, if $S$ is not a vector subspace, this function does not necessarily belong to $S$. As an example of a vector subspace, let $S$ be the family of functions of the form $f_\alpha(x) = \alpha g(x)$, with $\alpha$ a real parameter and $g \in L^p$ a fixed function; then

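$$f_{\hat\alpha}(x) = \frac{\sum_{i=1}^{m} f_{\alpha_i}(x)}{m} = \frac{\sum_{i=1}^{m} \alpha_i g(x)}{m} = \left( \frac{\sum_{i=1}^{m} \alpha_i}{m} \right) g(x). \tag{1}$$

This proves that $f_{\hat\alpha}$ is an element of $S$ and that its parameter is the mean of the parameters $\alpha_1, \alpha_2, \ldots, \alpha_m$. Inspired by this example, in the general case we adopt the following definition.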

Definition 2.1. Let $f_{\theta_1}, f_{\theta_2}, \ldots, f_{\theta_m}$ be functional data. Their functional mean is the element $f_{\hat\theta}$ of $S$ whose associated parameter is the mean $\hat\theta = K(\theta_1, \theta_2, \ldots, \theta_m)$ of the parameters $\theta_1, \theta_2, \ldots, \theta_m$.

In other words, the functional mean is obtained from the mean of the parameters according to the scheme

$$\theta_i \leftarrow f_{\theta_i}, \quad i = 1, 2, \ldots, m; \qquad \hat\theta = K(\theta_1, \ldots, \theta_m) \rightarrow f_{\hat\theta}. \tag{2}$$
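Scheme (2) can be sketched in a few lines, assuming the operator $K$ is the componentwise arithmetic mean (the text leaves $K$ general). The helper names `parameter_mean` and `functional_mean` are hypothetical, and the family $f_\alpha(x) = \alpha x^2$ is the vector-subspace example of Eq. (1) with the illustrative choice $g(x) = x^2$.

```python
from statistics import fmean
from typing import Callable, Sequence

# A parametric family maps a parameter vector theta to a function f_theta.
Family = Callable[[Sequence[float]], Callable[[float], float]]

def parameter_mean(thetas: Sequence[Sequence[float]]) -> list[float]:
    """Componentwise arithmetic mean; one possible choice of the operator K."""
    return [fmean(component) for component in zip(*thetas)]

def functional_mean(family: Family, thetas: Sequence[Sequence[float]]):
    """Scheme (2): average the parameters, then map back through the family."""
    theta_hat = parameter_mean(thetas)
    return family(theta_hat), theta_hat

def scaled_family(theta):
    """Hypothetical vector-subspace family f_alpha(x) = alpha * g(x) with g(x) = x**2."""
    (alpha,) = theta
    return lambda x: alpha * x ** 2

thetas = [(1.0,), (2.0,), (6.0,)]
f_hat, theta_hat = functional_mean(scaled_family, thetas)
# For this family, scheme (2) coincides with the pointwise sample mean, as in Eq. (1).
pointwise = fmean(scaled_family(t)(1.5) for t in thetas)
print(theta_hat, f_hat(1.5), pointwise)
```

For a family that is not a vector subspace the two numbers printed last would generally differ, and only `f_hat` is guaranteed to stay in $S$.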

2.1. Properties of the functional mean

We may require the functional mean to have the same properties as the mean of the parameters, such as the internality property. To do so, we need an additional hypothesis: we suppose that the functions depend monotonically on each parameter.

2.1.1. Univariate case

If there is only one parameter $\alpha$, we assume either a monotonically increasing dependence on the parameter,

$$\alpha_1 \le \alpha_2 \Rightarrow f_{\alpha_1}(x) \le f_{\alpha_2}(x), \quad \forall x \in X,$$

or a monotonically decreasing dependence on the parameter,

$$\alpha_1 \le \alpha_2 \Rightarrow f_{\alpha_1}(x) \ge f_{\alpha_2}(x), \quad \forall x \in X.$$

It is easy to verify that the family $S$ of functions $f_\alpha(x) = x^\alpha$, with $0 < \alpha < 1$ and $x \in [0, 1]$, has a monotonically decreasing dependence on the parameter.
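A grid check of this claim, as a sketch rather than a proof: for consecutive values $\alpha_1 < \alpha_2$ on a grid in $(0, 1)$, it tests $x^{\alpha_1} \ge x^{\alpha_2}$ at grid points of $[0, 1]$ (consecutive pairs suffice by transitivity). The grids and the helper name `is_decreasing_in_alpha` are illustrative assumptions.

```python
# Grid check (not a proof) that f_alpha(x) = x**alpha, 0 < alpha < 1,
# depends on alpha in a monotonically decreasing way on [0, 1].
alphas = [k / 20 for k in range(1, 20)]     # 0.05, 0.10, ..., 0.95
xs = [k / 100 for k in range(101)]          # grid on [0, 1]

def is_decreasing_in_alpha(alphas, xs):
    """True if x**a1 >= x**a2 for every consecutive pair a1 < a2 and every x in xs."""
    for a1, a2 in zip(alphas, alphas[1:]):
        if any(x ** a1 < x ** a2 for x in xs):
            return False
    return True

print(is_decreasing_in_alpha(alphas, xs))
```

The restriction $x \in [0, 1]$ matters: for $x > 1$ the same family depends increasingly on $\alpha$, so the check fails there.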

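Under monotonic dependence on the parameter, let $\hat\alpha$ be the mean of the parameters $\alpha_1 \le \alpha_2$. Since $\alpha_1 \le \hat\alpha \le \alpha_2$, we obtain the internality property, that is,

$$f_{\alpha_1}(x) \le f_{\hat\alpha}(x) \le f_{\alpha_2}(x), \quad \forall x \in X,$$

in the increasing case, or

$$f_{\alpha_2}(x) \le f_{\hat\alpha}(x) \le f_{\alpha_1}(x), \quad \forall x \in X,$$

in the decreasing case.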
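The internality property can be spot-checked for the decreasing family $f_\alpha(x) = x^\alpha$ on $[0, 1]$. This sketch draws random pairs $\alpha_1, \alpha_2$, takes their arithmetic mean as $\hat\alpha$, and tests $f_{\alpha_2} \le f_{\hat\alpha} \le f_{\alpha_1}$ on a grid; the seed, grid and number of trials are arbitrary choices.

```python
import random

# Spot check of internality f_{a2}(x) <= f_{a_hat}(x) <= f_{a1}(x) for the
# decreasing family f_a(x) = x**a on [0, 1], with a1 <= a2 drawn at random.
random.seed(0)
xs = [k / 100 for k in range(101)]

def internality_holds(a1, a2, xs):
    a1, a2 = min(a1, a2), max(a1, a2)       # enforce a1 <= a2
    a_hat = (a1 + a2) / 2                   # mean of the parameters
    return all(x ** a2 <= x ** a_hat <= x ** a1 for x in xs)

trials = [internality_holds(random.random(), random.random(), xs) for _ in range(1000)]
print(all(trials))
```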

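This property does not hold without the hypothesis of monotonic dependence on the parameter. Consider, for example, the family of functions $f_\alpha(x) = h(\alpha)x$, with $\alpha$ a real parameter, where $h(\alpha) = \alpha$ if $\alpha$ is rational and $h(\alpha) = -\alpha$ if $\alpha$ is irrational. For $\alpha_1 = -1$ and $\alpha_2 = 1 + 2\sqrt{2}$, whose mean is $\sqrt{2}$, we do not have $f_{-1}(x) \le f_{\sqrt{2}}(x) \le f_{1+2\sqrt{2}}(x)$ for $x > 0$, because $-1 \le -\sqrt{2} \le -(1 + 2\sqrt{2})$ is false. Similarly, for $\alpha_1 = 2 - 2\sqrt{2}$ and $\alpha_2 = 2\sqrt{2}$, whose mean is $1$, we do not have $f_{2-2\sqrt{2}}(x) \ge f_1(x) \ge f_{2\sqrt{2}}(x)$ for $x > 0$, because $-(2 - 2\sqrt{2}) \ge 1 \ge -2\sqrt{2}$ is false.

2.1.2. Multivariate case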

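In general, if the parameters are vectors $\theta = (\theta_1, \ldots, \theta_n)$ and $\eta = (\eta_1, \ldots, \eta_n)$, we define $\theta \le \eta$ if $\theta_i \le \eta_i$ for all $i = 1, \ldots, n$. As usual, we write $f \le g$ if $f(x) \le g(x)$ for all $x \in X$. We can then adopt the following definitions of monotonically increasing dependence on the parameters:

$$\theta \le \eta \Rightarrow f_\theta \le f_\eta,$$

and of monotonically decreasing dependence on the parameters:

$$\theta \le \eta \Rightarrow f_\theta \ge f_\eta.$$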

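Extending the univariate case, we obtain the internality property

$$\theta_1 \le \cdots \le \theta_m \Rightarrow f_{\theta_1} \le f_{\hat\theta} \le f_{\theta_m}$$

in the increasing case, or

$$\theta_1 \le \cdots \le \theta_m \Rightarrow f_{\theta_m} \le f_{\hat\theta} \le f_{\theta_1}$$

in the decreasing case, where $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_n)$ is the vector of the componentwise parameter means.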
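The componentwise order and the multivariate internality property can be illustrated with a hypothetical two-parameter family $f_\theta(x) = \theta_1 + \theta_2 x$ on $X = [0, 1]$, which depends increasingly on both parameters because $x \ge 0$. The family, the chain of parameter vectors and the grid are assumptions made for this sketch only.

```python
from statistics import fmean

def leq(theta, eta):
    """Componentwise partial order: theta <= eta iff theta_i <= eta_i for every i."""
    return all(t <= e for t, e in zip(theta, eta))

def f(theta, x):
    """Hypothetical two-parameter family f_theta(x) = theta_1 + theta_2 * x."""
    return theta[0] + theta[1] * x

# A chain theta_1 <= theta_2 <= theta_3 in the componentwise order.
thetas = [(0.0, 1.0), (0.5, 1.0), (1.0, 3.0)]
assert all(leq(a, b) for a, b in zip(thetas, thetas[1:]))

theta_hat = tuple(fmean(c) for c in zip(*thetas))   # vector of parameter means
xs = [k / 10 for k in range(11)]                    # grid on X = [0, 1]
inside = all(f(thetas[0], x) <= f(theta_hat, x) <= f(thetas[-1], x) for x in xs)
print(theta_hat, inside)
```

Because $\theta_1 \le \hat\theta \le \theta_m$ holds componentwise for any chain, internality follows from the monotonic dependence alone.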

3. Functional Variability

For simplicity of notation, we now consider the univariate case.

To define the functional variability, we first introduce, for every $x \in X$, the quantity

$$v_i^r(x) = |f_{\theta_i}(x) - f_{\hat\theta}(x)|^r,$$

which represents the $r$th-order absolute deviation between the observed functional datum $f_{\theta_i}$ and the functional statistic $f_{\hat\theta}$. The functional variability can then be defined pointwise by the $r$th-order functional moment

$$V_r(x) = \frac{1}{m} \sum_{i=1}^{m} v_i^r(x). \tag{3}$$
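A direct transcription of Eq. (3), as a minimal sketch: the functional mean is built with scheme (2), taking the arithmetic mean of the parameters, and $V_r$ is returned as a function of $x$. The power family and the parameter values are illustrative choices.

```python
from statistics import fmean

def power_family(alpha):
    """f_alpha(x) = x**alpha, the family used in Sec. 2.1.1."""
    return lambda x: x ** alpha

def functional_variability(family, alphas, r=2):
    """Return x -> V_r(x) of Eq. (3) for the data f_{alpha_1}, ..., f_{alpha_m}."""
    f_hat = family(fmean(alphas))            # functional mean via scheme (2)
    fs = [family(a) for a in alphas]
    # v_i^r(x) = |f_{alpha_i}(x) - f_hat(x)|**r, averaged over i = 1, ..., m
    return lambda x: fmean(abs(f(x) - f_hat(x)) ** r for f in fs)

V2 = functional_variability(power_family, [0.2, 0.5, 0.8], r=2)
print([round(V2(x), 5) for x in (0.0, 0.25, 0.5, 0.75, 1.0)])
```

For this family all curves meet at $x = 0$ and $x = 1$, so $V_2$ vanishes at both endpoints and is positive in between.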

3.1. Properties of the functional variability

The function $V_r$ obviously has the following property: if $f_{\theta_i} = f_{\hat\theta}$ almost everywhere in $X$ for every $i = 1, 2, \ldots, m$, then $V_r = 0$ almost everywhere in $X$.

We recall that the distance, induced by the norm, between two functions $f$ and $g$ in $L^p$ is defined as

$$d(f, g) = \|f - g\|_p = \left\{ \int_X |f - g|^p \, d\mu \right\}^{1/p}.$$
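When $X = [a, b]$ carries Lebesgue measure, $d(f, g)$ can be approximated numerically. This sketch uses the composite trapezoidal rule; the choice of measure, interval and step count are assumptions, and the test case $f(x) = \sqrt{x}$, $g(x) = x$ has the exact value $\int_0^1 (\sqrt{x} - x)^2 \, dx = 1/2 - 4/5 + 1/3 = 1/30$ for $p = 2$.

```python
def lp_distance(f, g, p=2.0, a=0.0, b=1.0, n=10_000):
    """Approximate d(f, g) = (integral_a^b |f - g|^p dx)^(1/p) with the
    composite trapezoidal rule; Lebesgue measure on [a, b] is assumed."""
    h = (b - a) / n
    vals = [abs(f(a + k * h) - g(a + k * h)) ** p for k in range(n + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return integral ** (1.0 / p)

# f(x) = sqrt(x) and g(x) = x on [0, 1] with p = 2; the exact distance is sqrt(1/30).
d = lp_distance(lambda x: x ** 0.5, lambda x: x, p=2)
print(d, (1 / 30) ** 0.5)
```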

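For every $i = 1, \ldots, m$, let $(\theta_i^n)_{n \in \mathbb{N}}$ be a sequence of parameters with $\theta_1^n = \theta_1$ and

$$\theta_1 = \theta_1^n \le \theta_2^n \le \cdots \le \theta_m^n,$$

and denote by $V_r^n$ the functional variability of $f_{\theta_1^n}, \ldots, f_{\theta_m^n}$. We have the following direct property.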

Theorem 1. If $\lim_{n \to \infty} d(f_{\theta_1}, f_{\theta_m^n}) = 0$, then $\lim_{k \to \infty} V_r^{n_k} = 0$ almost everywhere in $X$ for a subsequence of data $f_{\theta_i^{n_k}}$.

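Proof. This follows from the well-known result of functional analysis stating that, if a sequence of functions converges to a function in $L^p$, then some subsequence converges pointwise almost everywhere to the same function. Applying it to $f_{\theta_m^n} \to f_{\theta_1}$, the monotonic dependence on the parameters keeps every $f_{\theta_i^{n_k}}$ and the functional mean between $f_{\theta_1}$ and $f_{\theta_m^{n_k}}$, so $V_r^{n_k} \to 0$ almost everywhere. □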

If we also assume that the functions are non-negative, we obtain the converse property:

Theorem 2. If $f_{\theta_1} \ge 0$ and $\lim_{n \to \infty} V_r^n = 0$ almost everywhere in $X$, then $\lim_{n \to \infty} d(f_{\theta_1}, f_{\theta_m^n}) = 0$.

Proof. We have $f_{\theta_i^n} \to f_{\hat\theta}$ as $n \to \infty$, almost everywhere in $X$, for every $i = 1, \ldots, m$, so the thesis follows from Lebesgue's Monotone Convergence Theorem and Fatou's Lemma. We note that the hypothesis of monotonic dependence on the parameters is needed in order to apply Lebesgue's theorem. □
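Theorems 1 and 2 can be illustrated numerically (not proved) with the non-negative, decreasing family $f_\alpha(x) = x^\alpha$ on $[0, 1]$. The sketch builds a hypothetical chain $\theta_1^n = 0.3 \le \theta_2^n \le \cdots \le \theta_5^n$ whose spread shrinks like $1/n$ and reports both $d(f_{\theta_1}, f_{\theta_m^n})$ and the largest grid value of $V_2^n$; the chain, the grid and the Lebesgue measure are assumptions.

```python
from statistics import fmean

XS = [k / 1000 for k in range(1001)]  # grid on X = [0, 1]; Lebesgue measure assumed

def f(alpha, x):
    """Power family f_alpha(x) = x**alpha: non-negative and decreasing in alpha on [0, 1]."""
    return x ** alpha

def lp_dist(a, b, p=2.0):
    """Trapezoidal approximation of d(f_a, f_b) = ||f_a - f_b||_p on [0, 1]."""
    vals = [abs(f(a, x) - f(b, x)) ** p for x in XS]
    h = XS[1] - XS[0]
    return (h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))) ** (1.0 / p)

def max_variability(alphas, r=2):
    """Largest grid value of V_r(x), Eq. (3), for the data f_{alpha_1}, ..., f_{alpha_m}."""
    a_hat = fmean(alphas)
    return max(fmean(abs(f(a, x) - f(a_hat, x)) ** r for a in alphas) for x in XS)

alpha1, m = 0.3, 5
results = []
for n in (1, 10, 100):
    # Hypothetical chain theta_1^n = alpha1 <= ... <= theta_m^n shrinking to alpha1.
    alphas = [alpha1 + i * 0.15 / n for i in range(m)]
    results.append((n, lp_dist(alpha1, alphas[-1]), max_variability(alphas)))

for n, d, v in results:
    print(f"n={n:4d}  d(f_theta1, f_thetam^n)={d:.2e}  max_x V_2^n(x)={v:.2e}")
```

Both columns shrink together as the chain collapses, which is the behaviour the two theorems relate.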


In conclusion, the definitions of functional mean and variability introduced in this paper are compatible with the standard norm and distance of function spaces, which makes the resulting theory consistent from a mathematical point of view. From a statistical point of view, we analyze the descriptive aspects of the proposed estimation method. Connections with, and results related to, statistical inference will be studied in future work.

4. Simulation Studies

We present two simulation studies to evaluate the proposed estimation method for the functional statistics.

4.1. Power functions

We suppose that the observations are contaminated with error, so that the resulting family $S$ consists of the functions $f_\alpha(x) = x^\alpha + \varepsilon$ with $0 < \alpha < 1$ and $0 \le x \le 1$. We simulate different populations by assigning to $\alpha$ different distributions, such as the Normal, the Uniform and the Exponential with different parameters, and to $\varepsilon$ a white noise with standard deviation 0.01. For illustrative purposes, Fig. 1 shows three populations, with $\alpha \sim N(\mu = 0.5, \sigma = 0.1)$, $\alpha \sim U(0, 1)$ and $\alpha \sim \mathrm{Exp}(0.05)$. Values of $\alpha$ outside the interval $(0, 1)$ were discarded.
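The population model can be sketched with Python's standard `random` module. The text writes $\mathrm{Exp}(0.05)$ without saying whether 0.05 is the rate or the mean; this sketch reads it as a mean of 0.05 (rate 20), which is an assumption. The grid, population size and seed are also illustrative choices.

```python
import random

def sample_alpha(dist, rng):
    """Draw alpha from one of the three populations, discarding values outside (0, 1)."""
    while True:
        if dist == "normal":
            a = rng.gauss(0.5, 0.1)
        elif dist == "uniform":
            a = rng.uniform(0.0, 1.0)
        elif dist == "exponential":
            a = rng.expovariate(1 / 0.05)   # assumption: Exp(0.05) has mean 0.05
        else:
            raise ValueError(dist)
        if 0.0 < a < 1.0:
            return a

def noisy_curve(alpha, xs, rng, sd=0.01):
    """Observe f_alpha(x) = x**alpha + eps on the grid xs, with eps ~ N(0, sd^2)."""
    return [x ** alpha + rng.gauss(0.0, sd) for x in xs]

rng = random.Random(1)
xs = [k / 50 for k in range(51)]
population = {d: [sample_alpha(d, rng) for _ in range(100)]
              for d in ("normal", "uniform", "exponential")}
curves = {d: [noisy_curve(a, xs, rng) for a in alphas] for d, alphas in population.items()}
print({d: round(sum(a) / len(a), 3) for d, a in population.items()})
```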

To evaluate the estimation method proposed in Sec. 2, we draw $J = 5000$ samples from each population for various sample sizes $m$. Since the functions are observed with error, we first apply the orthogonal distance fitting (ODF) method, following Di Battista et al. [2007] and Sung [2005], to estimate the parameter $\alpha$ of each function. Once the estimates $\alpha_1, \alpha_2, \ldots, \alpha_m$ are available for a sample, the scheme in (2) can be applied to obtain the functional mean of that sample. Figure 2 shows the results for a sample size of $m = 10$: for each population, the functional mean is plotted together with its estimated standard error.
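The estimation loop can be sketched for the Normal population. Note one deliberate substitution: instead of the ODF fit used in the paper, each $\alpha$ is estimated by ordinary least squares with a golden-section search, which assumes the residual sum of squares is unimodal in $\alpha$. The number of samples is reduced from $J = 5000$ to $J = 100$ to keep the sketch fast; the seed and grid are arbitrary.

```python
import math
import random
from statistics import fmean, stdev

XS = [k / 50 for k in range(51)]          # observation grid on [0, 1]
INV_PHI = (math.sqrt(5) - 1) / 2          # golden-section ratio, about 0.618

def sse(alpha, ys):
    """Sum of squared vertical residuals of the model y = x**alpha."""
    return sum((y - x ** alpha) ** 2 for x, y in zip(XS, ys))

def fit_alpha(ys, lo=1e-6, hi=1.0, tol=1e-5):
    """Golden-section minimisation of sse over [lo, hi].
    Stand-in for the paper's ODF fit: ordinary least squares, assuming a unimodal SSE."""
    while hi - lo > tol:
        c = hi - INV_PHI * (hi - lo)
        d = lo + INV_PHI * (hi - lo)
        if sse(c, ys) < sse(d, ys):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2

def draw_alpha(rng):
    """alpha ~ N(0.5, 0.1), discarding values outside (0, 1) as in the text."""
    while True:
        a = rng.gauss(0.5, 0.1)
        if 0.0 < a < 1.0:
            return a

rng = random.Random(2)
J, m, sd = 100, 10, 0.01   # J = 100 instead of 5000 to keep the sketch fast
theta_hats = []
for _ in range(J):
    estimates = []
    for _ in range(m):
        a = draw_alpha(rng)
        ys = [x ** a + rng.gauss(0.0, sd) for x in XS]   # noisy observed curve
        estimates.append(fit_alpha(ys))
    theta_hats.append(fmean(estimates))                  # scheme (2): mean of the parameters

# Functional mean x -> x**theta_hat of each sample, summarised over the J samples.
curves = [[x ** t for x in XS] for t in theta_hats]
mean_curve = [fmean(col) for col in zip(*curves)]
se_curve = [stdev(col) for col in zip(*curves)]
print(f"mean of theta_hat: {fmean(theta_hats):.3f}, largest pointwise SE: {max(se_curve):.4f}")
```

Every summarised curve is itself of the form $x^{\hat\alpha}$, so each sample's functional mean stays inside the family.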

[Fig. 1. Functional populations $y = x^\alpha + \varepsilon$, $x \in [0, 1]$, for $\alpha \sim N(0.5, 0.1)$, $\alpha \sim U(0, 1)$ and $\alpha \sim \mathrm{Exp}(0.05)$.]

[Fig. 2. J = 5000. Functional mean statistics for a sample size m = 10. For each population ($\alpha \sim N(0.5, 0.1)$, $\alpha \sim U(0, 1)$, $\alpha \sim \mathrm{Exp}(0.05)$), one panel shows the functional mean $y$ against $x$ and one the standard error against $x$.]

4.2. Functional diversity profiles

We present an ecological application of the proposed estimation method. Suppose we have a biological population made up of $n$ species in which we can observe the relative abundance vector $\theta = (\theta_1, \theta_2, \ldots, \theta_n)$, where $\theta_i$ is the relative abundance of the $i$th species. One of the most important aspects of environmental studies is the evaluation of ecological diversity. The most frequently used diversity indices can be expressed as a function $f_\theta$ of the relative abundance vector. Patil and Taillie [1982] proposed measuring diversity by means of the $\beta$-diversity profiles, defined as

$$\Delta = f_\theta(\beta) = \frac{1 - \sum_{j=1}^{n} \theta_j^{\beta+1}}{\beta}. \tag{4}$$
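Eq. (4) can be evaluated directly. This sketch skips species with $\theta_j = 0$ (an implementation choice), so that $\beta = -1$ gives the number of species present minus one. It also checks two standard special cases of these profiles: $\beta = 1$ gives the Gini–Simpson index $1 - \sum_j \theta_j^2$, and as $\beta \to 0$ the profile approaches Shannon's index $-\sum_j \theta_j \ln \theta_j$. The abundance vector is the one used in the simulations below.

```python
import math

def diversity_profile(theta, beta):
    """Beta-diversity of Eq. (4): (1 - sum_j theta_j**(beta + 1)) / beta for beta != 0.
    Species with theta_j = 0 are skipped, so beta = -1 gives (species present) - 1."""
    if beta == 0:
        raise ValueError("beta = 0 is excluded from the profile")
    return (1 - sum(t ** (beta + 1) for t in theta if t > 0)) / beta

theta = [0.55, 0.19, 0.13, 0.07, 0.06]              # abundance vector used in Sec. 4.2
betas = [k / 10 for k in range(-10, 11) if k != 0]  # grid on [-1, 1] without 0
profile = [diversity_profile(theta, b) for b in betas]

shannon = -sum(t * math.log(t) for t in theta)
print(diversity_profile(theta, -1.0))           # species richness minus one
print(diversity_profile(theta, 1.0))            # Gini-Simpson index 1 - sum theta_j^2
print(diversity_profile(theta, 1e-6), shannon)  # near beta = 0 the profile approaches Shannon
```

On this grid the profile is non-negative and non-increasing in $\beta$.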

$\beta$-diversity profiles are non-negative and convex curves. In order to apply functional linear models to diversity profiles, Gattone and Di Battista [2009] applied a transformation that can be constrained to be non-negative and convex. In the FDA context, it is convenient to consider the $\beta$-diversity profile as a parametric function that can be computed for any desired argument value $\beta \in [-1, 1] \setminus \{0\}$. The parameter space is multivariate and given by $\theta$.

We simulate different biological populations by assigning to each component of $\theta$ different distributions, such as the Uniform, the Poisson and the Multinomial distribution. From each population, we draw $J = 5000$ samples of different sizes. In this case, the function $\Delta$ in (4) is observed without error. For each sample of size $m$, we can compute the estimate $\hat\theta$ from the observed $\theta_1, \theta_2, \ldots, \theta_m$, and the scheme in (2) can be applied to obtain the functional mean $\hat\Delta = f_{\hat\theta}$. Figure 3 shows the results for three populations with $n = 5$ species and different levels of diversity. From each population we randomly draw samples of size $m = 5$. The parameters of the Poisson and the Multinomial distributions are $\lambda = 100 \times (0.55, 0.19, 0.13, 0.07, 0.06)$ and $(0.55, 0.19, 0.13, 0.07, 0.06)$, respectively. For each population, the functional mean is plotted together with its estimated standard error. As desired, all the functional statistics are non-negative and convex. Furthermore, even though the diversity profiles do not satisfy the monotonic dependence on the parameters, the functional mean satisfies the internality property in all the simulation runs.

[Fig. 3. J = 5000. Functional mean diversity profiles $\hat\Delta = f_{\hat\theta} = (1 - \sum_{j=1}^{n} \hat\theta_j^{\beta+1})/\beta$. For the Uniform, Poisson and Multinomial populations, one panel shows the mean profile $\Delta$ against $\beta \in [-1, 1]$ and one the standard error against $\beta$.]
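The Multinomial experiment can be sketched as follows. The number of individuals per draw ($N = 100$) is not stated in the text and is an assumption here, as are the seed and the reduced number of runs ($J = 200$). Internality is read pointwise: at each $\beta$ on the grid, the mean profile must lie between the smallest and the largest observed profile of the sample. The sketch counts how often that holds rather than assuming it.

```python
import random
from collections import Counter
from statistics import fmean, stdev

P = [0.55, 0.19, 0.13, 0.07, 0.06]                  # multinomial probabilities from Sec. 4.2
BETAS = [k / 10 for k in range(-10, 11) if k != 0]  # grid on [-1, 1] without 0

def diversity_profile(theta, beta):
    """Eq. (4), skipping absent species."""
    return (1 - sum(t ** (beta + 1) for t in theta if t > 0)) / beta

def draw_abundances(rng, n_individuals=100):
    """Relative abundances from one Multinomial(n_individuals, P) draw.
    n_individuals = 100 is an assumption; the text does not state it."""
    counts = Counter(rng.choices(range(len(P)), weights=P, k=n_individuals))
    return [counts[j] / n_individuals for j in range(len(P))]

rng = random.Random(3)
J, m = 200, 5              # J reduced from 5000 for speed
mean_profiles, internal_runs = [], 0
for _ in range(J):
    sample = [draw_abundances(rng) for _ in range(m)]
    theta_hat = [fmean(c) for c in zip(*sample)]                  # scheme (2)
    mu = [diversity_profile(theta_hat, b) for b in BETAS]         # functional mean
    data = [[diversity_profile(t, b) for b in BETAS] for t in sample]
    # Pointwise internality: mean profile between the smallest and largest observed profile.
    if all(min(col) <= v <= max(col) for col, v in zip(zip(*data), mu)):
        internal_runs += 1
    mean_profiles.append(mu)

avg_profile = [fmean(col) for col in zip(*mean_profiles)]
se_profile = [stdev(col) for col in zip(*mean_profiles)]
print(f"internality held in {internal_runs}/{J} runs")
```

Since every functional mean is itself a diversity profile of a probability vector, it is non-negative and non-increasing in $\beta$ by construction; internality, by contrast, is only observed empirically.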

5. Conclusion

We consider a parametric family of functional data that does not constitute a vector subspace of an $L^p$ space. We use the parameter space to transport the statistics of the parameters to the functional space. Assuming a monotonic dependence on the parameters, we prove suitable properties of the functional mean and variability, similar to those of the vector case.

References

Di Battista, T., Gattone, S. A. & Valentini, P. [2007] “Functional data analysis of GSR signal,” S.Co.2007 Venice.

Di Battista, T., Gattone, S. A. & De Sanctis, A. [2010] “Dealing with FDA estimation methods,” New Perspectives in Statistical Modeling and Data Analysis (Springer, NY), in press.

Ferraty, F. & Vieu, P. [2006] Nonparametric Functional Data Analysis: Theory and Practice (Springer-Verlag, NY).

Gattone, S. A. & Di Battista, T. [2009] “A functional approach to diversity profiles,” J. Roy. Stat. Soc. Series C 58, 267–284.

Patil, G. P. & Taillie, C. [1982] “Diversity as a concept and its measurements,” J. Amer. Stat. Assoc. 77, 548–561.

Ramsay, J. O. & Silverman, B. W. [2007] Functional Data Analysis (Springer, NY).

Rudin, W. [1986] Real and Complex Analysis (McGraw-Hill).

Sung, J. A. [2005] Least Squares Orthogonal Distance Fitting of Curves and Surfaces in Space (Springer, NY).

Vieira, S. & Hoffmann, R. [1977] “Comparison of the logistic and the Gompertz growth functions considering additive and multiplicative error terms,” Appl. Stat. 26, 143–188.

Figures

Fig. 1. Functional populations $f_\alpha(x) = x^\alpha + \varepsilon$ with three different parameter spaces for $\alpha$ and $\varepsilon \sim N(0, 0.01)$.
