• Non ci sono risultati.

A Language for Differentiable Functions

N/A
N/A
Protected

Academic year: 2021

Condividi "A Language for Differentiable Functions"

Copied!
26
0
0

Testo completo

(1)

A Language for Differentiable Functions

Pietro Di Gianantonio1 Abbas Edalat2

1 Dip. di Matematica e Informatica Universit`a di Udine, 33100 Udine, Italy

pietro.digianatonio@uniud.it

2 Department of Computing, Imperial College London ae@ic.ac.uk

Abstract. We introduce a typed lambda calculus in which real num- bers, real functions, and in particular continuously differentiable and more generally Lipschitz functions can be defined. Given an expression representing a real-valued function of a real variable in this calculus, we are able to evaluate the expression on an argument but also evaluate the L-derivative of the expression on an argument. The language is an extension of PCF with a real number data-type but is equipped with primitives for min and weighted average to capture computable contin- uously differentiable or Lipschitz functions on real numbers. We present an operational semantics and a denotational semantics based on contin- uous Scott domains and several logical relations on these domains. We then prove an adequacy result for the two semantics. The denotational semantics also provides denotational semantics for Automatic Differen- tiation. We derive a definability result showing that for any computable Lipschitz function there is a closed term in the language whose evalu- ation on any real number coincides with the value of the function and whose derivative expression also evaluates on the argument to the value of the L-derivative of the function.

Introduction

Real-valued locally Lipschitz maps on finite dimensional Euclidean spaces enjoy a number of fundamental properties which make them the appropriate choice of functions in many different areas of applied mathematics and computation.

They contain the class of continuously differentiable functions, are closed un- der composition and the absolute value, min and max operations, and contain the important class of piecewise polynomial functions, which are widely used in geometric modelling, approximation and interpolation and are supported in Mat- Lab [4]. Lipschitz maps with uniformly bounded Lipschitz constants are closed under convergence with respect to the sup norm. Another fundamental property of these maps is that a Lipschitz vector field in Rn has a unique solution in the initial value problem [3].

In the past thirty years, motivated by applications in control theory and op- timisation and using an infinitary double limit superior operation, the notion

(2)

of Clarke gradient has been developed as a convex and compact set-valued gen- eralized derivative for real-valued locally Lipschitz maps [2]. For example, the absolute value function, which is not classically differentiable at zero, is a Lips- chitz map which has Clarke gradient [−1, 1] at zero. The Clarke gradient extends the classical derivative for continuously differentiable functions.

Independently, a domain-theoretic Scott continuous Lipschitz derivative, later called the L-derivative, was introduced in [9] for interval-valued functions of an interval variable and was used to construct a domain for locally Lipschitz maps;

these results were then extended to higher dimensions [10]. It was later shown that on finite dimensional Euclidean spaces the L-derivative actually coincides with the Clarke gradient [6]. In finite dimensions, therefore, the L-derivative provides a simple and finitary representation for the Clarke gradient.

Since the mid 1990’s, a number of typed lambda calculi, namely extensions of PCF with a real number data type, have been constructed, including Real PCF, RL and LPR [12, 5, 19], which are essentially equivalent and in which computable continuous functions can be defined. Moreover, IC-Reals, a variant of LPR with seven digits, has been implemented with reasonable efficiency in C and Haskell [14].

It was relatively straightforward in [8] to equip Real PCF with the integral operator, which is in fact a continuous functional. However, adding a derivative operator to the language has proved to be non-trivial since classic differentia- tion may not be defined on continuous functions and even when defined it may not result in a continuous function. The development of the Scott continuous L-derivative, defined in a finitary manner, has therefore been essential for con- struction of a language with a derivative operator.

The aim of this work is to take the current extensions of PCF with a real number data type into a new category and define a typed lambda calculus, in which real numbers, real functions and in particular continuously differentiable and Lipschitz functions are definable objects. Given an expression e representing a function from real numbers to real numbers in this language, we would be able to evaluate both e and its L-derivative on an argument. In this paper we will only be concerned with the theoretical feasibility of such a language and not with questions of efficiency.

To develop such a language, we need to find a suitable replacement for the test for positiveness ((0 <) ), which is used in the current extensions of PCF with real numbers to define functions by cases. In fact, a function defined using the conditional with this constructor will not be differentiable at zero even if the two outputs of the conditional are both differentiable: Suppose we have two real computable functions f and g whose derivatives D f and D g are also computable, and consider l = λx. if (0 <) x then f x else g x. The function l is computable and there is an effective way to obtain approximations of the value of l(x) including at 0. However, there is no effective way to generate any approximation for the derivative of l, i.e., D l, at the point 0. In fact, it is correct to generate an approximation of Dl on 0 only if f (0) = g(0), but this equality

(3)

is undecidable, i.e., it cannot be established by observing the computation of f and g at 0 for any finite time.

In this paper, instead of the test (0 <) , we will use the functions minimum, negation and weighted average when defining continuously differentiable or Lip- schitz maps. These primitives are of course definable in Real PCF, RL and LPR, but the definitions are based on the test (0 <) , which means that the information about the derivative is lost.

By a simple transfer of the origin and a rescaling of coordinates we can take the interval [−1, 1] as the domain of definition of Lipschitz maps. Furthermore, by a rescaling of the values of Lipschitz maps (i.e., multiplying them with the reciprocal of their Lipschitz constant) we can convert them to non-expansive maps, i.e., we can take their Lipschitz constant to be one. Concretely, we take digits similar to those in Real PCF as constructors and develop an operational semantics and a denotational semantics based on three logical relations, and prove an adequacy result. The denotational semantics for first order types is closely related but different from the domain constructed in [9] in that we capture approximations to the function part and to the derivative part regarded as a sublinear map on the tangent space. Finally, we prove a definability result and show that every computable Lipschitz map is definable in the language as the limit of a sequence of piecewise linear maps with the convergence of their L- derivatives.

We note that all our proofs can be found in the Appendix.

0.1 Related work

Given a programme to evaluate the values of a function defined in terms of a number of basic primitives, Automatic Differentiation (also called Algorithmic Differentiation) seeks to use the chain rule to compute the derivative of the function [13]. AD is distinct from symbolic differentiation and from numerical differentiation. Our work can be regarded as providing denotational semantics for forward Automatic Differentiation and can be used to extend AD to computation of the generalised derivative of Lipschitz functions.

In [11], the differential λ-calculus, and in [17], the perturbative λ-calculus that integrates the latter with AD, have been introduced which syntactically model the derivative operation on power series in a typed λ-calculus or a full linear logic. Although apparently similar, our calculus and these two λ-calculi differ in almost every aspect: motivation, syntax, semantics, and the class of definable real functions. (i) These λ-calculi have been presented to analyse lin- ear substitution and formal differentiation, (ii) the syntax is quite structured and contains constructors that have no correspondence in our setting, (iii) the semantics is based on differential categories and not on domain theory, and (iv) the definable real functions are limited to analytical maps which have power series expansion.

On the other hand, Computable Analysis [20, 21] and Constructive Analy- sis [1] are not directly concerned with computation of the derivative and both

(4)

only deal with continuously differentiable functions. In fact, a computable real- valued function with a continuous derivative has a computable derivative if and only if the derivative has a recursive modulus of uniform continuity [15, p. 191], [20, p. 53], which is precisely the definition of a differentiable function in constructive mathematics [1, p. 44].

1 Syntax

We denote the new language with PCDF (Programming language for Com- putable and Differentiable Functions).

The types of PCDF are the types of a slightly modified version of PCF where natural numbers are replaced by integers, together with a new type ι, an expression e of type ι denotes a real number in the interval [−1, 1] or a partial approximation of a real number, represented by a closed intervals contained in [−1, 1]. The set T of type expressions is defined by the grammar:

σ ::= o | ν | ι | σ → σ

where o is the type of booleans and ν is the type of integer numbers.

The expressions of PCDF are the expressions of PCF together with a new set of constants for dealing with real numbers. This set of constants is composed by the following elements:

(i) A constructor for real numbers given by: dig : ν → ν → ν → ι → ι. It is used to build affine transformations, and real numbers are obtained by a limiting process. The expression dig l m n represents the affine transforma- tion λx.(l + m · x)/n, if 0 ≤ m < n and |l| ≤ n − m, or the constant function λx.0 otherwise. The above condition on l, m, n implies that dig l m n repre- sents a rational affine transformation mapping the interval [−1, 1] strictly into itself with a non-negative slope or derivative 0 ≤ m/n < 1.

In this way we use three integers to encode a rational affine transformation;

of course it is possible to devise other encodings where just natural numbers or a single natural number is used, however these alternative encodings will be more complex.

The affine transformations definable by dig are also called generalised dig- its. Since there is no constant having type ι, an expression e having type ι can never normalise and its evaluation proceeds by producing expressions in the form dig l m n e0. These expressions give partial information about the value represented by e, namely they state that e represents a real num- ber contained in the interval [(l + m)/n, (l − m)/n], which is the range of the function λx.(l + m · x)/n. During the reduction process, this interval is repeatedly refined and the exact result, a completely defined real number, can be obtained as the limit of this sequence.

(ii) The opposite sign function (negation) opp : ι → ι.

(iii) add : ν → ν → ι → ι, representing the function λ p q x . min((p/q + x), 1), if 0 < p < 2 · q, and the constant function λx. 0 otherwise.

(5)

We define sub p q x as syntactic sugar for the expression opp (add p q (opp x)), which returns the value max((x − (p/q), −1)).

(iv) A weighted average function av : ν → ν → ι → ι → ι. The expression av p q represents the function λx y . (p/q) · x + (1 − p/q) · y, if 0 < p < q, and the constant function λx y . 0 otherwise.

(v) The minimum function

min : ι → ι → ι

with the obvious action on pairs of real numbers. We define max x y as syntactic sugar for the expression opp (min (opp x)(opp y)).

(vi) A test function (0 <) : ι → o , which returns true if the argument is strictly greater than zero, and false if the argument is strictly smaller that zero. The test function can be used for constructing functions that are not differentiable, an example being the function λx. if (0 <) (x) then 1 else 0; as a consequence we impose some restriction in its use.

(vii) The if-then-else constructor on reals, ifι : o → ι → ι → ι, and the parallel if-then-else constructor pifι : o → ι → ι → ι.

The main use for the parallel if operator is to evaluate, without loss of in- formation, derivative of expressions containing the min operator. However, the parallel if operator can be completely avoided in defining non-expansive functions on real numbers. In fact in the constructive proof of our defin- ability result, the parallel if operator is never used.

(viii) A new binding operator D. The operator D can bind only variables of type ι and can be applied only to expressions of type ι. In our language, Dx.e represents the derivative of the real function λx.e.

The differential operator D can be applied only to expressions that contain neither the constant (0 <) nor the differential operator D itself.

We note that, with the exception of the test functions (0 <) , all the new constants represent functions on reals that are non-expansive; the if-then-else constructors are also non-expansive if the distance between true (tt) and false (f f ) is defined to be equal to two, while the test function (0 <) cannot be non-expansive, whatever metric is defined on the Boolean values. The expres- sions containing neither the constant (0 <) nor the differential operator D are called non-expansive since they denote functions on real numbers that are non- expansive. This fact, intuitively true, is formally proved by Proposition 2. The possibility to syntactically characterise a sufficiently rich set of expressions rep- resenting non-expansive functions is a key ingredient in our approach that allows us to obtain information about the derivative of a function expression without completely evaluating it. For example, from the fact that e : ι is a non-expansive expression, one can establish that the derivative of λx.e, at any point, is con- tained in the interval [−1, 1] and that the derivative of λx.dig l m n e is contained in the smaller interval [−m/n, m/n].

(6)

2 Operational semantics

The operational semantics is given by a small step reduction relation, → , which is obtained by adding to the PCF reduction rules the following set of extra rules for the new constants.

The operational semantics of add and min operators uses an extra constant aff : ν → ν → ν → ι → ι. The expression aff l m n is intended to represent general affine transformations (including expansive ones) with a non-negative derivative, i.e., the affine transformation λx.(l + mx)/n with m ≥ 0, n > 0. A property preserved (i.e., invariant) by the reduction rules is that the constant aff appears only as the head of one of the arguments of min or as the head of the fourth argument of aff . It follows that in any expression e0 in the reduction chain of a standard expression e (without the extra constants aff ), the constant aff can appear only in the above positions.

The generalised digit dig l m n is a special case of an affine transformation.

Therefore, in applying the reduction rules, we use the convention that any re- duction rule containing, on the left hand side, a general affine transformation aff can be applied also to terms where the affine transformation aff is substituted by the constructor dig .

On affine transformations we will use the following notations:

– (aff l1m1n1) ◦ (aff l2m2n2) stands for aff (l1· n2+ m1· l2) (m1· m2) (n1· n2), i.e., the composition of affine transformations.

– If m 6= 0, (aff l m n)−1 stands for (aff (−l) n m), i.e., the inverse affine trans- formation; if m = 0, the expression (aff l m n)−1 is undefined.

– The symbols l, m, n, p, q stand for values of integer type.

The reduction rules are the PCF reduction rules together the following set of extra rules. First we have three simple reductions:

– dig l m n e → dig 0 0 1 e if m < 0 or n ≤ 0 or |l| > n − m.

– add p q e → dig 0 0 1 e if p ≤ 0 or q ≤ p.

– av p q e1e2 → dig 0 0 1 e1 if p ≤ 0 or q ≤ p.

The above rules deal with those instances of dig , add , av with integer arguments that reduce to the constant zero digit. An implicit condition on the following set of rules is that they apply only if none of the above three rules can be applied.

1. dig l1m1n1(dig l2m2n2e) → ((dig l1m1n1) ◦ (dig l2m2n2)) e 2. opp (dig l m n e) → dig (−l) m n (opp e)

3. add p q e → min (aff p q q e)(dig 1 0 1 e)

note that (aff p q q) and (dig 1 0 1) represent the functions λx.p/q + x and λx.1 respectively.

4. av p q (dig l m n e1) e2 → dig l0m0n0(av p0q0e1e2)

where l0= l · p, m0= q0 = m · p + n · q − n · p, n0= n · q and p0 = m · p.

By a straightforward calculation, one can check that the left and the right parts of the reduction rules represent the same affine transformation on the arguments e1, e2.

(7)

5. av p q e1(dig l m n e2) → dig l0m0n0(av p0q0e1e2)

where l0= l(q − p), m0 = q0= np + mq − mp, n0= nq and p0= np.

6. min (dig l1m1n1e1)(aff l2m2n2e2) → dig l1m1n1e1

if (l1+ m1)/n1≤ (l2− m2)/n2.

The above condition states that every point in the image of (dig l1m1n1) is smaller, in the usual Euclidean order, than every point in the image of (aff l2m2n2), i.e., the first argument of min is certainly smaller that the second.

7. min (aff l1m1n1e1)(dig l2m2n2e2) → dig l2m2n2e2

if (l2+ m2)/n2≤ (l1− m1)/n1

The symmetric version of the previous rule.

8. min (dig l m n e1) e2

dig l0m0n0(min ((dig l0m0n0)−1◦ (dig l m n) e1) ((dig l0m0n0)−1e2)) if l + m < n and l0= l + m − n, m0= l + m + n 6= 0, n0= 2 · n.

The above equations imply that if (dig l m n) has image [a, b] then (dig l0m0n0) has image [−1, b]. The rule is justified by the fact if the first argument of min are smaller than b then the value of min is also smaller than b.

9. min e1(dig l m n e2) →

dig l0m0n0(min ((dig l0m0n0)−1e1) ((dig l0m0n0)−1◦ (dig l m n) e2)) if l + m < n and l0= l + m − n, m0= l + m + n 6= 0, n0= 2 ◦ n.

The symmetric version of the previous rule.

10. min (aff l1m1n1e1)(aff l2m2n2e2) →

dig l0m0n0(min ((dig l0m0n0)−1◦(aff l1m1n1) e1)((dig l0m0n0)−1◦(aff l2m2n2) e2)) if −1 < (l1+m1)/n1≤ (l2−m2)/n2and l0= l1−m1+n1, m0 = m1−l1+m1,

n0= 2 · n1.

The above equation implies that if (dig l1m1n1) has image [a, b] then (dig l0m0n0) has image [a, 1]. The rule is justified by the fact if both arguments of min are greater that a then the value of min is also greater than a.

11. min (aff l1m1n1e1)(aff l2m2n2e2) →

dig l0m0n0(min ((dig l0m0n0)−1◦(aff l1m1n1) e1)((dig l0m0n0)−1◦(aff l2m2n2) e2)) if −1 < (l2−m2)/n2< (l1+m1)/n1, and l0 = l2−m2+n2, m0 = m2−l2+m2,

n0= 2 · n2.

The symmetric version of the previous rule.

12. aff l1m1n1(aff l2m2n2e) → ((aff l1m1n1) ◦ (aff l2m2n2)) e 13. aff l m n e → dig l m n e

if −1 ≤ (l − m)/n, (l + m)/n ≤ 1 and e is not in the form aff a0b0e.

14. (0 <) (dig l m n e) → tt if (l − m)/n > 0 15. (0 <) (dig l m n e) → ff if (l − m)/n < 0 16. pifι e then dig l1m1n1e1else dig l2m2n2e2

dig l0m0n0(pif e then (dig l0m0n0)−1◦ (dig l1m1n1) e1 else (dig l0m0n0)−1◦ (dig l2m2n2) e2)

where n0 = 2 · n1· n2, m0 = max((l1+ m1) · n2, (l2+ m2) · n1) − min((l1− m1) · n2, (l2− m2) · n1), l0= 2 · min((l1− m1) · n2, (l2− m2) + m0.

Here the values l0, m0, n0 are defined in such a way that if (dig l1m1n1) has image [a1, b1] and (dig l2m2n2) has image [a2, b2], then (dig l0m0n0) has image the convex closure of the set [a1, b1] ∪ [a2, b2].

(8)

The remaining rules for pif, if are included in the reduction rules for PCF and therefore are omitted from the present list.

17. N → N0

M N → M N0 if M is a constant different from the combinator Y or is an expression in the form min n1, min n1n2, min n1n2n3, min n1n2n3M0, av n1, av n1n2, av n1n2M0, add n1, add n1n2, pifι M0 then, pifι M0 then M00 else, where n1, n2, n3 are values.

The reduction rules for the derivative operator are:

1. Dx. x → λy. dig 0 0 1 y

2. Dx. dig l m n e → λy. dig 0 m n (Dx. e)y 3. Dx. opp e → λy. opp (Dx. e)y

4. Dx. add p e q → λy. pifι(0 <) (add (q−p) q (opp e)) then (Dx. e)y else dig 0 0 1 y 5. Dx. av p q e1e2 → λy. av p q ((Dx. e1)y) ((Dx. e2)y)

6. Dx. min e1e2

λy. pif (λx. (0 <) (av 1 2 (opp e1)e2))y then (Dx. e1)y else (Dx. e2)y

7. Dx. pifι e1then e2else e3 → λy. pifι (λx. e1)y then (Dx. e1)y else (Dx. e2)y 8. Dx. if e1then e2else e3 → λy. if (λx. e1)y then (Dx. e1)y else (Dx. e2)y 9. Dx. Y e → Dx. e(Y e)

10. Dx. (λy. e)e1. . . en → Dx. e[e1/y]e2. . . en

Note that the rules for the derivative operator are a direct derivation of the usual rules for the symbolic computation of the derivative of a function.

2.1 Examples

We will give some examples for defining non-analytic functions in later sections;

in particular we will show in the proof of definability how easily piecewise linear maps with rational coefficients are defined in the language. A useful technique to define analytic functions and real constants is to consider their Taylor series expansions and reduce the Taylor series to a sequence of applications of affine transformations. For example the value e − 2, where e is the Euler constant, is given by the Taylor series 1/2!+1/3!+1/4! . . .. Denoting the affine transformation λx.(1 + x)/n as f , the above series can be expressed as f (2)(f (3)(f (4)(. . .) . . .).

It follows that in PCDF e − 2 can be expressed as

(Y λf : ν → ι. λn : ν. f n. dig 1 1 n (f (n + 1))) 2.

Given an expression to represent product in PCDF, it is possible to use the above technique to express analytic functions. For example, suppose hp de- fines the half-product function λxy. x · y/2. Then, one can express the function λx. ex/2− 1 − x/2 by the PCDF.

λx : ι. hp x ((Y λf : ν → ι → ι. λn : ν. λx : ι. hp x (dig 1 1 n (f (n + 1)))) 2 x) The half product hp is definable in PCDF by reducing product to a series of applications of the average and minimum function. The actual definition of the

(9)

function hp is lengthy and we will not present it here. As a simpler example of the technique involved, we present the definition of the function λx.x2/2.

Consider the following mutual recursive definition of the terms g, h : ν → ι → ι

g 0 x = max x (opp x)

h n x = add 1 (2n+1) (g n (add 1 (2n+1) x)) g (n + 1) x = min (h n x)(h n (opp x))

By standard techniques, one can derive a PCDF expression g satisfying the above recursive definition. The careful reader can check that the term:

λx. (sub 1 2 (av 1 2 (g 0 x)(av 1 2 (g 1 x)(av 1 2 (g 2 x) . . . (av 1 2 (g n x) (dig 101x) . . .) represents the step-wise linear interpolation of the function λx.x2/2 on the points of the set {i/2n|i ∈ Z, −2n ≤ i ≤ 2n}. It follows that the function λx.x2/2 is defined by the term:

λx. (sub 1 2((Y λf : ν → ι → ι. λn : ν. λx : ι. (av 1 2 (g n x) (f (n + 1) x))) 0 x).

3 Denotational Semantics

The denotational semantics for PCDF is given in the standard way as a family of continuous Scott domains, U D := {Dσ| σ ∈ T }. The basic types are interpreted using the standard flat domains of integers and booleans. The domain associated to real numbers is the product domain Dι = I × I, where I is the continuous Scott domain consisting of the non-empty compact subintervals of the interval I = [−1, 1] partially ordered with reverse inclusion. Elements of I can represent either a real number x, i.e., the degenerated interval [x, x], or some partial in- formation about a real number x, i.e., an interval [a, b], with x ∈ [a, b]. On the elements of I, we consider both the set-theoretic operation of intersection (∩), the pointwise extensions of the arithmetic operations, and the lattice operations on the domain information order (u, t), [12]. Function types have the usual interpretation of call-by-name programming languages: Dσ→τ = Dσ→ Dτ.

A hand waiving explanation for the definition of the domain Dι= I × I, is that the first component is used to define the value part of the function while the second component is used to define the derivative part. More precisely, a (non- expansive) function f from I to I, is described, in the domain, by the product of two functions hf1, f2i : (I × I) → (I × I): the function f1 : (I × I) → I represents the value part of f , in particular f1(i, j) is the image of the interval i under f for all intervals j, i.e., f1 depends only on the first argument. The second function f2: (I × I) → I represents the derivative part. If D f denotes the derivative of f , then f2(i, j) is the image of the intervals i and j under the function λx, y.D f (x) · y. Thus, f2 is linear in its second component and f2({x}, {1}) is the derivative of f at the point x.

Note that with respect to the above interpretation, composition behaves cor- rectly, that is if the pair hf1, f2i : (I × I) → (I × I) describes the value part and the derivative part of a function f : I → I and hg1, g2i : (I × I) → (I × I) describes a function g : I → I then hh1, h2i describes, by the chain rule, the func- tion f ◦ g with h1(i, j) = f1(g1(i, j), g2(i, j)) and h2(i, j) = f2(g1(i, j), g2(i, j)).

(10)

The L-derivative of the non-expansive map f : I → I is the Scott continuous function L(f ) : I → I defined by [6]:

L(f )(x) =T{b ∈ I : ∃ open interval O ⊂ I, x ∈ O with

f (u)−f (v)

u−v ∈ b for all u, v ∈ O, u 6= v}.

Consider now the case of functions in two arguments. Given a function g : I → I → I, its domain description will be an element in (I × I) → (I × I) → (I × I), which is isomorphic to ((I × I) × (I × I)) → (I × I). Thus again, the domain description of g consists of a pair of functions hg1, g2i, with g1 describing the value part. If D g (x1, x2) is the linear transformation representing the derivative of g at (x1, x2), then the function g2 is a domain extension of the real function λx1, y1, x2, y2.D g (x1, x2) · (y1, y2).

This approach for describing functions on reals is also used in (forward mode) Automatic Differentiation [13]. While Automatic Differentiation is different from our method in that it does not consider the domain of real numbers and the notion of partial reals, it is similar to our approach in that it uses two real numbers as input and a pair of functions to describe the derivative of functions on reals. Authomatic differentiation is also used in [17], while the idea of using two separated components to describe the value part and the derivative part in the domain-theoretic setting can be found also in [9].

The semantic interpretation function E is defined, by structural induction, in the standard way:

EJcKρ = BJcK EJxKρ = ρ(x)

EJe1e2Kρ = EJe1Kρ(EJe2Kρ) EJλx

σ.eKρ = λd ∈ Dσ.EJeK(ρ[d/x])

The semantic interpretation of any PCF constant is the usual one, while the semantic interpretation of the new constants on reals is given by:

BJdig K(l, m, n, hi, j i) =

⊥ if l = ⊥ ∨ m = ⊥ ∨ n = ⊥

h[0, 0], [0, 0]i if ¬(0 ≤ m < n ∧ |l| ≤ n − m) hl/n + m/n · i, m/n · ji otherwise

BJopp K(hi, j i) = h−i, −j i

BJadd K(p, q, hi, j i) =









⊥ if p = ⊥ ∨ q = ⊥

h[0, 0], [0, 0]i if ¬(0 < 2 · p < q) hi + p/q, ji if i + p/q < 1 h[1, 1], [0, 0]i if i + p/q > 1 hi + p/q ∩ [−1, 1], j u [0, 0]i otherwise BJav K(p, q, hi1, j1i, hi2, j2i)

=

⊥ if p = ⊥ ∨ q = ⊥

h[0, 0], [0, 0]i if ¬(0 < p < q)

hp/q · i1+ (1 − p/q) · i2, p/q · j1+ (1 − p/q) · j2i otherwise

(11)

BJmin K(hi1, j1i, hi2, j2i) =

hi1, j1i if i1< i2

hi2, j2i if i1> i2

hi1min i2, j1u j2i otherwise

BJ(0 <) K(hi, j i) =

tt if i > 0 ff if i < 0

⊥ otherwise The interpretation of the derivative operator is given by:

EJDx.eKρ= λd ∈ I × I . hπ2(EJeKρ[hπ1d, 1i/x]), ⊥i

Note that the function BJ(0 <) K loses the information given by the derivative part, while the function EJDx.eKρ, is a sort of translation of the function EJλx.eKρ: The value of EJDx.eKρ is obtained from the derivative part of EJλx.eKρ, while the derivative part of EJDx.eKρis set to ⊥.

Consider some examples. The absolute value function can be implemented through the term Ab = λx.max (opp x)x with the following semantic interpreta- tion:

EJAbKρ(hi, ji) =

hi, ji if i > 0 h−i, −ji if i < 0 h[k, k+], [−1, 1]ji otherwise, where k= max(i, −i+), k+= max(i+, −i) with i = [i, i+].

When the absolute value function is evaluated at 0, where it is not differen- tiable, the derivative part of the semantic interpretation returns a partial value:

π2(EJAbKρ({0}, {1}) = [−1, 1]. This partial value coincides with the Clarke gra- dient, equivalently the L-derivative, of the absolute value function.

The function |x−y|2 , is represented by the expression

Ab-dif = λx.y.max (xav 1/2(opp y))((opp x)av 1/2y) whose semantics is the function:

EJAb-difKρ(hi1, j1i, hi2, j2i) =





hi1−i2 2,j1−j2 2i if i1> i2 hi2−i2 1,j2−j2 1i if i1< j1

h[k, k+], [−1/2, 1/2](j1− j2)i otherwise, ,

where k= max(i1 − i+2, i2 − i+1) and k+= max(i+1 − i2, i+2 − i1).

FromJAb-difK it is possible to evaluate the partial derivative of the function

|x−y|

2 , not only along the axes x and y, but along any direction. Considering the Euclidean distance, the derivative of the function at (0, 0) in the direction of the unit vector (u/√

u2+ v2, v/√

u2+ v2) is given by EJAb-difKρ(h{0}, {u/√

u2+ v2}i, h{0}, {v/√

u2+ v2}i), that is the the interval [−1/2, 1/2]u−v

u2+v2. Again this value coincides with the value of the Clarke gra- dient of the function |x−y|2 at (0, 0) in the direction (u/√

u2+ v2, v/√

u2+ v2).

(12)

3.1 Logical relations characterization

In the present approach we choose to define the semantic domains in the simplest possible way. As a consequence, our domains contain also points that are not consistent with the intended meaning, for example, the domain Dι→ι= (I×I) → (I × I) contains also the product of two functions hf1, f2i where the derivative part f2 is not necessarily linear in its second argument and is not necessarily consistent with the value part, i.e., the function f1; moreover the value part f1

can be a function depending also on its second argument.

However the semantic interpretation of (non-expansive) PCDF expressions will not have this pathological behaviour. A proof of this fact and a more precise characterisation of the semantic interpretation of expressions can be obtained through the technique of logical relations [18]. In particular we define a set of logical relations on the semantic domains and prove that, for any non-expansive PCDF expression e, the semantic interpretation of e satisfies these relations.

Using this method, we can establish a list of properties for the semantic inter- pretation of PCDF expressions.

Definition 1. The following list of relations are defined on the domain Dι. – Independence: A binary relation Riι consisting of the pairs of the form

(hi, j1i, hi, j2i). The relation Riι is used to establish that, for a given function, the value part of the result is independent from the derivative part of the argument: f1(i, j1) = f1(i, j2).

– Sub-linearity: A family of relations Rl,rι indexed by a rational number r ∈ [−1, 1]. The family Rl,rι consists of pairs of the form (hi, j1i, hi, j2i) where j1v r · j2. These relations are used to establish the sublinearity of the derivative part: f2(i, r · j) v r · f2(i, j).

– Consistency: A family of ternary relation Rd,rι indexed by a rational number r ∈ (0, 2], consisting of triples of the form (hi1, j1i, hi2, j2i, hi3, j3i) with i3v i1u i2 and (r · j3) consistent with (i1− i2), that is the intervals (r · j3) and (i1− i2) have a non-empty intersection. This relation is used to establish the consistency of the derivative part of a function with respect to the value part.

The above relations are defined on the other ground domains Do and Dν as the diagonal relations in two or three arguments, e.g., Rd,rν (l, m, n) iff l = m = n. The relations are extended inductively to higher order domains by the usual definition on logical relations: Rσ→τi (f, g) iff for every d1, d2 ∈ Dσ, Riσ(d1, d2) implies Riτ(f (d1), g(d2)), and similarly for the other relations.

Proposition 1. For any closed expression e : σ, for any rational number r ∈ [−1, 1], the semantic interpretation EJeKρ of e, is self-related by Riσ, Rl,rσ , i.e.

Rσi(EJeKρ, EJeKρ), and similarly for Rl,rσ . Moreover, if the expression e : σ is non-expansive, the semantic interpretation EJeKρ, is self-related by Rd,rσ .

We now show how the three relations ensure the three properties of indepen- dence, sublinearity and consistency. To any element f = hf1, f2i in the domain Dι→ι= (I × I) → (I × I) we associate a partial function fv : I → I with

(13)

fv(x) = y if f1(h{x}, ⊥i) = {y}

undefined if f1(h{x}, ⊥i) is a proper interval and a total function

fd: I → I = λx.f2(h{x}, {1}i))

The preservation of the relations Rιi, Rl,rι has the following straightforward consequences:

Proposition 2. (i) For any function f = hf1, f2i in Dι→ι self-related by Riι→ι, for every i, j1, j2, f1(hi, j1i) = f1(hi, j2i), the return value part is independent from the derivative argument.

(ii) For any function f = hf1, f2i in Dι→ι self-related by Rl,rι→ι for every i, j, and for every rational r ∈ [−1, 1], f2(hi, r · ji) v r · f2(hi, ji). It follows that:

– (f2(hi, {r}i))/r v f2(hi, {1}i), i.e., the most precise approximation of the L-derivative is obtained by evaluating the function with 1 as its second argument,

– for every i, j, f2(hi, −ji) = −f2(hi, ji), i.e., the derivative part is an odd function.

The preservation of the relation Rd,rι induces the following properties (see the Appendix for the proof):

Proposition 3. For any function f in Dι→ι self-related by Rd,rι→ι: (i) the function fv is non-expansive;

(ii) on the open sets where the functions fv is defined, the function fd is an approximation to the L-derivative of the function fv;

(iii) if f is a maximal element of Dι→ι then fv is a total function and fd is the associated L-derivative.

3.2 Subdomains

By definition, the logical relations are closed under directed lubs, and as a con- sequence the sets of elements self-related by them are also closed under directed lubs.

For any ground type σ the relations Riσ, Rσl,r, Rd,rσ are closed under arbitrary meets, meaning that if ∀j ∈ J . Riσ(dj, ej) then Riσ(d

j∈Jdj,d

j∈Jej) and simi- larly for the other relations Rl,rσ , Rd,rσ . The proof is immediate for σ = o, ν, and is a simple check for σ = ι. The following result shows that this closure property holds also for σ = ι → ι.

Proposition 4. The set of elements in Dι→ι self-related by any of the three relations Rι→ιi , Rl,rι→ι, and Rd,rι→ι is closed under arbitrary meets.

(14)

Proof. For the independence relation Rι→ιi , the closure property is trivial to check. For the consistency relation Rd,rι→ι, the closure under non-empty meets follows immediately from the fact that this relation is downward closed. The closure property for the sublinearity relation Rι→ιl,r is given in the Appendix.

We now employ the following result whose proof can be found in the Ap- pendix.

Proposition 5. In a continuous Scott domain, a non-empty subset closed under lubs of directed subsets and closed under non-empty meets is a continuous Scott subdomain.

Corollary 1. If σ is a ground type or first order type, then the set of elements in Dσ self-related by the three logical relations is a continuous Scott subdomain of Dσ.

As we do not deal with second or higher order real types in this extended abstract, we will not discuss the corresponding subdomains here.

3.3 Adequacy

As usual once an operational and denotational semantics are defined, it is nec- essary to present an adequacy theorem stating that the two semantics agree.

Let us denote by [a, b]  Eval(e) the fact that there exits three integers l, m, n such that e →? dig l m n e0 and [(l − m)/n, (l + m)/n] ⊂ (a, b). The proof of the following theorem is presented in the Appendix.

Theorem 1 (Adequacy). For every closed term e with type ι, interval [a, b]

and environment ρ, we have:

[a, b]  Eval(e) iff [a, b]  π1(EJeKρ)

In the operational semantics that we have proposed, the calculus of the derivative is performed through a sort of symbolic computation: the rewriting rules specify how to evaluate the derivative of the primitive functions and the application of the derivative rules essentially transforms a function expression into the function expression representing the derivative. The denotational se- mantics provides an alternative approach to the computation of the derivative, which almost exactly coincides with the computation performed by Automatic Differentiation. We can interpret our adequacy result as a proof that symbolic computation of the derivative and the computation of the derivative through Automatic Differentiation coincide. We remark in passing that, inspired by the denotational semantics, it is possible to define an alternative operational seman- tics that will perform the computation of the derivative in the same way that is performed by Automatic Differentiation.

(15)

3.4 Function definability

We will show in the following theorem that any computable Lipschitz function can be obtained in our framework as the limit, in the sup norm, of a sequence of piecewise linear maps definable in PCDF such that every piecewise linear map in the sequence gives lower and upper bounds for the function and the L- derivative of the function is contained in the L-derivatives of the piecewise linear maps, which converge to the classical derivative of the function wherever it is continuously differentiable.

Theorem 2. For any maximal computable function f in Dι→ι preserving the logical relations Riι→ι, Rl,rι→ι, Rd,rι→ι, there exists a closed PCDF expression f such that:

∀x ∈ I. fv(x) = (EJf Kρ)v(x) ∧ fd(x) = (EJf Kρ)d(x)

The above definability result states that if we consider only the behaviour of the domain functions on the total elements of Dι (i.e. the elements representing completely defined real numbers) then PCDF is sufficiently rich to represent the computable elements of Dι→ι.

We do not consider the problem of defining PCDF expressions whose seman- tics coincides with domain functions also on partial elements. The reason for this choice is that this later problem is technically more difficult and less interesting from a practical point of view.

The proof of the above result is quite lengthy: we define a general methodol- ogy to transform the information that can be extracted from a domain function into a PCDF expression. The Appendix contains a description of the construc- tion.

4 Conclusion

We have integrated, in a single language, exact real number computation with the evaluation of the derivatives of function expressions.

The language has been designed using a minimal set of primitives sufficient to define any computable (and differentiable) function. It can be seen as a the- oretical basis for the implementation of exact real number computation in a programming language. In a practical implementation, however, one needs both to extend the set of primitive functions and to carefully redesign the reduction strategy to increase both the usability of the language and the efficiency of the computation.

The main result presented here is an adequate denotational semantics for dif- ferentiable functions, which has required original ideas in developing the seman- tics domains, and a definability result showing the expressivity of the language.

The present research can be extended in several directions. Some possible future works are the following.

– An obvious problem to consider is whether the definability result presented in the paper can be extended to a larger class of function domains. We claim

(16)

that the techniques presented here can be easily adapted to functions with several arguments. This is not however the case when considering higher order functions, whose definability is an open problem.

– A second direction for possible further research is the treatment of the second derivative and more generally derivative of arbitrary order.

References

1. E. Bishop and D. Bridges. Constructive Analysis. Springer-Verlag, 1985.

2. F. H. Clarke. Optimization and Nonsmooth Analysis. Wiley, 1983.

3. E. A. Coddington and N. Levinson. Theory of Ordinary Differential Equations.

McGraw-Hill, 1955.

4. T. A. Davis and K. Sigmon. MATLAB Primer. CRC Press, seventh edition, 2005.

5. P. Di Gianantonio. An abstract data type for real numbers. Theoretical Computer Science, 221:295-326, 1999.

6. A. Edalat. A continuous derivative for real-valued functions. In New Computa- tional Paradigms, Changing Conceptions of What is Computable, pages 493–519.

Springer, 2008.

7. A. Edalat. A differential operator and weak topology for Lipschitz maps. Topology and its Applications, 157,(9):1629-1650, June 2010.

8. A. Edalat, M. Escard´o Integration in Real PCF. Information and Computation, 160:128–166, 2000.

9. A. Edalat, A. Lieutier. Domain theory and differential calculus (Functions of one variable). Mathematical Structures in Computer Science, 14(6):771–802, December 2004.

10. A. Edalat, A. Lieutier, and D. Pattinson. A computational model for multi-variable differential calculus. In Proc. FoSSaCS 2005, volume 3441 of LNCS, 3441: 505–519, 2005.

11. Thomas Ehrhard and Laurent Regnier. The differential lambda-calculus. Theoret- ical Computer Science, 309(1-3), 2003.

12. M. H. Escard´o. PCF extended with real numbers. Theoretical Computer Science, 162(1):79–115, August 1996.

13. A. Griewank and A. Walther. Evaluating Derivatives. Siam, second edition, 2008.

14. www.doc.ic.ac.uk/exact-computation/.

15. K. Ko. Complexity Theory of Real Numbers. Birkh¨auser, 1991.

16. G. Lebourg. Generic differentiability of lipschitzian functions. Transaction of AMS, 256:125–144, 1979.

17. O. Manzyiuk. A simply typed lambda calculus for forward automatic differentia- tion. In Proc MFPS 12, ENTCS 259–273, 2012.

18. J. C. Mitchell. Foundations of Programming Languages. MIT Press, 1996.

19. P. J. Potts, A. Edalat, and M. Escard´o. Semantics of exact real arithmetic. In Twelfth Annual IEEE Symposium on Logic in Computer Science. IEEE, 1997.

20. M. B. Pour-El and J. I. Richards. Computability in Analysis and Physics. Springer- Verlag, 1988.

21. K. Weihrauch. Computable Analysis (An Introduction). Springer, 2000.

(17)

Appendix

We give the details of several proofs.

4.1 Proof of Proposition 1

Proof. The proof is quite standard, and is based on the fact that the relations Rσi, Rl,rσ , Rd,rσ are logical. First one proves that the semantic interpretation of (non-expansive) constants are self-related by Rd,rσ , Riσ, and Rl,rσ . Then, to show that the fixed-point operator preserves the relations, one shows that the bottom elements are self-related by Riσ, Rl,rσ , and Rd,rσ , and that the relations are closed under the lub of chains. Finally, by the basic lemma of logical relations, one

obtains the result. ut

4.2 Proof of Proposition 3

Proof. (i) Let x and y be two real numbers for which the function fvis defined.

For any rational r ≥ |x − y| we have that

Rιd,r(h{x}, [−1, 1]i, h{y}, [−1, 1]i, h[x, y], [−1, 1]i). Therefore

Rιd,r(f (h{x}, [−1, 1]i), f (h{y}, [−1, 1]i), f (h[x, y], [−1, 1]i)), which implies that fv(x) − fv(y) ∈ r · f2(h[x, y], [−1, 1]i), and thus −1 ≤ fv(x)−fx−yv(y)≤ 1.

(ii) Given any x ∈ I and any open interval O containing fd(x) = f2(h{x, }, {1}i), let [a, b]  {x} be a rational interval such that fv is define on [a, b] and f2(h[a, b], {1}i) ⊆ O, and let r = b − a, we have

Rιd,r(h{b}, [−1, 1]i, h{a}, [−1, 1]i, h[a, b], {1}i), by repeating the arguments of the previous point, it follows fv(b)−fb−av(a) ∈ f2(h[a, b], {1}i). By monotonicity of f it follows that for any pair of rationals a0, b0 ∈ (a, b), we have: fv(bb00)−f−av0(a0) ∈ f2(h[a, b], {1}i), and by continuity of fvfor any pair of real numbers x, y ∈ (a, b)

fv(x)−fv(y)

x−y ∈ O.

(iii) If f is a maximal element Dι→ι, by point (i) the function fvis non-expansive on the points where it is defined. It follows that if the function fvis not defined at a given point x, it is always possible to construct a function f such that f v f and fv defined on x, leading to a contradiction. Similar arguments can be used to prove that fdis the L-derivative of fv and not only an approximation of the L-derivative.

4.3 Proof of Proposition 4

The sublinearity relation Rl,rι→ιis closed under non-empty meets.

Proof. To show that sublinearity is closed under meets, assume that fk : I → I with k ∈ K is a family of Scott continuous functions satisfying the sublinearity fk(r[x, y]) ≥ rfk([x, y]) for all [x, y] ∈ I and some (rational) r ∈ [−1, 1]. We show that the meetd

kfk will also be sublinear.

(18)

We use the lower and upper parts of any f : I → I as f, f+ : T → [−1, 1]

where T = {(x, y) ∈ [−1, 1] × [−1, 1] : x ≤ y}. Note that f and f+ are lower and upper semi-continuous respectively. Sublinearity of f is equivalent to f+(r(x, y)) ≥ rf+((x, y)) and f(r(x, y)) ≤ rf((x, y)) for r ∈ [−1, 1] and (x, y) ∈ T .

We have: (d

kfi)+ = g with g = lim sup g0 where g0 = supk∈Kfk+ and similarly (d

kfi)= h with h = lim inf h0 where h0= infk∈Kfk.

The sublinearity condition for fk is equivalent to fk+(r(x, y)) ≥ rfk+((x, y)) and fk(r(x, y)) ≤ rfk((x, y)) for r ∈ [−1, 1] and (x, y) ∈ T .

By taking pointwise sup and inf respectively we get: g0(r(x, y)) ≥ rg0((x, y)) and h0(r(x, y)) ≤ rh0((x, y)). By taking limsup and liminf respectively we ob- tain: g(r(x, y)) ≥ rg((x, y)) and h(r(x, y)) ≤ rh((x, y)) as required.

4.4 Proof of Proposition 5

In a continuous Scott domain, a non-empty subset closed under lubs of direct subsets and closed under non-empty meets is a continuous Scott subdomain.

Proof. Let D be a continuous Scott domain and C ⊂ D a non-empty subset with the above closure properties. Given an element of d ∈ D, denote by i(d) the greatest lower bound (meet) in C of the set {c | d v c, c ∈ C}, if this set is not empty, otherwise let i(d) be undefined.

Then i, regarded as a partial function from D to C, preserves the well below relation . In fact given two elements x, y ∈ D with y D x and i(x) defined, we check that i(y) C i(x). Let A be a directed subset of elements in C, with i(x) vF

CA. Since C is closed under lub of directed sets,F

CA =F

DA, and by construction of i, we have x v i(x). Thus, x vF

DC, and, by hypothesis, there exists a ∈ A such that y v a. Since i is monotone and coincides with the identity on the elements of C, we have i(y) v i(a) = a and therefore i(y) C i(x). It follows that if a set B is a basis of D then i(B) forms is a basis for C. In fact given an element x ∈ C, the set A = {a ∈ B | a  x}, is a directed set with lub x, then i(A) is a directed set of elements well below i(x) = x, having x as lub.

Therefore C is a continuous dcpo, and since it has non-empty meets, it is also consistently complete.

4.5 Proof of Adequacy Theorem 1

For every closed term e with type ι and environment ρ, we have:

[a, b]  Eval(e) iff [a, b]  π1(EJeKρ)

Proof. We use the technique of computability predicates to prove both the soundness and the completeness of the operational semantics. Note that the soundness of the operational semantics cannot be proved by simply showing that the reduction rules preserve the denotational semantics, since this is simply not true. A simple example being the expression (Dx. dig 0 1 2 e1) e2that reduces

(19)

to dig 0 1 2 ((Dx. e1) e2). The elements EJ(Dx. dig 0 1 2 e1) e2Kρ and

EJdig 0 1 2 ((Dx. e1) e2)Kρdo not coincide on their second component (the first is

⊥, the other is above [−1/2, 1/2]. More generally, all the reduction rules for the derivative operator do not preserve the semantics on the second element.

We define a computability predicate Comp for closed terms of type o, ν by requiring that the denotational and operational semantics coincide, in the usual way. A closed term e having type ι satisfies the predicate Compι if for every closed rational interval [a, b] and environment ρ we have: [a, b]  Eval(e) iff [a, b]  π1(EJeKρ)

The computability predicate is then extended, by induction on types, to closed elements of any type, and, by closure, to arbitrary elements.

Using the standard techniques for computability predicate, it is possible to prove that all constants are computable, and that λ-abstraction preserves the computability of the expressions. Therefore all expressions not containing the derivative operator are computable.

To prove that the computability predicate is satisfied by expressions con- taining the derivative operator, we show, by structural induction on the non- expansive expression e, that the expression Dx.e is also computable.

The proof considers many cases. As an example, we take the case where e = min e1e2. By the induction hypothesis we can assume the computability of Dx.e1 and Dx.e2. Since the expressions e1, e2, pif and av do not contain the derivative operator, we can assume that they are computable.

We need to prove, for any computable expression e0, that [a, b]  Eval((Dx.min e1e2)e0) iff [a, b]  π1(EJ(Dx.min e1e2)e0Kρ).

On the one hand, we have the following chain of implications:

[a, b]  Eval((Dx.min e1e2)e0) iff, by the denotational semantics rules, [a, b]  Eval(pif (0 <) (av 1 2 (opp e1[e0/x])(e2[e0/x]))

then (Dx. e1)e0else (Dx. e2)e0)

iff, by computability of the expression right hand side [a, b]  π1(EJpif (0 <) (av 0 1 2 (opp e1[e0/x]) (e2[e0/x]))

then (Dx. e1)e0else (Dx. e2)e0Kρ) If we pose: EJe

0

Kρ = hi, ji, EJe1Kρ[hi,1i/x] = hi1, j1i, EJe1Kρ[hi,ji/x] = hi01, j10i, EJe2Kρ[hi,1i/x]= hi2, j2i, EJe2Kρ[hi,ji/x]= hi02, j20i.

by applying the denotational semantics rule we can derive that right hand side in the last relation is equal to j1 if i01 < i02, to j2 if i02< i01, and to j1t j2

otherwise.

On the other hand, by the rules of denotational semantics:

π1(EJ(Dx.min e1e2)eKρ) = π2(EJmin e1e2Kρ[i,1i/x]) which is equal to j1 if i1< i2, to j2if i2< i1, and to j1t j2otherwise.

Since EJe1K and E Je2K are self-related by R

i

σ, their value parts are independent from the derivative part so i1= i01 and i2= i02, from which the result follows.

The other cases can be proved in a similar way. ut

Riferimenti

Documenti correlati

To receive full credit, show all of your work.. Neither calculators nor computers

To receive full credit, show all of your work.. Neither calculators nor computers

Heteropolymer, Semiflexible Chain, Disorder, Persistence Length, Large Scale Limit, Tensor Analysis, Non-Commutative Fourier

Simply choosing ad hoc sets of nodes leads to resampling the (unknown) function and to avoid this drawback, here we present the use of the so-called mapped bases or fake nodes

Other forms of enforcement, on the other hand, lend themselves to almost complete automation, such as in the case of enforcement over (at least certain types of)

Indeed, we use L 2 methods with respect to these sequences of measures: we can prove rigorously that our L 2 maximixation procedure leads to the same asymptotic as the L ∞

While in our approach each virtual object is represented by a speci c data structure which contains a reference to the corresponding real object, in the O 2 approach only real

cose più importanti per lo studio della lingua e aiuta ad imparare la pronuncia, ma aiuta anche molto nella. comprensione e nella fluidità