A Language for Diﬀerentiable Functions

(1)

A Language for Differentiable Functions

Pietro Di Gianantonio¹ Abbas Edalat²

1 Dip. di Matematica e Informatica Universit`a di Udine, 33100 Udine, Italy

pietro.digianatonio@uniud.it

2 Department of Computing, Imperial College London ae@ic.ac.uk

Abstract. We introduce a typed lambda calculus in which real numbers, real functions, and in particular continuously differentiable and more generally Lipschitz functions can be defined. Given an expression representing a real-valued function of a real variable in this calculus, we are able to evaluate the expression on an argument but also evaluate the L-derivative of the expression on an argument. The language is an extension of PCF with a real number data-type but is equipped with primitives for min and weighted average to capture computable continuously differentiable or Lipschitz functions on real numbers. We present an operational semantics and a denotational semantics based on continuous Scott domains and several logical relations on these domains. We then prove an adequacy result for the two semantics. The denotational semantics also provides denotational semantics for Automatic Differen- tiation. We derive a definability result showing that for any computable Lipschitz function there is a closed term in the language whose evaluation on any real number coincides with the value of the function and whose derivative expression also evaluates on the argument to the value of the L-derivative of the function.

Introduction

Real-valued locally Lipschitz maps on finite dimensional Euclidean spaces enjoy a number of fundamental properties which make them the appropriate choice of functions in many different areas of applied mathematics and computation.

They contain the class of continuously differentiable functions, are closed under composition and the absolute value, min and max operations, and contain the important class of piecewise polynomial functions, which are widely used in geometric modelling, approximation and interpolation and are supported in Mat- Lab [4]. Lipschitz maps with uniformly bounded Lipschitz constants are closed under convergence with respect to the sup norm. Another fundamental property of these maps is that a Lipschitz vector field in Rⁿ has a unique solution in the initial value problem [3].

In the past thirty years, motivated by applications in control theory and op- timisation and using an infinitary double limit superior operation, the notion

(2)

of Clarke gradient has been developed as a convex and compact set-valued gen- eralized derivative for real-valued locally Lipschitz maps [2]. For example, the absolute value function, which is not classically differentiable at zero, is a Lips- chitz map which has Clarke gradient [−1, 1] at zero. The Clarke gradient extends the classical derivative for continuously differentiable functions.

Independently, a domain-theoretic Scott continuous Lipschitz derivative, later called the L-derivative, was introduced in [9] for interval-valued functions of an interval variable and was used to construct a domain for locally Lipschitz maps;

these results were then extended to higher dimensions [10]. It was later shown that on finite dimensional Euclidean spaces the L-derivative actually coincides with the Clarke gradient [6]. In finite dimensions, therefore, the L-derivative provides a simple and finitary representation for the Clarke gradient.

Since the mid 1990’s, a number of typed lambda calculi, namely extensions of PCF with a real number data type, have been constructed, including Real PCF, RL and LPR [12, 5, 19], which are essentially equivalent and in which computable continuous functions can be defined. Moreover, IC-Reals, a variant of LPR with seven digits, has been implemented with reasonable efficiency in C and Haskell [14].

It was relatively straightforward in [8] to equip Real PCF with the integral operator, which is in fact a continuous functional. However, adding a derivative operator to the language has proved to be non-trivial since classic differentiation may not be defined on continuous functions and even when defined it may not result in a continuous function. The development of the Scott continuous L-derivative, defined in a finitary manner, has therefore been essential for construction of a language with a derivative operator.

The aim of this work is to take the current extensions of PCF with a real number data type into a new category and define a typed lambda calculus, in which real numbers, real functions and in particular continuously differentiable and Lipschitz functions are definable objects. Given an expression e representing a function from real numbers to real numbers in this language, we would be able to evaluate both e and its L-derivative on an argument. In this paper we will only be concerned with the theoretical feasibility of such a language and not with questions of efficiency.

To develop such a language, we need to find a suitable replacement for the test for positiveness ((0 <) ), which is used in the current extensions of PCF with real numbers to define functions by cases. In fact, a function defined using the conditional with this constructor will not be differentiable at zero even if the two outputs of the conditional are both differentiable: Suppose we have two real computable functions f and g whose derivatives D f and D g are also computable, and consider l = λx. if (0 <) x then f x else g x. The function l is computable and there is an effective way to obtain approximations of the value of l(x) including at 0. However, there is no effective way to generate any approximation for the derivative of l, i.e., D l, at the point 0. In fact, it is correct to generate an approximation of Dl on 0 only if f (0) = g(0), but this equality

(3)

is undecidable, i.e., it cannot be established by observing the computation of f and g at 0 for any finite time.

In this paper, instead of the test (0 <) , we will use the functions minimum, negation and weighted average when defining continuously differentiable or Lip- schitz maps. These primitives are of course definable in Real PCF, RL and LPR, but the definitions are based on the test (0 <) , which means that the information about the derivative is lost.

By a simple transfer of the origin and a rescaling of coordinates we can take the interval [−1, 1] as the domain of definition of Lipschitz maps. Furthermore, by a rescaling of the values of Lipschitz maps (i.e., multiplying them with the reciprocal of their Lipschitz constant) we can convert them to non-expansive maps, i.e., we can take their Lipschitz constant to be one. Concretely, we take digits similar to those in Real PCF as constructors and develop an operational semantics and a denotational semantics based on three logical relations, and prove an adequacy result. The denotational semantics for first order types is closely related but different from the domain constructed in [9] in that we capture approximations to the function part and to the derivative part regarded as a sublinear map on the tangent space. Finally, we prove a definability result and show that every computable Lipschitz map is definable in the language as the limit of a sequence of piecewise linear maps with the convergence of their L- derivatives.

We note that all our proofs can be found in the Appendix.

0.1 Related work

Given a programme to evaluate the values of a function defined in terms of a number of basic primitives, Automatic Differentiation (also called Algorithmic Differentiation) seeks to use the chain rule to compute the derivative of the function [13]. AD is distinct from symbolic differentiation and from numerical differentiation. Our work can be regarded as providing denotational semantics for forward Automatic Differentiation and can be used to extend AD to computation of the generalised derivative of Lipschitz functions.

In [11], the differential λ-calculus, and in [17], the perturbative λ-calculus that integrates the latter with AD, have been introduced which syntactically model the derivative operation on power series in a typed λ-calculus or a full linear logic. Although apparently similar, our calculus and these two λ-calculi differ in almost every aspect: motivation, syntax, semantics, and the class of definable real functions. (i) These λ-calculi have been presented to analyse linear substitution and formal differentiation, (ii) the syntax is quite structured and contains constructors that have no correspondence in our setting, (iii) the semantics is based on differential categories and not on domain theory, and (iv) the definable real functions are limited to analytical maps which have power series expansion.

On the other hand, Computable Analysis [20, 21] and Constructive Analy- sis [1] are not directly concerned with computation of the derivative and both

(4)

only deal with continuously differentiable functions. In fact, a computable real- valued function with a continuous derivative has a computable derivative if and only if the derivative has a recursive modulus of uniform continuity [15, p. 191], [20, p. 53], which is precisely the definition of a differentiable function in constructive mathematics [1, p. 44].

1 Syntax

We denote the new language with PCDF (Programming language for Com- putable and Differentiable Functions).

The types of PCDF are the types of a slightly modified version of PCF where natural numbers are replaced by integers, together with a new type ι, an expression e of type ι denotes a real number in the interval [−1, 1] or a partial approximation of a real number, represented by a closed intervals contained in [−1, 1]. The set T of type expressions is defined by the grammar:

σ ::= o | ν | ι | σ → σ

where o is the type of booleans and ν is the type of integer numbers.

The expressions of PCDF are the expressions of PCF together with a new set of constants for dealing with real numbers. This set of constants is composed by the following elements:

(i) A constructor for real numbers given by: dig : ν → ν → ν → ι → ι. It is used to build affine transformations, and real numbers are obtained by a limiting process. The expression dig l m n represents the affine transformation λx.(l + m · x)/n, if 0 ≤ m < n and |l| ≤ n − m, or the constant function λx.0 otherwise. The above condition on l, m, n implies that dig l m n represents a rational affine transformation mapping the interval [−1, 1] strictly into itself with a non-negative slope or derivative 0 ≤ m/n < 1.

In this way we use three integers to encode a rational affine transformation;

of course it is possible to devise other encodings where just natural numbers or a single natural number is used, however these alternative encodings will be more complex.

The affine transformations definable by dig are also called generalised digits. Since there is no constant having type ι, an expression e having type ι can never normalise and its evaluation proceeds by producing expressions in the form dig l m n e⁰. These expressions give partial information about the value represented by e, namely they state that e represents a real number contained in the interval [(l + m)/n, (l − m)/n], which is the range of the function λx.(l + m · x)/n. During the reduction process, this interval is repeatedly refined and the exact result, a completely defined real number, can be obtained as the limit of this sequence.

(ii) The opposite sign function (negation) opp : ι → ι.

(iii) add : ν → ν → ι → ι, representing the function λ p q x . min((p/q + x), 1), if 0 < p < 2 · q, and the constant function λx. 0 otherwise.

(5)

We define sub p q x as syntactic sugar for the expression opp (add p q (opp x)), which returns the value max((x − (p/q), −1)).

(iv) A weighted average function av : ν → ν → ι → ι → ι. The expression av p q represents the function λx y . (p/q) · x + (1 − p/q) · y, if 0 < p < q, and the constant function λx y . 0 otherwise.

(v) The minimum function

min : ι → ι → ι

with the obvious action on pairs of real numbers. We define max x y as syntactic sugar for the expression opp (min (opp x)(opp y)).

(vi) A test function (0 <) : ι → o , which returns true if the argument is strictly greater than zero, and false if the argument is strictly smaller that zero. The test function can be used for constructing functions that are not differentiable, an example being the function λx. if (0 <) (x) then 1 else 0; as a consequence we impose some restriction in its use.

(vii) The if-then-else constructor on reals, if_ι : o → ι → ι → ι, and the parallel if-then-else constructor pif_ι : o → ι → ι → ι.

The main use for the parallel if operator is to evaluate, without loss of information, derivative of expressions containing the min operator. However, the parallel if operator can be completely avoided in defining non-expansive functions on real numbers. In fact in the constructive proof of our definability result, the parallel if operator is never used.

(viii) A new binding operator D. The operator D can bind only variables of type ι and can be applied only to expressions of type ι. In our language, Dx.e represents the derivative of the real function λx.e.

The differential operator D can be applied only to expressions that contain neither the constant (0 <) nor the differential operator D itself.

We note that, with the exception of the test functions (0 <) , all the new constants represent functions on reals that are non-expansive; the if-then-else constructors are also non-expansive if the distance between true (tt) and false (f f ) is defined to be equal to two, while the test function (0 <) cannot be non-expansive, whatever metric is defined on the Boolean values. The expressions containing neither the constant (0 <) nor the differential operator D are called non-expansive since they denote functions on real numbers that are non- expansive. This fact, intuitively true, is formally proved by Proposition 2. The possibility to syntactically characterise a sufficiently rich set of expressions representing non-expansive functions is a key ingredient in our approach that allows us to obtain information about the derivative of a function expression without completely evaluating it. For example, from the fact that e : ι is a non-expansive expression, one can establish that the derivative of λx.e, at any point, is contained in the interval [−1, 1] and that the derivative of λx.dig l m n e is contained in the smaller interval [−m/n, m/n].

(6)

2 Operational semantics

The operational semantics is given by a small step reduction relation, → , which is obtained by adding to the PCF reduction rules the following set of extra rules for the new constants.

The operational semantics of add and min operators uses an extra constant aff : ν → ν → ν → ι → ι. The expression aff l m n is intended to represent general affine transformations (including expansive ones) with a non-negative derivative, i.e., the affine transformation λx.(l + mx)/n with m ≥ 0, n > 0. A property preserved (i.e., invariant) by the reduction rules is that the constant aff appears only as the head of one of the arguments of min or as the head of the fourth argument of aff . It follows that in any expression e⁰ in the reduction chain of a standard expression e (without the extra constants aff ), the constant aff can appear only in the above positions.

The generalised digit dig l m n is a special case of an affine transformation.

Therefore, in applying the reduction rules, we use the convention that any reduction rule containing, on the left hand side, a general affine transformation aff can be applied also to terms where the affine transformation aff is substituted by the constructor dig .

On affine transformations we will use the following notations:

– (aff l1m1n1) ◦ (aff l2m2n2) stands for aff (l1· n2+ m1· l2) (m1· m2) (n1· n2), i.e., the composition of affine transformations.

– If m 6= 0, (aff l m n)⁻¹ stands for (aff (−l) n m), i.e., the inverse affine transformation; if m = 0, the expression (aff l m n)⁻¹ is undefined.

– The symbols l, m, n, p, q stand for values of integer type.

The reduction rules are the PCF reduction rules together the following set of extra rules. First we have three simple reductions:

– dig l m n e → dig 0 0 1 e if m < 0 or n ≤ 0 or |l| > n − m.

– add p q e → dig 0 0 1 e if p ≤ 0 or q ≤ p.

– av p q e₁e₂ → dig 0 0 1 e1 if p ≤ 0 or q ≤ p.

The above rules deal with those instances of dig , add , av with integer arguments that reduce to the constant zero digit. An implicit condition on the following set of rules is that they apply only if none of the above three rules can be applied.

1. dig l1m1n1(dig l2m2n2e) → ((dig l1m1n1) ◦ (dig l2m2n2)) e 2. opp (dig l m n e) → dig (−l) m n (opp e)

3. add p q e → min (aff p q q e)(dig 1 0 1 e)

note that (aff p q q) and (dig 1 0 1) represent the functions λx.p/q + x and λx.1 respectively.

4. av p q (dig l m n e1) e2 → dig l⁰m⁰n⁰(av p⁰q⁰e1e2)

where l⁰= l · p, m⁰= q⁰ = m · p + n · q − n · p, n⁰= n · q and p⁰ = m · p.

By a straightforward calculation, one can check that the left and the right parts of the reduction rules represent the same affine transformation on the arguments e1, e2.

(7)

5. av p q e₁(dig l m n e₂) → dig l⁰m⁰n⁰(av p⁰q⁰e₁e₂)

where l⁰= l(q − p), m⁰ = q⁰= np + mq − mp, n⁰= nq and p⁰= np.

6. min (dig l1m1n1e1)(aff l2m2n2e2) → dig l1m1n1e1

if (l1+ m1)/n1≤ (l2− m2)/n2.

The above condition states that every point in the image of (dig l1m1n1) is smaller, in the usual Euclidean order, than every point in the image of (aff l2m2n2), i.e., the first argument of min is certainly smaller that the second.

7. min (aff l1m1n1e1)(dig l2m2n2e2) → dig l2m2n2e2

if (l₂+ m₂)/n₂≤ (l1− m1)/n₁

The symmetric version of the previous rule.

8. min (dig l m n e₁) e₂ →

dig l⁰m⁰n⁰(min ((dig l⁰m⁰n⁰)⁻¹◦ (dig l m n) e₁) ((dig l⁰m⁰n⁰)⁻¹e₂)) if l + m < n and l⁰= l + m − n, m⁰= l + m + n 6= 0, n⁰= 2 · n.

The above equations imply that if (dig l m n) has image [a, b] then (dig l⁰m⁰n⁰) has image [−1, b]. The rule is justified by the fact if the first argument of min are smaller than b then the value of min is also smaller than b.

9. min e1(dig l m n e2) →

dig l⁰m⁰n⁰(min ((dig l⁰m⁰n⁰)⁻¹e1) ((dig l⁰m⁰n⁰)⁻¹◦ (dig l m n) e2)) if l + m < n and l⁰= l + m − n, m⁰= l + m + n 6= 0, n⁰= 2 ◦ n.

10. min (aff l₁m₁n₁e₁)(aff l₂m₂n₂e₂) →

dig l⁰m⁰n⁰(min ((dig l⁰m⁰n⁰)⁻¹◦(aff l₁m₁n₁) e₁)((dig l⁰m⁰n⁰)⁻¹◦(aff l₂m₂n₂) e₂)) if −1 < (l1+m1)/n1≤ (l2−m2)/n2and l⁰= l1−m1+n1, m⁰ = m1−l1+m1,

n⁰= 2 · n1.

The above equation implies that if (dig l1m1n1) has image [a, b] then (dig l⁰m⁰n⁰) has image [a, 1]. The rule is justified by the fact if both arguments of min are greater that a then the value of min is also greater than a.

11. min (aff l1m1n1e1)(aff l2m2n2e2) →

dig l⁰m⁰n⁰(min ((dig l⁰m⁰n⁰)⁻¹◦(aff l1m1n1) e1)((dig l⁰m⁰n⁰)⁻¹◦(aff l2m2n2) e2)) if −1 < (l₂−m2)/n₂< (l₁+m₁)/n₁, and l⁰ = l₂−m2+n₂, m⁰ = m₂−l2+m₂,

n⁰= 2 · n₂.

12. aff l₁m₁n₁(aff l₂m₂n₂e) → ((aff l₁m₁n₁) ◦ (aff l₂m₂n₂)) e 13. aff l m n e → dig l m n e

if −1 ≤ (l − m)/n, (l + m)/n ≤ 1 and e is not in the form aff a⁰b⁰e.

14. (0 <) (dig l m n e) → tt if (l − m)/n > 0 15. (0 <) (dig l m n e) → ff if (l − m)/n < 0 16. pif_ι e then dig l1m1n1e1else dig l2m2n2e2 →

dig l⁰m⁰n⁰(pif e then (dig l⁰m⁰n⁰)⁻¹◦ (dig l₁m₁n₁) e₁ else (dig l⁰m⁰n⁰)⁻¹◦ (dig l₂m₂n₂) e₂)

where n⁰ = 2 · n₁· n₂, m⁰ = max((l₁+ m₁) · n₂, (l₂+ m₂) · n₁) − min((l₁− m₁) · n₂, (l₂− m₂) · n₁), l⁰= 2 · min((l₁− m₁) · n₂, (l₂− m₂) + m⁰.

Here the values l⁰, m⁰, n⁰ are defined in such a way that if (dig l1m1n1) has image [a1, b1] and (dig l2m2n2) has image [a2, b2], then (dig l⁰m⁰n⁰) has image the convex closure of the set [a1, b1] ∪ [a2, b2].

(8)

The remaining rules for pif, if are included in the reduction rules for PCF and therefore are omitted from the present list.

17. N → N⁰

M N → M N⁰ if M is a constant different from the combinator Y or is an expression in the form min n₁, min n₁n₂, min n₁n₂n₃, min n₁n₂n₃M⁰, av n₁, av n₁n₂, av n₁n₂M⁰, add n₁, add n₁n₂, pif_ι M⁰ then, pif_ι M⁰ then M⁰⁰ else, where n₁, n₂, n₃ are values.

The reduction rules for the derivative operator are:

1. Dx. x → λy. dig 0 0 1 y

2. Dx. dig l m n e → λy. dig 0 m n (Dx. e)y 3. Dx. opp e → λy. opp (Dx. e)y

4. Dx. add p e q → λy. pif_ι(0 <) (add (q−p) q (opp e)) then (Dx. e)y else dig 0 0 1 y 5. Dx. av p q e1e2 → λy. av p q ((Dx. e1)y) ((Dx. e2)y)

6. Dx. min e1e2 →

λy. pif (λx. (0 <) (av 1 2 (opp e₁)e₂))y then (Dx. e₁)y else (Dx. e₂)y

7. Dx. pif_ι e₁then e₂else e₃ → λy. pif_ι (λx. e₁)y then (Dx. e₁)y else (Dx. e₂)y 8. Dx. if e₁then e₂else e₃ → λy. if (λx. e₁)y then (Dx. e₁)y else (Dx. e₂)y 9. Dx. Y e → Dx. e(Y e)

10. Dx. (λy. e)e1. . . en → Dx. e[e1/y]e2. . . en

Note that the rules for the derivative operator are a direct derivation of the usual rules for the symbolic computation of the derivative of a function.

2.1 Examples

We will give some examples for defining non-analytic functions in later sections;

in particular we will show in the proof of definability how easily piecewise linear maps with rational coefficients are defined in the language. A useful technique to define analytic functions and real constants is to consider their Taylor series expansions and reduce the Taylor series to a sequence of applications of affine transformations. For example the value e − 2, where e is the Euler constant, is given by the Taylor series 1/2!+1/3!+1/4! . . .. Denoting the affine transformation λx.(1 + x)/n as f , the above series can be expressed as f (2)(f (3)(f (4)(. . .) . . .).

It follows that in PCDF e − 2 can be expressed as

(Y λf : ν → ι. λn : ν. f n. dig 1 1 n (f (n + 1))) 2.

Given an expression to represent product in PCDF, it is possible to use the above technique to express analytic functions. For example, suppose hp de- fines the half-product function λxy. x · y/2. Then, one can express the function λx. e^x/2− 1 − x/2 by the PCDF.

λx : ι. hp x ((Y λf : ν → ι → ι. λn : ν. λx : ι. hp x (dig 1 1 n (f (n + 1)))) 2 x) The half product hp is definable in PCDF by reducing product to a series of applications of the average and minimum function. The actual definition of the

(9)

function hp is lengthy and we will not present it here. As a simpler example of the technique involved, we present the definition of the function λx.x²/2.

Consider the following mutual recursive definition of the terms g, h : ν → ι → ι

g 0 x = max x (opp x)

h n x = add 1 (2ⁿ⁺¹) (g n (add 1 (2ⁿ⁺¹) x)) g (n + 1) x = min (h n x)(h n (opp x))

By standard techniques, one can derive a PCDF expression g satisfying the above recursive definition. The careful reader can check that the term:

λx. (sub 1 2 (av 1 2 (g 0 x)(av 1 2 (g 1 x)(av 1 2 (g 2 x) . . . (av 1 2 (g n x) (dig 101x) . . .) represents the step-wise linear interpolation of the function λx.x²/2 on the points of the set {i/2ⁿ|i ∈ Z, −2ⁿ ≤ i ≤ 2ⁿ}. It follows that the function λx.x²/2 is defined by the term:

λx. (sub 1 2((Y λf : ν → ι → ι. λn : ν. λx : ι. (av 1 2 (g n x) (f (n + 1) x))) 0 x).

3 Denotational Semantics

The denotational semantics for PCDF is given in the standard way as a family of continuous Scott domains, U D := {D_σ| σ ∈ T }. The basic types are interpreted using the standard flat domains of integers and booleans. The domain associated to real numbers is the product domain Dι = I × I, where I is the continuous Scott domain consisting of the non-empty compact subintervals of the interval I = [−1, 1] partially ordered with reverse inclusion. Elements of I can represent either a real number x, i.e., the degenerated interval [x, x], or some partial information about a real number x, i.e., an interval [a, b], with x ∈ [a, b]. On the elements of I, we consider both the set-theoretic operation of intersection (∩), the pointwise extensions of the arithmetic operations, and the lattice operations on the domain information order (u, t), [12]. Function types have the usual interpretation of call-by-name programming languages: Dσ→τ = Dσ→ Dτ.

A hand waiving explanation for the definition of the domain Dι= I × I, is that the first component is used to define the value part of the function while the second component is used to define the derivative part. More precisely, a (non- expansive) function f from I to I, is described, in the domain, by the product of two functions hf₁, f₂i : (I × I) → (I × I): the function f1 : (I × I) → I represents the value part of f , in particular f₁(i, j) is the image of the interval i under f for all intervals j, i.e., f₁ depends only on the first argument. The second function f₂: (I × I) → I represents the derivative part. If D f denotes the derivative of f , then f₂(i, j) is the image of the intervals i and j under the function λx, y.D f (x) · y. Thus, f2 is linear in its second component and f2({x}, {1}) is the derivative of f at the point x.

Note that with respect to the above interpretation, composition behaves cor- rectly, that is if the pair hf₁, f₂i : (I × I) → (I × I) describes the value part and the derivative part of a function f : I → I and hg₁, g₂i : (I × I) → (I × I) describes a function g : I → I then hh1, h2i describes, by the chain rule, the function f ◦ g with h1(i, j) = f1(g1(i, j), g2(i, j)) and h2(i, j) = f2(g1(i, j), g2(i, j)).

(10)

The L-derivative of the non-expansive map f : I → I is the Scott continuous function L(f ) : I → I defined by [6]:

L(f )(x) =T{b ∈ I : ∃ open interval O ⊂ I, x ∈ O with

f (u)−f (v)

u−v ∈ b for all u, v ∈ O, u 6= v}.

Consider now the case of functions in two arguments. Given a function g : I → I → I, its domain description will be an element in (I × I) → (I × I) → (I × I), which is isomorphic to ((I × I) × (I × I)) → (I × I). Thus again, the domain description of g consists of a pair of functions hg₁, g₂i, with g1 describing the value part. If D g (x₁, x₂) is the linear transformation representing the derivative of g at (x₁, x₂), then the function g₂ is a domain extension of the real function λx₁, y₁, x₂, y₂.D g (x₁, x₂) · (y₁, y₂).

This approach for describing functions on reals is also used in (forward mode) Automatic Differentiation [13]. While Automatic Differentiation is different from our method in that it does not consider the domain of real numbers and the notion of partial reals, it is similar to our approach in that it uses two real numbers as input and a pair of functions to describe the derivative of functions on reals. Authomatic differentiation is also used in [17], while the idea of using two separated components to describe the value part and the derivative part in the domain-theoretic setting can be found also in [9].

The semantic interpretation function E is defined, by structural induction, in the standard way:

EJcK^ρ = BJcK EJxK^ρ = ρ(x)

EJe¹e2K^ρ = EJe¹K^ρ(EJe²K^ρ) EJλx

σ.eKρ = λd ∈ Dσ.EJeK(ρ[d/x])

The semantic interpretation of any PCF constant is the usual one, while the semantic interpretation of the new constants on reals is given by:

BJdig K(l, m, n, hi, j i) =







⊥ if l = ⊥ ∨ m = ⊥ ∨ n = ⊥

h[0, 0], [0, 0]i if ¬(0 ≤ m < n ∧ |l| ≤ n − m) hl/n + m/n · i, m/n · ji otherwise

BJopp K(hi, j i) = h−i, −j i

BJadd K(p, q, hi, j i) =











⊥ if p = ⊥ ∨ q = ⊥

h[0, 0], [0, 0]i if ¬(0 < 2 · p < q) hi + p/q, ji if i + p/q < 1 h[1, 1], [0, 0]i if i + p/q > 1 hi + p/q ∩ [−1, 1], j u [0, 0]i otherwise BJav K(p, q, hi1, j₁i, hi₂, j₂i)

=







⊥ if p = ⊥ ∨ q = ⊥

h[0, 0], [0, 0]i if ¬(0 < p < q)

hp/q · i1+ (1 − p/q) · i2, p/q · j1+ (1 − p/q) · j2i otherwise

(11)

BJmin K(hi¹, j1i, hi2, j2i) =







hi1, j1i if i1< i2

hi2, j2i if i1> i2

hi1min i2, j1u j2i otherwise

BJ(0 <) K(hi, j i) =







tt if i > 0 ff if i < 0

⊥ otherwise The interpretation of the derivative operator is given by:

EJDx.eKρ= λd ∈ I × I . hπ2(EJeKρ[hπ1d, 1i/x]), ⊥i

Note that the function BJ(0 <) K loses the information given by the derivative part, while the function EJDx.eK^ρ, is a sort of translation of the function EJλx.eK^ρ: The value of EJDx.eKρ is obtained from the derivative part of EJλx.eKρ, while the derivative part of EJDx.eKρis set to ⊥.

Consider some examples. The absolute value function can be implemented through the term Ab = λx.max (opp x)x with the following semantic interpretation:

EJAbK^ρ(hi, ji) =







hi, ji if i > 0 h−i, −ji if i < 0 h[k⁻, k⁺], [−1, 1]ji otherwise, where k⁻= max(i⁻, −i⁺), k⁺= max(i⁺, −i⁻) with i = [i⁻, i⁺].

When the absolute value function is evaluated at 0, where it is not differentiable, the derivative part of the semantic interpretation returns a partial value:

π2(EJAbK^ρ({0}, {1}) = [−1, 1]. This partial value coincides with the Clarke gradient, equivalently the L-derivative, of the absolute value function.

The function ^|x−y|₂ , is represented by the expression

Ab-dif = λx.y.max (xav 1/2(opp y))((opp x)av 1/2y) whose semantics is the function:

EJAb-difKρ(hi1, j1i, hi2, j2i) =







hⁱ¹⁻ⁱ₂ ²,^j¹^−j₂ ²i if i₁> i₂ hⁱ²⁻ⁱ₂ ¹,^j²^−j₂ ¹i if i1< j1

h[k⁻, k⁺], [−1/2, 1/2](j₁− j₂)i otherwise, ,

where k⁻= max(i⁻₁ − i⁺₂, i⁻₂ − i⁺₁) and k⁺= max(i⁺₁ − i⁻₂, i⁺₂ − i⁻₁).

FromJAb-difK it is possible to evaluate the partial derivative of the function

|x−y|

2 , not only along the axes x and y, but along any direction. Considering the Euclidean distance, the derivative of the function at (0, 0) in the direction of the unit vector (u/√

u²+ v², v/√

u²+ v²) is given by EJAb-difKρ(h{0}, {u/√

u²+ v²}i, h{0}, {v/√

u²+ v²}i), that is the the interval [−1/2, 1/2]^√^u−v

u²+v². Again this value coincides with the value of the Clarke gradient of the function ^|x−y|₂ at (0, 0) in the direction (u/√

u²+ v², v/√

u²+ v²).

(12)

3.1 Logical relations characterization

In the present approach we choose to define the semantic domains in the simplest possible way. As a consequence, our domains contain also points that are not consistent with the intended meaning, for example, the domain Dι→ι= (I×I) → (I × I) contains also the product of two functions hf1, f2i where the derivative part f2 is not necessarily linear in its second argument and is not necessarily consistent with the value part, i.e., the function f1; moreover the value part f1

can be a function depending also on its second argument.

However the semantic interpretation of (non-expansive) PCDF expressions will not have this pathological behaviour. A proof of this fact and a more precise characterisation of the semantic interpretation of expressions can be obtained through the technique of logical relations [18]. In particular we define a set of logical relations on the semantic domains and prove that, for any non-expansive PCDF expression e, the semantic interpretation of e satisfies these relations.

Using this method, we can establish a list of properties for the semantic interpretation of PCDF expressions.

Definition 1. The following list of relations are defined on the domain D_ι. – Independence: A binary relation Rⁱ_ι consisting of the pairs of the form

(hi, j1i, hi, j2i). The relation Rⁱ_ι is used to establish that, for a given function, the value part of the result is independent from the derivative part of the argument: f1(i, j1) = f1(i, j2).

– Sub-linearity: A family of relations R^l,r_ι indexed by a rational number r ∈ [−1, 1]. The family R^l,r_ι consists of pairs of the form (hi, j₁i, hi, j₂i) where j1v r · j2. These relations are used to establish the sublinearity of the derivative part: f2(i, r · j) v r · f2(i, j).

– Consistency: A family of ternary relation R^d,r_ι indexed by a rational number r ∈ (0, 2], consisting of triples of the form (hi₁, j₁i, hi2, j₂i, hi3, j₃i) with i3v i₁u i₂ and (r · j₃) consistent with (i₁− i₂), that is the intervals (r · j₃) and (i₁− i₂) have a non-empty intersection. This relation is used to establish the consistency of the derivative part of a function with respect to the value part.

The above relations are defined on the other ground domains D_o and D_ν as the diagonal relations in two or three arguments, e.g., R^d,r_ν (l, m, n) iff l = m = n. The relations are extended inductively to higher order domains by the usual definition on logical relations: R_σ→τⁱ (f, g) iff for every d1, d2 ∈ Dσ, Rⁱ_σ(d1, d2) implies Rⁱ_τ(f (d1), g(d2)), and similarly for the other relations.

Proposition 1. For any closed expression e : σ, for any rational number r ∈ [−1, 1], the semantic interpretation EJeK^ρ of e, is self-related by Rⁱ_σ, R^l,r_σ , i.e.

R_σⁱ(EJeK^ρ, EJeK^ρ), and similarly for R^l,r_σ . Moreover, if the expression e : σ is non-expansive, the semantic interpretation EJeKρ, is self-related by R^d,r_σ .

We now show how the three relations ensure the three properties of independence, sublinearity and consistency. To any element f = hf1, f2i in the domain Dι→ι= (I × I) → (I × I) we associate a partial function fv : I → I with

(13)

f_v(x) = y if f₁(h{x}, ⊥i) = {y}

undefined if f₁(h{x}, ⊥i) is a proper interval and a total function

fd: I → I = λx.f2(h{x}, {1}i))

The preservation of the relations R_ιⁱ, R^l,r_ι has the following straightforward consequences:

Proposition 2. (i) For any function f = hf1, f2i in Dι→ι self-related by Rⁱ_ι→ι, for every i, j1, j2, f1(hi, j1i) = f1(hi, j2i), the return value part is independent from the derivative argument.

(ii) For any function f = hf1, f2i in Dι→ι self-related by R^l,r_ι→ι for every i, j, and for every rational r ∈ [−1, 1], f2(hi, r · ji) v r · f2(hi, ji). It follows that:

– (f2(hi, {r}i))/r v f2(hi, {1}i), i.e., the most precise approximation of the L-derivative is obtained by evaluating the function with 1 as its second argument,

– for every i, j, f2(hi, −ji) = −f2(hi, ji), i.e., the derivative part is an odd function.

The preservation of the relation R^d,r_ι induces the following properties (see the Appendix for the proof):

Proposition 3. For any function f in Dι→ι self-related by R^d,r_ι→ι: (i) the function fv is non-expansive;

(ii) on the open sets where the functions fv is defined, the function fd is an approximation to the L-derivative of the function fv;

(iii) if f is a maximal element of D_ι→ι then f_v is a total function and f_d is the associated L-derivative.

3.2 Subdomains

By definition, the logical relations are closed under directed lubs, and as a consequence the sets of elements self-related by them are also closed under directed lubs.

For any ground type σ the relations Rⁱ_σ, R_σ^l,r, R^d,r_σ are closed under arbitrary meets, meaning that if ∀j ∈ J . Rⁱ_σ(dj, ej) then Rⁱ_σ(d

j∈Jdj,d

j∈Jej) and similarly for the other relations R^l,r_σ , R^d,r_σ . The proof is immediate for σ = o, ν, and is a simple check for σ = ι. The following result shows that this closure property holds also for σ = ι → ι.

Proposition 4. The set of elements in D_ι→ι self-related by any of the three relations R_ι→ιⁱ , R^l,r_ι→ι, and R^d,r_ι→ι is closed under arbitrary meets.

(14)

Proof. For the independence relation R_ι→ιⁱ , the closure property is trivial to check. For the consistency relation R^d,r_ι→ι, the closure under non-empty meets follows immediately from the fact that this relation is downward closed. The closure property for the sublinearity relation R_ι→ι^l,r is given in the Appendix.

We now employ the following result whose proof can be found in the Ap- pendix.

Proposition 5. In a continuous Scott domain, a non-empty subset closed under lubs of directed subsets and closed under non-empty meets is a continuous Scott subdomain.

Corollary 1. If σ is a ground type or first order type, then the set of elements in D_σ self-related by the three logical relations is a continuous Scott subdomain of D_σ.

As we do not deal with second or higher order real types in this extended abstract, we will not discuss the corresponding subdomains here.

3.3 Adequacy

As usual once an operational and denotational semantics are defined, it is nec- essary to present an adequacy theorem stating that the two semantics agree.

Let us denote by [a, b] Eval(e) the fact that there exits three integers l, m, n such that e →^? dig l m n e⁰ and [(l − m)/n, (l + m)/n] ⊂ (a, b). The proof of the following theorem is presented in the Appendix.

Theorem 1 (Adequacy). For every closed term e with type ι, interval [a, b]

and environment ρ, we have:

[a, b] Eval(e) iff [a, b] π1(EJeKρ)

In the operational semantics that we have proposed, the calculus of the derivative is performed through a sort of symbolic computation: the rewriting rules specify how to evaluate the derivative of the primitive functions and the application of the derivative rules essentially transforms a function expression into the function expression representing the derivative. The denotational semantics provides an alternative approach to the computation of the derivative, which almost exactly coincides with the computation performed by Automatic Differentiation. We can interpret our adequacy result as a proof that symbolic computation of the derivative and the computation of the derivative through Automatic Differentiation coincide. We remark in passing that, inspired by the denotational semantics, it is possible to define an alternative operational semantics that will perform the computation of the derivative in the same way that is performed by Automatic Differentiation.

(15)

3.4 Function definability

We will show in the following theorem that any computable Lipschitz function can be obtained in our framework as the limit, in the sup norm, of a sequence of piecewise linear maps definable in PCDF such that every piecewise linear map in the sequence gives lower and upper bounds for the function and the L- derivative of the function is contained in the L-derivatives of the piecewise linear maps, which converge to the classical derivative of the function wherever it is continuously differentiable.

Theorem 2. For any maximal computable function f in Dι→ι preserving the logical relations Rⁱ_ι→ι, R^l,r_ι→ι, R^d,r_ι→ι, there exists a closed PCDF expression f such that:

∀x ∈ I. fv(x) = (EJf K^ρ)v(x) ∧ fd(x) = (EJf K^ρ)d(x)

The above definability result states that if we consider only the behaviour of the domain functions on the total elements of Dι (i.e. the elements representing completely defined real numbers) then PCDF is sufficiently rich to represent the computable elements of Dι→ι.

We do not consider the problem of defining PCDF expressions whose semantics coincides with domain functions also on partial elements. The reason for this choice is that this later problem is technically more difficult and less interesting from a practical point of view.

The proof of the above result is quite lengthy: we define a general methodol- ogy to transform the information that can be extracted from a domain function into a PCDF expression. The Appendix contains a description of the construction.

4 Conclusion

We have integrated, in a single language, exact real number computation with the evaluation of the derivatives of function expressions.

The language has been designed using a minimal set of primitives sufficient to define any computable (and differentiable) function. It can be seen as a theoretical basis for the implementation of exact real number computation in a programming language. In a practical implementation, however, one needs both to extend the set of primitive functions and to carefully redesign the reduction strategy to increase both the usability of the language and the efficiency of the computation.

The main result presented here is an adequate denotational semantics for differentiable functions, which has required original ideas in developing the semantics domains, and a definability result showing the expressivity of the language.

The present research can be extended in several directions. Some possible future works are the following.

– An obvious problem to consider is whether the definability result presented in the paper can be extended to a larger class of function domains. We claim

(16)

that the techniques presented here can be easily adapted to functions with several arguments. This is not however the case when considering higher order functions, whose definability is an open problem.

– A second direction for possible further research is the treatment of the second derivative and more generally derivative of arbitrary order.

References

1. E. Bishop and D. Bridges. Constructive Analysis. Springer-Verlag, 1985.

2. F. H. Clarke. Optimization and Nonsmooth Analysis. Wiley, 1983.

3. E. A. Coddington and N. Levinson. Theory of Ordinary Differential Equations.

McGraw-Hill, 1955.

4. T. A. Davis and K. Sigmon. MATLAB Primer. CRC Press, seventh edition, 2005.

5. P. Di Gianantonio. An abstract data type for real numbers. Theoretical Computer Science, 221:295-326, 1999.

6. A. Edalat. A continuous derivative for real-valued functions. In New Computa- tional Paradigms, Changing Conceptions of What is Computable, pages 493–519.

Springer, 2008.

7. A. Edalat. A differential operator and weak topology for Lipschitz maps. Topology and its Applications, 157,(9):1629-1650, June 2010.

8. A. Edalat, M. Escard´o Integration in Real PCF. Information and Computation, 160:128–166, 2000.

9. A. Edalat, A. Lieutier. Domain theory and differential calculus (Functions of one variable). Mathematical Structures in Computer Science, 14(6):771–802, December 2004.

10. A. Edalat, A. Lieutier, and D. Pattinson. A computational model for multi-variable differential calculus. In Proc. FoSSaCS 2005, volume 3441 of LNCS, 3441: 505–519, 2005.

11. Thomas Ehrhard and Laurent Regnier. The differential lambda-calculus. Theoret- ical Computer Science, 309(1-3), 2003.

12. M. H. Escard´o. PCF extended with real numbers. Theoretical Computer Science, 162(1):79–115, August 1996.

13. A. Griewank and A. Walther. Evaluating Derivatives. Siam, second edition, 2008.

14. www.doc.ic.ac.uk/exact-computation/.

15. K. Ko. Complexity Theory of Real Numbers. Birkh¨auser, 1991.

16. G. Lebourg. Generic differentiability of lipschitzian functions. Transaction of AMS, 256:125–144, 1979.

17. O. Manzyiuk. A simply typed lambda calculus for forward automatic differentiation. In Proc MFPS 12, ENTCS 259–273, 2012.

18. J. C. Mitchell. Foundations of Programming Languages. MIT Press, 1996.

19. P. J. Potts, A. Edalat, and M. Escard´o. Semantics of exact real arithmetic. In Twelfth Annual IEEE Symposium on Logic in Computer Science. IEEE, 1997.

20. M. B. Pour-El and J. I. Richards. Computability in Analysis and Physics. Springer- Verlag, 1988.

21. K. Weihrauch. Computable Analysis (An Introduction). Springer, 2000.

(17)

Appendix

We give the details of several proofs.

4.1 Proof of Proposition 1

Proof. The proof is quite standard, and is based on the fact that the relations R_σⁱ, R^l,r_σ , R^d,r_σ are logical. First one proves that the semantic interpretation of (non-expansive) constants are self-related by R^d,r_σ , Rⁱ_σ, and R^l,r_σ . Then, to show that the fixed-point operator preserves the relations, one shows that the bottom elements are self-related by Rⁱ_σ, R^l,r_σ , and R^d,r_σ , and that the relations are closed under the lub of chains. Finally, by the basic lemma of logical relations, one

obtains the result. ut

Proof. (i) Let x and y be two real numbers for which the function fvis defined.

For any rational r ≥ |x − y| we have that

R_ι^d,r(h{x}, [−1, 1]i, h{y}, [−1, 1]i, h[x, y], [−1, 1]i). Therefore

R_ι^d,r(f (h{x}, [−1, 1]i), f (h{y}, [−1, 1]i), f (h[x, y], [−1, 1]i)), which implies that fv(x) − fv(y) ∈ r · f2(h[x, y], [−1, 1]i), and thus −1 ≤ ^f^v^(x)−f_x−y^v^(y)≤ 1.

(ii) Given any x ∈ I and any open interval O containing fd(x) = f2(h{x, }, {1}i), let [a, b] {x} be a rational interval such that fv is define on [a, b] and f₂(h[a, b], {1}i) ⊆ O, and let r = b − a, we have

R_ι^d,r(h{b}, [−1, 1]i, h{a}, [−1, 1]i, h[a, b], {1}i), by repeating the arguments of the previous point, it follows ^f^v^(b)−f_b−a^v^(a) ∈ f2(h[a, b], {1}i). By monotonicity of f it follows that for any pair of rationals a⁰, b⁰ ∈ (a, b), we have: ^f^v^(b_b⁰0^)−f−a^v⁰^(a⁰⁾ ∈ f₂(h[a, b], {1}i), and by continuity of f_vfor any pair of real numbers x, y ∈ (a, b)

fv(x)−fv(y)

x−y ∈ O.

(iii) If f is a maximal element Dι→ι, by point (i) the function fvis non-expansive on the points where it is defined. It follows that if the function fvis not defined at a given point x, it is always possible to construct a function f^◦ such that f v f^◦ and f_v^◦ defined on x, leading to a contradiction. Similar arguments can be used to prove that f_dis the L-derivative of f_v and not only an approximation of the L-derivative.

The sublinearity relation R^l,r_ι→ιis closed under non-empty meets.

Proof. To show that sublinearity is closed under meets, assume that f_k : I → I with k ∈ K is a family of Scott continuous functions satisfying the sublinearity fk(r[x, y]) ≥ rfk([x, y]) for all [x, y] ∈ I and some (rational) r ∈ [−1, 1]. We show that the meetd

kfk will also be sublinear.

(18)

We use the lower and upper parts of any f : I → I as f⁻, f⁺ : T → [−1, 1]

where T = {(x, y) ∈ [−1, 1] × [−1, 1] : x ≤ y}. Note that f⁻ and f⁺ are lower and upper semi-continuous respectively. Sublinearity of f is equivalent to f⁺(r(x, y)) ≥ rf⁺((x, y)) and f⁻(r(x, y)) ≤ rf⁻((x, y)) for r ∈ [−1, 1] and (x, y) ∈ T .

We have: (d

kf_i)⁺ = g with g = lim sup g₀ where g₀ = sup_k∈Kf_k⁺ and similarly (d

kfi)⁻= h with h = lim inf h0 where h0= infk∈Kf_k⁻.

The sublinearity condition for f_k is equivalent to f_k⁺(r(x, y)) ≥ rf_k⁺((x, y)) and f_k⁻(r(x, y)) ≤ rf_k⁻((x, y)) for r ∈ [−1, 1] and (x, y) ∈ T .

By taking pointwise sup and inf respectively we get: g₀(r(x, y)) ≥ rg₀((x, y)) and h₀(r(x, y)) ≤ rh₀((x, y)). By taking limsup and liminf respectively we obtain: g(r(x, y)) ≥ rg((x, y)) and h(r(x, y)) ≤ rh((x, y)) as required.

In a continuous Scott domain, a non-empty subset closed under lubs of direct subsets and closed under non-empty meets is a continuous Scott subdomain.

Proof. Let D be a continuous Scott domain and C ⊂ D a non-empty subset with the above closure properties. Given an element of d ∈ D, denote by i(d) the greatest lower bound (meet) in C of the set {c | d v c, c ∈ C}, if this set is not empty, otherwise let i(d) be undefined.

Then i, regarded as a partial function from D to C, preserves the well below relation . In fact given two elements x, y ∈ D with y D x and i(x) defined, we check that i(y) C i(x). Let A be a directed subset of elements in C, with i(x) vF

CA. Since C is closed under lub of directed sets,F

CA =F

DA, and by construction of i, we have x v i(x). Thus, x vF

DC, and, by hypothesis, there exists a ∈ A such that y v a. Since i is monotone and coincides with the identity on the elements of C, we have i(y) v i(a) = a and therefore i(y) _C i(x). It follows that if a set B is a basis of D then i(B) forms is a basis for C. In fact given an element x ∈ C, the set A = {a ∈ B | a x}, is a directed set with lub x, then i(A) is a directed set of elements well below i(x) = x, having x as lub.

Therefore C is a continuous dcpo, and since it has non-empty meets, it is also consistently complete.

4.5 Proof of Adequacy Theorem 1

For every closed term e with type ι and environment ρ, we have:

[a, b] Eval(e) iff [a, b] π1(EJeK^ρ)

Proof. We use the technique of computability predicates to prove both the soundness and the completeness of the operational semantics. Note that the soundness of the operational semantics cannot be proved by simply showing that the reduction rules preserve the denotational semantics, since this is simply not true. A simple example being the expression (Dx. dig 0 1 2 e1) e2that reduces

(19)

to dig 0 1 2 ((Dx. e₁) e₂). The elements EJ(Dx. dig 0 1 2 e1) e₂Kρ and

EJdig 0 1 2 ((Dx. e1) e₂)Kρdo not coincide on their second component (the first is

⊥, the other is above [−1/2, 1/2]. More generally, all the reduction rules for the derivative operator do not preserve the semantics on the second element.

We define a computability predicate Comp for closed terms of type o, ν by requiring that the denotational and operational semantics coincide, in the usual way. A closed term e having type ι satisfies the predicate Comp_ι if for every closed rational interval [a, b] and environment ρ we have: [a, b] Eval(e) iff [a, b] π₁(EJeKρ)

The computability predicate is then extended, by induction on types, to closed elements of any type, and, by closure, to arbitrary elements.

Using the standard techniques for computability predicate, it is possible to prove that all constants are computable, and that λ-abstraction preserves the computability of the expressions. Therefore all expressions not containing the derivative operator are computable.

To prove that the computability predicate is satisfied by expressions containing the derivative operator, we show, by structural induction on the non- expansive expression e, that the expression Dx.e is also computable.

The proof considers many cases. As an example, we take the case where e = min e₁e₂. By the induction hypothesis we can assume the computability of Dx.e1 and Dx.e2. Since the expressions e1, e2, pif and av do not contain the derivative operator, we can assume that they are computable.

We need to prove, for any computable expression e⁰, that [a, b] Eval((Dx.min e1e2)e⁰) iff [a, b] π1(EJ(Dx.min e1e2)e⁰Kρ).

On the one hand, we have the following chain of implications:

[a, b] Eval((Dx.min e1e2)e⁰) iff, by the denotational semantics rules, [a, b] Eval(pif (0 <) (av 1 2 (opp e1[e⁰/x])(e2[e⁰/x]))

then (Dx. e1)e⁰else (Dx. e2)e⁰)

iff, by computability of the expression right hand side [a, b] π1(EJpif (0 <) (av 0 1 2 (opp e¹[e⁰/x]) (e2[e⁰/x]))

then (Dx. e1)e⁰else (Dx. e2)e⁰K^ρ) If we pose: EJe

0

Kρ = hi, ji, EJe1Kρ[hi,1i/x] = hi₁, j₁i, EJe1Kρ[hi,ji/x] = hi⁰₁, j₁⁰i, EJe2Kρ[hi,1i/x]= hi₂, j₂i, EJe2Kρ[hi,ji/x]= hi⁰₂, j₂⁰i.

by applying the denotational semantics rule we can derive that right hand side in the last relation is equal to j1 if i⁰₁ < i⁰₂, to j2 if i⁰₂< i⁰₁, and to j1t j2

otherwise.

On the other hand, by the rules of denotational semantics:

π1(EJ(Dx.min e¹e2)eK^ρ) = π2(EJmin e¹e2Kρ[i,1i/x]) which is equal to j1 if i1< i2, to j2if i2< i1, and to j1t j2otherwise.

Since EJe1K and E Je2K are self-related by R

i

σ, their value parts are independent from the derivative part so i1= i⁰₁ and i2= i⁰₂, from which the result follows.

The other cases can be proved in a similar way. ut