Partial Functions: option types - Programming Languages and Techniques Lecture Notes for CIS 12

Consider the problem of computing the maximum integer from a list of integers.

At first sight, such a problem seems simple—we simply look at each element of the list and retain the maximum. We can start to program this functionality very straightforwardly by using the by-now-familiar list recursion pattern:

let rec list_max (l:int list) : int = begin match l with

| [] -> (* what to do here? *)

| x::tl -> max x (list_max tl) end

Unfortunately, this program doesn’t have a good answer in the case that the list is empty—there is no “maximum” element of the empty list.

What can we do? One possibility is to simply have the ^list_maxfunction fail whenever it is called on an empty list. This solution requires that we handle three separate cases, as shown below:

let rec list_max (l:int list) : int = begin match l with

| [] -> failwith "list_max called on []"

| x::[] -> x

| x::tl -> max x (list_max tl) end

The cases cover the empty list, the singleton list, and a list of two-or more ele-ments. We have to separate out the singleton list as a special case because it is the first length for which the^list_maxfunction is well defined.

This solution is OK, and can be very appropriate if we happen to know by external reasoning that^list_max will never be called on an empty lists. We saw such use of failure in the ^tree_maxfunction used to find the maximal element of

a nonempty binary search tree in §6. However, what if we can’t guarantee that the

list_max function won’t be called on an empty list? How else could we handle this possibility without failing, and thereby aborting the program¹

The problem is that ^list_maxis an example of a partial function—it isn’t well-defined for all possible inputs. Other examples of partial functions that you have encountered are the integer division operation, which isn’t defined when dividing by⁰, and the^Map.findfunction, which can’t return a good value in the case that a key isn’t in its set of key–value bindings.

It turns out that we can do better than failing, and, moreover, we already have all the tools needed. The idea is to create a datatype that explicitly represents the absence of a value. We call this datatype the^{'a option}datatype, and it is defined like this:

type 'a option =

| None

| Some of 'a

There are only two cases: either the value is missing, represented by the con-structor ^None, or the value is present, represented by ^{Some v}. As with any other datatype, we use pattern matching to determine which case occurs.

Option types are very useful for representing the lack of a particular value. For a partial function like ^list_max, we can alter the return type to specify that the result is optional. Doing so leads to an implementation like this:

(* NOTE: this version has a different return type! *) let rec list_max (l:int list) : int option =

begin match l with

| [] -> None (* indicates partiality *)

| x::tl -> begin match (list_max tl) with

| None -> Some x

| Some m -> Some max x m end

end

As you can see, this implementation handles the same three cases as in the one that uses ^failwith; the difference is that after the recursive call we must explic-itly check (by pattern matching against the result) whether list_max tl is well defined.

At first blush, this seems like a rather awkward programming style, and the need to explicitly check for^Noneversus^{Some v}onerous. However, it is often

possi-1OCaml, like Java and other modern languages, supports exceptions that can be caught and handled to prevent the program from aborting in the case of a failure. Exceptions are another way of dealing with partiality; we will cover them later in the course.

ble to write the program in such a way such checks can be avoided. For example, here is a cleaner way to write the same^list_maxfunction by using^fold:

let rec list_max (l:int list) : int option = begin match l with

| [] -> None

| x::tl -> Some (fold max x tl) end

The expressionfold max x tltakes the maximum element from among those in^tland^x, which is always well-defined.

It is also worth pointing out that because the type ^{'a option} is distinct from the type^'a, it is never possible to introduce a bug by confusing them—OCaml will force the programmer to do a pattern match before being able to get at the^'avalue in an^{'a option}.

For example, suppose we wanted to find the sum of the maximum values of each of two lists. We can write this program as:

let sum_two_maxes (l1:int list) (l2:int list) : int option = begin match (list_max l1, list_max l2) with

| (None, None) -> None

| (Some m1, None) -> Some m1

| (None, Some m2) -> Some m2

| (Some m1, Some m2) -> Some (m1 + m2) end

Here we are forced to explicitly think about what to do in the case that both lists are empty—here we make the choice to make sum_two_maxes itself return an ^int ^option. The option types prevent us from mistakenly trying to naively do (list_max l1) + (list_max l2), which would result in a program crash (or worse) if permitted.

In languages like C, C++, and Java that have a^nullvalue which can be given any type, it is an extremely common mistake to conflate^null, which should mean

“the lack of a value,” with “an empty value” of some type. For example, one might try to represent the empty list as^null. However, such conflation very often leads to so-called “null pointer exceptions” that arise because some part of the program treats a^nullas though it has type^'a, when it is really meant to be the ^Noneof an

'a option.

There is a big difference between “the absence of a list” and “an empty list”—

it makes sense to insert an element into the empty list, for example, but it never makes sense to insert an element into “the absence of a list.”

Sir Tony Hoare, Turing-award winner and a researcher scientist at Microsoft, invented the idea of ^null in 1965 for a language called A^LGOL W. He calls it his

“billion-dollar mistake” because of the amount of money the software industry has spent fixing bugs that arise due to unfortunate confusion of the^'aand^{'a option} types. Option datatypes provide a simple solution to this problem.

We will see that option types also play a crucial role when we study linked, mutable datastructures in §16.

Nel documento Programming Languages and Techniques Lecture Notes for CIS 120 Steve Zdancewic Stephanie Weirich University of Pennsylvania March 25, 2018 (pagine 109-113)