** Mutable State and Aliasing**

**14.1 Mutable Records**

To see how the “action at a distance” provided by mutable state can
sim-plify some programming problems, let’s consider a simple task. Suppose
we wanted to do a performance analysis of the ^{delete} operation for the
binary search trees we saw in §7. In particular, suppose that we want to
count the number of times that the helper function^{tree_max}is called,
ei-ther by^{delete}or via recursion.

First, let’s recall the definition of these functions:

(* returns the maximum integer in a NONEMPTY BST t *)
**let rec** tree_max **(t:tree) :** **int** **=**

**begin match** t **with**

**|** Empty **->** failwith "tree_max called on empty tree"

**|** Node(_,x,Empty) -> x

**|** Node(_,_,rt) -> tree_max rt
**end**

(* returns a binary search tree that has the same set of
nodes as t except with n removed (if it’s there) *)
**let rec** delete **(n:’a) (t:’a tree) :** **’a tree** **=**

**begin match** t **with**

**|** Empty **->** Empty

**|** Node(lt,x,rt) ->

**if** x **=** n **then**

**begin match** **(lt,rt)** **with**

**| (Empty,** Empty) -> Empty

**| (Node** **_,** Empty) -> lt

**| (Empty,** Node **_) ->** rt

**| _ ->** **let** m **=** tree_max lt **in**
Node(delete m lt, m, rt)
**end**

**else**

**if** n **<** x **then** Node(delete n lt, x, rt)
**else** Node(lt, x, delete n rt)

**end**

It isn’t too hard to modify the ^{tree_max} function to return both the
maximum value in the tree and a count of the number of times that it is
called (including by itself, recursively):

**let rec** tree_max2 **(t:’a tree) :** **’a** ***** **int** **=**
**begin match** t **with**

**|** Empty **->** failwith "tree_max called on empty tree"

**|** Node(_,x,Empty) -> (x, **1)**

**|** Node(_,_,rt) -> **let** **(m,** cnt) = tree_max2 rt **in** **(m,** cnt+1)
**end**

Now to modify the^{delete}function itself, we have to “thread through”

this extra information about the count:

**let rec** delete2 **(n:’a) (t:’a tree) :** **’a tree** ***** **int** **=**

**| (Empty,** Empty) -> (Empty, **0)**

**| (Node** **_,** Empty) -> (lt, **0)**

**| (Empty,** Node **_) -> (rt,** **0)**

**| _ ->** **let** **(m,** cnt1) = tree_max2 lt **in**
**let** **(lt2,** cnt2) = delete2 m lt **in**

**(Node(lt2,** m, rt), cnt1 **+** cnt2)
**end**

This is a bit clunky, but it gets even worse if we consider that code
that uses the ^{delete2} method will have to be modified to keep track of
the count too. For example, before these modifications, we could have
written a function that removes all of the elements in a list from a tree very
elegantly by using^{fold}like this:

**let** delete_all **(l:** **’a** **list) (t:** **’a tree) :** **’a tree** **=**
fold delete t l

After modifying ^{delete} to count the calls to ^{tree_max}, we now have
the much more verbose:

**let** delete_all2 **(l:** **’a** **list) (t:** **’a tree) :** **’a tree** ***** **int** **=**
**let** combine **(n:’a) (x:’a tree*****int) :** **’a tree** ***** **int** **=**

**let** **(delete_all_tl,** cnt1) = x **in**

**let** **(ans_t,** cnt2) = delete2 n delete_all_tl **in**
**(ans_t,** cnt1+cnt2)

**in**

fold combine **(t,0)** l

Ugh!

Mutable state lets us sidestep having to change ^{delete}and the all of
the code that uses it. Instead, we can declare a global mutable counter,

which only needs to be incremented whenever^{tree_max}is invoked. Let’s
see how to do this in OCaml:

**type** state **= {mutable** count **:** **int}**

**let** global **:** state **= {count** **=** **0}**

**let rec** tree_max3 **(t:’a tree) :** **’a** **=**

globals.count **<-** globals.count **+** **1;** (* update the count *)
**begin match** t **with**

**|** Empty **->** failwith "tree_max called on empty tree"

**|** Node(_,x,Empty) -> x

**|** Node(_,_,rt) -> tree_max3 rt
**end**

Here, the type^{state}is a record containing a single field called ^{count}.
This field is marked with the keyword ** ^{mutable}**, which indicates that the
value of this field may be updated in place. We then create an instance
of this record type—I have chosen to call it

^{global}as a reminder that this state is available to be modified or read anywhere in the remainder of the program. (We’ll see how to avoid such use of global state below, see §17.)

The only change to the program we need to make is to update the

global.countat the beginning of the^{tree_max}function. OCaml uses the
notationrecord.field <- valueto mean that the (mutable)^{field}
compo-nent of the given^{record}should be updated to contain^{value}. Such
expres-sions are commands—they return a^{unit}value, so the sequencing operator

‘^{;}’ (see §12) is useful when working with imperative updates.

Neither the^{delete}nor the^{delete_all}functions need to be modified.

At any point later in the program, we can find out how many times the

tree_maxfunction has been called by by simply doingglobal.count. We can reset the count at any time by simply doingglobal.count <- 0.

**14.2** **Aliasing: The Blessing and Curse of ** **Muta-ble State**

As illustrated by the example above, mutable state can drastically sim-plify certain programming tasks by allowing one part of the program to interact with a remove part of the program. However, this power is a double-edged sword: mutable state makes it potentially much more

diffi-cult to reason about the behavior of a program, and requires a much more sophisticated model of the computer’s state to properly explain.

To illustrate the fundamental issue, consider the following example.

Suppose we wanted to implement a type for tracking the coordinates of points in a 2D space. We might create a mutable datatype of points, and couple useful operations on them like this:

**type** point **= {mutable** x:int; **mutable** y:int}

(* shift a points coordinates *)

**let** shift **(p:point) (dx:int) (dy:int) :** **unit** **=**
p.x **<-** p.x **+** dx;

p.y **<-** p.y **+** dy

**let** string_of_point **(p:point) :** **string** **=**

"{x=" **ˆ (string_of_int p.x) ˆ**

"; y=" **ˆ (string_of_int p.y) ˆ** "}"

We can now easily create some points and move them around:

**let** p1 **= {x=0;y=0}**

**let** p2 **= {x=17;y=17}**

**;;** shift p1 **12 13**

**;;** shift p2 **2 4**

**;;** print_string **(string_of_point p1)** (* prints {x=12; y=13} *)

**;;** print_string **(string_of_point p2)** (* prints {x=19; y=21} *)

So far, so good.

Now consider this function, which simply sets the^{x}coordinates of two
points and then returns the new coordinate of the first point:

**let** f **(p1:point) (p2:point) :** **int** **=**
p1.x **<-** **17;**

p2.x **<-** **42;**

p1.x

What will this function return? The “obvious” answer is that since^{p1.x}
was set to^{17}, the result of this function will always be^{17}. But that’s wrong!

Sometimes this function can return^{42}. To see why, consider this example:

**let** p **= {x=0;y=0}** **in**

f p p (* f called with the same point for both arguments! *)

Calling^{f}on the same point twice causes the identifiers^{p1}and^{p2}

men-tioned in the body of^{f}to be aliases—these two identifiers are different names
for the same mutable record.

A more explicit way of showing the same thing is to consider the dif-ference between these two tests:

(* p1 and p2 are not aliases *)
**let** p1 **= {x=0;y=0}**

**let** p2 **= {x=0;y=0}**

**;;** shift p2 **3 4**

(* this test will PASS *)
**let** test () **:** **bool** **=**

p1.x **=** **0** **&&** p1.y **==** **0**

**;;** run_test "p1’s coordinates haven’t changed" test
(* p1 and p2 are aliases *)

**let** p1 **= {x=0;y=0}**

**let** p2 **=** p1

**;;** shift p2 **3 4**

(* this test will FAIL *)
**let** test () **:** **bool** **=**

p1.x **=** **0** **&&** p1.y **==** **0**

**;;** run_test "p1’s coordinates haven’t changed" test

Aliasing like that illustrated above shows how programs with muta-ble state can be subtle to reason about—in general the programmer has to know something about which identifiers might be aliases in order to understand the behavior of the program. For small examples like those above, this isn’t too difficult, but the problem becomes much harder as the size of the program grows. The two points passed to a function like

f might themselves have been obtained by some complicated computa-tion, the outcome of which might determine whether or not aliases are provided as inputs.

Such examples also motivate the need for a different model of compu-tation, one that takes into account the “places” affected by mutable up-dates. If we blindly follow the substitution model that has served us so well thus far, we obtain the wrong answer! Here is an example:

**let** p1 **= {x=0;y=0}**

**let** p2 **=** p1 (* create an alias! *)
**let** ans **=** p2.x **<-** **17;** p1.x

7−→ (by substituting the value for p1)

**let** p1 **= {x=0;y=0}**

**let** p2 **= {x=0;y=0}** (* alias information is lost *)
**let** ans **=** p2.x **<-** **17; {x=0;y=0}.x**

7−→ (by substituting the value for p2)

**let** p1 **= {x=0;y=0}**

**let** p2 **= {x=0;y=0}**

**let** ans **= {x=0;y=0}.x** **<-** **17; {x=0;y=0}.x**

7−→ update the x field “in place”, but we need to discard the result

**let** p1 **= {x=0;y=0}**

**let** p2 **= {x=0;y=0}**

**let** ans **=** ignore({x=17;y=0}); {x=0;y=0}.x

7−→

**let** p1 **= {x=0;y=0}**

**let** p2 **= {x=0;y=0}**

**let** ans **=(); {x=0;y=0}.x**

7−→ throw away the unit answer

**let** p1 **= {x=0;y=0}**

**let** p2 **= {x=0;y=0}**

**let** ans **= {x=0;y=0}.x**

7−→ project the x field

**let** p1 **= {x=0;y=0}**

**let** p2 **= {x=0;y=0}**

**let** ans **=** **0** (* WRONG *)

The next chapter develops a computation model, called the abstract stack machine, suitable for explaining the correct behavior of this and all other examples.