Mutable State and Aliasing

14.1 Mutable Records

To see how the “action at a distance” provided by mutable state can sim-plify some programming problems, let’s consider a simple task. Suppose we wanted to do a performance analysis of the delete operation for the binary search trees we saw in §7. In particular, suppose that we want to count the number of times that the helper functiontree_maxis called, ei-ther bydeleteor via recursion.

First, let’s recall the definition of these functions:

(* returns the maximum integer in a NONEMPTY BST t *) let rec tree_max (t:tree) : int =

begin match t with

| Empty -> failwith "tree_max called on empty tree"

| Node(_,x,Empty) -> x

| Node(_,_,rt) -> tree_max rt end

(* returns a binary search tree that has the same set of nodes as t except with n removed (if it’s there) *) let rec delete (n:’a) (t:’a tree) : ’a tree =

begin match t with

| Empty -> Empty

| Node(lt,x,rt) ->

if x = n then

begin match (lt,rt) with

| (Empty, Empty) -> Empty

| (Node _, Empty) -> lt

| (Empty, Node _) -> rt

| _ -> let m = tree_max lt in Node(delete m lt, m, rt) end

else

if n < x then Node(delete n lt, x, rt) else Node(lt, x, delete n rt)

end

It isn’t too hard to modify the tree_max function to return both the maximum value in the tree and a count of the number of times that it is called (including by itself, recursively):

let rec tree_max2 (t:’a tree) : ’a * int = begin match t with

| Empty -> failwith "tree_max called on empty tree"

| Node(_,x,Empty) -> (x, 1)

| Node(_,_,rt) -> let (m, cnt) = tree_max2 rt in (m, cnt+1) end

Now to modify thedeletefunction itself, we have to “thread through”

this extra information about the count:

let rec delete2 (n:’a) (t:’a tree) : ’a tree * int =

| (Empty, Empty) -> (Empty, 0)

| (Node _, Empty) -> (lt, 0)

| (Empty, Node _) -> (rt, 0)

| _ -> let (m, cnt1) = tree_max2 lt in let (lt2, cnt2) = delete2 m lt in

(Node(lt2, m, rt), cnt1 + cnt2) end

This is a bit clunky, but it gets even worse if we consider that code that uses the delete2 method will have to be modified to keep track of the count too. For example, before these modifications, we could have written a function that removes all of the elements in a list from a tree very elegantly by usingfoldlike this:

let delete_all (l: ’a list) (t: ’a tree) : ’a tree = fold delete t l

After modifying delete to count the calls to tree_max, we now have the much more verbose:

let delete_all2 (l: ’a list) (t: ’a tree) : ’a tree * int = let combine (n:’a) (x:’a tree*int) : ’a tree * int =

let (delete_all_tl, cnt1) = x in

let (ans_t, cnt2) = delete2 n delete_all_tl in (ans_t, cnt1+cnt2)

in

fold combine (t,0) l

Ugh!

Mutable state lets us sidestep having to change deleteand the all of the code that uses it. Instead, we can declare a global mutable counter,

which only needs to be incremented whenevertree_maxis invoked. Let’s see how to do this in OCaml:

type state = {mutable count : int}

let global : state = {count = 0}

let rec tree_max3 (t:’a tree) : ’a =

globals.count <- globals.count + 1; (* update the count *) begin match t with

| Empty -> failwith "tree_max called on empty tree"

| Node(_,x,Empty) -> x

| Node(_,_,rt) -> tree_max3 rt end

Here, the typestateis a record containing a single field called count. This field is marked with the keyword mutable, which indicates that the value of this field may be updated in place. We then create an instance of this record type—I have chosen to call itglobalas a reminder that this state is available to be modified or read anywhere in the remainder of the program. (We’ll see how to avoid such use of global state below, see §17.)

The only change to the program we need to make is to update the

global.countat the beginning of thetree_maxfunction. OCaml uses the notationrecord.field <- valueto mean that the (mutable)field compo-nent of the givenrecordshould be updated to containvalue. Such expres-sions are commands—they return aunitvalue, so the sequencing operator

;’ (see §12) is useful when working with imperative updates.

Neither thedeletenor thedelete_allfunctions need to be modified.

At any point later in the program, we can find out how many times the

tree_maxfunction has been called by by simply doingglobal.count. We can reset the count at any time by simply doingglobal.count <- 0.

14.2 Aliasing: The Blessing and Curse of Muta-ble State

As illustrated by the example above, mutable state can drastically sim-plify certain programming tasks by allowing one part of the program to interact with a remove part of the program. However, this power is a double-edged sword: mutable state makes it potentially much more

diffi-cult to reason about the behavior of a program, and requires a much more sophisticated model of the computer’s state to properly explain.

To illustrate the fundamental issue, consider the following example.

Suppose we wanted to implement a type for tracking the coordinates of points in a 2D space. We might create a mutable datatype of points, and couple useful operations on them like this:

type point = {mutable x:int; mutable y:int}

(* shift a points coordinates *)

let shift (p:point) (dx:int) (dy:int) : unit = p.x <- p.x + dx;

p.y <- p.y + dy

let string_of_point (p:point) : string =

"{x=" ˆ (string_of_int p.x) ˆ

"; y=" ˆ (string_of_int p.y) ˆ "}"

We can now easily create some points and move them around:

let p1 = {x=0;y=0}

let p2 = {x=17;y=17}

;; shift p1 12 13

;; shift p2 2 4

;; print_string (string_of_point p1) (* prints {x=12; y=13} *)

;; print_string (string_of_point p2) (* prints {x=19; y=21} *)

So far, so good.

Now consider this function, which simply sets thexcoordinates of two points and then returns the new coordinate of the first point:

let f (p1:point) (p2:point) : int = p1.x <- 17;

p2.x <- 42;

p1.x

What will this function return? The “obvious” answer is that sincep1.x was set to17, the result of this function will always be17. But that’s wrong!

Sometimes this function can return42. To see why, consider this example:

let p = {x=0;y=0} in

f p p (* f called with the same point for both arguments! *)

Callingfon the same point twice causes the identifiersp1andp2

men-tioned in the body offto be aliases—these two identifiers are different names for the same mutable record.

A more explicit way of showing the same thing is to consider the dif-ference between these two tests:

(* p1 and p2 are not aliases *) let p1 = {x=0;y=0}

let p2 = {x=0;y=0}

;; shift p2 3 4

(* this test will PASS *) let test () : bool =

p1.x = 0 && p1.y == 0

;; run_test "p1’s coordinates haven’t changed" test (* p1 and p2 are aliases *)

let p1 = {x=0;y=0}

let p2 = p1

;; shift p2 3 4

(* this test will FAIL *) let test () : bool =

p1.x = 0 && p1.y == 0

;; run_test "p1’s coordinates haven’t changed" test

Aliasing like that illustrated above shows how programs with muta-ble state can be subtle to reason about—in general the programmer has to know something about which identifiers might be aliases in order to understand the behavior of the program. For small examples like those above, this isn’t too difficult, but the problem becomes much harder as the size of the program grows. The two points passed to a function like

f might themselves have been obtained by some complicated computa-tion, the outcome of which might determine whether or not aliases are provided as inputs.

Such examples also motivate the need for a different model of compu-tation, one that takes into account the “places” affected by mutable up-dates. If we blindly follow the substitution model that has served us so well thus far, we obtain the wrong answer! Here is an example:

let p1 = {x=0;y=0}

let p2 = p1 (* create an alias! *) let ans = p2.x <- 17; p1.x

7−→ (by substituting the value for p1)

let p1 = {x=0;y=0}

let p2 = {x=0;y=0} (* alias information is lost *) let ans = p2.x <- 17; {x=0;y=0}.x

7−→ (by substituting the value for p2)

let p1 = {x=0;y=0}

let p2 = {x=0;y=0}

let ans = {x=0;y=0}.x <- 17; {x=0;y=0}.x

7−→ update the x field “in place”, but we need to discard the result

let p1 = {x=0;y=0}

let p2 = {x=0;y=0}

let ans = ignore({x=17;y=0}); {x=0;y=0}.x

7−→

let p1 = {x=0;y=0}

let p2 = {x=0;y=0}

let ans =(); {x=0;y=0}.x

7−→ throw away the unit answer

let p1 = {x=0;y=0}

let p2 = {x=0;y=0}

let ans = {x=0;y=0}.x

7−→ project the x field

let p1 = {x=0;y=0}

let p2 = {x=0;y=0}

let ans = 0 (* WRONG *)

The next chapter develops a computation model, called the abstract stack machine, suitable for explaining the correct behavior of this and all other examples.

Nel documento Programming Languages and Techniques Lecture Notes for CIS 120 Steve Zdancewic Stephanie Weirich University of Pennsylvania February 9, 2014 (pagine 126-133)