We saw in the last chapter that the simple substitution model of computa-tion breaks down in the presence of mutable state. This chapter presents a more detailed model of computation, called the abstract stack machine that lets us faithfully model programs even in the presence of mutable state. While we develop this model in the context of explaining OCaml programs, small variants of it can be used to understand the behaviors of programs written in almost any other modern programming language, including Java, C, C++, or C#. The abstract stack machine (abbreviated ASM) is therefore an important tool for thinking about software behavior.

The crucial distinction between the ASM and the simple substitution model presented earlier is that the ASM properly accounts for the loca-tions of data in the computer’s memory. Modeling such spatial locality is essential, precisely because mutable update modifies some part of the computer’s memory in place—the notion of “where” an update occurs is therefore necessary. As we shall see, the ASM gives an accurate picture of how OCaml (and other type-safe, garbage-collected languages like Java and C#) represents data structures internally, which helps us predict how much space a program will need and how fast it will run.

Despite this added realism, the ASM is still abstract—it hides many of the details of the computer’s actual memory structure and representation of data. The level of detail present in the ASM is chosen so that we can understand the behavior of programs in a way that doesn’t depend on the underlying computer hardware, operating system, or other low-level de-tails about memory. Such dede-tails might be needed to understand C or C++

programs for example, which aren’t type-safe and require programmers to

manually manage memory allocation.

15.1 Parts of the ASM

Recall that the substitution model of computation had two basic notions:

• values, which are “finished” results such as integers (e.g. 0,1, . . . ), tuples of values (e.g.(1,(2,3))), records (e.g.{x=0; y=1}), functions (e.g.(fun (x:int) -> x + 1)), or constructors applied to values (e.g.

Cons(3,Nil)).

• expressions, which are “computations in progress”, like1+2*3,(f x),

p.x,begin match ... with ... end, etc..

The substitution model computes by simplifying an expression—that is, repeatedly substituting values for identifiers, and performing simple calculations—until no more simplification can be done.

For programs that don’t use mutable state, the abstract stack machine will achieve the same net results. That is, for pure programs, the ASM can be thought of as a (complicated) way of implementing substitution and simplification. We didn’t previously specify precisely what was meant by

“substitution”; instead we relied on your intuitions about what it means to replace an identifier with a value, and just left it at that. The ASM gives an explicit algorithm for implementing substitution using a stack, and further refines the notion of value and computation model to keep track of where in memory the data structures reside.

There are three basic parts of the abstract stack machine model:

• The workspace keeps track of the expression or command that the computer is currently simplifying. As the program evaluates, the contents of the workspace change to reflect the progress being made by the computation.

• The stack keeps track of a sequence of bindings that map identifiers to their values. New bindings are added to the stack when thelet ex-pression is simplified. Later, when an identifier is encountered dur-ing simplification, its associated value can be found by lookdur-ing in the stack. The stack also keeps track of partially simplified expressions that are waiting for the results of function calls to be computed.

begin match l1 with!

Figure 15.1: A picture of an Abstract Stack Machine in mid evaluation of a list opera-tionappend a b. The code in the workspace is preparing to look up the value of the local identifierl1in the stack (as indicated by the underline) before proceeding on to do a pat-tern match. The stack contains bindings for all of the identifiers introduced to this point, includingappendand the two listsaandb. It also has a saved workspace, which is there becauseappendis recursive, and bindings forl1andl2. The stack values are themselves references into the heap, which stores both code (for the body of theappendfunction it-self), and the list structures built from Consand Nilcells. The arrows are references, as explained in §15.2.

• The heap models the computer’s memory, which is used for stor-age of non-primitive data values. It specifies (abstractly) where data structures reside, and shows how they reference one another.

Figure 15.1 shows these three parts of an ASM in action. The sections below explain each of these pieces in more detail and explains how they work together. First, however, we need to understand how the ASM rep-resents its values.

15.2 Values and References to the Heap

The ASM makes a distinction between two kinds of values. Primitive val-ues are integers, booleans, characters, and other “small” pieces of data

136 The Abstract Stack Machine

Figure 15.2: A pictorial representation of reference values.

for which the computer provides basic operations such as addition or the boolean logic operations.

All other values are references to structured data stored in the heap.

A reference is the address (or location) of a piece of data in the heap. Pic-torially, we draw a reference as an “arrow”—the start of the arrow is the reference itself (i.e. the address). The arrow “points” to a piece of data in the heap, which is located at the reference address. Figure 15.2 shows how we visualize reference values in the ASM.

The heap itself contains three different kinds of data:

• A cell is labeled by a datatype constructor (such asConsorNil) and contains a sequence of constructor arguments. Figure 15.2 shows two different heap cells. One is labeled with constructor nameCons, which has two arguments, namely3 and a reference to the second heap cell, which is labeled Nil (and has no arguments). The argu-ments to a constructor are themselves just values (either primitive values or references.) Note that, conceptually, the constructor names likeConstake up some amount of space in the heap.

• A record contains a value for each of its fields. Unlike constructor cells, the field names of a record don’t actually take up space in the heap, but we mark them anyway to help us do book keeping. We mark mutable record fields using a “doubled” box, as shown, for example, by the record of typepointin Figure 15.3. Such a mutable field is the “place” where an “in-place” update occurs, as we shall see below.

• A function, which is just an anonymous function value of the form

(fun (x1:t1) ... (xn:tn) -> e). Figure 15.1 shows that the stack

let p2 : point = . ! let ans : int = 


p2.x <- 17; p1.x!

-+(.)'/0"$ 1#/0.$ 2"/'$

3415678$1'(*,9$6755$

'5$ x! 1!

y! 1!

Figure 15.3: The state of the ASM just before it creates the stack binding forp2. Note thatp2is an alias ofp1—they both point to the same record of the heap.

binding forappendis a reference to the code for theappendfunction itself.1

Nel documento Programming Languages and Techniques Lecture Notes for CIS 120 Steve Zdancewic Stephanie Weirich University of Pennsylvania February 9, 2014 (pagine 133-137)