The Abstract Stack Machine - Programming Languages and Techniques Lecture Notes for CIS 120 Ste

We saw in the last chapter that the simple substitution model of computation breaks down in the presence of mutable state. This chapter presents a more de-tailed model of computation, called the abstract stack machine that lets us faithfully model programs even in the presence of mutable state. While we develop this model in the context of explaining OCaml programs, small variants of it can be used to understand the behaviors of programs written in almost any other mod-ern programming language, including Java, C, C++, or C#. The abstract stack ma-chine (abbreviated ASM) is therefore an important tool for thinking about software behavior.

The crucial distinction between the ASM and the simple substitution model presented earlier is that the ASM properly accounts for the locations of data in the computer’s memory. Modeling such spatial locality is essential, precisely because mutable update modifies some part of the computer’s memory in place—the no-tion of “where” an update occurs is therefore necessary. As we shall see, the ASM gives an accurate picture of how OCaml (and other type-safe, garbage-collected languages like Java and C#) represents data structures internally, which helps us predict how much space a program will need and how fast it will run.

Despite this added realism, the ASM is still abstract—it hides many of the de-tails of the computer’s actual memory structure and representation of data. The level of detail present in the ASM is chosen so that we can understand the behavior of programs in a way that doesn’t depend on the underlying computer hardware, operating system, or other low-level details about memory. Such details might be needed to understand C or C++ programs for example, which aren’t type-safe and require programmers to manually manage memory allocation.

15.1 Parts of the ASM

Recall that the substitution model of computation had two basic notions:

• values, which are “finished” results such as integers (e.g. ⁰,¹, . . . ), tu-ples of values (e.g. ^(1,(2,3))), records (e.g. ^{x=0; ^y=1}), functions (e.g.

(fun (x:int) -> x + 1)), or constructors applied to values (e.g.Cons(3,Nil)).

• expressions, which are “computations in progress”, like ¹⁺²*3, ^{(f x)}, ^p.x,

begin match ... with ... end, etc..

The substitution model computes by simplifying an expression—that is, repeat-edly substituting values for identifiers, and performing simple calculations—until no more simplification can be done.

For programs that don’t use mutable state, the abstract stack machine will achieve the same net results. That is, for pure programs, the ASM can be thought of as a (complicated) way of implementing substitution and simplification. We didn’t previously specify precisely what was meant by “substitution”; instead we relied on your intuitions about what it means to replace an identifier with a value, and just left it at that. The ASM gives an explicit algorithm for implementing substitu-tion using a stack, and further refines the nosubstitu-tion of value and computasubstitu-tion model to keep track of where in memory the data structures reside.

There are three basic parts of the abstract stack machine model:

• The workspace keeps track of the expression or command that the computer is currently simplifying. As the program evaluates, the contents of the workspace change to reflect the progress being made by the computation.

• The stack keeps track of a sequence of bindings that map identifiers to their values. New bindings are added to the stack when the ^let expression is simplified. Later, when an identifier is encountered during simplification, its associated value can be found by looking in the stack. The stack also keeps track of partially simplified expressions that are waiting for the results of function calls to be computed.

• The heap models the computer’s memory, which is used for storage of non-primitive data values. It specifies (abstractly) where data structures reside, and shows how they reference one another.

Figure 15.1 shows these three parts of an ASM in action. The sections below explain each of these pieces in more detail and explains how they work together.

First, however, we need to understand how the ASM represents its values.

begin match l1 with!

Figure 15.1: A picture of an Abstract Stack Machine in mid evaluation of a list operation append a b. The code in the workspace is preparing to look up the value of the local identi-fier l1 in the stack (as indicated by the underline) before proceeding on to do a pattern match. The stack contains bindings for all of the identifiers introduced to this point, including append and the two lists a and b. It also has a saved workspace, which is there because append is recursive, and bindings for l1 and l2. The stack values are themselves references into the heap, which stores both code (for the body of the append function itself), and the list structures built from Cons and Nilcells. The arrows are references, as explained in §15.2.

15.2 Values and References to the Heap

The ASM makes a distinction between two kinds of values. Primitive values are integers, booleans, characters, and other “small” pieces of data for which the com-puter provides basic operations such as addition or the boolean logic operations.

All other values are references to structured data stored in the heap. A reference is the address (or location) of a piece of data in the heap. Pictorially, we draw a refer-ence as an “arrow”—the start of the arrow is the referrefer-ence itself (i.e. the address).

The arrow “points” to a piece of data in the heap, which is located at the reference address. Figure 15.2 shows how we visualize reference values in the ASM.

The heap itself contains three different kinds of data:

• A cell is labeled by a datatype constructor (such as ^Cons or ^Nil) and con-tains a sequence of constructor arguments. Figure 15.2 shows two different heap cells. One is labeled with constructor name^Cons, which has two argu-ments, namely³and a reference to the second heap cell, which is labeled^Nil

130 The Abstract Stack Machine Figure 15.2: A pictorial representation of reference values.

!"#$%&'("))*+,$

let p2 : point = . !

Figure 15.3: The state of the ASM just before it creates the stack binding for p2. Note that p2 is an alias of p1—they both point to the same record of the heap.

(and has no arguments). The arguments to a constructor are themselves just values (either primitive values or references.) Note that, conceptually, the constructor names like^Constake up some amount of space in the heap.

• A record contains a value for each of its fields. Unlike constructor cells, the field names of a record don’t actually take up space in the heap, but we mark them anyway to help us do book keeping. We mark mutable record fields using a “doubled” box, as shown, for example, by the record of type^pointin Figure 15.3. Such a mutable field is the “place” where an “in-place” update occurs, as we shall see below.

• A function, which is just an anonymous function value of the form^(fun (x1:t1) ... (xn:tn) -> e). Figure 15.1 shows that the stack binding for^appendis a reference to the code

for the^appendfunction itself.¹

1We will have to refine this notion of heap-allocated functions slightly to account for local func-tions. See 17.1 for details.

Nel documento Programming Languages and Techniques Lecture Notes for CIS 120 Steve Zdancewic Stephanie Weirich University of Pennsylvania March 25, 2018 (pagine 127-131)