
Linked Structures: Queues

16.4 Iteration and Tail Calls

Suppose that we want to extend the queue interface to include some operations that work with the queue as a whole, rather than just with one node at a time.

A simple example of such a function is the length operation, which counts the number of elements in the queue:

module type QUEUE = sig
  (* type of the data structure *)
  type 'a queue

  (* Get the length of the queue *)
  val length : 'a queue -> int
end

One simple way to implement this function is to directly translate the recursive definition of length that we are familiar with for immutable lists to the datatype of queues. Since the “linked” structure of the queues is given by the sequence of qnode values and the queue itself consists of just the head and tail pointers, we need to decompose the length operation into two pieces: one part that uses recursion as usual to process the qnodes and one part that deals with the actual queue type itself. (Note that this is yet another place where the structure of the types guides the code that we write.)

We can therefore write length like this:

(* Calculate the length of the list using recursion *)
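(* A sketch of the rest of the definition. The head field of the queue and the
   next field of each qnode follow the description above; the record
   declarations shown here for reference are assumptions about the types
   defined earlier in the chapter:
     type 'a qnode = { v : 'a; mutable next : 'a qnode option }
     type 'a queue = { mutable head : 'a qnode option;
                       mutable tail : 'a qnode option } *)
let length (q:'a queue) : int =
  let rec loop (no:'a qnode option) : int =
    begin match no with
    | None -> 0
    | Some n -> 1 + loop n.next
    end
  in loop q.head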

In this code, the helper function loop recursively follows the sequence of qnode values along their next pointers until the last node is found. Note that this function implicitly assumes that the queue invariants hold, since a cycle in the next pointers would cause the program to go into an infinite loop.

Although this program will compute the right answer, it is still somehow unsatisfactory: if we observe the behavior of a call to this version of length by using the ASM, we can see that the height of the stack portion of the ASM state is proportional to the length of the queue—each recursive call to loop will save a copy of the workspace consisting of 1 + (____) and create a new binding for the copy of no. The queue lecture slides give a complete animation of the ASM for such an example.1
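For example, for a three-element queue whose nodes we might call n1, n2, and n3 (hypothetical names used only for this illustration), the overall computation unfolds roughly like this; each pending addition corresponds to a saved workspace of the form 1 + (____) on the stack:

length q
= loop q.head                            (q.head is Some n1)
= 1 + (loop n1.next)                     (n1.next is Some n2)
= 1 + (1 + (loop n2.next))               (n2.next is Some n3)
= 1 + (1 + (1 + (loop n3.next)))         (n3.next is None)
= 1 + (1 + (1 + 0))
= 3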

It seems awfully wasteful to use up so much space just to count the number of elements in the queue—why not just walk down the queue, keeping a running total of the number of nodes we’ve seen so far? This version of length requires a slight modification to the loop helper function to allow it to keep track of the count:

(* Calculate the length of the list using iteration *)
let length (q:'a queue) : int =
  let rec loop (no:'a qnode option) (len:int) : int =
    begin match no with
    | None -> len
    | Some n -> loop n.next (len + 1)
    end
  in loop q.head 0

Why is this better? At first glance, it seems that we are stuck with the same problems as with the recursive version above. However, note that in this version of the code, the workspace pushed on to the stack at the recursive call to loop will always be of the form (____). That is, after the call to loop in the Some n branch of the match, there is no more work left to be done (in contrast, the first version always pushed workspaces of the form 1 + (____), which has a pending addition operation). If we watch the behavior of this program in the ASM, we see that the ASM will always perform a long series of stack pops, restoring these empty workspaces when loop finally returns the len value in the None branch.
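Concretely, on the same hypothetical three-element queue as before, the iterative version unfolds with no pending work between calls (again a rough sketch rather than the ASM’s exact notation):

length q
= loop q.head 0                          (q.head is Some n1)
= loop n1.next 1                         (n1.next is Some n2)
= loop n2.next 2                         (n2.next is Some n3)
= loop n3.next 3                         (n3.next is None)
= 3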

The observation that we will always immediately pop the stack after restoring an empty workspace suggests a way to optimize the ASM behavior: there is no need to push an empty workspace when we do such a function call since it will be immediately popped anyway. Moreover, at the time when we would have pushed an empty workspace, we can also eagerly pop any stack bindings created since the last non-empty workspace was pushed. Why? Since the workspace is empty, we know that none of those stack bindings will be needed again.

We say that a function call that would result in pushing an empty workspace is in a tail call2 position. The ASM optimizes such functions as described above—it doesn’t push the (empty) workspace, and it eagerly pops off stack bindings. This process is called tail call optimization, and it is frequently used in functional programming.
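As a self-contained illustration of the difference (a sketch using a factorial function that is not part of the queue example), compare a version whose recursive call has a pending multiplication with one whose recursive call is in tail position:

(* not a tail call: the multiplication is still pending when the recursive call returns *)
let rec fact (n:int) : int =
  if n <= 0 then 1 else n * fact (n - 1)

(* tail call: the recursive call is the last thing loop does, so the ASM can discard
   the caller's (empty) workspace and its stack bindings *)
let fact_iter (n:int) : int =
  let rec loop (n:int) (acc:int) : int =
    if n <= 0 then acc else loop (n - 1) (acc * n)
  in loop n 1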

Why is tail call optimization such a big deal? It effectively turns recursion into iteration. Imperative programming languages include for and while loops that are used to iterate a fragment of code multiple times. The essence of such iteration is that only a constant amount of stack space is used and that parts of the program state are updated each time around the loop, both to accumulate an answer and to determine when the loop should finish.
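OCaml itself also has while loops and mutable references, so the same iteration can be spelled imperatively. The sketch below (the name length_imperative is made up here, and the head and next fields are the same assumptions as above) makes the "constant stack space, update the state each time around the loop" pattern explicit:

let length_imperative (q:'a queue) : int =
  let len = ref 0 in                (* running count, updated on every iteration *)
  let cur = ref q.head in           (* the node currently being visited *)
  let finished = ref false in
  while not !finished do
    begin match !cur with
    | None -> finished := true                  (* reached the end of the queue *)
    | Some n -> len := !len + 1; cur := n.next  (* count the node and move on *)
    end
  done;
  !len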

In the iterative version of the queue length function shown above, the extra len argument is incremented at each call. If we watch how the ASM stack behaves when using tail call optimizations, we see that each such call to loop essentially modifies the stack bindings in place. That is, at every call to loop, there will be one stack binding for no and one stack binding for len3. Since, under tail call optimizations, those two old bindings will be popped and the two new bindings for no and len will be pushed each time loop is called, the net effect is the same as imperatively updating the stack. The queue lecture slides give a detailed animation of this behavior in the ASM.

The upshot is that writing a loop using tail calls is exactly the same as writing a while loop in a language like Java, C, or C++. Indeed, tail calls subsume even the break and continue keywords that are used to “jump out” of while loops—the break keyword corresponds to just returning an answer from the loop, and the continue keyword corresponds to calling loop in a tail-call position.
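For instance, a search function over the same queue type stops early by returning its answer (the analogue of break) and keeps going via a tail call (the analogue of continue). The contains function below is a sketch; the v field holding each node’s value is an assumption about the qnode record, as before:

let contains (q:'a queue) (x:'a) : bool =
  let rec loop (no:'a qnode option) : bool =
    begin match no with
    | None -> false                (* fell off the end: x was not found *)
    | Some n ->
      if n.v = x then true         (* "break": return the answer immediately *)
      else loop n.next             (* "continue": a tail call on the rest of the queue *)
    end
  in loop q.head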

1 To be fair, this same criticism applies to the recursive version of length for immutable lists that we studied earlier.

2 The word “tail” in this definition is not to be confused with the tail of a queue. The “tail” in tail call means that the function call occurs at the “end” of the function body. Note, however, that very often when writing loops over queues you will want a tail call whose argument is towards the tail of the queue, relative to the “current” node being visited.

3 And, technically, one for the loop function itself, which is saved in the closure for the loop code when it is moved to the heap.