# Abstract types and modularity

## Modularity and Abstraction

### 9.2 Abstract types and modularity

Suppose we want to implement a type of sets in OCaml. How do we go about that process? The first step of design, as always, is to understand the problem—what concepts are involved and how do they relate to one another. Clearly, sets contain elements, but, just as we can have a list of integers and a list of strings, the element type can vary from set to set.

Thus, we expect thesettype to be generic over its element type:

1Actually OCaml’s libraries do provide an implementation of sets in theSetmodule;

here we’ll see how one could write this library oneself.

type ’a set = ... (* ’a is the type of elements *)

How do we create a set and how do we manipulate the sets once we have them? Here there are many possible design alternatives: we are look-ing for a simple list of operations that will allow us to create and use sets flexibly. We certainly need a way of creating an empty set, and it seems reasonable to be able to add an element to or remove an element from an existing set. Taking a cue from mathematics, we might also consider adding a union operation. It is also worth thinking about how sets re-late to other datatypes like lists: for example, we might want to be able to create a set out of a list of elements. Putting all of these considerations together, we arrive at the following operations for creating sets:

let empty : ’a set = ...

let add (x:’a) (s:’a set) : ’a set = ...

let remove (x:’a) (s:’a set) : ’a set = ...

let union (s1:’a set)(s2:’a set) : ’a set = ...

let list_to_set (l:’a list) : ’a set = ...

We also need a way of examining the contents of a set, determining whether a given set is empty, or whether it is equal to another set. It might also be useful to be able to enumerate the elements of a set as a list. This yields these further operations:

let is_empty (x:’a set) : bool = ...

let member (x:’a) (s:’a set) : bool = ...

let equal (s1:a set) (s2:a set) : bool = ...

let elements (s:’a set) : ’a list = ...

.mli

### files and signatures

Having understood the concepts related to the set datatype and identified the operations that connect them, we can now proceed to the second step of the design process: formalizing the interface. To do so, we need to un-derstand a bit about how programming languages package together code as re-usable components.

In programming languages jargon, a module is an independently-com-pilable collection of code that (typically) provides a few data types and their associated operations. Modules provide a way of decomposing large software projects into several different pieces, each of which can be

de-veloped in isolation. Modules that are intended to provide common and widely-used implementations of datastructures, algorithms, or other op-erations, are often called libraries.

A key feature of modules is that they provide boundaries between dif-ferent parts of a program. Modules separate code at interfaces, which spec-ify the ways in which external code (outside the module) can legally inter-act with the module’s implementation, which is the code inside the module that determines the behavior of the module’s operations.

The point of the interface is that code outside the module can be writ-ten only with reference to the interface—the external code can’t (and shouldn’t) depend on the particular details of how the operations sup-ported by the interface are implemented.

Different programming languages provide different mechanisms for specifying module interfaces—C uses “header” files and Java uses the

interface keyword, for example. As we shall see, OCaml uses either a

.mlifile or a “module type signature”. The commonality among these ap-proaches is the ability to write down a specification using the types of the operations in the module.

For the ’a setexample above, if we look at only the types of each of the operations, we are left with this type signature:

type ’a set (* ’a is the element type *) val empty : ’a set

val add : ’a -> ’a set -> ’a set val union : ’a set -> ’a set -> ’a set val remove : ’a -> ’a set -> ’a set val list_to_set : ’a list -> ’a set val is_empty : ’a set -> bool

val member : ’a -> ’a set -> bool val equal : ’a set -> ’a set -> bool val elements : ’a set -> ’a list

Note that we are using the “arrow” notation to specify function types (see §2.7), so we can read the type ofadd, for example, as a function that takes a value of type’aand and’a setand returns an’a set. Also note that, unlike an implementation, we use the keywordval(instead of let) to say that any module that satisfies this interface will provide a named value of the appropriate type. So any implementation of thesetmodule must provide anaddfunction with the type mentioned above.

OCaml provides two ways of defining module interfaces. The first way is to use OCaml modules corresponding to file names. In this style, we put the above interface code in a file ending with the .mliextension, for example ListSet.mli, and the implementation in a similarly-named .ml file, for example ListSet.ml. The “i” part of the file extension stands for

“interface”, and each OCaml .ml file is, by default, associated with the correspondingly named.mlifile.2

The second way to define an interface is to use an explicitly named module type signature. For example, we might put the following type signature in a file calledMySet.ml:

module type Set = sig (* A named interface called Set *) type ’a set (* ’a is the element type *)

val empty : ’a set

val add : ’a -> ’a set -> ’a set val union : ’a set -> ’a set -> ’a set val remove : ’a -> ’a set -> ’a set val list_to_set : ’a list -> ’a set val is_empty : ’a set -> bool

val member : ’a -> ’a set -> bool val equal : ’a set -> ’a set -> bool val elements : ’a set -> ’a list

end

This program gives the nameSetto the interface (a.k.a. module type) defined by the signature between thesig andendkeywords. Other code can appear before or after this declaration, but it won’t be considered part of theSetsignature.

We can also create an explicitly-named module with a given interface using a similar notation. Rather than put the code implementing each set operation in a.mlfile, we do this:

2In fact, if you don’t create a.mlifile for a given.mlfile, the OCaml compiler will create one for you by figuring out the most liberal interface it can for the given imple-mentation. You may have noticed these files appearing when you work with OCaml projects in Eclipse.

module LSet : Set = struct type ’a set = ...

let empty : ’a set = ...

let add (x:’a) (s:’a set) : ’a set = ...

let remove (x:’a) (s:’a set) : ’a set = ...

let union (s1:’a set)(s2:’a set) : ’a set = ...

let list_to_set (l:’a list) : ’a set = ...

let is_empty (x:’a set) : bool = ...

let member (x:’a) (s:’a set) : bool = ...

let equal (s1:a set) (s2:a set) : bool = ...

let elements (s:’a set) : ’a list = ...

end

Here the keywordsstructandenddelineate the code that is considered to be part of theLSetmodule. The advantage of having a named interface is that we can re-use it in other contexts. For example, we might create a second, more efficient, implementation of sets in a new module:

module BSet : Set = struct

... (* a different implementation of sets *) end

Regardless of whether we choose to use.mlifiles or explicitly-named interfaces, OCaml will check to make sure that the implementation actu-ally complies with the interface. This means that every operation, type, or value declared in the interface must have an identically-named, but fully realized, implementation in the module. Moreover, OCaml will check that the implementation and the interface agree with respect to the types they use. The implementation may include more than necessary to meet the interface—it can contain extra types, helper functions, or auxiliary values that aren’t revealed by the interface.

Because each file of an OCaml program corresponds to a module, if we want to access components of one file from another file, we have to either use the module “dot” notation or open the module to expose the identifiers it defines. For example, if we have defined the set module in ListSet.ml (whose interface is given by ListSet.mli), and we want to use those operations in a different module found inFoo.ml, we write

ListSet.<identifier>inside ofFoo.ml. For example, we might write:

let add_to_set (s:int ListSet.set) : int ListSet.set = ListSet.add 1 (ListSet.add 2 s)

This can become burdensome, so OCaml also provides theopen com-mand, which reveals all of the operations defined in a module’s interface without the need to use the ListSet.prefix. The following is equivalent to the above:

;; open ListSet

let add_to_set (s:int set) : int set = add 1 (add 2 s)

There is one gotcha—if we use explicitly named modules, we still need to either explicitly use dot notation for the implicitly defined module cor-responding to the filename. For example, if the Setinterface and the two modules LSet and BSet were all defined in the MySet.ml file, we could write:

;; open MySet (* reveal the LSet and BSet modules *) (* work with LSet values *)

let add_to_lset (s:int LSet.set) : int LSet = LSet.add 1 (LSet.add 2 s)

(* work with BSet values *)

let add_to_bset (s:int BSet.set) : int BSet = BSet.add 1 (BSet.add 2 s)