** Modularity and Abstraction**

**9.2 Abstract types and modularity**

Suppose we want to implement a type of sets in OCaml. How do we go about that process? The first step of design, as always, is to understand the problem—what concepts are involved and how do they relate to one another. Clearly, sets contain elements, but, just as we can have a list of integers and a list of strings, the element type can vary from set to set.

Thus, we expect the^{set}type to be generic over its element type:

1Actually OCaml’s libraries do provide an implementation of sets in the^{Set}module;

here we’ll see how one could write this library oneself.

**type ’a set** **= ...** (* ’a is the type of elements *)

How do we create a set and how do we manipulate the sets once we have them? Here there are many possible design alternatives: we are look-ing for a simple list of operations that will allow us to create and use sets flexibly. We certainly need a way of creating an empty set, and it seems reasonable to be able to add an element to or remove an element from an existing set. Taking a cue from mathematics, we might also consider adding a union operation. It is also worth thinking about how sets re-late to other datatypes like lists: for example, we might want to be able to create a set out of a list of elements. Putting all of these considerations together, we arrive at the following operations for creating sets:

**let** empty **:** **’a set** **= ...**

**let** add **(x:’a) (s:’a set) :** **’a set** **= ...**

**let** remove **(x:’a) (s:’a set) :** **’a set** **= ...**

**let** union **(s1:’a set)(s2:’a set) :** **’a set** **= ...**

**let** list_to_set **(l:’a** **list) :** **’a set** **= ...**

We also need a way of examining the contents of a set, determining whether a given set is empty, or whether it is equal to another set. It might also be useful to be able to enumerate the elements of a set as a list. This yields these further operations:

**let** is_empty **(x:’a set) :** **bool** **= ...**

**let** member **(x:’a) (s:’a set) :** **bool** **= ...**

**let** equal **(s1:a set) (s2:a set) :** **bool** **= ...**

**let** elements **(s:’a set) :** **’a** **list** **= ...**

**Interfaces:**

^{.mli}**files and signatures**

Having understood the concepts related to the set datatype and identified the operations that connect them, we can now proceed to the second step of the design process: formalizing the interface. To do so, we need to un-derstand a bit about how programming languages package together code as re-usable components.

In programming languages jargon, a module is an independently-com-pilable collection of code that (typically) provides a few data types and their associated operations. Modules provide a way of decomposing large software projects into several different pieces, each of which can be

de-veloped in isolation. Modules that are intended to provide common and widely-used implementations of datastructures, algorithms, or other op-erations, are often called libraries.

A key feature of modules is that they provide boundaries between dif-ferent parts of a program. Modules separate code at interfaces, which spec-ify the ways in which external code (outside the module) can legally inter-act with the module’s implementation, which is the code inside the module that determines the behavior of the module’s operations.

The point of the interface is that code outside the module can be writ-ten only with reference to the interface—the external code can’t (and shouldn’t) depend on the particular details of how the operations sup-ported by the interface are implemented.

Different programming languages provide different mechanisms for specifying module interfaces—C uses “header” files and Java uses the

interface keyword, for example. As we shall see, OCaml uses either a

.mlifile or a “module type signature”. The commonality among these ap-proaches is the ability to write down a specification using the types of the operations in the module.

For the ^{’a set}example above, if we look at only the types of each of
the operations, we are left with this type signature:

**type ’a set** (* ’a is the element type *)
**val** empty **:** **’a set**

**val** add **:** **’a** **->** **’a set** **->** **’a set**
**val** union **:** **’a set** **->** **’a set** **->** **’a set**
**val** remove **:** **’a** **->** **’a set** **->** **’a set**
**val** list_to_set **:** **’a** **list** **->** **’a set**
**val** is_empty **:** **’a set** **->** **bool**

**val** member **:** **’a** **->** **’a set** **->** **bool**
**val** equal **:** **’a set** **->** **’a set** **->** **bool**
**val** elements **:** **’a set** **->** **’a** **list**

Note that we are using the “arrow” notation to specify function types
(see §2.7), so we can read the type of^{add}, for example, as a function that
takes a value of type^{’a}and and^{’a set}and returns an^{’a set}. Also note
that, unlike an implementation, we use the keyword** ^{val}**(instead of

**) to say that any module that satisfies this interface will provide a named value of the appropriate type. So any implementation of the**

^{let}^{set}module must provide an

^{add}function with the type mentioned above.

OCaml provides two ways of defining module interfaces. The first way
is to use OCaml modules corresponding to file names. In this style, we
put the above interface code in a file ending with the ^{.mli}extension, for
example ListSet.mli, and the implementation in a similarly-named ^{.ml}
file, for example ^{ListSet.ml}. The “i” part of the file extension stands for

“interface”, and each OCaml ^{.ml} file is, by default, associated with the
correspondingly named^{.mli}file.^{2}

The second way to define an interface is to use an explicitly named
module type signature. For example, we might put the following type
signature in a file called^{MySet.ml}:

**module type** Set **=** **sig** (* A named interface called Set *)
**type ’a set** (* ’a is the element type *)

**val** empty **:** **’a set**

**val** add **:** **’a** **->** **’a set** **->** **’a set**
**val** union **:** **’a set** **->** **’a set** **->** **’a set**
**val** remove **:** **’a** **->** **’a set** **->** **’a set**
**val** list_to_set **:** **’a** **list** **->** **’a set**
**val** is_empty **:** **’a set** **->** **bool**

**val** member **:** **’a** **->** **’a set** **->** **bool**
**val** equal **:** **’a set** **->** **’a set** **->** **bool**
**val** elements **:** **’a set** **->** **’a** **list**

**end**

This program gives the name^{Set}to the interface (a.k.a. module type)
defined by the signature between the** ^{sig}** and

**keywords. Other code can appear before or after this declaration, but it won’t be considered part of the**

^{end}^{Set}signature.

We can also create an explicitly-named module with a given interface
using a similar notation. Rather than put the code implementing each set
operation in a^{.ml}file, we do this:

2In fact, if you don’t create a^{.mli}file for a given^{.ml}file, the OCaml compiler will
create one for you by figuring out the most liberal interface it can for the given
imple-mentation. You may have noticed these files appearing when you work with OCaml
projects in Eclipse.

**module** LSet **:** Set **=** **struct**
**type ’a set** **= ...**

**let** empty **:** **’a set** **= ...**

**let** add **(x:’a) (s:’a set) :** **’a set** **= ...**

**let** remove **(x:’a) (s:’a set) :** **’a set** **= ...**

**let** union **(s1:’a set)(s2:’a set) :** **’a set** **= ...**

**let** list_to_set **(l:’a** **list) :** **’a set** **= ...**

**let** is_empty **(x:’a set) :** **bool** **= ...**

**let** member **(x:’a) (s:’a set) :** **bool** **= ...**

**let** equal **(s1:a set) (s2:a set) :** **bool** **= ...**

**let** elements **(s:’a set) :** **’a** **list** **= ...**

**end**

Here the keywords** ^{struct}**and

**delineate the code that is considered to be part of the**

^{end}^{LSet}module. The advantage of having a named interface is that we can re-use it in other contexts. For example, we might create a second, more efficient, implementation of sets in a new module:

**module** BSet **:** Set **=** **struct**

**...** (* a different implementation of sets *)
**end**

Regardless of whether we choose to use^{.mli}files or explicitly-named
interfaces, OCaml will check to make sure that the implementation
actu-ally complies with the interface. This means that every operation, type, or
value declared in the interface must have an identically-named, but fully
realized, implementation in the module. Moreover, OCaml will check that
the implementation and the interface agree with respect to the types they
use. The implementation may include more than necessary to meet the
interface—it can contain extra types, helper functions, or auxiliary values
that aren’t revealed by the interface.

Because each file of an OCaml program corresponds to a module, if
we want to access components of one file from another file, we have to
either use the module “dot” notation or ** ^{open}** the module to expose the
identifiers it defines. For example, if we have defined the set module
in

^{ListSet.ml}(whose interface is given by ListSet.mli), and we want to use those operations in a different module found in

^{Foo.ml}, we write

ListSet.<identifier>inside of^{Foo.ml}. For example, we might write:

**let** add_to_set **(s:int** **ListSet.set) :** **int** **ListSet.set** **=**
**ListSet.add** **1** **(ListSet.add** **2** s)

This can become burdensome, so OCaml also provides the** ^{open}**
com-mand, which reveals all of the operations defined in a module’s interface
without the need to use the

^{ListSet.}prefix. The following is equivalent to the above:

**;;** **open** ListSet

**let** add_to_set **(s:int** set) : **int** set **=**
add **1** **(add** **2** s)

There is one gotcha—if we use explicitly named modules, we still need
to either explicitly use dot notation for the implicitly defined module
cor-responding to the filename. For example, if the ^{Set}interface and the two
modules ^{LSet} and ^{BSet} were all defined in the ^{MySet.ml} file, we could
write:

**;;** **open** MySet (* reveal the LSet and BSet modules *)
(* work with LSet values *)

**let** add_to_lset **(s:int** **LSet.set) :** **int** LSet **=**
**LSet.add** **1** **(LSet.add** **2** s)

(* work with BSet values *)

**let** add_to_bset **(s:int** **BSet.set) :** **int** BSet **=**
**BSet.add** **1** **(BSet.add** **2** s)