** Binary Search Trees**

**7.1 Creating Binary Search Trees**

We now know how to do efficient lookup in a binary search tree by ex-ploiting the ordering of the labels and the invariants of the tree. How do we go about obtaining such a tree?

One possibility would be to simply check whether a given tree satisfies the binary search tree invariant. We can code up the invariant as a boolean valued-function that determines whether a given tree is a binary search tree or not.

To do so, we first create two helper functions that can determine whether all of the nodes of a tree are less than (or greater than) a particular

intvalue:

(* helper functions for writing is_bst *)

(* (tree_less t n) is true when all nodes of t are strictly less than n

*)

**let rec** tree_less **(t:tree) (n:int) :** **bool** **=**
**begin match** t **with**

**|** Empty **->** true

**|** Node(lt, x, rt) ->

x **<** n **&& (tree_less lt n) && (tree_less rt n)**
**end**

(* (tree_gtr t n) is true when all nodes of t are strictly greater than n

*)

**let rec** tree_gtr **(t:tree) (n:int) :** **bool** **=**
**begin match** t **with**

**|** Empty **->** true

**|** Node(lt, x, rt) ->

x **>** n **&& (tree_gtr lt n) && (tree_gtr rt n)**
**end**

The ^{is_bst}function uses these two helpers to directly encode the
bi-nary search tree invariant as a program:

121 if you’re interested in such datastructures.

(* determines whether t satisfies the bst invariant *)
**let rec** is_bst **(t:tree) :** **bool** **=**

**begin match** t **with**

**|** Empty **->** true

**|** Node(lt, x, rt) ->

is_bst lt **&&** is_bst rt **&&**

**(tree_less lt x) && (tree_gtr rt x)**
**end**

This solution isn’t very satisfactory, though. It is very unlikely that some tree we happen to obtain from somewhere actually satisfies the bi-nary search tree invariant. Moreover, checking that the tree satisfies the invariant is pretty expensive.

A better way to construct a binary search tree, is to start with a simple
binary search tree like ^{Empty}, which trivially satisfies the invariant, and
then^{insert}or^{delete}nodes as desired to obtain a new binary search tree.

Each operation must preserve the binary search tree invariant: given a
binary search tree ^{t} as input, ^{insert t n} should produce a new binary
search tree that contains the same set of elements as ^{t} but additionally
contains^{n}. If^{t}happens to already contain^{n}, then the resulting tree is just

titself.

How can we implement such an insertion function? The key idea is
that inserting a new element is just like searching for it—if we happen
to find the element we’re trying to insert, then the result is just the input
tree. On the other hand, if the input tree does not already contain the
element we’re trying to insert, then the search will find a ^{Empty}tree, and
we just need to replace that empty tree with a new leaf node containing
the inserted element. In code:

(* ASSUMES: t is a binary search tree.

Inserts n into the binary search tree t,
yielding a new binary search tree *)
**let rec** insert **(t:tree) (n:int) :** tree **=**

**begin match** t **with**

**|** Empty **->**

(* element not found, create a new leaf *) Node(Empty, n, Empty)

**|** Node(lt, x, rt) ->

**if** x **=** n **then** t

**else if** n **<** x **then** Node **(insert lt n,** x, rt)
**else** Node(lt, x, insert rt n)

**end**

The code for^{insert}exactly mirrors that of^{lookup}, except that it returns
a tree at each stage rather than simply searching for the element. We have
to check that the resulting tree maintains the binary search tree invariant,
but this is easy to see, since we only ever insert a node ^{n} to the left of a
node^{x}when^{n}is strictly less than^{x}. (And similarly for insertion into the
right subtree.)

Deletion is more complex, because there are several cases to consider.

If the node we are trying to delete is not already in the tree, then ^{delete}
can simply return the original tree. On the other hand, if the node is in
the tree, there are three possibilities. First: the node to be deleted is a leaf.

In that case, we simply remove the leaf node by replacing it with^{Empty}.
Second, the node to be deleted has exactly one child. This case too is easy
to handle: we just replace the deleted node with its child tree. The last
case is when the node to be deleted has two non-empty subtrees. The
question is how delete the node while still maintaining the binary search
tree invariant.

This requires a bit of cleverness. Observe that the left subtree must be
non-empty, so it by definition contains a maximal element, call it^{m}, that is
still strictly less than^{n}, the node to be deleted. Note also that^{m}is strictly
less than all of the nodes in the right subtree of^{n}. Both of these properties
follow from the binary search tree invariant. We can therefore promote ^{m}
to replace^{n} in the resulting tree, but we have to also remove ^{m}(which is
guaranteed to be a leaf) at the same time.

Putting all of these observations together gives us the following code:

(* returns the maximum integer in a NONEMPTY bst t *)
**let rec** tree_max **(t:tree) :** **int** **=**

**begin match** t **with**

**|** Empty **->** failwith "tree_max called on empty tree"

**|** Node(_,x,Empty) -> x

**|** Node(_,_,rt) -> tree_max rt
**end**

(* returns a binary search tree that has the same set of
nodes as t except with n removed (if it’s there) *)
**let rec** delete **(n:int) (t:tree) :** tree **=**

**begin match** t **with**

**|** Empty **->** Empty

**|** Node(lt,x,rt) ->

**if** x **=** n **then**

**begin match** **(lt,rt)** **with**

**| (Empty,** Empty) -> Empty

**| (Node** **_,** Empty) -> lt

**| (Empty,** Node **_) ->** rt

**| _ ->** **let** m **=** tree_max lt **in**
Node(delete m lt, m, rt)
**end**

**else**

**if** n **<** x **then** Node(delete n lt, x, rt)
**else** Node(lt, x, delete n rt)

**end**