Trees as data structures - HAMILTON CYCLES 4-25

N ETWORK TRAVERSAL

4.2. HAMILTON CYCLES 4-25

5.1.2 Trees as data structures

Trees are also used extensively to organize data in computer systems. In particular, they appear as so-called rooted trees, which is a tree with a single vertex designated as the root. To given an example of how trees can be used to represent data, consider the following well-known arithmetic expression describing one solution (if it exists) of the quadratic equation ax²+bx+c:

x= −b+√

b²−4ac 2a

Computers need to process such expressions, to which end they first need to be stored. This can be done conveniently in the form of the rooted tree shown in Figure 5.2. The tree contains two types of nodes. The leaf nodes, which are the ones having degree 1 forming the “lowest level” nodes, con-tain the variables or constants. In our example, we have one variable, namely x, and three constants, a, b, and c. The other, intermediate nodes, represent operations.

The link between the original expression and the tree may be better un-derstood if we rewrite the expression as

x= (−b+_sqrt(b∗b−4∗ (a∗c)))_/(₂∗a)

where we now use the function sqrt(y) as an equivalent notation for√ y.

Note how each operator has either one or two descendants, depending on

x /

-b

-* *

b b 4

a c

2 a

Figure 5.2: The representation of an arithmetic expression as a tree.

whether we are dealing with a unary operator (which operates on one vari-able or constant), or a binary operator (which takes two arguments).

Note 5.1 (More information)

In fact, we can replace each of the other operations with functions like sqrt as follows:

operation function type

= eq binary

+ sum binary

− min binary

∗ mul binary

/ div binary

− neg unary

√ sqrt unary

As said, we make a distinction between binary and unary operations, where it should be noted that the operation “−” is used in two different forms. Note also that sqrt is indeed a unary operation. With these functions, we can rewrite our original expression as:

eq(x,div(add(neg(b),sqrt(min(mul(b,b),mul(4,mul(a,b))))),mul(2,a))) What has happened in comparison to the original expression, is that we have switched from what is known as an infix notation to a prefix notation. In the former, operators are placed between variables and constants, whereas with the latter they are placed in front of them. To a computer it makes no difference.

The only thing that does matter is the organization of the rooted tree as given in Figure 5.2, as this tree is an unambiguous representation of the expression.

5.1. BACKGROUND 5-7

The common terminology for rooted trees that are used for data struc-tures is to say that every node has one or more descendants. Likewise, each node except the root node is said to have a parent. Note also that each node u having k descendants is the root of k subtrees, each subtree in turn rooted by a respective descendant of u. A special case is a binary tree in which there are exactly two descendants for each intermediate node.

Binary trees come in handy when we need to quickly look up elements in a finite ordered set. An ordered set S= {x₁, x₂, . . . , xn}, has the property that x_i < x_jif i < j. As an example, any finite subset of natural numbers forms an ordered set. Consider the set

S= {3, 6, 8, 12, 15, 20, 21, 27, 32, 33, 34, 45, 49, 51, 56, 60, 61}.

This set consists of 17 elements. To represent it as a binary tree, each element will be represented by a node. We demand for each node x that all elements in the left subtree represent values that are less than x, and all elements in the right subtree have larger values. Leaf nodes store no value; they are simply added for convenience as we discuss next.

Figure 5.3 shows the representation of our set S as a binary tree. Now suppose that we wish to look up the element x=16, which is not contained in S. In this case, we start at the root node, which has value 27, continue in its left subtree reaching 12. If x= 16 is contained in S, it must be stored in the right subtree, which brings us to node 20, and from there on to 15, where we can stop the traversal, now knowing that 16 is not in S.

12 49

20 33

32 34

51 61

60 6

3 8 15 21

Figure 5.3: The representation of a set of natural numbers as a binary tree.

Now compare this lookup operation with the situation that we had rep-resented S as a list. In that case, we would start from the first element and subsequently move through the list until reaching the first element larger than 16. On average, a lookup operation would require inspecting|S|/2 el-ements if S had been represented as a list. In the case of using binary trees,

one can show that the number of operations are approximately log₂(|S|), which is considerably less as S becomes larger. Further details on how to use trees for representing data in computers can be found in [Goodrich and Tamassia, 2002].

5.2 Fundamentals

Before discussing various tree-related algorithms, let’s first consider a few characteristic features of trees. We start with the following observation:

Theorem 5.1:For any connected graph G with n vertices and m edges, n≤m+1.

Proof. The proof proceeds by induction on the number of edges m. Clearly, if m=1, we necessarily have n=2 so that the theorem is true. Now assume the theorem is true for all graphs with fewer than k edges and consider a graph G with exactly k edges and n vertices.

Suppose that G contains a cycle C. In that case, choose an arbitrary edge e ∈ E(C)and construct the induced subgraph G^∗ = G−e. Because e was lying on the cycle C, G^∗will still be connected, meaning that n= |V(G^∗)| ≤

|E(G^∗)| +₁ = (k−1) +₁ = k. But in that case, we certainly have that n≤k+1.

If G does not contain a cycle, find a longest path P in G. Let u and w be the end points of P. Note that the degree of each these nodes must be 1, for otherwise P could not have been a longest path. Now consider the induced subgraph G^∗ = G−u. Clearly, G^∗ is connected and we have

|V(G^∗)| =n−1 and E(G^∗) =k−1. By induction, we thus also have that n−1≤ (k−1) +1=k, and thus n≤k+1, completing our proof.

Note 5.2 (Proof techniques)

Again, we have encountered a proof by induction. Note that the approach we have taken is common to many such proofs. After having proven that the theo-rem holds for an initial, generally almost trivial case, we proceed with assuming that the theorem holds up until and including the case that m=k. We then con-sider a situation with k+1 edges. In our attempt to prove the theorem, we try to reduce the new graph to one with at least one edge less, knowing that in that case we can assume the theorem holds. This brings us to a new starting situa-tion from where on we need to show that the theorem also holds in the original situation with k+1 edges.

Furthermore, note also that we have combined a proof by induction with extremality by looking at a longest path P, from which we then remove an edge

“at the extreme.” As we stated before, it is important to fully understand these

5.2. FUNDAMENTALS 5-9

proofs, as they enforce you to understand details and techniques that are com-mon to many graph-theoretical problems.

It should now come as no real surprise that trees obey the following prop-erty:

Theorem 5.2:For any tree T with n vertices and m edges, n=m+1.

Note that we already proved this theorem in Chapter 2 (Lemma 2.1 on page 2-34). We leave it as an exercise to the reader to provide an alterna-tive proof, based on the proof of Theorem 5.1. Interestingly, the implication formulated in the previous proof also holds in the opposite direction:

Theorem 5.3:A connected graph G with n vertices and m edges for which n = m+1, is a tree.

Proof. We prove the theorem by contradiction. To this end, assume G is not a tree, i.e., it contains a cycle C. Let edge e∈ E(C). Obviously, the induced subgraph G−e is still connected, but with one edge less than G. From Theorem 5.1 we know that|V(G−e)| ≤ |E(G−e)| +_{1. With}|V(G−e)| =n and|E(G−e)| =m−1, we thus have that n≤ (m−1) +1=m. However, we assumed that n = m+1, which contradicts that n ≤ m. Hence, our initial assumption, namely that G is not a tree, was false.

Let us proceed with another important characterization of trees:

Theorem 5.4:A graph G is a tree if and only if there exists exactly one path between every two vertices u and v.

Proof. Recall that the phrase “if and only if” means that we need to prove two things: (1) If G is a tree then there exists a unique path between every two vertices and (2) if there exists a unique path between every two vertices, then G is tree.

(1) Let G be a tree and let u and v be two distinct vertices. Because G is connected, there exists a (u, v)-path P. Assume there is another, distinct(_{u, v})-path Q. Let x be the last vertex on P that is also on Q when traversing P starting from u. In other words, the next vertex following x will be different for P and for Q, as shown in Figure 5.4.

Likewise, let y be the first vertex succeeding x that is common to both P and Q again. We have now identified a cycle in G, contradicting that G was a tree.

u x y v Q

Figure 5.4: The construction of a cycle based on two distinct(_{u, v})-paths.

(2) Now assume that G is not a tree. Note that because there is a path be-tween every two vertices, G is connected. If G is not a tree, there must be a cycle C = [v1, v2, . . . , vn = v1]. Clearly, for every two distinct vertices viand vj (i < j) on C we have also have two distinct(v_i, vj) -paths: P₁ = [v_i, v_i+1, . . . , v_j−1, v_j] and P₂ = [v_i, v_i−1, . . . , v₂, v₁ = vn, v_n−1, . . . , v_j+1, v_j], which contradicts the uniqueness of paths.

Before we provide another characterization of trees, we prove the following, intuitively simple theorem:

Theorem 5.5:An edge e of a graph G is a cut edge if and only if e is not part of any cycle of G.

Proof. Again, we need to prove two things: (1) If e is not part of any cycle, then e is a cut edge, and (2) if e is a cut edge, it cannot be part of any cycle of G.

(1) By contradiction: assume that e= hu, viis not a cut edge (and not part of any cycle). If e is not a cut edge, then u and v must still be in the same component of G−e. This implies that there is a(u, v)-path P in G−e connecting u and v. However, this also means that P+e is a cycle in G, which violates our assumption.

(2) Again, by contradiction: let e= hu, vibe a cut edge of G and let x and y be two vertices in different components of G−e. Because there is an(x, y)-path P in G connecting x and y, we necessarily have that e is part of P. Assume that u precedes v when traversing P from x to y. Let P1be the(x, u)-path part of P and P2the(v, y)-path that is part of P. If e were part of a cycle C, then u and v would be connected in G−e through the path C−e. Let u^∗ be the first vertex common to P₁and C−e when traversing P₁from x. Likewise, let v^∗be the first vertex common to P2and C−e when traversing P2from y. Let a−→^Q b denote that part of path Q that connects vertex a to b. Clearly, the path x −^P→¹ u^{∗ C−e}−−→v^{∗ P}−→² y connects x and y in G−e, contradicting that e was a cut edge. Hence, e cannot be part of any cycle.

Nel documento Graph Theory and Complex Networks (pagine 117-123)