2-3 and B-Trees
Section 16.4

m-Way Search Trees

Motivation

By considering search trees of degree greater than 2, we can arrive at tree structures for which insertion and deletion algorithms are simpler than those for AVL trees, yet still have O(logn) complexity.

B-Trees are the de facto standard for organizing large database systems and for external searching and sorting

They allow use of the Indexed Sequential Access Method (ISAM) for external directory sequential and random access. See page 636 for more discussion.

Basically, the idea is records do not all fit in the internal main memory and are stored in a file on disk.
IBM's Virtual Storage Access Method (VSAM) file organization uses B-tree.

In a B-tree an order is used to group the keys of records so that the B-tree will not grow very high. It grows from the bottom (leaf) to outward and upward.

The grouping of keys in each node is used to minimize the disk access time.

The size of the group defines the order of the B-tree.

The keys are used for comparison and can be page numbers in a disk storage system.

Multiway Search Trees

m-Way Search Trees


Figure 16.23 A seven-way search tree
Although it is useful to include external nodes when defining an m-way search tree, external nodes are not physcally represented in actual implementations. Rather, a null pointer appears wherever there would otherwise be an external node.

Definition: A B-Tree of order m is an m-way tree such that

Figure 16.24 A B-Tree of order 7

Online slides for B-Trees if you want more information.
We will focus on a specific B-Tree

2-3 Trees

A B-tree of order 3 is a 2-3 tree

The name reflects the fact that each internal node in a 2-3 tree has degree two or three.
A degree-two node is called a 2-node; a degree-three node is called a 3-node.

Definition: A 2-3 tree is a search tree that either is empty or satisfies the following properties:

  1. Each internal node is a 2-node or a 3-node. A 2-node has one element; a 3-node has two elements

  2. Let LeftChild and MiddleChild denote the children of a 2-node. Let dataL be the element in this node, and let dataL.key be its key. All elements in the 2-3 subtree with root LeftChild have key less than dataL.key, whereas all elements in the 2-3 subtree with root MiddleChild have key greater than dataL.key.

  3. Let LeftChild, MiddleChild, and RightChild denote the children of a 3-node. Let dataL anddataR be the two elements in this node.
    Then dataL.key < dataR.key; all keys in the 2-3 subtree with root LeftChild are less than dataL.key; all keys in the 2-3 subtree with root MiddleChild are less than dataR.key and greater than dataL.key, and all keys in the 2-3 tree with root RightChild are greater than dataR.key.

  4. All external nodes are at the same level.

Figure 16.25 A B-Tree of order 3, and another. (Second example illustrates a 2 node with the second side "empty".)

Remember: External nodes are introduced only for ease of communication. They are not physically represented inside the computer (rather they are references to null)

By convention, assume that no element has key MAXKEY and if only 2-node, dataR.key = MAXKEY

Its single element is kept in dataL, and LeftChild and MiddleChild reference its two children.

<KeyType> in the examples simply indicates that the Object must have a key, i.e, what we will search on. (For example, we shall assume that no element has key MAXKEY and adopt the convention that a 2 node has dataR.key=MAXKEY. In this situation, the 2 nodes single element is kept in data.L, an the LeftChild and MiddleChild point to its children. Its RightChild data member may be assigned any arbitrary value.)

Note, I thought this book had more on 2-3 trees than it did. I am sorry. I have translated the notes and (pseudo) code in these notes from a C++ book (from "Fundamentals of Data Structures in C++", Horowitz, Sahni, Mehta), so if I left pieces in by mistake, try to "read around" them - but please let me know and I will correct it)

Classes


class Two3Node {
public:
private:
   Element dataL, dataR;
   Two3Node LeftChild, MiddleChild, RightChild, parent;

   void display(int i);
   void treeprint(int i);
   int compare(Element x);
   void putIn(Element x, Two3Node node);
}

class Two3 {
private:
   Two3Node root;
   KeyType MAXKEY;

public:
   // Constructor

   boolean insert(Element x);
   boolean delete(Element x);
   Two3Node search(Element x);

   Two3Node findNode(Element x);
   void newRoot(Element x, Two3Node a);
   Element split(Two3Node node, Element x, Two3Node node2, Two3Node node3);
   void treeprint() 
   void display() 
}

Search

Code below searches a 2-3 tree for a node that contains an element with key x. The search method uses a method compare that compares a key x with the keys in a given node p. The method compare returns the values 1, 2, 3, 4, respectively, depending upon whether x is less than the first key, between the first and the second keys, greater than the second key, or equal to one of the keys in node p. The number of iterations of the for loop is bounded by the height of the 2-3 tree.
Hence, if the tree has n nodes, the complexity of search is O(log n)

Two3 tree Search method pseudo:

public Two3Node search(Element x)
{  // Search the 2-3 tree for an element x.  If the element is not in the tree, then
   // return null.  Otherwise, return a pointer to the node that contains the element.

   for (Two3Node p = root; p;)
      switch(p.compare(x))
      {
	 case 1: p = p.LeftChild; break;
	 case 2: p = p.MiddleChild; break;
	 case 3: p = p.RightChild; break;
	 case 4: return p;
      }
   return null;
}

Search looks left, middle, right (each as deep as can)

Insertion:

If insertion in that node puts value into dataR.key,

Insertion examples:

Insertion sometimes requires splits

Search for the unique location for indexed key finds that node (say p) full. (i.e., one is at a node with 3 values now)

Splits:

  1. create a new node q
    (p is split into p, q, and a median element)
  2. q will contain the element with the largest value of the three
  3. the element with the smallest will stay in p (the node found)
  4. the median element with the median key (and the reference to the new node q) will be inserted into the parent of p putIn
With the help of the web site: add 44 with explicit splits

public boolean Two3 insert(Element y)
{
	// Insert the element y into the 2-3 tree only if it does not already contain
       // an element with the key

   Two3Node p;
   Element x = y;
   if (x.key == MAXKEY) return false;    // invalid key
   if (root ==null) { newRoot(x, null); return true;}    // empty 2-3 tree
   p = findNode(x);
   if (p == null) {/*insertionError(); */ return false;}   // key already in 2-3 tree
   for (Two3Node a = null;;)
      if (p.dataR.key == MAXKEY) {   // p is a 2-node
	 p.putIn(x,a);
	 return true;
      }
      else {                         // p is a 3 node
	 Two3Node olda = a;
	 a = new(Two3Node);
	 x = split(p, x, olda, a);
	 if (root == p) {
	    newRoot(x, a);
	    return true;
	 }
	 else p = p.parent();
      }
}

This method makes use of additional methods that you will need to implement: (from "Fundamentals of Data Structures in C++", Horowitz, Sahni, Mehta)

  1. void newRoot(Element x, Two3Node a)
      This is invoked when the root of the 2-3 tree is to change. The single element x is inserted into the new root, while a is made its MiddleChild. The old root becomes the LeftChild of the new root
  2. Two3Node findNode(Element x)
      This is a modified version of search(Element x) (above). It searches a nonempty 2-3 tree for the presence of an element with key x.key. If this key is present, then it returns null. Otherwise, it returns the leaf node encountered in the search. Method insert() uses the variable p to store the node returned by findNode. findNode also creates a data structure that enables us to return from p to root. This data structure could be a list of the nodes on the path from root to p. Such a data structure is needed, as following a node split, it is necessary to access the parent of the node that was split.
  3. void insertionError(Element x, Two3Node root)
      When an attempt to insert an element whose key corresponds to that of an element already in the 2-3 tree is made, an insertion error occurs. This method signals an error.
  4. void putIn(Element x, Two3Node node)
      This method is used to insert an element x into a node (this) that has exactly one element in it. The subtree a is to be placed immediately to the right of x. So, if x becomes dataL, then a becomes MiddleChild, and the previous values of dataL and MiddleChild move to dataR and RightChild, respectively. If x becomes dataR, then a becomes RightChild
  5. Element split(Two3Node p, Element x, Two3Node olda, Two3Node a)
      This operates on a Two3Node (p) that initially contains two elements as follows: The newly created, empty node pointed at by a will contain the element with the largest key from among the two elements initially in p and the element x.
      The element with the smallest key will be the only element left in p. The three original children of p and the olda will occupy the four children data members that need to be defined in p and a. The method returns the element with the median key.

In insert, x denotes the element to be inserted into node p, and a denotes the node that was newly created at the last iteration of the for loop. For complexity analysis, we see that the total time taken is proportional to the height of the 2-3 tree.
Hence, insertion into a 2-3 tree with n elements takes O(log n) time.

Deletions:

Deletion from a 2-3 tree is conceptually no harder than insertion. If we are deleting an element that is not in a leaf node, then we transform this into a deletion from a leaf node by replacing the deleted element with a suitable element that is in a leaf. For example, if we are to delete the element with key 50 that is in the root of , then this element can be replaced by either the element with key 20 or the element with key 60. Both are leaf nodes. In a general situation, we may use either the element with the largest key in the subtree on the left or the element with the smallest key in the subtree on the right of the element being deleted.

Deletion examples:

Remember: If we are deleting an element that is not a leaf node, then we transform this into a deletion from a leaf node by replacing the deleted node by a suitable element that is in a leaf.

Thus we only have to "worry" about deletions from leaf nodes.

Also, the text these notes came from, say in the above example, "to delete the element with key 70, we must merely set dataR.key = MAXKEY in node C". Note that this implies that the code they provide herein, does not delete or make a cell null, but rather just sets its key value to be the specified conventional value for keys not allowed.

The patterns of what can happen will most certainly help you with writing your code.

Steps in deletion from a leaf of a 2-3 tree:
  1. Modify node p as necessary to reflect its status after the desired element has been deleted

  2. for ( ; p has zero elements && p != root ; p = r)
        {
    	let r be the parent of p, and let q be the left or right sibling of p (as appropriate);
    	if (q is a 3-node) perform a rotation
    	else perform a combine;
         }  
  3. If p has zero elements, then p must be the root. The left child of p becomes the new root, and node p is deleted.

Refinement of step 1 above:

public void deleteKey(Two3Node p, Element x)
  // Key x.key is to be deleted from the leaf node p
{
   if (x.key == p.dataL.key)  // delete first element
	if (p.dataR.key != MAXKEY)
	{  // p is a 3-node
		p.dataL = p.dataR;
		p.dataR.key = MAXKEY;
	}
	else
		p.dataL.key = MAXKEY;  // p is a 2-node
    else
	p.dataR.key = MAXKEY;  // delete second element
}
Code for a rotation when p is a left child of r:
  // Rotation when p is a left child of r, q is a middle child of r
   	p.dataL =  r.dataL;
	p.MiddleChild =  q.LeftChild; 
	r.dataL =  q.dataL;

	q.dataL =  q.dataR;
	q.LeftChild = q.MiddleChild;
	q.MiddleChild = q.RightChild;
   	q.dataR.key = MAXKEY;  
Finally, code to perform a combine when p is a left child of r: (cases 1 and 2 )
  // Combine when p is a left child of r, q is a right sibling of p

   	p.dataL =  r.dataL;
	p.dataR =  q.dataL;
	p.MiddleChild =  q.LeftChild;
	p.RightChild =  q.MiddleChild;  

	if (r.dataR.key == MAXKEY) // r was a 2-node
		r.dataL = MAXKEY;
	else
		{
		r.dataL.key = r.dataR;   
   		r.dataR.key = MAXKEY;  
		r.MiddleChild = r.RightChild;
		}

Analysis of 2-3 tree deletion: It should be evident that an individual rotation or combine takes O(1) time. If a rotation is performed, deletion is completed. If a combine is performed, p moves up a level in the 2-3 tree. Hence , the number if combines that can be performed cannot exceed the height of the 2-3 tree.
Consequently, deletion from a 2-3 tree with n elements takes O(log n) time.

Summary of methods you will need to do for 2-3 Tree implementation:

By the way, I have seen a recursive delete method for 2-3 trees...

Don't forget about this helpful applet page