CSCI 311 - Program 2

B-Star Trees

Dr. Melody Stapleton, Instructor

Due Date: Monday, April 17, 2006 - BY NOON!!!

Your TA will provide the final input file(s) to run your program against.

You are to implement an order 9 and an order 13 B-Star Tree, as discussed in class and described in this assignment. The insert and delete processes for B-Trees need to be modified to accommodate the special features given in this assignment.
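The two orders determine how many keys a node may hold. The following is a minimal sketch that assumes the usual B* bounds: at most m-1 keys in a non-root node, at least floor((2m-2)/3) keys, and a root that can hold up to 2*floor((2m-2)/3) keys so that it can absorb a 2-into-1 combine. These bounds agree with the 5-value minimum the delete rules quote for an order 9 tree. The struct name `BStarBounds` is hypothetical and is not part of the provided node definitions.

```cpp
#include <cassert>

// Hypothetical helper: key-count bounds for a B-Star tree of order m
// (m = maximum number of children of a non-root node).
struct BStarBounds {
    int order;

    explicit BStarBounds(int m) : order(m) { assert(m >= 3); }

    // Non-root nodes hold at most m - 1 keys.
    int maxKeys() const { return order - 1; }

    // Non-root nodes stay roughly two-thirds full: at least floor((2m - 2) / 3) keys.
    int minKeys() const { return (2 * order - 2) / 3; }

    // Assumed root capacity: after a 2-into-1 combine the root must hold
    // minKeys + (minKeys - 1) + 1 separator = 2 * minKeys keys.
    int rootMaxKeys() const { return 2 * minKeys(); }
};
```

For order 9 this gives 5 to 8 keys per non-root node and a root of up to 10 keys; for order 13 it gives 8 to 12 keys and a root of up to 16 keys.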

B-Star Tree Special features: 

A B-Star Tree of order m is a search tree that satisfies the following properties (please note: in the following, I have used "floor" to represent the floor function and "ceiling" to represent the ceiling function):

For Inserts: 

You will write your insert function so that it always attempts to delay splits of full non-root nodes in the following fashion. If the node you are inserting into (the insertion node) is full and is a rightmost sibling, look at the immediate sibling to its left and see whether there is room in that node to move the parent value down into it and move a data value from the insertion node up into the parent position. (This is somewhat like a "rotation" and is basically the "moveleft" function used by Kruse's delete.) If the insertion node is a leftmost sibling, look to its immediate sibling to the right for the same purpose. If the node has both left and right siblings, first attempt to move data into the left sibling, provided it is not full. If the left sibling is full, then move the excess data into the right sibling. When you cannot move data into an immediate sibling to the left or right because they are full, you must go ahead and split the original insertion node. Please note that this rule for delaying splits needs to be applied "AT ALL LEVELS" of the tree. This means that if a node split causes a new value to be inserted into the parent node, and the parent node is full, you must first look to the siblings of that parent node to see whether they have room to accommodate another value before the parent itself is split. The moveleft or moveright algorithm, when invoked at a level of the tree other than the root, may result in a subtree having to "jump" over to a new parent to maintain the ordered search properties of the B-Star Tree. Such a case will be demonstrated in class and in homework.
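The sibling choice and the rotation described above can be sketched as follows. This is a minimal C++ illustration, not Kruse's code: the `Node` struct (sorted `keys` plus owned `children`, empty for a leaf) is a simplified, hypothetical stand-in for the provided node definitions, and `siblingForOverflow`, `moveLeft`, and `moveRight` are illustrative names. The caller is assumed to have already placed the new key into child `i`, leaving it one key over capacity. The internal-node branches show the end subtree "jumping" to its new parent.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Simplified, hypothetical node: sorted keys plus owned children (none for a leaf).
struct Node {
    std::vector<int> keys;
    std::vector<std::unique_ptr<Node>> children;
    bool isLeaf() const { return children.empty(); }
};

// Which sibling can absorb one key from the overfull child i? Left first, then right.
// Returns the sibling's index, or -1 when no neighbor has room and a split is needed.
int siblingForOverflow(const Node& parent, std::size_t i, std::size_t maxKeys) {
    if (i > 0 && parent.children[i - 1]->keys.size() < maxKeys)
        return static_cast<int>(i) - 1;
    if (i + 1 < parent.children.size() && parent.children[i + 1]->keys.size() < maxKeys)
        return static_cast<int>(i) + 1;
    return -1;
}

// "moveleft": parent key i-1 drops into the left sibling, the smallest key of child i
// rises to replace it, and for internal nodes the leftmost subtree of child i
// "jumps" over to become the rightmost subtree of the left sibling.
void moveLeft(Node& parent, std::size_t i) {
    assert(i > 0 && i < parent.children.size());
    Node& left = *parent.children[i - 1];
    Node& node = *parent.children[i];
    left.keys.push_back(parent.keys[i - 1]);
    parent.keys[i - 1] = node.keys.front();
    node.keys.erase(node.keys.begin());
    if (!node.isLeaf()) {
        left.children.push_back(std::move(node.children.front()));
        node.children.erase(node.children.begin());
    }
}

// "moveright": the mirror image, shifting child i's largest key toward child i+1.
void moveRight(Node& parent, std::size_t i) {
    assert(i + 1 < parent.children.size());
    Node& node = *parent.children[i];
    Node& right = *parent.children[i + 1];
    right.keys.insert(right.keys.begin(), parent.keys[i]);
    parent.keys[i] = node.keys.back();
    node.keys.pop_back();
    if (!node.isLeaf()) {
        right.children.insert(right.children.begin(), std::move(node.children.back()));
        node.children.pop_back();
    }
}
```

After inserting into child `i`, a caller would check `siblingForOverflow(parent, i, maxKeys)`: a result of `i - 1` means `moveLeft(parent, i)`, `i + 1` means `moveRight(parent, i)`, and `-1` means a split is needed. Because a split adds a key to the parent, the same check would then be repeated one level up.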

Thus, for inserts into a node other than the root, the idea is to delay splits until both the node you were originally trying to insert into and its right sibling are full. Then you will use these two nodes as a pair to split. If the node you were originally trying to insert into is the rightmost sibling, and the sibling immediately to its left is also full, then you will use these two as the pair to split.
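When both nodes of the pair are full, the usual B* technique is a 2-into-3 split. The text says only that the two nodes are split as a pair, so the three-way result here is an assumption. The sketch handles leaves only, and `ThreeWaySplit` and `splitTwoIntoThree` are hypothetical names. It pools the two full siblings, the parent key between them, and the new key, then deals the sorted pool out as leaf, separator, leaf, separator, leaf; the two separators go up to the parent in place of the one key that came down.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result of splitting two full leaf siblings into three leaves.
struct ThreeWaySplit {
    std::array<std::vector<int>, 3> nodes;  // new left, middle, and right leaves
    std::array<int, 2> separators;          // keys that move up into the parent
};

// Pools the two full siblings, the parent key between them, and the new key,
// then deals the sorted pool out as leaf, separator, leaf, separator, leaf.
ThreeWaySplit splitTwoIntoThree(const std::vector<int>& left, int parentKey,
                                const std::vector<int>& right, int newKey) {
    std::vector<int> pool(left);
    pool.push_back(parentKey);
    pool.insert(pool.end(), right.begin(), right.end());
    pool.push_back(newKey);
    std::sort(pool.begin(), pool.end());

    const std::size_t keysInNodes = pool.size() - 2;  // two keys become separators
    assert(keysInNodes >= 3);

    ThreeWaySplit out{};
    auto it = pool.begin();
    for (std::size_t k = 0; k < 3; ++k) {
        // Spread keys as evenly as possible; earlier leaves take any remainder.
        const std::size_t size = keysInNodes / 3 + (k < keysInNodes % 3 ? 1 : 0);
        out.nodes[k].assign(it, it + static_cast<std::ptrdiff_t>(size));
        it += static_cast<std::ptrdiff_t>(size);
        if (k < 2) out.separators[k] = *it++;
    }
    return out;
}
```

For order 9, two full leaves (8 keys each) plus the parent key and the new key give 18 keys: three leaves of 6, 5, and 5 keys and two separators, so every new leaf meets the 5-key minimum. For order 13 the same count gives leaves of 8, 8, and 8 keys.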

For Deletes:

The idea here is to delay combines. Combines at the level just below and including the root will be 2-into-1 combines (2 nodes combine into 1). Levels below this will use 3-into-2 combines (3 nodes combine into 2). If the node that has now fallen below the minimum has a sibling on both the right and the left, these are the nodes to combine with. If either of these nodes has "room to borrow" from, use move_right or move_left to delay the combine as long as possible. Give preference to borrowing from the sibling to the left first, if you possibly can; if not, borrow from the sibling to the right. If you are at a leftmost sibling, use the 2 nodes immediately to the right to borrow from or combine with. If you are at a rightmost sibling, use the 2 nodes immediately to the left. Note: for a delete from a rightmost or leftmost sibling, you may actually have to do 2 sequential move_right calls or, similarly, 2 sequential move_left calls; i.e., you may borrow from a node that is 2 nodes "away" from the deletion node. In these cases, give preference to borrowing from the closest sibling first. Combines of 3 nodes into 2 will always involve 3 nodes that have all fallen to only 5 data values (for an order 9 tree), where a new delete has caused one of them to fall to only 4. Combines of 2 nodes into 1 (root combines) will involve 2 sibling nodes (which are each other's only siblings) that have fallen to 5 data values each, where the new delete has caused one of the nodes to fall to 4 values.
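The borrowing preference and the 3-into-2 combine can be sketched as below. Everything here is a hypothetical illustration: `donorOrder` and `chooseDonor` work only on per-child key counts, `combineThreeIntoTwo` handles leaves only, and the root-level 2-into-1 case is left out. "Room to borrow" is taken to mean holding more than the minimum number of keys.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Siblings to try, closest first, when child i of a parent with n children underflows:
// left then right for a middle child; the two to the right for a leftmost child;
// the two to the left for a rightmost child.
std::vector<std::size_t> donorOrder(std::size_t i, std::size_t n) {
    assert(n >= 3 && i < n);
    if (i == 0) return {1, 2};
    if (i == n - 1) return {n - 2, n - 3};
    return {i - 1, i + 1};
}

// Index of the first candidate with a key to spare, or keyCounts.size() when
// none has one and the three nodes must be combined into two instead.
std::size_t chooseDonor(const std::vector<std::size_t>& keyCounts, std::size_t i,
                        std::size_t minKeys) {
    for (std::size_t d : donorOrder(i, keyCounts.size()))
        if (keyCounts[d] > minKeys) return d;
    return keyCounts.size();
}

// Hypothetical result of combining three adjacent leaves into two.
struct TwoWayCombine {
    std::vector<int> left, right;
    int separator;  // the one key that stays in the parent
};

// Pools three adjacent leaves with the two parent keys between them (already in
// order), then splits the pool into two leaves around a single separator.
TwoWayCombine combineThreeIntoTwo(const std::vector<int>& a, int sepAB,
                                  const std::vector<int>& b, int sepBC,
                                  const std::vector<int>& c) {
    std::vector<int> pool(a);
    pool.push_back(sepAB);
    pool.insert(pool.end(), b.begin(), b.end());
    pool.push_back(sepBC);
    pool.insert(pool.end(), c.begin(), c.end());

    const std::size_t leftSize = pool.size() / 2;  // left leaf takes any extra key
    TwoWayCombine out;
    out.left.assign(pool.begin(), pool.begin() + static_cast<std::ptrdiff_t>(leftSize));
    out.separator = pool[leftSize];
    out.right.assign(pool.begin() + static_cast<std::ptrdiff_t>(leftSize) + 1, pool.end());
    return out;
}
```

For order 9, three leaves at 5, 5, and 4 keys plus the two parent keys give 16 keys, which recombine as leaves of 8 and 7 keys with one key left in the parent. Since the parent loses a key, it may itself underflow, and the same rules then apply one level up.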

Similar to the issue of delaying splits for inserts "AT ALL LEVELS", you must write your solution to this programming problem so that combines are delayed at all levels of the tree.  

Furthermore, when deleting a value that is not in a leaf, always choose the inorder predecessor (which is guaranteed to be in a leaf) as the replacement value for this internal value.  Then, one will continue the delete from the leaf itself.  
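Finding the inorder predecessor means stepping into the child just to the left of the key and then following rightmost children down to a leaf. In this minimal sketch, the `Node` struct is a simplified, hypothetical stand-in for the provided node definitions, and `predecessorLeaf` and `replaceWithPredecessor` are illustrative names; rebalancing the leaf afterward is left to the delete rules above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Simplified, hypothetical node: sorted keys plus owned children (none for a leaf).
struct Node {
    std::vector<int> keys;
    std::vector<std::unique_ptr<Node>> children;
};

// The inorder predecessor of node.keys[i] is the largest key in the subtree just
// to its left: step into children[i], then follow rightmost children to a leaf.
Node* predecessorLeaf(Node& node, std::size_t i) {
    assert(!node.children.empty() && i < node.keys.size());
    Node* cur = node.children[i].get();
    while (!cur->children.empty()) cur = cur->children.back().get();
    return cur;
}

// Copies the predecessor over the internal key and removes it from its leaf.
// Returns that leaf so the delete can continue (and rebalance) from there.
Node* replaceWithPredecessor(Node& node, std::size_t i) {
    Node* leaf = predecessorLeaf(node, i);
    assert(!leaf->keys.empty());
    node.keys[i] = leaf->keys.back();
    leaf->keys.pop_back();
    return leaf;
}
```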

Your TA may come up with additional requirements as he deems reasonable for this assignment.

Here is a link to appropriate node definitions that accommodate generic (base class) nodes, as well as root versus non-root nodes, in your B* Trees. Please note: I have removed the node.cpp file from this directory since it was an empty file and simply not needed.

***************************************

You are to structure your main program to accept and interpret file input with the following format:

You will need to write your program to accept a series of "commands" from an input file. Listed below are the possibilities for the different categories of commands that will follow the above line in the input file(s):

You will, of course, need to create a B-Star Tree class that has at least the following member functions:

Output: Send your output for this program to a file. For inserts, it is an error to insert a duplicate value into the B-Star Tree structure; be sure to notify the user of such an attempt. Tell the user when an operation is successful, as well. Be sure to give appropriate "error" messages back to the user of your program via the output file. For example, if asked to delete a value that is not found in the B-Star Tree structure, you should reply that the value was not found and so could not be deleted.
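One way to organize the command loop and its messages is sketched below. It assumes that the command letters used in the sample input file mean A = add (insert), D = delete, S = search, and P = print, which the assignment text does not spell out. `std::set` stands in for the B-Star tree purely so the dispatch and error reporting can be shown; `runCommands` and all message wording are hypothetical.

```cpp
#include <iostream>
#include <set>
#include <sstream>
#include <string>

// Reads one command per line and writes one message per command to `out`.
// std::set is only a stand-in for the B-Star tree; command meanings are assumed.
void runCommands(std::istream& in, std::ostream& out) {
    std::set<int> tree;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        char cmd;
        if (!(fields >> cmd)) continue;  // blank line: nothing to do

        if (cmd == 'P') {  // print: here, just the keys in order
            out << "Tree:";
            for (int v : tree) out << ' ' << v;
            out << '\n';
            continue;
        }
        if (cmd != 'A' && cmd != 'D' && cmd != 'S') {
            out << "Error: unknown command '" << cmd << "'\n";
            continue;
        }

        int value;
        if (!(fields >> value)) {
            out << "Error: missing or non-numeric value in \"" << line << "\"\n";
            continue;
        }
        if (cmd == 'A') {
            if (tree.insert(value).second) out << "Inserted " << value << '\n';
            else out << "Error: " << value << " is already in the tree, not inserted\n";
        } else if (cmd == 'D') {
            if (tree.erase(value) == 1) out << "Deleted " << value << '\n';
            else out << "Error: " << value << " not found, could not delete\n";
        } else {
            out << (tree.count(value) ? "Found " : "Not found: ") << value << '\n';
        }
    }
}
```

In the real program, the `std::set` calls would be replaced by the B-Star tree's own insert, delete, search, and print functions, and `out` would be a `std::ofstream` opened on the output file. Reading values with `>>` also tolerates extra spaces between the command letter and the value.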

Also note: You are responsible for writing a program that deals with *all cases*. That is, your program should not only *work well on some input files*; it needs to work on *all possible* input files. It is not your TA's responsibility to give you an input file that allows your program to work. It is your responsibility to write code that cannot be broken by any input file and so will work in all cases. This will be the case for every program you write for this course.

A sample file follows that adheres to the input file format:

A  46
A 55
A 88
P
S 46
S 99
A 99
D 50
S 200
A 200
D 55
P
A 120
A 200
A 300
A 400
A 500
A 12
A 59
A 62
A 99
A 98
A 93
A 47
A 88
A 100
A 29
A 35
A 54
A 33
A 22
A 331
A 342
A 302
A 999
A 445
A 999
A 532
A 501
A 444
S 300
P
D 600
D 500