CSCI 151 - Program 2
B - Trees
Dr. Melody Stapleton, Instructor
Due Date:
Your TA will provide the final input file(s) to run your program against.
You are to implement an order 3 and an order 7 B-Tree, as discussed in class and in Kruse's book. The insert and delete processes for B-Trees need to be modified to accommodate the special features given in this assignment.
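As a quick reference for what "full" and "minimum count" mean for the two required orders, here is a minimal sketch. It assumes the common convention that the order is the maximum number of children per node; check this against the definition used in class and in Kruse's book. The names NodeLimits and limits_for_order are hypothetical and are not part of the assignment.

```cpp
#include <cassert>

// Key-count limits for a B-Tree of a given order.
// ASSUMPTION: "order" means the maximum number of children a node may have.
struct NodeLimits {
    int max_keys;  // a node holding this many keys is "full"
    int min_keys;  // fewest keys allowed in any node other than the root
};

inline NodeLimits limits_for_order(int order) {
    assert(order >= 3);
    // At most `order` children means at most order - 1 keys.
    // At least ceil(order / 2) children means at least ceil(order / 2) - 1 keys.
    return NodeLimits{order - 1, (order + 1) / 2 - 1};
}
```

Under this convention an order 3 node holds 1 to 2 keys and an order 7 node holds 3 to 6 keys. Because the limits are computed from a run-time value, the same helper would also serve a general-order implementation.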
Special features:
You will write your insert function so that it always attempts to delay node splits on "full nodes" in the following fashion. If the node you are inserting into (the insertion node) is full and is a rightmost sibling, look to its immediate sibling to the left and see if there is room in that node. If there is, move the parent value down into that sibling and move a data member from the insertion node up into the parent position. (This is somewhat like a "rotation" and is basically the "moveleft" function used by Kruse's delete.) If the insertion node is a leftmost sibling, look to its immediate sibling to the right for the same purpose. If the node has both left and right siblings, move data into the sibling that has the fewest data elements. If the left and right siblings have the same number of data elements, always move into the node to the left (again, provided it has room). When you cannot move data into an immediate sibling to the left or right because it is (or they are) full, you must go ahead and split the original insertion node.
Please note that this rule for delaying splits must be applied "AT ALL LEVELS" of the tree. This means that if a node split causes a new value to be inserted into the parent node, and the parent node is full, you must first look to the siblings of that parent node to see if they have room to accommodate another value before the parent itself is split. The moveleft or moveright algorithm, when invoked at a higher level of the tree (other than the root), may result in a subtree having to "jump" over to a new parent to maintain the ordered search properties of the B-Tree. Such a case was demonstrated in class and in homework.
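The sibling-selection rule and the leaf-level move can be sketched as follows. This is only an illustration under stated assumptions: nodes are modeled as sorted vectors of int keys, a missing sibling is represented by std::nullopt, and the names Side, choose_insert_target, and shift_left_on_insert are hypothetical. It does not handle the subtree "jump" needed above the leaf level, and it is not Kruse's code.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

enum class Side { Left, Right };

// Decide which immediate sibling should receive a key from a full insertion node.
// left_count / right_count: key counts of the siblings, or std::nullopt if that
// sibling does not exist. Returns std::nullopt when the node must be split.
inline std::optional<Side> choose_insert_target(std::optional<int> left_count,
                                                std::optional<int> right_count,
                                                int max_keys) {
    assert(max_keys >= 1);
    const bool left_ok = left_count.has_value() && *left_count < max_keys;
    const bool right_ok = right_count.has_value() && *right_count < max_keys;
    if (left_ok && right_ok) {
        // Prefer the emptier sibling; a tie goes to the left.
        return (*right_count < *left_count) ? Side::Right : Side::Left;
    }
    if (left_ok) return Side::Left;
    if (right_ok) return Side::Right;
    return std::nullopt;  // no sibling has room: split the insertion node
}

// Leaf-level "move left": add new_key to the full node, push the parent's
// separator down onto the end of the left sibling, and promote the node's
// smallest key to become the new separator.
inline void shift_left_on_insert(std::vector<int>& left, int& separator,
                                 std::vector<int>& node, int new_key) {
    node.insert(std::upper_bound(node.begin(), node.end(), new_key), new_key);
    left.push_back(separator);
    separator = node.front();
    node.erase(node.begin());
}
```

For an order 3 tree (max_keys of 2), choose_insert_target(1, 2, 2) selects the left sibling, while choose_insert_target(2, 2, 2) returns std::nullopt, which signals that a split can no longer be delayed.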
Similarly, you will write your deletion function so that it always attempts to delay node combines on "underflow nodes" in the following fashion. If the node you are deleting from (the deletion node) is at the minimum allowable count and is a rightmost sibling, look to its immediate sibling to the left and see if that node has enough elements to "borrow" from it, or use the function "moveleft". If the deletion node is a leftmost sibling, look to its immediate sibling to the right for the same purpose. If the node has both left and right siblings, borrow data from the sibling that has the greatest number of data elements. If the left and right siblings have the same number of data elements, always borrow from the node to the left (again, provided it has enough elements to lend one). When you cannot move data from an immediate sibling to the left or right because it is (or they are) at the minimum count for number of data elements, you must go ahead and combine the original deletion node with a neighbor that is already at the minimum count, as is done by Kruse's code. As with delaying splits for inserts "AT ALL LEVELS", you must write your solution so that combines are delayed at all levels of the tree.
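The borrowing rule mirrors the insert rule with the comparisons reversed. A minimal sketch of the decision, assuming sibling key counts are available and a missing sibling is represented by std::nullopt (Side and choose_borrow_source are hypothetical names, not part of the assignment):

```cpp
#include <cassert>
#include <optional>

enum class Side { Left, Right };

// Decide which immediate sibling should lend a key to a deletion node that is
// at the minimum count. Returns std::nullopt when neither sibling can spare a
// key, meaning the deletion node must be combined with a neighbor.
inline std::optional<Side> choose_borrow_source(std::optional<int> left_count,
                                                std::optional<int> right_count,
                                                int min_keys) {
    assert(min_keys >= 0);
    const bool left_ok = left_count.has_value() && *left_count > min_keys;
    const bool right_ok = right_count.has_value() && *right_count > min_keys;
    if (left_ok && right_ok) {
        // Prefer the fuller sibling; a tie goes to the left.
        return (*right_count > *left_count) ? Side::Right : Side::Left;
    }
    if (left_ok) return Side::Left;
    if (right_ok) return Side::Right;
    return std::nullopt;  // both siblings are at the minimum: combine
}
```

Keeping the decision separate from the code that actually moves keys makes it easy to apply the same rule at every level of the tree.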
Furthermore, when deleting a value that is not in a leaf, one can choose the inorder predecessor or successor (which is guaranteed to be in a leaf) as the replacement for this internal value. The delete then continues from the leaf itself. In order to make a "better" decision as to whether to use the predecessor or successor, look at the count of values in the node that contains the predecessor and in the node that contains the successor. If the successor count is greater than the predecessor count, use the successor. Otherwise (the successor count is <= the predecessor count), use the predecessor to replace the original value to be deleted. The idea behind this requirement is that, by always picking the replacement from the fuller node, you may delay another node combine.
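The predecessor/successor rule reduces to a single comparison. A small sketch, assuming the key counts of the two leaf nodes have already been found (Replacement and choose_replacement are hypothetical names):

```cpp
#include <cassert>

enum class Replacement { Predecessor, Successor };

// pred_leaf_count: number of keys in the leaf holding the inorder predecessor.
// succ_leaf_count: number of keys in the leaf holding the inorder successor.
inline Replacement choose_replacement(int pred_leaf_count, int succ_leaf_count) {
    assert(pred_leaf_count >= 0 && succ_leaf_count >= 0);
    // Use the successor only when its leaf is strictly fuller; ties go to
    // the predecessor.
    return (succ_leaf_count > pred_leaf_count) ? Replacement::Successor
                                               : Replacement::Predecessor;
}
```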
Your TA may come up with additional requirements as he deems reasonable for this assignment.
***************************************
You are to structure your main program to accept and interpret file input with the following format:
You will need to write your program to accept a series of "commands" from an input file. Listed below are the possibilities for the different categories of commands that will follow the above line in the input file(s):
You will, of course, need to create a B-Tree class that has at least the following member functions:
Output: Send your output for this program to a file. It is an error to insert a duplicate value into the B-Tree structure, so be sure to notify the user of any such attempt. Tell the user when an operation is successful as well. Be sure to give appropriate "error" messages back to the user of your program via the output file. For example, if asked to delete a value that is not found in the B-Tree structure, you should reply that the value was not found and so could not be deleted.
Also Note: You are responsible for writing a program that deals with *all cases*. That is, your program should not only *work well on some input files*; it needs to work on *all possible* input files. This means it is not your TA's responsibility to give you an input file that allows your program to work. It is your responsibility to write code that cannot be broken by any input file and so will work in all cases. This will be the case for every program you write for this course.
Extra Credit: If you implement the B-Tree for a general order, to be specified at run-time, you will receive up to 10% extra credit, for a possible score of 110% on the program.
A sample file follows that adheres to the input file format:
A 46
A 55
A 88
P
S 46
S 99
A 99
D 50
S 200
A 200
D 55
P
A 120
A 200
A 300
A 400
A 500
A 12
A 59
A 62
A 99
A 98
A 93
A 47
A 88
A 100
A 29
A 35
A 54
A 33
A 22
A 331
A 342
A 302
A 999
A 445
A 999
A 532
A 501
A 444
S 300
P
D 600
D 500
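Lines like those in the sample above could be read with a small parser such as the sketch below. The meanings of the command letters are an assumption inferred from the sample only (A = add, D = delete, S = search, P = print); the handout's own command list takes precedence. Command and parse_command are hypothetical names.

```cpp
#include <optional>
#include <sstream>
#include <string>

// One parsed input line.
// ASSUMPTION: 'A', 'D', and 'S' each take one integer; 'P' takes none.
struct Command {
    char op;
    std::optional<int> value;  // empty for 'P'
};

// Returns std::nullopt for blank or malformed lines so the caller can write
// an error message to the output file instead of crashing.
inline std::optional<Command> parse_command(const std::string& line) {
    std::istringstream in(line);
    char op;
    if (!(in >> op)) return std::nullopt;  // blank line
    if (op == 'P') return Command{op, std::nullopt};
    if (op == 'A' || op == 'D' || op == 'S') {
        int value;
        if (in >> value) return Command{op, value};
    }
    return std::nullopt;  // unknown letter or missing/non-numeric value
}
```

Rejecting bad lines in the parser, rather than inside the tree operations, is one way to keep the program from being broken by unexpected input.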