CSCI 151 - Program 2

B* - Trees

Dr. Melody Callan, Instructor

Paul Bell and Chris Moessmer, Teaching Assistants

Due Dates:

Your TAs, Paul and Chris, will provide the final input file(s) to run your program against by Tuesday, October 26.

*********************************************************

DEFINITIONS OF "FLOOR" AND "CEILING" FUNCTIONS:

Since I don't have the font available to draw these graphically, I will use the following notation:

Floor(expression) = the greatest integer less than or equal to the result of the expression.  E.g., floor(4/3) = 1, floor(6/3) = 2.

Ceiling(expression) = the smallest integer greater than or equal to the result of the expression.  E.g., ceiling(4/3) = 2, ceiling(6/3) = 2.

TURNS OUT YOU DON'T NEED CEILING FOR ANY OF THE CALCULATIONS!  YOU CAN SIMPLY USE FLOOR.... SEE CHARTS BELOW!
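As a small illustration of the two definitions above, here is a minimal sketch in C++. The names floor_div and ceil_div are made up for this example and are not part of the assignment. It assumes a non-negative numerator and a positive denominator, in which case C++ integer division already gives the floor.

```cpp
#include <cassert>

// Floor of a/b, assuming a >= 0 and b > 0.
// C++ integer division truncates toward zero, which equals the floor
// when both operands are non-negative.
int floor_div(int a, int b) {
    assert(a >= 0 && b > 0);
    return a / b;
}

// Ceiling of a/b written in terms of floor only:
// ceiling(a/b) = floor((a + b - 1) / b) for a >= 0, b > 0.
int ceil_div(int a, int b) {
    assert(a >= 0 && b > 0);
    return floor_div(a + b - 1, b);
}
```

With these, floor_div(4, 3) is 1 and ceil_div(4, 3) is 2, matching the examples above.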

********************************************************

Go here to see a full trace of Kruse's INSERT function:

********************************************************

B* Tree parameters:  Suppose you have a B* Tree of order m....

                        Root Node  Root Node               Non-Root Node         Non-Root Node
                        Minimum    Maximum                 Minimum               Maximum
# of data elements      1          2*floor(2/3*(m-1))      floor(2/3*(m-1))      m-1
# of children pointers  2          2*floor(2/3*(m-1)) + 1  floor(2/3*(m-1)) + 1  m

So, for an order m = 9 B* Tree we have:

                        Root Node  Root Node  Non-Root Node  Non-Root Node
                        Minimum    Maximum    Minimum        Maximum
# of data elements      1          10         5              8
# of children pointers  2          11         6              9
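The chart formulas can be computed from the order m alone. The following is only a sketch: the struct BStarLimits and the functions limits_for_order and children_for are hypothetical names, and the precondition m >= 3 is an assumption of this example, not a rule from the assignment. Note that floor(2/3*(m-1)) must be written as 2*(m-1)/3 in integer arithmetic; writing 2/3*(m-1) would evaluate 2/3 first and give 0.

```cpp
#include <cassert>

// Hypothetical container for the four data-element limits in the chart.
struct BStarLimits {
    int root_min_keys;
    int root_max_keys;
    int node_min_keys;  // non-root minimum
    int node_max_keys;  // non-root maximum
};

// Computes the limits for a B* tree of order m.
// Assumption of this sketch: m >= 3.
BStarLimits limits_for_order(int m) {
    assert(m >= 3);
    // floor(2/3 * (m-1)): multiply before dividing so that integer
    // division performs the floor on the full product.
    const int two_thirds = 2 * (m - 1) / 3;
    return BStarLimits{1, 2 * two_thirds, two_thirds, m - 1};
}

// A node holding k data elements has k + 1 children pointers.
int children_for(int keys) {
    return keys + 1;
}
```

For m = 9 this yields the values 1, 10, 5 and 8 for data elements, and children_for gives 2, 11, 6 and 9. Because the order is an ordinary function argument here, the same approach would also work if the order were read at run-time.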

Note: Contrast the above values with those of a B-tree of order 9, in which each node will have only 4 values after a split.

****************************************************************************

You are to implement an order 9 B*-Tree, as discussed in class and in Kruse's book, Chapter 11, problem E12, Page 556.  Although you are implementing for a constant order, your program should run for a different order by simply changing a single constant in your program.  Note that B*-trees delay node splitting until a node and its neighbor are both full.  At that time the splitting process involves 2 full nodes and a total of 2*m data values (the 2*(m-1) values in the two full nodes, the value that separates them in the parent, and the new value being inserted).  These 2*m data values are then spread out over 3 nodes, with 2 of the values pushed up into the parent, each of the 3 nodes being at least 2/3 full, i.e., holding at least floor(2/3*(m-1)) values.  The insert process for B-Trees needs to be modified so that, once the node to be inserted into is full, the insert process looks at a neighbor to check its fullness.  We will agree that for any node and its siblings we will always look to the right sibling node when our node becomes full, and rotate our overflow into that right sibling.  Once the right sibling becomes full also, we must do the 2 node to 3 node split process.  The exception is the rightmost sibling, for which we must obviously look to its sibling to the left, both to rotate our overflow into and to use as a partner in the 2 to 3 split once both are full.  Coalescing becomes a 3 node down to 2 node coalesce.  More discussion on particulars in the lab.
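The sibling rule in the paragraph above can be written down as a small decision helper. This is a sketch only, for non-root nodes: the names OverflowAction, partner_index and choose_action are hypothetical, children are assumed to be numbered 0 to child_count - 1 from left to right within the parent, and the real insert code from Kruse will be organized differently.

```cpp
#include <cassert>

// What to do when a value arrives at a non-root node.
enum class OverflowAction { InsertInPlace, RotateIntoSibling, SplitTwoIntoThree };

// Index of the sibling used as a partner: the right sibling, except for
// the rightmost child, which uses its left sibling.
int partner_index(int child_index, int child_count) {
    assert(child_count >= 2);
    assert(child_index >= 0 && child_index < child_count);
    return (child_index < child_count - 1) ? child_index + 1 : child_index - 1;
}

// Chooses the action from the current fill of the node and its partner.
// max_keys is the non-root maximum, m - 1.
OverflowAction choose_action(int node_keys, int partner_keys, int max_keys) {
    if (node_keys < max_keys) {
        return OverflowAction::InsertInPlace;      // room in the node itself
    }
    if (partner_keys < max_keys) {
        return OverflowAction::RotateIntoSibling;  // node full, partner has room
    }
    return OverflowAction::SplitTwoIntoThree;      // both full: 2-to-3 split
}
```

For an order 9 tree (max_keys = 8), a full node whose partner holds 5 values rotates its overflow into the partner, while a full node with a full partner triggers the 2 node to 3 node split.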

***************************************

More info on B* versus B-trees: 

More details on the 2-node to 3-node split:  There will always be 2*m values involved when 2 nodes are split into 3 nodes.  2 of these values will be pushed up into the parent.  That leaves 2*m - 2 values to be distributed among the 3 nodes you are splitting into.  Note that, for some orders of trees, this number will not be divisible by 3.  E.g., for an order m=21 tree, there will be (2*21)-2 = 42-2 = 40 values to split between 3 nodes.  In general, the grid below gives the sizes of the 3 nodes resulting from the 3 way split, where p is the leftmost node, q is the middle node of the three, and r is the rightmost node:

p                 q                   r
floor((2*m)/3)    floor((2*m-1)/3)    floor((2*m-2)/3)

Thus, for an order m=9 B* tree we have:

p                   q                     r
floor((2*m)/3)=6    floor((2*m-1)/3)=5    floor((2*m-2)/3)=5
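To make the grid concrete, here is a hedged sketch of the 2 to 3 split on data values only. It assumes the 2*m values have already been gathered into one sorted sequence, and it ignores children pointers entirely. The names split_sizes, ThreeWaySplit and split_two_into_three are invented for this example and do not come from Kruse's code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Sizes of p, q and r from the grid; they always add up to 2*m - 2.
std::array<int, 3> split_sizes(int m) {
    assert(m >= 3);  // assumption of this sketch
    return {(2 * m) / 3, (2 * m - 1) / 3, (2 * m - 2) / 3};
}

// Result of the split: three nodes plus the two values pushed up.
struct ThreeWaySplit {
    std::vector<int> p, q, r;
    int up_left;   // separates p and q in the parent
    int up_right;  // separates q and r in the parent
};

// Distributes 2*m sorted values as: p, up_left, q, up_right, r.
ThreeWaySplit split_two_into_three(const std::vector<int>& sorted_values, int m) {
    assert(sorted_values.size() == static_cast<std::size_t>(2 * m));
    const std::array<int, 3> n = split_sizes(m);
    const auto first = sorted_values.begin();
    ThreeWaySplit out;
    out.p.assign(first, first + n[0]);
    out.up_left = sorted_values[n[0]];
    out.q.assign(first + n[0] + 1, first + n[0] + 1 + n[1]);
    out.up_right = sorted_values[n[0] + n[1] + 1];
    out.r.assign(first + n[0] + n[1] + 2, sorted_values.end());
    return out;
}
```

For m = 9 and the values 1 through 18, p receives 1 to 6, the value 7 goes up, q receives 8 to 12, the value 13 goes up, and r receives 14 to 18. For m = 21 the sizes are 14, 13 and 13, which add up to the 40 values computed above.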

Practice: To cement your own knowledge of the 2 to 3 way split, try the calculations for an order 7, 8 and 9 tree, to see what you come up with for the values involved in the 2 to 3 way split.

Details on the root and root split:  (HERE IS SOME NEW INFO/CORRECTIONS) Roots are the exception to the rule on the 2 to 3 node split.  Since the root node is the first (original) node in the tree, it has no sibling, so it has to split by itself to get more than one node!  Thus, we will need a routine to split the root node in half (not a 2 to 3 split, but a 1 to 2 split).  Aha!  We can use the old B-tree 1 to 2 node split for this.  However, each node resulting from the split must hold at least floor(2/3*(m-1)) values, i.e., be 2/3 full.  This forces us to make the root the one exception to the limit on how large a node can grow.  In the case of the root node (and only in the case of the root node) we will allow the node to grow to 2*floor(2/3*(m-1)) values, but no bigger.  Capping the size of the root's data array at this value guarantees that, upon insert of the next value past this cap, we will have enough values to give each of the 2 nodes resulting from the split floor(2/3*(m-1)) values, so that each is 2/3 full.  E.g., in the case of the order m=21 tree, we will let the root (and root alone) grow to hold 2*floor(2/3*(m-1)) = 2*floor(2/3*20) = 2*floor(40/3) = 2*floor(13 1/3) = 2*13 = 26 values.  Then, on insert of the next value, the root will split into 2 nodes.  The node on the left will hold 13 values, the node on the right will hold 13 values, and the new root will, of course, hold one value.  Thus, the total count over all nodes involved is 13 + 13 + 1 = 27 = 26 + the 1 new value being inserted.

Thus, we must handle the root a bit differently.  You will need to establish an appropriate node class definition that allows for the root to grow to the appropriate size.
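The root cap and the 1 to 2 root split can be sketched as follows. As before, this handles data values only, assumes the overflowing root's values (the cap plus the one new value) are already in one sorted sequence, and uses hypothetical names (root_max_keys, RootSplit, split_root). It is not the node class definition you are asked to design.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Largest number of values the root may hold: 2*floor(2/3*(m-1)).
int root_max_keys(int m) {
    assert(m >= 3);  // assumption of this sketch
    return 2 * (2 * (m - 1) / 3);
}

// Result of splitting an overflowing root into two nodes.
struct RootSplit {
    std::vector<int> left, right;
    int new_root_key;  // the single value held by the new root
};

// Splits root_max_keys(m) + 1 sorted values as: left, new_root_key, right.
RootSplit split_root(const std::vector<int>& sorted_values, int m) {
    const int half = root_max_keys(m) / 2;  // floor(2/3*(m-1))
    assert(sorted_values.size() == static_cast<std::size_t>(2 * half + 1));
    const auto first = sorted_values.begin();
    RootSplit out;
    out.left.assign(first, first + half);
    out.new_root_key = sorted_values[half];
    out.right.assign(first + half + 1, sorted_values.end());
    return out;
}
```

For m = 21, root_max_keys returns 26, and splitting 27 sorted values gives 13 on the left, 13 on the right, and 1 in the new root. One consequence for the class design is that a node's data array must be sized for the root's cap, or the root must be given its own larger storage.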

NOTE:  

****************************

For your pseudocode, you will need to turn in the complete class definitions, and the pseudocode for all functions specified below.  In the case of the Print function, you will need to write your pseudocode from scratch.  In the cases of the Search, Insert and Delete functions (and auxiliary functions) you will need to turn in the code for these from Kruse, identify which lines you will be changing or adding, and write pseudocode for these sections of "missing code".  For any new auxiliary functions you see a need for, give the pseudocode from scratch.  Any questions on pseudocode requirements will be clarified in the labs.

You are to structure your main program to accept and interpret file input with the following format:

You will need to write your program to accept a series of "commands" from an input file. Listed below are the possibilities for the different categories of commands that will follow the above line in the input file(s):

You will, of course, need to create a B*-Tree class that has at least the following member functions:

Output: Send your output for this program to a file.  For inserts, it is an error to insert a duplicate value in the B*-Tree structure, so be sure to notify the user of such an attempt.  Tell the user if an operation is successful, as well.  Be sure to give appropriate "error" messages back to the user of your program, via the output file.  For example, if asked to delete a value from the B*-Tree structure that is not found in the structure, you should reply that the value was not found and so could not be deleted.

Also Note: You are responsible for writing a program that deals with *all cases*.  That is, your program should not only *work well on some input files*, it needs to work on *all possible* input files.  This means it is not your TAs' responsibility to give you an input file that allows your program to work.  It is your responsibility to write code that cannot be broken by any input file and so will work in all cases.  This will be the case for every program you write for this course.

Extra Credit:  If you implement the B* Tree for a general order, to be specified at run-time, then you will receive up to 10% extra credit for a possible score of 110% on the program.

A sample file follows that adheres to the input file format:

I  46

I 55

I 88

P

S 46

S 99

I 99

D 50

S 200

I 200

D 55

P

I 120

I 200

I 300

I 400

I 500

I 12

I 59

I 62

I 99

I 98

I 93

I 47

I 88

I 100

I 29

I 35

I 54

I 33

I 22

I 331

I 342

I 302

I 999

I 445

I 999

I 532

I 501

I 444

S 300

P

D 600

D 500
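A command loop for a file like the sample above might look like the following sketch. Several things here are assumptions rather than requirements: the letters I, S, D and P are read as Insert, Search, Delete and Print (inferred from the function names mentioned earlier in this handout), the message wording is invented, the function name run_commands is hypothetical, and a std::set<int> stands in for the B*-Tree so that the example is self-contained. Your program must use your own B*-Tree class instead.

```cpp
#include <istream>
#include <ostream>
#include <set>
#include <sstream>
#include <string>

// Reads one command per line from `in` and writes one message per command
// to `out`. Blank lines and extra spaces between fields are tolerated.
void run_commands(std::istream& in, std::ostream& out) {
    std::set<int> tree;  // stand-in for the B*-Tree (assumption of this sketch)
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        char command = 0;
        if (!(fields >> command)) {
            continue;  // blank line: nothing to do
        }
        if (command == 'P') {
            out << "P:";
            for (int v : tree) out << ' ' << v;
            out << '\n';
            continue;
        }
        int value = 0;
        if (!(fields >> value)) {
            out << "Malformed command: " << line << '\n';
            continue;
        }
        switch (command) {
        case 'I':
            if (tree.insert(value).second) out << "Inserted " << value << '\n';
            else out << "Duplicate " << value << ": not inserted\n";
            break;
        case 'S':
            out << (tree.count(value) ? "Found " : "Not found ") << value << '\n';
            break;
        case 'D':
            if (tree.erase(value)) out << "Deleted " << value << '\n';
            else out << "Cannot delete " << value << ": not found\n";
            break;
        default:
            out << "Unknown command: " << line << '\n';
        }
    }
}
```

Passing std::ifstream and std::ofstream objects to run_commands would connect it to the input and output files. Reporting malformed and unknown commands, rather than stopping, is one way to keep the program from being broken by an unexpected input file.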