CSCI 311

Spring 2006

Program 2: B*star tree

Due: 17 April 2006 by noon.

NEW!! Sample program is complete.

Unfortunately the test suite is not.

I am planning to have the grades posted for p2 the same day it is due

The sample program is in written in C++. Consequently it only runs on jaguar.

If you find anything inconsistant with either this document or the tests send me email.

Overview:

You are to implement a B*star tree. See my sample program.
This is a VERY challenging assingment. I have decided NOT to require templates for this assignment. Templates and inheritance combined in one program will be too difficult.
You must begin this assignment as soon as possible and get help early.
Everything about this assignment from searching to printing to array manipulation is very tricky.

Very Important: for this assignment all tests will be weighted equally.

Make sure you use the formulas provided by Dr Stapleton. You can find them here.

Use them instead of shortcuts such as order - 1 because you will note handle order 13 correctly. There will be no easy points.

Name your executable 'btree'.

Output:

Your output will have the following format for the B*tree describe below:

//////////////////////////////////////////
// You don't ouput this fancy ascii B*tree
//
//       Node: 0 [60,-,-,-,-,-,-,-,-,-]
//               / \
//              /   \ 
//             /     \
//            /       \
//           /         \
//          /           \
//  [10,20,30,40,50]  [70,80,90,100,110]
//////////////////////////////////////////

Drawing tree:
-------------------------------------
Node: (0) capacity = 1/10
Data: [60,_,_,_,_,_,_,_,_,_]
Ptrs: <1,2,_,_,_,_,_,_,_,_,_>

Node: (1) capacity = 5/8
Data: [10,20,30,40,50,_,_,_]
Ptrs: <_,_,_,_,_,_,_,_,_>

Node: (2) capacity = 5/8
Data: [70,80,90,100,110,_,_,_]
Ptrs: <_,_,_,_,_,_,_,_,_>
_____________________________________

FORMAT:
Node: (node-id) capacity = count/max
Data: [value,value,blank = _,.....]
Ptrs: <NULL = _,node-id,.....>

NOTE:  the node that is the root of the tree should always have id=0.  Always have it print first.

Subsequent nodes print in their search order.

In addition to the ouput above have your splitting functions output messages that a split has occured:

splitting root node: median = 60
splitting node: medians = 51,90
Where the numbers are the values that will be moving up into the parent node.

Pushing values to adjacent nodes should print out two messages. The node that went to the parent and the node that went fromt the parent into the adjacent node.

pushing: 54 from node(1) to parent
pushing: 60 from parent to node(2)

Combines should print out a message indicated which nodes were involved:

combined nodes: 8, 9, 10 into nodes: 8, 9
or
root combined nodes: 13, 14 into root

Input:

Your main.cpp will read in the commands from standard input again.

Input summary:

==============================================================================
NOTE:numbers you do not need to worry about negative numbers
NOTE:assume all input will be formatted correctly
'//....'		comments.  Yes put comments in your test inputs
'A' number
'D' number
'S' number
'P'
'V' 
==============================================================================

You main.cpp will output to stdout. This way you can redirect your output to a file of your choosing.

You may output whatever you like to cerr. cerr is just like cout except that when you write to it with << it can be redirectly separately. You can still see it print out normally because cerr by default goes to the same place as cout.


$btree < tests/t01 > tests/t01.out 2> tests/t01.cerr
Would redirect cerr to a file called t01.cerr in the tests subdirectory.
$cat tests/t01.cerr

Here is an in depth break down of the commands:

==============================================================
NUMBER
    This is very import.  The very first line of the input will always be the order.

    It makes no sense to allow adding,deleting,searching etc on a tree that has no order define.  

    Do not use a default.

    To simply your input handling routines you may safely assume that the first line is the search order.

    To aid in remebering to always issue a search order when the program starts output this prompt:

    

    B*tree order:
    
Read in the order and then echo it back. Run the sample program to better understand this behavior. ============================================================== Lines that begin with '/' are comments until the end of the line ** line 1 MUST NOT BE a comment ** ============================================================== ============================================================== A INTERGER_KEY 'A' is for add. ============================================================== You do not have to worry about checking the input. All input will be correctly formatted. You need to echo that a number was added to the tree:

added: 2
You need to output this message if the number being added is already in the tree:

duplicate rejected: 2
============================================================== D INTEGER_KEY 'D' is for delete. When the number of a delete command is not in the tree output this message: "delete failed: "<number>" is not in tree" ============================================================== S INTEGER_KEY 'S' is for search. When the value to be searched is present output:

found: 11
When the value to be searched is NOT present output:

search: 90 is not in tree"
============================================================== V 'V' is for verbose output. The output of this program is UGLY no matter how you look at it. So the output format looks very busy. It is essential that you can read it and build the tree on paper. The format is as follows: Drawing tree: ------------------------------------ first line: ["node-id": list of values] second line: {list of pointers} when the pointer is null print "null", when it points to a node print that nodes id instead. . . remaining nodes . . _____________________________________ Example: **see any test** Important the order is: print the node first, then follow each of its pointers. Wacky yes, but this ensures that root prints first. followed by the other nodes. Use this see what your tree looks like. ============================================================== P 'P' is for search order print Just print the values stored in the B*tree in search order Example: printing 'P' for the above B*tree Printing tree: 10 20 30 40 50 51 52 53 60 70 80 90 100 110 120 130 140 150 Use this to be sure your tree has maintained its search properties. ==============================================================

How to test your program:

**TIPS** First set up your ecst account with a SSH key. Second add a rule to your Makefile that looks like this: Third prefer to logon to jaguar.ecst.csuchico.edu


tests:
    rm -i tests/*
    scp username@jaguar.ecst.csuchico.edu:/user/s4/admiral/p2_tests.tar .
To obtain the tests when logged into your ecst account.

$cp /user/s4/admiral/p2_tests.tar .
$tar xvf p1_tests.tar
Please familiarize yourself with how the program is to behave by giving my sample program a test drive:
Testable output:

$./example_btree
Manual Example:
$./example_btree< tests/t01 > tests/t01.myout
$diff tests/t01.myout test/t01.out
$
The diff command will output nothing if your output matches mine. However if there are differences diff will tell you which lines are your file but not mine and vice versa.

Diff will be the basis of how I will test your program's outputs against my absolutely 100% correct output.

Again you will find a perl script to run all the tests.

$./run_p2_tests
I advise running the script after every change you make to your program. You can edit the script with your text editor to change the directory where it looks for files and the program it uses to generate your outputs.

How to submit the program:

NEW!!
You will find a turn-in directory under:
/user/projects/csci311/p2/your-user-name
All your .cpp and .h files and your Makefile should be turned in.
I will compile your programs when I grade them so DO NOT turn in an executable or any .o files.
This time I want you to provide a clean: rule in your make file. Looks like this:


clean:
    rm *.o btree

I would advise you to make a rule for copying your program into the turn directory. Use whenever you hit a major milestone in your program.


turn_in:
    cp *.cpp *.h Makefile /user/projects/csci311/p1/user_name
To automatically copy your program into the turn directory type:

$make turn_in

Hints:

This program is an ORDER of magnitude more challenging than the previous program.

Nobody using pico as their editing environment will be able to finish this assignment. You must learn to use an editor like vim. If you have any questions about what vim can or cannot do send me email.

IMPORTANT: use values of 0 or -1 to mark node data as empty. Don't worry about input that may be zero or negative one.

Node id

In order to make the verbose output ledgeable you need to give each node a unique id.

To accomplish this make a static variable for the class Node.


class Node
{
.
.
static int node_id;
.
.
}
In your node.cpp file outside any functions, provide the storage for node_id:

/* static */ int Node::node_id = 0;
Now in your constructors for B_root and B_node:

    m_id = ++Node::node_id
This gives each node successive id's in the order that they are created. You must do this to match my output files.

Step1:

Start the project by making the following files:

node.h
node.cpp
bnode.h
bnode.cpp
broot.h
broot.cpp
btree.h
btree.cpp
main.cpp
Makefile

Inheritance:

         <node>
        /      \
       /        \
<broot_node>  <bnode>

Make the folloing member functions virtual:
split()
combine()
~destructors()
Use 'touch' to make empty files. Then create your Makefile. So what you should have is a project that compiles with the make command. From now on you must keep your project in a state that compiles. This will not be easy since you will be using inheritance.

The VAST majority of your time will be spent DEBUGGING. You must get past syntax/compiler gotchas this week. Start early and seek help immediately.

Step2:

Get main finished. It is real simple. Make a loop that reads in the character into a switch statement..... c << cin switch(c) case 'A': data << cin ......

NOTE: use cin.ignore(1000, '\n'); after your first spot a comment. For some system cin.ignore('\n') seems to ignore the rest of the input.

Make sure that you read in the order as the first line. Print out a warning to yourself if you forget the order. I will NOT be testing for this.

Much like the previous assignment add the public interface functions to your btree.h/btree.cpp files that main will call. They take in a int and return a boolean.

Step3:

Try to understand the search and print algorithms first. They are very similar. You must understand how these work in order to decipher the output of the program.

Step4:

Now you need to implement Kruse's search_within_node function. Understand what it does and how it works.

In order to solve this program you will be making heavy use of "output parameters". It is essential you understand what these are.

We will discuss search_within_node() in lab this week. So be there.

Step4:

Now that you have search, search_within_node, fill out the constructors, and lay down a skeleton for the insert function.

Get your node.print() to output what my sample program outputs. It is harder to do than you might think.

Step5:

To enable splits you need a whole suite of helper functions. We will discuss this in a session outside of lab this week if time permits.

Consider this

You cannot really bootstrap this program until you have both a print and some portion of insert working. Verbose print will be necessary for you to debug insert.

Unfortunately you cannot use Kruse's btree insert. B*tree splitting is fundamentally very different.

You will not be able to find anything online that will help you understand the algorithm for B*trees.

Templates for this assignment would make this assign too hard. They add a lot of extra typeing too. Not to mention how many gotchas you will encounter.

Debugging

Compiler:

First you will want to turn on compiler warnings:

Makefile:
btree: main.o btree.o broot.o bnode.o node.o
	g++ -g -Wall -o main.o btree.o broot.o bnode.o node.o

main.o: main.cpp btree.o
    g++ -g -Wall -c main.cpp
.
.
.
node.o: node.h node.cpp
    g++ -g -Wall -c node.cpp

Debugger:

When you encounter segmentation faults don't panic. You will get many of these. They are just minor annoyances.

What to do when you get one:


$btree < tests/t01
Segmentation fault
$
Start the debugger, gdb on your file like so:

$gdb btree
Now you are inside the dugger. The prompt is (gdb)

(gdb)
use the run command of the debugger. Pass in the usual arguements such as which test to use as input:

(gdb) run < tests/t01
.
normal output....
.
Program received signal SIGSEGV, Segmentation fault.
0x0804b4e4 in Node::print (this=0x0) at node.cpp:9
Now sometimes the line number and file are useful and sometimes not. In this case the source file where the segmentation fault occured is node.cpp, the line number is 9.

Notice that the this pointer is NULL (0x0 is Hexidecimal for 0). To find out more about how Node::print() was called on a NULL pointer use the where command to observe the runtime stack:

(gdb) where
#0  0x0804b4e4 in Node::print (this=0x0) at node.cpp:9
#1  0x0804bc95 in Node::push_in (this=0x8050008, new_value=1, new_pointer=0x0, position=0)
    at node.cpp:153
#2  0x080494e0 in B_tree::push_down (this=0xbffa7d0c, node=0x8050008, new_value=1,
    insert_value=@0xbffa7cc8, insert_branch=@0xbffa7ccc) at btree.cpp:193
#3  0x08049bb2 in B_tree::insert (this=0xbffa7d0c, new_value=1) at btree.cpp:22
#4  0x08048cec in main (argv=0x1, argc=-1074102860) at main.cpp:41

runtime stack:
==================
Node::print()
Node::push_in()
B_tree::push_down()
B_tree::insert()
main()
Now we are talking! Files, line numbers. So Node::push_in() called print() on a null pointer from line 153. This is usually enough to figure out what happened.

If you need more info use the frame command to dive into a function on the stack to query the values of variables:

(gdb) frame 1

(gdb) print variable
$1 = (Node*) 0x0

Heavy Duty debugging: valgrind

NEW!! Valgrind is installed on jaguar under: /opt/valgrind

When you use arrays extensively in a sizeable program eventually you will get a corruption in the program that seemingly comes from nowhere.

These are most likely caused by index out of bounds bugs with your arrays. This occurs when in a statement like: m_data[i] = 0; the variable i does not have the value you think it does. If i is larger than the size of the array then somewhere in your program you overwrote 4 bytes with zero.

Even if you know what function is probably doing it, you will most likely NEVER find which line is doing it. Valgrind is a memory profiler. Run your program in it:


$/opt/valgrind --tool=memcheck --leak-check=yes btree < tests/tXX > valgrind.out 2>&1
Now:

$less valgrind.out
Skip past the first 100 lines or so.

Look for lines that say:

==(process id)==  Invalid write of size 4 (int is 4 bytes)
NEXT LINE WILL TELL YOU THE LINE IN THE SOURCE CODE THAT IS INDEXING OUT OF BOUNDS!!

Late assignments:

I will not be accepting late assignments.

What I want from you is to turn in what you have.

Please add a README to your turn in directory explaining what isn't working. This is not required if you have everything working.

If you don't include a README I will not give you partial credit. I probably won't give you partial credit anyway. However if can tell me how to fix what is broken with your program I would give you the points for that test.

"Turn off" any portions of your code that are broken so that what I get from you will compile.

DO NOT TURN IN NOTHING!! On the last assignment several of you turned in nothing. 50% of the points on p1 came from having a functioning BINARY TREE. About an hour's worth of effort!! There is no such reprieve on this assignment.