CSCI 340: Operating Systems
Fall 2004
Program 1: UNIX find utility
Code due: Midnight Tuesday September 13
<== new time announced in class on 9/7   (copied to your ecst turn-in directory, see below)  
Grading Weight: 1 (programs will have a weight between 1 - 3)


Overview:
The goal of this program is to introduce you to UNIX system calls.  The program is simple (only about 150 lines of code), but you will need to learn about several system calls, and the syntax of system calls can be tricky.

Implement a rudimentary version of the UNIX find utility.  find searches a directory tree looking for a given filename.  The command line arguments should be a pathname followed by a target:

% find <pathname> <target>

For example,

% find /home/tyson  p1.html

will recursively search /home/tyson and all its subdirectories.  When it finds a file with the name "p1.html" it will print to standard output (cout) the path for that file (relative to the path specified on the command line):

/home/tyson/classes/15b.f05/public/p1.html
/home/tyson/classes/340.f05/public/p1.html

Another example,

% find foo a.cpp

would print paths of the form:

foo/a.cpp
foo/my_directory/a.cpp

Part 1 (85%): get find working w/o regular expressions, that is, the target is just a string.
Part 2 (10%): get find working w/regular expressions, that is, the target is a regular expression.
Part 3 (5 %): modify your find so that it does not follow symbolic links.

Program Requirements:

Consider the entries in a directory in the order given by readdir().  In other words, if you use readdir() to read the directory, your program will order the matches correctly.

Use the regcomp() and regexec() to implement regular expressions.

Implement your program so that the given regular expression only matches entire filenames or directory names.  For example, "x" should match the directory /blah/blah/x but not the directory /blah/blah/xx.  In other words, when the user enters the regular expression "exp", convert it into the expression "^exp$"

If a directory matches the target, print out the directory BEFORE you search the directory for matches (this is how the Unix find works).

Your program must output the same errors as my program so that I can grade the programs automatically.  Here are my error messages (you may copy this code into your program):

// from main(), these errors all have to do with bad input
// all of these errors cause the program to give up (that is, call exit(1))
cerr << "find: must specify path and target" << endl; // use this if there are not the right number of arguments
cerr << "find: empty path specified" << endl; // use this if some pinhead passes empty quotes as the path
cerr << "find: empty target specified" << endl; // use this is some pinhead passes empty quotes as the target
cerr << "find: permission denied <" << pathname << ">" << endl;
cerr << "find: directory does not exist <" << pathname << ">" <<endl;
cerr << "find: path is not a directory <" << pathname << ">" <<endl;
cerr << "find: error opening <" << pathname << ">" << endl;

// from my recursive search function, these errors do not stop the program
// if opendir() fails because of EACCES error
cerr << "find: permission denied <" << pathname << ">" << endl;
// if opendir() fails for any other reason, you can write your own message, I won't test this

// if stat() fails for any reason
cerr << "find: cannot stat <" << full_path << ">" << endl;



Testing Your Program:

I will provide sample tests for your program in the tests/p1 directory.

Each test consists of a directory (t01, and t01.tar), a shell script to execute your program (t01.cmd), the correct output (t01.out), and the correct error output (t01.err).  I will test your programs as follows:

% cd to student's directory
% ~tyson/340/tests/p1/t01.cmd > t01.out 2> t01.err
% diff t01.out ~tyson/340/tests/p1/t01.out
% diff t01.err ~tyson/340/tests/p1/t01.err

If there are any differences between the output of your program and the posted correct output, diff will print the difference, you fail this particular test, and you will lose points.  Thus the output of your program must be identical to the output of my program.

I will use tests that I don't post, so it is a good idea to develop some of your own tests.  My goal will be to exercise all the possible errors.

If you are programming using windows, watch out for the extra hidden character at the end of each line (the DOS standard is different than the UNIX standard).  I will only post UNIX files.  However, you can convert your DOS files to UNIX files using the command dos2unix.


Hints:

When working with system calls, it is very helpful to program incrementally.  Get a little piece of the program working and tested before you move on.

The logic of this program is simple, f
iguring how to use the system calls can be tedious.

It is much easier to implement this program recursively.  Specifically, write a function that searches a directory, and when you find a directory inside of the one currently being searched, call the search function:

search(directory, target)
{
 for all files in this directory
    if current file is a directory
       search(new_directory, target)
}

Directories can be open using opendir() and the files/directories in the directory can be extracted using readdir().

Directories can be closed using closedir().

The function stat() and the macro S_ISDIR() can be used to determine if a file is a file or a directory.  There is a similar function and macro that can be used to determine if a directory is a symbolic link (I'll leave it to you to find out what they are).

The functions regcomp() and regexec() can be used to implement regular expressions.  Write a small program so that you can learn how regcomp() and regexec() work before you put them in your find program.  Make sure your find program works without regular expressions (use strcmp()) before you add them.  A student found the following introduction to regular expressions.

The following c-style string functions maybe helpful:  strcpy(), strcat(), and strcmp().

In the tests that I post, the order of the output depends on the order that files are visited by readdir().  On some versions of readdir() the order is different than those I post.  I generate the output using Fedora Core 3.0.  This is the same order as Solaris, but different than Free BSD based UNIX.  While I will not take off points for machine dependency issues, it will make your life and my job easier if you test your program on a Linux (e.g. cheetah.ecst.csuchico.edu) or Solaris (e.g. tiglon.ecst.csuchico.edu) machine.


General Requirements:
I will deduct 10 percent if your program does not compile using the command "g++ -o find find.cpp" on a Linux machine. 

I will grade your program using another program, so if your program does not work exactly as specified you will lose points.  For example, if you put an extra space at the end of the line, or a blank line at the end of the output, you will lose points.  Test your program using the sample tests in the test directory (see above).

Please put your name at the top of all your files.  If you are working with someone, put both your names at the top of all files. 


How to turn in code:
You must turn in the following file:

find.cpp

copy the above file into the directory:
/user/projects/csci340/p1/USERNAME
where USERNAME is your ecst username.

If you are working with someone, only turn in one copy of the assignment.  It is very helpful is people working together always turn their assignments in under the same name.

Notes:   You will not be able to access your turn in directory once the deadline has passed.



Late assignments:
You will lose 10% for each 24 hours your assignment is late.

e-mail late assignments to me. 

It may take me much, much longer to grade late assignments (if I have already graded the other assignments, grading late assignments gets a low priority).

I use the time I receive your e-mail, not the time you send your e-mail as the turn in time (sometimes e-mail from an ISP takes a day or two).  If you want to be absolutely sure your assignment gets to me immediately, e-mail it from your ecst account (use your browser (webmail.ecst.csuchico.edu) to e-mail the file to me).