CSCI-515 Term Project
Background:
The CSMCC is implemented with the UNIX compiler utilities lex and yacc. Lex converts source programs into token streams, yacc uses productions to check the syntax of programs, and semantic routines embedded in the productions perform support actions and emit code.
Program Components:
makefile This is the file to invoke separate compilation. When all of the files below are correct, typing:
make
compiles all routines and invokes lex and yacc. If make is successful, an executable file called compile is produced. If there are unwanted conflicts in the compiler's productions, the file y.output is also generated. CSMCC programs are compiled by typing:
compile < t.c
where < redirects input and t.c is a properly configured CSMCC program. The successful compilation of an input file causes a compilation listing and the program's output to appear on the screen and produces the files:
1. output, the program's output
2. assy, the program's assembly language listing.
lex.l The lex input file. It contains regular expressions for each token the scanner is to recognize. When a properly configured file with the .l suffix is lex'ed, such as:
lex lex.l
C code for a scanner is produced with the file name lex.yy.c. Yacc'ing c.y with the -d option also produces a file called y.tab.h that assigns a number to each token for coordination between the scanner and the yacc routines.
c.y This is the yacc input file. It contains the productions to verify program syntactic correctness and has embedded routines to add semantic meaning to the program. c.y also contains code that will become the compiler's main routine. When a properly configured file with the .y suffix is yacc'ed, such as:
yacc c.y
C code for a parser is produced with the file name y.tab.c.
c.h The .h file for c.y (i.e., y.tab.c).
symbol.c The compiler's symbol table routines.
math.c Code for mathematical functions.
init.c Initialization routines for keywords and pre-compilation symbol table entries.
code.c An interpreter for compiled code.
input The program's input file. The file input must be present during compilation.
Current CSMCC Capabilities/Limitations:
1. Recognizes only integer and floating point data types.
2. Determines types implicitly (through use) rather than explicitly (through declarations).
3. Uses the following operators in expressions: binary +, -, *, /, ^ (exponentiation), unary -, certain mathematical (built-in) functions, and relational operators.
4. Implements the following at the statement-level: assignment, if-then, if-then-else, pre-loop-test while, ? :, printf, and scanf.
5. Does not recognize: comments, pointers, structures, typedefs, the for statement, the opening of datafiles for read and write, explicit declarations, the comma expression, the use of any statements at the expression-level, or the use of expressions as components of statements.
Program Requirements:
MINIMUM REQUIREMENTS FOR PASSING (D, C-, C, or C+) GRADE
1. Implement comma expressions (e.g., a = b,c,d;).
2. Implement variable names that can contain underscores.
3. Implement the post-test while statement (do-while).
4. Implement the for statement.
5. Extend ? : so it can be used at the expression-level and with expressions as components (e.g., x = b>=0?y:z;).
6. Extend assignment so it can be used at the expression-level (e.g., x = y = 49;).
7. Implement pre- and post-increment operators as statements (e.g., a++;) and expressions (e.g., b = ++a;). Note that implementation of pre- and post-decrement is not required.
8. Keep compilation listing and assy file up to date reflecting all changes.
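As a reference point for testing requirements 1 and 3 through 7, the following sketch shows how the same constructs behave in standard C. It assumes CSMCC is meant to match standard C here, which this handout does not state outright. The function names are hypothetical and exist only to isolate each construct; a CSMCC test program would place the statements directly in main().

```c
/* Requirement 1: in standard C, "a = b, c, d;" groups as "(a = b), c, d;"
   because assignment binds tighter than the comma operator, so a gets b. */
int comma_value(int b, int c, int d)
{
    int a;
    a = b, c, d;
    return a;
}

/* Requirement 3: a do-while body always runs at least once. */
int do_while_count(int n)
{
    int count = 0;
    do {
        count++;
    } while (count < n);
    return count;
}

/* Requirement 4: the for statement, summing 1..n. */
int for_sum(int n)
{
    int i, sum = 0;
    for (i = 1; i <= n; i++)
        sum = sum + i;
    return sum;
}

/* Requirement 5: ? : used as an expression with a relational test. */
int conditional_value(int b, int y, int z)
{
    int x;
    x = b >= 0 ? y : z;
    return x;
}

/* Requirement 6: assignment is right-associative, so y is set first
   and its value is then assigned to x. */
int chained_assign(void)
{
    int x, y;
    x = y = 49;
    return x + y;
}

/* Requirement 7: pre-increment yields the new value ... */
int pre_increment(int a)
{
    int b = ++a;
    return b;
}

/* ... while post-increment yields the old value. */
int post_increment(int a)
{
    int b = a++;
    return b;
}
```

For instance, comma_value(1, 2, 3) returns 1, do_while_count(0) returns 1, and post_increment(4) returns 4 while pre_increment(4) returns 5. Some compilers warn that the values of c and d are unused in the comma expression; that warning is expected for this example.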
ADDITIONAL REQUIREMENTS FOR IMPROVED (B-, B, B+) GRADE
9. Implement real and integer declarations as follows:
a. Declarations are only allowed at the top of the main() block, not in sub-blocks (other than for loop sub-blocks; see 12 below).
b. Enter type of variable in symbol table at declaration productions. Emit a compile-time dual declaration error if the variable already has a defined type.
c. Check for variable type in executable statements. Emit a compile-time undeclared variable error if the type is undefined.
10. Implement type casting for integer/float variables (e.g., (int)a, where a is declared as a float, and (float)b, where b is declared as an int).
11. Implement type checking where types cannot be mixed (e.g., report an error for a = b + c where a and c are float type and b is int type). Note that a = (float)b + c is O.K.
12. Allow C++ style for loop variable declarations (e.g., for(int i = 1; i < 10; i++) {...}).
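The checks in requirements 9b, 9c, and 11 can be sketched as a few symbol table helpers. This is only an illustration under stated assumptions: the names sym_type, declare, use_variable, and check_binary are hypothetical, and the actual interface in symbol.c may look quite different. In the real compiler these calls would be made from the semantic routines embedded in the productions of c.y.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout; the real symbol.c interface may differ. */
enum sym_type { T_UNDEFINED, T_INT, T_FLOAT };

struct symbol {
    char name[32];
    enum sym_type type;
};

#define MAX_SYMBOLS 64
static struct symbol table[MAX_SYMBOLS];
static int symbol_count = 0;

/* Find a symbol by name, or return NULL if it has never been entered. */
static struct symbol *lookup(const char *name)
{
    int i;
    for (i = 0; i < symbol_count; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Requirement 9b: record the type at a declaration production.
   Returns 0 on success, -1 on a dual declaration or a full table. */
int declare(const char *name, enum sym_type type)
{
    struct symbol *s = lookup(name);
    if (s != NULL && s->type != T_UNDEFINED) {
        fprintf(stderr, "error: dual declaration of %s\n", name);
        return -1;
    }
    if (s == NULL) {
        if (symbol_count == MAX_SYMBOLS)
            return -1;
        s = &table[symbol_count++];
        strncpy(s->name, name, sizeof s->name - 1);
        s->name[sizeof s->name - 1] = '\0';
    }
    s->type = type;
    return 0;
}

/* Requirement 9c: fetch the type of a variable used in an executable
   statement, reporting an error if it was never declared. */
enum sym_type use_variable(const char *name)
{
    struct symbol *s = lookup(name);
    if (s == NULL || s->type == T_UNDEFINED) {
        fprintf(stderr, "error: undeclared variable %s\n", name);
        return T_UNDEFINED;
    }
    return s->type;
}

/* Requirement 11: both operands of a binary operator must have the same
   type. Returns the result type, or T_UNDEFINED after reporting an error.
   A cast (requirement 10) would simply supply T_INT or T_FLOAT directly. */
enum sym_type check_binary(enum sym_type left, enum sym_type right)
{
    if (left == T_UNDEFINED || right == T_UNDEFINED)
        return T_UNDEFINED;
    if (left != right) {
        fprintf(stderr, "error: mixed types in expression\n");
        return T_UNDEFINED;
    }
    return left;
}
```

With a and c declared as T_FLOAT and b as T_INT, check_binary(use_variable("b"), use_variable("c")) reports the mixed-type error from requirement 11, while check_binary(T_FLOAT, use_variable("c")) models (float)b + c and is accepted.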
OPTIONAL REQUIREMENTS TO DO VERY WELL (A-, A)
13. Implement goto statements branching to a label (e.g.,
...
...
goto a;
b: ...
...
or
...
a: ...
goto b;
...
...).
14. Implement binary and (&&) and or (||) operators. For example, (a>b||c==d).
15. Implement (non-heap) pointers, e.g.,
int *i, j, *k;
j = 47;
i = &j;
i = k;
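Requirement 13 allows a goto to appear before its label, so the target address is not always known when the jump is emitted. One common way to handle this is backpatching, sketched below. This is not necessarily how CSMCC's code emission works: the jump_target array, the label table, and the functions emit_statement, emit_goto, and define_label are all hypothetical stand-ins for whatever the semantic routines in c.y actually emit.

```c
#include <string.h>

#define MAX_CODE   256
#define MAX_LABELS 32
#define UNRESOLVED (-1)

/* Hypothetical model of emitted code: jump_target[i] is the target address
   of the jump at address i; slots holding other instructions are unused. */
static int jump_target[MAX_CODE];
static int next_addr = 0;

struct label {
    char name[32];  /* names longer than 31 characters are truncated */
    int  addr;      /* address the label marks, or UNRESOLVED */
    int  pending;   /* most recent jump still waiting on it, or UNRESOLVED */
};

static struct label labels[MAX_LABELS];
static int label_count = 0;

/* Return the entry for name, creating it on first sight; NULL if full. */
static struct label *find_label(const char *name)
{
    int i;
    for (i = 0; i < label_count; i++)
        if (strcmp(labels[i].name, name) == 0)
            return &labels[i];
    if (label_count == MAX_LABELS)
        return NULL;
    strncpy(labels[label_count].name, name, sizeof labels[0].name - 1);
    labels[label_count].name[sizeof labels[0].name - 1] = '\0';
    labels[label_count].addr = UNRESOLVED;
    labels[label_count].pending = UNRESOLVED;
    return &labels[label_count++];
}

/* Emit any non-jump instruction; returns its address, or -1 if out of room. */
int emit_statement(void)
{
    return next_addr < MAX_CODE ? next_addr++ : -1;
}

/* Emit "goto name"; returns the jump's address, or -1 on overflow. */
int emit_goto(const char *name)
{
    struct label *l = find_label(name);
    int addr;
    if (l == NULL || next_addr >= MAX_CODE)
        return -1;
    addr = next_addr++;
    if (l->addr != UNRESOLVED) {
        jump_target[addr] = l->addr;     /* backward jump: target known */
    } else {
        jump_target[addr] = l->pending;  /* forward jump: link into chain */
        l->pending = addr;
    }
    return addr;
}

/* Handle "name:"; returns the label's address, or -1 if the label was
   already defined or the label table is full. */
int define_label(const char *name)
{
    struct label *l = find_label(name);
    if (l == NULL || l->addr != UNRESOLVED)
        return -1;
    l->addr = next_addr;
    while (l->pending != UNRESOLVED) {   /* patch every waiting jump */
        int next = jump_target[l->pending];
        jump_target[l->pending] = l->addr;
        l->pending = next;
    }
    return l->addr;
}
```

Each unresolved jump temporarily stores the address of the previous unresolved jump to the same label, so no separate list is needed. At the end of compilation, any label whose addr is still UNRESOLVED was used in a goto but never defined, which would be a natural place to report a compile-time error.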
Although some changes to the file code.c are necessary, do not make any changes unless they have been cleared in advance by the instructor. Some changes are necessary to support explicit declarations and to implement the && and || operators. You will find those changes in the file new_code.c.
Submission Procedures:
You may work in groups of up to 3, but if you represent your work as group work MAKE SURE THAT IT IS IN FACT GROUP WORK.
Submit hard copies of all modified source code files and sample output for all program requirements that were implemented. Identify clearly those requirements that were either not implemented or which are not fully functional.
After the hard copy is submitted, schedule a time to demonstrate the project to the instructor during the last week of classes. If you completed your project on the UNIX workstations, you can demonstrate your project at the instructor's machine. If you completed it on a PC, bring your project files to the demonstration on a 3.5 inch floppy disk. Note that distance learning students may e-mail their project and are not required to appear in person.