Background:
The Chico State Mini-C Compiler (CSMCC) is a student training load-and-go
compiler. The source language is a subset of C and the target language is a
stack-based assembly language. CSMCC is implemented with the UNIX compiler
utilities lex and yacc. Lex converts source programs into token streams, yacc
uses productions to check the syntax of programs, and semantic routines
embedded in the productions perform support actions and emit code.
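As a rough illustration of what a stack-based target means, the sketch below
runs a tiny stack program for the expression 2 + 3 * 4. The opcode names, the
instr layout, and the run function are assumptions made for this example only;
the actual CSMCC instruction set is whatever code.c defines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; CSMCC's real instruction set is defined in code.c. */
enum op { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

struct instr {
    enum op op;
    double  arg;                        /* used only by OP_PUSH */
};

/* Run a program on an operand stack and return the single value left. */
double run(const struct instr *prog)
{
    double stack[64];
    size_t sp = 0;                      /* number of values on the stack */

    for (size_t pc = 0; prog[pc].op != OP_HALT; pc++) {
        switch (prog[pc].op) {
        case OP_PUSH:
            assert(sp < 64);
            stack[sp++] = prog[pc].arg;
            break;
        case OP_ADD:                    /* pop two operands, push the sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:                    /* pop two operands, push the product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:                   /* never reached; the loop stops first */
            break;
        }
    }
    assert(sp == 1);
    return stack[0];
}

/* Code a compiler might emit for the expression 2 + 3 * 4. */
const struct instr example[] = {
    { OP_PUSH, 2 }, { OP_PUSH, 3 }, { OP_PUSH, 4 },
    { OP_MUL, 0 }, { OP_ADD, 0 }, { OP_HALT, 0 }
};
```

Operands are pushed in source order and each operator is emitted after both of
its operands, which is why the multiply appears before the add.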
Program Components:
makefile
This is the file to invoke separate compilation. When all of the files below
are correct, typing:
make
compiles all routines and invokes lex and yacc. If make is successful, an
executable file called compile is produced. If there are unwanted conflicts
in the compiler's productions, the file y.output is also generated. CSMCC
programs are compiled by typing:
compile < t.c
where < redirects input and t.c is a properly configured CSMCC program.
A successful compilation prints a compilation listing and the program's output
on the screen and produces two files: output, which holds the program's
output, and assy, which holds the program's assembly language listing.
lex.l
The lex input file. It contains regular expressions for each token the
scanner is to recognize. When a properly configured file with the .l suffix
is lex'ed, such as:
lex lex.l
C code for a scanner is produced with the file name lex.yy.c. The token
numbers the scanner returns come from y.tab.h, a file produced by yacc'ing the
grammar file with the -d option; it assigns a number to each token for
coordination with the yacc routines.
c.y
This is the yacc input file. It contains the productions to verify program
syntactic correctness and has embedded routines to add semantic meaning
to the program. c.y also contains code that will become the compiler's main
routine. When a properly configured file with the .y suffix is yacc'ed, such
as:
yacc c.y
C code for a parser is produced with the file name y.tab.c.
c.h
The .h file for c.y (i.e., y.tab.c).
symbol.c
The compiler's symbol table routines.
math.c
Code for mathematical functions.
init.c
Initialization routines for keywords and pre-compilation symbol table entries.
code.c
An interpreter for compiled code.
input
The program's input file. The file input must be present during compilation.
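The handout does not show the symbol table routines themselves. As a minimal
sketch of the kind of lookup and install routines a file like symbol.c usually
provides, consider the following. The struct layout, the table limits, and the
function names are assumptions for illustration, not the contents of symbol.c.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMBOLS 128
#define MAX_NAME    32

/* Hypothetical entry layout; the real one lives in symbol.c and c.h. */
struct symbol {
    char   name[MAX_NAME];
    double value;
};

static struct symbol table[MAX_SYMBOLS];
static int symbol_count = 0;

/* Return the entry for name, or NULL if it has not been installed. */
struct symbol *lookup(const char *name)
{
    for (int i = 0; i < symbol_count; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Return the existing entry for name, creating a zeroed one if needed. */
struct symbol *install(const char *name)
{
    struct symbol *s = lookup(name);
    if (s != NULL)
        return s;
    assert(symbol_count < MAX_SYMBOLS);
    assert(strlen(name) < MAX_NAME);
    s = &table[symbol_count++];
    strcpy(s->name, name);
    s->value = 0.0;
    return s;
}
```

A linear search keeps the sketch short; a hash table would be the usual choice
once programs grow beyond a handful of identifiers.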
Current CSMCC Capabilities/Limitations:
1. Recognizes only integer and floating point data types.
2. Determines types implicitly (through use) rather than explicitly (through
declarations).
3. Uses the following operators in expressions: binary +, -, *, /,
^ (exponentiation), unary -, certain mathematical (built-in) functions, and
relational operators.
4. Implements the following at the statement-level: assignment, if-then,
if-then-else, pre-loop-test while, ? :, printf, and scanf.
5. Does not recognize: comments, pointers, structures, typedefs, the for
statement, the opening of datafiles for read and write, explicit declarations,
the comma expression, the use of any statements at the expression-level, or
the use of expressions as components of statements.
Program Requirements:
MINIMUM REQUIREMENTS FOR PASSING (D, C-, or C) GRADE
1. Implement comma expressions (e.g., a = b,c,d;).
2. Implement the post-test while statement (do while).
3. Implement the for statement.
4. Extend ? : so it can be used at the expression-level and with expressions
as components (e.g., x = b>=0?y:z;).
5. Extend assignment so it can be used at the expression-level
(e.g., x = y = 1;).
6. Implement pre and post increment operators as statements (e.g., a++;) and
expressions (e.g., b = ++a;).
7. Keep compilation listing and assy file up to date reflecting all changes.
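The difference between the existing pre-test while and the post-test loop of
requirement 2 shows up mainly in where the test and the jumps are emitted. The
sketch below is one possible layout, not the required one. The mnemonics jz,
jnz, and jmp, the label format, and the helper names are assumptions; in a
yacc grammar the body and condition would be emitted by nested productions
rather than passed in as strings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char out[512];          /* emitted assembly text */
static int  next_label = 0;

/* Append one line of assembly to the output buffer. */
static void emit(const char *line)
{
    assert(strlen(out) + strlen(line) + 2 <= sizeof out);
    strcat(out, line);
    strcat(out, "\n");
}

/* Emit "L<n>:" to mark a jump target. */
static void emit_label(int n)
{
    char buf[32];
    snprintf(buf, sizeof buf, "L%d:", n);
    emit(buf);
}

/* Emit a jump instruction (the mnemonic is hypothetical) to label n. */
static void emit_jump(const char *mnemonic, int n)
{
    char buf[32];
    snprintf(buf, sizeof buf, "%s L%d", mnemonic, n);
    emit(buf);
}

/* Post-test loop: the body runs once before the first test. */
void gen_do_while(const char *body, const char *cond)
{
    int top = next_label++;
    emit_label(top);
    emit(body);
    emit(cond);                 /* leaves the test result on the stack */
    emit_jump("jnz", top);      /* loop again while the result is nonzero */
}

/* Pre-test loop: the test runs first, so the body may never run. */
void gen_while(const char *body, const char *cond)
{
    int top = next_label++;
    int end = next_label++;
    emit_label(top);
    emit(cond);
    emit_jump("jz", end);       /* leave the loop when the result is zero */
    emit(body);
    emit_jump("jmp", top);
    emit_label(end);
}
```

The post-test form needs only one label and one conditional jump, because
control falls out of the bottom of the loop when the test fails.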
ADDITIONAL REQUIREMENTS FOR IMPROVED (B) GRADE
8. Implement real and integer declarations as follows:
a. Declarations are only allowed at the top of the main() block, not in
sub-blocks (other than for loop sub-blocks; see 11 below).
b. Enter the type of a variable in the symbol table at its declaration. Emit a
compile-time dual declaration error if the variable already has a defined
type.
c. Check the variable's type in executable statements. Emit a compile-time
undeclared variable error if the type is undefined.
9. Implement type casting for integer/float variables (e.g., (int)a, where a
is declared as a float, and (float)b, where b is declared as an int).
10. Implement type checking where types cannot be mixed (e.g., report an error
for a = b + c where a and c are float type and b is int type). Note that
a = (float)b + c is O.K.
11. Allow C++ style for loop variable declarations
(e.g., for(int i = 1; i < 10; i++) {...}).
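The checks in requirements 8 and 10 can be sketched as follows. This is a
minimal illustration under stated assumptions: the enum names, the vars table,
and the functions declare, type_of, and check_binary are all hypothetical and
are not part of the supplied CSMCC sources.

```c
#include <assert.h>
#include <string.h>

enum type   { T_UNDEF, T_INT, T_FLOAT };
enum status { OK, ERR_DUAL_DECLARATION, ERR_UNDECLARED, ERR_TYPE_MISMATCH };

#define MAX_VARS 64

/* Hypothetical table of declared variables. The name pointers are stored
   as given, so the caller must keep the strings alive. */
static struct { const char *name; enum type type; } vars[MAX_VARS];
static int var_count = 0;

/* Return the declared type of name, or T_UNDEF if it was never declared. */
enum type type_of(const char *name)
{
    for (int i = 0; i < var_count; i++)
        if (strcmp(vars[i].name, name) == 0)
            return vars[i].type;
    return T_UNDEF;
}

/* Requirement 8b: record the type, rejecting a second declaration. */
enum status declare(const char *name, enum type t)
{
    if (type_of(name) != T_UNDEF)
        return ERR_DUAL_DECLARATION;
    assert(var_count < MAX_VARS);
    vars[var_count].name = name;
    vars[var_count].type = t;
    var_count++;
    return OK;
}

/* Requirements 8c and 10: both operands must be declared and of one type. */
enum status check_binary(enum type left, enum type right)
{
    if (left == T_UNDEF || right == T_UNDEF)
        return ERR_UNDECLARED;
    return left == right ? OK : ERR_TYPE_MISMATCH;
}
```

Under this scheme a cast such as (float)b simply changes the type handed to
check_binary for that operand, so a = (float)b + c passes while a = b + c is
rejected.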
OPTIONAL REQUIREMENTS TO DO VERY WELL (A)
12. Implement goto statements branching to a label, e.g.:
...
...
goto a;
b: ...
...
or
...
a: ...
goto b;
...
...
13. Implement += and *= operators.
14. Implement (non-heap) pointers, e.g.:
int *i, j, *k;
j = 47;
i = &j;
i = k;
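For requirement 12, a goto may appear before its label is defined, so the jump
target is not always known when the jump is emitted. One common way to handle
this is backpatching, sketched below. Everything here is an assumption for
illustration: labels are limited to single letters, the code array stores only
jump targets, and the function names are hypothetical.

```c
#include <assert.h>

#define MAX_CODE   256
#define MAX_LABELS 26          /* single-letter labels a..z, for brevity */
#define UNRESOLVED (-1)

static int code[MAX_CODE];     /* for a jump at address i, code[i] is its target */
static int code_len = 0;

static int label_addr[MAX_LABELS];   /* address of each label, once defined */
static int pending[MAX_LABELS];      /* most recent unpatched jump per label */

void init_labels(void)
{
    code_len = 0;
    for (int i = 0; i < MAX_LABELS; i++) {
        label_addr[i] = UNRESOLVED;
        pending[i] = UNRESOLVED;
    }
}

/* Stand-in for emitting any instruction that is not a jump. */
void gen_other(void)
{
    assert(code_len < MAX_CODE);
    code[code_len++] = 0;
}

/* Emit a jump to label and return the address of the jump. */
int gen_goto(char label)
{
    int l = label - 'a';
    assert(l >= 0 && l < MAX_LABELS && code_len < MAX_CODE);
    int at = code_len++;
    if (label_addr[l] != UNRESOLVED) {
        code[at] = label_addr[l];    /* backward jump: target already known */
    } else {
        code[at] = pending[l];       /* forward jump: link into the chain */
        pending[l] = at;
    }
    return at;
}

/* Define label at the current address and patch every waiting jump. */
void gen_label(char label)
{
    int l = label - 'a';
    assert(l >= 0 && l < MAX_LABELS);
    assert(label_addr[l] == UNRESOLVED);   /* a label may be defined once */
    label_addr[l] = code_len;
    for (int at = pending[l]; at != UNRESOLVED; ) {
        int next = code[at];         /* follow the chain before overwriting */
        code[at] = code_len;
        at = next;
    }
    pending[l] = UNRESOLVED;
}
```

Any label whose pending entry is still set at the end of compilation was used
in a goto but never defined, which is a natural place to report an error.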
Although some changes to the file code.c are necessary, do not make any
changes unless they have been cleared in advance by the instructor. Some
changes will be necessary to support explicit declarations. You will find
those changes in the file new_code.c.