Compiled by Ben Steel for CSCI 250
See also: Yacc Error Handling and Recovery
Both Lex and YACC generate C code that can be compiled into a parser. The C function created by Lex reads input one character at a time and breaks it into bite-size chunks for the grammar expressed in YACC. The C program created by YACC is essentially a state machine that steps through the grammatical rules laid down by the user in the .y file. To decide which step to take next, the YACC parser repeatedly calls the function that Lex created (yylex), which reads input characters until it has recognized a useful "chunk." The Lex function then places the actual text it read into the string yytext and returns to the calling YACC program a "token" describing what kind of chunk was encountered. For example, if English were being read and interpreted, Lex might read the word "dog" and return the token NOUN to the calling YACC program. The original text "dog" would be held in yytext for YACC's use.
The following sample of Lex code shows how lex can be used to partition input text characters into meaningful "chunks."

one.l
The line preceding the first "%%" is a definition, which is used in the rules section that follows the "%%". When two or more rules match the input, Lex chooses the rule that matches the longest run of characters; if the matches are the same length, it obeys the rule listed first. The bottom two lines demonstrate the first point: whenever the lex program reads a "+", it checks whether the next character is another "+" before responding, so "++" is recognized as a single increment operator rather than as two binary additions (+) in a row.
To run the above example, type it into a file called one.l. Then create the C code file lex.yy.c with the command:
lex one.l
Then, compile it into a working program called one with the command:
cc lex.yy.c -ll -o one
The -ll option links the lex library.
As you run the resulting program, you'll notice that it doesn't interpret your keystrokes until you strike the return key. When you tire of experimenting with this parser, hold down the control key and type "c" to end it. A sample run is shown below.
$ ./one
sdlj
tag, value sdlj
2537
decimal number 2537
kslk345
tag, value kslk
decimal number 345
<ctrl-c>
The last entry, "kslk345," shows how lex segments incoming text into bite-sized tokens for the yacc grammar.
Lex patterns are built from the following expressions:
| Expression | Meaning |
|---|---|
| x | The character "x" |
| "x" | An "x", even if x is an operator. |
| \x | An "x", even if x is an operator. |
| [xy] | The character x or y. |
| [x-z] | The character x, y, or z. |
| [^x] | Any character but x. |
| . | Any character but newline. |
| ^x | An x at the beginning of a line. |
| <y>x | An x when Lex is in start condition y. |
| x$ | An x at the end of a line. |
| x? | An optional x. |
| x* | 0,1,2,... instances of x. |
| x+ | 1,2,3,... instances of x. |
| x\|y | An x or a y. |
| (x) | An x. |
| x/y | An x, but only if followed by a y. |
| {xx} | The translation of xx from the definitions section. |
| x{m,n} | m through n occurrences of x |
The following example shows how yacc can be used to force grammatical structure on the input. The only input the parser will accept is the rhyme "ding dong dell."
two.l
two.y
This example can be compiled by the following list of commands:
lex two.l
yacc -dv two.y
cc y.tab.c -ly -ll
The "-d" option tells yacc to generate a header file called "y.tab.h" which contains the necessary #defines to make the tokens DING, DONG, and DELL valid values. The -v option tells yacc to create the file y.output which contains human-readable documentation of the parser tables. The -ly and -ll options tell the C compiler to load the yacc and lex libraries respectively.
The following style hints appear in the original AT&T documentation memo for YACC:
a. Use all capital letters for token names, all lower case letters for nonterminal names. This rule comes under the heading of "knowing who to blame when things go wrong."
b. Put grammar rules and actions on separate lines. This allows either to be changed without an automatic need to change the other.
c. Put all rules with the same left hand side together. Put the left hand side in only once, and let all following rules begin with a vertical bar.
d. Put a semicolon only after the last rule with a given left hand side, and put the semicolon on a separate line. This allows new rules to be easily added.
e. Indent rule bodies by two tab stops, and action bodies by three tab stops.
Useful Lex and YACC projects often need custom C functions to implement things beyond the capabilities of Lex and YACC. The example below shows how C functions can be added to implement a stack. The example takes a C addition statement of the form:
a = b + c
and returns the wordy and reversed COBOL representation:
ADD c TO b GIVING a.
Makefile
three.l
three.c
three.y
The example program below shows one way to retrieve command line arguments from within a C program. This may prove useful in implementing a front-end for your compiler.

four.c