Objectives of the SIL Compiler Design Project:

  1. Implement an Interpreter for a subset of SIL with no functions (other than main function.)

  2. Implement a two-pass compiler for SIL for SIM machine.

  3. Implement Extensions suggested for SIL as far as you can. This work carries extra credit.

Implementation Roadmap

The following gives a tentative guideline for implementing the SIL Interpreter/Compiler Project.

The project must be carried out through various stages. This document does not contain information about extensions of SIL. Note that this is just a guideline for implementation and you shall and must deviate when you feel so.

Stage 1 Simple Calculator (Expected Time: One Week)

Using the LEX/YACC environment implement a simple calculator. The input is an integer arithmetic expression. Integers may be read in lex and passed to the YACC file. Write the grammar in YACC to evaluate the expression. Also write YACC semantic actions to write down the post fix form of the expression given. The grammar is E -> NUM | E+E | E*E | (E).

Stage 2: Simple Expression Tree (Expected Time: One Week)

Modify the attribute stack into a structure which can store either a value or a character (corresponding to an operator). Now modify the above calculator program to build an expression tree and evaluate (input remains the same, but now you first parse the expression into an expression tree and then evaluate it). Generate target code in Simple Integer Machine (SIM) for the calculator. SIM specification can be found here: http://www.athena.nitc.ac.in/~kmurali/Compiler/sim.html

The key issue in generating target code is register allocation. The simplest strategy would be to allocate the smallest numbered register when a new register is needed and free the highest numbered register after use so that a single variable for storing the register count can be used to keep track of the registers in use.

Stage 3: Adding Single letter Variables (Expected Time: One Week)

Allow integer variable names a,b,c,...z. The lexeme ID may be used to denote a variable. For each variable allocate one integer location (an array holding 26 integer entries suffices). Consider the grammar:

Slist -> Slist Stmt

Stmt -> ID = E; | read(ID); | write(E);

E -> E+E | E*E | NUM | ID

This allows programmes like :

a=3; b=5; read(c); write(a+b+c);

Write an interpreter for this small language. Now generate SIM target code for this language (now the same array used for storing the values of variables may be used to store their addresses (bindings) in the SIM Machine memory).

Stage 4: Abstract Syntax Tree (AST) : Adding Conditional and Iterative constructs. (Expected Time: One week)

Expand the syntax of statements to include conditional and iterative expressions:

Stmt -> ID=E; | read(ID) | write(E) | If (E) then Slist endif; | while (E) do Slist endwhile;

Allow relational operators: E -> E+E | E*E | (E) | E<E | E>E| E==E| NUM | ID Now you will have to combine statements and create an abstract syntax tree and then evaluate. The tree node structure given here can be used hereafter (for all remaining enhancements) for nodes in the expression tree. http://www.athena.nitc.ac.in/~kmurali/Compiler/treenode.html

Implement both an interpreter and SIM code generator (compiler for this language). Note that the language still does not have types or user defined variables.

The key issue in target code generation is

Stage 5: Symbol Table : Adding User Defined Variables (Expected Time: One week)

Allow ID to be user defined variables. Now you will need a symbol table for storing variables. The structure for symbol table is given here: http://www.athena.nitc.ac.in/~kmurali/Compiler/symbol.html.

The Global symbol table structure given in the link may be used. Implement interpreter as well as SIM target code generator.

Stage 6: Semantic Analysis : Adding Types (Expected Time: One week)

Modify your syntax to make the programming language syntax compatible with the Simple Integer Language (SIL) specification except functions we will take up functions in the next step). SIL specification is given here: http://www.athena.nitc.ac.in/~kmurali/Compiler/sil.html

You have to add logical operators to expression and boolean constants TRUE and FALSE. Ensure that each sub tree is type checked before adding connecting them together while forming the AST. This will ensure that AST is created only if the program has not semantic errors. If there is any error found, immediately report error. Once a type checked AST is formed, recursively traverse the tree and build both interpreter as well as SIM target code generator. Recursively descend the syntax tree to generate SIM target code.

Stage 7: Function Call Implementation (Expected Time: Two weeks)

Now we are ready to build the full compiler for SIL. Please read the language specification carefully before proceeding with the implementation. You need to generate SIM target code for the whole language. You need not do the interpreter for this stage. Here is a structural outline for the grammar: http://www.athena.nitc.ac.in/~kmurali/Compiler/grammar.html

All stages of the compiler will need minor (and sometimes major) modifications to incorporate function calls. Functions require global declaration and the global symbol table entry for a function must contain information such as its return type, type and name or each argument (note that function calls must be type checked for name equivalence), binding (label) etc. Functions require activation records to be created at run-time and hence local variables must be given offsets relative to the base of the activation record of a function. (Note that the BP register of SIM is provided to store the base address of the current activation record in memory). Each function requires separate local symbol table containing local variable entries declared in that function. After parsing a function and constructing its syntax tree, its code may be generated immediately before parsing the next function, so that at a given time only two symbol tables needs to be maintained. Before a function call, the current registers must be saved on stack. This is followed by arguments (using any convenient calling convention), Space for Return value, Return Address (automatically pushed by the SIM Call instruction). Space for local variables is allocated in the runtime stack by the called function.

Wish you good luck !