Translation of Programs
- Translation is the process of transforming a program into some intermediate representation
- Input to the translation process is the representation of the program as produced by the parser after it has been subject to semantic analysis
- Conceptually translation is performed on a parse tree for the program
- Major translation issues are expressions and control structures
- Translation builds the programs control flow graph
Abstract Syntax Tree Directed Translation
- Perform semantic analysis and code generation by walking the AST
- Write the code translation as an AST visitor
- Emit LLVM IR statement by statement and expression by expression
- Map global variables to LLVM global variables
- Map local variables to LLVM alloca stack allocation.
- Use LLVm virtual registers to hold temporary results
Local Variable Translation
- Using stack allocation to translate local variables is straightforward because it gets around the SSA requirement on temporary register.
- But stack memory is slower than temporary registers
- LLVM IR parameter variables must satisfy SSA form like registers
- Cannot modify parameter variables in the function body
- How to handle non-constant parameter?
- Create a temporary local variable on-stack for each parameter with alloca
- Store the original parameter value to the temporary variable.
- Treat the parameter variables as normal local variables in the follow-up code translation
Expression Translation
- Generate IR for expressions with a depth first traversal of the AST that represents the expression.
- Typically, generate subexpressions first before the main expression.
- Virtual registers may be required during expression processing to hold intermediate results.
- Use symbol table to tack LLVM alloca and global variable associated with each program variables.
- Record the mapping between expressions to already generated LLVM IR values.
Basic Block and Control Flow Graph
- A basic block is a program is a piece of the program with one entry and one exit
- Branching statements cause a change in the flow of control through a program.
- The control flow graph for a program describes all possible flows of control between basic blocks in the program.
Comparison and Short0Circuit Evaluation
- The relational operators $<, \leq, =, \neq, \geq, >$ are usually treated as binary operators that produce a boolean result.
- C and C++ perform short circuit evaluation on && and ||
- Only evaluate as much of a boolean expression as is required to determine its value. Generate branching code instead of arithmetic code
- Create two additional basic blocks to represent two possible execution paths:
- slow path: The basic block to execute if we need to evaluate the second sub-expression
- out: The exit of basic block
- Translate the first sub-expression
- Generate a conditional branch on the sub-expression result value, if true(false) for &&(||) jump to the slow path, otherwise jump to out
- Set the insertion point to the slow path basic block
- Translate the second sub-expression
- Set the insertion point to the out basic block.
- Generate a PHI instruction to merge the results of two paths.
Translate If statement
- Create three basic blocks, the entrance labels of them are:
- then: The basic block for then branch
- else: The basic block fo else branch
- out: The basic block for follow-up statements
- Translate the conditional expression
- Generate a conditional branch instruction to jump on the basic block based on the conditional expression value
- Set the insertion point to then and recursively translate the then branch statements
- Generate an unconditional branch instruction to jump to out
- Set the insertion point to else and recursively translate the else branch statements
- Generate an unconditional branch instruction to jump to out
- If there is no else branch, we can create one less basic block
- Note that there might be nested control statements inside then/else block, so the branch instructions might not be in the then / else
Simple If Statement | Nested If Statement |
Translate While Loop
- Create three basic blocks, the entrance labels of them are:
- cond: The basic block foe evaluating the condition
- body: The basic block for the while body
- out: The basic block for follow-up statements
- Set cond as the insertion point
- Translate the conditional expression
- Generate a conditional branch instruction that jumps to body or out based on the value of the conditional expresion
- Set body as the insertion point
- Translate the while body statement
- Generate an unconditional branch instruction to jump to cond
Translate For Loop
- Create three basic blocks, the entrance labels of them are:
- cond: The basic block for evaluating the condition
- body: The basic block for the for body
- out: The basic block for follow-up statements
- Translate the initialization expression
- Set cond as the insertion point
- Translate the conditional expression
- Generate a conditional branch instruction that jumps to body or out based on the value of the conditional expression
- Set body as the insertion point
- Translate the for body statement
- Translate the step expression
- Generate an unconditional jump to cond
Translate Break and Continue
- Many languages have special control flow statements inside loops such as break and continue
- To translate break, the code translator will record the exit label inside a loop and generate an unconditional jump to exit
- To translate continue for while loop, it will generate an unconditional jump back to cond
- To translate continue for for loop, the code translator needs to create one additional basic block for the step expression only separated from the body and generate unconditional jump to this additional basic block
Translate Function Declarations
- Perform semantic analysis on function header.
- Record formal parameter declarations in symbol/type table
- Record type of return value for a function
- For the function body, create an entrance basic block and emit prologue code to set up run time environment for the function
- Process declaration for local variables.
- Emit epilogue code as required
- Record the generated IR handle in the symbol table
Prologue and Epilogue
- The prologue is typically emitted at the head of each routine
- Allocate resources
- For native code, typically saving registers that might be modified
- The typical prologue for MiniC in LLVM IR is to set up the temporary variable space at the stack for non-constant parameters
- The epilogue is invoked to trigger a return from the function
- Deallocate resources
- Return from the function
- The typical epilogue for MiniC in LLVM IR is to return the default value defined by the language for the function if there is no return statement in the end.
Return and Terminator Caveat in LLVM IR
-
Return statements should be directly translated into return instructions in LLVm IR.
- Terminator Caveat: Every basic block in LLVM can only have one terminator instruction at the end!
- Return instruction, conditional branch, unconditional branch, etc.
- You may want to skip some the branch statement if the previous statement is already a terminator when translating if/for statements.
Argument-Passing Method
-
Formal parameters in function definition are just placeholders, replaced by arguments during a function call.
-
Parameter passing - matching of arguments with formal parameters when a routine call occurs.
Translate Call Expressions
- Call expressions should be translated into call instructions in LLVM IR.
- LLVM call instruction assume pass-by-value semantic. For call-by-value cases, the translation is straightforward
- For call-by-reference cases, we should modify the function signature to convert the parameter into its corresponding pointer type and then pass the address pointer of the argument variable
Struct and Classes
- Struct is user defined aggregate types. It can be directly mapped to aggregate types in LLVm IR.
- A class-like construct is available in a number of languages.
- Classes are used to isolate some data and a collection of procedures and functions that operate on the data
- Classes allow separate compilation in large software systems.
- Classes provide information hiding so that data internal to the class is not visible outside the class. Information hiding is an important tool in allowing parts of a software system to be constructed and maintained separately.
Class Implementation Issues
- Initialization and finalization of class instances
- When should it happen?
- Can it be guaranteed to happen?
- What is the correct order of class initialization?
- What is the correct order of class finalization
- Code translation component may often
- Nested class declarations.
Kind of Classes
- The difficulty in implementing a class depends on the kind of class, single instance, multiple instance or template multiple instance.
- Single Instance Classes There is exactly one instance of the classes internal data and routines.
- Example: Static Classes in Java
- Multiple Instances Classes There may be multiple instances of the classes internal data but only one instance of the class member functions.
- Example: Classes in C++, Classes in Java
- Template Multiple Instance Classes: The class is parameterized by constant and type information (templates) which can be used to specialize each
instance of the class. Usually requires multiple copies of the classes data and member functions.
- Example: Parameterized classes in C++ and Java
Generic Class
1
2
3
4
5
6
7
8
class{
# Import declarations
# Export declarations
# Internal Constants, types and variables
# Exported constants and types
# Internal functions and procedures
# Exported functions and procedures
}
Implementing Class Imports and Exports
- Managing imports and exports is a symbol table management issue.
- The body of a class is a separate scope like the body of a routine
- Create a type table entry for describing classes even if classes aren’t treated like types in the language. The type table entry will be similar to the ones used for records
- The names exported by a class are linked together in the symbol table in a list that is pointed to from the classes type table entry. This list is used outside the class to resolve references to exported items.
- Symbol table entries for items imported into a class can be copied into the symbol table of the class.
Single instance Class Data
- Treat class like a struct variable
- Allocate storage in enclosing major scope.
- Member functions are children of enclosing major scope.
- Control access to class data via symbol table lookup management
1
2
3
4
5
6
7
int A, B;
static class C1{
int X;
int[10] Y;
void foo(...){...}
}
bool D;
Multiple Instance Classes
- Treat class like a struct type
- Allocate storage separately for each class instance variable.
- For space efficiency, generate one instance of each member function.
- Give member functions access to class instance data via an extra parameter of a hidden pointer for the class (i.e., this pointer)
1 2 3 4 5 6 7
int A, B; class C2{ int X; void foo(...){...} } C2 R, S, T[5]; bool D;
Multiple Instance Template Classes
- Treat each class template instance as a distinct struct type
- Option1: Expand the class definition similar to C macros
- This is the C++ approach
- Advantage: Straightforward
- Disadvantage: Put more weight on preprocessors. Slow compilation. Hard to provide good diagnose error message for semantic analysis.
- Option2: Use symbol table to track and maintain different instances. Build a type system around the generic template type parameters.
- Advantage: More elegant solutions and better diagnose message
- Disadvantage: Need a complicated type system