IR Gen

Translation of Programs

Translation is the process of transforming a program into some intermediate representation
Input to the translation process is the representation of the program as produced by the parser after it has been subject to semantic analysis
Conceptually translation is performed on a parse tree for the program
Major translation issues are expressions and control structures
Translation builds the programs control flow graph

Abstract Syntax Tree Directed Translation

Perform semantic analysis and code generation by walking the AST
Write the code translation as an AST visitor
Emit LLVM IR statement by statement and expression by expression
- Map global variables to LLVM global variables
- Map local variables to LLVM alloca stack allocation.
- Use LLVm virtual registers to hold temporary results

Local Variable Translation

Using stack allocation to translate local variables is straightforward because it gets around the SSA requirement on temporary register.
But stack memory is slower than temporary registers
LLVM IR parameter variables must satisfy SSA form like registers
- Cannot modify parameter variables in the function body
How to handle non-constant parameter?
- Create a temporary local variable on-stack for each parameter with alloca
- Store the original parameter value to the temporary variable.
- Treat the parameter variables as normal local variables in the follow-up code translation

Expression Translation

Generate IR for expressions with a depth first traversal of the AST that represents the expression.
Typically, generate subexpressions first before the main expression.
Virtual registers may be required during expression processing to hold intermediate results.
Use symbol table to tack LLVM alloca and global variable associated with each program variables.
Record the mapping between expressions to already generated LLVM IR values.

Basic Block and Control Flow Graph

A basic block is a program is a piece of the program with one entry and one exit
Branching statements cause a change in the flow of control through a program.
The control flow graph for a program describes all possible flows of control between basic blocks in the program.

Comparison and Short0Circuit Evaluation

The relational operators $<, \leq, =, \neq, \geq, >$ are usually treated as binary operators that produce a boolean result.
C and C++ perform short circuit evaluation on && and ||
- Only evaluate as much of a boolean expression as is required to determine its value. Generate branching code instead of arithmetic code
Create two additional basic blocks to represent two possible execution paths:
- slow path: The basic block to execute if we need to evaluate the second sub-expression
- out: The exit of basic block
  1. Translate the first sub-expression
  2. Generate a conditional branch on the sub-expression result value, if true(false) for &&(||) jump to the slow path, otherwise jump to out
  3. Set the insertion point to the slow path basic block
  4. Translate the second sub-expression
  5. Set the insertion point to the out basic block.
  6. Generate a PHI instruction to merge the results of two paths.

Translate If statement

Create three basic blocks, the entrance labels of them are:
- then: The basic block for then branch
- else: The basic block fo else branch
- out: The basic block for follow-up statements
  1. Translate the conditional expression
  2. Generate a conditional branch instruction to jump on the basic block based on the conditional expression value
  3. Set the insertion point to then and recursively translate the then branch statements
  4. Generate an unconditional branch instruction to jump to out
  5. Set the insertion point to else and recursively translate the else branch statements
  6. Generate an unconditional branch instruction to jump to out
If there is no else branch, we can create one less basic block
Note that there might be nested control statements inside then/else block, so the branch instructions might not be in the then / else

Simple If Statement	Nested If Statement

Translate While Loop

Create three basic blocks, the entrance labels of them are:
- cond: The basic block foe evaluating the condition
- body: The basic block for the while body
- out: The basic block for follow-up statements
  1. Set cond as the insertion point
  2. Translate the conditional expression
  3. Generate a conditional branch instruction that jumps to body or out based on the value of the conditional expresion
  4. Set body as the insertion point
  5. Translate the while body statement
  6. Generate an unconditional branch instruction to jump to cond

Translate For Loop

Create three basic blocks, the entrance labels of them are:
- cond: The basic block for evaluating the condition
- body: The basic block for the for body
- out: The basic block for follow-up statements
  1. Translate the initialization expression
  2. Set cond as the insertion point
  3. Translate the conditional expression
  4. Generate a conditional branch instruction that jumps to body or out based on the value of the conditional expression
  5. Set body as the insertion point
  6. Translate the for body statement
  7. Translate the step expression
  8. Generate an unconditional jump to cond

Translate Break and Continue

Many languages have special control flow statements inside loops such as break and continue
To translate break, the code translator will record the exit label inside a loop and generate an unconditional jump to exit
To translate continue for while loop, it will generate an unconditional jump back to cond
To translate continue for for loop, the code translator needs to create one additional basic block for the step expression only separated from the body and generate unconditional jump to this additional basic block

Translate Function Declarations

Perform semantic analysis on function header.
Record formal parameter declarations in symbol/type table
Record type of return value for a function
For the function body, create an entrance basic block and emit prologue code to set up run time environment for the function
Process declaration for local variables.
Emit epilogue code as required
Record the generated IR handle in the symbol table

Prologue and Epilogue

The prologue is typically emitted at the head of each routine
- Allocate resources
- For native code, typically saving registers that might be modified
- The typical prologue for MiniC in LLVM IR is to set up the temporary variable space at the stack for non-constant parameters
The epilogue is invoked to trigger a return from the function
- Deallocate resources
- Return from the function
- The typical epilogue for MiniC in LLVM IR is to return the default value defined by the language for the function if there is no return statement in the end.

Return and Terminator Caveat in LLVM IR

Return statements should be directly translated into return instructions in LLVm IR.
Terminator Caveat: Every basic block in LLVM can only have one terminator instruction at the end!
- Return instruction, conditional branch, unconditional branch, etc.
You may want to skip some the branch statement if the previous statement is already a terminator when translating if/for statements.

Argument-Passing Method

Formal parameters in function definition are just placeholders, replaced by arguments during a function call.
Parameter passing - matching of arguments with formal parameters when a routine call occurs.

Translate Call Expressions

Call expressions should be translated into call instructions in LLVM IR.
LLVM call instruction assume pass-by-value semantic. For call-by-value cases, the translation is straightforward
For call-by-reference cases, we should modify the function signature to convert the parameter into its corresponding pointer type and then pass the address pointer of the argument variable

Struct and Classes

Struct is user defined aggregate types. It can be directly mapped to aggregate types in LLVm IR.
A class-like construct is available in a number of languages.
Classes are used to isolate some data and a collection of procedures and functions that operate on the data
Classes allow separate compilation in large software systems.
Classes provide information hiding so that data internal to the class is not visible outside the class. Information hiding is an important tool in allowing parts of a software system to be constructed and maintained separately.

Class Implementation Issues

Initialization and finalization of class instances
- When should it happen?
- Can it be guaranteed to happen?
- What is the correct order of class initialization?
- What is the correct order of class finalization
Code translation component may often
Nested class declarations.

Kind of Classes

The difficulty in implementing a class depends on the kind of class, single instance, multiple instance or template multiple instance.
Single Instance Classes There is exactly one instance of the classes internal data and routines.
- Example: Static Classes in Java
Multiple Instances Classes There may be multiple instances of the classes internal data but only one instance of the class member functions.
- Example: Classes in C++, Classes in Java
Template Multiple Instance Classes: The class is parameterized by constant and type information (templates) which can be used to specialize each instance of the class. Usually requires multiple copies of the classes data and member functions.
- Example: Parameterized classes in C++ and Java

Generic Class

class{
  # Import declarations
  # Export declarations
  # Internal Constants, types and variables
  # Exported constants and types
  # Internal functions and procedures
  # Exported functions and procedures
}

Implementing Class Imports and Exports

Managing imports and exports is a symbol table management issue.
The body of a class is a separate scope like the body of a routine
Create a type table entry for describing classes even if classes aren’t treated like types in the language. The type table entry will be similar to the ones used for records
The names exported by a class are linked together in the symbol table in a list that is pointed to from the classes type table entry. This list is used outside the class to resolve references to exported items.
Symbol table entries for items imported into a class can be copied into the symbol table of the class.

Single instance Class Data

Treat class like a struct variable
Allocate storage in enclosing major scope.
Member functions are children of enclosing major scope.
Control access to class data via symbol table lookup management

int A, B;
static class C1{
  int X;
  int[10] Y;
  void foo(...){...}
}
bool D;

Multiple Instance Classes

Treat class like a struct type
Allocate storage separately for each class instance variable.
For space efficiency, generate one instance of each member function.

Give member functions access to class instance data via an extra parameter of a hidden pointer for the class (i.e., this pointer)

int A, B;
class C2{
int X;
void foo(...){...}
}
C2 R, S, T[5];
bool D;

Multiple Instance Template Classes

Treat each class template instance as a distinct struct type
Option1: Expand the class definition similar to C macros
- This is the C++ approach
- Advantage: Straightforward
- Disadvantage: Put more weight on preprocessors. Slow compilation. Hard to provide good diagnose error message for semantic analysis.
Option2: Use symbol table to track and maintain different instances. Build a type system around the generic template type parameters.
- Advantage: More elegant solutions and better diagnose message
- Disadvantage: Need a complicated type system

CSC488 Lec7

Translation of Programs

Abstract Syntax Tree Directed Translation

Local Variable Translation

Expression Translation

Basic Block and Control Flow Graph

Comparison and Short0Circuit Evaluation

Translate If statement

Translate While Loop

Translate For Loop

Translate Break and Continue

Translate Function Declarations

Prologue and Epilogue

Return and Terminator Caveat in LLVM IR

Argument-Passing Method

Translate Call Expressions

Struct and Classes

Class Implementation Issues

Kind of Classes

Generic Class

Implementing Class Imports and Exports

Single instance Class Data

Multiple Instance Classes

Multiple Instance Template Classes

CATALOG

FEATURED TAGS