Introduction to LLVM 2

CSCD70 tut2

Posted by Dino on January 18, 2021

LLVM Transform Pass

User-Use-Value relation

Suppose that we have the following code to optimize:

1
2
3
4
5
6
%2 = add %1, 0 ; Algebraic Identity
%3 = mul %2, 2

-- Suppose we omit the identity
               ; Algebraic Identity
%3 = mul ???, 2
  • Program crashes because we did not update the references properly.

  • How to make sure that all references (i.e. uses) are updated properly?

    • LLVM User-Use-Value Relationship
      1
      2
      3
      
         -----            ----            -----------
        |Value|  ------> |User| -------> |Instruction|
         -----            ----            -----------
      
    • We are going to show that the Instruction can play the role both as an User and as an Usee(Value).

Value

  • The Value class is the most important base class in LLVM, as almost all object types inherit from it.

  • A Value has a type (e.g., integer, floating point): getType.

  • A Value might or might not have a name: hasName, getName.

  • Most importantly, a Value has a list of Users that are using itself.

Instruction as an User

An Instruction is an User

  • Each User(Instruction) has a list of values it is using.
    • Those values are known as the Operands, of type Value.
      1
      2
      3
      4
      5
      
       User &Inst = ...
       for (auto Iter = Inst.op_begin();
                 Iter != Inst.op_end(); ++Iter)
            Value *Operand = *Iter;
      // %2 = add %1, 0 -> operand %1 & 0
      

Instruction as an Usee

  • Why is an Instruction is an Usee?

  • The answer lies in our interpretation of LLVM Instruction:
    • %2 = add %1, 0

    • %2 is the Value representation of instruction add %1, 0.

    • Don’t treat %2 as the result of a computation.

    • Instead, treat it as the alias of an instruction.

  • Therefore, wherever in later text we use the value %2, we mean to use the instruction add %1, 0

Summary

Back to the problem we have at the very beginning:

1
2
%2 = add %1, 0 ; Algebraic Identity
%3 = mul %2, 2
  • Let Inst be a reference to instruction %2 = add %1, 0
    1
    2
    3
    
      for (auto Iter = Inst.op_begin();
                Iter != Inst.op_end(); ++Iter)
          Value *Operand = *Iter;
    
    • Operand %1, 0
    1
    2
    3
    
      for (auto Iter = Inst.user_begin();
                Iter != Inst.user_end(); ++Iter)
          User *InstUser = *Iter;
    
    • Instruction mul %2, 2 (alternatively, Value %3)

    • Since %3 is using %2, that is why %3 is an user of it

Example Code

Source Code for transformation

1
2
3
4
5
6
7
int foo(int e, int a){
    int b = a + 1;
    int c = b * 2;
    b = e << 1;
    int d = b / 4;
    return c * d;
}

LLVM Transform Pass

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
#include <llvm/IR/Constants.h>
#include <llvm/IR/Function.h>
#include <llvm/IR/Instructions.h>
#include <llvm/IR/Module.h>
#include <llvm/Pass.h>
#include <llvm/Support/raw_ostream.h>

using namespace llvm;

namespace {

class Transform final : public ModulePass {
private:
  bool runOnBasicBlock(BasicBlock &B) {
    // get the first and second instruction
    Instruction &Inst1st = *B.begin(), &Inst2nd = *(++B.begin());

    // The address of the 1st instruction is equal to that of the 1st operand of
    // the 2nd instruction
    assert(&Inst1st == Inst2nd.getOperand(0));

    // print the 1st instruction
    outs() << Inst1st << "\n";
    // print the 1st instruction as an operand
    Inst1st.printAsOperand(outs(), false);
    outs() << "\n";

    // User-Use-Value
    for (auto *Iter = Inst1st.op_begin(); Iter != Inst1st.op_end(); ++Iter) {
      Value *Operand = *Iter;

      if (Argument *Arg = dyn_cast<Argument>(Operand)) {
        outs() << "I am function " << Arg->getParent()->getName() << "\'s #"
               << Arg->getArgNo() << " argument"
               << "\n";
      }
      if (ConstantInt *C = dyn_cast<ConstantInt>(Operand)) {
        outs() << "I am a constant integer of value " << C->getValue() << "\n";
      }
    }

    for (auto Iter = Inst1st.user_begin(); Iter != Inst1st.user_end(); ++Iter) {
      outs() << *(dyn_cast<Instruction>(*Iter)) << "\n";
    }

    for (auto Iter = Inst1st.use_begin(); Iter != Inst1st.use_end(); ++Iter) {
      outs() << *(dyn_cast<Instruction>(Iter->getUser())) << "\n";
    }

    // Instruction Manipulation
    Instruction *NewInst = BinaryOperator::Create(
        Instruction::Add, Inst1st.getOperand(0), Inst1st.getOperand(0));

    NewInst->insertAfter(&Inst1st);
    Inst1st.user_begin()->setOperand(0, NewInst);

    return true;
  }

  bool runOnFunction(Function &F) {
    bool Transformed = false;

    for (auto Iter = F.begin(); Iter != F.end(); ++Iter) {
      if (runOnBasicBlock(*Iter)) {
        Transformed = true;
      }
    }

    return Transformed;
  }

public:
  static char ID;

  Transform() : ModulePass(ID) {}
  virtual ~Transform() override {}

  virtual void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.setPreservesCFG();
  }

  virtual bool runOnModule(Module &M) override {
    bool Transformed = false;

    for (auto Iter = M.begin(); Iter != M.end(); ++Iter) {
      if (runOnFunction(*Iter)) {
        Transformed = true;
      }
    }
    return Transformed;
  }
};

char Transform::ID = 0;
RegisterPass<Transform> X("transform", "Example Transform Pass");

} // anonymous namespace

Result

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
@ Before Transformation
define i32 @foo(i32 %0, i32 %1){
    %3 = add nsw i32 %1, 1
    %4 = mul nsw i32 %3, 2
    @ Note since the SSA, the same variable b is assigned with a new address %5 instead of original %3
    %5 = shl i32 %0, 1
    %6 = sdiv i32 55, 4
    %7 = mul nsw i32 %4, %6
    ret i32 %7
}

@ After Transormation
define i32 @foo(i32 %0, i32 %1){
    %3 = add nsw i32 %1, 1 ;
    %4 = add nsw i32 %3, %3 ; Pass insert a line of code
    %5 = mul nsw i32 %4, 2 ; Pass changes the operand of the instruction
    %6 = shl i32 %0, 1 ; 
    %7 = sdiv i32 %6, 4 ;
    %8 = mul nsw i32 %5, %7 ;
    ret i32 %8 
}

LLVM Pass Manager

Recall from the previous section:

  • The optimization passes can be categorized into analysis and transform passes.
  • This is a form of isolation.

We have the isolation, how should we build the connection?

  • LLVM Pass Manager
  • Where is it?
    1
    
    virtual void getanalysisUsage(AnalysisUsage &AU) 
    

What does the Pass Manager do?

  • Requires & Preserves Information
    • If a pass A requires pass B, bt the time A initiates, LLVM will automatically run B if B has not been run before
      • Suppose A requires B

      • If we run opt -A

      • The actual order for running is B -> A

      • Think about dependency chain in the soft development

    • if a pass A preserves pass B, by the time pass A terminates, LLVM will preserve B, and B will not be rerun unless subsequent passes invalidate it.
      • Suppose both A & C requires B and A preserve B

      • If we run opt -A -C

      • The actual order for running is B -> A -> C since B is preserved it will not terminate since A preserves B

      • If A does not perverse B

      • The actual order for running is B -> A -> B -> C since A does not preserve B, then before running C the B will be run again since C requires the B

  • To use the pass manager, the getAnalysisUsage must be overridden to specify the required and invalidated sets.

  • Some of common usages
    • setPreservesAll: Preserves all previous passes.

    • setPreservesCFG: Preserves the CFG

    • addPreserved<A>: Preserves pass A after this pass.

    • addRequired<A>: Requires pass A to run before this pass.

    • getAnalysis<A>: Gets the information from the required pass A.