Complete Compilation
1 Syntax
For simplicity, we will focus on a very simple language as illustrated the following sample code. We will call this language Calc.
"compute_area"
pi = 31415
a = 3
b = 4
area = pi * (a + b) * (a + b)
print areaThe grammar can be quite easily specified in Antlr.
For brevity, we will omit the lexer rules. The grammar defines the following:
- ID: variable identifier
- NUMBER: integer literals
- STRING: double-quoted strings
- NL: newline
- Whitespaces are discarded by the tokenizer.
Calc.g4
grammar Calc;
program
  : STRING NL* block EOF
  ;We include the program name as a string in the source code.
It is then followed by a block of code.
block
  : (stmt NL)* stmt NL?
  ;A block is one or more newline separated statements stmt. Note the last newline is optional.
stmt
  : assign
  | print
  ;A statement can either be a variable assignment assign, or a print statement print.
assign
  : ID '=' expr
  ;Variable assignments assign any expression expr to a variable specified by its ID.
print
  : 'print' expr
  ;We can print the expression value using the print statement.
expr
  : expr ('*' | '/' | '+' | '/') expr
  | '(' expr ')'
  | NUMBER
  | ID
  ;An expression expr is any arithmetic expression. It allows number literals and variable names.
2 Compiler Backend
The compiler backend are the Kotlin classes that implement either Stmt or Expr abstract classes as described in Compiler Basics
We have already seen many programming constructs as implementations of Stmt or Expr.
Here is an example. It is the implementation for integer literals.
class IntExpr(
    val value: Int,
): Expr() {
    override fun gen(scope: Scope): List<String> 
    = listOf(
        "ldc $value"
    )
    companion object {
        @JvmStatic
        fun make(text:String): IntExpr
        = IntExpr(Integer.parseInt(text))
    }
}Note:
Each implementation has a companion object with the make method that can construct an object from the parsed results.
In the case of IntExpr, it can be constructed by IntExpr.make(lexeme) where lexeme is the text string of the NUMBER token in the parse tree.
Another example is the arithmetic expression as implemented by ArithExpr.
class ArithExpr(
    val op: ArithExpr.Op,
    val left: Expr,
    val right: Expr,
): Expr() {
    enum class Op {Add, Sub, Mul, Div}
    override fun gen(scope: Scope): List<String> 
    = left.gen(scope) +
      right.gen(scope) +
      when(op) {
          Op.Add -> "iadd"
          Op.Sub -> "isub"
          Op.Mul -> "imul"
          Op.Div -> "idiv"
      }
    companion object {
      @JvmStatic
      fun make(
        optext:String,
        x:Expr,
        y:Expr
      ):ArithExpr {
        val op = when(optext) {
            "+" -> ArithExpr.Op.Add
            "-" -> ArithExpr.Op.Sub
            "*" -> ArithExpr.Op.Mul
            "/" -> ArithExpr.Op.Div
            else -> null
        } ?: throw Exception("Unknown operator")
        return ArithExpr(op, x, y)
      }
    }
}This is an expression (limited to integer arithmetics). It has an ArithExpr.make(...) method that constructs the expression object based on operator token lexeme, and sub-expression objects.
2.1 A list of the compiler backend objects
class ClassStmt: Stmt()
class MethodStmt: Stmt()
class Print: Stmt()
class AssignStmt: Stmt()
class BlockStmt: Stmt()
class IntExpr: Expr()
class DerefExpr: Expr()
class ArithExpr: Expr()3 Syntax Directed Definitions
With the support for data attributes and embedded actions, we can implement the SDD compiler object construction as part of the ANTLR grammar file.
Lets go through selected modification.
grammar Calc;
@header {
  import java.util.List;
  import java.util.ArrayList;
}We need to build lists of statement objects using ArrayList later.
program returns [ClassStmt code]
  : STRING NL* block EOF
    { $code = ClassStmt.make(
        $STRING.text,
        $block.code); 
    }
  ;
Note:
- programrules now have a data attribute- ClassStmt code.
- The $codedata attribute is constructed by the action that usesClassStmt.makefunction to build the class object.
block returns [BlockStmt code]
  @init { List<Stmt> stmtList = new ArrayList<>(); }
  : (x=stmt { stmtList.add($x.code); } NL) *
    y=stmt { stmtList.add($y.code); } NL?
    { $code = BlockStmt.make(stmtList); }
  ;
Note:
- blockrule has a data attribute- BlockStmt code.
- It also has a local variable List<Stmt> stmtListthat is initialized in the@initaction.
- In the iteration over (stmt NL)* stmt, we have embedded actions which updates thestmtListwith the newly createdStmtobjects.
Also note that parse tree labels used in the rule:
x=stmt ... y=stmt.
- An action is added at the end to construct the BlockStmtusing theBlockStmt.makefunction.
4 A Complete Compiler
Now, we can integrate everything together into a single compiler executable.
The Compiler toolchain was developered in earlier lectures. See Generating Java class and Java class loader sections in Compiler Basics
import org.antlr.v4.runtime.*
import java.io.*
fun main(args:List<String>) {
    val sourceFile = args[0]
    val input = CharStreams.fromFile(sourceFile)
    val lexer = CalcLexer(input)
    val tokens = CommonTokenStream(lexer)
    val parser = CalcParser(tokens)
    
    parser.program().code.let {
        Compiler.compile(it)
        Compiler.run(it.className)
    }
}Note
- Antlr generates CalcLexerandCalcParserfrom the grammar fileCalc.g4.
- From the parser, we can obtain the ClassStmt codedata attribute.
- The Compilerobject is the compiler toolchain developed earlier.
4.1 Putting things together:
Let’s go through the build steps of our compiler:
- Compile the backend source code (.kt) to Java classes:
$ kotinc -d target/backend \
     -cp jasmin.jar \
     src/backend/*.kt- Generate from the grammar file (*.g4) the lexer and parser source code files (.java)
$ java -jar antlr.jar \
      -o target/grammar \
      src/antlr/*.g4- Compile the parser class
$ javac -d target/parser \
    -cp antlr.jar:target/backend \
    target/grammar/src/antlr/*.java - Compile the actual compiler as a kotlin application
kotlinc -d target \
    -cp jasmin.jar:antlr.jar:target/backend:target/parser \
    Main.ktThis allows us to parse, compile and run our own scripts.
Here is a example:
sample.txt
"Sample"
x = 3
y = 4
z = x * y
z = z + 10
print zWe can run:
$ java MainKt sample1.txt
225 Stack and Register Allocation
Let’s take a closer look at the ClassStmt’s make method:
class ClassStmt( ... ) {
    ...
    companion object {
        @JvmStatic
        fun make(name: String, body: Stmt):ClassStmt
        = ClassStmt(
            name.replace("\"", ""),
            MethodStmt(
                "main",
                "([Ljava/lang/String;)V",
                stackLimit = 10,  // 👈
                localsLimit = 10, // 👈
                body
            )
        )
    }
}Note
We limited the stack size and register count to 10.
5.1 Register Count Estimation
Registers are used for local variables. So, a conservative estimate is just the number of local variables defined in the scope.
Conservative estimate of localsLimit
fun estimateLocals(code:Code):Int {
    val scope = Scope().apply {
        code.gen(this)
    }
    return scope.size
}This is not an optimal estimation. Consider the code:
x = 1
print x
y = 2
print y- Both xandycan use the same register, so the optimal estimation is 1.
- But estimateLocalswill be 2.
5.2 Estimation of stack size
Consider the code for 1 + 2 + 3 + 4.
The generated code is:
ldc 1
ldc 2
iadd
ldc 3
iadd
ldc 4This only requires a stack size of 2.
A very similar expression: 1 + (2 + (3 + 4))) generates the following bytecode:
ldc 1
ldc 2
ldc 3
ldc 4
iadd
iadd
iaddThis requires stack size of 4.
A conservative estimate of the stack size is the maximum height of all expressions in the code. The height of an Expr node can be computed by the parser as part of the SDD.
expr returns [Expr code, int height]
  : NUMBER { $code = IntExpr.make($NUMBER.text);
             $height = 1; }
  | x=expr op='+' y=expr
    { $code = ArithExpr.make($op, $x, $y);
      $height = max($x.height, $y.height) + 1; }
  | ...
  ;Conservative estimate of stack size
Given a
code:Code, we need to locate all expressionsExprin the code using tree walk, and return the maximum expression height.