Complete Compilation
1 Syntax
For simplicity, we will focus on a very simple language as illustrated the following sample code. We will call this language Calc.
"compute_area"
= 31415
pi = 3
a = 4
b = pi * (a + b) * (a + b)
area
print area
The grammar can be quite easily specified in Antlr.
For brevity, we will omit the lexer rules. The grammar defines the following:
ID
: variable identifierNUMBER
: integer literalsSTRING
: double-quoted stringsNL
: newline- Whitespaces are discarded by the tokenizer.
Calc.g4
;
grammar Calc
program: STRING NL* block EOF
;
We include the program name as a string in the source code.
It is then followed by a block of code.
block: (stmt NL)* stmt NL?
;
A block is one or more newline separated statements stmt
. Note the last newline is optional.
stmt: assign
| print
;
A statement can either be a variable assignment assign
, or a print statement print
.
assign: ID '=' expr
;
Variable assignments assign any expression expr
to a variable specified by its ID
.
print: 'print' expr
;
We can print the expression value using the print
statement.
expr: expr ('*' | '/' | '+' | '/') expr
| '(' expr ')'
| NUMBER
| ID
;
An expression expr
is any arithmetic expression. It allows number literals and variable names.
2 Compiler Backend
The compiler backend are the Kotlin classes that implement either Stmt
or Expr
abstract classes as described in Compiler Basics
We have already seen many programming constructs as implementations of Stmt
or Expr
.
Here is an example. It is the implementation for integer literals.
class IntExpr(
val value: Int,
): Expr() {
override fun gen(scope: Scope): List<String>
= listOf(
"ldc $value"
)
companion object {
@JvmStatic
fun make(text:String): IntExpr
= IntExpr(Integer.parseInt(text))
}
}
Note:
Each implementation has a companion object with the make
method that can construct an object from the parsed results.
In the case of IntExpr
, it can be constructed by IntExpr.make(lexeme)
where lexeme
is the text string of the NUMBER
token in the parse tree.
Another example is the arithmetic expression as implemented by ArithExpr
.
class ArithExpr(
val op: ArithExpr.Op,
val left: Expr,
val right: Expr,
): Expr() {
enum class Op {Add, Sub, Mul, Div}
override fun gen(scope: Scope): List<String>
= left.gen(scope) +
right.gen(scope) +
when(op) {
Op.Add -> "iadd"
Op.Sub -> "isub"
Op.Mul -> "imul"
Op.Div -> "idiv"
}
companion object {
@JvmStatic
fun make(
optext:String,
x:Expr,
y:Expr
):ArithExpr {
val op = when(optext) {
"+" -> ArithExpr.Op.Add
"-" -> ArithExpr.Op.Sub
"*" -> ArithExpr.Op.Mul
"/" -> ArithExpr.Op.Div
else -> null
} ?: throw Exception("Unknown operator")
return ArithExpr(op, x, y)
}
}
}
This is an expression (limited to integer arithmetics). It has an ArithExpr.make(...)
method that constructs the expression object based on operator token lexeme, and sub-expression objects.
2.1 A list of the compiler backend objects
class ClassStmt: Stmt()
class MethodStmt: Stmt()
class Print: Stmt()
class AssignStmt: Stmt()
class BlockStmt: Stmt()
class IntExpr: Expr()
class DerefExpr: Expr()
class ArithExpr: Expr()
3 Syntax Directed Definitions
With the support for data attributes and embedded actions, we can implement the SDD compiler object construction as part of the ANTLR grammar file.
Lets go through selected modification.
;
grammar Calc
@header {
import java.util.List;
import java.util.ArrayList;
}
We need to build lists of statement objects using ArrayList
later.
[ClassStmt code]
program returns : STRING NL* block EOF
{ $code = ClassStmt.make(
.text,
$STRING.code);
$block}
;
Note:
program
rules now have a data attributeClassStmt code
.- The
$code
data attribute is constructed by the action that usesClassStmt.make
function to build the class object.
[BlockStmt code]
block returns @init { List<Stmt> stmtList = new ArrayList<>(); }
: (x=stmt { stmtList.add($x.code); } NL) *
=stmt { stmtList.add($y.code); } NL?
y{ $code = BlockStmt.make(stmtList); }
;
Note:
block
rule has a data attributeBlockStmt code
.- It also has a local variable
List<Stmt> stmtList
that is initialized in the@init
action. - In the iteration over
(stmt NL)* stmt
, we have embedded actions which updates thestmtList
with the newly createdStmt
objects.
Also note that parse tree labels used in the rule:
x=stmt ... y=stmt
.
- An action is added at the end to construct the
BlockStmt
using theBlockStmt.make
function.
4 A Complete Compiler
Now, we can integrate everything together into a single compiler executable.
The Compiler
toolchain was developered in earlier lectures. See Generating Java class and Java class loader sections in Compiler Basics
import org.antlr.v4.runtime.*
import java.io.*
fun main(args:List<String>) {
val sourceFile = args[0]
val input = CharStreams.fromFile(sourceFile)
val lexer = CalcLexer(input)
val tokens = CommonTokenStream(lexer)
val parser = CalcParser(tokens)
.program().code.let {
parser.compile(it)
Compiler.run(it.className)
Compiler}
}
Note
- Antlr generates
CalcLexer
andCalcParser
from the grammar fileCalc.g4
. - From the parser, we can obtain the
ClassStmt code
data attribute. - The
Compiler
object is the compiler toolchain developed earlier.
4.1 Putting things together:
Let’s go through the build steps of our compiler:
- Compile the backend source code (.kt) to Java classes:
$ kotinc -d target/backend \
-cp jasmin.jar \
*.kt src/backend/
- Generate from the grammar file (*.g4) the lexer and parser source code files (.java)
$ java -jar antlr.jar \
-o target/grammar \
*.g4 src/antlr/
- Compile the parser class
$ javac -d target/parser \
-cp antlr.jar:target/backend \
*.java target/grammar/src/antlr/
- Compile the actual compiler as a kotlin application
kotlinc -d target \
-cp jasmin.jar:antlr.jar:target/backend:target/parser \
Main.kt
This allows us to parse, compile and run our own scripts.
Here is a example:
sample.txt
"Sample"
= 3
x = 4
y = x * y
z = z + 10
z print z
We can run:
$ java MainKt sample1.txt
22
5 Stack and Register Allocation
Let’s take a closer look at the ClassStmt
’s make
method:
class ClassStmt( ... ) {
...
companion object {
@JvmStatic
fun make(name: String, body: Stmt):ClassStmt
= ClassStmt(
.replace("\"", ""),
name(
MethodStmt"main",
"([Ljava/lang/String;)V",
= 10, // 👈
stackLimit = 10, // 👈
localsLimit
body)
)
}
}
Note
We limited the stack size and register count to 10.
5.1 Register Count Estimation
Registers are used for local variables. So, a conservative estimate is just the number of local variables defined in the scope.
Conservative estimate of localsLimit
fun estimateLocals(code:Code):Int {
val scope = Scope().apply {
.gen(this)
code}
return scope.size
}
This is not an optimal estimation. Consider the code:
= 1
x
print x= 2
y print y
- Both
x
andy
can use the same register, so the optimal estimation is 1. - But
estimateLocals
will be 2.
5.2 Estimation of stack size
Consider the code for 1 + 2 + 3 + 4
.
The generated code is:
1
ldc 2
ldc
iadd3
ldc
iadd4 ldc
This only requires a stack size of 2.
A very similar expression: 1 + (2 + (3 + 4)))
generates the following bytecode:
1
ldc 2
ldc 3
ldc 4
ldc
iadd
iadd iadd
This requires stack size of 4.
A conservative estimate of the stack size is the maximum height of all expressions in the code. The height of an Expr
node can be computed by the parser as part of the SDD.
[Expr code, int height]
expr returns : NUMBER { $code = IntExpr.make($NUMBER.text);
= 1; }
$height | x=expr op='+' y=expr
{ $code = ArithExpr.make($op, $x, $y);
= max($x.height, $y.height) + 1; }
$height | ...
;
Conservative estimate of stack size
Given a
code:Code
, we need to locate all expressionsExpr
in the code using tree walk, and return the maximum expression height.