Compiler Basics and Toolchain

Author

Ken Pu

@file:DependsOn("/jasmin.jar")

1 Design of Compiler Backend

In this section, we will cover the over design of a compiler backend.

It is important to note that for compiled languages, the compiler backend does not have any runtime information. Namely, it would not be able to predict the data values involved in the computation. All code analysis must rely solely on metadata that can be obtained from the parse tree of the program.

1.1 Variable

A variable is object to be created and maintained by the compiler backend for the purpose of code generation. It will have information on the data type (as JVM type signature) and the register (local) index.

Interpreted language:

In the interpreter backend, we had Variable to store the actual runtime data. This is because the interpreter backend is responsible for computing all data values, and thus variables are bound to the their respective data values.

Compiled languages:

Variables do not have the actual data as that would only be available at runtime. Recall the compiler backend only generates bytecode instructions, and do not perform any computation. Thus, the compiler backend does not have access to the actual data values. This means the variables can only be bound to the storage location (register index) and data type of the data value.

data class Variable(
    val local:Int,
    val signature:String = "I",
)

1.2 Scope

A scope is a collection of Variable objects indexed by their variable names as strings.

import java.util.HashMap

class Scope(initUnusedIndex:Int = 0) 
    : HashMap<String, Variable>() {
    // track next local index to be allocated
    var unusedIndex = initUnusedIndex;

    // toString
    override fun toString():String = this.map {
        "${it.key} = ${it.value}"
    }.joinToString("\n")
}
  • A scope is a hashmap from variable name, known as symbol name, to Variable objects.
  • In a scope, we also track the next available register index.
fun Scope.resolveOrPut(symbolName: String):Variable {
    return getOrPut(symbolName) {
        Variable(this.unusedIndex++)
    }
}

Retrieves the variable by symbolName. If it does not exist, then create a new variable.

Note:

We only consider integer values for now. So the variable type is "I".

1.2.1 Testing scope

Scope().apply {
  resolveOrPut("x")
  resolveOrPut("y")
}
x = Variable(local=0, signature=I)
y = Variable(local=1, signature=I)

1.3 Code

Code is a generator of JVM bytecode. In order to generate JVM bytecode, we need a scope.

Every building block is an instance of Code.

We will ensure an invariance property on how the generated bytecode will affect the operand stack.

sealed class Code {
    abstract fun gen(scope: Scope):List<String>
    override fun toString():String 
    = "Bytecode:\n" +
        this.gen(Scope()).mapIndexed {
            index, code -> "  ${index+1}. ${code}"
        }.joinToString("\n")
}

1.4 Statement and expressions

  • Statements code fragments that will not affect the stack after its execution.
  • Expressions are code fragments that will push one element onto the stack after its code execution.
abstract class Stmt: Code()
abstract class Expr: Code()

2 Compiler Backend

2.1 EmptyStatement

object emptyStmt: Stmt() {
    override fun gen(scope: Scope):List<String> = listOf()
}

2.2 Class Statement

class ClassStmt(
    val className: String,
    val body: Stmt,
): Stmt() {
    override fun gen(scope: Scope): List<String> 
    = listOf(
        ".class public $className",
        ".super java/lang/Object"
    ) + 
    body.gen(scope)
}

2.2.1 Testing ClassStmt

ClassStmt("Hello", emptyStmt)
Bytecode:
  1. .class public Hello
  2. .super java/lang/Object

2.3 Method statement

class MethodStmt(
    val name: String,
    val signature: String,
    val stackLimit: Int,
    val localsLimit: Int,
    val body: Stmt,
): Stmt() {
    override fun gen(scope: Scope): List<String>
    = listOf(
        ".method public static $name$signature",
        ".limit stack $stackLimit",
        ".limit locals $localsLimit",
    ) + 
    body.gen(scope) + 
    listOf(
        "return",
        ".end method",
    )
    
    companion object {
        fun main(stackLimit:Int, localsLimit:Int, body:Stmt)
        = MethodStmt(
            "main", "([Ljava/lang/String;)V", 
            stackLimit, localsLimit, body)
    }
}

2.3.1 Testing MethodStmt

ClassStmt(
    "Hello",
    MethodStmt.main(10, 10, emptyStmt)
)
Bytecode:
  1. .class public Hello
  2. .super java/lang/Object
  3. .method public static main([Ljava/lang/String;)V
  4. .limit stack 10
  5. .limit locals 10
  6. return
  7. .end method

2.5 Integer Expression

class IntExpr(
    val value: Int,
): Expr() {
    override fun gen(scope: Scope): List<String> 
    = listOf(
        "ldc $value"
    )
}

2.5.1 Testing Print and IntExpr

IntExpr(100)
Bytecode:
  1. ldc 100
Print(IntExpr(123))
Bytecode:
  1. ldc 123
  2. getstatic java/lang/System/out Ljava/io/PrintStream;
  3. swap
  4. invokevirtual java/io/PrintStream/println(I)V

2.6 Assignment statement

class AssignStmt(
    val symbolName: String,
    val e: Expr,
): Stmt() {
    override fun gen(scope: Scope): List<String>
    = scope.resolveOrPut(symbolName).let {
        e.gen(scope) + 
        "istore ${it.local} ; $symbolName = ..."
    }
}

2.6.1 Testing AssignStmt

AssignStmt("x", IntExpr(100))
Bytecode:
  1. ldc 100
  2. istore 0 ; x = ...

2.7 Block Statement

We can chain multiple statements together (with a shared scope).

class BlockStmt(
    val list:List<Stmt>,
): Stmt() {
    override fun gen(scope: Scope):List<String>
    = list.map {
        it.gen(scope)
    }.flatten()
}

2.7.1 Testing BlockStmt

BlockStmt(
    listOf(
    AssignStmt("x", IntExpr(100)),
    AssignStmt("y", IntExpr(200)),
    AssignStmt("x", IntExpr(300))
    )
)
Bytecode:
  1. ldc 100
  2. istore 0 ; x = ...
  3. ldc 200
  4. istore 1 ; y = ...
  5. ldc 300
  6. istore 0 ; x = ...

2.8 Deref Expression

class DerefExpr(
    val symbolName: String,
): Expr() {
    override fun gen(scope: Scope): List<String> {
        val v = scope[symbolName]
        if(v == null) {
            throw Exception("$symbolName not in scope.")
        }
        return listOf(
        "iload ${v.local} ; $symbolName"
        )
    }
}

2.8.1 Testing Deref

Scope().let {scope ->
    scope.resolveOrPut("x")
    DerefExpr("x").gen(scope)
}
[iload 0 ; x]
try {
    Scope().let {scope ->
        scope.resolveOrPut("x")
        DerefExpr("y").gen(scope)
    }
} catch(e:Exception) {
    println("error: $e")
}
error: java.lang.Exception: y not in scope.
BlockStmt(
    listOf(
    AssignStmt("x", IntExpr(100)),
    AssignStmt("y", DerefExpr("x")),
    )
)
Bytecode:
  1. ldc 100
  2. istore 0 ; x = ...
  3. iload 0 ; x
  4. istore 1 ; y = ...

2.9 Arithmetics

class ArithExpr(
    val op: ArithExpr.Op,
    val left: Expr,
    val right: Expr,
): Expr() {
    enum class Op {
        Add,
        Sub,
        Mul,
        Div,
    }
    override fun gen(scope: Scope): List<String> = (
            left.gen(scope)
            + right.gen(scope)
            + when(op) {
                Op.Add -> "iadd"
                Op.Sub -> "isub"
                Op.Mul -> "imul"
                Op.Div -> "idiv"
            })
}

2.9.1 Test ArithExpr

BlockStmt(listOf(
    AssignStmt("pi", IntExpr(31415)),
    AssignStmt("r", IntExpr(100)),
    Print(
        ArithExpr(ArithExpr.Op.Mul,
            DerefExpr("pi"),
            ArithExpr(ArithExpr.Op.Mul,
                DerefExpr("r"),
                DerefExpr("r")
            )
        )
    ))
)
Bytecode:
  1. ldc 31415
  2. istore 0 ; pi = ...
  3. ldc 100
  4. istore 1 ; r = ...
  5. iload 0 ; pi
  6. iload 1 ; r
  7. iload 1 ; r
  8. imul
  9. imul
  10. getstatic java/lang/System/out Ljava/io/PrintStream;
  11. swap
  12. invokevirtual java/io/PrintStream/println(I)V
pi = 31415
r = 100
print(pi * (r * r))

2.10 Putting them together

val program = ClassStmt(
    "Hello",
    MethodStmt(
        "main", "([Ljava/lang/String;)V", 10, 5,
        BlockStmt(listOf(
            AssignStmt("pi", IntExpr(31415)),
            AssignStmt("r", IntExpr(100)),
            Print(
                ArithExpr(ArithExpr.Op.Mul,
                    DerefExpr("pi"),
                    ArithExpr(ArithExpr.Op.Mul,
                        DerefExpr("r"),
                        DerefExpr("r")
                    )
                )
            ),
        ))
    )
)
println(program)
Bytecode:
  1. .class public Hello
  2. .super java/lang/Object
  3. .method public static main([Ljava/lang/String;)V
  4. .limit stack 10
  5. .limit locals 5
  6. ldc 31415
  7. istore 0 ; pi = ...
  8. ldc 100
  9. istore 1 ; r = ...
  10. iload 0 ; pi
  11. iload 1 ; r
  12. iload 1 ; r
  13. imul
  14. imul
  15. getstatic java/lang/System/out Ljava/io/PrintStream;
  16. swap
  17. invokevirtual java/io/PrintStream/println(I)V
  18. return
  19. .end method

3 JVM Assembler with JASMIN

3.1 Assembler to ClassFile

import jasmin.ClassFile
import java.io.BufferedReader
import java.io.File
import java.io.StringReader
fun compile(classStmt: ClassStmt) {
    val name = classStmt.className
    classStmt
        .gen(Scope())
        .joinToString("\n").also {
            File("$name.j").writeText(it)
        }.run {
            BufferedReader(StringReader(this))
        }.let { reader ->
            ClassFile().apply {
                readJasmin(reader, name, false)
            }
        }.let { classFile ->
            File("$name.class")
            .outputStream().use { classFile.write(it) }
        }    
    println("Success: $name.class")
}
compile(program)
Success: Hello.class

3.2 Java Class Loader

import java.net.URLClassLoader

fun run(className:String) {
    val classLoader = URLClassLoader(arrayOf(File(".").toURI().toURL()))
    val classObj = classLoader.loadClass(className)
    val mainMethod = classObj.getMethod("main", Array<String>::class.java)
    mainMethod.invoke(null, arrayOf<String>())
}
run(program.className)
314150000

4 Conclusion

We have covered:

  • Compiler backend for simple arithemtic and variables
  • Programs can be formed by composing compiler objects
  • Programs generate Java bytecode which are assembled and executed in the JVM

What about branching and loops?

  • See previous lecture on JVM Programming
  • We need to design the corresponding code generators for these programming constructs.

What about function declaration and invocation?

  • See previous lecture on JVM Internals
  • We need to design the code generators for these programming constructs.
  • Scope management needs to be suppose subscopes.