A Brief Intro to JVM Internals

Author

Ken Pu

1 JVM Internals

The Java Virtual Machine (JVM) executes Java bytecode using a stack-based architecture. Here’s a high-level view of its internals.

Figure: JVM internals. The JVM consists of the JVM Stack (call stack), the Heap (objects & class data), and the Method Area (bytecode, static fields). One stack frame per method is pushed onto and popped off the stack; each frame holds an operand stack, local variables, and a return address. Bytecode instructions are fetched and decoded from the method area, operate on the operand stack, and load/store local variables.

1.1 Frames (Method Execution Context)

Each method invocation creates a new stack frame. This frame represents the execution context of the method and is pushed onto the Java stack (also called the call stack).
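To see that every invocation gets its own frame, here is a small Java sketch (the class name FrameDepth is made up for illustration). It counts the frames on the current thread's call stack with Thread.getStackTrace(), once directly and once after five nested recursive calls. On HotSpot the difference is 5; the JDK documentation allows a JVM to omit frames from a stack trace, so treat this as an illustration rather than a guarantee.

```java
class FrameDepth {
    // Recurses n times, so n extra frames sit on the call stack when n reaches 0.
    static int depth(int n) {
        if (n == 0) {
            // Number of frames on this thread's call stack right now.
            return Thread.currentThread().getStackTrace().length;
        }
        return depth(n - 1);
    }

    public static void main(String[] args) {
        int base = depth(0);   // one depth frame on the stack
        int deeper = depth(5); // six depth frames on the stack
        System.out.println("extra frames: " + (deeper - base)); // prints "extra frames: 5" on HotSpot
    }
}
```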

1.2 Frame Structure

Each stack frame contains:

1.2.1 Operand Stack:

A runtime stack where bytecode instructions push, pop, and manipulate values. This is used for expression evaluation and intermediate computations.

 |     |
 | <z> | <-- top of the stack
 | <y> |
 | <x> |
 +-----+

Fixed stack cell capacity

  • Each stack cell is 32 bits wide.
  • A single 64-bit value (a long or a double) occupies two cells on the stack, distinguished as the low cell and the high cell.
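The two-cell rule can be modelled in Java with a stack whose elements are 32-bit ints. This is a conceptual sketch (CellStack and its methods are hypothetical names): the JVM specification only says that a long or double counts as two units of stack depth, and which half sits where is up to the implementation. Here the high cell is pushed first, so the low cell ends up on top.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class CellStack {
    // Each element is one 32-bit cell; the top of the stack is the first element.
    private final Deque<Integer> cells = new ArrayDeque<>();

    void pushInt(int value) {
        cells.push(value);
    }

    int popInt() {
        return cells.pop();
    }

    // A 64-bit long occupies two cells: high cell first, then the low cell on top.
    void pushLong(long value) {
        cells.push((int) (value >>> 32)); // high cell
        cells.push((int) value);          // low cell
    }

    long popLong() {
        int low = cells.pop();
        int high = cells.pop();
        // Mask the low cell so its sign bit does not spill into the high half.
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    // Current depth in cells, the unit that `.limit stack` counts.
    int depth() {
        return cells.size();
    }

    public static void main(String[] args) {
        CellStack s = new CellStack();
        s.pushInt(7);
        s.pushLong(100L);
        System.out.println(s.depth());   // 3
        System.out.println(s.popLong()); // 100
        System.out.println(s.popInt());  // 7
    }
}
```

An int plus a long leaves the stack 3 cells deep: the 64-bit value costs two units of the limit declared with .limit stack.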

1.2.2 Local Variables (Local Variable Array):

A fixed-size array storing method parameters and local variables. These are accessed using indexed load/store instructions.

   $0  $1  $2  $3
  +---+---+---+---+---
  | a | b | c | d | ...
  +---+---+---+---+---

Fixed locals cell capacity

  • Each locals cell is 32 bits (4 bytes) wide.
  • A single 64-bit (8-byte) value (a long or a double) occupies two consecutive cells in the locals array, distinguished as the low cell and the high cell.
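Because method parameters live in the locals array too, the two-cell rule decides which index each parameter gets. The following Java sketch (LocalSlots is a hypothetical helper, not a JDK API) computes the starting index of every parameter. It relies on one fact not mentioned above: for instance (non-static) methods, the JVM places `this` in $0, so their parameters start at $1.

```java
import java.util.Arrays;

class LocalSlots {
    // long and double take two consecutive cells; every other type takes one.
    static int cellsFor(Class<?> type) {
        return (type == long.class || type == double.class) ? 2 : 1;
    }

    // Index of the first locals cell used by each parameter.
    // Instance methods keep `this` in $0, so their parameters start at $1.
    static int[] parameterSlots(boolean isStatic, Class<?>... params) {
        int[] slots = new int[params.length];
        int next = isStatic ? 0 : 1;
        for (int i = 0; i < params.length; i++) {
            slots[i] = next;
            next += cellsFor(params[i]);
        }
        return slots;
    }

    // Minimum number of locals cells needed just to hold the parameters.
    static int cellsNeeded(boolean isStatic, Class<?>... params) {
        int total = isStatic ? 0 : 1;
        for (Class<?> p : params) {
            total += cellsFor(p);
        }
        return total;
    }

    public static void main(String[] args) {
        // static int sum(int a, long b, String s)
        int[] slots = parameterSlots(true, int.class, long.class, String.class);
        System.out.println(Arrays.toString(slots));                                 // [0, 1, 3]
        System.out.println(cellsNeeded(true, int.class, long.class, String.class)); // 4
    }
}
```

For the method int sum(int a, long b, String s) used later in this document (declared static), the sketch places a in $0, b in $1 and $2, and s in $3, so the method needs at least 4 locals cells.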

1.2.3 Return Address & Frame Data:

Holds metadata needed for execution, including the return address for method calls.

The return address and frame data are managed automatically by the virtual machine.

2 Programming in JVM Bytecode

In this section, we will explore the following topics:

  • Skeleton of a Java class in bytecode.
  • Type descriptions.
  • Skeleton of a Java method in bytecode.

2.1 Java class in JVM bytecode

In JVM bytecode, we declare a class using the directives:

  • .class <access-modifier> <class-name>
  • .super <java-class-type>
%%jvm filename=Hello.j

.class public Hello
.super java/lang/Object
Generated: Hello.class

This will generate a Java class.

! ls *.class
Hello.class

However, an empty class has no methods and therefore cannot be executed. To be executable, a class needs a main method:

public static void main(String[] args) {
 ...
}

But before we get to how to declare methods, we first need to talk about how to describe Java types.

2.2 Type descriptions in JVM bytecode

Any Java type signature is serialized into a single string without whitespace. (The JVM specification calls these strings descriptors.)

There are four kinds of Java types:

  • Primitive Types

    int
  • Object Types

    java.io.OutputStream
  • Array Types

    int[]
    java.lang.String[]
  • Method Types

    double exp(int a, int b)

2.2.1 Primitive Type Signatures

Each primitive type has a single-character representation:

Java Type   Signature
boolean     Z
byte        B
char        C
short       S
int         I
long        J
float       F
double      D
void        V (used only as a return type)
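The table can be expressed directly in Java using the Class objects of the primitive types. This is a minimal sketch; PrimitiveDescriptors is a hypothetical name.

```java
import java.util.Map;

class PrimitiveDescriptors {
    // One character per primitive type, as in the table above.
    private static final Map<Class<?>, Character> CODES = Map.of(
            boolean.class, 'Z', byte.class, 'B', char.class, 'C',
            short.class, 'S', int.class, 'I', long.class, 'J',
            float.class, 'F', double.class, 'D', void.class, 'V');

    static char descriptor(Class<?> type) {
        Character code = CODES.get(type);
        if (code == null) {
            throw new IllegalArgumentException(type + " is not a primitive type");
        }
        return code;
    }

    public static void main(String[] args) {
        System.out.println(descriptor(long.class)); // J
    }
}
```

On Java 12 and later, the JDK offers the same mapping built in: int.class.descriptorString() returns "I".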

2.2.2 Object Type Signatures

  • Format: L<class-name>;
  • Examples:
    • java.lang.String -> Ljava/lang/String;
    • java.util.List -> Ljava/util/List;

2.2.3 Array Type Signatures

  • Format: [<element-type>
  • Examples:
    • int[] -> [I
    • String[][] -> [[Ljava/lang/String;
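Object and array descriptors can be computed from a Class object with very little code. The sketch below (ReferenceDescriptors is a hypothetical name) relies on documented behaviour of Class.getName(): for array classes it already returns descriptor syntax with dots, for example "[[Ljava.lang.String;" for String[][], so only the dots need to become slashes.

```java
class ReferenceDescriptors {
    // Descriptor for an object or array type; primitives use their one-letter codes instead.
    static String descriptor(Class<?> type) {
        if (type.isPrimitive()) {
            throw new IllegalArgumentException("use the one-letter code for " + type);
        }
        // Class names in descriptors use '/' instead of '.', e.g. java/lang/String.
        String name = type.getName().replace('.', '/');
        // For array classes, getName() already has the form "[[Ljava.lang.String;".
        return type.isArray() ? name : "L" + name + ";";
    }

    public static void main(String[] args) {
        System.out.println(descriptor(String.class));         // Ljava/lang/String;
        System.out.println(descriptor(java.util.List.class)); // Ljava/util/List;
        System.out.println(descriptor(int[].class));          // [I
        System.out.println(descriptor(String[][].class));     // [[Ljava/lang/String;
    }
}
```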

2.2.4 Method Type Signatures

  • Format: (<parameter-types>)<return-type>

  • Example:

    int sum(int a, long b, String s) -> (IJLjava/lang/String;)I
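The JDK can produce method descriptors for us: java.lang.invoke.MethodType.toMethodDescriptorString() renders a return type plus parameter types in exactly this (<parameter-types>)<return-type> format. The sketch below looks up methods of its own class by reflection; MethodDescriptors, descriptorOf, sum, and exp are hypothetical names, with sum and exp declared only to match the examples in this section.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

class MethodDescriptors {
    // Hypothetical methods matching the signatures used in the text.
    static int sum(int a, long b, String s) {
        return a + (int) b + s.length();
    }

    static double exp(int a, int b) {
        return Math.pow(a, b);
    }

    // Builds "(<parameter-types>)<return-type>" for a method declared in this class.
    static String descriptorOf(String name) {
        for (Method m : MethodDescriptors.class.getDeclaredMethods()) {
            if (m.getName().equals(name)) {
                return MethodType.methodType(m.getReturnType(), m.getParameterTypes())
                        .toMethodDescriptorString();
            }
        }
        throw new IllegalArgumentException("no method named " + name);
    }

    public static void main(String[] args) {
        System.out.println(descriptorOf("sum"));  // (IJLjava/lang/String;)I
        System.out.println(descriptorOf("exp"));  // (II)D
        System.out.println(descriptorOf("main")); // ([Ljava/lang/String;)V
    }
}
```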

2.2.5 List of Type Signatures

An important property of type signatures is that they can be concatenated without creating any ambiguity.

For example, consider a list of types:

int, float, int, String[][], int[]

Each of these is encoded individually as:

I, F, I, [[Ljava/lang/String;, [I

Let’s concatenate them together:

IFI[[Ljava/lang/String;[I

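We can decode the individual types by scanning the concatenated string from left to right. Each encoding is self-delimiting: a primitive type is exactly one character, an object type runs from L up to and including the next ;, and a [ prefix means "array of the type that follows". So the scanner always knows where one type ends and the next begins.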
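The left-to-right scan can be written as a short Java method. This is a sketch (DescriptorSplitter and split are hypothetical names) that handles field types only, so it rejects V, which is valid only as a return type.

```java
import java.util.ArrayList;
import java.util.List;

class DescriptorSplitter {
    // Splits a concatenation of type signatures into the individual signatures.
    static List<String> split(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int start = i;
            // Each leading '[' adds one array dimension to the type that follows.
            while (i < s.length() && s.charAt(i) == '[') {
                i++;
            }
            if (i >= s.length()) {
                throw new IllegalArgumentException("dangling '[' at " + start);
            }
            char c = s.charAt(i);
            if (c == 'L') {
                // Object type: runs up to and including the next ';'.
                int semi = s.indexOf(';', i);
                if (semi < 0) {
                    throw new IllegalArgumentException("unterminated object type at " + i);
                }
                i = semi + 1;
            } else if ("ZBCSIJFD".indexOf(c) >= 0) {
                i++; // primitive type: exactly one character
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            }
            out.add(s.substring(start, i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("IFI[[Ljava/lang/String;[I"));
        // [I, F, I, [[Ljava/lang/String;, [I]
    }
}
```

Applied to the parameter part of (IJLjava/lang/String;)I, that is IJLjava/lang/String;, the same scan yields I, J, and Ljava/lang/String;.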

2.3 Method declaration in JVM bytecode

A method is declared as follows:

.method <access-modifier> static? <name> (<arg-types>)<return-type>
    .limit stack <stack-size>
    .limit locals <local-size>
    <instruction>
    <instruction>
    <instruction>
    return
.end method

So we can declare:

public static void main(String[] args)

as

.method public static main([Ljava/lang/String;)V
  ...
.end method

Note that argument names are not retained in JVM bytecode: the method declaration records only the argument types, in its type signature.

2.3.1 An executable class in JVM bytecode

%%jvm filename=Hello.j

.class public Hello
.super java/lang/Object

.method public static main([Ljava/lang/String;)V
.limit stack 10
.limit locals 2
return
.end method
Generated: Hello.class
%%bash
java Hello

Running it should succeed and produce no output.

3 JVM Bytecode Programming

Now, let’s focus on the JVM instructions and see how we can perform computation using the low-level JVM operations.

Note

The complete list of JVM instructions, with their effects on the operand stack and the locals array, is available in "List of Java bytecode instructions".

3.1 Data representation in memory

Data can exist in three areas:

  • operand stack: a stack structure that supplies the operands of instructions and receives the values they produce.

  • locals array: an integer-indexed array that stores local variables and method parameters.

  • heap: a large memory pool that stores objects and arrays, which do not fit into the 32-bit cells of the stack and locals.

The JVM stores objects (Java objects and arrays) in binary form in regions of the heap. Each object starts at some heap location, and the value that points to that location is called its reference. The stack and locals hold only references, and a reference fits into a single cell. (On 64-bit HotSpot JVMs, a technique called Compressed Ordinary Object Pointers lets references stored inside the heap take only 32 bits.)

3.2 Move data

3.2.1 ldc

Loads a constant (e.g., an integer, float, or string) from the constant pool onto the operand stack.

3.2.1.1 Example: ldc 10

Stack before: [...]
Stack after: [..., 10]

Note

For string constants,

ldc "hello world"

The string "hello world" is created on the heap, and a reference to it is pushed onto the stack.

Note

ldc2_w loads 64-bit constants (long or double), which occupy two stack cells, for example:

  • 3.1415
  • 100L

3.2.2 dup

Duplicates the top value on the operand stack.

3.2.2.1 Example: dup

Stack before: [..., 10]
Stack after: [..., 10, 10]

Note

dup2 duplicates the top two cells; this is how a single 64-bit value (long or double) is duplicated.


3.2.3 istore

Stores an integer from the operand stack into a local variable.

3.2.3.1 Example: istore 1

Stack before: [..., 10]
Stack after: [...]
Locals after: $1 = 10

Note

See fstore for float, dstore for double, and lstore for long. A long or double stored at index n occupies two cells, $n and $n+1.


3.2.4 iload

Loads an integer from a local variable onto the operand stack.

3.2.4.1 Example: iload_1

Stack before: [...]
Stack after: [..., 10]
Locals before: $1 = 10

Note

See fload for float, dload for double, and lload for long. A long or double loaded from index n is read from two cells, $n and $n+1.
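To make the stack and locals effects concrete, here is a toy Java model of these four instructions. It is a sketch under simplifying assumptions: only int values, an ArrayDeque standing in for the operand stack, and an int[] for the locals array (MoveData and its helper methods are hypothetical names). Note that iload_1 in the example above is the one-byte short form of iload 1; such short forms exist for indices 0 to 3.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class MoveData {
    // Operand stack of int cells; the top of the stack is the first element.
    private final Deque<Integer> stack = new ArrayDeque<>();
    private final int[] locals;

    MoveData(int maxLocals) {
        locals = new int[maxLocals]; // plays the role of the locals limit
    }

    void ldc(int constant) { stack.push(constant); }        // [...] -> [..., c]
    void dup() { stack.push(stack.getFirst()); }             // [..., v] -> [..., v, v]
    void istore(int index) { locals[index] = stack.pop(); }  // [..., v] -> [...], $index = v
    void iload(int index) { stack.push(locals[index]); }     // [...] -> [..., $index]

    int depth() { return stack.size(); }
    int top() { return stack.getFirst(); }
    int local(int index) { return locals[index]; }

    public static void main(String[] args) {
        MoveData m = new MoveData(2);
        m.ldc(10);   // [10]
        m.dup();     // [10, 10]
        m.istore(1); // [10], $1 = 10
        m.iload(1);  // [10, 10]
        System.out.println(m.depth() + " " + m.top() + " " + m.local(1)); // 2 10 10
    }
}
```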

3.3 Example

%%jvm stack=10 locals=2

ldc2_w 100
lstore 0
Generated: Hello.class
! java Hello
%%jvm stack=10 locals=1

ldc2_w 100
lstore 0
Generated: Hello.class
! java Hello
Error: Unable to initialize main class Hello
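Caused by: java.lang.VerifyError: (class: Hello, method: main signature: ([Ljava/lang/String;)V) Illegal local variable number

The second version fails verification: lstore 0 writes the 64-bit long into cells $0 and $1, but locals=1 reserves only $0.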

4 Arithmetic Instructions in JVM Bytecode

4.1 Integer arithmetic

  1. iadd
    • Adds the top two integers from the operand stack and pushes the result back.
    • Example: iadd
    • Before: [... 3, 5]
    • After: [... 8]
  2. isub
    • Pops the top value (value2) and the value beneath it (value1), then pushes value1 - value2.
    • Example: isub
    • Before: [... 7, 2]
    • After: [... 5]
  3. imul
    • Multiplies the top two integers from the operand stack and pushes the result back.
    • Example: imul
    • Before: [... 4, 6]
    • After: [... 24]
  4. idiv
    • Pops the top value (value2) and the value beneath it (value1), then pushes the quotient value1 / value2.
    • Throws ArithmeticException if division by zero occurs.
    • Example: idiv
    • Before: [... 8, 2]
    • After: [... 4]
  5. irem
    • Pops the top value (value2) and the value beneath it (value1), then pushes the remainder value1 % value2.
    • Throws ArithmeticException if division by zero occurs.
    • Example: irem
    • Before: [... 9, 4]
    • After: [... 1]
  6. ineg
    • Negates the top integer from the operand stack and pushes the result back.
    • Example: ineg
    • Before: [... 6]
    • After: [... -6]
  7. iinc
    • Increments a local variable by a specified constant.
    • Example: iinc 0 3
    • Before: $0 = 5
    • After: $0 = 8
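Operand order matters for isub, idiv, and irem: the value on top of the stack (value2) is the right-hand operand, and the value beneath it (value1) is the left-hand operand. The toy Java model below reproduces the examples above; IntArith and its helper methods are hypothetical names, and push stands in for ldc or iload.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.IntBinaryOperator;

class IntArith {
    private final Deque<Integer> stack = new ArrayDeque<>(); // top = first element
    private final int[] locals;

    IntArith(int maxLocals) { locals = new int[maxLocals]; }

    void push(int v) { stack.push(v); } // stands in for ldc / iload
    int top() { return stack.getFirst(); }
    int local(int index) { return locals[index]; }
    void setLocal(int index, int v) { locals[index] = v; }

    // Pops value2 (the top), then value1 (beneath it), and pushes value1 OP value2.
    private void binary(IntBinaryOperator op) {
        int value2 = stack.pop();
        int value1 = stack.pop();
        stack.push(op.applyAsInt(value1, value2));
    }

    void iadd() { binary((a, b) -> a + b); }
    void isub() { binary((a, b) -> a - b); }
    void imul() { binary((a, b) -> a * b); }
    // Java's int / and % truncate toward zero and throw ArithmeticException
    // on a zero divisor, matching idiv and irem.
    void idiv() { binary((a, b) -> a / b); }
    void irem() { binary((a, b) -> a % b); }
    void ineg() { stack.push(-stack.pop()); }
    // iinc updates the locals array directly; the operand stack is untouched.
    void iinc(int index, int delta) { locals[index] += delta; }

    public static void main(String[] args) {
        IntArith m = new IntArith(1);
        m.push(7);
        m.push(2);
        m.isub();                       // [7, 2] -> [5]
        System.out.println(m.top());    // 5
        m.push(8);
        m.push(2);
        m.idiv();                       // [5, 8, 2] -> [5, 4]
        System.out.println(m.top());    // 4
        m.setLocal(0, 5);
        m.iinc(0, 3);                   // $0: 5 -> 8
        System.out.println(m.local(0)); // 8
    }
}
```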

4.2 Other scalar data types

Note

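The float, double, and long variants follow the same naming pattern, with the prefixes f, d, and l:

  • fadd, fsub, fmul, fdiv for float
  • dadd, dsub, dmul, ddiv for double
  • ladd, lsub, lmul, ldiv for long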

4.3 Example

As an example, let’s compute the area of a circle, \[ A = \pi r^2 \] where

  • \(\pi = 3.14\)
  • \(r = 5.0\)
%%jvm stack=10 locals=10
ldc 3.14f
ldc 5.0f
dup
fmul
fmul
Generated: Hello.class
! java Hello
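For comparison, the same computation written in Java is shown below; CircleArea and area are hypothetical names. The expression pi * (r * r) evaluates in the same order as the bytecode (r * r first, then the multiplication by pi). javac typically compiles it to fload_0, fload_1, fload_1, fmul, fmul, using a second fload where the hand-written version uses dup, although the exact instructions may vary.

```java
class CircleArea {
    // Mirrors the bytecode: push pi, push r, duplicate r, fmul (r * r), fmul (pi * r * r).
    static float area(float pi, float r) {
        return pi * (r * r);
    }

    public static void main(String[] args) {
        System.out.println(area(3.14f, 5.0f)); // 78.5
    }
}
```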

5 Object Oriented Programming in JVM

5.1 Object-Oriented Instructions in JVM Bytecode

  • new

    • Creates a new object and pushes a reference onto the operand stack.
    • Example: new java/lang/String
    • Before: [...]
    • After: [... ref]
  • invokespecial

    • Calls an instance constructor (<init>) or a private method.
    • Example: invokespecial java/lang/Object/<init>()V
    • Before: [... ref]
    • After: [...]
  • invokevirtual

    • Calls an instance method based on the runtime type of the object.
    • Example: invokevirtual java/lang/String/length()I
    • Before: [... ref]
    • After: [... int]
  • invokestatic

    • Calls a static method.
    • Example: invokestatic java/lang/Math/abs(I)I
    • Before: [... int]
    • After: [... int]
  • getfield

    • Fetches an instance field value and pushes it onto the operand stack.
    • Example: getfield java/lang/String/value [C
    • Before: [... ref]
    • After: [... value]
  • putfield

    • Sets an instance field value.
    • Example: putfield java/lang/String/value [C
    • Before: [... ref, value]
    • After: [...]
  • getstatic

    • Fetches a static field value and pushes it onto the operand stack.
    • Example: getstatic java/lang/System/out Ljava/io/PrintStream;
    • Before: [...]
    • After: [... ref]
  • putstatic

    • Sets a static field value.
    • Example: putstatic Hello/count I (count is a hypothetical static int field of Hello)
    • Before: [... int]
    • After: [...]
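The Java snippet below exercises each of these instructions. The comments show what javac typically emits for each statement, written in the same slash-separated style as the examples above; treat them as approximate and check the real output with javap -c. OopInstructions, count, value, and demo are hypothetical names. One detail worth noticing: javac follows new with dup, because invokespecial <init> consumes one copy of the reference and the other copy is still needed afterwards.

```java
class OopInstructions {
    static int count; // hypothetical static field: accessed with getstatic / putstatic
    int value;        // hypothetical instance field: accessed with getfield / putfield

    static int demo() {
        // new OopInstructions; dup; invokespecial OopInstructions/<init>()V; astore_0
        OopInstructions obj = new OopInstructions();
        // aload_0; iconst_3; putfield OopInstructions/value I
        obj.value = 3;
        // aload_0; getfield OopInstructions/value I; istore_1
        int v = obj.value;
        // ldc "hello"; invokevirtual java/lang/String/length()I; istore_2
        int len = "hello".length();
        // bipush -7; invokestatic java/lang/Math/abs(I)I; istore_3
        int abs = Math.abs(-7);
        // iload_1; iload_2; iadd; iload_3; iadd; putstatic OopInstructions/count I
        count = v + len + abs;
        // getstatic OopInstructions/count I; ireturn
        return count;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 15
    }
}
```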

5.2 Example

Now we can print the result of the area computation from Section 4.3.

%%jvm stack=10 locals=10
ldc 3.14f                                              ;; [3.14f]
ldc 5.0f                                               ;; [3.14f 5.0f]
dup                                                    ;; [3.14f 5.0f 5.0f]
fmul                                                   ;; [3.14f 25.0f]
fmul                                                   ;; [result]
getstatic  java/lang/System/out Ljava/io/PrintStream;  ;; [result out]
swap                                                   ;; [out result]
invokevirtual java/io/PrintStream/print(F)V            ;; []
Generated: Hello.class
! java Hello
78.5