A Brief Intro to JVM Internals
1 JVM Internals
The Java Virtual Machine (JVM) executes Java bytecode using a stack-based architecture. Here’s a high-level view of its internals.
1.1 Frames (Method Execution Context)
Each method invocation creates a new stack frame. This frame represents the execution context of the method and is pushed onto the Java stack (also called the call stack).
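To see that every invocation gets its own frame, the following Java sketch counts the frames on the current thread's stack with java.lang.StackWalker (available since Java 9). FrameDepth, depth, and depthAfter are hypothetical names chosen for illustration. The absolute depth depends on how main is launched, so only the difference between two measurements is meaningful.

```java
// Sketch: each nested method invocation pushes one more frame onto the
// current thread's stack. Frames are counted with java.lang.StackWalker (Java 9+).
class FrameDepth {
    // Count the frames currently on this thread's stack, starting at depth().
    static long depth() {
        return StackWalker.getInstance().walk(frames -> frames.count());
    }

    // Recurse n times, then measure the stack depth at the innermost call.
    static long depthAfter(int n) {
        if (n == 0) {
            return depth();
        }
        return depthAfter(n - 1);
    }

    public static void main(String[] args) {
        long base = depthAfter(0);
        long deeper = depthAfter(5);
        // Five extra invocations of depthAfter mean five extra frames.
        System.out.println(deeper - base); // prints 5
    }
}
```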
1.2 Frame Structure
Each stack frame contains:
1.2.1 Operand Stack:
A runtime stack where bytecode instructions push, pop, and manipulate values. This is used for expression evaluation and intermediate computations.
| |
| <z> | <-- top of the stack
| <y> |
| <x> |
+-----+
Fixed stack cell capacity
- Each stack cell is 32 bits wide.
- A single 64-bit value (long or double) occupies two adjacent cells on the stack, distinguished as the low cell and the high cell.
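The two-cell idea can be illustrated in plain Java by splitting a long into its high and low 32-bit halves and joining them back. TwoCells, split, and join are hypothetical names. This only models the description above; how a real JVM lays out a long internally is implementation specific.

```java
// Model of storing one 64-bit value in two 32-bit cells (high and low).
// Illustrative only: a real JVM's internal layout may differ.
class TwoCells {
    // Split a long into {high, low} 32-bit halves.
    static int[] split(long value) {
        int high = (int) (value >>> 32); // upper 32 bits
        int low = (int) value;           // lower 32 bits (the cast truncates)
        return new int[] {high, low};
    }

    // Reassemble the halves; mask the low half so it is not sign-extended.
    static long join(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long v = 0x1234_5678_9ABC_DEF0L;
        int[] cells = split(v);
        System.out.printf("high=%08x low=%08x%n", cells[0], cells[1]);
        System.out.println(join(cells[0], cells[1]) == v); // true
    }
}
```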
1.2.2 Local Variables (Local Variable Array):
A fixed-size array storing method parameters and local variables. These are accessed using indexed load/store instructions.
$0 $1 $2 $3
+---+---+---+---+---
| a | b | c | d | ...
+---+---+---+---+---
Fixed locals cell capacity
- Each locals cell is 32 bits (4 bytes) wide.
- A single 64-bit (8-byte) value (long or double) occupies two consecutive cells in the locals array, distinguished as the low cell and the high cell. Such a value is addressed by the lower of the two indices.
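The following sketch shows how method parameters are laid out in the locals array. Per the JVM specification, parameters occupy consecutive slots starting at 0 for a static method. An instance method reserves slot 0 for this. A long or double takes two slots. SlotLayout, size, and slots are hypothetical helpers written for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: compute the starting local-variable slot of each parameter.
// long and double take two slots; every other type (including references) takes one.
class SlotLayout {
    static int size(Class<?> t) {
        return (t == long.class || t == double.class) ? 2 : 1;
    }

    // Instance methods reserve slot 0 for `this`; static methods start at 0.
    static List<Integer> slots(boolean isStatic, Class<?>... params) {
        List<Integer> result = new ArrayList<>();
        int next = isStatic ? 0 : 1;
        for (Class<?> p : params) {
            result.add(next);
            next += size(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // static int sum(int a, long b, String s): b occupies slots 1 and 2
        System.out.println(slots(true, int.class, long.class, String.class)); // [0, 1, 3]
    }
}
```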
1.2.3 Return Address & Frame Data:
Holds metadata needed for execution, including the return address for method calls.
Return address and frame data are automatically managed by the virtual machine.
2 Programming in JVM Bytecode
In this section, we will explore the following topics:
- Skeleton of a Java class in bytecode.
- Type descriptions
- Skeleton of a Java method in bytecode.
2.1 Java class in JVM bytecode
In JVM bytecode, we declare a class using the directives:
.class <access-modifier> <class-name>
.super <java-class-type>
%%jvm filename=Hello.j
.class public Hello
.super java/lang/Object
Generated: Hello.class
Assembling this file produces the class file Hello.class.
! ls *.class
Hello.class
However, this class is empty: it has no methods and therefore cannot be executed. To be executable, a class needs a main method:
public static void main(String[] args) {
...
}
But before we get to how to declare methods, we first need to talk about how to describe Java types.
2.2 Type descriptions in JVM bytecode
Any Java type signature is serialized into a single string without whitespace.
There are four kinds of Java types, for example:
- Primitive types: int
- Object types: java.io.OutputStream
- Array types: int[], java.lang.String[]
- Method types: double exp(int a, int b)
2.2.1 Primitive Type Signatures
Each primitive type has a single-character representation:
| Java Type | Signature |
|---|---|
| boolean | Z |
| byte | B |
| char | C |
| short | S |
| int | I |
| long | J |
| float | F |
| double | D |
| void | V |
2.2.2 Object Type Signatures
- Format:
L<class-name>;
where <class-name> is the fully qualified class name with each . replaced by /.
- Examples:
java.lang.String → Ljava/lang/String;
java.util.List → Ljava/util/List;
2.2.3 Array Type Signatures
- Format:
[<element-type>
with one [ per array dimension.
- Examples:
int[] → [I
String[][] → [[Ljava/lang/String;
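The rules for primitive, object, and array signatures fit into one small recursive function. The sketch below is hand-written: Descriptors and descriptor are hypothetical names, not a JDK API. It relies only on the standard Class methods isPrimitive, isArray, getComponentType, and getName.

```java
import java.util.Map;

// Sketch of the signature rules above for primitive, object, and array types.
class Descriptors {
    private static final Map<Class<?>, String> PRIMITIVES = Map.of(
        boolean.class, "Z", byte.class, "B", char.class, "C",
        short.class, "S", int.class, "I", long.class, "J",
        float.class, "F", double.class, "D", void.class, "V");

    static String descriptor(Class<?> type) {
        if (type.isPrimitive()) {
            return PRIMITIVES.get(type); // single character
        }
        if (type.isArray()) {
            // One '[' per dimension, followed by the element type's signature.
            return "[" + descriptor(type.getComponentType());
        }
        // Object type: 'L' + class name with '.' replaced by '/' + ';'
        return "L" + type.getName().replace('.', '/') + ";";
    }

    public static void main(String[] args) {
        System.out.println(descriptor(int.class));        // I
        System.out.println(descriptor(String.class));     // Ljava/lang/String;
        System.out.println(descriptor(String[][].class)); // [[Ljava/lang/String;
    }
}
```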
2.2.4 Method Type Signatures
Format:
(<parameter-types>)<return-type>
The parameter signatures are concatenated with no separators.
Example:
int sum(int a, long b, String s) → (IJLjava/lang/String;)I
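The JDK can produce method signatures (descriptors) directly. java.lang.invoke.MethodType provides toMethodDescriptorString(). The wrapper class MethodDescriptorDemo is only for illustration.

```java
import java.lang.invoke.MethodType;

// Let the JDK build method descriptors from return and parameter types.
class MethodDescriptorDemo {
    // int sum(int a, long b, String s)
    static String sumDescriptor() {
        return MethodType.methodType(int.class, int.class, long.class, String.class)
                         .toMethodDescriptorString();
    }

    // public static void main(String[] args)
    static String mainDescriptor() {
        return MethodType.methodType(void.class, String[].class).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(sumDescriptor());  // (IJLjava/lang/String;)I
        System.out.println(mainDescriptor()); // ([Ljava/lang/String;)V
    }
}
```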
2.2.5 List of Type Signatures
A very important property of type signatures is that they can be concatenated without creating any ambiguity.
For example, consider a list of types:
int, float, int, String[][], int[]
Each of these is encoded individually as:
I, F, I, [[Ljava/lang/String;, [I
Concatenating them gives:
IFI[[Ljava/lang/String;[I
This string can be decoded unambiguously by scanning it from left to right. A primitive signature is always a single character, an object signature always ends at the next ;, and each [ prefixes the element type that follows it.
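The left-to-right scan can be sketched in Java as follows. DescriptorSplitter and split are hypothetical names. The sketch accepts field types only, so V is rejected.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: recover the individual signatures from a concatenated list
// by scanning left to right. Field types only ('V' is rejected).
class DescriptorSplitter {
    static List<String> split(String s) {
        List<String> types = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int start = i;
            // Each '[' adds one array dimension to the type that follows.
            while (i < s.length() && s.charAt(i) == '[') {
                i++;
            }
            if (i == s.length()) {
                throw new IllegalArgumentException("missing element type at " + start);
            }
            char c = s.charAt(i);
            if (c == 'L') {
                // Object type: everything up to and including the next ';'.
                int end = s.indexOf(';', i);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated object type at " + i);
                }
                i = end + 1;
            } else if ("ZBCSIJFD".indexOf(c) >= 0) {
                i++; // primitive: always exactly one character
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            }
            types.add(s.substring(start, i));
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(split("IFI[[Ljava/lang/String;[I"));
        // [I, F, I, [[Ljava/lang/String;, [I]
    }
}
```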
2.3 Method declaration in JVM bytecode
The way to declare a method is:
.method <access-modifier> static? <name>(<arg-types>)<return-type>
.limit stack <stack-size>
.limit locals <local-size>
<instruction>
<instruction>
<instruction>
return
.end method
So we can declare:
public static void main(String[] args)
as
.method public static main([Ljava/lang/String;)V
...
.end method
Note that parameter names are not part of the method declaration in JVM bytecode. Parameters are identified only by their position in the locals array.
2.3.1 An executable class in JVM bytecode
%%jvm filename=Hello.j
.class public Hello
.super java/lang/Object
.method public static main([Ljava/lang/String;)V
.limit stack 10
.limit locals 2
return
.end method
Generated: Hello.class
Let’s run it. It should succeed without producing any output.
%%bash
java Hello
3 JVM Bytecode Programming
Now, let’s focus on the JVM instructions and see how we can perform computation using the low-level JVM operations.
Below are the most commonly used JVM instructions and their effects on the operand stack and the locals array.
3.1 Data representation in memory
Data can exist in three areas:
- operand stack: a stack structure that supplies the operands of instructions and holds the values they produce.
- locals array: an integer-indexed array that stores local variables and method parameters.
- heap: a large memory pool that stores all objects and arrays, which in general do not fit into the 32-bit cells of the stack and locals.
The stack and locals hold only references to heap objects. A reference occupies a single cell: the JVM specification counts a reference as one slot regardless of the machine's pointer size. On 64-bit HotSpot, references are often stored in 32 bits using a technique called Compressed Ordinary Object Pointers (compressed oops).
3.2 Move data
3.2.1 ldc
Loads a constant (e.g., an integer, float, or string) from the constant pool onto the operand stack.
3.2.1.1 Example: ldc 10
Stack before: [...]
Stack after: [..., 10]
For string constants, such as
ldc "hello world"
the String object "hello world" lives on the heap, and a reference to it is pushed onto the stack.
ldc2_w loads 64-bit constants (long or double), for example:
- 3.1415 (a double)
- 100L (a long)
3.2.2 dup
Duplicates the top value on the operand stack.
3.2.2.1 Example: dup
Stack before: [..., 10]
Stack after: [..., 10, 10]
dup2 duplicates the top two cells, which is how a single 64-bit value (long or double) is duplicated.
3.2.3 istore
Stores an integer from the operand stack into a local variable.
3.2.3.1 Example: istore 1
Stack before: [..., 10]
Stack after: [...]
Locals after: $1 = 10
See fstore for float, dstore for double, and lstore for long. A double or long needs two cells: lstore n and dstore n write locals n and n+1.
3.2.4 iload
Loads an integer from a local variable onto the operand stack.
3.2.4.1 Example: iload_1
Stack before: [...]
Stack after: [..., 10]
Locals before: $1 = 10
See fload for float, dload for double, and lload for long. A double or long needs two cells: lload n and dload n read locals n and n+1.
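The load/store instructions can be modelled with a toy frame that uses an ArrayDeque as the operand stack and an int array as the locals. MiniFrame and its methods are hypothetical names for illustration. Only int values are modelled, so the two-cell handling of long and double is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of one frame (ints only): an operand stack plus a
// fixed-size locals array. Method names mirror the instructions.
class MiniFrame {
    final Deque<Integer> stack = new ArrayDeque<>();
    final int[] locals;

    MiniFrame(int maxLocals) { locals = new int[maxLocals]; }

    void ldc(int constant) { stack.push(constant); }       // [...] -> [..., c]
    void dup() { stack.push(stack.peek()); }                 // [..., v] -> [..., v, v]
    void istore(int index) { locals[index] = stack.pop(); }  // [..., v] -> [...]
    void iload(int index) { stack.push(locals[index]); }     // [...] -> [..., $index]

    public static void main(String[] args) {
        MiniFrame f = new MiniFrame(2);
        f.ldc(10);    // [10]
        f.dup();      // [10, 10]
        f.istore(1);  // [10], $1 = 10
        f.iload(1);   // [10, 10]
        System.out.println(f.stack.size() + " " + f.locals[1]); // 2 10
    }
}
```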
3.3 Example
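The following cell stores the 64-bit constant 100 in locals 0 and 1, so it needs locals=2:
%%jvm stack=10 locals=2
ldc2_w 100
lstore 0
Generated: Hello.class
! java Hello
With only one local cell, the same code is rejected:
%%jvm stack=10 locals=1
ldc2_w 100
lstore 0
Generated: Hello.class
! java Hello
Error: Unable to initialize main class Hello
Caused by: java.lang.VerifyError: (class: Hello, method: main signature: ([Ljava/lang/String;)V) Illegal local variable number
The verifier rejects the class because the long needs two cells (locals 0 and 1), but locals=1 allocates only one.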
4 Arithmetic Instructions in JVM Bytecode
4.1 Integer arithmetic
- iadd
- Adds the top two integers from the operand stack and pushes the result back.
- Example:
iadd
- Before:
[... 3, 5]
- After:
[... 8]
- isub
- Subtracts the top value from the value beneath it and pushes the difference.
- Example:
isub
- Before:
[... 7, 2]
- After:
[... 5]
- imul
- Multiplies the top two integers from the operand stack and pushes the result back.
- Example:
imul
- Before:
[... 4, 6]
- After:
[... 24]
- idiv
- Divides the value beneath the top by the top value and pushes the quotient, truncated toward zero.
- Throws ArithmeticException if the divisor (the top value) is zero.
- Example:
idiv
- Before:
[... 8, 2]
- After:
[... 4]
- irem
- Divides the value beneath the top by the top value and pushes the remainder.
- Throws ArithmeticException if the divisor (the top value) is zero.
- Example:
irem
- Before:
[... 9, 4]
- After:
[... 1]
- ineg
- Negates the top integer from the operand stack and pushes the result back.
- Example:
ineg
- Before:
[... 6]
- After:
[... -6]
- iinc
- Adds a constant to an int local variable in place, without touching the operand stack.
- Example:
iinc 0 3
- Before:
$0 = 5
- After:
$0 = 8
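The operand order of the binary instructions is easy to get wrong, so here is a toy Java model. IntArith and its methods are hypothetical names. The top of the stack is value2 and the cell beneath it is value1. Each binary instruction computes value1 OP value2, which matches the examples above. Java's int arithmetic wraps, truncates division toward zero, and throws ArithmeticException on a zero divisor, as the JVM instructions do.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the integer arithmetic instructions (not a real JVM).
class IntArith {
    final Deque<Integer> stack = new ArrayDeque<>();
    final int[] locals;

    IntArith(int maxLocals) { locals = new int[maxLocals]; }

    void push(int v) { stack.push(v); }

    // op is one of "iadd", "isub", "imul", "idiv", "irem".
    void binary(String op) {
        int value2 = stack.pop(); // popped first: the top
        int value1 = stack.pop(); // popped second: beneath the top
        int result;
        switch (op) {
            case "iadd": result = value1 + value2; break;
            case "isub": result = value1 - value2; break;
            case "imul": result = value1 * value2; break;
            // Java's / and % throw ArithmeticException when value2 == 0.
            case "idiv": result = value1 / value2; break;
            case "irem": result = value1 % value2; break;
            default: throw new IllegalArgumentException("unknown op: " + op);
        }
        stack.push(result);
    }

    void ineg() { stack.push(-stack.pop()); }

    // iinc changes a local in place and never touches the operand stack.
    void iinc(int index, int delta) { locals[index] += delta; }

    public static void main(String[] args) {
        IntArith m = new IntArith(1);
        m.push(7);
        m.push(2);
        m.binary("isub");
        System.out.println(m.stack.pop()); // 5
        m.locals[0] = 5;
        m.iinc(0, 3);
        System.out.println(m.locals[0]);   // 8
    }
}
```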
4.2 Other scalar data types
The float, double, and long variants follow the same pattern, using the prefixes f, d, and l:
- fadd
- fmul
- fsub
- …
4.3 Example
Let’s compute the area of a circle, \[ A = \pi r^2 \] where
- \(\pi = 3.14\)
- \(r = 5.0\)
%%jvm stack=10 locals=10
ldc 3.14f
ldc 5.0f
dup
fmul
fmul
Generated: Hello.class
! java Hello
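For comparison, here is the same computation in Java. It uses float arithmetic, matching the f-suffixed ldc constants and fmul in the bytecode, and it follows the bytecode's evaluation order: r is duplicated and squared first, and the result is then multiplied by π. CircleArea and area are hypothetical names.

```java
// Java equivalent of the bytecode above, in float arithmetic.
class CircleArea {
    static float area(float pi, float r) {
        // Inner multiply = first fmul (r * r after dup); outer = second fmul.
        return pi * (r * r);
    }

    public static void main(String[] args) {
        System.out.println(area(3.14f, 5.0f)); // prints 78.5
    }
}
```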
5 Object Oriented Programming in JVM
5.1 Object-Oriented Instructions in JVM Bytecode
new
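- Creates a new, uninitialized object and pushes a reference to it onto the operand stack. The object must then be initialized by calling a constructor with invokespecial, typically via the sequence new, dup, invokespecial.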
- Example:
new java/lang/String
- Before:
[...]
- After:
[... ref]
invokespecial
- Calls an instance constructor (<init>) or a private method.
- Example:
invokespecial java/lang/Object/<init>()V
- Before:
[... ref]
- After:
[...]
invokevirtual
- Calls an instance method based on the runtime type of the object.
- Example:
invokevirtual java/lang/String/length()I
- Before:
[... ref]
- After:
[... int]
invokestatic
- Calls a static method.
- Example:
invokestatic java/lang/Math/abs(I)I
- Before:
[... int]
- After:
[... int]
getfield
- Fetches an instance field value and pushes it onto the operand stack.
- Example (assuming a hypothetical class Point with an int field x):
getfield Point/x I
- Before:
[... ref]
- After:
[... value]
putfield
- Sets an instance field value.
- Example (assuming a hypothetical class Point with an int field x):
putfield Point/x I
- Before:
[... ref, value]
- After:
[...]
getstatic
- Fetches a static field value and pushes it onto the operand stack.
- Example:
getstatic java/lang/System/out Ljava/io/PrintStream;
- Before:
[...]
- After:
[... ref]
putstatic
- Sets a static field value.
- Example (assuming the class Hello declares a static int field count):
putstatic Hello/count I
- Before:
[... int]
- After:
[...]
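To connect these instructions to Java source, the small class below touches each instruction family. The comments name the instructions javac typically emits for each line. The exact sequences can vary between compiler versions, so treat them as a guide and check the real output with javap -c OopDemo. OopDemo, count, x, and run are hypothetical names.

```java
// Each statement is annotated with the bytecode javac typically emits for it.
class OopDemo {
    static int count; // accessed with getstatic / putstatic
    int x;            // accessed with getfield / putfield

    static int run() {
        OopDemo d = new OopDemo();  // new OopDemo; dup; invokespecial OopDemo/<init>()V
        d.x = -7;                   // aload; bipush -7; putfield OopDemo/x I
        int len = "hello".length(); // ldc "hello"; invokevirtual java/lang/String/length()I
        int abs = Math.abs(d.x);    // aload; getfield OopDemo/x I; invokestatic java/lang/Math/abs(I)I
        count = len + abs;          // iload; iload; iadd; putstatic OopDemo/count I
        System.out.println(count);  // getstatic System/out; getstatic OopDemo/count; invokevirtual println(I)V
        return count;
    }

    public static void main(String[] args) {
        run(); // prints 12
    }
}
```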
5.2 Example
Now we can print the result of the arithmetic using getstatic and invokevirtual.
%%jvm stack=10 locals=10
ldc 3.14f ;; [3.14f]
ldc 5.0f ;; [3.14f 5.0f]
dup ;; [3.14f 5.0f 5.0f]
fmul ;; [3.14f 25.0f]
fmul ;; [result]
getstatic java/lang/System/out Ljava/io/PrintStream; ;; [result out]
swap ;; [out result]
invokevirtual java/io/PrintStream/print(F)V ;; []
Generated: Hello.class
! java Hello
78.5