The assembler will automatically recognize files with the extension jbc (Java ByteCode) as files to parse and assemble.
Files with the extension class will be disassembled to jbc files.
Tokens are defined in this document using the token := ... notation.
Tokens are written in italic, literals use the normal formatting.
Regex-like operations, such as ( and ) for groups, * for 0 or more, and ? for 0 or 1, are also used in the documentation.
Standard Java syntax comments are possible: // for single-line comments and /* */ for multi-line comments.
In essence, the Assembler distinguishes 4 different token types (based on the StreamTokenizer tokens):
- number: any sequence of 0-9, starting with - for negative numbers, and containing a single . for non-integer numbers.
- word: any sequence of -, ., 0-9, A-Z, a-z, and all characters with a value greater than or equal to 240 but less than or equal to 255. A word must not start with a number.
- string: any sequence of characters surrounded by double quotes ("). A string can contain escaped characters:
  - \a for the bell character
  - \b for the backspace character
  - \f for the new page character
  - \n for the new line character
  - \r for the carriage return character
  - \t for the horizontal tab character
  - \v for the vertical tab character
  - Additionally, a string can contain octal-escaped characters: \xxx where each x is a 0-7 digit (up to \377).
- character: a single character, surrounded by single quotes ('). A character follows the same escape rules as string, e.g. '\n' and '\177' are valid characters.
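The four token types can be observed with a plain java.io.StreamTokenizer. The following is a minimal sketch, not the Assembler's actual setup: the class name TokenKindsSketch and the tokenize helper are hypothetical, and the assumption is that the tokenizer's default character classes are used with both Java comment styles switched on.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: labels every token of a jbc-like snippet with its kind.
public class TokenKindsSketch {
    public static List<String> tokenize(String source) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(source));
        // Assumption: only the comment styles are configured; everything else
        // relies on the StreamTokenizer defaults.
        tokenizer.slashSlashComments(true);
        tokenizer.slashStarComments(true);

        List<String> tokens = new ArrayList<>();
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                switch (tokenizer.ttype) {
                    case StreamTokenizer.TT_NUMBER:
                        tokens.add("number:" + tokenizer.nval);
                        break;
                    case StreamTokenizer.TT_WORD:
                        tokens.add("word:" + tokenizer.sval);
                        break;
                    case '"':
                        // Quoted text arrives with escapes already resolved.
                        tokens.add("string:" + tokenizer.sval);
                        break;
                    case '\'':
                        tokens.add("character:" + tokenizer.sval);
                        break;
                    default:
                        // Any other single character, such as { or ;
                        tokens.add("symbol:" + (char) tokenizer.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }
}
```

Calling tokenize on a line such as bipush -12 reports a word followed by a number, while java.lang.String is reported as a single word because . is allowed inside words.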
Types generally follow the Java syntax, albeit less restrictive: any word can be a type, and any type can be followed by [] to denote an array.
Method arguments also follow Java, with the important distinction that no argument names are specified.
type := word
type := type []
methodArguments := ( )
methodArguments := ( (type ,)* type )
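In a class file, Java-style types such as these are stored as JVM descriptors. The sketch below illustrates that correspondence for fully qualified names only. The class DescriptorSketch and its methods are hypothetical and are not part of the Assembler; resolving short names through imports is deliberately left out.

```java
import java.util.Map;

// Hypothetical helper: converts Java-style type names to JVM descriptors.
public class DescriptorSketch {
    // Descriptor characters for primitive types and void, as defined by the JVM specification.
    private static final Map<String, String> PRIMITIVES = Map.of(
        "boolean", "Z", "byte", "B", "char", "C", "short", "S",
        "int", "I", "long", "J", "float", "F", "double", "D", "void", "V");

    public static String typeDescriptor(String type) {
        String trimmed = type.trim();
        // type := type [] : every trailing [] adds one array dimension.
        if (trimmed.endsWith("[]")) {
            return "[" + typeDescriptor(trimmed.substring(0, trimmed.length() - 2));
        }
        String primitive = PRIMITIVES.get(trimmed);
        // Assumption: any other word is a fully qualified class name.
        return primitive != null ? primitive : "L" + trimmed.replace('.', '/') + ";";
    }

    public static String methodDescriptor(String returnType, String... argumentTypes) {
        StringBuilder descriptor = new StringBuilder("(");
        for (String argumentType : argumentTypes) {
            descriptor.append(typeDescriptor(argumentType));
        }
        return descriptor.append(')').append(typeDescriptor(returnType)).toString();
    }
}
```

For instance, methodDescriptor("void", "java.lang.String[]") produces the descriptor of a typical main method.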
In most cases, every Java bytecode access flag can be combined, even if these combinations would be meaningless, or illegal for the JVM. An exception to this are some class access flags, which are conveniently expressed as class types rather than access flags.
classAccessFlag := public
classAccessFlag := private
classAccessFlag := protected
classAccessFlag := static
classAccessFlag := final
classAccessFlag := super
classAccessFlag := synchronized
classAccessFlag := volatile
classAccessFlag := transient
classAccessFlag := bridge
classAccessFlag := varargs
classAccessFlag := native
classAccessFlag := abstract
classAccessFlag := strictfp
classAccessFlag := synthetic
classAccessFlag := mandated
classAccessFlag := open
classAccessFlag := transitive
classAccessFlag := static_phase
accessFlag := classAccessFlag
accessFlag := module
accessFlag := enum
accessFlag := interface
accessFlag := annotation
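Several of these keywords share the same bit in the class file, which is one reason arbitrary combinations can be written even when the JVM would reject them. The sketch below combines flag names into a bit mask using the values from the Class File Format specification. The class AccessFlagSketch and the combine method are hypothetical and only illustrate the idea; the Assembler's own bookkeeping may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: ORs access flag keywords into a class file bit mask.
public class AccessFlagSketch {
    private static final Map<String, Integer> FLAGS = new HashMap<>();
    static {
        FLAGS.put("public", 0x0001);
        FLAGS.put("private", 0x0002);
        FLAGS.put("protected", 0x0004);
        FLAGS.put("static", 0x0008);
        FLAGS.put("final", 0x0010);
        // The next four keywords all map to the same bit.
        FLAGS.put("super", 0x0020);
        FLAGS.put("synchronized", 0x0020);
        FLAGS.put("open", 0x0020);
        FLAGS.put("transitive", 0x0020);
        FLAGS.put("volatile", 0x0040);
        FLAGS.put("bridge", 0x0040);
        FLAGS.put("static_phase", 0x0040);
        FLAGS.put("transient", 0x0080);
        FLAGS.put("varargs", 0x0080);
        FLAGS.put("native", 0x0100);
        FLAGS.put("interface", 0x0200);
        FLAGS.put("abstract", 0x0400);
        FLAGS.put("strictfp", 0x0800);
        FLAGS.put("synthetic", 0x1000);
        FLAGS.put("annotation", 0x2000);
        FLAGS.put("enum", 0x4000);
        FLAGS.put("mandated", 0x8000);
        FLAGS.put("module", 0x8000);
    }

    public static int combine(String... names) {
        int mask = 0;
        for (String name : names) {
            Integer flag = FLAGS.get(name);
            if (flag == null) {
                throw new IllegalArgumentException("Unknown access flag: " + name);
            }
            mask |= flag; // No legality check: any combination is accepted.
        }
        return mask;
    }
}
```

With this table, combine("public", "static", "final") yields 0x0019, and super and synchronized are indistinguishable once encoded.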
classFile := import* version class
import := import type ;
version := version number ;
As in Java, it's possible to import classes at the top of the file. Only fully qualified class names are allowed; wildcard and static imports are not supported.
Every file should also declare the Java version to assemble for. Valid versions are: 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 5, 1.6, 6, 1.7, 7, 1.8, 8, 1.9, 9, 10, 11, 12, and 13.
Example:
import java.lang.String;
import java.lang.System;
import java.io.PrintStream;
version 12;
public class MyClass {
    public static void main(final String[] args) {
        getstatic System#PrintStream out
        ldc "Hello World!"
        invokevirtual PrintStream#void println(String)
        return
    }
}
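Each accepted version name corresponds to a class file major version number. The sketch below shows that correspondence using the well-known major versions (45 for Java 1.0 and 1.1 up to 57 for Java 13). The class VersionSketch and the majorVersion method are hypothetical, and how the Assembler performs this lookup internally is not specified here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps the version names accepted in a jbc file to
// the class file major version number.
public class VersionSketch {
    private static final Map<String, Integer> MAJOR_VERSIONS = new HashMap<>();
    static {
        // Each row holds the aliases of one major version, starting at 45.
        String[][] aliases = {
            {"1.0", "1.1"}, {"1.2"}, {"1.3"}, {"1.4"}, {"1.5", "5"},
            {"1.6", "6"}, {"1.7", "7"}, {"1.8", "8"}, {"1.9", "9"},
            {"10"}, {"11"}, {"12"}, {"13"}
        };
        for (int i = 0; i < aliases.length; i++) {
            for (String alias : aliases[i]) {
                MAJOR_VERSIONS.put(alias, 45 + i);
            }
        }
    }

    public static int majorVersion(String version) {
        Integer major = MAJOR_VERSIONS.get(version);
        if (major == null) {
            throw new IllegalArgumentException("Unsupported version: " + version);
        }
        return major;
    }
}
```

Under this mapping, the version 12; declaration in the example corresponds to major version 56, and 1.5 and 5 are interchangeable.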
class := classAccessFlag* classType word superClassSpecifier classInterfacesSpecifier attributes? classBody
class := classAccessFlag* interfaceType word interfaceInterfacesSpecifier attributes? classBody
classType := class
classType := enum
classType := module
interfaceType := interface
interfaceType := @interface
superClassSpecifier := extends word
classInterfacesSpecifier := implements (word ,)* word
interfaceInterfacesSpecifier := extends (word ,)* word
classBody := ;
classBody := { classMember* }
Classes are defined analogously to Java: classes and enums can extend superclasses and implement interfaces.
Even though the syntax also allows modules to extend a superclass and implement interfaces, this is illegal for the JVM.
Interfaces and annotations (@interface) use the extends keyword as syntactic sugar to implement interfaces.
Additionally, classes extend java.lang.Object by default, enums extend java.lang.Enum by default, and annotations implement java.lang.annotation.Annotation by default.
Classes can optionally declare attributes and class members.
Examples:
public class MyException extends RuntimeException { // No attributes
    // No fields
    // No methods
}
public enum MyEnum; // Extends java.lang.Enum by default, no attributes, no fields, no methods
public @interface MyAnnotatation [ // Implements java.lang.annotation.Annotation by default
    Synthetic;
    Deprecated;
]; // No fields, no methods
public module module-info [ // Does not extend java.lang.Object by default
    // No attributes
] {
    // No fields, no methods
}
public class MyClass; // Extends java.lang.Object by default, no attributes, no fields, no methods
classMember := field
classMember := method
field := accessFlag* type word (= fieldConstant)? attributes? ;
method := accessFlag* attributes? methodBody
method := accessFlag* type methodName methodArgumentsDefinition methodThrows? attributes? methodBody
methodBody := ;
methodBody := { instruction* }
methodName := word
methodName := <init>
methodName := <clinit>
methodThrows := throws (type ,)* type
methodArgumentsDefinition := ( )
methodArgumentsDefinition := ( (methodArgumentDefinition ,)* methodArgumentDefinition )
methodArgumentDefinition := accessFlag* type word?
Fields and methods are also defined analogously to Java.
Fields can be initialized using the equals sign (=), which will set the ConstantValue attribute.
Although this syntax is always valid, this initialization is only legal for the JVM if the field has static and final access flags.
The loadable constant type does not have to match the field type. Indeed, some combinations are perfectly valid: a boolean field can be initialized using an intConstant.
Fields can also optionally declare attributes.
Examples:
public static final int INT_FIELD = 0; // No attributes
public static final boolean BOOLEAN_FIELD = 1; // The loadable constant type does not have to match the field type.
java.lang.String myStringField [
    Deprecated;
    Synthetic;
];
protected transient char someChar = 'a' []; // This is illegal for the JVM, as the field is not static and final.
A method can be declared in class initializer format or regular Java method format.
Method arguments can contain access flags and parameter names, and the method can have a throws clause.
Methods can also optionally have a method body, which will set the Code attribute, and declare other attributes.
Examples:
static { // No attributes
    return
} // Class initializer
public static final void main(synthetic final java.lang.String[] args) throws java.lang.Throwable [
    Deprecated;
    Synthetic;
] {
    new java.lang.Exception
    dup
    invokespecial java.lang.Exception#void <init>()
    athrow
}
public void <init>() [
    Code {
        aload_0
        invokespecial java.lang.Object#void <init>()
        return
    } // Explicit code attribute
]; // No method body
fieldReference := type # type word
fieldReference := # type word
methodReference := type # type word methodArguments
methodReference := # type word methodArguments
invokedynamicReference := number type word methodArguments
methodHandle := getstatic fieldReference
methodHandle := putstatic fieldReference
methodHandle := getfield fieldReference
methodHandle := putfield fieldReference
methodHandle := invokevirtual methodReference
methodHandle := invokestatic methodReference
methodHandle := invokespecial methodReference
methodHandle := newinvokespecial methodReference
methodHandle := invokeinterface methodReference
If the first type of fieldReference or methodReference is not supplied, the type will be the type of the current class being assembled.
In other words, these notations are shorthand for accessing fields or invoking methods of the current class.
invokedynamicReference has 4 arguments: the index in the BootstrapMethods attribute, the return type of the method, the name of the method, and the method arguments.
Because the Assembler has to know which constant to assign a value to, there are multiple notations for most constants.
Some constants have a defining format, for example in the case of booleanConstant, however it's always possible to explicitly provide the type of the constant, in Java 'cast' format.
boolean := true
boolean := false
doubleLiteralSuffix := D
doubleLiteralSuffix := d
floatLiteralSuffix := F
floatLiteralSuffix := f
longLiteralSuffix := L
longLiteralSuffix := l
booleanConstant := boolean
booleanConstant := (boolean) boolean
booleanConstant := (boolean) number
byteConstant := (byte) number
charConstant := character
charConstant := (char) character
charConstant := (char) number
doubleConstant := number doubleLiteralSuffix
doubleConstant := (double) number doubleLiteralSuffix
doubleConstant := (double) number
floatConstant := number floatLiteralSuffix
floatConstant := (float) number floatLiteralSuffix
floatConstant := (float) number
intConstant := number
intConstant := (int) number
longConstant := number longLiteralSuffix
longConstant := (long) number longLiteralSuffix
longConstant := (long) number
shortConstant := (short) number
stringConstant := string
stringConstant := (String) string
classConstant := type
classConstant := (Class) type
methodHandleConstant := (MethodHandle) methodHandle
methodTypeConstant := (MethodType) type methodArguments
dynamicConstant := (Dynamic) number type word
fieldConstant := booleanConstant
fieldConstant := byteConstant
fieldConstant := charConstant
fieldConstant := doubleConstant
fieldConstant := floatConstant
fieldConstant := intConstant
fieldConstant := longConstant
fieldConstant := shortConstant
fieldConstant := stringConstant
loadableConstant := fieldConstant
loadableConstant := classConstant
loadableConstant := methodHandleConstant
loadableConstant := methodTypeConstant
loadableConstant := dynamicConstant
booleanConstant, byteConstant, charConstant, intConstant, and shortConstant are all converted to integer constants by the Assembler.
This means that, in most cases, those constants are indistinguishable in the compiled class file.
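The conversion to integer constants can be pictured as follows. This is a sketch under the assumption that true and false become 1 and 0 and that characters become their code unit value, as is conventional for class files. The class IntConstantSketch and the toIntConstant method are hypothetical and not taken from the Assembler.

```java
// Hypothetical helper: reduces the small primitive constants to the single
// integer constant kind that ends up in the constant pool.
public class IntConstantSketch {
    public static int toIntConstant(Object value) {
        if (value instanceof Boolean) {
            return ((Boolean) value) ? 1 : 0;
        }
        if (value instanceof Character) {
            return (Character) value; // Unboxed to char, then widened to int.
        }
        if (value instanceof Byte || value instanceof Short || value instanceof Integer) {
            return ((Number) value).intValue();
        }
        throw new IllegalArgumentException("Not an integer-like constant: " + value);
    }
}
```

After this step, (boolean) 1, (byte) 1, (short) 1 and the plain number 1 all carry the same value, which is why the original notation cannot be recovered from the class file alone.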
dynamicConstant has 3 arguments: the index in the BootstrapMethods attribute, the type of the constant and the name of the constant.
attributes := [ attribute* ]
Some attributes are not explicitly parsed by the Assembler, but handled in a special way:
- ConstantValue: assignment similar to Java (see section Fields)
- MethodParameters: parameter access flags and names similar to Java (see section Methods)
- Exceptions: methods throw exceptions similar to Java (see section Methods)
- StackMap and StackMapTable: code is preverified by the ProGuard preverifier and these attributes are generated automatically.
- LineNumberTable, LocalVariableTable, and LocalVariableTypeTable: using pseudo-instructions in the code (see subsection Code attribute)
These attributes cannot be defined explicitly, and will not be printed explicitly by the Disassembler.
attribute := BootstrapMethods { bootstrapMethod* }
bootstrapMethod := methodHandle { bootstrapMethodArgument* }
bootstrapMethodArgument := loadableConstant ;
Example:
BootstrapMethods {
    invokestatic java.lang.invoke.StringConcatFactory#java.lang.invoke.CallSite makeConcatWithConstants(java.lang.invoke.MethodHandles$Lookup, java.lang.String, java.lang.invoke.MethodType, java.lang.String, java.lang.Object[]) {
        "abc \001 def";
    }
}
attribute := SourceFile string ;
Example: SourceFile "Assembler.java";
attribute := SourceDir string ;
Example: SourceDir "My Source Directory";
attribute := InnerClasses { innerClass* }
innerClass := classAccessFlag* innerClassType innerName? outerClass? ;
innerClassType := classType
innerClassType := interfaceType
innerName := as word
outerClass := in type
Both innerName and outerClass are optional. Note that even though module is a valid class type, it has no valid meaning in inner classes in Java bytecode.
Example:
InnerClasses {
    public class InnerClass as InnerName in OuterClass;
    public static @interface InnerAnnotation as Annotation;
    public enum InnerEnum in EnclosingClass;
    private module InnerModule;
}
attribute := EnclosingMethod enclosingClass enclosingMethod? ;
enclosingClass := type
enclosingMethod := # type word methodArguments
Although the enclosing class always has to be specified, enclosingMethod is optional.
Example:
EnclosingMethod EnclosingClass # void enclosingMethod(java.lang.String, java.lang.Object);
EnclosingMethod AnotherEnclosingClass;
attribute := NestHost type ;
Example:
NestHost java.lang.Class;
attribute := NestMembers { nestMember* }
nestMember := type ;
Example:
NestMembers {
    java.lang.Class;
    java.lang.String;
}
attribute := Deprecated ;
attribute := Synthetic ;
attribute := Signature string ;
Example:
Signature "Ljava/lang/Enum<LType;>;";
attribute := Code { instruction* } attributes?
instruction := nop
instruction := aconst_null
instruction := iconst_m1
instruction := iconst_0
instruction := iconst_1
instruction := iconst_2
instruction := iconst_3
instruction := iconst_4
instruction := iconst_5
instruction := lconst_0
instruction := lconst_1
instruction := fconst_0
instruction := fconst_1
instruction := fconst_2
instruction := dconst_0
instruction := dconst_1
instruction := bipush number
instruction := sipush number
instruction := ldc loadableConstant
instruction := ldc_w loadableConstant
instruction := ldc2_w loadableConstant
instruction := iload number
instruction := lload number
instruction := fload number
instruction := dload number
instruction := aload number
instruction := iload_0
instruction := iload_1
instruction := iload_2
instruction := iload_3
instruction := lload_0
instruction := lload_1
instruction := lload_2
instruction := lload_3
instruction := fload_0
instruction := fload_1
instruction := fload_2
instruction := fload_3
instruction := dload_0
instruction := dload_1
instruction := dload_2
instruction := dload_3
instruction := aload_0
instruction := aload_1
instruction := aload_2
instruction := aload_3
instruction := iaload
instruction := laload
instruction := faload
instruction := daload
instruction := aaload
instruction := baload
instruction := caload
instruction := saload
instruction := istore number
instruction := lstore number
instruction := fstore number
instruction := dstore number
instruction := astore number
instruction := istore_0
instruction := istore_1
instruction := istore_2
instruction := istore_3
instruction := lstore_0
instruction := lstore_1
instruction := lstore_2
instruction := lstore_3
instruction := fstore_0
instruction := fstore_1
instruction := fstore_2
instruction := fstore_3
instruction := dstore_0
instruction := dstore_1
instruction := dstore_2
instruction := dstore_3
instruction := astore_0
instruction := astore_1
instruction := astore_2
instruction := astore_3
instruction := iastore
instruction := lastore
instruction := fastore
instruction := dastore
instruction := aastore
instruction := bastore
instruction := castore
instruction := sastore
instruction := pop
instruction := pop2
instruction := dup
instruction := dup_x1
instruction := dup_x2
instruction := dup2
instruction := dup2_x1
instruction := dup2_x2
instruction := swap
instruction := iadd
instruction := ladd
instruction := fadd
instruction := dadd
instruction := isub
instruction := lsub
instruction := fsub
instruction := dsub
instruction := imul
instruction := lmul
instruction := fmul
instruction := dmul
instruction := idiv
instruction := ldiv
instruction := fdiv
instruction := ddiv
instruction := irem
instruction := lrem
instruction := frem
instruction := drem
instruction := ineg
instruction := lneg
instruction := fneg
instruction := dneg
instruction := ishl
instruction := lshl
instruction := ishr
instruction := lshr
instruction := iushr
instruction := lushr
instruction := iand
instruction := land
instruction := ior
instruction := lor
instruction := ixor
instruction := lxor
instruction := iinc number number
instruction := i2l
instruction := i2f
instruction := i2d
instruction := l2i
instruction := l2f
instruction := l2d
instruction := f2i
instruction := f2l
instruction := f2d
instruction := d2i
instruction := d2l
instruction := d2f
instruction := i2b
instruction := i2c
instruction := i2s
instruction := lcmp
instruction := fcmpl
instruction := fcmpg
instruction := dcmpl
instruction := dcmpg
instruction := ifeq label
instruction := ifne label
instruction := iflt label
instruction := ifge label
instruction := ifgt label
instruction := ifle label
instruction := if_icmpeq label
instruction := if_icmpne label
instruction := if_icmplt label
instruction := if_icmpge label
instruction := if_icmpgt label
instruction := if_icmple label
instruction := if_acmpeq label
instruction := if_acmpne label
instruction := goto label
instruction := jsr label
instruction := ret number
instruction := tableswitch { switchCase* }
instruction := lookupswitch { switchCase* }
instruction := ireturn
instruction := lreturn
instruction := freturn
instruction := dreturn
instruction := areturn
instruction := return
instruction := getstatic fieldReference
instruction := putstatic fieldReference
instruction := getfield fieldReference
instruction := putfield fieldReference
instruction := invokevirtual methodReference
instruction := invokespecial methodReference
instruction := invokestatic methodReference
instruction := invokeinterface methodReference
instruction := invokedynamic invokedynamicReference
instruction := new type
instruction := newarray type
instruction := anewarray type
instruction := arraylength
instruction := athrow
instruction := checkcast type
instruction := instanceof type
instruction := monitorenter
instruction := monitorexit
instruction := multianewarray type number
instruction := ifnull label
instruction := ifnonnull label
instruction := goto_w label
instruction := jsr_w label
switchCase := case number : label
switchCase := default : label
Note that the wide instruction is not present; it is replaced by the following pseudo-instructions:
instruction := iload_w number
instruction := lload_w number
instruction := fload_w number
instruction := dload_w number
instruction := aload_w number
instruction := istore_w number
instruction := lstore_w number
instruction := fstore_w number
instruction := dstore_w number
instruction := astore_w number
instruction := iinc_w number number
instruction := ret_w number
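A tool that generates jbc text has to pick between the short, regular, and wide spelling of a load instruction. The sketch below shows one way to do so, based on the JVM's index limits (0 to 3 for the one-byte forms, up to 255 for the regular form, up to 65535 for the wide form). The class WideInstructionSketch and the load method are hypothetical; the Assembler itself simply assembles whichever spelling is written.

```java
// Hypothetical helper: picks the spelling of a load instruction for a
// local variable index.
public class WideInstructionSketch {
    public static String load(char typePrefix, int index) {
        if (index < 0 || index > 0xFFFF) {
            throw new IllegalArgumentException("Local variable index out of range: " + index);
        }
        if (index <= 3) {
            return typePrefix + "load_" + index;   // e.g. iload_2
        }
        if (index <= 0xFF) {
            return typePrefix + "load " + index;   // e.g. aload 200
        }
        return typePrefix + "load_w " + index;     // e.g. dload_w 300
    }
}
```

The same three-way choice would apply to the store instructions, while iinc and ret only have a regular and a wide form.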
Furthermore, pseudo-instructions exist for labels, try-catch blocks, local variables, local variable types, and line numbers:
instruction := label :
instruction := catch type label label
instruction := catch any label label
instruction := startlocalvar number type word
instruction := endlocalvar number
instruction := startlocalvartype number string word
instruction := endlocalvartype number
instruction := line number
label := word
A catch pseudo-instruction specifies an exception handler at the location of the pseudo-instruction.
The catch type, start, end, and handler will be added to the exception table in the Code attribute.
startlocalvar and startlocalvartype specify the start of a local variable or local variable type, respectively; endlocalvar and endlocalvartype specify the corresponding end.
These pseudo-instructions modify the LocalVariableTable or LocalVariableTypeTable attributes in the Code attribute.
The number defines the index of the local variable or local variable type.
A startlocalvar or startlocalvartype must always have an accompanying endlocalvar or endlocalvartype, placed after the startlocalvar or startlocalvartype in the instructions.
line specifies the line number at a position in the bytecode. The line number and bytecode offset will be stored in a LineNumberTable attribute.
attribute := RuntimeVisibleAnnotations { annotation* }
attribute := RuntimeInvisibleAnnotations { annotation* }
attribute := RuntimeVisibleParameterAnnotations { parameterAnnotation* }
attribute := RuntimeInvisibleParameterAnnotations { parameterAnnotation* }
attribute := RuntimeVisibleTypeAnnotations { typeAnnotation* }
attribute := RuntimeInvisibleTypeAnnotations { typeAnnotation* }
attribute := AnnotationDefault elementValue
annotation := type { (word = elementValue)* }
parameterAnnotation := { annotation* }
typeAnnotation := annotation targetInfo { typePath* }
Examples:
RuntimeVisibleAnnotations {
    java.lang.Deprecated {
        since = "sinceVersion";
        forRemoval = true;
    }
}
RuntimeInvisibleAnnotations {
    java.lang.Deprecated {} // Empty values
}
RuntimeVisibleParameterAnnotations {
    {} // Empty annotations for parameter 0
    {
        java.lang.Deprecated {
            since = "sinceVersion";
            forRemoval = true;
        }
    }
}
RuntimeInvisibleParameterAnnotations {
    {
        java.lang.Deprecated {} // Empty values
    }
    {} // Empty annotations for parameter 1
    {} // Empty annotations for parameter 2
    {} // Empty annotations for parameter 3
}
RuntimeVisibleTypeAnnotations {
    java.lang.Deprecated {
        since = "sinceVersion";
        forRemoval = true;
    } local_variable {
        start0 end0 0;
        start10 end10 10;
    } {} // Empty type path
}
RuntimeVisibleTypeAnnotations {
    java.lang.Deprecated {} argument_generic_method_new newLabel 1 {
        array;
        type_argument 1;
    }
}
RuntimeInvisibleTypeAnnotations {
    java.lang.Deprecated {} field {} // Empty values, empty type path
}
AnnotationDefault {
    false; // Boolean element value
    true; // Boolean element value
    (byte) 1; // Byte element value
    '2'; // Char element value
    3.0D; // Double element value
    4F; // Float element value
    5; // Int element value
    6l; // Long element value
    (short) 7; // Short element value
    "string"; // String element value
    java.lang.Class; // Class element value
    Enum#Constant; // Enum constant element value
    @java.lang.Deprecated {} // Annotation element value
    {} // Array element value
} // Array element value
elementValue := booleanConstant ;
elementValue := byteConstant ;
elementValue := charConstant ;
elementValue := doubleConstant ;
elementValue := floatConstant ;
elementValue := intConstant ;
elementValue := longConstant ;
elementValue := shortConstant ;
elementValue := stringConstant ;
elementValue := classConstant ;
elementValue := (Enum) type # word ;
elementValue := type # word ;
elementValue := (Annotation) annotation
elementValue := @ annotation
elementValue := (Array) { elementValue* }
elementValue := { elementValue* }
Apart from the usual primitive constants, string constants, and class constants, element values can also denote enum constants (enum type + constant name), annotations and arrays.
Note that annotation element values and array element values do not end with a ;, as they already (either implicitly or explicitly) end with a }.
targetInfo := parameter_generic_class number
targetInfo := parameter_generic_method number
targetInfo := extends number
targetInfo := bound_generic_class number number
targetInfo := bound_generic_method number number
targetInfo := field
targetInfo := return
targetInfo := receiver
targetInfo := parameter number
targetInfo := throws number
targetInfo := local_variable { localVar* }
targetInfo := resource_variable { localVar* }
targetInfo := catch number
targetInfo := instance_of label
targetInfo := new label
targetInfo := method_reference_new label
targetInfo := method_reference label
targetInfo := cast label number
targetInfo := argument_generic_method_new label number
targetInfo := argument_generic_method label number
targetInfo := argument_generic_method_reference_new label number
targetInfo := argument_generic_method_reference label number
localVar := label label number ;
In general, the arguments of the target infos roughly match the ones specified in the Class File Format specification.
typePath := array number? ;
typePath := inner_type number? ;
typePath := wildcard number? ;
typePath := type_argument number? ;
Although every type path has an optional number argument, this argument only has meaning in combination with type_argument.
In that case, the number denotes which type argument is annotated (see the Class File Format specification for more details).
attribute := Module accessFlag* word word? { moduleDirective* }
moduleDirective := requires accessFlag* word word? ;
moduleDirective := exports accessFlag* type exportsTo? ;
moduleDirective := opens accessFlag* type opensTo? ;
moduleDirective := uses type ;
moduleDirective := provides type providesWith? ;
exportsTo := to (word ,)* word
opensTo := to (word ,)* word
providesWith := with (type ,)* type
The module attribute specifies the module access flags, the module name, and an optional module version.
As the module version must be a word, it cannot start with a number.
exports, opens, and provides all have optional arguments specifying the directive. These arguments use the same syntax as their Java counterparts.
Example:
Module open synthetic mandated ModuleName v1.0 {
    requires transitive some.package.RequiredModule v1.0;
    requires static_phase some.package.OtherRequiredModule;
    requires synthetic some.package.SyntheticRequiredModule alpha;
    requires mandated some.package.MandatedRequiredModule beta;
    exports synthetic some.package.exportedpackage;
    exports mandated some.package.mandated.exportedpackage to some.package.export.to.package, some.package.export.to.otherpackage, some.package.export.to.finalpackage;
    opens synthetic some.package.openedpackage;
    opens mandated some.package.mandated.openedpackage to some.package.open.to.package, some.package.open.to.otherpackage, some.package.open.to.finalpackage;
    uses some.package.UsedClass;
    uses some.package.OtherUsedClass;
    uses some.package.MoreUsedClass;
    uses some.package.FinalUsedClass;
    provides some.package.ProvidedClass;
    provides some.package.OtherProvidedClass with some.package.OtherProvidedClassImpl, some.package.OtherProvidedClassImpl1;
    provides some.package.FinalProvidedClass;
}
attribute := ModuleMainClass type ;
Example:
ModuleMainClass some.package.ModuleMainClass;
attribute := ModulePackages { type* }
Example:
ModulePackages {
    some.package;
    some.other.package;
}