The BCEL APIThe BCEL API abstracts from the concrete circumstances of the Java Virtual Machine and how to read and write binary Java class files. The API mainly consists of three parts:
JavaClass
The "static" component of the BCEL API resides in the package
The top-level data structure is
The constant pool serves as some kind of central repository and is
thus of outstanding importance for all components.
Methods and fields contain a signature, symbolically defining
their types. Access flags like int access_flags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;
As mentioned in section
2.1 already, several components may contain attribute
objects: classes, fields, methods, and Class repository
Using the provided JavaClass clazz = Repository.lookupClass("java.lang.String");
The repository also contains methods providing the dynamic equivalent
of the if (Repository.instanceOf(clazz, super_class)) { ... } Accessing class file data
Information within the class file components may be accessed like
Java Beans via intuitive set/get methods. All of them also define
a System.out.println(clazz); printCode(clazz.getMethods()); ... public static void printCode(Method[] methods) { for (int i = 0; i < methods.length; i++) { System.out.println(methods[i]); Code code = methods[i].getCode(); if (code != null) // Non-abstract method System.out.println(code); } } Analyzing class data
Last but not least, BCEL
supports the Visitor design pattern, so one can write
visitor objects to traverse and analyze the contents of a class
file. Included in the distribution is a class
ClassGen
This part of the API (package
Types
We abstract from the concrete details of the type signature syntax
(see 2.5) by introducing the
Type return_type = Type.VOID; Type[] arg_types = new Type[] { new ArrayType(Type.STRING, 1) };
Generic fields and methods
Fields are represented by
Generic methods contain methods to add exceptions the method may
throw, local variables, and exception handlers. The latter two are
represented by user-configurable objects as well. Because
exception handlers and local variables contain references to byte
code addresses, they also take the role of an instruction
targeter in our terminology. Instruction targeters contain a
method
The maximum stack size needed by the method and the maximum number
of local variables used may be set manually or computed via the
Instructions
Modeling instructions as objects may look somewhat odd at first
sight, but in fact enables programmers to obtain a high-level view
upon control flow without handling details like concrete byte code
offsets. Instructions consist of an opcode (sometimes called
tag), their length in bytes and an offset (or index) within the
byte code. Since many instructions are immutable (stack operators,
e.g.), the
Instructions are grouped via sub-classing, the type hierarchy of
instruction classes is illustrated by (incomplete) figure in the
appendix. The most important family of instructions are the
branch instructions, e.g.,
All instructions can be traversed via
For debugging purposes it may even make sense to "invent" your own
instructions. In a sophisticated code generator like the one used
as a backend of the Barat
framework for static analysis one often has to insert
temporary One could also think of new byte code instructions operating on complex numbers that are replaced by normal byte code upon load-time or are recognized by a new JVM. Instruction lists
An instruction list is implemented by a list of
instruction handles encapsulating instruction objects.
References to instructions in the list are thus not implemented by
direct pointers to instructions but by pointers to instruction
handles. This makes appending, inserting and deleting
areas of code very simple and also allows us to reuse immutable
instruction objects (fly-weight objects). Since we use symbolic
references, computation of concrete byte code offsets does not
need to occur until finalization, i.e., until the user has
finished the process of generating or transforming code. We will
use the term instruction handle and instruction synonymously
throughout the rest of the paper. Instruction handles may contain
additional user-defined data using the Appending: One can append instructions or other instruction lists anywhere to an existing list. The instructions are appended after the given instruction handle. All append methods return a new instruction handle which may then be used as the target of a branch instruction, e.g.: InstructionList il = new InstructionList(); ... GOTO g = new GOTO(null); il.append(g); ... // Use immutable fly-weight object InstructionHandle ih = il.append(InstructionConstants.ACONST_NULL); g.setTarget(ih); Inserting: Instructions may be inserted anywhere into an existing list. They are inserted before the given instruction handle. All insert methods return a new instruction handle which may then be used as the start address of an exception handler, for example. InstructionHandle start = il.insert(insertion_point, InstructionConstants.NOP); ... mg.addExceptionHandler(start, end, handler, "java.io.IOException");
Deleting: Deletion of instructions is also very
straightforward; all instruction handles and the contained
instructions within a given range are removed from the instruction
list and disposed. The try { il.delete(first, last); } catch (TargetLostException e) { for (InstructionHandle target : e.getTargets()) { for (InstructionTargeter targeter : target.getTargeters()) { targeter.updateTarget(target, new_target); } } }
Finalizing: When the instruction list is ready to be dumped
to pure byte code, all symbolic references must be mapped to real
byte code offsets. This is done by the InstructionList il = new InstructionList(); ClassGen cg = new ClassGen("HelloWorld", "java.lang.Object", "<generated>", ACC_PUBLIC | ACC_SUPER, null); MethodGen mg = new MethodGen(ACC_STATIC | ACC_PUBLIC, Type.VOID, new Type[] { new ArrayType(Type.STRING, 1) }, new String[] { "argv" }, "main", "HelloWorld", il, cp); ... cg.addMethod(mg.getMethod()); il.dispose(); // Reuse instruction handles of list Code example revisited
Using instruction lists gives us a generic view upon the code: In
Figure 5 we again present the code chunk
of the Instruction factories
To simplify the creation of certain instructions the user can use
the supplied
Example: Pushing constants Pushing constants onto the
operand stack may be coded in different ways. As explained in section 2.2 there are
some "short-cut" instructions that can be used to make the
produced byte code more compact. The smallest instruction to push
a single
Instead of repeatedly selecting the most compact instruction in,
say, a switch, one can use the compound InstructionFactory f = new InstructionFactory(class_gen); InstructionList il = new InstructionList(); ... il.append(new PUSH(cp, "Hello, world")); il.append(new PUSH(cp, 4711)); ... il.append(f.createPrintln("Hello World")); ... il.append(f.createReturn(type)); Code patterns using regular expressions
When transforming code, for instance during optimization or when
inserting analysis method calls, one typically searches for
certain patterns of code to perform the transformation at. To
simplify handling such situations BCEL introduces a special feature:
One can search for given code patterns within an instruction list
using regular expressions. In such expressions,
instructions are represented by their opcode names, e.g.,
"NOP+(ILOAD|ALOAD)*"
represents a piece of code consisting of at least one
The Example: Optimizing boolean expressions
In Java, boolean values are mapped to 1 and to 0,
respectively. Thus, the simplest way to evaluate boolean
expressions is to push a 1 or a 0 onto the operand stack depending
on the truth value of the expression. But this way, the
subsequent combination of boolean expressions (with
When the code has been finalized these chunks can be optimized
with a peep hole algorithm: An CodeConstraint constraint = new CodeConstraint() { public boolean checkCode(InstructionHandle[] match) { IfInstruction if1 = (IfInstruction) match[0].getInstruction(); GOTO g = (GOTO) match[2].getInstruction(); return (if1.getTarget() == match[3]) && (g.getTarget() == match[4]); } }; InstructionFinder f = new InstructionFinder(il); String pat = "IfInstruction ICONST_0 GOTO ICONST_1 NOP(IFEQ|IFNE)"; for (Iterator e = f.search(pat, constraint); e.hasNext(); ) { InstructionHandle[] match = (InstructionHandle[]) e.next();; ... match[0].setTarget(match[5].getTarget()); // Update target ... try { il.delete(match[1], match[5]); } catch (TargetLostException ex) { ... } }
The applied code constraint object ensures that the matched code
really corresponds to the targeted expression pattern. Subsequent
application of this algorithm removes all unnecessary stack
operations and branch instructions from the byte code. If any of
the deleted instructions is still referenced by an
Example application: The expression: if ((a == null) || (i < 2)) System.out.println("Ooops"); can be mapped to both of the chunks of byte code shown in figure 6. The left column represents the unoptimized code while the right column displays the same code after the peep hole algorithm has been applied:
|