The Verifier (Ch. 2, Sec. 6) [Securing Java]

The Base Java Security Model: The Original Applet Sandbox

Section 6 -- The Verifier

Recall that when a Java program is compiled, it compiles down to platform-independent Java byte code. As Figure 2.3 shows, Java byte code is verified before it can run. This verification scheme is meant to ensure that the byte code, which may or may not have been created by a Java compiler, plays by the rules. After all, byte code could well have been created by a "hostile compiler" that assembled byte code meant to violate the rules of the Java VM; created directly with an editor like emacs; created almost directly with a Java byte code assembler like Jasmin; or compiled from another source language like C++, Scheme, or Ada into Java byte code. The important thing is not what the source code looked like or even what language it was written in, but what the byte code (the code that actually ends up running) does. In this sense, the Verifier makes mystery code a bit less mysterious.

Figure 2.3 The Verifier scrutinizes byte code before it is allowed to be run on a local VM.
The Verifier plays an essential role in Java's language-based approach to security, which is built on the foundation of type safety.

Verifying class files containing byte code is one way in which Java automatically checks untrusted code before it is allowed to run. Once Java code has been verified, it can execute in uninterrupted fashion on a VM (with much less need to make security-critical checks while the code runs). This strategy leads to improvements in the efficiency of Java execution, which offset the speed concerns raised by Java's security checking. The Verifier is built into the VM and cannot be accessed by Java programmers or Java users. In most Java implementations, when Java code arrives at the VM and is formed into a Class by the Class Loader, the Verifier automatically examines it. The Verifier checks byte code at a number of different levels. The simplest test makes sure that the format of a code fragment is correct. At a less basic level, a built-in theorem prover is applied to each code fragment. The theorem prover helps to make sure that byte code does not forge pointers, violate access restrictions, or access objects using incorrect type information. If the Verifier discovers a problem with a class file, it throws an exception, loading ceases, and the class file never executes. The verification process, in concert with the security features built into the language and checked at runtime, helps to establish a base set of security guarantees.³

The Verifier also ensures that class files that refer to each other preserve binary compatibility. Because of Java's ability to dynamically load classes, there is the possibility that a class file being dynamically linked may not be compatible with a referring class. Binary incompatibility problems could occur when a library of Java classes is updated or when a class from a large Java program is not recompiled during development. There are rules of compatibility that govern the ability to change use of classes and methods without breaking binary compatibility [Venners, 1998]. For example, it is okay to add a method to a class that is used by other classes, but not okay to delete methods from a class used by other classes. Compatibility rules are enforced by the Verifier.

Binary incompatibility also has security implications. If a malicious programmer can get your VM to accept a set of mutually incompatible classes, the hostile code will probably be able to break out of the sandbox. This problem happened several times with early VM implementations. (See, for example, You're Not My Type, in Chapter 5.)

Java users and business people investigating the use of Java in their commercial enterprises often complain about the length of time it takes for a Java applet to get started running in a browser. E-commerce system designers paint startup delay as a business-side show stopper, citing the fact that consumers do not react well even to a 20-second delay in their shopping experience. Many people believe falsely that the main delay in starting applets is the time it takes to download the applet itself. But given a reasonably fast connection to the Internet, what takes the longest is not downloading the code, but verifying it. The inherent costs of verification fit the classic tradeoff between functionality and security to a tee. As security researchers, we believe the security that byte code verification provides is well worth the slight delay. We also think it is possible to speed up the verification process so that its execution time is acceptable.

In order to work, the Verifier reconstructs type state information by looking through the byte code. The types of all parameters of all byte code instructions must be checked, since the byte code may have come from an untrustworthy source. Because of this possibility, the Verifier provides a first line of defense against external code that may try to break the VM. Only code that passes the Verifier's tests will be run.

The process of Verification in Java is defined to allow different implementations of the Java VM a fair amount of flexibility. The VM specification lists what must be checked and what exceptions and errors may result from failing a check, but it does not specify exactly when and how to verify types. Nevertheless, most Java implementations (especially the most widely used commercial VMs) take a similar approach to verification. The process is broken into two major steps: internal checks that check everything that can be checked by looking only at the class file itself and runtime checks that confirm the existence and compatibility of symbolically referenced classes, fields, and methods.

Through the two kinds of checks, the Verifier assures a number of important properties. Once byte code passes through verification, the following things are guaranteed:

The class file has the correct format, including the magic number (0xCAFEBABE) and proper length. (Much of this trivial checking can take place as a class file loads.)

Stacks will not be overflowed or underflowed. Overflowing stacks is a common attack on programs written in other languages such as C that has led to several of the most notorious security vulnerabilities. For example, the Internet worm used stack overflow as part of its arsenal [Spafford, 1989]. Java assigns each thread two stacks: a data stack and an operand stack. The data stack is the kind of stack C programmers are all too familiar with. It includes a series of frames that hold local variables and provide some storage needed for method invocation. The Verifier cannot prevent overflow of the data stack, and a denial of service attack that takes advantage of this fact can be trivially implemented with a recursive function. (A demonstration applet that carries out this attack can be found at www.cs.nps.navy.mil/research/languages/DynApplet.html.) The operand stack (which is itself allocated on the data stack) holds the values that each method invocation in Java byte code operates on. This is the stack that the Verifier tracks and evaluates.

Byte code instructions all have parameters of the correct type. For example, integers are always used as integers and nothing else. Method descriptors, which include both a return type and the number and types of parameters, are checked with a context-free grammar [Venners, 1998].

No illegal data conversions (casts) occur. For example, treating an integer as a pointer is not allowed. Correctly handling all potential casting combinations is a tricky undertaking. The rules are complex, making a perfect implementation nontrivial. Although many checks are done by the Verifier, some are deferred until runtime.

Private, public, protected, and default accesses are legal. In other words, no improper access to restricted classes, interfaces, variables, and methods will be allowed.

All register accesses and stores are valid.

Computer scientists refer to the meaning of a language as its semantics, and the structure of a language is its syntax. When you are thinking about what a language can do, it is useful to talk about semantics. The semantics of the Java language provide much of Java's built-in security. It is critical that the semantics of Java be enforced in each and every Java program. Byte code is designed to contain enough symbolic information that safety verification (double-checking the compiler-enforced safety rules) can occur. Byte code specifies the methods of a class as a set of Java Virtual Machine instructions. These instructions must pass a battery of tests before they can be run.
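The guarantee about casts above notes that some checks are deferred until runtime. The following is a minimal sketch of what such deferred checks look like from the programmer's side; the class and method names are ours and purely illustrative. Both snippets compile cleanly, because the static types are acceptable, and the violation is caught only when the offending instruction actually executes.

```java
// Illustrative only: DeferredChecks and its method names are hypothetical.
class DeferredChecks {

    // A downcast from Object cannot be fully checked statically,
    // so the VM checks the actual object when the cast executes.
    static boolean castFailsAtRuntime() {
        Object o = "not an Integer";
        try {
            Integer i = (Integer) o;   // throws ClassCastException
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Java arrays are covariant, so a String[] may be viewed as an Object[].
    // Each store into the array is therefore checked at runtime.
    static boolean arrayStoreFailsAtRuntime() {
        Object[] objs = new String[1];
        try {
            objs[0] = Integer.valueOf(1);   // throws ArrayStoreException
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("cast rejected at runtime: " + castFailsAtRuntime());
        System.out.println("array store rejected at runtime: " + arrayStoreFailsAtRuntime());
    }
}
```

In both cases the type system stays intact; the cost is a check at the moment of use rather than at verification time.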

Class Files and Byte Code

When Java source code is compiled, the results of the compilation are put into Java class files, whose names typically end with the .class or .cls extension. Java class files are made up of streams of 8-bit bytes. Larger values requiring 16 or 32 bits are composed of multiple 8-bit bytes. Class files contain several pieces of information in a particular format. Included in a class file are:

The magic constant (0xCAFEBABE)
Major and minor version information
The constant pool (a heterogeneous array composed of five primitive types)
Information about the class (name, superclass, etc.)
Information about interfaces
Information about the fields and methods in the class
Debugging information

[Sun Microsystems, 1996b; Sun Microsystems, 1996c] Much more information on class file formats and byte code syntax can be found in [Venners, 1998].
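To make the first two items in the list concrete, here is a small sketch that reads the magic constant and the version numbers from the first eight bytes of a class file. The class name ClassFileHeader and its parse method are our own invention, not part of any Java API, and the sample bytes in main are hand-built rather than taken from a real class file. The sketch relies on the class file layout placing the four-byte magic constant first, followed by the two-byte minor version and then the two-byte major version, all big-endian.

```java
import java.nio.ByteBuffer;

// Illustrative only: ClassFileHeader is a hypothetical helper, not a VM API.
class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    final int minor;
    final int major;

    private ClassFileHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    // Parses the magic constant and version fields, rejecting anything malformed.
    static ClassFileHeader parse(byte[] bytes) {
        if (bytes == null || bytes.length < 8) {
            throw new IllegalArgumentException("class file too short");
        }
        ByteBuffer in = ByteBuffer.wrap(bytes);   // big-endian by default
        int magic = in.getInt();
        if (magic != MAGIC) {
            throw new IllegalArgumentException(
                "bad magic number: 0x" + Integer.toHexString(magic));
        }
        int minor = in.getShort() & 0xFFFF;       // unsigned 16-bit value
        int major = in.getShort() & 0xFFFF;
        return new ClassFileHeader(minor, major);
    }

    public static void main(String[] args) {
        // Hand-built header: magic, minor version 3, major version 45.
        byte[] sample = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 3, 0, 45};
        ClassFileHeader h = parse(sample);
        System.out.println("major=" + h.major + " minor=" + h.minor);
    }
}
```

A real Verifier goes on to parse the constant pool and everything after it; this sketch stops at the header.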

The byte code inside a class file is made up of instructions that can be divided into several categories. Among other things, byte code instructions, called opcodes, implement:

Pushing constants onto the stack
Accessing and modifying the value of a VM register
Accessing arrays
Manipulating the stack
Arithmetic instructions
Logic instructions
Conversion instructions
Control transfer
Function return
Manipulating object fields
Invoking methods
Creating objects
Type casting

Since it exists at the level of the VM, Java byte code is very similar to assembly language. Each line of byte code is a one-byte opcode followed by zero or more bytes of operand information. All instructions (with the exception of two table lookup instructions) are of fixed length.
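The opcode-plus-operands layout can be seen in a toy disassembler. The sketch below knows only seven opcodes, chosen by us for illustration, and the class MiniDisassembler is hypothetical. It walks a byte array, reads each one-byte opcode, and then consumes the fixed number of operand bytes that opcode requires.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: handles a tiny, hand-picked subset of the instruction set.
class MiniDisassembler {

    // Mnemonic for each supported opcode, or null if the opcode is not in our subset.
    static String mnemonic(int opcode) {
        switch (opcode) {
            case 0x04: return "iconst_1";
            case 0x05: return "iconst_2";
            case 0x10: return "bipush";
            case 0x11: return "sipush";
            case 0x60: return "iadd";
            case 0xa7: return "goto";
            case 0xac: return "ireturn";
            default:   return null;
        }
    }

    // Number of operand bytes that follow the opcode.
    static int operandBytes(int opcode) {
        switch (opcode) {
            case 0x10: return 1;             // bipush: one signed byte
            case 0x11: case 0xa7: return 2;  // sipush, goto: one signed 16-bit value
            default:   return 0;
        }
    }

    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            String name = mnemonic(opcode);
            if (name == null) {
                throw new IllegalArgumentException(
                    "unknown opcode 0x" + Integer.toHexString(opcode) + " at offset " + pc);
            }
            int n = operandBytes(opcode);
            if (pc + 1 + n > code.length) {
                throw new IllegalArgumentException("truncated operand at offset " + pc);
            }
            String line = pc + ": " + name;
            if (n == 1) {
                line += " " + code[pc + 1];
            } else if (n == 2) {
                line += " " + (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
            }
            lines.add(line);
            pc += 1 + n;   // advance past the opcode and its operands
        }
        return lines;
    }

    public static void main(String[] args) {
        // bipush 40; iconst_2; iadd; ireturn
        byte[] code = {0x10, 40, 0x05, 0x60, (byte) 0xac};
        for (String line : disassemble(code)) {
            System.out.println(line);
        }
    }
}
```

Because every instruction's length can be determined from its opcode (and, for the table lookup instructions, from their operands), a checker can find every instruction boundary in a single linear scan.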

Opcodes and their associated operands represent the fundamental operations of the VM. Every method invocation in Java gets its own stack frame to use as local storage for variable values and intermediate results. As discussed earlier, the intermediate storage area part of a frame is called the operand stack. Opcodes refer to data stored either on the operand stack or in the local variables of a method's frame. The VM uses these values as well as direct operand values from an instruction as execution data.

Internal Checks
Considered one level above the nitty-gritty level, class verification (of which byte code verification is a crucial step) is usually said to occur in four passes. The first three passes implement the Verifier's internal checks. The last pass implements the runtime checks. (Why runtime checking is considered a part of the load-time verification process is a mystery to us, but Sun seems to have convinced everyone to call it a fourth verification pass, so we'll go along.) The passes are:

1. Ensure that the class file is in the proper format. This includes checking the magic number and making sure that all attributes are of the right length. The byte code cannot be too short or too long, and the constant pool is able to be parsed.
2. Verify anything that can be done without looking at the opcodes. This includes the following checks:
   1. final classes cannot be subclassed, and final methods cannot be overridden.
   2. Every class must have a superclass (except the class java.lang.Object).
   3. The constant pool must satisfy more stringent constraints.
   4. All field references and method references in the constant pool must have legal names, legal classes, and a legal type signature.
3. Verify the byte code using data-flow analysis. At any given point in the byte code program, no matter how that point is reached, all of the following must hold:
   1. The operand stack is always the same size and includes the same types.
   2. Register access is checked for proper value type.
   3. Methods are called with the appropriate number and types of arguments.
   4. Fields are modified with values of the appropriate type.
   5. All opcodes have proper type arguments on the stack and in the registers.
   6. Variables are properly initialized. (See [Sun Microsystems, 1996c].)
4. Perform at runtime any checks that were not done at verification time. Some of these checks might have been impossible at verification time since some aspects of Java's type system cannot be statically checked, and some checks might have been deferred to runtime for implementation convenience.
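The second pass requires that method references carry a legal type signature, and method descriptors were described earlier as being checked against a context-free grammar. A minimal recursive-descent sketch of such a check follows. The class DescriptorChecker is hypothetical, and the check is deliberately simplified: it accepts any non-empty run of characters as a class name and does not limit array dimensions, both of which a real Verifier would police.

```java
// Illustrative only: a simplified checker for method descriptors such as "(I[J)V".
class DescriptorChecker {
    private final String s;
    private int pos;

    private DescriptorChecker(String s) {
        this.s = s;
    }

    // True if d has the shape "(" parameter-types ")" return-type and nothing more.
    static boolean isValidMethodDescriptor(String d) {
        if (d == null) {
            return false;
        }
        DescriptorChecker c = new DescriptorChecker(d);
        return c.method() && c.pos == d.length();
    }

    private boolean method() {
        if (!eat('(')) {
            return false;
        }
        while (pos < s.length() && s.charAt(pos) != ')') {
            if (!fieldType()) {
                return false;
            }
        }
        if (!eat(')')) {
            return false;
        }
        return eat('V') || fieldType();   // 'V' (void) is legal only as a return type
    }

    private boolean fieldType() {
        if (pos >= s.length()) {
            return false;
        }
        char c = s.charAt(pos++);
        switch (c) {
            case 'B': case 'C': case 'D': case 'F':
            case 'I': case 'J': case 'S': case 'Z':
                return true;                       // primitive type
            case '[':
                return fieldType();                // array of some component type
            case 'L': {
                int end = s.indexOf(';', pos);     // class name runs up to ';'
                if (end <= pos) {
                    return false;                  // missing ';' or empty name
                }
                pos = end + 1;
                return true;
            }
            default:
                return false;
        }
    }

    private boolean eat(char c) {
        if (pos < s.length() && s.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isValidMethodDescriptor("(ILjava/lang/String;[I)V"));
        System.out.println(isValidMethodDescriptor("(V)V"));
    }
}
```

The grammar is small enough that one recursive function per nonterminal covers it, which is part of why this check can be done without looking at any opcodes.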

Step 3, in which the actual byte code is verified, is a complex process that is carried out in two passes by Sun's Verifier. (Other vendors' Verifiers may behave differently.) The first pass identifies individual opcodes and stores them in a table. Once all opcodes are identified, the second pass parses each opcode's operands. During the second pass, a structure is built for each byte code instruction. This structure is evaluated for syntactic correctness by checking that:

Flow control related instructions branch to valid instructions.
Local variable references are legal (associated with the proper method).
Use of constant pool entries follows typing rules.
Opcodes have the correct number of arguments.
Exception handlers start and end with valid instructions, and the start point comes before the end point.

These checks, along with data flow analysis that tracks behavior of the operand stack and local variables, make up Step 3.
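The flavor of the data flow analysis can be conveyed with a toy abstract interpreter. Instead of running the code on real values, it runs the code on types, tracking what the operand stack would hold after each instruction. The sketch below is ours and heavily simplified: it invents a five-instruction language, handles only straight-line code, and stops at the first return. A real Verifier must also follow branches and merge the stack states that meet at each join point, and it tracks local variables as well.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: a straight-line type checker for a made-up instruction subset.
class StackTypeChecker {
    enum Type { INT, REF }
    enum Op { ICONST, ACONST_NULL, IADD, POP, IRETURN }

    // Returns null if the code is accepted, otherwise a description of the first violation.
    static String check(Op[] code, int maxStack) {
        Deque<Type> stack = new ArrayDeque<>();   // simulated operand stack of types
        for (int pc = 0; pc < code.length; pc++) {
            switch (code[pc]) {
                case ICONST:
                    stack.push(Type.INT);
                    break;
                case ACONST_NULL:
                    stack.push(Type.REF);
                    break;
                case IADD:
                    if (stack.size() < 2) {
                        return "underflow at " + pc;
                    }
                    if (stack.pop() != Type.INT || stack.pop() != Type.INT) {
                        return "type mismatch at " + pc;
                    }
                    stack.push(Type.INT);
                    break;
                case POP:
                    if (stack.isEmpty()) {
                        return "underflow at " + pc;
                    }
                    stack.pop();
                    break;
                case IRETURN:
                    if (stack.isEmpty()) {
                        return "underflow at " + pc;
                    }
                    if (stack.pop() != Type.INT) {
                        return "type mismatch at " + pc;
                    }
                    return null;   // accepted; this sketch ignores anything after the return
            }
            if (stack.size() > maxStack) {
                return "overflow at " + pc;
            }
        }
        return "falls off the end of the code";
    }

    public static void main(String[] args) {
        Op[] good = {Op.ICONST, Op.ICONST, Op.IADD, Op.IRETURN};
        Op[] bad = {Op.ICONST, Op.ACONST_NULL, Op.IADD, Op.IRETURN};
        System.out.println("good: " + check(good, 2));
        System.out.println("bad: " + check(bad, 2));
    }
}
```

Because the analysis never executes the code on real data, it can reject a method that would misuse the stack before a single instruction of that method runs.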

Runtime Checks and Dynamic Loading
In addition to the internal verification steps, some runtime checks occur during class execution. For example, whenever an instruction calls a method, or modifies a field, the runtime checks ensure that the method or field exists, check the call for the proper form, and check the executing method for access privilege.

In practice, the internal checks performed by the Verifier occur very soon after the VM loads a class file. Loading all possible classes that might be called in a particular execution of a class is not the most efficient approach. Instead, Java loads each class only when it is actually needed at runtime.

In order to verify a class, the VM must load in the definition of any not-yet-loaded class that is referenced by the class being verified. More precisely, the class being verified refers to some other classes by name, and the VM needs to decide exactly which class each name refers to, so it can replace the reference-by-name with a reference to a specific class object. This process is known as dynamic linking, and has proven to be a persistent source of security problems.

As in the Verifier's internal checking, dynamic linking can throw an error should it fail. An invalid reference happens when a referenced class does not exist, or when a referenced class exists but does not contain some referenced field or method. Once an error of this sort is thrown, the class file doing the reference is no longer considered valid.
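The two kinds of invalid reference can be observed from ordinary Java code through reflection, which performs a similar by-name lookup. This is an analogy rather than the linking mechanism itself: when the VM's own linker fails to resolve a reference it raises errors such as NoClassDefFoundError, NoSuchFieldError, or NoSuchMethodError, whereas reflection reports the same conditions as checked exceptions. The class LinkCheck and the name com.example.DoesNotExist are hypothetical.

```java
// Illustrative only: uses reflection as a stand-in for symbolic reference resolution.
class LinkCheck {

    // Tries to resolve a public no-argument method by class name and method name.
    static String resolve(String className, String methodName) {
        try {
            Class<?> c = Class.forName(className);   // does the referenced class exist?
            c.getMethod(methodName);                  // does it contain the referenced method?
            return "ok";
        } catch (ClassNotFoundException e) {
            return "no such class";
        } catch (NoSuchMethodException e) {
            return "no such method";
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("java.lang.String", "length"));
        System.out.println(resolve("java.lang.String", "noSuchMethodHere"));
        System.out.println(resolve("com.example.DoesNotExist", "anything"));
    }
}
```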

The main benefit that the Verifier provides is that it speeds up execution by removing much of the checking that would otherwise have to occur at runtime. For example, there is no runtime check for operand stack overflow, since that check has already been done by the Verifier. (Overflow of the data stack, as noted earlier, is a different matter: the Verifier cannot prevent it.)
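As noted earlier in this section, the Verifier's stack guarantee covers the operand stack and not the data stack of frames, which a recursive function can exhaust. The sketch below, with a class name of our choosing, shows that such an overflow is caught while the code runs: the VM raises a StackOverflowError instead of letting the program run off the end of the stack.

```java
// Illustrative only: unbounded recursion passes verification but exhausts the data stack.
class DataStackDemo {
    static int depth;

    // Perfectly legal byte code: each call uses a small, fixed amount of operand stack.
    static void recurse() {
        depth++;
        recurse();
    }

    static boolean overflowDetectedAtRuntime() {
        depth = 0;
        try {
            recurse();
            return false;   // never reached in practice
        } catch (StackOverflowError e) {
            return true;    // the VM noticed the exhausted data stack while running
        }
    }

    public static void main(String[] args) {
        boolean caught = overflowDetectedAtRuntime();
        System.out.println("overflow caught at runtime: " + caught);
    }
}
```

How deep the recursion gets before the error appears depends on the VM and its stack size settings, so the sketch reports only whether the error occurred.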

Checking the Checker

The Verifier disallows many obvious approaches to byte code manipulation. Nonetheless, a number of researchers have succeeded in creating byte code that should be illegal, but nevertheless passes the Verifier. The Princeton Secure Internet Programming team was the first to sneak attacks involving illegal byte code past the Verifiers. Mark LaDue, creator of the Hostile Applet Home Page (www.rstcorp.com/hostile-applets), performs a number of interesting experiments in which he creates byte code that does not play by the rules and yet passes verification. Other Java security researchers, most notably the Kimera group at the University of Washington, have discovered problems in commercial Verifiers. Their work places special emphasis on correct verification. We have more to say about the Verifier and Java holes in later chapters.

The Java Runtime and the Verifier

The Verifier acts as the primary gatekeeper in the Java security model. It ensures that each piece of byte code downloaded from the outside plays by the rules.⁴ That way, the Java VM can safely execute byte code that may not have been created by a Java compiler. When the Verifier finds a problem in a class, it rejects the malformed class and throws an exception. This is obviously a much more reasonable behavior than running buggy or malicious code that crashes the VM.

In order for the Verifier to succeed in its role as gatekeeper, the Java runtime system must be correctly implemented. Bugs in the runtime system will make byte code verification useless. Most of the available Java implementations appear to be mostly correct (see Chapter 5), but behavior may vary from one implementation to another.

It would be nice for Sun or other interested parties to create comprehensive validation and verification test suites for the entire Java development environment and publish the results of testing the various Java implementations. Existing test suites have neither been completely developed nor verified by outside experts. From a security perspective, this certainly presents a problem. It is especially critical to verify any third-party Java environment to ensure that it properly implements the Java runtime. Without a guarantee of a bug-free runtime, Java security falls to pieces. Java users should think carefully about the Java runtime product that they use.

We should note, though, that one cannot construct a test suite that will find all security problems. Although testing technology is improving, it can never be perfect, not even in theory. All we can hope for is that security testing technology will find most of the bugs in new code before it is released.
