Java: Serializable
(how to save instances)

Object Serialization.

(Text: In Core Java Volume 1, Chapter 12 "Streams and Files" section Object Streams)

Java provides a feature called object serialization that allows you to take any object that implements the Serializable interface and turn it into a sequence of bytes that can later be fully restored into the original object.

"Objects are serialized with the ObjectOutputStream and they are deserialized with the ObjectInputStream. Both of these classes are part of the java.io package, and they function, in many ways, like DataOutputStream and DataInputStream because they define the same methods for writing and reading binary representations of Java primitive types to and from streams. What ObjectOutputStream and ObjectInputStream add, however, is the ability to write and read non-primitive object and array values to and from a stream." Java in a Nutshell

The ObjectInput and ObjectOutput interfaces are the basis for serialization within Java. The ObjectInputStream and ObjectOutputStream classes, respectively, implement the interfaces.

What Objects need to be serialized?

The basic idea is to save Instance information. Why Instances?

Classes are defined and "saved" in your .java source files. Instances, however, disappear when you exit the application, so one might want to save Instance information.

What information might this be?

Well, it won't be methods, since methods are defined in Classes (stored in the class files). Instance objects are different: they are created at run-time, hold Instance Variables, and can change Class Variables. Thus, static variables changed by an Instance can be a problem (more later).

To get an idea of what serialization is doing, let's consider more about the nature of objects. From, "Developing Java Beans" by Robert Englander.

"Most components maintain information that defines their appearance and behavior. This information is known as the state of the object. Some of this information is represented by the object's properties. For instance, the font or color properties of a visual component are usually considered to be part of that object's state. There may also be internal data used by an object that is not exposed as properties, but plays a part in defining the behavior of the object nevertheless.

...The state information of all the components, as well as the application or applet itself, must be saved on a persistent storage medium so that it can be used to recreate the overall application state at run-time. An important aspect of the application state is the definition of the components themselves: the persistent state of an application includes a description of the components being used, as well their collective state."

Note that we use object streams to save objects. The writeObject/readObject methods handle only objects, not numbers. To write and read numbers directly, you use methods such as writeInt/readInt or writeDouble/readDouble. (The object stream classes implement the DataInput/DataOutput interfaces.) (More later on why you would want to do this.)
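As a minimal sketch of mixing the two kinds of calls on one stream: the class and method names below (MixedStreamDemo, write, read) are made up for illustration, and an in-memory byte array stands in for a file. The values must be read back in the order they were written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical demo class; not taken from any of the cited books.
class MixedStreamDemo {
    // Writes two primitives and one object to a single object stream.
    static byte[] write(int count, double price, String label) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(count);      // primitive: DataOutput method
            out.writeDouble(price);   // primitive: DataOutput method
            out.writeObject(label);   // object: String is Serializable
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the values back in the same order and formats them as text.
    static String read(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            double price = in.readDouble();
            String label = (String) in.readObject();
            return label + ":" + count + ":" + price;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(read(write(3, 2.5, "widgets")));
    }
}
```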

Of course, numbers inside objects (IVs) are saved and restored automatically (class variables, being static, are a separate matter discussed later).

When an object is saved, all of its state is saved. This means that all handles and objects that the saved object refers to are saved. In its simplest application the programmer includes the phrase implements java.io.Serializable in the class definition as shown in the following example from "Developing Java Beans."

Note: implements java.io.Serializable could be written as implements Serializable, but then you need to import java.io.Serializable (or java.io.*).

In this example there are three data members. The first two, anInteger and aFloat are primitive data types and are therefore serializable. The third, aButton is an instance of type java.awt.Button, a subclass of java.awt.Component which itself implements java.io.Serializable. Therefore the class SimpleExample can be serialized without doing anything more than declaring that it implements java.io.Serializable.
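A class matching that description might look like the sketch below. It is reconstructed from the paragraph above, not copied from the book, so the access modifiers and the empty constructor are assumptions; only the class name and the three field names come from the text.

```java
import java.awt.Button;

// Reconstruction of the class described above (details are assumed).
class SimpleExample implements java.io.Serializable {
    protected int anInteger;    // primitive type: serializable
    protected float aFloat;     // primitive type: serializable
    protected Button aButton;   // Button inherits Serializable from java.awt.Component

    SimpleExample() {
        // Nothing to do: declaring the interface is all that serialization needs here.
    }
}
```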

Only classes that implement the Serializable or Externalizable interface can be written to or read from an object stream. Serializable is a marker interface - it doesn't define any methods and serves only to specify whether an object is allowed to be serialized (look in the API ... it is empty!). Read what it says about the readObject and writeObject methods. The Externalizable interface (which extends Serializable) does define methods and is used by objects that want advanced control.

Consider the save() method in Example 8.1 of Java in a Nutshell 2 (linked below). Note the creation of the ObjectOutputStream and the use of writeObject(). Since the class does not implement its own writeObject method, it needs to instantiate the ObjectOutputStream explicitly. It saves only one Object (specifically, lines).
Notice that it is just lines that needs to be serializable.
(See Java in a Nutshell, page 173 http://www.ecst.csuchico.edu/~amk/foo/javanut2/ch08/ScribbleFrame.java)

Static or transient data

However, this "ease" is not true in all cases. As we shall see, serialization is not so easily applied to classes with static or transient data members.

Only data associated with a specific instance of a class is serialized, therefore static data, that is, data associated with a class as opposed to an instance, is not serialized automatically. To serialize data stored in a static variable one must provide class-specific serialization.

Similarly, some classes may define data members to use as scratch variables. Serializing these data members may be unnecessary. Some examples of transient data include runtime statistics or hash table mapping references. These data should be marked with the transient modifier to avoid serialization. Transient, by definition, is used to designate data members that the programmer does not want or need to be serialized. See Java in a Nutshell, page 174: mouse position, preferred size, file handles (machine specific (native code)).

When writing code, declaring something transient should alert the programmer that special serialization code may be needed later.
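The effect of both modifiers can be seen in a small sketch. The Session class and its fields are hypothetical; the point is only that the restored object gets the default value for its transient field, and that the static field keeps whatever value the class holds at restore time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class used only for this illustration.
class Session implements Serializable {
    static int openSessions = 0;       // class data: never written to the stream
    String user;                       // instance data: serialized
    transient long lastAccessMillis;   // scratch data: skipped

    Session(String user, long lastAccessMillis) {
        this.user = user;
        this.lastAccessMillis = lastAccessMillis;
    }
}

class TransientStaticDemo {
    static byte[] toBytes(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Saves a session, changes the static field, then restores the session.
    static String demo() {
        Session.openSessions = 1;
        byte[] saved = toBytes(new Session("amk", 12345L));
        Session.openSessions = 7;   // class state changes after the save
        Session restored = (Session) fromBytes(saved);
        return restored.user + " " + restored.lastAccessMillis + " " + Session.openSessions;
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "amk 0 7"
    }
}
```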

To serialize an object, you create some sort of OutputStream object and then wrap it inside an ObjectOutputStream object. At this point you only need to call writeObject() and your object is magically serialized and sent to the OutputStream. To reverse the process, you wrap an InputStream inside an ObjectInputStream and call readObject(). What comes back is, as usual, a handle to an upcast Object, so you must downcast to set things straight.
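The wrap, write, read, and downcast steps just described can be sketched as follows. Book and SaveRestoreDemo are hypothetical names, and a temporary file is used so the sketch cleans up after itself.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class used only for this illustration.
class Book implements Serializable {
    final String title;
    final int pages;

    Book(String title, int pages) {
        this.title = title;
        this.pages = pages;
    }
}

class SaveRestoreDemo {
    // Wrap a FileOutputStream in an ObjectOutputStream and write one object.
    static void save(Book book, File file) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(book);
        }
    }

    // readObject() returns Object, so the result must be downcast.
    static Book load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            return (Book) in.readObject();
        }
    }

    // Saves to a temporary file and loads the copy back.
    static Book roundTrip(Book book) {
        try {
            File file = File.createTempFile("book", ".ser");
            file.deleteOnExit();
            save(book, file);
            return load(file);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Book copy = roundTrip(new Book("Core Java", 800));
        System.out.println(copy.title + ", " + copy.pages + " pages");
    }
}
```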

If you need to dynamically query the type of the object, you can use the getClass method. Specifically, dk.getClass().getName() returns the name of the class that dk is an instance of; i.e., this asks the object for the name of its corresponding class object. (Hmmm, true, but what about syntax? I still need to know what it is to declare it... too bad.) (C++ can do this in one operation with dynamic_cast, which gives null if the pointer is of the wrong type; Java can use the instanceof operator to check that the object is what I think it is. See Core Java, Ch. 5 Inheritance, Casting section.)
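A short sketch of both checks, applied to a value of static type Object such as readObject() returns. The describe method and its class are hypothetical helpers.

```java
// Hypothetical helper showing instanceof and getClass().getName().
class TypeCheckDemo {
    // Takes an Object, as returned by readObject(), and inspects its type.
    static String describe(Object restored) {
        if (restored instanceof String) {
            String s = (String) restored;   // safe: checked just above
            return "String of length " + s.length();
        }
        // Otherwise report the name of the object's run-time class.
        return restored.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe("abc"));
        System.out.println(describe(Integer.valueOf(1)));
    }
}
```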

Custom Serialization

Note: a class can define custom serialization and deserialization behavior for its objects by implementing writeObject() and readObject() methods. These methods are not defined by any interface. The methods must be declared private (rather surprising since they are called from outside of the class during serialization and deserialization.)

The following example illustrates the serialization process. It is not necessarily what you need to do unless you need to customize; it is what is automatically done for an instance's IVs.

Note the order: when reading back keep track of the number of objects, their order and their type. In Java, remember that strings and arrays are objects and can, therefore, be restored with the writeObject/readObject methods. (Why need to do this if they are objects? why not automatic? ... IS automatic if the object is serializable and it is an IV of an instance that is serializable... and it is not static or transient)
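Here is a minimal sketch of custom serialization with private writeObject() and readObject() methods. The Measurement class is hypothetical, and two of its fields are marked transient purely so that they have to be written by hand; normally the default mechanism would handle a String and an array on its own. Note that the reads mirror the writes in number, order, and type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class used only for this illustration.
class Measurement implements Serializable {
    String name;                  // handled by the default mechanism
    transient String unit;        // written by hand below
    transient double[] samples;   // written by hand below

    Measurement(String name, String unit, double[] samples) {
        this.name = name;
        this.unit = unit;
        this.samples = samples;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();   // default first: writes name
        out.writeObject(unit);      // strings are objects
        out.writeObject(samples);   // arrays are objects too
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();                  // default first: reads name
        unit = (String) in.readObject();         // same order as written
        samples = (double[]) in.readObject();
    }
}

class CustomSerializationDemo {
    static Measurement roundTrip(Measurement m) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(m);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Measurement) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Measurement copy = roundTrip(
            new Measurement("temp", "C", new double[] {20.5, 21.0}));
        System.out.println(copy.name + " in " + copy.unit + ", "
            + copy.samples.length + " samples");
    }
}
```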

Another example: (This is Java in a Nutshell, page 175) (show use of transient data)

What Objects need to implement Serializable?
Component implements Serializable, so all AWT components can be serialized. A superclass of a serializable Class does not itself have to be serializable (Object is not), but a non-serializable superclass must have an accessible no-argument constructor, and its fields are not saved: they are re-initialized by that constructor when the object is restored. If you want to serialize a Class of your own and keep its inherited state, you need to make sure that its supers are serializable as well. Otherwise you may be fooling yourself and others. Specifically, you need to ensure that your own classes implement Serializable properly. Illustration

Since in a given application, one probably does not inherit most classes (other than the AWT stuff) you often need to do this explicitly.
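What happens when a superclass is not serializable can be seen in a small sketch. Vehicle and Car are hypothetical classes: the field declared in the non-serializable superclass is not saved, and the superclass's no-argument constructor runs again when the object is restored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical superclass that does NOT implement Serializable.
class Vehicle {
    String plate = "UNSET";

    Vehicle() { }   // runs again during deserialization of a subclass
}

// Hypothetical serializable subclass.
class Car extends Vehicle implements Serializable {
    int doors;

    Car(String plate, int doors) {
        this.plate = plate;   // inherited state: will not be saved
        this.doors = doors;   // own state: saved
    }
}

class SuperclassDemo {
    static Car roundTrip(Car car) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(car);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Car) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Car copy = roundTrip(new Car("ABC123", 4));
        System.out.println(copy.doors + " doors, plate " + copy.plate);
    }
}
```

Making Vehicle implement Serializable as well would preserve the plate.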

Why not serialize everything?

Back to serializing...
So, each Class whose Instances will need to be saved should implement Serializable and, possibly, have custom readObject() and writeObject() methods.

If a class does not implement these methods, the default serialization, the same behavior that defaultWriteObject() and defaultReadObject() provide, will be used. In custom methods, call the default first.
(The default methods may be called only from a class's read/writeObject methods. If one is called from anywhere else, for example from outside the writeObject method, a NotActiveException is thrown.)

(What does this readObject() probably do? Upon (1) reading the object type, it (2) instantiates the (blank) object and (3) sets its variables to the values saved.)

If you write a save() method for the top Object, then as Java tries to serialize it, it accesses its variables, which are possibly objects that need to be serialized, and performs this recursively. From the specs: "The writeObject method serializes the specified object and traverses its references to other objects in the object graph recursively to create a complete serialized representation of the graph."

As far as subclasses (from Englander): "When an object is serialized, the highest serializable class in its derivation hierarchy is located and serialized first. Then the hierarchy is walked, with each subclass being serialized in turn."

Specifically, readObject and writeObject methods only need to save and load their data fields; they should not concern themselves with superclass data or any other class information. (except changes to static variables)

Example 1-4 in Core Java (example only to show how saved - poor OO use of main) http://www.ecst.csuchico.edu/~amk/foo/CoreJava/v2ch1/ObjectFileTest.java On the SUNs at Chico (at least on Expert), the corejava package is at /opt/java/corejava, this would need to be in your CLASSPATH for this code (and other code from the CoreJava book) to run

Keep in mind that objects may contain references to other objects, not separate copies of them.

Specifically, consider the example below. Two managers can share the same secretary. One does not want to save three copies of Harry. (One wants to maintain consistent data and not worry about editing numerous copies.)

See Core Java, chapter on Streams and Files: Object Streams
Fig.5, Fig.6, Fig.7, Fig.8

Thus the term serialization...

Example 1-5 in Core Java V2 (show how saved hierarchically) http://www.ecst.csuchico.edu/~amk/foo/CoreJava/v2ch1/ObjectRefTest.java

Remember, an object contains references to its IV objects, not separate copies of those objects. We want the object layout on disk to be exactly like the object layout in memory. This is persistence. Java achieves persistence through serialization.
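The shared-secretary situation can be sketched as follows. The Employee and Manager classes here are simplified stand-ins, not the Core Java listings. After the round trip both managers still point at one and the same restored secretary object, rather than at two copies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Simplified stand-ins for the classes in the Core Java example.
class Employee implements Serializable {
    String name;

    Employee(String name) {
        this.name = name;
    }
}

class Manager extends Employee {
    Employee secretary;   // a reference, not a private copy

    Manager(String name, Employee secretary) {
        super(name);
        this.secretary = secretary;
    }
}

class SharedReferenceDemo {
    // Writes the whole array with one writeObject() call and reads it back.
    static Manager[] roundTrip(Manager[] staff) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(staff);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Manager[]) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Employee harry = new Employee("Harry");
        Manager[] staff = { new Manager("Carl", harry), new Manager("Tony", harry) };
        Manager[] copy = roundTrip(staff);
        // One secretary object is restored and shared by both managers.
        System.out.println(copy[0].secretary == copy[1].secretary);
    }
}
```

The sharing is preserved only within a single stream; writing each manager to a separate stream would produce separate copies of Harry.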

In general:

About Class Variables (static).

The problem here arises when one wants to change static variables dynamically. Since these belong to the class and are not saved with the instance, a restored instance sees whatever value the class currently defines for the static variable, not the value it had when the instance was saved.

One needs to customize (as above) to restore this information. Beware that it is an instance that is trying to save this new class variable. Specifically, if an object1 refers to a class (static) attribute of another class and object1 is to be serialized, to accurately save object1 the static attribute from the referenced class would also have to be saved and any state associated with that static attribute. However, if the referenced class is not serializable then the object1 should throw a NotSerializableException.
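One possible way to customize, sketched under assumptions: the hypothetical Ticket class below writes its own static counter by hand in writeObject() and restores it in readObject(). Taking the maximum of the current and saved values is a design choice made for this sketch, so that restoring an old instance never moves the counter backwards.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class used only for this illustration.
class Ticket implements Serializable {
    static int nextNumber = 1;   // class variable, changed at run-time
    final int number;

    Ticket() {
        number = nextNumber++;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();    // instance data
        out.writeInt(nextNumber);    // class data, saved by hand
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Assumed policy: never let the counter move backwards.
        nextNumber = Math.max(nextNumber, in.readInt());
    }
}

class StaticStateDemo {
    // Returns the counter value seen after restoring a saved ticket.
    static int demo() {
        try {
            Ticket.nextNumber = 1;
            new Ticket();
            Ticket second = new Ticket();   // counter is now 3
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(second);
            }
            Ticket.nextNumber = 1;          // simulate a freshly loaded class
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
            return Ticket.nextNumber;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints 3
    }
}
```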


Cautions:

While the model used for serialization is very simple, it has some drawbacks.

First, it's not as simple as marking serializable classes with the Serializable interface. It is possible for an object that can't be serialized to implement Serializable (either directly or by inheritance).

Ultimately, serialization has to do with the data members of the class, not the methods it contains; after all, Serializable is an empty interface and doesn't require you to implement any methods.

A class is serializable if, and only if, all of its non-static, non-transient members are serializable. By default, static and transient members are ignored when an object is serialized.

Generally speaking, classes that belong to the standard Java distribution are serializable unless serializing an object of that class would be a security risk. The problem is that there are many standard classes that would present security risks if serialized--for example, a FileInputStream can't be serialized, because when it is deserialized at a later time (and possibly on a different machine), you have an object that references some file handle that may no longer be meaningful, or that may point to a different file than it did originally.

You should make it a practice to check the class of any data members you add to a serializable class to make sure that data members can be serialized also. Don't make any assumptions; just look it up in the documentation.

Stating that a class implements Serializable is essentially a promise that the class can be successfully saved and restored using the serialization mechanism. The problem is that any subclass of that class automatically implements Serializable via inheritance, even if it adds some non-serializable members. Java throws a NotSerializableException (from the java.io package) if you try to save or restore a non-serializable object.
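The broken promise can be demonstrated with a sketch. Shape and LiveShape are hypothetical: the subclass inherits Serializable but adds a Thread member, and Thread is not serializable, so the write fails at run-time even though everything compiles.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical serializable base class.
class Shape implements Serializable {
    int sides = 4;
}

// Inherits Serializable, but adds a member that cannot be serialized.
class LiveShape extends Shape {
    Thread animator = new Thread();   // Thread does not implement Serializable
}

class BrokenPromiseDemo {
    // Returns "ok" if the object can be written, else a short error description.
    static String trySerialize(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return "ok";
        } catch (NotSerializableException e) {
            return "NotSerializableException: " + e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(trySerialize(new Shape()));
        System.out.println(trySerialize(new LiveShape()));
    }
}
```

Marking the animator field transient would make LiveShape serializable again, at the cost of having to recreate the thread after restoring.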

I looked up JTable and Swing. The Swing components DO implement Serializable but look at the bottom of their API page descriptions (scroll up a little). It gives a warning that "this class will not be compatible with future Swing releases." So, a "short term" serialization. Hence better to leave them out of major projects.

When you are writing Beans (or any class that you may want to serialize), you have to think carefully about what the class contains, and you also have to think about how the class will be used.

You can redesign almost any class so that it is serializable, but this redesign may have implications for the interface between your class and the rest of the world. Ultimately, that's the trick with object serialization. It's not as simple as marking a few classes Serializable; it has real implications for how you write code.

Versioning

The idea: you have a class and you have serialized objects made from this class. Now the class changes (you have a new version). What happens when you try to load old instance information to a newly created instance (from a newer class version)?

"When an object is serialized, some information about its class must obviously be serialized with it, so that the correct class file can be loaded when the object is deserialized. This information about the class is represented by the java.io.ObjectStreamClass class. It contains the fully-qualified name of the class and a version number. The version number is very important because an early version of a class may not be able to deserialize a serialized instance created by a later version of the same class." Java in a Nutshell, page 175

Core Java V2 (in the 1.2 Core Java text, this information is in Volume 1) discusses what the files that are saved during serialization actually look like. Note item (2) of the class description: "the serial version unique ID, which is a fingerprint of the data field types and method signatures".

Java gets this fingerprint by applying the Secure Hash Algorithm (SHA) to the data of the class.

When the class definition changes in a way that affects this data, so does the SHA fingerprint. So the idea is that when you start serializing instances of a class, you should identify the fingerprint of the current version of the class.

See Sun's Stream Unique Identifiers page to see everything that goes into this fingerprint.

To get the SHA fingerprint:

  1. Do SHA to get the fingerprint (see Core Java about the use of serialver, a standalone program that generates these numbers).
  2. Once generated, put it as a definition (the serialVersionUID constant) in the class and in later versions that will not break serialization compatibility.
"Breaking" serialization produces exceptions like:
java.io.InvalidClassException: Person; local class incompatible: 
stream classdesc serialVersionUID = -2832314155938395448, 
local class serialVersionUID = 480295508009809219
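As a sketch of the second step: the constant is declared as a static final long named serialVersionUID, and java.io.ObjectStreamClass can report the value in use for a class. The Person class below is hypothetical, and the value 1L is an arbitrary placeholder rather than a number produced by serialver.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical class; 1L is a placeholder, not a serialver-generated value.
class Person implements Serializable {
    private static final long serialVersionUID = 1L;

    String name;
}

class VersionDemo {
    // Looks up the version number that serialization will use for a class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(Person.class));   // prints 1
    }
}
```

For a serializable class without the constant, the same lookup returns the computed fingerprint instead.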

Situations:

  1. if only the methods change, no problem
  2. Program using version 1 objects: if version 1 has fewer data fields than version 2, the missing fields are created and set to the default value for their type Fig.10
  3. Program using version 1 objects: if version 1 has more data fields than version 2, the extra fields are ignored Fig.11

If you make larger changes that break serialization compatibility (2 and 3 above), run serialver again to generate an updated version number. "It is up to the class designer to implement additional code in the readObject method to fix version incompatibilities or to make sure the methods are robust enough to handle null data." CoreJavaV2 pg.66 (null is the default for un-instantiated objects)


Do we get it? For a final overview see Serialization part of Bean tutorial



There is obviously a lot more to Serializable. For further reference try
(the online book I have mentioned) "Thinking in Java", also "Java in a Nutshell", 2nd ed for code and 3rd ed in the IO chapter (scroll down to look at the method .GetField), and "Developing Java Beans" by Robert Englander, published by O'Reilly (ISBN: 1-56592-289-1). The SUN web site to visit would be http://java.sun.com/j2se/1.4.2/docs/guide/serialization and specifically the Specs at http://java.sun.com/j2se/1.4.2/docs/guide/serialization/spec/serialTOC.html

non-trivial