JObjectTree is a very simple scheme for storing Java objects. Objects are organized in filesystem-like hierarchical trees. Any Serializable object can be stored in JObjectTree without modification. JObjectTree is designed to use multiple underlying object stores and currently supports two: a local
file system and, more importantly, Amazon Simple Storage Service (Amazon S3). JObjectTree is implemented as a Java library.
In JObjectTree, each object is uniquely identified by a Loc, which consists of a Context (such as an S3 bucket) and a Path, where a Path is an ordered array of strings.
By default, JObjectTree uses the standard Java serialization mechanism to convert objects to a persistable form. You can control this process by implementing your own version of Serializer. Currently, the distribution contains two Serializers: one that does plain serialization and another that compresses objects (see GzipSerializer).
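To make the idea of a compressing serializer concrete, here is a minimal sketch of standard Java serialization wrapped in gzip. It is not the source of GzipSerializer; the class GzipRoundTrip is hypothetical, and the method names are borrowed from the Serializer methods mentioned later in this document, with signatures that are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of JObjectTree: plain Java serialization
// with the serialized bytes compressed by gzip.
public class GzipRoundTrip {

    // Serialize the object and gzip the resulting byte stream.
    public static byte[] objectToByteArray(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the outer stream finishes the gzip trailer before we read the bytes.
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Reverse the process: gunzip, then deserialize.
    public static Object byteArrayToObject(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a highly repetitive string so that compression has something to do.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("hello ");
        }
        String original = sb.toString();

        byte[] stored = objectToByteArray(original);
        String restored = (String) byteArrayToObject(stored);

        System.out.println("round trip ok: " + restored.equals(original));
        System.out.println("stored bytes: " + stored.length);
    }
}
```

Compression pays off mainly for large or repetitive objects; for small objects the gzip header can make the stored form larger than the plain serialized form.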
Why another Java object persistence scheme? There are, at this point, many sophisticated, powerful systems for persisting Java objects. Despite this, I decided to write JObjectTree, and did so for two main reasons. First, for projects that entail parallel processing, I find it convenient to store objects in AWS S3 (which inherently supports large-scale parallel access to data). Persistence schemes such as JDO and JPA primarily use relational databases as object stores, so they cannot be used with S3. Second, these powerful systems are simply overkill for many of my projects. I often write applications that don't need to do queries or ACID transactions - they just need to store objects in an orderly way and be able to retrieve them reliably. Since I don't need the flexibility these systems provide, I really don't want to bother adding annotations to my code or writing XML metadata files. Hence JObjectTree.
Dependencies
JObjectTree uses the following Java libraries, which are included in the distribution zip file (in the lib directory):
JetS3t
Jakarta Commons HttpClient
Apache Commons Codec
Apache Commons Logging
How to use JObjectTree
This is a quick introduction to using JObjectTree with AWS S3 as the object store. For more details, see the Javadoc API documentation.
Get the jars
All the jars you need are in the lib directory of the zip distribution file. Put them in your classpath.
Get access to S3
To use JObjectTree with S3, you'll need to specify your AWS access ID and AWS secret key. Once you have an AWS account, you can find these on your Access Identifiers page.
Create an S3 bucket
All operations are performed using an instance of Ot. An Ot instance handles access to one Context. For an S3 context (ContextS3), this means one S3 bucket.
It's best to have a bucket that's dedicated to JObjectTree. This isn't absolutely necessary, but it makes things much simpler, and there's no cost to creating a bucket. You can create a bucket programmatically (see OtS3.createBucket()), but it's just as easy to create one using an interactive tool, such as S3 Organizer. See the AWS S3 API Reference Guide for bucket name rules (e.g. underscores are not allowed).
Create an OtS3 instance
Ot ot = new OtS3(new ContextS3("your-bucket-name"), "aws_access_ID", "aws_secret_key");
Or, to compress your objects:
Ot ot = new OtS3(new ContextS3("your-bucket-name"),
    "aws_access_ID", "aws_secret_key",
    new GZipSerializer());
Store objects
To store an object, you need to specify an OPath, which is just an ordered array of strings. The last component of the path should have an appropriate extension, which you can get from the context's default serializer:
String objectNameExtension =
    ot.getContext().getDefaultSerializer().getObjectNameExtension();
Then build the path:
OPath path = new OPath(new String[] { "path", "to", "an_object." + objectNameExtension });
And to store an object with this path:
Serializable x = new SomeSerializableClass();
ot.add(path, x);
(If you have classes that will often be stored in JObjectTree, it's convenient to have them create their own paths by implementing the Otable interface.)
List objects
To list all the objects in a context:
java.util.List objLocs = ot.listContext();
See Ot for other listing methods.
Retrieve objects
Serializable obj = ot.get(path);
Other operations
See Ot for other operations, such as testing for the existence of an object and deleting an object. See OtS3 for a couple of S3-specific operations. For synchronizing an S3 context with a file system context, see ot.Sync.
Re Security
SSL
By default, OtS3 communicates with S3 in the clear (unencrypted). To enable SSL, so that requests go over HTTPS, you need to reset a JetS3t property (JetS3t is the Java library used to communicate with S3). You can reset JetS3t properties with OtS3.setJetS3tProperty(). Use
ot.setJetS3tProperty("s3service.https-only", "true");
to enable SSL.
Implementing encryption
JObjectTree does not yet support encryption. You can implement your own encryption by creating a class that implements the Serializer interface and performs encryption. In JObjectTree, objects are written to S3 or files as byte arrays, and these byte arrays are created by the Serializer.objectToByteArray() method. When objects are retrieved, they are reconverted to objects by the Serializer.byteArrayToObject() method. For examples of how this is done, see the source code for StandardSerializer and GzipSerializer.
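As a rough illustration of the approach described above, the following sketch encrypts the serialized bytes with AES in GCM mode. The nested Serializer interface here is a hypothetical stand-in that declares only the two methods named above, with assumed signatures; the real JObjectTree interface may differ and may require additional methods. AesGcmSerializer and AesSerializerSketch are likewise illustrative names, and key management is deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class AesSerializerSketch {

    // Hypothetical stand-in for JObjectTree's Serializer interface.
    // The signatures are assumptions made for this sketch.
    public interface Serializer {
        byte[] objectToByteArray(Serializable obj) throws Exception;
        Object byteArrayToObject(byte[] data) throws Exception;
    }

    // Serializes with standard Java serialization, then encrypts with AES-GCM.
    // Stored layout: 12-byte random IV followed by ciphertext and auth tag.
    public static class AesGcmSerializer implements Serializer {
        private static final int IV_BYTES = 12;
        private static final int TAG_BITS = 128;
        private final SecretKey key;
        private final SecureRandom random = new SecureRandom();

        public AesGcmSerializer(SecretKey key) {
            this.key = key;
        }

        @Override
        public byte[] objectToByteArray(Serializable obj) throws Exception {
            ByteArrayOutputStream plain = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(plain)) {
                out.writeObject(obj);
            }
            // A fresh IV for every object; GCM must never reuse an IV with the same key.
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] encrypted = cipher.doFinal(plain.toByteArray());

            byte[] result = new byte[IV_BYTES + encrypted.length];
            System.arraycopy(iv, 0, result, 0, IV_BYTES);
            System.arraycopy(encrypted, 0, result, IV_BYTES, encrypted.length);
            return result;
        }

        @Override
        public Object byteArrayToObject(byte[] data) throws Exception {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            // The IV is read back from the first 12 bytes of the stored array.
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, data, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(data, IV_BYTES, data.length - IV_BYTES);
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(plain))) {
                return in.readObject();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // For the demo only: a throwaway key. A real application must store
        // the key somewhere safe, or the objects cannot be read back later.
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        Serializer serializer = new AesGcmSerializer(gen.generateKey());

        byte[] stored = serializer.objectToByteArray("secret");
        System.out.println(serializer.byteArrayToObject(stored));
    }
}
```

GCM also authenticates the data, so a stored object that has been tampered with fails to decrypt with an exception rather than deserializing into garbage.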
Caveat re S3
If you haven't used S3 before, there is a possible pitfall re locking, or rather, the lack thereof. S3 does not provide any means for client applications to lock an object. Furthermore, S3 makes extensive use of caches. This means that if you modify an object at time t0, a read at time t0 + delta could still get the old version. There is no way to determine when a write has been completed. And there is no way to determine how long it will take for a change to be propagated to caches.
Internally, S3 does prevent simultaneous writes - once a write command is received, the object's location is locked until the write is completed. Writes to a given address are fully processed in the order received. We do not have access to this locking mechanism.
serialVersionUID
JObjectTree uses the standard Java serialization mechanism and thus is subject to the standard serialVersionUID problem. For those who haven't used serialization: each Serializable class has a version number, the serialVersionUID. If you don't declare one, a default value is computed from the details of the class (its name, fields, methods, and so on), so the number can change whenever the class is modified, and it can differ between compilers. This means that if you serialize an object and then change and recompile its class, you may not be able to deserialize it because the serialized object's ID no longer matches the class's ID. The standard solution is to explicitly set the serialVersionUID of a Serializable class, e.g.
static final long serialVersionUID = 1;
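For context, here is a small self-contained sketch of a class that declares its serialVersionUID explicitly and survives a serialization round trip. The Point class and the roundTrip helper are hypothetical names used only for illustration; ObjectStreamClass is the standard JDK class that reports the ID written into the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class VersionedRoundTrip {

    // Hypothetical example class with an explicitly declared version number.
    public static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        public final int x;
        public final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Serialize to a byte array and read the object straight back.
    public static Point roundTrip(Point p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Point) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point copy = roundTrip(new Point(3, 4));
        System.out.println(copy.x + "," + copy.y);

        // The ID recorded in the stream is the declared one, not a computed default.
        long id = ObjectStreamClass.lookup(Point.class).getSerialVersionUID();
        System.out.println("serialVersionUID = " + id);
    }
}
```

With the ID pinned, compatible changes such as adding a field no longer invalidate previously stored objects; the new field simply takes its default value when an old object is read.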
Logging
JObjectTree itself does not perform any logging. However, JetS3t, which is used for communications with S3, uses Apache Commons Logging. By default, all log messages are written to stderr. To completely disable log messages, you can use ot.s3.NilLog. Add the line
System.setProperty("org.apache.commons.logging.Log", "ot.s3.NilLog");
at the beginning of your application. If you want to record log messages, you'll need to add a logging package that Apache Commons Logging can use. Log4J is the most common choice.
Acknowledgement
For all communications with S3, I rely on James Murty's admirable toolkit, JetS3t.