XML-XIG: XML Instance Generator

White Paper

Copyright P. Simon Tuffs, 2004. All Rights Reserved. Version 0.1

1 XML-XIG: XML Instance Generator (pronounced XML-ZIG)

by P. Simon Tuffs, Software Architect and Consultant
Preliminary Planning Document
August 2004

1.1 Introduction

Within the past few years XML has established itself as the lingua franca of enterprise computing. Its uses range from data interchange to persistence formats for structured data. XML documents are increasingly tied to schemas (grammars), which constrain the content of a document to certain limited forms in order to make generation and consumption more predictable and less error-prone.

When developing software which produces or consumes XML, it is often necessary to create a set of documents which are valid with respect to a particular schema. There are a number of ways to do this, with varying degrees of sophistication:

  1. Hand edit a set of text XML files.
  2. Write a program that generates text XML files using print statements.
  3. Create a data-binding from a schema, and write a program that creates and renders instances from that binding.
  4. Dynamically introspect the schema, and create instance documents based on meta-data.

Of these, approaches 1-3 suffer from a relatively serious problem: if the schema for the documents changes, then it is likely that the generated documents themselves will be incorrect. In the case of option 1, widespread changes to multiple text files will be required. With option 2, changes to the program will be required, with no guarantee that the resulting XML data is valid with respect to the modified schema. With option 3 the generating program will probably fail to compile against the new data-binding, but once debugged the generated XML document instances should be correct.

Option 4 is the approach taken in the XML-XIG (pronounced XML-ZIG) project. By dynamically introspecting the schema document, we ensure that the generated XML documents are always valid. By combining this introspection with a simple meta-data format, rich content documents can be generated which will remain valid regardless of the evolution in the schema.

1.2 An Example

Let's consider a simple library catalog which holds books by title and author. Here are three valid XML instance documents:

The most common schema language for XML documents is the W3C XML-Schema specification. One possible (and very simple) schema for this set of documents is shown below:

It should not be too controversial to state that, even in this very simple case, it isn't obvious from the XML schema what the structure of valid instance documents should be. And given an even moderately complex XML schema, generating a set of valid test data can be a complicated and tedious task.

This is where XIG comes in. When presented with an XML schema document, XIG first produces an outline XML document which shows the structure of the XML documents which are valid within the constraints of the schema. Here is the XIG output from our simple schema above:

The generated XIG document consists of two parts: an outline document (in this case a <book>) and an empty instance generation section <xig:generate>. Notice that XIG has reconstituted a valid <book> document from the schema, and that it has indicated the type of the content inside the <title> and <author> elements as xs:string, again directly from the schema document.
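As a rough illustration of how an outline can be derived from a schema, the following Python sketch walks a tiny XML Schema for a <book> and builds an outline whose leaf elements hold ${type} placeholders. This is not the XIG implementation (which is based on XMLBeans). The schema text, the function names outline and outline_for, and the restriction to a single global element with an inline xs:sequence are all assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Assumed schema for the library example; the paper's own schema may differ.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def outline(decl):
    """Build an outline element from one xs:element declaration."""
    node = ET.Element(decl.get("name"))
    declared_type = decl.get("type")
    if declared_type is not None:
        # Leaf with a named simple type: xs:string becomes ${string}.
        node.text = "${" + declared_type.split(":")[-1] + "}"
        return node
    # Inline complex type: recurse into the children of its xs:sequence.
    path = f"{{{XS}}}complexType/{{{XS}}}sequence/{{{XS}}}element"
    for child in decl.findall(path):
        node.append(outline(child))
    return node

def outline_for(schema_text, root_name):
    """Find the global element called root_name and outline it."""
    schema = ET.fromstring(schema_text)
    for decl in schema.findall(f"{{{XS}}}element"):
        if decl.get("name") == root_name:
            return outline(decl)
    raise ValueError(f"no global element named {root_name!r}")

book = outline_for(SCHEMA, "book")
print(ET.tostring(book, encoding="unicode"))
```

A real schema would also need handling for named complex types, attributes, choices and occurrence constraints, which this sketch deliberately ignores.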

The special syntax ${} (inspired by the Apache Ant project) indicates that variable substitution will take place from supplied meta-data (more on this in a moment).

When XIG is run in diagnostic mode with this default <generate> meta-data, the output is as follows:

A total of ten <book> instance documents are created because of the count='10' attribute to the loop directive. Why are the <title> and <author> fields null? Because there was no meta-data supplied for the variable substitution ${string}.
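The behaviour described above can be sketched in a few lines of Python. The outline string, the function names substitute and generate, and the choice to emit the literal text "null" for a missing variable are assumptions made for illustration, not details taken from the XIG prototype.

```python
import re

# Matches ${name} and captures the name between the braces.
PLACEHOLDER = re.compile(r"\$\{([^}]*)\}")

# Assumed outline for the <book> example.
OUTLINE = "<book><title>${string}</title><author>${string}</author></book>"

def substitute(template, metadata):
    """Replace each ${name} with metadata[name], or 'null' when absent."""
    return PLACEHOLDER.sub(
        lambda match: str(metadata.get(match.group(1), "null")), template
    )

def generate(template, count, metadata):
    """Produce `count` instance documents from one outline."""
    return [substitute(template, metadata) for _ in range(count)]

# No meta-data supplied, so every ${string} falls back to 'null'.
docs = generate(OUTLINE, 10, {})
print(len(docs))
print(docs[0])
```

Passing a non-empty dictionary, for example {"string": "some text"}, fills every ${string} placeholder with the same value.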

Meta-data is supplied to XIG by populating the <generate> section of the .xig file with parameters as shown below:

Now the output becomes a little more interesting, and potentially much more useful for XML application testing:

Wouldn't it be nice if we could put different information into each instance document, and into different elements of the instance documents? As you might expect, XIG supports this, and it is here that the real power of this approach starts to become apparent.

If the meta-data syntax looks familiar, it should: parameter names are specified using XPath notation ("//" indicates that any element path which precedes the final name is valid). This has been augmented with some XIG-specific notation, which allows substitution of document instance information into a parameter value using ${/$instance}. Since "$" is not valid in an XPath element name, the syntax /$instance can be recognized by XIG as a special substitution, in this case the document instance number.

Why /$instance? When elements repeat in the schema, each repetition level is assigned a named instance variable. For example, if a book could have more than one author, there would be an author/$instance variable containing the instance number of the iterator that generates the authors.
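The parameter matching and instance substitution described above can be sketched as follows. The parameter values, the function names matches and value_for, the 1-based instance numbering, and the "null" fallback are assumptions for illustration. Only the "//" prefix and the ${/$instance} notation come from the text.

```python
def matches(param_name, element_path):
    """True when a '//name' or absolute '/a/b' parameter applies to a path."""
    if param_name.startswith("//"):
        # '//title' applies to any path whose final step is 'title'.
        return element_path.endswith("/" + param_name[2:])
    return param_name == element_path

def value_for(element_path, params, instance):
    """Look up the value for an element, expanding ${/$instance}."""
    for name, value in params.items():
        if matches(name, element_path):
            return value.replace("${/$instance}", str(instance))
    return "null"  # no parameter supplied for this element

# Hypothetical meta-data: one parameter per element name.
params = {
    "//title": "Title ${/$instance}",
    "//author": "Author ${/$instance}",
}

for i in range(1, 4):
    title = value_for("/book/title", params, i)
    author = value_for("/book/author", params, i)
    print(f"<book><title>{title}</title><author>{author}</author></book>")
```

Each generated document then carries its own instance number in both fields, so the three documents differ from one another.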

The XIG meta-data facility is simple, yet can be used to generate a wide range of data from a small number of parameters. It is this leverage which makes XIG into a powerful tool for XML document instance generation.

Consider a final extension to this example. We augment the schema to allow multiple authors for each book, and wish to generate different data for each author field, indexing the author-names by their position in the list. To generate three authors for each book, we specify author/$instances=3, and to expand the instance number for each author, we specify //author="Author Name ${author/$instance}". The $instance variable is expanded for each of the values in the list of authors as follows:
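A minimal Python sketch of this expansion is shown below. The settings author/$instances=3 and //author="Author Name ${author/$instance}" are taken from the text. The //title value, the dictionary representation, the function name make_book, and the 1-based numbering are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Meta-data for the multi-author example, held in a plain dictionary.
settings = {
    "//title": "Title ${/$instance}",  # assumed value
    "author/$instances": "3",
    "//author": "Author Name ${author/$instance}",
}

def make_book(doc_instance, settings):
    """Build one <book> with as many <author> children as requested."""
    book = ET.Element("book")
    title = settings["//title"].replace("${/$instance}", str(doc_instance))
    ET.SubElement(book, "title").text = title
    # One iteration per author; the loop index plays the role of
    # the author/$instance variable.
    for author_instance in range(1, int(settings["author/$instances"]) + 1):
        name = settings["//author"].replace(
            "${author/$instance}", str(author_instance)
        )
        ET.SubElement(book, "author").text = name
    return book

book = make_book(1, settings)
print(ET.tostring(book, encoding="unicode"))
```

The document-level variable /$instance and the author-level variable author/$instance are expanded independently, which is what allows each repetition level to be numbered on its own.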

1.3 Taking a Closer Look

At this stage you are probably interested in seeing some code. A very early prototype has been published at SourceForge, where XML-XIG 0.1 can be downloaded.

This JAR file contains an executable prototype of the XIG concept, based on an XMLBeans implementation. It can be invoked as follows:

Where does the examples/book directory come from? It magically appears when the One-JAR JAR file is executed. This of course means that you should be careful where you run the JAR file: it will create two sub-directories at that location: examples and doc. examples/book contains the schema and XIG files for this document, and doc contains a copy of the XIG license.

Note that when you download the file, it will have a version number (e.g. xml-xig-0.1.jar) encoded into its name: please adapt the following command-line examples accordingly. The arguments to the XIG executable are the schema file, the name of the root element to be instantiated, and the XIG meta-data file:

This concludes the introduction to XIG. Next we will discuss the implementation details of the current prototype software, and then outline what needs to be done to take this early-stage prototype to a production quality Open Source product.

1.4 Status of this Document

This document is an early-stage planning document and is subject to change without notice. Its role is to facilitate a discussion of the approach to XML instance document generation which is proposed for the XIG project.

The XIG software archive, available through SourceForge, also represents this early-stage specification, and should be used only to get an idea of what is involved in building a tool like XIG, and as a starting point for ongoing discussions about the project. It should not be considered ready for production use.
