XML-XIG: XML Instance Generator
Copyright P. Simon Tuffs, 2004. All Rights Reserved. Version 0.1
by P. Simon Tuffs, Software Architect and Consultant (www.simontuffs.com).
Preliminary Planning Document, August 2004
When developing software which produces or consumes XML, it is often necessary to create a set of documents which are valid with respect to a particular schema. There are a number of ways to do this, with varying degrees of sophistication:
Option 4 is the approach taken in the XML-XIG (pronounced XML-ZIG) project. By dynamically introspecting the schema document, we ensure that the generated XML documents are always valid. By combining this introspection with a simple meta-data format, rich content documents can be generated which will remain valid regardless of the evolution in the schema.
<book>
<title>Life, the Universe, and Everything</title>
<author>Douglas N. Adams</author>
</book>
<book>
<title>A Hitchikers Guide To The Universe</title>
<author>Douglas N. Adams</author>
</book>
<book>
<title>So Long and Thanks for all the Fish</title>
<author>Douglas N. Adams</author>
</book>
The most common schema language for XML documents is the W3C XML-Schema specification. One possible (and very simple) schema for this set of documents is shown below:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:element name="author" type="xs:string"/>
<xs:element name="book">
<xs:complexType>
<xs:sequence>
<xs:element ref="title"/>
<xs:element ref="author"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="title" type="xs:string"/>
</xs:schema>
Now it should not be too controversial to state that, even in this very simple case, it isn't obvious from the XML schema
what the structure of valid instance documents should be. And given an even moderately complex
XML schema, generating a set of valid test data can be a complicated and tedious task.
This is where XIG comes in. When presented with an XML schema document, XIG first produces an outline XML document which shows the structure of the XML documents which are valid within the constraints of the schema. Here is the XIG output from our simple schema above:
<xig:template document='book' schema='doc/book.xsd' xmlns:xig='http://xml-xig.sourceforge.net/schema'>
<book>
<title>${xs:string}</title>
<author>${xs:string}</author>
</book>
<xig:generate>
<!-- Generate instance documents from template document above -->
<loop count='10'>
</loop>
</xig:generate>
</xig:template>
The generated XIG document consists of two parts: an outline document (in this case a <book>) and an
empty instance generation section <xig:generate>. Notice that XIG has reconstituted a valid <book>
document from the schema, and that it has indicated the type of the content inside the <title>
and <author> elements as xs:string, again directly from the schema document.
The special syntax ${} (inspired by the Apache Ant project) indicates
that variable substitution will take place from supplied meta-data (more on this in a moment).
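The outline step described above can be approximated in a few lines. The following is a minimal sketch only, not the XIG implementation (which is based on XMLBeans): it uses Python's standard ElementTree module, and the function name outline is hypothetical. It assumes a schema with no target namespace and handles only global element declarations, xs:sequence, ref= references and simple type= declarations. A recursive schema would need a depth limit.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# The book schema from the text above.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:element name="author" type="xs:string"/>
<xs:element name="book">
<xs:complexType>
<xs:sequence>
<xs:element ref="title"/>
<xs:element ref="author"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="title" type="xs:string"/>
</xs:schema>"""


def outline(schema_text, root_name):
    """Return the outline document for root_name as a list of lines.

    Hypothetical helper: supports only global elements, xs:sequence,
    ref= references and simple type= declarations.
    """
    schema = ET.fromstring(schema_text)
    # Index the global element declarations by name so refs can be resolved.
    global_decls = {e.get("name"): e for e in schema.findall(f"{{{XS}}}element")}

    def expand(decl, depth):
        name, pad = decl.get("name"), "  " * depth
        if decl.get("type") is not None:
            # Simple content: emit a typed placeholder such as ${xs:string}.
            return [f"{pad}<{name}>${{{decl.get('type')}}}</{name}>"]
        lines = [f"{pad}<{name}>"]
        path = f"{{{XS}}}complexType/{{{XS}}}sequence/{{{XS}}}element"
        for child in decl.findall(path):
            # Follow ref= to the global declaration, else use the inline one.
            target = global_decls[child.get("ref")] if child.get("ref") else child
            lines.extend(expand(target, depth + 1))
        lines.append(f"{pad}</{name}>")
        return lines

    return expand(global_decls[root_name], 0)


print("\n".join(outline(SCHEMA, "book")))
```

The placeholders come straight from the type attribute of each declaration, which is why the outline stays in step with the schema when the schema changes.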
When XIG is run in diagnostic mode with this default <generate>
meta-data, the output is as follows:
---------------------------------------------------
XIG: Instantiating...
<book>
<title>null</title>
<author>null</author>
</book>
<book>
<title>null</title>
<author>null</author>
</book>
<book>
... etc.
A total of ten <book> instance documents are created because of the
count='10' attribute to the loop directive.
Why are the <title> and <author> fields null? Because no
meta-data was supplied for the variable substitution ${xs:string}.
Meta-data is supplied to XIG by populating the <generate>
section of the .xig file with parameters as shown below:
<xig:generate>
<!-- Generate instance documents from template document above -->
<loop count="10" root="book">
<parameter name="string" value="This is a string"/>
</loop>
</xig:generate>
Now the output becomes a little more interesting, and potentially much more useful for XML application testing:
---------------------------------------------------
XIG: Instantiating...
<book>
<title>This is a string</title>
<author>This is a string</author>
</book>
<book>
<title>This is a string</title>
<author>This is a string</author>
</book>
<book>
Wouldn't it be nice if we could put different information into each instance document, and into different elements of the instance documents? As you might expect, XIG supports this, and it is here that the real power of this approach starts to become apparent.
<loop count="10" root="book">
<parameter name="string" value="This is a string"/>
<parameter name="//title" value="This is title ${/$instance}"/>
<parameter name="//author" value="Author Name ${/$instance}"/>
</loop>
---------------------------------------------------
XIG: Instantiating...
<book>
<title>This is title 1</title>
<author>Author Name 1</author>
</book>
<book>
<title>This is title 2</title>
<author>Author Name 2</author>
</book>
<book>
If the meta-data syntax looks familiar, it should: parameter names are specified using
XPath notation ("//" indicates that any element path which precedes the final name is valid).
This has been augmented with some XIG-specific notation, which allows substitution
of document instance information into a parameter value using ${/$instance}.
Since "$" is not a valid character in an XML element name, the syntax /$instance
can be recognized by XIG as a special substitution, in this case the document instance number.
Why /$instance? When the schema contains repeating elements, each
repetition level is assigned a named instance variable. For example, if a book could have more
than one author, there would be an author/$instance variable, which would
contain the instance number of the iterator that generates the authors.
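To make the lookup rules concrete, here is a small sketch of how parameter resolution could work for the book example. It is an illustration under stated assumptions, not XIG's actual algorithm: the function names resolve and instantiate are hypothetical, the mapping from the ${xs:string} placeholder to the parameter named "string" is assumed, and so is the rule that an element-specific "//name" parameter takes precedence over the type-level one (this matches the output shown above, but the source does not state it).

```python
def resolve(element, xs_type, params, instance):
    """Pick the text for one element of one instance document.

    Assumed precedence: a '//element' parameter wins over the type-level
    parameter (e.g. 'string'); with no match the result is 'null'.
    """
    type_key = xs_type.split(":")[-1]  # 'xs:string' -> 'string' (assumed mapping)
    value = params.get("//" + element, params.get(type_key))
    if value is None:
        return "null"
    # Expand the document-level instance counter.
    return value.replace("${/$instance}", str(instance))


def instantiate(count, params):
    """Generate `count` <book> documents; instance numbers start at 1."""
    docs = []
    for i in range(1, count + 1):
        docs.append(
            "<book>\n"
            f"<title>{resolve('title', 'xs:string', params, i)}</title>\n"
            f"<author>{resolve('author', 'xs:string', params, i)}</author>\n"
            "</book>"
        )
    return docs


PARAMS = {
    "string": "This is a string",
    "//title": "This is title ${/$instance}",
    "//author": "Author Name ${/$instance}",
}

print("\n".join(instantiate(2, PARAMS)))
```

Calling instantiate(1, {}) reproduces the null output of the default template, and passing only the "string" parameter reproduces the "This is a string" case.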
The XIG meta-data facility is simple, yet can be used to generate a wide range of data from a small number of parameters. It is this leverage which makes XIG into a powerful tool for XML document instance generation.
Consider a final
extension to this example. We augment the schema to allow multiple authors for each book,
and wish to generate different data for each author field, indexing the author-names
by their position in the list. To generate three authors for each book, we specify
author/$instances=3, and to expand the instance number for each author,
we specify //author="Author Name ${author/$instance}". The $instance
variable is expanded for each of the values in the list of authors as follows:
<loop count="10" root="book">
<parameter name="string" value="This is a string"/>
<parameter name="//title" value="This is title ${/$instance}"/>
<parameter name="//author" value="Author Name ${/$instance}"/>
<!-- Generate multiple authors -->
<parameter name="author/$instances" value="3"/>
<parameter name="//author" value="Author Name ${author/$instance}"/>
</loop>
---------------------------------------------------
XIG: Instantiating...
<book>
<title>This is title 1</title>
<author>Author Name 1</author>
<author>Author Name 2</author>
<author>Author Name 3</author>
</book>
<book>
<title>This is title 2</title>
<author>Author Name 1</author>
<author>Author Name 2</author>
<author>Author Name 3</author>
</book>
<book>
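The nested iteration behind this output can be sketched as two loops: one over documents and one over the repeated element. This is a hedged illustration for the book example only. The function name generate_books is hypothetical, and treating a missing author/$instances parameter as a single author is an assumption.

```python
def generate_books(count, params):
    """Generate `count` books, each with author/$instances <author> elements."""
    # Assumption: without an 'author/$instances' parameter, one author is emitted.
    authors_per_book = int(params.get("author/$instances", "1"))
    books = []
    for doc in range(1, count + 1):  # document-level counter: ${/$instance}
        lines = ["<book>"]
        title = params["//title"].replace("${/$instance}", str(doc))
        lines.append(f"<title>{title}</title>")
        for n in range(1, authors_per_book + 1):  # per-author counter: ${author/$instance}
            author = (
                params["//author"]
                .replace("${author/$instance}", str(n))
                .replace("${/$instance}", str(doc))
            )
            lines.append(f"<author>{author}</author>")
        lines.append("</book>")
        books.append("\n".join(lines))
    return books


BOOK_PARAMS = {
    "//title": "This is title ${/$instance}",
    "author/$instances": "3",
    "//author": "Author Name ${author/$instance}",
}

print("\n".join(generate_books(2, BOOK_PARAMS)))
```

Each repetition level keeps its own counter, so the author numbering restarts at 1 for every book while the title number follows the document instance.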
This JAR file contains an executable prototype of the XIG concept, based on an XMLBeans implementation. It can be invoked as follows:
java -jar xig.jar examples/book/book-1.xsd book examples/book/book.xig
<xig:template document='book' schema='examples/book/book-1.xsd' xmlns:xds='http://xml-xsd.sourceforge.net/schema/XmlXsd-0.1'>
<book>
<title>${xs:string}</title>
<author>${xs:string}</author>
</book>
<xig:generate>
<!-- Generate instance documents from template document above -->
<loop count='10'>
</loop>
</xig:generate>
</xig:template>
---------------------------------------------------
XIG: Instantiating...
<book>
<title>This is title 1</title>
<author>Author Name 1</author>
<author>Author Name 2</author>
<author>Author Name 3</author>
</book>
<book>
<title>This is title 2</title>
<author>Author Name 1</author>
<author>Author Name 2</author>
<author>Author Name 3</author>
</book>
<book>
<title>This is title 3</title>
Where does the examples/book directory come from? It magically appears
when the One-JAR JAR file is executed.
This of course means that you should be careful where you run the JAR file: it will create
two sub-directories at that location: examples and doc.
examples/book contains the schema and XIG files for this document, and
doc contains a copy of the XIG license.
Note that when you download the file, it will have a version number (e.g. xml-xig-0.1.jar)
encoded into its name: please adapt the following command line examples accordingly.
The arguments to the XIG executable are the schema file, the name of the root
element to be instantiated, and the XIG meta-data file:
$ java -jar xig.jar
XIG usage: java -jar xig.jar [schema.xsd] [root-element] [xig-file.xig]
This concludes the introduction to XIG. Next we will discuss the implementation details of the current prototype software, and then outline what needs to be done to take this early-stage prototype to a production quality Open Source product.
The XIG software archive, available through SourceForge http://sourceforge.net/projects/xml-xig, also represents this early-stage specification, and should be used only to get an idea of what is involved in building a tool like XIG, and as a starting point for ongoing discussions about the project. It should not be considered ready for production use.