XQJ Tutorial Part V: Serializing Results



The XQuery 1.0 specification consists of multiple documents; one of them is XSLT 2.0 and XQuery 1.0 Serialization. Given a data model instance, that document defines how to serialize the instance into a sequence of octets. It also defines a number of parameters that influence the serialization process:

  • byte-order-mark
  • cdata-section-elements
  • doctype-public
  • doctype-system
  • encoding
  • escape-uri-attributes
  • include-content-type
  • indent
  • media-type
  • method
  • normalization-form
  • omit-xml-declaration
  • standalone
  • undeclare-prefixes
  • use-character-maps
  • version

You'll learn more about some of these serialization parameters later in this chapter.

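Note that serialization is an optional feature of XQuery 1.0, so the supported parameters depend on the implementation. (For the DataDirect XQuery® implementation, all parameters are documented in the product documentation.)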
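
Because the parameters are passed as plain strings, a misspelled name is easy to miss. The following is a minimal sketch, not part of XQJ, of a helper that builds the Properties object and rejects names that are not among the parameters defined by the serialization specification. The class name SerializationParams and its methods are hypothetical, and a real driver may accept additional vendor-specific names that this sketch would reject.

```java
import java.util.Properties;
import java.util.Set;

// Hypothetical helper (not part of XQJ): collects serialization
// parameters and rejects names the serialization specification
// does not define.
final class SerializationParams {
    private static final Set<String> KNOWN = Set.of(
        "byte-order-mark", "cdata-section-elements", "doctype-public",
        "doctype-system", "encoding", "escape-uri-attributes",
        "include-content-type", "indent", "media-type", "method",
        "normalization-form", "omit-xml-declaration", "standalone",
        "undeclare-prefixes", "use-character-maps", "version");

    private final Properties props = new Properties();

    SerializationParams set(String name, String value) {
        if (!KNOWN.contains(name)) {
            throw new IllegalArgumentException(
                "Unknown serialization parameter: " + name);
        }
        props.setProperty(name, value);
        return this;
    }

    // Returns a copy so later set() calls do not affect the caller.
    Properties toProperties() {
        Properties copy = new Properties();
        copy.putAll(props);
        return copy;
    }

    public static void main(String[] args) {
        Properties p = new SerializationParams()
            .set("method", "xml")
            .set("indent", "yes")
            .toProperties();
        System.out.println(p);
    }
}
```

The resulting Properties object can then be handed to any of the serialization methods shown in the rest of this chapter.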

Serializing Results to a File

The XQuery 1.0 specification provides guidelines on, among other topics, writing query results using XML syntax into a file (a typical use case for query result processing). Let's use a simple example to illustrate the process of serializing your query results to a file:

...
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"doc('orders.xml')/*/ORDERS[O_ORDERKEY = '39']");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
new Properties());
...

Note that the second argument of writeSequence() is an empty Properties object. You can also specify null. Both an empty Properties object and null imply that the XQJ driver uses the default values for each of the serialization parameters.
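
The snippet above never closes the FileOutputStream it creates. As a rough sketch of one way to handle that, the following uses try-with-resources around the write call. XQJ is not part of the JDK, so the ResultWriter interface here is a hypothetical stand-in for a call such as xqs.writeSequence(os, props); it also simplifies the exception type to IOException, which is an assumption made for the sketch.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

final class WriteToFileSketch {
    // Hypothetical stand-in for a call such as xqs.writeSequence(os, props).
    interface ResultWriter {
        void write(OutputStream os, Properties props) throws IOException;
    }

    static void writeTo(Path target, ResultWriter writer, Properties props)
            throws IOException {
        // null and an empty Properties object both mean "use the defaults";
        // the sketch normalizes null so the stand-in never sees it.
        Properties effective = (props == null) ? new Properties() : props;
        try (OutputStream os = Files.newOutputStream(target)) {
            writer.write(os, effective);
        } // the stream is closed here, even if write() throws
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("result", ".xml");
        writeTo(target,
            (os, props) -> os.write("<ORDERS/>".getBytes(StandardCharsets.UTF_8)),
            null);
        System.out.println(
            new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```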

You might get a result like this (assume this to be one line; we used new lines here for formatting considerations):

<ORDERS><O_ORDERKEY>39</O_ORDERKEY><O_CUSTKEY>
8177</O_CUSTKEY><O_ORDERSTATUS>O</O_ORDERSTATUS>
<O_TOTALPRICE>307811.89</O_TOTALPRICE><O_ORDERDATE>
1996-09-20T00:00:00</O_ORDERDATE><O_ORDERPRIORITY>3-MEDIUM
</O_ORDERPRIORITY><O_CLERK>Clerk#000000659</O_CLERK>
<O_SHIPPRIORITY>0</O_SHIPPRIORITY><O_COMMENT>furiously
unusual pinto beans above the furiously ironic asymptot
</O_COMMENT> </ORDERS>

Specifying Indenting and Encoding

That's not really readable, is it? Some indentation would help. It's also good practice to add the XML declaration and an encoding. Let's assume we want to encode the XML file as UTF-16:

...
Properties serializationProps = new java.util.Properties();
// make sure we output xml
serializationProps.setProperty("method", "xml");
// pretty printing
serializationProps.setProperty("indent", "yes");
// serialize as UTF-16
serializationProps.setProperty("encoding", "UTF-16");
// want an XML declaration
serializationProps.setProperty("omit-xml-declaration", "no");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"doc('orders.xml')/*/ORDERS[O_ORDERKEY = '39']");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
serializationProps);
...

The result looks much better now:

<?xml version="1.0" encoding="UTF-16"?>
<ORDERS>
<O_ORDERKEY>39</O_ORDERKEY>
<O_CUSTKEY>8177</O_CUSTKEY>
<O_ORDERSTATUS>O</O_ORDERSTATUS>
<O_TOTALPRICE>307811.89</O_TOTALPRICE>
<O_ORDERDATE>1996-09-20T00:00:00</O_ORDERDATE>
<O_ORDERPRIORITY>3-MEDIUM</O_ORDERPRIORITY>
<O_CLERK>Clerk#000000659</O_CLERK>
<O_SHIPPRIORITY>0</O_SHIPPRIORITY>
<O_COMMENT>furiously unusual pinto beans above the furiously ironic asymptot</O_COMMENT>
</ORDERS>
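
If you want to experiment with these four parameters without an XQJ driver at hand, the JDK's own JAXP Transformer accepts output properties with the same names (method, indent, encoding, omit-xml-declaration). The following sketch uses that API as a substitute; it is not XQJ, and the exact whitespace and XML declaration it produces may differ from what an XQJ driver writes.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class JaxpIndentSketch {
    static byte[] serialize(String xml) throws TransformerException {
        // An identity transformer copies its input unchanged and only
        // applies the output (serialization) properties set below.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws TransformerException {
        byte[] bytes = serialize(
            "<ORDERS><O_ORDERKEY>39</O_ORDERKEY>"
            + "<O_CUSTKEY>8177</O_CUSTKEY></ORDERS>");
        // Decode with the same encoding that was requested above.
        System.out.println(new String(bytes, StandardCharsets.UTF_16));
    }
}
```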

Handling Characters That Require Escaping

During serialization, characters are escaped as needed for the specified encoding. Suppose a query returns a document with a registered trademark character (®), with the specified encoding US-ASCII:

...
Properties serializationProps = new java.util.Properties();
serializationProps.setProperty("method", "xml");
serializationProps.setProperty("encoding", "ASCII");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"<product>DataDirect XQuery®</product>");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
serializationProps);
...

You'll get the following result (note that the ® character is serialized as a character reference because it is not defined in the ASCII character set):

<product>DataDirect XQuery&#xae;</product>
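
To make the rule concrete, here is a small sketch of the decision a serializer has to make for each character: if the target encoding can represent it, write it as is; otherwise write a hexadecimal character reference. This is an illustration built on java.nio.charset only, not the driver's actual implementation, and it deliberately ignores markup characters such as "<" and "&". The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class CharRefSketch {
    // Replaces every code point the charset cannot encode with a
    // hexadecimal character reference such as &#xae;
    static String escapeForEncoding(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                sb.append(ch);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00AE is the registered trademark character
        System.out.println(escapeForEncoding(
            "DataDirect XQuery\u00AE", StandardCharsets.US_ASCII));
    }
}
```

With UTF-8 as the target encoding the same call returns the text unchanged, because UTF-8 can represent every character.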

Using CDATA

In some use cases, the cdata-section-elements parameter is useful. Imagine that you're serializing some XML elements including ampersand characters. By default the "&" characters are escaped; using CDATA sections may be preferable to make the XML file more human readable:

...
Properties serializationProps = new java.util.Properties();
serializationProps.setProperty("method", "xml");
serializationProps.setProperty("cdata-section-elements", "product");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"<product>DataDirect XQuery &amp; XML Converters</product>");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
serializationProps);
...

The result is serialized as follows:

<product><![CDATA[DataDirect XQuery & XML Converters]]></product>
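
The difference between the two forms can be sketched with plain string handling. The following is an illustration only, not the driver's code: escapeText() shows the default escaping, and cdata() shows the CDATA form, including the one case that needs care. The sequence "]]>" would end a CDATA section early, so it has to be split across two sections. Both method names are hypothetical.

```java
final class CdataSketch {
    // Default behavior: markup characters in text are escaped.
    static String escapeText(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // CDATA form: the text is written literally. "]]>" would close the
    // section early, so it is split across two adjacent sections.
    static String cdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        String text = "DataDirect XQuery & XML Converters";
        System.out.println("<product>" + escapeText(text) + "</product>");
        System.out.println("<product>" + cdata(text) + "</product>");
    }
}
```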

Note that multiple elements can be specified through the cdata-section-elements parameter by separating them with a semicolon. If an element is in a namespace, add the namespace URI using James Clark notation, "{"+namespace URI+"}"localname:

...
Properties serializationProps = new java.util.Properties();
serializationProps.setProperty("method", "xml");
serializationProps.setProperty("encoding", "UTF-8");
serializationProps.setProperty("omit-xml-declaration", "no");
serializationProps.setProperty("cdata-section-elements",
"product;{uri}product");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"<e xmlns:p='uri'> " +
" <product>DataDirect XQuery &amp; XML Converters</product>" +
" <p:product>DataDirect XQuery &amp; XML Converters</p:product>" +
"</e>");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
serializationProps);
...

The result is the following:

<?xml version="1.0" encoding="UTF-8"?>
<e xmlns:p="uri">
<product><![CDATA[DataDirect XQuery & XML Converters]]></product>
<p:product><![CDATA[DataDirect XQuery & XML Converters]]></p:product>
</e>
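
The "{uri}localname" form is also what javax.xml.namespace.QName.valueOf() in the JDK accepts, which makes it easy to check such a parameter value before passing it to the driver. The following sketch splits a semicolon-separated value into QName objects. The class name and the decision to skip empty entries are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;

final class CdataElementNames {
    // Splits a value such as "product;{uri}product" into qualified names.
    static List<QName> parse(String value) {
        List<QName> names = new ArrayList<>();
        for (String part : value.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                // Accepts both "localname" and "{namespaceURI}localname".
                names.add(QName.valueOf(trimmed));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (QName name : parse("product;{uri}product")) {
            System.out.println("namespace='" + name.getNamespaceURI()
                + "' local='" + name.getLocalPart() + "'");
        }
    }
}
```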

Specifying HTML and XHTML Output

In addition to the XML output method, the XQuery serialization specification defines other output methods such as HTML and XHTML. Note that these serialization methods will not "magically" produce (X)HTML; it is still the query's responsibility to generate results that conform to the (X)HTML specifications. The serializer does, however, apply the (X)HTML rules when writing the results. For example, with the HTML method, <br> elements are serialized without a closing </br>.

Note, for example, the difference between the result.xml and result.html for the following code:

...
Properties serializationProps = new java.util.Properties();
XQPreparedExpression xqpe = xqc.createPreparedExpression(
"<html>line1<br/>line2</html>");
XQSequence xqs = xqpe.executeQuery();
serializationProps.setProperty("method", "xml");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
serializationProps);
// re-execute the query; xqs is already declared above
xqs = xqpe.executeQuery();
serializationProps.setProperty("method", "html");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.html"),
serializationProps);
...

result.xml is as follows:

<html>line1<br/>line2</html>

... while result.html looks like this:

<html>line1<br>line2</html>
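
The same contrast can be reproduced with the JDK alone, because the JAXP Transformer also supports a "method" output property with the values xml and html. The sketch below uses JAXP as a substitute for XQJ; whitespace and other details of its output are not guaranteed to match an XQJ driver, so only the treatment of the br element is of interest here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class JaxpMethodSketch {
    // Serializes the same input with the given output method.
    static String serialize(String xml, String method)
            throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, method);
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String input = "<html>line1<br/>line2</html>";
        System.out.println(serialize(input, "xml"));
        System.out.println(serialize(input, "html"));
    }
}
```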

Alternatives to Streaming Results

In all previous examples, we've serialized the query results to a FileOutputStream. An XQSequence can also be serialized to a java.io.Writer using the writeSequence() method, and getSequenceAsString() serializes to a java.lang.String.
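
One thing to watch when you serialize to a Writer: the bytes that finally reach the file are typically produced by the Writer, not by the serializer, so the Writer's character encoding should agree with the encoding parameter. A simple way to keep the two in sync is to derive the Writer from the same Properties object. The following sketch shows the idea; the openWriter() helper and the UTF-8 fallback are assumptions, not part of XQJ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.Properties;

final class WriterEncodingSketch {
    // Opens a Writer whose charset is taken from the "encoding"
    // serialization parameter, so the two cannot disagree.
    static Writer openWriter(OutputStream os, Properties props) {
        // Assumption: fall back to UTF-8 when no encoding is set.
        String encoding = props.getProperty("encoding", "UTF-8");
        return new OutputStreamWriter(os, Charset.forName(encoding));
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("encoding", "ISO-8859-1");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = openWriter(bytes, props)) {
            w.write("\u00AE"); // registered trademark character
        }
        System.out.println(bytes.size() + " byte(s) written");
    }
}
```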

Similar to serializing the complete XQSequence, there are methods to serialize the current individual item in the XQSequence. In the following example, the items in the query result are saved into distinct files — result1.xml, result2.xml, and so on.

...
Properties serializationProps = new java.util.Properties();
serializationProps.setProperty("method", "xml");
serializationProps.setProperty("indent", "yes");
serializationProps.setProperty("encoding", "UTF-8");
serializationProps.setProperty("omit-xml-declaration", "no");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery("doc('orders.xml')/*/ORDERS");
int i = 1;
while (xqs.next()) {
FileOutputStream file;
file = new FileOutputStream("/home/jimmy/result" +
i + ".xml");
xqs.writeItem(file, serializationProps);
file.close();
// advance the counter so each item gets its own file
i++;
}
...
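
The file-per-item pattern can be tried without a driver. In the following sketch a List of strings stands in for the items of an XQSequence, and a plain write() call stands in for writeItem(); both are assumptions made so that the example runs on the JDK alone. The sketch advances the counter after every item, so each item lands in its own file, and closes each stream with try-with-resources.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PerItemFilesSketch {
    // Writes each item to result1.xml, result2.xml, ... inside dir.
    static List<Path> writeItems(Path dir, List<String> items)
            throws IOException {
        List<Path> written = new ArrayList<>();
        int i = 1;
        for (String item : items) {
            Path file = dir.resolve("result" + i + ".xml");
            try (OutputStream os = Files.newOutputStream(file)) {
                // stand-in for xqs.writeItem(os, serializationProps)
                os.write(item.getBytes(StandardCharsets.UTF_8));
            }
            written.add(file);
            i++;
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("results");
        List<Path> files = writeItems(dir,
            Arrays.asList("<ORDERS>1</ORDERS>", "<ORDERS>2</ORDERS>"));
        files.forEach(f -> System.out.println(f.getFileName()));
    }
}
```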

Note that XML serialization doesn’t always result in a well-formed XML document. More precisely, it is either a well-formed XML document or a well-formed XML external general parsed entity. See the serialization specification for more information on this topic.