Home Services Partners Company

Querying XML Columns

Introduction

Increasingly, XML is stored in relational databases as structured XML. Some applications will continue to shred XML into a relational format or store XML as CLOB values. Other applications can now take advantage of relational databases that support a native XML data type to represent an XML document or XML document fragment. This paper shows how you can use DataDirect XQuery® to query XML stored as character data and query XML stored as XML.

Querying XML Stored As Character Data

When XML is stored as character data, it must be parsed before it can be queried as XML. You can parse XML stored as character data using Java external functions.

Querying XML Stored as XML

The SQL standard is evolving to allow XQuery to be used directly in a SQL query, providing a convenient way to query XML, but many relational databases that are currently shipping do not support this ability or implement it with limited functionality. Because DataDirect XQuery® uses SQL to query relational databases, this means that DataDirect XQuery® cannot directly evaluate XQuery expressions against XML stored in a column of a relational database.

Fortunately, you can always return an entire column that contains XML, and workarounds exist that allow you to write almost any query you need. Unfortunately, these workarounds force you to choose between database independence and performance.

Future versions of DataDirect XQuery® will solve this problem. In the meantime, you can execute XQuery queries against XML columns using one of the following three approaches:

  1. Return an entire XML column.
  2. Evaluate part of the query in memory. Although DataDirect XQuery® cannot evaluate path expressions against XML in the database, you can force DataDirect XQuery® to evaluate an expression against an XML column in memory. This approach is portable and allows the full power of XQuery, but it often is slower than the third approach because it does not leverage indexes and consumes more memory.
  3. Use database vendor-specific SQL functions in an XQuery query to return XML stored in the database. This approach is not portable and is limited to the functions provided by a given vendor. It is generally faster than the second approach because the vendor usually takes advantage of indexes that support these functions and because it does not require the entire XML column to be instantiated in memory.

Sample Data

In the remainder of this article, we show queries that use each of the preceding approaches. These queries operate on the following table named HOLDINGSXML. This table contains a column named USERID and a column named XMLCOL. For examples showing how to query XML stored as character data, assume that XMLCOL is a CLOB-typed column. For the rest of our examples, which show how to query XML stored in an XML column, assume that XMLCOL is an XML-typed column. 

USERID
------
XMLCOL
------
Jonathan 
<HOLDINGS>
<SHARE COMPANY="Amazon.com, Inc."
USERID="Jonathan">3000</SHARE>
<SHARE COMPANY="eBay Inc." USERID="Jonathan">4000</SHARE>
<SHARE COMPANY="Int'l Business Machines C"
USERID="Jonathan">2500</SHARE>
<SHARE COMPANY="Progress Software"
USERID="Jonathan">23</SHARE>
</HOLDINGS>
Minollo 
<HOLDINGS>
<SHARE COMPANY="Amazon.com, Inc."
USERID="Minollo">3000</SHARE>
<SHARE COMPANY="eBay Inc." USERID="Minollo">4000</SHARE>
<SHARE COMPANY="Lucent Technologies, Inc."
USERID="Minollo">40000</SHARE>
<SHARE COMPANY="Progress Software"
USERID="Minollo">4000000</SHARE>
</HOLDINGS>
... 
... 


Parsing XML Stored as Character Data

If your database does not support the XML type and you store XML documents as character data, you must parse the XML before it can be queried. This can be done with a Java external function called from within an XQuery query. The following Java external function creates a DOM tree by parsing its input:

public static Document txt2xml(String txt) {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
try {
builder = factory.newDocumentBuilder();
}
catch (ParserConfigurationException e) {
e.printStackTrace();
return null;
}
Document doc = null;
try {
doc = builder.parse(new InputSource(new StringReader(txt)));
}
catch (SAXException e1) {
e1.printStackTrace();
return null;
}
catch (IOException e1) {
e1.printStackTrace();
}
return doc;
}

Using DataDirect XQuery®, you can call this function by declaring it in the prolog and using it in a query. For example:

declare namespace p='ddtekjava:txt2xml';
declare function p:txt2xml($inp as xs:string) as document-node() external;
for $row in collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL
return
p:txt2xml($row/XMLCOL)/HOLDINGS/SHARE[@COMPANY='Amazon.com, Inc.']

Returning an Entire XML Column

Using DataDirect XQuery®, it's straightforward to return an entire XML column because it does not require evaluation of the contents of the column. For the HOLDINGSXML table, the following query returns a column:

collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL

The result looks like this:

<XMLCOL>
    <HOLDINGS>
     <SHARE COMPANY="Amazon.com, Inc."
USERID="Jonathan">3000</SHARE>
     <SHARE COMPANY="eBay Inc." USERID="Jonathan">4000</SHARE>
     <SHARE COMPANY="Int'l Business Machines C"
USERID="Jonathan">2500</SHARE>
     <SHARE COMPANY="Progress Software"
USERID="Jonathan">23</SHARE>
    </HOLDINGS>
</XMLCOL>
<XMLCOL>
    <HOLDINGS>
     <SHARE COMPANY="Amazon.com, Inc."
USERID="Minollo">3000</SHARE>
     <SHARE COMPANY="eBay Inc." USERID="Minollo">4000</SHARE>
     <SHARE COMPANY="Lucent Technologies Inc."
USERID="Minollo">40000</SHARE>
     <SHARE COMPANY="Progress Software"
USERID="Minollo">4000000</SHARE>
    </HOLDINGS>
</XMLCOL>

In contrast, any query that evaluates path expressions against the contents of an XML column fails. For example, the following query fails and raises an error:

collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL/HOLDINGS

Because the // operator requires all descendant nodes to be evaluated, this operation fails if applied to an XML column. For example, the following query fails and raises an error:

collection('HOLDINGSXML')//HOLDINGSXML

Evaluating XML in Memory

DataDirect XQuery® supports an extension expression named evaluate in memory that forces an expression to be evaluated in memory using XQuery directly, not using SQL. This is a simple way to use XQuery functionality that is not available in SQL.

For example, the following query fails and raises an error because DataDirect XQuery® cannot evaluate path expressions against XML in the database:

for $h in collection('HOLDINGSXML')/HOLDINGSXML
where $h/USERID eq 'Minollo'
return
$h/HOLDINGS[@COMPANY = 'Progress Software']

If we rewrite the query, using the evaluate-in-memory extension expression to force the XML to be evaluated in memory, the query succeeds:

for $h in collection('HOLDINGSXML')/HOLDINGSXML
let $holdings := (# ddtek:evaluate-in-memory #) {$h/XMLCOL}
where $h/USERID eq 'Minollo'
return
$holdings/SHARE[xs:string(@COMPANY) eq 'Progress
Software']

The result looks like this:

<HOLDINGS>
<SHARE COMPANY="Progress Software"
USERID="Minollo">4000000</SHARE>
</HOLDINGS>

Using Database Vendor-Specific SQL Functions to Query XML

DataDirect XQuery® can call any SQL function as an external function. DB2 and Oracle provide SQL functions that allow you to return parts of an XML value stored in the database using path expressions similar to XPath 1.0.

For example, the DB2 extractCLOB function accepts two parameters; the first parameter is the XML to be queried, and the second parameter is a path expression to be evaluated against the XML.

To call an external function, you must first declare it in the query prolog:

declare function ddtek-sql:extractCLOB(
$inp as node(), $xp as xs:string) as node() external;

Once the function is declared, it can be called in a query:

for $h in collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL
return
ddtek-sql:extractCLOB(
$h/node(),
"/HOLDINGS")

In the preceding query, note that the first parameter is a path expression that points to the XML to be queried. The second parameter is a string containing the path expression to be used.

For more information about declaring and invoking SQL functions, refer to the DataDirect XQuery® User's Guide and Reference.

Using DB2 SQL Functions

The following examples show how to use some common DB2 SQL functions to query XML stored in the database.

When the DB2 function extractCLOB returns an XML value, the result can only be returned without further processing as part of the result of the complete XQuery expression. DataDirect XQuery® allows the following exception: the result of an extract function can be used as the content expression of an element constructor that is returned as-is from the XQuery expression. You can use the evaluate-in-memory extension expression to work around this limitation. This extension expression forces DataDirect XQuery® to evaluate the XQuery expression in memory.

In addition, DataDirect XQuery® does not support DB2 functions that return sequences of values, including extractIntegers, extractCLOBs, and so on.

NOTE: DB2 XML Extender functionality is not enabled by default; it must be explicitly enabled. In addition, the user ID executing DB2 SQL functions must have the correct permissions to use XML Extender functionality. Refer to your DB2 documentation for more information about enabling XML Extender functionality and setting permissions.

extractCLOB Function

This example returns the value of all HOLDINGS elements.

declare function ddtek-sql:DB2XML.extractCLOB(
$inp as node(), $xp as xs:string) as node() external;
for $h in collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL
return
ddtek-sql:DB2XML.extractCLOB(
$h/node(),
"/HOLDINGS")
Using the evaluate-in-memory Extension Expression with the extractCLOB Function

In this example, notice how the argument of fn:node-name is the result of a SQL extractCLOB function that normally would have to be returned as-is from the expression. In this case, you must use the evaluate-in-memory extension expression to force DataDirect XQuery® to evaluate the XQuery expression in memory; otherwise, this query fails and raises an error.

declare function ddtek-sql:DB2XML.extractCLOB(
$inp as node(), $xp as xs:string) as node() external;
for $h in collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL
let $holdings := ddtek-sql:DB2XML.extractCLOB($h/node(),
"/HOLDINGS")
let $holdings2 := (#ddtek:evaluate-in-memory#) {$holdings}
for $n in $holdings2//(*|@*)
return
fn:node-name($n)
Using Oracle SQL Functions

Now let's look at using Oracle SQL functions in the same way.

Similar to the DB2 extractCLOB function, when the Oracle function extract returns an XML value, the result can only be returned without further processing as part of the result of the complete XQuery expression. DataDirect XQuery® allows the following exception: the result of an extract function can be used as the content expression of an element constructor that is returned as-is from the XQuery expression. You can use the evaluate in-memory extension expression to force DataDirect XQuery® to evaluate the XML in memory.

Using our sample data, the following examples show how to use some common Oracle SQL functions to query XML stored in the database.

extract Function

This example returns all SHARE elements.

declare function ddtek-sql:extract(
$inp as node(), $xp as xs:string) as node() external;
for $h in collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL
return
<extract path='/SHARE'>{
ddtek-sql:extract(
$h/node(),
"/HOLDINGS/SHARE")
}</extract>
extractValue Function

This example returns the value of all SHARE elements for which the COMPANY attribute is set to "Progress Software".

declare function ddtek-sql:extractValue(
$inp as node(), $xp as xs:string) as xs:string external;
for $h in collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL
return
ddtek-sql:extractValue(
$h/node(),
"/HOLDINGS/SHARE[@COMPANY = 'Progress Software']")
Using the evaluate-in-memory Extension Expression with the extract Function

In this example, notice how the argument of the fn:node-name is the result of a SQL extract function that normally would have to be returned as-is from the expression. In this case, you must use the evaluate-in-memory extension expression to force DataDirect XQuery® to evaluate the XML in memory; otherwise, this query fails and raises an error.

declare function ddtek-sql:extract(
$inp as node(), $xp as xs:string) as node() external;
for $h in collection('HOLDINGSXML')/HOLDINGSXML/XMLCOL
let $SHARE := ddtek-sql:extract($h/node(),"//SHARE")
return
(# ddtek:evaluate-in-memory #) {fn:node-name($SHARE)}

Conclusion

DataDirect XQuery® can query XML stored as character data, but you must provide a Java external function to parse the XML.

Because DataDirect XQuery® uses SQL to query relational databases, this means that DataDirect XQuery® cannot directly evaluate XQuery expressions against XML stored in a column of a relational database. As an alternative, DataDirect XQuery® provides three ways to query XML that is stored in a column of a database as XML:

  • Return the entire XML column.
  • Use the evaluate-in-memory extension expression to write portable queries that evaluate the XML in memory.
  • Use vendor-specific SQL functions to query the XML more efficiently, at the cost of portability. We have shown the most common ways to do this with DB2 and Oracle.

Future versions of DataDirect XQuery® will provide this functionality in a transparent, portable, and efficient way.

Appendix: Creating the Sample Data

We used the following methods to insert the XML data into the XML column named XMLCOL using the example data shipped with DataDirect XQuery®.

DB2

CREATE TABLE HOLDINGSXML (userid varchar(50) not null primary key,
XMLCOL DB2XML.XMLCLOB NOT LOGGED)
INSERT INTO HOLDINGSXML
(SELECT
h.USERID,
xml2clob(XMLELEMENT(name "HOLDINGS",
XMLAGG(
XMLELEMENT(name "SHARE",
XMLATTRIBUTES(s.COMPANYNAME AS COMPANY,h.USERID AS USERID),
h.SHARES))))
FROM HOLDINGS h, STATISTICAL s
WHERE h.STOCKTICKER = s.TICKER
GROUP BY h.USERID)
Oracle

CREATE TABLE HOLDINGSXML(userid varchar2(50) primary key, XMLCOL XMLTYPE)
INSERT INTO HOLDINGSXML
(SELECT
h.USERID,
XMLELEMENT("HOLDINGS",
XMLAGG(
XMLELEMENT("SHARE",
XMLATTRIBUTES(s.COMPANYNAME AS COMPANY,h.USERID AS USERID),
h.SHARES)))
FROM HOLDINGS h, STATISTICAL s
WHERE h.STOCKTICKER = s.TICKER
GROUP BY h.USERID)
jdbc

DataDirect
XQuery Connectors

XQuery Integration to Various Data Sources

community

DataDirect
XQuery Tutorials

Discussions, tips and tricks for
DataDirect Connect drivers