XQuery against RDBMS: let the engine optimize your SQL

August 28, 2007 Data Platform

We recently noticed a question on a newsgroup that caught our attention. It went something like this:

How to create an XQuery which generates SQL like

[cc lang="sql" theme="xquery"] SELECT t1.columnName1 FROM Table t1 WHERE t1.columnName2 IN('as','fa','pr')[/cc]

My immediate reaction was: why is he worrying about that? You shouldn't need to think about how to write an XQuery to obtain a specific SQL statement; it is the XQuery processor's job to digest your XQuery and make the "best" SQL out of it...

The obvious XQuery that comes to mind would be:

[cc lang="xquery"] for $ts in fn:collection("Table")/Table where $ts/columnName2 = ('as','fa','pr') return $ts/columnName1[/cc]

Strangely enough, experts on that newsgroup suggested writing the query like this instead:

[cc lang="xquery"]for $t in Table() where $t/columnName2 = 'as' or $t/columnName2 = 'fa' or $t/columnName2 = 'pr' return $t[/cc]

...or, assuming a sequence of values on which to filter:

[cc lang="xquery"]for $v in $values for $t in Table() where $t/columnName2 = $v return $t[/cc]

Why would they suggest such an unnatural way to solve that problem in XQuery? The reality is that I'm thinking in terms of what DataDirect XQuery would do, while they are thinking about different XQuery processors. DataDirect XQuery has been designed to rewrite XQuery expressions into SQL without forcing the user to code XQuery in a specific way. In XQuery it's possible to express the same logic in many different ways, but it shouldn't be the XQuery author's responsibility to guess how the underlying XQuery engine optimizes what they write; it should be the engine that makes the "right decisions" no matter how the user codes the solution in XQuery (at least in reasonably equivalent scenarios, like the one described above).
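To make the "same logic, different spelling" point concrete, here is a minimal sketch that compares the sequence-comparison form with the chained "or" form. The in-memory $rows sequence and its sample values are made up for illustration and simply stand in for the relational table; the column names are the ones from the question. It relies on the fact that a general comparison (=) against a sequence is true as soon as any one value matches.

[cc lang="xquery"]
(: Hypothetical sample rows standing in for the relational table "Table" :)
let $rows := (
  <Table><columnName1>r1</columnName1><columnName2>as</columnName2></Table>,
  <Table><columnName1>r2</columnName1><columnName2>zz</columnName2></Table>,
  <Table><columnName1>r3</columnName1><columnName2>pr</columnName2></Table>
)
(: Form 1: general comparison against a sequence of values :)
let $in-style :=
  for $t in $rows
  where $t/columnName2 = ('as', 'fa', 'pr')
  return $t/columnName1
(: Form 2: the same predicate spelled out as a chain of "or" :)
let $or-style :=
  for $t in $rows
  where $t/columnName2 = 'as' or $t/columnName2 = 'fa' or $t/columnName2 = 'pr'
  return $t/columnName1
(: Compare the two result sequences item by item :)
return fn:deep-equal($in-style, $or-style)
[/cc]

Both forms select the same rows in the same order, so the comparison should come out true. The nested-for variant is a slightly looser match: it iterates over the values first, so results come back grouped by value rather than in table order, and a duplicate in $values would produce duplicate rows.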

For the record, the XQueries described above all end up generating the same execution plan in DataDirect XQuery.

If you are interested in more details about how DataDirect XQuery generates SQL when running XQuery against relational data sources, good sources of information are the Generating SQL white paper as well as Consistent SQL Generation.


Minollo