XQuery for the SQL Programmer - Grouping and aggregation

August 05, 2008 Data Platform

Today's topic in the XQuery for the SQL programmer series is grouping. Where grouping is built-in the SQL language, this is unfortunately a bit less trivial in XQuery. Let's start with a simple example, count the number of bids for every user.

[cc lang="sql"]select USERID, count(*) from BIDS group by USERID[/cc]

XQuery has no built-in grouping construct, but you can achieve the same using the distinct-values() function, which we introduced in our last week's post.

[cc lang="xquery"]for $userid in distinct-values(collection("BIDS")/BIDS/USERID) return

{$userid} {count(collection("BIDS")/BIDS[USERID = $userid])} [/cc]

Let's now add a having clause, we're interested in the users with more than 1 bid,

[cc lang="sql"]select USERID, count(*) from BIDS group by USERID having count(*) > 1[/cc]

Our previous XQuery is changed as follows. Note that we use a let-clause, which we introduced in Joining Data.

[cc lang="xquery"]for $userid in distinct-values(collection("BIDS")/BIDS/USERID) let $count := count(collection("BIDS")/BIDS[USERID = $userid]) where $count > 1 return

{$userid} {$count } [/cc]

And we further detail our query, as we're only interested in significant bids of more than 100$.

[cc lang="xquery"]select USERID, count(*) from BIDS where BID > 100 group by USERID having count(*) > 1[/cc]

In XQuery we do,

[cc lang="xquery"]for $userid in distinct-values(collection("BIDS")/BIDS/USERID) let $count := count(collection("BIDS")/BIDS[USERID = $userid and BID > 100]) where $count > 1 return

{$userid} {$count } [/cc]

As a final example, let's discuss a grouping scenario based on two values. For all items group on the rating of the user, and count the number of bids, in addition to the average bid,

[cc lang="sql"]select ITEMNO, RATING, AVG(BID), COUNT(*) from xvs001.BIDS, xvs001.USERS where USERS.USERID = BIDS.USERID GROUP BY RATING, ITEMNO ORDER BY ITEMNO, RATING[/cc]

In XQuery one could write the following query. Not the two for-loops over distinct-values().

[cc lang="xquery"]for $rating in distinct-values(collection("USERS")/USERS/RATING) for $itemno in distinct-values(collection("BIDS")/BIDS/ITEMNO) for $userid in collection("USERS")/USERS[RATING=$rating]/USERID let $bids := collection("BIDS")/BIDS[ITEMNO=$itemno and USERID = $userid] let $avg := avg($bids/BID) let $count := count($bids) where $bids order by $itemno, $rating return

{$itemno} {$rating } {$avg} {$count} [/cc]

We can conclude that resolving grouping problems in XQuery 1.0 is less simple than in SQL. Unfortunate, but today's reality. In the margin, the XML Query Working Group has recognized the issue, and is active working on adding native grouping capabilities to the language. All this in scope of XQuery 1.1. Today still a working draft, but at least we know it will change and improve over time.

Tech Tags:

Marc Van Cappellen