Grouping an XML document based on element names

by Minollo
Posted on November 07, 2008

It has been a while since we last tackled a "pure XML" problem with XQuery, so when I read this unanswered post on the Stylus Studio Developer Network, I thought it was a good chance to discuss it here as an interesting XQuery example.
The problem involves moving from a flat XML structure like this one:
[cc lang="xquery"] 1 2 3 4 5 6 7 8 9 [/cc]
to a more hierarchical XML that "explodes" the implicit structure hidden in the original XML element names:
[cc lang="xquery"]

1 2 3 ...

... [/cc]
In the end this is a grouping problem, but a bit trickier than usual, because the groups have to be recognized and extracted from the original XML element names.
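To make the idea concrete, here is a minimal sketch that works on plain strings rather than elements. The names in it are hypothetical, chosen only to follow a MAINx_SUBy_COLNAMEz pattern; they are not the original input. It relies on the standard tokenize() and distinct-values() functions to list each top-level prefix together with its sub-level prefixes.

[cc lang="xquery"]
(: Hypothetical names following the MAINx_SUBy_COLNAMEz pattern; not the original input :)
let $names := ("MAIN1_SUB1_COLNAME1", "MAIN1_SUB1_COLNAME2",
               "MAIN1_SUB2_COLNAME1", "MAIN2_SUB1_COLNAME1")
(: unique first tokens, i.e. the top-level groups :)
for $main in distinct-values(for $n in $names return tokenize($n, "_")[1])
(: unique second tokens among the names that belong to this top-level group :)
let $subs := distinct-values(
  for $n in $names[starts-with(., concat($main, "_"))]
  return tokenize($n, "_")[2])
return concat($main, ": ", string-join($subs, ", "))
[/cc]

For the names above this should pair MAIN1 with SUB1 and SUB2, and MAIN2 with SUB1. The order in which distinct-values() returns its items is implementation-dependent, so the order of the results is not guaranteed. Note that the trailing "_" in the starts-with() test keeps a prefix like MAIN1 from also matching MAIN10.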
Even though XQuery 1.0 doesn't support grouping explicitly, the fn:distinct-values() function is extremely useful in solving grouping problems. fn:distinct-values() takes a sequence of atomic values as input and returns a sequence containing the same values with duplicates removed. That helps a lot with our problem, as we can retrieve all the unique top-level categories (MAINx) and, for each of them, the unique sub-categories (SUBy). Add to that a very simple use of the fn:tokenize() function, which splits a name like "MAIN1_SUB1_COLNAME1" into a sequence like ("MAIN1", "SUB1", "COLNAME1"), and the problem is easily solved. Here is the XQuery I came up with:
[cc lang="xquery"]declare option ddtek:serialize "indent=yes";

(: Splits an element name such as MAIN1_SUB1_COLNAME1 into its "_"-separated parts :)
declare function local:splitName($node as element()) as xs:string* {
  tokenize($node/local-name(), "_")
};

let $input :=

1 2 3 4 5 6 7 8 9

return {
  (: one element per unique top-level prefix :)
  for $mainlevel-prefix in distinct-values($input/*/local:splitName(.)[1])
  return element {$mainlevel-prefix} {
    (: unique sub-level prefixes within this top-level group :)
    for $sublevel-prefix in distinct-values($input/*[starts-with(local-name(), concat($mainlevel-prefix, "_"))]/local:splitName(.)[2])
    return {
      (: original nodes belonging to this main/sub pair, renamed to their third name part :)
      for $sublevel-node in $input/*[starts-with(local-name(), concat($mainlevel-prefix, "_", $sublevel-prefix, "_"))]
      return element {local:splitName($sublevel-node)[3]} {$sublevel-node/text()}
    }
  }
}
[/cc]
That generates the following XML result, which is what we are looking for: