Natural sorting in XQuery

December 13, 2007 Data Platform

Jeff Atwood touched on an interesting topic yesterday: Sorting for Humans: Natural Sort Order.

 

Let's sort the following strings: a, A, b, B, 1, 2, 10

The ASCIIbetical order is: 1, 10, 2, A, B, a, b

The natural order, for most human beings, is: 1, 2, 10, a, A, b, B

 

Sorting strings in most programming languages produces the ASCIIbetical order, and Jeff wonders whether a more human-friendly natural sort option should be built into mainstream programming languages. What about XQuery? What we're really talking about here is collations, and XQuery has built-in support for them.

 

The default collation in XQuery is the Unicode Codepoint collation. For example,

 

[cc lang="xquery"]
for $s in ("a", "A", "b", "B", "1", "2", "10")
order by $s
return $s
[/cc]

 

yields the following result: 1, 10, 2, A, B, a, b.
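
To make the query independent of whatever default is configured, the collation can also be named explicitly in the order by clause. A minimal sketch, using the codepoint collation URI defined by the W3C XQuery 1.0 Functions and Operators specification:

[cc lang="xquery"]
(: force codepoint ordering regardless of the processor's default collation :)
for $s in ("a", "A", "b", "B", "1", "2", "10")
order by $s collation "http://www.w3.org/2005/xpath-functions/collation/codepoint"
return $s
[/cc]

With the codepoint collation named this way, the result should be 1, 10, 2, A, B, a, b on any conforming processor.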

 

XQuery implementations are allowed to use a different default collation. With DataDirect XQuery, the default collation is based on the locale of your Java Virtual Machine, so the query above returns: 1, 10, 2, a, A, b, B. That's already better: 'a' and 'A' are sorted before 'b' and 'B'. Using the locale also means that on a German system, for example, characters with umlauts collate as a German would expect.
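
To check which default a given processor actually uses, the standard fn:default-collation() function returns the URI of the default collation. A small sketch; the URI reported for a locale-based default is implementation-specific, so no particular value is assumed here:

[cc lang="xquery"]
(: returns the URI of the collation used when a query does not name one :)
default-collation()
[/cc]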

 

But we're not there yet: the numbers are still not naturally sorted. You can achieve this with DataDirect XQuery by explicitly overriding the default collation and specifying the alphanumeric option, as shown in the next query:

[cc lang="xquery"]
declare default collation "http://www.datadirect.com/xquery/collation?alphanumeric=yes";
for $s in ("a", "A", "b", "B", "1", "2", "10")
order by $s
return $s
[/cc]

And we get the desired result: 1, 2, 10, a, A, b, B.
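
On a processor without such an option, a similar ordering can be approximated in plain XQuery with several sort keys. This is only a rough sketch, not how DataDirect implements its alphanumeric collation. It assumes each string is either entirely an integer or entirely text, so it does not handle digits embedded in longer strings such as "file10". The variable name $is-number is made up for the illustration.

[cc lang="xquery"]
(: pin string comparison to codepoint order so the tie-break below is predictable :)
declare default collation "http://www.w3.org/2005/xpath-functions/collation/codepoint";

for $s in ("a", "A", "b", "B", "1", "2", "10")
let $is-number := $s castable as xs:integer
order by
  (: key 1: numbers sort ahead of text :)
  if ($is-number) then 0 else 1,
  (: key 2: numbers compare by value, so 2 precedes 10 :)
  if ($is-number) then xs:integer($s) else 0,
  (: key 3: text compares without regard to case :)
  lower-case($s),
  (: key 4: tie-break, lowercase first because "a" has the higher codepoint :)
  $s descending
return $s
[/cc]

For the seven sample strings this should give the same order as above: 1, 2, 10, a, A, b, B.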

 


 


Marc Van Cappellen