Split up an XML document

January 17, 2008

In this installment of the series on generating multiple XML documents with XQuery, we look at splitting a large XML document into several smaller ones. In some scenarios such huge documents are simply unmanageable and need to be split up.

 

Consider one of Shakespeare's plays marked up in XML, for example Hamlet. We want to write each speech in the play to a separate XML document.

 

[cc lang="xquery"]
(: $i is the position of the speech in the play; it keeps the file names unique :)
for $speech at $i in doc("http://www.andrew.cmu.edu/user/akj/shakespeare/hamlet.xml")/PLAY//SPEECH
let $url := concat("C:/SHAKESPEAR/", string-join($speech/SPEAKER, "_"), $i, ".xml")
return
  ddtek:serialize-to-url($speech, $url, "omit-xml-declaration=no,indent=yes")
[/cc]

 

This writes 1138 files to the C:/SHAKESPEAR directory.
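
To see how the file names are put together without writing anything to disk, the naming expression can be run on its own. The following is a minimal sketch that uses only standard XQuery functions; the two-speech inline sample is an assumption made for illustration and stands in for the full Hamlet document.

[cc lang="xquery"]
(: Hypothetical inline sample standing in for hamlet.xml :)
let $play :=
  <PLAY>
    <SPEECH><SPEAKER>BERNARDO</SPEAKER><LINE>Who's there?</LINE></SPEECH>
    <SPEECH><SPEAKER>FRANCISCO</SPEAKER><LINE>Nay, answer me: stand, and unfold yourself.</LINE></SPEECH>
  </PLAY>
(: Same naming scheme as the query above: speaker name(s) joined by "_", then the position :)
for $speech at $i in $play//SPEECH
return concat("C:/SHAKESPEAR/", string-join($speech/SPEAKER, "_"), $i, ".xml")
[/cc]

For this sample the query returns the two strings C:/SHAKESPEAR/BERNARDO1.xml and C:/SHAKESPEAR/FRANCISCO2.xml. A speech with several SPEAKER children gets all of their names, joined with an underscore, and the positional variable $i keeps the names unique when the same character speaks more than once.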

 

Or you might need to generate an HTML file for each speech, plus a master document that references them all. In the next example, each speech's HTML file is created through ddtek:serialize-to-url, and the query result is the master HTML document that links to the others.

 

[cc lang="xquery"]
(: The HTML element constructors were lost from this listing; html, body, b and br are a plausible reconstruction. :)
<html><body>{
  for $speech at $i in doc("http://www.andrew.cmu.edu/user/akj/shakespeare/hamlet.xml")/PLAY//SPEECH
  let $url := concat("C:/SHAKESPEAR/", string-join($speech/SPEAKER, "_"), $i, ".html")
  let $htmlspeech :=
    <html><body>
      <b>{ $speech/SPEAKER/text() }</b>:<br/>
      { for $line in $speech/LINE return ($line/text(), <br/>) }
    </body></html>
  return (
    ddtek:serialize-to-url($htmlspeech, $url, "method=html"),
    <a href="{$url}">{ $speech/SPEAKER/text() }</a>, <br/>
  )
}</body></html>
[/cc]
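
The shape of a single speech page can be previewed in isolation. This is a rough sketch in standard XQuery only: the inline SPEECH sample and the html, body, b and br wrapper elements are assumptions chosen for illustration, not necessarily the markup used in the original query.

[cc lang="xquery"]
(: Hypothetical sample speech; in the real query this comes from hamlet.xml :)
let $speech :=
  <SPEECH>
    <SPEAKER>HAMLET</SPEAKER>
    <LINE>To be, or not to be: that is the question:</LINE>
    <LINE>Whether 'tis nobler in the mind to suffer</LINE>
  </SPEECH>
return
  (: Assumed page layout: speaker in bold, then one line of verse per br :)
  <html>
    <body>
      <b>{ $speech/SPEAKER/text() }</b>
      <br/>
      { for $line in $speech/LINE return ($line/text(), <br/>) }
    </body>
  </html>
[/cc]

Returning the sequence ($line/text(), <br/>) for each LINE interleaves the text with line breaks, so the verse keeps its line structure in the browser.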

 

As you can see, the ddtek:serialize-to-url function makes it very natural to split an XML document into multiple ones. And all of this takes advantage of DataDirect XQuery's XML streaming capabilities, which enable support for huge XML documents.

 


Marc Van Cappellen
