XQuery TUTORIAL

How to Split XML Files into Multiple Files Tutorial

Updated: 30 Dec 2021

Introduction

A common challenge in XML development is splitting a large XML document into several smaller ones. In some scenarios, huge XML documents are simply unmanageable as a single file and have to be split before they can be processed.

Split XML Files: An Example

Consider one of the Shakespeare plays marked up in XML, for example Hamlet. We want to split the XML for this play so that each speech ends up in a separate XML document.

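(: Write each SPEECH element of Hamlet to its own XML file. :)
for $speech at $i in
    doc("http://www.andrew.cmu.edu/user/akj/shakespeare/hamlet.xml")
      /PLAY//SPEECH
(: File name: speaker name(s) joined by "_", followed by the speech's position. :)
let $url := concat("C:/SHAKESPEAR/",
                   string-join($speech/SPEAKER, "_"),
                   $i, ".xml")
return ddtek:serialize-to-url($speech, $url,
                              "omit-xml-declaration=no,indent=yes")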

The query writes 1138 files to the C:\SHAKESPEAR directory, one per speech.
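Before writing over a thousand files, it can help to preview the names the query will generate. The following is a minimal sketch that uses only standard XQuery functions (concat and string-join) and does not call ddtek:serialize-to-url. The two-speech inline sample is an assumption made for illustration and stands in for the full play.

```xquery
(: Sketch: preview the file names without writing anything to disk.
   $play is a small inline sample (an assumption), not the real document. :)
let $play :=
  <PLAY>
    <SPEECH>
      <SPEAKER>BERNARDO</SPEAKER>
      <LINE>Who's there?</LINE>
    </SPEECH>
    <SPEECH>
      <SPEAKER>FRANCISCO</SPEAKER>
      <LINE>Nay, answer me: stand, and unfold yourself.</LINE>
    </SPEECH>
  </PLAY>
for $speech at $i in $play//SPEECH
(: Same naming scheme as the split query: speaker(s), then position. :)
return concat("C:/SHAKESPEAR/",
              string-join($speech/SPEAKER, "_"),
              $i, ".xml")
```

For this sample the result should be the two strings C:/SHAKESPEAR/BERNARDO1.xml and C:/SHAKESPEAR/FRANCISCO2.xml. The positional variable $i keeps the names unique even when the same speaker appears many times.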

Alternatively, you might need to generate an HTML file for each speech, plus a master document that references them all. In the next example, each speech's HTML file is created through ddtek:serialize-to-url, and the query result is the master HTML page that links to the others.

<html>{
  for $speech at $i in
      doc("http://www.andrew.cmu.edu/user/akj/shakespeare/hamlet.xml")
        /PLAY//SPEECH
  let $url := concat("C:/SHAKESPEAR/",
                     string-join($speech/SPEAKER, "_"),
                     $i, ".html")
  (: HTML page for a single speech: speaker in bold, then one row per line. :)
  let $htmlspeech :=
    <html>
      <b>{$speech/SPEAKER/text()}</b>:<br/>{
        for $line in $speech/LINE
        return ($line/text(), <br/>)
      }</html>
  return (
    (: Write the speech page to its own file. :)
    ddtek:serialize-to-url($htmlspeech, $url, "method=html"),
    (: Add a link to that file to the master page. :)
    <a href="{$url}">{
      $speech/SPEAKER/text()
    }</a>,
    <br/>
  )
}</html>
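The per-speech HTML can also be checked on its own before any file is written. The sketch below moves the construction into a function and applies it to one inline speech. The function name local:speech-html and the sample speech are hypothetical and introduced only for illustration; they are not part of the original query or of DataDirect XQuery.

```xquery
(: Hypothetical helper: builds the HTML for one speech so the markup
   can be inspected without calling ddtek:serialize-to-url. :)
declare function local:speech-html($speech as element(SPEECH))
    as element(html) {
  <html>
    <b>{$speech/SPEAKER/text()}</b>:<br/>{
      for $line in $speech/LINE
      return ($line/text(), <br/>)
    }</html>
};

(: Apply the helper to a single inline sample speech (an assumption). :)
local:speech-html(
  <SPEECH>
    <SPEAKER>BERNARDO</SPEAKER>
    <LINE>Who's there?</LINE>
  </SPEECH>
)
```

Once the markup looks right, the same expression can be passed to ddtek:serialize-to-url as in the query above.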

With the ddtek:serialize-to-url function, splitting an XML document into multiple documents becomes very natural. All of this takes advantage of DataDirect XQuery's XML streaming capabilities, which enable support for huge XML documents.