Create and deliver personalized experiences across digital properties at scale
Build engaging websites with intuitive web content management
Leverage a complete UI toolbox for web, mobile and desktop development
Build, protect and deploy apps across any platform and mobile device
Build mobile apps for iOS, Android and Windows Phone
Rapidly develop, manage and deploy business apps, delivered as SaaS in the cloud
Automate UI, load and performance testing for web, desktop and mobile
Host, deploy and scale Node.js, Java and .NET Core apps on premise or in the cloud
Optimize data integration with high-performance connectivity
Automate decision processes with a no-code business rules engine
Globally scale websites with innovative content management and infrastructure approaches
Content-focused web and mobile solution for empowering marketers
Faster, tailored mobile experiences for any device and data source
UX and app modernization to powerfully navigate today's digital landscape
Fuel agility with ever-ready applications, built in the cloud
Well, the temptation to keep reviewing license agreements rather than playing with XQuery was strong... but I resisted! I had to help this guy! :)
I was happy to notice that the spreadsheets had been saved using the Office Open XML (OOXML) format; OOXML is basically a package of XML files describing the document (text, spreadsheet, slides) that it represents. Being XML-based, it's quite easy to inspect it using an XML query language; the package is a zip format, so individual documents can be retrieved using the standard "jar:" file scheme, for example.
So, coming back to the guy asking for help joining two spreadsheets; what can we do for him? If you look at the structure of the two spreadsheets he sent me (attached to this post), you'll see two XML documents that are relevant to us:
To retrieve the whole content of xl/worksheets/sheet1.xml from c:doc1.xlsx using a Java-based XQuery processor like DataDirect XQuery, you can just us the standard fn:doc() function:
The index corresponding to the character string contained in the first cell of the first row in the spreadsheet can be retrieved doing:
...where ooxml is a namespace prefix associated to the URI http://schemas.openxmlformats.org/spreadsheetml/2006/main .
And if you have the index of the character string, the character string itself can be retrieved as:
Easy, isn't it!?
Add some syntax, a couple of FLWOR expressions, a few HTML tags, and here you go! This XQuery just joins the two spreadsheets and creates an easy to read HTML report:
[cc lang="xquery"]declare namespace ooxml=
let $doc1Strings := doc(“jar:file:///c:/doc1.xlsx!/xl/sharedStrings.xml”)//ooxml:si
let $doc2Strings := doc(“jar:file:///c:/doc2.xlsx!/xl/sharedStrings.xml”)//ooxml:si
for $row1 in
let $col1-1 := $doc1Strings[$row1/ooxml:c/ooxml:v/number()+1]/ooxml:t/text()
let $col2-1 := $doc2Strings[$row2/ooxml:c/ooxml:v/number()+1]/ooxml:t/text()
where $col1-1 = $col2-1
let $col1-2 := $doc1Strings[$row1/ooxml:c/ooxml:v/number()+1]/ooxml:t/text()
let $col2-2 := $doc2Strings[$row2/ooxml:c/ooxml:v/number()+1]/ooxml:t/text()
It's that simple! Here are the two spreadsheets, if you want to try it yourself: doc1.xlsx, doc2.xlsx
View all posts from Minollo on the Progress blog. Connect with us about all things application development and deployment, data integration and digital business.
Copyright © 2016, Progress Software Corporation and/or its subsidiaries or affiliates.
All Rights Reserved.
Progress, Telerik, and certain product names used herein are trademarks or registered trademarks of Progress Software Corporation and/or one of its subsidiaries or affiliates in the U.S. and/or other countries. See Trademarks or appropriate markings.