Deliver superior customer experiences with an AI-driven platform for creating and deploying cognitive chatbots
Deliver Awesome UI with the most complete toolboxes for .NET, Web and Mobile development
Automate UI, load and performance testing for web, desktop and mobile
A complete cloud platform for an app or your entire digital business
Detect and predict anomalies by automating machine learning to achieve higher asset uptime and maximized yield
Automate decision processes with a no-code business rules engine
Optimize data integration with high-performance connectivity
Connect to any cloud or on-premises data source using a standard interface
Build engaging multi-channel web and digital experiences with intuitive web content management
Personalize and optimize the customer experience across digital touchpoints
Build, protect and deploy apps across any platform and mobile device
Rapidly develop, manage and deploy business apps, delivered as SaaS in the cloud
Well, the temptation to keep reviewing license agreements rather than playing with XQuery was strong... but I resisted! I had to help this guy! :)
I was happy to notice that the spreadsheets had been saved using the Office Open XML (OOXML) format; OOXML is basically a package of XML files describing the document (text, spreadsheet, slides) that it represents. Being XML-based, it's quite easy to inspect it using an XML query language; the package is a zip format, so individual documents can be retrieved using the standard "jar:" file scheme, for example.
So, coming back to the guy asking for help joining two spreadsheets; what can we do for him? If you look at the structure of the two spreadsheets he sent me (attached to this post), you'll see two XML documents that are relevant to us:
To retrieve the whole content of xl/worksheets/sheet1.xml from c:doc1.xlsx using a Java-based XQuery processor like DataDirect XQuery, you can just us the standard fn:doc() function:
The index corresponding to the character string contained in the first cell of the first row in the spreadsheet can be retrieved doing:
...where ooxml is a namespace prefix associated to the URI http://schemas.openxmlformats.org/spreadsheetml/2006/main .
And if you have the index of the character string, the character string itself can be retrieved as:
Easy, isn't it!?
Add some syntax, a couple of FLWOR expressions, a few HTML tags, and here you go! This XQuery just joins the two spreadsheets and creates an easy to read HTML report:
[cc lang="xquery"]declare namespace ooxml=
let $doc1Strings := doc(“jar:file:///c:/doc1.xlsx!/xl/sharedStrings.xml”)//ooxml:si
let $doc2Strings := doc(“jar:file:///c:/doc2.xlsx!/xl/sharedStrings.xml”)//ooxml:si
for $row1 in
let $col1-1 := $doc1Strings[$row1/ooxml:c/ooxml:v/number()+1]/ooxml:t/text()
let $col2-1 := $doc2Strings[$row2/ooxml:c/ooxml:v/number()+1]/ooxml:t/text()
where $col1-1 = $col2-1
let $col1-2 := $doc1Strings[$row1/ooxml:c/ooxml:v/number()+1]/ooxml:t/text()
let $col2-2 := $doc2Strings[$row2/ooxml:c/ooxml:v/number()+1]/ooxml:t/text()
It's that simple! Here are the two spreadsheets, if you want to try it yourself: doc1.xlsx, doc2.xlsx
View all posts from Minollo on the Progress blog. Connect with us about all things application development and deployment, data integration and digital business.
Copyright © 2018 Progress Software Corporation and/or its subsidiaries or affiliates.
All Rights Reserved.
Progress, Telerik, and certain product names used herein are trademarks or registered trademarks of Progress Software Corporation and/or one of its subsidiaries or affiliates in the U.S. and/or other countries. See Trademarks for appropriate markings.