Build, protect and deploy apps across any platform and mobile device
Leverage a complete UI toolbox for web, mobile and desktop development
Automate UI, load and performance testing for web, desktop and mobile
Rapidly develop, manage and deploy business apps, delivered as SaaS in the cloud
Automate decision processes with a no-code business rules engine
Build mobile apps for iOS, Android and Windows Phone
Optimize data integration with high-performance connectivity
Connect to any cloud or on-premise data source using a standard interface
Build engaging multi-channel web and digital experiences with intuitive web content management
Now and then I need to write applications that use data stored in Excel spreadsheets. As Marc has already discussed in other posts, the new Open Office XML format used by recent releases of Excel, Word and Powerpoint opens the door to the possibility of using XML tools to manipulate Office documents without going through the pain of looking for and using conversion utilities.Recently I had to solve a problem involving fetching a reference table from a spreadsheet; to better understand how you can fetch data from spreadsheets serialized as OOXML, I searched for some documentation describing the structure of OOXML files, which is not entirely trivial. I ended up finding this useful entry in MSDN which describes how to directly access OOXML documents to accomplish a handful of operations using VB or C#, and in particular how to retrieve cell values.Given a OOXML spreadsheet, a worksheet name and a cell ID, retrieving the cell value involves basically the following steps:
As you can see, what you would consider a relatively simple task is not that simple. The MSDN article provides solutions in VB and C#; you can see how the code looks like here.Of course you can solve the same problem using XQuery; and, as you might (should?) expect, doing that in XQuery is way simpler - considering we are manipulating and navigating XML documents: [cc lang="xquery"]declare namespace ooxl = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
declare namespace r = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
declare namespace rp = "http://schemas.openxmlformats.org/package/2006/relationships";
declare function local:get-cell($spreadsheetURI as xs:string, $sheetName as xs:string, $cell as xs:string)
let $stringsDoc := doc(concat("zip:", $spreadsheetURI, "!/xl/sharedStrings.xml"))
let $workbookDoc := doc(concat("zip:", $spreadsheetURI, "!/xl/workbook.xml"))
let $relsDoc := doc(concat("zip:", $spreadsheetURI, "!/xl/_rels/workbook.xml.rels"))
let $sheetFileName := $relsDoc/rp:Relationships/rp:Relationship[@Id = $workbookDoc/ooxl:workbook/ooxl:sheets/ooxl:sheet[@name=$sheetName]/@r:id]/@Target
let $sheetDoc := doc(concat("zip:", $spreadsheetURI, "!/xl/", $sheetFileName))
let $cellNode := $sheetDoc/ooxl:worksheet/ooxl:sheetData/ooxl:row/ooxl:c[@r = $cell]
if ($cellNode/@t="s") then
else if ($cellNode/@t="b") then
if ($cellNode/ooxl:v = 1) then "TRUE" else "FALSE"
[/cc]You can test the function with an expression like:
[cc lang="xquery"]local:get-cell("file:///c:/mySpreadsheet.xlsx", "2008Data", "B2")[/cc]Encouraged by this simple test, I spent some more time trying to achieve what my original goal was: retrieving tables and ranges in my spreadsheet. The result is the XQuery attached to this post, that I developed and tested against DataDirect XQuery; in addition to the get-cell() function, there are two top-level functions that I found particularly useful:[cc lang="xquery"]
declare function local:get-table($spreadsheetURI as xs:string, $sheetName as xs:string, $tableName as xs:string) as element();
declare function local:get-range($spreadsheetURI as xs:string, $sheetName as xs:string, $range as xs:string) as element();
[/cc]Both functions return data in a simple XML format controlled by the get-range-impl() function; you can change it the way you like, of course.If you have the patience to go through the XQuery, you'll notice that I had to create a few interesting functions to convert numeric col/row values to/from the typical letters/numbers notation used in spreadsheets ("AB21" -> 28,21; 512,99 -> "SR99"); that's probably the main difference between the functions manipulating ranges/tables and the one manipulating single cells is. I'm sure there are more intelligent and performant ways to obtain a similar result (maybe playing with regular expressions), but this was good enough starting point for me.Hopefully you'll find these XQuery functions as useful as I did; and hopefully you'll find some bugs I missed ;)
View all posts from Minollo on the Progress blog. Connect with us about all things application development and deployment, data integration and digital business.
Copyright © 2017, Progress Software Corporation and/or its subsidiaries or affiliates.
All Rights Reserved.
Progress, Telerik, and certain product names used herein are trademarks or registered trademarks of Progress Software Corporation and/or one of its subsidiaries or affiliates in the U.S. and/or other countries. See Trademarks or appropriate markings.