A question about querying office spreadsheets, on the Mulberry XSL list.
[cc lang="xquery"]... All I need to do is retrieve the string value of every cell of column "A" whose string length is greater than 20 characters in length. Anyone know how to use XPath to retrieve this from an Excel 2007 file? ...[/cc] Querying office documents with XQuery, or another XML query language like XPath or XSLT, is a popular reoccurring question. We believe DataDirect XQuery should be in your toolset when it comes to querying office documents. Hence we have blogged about it a few times before.Back to the question, looks like a fairly simple question, isn't it? Let's try... Let's start from a concrete OOXML example, a worksheet might look as follows.
[cc lang="xquery"] ...0
0
1.23
1
CONCATENATE("Hello ", "world", "!") Hello world!
2
...
... [/cc]
The sheet's data is structured by row. With each row specifying thenon-empty cells. Cells contain values in the element. here it get's more tricky. String values are not stored in the cell table, unless they are the result of a calculation. In stead you'll find a zero-based index as value, which points into the shared string table where that string is stored uniquely. This is done to optimize load/save performance and to reduce duplication of information. To determine whether the 0 in is a number or an index to a string, the cell's data type must be examined. When the data type indicates string, then it is an index and not a numeric value. In the example above,Hello world!
Hello
wonderful
world
!
Hello wonderful world!
[/cc]
So, back to the question. The next query is the answer. [cc lang="java"]declare namespace ssml = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"; declare variable $sheet := doc("jar:file:///C:/example.xlsx!/xl/worksheets/sheet1.xml"); declare variable $sharedStrings := doc("jar:file:///C:/example.xlsx!/xl/sharedStrings.xml"); for $c in $sheet/ssml:worksheet/ssml:sheetData/ssml:row/ssml:c[matches(@r,"^A[0-9]+")] let $str := if ($c/@t = "s") then string-join(($sharedStrings/ssml:sst/ssml:si)[xs:integer($c/ssml:v)+1]//ssml:t,"") else string($c/ssml:v) where string-length($str)>20 return {$str}[/cc]1. Declares the SpreadsheetML namespace 2. Specifies the XML document representing the first sheet in the spreadsheet 3. Specifies the XML document containing the shared strings table 4. Get all cells in the A column, we use a regular expression to filter all cells starting with an A, followed by a number of alpha numeric characters 5. 6. Test if the cell contains a string value or not 7. Lookup the string value in the shared strings table. As the index in spreadsheetML is zero-based we need to increment the index by one before accessing the n-th element. As the string can be rich text, we string-join all descendent elements 8. 9. Get the value contained in the cell itself 10. As specified in the original question, only get the values of more than 20 characters 11. 12. Return the cell row/column number and the actual value After all this query is longer than one might have thought initially, given the rather simple question. However, all complexity is there due to the Open Office XML standard. Can you imagine to write the equivalent in for example Java? Out of scope for this post, but the solution can easily be enhanced, for example to query all sheets in the spreadsheet. Someone wants to try and write the same for spreadsheets saved in OpenDocument Format?View all posts from Marc Van Cappellen on the Progress blog. Connect with us about all things application development and deployment, data integration and digital business.
Let our experts teach you how to use Sitefinity's best-in-class features to deliver compelling digital experiences.
Learn MoreSubscribe to get all the news, info and tutorials you need to build better business apps and sites
Progress collects the Personal Information set out in our Privacy Policy and the Supplemental Privacy notice for residents of California and other US States and uses it for the purposes stated in that policy.
You can also ask us not to share your Personal Information to third parties here: Do Not Sell or Share My Info
We see that you have already chosen to receive marketing materials from us. If you wish to change this at any time you may do so by clicking here.
Thank you for your continued interest in Progress. Based on either your previous activity on our websites or our ongoing relationship, we will keep you updated on our products, solutions, services, company news and events. If you decide that you want to be removed from our mailing lists at any time, you can change your contact preferences by clicking here.