Yes! You can finally query your office documents!

October 22, 2007 Data Platform

When I was working on an XML database in the late nineties, I remember hearing a lot of noise about how XML would finally allow for real reuse and collaboration when working with applications like word processors and spreadsheet editors. Copy like "Data is finally disjoint from format; store the data as XML, and use other languages [XSLT, typically] to take care of the format." was easy to find in industry publications and seemed to herald the start of something big.

Well, XML did work that way, but only in limited cases; for the vast majority of applications, XML didn't deliver any of the benefits of separating data from formatting. Fast forward a decade, and there's now genuine promise in this area.

Two emerging standards, OpenDocument Format (ODF) and Office Open XML (OOXML), are gaining popularity (and sparking several fights in standards bodies), and things are moving again, even if not in exactly the way people expected some time ago: neither ODF nor OOXML really creates a clear separation between data and presentation. Instead, each defines an XML format that mixes the two. But both standards are built on XML, which means that you can finally use standard XML tools, like XQuery, to query your office documents!
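To make that concrete, here is a minimal sketch of the idea. It uses Python's standard library with an XPath-style lookup rather than XQuery, so treat it as an illustration of the principle and not as the approach described in Marc's article. An ODF text document is a zip package whose body lives in content.xml; the sample document and the helper names (build_sample_odt, paragraphs) are made up for this example, and a real file would carry far more markup.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument specification.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def build_sample_odt():
    """Build a tiny, hypothetical ODF-like package in memory for the demo."""
    content = (
        "<office:document-content "
        f'xmlns:office="{NS["office"]}" xmlns:text="{NS["text"]}">'
        "<office:body><office:text>"
        "<text:h>Quarterly report</text:h>"
        "<text:p>Revenue grew.</text:p>"
        "<text:p>Costs fell.</text:p>"
        "</office:text></office:body>"
        "</office:document-content>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", content)
    return buf.getvalue()


def paragraphs(odt_bytes):
    """Return the text of every text:p element found in content.xml."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    # Data and presentation are mixed in the markup, so we select by element
    # name and flatten any inline children into plain text.
    return ["".join(p.itertext()) for p in root.findall(".//text:p", NS)]


if __name__ == "__main__":
    print(paragraphs(build_sample_odt()))  # ['Revenue grew.', 'Costs fell.']
```

The same selection in XQuery would be a path expression over text:p elements; the point is only that once the package is unzipped, the document is ordinary namespaced XML that any XML tool can query.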

Marc recently wrote a nice article about how you can leverage XQuery to query XML-based office document standards; he provides an excellent technical overview, and the included examples are good references for getting started with XQuery and these new formats.

Minollo