Home Partners Company

Working With Flat Files, JSON and other data sources

Converting Proprietary Flat File Formats to Specific XML Structures

Previous tutorials explained how to use XQuery to deal with EDI messages, for example, how to transform a directory of EDI messages, create EDI messages out of your database, converting tab delimited files into EDI, etc. But sometimes you're not dealing with EDI; companies still frequently rely on proprietary flat file formats to exchange or circulate information.

A Flat File to XML Example

Consider the following example of a user who needed to convert a file in a proprietary flat file format into a specific XML structure. The sample file looks like this:

10 0123456789MR JOHN SMITH
20 0123456789ACCT1
30 0123456789SALARY 500000
30 0123456789BONUS 1000
20 0123456789ACCT2
30 0123456789OTHER 100
10 1234566790MR DAVID DOE
20 1234567890ACCT1
30 1234567890SALARY 10000

This isn't EDI, CSV, tab delimited or any of the other standard or semi-standard ways to represent information in text files; it's a proprietary format based on a specific interpretation of the size and position of the fields. The following is the description of records and fields in the proprietary flat file format:

one block is made of:
one record 10 (customer)
1 to N record 20 per record 10 (accounts)
1 to N record 30 per record 20 (transactions)
record 10
record type on 10 characters
customer id on 10 characters
customer lastname on 20 characters
customer firstname on 30 characters
record 20
record type on 10 characters
customer id on 10 characters
account id on 10 characters
record 30
record type on 10 characters
customer id on 10 characters
label on 10 characters
value on 10 characters

How can you deal with such a flat file, without building our own parser? DataDirect XML Converters and Stylus Studio can definitely help! XML Converters support "custom conversions"; a custom conversion defines how files belonging to the same format family should be converted to XML by XML Converters. As the definition of a custom conversion can be a tedious task, Stylus Studio provides an editor that allows you to create custom conversion through an intuitive graphical interface. The easiest way to explain how that would work is to watch short online video tutorial which describes

<root>
<record_10>
<type>10</type>
<customer_id>0123456789</customer_id>
<lastname>MR JOHN</lastname>
<firstname>SMITH</firstname>
</record_10>
<record_20>
<type>20</type>
<customer_id>0123456789</customer_id>
<account_id>ACCT1</account_id>
</record_20>
<record_30>
<type>30</type>
<customer_id>0123456789</customer_id>
<label>SALARY</label>
<value>500000</value>
</record_30>
<record_30>
<type>30</type>
<customer_id>0123456789</customer_id>
<label>BONUS</label>
<value>1000</value>
</record_30>
<record_20>
<type>20</type>
<customer_id>0123456789</customer_id>
<account_id>ACCT2</account_id>
</record_20>
<record_30>
<type>30</type>
<customer_id>0123456789</customer_id>
<label>OTHER</label>
<value>100</value>
</record_30>
<record_10>
<type>10</type>
<customer_id>1234566790</customer_id>
<lastname>MR DAVID</lastname>
<firstname>DOE</firstname>
</record_10>
<record_20>
<type>20</type>
<customer_id>1234567890</customer_id>
<account_id>ACCT1</account_id>
</record_20>
<record_30>
<type>30</type>
<customer_id>1234567890</customer_id>
<label>SALARY</label>
<value>10000</value>
</record_30></root>

Now that we've converted the proprietary flat file into a specific XML structure, the question becomes, what next?

Processing Converted Flat Files from XQuery

Once a flat file to XML custom conversion has been configured, the combination of DataDirect XQuery and XML Converters proves particularly helpful: not only we are able to move from a proprietary flat file format to well formed XML, but we also have all the power, performance and scalability of XQuery to deal with such XML to transform it (potentially, even augment it!) into the final format that we need. The following XQuery — which implicitly relies on the custom converter described above — does the trick; notice that we created a couple of functions to take care of the positional grouping which is required to generate the desired output; that makes the XQuery much easier to read and more reusable:

declare function local:getRelated20s($item) {
let $nextItem :=$item/following-sibling::*[local-name()!="record_20"
and local-name()!="record_30"][1]
for $related in $item/following-sibling::*[local-name()="record_20"]
where if ($nextItem) then $related << $nextItem else true()
return $related
};
declare function local:getRelated30s($item) {
let $nextItem := $item/following-sibling::*[local-name()!="record_30"][1]
for $related in $item/following-sibling::*[local-name()="record_30"]
where if($nextItem) then $related << $nextItem else true()
return $related
};
<DataSets>
{
for $record_10 in
doc('converter:file:///c:/blog1/sample.conv?sampleinput.txt')/root/record_10
return
<DataSet>
<CustomerId>
{$record_10/customer_id/text()}
</CustomerId>
<CustomerName>
{$record_10/lastname/text()}
</CustomerName>
<CustomerFirstName>
{$record_10/firstname/text()}
</CustomerFirstName>
<Accounts>
{
for $record_20  in local:getRelated20s($record_10)
return
<Account>
<AccountId>
{$record_20/account_id/text()}
</AccountId>
<Transactions> {
for $record_30 in local:getRelated30s($record_20)
return
<Transaction>
<Label>{$record_30/label/text()}</Label>
<Value>{$record_30/value/text()}</Value>
</Transaction>
} </Transactions>
</Account>
}
</Accounts>
</DataSet>
}
</DataSets>

Once again, thanks to the fact that DataDirect XQuery and XML Converters make it possible to manipulate a wide variety of data sources (even proprietary flat files!), you are able to deal with transformation (and possibly aggregation) problems from an XML point of view, shielding as much as possible those tasks from the low level details of the data sources.

How to Convert Proprietary Flat Files to XML

In conclusion, DataDirect XML Converters and/or DataDirect XQuery greatly simplifies accessing and working with any non-XML data sources directly from your Java or .NET applications.

Office Documents - Using XQuery With ODF & OOXML

A key benefit of ODF and OOXML for developers is the reuse of existing standards

Marc Van Cappellen is a Director of Development at DataDirect Technologies. Marc has more than 15 years of experience in various data access technologies including ODBC, JDBC, ADO, XQJ, SQL and XQuery.

Over the past year we have seen office documents -- spreadsheets, word processing documents, and the like -- taking a more prominent role in business processes, workflows, and vertical applications. While office documents have always been common place, their proprietary binary formats have made them all but unusable to businesses which have sought to integrate office documents in their processes. New office document standards like the OpenDocument Format(ODF) and Office Open XML (OOXML), however, are making office document integration in business processes a reality.

A key benefit of ODF and OOXML for developers is the reuse of existing standards -- in essence, your office documents are XML documents, which makes available a complete palette of tools for manipulating these documents and the information they contain. Using tools and technologies available today, you can transform office documents to HTML or PDF, store them in an XML database, shred their information and store it in a relation databases, embed SOAP messages, enrich them with external information, and so on.

Read the full article: XQuery Your Office Documents

JSON Processing Using XQuery

There has been a point of controversy over whether JSON or XML is better at dealing with communication between endpoints. For documents, XML is certainly the leader, supporting namespaces and mixed content. But for communications between live programs, JSON with its implicit hash and array support.

Where the message content can be handled equally well by either, JSON is more succinct, but not as much as one might think. But because the syntax is simpler, it parses more quickly.

But it doesn't matter, because with DataDirect XML Converters, you can transform JSON or XML with equal facility.

A Simple JSON XQuery

Here is a retrieval http call of calendar data from Google's developer calendar.

doc("converter:JSON? http://www.google.com/calendar/feeds/ developer-calendar@google.com/public/full ?alt=json &amp;orderby=starttime &amp;singleevents=true &amp;sortorder=ascending &amp;futureevents=true &amp;max-results=4")

(In this and several of the following examples, spaces and line breaks are added for clarity.)

The portions in this color are the actual XQuery syntax, which includes escaping the "&" characters. The portion in this color is the call to the converter.

What the above mini XQuery program (yes, it is actually a complete program) does is fetch the calendar data as JSON and convert it to XML. Because we haven't specified any transformations, the content will just be output.

Taking the resulting JSON from the raw http: query would result in something like this (formatted for ease of reading):

...
"entry": [
{
"id": {"_t": "http://www.google.com/calendar/feeds/
developer-calendar%40google.com/public/full/
r53qk5j7ps71oe0u1mim9rcvq0_20080102T030000Z"},
"published": {"_t": "2007-07-16T16:55:58.000Z"},
"updated": {"_t": "2007-11-15T18:19:05.000Z"},
"category": {
},
"title": {
"type": "text",
"_t": "Joomla Bay Area User Group Meeting"
},
...

which is clearly some sort of nested tree structure. Since XML is also a nested tree structure,* and XQuery actually operates on the XQuery Data Model and not on just raw XML bits, using the JSON XML Converter we can transform this on-the-fly to XML input for another application, including the DataDirect XQuery Engine.

(* actually, this looks like XML converted to JSON, which we're going to deconvert.)

This means the XQuery engine effectively "sees" something like the following — but remember, this is a virtual translation, which means there is no separate step that saves something as JSON and then reloads it as XML. It all happens inside of the JSON XML Converter library.

...
<entry>
    <id>
        <_t>http://www.google.com/calendar/feeds/
developer-calendar%40google.com/public/full/
r53qk5j7ps71oe0u1mim9rcvq0_20080102T030000Z</_t>
    </id>
    <published>
        <_t>2007-07-16T16:55:58.000Z</_t>
    </published>
    <updated>
        <_t>2007-11-15T18:19:05.000Z</_t>
    </updated>
    <category>
        <scheme>http://schemas.google.com/g/2005#kind</scheme>
        <term>http://schemas.google.com/g/2005#event</term>
    </category>
    <title>
        <type>text</type>
        <_t>Joomla Bay Area User Group Meeting</_t>
...

Once the data "looks" like XML, XQuery can transform it. So to see all of the names and dates of the next four upcoming entries, the following XQuery program will do just that:

<html><body><table border="1">{(
<tr><th>Meeting</th><th>Starts</th><th>Ends</th></tr>,
for $entry in doc("converter:JSON?http://www.google.com/calendar/feeds/
developer-calendar@google.com/public/full ?alt=json&orderby=starttime
&singleevents=true &sortorder=ascending&
futureevents=true&max-results=4")/feed/entry
return <tr><td>{ data($entry/title/_t) }</td>
<td>{ xs:dateTime($entry/gd_when/startTime) }</td>
<td>{ xs:dateTime($entry/gd_when/endTime) }</td>
</tr>)
}</table>
</body></html>

Which of course produces the following schedule of upcoming Google-related events:

Meeting

Starts

Ends

Joomla Bay Area User Group Meeting

2008-01-01T19:00:00-08:00

2008-01-01T21:00:00-08:00

Kernel Walkthroughs

2008-01-01T19:00:00-08:00

2008-01-01T21:30:00-08:00

Kernel Walkthroughs

2008-01-08T19:00:00-08:00

2008-01-08T21:30:00-08:00

MySQL Community Meetup

2008-01-14T18:30:00-08:00

2008-01-14T20:30:00-08:00


JSON versus XML — A Moot Point

The real advantage of the DataDirect XML Converters is that they make file format issues irrelevant. Just as DataDirect XQuery is a highly conformant W3C XQuery processor that works the same against any of the databases it is connected to — whether OracleIBM DB2SybaseMySQLSQL Server or whatever — so do the XML Converters mask the difference between different storage formats.

So the JSON adherents can have their way, and the XML adherents theirs, but you can bridge the gap and process data from either or both sources without difficulty.

Accessing Legacy Data Sources

By: Ivan Pedruzzi, Product Architect, The DataDirect XML Converters™ Team.

If you have processed XML using Java, you should be familiar with JAXP (Java API for XML Processing). Here is a simple program that processes an XML document using XSLT:

import javax.xml.transform.*;
public class MyTransform {
public static void main(String args[]) throws Throwable
{
StreamSource input = new StreamSource("file:///c:/input.xml");
StreamSource xslt = new StreamSource("file:///c:/tranform.xsl");
StreamResult output = new StreamResult("file:///c:/result.xml");
TransformerFactory tf = TransformerFactory.newInstance();
tf.newTransformer(xslt).transform(input, output);
}
}

Nice and easy if you are dealing with XML documents, but what happens if your data is not all XML? Assume for instance that we need to exchange data with a business partner who requires files using a specific XML dialect, but that our data is tabular and is stored as CSV files.

Accessing Non-XML Files as XML in Java

Here's a sample from our CSV file of used motorcycles inventory:

make,model,year,mileage
BMW,R1150RS,2004,14274
Kawasaki,GPz1100,1996,60234
Ducati,ST2,1997,24000
Moto Guzzi,LeMans,2001,12393
BMW,R1150R,2002,17439
Ducati,Monster,2000,15682
Aprilia,Futura,2001,17320

Our partner requires an XML document that looks like what follows, and has provided an XML Schema so that we can ensure that the XML we provide complies with his requirements:

<?xml version="1.0"?>
<inventory662;
<bike>
<make>BMW</make>
<model>R1150RS</model>
<year>2004</year>
<mileage>14274</mileage>
</bike>
</inventory>

The first step is simply to create an XML representation of our CSV data.

01.<?xml version="1.0"?>
02.<table>
03.<row>
04.<make>BMW</make>
05.<model>R1150RS</model>
06.<year>2004</year>
07.<mileage>14274</mileage>
08.</row>
09.<row>
10.<make>Kawasaki</make>
11.<model>GPz1100</model>
12.<year>1996</year>
13.<mileage>60234</mileage>
14.</row>
15.<row>
16.<make>Ducati</make>
17.<model>ST2</model>
18.<year>1997</year>
19.<mileage>24000</mileage>
20.</row>
21.<row>
22.<make>Moto Guzzi</make>
23.<model>LeMans</model>
24.<year>2001</year>
25.<mileage>12393</mileage>
26.</row>
27.<row>
28.<make>BMW</make>
29.<model>R1150R</model>
30.<year>2002</year>
31.<mileage>17439</mileage>
32.</row>
33.<row>
34.<make>Ducati</make>
35.<model>Monster</model>
36.<year>2000</year>
37.<mileage>15682</mileage>
38.</row>
39.<row>
40.<make>Aprilia</make>
41.<model>Futura</model>
42.<year>2001</year>
43.<mileage>17320</mileage>
44.</row>
45.</table>

Though some of the specifics vary (additional parent elements, different element names, and so on), because the data is now in XML format, XSLT can be used to create the final XML result.

So, that's the theory (CSV to> XML to> XSLT to> XML), but how do you make it all work? Even assuming you have an XML document that represents data stored in CSV format, the prospect of writing XSLT can be daunting. It might sound intimidating, but using DataDirect XML Converters™ we can create solid, complex, bug-free XSLT in just a few steps.

  • Create a new XSLT Mapper document. The XSLT Mapper generates XSLT code based on the maps you draw between source and target documents.
  • Use the inventory CSV file as the XSLT source. DataDirect XML Converters™ lets us convert it to XML on the fly by selecting user-defined or built-in converters, as shown here:


  • Use the XML schema representing the XML format required by our partner as target
  • Create a map between the relevant elements in the converted CSV source and the XML Schema using simply drag-and-drop
  • Click Preview Result to view the XML document that results from processing the XML document (converted from CSV) with the XSLT stylesheet we designed in the XSLT Mapper

You know what they say ... a picture is worth 1000 words.

DataDirect XML Converters™ did most of the work for us — converting CSV to XML, writing XSLT code based on simple mappings — but what happens if we need to perform the same steps periodically, perhaps as part of larger application so that we can automate the process of exchanging inventory information with our partner? The answer lies in common URLs, and DataDirect XML Converters™'s uncommon Java runtime library.

Automating Non-XML Data Access in Java Applications

URL stands for Uniform Resource Locator. URLs allow us to use a symbolic name to reach a document that either exists somewhere or gets created dynamically when the address represented in the URL is hit. If you look closely inside the DataDirect XML Converters™ Project window, you quickly realize how adapters work — an adapter is represented as a URL, like FTP, HTTP, FILE, and so on.

Let's take a closer look at an adapter's components, using the following adapter as an example:

adapter:CSV:newline=crlf:sep=,:first=yes:escape=\:quotes='"?file:///c:/one.csv
  • adapter: URLs always start with a scheme like HTTP. DataDirect XML Converters™ recognizes a special type of URL, one that uses adapter as its scheme.
  • CSV: Defines the converter we want to invoke.
  • newline=crlf:sep=,:first=yes:escape=\:quotes='": Parameters are specific to individual converters. For instance, in our CSV converter, first=yes instructs DataDirect XML Converters™ to use the first row of the CSV file as list column titles, which in turn are used to provide element names in the final XML document.
  • ?file:///c:/one.csv The name and location of the input file, specified as a URL, that we want to convert. If the adapter URL was universally understood by the Java URL class, we could use it as we would any other URL (like http://www.stylusstudio.com/press/news.rss, for example).

Using The DataDirect XML Converters™ Runtime Library

Let's go back to the original JAXP example. Wouldn't it be cool if we could use the adapter URL to specify the argument for the Java StreamSource method — that is, to effectively use a CSV file as a source for an XSLT transformation, and do it all programmatically? Thanks to the DataDirect XML Converters™ runtime library, we can!

01.import javax.xml.transform.*;
02.import javax.xml.transform.stream.*;
03.
04.// here, we have included the DataDirect XML Converters™ runtime library
05.import com.exln.stylus.io.*;
06.
07.
08.public class MyTransform {
09.public static void main(String args[]) throws Throwable
10.{
11.// we use the createSource method from the DataDirect
12.// XML Converters™ runtime library, which understands
13.// the DataDirect XML Converters™ adapter URL syntax
14.StreamSource input = StylusFileFactory.createSource
15.("adapter:CSV:newline=crlf:sep=,:first=yes:escape=\\:quotes='\"?
16.file:///c:/one.csv", "", true);
17.StreamSource xslt = new StreamSource("file:///c:/tranform.xsl");
18.StreamResult output = new StreamResult("file:///c:/result.xml");
19.TransformerFactory tf = TransformerFactory.newInstance();
20.tf.newTransformer(xslt).transform(input, output);
21.}
22.}

JAXP defines a clever mechanism that allows XSLT processors to delegate URL resolution to an external component called theURIResolver. The XSLT processor invokes the URIResolver to turn a URL used in document()xsl:import, or xsl:includeinstructions into a Source object.

The StylusFileFactory class in the DataDirect XML Converters™ runtime library implements the URIResolver interface, allowing adapter URLs to be resolved in the same fashion as standard URLs. Here is an XSLT fragment that shows how a CSV file is accessed as an XML document through the document() function:

1.<xsl:for-each select="document('
2.adapter:CSV:newline=crlf:sep=,:first=yes:
3.escape=\:quotes='"?file:///c:/one.csv')/table/row"> <xsl:copy-of select="."/> </xsl:for-each>

To register a URIResolver, we use the following code:

1.TransformerFactory tf = TransformerFactory.newInstance();
2.Transformer t = tf.newTransformer(xslt);
3.t.setURIResolver(StylusFileFactory.getFactory());
4.t.transform(input, output);

We hope you enjoyed this overview of DataDirect XML Converters™'s runtime library. Happy Java programming!

Ivan Pedruzzi, The DataDirect XML Converters™ Team

TUTORIAL

Working With Flat Files, JSON and other data sources

jdbc

DataDirect
XQuery Connectors

XQuery Integration to Various Data Sources

community

DataDirect
XQuery Tutorials

Discussions, tips and tricks for
DataDirect Connect drivers