Guest Post: 5X performance increase with Teradata's TPT ODBC Operator DataBlockSize

Guest Post: 5X performance increase with Teradata's TPT ODBC Operator DataBlockSize

August 29, 2013 0 Comments

To follow-up on my previous article on Teradata Parallel Transporter (TPT), I invited Jack Harris from WhereScape to share his latest performance tuning tips with TPT and DataDirect Connect for ODBC drivers certified for use by Teradata.  I have worked with Jack on several projects for WhereScape Red shops running Teradata TPT with DataDirect connectivity to build/renovate data warehouses and marts.

Jack's contribution is specific to the DataDirect Connect for ODBC Oracle Wire Protocol driver, however ODBC performance tuning properties are available across all drivers used in TPT projects including SQL Server, Sybase, DB2, MySQL, Postgres, Salesforce, Informix, Hadoop Hive, and Cloudera Impala.

I've seen the Wherescape Wizard/Magician perform at different conferences and turn $1 bills into $100s right before my eyes; but I've got to say I'm even more impressed with Jack's performance results.

WhereScape Wizard at TDWI

Introducing Jack Howard from WhereScape:

One of the most exciting enhancements to Teradata’s TPT ODBC Operator was released in Version 14.0, efix 5.   While DataBlockSize may not cause you to wake up your wife in the middle of the night to make this announcement, it happened at my house.  The ODBC Operator is a vital member in a Teradata data loading strategy because data is never landed – among other issues, you can provision data directly into the database without concern for those pesky Carriage Return or Line Feed Characters one may encounter loading data from a file.

At the core of TPT’s architecture are Operators, and the behavior of an operator is modified by one or more Attributes.  The ability to add Attributes enables TPT to provide new features more easily than any other data provisioning technology available today.

Prior to the availability of the DataBlockSize attribute in the ODBC Operator, buffers were consumed in 64KB blocks, regardless of the size provided by the ODBC driver.  This had an adverse impact on performance – which in turn lead to a variety of perceptions, some of which were not founded in fact.

With the DataBlockSize attribute, you may now change the size of the blocks being read by the ODBC Operator to match the size of the block provided by the driver.   With the Oracle ODBC Wire Protocol driver, ArraySize is provided in bytes.  The DataBlockSize parameter reads kilobytes, so be sure to make the conversion to match these up correctly.

The TPT release note from Teradata provides some additional detail:

This optional attribute that allows the user to dynamically adjust the data block size of the buffer that will hold multiple rows with a single fetch call.  Additionally, the size of the row will affect the number of rows that can be fetched into that data block. End-users and customers will now have the ability to fine tune ODBC Operator performance by directly assigning a number value to the attribute.

In testing I’ve done with our customers, elapsed time has been reduced by over 80%.  This kind of improvement is exactly why you wake up your household, regardless of the time of day.  But, we know whenever a bottle neck is removed, new constraints to performance will become visible.

While the Oracle ArraySize parameter can be set to 4,294,967,296 bytes, we’ve found that probably won’t lead to optimal throughput.  Here are some rules of thumb to keep in mind:

  • We’ve yet to see a benefit of ArraySize & DataBlockSize being configured beyond 2048KB with the Oracle Wire Protocol driver.  In most cases, a size between 1024KB and 2048KB has been optimal.
  • This isn’t magic – if you have extremely large tables you need to ensure the SQL request to the database is supported with Where clause predicates that engage partitions, indexes, or other methods that can enhance database throughput.  Do some test queries in your favorite query tool to make sure you understand the access plan to the source data.  If “Select * From YourTable” doesn’t return in your query tool, don’t expect anything different.
  • Think about setting QueryTimeout – this parameter will tell the ODBC driver to not timeout if the value is set to -1.  My suggestion is to set this value only after it’s proven to be needed.

 

Jack

Jack Howard is a Principal Architect for Teradata Solutions, WhereScape USA.

Bio: Data Architect focused on data warehouse solutions for the last twenty years, on Teradata.  In addition to Teradata, I’ve provided design, development and strategy services on Oracle, DB2, SQL Server architectures for the last thirty years at a variety of Fortune 500 companies.  My last position prior to joining WhereScape was a Center of Excellence Architect for the Teradata Corporation.

Sumit Sakar

Sumit Sarkar

Technology researcher, thought leader and speaker working to enable enterprises to rapidly adopt new technologies that are adaptive, connected and cognitive. Sumit has been working in the data access infrastructure field for over 10 years servicing web/mobile developers, data engineers and data scientists. His primary areas of focus include cross platform app development, serverless architectures, and hybrid enterprise data management that supports open standards such as ODBC, JDBC, ADO.NET, GraphQL, OData/REST. He has presented dozens of technology sessions at conferences such as Dreamforce, Oracle OpenWorld, Strata Hadoop World, API World, Microstrategy World, MongoDB World, etc.

Comments
Comments are disabled in preview mode.
Topics
Latest Stories
in Your Inbox

Subscribe to get all the news, info and tutorials you need to build better business apps and sites

More From Progress
autonomousrest
Creating Quick, Codeless Connectivity with Autonomous REST Connector
Read More
 
kinvey-console-thumbnail
Getting Started with Kinvey
Read More
 
openedge-12-overview-datasheet
OpenEdge 12.0 - Powering Tomorrow's Evolution
Read More