To follow-up on my previous article on Teradata Parallel Transporter (TPT), I invited Jack Harris from WhereScape to share his latest performance tuning tips with TPT and DataDirect Connect for ODBC drivers certified for use by Teradata. I have worked with Jack on several projects for WhereScape Red shops running Teradata TPT with DataDirect connectivity to build/renovate data warehouses and marts.
Jack's contribution is specific to the DataDirect Connect for ODBC Oracle Wire Protocol driver, however ODBC performance tuning properties are available across all drivers used in TPT projects including SQL Server, Sybase, DB2, MySQL, Postgres, Salesforce, Informix, Hadoop Hive, and Cloudera Impala.
I've seen the Wherescape Wizard/Magician perform at different conferences and turn $1 bills into $100s right before my eyes; but I've got to say I'm even more impressed with Jack's performance results.
One of the most exciting enhancements to Teradata’s TPT ODBC Operator was released in Version 14.0, efix 5. While DataBlockSize may not cause you to wake up your wife in the middle of the night to make this announcement, it happened at my house. The ODBC Operator is a vital member in a Teradata data loading strategy because data is never landed – among other issues, you can provision data directly into the database without concern for those pesky Carriage Return or Line Feed Characters one may encounter loading data from a file.
At the core of TPT’s architecture are Operators, and the behavior of an operator is modified by one or more Attributes. The ability to add Attributes enables TPT to provide new features more easily than any other data provisioning technology available today.
Prior to the availability of the DataBlockSize attribute in the ODBC Operator, buffers were consumed in 64KB blocks, regardless of the size provided by the ODBC driver. This had an adverse impact on performance – which in turn lead to a variety of perceptions, some of which were not founded in fact.
With the DataBlockSize attribute, you may now change the size of the blocks being read by the ODBC Operator to match the size of the block provided by the driver. With the Oracle ODBC Wire Protocol driver, ArraySize is provided in bytes. The DataBlockSize parameter reads kilobytes, so be sure to make the conversion to match these up correctly.
The TPT release note from Teradata provides some additional detail:
This optional attribute that allows the user to dynamically adjust the data block size of the buffer that will hold multiple rows with a single fetch call. Additionally, the size of the row will affect the number of rows that can be fetched into that data block. End-users and customers will now have the ability to fine tune ODBC Operator performance by directly assigning a number value to the attribute.
In testing I’ve done with our customers, elapsed time has been reduced by over 80%. This kind of improvement is exactly why you wake up your household, regardless of the time of day. But, we know whenever a bottle neck is removed, new constraints to performance will become visible.
While the Oracle ArraySize parameter can be set to 4,294,967,296 bytes, we’ve found that probably won’t lead to optimal throughput. Here are some rules of thumb to keep in mind:
Jack Howard is a Principal Architect for Teradata Solutions, WhereScape USA.
Bio: Data Architect focused on data warehouse solutions for the last twenty years, on Teradata. In addition to Teradata, I’ve provided design, development and strategy services on Oracle, DB2, SQL Server architectures for the last thirty years at a variety of Fortune 500 companies. My last position prior to joining WhereScape was a Center of Excellence Architect for the Teradata Corporation.
Sumit Sarkar is a Chief Data Evangelist at Progress, with over 10 years experience working in the data connectivity field. The world's leading consultant on open data standards connectivity with cloud data, Sumit's interests include performance tuning of the data access layer for which he has developed a patent pending technology for its analysis; business intelligence and data warehousing for SaaS platforms; and data connectivity for aPaaS environments, with a focus on standards such as ODBC, JDBC, ADO.NET and ODATA. He is an IBM Certified Consultant for IBM Cognos Business Intelligence and TDWI member. He has presented sessions on data connectivity at various conferences including Dreamforce, Oracle OpenWorld, Strata Hadoop, MongoDB World and SAP Analytics and Business Objects Conference, among many others.
Copyright © 2018 Progress Software Corporation and/or its subsidiaries or affiliates.
All Rights Reserved.
Progress, Telerik, and certain product names used herein are trademarks or registered trademarks of Progress Software Corporation and/or one of its subsidiaries or affiliates in the U.S. and/or other countries. See Trademarks for appropriate markings.