
DataDirect Redshift Drivers: Million Row Challenge

by Sumit Sarkar Posted on February 09, 2015

DataDirect tests a bulk load of one million rows into Amazon Redshift.

Can DataDirect Improve Data Performance?

The challenge? Load one million rows of data into Amazon Redshift, a process that takes about six hours with the open source Postgres ODBC driver, in less than one hour. And do it with my usual tools and no help from an Amazon S3 bucket. That was the challenge I accepted last October, and I'll admit it was a little daunting.

The reason for this challenge is simple: until now, the only practical way to bulk-load data into Redshift was to stage it in Amazon S3 buckets. That makes loading data into Redshift an isolated, time-consuming and frustrating process that sits outside the usual workflows. This inefficiency just won't cut it in today's performance-driven world. I wanted to prove that loading could be much faster than people realize and could fit easily into your daily workflow.

Bulk-Loading Data at Lightning Speed

So, how did we do? Pretty well, I’d say. Using Progress® DataDirect® drivers, we were able to cut the time to load one million rows of data from six hours down to only eight minutes.
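For scale, the reported times work out as follows, taking the six hours and eight minutes above at face value and rounding the per-second rates:

```latex
\[
\text{speedup} = \frac{6\ \text{h}}{8\ \text{min}} = \frac{360\ \text{min}}{8\ \text{min}} = 45
\]
\[
\frac{10^{6}\ \text{rows}}{21\,600\ \text{s}} \approx 46\ \text{rows/s}
\quad\longrightarrow\quad
\frac{10^{6}\ \text{rows}}{480\ \text{s}} \approx 2\,083\ \text{rows/s}
\]
```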

Your Step-by-Step Instructions

Getting started really is as simple as downloading the Progress® DataDirect® Amazon Redshift ODBC or JDBC driver. In my demo I used Oracle Data Integrator, but the same drivers work with many other tools, including:

Microsoft SSIS
IBM DataStage
Informatica PowerCenter
Ab Initio
SAP Data Services
Pentaho Data Integration
Talend
Syncsort DMExpress
QlikView Expressor
SAS ETL
Actian DataConnect

[Figure: DDL for target supplier table]

Once you’ve chosen your tool, just follow these steps:

1. Obtain Amazon Redshift credentials or sign up for a free trial: http://aws.amazon.com/redshift/free-trial/
2. Download a free trial of the DataDirect Amazon Redshift ODBC driver or the DataDirect Amazon Redshift JDBC driver.
3. Connect to Amazon Redshift and run the DDL for the target supplier table shown above.
4. Download a CSV source file with sample data.
5. Build a basic workflow that loads data from your CSV file into Amazon Redshift through the DataDirect driver.
6. Run the workflow.
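For readers who would rather script the connect-and-load steps than build them in an ETL tool, here is a minimal sketch. It is not the workflow from the demo. It assumes a Python DB-API 2.0 connection opened through an ODBC driver (for example with the third-party pyodbc package, which uses `?` placeholders), and the `load_csv_in_batches` function and the `supplier` column names are hypothetical placeholders, since the original DDL is only available as an image. It shows the generic pattern of sending parameterized INSERT statements in batches and committing once, which is not necessarily how the DataDirect driver loads data internally.

```python
import csv
from itertools import islice

# Hypothetical target table and columns (placeholders, not the original DDL).
TABLE = "supplier"
COLUMNS = ["s_suppkey", "s_name", "s_address", "s_nationkey",
           "s_phone", "s_acctbal", "s_comment"]


def load_csv_in_batches(conn, csv_path, table=TABLE, columns=COLUMNS,
                        batch_size=10_000, has_header=True):
    """Insert rows from csv_path into table in parameterized batches.

    conn is assumed to be any DB-API 2.0 connection whose driver uses the
    '?' (qmark) parameter style. Returns the number of rows sent.
    """
    # table and columns are interpolated into the SQL text, so pass only
    # trusted identifiers; the row values themselves are bound as parameters.
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    total = 0
    cur = conn.cursor()
    try:
        with open(csv_path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            if has_header:
                next(reader, None)  # skip the header row
            rows = (row for row in reader if row)  # skip blank lines
            while True:
                batch = list(islice(rows, batch_size))  # at most batch_size rows in memory
                if not batch:
                    break
                cur.executemany(sql, batch)  # one call per batch of parameter sets
                total += len(batch)
        conn.commit()  # single commit once every batch has been sent
    finally:
        cur.close()
    return total
```

Reading through `islice` keeps at most one batch in memory regardless of file size, and because the commit happens only at the end, a load that fails partway does not commit a partial table (assuming the connection is not in autocommit mode).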

The Results Are In

The following images show sample results from Microsoft SQL Server Integration Services 2012 (SSIS): the load finished in less than 10 minutes, compared with six hours using the open source Postgres ODBC driver.

[Fig. 1: Million Row Challenge results for Amazon Redshift]

[Fig. 2: Data task workflow and validation]
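Fig. 2 mentions validating the load. One simple check, sketched here under the same assumptions (a Python DB-API 2.0 connection, hypothetical function names, and not necessarily the validation used in the demo), compares the number of data rows in the source CSV file with `COUNT(*)` on the target table after the workflow runs.

```python
import csv


def count_csv_rows(csv_path, has_header=True):
    """Count non-blank CSV records (records, not physical lines)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        n = sum(1 for row in csv.reader(f) if row)
    return max(n - 1, 0) if has_header else n


def validate_load(conn, csv_path, table, has_header=True):
    """Return (expected, actual, ok) comparing CSV rows with COUNT(*).

    conn is assumed to be any DB-API 2.0 connection; table is interpolated
    into the SQL text, so pass only a trusted identifier.
    """
    expected = count_csv_rows(csv_path, has_header)
    cur = conn.cursor()
    try:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        actual = cur.fetchone()[0]
    finally:
        cur.close()
    return expected, actual, expected == actual
```

A matching count is only a first-pass check; it will not catch rows whose values were truncated or converted during the load.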

Webinar: Get Tips on Better Database Performance

This tutorial shows one way to get a massive improvement in data connectivity performance, but it's just one insider tip from Progress DataDirect. To discover more ways to improve performance, register for the February 11 webinar, Industry Insight: Optimize Your Data for Better Performance. We look forward to seeing you there! If you want to get started now, get your free ODBC driver trial.

I also talk about the challenge in this video:


Sumit Sarkar

Technology researcher, thought leader and speaker working to enable enterprises to rapidly adopt new technologies that are adaptive, connected and cognitive. Sumit has been working in the data access infrastructure field for over 10 years, serving web and mobile developers, data engineers and data scientists. His primary areas of focus include cross-platform app development, serverless architectures, and hybrid enterprise data management that supports open standards such as ODBC, JDBC, ADO.NET, GraphQL and OData/REST. He has presented dozens of technology sessions at conferences such as Dreamforce, Oracle OpenWorld, Strata Hadoop World, API World, MicroStrategy World and MongoDB World.

Related Tags

Amazon data JDBC load time million row challenge ODBC Redshift SSIS
