Connecting to Amazon Redshift from R via JDBC Driver

Introduction

Amazon Redshift is a scalable, fully-managed and fast data warehouse used by organizations/users to analyze data in Petabyte scale with advanced security features built-in. R is a language used by statisticians and data scientists for statistical computing, graphical capabilities and data analysis. This makes R a perfect language that can be used along with Redshift for data analysis.

So, how do you start? Well, for beginners you can use Progress DataDirect Amazon Redshift JDBC driver to connect to Redshift from R and this tutorial will walk you through the steps.

Prerequisites

  • Amazon Redshift Cluster that is publicly accessible or accessible with in your VPC
  • Load sample data in to the Redshift cluster. You can follow these steps to load data in to your cluster
  • Install R language by downloading it from here.
  • Install R Studio by downloading it from here.

Installing Progress DataDirect Redshift JDBC Driver

  1. Download the Redshift JDBC driver from the website.
  2. After the download has completed, unzip the package PROGRESS_DATADIRECT_JDBC_REDSHIFT_WIN.zip to extract the files to a folder.
  3. Double click on the PROGRESS_DATADIRECT_JDBC_INSTALL.exe to start the installer. Follow the prompts on the installer and when prompted about type of installation, choose Evaluation Installation.
  4. If you would like to change the installation folder, you can do so during the installation process.
  5. Complete the installation, by click on Next and on the Install Summary page, click on Install.
  6. Exit the installer, once the installation is complete.

Querying Redshift from R

  1. Open R Studio and on the console, run the following command to install RJDBC package, which allows you to connect to any database through JDBC.

     

    install.packages("RJDBC",dep=TRUE)

     

  2. Once the installation is complete, load the RJDBC package by running following command.

     

    library(RJDBC)

     

  3. Run the following commands to register the driver and connect to your Redshift instance.                                                                                                                                                                                                                                       
    drv <- JDBC("com.ddtek.jdbc.redshift.RedshiftDriver", "C:\Program Files\Progress\DataDirect\JDBC_51\lib\redshift.jar", identifier.quote="`")
    conn <- dbConnect(drv, "jdbc:datadirect:redshift://<;hostname>:5439;DatabaseName=dev", "<user>", "<password>")

     

  4. Now that you have established a connection with Redshift, you can run any SQL query that you would like to do. Following are some examples for your reference.

//List all tables
dbListTables(conn)
 
//List User tables
dbGetQuery(conn, "SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'")
 
 
//Executing simple queries
dbGetQuery(conn, "select count(*) from venue")
dbGetQuery(conn, "select * from venue where venueseats > 30000")
 
 
//Reading a table
venues <- dbReadTable(conn, "venue")

Get Started

Whether you are loading data into or pulling data out, DataDirect offers connectivity to Amazon Redshift that blows the competition away. To learn more about DataDirect Redshift driver performance, read this blog about Sumit Sarkar’s Redshift million row challenge.  For the challenge, he loaded 1 million records in a live demo at OOW14, in under 10 mins, without staging the data in S3! Grab a free trial of our Amazon Redshift JDBC driver and get the most out of your Redshift data today.
JDBC TUTORIAL

Connecting to Amazon Redshift from R via JDBC Driver

DataDirect
ODBC connectors

Real-Time Access to XML Data

DataDirect
ODBC Tutorials

Discussions, tips and tricks for
DataDirect Connect drivers