Connecting to Amazon Redshift from R via JDBC Driver

Introduction

Amazon Redshift is a fast, scalable, fully managed data warehouse that organizations use to analyze petabyte-scale data, with advanced security features built in. R is a language used by statisticians and data scientists for statistical computing, graphics, and data analysis, which makes it a natural companion to Redshift for analyzing warehouse data.

So, how do you start? You can use the Progress DataDirect Amazon Redshift JDBC driver to connect to Redshift from R, and this tutorial walks you through the steps.

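Prerequisites

An Amazon Redshift cluster that is publicly accessible or accessible within your VPC.
Sample data loaded into the Redshift cluster. You can follow these steps to load data into your cluster; the queries in this tutorial use the venue table from the TICKIT sample database.
R, which you can download from here.
RStudio, which you can download from here.
A Java runtime, which the RJDBC package requires through its rJava dependency.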

Installing Progress DataDirect Redshift JDBC Driver

Download the Progress DataDirect Amazon Redshift JDBC driver from the Progress website.
When the download completes, unzip PROGRESS_DATADIRECT_JDBC_REDSHIFT_WIN.zip to a folder.
Double-click PROGRESS_DATADIRECT_JDBC_INSTALL.exe to start the installer. Follow the prompts, and when asked for the type of installation, choose Evaluation Installation.
If you want to change the installation folder, you can do so during installation. Make a note of the folder, because you will need the path to redshift.jar when you register the driver in R.
Click Next to continue, and on the Install Summary page, click Install.
Exit the installer once the installation is complete.
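If you installed to a different folder, the path to redshift.jar will not match the default. Here is a minimal sketch for locating the jar from R. It assumes the jar sits somewhere under the installation folder, and the default folder below is an assumption taken from the path used when registering the driver. find_redshift_jar is a hypothetical helper, not part of RJDBC or the DataDirect installer.

```r
# Hypothetical helper: search an installation folder (recursively) for the
# DataDirect Redshift driver jar and return its full path with forward slashes.
# The default folder is an assumption; pass your own if you changed it.
find_redshift_jar <- function(install_dir = "C:/Program Files/Progress/DataDirect") {
  jars <- list.files(install_dir, pattern = "^redshift\\.jar$",
                     recursive = TRUE, full.names = TRUE)
  if (length(jars) == 0) {
    stop("redshift.jar not found under ", install_dir)
  }
  # If more than one copy is installed, use the first match
  normalizePath(jars[1], winslash = "/")
}
```

You could then pass find_redshift_jar() as the classPath argument of JDBC() instead of typing the path by hand.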

Querying Redshift from R

Open RStudio and run the following command in the console to install the RJDBC package, which lets you connect to any database that has a JDBC driver.

install.packages("RJDBC", dependencies = TRUE)
Once the installation is complete, load the RJDBC package by running the following command.

library(RJDBC)
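Run the following commands to register the driver and connect to your Redshift cluster. Replace <hostname> with your cluster endpoint, and replace <user> and <password> with your database credentials.

# Use forward slashes in the path (a backslash starts an escape sequence in R strings); adjust it if you changed the installation folder.
# Redshift quotes identifiers with double quotes, so pass "\"" as identifier.quote.
drv <- JDBC("com.ddtek.jdbc.redshift.RedshiftDriver", "C:/Program Files/Progress/DataDirect/JDBC_51/lib/redshift.jar", identifier.quote = "\"")

conn <- dbConnect(drv, "jdbc:datadirect:redshift://<hostname>:5439;DatabaseName=dev", "<user>", "<password>")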
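Hard-coding a password in a script makes it easy to leak, and connections that are never closed hold resources on the cluster. Here is a minimal sketch that builds the connection URL from its parts and reads credentials from environment variables. It assumes the DataDirect URL format used above. The variable names REDSHIFT_USER and REDSHIFT_PASSWORD are hypothetical; use whatever your environment defines. build_redshift_url and with_redshift are illustrative helpers, not RJDBC functions. with_redshift closes the connection with dbDisconnect() when the work finishes, even if a query fails.

```r
# Hypothetical helper: assemble a DataDirect Redshift JDBC URL in the format
# jdbc:datadirect:redshift://host:port;DatabaseName=db
build_redshift_url <- function(host, port = 5439, database = "dev") {
  sprintf("jdbc:datadirect:redshift://%s:%d;DatabaseName=%s",
          host, as.integer(port), database)
}

# Hypothetical helper: connect, run fun(conn), and always disconnect.
# Credentials are read from the assumed REDSHIFT_USER / REDSHIFT_PASSWORD variables.
with_redshift <- function(drv, host, fun, port = 5439, database = "dev") {
  user <- Sys.getenv("REDSHIFT_USER")
  password <- Sys.getenv("REDSHIFT_PASSWORD")
  if (user == "" || password == "") {
    stop("Set REDSHIFT_USER and REDSHIFT_PASSWORD before connecting")
  }
  conn <- DBI::dbConnect(drv, build_redshift_url(host, port, database),
                         user, password)
  # on.exit() runs whether fun() returns normally or raises an error
  on.exit(DBI::dbDisconnect(conn), add = TRUE)
  fun(conn)
}
```

For example, after registering drv as above, with_redshift(drv, "<hostname>", function(conn) dbGetQuery(conn, "select count(*) from venue")) runs one query and then closes the connection.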
Now that you have a connection to Redshift, you can run any SQL query you like. Here are some examples for reference.

# List all tables

dbListTables(conn)

# List user tables (tables in the public schema)

dbGetQuery(conn, "SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'")

# Run simple queries

dbGetQuery(conn, "select count(*) from venue")

dbGetQuery(conn, "select * from venue where venueseats > 30000")
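When a filter value comes from a variable or user input, pasting it into the SQL string invites quoting bugs and SQL injection. RJDBC can instead bind values to ? placeholders: extra arguments passed to dbGetQuery() are sent as parameters of a JDBC prepared statement. Here is a minimal sketch that wraps the venue query above in a hypothetical helper, venues_with_min_seats. It assumes venueseats is an integer column, as it is in the TICKIT sample data.

```r
# Hypothetical helper: return venues with more than `min_seats` seats.
# The value is bound to the ? placeholder by RJDBC rather than pasted
# into the SQL text; as.integer() matches the integer venueseats column.
venues_with_min_seats <- function(conn, min_seats) {
  DBI::dbGetQuery(conn,
                  "select * from venue where venueseats > ?",
                  as.integer(min_seats))
}
```

With an open connection, venues_with_min_seats(conn, 30000) should return the same rows as the literal query above.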

# Read an entire table into an R data frame

venues <- dbReadTable(conn, "venue")
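Because dbReadTable() returns an ordinary R data frame, the rest of the analysis can use R's own tools. Here is a minimal sketch of a first analysis step. It assumes the venue table from the TICKIT sample data, with columns venuestate and venueseats (Redshift returns column names in lowercase). seats_by_state is a hypothetical helper that totals seating capacity per state, largest first.

```r
# Hypothetical helper: total seating capacity per state, sorted descending.
# The formula interface of aggregate() uses na.action = na.omit by default,
# so rows with a missing venueseats value are left out of the totals.
seats_by_state <- function(venues) {
  totals <- aggregate(venueseats ~ venuestate, data = venues, FUN = sum)
  totals <- totals[order(-totals$venueseats), ]
  rownames(totals) <- NULL
  totals
}
```

For example, head(seats_by_state(venues)) shows the states with the most total seating.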

Get Started

Whether you are loading data into Redshift or pulling data out of it, DataDirect offers connectivity to Amazon Redshift that blows the competition away. To learn more about DataDirect Redshift driver performance, read this blog about Sumit Sarkar’s Redshift million row challenge. For the challenge, he loaded 1 million records in a live demo at OOW14 in under 10 minutes, without staging the data in S3! Grab a free trial of our Amazon Redshift JDBC driver and get the most out of your Redshift data today.


