JDBC TUTORIAL

Querying external tables in Hive using native JDBC Storage Handler

Updated: 26 Feb 2021

Introduction

With the release of Hive 2.3.0, Hive has introduced a new feature called a JDBC Storage Handler. The idea is to use a generic JDBC driver so that you could import your table in to Hive and run Hive QL queries to analyze the data that resided in JDBC tables by joining data from other systems, imported using JDBC Storage handler.

In this tutorial, we will walk you through on how you can use Hive’s native JDBC storage handler to query the data in Relational database – MySQL and SaaS data – Salesforce from Hive. We will be using high-performant Progress DataDirect MySQL and Salesforce drivers to enable this.

Prerequisites

The following software is required:

Hive 2.3.0 – Minimum requirement
Have a Salesforce account or MySQL database.

Install Progress DataDirect MySQL and Salesforce JDBC drivers

Download Progress DataDirect Salesforce JDBC driver.
Install the Salesforce JDBC drivers by running the installer. Note that the installer will also install MySQL drivers as a part of the trial.

Querying MySQL tables from Hive

Add the MySQL JDBC driver (mysql.jar) found in the install path (home/Progress/DataDirect/Connect_for_JDBC_51/lib) to Hive lib folder or in hive prompt, run the following command to add the driver for that session.

add jar </path/to/mysql.jar>
Create an external table for your MySQL table as shown below. Replace the query, connection configurations, column mappings and the create statement as per your needs.

CREATE EXTERNAL TABLE Album

(

Album_Id int,

AlbumName String,

Artist_Id STRING

)

STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'

TBLPROPERTIES (

"hive.sql.database.type" = "MYSQL",

"hive.sql.jdbc.url" = "jdbc:datadirect:mysql://<;host>:3306;User=<user>;Password=<pass>",

"hive.sql.jdbc.driver" = "com.ddtek.jdbc.mysql.MySQLDriver",

"hive.sql.query" = "SELECT AlbumId, Title, ArtistId FROM odbc01.Album",

"hive.sql.column.mapping" = "Album_Id=AlbumId, AlbumName=Title, Artist_Id=ArtistId",

"hive.sql.dbcp.maxActive" = "1"

);
Once your external table is created, you are ready to query your MySQL table from Hive. You should see results in your Hive as shown below.

Querying Salesforce tables from Hive

Add the Salesforce JDBC driver (sforce.jar) found in the install path (home/Progress/DataDirect/Connect_for_JDBC_51/lib) to Hive lib folder or in hive prompt, run the following command to add the driver for that session.

add jar </path/to/sforce.jar>
Create an external table for your Salesforce table as shown below. Replace the query, connection configurations, column mappings and the create statement as per your needs.

CREATE EXTERNAL TABLE Account

(

ROWID String,

SYS_NAME String,

TYPE STRING,

INDUSTRY STRING,

ACCOUNTSOURCE STRING

)

STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'

TBLPROPERTIES (

"hive.sql.database.type" = "MYSQL",

"hive.sql.jdbc.url" = "jdbc:datadirect:sforce://login.salesforce.com;User=<user>; Password=<Pass>;SecurityToken=<Token>",

"hive.sql.jdbc.driver" = "com.ddtek.jdbc.sforce.SForceDriver",

"hive.sql.query" = "SELECT ROWID,SYS_NAME,TYPE,INDUSTRY,ACCOUNTSOURCE FROM SFORCE.ACCOUNT",

"hive.sql.column.mapping" = "ROWID=ROWID, SYS_NAME=SYS_NAME, TYPE=TYPE, INDUSTRY=INDUSTRY, ACCOUNTSOURCE=ACCOUNTSOURCE",

"hive.sql.dbcp.maxActive" = "1"

);

Once your external table is created, you are ready to query your Salesforce table from Hive. You should see results in your Hive as shown below.

You can create external tables for tables in any data source using Progress JDBC drivers and query your data from Hive using its native JDBC storage handler. We hope this tutorial helped you to work with Hive’s native JDBC storage handler. Feel free to comment if you have any questions.

Querying external tables in Hive using native JDBC Storage Handler

Introduction

Prerequisites

Querying MySQL tables from Hive

Querying Salesforce tables from Hive

Contents

Connect any application to any data source anywhere

Need additional help with your product?

MarkLogic

Semaphore

OpenEdge

DataDirect

Sitefinity

Telerik

Kendo UI

Corticon

DataDirect

MOVEit

Chef

Flowmon

Kemp LoadMaster

WhatsUp Gold

Telerik

Kendo UI

Fiddler

Test Studio

MOVEit

WS_FTP

Querying external tables in Hive using native JDBC Storage Handler

Introduction

Prerequisites

Querying MySQL tables from Hive

Querying Salesforce tables from Hive

Contents

Connect any application to any data source anywhere

Need additional help with your product?