With the release of Hive 2.3.0, Hive has introduced a new feature called a JDBC Storage Handler. The idea is to use a generic JDBC driver so that you could import your table in to Hive and run Hive QL queries to analyze the data that resided in JDBC tables by joining data from other systems, imported using JDBC Storage handler.
In this tutorial, we will walk you through on how you can use Hive’s native JDBC storage handler to query the data in Relational database – MySQL and SaaS data – Salesforce from Hive. We will be using high-performant Progress DataDirect MySQL and Salesforce drivers to enable this.
The following software is required:
Install Progress DataDirect MySQL and Salesforce JDBC drivers
add jar </
path
/to/mysql.jar>
CREATE EXTERNAL TABLE Album
(
Album_Id int,
AlbumName String,
Artist_Id STRING
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
"hive.sql.database.type" = "MYSQL",
"hive.sql.jdbc.url" = "jdbc:datadirect:mysql://<;
host
>:3306;User=<
user
>;Password=<
pass
>",
"hive.sql.jdbc.driver" = "com.ddtek.jdbc.mysql.MySQLDriver",
"hive.sql.query" = "SELECT AlbumId, Title, ArtistId FROM odbc01.Album",
"hive.sql.column.mapping" = "Album_Id=AlbumId, AlbumName=Title, Artist_Id=ArtistId",
"hive.sql.dbcp.maxActive" = "1"
);
add jar </
path
/to/sforce.jar>
CREATE EXTERNAL TABLE Account
(
ROWID String,
SYS_NAME String,
TYPE STRING,
INDUSTRY STRING,
ACCOUNTSOURCE STRING
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
"hive.sql.database.type" = "MYSQL",
"hive.sql.jdbc.url" = "jdbc:datadirect:sforce://login.salesforce.com;User=<
user
>; Password=<
Pass
>;SecurityToken=<
Token
>",
"hive.sql.jdbc.driver" = "com.ddtek.jdbc.sforce.SForceDriver",
"hive.sql.query" = "SELECT ROWID,SYS_NAME,TYPE,INDUSTRY,ACCOUNTSOURCE FROM SFORCE.ACCOUNT",
"hive.sql.column.mapping" = "ROWID=ROWID, SYS_NAME=SYS_NAME, TYPE=TYPE, INDUSTRY=INDUSTRY, ACCOUNTSOURCE=ACCOUNTSOURCE",
"hive.sql.dbcp.maxActive" = "1"
);
You can create external tables for tables in any data source using Progress JDBC drivers and query your data from Hive using its native JDBC storage handler. We hope this tutorial helped you to work with Hive’s native JDBC storage handler. Feel free to comment if you have any questions.