Querying external tables in Hive using native JDBC Storage Handler

Introduction

With the release of Hive 2.3.0, Hive has introduced a new feature called a JDBC Storage Handler. The idea is to use a generic JDBC driver so that you could import your table in to Hive and run Hive QL queries to analyze the data that resided in JDBC tables by joining data from other systems, imported using JDBC Storage handler.

In this tutorial, we will walk you through on how you can use Hive’s native JDBC storage handler to query the data in Relational database – MySQL and SaaS data – Salesforce from Hive. We will be using high-performant Progress DataDirect MySQL and Salesforce drivers to enable this.

Prerequisites

The following software is required:

  • Hive 2.3.0 – Minimum requirement
  • Have a Salesforce account or MySQL database.

Install Progress DataDirect MySQL and Salesforce JDBC drivers

  • Download Progress DataDirect Salesforce JDBC driver.
  • Install the Salesforce JDBC drivers by running the installer. Note that the installer will also install MySQL drivers as a part of the trial.

Querying MySQL tables from Hive

  • Add the MySQL JDBC driver (mysql.jar) found in the install path (home/Progress/DataDirect/Connect_for_JDBC_51/lib) to Hive lib folder or in hive prompt, run the following command to add the driver for that session.

    add jar </path/to/mysql.jar>

     

  • Create an external table for your MySQL table as shown below. Replace the query, connection configurations, column mappings and the create statement as per your needs.

    CREATE EXTERNAL TABLE Album
    (
    Album_Id int,
    AlbumName String,
    Artist_Id STRING
    )
    STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
    TBLPROPERTIES (
    "hive.sql.database.type" = "MYSQL",
    "hive.sql.jdbc.url" = "jdbc:datadirect:mysql://<;host>:3306;User=<user>;Password=<pass>",
    "hive.sql.jdbc.driver" = "com.ddtek.jdbc.mysql.MySQLDriver",
    "hive.sql.query" = "SELECT AlbumId, Title, ArtistId FROM odbc01.Album",
    "hive.sql.column.mapping" = "Album_Id=AlbumId, AlbumName=Title, Artist_Id=ArtistId",
    "hive.sql.dbcp.maxActive" = "1"
    );

     

  • Once your external table is created, you are ready to query your MySQL table from Hive. You should see results in your Hive as shown below.


  • Hive JDBC Storage Handler

Querying Salesforce tables from Hive

  • Add the Salesforce JDBC driver (sforce.jar) found in the install path (home/Progress/DataDirect/Connect_for_JDBC_51/lib) to Hive lib folder or in hive prompt, run the following command to add the driver for that session.

    add jar </path/to/sforce.jar>

     

  • Create an external table for your Salesforce table as shown below. Replace the query, connection configurations, column mappings and the create statement as per your needs.
  • CREATE EXTERNAL TABLE Account
    (
    ROWID String,
    SYS_NAME String,
    TYPE STRING,
    INDUSTRY STRING,
    ACCOUNTSOURCE STRING
    )
    STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
    TBLPROPERTIES (
    "hive.sql.database.type" = "MYSQL",
    "hive.sql.jdbc.url" = "jdbc:datadirect:sforce://login.salesforce.com;User=<user>; Password=<Pass>;SecurityToken=<Token>",
    "hive.sql.jdbc.driver" = "com.ddtek.jdbc.sforce.SForceDriver",
    "hive.sql.query" = "SELECT ROWID,SYS_NAME,TYPE,INDUSTRY,ACCOUNTSOURCE FROM SFORCE.ACCOUNT",
    "hive.sql.column.mapping" = "ROWID=ROWID, SYS_NAME=SYS_NAME, TYPE=TYPE, INDUSTRY=INDUSTRY, ACCOUNTSOURCE=ACCOUNTSOURCE",
    "hive.sql.dbcp.maxActive" = "1"
    );


  • Once your external table is created, you are ready to query your Salesforce table from Hive. You should see results in your Hive as shown below.

  • Hive JDBC Storage Handler

You can create external tables for tables in any data source using Progress JDBC drivers and query your data from Hive using its native JDBC storage handler. We hope this tutorial helped you to work with Hive’s native JDBC storage handler. Feel free to comment if you have any questions.

JDBC TUTORIAL

Querying external tables in Hive using native JDBC Storage Handler

DataDirect
connectors

Connect any data source to any application

DataDirect
Tutorials

Discussions, tips and tricks for
DataDirect Connect drivers