Using Apache Drill to query SQL Server, Salesforce and more using RDBMS Storage Plugin

Introduction

Apache Drill is an open source SQL query engine for Big Data Exploration tool that can connect to various data sources and provide a common SQL interface to them, delivering self-service data exploration capabilities to the data stored in multiple formats. In this tutorial, we will go through on how you can connect to your data in in any Relational, NoSQL, Big Data or SaaS applications and query it using Apache Drill RDBMS Storage plugin and Progress DataDirect JDBC drivers. The tutorial will focus on SQL Server and Salesforce as an example, but you can use similar process to query any data source using Progress DataDirect JDBC drivers.

Download and Install Progress DataDirect JDBC drivers (Linux)

  1. Download Progress DataDirect suite of JDBC drivers from here
  2. Once the package is downloaded, install the JDBC drivers by running the below command in the terminal

     

    java -jar PROGRESS_DATADIRECT_JDBC_SQLSERVER_ALL.jar

     

  3. Follow through the simple installation and you should have the drivers installed in the path you have selected in the installer.
  4. Note that, although you have downloaded SQL Server JDBC, as a part of the trial the package contains other JDBC drivers such as Salesforce, DB2, MYSQL, PostgreSQL, Hive, SQL Server etc., Not all the drivers we offer are in this package, If you can’t find your needed driver in the install folder, go to our website and you should find a different installation package for the driver you need.

Download and Start Apache Drill (Linux)

  1. Download Apache Drill from here
  2. Extract the contents from the downloaded package by running the below command in your terminal

     

    tar -xvzf apache-drill-1.XX.0.tar.gz

     

  3. Copy the SQL Server(sqlserver.jar), Salesforce(sforce.jar) JDBC drivers from Progress install folder(Progress/DataDirect/Connect_for_JDBC_51) to Apache Drills (apache-drill-1.XX.0/jar/) folder. Feel free to copy any other JDBC driver that you may need, if you want to query that data source using Drill.

     

  4. Once the package is extracted, go in to the folder apache-drill-1.XX.0, and run the following command in your terminal to start Drill.

     

    bin/drill-embedded

     

  5. You should now see the prompt 0: jdbc:drill:zk=local> if drill has started successfully.

Configure SQL Server Storage Plugin

  1. Once you have Drill running, open your browser and go to http://<DrillServer>:8047/storage to configure Storage Plugin.
  2. Under “New Storage Plugin”, fill in the textbox on what you want to call your plugin. I named mine “SQLServer”. Note that the storage plugin name is case-sensitive. Click on the Create button.
  3. On the next page, you should see configuration area, fill it up as below.

     

    {
          "type": "jdbc",
          "driver": "com.ddtek.jdbc.sqlserver.SQLServerDriver",
          "url": "jdbc:datadirect:sqlserver://<;Server>:1433;databaseName=Northwind",
          "username": "user",
          "password": "pass",
          "enabled": true
    }

     

  4. Click on button Create and you should see a success message if configured properly.

 

Query SQL Server using Apache Drill

  1. Go to Drill Terminal and type the below command to see all the schemas available
  2. show schemas;
  3. You should see a list of schemas as shown below.

  4. Apache Drill JDBC 1

  5. Now select the Schema where your tables are residing. In my case, the test tables I have created are in Northwind.dbo. Run the below commands to choose the schema Northwind.dbo
  6.     use SQLServer.Northwind;
        use SQLServer.dbo;

     



    Apache Drill JDBC 2

  7. Run your queries against the tables in your SQL Server schema as shown below.
  8.  

    Apache Drill JDBC 3



Configure Salesforce Storage Plugin

  1. Once you have Drill running, open your browser and go to http://<DrillServer>:8047/storage to configure Storage Plugin.
  2. Under “New Storage Plugin”, fill in the textbox on what you want to call your plugin. I named mine “sforce”. Note that the storage plugin name is case-sensitive. Click on the Create button.
  3. On the next page, you should see configuration area, fill it up as below.
    {
        "type": "jdbc",
        "driver": "com.ddtek.jdbc.sforce.SForceDriver",
        "url": "jdbc:datadirect:sforce://login.salesforce.com",
        "username": "user",
        "password": "pass",
        "enabled": true
        }

  4. Click on button Create and you should see a success message if configured properly.

Query Salesforce using Apache Drill

  1. Go to Drill Terminal and type the below command to see all the schemas available


    show schemas;


  2. You should see a list of schemas as shown below.

    Apache Drill JDBC 4



  3. Now select the Schema where your tables are residing. Your tables would be residing in sforce.SFORCE schema. Choose that schema by running the below command.


    use sforce.SFORCE;

  4. Run your queries against the tables in your Salesforce as shown below

  5. Apache Drill JDBC 5


We hope that this tutorial helped you to query your data sources using Apache Drill. We offer a broad range of enterprise JDBC drivers for Relational, Big Data, No SQL and SaaS data sources. Feel free to try any of our drivers with Apache Drill and follow the similar process as shown in this tutorial.
JDBC TUTORIAL

Using Apache Drill to query SQL Server, Salesforce and more using RDBMS Storage Plugin

DataDirect
connectors

Connect any data source to any application

DataDirect
Tutorials

Discussions, tips and tricks for
DataDirect Connect drivers