Using an Impala JDBC Driver to Query Apache Kudu
Apache Kudu is a columnar storage manager for the Apache Hadoop platform. It provides fast analytical and real-time capabilities, efficient use of CPU and I/O resources, in-place updates, and a simple, evolvable data model. You can learn more about Apache Kudu’s features in the documentation.
One of Apache Kudu’s features is its tight integration with Apache Impala, which lets you insert, update, delete, or query Kudu data, along with several other operations. In this tutorial, we will walk you through using the Progress DataDirect Impala JDBC driver to query Kudu tables with Impala SQL syntax.
Before you start this tutorial, we expect you to have an existing Apache Kudu instance with Impala installed. If you don’t, you can follow this getting started tutorial to spin up an Apache Kudu VM and load the data into it.
This tutorial also assumes that you have the Progress DataDirect Impala JDBC driver. If you do not, follow these 3 simple steps:
- Download the Progress DataDirect Impala JDBC driver from here.
- Once the package is downloaded, unzip it and run the installation program.
- The installation process is simple: just follow the instructions. For most users, the default settings will be sufficient to install the driver successfully.
Configure and Test Connection
- To configure and connect to Apache Kudu using the DataDirect Impala JDBC driver, we will be using SQL Workbench.
- Open SQL Workbench and go to File -> Connect Window, which will open a new window. On the bottom left of that window you will find a button named ‘Manage Drivers’. Click on it.
- Add a new driver by clicking the New button. Name it ‘Impala’ and browse to impala.jar, which is in the lib folder of the installation directory, as shown below. Click OK once you are finished.
- You should be back on the Connect window. Create a new connection, give it any name, and choose Impala (com.ddtek.jdbc.impala.ImpalaDriver) as your driver.
- Fill in the connection URL in the following format, and enter your credentials in the respective fields, as shown below.
- Click the Test button and you should be able to connect successfully. Click OK and you should now be able to query your Apache Kudu instance without any problem.
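The same connection can also be opened from code instead of SQL Workbench. The following is a minimal Java sketch, not a definitive implementation. The driver class name comes from the step above; the URL layout (jdbc:datadirect:impala://host:port;DatabaseName=db), the port 21050, the database name "default", and the class and method names are assumptions for illustration, so check them against your driver documentation and cluster settings.

```java
import java.sql.Connection;
import java.sql.DriverManager;

class ImpalaKuduConnect {
    // Builds a connection URL. The layout below is an ASSUMED DataDirect-style
    // format; verify it against the driver documentation before relying on it.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:datadirect:impala://" + host + ":" + port
                + ";DatabaseName=" + database;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.out.println("Usage: ImpalaKuduConnect <host> <user> <password>");
            return;
        }
        // Driver class as listed in the SQL Workbench driver dialog.
        Class.forName("com.ddtek.jdbc.impala.ImpalaDriver");

        // 21050 and "default" are assumptions; adjust them for your cluster.
        String url = buildUrl(args[0], 21050, "default");

        // try-with-resources closes the connection when the block exits.
        try (Connection conn = DriverManager.getConnection(url, args[1], args[2])) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}
```

To try it, compile and run the class with impala.jar from the driver's lib folder on the classpath, passing your host name and credentials as arguments.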
Once you have followed this getting started tutorial for Apache Kudu, you can run queries against the data.
For example, here are a few basic queries to test it out:
SELECT * FROM sfmta LIMIT 1;
INSERT INTO sfmta VALUES (1323, 123, -122.32, 32.22, 12.322, 52.0);
SELECT * FROM sfmta WHERE report_time = 1323;
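The three queries above can also be run through plain JDBC. The sketch below is illustrative only: it assumes the driver jar is on the classpath, that the sfmta table from the getting started tutorial exists, and that you pass your own connection URL and credentials as arguments. The class name SfmtaQueries and the helpers formatRow and printAll are hypothetical names introduced for this example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.StringJoiner;

class SfmtaQueries {
    // Formats one row as "name=value, name=value, ..." for printing.
    static String formatRow(String[] names, Object[] values) {
        StringJoiner row = new StringJoiner(", ");
        for (int i = 0; i < names.length; i++) {
            row.add(names[i] + "=" + values[i]);
        }
        return row.toString();
    }

    // Prints every row of a result set, reading column labels from its metadata.
    static void printAll(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int n = meta.getColumnCount();
        String[] names = new String[n];
        for (int i = 0; i < n; i++) {
            names[i] = meta.getColumnLabel(i + 1); // JDBC columns are 1-based
        }
        while (rs.next()) {
            Object[] values = new Object[n];
            for (int i = 0; i < n; i++) {
                values[i] = rs.getObject(i + 1);
            }
            System.out.println(formatRow(names, values));
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.out.println("Usage: SfmtaQueries <jdbc-url> <user> <password>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2]);
             Statement stmt = conn.createStatement()) {
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM sfmta LIMIT 1")) {
                printAll(rs);
            }
            // Same literal values as the INSERT statement in the tutorial.
            stmt.executeUpdate(
                "INSERT INTO sfmta VALUES (1323, 123, -122.32, 32.22, 12.322, 52.0)");
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT * FROM sfmta WHERE report_time = 1323")) {
                printAll(rs);
            }
        }
    }
}
```

Reading the column labels from ResultSetMetaData keeps the sketch independent of the exact sfmta schema, so it does not hard-code any column names.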