Apache Kudu is a columnar storage manager for the Apache Hadoop platform. It provides fast analytical and real-time capabilities, efficient use of CPU and I/O resources, in-place updates, and a simple, evolvable data model. You can learn more about Apache Kudu’s features in the documentation.
One of Apache Kudu’s features is its tight integration with Apache Impala, which allows you to insert, update, delete, or query Kudu data, along with several other operations.
In this tutorial, we will walk you through how to use the Progress DataDirect Impala JDBC driver to query Kudu tables using Impala SQL syntax.
Before you start this tutorial, we expect you to have an existing Apache Kudu instance with Impala installed. If you don’t, you can follow this getting started tutorial to spin up an Apache Kudu VM and load the data into it.
This tutorial also assumes that you have the Progress DataDirect Impala JDBC driver. If you do not, follow these three simple steps:
- Download the Progress DataDirect Impala JDBC driver from here.
- Once the package is downloaded, unzip it and run the installer.
- The installation process is simple: just follow the instructions. For most users, the default settings are sufficient to install the driver successfully.
Once you have followed the getting started tutorial for Apache Kudu, you can run queries against the data.
For example, here are a few basic queries to test it out:
SELECT * FROM sfmta LIMIT 1;
INSERT INTO sfmta VALUES (1323, 123, -122.32, 32.22, 12.322, 52.0);
SELECT * FROM sfmta WHERE report_time = 1323;
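To run the same three statements from an application, a minimal JDBC sketch might look like the following. Several details here are assumptions rather than something this tutorial confirms: the class name KuduImpalaQuery and the helper buildUrl are hypothetical, the URL layout (jdbc:datadirect:impala://host:port;DatabaseName=...) should be checked against your driver's documentation, the database is assumed to be "default", and port 21050 is the usual Impala port for JDBC clients. The driver JAR is assumed to be on the classpath so that DriverManager can locate it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

class KuduImpalaQuery {

    // Hypothetical helper. The URL layout is an assumption; confirm it
    // against the documentation that ships with your driver.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:datadirect:impala://" + host + ":" + port
                + ";DatabaseName=" + database;
    }

    // Prints every row of a result set as "column=value" pairs.
    static void printRows(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int columns = meta.getColumnCount();
        while (rs.next()) {
            StringBuilder row = new StringBuilder();
            for (int i = 1; i <= columns; i++) {
                if (i > 1) {
                    row.append(", ");
                }
                row.append(meta.getColumnLabel(i)).append('=').append(rs.getString(i));
            }
            System.out.println(row);
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.out.println("usage: KuduImpalaQuery <host> <user> <password>");
            return;
        }
        // Assumed defaults: Impala's usual JDBC port and the "default" database.
        String url = buildUrl(args[0], 21050, "default");

        try (Connection conn = DriverManager.getConnection(url, args[1], args[2]);
             Statement stmt = conn.createStatement()) {

            // Read one row to confirm the table is reachable.
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM sfmta LIMIT 1")) {
                printRows(rs);
            }

            // Insert the sample row from the tutorial.
            stmt.executeUpdate(
                "INSERT INTO sfmta VALUES (1323, 123, -122.32, 32.22, 12.322, 52.0)");

            // Read the inserted row back by its report_time.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT * FROM sfmta WHERE report_time = 1323")) {
                printRows(rs);
            }
        }
    }
}
```

Run it with the driver JAR on the classpath and pass the Impala host, user name, and password as arguments. With no arguments the program only prints a usage line and does not attempt a connection.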