Import data from any REST API into HDFS using Sqoop

Introduction

One of the most popular interfaces for exposing and consuming data is a REST API, and one of the best ways to store data at low cost for later analysis is HDFS. Data architects often encounter scenarios where the business wants to store data from REST APIs as it is updated, so that the BI/analytics team can run analytics on top of it. Today this usually requires a custom plugin or solution to move the data from the REST API into HDFS, but wouldn't it be great if you could use an existing tool like Sqoop and bring in the data from a REST API without having to write a single line of code?

That is exactly what we are going to do with the help of Sqoop and the Progress DataDirect Autonomous REST Connector. Autonomous REST Connector lets you query API data using standard SQL by normalizing the semi-structured JSON responses into structured, relational data.

In this tutorial we will use Autonomous REST Connector with Sqoop: the connector will connect to the Alpha Vantage REST API (used here as an example; you can use any REST API) to fetch stock data, and Sqoop will ingest that data into HDFS for your future analytical needs. Let's see how you can do this.

Pre-requisites

Before you start the tutorial, we assume you have the following environment and tools set up.

  1. HDFS and a Hive instance
  2. The Sqoop tool
  3. An Alpha Vantage API key (get yours here)

Install Autonomous REST Connector

  1. Download the Autonomous REST JDBC connector from our website.
  2. Install the connector by running the setup executable on your machine. The installation is straightforward; just follow the instructions in the installer.
  3. After the installation finishes, you should find the Autonomous REST Connector at the default path below, unless you chose to install it in a different location.
    /home/<user>/Progress/DataDirect/JDBC_60/lib/autorest.jar


  4. Copy the Autonomous REST Connector from the above path to Sqoop's lib folder by running the command below:
    cp /home/<user>/Progress/DataDirect/JDBC_60/lib/autorest.jar $SQOOP_HOME/lib/autorest.jar


Configuring Autonomous REST connector

  1. To connect to the Alpha Vantage API using Autonomous REST Connector, you can simply point it at the endpoint with some default values.
  2. For this tutorial, I will connect to the Stock Time Series endpoints offered by Alpha Vantage. Let's say I want to get all the intraday time series values for a stock at a 1min interval; my request would be:

    https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=PRGS&interval=1min&apikey=<apikey>

  3. To configure the driver to connect to the above endpoint, all you have to do is use the JDBC URL below:
    jdbc:datadirect:autorest:sample=https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=PRGS&interval=1min&apikey=<apikey>


  4. When you connect to any REST API using Autonomous REST Connector, it automatically samples the API and creates a configuration, which you can access by querying the _CONFIGURATION table. You can retrieve this configuration with the Autonomous REST JDBC driver in any SQL query tool such as DBeaver or SQuirreL SQL, or from a small JDBC program like the sketch below.
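    For instance, here is a minimal JDBC sketch (an illustration, not part of the required setup) that connects with the sample URL from step 3 and prints whatever the _CONFIGURATION table returns. It assumes autorest.jar is on the classpath and that you substitute your own API key; the class name PrintRestConfiguration is just for this example.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    import java.sql.Statement;

    public class PrintRestConfiguration {
        public static void main(String[] args) throws Exception {
            // Same driver class that is passed to Sqoop later with --driver
            Class.forName("com.ddtek.jdbc.autorest.AutoRESTDriver");

            // Sample URL from step 3; replace <apikey> with your Alpha Vantage key
            String url = "jdbc:datadirect:autorest:sample=https://www.alphavantage.co/query"
                    + "?function=TIME_SERIES_INTRADAY&symbol=PRGS&interval=1min&apikey=<apikey>";

            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 // The generated configuration is exposed as the _CONFIGURATION table
                 ResultSet rs = stmt.executeQuery("SELECT * FROM _CONFIGURATION")) {

                ResultSetMetaData md = rs.getMetaData();
                while (rs.next()) {
                    for (int i = 1; i <= md.getColumnCount(); i++) {
                        System.out.println(md.getColumnLabel(i) + ": " + rs.getString(i));
                    }
                }
            }
        }
    }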

     

  5. For this tutorial, I am providing the configuration here; save it in a file called alphavantage.rest.


    {
        "query":{
            "#path":[
                "https://www.alphavantage.co/query"
            ],
            "Meta Data":{
                "1. Information":"VarChar(84)",
                "2. Symbol":"VarChar(64),#key",
                "3. Last Refreshed":"Timestamp(0)",
                "4. Interval":"VarChar(64)",
                "5. Output Size":"VarChar(64)",
                "6. Time Zone":"VarChar(64)"
            },
            "Time Series (1min){Timestamp(0)}":{
                "1. open":"Double",
                "2. high":"Double",
                "3. low":"Double",
                "4. close":"Double",
                "5. volume":"Integer"
            },
            "function":{
                "#type":"VarChar(64),#key",
                "#virtual":true,
                "#default":"TIME_SERIES_INTRADAY",
                "#eq":"function"
            },
            "symbol":{
                "#type":"VarChar(64),#key",
                "#virtual":true,
                "#default":"MSFT",
                "#eq":"symbol"
            },
            "interval":{
                "#type":"VarChar(64),#key",
                "#virtual":true,
                "#default":"1min",
                "#eq":"interval"
            },
            "apikey":{
                "#type":"VarChar(64),#key",
                "#virtual":true,
                "#default":"<your api key>",
                "#eq":"apikey"
            }
        }
    }

    Note: Replace the value of the #default property under the apikey object with your own API key.

  6. Looking at the configuration, you can see that Autonomous REST Connector has detected all the objects and their data types. Before moving on to Sqoop, you can optionally sanity-check the saved configuration file, as sketched below.
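    The following is a small, optional JDBC sketch that points the driver at the saved .rest file and queries the TIMESERIES1MIN table (the table used with Sqoop in the next section). The file path /path/to/alphavantage.rest and the class name CheckAlphaVantageConfig are placeholders; adjust them to your environment, and make sure autorest.jar is on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CheckAlphaVantageConfig {
        public static void main(String[] args) throws Exception {
            Class.forName("com.ddtek.jdbc.autorest.AutoRESTDriver");

            // Point the driver at the saved .rest file via the same "config=" option
            // used with the Sqoop commands below; adjust the path to your own location
            String url = "jdbc:datadirect:autorest:config=/path/to/alphavantage.rest";

            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 // TIMESERIES1MIN is the table Sqoop imports in the next section
                 ResultSet rs = stmt.executeQuery("SELECT * FROM TIMESERIES1MIN")) {

                int rows = 0;
                while (rs.next()) {
                    rows++;
                }
                System.out.println("Rows returned from the Alpha Vantage API: " + rows);
            }
        }
    }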

     

  7. To learn more about Autonomous REST Connector, how to configure it to your needs, and how to connect to multiple endpoints, we recommend going through these tutorials after you finish this one:
     1. Getting Started
     2. Connecting to Multiple Endpoints
     3. Editing the Auto-Generated Schema

Import Data from any REST API to HDFS

  1. To import data from the Alpha Vantage API, we use Autonomous REST Connector, which provides a SQL interface for the API. This driver is then used with Sqoop to import the data into HDFS.
  2. To import the data from the table TIMESERIES1MIN using the default values you provided when sampling the endpoint, all you have to do is run the Sqoop command below:
    sqoop import --connect 'jdbc:datadirect:autorest:config=/db/hadoop/scripts/AlphaVantage.rest;TransactionMode=ignore' --table TIMESERIES1MIN --hive-import --create-hive-table --hive-table TIMESERIES1MIN --driver com.ddtek.jdbc.autorest.AutoRESTDriver
  3. If you want to use a custom SQL query in Sqoop, for example to request a different stock symbol, you can run the following command:
    sqoop import --connect 'jdbc:datadirect:autorest:config=/db/hadoop/scripts/AlphaVantage.rest;TransactionMode=ignore' --query "select * from TIMESERIES1MIN where QUERY_SYMBOL='AAPL' AND \$CONDITIONS" --hive-import --create-hive-table --hive-table TIMESERIES1MIN --driver com.ddtek.jdbc.autorest.AutoRESTDriver --target-dir /usr/hadoop/timeseriesquery --split-by key
  4. After the Sqoop command has finished, you will find the data from the REST API ingested into Hadoop. A quick way to verify the import is sketched below.
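    As one possible check, the sketch below counts the rows in the new Hive table over the standard Hive JDBC driver. It assumes a HiveServer2 instance listening on localhost:10000, the hive-jdbc client on the classpath, and placeholder credentials; adjust the URL, user, and password to your cluster.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class VerifyTimeSeriesImport {
        public static void main(String[] args) throws Exception {
            // Standard HiveServer2 JDBC driver; requires hive-jdbc on the classpath
            Class.forName("org.apache.hive.jdbc.HiveDriver");

            // Adjust host, port, database, and credentials to your environment
            String url = "jdbc:hive2://localhost:10000/default";

            try (Connection conn = DriverManager.getConnection(url, "hive", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM timeseries1min")) {

                if (rs.next()) {
                    System.out.println("Rows imported into Hive: " + rs.getLong(1));
                }
            }
        }
    }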

 

That's how easy it is to import data from any REST API into HDFS using the Progress DataDirect Autonomous REST Connector, without writing a single line of code. Feel free to try Autonomous REST Connector with any API, and if you have any issues or questions, feel free to talk to us.