Import data from Salesforce and others to Apache Kylin using JDBC

Introduction

Apache Kylin is an open source distributed analytics engine that provides OLAP on massive data sets in Hadoop/Spark and lets you query them in less than a second. Besides Hadoop/Spark, Kylin can also connect to data in RDBMS sources such as SQL Server, MySQL, and Postgres through JDBC drivers.

In addition to RDBMS data sources, you can connect Apache Kylin to a variety of other data sources using Progress DataDirect JDBC Connectors, including SaaS sources such as Salesforce, JIRA, Oracle Eloqua, and Google Analytics, and RDBMS sources such as SQL Server, DB2, MySQL, Postgres, and OpenEdge. This tutorial shows how to connect to Salesforce and sync its tables, but the same steps apply to any Progress DataDirect JDBC Connector.

Install Progress DataDirect Salesforce JDBC Driver

  1. Download the Progress DataDirect JDBC Driver for Salesforce.
  2. To install the driver, execute the .jar package, either by double-clicking it or by running the following command in a terminal:
    java -jar PROGRESS_DATADIRECT_JDBC_SF_ALL.jar
  3. This will launch an interactive Java installer which you can use to install the Salesforce JDBC driver to your desired location as either a licensed or evaluation installation.

Configure Salesforce connectivity for Kylin

  1. Kylin uses Sqoop to load the data into HDFS, so copy the DataDirect Salesforce JDBC driver jar you just installed to two locations:
    1. $SQOOP_HOME/lib
    2. $KYLIN_HOME/ext
  2. Next, configure the Salesforce connection in the kylin.properties file. Open $KYLIN_HOME/conf/kylin.properties and add the following configuration:

    kylin.source.default=8
    kylin.source.jdbc.connection-url=jdbc:datadirect:sforce://login.salesforce.com;TransactionMode=ignore
    kylin.source.jdbc.driver=com.ddtek.jdbc.sforce.SForceDriver
    kylin.source.jdbc.dialect=default
    kylin.source.jdbc.user=<username>
    kylin.source.jdbc.pass=<password>
    kylin.source.jdbc.sqoop-home=<Your SQOOP_HOME directory>
    kylin.source.jdbc.filed-delimiter=|
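
The two configuration steps above can be sketched as a pair of shell helpers. This is a minimal sketch rather than part of the Progress or Kylin tooling: the function names install_driver and check_kylin_jdbc_config are hypothetical, and the path to the driver jar is an assumption that depends on where the installer placed it.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, and the jar path you pass in
# depends on your own DataDirect installation.

# Copy the driver jar into <sqoop_home>/lib and <kylin_home>/ext,
# creating those directories if they do not exist yet.
install_driver() {
  local jar="$1" sqoop_home="$2" kylin_home="$3"
  [ -f "$jar" ] || { echo "driver jar not found: $jar" >&2; return 1; }
  mkdir -p "$sqoop_home/lib" "$kylin_home/ext"
  cp "$jar" "$sqoop_home/lib/" && cp "$jar" "$kylin_home/ext/"
}

# Check that every JDBC source property used in this tutorial is set
# in the given kylin.properties file. Prints each missing key to stderr
# and returns non-zero if at least one is missing.
check_kylin_jdbc_config() {
  local props="$1" missing=0 key
  for key in \
    kylin.source.default \
    kylin.source.jdbc.connection-url \
    kylin.source.jdbc.driver \
    kylin.source.jdbc.dialect \
    kylin.source.jdbc.user \
    kylin.source.jdbc.pass \
    kylin.source.jdbc.sqoop-home \
    kylin.source.jdbc.filed-delimiter
  do
    # Escape the dots so they match literally in the grep pattern.
    if ! grep -q "^[[:space:]]*${key//./\\.}=" "$props"; then
      echo "missing: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (paths are placeholders):
#   install_driver /path/to/driver.jar "$SQOOP_HOME" "$KYLIN_HOME"
#   check_kylin_jdbc_config "$KYLIN_HOME/conf/kylin.properties"
```

The check only confirms that each key is present; it does not validate the values or test the Salesforce connection itself.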

Sync tables from Salesforce

  1. Go to the Kylin Web UI at http://localhost:7070 and open the Model tab. Under the Model tab, open the Data Source tab and click the “Load Table from Tree” icon.
  2. A pop-up should now list the schemas and tables that represent the data in your Salesforce instance.



  3. Choose the tables you want and click Sync. You are now ready to build models and create OLAP cubes, so that your BI tools can query your Salesforce data through Kylin with sub-second response times.
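
Once a cube built on the synced tables is ready, one way to confirm that queries work without a BI tool is Kylin's REST query endpoint. The sketch below assumes Kylin is reachable at the URL used above and still has its default ADMIN/KYLIN credentials; the helper names, environment variables, project name, and table name are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, KYLIN_* variables, and the example project and
# table are assumptions, not values from your installation.

# Build the JSON body for the query request. No JSON escaping is done, so
# neither argument may contain double quotes or backslashes.
kylin_query_body() {
  local project="$1" sql="$2"
  printf '{"sql":"%s","project":"%s"}' "$sql" "$project"
}

# POST a SQL statement to Kylin's query endpoint using HTTP basic auth.
kylin_query() {
  local project="$1" sql="$2"
  curl -s -X POST \
    -u "${KYLIN_USER:-ADMIN}:${KYLIN_PASS:-KYLIN}" \
    -H "Content-Type: application/json" \
    -d "$(kylin_query_body "$project" "$sql")" \
    "${KYLIN_URL:-http://localhost:7070}/kylin/api/query"
}

# Example usage (project and table names are placeholders):
#   kylin_query my_project 'SELECT COUNT(*) FROM SFORCE.ACCOUNT'
```

The query only returns results for tables that are covered by a built cube, so run it after the cube build has finished.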

Feel free to try the Progress DataDirect JDBC Driver for Salesforce, or any other Progress DataDirect JDBC driver, to bring your data into Apache Kylin for faster querying. Let us know if you have any questions; we are happy to help.

