Matillion ETL is a powerful cloud platform for integrating with data warehouses such as Snowflake, Redshift, Microsoft Azure Synapse, and Google BigQuery. But have you ever needed to integrate on-premises sources?
Forget about complicated VPNs, SSH tunnels or opening ports!
By combining Matillion ETL with Progress DataDirect Hybrid Data Pipeline, we will explore how easily and securely data behind the firewall can be exposed to Matillion ETL by leveraging its support for third-party JDBC drivers.
In this tutorial, I will show you the following four easy steps, with more detailed instructions below.
Install Progress DataDirect Hybrid Data Pipeline into AWS using this tutorial.
Be sure to download all of the components you will require from the trial site:
After you have installed Hybrid Data Pipeline, you will need to open TCP ports 8443, 40501, and 11280 in the cloud so the On-Premises Connector can access the server. In my case, I created an AWS security group specifically for the Hybrid Data Pipeline server.
You should be able to access https://<hdpserver>:8443 from the system you plan to install the On-Premises Connector (OPC) on. Be sure you can log in using the d2cadmin credentials you specified during the installation.
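Before moving on, it can save time to confirm that all three ports are actually reachable from the machine where you plan to install the On-Premises Connector. A minimal sketch of that check, assuming a placeholder hostname (`hdpserver.example.com`) that you would replace with your own server:

```python
import socket

# Placeholder hostname: replace with your Hybrid Data Pipeline server.
HDP_HOST = "hdpserver.example.com"

# The TCP ports opened in the previous step.
REQUIRED_PORTS = [8443, 40501, 11280]

def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False

if __name__ == "__main__":
    for port in REQUIRED_PORTS:
        state = "open" if port_is_reachable(HDP_HOST, port) else "blocked or unreachable"
        print(f"{HDP_HOST}:{port} is {state}")
```

If any port reports as blocked, revisit your security group (or firewall) rules before installing the connector.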
To enable connecting from the cloud to your on-premises sources, you will need to install the On-Premises Connector, a component of Hybrid Data Pipeline.
First, download the four redist files that were created when you installed Hybrid Data Pipeline. You can find them on your Linux server here:
<install path>/Progress/DataDirect/Hybrid_Data_Pipeline/Hybrid_Server/redist
Next, place the four redist files, along with the On-Premises Connector installer downloaded earlier, into a folder on the Windows system where you plan to install it. Run the installer following this guide.
Once installed, verify that the connector can reach the Hybrid Data Pipeline server by opening the OPC configuration tool from your Program Files menu. Click the Status tab, then click Test. All of the tests should return green:
You may also find this tutorial helpful.
Next, you will define a data source within the Hybrid Data Pipeline interface to connect to your on-premises data source.
To connect Matillion ETL to your data source, you will need to install the JDBC driver downloaded earlier. I installed it on my local machine, where I access the Matillion web interface. Documentation for installing the JDBC driver can be found in these directions. Note that you will be using the redist files downloaded from the Hybrid Data Pipeline server above.
Once installed, you will need to add the driver profile to Matillion ETL:
Browse to the directory where you installed the JDBC driver to upload ddhybrid.jar. On my system, it is here:
/Users/aburg/Progress/DataDirect/Progress DataDirect Hybrid Data Pipeline for JDBC 4.6/lib
Click Test to verify the driver can be loaded.
You should now be able to create a Database Query orchestration job in Matillion ETL that connects to the on-premises data source. For the connection URL, be sure to specify the hostname of your Hybrid Data Pipeline server and the name of the data source you created:
jdbc:datadirect:DDhybrid://hdpserver.com:8443;hybridDataPipelineDataSource=<datasourcename>
The username and password are your Hybrid Data Pipeline account credentials.
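If you are scripting jobs or just want to avoid typos in the semicolon-delimited URL, you can assemble it programmatically. A small sketch following the URL format above; the hostname and data source name (`my_datasource`) are placeholders for your own values:

```python
def hybrid_jdbc_url(host: str, data_source: str, port: int = 8443) -> str:
    """Build a DataDirect Hybrid Data Pipeline JDBC connection URL.

    Follows the format:
    jdbc:datadirect:DDhybrid://<host>:<port>;hybridDataPipelineDataSource=<name>
    """
    return (
        f"jdbc:datadirect:DDhybrid://{host}:{port}"
        f";hybridDataPipelineDataSource={data_source}"
    )

if __name__ == "__main__":
    # Placeholder values: substitute your server and data source name.
    print(hybrid_jdbc_url("hdpserver.com", "my_datasource"))
    # jdbc:datadirect:DDhybrid://hdpserver.com:8443;hybridDataPipelineDataSource=my_datasource
```

Paste the resulting string into the connection URL field of the Database Query component.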
If you are interested in using Progress DataDirect Hybrid Data Pipeline to simplify on-premises data access from Matillion ETL, please download a trial to get connected to your data quickly and safely, or contact us to learn more.