Tutorial: Using Google Cloud Dataflow to Ingest Data Behind a Firewall

February 27, 2018

In this tutorial, you'll learn how to easily extract, transform and load (ETL) on-premises Oracle data into Google BigQuery using Google Cloud Dataflow.

Google Cloud Dataflow is a service for processing and enriching real-time streaming and batch data. Dataflow uses the Apache Beam SDK for Java to read data in and write data out. As you might expect with a cloud-based solution, the SDK's built-in Java I/O connectors cover a predefined list of data stores, which are primarily cloud and Big Data sources.

However, Dataflow can be expanded broadly beyond Big Data and the Cloud to many other sources through the JDBC interface. Using Progress DataDirect JDBC connectors, you can open Google Dataflow's processing power to a wide range of on-premises data including Oracle, SQL Server, IBM DB2, Postgres and many more. The capability to expand your data sources means that you can integrate diverse external databases with the Google ecosystem, eliminating non-Google data silos.

Combining on-premises data with cloud technologies almost always raises immediate security concerns. The DataDirect Hybrid Data Pipeline lets you securely access data behind a firewall without complex network configuration such as SSH tunnels, reverse proxies or VPNs. It can also be deployed to work with existing network configurations, which is often required in industries such as financial services.

Firewall Friendly Access to On-Premises Data Sources

The DataDirect Hybrid Data Pipeline JDBC driver can be used to ingest both on-premises and cloud data to Google Cloud Dataflow through the Apache Beam Java SDK interface. We've written a detailed tutorial to show you how to extract, transform and load (ETL) on-premises Oracle data into Google BigQuery using Google Cloud Dataflow.

Our tutorial demonstrates how to connect to an on-premises Oracle database, read the data, apply a simple transformation and write the result to BigQuery. The process does not require any additional components from the database vendor.
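
To make the "simple transformation" step concrete, here is a minimal sketch in plain Java. It is not taken from the tutorial and uses neither the Beam SDK nor the DataDirect driver: it only models one source row as a `Map` and shows the kind of cleanup you might apply before loading. The class name `RowTransform`, the method `toBigQueryRow` and the column names are hypothetical, and the lowercasing rule assumes the source returns uppercase column names, as Oracle typically does for unquoted identifiers.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: illustrates a per-row transformation only.
// It does not read from Oracle or write to BigQuery.
public class RowTransform {

    // Returns a new row with lowercase column names and trimmed string values.
    // Non-string values (numbers, dates, nulls) are passed through unchanged.
    public static Map<String, Object> toBigQueryRow(Map<String, Object> sourceRow) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : sourceRow.entrySet()) {
            String column = entry.getKey().toLowerCase(Locale.ROOT);
            Object value = entry.getValue();
            if (value instanceof String) {
                value = ((String) value).trim();
            }
            out.put(column, value);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample row standing in for one record read over JDBC.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("EMPLOYEE_ID", 101);
        row.put("FIRST_NAME", "  Ada ");
        System.out.println(toBigQueryRow(row));
    }
}
```

In an actual Beam pipeline, a function like this would typically be applied to each element between the JDBC read and the BigQuery write, for example inside a `DoFn` or `MapElements` transform. Keeping it a pure function makes it easy to unit test without a database connection.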

You can use a similar process with any of the Hybrid Data Pipeline's supported data sources, such as SQL Server, Hive, IBM DB2, Salesforce and Amazon Redshift. Check out the tutorial, and please contact us if you need help or have any questions.

Saikrishna Teja Bobba

Saikrishna is a DataDirect Developer Evangelist at Progress. Before joining Progress, he worked as a software engineer for three years after earning his undergraduate degree, and he recently graduated from NC State University with a master's degree in Computer Science. His interests include data connectivity, SaaS and mobile app development.
