In this tutorial, you'll learn how to easily extract, transform and load (ETL) on-premises Oracle data into Google BigQuery using Google Cloud Dataflow.
Google Cloud Dataflow is a service for processing and enriching real-time streaming and batch data. Dataflow uses the Apache Beam SDK for Java to read data in and write it out. As you might expect from a cloud-based solution, the SDK's predefined Java I/O connectors primarily target cloud and Big Data stores.
However, Dataflow can reach well beyond Big Data and the cloud to many other sources through the JDBC interface. Using Progress DataDirect JDBC connectors, you can open Google Cloud Dataflow's processing power to a wide range of on-premises data including Oracle, SQL Server, IBM DB2, Postgres and many more. Expanding your data sources this way means that you can integrate diverse external databases with the Google ecosystem, eliminating non-Google data silos.
Combining on-premises data with cloud technologies almost always raises immediate concerns about security. The DataDirect Hybrid Data Pipeline lets you securely access data behind a firewall without complex network configuration such as SSH tunnels, reverse proxies or VPNs. It can also be deployed to work with existing network configurations, which is often required in industries such as financial services.
The DataDirect Hybrid Data Pipeline JDBC driver can be used to ingest both on-premises and cloud data into Google Cloud Dataflow through the Apache Beam Java SDK interface. We've written a detailed tutorial to show you how to extract, transform and load (ETL) on-premises Oracle data into Google BigQuery using Google Cloud Dataflow.
Our tutorial demonstrates how to connect to an on-premises Oracle database, read the data, apply a simple transformation and write the result to BigQuery. The process does not require any additional components from the database vendors.
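To make the read, transform and write stages concrete, here is a minimal, in-memory sketch in plain Java. It is not the tutorial's code: the class `EtlSketch`, the column names `ID`, `NAME` and `AMOUNT`, and the transformation itself are all hypothetical, and in-memory lists stand in for the Oracle source and the BigQuery sink. In an actual Dataflow pipeline, the extract and load stages would be handled by Apache Beam's JDBC and BigQuery I/O connectors rather than by lists.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the extract -> transform -> load shape.
// Lists stand in for the Oracle source and the BigQuery sink.
class EtlSketch {

    // Extract: stands in for rows read from Oracle over JDBC (assumed columns).
    static List<Map<String, Object>> sampleRows() {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row(1, "  alice ", 1200.456));
        rows.add(row(2, "BOB", 80.0));
        return rows;
    }

    // Helper that builds one source row with upper-case, Oracle-style column names.
    static Map<String, Object> row(int id, String name, double amount) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("ID", id);
        r.put("NAME", name);
        r.put("AMOUNT", amount);
        return r;
    }

    // Transform: lower-case the column names, normalize the name field,
    // and round the amount to two decimal places.
    static Map<String, Object> transform(Map<String, Object> in) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("id", in.get("ID"));
        out.put("name", ((String) in.get("NAME")).trim().toUpperCase(Locale.ROOT));
        double amount = ((Number) in.get("AMOUNT")).doubleValue();
        out.put("amount", Math.round(amount * 100.0) / 100.0);
        return out;
    }

    // Load: stands in for the write to BigQuery by collecting the transformed rows.
    static List<Map<String, Object>> run(List<Map<String, Object>> source) {
        return source.stream().map(EtlSketch::transform).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        run(sampleRows()).forEach(System.out::println);
    }
}
```

Keeping the transformation as a pure function of one row makes it easy to unit test on its own before wiring it into a pipeline step.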
You can use a similar process with any of the Hybrid Data Pipeline’s supported data sources, such as SQL Server, Hive, IBM DB2, Salesforce and Amazon Redshift. Check out the tutorial, and please contact us if you need any help or have any questions.
Saikrishna is a DataDirect Developer Evangelist at Progress. Prior to working at Progress, he worked as a software engineer for three years after getting his undergraduate degree, and he recently graduated from NC State University with a master's degree in Computer Science. His interests are in the areas of Data Connectivity, SaaS and Mobile App Development.