Learn how to access MongoDB using a DataDirect JDBC driver with AWS Glue.
AWS Glue is an Extract, Transform, Load (ETL) service available as part of Amazon Web Services. Glue is intended to make it easy for users to connect to data held in a variety of data stores, edit and clean the data as needed, and load the data into an AWS-provisioned store for a unified view. Announced in 2016 and officially launched in summer 2017, Glue greatly simplifies the cumbersome process of setting up and maintaining ETL jobs.
MongoDB is an open-source, NoSQL data store. Rather than the tabular rows and columns of relational databases, MongoDB stores data as JSON-like documents grouped into collections. MongoDB has grown in popularity and is generally ranked among the top 5 most popular data stores. At Progress, we've seen increased interest in learning how to use MongoDB in an AWS Glue environment.
Glue supports accessing data via JDBC. Currently, the databases Glue supports through JDBC are Postgres, MySQL, Redshift and Aurora. Of course, JDBC drivers exist for many other data sources besides these four. To access any other database, you can load its JDBC driver into a Spark connection instead. The data can then be processed in Spark or joined with other data sources, and AWS Glue can fully leverage the data in Spark.
Using JDBC connectors, you can access many other data sources via Spark for use in AWS Glue. For example, this AWS blog demonstrates the use of Amazon QuickSight for BI against data in an AWS Glue catalog. QuickSight supports Amazon data stores and a few other sources like MySQL and Postgres.
With DataDirect JDBC through Spark, you can open up any JDBC-capable BI tool to the full breadth of databases supported by DataDirect drivers, including MongoDB, Salesforce, Oracle and many others.
So, how do you set up a JDBC connection to access data through Spark using a JDBC driver? Here is a quick overview of the simple steps to get started.
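As a rough illustration of what such a connection can look like in a Glue job script, here is a minimal Python sketch built on Spark's generic JDBC data source. The helper names (`build_jdbc_options`, `read_table`) are hypothetical, and the connection URL format and driver class name are assumptions based on DataDirect's usual conventions; check both against the documentation for your driver version. The option keys (`url`, `driver`, `dbtable`, `user`, `password`) are the standard Spark JDBC options.

```python
from typing import Dict


def build_jdbc_options(host: str, port: int, database: str, table: str,
                       user: str, password: str) -> Dict[str, str]:
    """Build the option map for Spark's generic JDBC data source."""
    # ASSUMPTION: the URL format and driver class below follow the usual
    # DataDirect conventions; verify them against your driver's docs.
    url = f"jdbc:datadirect:mongodb://{host}:{port};DatabaseName={database}"
    return {
        "url": url,
        "driver": "com.ddtek.jdbc.mongodb.MongoDBDriver",
        "dbtable": table,    # table (MongoDB collection) to read
        "user": user,
        "password": password,
    }


def read_table(spark, options: Dict[str, str]):
    """Load one table as a Spark DataFrame through the JDBC source.

    `spark` is an existing SparkSession; in a Glue job it is typically
    obtained from the GlueContext (glueContext.spark_session).
    """
    return spark.read.format("jdbc").options(**options).load()
```

In a job script you would call `read_table(spark, build_jdbc_options(...))` and then process or join the resulting DataFrame like any other Spark data. The driver's JAR file also has to be made available to the job, for example by uploading it to S3 and referencing it in the job's dependent JARs setting.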
The industry standard for JDBC database connectivity, the Progress DataDirect JDBC drivers solve the limitations of Type 4 JDBC drivers, delivering the fastest, most scalable Java application performance. The DataDirect line of JDBC drivers supports all major databases and includes advanced enterprise functionality such as application failover, bulk load, SSL data encryption, and operating system authentication using the Kerberos protocol. DataDirect also publishes a Security Vulnerability Response Policy for addressing vulnerabilities in a timely manner across all supported data sources, including SaaS, big data and relational sources.
Download a DataDirect JDBC driver today and get started with AWS Glue.
Nishanth Kadiyala is a Technical Marketing Manager at Progress. He got his B.Tech degree from IIT Guwahati and his MBA from UNC Chapel Hill. He has worked on several technologies in the past, including database design, SQL querying and cloud computing. Currently, he is committed to educating enterprises about standards-based connectivity via ODBC, JDBC, ADO.NET and OData. He is also proficient with the DataDirect Hybrid Connectivity Services: DataDirect Cloud and Hybrid Data Pipeline. You can stay in touch with him through Twitter.
Copyright © 2019 Progress Software Corporation and/or its subsidiaries or affiliates. All Rights Reserved.