AWS Glue is an Extract, Transform, Load (ETL) service available as part of Amazon’s hosted web services. Glue is intended to make it easy for users to connect their data in a variety of data stores, edit and clean the data as needed, and load the data into an AWS-provisioned store for a unified view.
Many organizations now use REST APIs to expose and consume data, and they often want to store the data coming from those APIs to support real-time business intelligence or analytics. The problem is that each REST API is built differently. Authentication schemes differ, response structures differ, and when you want to bring this data into Amazon Redshift, S3, or EMR Hive using AWS Glue, you end up writing a lot of code for each service. This can mean a lot of unnecessary effort.
With Progress DataDirect Autonomous REST Connector, you can connect to any REST API without writing a single line of code and run SQL queries against the data through a JDBC interface. In this tutorial, we will show how you can use Autonomous REST Connector with AWS Glue to ingest data from any REST API into Amazon Redshift, S3, EMR Hive, RDS, and so on. We will use the Yelp API for this tutorial: AWS Glue reads the API data through Autonomous REST Connector and then writes it to S3.
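To make the read-over-JDBC, write-to-S3 flow concrete, here is a minimal PySpark sketch of what such a job script might look like. It is not the script used in this tutorial. The helper names `build_jdbc_options` and `copy_table_to_s3` are hypothetical, and the JDBC URL, driver class name, table name, and S3 path are all values you would supply yourself from the connector's documentation and your own AWS account.

```python
from typing import Dict


def build_jdbc_options(jdbc_url: str, driver_class: str, table: str) -> Dict[str, str]:
    """Collect the options that Spark's generic JDBC reader expects.

    All three values are assumptions supplied by the caller: the URL and
    driver class come from the connector's documentation, and the table
    name is whatever the connector exposes for the REST endpoint.
    """
    return {"url": jdbc_url, "driver": driver_class, "dbtable": table}


def copy_table_to_s3(spark, options: Dict[str, str], s3_path: str) -> None:
    """Read one table over JDBC and write it to S3 as CSV with a header row."""
    # Spark's built-in "jdbc" source loads the table into a DataFrame.
    frame = spark.read.format("jdbc").options(**options).load()
    # Overwrite any earlier output at the same S3 prefix.
    frame.write.mode("overwrite").option("header", "true").csv(s3_path)
```

Inside a Glue job, `spark` would typically be the session obtained from `GlueContext(SparkContext.getOrCreate()).spark_session`, and the connector's JAR file has to be available to the job for the driver class to load. Keeping the option building separate from the Spark calls makes the configuration easy to check without a cluster.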
- Click the Run Job button to start the job. To check its status, go back and select the job that you created.
- After the job has run successfully, you should have a CSV file in S3 containing the data that you extracted using Autonomous REST Connector.
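If you would rather start the job and watch its status from a script than from the console, the same two steps can be sketched with the AWS SDK for Python. This is an illustrative sketch, not part of the original walkthrough. The function name `run_job_and_wait` is hypothetical, and the client is passed in so that the polling logic does not depend on how you configure credentials.

```python
import time

# Job run states after which a Glue job run will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_job_and_wait(glue_client, job_name, poll_seconds=30.0, sleep=time.sleep):
    """Start a Glue job run and poll until it reaches a terminal state.

    `glue_client` is expected to behave like boto3.client("glue").
    Returns the final JobRunState string, for example "SUCCEEDED".
    """
    run_id = glue_client.start_job_run(JobName=job_name)["JobRunId"]
    while True:
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        # Still starting or running: wait before asking again.
        sleep(poll_seconds)
```

With boto3 installed and credentials configured, a call such as `run_job_and_wait(boto3.client("glue"), "<your-job-name>")` would block until the run finishes, where the job name is a placeholder for the job you created.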
This is just one example of how easy and painless it can be with Progress DataDirect Autonomous REST Connector to pull data into AWS Glue from any REST API. Feel free to try the connector with any application you want. If you have any questions, please contact us or comment below.