Drive Real-Time Analytics with SparkSQL

July 08, 2015

Idaliz Baez demonstrates how to use the Progress DataDirect driver for SparkSQL for real-time analytics and BI with SparkSQL.

Back in May, I hosted a webinar focused on driving real-time analytics with SparkSQL and our new Progress® DataDirect® for SparkSQL driver. As a part of my presentation, I demonstrated just how easily our SparkSQL driver can execute complicated queries for fast BI analytics.

The demo went over really well, but if you’re anything like me, just watching it won’t be enough: you'll want to get your hands dirty. So this week, I am detailing the steps I took so you can recreate the demo yourself.

How to Use the DataDirect Driver for SparkSQL to Drive Analytics

Step 1: Download Hortonworks 2.2 Sandbox and Load Data

The first step is to watch and follow Gordon Crenshaw’s video on setting up your Hortonworks Hadoop environment. The video walks you through downloading and installing the sandbox, creating a database, and uploading data.



The Star Wars survey data I used in my demo is available on GitHub, but you could run this demo with just about any data you have lying around.

Step 2: Download and Configure the Spark Software

Next, you’ll need to get Spark and SparkSQL up and running on the Hortonworks Hadoop environment you set up in the previous step. The following video will guide you through the process:


Step 3: Install and Configure the ODBC/JDBC Driver on Windows

Now watch either the ODBC or the JDBC installation and configuration video, depending on which kind of connector you want to use to access your data.

I used the Progress® DataDirect® ODBC driver to connect to Tableau, but feel free to use a JDBC connector or a different BI tool if you prefer. At the end of each video, Gordon walks you through establishing a connection to SparkSQL.
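If you went the ODBC route and want to confirm the DSN works before opening a BI tool, a short script can do it. The following is only a rough sketch, not part of the original demo: it assumes the third-party pyodbc package is installed, and the DSN name, credentials, and table name are hypothetical placeholders you would swap for your own.

```python
def build_connection_string(dsn, user=None, password=None):
    """Assemble an ODBC connection string for an already-configured DSN."""
    parts = ["DSN=" + dsn]
    if user is not None:
        parts.append("UID=" + user)
    if password is not None:
        parts.append("PWD=" + password)
    return ";".join(parts)


def preview_table(connection_string, table, limit=5):
    """Fetch the first few rows of a table through the ODBC DSN.

    The table name is interpolated directly into the SQL text, so only
    pass names you trust.
    """
    import pyodbc  # third-party package (pip install pyodbc), imported lazily

    # autocommit=True so pyodbc does not try to manage transactions
    connection = pyodbc.connect(connection_string, autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute("SELECT * FROM {} LIMIT {}".format(table, int(limit)))
        return cursor.fetchall()
    finally:
        connection.close()


# Example usage (hypothetical DSN, credentials, and table name):
# conn_str = build_connection_string("SparkSQL_Demo", "my_user", "my_password")
# for row in preview_table(conn_str, "starwars_survey"):
#     print(row)
```

If the preview returns rows, the DSN is ready to be selected from your BI tool in the next step.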



Step 4: Use a BI Tool to Connect to Your Star Wars Data on Hadoop

Finally, all you have to do is point your BI tool to the SparkSQL data source you configured in the previous steps. For Tableau, this is simple: open the application, choose “Other Databases (ODBC)” from the menu on the left side of the screen, and select the DSN you created in the earlier steps. Then just click “connect” and log in!

Figure 2: Select the DSN you configured in previous steps.

May the BI be with you

At this point, you shouldn’t have any questions about how to access your Big Data sources for BI and analytics. If you still do, though, don’t worry! You can always reach out to us or leave a comment below and we will do everything we can to help!

Idaliz Baez

Idaliz is a Sales Engineer with Progress. After receiving her undergraduate degree in Civil and Environmental Engineering from Duke University, she spent a year at NASA Goddard Space Flight Center gaining on-the-job experience before returning to Duke to pursue her Master of Engineering Management degree.
