Connect to Salesforce from Azure Databricks

Introduction

Azure Databricks is a Spark-based analytics platform that lets you read data from multiple sources such as Azure Blob Storage, Azure Data Lake, and Azure SQL Database, and turn it into breakthrough insights using Spark. Beyond the sources you can connect to natively from Azure Databricks, there are external data sources such as Salesforce, Eloqua, IBM Db2, and Oracle that you may want to connect to in order to get better insights from data sitting in different silos.

You can connect to these external data sources using Progress DataDirect JDBC connectors. In this tutorial, we walk you through connecting to them from Azure Databricks. To get started, we show how to connect to Salesforce, but you can follow the same steps for Eloqua, IBM Db2, Oracle, and other DataDirect JDBC data sources.

Download and Install Progress DataDirect Salesforce JDBC Driver

  1. You will need to upload the Salesforce JDBC driver to Azure Databricks, but first you must install the driver on your local machine.
  2. Download the DataDirect Salesforce JDBC driver from the Progress DataDirect website.
  3. To install the driver, run the .jar package, either by executing the following command in a terminal or by double-clicking the .jar file.
    java -jar PROGRESS_DATADIRECT_JDBC_SF_ALL.jar
  4. This launches an interactive Java installer, which you can use to install the Salesforce JDBC driver to your desired location as either a licensed or evaluation installation.
  5. Note that this installs the Salesforce JDBC driver along with several other DataDirect drivers, for trial purposes, in the same folder.

Upload Salesforce JDBC Driver to Azure Databricks

  1. Navigate to the folder where you installed the Salesforce JDBC driver. By default, it is located at the path below.

    C:\Program Files\Progress\DataDirect\JDBC_60\lib\sforce.jar
  2. Go to Clusters and select the cluster where you will run your workloads. On that cluster's Libraries tab, click Install New, select the sforce.jar file from the path above, and click Install to install the driver on the cluster.

Connect to Salesforce

  1. Open a new notebook and verify that the JDBC driver can be loaded by running the following code.
    Class.forName("com.ddtek.jdbc.sforce.SForceDriver")


  2. Create the JDBC URL for connecting to Salesforce by running the code below, replacing the username, password, and (optional) security token with your own.


    val jdbcHostname = "login.salesforce.com"
    val jdbcSecurityToken = "Your Security Token"
     
    // Create the JDBC URL without passing in the user and password parameters.
    val jdbcUrl = s"jdbc:datadirect:sforce://${jdbcHostname};SecurityToken=${jdbcSecurityToken}"
     
    // Create a Properties() object to hold the parameters.
    import java.util.Properties
    val connectionProperties = new Properties()
     
    connectionProperties.put("user", "Your username")
    connectionProperties.put("password", "Your password")
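
Because the security token is only needed when your Salesforce org enforces it, you may prefer to build the URL conditionally. The helper below is a small sketch of that idea; `buildSforceUrl` is our own name, not part of the driver or Spark API.

```scala
// Hypothetical helper (not part of the driver API): build the
// DataDirect Salesforce JDBC URL, appending the SecurityToken
// option only when a token is actually supplied.
def buildSforceUrl(host: String, token: Option[String]): String = {
  val base = s"jdbc:datadirect:sforce://$host"
  token.fold(base)(t => s"$base;SecurityToken=$t")
}

buildSforceUrl("login.salesforce.com", None)
// → jdbc:datadirect:sforce://login.salesforce.com
```

With a token, `buildSforceUrl("login.salesforce.com", Some("token123"))` produces the same `;SecurityToken=` form shown in the code above.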



  3. Set the driver class name by running the following commands.
    val driverClass = "com.ddtek.jdbc.sforce.SForceDriver"
    connectionProperties.setProperty("Driver",driverClass)
  4. Now you can read data from Salesforce by running the code below, which reads the Opportunity table and prints its schema.

    val opportunity_table = spark.read.jdbc(jdbcUrl, "Opportunity", connectionProperties)
    opportunity_table.printSchema



  5. To get started, you can query the Opportunity table and display the results. For example, the following computes the average opportunity amount per fiscal quarter.

    display(opportunity_table.select("AMOUNT", "FISCALQUARTER").groupBy("FISCALQUARTER").avg("AMOUNT"))



  6. From here, you can apply further transformations, or connect to another data source, join the data, and transform the combined result.
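
As a sketch of that last step, you could read a second Salesforce table over the same connection and join it with the opportunity data. This assumes your org exposes the standard Account table and that the driver surfaces the uppercase column names seen above; it requires a live Salesforce connection, so treat it as illustrative rather than copy-paste ready.

```scala
// Sketch: read the Account table over the same JDBC connection,
// join it to the Opportunity data on the account id, and compute
// the average opportunity amount per account name.
val account_table = spark.read.jdbc(jdbcUrl, "Account", connectionProperties)

val avgAmountByAccount = opportunity_table
  .join(account_table, opportunity_table("ACCOUNTID") === account_table("ID"))
  .groupBy(account_table("NAME"))
  .avg("AMOUNT")

display(avgAmountByAccount)
```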

 

Feel free to try the Salesforce JDBC driver and other Progress DataDirect JDBC drivers and let us know if you have any questions.
