Access ODBC Data Sources in Jupyter Python Notebook Example

Jupyter Notebook is a web-based interactive application that enables users to create notebook documents that feature live code, interactive plots, widgets, equations, images etc., and share these documents easily. It is also quite versatile as it can support many programming languages via kernels such as Julia, Python, Ruby, Scala, Haskell and R.

How can users connect to their databases using standards-based connectivity like ODBC or JDBC from various applications? In this tutorial, we will walk through on how you can query your Oracle database using Progress DataDirect Oracle ODBC driver. Note that you can use similar procedure to query/access your other databases using Progress DataDirect suite of ODBC drivers which includes drivers for Relational, Big Data, NoSQL and Cloud data sources.

Install Jupyter through Anaconda which will also install Python for you.
Install unixODBC by running the following command on your terminal.

sudo apt-get install unixodbc-dev unixodbc-bin unixodbc
Install Python module pyodbc that helps you access the ODBC databases from Python simple. Run the following command to install it.

pip install pyodbc

Download DataDirect Oracle ODBC driver.
Extract files from the downloaded package to a new directory by running following command.

tar -xvf PROGRESS_DATADIRECT_ODBC_ORACLE_LINUX_64.tar.Z -C /path/to /directory/
Install ksh interpreter by running the following command on your terminal.

sudo apt-get install ksh
Go to the directory where you have extracted in the previous step and run the following command to start the installation

./unixmi.ksh
Accept the license terms and when prompted for a Key and serial no, enter EVAL for both of them to start your 15-day trial of the Oracle ODBC driver.
Choose Oracle Wire Protocol as the driver that you would like to install when the installer asks for the driver that you would like to install and continue to finish the installation.
After the installation is complete, go to the path, /path/to/Progress/DataDirect/Connect64_for_ODBC_71 and run the shell script odbc.sh or odbc.sh to add new environment variables (ODBCINI, ODBCINST, LD_LIBRARY_PATH) to your machine.

Go to the path, /path/to/Progress/DataDirect/Connect64_for_ODBC_71 and you should see a file odbc.ini where you can configure a data source for your oracle database.
Open the odbc.ini file and you should see there should see that it has a template for a data source called Oracle Wire Protocol that is using DataDirect 7.1 Oracle Wire protocol as the driver.
For the purpose of this tutorial, lets edit the template to configure the connection to your oracle data source. Following is an example of the configuration that I have done to connect to my Oracle database. I have highlighted the properties that I have modified.

[Oracle Wire Protocol]

Driver=/home/progress/Progress/DataDirect/Connect64_for_ODBC_71/lib/ddora27.so

Description=DataDirect 7.1 Oracle Wire Protocol

AlternateServers=

ApplicationUsingThreads=1

AccountingInfo=

Action=

ApplicationName=

ArraySize=60000

AuthenticationMethod=1

BulkBinaryThreshold=32

BulkCharacterThreshold=-1

BulkLoadBatchSize=1024

BulkLoadFieldDelimiter=

BulkLoadRecordDelimiter=

CachedCursorLimit=32

CachedDescLimit=0

CatalogIncludesSynonyms=1

CatalogOptions=0

ClientHostName=

ClientID=

ClientUser=

ConnectionReset=0

ConnectionRetryCount=0

ConnectionRetryDelay=3

DataIntegrityLevel=0

DataIntegrityTypes=MD5,SHA1

DefaultLongDataBuffLen=1024

DescribeAtPrepare=0

EditionName=

EnableBulkLoad=0

EnableDescribeParam=0

EnableNcharSupport=0

EnableScrollableCursors=1

EnableStaticCursorsForLongData=0

EnableTimestampWithTimeZone=0

EncryptionLevel=0

EncryptionMethod=0

EncryptionTypes=AES128,AES192,AES256,DES,3DES112,3DES168,RC4_40,RC4_56,RC4_128,RC4_256

FailoverGranularity=0

FailoverMode=0

FailoverPreconnect=0

FetchTSWTZasTimestamp=0

GSSClient=native

HostName=<Host name of your database>

HostNameInCertificate=

InitializationString=

KeyPassword=

KeyStore=

KeyStorePassword=

LoadBalanceTimeout=0

LoadBalancing=0

LocalTimeZoneOffset=

LockTimeOut=-1

LoginTimeout=15

LogonID=<LoginID>

MaxPoolSize=100

MinPoolSize=0

Module=

Password=<password>

Pooling=0

PortNumber=<Port Number>

ProcedureRetResults=0

ProgramID=

QueryTimeout=0

ReportCodePageConversionErrors=0

ReportRecycleBin=0

ServerName= <Remove this line>

ServerType=0

ServiceName=

SID=XE

TimestampeEscapeMapping=0

TNSNamesFile=<tnsnames.ora_filename>

TrustStore=

TrustStorePassword=

UseCurrentSchema=1

ValidateServerCertificate=1

WireProtocolMode=2
Save the odbc.ini file and close the editor.

Before you start the Jupyter application, make sure that you have the ODBCINI, ODBCINST and LD_LIBRARY_PATH in your environment variables. To check it, run the command env or echo $ODBCINI or echo $ODBCINST or echo $LD_LIBRARY_PATH
Start Jupyter by running the command jupyter notebook. You should see browser open up and load the homepage for Jupyter.
Under Files tab, go to New -> Python Notebook (default or conda root), which will open a new notebook.
To connect to your database and fetch some data, run the following sample script.

##Importing pyodbc module

import pyodbc

##connecting to your database through ODBC

cnxn = pyodbc.connect('DSN=Oracle Wire Protocol;UID=<user>;PWD=<password>', autocommit=True)

cursor = cnxn.cursor()

##Executing a simple query and printing the records

cursor.execute("SELECT * FROM GDP ORDER BY YEAR")

for row in cursor.fetchall():

print (row)

In the previous section you have seen how you can connect to a database from Jupyter Python notebook through ODBC, now let’s apply that to create interactive plots. Note that this section is purely optional.
Before you proceed any further, install the Plotly graphing library for Python that helps you create plots and also register for a Plotly account to get the API key for using their API. Run the following command to install Plotly.

pip install plotly
Following is a sample script that I wrote in Python notebook which fetches the GDP data from Oracle database through DataDirect Oracle ODBC driver and generates a line plot which visualizes the GDP for US, UK, China, India and World from the year 2000.

##importing modules

import pyodbc;

import plotly.plotly as py

import plotly.graph_objs as go

##Signing into Plotly to use the API

py.sign_in(username='saiteja09', api_key='9aoJRKWK4z8NYoTpb2Tv');

##Establishing connection

cnxn = pyodbc.connect('DSN=Oracle Wire Protocol;UID=pets;PWD=progress', autocommit=True)

cursor = cnxn.cursor()

years = []

ind_gdp = []

wor_gdp = []

us_gdp = []

chn_gdp = []

uk_gdp = []

##Executing query and appending the data in to respective lists

cursor.execute("SELECT * FROM GDP ORDER BY YEAR")

for row in cursor.fetchall():

    years.append(row[4])

    ind_gdp.append(row[1])

    wor_gdp.append(row[2])

    us_gdp.append(row[3])

    chn_gdp.append(row[5])

    uk_gdp.append(row[6])

##Setting X and Y axis, and the name of trace

trace0 = go.Scatter(x=years, y=ind_gdp,name='India')

trace1 = go.Scatter(x=years, y=wor_gdp,name='World')

trace2 = go.Scatter(x=years, y=us_gdp,name='US')

trace3 = go.Scatter(x=years, y=chn_gdp,name='China')

trace4 = go.Scatter(x=years, y=uk_gdp,name='UK')

##The Plot

data = [trace0, trace1, trace2, trace3, trace4]

py.iplot(data, filename='basic-line-plot')
Here is the screenshot of the plot that above script generated in my Python Notebook.

GDP for US, UK, China, India and World from the year 2000

GDP Plot for various countries

We hope this tutorial helped you to understand on how you can connect to your database using Progress DataDirect Oracle ODBC driver and visualize the data in your Jupyter Notebooks. Feel free to try other Progress DataDirect ODBC drivers for your data analysis and visualization purposes on your Jupyter notebooks. Also learn about how DataDirect drivers are more viable in than open source drivers with regards to licensing complexity, open source risk, legal risk, product quality and support.

Note: Progress DataDirect ODBC drivers are not certified with the unixODBC Driver Manager and reported issues must be reproducible with a supported version of the DataDirect ODBC Driver Manager for Unix/Linux.

Access ODBC Data Sources in Jupyter Python Notebook

Introduction

Prerequisites

Install Progress DataDirect Oracle ODBC Driver

Configuring ODBC Connection for Oracle:

Querying Data from Python Notebook

Visualizing Data in Python Notebook

Next Steps

Contents

Connect any application to any data source anywhere

Need additional help with your product?