Access ODBC Data Sources in Jupyter Python Notebook

Introduction

Jupyter Notebook is a web-based interactive application that enables users to create notebook documents that feature live code, interactive plots, widgets, equations, images etc., and share these documents easily. It is also quite versatile as it can support many programming languages via kernels such as Julia, Python, Ruby, Scala, Haskell and R.

How can users connect to their databases using standards-based connectivity like ODBC or JDBC from various applications? In this tutorial, we will walk through on how you can query your Oracle database using Progress DataDirect Oracle ODBC driver. Note that you can use similar procedure to query/access your other databases using Progress DataDirect suite of ODBC drivers which includes drivers for Relational, Big Data, NoSQL and Cloud data sources.

Prerequisites

  1. Install Jupyter through Anaconda which will also install Python for you.
  2. Install unixODBC by running the following command on your terminal.                                                                                                                                                                        
    sudo apt-get install unixodbc-dev unixodbc-bin unixodbc
                                                                                                                                                 
  3. Install Python module pyodbc that helps you access the ODBC databases from Python simple. Run the following command to install it.  
pip install pyodbc



Install Progress DataDirect Oracle ODBC Driver

  1. Download DataDirect Oracle ODBC driver.
  2. Extract files from the downloaded package to a new directory by running following command.

     

    tar -xvf PROGRESS_DATADIRECT_ODBC_ORACLE_LINUX_64.tar.Z -C /path/to /directory/

     

  3. Install ksh interpreter by running the following command on your terminal.

     

    sudo apt-get install ksh

     

  4. Go to the directory where you have extracted in the previous step and run the following command to start the installation

     

    ./unixmi.ksh

     

  5. Accept the license terms and when prompted for a Key and serial no, enter EVAL for both of them to start your 15-day trial of the Oracle ODBC driver.
  6. Choose Oracle Wire Protocol as the driver that you would like to install when the installer asks for the driver that you would like to install and continue to finish the installation.
  7. After the installation is complete, go to the path, /path/to/Progress/DataDirect/Connect64_for_ODBC_71 and run the shell script odbc.sh or odbc.sh to add new environment variables (ODBCINI, ODBCINST, LD_LIBRARY_PATH) to your machine.

Configuring ODBC Connection for Oracle:

  1. Go to the path, /path/to/Progress/DataDirect/Connect64_for_ODBC_71 and you should see a file odbc.ini where you can configure a data source for your oracle database.
  2. Open the odbc.ini file and you should see there should see that it has a template for a data source called Oracle Wire Protocol that is using DataDirect 7.1 Oracle Wire protocol as the driver.
  3. For the purpose of this tutorial, lets edit the template to configure the connection to your oracle data source. Following is an example of the configuration that I have done to connect to my Oracle database. I have highlighted the properties that I have modified.                                                                                                                             
    [Oracle Wire Protocol]
    Driver=/home/progress/Progress/DataDirect/Connect64_for_ODBC_71/lib/ddora27.so
    Description=DataDirect 7.1 Oracle Wire Protocol
    AlternateServers=
    ApplicationUsingThreads=1
    AccountingInfo=
    Action=
    ApplicationName=
    ArraySize=60000
    AuthenticationMethod=1
    BulkBinaryThreshold=32
    BulkCharacterThreshold=-1
    BulkLoadBatchSize=1024
    BulkLoadFieldDelimiter=
    BulkLoadRecordDelimiter=
    CachedCursorLimit=32
    CachedDescLimit=0
    CatalogIncludesSynonyms=1
    CatalogOptions=0
    ClientHostName=
    ClientID=
    ClientUser=
    ConnectionReset=0
    ConnectionRetryCount=0
    ConnectionRetryDelay=3
    DataIntegrityLevel=0
    DataIntegrityTypes=MD5,SHA1
    DefaultLongDataBuffLen=1024
    DescribeAtPrepare=0
    EditionName=
    EnableBulkLoad=0
    EnableDescribeParam=0
    EnableNcharSupport=0
    EnableScrollableCursors=1
    EnableStaticCursorsForLongData=0
    EnableTimestampWithTimeZone=0
    EncryptionLevel=0
    EncryptionMethod=0
    EncryptionTypes=AES128,AES192,AES256,DES,3DES112,3DES168,RC4_40,RC4_56,RC4_128,RC4_256
    FailoverGranularity=0
    FailoverMode=0
    FailoverPreconnect=0
    FetchTSWTZasTimestamp=0
    GSSClient=native
    HostName=<Host name of your database>
    HostNameInCertificate=
    InitializationString=
    KeyPassword=
    KeyStore=
    KeyStorePassword=
    LoadBalanceTimeout=0
    LoadBalancing=0
    LocalTimeZoneOffset=
    LockTimeOut=-1
    LoginTimeout=15
    LogonID=<LoginID>
    MaxPoolSize=100
    MinPoolSize=0
    Module=
    Password=<password>
    Pooling=0
    PortNumber=<Port Number>
    ProcedureRetResults=0
    ProgramID=
    QueryTimeout=0
    ReportCodePageConversionErrors=0
    ReportRecycleBin=0
    ServerName= <Remove this line>
    ServerType=0
    ServiceName=
    SID=XE
    TimestampeEscapeMapping=0
    TNSNamesFile=<tnsnames.ora_filename>
    TrustStore=
    TrustStorePassword=
    UseCurrentSchema=1
    ValidateServerCertificate=1
    WireProtocolMode=2

     

  4. Save the odbc.ini file and close the editor.

Querying Data from Python Notebook

  1. Before you start the Jupyter application, make sure that you have the ODBCINI, ODBCINST and LD_LIBRARY_PATH in your environment variables. To check it, run the command env or echo $ODBCINI or echo $ODBCINST or echo $LD_LIBRARY_PATH
  2. Start Jupyter by running the command jupyter notebook. You should see browser open up and load the homepage for Jupyter.
  3. Under Files tab, go to New -> Python Notebook (default or conda root), which will open a new notebook.
  4. To connect to your database and fetch some data, run the following sample script.                                                                                                                                                     
    ##Importing pyodbc module
    import pyodbc
     
    ##connecting to your database through ODBC
    cnxn = pyodbc.connect('DSN=Oracle Wire Protocol;UID=<user>;PWD=<password>', autocommit=True)
    cursor = cnxn.cursor()
     
    ##Executing a simple query and printing the records
    cursor.execute("SELECT * FROM GDP ORDER BY YEAR")
    for row in cursor.fetchall():
        print (row)

Visualizing Data in Python Notebook

  1. In the previous section you have seen how you can connect to a database from Jupyter Python notebook through ODBC, now let’s apply that to create interactive plots. Note that this section is purely optional.
  2. Before you proceed any further, install the Plotly graphing library for Python that helps you create plots and also register for a Plotly account to get the API key for using their API. Run the following command to install Plotly.

     

    pip install plotly

     

  3. Following is a sample script that I wrote in Python notebook which fetches the GDP data from Oracle database through DataDirect Oracle ODBC driver and generates a line plot which visualizes the GDP for US, UK, China, India and World from the year 2000.                                                                                                                               
    ##importing modules
    import pyodbc;
    import plotly.plotly as py
    import plotly.graph_objs as go
     
    ##Signing into Plotly to use the API
    py.sign_in(username='saiteja09', api_key='9aoJRKWK4z8NYoTpb2Tv');
     
    ##Establishing connection
    cnxn = pyodbc.connect('DSN=Oracle Wire Protocol;UID=pets;PWD=progress', autocommit=True)
    cursor = cnxn.cursor()
     
    years = []
    ind_gdp = []
    wor_gdp = []
    us_gdp = []
    chn_gdp = []
    uk_gdp = []
     
    ##Executing query and appending the data in to respective lists
    cursor.execute("SELECT * FROM GDP ORDER BY YEAR")
    for row in cursor.fetchall():
        years.append(row[4])
        ind_gdp.append(row[1])
        wor_gdp.append(row[2])
        us_gdp.append(row[3])
        chn_gdp.append(row[5])
        uk_gdp.append(row[6])
     
    ##Setting X and Y axis, and the name of trace
    trace0 = go.Scatter(x=years, y=ind_gdp,name='India')
    trace1 = go.Scatter(x=years, y=wor_gdp,name='World')
    trace2 = go.Scatter(x=years, y=us_gdp,name='US')
    trace3 = go.Scatter(x=years, y=chn_gdp,name='China')
    trace4 = go.Scatter(x=years, y=uk_gdp,name='UK')
     
    ##The Plot
    data = [trace0, trace1, trace2, trace3, trace4]
    py.iplot(data, filename='basic-line-plot')

     

  4. Here is the screenshot of the plot that above script generated in my Python Notebook.

GDP for US, UK, China, India and World from the year 2000

GDP Plot for various countries

Next Steps

We hope this tutorial helped you to understand on how you can connect to your database using Progress DataDirect Oracle ODBC driver and visualize the data in your Jupyter Notebooks. Feel free to try other Progress DataDirect ODBC drivers for your data analysis and visualization purposes on your Jupyter notebooks. Also learn about how DataDirect drivers are more viable in than open source drivers with regards to licensing complexity, open source risk, legal risk, product quality and support.

Note: Progress DataDirect ODBC drivers are not certified with the unixODBC Driver Manager and reported issues must be reproducible with a supported version of the DataDirect ODBC Driver Manager for Unix/Linux.

ODBC TUTORIAL

Access ODBC Data Sources in Jupyter Python Notebook

jdbc

DataDirect
ODBC connectors

Real-Time Access to XML Data

community

DataDirect
ODBC Tutorials

Discussions, tips and tricks for
DataDirect Connect drivers