Import SQL Server data into Apache Solr using Data Import Handler via JDBC

Introduction

Apache Solr is an open source enterprise search engine for data stored in HDFS which enables organizations to perform full-text search and real-time indexing. Many applications store data in Relational databases, and searching the data is a common use case that you'll have to deal with. You can search the data in Relational or structured databases by importing the data using Apache Solr’s Data Import Handler. The Data Import handler is a way of importing data from a database using JDBC drivers and indexing it. To help you get started, we put together this tutorial on how you can import data from SQL Server in to Apache Solr for indexing using Data Import Handler via JDBC.

Prerequisites

The tutorial will assume that you are using a Linux machine. You can follow the tutorial even if you are using a Windows machine and it can give you an idea on how to import SQL Server data in to Apache Solr.

This tutorial also assumes that you have a Progress DataDirect JDBC driver for SQL Server. If you do not yet have that driver installed, please follow these steps:

  1. Download Progress DataDirect SQL Server JDBC driver on to your machine.
  2. Install SQL Server JDBC driver by running the below command in the downloaded location.

    java -jar PROGRESS_DATADIRECT_JDBC_SQLSERVER_ALL.jar -i console


  3. This will launch the installer in your console. It should be a fairly straight forward install, but if you need help with installation, you can visit this documentation page.

Download and Configure Apache Solr

  1. Download Apache Solr and extract the files from downloaded package
  2. Start Apache Solr by running the following command in your terminal at the location where you have extracted Apache Solr.

    bin/solr start


  3. Once Solr has started running, create a new core by running the below command.

    bin/solr create -c your_corename


  4. Once the Solr core has been created, shutdown the Solr by running the following command.

    bin/solr stop


  5. Create a new folder lib in the directory /path-to/solr-6.X.X/contrib/dataimporthandler/
  6. Copy the Progress DataDirect SQL Server JDBC driver from your install directory to the above lib folder using the following command:

    cp /path-to/
        Progress/DataDirect/Connect_for_JDBC_51/lib/sqlserver.jar
        /path-to/ solr-
        6.6.0/contrib/dataimporthandler/lib/sqlserver.jar


  7. Go to the path /path-to/solr-6.6.0/server/solr/your_corename/conf. Note that your_corename is the same name that you have provided to create a new core in Step 6.
  8. Add the following configuration lines to solrconfig.xml which will instruct Solr to load data import handler and SQL Server JDBC driver jar files.

    <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
        <lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler/lib" regex=".*\.jar" />


  9. Add the following configuration lines to solrconfig.xml which will instruct Solr to register the Data Import handler.

    <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
        <lst name="defaults">
        <str name="config">/path-to/solr-6.6.0/server/solr/your_corename/conf/data-config.xml</str>
        </lst>
        </requestHandler>

    The only parameter that’s required is the ‘config’ which specifies the location where Data Import Handler configuration file. This file contains details on how to connect to data source using JDBC, how to fetch the data, how to process it to generate Solr documents to be indexed.
  10. Go to the path /path-to/solr-6.6.0/server/solr//conf and create a new file data-config.xml. Add the following configuration to the file which has the details on how to connect to SQL Server using Progress DataDirect JDBC driver, which entity to get the data from specified by the query and fields that you are expecting in the query resultset.

    <dataConfig>
        <dataSource type="JdbcDataSource"
        driver="com.ddtek.jdbc.sqlserver.SQLServerDriver"
        url="jdbc:datadirect:sqlserver://<;Server1>;DatabaseName=<dbname>"
        user="<user>"
        password="<pass>"/>
        <document>
        <entity name="Products" pk="ProductID" query="select * from Products">
        <field column="ProductID" name="ProductID"/>
        <field column="ProductName" name="ProductName"/>
        <field column="QuantityPerUnit" name="QuantityPerUnit"/>
        <field column="UnitPrice" name="UnitPrice"/>
        <field column="UnitsinStock" name="UnitsinStock"/>
        <field column="UnitsOnOrder" name="UnitsOnOrder"/>
        <field column="ReorderLevel" name="ReorderLevel"/>
        <field column="Discontinued" name="Discontinued"/>
        </entity>
        </document>
        </dataConfig>


    Note: Progress DataDirect SQL Server JDBC driver also supports NTLM and Kerberos Authentication, if you want to use other than SQL Server Auth mechanism.
  11. Go to the path /path-to/solr-6.6.0/server/solr//conf and open the file managed-schema.xml.
  12. Find the uniqueKey tag and change it to the primary key of the entity that you are querying on. In this case I have ProductID as my primary key for the table Products that I am trying to fetch, so I changed the value of the tag as below.

    <uniqueKey>ProductID</uniqueKey>


  13. Also, you would have to add schema of the table that you are trying to import as shown below to the managed-schema.xml

    <field name="ProductID" type="int" indexed="true" stored="true" required="true" multiValued="false"/>
        <field name="ProductName" type="string" indexed="true" stored="true" multiValued="false"/>
        <field name="SupplierID" type="int" indexed="true" stored="true" multiValued="false"/>
        <field name="CategoryID" type="int" indexed="true" stored="true" multiValued="false"/>
        <field name="QuantityPerUnit" type="string" indexed="true" stored="true" multiValued="false"/>
        <field name="UnitPrice" type="double" indexed="true" stored="true" multiValued="false"/>
        <field name="UnitsInStock" type="int" indexed="true" stored="true" multiValued="false"/>
        <field name="UnitsOnOrder" type="int" indexed="true" stored="true" multiValued="false"/>
        <field name="ReorderLevel" type="int" indexed="true" stored="true" multiValued="false"/>
        <field name="Discontinued" type="boolean" indexed="true" stored="true" multiValued="false"/>


  14. I have also uploaded the managed-schema file that I have used for your reference.

Importing the Data

  1. Start Apache Solr by running the below command in your terminal.

    bin/solr start


  2. Go to browser and open http://localhost:8983/solr to access Solr admin. Choose your Core as shown below.

  3. ApacheSolrDataDirect1


  4. You should now see a new menu. Choose Data Import from the menu and you should see a view as shown below. Click on the Execute button to start the data import.

  5. ApacheSolrDataDirect2


  6. The above execution will perform a full-import of the data using the query that we have defined in data-config.xml. If you want to perform a delta query, you would have to specify delta query in the data-config.xml and choose delta-import command in the above screen.
  7. Once the import has finished, you will see a message as shown below.

    ApacheSolrDataDirect4

  8. To check the imported data, go to Query option and click on Execute Query button as shown below.

  9. ApacheSolrDataDirect3

Conclusion

We hope this tutorial has helped you in getting started with Solr Data Import Handler to import data from SQL Server using the Progress DataDirect SQL Server JDBC driver.  Note that the Progress DataDirect SQL Server JDBC driver has full support for SQL Server Auth, NTLM And Kerberos Authentication, giving you the choice to connect to your server securely in the way you want.

You can follow similar procedure to import data from other data sources like Oracle, Salesforce, IBM DB2 using Progress DataDirect JDBC connectors in to Apache Solr.
JDBC TUTORIAL

Import SQL Server data into Apache Solr using Data Import Handler via JDBC

DataDirect
connectors

Connect any data source to any application

DataDirect
Tutorials

Discussions, tips and tricks for
DataDirect Connect drivers