Progress DataDirect for JDBC for Apache Spark SQL

    An asterisk (*) indicates support that was added in a hotfix or software patch subsequent to a release.

    Refer to the following resources for additional information:

    • Product Compatibility Guide: Provides the latest data source and platform support information. 
    • Fixes: Describes the issues resolved since general availability.  

    Version 6.0.1

    ENHANCEMENTS
    • The driver has been enhanced to comply with FIPS standards for data encryption. As part of this enhancement, the driver was tested with FIPS 140-3 enabled using Red Hat OpenJDK 21 on a Red Hat Universal Base Image 9 instance. Refer to FIPS (Federal Information Processing Standard) for details.*
      Available: 1/31/2025 | 6.0.1.001186
    • The driver has been enhanced to allow you to override the default value of the User-Agent header when required by a service. Using the new UserAgent property, you can specify the string value of the User-Agent header to be used in HTTP requests. For details, refer to UserAgent.
      Available: May 2023 | 6.0.1.000623
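      For example, a minimal sketch of a connection URL that overrides the header (the server, port, and header value are placeholders, and the URL prefix assumes the usual DataDirect convention for this driver):

        jdbc:datadirect:sparksql://myserver:10000;UserAgent=MyApp/1.0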
    • The driver has been enhanced to include timestamps in the Spy and JDBC packet logs by default. If required, you can disable timestamp logging at connection: for Spy logs, set spyAttributes=(log=(file)Spy.log;timestamp=no); for JDBC packet logs, set ddtdbg.ProtocolTraceShowTime=false.
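      For example (the server, port, and application name are placeholders; passing ddtdbg.ProtocolTraceShowTime as a JVM system property follows the usual convention for DataDirect debug settings and is an assumption here):

        jdbc:datadirect:sparksql://myserver:10000;spyAttributes=(log=(file)Spy.log;timestamp=no)
        java -Dddtdbg.ProtocolTraceShowTime=false MyApplication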
    • Interactive SQL for JDBC (JDBCISQL) is now installed with the product. JDBCISQL is a command-line interface that supports connecting your driver to a data source, executing SQL statements and retrieving results in a terminal. This tool provides a method to quickly test your drivers in an environment that does not support GUIs.*
    • The driver has been enhanced to support the Statement.cancel API, which allows you to cancel running queries. The Statement.cancel API is supported only on Apache Spark SQL 2.0 and higher.*
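      A minimal sketch of canceling a query from a watchdog thread, assuming an open Connection con on Apache Spark SQL 2.0 or higher; the table name and the 30-second budget are placeholders:

        import java.sql.Connection;
        import java.sql.ResultSet;
        import java.sql.SQLException;
        import java.sql.Statement;

        public class CancelExample {
            // Cancels a long-running query from a second thread.
            static void runWithCancel(Connection con) throws SQLException {
                Statement stmt = con.createStatement();
                Thread watchdog = new Thread(() -> {
                    try {
                        Thread.sleep(30_000);   // give the query 30 seconds
                        stmt.cancel();          // then cancel it
                    } catch (InterruptedException | SQLException ignored) { }
                });
                watchdog.start();
                try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                    while (rs.next()) { /* process each row */ }
                } catch (SQLException e) {
                    // A canceled statement typically surfaces as an SQLException here.
                } finally {
                    stmt.close();
                }
            }
        }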
    • The driver has been enhanced to support the Binary data type for Apache Spark SQL 2.0 and higher, including the following two new connection properties:*
      • MaxBinarySize allows you to specify the maximum length of fields of the Binary data type that the driver describes through result set descriptions and metadata methods.
      • BinaryDescribeType allows you to specify whether Binary columns are described as VARBINARY or LONGVARBINARY.
      For details, refer to MaxBinarySize and BinaryDescribeType.
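      A minimal sketch of setting both properties through java.util.Properties before connecting (the values are illustrative, not defaults, and the exact accepted strings for BinaryDescribeType are assumptions based on the description above):

        java.util.Properties props = new java.util.Properties();
        props.setProperty("MaxBinarySize", "32768");          // cap described Binary length at 32 KB
        props.setProperty("BinaryDescribeType", "varbinary"); // describe Binary columns as VARBINARY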
    • The driver has been enhanced to support HTTP mode, which allows you to access Apache Spark SQL data stores using HTTP/HTTPS requests. HTTP mode can be configured using the new TransportMode and HTTPPath connection properties. For details, refer to TransportMode and HTTPPath.*
    • The driver has been enhanced to support cookie-based authentication for HTTP connections. Cookie-based authentication can be configured using the new EnableCookieAuthentication and CookieName connection properties; a combined example appears below. For details, refer to EnableCookieAuthentication and CookieName.*
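      A combined sketch of an HTTP-mode connection URL with cookie authentication (the server, port, path, and cookie name are placeholders; cliservice and hive.server2.auth are common Spark Thrift server values, not confirmed driver defaults):

        jdbc:datadirect:sparksql://myserver:10001;TransportMode=http;HTTPPath=cliservice;EnableCookieAuthentication=true;CookieName=hive.server2.auth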
    • The driver has been enhanced to support the Decimal and Varchar data types. For details, refer to Data Types.
    • The ArrayFetchSize connection property has been added to the driver to improve performance and reduce out-of-memory errors. ArrayFetchSize can be used to increase throughput or, alternatively, to improve response time in Web-based applications. For details, refer to ArrayFetchSize.
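      For example, an illustrative connection URL that raises the fetch size (the value 1000 is a placeholder, assuming the property takes a row count per network round trip, as its name suggests):

        jdbc:datadirect:sparksql://myserver:10000;ArrayFetchSize=1000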
    CHANGED BEHAVIOR
    • The installer program now requires a JRE, Java SE 11 or higher, to be installed in your environment before you run the installer. In earlier versions, the JRE used by the installer program was included in the product. However, to avoid potential security vulnerabilities, the installer program no longer includes a JRE; instead, it uses the JRE in your environment, allowing the most secure available version of a JRE to be used.*
      Note: This change does not affect the JVM requirements for the driver. For the latest driver requirements, refer to the Product Compatibility Guide.
      Available: 7/3/2024
    • The driver no longer registers the Statement Pool Monitor as a JMX MBean by default. To register the Statement Pool Monitor and manage statement pooling with standard JMX API calls, the new RegisterStatementPoolMonitorMBean connection property must be set to true. For details, refer to RegisterStatementPoolMonitorMBean.
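      For example, an illustrative connection URL that restores the earlier registration behavior (server and port are placeholders):

        jdbc:datadirect:sparksql://myserver:10000;RegisterStatementPoolMonitorMBean=true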
    NOTES, KNOWN ISSUES, AND LIMITATIONS
    • When returning result set metadata for Varchar columns, the Spark Thrift server reports the column type as (12) STRING and the precision as 2147483647. For the latest information about this issue, refer to the Apache JIRA SPARK-5918 issue Web page: https://issues.apache.org/jira/browse/SPARK-5918
    • For Spark SQL versions 1.5 and earlier, the Spark Thrift server supports only a single connection from a single application (client) per instance. Multiple connections can result in a number of unexpected behaviors, including corrupted data or a crashed Thrift server. While Spark SQL 1.4 resolved some issues related to this limitation, not all of the multi-connection issues are expected to be fixed until Spark SQL 1.6 or later. Refer to https://github.com/apache/spark/pull/8909 for more information.
    • For Spark SQL version 1.4, the rand() function is not supported. If the rand() function is used, the driver throws the following exception: org.apache.spark.sql.AnalysisException: For input string: "TOK_TABLE_OR_COL";
    • For Spark SQL version 1.5, the Spark Thrift server returns an incorrect value (-17) for the hour() function.
    • The driver supports Spark SQL and the core SQL grammar (primarily SQL-92). For information about how the driver handles SQL queries, see the "Supported SQL Functionality" topic in the PROGRESS DATADIRECT FOR JDBC FOR APACHE SPARK SQL DRIVER USER'S GUIDE. Note that Spark SQL implements a subset of SQL and HiveQL; while it provides much of the functionality of SQL, that subset has some differences and limitations. For the latest information, refer to the "Spark SQL and DataFrame Guide": http://spark.apache.org/docs/latest/sql-programming-guide.html
    • Spark SQL currently supports most Hive features and additional features are continuously being added. For the latest information, refer to the "Compatibility with Apache Hive" section of the "Spark SQL and DataFrame Guide": https://spark.apache.org/docs/latest/sql-programming-guide.html#compatibility-with-apache-hive
    • Apache Spark SQL does not support transactions, and by default, the driver reports that transactions are not supported. However, some applications will not operate with a driver that reports transactions are not supported. The TransactionMode connection property allows you to configure the driver to report that it supports transactions. In this mode, the driver ignores requests to enter manual commit mode, start a transaction, or commit a transaction, and returns success for those requests. Requests to roll back a transaction return an error regardless of the transaction mode specified.
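      For example, an illustrative connection URL (the value ignore follows DataDirect's convention for the equivalent property in similar drivers and is an assumption here):

        jdbc:datadirect:sparksql://myserver:10000;TransactionMode=ignore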
    • For UNIX/Linux users: If you receive an error message when executing any DataDirect for JDBC shell script, make sure that the file has EXECUTE permission. To do this, use the chmod command. For example, to grant EXECUTE permission to the testforjdbc.sh file, change to the directory containing testforjdbc.sh and enter: chmod +x testforjdbc.sh
    • The driver allows PreparedStatement.setXXX methods and ResultSet.getXXX methods on Clob data types, in addition to the functionality described in the JDBC specification. The supported conversions typically are the same as those for LONGVARCHAR, except where limited by database support.
    • The Performance Tuning Wizard is not available with the Apache Spark SQL driver.
    • Internet Explorer with the Google Toolbar installed sometimes displays the following error when the browser is closed: "An error has occurred in the script on this page." This is a known issue with the Google Toolbar and has been reported to Google; the error may display when you close the driver's help system.

    Version 6.0.0

    GA Release Features
    • Supports read-write access to Apache Spark SQL
    • Supports SSL data encryption
    • Supports Kerberos authentication
    • Supports connection pooling
    • Returns result set metadata for parameterized statements that have been prepared but not yet executed
    • Includes a set of timeout connection properties which allow you to limit the duration of active sessions and how long the driver waits to establish a connection before timing out
    • Includes the TransactionMode connection property which allows you to configure the driver to report that it supports transactions, although Spark SQL does not support transactions. This provides a workaround for applications that do not operate with a driver that reports transactions are not supported.
