Michael Coutsoftides, Principal Solutions Engineer at Progress DataDirect
Michael Coutsoftides discusses the applications of Apache Spark and why you should pay attention to “the most important new open source project in a decade.”
Apache Spark has come a long way from its humble beginnings, becoming one of the most important technologies in the Big Data space.
It’s so big that even Big Blue is taking notice.
Why? Because Apache Spark promises to increase performance by up to 100 times for certain applications.
On June 15, 2015, IBM announced a major commitment to Apache Spark, calling it “potentially the most important new open source project in a decade.”
IBM has committed to:
IBM presented a number of compelling use cases illustrating how Spark is transforming business and driving innovation:
Given my inner geek, I was most excited about the partnership between IBM, NASA and SETI. Carl Sagan fans may recall that the SETI Institute is an organization whose mission is to “explore, understand and explain the origin, nature and prevalence of life in the universe.” For more than 35 years, they have tirelessly searched the cosmos for signs of intelligent life. Hopefully, that search is going to get easier now that SETI is using Spark running on IBM Bluemix to help with the hunt.
As you read this, Spark is analyzing over 100 million radio signals, collected by the Allen Telescope Array. SETI is using Spark to analyze signals to see if they come from the same location, even if signals are spread out over a period of years or the signal composition is different. It’s a bit like searching for a needle in a billion haystacks, but the machine-learning capabilities of Spark will act like a high-powered magnet—pulling out the important data.
Thanks to Spark, we may soon learn the answer to one of life’s biggest questions: Are we alone in the universe?
Spark is useful for much more than looking for aliens, however. (It can also be used to analyze data about fictional aliens!) Any kind of analytics process can benefit from Spark’s high performance. SparkSQL makes it very easy to connect to existing business intelligence (BI) and analytics software like Tableau or Microsoft’s Power BI using ODBC or JDBC data connectivity. To make the most of this connection, you need a high-performance driver from Progress® DataDirect®.
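To make the JDBC side of that a little more concrete, here is a minimal sketch of the connection URL a BI tool or script would point at. It assumes the open-source Spark Thrift JDBC/ODBC server, which speaks the HiveServer2 protocol and listens on port 10000 by default. The class name, function name and host below are hypothetical, chosen only for illustration, and a commercial driver will typically define its own URL format, which is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThriftServerTarget:
    """Hypothetical description of a Spark Thrift JDBC/ODBC server endpoint."""
    host: str
    port: int = 10000           # default port of the open-source Thrift server
    database: str = "default"   # default database name


def build_jdbc_url(target: ThriftServerTarget) -> str:
    """Return a HiveServer2-style JDBC URL for the given endpoint."""
    return f"jdbc:hive2://{target.host}:{target.port}/{target.database}"


if __name__ == "__main__":
    # "spark-gateway.example.com" is a placeholder host, not a real server.
    print(build_jdbc_url(ThriftServerTarget(host="spark-gateway.example.com")))
```

A JDBC-capable tool would take a URL of this shape, plus credentials and a driver, and then issue ordinary SQL against the tables Spark exposes.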
On June 2, DataDirect, the leader in ODBC and JDBC connectivity across relational, NoSQL, Big Data and SaaS application access, announced the release of our enterprise-class SparkSQL driver. Our drivers enable you to fully leverage the speed of Apache Spark for the fastest Big Data analytics possible. Don’t hesitate—you can get a free trial of our SparkSQL driver today!
As a Principal Solutions Engineer at Progress DataDirect, Michael eats, sleeps and breathes data connectivity. He is dedicated to developing and implementing proven, high-performance data connectivity solutions, empowering enterprises to better manage and integrate data across Big Data, Cloud and Relational data sources. Follow him at https://twitter.com/DataSherpa