It can’t be overstated: data is everywhere. To put the vastness of data into perspective, if we burned all of the data created in a single day onto DVDs, the stack would reach to the moon and back… twice! Wrapping your mind around big data and how to harness it feels like a daunting, if not impossible, task. It’s good to know our team at Progress® DataDirect® has your back.
As the year comes to a close and we all look forward to warm fireplaces and relaxation, let’s take a moment to look at some of the amazing things that happened this past year in data connectivity. We want to make good on our promise to keep you up to date on the data world, so here’s a second look at our top 10 data connectivity insights of 2015.
As data volumes grow, performance and speed are essential to the success of businesses that analyze their data. For example, Hulu monitors customer behavior by letting customers choose the ads they are interested in. Macy’s app delivers targeted coupons based on in-store location. In both examples, these organizations were able to get a leg up on their competition by enabling faster data access and analytics.
Whether you are predicting weather patterns, optimizing transportation routes or even writing on a notebook, how fast you can mine, translate, transform and analyze your data is key. Jesse Davis made the case for high-performance connectivity earlier this year.
Tech-savvy companies like Amazon, Hulu and Apple are harnessing their customer data in data lakes. The data lake is a treasure trove of behavioral insights, so these companies are able to see and understand what their customers want. Times are changing: online businesses are at the forefront, and consumer data is increasing. Airbnb made over $10 million in bookings last year without a single storefront. Companies like Blockbuster failed because they did not adapt to the changing marketplace. Insights fished from data lakes drive better decisions and success.
Earlier this year, Mike Johnson told us how to get the most out of data lakes without falling in.
Too bad Sony Music didn’t have DataDirect. The company lost millions after the death of Michael Jackson because its site could not sustain the traffic and crashed. Wikipedia, Twitter, and AOL Instant Messenger also crashed upon the news of his death.
Adding more CPU or RAM won’t solve this bandwidth problem. The root of it lies in the connectivity and integration between the application and the database. Using high-performance bulk load, companies can meet almost any bulk data access requirement across a broad range of use cases. Systems engineer Manny Vergara outlined a potential solution that uses DataDirect wire technology.
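To make the bulk load idea concrete, here is a minimal sketch that contrasts row-by-row inserts with batched inserts. It is a generic illustration using Python’s built-in sqlite3 module, not DataDirect’s bulk load implementation; the `events` table, its columns, and the function names are assumptions made up for this example.

```python
import sqlite3


def make_connection():
    """Create an in-memory database with a hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    return conn


def load_row_by_row(conn, rows):
    """Insert one row per statement and commit after every row."""
    for row in rows:
        conn.execute("INSERT INTO events (id, payload) VALUES (?, ?)", row)
        conn.commit()


def load_in_bulk(conn, rows, batch_size=10_000):
    """Insert rows in batches and commit once at the end."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany(
            "INSERT INTO events (id, payload) VALUES (?, ?)", batch
        )
    conn.commit()


if __name__ == "__main__":
    sample = [(i, f"event-{i}") for i in range(50_000)]
    conn = make_connection()
    load_in_bulk(conn, sample)
    print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])
```

Both functions load the same data. The batched version sends far fewer statements and commits, which is the general reason bulk paths tend to outperform per-row inserts against a real, networked database.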
Apache Spark had humble beginnings but is becoming one of the most important technologies in big data. Why is it important? It can potentially speed up applications by more than 100x. It is currently used by Optibus to optimize transportation planning, Independence Blue Cross to analyze clinical and radiological data and yes, you guessed it, NASA uses it at the SETI Institute to analyze terabytes of complex, deep space radio signals in the search for extraterrestrial life. Michael Coutsoftides brought us the whole story in July.
Typical open source drivers take over six hours to load one million rows of data. Sumit Sarkar saw this as a challenge, and tweeted out that he would load one million rows of data live at Oracle OpenWorld 2015 in under ten minutes without S3 buckets. This caused a stir in the big data community, and many doubted he would come close to his goal. Against everyone’s expectations, Sumit not only achieved his goal but loaded every row in only eight minutes using the DataDirect Redshift driver!
Take the challenge yourself, or follow along with Sumit!
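For a sense of scale, the figures above can be turned into rough throughput numbers. This is back-of-the-envelope arithmetic only: it treats “over six hours” as exactly six hours, so the resulting speedup is a lower bound, and the variable names are invented for this sketch.

```python
ROWS = 1_000_000

# "Over six hours" is treated as exactly six hours (a conservative assumption).
BASELINE_SECONDS = 6 * 60 * 60
# The live demo finished in eight minutes.
DEMO_SECONDS = 8 * 60


def rows_per_second(rows, seconds):
    """Average load throughput in rows per second."""
    return rows / seconds


baseline_rate = rows_per_second(ROWS, BASELINE_SECONDS)
demo_rate = rows_per_second(ROWS, DEMO_SECONDS)
speedup = BASELINE_SECONDS / DEMO_SECONDS

print(f"Baseline: about {baseline_rate:.0f} rows per second")
print(f"Demo: about {demo_rate:.0f} rows per second")
print(f"Speedup: at least {speedup:.0f}x")
```

In other words, the eight-minute load works out to roughly 2,000 rows per second, at least 45 times the rate implied by a six-hour load.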
Fifteen years ago, companies would choose DB2 or SQL Server, a choice that effectively tied them to a single client library with no flexibility to use others. That lack of flexibility is unacceptable today. This led the enterprising engineers at Netflix to coin the term “polyglot” to refer to applications that, like theirs, use multiple data sources. Since then, retrieving data from multiple sources has become essential for every business, and doing it quickly and effectively has become a must. Mike Johnson explained how DataDirect enables polyglot persistence in November.
In a galaxy far, far away they only talk about “the Force,” but in our galaxy, there are many forces driving big data, including the IoT and large-scale web services like Amazon and eBay. Information is taken from these sources and usually stored in Hadoop clusters, where it can later be analyzed to provide valuable insights. Of course, this means nothing if you can’t leverage this huge amount of data. In May, Idaliz Baez shared a solution that is over 100 times faster than Hadoop MapReduce in memory and 10 times faster on disk.
Big data dilemmas are creating a battle between storage vendors, resulting in complicated choices for anyone trying to implement a data lake or big data strategy. There are so many data sources and new storage options (such as Hadoop) that it is infeasible to write and maintain an application that connects point to point to each available source.
The cloud solves this problem, creating a single data access point to be used securely by any BI tool quickly and easily. Mike Johnson showed us how to avoid picking sides in the database wars and leverage all that data, thanks to the cloud.
Drones are now used for oil rig inspections. Fitbits collect personal health data. Neural networks create models and deduce relationships without instruction. Predictive analytics track vast amounts of consumer data for targeted advertising.
The IoT creates huge amounts of data that flow into data lakes for analysis, but none of this matters if you can’t access your data. Last spring, Jeff Reser explained the top five reasons you should invest in data connectivity.
In a recent PaaS survey, only 10% of respondents use a single data source. 68% require multiple data sources to be integrated in at least half of their apps. For an average application, 81% of respondents need two or more data sources. Surprisingly, even with all of this investment, 61% of applications do not have full access to their data sources.
This discrepancy arises mostly from failed or poorly designed data integration. In February, Mark Troester offered a solution, along with a free whitepaper, “9 Essentials to Create Amazing Applications Faster.”
Austin is a content strategist, social promoter and marketer at Progress with a passion for technology, data visualization and music. He keeps up to date on the data connectivity industry and discusses related topics in a visually appealing, thorough and easily understandable way.