Hadoop has come a long way since Doug Cutting and Mike Cafarella began building it in the mid-2000s. The aim was an open-source framework that let users process large amounts of data across clusters of machines. That goal is now a reality: companies such as eBay, Twitter and Yahoo rely on Hadoop for core business activities, and vendors such as Hortonworks, co-founded in 2011 by Rob Bearden and Eric Baldeschwieler, have built businesses around it.
Software giants such as IBM, Oracle and Microsoft are also getting on the Hadoop bandwagon, developing tools to help enterprises manage their data. In fact, here at DataDirect, we just announced a preview of our ODBC Connect driver for Hadoop Hive, which enables reliable, secure and full-featured connectivity to many different distributions of Hadoop.
But what if you have so much data that it’s bigger than your ordinary Big Data? Wired reports that Facebook has developed two new software platforms that can help. The first is Corona, a program that will allow users to run tasks across multiple servers without the possibility of crashing an entire cluster. The second is Prism, which enables a user to run a Hadoop cluster so large that “it spans multiple data centers across the globe.”
The reason for this shift in software platforms is Facebook’s expansion to more than 900 million users, which makes it difficult to store the vast amount of user data collected daily. When it comes to accessing this data, current Hadoop installations are plagued by a “single point of failure” that makes it possible for an entire cluster to go down temporarily. Another challenge is that, given the number of photos, videos and comments left by 900 million Facebook members, the company will soon outgrow its cluster. The new software platforms address these challenges in different ways: Corona allows the use of multiple “job trackers” to manage a larger cluster of servers, and Prism eliminates the need to be tied to a single data center.
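To make the “single point of failure” idea concrete, here is a toy Python simulation. It is only an illustration of the general principle, not Facebook’s code: the `JobTracker` class, the two cluster-building functions and the job names are all hypothetical, and the per-job layout is a simplified reading of the multiple “job trackers” described above.

```python
from dataclasses import dataclass, field


@dataclass
class JobTracker:
    """Toy stand-in for a component that schedules and monitors jobs."""
    name: str
    jobs: list = field(default_factory=list)
    alive: bool = True

    def submit(self, job: str) -> None:
        self.jobs.append(job)

    def crash(self) -> None:
        self.alive = False

    def running_jobs(self) -> list:
        # A crashed tracker can no longer drive any of the jobs it owns.
        return list(self.jobs) if self.alive else []


def single_tracker_cluster(jobs):
    """One shared tracker owns every job in the cluster."""
    tracker = JobTracker("shared")
    for job in jobs:
        tracker.submit(job)
    return [tracker]


def per_job_tracker_cluster(jobs):
    """Simplified multi-tracker layout: each job gets its own tracker."""
    trackers = []
    for job in jobs:
        tracker = JobTracker(f"tracker-{job}")
        tracker.submit(job)
        trackers.append(tracker)
    return trackers


def surviving_jobs(trackers):
    """Collect the jobs that still have a live tracker."""
    return [job for t in trackers for job in t.running_jobs()]


jobs = ["photos", "comments", "videos"]

shared = single_tracker_cluster(jobs)
shared[0].crash()                  # the only tracker fails
print(surviving_jobs(shared))      # []

isolated = per_job_tracker_cluster(jobs)
isolated[0].crash()                # only the "photos" tracker fails
print(surviving_jobs(isolated))    # ['comments', 'videos']
```

With one shared tracker, a single crash stalls every job. With one tracker per job, the same crash costs only the job that tracker owned.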
It’s interesting to see how the Hadoop space has grown in such a short time: Hive, HBase, Hadapt, Pig, and now Corona and Prism. Hadoop is becoming a Lego-like space in which you can fully customize how your organization processes, stores, and analyzes Big Data. Here at DataDirect, we’re focusing on helping you access that data from anywhere. Want to get at your Hadoop data from your favorite analysis tool? Just plug in our ODBC driver and we’ll get you processing and analyzing that data in a way you are familiar with, using the tools and expertise you have today. Because the DataDirect ODBC driver for Hadoop Hive (launching in October) is a fully ODBC-compliant driver, you can just plug it in and continue to summarize, query, and analyze your data the way you always have!
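As a rough idea of what “plugging in” an ODBC driver can look like from a script, here is a minimal Python sketch using the third-party `pyodbc` package. The data source name `HiveDSN`, the user `analyst` and the table `page_views` are hypothetical placeholders rather than names defined by the driver, and the sketch assumes a DSN for Hive has already been configured in your ODBC driver manager.

```python
def hive_connection_string(dsn, user=None, password=None):
    """Build a DSN-based ODBC connection string (all values are placeholders)."""
    parts = [f"DSN={dsn}"]
    if user:
        parts.append(f"UID={user}")
    if password:
        parts.append(f"PWD={password}")
    return ";".join(parts)


def sample_rows(dsn, table, limit=10):
    """Fetch a few rows from a Hive table through ODBC.

    Assumes pyodbc is installed and `dsn` points at a configured Hive
    data source. `table` is interpolated directly into the query, so it
    must come from trusted code, never from user input.
    """
    import pyodbc  # third-party; imported lazily so the helper above works without it

    # autocommit=True avoids relying on transaction support in the data source.
    conn = pyodbc.connect(hive_connection_string(dsn), autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(f"SELECT * FROM {table} LIMIT {int(limit)}")
        return cursor.fetchall()
    finally:
        conn.close()


print(hive_connection_string("HiveDSN", user="analyst"))
# Against a live data source: rows = sample_rows("HiveDSN", "page_views")
```

The same DSN can be used by any ODBC-aware analysis tool, which is why the script needs no Hadoop-specific code.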
As Senior Director of Research & Development, Jesse is responsible for the daily operations, product development initiatives and forward-looking research for Progress DataDirect. Jesse has spent nearly 20 years creating enterprise data products and has served as an expert on several industry standards including JDBC, J2EE, DRDA and OData. Jesse holds a Bachelor of Science degree in Computer Engineering from North Carolina State University.