The concepts behind Big Data were around well before the introduction of the Hadoop Distributed File System. Back when I was in college (ancient times, according to my family) and pursuing my degree in astrophysics, I wrote a thesis on “stellar spectroscopy”: studying the spectrum of electromagnetic radiation, including visible light, that radiates from stars and other celestial objects to determine their properties, chemical composition, and motion via Doppler shift. That description alone tells you that volumes of data can be involved. At times it was tedious, poring through vast amounts of data from these celestial sources, giving new meaning to data sources “beyond the cloud.” Back then, we stored the data in an early form of NFS (Network File System) and dealt with archaic connectivity and reams of computer paper filled with composition graphs and numbers.

Today, of course, we have much more sophisticated methods of storing and processing data. With the advent of Hadoop, high-performance connectivity, and tools to help process structured and unstructured data, our new world offers great leaps in how we analyze, interpret, and act on what the data tells us.

But what about the mysterious “dark data” that lurks in the depths of data stores? How can we discover and understand data whose invisible connections could further business intelligence, if only we could see and analyze it? Dark data is much like dark matter in cosmology: we know it exists even if we can’t grasp it. Dark matter accounts for roughly 84% of the matter in the universe, and we can’t see it. Dark data, in a sense, is everything we do, and we still can’t see it.

To that end, how do we “see further” into the vast amounts of data collected across a variety of data stores? The revelations hidden in dark data can only be uncovered with premium high-performance connectivity to the broad range of data sources in today’s heterogeneous Big Data environments.
The Hadoop Distributed File System (HDFS) is becoming a key fixture in those environments. As the trend toward greater use of Hadoop drives a greater need for Big Connectivity, greater insights will be gained by spelunking through caverns of dark data. Use cases like my thesis project show how important Big Data connectivity and analytics are to the scientific community as well as to businesses, enabling us all to “see further.”

Check out the DataDirect Connect for ODBC Hadoop preview! Check out part 1 of this blog series!
View all posts from Jeff Reser on the Progress blog.