Testing the Waters of “Data Lakes”

Posted on September 19, 2014

As a society, we’ve become data hoarders — collecting and storing data without really knowing what to do with it.

Take the closet space and storage space in your house, for example. Many people seem to accumulate just enough stuff to fill the available space. But once everything is stored away, what good is it? Maybe you have a yard sale and get rid of some of it. Or maybe, if you are lucky, you discover Uncle Elmer’s vintage baseball collection that makes you rich on “Antiques Roadshow.” But ultimately, what do you do with everything? You purge most of it and keep the things you think might be valuable in the future.

This is how you have to think about storing data. For all intents and purposes, storage space for data today is unlimited. But that doesn’t mean you have to store everything. The trick is making sure that you keep that one thing that will give the most value in the end — like Uncle Elmer’s vintage baseball collection.

So how do you decide what to keep? Like most other things in business, you start by understanding your business strategy and making sure you keep the data that might play a role in its future direction. And you err on the side of caution.

One idea gaining traction in the industry as a way to address the data storage issue is the “data lake”: a repository where you can store any and all data in its original source format and then figure out what to do with it later. In such an unstructured environment, though, companies are still struggling with the hardware and software needed to process everything and to make it possible to work across systems, apps, infrastructures and so on.

It’s evident, then, that we’re not living in the age of “data lakes” quite yet. Rather, we’re in an era of “data ponds.” In reality, we’re years away from “data lakes,” mainly because of the challenge of basic data integration across too many disparate data stores. We can certainly test the waters with smaller environments to see whether the processing power exists to get valuable insight, but a 360-degree view of all data is not likely with the technology most enterprises have in place today.

With the appropriate access management, governance and data consumer skills, data lakes do offer access to potentially far-flung data elements. And who knows, maybe someone will make startling new discoveries from that. Maybe not as startling as discovering Uncle Elmer’s baseball collection, but startling nevertheless. So let’s not give up on “data lakes”; let’s dip our toes into the water with “data ponds.” Who knows what could happen.

Tony Fisher

Tony Fisher is the Technology Officer of Progress Software, responsible for the company’s data connectivity and data integration product portfolio. Prior to his role at Progress, Fisher was the president and CEO of DataFlux Corporation. Fisher guided DataFlux through tremendous growth as it became a market-leading provider of data quality and data integration solutions. He is also a noted author and a sought-after industry speaker on emerging trends in data quality, data integration, master data management and how better management of data leads to business optimization. Fisher holds a Bachelor of Science degree in computer science and mathematics from Duke University.