Data Hubs: Separating Hype from Reality

Posted on September 04, 2018

Three years ago, when we put forward our vision for an operational data hub (an approach that quickly creates a 360-degree view of your data and lets you use that data in operational applications), the world was mired in ETL and reaching for a lifeline. Data silos were, and still are, proliferating, and they slow businesses down. Teams couldn’t answer simple questions because data spanned multiple systems, and the fragmented data was stifling innovation. This dilemma, however, motivated organizations to look for a solution.

Many reached for Data Lakes to bring their data together, but found that their data was still not harmonized, indexed, discoverable, or secure. Others tried a connector approach, but often ended up with ETL plus Data Lakes, or ETL plus connectors. Either way, their worlds actually got more complex, harder to govern, more costly, and less secure. Perhaps more importantly, they couldn’t make sense of the data they collected. What does it all mean? How does it all relate?

Roll the clock forward a few years, and if imitation is the sincerest form of flattery, we’re feeling very flattered at MarkLogic these days. It was a little lonely as the only company that talked about and delivered operational data hubs, but now everyone and their brother is talking about them. We’re happy that the concept has gone mainstream, but, like all bandwagons, not everyone who jumps on has really embraced the vision or delivered the goods.

The Data Hub Checklist

When looking at a data hub architecture, you want to be sure that it’s not just a rebadged Data Lake, or a black box with connectors on the edge and ETL hidden inside. You also want to be sure that it’s not an old data virtualization architecture in new clothing. There is a long list of attributes you should look for in an enterprise-grade data hub, especially if you’re going to use it as the basis for mission-critical applications. For example:

  • It must avoid ETL by ingesting data as-is.
  • It must be flexible so you don’t have to do 18 months of data modeling upfront.
  • It must be agile, so you can adapt to inevitable changes over time.
  • It must be secure and governed by default, so you can actually trust your data and know you’re using it appropriately.
  • It must add meaning to your data, not just store it.
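The first two points on the list, ingest as-is and no upfront mega-modeling, amount to a "land the raw data, harmonize later" pattern. The sketch below illustrates that pattern in generic Python; the silo names, field mappings, and envelope shape are invented for illustration and are not MarkLogic's actual API:

```python
# Illustrative "ingest as-is, harmonize later" pattern (generic Python,
# not MarkLogic's API). Each record is wrapped in an envelope that keeps
# the raw payload untouched and layers harmonized canonical fields on top.

def ingest(record, source):
    """Store the record exactly as it arrived, tagged with provenance."""
    return {"source": source, "raw": record, "canonical": {}}

def harmonize(envelope):
    """Map source-specific field names onto one canonical shape."""
    raw = envelope["raw"]
    # Hypothetical field mappings for two imaginary silos.
    if envelope["source"] == "crm":
        envelope["canonical"] = {"name": raw["FullName"], "id": raw["CustID"]}
    elif envelope["source"] == "billing":
        envelope["canonical"] = {"name": raw["cust_name"], "id": raw["acct"]}
    return envelope

crm_rec = ingest({"FullName": "Ada Lovelace", "CustID": "C-17"}, "crm")
bill_rec = ingest({"cust_name": "Ada Lovelace", "acct": "C-17"}, "billing")
for env in (crm_rec, bill_rec):
    harmonize(env)
    print(env["canonical"]["name"], env["canonical"]["id"])
```

Because the raw payload is never discarded, the canonical mapping can change later without re-running ETL against the source systems, which is what makes the approach agile in the face of inevitable change.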

These are just some of the important attributes, and while all of them are crucial, to be truly transformational, a data hub needs to enable a higher pace of innovation in the business. In large organizations today, the vast majority of data is locked up in legacy systems. It might be a mainframe, it might be an ERP app, it might be a 30-year-old Oracle system. The problem is, any time you want to do something new, you have to interface with those legacy systems. Actually, you probably need to interface with 10 of them, or 50 of them. And every project does it all over again in a slightly different way. It’s a complete waste of time and money, and it’s a soul-draining exercise for the IT pros who you wish were creating differentiated innovation for the business. The pace of innovation of the new stuff is bogged down by the old stuff.

The Power of “Operational”

The right kind of operational data hub can fix that problem by ring-fencing legacy systems and allowing new applications to go to the hub rather than to lots of individual systems. And it’s not just new analytical apps that can go to the hub; we’re not talking about an old-fashioned data warehouse. As the name implies, an operational data hub is actually operational: you can build new transactional processes on the data. It’s not just a 360-degree view of your data, it is an actionable view.
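Ring-fencing can be pictured as a facade: new applications call one hub interface, and the legacy quirks stay behind it. The sketch below is a hypothetical illustration in plain Python; the class names, methods, and data are invented and do not represent MarkLogic's API:

```python
# Hypothetical "ring-fencing" sketch: new apps integrate with one hub
# interface instead of with each legacy system directly.

class LegacyMainframe:
    """Stands in for a decades-old system with its own record format."""
    def lookup(self, cust_id):
        return {"ADDR": "10 Main St"}

class LegacyERP:
    """Stands in for an ERP app with yet another interface."""
    def fetch(self, cust_id):
        return {"orders": 3}

class DataHub:
    """Single point of access; legacy quirks stay behind this wall."""
    def __init__(self):
        self._mainframe = LegacyMainframe()
        self._erp = LegacyERP()

    def customer_360(self, cust_id):
        # New apps see one harmonized, actionable view of the customer.
        return {
            "id": cust_id,
            "address": self._mainframe.lookup(cust_id)["ADDR"],
            "open_orders": self._erp.fetch(cust_id)["orders"],
        }

hub = DataHub()
print(hub.customer_360("C-17"))
```

The payoff is that the integration work is done once, at the hub, instead of being repeated in a slightly different way by every new project.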

I’d like to tell you that years ago MarkLogic had a brilliant flash of inspiration and the idea for the operational data hub popped into existence, but that’s not the way it happened. Our innovation tends to be very pragmatic. We work with the largest companies and government agencies in the world. Every single one of them has the problem of data in silos, every single one. As we worked with them to apply the MarkLogic® database to solve their problems, we found that strong patterns and best practices emerged. The data hub came into being because we built a lot of them with our customers. It wasn’t theoretical; it was an architecture built to solve the real issues that customers faced in an entirely new way. For example, when we introduced Semantics technology five years ago, almost no one recognized how important it would be. But as customers struggled to find meaning and relationships in their data, they realized that Semantics was the perfect tool. That’s why it underpins so much of the MarkLogic Operational Data Hub. The best solutions are always the ones that are developed arm in arm with users.
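The idea behind Semantics is that relationships themselves become data: facts are stored as subject-predicate-object triples, so connections across silo boundaries can be queried directly. MarkLogic expresses this with RDF triples and SPARQL; the toy sketch below illustrates the concept in plain Python, with entities invented for the example:

```python
# Toy illustration of semantic triples: each fact is a
# (subject, predicate, object) tuple, and relationships across
# formerly siloed records become directly queryable.

triples = [
    ("C-17", "isCustomerOf", "AcmeBank"),     # from the CRM silo
    ("C-17", "hasAccount", "ACCT-9"),         # from the billing silo
    ("ACCT-9", "flaggedBy", "FraudSystem"),   # from a monitoring silo
]

def related(subject, preds=None):
    """Return objects linked to `subject`, optionally filtered by predicate."""
    return [o for s, p, o in triples
            if s == subject and (preds is None or p in preds)]

# Which entities relate to customer C-17, regardless of source system?
print(related("C-17"))
# Follow the relationship one hop further: is the account flagged?
print(related("ACCT-9"))
```

In a real triple store the same question would be a SPARQL query, and the engine, not application code, would do the traversal; the point is that "how does it all relate?" becomes answerable from the data itself.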

The bottom line for us is that we’re happy to see the concept of a data hub embraced so broadly. We think it’s good for customers, good for the industry, and good for innovation. We will continue to lead the way with our free, open-source data hub technology based on MarkLogic, the world’s best database for integrating data from silos. To everyone else, welcome to the party!

For an in-depth introduction to the MarkLogic Operational Data Hub, including use cases and how it can fit into your existing enterprise architecture, download the free e-book.

Joe Pasqua

Joe Pasqua brings over three decades of experience as both an engineer and a leader. He has personally contributed to several game changing initiatives including the first personal computer at Xerox, the rise of RDBMS in the early days of Oracle, and the desktop publishing revolution at Adobe. In addition to his individual contributions, Joe has been a leader at companies ranging from small startups to the Fortune 500.

Most recently, Joe established Neustar Labs which is responsible for creating strategies, technologies, and services that enable entirely new markets. Prior to that, Joe held a number of leadership roles at Symantec and Veritas Software including VP of Strategy, VP of Global Research, and CTO of the $2B Data Center Management business.

Joe’s technical interests include system software, knowledge representation, and rights management. He has more than 10 issued patents, with others pending. Joe earned simultaneous Bachelor of Science degrees in Computer Science and Mathematics from California Polytechnic State University, San Luis Obispo, where he is a member of the Computer Science Advisory Board.

