Top Data Security Concerns Around Data Integration

Posted on October 16, 2017

With data security, it is important to address the tactical details (think incident patterns, attack vectors, dynamic testing, etc.). But, it’s also important to look at the broader, more strategic issues and concerns with data security.

In talking to many of our customers, we identified three top strategic issues with data security that CIOs, architects, and business leaders are most concerned about:

Concern #1: How traditional data integration with relational databases creates security vulnerabilities

Concern #2: How application developers are unduly burdened with data security

Concern #3: How insider threats create unknown, unmanaged data security risks within the network perimeter

In this post, I take a closer look at these issues that are particularly relevant to data integration, and discuss how MarkLogic helps address them.

The Data Security Problem Is Getting Worse

Headlines reporting cyberattacks, ransomware, and compromises in data security are increasingly common. Data security is now a top priority — the risk of not securing data is simply too high.

There is no shortage of splashy numbers that highlight the problem:

  • Each cyber incident costs U.S. companies a reported $7.1 million on average, or $221 per record (Source: IBM)
  • In 2011, there were 468 major breaches recorded. In 2012, 1,175. In 2013, 1,731. See the trend? (For the gory details, check out the Veris Community Database)
  • Nearly two-thirds (63%) of organizations deploy new IT prior to having appropriate data security measures in place (Source: Thales Security)


Despite increasing awareness and spending, the data security problem is getting worse, at least if you measure by the number of attacks and the level of damage.

Every organization is looking for better solutions, but it’s a particularly difficult problem to solve with large-scale data integration projects that involve a variety of data silos that house mission-critical data.

Organizations that do not take a comprehensive approach to data security, focusing only on tactical details or only on protecting the perimeter, open themselves up to enormous cyber risk. So, with that, let’s jump into discussing the top strategic issues that we identified.

Concern #1: Traditional Data Integration Creates Security Vulnerabilities


The traditional approach to data integration with relational databases and ETL leads to data loss and governance problems.

Role- and policy-based access controls are essential to govern, preserve, and audit data and associated entitlements. If these controls are not managed, you introduce unnecessary complexity and risk.

Unfortunately, most organizations have a proliferation of relational database silos. Each one has separate security access controls that make it virtually impossible to adequately track and protect all of the data.

Additionally, there are multiple ETL tools with obfuscated code and integration points, not to mention their own access controls that need to be managed. With an increasing number of data silos, there are more opportunities for exploits.

Often, what happens is a team builds a complex ETL process from multiple databases to a centralized analytical data warehouse—all using relational databases.

The ETL is done for two reasons:

  1. To make the system function; and
  2. To “cleanse” the data because there is a business process that requires standardization so that the system can be used to count things, do math, or disambiguate the data.

But the second step often fails to ensure quality. In fact, the cleansing process may actually reduce quality by removing important data.

To a data analyst, some metadata may seem like “data lint” that needs to be laundered, but to a compliance analyst or data modeler, that same “data lint” may be required for critical business reasons (say, to prove to a regulatory agency that your trades were legal in order to avoid a hefty fine).

According to Mike Fillion, Director of Architecture at Aetna:

“Auditors don’t care if the data is ‘dirty,’ and in fact they get suspicious if you start remediating data… The database is the key to strategic data integration.”

(Watch the full presentation from MarkLogic World 2017)

Over time, it becomes more and more difficult to maintain data governance (i.e., quality, lineage and provenance, security and privacy, compliance requirements, availability).

Failing to pay close attention to each aspect of data governance across the entire lifecycle of data creates additional cyber risk.

How MarkLogic Helps

MarkLogic makes data integration a good thing for security and data governance.

First, MarkLogic reduces the burden of traditional ETL. By handling the process of ingesting source data as is and transforming and harmonizing the data inside MarkLogic, the whole process of integrating data becomes faster and more seamless. No data gets discarded during the process.
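
The difference between cleansing before loading and loading data as is can be sketched in a few lines. This is a minimal illustration using plain Python dictionaries, not MarkLogic's API; the record fields and the `cleanse` and `harmonize` helpers are hypothetical names chosen for the example.

```python
import copy

# Hypothetical source record; the field names are made up for illustration.
source_record = {
    "cust_nm": "  ACME Corp ",
    "trade_amt": "1,250.00",
    "entered_by": "jdoe",          # "data lint" to an analyst, evidence to an auditor
    "src_system": "legacy-crm-2",
}

def cleanse(record):
    """Traditional ETL step: keep only the standardized fields."""
    return {
        "customer_name": record["cust_nm"].strip(),
        "trade_amount": float(record["trade_amt"].replace(",", "")),
    }

def harmonize(record):
    """Load-as-is step: add standardized fields next to the untouched source."""
    return {
        "source": copy.deepcopy(record),   # original preserved verbatim
        "harmonized": cleanse(record),     # standardized view for analytics
    }

cleansed = cleanse(source_record)
wrapped = harmonize(source_record)

print("entered_by" in cleansed)             # False: the cleansed copy lost this field
print(wrapped["source"] == source_record)   # True: the wrapped copy kept everything
```

In the first case the audit-relevant fields are gone after the load; in the second they travel with the standardized view, so nothing has to be recovered from the source system later.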

Second, MarkLogic’s multi-model approach using documents and triples is better for governing data over time. You can manage high-level business concepts from multiple silos, materializing them as entities and relationships. Data and metadata stay together, and you can track the details across the lifecycle (where the data came from, who can see it, how it changed), all in a single system. (To learn more, download the free e-book, Building on Multi-Model Databases.)

Aetna is one company that has embraced this approach, and according to Mike Fillion, Director of Architecture:

“NoSQL gives you a huge upside: You can load the data as is, profile it, understand what’s low quality, and pass it back to admin to get it fixed. It’s a key data governance facet that gets solved.”

By taking a more comprehensive approach, MarkLogic reduces opportunities for exploits and provides a more agile platform to handle new and changing regulations.

Concern #2: Application Developers Are Burdened With Data Security


Unless security is handled in a more centralized database, the result is a spaghetti architecture that leads to more vulnerabilities. This graphic does not even depict the systems for backup and recovery, development, and testing that also require security monitoring and maintenance.

It’s really hard to secure data across multiple data silos at every layer. Unfortunately, data is not secured in one central place, and not in the database layer. Usually, the burden is simply put on developers to do their best to secure data at the application layer for every new application.

With the regulations around data privacy and security that organizations now have to account for (HIPAA, SEC Rule 17a-4, FINRA, GDPR, etc.), the stakes are higher and the burden is growing.

This is problematic because development and security teams are often disconnected.

A disconnect has grown because of the move towards DevOps and agile development. Both are positive improvements to software development that enable shorter release cycles.

Unfortunately, security teams cannot keep up. Security review cycles are designed to take weeks or months, and security certifications and accreditations are bound to waterfall methods, not continuous improvement. Most developers know the OWASP Top Ten, but the real security experts are only brought into the development process to do a final check before go-live.

According to Gartner, 90 percent of companies using DevOps consider security an afterthought. (Source: Gartner)

It is no surprise then, that according to the Department of Homeland Security, 90 percent of exploits are due to defective software. (Source: Homeland Security)


There is a disconnect between DevOps and security teams. Security is often only worked on during testing and release rather than through the whole lifecycle.

One example showing the disconnect between teams is at Intuit, which adopted an agile, DevOps approach for their 3,000-person team. Shannon Lietz, senior manager for cloud security engineering at Intuit, said in an interview (Source: TechTarget):

“We realized that the DevOps teams were throwing [responsibility] over the wall to security, and [security] had all the information; they knew all the attacks that were coming in, and the DevOps people did not have the information to make the decisions.”

While most organizations are not the size of Intuit, the challenge is often similar. A development team is tasked with stitching together multiple technologies with different, usually quite limited security capabilities. The security team is out of sync and cannot keep up.

To solve this problem, organizations should implement many tactical recommendations:

  • Develop closer integration between security and DevOps teams to close the feedback loop
  • Make security checks more automated by performing dynamic code analysis (and perform such checks earlier and more frequently in the sprint lifecycle)
  • Improve Identity and Access Management (IAM) systems
  • Enforce segregation of duties
  • Conduct risk and threat modeling for applications
  • … and many other tactical things

Additionally, it is important to take a broader, more strategic look at how data is managed at the lowest possible level—in the database.

How MarkLogic Helps

The goal is to keep data governance governable across the stack.

If you move to using a centralized database to govern and secure the data, securing applications becomes easier and faster. The work of data governance happens in one place. One change in data policy at the database level can be automatically applied to a hundred applications.

MarkLogic has extensive capabilities to govern and secure data in the database, which in turn helps with many of the aspects of application security.

The SANS Institute, a well-known cybersecurity training organization, provides a SWAT checklist to help development teams. (Note: This checklist includes references to the Common Weakness Enumeration (CWE) entries referenced by the OWASP Top Ten, which many people are more familiar with.)

SWAT Checklist (Securing Web Application Technologies)

  1. Error handling and logging
  2. Data protection
  3. Configuration and operations
  4. Authentication
  5. Session management
  6. Input and output
  7. Access control

Of this list, MarkLogic fully addresses numbers 1, 2, and 7 – error handling and logging, data protection, and access control – and also helps address the rest (3, 4, 5, and 6).

By addressing many of these concerns in the database, the attack surface is decreased significantly.

One of MarkLogic’s key underlying capabilities that makes data security stronger and easier to implement is Role Based Access Control (RBAC). RBAC governs who can access what data based on their privileges and permissions. These privileges and permissions work to secure data at the document level.

MarkLogic also has Element Level Security, which makes it possible to secure pieces of data inside documents (more on this later). Working together, these features make life easier on developers by managing the access controls in the database.
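
To make the distinction between document-level and element-level control concrete, here is a minimal sketch in plain Python. It does not use MarkLogic's API; the document layout, role names, and the `read_document` helper are assumptions made up for illustration. The point it shows is that one check, kept next to the data, decides both whether a document is visible and which parts of it are returned.

```python
# Hypothetical roles, permissions, and document; not MarkLogic's actual API.
DOCUMENT = {
    "uri": "/claims/1001.json",
    "read_roles": {"claims-analyst", "compliance-officer"},   # document-level permission
    "content": {"claim_id": 1001, "diagnosis": "J45", "ssn": "123-45-6789"},
    "protected_elements": {"ssn": {"compliance-officer"}},    # element-level rule
}

def read_document(doc, user_roles):
    """Return the content a user may see, or None if the document is hidden."""
    if not (doc["read_roles"] & user_roles):
        return None  # no matching role: the document is invisible to this user
    visible = {}
    for name, value in doc["content"].items():
        required = doc["protected_elements"].get(name)
        if required is None or (required & user_roles):
            visible[name] = value  # unprotected, or the user holds a required role
    return visible

analyst_view = read_document(DOCUMENT, {"claims-analyst"})
officer_view = read_document(DOCUMENT, {"compliance-officer"})
outsider_view = read_document(DOCUMENT, {"marketing"})

print(analyst_view)   # the claim without the ssn element
print(officer_view)   # the full claim
print(outsider_view)  # None
```

Because the rules live with the data rather than in each application, every application that reads through this one function gets the same answer for the same user.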

Additionally, MarkLogic has programming APIs so developers can create and execute policies utilizing all of the security and data protection capabilities in MarkLogic (e.g., backup, retention, data access, data lifecycle, and authentication).

Policies can be associated with data, metadata, and data attributes so that policies such as those for privacy or compliance can be easily executed. And, the security controls and checks are transparent to developers.

Beyond these features, MarkLogic also has additional out-of-the-box features designed to help organizations with compliance:

  • Bitemporal data management ensures that historical data remains unchanged and that you have a full audit trail of data.
  • Compliance Archive provides a mechanism to protect data from changes, and save the data to WORM (Write Once, Read Many) storage.
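
The idea behind bitemporal data can be sketched as an append-only history that tracks two dates per entry: when a fact became true and when the system recorded it. The following is a simplified illustration in plain Python, not MarkLogic's implementation; the `Version` and `BitemporalRecord` names are hypothetical, and each version is assumed to stay valid until a later one replaces it.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Version:
    value: str
    valid_from: date      # when the fact became true in the real world
    recorded_on: date     # when the system learned about it

class BitemporalRecord:
    """Append-only history: corrections add versions, nothing is overwritten."""

    def __init__(self):
        self._versions = []

    def record(self, value, valid_from, recorded_on):
        self._versions.append(Version(value, valid_from, recorded_on))

    def as_of(self, valid_on, known_on):
        """What did we believe on `known_on` about the value on `valid_on`?"""
        candidates = [
            v for v in self._versions
            if v.valid_from <= valid_on and v.recorded_on <= known_on
        ]
        if not candidates:
            return None
        # Simplification: the most recent valid_from wins; among versions with
        # the same valid_from, the latest recording wins.
        return max(candidates, key=lambda v: (v.valid_from, v.recorded_on)).value

history = BitemporalRecord()
history.record("12 Oak St", valid_from=date(2017, 1, 1), recorded_on=date(2017, 1, 5))
# A later correction is appended; the earlier entry stays in the audit trail.
history.record("14 Oak St", valid_from=date(2017, 1, 1), recorded_on=date(2017, 3, 1))

before = history.as_of(valid_on=date(2017, 2, 1), known_on=date(2017, 2, 1))
after = history.as_of(valid_on=date(2017, 2, 1), known_on=date(2017, 4, 1))
print(before)  # 12 Oak St
print(after)   # 14 Oak St
```

Both answers remain reproducible: an auditor can ask what the system believed on any given day, even after the record was corrected.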

All of these features mean smarter data management in the database, less work for developers to do at the application level, reduced time and complexity around security testing, and better security resilience.

Concern #3: Unknown, Unmanaged Risks From Insider Threats


Focusing only on network security may create a secure perimeter, but the data in the “squishy middle” is then vulnerable.

Typically, most organizations put an immense focus on implementing endpoint, application, perimeter, and network security—and for good reason. Preventing intrusion into your network is a critical part of securing your infrastructure.

Some companies see hundreds of thousands of intrusion attempts against their network—every single day.

But focusing only on network security is like creating a hard shell around a soft, squishy middle. If you can get in, you’re in. The truth is, no network perimeter will ever be impenetrable. There are likely bad actors already in the network.

Some of the biggest data breaches have occurred because an insider got the keys to the kingdom. And, the number of incidents involving internal actors is increasing.

The numbers vary, but in general, internal actors are involved in 25 percent of all breaches (Source: Verizon).

In the healthcare industry, insiders are responsible for 68 percent of breaches (Source: IBM).


Unfortunately, many systems are vulnerable to such attacks because they only have all-or-none data access rather than fine-grained security controls.

Complicating the insider threat problem is the fact that modern enterprises have staff, contractors, sub-contractors, trading partners, consultants, auditors, and other people involved. It is very difficult to discern just who is ‘inside’ and who is ‘outside.’

Sometimes, it is relatively innocuous data management decisions that can create the biggest insider threats. For example, many organizations have data lakes that are virtual treasure troves of data, open to a broad set of users.

One global bank we work with spent years building a data lake using another technology. But, they shut it down for security and compliance reasons when they realized the new system did not have proper controls and that they were potentially violating certain rules and regulations regarding customer data.

Organizations today need better data security. It is not an option, however, to just lock everything down. While the most secure database in the world might be one that is locked in a safe and dropped to the bottom of the ocean, that data would not be very shareable.

In the quest for data security, it is important to still maintain data sharing.

Organizations must have proper security controls to ensure that the right portions of data are accessible and shareable with those in and outside the company who are granted proper access. And, there must be a separation of duties so that administrators granting access do not themselves have access to the crown jewels.
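
Separation of duties can be illustrated with a small sketch in which the people who grant access are explicitly barred from holding it. This is a hypothetical example in plain Python, not a description of any product's behavior; the `AccessRegistry` class and the user names are invented for illustration.

```python
class AccessDenied(Exception):
    """Raised when a grant or read violates the access rules."""

class AccessRegistry:
    """Hypothetical registry: administrators grant access but cannot read data."""

    def __init__(self, admins):
        self._admins = set(admins)
        self._readers = set()

    def grant_read(self, granted_by, user):
        if granted_by not in self._admins:
            raise AccessDenied(f"{granted_by} may not grant access")
        if user in self._admins:
            # Separation of duties: no administrator may also hold read access.
            raise AccessDenied(f"{user} administers access and may not also read data")
        self._readers.add(user)

    def read(self, user, data):
        if user not in self._readers:
            raise AccessDenied(f"{user} has no read access")
        return data

registry = AccessRegistry(admins={"alice"})
registry.grant_read("alice", "bob")           # an admin grants access to an analyst
print(registry.read("bob", "crown jewels"))   # crown jewels

try:
    registry.grant_read("alice", "alice")     # an admin tries to grant herself access
except AccessDenied as err:
    print(err)
```

A real deployment would also need to audit the grants themselves, but the sketch shows the core rule: the power to grant access and the power to use it sit with different people.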

How MarkLogic Helps

As discussed in the previous section, MarkLogic has fine-grained access controls designed to provide optimal data security even when sharing data. One additional feature that directly addresses the problem of insider threats is Advanced Encryption.

Without encryption, or even with file system encryption, the system administrator, cloud operator, or hacker could access or modify files—including the files that comprise the database.

MarkLogic’s Advanced Encryption allows data, configuration, and logs to be encrypted on disk (i.e., encrypted at rest). This feature requires no modification to applications developed on MarkLogic. And, the optional use of an External Key Management System (KMS) further ensures separation of duties and integration into existing security infrastructure.


In this post, I covered the top strategic data security issues that many of our customers are working on. The list is not comprehensive, nor does every organization struggle with all three. Regardless, it is important for every organization to think strategically about the vulnerabilities throughout their data ecosystem.

How is your organization addressing these problems? Are there any additional issues to add to the list?

If you’re interested in learning more about MarkLogic’s approach to security and data governance, here are some key resources.

For More Information

White Paper – Top Data Security Concerns When Integrating Data

Presentation – Security Keynote – David Gorbet, SVP of Engineering, MarkLogic

Presentation – Data Security In Practice – Caio Milani, Director of Product Management, MarkLogic

Presentation – Data Governance in an Unpredictable World – Damon Feldman, Ph.D., Solutions Director, MarkLogic

Matt Allen

Matt Allen is a VP of Product Marketing Manager responsible for marketing all the features and benefits of MarkLogic across all verticals. In this role, Matt interfaces with the product and engineering team and with sales and marketing to create content and events that educate and inspire adoption of the technology. Matt is based at MarkLogic headquarters in San Carlos, CA and in his free time he is an artist who specializes in large oil paintings.

