The Oracle database is one of the most successful software products of all time. In the past 40 years, it has evolved to become a dominant technology for storing and managing data, and the RDBMS and SQL standards are now entrenched in every large enterprise. Alternatives like MarkLogic were built more recently to address new data management challenges, particularly with data integration.
Today, organizations are confronting new challenges that did not exist in the 1980s and 1990s. Data is big, fast, varied, and changing. Instead of a small handful of systems, organizations have hundreds of systems and petabytes of data. Business needs are also changing more quickly, and are more heavily regulated, than ever before. This means rethinking how data is managed in order to meet quickly evolving business needs:
MarkLogic provides distinct advantages compared to Oracle in all of the above areas. MarkLogic provides greater agility to integrate data with less risk, makes cloud costs significantly lower and more predictable, speeds up delivery of new applications, and does not sacrifice data security or governance.
This comparison looks at the underlying differences between Oracle and MarkLogic databases, and also how MarkLogic Data Hub Service stacks up against Oracle’s suite of cloud products. In summary, the main underlying differences are the following:
Oracle is one of the ten largest software companies in the world and provides the most widely adopted relational database. Oracle continues to derive a relatively large percentage of its revenue from licensing the Oracle database (and its many derivatives). In recent years it has made significant investments in building out Oracle Cloud, a suite of over 120 different products that spans SaaS, PaaS, and IaaS.
The Oracle relational database was first released in 1979, and the latest release of that software is Oracle Database 19c (the long-term support release of Oracle 12c and 18c). There are various iterations of this core product, and Oracle’s product suite now includes specific products for different workloads (analytics vs. transactional), engineered systems (combined hardware/software appliances), and fully managed cloud services.
MarkLogic Data Hub Service is MarkLogic’s flagship product. It is a fully managed cloud data hub for agile data integration and data management. Built on MarkLogic Server, it has all the same multi-model, security, and scale-out capabilities.
MarkLogic Server is a multi-model database with modern NoSQL and trusted enterprise capabilities. It can be deployed as part of MarkLogic Data Hub Service, or alone in any environment (on-premises, cloud, hybrid).
MarkLogic also develops associated tools for the ecosystem, including various APIs and connectors.
There are two ways of comparing MarkLogic to Oracle:
MarkLogic Server was the first modern, multi-model database on the market. MarkLogic has multiple ways to model data (e.g., documents, graphs, relational), even data that represents the same entity. And, MarkLogic supports storing data in multiple schemas at the same time — all in the same database — with a single integrated back end.
Only a few years ago, the term “multi-model” was relatively new, and it required significant effort to explain what it was and why you needed it. Today, that is not the case — there is widespread acknowledgment that multi-model databases should be part of any modern data architecture.
The following resources provide a deeper dive into understanding the multi-model advantage provided by MarkLogic:
Here’s the summary of MarkLogic’s multi-model benefits:
Oracle is a relational database that stores data in rows and columns. Here are three specific examples to highlight how this approach differs from a multi-model approach:
Data Modeling and Access — With Oracle, users need to understand relational schemas that are often very complex. With this structure, data defining a single business entity may be split across a large number of tables. This usually results in cryptic column and field names (or VARCHAR columns) that only the database administrator understands, which means only they know how to properly access the data. For example, if data about drugs is stored, users must know whether to query on “aspirin”, “acetylsalicylic acid”, “Excedrin”, or “Bufferin” (all names for the same thing). If users query on the wrong term they miss most of the results.
MarkLogic Server solves these issues by using a document model that is more human-readable and does not require shredding entity data. Also, users of MarkLogic Server can rely on its built-in search and semantic capabilities to search across the data like a knowledge graph, making it much easier for domain experts and other users who are not database specialists to query the data.
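The drug-name problem described above can be illustrated with a minimal sketch. It is not MarkLogic code: the `SYNONYMS` mapping, the sample documents, and the `expand` and `search` helpers are all hypothetical, and plain Python stands in for a semantic layer. It shows only the general idea that expanding a query term to every name for the same concept recovers results that a literal match misses.

```python
# Hypothetical synonym data: each name maps to one canonical concept.
# (Follows the article's example; not taken from any real vocabulary.)
SYNONYMS = {
    "aspirin": "acetylsalicylic acid",
    "excedrin": "acetylsalicylic acid",
    "bufferin": "acetylsalicylic acid",
    "acetylsalicylic acid": "acetylsalicylic acid",
}

# Hypothetical documents to search over.
DOCUMENTS = [
    {"id": 1, "text": "Patient was given aspirin after surgery."},
    {"id": 2, "text": "Bufferin taken daily for two weeks."},
    {"id": 3, "text": "Acetylsalicylic acid is contraindicated here."},
    {"id": 4, "text": "Ibuprofen prescribed for inflammation."},
]


def expand(term):
    """Return every known name that shares a concept with `term`."""
    concept = SYNONYMS.get(term.lower())
    if concept is None:
        return {term.lower()}
    return {name for name, c in SYNONYMS.items() if c == concept}


def search(term, expand_terms=True):
    """Return ids of documents mentioning the term (or any synonym)."""
    names = expand(term) if expand_terms else {term.lower()}
    return [d["id"] for d in DOCUMENTS
            if any(name in d["text"].lower() for name in names)]


literal_hits = search("aspirin", expand_terms=False)
expanded_hits = search("aspirin")
```

In this toy data set the literal query matches one document, while the expanded query also finds the documents that use a different name for the same drug.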
Indexing and Performance — With Oracle, there is significant maintenance overhead as database admins spend time constantly optimizing tables and indexes for query performance. For example, Oracle requires regular defragmentation of the tablespaces (perhaps weekly or more often, depending on the volume of deletes) in order to maintain insert performance. Also, Oracle indexes usually require constant rebalancing and re-indexing. With Oracle, users can expect that execution plans will frequently degrade and that new ones will need to be created to maintain performance. Relying on the Oracle optimizer can make things worse even with a well-maintained system. Also, using Oracle data replication can negatively impact transaction performance.
Unlike a relational database, MarkLogic Server has a Universal Index that automatically indexes words, phrases, relationships, values, and structure. This index requires zero maintenance to build, update, or keep in sync. And, query performance against this index is more like Google search and is consistent even as workloads vary.
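As a rough illustration of indexing words, values, and structure at load time, here is a toy inverted index. It is a sketch only and makes no claim about how the Universal Index is implemented. The `build_index` function, the key layout, and the sample documents are assumptions made for this example.

```python
from collections import defaultdict


def build_index(docs):
    """Index every word and every (path, value) pair found in each document.

    `docs` maps a document id to a JSON-like structure (dicts, lists, scalars).
    """
    index = defaultdict(set)

    def walk(doc_id, node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(doc_id, value, path + (key,))
        elif isinstance(node, list):
            for item in node:
                walk(doc_id, item, path)
        else:
            # Structure + value: where in the document the value occurs.
            index[("value", "/".join(path), node)].add(doc_id)
            # Words: split string values so single terms are searchable.
            if isinstance(node, str):
                for word in node.lower().split():
                    index[("word", word)].add(doc_id)

    for doc_id, doc in docs.items():
        walk(doc_id, doc, ())
    return index


# Hypothetical sample documents with different shapes.
docs = {
    "a": {"name": "Jane Doe", "phones": ["555-0100"]},
    "b": {"name": "John Doe", "age": 40},
}
index = build_index(docs)
```

Because the index is built from whatever structure each document happens to have, documents with different shapes can be loaded and queried without declaring columns first.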
Metadata and Data Governance — With Oracle, tracking metadata requires upfront planning, changes are complex, and data is often lost. Like other relational databases that have defined schemas, columns must be added to handle new pieces of metadata. But often, metadata is discarded or just stored separately. Provenance and lineage information is metadata that is critical for data governance, but is often too cumbersome to manage, especially across a complex data integration life cycle.
MarkLogic Server stores any amount of metadata right alongside the data itself — it’s just more attributes in the document. The PROV-O standard is used for storing provenance and lineage metadata, so any tool can understand it. Furthermore, as with any data in MarkLogic, it can be harmonized and semantically enriched. The same can’t be said with a relational approach.
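A minimal sketch of storing provenance next to the data might look like the following. The `envelope`, `headers`, and `instance` layout and the `wrap_with_provenance` helper are assumptions made for illustration, not a documented MarkLogic schema. The property names `prov:wasAttributedTo`, `prov:wasDerivedFrom`, and `prov:generatedAtTime` come from the PROV-O vocabulary.

```python
import json
from datetime import datetime, timezone


def wrap_with_provenance(record, source_system, derived_from):
    """Wrap a source record in a document that carries its own lineage.

    The envelope/headers/instance layout is a hypothetical convention
    used here only to show metadata living beside the data.
    """
    return {
        "envelope": {
            "headers": {
                "prov:wasAttributedTo": source_system,
                "prov:wasDerivedFrom": derived_from,
                "prov:generatedAtTime": datetime.now(timezone.utc).isoformat(),
            },
            # The original record is kept intact, not split or reshaped.
            "instance": record,
        }
    }


record = {"customerId": 42, "name": "Jane Doe"}
doc = wrap_with_provenance(record, "crm-export", "crm/customers/42.csv")
serialized = json.dumps(doc, indent=2)
```

Adding another piece of metadata later means adding another key to `headers`; no table has to be altered.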
In some ways, Oracle’s latest version is a multi-model database. Historically, however, Oracle ridiculed any approach other than a purely relational one. In a 2015 eWeek article, Oracle executive Andy Mendelsohn said that NoSQL databases, including multi-model databases, were “designed for simple data management problems”, have “very low productivity,” and are “limited.” In reality, there has been massive growth in the NoSQL market. And, in 2019, Mendelsohn stated at Oracle OpenWorld that Oracle actually is a NoSQL multi-model database after all: “over the years, relational databases have become multi-model databases. We support JSON as a data type. We support XML…”
He’s right. It is possible to ingest JSON, XML, and RDF in Oracle. But underneath, Oracle is still relational, not truly multi-model, just like every version before it.
Oracle does not natively store JSON. Oracle’s documentation states that: “In Oracle Database, JSON data is stored using the common SQL data types VARCHAR2, CLOB, and BLOB (unlike XML data, which is stored using abstract SQL data type XMLType).”
As a result, in order to retrieve a value from the JSON document, the entire JSON document must be traversed to locate the data. This approach is slow. There are two workarounds Oracle recommends to improve performance. One workaround is to extract the data into a materialized view, pushing values into another table (i.e. shredding). The other workaround is a JSON search index, which does not maintain ACID compliance (it is only updated periodically when triggered).
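The trade-off between parsing stored text and shredding values into another table can be sketched with SQLite as a stand-in. This is not Oracle code and says nothing about Oracle's actual performance. The `orders` and `order_customer` tables and both helper functions are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# JSON kept as plain text, analogous to storing it in a character column.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, body TEXT)")
rows = [
    (1, json.dumps({"customer": "acme", "total": 120})),
    (2, json.dumps({"customer": "globex", "total": 75})),
    (3, json.dumps({"customer": "acme", "total": 40})),
]
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)


def find_by_parsing(customer):
    """Read and parse every stored document just to test one field."""
    cursor = conn.execute("SELECT id, body FROM orders ORDER BY id")
    return [row_id for row_id, body in cursor
            if json.loads(body)["customer"] == customer]


# Workaround: copy ("shred") the field into a separate, indexed table.
# This copy must be kept in sync with `orders` on every write.
conn.execute("CREATE TABLE order_customer (order_id INTEGER, customer TEXT)")
conn.execute("CREATE INDEX idx_customer ON order_customer (customer)")
conn.executemany(
    "INSERT INTO order_customer VALUES (?, ?)",
    [(row_id, json.loads(body)["customer"]) for row_id, body in rows],
)


def find_by_shredded_column(customer):
    """Look the value up in the extracted, indexed copy instead."""
    cursor = conn.execute(
        "SELECT order_id FROM order_customer WHERE customer = ? ORDER BY order_id",
        (customer,),
    )
    return [row[0] for row in cursor]


parsed = find_by_parsing("acme")
shredded = find_by_shredded_column("acme")
```

Both functions return the same ids. The difference is that the first touches every document, while the second depends on a second copy of the data staying current.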
In general, handling multi-model workloads with a relational database like Oracle will be hard, brittle, or both. In addition to running into simple challenges like querying documents, a relational database cannot do more advanced functions like link documents together with triples or query XML and JSON together – tasks that come easy with MarkLogic Server.
In the past, developers were often forced to use relational databases because of their broad adoption. As Jeff Bezos pointed out in his 2018 letter to shareholders, “the broad familiarity with relational databases among developers made this technology the go-to even when it wasn’t ideal.” Today, more than ever, users have a choice about which database to use and the adoption barrier is growing smaller as multi-model gains widespread popularity.
Note: Oracle does have another product called Oracle NoSQL database. That database is a Key Value store, which is very different from a document-oriented database like MarkLogic Server. Key Value stores are most often used as caching layers where they are optimized for simple low latency processing, not for data integration. For that reason, we do not cover it in this comparison.
This comparison looks at the similarities and differences between MarkLogic Data Hub Service and an Oracle “cloud data hub.” That is in quotes because Oracle does not provide one unified product. If an organization wants a cloud data hub with Oracle, they must stitch together a collection of Oracle Cloud products (or other third party products) to achieve similar functionality.
Here’s a list of some of the Oracle Products that may need to be stitched together to create an Oracle “cloud data hub”:
MarkLogic Data Hub Service is a better choice compared to the collection of Oracle’s products because of the following advantages:
With MarkLogic Data Hub Service, organizations can skip the big upfront modeling steps required to load data. When loading data, users simply add the data they have as needed to meet the immediate business need. Users can represent repeating hierarchical attributes like phone numbers and addresses naturally as JSON or XML documents, without having to build out separate tables. And, because MarkLogic indexes data as users add it, it’s immediately discoverable.
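To make the contrast concrete, the sketch below takes one customer document with repeating phone numbers and addresses and shows the rows a relational design would need in three separate tables. The document shape, the table names, and the `shred` helper are hypothetical and exist only for illustration.

```python
# One business entity, represented naturally as a single document.
customer = {
    "id": 42,
    "name": "Jane Doe",
    "phones": [
        {"type": "home", "number": "555-0100"},
        {"type": "work", "number": "555-0199"},
    ],
    "addresses": [
        {"city": "Austin", "postal_code": "73301"},
    ],
}


def shred(doc):
    """Split the document into the rows a three-table schema would hold."""
    return {
        "customers": [(doc["id"], doc["name"])],
        # Each repeating attribute becomes its own table keyed by customer id.
        "phones": [(doc["id"], p["type"], p["number"]) for p in doc["phones"]],
        "addresses": [(doc["id"], a["city"], a["postal_code"])
                      for a in doc["addresses"]],
    }


tables = shred(customer)
```

Reading the entity back from the relational form requires joining all three tables on the customer id, whereas the document form is already whole.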
To integrate data, MarkLogic users iteratively build a canonical model of the data, master the data, enrich it with metadata, add semantics, and govern the entire process. This makes it much faster to create data services for downstream business needs, with less risk when things change. And, MarkLogic supports real-time, operational applications, in addition to traditional BI and analytics.
Deploying cloud infrastructure is fast, but it is often complex and can get expensive quickly without the proper considerations. It is important to understand which inputs determine cost and how variable a cloud service is in order to prevent cost overruns. Different services handle bursting (when demand spikes past normal predicted consumption) very differently, and that leads to large variations with consumption-based economics.
MarkLogic Data Hub Service uses a consumption model and is priced along three dimensions:
The service takes any workload’s context into account when upscaling or downscaling the cluster. It independently scales operational, analytical, and curation workloads to provide a high degree of reliability and responsiveness. This consumption-based pricing frees organizations from unpredictable spending that often comes with complex cloud services.
MarkLogic Data Hub Service makes scaling and bursting simple and predictable. MarkLogic uses a system like rollover minutes so that unused units get saved, and “rolled over” to the next billing cycle. Organizations can store up to 12x of their capacity (not 2x), and do not have to pay extra to use those credits that were already paid for. With this approach, organizations avoid the costly mistake of provisioning for the peak, but also avoid letting the bill get out of control when a spike happens.
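The rollover idea can be worked through with a small simulation. This is a sketch under stated assumptions, not MarkLogic's billing logic. It assumes one fixed `base` capacity per billing period, treats the 12x figure as a cap on the banked balance, and does not model how usage beyond the banked credits is handled, because the text does not say. The function name, parameters, and sample numbers are hypothetical.

```python
def simulate_rollover(base, usage, cap_multiple=12):
    """Track banked credits across billing periods.

    base:         units of capacity paid for in each period (assumption)
    usage:        units actually consumed in each period
    cap_multiple: maximum bank size as a multiple of `base` (12x per the text)

    Returns a list of (used, bank_after, uncovered) tuples, where
    `uncovered` is usage that neither the base nor the bank could absorb.
    """
    bank = 0
    history = []
    for used in usage:
        if used <= base:
            # Unused units roll over, up to the cap.
            bank = min(bank + (base - used), cap_multiple * base)
            uncovered = 0
        else:
            # A burst draws down previously banked units first.
            excess = used - base
            drawn = min(excess, bank)
            bank -= drawn
            uncovered = excess - drawn
        history.append((used, bank, uncovered))
    return history


# Two quiet periods followed by two bursts, with a base of 100 units.
history = simulate_rollover(100, [60, 70, 180, 150])
```

With these sample numbers, the two quiet periods bank 70 units, the first burst of 80 excess units uses all of them and leaves 10 uncovered, and the second burst finds the bank empty.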
Oracle also uses a consumption-based billing model, with its own cloud credit units called OCPUs (Oracle Compute Units). Organizations also have to pay a nominal rate for bandwidth and storage on top of that.
Oracle’s set of cloud services, put together, will be more expensive than MarkLogic Data Hub Service. This is because organizations will simply be running more services that require orders of magnitude more infrastructure (data duplication, backup and recovery, redundant indexing, etc.). Organizations have to pay for capacity for each of Oracle’s services whereas with MarkLogic, you’re only paying for one comprehensive service. Cloud credit units are somewhat arbitrary so it is difficult to be precise, but estimates show that Oracle is 3x the cost of MarkLogic for most use cases. It is well known among both companies and analysts that Oracle’s pricing is a frequent cause for concern, and organizations run the risk of surprise billing in arrears.
The cost differences above do not even take into account how each service handles bursting. Both products can handle bursting, but with Oracle it is more unpredictable, restrictive, and expensive than with MarkLogic. With Oracle, organizations are billed at pay-as-you-go rates for excess capacity when bursting, and can only burst to 2x their subscription (See Oracle docs). Also, when organizations exceed their bursting limit, Oracle Cloud notifies them and suspends the account (See Oracle docs).
MarkLogic is a cloud-neutral vendor and a strategic partner of the leading cloud providers. MarkLogic Data Hub Service fits seamlessly into their ecosystems. Rather than just another relational database (AWS and Azure have relational databases you can use), MarkLogic provides a highly differentiated product and provides the flexibility for customers to change cloud providers later if necessary.
Oracle dominated the legacy database software market but that is not the future. The future is cloud data management and that is not Oracle’s strong suit. As an article in Forbes pointed out, “Only 2 percent [of CIOs] surveyed see Oracle as ‘their most integral vendor for cloud computing.'”
The following table compares MarkLogic Data Hub Service to the collection of Oracle Cloud components required to achieve similar functionality.
|MarkLogic Data Hub Service
|Oracle “Cloud Data Hub” Components*
|Security & Governance
*Autonomous DB with options, Data Integrator, GoldenGate, other related cloud services mentioned above
Oracle is designed for storing and managing traditional relational data modeled in rows and columns and queried with SQL. This structured approach, combined with the ubiquity of SQL and relational modeling skills in the market, means that Oracle is used to run transactional and analytical applications for which it is well-suited. Given Oracle’s long history as an on-premises software leader, many organizations already have Oracle ELAs in place. In those instances when data is predictable, managed on-premises, and licenses are available, it makes sense to continue using Oracle.
When data management workloads become larger, more varied, and more complex — and as organizations migrate to the cloud — then MarkLogic is a better choice.
MarkLogic is a better choice than Oracle for use cases around data integration — especially when it involves large, complex data sets required for both transactional and analytical purposes. This may mean building a data hub for use cases like 360 of anything, operational analytics, or search and discovery. Whenever the data is somewhat messy and rapidly changing, it will work better in a data hub with a multi-model database than in an RDBMS.
While MarkLogic is not a rip and replace option for Oracle, MarkLogic can easily ingest data from Oracle. Also, MarkLogic does support relational views for traditional SQL querying and one-click integration with leading BI tools. For these reasons, there are many organizations that use Oracle alongside MarkLogic, allowing both technologies to excel at what they do best. For example, they may use Oracle GoldenGate to help aggregate data from upstream Oracle systems before ingesting it into a MarkLogic Data Hub — this is a common pattern. Oftentimes, it makes sense to take advantage of an existing Oracle license when first getting started with MarkLogic even if there are longer-term plans to phase Oracle out.
Here are some examples where organizations specifically chose MarkLogic instead of Oracle:
This quick tour walks through in more detail how a relational approach hampers data integration and creates more risk.
This 3-part blog series, written by an engineering veteran in the financial services industry, discusses why organizations are moving to multi-model and what some of the key concepts are when making the transition.
This in-depth eBook provides a history of data integration and the underlying problems with a relational and ETL-driven approach, and how MarkLogic simplifies it.
See how MarkLogic simplifies complex data problems by delivering data agility.