Getting Started with the MarkLogic Data Hub

Instructor-Led Training

Course Description

Learn to build a MarkLogic Data Hub, powered by the MarkLogic database, to accelerate data integration projects and deliver faster time to value to your customers.

Attendees get hands-on with the MarkLogic Data Hub software, building a hub capable of powering the data services needed to meet the business requirements and objectives defined in the hands-on project's user stories.

By completing this course, you will be able to:

  • Describe what the MarkLogic Data Hub is
  • Describe when and why you would use the MarkLogic Data Hub
  • Create a MarkLogic Data Hub project
  • Implement a security model
  • Deploy project configuration using ml-gradle
  • Configure entities
  • Configure indexes
  • Control access to sensitive PII (personally identifiable information)
  • Create flow pipelines to ingest, curate, and master data
  • Run and debug flow pipelines
  • Configure ingestion steps
  • Use Apache NiFi
  • Configure mapping steps
  • Use pre-built mapping functions
  • Develop and deploy custom mapping functions
  • Integrate RDF data (semantic triples) in a hub
  • Program, deploy, and run a custom data harmonization step
  • Configure Smart Mastering matching and merging steps
  • Access curated data from the hub using JavaScript and SPARQL

Audience

MarkLogic Developers, Data Architects

Duration

8 hours

Prerequisites

Course Outline

Data Services First

  • Understand the high-level approach to data integration projects using the MarkLogic Data Hub
  • Understand the customer and business requirements for the course hands-on project
  • Understand the user stories and technical requirements for the course hands-on project
  • Understand the data sources available for the course hands-on project

The MarkLogic Data Hub

  • Understand what it is
  • Understand what it does
  • Initialize and install a new MarkLogic Data Hub project

Implement Security

  • Create users and roles for both business users and members of the technical project team
  • Understand how to use Data Hub-specific roles
  • Implement role hierarchies
  • Assign execute privileges necessary to meet project requirements
  • Deploy security configuration using QuickStart and ml-gradle

Create an Entity

  • Create a new entity
  • Define properties
  • Configure indexes
  • Protect access to PII (personally identifiable information)

Ingest Data

  • Create flow pipelines
  • Configure ingestion steps
  • Understand the purpose and use of the staging and final databases in a MarkLogic Data Hub
  • Implement key data modeling concepts including document URIs, collections, document permissions, property naming best practices, geospatial data modeling patterns, denormalization, and the use of the envelope pattern

Curate Data

  • Configure mapping steps
  • Use pre-built mapping functions
  • Program, deploy, and use a custom mapping function
  • Test and debug mapping steps

Use Semantics

  • Understand key semantic data modeling concepts, including triples, IRIs, ontology triples, and managed and unmanaged triples
  • Load triples into a MarkLogic Data Hub
  • Program, deploy, and use a custom harmonization step to add triples to the envelope of a document

Access Data

  • Explore the use of JavaScript APIs
  • Explore the use of SPARQL
  • Validate that the curated data from the hub can be used to meet the business and technical requirements for the hands-on project

Adapt to Change: Perform Another Iteration of Ingest | Curate | Access

  • Ingest a new data source
  • Curate the new data so that it can be consumed in the same way as existing data

Use Smart Mastering

  • Configure a matching step
  • Configure a merging step
  • Test Smart Mastering
  • Explore mastered data

Available dates

Live Online