Graph Exploration with Shortest Path Algorithm

Ever played Six Degrees of Kevin Bacon?

The idea is that anyone in Hollywood is linked to the actor in no more than six steps. The game uses a knowledge graph to represent film connections, where actors are nodes and edges connect co-stars. Finding the shortest connection to Kevin Bacon is made possible by an algorithm called shortest path, or Breadth-First Search (BFS), to find the minimum number of steps (films) connecting an actor to Kevin Bacon.

Pathfinding algorithms excel at finding solutions to problems related to route planning, network optimization and resource allocation. Shortest path calculations are paramount in numerous applications such as information retrieval, finding relationships among documents in recommendation systems and the shortest way to get from A to B. They enhance advanced semantic search approaches that use SPARQL queries to retrieve graph data by quickly traversing a knowledge graph and identifying semantically similar documents.

Progress MarkLogic Server 12 now natively supports the graph shortest path algorithm to help increase the performance of your graph queries and improve the contextual relevance of your semantic RAG systems. Additionally, MarkLogic has long supported multi-model workloads utilizing graphs, documents and search. Organizations generate data in many formats. While other solutions try to normalize all the data into a single structure, the MarkLogic platform allows you to use the best database model for that data. It is important to understand that all data is not created equal and can have different shapes and schemas.

In MarkLogic Server, we can leverage these models to build a graph of interconnected data, collections of documents, and tablature views for reporting. Graphs allow you to represent information in an interconnected web of facts and information. Some representative use cases for graphs are Digital Twins, Social Networks and Recommendation Systems.

Let’s see how that works.

Pathfinding Algorithms in Recommendation Systems

Let us consider the degrees of separation from a reader and the reviews that each of these people have submitted. We will construct a query that goes up to three degrees of separation from the current person and combines the reviews for each person in the path. We will be using the Shortest Path algorithm in combination with multi-model data to power a recommendation system.

Graph use cases can contain vast amounts of data. Exploring them can be overwhelming when you have multiple ways to navigate from one point to another. Shortest Path is an algorithm that can be leveraged to determine the shortest and most efficient path from one point to another.

MarkLogic Server exposes the Shortest Path through SPARQL extensions.

xdmp:shortestPath makes use of these named arguments:
xdmp:start – The starting node in the graph
xdmp:end – The ending node in the graph
xdmp:pattern – a SPARQL statement that matches data in the graph
xdmp:path – The nested result of shortest path
xdmp:length – The length of the node and edges of the path

MarkLogic also exposed the Shortest Path through the Optic API using the shortestPath plan modifier.

Building an Example

Let’s create some sample data to illustrate the utilization of the shortest path. The following commands can be run within the MarkLogic Query Console. The sample data creates a graph of interconnected people. A SPARQL Update query can be used to insert people into a named graph. Note the foaf:knows relationship. This will create a labeled edge between two people nodes in the graph.

# Simple Social Graph 

PREFIX foaf: <http://xmlns.com/foaf/0.1/>  
PREFIX dc: <http://purl.org/dc/elements/1.1/>  
PREFIX ex: <https://example.com/graph/people#>  
 
INSERT DATA 
{ 
     GRAPH <https://example.com/graph/people>  
     { 
        ex:1            a                 foaf:Person ; 
                        dc:identifier     1 ;              
                        foaf:firstName    "Andrey" ; 
                        foaf:lastName     "Ohlsen" ; 
                        foaf:knows        ex:2 ; 
                        foaf:knows        ex:3 . 
 
        ex:2            a                 foaf:Person ; 
                        dc:identifier     2 ;              
                        foaf:firstName    "Rodolphe" ; 
                        foaf:lastName     "Alexandersen" ;         
                        foaf:knows        ex:4 . 

        ex:3            a                 foaf:Person ; 
                        dc:identifier     3 ;              
                        foaf:firstName    "Wilow" ; 
                        foaf:lastName     "Duckels" .        

        ex:4            a                 foaf:Person ; 
                        dc:identifier     4 ;              
                        foaf:firstName    "Sheilakathryn" ; 
                        foaf:lastName     "Arkley" ;         
                        foaf:knows        ex:5 . 
 
        ex:5            a                 foaf:Person ; 
                        dc:identifier     5 ;              
                        foaf:firstName    "Frank" ; 
                        foaf:lastName     "Grayley" . 
    } 
}

Now that we have sample data into the database, we can execute a SPARQL Query to see these people and their network of relationships. The foaf:knows relationship can be expanded using SPARQL Property Paths. Property Paths allow you to traverse an arbitrary depth of a graph. In this case, we will use the + modifier to find one or more occurrences of the knows relationship.

The Query

## query 
 
PREFIX foaf: <http://xmlns.com/foaf/0.1/> 
PREFIX dc: <http://purl.org/dc/elements/1.1/>  

SELECT * FROM <https://example.com/graph/people> WHERE { 
  ?person a foaf:Person ; 
          foaf:knows+ ?knows . 
} 
ORDER BY ?person ?knows

The Result

person	knows
<https://example.com/graph/people#1>	<https://example.com/graph/people#2>
<https://example.com/graph/people#1>	<https://example.com/graph/people#3>
<https://example.com/graph/people#1>	<https://example.com/graph/people#4>
<https://example.com/graph/people#1>	<https://example.com/graph/people#5>
<https://example.com/graph/people#2>	<https://example.com/graph/people#4>
<https://example.com/graph/people#2>	<https://example.com/graph/people#5>
<https://example.com/graph/people#4>	<https://example.com/graph/people#5>

The result above shows the starting point and the possible jumps to the end of a path.

In the next example, we’ll use the shortest path to see how we can get from person number one to person number five. Using the SPARQL extensions, we define a start, an end and a pattern. The Filter ensures that we only look for the two people of interest.

The Query

## query 
PREFIX xdmp: <http://marklogic.com/xdmp#> 
PREFIX dc: <http://purl.org/dc/elements/1.1/>  
PREFIX ex: <https://example.com/graph/people#>  
PREFIX foaf: <http://xmlns.com/foaf/0.1/> 

SELECT   ?path ?length

FROM <https://example.com/graph/people> 
WHERE {   
  ( 
    [xdmp:start ?person] 
    [xdmp:end ?knows] 
    [xdmp:pattern "?person foaf:knows ?knows"] 
  ) xdmp:shortestPath ( 
    [xdmp:path ?path]  
    [xdmp:length ?length]  
  )  
  FILTER (?person = ex:1 && ?knows = ex:5) 
}

path

length

[

{

"person": "https://example.com/graph/people#1",
"knows": "https://example.com/graph/people#2"
},
{
"person": "https://example.com/graph/people#2",
"knows": "https://example.com/graph/people#4"
},
{
"person": "https://example.com/graph/people#4",
"knows": "https://example.com/graph/people#5"
}
]

"3"^^xs:unsignedLong

If you prefer to write your application using JavaScript, the shortest path algorithm is also exposed in the MarkLogic Optic API.

Query

'use strict'; 

const op = require('/MarkLogic/optic'); 

// Triple Prefixers 

const ex = op.prefixer('https://example.com/graph/people#'); 
const foaf = op.prefixer('http://xmlns.com/foaf/0.1/'); 

// Pattern Match for known people 
const person = op.col('person'); 
const knows = op.col('knows'); 

// Shortest Path Variable Setup 
const start = op.col('start'); 
const end = op.col('end'); 
const path = op.col('path'); 
const length = op.col('length'); 

op.fromTriples([op.pattern(start, foaf('knows'), end)]) 
  .shortestPath(start, end, path, length) 
  .where(op.and(op.eq(start, ex('1')), op.eq(end, ex('5')))) 
  .result();

Result

{ 
    "start": "https://example.com/graph/people#1", 
    "end": "https://example.com/graph/people#5", 
    "path": [ 
        { 
            "start": "https://example.com/graph/people#1", 
            "end": "https://example.com/graph/people#2" 
        }, 
        { 
            "start": "https://example.com/graph/people#2", 
            "end": "https://example.com/graph/people#4" 
        }, 
        { 
            "start": "https://example.com/graph/people#4", 
            "end": "https://example.com/graph/people#5" 
        } 
    ], 
    "length": 3 
}

Combining Graph and Document Data

The MarkLogic platform allows you to store document data (JSON, XML, etc.) alongside the Graph data in the same database. In this case, we will be using JSON records that contain brief metadata about book reviews. These book reviews will be used in conjunction with the graph to recommend the reader additional books and reviews.

The following code snippet will insert the records into the same database.

'use strict'; 

declareUpdate(); 

const reviews = [ 
    { 
      "uuid": "ae23fdfc-06eb-4cfa-8812-b1c47a87f8d2", 
      "title": "Historical Tales", 
      "reviewer_id": 1, 
      "review_date": "2024-06-17T14:48:43.687096Z", 
      "rating": 3, 
      "genre": "non-fiction" 
    }, 
    { 
      "uuid": "f8358d6e-0ffe-42df-a3ff-a7e4cc0a5042", 
      "title": "Fictional Wonders", 
      "reviewer_id": 4, 
      "review_date": "2024-11-03T14:48:43.687169Z", 
      "rating": 5, 
      "genre": "fiction" 
    }, 
    { 
      "uuid": "661c355d-833f-4ba6-bf04-b8ffa2d6f602", 
      "title": "Non-Fiction Insights", 
      "reviewer_id": 2, 
      "review_date": "2025-02-19T14:48:43.687204Z", 
      "rating": 3, 
      "genre": "non-fiction" 
    }, 
    { 
      "uuid": "aedfc481-7aa2-495a-8628-b72f690399b9", 
      "title": "Historical Tales", 
      "reviewer_id": 1, 
      "review_date": "2025-01-16T14:48:43.687231Z", 
      "rating": 5, 
      "genre": "non-fiction" 
    }, 
    { 
      "uuid": "c09cb86b-7c16-4b5c-b161-eebed1dbda4e", 
      "title": "Love and Betrayal", 
      "reviewer_id": 1, 
      "review_date": "2024-07-01T14:48:43.687257Z", 
      "rating": 1, 
      "genre": "romance" 
    }, 
    { 
      "uuid": "44a00e0e-7626-4955-a7aa-168f746ff917", 
      "title": "Fictional Wonders", 
      "reviewer_id": 4, 
      "review_date": "2024-05-25T14:48:43.687320Z", 
      "rating": 2, 
      "genre": "fiction" 
    }, 
    { 
      "uuid": "766a9792-ba71-4969-b3cc-eb03b3714e3c", 
      "title": "Love and Betrayal", 
      "reviewer_id": 5, 
      "review_date": "2024-12-23T14:48:43.687358Z", 
      "rating": 2, 
      "genre": "romance" 
    }, 
    { 
      "uuid": "366cc1b0-6c68-4d8b-9698-91b675d543de", 
      "title": "The Hidden Truth", 
      "reviewer_id": 4, 
      "review_date": "2024-11-21T14:48:43.687383Z", 
      "rating": 1, 
      "genre": "sci-fi" 
    }, 
    { 
      "uuid": "0e34b7be-5d48-4c1c-856d-bda58eacb242", 
      "title": "The Hidden Truth", 
      "reviewer_id": 2, 
      "review_date": "2024-08-12T14:48:43.687425Z", 
      "rating": 2, 
      "genre": "sci-fi" 
    }, 
    { 
      "uuid": "f5102ffc-b53f-4791-9142-3f9454cdc1d1", 
      "title": "The Great Adventure", 
      "reviewer_id": 5, 
      "review_date": "2024-10-14T14:48:43.687444Z", 
      "rating": 1, 
      "genre": "fiction" 
    } 
  ] 
  ; 

const options = { 
        permissions : xdmp.defaultPermissions(), 
        collections : 'https://example.com/documents/reviews' 
}; 

reviews.forEach(review => xdmp.documentInsert('/data/reviews/' + review.uuid + '.json', review, options));

We can create a view on top of this data to project the information into tablature format. Template Driven Extraction (TDE) is a configuration that creates a Schema and View that maps the JSON data to a Table of Rows and Columns. This can be leveraged in SQL and our Optic API.

'use strict'; 
declareUpdate(); 
 
const tde = require("/MarkLogic/tde.xqy"); 
 
const template = xdmp.toJSON( 
    { 
        "template": { 
            "description": "Review Template", 
            "context": "/", 
            "collections": ["https://example.com/documents/reviews"], 
            "rows": [ 
                { 
                    "schemaName": "Review", 
                    "viewName": "Metadata", 
                    "viewLayout": "sparse", 
                    "columns": [ 
                        { 
                            "name": "source", 
                            "scalarType": "string", 
                            "val": "xdmp:node-uri(.)", 
                            "nullable": false, 
                            "invalidValues": "ignore" 
                        }, 
                        { 
                            "name": "uuid", 
                            "scalarType": "string", 
                            "val": "uuid", 
                            "nullable": false, 
                            "invalidValues": "ignore" 
                        }, 
                        { 
                            "name": "reviewer_id", 
                            "scalarType": "int", 
                            "nullable": true, 
                            "val": "reviewer_id" 
                        }, 
                        { 
                            "name": "title", 
                            "scalarType": "string", 
                            "val": "title", 
                            "nullable": false, 
                            "invalidValues": "ignore" 
                        }, 
                        { 
                            "name": "rating", 
                            "scalarType": "int", 
                            "nullable": true, 
                            "val": "rating" 
                        }, 
                        { 
                            "name": "genre", 
                            "scalarType": "string", 
                            "nullable": true, 
                            "val": "genre" 
                        }, 
                        { 
                            "name": "review_date", 
                            "scalarType": "dateTime", 
                            "nullable": true, 
                            "val": "review_date" 
                        }, 
                    ] 
                } 
            ] 
        } 
    }) 
 
var perms = [xdmp.arrayValues(xdmp.defaultPermissions(null, 'elements'))]; 
perms.push(xdmp.permission("rest-reader", "read")); 
perms.push(xdmp.permission("rest-writer", "update")); 
 
tde.templateInsert("/tde/book-review.json", template, perms);

We now have two distinct data models in the system. A graph representing the people we know and their relationships to each other. And documents that track their reviews.

The following query utilizes the Optic API to traverse the graph and find all shortest paths with a maximum length of three degrees. Once we have these paths, we can look at the view data and merge them with the path data. Note that the order by portion of the query takes into account the review score and the distance in the social network.

Query (Optic with SPARQL and Views):

'use strict'; 

const op = require('/MarkLogic/optic'); 

const pathPlan = op.fromSPARQL(` 
PREFIX xdmp: <http://marklogic.com/xdmp#> 
PREFIX dc: <http://purl.org/dc/elements/1.1/>  
PREFIX ex: <https://example.com/graph/people#>  
PREFIX foaf: <http://xmlns.com/foaf/0.1/> 

SELECT *  
FROM <https://example.com/graph/people> 
WHERE {   
  ( 
    [xdmp:start ?start] 
    [xdmp:end ?end] 
    [xdmp:pattern "?start foaf:knows ?end . ?person dc:identifier ?personId . ?knows dc:identifier ?knowsId"] 
  ) xdmp:shortestPath ( 
    [xdmp:path ?path]  
    [xdmp:length ?length]  
  )  
  #FILTER (?start = ex:1 && ?length = 3) 
} 
`)  

const reviews = op.fromView('Review', 'Metadata') 

pathPlan 
 .unnestLeftOuter('path','pathUnnest','hop') 
  .bind([ 
    op.as('knowsId', op.map.get(op.col('pathUnnest'), 'knowsId')) 
  ])   
  .select(['start', 'end', 'path', 'length', 'knowsId']) 
  .joinInner(reviews, op.on(op.col('reviewer_id'), op.col('knowsId'))) 
  .orderBy([op.asc(op.col('length')), op.desc(op.col('rating'))]) 
  .offsetLimit(0, 10) 
  .result()

Query (Optic with Triples and Views)

'use strict'; 

const op = require('/MarkLogic/optic'); 

// Triple Prefixers 
const ex = op.prefixer('https://example.com/graph/people#'); 
const dc = op.prefixer('http://purl.org/dc/elements/1.1/'); 
const foaf = op.prefixer('http://xmlns.com/foaf/0.1/'); 

// Pattern Match for known people 
const person = op.col('person'); 
const personId = op.col('personId'); 

const knows = op.col('knows'); 
const knowsId = op.col('knowsId'); 

 // Shortest Path Variable Setup 
const start = op.col('start'); 
const end = op.col('end'); 
const path = op.col('path'); 
const length = op.col('length'); 

 const pathPlan = op.fromTriples([ 
    op.pattern(start, foaf('knows'), end), 
    op.pattern(start, dc('identifier'), personId), 
    op.pattern(end, dc('identifier'), knowsId), 
  ]) 
  .shortestPath(start, end, path, length) 
  .where(op.and(op.eq(start, ex('1')), op.le(length, 3))); 

const pathResults = pathPlan.result(); 

const reviews = op.fromView('Review', 'Metadata') 

pathPlan 
  .unnestLeftOuter('path','pathUnnest','hop') 
  .bind([ 
    op.as('knowsId', op.map.get(op.col('pathUnnest'), 'knowsId')) 
  ])   
  .select(['start', 'end', 'path', 'length', 'knowsId']) 
  .joinInner(reviews, op.on(op.col('reviewer_id'), op.col('knowsId'))) 
  .orderBy([op.asc(op.col('length')), op.desc(op.col('rating'))]) 
  .offsetLimit(0, 10) 
  .result()

The Result

{
    "start": "https://example.com/graph/people#4",
    "Review.Metadata.source": "/data/reviews/aedfc481-7aa2-495a-8628-b72f690399b9.json",
    "end": "https://example.com/graph/people#5",
    "Review.Metadata.uuid": "aedfc481-7aa2-495a-8628-b72f690399b9",
    "path": [
        {
            "start": "https://example.com/graph/people#4",
            "end": "https://example.com/graph/people#5",
            "person": "https://example.com/graph/people#5",
            "personId": 5,
            "knows": "https://example.com/graph/people#1",
            "knowsId": 1
        }
    ],
    "Review.Metadata.reviewer_id": 1,
    "length": 1,
    "Review.Metadata.title": "Historical Tales",
    "knowsId": 1,
    "Review.Metadata.rating": 5,
    "Review.Metadata.genre": "non-fiction",
    "Review.Metadata.review_date": "2025-01-16T14:48:43.687231Z"
}
{
    "start": "https://example.com/graph/people#2",
    "Review.Metadata.source": "/data/reviews/aedfc481-7aa2-495a-8628-b72f690399b9.json",
    "end": "https://example.com/graph/people#4",
    "Review.Metadata.uuid": "aedfc481-7aa2-495a-8628-b72f690399b9",
    "path": [
        {
            "start": "https://example.com/graph/people#2",
            "end": "https://example.com/graph/people#4",
            "person": "https://example.com/graph/people#5",
            "personId": 5,
            "knows": "https://example.com/graph/people#1",
            "knowsId": 1
        }
    ],
    "Review.Metadata.reviewer_id": 1,
    "length": 1,
    "Review.Metadata.title": "Historical Tales",
    "knowsId": 1,
    "Review.Metadata.rating": 5,
    "Review.Metadata.genre": "non-fiction",
    "Review.Metadata.review_date": "2025-01-16T14:48:43.687231Z"
}

Conslusion

MarkLogic’s multi-model capabilities empower organizations to harness the full potential of their data by integrating graph and document models in a single platform. Using Knowledge Graphs and powerful algorithms like Shortest Path, you can uncover meaningful relationships and insights across complex datasets. By combining these insights with structured document data using tools like TDE and the Optic API, MarkLogic provides a unified platform for advanced analytics and decision-making.

Start exploring how the MarkLogic platform graph and document integration can transform your use cases.

Drew Wanczowski

Drew Wanczowski is a Principal Solutions Engineer at MarkLogic in North America. He has worked on leading industry solutions in Media & Entertainment, Publishing, and Research. His primary specialties surround Content Management, Metadata Standards, and Search Applications.