Seven Reasons to Use a Range Index in MarkLogic

April 28, 2017 Data Platform

Do you find yourself getting confused about range indexes in MarkLogic? When should you use range indexes?

Luckily, the whitepaper Inside MarkLogic Server explains seven uses for range indexes. If you haven’t read it, it is worth reading for a better understanding of how MarkLogic Server works internally. The following list comes from page 26:

1. Perform fast range queries.

For example, you can provide a query constraint for documents having a date between two given endpoints.
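To make the idea concrete, here is a minimal Python sketch of why a sorted value list makes range queries cheap. It is a toy model, not MarkLogic code: the `date_index` data and the `docs_between` helper are hypothetical names invented for illustration, and the sketch assumes the index can be pictured as (value, document id) pairs kept in value order.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical toy index: (value, document id) pairs kept sorted by value.
date_index = sorted([
    (date(2017, 1, 15), "doc-a"),
    (date(2017, 3, 2), "doc-b"),
    (date(2017, 4, 28), "doc-c"),
    (date(2017, 9, 9), "doc-d"),
])
# Parallel list of just the values, so bisect can search it directly.
values = [value for value, _ in date_index]

def docs_between(start, end):
    """Return ids of documents whose date lies in [start, end]."""
    lo = bisect_left(values, start)   # first entry with value >= start
    hi = bisect_right(values, end)    # one past the last entry with value <= end
    return [doc_id for _, doc_id in date_index[lo:hi]]

print(docs_between(date(2017, 2, 1), date(2017, 6, 30)))  # ['doc-b', 'doc-c']
```

Because the values are sorted, both endpoints are found by binary search and the matching documents form one contiguous slice; no document has to be opened to answer the query.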

2. Perform data-type-aware equality queries.

For example, you can compare decimal or date values based on their semantic value rather than their lexical serialized value. (Note that you can perform equality comparisons for numeric and Boolean data types in JSON documents without a range index. See the “Indexing JSON” section of the whitepaper for details.)
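The difference between lexical and semantic comparison can be shown with a small Python sketch. This is only an illustration of the concept, not MarkLogic's implementation: the `raw_prices` data and both helper functions are hypothetical, and Python's `Decimal` stands in for a typed decimal value.

```python
from decimal import Decimal

# Hypothetical serialized price values as they might appear in documents.
raw_prices = {"doc-a": "1.50", "doc-b": "1.5", "doc-c": "01.500", "doc-d": "2.0"}

def lexical_matches(target):
    """Match on the serialized string exactly as written."""
    return sorted(d for d, raw in raw_prices.items() if raw == target)

def typed_matches(target):
    """Match on the decimal value the string represents."""
    wanted = Decimal(target)
    return sorted(d for d, raw in raw_prices.items() if Decimal(raw) == wanted)

print(lexical_matches("1.5"))  # ['doc-b']
print(typed_matches("1.5"))    # ['doc-a', 'doc-b', 'doc-c']
```

The string comparison finds only the document that spells the value the same way, while the typed comparison finds every document holding the same number.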

3. Quickly extract specific values from the entries in a result set.

For example, you can get a distinct list of message senders from documents in a result set, as well as how often each sender appears. These are often called facets and are displayed with search results to aid search navigation. You can also perform fast aggregates against the extracted values to calculate things like standard deviation and covariance.
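As a rough illustration of facets and aggregates, the following Python sketch counts values for the documents in a result set and computes a standard deviation over a numeric field. It is a toy model under stated assumptions: the mappings `sender_by_doc` and `size_by_doc`, the `result_set` list, and the helper names are all hypothetical, and the sketch assumes values can be looked up by document id without reading the documents themselves.

```python
from collections import Counter
from statistics import pstdev

# Hypothetical document-id-to-value lookups, one per indexed field.
sender_by_doc = {"doc-a": "alice", "doc-b": "bob", "doc-c": "alice", "doc-d": "carol"}
size_by_doc = {"doc-a": 10, "doc-b": 20, "doc-c": 30, "doc-d": 400}

# Hypothetical result set produced by some earlier search.
result_set = ["doc-a", "doc-b", "doc-c"]

def facet_counts(doc_ids, value_by_doc):
    """Distinct values for the given documents, with how often each appears."""
    return Counter(value_by_doc[d] for d in doc_ids if d in value_by_doc)

def value_stdev(doc_ids, value_by_doc):
    """Population standard deviation of a numeric field over the given documents."""
    return pstdev(value_by_doc[d] for d in doc_ids if d in value_by_doc)

print(facet_counts(result_set, sender_by_doc).most_common())  # [('alice', 2), ('bob', 1)]
print(value_stdev(result_set, size_by_doc))
```

Only the values for documents in the result set are touched, which is why `carol` and the outlying size of `doc-d` do not appear in either answer.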

4. Perform optimized order by calculations.

For example, you can sort a large set of product results by price.

5. Perform efficient cross-document joins.

For example, if you have a set of documents describing people and a set of documents describing works authored by those people, you can use range indexes to efficiently run queries looking for certain kinds of works authored by certain kinds of people.
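The people-and-works example can be sketched in Python as a join over value lookups. This is a conceptual model only, not how MarkLogic executes joins: every mapping and the `works_by` helper are hypothetical, and the sketch assumes each work document stores the id of its author.

```python
# Hypothetical lookups built from the "people" documents.
person_ids_by_country = {"NZ": {"p1", "p3"}, "US": {"p2"}}

# Hypothetical lookups built from the "works" documents.
works_by_author = {"p1": {"w1", "w2"}, "p2": {"w3"}, "p3": {"w4"}}
works_by_genre = {"poetry": {"w1", "w3", "w4"}, "essay": {"w2"}}

def works_by(genre, country):
    """Works of the given genre written by people from the given country."""
    # Step 1: collect the join keys (person ids) from the people side.
    author_ids = person_ids_by_country.get(country, set())
    # Step 2: follow each key to the works side.
    by_those_authors = set()
    for author_id in author_ids:
        by_those_authors |= works_by_author.get(author_id, set())
    # Step 3: intersect with the constraint on the works themselves.
    return sorted(by_those_authors & works_by_genre.get(genre, set()))

print(works_by("poetry", "NZ"))  # ['w1', 'w4']
```

The join never loads a person or work document; it only passes the shared key values from one lookup to the other.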

6. Perform complex date-time queries on bitemporal documents.

Bitemporal documents include four range indexes that track when events occurred in the real world as well as when the events were stored in MarkLogic. Querying the four range indexes and merging the results is key to resolving bitemporal queries. See the “Bitemporal” section of the whitepaper for details.

7. Quickly extract co-occurring values from the entries in a result set.

For example, you can quickly get a report of which two entity values appear together most often in documents, without knowing either value in advance. This is a more advanced use case, so the whitepaper does not cover it.
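Although this is the advanced case, the basic idea of counting co-occurring values is easy to sketch in Python. The code below is an illustration under stated assumptions, not MarkLogic's algorithm: `entities_by_doc` and `top_pairs` are hypothetical names, and the sketch assumes the entity values for each document can be looked up by document id.

```python
from collections import Counter
from itertools import combinations

# Hypothetical lookup: entity values extracted per document.
entities_by_doc = {
    "doc-a": ["Acme", "Globex", "Initech"],
    "doc-b": ["Acme", "Globex"],
    "doc-c": ["Globex", "Initech"],
    "doc-d": ["Acme", "Globex"],
}

def top_pairs(doc_ids, n=1):
    """Return the n value pairs that appear together in the most documents."""
    counts = Counter()
    for doc_id in doc_ids:
        # De-duplicate and sort so each pair is counted once per document
        # and always in the same order.
        distinct = sorted(set(entities_by_doc.get(doc_id, [])))
        for pair in combinations(distinct, 2):
            counts[pair] += 1
    return counts.most_common(n)

print(top_pairs(["doc-a", "doc-b", "doc-c", "doc-d"]))  # [(('Acme', 'Globex'), 3)]
```

Neither value is supplied by the caller; the most frequent pair falls out of the counts alone.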

Paxton Hare