Customize the Lucene search scoring

The out-of-the-box Sitefinity CMS search indexing is based on Lucene .NET. Lucene uses a combination of the Vector Space Model (VSM) of Information Retrieval and the Boolean model to determine how relevant a document is to a user's query. It assigns a default score between 0 and 1 to all search results, depending on multiple factors related to document relevancy. The score is calculated dynamically for each search, meaning that the same document can have different scores for different searches. This is due to the Lucene score normalization algorithms.

Sitefinity CMS versions 11.1 and above expose a mechanism for influencing the Lucene search results by choosing the best algorithm to calculate the search score and by boosting selected documents. This article explains how you can customize the Lucene scoring in Sitefinity CMS.

Search score theory - choosing the best boosting formula for your use case

A common use case is boosting recently modified documents so that they appear as more relevant search results. This article uses that example to demonstrate customizing the default scoring mechanism. When you customize the Lucene scoring mechanism, the Sitefinity CMS API exposes the default Lucene score and all the document info, so you can design multiple approaches to boosting the score:

Using a multiplier function based on content age

  • finalScore = defaultScore * (1/contentAge)

A multiplier function produces a value that is used to multiply the default Lucene score. To boost documents based on how recent they are, content age is the most suitable value to consider. Content age represents the difference between now and the time the document was last modified. One disadvantage of this approach is that the multiplier is the reciprocal of the content age, so it is undefined when contentAge is 0. Another possible problem is that the multiplier grows without bound as contentAge approaches 0, which can make the default score irrelevant.
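The behavior of this formula near zero can be checked with a few lines of arithmetic. The following is a minimal, language-neutral sketch written in Python, not Sitefinity CMS code; the function name reciprocal_boost and the sample scores are hypothetical, and content age is assumed to be measured in days.

```python
def reciprocal_boost(default_score: float, content_age_days: float) -> float:
    """finalScore = defaultScore * (1 / contentAge).

    Hypothetical helper for illustration; age is assumed to be in days.
    """
    return default_score * (1.0 / content_age_days)


# A document modified 10 days ago keeps a tenth of its default score.
ten_days_old = reciprocal_boost(0.5, 10)

# A document modified about 15 minutes ago (0.01 days) is multiplied by
# roughly 100, so the default score barely matters any more.
minutes_old = reciprocal_boost(0.5, 0.01)

# At age 0 the formula is undefined; in Python the division raises
# ZeroDivisionError, so a real implementation would need a guard.
```

The gap between the two sample results shows why the unmodified reciprocal is rarely usable on its own.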

Using a multiplier function based on content age and a constant

  • finalScore = defaultScore * (1/(constant + contentAge))

An alternative approach is adding a constant to the formula. The constant can be any positive number, depending on how much you want to boost the new results. For example, 2 does the job relatively well.

Adding a constant does not change the shape of the boosting function, but it bounds the multiplier: at contentAge 0 the multiplier is 1/constant instead of being undefined, and recent items still receive a larger multiplier than older ones.
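As a rough illustration of the effect of the constant, here is the same arithmetic as a Python sketch. The function name damped_reciprocal_boost and the sample values are assumptions made for this example, and age is again assumed to be in days.

```python
def damped_reciprocal_boost(default_score: float,
                            content_age_days: float,
                            constant: float = 2.0) -> float:
    """finalScore = defaultScore * (1 / (constant + contentAge)).

    Hypothetical helper; `constant` must be positive so the
    denominator can never reach zero for non-negative ages.
    """
    return default_score * (1.0 / (constant + content_age_days))


# A brand-new document: the multiplier is capped at 1 / constant = 0.5.
fresh = damped_reciprocal_boost(0.5, 0)

# A 10-day-old document: the multiplier is 1 / 12.
older = damped_reciprocal_boost(0.5, 10)
```

With a constant of 2 the multiplier never exceeds 0.5, so every score is scaled down and the ranking changes only because older documents are scaled down more.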

Using an exponential boosting function with several constants

  • finalScore = defaultScore * ((boostFactor / (maxRampFactor + days)) ^ (1 / curveAdjustmentFactor))

To get finer control over the shape of the boosting curve, you can use more than one constant, such as boostFactor, maxRampFactor and curveAdjustmentFactor. Here days is the content age in days. For example, a function that does the job well could be:

  • finalScore = defaultScore * ((100 / (5 + days)) ^ (1 / 5))
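To get a feel for the numbers this formula produces, the multiplier can be tabulated for a few ages. The sketch below is plain Python rather than Sitefinity CMS code; the function name curve_boost is hypothetical, and the default constants are the ones from the example formula above.

```python
def curve_boost(days: float,
                boost_factor: float = 100.0,
                max_ramp_factor: float = 5.0,
                curve_adjustment_factor: float = 5.0) -> float:
    """(boostFactor / (maxRampFactor + days)) ^ (1 / curveAdjustmentFactor)."""
    return (boost_factor / (max_ramp_factor + days)) ** (1.0 / curve_adjustment_factor)


# Print the multiplier for a new, week-old, month-old and year-old document.
for days in (0, 7, 30, 95, 365):
    print(f"{days:>3} days -> multiplier {curve_boost(days):.3f}")
```

With these constants the multiplier starts at about 1.82 for a brand-new document, equals 1 at 95 days (where maxRampFactor + days equals boostFactor), and drops below 1 for older content. Raising curveAdjustmentFactor flattens the curve, while raising boostFactor lifts it as a whole.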

To better understand how to fine-tune these constants to fit your preferences, refer to the following diagram visualizing the boosting formula:

[Figure: boosting formula diagram]

Implementing the custom search scoring

To implement the custom Lucene scoring, you need to plug into the Sitefinity CMS LuceneSearchService and replace the default scoring algorithm with a custom one that inherits from the Lucene CustomScoreQuery class.

Create a custom score query

To create a custom score query, you must start by adding a new class which inherits from the Lucene CustomScoreProvider. This provider is responsible for the search score logic. Inside the new class you must override the CustomScore method. This method gives you access to the Lucene document and the default score, which you can obtain by making a call to the base class method. From the document object you can extract the LastModified field value and use it to determine the document age in days. Now that you have access to the content age and default score, you can implement your desired custom scoring logic.

For example, to implement an exponential boosting function with several constants, as described earlier in this article, you can add a method in your custom provider called CalculateBoost. You can call this method from the CustomScore method and pass the calculated content age as a parameter. Inside CalculateBoost you can calculate a boost value based on the additional constants you define and the content age input. Finally, you can return the calculated boost value, and use it inside the CustomScore method to adjust the default score (adjustedScore = baseScore * boost).
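The flow described above (derive the age from LastModified, compute a boost, multiply the base score) can be sketched independently of Lucene. The following Python sketch only mirrors the arithmetic; it is not the Lucene .NET or Sitefinity CMS API. The names custom_score and calculate_boost echo the methods mentioned above, LastModified is assumed to be already parsed into a timezone-aware date, and clamping negative ages to 0 is an added assumption rather than something the article prescribes.

```python
from datetime import datetime, timezone


def calculate_boost(age_days: float) -> float:
    """Boost from the example formula: (100 / (5 + days)) ^ (1 / 5)."""
    boost_factor, max_ramp_factor, curve_adjustment_factor = 100.0, 5.0, 5.0
    return (boost_factor / (max_ramp_factor + age_days)) ** (1.0 / curve_adjustment_factor)


def custom_score(base_score: float, last_modified: datetime, now: datetime) -> float:
    """adjustedScore = baseScore * boost.

    `last_modified` stands in for the parsed LastModified field value.
    Assumption: a date in the future is treated as age 0.
    """
    age_days = max((now - last_modified).total_seconds() / 86400.0, 0.0)
    return base_score * calculate_boost(age_days)


# Example: a document last modified 10 days before the search.
now = datetime(2024, 1, 11, tzinfo=timezone.utc)
modified = datetime(2024, 1, 1, tzinfo=timezone.utc)
adjusted = custom_score(0.4, modified, now)
```

Passing the current time in as a parameter, instead of reading the clock inside the function, keeps the calculation deterministic and easy to test.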

Once you have completed implementing the custom score provider, you must add a new class and inherit from the Lucene CustomScoreQuery class. Inside this class you must override the GetCustomScoreProvider method, which instructs Lucene which provider to use when determining the search score. In the overridden GetCustomScoreProvider method you must return your custom score provider. The following code sample demonstrates the full implementation:

Replace the default scoring algorithm in Sitefinity CMS LuceneSearchService

To configure Sitefinity CMS to use your custom score logic, you must create a custom LuceneSearchService, where you will return the custom score query instead of the default one.

You must start by adding a new class which inherits from the Sitefinity CMS LuceneSearchService class. Inside the new class, override the BuildLuceneQuery method. In your implementation of the BuildLuceneQuery method you must get an instance of the Lucene QueryParser, and parse the compiled query, which comes as a method argument. Then you must instantiate your custom score query class and pass the parsed query as an argument. Finally, return the object that is constructed by your custom score query class from the BuildLuceneQuery method. This way the parsed query will go through your custom logic and will be passed back to the Sitefinity CMS default code flow. The following sample demonstrates implementing a custom LuceneSearchService to achieve this functionality:

To complete the task, you must replace the default LuceneSearchService with your custom one. You can do this either through the Sitefinity CMS administrative backend or inside your website Global.asax class.

To replace the default LuceneSearchService with your custom one via configurations, follow these steps:

  1. Navigate to your Sitefinity CMS backend UI and click on Administration -> Settings -> Advanced
  2. From the navigation menu on the Advanced configurations screen expand Search -> Search Services and click on LuceneSearchService
  3. Change the TypeName to the CLR type of your customized LuceneSearchService, for example SitefinityWebApp.CustomizedLuceneSearchService

Alternatively, you can replace the default LuceneSearchService with your custom one through code via the Sitefinity CMS ServiceBus implementation. To do this, implement the following code inside your Global.asax:

NOTE: The same approach described in this article can be used to boost your content search score based on any other field, using any custom algorithm. Just choose the formula that best represents the boost significance for your specific case and modify the default boost. You can also chain multiple boosting formulas.
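One way to read "chaining" is to multiply the default score by several independent boost values in turn. The sketch below assumes that interpretation; the function chain_boosts and the sample boost values are hypothetical and are not part of the Sitefinity CMS API.

```python
def chain_boosts(default_score: float, *boosts: float) -> float:
    """Apply several multiplicative boosts to one default score."""
    score = default_score
    for boost in boosts:
        score *= boost
    return score


# Hypothetical values: a recency boost of 1.5 combined with a boost of 2.0
# derived from some other indexed field.
chained = chain_boosts(0.4, 1.5, 2.0)
```

Because the boosts are multiplied, a single very large or very small factor dominates the result, so each formula in the chain should be kept within a comparable range.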

 

 
