Search with accented characters
Sitefinity CMS uses the Lucene search provider by default. Lucene uses analyzer classes to process text and convert it into a stream of tokens (terms). To implement accent-insensitive search in Sitefinity CMS, replace the default analyzer used by Lucene with one that converts accented characters into their unaccented equivalents.
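Lucene provides several token filter classes that you can use to customize the search functionality. For example, the ASCIIFoldingFilter class converts alphabetic, numeric, and symbolic Unicode characters that are not in the Basic Latin Unicode block into their ASCII equivalents, if such equivalents exist. For instance, "é" becomes "e".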
The following example demonstrates how to implement a custom analyzer class:
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;
using System.Collections.Generic;

namespace SitefinityWebApp
{
    public class AccentInsensitiveAnalyzer : Analyzer
    {
        private readonly CharArraySet stopWords;

        public AccentInsensitiveAnalyzer(ISet<string> stopWords)
        {
            // Case-insensitive set of words to drop from the token stream.
            this.stopWords = new CharArraySet(LuceneVersion, stopWords, true);
        }

        protected override TokenStreamComponents CreateComponents(string fieldName, System.IO.TextReader reader)
        {
            // Split the text into tokens.
            var tokenizer = new StandardTokenizer(LuceneVersion, reader);
            // Lowercase every token.
            TokenStream tokenStream = new LowerCaseFilter(LuceneVersion, tokenizer);
            // Remove stop words.
            tokenStream = new StopFilter(LuceneVersion, tokenStream, stopWords);
            // Replace accented characters with their ASCII equivalents, for example "é" -> "e".
            tokenStream = new ASCIIFoldingFilter(tokenStream);
            return new TokenStreamComponents(tokenizer, tokenStream);
        }

        // Lucene compatibility version used by all components of this analyzer.
        public static LuceneVersion LuceneVersion = LuceneVersion.LUCENE_48;
    }
}
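In the code above, the custom analyzer splits the text into tokens with StandardTokenizer, lowercases them, removes stop words, and finally passes the result through ASCIIFoldingFilter, which replaces accented characters in each token with their unaccented equivalents.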
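To check what this filter chain produces, you can run the same steps outside Sitefinity CMS. The following is a minimal console sketch, assuming the Lucene.Net and Lucene.Net.Analysis.Common 4.8 NuGet packages. The AccentFoldingDemo class and its Analyze method are hypothetical names used only for illustration. The sketch applies StandardTokenizer, LowerCaseFilter, and ASCIIFoldingFilter in the same order as the custom analyzer, and omits the StopFilter step for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical demo class: mirrors the analyzer's filter chain without stop-word removal.
public static class AccentFoldingDemo
{
    private const LuceneVersion Version = LuceneVersion.LUCENE_48;

    // Runs the text through tokenizer -> lowercase -> ASCII folding and returns the terms.
    public static List<string> Analyze(string text)
    {
        var terms = new List<string>();
        TokenStream stream = new StandardTokenizer(Version, new StringReader(text));
        stream = new LowerCaseFilter(Version, stream);
        stream = new ASCIIFoldingFilter(stream);

        using (stream)
        {
            // The term attribute holds the text of the current token.
            var term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset(); // required before the first IncrementToken() call
            while (stream.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            stream.End();
        }
        return terms;
    }

    public static void Main()
    {
        // Prints: creme brulee a zurich
        Console.WriteLine(string.Join(" ", Analyze("Crème brûlée à Zürich")));
    }
}
```

The output contains only unaccented, lowercase terms, which is what the custom analyzer produces when no stop words are configured.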
To make Sitefinity CMS use your custom analyzer, register it with the ObjectFactory class. In the Application_Start method of your Global.asax class, subscribe to the SystemManager.ApplicationStart event and perform the registration in the event handler:
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Telerik.Microsoft.Practices.Unity;
using Telerik.Sitefinity.Abstractions;
using Telerik.Sitefinity.Services;

namespace SitefinityWebApp
{
    public class Global : System.Web.HttpApplication
    {
        protected void Application_Start(Object sender, EventArgs e)
        {
            // Defer the registration until Sitefinity CMS raises its ApplicationStart event.
            SystemManager.ApplicationStart += ApplicationStartHandler;
        }

        private void ApplicationStartHandler(object sender, EventArgs e)
        {
            // Map the Lucene Analyzer type to the custom analyzer as a singleton.
            // The empty HashSet means that no stop words are removed.
            ObjectFactory.Container.RegisterType<Analyzer, AccentInsensitiveAnalyzer>(
                new ContainerControlledLifetimeManager(),
                new InjectionConstructor(new InjectionParameter<ISet<string>>(new HashSet<string>())));
        }
    }
}
RESULT: Your new analyzer class is used during both indexing and searching. This means that accented characters are replaced with their unaccented equivalents both in the indexed content and in search queries, so a search for "creme" also finds "crème", and vice versa.
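Accent-insensitive matching in Lucene relies on the query text passing through the same folding as the indexed text. The following minimal console sketch shows this with plain Lucene.NET, outside Sitefinity CMS. It assumes the Lucene.Net, Lucene.Net.Analysis.Common, and Lucene.Net.QueryParser 4.8 NuGet packages and an in-memory RAMDirectory index. FoldingAnalyzer, AccentSearchDemo, and CountHits are hypothetical names; FoldingAnalyzer is a simplified stand-in for AccentInsensitiveAnalyzer without stop-word removal.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical stand-in for AccentInsensitiveAnalyzer (no stop words).
public class FoldingAnalyzer : Analyzer
{
    private const LuceneVersion Version = LuceneVersion.LUCENE_48;

    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        var tokenizer = new StandardTokenizer(Version, reader);
        TokenStream stream = new LowerCaseFilter(Version, tokenizer);
        stream = new ASCIIFoldingFilter(stream);
        return new TokenStreamComponents(tokenizer, stream);
    }
}

public static class AccentSearchDemo
{
    // Indexes one document and returns the number of hits for the query.
    // The same analyzer instance is used by the IndexWriter and by the QueryParser.
    public static int CountHits(string indexedText, string queryText)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        using (var analyzer = new FoldingAnalyzer())
        using (var directory = new RAMDirectory())
        {
            using (var writer = new IndexWriter(directory, new IndexWriterConfig(version, analyzer)))
            {
                var doc = new Document();
                doc.Add(new TextField("body", indexedText, Field.Store.YES));
                writer.AddDocument(doc);
                writer.Commit();
            }

            using (var reader = DirectoryReader.Open(directory))
            {
                var searcher = new IndexSearcher(reader);
                // The parser analyzes the query text, so "crème" becomes the term "creme".
                var parser = new QueryParser(version, "body", analyzer);
                Query query = parser.Parse(queryText);
                return searcher.Search(query, 10).TotalHits;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountHits("Crème brûlée", "creme")); // prints 1
        Console.WriteLine(CountHits("creme brulee", "crème")); // prints 1
    }
}
```

Both calls find the document: the unaccented query matches accented content, and the accented query matches unaccented content, because the query passes through the same analyzer that was used for indexing.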