From DBSight Full-Text Search Engine/Platform Wiki
There are two ways to do spell checking.
Spell checking based on words from the index
- If "Enable Index-Specific Spell Checking" is enabled, the words inside the index itself are used as the dictionary.
- Currently you need to run "Re-build the spell check index" from the dashboard to re-create the dictionary. The reasoning is that once the index has grown fairly large, the set of words in it does not change much between incremental indexing runs, so rebuilding the dictionary now and then is enough.
Spell checking based on User Dictionary
- DBSight currently ships with a default English dictionary for spell checking. To use another language, delete everything in spell_check.txt and replace it with words from your language. The spell check dictionary file is:
It is a text file in which each line represents one valid word. When a user submits a query to DBSight, the spell checker looks up each word of the query in the dictionary. If there is a match, the word is treated as valid; otherwise the closest match is returned as a spelling suggestion.
- You can update or replace the dictionary by editing spell_check.txt, which is especially useful if you have keywords that are not listed in the dictionary.
- The user dictionary index is created or refreshed after an incremental or full indexing run, which checks the timestamp of spell_check.txt and re-creates the index if needed.
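The lookup described above can be illustrated with a minimal Python sketch. It is not DBSight's implementation: the function names are hypothetical, lowercasing the words is an assumption, and the standard library's difflib similarity stands in for whatever "closest match" measure DBSight actually uses.

```python
import difflib


def load_dictionary(lines):
    """Build the set of valid words from dictionary lines (one word per line)."""
    # Lowercasing is an assumption made for this sketch.
    return {line.strip().lower() for line in lines if line.strip()}


def suggest(word, dictionary):
    """Return None if the word is valid, else the closest dictionary word (or None)."""
    word = word.lower()
    if word in dictionary:
        return None  # exact match: the word is treated as valid
    # difflib is a stand-in similarity measure, not DBSight's algorithm.
    matches = difflib.get_close_matches(word, sorted(dictionary), n=1)
    return matches[0] if matches else None


def check_query(query, dictionary):
    """Map each query word that has a suggestion to that suggestion."""
    suggestions = {}
    for word in query.split():
        best = suggest(word, dictionary)
        if best is not None:
            suggestions[word] = best
    return suggestions
```

In practice the lines would come from reading spell_check.txt, for example by passing an open file object to load_dictionary.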
Stop Words and Synonyms
Stop words are words that should be ignored during searching, usually common words such as "an" and "the". For example, if a user enters the query
search the database
then all of the following should be matched:
- search a database
- search database
- search the database
What's more,
search the database
will be ranked higher than the other results (available since 1.5.4).
The file is:
Each line holds one word, and matching is case-insensitive.
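As a rough illustration of the behaviour described above, the following Python sketch drops stop words before comparing a query with candidate texts and lists exact matches first. The function names and the two-level ranking are assumptions made for the example; DBSight's real matching and scoring are done by its analyzers and are not shown here.

```python
def load_stop_words(lines):
    """Read one stop word per line, case-insensitively."""
    return {line.strip().lower() for line in lines if line.strip()}


def strip_stop_words(text, stop_words):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in stop_words]


def matches(query, candidate, stop_words):
    """True if query and candidate are equal once stop words are removed."""
    return strip_stop_words(query, stop_words) == strip_stop_words(candidate, stop_words)


def rank(query, candidates, stop_words):
    """Return matching candidates, with exact matches of the query first."""
    hits = [c for c in candidates if matches(query, c, stop_words)]
    # sorted() is stable: exact matches (key False) come before the rest.
    return sorted(hits, key=lambda c: c.lower() != query.lower())
```

With the stop words "a", "an" and "the", the query "search the database" matches "search a database" and "search database" as well, and the exact phrase is listed first.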
Synonyms are words that are equivalent to each other. The file is:
Each line lists several equivalent words separated by spaces, and matching is case-insensitive.
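To make the file format concrete, here is a small Python sketch that parses lines in this format into a lookup table. The helper names and the sample words are hypothetical, and how DBSight applies synonyms during analysis is not covered.

```python
def load_synonyms(lines):
    """Map every word to the full group of words listed on its line."""
    table = {}
    for line in lines:
        group = set(line.lower().split())
        if len(group) < 2:
            continue  # blank lines and single words define no synonyms
        for word in group:
            table.setdefault(word, set()).update(group)
    return table


def expand(word, table):
    """Return the word together with its synonyms (just the word if none)."""
    word = word.lower()
    return table.get(word, {word})
```

For instance, a line reading "car automobile auto" would make a search for any one of the three words equivalent to a search for the other two.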
How to use it?
Stop words and synonyms are closely tied to Analyzers. Since each field can have a different Analyzer, each field can also have stop words and synonyms applied or not, by selecting the check box alongside the Analyzer selection.
Reserved words are words that should not be "analyzed" by analyzers (available since 1.5.5 beta). The file is:
For example, most analyzers would reduce "C#" or "C++" to just "C", which is usually not what you want.
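The effect of reserved words can be sketched with a toy analyzer in Python. This is only an illustration under simplifying assumptions: the analyze function is hypothetical, it splits on whitespace and keeps only letters and digits, and it does not reproduce any real Lucene or DBSight analyzer.

```python
import re


def analyze(text, reserved_words=()):
    """Tokenize text, keeping reserved words intact instead of splitting them."""
    reserved = {w.lower() for w in reserved_words}
    tokens = []
    for raw in text.lower().split():
        if raw in reserved:
            tokens.append(raw)  # reserved: bypass the analyzer entirely
        else:
            # Toy analysis step: keep only runs of letters and digits.
            tokens.extend(re.findall(r"[a-z0-9]+", raw))
    return tokens
```

Without reserved words, "C#" and "C++" both collapse to the token "c"; with them listed, they survive as "c#" and "c++". A limitation of this sketch is that a reserved word with punctuation attached, such as "C++," with a trailing comma, is not recognized.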
Any field that enables "Synonyms and Stopwords" will also get these reserved words.
To be able to search for these reserved words, you can either use a Lucene query directly, like
or use "phrase search" to prevent the DBSight query parser from applying the analyzer to the query: