Exact Search plus fuzzy double Metaphone search

From DBSight Full-Text Search Engine/Platform Wiki


Introduction

A very common feature request is that when there is no exact match, DBSight should return something similar. A customer, "merlin", has a solution: analyze the same field with two different analyzers and combine the search results. Here is the original link:

 http://www.dbsight.net/index.php?q=node/507

And here is his blog page:

 http://merlin1109.blogspot.com/2011/03/dbsight-combining-levenshtin-distance.html

It seems to be his first blog post, so I copied the content here:


DBSight - combining Levenshtein distance and Double Metaphone

My attempt to make "lastname" search friendly to users.

Decision: Use DBSight, utilize multiple analyzers: numberOrLowerCase, Double Metaphone.

Problem

People don't always spell names correctly. For example, I'm trying to search for 'picasso', but I misspell it as 'picaso' (with a missing s). In the search results, 'Bachs' along with a bunch of other names showed up with the same score as 'picasso'. What I want is for 'picasso' to show up at the top of the list. So I'm thinking that if I can put Levenshtein distance in there and give it a higher score, I might be able to solve this problem. (configure search --> searchable columns)

Solution

lastNameT is for text search, so when I type in "picasso", the correct spelling, the exact match shows up at the top. This field is also used for stemming or Levenshtein distance, so that words that are spelled similarly weigh more than the phonetic matches. If I only use phonetic matching, names with similar sounds all show up with the same score. lastNameP is the phonetic one.
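To make the idea concrete, here is a minimal sketch of Levenshtein (edit) distance in plain JavaScript. It is only an illustration of the concept, not DBSight's own implementation, and the function name levenshtein is made up for this example. It shows that 'picaso' is one edit away from 'picasso' but four edits away from 'bachs', which is why a spelling-based field can separate names that a phonetic analyzer scores the same.

```javascript
// Classic dynamic-programming Levenshtein distance:
// the minimum number of single-character inserts, deletes,
// or substitutions needed to turn string a into string b.
function levenshtein(a, b) {
  // prev[j] = distance between the processed prefix of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i chars of a into an empty string takes i deletes
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete a[i - 1]
        curr[j - 1] + 1,    // insert b[j - 1]
        prev[j - 1] + cost  // substitute (or keep, when the chars match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein('picaso', 'picasso')); // 1
console.log(levenshtein('picaso', 'bachs'));   // 4
```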


So here is my setup:

lastNameT and lastNameP hold the same value; they differ only in analyzer and weight. To achieve this, I did the following.

1. Get data: Data Source -> select data -> sql


select id, lastName as lastNameT, lastName as lastNameP from mytable



1.2 I have my id as the primary key, and the names are text fieldTypes.


2. Adjust analyzers: Data source -> language. Change the analyzer settings for lastNameT and lastNameP to numberOrLowerCase and Double Metaphone, respectively.


3. Enable spell check (it's really nice to have): Data source -> spell check. Check the checkbox on lastNameT, then check the checkbox on "use index-specific" spell checking. (Using a regular dictionary to correct my spelling of a person's last name wouldn't make much sense, since I'm only interested in what my database has to offer.)


4. Adjust weights (depending on needs):

 fieldName   type     FieldType   Analyzer            weight
 lastNameT   String   text        numberOrLowerCase   2.0
 lastNameP   String   text        Double Metaphone    1.0

5. Enable wild-card search (optional)

configure search -> wild-card

This part is self-explanatory.

Make a template, type 'picaso' in the search box, hit enter, and then add &lq=lastNameT:picaso~0.4 to the end of the URL.
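If you want to build that URL in code instead of typing it, something like the sketch below should work. The base address, the indexName parameter, and the function name withFuzzyLastName are assumptions made up for illustration; substitute the address of your own search page. URLSearchParams takes care of percent-encoding the ':' and '~' characters.

```javascript
// Hypothetical search page address; replace it with your own.
const base = 'http://localhost:8080/dbsight/search.do?indexName=demo';

// Returns the search URL with q set to the typed term and
// lq set to a fuzzy query on lastNameT.
function withFuzzyLastName(baseUrl, term, minSimilarity = 0.4) {
  const url = new URL(baseUrl);
  url.searchParams.set('q', term);
  url.searchParams.set('lq', `lastNameT:${term}~${minSimilarity}`);
  return url.toString();
}

// Decoding the result gives back the same lq value as the hand-typed one.
const result = withFuzzyLastName(base, 'picaso');
console.log(new URL(result).searchParams.get('lq')); // lastNameT:picaso~0.4
```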

There you have it! picasso on top of the list!

Because I don't want to modify the URL in the address bar every time I do a search, I modified the template so that it captures the lq parameter and sends it along with the form submit.

6. Create and modify a display template.

 display template -> create from scaffold 

6.2 Create a display template. (I used the client-side sortable table.)

6.3 Modify the template.

display template -> list
  Click on the template name, locate searchBox.ftl, and add the following JavaScript to the end of the file.
<script>
function populateLQ() {
  var newtext = document.search_demo.q.value;  // what the user typed in the search box
  document.search_demo.lq.value = 'lastNameT:' + newtext + '~0.4';  // fuzzy query on lastNameT
}
</script>

Note 1: on line 3, document.search_demo.q.value must be modified to match the name of your index, e.g. document.search_myNameIndex.q.value. The same goes for document.search_demo.lq.value on line 4.

Note 2: notice the ~0.4. That's what I use, because I think 0.4 is close enough.
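For a feel of what the 0.4 means, here is a rough sketch. It assumes that DBSight hands the lq query to a Lucene version of that era and that fuzzy matching there computes similarity as 1 - editDistance / min(query length, term length), accepting terms whose similarity is above the threshold. That formula is my understanding of classic Lucene, not something confirmed by the DBSight documentation, and the function name fuzzySimilarity is made up for this example.

```javascript
// ASSUMPTION: classic Lucene-style fuzzy similarity (not confirmed for DBSight).
function fuzzySimilarity(editDistance, queryLength, termLength) {
  return 1 - editDistance / Math.min(queryLength, termLength);
}

const MIN_SIMILARITY = 0.4; // the value after the ~ in lastNameT:picaso~0.4

// 'picaso' vs 'picasso': edit distance 1, shorter length 6 -> about 0.83
const picasso = fuzzySimilarity(1, 'picaso'.length, 'picasso'.length);
// 'picaso' vs 'bachs': edit distance 4, shorter length 5 -> about 0.2
const bachs = fuzzySimilarity(4, 'picaso'.length, 'bachs'.length);

console.log(picasso > MIN_SIMILARITY); // true, so 'picasso' matches the fuzzy query
console.log(bachs > MIN_SIMILARITY);   // false, so 'bachs' does not
```

Under that assumption, raising the number makes the match stricter and lowering it lets in more distant spellings.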

Add a hidden field to the search form:

<input type="hidden" name="lq" id="lq" value="">

Modify the input text field of the search box. From:

<input type="text" name="q" id="q" size="41" maxlength="2048" value="${searchResult.userInput?html}"> 

To:

<input type="text" name="q" id="q" size="41" maxlength="2048" value="${searchResult.userInput?html}" onchange="populateLQ();"> 

This way, changing the input also updates the hidden field that is passed along with the search.

Done!
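One thing the simple populateLQ() above does not handle is input that contains spaces or query-syntax characters such as ':' or '~'. Below is a slightly more defensive sketch. The function names escapeTerm and buildLastNameQuery are made up for this example, and the list of escaped characters assumes that the lq parameter follows classic Lucene query syntax, so treat it as a starting point rather than a definitive implementation.

```javascript
// Characters with special meaning in classic Lucene query syntax (assumed to apply to lq).
const LUCENE_SPECIAL = /[\\+\-!():^\[\]"{}~*?|&]/g;

// Prefix every special character with a backslash so it is treated literally.
function escapeTerm(term) {
  return term.replace(LUCENE_SPECIAL, '\\$&');
}

// Build one fuzzy clause per word, so that each word of a multi-word
// name gets its own ~similarity suffix. Blank input yields an empty string.
function buildLastNameQuery(input, minSimilarity = 0.4) {
  return input
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map((word) => `lastNameT:${escapeTerm(word)}~${minSimilarity}`)
    .join(' ');
}

console.log(buildLastNameQuery('picaso'));   // lastNameT:picaso~0.4
console.log(buildLastNameQuery('van gogh')); // lastNameT:van~0.4 lastNameT:gogh~0.4
```

In the template, populateLQ() could then set document.search_demo.lq.value to buildLastNameQuery(document.search_demo.q.value). Whether the separate clauses are combined with OR or AND depends on how the search is configured, so check the results with a multi-word name before relying on it.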

Afterthoughts

I later created another index that searches the combination of first name and last name, by concatenating the two separated by a space, and added autocomplete from partial scaffold -> "search suggestion", which is much more user friendly.

I'm sure there are many other ways of doing this, and much better ways too. So please share your criticisms, thoughts, or whatever you have in mind.

Thanks Paul, Will. (DBSight is awesome!)