Search Multiple Data Sources

From DBSight Full-Text Search Engine/Platform Wiki

Search Multiple Indexes

Search Multiple Indexes On the Same Server

If you have multiple databases with a common structure, this may be what you are looking for.

Prerequisite

  • You need to define the index configuration (connection info, main query and subsequent queries, indexing schedule, etc.) for each database.

Searching

When searching, use this format:

search.do?indexName=index1,index2,index3&...   //search index1, index2, index3. The first index is "index1"
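As a rough illustration, the following Python sketch assembles such a URL from a list of index names. The helper name build_search_url and the base URL http://localhost:8080/dbsight/search.do are assumptions made for this example only; "start" and "length" are the regular search parameters.

```python
from urllib.parse import urlencode


def build_search_url(base_url, index_names, **params):
    """Return a search.do URL that queries several indexes at once.

    build_search_url is a hypothetical helper, not part of DBSight.
    The first name in index_names becomes the "first index".
    """
    if not index_names:
        raise ValueError("at least one index name is required")
    query = {"indexName": ",".join(index_names)}
    query.update(params)
    # safe="," keeps the commas between index names unescaped.
    return base_url + "?" + urlencode(query, safe=",")


# Hypothetical host and port, for illustration only.
url = build_search_url(
    "http://localhost:8080/dbsight/search.do",
    ["index1", "index2", "index3"],
    start=0,
    length=10,
)
print(url)
```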

Rendering

If templateName is not specified, DBSight will use the template of the first specified index. However, this is not recommended, since the templates of the different indexes may not be the same.

Here are some examples showing how to specify the indexName and templateName:

indexName=a,b,c&templateName=x|y|main.jsp   => search on index:a,b,c, render by index:x, template:y, file: main.jsp
indexName=a,b,c&templateName=x|y            => search on index:a,b,c, render by index:x, template:y, file: main.vm
indexName=a,b,c&templateName=y|main.jsp     => search on index:a,b,c, render by index:a, template:y, file: main.jsp
indexName=x&templateName=y|main.jsp         => search on index:x, render by index:x, template:y, file: main.jsp
indexName=x&templateName=y                  => search on index:x, render by index:x, template:default template, file: main.vm
indexName=&templateName=x|y|main.ftl        => search on index:x, render by index:x, template:y, file: main.ftl
indexName=&templateName=x|y                 => search on index:x, render by index:x, template:y, file: main.vm

As you may notice, you can specify main.jsp to render results with a JSP, or an .ftl file to render them with the FreeMarker servlet. You can even use another file name, such as documents.jsp, to render only part of a template's results. This comes in handy when using AJAX.
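The templateName values in the examples above are made of up to three parts joined by "|". A minimal sketch of assembling such a value might look like this in Python; the helper name template_name is an assumption for illustration, and the sketch only builds the string, it does not reproduce how DBSight interprets it.

```python
def template_name(template, index=None, file=None):
    """Join the optional index, the template, and the optional file with "|".

    template_name is a hypothetical helper, not part of DBSight.
    Parts that are None or empty are left out.
    """
    parts = [part for part in (index, template, file) if part]
    return "|".join(parts)


# Render with index x, template y, and a partial file for an AJAX call.
print(template_name("y", index="x", file="documents.jsp"))
```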

Search Multiple Indexes Across Multiple Servers

Let's say you have 3 indexes, one on each of 3 DBSight servers. Their normal search URLs look like this:

http://hostname1:port1/dbsight/search.do?indexName=index1&...
http://hostname2:port2/dbsight/search.do?indexName=index2&...
http://hostname3:port3/dbsight_different/search.do?indexName=index3&...

Then you can combine the search results from the 3 indexes and render them via the configuration defined on hostname1:

http://hostname1:port1/dbsight/search.do?indexName=index1,index2@hostname2:port2/dbsight,index3@hostname3:port3/dbsight_different
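The indexName value in this URL follows a simple pattern: a local index is written by name, and a remote index is written as name@host:port/context. A small Python sketch of building that value, assuming a hypothetical helper called shard_index_name, could be:

```python
def shard_index_name(shards):
    """Build the indexName value for a search across several servers.

    shard_index_name is a hypothetical helper, not part of DBSight.
    shards is a list of (index, location) pairs, where location is None
    for an index on the server receiving the request, or a string such
    as "hostname2:port2/dbsight" for a remote index.
    """
    parts = []
    for index, location in shards:
        parts.append(index if location is None else f"{index}@{location}")
    return ",".join(parts)


index_name = shard_index_name([
    ("index1", None),
    ("index2", "hostname2:port2/dbsight"),
    ("index3", "hostname3:port3/dbsight_different"),
])
print("http://hostname1:port1/dbsight/search.do?indexName=" + index_name)
```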

Shard Search Parameters

Most parameters are the same as for a regular search, such as "start" and "length"/"limit".

One parameter unique to Shard Search is "timeout" (in milliseconds). When talking to several remote nodes, some nodes may go down or respond slowly without warning, and you don't want the whole search to wait for them. The default value is 10 seconds (10000 milliseconds). Please adjust it to fit your network requirements.
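Because "timeout" is expressed in milliseconds, it is easy to pass seconds by mistake. The Python sketch below converts seconds to milliseconds before adding the parameter; the helper name shard_search_params and its defaults for "start" and "length" are assumptions for this example.

```python
from urllib.parse import urlencode


def shard_search_params(start=0, length=10, timeout_seconds=None):
    """Return a query-string fragment for a shard search.

    shard_search_params is a hypothetical helper, not part of DBSight.
    When timeout_seconds is None, no "timeout" parameter is sent, so the
    server-side default (10000 milliseconds) applies.
    """
    params = {"start": start, "length": length}
    if timeout_seconds is not None:
        # DBSight expects the timeout in milliseconds.
        params["timeout"] = round(timeout_seconds * 1000)
    return urlencode(params)


# Give up on slow remote nodes after 2.5 seconds.
print(shard_search_params(timeout_seconds=2.5))
```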

This way, you can achieve a very scalable sharded search solution.

Maintain Multiple Indexes

To make the above searching work, the indexes should be mostly the same, and in particular should have the same set of columns. The JDBC connection and the SQL can vary a little. Most likely, what you will want to vary is how the data is partitioned.

For example, you can partition the data by date, using a different Main Query on each box:

select * from table1 where created_date <= to_date('2000-01-01')
select * from table1 where created_date > to_date('2000-01-01') and created_date <= to_date('2005-01-01')
select * from table1 where created_date > to_date('2005-01-01') and created_date <= to_date('2010-01-01')

Partitioning the data by date is a common approach. You can leave old data untouched so that it does not slow down the indexing process, and phase out the old data when convenient.
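When writing date-partitioned Main Queries by hand, it is easy to leave a gap or an overlap at a boundary. One way to avoid that is to generate the queries from a sorted list of boundary dates, as in this Python sketch. The helper name partition_queries is an assumption; each range excludes its lower bound and includes its upper bound, so a row falls into at most one partition.

```python
def partition_queries(table, column, boundaries):
    """Generate one Main Query per date partition.

    partition_queries is a hypothetical helper, not part of DBSight.
    boundaries is a strictly increasing list of 'YYYY-MM-DD' strings.
    The first query covers everything up to and including the first
    boundary; each later query covers (lower, upper].
    """
    boundaries = list(boundaries)
    if not boundaries:
        raise ValueError("at least one boundary date is required")
    # ISO dates sort chronologically when compared as strings.
    if boundaries != sorted(set(boundaries)):
        raise ValueError("boundaries must be unique and in increasing order")
    queries = [
        f"select * from {table} where {column} <= to_date('{boundaries[0]}')"
    ]
    for lower, upper in zip(boundaries, boundaries[1:]):
        queries.append(
            f"select * from {table} where {column} > to_date('{lower}') "
            f"and {column} <= to_date('{upper}')"
        )
    return queries


queries = partition_queries(
    "table1", "created_date", ["2000-01-01", "2005-01-01", "2010-01-01"]
)
for query in queries:
    print(query)
```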