Clustering and Load Balancing

From DBSight Full-Text Search Engine/Platform Wiki

There are several ways to cluster servers: partitioning (sharding), replication, or a combination of both.

Index Replication

Objective

To increase search throughput by adding more computers.

Setup

Suppose you have three servers: A, B, and C. The index is named X and is fully set up on server A, including its indexing schedule.

Manual Setup

  1. Download the configuration of index X from server A.
  2. Upload the index X configuration to server B and server C.
  3. On server B, go to the "Data Source" => "Schedule" page, subscribe to server A, and disable the other indexing operations. Do the same on server C.

Setup with Neighbor Discovery

The UI guides you through the setup; the steps should be self-evident.

How does it work afterwards?

The index X on server B and C will subscribe to the index X on server A.

Whenever server A finishes indexing, index X on servers B and C should receive a notification, pull down the index data, warm it up, and swap it online.

The notification relies on UDP broadcast, so it may not work over a WAN or behind a special gateway setup. In that case you still have two choices:

  1. Schedule a periodic check for updates.
  2. Have server A ping a URL that tells server B to pull the data.

Both options work across data centers.
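
The ping approach can be sketched roughly as follows. This is only an illustration in Python, not DBSight code: the function names and the example URLs are hypothetical, and the real URL to ping has to be taken from your own DBSight installation. The idea is that a script run after indexing on server A requests one URL per subscriber and records which ones answered.

```python
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def _http_get(url: str) -> int:
    """Fetch a URL and return the HTTP status code."""
    with urlopen(url, timeout=10) as response:
        return response.status


def ping_subscribers(
    pull_urls: Iterable[str],
    fetch: Callable[[str], int] = _http_get,
) -> Dict[str, bool]:
    """Request each subscriber's pull URL; report which ones answered 200.

    The URLs are placeholders supplied by the caller (an assumption of this
    sketch); they are not documented DBSight endpoints.
    """
    results = {}
    for url in pull_urls:
        try:
            results[url] = fetch(url) == 200
        except OSError:
            # URLError and timeouts are OSError subclasses. A failed ping is
            # only recorded, so a scheduled update check can cover for it.
            results[url] = False
    return results
```

Combining both choices is a reasonable design: the ping gives quick propagation, and the scheduled check catches any subscriber that was unreachable when the ping was sent.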

Load Balancing

A DBSight cluster should have a load balancer in front of the servers. All user searches are proxied through it and distributed across the DBSight nodes.

A load balancer with sticky sessions is recommended, both for better search performance and to avoid the small gap while index data is being replicated.
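
To illustrate what sticky sessions mean here, the sketch below maps a session ID to a node by hashing, so the same user keeps hitting the same DBSight server. It is a minimal Python illustration, not part of DBSight; the function name and node names are made up, and in practice you would normally enable the sticky-session feature of your load balancer rather than write this yourself.

```python
import hashlib
from typing import Sequence


def pick_node(session_id: str, nodes: Sequence[str]) -> str:
    """Map a session ID to one node, always the same one for the same ID."""
    if not nodes:
        raise ValueError("at least one node is required")
    # A stable hash (unlike Python's built-in hash()) gives the same result
    # across processes and restarts.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]
```

Because a user stays on one node, consecutive searches see the same index version even if another node has already swapped in newer data.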

Partition/Sharding

Objective

To increase the total index size by adding more computers.

Index Sharding

Partitioning is more complicated because you need to adjust the SQL on each server so that it selects only one partition of the data.
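
One common way to split the data is a modulo on an integer key, so that each server's SQL selects a disjoint slice. The Python sketch below assumes a hypothetical table "documents" with an integer "id" column and three shards; neither the table nor the helper functions come from DBSight, and the resulting SELECT is what you would adapt for each server's data source.

```python
import sqlite3


def shard_query(table: str, key: str, shard: int, shard_count: int) -> str:
    """Build the SELECT for one shard using a modulo on an integer key."""
    if not 0 <= shard < shard_count:
        raise ValueError("shard must be in range(shard_count)")
    return f"SELECT * FROM {table} WHERE {key} % {shard_count} = {shard}"


def demo() -> list:
    """Run the three shard queries on a toy table and return ids per shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO documents (id, title) VALUES (?, ?)",
        [(i, f"doc {i}") for i in range(1, 10)],
    )
    shards = []
    for shard in range(3):
        rows = conn.execute(shard_query("documents", "id", shard, 3)).fetchall()
        shards.append(sorted(row[0] for row in rows))
    conn.close()
    return shards
```

Every row lands in exactly one shard, so the partial indexes together cover the whole table without duplicates.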

Sharded Indexes

The search is as simple as:

http://searchserverA:port/dbsight/search.do?q=...&indexName=xyz,serverB:port/dbsight,serverC:port/dbsight

Basically, you specify the other servers by appending them to the indexName parameter.
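
A small helper can assemble such a URL. The following Python sketch only builds the string in the form shown above; the function name is hypothetical, and the host names and port used when calling it are placeholders you would replace with your own.

```python
from typing import Sequence
from urllib.parse import urlencode


def sharded_search_url(
    base: str, query: str, local_index: str, remote_shards: Sequence[str]
) -> str:
    """Build a search URL whose indexName lists the local index and remote shards."""
    index_name = ",".join([local_index, *remote_shards])
    # Keep ',', ':' and '/' unescaped so indexName looks like the documented form.
    params = urlencode({"q": query, "indexName": index_name}, safe=",:/")
    return f"{base}/search.do?{params}"
```

For example, calling it with base "http://searchserverA:8080/dbsight", local index "xyz", and the remote shards "serverB:8080/dbsight" and "serverC:8080/dbsight" yields a URL with indexName=xyz,serverB:8080/dbsight,serverC:8080/dbsight (port 8080 is an assumed value).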