Clustering and Load Balancing
From DBSight Full-Text Search Engine/Platform Wiki
There are several kinds of server clustering: partitioning (sharding), replication, or a combination of both.
Index Replication
Objective
To increase search throughput by adding more computers.
Setup
Suppose you have servers A, B, and C, and an index named X that is already fully set up on server A, including its scheduling.
Manual Setup
- Download the configuration of index X from server A.
- Upload index X's configuration to servers B and C.
- On server B, go to the "Data Source"=>"Schedule" page, subscribe to server A, and disable all other indexing operations. Do the same on server C.
Setup with Neighbor Discovery
The UI guides you through the setup, which should be self-evident.
How It Works Afterwards
Index X on servers B and C subscribes to index X on server A.
Whenever server A finishes indexing, index X on servers B and C receives a notification, pulls down the new index data, warms it up, and swaps it online.
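The following is a minimal Python sketch of the pull, warm-up, and swap sequence on a subscriber node. It is not DBSight's code. It assumes a hypothetical `pull_index` callable that downloads index X from server A and returns an object with a `search(query)` method. `IndexHolder` and `on_index_updated` are illustrative names. The idea is that the new index is fully warmed before a single reference swap makes it live.

```python
import threading


class IndexHolder:
    """Holds the index that currently serves searches (hypothetical sketch)."""

    def __init__(self, index):
        self._index = index
        self._lock = threading.Lock()

    def current(self):
        # Searches call this to get the live index.
        with self._lock:
            return self._index

    def swap(self, new_index):
        # Replace the live index in one step and hand back the old one.
        with self._lock:
            old, self._index = self._index, new_index
        return old


def on_index_updated(holder, pull_index, warm_queries):
    """React to a "server A finished indexing" notification.

    pull_index: hypothetical callable that downloads the new index data and
    returns an object with a search(query) method.
    warm_queries: representative queries used to warm the new index.
    """
    new_index = pull_index()          # 1. pull down the new index data
    for query in warm_queries:        # 2. warm up before going live
        new_index.search(query)
    return holder.swap(new_index)     # 3. swap it online, return the old index
```

Because the swap only replaces a reference, a search that already obtained the old index through `current()` finishes against it. New searches see the new index.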
The notification relies on UDP broadcasting, so it may not work across a WAN or through some gateway setups. In that case you still have two options:
- Schedule periodic checks for updates.
- Have server A ping a URL that tells server B (and C) to pull the data.
Both options work across data centers.
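Here is a rough Python sketch of the first fallback, a scheduled check for updates. It is not DBSight's implementation. A subscriber compares the index version reported by server A with its own and pulls only when A is newer. The version callables, `pull_and_swap`, and the 60-second default interval are all hypothetical. `max_checks` and the injectable `sleep` exist only so the loop can be stopped and tested.

```python
import time


def poll_for_updates(get_remote_version, get_local_version, pull_and_swap,
                     interval_seconds=60.0, max_checks=None, sleep=time.sleep):
    """Scheduled "check for updates" loop (hypothetical sketch).

    get_remote_version: returns the index version on server A (assumed call).
    get_local_version: returns the version this node currently serves.
    pull_and_swap: downloads, warms up, and swaps in the newer index.
    Returns how many updates were pulled (only reached if max_checks is set).
    """
    pulls = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        # Pull only when server A has something newer than what we serve.
        if get_remote_version() > get_local_version():
            pull_and_swap()
            pulls += 1
        # Wait before the next check, but not after the final one.
        if max_checks is None or checks < max_checks:
            sleep(interval_seconds)
    return pulls
```

The second option inverts this: instead of polling, server A calls a URL on each subscriber that triggers `pull_and_swap` directly.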
Load Balancing
A DBSight cluster should have a load balancer in front of its servers, which proxies each user's search request to one of the DBSight nodes.
A load balancer with sticky sessions is recommended. Keeping each user on the same node gives better search performance. It also avoids the brief window during index replication when different nodes may still be serving different versions of the index.
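Load balancers usually implement sticky sessions either with a cookie or by hashing a client attribute. The Python sketch below shows the hashing variant. It maps a hypothetical session ID to one node so that a user keeps hitting the same replica. The node addresses and port 8080 are placeholders.

```python
import hashlib


def pick_node(session_id, nodes):
    """Deterministically map a session to one node (hash-based stickiness sketch)."""
    if not nodes:
        raise ValueError("no DBSight nodes available")
    # A stable hash (unlike Python's per-process randomized hash()) gives
    # the same node for the same session across restarts.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]
```

Plain modulo hashing remaps most sessions when a node is added or removed. Consistent hashing limits that churn if the cluster size changes often.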
Partition/Sharding
Objective
To increase the total index size by adding more computers.
Index Sharding
Partitioning is more complicated because you need to adjust the SQL on each server so that it selects only one partition of the data.
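One common way to split the data is to filter on an integer primary key modulo the number of servers, so that each row lands in exactly one partition. The Python helper below generates such a query for each server. The `articles` table, its columns, and the `MOD`-based scheme are hypothetical. The sketch assumes a non-negative integer key. Some databases, for example SQL Server, use the `%` operator instead of `MOD`.

```python
def partition_sql(table, columns, key_column, shard_index, shard_count):
    """Build the indexing SQL for one partition (hypothetical MOD-based scheme)."""
    if not 0 <= shard_index < shard_count:
        raise ValueError("shard_index must be in [0, shard_count)")
    # Every row has exactly one remainder, so the partitions neither overlap
    # nor miss rows, provided all servers use the same shard_count.
    return (f"SELECT {', '.join(columns)} FROM {table} "
            f"WHERE MOD({key_column}, {shard_count}) = {shard_index}")


# With three servers, A would index shard 0, B shard 1, and C shard 2.
for i, server in enumerate(["A", "B", "C"]):
    print(server, partition_sql("articles", ["id", "title", "body"], "id", i, 3))
```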
Sharded Indexes
Searching across the shards is as simple as:
http://searchserverA:port/dbsight/search.do?q=...&indexName=xyz,serverB:port/dbsight,serverC:port/dbsight
In other words, you add the other servers to the indexName parameter as a comma-separated list after the local index name.
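The query string from the example can also be built programmatically. This Python sketch assumes the comma-separated format shown above: the local index name first, then each remote server's `host:port/dbsight` base address. The host names and port 8080 are placeholders, and `federated_search_url` is a hypothetical helper, not a DBSight API.

```python
from urllib.parse import urlencode


def federated_search_url(entry_server, query, index_name, other_servers):
    """Build a search.do URL that also searches the listed servers (sketch)."""
    params = {
        "q": query,
        # Local index name first, then each remote DBSight base address.
        "indexName": ",".join([index_name, *other_servers]),
    }
    # safe=",:/" keeps the separators readable instead of percent-encoding them.
    return f"http://{entry_server}/dbsight/search.do?" + urlencode(params, safe=",:/")
```

The percent-encoded form that `urlencode` produces without `safe` should decode to the same parameter value on the server. Keeping the separators unescaped just makes the URL easier to read.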
