Setup For Performance

From DBSight Full-Text Search Engine/Platform Wiki

Environment Setup

Hardware Configuration

With many concurrent searches, a conventional hard disk becomes the bottleneck. In our tests, a solid-state drive (SSD) helped a lot.

External Links:

  1. http://wiki.statsbiblioteket.dk/summa/Hardware

JVM Configuration

This is an example setup for a 3.3 GB index with 6 multi-Keywords narrowBy fields.

Resin

httpd.exe -J-Xmx1800m -J-Xms1800m -J-XX:NewRatio=3 -J-Xnoclassgc -J-XX:MaxPermSize=768m -J-XX:MaxGCPauseMillis=5000 -J-XX:+CMSClassUnloadingEnabled -J-XX:+CMSPermGenSweepingEnabled

Jetty

Pass these options to the java command that starts Jetty:

-Xmx1800m -Xms1800m -XX:NewRatio=3 -Xnoclassgc -XX:MaxPermSize=768m -XX:MaxGCPauseMillis=5000 -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled

Tomcat

Stop the Tomcat server, set the environment variable CATALINA_OPTS, and then restart Tomcat.

See tomcat-install/bin/catalina.sh (or catalina.bat) for how this variable is used. For example:

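set CATALINA_OPTS=-Xmx1800m -Xms1800m -XX:NewRatio=3 -Xnoclassgc -XX:MaxPermSize=768m -XX:MaxGCPauseMillis=5000 -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled  (Windows; no quotes, because cmd.exe keeps them as part of the value and the JVM would receive all options as one argument)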
export CATALINA_OPTS="-Xmx1800m -Xms1800m -XX:NewRatio=3 -Xnoclassgc -XX:MaxPermSize=768m -XX:MaxGCPauseMillis=5000 -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled"  (ksh/bash)
setenv CATALINA_OPTS "-Xmx1800m -Xms1800m -XX:NewRatio=3 -Xnoclassgc -XX:MaxPermSize=768m -XX:MaxGCPauseMillis=5000 -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled"  (tcsh/csh)

In catalina.bat or catalina.sh, you may have noticed that CATALINA_OPTS, JAVA_OPTS, or both can be used to specify Tomcat JVM options. What is the difference between them? CATALINA_OPTS is specific to the Tomcat servlet container, whereas JAVA_OPTS may also be used by other Java applications (e.g., JBoss). Since environment variables are shared by all applications, we don't want Tomcat to inadvertently pick up JVM options intended for other apps. In addition, catalina.sh applies CATALINA_OPTS only when starting Tomcat (the start and run commands), not to the stop command, so the large heap is not requested for the short-lived shutdown process. I prefer to use CATALINA_OPTS.

JBoss

Stop the JBoss server, edit $JBOSS_HOME/bin/run.conf, and then restart JBoss.

You can change the line with JAVA_OPTS to something like:

JAVA_OPTS="-server -Xmx1800m -Xms1800m -XX:NewRatio=3 -Xnoclassgc -XX:MaxPermSize=768m -XX:MaxGCPauseMillis=5000 -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled"

Note

  • NewRatio=3 is actually the default setting. It is an often misunderstood option (it sets the ratio of old generation size to young generation size), and we found that the default works best for DBSight. So just use 3.
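  • -Xnoclassgc is used because the templating engine, Velocity, dynamically generates some Java classes, and it is best not to garbage collect them automatically.
  • The CMSClassUnloadingEnabled and CMSPermGenSweepingEnabled flags apply only to the Concurrent Mark Sweep collector, so they have no effect unless that collector is enabled with -XX:+UseConcMarkSweepGC.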
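For the 1800 MB heap used above, NewRatio=3 gives the young generation 1800 / (3 + 1) = 450 MB and the old generation the remaining 1350 MB. The following minimal Java sketch (the class and method names are our own illustration, not part of DBSight) computes that split and prints the heap limit and options the running JVM actually received, which is a quick way to confirm that CATALINA_OPTS or run.conf was picked up.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative helper: relates -Xmx and -XX:NewRatio to generation sizes and
// shows which options the current JVM was started with.
class HeapSettingsCheck {

    // NewRatio is old:young, so the young generation gets 1/(NewRatio + 1) of the heap.
    static long youngGenMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    // The old generation gets whatever the young generation does not.
    static long oldGenMb(long heapMb, int newRatio) {
        return heapMb - youngGenMb(heapMb, newRatio);
    }

    // Maximum heap of the running JVM in MB; roughly -Xmx, HotSpot may report slightly less.
    static long runningMaxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // JVM options of the running process (e.g. those passed via CATALINA_OPTS or JAVA_OPTS).
    static List<String> runningJvmOptions() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Young generation for 1800m, NewRatio=3: " + youngGenMb(1800, 3) + " MB");
        System.out.println("Old generation for 1800m, NewRatio=3: " + oldGenMb(1800, 3) + " MB");
        System.out.println("This JVM's max heap: " + runningMaxHeapMb() + " MB");
        System.out.println("This JVM's options: " + runningJvmOptions());
    }
}
```

Run it with the same heap options, for example java -Xmx1800m -Xms1800m -XX:NewRatio=3 HeapSettingsCheck, to compare the expected split with what the JVM reports.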

OS setting

If you are using Linux, we highly recommend using ReiserFS for the disk partition that DBSight works on. You will get faster indexing!

Setup Guideline

If you need to handle a large number of concurrent requests, we suggest Jetty 6.1 or later, since it has good support for Java NIO, running on the latest Java 6 JVM with the -server option.

If your index is large, you may need Sun's 64-bit JVM on an Intel Xeon CPU, with plenty of memory: a 32-bit JVM cannot use a heap larger than its 4 GB address space allows, and in practice the limit is well below that.
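Before sizing the heap for a large index, it helps to confirm which kind of JVM the server actually runs. This minimal sketch (the class name is illustrative) reads the sun.arch.data.model system property, which Sun/Oracle HotSpot JVMs set to "32" or "64"; other vendors' JVMs may not define it, so an "unknown" result is inconclusive.

```java
// Illustrative check: is the running JVM 32-bit or 64-bit?
class JvmBitnessCheck {

    // "sun.arch.data.model" is set by Sun/Oracle HotSpot JVMs ("32" or "64");
    // if another JVM does not define it, "unknown" is returned instead.
    static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    static boolean is64Bit() {
        return "64".equals(dataModel());
    }

    public static void main(String[] args) {
        System.out.println("JVM data model: " + dataModel());
        System.out.println("os.arch: " + System.getProperty("os.arch"));
        System.out.println(is64Bit()
                ? "64-bit JVM: large heaps are possible if the machine has the memory."
                : "Not a 64-bit JVM (or unknown): a large -Xmx may fail at startup.");
    }
}
```

Run it with the same java executable the application server uses (typically the one JAVA_HOME points to).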

Choosing Application Server

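We ran a quick, non-conclusive comparison of Resin 3.1.0, Jetty 6.1.7, and Tomcat (Apache-Coyote/1.1): same JVM, same machine, no other difference at all. Below is the ApacheBench ("ab") output for each, copied as-is. In short, the requests per second for Resin, Jetty, and Tomcat are 18.90, 35.07, and 37.01 respectively, but keep in mind that 221, 98, and 131 of the 1000 requests were reported as failed. All of these are "Length" failures, which ab records when a response's length differs from that of the first response, so with dynamic search pages they may not indicate actual server errors. Taking both numbers into account, we would recommend Jetty if anyone asked which application server is better.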

Results for Resin

 [chris@localhost indexes]$ ab -n 1000 -c 10 "http://localhost:8080/dbs/search.do?indexName=my_shuqianji&q=java&btnSearch=dbSearch"
 This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
 Copyright 2006 The Apache Software Foundation, http://www.apache.org/
 
 Benchmarking localhost (be patient)
 Completed 100 requests
 Completed 200 requests
 Completed 300 requests
 Completed 400 requests
 Completed 500 requests
 Completed 600 requests
 Completed 700 requests
 Completed 800 requests
 Completed 900 requests
 Finished 1000 requests
 
 
 Server Software:        Resin/3.1.0
 Server Hostname:        localhost
 Server Port:            8080
 
 Document Path:          /dbs/search.do?indexName=my_shuqianji&q=java&btnSearch=dbSearch
 Document Length:        13700 bytes
 
 Concurrency Level:      10
 Time taken for tests:   52.901577 seconds
 Complete requests:      1000
 Failed requests:        221
    (Connect: 0, Length: 221, Exceptions: 0)
 Write errors:           0
 Total transferred:      13898389 bytes
 HTML transferred:       13662389 bytes
 Requests per second:    18.90 [#/sec] (mean)
 Time per request:       529.016 [ms] (mean)
 Time per request:       52.902 [ms] (mean, across all concurrent requests)
 Transfer rate:          256.55 [Kbytes/sec] received
 
 Connection Times (ms)
               min  mean[+/-sd] median   max
 Connect:        0    0   1.6      0      17
 Processing:    79  526 205.5    501    1264
 Waiting:       78  523 205.7    497    1263
 Total:         79  526 205.6    501    1264
 
 Percentage of the requests served within a certain time (ms)
   50%    501
   66%    587
   75%    645
   80%    687
   90%    807
   95%    923
   98%   1012
   99%   1142
  100%   1264 (longest request)

Results for Jetty

 [chris@localhost indexes]$ ab -n 1000 -c 10 "http://localhost:8080/search.do?indexName=my_shuqianji&q=java&btnSearch=dbSearch"
 This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
 Copyright 2006 The Apache Software Foundation, http://www.apache.org/
 
 Benchmarking localhost (be patient)
 Completed 100 requests
 Completed 200 requests
 Completed 300 requests
 Completed 400 requests
 Completed 500 requests
 Completed 600 requests
 Completed 700 requests
 Completed 800 requests
 Completed 900 requests
 Finished 1000 requests
 
 
 Server Software:        Jetty(6.1.7)
 Server Hostname:        localhost
 Server Port:            8080
 
 Document Path:          /search.do?indexName=my_shuqianji&q=java&btnSearch=dbSearch
 Document Length:        13700 bytes
 
 Concurrency Level:      10
 Time taken for tests:   28.517546 seconds
 Complete requests:      1000
 Failed requests:        98
    (Connect: 0, Length: 98, Exceptions: 0)
 Write errors:           0
 Total transferred:      13869234 bytes
 HTML transferred:       13703779 bytes
 Requests per second:    35.07 [#/sec] (mean)
 Time per request:       285.175 [ms] (mean)
 Time per request:       28.518 [ms] (mean, across all concurrent requests)
 Transfer rate:          474.94 [Kbytes/sec] received
 
 Connection Times (ms)
               min  mean[+/-sd] median   max
 Connect:        0    0   0.7      0      12
 Processing:    62  282 146.2    253    1115
 Waiting:       62  278 145.9    251    1109
 Total:         62  283 146.3    253    1115
 
 Percentage of the requests served within a certain time (ms)
   50%    253
   66%    333
   75%    373
   80%    399
   90%    475
   95%    559
   98%    660
   99%    725
  100%   1115 (longest request)

Results for Tomcat

 [chris@localhost indexes]$ ab -n 1000 -c 10 "http://localhost:8080/search.do?indexName=my_shuqianji&q=java&btnSearch=dbSearch"
 This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
 Copyright 2006 The Apache Software Foundation, http://www.apache.org/
 
 Benchmarking localhost (be patient)
 Completed 100 requests
 Completed 200 requests
 Completed 300 requests
 Completed 400 requests
 Completed 500 requests
 Completed 600 requests
 Completed 700 requests
 Completed 800 requests
 Completed 900 requests
 Finished 1000 requests
 
 
 Server Software:        Apache-Coyote/1.1
 Server Hostname:        localhost
 Server Port:            8080
 
 Document Path:          /search.do?indexName=my_shuqianji&q=java&btnSearch=dbSearch
 Document Length:        13700 bytes
 
 Concurrency Level:      10
 Time taken for tests:   27.16255 seconds
 Complete requests:      1000
 Failed requests:        131
    (Connect: 0, Length: 131, Exceptions: 0)
 Write errors:           0
 Total transferred:      13874970 bytes
 HTML transferred:       13668970 bytes
 Requests per second:    37.01 [#/sec] (mean)
 Time per request:       270.163 [ms] (mean)
 Time per request:       27.016 [ms] (mean, across all concurrent requests)
 Transfer rate:          501.51 [Kbytes/sec] received
 
 Connection Times (ms)
               min  mean[+/-sd] median   max
 Connect:        0    0   0.0      0       1
 Processing:    47  269 129.6    252    1006
 Waiting:       47  265 128.2    247     840
 Total:         47  269 129.6    252    1006
 
 Percentage of the requests served within a certain time (ms)
   50%    252
   66%    318
   75%    357
   80%    377
   90%    430
   95%    491
   98%    595
   99%    647
  100%   1006 (longest request)
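Comparing reports like these by eye is error-prone. The following Java sketch (the AbSummary class is our own illustration, not part of ab or DBSight) pulls the complete, failed, and Length-failed request counts and the mean requests per second out of an ab text report.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts headline numbers from an ApacheBench text report
// so several runs can be compared side by side.
class AbSummary {
    final int completeRequests;
    final int failedRequests;
    final int lengthFailures;      // part of failedRequests caused by differing response length
    final double requestsPerSecond;

    private AbSummary(int complete, int failed, int length, double rps) {
        this.completeRequests = complete;
        this.failedRequests = failed;
        this.lengthFailures = length;
        this.requestsPerSecond = rps;
    }

    static AbSummary parse(String report) {
        return new AbSummary(
                Integer.parseInt(find(report, "Complete requests:\\s+(\\d+)", null)),
                Integer.parseInt(find(report, "Failed requests:\\s+(\\d+)", null)),
                // The "(Connect: ..., Length: ..., Exceptions: ...)" breakdown is printed
                // only when there were failures, so default to 0. Anchoring on "(Connect:"
                // avoids matching the "Document Length:" line.
                Integer.parseInt(find(report, "\\(Connect:[^)]*Length:\\s*(\\d+)", "0")),
                Double.parseDouble(find(report, "Requests per second:\\s+([0-9.]+)", null)));
    }

    // Returns the first capture group of regex, or fallback when there is no match;
    // throws if there is no match and no fallback.
    private static String find(String text, String regex, String fallback) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (m.find()) {
            return m.group(1);
        }
        if (fallback == null) {
            throw new IllegalArgumentException("Not found in ab report: " + regex);
        }
        return fallback;
    }

    // Fraction of requests that ab reported as failed.
    double failureRate() {
        return (double) failedRequests / completeRequests;
    }

    public static void main(String[] args) {
        // Excerpt of the Jetty report above.
        String jetty = String.join("\n",
                "Document Length:        13700 bytes",
                "Complete requests:      1000",
                "Failed requests:        98",
                "   (Connect: 0, Length: 98, Exceptions: 0)",
                "Requests per second:    35.07 [#/sec] (mean)");
        AbSummary s = parse(jetty);
        System.out.printf("%.2f req/s, %d failed (%d Length), failure rate %.1f%%%n",
                s.requestsPerSecond, s.failedRequests, s.lengthFailures, 100 * s.failureRate());
    }
}
```

The sketch reads the Requests per second line directly instead of recomputing it from Time taken for tests: in the Tomcat run above, 1000 requests at 37.01 requests/sec implies about 27.02 seconds, while the report prints 27.16255.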

DBSight Setup

Caching Facet Search (NarrowBy) Results

The narrowBy (facet) search is expensive: it usually takes 4 to 5 times as long as the basic search, especially for searches with a large number of hits.

So this feature lets you:

  1. cache the narrowBy results for searches with the most number of hits (call it TopFacetCache)
  2. cache the most recent narrowBy searches (call it LatestFacetCache)

Configure NarrowBy Caching

Here is how to choose the number of cache entries:

For TopFacetCache, choose something like 1000, or more if memory allows.
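The TopFacetCache idea can be sketched as a bounded cache that only keeps narrowBy results for the queries with the most hits. The Java class below is a hypothetical illustration, not DBSight's implementation: it assumes the hit count of a search is known when its narrowBy result is stored, keeps at most maxEntries results, and evicts the cached query with the fewest hits when a more popular one arrives.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of a "TopFacetCache": keeps narrowBy results only for the
// queries with the largest hit counts, up to maxEntries entries.
class TopFacetCache<V> {

    private static final class Entry<T> {
        final String query;
        final int hits;
        final T facets;

        Entry(String query, int hits, T facets) {
            this.query = query;
            this.hits = hits;
            this.facets = facets;
        }
    }

    private final int maxEntries;
    private final Map<String, Entry<V>> byQuery = new HashMap<String, Entry<V>>();
    // Min-heap on hit count: the head is the cached query with the fewest hits.
    private final PriorityQueue<Entry<V>> byHits =
            new PriorityQueue<Entry<V>>((a, b) -> Integer.compare(a.hits, b.hits));

    TopFacetCache(int maxEntries) {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        this.maxEntries = maxEntries;
    }

    // Stores the facets if the query ranks among the top maxEntries by hit count.
    void put(String query, int hits, V facets) {
        Entry<V> previous = byQuery.remove(query);
        if (previous != null) {
            byHits.remove(previous);
        }
        if (byQuery.size() >= maxEntries) {
            Entry<V> leastPopular = byHits.peek();
            if (leastPopular.hits >= hits) {
                return; // not popular enough to displace anything
            }
            byHits.poll();
            byQuery.remove(leastPopular.query);
        }
        Entry<V> entry = new Entry<V>(query, hits, facets);
        byQuery.put(query, entry);
        byHits.add(entry);
    }

    // Returns the cached facets, or null if this query is not cached.
    V get(String query) {
        Entry<V> entry = byQuery.get(query);
        return entry == null ? null : entry.facets;
    }

    int size() {
        return byQuery.size();
    }

    public static void main(String[] args) {
        TopFacetCache<String> cache = new TopFacetCache<String>(2);
        cache.put("java", 500, "facets for java");
        cache.put("sql", 50, "facets for sql");
        cache.put("lucene", 300, "facets for lucene"); // evicts "sql", the least popular
        System.out.println(cache.get("sql"));    // null
        System.out.println(cache.get("lucene")); // facets for lucene
    }
}
```

The min-heap keeps the least popular cached query at its head, so deciding whether a new result is worth caching is cheap. The class is not thread-safe; a server would need to synchronize access to it.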

LatestFacetCache mainly serves users who are paginating through results, so it does not need to be large; something like 10 or 100 entries should do.
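Because LatestFacetCache serves users paging through the same results, a plain least-recently-used (LRU) cache fits. The sketch below is a hypothetical illustration, not DBSight's code: it builds on java.util.LinkedHashMap in access order, whose removeEldestEntry hook evicts the least recently used entry once the size limit is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a "LatestFacetCache": a small LRU cache of narrowBy
// results keyed by query, so paging through the same search reuses the facets.
class LatestFacetCache<V> extends LinkedHashMap<String, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    LatestFacetCache(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each put; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LatestFacetCache<String> cache = new LatestFacetCache<String>(2);
        cache.put("q1", "facets 1");
        cache.put("q2", "facets 2");
        cache.get("q1");                 // q1 becomes the most recently used
        cache.put("q3", "facets 3");     // evicts q2, the least recently used
        System.out.println(cache.keySet()); // [q1, q3]
    }
}
```

Like LinkedHashMap itself, this cache is not synchronized; wrap it with Collections.synchronizedMap(...) when several request threads share it.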