Setup For Performance
From DBSight Full-Text Search Engine/Platform Wiki
Environment Setup
Hardware Configuration
For multiple concurrent searches, a conventional hard disk is the bottleneck. In our tests, a solid-state drive (SSD) helped considerably.
JVM Configuration
This is an example setup for a 3.3 GB index with 6 multi-keyword narrowBy fields.
Resin
httpd.exe -J-Xmx1800m -J-Xms1800m -J-XX:NewRatio=3 -J-Xnoclassgc -J-XX:MaxPermSize=768m -J-XX:MaxGCPauseMillis=5000 -J-XX:+CMSClassUnloadingEnabled -J-XX:+CMSPermGenSweepingEnabled
Jetty
-Xmx1800m -Xms1800m -XX:NewRatio=3 -Xnoclassgc -XX:MaxPermSize=768m -XX:MaxGCPauseMillis=5000 -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled
Tomcat
Stop the Tomcat server, set the environment variable CATALINA_OPTS, and then restart Tomcat.
Look at the file tomcat-install/bin/catalina.sh or catalina.bat to see how this variable is used. For example:
Windows:
set CATALINA_OPTS="-Xmx1800m -Xms1800m -XX:NewRatio=3 -Xnoclassgc -XX:MaxPermSize=768m -XX:MaxGCPauseMillis=5000 -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled"
ksh/bash:
export CATALINA_OPTS="-Xmx1800m -Xms1800m -XX:NewRatio=3 -Xnoclassgc -XX:MaxPermSize=768m -XX:MaxGCPauseMillis=5000 -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled"
tcsh/csh:
setenv CATALINA_OPTS "-Xmx1800m -Xms1800m -XX:NewRatio=3 -Xnoclassgc -XX:MaxPermSize=768m -XX:MaxGCPauseMillis=5000 -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled"
In catalina.bat or catalina.sh, you may have noticed that CATALINA_OPTS, JAVA_OPTS, or both can be used to specify Tomcat JVM options. What is the difference? CATALINA_OPTS is specific to the Tomcat servlet container, whereas JAVA_OPTS may also be used by other Java applications (e.g., JBoss). Since environment variables are shared by all applications, we don't want Tomcat to inadvertently pick up JVM options intended for other applications. I prefer to use CATALINA_OPTS.
JBoss
Stop the JBoss server, edit $JBOSS_HOME/bin/run.conf, and then restart the JBoss server.
You can change the line with JAVA_OPTS to something like:
JAVA_OPTS="-server -Xmx1800m -Xms1800m -XX:NewRatio=3 -Xnoclassgc -XX:MaxPermSize=768m -XX:MaxGCPauseMillis=5000 -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled"
Note
- NewRatio=3 is actually the default setting. It is an often-misunderstood setting, and we found that the default works best for DBSight. So, just use 3.
- -Xnoclassgc is set because the templating language, Velocity, dynamically generates some Java classes. It is best not to garbage-collect them automatically.
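After restarting the server, it can be useful to confirm that the JVM actually picked up the options. The following is a minimal sketch, not part of DBSight: the class name JvmSettingsCheck and its methods are hypothetical, and it would need to run inside the same JVM as the application server (for example, called from a test page) to report that server's settings.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper (not part of DBSight) that reports the running JVM's settings. */
class JvmSettingsCheck {

    /** Maximum heap the JVM will try to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Options passed to the JVM at startup, such as -Xmx1800m or -XX:NewRatio=3. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMb());
        for (String arg : jvmArguments()) {
            System.out.println("JVM argument: " + arg);
        }
    }
}
```

The reported maximum heap can be somewhat lower than the -Xmx value, so the list of startup arguments is the more direct check.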
OS Setting
If you are using Linux, we highly recommend ReiserFS for the disk partition DBSight works on. Indexing will be faster.
Setup Guideline
If you need to serve a large number of concurrent requests, we suggest Jetty 6.1 or later, since it has good support for Java NIO. Also use the latest Java 6 JVM with the -server option.
If your index is large, you may need Sun's 64-bit JVM, an Intel Xeon CPU, and plenty of memory.
Choosing Application Server
We ran an informal, non-conclusive comparison of Resin 3.1.0, Jetty 6.1.7, and Tomcat (Apache-Coyote/1.1) on the same machine with the same JVM and no other differences. The raw "ab" (ApacheBench) output is pasted below. In short, Resin, Jetty, and Tomcat served 18.90, 35.07, and 37.01 requests per second respectively, but keep in mind that ab reported 221, 98, and 131 failed requests out of 1000 (all of type Length in the output below). Tomcat was slightly faster, but Jetty had the fewest failures, so we recommend Jetty if you ask which application server is better.
Results for Resin
[chris@localhost indexes]$ ab -n 1000 -c 10 "http://localhost:8080/dbs/search.do?indexName=my_shuqianji&q=java&btnSearch=dbSearch"
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Finished 1000 requests
Server Software:        Resin/3.1.0
Server Hostname:        localhost
Server Port:            8080
Document Path:          /dbs/search.do?indexName=my_shuqianji&q=java&btnSearch=dbSearch
Document Length:        13700 bytes
Concurrency Level:      10
Time taken for tests:   52.901577 seconds
Complete requests:      1000
Failed requests:        221
   (Connect: 0, Length: 221, Exceptions: 0)
Write errors:           0
Total transferred:      13898389 bytes
HTML transferred:       13662389 bytes
Requests per second:    18.90 [#/sec] (mean)
Time per request:       529.016 [ms] (mean)
Time per request:       52.902 [ms] (mean, across all concurrent requests)
Transfer rate:          256.55 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    1.6      0     17
Processing:    79  526  205.5    501   1264
Waiting:       78  523  205.7    497   1263
Total:         79  526  205.6    501   1264
Percentage of the requests served within a certain time (ms)
  50%    501
  66%    587
  75%    645
  80%    687
  90%    807
  95%    923
  98%   1012
  99%   1142
 100%   1264 (longest request)
Results for Jetty
[chris@localhost indexes]$ ab -n 1000 -c 10 "http://localhost:8080/search.do?indexName=my_shuqianji&q=java&btnSearch=dbSearch"
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Finished 1000 requests
Server Software:        Jetty(6.1.7)
Server Hostname:        localhost
Server Port:            8080
Document Path:          /search.do?indexName=my_shuqianji&q=java&btnSearch=dbSearch
Document Length:        13700 bytes
Concurrency Level:      10
Time taken for tests:   28.517546 seconds
Complete requests:      1000
Failed requests:        98
   (Connect: 0, Length: 98, Exceptions: 0)
Write errors:           0
Total transferred:      13869234 bytes
HTML transferred:       13703779 bytes
Requests per second:    35.07 [#/sec] (mean)
Time per request:       285.175 [ms] (mean)
Time per request:       28.518 [ms] (mean, across all concurrent requests)
Transfer rate:          474.94 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.7      0     12
Processing:    62  282  146.2    253   1115
Waiting:       62  278  145.9    251   1109
Total:         62  283  146.3    253   1115
Percentage of the requests served within a certain time (ms)
  50%    253
  66%    333
  75%    373
  80%    399
  90%    475
  95%    559
  98%    660
  99%    725
 100%   1115 (longest request)
Results for Tomcat
[chris@localhost indexes]$ ab -n 1000 -c 10 "http://localhost:8080/search.do?indexName=my_shuqianji&q=java&btnSearch=dbSearch"
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Finished 1000 requests
Server Software:        Apache-Coyote/1.1
Server Hostname:        localhost
Server Port:            8080
Document Path:          /search.do?indexName=my_shuqianji&q=java&btnSearch=dbSearch
Document Length:        13700 bytes
Concurrency Level:      10
Time taken for tests:   27.16255 seconds
Complete requests:      1000
Failed requests:        131
   (Connect: 0, Length: 131, Exceptions: 0)
Write errors:           0
Total transferred:      13874970 bytes
HTML transferred:       13668970 bytes
Requests per second:    37.01 [#/sec] (mean)
Time per request:       270.163 [ms] (mean)
Time per request:       27.016 [ms] (mean, across all concurrent requests)
Transfer rate:          501.51 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.0      0      1
Processing:    47  269  129.6    252   1006
Waiting:       47  265  128.2    247    840
Total:         47  269  129.6    252   1006
Percentage of the requests served within a certain time (ms)
  50%    252
  66%    318
  75%    357
  80%    377
  90%    430
  95%    491
  98%    595
  99%    647
 100%   1006 (longest request)
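To compare the three runs on more than raw throughput, the failure counts can be folded into the numbers. The sketch below is only an illustration: the class AbResultSummary and its methods are hypothetical, and it treats every request that ab counted under "Failed requests" as a failure, as the summary above does.

```java
/** Hypothetical helper for summarizing the ab runs quoted above. */
class AbResultSummary {

    /** Share of requests that ab counted as failed, in percent. */
    static double failureRatePercent(int failed, int total) {
        return 100.0 * failed / total;
    }

    /** Mean requests per second, counting only the requests that did not fail. */
    static double successfulRequestsPerSecond(double requestsPerSecond, int failed, int total) {
        return requestsPerSecond * (total - failed) / total;
    }

    public static void main(String[] args) {
        // Values copied from the ab output: server, requests per second, failed requests.
        String[] servers = {"Resin", "Jetty", "Tomcat"};
        double[] requestsPerSecond = {18.90, 35.07, 37.01};
        int[] failed = {221, 98, 131};
        int total = 1000;

        for (int i = 0; i < servers.length; i++) {
            System.out.printf("%s: %.1f%% failed, %.2f successful requests/s%n",
                    servers[i],
                    failureRatePercent(failed[i], total),
                    successfulRequestsPerSecond(requestsPerSecond[i], failed[i], total));
        }
    }
}
```

The failure rates work out to 22.1% for Resin, 9.8% for Jetty, and 13.1% for Tomcat.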
DBSight Setup
Caching Facet Search (NarrowBy) Results
A narrowBy search is expensive: it usually takes 4 to 5 times as long as a basic search, especially for searches with a large number of hits.
So this feature lets you
- cache the narrowBy results for the searches with the largest number of hits (call it TopFacetCache)
- cache the most recent narrowBy searches (call it LatestFacetCache)
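The idea behind a cache of the most recent narrowBy searches can be sketched as a small least-recently-used map. This is not DBSight's actual implementation: the class name LatestFacetCacheSketch is hypothetical, and the query string is assumed to be the cache key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a "latest searches" cache: a LinkedHashMap in access
 * order that drops the least recently used entry once it grows past its limit.
 */
class LatestFacetCacheSketch<V> extends LinkedHashMap<String, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    LatestFacetCacheSketch(int maxEntries) {
        // The third argument (true) makes iteration order follow access order,
        // so the eldest entry is always the least recently used one.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Called after each insertion; returning true evicts the eldest entry.
        return size() > maxEntries;
    }
}
```

LinkedHashMap is not thread-safe, so a server would need to wrap such a map (for example with Collections.synchronizedMap) before sharing it between request threads.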
Configure NarrowBy Caching
Here is how you can choose the number of cache entries:
For TopFacetCache, you can choose something like 1000, or more if memory allows.
For LatestFacetCache, the benefit is mostly for users who are paginating through results, so you don't need to set it very high. It could be 10 or 100.
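To make the sizing trade-off concrete, here is a minimal sketch of a cache that keeps results only for the searches with the most hits. It is an assumption about how such a cache could work, not DBSight's code: the class TopFacetCacheSketch, its offer and get methods, and the use of the query string as the key are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: keeps facet results for the queries with the most hits. */
class TopFacetCacheSketch<V> {

    private static final class Entry<T> {
        final long hitCount;
        final T facets;

        Entry(long hitCount, T facets) {
            this.hitCount = hitCount;
            this.facets = facets;
        }
    }

    private final int capacity;
    private final Map<String, Entry<V>> entries = new HashMap<String, Entry<V>>();

    TopFacetCacheSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Offers a result to the cache; returns true if it was stored. */
    synchronized boolean offer(String query, long hitCount, V facets) {
        if (entries.containsKey(query) || entries.size() < capacity) {
            entries.put(query, new Entry<V>(hitCount, facets));
            return true;
        }
        // The cache is full: find the cached query with the fewest hits.
        // A linear scan keeps the sketch short; it costs O(capacity) per offer.
        String smallestKey = null;
        long smallestHits = Long.MAX_VALUE;
        for (Map.Entry<String, Entry<V>> e : entries.entrySet()) {
            if (e.getValue().hitCount < smallestHits) {
                smallestHits = e.getValue().hitCount;
                smallestKey = e.getKey();
            }
        }
        if (hitCount <= smallestHits) {
            return false; // not among the top searches, so it is not cached
        }
        entries.remove(smallestKey);
        entries.put(query, new Entry<V>(hitCount, facets));
        return true;
    }

    /** Returns the cached facets for the query, or null if it is not cached. */
    synchronized V get(String query) {
        Entry<V> e = entries.get(query);
        return e == null ? null : e.facets;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

With a capacity of around 1000, as suggested above, memory use is bounded by the size of 1000 facet results, which is why the limit should follow the memory available.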
