From DBSight Full-Text Search Engine/Platform Wiki
What DBSight offers
DBSight offers three things: a Rapid Application Developing Tool, a Scalable Searching Server, and Maintenance.
Rapid Application Developing Tool
You define the SQL that extracts the content, choose some scaffolds, and get a working search. Super cool, isn't it?
Scalable Searching Server
Don't assume that DBSight is inefficient just because it is built for rapid prototyping. On the contrary, it is very efficient and well tested on production-level servers. Many benchmarks show only plain "search" performance, but that is actually the simplest use case. Facet search, especially multi-valued facet search, is very efficient on DBSight. (If the fancy name "multi-valued facet" is unfamiliar, just think of tags.)
Maintenance
Many open source projects are happy to call it a success once you have one setup running after jumping through many hoops. But they may forget one thing: maintenance! In reality, the database schema evolves, the users' search requirements change, the index files can get corrupted, and you have development, testing, staging, and production environments. How would you quickly set up the server, re-index the content, index incrementally, adjust the index structure, or migrate from one instance to another? It's hard, but DBSight puts all of these features under your control.
DBSight is sold at a fixed price, and since it comes directly from the developers, the price is much lower.
Compared to Open Source alternatives
There is no open source software similar to DBSight. Period. Of course, you can hire several consultants to build it, writing lots of code to re-invent the wheel, and believe me, that is much more expensive. If you plan to do it yourselves, I would suggest spending your time on more valuable work, like how to position your product, and letting DBSight be your tool. You can safely use the free version of DBSight until you have a lot of users. Some of our customers did exactly that.
Also remember, it's not a one-time job! Bills from consulting companies will keep coming, because there is always a new bug to fix for you. And you need to keep in-house experts for the "free" open source code.
And just because your developers want to learn new stuff themselves, you don't have to pay the cost of re-inventing the wheel!
DBSight resolves bugs quickly
Some companies take an existing open source project as their own and sell services around it, but they leave customers to figure out the bugs themselves, or sell them more services. Can you really call that as cheap as free? DBSight does not play this kind of trick. With one year of free service, DBSight tries very hard to remove bugs as soon as possible and to make the software as easy to use as possible, which reduces service calls. Most bugs are resolved within one day, and the UI guides you smoothly to your goal.
DBSight is a turn-key solution. It fixes the bugs and hides the complexity of upgrades, maintenance, and error handling. You get peace of mind!
Compared to Commercial alternatives
Let me know if you see an alternative with similar features! All of them are really overpriced and inflexible to use.
DBSight vs. Lucene
DBSight is based on Lucene, so it inherits all of Lucene's advantages.
Yet DBSight is bigger than Lucene. Actually, DBSight is more like Nutch, since DBSight provides a crawler and a UI to search and render results.
DBSight vs. Nutch
Since DBSight works on structured data from databases, it crawls databases rather than web pages, as Nutch or Google do. Nutch is trying to learn from Google. See more comparisons in DBSight vs. Google.
DBSight vs. Solr, Hibernate Search, etc
DBSight comes from real experience instead of what looks good on paper. Solr and ORM-bundled search such as Hibernate Search are actually too tightly coupled with a real database or web site.
DBSight can index in batches with high efficiency. You can compare the indexing performance yourself, provided the alternatives do not simply die under the amount of data.
DBSight also innovates as fast as it can. Its internal indexing and search have been upgraded through several generations and beat all competitors.
|Feature||DBSight||Solr||Hibernate Search, Compass|
|Need changes to existing application||No||Yes||Yes|
|Search as a service||Yes||Yes||No|
|Development cycle||3 minutes||Days, weeks, months||Days, weeks, months|
|Maintenance cost||No||High, if a glitch happens||High, if a glitch happens|
|Change structure on-the-fly||Yes||No||No|
|End user query||Yes||Yes||No|
|Flexible UI scaffolding||Yes||No||No|
|Time-based ranking||Yes||No||No|
|Content access control||Yes||No||No|
|Java coding needed||No||Yes||Yes|
A list of questions you can ask Solr supporters
- Why are Solr consulting services needed? How much is the annual service fee?
- How do you move Solr configuration changes from the development to the staging to the production environment? What do you do with the existing data?
- What do you do if Solr has an index file error?
- What is the performance in distributed mode?
- What is the performance of multi-valued facet search? This is common in uses like tagging, where each record can have several values and each value acts as a facet.
- What is the process when Solr needs to be restarted? Do you also need to stop the process that is submitting content to Solr?
- How do you rank the latest results higher?
A list of questions you can ask Hibernate/Compass supporters
- When Lucene needs to merge index files, will the application suffer from the CPU, memory, or disk usage?
- What do you do if there is an index file error? How do you rebuild the index?
- What is the performance of batch indexing, if there is such a process at all?
- Does it support facet search? Multi-valued facet search?
- How flexible is it? When you add a feature, is restarting the server enough, or do you need to rebuild the whole index?
- How do you rank the latest results higher?
DBSight makes it more flexible to create and, more importantly, to manage search as a service. You can keep adding features without affecting the existing application.
Because features are added incrementally, it is also easier to scale.
Here are some example features not found in other approaches.
- Efficient multi-valued facet search. DBSight has the most memory-efficient and best-performing multi-valued facet search. It uses a unique data structure to store and search the tags efficiently in memory. Tagging is the most widely used multi-valued facet. Simply put, DBSight does tagging well.
- Sum or average for facet search, just like SQL's sum()/avg() for each "group by".
- Time-based ranking. DBSight can adjust the ranking of documents based on time, and does it efficiently.
- Easy-to-package, reusable solutions. Business systems are often quite similar. DBSight can package a solution, including index configuration, result rendering, and even scheduling, into a reusable zip file. You can use it in development, testing, staging, and production environments, or apply it to different customers' environments with minor adjustments.
- Easy support for multiple indexes. They can be totally independent indexes, or indexes with partitioned data.
- Much easier to manage. Things happen. You may have a corrupted index, a failed network, or a JVM that runs out of memory. If you have any downtime, DBSight can always safely re-create all the data. If you rely on content submission, you will have to take the pain of re-submitting the content. When time is tight to bring the service back, you will want the time-tested DBSight crawler instead of manually written scripts.
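To make the first two items in the list above concrete, here is a minimal Python sketch of multi-valued facet counting with a per-facet sum and average. It is not DBSight's actual data structure, which is not described here; the sample documents, the `tags` and `price` fields, and the `facet_stats` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample documents: "tags" is a multi-valued facet, "price" a numeric field.
DOCS = [
    {"id": 1, "tags": ["java", "search"], "price": 10.0},
    {"id": 2, "tags": ["search", "lucene"], "price": 30.0},
    {"id": 3, "tags": ["java"], "price": 20.0},
]


def facet_stats(docs, facet_field, value_field):
    """Return {facet value: (count, sum, average)} over the given documents."""
    counts = defaultdict(int)
    sums = defaultdict(float)
    for doc in docs:
        # A document contributes once to every distinct value it carries.
        for value in set(doc[facet_field]):
            counts[value] += 1
            sums[value] += doc[value_field]
    return {v: (counts[v], sums[v], sums[v] / counts[v]) for v in counts}


stats = facet_stats(DOCS, "tags", "price")
```

Because one document can carry several tags, the per-tag counts can add up to more than the number of documents, which is the main difference from a single-valued "group by".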
Loosely coupled vs. Tightly coupled
A real system doesn't always have search, or need to have search, on the first day.
DBSight comes with its own flexible, customizable crawler. If DBSight is down for any reason, your system is mostly unaffected. If anything goes wrong, you can simply re-create the index with one click from the dashboard.
Solr and ORM-bundled Lucene solutions look good on paper. But to pump data into an index, you need to have the system set up well from the beginning, and you cannot make any mistakes. If there is any glitch in the main system, for example some content fails to reach Solr or a disk error occurs, the index will be incorrect, and it is hard to correct.
Adjustable Index Structure vs. Fixed Index Structure
As you may know, to speed up different SQL queries, you may need several database indexes on the same table. It is the same for Lucene indexes: you will need different Lucene indexes for different search purposes.
With DBSight's own flexible, customizable crawler, you can create different index structures for the same data, for better search performance. You can even create one smaller index for quicker updates, and one larger index for large batch updates on the same data.
With the alternatives, it is simply not easy to adjust the index structure when you want to support a new kind of search.
Separated Indexing and Searching vs. One big machine for Indexing and Searching
Indexing and Searching are often both CPU-intensive and Disk-intensive processes. For large sites, it's not realistic to put them on one computer.
DBSight can put the indexing process on the backend machine(s), and let (several) searching servers handle the search requests.
Solr and ORM-bundled search usually index and search on the same computer. ORM-bundled search is even worse: it usually sits on your application server, competing for resources with your applications. Everything looks good when the traffic is light.
Faster Search on new content
DBSight also supports content submission via HTTP. One difference is that the new content is kept in memory, so it is searchable very quickly. When the background crawl finishes, DBSight updates the index on disk and discards the "new content" submitted via HTTP, since it is no longer new and would only duplicate documents already in the new index on disk.
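The life cycle described above can be sketched roughly as follows. This is a simplified Python illustration, not DBSight's implementation: the `OverlaySearch` class and its method names are hypothetical, and plain dictionaries stand in for the on-disk index and the in-memory buffer.

```python
class OverlaySearch:
    """Toy model: a crawled index plus an in-memory buffer of submitted documents."""

    def __init__(self):
        self.disk_index = {}     # doc id -> text, rebuilt by the background crawl
        self.memory_buffer = {}  # doc id -> text, submitted since the last crawl

    def submit(self, doc_id, text):
        # Submitted content is searchable immediately, without touching the disk index.
        self.memory_buffer[doc_id] = text

    def finish_crawl(self, crawled_docs):
        # The crawl has picked up everything in the database, so the buffered
        # copies are now duplicates and can be discarded.
        self.disk_index = dict(crawled_docs)
        self.memory_buffer.clear()

    def search(self, term):
        # Buffered documents override older crawled versions with the same id.
        merged = {**self.disk_index, **self.memory_buffer}
        needle = term.lower()
        return sorted(doc_id for doc_id, text in merged.items() if needle in text.lower())
```

A search always consults both stores, so a freshly submitted document shows up right away and survives the switch to the newly crawled index.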
DBSight vs. Google
Search your own data!
Because you can search data that Google doesn't have: your own database!
Powerful templates for search results
Rather than simply displaying search results by relevance, you can display them with the templates we provide or with ones you customize.
Some features are:
- sort results on fields you choose
- narrow results down by category, with the number of matches shown for each category
- subscribe to the search results by RSS
- and more ...
Your own ranking algorithm!
Rather than relying on some unknown algorithms, you can change the ranking by SQL update statements! Can it be any simpler?
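As a rough illustration of the idea, the Python sketch below uses SQLite with a hypothetical `rank_boost` column that the extraction SQL would select alongside the content. How DBSight maps such a column into its scoring is not described here, so the final `ORDER BY` only stands in for the effect on ranking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: rank_boost is an ordinary column maintained by the site owner.
conn.execute(
    "CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, rank_boost REAL DEFAULT 1.0)"
)
conn.executemany(
    "INSERT INTO article (id, title) VALUES (?, ?)",
    [(1, "Install guide"), (2, "Release notes"), (3, "FAQ")],
)

# Promote one document with a plain SQL UPDATE; the next indexing run would see the new value.
conn.execute("UPDATE article SET rank_boost = 5.0 WHERE id = 3")

# Stand-in for ranking: higher boost first, ties broken by id.
rows = conn.execute(
    "SELECT id, title, rank_boost FROM article ORDER BY rank_boost DESC, id"
).fetchall()
```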
More Efficient and faster updates
With Google, your content in Google's index may be updated only once a month, or even less often!
Why? Because Google needs to go through your entire website to find out which pages have been updated. Google's aggressive web crawler consumes a lot of resources, not only Google's but also YOURS! And if a link is too deep, the page may never be updated.
By contrast, with DBSight you can easily find the updated content with a simple SQL query. That's because if you have a database index on the modified date column, the content is already organized by time!
You can even schedule the update check to run every few minutes! In one use case, we have an intranet database that holds 20 GB of data, with 50 documents updated per minute. We chose to run the SQL update check every 4 minutes, so updates become searchable around 4 to 8 minutes after they are saved to the database.
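A minimal sketch of such an update check, in Python with SQLite. The `document` table, the `modified` column (a plain integer timestamp here), and the `fetch_updated_since` helper are assumptions for illustration, not DBSight's configuration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, body TEXT, modified INTEGER)")
# The database index on the modified column is what keeps the check cheap.
conn.execute("CREATE INDEX idx_document_modified ON document (modified)")
conn.executemany(
    "INSERT INTO document VALUES (?, ?, ?)",
    [(1, "first", 100), (2, "second", 250), (3, "third", 400)],
)


def fetch_updated_since(connection, last_check):
    """Return rows modified after the previous check, oldest first."""
    cursor = connection.execute(
        "SELECT id, body, modified FROM document WHERE modified > ? ORDER BY modified",
        (last_check,),
    )
    return cursor.fetchall()


changed = fetch_updated_since(conn, 240)
# The next scheduled run starts from the newest timestamp seen in this run.
new_checkpoint = max(row[2] for row in changed)
```

Only the changed rows are read on each run, so the cost of a check depends on how much changed, not on the total size of the table.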
Can Google's aggressive yet slow web crawler beat that?
DBSight vs. MySQL, or other database
Directly quoted from a customer:
Either way what we tested is 10000000% better then the MSSQL fulltext we have now. Results are FAST and thorough with dbsight.
Lucene is at least 10 times faster than MySQL's full-text search.
MySQL's performance degrades when data size increases.
With gigabytes of data, DBSight can still achieve sub-second speed.
Here is just one example, which is not specially tuned in any way. On our test box, a Pentium III 450 MHz with 256 MB RAM and a single 1.6 GB index, it took 0.31 seconds to find 80,000 matching documents.
Easier! No schema change!
You don't need to change your existing schema.
MySQL, or any other database with built-in full-text search, requires you to add special columns. Each database has a different way to do it. It affects the way you add, update, and delete content.
It's quite a hassle. You also need to support multiple languages, multiple document types, ...
You are not limiting all jobs to one big expensive computer. In the use case mentioned above, the 20 GB of data sits on a high-end computer, yet DBSight is installed on a Pentium III 450 MHz with 256 MB RAM. That is not a powerful machine at all, but it works well!
And you don't need additional licenses from vendors like Oracle.
And much more!
- Search on CLOB/BLOB
- Support for multiple languages, including double-byte character encodings such as Chinese, Japanese, and Korean.
- Search Result Highlights
- Database Vendor Independent
- Results ranking
- Templates save you from creating everything by yourself
- No need to change your code
- more ...
Off topic: Are open source solutions really better?
Hadoop is much slower than a normal database
Here is a study comparing the performance of Hadoop and parallel DBMSs.
... we've spent tuning Hadoop has now exceeded the time we spent on either of the other systems...
At best, Hadoop is 2 times slower than the other DBMSs, and it is 36 times and 21 times slower on some other tests.
So, be careful with consulting services from open source experts.
Summary of the study:
Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S.R., and Stonebraker, M. A comparison of approaches to large-scale data analysis. In Proceedings of the 35th SIGMOD International Conference on Management of Data. ACM Press, New York, 2009, 165–178.