Swish++ Indexer Speed

Posted by JD 06/19/2008 at 07:16

Today, I upgraded my backend search indexer to the latest version of swish++. I’ve been using swish-e and swish++ for YEARS and YEARS. We’re talking about 10 years here. I’ve also used htdig and been mostly happy with it.

Ok, so back to the reason for this entry. The current version of swish++ indexes my entire site in under 30 seconds. The prior version took 10 minutes or so. That is roughly a 20x speedup, a major improvement to say the least. I’m doing local file indexing, not going over HTTP.

According to the Feature List, they have

  • Lightning-fast searching
  • The same mmap(2) technique used for indexing is used again for searching.
  • The generated index file is written to disk such that it can be mmap’ed back into memory and binary searched immediately, with no parsing of the data, also in O(log n) time.
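
To make the mmap-and-binary-search idea concrete, here is a minimal Python sketch. It is not swish++'s actual index format or code: the fixed 16-byte record layout and the names write_index, lookup, and RECORD_SIZE are all made up for illustration. The point is only that a sorted file of fixed-width records can be mapped into memory and searched in O(log n) steps without parsing it first.

```python
import mmap

RECORD_SIZE = 16  # hypothetical fixed record width in bytes, not swish++'s layout


def write_index(path, words):
    """Write the words, sorted, as fixed-width NUL-padded ASCII records."""
    with open(path, "wb") as f:
        for word in sorted(words):
            data = word.encode("ascii")
            if len(data) > RECORD_SIZE:
                raise ValueError("word too long for this toy format")
            f.write(data.ljust(RECORD_SIZE, b"\0"))


def lookup(path, word):
    """Binary search the mmap'ed file; return the record number, or -1."""
    key = word.encode("ascii").ljust(RECORD_SIZE, b"\0")
    with open(path, "rb") as f:
        # Map the whole file read-only; nothing is parsed or copied up front.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            lo, hi = 0, len(mm) // RECORD_SIZE
            while lo < hi:
                mid = (lo + hi) // 2
                record = mm[mid * RECORD_SIZE:(mid + 1) * RECORD_SIZE]
                if record == key:
                    return mid
                if record < key:
                    lo = mid + 1
                else:
                    hi = mid
    return -1
```

Calling write_index("words.idx", ["swish", "index", "mmap"]) and then lookup("words.idx", "mmap") touches only the handful of records the search actually visits, which is why this style of lookup stays fast as the index grows.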

Fantastic!

I had to change some of my search command options – forcing the results separator to be a TAB character ‘\t’. At first, I searched for a config file setting – no joy. Then I checked the command line options and didn’t see any. Into the code. It is C++ … soon, I was modifying the ResultSeparator.h file and changing the default " " to "\t". Recompile, install. Update the CGI script that is the front-end to the program so the split() uses a tab. Done. Reindex the entire site and test. Working. Done. 20 minutes of effort.
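
For anyone wondering what the front-end side of that change looks like, here is a rough Python sketch of splitting tab-separated result lines. It is not my actual CGI script. The field order (rank, path, size, title), the "#" comment lines, and the names Result and parse_results are assumptions for illustration, so check them against the output of your own build.

```python
from typing import NamedTuple


class Result(NamedTuple):
    rank: int
    path: str
    size: int
    title: str


def parse_results(output):
    """Parse tab-separated result lines, skipping blank and '#' lines."""
    results = []
    for line in output.splitlines():
        if not line or line.startswith("#"):
            continue
        # Assumed field order: rank, path, size, title.
        # Limit the split so a stray tab in the title stays in the title.
        rank, path, size, title = line.split("\t", 3)
        results.append(Result(int(rank), path, int(size), title))
    return results
```

The nice thing about a tab separator is that paths and titles containing spaces survive the split intact, which is not the case when the separator is a single space.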

F/LOSS Rocks. Just try to do that with closed source software. You can’t. If you are trying to run a company, how much is that worth to you? Do you really want to be held hostage by proprietary, closed source software? I’ve seen trivial changes to code estimated at over $100K by software vendors AND I’ve seen large companies pay it. Seems like blackmail to me. Then add in the 15%/yr maintenance charges to keep updates coming. Nice work if you can get it.
