Swish++ Indexer Speed

Posted by JD 06/19/2008 at 07:16

Today, I upgraded my backend search indexer to the latest version of swish++. I’ve been using swish-e and swish++ for YEARS and YEARS. We’re talking about 10 years here. I’ve also used htdig and been mostly happy with it.

Ok, so back to the reason for this entry. The current version of swish++ indexes my entire site in under 30 seconds. The prior version took 10 minutes or so. That is roughly 20 times faster, a major speed improvement to say the least. I’m doing local file indexing, not going over HTTP.

According to the Feature List, swish++ offers:

Lightning-fast searching
The same mmap(2) technique used for indexing is used again for searching. The generated index file is written to disk such that it can be mmap’ed back into memory and binary searched immediately, with no parsing of the data, also in O(log n) time.
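
To make that idea concrete, here is a rough sketch in Python of binary searching a memory-mapped file. It is NOT the swish++ index format or code. The fixed 16-byte record layout and the names write_index and lookup are made up for illustration, and it assumes ASCII words of at most 16 bytes and a non-empty index file.

```python
import mmap

RECORD_SIZE = 16  # hypothetical fixed-width record, NOT the real swish++ layout


def write_index(path, words):
    """Write the words as sorted, fixed-width, NUL-padded records."""
    records = sorted({w.encode("ascii").ljust(RECORD_SIZE, b"\0") for w in words})
    with open(path, "wb") as f:
        for record in records:
            f.write(record)


def lookup(path, word):
    """Binary search the memory-mapped file; return the record number or -1."""
    key = word.encode("ascii").ljust(RECORD_SIZE, b"\0")
    with open(path, "rb") as f:
        # Map the file read-only; nothing is parsed or copied up front.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            lo, hi = 0, len(m) // RECORD_SIZE
            while lo < hi:
                mid = (lo + hi) // 2
                record = m[mid * RECORD_SIZE:(mid + 1) * RECORD_SIZE]
                if record == key:
                    return mid
                if record < key:
                    lo = mid + 1
                else:
                    hi = mid
    return -1
```

Because the records are already sorted on disk, a lookup only touches about log2(n) of them, and the operating system pages in just the parts of the file that get read.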

Fantastic!

I had to change some of my search command options – forcing the results separator to be a TAB character ("\t"). At first, I searched for a config file setting – no joy. Then I checked the command line options and didn’t see any. Into the code. It is C++ … soon, I was modifying the ResultSeparator.h file and replacing the default " " with "\t". Recompile, install. Update the CGI script that is the front-end to the program so the split() uses a tab. Done. Reindex the entire site and test. Working. Done. 20 minutes of effort.
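
For anyone wondering why the tab matters: file paths and page titles can contain spaces, so splitting on a space is fragile, while splitting on a tab is not. Below is a small sketch of the front-end side in Python. It is not my actual CGI script. The field order (rank, path, size, title), the "#" comment lines, and the sample data are all assumptions for illustration, and parse_results is a made-up name.

```python
def parse_results(output):
    """Split tab-separated result lines into dicts; skip blank and '#' lines."""
    results = []
    for line in output.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Limit the split so a stray tab inside the title stays in the title.
        rank, path, size, title = line.split("\t", 3)
        results.append(
            {"rank": int(rank), "path": path, "size": int(size), "title": title}
        )
    return results


# Made-up sample output; note the space inside the first path.
sample = (
    "# results: 2\n"
    "98\t/var/www/my notes/a.html\t1024\tSwish++ Indexer Speed\n"
    "12\t/var/www/b.html\t2048\tOther Page\n"
)
for hit in parse_results(sample):
    print(hit["rank"], hit["path"], hit["title"])
```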

F/LOSS Rocks. Just try to do that with closed source software. You can’t. If you are trying to run a company, how much is that worth to you? Do you really want to be held hostage by proprietary, closed source software? I’ve seen trivial changes to code estimated at over $100K by software vendors AND I’ve seen large companies pay it. Seems like blackmail to me. Then add in the 15%/yr maintenance charges to keep updates coming. Nice work if you can get it.

Posted in Computers
Tags linux

JDPFu.com 2025
