Thinking Sphinx Delta Changes
There have been quite a few changes under the hood of Thinking Sphinx lately, and some of the more recent commits are pretty useful.
First off, something neat but minor - you can now use
timestamp columns as attributes - the plugin automatically maps
datetime types as needed.
There’s also now a cucumber-driven set of feature tests, which can run on MySQL and PostgreSQL. While that’s not important to most users, it makes it much less likely that I’ll break things. It’s also useful for the numerous contributors - just over 50 people as of this week! You all rock!
New Delta Possibilities
The major changes are around delta indexing, though. As well as the
default delta column approach, there are now two other methods of getting
your changes into Sphinx. The first, requested by some Ultrasphinx
users, and heavily influenced by a fork by Ed
is datetime-driven deltas. You can use a
datetime column (the default
updated_at), and then run the
task on a regular basis to load recent changes into Sphinx.
Your define_index block would look something like the following:
    define_index do
      # ... field and attribute definitions
      set_property :delta => :datetime, :threshold => 1.day
    end
If you want to use a column other than
updated_at, set it with the
The example above assumes you’re running the rake task once a day. The more often you run it, the lower you can set your threshold. This is a bit different to the normal delta approach, as changes will not appear in search results straight away - they only show up once the rake task has run.
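To make the threshold idea a bit more concrete, here's a rough plain-Ruby sketch of which records a datetime-driven delta run would pick up. This is not the plugin's actual code: `Record`, `records_for_delta` and the `DAY`/`HOUR` constants are made up for illustration, and I'm assuming the cutoff is simply "now minus the threshold".

```ruby
# Hypothetical stand-in for a model row; only updated_at matters here.
Record = Struct.new(:id, :updated_at)

DAY  = 24 * 60 * 60  # seconds, standing in for 1.day
HOUR = 60 * 60       # seconds, standing in for 1.hour

# Returns the records changed within `threshold` seconds before `now`.
# Assumption: the delta run selects on the datetime column this way.
def records_for_delta(records, threshold, now)
  cutoff = now - threshold
  records.select { |record| record.updated_at > cutoff }
end

now = Time.utc(2020, 1, 1, 12, 0, 0)  # arbitrary fixed time for the example
records = [
  Record.new(1, now - 30 * 60),   # changed 30 minutes ago
  Record.new(2, now - 5 * HOUR),  # changed 5 hours ago
  Record.new(3, now - 3 * DAY)    # changed 3 days ago
]

# Task run daily with a one-day threshold, then hourly with a one-hour one.
puts records_for_delta(records, DAY, now).map(&:id).inspect
puts records_for_delta(records, HOUR, now).map(&:id).inspect
```

If the threshold were shorter than the gap between runs, a record changed early in that gap would fall outside the cutoff and get missed, which is why the two settings should move together.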
One of the biggest complaints with the default delta structure is that
it didn’t scale. Your delta index got larger and larger every time
records were updated, and that meant each change got slower and slower,
because the indexing time increased. When running multiple servers, you
could get a few
indexer processes running at once. That ain’t good.
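As a back-of-the-envelope illustration of that growth (a toy model, not a measurement), the sketch below counts how many records the delta index holds after each update, assuming nothing clears the delta flags in between. `delta_sizes` is a made-up helper.

```ruby
# For each successive update, returns how many records are flagged as
# changed, and so how many the delta index has to process on that update.
# Assumption: flags are never cleared between updates in this toy model.
def delta_sizes(updated_ids)
  flagged = {}
  updated_ids.map do |id|
    flagged[id] = true  # the record is marked as changed...
    flagged.size        # ...and every flagged record is indexed again
  end
end

sizes = delta_sizes([1, 2, 3, 2, 4])
puts sizes.inspect
puts "total records indexed: #{sizes.inject(0, :+)}"
```

Each update costs at least as much as the one before it, so the total work grows much faster than the number of changes.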
So now, we have delayed deltas, using the
delayed_job plugin. You’ll need
to have the job queue being processed (via the
thinking_sphinx:delayed_delta rake task), but everything is pushed off
into that, instead of overloading your web server. It means the changes
take slightly longer to get into Sphinx, but that’s almost certainly not
going to be a problem.
Firstly, you’ll need to create the
delayed_jobs table (see the
delayed_job readme for example code), and then change your
define_index block so it looks something like this:
    define_index do
      # ... field and attribute definitions
      set_property :delta => :delayed
    end
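If it helps to picture what "pushed off into the job queue" means, here's a small self-contained sketch. It is not how Thinking Sphinx or delayed_job are implemented: `DeltaJob` and `JobQueue` are hypothetical, the queue lives in memory rather than in the delayed_jobs table, and skipping duplicate jobs for the same index is an assumption made to show how a queue can avoid piling up indexer runs.

```ruby
# Hypothetical job: an object with a #perform method that does the work.
class DeltaJob
  attr_reader :index_name

  def initialize(index_name, log)
    @index_name = index_name
    @log = log
  end

  # Stand-in for running the indexer on one delta index.
  def perform
    @log << "indexed #{@index_name}"
  end
end

# In-memory stand-in for the delayed_jobs table (an assumption).
class JobQueue
  def initialize
    @jobs = []
  end

  # Called from the web request: cheap, no indexing happens here.
  # Returns false if a job for the same index is already waiting.
  def enqueue(job)
    return false if @jobs.any? { |queued| queued.index_name == job.index_name }
    @jobs << job
    true
  end

  def size
    @jobs.size
  end

  # Called by the worker process: runs the waiting jobs one at a time.
  def work_off
    @jobs.shift.perform until @jobs.empty?
  end
end

log = []
queue = JobQueue.new
queue.enqueue(DeltaJob.new("article_delta", log))
queue.enqueue(DeltaJob.new("article_delta", log))  # skipped, already queued
queue.work_off
puts log.inspect
```

The point is the split: saving a record only adds a row to the queue, and a single worker does the indexing one job at a time.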
As part of the restructuring over the last couple of months, I’ve also added some additional code to Riddle, my Ruby API for Sphinx. It now has objects to represent all of the configuration elements of Sphinx (i.e. settings for sources, indexes, indexer and searchd), and can generate the configuration file for you. This means you don’t need to worry about doing text manipulation; you can do everything in neat, clean Ruby instead.
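To give a feel for the objects-instead-of-text idea, here's a tiny stand-in written in plain Ruby. To be clear, this is not Riddle's API: `ConfigSection` and `render_config` are invented for this sketch, and the setting values are only examples.

```ruby
# Hypothetical stand-in, not one of Riddle's classes: a configuration
# section whose settings render to Sphinx's block syntax.
class ConfigSection
  def initialize(type, name = nil)
    @type = type
    @name = name
    @settings = {}
  end

  # Stores a setting and returns self so calls can be chained.
  def set(key, value)
    @settings[key] = value
    self
  end

  # Renders "type name", then the settings wrapped in braces.
  def render
    header = [@type, @name].compact.join(" ")
    body = @settings.map { |key, value| "  #{key} = #{value}" }
    ([header, "{"] + body + ["}"]).join("\n")
  end
end

# Joins the rendered sections into one configuration file's worth of text.
def render_config(sections)
  sections.map(&:render).join("\n\n") + "\n"
end

source  = ConfigSection.new("source", "article_core").set("type", "mysql")
index   = ConfigSection.new("index", "article_core").set("source", "article_core")
searchd = ConfigSection.new("searchd").set("port", 3312)

puts render_config([source, index, searchd])
```

Keeping the settings in objects means the rest of the code can change a value in one place and re-render, rather than editing a string.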
Documentation on this is non-existent, mind you, but the source shouldn’t be too hard to grok. I also need to update Thinking Sphinx’s documentation to cover the delta changes - for now, this blog post will have to do. If you get stuck, check out the Google Group.