Thinking Sphinx Delta Changes

There have been quite a few changes under the hood of Thinking Sphinx lately, and some of the more recent commits are pretty useful.

Small Stuff

First off, something neat but minor - you can now use decimal, date and timestamp columns as attributes - the plugin automatically maps those to float and datetime types as needed.
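As a rough illustration of that mapping, here is a minimal sketch. It is not the plugin's actual internals, and the constant and method names are made up for the example:

```ruby
# Hypothetical lookup table: database column type => Sphinx attribute type.
# Only the decimal/date/timestamp rows come from the post; the names
# SPHINX_ATTRIBUTE_TYPES and sphinx_type_for are invented for this sketch.
SPHINX_ATTRIBUTE_TYPES = {
  :decimal   => :float,
  :date      => :datetime,
  :timestamp => :datetime
}.freeze

# Returns the Sphinx attribute type for a column type, or the column
# type itself when no conversion is listed.
def sphinx_type_for(column_type)
  SPHINX_ATTRIBUTE_TYPES.fetch(column_type, column_type)
end

puts sphinx_type_for(:decimal)
puts sphinx_type_for(:timestamp)
```

Types that are not in the table pass through untouched, which is an assumption of the sketch rather than a statement about how the plugin treats them.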

There’s also now a cucumber-driven set of feature tests, which can run on MySQL and PostgreSQL. While that’s not important to most users, it makes it much less likely that I’ll break things. It’s also useful for the numerous contributors - just over 50 people as of this week! You all rock!

New Delta Possibilities

The major changes are around delta indexing, though. As well as the default delta column approach, there are now two other methods of getting your changes into Sphinx. The first, requested by some Ultrasphinx users and heavily influenced by a fork by Ed Hickey, is datetime-driven deltas. You can use a datetime column (the default is updated_at), and then run the thinking_sphinx:index:delta rake task on a regular basis to load recent changes into Sphinx.

Your define_index block would look something like the following:

define_index do
  # ... field and attribute definitions

  set_property :delta => :datetime, :threshold => 1.day
end

If you want to use a column other than updated_at, set it with the :delta_column option.

The example above assumes you’re running the rake task once a day. The more often you run it, the lower you can set your threshold. This is a bit different to the normal delta approach, as changes will not appear in search results straight away - only after the rake task has next run.
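To make the role of the threshold concrete, here is a small sketch of the selection rule as I'd describe it: a record counts as a delta when its timestamp falls inside the threshold window. Record and in_delta? are hypothetical names for this example, and the real work happens in the SQL the plugin generates, not in Ruby like this:

```ruby
# Sketch of the selection rule behind datetime deltas: a record belongs in
# the delta index when its timestamp falls within the threshold window.
# Record and in_delta? are hypothetical names used only for this example.
Record = Struct.new(:id, :updated_at)

def in_delta?(record, threshold_in_seconds, now = Time.now)
  record.updated_at > now - threshold_in_seconds
end

now     = Time.now
one_day = 24 * 60 * 60

records = [
  Record.new(1, now - 2 * 60 * 60),  # changed two hours ago
  Record.new(2, now - 3 * one_day)   # changed three days ago
]

delta_ids = records.select { |r| in_delta?(r, one_day, now) }.map { |r| r.id }
puts delta_ids.inspect
```

Seen this way, the threshold presumably wants to be at least as long as the gap between rake runs. Otherwise a record changed just after one run could fall outside the window by the time the next run happens.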

Delayed Reaction

One of the biggest complaints with the default delta structure is that it didn’t scale. Your delta index got larger and larger every time records were updated, and that meant each change got slower and slower, because the indexing time increased. When running multiple servers, you could get a few indexer processes running at once. That ain’t good.

So now, we have delayed deltas, using the delayed_job plugin. You’ll need to have the job queue being processed (via the thinking_sphinx:delayed_delta rake task), but everything is pushed off into that, instead of overloading your web server. It means the changes take slightly longer to get into Sphinx, but that’s almost certainly not going to be a problem.

Firstly, you’ll need to create the delayed_jobs table (see the delayed_job readme for example code), and then change your define_index block so it looks something like this:

define_index do
  # ... field and attribute definitions

  set_property :delta => :delayed
end
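The idea behind delayed deltas can be sketched with a toy queue: the web process only records that an index needs attention, and a separate worker does the slow part. DeltaQueue is a hypothetical class for illustration only. It is not delayed_job's API, and skipping duplicate jobs is a choice made for this sketch:

```ruby
# A toy job queue illustrating the delayed approach. DeltaQueue is a
# hypothetical class for this sketch; it is not delayed_job's API.
class DeltaQueue
  attr_reader :processed

  def initialize
    @pending   = []  # index names waiting to be reindexed
    @processed = []  # index names the worker has handled, in order
  end

  # Called from the web process: cheap, just records that work is needed.
  # Skipping duplicates is a choice made for this sketch.
  def enqueue(index_name)
    @pending << index_name unless @pending.include?(index_name)
  end

  def pending_count
    @pending.size
  end

  # Called from a separate worker process: does the slow work.
  def work_off
    until @pending.empty?
      index_name = @pending.shift
      # A real worker would run Sphinx's indexer here.
      @processed << index_name
    end
  end
end

queue = DeltaQueue.new
3.times { queue.enqueue("article_delta") }  # three saves in quick succession
queue.enqueue("user_delta")
puts queue.pending_count
queue.work_off
puts queue.processed.inspect
```

Because the indexing happens in the worker, several web servers can all enqueue work without each spawning its own indexer process.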

Riddle Update

As part of the restructuring over the last couple of months, I’ve also added some additional code to Riddle, my Ruby API for Sphinx. It now has objects to represent all of the configuration elements of Sphinx (i.e. settings for sources, indexes, indexer and searchd), and can generate the configuration file for you. This means you don’t need to worry about doing text manipulation - you can do everything in neat, clean Ruby.
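To give a feel for the approach without leaning on Riddle's real classes (whose names and methods may well differ), here is a self-contained sketch. ConfigSection is a hypothetical stand-in, and the setting values are purely illustrative:

```ruby
# Hypothetical stand-in for a configuration section; Riddle's real classes
# and method names may differ. A section has a type (e.g. "searchd"), an
# optional name, and a hash of settings.
class ConfigSection
  def initialize(type, name = nil, settings = {})
    @type     = type
    @name     = name
    @settings = settings
  end

  # Renders the section in Sphinx's "type name { key = value }" layout.
  def render
    header = [@type, @name].compact.join(" ")
    lines  = @settings.map { |key, value| "  #{key} = #{value}" }
    ([header, "{"] + lines + ["}"]).join("\n")
  end
end

sections = [
  ConfigSection.new("indexer", nil, "mem_limit" => "64M"),
  ConfigSection.new("searchd", nil, "port" => 3312)
]

puts sections.map { |section| section.render }.join("\n\n")
```

The point is simply that settings live in Ruby objects and the text file is an output, so nothing has to parse or patch the configuration by hand.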

Documentation on this is non-existent, mind you, but the source shouldn’t be too hard to grok. I also need to update Thinking Sphinx’s documentation to cover the delta changes - for now, this blog post will have to do. If you get stuck, check out the Google Group.

Sphinx 0.9.9

One more thing: Thinking Sphinx and Riddle now both have Sphinx 0.9.9 branches - not merged into master, as most people are still using Sphinx 0.9.8, but you can find both code sets on GitHub.


Patrick Veverka left a comment on 6 Jan, 2009:

Wow, thanks for this great update. Is there anything different in the new version of Sphinx or Thinking Sphinx that would cause issues with wildcard searches? We’re using an older version of the plugin on one site and it does wildcard searches perfectly fine (i.e. searching for “P” returns “Patrick”, “Parker”, etc.) but when the latest version is installed on a brand new MacBook Pro, it returns no results.

Paul Smith left a comment on 6 Jan, 2009:

Awesome! I just emailed you about this a couple weeks ago and you’ve already added these changes. Thanks so much for all your hard work and an awesome plugin!

pat left a comment on 7 Jan, 2009:

Paul: No problems, glad to know it helps make life easier :)

Patrick: There shouldn’t be anything in Thinking Sphinx that’s caused that change (although there has been a fair bit of work done, so I can’t say for sure). If you’re running Sphinx 0.9.9, I really have no idea… were you using allow_star or enable_star? Perhaps let’s continue this discussion on the google group

Nikolay Kolev left a comment on 9 Jan, 2009:

Are you gonna implement some of the missing features compared to Ultrasphinx anytime soon?

pat left a comment on 10 Jan, 2009:

Hi Nikolay

The only one that I know I’m missing is facets - which I’ve been researching a bit lately. Is there anything else I’ve missed?

Roman Heinrich left a comment on 12 Jan, 2009:

Yeah, faceted search would be VERY cool! I’m using thinking sphinx and the only thing I miss from SOLR is faceted search… Very handy indeed. Is this a hard thing to implement? Maybe the first results are just a couple of hours coding.

Thanks for this amazing plugin!

pat left a comment on 12 Jan, 2009:

Hi Roman

Some people have put together quick solutions - if you search on the google group you should find some references to them. I still don’t feel I understand facets as a concept just yet - although getting close. Want to make sure the solution is solid for TS.

Romain left a comment on 19 Jan, 2009:

Hi,

I was wondering what are the main differences of ThinkingSphinx plugin over UltraSphinx and the other ruby/rails interface to Sphinx ? I am trying to establish which to use for my app.

Any hints and tips welcome !

pat left a comment on 20 Jan, 2009:

Hi Romain

While faceted search is mentioned in above comments, I’ve since added that to Thinking Sphinx. I’m not sure if UltraSphinx supports excerpts, but that’s the main Sphinx feature Thinking Sphinx doesn’t yet support (although there’s a fork or two out there that does, and I do plan to add it).

There is also a comparison blog post by Rein Henrichs.

shawn left a comment on 27 Jan, 2009:

What’s the trick for downloading the 0.9.9 version of the Riddle gem? The github download link doesn’t seem to work…

pat left a comment on 27 Jan, 2009:

Shawn: Ah, that’s not very helpful of GitHub. Try this link for the tar file. You will then need to run gem build riddle.gemspec and then install the gem file that gets generated.

jason left a comment on 25 Feb, 2009:

The delta search is very impressive and being a new convert to the rails community I’m extremely surprised how easy it is to take advantage of such cool technology.
One thing I’m struggling with however is that everything runs fine on my local environment, but when I move it to our hosting company, it does not seem to play well with mod_rails. In the same environment I can spin up mongrel and everything is as expected. Specifically, I have my models configured for delta, and all new items added since the last index are still found - as I hoped. When using mod_rails I do not get the same outcome - I can see them through the console with a regular search, but not with a Sphinx search - and their delta fields are set to true.
The support folks come back to me with - I’ve made some mods to the vhost - try again. Still no luck.
Is this something others have encountered and is it reasonable that it can be fixed in the vhost file of apache?

Your help on this would be greatly appreciated.

pat left a comment on 26 Feb, 2009:

Hi Jason

I know a lot of people have had issues with the PATH variable being different for mod_rails/passenger - so it doesn’t know about the Sphinx executable indexer. That’d be the first thing I’d be checking for… If that doesn’t work, let’s continue this discussion on the google group
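One quick way to check is to run something like the following from inside the app as it runs under mod_rails/passenger (for instance, logging the result from a controller). It is only a sketch, and find_executable_on_path is a helper name made up for it:

```ruby
# Looks for an executable on the given PATH, roughly what `which` does.
# find_executable_on_path is a helper name made up for this sketch.
def find_executable_on_path(name, path = ENV["PATH"].to_s)
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

location = find_executable_on_path("indexer")
if location
  puts "indexer found at #{location}"
else
  puts "indexer is not on PATH: #{ENV['PATH']}"
end
```

If it reports a path under mongrel but nothing under mod_rails, the differing PATH is the likely culprit.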

Tony Martin left a comment on 11 May, 2009:

Could you clarify which is the stable version to install? The usage page suggests checking out v0.9.5, but this does not appear to support delta indexes. You mention 0.9.8 above, but I can’t install it - I get ‘v0.9.8’ did not match any file(s) known to git. I am using rails v2.1.1
Thanks

pat left a comment on 11 May, 2009:

Hi Tony

The usage page is woefully out-of-date when it comes to version numbers - the latest version on GitHub is 1.1.10. I’ve not been adding tags though, but just get the latest version, and things should work.

Cheers

rajesh left a comment on 5 Jun, 2009:

Hi Pat, I am using Thinking Sphinx in one of my applications, which is search-based. It receives a lot of input every 15 minutes, and that needs to be indexed. In my define_index block I have:

set_property :delta => :datetime, :delta_column => :updated_at, :threshold => 22.minutes

I run the cron job for the delta, ‘rake thinking_sphinx:index:delta RAILS_ENV=production’, every 20 minutes. But the new data is not getting indexed in the expected time. I see the new data in search results after an interval of 4-5 hours. Also, I have to explicitly reindex completely at times. I can’t work out where exactly the problem is - am I missing anything? Do I need to do a merge? I don’t see a ‘merge’ task in my rake list.

Also, I found some .tmp files created in the sphinx folder. I suspect these are related to the problem in some way. Where can I find the significance of all these file types? When I do ls -l db/sphinx/production, I see the following:

total 259136
-rw-r--r-- 1 root root   7008576 Jun 4 09:20 item_core.spa
-rw-r--r-- 1 root root   7102080 Jun 5 00:40 item_core.spa.tmp
-rw-r--r-- 1 root root  96134630 Jun 4 09:20 item_core.spd
-rw-r--r-- 1 root root         0 Jun 5 00:40 item_core.spd.tmp
-rw-r--r-- 1 root root       347 Jun 4 09:20 item_core.sph
-rw-r--r-- 1 root root   2687497 Jun 4 09:20 item_core.spi
-rw-r--r-- 1 root root         0 Jun 5 00:40 item_core.spi.tmp
-rw-r--r-- 1 root root         0 Jun 3 07:40 item_core.spk
-rw------- 1 root root         0 Jun 5 00:40 item_core.spl
-rw-r--r-- 1 root root   2628216 Jun 4 09:20 item_core.spm
-rw-r--r-- 1 root root   2097152 Jun 5 00:40 item_core.spm.tmp
-rw-r--r-- 1 root root 143840229 Jun 4 09:20 item_core.spp
-rw-r--r-- 1 root root         0 Jun 5 00:40 item_core.spp.tmp
-rw-r--r-- 1 root root     94240 Jun 5 00:40 item_delta.spa
-rw-r--r-- 1 root root   1302450 Jun 5 00:40 item_delta.spd
-rw-r--r-- 1 root root       347 Jun 5 00:40 item_delta.sph
-rw-r--r-- 1 root root    171537 Jun 5 00:40 item_delta.spi
-rw-r--r-- 1 root root         0 Jun 3 07:40 item_delta.spk
-rw------- 1 root root         0 Jun 5 00:40 item_delta.spl
-rw-r--r-- 1 root root     35340 Jun 5 00:40 item_delta.spm
-rw-r--r-- 1 root root   1923426 Jun 5 00:40 item_delta.spp

Thanks, Rajesh