Freelancing Gods 2014

06 Jan 2009

Thinking Sphinx Delta Changes

There have been quite a few changes under the hood of Thinking Sphinx lately, and some of the more recent commits are pretty useful.

Small Stuff

First off, something neat but minor – you can now use decimal, date and timestamp columns as attributes – the plugin automatically maps those to float and datetime types as needed.

There’s also now a cucumber-driven set of feature tests, which can run on MySQL and PostgreSQL. While that’s not important to most users, it makes it much less likely that I’ll break things. It’s also useful for the numerous contributors – just over 50 people as of this week! You all rock!

New Delta Possibilities

The major changes are around delta indexing, though. As well as the default delta column approach, there are now two other methods of getting your changes into Sphinx. The first, requested by some Ultrasphinx users, and heavily influenced by a fork by Ed Hickey, is datetime-driven deltas. You can use a datetime column (the default is updated_at), and then run the thinking_sphinx:index:delta rake task on a regular basis to load recent changes into Sphinx.

Your define_index block would look something like the following:

define_index do
  # ... field and attribute definitions
  
  set_property :delta => :datetime, :threshold => 1.day
end

If you want to use a column other than updated_at, set it with the :delta_column option.

The threshold above assumes you’re running the rake task once a day. The more often you run it, the lower you can set your threshold. This is a bit different to the normal delta approach, as changes will not appear in search results straight away – only whenever the rake task is run.
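To make the threshold idea concrete, here is a minimal plain-Ruby sketch of which records a datetime-driven delta run would pick up. It is not Thinking Sphinx's actual implementation: the `Record` struct and `records_for_delta` helper are hypothetical names, and the assumption is simply that anything whose `updated_at` falls inside the threshold window gets re-indexed.

```ruby
# Hypothetical stand-in for an ActiveRecord model with an updated_at column.
Record = Struct.new(:id, :updated_at)

# Returns the records changed within the last `threshold` seconds.
# Assumption: a delta run covers everything newer than (now - threshold).
def records_for_delta(records, threshold, now = Time.now)
  cutoff = now - threshold
  records.select { |record| record.updated_at > cutoff }
end

now     = Time.utc(2009, 1, 6, 12, 0, 0)
one_day = 24 * 60 * 60 # the number of seconds in 1.day

records = [
  Record.new(1, now - (3 * one_day)), # untouched for three days
  Record.new(2, now - (2 * 60 * 60)), # changed two hours ago
  Record.new(3, now - 60)             # changed a minute ago
]

delta_ids = records_for_delta(records, one_day, now).map(&:id)
puts delta_ids.inspect
```

With a one-day threshold, records 2 and 3 are picked up. Drop the threshold to an hour (and run the task hourly) and only record 3 would be.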

Delayed Reaction

One of the biggest complaints with the default delta structure is that it didn’t scale. Your delta index got larger and larger every time records were updated, and that meant each change got slower and slower, because the indexing time increased. When running multiple servers, you could get a few indexer processes running at once. That ain’t good.

So now, we have delayed deltas, using the delayed_job plugin. You’ll need to have the job queue being processed (via the thinking_sphinx:delayed_delta rake task), but everything is pushed off into that, instead of overloading your web server. It means the changes take slightly longer to get into Sphinx, but that’s almost certainly not going to be a problem.

Firstly, you’ll need to create the delayed_jobs table (see the delayed_job readme for example code), and then change your define_index block so it looks something like this:

define_index do
  # ... field and attribute definitions
  
  set_property :delta => :delayed
end
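The idea of pushing delta work off the web server can be sketched in plain Ruby. Everything here is hypothetical: `DeltaQueue` is an in-memory stand-in for the delayed_jobs table, and skipping an index that is already waiting is an assumption of this sketch rather than something confirmed by the plugin.

```ruby
# Hypothetical in-memory queue standing in for the delayed_jobs table.
class DeltaQueue
  attr_reader :indexed

  def initialize
    @pending = []
    @indexed = []
  end

  # Called from the web process: cheap, it only records that work is needed.
  # Assumption for this sketch: an index already waiting is not queued twice.
  def enqueue(index_name)
    @pending << index_name unless @pending.include?(index_name)
  end

  def pending_count
    @pending.size
  end

  # Called from a separate worker process (the rake task's role).
  def work_off
    until @pending.empty?
      @indexed << @pending.shift # stand-in for running Sphinx's indexer
    end
  end
end

queue = DeltaQueue.new
3.times { queue.enqueue("person_delta") } # three saves in quick succession
queue.enqueue("article_delta")

puts queue.pending_count # jobs waiting, nothing indexed yet
queue.work_off
puts queue.indexed.inspect
```

The web request only pays for the `enqueue` call; the expensive indexing happens whenever the worker gets to it.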

Riddle Update

As part of the restructuring over the last couple of months, I’ve also added some additional code to Riddle, my Ruby API for Sphinx. It now has objects to represent all of the configuration elements of Sphinx (ie: settings for sources, indexes, indexer and searchd), and can generate the configuration file for you. This means you don’t need to worry about doing text manipulation, just do everything in neat, clean Ruby.
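As a rough illustration of generating configuration from Ruby objects instead of manipulating text, here is a minimal sketch. The `ConfigSection` class is hypothetical and is not Riddle's actual API; the database name and query are made up too. Only the output format (named blocks of `key = value` settings) follows Sphinx's configuration file syntax.

```ruby
# Hypothetical class for illustration only; this is not Riddle's actual API.
class ConfigSection
  def initialize(kind, name, settings = {})
    @kind     = kind
    @name     = name
    @settings = settings
  end

  # Renders one block in Sphinx's configuration file syntax.
  def render
    lines = @settings.map { |key, value| "  #{key} = #{value}" }
    "#{@kind} #{@name}\n{\n#{lines.join("\n")}\n}\n"
  end
end

source = ConfigSection.new("source", "person_core",
  "type"      => "mysql",
  "sql_host"  => "localhost",
  "sql_db"    => "app_development",
  "sql_query" => "SELECT id, first_name, last_name FROM people"
)
index = ConfigSection.new("index", "person_core",
  "source" => "person_core",
  "path"   => "db/sphinx/person_core"
)

config = [source, index].map { |section| section.render }.join("\n")
puts config
```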

Documentation on this is non-existent, mind you, but the source shouldn’t be too hard to grok. I also need to update Thinking Sphinx’s documentation to cover the delta changes – for now, this blog post will have to do. If you get stuck, check out the Google Group.

Sphinx 0.9.9

One more thing: Thinking Sphinx and Riddle now both have Sphinx 0.9.9 branches – not merged into master, as most people are still using Sphinx 0.9.8, but you can find both code sets on GitHub.

12 Jul 2008

Link: Jedlinski.pl devblog » Thinking Sphinx as Windows service

"Thinking Sphinx for Windows - a batch of simple rake tasks dedicated for Windows users."

12 Jun 2008

A Concise Guide to Using Thinking Sphinx

Okay, it’s well past time for the companion piece to my Sphinx Primer – let’s go through the basic process of using Thinking Sphinx with Rails.

Just to recap: Sphinx is a search engine that indexes data, and then you can query it with search terms to find out which documents are relevant. Why do you want to use it with Rails? Because it saves having to write messy SQL, and it’s so damn fast.

(If you’re getting a feeling of deja-vu, then it’s probably because you’ve read an old post on this blog that dealt with an old version of Thinking Sphinx. I’ve had a few requests for an updated article, so this is it.)

Installation

So: first step is to install Sphinx. This may be tricky on some systems – but I’ve never had a problem with it with Mac OS X or Ubuntu. My process is thus:

curl -O http://sphinxsearch.com/downloads/sphinx-0.9.8-rc2.tar.gz
tar zxvf sphinx-0.9.8-rc2.tar.gz
cd sphinx-0.9.8-rc2
./configure
make
sudo make install

If you’re using Windows, you can just grab the binaries.

Once that’s taken care of, you then want to take your Rails app, and install the plugin. If you’re running edge or 2.1, this is a piece of cake:

script/plugin install git://github.com/freelancing-god/thinking-sphinx.git

Otherwise, you’ve got a couple of options. The first is, if you have git installed, just clone to your vendor/plugins directory:

git clone git://github.com/freelancing-god/thinking-sphinx.git \
  vendor/plugins/thinking-sphinx

If you’re not yet using git, then the easiest way is to download the tar file of the code. Try the following:

curl -L http://github.com/freelancing-god/thinking-sphinx/tarball/master \
  -o thinking-sphinx.tar.gz
tar -xvf thinking-sphinx.tar.gz -C vendor/plugins
mv vendor/plugins/freelancing-god-thinking-sphinx* vendor/plugins/thinking-sphinx

Oh, and it’s worth noting: if you’re not using MySQL or PostgreSQL, then you’re out of luck – Sphinx doesn’t talk to any other relational databases.

Configuration

Next step: let’s get a model or two indexed. It might be worth refreshing your memory on what fields and attributes are for – can I recommend my Sphinx article (because I’m not at all biased)?

Ok, now let’s work with a simple Person model, and add a few fields:

class Person < ActiveRecord::Base
  define_index do
    indexes [first_name, last_name], :as => :name
    indexes location
  end
end

Nothing too scary – we’ve added two fields. The first is the first and last names of a person combined to one field with the alias ‘name’. The second is simply location.

Adding attributes is just as easy:

define_index do
  # ...
  
  has birthday
end

This attribute is the datetime value birthday (so you can now sort and filter your results by birthdays).

Managing Sphinx

We’ve set up a basic index – now what? We tell Sphinx to index the data, and then we can start searching. Rake is our friend for this:

rake thinking_sphinx:index
rake thinking_sphinx:start

Searching

Now for the fun stuff:

Person.search "Melbourne"

Or with some sorting:

Person.search "Melbourne", :order => :birthday

Or just people born within a 10 year window:

Person.search "Melbourne", :with => {:birthday => 25.years.ago..15.years.ago}

If you want to keep certain search terms to specific fields, use :conditions:

Person.search :conditions => {:location => "Melbourne"}

Just remember: :conditions is for fields, :with is for attributes (and :without for exclusive attribute filters).
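If the split between the three options is hard to keep straight, this plain-Ruby sketch mimics it against an in-memory list. It is not how Sphinx works internally: `Doc`, `tiny_search` and the `birth_year` attribute are hypothetical, and text matching is reduced to a case-insensitive substring check.

```ruby
# Hypothetical in-memory stand-in for an index: text fields plus attributes.
Doc = Struct.new(:fields, :attributes)

def tiny_search(docs, options = {})
  conditions = options[:conditions] || {}
  with       = options[:with]       || {}
  without    = options[:without]    || {}

  docs.select do |doc|
    # :conditions => the text must appear in the named field
    fields_ok = conditions.all? do |name, text|
      doc.fields[name].to_s.downcase.include?(text.downcase)
    end
    # :with => the attribute must equal (or fall inside) the filter
    with_ok = with.all? { |name, filter| filter === doc.attributes[name] }
    # :without => the attribute must not match the filter
    without_ok = without.none? { |name, filter| filter === doc.attributes[name] }

    fields_ok && with_ok && without_ok
  end
end

docs = [
  Doc.new({ :name => "Anna Smith", :location => "Melbourne" }, { :birth_year => 1985 }),
  Doc.new({ :name => "Ben Jones",  :location => "Sydney" },    { :birth_year => 1990 }),
  Doc.new({ :name => "Cara Lee",   :location => "Melbourne" }, { :birth_year => 1970 })
]

in_melbourne = tiny_search(docs, :conditions => { :location => "melbourne" })
born_in_80s  = tiny_search(docs, :with => { :birth_year => 1980..1989 })
not_1990     = tiny_search(docs, :without => { :birth_year => 1990 })

puts in_melbourne.map { |doc| doc.fields[:name] }.inspect
puts born_in_80s.map { |doc| doc.fields[:name] }.inspect
puts not_1990.map { |doc| doc.fields[:name] }.inspect
```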

Change

Your data changes – but unfortunately, Sphinx doesn’t update your indexes to match automatically. So there are two things you need to do. Firstly, run rake thinking_sphinx:index regularly (using cron or something similar). ‘Regularly’ can mean whatever time frame you want – weekly, daily, hourly.

The second step is optional, but it’s needed to have your indexes always up to date. First, add a boolean column to your model, named ‘delta’, and have it default to true. Then, tell your index to use that delta field to keep track of changes:

define_index do
  # ...
  
  set_property :delta => true
end

Then you need to tell Sphinx about the updates:

rake thinking_sphinx:stop
rake thinking_sphinx:index
rake thinking_sphinx:start

Once that’s done, a delta index will be created – which holds any recent changes (since the last proper indexing), and gets re-indexed whenever a model is edited or created. This doesn’t mean you can stop the regular indexing, as that’s needed to keep delta indexes as small (and fast) as possible.
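The relationship between the core index, the delta index and the regular full index can be modelled in a few lines of plain Ruby. This is only a sketch of the idea: `TinyIndex` is hypothetical and keeps everything in memory, whereas the real work is done by Sphinx's indexer.

```ruby
# Hypothetical in-memory model of a core index plus a delta index.
class TinyIndex
  def initialize
    @rows  = {} # id => { :text => ..., :delta => true/false } (the table)
    @core  = {} # snapshot taken at the last full index
    @delta = {} # rows changed since then
  end

  # Saving a row flags it and rebuilds only the (small) delta index.
  def save(id, text)
    @rows[id] = { :text => text, :delta => true }
    @delta = {}
    @rows.each { |row_id, row| @delta[row_id] = row[:text] if row[:delta] }
  end

  # The regular full index: everything moves into core and the flags reset.
  def full_index
    @core = {}
    @rows.each do |row_id, row|
      @core[row_id] = row[:text]
      row[:delta] = false
    end
    @delta = {}
  end

  def delta_size
    @delta.size
  end

  # Delta entries take precedence over stale core entries.
  def search(term)
    @core.merge(@delta).select { |_id, text| text.include?(term) }.keys.sort
  end
end

index = TinyIndex.new
index.save(1, "Pat in Melbourne")
index.save(2, "Sam in Sydney")
index.full_index # delta is now empty
index.save(2, "Sam in Melbourne")

puts index.search("Melbourne").inspect
puts index.delta_size
```

Skip `full_index` for long enough and the delta keeps growing with every save, which is why the regular indexing still matters.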

String Sorting

If you remember the details about fields and attributes, you’ll know that you can’t sort by fields. Which is a pain, but there are ways around this – and it’s kept pretty damn easy in Thinking Sphinx. Let’s say we wanted to make our name field sortable:

define_index do
  indexes [first_name, last_name], :as => :name, :sortable => true
  
  # ...
end

Re-index and restart Sphinx, and sorting by name will work.

How is this done? Thinking Sphinx creates an attribute under the hood, called name_sort, and uses that, as Sphinx is quite fine with sorting by strings if they’re converted to ordinal values (which happens automatically when they’re attributes).
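As a rough picture of what "converted to ordinal values" means, this sketch gives each distinct string an integer based on its sorted position, so ordering by the integer matches ordering by the string. The `ordinals_for` helper is hypothetical, and the case-insensitive comparison is an assumption; Sphinx's own rules may differ.

```ruby
# Assigns each distinct string an integer rank based on its sorted position.
# Assumption: comparison is case-insensitive; Sphinx's own rules may differ.
def ordinals_for(strings)
  ranks = {}
  strings.map { |string| string.downcase }.uniq.sort.each_with_index do |value, position|
    ranks[value] = position
  end
  ranks
end

names = ["Smith, John", "allan, pat", "Jones, Kim"]
ranks = ordinals_for(names)

# Sorting by the integer rank gives the same order as sorting by the string.
sorted = names.sort_by { |name| ranks[name.downcase] }
puts sorted.inspect
```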

Pagination

Sphinx paginates automatically – in fact, there’s no way of turning that off. But that’s okay… as long as you can use your will_paginate helper, right? Never fear, Thinking Sphinx plays nicely with will_paginate, so your views don’t need to change at all:

<%= will_paginate @search_results %>
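Under the hood, pagination comes down to turning a page number into an offset and a limit. The helpers below are a hypothetical plain-Ruby sketch of that arithmetic, assuming 20 results per page as the default.

```ruby
# Hypothetical helper: turn a page number into the offset and limit
# that a paginated search would request.
def page_window(page, per_page = 20)
  page = [page.to_i, 1].max # treat nil, 0 or junk input as page 1
  { :offset => (page - 1) * per_page, :limit => per_page }
end

# How many pages are needed to show every result.
def total_pages(total_entries, per_page = 20)
  (total_entries / per_page.to_f).ceil
end

puts page_window(3).inspect
puts total_pages(45)
```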

Associations

Sometimes you’ll want data in your fields (or attributes) from associations. This is a piece of cake:

define_index do
  indexes photos.caption, :as => :captions
  indexes friends.photos.caption, :as => :friends_photos
  
  # ...
end

Polymorphic associations are fine as well – but keep in mind, the more complex your index fields and attributes, the slower it will be for Sphinx to index (and you’ll definitely need some database indexes on foreign key columns to help it stay as speedy as possible).

Gotchas

In case things aren’t working, here are some things to keep in mind:

  • Added an attribute, but can’t sort or filter by it? Have you reindexed and restarted Sphinx? It doesn’t automatically pick up these changes.
  • Sorting not working? If you’re specifying the attribute to sort by as a string, you’ll need to include the direction to sort by, just like with SQL: “birthday ASC”.
  • Using name or id columns in your fields or attributes? Make sure you specify them using symbols, as they’re core class methods in Ruby:

define_index do
  indexes :name
  
  # ...
  
  has photos(:id), :as => :photo_ids
end
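Why the symbols matter is easier to see with a toy builder. This is a hypothetical sketch, and Thinking Sphinx's real internals differ, but it shows the general problem: a bare word only becomes a column if Ruby doesn't already have a method by that name.

```ruby
# Hypothetical builder; Thinking Sphinx's real internals differ.
class TinyBuilder
  attr_reader :columns

  def initialize
    @columns = []
  end

  def indexes(*columns)
    @columns.concat(columns)
  end

  # Stands in for a method Ruby already defines, like Class#name.
  def name
    "TinyBuilder"
  end

  # Unknown bare words become column symbols.
  def method_missing(method_name, *args)
    args.empty? ? method_name : super
  end

  def respond_to_missing?(_method_name, _include_private = false)
    true
  end
end

builder = TinyBuilder.new
builder.instance_eval do
  indexes location # bare word, caught by method_missing
  indexes name     # calls the existing method instead of naming a column
  indexes :name    # a symbol sidesteps the clash
end

puts builder.columns.inspect
```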

And Next?

I realise this article is pretty light on details – but if you want more information, the first stop should be the extended usage page on the Thinking Sphinx site, quickly followed by the documentation. There’s also an email list to ask questions on.

21 May 2008

Sphinx + Rails + PostgreSQL

In case you’ve not been watching every commit carefully flow through Thinking Sphinx on GitHub – PostgreSQL support has been added. I’ve done a little bit of testing, and I’ve had some excellent support from Björn Andreasson and Tim Riley, so I feel it’s ready for people to start kicking the tires.

I’m no PostgreSQL expert – I definitely fall into the n00b category – so if you think there’s better ways to do what I’m doing, would love to hear them.

11 May 2008

Updates for Thinking Sphinx

I’ve been working away on Thinking Sphinx when I’ve had the time – and we’re nearing what I feel is a solid 1.0 release. I say we, because I’ve had some very generous people supply patches over the last few weeks – namely Arthur Zapparoli, James Healy, Chris Heald and Jae-Jun Hwang. Switching to git – and GitHub in particular – has made it very easy to accept patches.

Mind you, all of these changes aren’t committed just yet – and even when they are, there’ll still be a few more things to cross off the list before we hit the big 1.0 milestone, namely PostgreSQL support and solid spec coverage. Slowly edging closer.

In other news – to help share the Thinking Sphinx knowledge (after some prompting by a few users of the plugin), I’ve created a Google Group for it – so this will be the ideal place to ask questions about how to implement Sphinx into your app, suggest features, or report bugs.

If you’ve been pondering how to deploy Thinking Sphinx via Capistrano, I recommend you read a blog post by Wade Winningham – or if you’re interested in better ways of handling UTF-8 characters outside of the ‘normal’ set (ie: without accents and so forth), make sure you peruse James Healy’s solution.

And one last reminder – if you’re in Sydney on Wednesday evening and interested in learning a bit more about Sphinx in general and Thinking Sphinx in particular, come along to the monthly Ruby meet at the Crown Hotel in Surry Hills, as those are the topics I’ll be presenting on.

10 Apr 2008

Thinking Sphinx Reborn

So, over the last month or so I’ve been working hard on rewriting Thinking Sphinx – and it’s now time to release those changes publicly. The site’s now got a brief quickstart page and a detailed usage page beyond the rdoc files, and there will be more added over the coming weeks.

A quick overview of what’s shiny and new:

Better index definition syntax

This part was reworked many times, and has finally settled into something I’m pretty happy with:

define_index do
  indexes [first_name, last_name], :as => :name, :sortable => true
  indexes email, location
  indexes [posts.content, posts.subject], :as => :posts
end

Polymorphic association support in indexes

When you’re drilling down into your associations for relevant field data, it’s now safe to use polymorphic associations – Thinking Sphinx is pretty smart about figuring what models to look at. Make sure you put table indexes on your _type columns though.

MVA Support

Multi-Value Attributes now work nice and cleanly – so you can tie an array of integers to any record.

Multi-Model Searching

Just like before, you can search for records of a specific model. This time around though, you can also search across all your models – and the results still use will_paginate if it’s installed.

ThinkingSphinx::Search.search "help"

Better Filter Support

It was kinda in there to start with, but now it’s much smarter – and it all goes into the conditions hash, just like a find call:

User.search :conditions => {:role_id => 5}
Article.search :conditions => {:author_ids => [12, 24, 48]}

Sorting by Fields

As you may have noticed in the first code block of this post, you can mark fields as :sortable – this uses Sphinx’s string attributes to create a matching attribute that acts as a sort index for the field. When specifying the search options though, you can just use the field’s name – Thinking Sphinx knows what you’re talking about.

User.search "John", :order => :name
User.search "Smith", :order => "name DESC"

Even More

I’m so eager to share this new release that there’s probably a few things that need a bit more documentation – that will appear both on the Thinking Sphinx site and here on the blog. I’m planning on writing some articles that provide a solid overview to Sphinx in general – which will hopefully be some help no matter what plugin you use – and then dive into some regular ‘recipes’ of Thinking Sphinx usage, and some detailed posts of the cool new features as well.

Also in the pipeline is Merb support – just for ActiveRecord initially, but I’d love to get it talking to DataMapper as well.

Update: Jonathan Conway’s got a branch working in Merb and Rails – needless to say, I’ll be updating trunk with his patch as soon as possible.

14 Mar 2008

Sphinx 0.9.8-rc1 Updates

Another small sphinx-related post.

In line with the first release candidate release of Sphinx 0.9.8 last week, I’ve updated both my API, Riddle, and my plugin, Thinking Sphinx, to support it. Also, for those inclined, you can now get Riddle as a gem.

I’m slowly making progress on some major changes to Thinking Sphinx, so hopefully I’ll have something cool to show people soon. Oh, but some features that aren’t reflected in the documentation: most of Sphinx’s search options can be passed through when you call Model.search – including :group_by, :group_function, :field_weights, :sort_mode, etc. Consider it an exercise for the reader to figure out the details until I get around to improving the docs.

17 Jan 2008

Sphinx 0.9.8r1065

Short post, as befitting the importance of the content: Riddle and Thinking Sphinx have both been updated to support the current version of Sphinx, 0.9.8r1065.

07 Jan 2008

Link: Le-Blog-à-Dam - Page Cache Test - Rails Cache Test Plugin

If I get some spare time, this is something that would be nice to adapt to rspec

27 Dec 2007

Updates for Sphinx 0.9.8r985

Another quick Sphinx post – Riddle is updated to support Sphinx’s latest release (0.9.8r985), and Thinking Sphinx now has that new version of Riddle as well.

I’ve not tested any of this with the recently released Ruby 1.9 yet, though (but it’s on my list of things to do).

Also, thank-you to Joost Hietbrink (again) and Jonathan Conway for their patches to Thinking Sphinx – very much appreciated.

02 Dec 2007

Sphinx-related Updates

Two Sphinx-related tidbits:

Riddle

Riddle now has a tag in SVN for the 0.9.8-r909 release of Sphinx – not that there were really any functional changes compared to r871, besides the two extra match modes (Full Scan and Extended 2 – the latter isn’t going to hang around for long anyway).

Thinking Sphinx

As well as supporting the above version of Sphinx, I’ve now added some brief documentation to Thinking Sphinx that discusses attributes, sorting and delta indexes. To summarise kinda-briefly:

Attributes

Attributes are defined in the define_index block as follows:

define_index do |index|
  index.has.created_at
  index.has.updated_at
  # Field definitions go here
end

They can only be from the indexed model (not associations), and in line with Sphinx’s limitations, must be either integers, floats or timestamps.

Sorting

Ties in closely with attributes – as that’s all Sphinx will let you order by. Use it in the same way as you would in a find call:

Invoice.search :conditions => "expensive", :order => "created_at DESC"

Same approach works for the :include parameter (although this has nothing to do with Sphinx itself).

Delta Indexes

Delta indexes track changes to model records between proper indexes (ie: from the rake task thinking_sphinx:index) – all they require is a boolean field in the model’s table called delta, and for delta indexing to be enabled as follows:

define_index do |index|
  index.delta = true
  # Fields and attributes go here
end

The one catch – at this point, delta indexes are one step off current, as they get indexed before the current transaction to the database is committed. This will get better soon, thanks to some help from Joost Hietbrink and his colleagues at YelloYello – once I find some free time, I’ll get that working much more neatly.

14 Nov 2007

Sphinx's Riddle

Edit: I’ve changed the Subversion reference to Github, and it’s worth noting that Riddle works with Sphinx 0.9.8, 0.9.9 and 1.10-beta at the time of writing (January 2011). Original post continues below:

Built out of the work I’ve done for Thinking Sphinx (which has just got basic support for delta indexes, attributes and sorting – although the documentation doesn’t reflect that), I’ve extracted a new Ruby client that communicates with Sphinx, which I’ve named Riddle.

I’m not going to delve into the code here – because I’m not expecting it to be that useful to many people (and I just wrote examples in the documentation – go read that instead!) – but I’m very happy with how it’s ended up, and it’s got some level of specs to give it a thorough test. It’s also compatible with the most recent release of Sphinx (0.9.8 r871). Should you wish to poke around with it, just clone it from Github:

git clone \
  git://github.com/freelancing-god/riddle.git

It’s also being used in Evan Weaver’s UltraSphinx plugin, which I’m pretty pleased about.

21 Oct 2007

Link: James on Software: Introducing resource_controller: Focus on what makes your controller special.

18 Oct 2007

Link: ar-backup - Google Code

"Active Record backup is a Rails plugin which lets you backup your database schema and content in a schema.rb file and fixtures."

09 Oct 2007

Link: Hivelogic: Enkoder Rails Plugin

Plugin that encodes email links (and other content)

03 Oct 2007

A Thoughtful Sphinx

In one of the projects I’ve been working on lately, I’ve needed to implement a decent search – and so I looked at both Ferret and Sphinx. I ended up choosing the latter, although I’m not sure why – perhaps just to be different (most people I spoke to are using ferret), or perhaps because the setup seemed easier.

The next step was to pick a Sphinx plugin to work with. Ultrasphinx seemed to have a good set of features (particularly pagination), and supported fields from associations within indexes – something critical for what we were doing.

Unfortunately, grabbing fields from associations wasn’t that easy – and the SQL generated for the Sphinx configuration file was overly complex. I could (and did) change the config file manually, but that makes half the usefulness of the plugin worthless.

So, since I had some spare time, I wrote my own plugin. Much like Rails, it favours convention over configuration – perhaps a little too much so at this point, but I do plan to make it more flexible at some point. Installation is the same as any other plugin:

script/plugin install \
  http://rails-oceania.googlecode.com/svn/patallan/thinking_sphinx

An example of defining indexes (within a model class):

define_index do |index|
  index.includes.email
  index.includes(:first_name, :last_name).as.name
  index.includes.tags.key.as.tags
  index.includes.articles.content.as.article_content
end

To index the data, just use the thinkingsphinx:index rake task (aliased to ts:in) – which will also generate the configuration file on the fly. My goal is to make changing the configuration file manually unnecessary – making the index task build the configuration file helps enforce this.

And to search:

# Searching particular fields
User.search(:conditions => {:name => "Pat"})
# Or searching all fields
User.search("Pat")
# Pagination is built in
User.search("Pat", :page => (params[:page] || 1))

Paginated results can also be used by the will_paginate helper from the plugin of the same name. Current documentation can be found on this site.

I managed to use ActiveRecord’s join and associations code, which kept my plugin reasonably lean. For interactions with Sphinx’s searchd daemon, I did look at Dmytro Shteflyuk’s Ruby Sphinx Client API, but the non-ruby-like syntax irritated me, so again, I coded my own – heavily influenced by the original though (ie: he did all the hard work, not me).

There’s no support yet for updating the index pseudo-incrementally (the lack of true incremental updates is a limitation of Sphinx). If I don’t feel like the incremental updating works well enough, then I may switch to Ferret – which might lead to a Thinking Ferret plugin, perhaps. We’ll just have to wait and see.

Nov 14th 2007 – Update: I’ve just released the internal Sphinx client as its own library – Riddle.

01 Oct 2007

Conditional Caching

Whenever I’ve described caching in Rails to anyone who isn’t familiar with it, I have made clear the limitations of each method:

  • Page caching is only useful when the output is exactly the same for every visitor, and you don’t need to confirm user authentication
  • Action caching allows you to run filters for every request – thus can be used to check if users are authenticated – but the rendered output has the same limitations as page caching
  • Fragment caching, while flexible, is definitely slower than the other two options.

Obviously, it’s best to use the fastest caching possible that fits your pages. That’s rarely page caching for the sites I code. Even action caching hasn’t been viable too often. Or so I thought.

I used to use fragment caching in most of my views – generally with extra parameters to indicate user role, so it would store different versions of the fragment for each role (ie: admin user, normal user, no user). This worked reasonably well, but often I was wrapping an entire view in a <% cache do %> block – almost action caching!

In the site I was coding at the time, ausdwcon.org, I realised the best time to cache would be when no user was logged in – as that would cover the majority of requests. So, to simplify: what I wanted was caching only when a certain condition was true.

Getting methods coded for conditional caching at a fragment level was a piece of cake. At the action caching level? Well, that was trickier, but with some help from the RORO crew I got it working, and into a plugin. You can find the code in the RORO svn repository and the documentation on this site. To install:

script/plugin install \
  http://rails-oceania.googlecode.com/svn/patallan/conditional_caching

One brief example so you know what to expect:

# a controller
caches_action :index, :if => :no_user?

# application.rb
def no_user?
  # blank? copes with a missing (nil) user id, where empty? would raise
  session[:user_id].blank?
end

Obviously the :if parameter is the key – it can be a symbol pointer to an instance method on the controller, or it can be a Proc which is evaluated in the scope of the controller.
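Handling both kinds of :if value could look something like the sketch below. It is not the plugin's actual source: `FakeController` and `condition_met?` are hypothetical names, and the Proc is run with `instance_exec` so that it sees the controller's own methods.

```ruby
# Hypothetical controller stand-in; not the plugin's actual source.
class FakeController
  attr_reader :session

  def initialize(session)
    @session = session
  end

  def no_user?
    session[:user_id].nil?
  end

  # Evaluates an :if option that is either a method name or a Proc.
  def condition_met?(condition)
    case condition
    when Symbol then send(condition)
    when Proc   then instance_exec(&condition)
    else true # no condition given: always cache
    end
  end
end

anonymous = FakeController.new({})
logged_in = FakeController.new({ :user_id => 34 })

puts anonymous.condition_met?(:no_user?)
puts logged_in.condition_met?(:no_user?)
puts logged_in.condition_met?(Proc.new { session[:user_id] == 34 })
```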

Now, the no-user condition is the only example I can think of where my plugin is useful – if you think of others, please let me know. Keep reading though, because I’ve got another helpful hint or two to share.

I mentioned above my usual method of using fragment caching – liberal use of extra parameters. It’s actually possible to do this with action caching too – which I only found out recently, so I’m assuming there are other people out there who aren’t aware of it either.

It’s not quite as easy as the equivalent fragment caching code, as you’re using both class and object level methods, but here’s an example:

# a controller
caches_action :index, :show, :cache_path => cache_params

# application.rb
def self.cache_params
  @cache_params ||= Proc.new { |controller|
    controller.params.merge(:role => controller.current_role)
  }
end
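The effect of merging the role into the cache path is that each role gets its own cached copy. Here is a small plain-Ruby sketch of that; `cache_key_for` is a hypothetical helper, and Rails builds its real cache paths differently.

```ruby
# Hypothetical sketch: build a cache key from request params plus a role,
# so each role gets its own cached copy of the action.
def cache_key_for(params, role)
  merged = params.merge(:role => role)
  # Sort so the same params always produce the same key.
  merged.sort_by { |key, _value| key.to_s }
        .map { |key, value| "#{key}=#{value}" }
        .join("&")
end

params = { :controller => "articles", :action => "index" }

puts cache_key_for(params, "admin")
puts cache_key_for(params, "guest")
```

Same action, same params, but two different keys, so an admin never gets served the guest version of the page.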

This has actually made me cut back on usage of my plugin – because most of my pages don’t have user-specific content.

Oh, and one more thing – to cut down on user-specific content in my views, I’ve been mapping my users controller as both a normal resource and a singleton resource. This means instead of a “Your Profile” link being /users/34, it’s just /user. Makes the controller code a little tricky, and the named routes get confused, but nothing a few clever helper methods can’t fix.

03 Aug 2007

Link: SpinBits // Services

30 Apr 2007

Link: bleak_house :: evan weaver

14 Mar 2007

Link: scie.nti.st » Exception Notifier Plugin and bad routes

About Freelancing Gods

Freelancing Gods is written by Pat, who works on the web as a web developer in Melbourne, Australia, specialising in Ruby on Rails.

In case you're wondering what the likely content here will be about (besides code), keep in mind that Pat is passionate about the internet, music, politics, comedy, bringing people together, and making a difference. And pancakes.

His ego isn't as bad as you may think. Honest.

Here's more than you ever wanted to know.

All original content on this site is available through a Creative Commons by-nc-sa licence.