Freelancing Gods 2009

12 Jun 2008

A Concise Guide to Using Thinking Sphinx

Okay, it’s well past time for the companion piece to my Sphinx Primer – let’s go through the basic process of using Thinking Sphinx with Rails.

Just to recap: Sphinx is a search engine that indexes data, and then you can query it with search terms to find out which documents are relevant. Why do you want to use it with Rails? Because it saves having to write messy SQL, and it’s so damn fast.

(If you’re getting a feeling of déjà vu, then it’s probably because you’ve read an old post on this blog that dealt with an old version of Thinking Sphinx. I’ve had a few requests for an updated article, so this is it.)

Installation

So: first step is to install Sphinx. This may be tricky on some systems – but I’ve never had a problem with it on Mac OS X or Ubuntu. My process is thus:

curl -O http://sphinxsearch.com/downloads/sphinx-0.9.8-rc2.tar.gz
tar zxvf sphinx-0.9.8-rc2.tar.gz
cd sphinx-0.9.8-rc2
./configure
make
sudo make install

If you’re using Windows, you can just grab the binaries.

Once that’s taken care of, you then want to take your Rails app, and install the plugin. If you’re running edge or 2.1, this is a piece of cake:

script/plugin install git://github.com/freelancing-god/thinking-sphinx.git

Otherwise, you’ve got a couple of options. The first is, if you have git installed, just clone to your vendor/plugins directory:

git clone git://github.com/freelancing-god/thinking-sphinx.git \
  vendor/plugins/thinking-sphinx

If you’re not yet using git, then the easiest way is to download the tar file of the code. Try the following:

curl -L http://github.com/freelancing-god/thinking-sphinx/tarball/master \
  -o thinking-sphinx.tar.gz
tar -xvf thinking-sphinx.tar.gz -C vendor/plugins
mv vendor/plugins/freelancing-god-thinking-sphinx* vendor/plugins/thinking-sphinx

Oh, and it’s worth noting: if you’re not using MySQL or PostgreSQL, then you’re out of luck – Sphinx doesn’t talk to any other relational databases.

Configuration

Next step: let’s get a model or two indexed. It might be worth refreshing your memory on what fields and attributes are for – can I recommend my Sphinx article (because I’m not at all biased)?

Ok, now let’s work with a simple Person model, and add a few fields:

class Person < ActiveRecord::Base
  define_index do
    indexes [first_name, last_name], :as => :name
    indexes location
  end
end

Nothing too scary – we’ve added two fields. The first is the first and last names of a person combined into one field with the alias ‘name’. The second is simply location.

Adding attributes is just as easy:

define_index do
  # ...

  has birthday
end

This attribute is the datetime value birthday (so you can now sort and filter your results by birthdays).

Managing Sphinx

We’ve set up a basic index – now what? We tell Sphinx to index the data, and then we can start searching. Rake is our friend for this:

rake thinking_sphinx:index
rake thinking_sphinx:start

Searching

Now for the fun stuff:

Person.search "Melbourne"

Or with some sorting:

Person.search "Melbourne", :order => :birthday

Or just people born within a 10 year window:

Person.search "Melbourne", :with => {:birthday => 25.years.ago..15.years.ago}

If you want to keep certain search terms to specific fields, use :conditions:

Person.search :conditions => {:location => "Melbourne"}

Just remember: :conditions is for fields, :with is for attributes (and :without for exclusive attribute filters).
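To keep that split straight, here is a minimal sketch in plain Ruby. The helper name search_options and the FIELDS and ATTRIBUTES lists are made up for this example (they are not part of Thinking Sphinx); the only things taken from above are the :conditions and :with keys.

```ruby
# Hypothetical helper, not part of Thinking Sphinx: sorts incoming
# filters into :conditions (fields) and :with (attributes).
FIELDS     = [:name, :location]   # text that Sphinx searches within
ATTRIBUTES = [:birthday]          # values that Sphinx sorts and filters by

def search_options(filters)
  options = {}
  filters.each do |key, value|
    if FIELDS.include?(key)
      (options[:conditions] ||= {})[key] = value
    elsif ATTRIBUTES.include?(key)
      (options[:with] ||= {})[key] = value
    else
      raise ArgumentError, "#{key} is neither a field nor an attribute"
    end
  end
  options
end

options = search_options(
  :location => "Melbourne",
  :birthday => (Time.utc(1983)..Time.utc(1993))
)
# The resulting hash could then be handed to Person.search(options).
```

Raising on an unknown key is a design choice for the sketch: it catches the common mistake of filtering on something that was never added to the index.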

Change

Your data changes – but unfortunately, Sphinx doesn’t update your indexes to match automatically. So there are two things you need to do. Firstly, run rake thinking_sphinx:index regularly (using cron or something similar). ‘Regularly’ can mean whatever time frame you want – weekly, daily, hourly.

The second step is optional, but it’s needed to have your indexes always up to date. First, add a boolean column to your model, named ‘delta’, and have it default to false. Then, tell your index to use that delta field to keep track of changes:

define_index do
  # ...

  set_property :delta => true
end

Then you need to tell Sphinx about the updates:

rake thinking_sphinx:stop
rake thinking_sphinx:index
rake thinking_sphinx:start

Once that’s done, a delta index will be created – which holds any recent changes (since the last proper indexing), and gets re-indexed whenever a model is edited or created. This doesn’t mean you can stop the regular indexing, as that’s needed to keep delta indexes as small (and fast) as possible.
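If the interplay between the two indexes is hard to picture, this toy model may help. It is plain Ruby with no Sphinx involved, and every name in it (ToyIndex, full_index, save, delta_size) is invented for illustration; it only mimics the behaviour described above, not how Thinking Sphinx is implemented.

```ruby
# A toy model of a core index plus a delta index.
class ToyIndex
  def initialize(records)
    @records = records        # id => {:text => "...", :delta => true/false}
    @core    = {}
    @delta   = {}
    full_index
  end

  # The regular (cron) run: index everything and clear the delta flags.
  def full_index
    @records.each_value { |record| record[:delta] = false }
    @core  = snapshot(@records)
    @delta = {}
  end

  # On create or edit: flag the record, then rebuild only the delta index.
  def save(id, text)
    @records[id] = {:text => text, :delta => true}
    @delta = snapshot(@records.select { |_, record| record[:delta] })
  end

  # Delta entries win over the stale copies still sitting in the core index.
  def search(term)
    @core.merge(@delta).select { |_, text| text.include?(term) }.keys.sort
  end

  def delta_size
    @delta.size
  end

  private

  def snapshot(records)
    records.each_with_object({}) { |(id, record), index| index[id] = record[:text] }
  end
end

index = ToyIndex.new(1 => {:text => "Pat in Melbourne", :delta => false})
index.save(2, "Craig in Sydney")   # only the delta index is rebuilt
index.search("Sydney")             # record 2 is found straight away
index.full_index                   # the delta index is empty again
```

Notice that the delta index keeps growing with every save until the next full run, which is why the regular indexing still matters.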

String Sorting

If you remember the details about fields and attributes, you’ll know that you can’t sort by fields. Which is a pain, but there are ways around this – and it’s kept pretty damn easy in Thinking Sphinx. Let’s say we wanted to make our name field sortable:

define_index do
  indexes [first_name, last_name], :as => :name, :sortable => true

  # ...
end

Re-index and restart Sphinx, and sorting by name will work.

How is this done? Thinking Sphinx creates an attribute under the hood, called name_sort, and uses that, as Sphinx is quite fine with sorting by strings if they’re converted to ordinal values (which happens automatically when they’re attributes).
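As a rough illustration of what ‘converted to ordinal values’ means: each string is swapped for its position in the sorted list of all distinct strings, so comparing the numbers gives the same order as comparing the strings. The ordinals method below is a made-up name for this sketch, and the real conversion happens inside Sphinx at indexing time.

```ruby
# Replace every string with its rank among the sorted, distinct strings.
def ordinals(strings)
  ranks = {}
  strings.uniq.sort.each_with_index { |string, i| ranks[string] = i + 1 }
  strings.map { |string| ranks[string] }
end

names     = ["pat", "craig", "lang", "craig"]
name_sort = ordinals(names)
# name_sort is [3, 1, 2, 1]: sorting by these numbers sorts by name.
```

One consequence worth knowing: the numbers only make sense within a single indexing run, so they are useful for sorting but not for anything else.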

Pagination

Sphinx paginates automatically – in fact, there’s no way of turning that off. But that’s okay… as long as you can use your will_paginate helper, right? Never fear, Thinking Sphinx plays nicely with will_paginate, so your views don’t need to change at all:

<%= will_paginate @search_results %>
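On the controller side, a page number has to become an offset and a limit somewhere along the way. The sketch below shows just that arithmetic in plain Ruby. page_window is a hypothetical helper, the default of 20 per page is an assumption, and the commented-out search call assumes the :page and :per_page options, so check those against the documentation for your version.

```ruby
# Hypothetical helper: turn a page number into an offset and limit,
# plus the page count that a pagination helper would display.
def page_window(total, page = 1, per_page = 20)
  page = [page.to_i, 1].max   # a nil or zero page falls back to page 1
  {
    :offset      => (page - 1) * per_page,
    :limit       => per_page,
    :total_pages => (total.to_f / per_page).ceil
  }
end

window = page_window(45, 3)
# window is {:offset => 40, :limit => 20, :total_pages => 3}

# In a controller this would look something like (assumed options):
#   @search_results = Person.search params[:q], :page => params[:page], :per_page => 20
```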

Associations

Sometimes you’ll want data in your fields (or attributes) from associations. This is a piece of cake:

define_index do
  indexes photos.caption, :as => :captions
  indexes friends.photos.caption, :as => :friends_photos

  # ...
end

Polymorphic associations are fine as well – but keep in mind, the more complex your index fields and attributes, the slower it will be for Sphinx to index (and you’ll definitely need some database indexes on foreign key columns to help it stay as speedy as possible).
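One way to think about association fields: each person is still a single document, so all the captions reached through the association end up joined into one chunk of text for that person. The sketch below mimics that with plain Structs standing in for models. The field_text helper is invented for illustration, and joining with a space is an assumption about the generated SQL rather than something this article spells out.

```ruby
# Structs standing in for ActiveRecord models (illustration only).
Photo  = Struct.new(:caption)
Person = Struct.new(:first_name, :photos, :friends)

# Walk a chain of association/column names and join whatever is found.
def field_text(record, *path)
  values = path.inject([record]) do |current, step|
    current.flat_map { |item| [item.public_send(step)].flatten }
  end
  values.compact.join(" ")
end

craig = Person.new("Craig", [Photo.new("Beach"), Photo.new("Pancakes")], [])
pat   = Person.new("Pat", [Photo.new("Sunset")], [craig])

captions       = field_text(pat, :photos, :caption)
friends_photos = field_text(pat, :friends, :photos, :caption)
# captions is "Sunset"; friends_photos is "Beach Pancakes"
```

Every extra step in the path is another join at indexing time, which is where the slowdown mentioned above comes from.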

Gotchas

In case things aren’t working, here are some things to keep in mind:

  • Added an attribute, but can’t sort or filter by it? Have you reindexed and restarted Sphinx? It doesn’t automatically pick up these changes.
  • Sorting not working? If you’re specifying the attribute to sort by as a string, you’ll need to include the direction to sort by, just like with SQL: “birthday ASC”.
  • Using name or id columns in your fields or attributes? Make sure you specify them using symbols, as they’re core class methods in Ruby:

define_index do
  indexes :name

  # ...

  has photos(:id), :as => :photo_ids
end
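To see why bare names like name cause trouble, here is a tiny stand-in for an index builder. SketchBuilder is hypothetical and far simpler than the real thing: it turns any unknown bare word into a column name via method_missing. The catch is that name is not unknown, because every Ruby class already responds to it (in current Ruby versions only name misbehaves this way in the sketch, since the old Object#id is gone).

```ruby
# Hypothetical builder: unknown bare words become column names.
class SketchBuilder
  class << self
    def columns
      @columns ||= []
    end

    def indexes(column)
      columns << column
    end

    # Called only when Ruby cannot find a real method with that name.
    def method_missing(column, *args)
      args.empty? ? column : super
    end
  end
end

SketchBuilder.class_eval do
  indexes location   # no such method, so method_missing gives :location
  indexes name       # a real method exists: this is the class's own name
  indexes :name      # the symbol sidesteps the lookup entirely
end

SketchBuilder.columns
# => [:location, "SketchBuilder", :name]
```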

And Next?

I realise this article is pretty light on details – but if you want more information, the first stop should be the extended usage page on the Thinking Sphinx site, quickly followed by the documentation. There’s also an email list to ask questions on.

Comments

29 responses to this article

13 Jun 2008
Andrew Zielinski said:

Nice Introduction to Thinking Sphinx. Thanks!

13 Jun 2008
Benny said:

Can you give a quick rundown of why Thinking Sphinx is to be preferred over other Sphinx helpers like Ultrasphinx?

Which one has covered most of Sphinx’s features, etc.

13 Jun 2008
Marston A said:

Great article. I’m also curious if you’ve compared this to UltraSphinx and how they stack up?

13 Jun 2008
pat said:

Andrew: no problems.

Benny and Marston: Ultrasphinx is more complex – which is useful if you want to tweak Sphinx quite a bit. Thinking Sphinx’s goal is convention over configuration – although you can set most things to whatever you wish, it makes assumptions on file locations and such, so you can just start using Sphinx as quickly as possible.

Thinking Sphinx also allows you to set all the settings you wish through Ruby and YAML - so you don’t need to learn about the configuration syntax of Sphinx if you don’t wish to.

Beyond that, there are very few feature differences between the two (indeed, both use my Sphinx API called Riddle to talk to Sphinx, so there’s a level of commonality there).

As for other plugins – to the best of my knowledge, they’re not actively maintained, and don’t use an API that’s in line with the latest version of Sphinx.

18 Jun 2008
Justin said:

Currently using ferret but I think I will give this a try. Good info, Thanks.

25 Jun 2008
Matthew Bergman said:

Ultrasphinx might be more complex but I actually find it less useful in terms of complex searches. Took me forever to index children due to an error it has in concatenating.

26 Jun 2008
Marco Bergantin said:

Hi, cool work!
just a tip about grouping with acts_as_taggable_on_steroids:
Tagging.search_with_results(:conditions => {:taggable_type => 'Site'}, :group_by => 'tag_ids', :group_clause => '@count desc', :group_function => :attr)
search_with_results is the same as search, but it directly returns the results Hash with the @count attribute.

14 Jul 2008
Adrian said:

Will Thinking Sphinx work with multiple databases in Rails, ala hijacking connections on a per-request basis?

16 Jul 2008
Ahsan said:

Hi there,

I have a model with first_name and last_name, but thinking_sphinx chokes on this:
indexes [first_name, last_name], :as => :name, :sortable => true

And raises this: “Cannot define a field with no columns. Maybe you are trying to index a field with a reserved name (id, name). You can fix this error by using a symbol rather than a bare name (:id instead of id).”

I’m not sure why because my model doesn’t have any name attribute. I also tried changing :as => :name to :as => :something_else

Any ideas ?

16 Jul 2008
Ahsan said:

Ok, I realized what I was doing wrong:

I was doing: indexes [:first_name, :last_name]

No need for symbols. Still, the error was misleading.

16 Jul 2008
pat said:

Ahsan, you’re right, it’s a misleading error message – and ideally, TS should be fine with symbols anyway. Consider it a bug that needs fixing.

Thanks for the comments.

23 Jul 2008
Chris said:

After much headbashing with Sphinx and acts_as_sphinx I decided to try Thinking Sphinx instead. Glad I did. ‘tis marvellous!!

29 Jul 2008
michael said:

First of all… this plugin is great. I have tried it and it works very easily.

How do you set the indexes and start/restart this on a server you don’t maintain yourself?

30 Jul 2008
Peter Bengtson said:

There is a bug in the configuration file generator – it always assumes that Postgres listens on port 5432. You need to change the following:

config = <<-SOURCE

source #{model.indexes.first.name}_#{index}_core
{
type = #{db_adapter}
sql_host = #{database_conf[:host] || "localhost"}
sql_port = #{database_conf[:port]}
sql_user = #{database_conf[:username]}
sql_pass = #{database_conf[:password]}
sql_db = #{database_conf[:database]}

sql_query_pre = #{charset_type == "utf-8" && adapter == :mysql ? "SET NAMES utf8" : ""}
#{"sql_query_pre = SET SESSION group_concat_max_len = #{@options[:group_concat_max_len]}" if @options[:group_concat_max_len]}
sql_query_pre = #{to_sql_query_pre}
sql_query = #{to_sql.gsub(/\n/, ' ')}
sql_query_range = #{to_sql_query_range}
sql_query_info = #{to_sql_query_info}
#{attr_sources}
}
SOURCE

That is, it needs a sql_port line, as added above.

30 Jul 2008
pat said:

Michael: if you don’t have shell access, I’ve no idea… in that sort of situation, you might be out of luck, unfortunately.

Peter: Thanks for that! I’ll add it in when I get the chance. Cheers.

31 Jul 2008
Kristof said:

Pat,

Many thanks for your great work! Great solution for full-text search and absolutely my preferred way of accessing Sphinx.

I have a little question regarding indexing polymorphic associations. Can you shine a little light on how to specify these?

01 Aug 2008
pat said:

Hi Kristof

You can use polymorphic associations just as you would normal associations, when defining fields and indexes.

The main thing to keep in mind: TS looks at the database’s existing records to figure out what models are used in the polymorphic joins (and thus what columns are sourced from where). So make sure you have indexes on your _type columns (else indexing gets really slow).

Feel free to let me know if you have any troubles (or post to the list).

03 Aug 2008
Brian Johnson said:

I need to split my model into multiple indices by country. I have tried everything I can think of and have posted on the Sphinx mailing list, but for my needs, that seems to be the only viable solution. Is there a way to configure multiple indices, each with their own delta, and then search by main/delta pairs? Let’s say we have Products_US_Main/Products_US_Delta and Products_MX_Main/Products_MX_Delta and I want to search only in the MX indices. I have been looking through the code and I saw an index_weight field, but it’s not clear to me how you would select an index or set of indices, or even if you can declare more than one index for a model and name it. Thanks.

04 Aug 2008
pat said:

Hi Brian

Unfortunately Thinking Sphinx doesn’t work like that. Not sure if Ultrasphinx does either… with some hacking, it might be possible for TS to do this, though – define_index can be called multiple times, technically, but not sure how well it’d work – and you’d want to loop through calls for each country.

Otherwise, you might be best writing your own solution, and perhaps use Riddle for talking to Sphinx?

09 Aug 2008
Craig Ambrose said:

Hi Pat,

Do you have any info on the performance implications of delta indexing? I’ve got delta indexing turned on for some of my models, such as Product, and it seems to be slowing down edits to these objects pretty significantly. In particular it’s a bit of a pain the way it updates the index even if I only change data on the model which is not actually indexed by Sphinx. On top of that, some of my batch operations which operate on many Product records trigger a lot of small index updates. Do you think it would be better if I take the delta indexing off this and simply update the entire index more frequently? I think that some other Sphinx plugins use delta indexing, but the delta index update is still not done within the mongrel process, i.e. the delta index is refreshed every two minutes.

got any suggestions?

cheers,

Craig

09 Aug 2008
Brian Johnson said:

I am using Ultrasphinx right now, primarily because of the association_sql capabilities that give me more advanced control over the index queries, but I thought it would be easier to add that to Thinking Sphinx than to add multiple indexes to Ultrasphinx due to the way indexes are generated in Ultrasphinx. If it’s complicated in both, I may take a second look at Ultrasphinx because I’ve already implemented it in my application.

11 Aug 2008
Lang Riley said:

Long time user and contributor to ultrasphinx and just switched to thinking sphinx. Your code is clean and well organized. Looks like it will be much easier to do more advanced sphinx things, like search within results to n depth, aka guided navigation on facets with multi-values. Thanks!!

12 Aug 2008
Justin said:

Great plugin, much better than acts_as_ferret. One thing though: I have some products that have a lot of different sizes and I need to group those products, but I’ve tried with no success. Could someone tell me how? Thanks

12 Aug 2008
justin said:

ner mind..

14 Aug 2008
pat said:

Craig: there are plans to shift delta indexing out to some messaging service. Not sure when that’ll happen though. For bulk updates, see this solution.

Brian: Not quite sure what you’re asking. The SQL for indexes is generated automatically by Thinking Sphinx, using ActiveRecord’s underlying classes.

Lang: Great to hear – although we are lacking some of those features in TS (at least, the core branch – a few people have their own facets implementations figured out).

Justin: I assume you got the problem sorted?

Don’t forget about the mailing list for any other questions.

13 Sep 2008
Dev Singh said:

I am trying to setup wildcard matching in TS.

Have created a sphinx.yml file with the following:

development:
  enable-star: true
  min_prefix_len: 4
  min_infix_len: 4

(also tried various variations with allow_star: true)

The development.sphinx.conf file was generated successfully
However, I am unable to get searches like:

User.search "pat*"

to work. It returns an empty array. Is this syntax incorrect? Is "*" not what we use for a wildcard match in a Sphinx query?

Using Sphinx latest 0.9.8

13 Sep 2008
pat said:

Hi Dev

What you’ve done is correct – although I’m not certain if you can have both min prefix and min infix set. Have you restarted the Sphinx daemon?

16 Sep 2008
franee said:

Hi,

Is there a way to search by specifying specific models to be searched?

thanks!

16 Sep 2008
pat said:

franee: If you want to use just a single model, ModelClass.search. If you want to search across several specific models, use ThinkingSphinx::Search.search "text", :classes => [ModelClassOne, ModelClassTwo]


About Freelancing Gods

Freelancing Gods is written by Pat, who works on the web as a web developer in Melbourne, Australia, specialising in Ruby on Rails.

In case you're wondering what the likely content here will be about (besides code), keep in mind that Pat is passionate about the internet, music, politics, comedy, bringing people together, and making a difference. And pancakes.

His ego isn't as bad as you may think. Honest.

Here's more than you ever wanted to know.


All original content on this site is available through a Creative Commons by-nc-sa licence.