Freelancing Gods 2014

God
10 Apr 2008

Thinking Sphinx Reborn

So, over the last month or so I’ve been working hard on rewriting Thinking Sphinx – and it’s now time to release those changes publicly. The site’s now got a brief quickstart page and a detailed usage page beyond the rdoc files, and there will be more added over the coming weeks.

A quick overview of what’s shiny and new:

Better index definition syntax

This part was reworked many times, and I’ve finally landed on something I’m pretty happy with:

define_index do
  indexes [first_name, last_name], :as => :name, :sortable => true
  indexes email, location
  indexes [posts.content, posts.subject], :as => :posts
end

Polymorphic association support in indexes

When you’re drilling down into your associations for relevant field data, it’s now safe to use polymorphic associations – Thinking Sphinx is pretty smart about figuring out which models to look at. Make sure you put table indexes on your _type columns though.

MVA Support

Multi-Value Attributes now work nice and cleanly – so you can tie an array of integers to any record.

Multi-Model Searching

Just like before, you can search for records of a specific model. This time around though, you can also search across all your models – and the results still use will_paginate if it’s installed.

ThinkingSphinx::Search.search "help"

Better Filter Support

It was kinda in there to start with, but now it’s much smarter – and it all goes into the conditions hash, just like a find call:

User.search :conditions => {:role_id => 5}
Article.search :conditions => {:author_ids => [12, 24, 48]}

Sorting by Fields

As you may have noticed in the first code block of this post, you can mark fields as :sortable. Sphinx sorts by attributes rather than fields, so Thinking Sphinx creates a matching string attribute that acts as a sort index for the field. When specifying the search options though, you can just use the field’s name – Thinking Sphinx knows what you’re talking about.

User.search "John", :order => :name
User.search "Smith", :order => "name DESC"

Even More

I’m so eager to share this new release that there are probably a few things that need a bit more documentation – that will appear both on the Thinking Sphinx site and here on the blog. I’m planning on writing some articles that provide a solid overview of Sphinx in general – which will hopefully be of some help no matter what plugin you use – and then dive into some regular ‘recipes’ of Thinking Sphinx usage, and some detailed posts on the cool new features as well.

Also in the pipeline is Merb support – just for ActiveRecord initially, but I’d love to get it talking to DataMapper as well.

Update: Jonathan Conway’s got a branch working in Merb and Rails – needless to say, I’ll be updating trunk with his patch as soon as possible.

Comments

44 responses to this article

10 Apr 2008
Chris Parker said:

I really love this plugin. It makes indexing a breeze and is much easier to use than the alternatives.

Thanks a lot!

10 Apr 2008
Thuva said:

Pat, congratulations on releasing a new version of Thinking Sphinx. I’m looking forward to using it in my current project.

10 Apr 2008
Chris Parker said:

Could you give the option for DISTINCT in GROUP_CONCAT?

For example:

CAST AS CHAR)

10 Apr 2008
Chris Parker said:

oops:

@ CAST AS CHAR)@

10 Apr 2008
pat said:

Thuva: Thank you :)

Chris: Great that you’re enjoying the plugin. Why do you need the distinct though? I think I know a pretty easy way for me to add it in, I’m just not (yet) sure why it’s needed.

11 Apr 2008
Rob Olson said:

Does Thinking Sphinx support eager loading of associated models? Similar to the :include option in ActiveRecord find calls?

11 Apr 2008
pat said:

Rob: Yes, if you specify :include in your search calls, it will be respected.

The one caveat – since Sphinx returns an array of ids, each model has to be loaded in a separate call to the database, so the full speed benefits of :include aren’t quite reached.

ie: Think of it like User.find(:first, :include => [:posts]) instead of User.find(:all, :include => [:posts])
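
To make the caveat concrete, here is a rough sketch in plain Ruby with no database or Sphinx involved. FakeTable, find_one and find_many are made-up names that only count how many lookups happen; they are not Thinking Sphinx or ActiveRecord methods.

```ruby
# Toy stand-in for a database table; it only counts how many queries are issued.
class FakeTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows          # { id => record }
    @query_count = 0
  end

  # One query per id, like calling find(id) once for every search result.
  def find_one(id)
    @query_count += 1
    @rows.fetch(id)
  end

  # One query for the whole batch of ids.
  def find_many(ids)
    @query_count += 1
    ids.map { |id| @rows.fetch(id) }
  end
end

sphinx_ids = [3, 1, 2]   # hypothetical ids, in the order a search returned them
rows = { 1 => "Alice", 2 => "Bob", 3 => "Carol" }

per_record = FakeTable.new(rows)
one_by_one = sphinx_ids.map { |id| per_record.find_one(id) }

batched = FakeTable.new(rows)
all_at_once = batched.find_many(sphinx_ids)

puts per_record.query_count  # 3 queries, one per result
puts batched.query_count     # 1 query for the whole page
```

Both approaches return the same records in the same order; the difference is only in how many round trips are made, which is where an :include loses some of its benefit.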

11 Apr 2008
Oliver Beddows said:

I’m loving this plugin! Most excellent work!

I have one “query” related to searching a selected group of indexes.

It would be great if you could supply a list of class names (like UltraSphinx does).

For example:
ThinkingSphinx::Search.search('query', { :class_names => ['Article', 'Product', 'Customer'] })

Looking at ThinkingSphinx::Search it seems to be capable of handling one index or all indexes, but not a selected group of indexes.

Is this a feature you have planned or thought about? Just curious… ;o)

12 Apr 2008
pat said:

Oliver: It’s not something I’d ever thought of – but easy enough to add. I’ll see if I can get it working over the next few days.

Thanks for the feedback :)

21 Apr 2008
Paul Goscicki said:

I’ve been reading about Thinking Sphinx a little and so far it looks promising (especially as a Ferret replacement). I’d suggest you write a comprehensive tutorial on Thinking Sphinx usage in Rails. Similar to the one Rails Envy did for Ferret (with highlighting, field boosting, pagination, etc).

21 Apr 2008
pat said:

Hi Paul – writing some detailed tutorials is definitely on my list of things to do. I’m hoping to start with a solid introduction to Sphinx first (by this weekend would be nice), and then I’ll move on to how to use Thinking Sphinx.

24 Apr 2008
Brian Johnson said:

This looks very interesting. I have narrowed down my options to Thinking Sphinx and Ultrasphinx. Does Ultrasphinx still have more features with this latest release? If so, what does it have that Thinking Sphinx doesn’t? In particular, faceting is very important for my application.

24 Apr 2008
pat said:

Hi Brian

At this stage, if you need faceting, go with Ultrasphinx. I know Patrick Lenz is working on a patch for faceting with Thinking Sphinx (check out his git tree on GitHub)… I don’t have the time to focus on it myself at the moment, so until he’s done, it looks like Ultrasphinx is what suits your needs best.

Beyond that though, I think the two plugins are pretty close in regards to features.

24 Apr 2008
Brian Johnson said:

My needs for faceting are still a few months out, so something in the works might be ok. There must be some fundamental difference between Ultrasphinx and Thinking Sphinx, otherwise there would be no need for multiple plugins. What are the two approaches, and what are the benefits and problems of each? I’d love to have the time to simply try both of them out and learn for myself, but unfortunately I have too many other things on my plate right now. A brief explanation would be much appreciated. Thanks.

25 Apr 2008
pat said:

I feel there are two main differences – the first is how you define indexes. I think my syntax is clean and obvious, and also allows you to get data through associations very easily.

The second is approach – I aim to have everything managed through Ruby and a bit of YAML. That is, you don’t need to edit the Sphinx configuration file yourself – the plugin sets it all up for you. I also aim for a very convention-over-configuration setup – if you don’t want to, you don’t need to worry about file paths – it puts it all in your application’s directory, and handles multiple environments fine.

Caveat with all this: obvious bias, plus I haven’t spent any time using Ultrasphinx for a good 8 months or so.

28 Apr 2008
Björn Andreasson said:

Hi. Just tried your Sphinx plugin (third plugin this last week) and stumbled upon some problems.
1. Your plugin seems to be MySQL only? Found a “mysql” in plugins/thinking_sphinx/lib/thinking_sphinx/configuration.rb and replaced it with pgsql to make it work with my database.
2. … however, the SQL string that gets generated won’t work. Postgres doesn’t like your “´” and when deleting those, it gives me the error that tbl_archive.collect doesn’t exist. Which is true. Why does it try to get a column that doesn’t exist? What is “collect”?

Would love to get some answers :)

01 May 2008
Cheri A said:

Does Eager Loading work with Thinking Sphinx?

02 May 2008
pat said:

Cheri – to some extent it does – see the comments above between myself and Rob Olson. I do have plans to support it fully though – hopefully they’ll be implemented in the next week or two.

02 May 2008
Cheri A said:

Great plugin! Do all the indexing features of Thinking Sphinx negate the need to add indices to my tables in sql?

03 May 2008
pat said:

Hi Cheri – no, you still need indexes in your database tables. Sphinx uses MySQL to extract the data, and indexes on your tables make those queries faster.

Also – there are still plenty of times when you’ll need to use normal find calls. Sphinx doesn’t replace that.

04 May 2008
Cheri A said:

Thanks, Pat. So if I have a simple request, such as selecting an entry based on a primary id, it’s better to use find_by_sql and not sphinx, correct?

04 May 2008
pat said:

Cheri: yup – or even just find(id), if it’s that simple :)

12 Jul 2008
Artur Jedlinski said:

If you would like to use thinking_sphinx as Windows service (which is much more comfortable than typing “rake ts:start” again and again), you may find yourself interested in:

http://www.jedlinski.pl/blog/2008/07/12/thinking-sphinx-as-windows-service/

21 Sep 2008
mauro catenacci said:

Hi, I would like to know if I can index data from models that are not directly associated with the model where you specify the define_index, i.e.
in my model I have
belongs_to :imageasset
belongs_to :raider

now imageasset has associations of its own and I need to get them in here somehow. Is that possible?

22 Sep 2008
pat said:

Hi Mauro

You should be able to drill through your associations’ associations, like so:

indexes imageasset.creator.login

Cheers

23 Oct 2008
joren said:

hi,

i’m really loving this plugin but I stumbled on some small issues.
Say I have a list of names, john, jon, jonn,…
and I want to search for all names where “jo” comes in.

Is this supported, or do I have to edit some setting or is there a specific way to search?

thx

23 Oct 2008
pat said:

Hi Joren

It is supported once you add a couple of settings to config/sphinx.yml, for each environment (like you would in config/database.yml): set enable_star to true, and min_prefix_len to 1 (or longer, if you only want prefixes of 2 characters or more?). Stop Sphinx, reindex, restart Sphinx, and then try searching for “jo*”.
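
As a sketch of what that file might look like, the snippet below embeds a hypothetical config/sphinx.yml in a Ruby string and parses it. Only enable_star and min_prefix_len come from the reply above; the environment names are assumed to be the usual Rails ones.

```ruby
require "yaml"

# Hypothetical contents of config/sphinx.yml. The two settings are repeated
# per environment, in the same way config/database.yml is laid out.
sphinx_yml = <<~YAML
  development:
    enable_star: true
    min_prefix_len: 1
  production:
    enable_star: true
    min_prefix_len: 1
YAML

config = YAML.load(sphinx_yml)

config.each do |environment, settings|
  puts "#{environment}: enable_star=#{settings['enable_star']}, " \
       "min_prefix_len=#{settings['min_prefix_len']}"
end
```

With settings like these in place, and after the stop, reindex and restart steps, a query such as "jo*" should match names starting with "jo".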

24 Oct 2008
joren said:

thx,

Peepcode published a pdf today about thinking sphinx. So that can be a help too.
Got it this morning, and there is some good information in it about how it works and about more settings.
http://peepcode.com/products/thinking-sphinx-pdf

24 Oct 2008
joren said:

ok, didn’t see till now, you’ve already blogged about peepcode :)

24 Oct 2008
joren said:

shouldn’t it be useful to only reindex the new values and the values that were updated since the last index?
‘cause I don’t really like the extra field you need to use the delta functionality.

24 Oct 2008
pat said:

Joren: not sure what you mean – the delta index is only used for new/updated records.

24 Oct 2008
joren said:

we have a lot of records and don’t want even more writing in our db when we work with delta.

so every time we index we want to check only these records where the updated_at has changed since the previous time we indexed.

if we index every 15min, only reindex every record where updated_at is 15 minutes ago.
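
The idea in this comment can be sketched in plain Ruby. This is not Thinking Sphinx code: Record and changed_since are made-up names, and the sketch only shows which records a timestamp-based reindex would pick up.

```ruby
Record = Struct.new(:id, :updated_at)

# Returns the records changed since the previous indexing run.
def changed_since(records, last_indexed_at)
  records.select { |record| record.updated_at > last_indexed_at }
end

now = Time.utc(2008, 10, 24, 12, 0, 0)
last_indexed_at = now - 15 * 60   # previous run, 15 minutes earlier

records = [
  Record.new(1, now - 60 * 60),   # untouched for an hour
  Record.new(2, now - 5 * 60),    # edited 5 minutes ago
  Record.new(3, now - 14 * 60)    # edited 14 minutes ago
]

stale = changed_since(records, last_indexed_at)
puts stale.map(&:id).inspect   # [2, 3]
```

Only records 2 and 3 would be reindexed, and no extra column is written to when a record changes.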

24 Oct 2008
pat said:

Joren: At this stage, there’s nothing like that in the official Thinking Sphinx – although I’d like to have it as a feature at some point.

For the moment though, have a look at Ed Hickey’s fork.

28 Oct 2008
joren said:

thx, this is the kind of stuff I needed.

great work for both of you

16 Nov 2008
scott said:

Hi there,

I am an avid ultrasphinx user and stumbled on your plugin recently. I have yet to try it but plan on doing so soon.

Question for you – does thinking sphinx generate a single sphinx index, that contains multiple models (as ultrasphinx does), or does it actually create a separate index and source for each Active Record model?

IE – in ultrasphinx, you get one index (besides delta), ‘main’, and in main there is fields added like class and class_id, so at the end of the configuration and indexing, you have one gigantic index, that contains all your data.

To make sphinx and ultrasphinx work for our project we have had to hand modify the sphinx.conf file, and manually specify to sphinx which indexes to search. We have 1.2 billion rows we are searching, so having things in a single index is not efficient.

So, in asking you this I am just curious your approach on generating the actual index(es), as if ThinkingSphinx keeps them all separated per model, this would make deployment and maintenance much less for us.

Thank you very much for your hard work!

19 Nov 2008
pat said:

Hi Scott

Thinking Sphinx generates an index per model – indeed, it actually generates two – user_core and user. The latter is a distributed index, used as a way to access user_core and user_delta (if deltas are enabled) in one single query.

Hopefully this separation makes life easier for you.

Cheers
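
The naming scheme described in this reply can be sketched in plain Ruby. The index_names helper is hypothetical and not part of Thinking Sphinx; it only mirrors the user_core, user_delta and user names mentioned above, and it uses a simple downcase where the plugin may derive names differently.

```ruby
# Index names for one model: "<model>_core", optionally "<model>_delta",
# and a distributed index named after the model that spans the local ones.
def index_names(model_name, deltas: false)
  base = model_name.downcase
  local = ["#{base}_core"]
  local << "#{base}_delta" if deltas
  { distributed: base, local: local }
end

p index_names("User")
p index_names("User", deltas: true)
```

A search against the distributed name covers every local index listed for that model, which is why core and delta results arrive in a single query.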

15 Dec 2008
Canvas said:

Hi,

I have a very complicated view to search data from. The view joins 6 tables, including inner and outer joins. Triggers are used to update data to help the join when data in the 6 tables is updated. As a result, the view is very complicated.

I used thinking_sphinx to build an index and searched on the view, and it worked pretty nicely. But since it’s a view, I cannot add a “delta” column to it, and so I cannot use the delta index. But I really need to use the delta index as my data updates quite often.

The solution I can think of now is to use a table to replace the view, and update data into the new table when any of the 6 related tables are updated. Therefore I have to extract the logic in the view and triggers and implement it somewhere. The side effect of this solution is that synchronizing the data in the new table with the other six is quite messy.

Any suggestion will be appreciated. Thank you very much.

16 Dec 2008
Pat said:

Hi Canvas

It’s best to post your question to the google group so we can get more minds thinking about the best approach.

17 Dec 2008
Canvas said:

Thank you Pat. I just posted my question to the google group.

18 Mar 2009
joren said:

Hi,

Is there any reason why I only can search for the first 3 characters of an id?

I have this id 123456, when I search for ‘123*’ I get a result, but when I search for ‘1234*’ or the complete id, my result is empty, without any error output in any logs.

Is this a setting or an option I can adjust in my config?
thanks

23 Mar 2009
pat said:

Joren: I’m not sure what the problem is – perhaps it’s worth discussing this on the google group?

06 Jun 2009
Alex said:

indexes email, location

My instinct says this is gross. Why use method_missing instead of a symbol? Are you saving the : character or is there some reason for that?

07 Jun 2009
pat said:

Alex: I’m using method missing because it makes more sense when defining fields and attributes – see the posts example in the line below what you’ve highlighted.

I agree it’s not brilliant, but it helps to keep the index definition clean.
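
To illustrate the trade-off, here is a toy index builder in plain Ruby. ToyIndex and ColumnPath are made-up classes and this is an assumption about the general technique, not how Thinking Sphinx is actually implemented. It shows how method_missing lets posts.content read as a chain, which a single symbol cannot express.

```ruby
# Records a chain of method calls such as posts.content as [:posts, :content].
class ColumnPath
  attr_reader :path

  def initialize(path)
    @path = path
  end

  def method_missing(name, *args)
    return super if name.to_s.start_with?("to_")  # leave implicit conversions alone
    ColumnPath.new(path + [name])
  end

  def respond_to_missing?(name, include_private = false)
    !name.to_s.start_with?("to_") || super
  end
end

# Toy index builder: unknown bare names inside the block become ColumnPaths.
class ToyIndex
  attr_reader :fields

  def initialize(&block)
    @fields = []
    instance_eval(&block)
  end

  def indexes(*columns)
    options = columns.last.is_a?(Hash) ? columns.pop : {}
    paths = columns.flat_map { |column| column.is_a?(Array) ? column : [column] }
    @fields << { :paths => paths.map(&:path), :as => options[:as] }
  end

  def method_missing(name, *args)
    return super if name.to_s.start_with?("to_")
    ColumnPath.new([name])
  end

  def respond_to_missing?(name, include_private = false)
    !name.to_s.start_with?("to_") || super
  end
end

index = ToyIndex.new do
  indexes [first_name, last_name], :as => :name
  indexes email, location
  indexes [posts.content, posts.subject], :as => :posts
end

index.fields.each { |field| p field }
```

With symbols, the association columns would need some other notation, such as nested arrays or strings, so the bare-word form keeps plain columns and association columns looking the same.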

23 Sep 2009
wxianfeng said:

it is easier to use than ultrasphinx………


About Freelancing Gods

Freelancing Gods is written by Pat, who works on the web as a web developer in Melbourne, Australia, specialising in Ruby on Rails.

In case you're wondering what the likely content here will be about (besides code), keep in mind that Pat is passionate about the internet, music, politics, comedy, bringing people together, and making a difference. And pancakes.

His ego isn't as bad as you may think. Honest.

Here's more than you ever wanted to know.


All original content on this site is available through a Creative Commons by-nc-sa licence.