Freelancing Gods 2008

21 May 2008

Sphinx + Rails + PostgreSQL

In case you’ve not been watching every commit carefully flow through Thinking Sphinx on GitHub – PostgreSQL support has been added. I’ve done a little bit of testing, and I’ve had some excellent support from Björn Andreasson and Tim Riley, so I feel it’s ready for people to start kicking the tires.

I’m no PostgreSQL expert – I definitely fall into the n00b category – so if you think there are better ways to do what I’m doing, I’d love to hear them.

11 May 2008

Updates for Thinking Sphinx

I’ve been working away on Thinking Sphinx when I’ve had the time – and we’re nearing what I feel is a solid 1.0 release. I say we, because I’ve had some very generous people supply patches over the last few weeks – namely Arthur Zapparoli, James Healy, Chris Heald and Jae-Jun Hwang. Switching to git – and GitHub in particular – has made it very easy to accept patches.

Mind you, all of these changes aren’t committed just yet – and even when they are, there’ll still be a few more things to cross off the list before we hit the big 1.0 milestone, namely PostgreSQL support and solid spec coverage. Slowly edging closer.

In other news – to help share the Thinking Sphinx knowledge (after some prompting by a few users of the plugin), I’ve created a Google Group for it. This will be the ideal place to ask questions about how to integrate Sphinx into your app, suggest features, or report bugs.

If you’ve been pondering how to deploy Thinking Sphinx via Capistrano, I recommend you read a blog post by Wade Winningham – or if you’re interested in better ways of handling UTF-8 characters outside of the ‘normal’ set (ie: without accents and so forth), make sure you peruse James Healy’s solution.

And one last reminder – if you’re in Sydney on Wednesday evening and interested in learning a bit more about Sphinx in general and Thinking Sphinx in particular, come along to the monthly Ruby meet at the Crown Hotel in Surry Hills, as those are the topics I’ll be presenting on.

26 Apr 2008

Sphinx: A Primer

On Thursday night I presented to the Melbourne Ruby Group about Sphinx – first with a non-Ruby perspective, and then using Ruby, and more specifically Rails. I’ll be presenting again at the Sydney group in a couple of weeks, but I am also adapting the talk to a few blog posts – to allow a bit more detail in a few doses.

First up: Sphinx itself. Why should you read this? Because understanding Sphinx will help you make smarter use of whichever library you pick (Ruby or otherwise). It might also teach you some things you had no idea about (ie: this is the article I should have read when I started using Sphinx).

What is Sphinx?

Sphinx is a search engine. You feed it documents, each with a unique identifier and a bunch of text, and then you can send it search terms, and it will tell you the most relevant documents that match them. If you’re familiar with Lucene, Ferret or Solr, it’s pretty similar to those systems. You get the daemon running, your data indexed, and then using a client of some sort, start searching.

When indexing your data, Sphinx talks directly to your data source itself – which must be one of MySQL, PostgreSQL, or XML files – which means it can be very fast to index (if your SQL statements aren’t too complex, anyway).
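
To make the "feed it documents, then send it search terms" idea concrete, here’s a rough sketch in plain Ruby. It is only a toy: the ToyIndex class and its methods are made up for this post, it doesn’t rank results by relevance, and it says nothing about how Sphinx is actually implemented.

```ruby
# Toy illustration only: this is not how Sphinx works internally.
class ToyIndex
  def initialize
    # Maps each lower-cased word to the ids of the documents containing it.
    @postings = Hash.new { |hash, word| hash[word] = [] }
  end

  # "Feed" a document: a unique integer id plus a bunch of text.
  def add(id, text)
    text.downcase.scan(/\w+/).uniq.each { |word| @postings[word] << id }
  end

  # Return the ids of the documents that contain every search term.
  def search(query)
    terms = query.downcase.scan(/\w+/)
    return [] if terms.empty?
    terms.map { |term| @postings.fetch(term, []) }.inject(:&).sort
  end
end

index = ToyIndex.new
index.add(1, "Sphinx is a search engine")
index.add(2, "Ferret is a search library for Ruby")
index.add(3, "Pancakes are delicious")

p index.search("search")       # => [1, 2]
p index.search("search ruby")  # => [2]
```

Note that the search only ever hands back document ids. It’s up to the client to turn those ids back into something useful.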

Sphinx Structure

A Sphinx daemon (the process known as searchd) can talk to a collection of indexes, and each index can have a collection of sources. Sphinx can be directed to search a specific index, or all of them, but you can’t limit the search to a specific source explicitly.

Each source tracks a set of documents, and each document is made up of fields and attributes. While in other areas of software you could use those two terms interchangeably, they have distinct meanings in Sphinx (and thus require their own sections in this post).

Fields

Fields are the content for your search queries – so if you want words tied to a specific document, you’d better make sure they’re in a field in your source. Fields only hold string data – you could have numbers and dates and such in your fields, but Sphinx will only treat them as strings, nothing else.

Attributes

Attributes are used for sorting, filtering and grouping your search results. Sphinx ignores their values when matching search terms, though, and they’re limited to the following data types: integers, floats, datetimes (as Unix timestamps – and thus integers anyway), booleans, and strings. Take note that string attributes are converted to ordinal integers, which is especially useful for sorting, but not much else.
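
If the ordinal conversion sounds a bit abstract, this small sketch shows the general idea. The string_ordinals helper is hypothetical, plain Ruby string comparison stands in for whatever collation Sphinx uses, and starting the numbering at 1 is an arbitrary choice on my part.

```ruby
# Give each distinct string its position in sorted order (an ordinal).
# Assumption: Ruby's default string comparison stands in for Sphinx's collation.
def string_ordinals(values)
  ordinals = {}
  values.uniq.sort.each_with_index do |value, position|
    ordinals[value] = position + 1
  end
  ordinals
end

names    = ["Smith", "Allan", "Riley", "Allan"]
ordinals = string_ordinals(names)

p ordinals
# => {"Allan"=>1, "Riley"=>2, "Smith"=>3}

# Sorting by ordinal gives the same order as sorting by the string itself...
p names.sort_by { |name| ordinals[name] }
# => ["Allan", "Allan", "Riley", "Smith"]

# ...but the integer 3 tells you nothing about the text "Smith", which is why
# ordinals aren't much use for anything other than sorting.
```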

Multi-Value Attributes

There is also support in Sphinx to handle arrays of attributes for a single document – which go by the name of multi-value attributes. Currently (Sphinx version 0.9.8rc2) only integers are supported, so this isn’t quite as flexible as normal attributes, but it’s worth keeping in mind.

Filters

Filters are useful with attributes to limit your searches to certain sets of results – for example, limiting a forum post search to entries by a specific user id. Sphinx’s filters accept arrays or ranges – so if filtering by a single value, just put that in an array. The range filters are particularly useful for getting results from a certain time span.
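
Here’s a toy model of that behaviour in plain Ruby, including a multi-value attribute. Everything in it is an assumption made for illustration: the helper names, the attribute names (user_id, created_at, tag_ids), and the documents themselves. It doesn’t talk to Sphinx at all.

```ruby
# Toy model of attribute filters; documents are plain hashes of attributes.

# Filters take arrays or ranges, so a single value gets wrapped in an array.
def normalise_filter(value)
  case value
  when Range, Array then value
  else [value]
  end
end

# A document passes a filter when its attribute value is in the array or range.
# A multi-value attribute (an array of integers) passes when any value matches.
def passes?(document, attribute, filter)
  allowed = normalise_filter(filter)
  Array(document[attribute]).any? { |value| allowed.include?(value) }
end

# Keep only the documents that pass every filter.
def apply_filters(documents, filters)
  documents.select do |document|
    filters.all? { |attribute, filter| passes?(document, attribute, filter) }
  end
end

posts = [
  { :id => 1, :user_id => 5, :created_at => Time.utc(2008, 4, 10).to_i, :tag_ids => [1, 2] },
  { :id => 2, :user_id => 7, :created_at => Time.utc(2008, 4, 20).to_i, :tag_ids => [3] },
  { :id => 3, :user_id => 5, :created_at => Time.utc(2008, 5, 2).to_i,  :tag_ids => [2, 3] }
]
april = Time.utc(2008, 4, 1).to_i..Time.utc(2008, 4, 30).to_i

p apply_filters(posts, { :user_id => 5 }).map { |post| post[:id] }
# => [1, 3]
p apply_filters(posts, { :user_id => 5, :created_at => april }).map { |post| post[:id] }
# => [1]
p apply_filters(posts, { :tag_ids => [3] }).map { |post| post[:id] }
# => [2, 3]
```

The range filter works on timestamps because they’re stored as plain integers.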

Relevance

Relevance is the default sort order for Sphinx. I’ve no idea exactly how it is calculated, but there are a couple of things you can do easily enough in your queries to influence it. The first is index-level weighting, where you give specific indexes higher rankings than others. The other, similar in nature but at a lower level, is field weighting. Generally these weights are set before each query, but it will depend on the library you use.
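
To show what field weighting does to an ordering, here’s a deliberately naive sketch. It is not Sphinx’s relevance calculation (which, as I said, I can’t describe): the toy_score formula, the rank helper and the sample articles are all invented for illustration.

```ruby
# Toy scoring only: count query-term hits per field, multiplied by a weight.
def toy_score(document, query, field_weights = {})
  terms = query.downcase.scan(/\w+/)
  document.inject(0) do |score, (field, text)|
    words  = text.downcase.scan(/\w+/)
    hits   = terms.inject(0) { |count, term| count + words.count(term) }
    weight = field_weights.fetch(field, 1) # unweighted fields count once
    score + hits * weight
  end
end

# Order document ids from highest to lowest toy score.
def rank(documents, query, field_weights = {})
  documents.keys.sort_by { |id| -toy_score(documents[id], query, field_weights) }
end

articles = {
  1 => { :subject => "Sphinx primer", :content => "An introduction to full-text search" },
  2 => { :subject => "Weekend notes", :content => "Played with Sphinx and more Sphinx" }
}

p rank(articles, "sphinx")                     # => [2, 1]
p rank(articles, "sphinx", { :subject => 10 }) # => [1, 2]
```

With no weights, the article that mentions Sphinx twice in its content wins. Weight the subject field heavily and the article with Sphinx in its subject jumps ahead.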

Keeping Your Indexes Updated

One thing that sets Sphinx apart from Ferret and other search engines is that there is no way to update fields for a specific document in your indexes. The main approach around this is having delta indexes – a small index with all the recent changes (which will be super-fast to index), so Sphinx will include that and the main index for its searches. Of the Rails plugins, both Thinking Sphinx and Ultrasphinx have support for this – I’ve no idea for other languages, mind you.
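
A rough sketch of the main-plus-delta idea, again in plain Ruby with made-up helpers. One assumption worth flagging loudly: in this toy, a document that appears in the delta index hides its stale copy in the main index. How a real setup deals with stale entries depends on the library.

```ruby
# Toy model: an "index" is just a hash of document id => indexed text.
def build_index(records)
  records.each_with_object({}) do |(id, record), index|
    index[id] = record[:text].downcase
  end
end

# Ids of the documents whose text contains the term (substring match only).
def matches(index, term)
  index.select { |_id, text| text.include?(term.downcase) }.keys
end

# Search both indexes together. Assumption for this sketch: if a document is
# in the delta index, its stale copy in the main index is ignored.
def search_main_and_delta(main, delta, term)
  fresh_main = main.reject { |id, _text| delta.key?(id) }
  (matches(fresh_main, term) + matches(delta, term)).sort
end

records = {
  1 => { :text => "Sphinx primer",   :delta => false },
  2 => { :text => "Pancake recipes", :delta => false }
}
main = build_index(records) # the big, slow, occasional index run

# Later on, record 2 is edited and record 3 is created.
records[2] = { :text => "Sphinx recipes",    :delta => true }
records[3] = { :text => "More Sphinx notes", :delta => true }

# The delta index only covers the flagged records, so it's quick to rebuild.
delta = build_index(records.select { |_id, record| record[:delta] })

p search_main_and_delta(main, delta, "sphinx")  # => [1, 2, 3]
p search_main_and_delta(main, delta, "pancake") # => []
```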

What’s next?

Next is when we’ll dive into some actual code – we’ll go through some of the common tasks for setting up Sphinx with Rails using Thinking Sphinx.

10 Apr 2008

Thinking Sphinx Reborn

So, over the last month or so I’ve been working hard on rewriting Thinking Sphinx – and it’s now time to release those changes publicly. The site’s now got a brief quickstart page and a detailed usage page beyond the rdoc files, and there will be more added over the coming weeks.

A quick overview of what’s shiny and new:

Better index definition syntax

This part was reworked many times, finally arriving at something I’m pretty happy with:

define_index do
  indexes [first_name, last_name], :as => :name, :sortable => true
  indexes email, location
  indexes [posts.content, posts.subject], :as => :posts
end

Polymorphic association support in indexes

When you’re drilling down into your associations for relevant field data, it’s now safe to use polymorphic associations – Thinking Sphinx is pretty smart about figuring what models to look at. Make sure you put table indexes on your _type columns though.

MVA Support

Multi-Value Attributes now work nice and cleanly – so you can tie an array of integers to any record.

Multi-Model Searching

Just like before, you can search for records of a specific model. This time around though, you can also search across all your models – and the results still use will_paginate if it’s installed.

ThinkingSphinx::Search.search "help"

Better Filter Support

It was kinda in there to start with, but now it’s much smarter – and it all goes into the conditions hash, just like a find call:

User.search :conditions => {:role_id => 5}
Article.search :conditions => {:author_ids => [12, 24, 48]}

Sorting by Fields

As you may have noticed in the first code block of this post, you can mark fields as :sortable. Behind the scenes this uses Sphinx’s string attributes: Thinking Sphinx creates a matching attribute that acts as a sort index for the field. When specifying the search options though, you can just use the field’s name – Thinking Sphinx knows what you’re talking about.

User.search "John", :order => :name
User.search "Smith", :order => "name DESC"
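
For the curious, this is roughly the kind of translation that has to happen somewhere between your :order option and Sphinx. It’s a sketch and not the plugin’s actual code: translate_order is a hypothetical helper, and the "_sort" suffix for the companion attribute is an assumption made for the example.

```ruby
# Hypothetical helper: rewrite an :order option so that sortable fields point
# at their companion sort attributes. The "_sort" suffix is an assumption.
def translate_order(order, sortable_fields)
  order.to_s.split(",").map do |clause|
    column, direction = clause.strip.split(/\s+/, 2)
    column = "#{column}_sort" if sortable_fields.include?(column.to_sym)
    [column, direction || "ASC"].join(" ")
  end.join(", ")
end

p translate_order(:name, [:name])
# => "name_sort ASC"
p translate_order("name DESC", [:name])
# => "name_sort DESC"
p translate_order("created_at DESC, name", [:name])
# => "created_at DESC, name_sort ASC"
```

Real attributes such as created_at pass straight through, since Sphinx can already sort by them.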

Even More

I’m so eager to share this new release that there are probably a few things that need a bit more documentation – that will appear both on the Thinking Sphinx site and here on the blog. I’m planning on writing some articles that provide a solid overview of Sphinx in general – which will hopefully be some help no matter what plugin you use – and then dive into some regular ‘recipes’ of Thinking Sphinx usage, and some detailed posts of the cool new features as well.

Also in the pipeline is Merb support – just for ActiveRecord initially, but I’d love to get it talking to DataMapper as well.

Update: Jonathan Conway’s got a branch working in Merb and Rails – needless to say, I’ll be updating trunk with his patch as soon as possible.

14 Mar 2008

Sphinx 0.9.8-rc1 Updates

Another small sphinx-related post.

In line with the first release candidate of Sphinx 0.9.8 last week, I’ve updated both my API, Riddle, and my plugin, Thinking Sphinx, to support it. Also, for those inclined, you can now get Riddle as a gem.

I’m slowly making progress on some major changes to Thinking Sphinx, so hopefully I’ll have something cool to show people soon. Oh, but some features that aren’t reflected in the documentation: most of Sphinx’s search options can be passed through when you call Model.search – including :group_by, :group_function, :field_weights, :sort_mode, etc. Consider it an exercise for the reader to figure out the details until I get around to improving the docs.

17 Jan 2008

Sphinx 0.9.8r1065

Short post, as befitting the importance of the content: Riddle and Thinking Sphinx have both been updated to support the current version of Sphinx, 0.9.8r1065.

27 Dec 2007

Updates for Sphinx 0.9.8r985

Another quick Sphinx post – Riddle is updated to support Sphinx’s latest release (0.9.8r985), and Thinking Sphinx now has that new version of Riddle as well.

I’ve not tested any of this with the recently released Ruby 1.9 yet, though (but it’s on my list of things to do).

Also, thank-you to Joost Hietbrink (again) and Jonathan Conway for their patches to Thinking Sphinx – very much appreciated.

02 Dec 2007

Sphinx-related Updates

Two Sphinx-related tidbits:

Riddle

Riddle now has a tag in SVN for the 0.9.8-r909 release of Sphinx – not that there were really any functional changes compared to r871, besides the two extra match modes (Full Scan and Extended 2 – the latter isn’t going to hang around for long anyway).

Thinking Sphinx

As well as supporting the above version of Sphinx, I’ve now added some brief documentation to Thinking Sphinx that discusses attributes, sorting and delta indexes. To summarise kinda-briefly:

Attributes

Attributes are defined in the define_index block as follows:

define_index do |index|
  index.has.created_at
  index.has.updated_at
  # Field definitions go here
end

They can only be from the indexed model (not associations), and in line with Sphinx’s limitations, must be either integers, floats or timestamps.

Sorting

Ties in closely with attributes – as that’s all Sphinx will let you order by. Use it in the same way as you would in a find call:

Invoice.search :conditions => "expensive", :order => "created_at DESC"

Same approach works for the :include parameter (although this has nothing to do with Sphinx itself).

Delta Indexes

Delta indexes track changes to model records between full indexing runs (ie: from the rake task thinking_sphinx:index) – all they require is a boolean column in the model’s table called delta, and for delta indexing to be enabled as follows:

define_index do |index|
  index.delta = true
  # Fields and attributes go here
end

The one catch – at this point, delta indexes are one step off current, as they get indexed before the current transaction to the database is committed. This will get better soon, thanks to some help from Joost Hietbrink and his colleagues at YelloYello – once I find some free time, I’ll get that working much more neatly.
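
If that catch isn’t obvious, here’s a toy simulation of it. ToyDatabase and index_snapshot are invented for illustration, and the sketch leans on one assumption: the indexer connects to the database separately, so it only ever sees committed rows.

```ruby
# Toy database that keeps uncommitted writes apart from committed rows.
class ToyDatabase
  attr_reader :committed

  def initialize
    @committed = {}
    @pending   = {}
  end

  # Write inside the current (not yet committed) transaction.
  def write(id, text)
    @pending[id] = text
  end

  # Commit the current transaction, making its writes visible to everyone.
  def commit
    @committed.merge!(@pending)
    @pending.clear
  end
end

# Assumption: the indexer is a separate connection, so it reads committed rows.
def index_snapshot(database)
  database.committed.dup
end

db = ToyDatabase.new
db.write(1, "first draft")
db.commit

db.write(1, "second draft")
too_early = index_snapshot(db) # delta indexing fires before the commit
db.commit
after_commit = index_snapshot(db)

p too_early    # => {1=>"first draft"}
p after_commit # => {1=>"second draft"}
```

The snapshot taken before the commit still holds the previous version of the record, which is exactly the "one step off current" behaviour.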

14 Nov 2007

Sphinx's Riddle

Built out of the work I’ve done for Thinking Sphinx (which has just got basic support for delta indexes, attributes and sorting – although the documentation doesn’t reflect that), I’ve extracted a new Ruby client that communicates with Sphinx, which I’ve named Riddle.

I’m not going to delve into the code here – because I’m not expecting it to be that useful to many people (and I just wrote examples in the documentation – go read that instead!) – but I’m very happy with how it’s ended up, and it’s got some level of specs to give it a thorough test. It’s also compatible with the most recent release of Sphinx (0.9.8 r871). Should you wish to poke around with it, just check it out from subversion:

svn co http://rails-oceania.googlecode.com/svn/patallan/riddle/trunk riddle

It’s also being used in Evan Weaver’s UltraSphinx plugin, which I’m pretty pleased about.

29 Oct 2007

Sphinx Quick Fix

Here’s one small filesystem tweak that’s been handy as I’ve been slowly rebuilding my development environment on Leopard over the last couple of days. It’s to get Sphinx working – there were no problems with compilation or installation, but when I ran searchd or indexer, it complained about not finding the mysql libraries:

dyld: Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient.15.dylib
  Referenced from: /usr/local/bin/indexer
  Reason: image not found

Now, the expected file path is incorrect – it shouldn’t have the second ‘mysql’. My attempts to change that with various configuration flags didn’t work, so I cheated, and added the folder as a symbolic link:

sudo ln -s /usr/local/mysql/lib /usr/local/mysql/lib/mysql

Suggestions of a cleaner solution always welcome.


About Freelancing Gods

Freelancing Gods is written by Pat, who works as a web developer in Melbourne, Australia, specialising in Ruby on Rails.

In case you're wondering what the likely content here will be about (besides code), keep in mind that Pat is passionate about the internet, music, politics, comedy, bringing people together, and making a difference. And pancakes.

His ego isn't as bad as you may think. Honest.

All original content on this site is available through a Creative Commons by-nc-sa licence.