Freelancing Gods 2008

01 Jul 2008

Rails Camp UK

Following in the steps of the Australian Rails Camps, it’s now time to announce the first UK edition. Running from Friday the 15th to Monday the 18th of August, it will be an extended weekend of hacking, talking, eating, drinking and games, with a bunch of smart and passionate Ruby developers.

Even though the name is “Rails Camp”, previous camps have included talks on topics from Merb to Rack to Extreme Programming – all topics somewhat related to Ruby are welcome.

If you’d like to come along, I’d recommend registering soon, as there’s a very limited number of places.

12 Jun 2008

A Concise Guide to Using Thinking Sphinx

Okay, it’s well past time for the companion piece to my Sphinx Primer – let’s go through the basic process of using Thinking Sphinx with Rails.

Just to recap: Sphinx is a search engine that indexes data, and then you can query it with search terms to find out which documents are relevant. Why do you want to use it with Rails? Because it saves having to write messy SQL, and it’s so damn fast.

(If you’re getting a feeling of déjà vu, it’s probably because you’ve read an old post on this blog that dealt with an earlier version of Thinking Sphinx. I’ve had a few requests for an updated article, so this is it.)

Installation

So: the first step is to install Sphinx. This may be tricky on some systems – but I’ve never had a problem with it on Mac OS X or Ubuntu. My process is thus:

curl -O http://sphinxsearch.com/downloads/sphinx-0.9.8-rc2.tar.gz
tar zxvf sphinx-0.9.8-rc2.tar.gz
cd sphinx-0.9.8-rc2
./configure
make
sudo make install

If you’re using Windows, you can just grab the binaries.

Once that’s taken care of, take your Rails app and install the plugin. If you’re running edge Rails or Rails 2.1, this is a piece of cake:

script/plugin install git://github.com/freelancing-god/thinking-sphinx.git

Otherwise, you’ve got a couple of options. The first is, if you have git installed, just clone to your vendor/plugins directory:

git clone git://github.com/freelancing-god/thinking-sphinx.git \
  vendor/plugins/thinking-sphinx

If you’re not yet using git, then the easiest way is to download the tar file of the code. Try the following:

curl -L http://github.com/freelancing-god/thinking-sphinx/tarball/master \
  -o thinking-sphinx.tar.gz
tar -xvf thinking-sphinx.tar.gz -C vendor/plugins
mv vendor/plugins/freelancing-god-thinking-sphinx* vendor/plugins/thinking-sphinx

Oh, and it’s worth noting: if you’re not using MySQL or PostgreSQL, then you’re out of luck – Sphinx doesn’t talk to any other relational databases.

Configuration

Next step: let’s get a model or two indexed. It might be worth refreshing your memory on what fields and attributes are for – can I recommend my Sphinx article (because I’m not at all biased)?

Okay, now let’s work with a simple Person model, and add a couple of fields:

class Person < ActiveRecord::Base
  define_index do
    indexes [first_name, last_name], :as => :name
    indexes location
  end
end

Nothing too scary – we’ve added two fields. The first combines a person’s first and last names into one field with the alias ‘name’. The second is simply location.

Adding attributes is just as easy:

define_index do
  # ...

  has birthday
end

This adds the datetime column birthday as an attribute (so you can now sort and filter your results by birthday).

Managing Sphinx

We’ve set up a basic index – now what? We tell Sphinx to index the data, and then we can start searching. Rake is our friend for this:

rake thinking_sphinx:index
rake thinking_sphinx:start

Searching

Now for the fun stuff:

Person.search "Melbourne"

Or with some sorting:

Person.search "Melbourne", :order => :birthday

Or just people born within a 10-year window:

Person.search "Melbourne", :with => {:birthday => 25.years.ago..15.years.ago}

If you want to limit certain search terms to specific fields, use :conditions:

Person.search :conditions => {:location => "Melbourne"}

Just remember: :conditions is for fields, :with is for attributes (and :without for exclusive attribute filters).
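To make that split concrete, here’s a toy, in-memory model of what :conditions and :with do, written in plain Ruby so it runs without Rails or Sphinx. It’s a sketch under my own assumptions, not Thinking Sphinx’s implementation: the Doc struct, the DOCS data, and the words and toy_search methods are all hypothetical, and real Sphinx matching (tokenising, stemming, ranking) is far more involved.

```ruby
require "date"

# Hypothetical stand-in for indexed Person documents (not Thinking Sphinx code):
# fields hold searchable text, attributes hold sortable/filterable values.
Doc = Struct.new(:id, :fields, :attributes)

DOCS = [
  Doc.new(1, { name: "Alice Smith", location: "Melbourne" }, { birthday: Date.new(1990, 3, 1) }),
  Doc.new(2, { name: "Melbourne Jones", location: "Sydney" }, { birthday: Date.new(1975, 6, 9) }),
  Doc.new(3, { name: "Bob Brown", location: "Melbourne" }, { birthday: Date.new(1960, 1, 20) })
]

# Lower-case the text and split it into words.
def words(text)
  text.to_s.downcase.scan(/\w+/)
end

# query:      a word that may appear in any field
# conditions: field => word, matched against that one field only
# with:       attribute => single value, array or range
def toy_search(docs, query = nil, conditions: {}, with: {})
  docs.select do |doc|
    any_field = query.nil? || doc.fields.values.any? { |text| words(text).include?(query.downcase) }
    in_field  = conditions.all? { |field, word| words(doc.fields[field]).include?(word.downcase) }
    filtered  = with.all? do |attr, allowed|
      value = doc.attributes[attr]
      case allowed
      when Range then allowed.cover?(value)
      when Array then allowed.include?(value)
      else allowed == value
      end
    end
    any_field && in_field && filtered
  end.map(&:id)
end

toy_search(DOCS, "Melbourne")                            # => [1, 2, 3]
toy_search(DOCS, conditions: { location: "Melbourne" })  # => [1, 3]
toy_search(DOCS, "Melbourne", with: { birthday: Date.new(1970, 1, 1)..Date.new(1995, 12, 31) }) # => [1, 2]
```

Note how Melbourne Jones turns up in the plain search but not in the :conditions one, which is exactly the field-versus-everything distinction; the birthday filter only ever narrows results, it never adds matches.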

Change

Your data changes – but unfortunately, Sphinx doesn’t update your indexes to match automatically. So there are two things you need to do. Firstly, run rake thinking_sphinx:index regularly (using cron or something similar). ‘Regularly’ can mean whatever time frame you want – weekly, daily, hourly.

The second step is optional, but it’s needed if you want your indexes to always be up to date. First, add a boolean column named ‘delta’ to your model’s table, and have it default to false. Then tell your index to use that delta field to keep track of changes:

define_index do
  # ...

  set_property :delta => true
end

Then you need to tell Sphinx about the updates:

rake thinking_sphinx:stop
rake thinking_sphinx:index
rake thinking_sphinx:start

Once that’s done, a delta index will be created – which holds any recent changes (since the last proper indexing), and gets re-indexed whenever a model is edited or created. This doesn’t mean you can stop the regular indexing, as that’s needed to keep delta indexes as small (and fast) as possible.
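The main-plus-delta arrangement is easier to reason about with a model in front of you. Here’s a conceptual sketch in plain Ruby; ToyIndex and its methods are hypothetical, the ‘indexes’ are just hashes of id to text, and it makes no claim about how Sphinx or Thinking Sphinx actually store or merge indexes.

```ruby
# Conceptual model only: a big main index rebuilt on a schedule, plus a small
# delta index holding records flagged delta = true since the last rebuild.
class ToyIndex
  # records: id => { text: String, delta: true/false }
  def initialize(records)
    @records = records
    rebuild_main
  end

  # The regular (e.g. cron-driven) full index: everything goes into main,
  # delta flags are cleared and the delta index is emptied.
  def rebuild_main
    @records.each_value { |record| record[:delta] = false }
    @main  = @records.transform_values { |record| record[:text] }
    @delta = {}
  end

  # Saving a record flags it, then re-indexes every flagged record into delta.
  def save(id, text)
    @records[id] = { text: text, delta: true }
    flagged = @records.select { |_, record| record[:delta] }
    @delta = flagged.transform_values { |record| record[:text] }
  end

  # Searches look at both; in this model a delta entry replaces the stale main one.
  def search(word)
    @main.merge(@delta)
         .select { |_, text| text.downcase.include?(word.downcase) }
         .keys
         .sort
  end

  def delta_size
    @delta.size
  end
end

idx = ToyIndex.new({ 1 => { text: "Pat in Melbourne", delta: false },
                     2 => { text: "Tim in Sydney", delta: false } })
idx.save(2, "Tim in Melbourne")
idx.search("melbourne")  # => [1, 2]
```

The point the model makes: every save re-indexes the whole delta set, so the longer you go without a full rebuild, the more work each save does. That’s why the regular indexing still matters.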

String Sorting

If you remember the details about fields and attributes, you’ll know that you can’t sort by fields. Which is a pain, but there are ways around this – and Thinking Sphinx keeps it pretty damn easy. Let’s say we wanted to make our name field sortable:

define_index do
  indexes [first_name, last_name], :as => :name, :sortable => true

  # ...
end

Re-index and restart Sphinx, and sorting by name will work.

How is this done? Under the hood, Thinking Sphinx creates an attribute called name_sort and sorts by that instead, as Sphinx is quite fine with sorting by strings once they’re converted to ordinal values (which happens automatically when they’re attributes).
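The ordinal idea can be sketched in plain Ruby: rank the distinct strings once, store each document’s rank as an integer, and plain integer sorting then gives alphabetical order. The ordinals helper and the name_sort key below are illustrative assumptions; this shows the general idea, not how Sphinx computes its values.

```ruby
# Map each distinct string to its position in sorted order.
def ordinals(strings)
  strings.uniq.sort.each_with_index.to_h
end

people = [
  { id: 1, name: "Riley" },
  { id: 2, name: "Healy" },
  { id: 3, name: "Andreasson" },
  { id: 4, name: "Heald" }
]

rank = ordinals(people.map { |person| person[:name] })
# Store the integer alongside each document, like a hidden name_sort attribute...
people.each { |person| person[:name_sort] = rank[person[:name]] }
# ...then sort on that integer rather than on the string itself.
sorted_ids = people.sort_by { |person| person[:name_sort] }.map { |person| person[:id] }
# sorted_ids => [3, 4, 2, 1]
```

In this toy version the integers only mean something relative to each other, so any new or changed name means recomputing them, which fits the re-index-and-restart advice above.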

Pagination

Sphinx paginates automatically – in fact, there’s no way of turning that off. But that’s okay… as long as you can use your will_paginate helper, right? Never fear, Thinking Sphinx plays nicely with will_paginate, so your views don’t need to change at all:

<%= will_paginate @search_results %>
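will_paginate only needs a page number, a page size and a total to work with, and the search side turns the first two into an offset and a limit. A minimal sketch of that arithmetic in plain Ruby; page_window is a hypothetical helper, and the per-page value of 20 in the usage line is just an example, not a claim about either library’s default.

```ruby
# Hypothetical helper: the offset/limit a search needs for a 1-based page,
# plus the page count a pagination helper works from.
def page_window(page:, per_page:, total_entries:)
  page = [page.to_i, 1].max  # treat a missing or invalid page as page 1
  {
    offset: (page - 1) * per_page,
    limit: per_page,
    total_pages: (total_entries + per_page - 1) / per_page  # integer ceiling
  }
end

page_window(page: 3, per_page: 20, total_entries: 45)
# offset 40, limit 20, total_pages 3
```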

Associations

Sometimes you’ll want data in your fields (or attributes) from associations. This is a piece of cake:

define_index do
  indexes photos.caption, :as => :captions
  indexes friends.photos.caption, :as => :friends_photos

  # ...
end

Polymorphic associations are fine as well – but keep in mind, the more complex your index fields and attributes, the slower it will be for Sphinx to index (and you’ll definitely need some database indexes on foreign key columns to help it stay as speedy as possible).
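Conceptually, an association field collapses many rows into one chunk of text per document. This plain-Ruby sketch shows that shape of the data; the Photo struct and the space separator are illustrative assumptions, and in practice the concatenation happens in the SQL that Sphinx runs against your database rather than in Ruby.

```ruby
# Hypothetical rows from a photos table.
Photo = Struct.new(:person_id, :caption)

photos = [
  Photo.new(1, "Beach at St Kilda"),
  Photo.new(2, "Harbour Bridge"),
  Photo.new(1, "Rails Camp hacking")
]

# One 'captions' field value per person: every related caption joined together,
# roughly what grouping by person and concatenating strings in SQL produces.
captions = photos.group_by(&:person_id).transform_values { |person_photos|
  person_photos.map(&:caption).join(" ")
}
# captions[1] => "Beach at St Kilda Rails Camp hacking"
```

A search for ‘kilda’ would now match person 1, even though the word only lives on one of their photos. It also hints at the cost: each extra association is more joining and grouping for the database to do at index time.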

Gotchas

In case things aren’t working, here are some things to keep in mind:

  • Added an attribute, but can’t sort or filter by it? Have you reindexed and restarted Sphinx? It doesn’t automatically pick up these changes.
  • Sorting not working? If you’re specifying the attribute to sort by as a string, you’ll need to include the direction to sort by, just like with SQL: “birthday ASC”.
  • Using name or id columns in your fields or attributes? Make sure you specify them using symbols, as they’re core class methods in Ruby.
define_index do
  indexes :name

  # ...

  has photos(:id), :as => :photo_ids
end
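Why symbols? A plausible explanation, and this is my assumption rather than a reading of the plugin’s source: index DSLs like this commonly turn unknown bare words into column references via method_missing, but when the block runs with a class as self, a bare name never reaches method_missing, because Module#name already answers it. The ToyIndexDSL class below is hypothetical and reproduces just that failure mode. (Ruby 1.9 and later no longer have Object#id, so in this toy version only name collides.)

```ruby
# Hypothetical DSL (not Thinking Sphinx's code): the index block is evaluated
# with a class as self, and unknown bare words become column references.
class ToyIndexDSL
  Column = Struct.new(:column_name)

  def self.define(&block)
    @fields = []
    instance_eval(&block)
    @fields
  end

  def self.indexes(column)
    @fields << (column.is_a?(Symbol) ? Column.new(column) : column)
  end

  # Only reached for names the class doesn't already respond to.
  def self.method_missing(method_name, *args)
    args.empty? ? Column.new(method_name) : super
  end
end

ToyIndexDSL.define { indexes location }  # a Column for :location, via method_missing
ToyIndexDSL.define { indexes name }      # the string "ToyIndexDSL" from Module#name, not a column!
ToyIndexDSL.define { indexes :name }     # a Column for :name; the symbol sidesteps the clash
```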

And Next?

I realise this article is pretty light on details – but if you want more information, the first stop should be the extended usage page on the Thinking Sphinx site, quickly followed by the documentation. There’s also an email list to ask questions on.

08 Jun 2008

The End of Charity

As I’m travelling, I’m reading more – so that means it’s time for another impromptu book-review/idea-sharing post.

The book in question this time around is Nic Frances’ ominously titled The End of Charity. The points of the book aren’t that scary though – I find them to be pretty spot-on with what’s needed.

A quick overview:

  • Society’s siloed approach isn’t working: Leaving businesses to focus on making money, and charities to focus on making the world better, isn’t really getting us anywhere.
  • Value needs to represent more than financial worth: Goods and services need to be given more accurate values which incorporate social and environmental worth.
  • Businesses need to incorporate social and environmental mindsets into their operations: Remove the silos. Don’t leave the ‘doing good’ to a separate organisation (examples: Google Inc and Google.org, Microsoft and The Bill and Melinda Gates Foundation, McDonald’s and Ronald McDonald House Charities).
  • The market will make it all work: Okay, that’s a little simplistic, but the market does a decent job at helping the best value goods and services come to the fore.

Now the book itself is far more detailed – Frances draws a lot from his own experiences, both in charities and in socially-minded businesses, so there’s no end of real world examples. It’s also extremely easy to digest, so I highly recommend reading it, even if you don’t have much of a business-focused mind.

Granted, some of these ideas can take some getting used to, especially on the left side of politics, where broad strokes paint businesses (particularly corporations) as Bad, and charities and other non-profits as Good. A lot of what’s discussed in this book isn’t particularly new to me – I was introduced to the concepts while working at MBO (now Ergo Consulting), which, perhaps not so surprisingly, had an awesome culture not unlike what Frances outlines for his own Cool nrg. I remember bristling at the idea put forward by our then CEO Paul Steele (who is currently COO at World Vision Australia) that business is the best way to enact social improvement.

A few years have passed since then, though, and I’ve come around to agreeing that the combined approach is far more likely to succeed than the old, siloed way.

Now, this hasn’t led to any dramatic changes in my freelancing lifestyle – but it’s got my brain ticking away, so you’ll just have to wait and see what comes of it. That said, what do you think about all this? Do you agree? Disagree? Do you have any suggestions on how to make the organisation you work for take a more holistic approach?

28 May 2008

RailsConf 2008

I’ve just started my round-the-world conferences-and-holiday adventure, and the first stop is RailsConf in Portland – so if you’re in town and see me wandering around looking rather clueless, please say hi.

Also, in case you’re on the Twitter bandwagon, you’ll find me with the creative nickname of pat.

21 May 2008

Sphinx + Rails + PostgreSQL

In case you’ve not been watching every commit carefully flow through Thinking Sphinx on GitHub – PostgreSQL support has been added. I’ve done a little bit of testing, and I’ve had some excellent support from Björn Andreasson and Tim Riley, so I feel it’s ready for people to start kicking the tires.

I’m no PostgreSQL expert – I definitely fall into the n00b category – so if you think there are better ways to do what I’m doing, I’d love to hear them.

11 May 2008

Updates for Thinking Sphinx

I’ve been working away on Thinking Sphinx when I’ve had the time – and we’re nearing what I feel is a solid 1.0 release. I say we, because I’ve had some very generous people supply patches over the last few weeks – namely Arthur Zapparoli, James Healy, Chris Heald and Jae-Jun Hwang. Switching to git – and GitHub in particular – has made it very easy to accept patches.

Mind you, not all of these changes are committed just yet – and even when they are, there’ll still be a few more things to cross off the list before we hit the big 1.0 milestone, namely PostgreSQL support and solid spec coverage. Slowly edging closer.

In other news – to help share the Thinking Sphinx knowledge (after some prompting by a few users of the plugin), I’ve created a Google Group for it – so this will be the ideal place to ask questions about how to integrate Sphinx into your app, suggest features, or report bugs.

If you’ve been pondering how to deploy Thinking Sphinx via Capistrano, I recommend you read a blog post by Wade Winningham – or if you’re interested in better ways of handling UTF-8 characters outside of the ‘normal’ set (ie: without accents and so forth), make sure you peruse James Healy’s solution.

And one last reminder – if you’re in Sydney on Wednesday evening and interested in learning a bit more about Sphinx in general and Thinking Sphinx in particular, come along to the monthly Ruby meet at the Crown Hotel in Surry Hills, as those are the topics I’ll be presenting on.

26 Apr 2008

Sphinx: A Primer

On Thursday night I presented to the Melbourne Ruby Group about Sphinx – first from a non-Ruby perspective, and then using Ruby, and more specifically Rails. I’ll be presenting again at the Sydney group in a couple of weeks, but I am also adapting the talk into a few blog posts – to allow a bit more detail in a few doses.

First up: Sphinx itself. Why should you read this? Because understanding Sphinx will help you use whichever library (Ruby or otherwise) more effectively. It might also teach you some things you had no idea about (ie: this is the article I should have read when I started using Sphinx).

What is Sphinx?

Sphinx is a search engine. You feed it documents, each with a unique identifier and a bunch of text, and then you can send it search terms, and it will tell you the most relevant documents that match them. If you’re familiar with Lucene, Ferret or Solr, it’s pretty similar to those systems. You get the daemon running, your data indexed, and then using a client of some sort, start searching.
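At heart, ‘feed it documents, then ask which ones match’ is an inverted index: a map from each word to the documents containing it. Here’s a toy version in Ruby to make that concrete. It’s an illustration of the general technique only; build_toy_index and toy_lookup are hypothetical, the query here simply requires every term, and Sphinx’s real index format, tokenising and ranking are far more sophisticated.

```ruby
require "set"

# Toy inverted index: each word maps to the set of document ids containing it.
def build_toy_index(documents)
  index = Hash.new { |hash, word| hash[word] = Set.new }
  documents.each do |id, text|
    text.downcase.scan(/\w+/).each { |word| index[word] << id }
  end
  index
end

# Ids of documents containing every term in the query.
def toy_lookup(index, query)
  terms = query.downcase.scan(/\w+/)
  return [] if terms.empty?
  terms.map { |term| index.fetch(term, Set.new) }.reduce(:&).to_a.sort
end

docs = {
  1 => "Rails Camp UK",
  2 => "Sphinx and Rails",
  3 => "A Sphinx primer"
}
index = build_toy_index(docs)
toy_lookup(index, "rails")         # => [1, 2]
toy_lookup(index, "sphinx rails")  # => [2]
```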

When indexing your data, Sphinx talks directly to your data source itself – which must be one of MySQL, PostgreSQL, or XML files – which means it can be very fast to index (if your SQL statements aren’t too complex, anyway).

Sphinx Structure

A Sphinx daemon (the process known as searchd) can talk to a collection of indexes, and each index can have a collection of sources. Sphinx can be directed to search a specific index, or all of them, but you can’t limit the search to a specific source explicitly.
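The hierarchy can be pictured as a few nested Ruby structs. This is only a model of the relationships just described: Source, Index and Daemon are hypothetical names, and in a real deployment this structure is declared in Sphinx’s configuration file rather than in Ruby. What it encodes is the rule above: you can choose an index, but not a source.

```ruby
# Hypothetical model of the hierarchy: a daemon holds indexes, indexes hold sources.
Source = Struct.new(:name, :documents)  # documents: id => text
Index  = Struct.new(:name, :sources)

Daemon = Struct.new(:indexes) do
  # Search one named index, or every index when none is given.
  # There is deliberately no way to name a source.
  def search(word, index_name: nil)
    chosen = index_name ? indexes.select { |index| index.name == index_name } : indexes
    chosen.flat_map(&:sources)
          .flat_map { |source| source.documents.select { |_, text| text.include?(word) }.keys }
          .uniq
          .sort
  end
end

people  = Source.new("people", { 1 => "ruby developer", 2 => "designer" })
posts   = Source.new("posts", { 10 => "ruby camp" })
searchd = Daemon.new([Index.new("people_core", [people]), Index.new("posts_core", [posts])])

searchd.search("ruby")                            # => [1, 10]
searchd.search("ruby", index_name: "posts_core")  # => [10]
```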

Each source tracks a set of documents, and each document is made up of fields and attributes. While in other areas of software you could use those two terms interchangeably, they have distinct meanings in Sphinx (and thus require their own sections in this post).

Fields

Fields are the content for your search queries – so if you want words tied to a specific document, you’d better make sure they’re in a field in your source. They hold string data only – you could have numbers and dates and such in your fields, but Sphinx will treat them as strings, nothing else.
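A tiny illustration of what ‘only string data’ means in practice. field_tokens is a hypothetical helper, and its lower-case-and-split-on-word-characters rule is a simplification I’m assuming for the example rather than Sphinx’s actual tokeniser settings.

```ruby
require "date"

# Hypothetical stand-in for tokenising: whatever goes into a field
# is converted to text and split into words.
def field_tokens(value)
  value.to_s.downcase.scan(/\w+/)
end

field_tokens(42)                      # => ["42"]
field_tokens(Date.new(2008, 4, 26))   # => ["2008", "04", "26"]
field_tokens("Melbourne Ruby Group")  # => ["melbourne", "ruby", "group"]
```

So a search for 2008 will match that date as a word, but there’s no notion of ‘before’ or ‘after’ in a field; range questions like that are what attributes are for.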

Attributes

Attributes are used for sorting, filtering and grouping your search results. Sphinx ignores their values when matching search terms, though, and they’re limited to the following data types: integers, floats, datetimes (as Unix timestamps – and thus integers anyway), booleans, and strings. Take note that string attributes are converted to ordinal integers, which is especially useful for sorting, but not much else.
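Since every attribute type ends up numeric, filtering and sorting come down to number comparisons. Here’s a sketch of that reduction for the simple types listed above; to_attribute_value is a hypothetical helper, and representing booleans as 1 and 0 is my assumption for illustration.

```ruby
# Hypothetical helper: reduce a single attribute value to the number used for
# sorting and filtering. Strings are left out, since an ordinal depends on the
# whole set of values rather than on one value alone.
def to_attribute_value(value)
  case value
  when Integer, Float then value
  when Time           then value.to_i  # datetime => Unix timestamp
  when true           then 1
  when false          then 0
  else raise ArgumentError, "unsupported attribute type: #{value.class}"
  end
end

to_attribute_value(Time.utc(2008, 4, 24))  # => 1208995200
to_attribute_value(true)                   # => 1
```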

Multi-Value Attributes

There is also support in Sphinx to handle arrays of attributes for a single document – which go by the name of multi-value attributes. Currently (Sphinx version 0.9.8rc2) only integers are supported, so this isn’t quite as flexible as normal attributes, but it’s worth keeping in mind.
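A multi-value attribute is easiest to picture as an integer array stored per document. In the sketch below, a filter matches when any of a document’s values is in the filter set, which is my understanding of how Sphinx treats MVA filters but should be treated as an assumption; the data and the mva_filter helper are made up for illustration.

```ruby
# Hypothetical posts, each with a multi-value attribute of tag ids.
posts = {
  1 => { tag_ids: [3, 7] },
  2 => { tag_ids: [5] },
  3 => { tag_ids: [] }
}

# Assumed semantics: keep a document if any of its values is in the filter.
def mva_filter(docs, attribute, allowed)
  docs.select { |_, attrs| (attrs[attribute] & allowed).any? }.keys
end

mva_filter(posts, :tag_ids, [5, 7])  # => [1, 2]
mva_filter(posts, :tag_ids, [9])     # => []
```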

Filters

Filters are useful with attributes to limit your searches to certain sets of results – for example, limiting a forum post search to entries by a specific user id. Sphinx’s filters accept arrays or ranges – so if filtering by a single value, just put that in an array. The range filters are particularly useful for getting results from a certain time span.
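Because filters take arrays or ranges, a lone value (like a single user id) needs wrapping first. A minimal sketch of both the wrapping and the matching, in plain Ruby; normalise_filter and passes_filter? are hypothetical helpers rather than any client library’s API, and the timestamp range stands in for a time-span filter on a datetime attribute.

```ruby
# Hypothetical helper: wrap a lone value in an array, leave arrays and ranges alone.
def normalise_filter(value)
  value.is_a?(Array) || value.is_a?(Range) ? value : [value]
end

# An attribute value passes if it's in the array or within the range.
def passes_filter?(filter, attribute_value)
  filter = normalise_filter(filter)
  filter.is_a?(Range) ? filter.cover?(attribute_value) : filter.include?(attribute_value)
end

user_id_filter = normalise_filter(42)                  # => [42]
year_2008 = Time.utc(2008).to_i...Time.utc(2009).to_i  # timestamps, end excluded
passes_filter?(user_id_filter, 42)                     # => true
passes_filter?(year_2008, Time.utc(2008, 4, 24).to_i)  # => true
passes_filter?(year_2008, Time.utc(2009, 1, 1).to_i)   # => false
```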

Relevance

Relevancy is the default sorting order for Sphinx. I’ve no idea exactly how it is calculated, but there are a couple of things you can do easily enough in your queries to influence it. The first is index-level weighting, where you give specific indexes higher rankings than others. The other, similar in nature, but at a lower level, is field weightings. Generally these are set before each query, but it will depend on the library you use.
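Sphinx’s actual ranking is more involved (and, as above, the exact calculation is a mystery to me), but the effect of field weights can be shown with a deliberately naive model: count term hits per field, multiply by that field’s weight, and add them up. Everything here, including weighted_score, rank and the weights themselves, is a hypothetical illustration of the idea rather than Sphinx’s algorithm.

```ruby
# Deliberately naive relevance (NOT Sphinx's formula): for each field, count how
# often the term appears, multiply by that field's weight (default 1), and sum.
def weighted_score(doc, term, weights)
  doc.sum do |field, text|
    text.downcase.scan(/\w+/).count(term.downcase) * weights.fetch(field, 1)
  end
end

# Highest score first; ties broken by id.
def rank(docs, term, weights = {})
  docs.sort_by { |id, doc| [-weighted_score(doc, term, weights), id] }.map(&:first)
end

docs = {
  a: { title: "Sphinx primer", body: "notes from the talk" },
  b: { title: "Talk notes", body: "sphinx sphinx sphinx" }
}

rank(docs, "sphinx")                   # => [:b, :a]  raw counts favour b's body
rank(docs, "sphinx", { title: 10 })    # => [:a, :b]  a heavy title weight flips it
```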

Keeping Your Indexes Updated

One thing that sets Sphinx apart from Ferret and other search engines is that there is no way to update fields for a specific document in your indexes. The main way around this is having delta indexes – a small index with all the recent changes (which will be super-fast to index), so Sphinx will include both that and the main index in its searches. Of the Rails plugins, both Thinking Sphinx and Ultrasphinx have support for this – I’ve no idea about libraries for other languages, mind you.

What’s next?

Next up, we’ll dive into some actual code – going through some of the common tasks for setting up Sphinx with Rails using Thinking Sphinx.


About Freelancing Gods

Freelancing Gods is written by , who works on the web as a web developer in Melbourne, Australia, specialising in Ruby on Rails.

In case you're wondering what the likely content here will be about (besides code), keep in mind that Pat is passionate about the internet, music, politics, comedy, bringing people together, and making a difference. And pancakes.

His ego isn't as bad as you may think. Honest.


All original content on this site is available through a Creative Commons by-nc-sa licence.