Freelancing Gods 2014

19 Oct 2011

A Sustainable Flying Sphinx?

In which I muse about what a sustainable web service could look like – but first, the backstory:

A year ago – almost to the day – I sat in a wine bar in Sydney’s Surry Hills with Steve Hopkins. I’d been thinking about how to get Sphinx working on Heroku, and ran him through the basic idea in my head of how it could work. His first question was “So, what are you working on tomorrow, then?”

By the end of the following day, I had some idea of how it would work. Over the next few months I had a proof of concept working, hit some walls, began again, and finally got to a point where I could launch an alpha release of Flying Sphinx.

In May, Flying Sphinx became available for all Heroku users – and earlier today (five months later), I received my monthly provider payment from Heroku, with the happy news that I’m now earning enough to cover all related ongoing expenses – things like AWS for the servers, Scalarium to manage them, and Tender for support.

Now, I’m not rolling in cash, and I’m certainly not earning enough through Flying Sphinx to pay rent, let alone be in a position to drop all client work and focus on Flying Sphinx full-time. That’s cool; either of those targets would be amazing.

And of course, money isn’t the be-all and end-all – even though this is a business, and I certainly don’t want to run at a loss. I want Flying Sphinx to be sustainable – in that it covers not only the hosting costs, but my time as well, along with supporting the broader system around it – code, people and beyond.

But what does a sustainable web service look like, particularly beyond the standard (outmoded) financial axis?

Sustainable Time

Firstly (and selfishly), it should cover the time spent maintaining and expanding the service. Flying Sphinx doesn’t use up a huge amount of my time right at the moment, but I’m definitely keen to improve a few things (in particular, offer Sphinx 2.0.1 alongside the existing 1.10-beta installation), and there is the occasional support query to deal with.

This one’s relatively straightforward, really – I can track all time spent on Flying Sphinx and multiply that by a decent hourly rate. If it turns out I can’t manage all the work myself, then I pay someone else to help.

It certainly doesn’t look like I’m going to need anyone helping in the near future, mind you – nor am I drowning in support requests.

Sustainable Software

Ignoring the time I spend writing code for Flying Sphinx (as that’s covered by the previous section), pretty much every other piece of software involved with the service is open source. Front and centre among these is Sphinx itself.

I don’t expect to be paid for my own open source contributions, but it certainly helps when there are some funds trickling in to help motivate dealing with support questions, fixing bugs and adding features. It can also provide a stronger base on which to build a community.

With this in mind, I’m considering setting aside a percentage of any profit for Sphinx development – as any improvements to that help make Flying Sphinx a stronger offering.

(I could also cover my time spent on Thinking Sphinx with a percentage cut, though either way it would end up in my pocket.)

Sustainable Hardware

This is where things get a little trickier – we’re not just dealing with bits and electrons, but also silicon and metals. The human race is pretty bad at weaning itself off of limited (as opposed to renewable) resources, and the hardware industry certainly is going to hit some limits in the future as certain metals become harder to source.

Of course, the servers use a lot of energy, so one thing I will be doing is offsetting the carbon. I’ve not yet figured out the best service to do this, but will start by looking at Brighter Planet.

From a social perspective, there are also questions about how those resources are sourced. We should be considering the working conditions of where the metals are mined (and by whom), the people who are soldering the logic boards, and those who place the finished products into racks in data centres.

As an example, let’s look at Amazon. Given the recent issues raised with the conditions for staff in their warehouses, I think it’s fair to seek clarification on the situation of their web service colleagues. And what if there were significant ethical issues for using AWS? What then for Flying Sphinx, which runs EC2 instances and is an add-on for Heroku, a business built entirely on top of Amazon’s offerings?

I could at least use servers elsewhere – but that means bandwidth between servers and Heroku apps starts to cost money – and we introduce a step of latency into the service. Neither of those things is ideal. Or I could just say that I don’t want to support Amazon at all, and shut down Flying Sphinx, remove all my Heroku apps, and find some other hosting service to use.

Am I getting a little too carried away? Perhaps, but this is all hypothetical anyway. I’m guessing Amazon’s techs are looked after decently (though I’d love some confirmation on this), and am hoping the situation improves for their warehouse staff as well.

I am still searching for answers to what truly sustainable hardware – and more so, sustainable web services – look like: financially, socially, environmentally, and technically. What’s your take? What have I forgotten?

24 Sep 2011

Versioning your APIs

As I developed Flying Sphinx, I found myself both writing and consuming several APIs: from Heroku to Flying Sphinx, Flying Sphinx to Heroku, the flying-sphinx gem in apps to Flying Sphinx, Flying Sphinx to Sphinx servers, and Sphinx servers to Flying Sphinx.

None of that was particularly painful – but when Josh Kalderimis was improving the flying-sphinx gem, he noted that the API it interacts with wasn’t that great. Namely, it was inconsistent with what it returned (sometimes text status messages, sometimes JSON), it was sending authentication credentials as GET/POST parameters instead of in a header, and it wasn’t versioned.

I was thinking that given I control pretty much every aspect of the service, it didn’t matter if the APIs had versions or not. However, as Josh and I worked through improvements, it became clear that the apps using older versions of the flying-sphinx gem were going to have one expectation, and newer versions another. Versioning suddenly became a much more attractive idea.

The next point of discussion was how clients should specify which version they are after. Most APIs put this in the path – here’s Twitter’s as an example, specifying version 1:

https://api.twitter.com/1/statuses/user_timeline.json

However, I’d recently been working with Scalarium’s API, and theirs put the version information in a header (again, version 1):

Accept: application/vnd.scalarium-v1+json

Some research turned up a discussion on Hacker News about best practices for APIs – and it’s argued there that using headers keeps the paths focused on just the resource, which is a more RESTful approach. It also makes for cleaner URLs, which I like as well.
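As a rough sketch of what this looks like from the client side, here’s a request built with Ruby’s standard Net::HTTP library that asks for version 2 via the Accept header, leaving the path free of any version information. The media type follows the application/vnd.flying-sphinx-vN+json pattern that Flying Sphinx’s routing checks for; the request is only constructed here, not sent.

```ruby
require 'net/http'
require 'uri'

# Build (but don't send) a request for version 2 of the API. The version is
# selected through the Accept header, so the path names only the resource.
uri = URI('http://flying-sphinx.com/api/app')
request = Net::HTTP::Get.new(uri)
request['Accept'] = 'application/vnd.flying-sphinx-v2+json'

puts request.path       # /api/app
puts request['Accept']  # application/vnd.flying-sphinx-v2+json
```

Actually sending it would be a matter of something like Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }.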

How to implement this in a Rails application though? My routing ended up looking something like this:

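namespace :api do
  # Version 1 also catches requests without a versioned Accept header
  constraints ApiVersion.new(1) do
    scope :module => :v1 do
      resource :app do
        resources :indices
      end
    end
  end

  # Version 2 only matches application/vnd.flying-sphinx-v2+json
  constraints ApiVersion.new(2) do
    scope :module => :v2 do
      resource :app do
        resources :indices
      end
    end
  end
end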

The ApiVersion class (which I have saved to app/lib/api_version.rb) is where we check the version header and route accordingly:

class ApiVersion
  def initialize(version)
    @version = version
  end

  def matches?(request)
    versioned_accept_header?(request) || version_one?(request)
  end

  private

  def versioned_accept_header?(request)
    accept = request.headers['Accept']
    accept && accept[/application\/vnd\.flying-sphinx-v#{@version}\+json/]
  end

  def unversioned_accept_header?(request)
    accept = request.headers['Accept']
    accept.blank? || accept[/application\/vnd\.flying-sphinx/].nil?
  end

  def version_one?(request)
    @version == 1 && unversioned_accept_header?(request)
  end
end

You’ll see that I default to version 1 if no header is supplied. This is for the older versions of the flying-sphinx gem – but if I were starting afresh, I might default to the latest version instead.
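To see that fallback behaviour in isolation, here’s a plain-Ruby version of the ApiVersion logic above that runs without Rails. It’s a sketch: FakeRequest is a hypothetical stand-in for the Rails request object (only its headers are needed), ActiveSupport’s blank? is replaced with an explicit nil/empty check, and the header check returns a boolean rather than the matched string.

```ruby
# Hypothetical stand-in for the Rails request: only #headers is used.
FakeRequest = Struct.new(:headers)

class PlainApiVersion
  def initialize(version)
    @version = version
  end

  def matches?(request)
    versioned_accept_header?(request) || version_one?(request)
  end

  private

  # True when the Accept header names exactly this version.
  def versioned_accept_header?(request)
    accept = request.headers['Accept']
    !!(accept && accept[/application\/vnd\.flying-sphinx-v#{@version}\+json/])
  end

  # True when there's no flying-sphinx media type at all (or no header).
  def unversioned_accept_header?(request)
    accept = request.headers['Accept']
    accept.nil? || accept.empty? || accept[/application\/vnd\.flying-sphinx/].nil?
  end

  # Unversioned requests fall through to version 1.
  def version_one?(request)
    @version == 1 && unversioned_accept_header?(request)
  end
end

v1 = PlainApiVersion.new(1)
v2 = PlainApiVersion.new(2)

no_header = FakeRequest.new({})
v2_header = FakeRequest.new({ 'Accept' => 'application/vnd.flying-sphinx-v2+json' })

puts v1.matches?(no_header)  # true
puts v2.matches?(no_header)  # false
puts v1.matches?(v2_header)  # false
puts v2.matches?(v2_header)  # true
```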

All of this gives us URLs that look something like this:

http://flying-sphinx.com/api/app
http://flying-sphinx.com/api/app/indices

My SSL certificate is locked to flying-sphinx.com – if it were a wildcard certificate, I’d use an ‘api’ subdomain instead, and clean those URLs up even further.

The controllers are namespaced according to both the path and the version – so we end up with names like Api::V2::AppsController. It does mean you get a new set of controllers for each version, but I’m okay with that (though would welcome suggestions for other approaches).
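One possible way to cut down that duplication, sketched in plain Ruby rather than Rails and not necessarily what Flying Sphinx does: have each version’s controller inherit from the previous version’s and override only what changed. The class and method names here are hypothetical; version 1 returns a plain-text status and version 2 wraps it in JSON, echoing the change Josh asked for.

```ruby
require 'json'

module Api
  module V1
    class AppsController
      # Version 1 behaviour (hypothetical): a plain-text status message.
      def show
        render_status('OK')
      end

      private

      def render_status(message)
        message
      end
    end
  end

  module V2
    # Inherits everything from version 1; only the response format changes.
    class AppsController < V1::AppsController
      private

      # Version 2 always responds with JSON.
      def render_status(message)
        JSON.generate({ 'status' => message })
      end
    end
  end
end

puts Api::V1::AppsController.new.show  # OK
puts Api::V2::AppsController.new.show  # {"status":"OK"}
```

The trade-off is coupling: a change to version 1 silently flows into version 2, which is exactly what separate controller sets avoid.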

Authentication is managed by namespaced application controllers – here’s an example for version 2, where I’m using headers:

class Api::V2::ApplicationController < ApplicationController
  skip_before_filter :verify_authenticity_token
  before_filter :check_api_params

  expose(:app) { App.find_by_identifier identifier }

  private

  def check_api_params
    # ensure the response returns with the same header value
    headers['X-Flying-Sphinx-Token'] = request.headers['X-Flying-Sphinx-Token']
    render_json_with_code 403 unless app && app.api_key == api_key
  end

  def api_token
    request.headers['X-Flying-Sphinx-Token']
  end

  def identifier
    api_token && api_token.split(':').first
  end

  def api_key
    api_token && api_token.split(':').last
  end
end

Authentication, in case it’s not clear, is done by a header named X-Flying-Sphinx-Token with a value of the account’s identifier and api_key concatenated together, separated by a colon.
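Here’s a sketch of both halves of that scheme: a client building the header with Net::HTTP, and the same colon split the controller above uses to recover the two parts. The identifier and key values are made up for illustration.

```ruby
require 'net/http'
require 'uri'

# Hypothetical credentials, for illustration only.
identifier = 'my-app'
api_key    = 'abc123'

uri = URI('http://flying-sphinx.com/api/app/indices')
request = Net::HTTP::Get.new(uri)
request['Accept'] = 'application/vnd.flying-sphinx-v2+json'
# Identifier and API key joined with a colon, sent as a header
# rather than as GET/POST parameters.
request['X-Flying-Sphinx-Token'] = "#{identifier}:#{api_key}"

# Server side: the same split the controller performs.
token = request['X-Flying-Sphinx-Token']
puts token.split(':').first  # my-app
puts token.split(':').last   # abc123
```

Because the split happens on every colon, this relies on neither the identifier nor the API key ever containing one.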

(If you’re not familiar with the expose method, that’s from the excellent decent_exposure gem.)

So where does that leave us? Well, we have an elegantly namespaced API, and both versions and authentication are managed in headers instead of paths and parameters. I also made sure version 2 responses all return JSON. Josh is happy, and all versions of the flying-sphinx gem are happy.

The one caveat with all of this? While it works for me, and it suits Flying Sphinx, it’s not the One True Way for API development. We had a great discussion at the most recent Rails Camp up at Lake Ainsworth about different approaches – at the end of the day, it really comes down to the complexity of your API and who it will be used by.

30 May 2011

Searching with Sphinx on Heroku

Just over two weeks ago, I released Flying Sphinx – which provides Sphinx search capability for Heroku apps. I’ll talk more about how I built it and the challenges faced at some point, but right now I just want to introduce the service and how you may go about using it.

Why Sphinx?

Perhaps you’re not familiar with Sphinx and how it can be useful. For those who are new to Sphinx, it’s a full-text search tool – think of it as your own personal Google for your website. It comes with two main moving parts – the indexer tool, which interprets and stores your search data (indices), and the searchd tool, which runs as a daemon, accepting search requests and returning the most appropriate matches for a given search query.

In most situations, Sphinx is very fast at indexing your data, and connects directly to MySQL and PostgreSQL databases – so it’s quite a good fit for a lot of Rails applications.

Using Sphinx in Rails

I’ve written a gem, Thinking Sphinx, which integrates Sphinx neatly with ActiveRecord. It allows you to define indices in your models, and then use rake tasks to handle the processing of these indices, along with managing the searchd daemon.

If you want to install Sphinx, have a read through of this guide from the Thinking Sphinx documentation – in most cases it should be reasonably painless.

Installing Thinking Sphinx in a Rails 3 application is quite simple – just add the gem to your Gemfile:

gem 'thinking-sphinx', '2.0.5'

For older versions of Rails, the Thinking Sphinx docs have more details.

I’m not going to get too caught up in the details of how to structure indices – this is also covered within the Thinking Sphinx documentation – but here’s a quick example, for a user account model:

class User < ActiveRecord::Base
  # ...
  
  define_index do
    indexes name, :sortable => true
    indexes location
    
    has admin, created_at
  end
  
  # ...
end

The indexes method defines fields – which are the textual data that people can search for. In this case, we’ve got the user names and locations covered. The has method is for attributes – which are used for filtering and sorting (fields can’t be used for sorting by default). The distinction between fields and attributes is quite important – make sure you understand the difference.

Now that we have our index defined, we can have Sphinx grab the required data from our database, which is done via a rake task:

rake ts:index

What Sphinx does here is grab all the required data from the database, interpret it and store it in a custom format. This allows Sphinx to be smarter about ranking search results and matching words within your fields.

Once that’s done, we next start up the Sphinx daemon:

rake ts:start

And now we can search! Either in script/console or in an appropriate action, just use the search method on your model:

User.search 'pat'

This returns the first page of users that match your search query. Sphinx always paginates results – though you can set the page size to be quite large if you wish – and Thinking Sphinx search results can be used by both WillPaginate and Kaminari pagination view helpers.

Instead of sorting by the most relevant matches, here are examples where we sort by name and created_at:

User.search 'pat', :order => :name
User.search 'pat', :order => :created_at

And if we only want admin users returned in our search, we can filter on the admin attribute:

User.search 'pat', :with => {:admin => true}
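If the fields/attributes distinction still feels abstract, this toy in-memory model may help: the query text is only matched against fields (name and location), while attributes (admin and created_at) are only used to filter and sort, mirroring the :with and :order options above. It’s purely illustrative, with hypothetical data and a hypothetical toy_search method; Sphinx itself uses its own index format, relevance ranking, pagination and smarter word matching, none of which is modelled here.

```ruby
# Hypothetical records standing in for indexed users.
ToyUser = Struct.new(:name, :location, :admin, :created_at)

USERS = [
  ToyUser.new('Pat Allan', 'Melbourne', true,  Time.utc(2011, 3, 1)),
  ToyUser.new('Pat Smith', 'Sydney',    false, Time.utc(2010, 6, 1)),
  ToyUser.new('Jane Doe',  'Sydney',    true,  Time.utc(2008, 5, 1)),
  ToyUser.new('Pat Jones', 'Hobart',    true,  Time.utc(2009, 1, 1))
]

def toy_search(records, query, with: {}, order: nil)
  # Fields: whole-word match of the query against name and location.
  matches = records.select do |r|
    [r.name, r.location].any? { |text| text.downcase.split(/\W+/).include?(query.downcase) }
  end
  # Attributes: exact-value filters, then an optional sort.
  with.each { |attr, value| matches = matches.select { |r| r[attr] == value } }
  order ? matches.sort_by { |r| r[order] } : matches
end

p toy_search(USERS, 'pat').map(&:name)
# ["Pat Allan", "Pat Smith", "Pat Jones"]
p toy_search(USERS, 'pat', with: { admin: true }, order: :created_at).map(&:name)
# ["Pat Jones", "Pat Allan"]
```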

There are many more options for search calls – the documentation (yet again) covers most of them quite well.

One more thing to remember – if you change your index structures, or add/remove index definitions, then you should restart and reindex Sphinx. This can be done in a single rake task:

rake ts:rebuild

If you just want the latest data to be processed into your indices, there’s no need to restart Sphinx – a normal ts:index call is fine.

Using Thinking Sphinx with Heroku

Now that we’ve got a basic search setup working quite nicely, let’s get it sorted out on Heroku as well. Firstly, let’s add the flying-sphinx gem to our Gemfile (below our thinking-sphinx reference):

gem 'flying-sphinx', '0.5.0'

Get that change (along with your indexed model setup) deployed to Heroku, then inform Heroku you’d like to use the Flying Sphinx add-on (the entry level plan costs $12 USD per month):

heroku addons:add flying_sphinx:wooden

And finally, let’s get our data on the site indexed and the daemon running:

heroku rake fs:index
heroku rake fs:start

Note the fs prefix instead of the ts prefix in those rake calls – the normal Thinking Sphinx tasks are only useful on your local machine (or on servers that aren’t Heroku).

When you run those rake tasks, you will probably see the following output:

Sphinx cannot be found on your system. You may need to configure the
following settings in your config/sphinx.yml file:
  * bin_path
  * searchd_binary_name
  * indexer_binary_name

For more information, read the documentation:
http://freelancing-god.github.com/ts/en/advanced_config.html

This is because Thinking Sphinx doesn’t have access to Sphinx locally, and isn’t sure which version of Sphinx is available. To have these warnings silenced, you should add a config/sphinx.yml file to your project, with the version set for the production environment:

production:
  version: 1.10-beta

Push that change up to Heroku, and you won’t see the warnings again.

For the more curious of you: the Sphinx daemon runs on a Flying Sphinx server, which is also located within the Amazon cloud (just like Heroku) to keep things fast and cheap. This is all managed by the flying-sphinx gem, though – you don’t need to worry about IP addresses or port numbers.

Also: the same rules apply with Flying Sphinx for modifying index structures or adding/removing index definitions – make sure you restart Sphinx so it’s aware of the changes:

heroku rake fs:rebuild

The final thing to note is that you’ll want the data in your Sphinx indices updated regularly – perhaps every day or every hour. This is best done on Heroku via their Cron add-on – since that’s just a rake task as well.

If you don’t have a cron task already, the following (perhaps in lib/tasks/cron.rake) will do the job:

desc 'Have cron index the Sphinx search indices'
task :cron => 'fs:index'

Otherwise, maybe something more like the following suits:

desc 'Have cron index the Sphinx search indices'
task :cron => 'fs:index' do
  # Other things to do when Cron comes calling
end

If you’d like your search data to have your latest changes, then I recommend you read up on delta indexing – both for Thinking Sphinx and for Flying Sphinx.

Further Sources

Keep in mind this is just an introduction – the documentation for Thinking Sphinx is pretty good, and Flying Sphinx is improving regularly. There’s also the Thinking Sphinx Google group and the Flying Sphinx support site if you have questions about either, along with numerous blog posts (though the older they are, the more likely they’ll be out of date). And finally – I’m always happy to answer questions about this, so don’t hesitate to get in touch.


About Freelancing Gods

Freelancing Gods is written by , who works on the web as a web developer in Melbourne, Australia, specialising in Ruby on Rails.

In case you're wondering what the likely content here will be about (besides code), keep in mind that Pat is passionate about the internet, music, politics, comedy, bringing people together, and making a difference. And pancakes.

His ego isn't as bad as you may think. Honest.

Here's more than you ever wanted to know.


All original content on this site is available through a Creative Commons by-nc-sa licence.