Freelancing Gods 2012

22 Jan 2012

Backing up with Backup

I’ve found myself singing the praises of Michael van Rooijen’s backup gem twice in quick succession lately – and so, I just want to run through how I’m using it, and how useful I find it.

For those not familiar with it, Backup provides a neat DSL for creating backup scripts, covering everything from archiving files and databases through to sending them off to common data stores (S3, Rackspace, SFTP, etc), with notifications via email, Campfire and others. If you want a rundown of all the options, click the link above – there are quite a few. I’m using the gem to make sure all critical data for Flying Sphinx is stored in multiple locations – and particularly, with different providers.

The documentation’s pretty solid, so I won’t keep you long, but here are two examples. First up, here’s my script for copying an archive of essential files (including a SQLite database) off to Ninefold – with the private details changed:

Backup::Model.new(:database_backup, "Database Backup") do
  archive :oedipus do |archive|
    archive.add '/mnt/sphinx/oedipus'
  end

  compress_with Gzip do |compression|
    compression.best = true
  end

  store_with Ninefold do |nf|
    nf.storage_token  = 'STORAGE_TOKEN'
    nf.storage_secret = 'STORAGE_SECRET'
    nf.path           = "oedipus/#{`hostname`.strip}"
    nf.keep           = 20
  end

  notify_by Mail do |mail|
    mail.on_success = true
    mail.on_failure = true

    mail.from      = 'support-at-flying-sphinx'
    mail.to        = 'pat-at-freelancing-gods'
    mail.address   = 'smtp.sendgrid.com'
    mail.user_name = 'SMTP_USER_NAME'
    mail.password  = 'SMTP_PASSWORD'
  end
end

For the above, I added Ninefold support to Backup, and Michael was kind enough to merge my commits in.

For my next script, though, I’m syncing directories to both S3 (in Singapore) and Rackspace (in the UK). The current releases of Backup don’t support syncing to Rackspace – but I ended up taking inspiration from fellow Melburnian Ryan Allen’s Sir Sync-a-Lot and rewrote the S3 support with his bulk MD5 approach. The code was simple enough – thanks to Wesley Beary’s excellent Fog – so I adapted the code to handle Rackspace as well.

However, I’ve not written tests for this, and my code does not yet support mirroring – so, I’ve not yet provided a patch back to Michael. If you want to use my code, feel free – but I will get to submitting a proper patch soon.
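To give a feel for the bulk MD5 idea, here's a minimal plain-Ruby sketch. It is not the code from the patch described above: the helper names are made up for illustration, and the remote listing is just a hash of path to MD5 digest standing in for whatever the storage provider reports. The point is only that you hash everything locally, compare against one bulk listing, and upload nothing but the differences.

```ruby
require 'digest/md5'

# Hypothetical helper: build a { relative_path => md5_hex } map for every
# regular file underneath the given directory (dotfiles are skipped by glob).
def local_digests(root)
  root = File.expand_path(root) # also strips any trailing slash

  Dir.glob(File.join(root, '**', '*')).each_with_object({}) do |path, digests|
    next unless File.file?(path)

    relative = path[(root.length + 1)..-1]
    digests[relative] = Digest::MD5.file(path).hexdigest
  end
end

# Hypothetical helper: given local and remote digest maps, return the
# relative paths that are missing remotely or whose contents differ.
def paths_to_upload(local, remote)
  local.reject { |path, md5| remote[path] == md5 }.keys.sort
end
```

Anything `paths_to_upload` returns would then be pushed to the remote store; files whose digests already match are left alone, which is what keeps repeated syncs cheap.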

All that said, here’s the script:

Backup::Model.new(:volume_backup, "Sphinx Backup") do
  sync_with S3 do |s3|
    s3.access_key_id      = 'ACCESS_KEY'
    s3.secret_access_key  = 'SECRET_KEY'
    s3.bucket             = "fs-#{`hostname`.strip}-sync"
    s3.region             = 'ap-southeast-1'
    s3.path               = ''
    s3.mirror             = false

    s3.directories do |directory|
      directory.add '/mnt/sphinx/oedipus'
      directory.add '/mnt/sphinx/flying-sphinx'
    end
  end

  sync_with Rackspace do |rs|
    rs.api_key  = 'API_KEY'
    rs.username = 'USER_NAME'
    rs.auth_url = 'lon.auth.api.rackspacecloud.com'
    rs.bucket   = "fs-#{`hostname`.strip}-sync"
    rs.path     = ''
    rs.mirror   = false

    rs.directories do |directory|
      directory.add '/mnt/sphinx/oedipus'
      directory.add '/mnt/sphinx/flying-sphinx'
    end
  end

  notify_by Mail do |mail|
    mail.on_success = true
    mail.on_failure = true

    mail.from      = 'support-at-flying-sphinx'
    mail.to        = 'pat-at-freelancing-gods'
    mail.address   = 'smtp.sendgrid.com'
    mail.user_name = 'SMTP_USER_NAME'
    mail.password  = 'SMTP_PASSWORD'
  end
end

I’ve been running the first script for several months, and the second for close to a month – both via cron – and had no problems at all. If you’ve not got a solid backup system in place because you’re finding it complex and frustrating, you’ve now got one less excuse.

21 Nov 2011

Cut and Polish: A Guide to Crafting Gems

As I mentioned here earlier in the year, a few weeks ago I had the pleasure of visiting Ukraine and speaking at the RubyC conference in Kyiv. My talk was a run through of how to build gems, some of the tools that can help, and a few best practices.

The video of my session is now online, if you’re interested:

There’s also the slides with notes, if you prefer that.

One of the questions asked towards the end was about publishing private gems, which I’d not dealt with before. However, Darcy was quick to tweet that Gemfury looks like a promising solution for those scenarios.

Please let me know if you think I’ve missed any critical elements of building and publishing gems – or if you have any further questions.

And many thanks to the RubyC team for putting together the conference and inviting me to speak – I had a great time!

19 Oct 2011

A Sustainable Flying Sphinx?

In which I muse about what a sustainable web service could look like – but first, the backstory:

A year ago – almost to the day – I sat in a wine bar in Sydney’s Surry Hills with Steve Hopkins. I’d been thinking about how to get Sphinx working on Heroku, and ran him through the basic idea in my head of how it could work. His first question was “So, what are you working on tomorrow, then?”

By the end of the following day, I had some idea of how it would work. Over the next few months I had a proof of concept working, hit some walls, began again, and finally got to a point where I could launch an alpha release of Flying Sphinx.

In May, Flying Sphinx became available for all Heroku users – and earlier today (five months later), I received my monthly provider payment from Heroku, with the happy news that I’m now earning enough to cover all related ongoing expenses – things like AWS for the servers, Scalarium to manage them, and Tender for support.

Now, I’m not rolling in cash, and I’m certainly not earning enough through Flying Sphinx to pay rent, let alone be in a position to drop all client work and focus on Flying Sphinx full-time. That’s cool, though either of those targets would be amazing.

And of course, money isn’t the be all and end all – even though this is a business, and I certainly don’t want to run at a loss. I want Flying Sphinx to be sustainable – in that it covers not only the hosting costs, but my time as well, along with supporting the broader system around it – code, people and beyond.

But what does a sustainable web service look like, particularly beyond the standard (outmoded) financial axis?

Sustainable Time

Firstly (and selfishly), it should cover the time spent maintaining and expanding the service. Flying Sphinx doesn’t use up a huge amount of my time right at the moment, but I’m definitely keen to improve a few things (in particular, offer Sphinx 2.0.1 alongside the existing 1.10-beta installation), and there is the occasional support query to deal with.

This one’s relatively straight-forward, really – I can track all time spent on Flying Sphinx and multiply that by a decent hourly rate. If it turns out I can’t manage all the work myself, then I pay someone else to help.

It certainly doesn’t look like I’m going to need anyone helping in the near future, mind you – nor am I drowning in support requests.

Sustainable Software

Ignoring the time I spend writing code for Flying Sphinx (as that’s covered by the previous section), pretty much every other piece of software involved with the service is open source. Front and centre among these is Sphinx itself.

I certainly don’t expect to be paid for my own open source contributions, but it helps when there are some funds trickling in to help motivate dealing with support questions, fixing bugs and adding features. It can also provide a stronger base for building a community.

With this in mind, I’m considering setting aside a percentage of any profit for Sphinx development – as any improvements to that help make Flying Sphinx a stronger offering.

(I could also cover my time spent on Thinking Sphinx with a percentage cut – either way it would end up in my pocket though.)

Sustainable Hardware

This is where things get a little trickier – we’re not just dealing with bits and electrons, but also silicon and metals. The human race is pretty bad at weaning itself off of limited (as opposed to renewable) resources, and the hardware industry certainly is going to hit some limits in the future as certain metals become harder to source.

Of course, the servers use a lot of energy, so one thing I will be doing is offsetting the carbon. I’ve not yet figured out the best service to do this, but will start by looking at Brighter Planet.

From a social perspective, there are also questions about how those resources are sourced. We should be considering the working conditions of where the metals are mined (and by whom), the people who are soldering the logic boards, and those who place the finished products into racks in data centres.

As an example, let’s look at Amazon. Given the recent issues raised with the conditions for staff in their warehouses, I think it’s fair to seek clarification on the situation of their web service colleagues. And what if there were significant ethical issues for using AWS? What then for Flying Sphinx, which runs EC2 instances and is an add-on for Heroku, a business built entirely on top of Amazon’s offerings?

I could at least use servers elsewhere – but that means bandwidth between servers and Heroku apps starts to cost money – and we introduce a step of latency into the service. Neither of those things is ideal. Or I could just say that I don’t want to support Amazon at all, and shut down Flying Sphinx, remove all my Heroku apps, and find some other hosting service to use.

Am I getting a little too carried away? Perhaps, but this is all hypothetical anyway. I’m guessing Amazon’s techs are looked after decently (though I’d love some confirmation on this), and am hoping the situation improves for their warehouse staff as well.

I am still searching for answers about what truly sustainable hardware – and more so, sustainable web services – look like: financially, socially, environmentally, and technically. What’s your take? What have I forgotten?

24 Sep 2011

Versioning your APIs

As I developed Flying Sphinx, I found myself both writing and consuming several APIs: from Heroku to Flying Sphinx, Flying Sphinx to Heroku, the flying-sphinx gem in apps to Flying Sphinx, Flying Sphinx to Sphinx servers, and Sphinx servers to Flying Sphinx.

None of that was particularly painful – but when Josh Kalderimis was improving the flying-sphinx gem, he noted that the API it interacts with wasn’t that great. Namely, it was inconsistent with what it returned (sometimes text status messages, sometimes JSON), it was sending authentication credentials as GET/POST parameters instead of in a header, and it wasn’t versioned.

I was thinking that given I control pretty much every aspect of the service, it didn’t matter if the APIs had versions or not. However, as Josh and I worked through improvements, it became clear that the apps using older versions of the flying-sphinx gem were going to have one expectation, and newer versions another. Versioning suddenly became a much more attractive idea.

The next point of discussion was how clients should specify which version they are after. Most APIs put this in the path – here’s Twitter’s as an example, specifying version 1:

https://api.twitter.com/1/statuses/user_timeline.json

However, I’d recently been working with Scalarium’s API, and theirs put the version information in a header (again, version 1):

Accept: application/vnd.scalarium-v1+json

Some research turned up a discussion on Hacker News about best practices for APIs – and it’s argued there that using headers keeps the paths focused on just the resource, which is a more RESTful approach. It also makes for cleaner URLs, which I like as well.
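From a client's point of view, the header approach might look something like the following sketch. The endpoint and media type here are placeholders (api.example.com and vnd.example are not real services), and the request is only built, never sent:

```ruby
require 'net/http'
require 'uri'

# Hypothetical endpoint, purely for illustration.
uri = URI.parse('https://api.example.com/statuses')

# Build (but don't send) a GET request that asks for version 1 of the API.
request = Net::HTTP::Get.new(uri.request_uri)
request['Accept'] = 'application/vnd.example-v1+json'

puts request.path      # the path stays focused on the resource
puts request['Accept'] # the version lives in the header
```

Moving to a newer version then means changing one header value, while every URL stays exactly as it was.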

How to implement this in a Rails application though? My routing ended up looking something like this:

namespace :api do
  constraints ApiVersion.new(1) do
    scope :module => :v1 do
      resource :app do
        resources :indices
      end
    end
  end

  constraints ApiVersion.new(2) do
    scope :module => :v2 do
      resource :app do
        resources :indices
      end
    end
  end
end

The ApiVersion class (which I have saved to app/lib/api_version.rb) is where we check the version header and route accordingly:

class ApiVersion
  def initialize(version)
    @version = version
  end

  def matches?(request)
    versioned_accept_header?(request) || version_one?(request)
  end

  private

  def versioned_accept_header?(request)
    accept = request.headers['Accept']
    accept && accept[/application\/vnd\.flying-sphinx-v#{@version}\+json/]
  end

  def unversioned_accept_header?(request)
    accept = request.headers['Accept']
    accept.blank? || accept[/application\/vnd\.flying-sphinx/].nil?
  end

  def version_one?(request)
    @version == 1 && unversioned_accept_header?(request)
  end
end

You’ll see that I default to version 1 if no header is supplied. This is for the older versions of the flying-sphinx gem – but if I were starting afresh, I might default to the latest version instead.

All of this gives us URLs that look something like this:

http://flying-sphinx.com/api/app
http://flying-sphinx.com/api/app/indices

My SSL certificate is locked to flying-sphinx.com – if it was wildcarded, then I’d be using a subdomain ‘api’ instead, and clean those URLs up even further.

The controllers are namespaced according to both the path and the version – so we end up with names like Api::V2::AppsController. It does mean you get a new set of controllers for each version, but I’m okay with that (though would welcome suggestions for other approaches).

Authentication is managed by namespaced application controllers – here’s an example for version 2, where I’m using headers:

class Api::V2::ApplicationController < ApplicationController
  skip_before_filter :verify_authenticity_token
  before_filter :check_api_params

  expose(:app) { App.find_by_identifier identifier }

  private

  def check_api_params
    # ensure the response returns with the same header value
    headers['X-Flying-Sphinx-Token'] = request.headers['X-Flying-Sphinx-Token']
    render_json_with_code 403 unless app && app.api_key == api_key
  end

  def api_token
    request.headers['X-Flying-Sphinx-Token']
  end

  def identifier
    api_token && api_token.split(':').first
  end

  def api_key
    api_token && api_token.split(':').last
  end
end

Authentication, in case it’s not clear, is done by a header named X-Flying-Sphinx-Token with a value of the account’s identifier and api_key concatenated together, separated by a colon.

(If you’re not familiar with the expose method, that’s from the excellent decent_exposure gem.)
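To make the token format concrete, here's a small sketch of building and splitting it in plain Ruby. The helper names are hypothetical (they're not part of the flying-sphinx gem), and the parsing mirrors the first/last split used in the controller above:

```ruby
# Hypothetical helper: join the identifier and API key with a colon.
def build_token(identifier, api_key)
  "#{identifier}:#{api_key}"
end

# Hypothetical helper: split a token back into [identifier, api_key],
# taking the first and last colon-separated segments.
def parse_token(token)
  return [nil, nil] if token.nil?

  parts = token.split(':')
  [parts.first, parts.last]
end

token = build_token('my-app', 'secret-key')
identifier, api_key = parse_token(token)

puts token      # my-app:secret-key
puts identifier # my-app
puts api_key    # secret-key
```

One thing this makes visible: a token with no colon at all yields the same string for both parts, so it's the comparison against the stored api_key that actually rejects malformed tokens.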

So where does that leave us? Well, we have an elegantly namespaced API, and both versions and authentication are managed in headers instead of paths and parameters. I also made sure version 2 responses all return JSON. Josh is happy and all versions of the flying-sphinx gem are happy.

The one caveat with all of this? While it works for me, and it suits Flying Sphinx, it’s not the One True Way for API development. We had a great discussion at the most recent Rails Camp up at Lake Ainsworth about different approaches – at the end of the day, it really comes down to the complexity of your API and who it will be used by.

10 Sep 2011

Speaking at RubyC

Just a quick note for anyone in or near Eastern Europe – I’ll be heading over to Kiev for RubyC in November. I’m going to be speaking there about how to build gems and the best practices when doing so.

So, if that interests you (or you’d just like to catch up or hear some of the other speakers talk about interesting Ruby-related topics), then hopefully I’ll see you there!

02 Sep 2011

Combustion - Better Rails Engine Testing

I spent a good part of last month writing my first Rails engine – although it’s not yet released and for a client, so I won’t talk about that too much here.

Very quickly in the development process, I was looking around on how to test Rails engines. It seemed that, beyond some basic unit tests, having a full Rails application within your test or spec directory was the accepted approach for integration testing.

That felt kludgy and bloated to me, so I decided to try something a little different.

The end goal was full stack testing in a clear and manageable fashion – writing specs within my spec directory, not a bundled Rails app’s spec directory. Capybara’s DSL would be nice as well.

This, of course, meant having a Rails application to test through – but it turns out you can get away without the vast majority of files that Rails generates for you. Indeed, the one file a Rails app expects is config/database.yml – and that’s only if you have ActiveRecord in play.

Enter Combustion – my minimal Rails app-as-a-gem for testing engines, with smart defaults for your standard Rails settings.

Setting It Up

A basic setup is as follows:

  • Add the gem to your gemspec or Gemfile.
  • Run the generator in your engine’s directory to get a small Rails app stub created: combust (or bundle exec combust if you’re referencing the git repository instead).
  • Add Combustion.initialize! to your spec/spec_helper.rb (currently only RSpec is supported, but shouldn’t be hard to patch for TestUnit et al).

Here’s a sample spec_helper, mixing in Capybara as well:

require 'rubygems'
require 'bundler'

Bundler.require :default, :development

require 'capybara/rspec'

Combustion.initialize!

require 'rspec/rails'
require 'capybara/rails'

RSpec.configure do |config|
  config.use_transactional_fixtures = true
end

Putting It To Work

Firstly, you’ll want to make sure you’re using your engine within the test Rails application. The generator has likely added the hooks we need for this. If you’re adding routes, then edit spec/internal/config/routes.rb. If you’re dealing with models, make sure you add the tables to spec/internal/db/schema.rb. The README covers this in a bit more detail.

And then, get stuck into your specs. Here’s a really simple example:

# spec/controllers/users_controller_spec.rb
require 'spec_helper'

describe UsersController do
  describe '#new' do
    it "runs successfully" do
      get :new

      response.should be_success
    end
  end
end

Or, using Capybara for integration:

# spec/acceptance/visitors_can_sign_up_spec.rb
require 'spec_helper'

describe 'authentication process' do
  it 'allows a visitor to sign up' do
    visit '/'

    click_link 'Sign Up'
    fill_in 'Name',     :with => 'Pat Allan'
    fill_in 'Email',    :with => 'pat@no-spam-please.com'
    fill_in 'Password', :with => 'chunkybacon'
    click_button 'Sign Up'

    page.should have_content('Sign Out')
  end
end

And that’s really the core of it. Write the specs you need to test your engine within the context of a full Rails application. If you need models, controllers or views in the internal application to fully test out your engine, then add them to the appropriate location within spec/internal – but only add what’s necessary.

Rack It Up

Oh, and one of my favourite little helpers is this: Combustion’s generator adds a config.ru file to your engine, which means you can fire up your test application in the browser – just run rackup and visit http://localhost:9292.

Caveats

As already mentioned, Combustion is built with RSpec in mind – but I will happily accept patches for TestUnit as well. Same for Cucumber – should work in theory, but I’m yet to try it.

It’s also written for Rails 3.1 – it may work with Rails 3.0 with some patches, but I very much doubt it’ll play nicely with anything before that. Still, feel free to investigate.

And it’s possible that this could be useful for integration testing for libraries that aren’t engines. If you want to try that, I’d love to hear how it goes.

Final Notes

So, where do we stand?

  • You can test your engine within a full Rails stack, without a full Rails app.
  • You only add what you need to your Rails app stub (that lives in spec/internal).
  • Your testing code is DRYer and easier to maintain.
  • You can use standard RSpec and Capybara helpers for integration testing.
  • You can view your test application via Rack.

I’m not the first to come up with this idea – after I had finished Combustion, it was pointed out to me that Kaminari’s test suite does a similar thing (just not extracted out into a separate library). It wouldn’t surprise me if others have done the same – but in my searching, I kept coming across well-known libraries with full Rails apps in their test or spec directories.

If you think Combustion could suit your engine, please give it a spin – I’d love to have others kick the tires and ensure it works in a wider set of situations. Patches and feedback are most definitely welcome.

30 May 2011

Searching with Sphinx on Heroku

Just over two weeks ago, I released Flying Sphinx – which provides Sphinx search capability for Heroku apps. I’ll talk more about how I built it and the challenges faced at some point, but right now I just want to introduce the service and how you may go about using it.

Why Sphinx?

Perhaps you’re not familiar with Sphinx and how it can be useful. For those who are new to Sphinx, it’s a full-text search tool – think of your own personal Google for within your website. It comes with two main moving parts – the indexer tool for interpreting and storing your search data (indices), and the searchd tool, which runs as a daemon accepting search requests, and returns the most appropriate matches for a given search query.

In most situations, Sphinx is very fast at indexing your data, and connects directly to MySQL and PostgreSQL databases – so it’s quite a good fit for a lot of Rails applications.

Using Sphinx in Rails

I’ve written a gem, Thinking Sphinx, which integrates Sphinx neatly with ActiveRecord. It allows you to define indices in your models, and then use rake tasks to handle the processing of these indices, along with managing the searchd daemon.

If you want to install Sphinx, have a read through of this guide from the Thinking Sphinx documentation – in most cases it should be reasonably painless.

Installing Thinking Sphinx in a Rails 3 application is quite simple – just add the gem to your Gemfile:

gem 'thinking-sphinx', '2.0.5'

For older versions of Rails, the Thinking Sphinx docs have more details.

I’m not going to get too caught up in the details of how to structure indices – this is also covered within the Thinking Sphinx documentation – but here’s a quick example, for a user account model:

class User < ActiveRecord::Base
  # ...
  
  define_index do
    indexes name, :sortable => true
    indexes location
    
    has admin, created_at
  end
  
  # ...
end

The indexes method defines fields – which are the textual data that people can search for. In this case, we’ve got the user names and locations covered. The has method is for attributes – which are used for filtering and sorting (fields can’t be used for sorting by default). The distinction of fields and attributes is quite important – make sure you understand the difference.

Now that we have our index defined, we can have Sphinx grab the required data from our database, which is done via a rake task:

rake ts:index

What Sphinx does here is grab all the required data from the database, interpret it and store it in a custom format. This allows Sphinx to be smarter about ranking search results and matching words within your fields.

Once that’s done, we next start up the Sphinx daemon:

rake ts:start

And now we can search! Either in script/console or in an appropriate action, just use the search method on your model:

User.search 'pat'

This returns the first page of users that match your search query. Sphinx always paginates results – though you can set the page size to be quite large if you wish – and Thinking Sphinx search results can be used by both WillPaginate and Kaminari pagination view helpers.

Instead of sorting by the most relevant matches, here are examples where we sort by name and created_at:

User.search 'pat', :order => :name
User.search 'pat', :order => :created_at

And if we only want admin users returned in our search, we can filter on the admin attribute:

User.search 'pat', :with => {:admin => true}

There are many more options for search calls – the documentation (yet again) covers most of them quite well.

One more thing to remember – if you change your index structures, or add/remove index definitions, then you should restart and reindex Sphinx. This can be done in a single rake task:

rake ts:rebuild

If you just want the latest data to be processed into your indices, there’s no need to restart Sphinx – a normal ts:index call is fine.

Using Thinking Sphinx with Heroku

Now that we’ve got a basic search setup working quite nicely, let’s get it sorted out on Heroku as well. Firstly, let’s add the flying-sphinx gem to our Gemfile (below our thinking-sphinx reference):

gem 'flying-sphinx', '0.5.0'

Get that change (along with your indexed model setup) deployed to Heroku, then inform Heroku you’d like to use the Flying Sphinx add-on (the entry level plan costs $12 USD per month):

heroku addons:add flying_sphinx:wooden

And finally, let’s get our data on the site indexed and the daemon running:

heroku rake fs:index
heroku rake fs:start

Note the fs prefix instead of the ts prefix in those rake calls – the normal Thinking Sphinx tasks are only useful on your local machine (or on servers that aren’t Heroku).

When you run those rake tasks, you will probably see the following output:

Sphinx cannot be found on your system. You may need to configure the
following settings in your config/sphinx.yml file:
  * bin_path
  * searchd_binary_name
  * indexer_binary_name

For more information, read the documentation:
http://freelancing-god.github.com/ts/en/advanced_config.html

This is because Thinking Sphinx doesn’t have access to Sphinx locally, and isn’t sure which version of Sphinx is available. To have these warnings silenced, you should add a config/sphinx.yml file to your project, with the version set for the production environment:

production:
  version: 1.10-beta

Push that change up to Heroku, and you won’t see the warnings again.

For the more curious of you: the Sphinx daemon is located on a Flying Sphinx server, also located within the Amazon cloud (just like Heroku) to keep things fast and cheap. This is all managed by the flying-sphinx gem, though – you don’t need to worry about IP addresses or port numbers.

Also: the same rules apply with Flying Sphinx for modifying index structures or adding/removing index definitions – make sure you restart Sphinx so it’s aware of the changes:

heroku rake fs:rebuild

The final thing to note is that you’ll want the data in your Sphinx indices updated regularly – perhaps every day or every hour. This is best done on Heroku via their Cron add-on – since that’s just a rake task as well.

If you don’t have a cron task already, the following (perhaps in lib/tasks/cron.rake) will do the job:

desc 'Have cron index the Sphinx search indices'
task :cron => 'fs:index'

Otherwise, maybe something more like the following suits:

desc 'Have cron index the Sphinx search indices'
task :cron => 'fs:index' do
  # Other things to do when Cron comes calling
end

If you’d like your search data to have your latest changes, then I recommend you read up on delta indexing – both for Thinking Sphinx and for Flying Sphinx.

Further Sources

Keep in mind this is just an introduction – the documentation for Thinking Sphinx is pretty good, and Flying Sphinx is improving regularly. There’s also the Thinking Sphinx Google group and the Flying Sphinx support site if you have questions about either, along with numerous blog posts (though the older they are, the more likely they’ll be out of date). And finally – I’m always happy to answer questions about this, so don’t hesitate to get in touch.

About Freelancing Gods

Freelancing Gods is written by Pat, who works on the web as a web developer in Melbourne, Australia, specialising in Ruby on Rails.

In case you're wondering what the likely content here will be about (besides code), keep in mind that Pat is passionate about the internet, music, politics, comedy, bringing people together, and making a difference. And pancakes.

His ego isn't as bad as you may think. Honest.

All original content on this site is available through a Creative Commons by-nc-sa licence.