Common Questions and Issues
Depending on how you have Sphinx set up, or what database you’re using, you might come across little issues and curiosities. Here are a few to be aware of.
- Editing the generated Sphinx configuration file
- Running multiple instances of Sphinx on one machine
- Viewing Result Weights
- Wildcard Searching
- Slow Indexing
- MySQL and Large Fields
- PostgreSQL with Manual Fields and Attributes
- Delta Indexing Not Working
- Running Delta Indexing with Passenger
- Can only access the first thousand search results
- Vendored Delayed Job, AfterCommit and Riddle
- Filtering on String Attributes
- Models outside of `app/models`
- Using Thinking Sphinx with Bundler
- Mixing Ranged Filters and OR Logic
- Removing HTML from Excerpts
- Using other Database Adapters
- Using OR Logic with Attribute Filters
- Catching Exceptions when Searching
- Slow Requests (Especially in Development)
- Errors saying no fields are defined
Editing the generated Sphinx configuration file
In most situations, you won’t need to edit this file yourself, and can rely on Thinking Sphinx to generate it reliably.
If you do want to customise the settings, you’ll find most options are
available to set via
config/sphinx.yml - many are mentioned on the
Advanced Sphinx Configuration page. For those
that aren’t mentioned on that page, you could still try setting them, and
there’s a fair chance they will work.
On the off chance that you actually do need to edit the file, make sure
you’re running the
thinking_sphinx:reindex task instead of the normal
thinking_sphinx:index task - as the latter will always regenerate the
configuration file, overwriting your customisations.
Running multiple instances of Sphinx on one machine
You can run as many Sphinx instances as you wish on one machine - but
each must be bound to a different port. You can do this via the
config/sphinx.yml file - just add a setting for the port for the relevant environment.
Other options are documented on the Advanced Sphinx Configuration page.
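As a rough sketch of what such a file might contain, the snippet below holds a sample config/sphinx.yml as a heredoc and parses it with Ruby's standard YAML library to check that every environment ends up on its own port. The environment names and port numbers are illustrative assumptions; use whatever free ports suit your machine.

```ruby
require 'yaml'

# Hypothetical contents for config/sphinx.yml: one port per environment.
# The port numbers are arbitrary examples, not values required by Sphinx.
SPHINX_PORTS_YML = <<~YAML
  development:
    port: 9312
  test:
    port: 9313
  production:
    port: 9314
YAML

# Parse the sample file and collect the port used by each environment.
SPHINX_PORTS = YAML.safe_load(SPHINX_PORTS_YML)
                   .transform_values { |settings| settings['port'] }

# Each Sphinx instance must be bound to a different port.
raise 'ports must be unique' unless SPHINX_PORTS.values.uniq.size == SPHINX_PORTS.size
puts SPHINX_PORTS.inspect
```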
Viewing Result Weights
To retrieve the weights/rankings of each search result, you can
enumerate through your matches using
However, there is currently no clean way to get the weight of a specific result without looping through the dataset.
Wildcard Searching
Sphinx can support wildcard searching (for example: Austr*), but it is turned off by default. To enable it, you need to add two settings to your config/sphinx.yml file:
You can set the min_infix_len value to something higher if you don’t need single characters with a wildcard being matched. This may be a worthwhile fine-tuning, because the smaller the infixes are, the larger your index files become.
Don’t forget to rebuild your Sphinx indexes after making this change.
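To get a feel for why a larger min_infix_len keeps index files smaller, here is a simplified model that lists every distinct infix of a keyword at or above a minimum length. It is only an illustration of how the number of candidate infixes grows; it does not describe Sphinx's actual index format, and the infixes helper is hypothetical.

```ruby
# Simplified model: every distinct infix (substring) of a keyword that is at
# least min_infix_len characters long. Hypothetical helper for illustration.
def infixes(word, min_infix_len)
  (min_infix_len..word.length).flat_map do |len|
    (0..word.length - len).map { |start| word[start, len] }
  end.uniq
end

puts infixes('sphinx', 1).size  # => 21
puts infixes('sphinx', 3).size  # => 10
```

Raising the minimum from one character to three roughly halves the number of infixes for this six-letter word, and the saving repeats for every keyword in the index.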
Slow Indexing
If Sphinx is taking a while to process all your records, there are a few common reasons for this happening. Firstly, make sure you have database indexes on any foreign key columns and any columns you filter or sort by.
Secondly - are you using fixtures, or are there large gaps between primary key values for your models? Sphinx isn’t set up to process disparate IDs efficiently by default - and Rails’ fixtures have randomly generated IDs, which are usually extremely large integers. To get around this, you’ll need to set sql_range_step in your config/sphinx.yml file for the appropriate environments:
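The arithmetic behind this can be sketched as follows, assuming Sphinx walks the ID range in chunks of sql_range_step and issues one query per chunk. The ID range and the larger step value are made-up examples, 1024 is Sphinx's documented default step as far as I'm aware, and range_query_count is a hypothetical helper.

```ruby
# Number of ranged SQL queries needed to walk from min_id to max_id when each
# query covers `step` consecutive IDs (hypothetical helper for illustration).
def range_query_count(min_id, max_id, step)
  ((max_id - min_id + 1) / step.to_f).ceil
end

# Fixture-style IDs spread over roughly a billion values (made-up range).
puts range_query_count(1, 1_000_000_000, 1024)        # => 976563
puts range_query_count(1, 1_000_000_000, 10_000_000)  # => 100
```

With sparse IDs, most of those small-step queries return no rows at all, which is why a much larger sql_range_step helps in environments that use fixtures.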
MySQL and Large Fields
If you’ve got a field that is built off multiple values in one column - i.e. through a has_many association - then you may hit MySQL’s default limit for string concatenation: 1024 characters. You can increase the group_concat_max_len value by adding the following to your define_index block:
If these fields get particularly large though, then there’s another setting you may need to set in your MySQL configuration: max_allowed_packet, which has a default of sixteen megabytes. You can’t set this option via Thinking Sphinx though (it’s a rare edge case).
PostgreSQL with Manual Fields and Attributes
If you’re using fields or attributes defined by strings (raw SQL), then the columns used in them aren’t automatically included in the GROUP BY clause of the generated SQL statement. To make sure the query is valid, you will need to explicitly add these columns to the GROUP BY clause.
A common example is if you’re converting latitude and longitude columns from degrees to radians via SQL.
Delta Indexing Not Working
Often people find delta indexing isn’t working on their production server. Sometimes, this is because Sphinx is running as one user on the system, and the Rails/Merb application is being served as a different user. Check your production.log and Apache/Nginx error log file for mentions of permissions issues to confirm this.
Indexing for deltas is invoked by the web user, so that user needs to have access to the index files. The simplest way to ensure this is to run all Thinking Sphinx rake tasks as that web user.
If you’re still having issues, and you’re using Passenger, read the next hint.
Running Delta Indexing with Passenger
If you’re using Phusion Passenger on your production server, with delta indexing on some models, a common issue people find is that their delta indexes don’t get processed.
If it’s not a permissions issue (see the previous hint), another common cause is that Passenger has its own PATH set up, and can’t execute the Sphinx binaries (indexer and searchd) implicitly.
The way around this is to find out where your binaries are on the server:
And then set the bin_path option in your config/sphinx.yml file for the production environment:
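If you would rather locate the binaries from Ruby than from the shell, something like the following sketch could work. It searches a PATH-style string for an executable and prints the containing directory, which is the value to use for bin_path. The find_binary helper is hypothetical and not part of Thinking Sphinx.

```ruby
# Return the full path of the first executable called `name` found in the
# given PATH-style string, or nil if there is none. Hypothetical helper.
def find_binary(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).reject(&:empty?).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

%w[indexer searchd].each do |binary|
  location = find_binary(binary)
  # The directory holding the binary is what bin_path should point at.
  puts "#{binary}: #{location ? File.dirname(location) : 'not found on PATH'}"
end
```

Run it as the same user that normally runs Sphinx, since Passenger's PATH is the one that is missing the directory.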
Can only access the first thousand search results
This is actually how Sphinx is supposed to behave. Have a read of the Large Result Sets section of the Advanced Configuration page to see why, and how to work around it if you really need to.
Vendored Delayed Job, AfterCommit and Riddle
If you’ve still got Delayed Job vendored as part of Thinking Sphinx and would rather use a more up-to-date version of the former, upgrade Thinking Sphinx: recent releases no longer include it.
As for AfterCommit and Riddle, while they are still included for plugin installs, they’re no longer in the Thinking Sphinx gem (since 1.3.3). Instead, they are considered dependencies, and will be installed as separate gems.
Filtering on String Attributes
While you can have string columns as attributes in Sphinx, they aren’t stored as strings. Instead, Sphinx figures out the alphabetical order, and gives each string an integer value to make them useful for sorting. However, this means it’s close to impossible to filter on these attributes.
So, to get around this, there are two options: firstly, use integer attributes instead, if you possibly can. This works when there is only a small set of possible values (for example: gender). Otherwise, you might want to consider manually converting the string to a CRC integer value:
This way, you can filter on it like so:
Of course, this isn’t amazingly clean, but it will work quite well. You should also take note that CRC32 encoding can have collisions, so it’s not the perfect solution.
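On the Ruby side, the integer to filter by can be computed with the standard library's Zlib.crc32, which implements the common CRC-32 checksum and should, as far as I know, agree with the database's CRC32() function for the same bytes. The crc32_for helper, the Article model and the category attribute below are hypothetical names used only for illustration.

```ruby
require 'zlib'

# CRC-32 checksum of a string as an unsigned integer (hypothetical helper).
def crc32_for(value)
  Zlib.crc32(value)
end

puts crc32_for('Ruby')

# Assumed usage, with a hypothetical Article model and category attribute:
#   Article.search 'pancakes', :with => { :category => crc32_for('Ruby') }
```

Because different strings can share a checksum, treat a match on the integer as very likely rather than guaranteed.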
Models outside of `app/models`
If you’re using plugins or other web frameworks (Radiant, Ramaze, etc)
that don’t always store their models in
app/models, you can tell
Thinking Sphinx to look in other locations when building the
By default, Thinking Sphinx will load all models in app/models and vendor/plugins/*/app/models.
Using Thinking Sphinx with Bundler
If you’re using Thinking Sphinx with the gem manager Bundler, you will
need to set the
:require option to thinking_sphinx.
If this isn’t done, it can introduce issues with gem loading order and
script/console. And don’t forget that you will still need to explicitly
request the Thinking Sphinx tasks in your
Mixing Ranged Filters and OR Logic
While Sphinx allows for querying with ranged filters on attributes, you can’t have multiple filters joined by OR logic - all must match.
As a way around this, you might want to construct a SQL snippet which returns specific values for each range interval, and then filter by an array of values for the intervals you want. Check out Tiago’s solution on the Google Group.
This won’t suit all situations, of course - if you don’t have specific range intervals, then you’re going to have to try something else.
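One way to picture the interval approach is the sketch below: each range gets a small integer id, a SQL CASE expression produces that id for the attribute, and the search then filters by an array of ids. The price column, the band boundaries and both helpers are hypothetical, and this is not necessarily the solution from the Google Group.

```ruby
# Hypothetical price bands; each band gets a small integer id.
PRICE_BANDS = { 1 => (0...10), 2 => (10...50), 3 => (50...100) }.freeze

# Build a SQL CASE expression that yields the band id for a column.
def band_sql(column, bands)
  clauses = bands.map do |id, range|
    "WHEN #{column} >= #{range.begin} AND #{column} < #{range.end} THEN #{id}"
  end
  "CASE #{clauses.join(' ')} ELSE 0 END"
end

# Ruby-side equivalent, handy for working out which ids to filter by.
def band_for(value, bands)
  match = bands.find { |_id, range| range.cover?(value) }
  match ? match.first : 0
end

puts band_sql('price', PRICE_BANDS)
puts band_for(25, PRICE_BANDS)  # => 2
```

Filtering by the ids [1, 3] would then match "under 10 OR from 50 up to 100", which a pair of ranged filters cannot express.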
Removing HTML from Excerpts
For a while, Thinking Sphinx auto-escaped excerpts. However, Sphinx
itself can strip HTML markup for indexing and excerpts, which is a
better way to approach this. So, you’ll want to add the following
setting to your
Using other Database Adapters
If you’re using Thinking Sphinx in combination with a database adapter that isn’t quite run-of-the-mill, you may need to add a snippet of code to a Rails initialiser or equivalent (this is only available in versions 1.4.0 and 2.0.0 onwards, though).
Here’s an example that covers things for Octopus:
ThinkingSphinx.database_adapter accepts a symbol as well,
if you just want to presume that you’ll always be using either MySQL or PostgreSQL.
In most situations, though, you shouldn’t need to do this. Thinking Sphinx understands the standard MySQL, PostgreSQL, MySQL2, MySQL Plus and NullDB (as MySQL) adapters.
Using OR Logic with Attribute Filters
It is possible to filter on attributes using OR logic - although you need to be using Sphinx 0.9.9 or newer.
There are two steps to it: firstly, you need to create a computed attribute while searching, using Sphinx’s select option, and then you filter by that computed value. Here’s an example where we want to return all publicly visible articles, as well as articles belonging to the user with an ID of 5.
It’s important to note that you’ll want to include all existing
attribute values by default (that’s the
* at the start of the select).
It’s quite similar to standard SQL syntax.
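A sketch of how the select clause and the matching filter might be put together is shown below. The attribute names (visible, user_id), the alias and the helper are hypothetical, and the hash keys are placeholders: the exact option name for passing a select clause to a search depends on your Thinking Sphinx version, so check it before copying this.

```ruby
# Build the select clause and matching filter for "visible OR owned by user".
# Attribute names, alias and hash keys are hypothetical placeholders.
def visibility_search_options(user_id)
  # Integer() rejects anything that is not a whole number before interpolation.
  condition = "IF(visible = 1 OR user_id = #{Integer(user_id)}, 1, 0)"
  {
    select: "*, #{condition} AS visible_or_mine",
    with:   { 'visible_or_mine' => 1 }
  }
end

puts visibility_search_options(5)[:select]
```

The computed attribute evaluates to 1 when either condition holds, so a single ordinary filter on it gives the effect of OR.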
Catching Exceptions when Searching
By default, Thinking Sphinx does not execute the search query until you examine your search results - which is usually in the view. This is so you can chain sphinx scopes without sending multiple (unnecessary) queries to Sphinx.
However, this means that exceptions will be fired from within the view - and most people put their exception handling in the controller. To force exceptions to fire when you actually define the search, all you need to do is to inform Thinking Sphinx that it should populate the results immediately:
Obviously, if you’re chaining scopes together, make sure you add this at the end with a final search call:
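The difference between lazy and immediate population can be illustrated with a small stand-in class. This is not Thinking Sphinx itself: LazySearch and its populate keyword are hypothetical, written only to show where the exception surfaces in each case.

```ruby
# Stand-in for a lazily evaluated search; NOT Thinking Sphinx itself.
class LazySearch
  def initialize(populate: false, &query)
    @query = query
    results if populate # run the query immediately when asked to
  end

  # The query only runs the first time the results are examined.
  def results
    @results ||= @query.call
  end
end

failing_query = -> { raise IOError, 'searchd is not running' }

# Lazy: building the search succeeds; the error would surface later,
# wherever the results are first read (typically the view).
lazy = LazySearch.new(&failing_query)

# Eager: the error is raised here, where controller-level rescue can catch it.
begin
  LazySearch.new(populate: true, &failing_query)
rescue IOError => e
  puts "caught in controller: #{e.message}"
end
```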
Slow Requests (Especially in Development)
If you’re finding a lot of requests are quite slow (particularly in your
local development environment), this could be because you have a lot of
models. Thinking Sphinx loads all models to determine which ones are
indexed by Sphinx (this is necessary to load search results), but you
can make things much faster by setting out a list of indexed
models in your
Errors saying no fields are defined
If you have defined fields (using the
indexes method) but you’re
getting an error saying none are defined, it could be due to other gems
packaging custom (and perhaps broken) versions of the BlankSlate gem. To
get around this, add the proper BlankSlate gem to your Gemfile above